Author: Claudia Wagner | Published: 16th February 2009
I extended Uldis Bojars’ WP SIOC Exporter so that it also exports semantic metadata embedded in the HTML content of a posting.
Why is this useful?
Semantic metadata embedded in the HTML of a posting’s content can reveal more information about the topic of a posting (i.e. what the posting is about). Tools such as Structured Blogging (http://structuredblogging.org) or the Semantic Reblog prototype I am working on embed semantic metadata directly into the HTML content of postings.
At the moment, the WP SIOC Exporter relates the whole plain text of a post’s content to the resource representing the post via the sioc:content property. The HTML representation of the post’s content is related to the same resource via the content:encoded property. Additionally, links are extracted from the post’s content and related to the post via the sioc:links_to property.
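To make the three relations above more concrete, here is a minimal sketch in Python (the exporter itself is a WordPress plugin, so this is not its actual code). The function name describe_post, the class PostContentParser and the tuple representation of triples are my own assumptions for illustration; only the property names come from the exporter.

```python
from html.parser import HTMLParser


class PostContentParser(HTMLParser):
    """Collects the plain text and the outgoing links of a post's HTML."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Remember the target of every <a href="..."> element.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        # Text between tags becomes part of the plain-text content.
        self.text_parts.append(data)


def describe_post(post_uri, html):
    """Return (subject, property, value) tuples for one post (hypothetical helper)."""
    parser = PostContentParser()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so the plain text reads as one string.
    plain_text = " ".join("".join(parser.text_parts).split())
    triples = [
        (post_uri, "sioc:content", plain_text),
        (post_uri, "content:encoded", html),
    ]
    triples += [(post_uri, "sioc:links_to", href) for href in parser.links]
    return triples
```

Calling describe_post with a post URI and its HTML yields one sioc:content statement, one content:encoded statement and one sioc:links_to statement per link; serialising these tuples as RDF/XML is left out of the sketch.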
My extended version of the exporter also extracts images from a post’s content and relates them to the resource representing the post via the sioc:embeds property. If the image is a Flickr image, an rdfs:seeAlso link is generated that points to the RDF description of the image obtained via Masahide Kanzaki’s wrapper. Furthermore, semantic metadata embedded in the HTML content of a post is extracted and related to the post via the sioc:embeds property. I am not sure whether sioc:embeds is the best property for relating embedded entities to their container post; maybe something like sioc:topic would be better. In any case, the URIs of the embedded resources are related to the post URI, and the parts of the resource description that have been embedded are exposed as well. There are two reasons for this: if only parts of a resource’s description are reused or embedded in a post’s content, it might be interesting for machines to know which parts those are, and if the resource is described via microformats, it might not have a URI that identifies it.
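The image handling described above could look roughly like the following Python sketch. It is an illustration under assumptions, not the plugin's code: the names ImageCollector, is_flickr and image_triples are hypothetical, the host check is my own guess at what counts as a Flickr image, and WRAPPER_BASE is a placeholder rather than the real address of Masahide Kanzaki's wrapper. The see-also link uses rdfs:seeAlso from RDF Schema.

```python
from html.parser import HTMLParser
from urllib.parse import quote, urlparse

# Placeholder only: NOT the real address of the Flickr-to-RDF wrapper.
WRAPPER_BASE = "http://example.org/flickr2rdf?u="


class ImageCollector(HTMLParser):
    """Collects the src attribute of every <img> element."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append(src)


def is_flickr(url):
    """Assumed heuristic: the image is served from flickr.com or a subdomain."""
    host = urlparse(url).hostname or ""
    return host == "flickr.com" or host.endswith(".flickr.com")


def image_triples(post_uri, html):
    """Return (subject, property, value) tuples for the images in a post."""
    collector = ImageCollector()
    collector.feed(html)
    collector.close()
    triples = []
    for src in collector.images:
        triples.append((post_uri, "sioc:embeds", src))
        if is_flickr(src):
            # Point to the RDF description of the image offered by the wrapper.
            see_also = WRAPPER_BASE + quote(src, safe="")
            triples.append((src, "rdfs:seeAlso", see_also))
    return triples
```

Every image gets a sioc:embeds statement attached to the post, while only images recognised as Flickr images receive an additional rdfs:seeAlso statement attached to the image itself.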
I use the ARC2 library (the version from 2009-02-12; it is important to use this version or a later one), which provides a parser for extracting different embedded semantic metadata formats such as RDFa, eRDF and microformats. I modified the declaration of the toRDFXML method in the ARC2_Class.php file. That is why, at the moment, “my” version of ARC2_Class.php must be placed in the SIOC Exporter’s arc folder. Benjamin has already told me that the modification will be included in the next ARC2 version.
If you would like to test this version of the WP SIOC Exporter, download it here.
Any thoughts are of course welcome!