Author: Claudia Wagner | Published: 16th February 2009

I extended Uldis Bojars’ WP SIOC Exporter to also export semantic metadata that is embedded in the HTML content of a post.

Why is this useful?

Semantic metadata embedded in the HTML of a post’s content can reveal more about the topic of the post. Tools such as Structured Blogging (http://structuredblogging.org) and the Semantic Reblog prototype I am working on embed semantic metadata directly into the HTML content of posts.

At the moment, the WP SIOC Exporter relates the whole plain text of a post’s content to the resource representing the post via the sioc:content property. The HTML representation of the post’s content is related to that resource via the content:encoded property. Additionally, links are extracted from the post’s content and related to the post via the sioc:links_to property.
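
The three relations described above can be sketched roughly as follows. This is an illustrative Python sketch, not the exporter’s PHP code: the helper name describe_post, the example URIs and the representation of triples as plain tuples are all assumptions made for the example.

```python
from html.parser import HTMLParser


class _PostScanner(HTMLParser):
    """Collects the plain text and the link targets of a post's HTML."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Remember the target of every <a href="..."> element.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.text_parts.append(data)


def describe_post(post_uri, html):
    """Return (subject, predicate, object) tuples for one post.

    Hypothetical helper; prefixed names such as "sioc:content" are kept
    as plain strings instead of being expanded to full URIs.
    """
    scanner = _PostScanner()
    scanner.feed(html)
    scanner.close()
    # Collapse whitespace so the plain text reads as one line.
    plain_text = " ".join("".join(scanner.text_parts).split())
    triples = [
        (post_uri, "sioc:content", plain_text),
        (post_uri, "content:encoded", html),
    ]
    for href in scanner.links:
        triples.append((post_uri, "sioc:links_to", href))
    return triples
```

Calling describe_post with a post URI and the post’s HTML yields one sioc:content triple, one content:encoded triple and one sioc:links_to triple per link found.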

My extended version of the exporter also extracts images from a post’s content and relates them to the resource representing the post via the sioc:embeds property. If the image is a Flickr image, an rdfs:seeAlso link is generated that points to the RDF description of the image obtained via Masahide Kanzaki’s wrapper. Furthermore, semantic metadata embedded in the HTML content of a post is extracted and related to the post via a sioc:embeds property. I am not sure whether sioc:embeds is the best property for relating embedded entities to their container post; maybe something like sioc:topic would be better. In any case, the URIs of the embedded resources are related to the post URI, and the parts of each resource description that have been embedded are also exposed. There are two reasons for this: if only parts of a resource’s description are reused or embedded in a post’s content, it might be interesting for machines to know which parts these are; and if the embedded resource is described via microformats, it might not have a URI that identifies it.
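
A rough sketch of the image handling, again in Python for illustration only and not taken from the plugin: every image in the post becomes a sioc:embeds triple, and Flickr images additionally get a seeAlso link (written here as rdfs:seeAlso, the RDF Schema property). The names describe_images and is_flickr_image are hypothetical, and FLICKR_WRAPPER_BASE is a placeholder, because the real address of the Flickr RDF wrapper is not given in this post.

```python
from html.parser import HTMLParser
from urllib.parse import quote, urlparse

# Placeholder (assumption): substitute the real address of the Flickr
# RDF wrapper here; it is not stated in the post.
FLICKR_WRAPPER_BASE = "http://example.org/flickr2rdf?u="


class _ImageScanner(HTMLParser):
    """Collects the src attribute of every <img> element."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append(src)


def is_flickr_image(url):
    """True if the image is served from flickr.com or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return host == "flickr.com" or host.endswith(".flickr.com")


def describe_images(post_uri, html):
    """Return (subject, predicate, object) tuples for the images of a post."""
    scanner = _ImageScanner()
    scanner.feed(html)
    scanner.close()
    triples = []
    for src in scanner.images:
        triples.append((post_uri, "sioc:embeds", src))
        if is_flickr_image(src):
            # Point to an RDF description of the image via the wrapper.
            see_also = FLICKR_WRAPPER_BASE + quote(src, safe="")
            triples.append((src, "rdfs:seeAlso", see_also))
    return triples
```

Checking the host name rather than searching the whole URL for "flickr" avoids false matches on unrelated addresses that merely contain the word.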

I use the ARC2 library (the version from 2009-02-12; it is important to use this version or a later one), which provides a parser to extract different embedded semantic metadata formats such as RDFa, eRDF and microformats (MF). I modified the declaration of the toRDFXML method in the ARC2_Class.php file. That is why, at the moment, “my” version of ARC2_Class.php must be included in the SIOC Exporter’s arc folder. Benjamin has already told me that the modification will be included in the next ARC2 version.
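
In the exporter, ARC2 does the real extraction work. Purely to illustrate the idea, here is a heavily simplified Python sketch for the RDFa case. It does not use or imitate ARC2’s API and covers only a small fragment of RDFa: it assumes well-formed markup, reads only the about, property and content attributes, and ignores prefixes, rel/resource and eRDF or microformats entirely. The name describe_embedded is hypothetical.

```python
from html.parser import HTMLParser

# HTML elements that never have a closing tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class _RdfaScanner(HTMLParser):
    """Collects (subject, property, literal) statements from simple RDFa."""

    def __init__(self):
        super().__init__()
        self.stack = []       # one (tag, subject in scope) entry per open element
        self.pending = []     # [depth, subject, property, text parts] being read
        self.statements = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        inherited = self.stack[-1][1] if self.stack else None
        subject = attrs.get("about") or inherited
        prop = attrs.get("property")
        if prop and subject:
            if "content" in attrs:
                # The literal is given directly in the content attribute.
                self.statements.append((subject, prop, attrs["content"]))
            elif tag not in VOID:
                # The literal is the element's text; finish it on the end tag.
                self.pending.append([len(self.stack), subject, prop, []])
        if tag not in VOID:
            self.stack.append((tag, subject))

    def handle_endtag(self, tag):
        if tag in VOID or not self.stack:
            return
        self.stack.pop()
        if self.pending and self.pending[-1][0] == len(self.stack):
            _, subject, prop, parts = self.pending.pop()
            self.statements.append((subject, prop, "".join(parts).strip()))

    def handle_data(self, data):
        for entry in self.pending:
            entry[3].append(data)


def describe_embedded(post_uri, html):
    """Relate each embedded resource to the post and expose its embedded parts."""
    scanner = _RdfaScanner()
    scanner.feed(html)
    scanner.close()
    triples, seen = [], set()
    for subject, prop, value in scanner.statements:
        if subject not in seen:
            seen.add(subject)
            triples.append((post_uri, "sioc:embeds", subject))
        triples.append((subject, prop, value))
    return triples
```

The output contains one sioc:embeds triple per embedded resource, followed by the statements about that resource that were actually present in the post.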

If you would like to test this version of the WP SIOC Exporter, download it here.

Any thoughts are of course welcome!

8 Comments

    Patrick Murray-John
    12:46 pm on February 19th, 2009

    (I followed Uldis’ repost to get here.) This looks great!

    I encountered a similar question about how to represent the embedded content. Eventually I just bailed out to dcterms:hasPart — I’m curious about your thoughts on that approach.

    Thanks!

    Claudia Wagner
    4:32 pm on February 19th, 2009

    Patrick,

    thanks for pointing me to dcterms:hasPart. It looks really good!
    Now I wonder which property would be the most appropriate to model the relation between a post and an embedded resource: sioc:embeds, dcterms:hasPart, sioc:topic or something else?
    I think sioc:embeds is more or less equivalent to dcterms:hasPart, and sioc:topic seems to be the same as dc:subject.
    As sioc:embeds is not part of the SIOC core, I guess I should go for dcterms:hasPart, or what do you think?

    From my point of view, the application areas of sioc:embeds/dcterms:hasPart, sioc:topic/dc:subject and sioc:links_to/dc:references are not clearly separated. So I guess everyone can feel free to choose :)

    Patrick Murray-John
    6:44 pm on February 19th, 2009

    The way I think of it, at least in terms of web pages, is that if a piece of content is stuffed into a page and loads up and displays when you view the page, that’s a dcterms:hasPart –> sioc:embed (BTW, I like the sioc:embed, it reflects well how people are using the term ‘embed’. For example, I’ve heard people talking about ‘embedding’ an image. Not HTML language, but reflects the concept people have!)

    Seems to me that any distinction between sioc:links_to and dcterms:references is contingent on the subject in the triple. A web page will sioc:links_to, while a print publication will need the more general dcterms:references? Bibliographic Ontology might help us here. But the incorporation of links into most any document makes that break down, especially while docs bound for print include footnotes. It’s a hard problem, and we’ll need to resurrect Bakhtin to solve it satisfactorily!

    sioc:topic vs. dcterms:subject, I agree, seem like a very fine line. I kinda think that we (“we” being the big semweb community) might need to encounter some more edge cases to help us tease it out. But maybe other folks out there have better insights?

    enore savoia
    6:53 am on February 21st, 2009

    I just updated the WordPress SIOC exporter plugin to the new version … tnx !

    ps: Why this CAPTCHA ? please think different … , is only a suggestion !

    Matteo Brunati
    2:40 am on February 22nd, 2009

    Hi, good idea for the plugin .)

    One thing: I also have the RDF Tools plugin by novack activated on my blog.
    There is a problem with ARC2: the RDF Tools admin panel shows this error:

    Cannot redeclare class arc2 in /www/dagoneye.it/blog/wp-content/plugins/rdf-tools/arc/ARC2.php on line 15

    It will probably work with the next version of the ARC framework, I think…
    Nice stuff :)

    Claudia Wagner
    9:46 am on February 26th, 2009

    Matteo,

    just change the require_once path in sioc_include.php so that it points to the arc folder included in the rdftools folder.
    Both plugins should of course share one ARC2 folder. Then require_once works and no redeclare errors appear.

    Tales from the SIOC-o-sphere part #9 « Cloudlands
    1:12 pm on March 19th, 2009

    [...] Wagner has published an extended version of the WordPress SIOC Exporter that also exports any semantic metadata embedded within the content of a blog [...]

    Ryan Jones
    12:10 pm on July 3rd, 2009

    Great idea for the SIOC plugin, thanks for the great info.
