Author: Claudia Wagner | Published: 12th January 2009

I will quickly try to summarize the state of my work for my supervision meeting today.


Done:

*) Identified and described Data Portability problems in the Social Web –> argued that Semantic Web Technologies provide the required mechanisms to overcome the identified limitations

*) Developed Use Cases and derived requirements

*) Checked whether existing Portability Tools can meet my requirements


To Do now:

Develop a demonstrator which shows how Semantic Web technologies can be used to overcome the identified limitations

Demonstrator consists of:

*) Browser Extension which allows users to extract content and resources from a source application site and reuse them on a specific target application site. The target application site must be a Wordpress blog, because the Browser Extension implements the WP API. The source application site must embed its semantic metadata via RDFa. External semantic metadata are not used by the Browser Extension, because it is difficult to find out to which resource in an external RDF file a selected piece of content belongs. The positional information provided by the RDFa serialization makes this easy. For Wordpress only an RDF/XML exporter exists, but no RDFa exporter or embedder.

*) Wordpress Plugin which embeds RDFa into the selected theme. If external semantic metadata also exist in an RDF file, the RDFa metadata can simply link to them.

*) Modify the Wordpress Press-This component so that selected content can be reused by value and resources can be reused by embedding references. If content is reused by value, the Press-This component should expose from which resource (not page!) the content originates. If the content is reused by reference, the Press-This component embeds the URI of the external resource into the post content and marks it via a sioc:embeds attribute.

*) Wordpress Plugin (simple Filter) which allows embedded resources to be reused by reference. Whenever a sioc:embeds RDFa attribute is found inside the content of a posting, the description of the embedded external posting is fetched and at least its content is displayed.
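
The detection step of such a filter could look roughly like the following sketch. It is written in Python only for illustration (the real plugin would be a Wordpress filter), and it assumes that the reference is marked up as rel="sioc:embeds" with the target URI in a resource or href attribute. That markup, the function names and the example.org URI are my assumptions, not a fixed design.

```python
from html.parser import HTMLParser


class EmbedsFinder(HTMLParser):
    """Collects the target URIs of elements marked with rel="sioc:embeds"."""

    def __init__(self):
        super().__init__()
        self.embedded_uris = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # RDFa allows several space-separated values in @rel.
        rel_values = (attributes.get("rel") or "").split()
        if "sioc:embeds" in rel_values:
            # Assumption: the object URI sits in @resource or @href.
            uri = attributes.get("resource") or attributes.get("href")
            if uri:
                self.embedded_uris.append(uri)


def find_embedded_resources(post_html):
    """Return the URIs of all resources embedded by reference in a post."""
    finder = EmbedsFinder()
    finder.feed(post_html)
    finder.close()
    return finder.embedded_uris


# Hypothetical post content with one embedded reference.
sample_post = (
    '<p>My comment on this post:</p>'
    '<div rel="sioc:embeds" resource="http://example.org/blog/post/42"></div>'
)
print(find_embedded_resources(sample_post))
```

The filter would then fetch the description behind each returned URI and render at least its content in place of the marked element.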

Author: Claudia Wagner | Published: 09th January 2009

I just read this interesting blog posting from Uldis, in which he talks about automatically creating connections between an HTML page and its associated external RDF file via RDFa. A lot of web sites expose their data via HTML pages and external RDF files, which are only connected via one link.

For example, the HTML pages of this blog are connected with their external RDF data (generated by the SIOC Wordpress Exporter) via this link:

The basic question is how can the DOM nodes of an HTML page be automatically connected (e.g. via RDFa attributes) with the external description of the resource to which the nodes belong?

I like this idea of creating additional connections between HTML sites and related RDF files, because

1) they enrich the external RDF data with positional information (layout information) –> an agent can then look up, for example, in which order some embedded resources appear in the content of a certain blog post

2) they make it easier to find the semantic metadata belonging to a certain piece of HTML content –> for example, think about a situation in which a user selects some content of an HTML page and a client-side application wants to get the semantic metadata of the resource to which the content belongs. The client-side application knows to which DOM node the selected content belongs and could exploit the additional information which relates certain DOM nodes to external RDF information.

3) they make it possible to have different amounts of machine-readable and human-readable data which are still connected. This can be desirable if, for example, some data are simply not displayed on the HTML pages in order not to break the layout and design of a certain page. In this case there is no reason to withhold these data from machines as well. If all semantic metadata are directly embedded in the HTML page via RDFa, then either the machine-readable and human-readable amounts of data must be the same or the HTML page must be cluttered with a lot of hidden HTML tags.

Author: Claudia Wagner | Published: 04th January 2009

In the current Social Web, tools such as Zemanta Reblog, Share This, Reblog, Move My Data or Press This allow users to reuse content across application boundaries, but they have several limitations:

1.) Losing semantic metadata and relations: The information about the relations which connect the resource to reuse with other resources stored on the source application (= the application from which the content to reuse originates) gets lost. Furthermore, the property values which belong to the resource but are not selected for reuse get lost. Only the selected property values (which in a reblogging scenario are part of the content property value of a resource of type post) are copied. For example, comments or tags which are related to the original resource get lost and are not associated with the newly created resource.

2.) Losing origin: The information about the origin of the copied content gets lost. That means the newly created resource, which holds all or part of the property values of a resource, does not contain any machine-understandable information about its origin. No machine-understandable connection is exposed between the source application, where the content originates, and the target application, where the content is republished.

3.) No “Reuse As” possible: The user has no way to influence the reuse process. That means users
cannot freely define that they want for example an external resource to be reused as comment of a certain
internal resource of the target application. They can also not define that the external resource
should only be visible for certain users of the target application. The user cannot define relations
between the resource to reuse and the internal resources of the target application, although this
would allow users to express how they want an application to reuse resources.

4.) No Reuse by Reference possible: Users cannot choose if they want to reuse a resource by value
or by reference, because resources can only be reused by value. If a resource is reused by value,
the whole resource with all data property values is copied. If a resource is reused by reference,
only the reference is copied and the data belonging to the reference are fetched on demand. The
resource to reuse is stored on one application, but is displayed on several applications. In the
current Social Web reusing data by reference is not possible, because no global identifiers are used
and no descriptions can be fetched via these identifiers. To get data from a Social Web application
a client must firstly know via which method it can get the description of the resource and must
secondly know the local, application specific identifiers of the resource which the method needs as
parameter, in order to return the description of the right resource.

5.) Limited set of supported target applications: Users can reuse the selected content only on the
target applications which are supported by the reuse tool they use. Each tool supports only
a defined set of target applications, because it must implement the API of each target application. No standardized data manipulation method exists in the current Social Web.

Semantic Web technologies provide the required mechanisms to overcome the identified Data Portability limitations. Showing this is the aim of my prototype.

Author: Claudia Wagner | Published: 11th December 2008

As described in a previous posting, I want to develop a demonstrator which shows more sophisticated reblogging (or, in general, resource reuse) than current non-semantic tools allow.

More sophisticated means:

1) The user should be able to choose whether the resources should be reblogged by reference or by value! If resources are reblogged by reference, only the references are copied and embedded. The reblogged resource is a mirror of the original resource. If resources are reblogged by value, the values are copied and the two instances of the same resource are independent and disconnected.

2) The information about the origin of a reblogged resource should be understandable for machines and humans.

3) Optionally (via an advanced tab) the user should be able to define how resources should be reused (via expressing the relations between resource to reuse and internal resources).
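
The difference between the two modes in point 1 can be shown with a tiny Python sketch. The dictionary below is only a hypothetical stand-in for the source application, and all names and the example.org URI are made up for the illustration.

```python
import copy

# Hypothetical in-memory stand-in for the source application.
source_store = {
    "http://example.org/blog/post/1": {"title": "Hello", "content": "First version"},
}


def reblog_by_value(uri):
    """Copy the values: the result is independent of the original."""
    return copy.deepcopy(source_store[uri])


def reblog_by_reference(uri):
    """Copy only the reference: the values are looked up on demand."""
    return {"embeds": uri}


def render(post):
    """Return the text a reader would see for a reblogged post."""
    if "embeds" in post:
        return source_store[post["embeds"]]["content"]
    return post["content"]


uri = "http://example.org/blog/post/1"
copied = reblog_by_value(uri)
mirrored = reblog_by_reference(uri)

# The original author edits the post after it has been reblogged.
source_store[uri]["content"] = "Edited version"

print(render(copied))    # the copy keeps the text from reblogging time
print(render(mirrored))  # the mirror follows the original
```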

I will need to develop a Firefox add-on which detects (as Semantic Radar or Operator already do) “reblogable” items (e.g. resources of type sioc:Post) and allows users to select which items they want to reblog, where they want to reblog them (this will be limited to Wordpress-based blogs at the beginning) and optionally how they should be reblogged.

The references of the selected items to reblog are embedded in a newly created skeleton post (or skeleton comment –> depending on the optionally defined advanced relations) at the target application.

The idea was to embed the references directly into the content of the blog posting and use RDFa to describe how each reference is related to the rest. This would mean that the sioc:content of a sioc:Post does not only contain text, but also sioc:embeds resources. (A related sioc-dev community discussion can be found here.)

Lastly I must also develop a Wordpress plug-in which enables Wordpress to fetch external resources by reference and display them.

Author: Claudia Wagner | Published: 08th December 2008

“Press This” is the reblog bookmarklet for Wordpress. It allows users to reblog content from any site to Wordpress. Watch the movie (min 0:44) to see it in action:

Zemanta Reblog allows users to reblog content from blogs which have the reblog feature installed to any supported blog software. At the moment Zemanta supports Blogspot, TypePad and Wordpress blogs as target applications.

Share This can be used, like the Zemanta Reblog feature, directly on several applications or as a browser plugin. The user can reuse the selected content and/or the focused page by pressing the Share This button on several Social Web applications (e.g. Facebook, Digg, Wordpress and so on).

Limitations:
All three tools copy the values of the data to share from one application to another. That means they create new resources using the APIs of the applications and copy the values into the newly created resource skeletons. The origin of the newly created resource is exposed in the form of a normal link. That means that only humans can understand what the original source of the content is.

All three tools only allow the user to reuse content in general, but without being able to influence how the selected content is reused, i.e. how it is related with the content of the target application.

Author: Claudia Wagner | Published: 04th December 2008

Over the last weeks I have been working on my use cases, which should provide me with requirements for my prototype. The use cases describe situations in which users want to reuse resources across application boundaries in a controlled and transparent way. In these situations applications must share resources transparently and users need ways to control the sharing process.

Sharing a resource between applications means that the applications must integrate external resources, or at least the values of the external resources, into their internal structure. Basically three possibilities exist for this:
1) Applications can copy external resources: they create a new internal resource skeleton and paste the content values of the external resources into the skeleton. The internal resource skeleton is, as usual, interlinked with the rest of the internal resources.
2) Applications can embed external resources: they create a new internal resource skeleton and embed the content values of the external resources in the skeleton. The internal resource skeleton is, as usual, interlinked with the rest of the internal resources.
3) Applications can interlink external resources with the internal resources simply by references. For this, the relations between external resources and internal resources must be established and typed, and the applications must be able to handle external resources.

Controlling the sharing process means controlling which resources are shared between which applications and how they are shared, i.e. defining how the external resources to share, or the skeletons in which they have been embedded or copied, are related to the internal resources (e.g. user identities, content items) stored on the target application.
The sharing process must be transparent, because third-party applications should also be able to find out who shared what with whom. It is important that the information about the origin of a resource does not get lost.

First and simplest Use Case - Semantic Reblogging:

Imagine a user Tim is reading several weblogs. When he reads the posting A on the weblog A, he decides to reblog the posting A on his own weblog.
To reblog posting A Tim opens his favorite browser Mozilla Firefox and navigates to the weblog where the posting A is stored and published.
Tim selects some text on the weblog A, which, I assume for simplicity, belongs exactly to one resource of the type sioc:Item, namely the posting A.

Tim opens the Firefox Ubiquity command box and types in the command “reblog”, which already displays the URI identifying the selected resource as its first parameter.

The “reblog” command requires at least one more parameter, the URI of the target application (in this case the URI of Tim’s personal weblog) where the resource should be reblogged.
Furthermore, the “reblog” command has an optional third parameter, the copy-flag, which is zero by default. Zero means that the resources are embedded; if it is 1, the values of the resource are copied. The optional fourth parameter is a collection of context relation triples, which describe the relation between the external resource and the internal resources (for example, the triples can express that the external resource can only be read by some users of the target application, or that the external resource is a reply to an internal target application resource, and so on).

Again for simplicity I assume that the target application URI, which is passed as the second parameter, identifies a resource of type sioc:Container or sioc:Site.
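
To make the parameter list concrete, here is a small Python sketch of the four “reblog” parameters and their defaults. The class and field names are hypothetical and the URIs are placeholders. The actual command would be a Ubiquity script, so this only models the data it passes on.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# A triple is modelled as (subject, predicate, object), all given as strings.
Triple = Tuple[str, str, str]


@dataclass
class ReblogCommand:
    resource_uri: str          # 1st parameter: URI of the selected resource
    target_uri: str            # 2nd parameter: URI of the target application
    copy_flag: int = 0         # 3rd parameter (optional): 0 = embed, 1 = copy
    context_triples: List[Triple] = field(default_factory=list)  # 4th (optional)

    def __post_init__(self):
        if self.copy_flag not in (0, 1):
            raise ValueError("copy_flag must be 0 (embed) or 1 (copy)")

    @property
    def mode(self):
        """Human-readable name of the selected reuse mode."""
        return "copy" if self.copy_flag == 1 else "embed"


# Tim reblogs posting A on his own weblog as a reply to one of his posts.
command = ReblogCommand(
    resource_uri="http://weblog-a.example.org/posting-a",
    target_uri="http://tim.example.org/blog",
    context_triples=[
        ("http://weblog-a.example.org/posting-a",
         "sioc:reply_of",
         "http://tim.example.org/blog/post/7"),
    ],
)
print(command.mode)
```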

Use Case 1.1:

The third parameter is set to one. That means that the values of the external resource are copied.
When the command is executed, the script asks the user to authenticate on the target application and makes an authenticated remote call of the “reblog” method provided by the target application. It passes the URI of the resource to share as a parameter to the “reblog” method, which creates a new Blog Posting Skeleton and copies the values of the external resource to the associated values of the internal Resource Skeleton. The target application can handle the internally created Resource Skeleton as a normal internally created resource.

Use Case 1.2:

The third parameter is set to zero. That means that the values of the external resource are embedded.
When the command is executed, the script asks the user to authenticate on the target application and makes an authenticated remote call of the “reblog” method provided by the target application. It passes the URI of the resource to share as a parameter to the “reblog” method, which creates a new Blog Posting Skeleton and embeds the URIs of the external resources into the Resource Skeleton. I am not sure what would be the best way to do this. Normally a resource should be embedded in the content of another resource. But the sioc:content property only holds literals as values. Properties like sioc:embeds have been proposed in http://sioc-project.org/node/226 , but are not part of the SIOC Core Ontology.

In the case of embedding external resource references, the target application must be able to dereference URIs and display parts of the resource description as embedded content.

In all cases (i.e. no matter whether external resources are copied or embedded) it should be clear for humans and machines who has reused the resource, when it was reused and what the origin of the resource is. That means that the HTML page must mark the external resource as external and must show the origin of the resource and who reblogged it when. Also, the RDF graph of the site must expose that a newly created resource (Reblog Skeleton) embeds one or more external resources.

Author: Claudia Wagner | Published: 27th November 2008

A few days ago I tried out the SIOC Importer for Wordpress, which is really cool. The importer requires a URI as input, which must point to an RDF graph. It parses the graph for all instances of the type sioc:Post (this normally includes blog postings and comments) and creates new content on the target site using WordPress API calls.

The things I wondered about were:

  • the newly created resources don’t include any machine-understandable information about their origin (only the sioc:content text refers to the original source of the posting)
  • the resource should be a copy of the original posting, but e.g. the author of the posting is not copied. Only the content of the posting is copied.
  • why is the resource content copied and not embedded?

All in all I was thinking that for some use cases (e.g. reblogging) it would be better to embed external resources into the sioc:content of the automatically created sioc:Post Skeleton. This would have the advantage that the instances of the resource to share stay synchronized and that the origin of the displayed content would be machine-understandable.

The SIOC Core Ontology provides the property sioc:content for resources of the type sioc:Item. But this property can only hold literals as values. Maybe the sioc:attachment property can be used to expose that external resources should be embedded into a sioc:Post, or proposed properties (such as sioc:embeds) can be used.

Author: Claudia Wagner | Published: 12th November 2008

The “Same Origin Policy” or “Single Domain Restriction”, which is implemented by most web browser security models, prevents a document or script loaded from one web page domain from getting or setting properties of a document from another domain (that means that they must have the same protocol, domain and port in order to be allowed to access and modify each other).
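
As a rough illustration of the comparison described above, the following Python sketch treats two URLs as same-origin only if protocol, host and port all match. The function names are my own, and filling in the default ports for http and https is an assumption of this sketch. Real browsers implement more detailed rules.

```python
from urllib.parse import urlsplit

# Assumption of this sketch: a missing port means the scheme's default port.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (protocol, host, port) triple that defines an origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def same_origin(url_a, url_b):
    """True if both URLs share protocol, host and port."""
    return origin(url_a) == origin(url_b)


print(same_origin("http://example.org/a", "http://example.org:80/b"))
print(same_origin("http://example.org/", "https://example.org/"))
print(same_origin("http://example.org/", "http://blog.example.org/"))
```

Only the first pair counts as the same origin. The second differs in protocol and the third in host.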

Sharing data by reference (that means by copying references to data and not copying values of data) across application boundaries makes cross-site requests necessary. If, for example, a resource A is stored on an application A and should be shared with application B, application B only holds a reference to resource A and needs to make a cross-site request in order to fetch the description of the resource.

Solutions which allow Cross-Site Requests:

1) Cross Site Scripting (XSS)

With XSS, a web page from one origin can contain a script element from a different origin. The “foreign script”  runs with the same authority as scripts from the originating domain, allowing the script to steal cookies or directly access the originating server.

A possible solution which allows cross-site scripting without being so unsafe is to limit the JavaScript to a subset which is powerful enough to interchange data, but limits the security problems. JSONRequest is a proposal in this spirit. It is a proposed global JavaScript object which can only be used to send and receive JSON-encoded values and which does not send or receive cookies or passwords in HTTP headers. [1]

ADsafe and Caja also seem to define a “secure” subset of JavaScript to allow cross-site scripting.

2) Using fragment identifiers (the hash part of a URL, like http://some.domain.com/path/to/page.html#fragmentIdentifier) for cross-frame communication. Changing the fragment identifier does not cause the page to reload. Since the pages don’t reload, state can be maintained inside the page. This approach has several limitations. [2][3]

3) Using a proxy which forwards the requests to the right URL of the external domain instead of sending the request directly

4) Using Flash with cross-domain policy files which control the cross-domain calls. [4]

5) Waiting for the implementation of the W3C proposal about access control for cross-site requests. The proposed mechanism will probably be integrated in HTML 5. It lets a site control which external sites are allowed to access which data via cross-site requests, and also allows defining credentials which any requesting client must present in order to be allowed to perform the request. [5]

6) Waiting for the implementation of the XMLHttpRequest 2 proposal of the W3C,  which should extend the functionality of the existing XMLHttpRequest object in order to allow for example cross-site requests. [6]

I am not sure if I mentioned ALL possibilities. If someone knows others, please tell me per email (clauwa{at}sbox{dot}tugraz{dot}at)  or comment.

[1] http://www.json.org/JSONRequest.html

[2] http://tagneto.blogspot.com/2006/06/cross-domain-frame-communication-with.html

[3] http://dojotoolkit.org/node/87

[4] http://code.google.com/p/doctype/wiki/ArticleFlashSecurityCrossDomain

[5] http://www.w3.org/TR/access-control/

[6] http://www.w3.org/TR/XMLHttpRequest2/

Author: Claudia Wagner | Published: 12th November 2008

oEmbed is a format which allows the representation of a resource to be embedded on a third-party site without needing to parse the resource. For this, a Provider implements the oEmbed API and allows a Consumer to fetch the representation of a resource and embed it directly on the Consumer’s site.

The advantage of oEmbed is that the Consumer only needs to make one HTTP GET request with the URI of the resource to embed (and optionally the maxwidth, the maxheight and the desired format) as parameter, in order to get the representation of a resource which the Consumer can directly display.
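
A Consumer-side request of this kind can be sketched in a few lines of Python. The endpoint and photo URL below are placeholders, and the helper name is my own. Only the query parameters url, maxwidth, maxheight and format come from oEmbed.

```python
from urllib.parse import urlencode


def build_oembed_url(endpoint, resource_url, maxwidth=None, maxheight=None, fmt=None):
    """Build the single GET URL a Consumer sends to an oEmbed Provider."""
    params = {"url": resource_url}
    # The size limits and the response format are optional.
    if maxwidth is not None:
        params["maxwidth"] = maxwidth
    if maxheight is not None:
        params["maxheight"] = maxheight
    if fmt is not None:
        params["format"] = fmt
    return endpoint + "?" + urlencode(params)


# Placeholder Provider endpoint and resource URL.
request_url = build_oembed_url(
    "http://provider.example/oembed",
    "http://provider.example/photos/1",
    maxwidth=300,
    fmt="json",
)
print(request_url)
```

The Consumer would send one GET request to this URL and display the returned representation directly.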

But if we look, for example, at the flickcurl library created by Dave Beckett, we see that an RDF representation can easily be created for each photo stored on Flickr. That means that the Provider (Flickr) does not need to support the oEmbed API. It is enough (or even better) if the Provider exposes its resources in RDF and makes them accessible to Consumers.

Why is it better if a Provider allows third party sites to embed its resources via querying an RDF graph than via querying an oEmbed API?

In the case of an oEmbed API, a Consumer can only access one resource at a time and needs to know the identifier of the resource he wants to embed. If an application exposes its resources as an RDF graph, the Consumer can perform more complex queries, such as “give me all representations of resources which have certain properties” (e.g. are of type foaf:Image and depict a particular foaf:Person).
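
The kind of query meant here can be sketched without any RDF library by treating the graph as a plain list of triples. The data and function names below are invented for the illustration. A real Consumer would rather run a SPARQL query against the exposed graph.

```python
RDF_TYPE = "rdf:type"

# Hypothetical graph exposed by a Provider, as (subject, predicate, object).
triples = [
    ("ex:photo1", RDF_TYPE, "foaf:Image"),
    ("ex:photo1", "foaf:depicts", "ex:alice"),
    ("ex:photo2", RDF_TYPE, "foaf:Image"),
    ("ex:photo2", "foaf:depicts", "ex:bob"),
    ("ex:post1", RDF_TYPE, "sioc:Post"),
]


def subjects(graph, predicate, obj):
    """All subjects which have the given predicate/object pair."""
    return {s for (s, p, o) in graph if p == predicate and o == obj}


def images_depicting(graph, person):
    """All resources which are of type foaf:Image and depict the person."""
    images = subjects(graph, RDF_TYPE, "foaf:Image")
    depicting = subjects(graph, "foaf:depicts", person)
    return sorted(images & depicting)


print(images_depicting(triples, "ex:alice"))
```

The Consumer does not need to know the identifier ex:photo1 beforehand. It only states the properties it is interested in.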

So all in all oEmbed is nice for the Social Web, but becomes useless in the Social Semantic Web.

Author: Claudia Wagner | Published: 11th November 2008

I am currently looking at the OAuth protocol, asking myself what is possible with OAuth and what is not.

Basically, if you look at the OAuth flow diagram, you can see that the data sharing process is initiated by the Consumer (i.e. by a user stating on the Consumer application that he wants to allow the Consumer application to use resources stored on another application, called Provider). The Consumer sends a signed request to the Provider in order to get an unauthorized request token. This token must be authorized at the Provider by the user who owns the protected resources. For this, the user is redirected from the Consumer site to the Service Provider’s site (see step C in the flow diagram), where he must authenticate. OAuth does not specify how the Service Provider authenticates the user. It only defines that the Service Provider must verify the User’s identity in order to check whether the user is authorized to grant or deny access to Consumers.
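
The life cycle of the request token can be sketched as a small state machine. This Python sketch deliberately leaves out request signing, redirects and how the user is authenticated. All class and method names are hypothetical, and the final exchange for an access token is the step which follows authorization in OAuth 1.0.

```python
import secrets


class Provider:
    """Toy Service Provider which only tracks token states."""

    def __init__(self):
        self._request_tokens = {}  # token -> (consumer key, authorizing user or None)
        self._access_tokens = {}   # token -> user who granted access

    def issue_request_token(self, consumer_key):
        """The Consumer asks for a token; it starts out unauthorized."""
        token = secrets.token_hex(8)
        self._request_tokens[token] = (consumer_key, None)
        return token

    def authorize(self, token, authenticated_user):
        """Called after the user has logged in at the Provider and agreed."""
        consumer_key, _ = self._request_tokens[token]
        self._request_tokens[token] = (consumer_key, authenticated_user)

    def exchange(self, token):
        """Swap an authorized request token for an access token."""
        consumer_key, user = self._request_tokens.get(token, (None, None))
        if user is None:
            raise PermissionError("request token has not been authorized")
        del self._request_tokens[token]
        access_token = secrets.token_hex(8)
        self._access_tokens[access_token] = user
        return access_token

    def owner_of(self, access_token):
        """The user on whose behalf the access token may be used."""
        return self._access_tokens[access_token]


provider = Provider()
request_token = provider.issue_request_token("consumer-app")
provider.authorize(request_token, "user-U")
access_token = provider.exchange(request_token)
print(provider.owner_of(access_token))
```

In this sketch the user who authorizes the token is simply whoever is logged in at the Provider after the redirect.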

In normal Social Web applications, basically only the owner of a user account will be authorized to grant or deny access to resources stored on this account. Therefore the user can authenticate at the Service Provider simply by logging in to the web application.

All in all, that means the basic use case for OAuth is to control data sharing across applications acting on behalf of ONE user.

For example, a user U controls via OAuth which Consumer applications (acting on behalf of the user U) are allowed to share which parts of the protected resources, which are stored on a Provider application acting on behalf of user U and for which user U is authorized to control the access.

Would it also be possible to use OAuth to control data sharing across applications acting on behalf of different users?

For example, would it be possible for the user U of Flickr to state that he wants to share his picture with the user B of Facebook? In this case a Flickr client acting on behalf of user U would be the Provider and a Facebook client acting on behalf of user B would be the Consumer. In this case the redirection of the user B from the Consumer site (Facebook) to the Service Provider site (Flickr) makes no sense, because the user B only owns the Facebook account and the associated resources. Instead, a notification should be sent to the user U, because user U should go to the Provider site and grant or deny access to the request sent by the Consumer client belonging to user B.

In this cross-user, cross-application data sharing use case, it should be possible for either the Provider or the Consumer to initiate the data sharing process. If the Provider client starts the data sharing process itself, it can send the request token directly to the desired Consumer. The Consumer then needs to make sure that the user owning the protected resources goes to the Service Provider and authorizes the Consumer’s request token.