Edd Dumbill's Weblog
RDF - why we should care - and RSS
Posted by Edd Dumbill, 6/3/00 at 10:46:43 AM.
Introduction
This essay was originally part of an email in response to a message from Dale Dougherty asking whether RDF was important or just a dead end. The question was asked in the context of extending the RSS XML format, used to syndicate "site summary" headline information around the web. At WWW9 recently there was a spontaneous discussion about extending RSS involving Dale, Dave Winer, David Galbraith from Moreover and various others. Some of the thoughts here are pertinent to that discussion as well. I believe that any revision of RSS should use RDF: I didn't have time to expand on that at WWW9, so this essay takes things a little further.

On why RDF is important
First, one needs to put out of one's mind anything connected with the current XML syntactical representation of RDF. This has its flaws and isn't central to the importance of the technology (in fact a recent message from TimBL implies that the RDF guys would have used s-expressions a la PICS if XML hadn't been mandated).

Just as XML is the lingua franca for data, RDF is a lingua franca for logical statements, or facts. Its potential is realized if we all agree on a central vocabulary for expressing assertions about things. So, if I say "my car is red" and my wife says "all people with red cars drive fast", you can say "Edd drives fast". Well, that's nothing new, Prolog's been doing that for years. The real power of RDF is the same one we got with XML: agreement on the form in which these facts are expressed, and their distribution over the Internet.

These are the terms in which I see RDF. Some of the more advanced RDF concepts are confusing, but then so are some of those from XML (NOTATIONs, anyone?). In XML it seems the more advanced/confusing stuff just doesn't get used.

What would anyone do with RDF?
Dale writes:

This is a key point. The end game is that with a web full of metadata in RDF, we'll be able to do useful work by traversing assertions and finding information. This will parallel the massive increase in utility of the Web as a large number of sites emerged.

Bootstrapping
But there needs to be some utility at the lowest level, as well. RDF needs applications like RSS to bootstrap. (I've been thinking about the bootstrapping problem for a little while, and I'm of the opinion that Mozilla could have a great impact here. As you're probably aware, the "native" way to get data into Mozilla is via RDF. Add to that the ease with which little applications can be slotted into Mozilla, and we could see mini-apps augmenting your usual browsing/emailing experience with information presented by the web site in RDF. Directory services would be useful in RDF in a similar way.)

Another application, which I would write if I had the time (and I know some people have gone some of the way on this), is an RDF extractor for all my personal data. I want facts about my email, the web pages I visit, my documents, my address book, any annotations, etc. to be part of a homogeneous store, so I can quickly cross-reference my data (and have my operating system, or at least my browser, become aware of this capability).

There has been talk from various quarters of late that the hierarchical filesystem should be deprecated. I imagine a system where all your data (each item having a URI, naturally) is connected via multiple logical axes (one of which could be 'path', for the sake of interest, but the axes might more likely be 'author', 'topic', 'date'). My imagined system is in fact the Web, where today the axes are provided by <A HREF=""> tags and their semantic value is only interpretable by a human. A common expression of logical relationships, such as could be provided by RDF, drives this rich, fuzzy linking down into the fabric of the Web.
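To make the "multiple logical axes" idea a little more concrete, here is a minimal sketch in Python. It is not RDF and not any real library: the FactStore class, the urn:example: URIs and the sample facts are all hypothetical, chosen only to show items being reached along 'author', 'topic' or 'path' rather than through a single directory tree.

```python
from collections import defaultdict


class FactStore:
    """Tiny in-memory store of (subject, predicate, object) assertions.

    Hypothetical illustration only; real RDF stores are far richer.
    """

    def __init__(self):
        # One list of (subject, object) pairs per predicate, i.e. per axis.
        self._by_predicate = defaultdict(list)

    def add(self, subject, predicate, obj):
        self._by_predicate[predicate].append((subject, obj))

    def along(self, predicate, obj):
        """Subjects linked to `obj` along one axis, in insertion order."""
        return [s for s, o in self._by_predicate[predicate] if o == obj]

    def about(self, subject):
        """Every (predicate, object) pair asserted about one subject, sorted."""
        return sorted(
            (p, o)
            for p, pairs in self._by_predicate.items()
            for s, o in pairs
            if s == subject
        )


store = FactStore()
# Hypothetical URIs standing in for an email and a document.
store.add("urn:example:mail/1", "author", "Dale")
store.add("urn:example:mail/1", "topic", "RSS")
store.add("urn:example:doc/notes", "author", "Edd")
store.add("urn:example:doc/notes", "topic", "RSS")
store.add("urn:example:doc/notes", "path", "/home/edd/notes.txt")

# Cross-reference by topic, regardless of where each item "lives".
rss_items = store.along("topic", "RSS")
notes_facts = store.about("urn:example:doc/notes")
```

Here 'path' is just one more predicate alongside 'author' and 'topic', which is the sense in which the filesystem hierarchy stops being special.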
Rather than asking why anyone would do anything with RDF, it's really more "why would anyone create metadata?". We need the "feel-good" applications in place to encourage its creation -- and they need to go a lot further than even RSS in providing useful functionality.

On RSS and RDF
The only relationship RSS today has to RDF is that RSS represents metadata, and an application of RDF is to represent metadata. RSS just uses a straight XML schema (DTD), which is very restrictive. I think of an XML schema as a C struct, or a Java class -- there's centralized control and no extensibility mechanism. The plus side is that it's simple for applications.

My main reason for suggesting RSS is better reformulated in RDF is precisely that there are no limits on what can go in. In programming language terms, RDF is a Perl hash or a Java Hashtable. The processing application is responsible for figuring out which bits it wants and which it doesn't -- a little more complicated than the schema situation.

Extensibility
RSS came to a halt because nobody knew how to extend it after Netscape dropped the ball. I propose that RSS itself escapes the concept of being an atomic file and becomes instead a vocabulary. Because RDF is underpinned by the use of XML Namespaces to uniquely identify names, you can extend an RDF-based RSS for your application without polluting the globally accepted concept of what RSS is.

For instance, imagine that our RSS file has an <rss:Image> element which describes the 88x31 icon. What if we wanted to add another element describing the icon used by some other application of O'Reilly's invention that needs a 24x24 icon? At the most basic, we could just add an <ora:Icon> element into it (where we uniquely say what we mean by 'ora:' by including "xmlns:ora='http://www.oreilly.com/2000/ora-application/'" or somesuch on the root element) and, because processing applications know they're looking at RDF, this won't break existing implementations.
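As a rough illustration of the <ora:Icon> example, the Python sketch below reads a small namespaced document and picks out only the properties from the vocabulary a given application understands. It is plain namespaced XML parsed with the standard library, not real RDF/XML syntax; the RSS namespace URI, the Channel and Title element names, and the read_properties helper are assumptions made up for the example. Only the 'ora' namespace URI comes from the text above.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI for the RSS vocabulary (not a real one).
RSS_NS = "http://example.org/rss/"
# The illustrative 'ora' namespace URI suggested in the essay.
ORA_NS = "http://www.oreilly.com/2000/ora-application/"

# Both prefixes are declared on the root element, as the essay suggests.
DOC = f"""
<rss:Channel xmlns:rss="{RSS_NS}" xmlns:ora="{ORA_NS}">
  <rss:Title>Example site</rss:Title>
  <rss:Image>http://example.org/icon-88x31.gif</rss:Image>
  <ora:Icon>http://example.org/icon-24x24.gif</ora:Icon>
</rss:Channel>
"""


def read_properties(xml_text, wanted_namespace):
    """Collect child elements from one namespace; silently skip the rest."""
    root = ET.fromstring(xml_text)
    props = {}
    for child in root:
        # ElementTree spells qualified names as "{namespace-uri}localname".
        if child.tag.startswith("{" + wanted_namespace + "}"):
            local_name = child.tag.split("}", 1)[1]
            props[local_name] = (child.text or "").strip()
    return props


# An existing RSS-only reader never notices the extension element...
rss_only = read_properties(DOC, RSS_NS)
# ...while the hypothetical O'Reilly application reads just its own addition.
ora_only = read_properties(DOC, ORA_NS)
```

The reader behaves like the hash in the analogy above: it looks up the keys it knows about and ignores everything else, so adding <ora:Icon> costs existing consumers nothing.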
An RDF formulation would be both backwards and forwards compatible with itself. (Backwards compatibility with RSS is an interesting issue; I suggest that it can be solved by the simple application of an XSLT stylesheet.)

Escaping the filesystem, part II
I referred above to some ideas about leaving the hierarchical filesystem behind. The very concept of an RSS file can be seen as limiting and a source of the difficulty. Every person wants different things in that file, because it's the one everybody else reads. Detach the vocabulary from the file and we start to see interesting things happen: (1) the file could say more than just RSS, for those whose processors understood, and (2) RSS-type info could be placed in many places (in embedded metadata in the HTML, for instance), or in databases, or on the end of SOAP calls.

RDF is '404' tolerant
To revive my crude analogy: RDF is to XML as Perl is to C++. RDF is in fact a lot more "Webby" than XML is itself. Processing applications cope well with unexpected things being present, and with expected things being missing. The web needs a loosely typed, free-spirited language for expressing its metadata.

I would go so far as to say that it would be a short-sighted move to formulate RSS2 in anything other than a liberal, extensible framework -- otherwise, we'll just end up at this point again when the need becomes pressing to do more with RSS2. What is needed is something that will cater for all, while preserving an interoperable core.

One thing that seems inevitable to me is that all those on the RSS mailing list will not agree 100% on what the future should look like. We need to get agreement on a core 80% and provide easy ways for the remaining 20% to be achieved without consensus.

Why does RSS work?
A closing thought on why Dale says RSS "appears to be working". Is it that RSS files are simple to create? Well, yes, but only partly.
HTML is a lot more complex, and an enormous number of people create pages of reasonable complexity all the time. IMHO, RSS is successful because of the presence of useful applications (i.e. My Netscape) providing a positive feedback loop to developers and users from day 1. It's creating the killer apps that will give metadata, and perhaps RDF in particular, the boost to become widespread on the Web.