Taxonomies 1, Semantic Web (and Linked Data) 0

I’m not a big fan of Semantic Web{{1}}. For something that has been around for just over ten years, and which has been aggressively promoted by the likes of Tim Berners-Lee{{2}}, very little of real substance has come of it.

Taxonomies, on the other hand, are going gangbusters, with solutions like GovDirect{{3}} showing that there is a real need for this sort of data-relationship-driven approach{{4}}. Given this need, if the flexibility provided by Semantic Web (and, more recently, Linked Data{{5}}) were really required, we would expect someone to have invested in building significant solutions that use the technology.

While the technology behind Semantic Web and Linked Data is interesting, it seems that most people don’t think it’s worth the effort.

All this makes me think: the future of data management and standardisation is ad hoc, with communities or vendors scratching specific itches, rather than formal, top-down, theory-driven approaches such as Semantic Web and Linked Data, or even the formal standardisation efforts of old.

[[1]]SemanticWeb.org[[1]]
[[2]]Tim Berners-Lee on Twitter[[2]]
[[3]]GovDirect[[3]]
[[4]]Peter Williams on The Power of Taxonomies @ the Australian Government’s Standard Business Reporting Initiative[[4]]
[[5]]LinkedData.org[[5]]

The technologies behind the likes of Semantic Web and Linked Data have a long heritage. You can trace them back to at least the seventies, when ontology- and logic-driven approaches to data management faced off against relational methodologies. Relational methods won that round: just ask Oracle or the nearest DBA.

That said, a small number of interesting solutions have been built in the intervening years. I was involved in a few in one of my past lives{{6}}, and I’ve heard of more than a few built by colleagues and friends. The majority of these solutions used ontology management as a way to streamline service configuration, and therefore ease the pain of business change. Rather than being forced to rebuild a bunch of services, you could change some definitions, and off you go.

[[6]]AAII[[6]]

What we haven’t seen is a well-placed Semantic Web SPARQL{{7}} query that makes all the difference. I’m still waiting for the travel website where I can ask for a holiday somewhere warm, within my budget, and without too many tourists who use beach towels to reserve lounge chairs at six in the morning, and actually get a sensible result.

[[7]]SPARQL @ w3.org[[7]]

The flexibility that we could justify in the service delivery solutions just doesn’t appear to be justifiable in data-driven solutions. A colleague showed me a Semantic Web solution that consumed a million or so pounds’ worth of taxpayer money to build a semantic-driven database for a small art collection. All this sophisticated technology would allow the user to ask all sorts of sophisticated questions, if they could navigate the (necessarily) complicated user interface, or if they could construct an even more daunting SPARQL query. A more pragmatic approach would have built a conventional web application, one that would easily satisfy 95% of users, for a fraction of the cost.

When you come down to it, the sort of power and flexibility provided by Semantic Web and Linked Data can only be used by a tiny fraction of the user population. For most people, something that gets them most of the way (with a little bit of trial and error) is good enough. Fire and forget. While the snazzy solution with the sophisticated technology might demo well (making it good TED{{8}} fodder), it’s not going to ease the day-to-day travails of most of the population.

[[8]]TED[[8]]

Then we get solutions like GovDirect. As the website puts it:

GovDirect® facilitates reporting to government agencies such as the Australian Tax Office via a single, secure online channel enabling you to reduce the complexity and cost of meeting your reporting obligations to government.

which makes it, essentially, a Semantic Web solution. Except it’s not, as GovDirect is built on XBRL{{9}} with a cobbled-together taxonomy.

[[9]]eXtensible Business Reporting Language[[9]]

Taxonomy-driven solutions, such as GovDirect, might not offer the power and sophistication of a Semantic Web-driven solution, but they do get the job done. These taxonomies are also more likely to be ad hoc (codifying a vendor’s solution, or accreted on the job) than the result of some formal, top-down ontology{{10}} development methodology (such as those buried in Semantic Web and Linked Data).

[[10]]Ontology defined in Wikipedia[[10]]

Take Salesforce.com{{11}} as an example. If we were to develop a taxonomy to exchange CRM data, then the most likely source would be other vendors reverse engineering{{12}} whatever Salesforce.com is doing. The driver, after all, is to enable clients to get their data out of Salesforce.com. Alternatively, the source might be whatever a government working group publishes, given a government’s dominant role in its geography. By extension, we can also see the end of the formal standardisation efforts of old, as they devolve into the sort of information frameworks represented by XBRL, which accrete attributes as needed.

[[11]]Salesforce.com[[11]]
[[12]]Reverse engineering defined in Wikipedia[[12]]

The general trend we’re seeing is a move away from top-down, tightly defined and structured data interchange formats, as they are replaced by bottom-up, looser definitions.