Tag Archives: Data management

Taxonomies 1, Semantic Web (and Linked Data) 0

I’m not a big fan of Semantic Web{{1}}. For something that has been around for just over ten years, and which has been aggressively promoted by the likes of Tim Berners-Lee{{2}}, very little of real substance has come of it.

Taxonomies, on the other hand, are going gangbusters, with solutions like GovDirect{{3}} showing that there is a real need for this sort of data-relationship-driven approach{{4}}. Given this need, if the flexibility provided by Semantic Web (and, more recently, Linked Data{{5}}) were really needed, we would expect someone to have invested in building significant solutions using the technology.

While the technology behind Semantic Web and Linked Data is interesting, it seems that most people don’t think it’s worth the effort.

All this makes me think: the future of data management and standardisation is ad hoc, with communities or vendors scratching specific itches, rather than formal, top-down, theory driven approaches such as Semantic Web and Linked Data, or even other formal standardisation efforts of old.

[[1]]SemanticWeb.org[[1]]
[[2]]Tim Berners-Lee on Twitter[[2]]
[[3]]GovDirect[[3]]
[[4]]Peter Williams on The Power of Taxonomies @ the Australian Government’s Standard Business Reporting Initiative[[4]]
[[5]]LinkedData.org[[5]]

The technologies behind the likes of Semantic Web and Linked Data have a long heritage. You can trace them back to at least the seventies when ontology and logic driven approaches to data management faced off against relational methodologies. Relational methods won that round — just ask Oracle or the nearest DBA.

That said, a small number of interesting solutions have been built in the intervening years. I was involved in a few in one of my past lives{{6}}, and I’ve heard of more than a few built by colleagues and friends. The majority of these solutions used ontology management as a way to streamline service configuration, and therefore ease the pain of business change. Rather than being forced to rebuild a bunch of services, you could change some definitions, and off you go.

[[6]]AAII[[6]]
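To give a feel for what “change some definitions, and off you go” means in practice, here is a minimal Python sketch. It is not a reconstruction of any of the systems mentioned above: the `ProductDefinition` type, the `PRODUCT_DEFINITIONS` table and the step names are all hypothetical. The service code stays fixed, and a business change is made by editing the definitions it reads.

```python
"""Definition-driven service configuration: a minimal, hypothetical sketch.

The service logic never changes; business change is expressed by editing
the definitions it reads. All names here are illustrative assumptions.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductDefinition:
    name: str
    requires_credit_check: bool
    max_term_months: int  # 0 means "no term applies"


# Hypothetical definitions; a real system might keep these in an ontology store.
PRODUCT_DEFINITIONS = {
    "personal_loan": ProductDefinition("personal_loan", True, 60),
    "savings_account": ProductDefinition("savings_account", False, 0),
}


def onboarding_steps(
    product: str, definitions: dict = PRODUCT_DEFINITIONS
) -> list:
    """Derive the onboarding steps for a product from its definition."""
    d = definitions[product]
    steps = ["verify_identity"]
    if d.requires_credit_check:
        steps.append("credit_check")
    if d.max_term_months > 0:
        steps.append(f"set_term_limit:{d.max_term_months}")
    steps.append("open_account")
    return steps


# A business change: shorten the loan term by editing a definition, not the service.
updated = dict(PRODUCT_DEFINITIONS)
updated["personal_loan"] = ProductDefinition("personal_loan", True, 36)

print(onboarding_steps("personal_loan"))           # uses the 60-month term
print(onboarding_steps("personal_loan", updated))  # uses the 36-month term
```

Swapping in a new definition changes the service’s behaviour without touching `onboarding_steps` itself, which is the kind of pain relief the ontology-driven systems were after.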

What we haven’t seen is a well-placed Semantic Web SPARQL{{7}} query which makes all the difference. I’m still waiting for that travel website where I can ask for a holiday, somewhere warm, within my budget, and without too many tourists who use beach towels to reserve lounge chairs at six in the morning, and get a sensible result.

[[7]]SPARQL @ w3.org[[7]]
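For readers who haven’t met SPARQL, its core is matching triple patterns containing variables against a graph of subject–predicate–object triples, then filtering the results. The toy Python sketch below applies that idea to the travel wish above. The destinations, the `ex:` predicates and the figures are invented, and a real deployment would use an RDF store and a proper SPARQL engine rather than this naive nested-loop join.

```python
"""Toy triple-pattern matcher showing what a SPARQL basic graph pattern does.

Hypothetical data; this is an illustration, not a SPARQL engine.
Terms beginning with '?' are variables.
"""

TRIPLES = {
    ("ex:Crete", "ex:climate", "warm"),
    ("ex:Crete", "ex:pricePerWeek", 900),
    ("ex:Crete", "ex:crowding", "high"),
    ("ex:Madeira", "ex:climate", "warm"),
    ("ex:Madeira", "ex:pricePerWeek", 1100),
    ("ex:Madeira", "ex:crowding", "low"),
    ("ex:Oslo", "ex:climate", "cold"),
    ("ex:Oslo", "ex:pricePerWeek", 800),
    ("ex:Oslo", "ex:crowding", "low"),
}


def is_var(term) -> bool:
    return isinstance(term, str) and term.startswith("?")


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in extended and extended[p] != t:
                return None  # variable already bound to something else
            extended[p] = t
        elif p != t:
            return None  # constant term doesn't match
    return extended


def query(patterns, triples, filters=()):
    """Join patterns left to right (naive nested loops), then apply FILTERs."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return [b for b in bindings if all(f(b) for f in filters)]


# Roughly: SELECT ?d ?p WHERE { ?d ex:climate "warm" ; ex:pricePerWeek ?p ;
#                                ex:crowding "low" . FILTER (?p <= 1200) }
results = query(
    [
        ("?d", "ex:climate", "warm"),
        ("?d", "ex:pricePerWeek", "?p"),
        ("?d", "ex:crowding", "low"),
    ],
    TRIPLES,
    filters=[lambda b: b["?p"] <= 1200],
)
print(results)  # [{'?d': 'ex:Madeira', '?p': 1100}]
```

Even in this toy form the catch is visible: someone has to model “warm”, “within budget” and “not too many towel-wielding tourists” as data before any query can find them.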

The flexibility which we could justify in the service delivery solutions just doesn’t appear to be justifiable in data-driven solutions. A colleague showed me a Semantic Web solution that consumed a million or so pounds of taxpayers’ money to build a semantic-driven database for a small art collection. All this sophisticated technology would allow the user to ask all sorts of sophisticated questions, if they could navigate the (necessarily) complicated user interface, or if they could construct an even more daunting SPARQL query. A more pragmatic approach would have built a conventional web application, one which would easily satisfy 95% of users, for a fraction of the cost.

When you come down to it, the sort of power and flexibility provided by Semantic Web and Linked Data could only be used by a tiny fraction of the user population. For most people, something which gets them most of the way (with a little bit of trial and error) is good enough. Fire and forget. While the snazzy solution with the sophisticated technology might demo well (making it good TED{{8}} fodder), it’s not going to improve the day-to-day travail for most of the population.

[[8]]TED[[8]]

Then we get solutions like GovDirect. As the website puts it:

GovDirect® facilitates reporting to government agencies such as the Australian Tax Office via a single, secure online channel enabling you to reduce the complexity and cost of meeting your reporting obligations to government.

which makes it, essentially, a Semantic Web solution. Except it’s not, as GovDirect is built on XBRL{{9}} with a cobbled-together taxonomy.

[[9]]eXtensible Business Reporting Language[[9]]

Taxonomy driven solutions, such as GovDirect might not offer the power and sophistication of a Semantic Web driven solution, but they do get the job done. These taxonomies are also more likely to be ad hoc — codifying a vendor’s solution, or accreted whilst on the job — than the result of some formal, top down ontology{{10}} development methodology (such as those buried in the Semantic Web and Linked Data).

[[10]]Ontology defined in Wikipedia[[10]]

Take Salesforce.com{{11}} as an example. If we were to develop a taxonomy to exchange CRM data, then the most likely source would be other vendors reverse engineering{{12}} whatever Salesforce.com is doing. The driver, after all, is to enable clients to get their data out of Salesforce.com. Or the source might be whatever a government working group publishes, given a government’s dominant role in its geography. By extension, we can also see the end of the formal standardisation efforts of old, as they devolve into the sort of information frameworks represented by XBRL, which accrete attributes as needed.

[[11]]Salesforce.com[[11]]
[[12]]Reverse engineering defined in Wikipedia[[12]]

The general trend we’re seeing is a move away from top-down, tightly defined and structured definitions of data interchange formats, as they’re replaced by bottom-up, looser definitions.
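Here is a minimal Python sketch of that bottom-up style, built around a hypothetical `AccretingTaxonomy` class. Rather than fixing the schema up front, the taxonomy registers an element the first time someone reports it, and only enforces consistency for elements it has already seen. It illustrates the general idea only; it is not how XBRL taxonomies are actually defined or extended.

```python
"""Bottom-up taxonomy that accretes elements as reporters need them.

A hypothetical illustration of the idea, not how XBRL actually works.
"""


class AccretingTaxonomy:
    def __init__(self, name: str):
        self.name = name
        self.elements = {}  # element name -> expected type name

    def report(self, facts: dict) -> dict:
        """Accept a filing, registering any element the taxonomy hasn't seen yet."""
        # First pass: known elements must keep a consistent type (loose, not lawless).
        for element, value in facts.items():
            expected = self.elements.get(element)
            actual = type(value).__name__
            if expected is not None and expected != actual:
                raise TypeError(f"{element} expects {expected}, got {actual}")
        # Second pass: accrete any new elements, only once the filing is valid.
        for element, value in facts.items():
            self.elements.setdefault(element, type(value).__name__)
        return facts


crm = AccretingTaxonomy("crm-exchange")
crm.report({"AccountName": "Acme", "AnnualRevenue": 1_000_000})
crm.report({"AccountName": "Globex", "Industry": "Manufacturing"})
print(crm.elements)
# {'AccountName': 'str', 'AnnualRevenue': 'int', 'Industry': 'str'}
```

The two-pass check means a rejected filing leaves the taxonomy unchanged, so it only grows from filings that were actually accepted.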

Decisions are more important than data

Names and categories are important. Just look at the challenges faced by the archeology community as DNA evidence forces history to be rewritten when it breaks old understandings, changing how we think and feel in the process. Just who invaded who? Or was related to who?

We have the same problem with (enterprise) technology: how we think about the building blocks of the IT estate has a strong influence on how we approach the problems we need to solve. Unfortunately, our current taxonomy has a very functional basis, rooted as it is in the original challenge of creating the major IT assets we have today. This is a problem, as it’s preventing us from taking full advantage of the technologies available to us. If we want to move forward, creating solutions that will thrive in a post-GFC world, then we need to think about enterprise IT in a different way.

Enterprise applications – the applications we often know and love (or hate) – fall into a few distinct types. A taxonomy, if you will. This taxonomy has a very functional basis, founded as it is on the challenge of delivering high performance and stable solutions into difficult operational environments. Categories tend to be focused on the technical role a group of assets have in the overall IT estate. We might quibble over the precise number of categories and their makeup, but for the purposes of this argument I’m going to go with three distinct categories (plus another one).

SABER @ American Airlines

First, there are the applications responsible for data storage and coherence: the electronic filing cabinets that replaced rooms full of clerks and accountants back in the day. From the first computerised general ledger through to CRM, their business case is a simple one of automating paper shuffling: put the data in one place and make access quick and easy, as SABER did, which I’ve mentioned before.

Next are the data transformation tools: applications which take a bunch of inputs and generate an answer. This might be a plan (production plan, staffing roster, transport planning or supply chain movements …) or a figure (price, tax, overnight interest calculation). State might be stored somewhere else, but these solutions still need some serious computing power to cope with huge bursts in demand.

Third is data presentation: taking corporate information and presenting it in some form that humans can consume (though, looking at my latest phone bill, there’s no attempt to make the data easy to consume). This might be billing or invoicing engines, application-specific GUIs, or even portals.

We can also typically add one more category – data integration – though this is mainly the domain of data warehouses: solutions that pull together data from multiple sources to create a summary view. This category of solutions wouldn’t exist were it not for the fact that our operational data management solutions can’t cope with an additional reporting load. This is also the category for all those XLS spreadsheets that spread through the business like a virus, as high integration costs or more important projects prevent us from supporting user requests.
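At its heart, the data integration category is a join and an aggregate run away from the operational systems. Here is a minimal Python sketch, in which the two extracts (`crm_customers` and `billing_invoices`) and their fields are hypothetical stand-ins for exports from separate operational applications.

```python
"""Data-integration sketch: join two operational extracts into a summary view.

The extracts and field names are hypothetical.
"""
from collections import defaultdict

# Extract from a (hypothetical) CRM system.
crm_customers = [
    {"customer_id": 1, "name": "Acme", "region": "North"},
    {"customer_id": 2, "name": "Globex", "region": "South"},
    {"customer_id": 3, "name": "Initech", "region": "North"},
]

# Extract from a (hypothetical) billing system.
billing_invoices = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 1, "amount": 80.0},
    {"customer_id": 2, "amount": 50.0},
    {"customer_id": 3, "amount": 30.0},
]


def revenue_by_region(customers, invoices):
    """Look up each invoice's region in the CRM extract and total by region."""
    region_of = {c["customer_id"]: c["region"] for c in customers}
    totals = defaultdict(float)
    for invoice in invoices:
        totals[region_of[invoice["customer_id"]]] += invoice["amount"]
    return dict(totals)


print(revenue_by_region(crm_customers, billing_invoices))
# {'North': 230.0, 'South': 50.0}
```

The sketch runs against copies of the data rather than the live systems, which is what keeps the reporting load off the operational applications.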

A long time ago we’d bake all these layers into the one solution. SABER, I’m sure, did a bit of everything, though its main focus was data management. Client-server changed things a bit by separating the user interface from back-end data management, and then portals took this a step further. Planning tools (and other data transformation tools) started as modules in larger applications, eventually popping out as stand-alone solutions when they grew large enough (and complex enough) to justify their own delivery effort. Now we have separate solutions in each of these categories, and a major integration problem.

This categorisation creates a number of problems for me. First and foremost is the disconnection between what business has become, and what technology is trying to be. Back in the day when “computer” referred to someone sitting at a desk computing ballistics tables, we organised data processing in much the same way that Henry Ford organised his production line. Our current approach to technology is simply the latest step in the automation of this production line.

Computers in the past

Quite a bit has changed since then. We’ve reconfigured our businesses, we’re reconfiguring our IT departments, and we need to reconfigure our approach to IT. Business today is really a network of actors who collaborate to make decisions, with most (if not all) of the heavy data lifting done by technology. Retail chains are trying to reduce the transaction load on their teams working the tills so that they can focus on customer relationships. The focus in supply chains is on ensuring that your network of exception managers can work together to effectively manage disruptions. Even head office is focused on understanding and responding to market changes, rather than trying to optimise the business in an unchanging market.

The moving parts of business have changed. Henry Ford focused on mass: the challenge of scaling manufacturing processes to get cost down. We’ve moved well beyond mass, through velocity, to focus on agility. A modern business is a collection of actors collaborating and making decisions, not a set of statically defined processes backed by technology assets. Trying to force modern business practices into yesterday’s IT taxonomy is one source of the disconnect between business and IT that we complain so much about.

There’s no finer example of this than Sales and Operations Planning (S&OP). What should be a collaborative and fluid process – forward planning among a network of stakeholders – has been shoehorned into a traditional n-tier, database-driven enterprise solution. While an S&OP solution can provide significant cost savings, many companies find it too hard to fit themselves into the solution. It’s not surprising that S&OP has a reputation for being difficult to deploy and use, with many planners preferring to work around the system rather than with it.

I’ve been toying with a new taxonomy for a little while now, one that tries to reflect the decision, actor and collaboration centric nature of modern business. Rather than fit the people to the factory, which was the approach during the industrial revolution, the idea is to fit the factory to the people, which is the approach we use today post LEAN and flexible manufacturing. While it’s a work in progress, it still provides a good starting point for discussions on how we might use technology to support business in the new normal.

In no particular order…

Fusion solutions blend data and process to create a clear and coherent environment to support specific roles and decisions. The idea is to provide the right data and process, at the right time, in a format that is easy to consume and use, to drive the best possible decisions. This might involve blending internal data with externally sourced data (potentially scraped from a competitor’s web site); whatever data is required. Providing a clear and consistent knowledge work environment, rather than the siloed and portaled environment we have today, will improve productivity (more time on work that matters, and less time on busy work) and efficiency (fewer mistakes).

Next, decisioning solutions automate key decisions in the enterprise. These decisions might range from mortgage approvals, through office work such as logistics exception management, to supporting knowledge workers in the field. We also need to acknowledge that decisions are often decision-making processes, which require logic (rules) applied over a number of discrete steps (processes). This should not be seen as replacing knowledge workers, as a more productive approach is to view decision automation as a way of amplifying our users’ talents.
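Here is a minimal Python sketch of a decision-making process as rules applied over discrete steps, using the mortgage approval example. The `Application` fields and every threshold are hypothetical, chosen only for illustration. Note the final step: borderline cases are referred to a person rather than decided automatically, which is one way to amplify the knowledge worker rather than replace them.

```python
"""A decision-making process: rules applied over discrete steps (hypothetical)."""
from dataclasses import dataclass


@dataclass
class Application:
    income: float
    loan_amount: float
    credit_score: int


# Hypothetical thresholds, for illustration only.
MAX_LOAN_TO_INCOME = 4.5
MIN_CREDIT_SCORE = 600
REVIEW_BAND = 50  # scores just above the minimum are referred to a person


def decide(app: Application) -> str:
    # Step 1: affordability rule.
    if app.loan_amount > MAX_LOAN_TO_INCOME * app.income:
        return "decline: affordability"
    # Step 2: credit rule.
    if app.credit_score < MIN_CREDIT_SCORE:
        return "decline: credit"
    # Step 3: borderline cases go to an underwriter instead of being automated.
    if app.credit_score < MIN_CREDIT_SCORE + REVIEW_BAND:
        return "refer to underwriter"
    return "approve"


print(decide(Application(income=100_000, loan_amount=300_000, credit_score=720)))
print(decide(Application(income=100_000, loan_amount=300_000, credit_score=620)))
```

Keeping each rule as a separate, ordered step makes it easy to change one rule (or move a step) without rewriting the whole decision.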

While we have a lot of information, we will need to manufacture some information ourselves. This might range from simple charts generated from tabular data, through to logistics plans or maintenance scheduling, or even payroll.

Information and process access provide stakeholders (both people and organisations) with access to our corporate services. This is not your traditional portal or web-based GUI, as the focus will be on providing stakeholders with access wherever and whenever they need it, on whatever device they happen to be using. This could mean embedding your content into a Facebook app, rather than investing in a strategic portal infrastructure project. Or it might involve developing a payment gateway.

Finally, we have asset management, responsible for managing your data as a corporate asset. This looks beyond the traditional storage and consistency requirements of existing enterprise applications to include the political dimension, accessibility (I can get at my data whenever and wherever I want to) and stability (earthquakes, disaster recovery and the like).

It’s interesting to consider the sort of strategy a company might use around each of these categories. Manufacturing solutions – such as crew scheduling – are very transactional. Old data out, new data in. This makes them easily outsourced, or run as a bureau service. Asset management solutions map very well to SaaS: commoditized, simple and cost effective. Access solutions are similar to asset management.

Fusion and decisioning solutions are interesting. The complete solution is difficult to outsource. For many fusion solutions, the data and process set presented to knowledge workers will be unique and will change frequently, while decisioning solutions contain decisions which can represent our competitive advantage. On the other hand, it’s the intellectual content in these solutions, and not the platform, which makes them special. We could sell our platform to our competitors, or even use a commonly available SaaS platform, and still retain our competitive advantage, as the advantage is in the content, while our barrier to competition is the effort required to recreate the content.

This set of categories seems to map better to where we’re going with enterprise IT at the moment. Consider the S&OP solution I mentioned earlier. Rather than construct a large, traditional, data-centric enterprise application and change our work practices to suit, we break the problem into a number of mid-sized components and focus on driving the right decisions: fusion, decisioning, manufacturing, access, and asset management. Our solution strategy becomes more nuanced, as our goal is to blend components from each category to provide planners with the right information at the right time to enable them to make the best possible decision.

After all, when the focus is on business agility, and when we’re drowning in a sea of information, decisions are more important than data.

Inside vs. Outside

As Andy Mulholland pointed out in a recent post, all too often we manage our businesses by looking out the rear window to see where we’ve been, rather than looking forward to see where we’re going. How we use information to drive informed business decisions has a significant impact on our competitiveness.

I’ve made the point previously (which Andy built on) that not all information is of equal value. Success in today’s rapidly changing and uncertain business environment rests on our ability to take timely, appropriate and decisive action in response to new insights. Neither execution speed nor organizational intelligence is enough on its own: we need an intimate connection to the environment we operate in. Simply collecting more historical data will not solve the problem. If we want to look out the front window and see where we’re going, then we need to consider external market information, and not just internal historical information or predictions derived from it.

A little while ago I wrote about the value of information. My main point was that we tend to think of most information in one of two modes: either transactionally, with the information part of current business operations; or historically, when the information represents past business performance. It’s more productive to think of an information age continuum.

The value of information

Andy Mulholland posted an interesting build on this idea on the Capgemini CTO blog, adding the idea that information from our external environment provides mixed and weak signals, while internal, historical information provides focused and strong signals.

The value of information and internal vs. external drivers

Andy’s major point was that traditional approaches to Business Intelligence (BI) focus on these strong, historical signals, which is much like driving a car by looking out the back window. While this works in a (relatively) unchanging environment (if the road was curving right, then keep turning right), it’s less useful in a rapidly changing environment as we won’t see the unexpected speed bump until we hit it. As Andy commented:

Unfortunately stability and lack of change are two elements that are conspicuously lacking in the global markets of today. Added to which, social and technology changes are creating new ideas, waves, and markets – almost overnight in some cases. These are the ‘opportunities’ to achieve ‘stretch targets’, or even to adjust positioning and the current business plan and budget. But the information is difficult to understand and use, as it is comprised of ‘mixed and weak signals’. As an example, we can look to what signals did the rise of the iPod and iTunes send to the music industry. There were definite signals in the market that change was occurring, but the BI of the music industry was monitoring its sales of CDs and didn’t react until these were impacted, by which point it was probably too late. Too late meaning the market had chosen to change and the new arrival had the strength to fight off the late actions of the previous established players.

We’ve become quite sophisticated at looking out the back window to manage moving forward. A whole class of enterprise applications, Enterprise Performance Management (EPM), has been created to harvest and analyze this data, aligning it with enterprise strategies and targets. With our own quants, we can create sophisticated models of our business, market, competitors and clients to predict where they’ll go next.

Robert K. Merton: Father of Quants

Despite EPM’s impressive theories and product sheets, it cannot, on its own, help us leverage these new market opportunities. These tools simply cannot predict where the speed bumps in the market will be, no matter how sophisticated they are.

There’s a simple thought experiment economists use to show the inherent limitations in using mathematical models to simulate the market. (A topical subject given the recent global financial crisis.) Imagine, for a moment, that you have a perfect model of the market; you can predict when and where the market will move with startling accuracy. However, as Sun likes to point out, statistically, the smartest people in your field do not work for your company; the resources in the general market are too big when compared to your company. If you have a perfect model, then you must assume that your competitors also have a perfect model. Assuming you’ll both use these models as triggers for action, you’ll both act earlier, and possibly in the same way, changing the state of the market. The fact that you’ve invented a tool to predict the speed bumps causes the speed bumps to move. Scary!

Enterprise Performance Management is firmly in the grasp of the law of diminishing returns. Once you have the critical mass of data required to create a reasonable prediction, collecting additional data will have a negligible impact on the quality of this prediction. The harder your quants work, the more sophisticated your models, the larger the volume of data you collect and trawl, the lower the incremental impact will be on your business.
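One simple way to see the diminishing returns rests on a simplifying assumption: that the prediction is an average over n independent observations with standard deviation σ. The standard error of that average is σ/√n, so each doubling of the data buys a smaller absolute improvement than the one before. The Python sketch below just computes those improvements. Real forecasting models are far more complicated, but the shape of the curve is the point.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard error of a mean estimated from n independent observations."""
    return sigma / math.sqrt(n)


def marginal_gains(sigma: float, sizes: list) -> list:
    """Reduction in standard error from moving to each successive data volume."""
    errors = [standard_error(sigma, n) for n in sizes]
    return [prev - cur for prev, cur in zip(errors, errors[1:])]


sizes = [100, 200, 400, 800, 1600]  # each step doubles the data collected
for n, gain in zip(sizes[1:], marginal_gains(10.0, sizes)):
    print(f"up to n={n}: error falls by {gain:.3f}")
```

With σ = 10, the gains come out at roughly 0.29, 0.21, 0.15 and 0.10. Each one is about 71% (1/√2) of the previous one, even though each step collects twice as much data.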

Andy’s point is a big one. It’s not possible to accurately predict future market disruptions with historical data alone. Real insight is dependent on data sourced from outside the organization, not inside. This is not to diminish the important role BI and EPM play in modern business management, but to highlight that we need to look outside the organization if we are to deliver the next step change in performance.

Zara, a fashion retailer, is an interesting example of this. Rather than attempt to predict or create demand on a seasonal fashion cycle, and deliver product appropriately (an internally driven approach), Zara tracks customer preferences and trends as they happen in the stores and tries to deliver an appropriate design as rapidly as possible (an externally driven approach). This approach has made Zara the most profitable arm of Inditex, a holding company of eight retail brands, and one of the biggest success stories in Spanish business. You could say that Quants are out, and Blink is in.

At this point we can return to my original goal: creating a simple graphic that captures and communicates what drives the value of information. Building on both my own and Andy’s ideas, we can create a new chart. This chart needs to capture how the value of information is affected by its age, as well as by whether it is sourced externally or internally. Using these two factors as dimensions, we can create a heat map capturing information value, as shown below.

Time and distance drive the value of information

Vertically we have the divide between inside and outside: from information created internally by our processes, through information at the surface of our organization, sourced from current customers and partners, to information sourced from the general market and environment outside the organization. Horizontally we have information age, from information we obtain proactively (we think that customer might want a product), through reactively (the customer has indicated that they want a product) to historical (we sold a product to a customer). Highest value, in the top right corner, represents the external market disruption that we can tap into. Lowest value (though still important) represents internal transactional processes.
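The heat map can be approximated with a simple ordinal score. This Python sketch assumes an additive score, with one point for each step further outside the organization and one for each step further forward in time. That scoring is my own simplification to reproduce the ordering described above, not a formula from either post. Only the category names are taken from the text.

```python
"""Ordinal sketch of the time-and-distance heat map (hypothetical scoring)."""

# Each list runs from lowest to highest assumed value.
SOURCES = ["internal", "surface", "external"]   # inside -> outside
AGES = ["historical", "reactive", "proactive"]  # backward -> forward looking


def value_score(source: str, age: str) -> int:
    """Assumed additive score: further out and further forward both add value."""
    return SOURCES.index(source) + AGES.index(age)


def heat_map() -> list:
    """Rows run from external (top) to internal (bottom); columns follow AGES."""
    return [[value_score(s, a) for a in AGES] for s in reversed(SOURCES)]


for source, row in zip(reversed(SOURCES), heat_map()):
    print(f"{source:>9}: {row}")
```

Any monotonic scoring would give the same ranking of corners. The additive form is just the simplest one that keeps the two dimensions independent.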

As an acid test, I’ve plotted some of the case studies mentioned in the conversation so far on a copy of this diagram.

  • The maintenance story I used in my original post. Internal, historical data lets us do predictive maintenance on equipment, while external data enables us to maintain just before (detected) failure. Note: this also applies to tasks like vegetation management (trimming trees to avoid power lines), as real-time data can be used to determine where vegetation is a problem, rather than simply eyeballing the entire power network.
  • The Walkman and iPod examples from Andy’s follow-up post. Check out Snake Coffee for a discussion on how information drove the evolution of the Walkman.
  • The Walmart Telxon story, using floor staff to capture word of mouth sales.
  • The example from my follow-up (of Andy’s follow-up), of Albert Heijn (a Dutch supermarket group) lifting the pricing of ice cream and certain drinks when the temperature goes above 25 °C.
  • Netflix vs. (traditional) Blockbuster (via Nigel Walsh in the comments), where Netflix helps you maintain a list of films you would like to see, rather than a more traditional brick-and-mortar store which reacts to your desire to see a film.
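The Albert Heijn example is an external signal (the weather) driving a simple pricing decision. Here is a minimal Python sketch. The 25 °C threshold comes from the example, but the product set and the 10% uplift are hypothetical.

```python
"""Weather-triggered pricing rule, loosely modelled on the Albert Heijn example.

The 25 °C threshold is from the example; the products and uplift are hypothetical.
"""
from decimal import Decimal

TEMPERATURE_THRESHOLD_C = 25.0
WEATHER_SENSITIVE = {"ice cream", "soft drinks"}  # hypothetical product set
UPLIFT = Decimal("1.10")                          # hypothetical 10% increase


def price_for(product: str, base_price: Decimal, forecast_c: float) -> Decimal:
    """Apply the uplift to weather-sensitive products when it's forecast to be hot."""
    if product in WEATHER_SENSITIVE and forecast_c > TEMPERATURE_THRESHOLD_C:
        return (base_price * UPLIFT).quantize(Decimal("0.01"))
    return base_price


print(price_for("ice cream", Decimal("2.00"), forecast_c=27.0))  # 2.20
print(price_for("ice cream", Decimal("2.00"), forecast_c=22.0))  # 2.00
```

Feeding the rule a forecast rather than today’s reading is what moves it from reactive towards proactive on the chart.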

Send me any examples that you know of (or think of) and I’ll add them to the acid test chart.

An acid test for our chart

An interesting exercise left to the reader is to map Peter Drucker’s Seven Drivers for change onto the same figure.

Update: A discussion with a different take on the value of information is happening over at the Information Architects.

Update: The latest instalment in this thread is Working from the outside in.

Update: MIT Sloan Management Review weighs in with an interesting article on How to make sense of weak signals.