Tag Archives: Andrew McAfee

Why scanning more data will not (necessarily) help BI

I pointed out the other day that we seem to be at a tipping point for BI. The quest for more seems to be losing its head of steam, with most decision makers drowning in a sea of massaged and smoothed data. There are some good moves to look beyond our traditional stomping ground of transactional data, but the real challenge is not to consider more data, but to consider the right data.

Most interesting business decisions seem to be a synthesis process. We take a handful of data points and fuse them to create an insight. The invention of breath strips is a case in point. We can rarely break our problem down to a single (computed) metric; the world just doesn’t work that way.

Most business decisions rest on a small number of data points. It’s just one of our cognitive limits: our working memory is only large enough to hold (approximately) four things (concepts and/or data points) in our head at once. This is one reason that I think Andrew McAfee’s cut-down business case works so well; it works with our human limitations rather than against them.

I was watching an interesting talk the other day: Peter Norvig was providing some gentle suggestions on which features would be beneficial in a language to support scientific computing. Somewhere in the middle of the talk he mentioned the curse of dimensionality, which is something I hadn’t thought of for a while. This is the problem caused by the exponential increase in volume associated with each additional dimension of (mathematical) space.

In terms of the problem we’re considering, this means that if we are looking for n insights into a problem in a field of data (the n best data points to drive our decision), then finding them becomes exponentially harder for each data set (dimension) we add. More isn’t necessarily better. While the addition of new data sets (such as sourcing data from social networks) enables us to create new correlations, we’re also forced to search an exponentially larger area to find them. It’s the law of diminishing returns.
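To put rough numbers on that growth, here is a minimal sketch. It assumes, purely for illustration, that every dimension is split into the same number of bins and that we have a fixed budget of 1,000 observations; the function names and the figures are mine, not something from Norvig’s talk.

```python
def grid_cells(bins_per_dimension: int, dimensions: int) -> int:
    """Number of cells when every dimension is split into the same number of bins."""
    return bins_per_dimension ** dimensions


def max_coverage(observations: int, bins_per_dimension: int, dimensions: int) -> float:
    """Upper bound on the fraction of cells that can hold at least one observation."""
    return min(1.0, observations / grid_cells(bins_per_dimension, dimensions))


if __name__ == "__main__":
    # Illustrative assumption: 10 bins per dimension, 1,000 observations in total.
    for d in range(1, 7):
        cells = grid_cells(10, d)
        coverage = max_coverage(1_000, 10, d)
        print(f"{d} dimensions: {cells:>9,} cells, at most {coverage:.1%} covered")
```

The space to search grows tenfold with every dimension we add, while the amount of it that the same 1,000 observations can possibly touch shrinks just as quickly.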

Our inbuilt cognitive limit only complicates this. When we hit our cognitive limit, when n becomes as large as we can usefully handle, any additional correlations can become a burden rather than a benefit. In today’s rich and varied information environment, the problem isn’t to consider more data, or to find more correlations; it’s to find the best three or four features in the data which will drive our decision in the right direction.

How do we navigate from the outside in? From the decision we need, to the data that will drive it. This is the problem I hope the Value of Information discussion addresses.

Posted via web from PEG @ Posterous

Is BI really the next big thing?

I think we’re at a tipping point with BI. Yes, it makes sense that BI should be the next big thing in the new year, as many pundits are predicting, driven by the need to make sense of the massive volume of data we’ve accumulated. However, I doubt that BI in its current form is up to the task.

As one of the CEOs Andy Mulholland spoke to mentioned, “I want to know … when I need to focus in.” The CEO’s problem is not more data, but the right data. As Andy rightly points out in an earlier blog post, we’ve been focused on harvesting the value from our internal, manufactured data, ignoring the latent potential in our unstructured data (let alone the unstructured data we can find outside the enterprise). The challenge is not to find more data, but the right data to drive the CEO’s decision on where to focus.

It’s amazing how little data you need to make an effective decision, if you have the right data. Andrew McAfee wrote a nice blog post a few years ago (The case against the business case is the closest I can find to it), pointing out that the mass of data we pile into a conventional business case just clouds the issues, creating long cause-and-effect chains that make it hard to come to an effective decision. His solution was the one-page business case: capability delivered, (rough) business requirements, solution footprint, and (rough) costing. It might be one page, but there is enough information, the right information, to make an effective decision. I’ve used his approach ever since.

Current BI seems to be approaching the horse from the wrong direction, much like Andrew’s business case problem. We focus on sifting through all the information we have, trying to glean any trends and correlations which might be useful. This works at small to moderate scales, but once we reach the huge end of the scale it starts to groan under its own weight. It’s the law of diminishing returns: adding more information to the mix will only have a moderate benefit compared to the effort required to integrate and process it.

A more productive method might be to use a hypothesis-driven approach. Rather than look for anything that might be interesting, why not go spelunking for specific features which we know will be interesting? The features we’re looking for in the information are (almost always) to support a decision. Why not map out that decision, similar to how we map out the requirements for a feedback loop in a control system, and identify the types of features that we need to support the decision we want to make? We can segment our data sets based on the features’ gross characteristics (inside vs. outside, predictive vs. historical …) and then search in the appropriate segments for the features we need. We’ve broken one large problem (find correlations in one massive data set) into a series of much more manageable tasks.
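As a rough sketch of what that segmentation might look like, the snippet below tags each data set with two gross characteristics (inside vs. outside, predictive vs. historical) and only searches the segment that matches the feature a decision needs. Everything in it is hypothetical and there for illustration only: the `DataSet` and `FeatureNeed` types, the catalogue entries, and the example need.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DataSet:
    name: str
    origin: str   # "inside" or "outside" the enterprise
    horizon: str  # "historical" or "predictive"


@dataclass(frozen=True)
class FeatureNeed:
    description: str  # what the decision needs to know
    origin: str
    horizon: str


def candidate_sets(need: FeatureNeed, catalogue: list[DataSet]) -> list[DataSet]:
    """Return only the data sets in the segment that matches the need."""
    return [
        d for d in catalogue
        if d.origin == need.origin and d.horizon == need.horizon
    ]


# Hypothetical catalogue; the names are made up for this example.
CATALOGUE = [
    DataSet("sales_ledger", "inside", "historical"),
    DataSet("demand_forecast", "inside", "predictive"),
    DataSet("social_media_mentions", "outside", "historical"),
    DataSet("market_outlook", "outside", "predictive"),
]

if __name__ == "__main__":
    need = FeatureNeed("what customers are saying right now", "outside", "historical")
    for data_set in candidate_sets(need, CATALOGUE):
        print(f"{need.description}: search {data_set.name}")
```

The point is the direction of travel: we start from the decision, write down the features it needs, and let those needs pick the segment to search, rather than trawling the whole catalogue for anything that correlates.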

The information arms race, the race to search through more information for that golden ticket, is just a relic of the lack of information we’ve lived with in the past. In today’s land of plenty, more is not necessarily better. Finding the right features is our real challenge.

Posted via email from PEG @ Posterous