Why scanning more data will not (necessarily) help BI

I pointed out the other day that we seem to be at a tipping point for BI. The quest for more data seems to be losing its head of steam, with most decision makers drowning in a sea of massaged and smoothed data. There are some good moves to look beyond our traditional stomping ground of transactional data, but the real challenge is not in considering more data, but in considering the right data.

Most interesting business decisions seem to be the result of a synthesis process. We take a handful of data points and fuse them to create an insight. The invention of breath strips is a case in point. We can rarely break our problem down to a single (computed) metric; the world just doesn’t work that way.

Most business decisions rest on a small number of data points. It’s just one of our cognitive limits: our working memory is only large enough to hold (approximately) four things (concepts and/or data points) at once. This is one reason I think Andrew McAfee’s cut-down business case works so well; it works with our human limitations rather than against them.

I was watching an interesting talk the other day in which Peter Norvig offered some gentle suggestions on what features would be beneficial in a language for scientific computing. Somewhere in the middle of the talk he mentioned the curse of dimensionality, which is something I hadn’t thought about for a while. This is the problem caused by the exponential increase in volume that comes with each additional dimension of (mathematical) space.
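To make that “exponential increase in volume” concrete, here is a minimal sketch (my illustration, not something from Norvig’s talk): if we slice each dimension into ten buckets, the number of buckets needed to cover the space grows tenfold with every dimension we add.

```python
# Buckets needed to cover a space at a fixed resolution
# (ten buckets per dimension) as dimensions are added.
for d in range(1, 7):
    print(f"{d} dimension(s): {10 ** d:,} buckets")
# 1 dimension(s): 10 buckets
# ...
# 6 dimension(s): 1,000,000 buckets
```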

In terms of the problem we’re considering, this means that if we are looking for the n best insights in a field of data (the n data points that will drive our decision), then finding them becomes exponentially harder with each data set (dimension) we add. More isn’t necessarily better. While adding new data sets (such as data sourced from social networks) lets us create new correlations, it also forces us to search an exponentially larger space to find them. It’s the law of diminishing returns.
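Here is what that means for a fixed budget of data points, as a small simulation sketch (mine, not from the original post), assuming uniformly distributed data in a unit hypercube: the fraction of points that lands “near” any location of interest collapses as dimensions are added, so the same amount of data covers the space ever more thinly.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points = 100_000  # a fixed data budget

for d in (1, 2, 3, 5, 10, 20):
    # The same number of points, spread over a d-dimensional unit cube.
    points = rng.random((n_points, d))
    # Fraction of points within distance 0.5 of the cube's centre.
    near_centre = np.linalg.norm(points - 0.5, axis=1) < 0.5
    print(f"d={d:2d}: {near_centre.mean():.4%} of points are 'nearby'")
```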

Our inbuilt cognitive limit only complicates this. When we hit that limit (when n becomes as large as we can usefully use), any additional correlations become a burden rather than a benefit. In today’s rich and varied information environment, the problem isn’t to consider more data, or to find more correlations; it’s to find the three or four features in the data that will drive our decision in the right direction.

How do we navigate from the outside in? From the decision we need to make, to the data that will drive it. This is the problem I hope the Value of Information discussion addresses.
