<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>PEG &#187; Curse of dimensionality</title>
	<atom:link href="http://peter.evans-greenwood.com/tag/curse-of-dimensionality/feed/" rel="self" type="application/rss+xml" />
	<link>http://peter.evans-greenwood.com</link>
	<description>Trying to understand the intersection between business and technology</description>
	<lastBuildDate>Fri, 03 Sep 2010 00:25:29 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
<cloud domain='peter.evans-greenwood.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
		<item>
		<title>Why scanning more data will not (necessarily) help BI</title>
		<link>http://peter.evans-greenwood.com/2009/12/22/why-scanning-more-data-will-not-necessarily-help-bi/</link>
		<comments>http://peter.evans-greenwood.com/2009/12/22/why-scanning-more-data-will-not-necessarily-help-bi/#comments</comments>
		<pubDate>Tue, 22 Dec 2009 03:10:19 +0000</pubDate>
		<dc:creator>peg</dc:creator>
				<category><![CDATA[Business Intelligence]]></category>
		<category><![CDATA[Posterous]]></category>
		<category><![CDATA[The Value of Information]]></category>
		<category><![CDATA[Andrew McAfee]]></category>
		<category><![CDATA[Curse of dimensionality]]></category>
		<category><![CDATA[Peter Norvig]]></category>
		<category><![CDATA[Synthesis]]></category>
		<category><![CDATA[Value of Information]]></category>

		<guid isPermaLink="false">http://peter.evans-greenwood.com/2009/12/22/why-scanning-more-data-will-not-necessarily-help-bi/</guid>
		<description><![CDATA[I pointed out the other day that we seem to be at a tipping point for BI. The quest for more seems to be losing its head of steam, with most decision makers drowning in a sea of massaged and smoothed data. There are some good moves to look beyond our traditional stomping ground of [...]]]></description>
			<content:encoded><![CDATA[<p>I pointed out the other day that <a href="http://peter.evans-greenwood.com/2009/12/16/is-bi-really-the-next-big-thing/">we seem to be at a tipping point for BI</a>. The <em>quest for more</em> seems to be losing its head of steam, with most decision makers drowning in a sea of massaged and smoothed data. There are some good moves to <a href="http://www.capgemini.com/ctoblog/2009/12/unstructured_events_call_for_u.php">look beyond our traditional stomping ground of transactional data</a>, but the real challenge is not to consider more data, but to consider the right data.</p>
<p>Most interesting business decisions seem to be a <a href="http://peter.evans-greenwood.com/2009/10/26/the-role-of-snowmobiles-in-innovation/">synthesis process</a>. We take a handful of data points and fuse them to create an insight. The <a href="http://peter.evans-greenwood.com/2009/09/14/innovation-should-not-be-the-race-for-the-new-new-thing/">invention of breath strips</a> is a case in point. We can rarely break our problem down to a single (computed) metric; the world just doesn&#8217;t work that way.</p>
<p>Most business decisions rest on a small number of data points. It&#8217;s just one of our <a href="http://journals.cambridge.org/action/displayAbstract;jsessionid=3207F8D3D250591449FE181355A70FF6.tomcat1?fromPage=online&amp;aid=54187">cognitive limits</a>: our working memory is only large enough to hold (approximately) four things (concepts and/or data points) at once. This is one reason that I think <a href="http://andrewmcafee.org/2006/07/the_case_against_the_business_case/">Andrew McAfee&#8217;s cut-down business case</a> works so well; it works with our human limitations rather than against them.</p>
<p>I was watching an <a href="http://www.archive.org/details/scipy09_day1_03-Peter_Norvig">interesting talk</a> the other day, in which <a href="http://norvig.com/">Peter Norvig</a> offered some gentle suggestions on what features would be beneficial in a language to support scientific computing. Somewhere in the middle of the talk he mentioned the <a href="http://en.wikipedia.org/wiki/Curse_of_dimensionality">Curse of dimensionality</a>, which is something I hadn&#8217;t thought of for a while. This is the problem caused by the exponential increase in volume associated with each additional dimension of (mathematical) space.</p>
<p>In terms of the problem we&#8217;re considering, this means that if you are looking for <em>n</em> insights into a problem in a field of data (the <em>n</em> best data points to drive our decision), then finding them becomes exponentially harder with each data set (dimension) we add. More isn&#8217;t necessarily better. While adding new data sets (such as sourcing data from social networks) enables us to find new correlations, we&#8217;re also forced to search an exponentially larger space to find them. It&#8217;s the law of diminishing returns.</p>
<p>Our inbuilt cognitive limit only complicates this. When we hit our cognitive limit (when <em>n</em> becomes as large as we can usefully handle), any additional correlations can become a burden rather than a benefit. In today&#8217;s rich and varied information environment, the problem isn&#8217;t to consider more data, or to find more correlations; it&#8217;s to find the best three or four features in the data which will drive our decision in the right direction.</p>
<p>How do we navigate from the outside in, from the decision we need to the data that will drive it? This is the problem I hope the <a href="http://peter.evans-greenwood.com/2009/10/12/working-from-the-outside-in/">Value of Information</a> discussion addresses.</p>
<p style="font-size: 10px"><a href="http://posterous.com">Posted via web</a> from <a href="http://pevansgreenwood.posterous.com/why-scanning-more-data-will-not-necessarily-h">PEG @ Posterous</a></p>
]]></content:encoded>
			<wfw:commentRss>http://peter.evans-greenwood.com/2009/12/22/why-scanning-more-data-will-not-necessarily-help-bi/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
