Archive for April, 2007

Chance Discovery - is there a there there?

I suppose the “there there” phrase is well enough known, but to avoid being sloppy I tracked down the original in WikiQuote:

“What was the use of my having come from Oakland it was not natural to have come from there yes write about if I like or anything if I like but not there, there is no there there.”

I have been trying to come to grips with the “Chance Discovery Movement”, to form an opinion on it, to see if it has some substance.

It’s not so easy to nail it down. Its origins are in Japan, and the documentation and papers tend to be in “Japanese English”, rather confusing and somewhat contradictory. It might also be thought that “chance discovery” is at the heart of what we routinely do (after all, the examination of residuals from models can give us that aha moment, as can rearranging things into groups, as can visualization).

And yet, lots of good things come out of Japan. Remember Taguchi. Taguchi designs were widely criticized amongst Western statistical academics (as being less efficient than theoretically possible) .. but they were enthusiastically and widely adopted by industry.

Score 1 for the Japanese, 0 for the statisticians who worked on experimental design theory but were not able to convince Western industry to put them into practice in routine quality control and process improvement.

So, perhaps Prof. Yukio Ohsawa is also onto something here.

Let me clarify things as far as I am able, with some material loosely adapted from around the web. See ChanceDiscovery.com for more details and some concrete examples.

Chance Discovery is the discovery of chance, rather than discovery by chance.

A “chance” here means a new event/situation that can be conceived either as an opportunity or as a risk in the future.

The “discovery” of chances is of crucial importance since it may have a significant impact on human decision making. Desirable effects of opportunities should be actively promoted, whereas preventive measures should be taken in the case of discovered risks.

In other words, Chance Discovery aims to provide means for inventing or surviving the future, rather than simply predicting the future.

.. Chance Discovery is a research to study how to discover rare or novel events causing potentially significant situation. Although the event itself could not be significant.

Indeed, some data mining techniques can be applied to Chance Discovery. However, they are not sufficient. Since, usually, conventional data mining shows average events.

Our main target is to study how to discover rare or novel events. They are not average matters but exceptions.

So, is it just analysis of residuals?

Ok, the discovery of opportunities and possibilities, it would appear. Or the discovery of exceptions .. as in, cases that don’t fit the rules. This sounds a lot like residual analysis to me, but with the intention of “finding the unusual” and “generating insights” : good, just the sort of work I like to do.

In other words, it is not the average that is informative, but the deviations from the average.

Perhaps this is not normal practice for statisticians, and perhaps not for those dealing with large datasets (where, by virtue of the size, you are going to get many outliers and manually examining them would be tedious). Also, a lot of statistical analysis systems do not have good “drill down” features that let you examine individual cases.

However, it is a practice that I strongly advocate. I was recently doing some work with house price data .. examining those cases where the predicted price was wildly different from the actual price was very informative.
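As a rough illustration of that kind of drill down, here is a minimal sketch: fit a straight line of price against floor area, then list the cases the model gets most wrong. The data are made up for the example (they are not my house price data), and the names `houses`, `fit_line` and `largest_residuals` are hypothetical.

```python
# Toy data: (floor area in square metres, sale price in thousands).
# All values are invented for illustration.
houses = [
    (50, 152), (60, 178), (70, 214), (80, 239), (90, 268),
    (100, 301), (110, 329), (120, 520), (130, 391), (140, 418),
]

def fit_line(points):
    """Ordinary least squares fit of y = a + b * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def largest_residuals(points, top=3):
    """Return the `top` cases with the largest absolute residual."""
    a, b = fit_line(points)
    rows = [(x, y, y - (a + b * x)) for x, y in points]
    return sorted(rows, key=lambda r: abs(r[2]), reverse=True)[:top]

worst = largest_residuals(houses)
for area, price, resid in worst:
    print(f"area={area} price={price} residual={resid:+.1f}")
```

The house at 120 square metres comes out on top, well above what the line predicts. That is the case to go and look at by hand: a data entry error, a harbour view, or something the model knows nothing about.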

The “KeyGraph”

The technique most commonly employed in “Chance Discovery” is called KeyGraph. It appears to be a form of data visualization related to multidimensional scaling and perhaps self-organizing maps, where the input data is a matrix of frequencies of co-occurrence. There is an additional step of separating the items (nodes) into “islands” and “bridges” .. David Goldberg (he of Genetic Algorithms fame) has done some work on simplifying these maps.
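To make the “islands” and “bridges” idea concrete, here is a much simplified sketch. It is not Ohsawa's actual KeyGraph algorithm, just my reading of the general idea: count co-occurrences, link frequent items that often appear together into islands, then flag rarer items that co-occur with more than one island as bridges. The baskets, the thresholds and the function name `islands_and_bridges` are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical "baskets" (e.g. words per sentence, items per purchase).
baskets = [
    {"red", "merlot", "oak"},
    {"red", "merlot", "oak"},
    {"red", "merlot"},
    {"white", "riesling", "crisp"},
    {"white", "riesling", "crisp"},
    {"white", "riesling"},
    {"red", "oak", "screwcap"},
    {"white", "crisp", "screwcap"},
]

def islands_and_bridges(baskets, min_freq=3, min_link=2):
    # Item frequencies and pairwise co-occurrence counts.
    freq = Counter(item for b in baskets for item in b)
    cooc = Counter()
    for b in baskets:
        for pair in combinations(sorted(b), 2):
            cooc[pair] += 1

    # Link frequent items that co-occur at least `min_link` times.
    frequent = {i for i, n in freq.items() if n >= min_freq}
    neighbours = {i: set() for i in frequent}
    for (a, b), n in cooc.items():
        if a in frequent and b in frequent and n >= min_link:
            neighbours[a].add(b)
            neighbours[b].add(a)

    # Islands: connected components of the frequent-item graph.
    islands, seen = [], set()
    for start in sorted(frequent):
        if start in seen:
            continue
        stack, island = [start], set()
        while stack:
            node = stack.pop()
            if node in island:
                continue
            island.add(node)
            stack.extend(neighbours[node] - island)
        seen |= island
        islands.append(island)

    # Bridges: infrequent items that co-occur with two or more islands.
    bridges = set()
    for item in set(freq) - frequent:
        touched = {
            idx for idx, island in enumerate(islands)
            if any(cooc[tuple(sorted((item, m)))] > 0 for m in island)
        }
        if len(touched) >= 2:
            bridges.add(item)
    return islands, bridges

islands, bridges = islands_and_bridges(baskets)
print(islands, bridges)
```

In this toy case the red and white wine vocabularies form two islands, and the rare item “screwcap” is the bridge between them. In chance discovery terms, that rare connecting item is the candidate “chance”.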

See here for examples of KeyGraph applied to textual content analysis (about wines, politics), event analysis (earthquakes), customer fashion and textiles.

For more details on the earthquake app, see

Ohsawa, Yukio, “KeyGraph as Risk Explorer in Earthquake-Sequence” . Journal of Contingencies and Crisis Management, Vol. 10, pp. 119-128, 2002

Abstract:
KeyGraph, a document-indexing (keyword-extraction) algorithm, is applied for a new purpose: Extracting active faults with risks of near-future large earthquakes from earthquake-sequences. This paper presents KeyGraph as an extractor of causalities from an event-sequence. This validates KeyGraph as a tool for showing why and which active faults are risky, as well as for showing why and which words abstract a document. The risky faults that are empirically obtained by KeyGraph correspond closely to real earthquake occurrences and seismologists’ risk estimation.

Now, earthquake prediction is a really big deal and I am not overly inclined to believe they have got that problem solved.

BUT, note that they redefine the problem somewhat to “showing why and which active faults are risky”, which seems to me sensible and responsible. It does not sound like they are attempting to assign probabilities (which would all be small anyway, hence hard to discriminate between), just presenting the mechanisms in such a way that the end-user can form informed assessments.

Note that the “chance” in this context is not so much related to estimation of probability but to understanding of opportunity.

So, “chance” as used in the Chance Discovery Movement is quite an overloaded term. Peter McBurney’s book “Chance Discovery (Advanced Information Processing)”, which I have not yet read, does, to its credit, devote a chapter to the meaning and etymology of chance ..

* chance as Opportunity
* chance as Probability
* chance as Fortune
* chance as Degree of Risk

McBurney has also written on “chance discovery” more widely .. you might like to peruse his paper “Chance Discovery Using Dialectical Argumentation” which is fairly heavy going and theoretical but of some interest.

Of course, discovering chance is by no means certain and so there is some simplification work afoot.

A successful process of chance discovery using the visual maps proposed by KeyGraphs requires the usage of graphs with an appropriate degree of complexity.

Complex KeyGraphs often prevent users from discovering chances because of the difficulties of interpretation. On the other hand, overly simplistic KeyGraphs seldom includes a chance because of the sparseness of information.

In a useful KeyGraph the concept clusters should be easy to find, the clusters should be easy to understand, and the relations among them should be easy to comprehend and help in the process of chance identification.

This paper systematize the process of KeyGraph exploration by means of evolutionary computation, as well as structural graph properties such as small-world topologies. The proposed techniques are successfully applied to create useful KeyGraphs for chance discovery from several documents.

Hmm. Yes, visual representations should be accessible but non-trivial. I guess that is what we have been doing for the last few decades.

Genetic Algorithms and Creativity

The connection with genetic algorithms is interesting:

Since the appearances of Applied Imagination by Osborn in 1953 (Osborn, 1953), the need for methodological approaches to innovation and creativity has become a key element to success in field such as decision making, problem solving, or total quality management to mention a few. Usually, such methodologies require the usage of manual protocols, becoming tedious and counterproductive if not performed correctly. Moreover, techniques such as brainstorming face handicaps raised by the skills of the session leader and the users’ fatigue that bound the duration of the manual protocols of such creative sessions.

Yes, fair enough. There IS a need to be more systematic about creativity.

Genetic algorithms are a core technology for the innovation technology endeavor. Starting in 1983 (Goldberg, 1983), Goldberg (Goldberg, 2002b) developed the so called fundamental intuition of genetic algorithms, or the innovation intuition. Specifically, the innovation intuition of GAs is about the work together of: (1) selection and mutation, and (2) selection and recombination.

Moreover, the innovation intuition of GAs provide a facet-wise modeling of human innovation.

This approach models two orthogonal facets of human innovation.

Selection + mutation = Continual improvement.

Selection and mutation working together are a form of hill-climbing mechanism. Mutation suggests variants in the neighborhood of the current solutions; selection acts as the decision process which accepts improving changes with a high probability.

This simple model describes one of the facets of human innovation, the so called continual improvement in total quality management literature, or as Japanese call it, kaizen.
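A minimal sketch of the selection + mutation loop, as I read it: mutation proposes a neighbouring variant and selection keeps it only if it is better. The fitness function (count the 1s in a bit string), the parameters and the function names are my own stand-ins. For simplicity this version always accepts an improvement, rather than accepting it “with a high probability” as the text says.

```python
import random

def fitness(bits):
    """Count the 1s. A stand-in for any measurable 'quality'."""
    return sum(bits)

def mutate(bits, rng):
    """Flip one randomly chosen bit: a variant in the neighbourhood."""
    i = rng.randrange(len(bits))
    child = list(bits)
    child[i] = 1 - child[i]
    return child

def continual_improvement(n_bits=20, steps=500, seed=0):
    rng = random.Random(seed)
    current = [0] * n_bits
    history = [fitness(current)]
    for _ in range(steps):
        candidate = mutate(current, rng)
        # Selection: keep the variant only if it is an improvement.
        if fitness(candidate) > fitness(current):
            current = candidate
        history.append(fitness(current))
    return current, history

best, history = continual_improvement()
print(fitness(best))
```

The quality never goes down, and it creeps up one small step at a time, which is the kaizen picture in a nutshell.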

Selection + crossover = innovation.

Another facet of human innovation is the so called cross-fertilizing innovation. People usually grasp a set of good solution features in one context, and a notion in another context and juxtaposing them, thereby speculating that the combination might be better than either notion taken individually. Taking together selection and crossover, GAs are a computation model of cross-fertilizing innovation.
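And a matching sketch for selection + crossover, again with my own toy fitness function (count the 1s) and hypothetical names. There is deliberately no mutation here, so any improvement has to come from recombining what is already in the population.

```python
import random

def fitness(bits):
    """Count the 1s. A stand-in for any measurable 'quality'."""
    return sum(bits)

def crossover(a, b, point):
    """One-point crossover: head of one parent, tail of the other."""
    return a[:point] + b[point:]

def tournament(population, rng, size=2):
    """Selection: the fitter of a few randomly drawn individuals."""
    return max(rng.sample(population, size), key=fitness)

def evolve(n_bits=20, pop_size=40, generations=30, seed=1):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    means = [sum(map(fitness, population)) / pop_size]
    for _generation in range(generations):
        next_gen = []
        for _slot in range(pop_size):
            mum = tournament(population, rng)
            dad = tournament(population, rng)
            point = rng.randrange(1, n_bits)
            next_gen.append(crossover(mum, dad, point))
        population = next_gen
        means.append(sum(map(fitness, population)) / pop_size)
    return population, means

# Two partial solutions, each good in a different "context".
left = [1, 1, 1, 1, 0, 0, 0, 0]
right = [0, 0, 0, 0, 1, 1, 1, 1]
combined = crossover(left, right, 4)
print(combined)

population, means = evolve()
print(means[0], means[-1])
```

The `combined` case is the juxtaposition idea in miniature: neither parent is much good alone, but the good half of one joined to the good half of the other is better than either.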

GAs also are main role players for the innovation technology revolution. As early mentioned, humans are to become the main measure of such a technology. Pervasive GA-guided interaction between human and computers opens a new research path to creativity- and innovation-support.

Two well-known models of such support are interactive GAs, and human-based GAs. Interactive GAs (iGAs) replace the computer computation of the relative fitness of solutions and the selection process by the judgment of a human evaluation. More detailed information about the progress of interactive GAs and interactive evolutionary computation (iEC) are presented in a review by Takagi (Takagi, 2001). Whereas iGAs replace the evaluation and selection by the human judgment, human based GAs (HBGAs) (Kosorukoff & Goldberg, 2002) move one step further and permit evaluation, selection, and variation to be performed by a human. For such reasons, the previous facets of GAs may be regarded as a first order model of human innovation.
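The interactive idea can be sketched in a few lines: the machine proposes a variant, and the decision to keep it is handed to a judge. In a real iGA the judge would be a person looking at the two options. Here a simulated preference stands in so the sketch can run unattended. The colour-picking task and all names (`interactive_step`, `propose_colour`, `simulated_judge`) are hypothetical.

```python
import random

def interactive_step(current, propose, prefer, rng):
    """One iGA-style step: the machine proposes, a judge selects."""
    candidate = propose(current, rng)
    return candidate if prefer(candidate, current) else current

def propose_colour(rgb, rng):
    """Variation: nudge one colour channel by up to 40 units."""
    channel = rng.randrange(3)
    nudged = list(rgb)
    nudged[channel] = min(255, max(0, nudged[channel] + rng.randint(-40, 40)))
    return tuple(nudged)

def distance(colour, target=(200, 30, 30)):
    """Squared distance to the (assumed) colour the judge has in mind."""
    return sum((c - t) ** 2 for c, t in zip(colour, target))

def simulated_judge(a, b):
    """Stand-in for a person: prefers whichever colour is closer to red."""
    return distance(a) < distance(b)

rng = random.Random(0)
start = (128, 128, 128)
colour = start
for _ in range(200):
    colour = interactive_step(colour, propose_colour, simulated_judge, rng)
print(colour)
```

Swapping `simulated_judge` for a function that shows both colours and asks a person which they prefer gives the iGA setup described above. Letting the person also suggest the variants would move it towards the human-based GA.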

OK. I can see how a GA could lead to insights. I’d like to give it a good try in practice, though.

The paper “The Evolutionary Path to Innovation and Creativity” is worth a read.

Other references and resources of interest

* Chance Discovery (Advanced Information Processing)

Chance discovery means discovering chances - the breaking points in systems, the marketing windows in business, etc. It involves determining the significance of some piece of information about an event and then using this new knowledge in decision making. The techniques developed combine data mining methods for finding rare but important events with knowledge management, groupware, and social psychology. The reader will find many applications, such as finding information on the Internet, recognizing changes in customer behaviour, detecting the first signs of an imminent earthquake, etc. This first book dedicated to chance discovery covers the state of the art in the theory and methods and examines typical scenarios, and it thus appeals to researchers working on new techniques and algorithms and also to professionals dealing with real-world applications.

* Chance Discovery Consortium - Application Examples. http://www.chancediscovery.com/english/modules/tinycontent/index.php?id=7

* Discovery of Emerging Topics between Communities on WWW . http://www.springerlink.com/content/a42a9mdmwt9hy9ek/

In the real world, discovering new topics covering profitable items and ideas (e.g., mobile phone, global warming, human genome project, etc) is important and interesting. However, since we cannot completely encode the world surrounding us, it’s difficult to detect such topics and their mechanisms in advance. In order to support the detection, we show a method for revealing the structure of WWW by using the KeyGraph algorithm. Empirical results are reported

* DISCUS - Distributed Innovation and Scalable Collaboration in Uncertain Settings. http://www-discus.ge.uiuc.edu/blojsom/blog/default/Presentations/

* Chance Discovery: An Emerging Japanese Marketing Analysis http://www.linkshare.com/pdf/chance_discovery.pdf

So, is there a there there?

Quite possibly.

If I were running a leading agency with deep pockets, I’d start reading and thinking and designing some test projects to see how far I could go with this. But I would not expect it to be easy to get a good assessment of the potential, to separate out the hype and the re-invention of the wheel from the sensible foundation, to plough through the hyperbole.

* Computer assisted creativity? Why not?

* Opportunity discovery? Why not?

* Insight building? Why not?

Remember the Taguchi revolution. Is Ohsawa the new Taguchi?

GapMinder - Visualizing World Data

Apparently now acquired by Google, GapMinder does a 3D animation across time of some world “statistics”, probably using some client-side JavaScript (to drive the slider) and maybe some AJAX .. see for example the plot of “life expectancy” vs “income per capita”.

Very pretty. You can select countries and move the slider around to see the progress over the years, using trails (like mouse trails).. some are somewhat linear, others not so (for instance Papua New Guinea).

Very pretty.

But is the underlying data correct?

Life expectancy seems to be a bit of a curious beast, if one means life expectancy at birth, involving as it does some imputation from historical records. And Government statistics may not always record births and deaths scrupulously: what about the impact of wars? (For example, Japan has the highest “life expectancy” of 81 years .. hmm, how did Hiroshima and Nagasaki get factored into the calculations?) I wonder too if, 50 or 80 years ago, early childhood deaths were recorded accurately and if deaths from natural causes were also counted. This is a society that has seen a massive shift from rural to urban living. China? Same problems, magnified.
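To see how sensitive the headline number is to the recording of early deaths, here is a minimal sketch of a simplified life table: life expectancy at birth computed from one-year death probabilities. The function name, the toy mortality schedule and the assumption that deaths fall halfway through each year of age are mine. Official figures use more refined methods, and I am not claiming this is how Gapminder's sources do it.

```python
def life_expectancy_at_birth(death_probs):
    """Life expectancy at birth from one-year death probabilities.

    death_probs[x] is the probability that someone alive at exact age x
    dies before reaching age x + 1. The last entry should be 1.0 so that
    the table closes.
    """
    alive = 1.0          # share of the birth cohort still alive at age x
    person_years = 0.0   # running total of years lived per person born
    for q in death_probs:
        deaths = alive * q
        # Assumption: deaths fall, on average, halfway through the year.
        person_years += alive - 0.5 * deaths
        alive -= deaths
    return person_years

def toy_schedule(infant_q):
    """Invented mortality schedule: only the first-year risk varies."""
    return [infant_q] + [0.001] * 59 + [0.05] * 40 + [1.0]

recorded = life_expectancy_at_birth(toy_schedule(0.005))
actual = life_expectancy_at_birth(toy_schedule(0.05))
print(round(recorded, 1), round(actual, 1))
```

With everything else held equal, under-counting infant deaths (0.5% recorded against a true 5% in this made-up example) inflates the life expectancy figure, which is why the quality of the birth and death registration matters before one starts admiring the animation.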

So, it’s pretty .. but should one use such a tool for anything other than entertainment?

Gapminder says

Make sense of the world by having fun with statistics

Gapminder and Google share an enthusiasm for technology that makes data easily accessible and understandable to the world. Gapminder’s Trendalyzer software unveils the beauty of statistics by converting boring numbers into enjoyable interactive animations.

We believe that Google’s acquisition of Trendalyzer will speed up the achievement of this noble goal. Trendalyzer’s developers have left Gapminder to join Google in Mountain View, where Google intends to improve and scale up Trendalyzer, and make it freely available to those who seek access to statistics.

The Stockholm-based Gapminder Foundation will continue to spearhead the use of new technology for data animations. The goal is to promote a fact-based worldview by bringing statistical story-telling to new levels. In collaboration with producers of accurate statistics that are eager to give the public free access to databases, Gapminder hopes to recruit and inspire many users of public statistics.

Personally I would rather see the fostering of some healthy scepticism about the quality of trans-national “public statistics” than a simple-minded attempt to “make statistics fun”.

There are some pointers to visualization techniques here, and some plots and examination of life expectancy relationships here. For those interested in issues about official statistics (death count estimation, albeit survey based), have a look at Death Count Estimation in Iraq.
