There has been a lot of different news about data mining the last week or so. You've probably seen the coverage in the NY Times about project "Able Danger" in which "a small, highly classified military intelligence unit identified Mohammed Atta and three other future hijackers as likely members of a cell of Al Qaeda operating in the United States." The fallout is predictable: confusion, Monday-morning quarterbacking and finger pointing all around because it looks like that DOD intelligence unit was able to "connect the dots" but couldn't get the info the attention it deserved. Slate has a number of links to the stories, some background and some political analysis. Government Security News has many more details.
Meanwhile, the Federation of American Scientists has posted a June 2005 Congressional Research Service report on data mining. It is just an overview (as it purports to be), yet it is so superficial that it strikes me as written to identify the political potholes rather than to inform Congress about data mining. It discusses only two examples of data mining. The first is the now-disbanded Total Information Awareness program, killed by Congress in 2003 because of public outcry over its potential to violate privacy and civil liberties. TIA was an early victim of politics; the technology didn't get much of a chance to prove itself. The other example in the CRS report is CAPPS II, which was also scrapped (in 2004) because of privacy concerns. Neither program is a fair example of the utility of data mining.
Take a look at FCW's search results for "data mining" for just this year. There are about two stories per month, covering just a fraction of the data mining applications in government, let alone industry or intelligence. Wired's recent article, Analyze This: Combining Data, discusses unstructured-data analysis, mostly in commercial applications. If data mining is not everywhere now, it will be shortly.
Last week, IBM announced an open source framework for analyzing unstructured data or text within web pages, email, audio and video. IBM's Unstructured Information Management Architecture (UIMA) has been in development for four years and has received support from DARPA and several major universities and companies. According to Physorg.com, UIMA is already in use by the partners who worked with IBM on the project:
The contributors included several leading universities, along with industrial research and development organizations. Some of the universities that participated, such as Carnegie Mellon University, Columbia University, Stanford University and The University of Massachusetts Amherst, are already using UIMA in courses and research projects. The other organizations actively supporting and using UIMA include Science Applications International Corp., BBN Technologies, The Mayo Clinic and MITRE Corporation. In addition, widespread commercial adoption of UIMA was announced today among more than 15 software vendors.
Maybe with this much academic and commercial support, accompanied by some solid privacy policies, people will see data mining as a useful tool in an increasingly unstructured and complex world.