FCW.com BLOG

Culture and Context:

The privacy-discovery paradox

By Susan Miller
Published on January 31, 2006 - 03:51 AM

A recent story from Wired, "Science Puts Enron E-Mail to Use," covers the Enron e-mail data dump that the Federal Energy Regulatory Commission made public. While some have been browsing the e-mails for off-color jokes, recipes, directions and mentions of politicians, others are finding value in analyzing the data. According to Wired, scientists, students and at least two businesses are working with the unstructured messages, trying to find ways to mine this kind of dataset (called the Enron corpus) for usable information.

In 2004, professor Marti Hearst at the University of California at Berkeley School of Information Management & Systems tasked students in her natural-language-processing course with cleaning up the database to make it searchable.

"It is a way for students to see -- when they run text-classification algorithms on e-mail messages versus newsgroups -- how well those would do," Hearst said. "E-mail is one of the more difficult kinds of information to process."

While Hearst says the jury is still out on the usefulness of the Enron corpus for researchers, she argues that shared corpora of this kind are key to advancing computer science research rapidly, because they allow different algorithms to be compared on the same data.


What the article doesn't say is that as technology becomes more adept at teasing meaningful information out of unstructured data, the opportunity to create better privacy policies also increases.

Would you agree?
