Flu search trends for U.S. and Brazil, showing different seasonal patterns of outbreaks (Copyright 2011 Google)

FEATURE

Big Data: The New Frontier of Analytics

Steve Wildstrom
June 27, 2011

It used to be that if you wanted to run a big database, you got yourself a big computer. For a really big database, that meant a big mainframe or a cluster of hefty servers. But today, businesses and researchers alike are interested in vast collections of data that would swamp even a supercomputer and overwhelm any standard database management software.

Welcome to the world of big data. The exact definition of big data is a bit slippery, but Wikipedia does quite well: "Data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time. Big data sizes are a constantly moving target currently ranging from a few dozen terabytes to many petabytes of data in a single data set." Examples of such data sets range from billions of Google searches conducted by millions of users to the data collected by millions of weather sensors around the globe to all the purchases of British supermarket shoppers.

The amounts of data collected can be staggering. According to the report "Big Data: The Next Frontier for Innovation, Competition, and Productivity" by the McKinsey Global Institute, UK-based retailer Tesco collects 1.5 billion items of data on customer behavior every month, while users share 30 billion pieces of content on Facebook each month.

Big data enables analysis that is different in kind, not just in scale, from conventional database analysis, because it allows analysts to discover information they did not know was in the data. For example, the tracking of disease outbreaks has long depended on the slow filing and compilation of reports by doctors and hospitals. But for the past couple of years, Google has been ahead of public health authorities in monitoring flu outbreaks by aggregating flu-related searches by geography. (See the diagram of flu search activity above; the original is published by Google.)
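The basic idea of compiling searches by place and time can be sketched in a few lines. This is only an illustration: Google's actual flu-tracking method fit a statistical model to its query data and is not described here. The term list, the (region, week, query) log format, and the function name `flu_query_share` are all hypothetical.

```python
from collections import Counter

# Hypothetical term list; a real system would use a much larger, tuned vocabulary.
FLU_TERMS = {"flu", "influenza", "fever", "gripe"}  # "gripe" is Portuguese for flu

def flu_query_share(queries):
    """Return {(region, week): fraction of that cell's queries containing a flu term}.

    `queries` is an iterable of (region, week, query_text) tuples,
    a hypothetical log format chosen for illustration.
    """
    total = Counter()
    flu = Counter()
    for region, week, text in queries:
        cell = (region, week)
        total[cell] += 1
        # Whole-word match, so "fluent" does not count as "flu".
        if FLU_TERMS & set(text.lower().split()):
            flu[cell] += 1
    return {cell: flu[cell] / total[cell] for cell in total}

log = [
    ("US", "2011-W01", "flu symptoms"),
    ("US", "2011-W01", "weather today"),
    ("BR", "2011-W01", "sintomas da gripe"),
]
print(flu_query_share(log))  # {('US', '2011-W01'): 0.5, ('BR', '2011-W01'): 1.0}
```

Using a share of all queries rather than a raw count keeps regions with very different search volumes, such as the U.S. and Brazil, comparable over time.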

Another health application is mining millions of patient medical records, as these become electronic, for useful treatment information. Predictive Medical Technologies analyzes records of intensive care patients to detect patterns that may signal impending adverse events, such as cardiac arrest or arrhythmia. Once such patterns are identified, real-time monitoring of patients can spot similar ones and give doctors critical early warning.
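Real-time monitoring of this kind often comes down to evaluating a rule over a sliding window of recent readings. The sketch below is a generic illustration, not Predictive Medical Technologies' method, which the article does not describe. The window size, the threshold, and the function name `rolling_alerts` are hypothetical, and the rule is not a clinical algorithm.

```python
from collections import deque

def rolling_alerts(readings, window=5, threshold=120.0):
    """Yield the index of each reading at which the mean of the last
    `window` readings exceeds `threshold`.

    Hypothetical rule for illustration only; not a clinical algorithm.
    """
    recent = deque(maxlen=window)  # oldest value drops out automatically
    total = 0.0
    for i, value in enumerate(readings):
        if len(recent) == window:
            total -= recent[0]  # subtract the value about to be evicted
        recent.append(value)
        total += value
        if len(recent) == window and total / window > threshold:
            yield i

heart_rate = [80, 85, 90, 150, 160, 170, 180, 175]
print(list(rolling_alerts(heart_rate, window=3)))  # [4, 5, 6, 7]
```

Keeping a running total means each new reading costs constant time, which matters when many patients are monitored at once.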

Big data raises new technical and privacy issues that must be dealt with for the technology to reach its potential. Traditional databases typically run on a single computer or a tightly integrated cluster of servers. Queries are run against the entire database in a fairly straightforward fashion, though tremendous effort can go into tuning for maximum performance.

Big data, by contrast, often resides on distributed systems that involve hundreds, thousands, or, in extreme cases such as Google, millions of servers, often dispersed all over the globe and linked either by private networks or, more often, the public internet. Efficient processing requires high-bandwidth, low-latency network links, particularly if the data are being used in anything approaching real time.
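One common way to spread a single data set across many servers is hash partitioning: each record's key is hashed, and the hash decides which server stores it. The article does not say which scheme any particular system uses; the function names `assign_server` and `partition` below are hypothetical, and the sketch assumes records arrive as (key, value) pairs.

```python
import hashlib

def assign_server(record_key: str, num_servers: int) -> int:
    """Map a record key to a server index in [0, num_servers) by hashing it.

    SHA-256 is used instead of Python's built-in hash(), which is randomized
    per process and so would give different answers on different machines.
    """
    digest = hashlib.sha256(record_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers

def partition(records, num_servers):
    """Split (key, value) records into one list ("shard") per server."""
    shards = [[] for _ in range(num_servers)]
    for key, value in records:
        shards[assign_server(key, num_servers)].append((key, value))
    return shards

records = [(f"user{i}", i) for i in range(100)]
shards = partition(records, 4)
print([len(s) for s in shards])
```

Because the same key always lands on the same server, a lookup needs to contact only one machine. A drawback of the simple modulo rule is that changing the number of servers moves most keys; consistent hashing is a widely used alternative that limits that reshuffling.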

Analyzing the data also requires different software techniques. Probably the most important is MapReduce, a programming model developed (and patented) by Google for running computations across its vast network of servers. In rough terms, a "map" step runs in parallel on the servers that hold the data, turning each record into key-value pairs, and a "reduce" step gathers all the values that share a key and combines them into a manageable result for analysis. Apache Hadoop, whose development was driven largely by Yahoo!, is a widely used open-source implementation of MapReduce.
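The classic illustration of MapReduce is counting words. The sketch below simulates the three stages (map, shuffle, reduce) in a single Python process. In Google's MapReduce or Hadoop, the map tasks run on the machines holding the data and the framework moves the grouped pairs across the network; the function names here are illustrative and are not Hadoop's API.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document: str):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def map_reduce(documents):
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

print(map_reduce(["big data big", "data sets"]))  # {'big': 2, 'data': 2, 'sets': 1}
```

Because each map call looks at only one record and each reduce call at only one key, both stages can be split across as many servers as are available.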

Managing the privacy implications of big data may be more difficult. When enough data about an individual is collected, it may become possible to identify a person uniquely even though none of the information is classified as "personally identifiable," a process known as "de-anonymization." So far, this threat is largely theoretical, but fears about the privacy implications of big data will have to be addressed if the technology is to reach its full potential.
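A quick way to see how de-anonymization becomes possible is to count how many records are unique on a handful of fields that are not identifying on their own, such as ZIP code, birth date, and sex. A record that is unique on those fields can be linked to a named person by anyone holding another data set with the same fields. The records and the function name `unique_fraction` below are hypothetical.

```python
from collections import Counter

def unique_fraction(records, quasi_identifiers):
    """Fraction of records whose combination of quasi-identifier values is unique.

    `records` is a list of dicts; `quasi_identifiers` lists field names that
    are not "personally identifiable" on their own.
    """
    if not records:
        return 0.0
    combos = [tuple(r[f] for f in quasi_identifiers) for r in records]
    counts = Counter(combos)
    unique = sum(1 for c in combos if counts[c] == 1)
    return unique / len(records)

# Hypothetical records with no names or ID numbers.
records = [
    {"zip": "02139", "birth": "1970-01-01", "sex": "F", "diagnosis": "flu"},
    {"zip": "02139", "birth": "1970-01-01", "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth": "1982-05-17", "sex": "M", "diagnosis": "flu"},
    {"zip": "94105", "birth": "1970-01-01", "sex": "F", "diagnosis": "none"},
]
print(unique_fraction(records, ["zip"]))                  # 0.25
print(unique_fraction(records, ["zip", "birth", "sex"]))  # 0.5
```

Adding fields makes more records unique, which is why large, richly detailed data sets raise the de-anonymization risk.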

The contents or opinions in this feature are independent and do not necessarily represent the views of Cisco. They are offered in an effort to encourage continuing conversations on a broad range of innovative, technology subjects. We welcome your comments and engagement.
