How big data and IoT initiatives render the ‘garbage in, garbage out’ theory invalid

2019/10/09 Innoverview

Garbage in, garbage out: it’s one of the great truisms of technology. Sure, it’s been eclipsed by “software will eat the world,” as something to say in a meeting when you really don’t know what’s going on, but it’s still probably uttered a few thousand times a month to explain away the failure of a recent technology initiative.

But, like its sister expression “you can’t get fired for hiring IBM,” the garbage doctrine is turning out, in the big data era, to be, well, garbage.

The problem in most big data or IoT initiatives isn’t that the data is meaningless, inaccurate, vague, or worthless – data harvested from sensors is generally valid. The problem is usually sheer volume, because data doesn’t organize itself the way crystals do. The trucks and equipment at a mining site can generate petabytes a day.

Or think of smart meters. If you harvest data from the smart meters in the U.S. every 15 minutes, you get a rough picture of power consumption. If you harvest it every few minutes or seconds, however, you can begin to unobtrusively trim demand by delaying refrigerator defrost cycles and dimming lights. Unfortunately, that also means juggling exabytes of data.
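To see why finer sampling multiplies the volume so quickly, here is a minimal Python sketch of the per-meter arithmetic. It counts readings only; bytes per reading and the number of meters are left out because they vary by deployment, and the function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def readings_per_day(interval_seconds: int) -> int:
    """Number of readings one meter produces per day at a fixed sampling interval."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return SECONDS_PER_DAY // interval_seconds


def growth_factor(old_interval_s: int, new_interval_s: int) -> float:
    """How many times more readings the finer interval produces than the coarser one."""
    return readings_per_day(new_interval_s) / readings_per_day(old_interval_s)


if __name__ == "__main__":
    for label, interval in [("15 minutes", 15 * 60), ("1 minute", 60), ("1 second", 1)]:
        print(f"{label:>10}: {readings_per_day(interval):>6} readings per meter per day")
    print(f"15 min -> 1 s: {growth_factor(900, 1):.0f}x more readings")
```

Moving from 15-minute to one-second reads takes each meter from 96 to 86,400 readings a day, a 900-fold jump before a single extra meter is added.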

Health data and personalized medicine? The world’s total mass of genomic data is doubling every seven months. By 2025, it is projected to dwarf the size of YouTube.

A cacophony of sensors

Worse yet, the data often comes in incompatible formats and measures very different phenomena. Take a simple device, like a pump. To conduct predictive maintenance, you might want to track power consumption, water flow, equipment temperature, rotational speed and other variables. That means collecting data measured in kilowatt hours, liters per minute, degrees, RPM and other units, with some signals refreshing every 15 minutes and others, like vibration, emitting new readings hundreds of thousands of times a second.
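One common first step with streams like these is to bucket every signal into the same fixed time windows so they can be compared side by side. The Python sketch below assumes each reading arrives as a hypothetical `(timestamp_seconds, signal_name, value)` tuple, with the unit encoded in the signal name; the `align` function and the signal names are illustrative, not any particular product's API.

```python
from collections import defaultdict
from statistics import mean

# A reading is (timestamp in seconds, signal name, value). Units differ per signal
# (kWh, liters per minute, degrees, RPM, vibration velocity), so the name carries the unit.
Reading = tuple[float, str, float]


def align(readings: list[Reading], window_s: float) -> dict[int, dict[str, float]]:
    """Group readings into fixed time windows and average each signal per window.

    Returns {window_start_seconds: {signal_name: mean_value}}. A signal with no
    reading in a given window is simply absent from that window's dict.
    """
    buckets: dict[int, dict[str, list[float]]] = defaultdict(lambda: defaultdict(list))
    for ts, signal, value in readings:
        window_start = int(ts // window_s * window_s)
        buckets[window_start][signal].append(value)
    return {
        start: {sig: mean(vals) for sig, vals in signals.items()}
        for start, signals in sorted(buckets.items())
    }


if __name__ == "__main__":
    readings = [
        (0.0, "power_kwh", 1.2),
        (30.0, "vibration_mm_s", 2.0),
        (30.5, "vibration_mm_s", 4.0),
        (45.0, "rpm", 1450.0),
        (70.0, "vibration_mm_s", 3.0),
    ]
    for start, values in align(readings, window_s=60).items():
        print(start, values)
```

Averaging is a deliberate simplification here: slow signals like power survive it well, but high-frequency vibration is usually better summarized with features such as RMS or spectral peaks before it is aligned with everything else.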

McKinsey & Co., for instance, estimates that only 1% of the data from the roughly 30,000 sensors on an offshore oil rig gets used for decision making, because that data is so hard to access.

To get around the problem, some analysts suggest collecting less data. Unfortunately, the obscure bits often turn out to be the key to the puzzle. In 2015, researchers at Lawrence Livermore National Laboratory (LLNL) were seeing rapid, unexpected variations in the electrical load of Sequoia, one of the world’s most powerful supercomputers. The swings were large, with power dropping from 9 megawatts to a few hundred kilowatts, and they created substantial management problems for local utilities.

By cross-checking different data streams, the team found the source of the problem: the droop coincided with scheduled maintenance on the massive chiller plant. LLNL was able to smooth out its power ramp and help its local utility. But think about it for a moment: the answer emerged only after some of the leading computing minds in the nation checked what their co-workers in the facilities department were doing.
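The kind of cross-check described above can be sketched in a few lines: flag the moments when load falls below a threshold, then ask what fraction of them land inside another team's schedule. This Python sketch is a hypothetical illustration with made-up hourly data, the names `Window` and `coincidence_ratio` are invented for the example, and it is not a description of how LLNL actually ran its analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """A closed time interval, e.g. a scheduled maintenance slot (in hours)."""
    start: float
    end: float

    def contains(self, t: float) -> bool:
        return self.start <= t <= self.end


def low_load_times(load: list[tuple[float, float]], threshold_mw: float) -> list[float]:
    """Timestamps at which the facility's load (MW) is below threshold_mw."""
    return [t for t, mw in load if mw < threshold_mw]


def coincidence_ratio(
    load: list[tuple[float, float]], maintenance: list[Window], threshold_mw: float
) -> float:
    """Fraction of low-load samples that fall inside any maintenance window.

    A ratio near 1.0 suggests (but does not prove) that the drops and the
    maintenance schedule are related; a ratio near 0.0 suggests they are not.
    """
    drops = low_load_times(load, threshold_mw)
    if not drops:
        return 0.0
    hits = sum(1 for t in drops if any(w.contains(t) for w in maintenance))
    return hits / len(drops)


if __name__ == "__main__":
    # Hypothetical hourly load: about 9 MW, dipping to 0.3 MW during hours 10-12.
    load = [(float(h), 0.3 if 10 <= h <= 12 else 9.0) for h in range(24)]
    chiller_maintenance = [Window(10.0, 12.0)]
    print(coincidence_ratio(load, chiller_maintenance, threshold_mw=1.0))
```

The point of the sketch is the join, not the statistics: the power log and the facilities calendar only become informative once they are lined up on the same clock.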

The hoarder’s dilemma

Let’s say you save all of your data. Now your highly paid data scientists are bogged down serving as data janitors, a chore that 76% of them say is the least attractive part of their day.

Luckily, automation in software development and IT management is coming to the fore. A growing number of startups are focused on automatically generating digital twins and presenting sensor data streams on screens and consoles in ways that make sense to ordinary humans. The move toward smart edge architectures, where substantial amounts of data storage and analytics are handled locally rather than in the cloud to reduce latency and bandwidth costs, will also help by cutting the time and overhead of managing massive data sets.
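To make the edge idea concrete, here is a minimal Python sketch of one edge-side reduction: a window of raw high-rate samples is collapsed into a few summary statistics before anything is sent upstream. The `Summary` fields, the sensor ID and the synthetic sine-wave data are all hypothetical, and JSON is used only as a convenient way to compare payload sizes; real edge platforms do considerably more than this.

```python
import json
import math
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Summary:
    """Compact statistics for one sensor over one time window."""
    sensor_id: str
    window_start: float
    count: int
    mean: float
    minimum: float
    maximum: float
    rms: float  # root mean square, a common vibration-severity measure


def summarize(sensor_id: str, window_start: float, samples: list[float]) -> Summary:
    """Reduce a window of raw samples to a handful of numbers on the edge device."""
    if not samples:
        raise ValueError("cannot summarize an empty window")
    n = len(samples)
    return Summary(
        sensor_id=sensor_id,
        window_start=window_start,
        count=n,
        mean=sum(samples) / n,
        minimum=min(samples),
        maximum=max(samples),
        rms=math.sqrt(sum(x * x for x in samples) / n),
    )


if __name__ == "__main__":
    # Hypothetical burst of 1,000 vibration samples from a pump.
    raw = [math.sin(i / 10) for i in range(1000)]
    summary = summarize("pump-7/vibration", window_start=0.0, samples=raw)
    raw_payload = json.dumps(raw)
    summary_payload = json.dumps(asdict(summary))
    print(f"raw: {len(raw_payload)} bytes, summary: {len(summary_payload)} bytes")
```

The trade-off is that whatever the summary discards is gone for good, which is exactly the tension the Sequoia example raises: reduce too aggressively at the edge and the obscure signal that solves the puzzle may never reach the cloud.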

AI will help as well. Until recently, video and images were considered “dark” data because they couldn’t be easily searched, and they often landed on the “do we really need to keep all of this?” chopping block. Neural networks have turned this around, enabling capabilities such as searching photo collections with facial recognition.

Many of these technologies are just emerging into the mainstream, but the future looks promising.

“Running scared from the big data monster is a cop out,” says Neil Strother, principal research analyst at Navigant Research. “The tools now available for collecting, organizing and analyzing large and growing datasets are here and affordable. I’m not saying this kind of effort is trivial, but it’s not beyond their reach either.”

(Copyright: IoT News)