What Is So Scary About “Big Data”?

 


Is Big Data really that big of a deal? And what, really, is so scary about Big Data? Maybe the better question is why Big Data isn’t scary.

I think back to the articles I’ve seen, including everything from “The Power of Big Data” and “The Future of Business: Big Data!” to “How to Save Your Marriage with BIG DATA” and “You’ll Never Believe what Kim Kardashian Did with BIG DATA!!!”

Simply put, Big Data is a different way of processing data than traditional data-handling techniques. Relational database engines struggle with certain types of data. Data best left to Big Data analytics includes:

• Extremely large volumes of data
• Unstructured data
• Time-sensitive data

A bigger problem that’s occurring with Big Data (other than its intimidation factor) is that not many people are actually using it.

That’s the Big Secret about Big Data. We really aren’t seeing much adoption of Big Data at the enterprise level, at least not where full production operations are concerned. Big Data largely finds its home in specific, highly specialized business sectors and data sets.

The underuse of Big Data may be systemic, or it may be that the phrase “Big Data” itself inspires fear and trepidation. There may even be a perception that Big Data is only for Big Business, or only for highly specialized business operations. Such perceptions make Big Data seem even scarier to the rest of the office. Is Big Data a secret language only the geekiest among us can understand?

Looking back a bit, creating Big Data analytics used to require immense knowledge of both the overall system (system administration) and how to build the analysis within the data environment (development). Given that history, it’s easy to see why some of the fear and trepidation around Big Data exists.

Good news! Since the introduction of YARN (Yet Another Resource Negotiator), those two functions, analysis development and system administration, are now largely separate.

Developing Big Data analytics is no longer the work of a secret society, and it’s certainly not scary. So let me dispel some misconceptions about analyzing Big Data.

Note: This list is aimed at people using Big Data to answer business questions, rather than at Big Data platform administrators. As with any operating system (OS), there are a host of administrative nuances to consider. But that’s for a future post.

For now, let’s take a look at some common questions and concerns around Big Data.

“Big Data is hard to understand!”

You can understand small data, right? So why is it more difficult to understand bigger amounts of data? Have you ever thought to yourself, “I can understand how a car works but not a truck. What is that!?” Probably not…

The real challenge when it comes to Big Data is finding ample space for the data. Another formidable challenge is having enough computing power to process it. Once the data is in place, Big Data analytics and reporting are performed in the same way as they are with small data.

And remember: System space and configuration are something for the admins to worry about, not the analysts.

“If my data is across a bunch of different machines, how do I find it!?”

Take a deep breath. A big misconception I see is that, since the data is housed on different machines, people think they will need to jump around to different locations in order to access it. Not having to do so is literally the entire point of the Hadoop Distributed File System (HDFS).

Once configured, Hadoop knows where the data is located. As far as the end user is concerned, that data is in one place. You navigate the data exactly the same as you would with Windows Explorer; that is, searching directories (folders) and finding files.
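To make that idea concrete, here is a toy sketch in Python. It is not Hadoop code and does not use the HDFS API; the class name `ToyNameNode`, the node names, the file paths and the tiny block size are all made up for illustration. It only shows the principle: one lookup table remembers which machine holds which piece of a file, so the user deals with nothing but paths.

```python
# Toy model of one namespace spread over several machines.
# Hypothetical names throughout; this is NOT the HDFS API.
from collections import defaultdict


class ToyNameNode:
    def __init__(self, nodes, block_size=8):
        self.nodes = list(nodes)          # names of the storage machines
        self.block_size = block_size      # bytes per block (tiny on purpose)
        self.block_map = {}               # path -> [(node, block_id), ...]
        self.storage = defaultdict(dict)  # node -> {block_id: bytes}
        self._next_id = 0

    def write(self, path, data):
        """Split data into blocks and place them round-robin across nodes."""
        placements = []
        for start in range(0, len(data), self.block_size):
            node = self.nodes[self._next_id % len(self.nodes)]
            block_id = self._next_id
            self._next_id += 1
            self.storage[node][block_id] = data[start:start + self.block_size]
            placements.append((node, block_id))
        self.block_map[path] = placements

    def read(self, path):
        """Reassemble a file; the caller never names a machine."""
        return b"".join(self.storage[node][block_id]
                        for node, block_id in self.block_map[path])

    def ls(self, directory):
        """List files under a directory, like browsing a folder."""
        prefix = directory.rstrip("/") + "/"
        return sorted(p for p in self.block_map if p.startswith(prefix))


fs = ToyNameNode(["node-a", "node-b", "node-c"])
fs.write("/sales/2016.csv", b"region,total\neast,100\nwest,250\n")
fs.write("/sales/2017.csv", b"region,total\neast,120\n")

print(fs.ls("/sales"))                       # browse by path only
print(fs.read("/sales/2016.csv").decode())   # file comes back whole
```

The first file ends up scattered across all three pretend machines, yet reading it back takes a single path. Real HDFS also keeps replicated copies of each block, which this toy skips.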

“MapReduce is too complicated!”

If you can SQL, you can Big Data.

When Big Data first arrived, we had to hand-code MapReduce functions, and our scripts had to spell out which resources to use for crunching the data.
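For a feel of what that hand-coding involved, here is a minimal pure-Python sketch of the map, shuffle and reduce steps for a simple word count. It runs on a laptop and only illustrates the pattern; it is not Hadoop’s Java API, and the function names are my own.

```python
# Word count written in the map / shuffle / reduce style.
# Illustrative only: function names are hypothetical, not Hadoop's API.
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one line of input."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse the values for one key into a single total."""
    return key, sum(values)


def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values)
                for key, values in shuffle(pairs).items())


counts = word_count(["big data is big", "data is data"])
print(counts)
```

Even for a task this small, the logic is spread over three functions. That is the boilerplate that higher-level tools now write for you.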

More good news! With modern Hadoop tools, that is no longer the case. And we expect this kind of good news, right? After all, if progression can’t result in simplification, what can?

YARN, also known as MapReduce 2.0, handles all the resource negotiation for your application.

This means that deciding which resources to use happens at the administrative level, not at the programming level. Using Big Data tools is now just like coding on your laptop: typically, you don’t need to tell the program how the code is executed, since your OS takes care of that.

“What’s the big deal about YARN?”

Ultimately, YARN means the big challenge in Big Data is administration and configuration. With the introduction of YARN, those tasks are now largely separated from analysts and users of Big Data tools. It’s refreshing to keep serving up good news here!

This also means that people with existing skill sets should expect only a slight learning curve when picking up Big Data technologies.

Some examples of how existing skills map to Big Data tools:

• SQL Developers: Hive
• Java Developers: Pig
• ETL Specialists: Kafka or Flume
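To illustrate the first pairing: the kind of query a SQL developer would write for Hive looks like everyday SQL. The sketch below runs a plain GROUP BY using Python’s built-in `sqlite3` module as a laptop-sized stand-in. It does not connect to Hive, and the `page_views` table and its columns are invented for the example. On a cluster, a query of this shape would be submitted to Hive instead, give or take dialect differences.

```python
# A plain SQL aggregation, run against SQLite as a stand-in for Hive.
# The table name, columns and rows are invented for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (page TEXT, visitor TEXT)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [
        ("/home", "ann"),
        ("/home", "bob"),
        ("/pricing", "ann"),
        ("/home", "cy"),
        ("/pricing", "dee"),
        ("/blog", "ann"),
    ],
)

# The analyst writes only this; where and how it executes is not their concern.
query = """
    SELECT page, COUNT(*) AS views
    FROM page_views
    GROUP BY page
    ORDER BY views DESC, page
"""
rows = conn.execute(query).fetchall()
conn.close()

for page, views in rows:
    print(page, views)
```

The query describes what result is wanted and says nothing about machines or resources, which is exactly the separation described above.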

Now Big Data is just another tool in your toolbox, not an entirely new technology! Feel free to spread the good news. And by all means, feel free to set aside any fear about Big Data. It really isn’t so scary, now, is it?