When it comes to finding and using fast, accurate data processing software, financial services firms are often in the lead. But now data integration and analytics have climbed to the top of enterprise BI priorities. And anything that enables faster, more accurate analytics gets attention.
Currently, financial services firms are figuring out ways to integrate their disparate silos of data and application workflows. Solutions from MapR Technologies provide a Hadoop distribution with lots of add-ons and a homegrown file system. It’s an approach that enables enterprises to integrate networking and big data processing with their IT infrastructure and its applications.
A Growing MapR Stack Pumps Up Speed and Scalability
For analytic tools like Hadoop, speed and scalability are just the beginning. When Hadoop was first introduced, there was just MapReduce, which was powerful but slow and difficult to program. But in the past two years or so, Hadoop developers have achieved blistering processing speeds by adding various software packages to the MapR Hadoop distribution. Here’s what we mean:
Use this… | to improve this process… | …and get these results |
In-memory features such as Apache Spark | Batch analytics | Reduce 60-minute run times to 10 minutes. |
Interactive SQL on top of Hadoop with projects like Apache Drill | Queries | Reduce development times of several days to a few hours, or even minutes |
MapR-DB (a NoSQL data store that is compatible with Apache HBase APIs) | Get and put operations | Latency of 10 milliseconds |
OpenTSDB, which runs on top of MapR-DB | Ingestion rates | Transfer 100 million data points per second[1] |
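To put the table's headline figures in perspective, here is a small back-of-the-envelope calculation in Python. It only does arithmetic on the numbers quoted above (60 minutes down to 10, and 100 million points per second); the function names are made up for illustration, and the results are derived figures, not new measurements.

```python
def speedup(before_minutes, after_minutes):
    """How many times faster a job runs after the change."""
    return before_minutes / after_minutes


def points_per_day(points_per_second):
    """Data points ingested in 24 hours at a sustained per-second rate."""
    seconds_per_day = 60 * 60 * 24
    return points_per_second * seconds_per_day


# Figures taken from the table above.
batch_speedup = speedup(60, 10)
daily_points = points_per_day(100_000_000)

print(f"Batch speedup: {batch_speedup:.0f}x")
print(f"Points per day at the quoted rate: {daily_points:,}")
```

The daily total assumes the quoted ingestion rate is sustained around the clock, which the source does not claim.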
The general idea is to use the signals coming out of sensors and applications and store the data in Hadoop permanently. The high-level process includes these steps:
- Use the streaming features of Hadoop, such as Spark Streaming or Storm, to do real-time processing on events in the network, and use machine learning to help feed information back into the infrastructure to control it.
- Use Drill, Hive, Impala and other tools to carve out subsets of this operational data with SQL commands.
- Route this data into a tool such as TIBCO™ Spotfire® to explore and visualize the datasets.
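The three steps above can be sketched in miniature with nothing but the Python standard library. This is a toy stand-in, not MapR's or anyone's actual pipeline: a simple windowed average plays the role of Spark Streaming or Storm, an in-memory SQLite table plays the role of Drill, Hive or Impala, and a CSV export plays the role of the hand-off to a visualization tool. The event data, function names and thresholds are all hypothetical.

```python
import csv
import io
import sqlite3
from collections import defaultdict

# Hypothetical sensor events: (timestamp_seconds, sensor_id, value).
EVENTS = [
    (0, "s1", 10.0), (1, "s2", 20.0), (2, "s1", 14.0),
    (5, "s1", 30.0), (6, "s2", 22.0), (9, "s2", 26.0),
]


def window_averages(events, window_seconds=5):
    """Step 1 stand-in: average each sensor's values per fixed time window."""
    sums = defaultdict(lambda: [0.0, 0])  # (window_start, sensor) -> [total, count]
    for ts, sensor, value in events:
        window_start = ts // window_seconds * window_seconds
        sums[(window_start, sensor)][0] += value
        sums[(window_start, sensor)][1] += 1
    return sorted((w, s, total / n) for (w, s), (total, n) in sums.items())


def query_subset(rows, min_avg):
    """Step 2 stand-in: carve out a subset of the stored data with SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE averages (window_start INTEGER, sensor TEXT, avg_value REAL)"
    )
    conn.executemany("INSERT INTO averages VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT window_start, sensor, avg_value FROM averages "
        "WHERE avg_value >= ? ORDER BY window_start, sensor",
        (min_avg,),
    ).fetchall()
    conn.close()
    return result


def to_csv(rows):
    """Step 3 stand-in: export the subset as CSV for a visualization tool."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["window_start", "sensor", "avg_value"])
    writer.writerows(rows)
    return buf.getvalue()


averages = window_averages(EVENTS)
subset = query_subset(averages, min_avg=20.0)
print(to_csv(subset))
```

In a real deployment each stage would run on the cluster rather than in one process, but the shape of the flow is the same: aggregate events as they arrive, select the interesting slice with SQL, then hand that slice to an exploration tool.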
But what’s happening in the real world? Here’s an example of how one company puts that blistering speed and flexible architecture to work.
Say “TransUnion,” and consumers around the world think, “Credit cards.” But for more than 40 years, TransUnion LLC has provided financial services to many industries: insurance, telecommunications and banking, among others. The company has worked to gather, analyze and deliver the critical information that businesses and consumers need to better manage their risk and customer relationships.
TransUnion lead architect Kevin McClowry views big data technologies as an opportunity to add oomph to company growth. His role: build systems that enable new insights and innovative product development.
McClowry chose a hybrid architecture, which combined commercial databases and a cost-effective Hadoop-based MapR platform. He knew that trends in historical data or newly acquired unstructured data sources were where much of the data’s hidden value lay.
Instead of focusing on solving business problems, McClowry chose a MapR-based platform to improve data analytics processes. This approach reaped many benefits, including:
- More accessible, useful data. The new data processing environment included trillion-row data sets, centralized data stores and ultra-high-speed processing. Statisticians and analysts (some with deep knowledge of TransUnion data) gained deep insights by using MapR and visualization tools such as Tableau.
- Less data discovery and preparation time. Data analysts and statisticians spent far less time requesting, waiting for and piecing together disparate data from siloed information across the organization.
- Lower capex. The IT staff could run high-volume analytics from lower-cost storage platforms, which reduced IT costs.
- Customizable analytics architecture. The flexible MapR-based architecture could be scaled to fit operations resources and requirements at each TransUnion facility worldwide.
McClowry concludes, “Embracing these new big data platforms and architectures has helped lay a foundation [of innovation and growth]. Nurturing the expertise and creativity within our analysts is how we’ll build on that foundation.”[2]
[1] Jim Scott, “Loading a time-series database at 100 million points per second,” 12 September 2014, at https://www.mapr.com/blog/loading-time-series-database-100-million-points-second#.VUACvbtFDik.
[2] Datanami, “Leveraging Big Data to Economically Fuel Growth,” 18 November 2014, at https://www.datanami.com/2014/11/18/leveraging-big-data-economically-fuel-growth/.