MapR Streams - Next Step in Enterprise Big Data

In a press release on Tuesday, MapR Technologies Inc. announced MapR Streams, a reliable, global event streaming system that connects data producers with data consumers. According to MapR, integrating MapR Streams into its platform makes it the first and only converged data platform in the industry.

MapR Converged Data Platform – What Is It?

The Converged Data Platform integrates real-time database capabilities and stream processing with enterprise big data storage. Built on Hadoop, and billed by MapR as the industry’s fastest and most secure open data infrastructure, the Platform lets organizations run innovative, real-time, global applications.
The core components of the Converged Data Platform include:

  1. MapR Streams – publish-subscribe, global event streaming system for big data
  2. MapR-DB – high-performance, in-Hadoop NoSQL database management system
  3. MapR-FS – POSIX file system providing distributed, high-performance, secure, and scalable data storage with full read/write access
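
The publish-subscribe model behind the first component can be sketched in a few lines. The Python below is a minimal in-memory illustration, not the MapR Streams API. The `Topic` and `Consumer` classes are hypothetical and only mirror the general idea: producers append events to a persisted log, and each consumer keeps its own read offset. Because of this, several applications can read the same data independently without it being copied or removed.

```python
class Topic:
    """Hypothetical in-memory stand-in for a persisted stream topic:
    an append-only log that consumers read by offset."""

    def __init__(self):
        self.log = []  # events stay in the log after they are read

    def publish(self, event):
        """Append an event and return its offset in the log."""
        self.log.append(event)
        return len(self.log) - 1


class Consumer:
    """Hypothetical subscriber that tracks its own position in a Topic."""

    def __init__(self, topic):
        self.topic = topic
        self.offset = 0  # each consumer has an independent read position

    def poll(self, max_events=10):
        """Return up to max_events unread events and advance the offset."""
        batch = self.topic.log[self.offset:self.offset + max_events]
        self.offset += len(batch)
        return batch


topic = Topic()
for event in ["login", "click", "purchase"]:
    topic.publish(event)

analytics = Consumer(topic)
billing = Consumer(topic)
print(analytics.poll())  # ['login', 'click', 'purchase']
print(billing.poll(1))   # ['login']
```

Reading an event does not remove it, so the two consumers progress at different speeds over the same data. This is the property the post describes as eliminating data movement through persistence of streaming data.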

MapR Converged Data Platform – Why is it Important?

Let’s cut through the marketing and technical speak. What does this all mean for big data? One word: “Integration”. Since its inception, “Big Data” has been synonymous with “Big Storage”: businesses have used their Hadoop environments as repositories for all of their data.

That is still valuable, and none of it is going away. Hadoop still gives analysts access to every bit of data the business produces.

A Converged Data Platform is the next evolution of “Big Data”: “Big Integration”. Big Integration means the entire business platform now runs on a single, unified distributed data platform. The same Java programmers who write JMS applications to move data from data producers (client portals, web apps, machine data, etc.) can now connect those applications directly to that platform.

Big Data is no longer just a consumer of data; it is the entire pipeline!

We will be writing more about this concept, a lot more. In the meantime, let’s look at some of the technical features.

Key Features & Benefits of the Converged Data Platform:

Feature Key Benefits
Global Namespace
  • Access data sets on any remote cluster as if they were part of the local cluster
  • Submit jobs from a cluster at one site to a cluster at another site
  • Use a single admin interface to administer any remote cluster around the globe
High Availability
  • No work loss from node failure – avoids restarting jobs from scratch
  • High uptime with zero data loss
  • Upgrade live clusters one node at a time with Rolling Upgrades
  • No configuration necessary to get High Availability
Data Protection
  • Low recovery point objectives (RPOs), because MapR-DB and MapR Streams update remote replicas in real time
  • Snapshots that create online backups and protect against data loss
Self-Healing
  • Powerful node recovery process
  • Serves big data environments that run 24×7, cannot lose data, and require immediate recovery from node failure
Unified Security
  • Authentication via Kerberos
  • Access controls for files, databases and streams
  • Wire-level encryption to ensure data privacy
  • Auditing on data accesses, authentication, and admin operations
Real Time
  • Instant access to large data files in MapR-FS
  • Interactive read/write operations for business apps
  • Self-service exploration with SQL via Apache Drill
  • Reliable global event streaming with MapR Streams
Multi-Tenancy
  • Manage distinct user groups, data sets & apps in a single cluster
  • Run different jobs simultaneously, securely, safely, and efficiently
  • YARN – use the Hadoop 2.x scheduler to control resources when running multiple jobs
Management and Monitoring
  • Auto-Provisioning Templates to easily provision nodes
  • Heatmaps and alarms to manage infrastructure
  • View running jobs for utilization auditing or troubleshooting

Key Features & Benefits of MapR Streams:

Feature Key Benefits
Converged Platform for Streaming
  • Single cluster for database, streams, file storage and analytics
  • Eliminates data movement by using persistence of streaming data
  • Secure and unified framework for data-at-rest and data-in-motion (with authentication, encryption, & authorization)
  • High reliability from a no-single-point-of-failure architecture and self-healing properties
Continuous Data
  • Data available for instant streaming
  • Kafka API for real-time producers/consumers for easy application migration
  • Integration with streaming frameworks such as Spark Streaming, Storm, Flink, and Apex
Global
  • Replicates event data at IoT-scale
  • Arbitrary topology supports thousands of global clusters
  • Loops in the replication topology are handled automatically, so data is not duplicated
  • Global metadata replication for business continuity
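
The loop-handling item above is easier to picture with a sketch. The post does not say how MapR Streams avoids duplicates when replication links form a loop. The Python below shows one generic technique under stated assumptions: every event carries a unique ID (here a hypothetical origin-cluster name plus a sequence number), and each cluster ignores IDs it has already applied. `Cluster`, `receive`, and the event-ID format are illustrative only and are not part of any MapR API.

```python
class Cluster:
    """Hypothetical cluster replica used only to illustrate loop-safe replication."""

    def __init__(self, name):
        self.name = name
        self.peers = []    # outbound replication links (these may form loops)
        self.seen = set()  # IDs of events already applied on this cluster
        self.events = []   # payloads applied locally

    def receive(self, event_id, payload):
        # Skip an event this cluster has already applied. This is what stops
        # it from circulating forever around a ring of clusters.
        if event_id in self.seen:
            return
        self.seen.add(event_id)
        self.events.append(payload)
        # Forward the event to every peer; peers that already have it drop it.
        for peer in self.peers:
            peer.receive(event_id, payload)


a, b, c = Cluster("a"), Cluster("b"), Cluster("c")
a.peers, b.peers, c.peers = [b], [c], [a]  # a -> b -> c -> a forms a loop

# Assumed event-ID format: (origin cluster, sequence number).
a.receive(("a", 1), "sensor reading")
print([len(x.events) for x in (a, b, c)])  # [1, 1, 1]
```

Each cluster applies the event exactly once even though the links form a cycle. A real system would also need to bound the size of `seen`, for example by tracking a per-origin high-water mark rather than every ID.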

Syntelli is a proud MapR partner and excited to help you with any big data problem you may have.

Contact us for more info or request a demo of Syntelli services.

Daniel Smith

Director of Data Science and Innovation
About Daniel: Using Business Intelligence platforms to bridge the gap between advanced data analytics and the efficient, effective principles of accounting, Daniel applies technology and mathematics to make businesses faster and smarter.

Daniel has managed solutions for diverse client sectors such as advertising, military, insurance, and oil & gas. These solutions range from Business Intelligence platform management and online key performance indicator identification and tracking to full predictive data model construction. Although the analytic solutions are often mathematically complex, Daniel’s presentation skills and academic background ensure that any insights they deliver are relevant and simple to understand.