It’s human nature: folks want better performance but also want to keep the familiar tools and procedures they feel comfortable with. That’s the story of Apache™ Kylin™, a new Big Data analytics app that has moved from incubator to top-level project status.[1] Apache Kylin software is released under the Apache License v2.0.
Kylin was developed as an antidote to ever-slower pre-calculation processing, the result of increasingly large datasets. Many business and data science users wanted faster processing but also wanted to keep their favorite tools, such as Tableau and Excel.[2]
Fast, Scalable and Powerful
Kylin is an open source, distributed analytics engine on Hadoop that originated at eBay. It analyzes petabyte-scale data with sub-second query latency.[3] Kylin is designed to provide a familiar SQL interface and multi-dimensional analysis (OLAP) on Apache Hadoop™.
Kylin draws on the distributed computing power of a Hadoop cluster, which can span hundreds or thousands of nodes. This architecture makes it possible to run calculations in parallel and then merge the results, which significantly reduces processing time.
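The "calculate in parallel, then merge" idea can be illustrated with a small sketch. This is not Kylin or Hadoop code: the partitions, the record layout and the function names are hypothetical, and local threads stand in for cluster nodes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical (country, sales) records, already split across three "nodes".
PARTITIONS = [
    [("US", 10), ("DE", 4)],
    [("US", 3), ("FR", 8)],
    [("DE", 6), ("FR", 1)],
]


def partial_totals(partition):
    """Aggregate one partition independently of the others."""
    totals = Counter()
    for country, sales in partition:
        totals[country] += sales
    return totals


def merged_totals(partitions):
    """Compute partial aggregates in parallel, then merge them into one result."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_totals, partitions))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds values for matching keys
    return dict(merged)


print(merged_totals(PARTITIONS))
```

Because each partition is aggregated without reference to the others, adding more workers shortens the first phase, and the merge only has to combine small partial results.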
Kylin cuts slow pre-processing work by using the offline Cube Build Engine process, which converts relational data to key-value data. Several elements of the Hadoop ecosystem carry out this conversion. Kylin:
- Reads source data from Hive tables, which are stored on HDFS.
- Runs MapReduce jobs to pre-compute the aggregations that large queries need.
- Stores results as key-value cuboids in HBase.
Kylin also uses an open-source dynamic data management framework called Apache Calcite to parse SQL and plug in code.
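The cube build steps described above can be pictured with a toy example. This is a minimal sketch, not Kylin's implementation: the rows, the dimension names and the helper functions are hypothetical, an in-memory dict stands in for HBase, and only a SUM measure is pre-aggregated.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows, standing in for a Hive table.
ROWS = [
    {"country": "US", "category": "Toys", "price": 10.0},
    {"country": "US", "category": "Books", "price": 5.0},
    {"country": "DE", "category": "Toys", "price": 7.5},
    {"country": "US", "category": "Toys", "price": 2.5},
]
DIMENSIONS = ("country", "category")


def build_cube(rows, dimensions, measure):
    """Pre-aggregate SUM(measure) for every subset of dimensions (every cuboid)."""
    store = defaultdict(float)  # plain dict standing in for a key-value store
    for size in range(len(dimensions) + 1):
        for cuboid in combinations(dimensions, size):
            for row in rows:
                key = (cuboid, tuple(row[d] for d in cuboid))
                store[key] += row[measure]
    return dict(store)


def query_sum(store, **filters):
    """Answer SUM(measure) for the given dimension values with one key lookup."""
    cuboid = tuple(d for d in DIMENSIONS if d in filters)
    return store.get((cuboid, tuple(filters[d] for d in cuboid)), 0.0)


cube = build_cube(ROWS, DIMENSIONS, "price")
print(query_sum(cube, country="US"))                   # total for one country
print(query_sum(cube, country="US", category="Toys"))  # total for one pair
print(query_sum(cube))                                 # grand total
```

The expensive scan over all rows happens once, offline. At query time each answer is a single lookup, which is the trade-off that pre-built cuboids make: more storage and build time in exchange for low query latency.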
Major Components and Capabilities
Kylin is a powerful tool for multi-dimensional data analysis and reporting. Its SQL interface and multi-dimensional analysis (OLAP) on Hadoop support extremely large datasets. Users are also starting to use Kylin for near-real-time data streaming, storage and analytics.
Kylin supports:
- Multi-dimensional analysis on tens of billions of records with latency in seconds.
- ANSI-standard SQL availability for SQL-compatible tool users.
- High cardinality and very large dimensions.
- Thousands of users, who can work on data simultaneously.
- Distributed and scale-out architecture for analysis in the terabyte-to-petabyte range.
Here’s what’s under Kylin’s hood:
- OLAP Cube Build Engine, which is designed to reduce query latency on Hadoop for more than 10 billion rows of data.
- MOLAP cube query, which serves billions of rows.
- ANSI SQL on Hadoop compatibility, which supports most ANSI SQL query functions.
- Seamless integration with BI tools through an open-source ODBC driver. Kylin currently integrates with business intelligence tools such as Tableau and Excel.
The Kylin platform includes the following components:
- Metadata Manager: Kylin is a metadata-driven application. The Metadata Manager is the key component that manages all metadata stored in Kylin, including the most important cube metadata. All other components rely on the Metadata Manager.
- Job Engine: This engine is designed to handle all of the offline jobs, including shell script, Java API, and MapReduce jobs. The Job Engine manages and coordinates all of the jobs in Kylin to make sure each job executes and that failures are handled.
- Storage Engine: This engine manages the underlying storage – specifically the cuboids, which are stored as key-value pairs. The Storage Engine uses HBase – the best solution from the Hadoop ecosystem for leveraging an existing K-V system. Kylin can also be extended to support other K-V systems, such as Redis.
- REST Server: The REST Server is the entry point for applications developed against Kylin. Applications can submit queries, get results, trigger cube build jobs, get metadata, get user privileges, and so on.
- ODBC Driver: To support third-party tools and applications – such as Tableau – we have built and open-sourced an ODBC Driver. The goal is to make it easy for users to onboard.
- Query Engine: Once the cube is ready, the Query Engine can receive and parse user queries. It then interacts with other components to return the results to the user.
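As an illustration of how an application might submit a query through the REST Server, here is a minimal sketch that builds the HTTP request without sending it. The `/kylin/api/query` path, the `sql` and `project` fields and the use of basic authentication reflect Kylin's REST documentation as we understand it, so check them against your Kylin version. The host, port, credentials, table and project name are placeholders.

```python
import base64
import json
import urllib.request


def build_query_request(base_url, user, password, sql, project):
    """Build (but do not send) a POST request for Kylin's query endpoint."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    body = json.dumps({"sql": sql, "project": project}).encode()
    return urllib.request.Request(
        url=f"{base_url}/kylin/api/query",  # assumed endpoint path
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
    )


request = build_query_request(
    "http://localhost:7070",  # placeholder host and port
    "analyst", "secret",      # placeholder credentials
    "SELECT country, SUM(price) FROM sales GROUP BY country",
    "demo_project",           # hypothetical project name
)
print(request.get_method(), request.full_url)

# To run the query against a live server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Tools such as Tableau do not need this step, since they connect through the ODBC Driver instead.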
How BDA Heavy Hitter eBay Uses Kylin
Kylin is successfully deployed and used in several enterprise businesses, most notably at eBay. When Kylin was open-sourced in 2014, business users and data specialists used it in production use cases such as web traffic analysis and geographical expansion analysis.
- The largest eBay use case analyzes more than 12 billion source records, which generate cubes of more than 14 TB. About 90% of its queries return in less than 5 seconds.
- Target analysts and business users can access analytics and get results easily through a Tableau dashboard, with no need for Hive queries, shell commands and similar steps. (3A)
What Kylin Can Do for Your IT Operations
If your organization takes the leap to Apache Kylin, expect these advantages:
- Painless entry into Big Data analytics. Don’t risk the opportunity costs of standing on the sidelines. Kylin bridges the gap between the high-volume, high-speed analysis you crave and the proven tools and procedures you trust.
- Keep the tools you love. Get the best of modern BDA functionality but avoid the costs of abandoning familiar SQL-compatible tools.
- Faster analysis. Run multi-dimensional queries on massive datasets with sub-second latency.
- Tackle even the biggest BDA jobs. Get insights from use cases based on petabyte-scale analytics. Kylin helps you do the heavy lifting.
For more information about Apache Kylin and how it can help your organization up its BDA game, contact us today at 1-877-796-8355 or at Shikha.Kashyap@syntelli.com