Sooner or later, it had to happen. The promise of big data analytics (BDA) and a growing appetite to use it have led to demand for BDA projects throughout organizations. But the enthusiasm is muted by uncoordinated experiments and campaigns that lead to unintended consequences. So now, IT organizations are modernizing an old, valuable playbook: data governance.
Data Governance Then and Now
In spite of all the hype surrounding big data analytics, data governance is not new. Data governance is an umbrella term, which refers to the methods, tools and standards we use to maximize data availability, usability, integrity, and security. A sound data governance program is essential. Without it, you can never be sure that decisions you make are based on solid evidence.
That means that big data governance is still a set of controls and data handling methods supported by a decision-making framework. Successful data governance still serves as a road map for IT spending and decision making. Now, it also guides how data is organized, processed and managed by analysts and business users.
Is Extending Data Quality Methods to Big Data Use Cases a Good Idea?
The advent of big data tools and methods hasn’t changed what’s really important. Customers still want high-quality data, and they want to use it to achieve business goals. Doing that means extending traditional data quality practices to big data use cases.
But a simple use-what-worked-before approach won’t work. A Gartner report suggests that a simple transfer of assets to big data use cases is not a good idea. Here’s why:
- Modern data governance requires faster, more agile methods. Not long ago, data was governed during discovery. Everything was assigned maximum importance and therefore control. Now, data analysts speed up the process by finding data and deciding how it will be used before they decide which level of governance is most useful.
- Analysts must wrangle a greater variety of data. Quality now requires managing many more types of data. And many of these data types are located outside enterprise firewalls.
- The old rules no longer apply. Many assumptions that traditional data management systems made no longer reflect current data formats, structure, and completeness.
- Data lakes are a data management fact of life. Growing use of advanced data governance is a response to the bigger volumes and variety of data that organizations must contend with. As an alternative to conventional data warehouses, data lakes are another response to this trend. They drive the need for data control and quality assurance initiatives. Organizations must find, analyze and share more and more diverse data more quickly and efficiently than ever.
So, the goal is well-governed data that is reliable, accessible, secure and ready to use. But how do you achieve these goals with current tools and best practices?
Practical Matters: Big Data Tools and Best Practices
Start with the notion that data should be accessible, trustworthy and secure. Which tools and best practices can you use to tame oceans of diverse big data into something governable? The ones that help you overcome three major challenges: managing many types of data, minimizing data scrubbing effort and keeping data secure.
There are functional specs that can help you narrow your choices. When it comes to compiling many types of data and minimizing pre-processing steps, the most effective tools:
- Integrate easily with Hadoop frameworks.
- Offer provisioning via cloud-based services.
- Monitor data for quality standards before it’s ingested and compiled with core data.
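To make the last point concrete, here is a minimal Python sketch of a quality gate that checks records before they are compiled with core data. The field names (customer_id, email), the rules and the quarantine approach are illustrative assumptions, not features of any particular tool.

```python
from dataclasses import dataclass, field

# Hypothetical quality standard: fields every incoming record must carry.
REQUIRED_FIELDS = ("customer_id", "email")


@dataclass
class GateResult:
    accepted: list = field(default_factory=list)
    quarantined: list = field(default_factory=list)  # (record, problems) pairs


def check_record(record: dict) -> list:
    """Return the list of quality problems found in one record."""
    problems = []
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        if value is None or str(value).strip() == "":
            problems.append(f"missing {name}")
    email = str(record.get("email") or "")
    # Deliberately crude validity check, for illustration only.
    if email and "@" not in email:
        problems.append("malformed email")
    return problems


def quality_gate(records) -> GateResult:
    """Split a batch into records safe to ingest and records to review."""
    result = GateResult()
    for record in records:
        problems = check_record(record)
        if problems:
            result.quarantined.append((record, problems))
        else:
            result.accepted.append(record)
    return result


batch = [
    {"customer_id": 1, "email": "a@example.com"},
    {"customer_id": 2, "email": "not-an-email"},
    {"customer_id": None, "email": ""},
]
result = quality_gate(batch)
```

Only the accepted records would move on to ingestion. The quarantined ones keep their list of problems, so an analyst can decide whether to fix or discard them.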
Effective tools can correct data and records, improve overall data quality and reduce pre-processing effort by working around:
- Incomplete or incorrect fields.
- Name or address variations based on profiles in different online communities.
- Duplicate data or records.
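As a rough illustration of how the three problems above interact, the Python sketch below normalizes name variations, merges duplicates and fills incomplete fields from the duplicate record. The field names (name, email, city) and the matching rule are assumptions for this example. Commercial tools typically use fuzzier matching than the exact match on normalized values shown here.

```python
import re


def normalize_name(name: str) -> str:
    """Lowercase, drop punctuation and collapse repeated whitespace."""
    cleaned = re.sub(r"[^\w\s]", "", name.lower())
    return " ".join(cleaned.split())


def record_key(record: dict) -> tuple:
    """Hypothetical matching rule: same normalized name and same email."""
    name = normalize_name(record.get("name") or "")
    email = (record.get("email") or "").strip().lower()
    return (name, email)


def deduplicate(records) -> list:
    """Keep the first record per key; use later duplicates to fill gaps."""
    merged = {}
    for record in records:
        key = record_key(record)
        if key not in merged:
            merged[key] = dict(record)
            continue
        kept = merged[key]
        for field_name, value in record.items():
            # Only fill fields that are empty on the record we kept.
            if kept.get(field_name) in (None, "") and value not in (None, ""):
                kept[field_name] = value
    return list(merged.values())


profiles = [
    {"name": "Ann O'Neil", "email": "Ann@Example.com", "city": ""},
    {"name": "ann  oneil", "email": "ann@example.com ", "city": "Dublin"},
    {"name": "Bob Smith", "email": "bob@example.com", "city": "Cork"},
]
clean = deduplicate(profiles)
```

In this sample, the two Ann O'Neil profiles collapse into one record, and the empty city on the first profile is filled from the second.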
That’s fine for tools, but what about best practices? Is there a growing consensus that can guide data analysts?
Growing Consensus of Best Practices
Data quality as a service (DQaaS) options are in growing use, and they offer advantages over on-premises data quality solutions. These cloud-based services reduce infrastructure requirements and cost less to operate. On-premises data quality solutions, by contrast, frequently require significant infrastructure planning, which can stall when organizations need to scale up.
Big Data Governance Benefits
What’s the final outcome of big data governance? Pretty much the same benefits as those of effective IT governance (pre-big data): higher return on assets, investments tied to strategic priorities, greater agility and less duplication of effort (rework).
Big data quality controls can:
- Simplify big data governance processes.
- Help standardize social media data and other unconventional data sources.
- Enforce data quality over enormous data sets.
- Add a critical governance layer to the data integration process.
Need help with your big data governance? Contact one of our expert consultants to find out how we can help!