How To Export Google Analytics Data into a Data Lake: Case Study

One of our clients recently presented a unique and advanced analytics challenge. Through an innovative approach, the Syntelli team implemented a simple yet effective solution that also satisfied several other business requirements. Ultimately, the team significantly increased the client’s insight into a specific function of their business and provided valuable, actionable information.

Insurance Client Challenge and Goals

Our client is a leading provider of insurance, reinsurance, and other forms of risk transfer. They also have a number of partners who “participate” in the overall conversion process. In this instance, a conversion is a consumer completing an application for an insurance policy. Participation by the partners is a passive process: a consumer visits one or more of the partners’ sites during the application process, but ultimately lands on the client’s site to complete the application for insurance.

The challenge presented by the client involved gaining deeper insight into the conversion process discussed above. There were also specific key goals that needed to be achieved with this data. Those goals were to know:

  1. Which partner was involved in the conversion process
  2. The total number of conversions that resulted from each distinct partner
  3. Which product the consumer submitted an application for
  4. Whether an agent had assisted the consumer during the application process

As the first step in resolving this challenge, appropriate tags were fired via Google Tag Manager, on both the partner sites and the client site, to capture the required information. Each application also had to be uniquely identified and tracked when it was accessed by both an agent and a consumer before the consumer completed it, regardless of whether it was accessed once or multiple times. To support this, the “UserId” feature of Google Analytics was implemented. One key distinction was that it was implemented only on the client’s site and not on the partners’ sites.

On the Google Analytics end, the custom dimension “AgentorConsumer” was created in order to help identify applications separately, based on whether the consumer was assisted by an agent. This dimension was then used as a filter to generate different views based on “Agent” or “Consumer.”
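As a rough illustration of how these values might reach Google Tag Manager, the sketch below builds a data layer push that carries the application identifier and the “AgentorConsumer” value. The function name, the event name, and the use of the application ID as the user ID are assumptions made for this example, not the client’s actual implementation.

```javascript
// Hypothetical helper: builds the object pushed to the GTM data layer
// when an application is opened. Every key name except the
// "AgentorConsumer" dimension is an assumption made for this sketch.
function buildApplicationPush(applicationId, assistedByAgent) {
  return {
    event: "applicationViewed",      // assumed custom event name
    userId: String(applicationId),   // value mapped to the GA User ID feature
    AgentorConsumer: assistedByAgent ? "Agent" : "Consumer"
  };
}

// In a browser, GTM reads from window.dataLayer; fall back to a plain
// array so the sketch also runs outside a page.
var dataLayer = (typeof window !== "undefined")
  ? (window.dataLayer = window.dataLayer || [])
  : [];

dataLayer.push(buildApplicationPush("APP-1001", true));
```

In Google Tag Manager, the pushed keys would then be read through Data Layer variables and mapped to the User ID field and the custom dimension.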

To provide adequate functionality and satisfy the other business requirements mentioned above, multiple views were implemented, with custom dimensions used as filters. Multiple partners were involved, all of which had implemented and were interacting with a single Google Tag Manager, yet no standardized event reporting was in place. As a result, some functionality had to be duplicated because of feature constraints in Google Analytics.

The Solution: Exporting Google Analytics Events into a Data Lake Environment

To handle this complex scenario, a better solution was required: something beyond using Google Analytics as a visualization tool. What we recommended to the client was actually quite simple: export all of their Google Analytics data. More accurately, that meant writing all event data into a data lake environment. Because Google Tag Manager provides the ability to execute arbitrary code within a controlled, managed environment, a simple JavaScript snippet could be executed to export the event data to the data lake. This allowed all user interactions with the website to be tracked independently of a web page or screen load. For example, actions the client might want to track as events included downloads, mobile ad clicks, gadgets, Flash elements, AJAX embedded elements, and video plays.

The Process

Now, we will dive a little deeper into the details of this process. Whenever there is a data layer push that reports an event, Google Tag Manager can be configured to fire an event, which includes the details of the user interaction, to Google Analytics. The following table summarizes the event fields:

[Table: Google Analytics event fields]
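For reference, a Universal Analytics event hit carries a category and an action (both required), plus an optional label and an optional non-negative integer value. The sketch below assembles such a hit as a plain object. The function name and the sample values are ours and purely illustrative.

```javascript
// Builds the field object for a Universal Analytics event hit.
// Category and action are required; label and value are optional,
// and value must be a non-negative integer.
function buildGaEvent(category, action, label, value) {
  if (!category || !action) {
    throw new Error("eventCategory and eventAction are required");
  }
  var hit = { hitType: "event", eventCategory: category, eventAction: action };
  if (label !== undefined) {
    hit.eventLabel = String(label);
  }
  if (value !== undefined) {
    if (!Number.isInteger(value) || value < 0) {
      throw new Error("eventValue must be a non-negative integer");
    }
    hit.eventValue = value;
  }
  return hit;
}

// Sample values are illustrative only.
var hit = buildGaEvent("Application", "submit", "auto-policy", 1);
// With analytics.js loaded on the page, this could be sent via: ga("send", hit);
```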

The Google Tag Manager Data Layer allows arbitrary keys and values to be inserted, through code, into the Data Layer array. The values required for a Google Analytics event (“Category” and “Action”) are typically included in the Data Layer as well. The code for inserting these values is often kept within Google Tag Manager for ease of maintenance. An example would be code that pushes an event to the Data Layer when an element with the ‘btn’ class is clicked. A screenshot of this code is included below.
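A minimal sketch of such a click handler might look like the following. The event name and the gaAction and gaLabel keys are assumptions made for illustration. Only the ‘btn’ class and the gaCategory key come from this article.

```javascript
// Hypothetical handler: pushes a GA-style event to the data layer when
// an element carrying the "btn" class is clicked.
function pushButtonClick(dataLayer, element) {
  if (!element || !element.classList || !element.classList.contains("btn")) {
    return false; // ignore clicks on anything else
  }
  dataLayer.push({
    event: "buttonClick",            // assumed custom event name
    gaCategory: "Button",            // assumed category value
    gaAction: "click",               // assumed key and value
    gaLabel: element.id || "(no id)" // assumed key
  });
  return true;
}

// Browser wiring; skipped when no document is available.
if (typeof document !== "undefined") {
  window.dataLayer = window.dataLayer || [];
  document.addEventListener("click", function (e) {
    pushButtonClick(window.dataLayer, e.target);
  });
}
```

Keeping the push logic in a function separate from the listener makes it easy to reuse the same handler for other element classes.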

The Data Layer also allows any event to push elements to the Universal Data Layer, after which the data objects may be used across platforms. For example, the Google Analytics code above would produce values that can be easily reused in subsequent events:

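// Intended for a Google Tag Manager Custom HTML tag, where {{Event}} and
// {{gaCategory}} are replaced with GTM variable values before the script runs.
var restDomain = "https://myDataLakeDomain/eventLog?";

var head = {
    event: {{Event}},
    gaCategory: {{gaCategory}}
};

// Send the fields to the data lake endpoint. An image GET is used here as one
// simple way to issue the request from a browser; the endpoint is assumed to
// accept the fields as query parameters.
var query = "event=" + encodeURIComponent(head.event) +
    "&gaCategory=" + encodeURIComponent(head.gaCategory);
new Image().src = restDomain + query;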
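Once events land in the data lake, the client’s goals become simple aggregations over the exported records. The sketch below counts conversions per partner over a small in-memory sample. The record shape, field names, and event names are hypothetical, and in practice this query would run in the data lake’s own query engine rather than in JavaScript.

```javascript
// Hypothetical exported records; all field names are assumptions.
var events = [
  { userId: "APP-1", partner: "partnerA", product: "auto", event: "applicationSubmitted" },
  { userId: "APP-1", partner: "partnerA", product: "auto", event: "applicationSubmitted" }, // duplicate hit
  { userId: "APP-2", partner: "partnerB", product: "home", event: "applicationSubmitted" },
  { userId: "APP-3", partner: "partnerA", product: "life", event: "applicationSubmitted" },
  { userId: "APP-4", partner: "partnerB", product: "auto", event: "applicationViewed" }
];

// Counts completed applications per partner, counting each application
// (identified by userId) at most once.
function countConversionsByPartner(records) {
  var seen = {};
  var counts = {};
  records.forEach(function (r) {
    if (r.event !== "applicationSubmitted" || seen[r.userId]) {
      return;
    }
    seen[r.userId] = true;
    counts[r.partner] = (counts[r.partner] || 0) + 1;
  });
  return counts;
}

var conversions = countConversionsByPartner(events);
// conversions is { partnerA: 2, partnerB: 1 }
```

The same pattern, grouped by product or by the agent/consumer flag instead of by partner, would address the remaining goals.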

Lastly, while exporting event data to a data lake is one available option, other methods of exporting data out of Google Analytics can also be explored. These include the Embed API, Core Reporting API, Real Time Reporting API, Metadata API, and Multi-Channel Funnels Reporting API.
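As an illustration of the API route, the sketch below builds a Core Reporting API (v3) query URL that requests total events by event category. The view ID and dates are placeholders, the function name is ours, and a real request would also need an OAuth 2.0 access token, which is omitted here.

```javascript
// Builds a Core Reporting API (v3) query URL for event totals by category.
// The view ID and date range passed in below are placeholders.
function buildCoreReportingUrl(viewId, startDate, endDate) {
  var params = {
    ids: "ga:" + viewId,
    "start-date": startDate,
    "end-date": endDate,
    metrics: "ga:totalEvents",
    dimensions: "ga:eventCategory"
  };
  var query = Object.keys(params).map(function (key) {
    return encodeURIComponent(key) + "=" + encodeURIComponent(params[key]);
  }).join("&");
  return "https://www.googleapis.com/analytics/v3/data/ga?" + query;
}

var url = buildCoreReportingUrl("12345678", "2018-01-01", "2018-01-31");
```

Unlike the event export, this approach returns aggregated report data rather than individual hits, so it suits periodic reporting more than event-level analysis.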

The solution our team developed in this instance grew out of one particular client’s needs in a specific industry, but it can be applied to companies in other industries as well. By employing a few additional tools, and thinking outside the box of Google Analytics as purely a visualization tool, Syntelli was able to significantly expand the client’s insight into this function of their business. That added insight will allow our client to better target their high-performing partners and identify their most in-demand products.
