Predict the Henry Hub spot price

The Henry Hub spot price is used as a benchmark for the spot (current) price of natural gas in the United States. Its value depends on how several variables interact. A web search for “Henry Hub spot price” turns up many research papers from universities and professional organizations that explain these factors in detail.

These factors influence both natural gas and crude oil prices, which have historically been related. Here’s how the general pricing method works:

  1. Collect several years’ worth of gas price data from multiple sources.
  2. Collect shale gas deposit and withdrawal data from local and state sources. Texas, California, Louisiana, New York, and Florida consume the most gas. Texas, Pennsylvania, Louisiana, Oklahoma, and Wyoming produce the most gas.
  3. Gather weather data by location and time from multiple weather sites.
  4. Once enough data is collected for proper statistical analysis (the more the better), join the price, production, and weather datasets by matching their time values. Split this merged dataset 80/20 into training and test sets. (Eighty percent of the data is used to fit the models, and the remaining 20 percent is held out to check how well they predict price, which is the response variable.)
  5. Fit linear and logistic regression models and analyze relevant metrics, such as multiple R-squared, p-values, and mean squared error. Because some variables add noise or unnecessary randomness to the models, reduce the dataset to the variables that have the greatest impact on price.
  6. Create a classification variable, which shows whether the price increased or decreased from the previous time period.
  7. Use a random forest classification model on the dataset and analyze relevant metrics.
  8. Gather new price, production and weather data over time and use it to test the classification model.
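As a rough illustration of steps 4 and 6, the sketch below joins three small series on their shared dates, labels each row with the direction of the price move, and makes an 80/20 split. It uses only the Python standard library. All values are invented, and the helper names (`join_by_time`, `add_direction`, `split_80_20`) are hypothetical. Two choices are assumptions rather than part of the method described above: the split is chronological, and an unchanged price is labeled as "not an increase."

```python
from datetime import date

# Hypothetical toy records keyed by date; every value is invented for illustration.
prices = {date(2020, 1, d): p for d, p in
          [(1, 2.10), (2, 2.05), (3, 2.20), (4, 2.20), (5, 2.35), (6, 2.30)]}
production = {date(2020, 1, d): v for d, v in
              [(1, 95.0), (2, 96.1), (3, 94.8), (4, 95.5), (5, 93.9), (6, 94.2)]}
weather = {date(2020, 1, d): t for d, t in
           [(1, 41.0), (2, 44.5), (3, 36.2), (4, 35.0), (5, 30.1), (7, 33.3)]}


def join_by_time(*series):
    """Inner-join dict-based series on their shared time keys, in time order."""
    shared = set(series[0]).intersection(*series[1:])
    return [(t, *(s[t] for s in series)) for t in sorted(shared)]


def add_direction(rows):
    """Append 1 if the price (column 1) rose versus the previous row, else 0.

    The first row is dropped because it has no previous period to compare to.
    """
    labeled = []
    for prev, cur in zip(rows, rows[1:]):
        labeled.append((*cur, 1 if cur[1] > prev[1] else 0))
    return labeled


def split_80_20(rows, train_fraction=0.8):
    """Chronological split (an assumption): earliest rows train, latest rows test."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


merged = join_by_time(prices, production, weather)   # only dates present in all three
labeled = add_direction(merged)                      # (date, price, production, temp, up)
train, test = split_80_20(labeled)
```

Dates that are missing from any one source drop out of the join, which is one reason collecting more data than seems necessary helps. Model fitting (steps 5 and 7) is left out here on purpose.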

The cross-validated classification approach performed better than the regression models used previously. We conclude that classifying price movement as an increase or a decrease was more accurate than predicting the price itself, a hard number.
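To make the distinction concrete, here is a minimal sketch of how the two kinds of prediction might be scored: mean squared error for a price forecast, and directional accuracy for an up/down call. The function names and every number are hypothetical and invented for illustration; they say nothing about how the actual models performed.

```python
def mean_squared_error(actual, predicted):
    """Average squared gap between actual and forecast prices."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def direction(previous, current):
    """1 if the price rose versus the previous period, else 0."""
    return 1 if current > previous else 0


def directional_accuracy(previous, actual, predicted_direction):
    """Share of periods where the predicted up/down call matched the real move."""
    hits = sum(direction(p, a) == d
               for p, a, d in zip(previous, actual, predicted_direction))
    return hits / len(actual)


# Invented example values (prices in dollars per unit of gas).
previous_price = [2.10, 2.05, 2.20, 2.20]
actual_price = [2.05, 2.20, 2.20, 2.35]
price_forecast = [2.15, 2.10, 2.25, 2.30]   # output of a hypothetical regression
direction_calls = [0, 1, 1, 1]              # output of a hypothetical classifier

mse = mean_squared_error(actual_price, price_forecast)

# A price forecast can also be scored on direction by comparing it to the prior price.
forecast_direction = [direction(p, f) for p, f in zip(previous_price, price_forecast)]
forecast_hit_rate = directional_accuracy(previous_price, actual_price, forecast_direction)
classifier_hit_rate = directional_accuracy(previous_price, actual_price, direction_calls)
```

A forecast can have a small squared error and still call the direction wrong whenever the real move is smaller than the forecast error, which is why the two metrics can rank models differently.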

Interested in seeing what else our team of data-science wizards can do with your data? Contact us today or request a demo of Syntelli services!


Mario Carloni
Data Scientist
About Mario: Mario performs statistical analysis and reporting for Energy, Manufacturing, and Medical sectors. He is focused on analysing and visualising business logic to allow the discovery of systemic flaws. Prior to Syntelli, Mario was a Research Assistant at the Public Policy Center. While at PPC, Mario worked on social science research for evidence-based policymaking at the state, regional, and local levels for public, private, and nonprofit partners.

Mario received a B.A. in Political Science from the University of Massachusetts. While at UMass, Mario was International Orientation Leader, International Student Conversation Partner, and IT Assistant, gaining broad systems thinking at the technological and social level.