Feature Engineering is a vital part of any Machine Learning task, and it becomes even more important when the input consists of sequential data such as time series. Typically, these features are generated manually, which is not ideal. In this blog, we will look at an open-source Python package called tsfresh that we can use to generate hundreds of time-series features in an automated fashion. First, we will briefly explain Feature Engineering.
Once we are familiar with Feature Engineering, we will look at how we can use tsfresh to automate the process of generating time-series features. All the code used in this blog is available on the following GitHub repository:
What is Feature Engineering?
A Machine Learning feature is any measurable value that can be used as an input for a Machine Learning task. In the simplest terms, it can be considered a column of the input data to a Machine Learning model, where different observations represent the rows. For example, in the famous Iris dataset, where the goal is to predict the type of species, the input values of Sepal length, Sepal width, Petal length, and Petal width are called features. The task of the Machine Learning model is to predict the Species, given some feature values.
Figure 1: Features are input values that help a Machine Learning model better predict the Target Value
Feature Engineering, therefore, is the process of transforming the raw data into useful features that better characterize the data; thus, enabling the machine learning model to learn better from those features. An example of Feature Engineering on time-series sales data is given below. Here we have sales data over time, and we aim to predict future sales. We can use Feature Engineering to include additional data such as ‘Mean Sales Last year’ or ‘Sales on the same day last year.’ The main advantage of adding these features is to enable the Machine Learning model to better forecast future sales.
Engineered features that could be derived from the raw time-series sales data include:
• Mean Sales in last 7 days
• Max Sales in last 7 days
• Sales same date last year
• Sales same date last month
• Holiday data
For in-depth details, visit the two-part blog here on the end-to-end Feature Engineering process.
Now that we know what features are and why we need Feature Engineering, let’s look at how we can perform automated feature extraction for time-series data using tsfresh.
Feature Extraction using tsfresh in Python
‘tsfresh’ is an open-source Python package that automatically calculates hundreds of features from sequential data such as time series. tsfresh also includes methods to evaluate feature importance and assist in feature selection.
Figure 2: Image from the official tsfresh documentation illustrating feature extraction using tsfresh
In the above figure, we have sequential raw data (based on time). Using tsfresh we can extract features such as maximum, minimum, mean, median, number of peaks, etc. Once we have extracted these helpful features, we can use tsfresh or any other suitable feature selection method to reduce the feature set and only keep the most important features for machine learning.
Let’s look at the implementation of tsfresh in a Jupyter notebook using Python. First, we need to install the tsfresh module using pip. This can be done from the terminal or directly within the Jupyter notebook:
Next, we need to download sample data from the UCI Machine Learning Repository that we can use for our experiments. The documentation for the dataset is provided at:
The dataset represents Force and Torque measurements from sensors on robots. The dataset contains 88 samples represented by the ‘id’ column. The time column represents the sequence of readings.
As you can see, the dataframe has 1320 rows and 8 columns.
The ‘y’ column represents whether or not the sensor data corresponds to a robot failure. This is the target value, and our goal can be to classify the torque and force measurements as either a failure or not.
Next, we need to import the ‘extract_features’ method and use it to extract features from our dataset. We pass the data to ‘extract_features’ along with the column that represents the sequence of readings and the ‘id’ column that distinguishes the individual time series. In our case, the readings from each robot form one time series, identified by the ‘id’ column and sorted on the ‘time’ column.
In the above snippet, we can see that tsfresh has returned 4722 columns, covering time-series features for every numeric column across all the individual time series.
You can use the following code to identify all the features calculated by tsfresh.
Suppose we need to extract only a limited set of features for a single column, F_x. We can do that using a features dictionary as below:
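The post’s original ten features are not shown here, so as an illustrative assumption, this dictionary picks ten parameter-free tsfresh calculators for the `F_x` column (`None` means the calculator takes no parameters):

```python
# Settings dictionary: column name -> {calculator name: parameters}
fc_parameters = {
    "F_x": {
        "mean": None,
        "median": None,
        "maximum": None,
        "minimum": None,
        "standard_deviation": None,
        "variance": None,
        "sum_values": None,
        "abs_energy": None,
        "length": None,
        "kurtosis": None,
    }
}

print(len(fc_parameters["F_x"]))  # 10 features requested
```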
We can pass this dictionary to tsfresh to extract the corresponding features.
As seen above, we now have 10 columns, because we specified 10 features in our dictionary and requested them only for the F_x column. Listing the columns of the returned dataframe confirms which features were extracted.
This shows how easy it is to extract time-series features in an automated and customizable way using the tsfresh package.
To conclude, we have seen what Feature Engineering is and why it is important. We have also seen how easy it is to extract time-series features using tsfresh. It can easily be integrated into existing Machine Learning workflows, such as Scikit-learn pipelines, saving us precious coding and processing time. For further details on tsfresh, you can visit the official documentation at https://tsfresh.readthedocs.io/en/latest/index.html