Applied Machine Learning, Part 1: Feature Engineering
From the series: Applied Machine Learning
Explore how to perform feature engineering, a technique for transforming raw data into features that are suitable for a machine learning algorithm.
Feature engineering starts with your best guess about what features might influence the action you’re trying to predict. After that, it’s an iterative process where you create new features, add them to your model, and see if your results have improved.
This video provides a high-level overview of the topic and uses several examples to illustrate the basic principles behind feature engineering, along with established techniques for extracting features from signals, text, and images.
Machine learning algorithms don’t always work well on raw data. Part of our job as engineers and scientists is to transform the raw data to make the behavior of the system more obvious to the machine learning algorithm. This is called feature engineering.
Feature engineering starts with your best guess about what features might influence the thing you’re trying to predict. After that, it’s an iterative process: you create new features, add them to your model, and see if the results improve.
Let’s take a simple example where we want to predict whether a flight is going to be delayed or not.
In the raw data, we have information such as the month of the flight, the destination, and the day of the week.
If I fit a decision tree just to this data, I’ll get an accuracy of 70%. What else could we calculate from this data that might help improve our predictions?
Well, how about the number of flights per day? There are more flights on some days than others, which may mean they’re more likely to be delayed.
This feature is already in my dataset in the app, so let’s add it and retrain the model. You can see the model accuracy improved to 74%. Not bad for just adding a feature.
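The video does this in an app without showing code, but the same add-a-feature-and-retrain loop is easy to sketch. Here is a minimal version in Python with scikit-learn; the file name flights.csv, the column names, and the data itself are hypothetical placeholders, not the dataset from the video.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Hypothetical flight records with date, month, destination,
# day_of_week, and a 0/1 delayed label.
flights = pd.read_csv("flights.csv")

# Engineered feature: how many flights departed on the same calendar day.
flights["flights_per_day"] = flights.groupby("date")["date"].transform("count")

def evaluate(feature_cols):
    """Train a decision tree on the given columns and return test accuracy."""
    X = pd.get_dummies(flights[feature_cols])  # one-hot encode categoricals
    y = flights["delayed"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))

print(evaluate(["month", "destination", "day_of_week"]))  # baseline
print(evaluate(["month", "destination", "day_of_week", "flights_per_day"]))
```

Comparing the two printed accuracies is exactly the iterate-and-check loop described above.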
Feature engineering is often referred to as a creative process, more of an art than a science. There’s no correct way to do it, but if you have domain expertise and a solid understanding of the data, you’ll be in a good position to perform feature engineering. As you’ll see later, techniques used for feature engineering are things you may already be familiar with, but you might not have thought about them in this context before.
Let’s see another example that’s a bit more interesting. Here, we’re trying to predict whether a heart is behaving normally or abnormally by classifying the sounds it makes.
The sounds come in the form of audio signals. Rather than training on the raw signals, we can engineer features and then use those values to train a model.
Recently, deep learning approaches have become popular because they require less manual feature engineering; instead, the features are learned as part of the training process. While this has often shown very promising results, deep learning models require more data, take longer to train, and the resulting model is typically less interpretable than one built on manually engineered features.
The features we used to classify heart sounds come from the signal processing field. We calculated things such as skewness, kurtosis, and dominant frequencies. These calculations extract characteristics that make it easier for the model to distinguish between an abnormal heart sound and a normal one.
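As a rough illustration of this kind of signal feature extraction, here is one way those quantities could be computed in Python with NumPy and SciPy. This is a generic sketch, not the exact feature set behind the heart-sound model, and the synthetic signal at the end is just a stand-in for real audio.

```python
import numpy as np
from scipy.stats import kurtosis, skew

def signal_features(x, fs):
    """Statistical and spectral features of a 1-D signal sampled at fs Hz."""
    features = {
        "mean": np.mean(x),
        "std": np.std(x),
        "skewness": skew(x),      # asymmetry of the amplitude distribution
        "kurtosis": kurtosis(x),  # "tailedness" of the distribution
    }
    # Dominant frequency: the FFT bin with the largest magnitude.
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1 / fs)
    features["dominant_freq"] = freqs[np.argmax(spectrum)]
    return features

# Synthetic stand-in for a heart sound: a 2 Hz tone plus noise.
fs = 1000
t = np.arange(0, 5, 1 / fs)
x = np.sin(2 * np.pi * 2 * t) + 0.1 * np.random.randn(t.size)
print(signal_features(x, fs))  # dominant_freq should come out near 2 Hz
```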
So what other features do people use? Many rely on standard descriptive statistics such as the mean, median, and mode, as well as simple counts of how many times something happens.
Lots of data has a timestamp associated with it. There are a number of features you can extract from a timestamp that might improve model performance: What was the month, the day of the week, or the hour of the day? Was it a weekend or a holiday? Factors like these drive a lot of human behavior, which matters if, for example, you’re trying to predict how much electricity people use.
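Here is a sketch of how those calendar features might be extracted with pandas; the timestamps are made up, and the US federal holiday calendar is an arbitrary choice of holiday source.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

# Hypothetical records keyed by a timestamp column.
df = pd.DataFrame({"timestamp": pd.to_datetime(
    ["2023-07-04 09:30", "2023-07-08 18:45", "2023-07-10 02:15"])})

# Calendar features commonly derived from a timestamp.
df["month"] = df["timestamp"].dt.month
df["day_of_week"] = df["timestamp"].dt.dayofweek  # Monday = 0
df["hour"] = df["timestamp"].dt.hour
df["is_weekend"] = df["day_of_week"] >= 5

# Holidays require an external calendar; US federal holidays shown here.
holidays = USFederalHolidayCalendar().holidays()
df["is_holiday"] = df["timestamp"].dt.normalize().isin(holidays)
print(df)
```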
Another class of feature engineering techniques deals with text data. Counting the number of times certain words occur in a document is one technique, often combined with normalization schemes such as term frequency-inverse document frequency (TF-IDF). Word2vec, in which words are converted to a high-dimensional vector representation, is another popular feature engineering technique for text.
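As a quick sketch of the counting and TF-IDF ideas using scikit-learn (the three toy documents are invented for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["the flight was delayed",
        "the flight departed on time",
        "another delayed flight, delayed again"]

# Raw word counts: one row per document, one column per vocabulary word.
counts = CountVectorizer().fit_transform(docs)

# TF-IDF reweights the counts so words that appear in every document
# (like "the" or "flight") contribute less than distinctive words.
tfidf = TfidfVectorizer()
X = tfidf.fit_transform(docs)
print(tfidf.get_feature_names_out())
print(X.toarray().round(2))
```

Word2vec embeddings would typically come from a separate library or a pretrained model rather than from a count-based vectorizer like the one above.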
The last class of techniques I’ll talk about has to do with images. Images contain lots of information, so you often need to extract the important parts. Traditional techniques calculate the histogram of colors or apply transforms such as the Haar wavelet. More recently, researchers have started using convolutional neural networks to extract features from images.
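Here is a minimal sketch of the color-histogram idea in NumPy; the random array stands in for a real RGB image, and the bin count is an arbitrary choice.

```python
import numpy as np

def color_histogram(image, bins=8):
    """Concatenated, normalized per-channel histograms of an H x W x 3 image."""
    features = []
    for channel in range(3):  # red, green, blue
        hist, _ = np.histogram(image[..., channel], bins=bins, range=(0, 256))
        features.append(hist / hist.sum())  # normalize to frequencies
    return np.concatenate(features)

# Random stand-in for a real image loaded from disk.
image = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)
print(color_histogram(image))  # a 24-element feature vector (3 x 8 bins)
```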
Depending on the type of data you’re working with, it may make sense to use a variety of the techniques we’ve discussed. Feature engineering is a trial-and-error process: the only way to know whether a feature is any good is to add it to a model and check whether it improves the results.
To wrap up, that was a brief explanation of feature engineering. We have many more examples on our site, so check them out.