Video length: 4:35

Applied Machine Learning, Part 1: Feature Engineering

From the series: Applied Machine Learning

Explore how to perform feature engineering, a technique for transforming raw data into features that are suitable for a machine learning algorithm. 

Feature engineering starts with your best guess about what features might influence the action you’re trying to predict. After that, it’s an iterative process where you create new features, add them to your model, and see if your results have improved.   

This video provides a high-level overview of the topic, and it uses several examples to illustrate basic principles behind feature engineering and established ways for extracting features from signals, text, and images. 

Machine learning algorithms don’t always work so well on raw data. Part of our job as engineers and scientists is to transform the raw data to make the behavior of the system more obvious to the machine learning algorithm. This is called feature engineering.

Feature engineering starts with your best guess about what features might influence the thing you’re trying to predict. After that, it’s an iterative process where you create new features, add them to your model, and see if the results have improved.

Let’s take a simple example where we want to predict whether a flight is going to be delayed or not. 

In the raw data, we have information such as the month of the flight, the destination, and the day of the week.  

If I fit a decision tree just to this data, I’ll get an accuracy of 70%. What else could we calculate from this data that might help improve our predictions?

Well, how about the number of flights per day? There are more flights on some days than others, which may mean that flights on busier days are more likely to be delayed.

I already have this feature from my dataset in the app, so let’s add it and retrain the model. You can see the model accuracy improved to 74%. Not bad for just adding a feature.
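As a rough illustration of the flights-per-day feature, here is a minimal Python sketch. The records, the field layout, and the `add_flights_per_day` helper are hypothetical and not taken from the video; in particular, a day-of-month field is assumed so that flights can be grouped by calendar day.

```python
from collections import Counter

# Hypothetical raw records: (month, day_of_month, destination, day_of_week).
# The day_of_month field is an assumption made for this sketch.
flights = [
    (1, 3, "BOS", "Mon"),
    (1, 3, "SFO", "Mon"),
    (1, 3, "ORD", "Mon"),
    (1, 4, "BOS", "Tue"),
    (1, 5, "SFO", "Wed"),
    (1, 5, "BOS", "Wed"),
]


def add_flights_per_day(records):
    """Append the number of flights on the same calendar day to each record."""
    counts = Counter((month, day) for month, day, _, _ in records)
    return [rec + (counts[(rec[0], rec[1])],) for rec in records]


augmented = add_flights_per_day(flights)
for row in augmented:
    print(row)
```

The last element of each augmented record is the new feature, which can then be passed to the model alongside the original columns.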

Feature engineering is often referred to as a creative process, more of an art than a science. There’s no single correct way to do it, but if you have domain expertise and a solid understanding of the data, you’ll be in a good position to perform feature engineering. As you’ll see later, techniques used for feature engineering are things you may already be familiar with, but you might not have thought about them in this context before.

Let’s see another example that’s a bit more interesting.  Here, we’re trying to predict whether a heart is behaving normally or abnormally by classifying the sounds it makes.

The sounds come in the form of audio signals.  Rather than training on the raw signals, we can engineer features and then use those values to train a model.  

Recently, deep learning approaches have become popular, as they require less manual feature engineering. Instead, the features are learned as part of the training process. While this has often shown very promising results, deep learning models require more data, take longer to train, and are typically less interpretable than models built on manually engineered features.

The features we used to classify heart sounds come from the signal processing field.  We calculated things such as skewness, kurtosis, and dominant frequencies.  These calculations extract characteristics that make it easier for the model to distinguish between an abnormal heart sound and a normal one.
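The three quantities mentioned above can be sketched in plain Python. This is only an illustration under stated assumptions: the synthetic sine wave stands in for a heart-sound recording, the sample rate is invented, and the naive DFT is chosen for readability rather than speed. It is not the code used in the video.

```python
import cmath
import math


def central_moment(x, order):
    """Population central moment of the given order."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** order for v in x) / len(x)


def skewness(x):
    """Third standardized moment (population form)."""
    return central_moment(x, 3) / central_moment(x, 2) ** 1.5


def kurtosis(x):
    """Fourth standardized moment (population form, not excess kurtosis)."""
    return central_moment(x, 4) / central_moment(x, 2) ** 2


def dominant_frequency(x, sample_rate):
    """Frequency in Hz of the largest-magnitude DFT bin, ignoring the DC bin."""
    n = len(x)
    best_k, best_mag = 1, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k * sample_rate / n


# Synthetic stand-in for an audio signal: a 25 Hz sine sampled at 200 Hz (assumed values).
sample_rate = 200
signal = [math.sin(2 * math.pi * 25 * t / sample_rate) for t in range(200)]

features = {
    "skewness": skewness(signal),
    "kurtosis": kurtosis(signal),
    "dominant_frequency_hz": dominant_frequency(signal, sample_rate),
}
print(features)
```

Each recording is reduced to a handful of numbers like these, and the model is trained on those numbers instead of the raw samples.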

So what other features do people use?  Many use traditional statistical techniques like mean, median, and mode, as well as basic things like counting the number of times something happens.

Lots of data has a timestamp associated with it. There are a number of features you can extract from a timestamp that might improve model performance. What was the month, or day of week, or hour of the day? Was it a weekend or a holiday? Such features matter whenever the outcome depends on human behavior, for example, when you are trying to predict how much electricity people use.
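A minimal sketch of timestamp features using Python's standard `datetime` module follows. The `HOLIDAYS` set and the `timestamp_features` helper are hypothetical; a real project would take its holiday calendar from a source appropriate to the region being modeled.

```python
from datetime import date, datetime

# Hypothetical holiday calendar, used only for this illustration.
HOLIDAYS = {date(2022, 1, 1), date(2022, 12, 25)}


def timestamp_features(ts):
    """Derive calendar features from a single datetime."""
    return {
        "month": ts.month,
        "day_of_week": ts.weekday(),  # Monday = 0 ... Sunday = 6
        "hour": ts.hour,
        "is_weekend": ts.weekday() >= 5,
        "is_holiday": ts.date() in HOLIDAYS,
    }


example = timestamp_features(datetime(2022, 12, 25, 18, 30))
print(example)
```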

Another class of feature engineering has to do with text data. Counting the number of times certain words occur in a text is one technique, which is often combined with normalization techniques like term frequency-inverse document frequency (TF-IDF). Word2vec, in which words are converted to a high-dimensional vector representation, is another popular feature engineering technique for text.
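Word counting with TF-IDF weighting can be sketched as follows. Several TF-IDF variants exist; this sketch assumes the simple unsmoothed form, where term frequency is the count divided by document length and inverse document frequency is ln(N / document frequency). The documents and the `tf_idf` helper are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document (unsmoothed TF-IDF)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


docs = [
    "the flight was delayed",
    "the flight was on time",
    "the heart sound was abnormal",
]
weights = tf_idf(docs)
print(weights[0])
```

Words that appear in every document, such as "the", receive a weight of zero, while words that are specific to one document are weighted most heavily.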

The last class of techniques I’ll talk about has to do with images.  Images contain lots of information, so you often need to extract the important parts. Traditional techniques calculate the histogram of colors or apply transforms such as the Haar wavelet.  More recently, researchers have started using convolutional neural networks to extract features from images.
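As one example of a traditional image feature, a color histogram can be sketched like this. The tiny four-pixel "image" and the `color_histogram` helper are hypothetical; the sketch assumes 8-bit RGB values and a bin count that divides 256 evenly.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Concatenated, normalized R, G, and B histograms for a list of RGB pixels.

    Assumes 8-bit channel values (0-255) and that bins_per_channel divides 256.
    """
    bin_width = 256 // bins_per_channel
    hist = [0] * (3 * bins_per_channel)
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            hist[channel * bins_per_channel + value // bin_width] += 1
    return [count / len(pixels) for count in hist]


# Hypothetical image flattened to a list of (R, G, B) pixels.
image = [(255, 0, 0), (250, 10, 5), (0, 0, 255), (0, 128, 255)]
histogram = color_histogram(image)
print(histogram)
```

The result is a fixed-length vector regardless of image size, which makes it straightforward to use as model input.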

Depending on the type of data you’re working with, it may make sense to use a variety of the techniques we’ve discussed. Feature engineering is a trial-and-error process. The only way to know if a feature is any good is to add it to a model and check if it improves the results.
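The add-a-feature-and-check loop can be sketched as follows. Everything here is an assumption made for illustration: the toy data is invented, and a 1-nearest-neighbor classifier scored with leave-one-out accuracy stands in for the decision tree used earlier, simply because it fits in a few lines.

```python
def loo_accuracy(rows, labels, feature_idx):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier
    that uses only the feature columns listed in feature_idx."""
    correct = 0
    for i, row in enumerate(rows):
        best_j, best_dist = None, float("inf")
        for j, other in enumerate(rows):
            if j == i:
                continue
            dist = sum((row[k] - other[k]) ** 2 for k in feature_idx)
            if dist < best_dist:
                best_j, best_dist = j, dist
        correct += labels[best_j] == labels[i]
    return correct / len(rows)


# Invented toy data: column 0 is an existing feature, column 1 is the candidate.
rows = [
    (1.0, 0.0), (2.0, 0.1), (3.0, 0.0),
    (1.1, 5.0), (2.1, 5.1), (3.1, 5.0),
]
labels = [0, 0, 0, 1, 1, 1]

baseline = loo_accuracy(rows, labels, feature_idx=[0])
with_candidate = loo_accuracy(rows, labels, feature_idx=[0, 1])
print(f"baseline: {baseline:.2f}, with candidate feature: {with_candidate:.2f}")
```

If the score with the candidate feature is not better than the baseline, the feature is dropped and the next idea is tried.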

To wrap up, that was a brief explanation of feature engineering. We have many more examples on our site, so check them out.

 
