Signal Processing and Machine Learning Techniques for Sensor Data Analytics
An increasing number of applications require the joint use of signal processing and machine learning techniques on time series and sensor data. MATLAB® can accelerate the development of data analytics and sensor processing systems by providing a full range of modeling and design capabilities within a single environment.
In this webinar, we present an example of a classification system able to identify the physical activity that a human subject is engaged in, solely based on the accelerometer signals generated by his or her smartphone.
We introduce common signal processing methods in MATLAB (including digital filtering and frequency-domain analysis) that help extract descriptive features from raw waveforms, and we show how parallel computing can accelerate the processing of large datasets. We then discuss how to explore and test different classification algorithms (such as decision trees, support vector machines, or neural networks) both programmatically and interactively.
Finally, we demonstrate the use of automatic C/C++ code generation from MATLAB to deploy a streaming classification algorithm for embedded sensor analytics.
Recorded: 21 Mar 2018
Welcome to this webinar on Signal Processing and Machine Learning Techniques for Sensor Data Analytics Using MATLAB. My name is Gabriele Bunkheila, and I'm part of the Product Management team here at MathWorks. My background is in signal processing, which is also the area where I've spent most of my career here, helping engineers and scientists apply MATLAB to their own challenges.
In this webinar, I'm going to discuss the application of some standard MATLAB techniques for research problems and product design workflows that require the joint use of machine learning—for example, clustering or classification methods—on signals or time series, meaning sampled values varying over time.
As we'll see, much of the complexity of these problems stems from the need for a fair bit of domain knowledge in both machine learning and signal processing. And that often poses a challenge. Whether you're familiar with signal processing, machine learning, both, or neither, I hope the session will help you understand how MATLAB can greatly accelerate the algorithm design workflow for these problems.
We are also seeing a growing industry trend of pushing machine learning algorithms, along with signal processing, onto embedded devices and closer to the actual signal sensors. Here, I'll refer to that class of applications as sensor data analytics or embedded analytics. Because the development of such products poses additional engineering challenges, I thought it'd be useful to allocate some time at the end of the session to review the additional support that MATLAB provides for those types of workflows.
So here's a coarse list of what we'll look at. We'll spend quite some time revisiting how common signal processing methods can be applied both to preparing or preprocessing signals and to extracting descriptive features. I'll show how to select, train, and test classification or pattern recognition algorithms in MATLAB, including some simple approaches to scale up performance for computationally intensive problems. Doing this will give us an opportunity to explore a lot of basic MATLAB features, from interactive apps to core language constructs, that enable and accelerate algorithm development for these workflows.
And finally, we’ll discuss some MATLAB capabilities for the design of online predictive systems, including the design and simulation of DSP functionality and the generation of source C code from predictive algorithms to have them running on embedded architectures.
Now, I'll come back to the slides later. But I'm going to spend most of this hour discussing a practical example running in MATLAB. What you're looking at here are the three components of the accelerometer signal captured using a smartphone. The signals are generated by a subject wearing a smartphone in a fixed position on their body and engaging in different physical activities. We are running an algorithm as if it operated on live signals. But in this case, we're using labeled recorded data for validation.
So we get to know the ground truth. But we are also trying to automatically understand what activity they are doing purely based on signal processing and machine learning methods. And as you can see, most of the time we’re successfully guessing that activity.
Now, I'd like to make a simple point: this is just a simple example working with accelerometer data. But the techniques that I'll discuss are relevant to a wide spectrum of applications and to most types of sampled signals or time series.
To make my point, I've collected a short list of examples that I came across personally while working with MATLAB users, which, broadly speaking, use the same types of techniques. These already capture use cases in a number of different industries, like electronics, aerospace, defense, finance, and automotive. And again, this is just a random list of examples. The relevance of the techniques that we're discussing here is much wider.
I'll take advantage of one last slide before going back to MATLAB to review the different pieces of the example that we've just seen. We take the three components of the sampled acceleration coming from a smartphone. And we predict the physical activity of the subject as a choice between six different options or classes: walking, walking up stairs, walking down stairs, sitting, standing, and laying.
The prediction is done through a classification algorithm. Classification describes a popular class of machine learning algorithms. The key idea is guessing or predicting the class of a new sample, in this case, a signal buffer, based on previous knowledge of similar data. The way it works is that first, the algorithm is trained with a large set of known or labeled cases optimizing its free parameters to identify those known cases as accurately as possible.
Once trained, it can be run on unknown new data to formulate a guess or prediction on what’s the most likely class for that new data. In general, the training phase is a lot more data and computationally intensive than the test or runtime phase. So for embedded applications, it is not uncommon to run the training phase on a host system or computer cluster, but only to deploy a fixed, pretrained algorithm onto the embedded product.
Regardless of training or runtime use, it is very uncommon for classifiers to be able to work on raw waveforms like these. In practice, for each signal segment or buffer, one needs a way to extract a finite set of measurements, often called features, from the raw waveforms. The bottom line in choosing what features to use is that they should capture the similarities between signals in the same class and the differences between those in different ones. We'll spend most of the time left showing how MATLAB can be used to design these two big algorithmic steps, starting right from the initial exploratory phase.
Now, as a quick note, a key part of working through a similar task or project is the availability of a reference data set. That would be a collection of signal recordings acquired in a controlled experiment and carefully labeled, so that signals are known and associated with the right activity. For this example, I'm borrowing a nice data set made available by two research groups, respectively from Spain and Italy, and available at the address in this slide.
I hope that the general problem is clear enough by now. So let me go back to MATLAB. In the following, I’m going to assume that you’re familiar with the basics of MATLAB, including things like scripts and functions and basic plotting and visualization.
To walk you through this example, I'm going to use a script formed of a number of code cells that can be executed independently. I'll skip the very first cell, which I used to launch my completed application at the very beginning. Executing this next cell loads a portion of my data set and plots it.
Let's not care too much right now about how I loaded the data or produced this plot. And let's take a look at the data that we have available. We have a vector, acc, containing the samples of the vertical acceleration for a subject over a period of time. The data set itself has 3D acceleration recordings for 30 different subjects. The acceleration was sampled at 50 samples per second. That's the meaning of the sampling frequency variable, fs. We also have the time vector, t, with the same length as acc, which is good.
t and acc can be plotted one against the other. And the plot shows us that for this subject, we have around 8 minutes' worth of samples. Note that time here is regularly spaced. In some applications, that may not be the case, or samples may be missing. But keep in mind that MATLAB has plenty of techniques to regularize and preprocess those types of signals.
The other obvious thing to notice is the other long vector, actid, shorthand for activity ID, which is telling us what activity each data sample corresponds to as an integer between 1 and 6. We can interpret those integers by looking at the remaining variable holding the activity labels: 1 is walking, 2 is walking up stairs, 3 is walking down stairs, and so on.
So the plot here looks very similar to our final objective, which is identifying the activity given a portion of signal. But remember, this is known, labeled data. We're only visualizing available information. What we want to do is design a method that can learn from this information to guess the activity on new data, without previous knowledge.
So how could we do that? As a first attempt, we'll try to use intuitive approaches. For example, in this plot you can already see that the acceleration waveform does indeed look different depending on the activity. You can see that almost all activities have an average of around 10—that's g, or 9.8 meters per second squared—while one is more around 0. And guess what? That's laying, because the body has a different orientation within the gravitational field.
There are then three of these that look fairly static—no surprise—that's sitting, standing, and laying, while the other three appear to oscillate up and down a lot more. So one could start by doing something very simple, just use some statistical measurements on a portion of consecutive samples regardless of how they're distributed in time.
For example, looking at the distributions of the walking and laying samples respectively, one can see that just computing the mean value and comparing it to a threshold—say, 5, here—would give us a pretty good chance of telling the difference between the two. Similar considerations apply between, say, walking and standing, but in this case we'd probably want to measure the standard deviation and compare it to something like 1 or 2 meters per second squared.
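To make that concrete, here's a minimal MATLAB sketch of the kind of thresholding I'm describing. The buffer variable and the exact thresholds are my assumptions for illustration, not the code from the demo:

% Simple statistics on a buffer of vertical acceleration samples
m = mean(accBuffer);    % near 0 when laying, near 9.8 m/s^2 otherwise
s = std(accBuffer);     % small when static, large when moving
isLaying = m < 5;       % threshold suggested by the histograms
isMoving = s > 1;       % rough static-versus-moving split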
But what about if we had to work out the difference between plain walking and walking up stairs? In this case, the mean and standard deviation, or the width of the distribution, look very similar. Here, what we should really consider is some more advanced analysis of how values vary over time, looking at things like, say, the rate or the shape of the oscillations.
An intuitive reason for that may be that people move faster when they walk down stairs, say, compared to when they walk up stairs, or that the types of movements differ for different activities. This is precisely where signal processing methods start to be part of the picture.
Before going there, let me make a point in passing on what I've just done. I've just casually drawn three histogram plots and quickly discussed their meaning. Even this task alone could be a fairly hard one if you had to do it by hand from scratch. But these types of things are available in MATLAB as single functions. So despite using a pre-edited function of mine to put those plots in a figure and give them the right colors, inside it a single call to the histogram function is doing all the hard work for me.
histogram, as opposed to the old hist, was introduced in release R2014b of MATLAB. And it provides a new, more efficient way of plotting histograms.
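For reference, a sketch of the call doing the hard work inside my helper function, assuming the variables we've already seen in the workspace:

% Overlay the sample distributions for two activities
histogram(acc(actid == 1))    % walking
hold on
histogram(acc(actid == 6))    % laying
legend('Walking', 'Laying')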
Now to analyze variations over time, we want to focus on the acceleration caused by the body movements. It's reasonable to assume that the body movements produce faster variations while the gravity contribution is often almost constant. If I had two contributions that are blended together into a single signal and I want to separate them out, then a technique that often applies is digital filtering.
In this case, for example, we want to keep only the oscillations quicker than about one cycle per second, say, which is a rough figure for the average number of steps per second, and discard contributions with slower oscillations. Using the right jargon, that requires designing and applying to our data an appropriate high-pass filter. I'll repeat these ideas as we go through the process.
Now designing and applying a digital filter is hard unless you have the right tools. The design phase in particular requires quite a bit of math and a lot of domain-specific knowledge. In MATLAB, there are many different ways in which one could design a digital filter. For example, one may choose to do it entirely programmatically, which means only using MATLAB commands, or through built-in apps.
Let's first take a look at what the latter would look like. Using an app is generally a great idea when you approach a problem for the first time. To do that, I go to the Apps tab of the MATLAB Toolstrip, and I scroll down to the Signal Processing and Communications app group. In here, I'll pick the Filter Design and Analysis tool. For more advanced filter design, you may also want to try the Filter Builder app.
The Filter Design and Analysis tool comes with several sections. For example, this Filter Specifications pane will help us specify the right requirements for our filter. Down here to the left is where we start to define what we're looking to achieve. In this case, I'll select a high-pass filter. But you can see that a lot of other choices are also possible.
Down here, there's a more technical choice. If you know about digital filters, you'll probably know what FIR and IIR mean. And the design methods listed here will probably resonate quite a bit. Here, I'll skip the details, and I'll just choose one of these IIR options. Moving to the right here, all I need to do is capture my requirements using the specification pane above.
The things I have to say include: we're using a sampling frequency of 50 hertz. We want to keep unaltered—that is, attenuate by a factor of 1, or 0 dB—all signal components oscillating more quickly than once per second, or 1 hertz. Let's be generous and say 0.8 hertz. Then everything to the left of this other value, Fstop, is attenuated by at least a given number of dB. I'll set this to, say, 0.4 hertz and, correspondingly, this Astop to 60 dB. This means that all oscillations slower than 0.4 times per second will be made 1,000 times smaller by the filter.
Finally, by pressing Design, the tool does all the work for us. And we end up with a filter that satisfies our requirements. We have a set of analysis tools available right within this app to verify that the filter is behaving as expected.
For example, right now we're looking at what's called the magnitude response as a function of frequency. If I need to confirm this is honoring the specifications, I can overlay a specification mask. Or if I wanted to understand the transient behavior, at the press of a button I can quickly visualize things like the step or the impulse response.
Once my filter is designed, what I really want to do is to use it in MATLAB to apply it to my signal. For that, I can choose between two types of approaches. I can export the filter into my MATLAB workspace as one or more variables. Or I can generate some MATLAB code that realizes all that I've just done interactively through a programmatic script.
The code that you see here has just been generated automatically, as the header of this function is telling us. However, I could have just as well decided to use similar commands independently. Having this generated automatically for me can also help me gain some insight, so that the next time around I could more quickly design my filter programmatically. But more importantly, this now gives me a quick way of realizing the filter from my own code just by using this function call.
I'm now going to discard this new function, because I have a previously saved version already available in my working folder, called hpfilter. And going back to my script, you can see that I'm creating a filter via my preset function using one line of code. And in the next line, I'm applying the filter to my vertical acceleration. That creates a new signal where we hope to find only the contributions due to the body movements. If I execute this section, I'm also plotting the new filtered signal against the original one.
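If you prefer a fully programmatic route, a sketch of an equivalent design-and-apply pattern uses designfilt. The specifications mirror the ones we entered in the app (with the passband ripple assumed at 1 dB); my hpfilter helper may differ in its details:

% High-pass IIR filter matching the app specifications
d = designfilt('highpassiir', ...
    'StopbandFrequency', 0.4, 'PassbandFrequency', 0.8, ...
    'StopbandAttenuation', 60, 'PassbandRipple', 1, ...
    'SampleRate', fs);
ab = filter(d, acc);    % keep only the body-movement contributions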
In the plot, we can see that the new signal is now all centered on 0, as desired. Some transient behavior due to the filter is present at the start, which is perfectly normal.
Now let me focus on one activity at a time. How can I do that? There's a very effective MATLAB feature called logical indexing that I can use to that end. Look at this. Say I want to isolate this walking portion of my signal. The activity type is stored in the vector actid. So I check here for when actid is equal to 1. And because I have two walking portions here, I only look at times smaller than 250 seconds.
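In code, the logical-indexing pattern looks roughly like this, with the variable names we already have in the workspace:

sel = (actid == 1) & (t < 250);    % walking samples in the first segment
plot(t(sel), ab(sel))              % plot only the selected samples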
The result here is a vector of the same length as my signal that I can use to select only the samples that honor those criteria. And here is our walking segment. We can zoom in and confirm that the signal oscillates fairly regularly, almost periodically. Now the question is, how can I measure how quickly this is oscillating, or some parameters that quantify the shape or fingerprint of these oscillations?
A good answer would be: by looking at a spectral representation of my signal—or, some would say, by computing its FFT. Much better than FFT, which is a pretty low-level operation, the right phrase here is power spectral density, which may use the FFT along with a few other bits and pieces. Again, my best bet is to focus on my objective and see if MATLAB can simply do that for me, as is the case.
Out of the many functions available to estimate the power spectral density of our signal, here I'm using Welch's method, which is pretty popular. In one line of code here, I have my spectrum. On the x-axis, I have the frequency, from 0 to half of my sampling frequency, which was 50 hertz. And on the y-axis, I have dB per hertz, or power density. The regions where the values in this plot are higher are likely to carry the information that I'm after.
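A minimal sketch of that one-line estimate, assuming the filtered walking segment is in a vector called abWalk:

[pxx, f] = pwelch(abWalk, [], [], [], fs);    % Welch PSD estimate
plot(f, 10*log10(pxx))                        % power density in dB/Hz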
For our signals, this pattern of peaks between 0 and 10 hertz with higher energy holds a lot of measurable information. If you ever sat in a signal theory class, you'll remember the spectra of signals that are periodic or almost periodic. We can see a fundamental frequency, roughly around 1 hertz, and a number of harmonics at positions that are multiples of that frequency.
As for extracting information from this, the distance in frequency between these peaks is the rate of the time-domain oscillations. And the relative amplitudes of the peaks describe the shape of the oscillations, a bit like the timbre in musical signals. To validate this, I'll also show you the spectrum for walking on top of the one for walking up stairs, in a range between 0 and 10 hertz.
Here, walking up stairs produces slower and smoother movements, so these peaks are all pushed to the left. And the smoothness in the time domain causes the peaks to the right of the fundamental to decrease very quickly, indicating softer time-domain transitions. If this sounds unfamiliar, think about the spectrum of a pure sine wave, which has a single peak, compared to that of a square wave, which is full of high-frequency harmonics.
Having established that spectral peaks carry information, we'd like a programmatic way of measuring their height and position. So here's the next question: how do you identify peaks in a curve? Contrary to what some may think, that's not trivial. The Signal Processing Toolbox comes to the rescue with a function called findpeaks that is built to do just that.
If we use it while providing no other information but our raw spectral density, then it will return the complete set of local peaks found in my plot. But if we put some more effort into defining what we want—for example, how many peaks it should return, what peak prominence we require, or what minimum distance we expect between nearby peaks—then the results are much more encouraging. And in just a few lines of code, we now have a programmatic measurement approach that can be automated. And it's highly descriptive of our signal characteristics.
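As a hedged sketch, the constrained call could look like this; the exact constraint values are mine for illustration:

% Up to 8 prominent, well-separated peaks in the PSD
[pks, locs] = findpeaks(10*log10(pxx), f, 'NPeaks', 8, ...
    'MinPeakProminence', 5, 'MinPeakDistance', 0.25);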
In the example that I showed you at the beginning, I was using a couple more signal processing measurements to extract other features. But I think by now you get the general spirit of an exploratory approach for extracting features from signals. What I did at the end of this phase was to collect all the useful measurements identified into a function, so that for each new signal segment available, I'm able to automatically produce the collection of all the measurements, or features, that describe it.
Let me show it to you quickly. For every new buffer of acceleration samples in the three directions here, I am computing the mean, filtering out the gravity contribution, computing the RMS, measuring the spectral peaks, as well as a couple of other things. If I look at the spectral features subfunction, you can recognize the functions pwelch and findpeaks from a few minutes ago.
In total, this function returns 66 highly descriptive features for every new signal buffer it is passed as an input. What I really like about it is that if I measure the net number of code lines, excluding comments and empty lines, it sums up to only 54. That's 54 lines of code for 66 features, or less than a single line per feature, which I find indicative of how concise the MATLAB language is, to the advantage of both understanding and productivity.
With that, I think we can now say that we're halfway through our exploratory workflow. We put in place a method to extract a finite set of features for every given segment of signal. We now need to design a classifier able to learn how to associate measurements, or sets of 66 features in this case, to a class or a choice of activity among six available options.
To work with a classifier, we first need to map all our data into the new feature-based representation. Let me open this other script to show you quickly what I mean. Imagine we had first reorganized our data—say, eight minutes of samples times 30 subjects—into a large number of small buffers of equal length, say, 128 samples. What we do now is, for every one of those buffers, call our feature extraction function to compute our 66 features. And we end up with a new feature data set with as many columns as the number of features and as many rows as there are available buffers.
Because classification algorithms often need a lot of data to learn, extracting features from an entire data set can take a very long time. And if along this exploratory phase one decides to use different features, then the whole operation needs starting over. Let me show what I mean here on a small scale.
Let's reduce the number of data buffers here to a mere 600 and run this. I start a timer before the loop and stop it right afterwards. We can monitor the progress as my 600 data buffers are converted to features one after the other. And the process terminates roughly 17 seconds later. Let the number of buffers grow, and this time will grow linearly with it.
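The timed loop follows a pattern like this; the buffer layout and the extractor's exact signature are assumptions based on what we've seen so far:

N = 600;
feat = zeros(N, 66);    % preallocate the feature matrix
tic
for k = 1:N
    feat(k, :) = extractSignalFeatures(buffers(:, :, k), fs);
end
toc                     % roughly 17 seconds for 600 buffers here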
Now think about this. The computations in each cycle of the for loop are all independent of each other. So if we had more computational resources available, we could start to think about distributing the burden across the computing nodes available. I suspect most of you would think that that would be a hard task. So let me challenge that perception.
What I'll do here is change my for keyword to parfor, make sure I have a parallel pool enabled, and then run my loop again. The buffers are now processed asynchronously by a pool of four MATLAB worker sessions running in the background. And I'm finished in a fraction of the original time.
The actual performance gain will change depending on the particular problem. The bottom line is that because I have the Parallel Computing Toolbox installed on my machine, I was able to open locally a number of MATLAB workers equal to the number of available cores on my machine. I have four cores here. But with external resources like a cluster, the number can be driven up at will. And then I was able to distribute independent iterations of a long for loop simply by changing for into parfor—as in parallel for.
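The parallel version of the same loop, for comparison—the body is untouched:

parfor k = 1:N          % iterations distributed across the worker pool
    feat(k, :) = extractSignalFeatures(buffers(:, :, k), fs);
end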
Once we're done, we can save our feature data set and go back to where we left our problem. We left it at the stage where we needed to select a classification algorithm. And now we have the data ready to go ahead. When you need a classifier, you have a choice among a large number of different types of algorithms. The MATLAB documentation provides some guidance on which types are best suited to which problems, but the whole process of trial and error can be intimidating, especially if you're not familiar with machine learning in general or, in particular, with a reasonable number of classification algorithms.
To address that issue, from release R2015a MATLAB has a new app called Classification Learner. You can pick it from the Apps tab. But let me just load my preset feature data and open the app by running the command classificationLearner right from my script.
To start, I load my data and pick this option on the right-hand side here to leave out a fraction of my data set for validation. Note that before loading the data in my script, I had also arranged it as a MATLAB table. That will allow the tool to associate names with my features and display some simple statistics for each of them. My data also included an actid, or activity label, for each available feature vector.
As I click Import Data on the right, I have a simple visual of my data points in a 2D-feature space. I can choose which of my 66 features to use for x and which for y and get a feel for how well my data samples are separable. At this point, we simply start selecting different classifier algorithms from this catalog and train them one by one on our data set using this button.
You don't really need to know what these algorithms are, how they work, or what parameters they need, because the tool sets them all smartly for you. And if you want, you can change them by using this Advanced button. As I hope you can see, when the training completes, the tool displays an accuracy summary beside each of the selected options and highlights in green the choice with the best accuracy.
At this stage, you may also need to understand a bit more closely the performance of the classifier. And this app has a few diagnostic options available right from within it—like, for example, the confusion matrix, which shows how well our predictions map to the actual known values in the data set. For example, a full green diagonal here, with no instances outside it, would indicate 100% prediction accuracy.
As in many other cases with MATLAB apps, you can then turn your interactive exploratory work into a bit of code to automate the same steps programmatically. In our script, we're using a preset version that comes from exactly the same workflow. What's interesting here is a three-line code pattern consisting of choosing the settings for the classifier, training it—note the fit keyword in here—and running it on new data to return the predicted class.
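That three-line pattern looks roughly like this; I'm showing a multiclass SVM as one possible choice, though the generated function may use a different classifier and settings:

t = templateSVM('KernelFunction', 'gaussian');    % choose the settings
mdl = fitcecoc(Xtrain, ytrain, 'Learners', t);    % train (note the fit keyword)
ypred = predict(mdl, Xnew);                       % predict on new data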
We can use this generated function right from within our script to return a trained classifier and use it on new unknown feature vectors. I wouldn't do that just as yet because there's something more that I need to mention. The Classification Learner provides an intuitive way to access a good number of conventional classifiers that ship with the Statistics and Machine Learning Toolbox.
An example of an alternative way to address this problem is neural networks. Again, in this case, designing and training a network from scratch would be very complicated. But the Neural Network Toolbox provides apps and functions to get started quickly and design a functional network with only a few lines of code.
In the interest of time, and to take a different perspective, let me just share how one could use a programmatic approach in this case to do the same job. Here I initialize a pattern recognition network with 80 neurons in a single hidden layer with a single line of code. Then I train it and return the predicted classes on the test set in just a couple more lines.
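As a sketch, and assuming features stored one observation per row with one-hot target rows, those few lines look like this:

net = patternnet(80);                  % 80 neurons in one hidden layer
net = train(net, Xtrain', Ttrain');    % train expects samples as columns
scores = net(Xtest');                  % class scores for the test set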
If you ever came across the theory of neural networks, then you'll know that the complexity of the math underneath these operations is considerable. Just think about using backpropagation for a [INAUDIBLE] network architecture and all the optimization options that you may need to consider for your cost function. In this case, most of the well-established algorithms are just available to use, so you can focus on solving your specific problem. When I execute this code, I get an interface to monitor the training progress, also confirming the architecture of the network: 80 neurons in the hidden layer, 66 inputs as the number of features, and 6 output classes.
When we're done, the trained neural network is available in my workspace. And again, through a programmatic approach, I can use it to run the prediction on the whole test set portion of my data set. As we did before interactively, I can generate diagnostics programmatically as in the case of this confusion matrix.
This reports around 92% accuracy, which is pretty good, along with a detailed view of how all the predicted classes match the already known values. As an example here, we could notice that a lot of the sitting instances were confused for standing, and vice versa. So that would be an area for improvement in our algorithm.
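For reference, the programmatic confusion matrix is a one-line call, with the same assumed variables as above:

plotconfusion(Ttest', net(Xtest'))    % known targets vs. network outputs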
Now let me take a step back and review what we have achieved. We were able to train and use a classifier operating on high-quality features extracted with signal processing methods. We tackled a problem that required significant domain expertise in the two domains of signal processing and machine learning, had no single way of being addressed, and could have taken a long time to solve. Instead, this only took a few iterations. I used a few different apps and algorithms that were readily available, without needing to open any complicated book or program any math from scratch. A remarkable result was a signal processing function able to extract 66 features in only 54 lines of code.
Now I'd like to spend the last 10 minutes or so of our session to consider a few common engineering challenges slightly beyond the algorithm exploration phase that we discussed until now. Imagine that our final objective was to run our predictions on signals coming from a new acceleration sensor. In this case, we used an existing data set. And we didn't ask ourselves many questions on how that had been collected.
In general, getting hold of relevant data may well be the first problem on our list. I very often talk to engineers who assume that to acquire real-world signals and to explore your algorithms you need two different tools, and who end up spending quite some time transferring data between MATLAB and some external data acquisition software. But it turns out that MATLAB can directly connect to a number of sensors and data acquisition devices. And using that connectivity can further accelerate your discovery cycles.
Because our example uses accelerometer data from a smartphone, I thought I'd also include a reference to two free support packages, downloadable from the MathWorks website, which allow you to stream sensor signals from iOS and Android devices directly into MATLAB.
Now, thinking about the end of a development workflow, imagine that your shiny MATLAB algorithm had to be implemented on a real-time system—for example, on an embedded device close to the accelerometer itself. In this case, not only will the final real-time software probably have to be rewritten in C or C++, but the actual functionality of the algorithm will have to be readapted for the final product.
The machine learning portion will probably need to be simpler. It's common, for example, for embedded classifiers to be pretrained offline and to be implemented in a lightweight version that only does online prediction. The signal processing will also change, probably even more significantly. For example, filters that work on signals streaming from sensors will continuously accept new samples and update their internal states accordingly.
If the original MATLAB model didn't take these effects into account, then it's possible that the final implementation will never match the performance of the original simulation, potentially compromising the success of the actual end product.
The good news is that MATLAB is not only relevant to the initial signal analysis and algorithm exploration phase; it can also be used to simulate real-time systems and generate embeddable source C code. It's beyond the scope of this webinar to cover these aspects in detail. But let me give you an idea of what's possible in this area.
The quickest point to cover is the deployment of the classifier. When we trained and tested our neural network classifier, everything was done through a network object that we'd called net. This has a wealth of functionality attached to it, and the actual code for the math used for prediction may be quite hard to find. But from my object net, I can run this genFunction method and generate a simple prediction function that only models what needs to happen at runtime, using just basic constructs.
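The call itself is a one-liner; the 'MatrixOnly' option restricts the generated function to basic matrix math, and the output name here is my own choice:

genFunction(net, 'predictActivityNN', 'MatrixOnly', 'yes')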
Let's now take a look at the modeling of the digital signal processing. On the left here, in extractSignalFeatures.m, I have the 54-line feature extraction function that we reviewed before. Most signal processing functions used here come from the Signal Processing Toolbox. These were extremely valuable during our exploration phase, and they are the best choice for data analysis tasks. But they are not intended to model the behavior of a real-time system. And that's not what we had in mind when we put this code together in the first place.
Look, for example, at how we filter our signals. My first consideration really is just a side note, but it may help us get into the right mindset. Here we compute the filter coefficients for every new signal portion, even if they are always the same. We take in the full sequence of samples at once, and we assume we're operating on a pretty long one. As we do that, every time we assume we're starting with a filter with a clean history—with zero internal states.
Now, for comparison, let me look at another way to model this process that has a real-time implementation in mind. And that's in this other function, featuresFromBuffer.m. Most of the signal processing in this right-hand side function comes from the DSP System Toolbox. These objects have been developed with system design and simulation in mind. They may be less practical to use for signal analysis, but they can be used to accurately model real-time DSP systems.
If we just look at the filter for comparison, here it's a filter object with a notion of internal structure. And as you could see by simply creating one in MATLAB, one could even get more accurate behavior by capturing the complete data type specifications—for example, for a fixed-point implementation.
The object keeps hold of its internal state and is declared as persistent. So exiting and re-entering the function here will find it again in its previous state. So really, if required, it could even take in one sample at a time. And it would still operate as expected.
As for the coefficients, they're computed only once, the first time this function is called, when this persistent variable is initialized. Then they're just used at runtime by calling the step method on every new buffer of data. As a side effect, this filter runs very efficiently, as it's initialized just once, and from the second time around it only executes the computations strictly necessary to process the input. These attributes make it ideal for use with streamed signals in the context of system design and simulation. A first advantage of this new system model, as we may call it now, is that by simulating it, we can verify early on the design of our algorithms for an embedded system and check that the behavior is as expected.
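Put together, the streaming pattern just described looks roughly like this. It's a sketch under assumed design values; the demo's filter object and coefficients differ:

function y = filterStream(x)
% High-pass an incoming frame while carrying state across calls
persistent hd
if isempty(hd)                               % coefficients computed only once
    [b, a] = butter(4, 0.8/(50/2), 'high');  % stand-in high-pass design
    hd = dsp.IIRFilter('Numerator', b, 'Denominator', a);
end
y = step(hd, x);                             % internal states updated per call
end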
I'm sure this visualization now looks familiar. This is what I showed you right at the beginning of this session to introduce our example. The dynamic simulation here offers a different perspective. For example, I can check the stability of the prediction in the transitions from one activity to the other. Or, in another application, I may need to analyze the signal as I would on an oscilloscope—for example, using triggers and markers.
This here is a time scope. But other types of visualizations are also possible—including, obviously, in the frequency domain with a spectrum analyzer. But we won't look at that right now.
If we take a look at the code that produces the simulation, we'll meet again a lot of the programming practices that we've seen in the new feature extraction function. So, for example, we use a while loop with new data processed at every iteration. This scope object here is what we're using for continuous online visualization. Within the while loop, we just keep pushing in more data using the same simple construct of the step method that we've already seen earlier.
At the beginning of this loop, we use a file reader object in a similar way, to incrementally advance through a data file without needing to load a potentially huge file in memory or to do any complex indexing into the source data. We simply pass the file name at the beginning and get a new frame of samples at every iteration.
Here I also used a buffer to help me operate on a longer data window than the system may be receiving in a single iteration, all wrapped in a separate object to hide away the indexing and used through the same step interface. And right in the middle of the loop, you can see the prediction function that we are simulating, complete with our new DSP models and the lightweight neural network classifier.
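A hedged sketch of that loop structure, with the object choices and file name as assumptions rather than the demo's exact code:

reader = dsp.MatFileReader('accLog.mat', ...
    'VariableName', 'acc', 'SamplesPerFrame', 64);
scope = dsp.TimeScope('SampleRate', fs, 'TimeSpan', 30);
while ~isDone(reader)
    frame = step(reader);       % next frame of samples from disk
    y = filterStream(frame);    % streaming DSP from the earlier sketch
    step(scope, y);             % continuous online visualization
end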
Beyond the ability to simulate our system online, a second substantial advantage of having real-time models of both the DSP processing components and the deployed neural network is that we can now automatically generate source C or C++ code from them. That can be used in an embedded product, an embedded prototype, or simply as reference to share with a downstream software engineering team.
There would be a lot more to say on this, including the ability, say, to directly generate fixed-point or target-optimized C. But I'll just show you the general idea of how that works. Although the first time you could also go through this workflow with a dedicated built-in app, the general idea is that with this simple command, codegen, we can turn our MATLAB function predictActivityFromSignalBuffer into a fully equivalent open C function with no libraries attached. The generated C is fully open. In this case, I put no effort into optimizing the generated code, but a lot of features are there to do that, including the ability to generate all fixed-point code.
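The command form of that step is a single line; the argument type reflects the 128-sample, three-axis buffers we've been using, and the function name is spelled out from the narration:

codegen predictActivityFromSignalBuffer -args {zeros(128, 3)} -report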
Okay. I think we've seen in action all that I was planning to show you. Now let me go back to my slides. I'll take a step back and review what we've done along with the capabilities and tools that I've used in the various parts of this presentation.
Signal analysis was the first area where the use of de facto standard built-in functions saved us a lot of time. And Signal Processing Toolbox is where all these useful functions came from. Just imagine having to implement all those formulae from scratch, let alone looking them up and trying to understand them. Parallel Computing Toolbox let us distribute computationally intensive for loops simply by changing a for loop to a parfor loop. Additional parallel computing options are available to scale up the framework that we used to larger architectures, including computer clusters and the cloud.
The Statistics and Machine Learning Toolbox not only allowed us to test a good number of classifiers but also to quickly explore and compare different options interactively in the Classification Learner app. I feel that sped up our discovery cycle considerably. After exploring a few conventional classifiers, we also used the Neural Network Toolbox to create a common network topology used for pattern recognition, train it, and test it. We also generated a lightweight pretrained version of that network, which captured the runtime computations using basic MATLAB constructs fully supported by the C code generation engine.
With the fundamentals of our signal processing algorithms already consolidated, we used a number of objects from the DSP System Toolbox to model the real-time implementation of our algorithms. We ran an online simulation of our system design using objects that facilitate the streaming of data from long signals stored on disk. And we used scopes that are optimized to handle the continuous visualization of streamed signals, similar to how one would visualize a real-world signal with a benchtop instrument.
As a side effect, our online modeling efforts made our algorithm already a lot more efficient to execute in simulation and also made them ready to generate C or C++ source code that could be directly deployed onto an embedded processor. And that's where we used MATLAB Coder.
MATLAB Coder is a code generation engine that can turn MATLAB algorithms into fully open C or C++ source code. There would be a lot to say about what MATLAB Coder can do, especially for generating embeddable source code. So I thought I'd refer you to a great introductory webinar, available in prerecorded format on our website, called "MATLAB to C Made Easy."
On that, we've come to the end of this webinar on Signal Processing and Machine Learning Techniques for Sensor Data Analytics. I hope it's been useful in highlighting some MATLAB capabilities that you weren't yet familiar with. I will aim to make the code that I used available in the coming few weeks, so you can review the example at your own pace.
Even if you forget everything else that I touched on today, I hope you at least take away the following three key ideas. First of all, our open-ended project was made possible by having available an extensive range of built-in functions, both for signal processing and machine learning. That allowed us to experiment quickly with different options without having to implement any math from scratch.
The complementary part of the picture was the MATLAB environment itself, from the basic visualization capabilities to the built-in apps that generate reusable code, making constant use of a language that makes it easy to let advanced things happen within a few lines of code.
Finally, I took you on a tour through a set of MATLAB capabilities for transitioning abstract ideas into real-time algorithm implementations. We turned signal processing algorithms into detailed DSP system models that could be simulated over time. And from those, we generated source C code that could be compiled on an embedded platform.