Video length is 19:06

What Is Iterative Learning Control?

From the series: Data-Driven Control

Brian Douglas, MathWorks

Iterative learning control (ILC) is a fascinating technique that allows systems to improve performance over repeated tasks. If you’ve ever played a video game and learned, through trial and error, the perfect sequence of moves to beat a level, then you already have an intuitive understanding of how ILC works. Unlike traditional feedback control methods, ILC focuses on refining feedforward commands, making it ideal for systems that perform the same motion repeatedly—like robotic arms, medical devices, and even subway trains.

This tech talk breaks down the fundamentals of ILC, exploring how it learns optimal control inputs over multiple iterations. You’ll see how simple mathematical operations, such as matrix multiplication, allow ILC to adjust commands efficiently, making it viable even for low-power embedded systems. Discover different learning functions—both model-based and model-free—and how to ensure stability in the learning process.

By the end of this tech talk, you’ll have a solid understanding of how ILC works and when to use it.

Published: 4 Apr 2025

Iterative learning control, or ILC as I’ll often call it, is an interesting concept. As its name suggests, it’s a controller that learns the optimal sequence of feedforward commands over the course of many iterations. I think the two things that really set this controller apart from other data-driven methods are that, one, it learns a feedforward solution rather than a feedback solution, which makes ILC best suited for systems that repeat the same motion over and over again. And two, the way this method learns is really lightweight. In a lot of cases it’s just a simple matrix multiplication, which can make ILC well-suited for even low-power embedded applications.

Now, even if you don’t plan on using ILC in the future, I hope you stick around for this because I think it’s a really elegant solution. And there is now a built-in block in Simulink that makes it very easy for you to try out ILC on your project. But before you try implementing it right away, we should first spend some time understanding how ILC works. I’m Brian, and welcome to a MATLAB Tech Talk.

Imagine you’re trying to get a system to repeat the same motion, through the same environment, over and over again. For example, think about a pick-and-place robotic arm. The arm starts at some known point, moves to pick up a piece, places it somewhere else, and then moves back to the starting point and waits until the next iteration. After that, the whole sequence is repeated again.

Since the path and the environment are the same every single iteration, we could learn the sequence of inputs to the actuators that moves this system perfectly along the desired path, and then just execute those commands open loop, as a feedforward.

I think a good analogy for this is playing a video game. You want to get your character to the end goal. And one way to do this is to try a sequence of commands over one iteration and assess afterwards how they did. If you failed to jump at the right time, then adjust your commands accordingly between iterations and try again.

And if the environment is always the same, for example the bad guys always appear in the same place at the same time each time you start over, then with trial and error you could learn the exact sequence of commands needed to beat the game. At this point, you could just program those commands and execute them autonomously anytime you want to beat the game. In this way, you wouldn’t even have to observe your character or measure anything at all to make corrections with feedback. You would just know that the learned sequence of commands successfully gets your character through the environment and to the goal every time.

This is the idea of iterative learning control.

Now, hopefully, it’s obvious that these specific commands would only work if you start in the same place every iteration, and end in the same place, and that the environment and its disturbances are the same every single iteration.

This is true for some video games, but it’s also true for many real-world systems. For example, manufacturing robots often work in environments like this, like the pick-and-place arm, or one that has to weld the same parts in the same place each time. ILC could also be used for pointing an antenna that has to trace the same path through the sky, or for medical robots that have to perform pre-defined surgical procedures, or for accelerating and decelerating subway trains to ensure smooth operation. Pretty much any repetitive task with low variation between iterations is a good candidate for ILC.

Alright, now that we know when to use it, let’s get into how it works.

Let’s say the system that we want to control has the following dynamics in state space form, and the initial state of the system is zero. So, the input u at time t comes into the system, and this generates an output y at time t. Now, we can take the inputs and outputs across the entire iteration and stack them into two vectors, capital Uk and capital Yk. These capture the whole sequence of inputs and outputs for one iteration. From here, we can write the input-output relationship as Yk = some matrix G times Uk. And G is the input/output matrix defined by the system dynamics.

Now it’s worth noting that the system could be single-input, single-output, or multi-input, multi-output and G will be sized accordingly based on the A, B, and C matrices. For example, for a single input single output system with N control actions per iteration, G becomes an N by N matrix.
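
To make that concrete, here is a rough MATLAB sketch of how you could build the lifted matrix G from a discrete-time state space model. The plant values and the horizon N are made-up examples, and the sketch assumes a zero initial state and no direct feedthrough.

    % Build the lifted input/output matrix G for one iteration of N samples,
    % assuming x(t+1) = A*x(t) + B*u(t), y(t) = C*x(t), with x(0) = 0 and D = 0.
    A = 0.9;  B = 0.1;  C = 1;   % example first-order SISO plant (made-up values)
    N = 50;                      % number of control actions per iteration

    G = zeros(N, N);
    for i = 1:N
        for j = 1:i
            G(i, j) = C * A^(i-j) * B;   % Markov parameter mapping u(j-1) to y(i)
        end
    end
    % The whole iteration now obeys Yk = G*Uk, where Uk stacks u(0)...u(N-1)
    % and Yk stacks y(1)...y(N).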

So, in block diagram form, the system looks like this. Alright, from here, let’s keep building out this block diagram for ILC.

Since U changes from one iteration to another, we’ll say that this particular one is the kth iteration of U, and it produces the kth iteration of Y.

Ok, so U is just a sequence of hard-coded, open-loop commands, and we want to know how well this particular set of inputs does at making the system follow some reference. Therefore, we compare the output to this reference signal, and the difference between them is the error across that entire iteration.

Alright, the question we have now is: how do we update U in the next iteration such that the error across the entire run is reduced? This is how ILC does it. We filter the error through some learning function L. I’ll get to what L looks like in a second, but for now, we’ll just keep it as a generic function. This produces a delta-u signal, and we add that to Uk to get the new input U(k+1).

Now, in many practical applications of ILC, a low-pass filter is often added here to remove high-frequency noise and reduce chattering in the controller. However, to keep this as simple as possible so we can really just focus on how ILC learns, I’m going to set this function to 1 in our block diagram to effectively remove it.

Ok, now that we have an updated U, we input that into the system, which produces Y(k+1). We compare that to the same reference as before, and we get the new error for this second iteration. And then the cycle starts over again.
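
Just to show the structure of this loop, here is a minimal MATLAB sketch, continuing from the lifted matrix G and horizon N above. The learning function L here is only a placeholder gain; the model-based and model-free choices for it are what the rest of the video covers.

    R = ones(N, 1);            % example reference trajectory for one iteration
    L = 0.5 * eye(N);          % placeholder learning function (made-up gain)
    U = zeros(N, 1);           % iteration 1: no feedforward command yet

    for k = 1:20
        Y = G * U;             % run the iteration open loop and record the output
        E = R - Y;             % error across the entire iteration
        U = U + L * E;         % update the command sequence for the next iteration
    end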

But here’s the interesting question. Can we guarantee that the error at k+1 is always less than the error at k? If so, then we know that U, is being adjusted in a way that makes the system better at following the reference, since the error is being reduced.

So, to answer that question, we need to find a function that compares E at k+1 to E at k.

E(k+1) is equal to R - Y(k+1). Alright, Y(k+1) is just G times U(k+1). Going back another step, U(k+1) is Uk + L times Ek. And here we can expand this function into these three terms.

Now we can reduce this by realizing that G times Uk is Yk, and then R - Yk is Ek. Then, after we pull Ek out from the right-hand side, what we’re left with is that the error in the next iteration is equal to the identity matrix minus GL, times the current error. This is called the error dynamics equation, which dictates how the error changes each iteration. And the important part of this equation is the I - GL part. Because for the error to be reduced, the magnitude of I - GL must be less than 1. And that makes sense, right? We’ll always be multiplying the error by something with magnitude less than 1, and so the error will shrink over time.
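
Written out as a single chain, that reduction is:

\[ E_{k+1} = R - Y_{k+1} = R - G\,U_{k+1} = R - G\,(U_k + L E_k) = (R - G U_k) - G L E_k = (I - GL)\,E_k \]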

Making sure this is true is where the clever selection of the learning function comes into play.

The first two learning functions I’m going to cover are both model based approaches. This means that we need to know the state space equations that represent our system, and therefore, we know the input-output matrix, G.

In that case, one option for L is to invert the dynamics of the system, so that L = G inverse. With this, the error dynamics equation would look like this, where the error in the next iteration is equal to I - G times G inverse, times the current error. G times G inverse is the identity matrix, and subtracting the two gives zero. So, if we have a perfect model, the error would go to zero after one iteration.

But making the learning function exactly the inverse of G is risky, because we almost never have a perfect model, so inverting it wouldn’t completely cancel the system and it could lead to instability. Therefore, instead of trying to reduce the error in a single iteration, we can add a scalar term, gamma, and use it to remove part of the error each iteration. Now, gamma must be between 0 and 2 to make sure the gain of this equation stays below one, and therefore the learning will converge.
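
As a quick sketch of what that looks like numerically, reusing G and N from earlier and with gamma as a value you would pick:

    gamma = 0.5;                        % must be between 0 and 2 for convergence
    L = gamma * inv(G);                 % L = gamma * G^-1 (model-based)

    % With a perfect model, I - G*L = (1 - gamma)*I, so each iteration the error
    % is scaled by |1 - gamma|, which is less than 1 for any gamma between 0 and 2.
    contraction = norm(eye(N) - G*L);   % evaluates to 0.5 for this gamma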

Alright, so inverting the dynamics is the first model-based method.

There is another popular model-based approach that is based on gradient-descent. Here’s how it works.

We can create a cost function, J, that is a function of the error squared - and error is a function of the input u. So, if we want to lower error, then between iterations we need to change the input U in the direction that lowers J.  The way we do this is by taking the partial derivative of J with respect to U to get the gradient. And then adjusting Uk in the negative direction, or the direction that descends the gradient.

Now, to take this partial derivative there are a few steps, and I’ll quickly go through them, so you may want to pause the video and go line by line yourself if I go too fast. To start, we substitute R - GU for E in the cost function to get the relationship between U and J, and then expand the equation. Now we take the partial with respect to U of each of these terms and sum like terms to get this result. Here, the 1/2 and the 2’s cancel out, and then we can pull G transpose out of both terms. And you’ll notice that we have this R - GU term, which is just the error in the current iteration.

So, finally, we want the negative gradient, which turns out to be G transpose times Ek. Now, if we go back, we can replace the negative gradient that we adjust our input by with G transpose times Ek. And this means that our learning function, L, is just G transpose.
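
For reference, those steps condense to the following, which is just the derivation from the video written out in one place:

\[ J = \tfrac{1}{2} E^{\top} E, \qquad E = R - GU \]
\[ \frac{\partial J}{\partial U} = -G^{\top}(R - GU) = -G^{\top} E \]
\[ U_{k+1} = U_k - \frac{\partial J}{\partial U}\bigg|_{k} = U_k + G^{\top} E_k \quad\Rightarrow\quad L = G^{\top} \]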

And if we plug that into the error dynamics equation, we get that the magnitude of I - G G transpose needs to be less than 1. However, once again, we can adjust the learning rate by adding in a scalar term gamma. And so gamma must be between 0 and 2 over the magnitude of G^2, to once again guarantee that learning will converge.

It’s kind of interesting to me that these two methods look very similar, you know, with G transpose and G inverse, but we arrived at them in two different ways.

A downside of both of these methods is that they require a model of the system - we have to have an understanding of G, even if it’s not a perfect representation. But the upside is that since we’re dealing with matrices, they work for both single-input, single-output systems as well as multi-input, multi-output systems. So, they’re very powerful approaches.

However, if you don’t have a reliable model of the system, then we need to turn to a model-free approach. The downside of model-free, however, is that it is really best suited for just single-input, single-output systems.

I think a good way to understand how model-free, single-input, single-output ILC works is to switch our thinking from matrices to transfer functions.

If we go back to this block diagram, everything here could be represented in the s-domain. So G, L, U, Y, R, and E would all be functions of s. In this case, the block diagram simplification we did earlier would be the same, except that the error dynamics equation would have 1 - GL instead of I - GL, because we’re just dealing with single-input, single-output systems. And now the inequality we need to satisfy is that the magnitude of 1 - GL has to be less than 1. So, in this case, how can we choose L if we don’t have a perfect system model, G?

Well, we can look at the frequency characteristics with a Nyquist plot. The Nyquist curve of 1 - GL must stay within a circle of radius 1 centered at the origin. This way, the gain is never larger than 1. So, 1 - GL has to be contained within this circle, and therefore the Nyquist curve of GL itself must be within the same circle shifted to the right by 1. So, that’s our requirement. If the Nyquist curve of GL falls outside of this circle, even for just a few frequencies, then the error will grow at those frequencies and the learning will not be stable.
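
If you have even an approximate transfer function for the system and a candidate learning function, one rough way to check this requirement is to evaluate the magnitude of 1 - GL over a grid of frequencies. The plant and learning function below are stand-in examples, not the system from the video.

    G = tf(1, [1 1]);                       % stand-in first-order plant
    L = 1;                                  % candidate learning function, L = 1
    w = logspace(-2, 3, 500);               % frequency grid in rad/s

    H = squeeze(freqresp(1 - G*L, w));      % frequency response of 1 - G*L
    learningIsStable = all(abs(H) < 1);     % true if GL stays inside the learning circle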

And unfortunately, this is a very restrictive constraint. For one, the gain of GL can never be larger than 2, and that’s only at a phase of zero degrees; the limit is smaller at other phase shifts. And the phase can never be shifted by more than 90 degrees. Now, many real-world systems break this requirement, since they often behave like higher-order systems that have more than 90 degrees of phase lag.

That’s a problem. But, this is where clever choices for the learning function L come into play.

Even if we don’t know the exact dynamics of G, knowing the approximate dynamics can still help us determine what L should be.

For example, let’s assume that G is a 1st order process that produces this Nyquist curve. In this case, the Nyquist curve of G by itself stays within the learning circle, and so L could just be 1. This is essentially just adding the error directly to Uk to get Uk+1.

However, let’s say that you have some uncertainty in the gain and it might be larger, or that the system itself produces a Nyquist curve that exceeds the circle at some frequencies. In this case, we need L to reduce the gain and bring it back in, so instead of L being 1, we can make it a scalar, gamma p, that is less than 1. This will slow down the learning process, so it will require more iterations to converge, but it will be more robust to gain uncertainty. This is a proportional error-feedback approach.

Now, let’s assume that your system is a second-order process, one whose phase lags by more than 90 degrees. In this case, proportional-only error feedback will not work; we need to add in some phase to rotate the curve back into the circle. And so the next logical step is to add a derivative term to L, to add back in 90 degrees of phase at higher frequencies. We can scale this term by gamma d, and between that and the proportional term, we can find a combination that keeps the Nyquist curve of GL safely within the learning circle.
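
In discrete time, a rough sketch of that proportional-derivative learning update, applied to the recorded error from one iteration, could look like this. The gains, sample time, and error sequence are made-up values you would replace for your own system.

    gamma_p = 0.6;                    % proportional learning gain (example value)
    gamma_d = 0.2;                    % derivative learning gain (example value)
    Ts = 0.01;                        % sample time (example value)
    N = 500;                          % samples per iteration

    E = sin(2*pi*(1:N)'/N);           % stand-in error sequence from the last run
    U = zeros(N, 1);                  % previous feedforward command sequence

    Edot = [diff(E); 0] / Ts;         % rough derivative of the error sequence
    U_next = U + gamma_p*E + gamma_d*Edot;   % L behaves like gamma_p + gamma_d*s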

Now, this learning function is common, but we could make it more complex by adding higher order derivative terms, or anything we need to keep the learning process stable for whichever system we need to control. But the bottom line is that we don’t need an exact model of G to ensure that ILC converges. We just need to know enough about the system, to guarantee that the Nyquist curve of GL will stay within the learning circle.

And you might be thinking, well, that seems hard to guarantee. But there is something we have control over that we haven’t discussed yet. Do you know what that is? That’s right, the system itself. We can change G in a way that helps ILC converge.

Remember, this is our open-loop ILC block diagram. But instead of operating the system completely open loop, we could design a feedback controller in parallel with the ILC controller. Now, instead of the open-loop system G, the error dynamics are a function of the closed-loop system.

Not only will this stabilize the system and get it to do something close to what you want while the ILC controller is in the learning phase, but it also lets us make the closed-loop system behave like a first- or second-order process. If we do that, then we can get away with a model-free, proportional-derivative learning function.
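
As a sketch of that idea, with a stand-in plant and controller rather than the system from the video, and assuming the ILC correction is injected at the plant input:

    G = tf(1, [1 0.5 1]);             % stand-in open-loop plant
    C = pid(2, 1, 0.1);               % stand-in feedback controller (example gains)

    G_ilc = feedback(G, C);           % closed-loop system the ILC sees: G/(1 + G*C)
    % The learning-circle check is then applied to G_ilc*L instead of G*L.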

And this is why you will often implement ILC alongside a feedback controller, so that you have the ultimate flexibility in ensuring the learning process converges.

So, you might be wondering why we would even use ILC if we have to develop a feedback controller anyway, right? Well, the reason is that you can get much better performance from your system if you fine-tune it with ILC. Remember, these are systems that are doing the same motion over and over again in the same environment. A feedback controller can accomplish that, but potentially with increased errors that come from disturbances or from uncertainty in the plant itself. So, why accept those errors when you can remove them using an ILC controller in conjunction with a feedback controller? In this way, you can really get the most out of these systems that do repeated motions.

Ok, this video covered the basics of what ILC is and how it works. If you want to see how you can apply ILC to something practical, like getting a drone to follow a flight path, and you want to see how simple it is to set up an ILC controller in MATLAB and Simulink, then check out the follow-up video where we go through that exact implementation. It’s pretty cool to see it in action. The link is in the description below.

Alright, this is where I’m going to leave this video. Don’t forget to subscribe to this channel so you don’t miss any future videos. Also you can find all of the Tech Talk videos, across many different topics, nicely organized at mathworks.com. Thanks for watching, and I’ll see you next time.