Training a neural network in a real world system (inverted pendulum)

Question

Elvin Toh on 13 Nov 2020

https://in.mathworks.com/matlabcentral/answers/647013-training-a-neural-network-in-a-real-world-system-inverted-pendulum

Commented: Elvin Toh on 16 Nov 2020

Accepted Answer: Emmanouil Tzorakoleftherakis

Hi,

I have physically built an inverted pendulum system driven by a brushed DC motor, controlled by an Arduino, and programmed using Simulink blocks (the Arduino package for Simulink). The encoder functions are implemented with the S-Function Builder and fed to my blocks.

For the past 2 weeks, I have been able to successfully control my inverted pendulum using a simple PID controller, implemented entirely in Simulink and deployed on my Arduino Mega.

But I would like to up the game and use machine learning, rather than PID, to learn to control my inverted pendulum system. I do NOT have a Simscape model or a state-space representation of the system, nor do I intend to model it. I would like to use any of the reinforcement learning methods from MATLAB/Simulink to learn the physical system in real time, as if it were a black box. I don't mind if it takes days to physically perform thousands of actual runs.

I'm currently using the DDPG agent, and validateEnvironment runs without any errors (the RL Agent block gets the 'observations', 'rewards', and 'isdone' signals from my 'environment' and sends an 'action' back to it).

The problem I am facing now is that when I start training, my DC motor does move/fluctuate according to the 'actions' sent to the PWM pin of my Arduino (again, via Simulink blocks), but my S-function encoder block does not return any values at all: its output constantly remains 0. I have used Rate Transition blocks and tried different sample times, but it still outputs 0 whenever I use a Scope/Display to monitor the encoder block. Because the encoder constantly reads 0, my reinforcement learning episodes gain no meaningful training at all.

To prove that the encoder block itself works, I even deleted the RL Agent block, and I could then see proper values from my encoder block.

Can someone tell me whether it is indeed possible to train a neural network on real-time (or near real-time; I don't even mind if it lags) inputs/outputs of an Arduino Mega connected to an inverted pendulum system, please?

Answer 1

Emmanouil Tzorakoleftherakis on 16 Nov 2020

https://in.mathworks.com/matlabcentral/answers/647013-training-a-neural-network-in-a-real-world-system-inverted-pendulum#answer_546093

Hello,

I am not sure why you get zero values consistently, but since you say this works when you don't have the agent block in your model, this looks like an issue with bidirectional communication. Try replacing the RL Agent block with, e.g., a Constant block or a MATLAB Function block that publishes actions at the same rate. If you still see zero values, then check your S-function implementation.

On a different note, it is certainly possible to use RL and train with a physical system, but that requires some extra work and attention. For example, how do you define an episode in a real physical system? Will you reset the system yourself after each episode? Controlling motors with trial-and-error methods such as RL might also create issues, and so on.

Most importantly, it seems like your workflow is as follows: the policy lives in your Simulink model, sends actions to the system, and reads observations from the board. This will most likely not work with a system that requires fast, real-time control like the pendulum. Your policy will always lag, and so will the observations, so you will likely never get close enough to the desired equilibrium to collect the "good rewards" (even more so if you are trying to swing up the pendulum).
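To illustrate the lag argument, here is a minimal Python sketch (not a model of the actual pendulum). It assumes a toy unstable first-order plant, a fixed proportional gain, and a hypothetical round-trip delay of 25 control steps for the host-in-the-loop case. All names and numbers are made up for illustration. It only compares how far the state drifts from equilibrium with and without delayed observations.

```python
from collections import deque

def simulate(delay_steps, gain=5.0, x0=0.05, dt=0.02, steps=300):
    """Return the largest |x| reached by a toy unstable plant under delayed feedback.

    Toy plant (an assumption, not the real pendulum): dx/dt = 2*x + u.
    The controller only sees the state from `delay_steps` steps ago;
    until the first observation arrives it sees 0 and applies no action.
    """
    pending = deque([0.0] * delay_steps)   # observations still "in transit"
    x, peak = x0, 0.0
    for _ in range(steps):
        pending.append(x)                  # the board measures the current state
        observed = pending.popleft()       # the policy receives a stale measurement
        u = -gain * observed               # simple proportional policy
        x = x + dt * (2.0 * x + u)         # forward-Euler step of the toy plant
        peak = max(peak, abs(x))
    return peak

peak_onboard = simulate(delay_steps=0)     # policy runs on the board, no lag
peak_host = simulate(delay_steps=25)       # policy runs on the host, stale data
print(peak_onboard, peak_host)
```

In this toy setting the delayed loop wanders much further from the equilibrium than the undelayed one, which is the effect described above. The real system's tolerance to delay would have to be measured.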

The best approach is to have the policy deployed on your board and have the training algorithm in your Simulink model update the parameters of the deployed policy periodically. Unfortunately, this is not currently supported, mainly because you cannot deploy neural nets created with Deep Learning Toolbox layers on your Arduino (we are working on it). The workaround would be to recreate the neural network with core Simulink blocks (I am assuming your neural net for this problem is not very complex and consists primarily of fully connected layers and activations) and deploy it to your board that way. The deployed policy will then be able to control the pendulum in real time, and you can adjust the FC layer weights periodically.
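As a rough illustration of what "recreating the network with core blocks" amounts to, the forward pass of a small fully connected policy is just multiply-add operations followed by element-wise activations. The Python sketch below assumes a hypothetical actor with 4 observations, one hidden layer of 2 ReLU units, and a single tanh-bounded action. The layer sizes, the weight values, and the `policy` helper are illustrative only and do not come from the thread.

```python
import math

def dense(weights, bias, inputs):
    """Fully connected layer: out[i] = sum_j weights[i][j] * inputs[j] + bias[i]."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]

def relu(values):
    """Element-wise ReLU activation."""
    return [max(0.0, v) for v in values]

def policy(obs, params, action_limit=1.0):
    """Forward pass FC -> ReLU -> FC -> tanh, scaled to the actuator range."""
    hidden = relu(dense(params["W1"], params["b1"], obs))
    out = dense(params["W2"], params["b2"], hidden)
    return [action_limit * math.tanh(v) for v in out]

# Hypothetical parameters; in practice these would be copied from the trained actor.
params = {
    "W1": [[0.5, -0.2, 0.1, 0.0],
           [-0.3, 0.8, 0.0, 0.4]],
    "b1": [0.0, 0.1],
    "W2": [[1.0, -1.0]],
    "b2": [0.0],
}

obs = [0.1, 0.0, -0.05, 0.2]   # made-up observation vector
action = policy(obs, params)
print(action)
```

Each `dense` call corresponds to a matrix gain plus a bias sum, so only the numbers in `params` need to change when the weights are updated; the structure deployed on the board stays the same.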

Hope that helps

1 Comment
Elvin Toh on 16 Nov 2020

Thank you very much Emmanouil!

Indeed, the frustration of the RL Agent block only functioning in "Normal" or "Accelerator" mode seems to be one of the main issues in my attempts to deploy the Simulink model in "External" mode, or even to run the train() function in MATLAB (same issue: Normal or Accelerator mode only).

With the current roadblock, it seems I have no choice but to follow your recommended workaround and manually deploy a simple neural network using traditional blocks.

To answer the first part of your reply: without any RL Agent block, I have been able to confidently and consistently get the Arduino to feed the encoder values back to a Simulink Scope for basic display. This has been working for quite some time already, very consistently and very reliably. The code lives and runs onboard the Arduino, which diligently updates my scope while concurrently maintaining great PID control of my system (I have also had great success in the past with LQR control of a physical double inverted pendulum, also on an Arduino).

It is with much regret that I am unable to do the same using a control block based on RL. I had this idea because I assumed that a NN is nothing more than a set of weight matrices and activations, and that within each episode the weights and biases keep the same values (they need not change/evolve in real time). I would hence be able to run the first episode with a bad set of weights and activate "isdone" if the system runs past my linear belt's limit or when that episode otherwise concludes. Simulink (on my CPU) would then read the observations of the last 30 seconds or so, calculate better NN values, and update those values on my Arduino before starting a second episode (albeit I would have to write a small reset function to place the pendulum at the middle again).
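The episode-wise scheme described above can be sketched in a few lines of Python. This is only a sketch under stated assumptions. `run_episode` is a hypothetical stand-in for one run on the board and simulates a toy unstable plant rather than the real pendulum. The policy is reduced to a single gain. The update rule is plain random-perturbation hill climbing, used here as a placeholder for the real learning algorithm (it is not DDPG).

```python
import random

TRACK_LIMIT = 1.0   # hypothetical travel limit that triggers "isdone"
MAX_STEPS = 200     # hypothetical episode length in control steps
DT = 0.02           # hypothetical control period in seconds

def run_episode(gain, x0=0.05):
    """Stand-in for one run on the board; the gain is frozen for the whole episode.

    Toy plant (an assumption): dx/dt = 2*x + u with u = -gain * x.
    Returns the logged (state, action, reward) tuples and the episode return.
    """
    x, log, total = x0, [], 0.0
    for _ in range(MAX_STEPS):
        u = -gain * x                    # policy evaluated with fixed parameters
        x = x + DT * (2.0 * x + u)       # forward-Euler step of the toy plant
        reward = 1.0 - x * x             # reward for staying near the centre
        log.append((x, u, reward))
        total += reward
        if abs(x) > TRACK_LIMIT:         # "isdone": ran past the travel limit
            break
    return log, total

def train(episodes=30, seed=0):
    """Between episodes, the host proposes new parameters and keeps improvements."""
    rng = random.Random(seed)
    best_gain = 0.0
    _, best_return = run_episode(best_gain)
    history = [best_return]
    for _ in range(episodes):
        candidate = best_gain + rng.gauss(0.0, 1.0)   # host computes new parameters
        _, ret = run_episode(candidate)               # upload, run, read the log back
        if ret > best_return:                         # keep only improvements
            best_gain, best_return = candidate, ret
        history.append(best_return)
        # On hardware, the reset routine (re-centring the pendulum) would run here.
    return best_gain, history

best_gain, history = train()
print(best_gain, history[-1])
```

The point of the structure is that nothing inside `run_episode` depends on the host while the episode is running; the host only touches the parameters between episodes, which is where a real agent's update would replace the hill-climbing step.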

I think we both agree on the larger concept, but it's the actual implementation that I am stuck with for now.

I shall have to work towards your workaround. Much thanks once again!
