Multicore Programming with Simulink
Using the process of partitioning, mapping, and profiling in Simulink®, you can address common challenges of designing systems for concurrent execution.
Partitioning enables you to designate regions of your model as tasks, independent of the details of the embedded multicore processing hardware. This independence enables you to arrange the content and hierarchy of your model to best suit the needs of your application.
In a partitioned system, mapping enables you to assign partitions to processing elements in your embedded processing system. Use the Simulink mapping tool to represent and manage the details of executing threads, HDL code on FPGAs, and the work that these threads or FPGAs perform. While creating your model, you do not need to track the partitions or data transfer between them because the tool does this work. Also, you can reuse your model across multiple architectures.
Profiling simulates deployment of your application under typical computational loads. It enables you to determine the partitioning and mapping for your model that gives the best performance, before you deploy to your hardware.
To deploy your model to the target:
1. Set up your model for concurrent execution.
   For more information about configuring your model for concurrent execution, see Configure Your Model for Concurrent Execution. With these settings, Simulink partitions your model based on the sample time of blocks at the root level. Each sample time in your model corresponds to a partition, and all blocks of a single rate or sample time belong to the same partition.
   If you want to specify how to partition your model, use explicit partitioning. With explicit partitioning, you must specify a target architecture, and then explicitly partition your model. For more information, see Specify a Target Architecture, and Partition Your Model Using Explicit Partitioning.
2. Generate code and deploy it to your target. You can choose to deploy onto multiple targets.
   - To build and deploy on a desktop target, see Build on Desktop.
   - To deploy onto embedded targets using Embedded Coder®, see Deployment (Embedded Coder).
   - To build and deploy on a real-time target using Simulink Real-Time™, see Standalone Operation (Simulink Real-Time).
   - To deploy onto FPGAs using HDL Coder™, see Deployment (HDL Coder).
   Note: Deployment onto FPGAs is supported only for explicitly partitioned models.
3. Optimize your design. This step is optional, and includes iterating over the design of your model and mapping to get the best performance, based on your metrics. One way to evaluate your model is to profile it and get execution times.
Profiling information by product:
- Desktop target: Profile and Evaluate Explicitly Partitioned Models on a Desktop
- Simulink Real-Time
- Embedded Coder: Code Execution Profiling (Embedded Coder)
- HDL Coder: Speed and Area Optimization (HDL Coder)
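The rate-based partitioning described in the setup step (one partition per sample time at the root level) can be illustrated with a minimal Python sketch. This is a conceptual model only, not Simulink code or a Simulink API: the block names, the sample times, and the `partition_by_sample_time` helper are all hypothetical.

```python
from collections import defaultdict


def partition_by_sample_time(blocks):
    """Group blocks into partitions, one partition per distinct sample time.

    `blocks` is an iterable of (block_name, sample_time_in_seconds) pairs.
    Returns a dict mapping each sample time to the list of block names
    that run at that rate, ordered from fastest to slowest rate.
    """
    partitions = defaultdict(list)
    for name, sample_time in blocks:
        partitions[sample_time].append(name)
    return dict(sorted(partitions.items()))


# Hypothetical root-level blocks and their sample times (assumed values).
blocks = [
    ("SensorFilter", 0.001),
    ("Controller", 0.001),
    ("Logger", 0.1),
    ("Supervisor", 0.01),
]

partitions = partition_by_sample_time(blocks)
for sample_time, names in partitions.items():
    print(f"partition at {sample_time} s: {names}")
```

In this sketch the two 1 ms blocks land in the same partition, which mirrors the statement that all blocks of a single rate belong to the same partition.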
How Simulink Helps You to Overcome Challenges in Multicore Programming
Manually programming your application for concurrent execution poses challenges beyond the typical challenges with manual coding. With Simulink, you can overcome the challenges of portability across multiple architectures, efficiency of deployment for an architecture, and cyclic data dependencies between application components. For more information on these challenges, see Challenges in Multicore Programming.
Simulink enables you to determine the content and hierarchical needs of the modeled system without considering the target system. While creating model content, you do not need to keep track of the number of cores in your target system. Instead, you select the partitioning methods that enable you to create model content. Simulink generates code for the architecture you specify.
You can select an architecture from the available supported architectures or add a custom architecture. When you change your architecture, Simulink generates only the code that needs to change for the second architecture. The new architecture reuses blocks and functions. For more information, see Supported Targets For Multicore Programming and Specify a Target Architecture.
To improve the performance of the deployed application, Simulink allows you to simulate it under typical computational loads and try multiple configurations of partitioning and mapping the application. Simulink compares the performance of each of these configurations to provide the optimal configuration for deployment. This is known as profiling. Profiling helps you to determine the optimum partition configuration before you deploy your system to the desired hardware.
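The idea of comparing several partition-to-core mappings can be sketched in Python as follows. This is an illustrative model under stated assumptions, not how Simulink profiles a model: the per-partition execution times are made-up numbers, the cost metric (the load of the busiest core) is an assumption, and `core_loads` and `best_mapping` are hypothetical helpers.

```python
from itertools import product


def core_loads(mapping, exec_time, n_cores):
    """Sum the execution time assigned to each core under a mapping."""
    loads = [0.0] * n_cores
    for partition, core in mapping.items():
        loads[core] += exec_time[partition]
    return loads


def best_mapping(exec_time, n_cores):
    """Try every partition-to-core assignment and keep the one whose
    busiest core has the smallest load (first such mapping wins ties)."""
    names = sorted(exec_time)
    best = None
    for cores in product(range(n_cores), repeat=len(names)):
        mapping = dict(zip(names, cores))
        worst = max(core_loads(mapping, exec_time, n_cores))
        if best is None or worst < best[0]:
            best = (worst, mapping)
    return best


# Assumed execution time per step for each partition, in milliseconds.
exec_time = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0}

worst, mapping = best_mapping(exec_time, n_cores=2)
print(f"busiest core load: {worst} ms, mapping: {mapping}")
```

The exhaustive search is only practical for a handful of partitions; it is used here to keep the comparison explicit rather than to suggest a search strategy.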
You can create a mapping for your application in which Simulink maps the application components across different processing nodes. You can also manually assign components to processing nodes. For any mapping, you can see the data dependencies between components and remap accordingly. You can also introduce and remove data dependencies between different components.
Cyclic Data Dependency
Some tasks of a system depend on the output of other tasks. The data dependency between tasks determines their processing order. Two or more partitions whose data dependencies form a cycle create a data dependency loop, also known as an algebraic loop. Simulink does not allow algebraic loops to occur across potentially parallel partitions because of the high cost of solving the loop using parallel algorithms.
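As a rough illustration of what a data dependency loop between partitions looks like, the following Python sketch searches a dependency graph for a cycle with a depth-first traversal. It is a conceptual model, not Simulink's loop detection: the partition names and the `find_cycle` helper are hypothetical.

```python
def find_cycle(deps):
    """Return a list of partitions that form a dependency cycle, or None.

    `deps` maps each partition to the partitions whose outputs it needs
    within the same time step.
    """
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {node: UNVISITED for node in deps}
    stack = []  # partitions on the current depth-first path

    def visit(node):
        state[node] = IN_PROGRESS
        stack.append(node)
        for upstream in deps.get(node, ()):
            upstream_state = state.get(upstream, UNVISITED)
            if upstream_state == IN_PROGRESS:
                # Reached a partition already on the path: cycle found.
                return stack[stack.index(upstream):] + [upstream]
            if upstream_state == UNVISITED:
                cycle = visit(upstream)
                if cycle:
                    return cycle
        stack.pop()
        state[node] = DONE
        return None

    for node in list(deps):
        if state[node] == UNVISITED:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Hypothetical partitions: P1 needs P3, P3 needs P2, P2 needs P1.
cyclic = {"P1": ["P3"], "P2": ["P1"], "P3": ["P2"]}
acyclic = {"P1": [], "P2": ["P1"], "P3": ["P1", "P2"]}

print(find_cycle(cyclic))
print(find_cycle(acyclic))
```

For the acyclic graph a processing order exists (P1, then P2, then P3); for the cyclic graph no such order exists, which is why the loop has to be resolved before the partitions can run in parallel.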
In some cases, the algebraic loop is artificial. For example, you can have an artificial algebraic loop because of Model-block-based partitioning. An algebraic loop involving Model blocks is artificial if removing the use of Model partitioning eliminates the loop. You can minimize the occurrence of artificial loops. In the Configuration Parameters dialog boxes for the models involved in the algebraic loop, select Model Referencing > Minimize algebraic loop occurrences.
Additionally, if the model is configured for the Generic Real-Time target (grt.tlc) or the Embedded Real-Time target (ert.tlc) in the Configuration Parameters dialog box, clear the Single output/update function check box.
If the algebraic loop is a true algebraic condition, you must either contain all the blocks in the loop in one Model partition, or eliminate the loop by introducing a delay element in the loop.
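The effect of introducing a delay element can be sketched in Python. This is a conceptual model with assumed dynamics, not Simulink code: two hypothetical partitions feed each other, and a unit delay on the feedback path lets each step be computed in a fixed order because the first partition reads the value from the previous step. The gains, the initial delay output, and the `simulate_with_delay` helper are all illustrative assumptions.

```python
def simulate_with_delay(steps, initial_delay_output=0.0):
    """Simulate two mutually dependent partitions with a unit delay
    on the feedback path from partition 2 to partition 1."""
    delayed_y2 = initial_delay_output  # state held by the delay element
    history = []
    for _ in range(steps):
        # Partition 1 reads y2 from the previous step, so it can run first.
        y1 = 0.25 * delayed_y2 + 1.0
        # Partition 2 reads the y1 computed in the current step.
        y2 = 2.0 * y1
        history.append((y1, y2))
        # The delay element stores y2 for use in the next step.
        delayed_y2 = y2
    return history


history = simulate_with_delay(3)
for step, (y1, y2) in enumerate(history):
    print(f"step {step}: y1={y1}, y2={y2}")
```

Note that the delay adds one step of latency on the feedback path, so the delayed system is not numerically identical to the original loop; whether that is acceptable depends on the application.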
The following examples show how to implement different types of parallelism in Simulink. These examples contain models that are partitioned and mapped to a simple architecture with one CPU and one FPGA.
- Implement Data Parallelism in Simulink
- Implement Task Parallelism in Simulink
- Implement Pipelining in Simulink