Multicore Programming with Simulink

Using the process of partitioning, mapping, and profiling in Simulink^®, you can address common challenges of designing systems for concurrent execution.

Partitioning enables you to designate regions of your model as tasks, independent of the details of the embedded multicore processing hardware. This independence enables you to arrange the content and hierarchy of your model to best suit the needs of your application.

In a partitioned system, mapping enables you to assign partitions to processing elements in your embedded processing system. Use the Simulink mapping tool to represent and manage the details of executing threads, HDL code on FPGAs, and the work that these threads or FPGAs perform. While creating your model, you do not need to track the partitions or data transfer between them because the tool does this work. Also, you can reuse your model across multiple architectures.

Profiling simulates deployment of your application under typical computational loads. It enables you to determine the partitioning and mapping for your model that gives the best performance, before you deploy to your hardware.

Simulink tries to optimize the host computer performance regardless of the modeling method you use. For more information on the ways that Simulink helps you to improve performance, see Optimize Performance.

Basic Workflow

To deploy your model to the target.

Set up your model for concurrent execution.
For more information about configuring your model for concurrent execution, see Configure Your Model for Concurrent Execution. With these settings, Simulink partitions your model based on the sample time of blocks at the root level, with each sample time in your model corresponding to a partition, and all blocks of a single rate or sample time belonging to the same partition.
If you want to specify how to partition your model, use explicit partitioning. With explicit partitioning, you must specify a target architecture, and then explicitly partition your model. For more information, see Specify a Target Architecture, and Partition Your Model Using Explicit Partitioning.
Generate code and deploy it to your target. You can choose to deploy onto multiple targets.
- To build and deploy on a desktop target, see Build on Desktop.
- To deploy onto embedded targets using Embedded Coder^®, see Deploy Generated Software (Embedded Coder).
- To build and deploy on a real-time target using Simulink Real-Time™, see Standalone Target Computer Operation (Simulink Real-Time).
- To deploy onto FPGAs using HDL Coder™, see Deployment (HDL Coder).
  Note
  Deployment onto FPGAs is supported only for explicitly partitioned models.

Optimize your design. This step is optional, and includes iterating over the design of your model and mapping to get the best performance, based on your metrics. One way to evaluate your model is to profile it and get execution times.

Product	Information
Desktop target	Profile and Evaluate Explicitly Partitioned Models on a Desktop
Simulink Real-Time	Execution Profiling for Real-Time Applications (Simulink Real-Time)
Embedded Coder	Code Execution-Time Profiling (Embedded Coder)
HDL Coder	Speed and Area Optimization (HDL Coder)

How Simulink Helps You to Overcome Challenges in Multicore Programming

Manually programming your application for concurrent execution poses challenges beyond the typical challenges with manual coding. With Simulink, you can overcome the challenges of portability across multiple architectures, efficiency of deployment for an architecture, and cyclic data dependencies between application components. For more information on these challenges, see Challenges in Multicore Programming.

Portability

Simulink enables you to determine the content and hierarchical needs of the modeled system without considering the target system. While creating model content, you do not need to keep track of the number of cores in your target system. Instead, you select the partitioning methods that enable you to create model content. Simulink generates code for the architecture you specify.

You can select an architecture from the available supported architectures or add a custom architecture. When you change your architecture, Simulink generates only the code that needs to change for the second architecture. The new architecture reuses blocks and functions. For more information, see Supported Targets for Multicore Programming and Specify a Target Architecture.

Deployment Efficiency

To improve the performance of the deployed application, Simulink allows you to simulate it under typical computational loads and try multiple configurations of partitioning and mapping the application. Simulink compares the performance of each of these configurations to provide the optimal configuration for deployment. This is known as profiling. Profiling helps you to determine the optimum partition configuration before you deploy your system to the desired hardware.

You can create a mapping for your application in which Simulink maps the application components across different processing nodes. You can also manually assign components to processing nodes. For any mapping, you can see the data dependencies between components and remap accordingly. You can also introduce and remove data dependencies between different components.

Cyclic Data Dependency

Some tasks of a system depend on the output of other tasks. The data dependency between tasks determines their processing order. Two or more partitions containing data dependencies in a cycle creates a data dependency loop, also known as an algebraic loop. Simulink does not allow algebraic loops to occur across potentially parallel partitions because of the high cost of solving the loop using parallel algorithms.

In some cases, the algebraic loop is artificial. For example, you can have an artificial algebraic loop because of Model-block-based partitioning. An algebraic loop involving Model blocks is artificial if removing the use of Model partitioning eliminates the loop. You can minimize the occurrence of artificial loops. In the Configuration Parameter dialog boxes for the models involved in the algebraic loop, select Model Referencing > Minimize artificial algebraic loop occurrences.

Ensuring deterministic delay for periodic signals can also help break algebraic loops. To configure this setting, open Model Settings> Solver > Additional parameters > Configure Tasks. In the Data Transfer pane, set the default option for Periodic Signals to Ensure deterministic transfer (maximum delay).

Additionally, if the model is configured for the Generic Real-Time target (grt.tlc) or the Embedded Real-Time target (ert.tlc) in the Configuration Parameters dialog box, clear the Single output/update function check box.

If the algebraic loop is a true algebraic condition, you must either contain all the blocks in the loop in one Model partition, or eliminate the loop by introducing a delay element in the loop.

The following examples show how to implement different types of parallelism in Simulink. These examples contain models that are partitioned and mapped to a simple architecture with one CPU and one FPGA.