A Guide to Debug Tools for Multicore Development



by Barry Lock of Lauterbach UK, www.lauterbach.com


Multicore processors have much to offer: considerable functionality and processing power, extensive peripheral support and, often, low power consumption and a small footprint. Many modern devices depend on multicore technology, and in future it may be hard for engineers to avoid it. Devices such as modern mobile phones have always been multicore and now reach four main cores plus 12 DSPs on a single chip, and this type of advancement is expected to continue. As a consequence, a major new challenge for engineers is understanding how the cores interact and what happens as code switches from core to core. This article explores the debug technology available for examining code on multicore systems and for understanding and verifying the system's behaviour.

JTAG, Trace, Software Tools and Hardware-Assisted Tools

An initial challenge for any form of debugging is how to access the data being processed by a core. Most cores have JTAG interfaces, which enable debugging by stopping the core and analysing the data. Some devices also offer a trace port, which provides non-intrusive program flow information, often with timing. This shows exactly where the core has been and how long it took.

As the term "multiprocessing" implies, multiple cores work together in an embedded system. What is crucial for debugging is understanding how the system tasks are distributed to the individual cores. There are two common ways to distribute the code to the cores: asymmetrical multiprocessing (AMP) and symmetrical multiprocessing (SMP).

Debug Concept for AMP Systems

In AMP systems, each core is assigned a specific task. How the tasks are distributed is determined in the design phase of the system. In addition to cores for general tasks, specialist cores are frequently chosen that are optimized for specific functions, such as DSPs.

When debugging AMP systems (see Figure 4), an individual instance of the debugger is started for each core, as an AMP system can contain different core architectures and each core processes a separate part of the application. This means that the majority of the symbol and debug information is assigned exclusively to the corresponding core.

However, because the cores do not work independently, but perform all application tasks together and in parallel, it must be possible to start and stop all cores simultaneously. This is the only way to test the interaction between the cores and to monitor and control the entire application.

There are different ways to start and stop all cores simultaneously. Ideally, the multi-core processor will support this through internal synchronization logic. If this logic is missing, the debugger takes over the synchronization process. A special algorithm calculates JTAG command sequences to control all cores as promptly as possible.

Debug Concept for SMP Systems

In contrast to AMP systems, where the tasks assigned to each core are predefined, the assignment in SMP systems is flexible. In an SMP system, the system designer no longer assigns tasks to cores; an SMP operating system does this instead at runtime. All cores must be the same type to enable tasks to be assigned freely to each core as required.

Task assignment is performed dynamically, meaning that it depends on the current system status. The unit of work that an operating system can assign is called a task or thread. In simple terms, a task that needs to be processed is assigned to a core that is free.
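
The idea of handing waiting tasks to whichever cores are free can be illustrated with a minimal Python sketch. It is not how any particular SMP operating system schedules work; the function name `dispatch` and the task names are hypothetical, chosen only for illustration.

```python
from collections import deque


def dispatch(ready_tasks, core_busy):
    """Assign waiting tasks to idle cores and return {core_index: task}.

    ready_tasks: task names in the order they became ready (assumed FIFO)
    core_busy:   one boolean per core, True if the core is occupied
    """
    queue = deque(ready_tasks)
    assignment = {}
    for core, busy in enumerate(core_busy):
        # Only a free core can take the next waiting task.
        if not busy and queue:
            assignment[core] = queue.popleft()
    return assignment


# Cores 0 and 3 are busy, so the first two ready tasks go to cores 1 and 2.
assignment = dispatch(["ui", "audio", "net"], [True, False, False, True])
```

Because the result depends on which cores happen to be free at that moment, the same task can land on a different core each time it runs, which is exactly what the debugger has to cope with.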

For debugging SMP systems, only one instance of the debugger is opened and all cores are controlled from this one point. Because the developer is primarily concerned with debugging a single task, the debugger user interface shows a graphical representation of the entire system from the perspective of this single task or from the perspective of the core where the task is running. Of course the visualization can be switched to other tasks or cores if required.

The debug tool assumes a function that is similar to an SMP operating system. It organizes the debugging of all cores so that developers do not need to look into the details of the SMP system. For example, if a breakpoint is set, the debugger ensures that the breakpoint is entered in all cores. This is necessary because at the time when the breakpoint is set, it’s not yet clear which core will execute the program task with the breakpoint. If a core stops at a breakpoint, all other cores may also be stopped automatically. The display in the debugger switches to the task or core that the breakpoint interrupted. When the program is restarted, all halted cores may be configured to start running again.
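
The breakpoint behaviour described above can be modelled in a few lines of Python. This is a toy sketch under stated assumptions, not the interface of any real debugger: the classes `Core` and `SmpDebugSession` and their methods are hypothetical, and the "halt all other cores" option is assumed to be enabled.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    """Hypothetical model of one core as seen by the debugger."""
    core_id: int
    running: bool = True
    breakpoints: set = field(default_factory=set)


class SmpDebugSession:
    """Toy model of a single debugger instance controlling all SMP cores."""

    def __init__(self, num_cores):
        self.cores = [Core(i) for i in range(num_cores)]
        self.focus = None  # core whose context the display currently shows

    def set_breakpoint(self, address):
        # The OS decides at runtime which core runs the task,
        # so the breakpoint is entered on every core.
        for core in self.cores:
            core.breakpoints.add(address)

    def on_execute(self, core_id, address):
        """Report that a core reached an address; return True on a hit."""
        core = self.cores[core_id]
        if not core.running or address not in core.breakpoints:
            return False
        # One core hit the breakpoint: halt all cores and switch the view.
        for c in self.cores:
            c.running = False
        self.focus = core_id
        return True

    def go(self):
        # Restart every halted core together.
        for c in self.cores:
            c.running = True


session = SmpDebugSession(num_cores=4)
session.set_breakpoint(0x8000)
hit = session.on_execute(core_id=2, address=0x8000)
```

After the call, every core in `session.cores` is halted and `session.focus` points at core 2, the core that happened to execute the task containing the breakpoint.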

Debugging SMP systems is reasonably straightforward. After the debugger is started and configured for the SMP system, the developer can essentially use it as if they were debugging only one core.

Trace Concepts

Trace tools analyse and display trace information in different ways, depending on whether the trace data was generated by an AMP system or an SMP system. For AMP systems, trace analysis is largely performed on each core independently. The trace information for an SMP system, however, can be analysed for a single task, a single core, or for the entire system, depending on the type of query.

Trace Concept for AMP Systems

Because the individual cores of an AMP system are debugged by separate instances of the debugger, trace information is also displayed in these individual user interfaces. AMP systems can consist of different types of cores, which means that different trace formats may need to be handled. As the individual traces are recorded and displayed in separate user interfaces, they can be decoded and analysed individually.

To test the interaction of the cores and to quickly locate complex system errors, it is possible to display the individual trace views and also their relationship to each other over time. A common time base makes this possible: the developer can select a point in time in the trace view on one user interface and see exactly which command was being executed at approximately the same time on all of the other user interfaces (see Figure 6).
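
As a rough illustration of what a common time base enables, the following Python sketch looks up, for a chosen timestamp, the most recent record in each core's trace. The trace contents, the core names and the functions `record_at` and `correlate` are invented for this example; real trace tools decode vendor-specific trace streams rather than simple tuples.

```python
from bisect import bisect_right


def record_at(trace, timestamp):
    """Return the last record at or before `timestamp`, or None.

    `trace` is a list of (timestamp, instruction) tuples sorted by time,
    with all timestamps taken from the same (common) time base.
    """
    times = [t for t, _ in trace]
    idx = bisect_right(times, timestamp)
    return trace[idx - 1] if idx > 0 else None


def correlate(traces, timestamp):
    """Map each core name to what it was executing at `timestamp`."""
    return {name: record_at(trace, timestamp) for name, trace in traces.items()}


# Hypothetical traces from two different cores of an AMP system.
traces = {
    "cpu": [(100, "ldr r0, [r1]"), (140, "bl send_msg"), (200, "wfi")],
    "dsp": [(90, "mac a0, x0, y0"), (150, "load msg_buf"), (210, "nop")],
}
snapshot = correlate(traces, 145)
```

Selecting time 145 yields the `bl send_msg` record for the CPU and the `mac` record for the DSP. Without a shared time base the two lists could not be compared in this way, because their timestamps would not refer to the same clock.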

Trace Concept for SMP Systems

Although all information about the programs processed on an SMP system is stored in a shared trace memory for all cores, certain trace tools provide different views of this information (see Figure 7). To locate errors in a task or for task-specific run-time measurements, trace information can be displayed specifically for an individual task.

To answer questions such as "Which cores processed my task?" or "What is the run-time load of my core?", it is useful to be able to view the trace information for all cores at the same time (see Figure 8).
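
The two questions above amount to different queries over the same shared trace. A minimal Python sketch, assuming the trace has already been reduced to hypothetical (start, end, core, task) records with invented task names, might look like this:

```python
from collections import defaultdict

# Hypothetical records from a shared SMP trace: (start, end, core, task).
records = [
    (0, 40, 0, "ui"),
    (0, 100, 1, "audio"),
    (40, 100, 0, "net"),
    (100, 160, 1, "ui"),
    (100, 120, 0, "net"),
]


def cores_for_task(records, task):
    """Task view: which cores executed the given task?"""
    return sorted({core for _, _, core, t in records if t == task})


def core_load(records, window):
    """Core view: fraction of the observation window each core was busy."""
    busy = defaultdict(int)
    for start, end, core, _ in records:
        busy[core] += end - start
    return {core: busy_time / window for core, busy_time in busy.items()}


ui_cores = cores_for_task(records, "ui")
load = core_load(records, window=200)
```

Here the task `ui` ran on cores 0 and 1, and over a window of 200 time units core 0 was busy for 120 units and core 1 for 160. Both answers come from one recording; only the grouping changes.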

Long Term Trace and High Speed Trace

A recent development in multicore debug technology is ‘Long Term Trace’: the ability to collect massive amounts of code performance information from a running embedded system so that the user can detect and analyse even the most unpredictable and transient bugs. Because the software running on multicore devices is normally complex, and the trace data is now generated by multiple cores, it would quickly fill even the large buffers within a trace tool. As an example, the boot sequence of most modern smartphones now generates more information than can be stored in a 4 gigabyte buffer. This means that the ability to stream to a hard drive has become essential. Even then, the cores can produce peaks in the data output, so the trace tool is used as a FIFO to smooth out the peaks and troughs before the data is streamed to a hard drive dedicated to trace storage on the host PC.
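
The smoothing role of the FIFO can be shown with a small simulation. The Python sketch below is illustrative only: the function `simulate_fifo`, the burst pattern and all sizes are made-up values, not measurements from any tool.

```python
def simulate_fifo(produced, drain_rate, capacity):
    """Return (peak_fill, lost) for a FIFO between trace port and disk.

    produced:   units of trace data arriving in each time step
    drain_rate: units the host link can stream to disk per time step
    capacity:   FIFO size in the same units
    """
    fill = peak = lost = 0
    for incoming in produced:
        fill += incoming
        if fill > capacity:
            # The FIFO overflows: the excess trace data is lost.
            lost += fill - capacity
            fill = capacity
        peak = max(peak, fill)
        # Stream as much as the link allows to the hard drive.
        fill -= min(fill, drain_rate)
    return peak, lost


# Bursty output averaging 87.5 units per step, streamed at 100 per step.
bursty = [50, 300, 20, 10, 250, 30, 0, 40]
large = simulate_fifo(bursty, drain_rate=100, capacity=512)
small = simulate_fifo(bursty, drain_rate=100, capacity=128)
```

With the larger buffer the peaks are absorbed and nothing is lost, even though individual bursts are well above the streaming rate. With the smaller buffer the same data overflows, which is why the buffer in the trace tool must be sized for the peaks rather than for the average.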

Another recent development in debug technology for multicore systems is “High Speed Serial Trace”, aimed at engineers working with high-speed systems that require a high level of debug capability. Several silicon vendors have recently implemented this technology as High Speed Serial Trace Ports (HSSTP). It is now an option in many core types designed for markets where code quality and security are essential.


Using Trace Data to Manage Power in Complex Systems

Power management is another area influenced by trace debug tools for multicore processors. Some tools offer an Energy Profiler: a test set-up that measures, records and analyses the program and data flow of the control software together with current and voltage gradients. Statistical analyses run automatically after each program stop. They report the minimum, maximum and mean energy consumption of the executed functions, and calculate each function's absolute and percentage share of the total energy consumption. This makes it easy to locate the program parts that use the most energy and to modify them, which can allow a smaller battery or even a less costly processor. Some more modern cores even have a dedicated power state trace channel, where changes to the power state of any component in the device can be logged with timestamp data to allow accurate energy use profiling of the target.
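
The statistics listed above are straightforward to compute once each function invocation has been paired with its measured voltage, current and duration. The following Python sketch assumes that pairing has already been done; the sample values, function names and the helper `energy_profile` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical samples: (function, volts, amps, seconds) per invocation.
samples = [
    ("radio_tx", 3.0, 0.200, 0.010),
    ("radio_tx", 3.0, 0.100, 0.010),
    ("idle", 3.0, 0.010, 0.100),
    ("fft", 3.0, 0.050, 0.020),
]


def energy_profile(samples):
    """Per-function energy statistics in joules (E = V * I * t)."""
    per_function = defaultdict(list)
    for function, volts, amps, seconds in samples:
        per_function[function].append(volts * amps * seconds)
    total = sum(sum(values) for values in per_function.values())
    return {
        function: {
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
            "total": sum(values),                      # absolute share
            "share_pct": 100.0 * sum(values) / total,  # percentage share
        }
        for function, values in per_function.items()
    }


profile = energy_profile(samples)
hungriest = max(profile, key=lambda name: profile[name]["total"])
```

In this invented data set `radio_tx` accounts for about 60 percent of the total energy, so it is the first candidate for optimisation.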

Having started in the smartphone market, multicore technology has now become universal, providing more computing power and sustaining the growth required by most sectors of the embedded market.

As a result, powerful hardware-assisted debug tools are increasingly becoming essential. They may not be cheap to buy, but for developers who must compete in the marketplace with innovative and robust products they may prove a critical investment, saving much time and money and so helping to reduce commercial risk.