Multiprocessor SoC Platform Prototyping

by Fernando Martinez Vallina, Dr. Jafar Saniie, Embedded Computing and Signal Processing Lab, Electrical and Computer Engineering Department - Illinois Institute of Technology

Parallel computing systems have long been used to accelerate program execution as problem size grows (in data, computational complexity, or both). These systems have traditionally been implemented either on high-end multiprocessor computing systems available from companies like IBM, HP, and Sun, or on Linux clusters built from commercial off-the-shelf (COTS) computers. Until recently, both implementation styles were limited to a single processor per silicon die. New trends in integrated circuit fabrication now allow multiple processing cores to be implemented on the same die, which in turn allows the characteristics of parallel computing systems to be carried over into the embedded computing space.

Multiprocessor embedded (MPE) systems give designers greater flexibility in system specification and shift more of the development complexity from hardware to software. This approach does not rule out the use of dedicated hardware acceleration units, which are common in single-processor systems. The key characteristic of an MPE system is its emphasis on reducing the amount of dedicated hardware needed to satisfy design constraints, while increasing the applicability of the design through software. Efficient use of data and instruction parallelism allows multiprocessor systems to offset some of the computational inefficiency of the underlying hardware and increase system throughput. The key to this improvement is the correct partitioning and decomposition of the application in terms of both software and hardware. In our study, one of the initial phases of MPE design is to partition and decompose the application entirely in software. With the application partitioned among the available processors, runtime performance analysis can be carried out to determine the suitability of the platform and to identify any acceleration units that might be needed.

The MPE platform used in our studies is based on the Xilinx Virtex-II Pro FPGA and the Xilinx MicroBlaze soft-core processor. During the initial stages of our research we used the Xilinx EDK tool set for complete system implementation. This approach was adequate for developing an initial MPE system, but it limited design adaptability. For design exploration we need the ability to interface different processors and interconnect structures in order to develop a parameterized, automated design flow for MPE-based SoCs (MPSoCs). Before advancing our MPSoC research, we therefore needed a design flow flexible enough to allow any combination of processors and bus structures. As part of this development we chose the Synplicity® synthesis tool set. To test the integration of these tools with the rest of the design flow, we ported the base system into the Synplify® Premier tool.

The base MPSoC system, shown in Figure 1, consists of four MicroBlaze processors, P0-P3, connected in a fully connected network with bidirectional point-to-point communication links. Each processor also has a private 32Kb memory bank that holds both instructions and local data. P0 is the master processor and partitions the incoming data among all processors. P0 is also responsible for generating the final application result from the intermediate results produced by the slave processors.
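
The topology and the master's partitioning role described above can be illustrated with a minimal Python sketch. It is only a model for reasoning about the base system, not code that runs on the MicroBlaze processors. The function names `fully_connected_links` and `partition` are hypothetical, and the even, contiguous split is an assumption, since the article does not say how P0 divides the data.

```python
from itertools import combinations


def fully_connected_links(num_procs):
    """Return every bidirectional point-to-point link as a (low, high) pair."""
    return list(combinations(range(num_procs), 2))


def partition(data, num_procs):
    """Split data into num_procs contiguous blocks, as evenly as possible.

    Assumption: an even contiguous split; the real system may divide
    the data differently.
    """
    base, extra = divmod(len(data), num_procs)
    blocks, start = [], 0
    for p in range(num_procs):
        size = base + (1 if p < extra else 0)
        blocks.append(data[start:start + size])
        start += size
    return blocks


links = fully_connected_links(4)           # links among P0..P3
blocks = partition(list(range(128)), 4)    # P0 hands one block to each processor
print(len(links))                  # 6
print([len(b) for b in blocks])    # [32, 32, 32, 32]
```

A fully connected network of n processors needs n(n-1)/2 bidirectional links, so the four-processor base system uses six. That count grows quadratically with n, which is one reason a flow that can swap in other interconnect structures matters for larger systems.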


Figure 1. Base MPE System

Translating the design for implementation in the Synplify Premier tool was relatively straightforward. Our entire EDK project was exported as a subsystem assembly for the Synplicity tools. From there we developed a top-level wrapper that lets us tie the Xilinx-generated system to other standard bus structures and custom interconnect topologies. The current design flow is shown in Figure 2.

Figure 2. MPE Design Flow

The implemented MPE system currently executes the Fast Fourier Transform (FFT) on a maximum data batch size of 128 complex points. The system is composed of four MicroBlaze processors running at 100 MHz and achieves a minimum FFT execution time of 1.2 ms for 128 points. Further experiments on this platform have shown that the parallelization overhead incurred by this solution accounts for less than 10% of the overall execution time. For the current system we purposely limited the application of the MPE system so that we could focus on developing our design flow and finding the most appropriate tool mix to meet our needs. It is important to point out that the Synplify Premier tool reduced the design turnaround time to 30 minutes. This is an order of magnitude improvement over our original design flow and has allowed for more aggressive design analysis within our research group.

In future designs we intend to work on a more varied set of applications and a more heterogeneous mix of processing cores. Currently we are characterizing different application domains and resource allocation scenarios for symmetric MPSoC system implementation. Our next step will be to apply our design flow directly to asymmetric MPSoC platform research. We intend to develop an automated system for the implementation of highly adaptable MPSoC systems with custom acceleration units, multiple processor core types, and heterogeneous interconnect topologies.
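
To make the 128-point FFT workload more concrete, the sketch below shows one common way to split an FFT across four processors. The article does not describe the decomposition actually used, so this is an assumed scheme and not the authors' implementation. The master deals the samples out in interleaved order, each processor computes an independent 32-point FFT, and the master combines the partial results with twiddle factors. The names `fft` and `parallel_fft` are hypothetical, and the four "processors" run one after another in a single Python process.

```python
import cmath


def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def parallel_fft(x, num_procs=4):
    """Assumed scheme: interleaved split, per-processor FFTs, master combine.

    Requires len(x) divisible by num_procs, with a power-of-two quotient.
    """
    n = len(x)
    m = n // num_procs
    # Master (P0): processor p receives samples x[p], x[p + num_procs], ...
    parts = [x[p::num_procs] for p in range(num_procs)]
    # Each processor computes an m-point FFT (modeled sequentially here).
    partial = [fft(part) for part in parts]
    # Master: X[k] = sum over p of W_n^(p*k) * F_p[k mod m]
    return [
        sum(cmath.exp(-2j * cmath.pi * p * k / n) * partial[p][k % m]
            for p in range(num_procs))
        for k in range(n)
    ]


samples = [complex(i % 7, -(i % 5)) for i in range(128)]
spectrum = parallel_fft(samples)
print(len(spectrum))   # 128
```

For scale, 1.2 ms at 100 MHz corresponds to 120,000 clock cycles for the whole 128-point transform, including data distribution and the final combination step on the master.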

From The Syndicated Q2, 2006, published quarterly by Synplicity, Inc., www.synplicity.com.
Copyright © 2006 Synplicity, Inc. All rights reserved.