Q2, 2006
Multiprocessor SoC Platform Prototyping
by Fernando Martinez Vallina, Dr. Jafar Saniie, Embedded Computing and Signal Processing Lab, Electrical and Computer Engineering Department - Illinois Institute of Technology

Parallel computing systems have long been
used as a means of accelerating program execution in the context of increasing
problem size (i.e., data, computational complexity, or both). These systems
have been traditionally implemented either on high-end multiprocessor computing
systems available from companies like IBM, HP, and Sun, or on Linux clusters
built from commercial off-the-shelf (COTS) computers. Both implementation styles
for parallel computers have until recently been limited to a single processor
per silicon die. New trends in integrated circuit fabrication allow
multiple processing cores to be implemented on the same die. This in turn
allows the characteristics of parallel computing systems to be ported
into the embedded computing space.

Multiprocessor embedded (MPE) systems give designers greater
flexibility in system specification and shift more of the development complexity
from hardware to software. This approach does not rule out the use of dedicated
hardware acceleration units, which are common in single-processor systems. The
key characteristic of an MPE system is its emphasis on reducing the amount of dedicated
hardware needed to satisfy design constraints while increasing the applicability
of the design through software. Efficient use of data and instruction parallelism
allows multiprocessor systems to offset some of the computational
inefficiency of the underlying hardware and increase system throughput. The
key to this improvement is the correct partitioning and decomposition of the
application in terms of both software and hardware. In our study, one of the
initial phases of MPE design is to partition and decompose the application entirely
in software. With the application partitioned among the available processors,
runtime performance analysis can be carried out to determine the suitability
of this platform and any acceleration units that might be needed.

The MPE platform used in our studies is based on the Xilinx Virtex-II Pro FPGA and the Xilinx MicroBlaze soft-core processor. During the initial stages of our research we used the Xilinx EDK tool set for complete system implementation. This approach was adequate for developing an initial MPE system, but it limited design adaptability. For design exploration we need the ability to interface different processors and interconnect structures so that we can develop a parameterized, automated design flow for MPE-based SoCs (MPSoCs).

Before advancing our MPSoC research, we needed a design flow flexible enough to allow any combination of processors and bus structures. As part of this development we chose the Synplicity® synthesis tool set. To test the integration of these tools with the rest of the design flow, we ported the base system into the Synplify® Premier tool.

The base MPSoC system, shown in Figure 1, consists of four MicroBlaze processors, P0-P3, joined in a fully connected network of bidirectional point-to-point communication links. Each processor also has a private 32Kb memory bank for both instructions and local data. P0 is the master processor and partitions the incoming data among all processors. P0 is also responsible for generating the final application result from the intermediate results created by the slave processors.
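To make the interconnect concrete, the short Python sketch below enumerates the links of a fully connected network of four processors. It is only a model for counting links, not a description of the hardware interfaces used; the function names `fully_connected_links` and `links_per_processor` are hypothetical and do not come from the article or from any Xilinx or Synplicity tool.

```python
from itertools import combinations


def fully_connected_links(num_procs):
    """Return every bidirectional point-to-point link as a (low_id, high_id) pair."""
    return list(combinations(range(num_procs), 2))


def links_per_processor(links, num_procs):
    """Count how many links terminate at each processor."""
    counts = [0] * num_procs
    for a, b in links:
        counts[a] += 1
        counts[b] += 1
    return counts


if __name__ == "__main__":
    links = fully_connected_links(4)          # P0-P3
    print(len(links))                         # number of bidirectional links
    print(links_per_processor(links, 4))      # link endpoints on each processor
```

For four processors this gives six bidirectional links, three per processor. The count grows as n(n-1)/2, which is one reason the interconnect structure is worth treating as a design parameter when exploring larger systems.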
Figure 1. Base MPE System

Figure 2. MPE Design Flow

The implemented MPE system currently executes the Fast Fourier
Transform (FFT) on a maximum data batch size of 128 complex points. This system
is composed of four MicroBlaze processors running at 100 MHz with a minimum
FFT execution time of 1.2 ms for 128 points. Further experiments on this platform
have shown that parallelization overhead incurred by this solution accounts
for less than 10% of the overall execution time. In future designs we intend
to work on a more varied set of applications and a more heterogeneous mix of
processing cores. For the current system we purposefully limited the application
of the MPE system so that we could focus on developing our design flow and selecting
the most appropriate tool mix for our needs.

The Synplify Premier tool reduced the design turnaround time
to 30 minutes. This is an order-of-magnitude improvement over our original design
flow and has allowed more aggressive design analysis within our research
group.

Currently we are characterizing different application domains and resource
allocation scenarios for symmetric MPSoC system implementation. Our next step
will be to take our design flow and apply it directly to asymmetric MPSoC platform
research. We intend to develop an automated system for the implementation of
highly adaptable MPSoC systems with custom acceleration units, multiple processor
core types, and heterogeneous interconnect topologies.
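The article does not say how the 128-point FFT was divided among the four processors. As an illustration only, the Python sketch below shows one textbook way a master could do it: deal the samples round-robin to four workers, let each compute a 32-point transform, and recombine the partial results with twiddle factors (a radix-4 decimation-in-time split). The names `dft` and `split_fft` are hypothetical, and a direct O(n^2) DFT stands in for whatever FFT kernel each MicroBlaze actually ran.

```python
import cmath


def dft(x):
    """Direct O(n^2) DFT; a stand-in for the per-processor FFT kernel."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def split_fft(x, num_procs=4):
    """Compute the DFT of x by splitting the work across num_procs workers.

    Assumes len(x) is a multiple of num_procs.
    """
    n = len(x)
    m = n // num_procs
    # Scatter: worker p receives samples x[p], x[p + num_procs], ...
    shares = [x[p::num_procs] for p in range(num_procs)]
    # Each worker transforms its own share independently.
    partial = [dft(share) for share in shares]
    # Gather: the master combines partial spectra using twiddle factors.
    return [sum(cmath.exp(-2j * cmath.pi * p * k / n) * partial[p][k % m]
                for p in range(num_procs))
            for k in range(n)]


if __name__ == "__main__":
    signal = [complex(i % 7, -(i % 3)) for i in range(128)]
    error = max(abs(a - b) for a, b in zip(split_fft(signal), dft(signal)))
    print(error < 1e-6)
```

In this model the scatter and gather steps are where the parallelization overhead would come from. For scale, 1.2 ms at 100 MHz corresponds to 120,000 clock cycles, so an overhead below 10% means fewer than about 12,000 cycles spent on distribution and recombination.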