Q3, 2007
Reconfiguration for Reliability Tools

C. Bolchini, A. Miele, M. D. Santambrogio
Dipartimento di Elettronica e Informazione, Politecnico di Milano

Overview

Field Programmable Gate Arrays (FPGAs) are now a common platform for embedded systems development, thanks to their limited cost, their highly flexible architecture, and the possibility of re-programming the device to modify its behavior or to correct it when problems arise. This ability to re-program the device has received a lot of attention, and improvements have been introduced to support it at run-time as well, i.e., dynamically. Furthermore, advances in FPGA technology have made it possible to reconfigure only a portion of the device at run-time, which both reduces the time needed to re-program the device and avoids halting the entire system [1].

On the other hand, the flexibility provided by the reconfiguration capability also constitutes a problem: the SRAM memory elements storing the configuration bitstream (as well as the other memory elements of the device) are susceptible to radiation-induced temporary faults, also called soft errors, which corrupt the memory content and cause the system to misbehave. To cope with this problem, which is particularly significant in the space environment but increasingly relevant at ground level, fault detection and tolerance techniques need to be introduced [2]. Several studies have addressed radiation-induced faults in SRAM-based FPGAs from different points of view [3–9]: some apply well-known techniques traditionally adopted for other platforms as well, focusing on the peculiarities of the selected platform, while others exploit the opportunities of reconfiguration to mitigate fault effects.
Given the available solutions, a reliable system on SRAM-based FPGAs may be designed in different ways, according to the designer’s requirements and constraints. The main contribution of our work [10] is an approach that supports the exploration of the solution space for applying a passive hardware redundancy technique while using partial dynamic reconfiguration to mitigate soft-error effects; the result is a hybrid approach that combines fault masking with ad-hoc recovery to cope with this class of faults. These requirements turn the system design into a complex FPGA project, in which incremental design techniques and design re-use help in defining the final architecture. For these reasons, the Synplify Pro® software [11] has proved to be the best choice to support this approach.

Since the technique can be applied at different levels of granularity, with different trade-offs between cost, performance, and recovery time, it is important to be able to evaluate various solutions; the proposed methodology supports this, offering the designer the chance to compare the available alternatives. The Synplify Pro FPGA synthesis tool has been extremely useful in defining these alternatives. The tool is timing driven, but as soon as the timing constraints are met it switches its objective function to area optimization. With such a flexible and easy-to-use framework, defining and exploring different designs becomes an easy task. We presented a first work in [5], where we identified a set of parameters and techniques to achieve fault mitigation, and we are now generalizing that approach to take into account a broader set of figures of merit.
Example of the Solution Space Exploration in the DWC Scenario

To exploit the opportunity to reconfigure only a portion of the FPGA while the rest keeps working (when possible), it is necessary to localize the portion of the FPGA affected by the fault and to trigger its reconfiguration. Thus, the system under consideration must first be functionally partitioned into subsystems, and the duplication with comparison (DWC) technique must be applied to the identified parts. Each subsystem and its replica, together with the associated comparator, need to be placed in a separately reprogrammable portion of the FPGA, altogether referred to as a TSC-area, so that the benefits of detecting and localizing faults can be coupled with the opportunity to intervene on the problem. A controller receives all the error signals from the comparators and is programmed to localize the faulty portion of the FPGA and to trigger its reconfiguration. Different partitioning solutions for the given device lead to alternative designs with different costs, due to the varying number and size of the comparators and the amount of shared logic: Figure 1 shows two applications of DWC, where each pair of replicas and their comparator belongs to a different reconfigurable portion of the FPGA.

During this design and partitioning phase, the HDL Analyst RTL graphical and debugging tool from Synplicity has turned out to be an extremely powerful and effective framework, providing instant graphical views of both gate-level schematics and high-level block diagrams with a direct connection to the source code. Using this tool simplified this phase, making code optimization and modification fast and easy.
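The detection and recovery loop described in this section can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' hardware design: the names `TscArea`, `Controller`, `golden` and `demo` are hypothetical, a Python function stands in for each hardware module, and "reconfiguration" is modelled as reloading both copies from a fault-free reference.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Func = Callable[[int], int]


@dataclass
class TscArea:
    """Module, replica and comparator sharing one reconfigurable portion."""
    name: str
    golden: Func                      # stands in for the stored partial bitstream
    module: Optional[Func] = None
    replica: Optional[Func] = None

    def __post_init__(self) -> None:
        self.reconfigure()

    def reconfigure(self) -> None:
        """Model partial reconfiguration: reload both copies from the reference."""
        self.module = self.golden
        self.replica = self.golden

    def evaluate(self, value: int) -> Tuple[int, bool]:
        """Return the module output and the comparator's error signal."""
        out_a = self.module(value)
        out_b = self.replica(value)
        return out_a, out_a != out_b


class Controller:
    """Receives the error signals, localizes the faulty area, triggers recovery."""

    def __init__(self, areas: List[TscArea]) -> None:
        self.areas = areas
        self.reconfigured: List[str] = []

    def step(self, value: int) -> int:
        for area in self.areas:
            result, error = area.evaluate(value)
            if error:
                # DWC only detects a mismatch; it cannot tell which copy is
                # wrong, so the whole area is reloaded and the step repeated.
                area.reconfigure()
                self.reconfigured.append(area.name)
                result, _ = area.evaluate(value)
            value = result
        return value


def demo() -> Tuple[int, List[str]]:
    f1 = TscArea("f1", lambda x: x + 1)
    f2 = TscArea("f2", lambda x: x * 2)
    f2.replica = lambda x: (x * 2) ^ 1   # inject a soft error into one copy of f2
    controller = Controller([f1, f2])
    return controller.step(3), controller.reconfigured
```

Calling `demo()` pushes the value 3 through `f1` and `f2`; the mismatch in `f2` is detected, only that area is reloaded, and the controller's log names `f2` as the reconfigured portion.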
Figure 1 - Two partitions: application of DWC and partial replication.

The proposed methodology aims at estimating the costs and benefits of the possible partitioning solutions, provided the dynamic partial reconfiguration constraints are fulfilled, that is, the widths of the portions are multiples of 4 slices. This exploration of the solution space is based on the evaluation of the following elements: a) the size of the subsystems, b) the size of the data widths to be compared (used to derive the size of the comparators), and c) the size of the minimal reconfigurable portion of the FPGA (used to derive available area and reconfiguration times). A preliminary synthesis of the initial device partitioned into subsystems is performed to identify the basic costs used later to calculate the parameter values. Starting from this finer-grain partitioning, modules are incrementally aggregated to find a satisfactory trade-off between area overhead and partial reconfiguration times. The controller is pre-defined, since its behavior depends only on the number of error signal pairs to be monitored and the portions of the FPGA to be separately managed.

Not all possible subsystem partitioning solutions are significant. If the modules are too small, the placement could lead to the situation depicted in Figure 2, where it is possible to determine whether block f1 or block f2 is faulty. Nevertheless, this information is useless, since the entire portion will be reconfigured when a fault occurs in either f1 or f2; it would therefore be preferable to drop the first comparator and reduce the introduced overhead.
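The width constraint and the incremental aggregation step can be sketched as follows. Only the multiple-of-4-slices constraint comes from the text; the merge rule (group adjacent subsystems when sharing a portion reserves fewer slices than placing them separately) is a hypothetical heuristic chosen for illustration, it ignores comparator cost and reconfiguration time, and the names `aligned_width` and `aggregate` are assumptions.

```python
from typing import List

SLICE_GRANULARITY = 4  # portion widths must be multiples of 4 slices (from the text)


def aligned_width(width_in_slices: int) -> int:
    """Round a width up to the next multiple of the reconfiguration granularity."""
    if width_in_slices <= 0:
        raise ValueError("width must be positive")
    blocks = -(-width_in_slices // SLICE_GRANULARITY)  # ceiling division
    return blocks * SLICE_GRANULARITY


def aggregate(widths: List[int]) -> List[List[int]]:
    """Greedily group adjacent subsystems, returning groups of subsystem indices.

    Hypothetical rule: a subsystem joins the current group only when the
    shared portion reserves strictly fewer slices than two separate portions.
    """
    groups: List[List[int]] = []
    group_width = 0
    for index, width in enumerate(widths):
        separate = aligned_width(group_width) + aligned_width(width) if groups else 0
        if groups and aligned_width(group_width + width) < separate:
            groups[-1].append(index)
            group_width += width
        else:
            groups.append([index])
            group_width = width
    return groups
```

For example, `aggregate([1, 2, 4, 3, 3])` groups the first two subsystems (1 + 2 slices fit in one 4-slice portion, as in the Figure 2 situation) and leaves the others on their own. A real exploration would weigh this area saving against the longer reconfiguration time of a larger portion.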
Figure 2 - Disadvantageous partitioning solution.

Another aspect to be considered concerns the area constraints, which impose a horizontal placement that is a multiple of four slices. As a consequence, the placement-area we define may contain more resources than those strictly needed by the TSC-area placed in it. This consideration leads us to define two different approaches to reduce such a waste of resources. First of all, we introduce the resources optimization index, ro-index, defined as the ratio between the number of slices required to implement a TSC-area and the number of slices available in the placement-area computed for that design. The parameter can be regarded as a measure of fragmentation computed on each pair (TSC-area, placement-area); it takes a value between 0, where all the available resources are wasted, and 1, where all the available resources in the placement-area are used to implement the functionality defined in the TSC-area. The ro-index allows the designer to evaluate the adopted partitioning in order to find a better pair (TSC-area, placement-area) for his/her solution. On the other hand, the parameter can be used to change the partition of the system, binding together different TSC-areas when their ro-indexes allow them to be combined under a single ro-index without changing the placement-area.

In such a scenario, the exploration of the solution space provided by the proposed methodology supports the designer in selecting the granularity level for the DWC technique and the placement of the modules across the FPGA, starting from the estimated costs.

References

[1] Xilinx Inc. http://www.xilinx.com.
[2] M. D. Santambrogio, C. Bolchini, and F. Salice. Exploring partial reconfiguration for mitigating SEU faults in SRAM-based FPGAs. In ERSA, pages 199–202, 2007.
[3] F. Lima Kastensmidt, L. Sterpone, M. Sonza Reorda, and L. Carro. On the Optimal Design of Triple Modular Redundancy Logic for SRAM-based FPGAs. In IEEE Proc. Design, Automation and Test in Europe Conference, pages 1290–1295, 2005.
[4] C. Carmichael. Triple Module Redundancy Design Techniques for Virtex FPGAs. Xilinx Application Notes 197, 2006.
[5] C. Bolchini, D. Quarta, and M. Santambrogio. SEU Mitigation for SRAM-Based FPGAs through Dynamic Partial Reconfiguration. In Proc. ACM/IEEE Great Lakes Symposium on VLSI, pages 55–60, 2007.
[6] H. R. Zarandi, S. G. Miremadi, C. Argyrides, and D. K. Pradhan. Fast SEU Detection and Correction in LUT Configuration Bits for SRAM-Based FPGAs. In Proc. 14th IEEE Reconfigurable Architecture Workshop, 2007.
[7] C. Carmichael, E. Fuller, P. Blain, and M. Caffrey. SEU mitigation techniques for Virtex FPGAs in space application. In MAPLD99 Poster, page 24, 1999.
[8] C. Carmichael, M. Caffrey, and A. Salazar. Correcting Single-Event Upsets Through Virtex Partial Configuration. Xilinx Application Notes 216, 2000.
[9] M. Gokhale, P. Graham, E. Johnson, N. Rollins, and M. Wirthlin. Dynamic reconfiguration for management of radiation-induced faults in FPGAs. In Proc. 18th Intl. Parallel and Distributed Processing Symposium, pages 145–150, 2004.
[10] M. D. Santambrogio, C. Bolchini, and A. Miele. TMR and partial dynamic reconfiguration to mitigate SEU faults in FPGAs. In Proc. 22nd IEEE Int. Symp. on Defect and Fault Tolerance in VLSI Systems, 2007.
[11] Synplicity Inc. http://www.synplicity.com/
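As a closing worked illustration of the ro-index defined in the text (required slices divided by available slices), the following sketch computes the index and checks whether two TSC-areas could be bound together in one placement-area. The function names are hypothetical, and `can_bind` assumes that both TSC-areas are candidates for the same placement-area and that their slice counts simply add; the article does not give an explicit formula for the combined case.

```python
def ro_index(required_slices: int, available_slices: int) -> float:
    """Ratio of slices needed by a TSC-area to slices in its placement-area."""
    if available_slices <= 0:
        raise ValueError("placement-area must contain at least one slice")
    if not 0 <= required_slices <= available_slices:
        raise ValueError("TSC-area must fit inside the placement-area")
    return required_slices / available_slices


def can_bind(required_a: int, required_b: int, available_slices: int) -> bool:
    """True when two TSC-areas fit together in one unchanged placement-area.

    Assumption: slice requirements add up when the areas are merged.
    """
    return required_a + required_b <= available_slices


def combined_ro_index(required_a: int, required_b: int, available_slices: int) -> float:
    """ro-index of the merged TSC-area, under the same additive assumption."""
    return ro_index(required_a + required_b, available_slices)
```

With a 40-slice placement-area, a TSC-area needing 30 slices has an ro-index of 0.75, while two TSC-areas of 10 and 20 slices (0.25 and 0.5 individually) can be bound together and reach the same combined 0.75 without enlarging the placement-area.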