Translated Abstract
A System-on-Chip (SoC) is an integrated circuit that integrates multiple functional components on a single chip. Because of its high integration, high reliability, rich functionality, low power consumption, and small size, the SoC is well suited to space missions. However, space radiation effects, especially single event effects (SEE), pose a serious threat to the on-orbit lifetime and reliability of SoCs. To ensure that spacecraft successfully complete their missions, measures must be taken to improve the radiation tolerance of SoCs. Research on SEE in SoCs can therefore help clarify the mechanisms of system failure and guide the radiation-hardened design of systems.
In this dissertation, an SEE testing system was established based on the 28 nm Xilinx Zynq-7000 SoC, and SEE experiments using an alpha source were performed. To determine the soft-error-sensitive circuits of multiple components, a software-implemented fault injection technique was applied to the Xilinx Zynq-7000 SoC. The probabilistic safety analysis (PSA) and failure mode and effect analysis (FMEA) methods were used to evaluate the soft-error reliability of the Xilinx Zynq-7000 SoC; these analyses identified the sensitive components and weak points, as well as the most harmful component and failure mode. Heavy-ion microbeam experiments were conducted to locate the SEE-sensitive regions of different components of the Xilinx Zynq-7000 SoC. In addition, simulation-based fault injection experiments were carried out on the OpenRISC1200 (OR1200) SoC to obtain the soft error sensitivities of different modules and the resulting system failure rates. A Bayesian network method was used to diagnose soft errors in the OR1200 and Xilinx Zynq-7000 SoCs. Finally, a fault diagnosis system model for SEE in SoCs was proposed. The main results of this study are as follows:
1) The SEE cross sections of multiple components were obtained from alpha-particle SEE experiments. The on-chip memory (OCM) had the largest SEE cross section, followed by the programmable logic (PL), the direct memory access (DMA) controller, and the DCache, while the QSPI-Flash controller had the smallest. Four failure modes were detected: data error, program interrupt, time-out, and system halt. Data errors and program interrupts were the main failure modes, accounting for up to 90% of all system failures. No single event latch-up (SEL) was observed in the Xilinx Zynq-7000 SoC.
2) A software-implemented fault injection system was developed for the Xilinx Zynq-7000 SoC, and extensive fault injection experiments were carried out to identify soft-error-sensitive circuits and characterize the resulting faults. The sensitive CPU registers and the failure modes varied with the test program, and R11 and R15 were the most sensitive registers. In memory, soft errors in the code segment readily caused system control exceptions, whereas errors in the data segment mainly resulted in data errors. In the DMA controller, the source address register (SAR), destination address register (DAR), and channel control register (CCR) were the sensitive registers, and soft errors in SAR and DAR caused many data errors. In the QSPI-Flash controller, the control register was the sensitive register. These fault injection results help explain the alpha-particle experimental results.
3) The PSA and FMEA methods were used to assess the reliability of the Xilinx Zynq-7000 SoC. First, a fault tree of system soft errors was constructed and used to calculate the soft-error failure rate, the system unavailability, and the mean time to failure (MTTF). The quantitative analysis showed that, for alpha particles emitted by uranium and thorium impurities in the packaging materials, the failure rate, unavailability, and MTTF of the Xilinx Zynq-7000 SoC were 1.209 × 10⁻⁹ h⁻¹, 1.059 × 10⁻⁴, and 8.263 × 10⁸ h, respectively. The processing system (PS) was more likely than the PL to cause system failure. Second, event tree analysis was applied to the failure sequences by which single event upsets (SEUs) in the OCM lead to interruption of system functions, and the sequence with the highest fault frequency was identified. Finally, the failure rate and unreliability of the SoC were calculated with the FMEA method, and the results agreed with those of the fault tree. The SoC-level risk assessment identified the OCM as the critical component for system failure and data error as the critical failure mode.
4) Heavy-ion microbeam experiments on SEE in the Xilinx Zynq-7000 SoC were performed. The results revealed the sensitive locations and their distribution in the L1 cache, OCM, DMA controller, and ALU, and indicated that the sensitive locations were closely related to the underlying hardware resources. The sensitive locations in the OCM were concentrated and arranged in an orderly, regular pattern; those in the ALU were more dispersed; those in the DMA controller were concentrated but showed no obvious regularity; and the L1 cache had the most sensitive locations, clustered in several distinct regions. The SEE cross sections showed that the L1 cache was more susceptible than the other components.
5) A simulation-based fault injection system was built on the Verilog HDL model of the OR1200. Fault injection experiments were performed with three fault types: SEU, stuck-at-0, and stuck-at-1. The results showed that soft error sensitivity and system failure probability varied across blocks and fault types. The register file (RF) caused the highest system failure rate for SEU and stuck-at-0 faults, whereas the multiplication unit caused the highest rate for stuck-at-1 faults. In addition, Bayesian network models of soft errors in the OR1200 and Xilinx Zynq-7000 SoC were constructed, and the most sensitive modules were diagnosed by calculating the posterior probabilities and importance measures of the different blocks. The results showed that the most vulnerable blocks of the OR1200 and Xilinx Zynq-7000 SoC were the RF and the OCM, respectively. Finally, a fault diagnosis system model for SEE in SoCs was presented.
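Result 1) ranks components by SEE cross section. As a minimal sketch, assuming the usual definition of the cross section as the number of observed events divided by the particle fluence (σ = N / Φ, in cm²), the ranking can be computed as below. The names `EVENTS`, `FLUENCE_CM2`, and `cross_section` are hypothetical, and the counts and fluence are placeholders chosen only to illustrate the calculation, not the measured data.

```python
# Placeholder event counts and fluence, for illustration only (NOT measured data).
FLUENCE_CM2 = 1.0e7  # alpha particles per cm^2 (assumed)
EVENTS = {
    "OCM": 120,
    "PL": 80,
    "DMA": 30,
    "DCache": 20,
    "QSPI-Flash controller": 5,
}


def cross_section(n_events, fluence):
    """SEE cross section in cm^2: observed events per unit fluence."""
    if fluence <= 0:
        raise ValueError("fluence must be positive")
    return n_events / fluence


# Rank components from most to least sensitive.
ranked = sorted(EVENTS, key=lambda name: cross_section(EVENTS[name], FLUENCE_CM2), reverse=True)
for name in ranked:
    print(f"{name}: {cross_section(EVENTS[name], FLUENCE_CM2):.2e} cm^2")
```

Because every component here shares one fluence, the ranking reduces to ranking the event counts; separate fluences per run would be needed if components were irradiated separately.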
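Result 2) relies on software-implemented fault injection: flip one bit in a register during execution, rerun the workload, and compare against a fault-free (golden) run. The sketch below illustrates that loop on a hypothetical toy workload (`DATA`, `run`, `campaign`, and the three-register machine are all assumptions, not the dissertation's test programs or the Zynq-7000 register set). Unlike the four failure modes reported in result 1), this toy distinguishes only three outcomes (masked, data error, program interrupt), because its loop cannot hang.

```python
import random

# Hypothetical toy workload: sum an array using three "registers"
# (r0 = loop index, r1 = accumulator, r2 = array length).
DATA = list(range(1, 17))
REGS = ("r0", "r1", "r2")
WIDTH = 32  # assumed register width in bits


def run(fault=None):
    """Execute the workload; fault = (step, register, bit) flips one bit once.

    An out-of-range memory access raises IndexError, which the campaign
    treats as a program interrupt.
    """
    regs = {"r0": 0, "r1": 0, "r2": len(DATA)}
    step = 0
    while regs["r0"] < regs["r2"]:
        if fault is not None and step == fault[0]:
            regs[fault[1]] ^= 1 << fault[2]  # single-bit upset (SEU model)
        regs["r1"] += DATA[regs["r0"]]
        regs["r0"] += 1
        step += 1
    return regs["r1"]


def campaign(n_trials, seed=0):
    """Inject n_trials random single-bit faults and classify each outcome."""
    rng = random.Random(seed)
    golden = run()  # fault-free reference result
    counts = {"masked": 0, "data error": 0, "program interrupt": 0}
    for _ in range(n_trials):
        fault = (rng.randrange(len(DATA)), rng.choice(REGS), rng.randrange(WIDTH))
        try:
            result = run(fault)
        except IndexError:
            counts["program interrupt"] += 1
            continue
        counts["masked" if result == golden else "data error"] += 1
    return counts


print(campaign(1000))
```

Even this toy reproduces the qualitative split described in result 2): upsets in the accumulator (a data-like register) always corrupt the result, while upsets in the index or length registers (address-like registers) tend to cause out-of-range accesses.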
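The reliability figures in result 3) can be cross-checked under a constant-failure-rate (exponential) model, where unavailability at time t is Q(t) = 1 − exp(−λt) and MTTF = 1/λ. With the reported λ = 1.209 × 10⁻⁹ h⁻¹, Q(t) ≈ 1.059 × 10⁻⁴ for t = 87,600 h; that 10-year mission time is an inference from the numbers, not stated in the abstract. 1/λ ≈ 8.27 × 10⁸ h agrees with the reported 8.263 × 10⁸ h to within about 0.1%. The sketch below also shows OR/AND gates for independent fault-tree events and an event-tree sequence frequency; all function names are hypothetical, and this is not the dissertation's actual tree.

```python
import math

HOURS_PER_YEAR = 8760


def or_gate(probs):
    """Output event occurs if any independent input event occurs."""
    q = 1.0
    for p in probs:
        q *= 1.0 - p
    return 1.0 - q


def and_gate(probs):
    """Output event occurs only if all independent input events occur."""
    return math.prod(probs)


def unavailability(rate, hours):
    """P(at least one failure by time t) for a constant failure rate per hour."""
    return -math.expm1(-rate * hours)  # 1 - exp(-rate * t), accurate for small values


def mttf(rate):
    """Mean time to failure of a constant-rate (exponential) model, in hours."""
    return 1.0 / rate


def sequence_frequency(initiator_rate, branch_probs):
    """Event-tree sequence frequency: initiating-event rate times branch probabilities."""
    return initiator_rate * math.prod(branch_probs)


# Reported system failure rate; the 10-year mission time is an assumption.
rate = 1.209e-9
t = 10 * HOURS_PER_YEAR
print(f"Q(t) = {unavailability(rate, t):.3e}")
print(f"MTTF = {mttf(rate):.3e} h")
```

For independent exponential basic events, an OR gate over their unavailabilities equals the unavailability of a single event whose rate is the sum of their rates, which is why a top-level failure rate can be obtained by adding component rates (for example, PS and PL).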
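Result 4) describes the microbeam sensitive locations qualitatively (concentrated vs. dispersed, regular vs. irregular). One possible way to quantify those two properties, offered here as an assumption rather than the dissertation's method, is the RMS distance of the sensitive points from their centroid (concentration) and the coefficient of variation of nearest-neighbour spacing (regularity: near 0 for a regular grid). The functions `rms_radius` and `nn_spacing_cv` and the coordinates below are hypothetical.

```python
import math


def rms_radius(points):
    """RMS distance of sensitive points from their centroid (same unit as input)."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n)


def nn_spacing_cv(points):
    """Coefficient of variation of nearest-neighbour distances.

    Near 0 for a regular grid, larger for irregular layouts.
    Needs at least two distinct points.
    """
    nn = [
        min(math.hypot(x1 - x2, y1 - y2) for j, (x2, y2) in enumerate(points) if j != i)
        for i, (x1, y1) in enumerate(points)
    ]
    mean = sum(nn) / len(nn)
    std = math.sqrt(sum((d - mean) ** 2 for d in nn) / len(nn))
    return std / mean


# Hypothetical beam-spot coordinates in micrometres, not measured data.
grid_like = [(x, y) for x in range(0, 10, 2) for y in range(0, 10, 2)]
scattered = [(3, 41), (57, 8), (22, 30), (90, 75), (64, 52), (11, 88)]
for label, pts in (("grid-like", grid_like), ("scattered", scattered)):
    print(label, round(rms_radius(pts), 2), round(nn_spacing_cv(pts), 2))
```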
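The diagnosis step in result 5) computes posterior probabilities of block faults given an observed system failure. A minimal sketch of the underlying Bayes' rule, assuming the simplest possible network (one root variable "which block was upset", with mutually exclusive states, and one child "system failure"), is shown below. The block names, priors, and likelihoods are placeholders, not the dissertation's values, and the importance measures mentioned in the abstract are not reproduced.

```python
# Placeholder inputs, NOT results from the dissertation:
# PRIOR[b]      = P(the upset occurred in block b)
# LIKELIHOOD[b] = P(system failure | upset in block b), e.g. from fault injection
PRIOR = {"RF": 0.2, "ALU": 0.1, "MULT": 0.1, "DCACHE": 0.3, "ICACHE": 0.3}
LIKELIHOOD = {"RF": 0.4, "ALU": 0.2, "MULT": 0.15, "DCACHE": 0.05, "ICACHE": 0.08}


def posterior(prior, likelihood):
    """Return P(block | system failure) via Bayes' rule, assuming exactly one
    block is upset per event (mutually exclusive root causes)."""
    joint = {b: prior[b] * likelihood[b] for b in prior}
    evidence = sum(joint.values())  # P(system failure)
    if evidence == 0:
        raise ValueError("system failure has zero probability for these inputs")
    return {b: p / evidence for b, p in joint.items()}


post = posterior(PRIOR, LIKELIHOOD)
for block, p in sorted(post.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{block}: {p:.3f}")
```

The block with the largest posterior is the most probable culprit once a failure is observed; a full Bayesian network would relax the single-fault assumption by giving each block its own binary fault node.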
Translated Keywords
Fault injection, Heavy ion irradiation, Probability analysis, Single event effect, System-on-Chip