Use this URL to cite or link to this record in EThOS: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.694275
Title: Fault tolerant techniques for asynchronous networks on chip
Author: Zhang, Guangda
ISNI: 0000 0004 5990 5744
Awarding Body: University of Manchester
Current Institution: University of Manchester
Date of Award: 2016
Abstract:
Advancing semiconductor technology is boosting the core count on a single chip to achieve continuously increasing performance, posing a growing demand for scalable, efficient and reliable on-chip interconnection. However, this advance also makes the electronics increasingly vulnerable to faults. Inter-core connection is increasingly provided by Networks-on-Chip (NoCs), typically using conventional synchronous designs. Scaling makes it increasingly hard to avoid problems with clock distribution, and in many chips a single synchronous domain is inappropriate anyway. In place of the well-studied synchronous NoCs, event-driven asynchronous NoCs have emerged as a promising replacement. Asynchronous NoCs have many advantages over synchronous ones; however, their fault tolerance has rarely been studied. Implemented in a Quasi-Delay-Insensitive (QDI) fashion, asynchronous NoCs can achieve high timing robustness, but they show complicated failure scenarios in the presence of faults and behave differently from synchronous ones, posing a challenge to asynchronous circuit advocates. This research studies the impact of different faults on QDI NoC fabrics and presents thorough and systematic fault-tolerant solutions at the circuit level, providing a holistic, efficient and resilient interconnection solution for QDI NoCs. The contributions of this research include: 1) a thorough analysis of fault impact on QDI NoCs; 2) a Delay-Insensitive Redundant Check (DIRC) coding scheme protecting QDI links from transient faults; 3) a novel time-out technique detecting the fault-caused physical-layer deadlock in a QDI NoC (the adaptability of a QDI circuit to timing variation makes it vulnerable to this kind of deadlock); 4) a fine-grained recovery technique utilising a Spatial Division Multiplexing (SDM) implementation to recover the deadlocked network from a link fault.
Both unprotected and protected QDI NoCs are implemented, along with a fault simulation environment, to provide a detailed performance and fault-tolerance evaluation of these techniques. The improvements to the NoC operation, together with the costs in circuit overhead and throughput, are enumerated using a typical example of QDI interconnection.
Supervisor: Garside, James ; Navaridas Palma, Javier
Sponsor: University of Manchester ; China Scholarship Council
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.694275
DOI: Not available
Keywords: Network-on-Chip ; asynchronous ; QDI ; fault tolerance ; deadlock ; permanent fault ; transient fault ; GALS