Title: Fault tolerance & error monitoring techniques for cost constrained systems
Author: Gutierrez Alcala, Mauricio Daniel
ISNI: 0000 0004 6496 6517
Awarding Body: University of Southampton
Current Institution: University of Southampton
Date of Award: 2017
With technology scaling, circuit reliability is a growing concern. Logic errors in the field, caused by faults that escape manufacturing testing, single-event upsets, aging, or process variations, are becoming more frequent. Traditional techniques for online testing and circuit protection often require high design effort or incur high area overhead and power consumption, making them unsuitable for low-cost systems. This thesis presents three original contributions in the form of low-cost techniques for online error detection and protection in cost-constrained systems. The first contribution is a low-cost fault tolerance design technique that protects the most susceptible workload on the most susceptible logic cones of a circuit, targeting both timing-independent and timing-dependent errors. The susceptible workload is protected by a partial Triple Modular Redundancy (TMR) scheme. Protecting the 32 most susceptible patterns achieves an average error coverage improvement of 63.5% and 58.2% against errors induced by stuck-at and transition faults, respectively, compared with an unranked pattern selection and protection. Additionally, this technique produces an average error coverage improvement of 163% and 96% against temporary erroneous output transitions and errors induced by bit-flips, respectively. These error coverage improvements incur an area/power cost in the range of 18.0-54.2%, a 145.8-182.0% reduction compared to TMR. The second contribution is a low-cost probabilistic online error monitoring technique that produces an alarm signal when systematic erroneous behaviour has occurred over a pre-defined time interval. To detect systematic erroneous behaviour, the collected data is compared on-chip against the signature of error-free behaviour. On the largest circuits, results demonstrate an average error coverage of 84.4% and 73.1% of errors induced by bit-flips and stuck-at faults, respectively, with an average area cost of 1.66%. The final contribution is a circuit approximation technique that can be used for low-cost non-intrusive fault tolerance and concurrent error detection, based on finding functionality at the logic level that behaves similarly to single logic gates or constant values. An algorithm is proposed to select the input subsets to approximate. Results show an average coverage of 33.59% of the input space with an average area cost of 7.43%. Using these approximate circuits in a reduced TMR scheme results in significant area cost reductions compared to existing techniques.
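The partial TMR idea mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the names `majority`, `partial_tmr`, `golden`, and `faulty`, the toy 4-bit circuit, and the choice of protected patterns are all hypothetical. The sketch assumes that the two redundant copies are consulted only for input patterns in a protected set, and that all other patterns pass through unvoted.

```python
from typing import Callable, FrozenSet

Circuit = Callable[[int], int]


def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote, the standard TMR voter function."""
    return (a & b) | (a & c) | (b & c)


def partial_tmr(main: Circuit, copy1: Circuit, copy2: Circuit,
                protected: FrozenSet[int]) -> Circuit:
    """Wrap `main` so that only protected patterns are voted on.

    Assumption for illustration: unprotected patterns bypass the voter.
    """
    def run(pattern: int) -> int:
        out = main(pattern)
        if pattern in protected:
            return majority(out, copy1(pattern), copy2(pattern))
        return out
    return run


def golden(x: int) -> int:
    """Hypothetical fault-free 4-bit circuit."""
    return (x ^ 0b1010) & 0xF


def faulty(x: int) -> int:
    """Same circuit with a stuck-at-1 fault on output bit 0."""
    return golden(x) | 0b0001


# Protect two (arbitrarily chosen) input patterns.
guarded = partial_tmr(faulty, golden, golden, frozenset({4, 6}))
```

In this toy setup, `guarded(4)` returns the fault-free value because the voter masks the stuck-at error, whereas `guarded(2)` still returns the erroneous value because pattern 2 is not protected. The cost saving of a partial scheme comes from protecting only a ranked subset of patterns; how that subset is ranked and selected is described in the thesis, not here.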
Supervisor: Kazmierski, Tomasz
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available