Title:

Statistical disclosure control for frequency tables

Assessing the disclosure risk of statistical data, such as frequency tables, is a prerequisite for data dissemination. This thesis investigates disclosure risk assessment for frequency tables from the perspective of a statistical institute. In the research reported here, disclosure risk is measured by a mathematical function designed for the data according to a disclosure risk scenario. Such functions are called disclosure risk measures. Using information theory, a disclosure risk measure is defined for frequency tables based on the entire population. If the disclosure risk of a population-based frequency table is high, a statistical institute will apply a statistical disclosure control (SDC) method, possibly perturbing the table. The application of any SDC method is known to lower the disclosure risk; however, measuring the disclosure risk of the perturbed frequency table is a difficult problem. The disclosure risk measure proposed in the first paper of the thesis is therefore also extended to assess the disclosure risk of perturbed frequency tables. SDC methods can be applied either to the microdata from which the frequency table is generated or directly to the frequency table. The two classes of methods are called pre-tabular and post-tabular methods, respectively. It is shown that the two classes are closely related and that the proposed disclosure risk measure can account for both. In the second paper, the disclosure risk measure is extended to assess the disclosure risk of sample-based frequency tables. Probabilistic models are used to estimate the population frequencies from the sample frequencies, and these estimates can then be used in the proposed disclosure risk measures. In the final paper of the thesis, we investigate an application: building a flexible table generator in which disclosure risk and data utility measures must be calculated on the fly. We show that the proposed disclosure risk measure and a related information loss measure are adaptable to these settings.
An example implementation of the disclosure risk and data utility assessment using the proposed disclosure risk measure is given.
