Census Bureau data susceptible to ‘reconstruction attacks’ exposing individual data, report claims

A new report by computer scientists at the University of Pennsylvania claims that individual responses in U.S. Census data can be exposed through an attack that uses machine learning-based algorithms.

A team of computer scientists demonstrated how cybercriminals can use commercial laptops to reverse engineer the Bureau’s statistics, potentially exposing respondents to risks like identity theft and discrimination.

U.S. Census Bureau data may be vulnerable to exposure and theft by cybercriminals using commercial laptops, according to a new study that underscores the agency's longstanding cybersecurity challenges.

A team of computer scientists at the University of Pennsylvania School of Engineering and Applied Science designed a "reconstruction attack" that reverse engineers the Bureau’s statistics to reveal individual respondents' protected information. 

Cybercriminals can use the attack to better determine which records are associated with real people, the report said, potentially exposing certain respondents to risks like identity theft and discrimination.
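To make the idea concrete, here is a toy reconstruction attack in Python. It is not the Penn team's algorithm: it is a brute-force search over every possible set of residents for a single hypothetical census block, keeping only the sets consistent with that block's published statistics. The block, its statistics, the 0 to 99 age range and the names `PUBLISHED`, `candidates_for_sex` and `reconstruct` are all invented for illustration.

```python
from itertools import combinations_with_replacement, product
from statistics import median

# HYPOTHETICAL published statistics for one tiny census block (numbers invented).
PUBLISHED = {
    "count_by_sex": {"F": 2, "M": 1},
    "mean_age_by_sex": {"F": 35, "M": 12},
    "median_age": 30,
}
AGES = range(100)  # assumption: every resident is between 0 and 99 years old


def candidates_for_sex(count, mean_age):
    """List every multiset of ages for one sex that matches its published count and mean."""
    target_sum = mean_age * count
    return [ages for ages in combinations_with_replacement(AGES, count)
            if sum(ages) == target_sum]


def reconstruct(published):
    """Return every set of (sex, age) records consistent with all published statistics."""
    sexes = list(published["count_by_sex"])
    # Step 1: enumerate candidates for each sex separately, using that sex's count and mean.
    per_sex = [candidates_for_sex(published["count_by_sex"][s],
                                  published["mean_age_by_sex"][s])
               for s in sexes]
    solutions = []
    # Step 2: combine per-sex candidates and keep only those matching the overall median.
    for combo in product(*per_sex):
        all_ages = [age for group in combo for age in group]
        if median(all_ages) == published["median_age"]:
            records = sorted((sex, age) for sex, group in zip(sexes, combo) for age in group)
            solutions.append(records)
    return solutions


if __name__ == "__main__":
    print(reconstruct(PUBLISHED))  # → [[('F', 30), ('F', 40), ('M', 12)]]
```

Only one candidate survives, so these three innocuous-looking aggregates have effectively disclosed every resident's sex and age. Exhaustive search like this does not scale to real blocks with many residents and attributes, which is where more efficient, machine learning-style techniques of the kind the report describes would come in.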

The reconstruction attack uses an algorithm design drawn from machine learning fundamentals to bypass the statistical safeguards meant to protect individual respondents’ personal data. To defend against emerging threats like it, the researchers said, the Census Bureau should employ privacy-enhancing technologies when releasing privacy-sensitive datasets.

“Over the last two decades it has become clear that practices in widespread use for data privacy (anonymizing or masking records, coarsening granular responses or aggregating individual data into large-scale statistics) do not work,” said Michael Kearns, a co-author of the study and professor of management and technology at the University of Pennsylvania, in a statement.

Computer scientists have since developed provable protection techniques like differential privacy, he added, referring to a method introduced in 2006 that protects individuals by adding small, carefully calibrated amounts of random noise to the statistics computed from a dataset. However, some critics say the technique could complicate the Census Bureau's decennial count of the national population.
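A minimal sketch of the textbook Laplace mechanism, one standard way to achieve differential privacy, shows what "adding noise" means for a single published count. It assumes one count where adding or removing a person changes the total by at most 1; it is not the Census Bureau's production system, and the function names are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample.

    The difference of two independent exponential draws with rate 1/scale
    follows a Laplace distribution centred at 0 with the given scale.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_count(true_count, epsilon, sensitivity=1.0, rng=None):
    """Release a count with epsilon-differential privacy via the Laplace mechanism.

    Adding or removing one person changes a count by at most 1, so the
    sensitivity is 1 and noise of scale sensitivity / epsilon is enough.
    Smaller epsilon means more noise and stronger privacy.
    """
    if rng is None:
        rng = random.Random()
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(2020)  # fixed seed so the run is repeatable
    for eps in (0.1, 1.0, 10.0):
        print(eps, round(noisy_count(3, eps, rng=rng), 2))
```

A real release would also round and clamp the noisy value, since a count of -1.7 people makes no sense; such post-processing does not weaken the differential privacy guarantee.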

The 2020 Census was the first to use differential privacy to safeguard its publicly released data, though the researchers noted the Census Bureau was still determining how to "balance the trade-off between accuracy and privacy."
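The accuracy-privacy trade-off can be made explicit for the simplest case: a single count c released with Laplace noise, with privacy parameter ε and sensitivity Δ (the most one person can change the count). This is an illustrative assumption; the Bureau's actual system is considerably more involved.

```latex
\tilde{c} = c + Z, \qquad
Z \sim \operatorname{Laplace}\!\left(0, \tfrac{\Delta}{\varepsilon}\right), \qquad
\mathbb{E}\,\lvert Z \rvert = \frac{\Delta}{\varepsilon}, \qquad
\operatorname{Var}(Z) = \frac{2\Delta^{2}}{\varepsilon^{2}}
```

With Δ = 1, a budget of ε = 0.1 gives an average error of 10 people per count, ε = 1 gives 1 person, and ε = 10 gives 0.1 people: tightening privacy by a factor of ten makes each published count roughly ten times noisier.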

While the private sector has been applying advanced privacy-protection techniques to datasets for years, “the Census’ long-running statistical programs and policies have additional complications attached,” said Aaron Roth, a co-author of the study and professor of computer and cognitive science at the University of Pennsylvania, in a statement.

“In the long run, it may be that public policymakers decide that the risks posed by non-noisy statistics are worth the transparency,” he said. 

The Bureau has meanwhile struggled to strengthen other critical cybersecurity protocols since it was hacked ahead of the 2020 Census, according to an inspector general’s report published in November of last year.

Following that breach, the Commerce Department Office of Inspector General's Office of Audit and Evaluation conducted a simulated cyberattack against the Census Bureau and gained unauthorized control of the agency's critical systems without being detected, the report said.