The latest in a long line of privacy scandals happened last week, after Google was found to have been pulling unredacted data from one of America’s largest healthcare providers to use in one of its projects. Google has given assurances that it won’t use this information to supplement its ad business, but that’s not the issue here. How was Google able to acquire this knowledge in the first place?
Professor Sandra Wachter is an expert in law, data and AI at the University of Oxford’s Internet Institute. She says that every time your data is collected, “you leave something of yourself behind.” She added that anyone can use your online behavior to “infer very sensitive things about you,” like your ethnicity, gender, sexual orientation and health status.
It’s bad enough when companies use those inferences for targeted ads. But it gets a lot worse when they gain access to very private data. For instance, would you feel comfortable if Google started displaying ads for fertility treatments in your emails after a trip to the doctor? Or if your healthcare provider could access your browser history without your knowledge to determine how suitable you are for insurance?
Last week, we heard that Google had pulled vast amounts of unredacted and unanonymized data from healthcare provider Ascension. The files included test results, diagnoses and hospitalization records from tens of millions of patients.
These, Google said, were made available for researchers inside its Project Nightingale team as part of plans to build software that might help improve software in healthcare environments. It also said that access to the records was tightly controlled and limited to staffers who had been vetted by Ascension. That hasn’t stopped Congress and the Department of Health and Human Services from opening an investigation.
How was Google able to grab this data without the consent of the people involved? In the US, it’s legal under HIPAA, the Health Insurance Portability and Accountability Act, and Google and Ascension followed the law, or at least the letter of it, which allows cross-company data flows under certain conditions. But this isn’t just a failing with the law in the US.
“I don’t think we could rule something like this out in the EU,” says technology lawyer Neil Brown of decoded.legal. “There are no absolute prohibitions in the GDPR,” he said, referencing the European General Data Protection Regulation, which covers the European Union and the wider European Economic Area.
Brown says that, instead, the GDPR is “a series of controls or standards which companies must meet if they want to operate in a compliant manner. One of these conditions is that processing is necessary for scientific research purposes, so what Google is doing here may meet the requirements of that.” Although with no case law to support that claim, we’re in a gray area.
Another issue is that data protection laws focus too much on the moment when data is collected, wrote Wachter in 2018, not on what happens after it has been obtained. That’s at least one benefit of GDPR, which forces companies to minimize the data they hold on people. But otherwise, even if you offered informed consent at the time, you can’t control what conclusions are drawn from the data. If a company thinks you’re a bad debtor, then you can’t challenge that.
These conclusions are often the biggest issue, especially in areas where machine learning has been implemented. That’s why Wachter believes that now is the time to shift the onus from the individual to the entity hoarding all of that data. She wants to “make it an obligation or responsibility” of whoever is collecting the data to handle it in a responsible and ethically acceptable way.
Wachter also feels that a one-size-fits-all model for data privacy doesn’t work in a world where information is so crucial. “You want to have stricter rules when it comes to financial regulation,” but potentially looser ones if you’re “doing cancer research in a university.” But it would be up to each institution, body or company to demonstrate that they deserve that trust.
A key plank of Wachter’s reform proposals is the notion that, like the right to be forgotten, we need a right of “reasonable inferences.” This would, for instance, allow us to learn what data influenced a decision and the underlying assumptions generated at the time of gathering the data.
We’ve reported on this before: data collection agencies look at our online activity and make totally wrong assumptions. When I used GDPR to ask one of the biggest US data companies what and who they thought I was, there were major errors in the data. They had even ignored basic facts available as a matter of public record, like my age and marital status, in favor of algorithmic conclusions.
This is going to be an issue, both now and in the future, especially as organizations trust machines to draw inferences on their behalf. Facial recognition already infers your employability beyond what’s written on your resume. Even Facebook uses it as a form of security, despite numerous catastrophic data breaches.
In Europe, experts are already urging lawmakers to ban more advanced forms of these social credit schemes. And in the US, there are some calls for tougher privacy laws in the spirit of Europe’s GDPR. But without specific action on preventing companies from pulling vast amounts of sensitive data and running it through their own machine learning, there’s even more trouble ahead.