Brave New Privacy

Blogging our mission of protecting secrets

Using Private Government Databases in a Big Data Fashion

Making better decisions

Great decisions come from great intuition combined with great advice. Intuition comes with experience, but everybody can get good advice from data. Governments and companies have understood this and use it wherever possible. However, data sharing poses risks to the well-being of the owners of the data. Legislation and distrust are some of the barriers that prevent data sharing.

We are very excited to report on our biggest privacy technology breakthrough to date. Just take a look at the video below.

How to grow an IT workforce?

Every country needs skilled engineers who process data, automate factories and secure the digital world. One would think that if one needs more IT engineers, one has to fund their studies. However, this does not always turn out as planned.

The graph above shows the failure rate of IT students in the universities of Estonia (you might know Estonia as the one-time home of Skype, Transferwise and GrabCAD, or as a country with one of the world's most efficient e-government systems). Surprisingly, by 2012, nearly 43% of IT students enrolled in the last five years had failed to graduate.

A hypothesis was raised that the booming IT industry is hiring too hungrily and causing students to fail at school. The Estonian Association of Information and Communication Technology (ITL) decided to investigate.

Big data vs Private data from a legal perspective

In Estonia, the Ministry of Education and Science keeps track of students, and the Tax and Customs Board keeps track of employment (by tracking income tax payments).

If data scientists could access these databases, they could find the correlation between working during studies and not graduating on time. Even though correlation does not imply causation, it paints a realistic picture of the world if we are working with the whole population and not just a sample. For more discussion on this, see Chapter 4 in Big Data, an excellent book by Viktor Mayer-Schönberger and Kenneth Cukier.
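To make the idea concrete, here is a minimal sketch of how such a correlation could be measured once the two databases are linked. It assumes each student is reduced to two yes/no facts (worked during studies, failed to graduate on time) and uses the phi coefficient, a standard correlation measure for two binary variables. The function name and the toy records are hypothetical and are not data from the study.

```python
from math import sqrt

def phi_coefficient(pairs):
    """Correlation between two yes/no variables given as (x, y) pairs."""
    n11 = sum(1 for x, y in pairs if x and y)
    n10 = sum(1 for x, y in pairs if x and not y)
    n01 = sum(1 for x, y in pairs if not x and y)
    n00 = sum(1 for x, y in pairs if not x and not y)
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    # If a row or column of the 2x2 table is empty, the measure is undefined;
    # we return 0.0 in that case as a simplifying assumption.
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0

# Hypothetical toy records: (worked_during_studies, graduated_late)
no_relation = [(True, True), (True, False), (False, True), (False, False)]
full_relation = [(True, True), (True, True), (False, False), (False, False)]

phi_none = phi_coefficient(no_relation)
phi_full = phi_coefficient(full_relation)
```

A value near 0 means working and late graduation are unrelated in the data, while a value near 1 means they almost always occur together.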

The government has a chance to make great decisions, because they have data. But the citizen has to feel the government really deals with all this data securely.

Taavi Kotka, Estonian Government CIO

However, this data cannot be shared because of the Personal Data Protection Act and the Taxation Act (not to mention the relevant EU regulation). This prevents such studies from being performed. The only way to perform such studies today is to let the tax board pre-aggregate and group the data, so that no record can be associated with an individual. In our control study we saw that this has some undesired consequences.

Because the tax board pre-aggregates data to achieve k-anonymity, all somewhat unique students are left out of the study. This means 54% of master’s students and a staggering 78% of doctoral students were not accounted for in the analysis. This severely reduces the utility of the study. The only way to solve this privacy-to-utility imbalance was to leave gender attributes out of the control study.
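The effect is easy to reproduce on a toy example. The sketch below assumes the simplest form of k-anonymity, where every record whose combination of attributes occurs fewer than k times is dropped. The tax board's actual aggregation procedure may differ, and the records, attribute names and the value of k are hypothetical.

```python
from collections import Counter

def k_anonymous_subset(records, quasi_identifiers, k):
    """Keep only records whose attribute combination occurs at least k times."""
    def key(record):
        return tuple(record[q] for q in quasi_identifiers)

    counts = Counter(key(r) for r in records)
    return [r for r in records if counts[key(r)] >= k]

# Hypothetical toy data: six bachelor's students and three doctoral students.
students = (
    [{"level": "bachelor", "gender": "F"}] * 3
    + [{"level": "bachelor", "gender": "M"}] * 3
    + [{"level": "doctoral", "gender": "F"}] * 1
    + [{"level": "doctoral", "gender": "M"}] * 2
)

# With gender included, the doctoral groups are too small and are dropped.
with_gender = k_anonymous_subset(students, ["level", "gender"], k=3)

# Without gender, the three doctoral students form one group of size 3.
without_gender = k_anonymous_subset(students, ["level"], k=3)

print(len(with_gender), len(without_gender))
```

In this toy data, every doctoral student disappears from the analysis as soon as gender is included, and dropping the gender attribute brings them back at the cost of a less detailed study.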

How Sharemind saved the day

We used the Sharemind Application Server with its analytics package Rmind to perform the study in a privacy-preserving way. The following figure shows how encrypted data flows from the tax board and the ministry to Sharemind. Sharemind then performs privacy-preserving transformations, linking, and analysis, and publishes the results to the analysts at Centar.
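For intuition on how servers can compute on data that none of them can read, consider additive secret sharing, a common building block of secure multi-party computation. The sketch below is an illustration under our own assumptions and is not Sharemind's API: the three hosts, the 32-bit modulus, the function names and the income values are all hypothetical.

```python
import secrets

MODULUS = 2**32  # assumed ring size for this illustration

def share(value, parties=3):
    """Split a value into random shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]

def reconstruct(shares):
    """Recombine shares; only someone holding all of them learns the value."""
    return sum(shares) % MODULUS

# Hypothetical monthly incomes submitted by a data owner.
incomes = [1200, 950, 0]

# Each income is split so that host i receives only the i-th share.
shared_incomes = [share(x) for x in incomes]

# Every host adds up the shares it holds, without talking to the others.
host_totals = [sum(column) % MODULUS for column in zip(*shared_incomes)]

# Only the combined per-host results reveal the aggregate.
total_income = reconstruct(host_totals)
print(total_income)
```

Each individual share is a uniformly random number, so a single host learns nothing about any income, yet the aggregate can still be computed and published.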

The privacy-preserving solution was checked by the Estonian Data Protection Inspectorate. Their response was that our solution does not process Personally Identifiable Information (PII) within the meaning of the law. Furthermore, the Tax and Customs Board reviewed Sharemind’s source code to ensure that everything is performed according to the study plan.

Our study showed relations between higher education and higher income, but we found no relation between working during studies and not graduating on time. Instead, it turned out that Estonian students of all fields work an equal amount. Also, our data showed clearly the reduction of employment during the financial crisis in 2008. This allowed more personalized follow-up studies to be planned for finding the reasons why students quit.

The future of data integration

We showed that Sharemind can do statistics more accurately and privately than was possible before. This can lead to new applications that organizations have dreamed of, but have been unable to create. What if different states could combine data to detect welfare or tax fraud? Or consulting companies could ask their data providers for more information to sell higher quality reports?

Sharemind will rise to these challenges.