Differential privacy has become an important way for data scientists to learn from their data in aggregate while ensuring that the results do not allow any individual’s data to be distinguished or re-identified.

To help more researchers with their work, IBM has recently released the open-source Differential Privacy Library. The library “boasts a suite of tools for machine learning and data analytics tasks, all with built-in privacy guarantees,” according to Naoise Holohan, a research staff member on IBM Research Europe’s privacy and security team.

“What also sets our library apart is our machine learning functionality enables organizations to publish and share their data with rigorous guarantees on user privacy like never before,” said Holohan.

In an interview, Holohan explained that differential privacy has become so popular that for the first time in its 230-year history, the US Census will use differential privacy to keep the responses of citizens confidential when the data is made available.

Differential privacy allows data collectors to add mathematical noise to anonymize information, and IBM’s library is special because its machine learning functionality enables organizations to publish and share their data with rigorous guarantees on user privacy.
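To make the idea of "mathematical noise" concrete, here is a minimal sketch of the Laplace mechanism, one common way to achieve differential privacy for a counting query. It does not use IBM's library and is not taken from it; the function names, the sample data, and the choice of epsilon are all assumptions made for illustration, and a production system would need a hardened noise sampler rather than plain floating-point arithmetic.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a zero-centred Laplace distribution.

    The difference of two independent exponential draws, each with
    mean `scale`, follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records that satisfy `predicate`.

    Adding or removing one person changes a count by at most 1
    (sensitivity 1), so Laplace noise with scale 1/epsilon is used.
    Smaller epsilon means more noise and stronger privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    # Hypothetical data: ages of eight survey respondents.
    ages = [23, 35, 41, 29, 52, 67, 38, 44]
    noisy = private_count(ages, lambda age: age >= 40, epsilon=0.5)
    print(f"Noisy count of respondents aged 40+: {noisy:.2f}")
```

Each run prints a different value scattered around the true count, so no single answer reveals whether a particular person is in the data, while averages over large datasets remain useful.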

“We decided to build this library that, using existing packages in Python, allows you to build on top of them, and then you can do machine learning with differential privacy guarantees built-in. A lot of the commands you can execute in a single line of code, so it’s very user friendly. It’s easy to use and it can be integrated easily within scripts people have so there isn’t a lot of extra effort required.”

Holohan said the IBM repository is already being used extensively for experimentation and to see what effect differential privacy has on machine learning algorithms. Academic institutions and bloggers are using the software to show how differential privacy works, and he added that the library is being used internally at IBM to look at the impact of differential privacy on various applications.

“It has applicability to basically any application of data so that gives a very good opportunity to do a lot of work in a lot of different areas. We have focused on machine learning because the application of privacy-preserving protocols to machine learning fits very well and machine learning is very prevalent in any use of data,” he said.

“The next step is going to be allowing data scientists and analysts to be able to do a lot of statistical analysis easily with differential privacy, and our library is the first of a few steps along that path.”
