Takeaway: Companies need to be vigilant about feeding their machines clean data to avoid hackers poisoning their networks.
Artificial intelligence is everywhere, from facial recognition technology to weather forecasting. As we get better at training computers to build more sophisticated prediction models, hackers are developing stealthier methods to poison them.
But what is the threat?
It’s called data poisoning. Data poisoning is a form of manipulation that involves corrupting the information used to train machines. It’s a hard-to-trace way to bypass the defenses meant to protect artificial intelligence systems, and companies may not be equipped to tackle the challenge.
Data poisoning exploits the way machine learning works. Machine learning involves feeding a computer reams of data to train it to categorize information correctly. A computer might be fed 1,000 images of various types of animals, each correctly labeled by species and breed, before it is tasked with recognizing an image as a dog. The computer, however, doesn’t actually “know” that the image shown is that of a dog. It’s simply running a slew of statistical calculations in the background, based on past training, that allow it to make an accurate prediction.
Companies take the same approach with cybersecurity. They feed their machines with reams of data that distinguish good code from bad code to teach the machines what malicious software is. The machine is then able to catch malicious software when confronted with bad code.
Hackers can take advantage of this same technique. A savvy hacker could train a machine to recognize malicious code as harmless by labeling corrupted samples as good and slipping them into a larger batch of training data. A neural network could then treat that poisoned piece of code as harmless, allowing it to infect the system.
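To make the idea concrete, here is a minimal sketch of label-flipping poisoning. It is not taken from the original article: the two-number "features," the sample values, and the simple nearest-centroid classifier (standing in for a neural network) are all assumptions chosen to keep the example small.

```python
# Toy illustration of label-flipping poisoning.
# All data and names are hypothetical; a nearest-centroid classifier
# stands in for a real malware-detection model.

def train_centroids(samples):
    """Average the feature vectors for each label."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

# Correctly labeled training data (made-up feature values).
clean = [
    ((0.10, 0.20), "benign"), ((0.20, 0.10), "benign"), ((0.15, 0.25), "benign"),
    ((0.90, 0.80), "malicious"), ((0.80, 0.90), "malicious"), ((0.85, 0.95), "malicious"),
]

# The attacker slips in samples that look malicious but carry a "benign" label.
poison = [((0.90, 0.90), "benign")] * 6

target = (0.70, 0.70)  # the code the attacker wants waved through

clean_model = train_centroids(clean)
poisoned_model = train_centroids(clean + poison)

print("clean model:   ", predict(clean_model, target))
print("poisoned model:", predict(poisoned_model, target))
```

With the clean data, the target sits closest to the malicious samples and is flagged. Once the mislabeled samples are mixed in, the model's notion of "benign" shifts toward the attacker's code and the same target is classified as harmless.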
Having realized this potential vulnerability, companies are fighting back by attempting to ensure their data is “clean,” which means regularly checking that all the labels being fed into machines are accurate. In some cases, that means training systems on fewer samples so that every one of them can be verified. When dealing with artificial intelligence, sample size matters, but so does the quality of each sample. The more familiar companies are with their AI systems, the better equipped they are to handle these threats.
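One way such a label check might look in practice is sketched below. This is an illustrative assumption, not a method described in the article: it flags any training sample whose label disagrees with most of its nearest neighbors. The function name `flag_suspicious`, the value of `k`, and the sample data are hypothetical.

```python
# Sketch of a label-consistency audit: flag samples whose label disagrees
# with the majority of their k nearest neighbors. Data is hypothetical.

def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def flag_suspicious(samples, k=3):
    """Return indices of samples whose label is a minority among their k neighbors."""
    flagged = []
    for i, (features, label) in enumerate(samples):
        others = [s for j, s in enumerate(samples) if j != i]
        others.sort(key=lambda s: sq_dist(s[0], features))
        agree = sum(1 for _, neighbor_label in others[:k] if neighbor_label == label)
        if agree < k / 2:
            flagged.append(i)
    return flagged

samples = [
    ((0.10, 0.20), "benign"), ((0.20, 0.10), "benign"),
    ((0.15, 0.25), "benign"), ((0.25, 0.15), "benign"),
    ((0.90, 0.80), "malicious"), ((0.80, 0.90), "malicious"),
    ((0.85, 0.95), "malicious"), ((0.95, 0.85), "malicious"),
    ((0.85, 0.85), "benign"),  # index 8: looks malicious, labeled benign
]

suspects = flag_suspicious(samples)
print("samples to review by hand:", suspects)
```

A check like this only catches isolated mislabeled samples. If an attacker injects many similar poisoned samples at once, they vouch for one another as neighbors, which is one reason smaller, hand-verified datasets can be easier to keep clean.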
To learn more about data poisoning and all its various forms, read the full article: https://securityintelligence.com/articles/data-poisoning-ai-and-machine-learning/