The arrival of machine learning (ML) as a mainstream business capability has created a plethora of new cybersecurity risks that are still not fully appreciated, Thomas P. Scanlon, technical program manager at Carnegie Mellon University, told delegates at the ISC2 Security Congress in Nashville, Tennessee this week.
Although Scanlon’s background straddles cybersecurity and, more recently, ML, this work profile is still unusual. In today’s organizations it is far more likely that ML professionals will have a background in data science while cybersecurity people will have grown up with security systems and networking. This technical and cultural division could slow down understanding of the ways in which ML security is different and distinct. “We’ve got to bridge this gap from both sides,” said Scanlon.
His high-level message was that securing ML will require a deeper understanding of ML-specific risks that can’t be addressed through traditional approaches to cybersecurity or software development. He offered three examples of how ML has come unstuck with negative effects:
- In 2018, Amazon abandoned an internal recruiting tool developed to identify job candidates that turned out to be biased against women. This issue of ‘distributional shift’ was caused by the skewed training data the system’s models had been fed as the basis of their output.
- The now infamous 2016 example of Microsoft’s experimental Tay chatbot, designed to learn from its interactions with the public. Bombarded with offensive interactions, the chatbot quickly adopted the extreme language and attitudes of the material it was being sent.
- The numerous problems associated with self-driving cars, which Scanlon said were connected to incomplete testing based on optimistic assumptions.
Scanlon drew attention to important differences between traditional cyberattacks and ML attacks. For example, exfiltrating data is not an objective of ML attacks, which are concerned with interfering with or skewing data to alter the predictions the model makes. Similarly, persistence is not an ML attack objective in the way it would be in a conventional cyberattack.
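To make that contrast concrete, here is a minimal, self-contained sketch of one such data-skewing technique: label-flipping data poisoning. The synthetic dataset, the model choice and the 40% flip rate are all illustrative assumptions, not details from Scanlon’s talk.

```python
# Illustrative sketch: label-flipping data poisoning against a simple
# classifier. All data here is synthetic and the flip rate is arbitrary.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Clean training data: two well-separated clusters, one per class.
X = np.vstack([rng.normal(-2, 1, (200, 2)), rng.normal(2, 1, (200, 2))])
y = np.array([0] * 200 + [1] * 200)
clean_model = LogisticRegression().fit(X, y)

# The attacker's goal is not exfiltration but skewing predictions:
# relabel 40% of one class to drag the decision boundary off course.
ones = np.where(y == 1)[0]
flip = rng.choice(ones, size=int(0.4 * len(ones)), replace=False)
y_poisoned = y.copy()
y_poisoned[flip] = 0
poisoned_model = LogisticRegression().fit(X, y_poisoned)

# Evaluate both models on fresh, clean test data.
X_test = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y_test = np.array([0] * 100 + [1] * 100)
print("clean accuracy:   ", clean_model.score(X_test, y_test))
print("poisoned accuracy:", poisoned_model.score(X_test, y_test))
```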
Another important difference is that ML systems are often trained on public data (as in the Microsoft chatbot example), which potentially allows an attacker a way to manipulate a system. Perhaps most fundamental of all, testing ML systems is different from the traditional software testing that’s been around for decades.
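One way to see the testing difference is that ML tests assert statistical properties rather than exact outputs. The sketch below shows two pytest-style checks along those lines; the synthetic dataset, the accuracy floor and the agreement threshold are arbitrary assumptions for illustration, not values from the talk.

```python
# Sketch of how ML testing differs from traditional unit testing: we
# assert a statistical floor and a behavioural invariant, not exact values.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

def test_accuracy_floor():
    # A traditional test checks an exact output; an ML test checks a floor.
    assert model.score(X_test, y_test) >= 0.7  # example threshold

def test_perturbation_invariance():
    # Metamorphic check: negligible input noise should not flip predictions.
    rng = np.random.default_rng(0)
    noisy = X_test + rng.normal(scale=1e-6, size=X_test.shape)
    agreement = np.mean(model.predict(X_test) == model.predict(noisy))
    assert agreement >= 0.99

test_accuracy_floor()
test_perturbation_invariance()
print("both statistical checks passed")
```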
Secure MLOps
MLOps is the process of taking an ML model from experimental prototype to a production system used in deployed software. As with traditional software development, it is structured and repeatable. The pipeline comprises software and data but also the extra layer of the ML models themselves, which creates additional development complexity.
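As a rough illustration of that extra layer, the schematic sketch below fingerprints both the training data and the resulting model so that every run is tied to its exact inputs. All names here (fingerprint, run_pipeline, the manifest format) are hypothetical, not a specific MLOps tool’s API.

```python
# Schematic sketch: an MLOps pipeline must version data and trained
# models alongside the usual code artefacts. Everything is illustrative.
import hashlib
import json
import pickle

def fingerprint(blob: bytes) -> str:
    """Content hash, so a pipeline run can be tied to its exact inputs."""
    return hashlib.sha256(blob).hexdigest()

def run_pipeline(raw_data: bytes, train_fn):
    data_hash = fingerprint(raw_data)      # version the training data
    model = train_fn(raw_data)             # experimental -> production step
    model_blob = pickle.dumps(model)
    model_hash = fingerprint(model_blob)   # version the model artefact too
    manifest = json.dumps({"data": data_hash, "model": model_hash})
    return model_blob, manifest            # repeatable and auditable

# Toy usage: the "model" is just a dict derived from the raw bytes.
blob, manifest = run_pipeline(b"feature,label\n1,0\n", lambda d: {"n": len(d)})
print(manifest)
```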
Security issues facing MLOps include data poisoning and model manipulation or black-boxing, whereby an adversary can query the model publicly and work out how it is structured (or infer the data used to train it). Ultimately, adversaries can repurpose a model. For more detail on model attacks, Scanlon drew delegates’ attention to NIST’s Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations.
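A hedged sketch of that black-box threat: an attacker who can only observe a model’s predictions trains a surrogate that mimics it. The local “victim” model below stands in for a publicly queryable prediction API; the datasets, model choices and query budget are assumptions for illustration only.

```python
# Sketch of black-box model extraction: query the deployed model,
# record its answers, and fit a surrogate that approximates it.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=6, random_state=0)
victim = RandomForestClassifier(random_state=0).fit(X, y)  # "public" model

# The attacker never sees X or y -- only sends queries and logs answers.
rng = np.random.default_rng(1)
queries = rng.normal(size=(2000, 6))
stolen_labels = victim.predict(queries)

surrogate = DecisionTreeClassifier(random_state=0).fit(queries, stolen_labels)

# Measure how closely the stolen copy tracks the victim on fresh inputs.
probe = rng.normal(size=(500, 6))
agreement = np.mean(surrogate.predict(probe) == victim.predict(probe))
print(f"surrogate agrees with victim on {agreement:.0%} of fresh queries")
```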
How can MLOps protect itself?
For effective MLOps defence, Scanlon recommended encrypting data (avoiding the tendency to skip this stage because ML systems are seen as exceptions to normal rules) and using data versioning to keep track of the data used to train a model, together with data provenance: MLOps teams must know where the data came from and who had access to it. Finally, attention should be paid to data drift (where the data changes suddenly), which might require the model to be retrained from scratch.
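As an example of what a data-drift check might look like in practice, the sketch below compares one feature’s distribution in production traffic against the training set using a two-sample Kolmogorov-Smirnov test. The synthetic data and the 0.01 significance level are assumptions for illustration, not recommendations from the talk.

```python
# Minimal sketch of data-drift detection: compare a feature's production
# distribution against the training distribution and flag divergence.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
training_feature = rng.normal(loc=0.0, scale=1.0, size=5000)
production_feature = rng.normal(loc=0.6, scale=1.0, size=1000)  # drifted

stat, p_value = ks_2samp(training_feature, production_feature)
if p_value < 0.01:  # arbitrary example threshold
    print(f"drift detected (KS={stat:.3f}); consider retraining from scratch")
else:
    print("no significant drift detected in this feature")
```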
Scanlon offered some questions anyone involved in an ML project could ask, regardless of their technical background:

- Are public data sources being used?
- What sort of data validation will be carried out?
- Has any synthetic data been vetted for restricted or private data that might be revealed by an adversary’s prompt attacks?
- Is anomaly detection being used to detect suspicious data tampering (see the sketch below)?
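To illustrate that last question, here is a minimal sketch of screening incoming training records for tampering before they reach the model. IsolationForest is one common choice of detector, not one Scanlon named, and the synthetic data and contamination rate are guessed assumptions.

```python
# Sketch of anomaly detection on incoming training data: flag outliers
# for human review before they are allowed to influence the model.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
legitimate = rng.normal(0, 1, size=(1000, 3))
tampered = rng.normal(6, 0.5, size=(20, 3))   # injected outlier records
incoming = np.vstack([legitimate, tampered])

detector = IsolationForest(contamination=0.05, random_state=0).fit(incoming)
flags = detector.predict(incoming)            # -1 marks suspected anomalies
print(f"{np.sum(flags == -1)} records flagged for review before training")
```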
Scanlon concluded with a warning not to rely on traditional security to defend ML: “It is foolhardy to rely on the data and model being protected by broader IT protections. Assume your IT protections can be compromised.”
- ISC2 Security Congress is taking place until October 27, 2023 in Nashville, TN and virtually. More information and on-demand registration can be found here.
- ISC2 SECURE Washington, DC takes place in-person on December 1, 2023 at the Ronald Reagan Building and International Trade Center. The agenda and registration details are here.
- ISC2 SECURE Asia Pacific takes place in-person on December 6-7, 2023 at the Marina Bay Sands Convention Centre in Singapore. Find out more and register here.