From Attack to Defense: Towards AI Model Security Protection


Abstract

With the development of deep learning, AI models have made great progress in various fields such as computer vision and natural language processing. However, training a high-quality AI model remains a burdensome and costly task, requiring a well-designed network architecture, large amounts of data, and substantial computing power. Consequently, a well-trained AI model can be worth millions of dollars.

Usually, a deployed AI model exposes only an API interface and is isolated from users, so a remotely deployed model is typically considered safe and secure. However, in this talk we will show how easily a deployed AI model can be stolen through distillation, which uses the model's outputs as ground truth to train a surrogate model. The results show that the surrogate model achieves performance similar to the deployed one, so attackers can sell the stolen surrogate for profit. Such an attack underscores the importance of AI model copyright. Unfortunately, due to the nature of AI models, proving "ownership" and catching such a cyber thief can be especially hard.
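To illustrate the attack, the following is a minimal sketch of distillation against a black-box API, assuming only that the victim returns class probabilities for each query. The names query_victim_api, surrogate, and unlabeled_loader are hypothetical placeholders, not part of the talk's actual tooling.

```python
# Sketch of model stealing via distillation (PyTorch).
# query_victim_api(), surrogate, and unlabeled_loader are hypothetical
# placeholders; the attack only needs black-box access to the victim's
# output probabilities.
import torch
import torch.nn.functional as F

def distill_surrogate(query_victim_api, surrogate, unlabeled_loader,
                      epochs=10, lr=1e-3, temperature=2.0):
    """Train a surrogate model using the victim API's outputs as soft labels."""
    optimizer = torch.optim.Adam(surrogate.parameters(), lr=lr)
    surrogate.train()
    for _ in range(epochs):
        for images in unlabeled_loader:
            with torch.no_grad():
                # The victim API returns class probabilities for each query.
                teacher_probs = query_victim_api(images)      # shape (B, C)
            student_logits = surrogate(images)                # shape (B, C)
            # Match the student's softened distribution to the teacher's.
            loss = F.kl_div(
                F.log_softmax(student_logits / temperature, dim=1),
                teacher_probs,
                reduction="batchmean",
            ) * temperature ** 2
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return surrogate
```

With enough queries over a representative unlabeled dataset, the surrogate converges toward the victim's decision behavior without the attacker ever seeing the original training data or weights.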

To mitigate the threat posed by this attack, we introduce model watermarks, which serve as a copyright trap. The watermark is embedded into the model's outputs and is imperceptible to human eyes, so it is highly concealed and model stealers can hardly tell whether a model is protected by it. Predetermined model information (e.g., text or image identifiers) can be embedded in the watermark and later recovered by a well-trained extractor. If someone steals the model through distillation, the watermark is transferred into the surrogate model as well. The owner of the original model can then run the extractor on the outputs of a suspicious model to determine whether it is a derivative. Moreover, the proposed watermark is robust against common watermark-removal attacks, so it can reliably safeguard valuable AI models without being maliciously removed.
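As a rough illustration of the verification step, the sketch below compares the bits decoded from a suspicious model's outputs with the owner's registered identifier. The names extractor, suspicious_model, probe_loader, owner_bits, and the bit-error-rate threshold are all assumptions for illustration; the actual embedding and extraction details are covered in the talk.

```python
# Sketch of ownership verification with an output watermark (PyTorch).
# extractor, suspicious_model, probe_loader, and owner_bits are hypothetical
# placeholders; the extractor is assumed to have been trained alongside the
# watermark-embedding process so it can recover the hidden identifier.
import torch

def verify_ownership(extractor, suspicious_model, probe_loader,
                     owner_bits, ber_threshold=0.05):
    """Decode the hidden identifier from a suspicious model's outputs and
    compare it with the owner's registered bit string."""
    extractor.eval()
    suspicious_model.eval()
    total_errors, total_bits = 0, 0
    with torch.no_grad():
        for inputs in probe_loader:
            outputs = suspicious_model(inputs)             # e.g. output images
            decoded = (extractor(outputs) > 0.5).float()   # recovered bits
            total_errors += (decoded != owner_bits).sum().item()
            total_bits += decoded.numel()
    bit_error_rate = total_errors / total_bits
    # A low bit error rate suggests the suspicious model was distilled
    # from the watermarked model.
    return bit_error_rate <= ber_threshold, bit_error_rate
```

In this setup, an independently trained model should yield a bit error rate near chance, while a surrogate distilled from the watermarked model reproduces the identifier and falls below the threshold.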

To sum up, this talk will discuss the security of AI models from both the attack and the defense side. On the attack side, we will give an overview of how an AI model is created and deployed, and then show how to steal it with model distillation. On the defense side, we will detail our protection method based on the model watermark and share our experience with AI model copyright protection.

Publication
Power of Community 2021