NLP and others
15 August 2021
This is a survey of Adversarial Attacks papers, organized around the reviewers' comments, the authors' responses, and the Accept/Reject decisions on OpenReview.net.
All image credits in this post go to the original papers.
This survey covers the following papers, including both accepted and rejected submissions:
Some of the papers are surveyed in this post; the next post will cover the rest.
(More papers may be added.)
ICLR 2021 Spotlight
Xinshuai Dong, Anh Tuan Luu, Rongrong Ji, Hong Liu
The Idea: The proposed method, ASCC (Adversarial Sparse Convex Combination), captures adversarial word substitutions in the embedding space by modeling them inside the convex hull of the substitution words' vectors, with robustness as the goal. Using a convex hull satisfies three aspects: inclusiveness, exclusiveness, and optimization.
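To make the convex-hull idea concrete, here is a minimal PyTorch sketch (not the authors' implementation; function and variable names are hypothetical) of representing a substitution as a convex combination of the allowed substitution embeddings, with an entropy term encouraging sparse coefficients:

```python
import torch
import torch.nn.functional as F

def convex_combination(weight_logits, substitution_embeddings, entropy_coef=1.0):
    """Combine substitution-word embeddings with convex coefficients.

    weight_logits:           (num_subs,) unnormalized scores, optimized adversarially.
    substitution_embeddings: (num_subs, dim) embeddings of the allowed substitutions.
    Returns the combined embedding (a point inside the convex hull) and an
    entropy penalty that encourages sparse, near-one-hot coefficients.
    """
    weights = F.softmax(weight_logits, dim=-1)        # non-negative, sums to 1
    combined = weights @ substitution_embeddings      # stays inside the convex hull
    entropy = -(weights * (weights + 1e-12).log()).sum()
    return combined, entropy_coef * entropy
```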
One-sentence Summary of the Reviews: The idea is straightforward, and the experiments are well designed and quantitative.
ICLR 2021 Conference Withdrawn Submission
Liping Yuan, Xiaoqing Zheng, Yi Zhou, Cho-Jui Hsieh, Kai-Wei Chang, Xuanjing Huang
The Idea: This paper studies adversarial transferability across different NLP models, varying properties such as model architecture and size. Extensive experiments are conducted to evaluate which factors affect transferability the most.
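As a rough illustration of how cross-model transferability is typically measured (a generic sketch, not the paper's evaluation code; names are hypothetical), one can craft adversarial examples on a source model and report how often they also fool a target model:

```python
import torch

def transfer_success_rate(target_model, adv_inputs, labels):
    """Fraction of adversarial examples crafted on a *source* model that also
    fool the *target* model (inputs assumed to be already encoded as tensors)."""
    target_model.eval()
    with torch.no_grad():
        preds = target_model(adv_inputs).argmax(dim=-1)
    return (preds != labels).float().mean().item()
```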
One-sentence Summary of the Reviews: Some experimental settings are questionable, and the paper's conclusion is not clear.
ICLR 2021 Poster
Yige Li, Xixiang Lyu, Nodens Koren, Lingjuan Lyu, Bo Li, Xingjun Ma
The Idea: This paper presents an empirical study on erasing backdoors from CNNs via teacher-student alignment of attention maps, treating the poisoned model as the student and a fine-tuned copy of it as the teacher.
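A minimal sketch of an attention-distillation loss in this spirit (not the exact NAD objective; layer selection, normalization, and loss weighting are simplified, and names are hypothetical) could look like:

```python
import torch.nn.functional as F

def attention_map(feature):
    """Collapse a (batch, channels, H, W) feature map into a unit-norm spatial
    attention map by summing squared activations over channels."""
    att = feature.pow(2).sum(dim=1)                   # (batch, H, W)
    return F.normalize(att.flatten(1), p=2, dim=1)    # (batch, H*W), unit norm

def attention_distillation_loss(student_features, teacher_features):
    """Align the poisoned (student) model's attention maps with those of the
    fine-tuned (teacher) model, summed over the chosen layers."""
    return sum(F.mse_loss(attention_map(s), attention_map(t))
               for s, t in zip(student_features, teacher_features))
```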
One-sentence Summary of the Reviews: A well-written paper with sufficient experiments, but the behavior of NAD is not completely understood.
ICLR 2020 Poster
Min Du, Ruoxi Jia, Dawn Song
The Idea: This paper leverages the stability properties of differential privacy (DP) to improve outlier and novelty detection. Under the assumption that a well-trained model assigns higher loss to outliers, the paper gives a theoretical bound on how much this loss decreases when poisoned samples are present in the training set.
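The detection step can be pictured as scoring each sample by the per-sample loss of a model trained with DP (e.g., DP-SGD); below is a minimal, generic sketch of that scoring (hypothetical names, not the paper's code):

```python
import torch
import torch.nn.functional as F

def loss_based_outlier_scores(model, inputs, labels):
    """Per-sample cross-entropy losses of a (DP-)trained model, used as
    outlier scores: an unusually high loss suggests an outlier or novelty."""
    model.eval()
    with torch.no_grad():
        logits = model(inputs)
    return F.cross_entropy(logits, labels, reduction="none")
```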
One-sentence Summary of the Reviews: The paper is well-written but contributions are not substantial.
ICLR 2021 Conference Withdrawn Submission
Mingjie Sun, Siddhant Agarwal, J Zico Kolter
The Idea: The authors show that, with some post-processing analysis of a poisoned classifier, it is possible to construct effective alternative triggers against a backdoored classifier. Specifically, adversarial examples generated against models robustified with Denoised Smoothing often exhibit backdoor patterns.
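The attack step can be sketched as PGD against the denoiser-plus-classifier pipeline; this is a simplified l-infinity illustration with placeholder hyperparameters (not the paper's exact attack), after which the resulting perturbation is visually inspected for backdoor-like patterns:

```python
import torch
import torch.nn.functional as F

def pgd_against_denoised_model(denoiser, classifier, x, label,
                               eps=8 / 255, step_size=2 / 255, steps=40):
    """PGD-style perturbation of input x against classifier(denoiser(x));
    the returned perturbation is what gets inspected for backdoor patterns."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(classifier(denoiser(x + delta)), label)
        loss.backward()
        with torch.no_grad():
            delta += step_size * delta.grad.sign()   # ascend the loss
            delta.clamp_(-eps, eps)                  # stay in the l-inf ball
            delta.grad.zero_()
    return delta.detach()
```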
One-sentence Summary of the Reviews: The presented approach is mainly manual and needs human inspection.
Next Post: Survey of Adversarial Attacks from Accept/Reject Perspectives (2/2) (to be updated soon)