2D-Malafide: Adversarial Attacks Against Face Deepfake Detection Systems

Chiara Galdi, Michele Panariello, Massimiliano Todisco, Nicholas Evans
Digital Security Department, EURECOM, Sophia Antipolis, France

Abstract

We introduce 2D-Malafide, a novel and lightweight adversarial attack designed to deceive face deepfake detection systems. Building upon the concept of 1D convolutional perturbations explored in the speech domain, our method leverages 2D convolutional filters to craft perturbations which significantly degrade the performance of state-of-the-art face deepfake detectors. Unlike traditional additive noise approaches, 2D-Malafide optimises a small number of filter coefficients to generate robust adversarial perturbations which are transferable across different face images. Experiments, conducted using the FaceForensics++ dataset, demonstrate that 2D-Malafide substantially degrades detection performance in both white-box and black-box settings, with larger filter sizes having the greatest impact. Additionally, we report an explainability analysis using GradCAM which illustrates how 2D-Malafide misleads detection systems by altering the image areas used most for classification. Our findings highlight the vulnerability of current deepfake detection systems to convolutional adversarial attacks as well as the need for future work to enhance detection robustness through improved image fidelity constraints.

Index Terms:

deepfake detection, adversarial attacks, lightweight adversarial attacks, convolutional filters, image perturbations.

I Introduction

In recent years, deep learning-based image recognition systems have achieved remarkable success across various applications, from face recognition to autonomous driving [1, 2]. However, these systems are vulnerable to adversarial attacks, namely deliberate manipulations designed to deceive the model [3, 4]. Adversarial noise can typically be applied with subtle or seemingly insignificant perturbations to pixel values [5], involving even only small portions of the image. The perturbations are specially crafted to exploit model vulnerabilities and provoke erroneous outputs. Even when the perturbed image is visually indistinguishable from the original, the influence upon the model output can be drastic.

Most adversarial attacks involve additive noise, where image-specific perturbations are learned and directly added [6]. Fortunately, these approaches are unsuitable for real-time implementation and exhibit high sensitivity to the specific input image. Typically, these methods are trained and tested using the same set of deepfake data, with no assurances of effectiveness against unseen deepfakes — a property often referred to as generalisability. Some adversarial attacks, whether additive [7] or involving spatial transformations [8], have partially solved the problem of generalisation but come at the cost of high complexity.

In this work, we propose the first adversarial attack which attempts to fulfil the generalisability property through convolutive noise while still being computationally lightweight. The former goal is met by optimising the adversarial perturbation over multiple samples. The latter is achieved by reducing the number of learnable parameters thanks to simple, yet effective modelling choices.

Building on previous work named Malafide [9], which explored adversarial perturbation attacks against voice anti-spoofing solutions, we have tailored and implemented a novel adversarial attack named 2D-Malafide against image deepfake detection systems. This technique allows the attack to be mounted independently of the specific input image and requires the optimisation of only a small number of filter coefficients. While the attack is agnostic to the type of classifier and image, e.g. face, fingerprint, or iris images, in this paper we report its application specifically to face images and face deepfake detection.

Our experiments demonstrate that 2D-Malafide significantly degrades the performance of recent face deepfake detectors. The attack remains effective in both white-box settings, where the filter is specifically trained to manipulate a particular detector model, and black-box settings, and hence poses a substantial threat to the reliability of such detection systems.

II Related Work

The concept of adversarial attacks against neural networks was originally introduced in [10, 5] in the context of image classification tasks. The term usually refers to the introduction of perturbations to the input image of a neural network so as to manipulate the output or decision. Such perturbations can be crafted by optimising the pixel values of the input image via a gradient descent-based technique to maximise the output probability of an arbitrary, incorrect class.

Adversarial attacks have since been explored in a wide variety of different domains, including deepfake detection. Early investigations showed that deepfakes can be rendered undetectable by deepfake detection algorithms using specially crafted adversarial perturbations [11, 3, 12, 13]. However, these studies focused on crafting individual adversarial perturbations for each deepfake sample, a computationally intensive process.

More recent adversarial attack techniques have since been proposed to overcome this issue. The authors of [14] proposed the use of generative adversarial networks (GANs) [15] to produce adversarial attacks for arbitrary deepfake samples. In [16], adversarial perturbations are modelled as a linear combination of image transformations whose weights are optimised across multiple deepfake images in order to minimise the chances of detection. Using a similar objective function, the work in [17] demonstrates how a video deepfake detection system can be manipulated by using a single layer of additive noise with bounded amplitude applied to each image frame.

To the best of our knowledge, the only work that explores the generation of generalisable adversarial perturbations against deepfake detectors is [18]. The authors propose a GAN-based technique to produce shadows which are introduced to an image deepfake to conceal generated artefacts. Nonetheless, this technique involves the training of two generative neural networks and requires considerable computing capabilities.

[Fig. 1: Graphical depiction of the 2D-Malafide training procedure, shown for an example attack (FaceShifter).]

III 2D-Malafide

In this section we describe the adaptation and implementation of 2D-Malafide for adversarial attacks against face deepfake detection (FDD) systems.

Let $\mathbf{P}^{(a)} = \{\mathbf{p}_1^{(a)}, \mathbf{p}_2^{(a)}, \dots, \mathbf{p}_N^{(a)}\}$ be a set of deepfake/spoofed images generated by algorithm $a$. Each image is designed to deceive a deepfake detection system to increase the likelihood of false accept decisions. Let $\operatorname{FDD}(\mathbf{I}) = s(y \mid \mathbf{I})$ be a deepfake detector model which assigns a score $y$ to image $\mathbf{I}$, where higher scores reflect greater support for the bona fide class and lower scores for the deepfake class. For spoofed images $\mathbf{p}_i^{(a)}$, $\operatorname{FDD}(\mathbf{p}_i^{(a)})$ should hence produce low scores. 2D-Malafide attacks involve the optimisation of a 2D linear time-invariant (LTI), non-causal filter. The coefficients are optimised to provoke the misclassification of deepfake images as bona fide. The 2D LTI, $L \times L$ filter $\mathbf{m}^{(a)}$ is designed to maximise $\operatorname{FDD}(\mathbf{p}_i^{(a)} * \mathbf{m}^{(a)})$, where $*$ denotes the 2D convolution operator. In the case of several different deepfake algorithms $a_1 \dots a_K$, an attacker can optimise an equivalent number of filters $\mathbf{m}^{(a_1)} \dots \mathbf{m}^{(a_K)}$. The filter should then be tuned to counter the reliance of the FDD system upon attack-specific artefacts. Filter coefficients $\mathbf{m}^{(a)}$ can be optimised with conventional gradient descent using the set of spoofed images $\mathbf{P}^{(a)}$. The objective function is given by

$$\max_{\mathbf{m}^{(a)}} \sum_{i} \operatorname{FDD}\left(\mathbf{p}_i^{(a)} * \mathbf{m}^{(a)}\right) \qquad (1)$$

A graphical depiction of the training procedure is shown in Fig. 1 for an attack $a =$ FaceShifter. The filter is optimised independently for each attack so as to manipulate the behaviour of a common FDD.
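To make the optimisation concrete, the following PyTorch sketch illustrates how an attack-specific filter could be trained according to (1). It is a minimal illustration rather than the released implementation: `fdd_model` (a frozen detector returning one bona fide score per image) and `spoof_loader` (batches of deepfake images from $\mathbf{P}^{(a)}$) are assumed placeholder names, and a single filter is assumed, for simplicity, to be shared across the three colour channels.

```python
import torch
import torch.nn.functional as F

# Assumed placeholders (not from the released code):
#   fdd_model    - pre-trained FDD returning one bona fide score per image
#   spoof_loader - DataLoader yielding batches of deepfake images (B, 3, H, W)

for p in fdd_model.parameters():
    p.requires_grad_(False)               # FDD weights are frozen

L = 27                                    # filter side length (L x L)
filt = torch.zeros(1, 1, L, L)            # 2D LTI filter m^(a)
filt[0, 0, L // 2, L // 2] = 1.0          # start from an identity (Dirac) filter
filt.requires_grad_(True)

optimizer = torch.optim.Adam([filt], lr=1e-3)   # illustrative learning rate

for epoch in range(100):                  # upper bound on the number of epochs
    for spoof_batch in spoof_loader:
        # share the single filter across the three colour channels
        kernel = filt.expand(3, 1, L, L)
        filtered = F.conv2d(spoof_batch, kernel, padding=L // 2, groups=3)

        # Eq. (1): maximise the bona fide score assigned to filtered spoofs
        scores = fdd_model(filtered)      # higher score = more bona fide
        loss = -scores.sum()

        optimizer.zero_grad()
        loss.backward()                   # gradients reach only the filter coefficients
        optimizer.step()
```

Starting from an identity (Dirac) filter, an assumption made here for illustration, means the search begins from the unmodified deepfake and gradually drifts towards coefficients which maximise the detector score.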

Without constraints, 2D-Malafide filtering can cause excessive image degradation. For detection settings in the absence of a human observer, this may have little consequence. However, where the FDD system is deployed alongside other systems, the distortion introduced to compromise the FDD system might also interfere with the behaviour of any other auxiliary system, e.g. an automatic face recognition system. In this case, for instance, it might even improve its resistance to attack, e.g. if image quality is significantly degraded.

Accordingly, $\mathbf{m}^{(a)}$ should be constrained to balance the maximisation of (1) and the preservation of image fidelity, e.g. clarity, detail, or key features. This can be achieved by tuning the filter size $L \times L$. Larger filters allow for greater manipulation and stronger attacks but can also introduce greater distortion. Conversely, smaller filters can be configured so that they introduce less distortion at the expense of a weaker attack. We apply image normalisation after filtering in order to ensure that pixel values do not surpass the maximum quantisation level.
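The helper below is a minimal sketch of applying a trained filter at attack time, assuming images with pixel values in [0, 1] and the illustrative function name `apply_malafide`; min-max rescaling is shown here as one simple choice for the normalisation step.

```python
import torch
import torch.nn.functional as F

def apply_malafide(image: torch.Tensor, filt: torch.Tensor) -> torch.Tensor:
    """Convolve one (3, H, W) image with a (1, 1, L, L) filter and rescale the
    result to [0, 1]. Min-max rescaling is one simple way to keep pixel values
    within the maximum quantisation level."""
    L = filt.shape[-1]
    kernel = filt.expand(3, 1, L, L)                       # same filter per channel
    out = F.conv2d(image.unsqueeze(0), kernel, padding=L // 2, groups=3)
    out = (out - out.min()) / (out.max() - out.min() + 1e-8)
    return out.squeeze(0)
```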

IV Experimental Setup

All experiments were conducted using the FaceForensics++ (FF++) dataset [19]. It contains 1000 bona fide videos in addition to 5000 corresponding fakes generated with 5 different algorithms. The first two are computer graphics-based approaches. Face2Face [20] is a facial reenactment system which transfers expressions from a source video to a target video while retaining the target face identity. FaceSwap [21] transfers the face region from a source to a target video using facial landmarks to fit a 3D model which is then backprojected, blended, and colour corrected. There are three deep learning-based approaches. The first, Deepfakes, was implemented using the open-source implementation deepfakes faceswap (https://github.com/deepfakes/faceswap) and requires training with a pair of videos of source and target subjects. The second, NeuralTextures [22], learns a neural texture of the target person using a photometric reconstruction loss combined with an adversarial loss for training. The last is the two-stage face-swapping method FaceShifter [23], which uses a pair of input images (a source for identity and a target for attributes like pose and expression) and a two-stage framework (AEINet and HEARNet) for high-fidelity face swaps.

Although the FF++ dataset contains videos, the selected FDD systems operate on individual frames; hence, in the remainder of this paper, mentions of the dataset refer to the collection of frames extracted from FF++ videos. The attacker is assumed to have access only to the test partition of the dataset. Thus, the FF++ test partition was used for training and testing 2D-Malafide attacks. Attack-specific filters were trained according to (1), using subsets of FF++ for each deepfake method. The FF++ test partition was split into 70% for training (Part1) and 30% for testing (Part2), with 1399 images in Part1 and 599 images in Part2. 2D-Malafide filters were trained using Part1 and tested using Part2. This setup simulates offline filter training and online attacks. 2D-Malafide filters were trained using only deepfake images.
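The partitioning itself is straightforward; the snippet below is a minimal sketch, assuming a list of per-attack frame paths (`frame_paths` is an illustrative name) and using scikit-learn purely for convenience.

```python
from sklearn.model_selection import train_test_split

# frame_paths: list of deepfake frame file paths for one FF++ attack type (assumed)
part1, part2 = train_test_split(frame_paths, train_size=0.7, random_state=0)
# part1 (70%) is used to optimise the 2D-Malafide filter, part2 (30%) to evaluate it
```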

Each attack-specific 2D-Malafide filter is trained using the Adam algorithm. The learning rate and weight decay are tuned separately for each FDD system. The maximum number of epochs is set to 100 since, for all but a single experiment, training reaches the stop condition before 100 epochs, where the stop condition is defined by an equal error rate (EER) in excess of 50%. The resulting filter is then applied to Part2 for evaluation. A batch size of 32 was chosen because it was suitable for the GPUs used for our experiments. During optimisation of 2D-Malafide, the weights of the FDD pre-trained models are frozen. We explored different filter sizes $L = 3, 9, 27, 81$ in order to analyse the impact on performance. Our implementation is available as open-source and can be used to reproduce our results (https://github.com/eurecom-fscv/2D-Malafide).
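Since the stop condition is expressed in terms of the EER, a small helper of the following kind can be used to monitor it during filter training. This is an illustrative sketch rather than the toolkit's evaluation code: it assumes arrays of detector scores for bona fide and (filtered) spoof images, with higher scores indicating the bona fide class.

```python
import numpy as np

def compute_eer(bona_scores: np.ndarray, spoof_scores: np.ndarray) -> float:
    """Equal error rate via a simple threshold sweep over all observed scores;
    higher scores indicate the bona fide class."""
    thresholds = np.sort(np.concatenate([bona_scores, spoof_scores]))
    far = np.array([(spoof_scores >= t).mean() for t in thresholds])  # false accept rate
    frr = np.array([(bona_scores < t).mean() for t in thresholds])    # false reject rate
    idx = np.argmin(np.abs(far - frr))
    return float((far[idx] + frr[idx]) / 2)

# Example use as a stop condition during filter training:
# stop = compute_eer(bona_scores, spoof_scores) > 0.5
```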

To determine the effectiveness of the adversarial filter attack we used the following two FDD systems.

CADDM [24] (https://github.com/megvii-research/CADDM) is a deepfake detection system developed to address the problem of Implicit Identity Leakage. The authors observed that deepfake detection models supervised using only binary labels are sensitive to identity. Thus, they propose a method, termed an ID-unaware Deepfake Detection Model, to reduce the influence of the identity representation. This is achieved by guiding the model to focus on local rather than global (whole image) features. Intuitively, by forcing the model to focus only on local areas of the image, less attention will be paid to global identity information.

Self-Blended Images (SBIs) [25] (https://github.com/mapooon/SelfBlendedImages) is a deepfake detection system which leverages training data augmentation to improve generalisability. The key idea behind SBIs is that the use of more general and barely recognisable fake samples encourages classifiers to learn generic and robust representations without overfitting to manipulation-specific artefacts. Fake samples are generated by blending pairs of pseudo source and target images, obtained using different image augmentation transformations, thereby increasing the difficulty of the face forgery detection task and encouraging the learning of more generalisable models.

The implementations of both CADDM and SBIs used in this work support the use of different backbone architectures. For our experiments, both methods use EfficientNet convolutional neural networks, the only difference being that we use efficientnet-B3 for CADDM, but efficientnet-B4 for SBIs. Models pre-trained using the FF++ training dataset are used for both methods and are available on the respective GitHub repositories.

TABLE I: EER (%) for the CADDM (C) and SBI (S) FDD systems, without and with 2D-Malafide filtering, under (W)hite-box and (B)lack-box settings.

Baseline deepfake detection systems - CADDM (C) / SBI (S)
Attack type    | Deepfakes     | Face2Face     | FaceShifter   | FaceSwap      | NeuralTextures
FDD            | C      S      | C      S      | C      S      | C      S      | C      S
No filter      | 0.00   0.71   | 1.34   1.43   | 1.34   7.14   | 0.67   1.43   | 2.50   5.00

2D-Malafide trained on CADDM and tested on CADDM ((W)hite box) / SBI ((B)lack box)
Filter size    | W      B      | W      B      | W      B      | W      B      | W      B
3×3            | 3.17   6.51   | 2.83   5.33   | 2.83   6.34   | 4.34   9.68   | 4.84   6.34
9×9            | 3.17   7.34   | 7.50   8.84   | 6.49   4.66   | 8.68   9.02   | 6.68   7.68
27×27          | 46.41  8.01   | 49.83  7.16   | 50.17  7.16   | 46.41  7.68   | 51.92  6.01
81×81          | 47.08  7.34   | 55.50  10.33  | 64.00  2.16   | 48.08  7.68   | 62.10  4.51

2D-Malafide trained on SBI and tested on SBI ((W)hite box) / CADDM ((B)lack box)
Filter size    | W      B      | W      B      | W      B      | W      B      | W      B
3×3            | 6.18   3.17   | 13.17  2.83   | 11.00  2.83   | 8.01   3.67   | 13.86  4.51
9×9            | 13.86  1.34   | 28.83  1.50   | 34.17  0.67   | 31.39  2.00   | 33.06  2.84
27×27          | 6.85   2.01   | 40.17  2.83   | 43.34  0.67   | 39.40  3.34   | 45.24  3.50
81×81          | 29.05  3.17   | 26.67  2.83   | 45.99  2.83   | 30.05  5.01   | 29.22  3.84
[Fig. 2: Example bona fide images, the corresponding attacked images, and the same attacks after application of 2D-Malafide filters of four different sizes, for the CADDM FDD system.]

V Experimental Results

Results presented in Table I show EER values for the CADDM and SBI FDD systems with and without the application of 2D-Malafide filters under white-box (tested using the same countermeasure used for 2D-Malafide training) and black-box (tested using an unseen countermeasure) settings. Results are shown separately for the baseline FDD systems (top block), then for 2D-Malafide attacks trained using CADDM (middle) and SBI (bottom). Baseline FDD results show detection error rates for the five different attack types.

For the CADDM white-box setting (denoted W in Table I), the application of 2D-Malafide filters leads to a significant increase in EER, especially with larger filter sizes (27×27 and 81×81). This indicates a substantial degradation in FDD performance, demonstrating the effectiveness of the adversarial filters in deceiving the detection system. For the corresponding black-box setting, for which the filter is trained using CADDM but tested using SBI, results show that most filters provoke an increase in the baseline EER. However, in some cases filtering instead reduces the EER, indicating that it made it easier for the FDD system to detect the underlying attack.

For the SBI white-box setting, the 2D-Malafide filters again lead to notable increases in the EER, particularly for the 27×27 filter size. We note that, for the 81×81 filter, the EERs decrease slightly, showing that the largest filter size is less effective for SBI than for CADDM. For the corresponding black-box setting, filtering generally increases the baseline EER. However, the impact is less pronounced compared to CADDM, indicating that adversarial training performed using SBI does not generalise well.

Overall, results indicate that FDD systems are vulnerable to 2D-Malafide attacks, with the greatest impact observed under white-box settings. The impact varies with filter size. Larger filters (27×27 and 81×81) tend to cause the most significant degradation in detection performance, particularly for CADDM. Under black-box settings, while filtering generally provokes an increase in error rates, there are instances where detection performance improves, suggesting that adversarial filtering does not always generalise well to unseen detectors.

Last, Fig. 2 shows a comparison of bona fide images, the corresponding attacks, and the same attacks after application of four different 2D-Malafide filters, for the CADDM FDD system. For smaller filter sizes, the face is still recognisable even if the colours are unnatural. For larger filters, the face is significantly distorted or even unrecognisable. This finding in itself highlights a critical limitation of face deepfake detection systems: they can be compromised so easily with images which do not even resemble natural faces. This raises concerns about the robustness of such systems when dealing with altered or degraded images, with obvious implications for both the security and reliability of the technology.

[Fig. 3: Averaged GradCAM heatmaps for the CADDM and SBI detectors, shown for the Deepfakes 3×3 and FaceShifter 81×81 attacks and for the bona fide, spoof, and spoof + 2D-Malafide categories.]

VI Explainability Analysis

In order to gain deeper insights into the impact of 2D-Malafide filtering upon the deepfake detectors, we also report an explainability analysis. We report the GradCAM [26] heatmaps for a pair of different attacks and filter sizes when using the CADDM and SBI detectors, specifically Deepfakes 3×3 and FaceShifter 81×81. GradCAM is applied to each image in the test set and for each category: bona fide, spoof, and spoof + 2D-Malafide. The resulting heatmaps are averaged to show predominant activation patterns.
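For reference, the heatmap averaging can be reproduced with the pytorch-grad-cam library cited above [26]. The sketch below is an illustration under assumptions: `fdd_model` denotes the detector, `last_conv_block` is a placeholder for the final convolutional block of its backbone, `image_loader` yields the preprocessed test images of one category, and class index 0 stands for the label being explained.

```python
import numpy as np
from pytorch_grad_cam import GradCAM
from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget

# Assumed placeholders: fdd_model (the detector), last_conv_block (final
# convolutional block of its backbone), image_loader (preprocessed test images
# of one category), class index 0 (label being explained).
cam = GradCAM(model=fdd_model, target_layers=[last_conv_block])

heatmaps = []
for batch in image_loader:                                # (B, 3, H, W) tensors
    maps = cam(input_tensor=batch,
               targets=[ClassifierOutputTarget(0)] * batch.shape[0])
    heatmaps.append(maps)                                 # (B, H, W) numpy arrays

# average over the whole set to expose the predominant activation patterns
mean_heatmap = np.concatenate(heatmaps, axis=0).mean(axis=0)
```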

The heatmaps in Fig. 3 indicate the areas of the face images where the model focuses its attention according to the input label, hence revealing features relevant to either the bona fide or spoof classes. The first row of Fig. 3 shows results for CADDM and the Deepfakes 3×3 attack. The left-most heatmap, in column (a), shows that significant facial landmarks which correspond to the contours of the face are most informative for the classification of images as bona fide. In contrast, in the case of fake images, shown in column (b), facial landmarks corresponding to areas of the eyes and eyebrows are most informative. Visual inspections reveal that these areas often correspond to visible artefacts, e.g. double eyebrows, resulting from the application of Deepfakes.

Heatmaps in columns (c) and (d) display results after application of 2D-Malafide to fake face images when using the bona fide and spoof labels respectively. Whereas heatmaps (a) and (c) exhibit similar patterns, heatmaps (b) and (d) are notably different. 2D-Malafide hides the fake image artefacts upon which the detector relies, namely those in the central part of the face. There are no obvious activations in this area in heatmap (d), which is why the model is misled into classifying the fake as bona fide.

The second row of Fig. 3 shows results for FaceShifter 81×81 attacks, again for CADDM. The heatmap in column (c) shows that the CADDM model focuses on the sides of the face image, but with greater intensity than for bona fide images. Heatmap (d) remains similar to that for the Deepfakes 3×3 attack. Not only does 2D-Malafide hide fake artefacts, it also provokes a greater rate of misclassification of fake images by causing the detector to focus more on the sides of the face. This finding accounts for results reported in Table I, in particular those cases for which 2D-Malafide is more effective with the largest filter size. The dominant spot to the upper right might be due to the Multi-scale Detection Module (MSDM) of the CADDM architecture. The MSDM uses predefined anchor boxes which are tiled across the image. The level of activation in this area might correspond to the location of the last analysed anchor box.

Heatmaps in rows 3 and 4 of Fig. 3 show results for the SBI detector. For the Deepfakes 3×3 attack, the detector focuses on small parts of bona fide images at different positions, hence the seemingly flat heatmap. In contrast, in the case of fakes, the model focuses predominantly on central areas of the face, albeit in a less localised manner compared to CADDM. After application of 2D-Malafide filtering, there are few differences between results for bona fide images (a) and filtered bona fide images (c), and also between those for fakes (b) and filtered fakes (d). However, a closer look reveals how attention for attacks without filtering, shown in column (b), is more concentrated to the bottom left of the central part of the face. Instead, for fake images processed by 2D-Malafide, attention is concentrated more to the top right, and more so for FaceShifter 81×81 attacks.

VII Conclusions

In this article we introduce 2D-Malafide, an adversarial attack which uses 2D convolutional filtering to deceive face deepfake detection systems. The attack significantly increases the EER of state-of-the-art deepfake detectors in both white-box and black-box settings and highlights the vulnerability of current FDD systems to such attacks. Larger filters (27×27 and 81×81) cause substantial performance degradation. Moreover, the generalisability of 2D-Malafide ensures robustness across various image inputs, making for a versatile threat. Colour information is the first to be impacted by the application of 2D-Malafide, showing that the FDDs considered in this work fail to recognise simple, even unnatural, changes in colour.

GradCAM explainability analysis reveals that 2D-Malafide misleads FDD systems by altering the areas of an image they use for classification, thereby increasing false acceptance rates. Attack success varies across different FDD systems, indicating some level of generalisability but also a dependency on the specific architecture.

The results emphasise the need for comprehensive and diverse training datasets to improve FDD robustness. Future research should focus on enhanced image fidelity constraints, including colour consistency, to counter such adversarial attacks. Overall, 2D-Malafide demonstrates the critical need for ongoing advancements in FDD technology to ensure the security and reliability of deepfake detection systems.

References

  • [1] V. M. Opanasenko, S. K. Fazilov, O. N. Mirzaev, and S. S. u. Kakharov, “An ensemble approach to face recognition in access control systems,” Journal of Mobile Multimedia, vol. 20, pp. 749–768, May 2024.
  • [2] J. Janai, F. Guney, A. Ranjan, M. J. Black, and A. Geiger, “Computer Vision for Autonomous Vehicles: Problems, Datasets and State-of-the-Art,” Foundations and Trends in Computer Graphics and Vision, vol. 12, no. 1–3, pp. 1–308, 2020.
  • [3] A. Gandhi and S. Jain, “Adversarial perturbations fool deepfake detectors,” in 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8, 2020.
  • [4] F. Vakhshiteh, A. Nickabadi, and R. Ramachandra, “Adversarial attacks against face recognition: A comprehensive study,” IEEE Access, vol. 9, pp. 92735–92756, 2021.
  • [5] I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” in 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7–9, 2015, Conference Track Proceedings (Y. Bengio and Y. LeCun, eds.), 2015.
  • [6] R. Ambati, N. Akhtar, A. Mian, and Y. S. Rawat, “PRAT: Profiling adversarial attacks,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3667–3676, 2023.
  • [7] H. Stanly, M. S. S., and R. Paul, “A review of generative and non-generative adversarial attack on context-rich images,” Engineering Applications of Artificial Intelligence, vol. 124, p. 106595, 2023.
  • [8] Y. Zhang, W. Ruan, F. Wang, and X. Huang, “Generalizing universal adversarial attacks beyond additive perturbations,” in 2020 IEEE International Conference on Data Mining (ICDM), pp. 1412–1417, 2020.
  • [9] M. Panariello, W. Ge, H. Tak, M. Todisco, and N. Evans, “Malafide: a novel adversarial convolutive noise attack against deepfake and spoofing detection systems,” in Proc. INTERSPEECH 2023, pp. 2868–2872, 2023.
  • [10] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. J. Goodfellow, and R. Fergus, “Intriguing properties of neural networks,” in 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14–16, 2014, Conference Track Proceedings (Y. Bengio and Y. LeCun, eds.), 2014.
  • [11] N. Carlini and H. Farid, “Evading deepfake-image detectors with white- and black-box attacks,” in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 2804–2813, 2020.
  • [12] S. Hussain, P. Neekhara, M. Jere, F. Koushanfar, and J. McAuley, “Adversarial deepfakes: Evaluating vulnerability of deepfake detectors to adversarial examples,” in 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), Los Alamitos, CA, USA, pp. 3347–3356, IEEE Computer Society, Jan. 2021.
  • [13] S. Jia, C. Ma, T. Yao, B. Yin, S. Ding, and X. Yang, “Exploring frequency adversarial attacks for face forgery detection,” in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4093–4102, 2022.
  • [14] B. Fan, S. Hu, and F. Ding, “Synthesizing black-box anti-forensics deepfakes with high visual quality,” in ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4545–4549, 2024.
  • [15] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial networks,” Commun. ACM, vol. 63, pp. 139–144, Oct. 2020.
  • [16] Y. Hou, Q. Guo, Y. Huang, X. Xie, L. Ma, and J. Zhao, “Evading deepfake detectors via adversarial statistical consistency,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 12271–12280, June 2023.
  • [17] P. Neekhara, B. Dolhansky, J. Bitton, and C. C. Ferrer, “Adversarial threats to deepfake detection: A practical perspective,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pp. 923–932, June 2021.
  • [18] J. Liu, M. Zhang, J. Ke, and L. Wang, “AdvShadow: Evading deepfake detection via adversarial shadow attack,” in ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4640–4644, 2024.
  • [19] A. Rossler, D. Cozzolino, L. Verdoliva, C. Riess, J. Thies, and M. Nießner, “FaceForensics++: Learning to detect manipulated facial images,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1–11, 2019.
  • [20] J. Thies, M. Zollhöfer, M. Stamminger, C. Theobalt, and M. Nießner, “Face2Face: real-time face capture and reenactment of RGB videos,” Commun. ACM, vol. 62, pp. 96–104, Dec. 2018.
  • [21] M. Kowalski, “GitHub repository - MarekKowalski/FaceSwap,” Aug. 2024.
  • [22] J. Thies, M. Zollhöfer, and M. Nießner, “Deferred neural rendering: image synthesis using neural textures,” ACM Trans. Graph., vol. 38, July 2019.
  • [23] L. Li, J. Bao, H. Yang, D. Chen, and F. Wen, “Advancing high fidelity identity swapping for forgery detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020.
  • [24] S. Dong, J. Wang, R. Ji, J. Liang, H. Fan, and Z. Ge, “Implicit identity leakage: The stumbling block to improving deepfake detection generalization,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3994–4004, June 2023.
  • [25] K. Shiohara and T. Yamasaki, “Detecting deepfakes with self-blended images,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 18720–18729, 2022.
  • [26] J. Gildenblat and contributors, “PyTorch library for CAM methods,” https://github.com/jacobgil/pytorch-grad-cam, 2021.