Identifying whether a phone call comes from VoIP (Voice over Internet Protocol) is a challenging but under-investigated audio forensic problem. As shown in a previous study, existing feature-based methods do not work well. In this paper, we propose a robust data-driven approach, called CNN-MLS (convolutional neural network based multi-domain learning scheme), to distinguish VoIP calls from mobile phone calls. To better expose the differences between VoIP and mobile phone calls, we first apply high-pass filtering to the data and then extract deep features from both the temporal and the spectral domain. Two CNN architectures are designed to accept data from the respective domains, and techniques such as auxiliary classifiers and individual subnet training are used to accelerate network convergence. The deep features are finally fused in a classification module that identifies the phone call type. The proposed method is evaluated on the VPCID (VoIP Phone Call Identification Database) dataset under various testing conditions. We pay particular attention to tests on data from a source that does not match the training sources. Experimental results show that our method achieves satisfactory accuracy, higher than that of existing methods, on two-second-long inputs, implying that an alert may be activated shortly after a VoIP call is made.
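
The preprocessing described in the abstract (high-pass filtering, then one temporal-domain and one spectral-domain view of the same clip) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the sample rate, the cutoff frequency, the brick-wall FFT filter, the frame sizes, and all function names are assumptions chosen for illustration, and the CNN subnets themselves are omitted. Only the two-second clip length comes from the abstract.

```python
import numpy as np

SAMPLE_RATE = 8000   # assumption: narrowband telephone speech
CLIP_SECONDS = 2     # the abstract reports results on two-second inputs
CUTOFF_HZ = 1000.0   # assumption: illustrative cutoff, not taken from the paper


def high_pass(signal, sample_rate=SAMPLE_RATE, cutoff_hz=CUTOFF_HZ):
    """Zero the FFT bins below cutoff_hz (a crude brick-wall high-pass filter)."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    spectrum[freqs < cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))


def log_spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude short-time spectrum, shape (num_frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)
    num_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(num_frames)
    ])
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-8)


def two_domain_inputs(clip):
    """Return (temporal, spectral) views of a clip for two hypothetical subnets."""
    filtered = high_pass(clip)
    return filtered, log_spectrogram(filtered)


if __name__ == "__main__":
    t = np.arange(SAMPLE_RATE * CLIP_SECONDS) / SAMPLE_RATE
    clip = np.sin(2 * np.pi * 200 * t) + np.sin(2 * np.pi * 2000 * t)
    temporal, spectral = two_domain_inputs(clip)
    print(temporal.shape, spectral.shape)
```

In a scheme like the one the abstract describes, each view would feed its own CNN and the resulting deep features would be fused by a classifier; a real system would also likely use a proper filter design instead of zeroing FFT bins.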

Huang, Y., Li, B., Barni, M., Huang, J. (2020). Identification of VoIP Speech with Multiple Domain Deep Features. IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 15, 2253-2267 [10.1109/TIFS.2019.2960635].

Identification of VoIP Speech with Multiple Domain Deep Features

Barni M.;
2020-01-01



Use this identifier to cite or link to this document: https://hdl.handle.net/11365/1105745