Identification of VoIP Speech with Multiple Domain Deep Features

IRIS

Identifying whether a phone call comes from VoIP (Voice over Internet Protocol) is a challenging but less-investigated audio forensic issue. As shown in a previous study, existing feature based methods do not work well. In this paper, we propose a robust data-driven approach, called CNN-MLS (convolutional neural network based multi-domain learning scheme), to distinguish VoIP calls from mobile phone calls. To better explore the differences between VoIP and mobile phone calls, we first process data with high-pass filtering, and then extract deep features from both temporal domain and spectral domain. Two CNN architectures are designed for accepting data from respective domains, and some tricks such as auxiliary classifiers and individual subnet training are used for accelerating network convergence. The deep features are finally fused in a classification module for identifying the phone call type. The proposed method is evaluated on VPCID (VoIP Phone Call Identification Database) dataset, under various testing conditions. We pay particular attention to tests on data belonging to a source mismatched with the training sources. Experimental results show that, compared with existing methods, our method can achieve satisfactory and better accuracy on two-second-long inputs, implying that an alert may be activated shortly after a VoIP call is made.

Huang, Y., Li, B., Barni, M., Huang, J. (2020). Identification of VoIP Speech with Multiple Domain Deep Features. IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 15, 2253-2267 [10.1109/TIFS.2019.2960635].

Identification of VoIP Speech with Multiple Domain Deep Features

Huang Y.;Li B.;Barni M.;Huang J.

2020-01-01

Abstract

Identifying whether a phone call comes from VoIP (Voice over Internet Protocol) is a challenging but less-investigated audio forensic issue. As shown in a previous study, existing feature based methods do not work well. In this paper, we propose a robust data-driven approach, called CNN-MLS (convolutional neural network based multi-domain learning scheme), to distinguish VoIP calls from mobile phone calls. To better explore the differences between VoIP and mobile phone calls, we first process data with high-pass filtering, and then extract deep features from both temporal domain and spectral domain. Two CNN architectures are designed for accepting data from respective domains, and some tricks such as auxiliary classifiers and individual subnet training are used for accelerating network convergence. The deep features are finally fused in a classification module for identifying the phone call type. The proposed method is evaluated on VPCID (VoIP Phone Call Identification Database) dataset, under various testing conditions. We pay particular attention to tests on data belonging to a source mismatched with the training sources. Experimental results show that, compared with existing methods, our method can achieve satisfactory and better accuracy on two-second-long inputs, implying that an alert may be activated shortly after a VoIP call is made.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2020
			
	Rivista su cui è pubblicata l'opera
	
				IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY
			
	Citazione
	
				Huang, Y., Li, B., Barni, M., Huang, J. (2020). Identification of VoIP Speech with Multiple Domain Deep Features. IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 15, 2253-2267 [10.1109/TIFS.2019.2960635].
			
	Appare nelle tipologie:
	
				1.1 Articolo in rivista

File in questo prodotto:

File	Dimensione	Formato
08936059.pdf non disponibili Tipologia: PDF editoriale Licenza: NON PUBBLICO - Accesso privato/ristretto Dimensione 4.29 MB Formato Adobe PDF Visualizza/Apri Richiedi una copia	4.29 MB	Adobe PDF	Visualizza/Apri Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11365/1105745