An Improved Speech Steganalysis Method for VoIP-Based Audio Data Using a Deep Learning Approach

Article type: Power - Transmission and Distribution

Authors

1 PhD student, Malek Ashtar University of Technology, Tehran, Iran

2 Associate Professor, Malek Ashtar University of Technology, Tehran, Iran

Abstract

Today, Voice over Internet Protocol (VoIP) is widely used in real-time communication and social networks and has become a suitable carrier for steganography. To counter these threats, numerous steganalysis methods have been devised; among the proposed solutions, combining signal processing with machine learning has made it possible to build highly accurate steganalyzers. This paper uses a hybrid approach that combines speech signal processing methods with artificial intelligence algorithms. First, the audio signal compressed with the G.729 codec is preprocessed so that intra-frame features and inter-frame correlations are extracted with good accuracy. The extracted features are then fed to a deep learning network, which is trained to distinguish cover data from stego data. The evaluation of the implementation covers the improvement in both detection accuracy and computation time. The proposed method is evaluated on two important steganography families, QIM and PMS, and is implemented and tested at different embedding rates. Another important point is the online (real-time) test of the method: for 1000-millisecond files the response time was under 5 milliseconds, which indicates the high speed of the proposed model at run time.
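The real-time claim in the last sentence can be put in perspective with a short calculation. The sketch below assumes the standard 10 ms G.729 frame length; the function names are illustrative and are not taken from the paper. It only converts the reported bound (under 5 ms for a 1000 ms clip) into a frame count and an upper bound on the real-time factor.

```python
# Back-of-the-envelope check of the real-time claim.
# Assumption: G.729 encodes speech in 10 ms frames (ITU-T G.729).
# Function names are illustrative, not from the paper.

G729_FRAME_MS = 10


def frames_in_clip(clip_ms: int, frame_ms: int = G729_FRAME_MS) -> int:
    """Number of whole codec frames contained in a clip."""
    return clip_ms // frame_ms


def real_time_factor(processing_ms: float, clip_ms: float) -> float:
    """Processing time divided by audio duration; below 1 means faster than real time."""
    return processing_ms / clip_ms


clip_ms = 1000           # clip length reported in the abstract
latency_bound_ms = 5.0   # reported upper bound on response time

n_frames = frames_in_clip(clip_ms)
rtf_bound = real_time_factor(latency_bound_ms, clip_ms)

print(f"frames per clip: {n_frames}")
print(f"real-time factor bound: {rtf_bound}")
```

Under these assumptions a 1000 ms clip holds 100 frames, and a response time under 5 ms corresponds to a real-time factor below 0.005, that is, detection uses less than 0.5% of the audio's duration.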

Keywords

Subjects


Article title [English]

An improved speech steganalysis based on VoIP using deep learning approach

Authors [English]

  • Hojat Allah Moghadasi 1
  • Hamid Dehghani 2
1 PhD student, Malek Ashtar University of Technology, Tehran, Iran
2 Associate Professor, Malek Ashtar University of Technology, Tehran, Iran
Abstract [English]

Today, Voice over Internet Protocol (VoIP) is widely used in real-time communication and social networks and has become a suitable carrier for steganography methods. To confront these threats, many steganalysis methods have been invented. Among the proposed solutions, the combination of signal processing and machine learning methods has made it possible to create steganalysis methods with high accuracy. In this paper, a combined approach of speech signal processing methods and artificial intelligence algorithms is used. First, data pre-processing is performed on the audio signal compressed with the G.729 codec, extracting intra-frame features and inter-frame correlations with good accuracy. The extracted features are then given to a deep learning network, which is trained to distinguish cover data from stego data. The evaluation of the implementation covers the improvement in both detection accuracy and computation time. The proposed method has been evaluated for two important steganography families, QIM and PMS, and has been implemented and tested at different embedding rates. Another important point is the real-time test of the presented method: for 1000-millisecond files, the response time was less than 5 milliseconds, which shows the high speed of the proposed model in the execution phase.
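For readers unfamiliar with QIM, one of the two steganography families named above, the following is a minimal sketch of textbook scalar quantization index modulation. It is not the embedding scheme evaluated in the paper: codec-domain QIM methods typically partition the codec's vector-quantization codebook rather than quantizing individual samples. The step size `delta`, the sample values, and the function names are illustrative assumptions.

```python
# Minimal scalar QIM sketch (textbook form, not the paper's codec-domain scheme).
# Bit 0 uses the lattice {k * delta}; bit 1 uses the shifted lattice
# {k * delta + delta / 2}. All names and values here are illustrative.
import math


def qim_embed(x: float, bit: int, delta: float) -> float:
    """Quantize x to the nearest point of the lattice selected by `bit`."""
    offset = bit * delta / 2.0
    return delta * math.floor((x - offset) / delta + 0.5) + offset


def qim_extract(y: float, delta: float) -> int:
    """Recover the bit by checking which lattice lies closer to y."""
    d0 = abs(y - qim_embed(y, 0, delta))
    d1 = abs(y - qim_embed(y, 1, delta))
    return 0 if d0 <= d1 else 1


cover = [0.12, -0.47, 0.90, 0.33]   # hypothetical cover values
bits = [1, 0, 1, 1]                 # hypothetical message bits
delta = 0.2                         # quantization step (assumed)

stego = [qim_embed(x, b, delta) for x, b in zip(cover, bits)]
recovered = [qim_extract(y, delta) for y in stego]

print("stego values:", stego)
print("recovered bits:", recovered)
```

Each stego value differs from its cover value by at most `delta / 2`, which is why the change is hard to perceive; a steganalyzer has to rely instead on the statistical traces that the lattice choice leaves across frames.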

Keywords [English]

  • Steganography
  • Steganalysis
  • Deep Learning
  • Quantization Index Modulation (QIM)
  • Pitch Modulation Steganography (PMS)

