Template-Type: ReDIF-Article 1.0
Author-Name: Adeel Munir, Hammad Nasir, Madiha Sher, Arbab Masood Ahmad
Author-Email: adeelmodernite@gmail.com
Author-Workplace-Name: Department of Computer Systems Engineering, University of Engineering and Technology, Peshawar, Pakistan
Title: Voice Cloning and Synthesis Using Deep Learning: A Comprehensive Study
Abstract: This paper reviews current voice cloning and speech synthesis methods, focusing on how deep learning improves AI-generated speech in terms of quality, flexibility, and efficiency. We analyze three leading AI models and their significance for virtual assistants, dubbing, and accessibility tools: XTTS_v2, Whisper, and Llama 8B. XTTS_v2 builds on Tortoise and improves its voice cloning and TTS capabilities; through multilingual voice transfer it achieves faster inference, lower computational cost, and more natural-sounding synthetic speech. Whisper is a transcription model that converts audio waveforms to text, simplifying access to audio data. Llama 8B handles user question answering, enhancing human-AI interaction. Other related work, including FastSpeech 2 [1], Neural Voice Cloning with Few Samples [2], and Deep Learning-Based Expressive Speech Synthesis [3], also contributes to these advances. Together, this progress enhances machines' ability to communicate in an expressive, human-like way, leading to more sophisticated technology.
Keywords: Voice Cloning, Speech Synthesis, Deep Learning, Multilingual Zero-shot Multi-Speaker TTS (XTTS), Speaker Adaptation, Cross-Lingual TTS, Whisper, Llama 8B
Journal: International Journal of Innovations in Science and Technology
Pages: 2225-2235
Volume: 7
Issue: 3
Year: 2025
Month: September
File-URL: https://journal.50sea.com/index.php/IJIST/article/view/1551/2242
File-Format: Application/pdf
File-URL: https://journal.50sea.com/index.php/IJIST/article/view/1551
File-Format: text/html
Handle: RePEc:abq:IJIST:v:7:y:2025:i:3:p:2225-2235