About Us

Who Am I?

Hi, I'm Rishabh Jain. I am currently a postdoctoral researcher at Trinity College Dublin, focusing on Multimodal AI. I recently completed my PhD in AI, specializing in Speech (TTS/STT), at the University of Galway, Ireland. In 2020, I earned my MSc in Data Analytics from the University of Galway, following my B.Tech in Computer Science and Engineering from Vellore Institute of Technology.

I have a strong passion for coding and technology, constantly driven to learn and explore new areas within computer science. Whether it's Machine Learning, Artificial Intelligence, Python Development, or Data Analytics, I am always eager to expand my knowledge and skills. I am actively seeking opportunities as a Machine Learning Engineer or Data Scientist to apply my expertise and contribute to innovative projects.

Machine Learning

Artificial Intelligence

Large Language Models

Speech Technologies

What I Do

Here are some of my areas of expertise

AI Consultant

Providing expert advice on optimizing AI strategies for speech technologies.

Research Scientist

Conducting pioneering research in AI and machine learning.

Software Engineer

Developing robust software solutions and applications that enable AI.

Data Analyst

Utilizing data insights to drive AI innovation and improvements.

LLM Engineer

Developing and fine-tuning large language models.

ML Engineer

Specializing in building and optimizing ML models and algorithms.

My Specialty

My Skills

In the heart of innovation and technology, I thrive as a Data Scientist, AI Engineer, and ML Engineer. My journey through the digital landscape has been marked by a relentless pursuit of excellence, crafting solutions that bridge the gap between theory and application.

Core Skills

  • Machine Learning/Deep Learning
  • PyTorch
  • Command Line (GitHub/Docker/Slurm)
  • Data Visualization
  • Research Expertise
  • Speech Technologies

Languages and ML Libraries

  • SQL, R, Python, C, C++, C#, Java
  • HTML, CSS, JavaScript, Node.js
  • Pandas, NumPy, SciPy
  • Librosa, Transformers, scikit-learn
  • Seaborn, TensorFlow, PyTorch

Software and Cloud Platforms

  • Anaconda, PyCharm, Spyder
  • Jupyter Notebook, Visual Studio, Eclipse
  • AWS, GCP, Docker

Soft Skills

  • Teamwork, Communication, Leadership
  • Research Work, Problem Solving

Languages

  • English, Hindi, and Spanish

Education

Year: 2021-2024, PhD in AI from University of Galway, Ireland

Research Thesis: Child Speech Understanding and Generation via Neural ASR and TTS Models

Contributions:

- Enhanced child speech technologies in TTS and STT for the DAVID project.

- Optimized Tacotron 2 and FastPitch models for realistic synthetic child speech.

- Improved child ASR performance using wav2vec2, Whisper, and Conformer models.

- Applied data augmentation techniques to boost ASR accuracy for child speech.

- Integrated TTS and ASR technologies into interactive smart toys.

- Developed facial animation pipelines for synthetic-speaking children.

- Focused on creating nuanced, ethical applications in speech technology for children.

Year: 2019-2020, MSc in Data Analytics from University of Galway, Ireland

Research Thesis: Toolkit for Facial Landmarks Identification and Recognition Based on the Analysis of Video Data and Creating Video Snippets of the Identified Faces

Grade: 1.1 Honours

Modules: Machine Learning, Deep Learning, Natural Language Processing, Applied Regression, Data Visualization, Web and Network Science, Case Studies in Data Analytics, Information Retrieval, Applied Statistics, Programming in Data Analytics (includes R and Python).

Year: 2015-2019, B.Tech in Computer Science and Engineering from VIT University, India

CGPA: 8.73

Thesis: Patient Health Monitoring System, and Detection of Atrial Fibrillation, Fall and Air Pollutants using IoT Technologies

Experience

Work Experience

Research Fellow at Trinity College Dublin, Ireland

November, 2024 - Present

1. Conduct research and development on multimodal large language models, building tools for gesture recognition, audio-visual modalities, and hyper/hypo speech.

2. Develop and implement state-of-the-art techniques in audio-visual speech recognition, enabling seamless integration of speech and visual inputs for robust performance across diverse environments.

3. Collaborate with cross-functional teams to design, train, and optimize large-scale multimodal language models, leveraging diverse datasets and advanced machine learning algorithms.

Research Scientist at Oxford Wave Research, UK

February, 2024 - June, 2024

1. Conducted research on training Speech-To-Text (STT) models and Large Language Models (LLMs) for low-resource languages.

2. Worked with engineers to train and deploy these models in production and to optimize them for real-world use cases.

3. Proficient in fine-tuning and deploying cutting-edge LLMs including Saul, Mistral, Llama, Gemma, Gemini, Zephyr, Command-R, and Phi-3, optimizing for performance and scalability.

4. Specialized in LLM summarization techniques and Speaker Diarization following Whisper ASR, enhancing model capabilities for nuanced tasks in NLP and NLU for law and forensics use cases.

5. Experienced in LLM Quantization methodologies, implementing efficient model compression techniques to optimize inference speed and resource utilization while maintaining model fidelity.

6. Conducted evaluations of LLM performance across diverse linguistic domains, informing model refinement and optimization efforts.

7. Proficient with a variety of LLM tools and frameworks such as Ollama, LangChain, LM Studio, and llama.cpp.

Research Intern at Xperi Corporation, Galway, Ireland

October, 2020 - May, 2024

- The DTIF-DAVID project is a collaboration between XPERI, SoapBox Labs and NUIG funded by Enterprise Ireland (EI). Its main objective is the development of a multimodal (sound and vision) AI processing platform with low cost and low power consumption to be used for the creation of voice-enabled toys.

- Embedded (on-device) processing of data is currently the preferred way to enable Artificial Intelligence in smart toys across the industry. The key challenge remains how to deliver a high-quality on-device AI experience with long playtime (battery life) in a cost-effective way, when that experience requires ‘data-centre’ level processing that is typically delivered via online cloud-based approaches.

- Technologies to be built on the proposed embedded platform include Object/Gesture Detection/Recognition, Automatic Speech Recognition, Text-to-Speech Synthesis, and more.

Research Assistant at University of Galway, Ireland

October, 2020 - February, 2024

Worked on the DTIF-DAVID project, a collaboration between XPERI, SoapBox Labs, and NUIG funded by Enterprise Ireland (EI). Its main objective is the development of a multimodal (sound and vision) AI processing platform with low cost and low power consumption to be used for the creation of voice-enabled toys.

This research project focuses on speech and audio technology. The main goal is to apply Text-To-Speech (TTS) technology to child speech. Because current TTS research focuses almost entirely on adult speech data, this project explores the use of child speech data to create better child speech synthesis models and tools. It also aims to improve current Automatic Speech Recognition (ASR) technologies for child speech by using synthetic data generated from TTS models.

For more information: https://nuigalway.ie/c3i/projects/david/

Teaching Assistant at University of Galway, Ireland

January, 2020 - April, 2020

1. Lab Assistant for the courses Object-Oriented Software Design and Development, Object-Oriented Programming II, and Computer Network and Data Communications.

2. Responsible for teaching lab work to students and answering their questions.

3. Evaluated assignments and exams.

4. Languages and tools taught include Java, JSON, SQL, XML, the Linux terminal, shell scripting, and C#.

Software Engineer Intern at Enbake Consulting, India

April, 2018 - July, 2018

1. Engineered middleware of an application that geocodes addresses using Python.

2. Technologies include Node.js, MongoDB, Python API, and the Google Cloud Platform.

3. Worked on the algorithm engine and user interface of the application.

4. Data Mining and Data Modeling.

5. Developed networking and communication skills by making connections.

Software Development Intern at Momentum Software, India

May, 2017 - July, 2017

1. Debugged software to make it compatible with new hardware.

2. Developed a music player and song trimmer in a team of 5 interns for the client.

3. Technologies include Java, Java file handling and Swing, MySQL, HTML, and JavaScript.

Client Relationship Manager at AIESEC

February, 2017 - May, 2017

1. Conferred with customers by telephone or in person to provide information about products and services.

2. Prepared pitches and handled queries from incoming and outgoing interns.

3. Hosted cultural events at the university.

4. Raised awareness at the university of the UN Sustainable Development Goals.

My Publications

Recent Articles