Hi, I'm Dhruvraj Singh Rathore

AI Engineer • Data Scientist • Data Engineer • ML Engineer • Data Analyst

I am a Data Science graduate student at Texas A&M with experience spanning AI engineering, data engineering, and applied data science. My work focuses on building systems that are scalable, reliable, and directly support data-driven decision-making.

Professional Experience

AI Engineer Intern

TechSur Solutions, Reston, Virginia

2024 – Present

  • Built a multi-agent AI pipeline with CrewAI and MCP (Model Context Protocol) that auto-generates unit tests using LlamaIndex and ChromaDB, cutting manual test development and execution effort by 70%.
  • Automated error resolution with AI agents that fixed code and test failures against JIRA requirements, cutting manual debugging time by 30% and boosting code coverage by 40%.
  • Collaborated with QA engineers and architects to integrate test automation into the existing SDLC.
  • Integrated the pipeline into an AI-powered chatbot IDE extension, containerized with Docker, enabling automated testing against developers' locally stored code.
  • Engineered CI/CD pipelines with GitHub Actions that run full end-to-end BDD tests with Selenium before PRs are merged, cutting release failures by 50%.

Data Analyst

Draup Business Solutions, Bangalore, India

2023 – 2024

  • Built scalable PySpark ETL pipelines on AWS EMR to process 50 TB of data, reducing anomalies by 35% and runtime by 30% through schema enforcement and optimized partitioning.
  • Created a data health dashboard backed by Airflow DAGs that validate 200M+ workforce records per day; working with analysts and consultants, improved data reliability and reduced QA escalations by 50%.
  • Modeled OLAP snowflake schemas in Amazon Redshift for 20M+ talent-cost records by job role and location, optimizing joins and aggregations to improve query speed by 40%.
  • Developed a serverless pipeline with AWS Lambda and DynamoDB to serve on-demand queries from S3, reducing client data delivery time by 30% and supporting high concurrency.

Data Scientist

HighRadius Corporation, Hyderabad, India

2022 – 2023

  • Refined SQL and Snowflake workflows for accounts receivable (AR) data processing and reporting using indexing and window functions, reducing runtime and reporting delays by 40%.
  • Implemented a keyword-matching algorithm in Python to automate matching claims to deductions, increasing net recovery rates by 60%.
  • Built machine learning models to predict customer payment dates using gradient boosting and regression techniques, achieving an accuracy of 75% and improving cash flow forecasting.
  • Enhanced model performance using GridSearchCV for hyperparameter tuning, Ridge regularization to reduce overfitting, and the Adam optimizer, boosting accuracy by 40%.

Featured Projects

HR Smart Screener

AI-powered resume screening app that analyzes and ranks resumes against job descriptions using a weighted score that combines BERT embeddings (60%), LLaMA 3.2 evaluation (25%), and keyword matching (15%). Built with Streamlit and Python.

Python, BERT, LLaMA 3.2, Streamlit

TravelGenie

Automated itinerary planner that combines REST APIs and LLMs to generate budget-friendly travel plans from real-time flight, hotel, and attraction data. Options are ranked with BM25 plus embeddings, and itineraries are generated with RAG.

Python, REST APIs, RAG, BERT

Cotton Field Detector

Developed a U-Net deep learning model in PyTorch to detect cotton fields from satellite images with 92% IoU. Achieved 88% segmentation accuracy, automating crop area estimation and reducing manual inspections by 50%.

Deep Learning, U-Net, PyTorch, Computer Vision

Personalized Academic Research Assistant

Built an academic research assistant using RAG and LangChain for fast paper retrieval and summarization. Used FAISS with SciBERT for efficient document retrieval, improving search relevance by 40% and reducing retrieval time by 60%.

NLP, RAG, LangChain, LLM

Metro Interstate Traffic Volume

Developed a traffic congestion prediction model using scikit-learn, feature scaling, and time-series analysis. Evaluated Random Forest, Lasso, Ridge, and Polynomial Regression, improving traffic volume prediction accuracy by 30%.

Machine Learning, Statistical Techniques, Scikit-learn, Time Series

Metastatic Cancer Detection

Built a CNN-based deep learning model to classify metastases in histopathological images from the PatchCamelyon (PCam) dataset. Applied data augmentation and batch processing, achieving an F1 score of 0.8768.

Machine Learning, Deep Learning, CNN, PyTorch

Technical Skills

Python
SQL
Pandas
NumPy
Matplotlib
Scikit-learn
Shell Script
MySQL
NoSQL
Redis
MongoDB
AWS
EMR
S3
EC2
Lambda
Spark / PySpark
LLMs & BERT
LangChain
Predictive Analytics
AWS SageMaker
RAG
CrewAI
Git/GitHub
CI/CD
Apache Airflow
Docker
Power BI
Snowflake
dbt
T-SQL
Java
AWS Glue
Databricks
Kinesis
Redshift
Data Lakes
Kafka
PostgreSQL
Schema Design
Data Modeling
Data Mining
Hadoop
GitLab
JIRA
Confluence
Tableau
QuickSight
Seaborn

Education

Texas A&M University, College Station, TX

Master of Science in Data Science

Aug 2024 – Dec 2025

CGPA: 4.0

Courses: Data Mining & Analysis, Mathematical Foundation for Data Science, Statistical Foundation for Data Science, Databases & Computational Tools Used in Big Data, Information Storage and Retrieval (Search Engines, Recommender Systems, Large Language Models), Natural Language Processing, Applied Analytics with R Programming, Reinforcement Learning, Data Science Capstone.

SRM Institute of Science and Technology, Chennai, India

Bachelor of Technology in Computer Science

Jul 2018 – May 2022

CGPA: 3.8

Certifications & Courses

Machine Learning with Scikit-Learn

SQL-MySQL for Data Analytics and Business Intelligence

Apache Spark (TM) SQL for Data Analysts

Python for Data Analysis and Visualization

Statistics for Data Science and Business Analysis

Docker Essential Training

Git Foundations Training

Redis Essential Training

The Complete dbt (Data Build Tool) Bootcamp

AI For Everyone (Coursera)

Leadership & Impact

Committee Head - Sponsorship, Marketing, and Highlights

AARUUSH, National-level Techno-Management Fest, SRM Institute of Science and Technology

2019 – 2022

Led a team of 50+ committee members, coordinating sponsorship, marketing, and event highlights. Organized summits, guest lectures, and exhibitions, managing logistics, hospitality, and stakeholder engagement. Secured key sponsorships and partnerships that funded multiple large-scale events. Developed teamwork, communication, and crisis-management skills while delivering events under high pressure.

Corporate Lead - SRM CodeChef Chapter

SRM Institute of Science and Technology

2018 – 2022

Led the sponsorship team, securing partners and sponsors for multiple coding events. Organized workshops, coding challenges, and hackathons that drew participants from diverse technical communities. Strengthened public relations, negotiation, and leadership skills by coordinating with stakeholders and meeting tight deadlines.

Let's Connect

I'm always interested in new opportunities and collaborations. Feel free to reach out!