AI & Data Science — Healthcare & Fraud Detection

Building Intelligent Systems

Data Scientist with 3+ years of expertise in ML, Deep Learning, and LLMs. Focused on Healthcare AI and Fraud Detection. Proficient in building end-to-end MLOps pipelines on AWS and GCP. Currently at NexGen IT Services, delivering scalable AI systems.

terminal://eincode
┌───────────────────────┐
│     AASHISH MUSALE    │
│                       │
│  > expertise: ML/AI   │
│  > status: building   │
│  > focus: healthcare  │
└───────────────────────┘
v0.1.0
3+ years exp
NexGen IT Services

About Me

Building Intelligent Systems

I'm a Data Scientist and Machine Learning Engineer with 3+ years of hands-on experience building production-scale AI systems. My passion lies in solving complex problems at the intersection of machine learning and real-world applications, particularly in healthcare and fraud detection.

At NexGen IT Services, I'm currently developing secure AI systems for identifying and categorizing sensitive health information in electronic health records (EHRs) and insurance claims, achieving 99.2% accuracy while maintaining HIPAA compliance.

My technical arsenal includes expertise in Deep Learning (PyTorch, TensorFlow), Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG), cloud platforms (AWS, GCP), and MLOps. I'm particularly experienced in designing end-to-end machine learning pipelines that scale to billions of data points.

Beyond my professional work, I'm driven by continuous learning and innovation. I actively contribute to open-source projects, build experimental AI systems, and stay at the cutting edge of developments in LLMs, transformers, and production machine learning systems.

Current Role

Data Scientist at NexGen IT Services

Location

Hyderabad, India / Dallas, TX

By the Numbers

3+

Years of Experience

99%

Model Accuracy

8+

Production Projects

2.5B+

Data Points Processed

Artifacts

Open Source Projects

Featured
shipped
2025

PHI/PII Detection System

Secure AI system to identify and categorize PHI/PII in EHRs and insurance claims. Achieved 99.2% accuracy using custom quantized models and KV-cache optimization while maintaining HIPAA compliance.

423
PyTorch, LLMs, HIPAA, Healthcare AI
shipped
2024

Credit Fraud Detection MLOps

Real-time fraud detection models using XGBoost and statistical methods, with a 2.5% false positive rate. Built an end-to-end MLOps pipeline on AWS with Kafka streaming.

285
XGBoost, AWS, Kafka, ML Pipeline
shipped
2023

Medical QA Chatbot with RAG

RAG-based medical QA system using LLaMA2, LlamaIndex, and FAISS. Achieved a 92% relevance score and 88% answer correctness on the BioASQ dataset.

194
LLaMA2, RAG, LlamaIndex, FAISS
shipped
2024

GPT-2 Implementation from Scratch

Reproduced GPT-2 in PyTorch using GPT-3 hyperparameters with sliding window attention, mixed precision training, and CUDA optimization.

358
PyTorch, Transformers, CUDA, Deep Learning
shipped
2024

Job Hunt - AI Career Platform

Full-featured career platform with ATS optimization, smart job matching, interview prep, and AI-powered resume enhancement using modern web tech and LLM integrations.

216
FastAPI, React, LLMs, Full Stack
shipped
2022

Walmart Sales Analysis

Analytical insights on Walmart store sales across locations with trend analysis, seasonality examination, and product category recommendations using R and data visualization.

133
R, Data Analysis, Visualization
shipped
2025

MLOps Pipeline Architecture

Enterprise MLOps pipeline with Terraform, Kubernetes, Docker, Jenkins, and CI/CD. Multi-environment customer onboarding with reliable RESTful microservices.

245
Kubernetes, Terraform, Jenkins, DevOps
shipped
2025

Anomaly Detection Engine

Real-time anomaly detection using temporal clustering and information fusion. Analyzes user telemetry logs to identify suspicious patterns and alert systems.

174
Python, Clustering, Real-time Analytics

Career Journey

Professional Experience

3+ years building production ML systems, healthcare AI, and fraud detection at leading organizations.

Current

Data Scientist

NexGen IT Services, Dallas, TX

July 2025 – Present
  • Implemented secure AI system to identify and categorize PHI/PII in EHRs with 99.2% accuracy
  • Reduced false positives to 2% using KV-Cache sliding window architecture and loss gradient checkpointing
  • Built a multi-environment LLMOps pipeline with Terraform, Kubernetes, Docker, and Jenkins on GCP/Azure
  • Converted 7M+ unstructured telemetry logs to structured data using PySpark and APIs

DS Research Scientist

University of Texas at Dallas, Dallas, TX

April 2024 – May 2025
  • Developed real-time credit fraud detection models using XGBoost with 2.5% false positive rate
  • Deployed MLOps pipeline on AWS (S3, EC2, Lambda, EKS, EMR, SageMaker) for billions of transactions
  • Designed Apache Kafka-based data streaming with Cassandra, increasing detection speed by 30%
  • Developed 20+ Java APIs for product development with direct impact on system reliability

Junior Data Scientist

Cognizant Technology Solutions, India

January 2022 – July 2023
  • Built a medical QA chatbot with RAG using LLaMA2, LlamaIndex, and FAISS, achieving 92% relevance
  • Implemented fine-tuning of BERT and T5 transformers on BioASQ with 85% cosine similarity
  • Introduced CI/CD pipelines during Agile sprints, reducing model deployment time by 18%
  • Built ETL pipelines on Airflow and PySpark connecting data sources to Amazon Redshift

Learning Path

Education

Master of Science in Information Technology Management

The University of Texas at Dallas

August 2023 – May 2025
Relevant Coursework: Machine Learning, Deep Learning and Neural Networks, Applied Natural Language Processing

Bachelor of Science in Computer Science and Engineering

Jawaharlal Nehru Technological University

June 2018 – May 2022
Relevant Coursework: Algorithm Design and Analysis, Machine Learning, Applied Statistics

Technical Arsenal

Technical Skills

A comprehensive toolkit built over 3+ years of experience in machine learning, deep learning, and production systems.

Generative AI & LLMs

Transformers, LLaMA2, LlamaIndex, RAG, LangChain, LoRA, RLHF, HuggingFace, FAISS

Deep Learning

PyTorch, TensorFlow, Keras, scikit-learn, OpenCV, XGBoost, NLTK, spaCy

Cloud & MLOps

AWS SageMaker, GCP Vertex AI, Kubernetes, Terraform, Docker, Jenkins, CI/CD, MLflow

Data & Big Data

PySpark, Kafka, Databricks, Apache Airflow, SQL, Pandas, NumPy, Cassandra

Programming Languages

Python, R, Java, JavaScript, C++, SQL

Specializations

Healthcare AI, HIPAA Compliance, Fraud Detection, RAG Systems, Time Series Analysis

Technical Expertise

Research & Innovation

Key research areas, technical breakthroughs, and learnings from building production AI systems.

ai
Mar 2025

LLM Fine-tuning with LoRA and RLHF

Techniques for efficient LLM fine-tuning using Low-Rank Adaptation and Reinforcement Learning from Human Feedback for domain-specific applications.

healthcare
Feb 2025

Building HIPAA-Compliant PHI Detection

Implementing secure AI systems to identify and categorize Protected Health Information in EHRs with 99% accuracy while maintaining patient privacy.

devops
Jan 2025

MLOps Pipelines on AWS & GCP

End-to-end machine learning operations with Kubernetes, Terraform, Jenkins CI/CD, and multi-environment deployments at scale.

ml
Dec 2024

Real-time Fraud Detection with Kafka

Designing scalable data streaming systems with Apache Kafka and Cassandra for real-time credit transaction anomaly detection.
