AI & Data Science — Healthcare & Fraud Detection
Data Scientist with 3+ years of experience in ML, Deep Learning, and LLMs. Focused on Healthcare AI and Fraud Detection. Proficient in building end-to-end MLOps pipelines on AWS and GCP. Currently at NexGen IT Services, delivering scalable AI systems.
┌─────────────────────────────────────┐
│ AASHISH MUSALE                      │
│ Data Scientist & ML Engineer        │
│                                     │
│ > expertise: Deep Learning, LLMs    │
│ > specialization: Healthcare AI     │
│ > status: actively building         │
│ > location: Hyderabad, India        │
│                                     │
│ skills: PyTorch, TensorFlow,        │
│ AWS/GCP, Kubernetes, RAG, HIPAA     │
│                                     │
└─────────────────────────────────────┘
About Me
I'm a Data Scientist and Machine Learning Engineer with 3+ years of hands-on experience building production-scale AI systems. My passion lies in solving complex problems at the intersection of machine learning and real-world applications, particularly in healthcare and fraud detection.
At NexGen IT Services, I'm currently developing secure AI systems for identifying and categorizing sensitive health information in electronic health records (EHRs) and insurance claims, achieving 99.2% accuracy while maintaining HIPAA compliance.
My technical arsenal covers Deep Learning (PyTorch, TensorFlow), Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG), cloud platforms (AWS, GCP), and MLOps. I have particular experience designing end-to-end machine learning pipelines that scale to billions of data points.
Beyond my professional work, I'm driven by continuous learning and innovation. I actively contribute to open-source projects, build experimental AI systems, and stay at the cutting edge of developments in LLMs, transformers, and production machine learning systems.
Current Role
Data Scientist at NexGen IT Services
Location
Hyderabad, India / Dallas, TX
By the Numbers
3+
Years of Experience
99%
Model Accuracy
8+
Production Projects
2.5B+
Data Points Processed
Artifacts
Secure AI system to identify and categorize PHI/PII in EHRs and insurance claims. Achieved 99.2% accuracy with custom quantized models and KV-Cache optimization for HIPAA compliance.
Real-time fraud detection models using XGBoost and statistical methods with a 2.5% false positive rate. Built end-to-end MLOps pipeline on AWS with Kafka streaming.
RAG-based medical QA system using Llama 2, LlamaIndex, and FAISS. Achieved 92% relevance score and 88% answer correctness on the BioASQ dataset.
Reproduced GPT-2 in PyTorch using GPT-3 hyperparameters with sliding window attention, mixed precision training, and CUDA optimization.
Full-featured career platform with ATS optimization, smart job matching, interview prep, and AI-powered resume enhancement using modern web tech and LLM integrations.
Analytical insights on Walmart store sales across locations with trend analysis, seasonality examination, and product category recommendations using R and data visualization.
Enterprise MLOps pipeline with Terraform, Kubernetes, Docker, Jenkins, and CI/CD. Multi-environment customer onboarding with reliable RESTful microservices.
Real-time anomaly detection using temporal clustering and information fusion. Analyzes user telemetry logs to flag suspicious patterns and raise alerts.
Career Journey
3+ years building production ML systems, healthcare AI, and fraud detection at leading organizations.
NexGen IT Services • Dallas, TX
University of Texas at Dallas • Dallas, TX
Cognizant Technology Solutions • India
Learning Path
The University of Texas at Dallas
Jawaharlal Nehru Technological University
Technical Arsenal
A comprehensive toolkit built over 3+ years of experience in machine learning, deep learning, and production systems.
Technical Expertise
Key research areas, technical breakthroughs, and learnings from building production AI systems.
Techniques for efficient LLM fine-tuning using Low-Rank Adaptation (LoRA) and Reinforcement Learning from Human Feedback (RLHF) for domain-specific applications.
Implementing secure AI systems to identify and categorize Protected Health Information (PHI) in EHRs with 99.2% accuracy while maintaining patient privacy.
End-to-end machine learning operations with Kubernetes, Terraform, Jenkins CI/CD, and multi-environment deployments at scale.
Designing scalable data streaming systems with Apache Kafka and Cassandra for real-time credit transaction anomaly detection.
Current Focus
Ongoing research and development. Building the next generation of intelligent healthcare AI systems.
Building enterprise-grade RAG systems with semantic chunking, hybrid search, and multi-modal embeddings
Scalable MLOps infrastructure with Vertex AI, Cloud Functions, and automated model serving
Interactive dashboards for model explainability, feature importance, and anomaly detection insights
Reusable Python library for common ML tasks in healthcare, including data preprocessing and evaluation