status: building

AI & Data Science — Healthcare & Fraud Detection

Building Intelligent Systems

Data Scientist with 3+ years of experience in ML, deep learning, and LLMs, focused on healthcare AI and fraud detection. Builds end-to-end MLOps pipelines on AWS and GCP. Currently at NexGen IT Services, delivering scalable AI systems.



┌─────────────────────────────────────┐
│        AASHISH MUSALE               │
│     Data Scientist & ML Engineer    │
│                                     │
│   > expertise: Deep Learning, LLMs  │
│   > specialization: Healthcare AI   │
│   > status: actively building       │
│   > location: Hyderabad, India      │
│                                     │
│   skills: PyTorch, TensorFlow,      │
│   AWS/GCP, Kubernetes, RAG, HIPAA   │
│                                     │
└─────────────────────────────────────┘

v0.1.0

3+ years exp

NexGen IT Services


About Me

Building Intelligent Systems

I'm a Data Scientist and Machine Learning Engineer with 3+ years of hands-on experience building production-scale AI systems. I focus on applying machine learning to complex real-world problems, particularly in healthcare and fraud detection.

At NexGen IT Services, I'm currently developing secure AI systems for identifying and categorizing sensitive health information in electronic health records (EHRs) and insurance claims, achieving 99.2% accuracy while maintaining HIPAA compliance.

My core skills span deep learning (PyTorch, TensorFlow), large language models (LLMs) and Retrieval-Augmented Generation (RAG), cloud platforms (AWS, GCP), and MLOps. I have particular experience designing end-to-end machine learning pipelines that scale to billions of data points.

Beyond my professional work, I'm driven by continuous learning and innovation. I actively contribute to open-source projects, build experimental AI systems, and stay at the cutting edge of developments in LLMs, transformers, and production machine learning systems.

Current Role

Data Scientist at NexGen IT Services

Location

Hyderabad, India / Dallas, TX

Get in Touch

Email

aashishmusale056@gmail.com

GitHub

@Aashish123m

LinkedIn

/in/aashish-musale

By the Numbers

3+ Years of Experience

99% Model Accuracy

2.5B+ Data Points Processed

Production Projects

Artifacts

Open Source Projects

Featured

shipped

2025

PHI/PII Detection System

Secure AI system that identifies and categorizes PHI/PII in EHRs and insurance claims. Achieved 99.2% accuracy using custom quantized models and KV-cache optimization while remaining HIPAA compliant.

423

PyTorch, LLMs, HIPAA, Healthcare AI


shipped

2024

Credit Fraud Detection MLOps

Real-time fraud detection models using XGBoost and statistical methods, with a 2.5% false positive rate. Built an end-to-end MLOps pipeline on AWS with Kafka streaming.

285

XGBoost, AWS, Kafka, ML Pipeline


shipped

2023

Medical QA Chatbot with RAG

RAG-based medical QA system using LLaMA2, LlamaIndex, and FAISS. Achieved a 92% relevance score and 88% answer correctness on the BioASQ dataset.

194

LLaMA2, RAG, LlamaIndex, FAISS


shipped

2024

GPT-2 Implementation from Scratch

Reproduced GPT-2 in PyTorch using GPT-3 hyperparameters, with sliding-window attention, mixed-precision training, and CUDA optimization.

358

PyTorch, Transformers, CUDA, Deep Learning


shipped

2024

Job Hunt - AI Career Platform

Full-featured career platform with ATS optimization, smart job matching, interview prep, and AI-powered resume enhancement, built with FastAPI, React, and LLM integrations.

216

FastAPI, React, LLMs, Full Stack


shipped

2022

Walmart Sales Analysis

Analysis of Walmart store sales across locations, covering trends, seasonality, and product category recommendations, using R and data visualization.

133

R, Data Analysis, Visualization


shipped

2025

MLOps Pipeline Architecture

Enterprise MLOps pipeline with Terraform, Kubernetes, Docker, Jenkins, and CI/CD. Multi-environment customer onboarding with reliable RESTful microservices.

245

Kubernetes, Terraform, Jenkins, DevOps


shipped

2025

Anomaly Detection Engine

Real-time anomaly detection using temporal clustering and information fusion. Analyzes user telemetry logs to flag suspicious patterns and raise alerts.

174

Python, Clustering, Real-time Analytics


Career Journey

Professional Experience

3+ years building production ML systems, healthcare AI, and fraud detection at leading organizations.

Current

Data Scientist

NexGen IT Services • Dallas, TX

July 2025 – Present

Implemented a secure AI system to identify and categorize PHI/PII in EHRs with 99.2% accuracy
Reduced false positives to 2% using a KV-cache sliding-window architecture and loss gradient checkpointing
Implemented a multi-environment LLMOps pipeline with Terraform, Kubernetes, Docker, and Jenkins on GCP/Azure
Converted 7M+ unstructured telemetry logs to structured data using PySpark and APIs

DS Research Scientist

University of Texas at Dallas • Dallas, TX

April 2024 – May 2025

Developed real-time credit fraud detection models using XGBoost with a 2.5% false positive rate
Deployed an MLOps pipeline on AWS (S3, EC2, Lambda, EKS, EMR, SageMaker) handling billions of transactions
Designed Apache Kafka-based data streaming with Cassandra, increasing detection speed by 30%
Developed 20+ Java APIs for product development, directly improving system reliability

Junior Data Scientist

Cognizant Technology Solutions • India

January 2022 – July 2023

Built a medical QA chatbot with RAG using LLaMA2, LlamaIndex, and FAISS, achieving 92% relevance
Fine-tuned BERT and T5 transformers on BioASQ, reaching 85% cosine similarity
Introduced CI/CD pipelines during Agile sprints, reducing model deployment time by 18%
Built ETL pipelines on Airflow and PySpark connecting data sources to Amazon Redshift

Learning Path

Education

Master of Science in Information Technology Management

The University of Texas at Dallas

August 2023 – May 2025

Relevant Coursework: Machine Learning, Deep Learning and Neural Networks, Applied Natural Language Processing

Bachelor of Science in Computer Science and Engineering

Jawaharlal Nehru Technological University

June 2018 – May 2022

Relevant Coursework: Algorithm Design and Analysis, Machine Learning, Applied Statistics

Technical Arsenal

Technical Skills

A comprehensive toolkit built over 3+ years of experience in machine learning, deep learning, and production systems.

Generative AI & LLMs

Transformers, LLaMA2, LlamaIndex, RAG, LangChain, LoRA, RLHF, HuggingFace, FAISS

Deep Learning

PyTorch, TensorFlow, Keras, scikit-learn, OpenCV, XGBoost, NLTK, spaCy

Cloud & MLOps

AWS SageMaker, GCP Vertex AI, Kubernetes, Terraform, Docker, Jenkins, CI/CD, MLflow

Data & Big Data

PySpark, Kafka, Databricks, Apache Airflow, SQL, Pandas, NumPy, Cassandra

Programming Languages

Python, R, Java, JavaScript, C++, SQL

Specializations

Healthcare AI, HIPAA Compliance, Fraud Detection, RAG Systems, Time Series Analysis

Technical Expertise

Research & Innovation

Key research areas, technical breakthroughs, and learnings from building production AI systems.

ai • Mar 2025

LLM Fine-tuning with LoRA and RLHF

Techniques for efficient LLM fine-tuning using Low-Rank Adaptation and Reinforcement Learning from Human Feedback for domain-specific applications.

healthcare • Feb 2025

Building HIPAA-Compliant PHI Detection

Implementing secure AI systems to identify and categorize Protected Health Information in EHRs with 99% accuracy while maintaining patient privacy.

devops • Jan 2025

MLOps Pipelines on AWS & GCP

End-to-end machine learning operations with Kubernetes, Terraform, Jenkins CI/CD, and multi-environment deployments at scale.

ml • Dec 2024

Real-time Fraud Detection with Kafka

Designing scalable data streaming systems with Apache Kafka and Cassandra for real-time credit transaction anomaly detection.

Current Focus

Active Projects

Ongoing research and development. Building the next generation of intelligent healthcare AI systems.

~/aashish/ml-research

live

Advanced RAG Framework

Building enterprise-grade RAG systems with semantic chunking, hybrid search, and multi-modal embeddings

85%

Feb 2025

GCP-Based MLOps Platform

Scalable MLOps infrastructure with Vertex AI, Cloud Functions, and automated model serving

75%

Jan 2025

Interpretable ML Dashboard

Interactive dashboards for model explainability, feature importance, and anomaly detection insights

70%

Dec 2024

Open-Source ML Library

Reusable Python library for common ML tasks in healthcare including data preprocessing and evaluation

65%

Nov 2024
