
Alumni Career Success Prediction

Nov 2023

Random Forest, Statistical Modeling, Machine Learning, Analytics

What I Did

Analyzed 3,300+ alumni records across 4 fragmented datasets to predict career trajectories and identify patterns in career success.

Data cleaning

Merged 4 separate Excel files with inconsistent formats. Standardized salary data across currencies (USD/INR), pay periods (monthly/annual), and notations. Unified job titles using regex and string matching. Applied a log transformation to handle skewed distributions.
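The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the regex patterns, the canonical title list, and the `INR_PER_USD` rate are all assumptions made for the example.

```python
import math
import re
from typing import Optional

# Hypothetical conversion rate, chosen only for illustration.
INR_PER_USD = 83.0


def parse_salary_to_annual_inr(raw: str) -> Optional[float]:
    """Convert a free-text salary (e.g. '12 LPA', '$5,000/month') to annual INR."""
    text = raw.strip().lower().replace(",", "")
    match = re.search(r"\d+(?:\.\d+)?", text)
    if match is None:
        return None  # no numeric value found
    amount = float(match.group())

    # Notation: 'LPA' / 'lakh' means units of 100,000 INR per year.
    if "lpa" in text or "lakh" in text:
        return amount * 100_000

    # Notation: a 'k' suffix right after the number means thousands.
    if re.search(r"\d\s*k\b", text):
        amount *= 1_000

    # Currency: convert USD to INR; anything else is assumed to be INR.
    if "$" in text or "usd" in text:
        amount *= INR_PER_USD

    # Time period: scale monthly figures to annual.
    if "month" in text:
        amount *= 12

    return amount


# Hypothetical canonical titles; more specific patterns come first.
TITLE_PATTERNS = [
    (re.compile(r"\b(?:sr|senior)\.?\s+software\s+(?:engineer|developer)\b"),
     "Senior Software Engineer"),
    (re.compile(r"\bdata\s+scientist\b"), "Data Scientist"),
    (re.compile(r"\bsoftware\s+(?:engineer|developer)\b"), "Software Engineer"),
]


def normalize_title(raw: str) -> str:
    """Map a free-text job title to a canonical label, if a pattern matches."""
    text = raw.strip().lower()
    for pattern, canonical in TITLE_PATTERNS:
        if pattern.search(text):
            return canonical
    return raw.strip().title()  # fall back to a tidied version of the input


def log_salary(annual_inr: float) -> float:
    """Log-transform a salary to compress the long right tail."""
    return math.log1p(annual_inr)
```

Ordering the title patterns from most to least specific keeps "Sr. Software Engineer" from being absorbed by the plain "Software Engineer" rule.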

Analysis

Looked at salary distribution by country and occupation, identified career paths (industry vs. academia vs. entrepreneurship), and ran correlation analysis between geography, designation, and compensation.
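As a rough sketch of the grouped comparison described above, the snippet below computes median salary per (country, designation) pair and the ratio between the highest and lowest country medians for one designation. The records are made-up placeholder values, not the alumni data, and both helper functions are hypothetical names introduced for the example.

```python
from collections import defaultdict
from statistics import median

# Hypothetical records: (country, designation, annual salary in lakh INR).
# These values are placeholders, not figures from the project.
records = [
    ("India", "Data Scientist", 14.0),
    ("India", "Data Scientist", 18.0),
    ("India", "Software Engineer", 8.0),
    ("India", "Software Engineer", 10.0),
    ("USA", "Data Scientist", 40.0),
    ("USA", "Data Scientist", 44.0),
    ("USA", "Software Engineer", 30.0),
]


def median_salary_by_group(rows):
    """Return {(country, designation): median salary}."""
    groups = defaultdict(list)
    for country, designation, salary in rows:
        groups[(country, designation)].append(salary)
    return {key: median(values) for key, values in groups.items()}


def cross_country_ratio(medians, designation):
    """Ratio of the highest to lowest country median for one designation."""
    values = [m for (_, d), m in medians.items() if d == designation]
    if len(values) < 2:
        return None  # need at least two countries to compare
    return max(values) / min(values)


medians = median_salary_by_group(records)
ratio = cross_country_ratio(medians, "Data Scientist")
```

Using the median rather than the mean keeps a handful of very high salaries from dominating each group.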

Key Findings

  • Data scientists and senior software engineers had the highest salaries (median of 14-18 LPA, i.e. lakh rupees per annum)
  • Geographic mobility mattered more than job title: the same designation showed a 3x salary variation across countries
  • Alumni working abroad earned 2-3x more than those in domestic positions

Modeling

Built Random Forest models to predict salary from designation, country, and years since graduation. Used K-means clustering to find career trajectory patterns. Applied 5-fold cross-validation for evaluation.
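A minimal sketch of this modeling setup with scikit-learn is shown below. It runs on synthetic data, so the scores say nothing about the real results. The integer coding of country and designation, the log-salary target, the hyperparameters, and the choice of clustering features (years since graduation and log salary, standardized) are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-ins for the three predictors (not the alumni data).
country = rng.integers(0, 3, size=n)      # integer-coded country
designation = rng.integers(0, 4, size=n)  # integer-coded designation
years = rng.integers(0, 15, size=n)       # years since graduation

# Synthetic log-salary target with some noise.
log_salary = (
    1.0 + 0.6 * country + 0.3 * designation + 0.05 * years
    + rng.normal(0.0, 0.2, size=n)
)

X = np.column_stack([country, designation, years])

# Random Forest regression scored with 5-fold cross-validation (R^2 per fold).
model = RandomForestRegressor(n_estimators=100, random_state=0)
scores = cross_val_score(model, X, log_salary, cv=5, scoring="r2")

# K-means on standardized features so neither one dominates the distance.
cluster_features = StandardScaler().fit_transform(
    np.column_stack([years, log_salary])
)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(cluster_features)

print("R^2 per fold:", np.round(scores, 3))
print("Cluster sizes:", np.bincount(labels))
```

Integer codes are a shortcut that tree-based models tolerate reasonably well. One-hot encoding the categorical columns is a common alternative when the category order carries no meaning.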

Note: This was an academic exploration. I plan to refine the analysis and findings.