Alumni Career Success Prediction
Nov 2023
What I Did
Analyzed 3,300+ alumni records across 4 fragmented datasets to predict career trajectories and identify patterns in career success.
Data cleaning
Merged 4 separate Excel files with completely inconsistent formats. Standardized salary data across different currencies (USD/INR), time periods (monthly/annual), and notations. Unified job titles using regex and string matching. Applied log transformation to handle skewed distributions.
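The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the exchange rate, the accepted salary notations, the title patterns, and every function name are assumptions made for the example, and `log1p` stands in for whichever log transformation was actually applied.

```python
import math
import re

# Assumed conversion rate, for illustration only; the rate used in the project is not stated.
INR_PER_USD = 83.0

# Hypothetical title patterns; the real mapping used in the project is not given.
TITLE_PATTERNS = [
    (re.compile(r"\b(sr|senior)\.?\s+(software|s/w)\s+(engineer|developer)\b", re.I),
     "Senior Software Engineer"),
    (re.compile(r"\b(software|s/w)\s+(engineer|developer)\b|\bsde\b", re.I),
     "Software Engineer"),
    (re.compile(r"\bdata\s+scientist\b", re.I), "Data Scientist"),
]


def parse_amount(text):
    """Parse notations such as '12 LPA', '1,200,000', or '85k' into a plain number."""
    cleaned = text.strip().lower().replace(",", "")
    match = re.match(r"^(\d+(?:\.\d+)?)\s*(lpa|lakhs?|k)?$", cleaned)
    if match is None:
        raise ValueError(f"unrecognized salary notation: {text!r}")
    value = float(match.group(1))
    unit = match.group(2)
    if unit in ("lpa", "lakh", "lakhs"):
        value *= 100_000  # 1 lakh = 100,000
    elif unit == "k":
        value *= 1_000
    return value


def annual_salary_inr(amount, currency, period):
    """Convert a salary to an annual figure in INR."""
    currency = currency.upper()
    if currency == "USD":
        amount *= INR_PER_USD
    elif currency != "INR":
        raise ValueError(f"unsupported currency: {currency}")
    period = period.lower()
    if period == "monthly":
        amount *= 12
    elif period != "annual":
        raise ValueError(f"unsupported period: {period}")
    return amount


def normalize_title(raw_title):
    """Map a free-text job title to a canonical label; first matching pattern wins."""
    for pattern, canonical in TITLE_PATTERNS:
        if pattern.search(raw_title):
            return canonical
    return raw_title.strip()


def log_salary(salary_inr):
    """Compress the right-skewed salary scale; log1p keeps zero salaries defined."""
    return math.log1p(salary_inr)
```

For example, `annual_salary_inr(parse_amount("85k"), "USD", "annual")` would put a US salary on the same scale as one recorded as `"12 LPA"`. More specific patterns are listed first so that "Sr. Software Engineer" is not swallowed by the generic "Software Engineer" rule.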
Analysis
Looked at salary distribution by country and occupation, identified career paths (industry vs. academia vs. entrepreneurship), and ran correlation analysis between geography, designation, and compensation.
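A grouped comparison of this kind might look like the sketch below. The records are toy values invented for illustration (they are not data from the project), and the field names and helper functions are hypothetical.

```python
from collections import defaultdict
from statistics import median

# Toy records invented for illustration; NOT data from the project.
records = [
    {"country": "India", "designation": "Data Scientist", "salary_inr": 1_500_000},
    {"country": "India", "designation": "Data Scientist", "salary_inr": 1_700_000},
    {"country": "USA", "designation": "Data Scientist", "salary_inr": 4_800_000},
    {"country": "USA", "designation": "Data Scientist", "salary_inr": 5_200_000},
    {"country": "India", "designation": "Software Engineer", "salary_inr": 900_000},
]


def median_by(rows, *keys):
    """Median salary for each combination of the given keys."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[key] for key in keys)].append(row["salary_inr"])
    return {group: median(values) for group, values in groups.items()}


def country_spread(rows):
    """Per designation, the ratio of the highest to the lowest country-level median."""
    medians = median_by(rows, "designation", "country")
    by_designation = defaultdict(list)
    for (designation, _country), value in medians.items():
        by_designation[designation].append(value)
    # A spread is only meaningful when a designation appears in more than one country.
    return {
        designation: max(values) / min(values)
        for designation, values in by_designation.items()
        if len(values) > 1
    }
```

Comparing medians rather than means keeps a handful of very high salaries from dominating each group, which matters with skewed compensation data.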
Key Findings
- Data scientists and senior software engineers had the highest salaries (median 14-18 LPA, i.e. lakh INR per annum)
- Geographic mobility mattered more than job title: the same designation showed a 3x salary spread across countries
- Alumni working abroad earned 2-3x more than those in domestic positions
Modeling
Built Random Forest models to predict salary using designation, country, and years since graduation. Used K-means clustering for career trajectory patterns. Applied 5-fold cross-validation for evaluation.
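One way such a setup could be wired together with scikit-learn is sketched below. Everything specific here is an assumption for illustration: the data is synthetic, the target is a made-up log salary, and the hyperparameters, the number of clusters, and the features used for clustering are placeholders rather than the project's actual choices.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic data for illustration only; NOT the alumni dataset.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "designation": rng.choice(["Data Scientist", "Software Engineer", "Analyst"], size=n),
    "country": rng.choice(["India", "USA", "Germany"], size=n),
    "years_since_graduation": rng.integers(0, 15, size=n),
})
# Made-up log-salary target with a country effect, an experience effect, and noise.
df["log_salary"] = (
    13.0
    + 0.05 * df["years_since_graduation"]
    + (df["country"] != "India") * 1.0
    + rng.normal(0, 0.2, size=n)
)

features = ["designation", "country", "years_since_graduation"]

# One-hot encode the categorical columns; the numeric column passes through unchanged.
preprocess = ColumnTransformer(
    [("categories", OneHotEncoder(handle_unknown="ignore"), ["designation", "country"])],
    remainder="passthrough",
)
model = make_pipeline(preprocess, RandomForestRegressor(n_estimators=100, random_state=0))

# 5-fold cross-validation: the pipeline is refit on each training split.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
r2_scores = cross_val_score(model, df[features], df["log_salary"], cv=cv, scoring="r2")

# K-means on standardized experience and log salary (an assumed feature choice).
cluster_input = StandardScaler().fit_transform(df[["years_since_graduation", "log_salary"]])
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
df["trajectory_cluster"] = kmeans.fit_predict(cluster_input)
```

Keeping the encoder inside the pipeline means each fold fits its own encoding on the training split only, and `handle_unknown="ignore"` avoids errors when a rare designation or country appears only in a validation fold.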
Note: This was an academic exploration. Planning to refine the analysis and findings.