Below I summarize the final project of the Data Science for All DS4A Course, 2020. Carried out by team 41, with Alejandra Martínez, Gabriel Cárdenas, Sebastián Betancourt. Under guidance of TAs: Daniel Gil , Chris Earle. See Report

Background

Education enables upward socioeconomic mobility and is a key to escaping poverty. Over the past decade, major progress was made towards increasing access to education and school enrollment rates. Nevertheless the Saber 11 exam scores have a strong dependency on the socioeconomic background of the students, so comparing them using the same scale will put certain students at an advantage that other students don’t have access to.

Our proposal is to create a scaled index based on socioeconomic variables, which will offset the score of those students with a higher than expected score and deduct those with a lower than expected score.

Model

To test our hypothesis, we used Clustering techniques to generate 3 groups using only the socioeconomic variables, and then, we check how the Scores changed along each of the groups.This three clusters are a good representation of low, medium and high socioeconomic background.

Proposed equity index

Expected Scores were calculated for each individual, comparing several regression techniques and selecting the one that performs better in the data. The selected algorithm was a Linear Regression with ElasticNet Regularization, and the expected scores were used to calculate the index for each student, used to scale its original score as follows.

\[Index = \frac{Actual Score}{Expected Score}\]

Results

As a solution to the existing gap in access to higher education, this equity indicator was created to compensate those students who excelled with scores higher than the expected score according to our model results.

When selecting the best students using this index, it is observed that the best students of each cluster (socioeconomic group) would have the privilege of accessing more benefits, such as scholarships to access university.