Data Science for Social Science Research
Najah
Ashoka University
left_join()
: all rows from xright_join()
: all rows from yfull_join()
: all rows from both x and yinner_join()
: all rows from x where there are matching values in y, return all combination of multiple matches in the case of multiple matchesanti_join()
: return all rows from x where there are not matching values in y, never duplicate rows of xFor the next few slides…
id | value_x |
---|---|
1 | x1 |
2 | x2 |
3 | x3 |
id | value_y |
---|---|
1 | y1 |
2 | y2 |
4 | y4 |
left_join()
id | value_x | value_y |
---|---|---|
1 | x1 | y1 |
2 | x2 | y2 |
3 | x3 | NA |
right_join()
id | value_x | value_y |
---|---|---|
1 | x1 | y1 |
2 | x2 | y2 |
4 | NA | y4 |
full_join()
id | value_x | value_y |
---|---|---|
1 | x1 | y1 |
2 | x2 | y2 |
3 | x3 | NA |
4 | NA | y4 |
inner_join()
id | value_x | value_y |
---|---|---|
1 | x1 | y1 |
2 | x2 | y2 |
anti_join()
id | value_x |
---|---|
3 | x3 |
Percentage of the STEM graduates among the total direct recruits in the civil services for every year?
STEM - engineering and natural science, natural sciences and Medical sciences
First degree
Degree at the time of entry
Rows: 22,149
Columns: 9
$ ID <chr> "TN003700", "TN003700", "UP011100", "UP011100", "U…
$ Name <chr> "Ms. Anna R Malhotra", "Ms. Anna R Malhotra", "Shr…
$ Cadre <chr> "Tamil Nadu", "Tamil Nadu", "Uttar Pradesh", "Utta…
$ Reference_Value <int> 0, 1, 0, 1, 2, 3, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,…
$ Qualification <chr> "Graduate", "P.G.", "Certificate", "Graduate", "P.…
$ Subject <chr> "English Lit", "N.A.", "French", "Physics", "Econo…
$ Category_of_Subject <chr> "Languages and Fine Arts", "N.A.", "Languages and …
$ Division <chr> "Ordinary", "First", "Ordinary", "Second", "Ordina…
$ Source <chr> "Supremo", "Supremo", "Supremo", "Supremo", "Supre…
ID | Name | Cadre | Reference_Value | Qualification | Subject | Category_of_Subject | Division | Source |
---|---|---|---|---|---|---|---|---|
TN003700 | Ms. Anna R Malhotra | Tamil Nadu | 0 | Graduate | English Lit | Languages and Fine Arts | Ordinary | Supremo |
UP011100 | Shri Pk Kaul | Uttar Pradesh | 0 | Certificate | French | Languages and Fine Arts | Ordinary | Supremo |
MH006800 | Shri B G Deshmukh | Maharashtra | 0 | Graduate | Botany | Natural Sciences | Ordinary | Supremo |
TN003900 | Shri Ram Chander | Tamil Nadu | 0 | Graduate | Law | Governance, Law and Management | Third | Supremo |
AP004800 | Shri Gv Ramakrishna | Andhra Pradesh | 0 | Graduate | Chemistry | Natural Sciences | First | Supremo |
BH005400 | Shri N Nagamani | Bihar | 0 | Graduate | Mathematics | Engineering, Technology and Mathematics | First | Supremo |
edu_summary = profile_edu %>%
## limit data to direct recruitment
filter(Source_of_Recruitment == "Direct Recruitment") %>%
# select relevant vars
select(ID, Category_of_Subject) %>%
## add a new column stem_grad
mutate(stem_grad = ifelse(Category_of_Subject %in% c("Engineering,
Technology and Mathematics", "Natural Sciences","Medical" ) , "yes", "no")) %>%
# grouping the var
group_by(stem_grad) %>%
# counting by group
summarise(count = n())
stem_grad | count |
---|---|
no | 5896 |
yes | 1241 |
edu_summary %>%
mutate(
# adding up the count
sum_count = sum(count),
#getting the fraction
frac = count/sum_count,
# converting to percentage
perc = frac*100,
# rounding the perc value
pc_round = round(perc,2)) %>%
select(stem_grad, pc_round) %>%
# renaming the cols
rename("STEM Graduates" = stem_grad,
"Percentage" = pc_round)
STEM Graduates | Percentage |
---|---|
no | 82.61 |
yes | 17.39 |