## 1.1 Research Question
This report addresses Research Question 1: Are there differences between students' and experts' opinions of explanation quality? The experiment recreates features of Evans et al. (2022) in that it compares judgements of mathematical explanations between two rater groups (students and experts) using comparative judgement data.

## 1.2 Background
Comparative judgement (CJ) is a process in which judges repeatedly choose the better of two items; the accumulated pairwise decisions allow construction of a reliable quality scale without the need for rubrics. Examining the extent to which students and experts share perceptions of explanation quality is relevant to educational assessment and feedback practice.

## 1.3 Data Sources
- Student judgements: workshop-groupA-judgements.csv and workshop-groupB-judgements.csv
- Expert judgements: expert-judgements-PARTIAL/decisions-clean.csv
- Analysis focuses on explanations judged by both groups (n = `r length(common_explanations)` common explanations)
## 2.1 Data Preparation
Student judgement data are read from two CSV files, one per workshop group, and combined into a single dataset. A new column, judge_type, is added to distinguish student raters from expert raters; the expert judgements are imported and labelled likewise. For consistency, the winner and loser variables, which record the outcome of each pairwise judgement, are converted to character type. This avoids mismatches caused by mixed data formats when the two datasets are compared. Finally, the set of explanations judged by both groups is obtained by intersecting the unique explanation identifiers appearing in the two datasets.
```r
library(dplyr)

# Read the two workshop groups of student judgements and combine them
groupA_judgements <- read.csv("C:/Users/f/Desktop/data/workshop-groupA-judgements.csv")
groupB_judgements <- read.csv("C:/Users/f/Desktop/data/workshop-groupB-judgements.csv")
student_judgements <- bind_rows(groupA_judgements, groupB_judgements) %>%
  mutate(judge_type = "student")

# Read the expert judgements and label them
expert_judgements <- read.csv("C:/Users/f/Desktop/data/expert-judgements-PARTIAL/decisions-clean.csv") %>%
  mutate(judge_type = "expert")

# Convert winner/loser identifiers to character to avoid mixed-type mismatches
student_judgements <- student_judgements %>%
  mutate(winner = as.character(winner), loser = as.character(loser))
expert_judgements <- expert_judgements %>%
  mutate(winner = as.character(winner), loser = as.character(loser))

# Explanations judged by both students and experts
common_explanations <- intersect(
  unique(c(student_judgements$winner, student_judgements$loser)),
  unique(c(expert_judgements$winner, expert_judgements$loser))
)
```
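A quick sanity check, assuming the objects above are in the workspace, confirms that the overlap is non-empty before any filtering is done:

```r
# How many explanations were judged by both groups?
length(common_explanations)
# Peek at a few of the shared explanation IDs
head(common_explanations)
```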
## 2.2 Analytical Approach
The judgement data are modelled within the Plackett-Luce framework. Both datasets are first filtered to keep only pairwise comparisons in which both items belong to the common explanations identified above, so that the student and expert analyses are based on the same set of items and are therefore directly comparable.
The filtered data are converted into ranking objects with as.rankings(), which turns the winner-loser format into the ranking structure required for probabilistic modelling. Two separate Plackett-Luce models are then fitted, one to the student rankings and one to the expert rankings. In each model, the estimated worth parameter for an explanation measures its latent quality, or perceived merit, reflecting how often it is chosen over others across the pairwise comparisons.
The coefficients of the fitted models are extracted as quality scores for each explanation. Compared across the groups, these scores give quantitative evidence of how far students and experts agree in their judgements of explanation quality. The script combines the extracted coefficients into a single data frame holding each explanation's student and expert quality scores, together with their difference and absolute difference.
Finally, the relationship between the student and expert rankings is evaluated with Spearman's rank correlation coefficient, a non-parametric measure of the monotonic association between two sets of rankings. The associated significance test (cor.test) indicates whether the observed correlation is statistically significant, providing empirical grounds for concluding agreement or disagreement between the groups.
- Data filtering: retained only explanations judged by both students and experts
- Model fitting: separate Plackett-Luce models for student and expert judgements
- Quality estimation: derived worth values representing perceived quality
- Comparison: Spearman correlation between student and expert quality rankings
- Disagreement analysis: identified explanations with the largest rating differences
```r
library(PlackettLuce)

# Keep only comparisons where both items are in the common set
student_common <- student_judgements %>%
  filter(winner %in% common_explanations & loser %in% common_explanations) %>%
  select(judge, winner, loser)
expert_common <- expert_judgements %>%
  filter(winner %in% common_explanations & loser %in% common_explanations) %>%
  select(judge, winner, loser)

# Convert winner-loser pairs into ranking objects
student_rankings <- as.rankings(student_common[, c("winner", "loser")],
                                input = "orderings", id = student_common$judge)
expert_rankings <- as.rankings(expert_common[, c("winner", "loser")],
                               input = "orderings", id = expert_common$judge)

# Fit a separate Plackett-Luce model to each group's rankings
model_students <- PlackettLuce(student_rankings)
model_experts <- PlackettLuce(expert_rankings)
```
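Before extracting coefficients, it can help to inspect the fitted models; a minimal check using the generic summary() method:

```r
# Optional diagnostics: summary() reports the estimated worth parameters
# (on the log scale) with standard errors for each fitted model
summary(model_students)
summary(model_experts)
```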
```r
# Extract worth parameters (quality scores) for each explanation
student_qualities <- coef(model_students, log = FALSE)
expert_qualities <- coef(model_experts, log = FALSE)

# Build a comparison table for the items estimated in both models
common_items <- intersect(names(student_qualities), names(expert_qualities))
comparison_df <- data.frame(
  explanation_id = common_items,
  student_quality = student_qualities[common_items],
  expert_quality = expert_qualities[common_items]
) %>% mutate(
  quality_difference = student_quality - expert_quality,
  abs_difference = abs(quality_difference)
)
```
```r
# Spearman rank correlation between student and expert quality scores
correlation <- cor(comparison_df$student_quality, comparison_df$expert_quality,
                   method = "spearman", use = "complete.obs")
cor_test <- cor.test(comparison_df$student_quality, comparison_df$expert_quality,
                     method = "spearman")
```
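The disagreement analysis listed in the approach is not shown in the code above; a minimal sketch using the comparison_df built earlier would rank explanations by absolute score difference:

```r
# Explanations where student and expert quality scores diverge most
largest_disagreements <- comparison_df %>%
  arrange(desc(abs_difference)) %>%
  head(10)
largest_disagreements
```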