Scoring

Vision

Many educational games sport adaptive scoring and leveling technologies, but many miss the mark when it comes to actually delivering the right content. Two major problems are A) most systems are easy for the player to game by deliberately botching answers, which forces the system into presenting easy material, and B) many systems rely on one-size-fits-all levels that don't really match the individual player's needs.

Scoring Algorithm

I (@jcmcdonald) authored this algorithm back in 2013, in response to a problem with pure percentage scoring. While accurate for high question counts (e.g. 30 of 50), it did not accurately reflect performance at low question counts (e.g. 3 of 5). The alternative was giving 2 points for every correct answer and removing 1 for every incorrect answer. However, this made scores for high question counts inaccurate.
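For comparison, here is a minimal sketch of the two baseline methods. The function names are hypothetical, and the exact form of the 2-1 weighting is an assumption: the page only describes it in words, but treating each correct answer as 2 points and each incorrect answer as 1 point, then taking the correct share of the total, reproduces the "Weighted 2-1" column in the table on this page.

```python
def pure_percentage(correct: int, incorrect: int) -> float:
    """Plain percentage of correct answers (assumes at least one answer)."""
    return 100.0 * correct / (correct + incorrect)


def weighted_2_1(correct: int, incorrect: int) -> float:
    """2-1 weighting, in the form inferred from the table (an assumption)."""
    earned = 2 * correct   # 2 points per correct answer
    lost = 1 * incorrect   # 1 point per incorrect answer
    return 100.0 * earned / (earned + lost)


# 3 of 5 and 30 of 50 get identical scores under both baselines.
for correct, total in [(3, 5), (30, 50)]:
    incorrect = total - correct
    print(f"{correct}/{total}: "
          f"{pure_percentage(correct, incorrect):.2f}% pure, "
          f"{weighted_2_1(correct, incorrect):.2f}% weighted 2-1")
```

Neither baseline takes the number of questions into account, which is the gap the algorithm on this page is meant to fill.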

I stumbled onto the algorithm completely by accident, and the exact mechanism behind it is unknown. Two math professors tried to prove my algorithm, and both concluded that it works, but for unknown reasons. (Coooooool!)

Here is a table comparing the three scoring methods.

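| Correct/Total | Pure Percentage | Weighted 2-1 | Trailcrest |
| ------------- | --------------- | ------------ | ---------- |
| 3/5           | 60%             | 75%          | 85.71%     |
| 6/10          | 60%             | 75%          | 77.78%     |
| 5/10          | 50%             | 66.67%       | 70%        |
| 7/10          | 70%             | 82.35%       | 84.48%     |
| 30/50         | 60%             | 75%          | 63.78%     |
| 60/100        | 60%             | 75%          | 61.90%     |
| 50/100        | 50%             | 66.67%       | 52%        |
| 70/100        | 70%             | 82.35%       | 71.65%     |
| 500/1000      | 50%             | 66.67%       | 50.20%     |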

The significance of these numbers is that, because I am showing the grade as a LETTER GRADE (A+, B-, etc.), my algorithm's percentages translate to a grade that is more representative of the student's ACTUAL mastery.

Case in point: in the first row, 3 correct answers out of 5 in an activity is actually fairly good in reality, yet 60% would be considered failing. A 2-1 weight gives a somewhat better result (75%), but it is still not a good indicator. My algorithm weights the score to 85.71%, which translates to a B and is far more indicative of the student's actual progress.

Meanwhile, 6 out of 10 is not as good in reality (though not horrible, either), and it is given a score of 77.78%, a C. Under both of the other methods, it would receive the same score as 3 out of 5.

By contrast, 30 out of 50 is 63.78%, much closer to the pure percentage of 60%. Here, we see that the 2-1 weight (still 75%) is now overcompensating.

The same is true of 60 out of 100, which is 61.90% by my algorithm (versus a pure percentage of 60%).

In fact, the larger the numbers are, the smaller the offset. 500 out of 1000 is 50.20% under my algorithm, versus 50% pure percentage. At this point, the 2-1 weight is still wildly overcompensating by an entire letter grade.

The whole point is that, logically, the more chances a student HAS to get right answers, the smaller the offset should be. Conversely, the fewer chances a student has to get right answers, the larger the offset.
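One way to see the shrinking offset directly is to work through the half-correct case using the formula from the pseudocode on this page. The symbols here are introduced only for illustration and are not in the original: $x$ and $y$ are the counts of correct and incorrect answers, $t = x + y$, $w_i$ and $w_c$ are the incorrect and correct offsets, and $k$ is the constant subtracted from $t/2$ (1.5 as written; effectively 2 for even totals if the incorrect offset is truncated to a whole number, which is what the table's values correspond to).

```latex
\begin{align*}
w_i &= \tfrac{t}{2} - k, \qquad w_c = t - w_i = \tfrac{t}{2} + k \\
r &= 100 - \frac{100\, y\, w_i}{x\, w_c + y\, w_i} = \frac{100\, x\, w_c}{x\, w_c + y\, w_i} \\
\text{with } x = y = \tfrac{t}{2}: \quad
r &= \frac{100\, w_c}{w_c + w_i} = \frac{100\left(\tfrac{t}{2} + k\right)}{t} = 50 + \frac{100\,k}{t}
\end{align*}
```

With $k = 2$ this gives 70% at $t = 10$, 52% at $t = 100$, and 50.20% at $t = 1000$, matching the half-correct rows of the table. The bonus over the pure percentage falls off as $1/t$. This is only an observation about the formula's behavior, not a proof that this weighting is the right one.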

Scoring Algorithm - Pseudocode

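t = correct + incorrect
// Note: the table values above match when this offset is
// truncated to a whole number (e.g. 3.5 becomes 3).
incorrect_offset = (t / 2.0) - 1.5
correct_offset = t - incorrect_offset
n = incorrect * incorrect_offset   // weighted incorrect answers
c = correct * correct_offset       // weighted correct answers
r = 100.00 - (100.00 / (c + n)) * n   // final score, as a percentage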
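
The pseudocode above can be sketched in Python as follows. This is one possible reading rather than the original implementation: the function name `trailcrest_score` and the `truncate` flag are hypothetical. The flag exists because the table's values are reproduced only when the incorrect offset is truncated to a whole number (for 10 questions, 3.5 becomes 3); with the fractional offset exactly as written, 6 of 10 comes out at 73.58% rather than 77.78%.

```python
import math


def trailcrest_score(correct: int, incorrect: int, truncate: bool = True) -> float:
    """Return the weighted score as a percentage, following the pseudocode."""
    total = correct + incorrect
    incorrect_offset = total / 2.0 - 1.5
    if truncate:
        # Assumption inferred from the table, not stated in the source:
        # the offset is truncated to a whole number.
        incorrect_offset = math.floor(incorrect_offset)
    correct_offset = total - incorrect_offset
    n = incorrect * incorrect_offset   # weighted incorrect answers
    c = correct * correct_offset       # weighted correct answers
    if c + n <= 0:
        # Not covered by the source; rejected here rather than guessed at.
        raise ValueError("score is undefined for this input")
    return 100.0 - (100.0 / (c + n)) * n


for correct, total in [(3, 5), (6, 10), (30, 50), (500, 1000)]:
    score = trailcrest_score(correct, total - correct)
    print(f"{correct}/{total}: {score:.2f}%")
```

Totals below five questions make the truncated incorrect offset zero or negative. The source does not say how those cases should be handled, so the sketch only rejects inputs where the formula's denominator is not positive.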

Last Author: jcmcdonald
Last Edited: Jul 27 2015, 6:17 PM
