First Update on My Specifications Grading Experiment

21Mar16

Last month I shared my plan to use specifications grading in my Television and American Culture course this spring semester. I just finished marking the first exam, which provides my first real opportunity to reflect on how the experiment is going. (Make sure to read that previous post for the specifics of the approach and course design.) Below I walk through the first exam and what my students did, and reflect on what this system has revealed to me about my teaching and students’ learning.

The course has 31 students enrolled, and all seem to be on board with the grading approach. I asked students to sign a short form to affirm their understanding of the grading system, and asked them to indicate (with no binding commitment) which “bundle” of assignments, and thus which final grade, they planned on working toward in the course. 85% of the students said they planned on working toward an A, with the remaining 15% indicating the B bundle. This wasn’t much of a surprise, given that the norm at Middlebury is toward receiving A grades – if anything, the surprise was that as many as 5 students said they were striving toward “only” a B in the course. It will be interesting to track how this initial plan matches the work that students end up doing, as I expect there will be some who started aiming at an A who choose to do less work as the semester proceeds, and perhaps a few who revise their aim higher.

The first exam consisted of two questions, each with two versions. The Basic versions provide opportunities for students to demonstrate their ability to restate the course content in their own words (which, on an open-book take-home exam, should not be particularly challenging), while the Advanced versions ask students to apply this knowledge to specific examples or to craft their own arguments about the concepts. Doing more Advanced questions allows students to qualify for B or A final grades, while every student must satisfactorily complete at least Basic versions of six exam questions throughout the semester. For the first exam, all students may revise their Unsatisfactory answers at no “cost,” while future exams require them to spend “flexibility tokens” to revise answers. Each question on the three exams focuses on one of the six units in the course, so it is all very structured and hopefully transparent as to what is being evaluated.

To give a sense of what was asked and how the specifications were given, here is the full text for the first question:

Question #1 – complete either the Basic or Advanced version.

Basic prompt:

Commercial television strives to create new programs that minimize risk of failure and maximize the chance of success. Describe the process by which a new program moves from an idea to actually airing on television, highlighting at least five distinct and specific ways that producers and distributors try to reduce risk and create commercial success.

Advanced prompt:

Commercial television strives to create new programs that minimize risk of failure and maximize the chance of success. Analyze the upcoming NBC series Heartbeat, based on the information and videos contained on its official website, highlighting at least five distinct and specific ways that you can see evidence of producers and distributors trying to reduce risk and create commercial success. You do not need to do any research beyond the NBC site, although you may look up additional basic information as needed, citing sources for anything beyond the website.

Specifications:

A satisfactory answer to either of these questions must meet all of the following specifications:

Submitted via Moodle by 11:00 am, Tuesday, March 15, as an uploaded document in either .docx, .doc, or .rtf format.

Consists of at least 750 words, not counting quotations.

Includes the honor code pledge.

Cites any sources referenced beyond the assigned course readings, screenings, class meetings, and the NBC Heartbeat website, following either MLA or Chicago style consistently. (Additional sources are not required.)

Cites any direct quotations from assigned course readings, screenings, and class meetings. References to ideas from those sources do not need to be cited unless directly quoted.

Makes no more than 5 errors in standard written English.

Contains no more than 2 minor factual inaccuracies and no major factual inaccuracies.

Clearly addresses the chosen essay prompt (indicating whether it is a basic or advanced essay), with the minimum number of five distinct examples and points that meet the prompt’s parameters, with at least three of the points being of high significance.

Makes relevant connections to course materials and appropriate use of terminology.

Expresses ideas clearly and fluently in your own words, with coherent and effective organization.

Additionally, a satisfactory answer to the Advanced prompt must meet the following specifications:

Applies course materials and concepts to the specific instance of Heartbeat, highlighting relevant and important aspects of the case study.

Demonstrates original analytical thinking about the case study.
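
As a rough illustration of the all-or-nothing logic in the list above, the countable specifications could be sketched in Python like this. It is only a sketch under stated assumptions: the names `Essay`, `SPECS`, `unmet_specs`, and `mark` are hypothetical, the thresholds are copied from the specifications, and the qualitative items (clarity, relevant connections, original analysis) are left out because they depend on a reader's judgment rather than a count.

```python
from dataclasses import dataclass


@dataclass
class Essay:
    """Hypothetical record of the countable facts about one submitted essay."""
    word_count: int                # words, not counting quotations
    has_honor_pledge: bool
    english_errors: int            # errors in standard written English
    minor_inaccuracies: int
    major_inaccuracies: int
    distinct_points: int
    high_significance_points: int


# Each specification is a (label, check) pair; thresholds come from the list above.
SPECS = [
    ("at least 750 words", lambda e: e.word_count >= 750),
    ("honor code pledge included", lambda e: e.has_honor_pledge),
    ("no more than 5 English errors", lambda e: e.english_errors <= 5),
    ("no more than 2 minor inaccuracies", lambda e: e.minor_inaccuracies <= 2),
    ("no major inaccuracies", lambda e: e.major_inaccuracies == 0),
    ("at least five distinct points", lambda e: e.distinct_points >= 5),
    ("at least three high-significance points",
     lambda e: e.high_significance_points >= 3),
]


def unmet_specs(essay: Essay) -> list[str]:
    """Return the labels of every specification the essay fails."""
    return [label for label, check in SPECS if not check(essay)]


def mark(essay: Essay) -> str:
    """An essay is Satisfactory only if it meets every specification."""
    return "Unsatisfactory" if unmet_specs(essay) else "Satisfactory"


# Usage: a strong essay that falls 50 words short is still "not done yet".
short_essay = Essay(word_count=700, has_honor_pledge=True, english_errors=2,
                    minor_inaccuracies=0, major_inaccuracies=0,
                    distinct_points=5, high_significance_points=3)
print(mark(short_essay))         # Unsatisfactory
print(unmet_specs(short_essay))  # ['at least 750 words']
```

The point of the sketch is that there is no partial credit: a single unmet item flips the mark, and the list of unmet items doubles as the feedback a student would need in order to revise.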

Interestingly, every student chose the Advanced version of this question, opting to apply the concepts rather than just restating them; I had designed this question to be very straightforward and one of the easier Advanced versions to undertake. 60% of the students wrote Satisfactory essays on the first try, with many very strong analyses. From a conventional grading standpoint, a 40% Unsatisfactory rate would be shocking, as I cannot remember any time when 40% of my students got an essay question “wrong” like this.

But the Satisfactory / Unsatisfactory marks do not correspond to Right / Wrong, nor Pass / Fail. This was one of the crucial insights I gained by marking these essays: Unsatisfactory really means “not done yet.” As I described to my class when announcing the high rate of Unsatisfactory (which seemed to shock them as well), think of it like when your parents ask you to clean your room: you tell them that you’re done, they assess your work and say “not done yet!”, giving you a chance to keep cleaning to meet their standards. This is also more comparable to how projects are assessed within many professional worlds, where work that doesn’t meet expectations will require another round (at least) of work to bring it up to snuff. With a list of clear specifications, there is a range of ways that an essay might not be Satisfactory: some of the Unsatisfactory essays cited sources inappropriately, while a few included some factual inaccuracies. The most common reason why these essays did not meet the specifications was that they did not clearly enumerate five distinct points, either through ineffective structure that muddied the analysis, or by including multiple points that were too similar (e.g., “it’s a clone” and “it’s an imitation”).

The numbers were even more stark for Essay #2. I’m not going to share the entire question, but the crux of it focused on exploring a key concept tied to broadcast regulation. The Basic prompt asked, “Identify at least three distinct ways that X shapes American television, referencing readings, screenings, and course topics directly as needed, and using specific examples to support your discussion.” The Advanced prompt asked for argumentation, not just description: “Stake a claim as to how X impacts American democracy. In arguing your position, include at least two distinct points supporting your position, and explain and rebut at least one counter-argument against your position, referencing readings, screenings and course topics directly as needed, and using specific examples to support your claims.” The specifications were similar to those on question #1.

Unlike the first question, some students opted for the Basic version, with 20% choosing the less challenging approach; interestingly, only one of the students who opted to answer the Basic question had declared their intention to strive for the B bundle in the course, meaning that either students are changing their intentions or a number of students striving for As opted to use their one opportunity to choose an easier question early in the semester. Satisfactory rates were much lower for this question – for those electing the Basic version, 2/3 were Satisfactory, while the Unsatisfactory essays all fell short of the required word count, a classic case of an “unforced error.” The Advanced essays had only a 40% Satisfactory rate, with the Unsatisfactory answers falling short in a range of different ways, from major to minor concerns.

Around 60% of my students will have to revise at least one of the essays. My hope is that through the revision process, sustained learning may occur, as students grapple with what it means to write a successful analytical or argumentative essay. Whether it’s through the more surface-level attention to requirements for citations, word count, or grammar, or the deeper challenges of understanding how to make ideas clearly distinct or structuring an argument to convey a clear position, there are many more opportunities here for students to engage with their own work and strengthen their rhetorical skills, far beyond what might happen with a conventionally graded assignment.

One of the things that I’ve learned is how different a conventional grading scheme feels from this Satisfactory / Unsatisfactory model, especially at the higher end. While any essay that would have typically received an A or A– did get Satisfactory, essays that I might have given a B or B+ fell into both camps. There were very smart essays with some surface issues like citation format that got Unsatisfactory, and there were Satisfactory Advanced essays that met the specifications, but were not particularly compelling or insightful.

This points to what seems to be the greatest disparity between conventional and specifications grading: there is no differentiation between work that meets expectations and that which exceeds them. This distinction is typically one that I adhere to in my conventional grading: work that is good enough but not great gets a B+, while I reserve A– and A for work that exceeds expectations. This leads to my courses typically having an average grade of B+, as with the last time I taught Television & American Culture, with a GPA of 3.35 and 38% of students receiving final grades of A or A–. (Note that such grades put my courses well below average at Middlebury, where more than 50% of all grades given are A or A–, and the overall average GPA is around 3.5.)

I’m trying to figure out what the effect of eliminating the distinction between meeting and exceeding expectations will be, and why it matters. Any student who satisfactorily completes all of the required work to receive an A in this course should have demonstrated that they accomplished all of the learning goals, probably more systematically than some students who earned As in the past. Should those who meet expectations with more “style” and an exceptional level of accomplishment be rewarded with something beyond an A? (A+ is not an option at Middlebury.) I don’t think so, meaning that to differentiate between meeting vs. exceeding expectations requires lowering the outcome for meeting all expectations to a lower grade (A– or B+), reserving a straight A for those who significantly exceed expectations. There are systems for doing this, differentiating Satisfactory / Unsatisfactory marks into the wider range of E / M / R / F (Excellent / Meets expectations / Revision needed / Fragmentary), and then requiring a certain number of E marks to earn an A. However, I feel like this just renames the conventional A / B / C / D model, undermining the focus on specifications and promoting the stressful drive that students have to go beyond expectations and aim for the vague difference that earns the highest mark.

So what is the difference between meeting and exceeding expectations? On my assignments, typically it’s elegance and style in writing, subtlety of analysis, originality of insight, and depth of thinking. These are not learning goals for the course, and they are not things I directly teach—obviously I value all of these elements, and try to model them in leading discussion and assigning exemplary readings, but I do not focus on such advanced abilities in this introductory course. This is the crux for me: the students who are exceeding my expectations are doing so based on what they bring to the course, rather than what they are learning from the class. Of the handful of Advanced essays that exceeded my expectations on this exam, almost all were written by upperclass Film & Media majors who had taken a previous course from me. That suggests that they learned how to write effective and compelling media studies analyses in those (and other) courses, and they exhibited that practiced skill admirably on my exam. Should those abilities be overly rewarded within this course, at the cost of the grades for students who meet expectations but did not come to the course with the same experience and background? I think not.

Obviously there’s a ways to go before the semester ends, but I feel like this is a crucial insight: our grading systems need to measure, reward, and incentivize students’ work and learning within the course, not reward or punish what they bring to the course. I’ll be quite curious to see how that plays out in future assignments, now that students have a better sense of how grading will work and what my expectations are. Stay tuned…


5 Responses to “First Update on My Specifications Grading Experiment”


1 sthistorian on March 24, 2016 said:

Fascinating. I also experimented with specifications grading last semester and had the same issue — how (or whether) to distinguish between satisfactory and excellent. I appreciate, and plan to think about, your insight about rewarding performance within the course. However, mine was a course that fulfilled our general education communication requirement, so I was teaching writing/style in it, as well as content/analysis. I’m toying with the idea of requiring a certain number of “exceeds expectations” results for an A, which would require students to at least experience revision to a high standard. In general, though, while the students were shocked by the system (particularly the need to follow instructions), I have never had so many students so happy with what they learned. The high standards and the opportunities for revision meant that students worked far harder in this class, and were much happier with the results, than in my previous versions of the course. I think that at least part of that was due to the specifications grading scheme.
