Case Study · AI · Teacher Tools

Grading Assistant.

As the LLM revolution started reverberating across the world, NoRedInk was in an interesting position. It had no prior experience with AI or machine learning, but its whole reason for being (teaching English Language Arts) was exactly what the early models seemed best able to support.

Companies were employing "spray-on AI" to ride the early wave, bolting on LLM-powered features that were more about checking a box than meaningfully improving their products. NoRedInk wanted to take a more thoughtful, long-term approach. After early ideation, we arrived at automatic grading: a task LLMs were uniquely suited to, and one that would serve teachers and students in a fundamental way.

Finding the Opportunity

Our early research showed that grading writing is one of the highest-friction tasks for teachers. Even committed teachers reported struggling to find the time to give individualized feedback consistently. Teachers who were less confident teaching writing often avoided assigning it at all, prioritizing easier-to-measure skill practice. One teacher we interviewed emphasized that he "would not typically stray into writing…unless AI can grade it and ELA teachers trust it over 80% of the time."

We began internal experimentation to see whether an LLM could evaluate student work reliably, how rubric-aligned feedback might be generated, and what level of accuracy was achievable with prompt engineering and curriculum input. I collaborated with a product manager, a curriculum specialist, and engineers to define the "surface area" an MVP could reasonably support. These early constraints refined our ideation into a product scope manageable enough for beta testing: focus first on argumentative paragraphs, where grading criteria are explicit and structured, and prioritize a lightweight experience that reduces friction for teachers rather than adding steps.
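To make that experimentation concrete, the sketch below shows the general shape of a rubric-grounded grading prompt. It is illustrative only: it assumes the OpenAI Python client, and the model, rubric, and prompt wording are simplified stand-ins for what we actually tested.

    # Illustrative sketch of rubric-aligned grading with an LLM.
    # Assumes the OpenAI Python client; the model, rubric, and prompts
    # are simplified stand-ins, not production versions.
    from openai import OpenAI

    client = OpenAI()

    RUBRIC = (
        "Claim: states a clear, arguable claim (0-4)\n"
        "Evidence: supports the claim with relevant evidence (0-4)\n"
        "Reasoning: explains how the evidence supports the claim (0-4)\n"
    )

    def grade_paragraph(paragraph: str) -> str:
        """Return rubric-aligned scores and one short comment per criterion."""
        response = client.chat.completions.create(
            model="gpt-4o",  # hypothetical model choice
            messages=[
                {
                    "role": "system",
                    "content": (
                        "You are an ELA grading assistant. Score the student's "
                        "argumentative paragraph against each rubric criterion "
                        "and write one short, encouraging comment per criterion."
                    ),
                },
                {"role": "user", "content": f"Rubric:\n{RUBRIC}\nParagraph:\n{paragraph}"},
            ],
        )
        return response.choices[0].message.content

    print(grade_paragraph("School uniforms should be optional because..."))

Even a sketch like this makes the two levers visible, the rubric text and the instruction framing, which is where the prompt engineering and curriculum input came in.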

Designing the Experience

Based on what we were hearing from teachers, we established design principles to guide the work:

  • Teacher-first: Teachers must always remain in control of grading decisions.
  • Transparent: Show how the AI reached its judgment.
  • Lightweight: Reduce grading effort rather than introducing new workflows.
  • Aligned: Follow familiar rubric structures grounded in curriculum accuracy.

These principles led me to a design that used the existing rubric-based grading interface as a starting point, then layered AI feedback on top as comments, which doubled as explanations of how the AI reached its judgments. To keep teachers in control, the experience required them to either edit or approve each piece of AI feedback before it reached students.

One particular area of focus was how explicit the review-and-approve step should be. To ensure that only vetted feedback reached students, I removed the ability to mass-approve all feedback; to keep the experience from becoming onerous, I also removed a nag modal encouraging review.

Launching the Beta

After validating the design with internal users and stakeholders, and once the feedback produced by Grading Assistant met our quality threshold, we launched a beta to a subset of users. The purpose was to validate whether AI-assisted grading could measurably reduce teacher workload, accelerate feedback to students, and maintain a level of scoring reliability that preserved teacher trust.
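The threshold itself is internal, but as a sketch, one simple way to quantify scoring reliability is to compare AI rubric scores with teacher scores on the same submissions. The data and metrics below are illustrative, not our real numbers.

    # Sketch of one way to measure scoring reliability against teacher grades.
    def agreement_rates(pairs: list[tuple[int, int]]) -> tuple[float, float]:
        """Return (exact-match rate, within-one-point rate) for (ai, teacher) scores."""
        exact = sum(ai == teacher for ai, teacher in pairs) / len(pairs)
        adjacent = sum(abs(ai - teacher) <= 1 for ai, teacher in pairs) / len(pairs)
        return exact, adjacent

    # Hypothetical sample: (AI score, teacher score) on the same submissions.
    scores = [(3, 3), (2, 3), (4, 4), (1, 2), (3, 3), (0, 0)]
    exact, adjacent = agreement_rates(scores)
    print(f"exact: {exact:.0%}, within one point: {adjacent:.0%}")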

Prior to the beta launch, I created a microsite to announce the coming feature and let teachers sign up. This built anticipation and gave us a direct channel to our beta testers. I finalized the user experience, usability tested the final prototype to ensure the interface made sense to teachers, and worked with engineering on polish such as the small animations that played when teachers approved feedback.

Getting Back Results

The results from the beta were extremely encouraging:

  • Teachers spent 82 seconds per comparable assignment when grading manually, versus 37 seconds with Grading Assistant.
  • Twice as many students received feedback within a day when graded with Grading Assistant.
  • Students were 3× more likely to receive written feedback when Grading Assistant was used (45% vs. 16%).

Usage and adoption were also strong. 15% of teachers who assigned any writing chose a Grading Assistant assignment, and 55% of teachers who viewed one ultimately assigned it. Qualitatively, teachers showed high enthusiasm. Many expressed interest in using Grading Assistant "every day" for writing practice, and remarked that students found the feedback easy to act on and were energized by how quickly it arrived.

Of course, these were beta users, a self-selecting group. Nevertheless, both quantitatively and qualitatively, we felt affirmed that we were on the right track and should proceed immediately to a full release.

Improvements Based on User Feedback

Teachers in the beta edited scores or feedback only 5% of the time, suggesting strong perceived accuracy. Since many also noted that manually clicking "accept" on each comment could be tedious, I iterated on the interface to drop the approval requirement for AI comments, resulting in a simpler, more efficient experience. Removing explicit approval raised a new question, whether edits were being saved at all, which I resolved with a simple autosave message.

Much of the post-beta iteration focused on comment tone and structure: more positive language, higher specificity, and better personalization. We also found that feedback which paraphrased student writing could, in rare cases, unintentionally repeat harmful content. In response, I designed an experience that alerted teachers to flagged submissions and let them choose to grade manually or direct Grading Assistant to evaluate the submission anyway.
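The sketch below shows the shape of that flag-and-route behavior. It uses the OpenAI moderation endpoint as a generic stand-in for content screening; the real pipeline and interface were more involved.

    # Sketch of the flag-and-route behavior: screen a submission, alert the
    # teacher if it is flagged, and only AI-grade it on explicit request.
    from openai import OpenAI

    client = OpenAI()

    def route_submission(text: str, force_ai: bool = False) -> str:
        flagged = client.moderations.create(input=text).results[0].flagged
        if flagged and not force_ai:
            # Surface an alert; the teacher grades manually, or re-runs
            # with force_ai=True to make Grading Assistant evaluate it.
            return "flagged: teacher review required"
        return "ok: generate rubric-aligned feedback"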

Perhaps the biggest barrier to satisfaction was that teachers simply wanted more: more genres, longer essays, and support for source texts. This was a great problem to have, and I proposed early examples of how more options and different types of grading could be integrated into the experience.

Productization & Integration

Moving from a limited beta to a full-fledged platform feature meant integrating Grading Assistant seamlessly into the application while also promoting its use. I audited our teacher experience to ensure it supported and acknowledged Grading Assistant, adding icons to distinguish these assignments from normal ones and calls to action in appropriate locations, such as the recommended assignments on the dashboard and in the assignment library.

I also created upsells that gave free teacher accounts a trial of the feature. Teachers who used Grading Assistant during the free trial were 70% more likely to apply for Premium.

Beyond the core teacher experience, I worked with our Enterprise team to bring Grading Assistant assignments into our Benchmark experience, designing the flow district admins use to choose and modify the Grading Assistant assignments that Benchmark supports.

Conclusion

Grading Assistant meaningfully reduced friction in writing instruction by halving teacher grading time, tripling student access to written feedback, and increasing teacher willingness to assign writing tasks. Our cross-disciplinary approach ensured that the feature not only met the technical and pedagogical challenge of automated grading, but also built trust among teachers.

The design process demonstrated the importance of framing new technology within familiar boundaries. Beyond technical accuracy, teachers needed transparency, pedagogical alignment, and confidence that they remained in control. Small interaction decisions, such as how feedback was structured, when approval was required, and how edits were saved, had an outsized impact on the experience.

After launch, my focus shifted to supporting longer, more structured essays and refining the interface so teachers could see more content on smaller screens. With the concept proven, we went full steam ahead on expanding the feature to support as wide a variety of writing types as possible and making Grading Assistant more discoverable and usable.