Bob, Curriculum Assessor Agent
A Copilot Studio agent I built to evaluate uploaded training decks against a 21-criterion curriculum analysis rubric and write structured results straight to Excel. It replaces a 1-2 day manual review with a 10-15 minute pipeline, and keeps the assessment criteria visible and editable per engagement.
Reviewing an inherited training deck used to take a day or two of slow expert reading: stop, score, write a comment, score the next dimension. The shape of the work is repeatable. The cognitive load is exhausting. I built Bob to handle the structured layer so the human reviewer can spend their time on the judgment calls.
What I built
A user uploads a training deck (a PDF exported from PPTX) into a Copilot chat or a OneDrive folder. Power Automate picks up the file, extracts the text via a Cloudmersive PDF connector, and hands it to the Copilot Studio agent, which evaluates the content against the 21-criterion Curriculum & Content Analysis rubric, grouped into three categories:
- General Information (program title, owner, description, duration, target audience, mode of delivery)
- Instructional Design (learning objectives, structure, assessment alignment, scenario quality, and four more)
- Content (accuracy, depth, relevance, tone, and three more)
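Because the rubric is editable per engagement, it helps to think of it as plain data rather than prompt text. A minimal TypeScript sketch of that idea, using only the criteria named above (the unenumerated criteria are represented by a count, not invented names):

```typescript
// Hedged sketch: the rubric as editable data. Only the criteria named in this
// post are spelled out; the rest are a count, since the full 21-item rubric
// varies per engagement. Interface and field names are illustrative.
interface RubricCategory {
  name: string;
  criteria: string[]; // criteria enumerated in the post
  unlisted: number;   // additional criteria not enumerated here
}

const rubric: RubricCategory[] = [
  {
    name: "General Information",
    criteria: ["program title", "owner", "description", "duration",
               "target audience", "mode of delivery"],
    unlisted: 0,
  },
  {
    name: "Instructional Design",
    criteria: ["learning objectives", "structure", "assessment alignment",
               "scenario quality"],
    unlisted: 4,
  },
  {
    name: "Content",
    criteria: ["accuracy", "depth", "relevance", "tone"],
    unlisted: 3,
  },
];

// The listed criteria plus the unlisted counts sum to the full rubric.
const total = rubric.reduce((n, c) => n + c.criteria.length + c.unlisted, 0);
console.log(total); // 21
```

Keeping the rubric as data is what lets a reviewer swap criteria in and out per engagement without touching the pipeline.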
For each criterion the agent returns a verdict and a short evidence string. Results write to Excel via Power Automate, one row per deck, with the column structure preserved exactly so the L&D team’s existing template still works. A summary email goes out at the end.
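The one-row-per-deck shape can be sketched as a flattening step: per-criterion verdicts and evidence strings collapse into a single record whose keys match the template's columns. A hedged TypeScript sketch, with field names that are illustrative rather than the actual template headers:

```typescript
// Hedged sketch of the per-deck result written to Excel: one verdict plus a
// short evidence string per criterion, flattened into a single row so the
// existing template's column structure is preserved. All names illustrative.
interface CriterionResult {
  criterion: string;
  verdict: "Met" | "Partially met" | "Not met";
  evidence: string; // short quote or pointer into the deck
}

function toExcelRow(deck: string, results: CriterionResult[]): Record<string, string> {
  const row: Record<string, string> = { Deck: deck };
  for (const r of results) {
    // One column pair per criterion, so the row maps 1:1 onto the template.
    row[r.criterion] = r.verdict;
    row[`${r.criterion} evidence`] = r.evidence;
  }
  return row;
}

const row = toExcelRow("onboarding.pdf", [
  { criterion: "Learning objectives", verdict: "Partially met",
    evidence: "Objectives listed on slide 2 but not measurable." },
]);
console.log(row["Learning objectives"]); // "Partially met"
```

Flattening up front is also what keeps the Excel write a single row update rather than a loop of per-criterion writes.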
Where the real engineering was
The interesting engineering work was in the Power Automate-to-Excel handoff: getting the JSON parsing right (the agent returns a record, not a table), renaming the verdict column dynamically via an Office Script so it carries the deck's filename, and getting the Power Fx record syntax right for row updates. The production version uses a OneDrive trigger, because chat attachments don't auto-trigger Copilot Studio agents the way you'd hope.
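The record-not-table gotcha generalizes: depending on the sample payload given to Power Automate's Parse JSON action, the same agent response can be schematized as a single object or a one-element array, and every downstream reference changes shape with it. A hedged TypeScript sketch of the normalization idea (names are illustrative, not from the actual flow):

```typescript
// Hedged sketch of the "record, not table" fix: if Parse JSON is given an
// array-wrapped sample, everything downstream needs an index. Normalizing to
// a single record up front avoids that. Names here are illustrative.
type AgentResult = Record<string, unknown>;

function asRecord(parsed: unknown): AgentResult {
  if (Array.isArray(parsed)) {
    if (parsed.length !== 1) {
      throw new Error(`expected one assessment record, got ${parsed.length}`);
    }
    return parsed[0] as AgentResult;
  }
  return parsed as AgentResult;
}

// Both payload shapes normalize to the same record.
const a = asRecord(JSON.parse('{"verdict":"Met"}'));
const b = asRecord(JSON.parse('[{"verdict":"Met"}]'));
console.log(a["verdict"] === b["verdict"]); // true
```

In the flow itself the equivalent fix is supplying Parse JSON with a bare-record sample payload, so the generated schema matches what the agent actually returns.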
The version of Bob I’m running now started on GPT and was migrated to Claude as the reasoning layer. The accuracy on the harder criteria, the ones that require reading between the lines of a training deck, climbed noticeably after the migration. That’s a signal worth keeping in mind for anyone running production assessment agents on older model families.