Build: Interactive Exploration Labs
Duration: 90 minutes
- Lab 1: Prompt Design (15 min)
- Lab 2: Personalization A/B Test (15 min)
- Lab 3: Data Model Designer (20 min)
- Lab 4: Parameter Playground (10 min)
- Lab 5: CLT Analyzer (20 min)
- Final Reflection (10 min)
Learning Objectives
By the end of this section, you will:
- Experience how prompt design shapes learning outcomes
- Feel the cognitive load difference between generic and personalized examples
- Design the data structure for pedagogically sound worked examples
- Understand how model parameters affect educational quality
- Evaluate generated examples using CLT criteria
The Interactive Exploration App
Below is a live marimo notebook with 5 hands-on labs. You’ll experiment with the design decisions that shape AI educational tools.
How to use it:
- Scroll through the labs in order
- Fill in text fields with YOUR information
- Click buttons to generate examples
- Compare results and reflect on what you notice
Lab 1: Prompt Design Laboratory (15 minutes)
What You’ll Explore
Learning Question: How does prompt engineering affect the quality of worked examples?
Instructions
1. Read both prompts in the app:
   - Basic Prompt (no pedagogical grounding)
   - CLT-Grounded Prompt (reduces cognitive load)
2. Click “Generate Both Examples”
3. Compare the results:
   - Which problem is clearer and more specific?
   - Which solution breaks down steps better?
   - Which explanation helps you understand WHY, not just WHAT?
Key Insight
The prompt IS your pedagogical design encoded in language.
Every word in your prompt shapes the language model’s output. Generic prompts produce generic examples. Pedagogically grounded prompts produce learning-focused examples.
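To make the contrast concrete before you open the app, here is a rough sketch of the two prompt styles. The wording in the notebook will differ, and the target concept (Python dictionaries) is used only as an illustration.

```python
# Illustrative only: two prompt styles for the same request.
# This is a sketch, not the notebook's actual prompts.

BASIC_PROMPT = "Write an example about Python dictionaries."

CLT_GROUNDED_PROMPT = """\
Create a WORKED example that teaches Python dictionaries to a novice.
- Show the complete, finished solution; the learner studies it rather than solving it.
- Break the solution into small numbered steps (manage intrinsic load).
- Leave out any detail not needed for the target concept (reduce extraneous load).
- After each step, explain WHY it works, not just WHAT it does (support schema building).
"""
```

Notice how every line of the grounded prompt maps onto a CLT principle; spotting that mapping is exactly what Lab 1 asks you to do.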
Reflection Questions
Think about:
- What specific phrases in the CLT-grounded prompt improved the output?
- How could you apply this to prompts in YOUR teaching domain?
- What CLT principles are explicitly mentioned in the grounded prompt?
Discussion: Share one phrase from the CLT prompt that you found particularly effective.
Lab 2: Personalization A/B Test (15 minutes)
What You’ll Explore
Learning Question: Can you FEEL the difference in cognitive load?
Instructions
1. Enter YOUR context:
   - Your hobby or interest (e.g., photography, cooking, gaming)
   - What you want to achieve (e.g., build a recipe app, automate photo editing)
2. Click “Generate A/B Comparison”
3. Read both examples:
   - Generic (standard textbook style)
   - Personalized (using your context)
4. Notice how each one FEELS:
   - Which is more engaging to read?
   - Which feels easier to process mentally?
   - Can you visualize the personalized example more easily?
Key Insight
This is the personalization effect in action!
Familiar contexts require less cognitive effort to process. When you don’t have to decode an unfamiliar scenario, more working memory is available for learning the target concept.
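One plausible way the app builds the personalized variant is by injecting your context directly into the prompt. The function below is a sketch under that assumption; the parameter names hobby and goal are illustrative, not the app's actual code.

```python
# A sketch of context injection for the A/B test; not the notebook's real implementation.

def personalized_prompt(concept: str, hobby: str, goal: str) -> str:
    """Build a prompt that grounds the example in the learner's own context."""
    return (
        f"Create a worked example that teaches {concept}. "
        f"Set the scenario in the learner's world: they are into {hobby} "
        f"and want to {goal}. Use that scenario for every variable name and "
        f"data value, so no working memory is spent decoding an unfamiliar domain."
    )

print(personalized_prompt("Python dictionaries", "photography", "automate photo editing"))
```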
Try This
Experiment further:
- Generate examples for 2-3 different hobbies
- Notice how the SAME concept (Python dictionaries) gets explained differently
- Which personalized context resonated most with you?
The takeaway: Personalization isn’t just “nice to have”—it’s a cognitive load reduction strategy.
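For a feel of what the A/B difference can look like on the page, here is a hand-written sketch assuming photography as the hobby. The generated examples will be longer, but the contrast is the same.

```python
# Generic (textbook style): arbitrary keys you must hold in working memory.
record = {"key1": "value1", "key2": "value2", "key3": "value3"}

# Personalized: the same structure, but every name maps onto something the
# learner already knows, so attention stays on how dictionaries behave.
photo = {"filename": "sunset.raw", "iso": 100, "aperture": "f/2.8", "edited": False}
photo["edited"] = True  # look up and update by key, exactly as in the generic case
```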
Reflection Questions
Think about:
- How did the personalized example reduce extraneous cognitive load?
- Could you use personalization in YOUR teaching context?
- What student interests/contexts could you leverage?
Discussion: Share your most effective personalized example with a neighbor.
Lab 3: Data Model Designer (20 minutes)
What You’ll Explore
Learning Question: What makes a worked example “worked”?
Instructions
1. Read the current data model shown in the app
2. Select fields you think support learning:
   - problem: str (The problem statement)
   - solution_steps: list[str] (Steps as a list for chunking!)
   - solution: str (Solution as one big block)
   - final_answer: str (Explicit conclusion)
   - key_insight: str (Why this approach works)
   - code_with_comments: str (Annotated code)
   - common_mistakes: str (What to avoid)
   - connection_to_real_world: str (Practical relevance)
3. See the pedagogical analysis:
   - What CLT principles do your choices implement?
   - What’s your design score?
   - What feedback does the analyzer provide?
Key Insight
The data structure IS the pedagogy.
When you design a Pydantic model (the structure that controls generated outputs), you’re making pedagogical choices. Each field implements (or undermines) a CLT principle.
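As a reference point while you choose fields, here is a minimal sketch of such a model, using the field names from the list above. The Field descriptions are illustrative, not the app's exact model.

```python
# A sketch of a CLT-informed worked-example model; field names match Lab 3's
# options, everything else is illustrative.
from pydantic import BaseModel, Field

class WorkedExample(BaseModel):
    problem: str = Field(description="The problem statement")
    solution_steps: list[str] = Field(
        description="Steps as a list, so the solution arrives pre-chunked"
    )
    final_answer: str = Field(description="Explicit conclusion")
    key_insight: str = Field(description="Why this approach works")
    code_with_comments: str = Field(description="Annotated code")
    common_mistakes: str = Field(description="What to avoid")
    connection_to_real_world: str = Field(description="Practical relevance")
```

Leaving out solution: str (one undifferentiated block) in favour of solution_steps is itself a chunking decision, which is the point of the lab.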
Reflection Questions
Think about:
- Why is solution_steps: list[str] better than solution: str for novices?
- What field would you ADD for your teaching domain?
- How does structure guide (or constrain) what the model generates?
Discussion: Design the ideal data model for worked examples in YOUR subject area. What fields would you include?
Lab 4: Parameter Playground (10 minutes)
What You’ll Explore
Learning Question: How do model parameters affect pedagogical quality?
Instructions
1. Adjust the parameters:
   - Reasoning Effort (none, low, medium, high)
   - Verbosity (low, medium, high)
2. Read the guidance:
   - For novices: Low reasoning (fast), medium-high verbosity (detailed)
   - For experts: Higher reasoning (better solutions), lower verbosity (concise)
3. Consider the tradeoffs:
   - More reasoning = better quality but slower and more expensive
   - Higher verbosity = clearer explanations but longer to read
Key Insight
The “best” parameters depend on your learners!
There’s no universal setting. You must match technical parameters to pedagogical needs:
- Novice learners: Need detailed, step-by-step explanations (high verbosity)
- Expert learners: Want concise, sophisticated solutions (low verbosity, high reasoning)
- Budget constraints: Lower reasoning is faster and cheaper
- Quality requirements: Higher reasoning produces better examples
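To make the mapping from learners to settings concrete, here is a minimal sketch. The preset values follow the guidance above; the commented-out API call is an assumption about how such settings might reach the model, not the workshop's code.

```python
# A sketch of learner-profile presets for the two parameters in Lab 4.

def params_for_learner(level: str) -> dict[str, str]:
    """Return reasoning-effort and verbosity settings for a learner level."""
    presets = {
        # Novices: fast generation, detailed step-by-step prose.
        "novice": {"reasoning_effort": "low", "verbosity": "high"},
        # A middle ground when you are unsure.
        "intermediate": {"reasoning_effort": "medium", "verbosity": "medium"},
        # Experts: invest in better solutions, keep the prose concise.
        "expert": {"reasoning_effort": "high", "verbosity": "low"},
    }
    return presets[level]

settings = params_for_learner("novice")
print(settings)  # {'reasoning_effort': 'low', 'verbosity': 'high'}

# How these might be passed to a model call (assumption, not verified against the app):
# client.responses.create(
#     model="gpt-5.1",
#     input=prompt,
#     reasoning={"effort": settings["reasoning_effort"]},
#     text={"verbosity": settings["verbosity"]},
# )
```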
Reflection Questions
Think about:
- What parameters would you use for YOUR learners?
- How would you balance cost and quality?
- When might you use different settings for different students?
Lab 5: CLT Analyzer (20 minutes)
What You’ll Explore
Learning Question: Can you evaluate examples using CLT principles?
Instructions
1. Click “Generate Random Example”
2. Read the example carefully
3. Evaluate it using the checklist:
   - ✅ Reduces extraneous cognitive load (no unnecessary complexity)
   - ✅ Manages intrinsic load (breaks problem into chunks)
   - ✅ Optimizes germane load (helps build schemas/patterns)
   - ✅ Is a WORKED example (shows complete solution, not a puzzle)
   - ✅ Has clear step-by-step progression
   - ✅ Explains WHY, not just WHAT
4. See your score:
   - 5-6: Excellent pedagogical design
   - 3-4: Good, but room for improvement
   - 1-2: Needs significant pedagogical revision
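If you want to see the rubric as code, here is a minimal sketch of how such a checklist score could be computed. The criteria strings mirror the checklist above; the scoring logic is an assumption about the analyzer, not its actual implementation.

```python
# A sketch of checklist-based CLT scoring; criteria copied from the lab text.

CLT_CHECKLIST = [
    "Reduces extraneous cognitive load (no unnecessary complexity)",
    "Manages intrinsic load (breaks problem into chunks)",
    "Optimizes germane load (helps build schemas/patterns)",
    "Is a WORKED example (shows complete solution, not a puzzle)",
    "Has clear step-by-step progression",
    "Explains WHY, not just WHAT",
]

def clt_score(checks: dict[str, bool]) -> tuple[int, str]:
    """Count satisfied criteria and map the total onto the lab's rubric."""
    score = sum(bool(checks.get(criterion)) for criterion in CLT_CHECKLIST)
    if score >= 5:
        verdict = "Excellent pedagogical design"
    elif score >= 3:
        verdict = "Good, but room for improvement"
    else:
        verdict = "Needs significant pedagogical revision"
    return score, verdict

# Example: an otherwise strong example that never explains WHY.
score, verdict = clt_score({criterion: True for criterion in CLT_CHECKLIST[:-1]})
print(score, verdict)  # 5 Excellent pedagogical design
```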
Key Insight
You’re developing a CLT-grounded critical lens for evaluating AI tools!
This skill is more valuable than the coding itself. When you can evaluate generated outputs using learning science principles, you can:
- Spot pedagogically weak examples
- Request specific improvements
- Compare competing language models and tools
- Design better prompts and data models
Try This
Generate 3-4 examples and evaluate each:
- Do you see patterns in what GPT-5.1 generates well?
- What does it consistently miss?
- How would you revise the prompt to improve low-scoring areas?
The goal: Develop your critical evaluation instinct.
Reflection Questions
Think about:
- Which CLT criteria are hardest for language models to meet?
- What prompt changes would improve low-scoring examples?
- How would you use this checklist when evaluating tools you already use?
Discussion: Share one example you evaluated. What was its score? What would you improve?
Final Reflection (10 minutes)
What You’ve Learned
Through these 5 labs, you explored:
- ✅ Prompts encode pedagogy (Design drives outputs)
- ✅ Personalization reduces load (Context matters)
- ✅ Structure shapes learning (Data models are pedagogical choices)
- ✅ Parameters affect quality (Settings have learning implications)
- ✅ Critical evaluation is a skill (You can assess AI tools with CLT)
Integration Questions
Consider:
- What surprised you most?
  - Which lab challenged your assumptions?
  - What principle seemed most powerful?
- What will you change?
  - How will you modify prompts you write for language models?
  - What will you look for when evaluating educational technology tools?
- What will you build?
  - Could you adapt this pattern to your teaching domain?
  - What concepts would you include in your worked example generator?
- What questions remain?
  - What do you still want to understand?
  - What would you need to deploy this in your context?
Checkpoint: Can You Answer These?
Pedagogical Understanding:
- Why does chunking (solution_steps: list[str]) reduce cognitive load?
- How does personalization reduce extraneous cognitive load?
- What makes a worked example “worked” versus a problem to solve?
Practical Skills:
- Can you write a CLT-grounded prompt for your subject?
- Can you evaluate a generated example using CLT criteria?
- Can you design a data model for worked examples in your domain?
Critical Thinking:
- What’s the difference between technically impressive and pedagogically sound?
- When should you prioritize speed/cost vs. quality?
- What ethical considerations arise from personalized learning tools powered by language models?
What’s Next?
You’ve explored the design principles. Now you’re ready to:
Option 1: Use the Complete Tool
Try the full Worked Example Weaver:
- See all 3 domains (Programming, Health Sciences, Agronomy)
- Explore 16 concepts
- Generate personalized examples for real learners
Option 2: Design Your Extensions
Move to the Extend section where you’ll:
- Plan how to adapt this to YOUR teaching context
- Sketch your personalized worked example tool
Option 3: Dive Into the Code
If you want to understand the technical implementation:
- View the complete app.py on HuggingFace
- Explore the Marimo and Pydantic documentation
Next: Extend Section (Design your own extensions)