27 June, 2024
Virtual Academy Knowledge Base [more up-to-date than PDF]
Artificial intelligence (AI) is a branch of computer science that aims to create machines that can perform tasks that typically require human intelligence.
An LLM is a type of generative AI model that is trained to predict the next word following the input (prompt).
Text is generated one word at a time (actually tokens, not words).
Generated text depends on the generative model and the context.
Every word (token) is given an equal amount of time (the computation per token is constant).
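To make the one-token-at-a-time idea concrete, here is a minimal sketch of the generation loop. `predict_next_token` is a hypothetical stand-in for the model (not a function from any particular library); the loop simply appends each predicted token to the context and feeds the longer context back in.

```python
# Minimal sketch of autoregressive generation.
# `predict_next_token` is a hypothetical stand-in for an LLM: given the tokens
# seen so far (the context), it returns the single most likely next token.

def generate(prompt_tokens, predict_next_token, max_new_tokens=50, stop_token="<eos>"):
    context = list(prompt_tokens)            # the context starts as the prompt
    for _ in range(max_new_tokens):          # one model call per generated token
        next_token = predict_next_token(context)
        if next_token == stop_token:         # the model signals that it is finished
            break
        context.append(next_token)           # the new token becomes part of the context
    return context
```

Each pass through the loop costs roughly the same amount of computation, which is why every generated token gets an equal share of "thinking time".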
A foundation model, or large language model (LLM), simply continues the input text:
: What is the capital of France?
: What is the capital of Germany? What is the capital of Italy? ...
Figure courtesy of Andrej Karpathy
The model is then trained (fine-tuned) to have conversations: turn-taking, question answering, not being rude/sexist/racist.


What are LLMs good at?
Limitations

: What is the capital of Uzbekistan?
: Tashkent
It looks like the LLM knows the capital of Uzbekistan. However, an LLM is not a database that you can ask (retrieve) and tell (store) facts.

| Biases in LLMs | Source | Examples |
|---|---|---|
| Training data bias | Text from internet, books, articles. | Stereotypes reflecting gender, race, religion. |
| Representation bias | Underrepresented groups/perspectives in data. | Less accurate responses for minority cultures. |
| Algorithmic bias | Training and fine-tuning algorithms. | Optimizations for fluency and coherence may lead to preference for dominant cultural narratives. |
| User interaction bias | Adaptation based on user interactions. | Increased biased or harmful content generation. |

| Privacy Concerns | Issue | Examples |
|---|---|---|
| Data memorization | Memorizing sensitive information. | Reproducing phone numbers, addresses. |
| Training data leakage | Unauthorized dissemination of confidential data. | Summarizing proprietary documents. |
| User query logging | Storing sensitive user interactions. | Exposing private queries if data is mishandled. |
| Queries used for training | User queries may be used for further training. | Personal data in queries could be inadvertently included in training data. |
PROMPT : Write a haiku about a workshop on large language models.
ASSISTANT : Whispers of circuits,
Knowledge blooms in bytes and bits,
Model learns and fits.
OpenAI provides a set of strategies for using its models effectively. These include:
: You are an expert on learning techniques. Explain the concept of ‘flipped classroom’ in one paragraph.
: You are an expert in financial derivatives. Explain the concept of ‘flipped classroom’ in one paragraph.
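When the model is called programmatically rather than through the chat interface, a persona like the ones above is usually supplied as a system message. The sketch below uses the OpenAI Python client; the model name "gpt-4o" is an illustrative assumption rather than a recommendation from this page, and an API key is assumed to be configured.

```python
# Sketch: supplying a persona as a system message via the OpenAI Python client.
# Assumes the `openai` package (v1+) is installed and OPENAI_API_KEY is set.
# The model name "gpt-4o" is an illustrative choice, not prescribed by this page.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system",
         "content": "You are an expert on learning techniques."},
        {"role": "user",
         "content": "Explain the concept of 'flipped classroom' in one paragraph."},
    ],
)

print(response.choices[0].message.content)  # the assistant's one-paragraph answer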
Instead of this:
: The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1. Yes or no?
Do this:
: Is this statement correct? The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.
Reason through the problem step-by-step. Start by identifying the odd numbers. Next, add them up. Finally, determine if the sum is even or odd. Write down your reasoning steps in a numbered list.
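For reference, the step-by-step reasoning the improved prompt asks for works out as follows:
1. The odd numbers in the group are 9, 15 and 1.
2. Their sum is 9 + 15 + 1 = 25.
3. 25 is odd, so the statement is incorrect.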
Open this activity.
If you need further help with prompting techniques, see these websites:

| Task Category | Specific Tasks |
|---|---|
| Editing tasks | Create/improve different versions of sections. |
| Transitions | Write and compare transitions. |
| Improve drafts | Critique and refine drafts. |
| Writing styles | Rewrite sections for different audiences. |
| Controversial statements | Identify controversial points and strengthen arguments. |
| Research journal | Keep a diary and use an LLM for reflection. |

| Aspect | LZR Racer swimsuit | AI-based writing tools |
|---|---|---|
| Improvement | Reduced Resistance, Increased Buoyancy | Improved Grammar, Formulation, Content Creation |
| Fairness | Provided an Unfair Advantage, Led to Record Performances | Considered Unfair in Academic Contexts |
| Impact | Banned to Maintain Competitive Integrity | Raises Questions of Originality and Skill Development |

| Factors | Descriptions |
|---|---|
| High pressure | High stakes increase cheating. Fear of failure reinforces this. |
| Lack of intrinsic motivation | Engagement and relevance are important. Lacking these makes cheating more attractive. |
| Perceived injustice | Unfair grading leads to cheating. |
| Low fear of getting caught | Low risk encourages cheating. |
| Peer influence | Widespread cheating among peers pressures students to join in. |
| Low self-efficacy | Doubts about one’s own abilities make cheating seem like the only option. |

| Strategies | Descriptions |
|---|---|
| Foster intrinsic motivation | Spark genuine interest. Provide choices and practical applications. |
| Mastery learning | Clear learning objectives. Focus on mastery of content. Include constructive and corrective feedback in formative assessments. |
| Reduce pressure | Diversify assessment methods. Use portfolios and low-stress tests to reduce anxiety. |
| Strengthen self-efficacy | Provide constructive feedback and promote peer learning (peer tutoring, peer review). |
| Create a culture of integrity | Open discussion about academic integrity. Set clear guidelines and promote community ethics. |

| Types of Plagiarism | Description |
|---|---|
| Unattributed use | Using the work or ideas of others without proper attribution. |
| Minor changes or translations | Using the work of others with minor changes or translations without attribution. |
| Self-plagiarism | Reusing substantial parts of one’s own work without proper citation. |
| Joint works | Reusing jointly written publications without proper acknowledgment. |
E.g., APA Style: cite the LLM as software (not as personal communication).


AI-generated text can sometimes be detected by the use of specific vocabulary and phrases: “delve”, “vibrant”, “embark”, “it’s important to note”, “based on the data provided”.
Detection tools are not very useful, and can be easily circumvented.
According to Fleckenstein et al. (2024), even experienced teachers cannot reliably distinguish AI-generated texts from texts written by students.