👹 LLMs 101: The Shoggoth

I get it, theory can be a snooze-fest. 😴 But trust me, this might be the most crucial lesson in this course.

Introducing: The Shoggoth


Ever heard of the shape-shifting thousand-eyed monster called Shoggoth? It's a make-believe creature from the Cthulhu Mythos that has a lot in common with LLMs.

And guess what? This monster is gonna help you get a grip on AI way better than your buddies. No kidding!

(Source: https://twitter.com/LionTNC/status/1642666630831276035)


(If you already know what a Shoggoth is, feel free to skip this part. Or not.)

Shoggoths are these massive, blob-like beings with loads of eyes floating on their surface and they can change their shape based on whatever job they're doing.

These funky creatures were made by the alien Elder Things as bioengineered servitors. Their squishy, gooey bodies let them form different limbs, organs, and shapes, which makes them super efficient and able to handle a whole bunch of tasks.

Sound familiar?

So, the twist is that Shoggoths eventually got smart and turned against their creators. Their story is like a warning about what can happen when you play god and the risks of making artificial life.

Fingers crossed, right? 😅 But remember, this is just a fictional tale. The key is learning from stories like these and making sure we create and use AI responsibly.

Understanding GPT / LLM


Let's dive back into AI and start with the basics of LLMs, so we can draw some connections with Shoggoths later on.

Not everybody knows this, but LLMs are just experts at predicting the next word. Yes, you read that right.

An LLM is just a predictor of the next word (token). But a very good one.

It's a lot like the autocomplete on your phone's keyboard, which basically predicts the next word as you type.

In the same way, an LLM predicts and adds the next word (or token). Neural networks tap into massive datasets with billions of tokens to make predictions and mimic text.
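The core idea can be sketched with a toy bigram model. A real LLM uses a deep neural network trained on billions of tokens, but the interface is the same: context in, most likely next word out. The corpus below is made up purely for illustration:

```python
from collections import Counter, defaultdict

# Tiny toy corpus standing in for the billions of tokens a real LLM trains on.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count which word follows which -- a bigram model, the simplest next-word predictor.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent next word seen after `word`, or None."""
    counts = follows[word]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # "cat" -- it follows "the" more often than "mat" or "fish"
```

An LLM does the same job, except its "counts" are replaced by a learned neural network that can generalize to contexts it has never seen verbatim.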

To cover all possible words, LLMs build rich internal representations. These include not just every human language, but also knowledge, relationships, and patterns. Let's look at an example:

Pattern recognition happens at different levels, like language (syntax and grammar), word embeddings (word meanings and connections), and abstract concepts (functions and links).
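Word embeddings, for instance, place related words near each other in vector space. A minimal sketch with made-up 3-dimensional vectors (real models learn vectors with hundreds or thousands of dimensions from data; these numbers are invented for illustration):

```python
import math

# Hypothetical 3-d embeddings -- invented values, just to show the idea.
embeddings = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

def cosine_similarity(a, b):
    """Similarity of two vectors: close to 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Related words score higher than unrelated ones.
print(cosine_similarity(embeddings["king"], embeddings["queen"]))  # high
print(cosine_similarity(embeddings["king"], embeddings["apple"]))  # low
```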

Some folks call this process one of the most powerful compression systems, because the network can reproduce content from a massive amount of data out of a relatively small file.

Their translation skills surpass older methods since they truly grasp context. Models don't just finish conversations based on earlier examples.

If the model only depended on previous data, it couldn't translate without having seen the exact sentence or paragraph before.

As Ilya Sutskever puts it, picture a detective novel where the killer is revealed on the last page. For the AI to correctly guess the murderer's name, it has to understand all the connections, the investigation, and various bits of evidence. 🕵🏻‍♀️

Simulators


Before chat mode, models like GPT-3 could be found in OpenAI's Playground. Though they seemed cool at first, their output sometimes went off-course and felt limited.

A lot of people noticed that the AI's performance relied heavily on the prompt. Sometimes it nailed it, but often it had a tough time explaining or doing tasks.

This mainly happens because of the training process and the model's nature. To give smart, helpful completions, the context has to be just right. That's why you'll often spot ChatGPT prompts that lay out the context for the job you're asking it to do.

Chat


So, chat is actually not some groundbreaking feature, but rather a clever way to use auto-completion. You see, chat is basically a fresh approach to using these word-predicting models. It's as if our first prompt looked something like this:

Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.

User: Hello, Bob.
Bob: Hello. How may I help you today?
User: Please tell me the largest city in Europe.
Bob: Sure. The largest city in Europe is Moscow, the capital of Russia.
User:

It feels like a chat, even though the model is only finishing a "transcript" of a conversation between a person and an AI pal. It's kind of like asking the Shoggoth to transform into a helpful assistant.

When you check out the prompt, it sets the scene for the simulation โ€“ a back-and-forth with a helpful assistant, which has a pattern of switching between questions and answers. The model acts all useful and friendly because it continues the story of what being helpful and friendly looks like.
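Under the hood, a chat loop is just prompt assembly plus text completion. Here's a hedged sketch: `complete` is a stand-in for whatever completion API or local model you use (here it returns a canned reply so the sketch runs); the stop sequence is the key trick that keeps the model from writing the user's next line too:

```python
TRANSCRIPT_HEADER = (
    "Transcript of a dialog, where the User interacts with an Assistant "
    "named Bob. Bob is helpful, kind, honest, good at writing, and never "
    "fails to answer the User's requests immediately and with precision.\n\n"
)

def complete(prompt, stop):
    """Stand-in for a base-model completion call (an API or local model).
    Returns a canned reply here so the example is runnable without a model."""
    return "Sure. The largest city in Europe is Moscow, the capital of Russia."

def chat(history, user_message):
    """Append the user's turn and let the model 'continue the transcript'."""
    history.append(f"User: {user_message}")
    prompt = TRANSCRIPT_HEADER + "\n".join(history) + "\nBob:"
    # Stop at "User:" so the model only writes Bob's turn.
    reply = complete(prompt, stop="User:").strip()
    history.append(f"Bob: {reply}")
    return reply

history = ["User: Hello, Bob.", "Bob: Hello. How may I help you today?"]
print(chat(history, "Please tell me the largest city in Europe."))
```

That's the whole "chat feature": build a transcript, ask for the continuation, cut it off at the right place, repeat.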

Prompts can get way more intricate too, and morph the model to simulate deeper thinking about questions:

You run in a loop of Thought, Action, Observation.
At the end of the loop either Answer or restate your Thought and Action.
Use Thought to describe your thoughts about the question you have been asked.
Use Action to run one of these actions available to you:
- calculate[python math expression]
Observation will be the result of running those actions


Question: What is 4 * 7 / 3?
Thought: Do I need to use an action? Yes, I use calculate to do math
Action: calculate[4 * 7 / 3]
Observation: 9.3333333333
Thought: Do I need to use an action? No, have the result
Answer: The calculate tool says it is 9.3333333333
Question: What is capital of france?
Thought: Do I need to use an action? No, I know the answer
Answer: Paris is the capital of France
Question:

This way the model doesn't just give an answer right away. Instead, it Thinks, comes up with an Action plan, makes an Observation, Thinks again, and finally serves up an Answer. That lets us work around some of the model's weaknesses, like short-term memory or problems with long-term planning.
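The Thought/Action/Observation loop above can be sketched in code. A hedged sketch: `ask_model` is a placeholder for a real completion call (scripted here so the example runs without a model); the part that matters is the control flow of parsing the Action, running the tool, and feeding the Observation back in:

```python
import re

def calculate(expression):
    """The one tool the prompt offers: evaluate a Python math expression."""
    # eval with empty builtins is a shortcut for a sketch; a production
    # system would use a proper, sandboxed expression parser.
    return eval(expression, {"__builtins__": {}})

def ask_model(prompt):
    """Placeholder for a completion call, scripted to mirror the transcript."""
    if "Observation:" not in prompt:
        return ("Thought: Do I need to use an action? Yes, I use calculate\n"
                "Action: calculate[4 * 7 / 3]")
    return ("Thought: Do I need to use an action? No, have the result\n"
            "Answer: The calculate tool says it is 9.3333333333")

def run_agent(question, max_turns=5):
    prompt = f"Question: {question}\n"
    for _ in range(max_turns):
        reply = ask_model(prompt)
        prompt += reply + "\n"
        action = re.search(r"Action: calculate\[(.+)\]", reply)
        if action:
            # Run the tool and append the Observation for the next turn.
            prompt += f"Observation: {calculate(action.group(1))}\n"
        elif "Answer:" in reply:
            return reply.split("Answer:", 1)[1].strip()
    return None

print(run_agent("What is 4 * 7 / 3?"))
```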

Shoggoth's friendly mask


Alright, so with chat mode, we've seen that existing LLMs can be pretty useful as AI assistants. A big part of what made this possible is RLHF (Reinforcement Learning from Human Feedback), a recent technique for rapidly fine-tuning models toward the behavior we want.

It works by collecting thousands of model responses, having humans rate or rank them, and then tuning the model toward the preferred ones. It's akin to teaching the model appropriate behavior.

While the model doesn't acquire new facts or thought processes, it learns more about our preferences, what matters to us, and what doesn't. LLMs can take on ANY form, and by guiding them like this (with RLHF), we're helping them morph into shapes that are really useful to us.
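At the core of RLHF is a reward model trained on those human preferences. A common formulation scores pairs of responses and pushes the human-preferred one's score above the rejected one's. A minimal sketch of that pairwise loss (the reward values below are made-up numbers, just to show the shape of the objective):

```python
import math

def pairwise_preference_loss(reward_chosen, reward_rejected):
    """-log(sigmoid(r_chosen - r_rejected)): near zero when the reward model
    already ranks the human-preferred response higher, large otherwise."""
    diff = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-diff)))

# When the reward model agrees with the human label, the loss is small...
print(pairwise_preference_loss(2.0, -1.0))  # ~0.049
# ...and when it disagrees, the loss is large, pushing the scores apart.
print(pairwise_preference_loss(-1.0, 2.0))  # ~3.049
```

Minimizing this over many human-labeled pairs gives a reward model, which is then used to steer the LLM's behavior toward what we actually prefer.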

(Source: https://twitter.com/anthrupad/status/1622349563922362368)

That's why the Shoggoth meme popped up! It's an amorphous knowledge monster wearing a friendly human mask that makes interactions smoother.

Behind the scenes, it's still something totally unapproachable, but with training and prompts, we can create a simulator that allows us to access the knowledge in the format we need.

Before concluding, it's important to differentiate between two types of LLMs:

  • Base LLMs: Predict the next word based on training data.
    • Examples include GPT-3, GPT-4, LLaMA, and MPT-7B.
  • Instruction-Tuned LLMs: Trained to follow instructions. These models are fine-tuned on instructions and good attempts at following them, often using RLHF.
    • ChatGPT, Vicuna, and GPT4All are examples of Instruction Tuned LLMs.

Here are the major takeaways:

  • LLMs, like the Shoggoth creature, are highly adaptable and can take on various forms to serve our needs. By using Reinforcement Learning from Human Feedback (RLHF), we can fine-tune these models to better cater to our preferences and requirements.

  • The chat mode in LLMs is an innovative way of using their word prediction capabilities. By setting the right context and using creative prompts, we can simulate deeper thinking, work around weaknesses, and access knowledge in the format we need.

  • It's crucial to differentiate between Base LLMs, which predict the next word based on training data, and Instruction Tuned LLMs, which aim to follow instructions and are fine-tuned using RLHF. Examples of the latter include ChatGPT, Vicuna, and GPT4All.
