When talking about artificial intelligence, the ability to process and understand vast amounts of information at once is a key factor in determining an AI Agent's capabilities. There are two concepts we need to understand to appreciate the power of AI Agents today and in the future: tokens and the context window.
First, let's talk tokens! Here's OpenAI's definition:
> Large language models (sometimes referred to as LLMs or GPTs) process text using tokens, which are common sequences of characters found in a set of text. The models learn to understand the statistical relationships between these tokens and excel at producing the next token in a sequence of tokens.
OpenAI has a FREE 'Tokenizer' tool that converts text to tokens if you want to see how this works. Paste in some text of your own and you can watch exactly how it gets broken apart.
Remember, in the end, computers ONLY understand numbers, and they use the statistical (mathematical) relationships between those numbers to generate predictions (i.e., answers to your question).
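If you'd rather see this in code, here's a minimal sketch using OpenAI's open-source tiktoken library (the sample sentence and the choice of encoding are just for illustration):

```python
# A minimal sketch using OpenAI's open-source "tiktoken" library.
# Install it first with: pip install tiktoken
import tiktoken

# "cl100k_base" is the encoding used by several recent OpenAI models.
encoding = tiktoken.get_encoding("cl100k_base")

text = "AI Agents are changing how we work!"
token_ids = encoding.encode(text)   # text -> list of integer token IDs

print(token_ids)                    # the numbers the model actually sees
print(f"{len(token_ids)} tokens for {len(text)} characters")
print(encoding.decode(token_ids))   # and back to the original text
```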
Now, let's move on to a 'Context Window'.
A context window refers to the amount of text, measured in tokens, that an AI model can process at once. A larger context window allows the model to consider more information when making decisions or generating responses, which can significantly improve the quality and relevance of the AI Agent's output.
To better grasp this concept, imagine yourself standing in a room with only one window. Your view is limited by the size of that window. If you doubled the window's size, you'd be able to see twice as much, and as the window grows, so does your view. It's the same with LLMs: as the context window grows, so does the amount of information the model can 'see' and reason over at once.
For example, when you go into a chatbot like Google's Gemini and ask it a question, it will deliver an answer based on what you typed into the window. Then you'll ask it for further information or ask a new question, and it will respond. Now, if the 'context window' is big enough, it can 'see' the previous questions you've asked and the responses it gave. In other words, this 'context window' acts like a short-term memory buffer that Gemini uses to answer your questions based on previous questions and responses.
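To make that 'memory buffer' idea concrete, here's a simplified sketch. To be clear, this is NOT Gemini's actual implementation; the function names and the 100,000-token budget are my own illustrative assumptions. But chat products do something conceptually similar: they resend recent history with each request and drop whatever no longer fits the window.

```python
# Illustrative sketch only -- not how Gemini actually works internally.
# Idea: each new request re-sends as much prior conversation as fits in
# the context window, so the model can "see" earlier turns.

MAX_CONTEXT_TOKENS = 100_000  # hypothetical budget for this example

def estimate_tokens(text: str) -> int:
    # Rough rule of thumb: 1 token is about 0.75 words.
    return int(len(text.split()) / 0.75)

def build_prompt(history: list[str], new_question: str) -> str:
    budget = MAX_CONTEXT_TOKENS - estimate_tokens(new_question)
    kept: list[str] = []
    # Walk backwards from the most recent turn, keeping whatever fits.
    for turn in reversed(history):
        cost = estimate_tokens(turn)
        if cost > budget:
            break  # older turns fall outside the window and are "forgotten"
        kept.insert(0, turn)
        budget -= cost
    return "\n".join(kept + [new_question])

# Usage: history holds alternating user questions and model answers.
history = ["User: What is a token?", "Model: A token is ..."]
print(build_prompt(history, "User: And what is a context window?"))
```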
Recently, Google announced that its Large Language Model (LLM) Gemini has a context window of 2 million tokens (roughly 1,500,000 words). In other words, that's how much text it can hold in view at once!
What does that mean to you? That means you can:
- Process longer documents: It can handle lengthy articles, research papers, or even entire books.
- Maintain context over longer conversations: It can remember previous parts of a conversation, making interactions more natural and coherent.
- Perform more complex tasks: It can tackle tasks that require understanding and integrating information from multiple sources.
Here's a simple example of how powerful this is. I'm a fan of sales researchers Matt Dixon (co-author of The JOLT Effect and The Challenger Sale) and Brent Adamson (co-author of The Challenger Sale). Suppose I wanted to take these two books and write a 'hybrid' book titled, "JOLTing the Challenger". (Note: I'm not sure about the title! lol)
How would I do it? Let's say that the total word count for both books is 75,000. I would upload the text into Gemini and, using a series of prompts, ask Gemini to read and review both books and write a new 35,000-word book. If the right prompts were developed, it could be done! Total word count: 75,000 + 35,000 = 110,000 words, or roughly 147,000 tokens; well within Gemini's 2-million-token context window.
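Here's a quick back-of-the-envelope check in Python. Keep in mind the 0.75 words-per-token ratio is a common rule of thumb, not an exact figure, so treat the output as an estimate:

```python
# Rough sanity check: do both books plus the new draft fit in the window?
WORDS_PER_TOKEN = 0.75            # common rule of thumb, not exact
GEMINI_WINDOW_TOKENS = 2_000_000  # Gemini's announced context window

total_words = 75_000 + 35_000     # both source books + the 35,000-word draft
estimated_tokens = total_words / WORDS_PER_TOKEN

print(f"{total_words:,} words ~= {estimated_tokens:,.0f} tokens")
print(f"Window used: {estimated_tokens / GEMINI_WINDOW_TOKENS:.1%}")
# -> 110,000 words ~= 146,667 tokens, only about 7.3% of the window
```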
In summary, the larger an LLM's context window grows, the more content it can process. The more 'memory' an AI Agent has, the more effective it can be. Going back to my Air.ai example in the previous article, the AIR Agent will be able to handle complex questions and respond efficiently, making its conversations with humans sound more natural!
The question remains: can AI Agents realistically mimic human emotions and interactions to the point that a buyer or customer will embrace them?