Information retrieval is an important task, especially with the vast amount of content available today. You perform an information retrieval task every time you Google something or ask ChatGPT a question. The information you’re searching through could be a closed dataset of documents or the entire internet.
In this article, I’ll discuss agentic information finding: how information retrieval has changed with the release of LLMs, and in particular with the rise of AI agents, which are far more capable of finding information than anything we’ve seen before. I’ll first discuss RAG, since it is a foundational building block of agentic information finding. I’ll then discuss, at a high level, how AI agents can be used to find information.

Why do we need agentic information finding?
Information retrieval is a relatively old task. TF-IDF is one of the earliest algorithms used to find information in a large corpus of documents. It works by indexing your documents based on how frequently each word occurs within a specific document and how frequently it occurs across all documents.
If a user searches for a word that occurs frequently in a few documents but rarely across the corpus as a whole, those few documents are likely highly relevant.
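To make this concrete, here is a minimal sketch of TF-IDF retrieval using scikit-learn’s TfidfVectorizer; the toy corpus and query are made up for illustration:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus; in practice this would be your document collection
documents = [
    "How to cook a simple pasta dish",
    "Implementing binary search in Python",
    "Directions from the airport to the city center",
]

# Index the corpus: each document becomes a TF-IDF weighted vector
vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)

# Score each document against the query and pick the best match
query_vector = vectorizer.transform(["how do I cook pasta"])
scores = cosine_similarity(query_vector, doc_vectors)[0]
best_match = documents[scores.argmax()]
print(best_match)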
Information retrieval is such a critical task because, as humans, we’re so reliant on quickly finding information to solve different problems. These problems could be:
- How to cook a specific meal
- How to implement a certain algorithm
- How to get from location A->B
TF-IDF still works surprisingly well, though we’ve since developed even more powerful approaches to finding information. Retrieval-augmented generation (RAG) is one strong technique, relying on semantic similarity to find useful documents.
Agentic information finding combines techniques such as keyword search (TF-IDF, for example, though typically a modernized version of the algorithm such as BM25) and RAG to find relevant documents, search through them, and return results to the user.
Build your own RAG

Building your own RAG is surprisingly simple with all the technology and tools available today. There are numerous packages out there that help you implement RAG. They all, however, rely on the same, relatively basic underlying technology:
- Embed your document corpus (you also typically chunk up the documents)
- Store the embeddings in a vector database
- The user inputs a search query
- Embed the search query
- Find embedding similarity between the document corpus and the user query, and return the most similar documents
This can be implemented in just a few hours if you know what you’re doing. To embed your data and user queries, you can, for example, use:
- Managed services such as
  - OpenAI's text-embedding-3-large
  - Google's gemini-embedding-001
- Open-source options like
  - Alibaba's Qwen3-Embedding-8B
  - Linq-Embed-Mistral (built on a Mistral base model)
After you’ve embedded your documents, you can store them in a vector database.
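To show how the pieces fit together, here is a minimal in-memory sketch using OpenAI embeddings and cosine similarity. A real setup would replace the NumPy search with a vector database, and the corpus here is made up for illustration:

from openai import OpenAI
import numpy as np

client = OpenAI()

documents = [
    "Recipe for a simple tomato pasta",
    "Step-by-step guide to binary search",
    "Public transport directions across the city",
]

def embed(texts):
    # Embed a list of texts with an OpenAI embedding model
    response = client.embeddings.create(model="text-embedding-3-large", input=texts)
    return np.array([item.embedding for item in response.data])

# In practice, these vectors would be stored in a vector database
doc_embeddings = embed(documents)

def search(query, top_k=2):
    # Embed the query and rank documents by cosine similarity
    query_embedding = embed([query])[0]
    scores = doc_embeddings @ query_embedding / (
        np.linalg.norm(doc_embeddings, axis=1) * np.linalg.norm(query_embedding)
    )
    top_indices = np.argsort(scores)[::-1][:top_k]
    return [documents[i] for i in top_indices]

print(search("how do I cook pasta"))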
After that, you’re basically ready to perform RAG. In the next section, I’ll also cover fully managed RAG solutions, where you just upload a document, and all chunking, embedding, and searching is handled for you.
Managed RAG services
If you want a simpler approach, you can also use fully managed RAG solutions. Here are a few options:
- Ragie.ai
- Gemini File Search Tool
- OpenAI File Search tool
These services simplify the RAG process significantly. You can upload documents to any of them, and the service automatically handles the chunking, embedding, and inference for you. All you have to do is upload your raw documents and provide the search query you want to run. The service then returns the documents relevant to your queries, which you can feed into an LLM to answer user questions.
Even though managed RAG simplifies the process significantly, I would also like to highlight some downsides:
If you only have PDFs, you can upload them directly. However, some file types are currently not supported by the managed RAG services. Some of them do not support PNG/JPG files, for example, which complicates the process. One solution is to perform OCR on the image and upload the resulting text file (which is supported), but this of course complicates your application, which is exactly what you want to avoid when using managed RAG.
Another downside is that you have to upload your raw documents to the services. When doing this, you need to make sure you stay compliant, for example with GDPR regulations in the EU. This can be a challenge for some managed RAG services, though OpenAI, at least, supports EU data residency.
Below is an example of using OpenAI’s File Search tool, which is very simple to use.
First, you create a vector store and upload documents:
from openai import OpenAI

client = OpenAI()

# Create vector store
vector_store = client.vector_stores.create(
    name="",
)

# Upload file and add it to the vector store
client.vector_stores.files.upload_and_poll(
    vector_store_id=vector_store.id,
    file=open("filename.txt", "rb"),
)
After uploading and processing documents, you can query them with:
user_query = "What is the meaning of life?"

results = client.vector_stores.search(
    vector_store_id=vector_store.id,
    query=user_query,
)
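Assuming the standard vector store search response shape (the field names here are my assumption; check them against the current API reference), you can then read out the matched chunks roughly like this:

# Each result carries the source file, a relevance score, and the matched text chunks
for result in results.data:
    print(result.filename, result.score)
    for chunk in result.content:
        print(chunk.text)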
As you may notice, this code is a lot simpler than setting up embedding models and vector databases to build RAG yourself.
Information retrieval tools
Now that we have information retrieval tools readily available, we can start performing agentic information retrieval. I’ll start with the initial approach to using LLMs for information finding, before continuing with the better, more up-to-date approach.
Retrieval, then answering
The first approach is to start by retrieving relevant documents and feeding that information to an LLM before it answers the user’s question. This can be done by running both keyword search and RAG search, finding the top X relevant documents, and feeding those documents into an LLM.
First, find some documents with RAG:
user_query = "What is the meaning of life?"

results_rag = client.vector_stores.search(
    vector_store_id=vector_store.id,
    query=user_query,
)
Then, find some documents with a keyword search:
def keyword_search(query):
    # keyword search logic ...
    return results

results_keyword_search = keyword_search(user_query)
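As a sketch, keyword_search could be implemented with BM25 via the rank_bm25 package; the corpus below is made up for illustration, and in practice you would index your own documents:

from rank_bm25 import BM25Okapi

# Tokenize the corpus once and build the BM25 index
corpus = [
    "Recipe for a simple tomato pasta",
    "Step-by-step guide to binary search",
    "Public transport directions across the city",
]
tokenized_corpus = [doc.lower().split() for doc in corpus]
bm25 = BM25Okapi(tokenized_corpus)

def keyword_search(query, top_k=3):
    # Rank documents by BM25 score against the query terms
    return bm25.get_top_n(query.lower().split(), corpus, n=top_k)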
Then add these results together, remove duplicate documents, and feed the contents of these documents to an LLM for answering:
def llm_completion(prompt):
    # llm completion logic
    return response

# document_context is assumed to hold the combined, deduplicated text of the
# RAG and keyword search results
prompt = f"""
Given the following context: {document_context}
Answer the user query: {user_query}
"""

response = llm_completion(prompt)
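A minimal sketch of llm_completion, assuming the OpenAI Responses API (any chat-capable LLM would work here):

def llm_completion(prompt):
    # Send the prompt to the model and return the generated text
    response = client.responses.create(
        model="gpt-5",
        input=prompt,
    )
    return response.output_text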
In a lot of cases, this works super well and will provide high-quality responses. However, there is a better way to perform agentic information finding.
Information retrieval functions as a tool
The newest frontier LLMs are all trained with agentic behaviour in mind. This means they are very good at utilising tools to answer queries. You can provide an LLM with a list of tools, and it decides itself when to use them to answer user queries.
The better approach is thus to provide RAG and keyword search as tools to your LLMs. For GPT-5, you can, for example, do it like below:
# define a custom keyword search function, and provide GPT-5 with both
# keyword search and RAG (file search tool)
def keyword_search(keywords):
    # perform keyword search
    return results

user_input = "What is the meaning of life?"

tools = [
    {
        # Note: the Responses API expects function tools in this flat format,
        # not nested under a "function" key like the Chat Completions API
        "type": "function",
        "name": "keyword_search",
        "description": "Search for keywords and return relevant results",
        "parameters": {
            "type": "object",
            "properties": {
                "keywords": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "Keywords to search for"
                }
            },
            "required": ["keywords"]
        }
    },
    {
        "type": "file_search",
        "vector_store_ids": [""],
    }
]

response = client.responses.create(
    model="gpt-5",
    input=user_input,
    tools=tools,
)
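When the model decides to call keyword_search, your code has to execute the function and send the result back before the model can produce a final answer. A minimal sketch of that loop, assuming the Responses API’s function-calling format:

import json

# If the model requested the custom tool, run it and return the output
for item in response.output:
    if item.type == "function_call" and item.name == "keyword_search":
        args = json.loads(item.arguments)
        output = keyword_search(args["keywords"])
        response = client.responses.create(
            model="gpt-5",
            previous_response_id=response.id,
            input=[{
                "type": "function_call_output",
                "call_id": item.call_id,
                "output": json.dumps(output),
            }],
            tools=tools,
        )

print(response.output_text)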
This works much better because you’re not running a single round of information finding with RAG/keyword search and then answering the user’s question. It works well because:
- The agent can itself decide when to use the tools. Some queries, for example, don’t require vector search
- OpenAI automatically performs query rewriting, meaning it runs parallel RAG queries with different versions of the user query (which it writes itself, based on the original query)
- The agent can decide to run more RAG queries/keyword searches if it believes it doesn’t have enough information
The last point in the list above is the most important point for agentic information finding. Sometimes, you don’t find the information you’re looking for with the initial query. The agent (GPT-5) can determine that this is the case and choose to fire more RAG/keyword search queries if it thinks it’s needed. This often leads to much better results and makes the agent more likely to find the information you’re looking for.
Conclusion
In this article, I covered the basics of agentic information retrieval. I started by discussing why agentic information finding is so important, highlighting how dependent we are on quick access to information. I then covered the tools you can use for information retrieval: keyword search and RAG. I highlighted that you can run these tools statically before feeding the results to an LLM, but that the better approach is to give the tools to the LLM itself, making it an agent capable of finding information. I think agentic information finding will become more and more important, and understanding how to use AI agents will be a key skill for building powerful AI applications in the coming years.
👉 Find me on socials:
💻 My webinar on Vision Language Models
📩 Subscribe to my newsletter
🧑💻 Get in touch
🐦 X / Twitter
✍️ Medium