Adding LlamaParse
Complicated PDFs can be very tricky for LLMs to understand. To help with this, LlamaIndex provides LlamaParse, a hosted service that parses complex documents, including PDFs. To use it, get a LLAMA_CLOUD_API_KEY by signing up for LlamaCloud (it's free for up to 1000 pages/day) and add it to your .env file just as you did for your OpenAI key:
LLAMA_CLOUD_API_KEY=llx-XXXXXXXXXXXXXXXX
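LlamaParse reads this key from the environment at runtime, so it's worth confirming that it is actually loaded before you parse anything. Here is a minimal sketch, assuming you load .env with the dotenv package as in the earlier setup steps:
// Load variables from .env into process.env (assumes the dotenv package)
import "dotenv/config";

// Fail fast with a clear message if the LlamaCloud key is missing
if (!process.env.LLAMA_CLOUD_API_KEY) {
  throw new Error("LLAMA_CLOUD_API_KEY is not set; add it to your .env file");
}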
Then replace SimpleDirectoryReader with LlamaParseReader:
const reader = new LlamaParseReader({ resultType: "markdown" });
const documents = await reader.loadData("../data/sf_budget_2023_2024.pdf");
Now you will be able to ask more complicated questions of the same PDF and get better results. You can find this code in our repo.
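To see where LlamaParseReader slots into the rest of the pipeline, here is a sketch of the full flow: parse the PDF, build a VectorStoreIndex from the parsed documents, and run a query against it. The import path and the example question are assumptions, so adjust them to match your package version and data:
// End-to-end sketch; import path and sample question are assumptions
import "dotenv/config";
import { LlamaParseReader, VectorStoreIndex } from "llamaindex";

async function main() {
  // Parse the PDF with LlamaParse, returning the content as markdown documents
  const reader = new LlamaParseReader({ resultType: "markdown" });
  const documents = await reader.loadData("../data/sf_budget_2023_2024.pdf");

  // Embed the parsed documents and build an in-memory vector index
  const index = await VectorStoreIndex.fromDocuments(documents);

  // Ask a question that benefits from the structure LlamaParse recovered
  const queryEngine = index.asQueryEngine();
  const response = await queryEngine.query({
    query: "What is the police department's budget for fiscal year 2023-24?",
  });
  console.log(response.toString());
}

main().catch(console.error);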
Next up, let's use a vector store to persist our embedded data so we don't have to re-parse the PDF every time.