POMA | LLM Chunking Solution

solving chunking.

We believe in LLMs and RAG

POMA is a proprietary software solution (patent pending) that applies cutting-edge mathematics to fix the biggest issues currently facing AI tools and the large language models (LLMs) that power them.

Retrieval-augmented generation (RAG) was a major leap forward for LLMs. Its chunking method made it possible for an LLM to incorporate information from databases outside its training data, which greatly increases its capabilities.

But until now, this power-up has come with significant costs: the additional computing power required consumes more time, money, electricity, and water. And since first-generation chunking strategies each had major blind spots, the accuracy of even the most advanced AI models has suffered.

POMA fixes both of these problems. We’ve created a new RAG method, powered by what we call “chunksets,” which are combined to create “cheatsheets,” that preserves the full root-to-leaf hierarchy of each document in the database. In short, POMA enables LLMs to retrieve information with the same contextual awareness and subtle complexity as the human brain, just at a vastly more powerful scale. That has massive potential for application in the fields of law, healthcare, finance, government, and more.

We take in what matters
…and then serve it back in perfect bite sizes.

Text

Most of the information to be meaningfully digested by LLMs is in (or has to be converted to!) good old text. Be it lengthy legal documents or regulations, product manuals or documentation, reports, or conversations such as emails or chats: it's all text.

And most of it has a structure in which normal chunking approaches lose all sense of where a passage sits within the text, so the LLM has to be overfilled with context, needlessly wasting capital and energy.
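To make the idea of keeping a passage's place in the document concrete, here is a minimal Python sketch. It is not POMA's proprietary method, only an illustration of what a root-to-leaf hierarchy can look like in practice. The function name `root_to_leaf_chunks`, the `(depth, text)` input format, and the sample contract are all assumptions made for this example.

```python
from typing import List, Tuple


def root_to_leaf_chunks(lines: List[Tuple[int, str]]) -> List[List[str]]:
    """Return one chunk per leaf line, each carrying all of its ancestor lines.

    `lines` is a hypothetical pre-parsed document: (nesting depth, text) pairs
    in reading order, where depth 0 is the document title.
    """
    chunks: List[List[str]] = []
    stack: List[Tuple[int, str]] = []  # current ancestors, shallowest first
    for i, (depth, text) in enumerate(lines):
        # Drop ancestors that are not shallower than the current line.
        while stack and stack[-1][0] >= depth:
            stack.pop()
        stack.append((depth, text))
        # A line is a leaf if nothing nested deeper follows it directly.
        is_leaf = i + 1 == len(lines) or lines[i + 1][0] <= depth
        if is_leaf:
            chunks.append([t for _, t in stack])
    return chunks


# Made-up sample document for illustration only.
doc = [
    (0, "Service Agreement"),
    (1, "1. Payment"),
    (2, "1.1 Invoices are due within 30 days."),
    (2, "1.2 Late payments accrue interest."),
    (1, "2. Termination"),
    (2, "2.1 Either party may terminate with 60 days' notice."),
]

chunks = root_to_leaf_chunks(doc)
for chunk in chunks:
    print(" > ".join(chunk))
```

Each printed line is a single clause together with the headings above it, so a retrieved clause still says which section and which document it belongs to, without the rest of the document being passed along.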

Tables

Luckily, a lot of structured data comes in table layouts that (can) show complex structures better than the equivalent plain text ever could, at least for human readers.

Unluckily, LLMs (literally) see that a bit differently and struggle with complex and/or long tables, losing exactly the valuable information added by the tabulation.

Our processing not only averts that but also crops each table to exactly the data needed for each request.
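As a rough illustration of what cropping a table to a request could mean, the Python sketch below keeps only the rows and columns relevant to one question and renders them with their header. It is an assumption-laden toy, not POMA's processing: `crop_table`, `to_text`, and the sample product table are hypothetical, and deciding which rows and columns a request needs is left to the caller.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def crop_table(rows: List[Row], columns: List[str],
               keep: Callable[[Row], bool]) -> List[Row]:
    """Keep only the rows accepted by `keep` and only the listed columns."""
    return [{c: row[c] for c in columns} for row in rows if keep(row)]


def to_text(rows: List[Row]) -> str:
    """Render rows as pipe-separated text, with the header line preserved."""
    if not rows:
        return ""
    header = list(rows[0])
    lines = [" | ".join(header)]
    lines += [" | ".join(row[c] for c in header) for row in rows]
    return "\n".join(lines)


# Made-up product table for illustration only.
table = [
    {"model": "A-100", "voltage": "12 V", "weight": "1.2 kg", "price": "49 EUR"},
    {"model": "B-200", "voltage": "24 V", "weight": "2.0 kg", "price": "89 EUR"},
    {"model": "C-300", "voltage": "48 V", "weight": "3.5 kg", "price": "149 EUR"},
]

# Request: "What voltage does the B-200 need?"
cropped = crop_table(table, ["model", "voltage"], lambda r: r["model"] == "B-200")
print(to_text(cropped))
```

The cropped table keeps the header next to the one row that matters, so the information added by the tabulation survives while the irrelevant cells never reach the LLM.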

Images

A picture is worth a thousand words, and yet the two are very imperfect surrogates for each other, in both directions.

We embrace this with our hybrid approach, making any visual content much more searchable for Retrieval and then augmenting the Generation with the best of both worlds.

Oh, and all the other data/file types?

Think long and hard, and you'll realize that anything else can be converted into sets of these three :)

Read our Medium article

Please get in touch with me (news, updates, usage, license):

Made with 🍏 by TIGON S.L.U., Carrer Bellavista 16, 1°-4°, AD200 Encamp (Andorra)
