Large language models and the geometry of embeddings
Large language models (LLMs) map words, sentences, and concepts into high-dimensional vector spaces called embedding spaces. These spaces are far from arbitrary: meaningful semantic and syntactic relationships tend to self-organize into geometric structure. A classic example is that analogies such as "king is to queen as man is to woman" often show up as roughly parallel vector offsets between the corresponding embeddings.
I study these structures using tools from computational topology, notably persistent homology. The goal is to better understand what an LLM learns, how reliably it represents data, and how its internal geometry evolves over the course of training. A clearer picture of these geometric structures can, in turn, improve the explainability and robustness of LLMs.
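To make the idea concrete, here is a minimal sketch of 0-dimensional persistent homology (connected components) applied to toy "embedding" vectors. It relies on a standard fact: in a Vietoris-Rips filtration, the death times of H0 classes are exactly the edge lengths of a minimum spanning tree of the point cloud, so Kruskal's algorithm suffices. The data, thresholds, and function name are illustrative assumptions, not part of any specific analysis pipeline; in practice one would use a library such as Ripser or GUDHI and higher homology dimensions as well.

```python
import numpy as np

def h0_persistence(points):
    """H0 death times of a Vietoris-Rips filtration on a point cloud.

    Each point is born at filtration value 0; two components merge when
    the growing balls first touch, which happens exactly at the minimum
    spanning tree edge lengths. Returns the finite death times (one
    component persists forever and is omitted).
    """
    n = len(points)
    # All pairwise Euclidean distances, sorted ascending (the filtration order).
    edges = sorted(
        (float(np.linalg.norm(points[i] - points[j])), i, j)
        for i in range(n) for j in range(i + 1, n)
    )
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj   # two components merge: one H0 class dies at d
            deaths.append(d)
    return deaths

# Toy "embeddings": two tight clusters far apart in 3 dimensions.
rng = np.random.default_rng(0)
cluster_a = rng.normal(0.0, 0.05, size=(5, 3))
cluster_b = rng.normal(5.0, 0.05, size=(5, 3))
deaths = h0_persistence(np.vstack([cluster_a, cluster_b]))

# One bar is much longer than the rest: the diagram "sees" the two clusters.
print(max(deaths) > 10 * sorted(deaths)[-2])
```

The single long-lived H0 bar corresponds to the merge of the two clusters; short bars are noise within each cluster. This separation of long from short bars is the basic signal persistent homology extracts from embedding geometry.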