New top story on Hacker News: Show HN: Chonkie – A Fast, Lightweight Text Chunking Library for RAG

24 points by bhavnicksm | 7 comments on Hacker News.
I built Chonkie because I was tired of rewriting chunking code for RAG applications. Existing libraries were either too bloated (80MB+) or too basic, with no middle ground.

Core features:
- 21MB default install vs 80-171MB alternatives
- 33x faster token chunking than popular alternatives
- Supports multiple chunking strategies: token, word, sentence, and semantic
- Works with all major tokenizers (transformers, tokenizers, tiktoken)
- Zero external dependencies for basic functionality

Technical optimizations:
- Uses tiktoken with multi-threading for faster tokenization
- Implements aggressive caching and precomputation
- Running mean pooling for efficient semantic chunking
- Modular dependency system (install only what you need)

Benchmarks and code: https://ift.tt/cnyHBiO

Looking for feedback on the architecture and performance optimizations. What other chunking strategies would be useful for RAG applications?
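For readers new to the idea, here is a rough sketch of what fixed-size token chunking with overlap looks like in general. It is not Chonkie's implementation or API: the function name `token_chunks`, its parameters, and the whitespace split standing in for a real tokenizer are all assumptions made for illustration.

```python
from typing import List


def token_chunks(tokens: List[str], chunk_size: int = 8, overlap: int = 2) -> List[List[str]]:
    """Split a token list into fixed-size windows that share `overlap` tokens.

    Hypothetical helper for illustration only; not part of Chonkie.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end, so no trailing
        # chunk is emitted that only repeats overlap tokens.
        if start + chunk_size >= len(tokens):
            break
    return chunks


text = "Chunking splits a long document into pieces small enough to embed and retrieve"
tokens = text.split()  # assumption: whitespace split stands in for a real tokenizer
for chunk in token_chunks(tokens, chunk_size=5, overlap=2):
    print(" ".join(chunk))
```

In a real pipeline the token list would come from a tokenizer such as tiktoken, and the overlap is what lets a retrieved chunk keep some context from its neighbor.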

November 10, 2024 at 10:58PM bhavnicksm https://ift.tt/pmNX7o4
