New top story on Hacker News: Show HN: Chonkie – A Fast, Lightweight Text Chunking Library for RAG

November 10, 2024

Show HN: Chonkie – A Fast, Lightweight Text Chunking Library for RAG
24 by bhavnicksm | 7 comments on Hacker News.
I built Chonkie because I was tired of rewriting chunking code for RAG applications. Existing libraries were either too bloated (80MB+) or too basic, with no middle ground.

Core features:
- 21MB default install vs 80-171MB alternatives
- 33x faster token chunking than popular alternatives
- Supports multiple chunking strategies: token, word, sentence, and semantic
- Works with all major tokenizers (transformers, tokenizers, tiktoken)
- Zero external dependencies for basic functionality

Technical optimizations:
- Uses tiktoken with multi-threading for faster tokenization
- Implements aggressive caching and precomputation
- Running mean pooling for efficient semantic chunking
- Modular dependency system (install only what you need)

Benchmarks and code: https://ift.tt/cnyHBiO

Looking for feedback on the architecture and performance optimizations. What other chunking strategies would be useful for RAG applications?
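For readers new to the idea, token chunking (the first strategy named in the post) can be sketched in a few lines of plain Python. This is an illustration only and not Chonkie's API or implementation: the function name `chunk_tokens`, its parameters, and the whitespace tokenizer are all assumptions made for the example. A real pipeline would likely pass in a tokenizer such as one built on tiktoken.

```python
from typing import Callable, List


def chunk_tokens(
    text: str,
    chunk_size: int = 8,
    overlap: int = 2,
    tokenize: Callable[[str], List[str]] = str.split,
) -> List[str]:
    """Split text into chunks of at most `chunk_size` tokens.

    Hypothetical helper for illustration; not part of Chonkie.
    Consecutive chunks share `overlap` tokens so that context is
    not lost at chunk boundaries.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    # Whitespace splitting stands in for a real tokenizer here.
    tokens = tokenize(text)
    step = chunk_size - overlap  # how far the window advances each time

    chunks: List[str] = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        chunks.append(" ".join(window))
        # Stop once the window has reached the end of the token list,
        # so no trailing chunk made up only of overlap is emitted.
        if start + chunk_size >= len(tokens):
            break
    return chunks
```

Calling `chunk_tokens("a b c d e f g h i j", chunk_size=4, overlap=1)` yields three chunks, each sharing one token with its neighbour. The overlap is a common choice in RAG pipelines because a sentence cut at a boundary still appears whole in at least one chunk, at the cost of some duplicated tokens in the index.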

Posted November 10, 2024 at 10:58PM by bhavnicksm (24 points, 7 comments): https://ift.tt/pmNX7o4
