New top story on Hacker News: Is KDB a sane choice for a datalake in 2024?

Is KDB a sane choice for a datalake in 2024?
9 by sonthonax | 7 comments on Hacker News.
Pardon the vague question, but KDB is very much institutional knowledge hidden from the outside world. People have built their livelihoods around it and use it as a hammer for all sorts of nails. It's also extremely expensive and written in a language with origins so obtuse that it's progenitor APL needed a custom keyboard laden with mathematical symbols. Within my firm, it's very hard to get an outside perspective, the KDB developers are true believers in KDB, but they they obviously don't want to be professionally replaced. So I'm asking the more forward leaning HN. One nail in my job, is KDB as a data-lake and I'm being driven nuts by it. I write code in Rust that prices options. There's a lot of complex code involved in this, I use a mix of numeric simulations to calculate greeks and somewhat lengthy analytical formulas. The data that I save to KDB is quite raw, I save the market data and derived volatility surfaces, which are themselves complex-ish models needing some carefully unit-tested code to convert in to implied vols. Right now my desk has no proper tooling for backtesting that uses our own data. And I'm constantly being asked to do something about it, and I don't know what to do! I'm 99% sure KDB is the wrong tool for the job, because of three things: - It's not horizontally scalable. A divide and conquer algo on N<{small_number} cores is pointless. - I'm scared to do queries that return a lot of data. It's non trivial to get a day's worth of data. The query will just often freeze, it doesn't even buffer. Even if I'm just trying to fetch what should be a logical partition, the wire format is really inefficient and uncompressed. I feel like I need to engineering work for trivial things. - The main thing is that I need to do complex math to convert my raw data, order-books and vol-surfaces into useful data to backtest. I have no idea how do do any of this in KDB. My firm is primarily a spot desk, and while I respect my colleagues, their answer is: > Other firms are really invested in KDB and use KDB for this, just figure it out. I'm going nuts because I'm under the assumption that these other firms are way larger and have teams of KDB-quants doing the actual research. While we have some quant traders who know a bit of KDB but they work in the spot side with far more simple math. I keep on advocating for some Parquet style data-store with Spark/Dask/Arrow/Polars running on top of it that can be horizontally scaled and most importantly, with Polars, I can write my backtests in Rust and leverage the libraries I've already written. I get shot down with "we use KDB here". I just don't know how I can deliver a maintainable solution to my traders with this current infrastructure. Bizarrely, and this is a financial firm, no one in a team of ~100 devs has ever touched Spark style tech other than me here. What should I do? Are my concerns overblown? Am I misunderstanding the power of KDB?

June 10, 2024 at 12:00AM sonthonax 9 https://ift.tt/hAPNE40 Is KDB a sane choice for a datalake in 2024? 7 Pardon the vague question, but KDB is very much institutional knowledge hidden from the outside world. People have built their livelihoods around it and use it as a hammer for all sorts of nails. It's also extremely expensive and written in a language with origins so obtuse that it's progenitor APL needed a custom keyboard laden with mathematical symbols. Within my firm, it's very hard to get an outside perspective, the KDB developers are true believers in KDB, but they they obviously don't want to be professionally replaced. So I'm asking the more forward leaning HN. One nail in my job, is KDB as a data-lake and I'm being driven nuts by it. I write code in Rust that prices options. There's a lot of complex code involved in this, I use a mix of numeric simulations to calculate greeks and somewhat lengthy analytical formulas. The data that I save to KDB is quite raw, I save the market data and derived volatility surfaces, which are themselves complex-ish models needing some carefully unit-tested code to convert in to implied vols. Right now my desk has no proper tooling for backtesting that uses our own data. And I'm constantly being asked to do something about it, and I don't know what to do! I'm 99% sure KDB is the wrong tool for the job, because of three things: - It's not horizontally scalable. A divide and conquer algo on N<{small_number} cores is pointless. - I'm scared to do queries that return a lot of data. It's non trivial to get a day's worth of data. The query will just often freeze, it doesn't even buffer. Even if I'm just trying to fetch what should be a logical partition, the wire format is really inefficient and uncompressed. I feel like I need to engineering work for trivial things. - The main thing is that I need to do complex math to convert my raw data, order-books and vol-surfaces into useful data to backtest. I have no idea how do do any of this in KDB. My firm is primarily a spot desk, and while I respect my colleagues, their answer is: > Other firms are really invested in KDB and use KDB for this, just figure it out. I'm going nuts because I'm under the assumption that these other firms are way larger and have teams of KDB-quants doing the actual research. While we have some quant traders who know a bit of KDB but they work in the spot side with far more simple math. I keep on advocating for some Parquet style data-store with Spark/Dask/Arrow/Polars running on top of it that can be horizontally scaled and most importantly, with Polars, I can write my backtests in Rust and leverage the libraries I've already written. I get shot down with "we use KDB here". I just don't know how I can deliver a maintainable solution to my traders with this current infrastructure. Bizarrely, and this is a financial firm, no one in a team of ~100 devs has ever touched Spark style tech other than me here. What should I do? Are my concerns overblown? Am I misunderstanding the power of KDB?

Nhận xét

Bài đăng phổ biến từ blog này