Wishlist #17

Open
10 tasks
mattsta opened this issue Mar 23, 2024 · 28 comments
Labels
de-crapify Correct crap decisions made in the past

Comments

@mattsta

mattsta commented Mar 23, 2024

It would be nice to finally de-crapify and de-egoify the project fully. Just make a new primary version and go backwards incompatible.

I'm fairly certain the blanket "license change" is also illegal since they didn't get permission from every copyright holder to rip their code apart from the license (also why linux can never change from GPL2), but if they don't care, then we don't have to care about un-licensing all their code either. Evil goes both ways I guess (that article has some timeline inaccuracies and rationale inaccuracies and leaves out multiple other key players, but it's mostly correct in how redis was "stolen" by motivated exploiters over the years).

Wishlist

  • finally remove all master/slave terminology #36
  • remove sentinel and cluster (they will never work reliably because they are fundamentally flawed in their design. notice how nothing else in the world uses them or their design patterns? at least clickhouse liked zookeeper so they rewrote it in C++ for their own usage)
    • overall it's more stable going with separation-of-concerns and using consistent-hash-replication sharding proxies in front of these things anyway, though maybe some CRDTs will need to get involved
  • port all tcl scripts to something better (it's 2024 not 1994)
    • new code robots can greatly help with this I imagine
  • use clang-format for all the code (it's 2024 not 1994)
    • clang-format -style="{BasedOnStyle: llvm, IndentWidth: 4, AllowShortFunctionsOnASingleLine: None, KeepEmptyLinesAtTheStartOfBlocks: false, InsertBraces: true, InsertNewlineAtEOF: true}"
  • stop supporting weird platforms because it's 2024 and nobody cares about big endian or 32 bit systems anymore (it causes unnecessary maintenance overhead for refactoring and building new features)
  • stop using CRC everywhere. again, it's 2024 not 1994. Nobody should be using CRC anymore because it's designed for 1970s serial interfaces. We have much better systems now.
  • fix the hand written, hand parsed, self-rewriting config file system abomination (nothing else in the world does this, and it caused a wormable exploit in the past so should never be trusted. it only exists because "lol its a cool trick!!!" unprofessional insecure programming)
  • stop using hand written Makefiles and use CMake instead (again, it's 2024 not 1994)
    • I actually spent a few hundred hours making jemalloc compatible with CMake in 2017-2019 then when I asked FB if they could pay for my contributions they said "uh, we can send you a t-shirt?" then they never even sent the t-shirt (a month later, FB paid a $5 billion government fine for their typical FB-style business practices, but they sure don't have any money to pay developers... go figure?)
  • convert the info/status output to regular readable JSON instead of hand-written output formats requiring hand-written parsers to read
  • continue to refactor and improve a lot of the data structures from their unrefined CS101 origins

there's probably a couple dozen other things too, but that's my long-term gripe list. I'm really surprised most people haven't been motivated to see the problems and fix them yet for long-term project reliability and continuity.

Conclusion

I fixed all these problems (and more!) in my own rewrite a couple years ago (just imagine you put luajit, sqlite, erlang, redis, and memcached in a blender then a new modern high-performance multi-core secure in-memory cache system popped out), but I tried to sell it instead of giving away thousands of hours of work across 5+ years for free, so it never got any traction: https://carrierdb.cloud/

Provenance

i used to do stuff here but the project preferred to remain computationally and culturally conservative instead of embracing the future (except for selling out for profit at the expense of community and collaboration and user experience, of course) :(

Work I created which the project sometimes fought against but eventually included anyway:

  • entire module system was my own design and implementation, but the project fought against it until later they decided it could be used for profit
  • geo commands (though they aren't complete and should really use h3 instead of geohash, but nobody seems to understand enough to maintain it over the past 10 years)
  • JSON experiments (not sure what happened with those, but my redis replacement server can switch between output formats of redis, memcached, JSON, and even python (only format with actual set syntax in the output), and it also supports unlimited nesting of any data type each with their own nested expiration values too)
  • massive memory improvements to the lists structures (further efficiency improvements are in my redis replacement server too along with improving all the other data structures as well) — this was inspired by the custom changes to redis Twitter was running internally over thousands of servers totaling terabytes of RAM (which was a lot in 2015), but the twitter devs refused to release their improvements because "we can't get legal to clear it," so I re-built their ideas in an even better version from scratch (since then, millions of servers have benefited from these efficiency improvements for free, so thanks to me i guess)
  • crc speed improvements which took a couple hundred hours of on-and-off experiments over a year to finally get the logic working because redis uses a random weird non-standard CRC64 implementation for some reason so no other libraries are compatible with it unless you do extensive custom workarounds all over the place.
placeholderkv$ rg "Matt Stancliff"    # on branch unstable
src/quicklist.c
3: * Copyright (c) 2014, Matt Stancliff <matt@genges.com>

src/geo.c
2: * Copyright (c) 2014, Matt Stancliff <matt@genges.com>.

src/geohash_helper.c
3: * Copyright (c) 2014, Matt Stancliff <matt@genges.com>.

src/crcspeed.h
1:/* Copyright (c) 2014, Matt Stancliff <matt@genges.com>

src/quicklist.h
3: * Copyright (c) 2014, Matt Stancliff <matt@genges.com>

src/geohash_helper.h
3: * Copyright (c) 2014, Matt Stancliff <matt@genges.com>.

src/geohash.h
3: * Copyright (c) 2014, Matt Stancliff <matt@genges.com>.

src/zmalloc.c
865: * 3) Was modified for Redis by Matt Stancliff.

src/crcspeed.c
4: * Modifications by Matt Stancliff <matt@genges.com>:

src/geohash.c
3: * Copyright (c) 2014, Matt Stancliff <matt@genges.com>.

src/crc64.c
1:/* Copyright (c) 2014, Matt Stancliff <matt@genges.com>

deps/hiredis/hiredis.c
4: * Copyright (c) 2015, Matt Stancliff <matt at genges dot com>,

deps/hiredis/net.c
5: * Copyright (c) 2015, Matt Stancliff <matt at genges dot com>,

deps/hiredis/net.h
5: * Copyright (c) 2015, Matt Stancliff <matt at genges dot com>,

deps/hiredis/fuzzing/format_command_fuzzer.c
4: * Copyright (c) 2020, Matt Stancliff <matt at genges dot com>,

deps/hiredis/hiredis.h
4: * Copyright (c) 2015, Matt Stancliff <matt at genges dot com>,

deps/hiredis/CHANGELOG.md
511:* Fix tests when assert() undefined (Keith Bennett, Matt Stancliff)
@PingXie
Member

PingXie commented Mar 23, 2024

+1 on all, @mattsta!

These are all great points and suggestions! These changes would have little impact on the existing users but would go a long way to support future innovations. I would love to work with you to make these happen.

@madolson
Member

there's probably a couple dozen other things too, but that's my long-term gripe list. I'm really surprised most people haven't been motivated to see the problems and fix them yet for long-term project reliability and continuity.

Honestly, your comment of "culture conservatism" resonated strongly with me. I spent a bunch of time trying to move to python tests, and the resistance from the former Redis guys (Oran and Yossi) was insurmountable. I'm perhaps a bit more conservative in that I think we should be deeply fearful of breaking changes for Redis, given its place in the stack, but definitely believe we need to be moving faster.

@mattsta
Author

mattsta commented Mar 24, 2024

I think we should be deeply fearful of breaking changes for Redis, given its place in the stack, but definitely believe we need to be moving faster.

Exactly. There is a balance between continuity of existing systems while also not remaining stuck with 15 year old designs for the next 15+ years. I raised a lot of these standard project maintenance issues 10 years ago and they never got better, so now it looks even more archaic in a lot of places.

Honestly, your comment of "culture conservatism" resonated strongly with me. I spent a bunch of time trying to move to python tests, and the resistance from the former Redis guys (Oran and Yossi) was insurmountable.

I'm sorry I couldn't have prevented the current situation. I tried to advocate for both not giving the project away to a corrupt company and also for more consistent full time project management and architecture improvements (which was misunderstood as trying to "take over the project" then everything blew up), but I failed to generate the change I wanted to see in the world. Remember this one? ah, memories: http://antirez.com/news/87

But as for moving tests, it would obviously be great to move them to python. I bet with some careful work we could paste tcl files into Claude and ask for pytest formatted results. It would also be nice to stand up more concurrent testing if we can isolate tests to not step on each other.

@zuiderkwast
Contributor

Matt, your perspective and experience are very valuable. I'm glad you came to this fork. Like Madelyn, I think we need to be both radical and conservative.

We'll need to create issues for each of these points to discuss them one by one, and have some kind of categorization and decision making process.... We'll come to that.

@wenerme

wenerme commented Mar 25, 2024

If we are living in 2024, can we have some HTTP-based (ws, http1, http2, http3) protocol support built in?

@zuiderkwast
Contributor

Yeah, if we drop the idea of vendoring all dependencies (see #15), we can definitely have optional support for those. I'd like to have RESP over QUIC (multiplexed streams of commands over one connection) and optional compilation with liburing (io_uring).

@mattsta
Author

mattsta commented Mar 25, 2024

I still have drafts of a redis binary network protocol from 2014-2015 based on http/2 I was planning (including things like multiplexing using per-command client command ids matched to reply ids for non-head-of-line blocking concurrency, etc).
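To make the id-matching idea concrete, here is a minimal client-side sketch in C. It is not the 2014-2015 draft protocol, which is not shown in this thread; the names `mux_client`, `mux_send`, and `mux_on_reply`, and the fixed table size, are assumptions made up for illustration. Each command is tagged with a client-chosen id, and a reply carrying that id is matched back to its command no matter what order the replies arrive in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 8 /* arbitrary table size for this sketch */

/* One in-flight command (hypothetical layout, not a Redis/Valkey structure). */
typedef struct {
    uint32_t id;     /* client-chosen command id, echoed back in the reply */
    const char *cmd; /* command text, kept so the reply can be matched up */
    int in_use;
} pending_cmd;

typedef struct {
    pending_cmd slots[MAX_PENDING];
    uint32_t next_id;
} mux_client;

/* Record a command as in flight. Returns its id, or 0 if the table is full.
 * The sketch ignores id reuse after 2^32 commands. */
static uint32_t mux_send(mux_client *c, const char *cmd) {
    assert(c != NULL && cmd != NULL);
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (!c->slots[i].in_use) {
            c->next_id++;
            if (c->next_id == 0) c->next_id = 1; /* 0 is reserved for "full" */
            c->slots[i].id = c->next_id;
            c->slots[i].cmd = cmd;
            c->slots[i].in_use = 1;
            return c->next_id;
        }
    }
    return 0;
}

/* Match a reply id to its command and free the slot.
 * Returns the command text, or NULL if no such command is in flight. */
static const char *mux_on_reply(mux_client *c, uint32_t reply_id) {
    assert(c != NULL);
    if (reply_id == 0) return NULL;
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (c->slots[i].in_use && c->slots[i].id == reply_id) {
            c->slots[i].in_use = 0;
            return c->slots[i].cmd;
        }
    }
    return NULL;
}
```

With this bookkeeping, a slow command sent first does not hold up the reply to a fast command sent second: the client calls `mux_on_reply` with whichever id arrives first.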

I think it's interesting to note the difference between the network protocol and the data protocol though. Currently redis operates a hybrid network+data protocol, but we can easily split the network protocol into a more streamlined binary format with multiple addressable streams allowing different data output formats too.

For data output, it turns out of all formats possible, JSON is the most efficient format if your data isn't majority binary blobs.

What does JSON solve?

  • you don't need extra bytes for length prefixes on strings since all strings are contained within two quotes as the start/end string signal: "string"
  • you don't need to pre-specify the length or depth or type of all your data structures either since JSON data structures are also start/end delimited
    • though, JSON isn't an amazing format for streaming without workarounds like ndjson (and it doesn't have a set type or other extensible types without creating custom tagged union wrappers)
  • numbers written in ascii decimal are optimal because shorter numbers require fewer bytes, and shorter numbers are more common
  • implicit type lengths defined by {} and "" and [] and split by , are more efficient than requiring up-front lengths and type markers delimited everywhere like $5\r\nHELLO\r\n instead of just "hello" for user data
    • *2\r\n$5\r\nhello\r\n$5\r\nworld\r\n is just ["hello","world"] instead, etc
  • escaping JSON strings can be checked for bad values and converted quickly using SIMD operations on intel and ARM in 16 byte chunks (or potentially even larger)
  • PLUS, you don't have to write yet another conversion layer from some rando server protocol to JSON, you can just yeet your command results directly back at users (so your app servers just act as literal data proxies instead of structured transformation nodes if your data is scoped properly with no other intermediate transformation steps).
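The size comparison in the list above can be checked with a small C sketch. The function names `encode_resp` and `encode_json` are hypothetical, and the JSON encoder assumes the strings need no escaping (no quotes, backslashes, or control characters), so it is only a rough illustration for plain text payloads.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Encode C strings as a RESP array of bulk strings.
 * Returns bytes written (excluding the NUL), or 0 if buf is too small. */
static size_t encode_resp(char *buf, size_t cap, const char **items, size_t n) {
    assert(buf != NULL && items != NULL);
    int w = snprintf(buf, cap, "*%zu\r\n", n);
    if (w < 0 || (size_t)w >= cap) return 0;
    size_t len = (size_t)w;
    for (size_t i = 0; i < n; i++) {
        w = snprintf(buf + len, cap - len, "$%zu\r\n%s\r\n",
                     strlen(items[i]), items[i]);
        if (w < 0 || (size_t)w >= cap - len) return 0;
        len += (size_t)w;
    }
    return len;
}

/* Encode C strings as a JSON array of strings.
 * Assumption: items contain nothing that needs JSON escaping.
 * Returns bytes written (excluding the NUL), or 0 if buf is too small. */
static size_t encode_json(char *buf, size_t cap, const char **items, size_t n) {
    assert(buf != NULL && items != NULL);
    if (cap < 3) return 0; /* room for "[", "]" and the NUL */
    size_t len = 0;
    buf[len++] = '[';
    for (size_t i = 0; i < n; i++) {
        int w = snprintf(buf + len, cap - len, "%s\"%s\"",
                         i ? "," : "", items[i]);
        if (w < 0 || (size_t)w >= cap - len) return 0;
        len += (size_t)w;
    }
    if (len + 2 > cap) return 0; /* need space for ']' and the NUL */
    buf[len++] = ']';
    buf[len] = '\0';
    return len;
}
```

For the two items `hello` and `world`, the RESP encoding comes to 26 bytes and the JSON encoding to 17 bytes. The gap narrows or reverses once strings need escaping or carry binary data.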

@zuiderkwast
Contributor

zuiderkwast commented Mar 25, 2024

We can add opt-in JSON as HELLO json. It has everything that RESP2 has and parts of what RESP3 has (maps), but lacks some features: no distinction between arrays and sets (not a big deal) and no push messages (which can be done at the protocol level if we use JSON over HTTP/2, though).

We don't have a JSON dependency right now though. Can we split this out into a separate issue please?

[Edit] One huge disadvantage of JSON is that it can't store binary data. Strings must be valid Unicode. To store binary data in JSON strings, people use tricks like Base64.
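As a rough illustration of the Base64 workaround, here is a minimal encoder sketch in C using the standard alphabet with `=` padding. The function name `base64_encode` is an assumption for this example and is not an existing Valkey API. Every 3 input bytes become 4 output characters, so binary values grow by about a third before they can sit inside a JSON string.

```c
#include <assert.h>
#include <stddef.h>

static const char B64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encode n bytes from src into dst as NUL-terminated Base64 with padding.
 * dst must hold at least 4 * ((n + 2) / 3) + 1 bytes.
 * Returns the encoded length (excluding the NUL). */
static size_t base64_encode(const unsigned char *src, size_t n, char *dst) {
    assert(src != NULL && dst != NULL);
    size_t o = 0;
    for (size_t i = 0; i < n; i += 3) {
        size_t rem = n - i; /* bytes left, at least 1 here */
        unsigned long v = (unsigned long)src[i] << 16;
        if (rem > 1) v |= (unsigned long)src[i + 1] << 8;
        if (rem > 2) v |= (unsigned long)src[i + 2];
        dst[o++] = B64[(v >> 18) & 63];
        dst[o++] = B64[(v >> 12) & 63];
        dst[o++] = rem > 1 ? B64[(v >> 6) & 63] : '='; /* pad short groups */
        dst[o++] = rem > 2 ? B64[v & 63] : '=';
    }
    dst[o] = '\0';
    return o;
}
```

The three bytes `0x00 0xFF 0x80`, which are not valid UTF-8 on their own, encode to the JSON-safe text `AP+A`.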

@stockholmux
Contributor

finally remove all master/slave terminology

+1 this is the right time to start this.

@wenerme

wenerme commented Mar 26, 2024

Speaking from my personal experience, my preference for NATS over Redis stems from NATS's support for WebSocket-based transport. This choice does not imply that I will use NATS directly in a browser environment. However, it allows me to leverage the existing infrastructure while bypassing the constraints associated with port requirements.

@mattsta
Author

mattsta commented Mar 28, 2024

Just a note looking through more recent issues people are adding:

These are all literally things I brought up as major design flaws 10 years ago (with plans to fix!). I'm really surprised there's been no progress on so many of these basic architecture and design issues.

Even simple things like "the script isn't replicated everywhere so you get random failures" was happening 10 years ago y'all and nobody decided to improve the system in the meantime? I'm curious what's missing. Initiative? Permission? Ability? Project management prioritization? Lack of curiosity? Only following "profitable" improvements instead of overall stability? (another interpretation may be the project has intentionally preferred to remain less extensible and flexible to retain lock-in so people don't "grow the project" in "unapproved" directions where profit can't be captured)

Cluster topology is still causing client problems? AOF and RDB formats and differences are still causing problems and require a full version revision for all changes instead of having extensible metadata built-in? It's almost like the project has been afraid of addressing any of the original architecture and design inconsistencies?

It's worth remembering a lot of these architecture and design decisions weren't made by some expert committee having a combined 100 years of experience in distributed systems and distributed consistency protocols and flexible persistence and reliability and storage formats... it's mostly just "the ideas of some guy having fun building a personal database on a macbook air in 2010." Somehow, "because redis has always done it this way" became codified as a reason to never change or improve any of these core faults of the design? The core design of redis has always been treated as almost sacred and unquestionable and unchangeable even though high visibility deficiencies are scattered throughout. It's all just software and not some immutable laws of the universe.

wut.

@bitnom

bitnom commented Apr 1, 2024

Things I want:

  • JSON/REJSON
  • Graph
  • Search
  • Timeseries
  • Vectors
  • Websockets (And possibly gRPC, additionally but not instead of)
  • Clustering
  • A fully compatible (And clusterable) WASM build.

We should probably do these via polls in the Discussion section though.

@bitnom

bitnom commented Apr 1, 2024

I added some under: https://github.com/orgs/valkey-io/discussions

@madolson
Member

madolson commented Apr 2, 2024

@bitnom I would prefer to have issues instead of discussions; it's easier for us to keep all of the features there than trying to review them in two places.

zuiderkwast added the de-crapify label Apr 2, 2024
@vmorris

vmorris commented Apr 12, 2024

nobody cares about big endian

Speak for yourself, and maybe do some research or provide evidence before you make such a claim?

@mattsta
Author

mattsta commented Apr 12, 2024

Speak for yourself,

such is the default state of speaking

and maybe do some research or provide evidence before you make such a claim?

??? if you have more information feel free to share. details are always useful.

wikipedia contributes:

The IBM System/360 uses big-endian byte order, as do its successors System/370, ESA/390, and z/Architecture. The PDP-10 uses big-endian addressing for byte-oriented instructions. The IBM Series/1 minicomputer uses big-endian byte order. The Motorola 6800 / 6801, the 6809 and the 68000 series of processors use the big-endian format. Solely big-endian architectures include the IBM z/Architecture and OpenRISC.

None of those really matter in modern hosting environments, and if they matter to individual companies for unique low-demand use cases, well, why are you using anonymous free software for mission-critical services? If "big endian" is officially supported, it also means the entire CI cycle needs to duplicate itself for big endian VMs running every update as well. It's technically "supported" now, but never gets tested unless somebody complains. I seem to recall some parts didn't convert endianness in the save file properly for 10 years and nobody complained because nobody uses it:

This comment was added in 2018 after the file format had been broken for over 5 years:

/* This function loads a time from the RDB file. It gets the version of the
 * RDB because, unfortunately, before Redis 5 (RDB version 9), the function
 * failed to convert data to/from little endian, so RDB files with keys having
 * expires could not be shared between big endian and little endian systems
 * (because the expire time will be totally wrong). The fix for this is just
 * to call memrev64ifbe(), however if we fix this for all the RDB versions,
 * this call will introduce an incompatibility for big endian systems:
 * after upgrading to Redis version 5 they will no longer be able to load their
 * own old RDB files. Because of that, we instead fix the function only for new
 * RDB versions, and load older RDB versions as we used to do in the past,
 * allowing big endian systems to load their own old RDB files.
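The version-gated fix that the quoted comment describes can be sketched as follows. This is not the actual Redis loader; `load_expire_ms`, `host_is_big_endian`, and `reverse8` are hypothetical names, and the threshold of 9 is taken from the comment's "RDB version 9". Files at version 9 or later are treated as little endian and converted on big endian hosts, while older files are read back exactly as they were written.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 on a big endian host (runtime probe, kept simple for the sketch). */
static int host_is_big_endian(void) {
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 0;
}

/* Reverse 8 bytes in place. */
static void reverse8(unsigned char *p) {
    for (int i = 0; i < 4; i++) {
        unsigned char t = p[i];
        p[i] = p[7 - i];
        p[7 - i] = t;
    }
}

/* Hypothetical stand-in for the loader in the quoted comment.
 * raw holds the 8 bytes as read from the file.
 * Version >= 9: bytes are little endian and are converted to host order.
 * Older versions: bytes are interpreted in host order with no conversion,
 * so a host can still load the files it wrote itself. */
static int64_t load_expire_ms(const unsigned char raw[8], int rdb_version) {
    assert(raw != NULL);
    unsigned char tmp[8];
    memcpy(tmp, raw, 8);
    if (rdb_version >= 9 && host_is_big_endian()) {
        reverse8(tmp);
    }
    int64_t t;
    memcpy(&t, tmp, 8);
    return t;
}
```

The cost of this compatibility is visible in the code: every later change to the loader has to keep both branches working, and only a cross-architecture test would show whether it still does.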

It's also worth noting the new key value cabal is almost all from "hyperscaler" hosting providers on modern x64 or arm.

These days I care more about matching development effort to sustainable forward-looking developer experience.

low level C developers are literally dying out, so the way forward is to simplify and remove as many traps as possible. Every time I have to slow down and guard something with if (isBigEndian()) { thing = byteswap(thing) } it just feels like wasted effort (plus, we don't even know if it's correct anymore since no CI is running big endian VMs anyway and the only reason for maintaining "big endian" support is for hybrid architecture deployments, so the CI would actually need to run: little endian VM, big endian VM, save dump file from each, load into opposite architecture, replicate between architectures and confirm, cluster between architectures and confirm... fun matrix combinatorial complexity for the benefit of... ???).

4321 > 1234

@kadler

kadler commented Apr 12, 2024

Every time I have to slow down and guard something with if (isBigEndian()) { thing = byteswap(thing) } it just feels like wasted effort

Well yeah, that's not a sensible solution to the problem. Endianness pretty much only matters in serialization: to the network and/or to disk (and sometimes this doesn't even matter). The better solution is to pick an endianness for your data (little being most appropriate nowadays, despite what network byte order would dictate) then write functions or macros that read from or write to that endianness, byteswapping as appropriate. You do the endian checks in one place (preferably at compile time) and then just always use those functions and never have to think about endianness again.
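A minimal sketch of that approach in C, using hypothetical helper names `store_le64` and `load_le64` (they are not taken from the Redis/Valkey source). This variant goes one step further than a compile-time check: it builds the bytes with shifts, which work on values rather than memory layout, so the same code is correct on little endian and big endian hosts with no check at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write v to dst as 8 little endian bytes, whatever the host byte order. */
static void store_le64(unsigned char *dst, uint64_t v) {
    assert(dst != NULL);
    for (size_t i = 0; i < 8; i++) {
        dst[i] = (unsigned char)(v >> (8 * i)); /* byte 0 is least significant */
    }
}

/* Read 8 little endian bytes from src into a host-order integer. */
static uint64_t load_le64(const unsigned char *src) {
    assert(src != NULL);
    uint64_t v = 0;
    for (size_t i = 0; i < 8; i++) {
        v |= (uint64_t)src[i] << (8 * i);
    }
    return v;
}
```

Serialization code then calls only these two helpers for every 64-bit field, and the rest of the codebase never mentions endianness.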

@mattsta
Author

mattsta commented Apr 12, 2024

that's not a sensible solution to the problem.

true, but the code still has to exist somewhere (like the 5+ year bug example above... it was in the serialization code and it just didn't have the conversion check around it (combined with no integration tests verifying the goals of dual-architecture save/restore consistency)).

You do the endian checks in one place (preferably at compile time) and then just always use those functions and never have to think about endianness again.

technically true, but the design of redis isn't necessarily that coherent. it's mostly "a cruise ship built up from assorted scrap found on a beach over 15 years."

the problem isn't actual conversions, but rather the hand-evolved custom byte-by-byte writers and readers not respecting any formal definitions, so every refactor/improvement is a gamble as to whether we're breaking things. 🤷

@pleia2

pleia2 commented Apr 16, 2024

In case it helps as you evaluate this, my team at IBM runs an incredibly active, free, s390x virtual machine program for open source projects. If the Valkey project would like VMs, please reach out to us: https://community.ibm.com/zsystems/form/l1cc-oss-vm-request/

@iapicca

iapicca commented Apr 18, 2024

wasm and wasi support would be amazing

@madolson
Member

wasm and wasi support would be amazing

Say more? Are you interested in having Valkey be compiled into WASM or support for it in modules/scripting?

@iapicca

iapicca commented Apr 18, 2024

wasm and wasi support would be amazing

Say more? Are you interested in having Valkey be compiled into WASM or support for it in modules/scripting?

I have 2 use cases in mind

  • wasi support for fermyon (in both nomad and kubernetes)
  • wasm support for flutter web (and mobile and desktop)

@wenerme

wenerme commented Apr 19, 2024

If Valkey can be compiled into WASM with WebSocket support so that I can set up a simple replica, does that mean I have an offline-first KV DB in the browser? 🤔

@aruanruan

System commands/ops (such as migration/sync commands) could be isolated from user data access commands by using a different network channel or protocol. User access commands should stay atomic, but system commands may run for a long time or carry out a complex task made up of multiple commands.

@iapicca

iapicca commented Apr 23, 2024

I have 2 use cases in mind

  • wasi support for fermyon (in both nomad and kubernetes)
  • wasm support for flutter web (and mobile and desktop)

If Valkey can be compiled into WASM with WebSocket support so that I can set up a simple replica, does that mean I have an offline-first KV DB in the browser? 🤔

@wenerme
that's exactly the use case I have in mind for flutter, it's already doing that with sqlite (not kv of course)

@aruanruan

We could use YAML as the config file format.

@ikersuen

Wishing for the "cluster" type of architecture to stay, since some use cases only support the "cluster" type of Redis.

@zuiderkwast
Contributor

I should clarify: There are no plans to stop supporting cluster. It is very important for many users. We will keep supporting cluster and will improve it.

As an example, we just merged a large improvement to cluster consistency for scenarios such as a failover happening during slot migration: #21.
