Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for extensions like citext #250

Draft
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

jgaskins
Copy link
Contributor

This PR actually adds a few different things:

  • The concept of handling extensions
  • An API to register those extensions to prepare new connections for them
  • Connection-specific decoders, delegating to the global decoder map
  • citext as one of those extensions, including a decoder for it and a PG::CIText Crystal type

The CIText type could just return a string, but I thought it might be confusing to get a case-insensitive string out of the database only for it to do case-sensitive checking later. This type offers the case-insensitive equality checks in Crystal code and also allows you to get the raw string out.

After working on this, I discovered #89 which is quite similar to this PR, but is 5 years old and likely significantly out of date by this point. This is just a draft because it’s still unpolished, but I’m using a monkeypatched version in one of my apps (email addresses are stored as citext) and it seems to work.

This commit also adds support for registering extensions that prepare
connections for using thos extensions. In this case that preparation
involves registering a decoder on the connection, since extensions in
different Postgres databases will most likely have different oids.

struct CIText
def initialize(text : String)
@text = text.downcase
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This downcase might be leftover from an earlier version of this code and may not be necessary. If we don’t need to downcase, that’ll save us a heap allocation.

@straight-shoota
Copy link
Contributor

I haven't looked into the implementation of the extension mechanism and compared it to #89.

But I noticed some issues with text representation in CIText:

  • #hash should have the same behaviour as #==, i.e. two strings that compare equally must have the same hash value. The hash needs to reflect case-insensitivity.
  • A more serious issue: There is not just a single case insensitive behaviour, but different sets of folding rules for mapping upper and lower case characters to each other. In Postgres, these rules are represented as collations. The case-insensitive behaviour of citext values is defined by the collation. But the Crystal implementation doesn't reflect that. It only uses String#compare(other, case_insensitive: true) which has a fixed, generic unicode collation (there are limited options for other folding algorithms). As a result, comparing two CIText strings in Crystal could have a different result than comparing the same two strings in the database. That would be wildly confusing. The implementation should either have identical behaviour (which is the goal of using CIText instead of String in the first place), or be clear about that it isn't identical. It would be quite complex to implement this correctly, and I don't even see a good reason for that. So I'd rather keep it dumb and make that explicit. Just using String type should be fine. It should be clear that comparing that type is case sensitive.

@jgaskins
Copy link
Contributor Author

#hash should have the same behaviour as #==, i.e. two strings that compare equally must have the same hash value. The hash needs to reflect case-insensitivity.

Ah, right, that's why I had the downcase in there. I knew I had a good reason for it.

It would be quite complex to implement this correctly, and I don't even see a good reason for that. So I'd rather keep it dumb and make that explicit.

I considered the differences in collation, but I've never seen anyone specify a non-default collation in 13 years of using Postgres. I assume that anyone using them understands they introduce weird inconsistencies between DB and application code in addition to complicating queries. For everyone else, it should work as expected. If collations got more usage, I'd fully agree with you that not matching the behavior exactly would likely be too misleading.

I don't really have a strong opinion either way on the CIText type, to be clear, but I added it to show that I explored the option. I use it in one of my own apps for email addresses and it does okay there. I prefer it to calling downcase everywhere I need to compare that string with another. To be fair, I could also use a converter inside my app to do this, but it's convenient not to need to.

@jwoertink
Copy link
Contributor

Hey, just ran in to possibly needing this... I'm wanting to use a citext[], but currently it fails on decoding. Any chance we can look at reviving it @jgaskins ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants