Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CALCITE-6358] Support all PostgreSQL 14 date/time patterns #3773

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

normanj-bitquill
Copy link
Contributor

  • Splits the PostgreSQL toChar function off to its own function
  • Does not implement FX or TM prefix
  • Does not implement SP suffix
  • Timezone patterns are supported but all datetimes are in local timezone

@normanj-bitquill
Copy link
Contributor Author

Need to properly separate the MySQL/Oracle implementation of TO_CHAR from the PostgreSQL implementation.

@mihaibudiu
Copy link
Contributor

The amount of testing is very impressive. These won't be easy to review. I assume they were checked against postgres.
You should address the failures reported by the CI.

@normanj-bitquill
Copy link
Contributor Author

@mihaibudiu

The amount of testing is very impressive. These won't be easy to review. I assume they were checked against postgres. You should address the failures reported by the CI.

These were tested against PostgreSQL 14. The CI issues have been addressed.

Is there anything that I can be done to make reviewing this easier?

@mihaibudiu
Copy link
Contributor

The statement that they were run against Postgres helps a lot. Thanks!

VI
!ok

select to_char(timestamp '2022-06-03 13:15:48.678', 'rm');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we have a day of the week test for a date before the gregorian calendar?
I noticed in the past that these could be wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a test for this on line 316. It tests Jan 1 of year 1 to see if it is a Monday.

final Locale originalLocale = Locale.getDefault();
try {
Locale.setDefault(Locale.US);
assertEquals(" MONDAY", PostgresqlDateTimeFormatter.toChar("DAY", date1));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the rule about leading spaces?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From this table:
https://www.postgresql.org/docs/14/functions-formatting.html#FUNCTIONS-FORMATTING-DATETIME-TABLE

it says that days are blank padded to 9 characters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The padding should be at the end. I'll update.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

private interface FormatPattern {
/**
* Checks if the current position in the format string applies to this format
* element. Will produce a formatted string if matched that is the format element
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the second phrase difficult to parse,maybe you can rephrase it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried to reword this.

"a.m.", "p.m."),
new NumberFormatPattern(dt -> {
final String formattedYear = String.format(Locale.ROOT, "%0,4d", dt.getYear());
if (formattedYear.length() == 4 && formattedYear.charAt(0) == '0') {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this implies that years above 10000 wont' be correct. Maybe that's within spec.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

%0,4d means to print out at least 4 digits and formatted with commas. A value like 12345 will become 12,345.

There is the special case, where we get a 4 digit string back that has at least one leading 0. In this case, Java will not add the comma. For example 1 will become 0001. Need to insert the comma to match PostgreSQL behaviour.

return yearString.substring(yearString.length() - 1);
},
"I"),
new StringFormatPattern(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure Calcite can handle dates BC, but maybe this code should not concern itself with that point.
But you may not be able to test this codepath end-to-end.

dt -> {
final String monthName =
dt.getMonth().getDisplayName(TextStyle.FULL,
Locale.getDefault());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure about the Locale?
I thought that the spec requires the month name to be in English.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean. It should default to English and I should only translate when the TM prefix is provided. I'll make the changes.

https://www.postgresql.org/docs/14/functions-formatting.html#FUNCTIONS-FORMATTING-DATETIME-TABLE

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to make use of the TM flag.

dt -> {
final String monthName =
dt.getMonth().getDisplayName(TextStyle.SHORT,
Locale.ROOT);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar question

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my answer above. I'll make the changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to make use of the TM flag.

@mihaibudiu
Copy link
Contributor

I have approved, but also left some questions.

Copy link
Member

@caicancai caicancai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember that calcite currently has support for the postgres to_char function. Should we remove the previous support?

@mihaibudiu
Copy link
Contributor

Until https://issues.apache.org/jira/browse/CALCITE-6281 has a solution, the best way to test this may be to create a Quidem test file (.iq suffix). If that file is identical with the Postgres shell output, you're done. (I have no idea whether the quidem output is compatible with Postgres, though).

@normanj-bitquill
Copy link
Contributor Author

@caicancai

I remember that calcite currently has support for the postgres to_char function. Should we remove the previous support?

The existing TO_CHAR function is used by MySQL and Oracle, so it is left in here. This PR adds a new implementation that only PostgreSQL uses.

https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/fun/SqlLibraryOperators.java#L1654

@normanj-bitquill
Copy link
Contributor Author

@mihaibudiu

Until https://issues.apache.org/jira/browse/CALCITE-6281 has a solution, the best way to test this may be to create a Quidem test file (.iq suffix). If that file is identical with the Postgres shell output, you're done. (I have no idea whether the quidem output is compatible with Postgres, though).

Thanks, I'll take a look.

@normanj-bitquill
Copy link
Contributor Author

@mihaibudiu Here is the output of a PostgreSQL script running all of the TO_CHAR tests from the postgresql.iq file in this PR. I kept the expected output for Calcite in SQL comments to make comparison easier.

This is the SQL script.
to_char.txt

This is the output when running it on PostgreSQL 14.
to_char_results.txt

@mihaibudiu
Copy link
Contributor

This is pretty close, the existing file core/src/test/resources/sql/sub-query.iq already uses the postgres format.
What I would do is to make a small python script to convert the postgres output into a .iq file, which you can hopefully just add to the resources directory in java. Then, before you commit, make sure the tests in the file are being executed by introducing a typo. And please add a comment to the iq file stating the Postgres version that was used to generate it. I am not sure whether there is some place where the python script itself could be committed...

@normanj-bitquill
Copy link
Contributor Author

@mihaibudiu

This is pretty close, the existing file core/src/test/resources/sql/sub-query.iq already uses the postgres format. What I would do is to make a small python script to convert the postgres output into a .iq file, which you can hopefully just add to the resources directory in java. Then, before you commit, make sure the tests in the file are being executed by introducing a typo. And please add a comment to the iq file stating the Postgres version that was used to generate it. I am not sure whether there is some place where the python script itself could be committed...

Added the queries here:
https://github.com/Bit-Quill/calcite/blob/calcite-6358/core/src/test/resources/pg_to_char_queries.sql

Here is a script that is able to generate the *.iq file:
https://github.com/Bit-Quill/calcite/blob/calcite-6358/core/src/test/resources/to_char_generate_iq.py

Ran the :core:test target with the generated *.iq file. Verified that the file is run by adding an error. The generated file is executed successfully with no errors.

@mihaibudiu mihaibudiu added the LGTM-will-merge-soon Overall PR looks OK. Only minor things left. label May 1, 2024
@mihaibudiu
Copy link
Contributor

This looks very good, when they unfreeze the main branch we can merge this.
Regarding the script, it's good for this particular use case, but I am not sure whether it will generalize for arbitrary queries. But we can take this discussion to the other JIRA ticket (the preferred way to design solutions in Calcite is to discuss in JIRA). Some Calcite-isms leak into the iq file, like $EXPR0, so a true solution would probably have a different architecture.

Copy link
Member

@caicancai caicancai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@LibraryOperator(libraries = {POSTGRESQL})
public static final SqlFunction TO_CHAR_PG =
new SqlBasicFunction("TO_CHAR", SqlKind.OTHER_FUNCTION,
SqlSyntax.FUNCTION, true, ReturnTypes.VARCHAR, null,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not ReturnTypes.VARCHAR_NULLABLE here? The original Calcite TO_CHAR function result should also be nullable, by the way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, this has been updated.

@mihaibudiu
Copy link
Contributor

I don't understand why the CI fails.
Can you please rebase on main? Maybe it has something to do with the recent transition to 1.37 which had some build system changes.

@normanj-bitquill
Copy link
Contributor Author

I don't understand why the CI fails. Can you please rebase on main? Maybe it has something to do with the recent transition to 1.37 which had some build system changes.

@mihaibudiu I have rebased to the latest for main. CI still fails for "Windows (JDK 8)" and "Windows (JDK 17)".

* Splits the PostgreSQL toChar function off to its own function
* Does not implement SP suffix
* Timezone patterns are supported but all datetimes are in local timezone
@normanj-bitquill
Copy link
Contributor Author

I have no idea about the Windows build errors. It fails trying to do task :buildSrc:autostyleKotlinGradleCheck. It fails since it can't find the class org.jetbrains.kotlin.com.intellij.psi.tree.IElementType.

I don't see why that is missing due to the changes for this PR.

@normanj-bitquill
Copy link
Contributor Author

Locally with a fresh checkout I get:

Skipping task ':buildSrc:autostyleKotlinGradleCheck' as it is up-to-date.

For the Windows build, I get:

:buildSrc:autostyleKotlinGradleCheck (Thread[Execution worker,5,main]) started.

> Task :buildSrc:autostyleKotlinGradleCheck FAILED
Caching disabled for task ':buildSrc:autostyleKotlinGradleCheck' because:
  Caching has not been enabled for the task
Task ':buildSrc:autostyleKotlinGradleCheck' is not up-to-date because:
  No history is available.
The input changes require a full rebuild for incremental task ':buildSrc:autostyleKotlinGradleCheck'.

@normanj-bitquill
Copy link
Contributor Author

@mihaibudiu I tried creating a new PR here, that did build correctly.
#3792

If you are OK with it, we should just use that PR and close this one.

I think that there are a couple of things going on.

  • I had squashed commits, which could be interfering with how changes are detected in autostyle. Lesson learned, I won't squash commits.
  • Some Kotlin libraries are missing from the classpath for autostyle. Not an issue unless some changes are detected to a Kotlin file. The issue here may show up again if someone makes changes to build.gradle.kts

Copy link

sonarcloud bot commented May 15, 2024

@mihaibudiu
Copy link
Contributor

Anything that works correctly is fine for me. Please note that I am on vacation so I won't have time to review this soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
LGTM-will-merge-soon Overall PR looks OK. Only minor things left.
Projects
None yet
4 participants