Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

INNER/LEFT JOIN #1673

Closed
sanikolaev opened this issue Dec 12, 2023 · 12 comments
Closed

INNER/LEFT JOIN #1673

sanikolaev opened this issue Dec 12, 2023 · 12 comments
Assignees
Labels

Comments

@sanikolaev
Copy link
Collaborator

The task is to implement INNER/LEFT JOIN functionality with the following limitations:

  • 2 tables only
  • special column instead of full NULL support, e.g.:
    select * from purchases AS p left join articles AS a ON a.id = p.article_id:
    +------+------------+-------------+------+-------+-------------+
    | id   | article_id | customer_id | id   | title | @right_null |
    +------+------------+-------------+------+-------+-------------+
    |    1 |          1 |          10 |    1 | book  |           0 |
    |    2 |          1 |          11 |    1 | book  |           0 |
    |    3 |          3 |          10 |    0 |       |           1 |
    +------+------------+-------------+------+-------+-------------+
    
@sanikolaev
Copy link
Collaborator Author

Little issue with NULLs in 6.2.13 255ad5232@24030516

No NULL for the last column here:

MySQL [(none)]> DROP TABLE IF EXISTS a; DROP TABLE IF EXISTS b; CREATE TABLE a (f text); CREATE TABLE b ( a_id int, i int); INSERT INTO a VALUES (1, 'a'); select * from a left join b on b.a_id=a.id;
--------------
DROP TABLE IF EXISTS a
--------------

Query OK, 0 rows affected (0.026 sec)

--------------
DROP TABLE IF EXISTS b
--------------

Query OK, 0 rows affected (0.004 sec)

--------------
CREATE TABLE a (f text)
--------------

Query OK, 0 rows affected (0.001 sec)

--------------
CREATE TABLE b ( a_id int, i int)
--------------

Query OK, 0 rows affected (0.001 sec)

--------------
INSERT INTO a VALUES (1, 'a')
--------------

Query OK, 1 row affected (0.002 sec)

--------------
select * from a left join b on b.a_id=a.id
--------------

+------+------+------+--------+------+
| id   | f    | b.id | b.a_id | b.i  |
+------+------+------+--------+------+
|    1 | NULL | NULL |   NULL |    0 |
+------+------+------+--------+------+
1 row in set (0.000 sec)

If I replace the text field with another int it becomes ok:

MySQL [(none)]> DROP TABLE IF EXISTS a; DROP TABLE IF EXISTS b; CREATE TABLE a (i int); CREATE TABLE b ( a_id int, i int); INSERT INTO a VALUES (1, 123); select * from a left join b on b.a_id=a.id;
--------------
DROP TABLE IF EXISTS a
--------------

Query OK, 0 rows affected (0.026 sec)

--------------
DROP TABLE IF EXISTS b
--------------

Query OK, 0 rows affected (0.000 sec)

--------------
CREATE TABLE a (i int)
--------------

Query OK, 0 rows affected (0.001 sec)

--------------
CREATE TABLE b ( a_id int, i int)
--------------

Query OK, 0 rows affected (0.001 sec)

--------------
INSERT INTO a VALUES (1, 123)
--------------

Query OK, 1 row affected (0.000 sec)

--------------
select * from a left join b on b.a_id=a.id
--------------

+------+------+------+--------+------+
| id   | i    | b.id | b.a_id | b.i  |
+------+------+------+--------+------+
|    1 |  123 | NULL |   NULL | NULL |
+------+------+------+--------+------+
1 row in set (0.000 sec)

@sanikolaev
Copy link
Collaborator Author

sanikolaev commented Mar 6, 2024

The task is to implement INNER/LEFT JOIN functionality with the following limitations:
2 tables only
special column instead of full NULL support, e.g.:

Done in #1855 even with proper NULL support for LEFT JOIN.

What's left is to document it along with the following nuances:

  • You cannot perform SELECT left_table.field_name, right_table.field_name FROM .... Instead, you must use SELECT field_name, right_table.field_name FROM ....
  • You must use JOIN ON table_name.some_field = another_table_name.some_field; you cannot omit the table names.
  • You cannot do select *, (nums2.n + 3) * n from nums left join nums2 on nums2.id = nums.num2_id (i.e., when the operands are from both tables) without aliasing. Use select *, (nums2.n + 3) x, x * n from nums left join nums2 on nums2.id = nums.num2_id instead. Rel. issue Simplify expressions with operands from the joined table #1918
  • SELECT right_table.stored_field is not supported. Same with select * ... join ..., i.e. you won't get a stored field from the joined table in the result set. Rel. issue JOIN: stored fields from the joined table #1915
  • When you do expr(field_from_left_table, field_from_right_table) AS some_alias, you can get the value of the expression, but you can't filter by it even if you alias it. Rel. issue Filter by joined expression #1919
  • ANY(right.multi) requires an expression with a query example (rel. slack thread https://manticore-community.slack.com/archives/C7NSLK1NE/p1710674370442859?thread_ts=1710630191.329119&cid=C7NSLK1NE)
  • Grouping by joined JSON attributes that generate multiple groupby values (JSON arrays) is not supported

@glookka
Copy link
Contributor

glookka commented Mar 8, 2024

Little issue with NULLs in 6.2.13 255ad5232@24030516

No NULL for the last column here:

Fixed in 25d99ef

@sanikolaev sanikolaev pinned this issue Mar 8, 2024
@PavelShilin89
Copy link
Contributor

A new error has been detected. Sorting in order by id asc does not work if engine='columnar' is used

mysql> drop table if exists tbl1; CREATE TABLE tbl1 engine='columnar'; INSERT INTO tbl1 VALUES (1); drop table if exists tbl2; CREATE TABLE tbl2 (tbl1_id bigint); INSERT INTO tbl2 VALUES (1, 1); select id from tbl1 join tbl2 on tbl1.id=tbl2.tbl1_id order by id asc;
--------------
drop table if exists tbl1
--------------

--------------
CREATE TABLE tbl1 engine='columnar'
--------------

--------------
INSERT INTO tbl1 VALUES (1)
--------------

--------------
drop table if exists tbl2
--------------

--------------
CREATE TABLE tbl2 (tbl1_id bigint)
--------------

--------------
INSERT INTO tbl2 VALUES (1, 1)
--------------

--------------
select id from tbl1 join tbl2 on tbl1.id=tbl2.tbl1_id order by id asc

Sorting in order by id asc works if engine='columnar' is not used

drop table if exists tbl1; CREATE TABLE tbl1; INSERT INTO tbl1 VALUES (1); drop table if exists tbl2; CREATE TABLE tbl2 (tbl1_id bigint); INSERT INTO tbl2 VALUES (1, 1); select id from tbl1 join tbl2 on tbl1.id=tbl2.tbl1_id order by id asc;
--------------
drop table if exists tbl1
--------------

--------------
CREATE TABLE tbl1
--------------

--------------
INSERT INTO tbl1 VALUES (1)
--------------

--------------
drop table if exists tbl2
--------------

--------------
CREATE TABLE tbl2 (tbl1_id bigint)
--------------

--------------
INSERT INTO tbl2 VALUES (1, 1)
--------------

--------------
select id from tbl1 join tbl2 on tbl1.id=tbl2.tbl1_id order by id asc
--------------

+------+
| id   |
+------+
|    1 |
+------+

@glookka
Copy link
Contributor

glookka commented Mar 9, 2024

A new error has been detected. Sorting in order by id asc does not work if engine='columnar' is used

Fixed in de6dac0

@PavelShilin89
Copy link
Contributor

PavelShilin89 commented Mar 11, 2024

I confirm that sorting by id asc works, regardless of whether engine='columnar' is used.
But I noticed a new bug, on the minimal table group by t2.attr and both attr rows are identical, and probably the hash should be identical.

mysql> DROP TABLE IF EXISTS t1; DROP TABLE IF EXISTS t2; CREATE TABLE t1 (id bigint, name text, surname text); INSERT INTO t1 (id, name, surname) VALUES (1, 'name1', 'surname1'), (2, 'name2', 'surname2'); select * from t1; CREATE TABLE t2 (id bigint, t2_id bigint, name text, description text, attr json); INSERT INTO t2 (id, t2_id, name, description, attr) VALUES (0, 1, 'name1', 'description1', '{"color":"black","size":14}'), (0, 1, 'name2', 'description2', '{"color":"black","size":14}'); select * from t2; select id, count(*), groupby() from t1 inner join t2 on t1.id=t2.t2_id group by t2.attr order by id asc;
--------------
DROP TABLE IF EXISTS t1
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
DROP TABLE IF EXISTS t2
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
CREATE TABLE t1 (id bigint, name text, surname text)
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
INSERT INTO t1 (id, name, surname) VALUES (1, 'name1', 'surname1'), (2, 'name2', 'surname2')
--------------

Query OK, 2 rows affected (0.00 sec)

--------------
select * from t1
--------------

+------+-------+----------+
| id   | name  | surname  |
+------+-------+----------+
|    1 | name1 | surname1 |
|    2 | name2 | surname2 |
+------+-------+----------+
2 rows in set (0.00 sec)
--- 2 out of 2 results in 0ms ---

--------------
CREATE TABLE t2 (id bigint, t2_id bigint, name text, description text, attr json)
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
INSERT INTO t2 (id, t2_id, name, description, attr) VALUES (0, 1, 'name1', 'description1', '{"color":"black","size":14}'), (0, 1, 'name2', 'description2', '{"color":"black","size":14}')
--------------

Query OK, 2 rows affected (0.00 sec)

--------------
select * from t2
--------------

+---------------------+-------+--------------+-------+-----------------------------+
| id                  | name  | description  | t2_id | attr                        |
+---------------------+-------+--------------+-------+-----------------------------+
| 5839239912211561623 | name1 | description1 |     1 | {"color":"black","size":14} |
| 5839239912211561624 | name2 | description2 |     1 | {"color":"black","size":14} |
+---------------------+-------+--------------+-------+-----------------------------+
2 rows in set (0.00 sec)
--- 2 out of 2 results in 0ms ---

--------------
select id, count(*), groupby() from t1 inner join t2 on t1.id=t2.t2_id group by t2.attr order by id asc
--------------

+------+----------+-----------------+
| id   | count(*) | groupby()       |
+------+----------+-----------------+
|    1 |        1 | 140586561985152 |
|    1 |        1 | 140586560571744 |
+------+----------+-----------------+
2 rows in set (0.00 sec)
--- 2 out of 2 results in 0ms ---

@sanikolaev
Copy link
Collaborator Author

@PavelShilin89

it gives random data, which is not in the table

What exactly random data do you mean? The groupby() values? What result do you expect? Do you mean the problem is that groupby() is not the same for the records?

@sanikolaev
Copy link
Collaborator Author

@glookka Crash in:

show version
--------------

+-----------+--------------------------------+
| Component | Version                        |
+-----------+--------------------------------+
| Daemon    | 6.2.13 08a009d44@24031205 dev  |
| Columnar  | columnar 2.2.5 aa3504b@240304  |
| Secondary | secondary 2.2.5 aa3504b@240304 |
| KNN       | knn 2.2.5 aa3504b@240304       |
| Buddy     | buddy v2.3.1                   |
+-----------+--------------------------------+
5 rows in set (0.00 sec)

based on the Pavel's MRE with a little bit different query:

mysql> DROP TABLE IF EXISTS t1; DROP TABLE IF EXISTS t2; CREATE TABLE t1 (id bigint, name text, surname text); INSERT INTO t1 (id, name, surname) VALUES (1, 'name1', 'surname1'), (2, 'name2', 'surname2');
select * from t1; CREATE TABLE t2 (id bigint, t2_id bigint, name text, description text, attr json); INSERT INTO t2 (id, t2_id, name, description, attr) VALUES (0, 1, 'name1', 'description1', '{"color":"black","size":14}'), (0, 1, 'name2', 'description2', '{"color":"black","size":14}'); select * from t2; select id, *, count(*), groupby() from t1 inner join t2 on t1.id=t2.t2_id group by t2.attr.color, t2.attr.size order by id asc;
--------------
DROP TABLE IF EXISTS t1
--------------

Query OK, 0 rows affected (0.01 sec)

--------------
DROP TABLE IF EXISTS t2
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
CREATE TABLE t1 (id bigint, name text, surname text)
--------------

Query OK, 0 rows affected (0.01 sec)

--------------
INSERT INTO t1 (id, name, surname) VALUES (1, 'name1', 'surname1'), (2, 'name2', 'surname2')
--------------

Query OK, 2 rows affected (0.00 sec)

--------------
select * from t1
--------------

+------+-------+----------+
| id   | name  | surname  |
+------+-------+----------+
|    1 | name1 | surname1 |
|    2 | name2 | surname2 |
+------+-------+----------+
2 rows in set (0.00 sec)
--- 2 out of 2 results in 0ms ---

--------------
CREATE TABLE t2 (id bigint, t2_id bigint, name text, description text, attr json)
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
INSERT INTO t2 (id, t2_id, name, description, attr) VALUES (0, 1, 'name1', 'description1', '{"color":"black","size":14}'), (0, 1, 'name2', 'description2', '{"color":"black","size":14}')
--------------

Query OK, 2 rows affected (0.00 sec)

--------------
select * from t2
--------------

+---------------------+-------+--------------+-------+-----------------------------+
| id                  | name  | description  | t2_id | attr                        |
+---------------------+-------+--------------+-------+-----------------------------+
| 5839241405892919297 | name1 | description1 |     1 | {"color":"black","size":14} |
| 5839241405892919298 | name2 | description2 |     1 | {"color":"black","size":14} |
+---------------------+-------+--------------+-------+-----------------------------+
2 rows in set (0.00 sec)
--- 2 out of 2 results in 0ms ---

--------------
select id, *, count(*), groupby() from t1 inner join t2 on t1.id=t2.t2_id group by t2.attr.color, t2.attr.size order by id asc
--------------

ERROR 2013 (HY000): Lost connection to MySQL server during query
No connection. Trying to reconnect...
ERROR 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 104
ERROR:
Can't connect to the server

@PavelShilin89
Copy link
Contributor

@PavelShilin89

он дает случайные данные, которых нет в таблице

Какие именно случайные данные вы имеете в виду? Ценности groupby()? Какого результата вы ожидаете? Вы имеете в виду, что проблема в том, что groupby()это не то же самое для записей?

I filled the table with data. However, as a result of the query select id, count(*), groupby() from t1 inner join t2 on t1.id=t2.t2_id group by t2.attr order by id asc field groupby() gives data that are not in the table. I can't understand where they come from.

@sanikolaev
Copy link
Collaborator Author

Pls read in the docs about groupby(). It's a hash, it can't exist in the table. But the point that the hash values are different for seemingly identical rows looks wrong.

@glookka
Copy link
Contributor

glookka commented Mar 15, 2024

@glookka Crash in:

Fixed in 8d30bc3

@sanikolaev
Copy link
Collaborator Author

This task is done to some extent. The functionality has been released in beta stage in 6.3.0. The docs can be found here https://manual.manticoresearch.com/Searching/Joining

@sanikolaev sanikolaev added the rel::6.3.0 Released in 6.3.0 label May 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

No branches or pull requests

3 participants