Enhancing copy throughput by retrieving datums with checkpointing available in TupleTableSlot #21929
Conversation
Force-pushed from 3d570b4 to 59af390.
It seems the core issue is with upstream PG code, unless this access pattern by YB is discouraged by upstream PG:
for (AttrNumber attnum = minattr; attnum <= natts; attnum++)
	column->datum = heap_getattr(tuple, attnum, tupleDesc, &column->is_null);
* Otherwise, check for non-fixed-length attrs up to and including
* target. If there aren't any, it's safe to cheaply initialize the
* cached offsets for these attrs.
It appears that caching attr offsets has been disabled for variable-length attributes ever since the beginning, as far as Git history shows. I don't know exactly why that is the case, but it calls into question whether your caching is safe.
Since they are varlen attributes, the offsets will change from one tuple to the next, and I think that is the reason offsets aren't cached. I'm caching per tuple and clearing the array afterwards.
for (j = 0; j <= attnum; j++)
{
	if (TupleDescAttr(tupleDesc, j)->attlen <= 0)
Supposing it is safe to cache varlen offsets for the case you are targeting, I think a cleaner solution is to reuse the existing attcacheoff rather than duplicate code. You could add a YB exception here (perhaps via a boolean argument to the function) so that slow is not set to true, and the caller can reset these attcacheoff back to -1 when done? Note I haven't given the idea much thought and there could be holes.
It is safe to cache offsets for each row. I was thinking along the same lines, but I was stuck on how to reset to -1 only for this varlen case at the ybcModifyTable level. If we add an exception, that requires changes in many places, which is hard to review for now. Let me rethink this.
* ... only we don't have it yet, or we'd not have got here. Since
* it's cheap to compute offsets for fixed-width columns, we take the
* opportunity to initialize the cached offsets for *all* the leading
* fixed-width columns, in hope of avoiding future visits to this
* routine.
It seems this is the O(n^2) behavior in case all your columns are not fixed-width.
slow becomes true since we have varlen attributes. This is the loop that is causing the O(n^2) behavior - link
You are correct that repeated calls to heap_getattr are generally inefficient.
Force-pushed from e0f846e to 107e5d4.
Thanks Andrei for taking a quick look. Based on your suggestion in Slack, using TupleTableSlot is the best option, and I have updated the PR accordingly. I didn't combine tupDesc and slot because in files like matview.c the TupleDesc is retrieved from a different struct (DR_transientrel), which I think may not be updated in the slot. Please let me know if I'm missing anything.
Force-pushed from 107e5d4 to 26299a3.
Force-pushed from 26299a3 to 2ea43a7.
@@ -824,8 +824,12 @@ InsertOneTuple(Oid objectid)
	if (objectid != (Oid) 0)
		HeapTupleSetOid(tuple, objectid);

	if (IsYugaByteEnabled())
		YBCExecuteInsert(boot_reldesc, tupDesc, tuple, ONCONFLICT_NONE);
	if (IsYugaByteEnabled()){
New line before the opening brace
@@ -626,8 +626,9 @@ intorel_receive(TupleTableSlot *slot, DestReceiver *self)

	if (IsYBRelation(myState->rel))
	{
		slot->tts_tupleDescriptor=RelationGetDescr(myState->rel);
ExecSetSlotDescriptor should be used.
However, it is a bad idea to modify passed in slot.
Why do we think that slot does not have a descriptor already?
I haven't used ExecSetSlotDescriptor because it clears the tuple from the slot.
The slot should have a descriptor, otherwise slot_getattr wouldn't work, since it retrieves the TupleDesc from slot->tts_tupleDescriptor. But I added a check:
RelationGetDescr(rel) != slot->tts_tupleDescriptor -> this is true because they have different tdtypeid and tdrefcount, so I tried to update it.
Based on your comments I did a recheck, and I hope we aren't using tdtypeid and tdrefcount anywhere.
@@ -485,8 +485,9 @@ transientrel_receive(TupleTableSlot *slot, DestReceiver *self)
	tuple = ExecMaterializeSlot(slot);
	if (IsYBRelation(myState->transientrel))
	{
		slot->tts_tupleDescriptor=RelationGetDescr(myState->transientrel);
Same as above
@@ -5417,11 +5417,13 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
	/* Write the tuple out to the new relation */
	if (newrel)
	{
		if (IsYBRelation(newrel))
		if (IsYBRelation(newrel)){
New line before brace
@@ -5417,11 +5417,13 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
	/* Write the tuple out to the new relation */
	if (newrel)
	{
		if (IsYBRelation(newrel))
		if (IsYBRelation(newrel)){
			newslot->tts_tupleDescriptor=RelationGetDescr(newrel);
Unsafe tuple descriptor assignment again
@@ -1573,6 +1573,7 @@ ExecUpdate(ModifyTableState *mtstate,
	ModifyTable *plan = (ModifyTable *) mtstate->ps.plan;
	if (is_pk_updated)
	{
		planSlot->tts_tupleDescriptor=RelationGetDescr(resultRelationDesc);
and again
HeapTuple tuple, OnConflictAction onConflictAction, Datum *ybctid,
YBCPgTransactionSetting transaction_setting) {
	Oid relfileNodeId = YbGetRelfileNodeId(rel);
	AttrNumber minattr = YBGetFirstLowInvalidAttributeNumber(rel);
	int natts = RelationGetNumberOfAttributes(rel);
	Bitmapset *pkey = YBGetTablePrimaryKeyBms(rel);

	TupleDesc tupleDesc = slot->tts_tupleDescriptor;
If slot->tts_tupleDescriptor is NULL, you can probably use RelationGetDescr(rel).
Force-pushed from 2ea43a7 to 90c2d9e.
Expected result is
There are more failures of
Force-pushed from 90c2d9e to eb4d763.
@@ -1573,7 +1573,8 @@ ExecUpdate(ModifyTableState *mtstate,
	ModifyTable *plan = (ModifyTable *) mtstate->ps.plan;
	if (is_pk_updated)
	{
		YBCExecuteUpdateReplace(resultRelationDesc, planSlot, tuple, estate);
		slot->tts_tuple->t_ybctid = YBCGetYBTupleIdFromSlot(planSlot);
		YBCExecuteUpdateReplace(resultRelationDesc, slot, tuple, estate);
There is already a TODO here to pass a transformed slot. I guess slot is the right option instead of planSlot.
Force-pushed from f71115d to e572a17.
YBCPgTransactionSetting transaction_setting) {
	Oid relfileNodeId = YbGetRelfileNodeId(rel);
	AttrNumber minattr = YBGetFirstLowInvalidAttributeNumber(rel);
	int natts = RelationGetNumberOfAttributes(rel);
	Bitmapset *pkey = YBGetTablePrimaryKeyBms(rel);
	TupleDesc tupleDesc = slot->tts_tupleDescriptor;
	HeapTuple tuple = slot->tts_tuple;
Based on the definition of ExecMaterializeSlot, I haven't used it, so ybctid and oid are updated in the tuple itself. Please let me know if I'm missing anything.
HeapTuple tuple, OnConflictAction onConflictAction, Datum *ybctid,
YBCPgTransactionSetting transaction_setting) {
	Oid relfileNodeId = YbGetRelfileNodeId(rel);
	AttrNumber minattr = YBGetFirstLowInvalidAttributeNumber(rel);
	int natts = RelationGetNumberOfAttributes(rel);
	Bitmapset *pkey = YBGetTablePrimaryKeyBms(rel);

	TupleDesc tupleDesc = slot->tts_tupleDescriptor;
	if (tupleDesc == NULL || tupleDesc != RelationGetDescr(rel))
If tupleDesc has to be equal to rel's descriptor, why not just assign rel's descriptor to that variable?
@@ -18211,7 +18210,7 @@ YbATCopyTableRowsUnchecked(Relation old_rel, Relation new_rel,
	ExecStoreHeapTuple(tuple, newslot, false);

	/* Write the tuple out to the new relation */
	YBCExecuteInsert(new_rel, newslot->tts_tupleDescriptor, tuple,
	YBCExecuteInsert(new_rel, newslot,
Nit: the statement is shorter now, can fit into single line.
@@ -342,7 +347,7 @@ static Oid YBCApplyInsertRow(
	if (IsCatalogRelation(rel))
	{
		MarkCurrentCommandUsed();
		CacheInvalidateHeapTuple(rel, tuple, NULL);
		CacheInvalidateHeapTuple(rel, slot->tts_tuple, NULL);
If I'm not missing something, a local variable tuple is defined here and is equal to slot->tts_tuple.
Force-pushed from e572a17 to ea5954c.
Force-pushed from ea5954c to 0c8f066.
… a TupleTableSlot

A TupleTableSlot parameter replaces the HeapTuple and TupleDescriptor parameters in all DocDB insert procedures, including bulk insert. In Postgres, insert procedures work with a HeapTuple because it represents the on-disk format, so tuple data was simply copied into the data page. DocDB, however, stores individual values (Datums). In Yugabyte, insert procedures operated on a HeapTuple too, because it was readily available, but a HeapTuple is less efficient for retrieving individual values, especially when the tuple has variable-length attributes. Fixed-size attributes allow data offsets to be calculated once, but as soon as a variable-size value occurs in the tuple, retrieving a value from a HeapTuple requires iterating over all preceding columns to calculate offsets. This heavily impacted extremely wide tables with hundreds of columns: values were added to the DocDB statement one by one, and the columns were re-iterated each time to find the offset, so hundreds of columns required up to hundreds of iterations each. With the new approach, the tuple may already be in the slot and deformed by the time it is about to be sent to DocDB for insert; if so, all values are readily available and no materialization step is needed. In the worst case, when there is a HeapTuple to insert, accessing values via the TupleTableSlot deforms it in one pass, with no repeated iteration over columns. The effect is barely noticeable for tables with only a few columns, but load tests on tables with 1000+ columns show up to a 3x improvement.
In the sample table almost all columns are char, with a few varchar columns; for both of these data types, attlen in the pg_attribute table is -1. Since we have varlen attributes, the nocachegetattr function goes to this part of the code (nocachecode, ybcModifyTable). This makes the time complexity O(n^2), where n is the number of attributes.
There is another for loop in (nocachecode); since it does the same work every time, why can't we build a prefix sum array once for the row and just use that for all the attributes?