Enhancing copy throughput by retrieving datums with checkpointing available in TupleTableSlot #21929
Conversation
Force-pushed from 3d570b4 to 59af390.
It seems the core issue is with upstream PG code, unless this access pattern by YB is discouraged by upstream PG:
for (AttrNumber attnum = minattr; attnum <= natts; attnum++)
	column->datum = heap_getattr(tuple, attnum, tupleDesc, &column->is_null);
* Otherwise, check for non-fixed-length attrs up to and including
* target. If there aren't any, it's safe to cheaply initialize the
* cached offsets for these attrs.
It appears that caching attr offsets has been disabled for variable-length attributes ever since the beginning, as far as Git history shows. I don't know exactly why that is the case, but it calls into question whether your caching is safe.
Since they are varlen attributes, the offsets will change from one tuple to the next, and I think that is the reason offsets aren't cached. I'm caching per tuple and clearing the array afterwards.
for (j = 0; j <= attnum; j++)
{
	if (TupleDescAttr(tupleDesc, j)->attlen <= 0)
Supposing it is safe to cache varlen offsets for the case you are targeting, I think a cleaner solution is to reuse the existing attcacheoff rather than duplicate code. You could add a YB exception here (perhaps via a boolean argument to the function) so that slow is not set to true, and the caller can reset these attcacheoff back to -1 when done? Note I haven't given the idea much thought and there could be holes.
It is safe to cache offsets for each row. I was thinking along the same lines, but I was stuck on how to reset to -1 only for this varlen case at the ybcModifyTable level. If we add an exception, that requires changes in many places, which is hard to review for now. Let me rethink this.
* ... only we don't have it yet, or we'd not have got here. Since
* it's cheap to compute offsets for fixed-width columns, we take the
* opportunity to initialize the cached offsets for *all* the leading
* fixed-width columns, in hope of avoiding future visits to this
* routine.
It seems this is the O(n^2) behavior in case all your columns are not fixed-width.
slow becomes true since we have varlen attributes. This is the loop that is causing the O(n^2) behavior - link
You are correct that repeated calls to heap_getattr are generally inefficient.
Force-pushed from e0f846e to 107e5d4.
Thanks Andrei for taking a quick look. Based on your suggestion in Slack, using TupleTableSlot is the best option, and I have updated the PR accordingly. I didn't combine tupDesc and slot because in files like matview.c the TupleDesc is retrieved from a different struct (DR_transientrel), which I think may not be updated in the slot. Please let me know if I'm missing anything.
Force-pushed from 107e5d4 to 26299a3.
Force-pushed from 26299a3 to 2ea43a7.
@@ -824,8 +824,12 @@ InsertOneTuple(Oid objectid)
	if (objectid != (Oid) 0)
		HeapTupleSetOid(tuple, objectid);

	if (IsYugaByteEnabled())
		YBCExecuteInsert(boot_reldesc, tupDesc, tuple, ONCONFLICT_NONE);
	if (IsYugaByteEnabled()){
New line before the opening brace
@@ -626,8 +626,9 @@ intorel_receive(TupleTableSlot *slot, DestReceiver *self)

	if (IsYBRelation(myState->rel))
	{
		slot->tts_tupleDescriptor=RelationGetDescr(myState->rel);
ExecSetSlotDescriptor should be used.
However, it is a bad idea to modify passed in slot.
Why do we think that slot does not have a descriptor already?
I haven't used ExecSetSlotDescriptor because it clears the tuple from the slot.
The slot should have a descriptor, otherwise slot_getattr wouldn't work, since it retrieves the TupleDesc from slot->tts_tupleDescriptor. But I added a check:
RelationGetDescr(rel) != slot->tts_tupleDescriptor -> this is true because they have different tdtypeid and tdrefcount, so I tried to update it.
Based on your comments I did a recheck, and I hope we aren't using tdtypeid and tdrefcount anywhere.
@@ -485,8 +485,9 @@ transientrel_receive(TupleTableSlot *slot, DestReceiver *self)
	tuple = ExecMaterializeSlot(slot);
	if (IsYBRelation(myState->transientrel))
	{
		slot->tts_tupleDescriptor=RelationGetDescr(myState->transientrel);
Same as above
@@ -5417,11 +5417,13 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
	/* Write the tuple out to the new relation */
	if (newrel)
	{
		if (IsYBRelation(newrel))
		if (IsYBRelation(newrel)){
New line before brace
@@ -5417,11 +5417,13 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
	/* Write the tuple out to the new relation */
	if (newrel)
	{
		if (IsYBRelation(newrel))
		if (IsYBRelation(newrel)){
			newslot->tts_tupleDescriptor=RelationGetDescr(newrel);
Unsafe tuple descriptor assignment again
@@ -1573,6 +1573,7 @@ ExecUpdate(ModifyTableState *mtstate,
	ModifyTable *plan = (ModifyTable *) mtstate->ps.plan;
	if (is_pk_updated)
	{
		planSlot->tts_tupleDescriptor=RelationGetDescr(resultRelationDesc);
and again
HeapTuple tuple, OnConflictAction onConflictAction, Datum *ybctid,
YBCPgTransactionSetting transaction_setting) {
	Oid relfileNodeId = YbGetRelfileNodeId(rel);
	AttrNumber minattr = YBGetFirstLowInvalidAttributeNumber(rel);
	int natts = RelationGetNumberOfAttributes(rel);
	Bitmapset *pkey = YBGetTablePrimaryKeyBms(rel);

	TupleDesc tupleDesc = slot->tts_tupleDescriptor;
If slot->tts_tupleDescriptor is NULL, you can probably use RelationGetDescr(rel).
Force-pushed from 2ea43a7 to 90c2d9e.
Expected result is
There are more failures of
Force-pushed from 90c2d9e to eb4d763.
@@ -1573,7 +1573,8 @@ ExecUpdate(ModifyTableState *mtstate,
	ModifyTable *plan = (ModifyTable *) mtstate->ps.plan;
	if (is_pk_updated)
	{
		YBCExecuteUpdateReplace(resultRelationDesc, planSlot, tuple, estate);
		slot->tts_tuple->t_ybctid = YBCGetYBTupleIdFromSlot(planSlot);
		YBCExecuteUpdateReplace(resultRelationDesc, slot, tuple, estate);
There is already a TODO here to pass a transformed slot. I guess slot is the right option instead of planSlot.
Force-pushed from f71115d to e572a17.
YBCPgTransactionSetting transaction_setting) {
	Oid relfileNodeId = YbGetRelfileNodeId(rel);
	AttrNumber minattr = YBGetFirstLowInvalidAttributeNumber(rel);
	int natts = RelationGetNumberOfAttributes(rel);
	Bitmapset *pkey = YBGetTablePrimaryKeyBms(rel);
	TupleDesc tupleDesc = slot->tts_tupleDescriptor;
	HeapTuple tuple = slot->tts_tuple;
Based on the definition of ExecMaterializeSlot, I haven't used it, so ybctid and oid are updated in the tuple itself. Please let me know if I'm missing anything.
HeapTuple tuple, OnConflictAction onConflictAction, Datum *ybctid,
YBCPgTransactionSetting transaction_setting) {
	Oid relfileNodeId = YbGetRelfileNodeId(rel);
	AttrNumber minattr = YBGetFirstLowInvalidAttributeNumber(rel);
	int natts = RelationGetNumberOfAttributes(rel);
	Bitmapset *pkey = YBGetTablePrimaryKeyBms(rel);

	TupleDesc tupleDesc = slot->tts_tupleDescriptor;
	if (tupleDesc == NULL || tupleDesc != RelationGetDescr(rel))
If tupleDesc has to be equal to rel's descriptor, why not just assign rel's descriptor to that variable?
@@ -18211,7 +18210,7 @@ YbATCopyTableRowsUnchecked(Relation old_rel, Relation new_rel,
	ExecStoreHeapTuple(tuple, newslot, false);

	/* Write the tuple out to the new relation */
	YBCExecuteInsert(new_rel, newslot->tts_tupleDescriptor, tuple,
	YBCExecuteInsert(new_rel, newslot,
Nit: the statement is shorter now, can fit into single line.
@@ -342,7 +347,7 @@ static Oid YBCApplyInsertRow(
	if (IsCatalogRelation(rel))
	{
		MarkCurrentCommandUsed();
		CacheInvalidateHeapTuple(rel, tuple, NULL);
		CacheInvalidateHeapTuple(rel, slot->tts_tuple, NULL);
If I'm not missing something, a local variable tuple is defined here and is equal to slot->tts_tuple.
Force-pushed from e572a17 to ea5954c.
Force-pushed from ea5954c to 0c8f066.
… a TupleTableSlot

A TupleTableSlot parameter replaces the HeapTuple and TupleDescriptor parameters in all DocDB insert procedures, including bulk insert. In Postgres, insert procedures work with a HeapTuple because it represents the on-disk format, so tuple data was simply copied into the data page. DocDB, however, stores individual values (Datums). In Yugabyte, insert procedures operated on a HeapTuple too, because it was readily available, but a HeapTuple is less efficient for retrieving individual values, especially when the tuple has variable-length attributes. Fixed-size attributes allow data offsets to be calculated once, but as soon as a variable-size value occurs in the tuple, retrieving a value from a HeapTuple requires iterating over all preceding columns to calculate offsets. This heavily impacted extremely wide tables with hundreds of columns: values were added to the DocDB statement one by one, and the columns were re-iterated each time to find the offset, so hundreds of columns required up to hundreds of iterations each. With the new approach, the tuple may already be in the slot and deformed by the time it is about to be sent to DocDB for insert; if so, all values are readily available and no materialization step is needed. In the worst case, when there is a HeapTuple to insert, accessing values via the TupleTableSlot deforms it in one pass, with no repeated iteration over columns. The effect is barely noticeable for tables with only a few columns, but load tests on tables with 1000+ columns show up to a 3x improvement.
In the sample table almost all columns are char, with a few varchar columns; for both of these data types, attlen in the pg_attribute table is -1. Since we have varlen attributes, the nocachegetattr function goes to this part of the code (nocachecode, ybcModifyTable). This makes the time complexity O(n^2), where n is the number of attributes.
There is another for loop in (nocachecode); since it does the same work every time, why can't we build a prefix sum array once for the row and just use that for all the attributes?