[Docs] Upgrade Go, Hugo, Docsy and Node dependencies #22307

Open · wants to merge 372 commits into base: master

Conversation

samiahmedsiddiqui
Contributor

@samiahmedsiddiqui samiahmedsiddiqui commented May 8, 2024

|       | Current Version | Updated Version |
|-------|-----------------|-----------------|
| Go    | 1.12            | 1.20            |
| Hugo  | 0.115.4         | 0.125.6         |
| Docsy | 0.6.0           | 0.10.0          |

Docsy v0.6.0 used Bootstrap v4; starting with Docsy v0.7.0, Docsy upgraded to Bootstrap v5, which is a breaking change as noted on their release page:
https://github.com/google/docsy/releases/tag/v0.7.0

This PR contains the fixes needed for the docs to work properly with Bootstrap v5.

mchiddy and others added 30 commits April 25, 2024 06:25
…n to fail

Summary: When taking a backup with Prometheus using the backup script or via yba-ctl, the platform_dump.sql file is left behind because the trap is overridden by the Prometheus-specific code. If the backup was created as root, the YBA process can no longer create a dump, as it tries to overwrite the dump file, and HA replication fails. This diff adds an explicit attempt to clean up the backup file, separate from the trap firing.

Test Plan: Create yba-ctl backup with and without prometheus/releases, ensure platform_dump.sql file is cleaned up after the change.

Reviewers: dshubin, sanketh

Reviewed By: sanketh

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D34488
Summary: Issue discovered as part of mismatched YBA version backup/restore, where the DB schema was not being completely restored and sometimes had errors cleaning out because of dependent keys or indexes added after the migration had been created. This diff adds an explicit step to the restore that drops the public schema of the yugaware DB before performing the restore, which gives the restore a higher chance of succeeding completely.

Test Plan:
Manually perform backup restore from 2.18 to master in an HA failover scenario. Ensure accept_any_certificate error does not show up.

Test backup restore of replicated YBA

Test backup restore across versions with yba-installer

Test restore of empty platform_dump.sql, ensure pg_restore is skipped.

Reviewers: nsingh, dshubin, sanketh

Reviewed By: sanketh

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D34348
…B to PG datum in walsender

Summary:
The walsender spends a considerable amount of time converting QLValuePB to PG datum in ybc_pggate. Add a VLOG(1) that logs the time taken by this operation.
Jira: DB-11059
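
A minimal sketch of this style of instrumentation, using a hypothetical helper rather than the actual ybc_pggate code: the conversion step is timed with a steady clock and the elapsed time is logged at verbosity level 1.

```
#include <chrono>
#include <glog/logging.h>

// Sketch only: run a step, measure its wall-clock duration, and report it via VLOG(1).
template <typename Func>
void TimeAndVlog(const char* label, Func&& step) {
  const auto start = std::chrono::steady_clock::now();
  step();
  const auto elapsed_us = std::chrono::duration_cast<std::chrono::microseconds>(
      std::chrono::steady_clock::now() - start).count();
  VLOG(1) << label << " took " << elapsed_us << " us";
}
```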

Test Plan:
Jenkins: compile only

Existing tests. Looked at the log manually in a local test run.

Reviewers: asrinivasan

Reviewed By: asrinivasan

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D34483
…r background tasks

Summary:
Add a flag to customise whether we pull background task states in ASH
Jira: DB-11052

Test Plan: yb_build.sh --cxx-test wait_states-itest

Reviewers: asaha

Reviewed By: asaha

Subscribers: ybase, yql

Differential Revision: https://phorge.dev.yugabyte.com/D34456
Summary:
Currently we use Uuid::FromHexString to convert the node/instance_id to a uuid.
However, this function expects the string to be encoded in little-endian order.

ToBytes/FromBytes use big-endian order, so we were seeing reversed uuids
in ASH.

To fix this, we avoid FromHexString and implement/use FromHexStringBigEndian.
This is similar to Uuid::FromString, except that it also accepts strings without dashes.
Jira: DB-10862
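
A minimal sketch of the endianness mismatch, with hypothetical helpers rather than the actual Uuid implementation: parsing the same 32-character hex string in big-endian versus little-endian byte order yields byte arrays that are the reverse of each other, which is how the reversed uuids showed up in ASH.

```
#include <array>
#include <cstdint>
#include <string>

// Parse a 32-char hex string, most significant byte first.
std::array<uint8_t, 16> FromHexBigEndian(const std::string& hex) {
  std::array<uint8_t, 16> out{};
  for (size_t i = 0; i < 16; ++i) {
    out[i] = static_cast<uint8_t>(std::stoi(hex.substr(2 * i, 2), nullptr, 16));
  }
  return out;
}

// Parse the same string, least significant byte first: the result is byte-reversed.
std::array<uint8_t, 16> FromHexLittleEndian(const std::string& hex) {
  std::array<uint8_t, 16> out{};
  for (size_t i = 0; i < 16; ++i) {
    out[15 - i] = static_cast<uint8_t>(std::stoi(hex.substr(2 * i, 2), nullptr, 16));
  }
  return out;
}
```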

Test Plan: yb_build.sh --cxx-test wait_states-itest

Reviewers: asaha, hsunder

Reviewed By: asaha

Subscribers: ybase, yql

Differential Revision: https://phorge.dev.yugabyte.com/D34332
…at can be assigned multiple values.

Summary:
Previously `yugabyted` was hard-coded to parse multi-valued gflags wrapped in `{}` braces. That approach was neither scalable nor maintainable, as it led to duplicated code for every special multi-valued flag added to yugabyted.
To handle this better, we use regex and custom parsing to create a unified framework where simple and complex flags are parsed without hardcoding. This also allows us to support any new flags introduced on the DB side without changing `yugabyted` code.
Jira: DB-11011
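
A small sketch of the regex idea, illustrative only and not the actual yugabyted parser: a single pattern accepts both plain values and brace-wrapped multi-values, so new flags need no per-flag special casing.

```
#include <iostream>
#include <map>
#include <regex>
#include <string>

// Split "k1=v1,k2={a b c},k3=v3" into key/value pairs; values may contain commas
// and spaces when wrapped in braces.
std::map<std::string, std::string> ParseGFlags(const std::string& csv) {
  static const std::regex kPair(R"((\w+)=(\{[^}]*\}|[^,]*))");
  std::map<std::string, std::string> flags;
  for (auto it = std::sregex_iterator(csv.begin(), csv.end(), kPair);
       it != std::sregex_iterator(); ++it) {
    flags[(*it)[1].str()] = (*it)[2].str();
  }
  return flags;
}

int main() {
  const auto flags = ParseGFlags("max_clock_skew_usec=500000,ysql_hba_conf_csv={host all all 0.0.0.0/0 trust}");
  for (const auto& [k, v] : flags) std::cout << k << " -> " << v << "\n";
}
```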

Test Plan: Manual testing.

Reviewers: nikhil

Reviewed By: nikhil

Subscribers: yugabyted-dev, shikhar.sahay

Differential Revision: https://phorge.dev.yugabyte.com/D34400
… a version mismatch between the nodes.

Summary:
Made changes to the `/alerts` endpoint to alert users when the cluster has nodes on different versions. The alert message informs users about the version mismatch and gives details about the version of each node.
Jira: DB-10790

Test Plan: Manual Testing

Reviewers: nikhil

Reviewed By: nikhil

Subscribers: yugabyted-dev, shikhar.sahay

Differential Revision: https://phorge.dev.yugabyte.com/D34328
Summary:
Added service endpoints to list of cql/sql endpoints if loadbalancer service is enabled

Universe status after adding service endpoints:
```
cqlEndpoints:
  - yboperator-un-us-west1-a-z3a3-yb-tserver-0.yboperator-un-us-west1-a-z3a3-yb-tservers.yb-dev-operator-universe-test-4-742125735.svc.cluster.local:9042
  - yboperator-un-us-west1-b-a3a3-yb-tserver-0.yboperator-un-us-west1-b-a3a3-yb-tservers.yb-dev-operator-universe-test-4-742125735.svc.cluster.local:9042
  - yboperator-un-us-west1-c-b3a3-yb-tserver-0.yboperator-un-us-west1-c-b3a3-yb-tservers.yb-dev-operator-universe-test-4-742125735.svc.cluster.local:9042
  - 10.150.6.150:9042
  - 10.150.4.170:9042
  - 10.150.1.209:9042
  sqlEndpoints:
  - yboperator-un-us-west1-a-z3a3-yb-tserver-0.yboperator-un-us-west1-a-z3a3-yb-tservers.yb-dev-operator-universe-test-4-742125735.svc.cluster.local:5433
  - yboperator-un-us-west1-b-a3a3-yb-tserver-0.yboperator-un-us-west1-b-a3a3-yb-tservers.yb-dev-operator-universe-test-4-742125735.svc.cluster.local:5433
  - yboperator-un-us-west1-c-b3a3-yb-tserver-0.yboperator-un-us-west1-c-b3a3-yb-tservers.yb-dev-operator-universe-test-4-742125735.svc.cluster.local:5433
  - 10.150.6.150:5433
  - 10.150.4.170:5433
  - 10.150.1.209:5433
```

Test Plan: Modified UTs, tested manually

Reviewers: anijhawan, #yba-api-review

Reviewed By: anijhawan, #yba-api-review

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D34374
…ffer

Summary:
This revision adds computation and logging of the time taken by the Walsender in yb_decode and reorderbuffer while processing a single batch from the CDC
service.
Jira: DB-11071

Test Plan:
Jenkins: compile only

Existing tests. Looked at the log manually in a local test run.

Reviewers: asrinivasan

Reviewed By: asrinivasan

Subscribers: ycdcxcluster, yql

Differential Revision: https://phorge.dev.yugabyte.com/D34530
Summary:
Max query latency was not calculated properly - fixed that. Also excluded max latency from query latency anomaly detection, as it's typically not a good indicator.
Also changed the way we build graphs - we now create 0 points for all periods where we have no data. This simplifies time correlation, as all graphs will show the same time range.
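
A hedged sketch of the gap-filling idea, illustrative only and not the actual YBA code: every period in the requested range gets a point, with 0 where no data exists, so all graphs share the same time axis.

```
#include <map>
#include <vector>

// Produce one point per period between start and end; missing periods become 0.
std::vector<double> FillSeries(const std::map<long, double>& data_by_period,
                               long start_period, long end_period, long step) {
  std::vector<double> points;
  for (long t = start_period; t <= end_period; t += step) {
    auto it = data_by_period.find(t);
    points.push_back(it != data_by_period.end() ? it->second : 0.0);
  }
  return points;
}
```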

Test Plan: Tested manually

Reviewers: rmadhavan

Reviewed By: rmadhavan

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D34540
…though no upgrades available, CDC link not clickable

Summary:
**[PLAT-13675]**

Problem: The Upgrade Available link is shown for a universe even though no new upgrades are available.

This issue is a regression from this [[ https://phorge.dev.yugabyte.com/D33889 | diff ]]. In that diff, two different sets of release metadata are supplied to the UI, which was not integrated in the universeDetail.js file.

**[PLAT-13677]**

Removing the link since we don't have anything to show in the UI

Test Plan: Tested manually

Reviewers: kkannan

Reviewed By: kkannan

Subscribers: yugaware, ui

Differential Revision: https://phorge.dev.yugabyte.com/D34526
…equests

Summary:
Previously, the function that finds the target partition for a backward scan
could work incorrectly if the upper bound wasn't a partition bound.
That seems to have always been the case when bounds were set for a backward scan
request. However, parallel backward scan uses the bounds, and they may be
arbitrary keys.

Also, fix the test.

The correct target partition to start a backward scan with a valid upper bound
is the partition holding the bound, unless the bound equals the lower key
of the partition and the upper bound is exclusive, in which case the target
partition is the previous partition.
Jira: DB-10577
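
A hedged sketch of that rule over a sorted list of partition start keys (not the actual client code; it assumes the first partition starts at the empty key so every bound falls somewhere):

```
#include <algorithm>
#include <string>
#include <vector>

// Partitions are described by their inclusive lower-bound start keys, sorted ascending.
size_t PartitionForBackwardScan(const std::vector<std::string>& partition_starts,
                                const std::string& upper_bound,
                                bool upper_is_inclusive) {
  // Partition holding the bound: the last partition whose start key is <= the bound.
  auto it = std::upper_bound(partition_starts.begin(), partition_starts.end(), upper_bound);
  size_t idx = static_cast<size_t>(std::distance(partition_starts.begin(), it)) - 1;
  // If the bound equals that partition's lower key and the bound is exclusive,
  // nothing in that partition qualifies: start from the previous partition instead.
  if (!upper_is_inclusive && idx > 0 && partition_starts[idx] == upper_bound) {
    --idx;
  }
  return idx;
}
```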

Test Plan: ./yb_build --cxx-test client_client-test --gtest_filter ClientTest.TestKeyRangeUpperBoundFiltering

Reviewers: arybochkin, timur

Reviewed By: arybochkin

Subscribers: ybase, yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D33590
…de and update TS Framework version in YBA

Summary: Add ability to filter by event operations in OUTLIER mode and update TS Framework version in YBA

Test Plan: Add ability to filter by event operations in OUTLIER mode and update TS Framework version in YBA

Reviewers: amalyshev, cdavid

Reviewed By: amalyshev

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D34541
Summary: Add both ASH graphs to each anomaly for better RCA

Test Plan: tested manually

Reviewers: rmadhavan

Reviewed By: rmadhavan

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D34543
Summary:
Mark the ddl_queue and replicated_ddls tables as system tables.
Move the non-colocation and single-tablet creation rules to the catalog manager.
Block tablet splits on these tables.

Jira: DB-7986

Test Plan:
```
ybd --cxx-test xcluster_ddl_replication-test --gtest_filter "XClusterDDLReplicationTest.CreateTable"
ybd --cxx-test xcluster_ddl_replication-test --gtest_filter "XClusterDDLReplicationTest.DisableSplitting"
```

Also tested locally and confirmed that these tables show up as system tables in the master ui.

Reviewers: hsunder, xCluster

Reviewed By: hsunder

Subscribers: asrivastava, yql, ybase, xCluster

Differential Revision: https://phorge.dev.yugabyte.com/D34018
…sensusInfo when received NOT_THE_LEADER error

Summary:
Problem Background:
In our system, when a client needs to perform an operation on a specific tablet, it first needs to find out which server is currently responsible for that operation. If the operation is a WriteRpc for example, it must find the tablet leader server. However, the system's current method of figuring out the tablet leader is not very efficient. It tries to guess the leader based on a list of potential servers (peers), but this guessing game can be slow, especially when there are many servers or when the servers are located far apart geographically. This inefficiency can lead to operations failing because the leader wasn't found quickly enough.

Additionally, the system doesn't handle server failures well. If a server is down, it might take a long time for the system to stop trying to connect to it, wasting valuable seconds on each attempt. While there's a mechanism to avoid retrying a failed server for 60 seconds, it's not very effective when a server is permanently out of service. One reason for this inefficiency is that the system's information about who the leaders are (stored in something called the meta cache) can become outdated, and it doesn't get updated if the system can still perform its tasks with the outdated information, even if doing so results in repeated connection failures.

Solution Introduction:
This ticket introduces a preliminary change aimed at improving how the system tracks the current leader for each piece of data. The idea is to add a new piece of information to the meta cache called "raft_config_opid," which records the latest confirmed leadership configuration for each tablet. This way, when the system receives new information about the leadership configuration (which can happen during normal operations from other servers), it can check this new information against what it already knows. If the new information is more up-to-date, the system can update its meta cache, potentially avoiding wasted efforts on trying to connect to servers that are no longer leaders or are down.
This diff, combined with D33197 and D33598, updates the meta-cache using the TabletConsensusInfo that is piggybacked on a Write/Read/GetChanges/GetTransactionStatus ResponsePB when we send a request to a non-leader but the request requires a leader. These frequent RPC requests should keep our meta-cache sufficiently up to date to avoid the situation that caused the CE.

Upgrade/Rollback safety:
The added field in the ResponsePBs is not persisted on disk, and it is guarded by protobuf's backward compatibility.
Jira: DB-9194
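
A minimal sketch of the client-side idea with hypothetical types (the actual meta-cache structures differ): a piggybacked consensus update is applied only when it is strictly newer than what the cache already holds.

```
#include <cstdint>

// Hypothetical cached entry: the last confirmed Raft config OpId index per tablet.
struct CachedTabletEntry {
  int64_t raft_config_opid_index = -1;
  // ... leader / replica locations would live here ...
};

// Hypothetical piggybacked payload extracted from a ResponsePB.
struct PiggybackedConsensusInfo {
  int64_t raft_config_opid_index = -1;
  // ... serialized consensus state would live here ...
};

bool MaybeRefreshFromResponse(CachedTabletEntry* cached,
                              const PiggybackedConsensusInfo& incoming) {
  if (incoming.raft_config_opid_index <= cached->raft_config_opid_index) {
    return false;  // the cache is already as new or newer; skip the refresh
  }
  cached->raft_config_opid_index = incoming.raft_config_opid_index;
  // ... update leader and replica locations from the incoming consensus state ...
  return true;
}
```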

Test Plan:
Unit Testing:
1. ClientTest.TestMetacacheRefreshWhenSentToWrongLeader: Changes the leadership of a Raft group after the meta-cache is already filled in. This introduces a discrepancy between the information in the meta-cache and the actual cluster configuration, and should return a NOT_THE_LEADER error to our caller. Normally this prompts the TabletInvoker to try the next-in-line replica's Tablet Server, and with our test setup the TabletInvoker would retry the RPC at least 3 times. However, because this diff introduces a mechanism to refresh the meta-cache right away after a NOT_THE_LEADER error, we should observe that the RPC succeeds in 2 tries: the first attempt piggybacks the TabletConsensusInfo and updates the meta-cache, while the second attempt uses the refreshed meta-cache to find the correct leader to send the request to.
2. CDCServiceTestMultipleServersOneTablet.TestGetChangesRpcTabletConsensusInfo: Since the GetChanges code path for updating the meta-cache has sufficiently diverged from the other RPC types, this test is introduced to explicitly check that when a cdc proxy receives a NOT_THE_LEADER error message, its meta-cache is refreshed.

Reviewers: mlillibridge, xCluster, hsunder

Reviewed By: mlillibridge

Subscribers: yql, jason, ycdcxcluster, hsunder, ybase, bogdan

Differential Revision: https://phorge.dev.yugabyte.com/D33533
Summary:
This prevents us from having to track which platforms support which targets externally.
Once this is backported to all supported branches we can make changes to build jobs to call
all targets all the time regardless of platform.

Test Plan: Built locally

Reviewers: devops, sanketh, steve.varnau

Reviewed By: steve.varnau

Subscribers: devops, yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D34296
Summary:
If we retry a task, one of the nodes could be stopped at that moment.
In that case, in an RF3 3-node universe, the check will fail (because it tests all the nodes one by one),
but the eventual upgrade will be successful (because we will process the failed server first).

Test Plan: sbt test

Reviewers: cwang, sanketh, nsingh

Reviewed By: cwang, nsingh

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D34542
Summary:
For now, we are symlinking the version.txt from YBA, as it uses the
same versioning scheme as ybdb. Later, when YBA moves to 3.0.0 versioning, we may need to manually define the version.

Test Plan: build and validate min_yba_version is populated in version.txt

Reviewers: muthu

Reviewed By: muthu

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D34438
…ller

Summary:
As we move to the new 2024.1 release format, when comparing versions
we now want to block any comparison between a stable version (2.20 or 2024.1)
and a preview version (2.17, 2.23).

Specifically, we do not want to allow upgrades from one to the other.
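
A hedged sketch of one way such a check could look. The stable/preview classification below is an assumption, not taken from this diff: year-based trains such as 2024.1 and even-numbered 2.x trains such as 2.20 are treated as stable, odd-numbered 2.x trains such as 2.17 or 2.23 as preview, and comparisons across tracks are rejected.

```
// Assumed classification rule; the real controller logic may differ.
bool IsStableTrack(int major, int minor) {
  return major >= 2024 || minor % 2 == 0;
}

// Versions are only comparable (and upgradable) within the same track.
bool CanCompareVersions(int major_a, int minor_a, int major_b, int minor_b) {
  return IsStableTrack(major_a, minor_a) == IsStableTrack(major_b, minor_b);
}
```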

Test Plan: new unit test

Reviewers: muthu, skurapati, sanketh

Reviewed By: muthu, skurapati

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D34440
…n add release

Summary:
1. Add a PUT API call to refresh releases
2. Disable Delete/Disable Deployment button for releases in use
3. Fix: a proper error message was not displayed when adding a duplicate release
4. On create release page, ensure to show the versions that are ACTIVE

Test Plan:
{F173401}

{F173402} Please refer to screenshots

Reviewers: jmak, dshubin

Reviewed By: jmak

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D34495
* 404 redirects

* Apply suggestions from code review

Co-authored-by: Aishwarya Chakravarthy  <achakravarthy@yugabyte.com>

---------

Co-authored-by: Aishwarya Chakravarthy <achakravarthy@yugabyte.com>
Summary:
Adding yb-admin commands:
`repair_xcluster_outbound_replication_add_table <replication_group_id> <table_id> <stream_id>`
`repair_xcluster_outbound_replication_remove_table <replication_group_id> <table_id>`

These will allow us to manually add or remove an individual table from the source-side Outbound Replication Group.
`repair_xcluster_outbound_replication_add_table` requires a stream_id which can be created using `bootstrap_cdc_producer`.
`repair_xcluster_outbound_replication_remove_table` will not delete the xcluster stream. It will have to be manually deleted with `delete_cdc_stream`.
NOTE: This is only meant for manual use by DevOps in extreme situations.

**Upgrade/Rollback safety:**
New proto messages and APIs are guarded under `enable_xcluster_api_v2`

Fixes yugabyte#21540
Jira: DB-10425

Test Plan:
XClusterOutboundReplicationGroupTest.Repair
XClusterOutboundReplicationGroupTest.RepairWithYbAdmin
XClusterDBScopedTest.DropTableOnProducerThenConsumer
XClusterDBScopedTest.DropAllTables
XClusterDBScopedTest.DisableAutoTableProcessing

Reviewers: jhe, slingam, xCluster

Reviewed By: jhe

Subscribers: xCluster, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D34239
Summary:
a db release cannot be greater than the current yba
version, so check that in the api controllers.

Test Plan: unit tests

Reviewers: muthu, anijhawan

Reviewed By: muthu

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D34387
Summary: The fix https://phorge.dev.yugabyte.com/D34125 to sync remote configs removed the initialization of the local instance's last backup time, so subsequent updates were always null. This adds that line back in.

Test Plan: Setup HA, ensure that lastBackupTime is displayed and shows up in HA config

Reviewers: dshubin, sanketh

Reviewed By: dshubin

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D34554
… ordinary gflag

Summary:
Here we introduce a new gflag:
```
~/code/yugabyte-db/src/yb/tserver/server_main_util.cc:36:
DEFINE_NON_RUNTIME_bool(
    use_memory_defaults_optimized_for_ysql, false,
    "If true, the recommended defaults for the memory usage settings take into account the amount "
    "of RAM and cores available and are optimized for using YSQL.  "
    "If false, the recommended defaults will be the old defaults, which are more suitable "
    "for YCQL but do not take in account the amount of RAM and cores available.");
```
to gate the new memory recommendations.

More precisely, we still have a concept of recommended defaults and individual memory settings can choose whether or not to use the recommended defaults.  For example:
```
~/code/yugabyte-db/src/yb/tserver/tablet_memory_manager.cc:59:
// NOTE: The default here is for tools and tests; the actual defaults
// for the TServer and master processes are set in server_main_util.cc.
DEFINE_NON_RUNTIME_int32(tablet_overhead_size_percentage, 0,
    "Percentage of total available memory to use for tablet-related overheads. A value of 0 means "
    "no limit. Must be between 0 and 100 inclusive. Exception: "
    BOOST_PP_STRINGIZE(USE_RECOMMENDED_MEMORY_VALUE) " specifies to instead use a "
    "recommended value determined in part by the amount of RAM available.");
```
If this gflag is set to USE_RECOMMENDED_MEMORY_VALUE by the customer or left unset (the gflag default is set to that value for the TServer and master processes elsewhere), then we will use the recommended default for it.

If the new gflag is set to true then we use the new recommendations we have derived using our models.  If it is set to false, then we use the same values we would've used prior to version 2024.1 if the customer had not provided a value for this gflag.

Thus, the customer gets essentially the same behavior as before 2024.1 if they do not set the new gflag.
Jira: DB-11090
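
A hedged sketch of the gating described above, with a hypothetical sentinel value and helper (the real code uses USE_RECOMMENDED_MEMORY_VALUE and per-flag handling): a setting left at the "use recommended" sentinel gets a default that depends on whether the new YSQL-optimized recommendations are enabled.

```
// Hypothetical sentinel; stands in for USE_RECOMMENDED_MEMORY_VALUE.
constexpr int kUseRecommendedMemoryValue = -1000;

int ResolveTabletOverheadPercentage(int flag_value, bool optimized_for_ysql,
                                    int recommended_new, int legacy_default) {
  if (flag_value != kUseRecommendedMemoryValue) {
    return flag_value;  // the customer set an explicit value; honor it
  }
  // Otherwise pick the recommended default for the selected mode.
  return optimized_for_ysql ? recommended_new : legacy_default;
}
```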

Test Plan:
Started cluster with and without flag on and look at logs:
```
~/code/yugabyte-db/bin/yb-ctl start

I0425 10:52:25.180203  5935 server_main_util.cc:90] Setting flag db_block_cache_size_percentage to recommended value -3
I0425 10:52:25.180212  5935 server_main_util.cc:92] Flag default_memory_limit_to_ram_ratio has value 0.65 (recommended value is 0.85)
I0425 10:52:25.180217  5935 server_main_util.cc:94] Setting flag tablet_overhead_size_percentage to recommended value 0
```

```
bin/yb-ctl start --tserver_flags "use_memory_defaults_optimized_for_ysql=true" \
    --master_flag "use_memory_defaults_optimized_for_ysql=true"

I0425 10:57:00.198920  6412 server_main_util.cc:101] Total available RAM is 31.183224 GiB
I0425 10:57:00.198928  6412 server_main_util.cc:90] Setting flag db_block_cache_size_percentage to recommended value 32
I0425 10:57:00.198938  6412 server_main_util.cc:92] Flag default_memory_limit_to_ram_ratio has value 0.65 (recommended value is 0.6)
I0425 10:57:00.198943  6412 server_main_util.cc:94] Setting flag tablet_overhead_size_percentage to recommended value 10
```
(yb-ctl overrides default_memory_limit_to_ram_ratio.)

Reviewers: zdrudi

Reviewed By: zdrudi

Subscribers: ybase

Differential Revision: https://phorge.dev.yugabyte.com/D34547
…ssful RPC responds

Summary:
Background:
As detailed in https://phorge.dev.yugabyte.com/D33533, the YBClient currently fails to update its meta-cache when there are changes in a raft group's configuration. This lapse can lead to inefficiencies such as persistent follower reads not recognizing the addition of closer followers. Consequently, even if there are suitable nearer followers, the system continues to rely on more distant followers as long as the RPCs are successful.

Solution:
This update proposes enhancing the RPC mechanisms (Read, Write, GetChanges, GetTransactionStatus) by appending the current raft_config_opid_index to each request. Upon receiving a request, if the raft_config_opid_index is outdated compared to the committed raft config opid index on the TabletPeer handling the request, the Tablet Server will include updated raft consensus state information in its response. This change aims to ensure that the meta-cache remains current, thus improving the system's efficiency in recognizing and utilizing the optimal server configurations for processing requests. This adjustment is part of a series of updates (alongside D33197 and D33598) designed to keep the meta-cache sufficiently current, thereby preventing the inefficiencies previously caused by outdated cache information. A flag enable_metacache_partial_refresh is added to turn the feature on; it is off by default right now.

Upgrade/Rollback safety:
The additional field in the Response and Request Protobufs is temporary and will not be stored on disk, maintaining compatibility and safety during potential system upgrades or rollbacks.
Jira: DB-9194
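
A minimal sketch of the server-side half with hypothetical types: consensus info is piggybacked only when the index sent by the client is older than the committed Raft config OpId index on the serving peer.

```
#include <cstdint>
#include <optional>

// Hypothetical payload that would be attached to the ResponsePB.
struct ConsensusInfoForResponse {
  int64_t raft_config_opid_index;
};

std::optional<ConsensusInfoForResponse> MaybePiggybackConsensusInfo(
    int64_t client_raft_config_opid_index,
    int64_t committed_raft_config_opid_index) {
  if (client_raft_config_opid_index >= committed_raft_config_opid_index) {
    return std::nullopt;  // the client's cached config is already current
  }
  return ConsensusInfoForResponse{committed_raft_config_opid_index};
}
```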

Test Plan:
Jenkins: urgent
Full Coverage Testing:
Added a test flag FLAGS_TEST_always_return_consensus_Info_for_succeeded_rpc which is turned on in debug mode. This flag prompts the GetRaftConfigOpidIndex method on RemoteTablet to always return an OpId index of value -2. So when the Tablet Server is about to send back a successful response, it will find that the request's piggybacked OpId index is stale and thus piggyback a TabletConsensusInfo onto the response. When we receive the response in the aforementioned RPCs, if this flag is turned on, a DCHECK verifies that if the RPC response can contain a TabletConsensusInfo and the response was successful, then the TabletConsensusInfo exists in the response. This essentially allows us to leverage all the existing tests in the code base that exercise these RPCs to DCHECK our code path.

Unit testing:
Added metacache_refresh_itest.cc, which contains the following tests:
TestMetacacheRefreshFromFollowerRead:
1. Sets up an external mini-cluster.
2. Fills in the meta-cache by issuing a write op.
3. Change the raft configuration of the tablet group by blacklisting a node and adding a node.
4. Verify the next ConsistentPrefix read successfully refreshes meta-cache using a sync point.

TestMetacacheNoRefreshFromWrite:
1. Turns off the FLAGS_TEST_always_return_consensus_Info_for_succeeded_rpc
2. Fills in the meta-cache by issuing a write op.
3. Issue another write op and observe that no refresh happened.

Reviewers: mlillibridge, xCluster, hsunder

Reviewed By: mlillibridge

Subscribers: bogdan, ybase, ycdcxcluster

Differential Revision: https://phorge.dev.yugabyte.com/D34272
…cache_events=1

Summary:
To reproduce the bug:

./bin/yb-ctl create --rf 1 --tserver_flags=ysql_pg_conf_csv=yb_debug_log_catcache_events=1

./bin/ysqlsh
```
ysqlsh: could not connect to server: Connection refused
        Is the server running on host "localhost" (::1) and accepting
        TCP/IP connections on port 5433?
FATAL:  cannot read pg_class without having selected a database
```

I found that when MyDatabaseId is not yet resolved, there is a debugging code
block (executed when yb_debug_log_catcache_events=1) that cannot work
properly because it requires a database to already be selected. The debugging
code relies on other PG code to work, and it failed at:

```
static HeapTuple
ScanPgRelation(Oid targetRelId, bool indexOK, bool force_non_historic)
{
    HeapTuple   pg_class_tuple;
    Relation    pg_class_desc;
    SysScanDesc pg_class_scan;
    ScanKeyData key[1];
    Snapshot    snapshot;

    /*
     * If something goes wrong during backend startup, we might find ourselves
     * trying to read pg_class before we've selected a database.  That ain't
     * gonna work, so bail out with a useful error message.  If this happens,
     * it probably means a relcache entry that needs to be nailed isn't.
     */
    if (!OidIsValid(MyDatabaseId))
        elog(FATAL, "cannot read pg_class without having selected a database");
```

Fixed by adjusting the debug code so that when MyDatabaseId is invalid, we avoid
calling `YBDatumToString` which can trigger the PG FATAL.

After the fix, the above simple test now succeeds, and the PG logs show debugging
logs like:

```
2024-04-25 23:37:25.762 UTC [27592] LOG:  Catalog cache miss on cache with id 10:
    Target rel: pg_authid (oid : 1260), index oid 2676
    Search keys: typid=19 value=<not logged>
2024-04-25 23:37:25.762 UTC [27592] LOG:  Catalog cache miss on cache with id 11:
    Target rel: pg_authid (oid : 1260), index oid 2677
    Search keys: typid=26 value=<not logged>
2024-04-25 23:37:25.885 UTC [27592] LOG:  Catalog cache miss on cache with id 35:
    Target rel: pg_namespace (oid : 2615), index oid 2684
```
Jira: DB-11064

Test Plan: YB_EXTRA_TSERVER_FLAGS="--ysql_pg_conf_csv=yb_debug_log_catcache_events=1" ./yb_build.sh release --cxx-test pg_catalog_version-test --gtest_filter PgCatalogVersionTest.DBCatalogVersion

Reviewers: jason

Reviewed By: jason

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D34500
haikarthikssk and others added 22 commits May 9, 2024 22:16
Summary:
This [[ https://phorge.dev.yugabyte.com/D27665 | diff ]] introduced the bug: instead of using "incrementalBackupProps" as the parameter, "incrementalBackup" was used.
Fix:
Used the correct parameter

Test Plan:
Tested manually by creating full backup and incremental backup

Reviewers: lsangappa

Reviewed By: lsangappa

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D34905
Summary: Reverting these changes, as I'm seeing that the graphs are not getting displayed properly

Test Plan: Tested manually

Reviewers: rmadhavan, lsangappa

Reviewed By: rmadhavan

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D34912
…2298)

enable_tablet_split_of_xcluster_replicated_tables is an AutoFlag and is enabled by default in 2.23 and the upcoming 2024.1. Customers should not change the value of this AutoFlag manually.
Only changing the preview pages, since the 2024.1 doc release will copy the content from preview to stable.
* changed preview to 2.19

* removed 2.21 release notes
…st node not set for all involved transactions

Summary:
There are two modes of usage for `pg_locks`:
1. where the user doesn't specify transaction(s)
2. where the incoming request has transactions set

When populating the response as part of 1 above, we always expect that the distributed transactions returned in the response have the host node uuid set. The status check below ensures this:
```
    RSTATUS_DCHECK(
        seen_transactions.empty() || !req->transaction_ids().empty(), IllegalState,
        Format("Host node uuid not set for transactions: $0", yb::ToString(seen_transactions)));
```

When processing the lock status request at each tablet, we should populate the response with all granted/awaiting lock info iff the request doesn't specify a transaction set and also doesn't have `max_single_shard_waiter_start_time_us` set (the field which dictates the age of fast path transactions that will be included in the response).

The wait-queue does this by checking for both
```
if (transactions.empty() && max_single_shard_waiter_start_time_us == 0) {
```
but at the tablet, we just check the following
```
if (transactions.empty()) {
  // populate the resp with all granted locks.
}
```

It could happen that a tablet receives a lock status request with just `max_single_shard_waiter_start_time_us` set (to fetch old fast path waiters), and we end up populating all granted locks at the tablet, which isn't expected, leading to the check failure upstream.

This diff addresses the issue by populating the lock status resp with all granted locks only if the below holds true
```
transactions.empty() && max_single_shard_waiter_start_time_us == 0
```
Jira: DB-11108

Test Plan: ./yb_build.sh --cxx-test='TEST_F(PgGetLockStatusTest, TestLockStatusRespHasHostNodeSet) {'

Reviewers: rsami, rthallam, pjain

Reviewed By: rsami

Subscribers: ybase, yql

Differential Revision: https://phorge.dev.yugabyte.com/D34800
…cal cache

Summary:
The memory leak was identified via code inspection while debugging the issue.

```
static void
YbInitUpdateRelationCacheState(YbUpdateRelationCacheState *state)
{
    YbLoadTupleCache(&state->pg_attrdef_cache, AttrDefaultRelationId,
                     &YbExtractAttrDefTupleCacheKey, "pg_attrdef local cache");
    YbLoadTupleCache(&state->pg_constraint_cache, ConstraintRelationId,
                     &YbExtractConstraintTupleCacheKey,
                     "pg_constraint local cache");
}
```

and

```
static void
YbLoadTupleCache(YbTupleCache *cache, Oid relid,
                 YbTupleCacheKeyExtractor key_extractor, const char *cache_name)
{
    Assert(!(cache->rel || cache->data));
    cache->rel = heap_open(relid, AccessShareLock);
    HASHCTL ctl = {0};
    ctl.keysize = sizeof(Oid);
    ctl.entrysize = sizeof(YbTupleCacheEntry);
    cache->data = hash_create(cache_name, 32, &ctl, HASH_ELEM | HASH_BLOBS);
```

We have called `hash_create` in `YbLoadTupleCache`, but in the corresponding
`YbCleanupTupleCache` I do not see a call to `hash_destroy`:

```
static void
YbCleanupTupleCache(YbTupleCache *cache)
{
    if (!cache->rel)
        return;

    heap_close(cache->rel, AccessShareLock);
}
```

This diff fixes the memory leak by calling `hash_destroy` in
`YbCleanupTupleCache`.

Jira: DB-11181
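
A hedged sketch of what the cleanup could look like after the fix, extending the snippet quoted above (not necessarily the exact patch): the hash table created by `hash_create` in `YbLoadTupleCache` is released with `hash_destroy`.

```
static void
YbCleanupTupleCache(YbTupleCache *cache)
{
    if (!cache->rel)
        return;

    heap_close(cache->rel, AccessShareLock);

    /* Release the hash table allocated in YbLoadTupleCache via hash_create. */
    if (cache->data)
    {
        hash_destroy(cache->data);
        cache->data = NULL;
    }
}
```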

Test Plan:
Manual test.
(1) ./bin/yb-ctl create --rf 1 --tserver_flags ysql_catalog_preload_additional_tables=true
(2) before the fix

```
yugabyte=# select used_bytes,total_bytes from pg_get_backend_memory_contexts() where name like '%local cache';
 used_bytes | total_bytes
------------+-------------
       6616 |        8192
       5576 |        8192
(2 rows)

```
(3) after the fix

```
yugabyte=# select used_bytes,total_bytes from pg_get_backend_memory_contexts() where name like '%local cache';
 used_bytes | total_bytes
------------+-------------
(0 rows)

```

Reviewers: kfranz

Reviewed By: kfranz

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D34845
… to GB, Add Validation In progress message

Summary:
**[PLAT-13596]**

Convert core File size unit from bytes to GB for User friendliness.

**[PLAT-13904]**

Show Validation In progress message while provider form is validating from backend
Added a new component **SubmitInProgress** for Provider form

Test Plan: Tested manually

Reviewers: kkannan, jmak

Reviewed By: kkannan

Subscribers: ui, yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D34902
Summary:
YbDdlRollbackEnabled is defined as `static bool inline`, which causes an `old-style-declaration` gcc error.
Switch it to `static inline bool`.

-Wold-style-declaration (C and Objective-C only)
Warn for obsolescent usages, according to the C Standard, in a declaration. For example, warn if storage-class specifiers like static are not the first things in a declaration. This warning is also enabled by -Wextra.

Fixes yugabyte#22334
Jira: DB-11239
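
A tiny illustration of the declaration-order issue, using hypothetical function names rather than the actual code:

```
/* Flagged by gcc's -Wold-style-declaration (enabled by -Wextra for C):
 * 'inline' is not at the beginning of the declaration. */
static bool inline yb_flag_enabled_before(void) { return true; }

/* Specifier order fixed; no warning. */
static inline bool yb_flag_enabled_after(void) { return true; }
```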

Test Plan: Jenkins

Reviewers: fizaa, jason

Reviewed By: fizaa, jason

Subscribers: jason, yql, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D34918
Summary:
Remove the option to create an xCluster stream via the cdc_service. This was just forwarding the call to master leader, and used only in TEST code.
Creation of xCluster streams requires custom options to be set on them. This code was scattered across multiple places, and there used to be several client function calls. Unified all of these into one `XClusterClient::CreateXClusterStream<Async>`.
xCluster streams use a static set of options which does not change, so now `XClusterSourceManager` sets these to the correct values. The client code still sends the same options to make sure we work with older clusters. The client code can be cleaned up in the future.

Fixes yugabyte#22343
Jira: DB-11249

Test Plan: Jenkins

Reviewers: jhe, slingam, xCluster

Reviewed By: jhe

Subscribers: stiwary, skumar, ycdcxcluster, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D34138
…clauses requiring non-batched outerrels

Summary:
The affected lines of this change go through the join clauses to see if they would be pushed down to a batched index scan and conflict with any batched outerrels of this scan. Before this change, the logic that determined whether or not a join clause would be pushed down to it was insufficient. This change fixes that gap.

Fixes bug introduced by: 59191f1//D32986
Needs backports on 2024.1, 2.20, 2.18
Jira: DB-10781

Test Plan: ./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressJoin'

Reviewers: mtakahara

Reviewed By: mtakahara

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D34888
…erialized txn in reorder buffer

Summary:
While streaming changes as part of the logical replication protocol, the reorderbuffer serializes large transactions to disk to avoid using excessive memory.

Apart from the datum and isnull values used to describe a heap tuple, we also have special handling for intentionally omitted values due to the CHANGE replica
identity. This array was not getting serialized as part of the serialization of the heap tuples, leading to the values being lost after restoring from the disk.

This revision updates the serialization and de-serialization logic to handle the yb_is_omitted array as well.
Jira: DB-10863

Test Plan:
Jenkins: test regex: .*ReplicationSlot.*

./yb_build.sh --java-test 'org.yb.pgsql.TestPgReplicationSlot'

Reviewers: asrinivasan

Reviewed By: asrinivasan

Subscribers: yql, ycdcxcluster

Differential Revision: https://phorge.dev.yugabyte.com/D34851
…ate table

Summary:
This diff adds support to delete the cdc_state table entry for the slot, when the stream corresponding to the slot is deleted.

The master background task sets the checkpoint to max for all the state table entries corresponding to the stream that is being deleted. The slot row entry is no exception to this. We use this max checkpoint information to find out the slot entries that need to be deleted.
The slot entry is used for setting of `cdc_sdk_safe_time` for the other entries for that stream, hence while deleting we need to ensure that the slot entry is deleted only after all the entries with valid tablet_id for that stream are deleted. We accomplish this by deleting the entries with valid tablet_id first and then deleting slot entry in the next pass of `UpdatePeersAndMetrics`.
Jira: DB-10134

Test Plan:
Jenkins: .*CDCSDK.*
./yb_build.sh --cxx-test integration-tests_cdcsdk_consumption_consistent_changes-test --gtest_filter CDCSDKConsumptionConsistentChangesTest.TestSlotRowDeletionWithSingleStream

./yb_build.sh --cxx-test integration-tests_cdcsdk_consumption_consistent_changes-test --gtest_filter CDCSDKConsumptionConsistentChangesTest.TestSlotRowDeletionWithMultipleStreams

Reviewers: asrinivasan, skumar, stiwary, siddharth.shah

Reviewed By: asrinivasan

Subscribers: ycdcxcluster

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D34828
Summary:
The custom CA trust store runtime config has been enabled since 2.18.
Doing the cleanup to remove it and enabling the feature by default.

Test Plan: Manually verified

Reviewers: amalyshev, rmadhavan, kkannan

Reviewed By: amalyshev, rmadhavan

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D34765
@samiahmedsiddiqui samiahmedsiddiqui changed the title [Docs] Upgrade Go, Hugo, Docsy and other dependencies [Docs] Upgrade Go, Hugo, Docsy and Node dependencies May 13, 2024
@samiahmedsiddiqui samiahmedsiddiqui marked this pull request as ready for review May 13, 2024 09:23