[doc][yba] 2024.1 YBA CLI landing page #22209

Open
wants to merge 196 commits into master

Conversation

ddhodge
Contributor

@ddhodge ddhodge commented Apr 30, 2024

YBA CLI landing page
DOC-330

@netlify /preview/yugabyte-platform/anywhere-automation/

rajmaddy89 and others added 30 commits April 17, 2024 17:52
…p string

Summary:
When we first look at the input config, the problem with the current logic is that it finds the first instance of the string directly; instead, we need to match via regex with whitespace characters.

For example:
Validation FAILS:
host all +ldap_service_users 10.0.0.0/8 ldap  ldapurl=ldaps://ldap.dev.schwab.com:636 ldapsearchattribute=""sAMAccountName"" ldapbasedn=""OU=ServiceAccount,DC=csdev,DC=corp""
ldapbinddn=""CN=svc.yb_ldap_dev,OU=ServiceAccount,DC=csdev,DC=corp"" ldapbindpasswd=""Password""

The above fails because the first instance of `ldap` found is inside `+ldap_service_users`, which is wrong; the correct first instance is the whole word delimited by whitespace characters.

Validation SUCCEEDS:
host all +asldwfhhasg 10.0.0.0/8 ldap  ldapurl=ldaps://ldap.dev.schwab.com:636 ldapsearchattribute=""sAMAccountName"" ldapbasedn=""OU=ServiceAccount,DC=csdev,DC=corp"" ldapbinddn=""CN=svc.yb_ldap_dev,OU=ServiceAccount,DC=csdev,DC=corp""
ldapbindpasswd=""Password""
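
For illustration, here is a minimal standalone sketch of the whitespace-delimited matching idea (hypothetical code, not the actual YBA validation):

```
#include <iostream>
#include <regex>
#include <string>

int main() {
  // Abbreviated form of the failing hba line from the summary above.
  const std::string line =
      "host all +ldap_service_users 10.0.0.0/8 ldap ldapurl=ldaps://example.com:636";

  // Naive substring search finds "ldap" inside "+ldap_service_users" -- wrong.
  std::cout << "substring match at index " << line.find("ldap") << "\n";

  // Whitespace-delimited regex finds only the standalone auth-method token.
  const std::regex ldap_token(R"((^|\s)(ldap)(\s|$))");
  std::smatch m;
  if (std::regex_search(line, m, ldap_token)) {
    std::cout << "token match at index " << m.position(2) << "\n";
  }
}
```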

Test Plan:
Please refer to the screenshots
{F170693}

{F170694}

Reviewers: jmak

Reviewed By: jmak

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D34230
…ount

Summary:
In this issue we see a tserver crash whose root cause is that
yb-master's catalog version went backwards. The tserver had already seen the
version pair (320, 242) for DB 16429, but later yb-master's catalog version
pair for DB 16429 became (319, 242). Timing-wise it coincided with a PITR
restore operation, and we see the following log repeatedly after the PITR restore operation:

```
E0401 18:14:53.567183 31982 tablet_server.cc:917] Ignoring ysql db 16429 catalog
version update: new version too old. New: 319, Old: 320
```

It is unexpected for catalog version to go backwards on a PITR restore operation.

This lasted for more than 20 minutes until a ysql_dump-based restore operation
happened. As part of the restore operation we were running a global-impact DDL
statement:

```
\if :use_roles
    ALTER DATABASE "postgres_88" OWNER TO postgres;
ALTER DATABASE
\endif
```

This DDL will increment catalog versions for all databases, including 16429. Now
the new pair for 16429 becomes (320, 320). Note that the second number is the
breaking version. When it changes it is always equal to the first number. It is
this new pair (320, 320) that caused the tserver to crash: the tserver had seen the
breaking version associated with current version 320 as 242, and it is not possible
for the breaking version to change while the current version stays the same.

This bug has so far only appeared once in about 80+ runs of the integration
test `test_cross_db_concurrent_ddls`. The root cause is that the master
version went backwards. However, we only have a LOG(DFATAL) when that happens.
If it happens in a production environment running a release build,
this can simply go undetected for a long time and eventually crash when a new
breaking DDL statement bumps up the current catalog version and breaking
version at the same time.

I added a new gflag --ysql_min_new_version_ignored_count: if we see a stale
catalog version returned from master this many consecutive times, we crash the
tserver early so it syncs up with master again. This helps to
* reproduce the bug more easily in our integration test (which now runs with --log_ysql_catalog_versions=true)
* avoid running a tserver for too long while its catalog version is out
  of sync with master.

This diff is only limited to per-database catalog version mode because we do not
want to change the behavior in global catalog version mode at this time.
Jira: DB-10651
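
As a rough sketch of the new counting logic (names and the default threshold are assumptions; the real check lives in the tserver heartbeat path):

```
#include <cstdint>
#include <iostream>

// Illustrative stand-in for the check guarded by
// --ysql_min_new_version_ignored_count; not the actual tserver code.
struct CatalogVersionTracker {
  uint64_t current_version = 0;
  int ignored_count = 0;
  int min_new_version_ignored_count = 10;  // assumed gflag value

  // Returns true when the tserver should fail fast and resync with master.
  bool OnMasterVersion(uint64_t new_version) {
    if (new_version >= current_version) {
      current_version = new_version;
      ignored_count = 0;
      return false;
    }
    // Stale version from master: tolerate a few, then give up early.
    return ++ignored_count >= min_new_version_ignored_count;
  }
};

int main() {
  CatalogVersionTracker t;
  t.OnMasterVersion(320);
  for (int i = 0; i < 12; ++i) {
    if (t.OnMasterVersion(319)) {
      std::cout << "would FATAL after " << t.ignored_count << " stale updates\n";
      break;
    }
  }
}
```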

Test Plan:
(1) ./yb_build.sh release --cxx-test pg_catalog_version-test

(2) Manual test
* # create a RF-3 local cluster

  ./bin/yb-ctl create --rf 3

* ./bin/ysqlsh -c "create table foo(id int)"

* # run a DDL that increments DB yugabyte's current_version to 2

  ./bin/ysqlsh -c "alter table foo add column v1 text"

* # verify database yugabyte's current_version is 2

  ./bin/ysqlsh -c "select * from pg_yb_catalog_version"

* # manually force DB yugabyte's version to go back to 1

  ./bin/ysqlsh -c "set yb_non_ddl_txn_for_sys_tables_allowed=1; update pg_yb_catalog_version set current_version = 1"

* Wait and see all 3 tservers crash with the expected log messages:

```
F0415 23:51:02.940397 29969 tablet_server.cc:924] Ignoring ysql db 13248 catalog version update: new
version too old. New: 1, Old: 2, ignored count: 19
F0415 23:51:05.871065 30011 tablet_server.cc:924] Ignoring ysql db 13248 catalog version update: new
version too old. New: 1, Old: 2, ignored count: 31
F0415 23:51:17.931600 29927 tablet_server.cc:924] Ignoring ysql db 13248 catalog version update: new
version too old. New: 1, Old: 2, ignored count: 48
```

Reviewers: jason

Reviewed By: jason

Subscribers: ybase, yql

Differential Revision: https://phorge.dev.yugabyte.com/D34146
… SST files only retained for CDC"

Summary:
D33131 introduced a segmentation fault that was identified in multiple tests.
```
* thread #1, name = 'yb-tserver', stop reason = signal SIGSEGV
  * frame #0: 0x00007f4d2b6f3a84 libpthread.so.0`__pthread_mutex_lock + 4
    frame #1: 0x000055d6d1e1190b yb-tserver`yb::tablet::MvccManager::SafeTimeForFollower(yb::HybridTime, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>>) const [inlined] std::__1::unique_lock<std::__1::mutex>::unique_lock[abi:v170002](this=0x00007f4ccb6feaa0, __m=0x0000000000000110) at unique_lock.h:41:11
    frame #2: 0x000055d6d1e118f5 yb-tserver`yb::tablet::MvccManager::SafeTimeForFollower(this=0x00000000000000f0, min_allowed=<unavailable>, deadline=yb::CoarseTimePoint @ 0x00007f4ccb6feb08) const at mvcc.cc:500:32
    frame #3: 0x000055d6d1ef58e3 yb-tserver`yb::tablet::TransactionParticipant::Impl::ProcessRemoveQueueUnlocked(this=0x000037e27d26fb00, min_running_notifier=0x00007f4ccb6fef28) at transaction_participant.cc:1537:45
    frame #4: 0x000055d6d1efc11a yb-tserver`yb::tablet::TransactionParticipant::Impl::EnqueueRemoveUnlocked(this=0x000037e27d26fb00, id=<unavailable>, reason=<unavailable>, min_running_notifier=0x00007f4ccb6fef28, expected_deadlock_status=<unavailable>) at transaction_participant.cc:1516:5
    frame #5: 0x000055d6d1e3afbe yb-tserver`yb::tablet::RunningTransaction::DoStatusReceived(this=0x000037e2679b5218, status_tablet="d5922c26c9704f298d6812aff8f615f6", status=<unavailable>, response=<unavailable>, serial_no=56986, shared_self=std::__1::shared_ptr<yb::tablet::RunningTransaction>::element_type @ 0x000037e2679b5218) at running_transaction.cc:424:16
    frame #6: 0x000055d6d0d7db5f yb-tserver`yb::client::(anonymous namespace)::TransactionRpcBase::Finished(this=0x000037e29c80b420, status=<unavailable>) at transaction_rpc.cc:67:7
```
This diff reverts the change to unblock the tests.

The proper fix for this problem is WIP.
Jira: DB-10780, DB-10466

Test Plan: Jenkins: urgent

Reviewers: rthallam

Reviewed By: rthallam

Subscribers: ybase, yql

Differential Revision: https://phorge.dev.yugabyte.com/D34245
* Sandbox support for Innovation track

* typo
…from multiple tablets of a tserver leading to undetected deadlocks

Summary:
The local waiting txn registry at the Tablet Server maintains the wait-for dependencies arising at all tablet leaders hosted on the node. When a request is processed at the wait-queue, the local registry sends a partial update containing just the wait-for dependencies of that request. The registry keeps accumulating all dependencies, and periodically sends full update requests comprising the wait-for dependencies of all outstanding requests.

`UpdateTransactionWaitingForStatusRequestPB` proto is used for sending the wait-for dependency info. Each `WaitingTransaction` in `UpdateTransactionWaitingForStatusRequestPB` has a `wait_start_time` which is populated at the local registry and is set to `clock_->Now()`.

The deadlock detector maintains a container `waiters_` indexed by the key pair <txn_id, tserver_uuid>. In the existing implementation, the detector overwrites the wait-for dependencies of a waiter when it encounters a `WaitingTransaction` with a later timestamp than the existing one. Since multiple requests of the same txn (at the same tablet or at multiple tablets) with different blockers can exist at a given time, this led to incomplete wait-for info at the detector, thus resulting in undetected deadlocks.

This diff addresses the issue by changing the logic at the detector to keep track of all dependencies of the waiter and not overwrite it based on start time. Each time the detector sees a `WaitingTransaction`, it triggers probes for just the new wait-for dependencies contained in the message and appends the blocker info to the existing waiter record (if any). Additionally, we propagate the request id info of the waiter to the deadlock detector, and store a list of request ids blocked on the `{blocker_id, subtxn, status_tablet}` tuple.

Note that the diff doesn't change the periodic deadlock probing algorithm at the detector.
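
A minimal sketch of the append-instead-of-overwrite bookkeeping (types heavily simplified; the real detector keys waiters by <txn_id, tserver_uuid> and stores subtxn and request-id info):

```
#include <iostream>
#include <map>
#include <set>
#include <string>
#include <utility>

using WaiterKey = std::pair<std::string, std::string>;  // <txn_id, tserver_uuid>
using BlockerId = std::string;                          // blocker txn id

std::map<WaiterKey, std::set<BlockerId>> waiters;

// Append new blockers to the waiter record instead of overwriting it, and
// return only the genuinely new edges so probes are triggered just for those.
std::set<BlockerId> AddWaitFor(const WaiterKey& w, const std::set<BlockerId>& blockers) {
  auto& existing = waiters[w];
  std::set<BlockerId> fresh;
  for (const auto& b : blockers) {
    if (existing.insert(b).second) fresh.insert(b);
  }
  return fresh;
}

int main() {
  WaiterKey w{"txn-w", "ts-1"};
  AddWaitFor(w, {"b1"});  // edge w->b1 recorded
  AddWaitFor(w, {"b2"});  // edge w->b2 recorded; w->b1 is kept, not erased
  std::cout << waiters[w].size() << "\n";  // 2: both dependencies retained
}
```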

Sample vlogs from the detector for one of the below tests
```
[ts-1] I0416 05:31:20.734838 129157 deadlock_detector.cc:611] vlog4: T 9eed4e1f5b2745aeb8333e8fe0fd9c61 D c78320b1-e14c-44ba-b6fe-1518c4be25d2 Adding new wait-for relationship -- waiter txn id: 631c3948-6816-403c-890a-95560fc4615f blocker id: 6cb37588-3d67-4301-8336-cf944222cdab, status tablet: 9eed4e1f5b2745aeb8333e8fe0fd9c61, blocking subtxn info: [2, 2], waiting_requests (id, start_time): [{11, { days: 19829 time: 05:31:20.733994 }}] received from TS: 25db8deb4335401ab1261c01e8634ab8
[ts-1] I0416 05:31:20.735044 129157 deadlock_detector.cc:629] vlog4: T 9eed4e1f5b2745aeb8333e8fe0fd9c61 D c78320b1-e14c-44ba-b6fe-1518c4be25d2 Updated blocking data -- txn_id_: 631c3948-6816-403c-890a-95560fc4615f, tserver_uuid_: 25db8deb4335401ab1261c01e8634ab8, waiter_data_: [blocker id: b9356bf9-6bb6-4ef6-be4e-96dcfec46ddd, status tablet: a8643356f0e24c1a836b029f1bea0dfa, blocking subtxn info: [4, 4], waiting_requests (id, start_time): [{10, { days: 19829 time: 05:31:20.733203 }}], blocker id: 6cb37588-3d67-4301-8336-cf944222cdab, status tablet: 9eed4e1f5b2745aeb8333e8fe0fd9c61, blocking subtxn info: [2, 2], waiting_requests (id, start_time): [{11, { days: 19829 time: 05:31:20.733994 }}, {9, { days: 19829 time: 05:31:20.727855 }}]]
[ts-1] I0416 05:31:20.735205 129157 deadlock_detector.cc:244] vlog4: T 9eed4e1f5b2745aeb8333e8fe0fd9c61 D c78320b1-e14c-44ba-b6fe-1518c4be25d2 - probe(c78320b1-e14c-44ba-b6fe-1518c4be25d2, 1) AddBlocker: waiting_txn_id: 631c3948-6816-403c-890a-95560fc4615f, blocker id: b9356bf9-6bb6-4ef6-be4e-96dcfec46ddd, status tablet: a8643356f0e24c1a836b029f1bea0dfa, blocking subtxn info: [4, 4], waiting_requests (id, start_time): [{10, { days: 19829 time: 05:31:20.733203 }}], probe_num: 1, min_probe_num: 0
[ts-1] I0416 05:31:20.735318 129157 deadlock_detector.cc:244] vlog4: T 9eed4e1f5b2745aeb8333e8fe0fd9c61 D c78320b1-e14c-44ba-b6fe-1518c4be25d2 - probe(c78320b1-e14c-44ba-b6fe-1518c4be25d2, 1) AddBlocker: waiting_txn_id: 631c3948-6816-403c-890a-95560fc4615f, blocker id: 6cb37588-3d67-4301-8336-cf944222cdab, status tablet: 9eed4e1f5b2745aeb8333e8fe0fd9c61, blocking subtxn info: [2, 2], waiting_requests (id, start_time): [{11, { days: 19829 time: 05:31:20.733994 }}, {9, { days: 19829 time: 05:31:20.727855 }}], probe_num: 1, min_probe_num: 0
```

Test Plan:
Jenkins

./yb_build.sh --cxx-test pgwrapper_pg_wait_on_conflict-test --gtest_filter PgWaitQueueRF1Test.TestDeadlockAcrossMultipleTablets -n 20
./yb_build.sh --cxx-test pgwrapper_pg_wait_on_conflict-test --gtest_filter PgWaitQueueRF1Test.TestDetectorPreservesBlockerSubtxnInfo -n 20
./yb_build.sh --cxx-test='TEST_F(UnsignedIntSetTest, Hash) {'

Test 1 fails consistently prior to this diff: w waits on blockers b1 and b2; the test ensures that the detector doesn't erase w->b1 on seeing w->b2.
Test 2 ensures that the detector doesn't overwrite the blocking subtxn info of a given blocker: w waits on b1 (subtxn 2) and b1 (subtxn 3); the test asserts that the detector doesn't erase w->b1 (subtxn 2) on seeing w->b1 (subtxn 3).

Reviewers: rsami

Reviewed By: rsami

Subscribers: yql, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D33641
Summary:
Audit logs were not exported from a newly added read replica cluster. We were sending `AuditLogConfig` as null when calling the read replica API. Since the primary cluster already has the `AuditLogConfig`, we should use that instead when provisioning the new nodes.

The following flow didn't work before but does now:
```
Create a universe without RR.
Enable DB audit logging.
Add RR to this universe.
Verify --> Audit logs are not visible on DD
```

Test Plan:
Manually tested the following flow:
Tried the following scenarios:
Case 1:
```
Create a universe with primary cluster and RR.
Enable DB audit logs.
Verify. ---> Works as expected. Audit logs from primary cluster nodes and RR nodes are visible on DD.
```

Case 2:
```
Create a universe without RR.
Enable DB audit logging.
Add RR to this universe.
Verify --> Works as expected. Audit logs from primary cluster nodes and RR nodes are visible on DD.
```

Case 3 (Patched this diff on my diff: https://phorge.dev.yugabyte.com/D33949):
```
Create a universe without RR.
Enable DB audit logging.
Add RR to this universe.
Verify --> Works as expected. Audit logs from primary cluster nodes and RR nodes are visible on DD.
Add new node to universe.
Verify --> Works as expected. Audit logs from both primary cluster nodes and RR nodes are visible on DD.
```

Reviewers: amalyshev

Reviewed By: amalyshev

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D33995
Summary:
In case the AMI is not present for a region in the bundle, we were falling
back to retrieving it from the region.
This diff makes the following changes:
1. For YBA-managed bundles, we will read the AMIs from the YBA metadata in
case they are not present in the bundle.

2. For custom bundles, we will fail without any fallback mechanism.

Removes the dependency on region -> ybImage.

Test Plan:
Created a provider with custom bundle.
Removed the ybImage from the bundle.
Deployed the universe using the same. Verified that it failed.

Created a provider with YBA managed bundles.
Removed the ybImage from the bundle.
Deployed the universe. Verified that it picks up from the YBA's image metadata.

Reviewers: amalyshev, nbhatia

Reviewed By: amalyshev

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D34119
Summary:
Special characters like `$` get escaped when passwords are provided via flags. The help text
indicates using single quotes (`''`) so these values are parsed correctly.

Test Plan: Test create universe and ysql connection with single quotes

Reviewers: skurapati, rohita.payideti

Reviewed By: rohita.payideti

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D34264
Test Plan:
Start tserver with rpc bind and http addresses set differently and verify that the display shows this. Tested with --rpc_bind_addresses set to 0.0.0.0 and verified that the hostname shows up instead (this seems to depend on whether ybdb can discover a local hostname)

{F166067}

Reviewers: hsunder

Reviewed By: hsunder

Subscribers: hsunder, esheng, ybase, bogdan

Differential Revision: https://phorge.dev.yugabyte.com/D33726
Summary:
Created a new class to exclude base upgrade tasks from unit tests.

The general idea is to distinguish the checking of basic subtasks (like `WaitForServer` etc.) from logic that is specific to a particular upgrade.
In this approach, each upgrade test checks only the actual upgrade subtasks (like `InstanceActions`) and the nodes these actions are applied to.

There should also be a test that checks the basic sequence (but it is currently missing in this diff!).

The problem with the approach in master is that all upgrade tests check both things, and we have to modify all of the tests if we alter the basic logic.

Test Plan: sbt test

Reviewers: nsingh, sanketh

Reviewed By: nsingh

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D33032
…rver proxy if not using a distributed transaction

Summary:
Before this revision, every RollbackToSubTransaction operation in PG would lead to a corresponding RPC call to the local tserver. The local tserver used to
return early in case there was no distributed transaction.

This revision adds the logic in the PG layer (pg_session) to skip sending the RPC if the transaction is read-only or a fast-path transaction i.e., has NON_TRANSACTIONAL isolation level. Note that we were already doing that for transaction commit/aborts but weren't skipping the RPC for rollback of sub-transaction.

This change was proposed as part of the implementation of the PG compatible logical replication support. While streaming the changes to the Walsender, it starts and aborts transactions for every transaction that gets streamed. This is required for reading PG catalog tables. As a result, we were seeing a lot of unnecessary RPC calls to the local tserver.
Jira: DB-10402
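
A minimal sketch of the skip condition (names are assumptions; the actual gate sits in pg_session):

```
#include <iostream>

enum class IsolationLevel { NON_TRANSACTIONAL, SNAPSHOT, SERIALIZABLE };

// Only distributed transactions have sub-transaction state on the tserver,
// so read-only and fast-path transactions can skip the rollback RPC.
bool ShouldSendRollbackToSubTxnRpc(bool read_only, IsolationLevel isolation) {
  return !read_only && isolation != IsolationLevel::NON_TRANSACTIONAL;
}

int main() {
  std::cout << std::boolalpha
            << ShouldSendRollbackToSubTxnRpc(true, IsolationLevel::SNAPSHOT) << "\n"            // false
            << ShouldSendRollbackToSubTxnRpc(false, IsolationLevel::NON_TRANSACTIONAL) << "\n"  // false
            << ShouldSendRollbackToSubTxnRpc(false, IsolationLevel::SNAPSHOT) << "\n";          // true
}
```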

Test Plan: All tests

Reviewers: asrinivasan, pjain

Reviewed By: pjain

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D34162
Summary: This diff enables the OS Patching runtime flag by default.

Test Plan: iTest pipeline

Reviewers: amalyshev, nbhatia, #yba-api-review!

Reviewed By: amalyshev

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D34184
…record if needed

Summary:
When there's a `GetChanges` request (req_1) and service layer receives a `CacheMissError`, it refetches the enum labels and executes a new internal `GetChanges` request (req_2) for a fresh `GetChangesResponse`.

Now suppose this is the first `GetChanges` request from the connector, which still hasn't received the DDL record. After the service clears the response, it looks at the `cached_schema_details` object while making `req_2` to decide whether or not to publish the DDL record. But since we have already populated `cached_schema_details` while processing `req_1`, we do not populate the DDL record, so the client will not receive the DDL record in `GetChangesResponse`, causing it to fail while decoding further change events.

**Solution:**

This diff implements a simple solution by clearing the `cached_schema_details` while executing `req_2` if the connector/client has indicated that it needs the schema i.e. if `req->need_schema_info() == true`.
Jira: DB-9701
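
A minimal sketch of the fix, with illustrative types (the real logic lives in the CDC service layer):

```
#include <iostream>
#include <map>
#include <string>

struct GetChangesRequest { bool need_schema_info = false; };  // illustrative
using SchemaDetailsMap = std::map<std::string, std::string>;  // table_id -> schema

// Before re-issuing the internal request (req_2) after a CacheMissError:
// if the client asked for schema info, drop the cache populated by req_1
// so the retry re-emits the DDL record instead of assuming it was shipped.
void PrepareRetry(const GetChangesRequest& req, SchemaDetailsMap* cached_schema_details) {
  if (req.need_schema_info) {
    cached_schema_details->clear();
  }
}

int main() {
  SchemaDetailsMap cache{{"table-1", "schema-v1"}};
  GetChangesRequest req;
  req.need_schema_info = true;
  PrepareRetry(req, &cache);
  std::cout << cache.size() << "\n";  // 0: DDL record will be repopulated
}
```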

Test Plan:
```
./yb_build.sh --cxx-test cdcsdk_ysql-test --gtest_filter CDCSDKYsqlTest.TestPopulationOfDDLRecordUponCacheMiss
```

Reviewers: skumar, asrinivasan, stiwary

Reviewed By: skumar

Subscribers: ycdcxcluster

Differential Revision: https://phorge.dev.yugabyte.com/D34107
Summary:
This change addresses a bug introduced in diff D32566 that caused some tablet metrics to have the wrong metric_type. It also fixes a pre-existing issue with how metric attributes are stored.

**Root Cause:**
D32566 started storing table metrics in the aggregation map for grouping purposes. (Previously, table metrics were flushed directly if they were at the table level, as no aggregation was needed.) Because of this, it also started saving the attributes of these table metrics in an attributes map with their entity_id, which is the table_id, as the key.

However, there were two problems:
- Table-level attribute collision: Both table and tablet metrics used the table_id as the key for storing their attributes, leading to collisions and incorrect attributes when aggregating them at the table level.
- Potential pre-existing stream-level attribute collision: Even before D32566, using entity_id as the key wasn't ideal because some metrics like XClusterMetric and CdcsdkMetric have different attribute structures despite having the same entity_id (stream_id in this case).

**Fix:**
This change addresses both issues by storing metric attributes with a composite key consisting of:
- metric_type: Identifies the specific type of metric (e.g., XClusterMetric, CdcsdkMetric).
- entity_id: Identifies the entity the metric belongs to (e.g., table_id, stream_id).
This approach ensures unique keys for storing metric attributes and avoids collisions based solely on entity_id.
Jira: DB-10501
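
A minimal sketch of the composite-key idea (types illustrative):

```
#include <iostream>
#include <map>
#include <string>
#include <utility>

using MetricAttributes = std::map<std::string, std::string>;
// Composite key: <metric_type, entity_id> instead of entity_id alone.
using AttributeKey = std::pair<std::string, std::string>;

int main() {
  std::map<AttributeKey, MetricAttributes> attributes;
  // Same entity_id ("stream-1") but different metric types: no collision.
  attributes[{"XClusterMetric", "stream-1"}] = {{"producer_universe", "u1"}};
  attributes[{"CdcsdkMetric", "stream-1"}]   = {{"namespace", "db1"}};
  std::cout << attributes.size() << "\n";  // 2 distinct attribute records
}
```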

Test Plan:
Jenkins: urgent

To verify the fix addresses both issues, a `DCHECK` was added in `PrometheusWriter::AddAggregatedEntry`. This check compares the stored attribute map with the incoming attribute map. If there's a mismatch, it indicates a collision. This DCHECK effectively covers both scenarios:
- Table level Attribute Collision: Detected by `PrometheusMetricFilterTest.TestV1Default`
- Potential Pre-existing Stream level Attribute Collision: Detected by `MetricsTest.VerifyHelpAndTypeTags`

Reviewers: mlillibridge, rthallam

Reviewed By: mlillibridge

Subscribers: bogdan, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D33396
* partition by region node settings

* screen shot

* update CLI

* update CLI

* minor edit

* review comments

* cli help edits

* update screenshots
* remove redis/yedis references from docs

* remove old realworld apps

* Apply suggestions from code review

Co-authored-by: Dwight Hodge <79169168+ddhodge@users.noreply.github.com>

* fix broken links

* fix external links

* fix link

---------

Co-authored-by: Dwight Hodge <79169168+ddhodge@users.noreply.github.com>
Co-authored-by: Dwight Hodge <ghodge@yugabyte.com>
…tional replication setup

Summary:
This commit modifies the behavior when a user adds a YSQL table to a bidirectional replication. With these changes, the bootstrapping process is always skipped when adding a table to a bidirectional replication, regardless of whether it is required or not.

The detection of bidirectional replication operates at the database granularity. This means that when adding a table to a replication, the replication is considered bidirectional if any sibling table (i.e., other tables within the same database as the table being added) is already part of a bidirectional replication.

Note: because the bootstrapping is skipped, it will be the responsibility of the user to ensure the existing data are copied over.
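
For illustration, a standalone sketch of the database-granularity detection (YBA itself is Java; the names here are assumptions):

```
#include <iostream>
#include <map>
#include <set>
#include <string>

// The replication is treated as bidirectional for a new table if any sibling
// table in the same database is already part of a bidirectional replication.
bool IsBidirectionalForTable(
    const std::string& table_id,
    const std::map<std::string, std::string>& table_to_database,
    const std::set<std::string>& bidirectional_tables) {
  const std::string& db = table_to_database.at(table_id);
  for (const auto& sibling : bidirectional_tables) {
    auto it = table_to_database.find(sibling);
    if (it != table_to_database.end() && it->second == db) {
      return true;  // sibling in the same DB -> skip bootstrapping
    }
  }
  return false;
}

int main() {
  const std::map<std::string, std::string> db_of = {
      {"t1", "db1"}, {"t2", "db1"}, {"t3", "db2"}};
  const std::set<std::string> bidi = {"t2"};
  std::cout << IsBidirectionalForTable("t1", db_of, bidi) << "\n";  // 1
  std::cout << IsBidirectionalForTable("t3", db_of, bidi) << "\n";  // 0
}
```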

Test Plan:
 - Made sure the user is able to add tables to a bidirectional replication, whether or not bootstrapping is required.
- Made sure for unidirectional replication, it does bootstrapping if required (previous behavior).

Reviewers: #yba-api-review, cwang, jmak, sanketh

Reviewed By: #yba-api-review, sanketh

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D34228
…smatch

Summary:
With changes to the send-backup code, sending backups to followers on a lower YBA version is now an error, which meant we weren't syncing the config. As part of config sync we generate version mismatch events. If we sync the config to all instances, we both update the config correctly (so the standby shows the correct state even when it is on a lower version) and the active correctly fires an alert to upgrade the standby.

Also moves the backup update code to when we send the backup, and simplifies the code.

Test Plan: Setup HA, upgrade standby, then promote. Ensure that alert fires and config looks correct on standby.

Reviewers: dshubin, sanketh

Reviewed By: sanketh

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D34125
…elCache failure

Summary:
`CREATE PUBLICATION FOR ALL TABLES` invalidates the entire relcache (via CacheInvalidateRelcacheAll) and hence is a global-impact DDL. This test fails as a result.

This diff fixes the test failure by checking the PG backend version via `SELECT version()` and adjusting the test result depending on whether it is PG11 or PG15.
Jira: DB-10952

Test Plan:
In both master (PG11) and PG15 branches, apply the diff patch and run:

./yb_build.sh release --cxx-test pg_catalog_version-test --gtest_filter PgCatalogVersionTest.InvalidateWholeRelCache

Reviewers: aagrawal

Reviewed By: aagrawal

Subscribers: jason, yql

Differential Revision: https://phorge.dev.yugabyte.com/D34262
…@yugabyte-ui-common-component library

Summary:
Handle ASH as a special case, as it has OUTLIER-style buttons in OVERALL mode.
Ensure the graph API call is made when WAIT EVENT, WAIT EVENT CLASS, or WAIT EVENT COMPONENT is selected.

Test Plan: Tested locally via TS Web UI

Reviewers: amalyshev, cdavid

Reviewed By: cdavid

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D34243
Summary:
We have a timezone-related bug in the ASH retrieval code, which breaks retrieval.
The old code could also skip some sample events.
Both are fixed now.
It also synchronizes universe details retrieval with the universe metadata update.

Test Plan: Unit tested + tested ASH retrieval manually

Reviewers: rmadhavan, cdavid

Reviewed By: rmadhavan, cdavid

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D34236
…tIndex

Summary:
Use std::upper_bound instead of std::lower_bound, which allows finding
the answer in one statement.

Move the sanity check forward so error messages are more accurate.
Jira: DB-10789
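
For illustration, a standalone sketch of why upper_bound answers the lookup in one statement (data and context are hypothetical):

```
#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>

int main() {
  // Sorted start offsets of consecutive segments.
  const std::vector<int> starts = {0, 10, 25, 40};
  const int index = 25;

  // upper_bound yields the first start strictly greater than index, so the
  // containing segment is the one just before it. With lower_bound, an extra
  // equality check would be needed when index lands exactly on a boundary.
  const auto it = std::upper_bound(starts.begin(), starts.end(), index);
  const auto segment = std::distance(starts.begin(), it) - 1;
  std::cout << "index " << index << " falls in segment " << segment << "\n";  // 2
}
```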

Test Plan: ./yb_build.sh --cxx-test client_client-test

Reviewers: arybochkin, dmitry, mlillibridge, timur

Reviewed By: arybochkin

Subscribers: ybase, yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D33935
…eys when yb.cloud.enabled is true

Summary: This is a workaround to let itests pass. The way we read whether YBM is enabled is messy. Another pass to make it uniform can be done later; it may require validating that all the paths work.

Test Plan: Itest should pass.

Reviewers: cwang, yshchetinin, sanketh, kvikraman

Reviewed By: yshchetinin

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D34257
… log anchor session

Summary:
When RBS is done from a non-leader peer, the rbs source creates a session id of the form `<requestor_uuid>-<tablet_id>-<MonoTime::Now()>`. It sends back the same identifier to the destination node, and the rbs destination node uses this for subsequent calls. The same id is used for propagating the log anchor information to the leader peer, i.e., we use this session id in `RegisterLogAnchorRequestPB`.

While creating a session for anchoring the log on the leader, the logic was similar to the following:
```
auto tablet_peer_result = tablet_peer_lookup_->GetServingTablet(req->tablet_id());
...
auto it = log_anchors_map_.find(req->owner_info());
if (it == log_anchors_map_.end()) {
  ...
} else {
  tablet_peer.reset(it->second->tablet_peer_.get());         // <- this line creates a problem
}
```
When re-using the session, `tablet_peer.reset` takes ownership of the underlying managed `TabletPeer` object and fails to consider the existing shared_ptrs. So once it goes out of scope (in `RemoteBootstrapServiceImpl::RegisterLogAnchor`), `~TabletPeer()` is called on the underlying object, which leads to the fatal below.
```
../../src/yb/tablet/maintenance_manager.cc:101] Check failed: !manager_.get() You must unregister the LogGCOp(7b72199ed9714f649d10be762212950d) Op before destroying it.
```

This diff addresses the issue by using the `=` operator, which rightly tracks the existing shared_ptrs as well and doesn't destruct the underlying object once `tablet_peer` goes out of scope.

Note: To repro this in a test, `RemoteBootstrapServiceImpl::RegisterLogAnchor` should be called with the same `owner_info` set in `RegisterLogAnchorRequestPB`. But that is only possible when the rbs source re-uses its rbs session, whose session id is computed with a suffix of `MonoTime::Now()`. So we weren't able to simulate the above crash in a test, but it was observed in the logs reported in the community forum.
Jira: DB-10926
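
For illustration, a standalone sketch of the reset-vs-assignment difference (the buggy line is left commented out, since executing it would double-free):

```
#include <iostream>
#include <memory>

struct TabletPeer {
  ~TabletPeer() { std::cout << "~TabletPeer()\n"; }
};

int main() {
  auto owner = std::make_shared<TabletPeer>();  // e.g. held via log_anchors_map_

  {
    std::shared_ptr<TabletPeer> tablet_peer;
    // BUG: reset(raw_ptr) creates a brand-new control block, so this scope
    // believes it solely owns the object and ~TabletPeer() runs at scope exit
    // even though `owner` still points at it:
    //   tablet_peer.reset(owner.get());

    // FIX: copy assignment shares the existing control block; the object
    // stays alive because `owner` still holds a reference.
    tablet_peer = owner;
  }

  std::cout << "owner still valid: " << (owner != nullptr) << "\n";  // 1
}
```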

Test Plan: Jenkins

Reviewers: amitanand

Reviewed By: amitanand

Subscribers: ybase

Differential Revision: https://phorge.dev.yugabyte.com/D34205
Summary:
Show the release date as an empty string in release details.
For customers the release date will in general never be empty; it will always have a date. For internal usage, however, the release date for most dev builds will be empty.

Test Plan:
Please refer to the screenshot
{F171445}

Reviewers: jmak, dshubin

Reviewed By: dshubin

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D34291
…eUniverseReplicationRequestPB

Summary:
For xCluster DR we want to be able to failover very quickly. DeleteUniverseReplication cleans up the streams on the source, which will time out since the source is unavailable during failover.
When `skip_producer_stream_deletion` is set on `DeleteUniverseReplicationRequestPB` we will skip the cleanup process.

**Upgrade/Rollback safety:**
New field is optional and false by default, so safe for upgrade and rollbacks.

Fixes yugabyte#22050
Jira: DB-10965

Test Plan: XClusterTest.DeleteWithoutStreamCleanup

Reviewers: slingam, jhe, xCluster

Reviewed By: slingam

Subscribers: ybase, xCluster

Differential Revision: https://phorge.dev.yugabyte.com/D34286
Summary:
All connections that `postgres_fdw` establishes to foreign servers are kept open in the local session for re-use.
With option `use_remote_estimate true` specified during a foreign table's creation, when PG estimates the cost of the foreign table, it executes a SQL statement remotely using the existing open connection to the foreign server where the foreign table resides.
With changes made in commit 9a27aff, open PG connections need to refresh catalog cache because ANALYZE increments catalog version.
Thus, the plan in test `TestPgRegressContribPostgresFdw` changed based on cost because open connections use up-to-date statistics instead of stable statistics.
Jira: DB-10738

Test Plan: ./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressContribPostgresFdw'

Reviewers: tverona, myang

Reviewed By: myang

Subscribers: jason, yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D34071
…d tservers do not have any tablets assigned to them

Summary: Verify that tablet count is zero on the blacklisted nodes after wait for data move. Also, made a fix to host port comparison for YBM dual NIC.

Test Plan:
1. Create a universe.
2. Run full move (2 times for previously blacklisted nodes not in the universe).
3. Verified the log messages.

```
2024-04-15T20:51:17.743Z  [debug] 3525d0d2-3c25-4cb7-b717-35f2f2dbeb6a UniverseTaskBase.java:2207 [TaskPool-EditUniverse(f6314ef5-7324-4671-a335-eea42bc4f758)-3] com.yugabyte.yw.commissioner.tasks.UniverseTaskBase Making url request to endpoint: http://10.9.120.245:7000/dump-entities
2024-04-15T20:51:18.891Z  [info]  AsyncYBClient.java:2758 [yb-nio-1] org.yb.client.AsyncYBClient Discovered tablet YB Master for table YB Master with partition ["", "")
2024-04-15T20:51:18.940Z  [debug] 3525d0d2-3c25-4cb7-b717-35f2f2dbeb6a UniverseTaskBase.java:2271 [TaskPool-EditUniverse(f6314ef5-7324-4671-a335-eea42bc4f758)-3] com.yugabyte.yw.commissioner.tasks.UniverseTaskBase Number of tablets on tserver yb-admin-nsingh-test-universe1-n1 is 0 tablets
2024-04-15T20:51:18.940Z  [debug] 3525d0d2-3c25-4cb7-b717-35f2f2dbeb6a UniverseTaskBase.java:2271 [TaskPool-EditUniverse(f6314ef5-7324-4671-a335-eea42bc4f758)-3] com.yugabyte.yw.commissioner.tasks.UniverseTaskBase Number of tablets on tserver yb-admin-nsingh-test-universe1-n2 is 0 tablets
2024-04-15T20:51:18.940Z  [debug] 3525d0d2-3c25-4cb7-b717-35f2f2dbeb6a UniverseTaskBase.java:2271 [TaskPool-EditUniverse(f6314ef5-7324-4671-a335-eea42bc4f758)-3] com.yugabyte.yw.commissioner.tasks.UniverseTaskBase Number of tablets on tserver yb-admin-nsingh-test-universe1-n3 is 0 tablets
2024-04-15T20:51:18.940Z  [debug] 3525d0d2-3c25-4cb7-b717-35f2f2dbeb6a UniverseTaskBase.java:2271 [TaskPool-EditUniverse(f6314ef5-7324-4671-a335-eea42bc4f758)-3] com.yugabyte.yw.commissioner.tasks.UniverseTaskBase Number of tablets on tserver yb-admin-nsingh-test-universe1-n4 is 0 tablets

```

Also tested with on-prem.
1. Create an onprem universe.
2. Run full move. The old nodes are DEAD but not blacklisted.
3. Run the yb-admin command to blacklist the old nodes and confirm from the master leader UI that the DEAD node is blacklisted.
4. Run full move again. It completed successfully.

Reviewers: cwang, sanketh, yshchetinin

Reviewed By: cwang, yshchetinin

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D34140
…view Flag

Summary:
Convert enable_xcluster_api_v2 to a Preview flag
Jira: DB-10928

Test Plan: Jenkins

Reviewers: slingam, xCluster

Reviewed By: slingam

Subscribers: ybase

Differential Revision: https://phorge.dev.yugabyte.com/D34295
…y test with Connection Manager enabled

Summary:
In the test `org.yb.pgsql.TestYbPgStatActivity.testMemUsageOfQueryFromPgStatActivity`, we check the RSS memory consumed in a session before and after doing certain operations, with the help of a second connection.
With YSQL Connection Manager enabled, both connections would use the same physical connection, defeating the purpose of running this test. This patch skips the test whenever Connection Manager is enabled at the time of running it.
Jira: DB-10907

Test Plan:
Ensure the test below is skipped when executed:
```./yb_build.sh --enable-ysql-conn-mgr-test --java-test org.yb.pgsql.TestYbPgStatActivity#testMemUsageOfQueryFromPgStatActivity```

Reviewers: rbarigidad

Reviewed By: rbarigidad

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D34096
vipul-yb and others added 12 commits April 30, 2024 04:34
…r/dr

Summary:
Added support to remove a dropped index table/table from xcluster/dr.
As part of this change, we avoid fetching index tables for already dropped tables, and customers will have to remove tables and index tables separately.
The newer DB version does not require any additional ignore-error flags, but since the old DB version requires them, we pass the ignore-error flag while removing dropped tableIDs from replication for both the old and newer versions.

Test Plan:
 - Create universe and setup xcluster/dr having tables and indexes
 - drop tables and indexes from source
 - remove tables and indexes from xcluster/dr
 - verify that the tables do not exist anymore in the xcluster_config available in the master UI and the YBA DB.
 - verified on the old YBDB version, where the DB errors out while altering replication, and on the newer DB version, which ignores the errors without any flag.

Additional test case:

 - Removed tables from 2 databases in a single edit-replication request: removed 1 table from DB_1 that is not dropped, and removed 2 tables from DB_2, where one is dropped and the other is not.

Reviewers: hzare, cwang, sanketh, #yba-api-review

Reviewed By: hzare, sanketh, #yba-api-review

Subscribers: jmak, hsunder, yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D34407
Summary: Increase the limit for az names from 25 to 100 characters.

Test Plan:
Verified the migration works fine. Monitoring UTs and itests for any failures.
The ticket mentions the ap-southeast-1-sggov-sin-1a zone, but since this is a private region we won't have
access to it for testing.

Reviewers: #yba-api-review, sneelakantan

Reviewed By: #yba-api-review, sneelakantan

Subscribers: sanketh, yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D34465
…er attributes as well

Summary:
Currently during our YBA ↔︎ YBDB LDAP sync, we assume that the user name we want to sync will be present on the DN. While that is true for most scenarios, some customers may want to sync a user present on a different attribute, for example `sAMAccountName`.

This diff performs the sync based on the attribute the user specified in the payload. If the specified `ldapUserfield` is not present in the `DN`, the user name will be retrieved from this attribute on the LDAP server. If the attribute is also not found, the user is simply skipped from the sync.

Test Plan:
Manual testing
  - Triggered the sync with the ldapUserfield and observed the sync where the user name is retrieved from the dn
  - Triggered the sync with the ldapUserfield [not present on the DN] and synced only the users that have this attribute set on the LDAP server

Reviewers: #yba-api-review!, svarshney

Reviewed By: svarshney

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D34518
…l Flag value

Summary:
In this diff, support has been added to allow changing the value of the publication refresh interval via the flag `cdcsdk_publication_list_refresh_interval_secs`. In order to protect LSN determinism, the values of publication refresh times will be persisted until a suitable acknowledgement is received. For this purpose a new key called `pub_refresh_times` has been added to the data map in cdc_state. It will contain comma-separated values of the publication refresh times which have been popped from the priority queue but not yet acknowledged.

In GetConsistentChanges, whenever the tablet queue for publication_refresh_records becomes empty, the new entry added to the tablet queue will also be added to the `pub_refresh_times` list and persisted in the state table. This ensures that before shipping any LSN with a commit time greater than the last_pub_refresh_time, we persist the next pub_refresh_time. When an acknowledgement reaches the virtual WAL, in `UpdateAndPersistLSN()`, the `pub_refresh_times` list is trimmed so that it contains only those values which are strictly greater than the acknowledged publication refresh time. The field `last_pub_refresh_time` in the state table slot entry holds the latest acknowledged publication refresh time.

This diff also changes the precision of the flag `cdcsdk_publication_list_refresh_interval` from microseconds to seconds, to improve the usability of the flag. For test purposes, the flags `TEST_cdcsdk_use_microseconds_refresh_interval` and `TEST_cdcsdk_publication_list_refresh_interval_micros` can be used to set the refresh interval in microseconds.
Jira: DB-10688
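
A minimal sketch of the trim step, using a plain vector in place of the comma-separated state-table value:

```
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <vector>

// Keep only the publication refresh times strictly greater than the
// acknowledged one (illustrative; the real list is the "pub_refresh_times"
// entry in the cdc_state data map).
void TrimPubRefreshTimes(std::vector<uint64_t>* pub_refresh_times,
                         uint64_t acked_refresh_time) {
  pub_refresh_times->erase(
      std::remove_if(pub_refresh_times->begin(), pub_refresh_times->end(),
                     [&](uint64_t t) { return t <= acked_refresh_time; }),
      pub_refresh_times->end());
}

int main() {
  std::vector<uint64_t> times = {100, 200, 300};
  TrimPubRefreshTimes(&times, 200);
  std::cout << times.size() << "\n";  // 1: only 300 remains
}
```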

Test Plan:
Jenkins: urgent
Jenkins: test regex: .*CDCSDKConsumptionConsistentChangesTest.*
./yb_build.sh --cxx-test integration-tests_cdcsdk_consumption_consistent_changes-test --gtest_filter CDCSDKConsumptionConsistentChangesTest.TestChangingPublicationRefreshInterval
./yb_build.sh --cxx-test integration-tests_cdcsdk_consumption_consistent_changes-test --gtest_filter CDCSDKConsumptionConsistentChangesTest.TestLSNDeterminismWithChangingPubRefreshInterval

Reviewers: skumar, asrinivasan, stiwary, siddharth.shah

Reviewed By: asrinivasan

Subscribers: ybase, ycdcxcluster

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D34460
… operations

Summary:
This diff sets a randomly generated root_request_id for background operations
Jira: DB-10887

Test Plan: Jenkins

Reviewers: amitanand

Reviewed By: amitanand

Subscribers: hbhanawat, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D34627
Summary:
Adds validation for SSH keys in the case of onprem
providers.
Also changes the validation error format to be in sync with other
providers.

Given that these validations are not consumed by any client as of now (we default to false: https://github.com/yugabyte/yugabyte-db/blob/master/managed/src/main/resources/v1.routes#L61), it should be safe to change the format.

Test Plan:
Manually tried creating onprem provider. Verified that the errors
are thrown as expected.

```
{
    "success": false,
    "error": {
        "$.regions[0].zones[0].name": [
            "Cannot contain any special characters except '-' and '_'."
        ],
        "$.allAccessKeys[0].keyInfo.sshPrivateKeyContent": [
            "Not a valid RSA key!"
        ],
        "errorSource": [
            "providerValidation"
        ],
        "$.regions[0].zones[1].code": [
            "Cannot contain any special characters except '-' and '_'."
        ],
        "$.regions[0].zones[1].name": [
            "Cannot contain any special characters except '-' and '_'."
        ]
    },
```

Reviewers: asharma, amalyshev, #yba-api-review, sneelakantan

Reviewed By: asharma, #yba-api-review, sneelakantan

Subscribers: dkumar, yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D34593
Summary:
The stateful service client creates its own messenger, reactor, and yb client. This uses up 4 threads, plus its own meta cache.
Instead we should reuse the server process's yb_client.

Fixes yugabyte#22102
Jira: DB-11035

Test Plan: Jenkins

Reviewers: pjain

Reviewed By: pjain

Subscribers: ybase

Differential Revision: https://phorge.dev.yugabyte.com/D34360
…-examples application (yugabyte#21900)

* Upgrade the gorm docs to use latest go version and gorm v2

* Mentioned the use of smart drivers

* Harsh daryani896 patch 1 (#2)

* Update pgx driver version from v4 to v5 in docs.

* review comments and copied to preview

---------

Co-authored-by: Harsh Daryani <82017686+HarshDaryani896@users.noreply.github.com>
Co-authored-by: aishwarya24 <ashchakravarthy@gmail.com>
* added r2dbc smart driver

* added supported versions

* added multiple hosts

* fixed typo

* added table defaults

* added table defaults

* edited the table parameters and URL

* more updates from review

* missed edit

* changed name to maintain consistency
Summary:
This diff enables DDL atomicity feature by default.
(1) changing the default value of several gflags from false to true.
--ysql_yb_ddl_rollback_enabled
--report_ysql_ddl_txn_status_to_master
--ysql_ddl_transaction_wait_for_ddl_verification

(2) code cleanup related to (1); for example, some unit tests needed to
explicitly enable one or more of these 3 gflags. Now that they are turned on by
default, that code is removed.

(3) other unit test updates related to (1). For example, in pg_packed_row-test.cc,
the test output has changed from `PACKED_ROW[2]` to `PACKED_ROW[3]`. The
number in the bracket represents the table schema version. With DDL atomicity, a DDL
such as `ALTER TABLE test DROP COLUMN v2` causes the schema version of table
test to bump by 2 after the DDL commits. Without DDL atomicity, the schema
version of table test used to only bump up by 1 after the DDL commits.
Jira: DB-11028

Test Plan: Jenkins run

Reviewers: hsunder

Reviewed By: hsunder

Subscribers: ycdcxcluster, hsunder

Differential Revision: https://phorge.dev.yugabyte.com/D30471
…ber of CPU used

Summary: Fix an issue where the number of used/available cores is not calculated correctly in the Sankey diagram for CPU usage.

Test Plan: no test plan

Reviewers: nikhil

Reviewed By: nikhil

Subscribers: yugabyted-dev, djiang

Differential Revision: https://phorge.dev.yugabyte.com/D34575
@ddhodge ddhodge self-assigned this Apr 30, 2024
@ddhodge ddhodge added the area/documentation Documentation needed label Apr 30, 2024
@ddhodge ddhodge added this to In progress in Documentation via automation Apr 30, 2024

netlify bot commented Apr 30, 2024

Deploy Preview for infallible-bardeen-164bc9 ready!

Name Link
🔨 Latest commit d81006c
🔍 Latest deploy log https://app.netlify.com/sites/infallible-bardeen-164bc9/deploys/663472edb105d000089fc1e3
😎 Deploy Preview https://deploy-preview-22209--infallible-bardeen-164bc9.netlify.app/preview/yugabyte-platform/anywhere-automation/

@ddhodge ddhodge changed the title [doc][yba] YBA CLI landing page [doc][yba] 2024.1 YBA CLI landing page Apr 30, 2024
Contributor

@subramanian-neelakantan subramanian-neelakantan left a comment


Looks great Dwight! A few minor suggestions and comments to remove unsupported material.

Collaborator

@aishwarya24 aishwarya24 left a comment


LGTM. Thanks!

@@ -19,6 +19,7 @@ Use the following automation tools to manage your YugabyteDB Anywhere installati
| :--------- | :---------- |
| [REST API](anywhere-api/) | Deploy and manage database universes using a REST API. |
| [Terraform provider](anywhere-terraform/) | Provider for automating YugabyteDB Anywhere resources that are accessible via the API. |
| [CLI](anywhere-cli/) | Manage YugabyteDB Anywhere resources from the command line. {{<badge/tp>}} |
Collaborator


Add a tile below for automation.

Contributor

@subramanian-neelakantan subramanian-neelakantan left a comment


Looks good so far. Thanks.

Labels
area/documentation Documentation needed
Projects
Documentation
In progress