Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HIVE-28205: Implement direct sql for get_partitions_ps_with_auth api #5206

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

wecharyu
Copy link
Contributor

What changes were proposed in this pull request?

Implement direct sql for getNumPartitionsByPs and listPartitionsPsWithAuth.

Why are the changes needed?

To improve performance.

Does this PR introduce any user-facing change?

No.

Is the change a dependency upgrade?

No.

How was this patch tested?

Passing existing tests and add benchmark test benchmarkGetPartitionsByPs:

java -jar hmsbench.jar -H localhost --savedata /tmp/benchdata --sanitize -N 100 -N 1000 -o bench_results.csv -C -d testbench_http --params=100 -M 'getPartitionsByPs.*'
  • Before this patch
Operation       Mean    Med     Min     Max     Err%
getPartitionsByPs       14.0807 13.9823 12.8813 19.2509 5.63364
getPartitionsByPs.100   276.291 275.259 267.642 301.320 2.33622
getPartitionsByPs.1000  2631.50 2631.01 2551.76 2991.55 1.86420
  • After this patch
Operation       Mean    Med     Min     Max     Err%
getPartitionsByPs       11.6247 11.2780 10.7851 23.8686 12.7184
getPartitionsByPs.100   18.3550 18.0011 17.1338 36.5164 10.8893
getPartitionsByPs.1000  61.5760 60.6764 59.3697 78.2133 5.14500

@@ -409,10 +409,13 @@ public void testListPartitionsByValues() throws Exception {
assertEquals(0, partitions.size());
}

@Test(expected = MetaException.class)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty part_vals should not throw MetaException because we support it in ObjectStore and regard it as fetching all partitions:

private boolean canTryDirectSQL(List<String> partVals) {
if (partVals == null || partVals.isEmpty()) {
return true;
}

return getNumPartitionsViaSql(queryText, params);
}

private int getNumPartitionsViaSql(String queryText, Object[] params) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we rename this method to getCountOfQuery and move it to MetastoreDirectSqlUtils?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed.

@@ -121,8 +122,8 @@ List<Partition> addPartitions(List<Partition> partitions, boolean ifNotExists)
* {@link MetaStoreUtils#getPvals(List, Map)}
*/
List<Partition> getPartitionsByPartitionVals(List<String> partialPartVals) throws MetaException {
if (partialPartVals == null || partialPartVals.isEmpty()) {
throw new MetaException("Partition partial vals cannot be null or empty");
if (MetaStoreUtils.arePartValsEmpty(partialPartVals)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add null check in SessionHiveMetaStoreClient, here we can follow the server side behavior to support empty values.

@@ -584,15 +584,30 @@ static List<Partition> createManyPartitions(@NotNull Table table,
@Nullable Map<String, String> parameters,
@NotNull List<String> arguments,
int npartitions) {
return IntStream.range(0, npartitions)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why split this method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted.

@@ -22,6 +22,7 @@
import org.apache.hadoop.hive.metastore.api.MetaException;
import org.apache.hadoop.hive.metastore.api.NoSuchObjectException;
import org.apache.hadoop.hive.metastore.api.Partition;
import org.apache.hadoop.hive.metastore.utils.MetaStoreUtils;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we revert the import as well?

Copy link

sonarcloud bot commented May 16, 2024

Quality Gate Passed Quality Gate passed

Issues
14 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarCloud

@@ -1201,7 +1201,7 @@ public List<String> listPartitionNames(String catName, String dbName, String tbl
public List<String> listPartitionNames(String catName, String dbName, String tblName,
List<String> partVals, int maxParts) throws TException {
org.apache.hadoop.hive.metastore.api.Table table = getTempTable(dbName, tblName);
if (table == null) {
if (table == null || partVals == null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if the table is a temporary table while the partVals is null?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would throw MetaException like HiveMetaStoreClient.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not revert back the check on PartitionTree.java, it's a bit confusing a temporary table just performs the null check in HiveMetaStoreClient

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because SessionHiveMetaStoreClient extends HiveMetaStoreClient, it should also check null for part_vals like HiveMetaStoreClient, and PartitionTree should support empty or even null part_vals like metastore server in my opinion.

Otherwise it would cause the inconsistent test results in TestListPartitions for HiveMetaStoreClient and SessionHiveMetaStoreClient, for example testListPartitionsByValuesNoVals(). The master branch mark this test expect MetaException, but actually it should return all partitions for empty part_vals. It could pass the test in master because it throws MetaException when making part name for empty partVals in getNumPartitionsByPs, which should be a bug in implement this method.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because SessionHiveMetaStoreClient extends HiveMetaStoreClient, it should also check null for part_vals like HiveMetaStoreClient

I think you cannot just push the null check of a temporary table in HiveMetaStoreClient, and the PartitionTree is for the temporary table to process the partition request, it works at the client side, why not put the check here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the PartitionTree is for the temporary table to process the partition request, it works at the client side, why not put the check here?

Because get_partitions_ps_with_auth support null and empty part_vals, the null check is added in client side. So in PartitionTree we can also follow the server behavior and push null check in SessionHiveMetaStoreClient. This change seems to have no side effects, and in turn supports empty part_vals like HiveMetaStoreClient.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this idea: null check for temporary table in HiveMetaStoreClient, let check others' opinion. cc @deniskuzZ @zhangbutao

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to understand you folk's talk:

@wecharyu thinks that some hms partition api support null & empty , such as get_partitions_ps_with_auth, but some hms partition api don't support null & empty. So, for temp table, check null & empry and always throw exception in PartitionTree in not inappropriate.
I think in this case, @wecharyu you need to make sure all the api's behaviors which invoke the PartitionTree ::getPartitionsByPartitionVals are unaffected. For example, TempTable::getPartitionsByPartitionVals is invoked in four api, but i see that two of them are not changed after you change PartitionTree ::getPartitionsByPartitionVals, is that safe?

@dengzhhu653 I am not very familair with temp table's ability. PartitionTree is only used for temp table, if temp table has same ability with common table, especially partition ability, and meanwhile some hms api like get_partitions_ps_with_auth can allow null & empty , i think we can check in HiveMetaStoreClient side, and it is good for me to unify the behavior of common table and temp table.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but i see that two of them are not changed

One is that SessionHiveMetaStoreClient#istPartitionNamesRequest does not check null for part_vals, I think it's acceptable because HiveMetaStoreClient#listPartitionNamesRequest also not check it.

The other is SessionHiveMetaStoreClient#exchangePartitions where part_vals can not be null.

I can not find out the side effect on enable null or empty part_vals in these calls if we do regard null or empty part_vals is used to fetch all parts or part names.

Comment on lines +471 to +481
public static boolean arePartValsEmpty(List<String> partVals) {
if (partVals == null || partVals.isEmpty()) {
return true;
}
for (String val : partVals) {
if (val != null && !val.isEmpty()) {
return false;
}
}
return true;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public static boolean arePartValsEmpty(List<String> partVals) {
if (partVals == null || partVals.isEmpty()) {
return true;
}
for (String val : partVals) {
if (val != null && !val.isEmpty()) {
return false;
}
}
return true;
}
public static boolean arePartValsEmpty(List<String> partVals) {
return partVals == null || partVals.stream().allMatch(val -> val == null || val.isEmpty());
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion, will update together with the final discussion result of part_val null check.

return true;
}
for (String val : partVals) {
if (val != null && !val.isEmpty()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we did this check?

In which case, the val will be equal the null or empty?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's existing code, the val is determined in client side.

@@ -1201,7 +1201,7 @@ public List<String> listPartitionNames(String catName, String dbName, String tbl
public List<String> listPartitionNames(String catName, String dbName, String tblName,
List<String> partVals, int maxParts) throws TException {
org.apache.hadoop.hive.metastore.api.Table table = getTempTable(dbName, tblName);
if (table == null) {
if (table == null || partVals == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to understand you folk's talk:

@wecharyu thinks that some hms partition api support null & empty , such as get_partitions_ps_with_auth, but some hms partition api don't support null & empty. So, for temp table, check null & empry and always throw exception in PartitionTree in not inappropriate.
I think in this case, @wecharyu you need to make sure all the api's behaviors which invoke the PartitionTree ::getPartitionsByPartitionVals are unaffected. For example, TempTable::getPartitionsByPartitionVals is invoked in four api, but i see that two of them are not changed after you change PartitionTree ::getPartitionsByPartitionVals, is that safe?

@dengzhhu653 I am not very familair with temp table's ability. PartitionTree is only used for temp table, if temp table has same ability with common table, especially partition ability, and meanwhile some hms api like get_partitions_ps_with_auth can allow null & empty , i think we can check in HiveMetaStoreClient side, and it is good for me to unify the behavior of common table and temp table.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants