New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HIVE-28205: Implement direct sql for get_partitions_ps_with_auth api #5206
base: master
Are you sure you want to change the base?
Conversation
@@ -409,10 +409,13 @@ public void testListPartitionsByValues() throws Exception { | |||
assertEquals(0, partitions.size()); | |||
} | |||
|
|||
@Test(expected = MetaException.class) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Empty part_vals should not throw MetaException
because we support it in ObjectStore
and regard it as fetching all partitions:
Lines 3890 to 3893 in 2d855b2
private boolean canTryDirectSQL(List<String> partVals) { | |
if (partVals == null || partVals.isEmpty()) { | |
return true; | |
} |
...tore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
Show resolved
Hide resolved
return getNumPartitionsViaSql(queryText, params); | ||
} | ||
|
||
private int getNumPartitionsViaSql(String queryText, Object[] params) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we rename this method to getCountOfQuery
and move it to MetastoreDirectSqlUtils
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed.
...ore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
Outdated
Show resolved
Hide resolved
@@ -121,8 +122,8 @@ List<Partition> addPartitions(List<Partition> partitions, boolean ifNotExists) | |||
* {@link MetaStoreUtils#getPvals(List, Map)} | |||
*/ | |||
List<Partition> getPartitionsByPartitionVals(List<String> partialPartVals) throws MetaException { | |||
if (partialPartVals == null || partialPartVals.isEmpty()) { | |||
throw new MetaException("Partition partial vals cannot be null or empty"); | |||
if (MetaStoreUtils.arePartValsEmpty(partialPartVals)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add null check in SessionHiveMetaStoreClient
, here we can follow the server side behavior to support empty values.
.../metastore-tools/tools-common/src/main/java/org/apache/hadoop/hive/metastore/tools/Util.java
Outdated
Show resolved
Hide resolved
@@ -584,15 +584,30 @@ static List<Partition> createManyPartitions(@NotNull Table table, | |||
@Nullable Map<String, String> parameters, | |||
@NotNull List<String> arguments, | |||
int npartitions) { | |||
return IntStream.range(0, npartitions) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why split this method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted.
...metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetastoreDirectSqlUtils.java
Outdated
Show resolved
Hide resolved
...e-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
Outdated
Show resolved
Hide resolved
@@ -22,6 +22,7 @@ | |||
import org.apache.hadoop.hive.metastore.api.MetaException; | |||
import org.apache.hadoop.hive.metastore.api.NoSuchObjectException; | |||
import org.apache.hadoop.hive.metastore.api.Partition; | |||
import org.apache.hadoop.hive.metastore.utils.MetaStoreUtils; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we revert the import as well?
...metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetastoreDirectSqlUtils.java
Outdated
Show resolved
Hide resolved
Quality Gate passedIssues Measures |
@@ -1201,7 +1201,7 @@ public List<String> listPartitionNames(String catName, String dbName, String tbl | |||
public List<String> listPartitionNames(String catName, String dbName, String tblName, | |||
List<String> partVals, int maxParts) throws TException { | |||
org.apache.hadoop.hive.metastore.api.Table table = getTempTable(dbName, tblName); | |||
if (table == null) { | |||
if (table == null || partVals == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if the table is a temporary table while the partVals
is null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would throw MetaException like HiveMetaStoreClient
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not revert back the check on PartitionTree.java, it's a bit confusing a temporary table just performs the null check in HiveMetaStoreClient
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because SessionHiveMetaStoreClient
extends HiveMetaStoreClient
, it should also check null for part_vals like HiveMetaStoreClient
, and PartitionTree
should support empty or even null part_vals like metastore server in my opinion.
Otherwise it would cause the inconsistent test results in TestListPartitions
for HiveMetaStoreClient
and SessionHiveMetaStoreClient
, for example testListPartitionsByValuesNoVals()
. The master branch mark this test expect MetaException
, but actually it should return all partitions for empty part_vals. It could pass the test in master because it throws MetaException
when making part name for empty partVals in getNumPartitionsByPs
, which should be a bug in implement this method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because SessionHiveMetaStoreClient extends HiveMetaStoreClient, it should also check null for part_vals like HiveMetaStoreClient
I think you cannot just push the null check of a temporary table in HiveMetaStoreClient
, and the PartitionTree
is for the temporary table to process the partition request, it works at the client side, why not put the check here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the PartitionTree is for the temporary table to process the partition request, it works at the client side, why not put the check here?
Because get_partitions_ps_with_auth
support null and empty part_vals, the null check is added in client side. So in PartitionTree
we can also follow the server behavior and push null check in SessionHiveMetaStoreClient
. This change seems to have no side effects, and in turn supports empty part_vals like HiveMetaStoreClient
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like this idea: null check for temporary table in HiveMetaStoreClient, let check others' opinion. cc @deniskuzZ @zhangbutao
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to understand you folk's talk:
@wecharyu thinks that some hms partition api support null
& empty
, such as get_partitions_ps_with_auth
, but some hms partition api don't support null
& empty
. So, for temp table, check null
& empry
and always throw exception in PartitionTree
in not inappropriate.
I think in this case, @wecharyu you need to make sure all the api's behaviors which invoke the PartitionTree ::getPartitionsByPartitionVals
are unaffected. For example, TempTable::getPartitionsByPartitionVals
is invoked in four api, but i see that two of them are not changed after you change PartitionTree ::getPartitionsByPartitionVals
, is that safe?
@dengzhhu653 I am not very familair with temp table's ability. PartitionTree
is only used for temp table, if temp table has same ability with common table, especially partition ability, and meanwhile some hms api like get_partitions_ps_with_auth
can allow null
& empty
, i think we can check in HiveMetaStoreClient
side, and it is good for me to unify the behavior of common table and temp table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but i see that two of them are not changed
One is that SessionHiveMetaStoreClient#istPartitionNamesRequest
does not check null for part_vals, I think it's acceptable because HiveMetaStoreClient#listPartitionNamesRequest
also not check it.
The other is SessionHiveMetaStoreClient#exchangePartitions
where part_vals can not be null.
I can not find out the side effect on enable null or empty part_vals in these calls if we do regard null or empty part_vals is used to fetch all parts or part names.
public static boolean arePartValsEmpty(List<String> partVals) { | ||
if (partVals == null || partVals.isEmpty()) { | ||
return true; | ||
} | ||
for (String val : partVals) { | ||
if (val != null && !val.isEmpty()) { | ||
return false; | ||
} | ||
} | ||
return true; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public static boolean arePartValsEmpty(List<String> partVals) { | |
if (partVals == null || partVals.isEmpty()) { | |
return true; | |
} | |
for (String val : partVals) { | |
if (val != null && !val.isEmpty()) { | |
return false; | |
} | |
} | |
return true; | |
} | |
public static boolean arePartValsEmpty(List<String> partVals) { | |
return partVals == null || partVals.stream().allMatch(val -> val == null || val.isEmpty()); | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion, will update together with the final discussion result of part_val null check.
return true; | ||
} | ||
for (String val : partVals) { | ||
if (val != null && !val.isEmpty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we did this check?
In which case, the val will be equal the null
or empty
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's existing code, the val is determined in client side.
@@ -1201,7 +1201,7 @@ public List<String> listPartitionNames(String catName, String dbName, String tbl | |||
public List<String> listPartitionNames(String catName, String dbName, String tblName, | |||
List<String> partVals, int maxParts) throws TException { | |||
org.apache.hadoop.hive.metastore.api.Table table = getTempTable(dbName, tblName); | |||
if (table == null) { | |||
if (table == null || partVals == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to understand you folk's talk:
@wecharyu thinks that some hms partition api support null
& empty
, such as get_partitions_ps_with_auth
, but some hms partition api don't support null
& empty
. So, for temp table, check null
& empry
and always throw exception in PartitionTree
in not inappropriate.
I think in this case, @wecharyu you need to make sure all the api's behaviors which invoke the PartitionTree ::getPartitionsByPartitionVals
are unaffected. For example, TempTable::getPartitionsByPartitionVals
is invoked in four api, but i see that two of them are not changed after you change PartitionTree ::getPartitionsByPartitionVals
, is that safe?
@dengzhhu653 I am not very familair with temp table's ability. PartitionTree
is only used for temp table, if temp table has same ability with common table, especially partition ability, and meanwhile some hms api like get_partitions_ps_with_auth
can allow null
& empty
, i think we can check in HiveMetaStoreClient
side, and it is good for me to unify the behavior of common table and temp table.
What changes were proposed in this pull request?
Implement direct sql for
getNumPartitionsByPs
andlistPartitionsPsWithAuth
.Why are the changes needed?
To improve performance.
Does this PR introduce any user-facing change?
No.
Is the change a dependency upgrade?
No.
How was this patch tested?
Passing existing tests and add benchmark test
benchmarkGetPartitionsByPs
:java -jar hmsbench.jar -H localhost --savedata /tmp/benchdata --sanitize -N 100 -N 1000 -o bench_results.csv -C -d testbench_http --params=100 -M 'getPartitionsByPs.*'