Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-41558: [C++] Improve fixed_width_test_util.h #41575

Merged
merged 14 commits into from May 17, 2024

Conversation

llama90
Copy link
Contributor

@llama90 llama90 commented May 7, 2024

Rationale for this change

Improve the fixed_width_test_util.h.

What changes are included in this PR?

  • Move the fixed_width_test_util.h to arrow/testing
  • Divide the fixed_width_test_util to .cc and .h
  • Remove unused headers

Are these changes tested?

Yes

Are there any user-facing changes?

No

Copy link

github-actions bot commented May 7, 2024

⚠️ GitHub issue #41558 has been automatically assigned in GitHub to PR creator.

@llama90
Copy link
Contributor Author

llama90 commented May 7, 2024

@felipecrv Hello.

Is this what you were thinking?

I would appreciate it if you could review it when you have time. Thank you.

Comment on lines 20 to 29
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

#include "arrow/array/builder_primitive.h"
#include "arrow/builder.h"
#include "arrow/type.h"
#include "arrow/util/checked_cast.h"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant reducing the code in the header. We should keep the necessary includes.

The class should only declare the public function members here and all the private members and template code can be defined in the .cc file (to be created) outside of the class.

See how chunk_resolver.h/cc does it reducing the bloat in the header:

template <typename T>
inline std::vector<int64_t> MakeChunksOffsets(const std::vector<T>& chunks) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, so I need to organize the code according to C++ conventions. I should include only the declarations in the .h file and put the implementations in the .cc file.

I will refer to the code you shared, organize it accordingly, and then upload it.

Thank you for your guidance.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting review Awaiting review labels May 7, 2024
@kou kou changed the title GH-41558: Move and update fixed_width_test_util.h GH-41558: [C++] Move and update fixed_width_test_util.h May 7, 2024
Copy link

github-actions bot commented May 7, 2024

⚠️ GitHub issue #41558 has been automatically assigned in GitHub to PR creator.

@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels May 12, 2024
@llama90 llama90 changed the title GH-41558: [C++] Move and update fixed_width_test_util.h GH-41558: [C++] Improve fixed_width_test_util.h May 12, 2024
Copy link

⚠️ GitHub issue #41558 has been automatically assigned in GitHub to PR creator.

1 similar comment
Copy link

⚠️ GitHub issue #41558 has been automatically assigned in GitHub to PR creator.

cpp/src/arrow/testing/fixed_width_test_util.h Show resolved Hide resolved
@@ -120,84 +170,6 @@ class NestedListGenerator {
}
return nested_builder->Finish();
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should have the definition of NestedListArray in the .cc as well.

@github-actions github-actions bot added awaiting changes Awaiting changes awaiting change review Awaiting change review and removed awaiting change review Awaiting change review awaiting changes Awaiting changes labels May 14, 2024
@llama90 llama90 force-pushed the ARROW-41558 branch 2 times, most recently from 75ed690 to d4961bc Compare May 15, 2024 11:50
@llama90
Copy link
Contributor Author

llama90 commented May 15, 2024

@felipecrv Hello, and thank you for your review.

I have applied the changes as advised, but I am a bit unsure about one aspect, particularly because I am not very familiar with template functions.

Should the VisitAllNestedListConfigurations template function also be moved to an anonymous namespace? It is directly called from the test function (vector_selection_test.cc), so I have kept it as it currently is.

Additionally, I made modifications to the Makefile to accommodate the Windows CI process. Could you please confirm if this approach is correct?

I would greatly appreciate your advice on these matters.

Thank you!

@felipecrv
Copy link
Contributor

felipecrv commented May 15, 2024

Should the VisitAllNestedListConfigurations template function also be moved to an anonymous namespace? It is directly called from the test function (vector_selection_test.cc), so I have kept it as it currently is.

Sorry, I missed that it was called from others. We can make it a non-template by using std::function<...> instead of inlining the lambda expression. This is not performance-critical code, so it's not gonna be a problem:

-  /// \tparam Visit a function type with signature
-  ///     void(const std::shared_ptr<DataType>& inner_type,
-  ///          const std::vector<int>& list_sizes)
-  template <class Visit>
   static void VisitAllNestedListConfigurations(
-      const std::vector<std::shared_ptr<DataType>>& inner_value_types, Visit&& visit,
+      const std::vector<std::shared_ptr<DataType>>& inner_value_types,
+      const std::function<void(const std::shared_ptr<DataType>& inner_type,
+                               const std::vector<int>& list_sizes)> &visit,
       int max_depth = 3, int max_power_of_2_size = 32) {

Additionally, I made modifications to the Makefile to accommodate the Windows CI process. Could you please confirm if this approach is correct?

We avoid relative paths in CMakeLists.txt because that gets very confusing very quickly. You should list this new .cc file in

set(ARROW_TESTING_SRCS
io/test_common.cc
ipc/test_common.cc
testing/gtest_util.cc
testing/random.cc
testing/generator.cc
testing/util.cc)

The trick to find where to list new .cc files is to grep for .cc files in the same directory as yours. Things could be more complicated than that, but it usually works.

@llama90
Copy link
Contributor Author

llama90 commented May 16, 2024

I think I applied everything you suggested:

  • Converted the template function to a regular function.
  • Modified to not include relative paths in the Makefile (this was a problem due to a missing macro as ARROW_TESTING_EXPORT).

Additionally, I want to express my gratitude for your patience and thorough review. It has been instrumental in completing this PR.

Thank you! cc @felipecrv

Copy link
Contributor

@felipecrv felipecrv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, I want to express my gratitude for your patience and thorough review. It has been instrumental in completing this PR.

Thank you for not giving up ;) C++ is weird.

I added comments about includes, but other than that everything is looking very good.

// specific language governing permissions and limitations
// under the License.

#include "fixed_width_test_util.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The linter is complaining about this line because includes should be fully qualified based on the cpp/src/ root, so this should be "arrow/testing/fixed_width_test_util.h"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the removal of include bloat in the header, you should also add all the includes for things referred from this file:

  • "arrow/array/builder_primitive.h" for NumericBuilder and ArrowBuilder
  • "arrow/array/builder_base.h" for MakeBuilder [1]
  • <cstdint>
  • <limits>
  • <vector>
  • <functional>
  • "arrow/type.h"

[1] my header had an include to arrow/builder.h, but that was unnecessary now that I thought more about it

Comment on lines 20 to 25
#include <vector>

#include "arrow/builder.h"
#include "arrow/testing/visibility.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should try to "include what you use" [1] as much as possible. Even when headers that we need are already included by other headers.

  • <memory>
  • <functional>
  • <cstdint>
  • "arrow/type.h" for DataType (complete type is needed for the pointers and vector declarations)
  • "arrow/type_fwd.h" forward-declares ArrayBuilder (which we don't need to be a complete type in the header because we only pointers/references to it in this header)

With these included, you should be able to remove "arrow/builder.h" which is a very expensive header.

[1] https://github.com/include-what-you-use/include-what-you-use/blob/master/docs/WhyIWYU.md

Comment on lines 35 to 38
template Status AppendNumeric<Int8Type>(ArrayBuilder* builder, int64_t* next_value);
template Status AppendNumeric<Int16Type>(ArrayBuilder* builder, int64_t* next_value);
template Status AppendNumeric<Int32Type>(ArrayBuilder* builder, int64_t* next_value);
template Status AppendNumeric<Int64Type>(ArrayBuilder* builder, int64_t* next_value);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need these (in this context) because since this template is in an anon namespace the compiler will only instantiate the 4 instances that are referred to in this compilation unit.

@github-actions github-actions bot added awaiting changes Awaiting changes awaiting change review Awaiting change review and removed awaiting change review Awaiting change review awaiting changes Awaiting changes labels May 17, 2024
@llama90 llama90 force-pushed the ARROW-41558 branch 2 times, most recently from 8a3f0b5 to f76416f Compare May 17, 2024 10:10
@llama90
Copy link
Contributor Author

llama90 commented May 17, 2024

@felipecrv I applied your suggestions

Thank you for review :)

@llama90 llama90 requested a review from felipecrv May 17, 2024 13:52
@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting change review Awaiting change review labels May 17, 2024
@felipecrv felipecrv merged commit 7aff9d5 into apache:main May 17, 2024
37 of 38 checks passed
@felipecrv felipecrv removed the awaiting merge Awaiting merge label May 17, 2024
Copy link

After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 7aff9d5.

There was 1 benchmark result indicating a performance regression:

The full Conbench report has more details. It also includes information about 2 possible false positives for unstable benchmarks that are known to sometimes produce them.

vibhatha pushed a commit to vibhatha/arrow that referenced this pull request May 25, 2024
### Rationale for this change

Improve the `fixed_width_test_util.h`.

### What changes are included in this PR?

- Move the `fixed_width_test_util.h` to `arrow/testing`
- Divide the `fixed_width_test_util` to `.cc` and `.h`
- Remove unused headers

### Are these changes tested?

Yes

### Are there any user-facing changes?

No

* GitHub Issue: apache#41558

Authored-by: Hyunseok Seo <hsseo0501@gmail.com>
Signed-off-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants