Remove liboauthcpp #8030

Osyotr · 2024-04-29T00:26:04Z

Depends on #8048

coding/base64.cpp

coding/sha1.cpp

coding/sha1.hpp

coding/base64.cpp

coding/coding_tests/sha1_test.cpp

biodranik · 2024-04-30T22:22:15Z

coding/sha1.cpp

+  uint32_t digest[5];
+  sha1.get_digest(digest);
+  for (auto & b : digest)
+    b = boost::core::byteswap(b);


Why do you think a byte swap is necessary compared to the previous implementation? Why are implementations different?

How critical is ExtractHash performance? Does it make sense to paste here boost's implementation and fix it to avoid bits rotation? Or maybe paste here the necessary part from liboauth implementation? Which implementation is faster, or are they the same?

> Why do you think a byte swap is necessary compared to the previous implementation?

Digest from liboauthcpp is little-endian (controlled by SHA1_LITTLE_ENDIAN).

organicmaps/3party/liboauthcpp/src/SHA1.h

Lines 55 to 57 in 967ffc1

#if !defined(SHA1_LITTLE_ENDIAN) && !defined(SHA1_BIG_ENDIAN)
#define SHA1_LITTLE_ENDIAN
#endif

Digest from boost is ~~little-endian~~ big-endian.

> Why are implementations different?

I can only speculate but it's probably just a preference.

> How critical is ExtractHash performance?

Not critical. It's used to verify downloaded map data in a separate thread.

> Does it make sense to paste here boost's implementation and fix it to avoid bits rotation? Or maybe paste here the necessary part from liboauth implementation?

It's up to you to decide ;)

> Which implementation is faster, or are they the same?

Since the code is not critical, I don't see a reason to write benchmarks.

> Digest from boost is little-endian.

You likely meant big-endian? Does the result depend on the current arch?

Maybe use boost::endian::endian_reverse_inplace ?

> You likely meant big-endian?

Yes.

> Does the result depend on the current arch?

Hmm I have no idea how to verify that.
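One way to reason about the arch question, as a sketch only: assuming get_digest fills the array with the five SHA-1 state words as native 32-bit integers (an assumption about the Boost version in use), swapping each word and then copying raw memory yields the standard SHA-1 byte order only on a little-endian host. The helpers below (IsLittleEndianHost, ByteSwap32, SwapThenCopy) are hypothetical stand-ins written for illustration, not code from this PR:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Returns true when the host stores the least significant byte first.
bool IsLittleEndianHost()
{
  std::uint32_t const probe = 1;
  unsigned char first = 0;
  std::memcpy(&first, &probe, 1);
  return first == 1;
}

// Hypothetical stand-in for boost::core::byteswap on a 32-bit value.
std::uint32_t ByteSwap32(std::uint32_t v)
{
  return (v >> 24) | ((v >> 8) & 0x0000FF00u) | ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Mirrors the pattern in the diff: swap every word, then copy the raw memory.
// The resulting byte sequence depends on how the host lays out integers.
std::array<std::uint8_t, 20> SwapThenCopy(std::uint32_t const (&words)[5])
{
  std::uint32_t swapped[5];
  for (std::size_t i = 0; i < 5; ++i)
    swapped[i] = ByteSwap32(words[i]);

  static_assert(sizeof(swapped) == 20, "five 32-bit words expected");
  std::array<std::uint8_t, 20> out{};
  std::memcpy(out.data(), swapped, sizeof(swapped));
  return out;
}
```

Under that assumption, a big-endian host would get the bytes of each word reversed, so a unit test against a known digest, run on such a machine, would be one way to verify it.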

Signed-off-by: Osyotr <Osyotr@users.noreply.github.com>

biodranik · 2024-04-30T23:21:17Z

coding/sha1.cpp

+{
+SHA1::Hash ExtractHash(boost::uuids::detail::sha1 & sha1)
+{
+  uint32_t digest[5];


boost::uuids::detail::sha1::digest_type ?

biodranik · 2024-04-30T23:21:23Z

coding/sha1.cpp

+  uint32_t digest[5];
+  sha1.get_digest(digest);
+  for (auto & b : digest)
+    b = boost::core::byteswap(b);


> Digest from boost is little-endian.

You likely meant big-endian? Does the result depend on the current arch?

Maybe use boost::endian::endian_reverse_inplace ?

biodranik · 2024-04-30T23:24:08Z

coding/sha1.cpp

+
+  SHA1::Hash result;
+  static_assert(result.size() == sizeof(digest));
+  std::copy_n(reinterpret_cast<uint8_t const *>(digest), sizeof(digest), std::begin(result));


Can it be avoided in favor of writing into result directly?
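A possible way to write into the result directly, sketched under the assumption that the digest is available as five native 32-bit words: extract each byte with shifts, most significant first. This avoids both the byte swap and the reinterpret_cast, and does not depend on the host byte order. Hash20 and WordsToBigEndianBytes are illustrative names; the real SHA1::Hash type in the codebase may differ.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical 20-byte hash container, standing in for SHA1::Hash.
using Hash20 = std::array<std::uint8_t, 20>;

// Serializes five 32-bit words into 20 bytes, most significant byte of each word first.
// Shifts operate on values, not on memory layout, so the output is the same on any host.
Hash20 WordsToBigEndianBytes(std::uint32_t const (&words)[5])
{
  Hash20 out{};
  for (std::size_t i = 0; i < 5; ++i)
  {
    out[i * 4 + 0] = static_cast<std::uint8_t>(words[i] >> 24);
    out[i * 4 + 1] = static_cast<std::uint8_t>(words[i] >> 16);
    out[i * 4 + 2] = static_cast<std::uint8_t>(words[i] >> 8);
    out[i * 4 + 3] = static_cast<std::uint8_t>(words[i]);
  }
  return out;
}
```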

Osyotr · 2024-05-02T20:19:57Z

I'm inclined to C&P one of the implementations into the codebase, but you need to decide:

Keep as-is:
- ugly conversions
C&P and modify boost::uuids::detail::sha1:
- It is in namespace detail, so there's a one in a million chance that its API and/or behavior will change
- BSL-1.0 licensed
C&P CSHA1 from liboauthcpp:
- Does not require additional changes compared to the original implementation
- 100% free public domain, for whatever that means

biodranik · 2024-05-02T21:34:30Z

Licenses are not a big deal, they're already included in copyright.txt

Does the currently printed sha match standard Linux/mac utilities output? I've checked it with

echo 'Organic Maps is the ultimate companion app for travellers, tourists, hikers, and cyclists!' | sha1sum -b
d73a52fac194560cfe61583253f38b0518e279b9 *-

... and the output doesn't match. Does Boost's default output without modifications match it?

@vng the checksum calculation can be safely migrated to a different algo, right? A new version will contain a new algo with newly generated checksums.

Are there better/faster algos with a shorter hash?

biodranik · 2024-05-02T21:50:57Z

If the algo can be easily replaced, then this PR can be merged, and the replacement can be done later. If it's better to keep the same implementation, then the less ugly/faster one is preferred.

The worst case is when a user downloads 65+ GB of mwm files (the whole planet) and all hashes are calculated.

vng · 2024-05-02T22:24:51Z

Why is the SHA different? Doesn't the unit test show the same SHA? It should be the same for the current countries.txt

Osyotr · 2024-05-02T22:36:50Z

> and the output doesn't match

I honestly don't know why. PowerShell and various online services give the correct output.

> Are there better/faster algos with a shorter hash?

https://github.com/Cyan4973/xxHash?tab=readme-ov-file#benchmarks

biodranik · 2024-05-03T21:51:11Z

> Why is the SHA different? Doesn't the unit test show the same SHA? It should be the same for the current countries.txt

My bad, I forgot the -n parameter to echo, so the input included a trailing newline. SHA1 matches.
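To illustrate why the missing -n mattered: a single trailing newline is part of the hashed input, so it changes the whole digest. The sketch below is a minimal one-shot SHA-1 written only for this illustration (Sha1Hex and RotL are hypothetical names, and this is not the implementation used in the codebase); it can be used to compare the digest of a string with and without the newline.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

namespace
{
std::uint32_t RotL(std::uint32_t x, int n) { return (x << n) | (x >> (32 - n)); }
}  // namespace

// Minimal one-shot SHA-1 for illustration; returns the digest as lowercase hex.
std::string Sha1Hex(std::string const & msg)
{
  std::array<std::uint32_t, 5> h = {0x67452301u, 0xEFCDAB89u, 0x98BADCFEu, 0x10325476u, 0xC3D2E1F0u};

  // Padding: 0x80, zeros up to 56 mod 64, then the bit length as a 64-bit big-endian integer.
  std::vector<std::uint8_t> data(msg.begin(), msg.end());
  std::uint64_t const bitLen = static_cast<std::uint64_t>(msg.size()) * 8;
  data.push_back(0x80);
  while (data.size() % 64 != 56)
    data.push_back(0);
  for (int shift = 56; shift >= 0; shift -= 8)
    data.push_back(static_cast<std::uint8_t>(bitLen >> shift));

  for (std::size_t off = 0; off < data.size(); off += 64)
  {
    // Message schedule: 16 big-endian words from the block, expanded to 80.
    std::uint32_t w[80];
    for (int i = 0; i < 16; ++i)
    {
      std::size_t const p = off + 4 * static_cast<std::size_t>(i);
      w[i] = (std::uint32_t(data[p]) << 24) | (std::uint32_t(data[p + 1]) << 16) |
             (std::uint32_t(data[p + 2]) << 8) | std::uint32_t(data[p + 3]);
    }
    for (int i = 16; i < 80; ++i)
      w[i] = RotL(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);

    std::uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
    for (int i = 0; i < 80; ++i)
    {
      std::uint32_t f, k;
      if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999u; }
      else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1u; }
      else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDCu; }
      else             { f = b ^ c ^ d;                   k = 0xCA62C1D6u; }
      std::uint32_t const t = RotL(a, 5) + f + e + k + w[i];
      e = d; d = c; c = RotL(b, 30); b = a; a = t;
    }
    h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
  }

  static char const kHex[] = "0123456789abcdef";
  std::string out;
  for (std::uint32_t word : h)
    for (int shift = 28; shift >= 0; shift -= 4)
      out.push_back(kHex[(word >> shift) & 0xF]);
  return out;
}
```

Calling Sha1Hex(text) and Sha1Hex(text + "\n") on the sentence from the echo command gives two different digests; per the thread, the sha1sum output quoted above corresponds to the variant with the newline.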

biodranik · 2024-05-03T21:52:48Z

> https://github.com/Cyan4973/xxHash?tab=readme-ov-file#benchmarks

Thanks! What's the best/simplest one that can be used?

Osyotr · 2024-05-04T10:08:01Z

> Thanks! What's the best/simplest one that can be used?

Depends on the usage.
Non-cryptographic: xxhash64
Cryptographic: blake3

https://jolynch.github.io/posts/use_fast_data_algorithms/

biodranik · 2024-05-04T10:49:14Z

We don't need cryptographic strength here. Would it be hard to integrate https://xxhash.com/ ?

Osyotr · 2024-05-04T12:25:04Z

> We don't need cryptographic strength here. Would it be hard to integrate https://xxhash.com/ ?

Here's a drop-in replacement:

3party/CMakeLists.txt:

include(FetchContent)
FetchContent_Declare(
  xxHash
  GIT_REPOSITORY https://github.com/Cyan4973/xxHash.git
  GIT_TAG        bbb27a5efb85b92a0486cf361a8635715a53f6ba # v0.8.2
  SOURCE_SUBDIR cmake_unofficial
  #FIND_PACKAGE_ARGS NAMES xxHash # CMake 3.24+
)

set(BUILD_SHARED_LIBS OFF)
set(XXHASH_BUILD_XXHSUM OFF)
FetchContent_MakeAvailable(xxHash)

coding/CMakeLists.txt

target_link_libraries(${PROJECT_NAME} xxHash::xxhash)

uint64_t Calculate(std::string const & filePath)
{
  uint32_t constexpr kFileBufferSize = 8192;
  try
  {
    base::FileData file(filePath, base::FileData::OP_READ);
    uint64_t const fileSize = file.Size();

    struct XXH3StateDeleter
    {
      void operator()(XXH3_state_t * p) const
      {
        (void)XXH3_freeState(p);
      }
    };
    using XXH3StatePtr = std::unique_ptr<XXH3_state_t, XXH3StateDeleter>;
    XXH3StatePtr state = XXH3StatePtr(XXH3_createState());
    if (!state || XXH3_64bits_reset(state.get()) != XXH_OK)
    {
      LOG(LERROR, ("Could not create XXH3 state."));
      return {};
    }

    uint64_t currSize = 0;
    unsigned char buffer[kFileBufferSize];
    while (currSize < fileSize)
    {
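      // Clamp in 64 bits first: casting the remaining size to uint32_t would truncate it for files of 4 GiB or more.
      auto const toRead = static_cast<uint32_t>(std::min<uint64_t>(kFileBufferSize, fileSize - currSize));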
      file.Read(currSize, buffer, toRead);
      if (XXH3_64bits_update(state.get(), buffer, toRead) != XXH_OK)
      {
        LOG(LERROR, ("Could not update XXH3 state."));
        return {};
      }
      currSize += toRead;
    }

    return XXH3_64bits_digest(state.get());
  }
  catch (Reader::Exception const & ex)
  {
    LOG(LERROR, ("Error reading file:", filePath, ex.what()));
  }
  return {};
}

uint64_t CalculateForString(std::string_view data)
{
  return XXH3_64bits(data.data(), data.size());
}

Note that you need to update countries.txt to reflect changes (I don't know the current process so can't help here).

vng

What are the next steps here? LGTM, but first of all we can ensure that the tests-only commit works on the current master.

biodranik

Getting rid of some 3p library is good.
Introducing a change brings potential risks.
Changing to boost instead of an already working 3p lib without clear benefits brings a bit of risk.
Changing to a faster, more optimal implementation may bring benefits that outweigh the risks of introducing changes.

Using FetchContent in CMake looks risky to me compared to using submodules or plain copies of the code in the main repo, because if there's no internet connection then you're stuck. Maybe we need to get used to it first.

Regarding the choice of the implementation: we don't need constexpr calculations at compile time, do we?

There is a BSD-licensed C++17 header-only implementation that can be easily copied into the main repo without any dependencies or complexities: https://github.com/RedSpah/xxhash_cpp

Changing the client's code will also require modifying the Python generator code in tools/python/post_generation/hierarchy_to_countries.py:

def get_mwm_hash(path, name):
    filename = os.path.join(path, f"{name}.mwm")
    h = hashlib.sha1()
    with open(filename, "rb") as f:
        for chunk in iter(lambda: f.read(4096), b""):
            h.update(chunk)
    return str(base64.b64encode(h.digest()), "utf-8")

We can likely use this drop-in replacement: https://pypi.org/project/xxhash/ ; it will require adding one more dependency to the generator.

WDYT about it?

vng · 2024-05-20T11:57:38Z

This PR makes our codebase much better and doesn't break existing behavior.

Changing the hash for countries is a more complex task and may break existing behavior. We can investigate it in a separate PR.

Osyotr marked this pull request as draft April 29, 2024 06:02

biodranik reviewed Apr 29, 2024

coding/base64.cpp (outdated)

coding/sha1.cpp

Osyotr force-pushed the liboauthcpp-removal branch from c91992f to f3b1e1d Compare April 29, 2024 21:10

Osyotr marked this pull request as ready for review April 29, 2024 21:30

vng reviewed Apr 30, 2024

coding/sha1.hpp

Osyotr force-pushed the liboauthcpp-removal branch from f3b1e1d to c153cf6 Compare April 30, 2024 21:42

biodranik reviewed Apr 30, 2024

Osyotr added 2 commits May 1, 2024 02:09

[coding] Add SHA1 test

b2ed932

Signed-off-by: Osyotr <Osyotr@users.noreply.github.com>

Remove liboauthcpp

5939751

Signed-off-by: Osyotr <Osyotr@users.noreply.github.com>

Osyotr mentioned this pull request Apr 30, 2024

[coding] Add SHA1 test #8048

Merged

Osyotr force-pushed the liboauthcpp-removal branch from c153cf6 to 5939751 Compare April 30, 2024 23:12

biodranik reviewed Apr 30, 2024

vng approved these changes May 1, 2024

vng approved these changes May 20, 2024

biodranik reviewed May 20, 2024

vng merged commit 621eaaf into organicmaps:master May 20, 2024
15 checks passed

Osyotr deleted the liboauthcpp-removal branch May 20, 2024 19:04
