Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

performance for multipolygon contains much worse than for geos #1184

Open
gauteh opened this issue May 18, 2024 · 14 comments
Open

performance for multipolygon contains much worse than for geos #1184

gauteh opened this issue May 18, 2024 · 14 comments

Comments

@gauteh
Copy link

gauteh commented May 18, 2024

Hi,

I'm trying to convert some code for roaring-landmask (https://github.com/gauteh/roaring-landmask) from geos to geo (gauteh/roaring-landmask#27). However, checking if a point is contained by a large multipolygon (with all countries from GSHHG) is far slower with geo than with geos. With geos I used PreparedGeometry, does something similar exist?

geos:

test shapes::tests::benches::test_contains_in_ocean ... bench:       1,586 ns/iter (+/- 45)
test shapes::tests::benches::test_contains_on_land  ... bench:       1,499 ns/iter (+/- 169)

geo:

test shapes::tests::benches::test_contains_in_ocean ... bench:  10,713,374 ns/iter (+/- 356,454)
test shapes::tests::benches::test_contains_on_land  ... bench:   6,709,172 ns/iter (+/- 255,050)
@urschrei
Copy link
Member

No, but you can decompose the multipolygon into polygons and add them to an rstar spatial index: https://docs.rs/rstar/latest/rstar/struct.RTree.html. The rtree is serializable, re-exported by geo (so it should "just work" for bulk loading the multipolygon Vec), and you can associate extra data with each polygon by using https://docs.rs/rstar/latest/rstar/primitives/struct.GeomWithData.html

@gauteh
Copy link
Author

gauteh commented May 18, 2024

Thanks, for the tip. Do you know if that is how geos is doing it?

@urschrei
Copy link
Member

A geos prepared geometry is an rtree behind the scenes, yes. So things like point in multipolygon query the tree transparently.

@gauteh
Copy link
Author

gauteh commented May 19, 2024

I'm trying to add all my Polygons to an RTree, and then check if a point is contained by any of the polygons. But I'm getting this error:

   Compiling roaring-landmask v0.7.3 (/home/gauteh/met/roaring-landmask)
error[E0599]: the method `nearest_neighbor` exists for struct `RTree<Polygon>`, but its trait bounds were not satisfied
  --> src/shapes.rs:74:28
   |
74 |         let po = self.geom.nearest_neighbor(&p);
   |                            ^^^^^^^^^^^^^^^^ method cannot be called on `RTree<Polygon>` due to unsatisfied trait bounds
   |
  ::: /home/gauteh/.cargo/registry/src/index.crates.io-6f17d22bba15001f/geo-types-0.7.13/src/geometry/polygon.rs:71:1
   |
71 | pub struct Polygon<T: CoordNum = f64> {
   | ------------------------------------- doesn't satisfy `geo::Polygon: PointDistance`
   |
   = note: the following trait bounds were not satisfied:
           `geo::Polygon: PointDistance`

gauteh/roaring-landmask@b8e3a21

How can I search through the Polygons and check if the point is contained?

@gauteh
Copy link
Author

gauteh commented May 19, 2024

I think Polygons should implement contains_point, but not PointDistance. Only contains_point and envelope-stuff is necessary for this check? Maybe PointDistance needs to be split.

@gauteh
Copy link
Author

gauteh commented May 19, 2024

I wrapped the polygons and implemented contains (gauteh/roaring-landmask@60bc592)

The performance is better, but still roughly 1000x worse than geos :/

test shapes::tests::benches::test_contains_in_ocean ... bench:   1,403,390 ns/iter (+/- 175,985)
test shapes::tests::benches::test_contains_on_land  ... bench:   1,395,731 ns/iter (+/- 111,216)

@urschrei
Copy link
Member

Can you post a link to your benchmark? I just want to make sure I'm understanding everything here.

@gauteh
Copy link
Author

gauteh commented May 19, 2024

Hi, the benchmark is here: https://github.com/gauteh/roaring-landmask/blob/685ed6f43e944a604545c452273ded4ab901ce42/src/shapes.rs#L201

You run it with: cargo bench shapes --features nightly, requires python at least 3.10. main branch is with geos, the linked PR is with geo.

Thanks

@kylebarron
Copy link
Collaborator

FWIW I believe a GEOS prepared geometry indexes every coordinate pair (Line in geo terminology). This is vastly different than indexing each polygon, especially for large polygons like country/land boundaries. I think what you want is #803.

I think that geo will automatically build in-effect a prepared geometry before doing the intersection, but there's currently no API to allow the user to persist this. So if you're doing repeated intersections I'd expect bad performance compared to GEOS

@gauteh
Copy link
Author

gauteh commented May 21, 2024

Thanks, that seems like something that needs to be done on geo level. It appears to have got some work, but still require some effort?

@gauteh
Copy link
Author

gauteh commented Jun 6, 2024

Would it be enough to persist/cache this intermediate step? Any way to do a quick test if that makes any difference?

@urschrei
Copy link
Member

urschrei commented Jun 6, 2024

You can enable serialise / deserialise support for trees by enabling the serde feature on the rstar dependency. I haven't done it myself, but it should be pretty straightforward using something like https://docs.rs/bincode/latest/bincode/

@gauteh
Copy link
Author

gauteh commented Jun 6, 2024

It's not the rstar, but what is mentioned here: #1184 (comment) . Rstar didn't help enough.

@michaelkirk
Copy link
Member

Any way to do a quick test if that makes any difference?

As discussed in #803, https://github.com/georust/geo/compare/mkirk/prepared-geom?expand=1 could be a good starting point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants