Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose getaddrinfo errors. Was: Return empty list on EAI_ errors for getaddrinfo@luv (#351) #352

Open
wants to merge 20 commits into
base: main
Choose a base branch
from

Conversation

haesbaert
Copy link
Contributor

Unix.getaddrinfo used in u-ring ignores gar_errno and returns an empty list, this makes the luv backend follow a similar behaviour.

Should fix #351 where we would get EAI_NONAME (or others) if resolution failed, this is a bit defensive as we just handle expected EAI errors.

@haesbaert
Copy link
Contributor Author

cc @patricoferris

Comment on lines 582 to 585
| Error `EAI_ADDRFAMILY | Error `EAI_AGAIN | Error `EAI_BADFLAGS | Error `EAI_BADHINTS
| Error `EAI_CANCELED | Error `EAI_FAIL | Error `EAI_FAMILY | Error `EAI_MEMORY
| Error `EAI_NODATA | Error `EAI_NONAME| Error `EAI_OVERFLOW | Error `EAI_PROTOCOL
| Error `EAI_SERVICE | Error `EAI_SOCKTYPE -> []
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This list seems excessive (even if it does match the stdlib version). In particular, EAI_AGAIN, EAI_BADFLAGS, EAI_MEMORY and EAI_SOCKTYPE seem like they should be reported as errors, rather than claiming there are no addresses. Maybe EAI_FAIL too.

Copy link
Contributor Author

@haesbaert haesbaert Oct 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it too, just some notes:
EAI_FAIL and EAI_AGAIN (unlike EAGAIN) can be the result of a an actual DNS reply (would have to dig deeper, but those should return an empty list). EAI_SOCKTYPE, EAI_BADFLAGS, EAI_FAMILY we could, since they are mostly programming errors from our side.
I don't have a strong opinion, but I'd err on the side of just emulating the same behaviour of Unix.getaddrinfo which ignores everything.

Copy link
Contributor Author

@haesbaert haesbaert Oct 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a quick grep in the OpenBSD tree, just to see the most common idiom.
Basically it's just log gai_errno -> return NULL, some cases are a bit more careful into checking for EAI_NONAME and EAI_NODATA and not logging it as an error.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should just copy the getaddrinfo API and not allow an empty result. The wrapper function could enforce this:

let getaddrinfo ?(service="") (t:#t) hostname =
  match t#getaddrinfo ~service hostname with
  | [] -> raise (Dns_error (Failure "No addresses for ..."))
  | xs -> xs

I guess most people getting an empty list will want to report an error, and it's a bit silly you have to guess what the error was when we just threw the real error away. e.g. in with_tcp_connect we currently do:

    | [] -> raise (Connection_failure (Failure (Fmt.str "No TCP addresses for %S" host)))

Then we could say if that's because the DNS server is busy, the entry doesn't exist, doesn't have an IP address, etc.

Copy link
Contributor Author

@haesbaert haesbaert Oct 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(edited since I misread the first time)
I like this better than the empty list, but if we're going this direction we could expose the gai_strerror in Dns_error so the user has something more meaningful than "not found". That would require providing a C stub for u-ring that exposes the error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking a bit more about this, I think connect should take a Sockaddr.stream list, and loop over the candidates until it succeeds connecting to one. This is virtually what everyone does with getaddrinfo + connect and bind.
In this case connect itself could raise something, getaddrinfo could still return just an empty list, and the user would still have to handle an exception.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the error code was just an example. We can use a better exception name and contents.

I think connect should take a Sockaddr.stream list, and loop over the candidates until it succeeds connecting to one.

with_tcp_connect already handles looping over the candidates; I'm not sure there's much benefit to changing connect too. In fact, that would just mean having to do something for the extra case of an empty list.

In this case connect itself could raise something, getaddrinfo could still return just an empty list, and the user would still have to handle an exception.

I don't see how that helps. We'd still be losing the detailed error from getaddrinfo and replacing it with a guess.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the error code was just an example. We can use a better exception name and contents.

I think connect should take a Sockaddr.stream list, and loop over the candidates until it succeeds connecting to one.

with_tcp_connect already handles looping over the candidates; I'm not sure there's much benefit to changing connect too. In fact, that would just mean having to do something for the extra case of an empty list.

empty list would return some Dns_error 'no candidates found'.

In this case connect itself could raise something, getaddrinfo could still return just an empty list, and the user would still have to handle an exception.

I don't see how that helps. We'd still be losing the detailed error from getaddrinfo and replacing it with a guess.

Yes, I was just assuming you wouldn't want a getaddrinfo stub that exports the getaddrinfo errno 😄 .
I'd prefer to export the error from getaddrinfo and also make connect loop, so users don't connect (List.first (getaddrinfo...)).
I don't feel too strong about it, I just want to move forward so both backends have the same behaviour.
Should we just raise all errors from getaddrinfo on both cases ? That involves the stub and creating some Dns_error values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can I get a ruling here ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just raise all errors from getaddrinfo on both cases?

Yes, that makes sense to me.

( Having connect take a Sockaddr.stream list doesn't seem useful to me. You can't give a good error if there are no addresses - you don't know what the DNS error was, what the failing hostname was, or even whether DNS was used at all! )

@patricoferris
Copy link
Collaborator

Thanks @haesbaert ! This works for me, but I'll differ to @talex5's knowledge of which EAI_ errors to catch and which to no :))

@talex5
Copy link
Collaborator

talex5 commented Oct 20, 2022

I'll differ to @talex5's knowledge of which EAI_ errors to catch and which to no :))

I have no special knowledge - just reading the getaddrinfo man-page :-)

@haesbaert haesbaert marked this pull request as draft October 26, 2022 12:48
…#351)

Unix.getaddrinfo used in u-ring ignores gar_errno and returns an empty list, this
makes the luv backend follow a similar behaviour.
@haesbaert
Copy link
Contributor Author

haesbaert commented Dec 1, 2022

Some corrections to my previous statement, getaddrinfo(3) cannot return an empty list and no error, it's either something or an error. I checked libc.

This PR should be ready for reviewing (debian is failing, looking into it), some notes:

  • A lot of the stub is just copied from ocaml's stdlib
  • The values chosen for the undefined EAI_* values are taken from libuv (since it works for them it works for us), they're added for completion sakeness and a future guard. I'm sure once we have some other backend those will popup and we will move the stub, or maybe glibc adds them in the future, who knows.
  • The strings for getaddrinfo_error_to_string were copied verbatim from libc.

@haesbaert
Copy link
Contributor Author

haesbaert commented Dec 1, 2022

Tests are zefixed, marking this as ready for review

Another note, I've added the stubs to a new file for two reasons

  • It's too polluting
  • Since it's a lot of copy and paste from stdlib, I have to preserve the copyright. Actually now I noticed EIO's is not GPL, so I should get a clear from Xavier to relicense those bits in EIO, forget that, it's LGPL.

@haesbaert haesbaert marked this pull request as ready for review December 1, 2022 10:08
Copy link
Collaborator

@talex5 talex5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks sensible.

Note that I'm changing the error handling quite a lot in #378, which will impact this PR a bit.

Comment on lines +11 to +13
/* All rights reserved. This file is distributed under the terms of */
/* the GNU Lesser General Public License version 2.1, with the */
/* special exception on linking described in the file LICENSE. */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want to avoid LGPL code in Eio (it's currently BSD and ISC only). Though they might relicense this bit if asked.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gasche @xavierleroy (sorry for the ping).
I've copied significant part of otherlibs/unix/getaddrinfo.c into EIO (https://github.com/ocaml-multicore/eio/blob/61b737d5be683b9be611844b16c7e73b5bece09a/lib_eio_linux/getaddrinfo_stubs.c). The original file is LGPLed and we would like to relicense our new file to ISC.
Do you permit us to relicense the bits we copied out from LGPL to ISC in EIO ?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. I only use copyleft licenses for my free software. Please consider using the LGPL with OCaml linking exception for your library, or do not reuse any of my code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. I only use copyleft licenses for my free software. Please consider using the LGPL with OCaml linking exception for your library, or do not reuse any of my code.

Thank you, will do one of the two as suggested.

Copy link
Contributor

@dra27 dra27 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drive-by comments on the borrowed C code!

#include <caml/unixsupport.h>
#include <caml/socketaddr.h>

static value caml_unix_cst_to_constr(int n, int *tbl, int size, int deflt)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could avoid this duplication by borrowing the extern declaration in cst2constr.h

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed, somehow my brain remembered the function being static in first place.

if (retcode == 0) {
for (r = res; r != NULL; r = r->ai_next) {
v = caml_alloc_small(2, Tag_cons);
Field(v, 0) = convert_addrinfo(r);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This violates GC rule 5 - the convert_addrinfo call needs to be done before the caml_alloc_small and stored in a registered local (convert_addrinfo allocates).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for spotting this, fixed it now

Many thanks to dra27 !

Somehow this function was static on first place in my head.
When I removed the `e` and I violated GC #rule 5. Basically a small block must
be fully initialized before we trigger the next allocation, TIL.
@haesbaert haesbaert changed the title Return empty list on EAI_ errors for getaddrinfo@luv (#351) ~~Return empty list on EAI_ errors for getaddrinfo@luv (#351)~~ Expose getaddrinfo errors Dec 1, 2022
@haesbaert haesbaert changed the title ~~Return empty list on EAI_ errors for getaddrinfo@luv (#351)~~ Expose getaddrinfo errors Expose getaddrinfo errors. Was: Return empty list on EAI_ errors for getaddrinfo@luv (#351) Dec 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Getaddrinfo raises error on some backends
5 participants