Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OffsetUnitCalculusError on String Parsing of degC, degF #386

Open
KeyboardNerd opened this issue Jun 10, 2016 · 12 comments · May be fixed by #1349
Open

OffsetUnitCalculusError on String Parsing of degC, degF #386

KeyboardNerd opened this issue Jun 10, 2016 · 12 comments · May be fixed by #1349

Comments

@KeyboardNerd
Copy link

KeyboardNerd commented Jun 10, 2016

I encountered the problem by:

>>> from pint import UnitRegistry
>>> ureg = UnitRegistry()

and these following commands
>>> 1*ureg['celsius']
>>> 1*ureg['degC']
>>> 1*ureg.parse_expression('degC')
>>> 1*ureg('degC')
>>> ureg.parse_expression('1*degC')
all will throw error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/pint/quantity.py", line 727, in __mul__
    return self._mul_div(other, operator.mul)
  File "/Library/Python/2.7/site-packages/pint/quantity.py", line 682, in _mul_div
    getattr(other, 'units', ''))
pint.errors.OffsetUnitCalculusError: Ambiguous operation with offset unit (degC).

using ureg.degC or ureg.degF will not cause the problem

>>> 1*ureg.degC
<Quantity(1, 'degC')>
>>> 1*ureg.degF
<Quantity(1, 'degF')>

using ureg.Quantity will not cause the problem

>>> q_ = ureg.Quantity
>>> q_(1,'celsius')
<Quantity(1, 'degC')>

My python version

python --version
Python 2.7.10

My Pint version

pip show pint

---
Metadata-Version: 1.1
Name: Pint
Version: 0.7.2

My System version

sw_vers
ProductName:    Mac OS X
ProductVersion: 10.11.4
BuildVersion:   15E65
@natezb
Copy link
Contributor

natezb commented Aug 16, 2016

I think it is reasonable for there to be an OffsetUnitCalculusError when you multiply an offset unit/quantity.

However, I think there's a different issue in parsing offset quantities—you should be able to do ureg.Quantity('32 degF'), especially since str(ureg.Quantity(32, 'degF')) is '32 degF'.

@dopplershift
Copy link
Contributor

What if you create the registry with:

ureg = UnitRegistry(autoconvert_offset_to_baseunit=True)

?

@natezb
Copy link
Contributor

natezb commented Aug 16, 2016

Thanks for the suggestion, it works nicely! I probably should've looked at the docs more carefully.

However, I still think there should still be a way to allow parsing of '32 degF' without also allowing autoconversion to base units. Perhaps there could be a separate flag which enables multiplying a bare number/ndarray by a single offset unit. It looks like you could use this flag here in Quantity._mul_div, or possibly in _ok_for_muldiv.

@burnpanck
Copy link

Why is this closed? autoconvert_offset_to_baseunit=True is at best a workaround, since it creates unwanted side-effects.
Why should autoconvert_offset_to_baseunit=False be incompatible with parsing of temperatures?

@dalito
Copy link
Contributor

dalito commented Jun 17, 2021

You can create the unit registry with autoconvert_offset_to_baseunit=False and set ureg.autoconvert_offset_to_baseunit = True only when you need it.

@burnpanck
Copy link

That still doesn't make it more than a workaround, and a horrible one at that:

  • That is unacceptable in library code, which may run in multithreaded environments.
  • Without a context-manager, it needs try:/finally: to be exception-safe.
  • The code requiring string parsing might not even be aware of offset units (e.g. a generic JSON serialiser for Quantities), requiring the user of that code to enable autoconvert_offset_to_baseunit, far away from where it is needed - risking code breakage all over the place.

Modifying global state to achieve a local effect should never pass a code review. And in this case, it would be completely unnecessary: String parsing is well-specified independent of the handling of offset units.

burnpanck added a commit to burnpanck/pint that referenced this issue Jul 14, 2021
@burnpanck burnpanck linked a pull request Jul 14, 2021 that will close this issue
5 tasks
@hgrecco
Copy link
Owner

hgrecco commented Jul 14, 2021

(As I mentioned in #1342 I think we can use contextvars and contextanagers to modify parsing settings in a safe manner. Take a look at that issue for the proposal)

Going back to this specific question, I would suggest that you read this excellent summary #147 by @dalito who also made a PR to make pint safer to handle non multiplicative units.

Having said this, current version of Pint has an advantage that might make this open for discussion again. In the older versions of pint degC was a Quantity with 1 as magnitude. Now is a unit so some ambivalence is not there any more.

@burnpanck
Copy link

Indeed, context managers and context variables are the right way to solve the problems mentioned in my first two bullets with changing global state locally. I fully second that.

However, I believe it is beside the point: This issue is about parsing strings, which, in pint fails for some of the units (offset units) for no convincing reason. I do not see how u.Quantity("1 m") should be accepted, but u.Quantity("1 degC") should require certain global registry options to pass. That is a bug, and any amount of modification of global variables is just a workaround.

I am actually offering a PR in #1349 which fixes this bug - I'm just not sure what the CI is telling me over there.

On the larger discussion around what offset units should do or not do, I see that #147 does show a large discussion, which I wasn't aware of. I just recently switched to pint because of pandas support, but have been working with a lot unit packages before (Python: python-quantities and astropy.units, C++: boost.units, nholthaus/units, benri and now mpusz/units, some in JS and Java), and contributed to quite a few of them. Those temperatures / quantity-points / offset-units just always seem to be a challenge. I did note down one observation in #1350 - however, I don't think it is of relevance to the specific issue here.

@burnpanck
Copy link

Sorry, I might have been a bit ahead of myself. I see that the OP offered a few other attempts on constructing offset quantities, involving a multiplication sign, for which I agree; that is mathematically incorrect and thus correctly raises an Exception. I am specifically referring to the sub-aspect pointed out in one of the early responses and matching the issue title: That u.Quantity(str(q)) fails for offset quantities. If you prefer, I can open a separate issue.

@hgrecco
Copy link
Owner

hgrecco commented Jul 15, 2021

That is a bug,

The point I was trying to make by linking #147 is that this was a conscious decision. It is not a bug. We decided to be less ergonomic when parsing strings to be sure we always do what the user intended ("do not guess" has been always one of our mottos). When the outcome is ambiguous we opt for asking the user to be specific, even if is annoying. An early raised exception is better than a something that gives a wrong/unintended result.

With this I am not saying that this cannot be improved. My problem with parsing strings is that associativity is assumed. We expect 1 A B == (1 A) B == 1 (A B). With offset units this is tricky and with some edge cases.

So before approving a PR, I think we need a concrete, concise and easy to explain ruleset that is surprise free.

Sorry, I might have been a bit ahead of myself.

Don't be sorry for being enthusiatic. This is most welcome, as well as discussions with fresh eyes.

Ps.- Not sure if this is agood idea but in the past I was toying with the idea of making use of things like this: 4.3degC; i.e. use a value followed without spaces to a unit to indicate a strong associativity. Not sure if it is a good idea or has been used.

@hgrecco hgrecco reopened this Jul 15, 2021
@burnpanck
Copy link

I must admit, I haven't read #147 all the way through. However, from what I was able to follow, a significant part of the discussion revolved around products of units (not quantities) involving "offset units". As pointed out in #1350, IMHO such products have no physical meaning and should be disallowed. The bug I see however is that string parsing fails for simple expressions where there is no ambiquity.

I was toying with the idea of making use of things like this: 4.3degC; i.e. use a value followed without spaces to a unit to indicate a strong associativity.

I don't think that this is a good idea - I'm sure that to most people 4.3degC and 4.3 degC look exactly the same and therefore should produce the same value.

However, I do believe that "implicit multiplications" in the context of expressing a quantity are not multiplications at all, but "forming a unit/quantity". This is the approach taken by the PR. I believe it would make sense to make all such "implicit multiplications" higher priority (and right-to-left associativity) than multiplications and divisions. The PR doesn't implement that though. That of course would be a breaking change, because now 200 W / m K would be equivalent to 200 W m^-1 K^-1 rather than 200 W K m^-1. While a respectable scientist should, of course, write the heat conductivity of Aluminium e.g. as "200 W / (m K)", googling for "W / m K" does bring an almost surprising wealth of scientific and engineering results which do use that convention. Again, IMHO the best thing in this case would be to raise an error (PEP 20 "In the face of ambiguity, refuse the temptation to guess."). Making the parser recognise that situation however is more difficult.

burnpanck added a commit to burnpanck/pint that referenced this issue Nov 28, 2021
@burnpanck
Copy link

We decided to be less ergonomic when parsing strings to be sure we always do what the user intended ("do not guess" has been always one of our mottos).

My problem with parsing strings is that associativity is assumed.

I am a big advocate of the python Zen in general and "refuse to guess" in particular. However, the ambiguity with associativity is not a problem brought about by offset units, but by implicit multiplications. Indeed products of offset units have no physical meaning and thus inherently are ambiguous in what they mean - see #1350. Instead, it is actually the combination of implicit multiplication with division that cause the ambiguity. Both 100 V / A s and 200 V / A / 2 s could be either interpreted as 100 V / (A s) or as 100 V s / A - no offset units involved. I would argue that if we had to guess, I suspect the majority of persons would assume the first interpretation. The current parser however takes the second variant however.

IMHO, the current parser is broken in two ways: a) it throws on unambiguous strings describing "offset quantities" such as 37 °C and b) it does not throw on ambiguous quantity expressions such as 1 m / 2 s (and even choses the arguably less-popular interpretation).

My PR fixes a) and leaves b) broken as-is.

So before approving a PR, I think we need a concrete, concise and easy to explain ruleset that is surprise free.

If we want to use the opportunity to fix both issues, then, I have three rulesets to offer:

  1. We designate a parser as purely parsing quantities, not expressions; quantities are formatted <number> <unit> (space optional), where <unit> is a product of powers of other units, where no numbers appear as factors (only in exponents - which are required to be a single (rational) number). Any product directly to the RHS of a division is forbidden tue to ambiguity (formally: parsing * and / as equal precedence, then, within an equal-precedence sequence of them, * may only appear to the left of / and vice versa). It really should't be difficult to write a BNF for that purpose. Implicit multiplications can be allowed anywhere or disallowed without much difference. Examples:

    • 1 °C is allowed and unambiguous
    • 1 m / s**2 is allowed and unambiguous
    • 1 V / Hz**1/2 is allowed and unambiguous
    • 1 V / Hz**(1/2) is allowed and unambiguous
    • 1 V / (A*s) is allowed and unambiguous
    • 1 W / m K is disallowed as ambiguous
    • 1 m / (2 s) is disallowed as not a quantity but an expression
    • 1 m 2 s is disallowed as not a quantity but an expression
  2. Similar as 1), just relaxing the requirements that numbers may not appear within units. Unprotected rationals in exponents are disallowed for ambiguity. We then have a full expression parser. Examples:

    • 1 °C is allowed and unambiguous
    • 1 m / s**2 is allowed and unambiguous
    • 1 V / Hz**1/2 is disallowed as ambiguous
    • 1 V / Hz**(1/2) is allowed and unambiguous
    • 1 V / (A*s) is allowed and unambiguous
    • 1 W / m K is disallowed as ambiguous
    • 1 m / (2 s) is allowed and unambiguous
    • 1 m 2 s is allowed (2 m s)
  3. Implicit multiplications bind stronger than * and / but weaker than **. Implicit multiplications do not take numbers on their RHS. No special handling of rationals in exponents.

    • 1 °C is allowed and unambiguous
    • 1 m / s**2 is allowed and unambiguous
    • 2 / Hz**1/2 V is allowed and equivalent to 1 / (V Hz)
    • 2 / Hz**(1/2) V is allowed and equivalent to 2 / (V Hz^(1/2))
    • 1 V / (A*s) is allowed and unambiguous
    • 1 W / m K is allowed and equivalent to 1 W / (m K)
    • 1 m / (2 s) is allowed and unambiguous
    • 1 m 2 s is allowed (2 m s)

My favourites are 2) as ruleset of ureg.parse_expression and 1) for a new ureq.parse_quantity. But that would be a breaking change. My PR is not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants