Meta.parse(repr(Char(0x110000))) fails #54396

stevengj · 2024-05-08T01:03:50Z

Meta.parse(repr(Char(0x110000))) fails because

julia> show(Char(0x110000))
'\U110000'

but '\U110000' is not parseable:

julia> '\U110000'
ERROR: ParseError:
# Error @ REPL[17]:1:2
'\U110000'
#└──────┘ ── invalid unicode escape sequence

isvalid(Char(0x110000)) is false, but other invalid characters are parsed okay:

julia> '\ud800'
'\ud800': Unicode U+D800 (category Cs: Other, surrogate)

julia> isvalid('\ud800')
false

so this seems kind of inconsistent.

Options are either (a) change the printing of Char(0x110000) or (b) change the parsing to allow this. I lean towards (a). Thoughts?
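
For reference, the round-trip property at stake can be checked with a small helper. This is only a sketch: `roundtrips` is a hypothetical name, not a Base function, and it relies on `Meta.parse(str; raise=false)` returning an `Expr` instead of throwing when parsing fails.

```julia
# Hypothetical helper (not part of Base): does repr(c) parse back to the same Char?
function roundtrips(c::Char)
    # With raise=false a parse failure yields an Expr, which is never === a Char.
    parsed = Meta.parse(repr(c); raise=false)
    return parsed === c
end

roundtrips('a')               # an ordinary valid character
roundtrips('\ud800')          # invalid (surrogate), but shown above to parse
roundtrips(Char(0x110000))    # the case reported as failing in this issue
```

Whichever option is chosen, the goal would be for the last call to return `true`.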

Seelengrab · 2024-05-08T08:48:28Z

I think this is a bug in the parser. What would the printing be changed to so that it parses? Just using \u doesn't work, because \u accepts at most four hex digits, so the literal is too large:

julia> '\u11000'
ERROR: ParseError:
# Error @ REPL[27]:1:2
'\u11000'
#└─────┘ ── character literal contains multiple characters
Stacktrace:
 [1] top-level scope
   @ REPL:1

stevengj · 2024-05-08T12:39:36Z

The printing could be changed to '\xf4\x90\x80\x80', by calling Base.show_invalid, for example. ('\U110000' is a lot more understandable, but is meaningless from the perspective of Unicode.)
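
To illustrate where those bytes come from, here is a sketch that builds the same `\x` escape from the character's code units. `byte_escape` is a hypothetical helper written for this example. It does not call `Base.show_invalid`, which is internal.

```julia
# Hypothetical helper: render a Char as a quoted sequence of \x byte escapes.
function byte_escape(c::Char)
    bytes = collect(codeunits(string(c)))   # the raw bytes Julia stores for c
    body = join("\\x" * string(b, base=16, pad=2) for b in bytes)
    return "'" * body * "'"
end

c = Char(0x110000)
collect(codeunits(string(c)))   # UInt8[0xf4, 0x90, 0x80, 0x80]
println(byte_escape(c))         # prints '\xf4\x90\x80\x80'
```

The four bytes follow the usual 4-byte UTF-8 bit layout applied to 0x110000, even though that value lies beyond the last Unicode code point, U+10FFFF.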

It could also print as Char(0x110000), but that's a pretty radical change from how other characters are printed.

If we extend the parser to allow this, I guess we would parse up to '\U1fffff', since Char(0x200000) throws an error. That seems reasonable to me, since there is still a clear upper bound on what we should parse.
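
The bound mentioned above can be checked directly. A minimal sketch, where `constructible` is a hypothetical helper that reports whether `Char(u)` succeeds:

```julia
# Hypothetical helper: true if Char(u) can be constructed, false if it throws.
function constructible(u::Integer)
    try
        Char(u)
        return true
    catch
        return false
    end
end

constructible(0x10ffff)   # true: the last valid Unicode code point
constructible(0x1fffff)   # true: constructible, though isvalid is false
constructible(0x200000)   # false: Char(0x200000) throws
```

Under option (b), every value for which `constructible` returns `true` would presumably also be accepted as a `\U` literal.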

Seelengrab · 2024-05-08T14:30:27Z

The manual has that exact value as an example, and documents that \U may be followed by up to 8 hexadecimal digits, so I'd be in favor of fixing the parser.

stevengj added the labels domain:unicode, parser, and domain:display and printing · May 8, 2024
