Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update grammar.y #43

Open
6 tasks
perillo opened this issue Dec 26, 2022 · 12 comments
Open
6 tasks

Update grammar.y #43

perillo opened this issue Dec 26, 2022 · 12 comments

Comments

@perillo
Copy link
Contributor

perillo commented Dec 26, 2022

Currently, grammar.y is out of sync with the actual grammar implemented by the zig compiler (in std/zig/tokenizer and std/zig/parser.zig).

These are the places where grammar.y and zig diverges, found with the improved parser from #42, using files in src and lib/std in the zig repository.

  • doc-comments not allowed in top-level comptime and test declaration.

    /// doc-comment for comptime.
    comptime {
        var a = 1;
        _ = a;
    }
    
    /// doc-comment for test.
    test {}
  • Saturating arithmetic is not supported by grammar.y.

    test {
        var a: isize = 10;
    
        a +|= 1;
        a -|= 1;
        a *|= 1;
        a <<|= 1;
    
        const b: isize = a +| 1;
        const c: isize = b -| 1;
        const d: isize = c *| 1;
        const e: isize = d <<| 1;
        _ = e;
    }
  • Mixed doc-comment and line-comment is not supported by grammar.y.

    // A doc-comment followed by a line-comment is not supported by grammar.y.
    const S = struct {
        //! doc
        /// doc
        // doc
        a: i32,
    };

    NOTE: I found mixing doc-comment and line-comment confusing, and autodoc doesn't not handle them correctly.

    Examples:

    • std/mem.zig:3760

      /// Force an evaluation of the expression; this tries to prevent
      /// the compiler from optimizing the computation away even if the
      /// result eventually gets discarded.
      // TODO: use @declareSideEffect() when it is available - https://github.com/ziglang/zig/issues/6168
      pub fn doNotOptimizeAway(val: anytype) void {

      See: https://ziglang.org/documentation/master/std/#root;mem.doNotOptimizeAway.

    • std/coff.zig:354

      /// This relocation is meaningful only when the machine type is ARM or Thumb.
      /// The base relocation applies the 32-bit address of a symbol across a consecutive MOVW/MOVT instruction pair.
      // ARM_MOV32 = 5,
      
      /// This relocation is only meaningful when the machine type is RISC-V.
      /// The base relocation applies to the high 20 bits of a 32-bit absolute address.
      // RISCV_HIGH20 = 5,
      
      /// Reserved, must be zero.
      RESERVED = 6,

      See https://ziglang.org/documentation/master/std/#root;coff.BaseRelocationType.

  • New addrspace keyword.

    Commit: ziglang/zig@ccc7f9987 (Address spaces: addrspace(A) parsing)
    Date: 2021-09-14

    test {
        const y: *allowzero align(8) addrspace(.generic) const volatile u32 = undefined;
        _ = y;
    }
    
  • Inline switch prong not supported by grammar.y.

    Commit: ziglang/zig@b4d81857f (stage1+2: parse inline switch cases)
    Date: 2022-02-13

    const std = @import("std");
    const expect = std.testing.expect;
    
    const SliceTypeA = extern struct {
        len: usize,
        ptr: [*]u32,
    };
    const SliceTypeB = extern struct {
        ptr: [*]SliceTypeA,
        len: usize,
    };
    const AnySlice = union(enum) {
        a: SliceTypeA,
        b: SliceTypeB,
        c: []const u8,
        d: []AnySlice,
    };
    
    fn withSwitch(any: AnySlice) usize {
        return switch (any) {
            // With `inline else` the function is explicitly generated
            // as the desired switch and the compiler can check that
            // every possible case is handled.
            inline else => |slice| slice.len,
        };
    }
    
    test "inline else" {
        var any = AnySlice{ .c = "hello" };
        try expect(withSwitch(any) == 5);
    }
  • New packed struct syntax.

    Commit: ziglang/zig@6249a24e8 (stage2: integer-backed packed structs)
    Date: 2022-02-23

    pub const AbsolutePointerModeAttributes = packed struct(u32) {
        supports_alt_active: bool,
        supports_pressure_as_z: bool,
        _pad: u30 = 0,
    };
@Vexu Vexu closed this as completed in 28673ec Dec 26, 2022
@perillo
Copy link
Contributor Author

perillo commented Dec 26, 2022

@Vexu, the grammar still fails with some source file in the zig project.

Using the script https://gist.github.com/perillo/a1a731d234b425597a70412a7343b772, these are the errors:

compiler source

./parse.sh ../../zig/src/**/*.zig:

parsing 128 files...
FAIL: ../../zig/src/arch/wasm/CodeGen.zig: 1
<stdin>:661: syntax error
/// 

FAIL: ../../zig/src/link/SpirV.zig: 1
<stdin>:19: syntax error
// 

std source

./parse.sh ../../zig/lib/std/**/*.zig:

parsing 479 files...
FAIL: ../../zig/lib/std/coff.zig: 1
<stdin>:356: syntax error
    /// 

FAIL: ../../zig/lib/std/comptime_string_map.zig: 1
<stdin>:99: syntax error
    const KV = struct { [

FAIL: ../../zig/lib/std/fmt/parse_float/convert_hex.zig: 1
<stdin>:3: syntax error

FAIL: ../../zig/lib/std/http/method.zig: 1
<stdin>:4: syntax error
// 

FAIL: ../../zig/lib/std/meta.zig: 1
<stdin>:118: syntax error
            const EnumKV = struct { [

In the previous list I forgot to add the new tuple type declaration.

Thanks.

@Vexu
Copy link
Member

Vexu commented Dec 27, 2022

Done in 3aefd84

@perillo
Copy link
Contributor Author

perillo commented Dec 27, 2022

@Vexu, found more errors when checking the zig test suite:

Compiler tests

./check_parser.sh ../../zig/test/**/*.zig:

running 1456 tests...
FAIL: ../../zig/test/behavior/decltest.zig: zig: 0, grammar: 1
<stdin>:5: syntax error
test t
FAIL: ../../zig/test/behavior/tuple_declarations.zig: zig: 0, grammar: 1
<stdin>:13: syntax error
        const T = struct { comptime u32 a
FAIL: ../../zig/test/cases/compile_errors/Issue_6823_dont_allow_._to_be_followed_by_.zig: zig: 1, grammar: 0
FAIL: ../../zig/test/cases/compile_errors/alignment_of_enum_field_specified.zig: zig: 0, grammar: 1
<stdin>:3: syntax error
    b a
FAIL: ../../zig/test/cases/compile_errors/attempted_double_ampersand.zig: zig: 1, grammar: 0
FAIL: ../../zig/test/cases/compile_errors/chained_comparison_operators.zig: zig: 1, grammar: 0
FAIL: ../../zig/test/cases/compile_errors/invalid_empty_unicode_escape.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
    const a = '\u{}
FAIL: ../../zig/test/cases/compile_errors/invalid_exponent_in_float_literal-1.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
    var bad: f128 = 0x1.0p1ab
FAIL: ../../zig/test/cases/compile_errors/invalid_exponent_in_float_literal-2.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
    var bad: f128 = 0x1.0p50F
FAIL: ../../zig/test/cases/compile_errors/invalid_underscore_placement_in_float_literal-10.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
    var bad: f128 = 1.0__
FAIL: ../../zig/test/cases/compile_errors/invalid_underscore_placement_in_float_literal-11.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
    var bad: f128 = 1.0e-1__
FAIL: ../../zig/test/cases/compile_errors/invalid_underscore_placement_in_float_literal-12.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
    var bad: f128 = 1_x
FAIL: ../../zig/test/cases/compile_errors/invalid_underscore_placement_in_float_literal-13.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
    var bad: f128 = 0x_
FAIL: ../../zig/test/cases/compile_errors/invalid_underscore_placement_in_float_literal-14.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
    var bad: f128 = 0x0.0_p
FAIL: ../../zig/test/cases/compile_errors/invalid_underscore_placement_in_float_literal-2.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
    var bad: f128 = 1_.
FAIL: ../../zig/test/cases/compile_errors/invalid_underscore_placement_in_float_literal-3.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
    var bad: f128 = 0.0_;
FAIL: ../../zig/test/cases/compile_errors/invalid_underscore_placement_in_float_literal-4.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
    var bad: f128 = 1.0e_
FAIL: ../../zig/test/cases/compile_errors/invalid_underscore_placement_in_float_literal-5.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
    var bad: f128 = 1.0e+_
FAIL: ../../zig/test/cases/compile_errors/invalid_underscore_placement_in_float_literal-6.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
    var bad: f128 = 1.0e-_
FAIL: ../../zig/test/cases/compile_errors/invalid_underscore_placement_in_float_literal-7.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
    var bad: f128 = 1.0e-1_;
FAIL: ../../zig/test/cases/compile_errors/invalid_underscore_placement_in_float_literal-9.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
    var bad: f128 = 1__
FAIL: ../../zig/test/cases/compile_errors/invalid_underscore_placement_in_int_literal-1.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
    var bad: u128 = 10_;
FAIL: ../../zig/test/cases/compile_errors/invalid_underscore_placement_in_int_literal-2.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
    var bad: u128 = 0b0010_;
FAIL: ../../zig/test/cases/compile_errors/invalid_underscore_placement_in_int_literal-3.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
    var bad: u128 = 0o0010_;
FAIL: ../../zig/test/cases/compile_errors/invalid_underscore_placement_in_int_literal-4.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
    var bad: u128 = 0x0010_;
FAIL: ../../zig/test/cases/compile_errors/missing_digit_after_base.zig: zig: 0, grammar: 1
<stdin>:2: syntax error
    const x = @as(usize, -0x)
FAIL: ../../zig/test/cases/compile_errors/reject_extern_function_definitions_with_body.zig: zig: 1, grammar: 0
FAIL: ../../zig/test/cases/double_ampersand.0.zig: zig: 1, grammar: 0

Most of the errors are invalid literals; however an interesting case is:

pub fn the_add_function(a: u32, b: u32) u32 {
    return a + b;
}

test the_add_function {
    if (the_add_function(1, 2) != 3) unreachable;
}

decltest was added in ziglang/zig@3bbe6a28e.

Thanks

@Vexu
Copy link
Member

Vexu commented Dec 27, 2022

Done in 11ed032, I'm not sure whether this grammar should allow valid numbers or not so I didn't touch them.

@perillo
Copy link
Contributor Author

perillo commented Dec 27, 2022

In addition to number literals, the zig parser seems to be able to do additional analysis compared to grammar.y, in order to handle ambiguous cases.

./check_parser.sh ../../zig/test/**/*.zig | grep "zig: 1":

FAIL: ../../zig/test/cases/compile_errors/Issue_6823_dont_allow_._to_be_followed_by_.zig: zig: 1, grammar: 0
    var sequence = "repeat".*** 10;

FAIL: ../../zig/test/cases/compile_errors/attempted_double_ampersand.zig: zig: 1, grammar: 0
    if (a && b) {
        return 1234;
    }

FAIL: ../../zig/test/cases/compile_errors/chained_comparison_operators.zig: zig: 1, grammar: 0
    return 1 < value < 1000

FAIL: ../../zig/test/cases/compile_errors/reject_extern_function_definitions_with_body.zig: zig: 1, grammar: 0
    extern "c" fn definitelyNotInLibC(a: i32, b: i32) i32 {
        return a + b;
    }

FAIL: ../../zig/test/cases/double_ampersand.0.zig: zig: 1, grammar: 0
    pub const a = if (true && false) 1 else 2;

@Vexu
Copy link
Member

Vexu commented Dec 27, 2022

Those shouldn't be in the grammar IMO.

@perillo
Copy link
Contributor Author

perillo commented Dec 27, 2022

Do you think they should be documented in the Zig Language Reference?

@Vexu
Copy link
Member

Vexu commented Dec 27, 2022

I don't think it's strictly necessary but if you want to mention it somewhere then go ahead.

@perillo
Copy link
Contributor Author

perillo commented Jan 9, 2023

@Vexu, in order to check the PEG parser I wrote a tool (in Go, since it was more convenient) that downloads up to 1000 repositories using the GitHub API, and check each .zig file with the check_parser.sh script.

For now I found these issues:

  • A Windows style line ending is not supported

    const std = @import("std")^M

    https://github.com/ziglibs/zig-lv2/blob/master/examples/fifths/fifths.zig

    check https://github.com/ziglibs/zig-lv2/blob/main/examples/fifths/fifths.zig : error: exit status 1
    running 1 tests...
    FAIL: /data/cache/zig-package-index/346899503/examples/fifths/fifths.zig: zig: 0, grammar: 1
    <stdin>:1: syntax error
    const std = @import("std");^M
                               ^
    
  • Official grammar does not support the UTF-8 BOM

    https://github.com/MasterQ32/zero-graphics/blob/master/src/gl_es_2v0.zig

    check https://github.com/MasterQ32/zero-graphics/blob/master/src/gl_es_2v0.zig : error: exit status 1
    running 1 tests...
    FAIL: /data/cache/zig-package-index/368337113/src/gl_es_2v0.zig: zig: 0, grammar: 1
    <stdin>:1: syntax error
    <feff>//
    ^
    

    NOTE: <feff> is Vim bomb.

  • Alignment using tabs is not supported

    test {
        var window = try capy.Window.init();
        try window.set(capy.Column(.{}, .{
            capy.Label(.{ .text = "Balls with attraction and friction" }),
            capy.Label(.{ })
                    .bind("text", totalEnergyFormat),
                    capy.Align(.{}, &canvas),
        }));
    }

    There are two tabs before capy.Align(.{}, &canvas),

    https://github.com/capy-ui/capy/blob/master/examples/balls.zig

    check https://github.com/capy-ui/capy/blob/master/examples/balls.zig : error: exit status 1
    running 1 tests...
    FAIL: /data/cache/zig-package-index/351056766/examples/balls.zig: zig: 0, grammar: 1
    <stdin>:58: syntax error
    		capy.Align(.{}, &canvas),
    ^
    
  • PEG parser crash

    https://github.com/marlersoft/zigwin32/blob/master/win32/everything.zig

    check marlersoft/zigwin32:everything.zig: error: exit status 1
    running 1 tests...
    FAIL: /data/cache/zig-package-index/333504729/win32/everything.zig: zig: 0, grammar: 139
    

    The culprit is probably the source file size: 20.6 MB

@perillo
Copy link
Contributor Author

perillo commented Jan 9, 2023

Found other issues:

  • Not sure where is the problem

    pub const DmaTuple = struct { DmaController(0), DmaController(1), DmaController(2), DmaController(3) };

    https://github.com/paoda/zba/blob/main/src/core/bus/dma.zig

    check https://github.com/paoda/zba/blob/main/src/core/bus/dma.zig : error: exit status 1
    running 1 tests...
    FAIL: /data/cache/zig-package-index/464707106/src/core/bus/dma.zig: zig: 0, grammar: 1
    <stdin>:8: syntax error
    pub const DmaTuple = struct { DmaController(0), DmaController(1), DmaController(2), DmaController(3) };
                                               ^
    
  • Official grammar does not report an error for "binary operator xxx has whitespace on one side, but not the other"

    const a = 1 +2;

    https://github.com/wapc/wapc-guest-zig/blob/master/wapc.zig

    check https://github.com/wapc/wapc-guest-zig/blob/master/wapc.zig : error: exit status 1
    running 1 tests...
    FAIL: /data/cache/zig-package-index/207880856/wapc.zig: zig: 1, grammar: 0
    

    Not sure if this is responsibility of the PEG grammar.

Here is the full report: https://gist.github.com/perillo/c33e79e6d95efabaf302ef3c5d5907ad

@anatol
Copy link

anatol commented Jan 23, 2023

Thank you for taking care of the grammar. A few observations:

  • I see that ContainerDeclarations is declared with right-hand recursion and an additional empty rule. Would it be more readable to refactor the rule and replace it with *, like this?
diff --git a/grammar/grammar.y b/grammar/grammar.y
index ec08094..c5c47be 100644
--- a/grammar/grammar.y
+++ b/grammar/grammar.y
@@ -1,13 +1,12 @@
 Root <- skip container_doc_comment? ContainerMembers eof
 
 # *** Top level ***
-ContainerMembers <- ContainerDeclarations (ContainerField COMMA)* (ContainerField / ContainerDeclarations)
+ContainerMembers <- ContainerDeclaration* (ContainerField COMMA)* (ContainerField / ContainerDeclaration*)
 
-ContainerDeclarations
-    <- TestDecl ContainerDeclarations
-     / ComptimeDecl ContainerDeclarations
-     / doc_comment? KEYWORD_pub? Decl ContainerDeclarations
-     /
+ContainerDeclaration
+    <- TestDecl
+     / ComptimeDecl
+     / doc_comment? KEYWORD_pub? Decl
  • ascii_char_not_nl_slash_squote rule specifies a weird range (see multiple dashes) [\000-\011\013-\046-\050-\133\135-\177] - could it be clarified?

@McSinyx
Copy link
Contributor

McSinyx commented Apr 3, 2024

I see that ContainerDeclarations is declared with right-hand recursion and an additional empty rule. Would it be more readable to refactor the rule and replace it with *

Seconded. Also IMHO container_doc_comment? should be part of ContainerMembers so the container declaration expression needs not repeat it:

ContainerDeclAuto <- ContainerDeclType LBRACE container_doc_comment? ContainerMembers RBRACE

ascii_char_not_nl_slash_squote rule specifies a weird range (see multiple dashes) [\000-\011\013-\046-\050-\133\135-\177] - could it be clarified?

It skips \012 (newline), \047 (single quote) and \134 (backslash). Perhaps the name should be clarified that it's backslash instead of slash and sort them in order? BTW the dash in the middle was a typo and fixed in GH-52.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants