Re: Syntax error messages
From: Christian Schoenebeck
Subject: Re: Syntax error messages
Date: Fri, 01 Oct 2021 23:30:39 +0200
On Friday, 1 October 2021 09:37:52 CEST Hans Åberg wrote:
> > On 28 Sep 2021, at 14:10, Christian Schoenebeck
> > <schoenebeck@crudebyte.com> wrote:
> >
> > On Monday, 27 September 2021 22:07:33 CEST Hans Åberg wrote:
> >>>> In order to generate better syntax error messages writing out the input
> >>>> line with the error and a line with a marker underneath, I thought of
> >>>> checking how Bison does it, but I could not find the place in its
> >>>> sources. Specifically, a suggestion is to tweak YY_INPUT in the lexer
> >>>> to buffer one input line at a time, but Bison does not seem to do
> >>>> that.
> >>>
> >>> No, I keep track of the byte offset in the file, and print from the
> >>> file, which I reopen to quote the source.
> >>
> >> OK. I thought of this method, but then it does not work with streams.
> >
> > In the past at least, built-in location support did not work well for
> > me. So I'm usually overriding the location data type and behaviour with
> > a custom type declaration, plus an implementation on the lexer side.
> >
> > I also prefer this data type presentation:
> >
> > // custom Bison location type to support raw byte positions
> > struct _YYLTYPE {
> >     int first_line;
> >     int first_column;
> >     int last_line;
> >     int last_column;
> >     int first_byte;
> >     int length_bytes;
> > };
> > #define YYLTYPE _YYLTYPE
> > #define YYLTYPE_IS_DECLARED 1
> >
> > // override Bison's default location passing to support raw byte positions
> > #define YYLLOC_DEFAULT(Cur, Rhs, N)                                  \
> > do                                                                   \
> >   if (N)                                                             \
> >     {                                                                \
> >       (Cur).first_line   = YYRHSLOC(Rhs, 1).first_line;              \
> >       (Cur).first_column = YYRHSLOC(Rhs, 1).first_column;            \
> >       (Cur).last_line    = YYRHSLOC(Rhs, N).last_line;               \
> >       (Cur).last_column  = YYRHSLOC(Rhs, N).last_column;             \
> >       (Cur).first_byte   = YYRHSLOC(Rhs, 1).first_byte;              \
> >       (Cur).length_bytes = (YYRHSLOC(Rhs, N).first_byte -            \
> >                             YYRHSLOC(Rhs, 1).first_byte) +           \
> >                            YYRHSLOC(Rhs, N).length_bytes;            \
> >     }                                                                \
> >   else                                                               \
> >     {                                                                \
> >       (Cur).first_line   = (Cur).last_line   =                       \
> >         YYRHSLOC(Rhs, 0).last_line;                                  \
> >       (Cur).first_column = (Cur).last_column =                       \
> >         YYRHSLOC(Rhs, 0).last_column;                                \
> >       (Cur).first_byte   = YYRHSLOC(Rhs, 0).first_byte;              \
> >       (Cur).length_bytes = YYRHSLOC(Rhs, 0).length_bytes;            \
> >     }                                                                \
> > while (0)
> >
> > Because sometimes you need the high-level column & line span, and
> > sometimes you rather need the low-level raw byte position & byte length
> > in the input data stream.
>
> For the purpose of writing out the line in the error messages, this method
> (using C++) did not work out well, because I have two parsers, one for the
> language and one for directives, and it turns out to be difficult to pass
> the location information back to the top parser.
>
> So instead, in addition to the input stream stack, I added two, for the
> current stream position, and the current stream line position. Because of
> the lexer buffering, they are computed in the lexer. These are properties
> attached to the input streams then, not the parser locations.
>
> In the Bison type, I use line number and for columns the number of UTF-8
> characters. An ASCII caret marking the error is surprisingly accurate even
> in the presence of non-ASCII characters. But perhaps one should have a
> method to mark it on the line itself, not underneath.
Hmm, do those two parsers run independently of each other, or have you
coupled them so that they influence each other's behaviour *while* both
are still running?

So far I have not encountered any restriction with my location approach. I
use it for all kinds of things: warnings and errors on the CLI, of course,
highlighting of the same in code editors, and also code refactoring. The
latter only works well with a fully language-aware parser, unlike the
typical RegEx hacks.
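
To make the CLI case concrete, here is a minimal sketch of how a location
carrying first_byte and length_bytes could drive the "source line plus marker
underneath" output this thread started with. It is an illustration only, not
code from any real project: Loc mirrors the struct quoted above, while
format_diagnostic(), is_cont() and put() are hypothetical helper names. It
also assumes the whole input is available in memory as UTF-8; with a pure
stream you would have to buffer the current line instead.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same fields as the custom location type quoted above. */
typedef struct {
    int first_line, first_column;
    int last_line, last_column;
    int first_byte, length_bytes;
} Loc;

/* True for UTF-8 continuation bytes (10xxxxxx), which occupy no column. */
static int is_cont(char c) { return ((unsigned char)c & 0xC0) == 0x80; }

/* Append one char, always leaving room for the terminating NUL. */
static void put(char *out, size_t cap, size_t *len, char c)
{
    if (*len + 1 < cap)
        out[(*len)++] = c;
}

/* Hypothetical helper: writes "LINE:COL: msg", the offending source line,
 * and a marker line ('^' followed by '~') into out, truncating if needed. */
static void format_diagnostic(const char *src, size_t src_len, const Loc *loc,
                              const char *msg, char *out, size_t cap)
{
    assert(src && loc && msg && out && cap > 0);

    /* Clamp the byte offset, then find the enclosing line [begin, end). */
    size_t first = (size_t)loc->first_byte;
    if (first > src_len)
        first = src_len;
    size_t begin = first, end = first;
    while (begin > 0 && src[begin - 1] != '\n')
        begin--;
    while (end < src_len && src[end] != '\n')
        end++;

    /* Clip the marked span to the end of that line. */
    size_t span = loc->length_bytes > 0 ? (size_t)loc->length_bytes : 0;
    size_t stop = span > end - first ? end : first + span;

    int n = snprintf(out, cap, "%d:%d: %s\n",
                     loc->first_line, loc->first_column, msg);
    size_t len = n < 0 ? 0 : ((size_t)n < cap ? (size_t)n : cap - 1);

    /* Quote the source line. */
    for (size_t i = begin; i < end; i++)
        put(out, cap, &len, src[i]);
    put(out, cap, &len, '\n');

    /* Pad with one blank per UTF-8 character; keep tabs so columns align. */
    for (size_t i = begin; i < first; i++)
        if (!is_cont(src[i]))
            put(out, cap, &len, src[i] == '\t' ? '\t' : ' ');
    put(out, cap, &len, '^');
    for (size_t i = first + 1; i < stop; i++)
        if (!is_cont(src[i]))
            put(out, cap, &len, '~');
    put(out, cap, &len, '\n');
    out[len] = '\0';
}
```

A caller would pass the input buffer, its length, and the location of the
offending token, then print the resulting string to stderr. Padding is counted
per UTF-8 character rather than per byte, so the marker lines up in a
monospace terminal for most text; wide or combining characters would still
need extra handling.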
Best regards,
Christian Schoenebeck