Version 12 (modified by simonmar@…, 7 years ago) (diff) 

Proposal: FixityResolution
Ticket  #30 
Dependencies  
Related  NegativeSyntax 
Compiler support
GHC  full (no flag) 
nhc98  full (no flag) 
Hugs  full (no flag) 
UHC  full (no flag) 
JHC  full (no flag) 
LHC  full (no flag) 
Summary
Change the way fixity resolution is specified in the Report. Remove fixity resolution from the contextfree grammar, and specify it separately. This does change the syntax slightly by eliminating some impossibletoparsecorrectly cases, but in practice all existing compilers already implement the proposed syntax, not the current syntax.
Description
This is a proposal that doesn't affect anything except the presentation of the report, because all Haskell implementations currently do it this way anyway. There are some legal Haskell programs, according to the report, that aren't accepted by current implementations (see below). I propose we make the language specification match the implementations.
The Problem
The Haskell 98 contextfree syntax includes fixity resolution as part of the grammar, with a fixed number of fixities ([1..9]), essentially using macro expansion to define the grammar. Apart from being really ugly, and not corresponding to any known implementation (long ago GHC used a yacc parser formulated like this, but not any more), this gives rise to some constructions that are very hard to parse correctly. For example, the expression:
\x > x == x == True
should be legal, and parse as (\x > x == x) == True. The reason is that == is nonfix (assuming the == in scope is from the Prelude), and this combined with the rule that says "lambda expressions extend as far to the right as possible", forces the lambda expression to end before the second ==.
No known Haskell compiler actually parses this correctly. Admittedly it's a type error, but that's not the point  the grammar should be parsable.
Furthermore:
do a == b == c
This is legal syntax, and parses as do { a == b } == c, again because == is nonfix, and the second == closes the layout context by virtue of the layout rule's parse error condition.
As far as I know, no Haskell compiler gets this right.
The Solution
These constructions rely on the parser being aware of operator fixities during parsing, which is an unreasonable restriction  it's often easier to follow imports after parsing a file, rather than during it.
Hence, I propose that we remove operator fixity resolution from the contextfree grammar, and specify it separately. The contextfree grammar would parse applications of infix operators as a flat list to be resolved separately. The examples above would parse but fail during fixity resolution.
This would also let us expand the current paultry set of 9 fixity levels to an arbitrary limit, if we so wished (but that introduces backwards compatibility issues).
Sample code for resolving fixities is here: resolve.hs. I believe this implements Haskell 98 fixity resolution, without prefix negation (prefix negation could be added, but we might not need to; see NegativeSyntax). The core of the parser is 12 lines of code, with a few lines of datatype declarations and prettyprinting. Note that there is no upper limit on precedence levels, but there is a lower limit of zero. This code could serve as the basis for specifying fixity resolution in the Haskell' report.
References
Report Delta
Section 3 (Expressions)
remove
In the syntax that follows, there are some families of nonterminals indexed by precedence levels (written as a superscript). Similarly, the nonterminals op, varop, and conop may have a double index: a letter l, r, or n for left, right or nonassociativity and a precedence level. A precedencelevel variable i ranges from 0 to 9; an associativity variable a varies over {l, r, n}. For example
aexp > ( expi+1 qop(a,i) )
actually stands for 30 productions, with 10 substitutions for i and 3 for a.
replace
exp > exp0 :: [context =>] type (expression type signature)  exp0 expi > expi+1 [qop(n,i) expi+1]  lexpi  rexpi lexpi > (lexpi  expi+1) qop(l,i) expi+1 lexp6 >  exp7 rexpi > expi+1 qop(r,i) (rexpi  expi+1)
with
exp > infixexp :: [context =>] type (expression type signature)  infixexp infixexp > exp10 qop infixexp   infixexp  exp10
replace
 ( expi+1 qop(a,i) ) (left section)  ( lexpi qop(l,i) ) (left section)  ( qop(a,i)<> expi+1 ) (right section)  ( qop(r,i)<> rexpi ) (right section)
with
 ( infixexp qop ) (left section)  ( qop<> infixexp ) (right section)
replace
Expressions involving infix operators are disambiguated by the operator's fixity (see Section 4.4.2). Consecutive unparenthesized operators with the same precedence must both be either left or right associative to avoid a syntax error. Given an unparenthesized expression "x qop(a,i) y qop(b,j) z", parentheses must be added around either "x qop(a,i) y" or "y qop(b,j) z" when i=j unless a=b=l or a=b=r.
with
Expressions involving infix operators are disambiguated by the operator's fixity (see Section 4.4.2). Consecutive unparenthesized operators with the same precedence must both be either left or right associative to avoid a syntax error. Given an unparenthesized expression "x qop(a,i) y qop(b,j) z" (where "qop(a,i)" means an operator with associativity "a" and precedence "i"), parentheses must be added around either "x qop(a,i) y" or "y qop(b,j) z" when i=j unless a=b=l or a=b=r.
An example algorithm for resolving expressions involving infix operators is given in Section 9.6.
remove
A note about parsing. Expressions that involve the interaction of fixities with the let/lambda metarule may be hard to parse. For example, the expression
let x = True in x == x == True
cannot possibly mean
let x = True in (x == x == True)
because (==) is a nonassociative operator; so the expression must parse thus:
(let x = True in (x == x)) == True
However, implementations may well use a postparsing pass to deal with fixities, so they may well incorrectly deliver the former parse. Programmers are advised to avoid constructs whose parsing involves an interaction of (lack of) associativity with the let/lambda metarule.
replace
For the sake of clarity, the rest of this section shows the syntax of expressions without their precedences.
with
For the sake of clarity, the rest of this section will assume that expressions involving infix operators have been resolved according to the fixities of the operators.
Section 3.5 (Sections)
replace the grammar, as above. The text is still correct.
Section 9 (Syntax reference)
Make the corresponding changes as for Section 3.
Section 9.6, Resolving expressions involving infix operators
New subsection
The following is an example implementation of fixity resolution for Haskell expressions. The function @resolve@ takes a list consisting of alternating expressions and operators (an instance of the @infixexp@ nonterminal in the contextfree grammar), and returns either @Just e@ where @e@ is the resolved expression, or @Nothing@ if the input does not represent a valid expression.
data Op = Op String Prec Fixity deriving Eq data Fixity = Leftfix  Rightfix  Nonfix deriving Eq type Prec = Int data Exp = Var Var  OpApp Exp Op Exp  Neg Exp deriving Eq type Var = String data Tok = TExp Exp  TOp Op  TNeg resolve :: [Tok] > Maybe Exp resolve tokens = fmap fst $ parseNeg (Op "" (1) Nonfix) tokens where parseNeg :: Op > [Tok] > Maybe (Exp,[Tok]) parseNeg op1 (TExp e1 : rest) = parse op1 e1 rest parseNeg op1 (TNeg : rest) = do guard (prec1 < 6) (r, rest') < parseNeg (Op "" 6 Leftfix) rest parse op1 (Neg r) rest' where Op _ prec1 fix1 = op1 parse :: Op > Exp > [Tok] > Maybe (Exp, [Tok]) parse _ e1 [] = Just (e1, []) parse op1 e1 (TOp op2 : rest)  case (1): check for illegal expressions  prec1 == prec2 && (fix1 /= fix2  fix1 == Nonfix) = Nothing  case (2): op1 and op2 should associate to the left  prec1 > prec2  (prec1 == prec2 && fix1 == Leftfix) = Just (e1, TOp op2 : rest)  case (3): op1 and op2 should associate to the right  otherwise = do (r,rest') < parseNeg op2 rest parse op1 (OpApp e1 op2 r) rest' where Op _ prec1 fix1 = op1 Op _ prec2 fix2 = op2
The algorithm works as follows. At each stage we have a call
parse op1 E1 (op2 : tokens)
which means that we are looking at an expression like
E0 `op1` E1 `op2` ... (*1)
(the caller holds E0). The job of @parse@ is to build the expression to the right of @op1@, returning the expression and any remaining input.
There are three cases to consider:
(1) if @op1@ and @op2@ have the same precedence, but they do not have the same associativity, or they are declared to be nonfix, then the expression is illegal.
(2) If @op1@ has a higher precedence than @op2@, or @op1@ and @op2@ should leftassociate, then we know that the expression to the right of @op1@ is @E1@, so we return this to the caller.
(3) Otherwise, we know we want to build an expression of the form @E1 op2 R@. To find R, we call @parseNeg op2 tokens@ to compute the expression to the right of @op2@, namely @R@ (more about @parseNeg@ below, but essentially if @tokens@ is of the form @(E2 : rest)@, then this is equivalent to @parse op2 E2 rest@). Now, we have
E0 op1 (E1 op2 R) op3 …
where @op3@ is the next operator in the input. This is an instance of (*1) above, so to continue we call parse, with the new E1 == (E1 op2 R)
To initialise the algorithm, we set @op1@ to be an imaginary operator with precedence lower than anyything else. Hence @parse@ will consume the whole input, and return the resulting expression.
The handling of the prefix negation operator, @@, complicates matters only slightly. Recall that prefix negation has the same fixity as infix negation: leftassociative with precedence 6. The operator to the left of @@, if there is one, must have precedence lower than 6 for the expression to be legal. The negation operator itself may leftassociate with operators of the same fixity (e.g. @+@). So for example @a + b@ is legal and resolves as @(a) + b), but @a + b@ is illegal.
The function @parseNeg@ handles prefix negation. If we encounter a negation operator, and it is legal in this position (the operator to the left has precedence lower than 6), then we proceed in a similar way to case (3) above: compute the argument to '' by recursively calling @parseNeg@, and then continue by calling @parse@.
Note that this algorithm is insensitive to the range of precedences. There is no reason in principle that Haskell should be limited to integral precedences in the range 1 to 10; a larger range, or fractional values, would present no additional implementation difficulties.
Attachments (2)

resolve.hs
(1.4 KB) 
added by simonmar@… 9 years ago.
sample code for resolving fixity

resolve.2.hs
(4.9 KB) 
added by simonmar@… 7 years ago.
new version of the resolver, matching the report delta, with some test code
Download all attachments as: .zip