Changes between Version 1 and Version 2 of UnicodeInHaskellSource

Jan 25, 2006 2:10:27 PM (12 years ago)
Simon Marlow


The Haskell 98 Report (Lexical Structure) claims that Haskell source code uses the [wiki:Unicode] character set.
== Current support for Unicode in source files ==
Haskell source code is stored in text files using various character sets and encodings.
If Unicode were allowed, how would implementations know which encoding was used?
 * Jhc allows unrestricted use of the Unicode character set in Haskell source, treating input as UTF-8. Several Unicode characters are accepted in place of Haskell keywords and reserved symbols:
    * '→' ('\x2192') is equivalent to '->'
   In addition there is experimental support for defining new operators and names using various Unicode characters.
 * Hugs treats input as being in the encoding specified by the current locale, but permits Unicode only in comments and in character and string literals.
 * GHC now (as of early Jan 2006) interprets source files as UTF-8.  In {{{-fglasgow-exts}}} mode the above special symbols are interpreted as in Jhc, and additionally the symbol 'λ' is interpreted as lambda.  GHC knows the character classifications of all Unicode characters via the Data.Char library, and can therefore understand identifiers written using alphanumeric characters from any language (but see below for a note about caseless character sets).
 * Others treat source code as ISO 8859-1 (Latin-1).
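
The symbol equivalence described for Jhc and GHC can be pictured as a rewrite from the Unicode spelling to the ASCII one. The following is only a minimal sketch, not how either compiler implements it: the names {{{symbolTable}}} and {{{normalizeSymbols}}} are hypothetical, and the table contains only the one equivalence quoted above.

```haskell
-- Sketch only: rewrite Unicode symbol alternatives to their ASCII spellings.
-- 'symbolTable' and 'normalizeSymbols' are hypothetical names, not part of
-- Jhc or GHC. The table lists only the equivalence quoted in the text.
symbolTable :: [(Char, String)]
symbolTable = [('\x2192', "->")]  -- '→' is equivalent to "->"

-- Replace every character found in the table; leave all others unchanged.
normalizeSymbols :: String -> String
normalizeSymbols = concatMap replace
  where
    replace c = maybe [c] id (lookup c symbolTable)
```

A character-level rewrite like this would also change '→' inside string literals and comments, so a real implementation would presumably make the substitution in the lexer, at the token level.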
Some things we could do:
== Problems with Unicode in Haskell 98 ==
There are plenty of Unicode alphabetic characters which are not upper, lower, or title case, and hence are not allowed in identifiers.  Some languages have no notion of case at all.  Since Haskell's syntax relies on case to distinguish constructors from variables, what should our position be with respect to caseless character sets?
The report should at least be absolutely clear about which Unicode character properties (N, Ll, Lu, Sm, etc.) correspond to which lexical class in the syntax.
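
To make the question concrete, here is one possible shape for such a mapping, written against {{{generalCategory}}} from Data.Char. It is a sketch under assumptions, not the Report's definition: the {{{LexClass}}} type, its constructor names, and the choice of which categories go where are all hypothetical. The {{{Caseless}}} class is there to show where letters with no case end up.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- Hypothetical lexical classes; the names are illustrative only and are
-- not taken from the Haskell Report.
data LexClass = Large | Small | Digit | Symbol | Caseless | Other
  deriving (Eq, Show)

-- One possible assignment of Unicode general categories to lexical classes.
-- Which categories belong to which class is exactly the open question.
lexClass :: Char -> LexClass
lexClass c = case generalCategory c of
  UppercaseLetter -> Large     -- Lu
  TitlecaseLetter -> Large     -- Lt
  LowercaseLetter -> Small     -- Ll
  DecimalNumber   -> Digit     -- Nd
  MathSymbol      -> Symbol    -- Sm
  ModifierLetter  -> Caseless  -- Lm: alphabetic, but has no case
  OtherLetter     -> Caseless  -- Lo: alphabetic, but has no case
  _               -> Other
```

For instance, the Hiragana letter '\x3042' has category Lo, so this sketch classes it as {{{Caseless}}}: it is alphabetic, yet a case-based rule cannot say whether it starts a constructor or a variable.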
== Some things we could do ==
 * Revert to US-ASCII, Latin-1 or implementation-defined character sets.
 * Allow Unicode with the encoding specified outside source files (e.g. by the current locale, as currently done by Hugs). This would make Haskell source containing non-ASCII characters non-portable.
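
The portability concern can be illustrated with ordinary file I/O. By default, GHC's System.IO decodes text handles using the locale encoding, so the same bytes can read back differently on different machines. The sketch below instead fixes the encoding to UTF-8 on the handle, which is the behaviour described above for Jhc and GHC source files. The helper names {{{readSourceUtf8}}} and {{{writeSourceUtf8}}} are hypothetical.

```haskell
import System.IO

-- Sketch only: read a source file as UTF-8 regardless of the current locale.
-- 'readSourceUtf8' and 'writeSourceUtf8' are hypothetical helper names.
readSourceUtf8 :: FilePath -> IO String
readSourceUtf8 path = withFile path ReadMode $ \h -> do
  hSetEncoding h utf8
  contents <- hGetContents h
  -- hGetContents is lazy: force the whole string before withFile closes h.
  length contents `seq` return contents

-- Write a source file as UTF-8 regardless of the current locale.
writeSourceUtf8 :: FilePath -> String -> IO ()
writeSourceUtf8 path text = withFile path WriteMode $ \h -> do
  hSetEncoding h utf8
  hPutStr h text
```

Leaving out the {{{hSetEncoding}}} calls gives the locale-dependent behaviour, where a file containing '→' may fail to decode, or decode differently, under a non-UTF-8 locale.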