Opened 13 years ago

Last modified 3 years ago

#16 new task

Create unicode proposal

Reported by: ijones Owned by:
Priority: normal Milestone:
Version: Keywords:
Cc: Meta Owner:
State: discussion Section: N/A or multiple
Related Tickets:

Description (last modified by autrijus@…)

Create a proposal based on unicode.

Current proposals:

See:

Change History (13)

comment:1 Changed 13 years ago by ijones

component: HaskellPrimeProposal

comment:2 Changed 13 years ago by ijones

Description: modified (diff)

comment:3 Changed 13 years ago by ijones

Owner: changed from ijones to autrijus@…

comment:4 Changed 13 years ago by john@…

topic: Syntax

comment:5 Changed 13 years ago by ijones

topic: changed from Syntax to Lexical Syntax

comment:6 Changed 13 years ago by freegoldbar yahoo com

I have not tried to write on a wiki before, so let me know if there is something I can improve. I write because of this:

"Revert to US-ASCII, Latin-1 or implementation-defined character sets." (from http://hackage.haskell.org/trac/haskell-prime/wiki/UnicodeInHaskellSource)

Summary of the arguments against the proposal to revert to ASCII:

  • Unicode operators, Greek letters, etc. are preferable when using Haskell for mathematics.
  • Unicode in identifiers is preferable for non-English speakers who do not need international collaboration and have the letters available on their keyboards.
  • Several major languages support Unicode internationalization of character sets, numbers and calendars, and support non-English identifiers in source code; so should Haskell.
  • Automatic translation of source code is possible to a reasonable degree and has been implemented in spreadsheet languages.

The benefits of being allowed to use Unicode in source code include:

  • Being able to define operators and identifiers that exactly match the standard definitions in a domain such as mathematics has proven beneficial over many years in tools like Mathematica. Keyboard shortcuts and palettes for such symbols and operators make entry easy in any editor.
  • Allowing non-English-speaking students to focus on logic rather than on translating concepts into English. While students may have a rudimentary grasp of English, specialist terminology may be unknown to them.
  • Interoperating with Microsoft .NET and, by extension, Novell Mono, where identifiers etc. are allowed to be Unicode. (I'm not familiar with Sun Java, but the same seems to be the case there.)
  • While using English can be of benefit for international projects, there are many tasks where this is not a factor, such as ad hoc research calculations, teaching, and projects where all the programmers are non-English speakers. Allowing Unicode does not prevent specific projects from choosing to restrict themselves to ASCII characters only; the reverse is not true.
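
As a minimal sketch of the first benefit above, the definitions below write a few mathematical operators and a Greek identifier directly in Haskell source. It assumes a compiler that reads UTF-8 source files and treats Unicode symbol characters as operator symbols, as GHC does; all names here are illustrative and not taken from any proposal.

```haskell
-- Fixities chosen to mirror (.) and the comparison operators.
infixr 9 ∘
infix  4 ∈, ∉, ≠

-- Function composition, written as in mathematics texts.
(∘) :: (b -> c) -> (a -> b) -> a -> c
f ∘ g = \x -> f (g x)

-- List membership and its negation.
(∈) :: Eq a => a -> [a] -> Bool
x ∈ xs = x `elem` xs

(∉) :: Eq a => a -> [a] -> Bool
x ∉ xs = not (x ∈ xs)

-- Inequality.
(≠) :: Eq a => a -> a -> Bool
(≠) = (/=)

-- Greek lowercase letters are ordinary identifier characters.
circleArea :: Double -> Double
circleArea r = π * r * r
  where
    π = pi
```

With these in scope, an expression such as `((+ 1) ∘ (* 2)) 5` reads the same as it would on paper, and a project that prefers ASCII can simply keep using `.`, `elem` and `/=`.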

Typical arguments against Unicode are:

  • English is 'better'. This seems to be propagated by people who only understand English, who have never been exposed to other cultures, and who thus feel threatened when confronted with something they do not understand. However, it is generally better to be allowed to think and write in one's native tongue. It is difficult to debug or modify a program written in a human language that one does not understand, but this can be overcome to some degree by automatic translation as proposed below.
  • It is not portable. This can be solved by standardization, as discussed elsewhere in the wiki article.
  • It may result in obfuscated code. True if misused; however, using ASCII does not prevent anyone from obfuscating by naming variables a1, a2, a3, … or whatever else. Common sense makes this claim null and void. Naming a variable correctly in one's own language is preferable to an incorrect translation attempt into English.
  • It takes up more space. True, but hardly relevant except in degenerate cases, because of the ever-increasing availability of memory. Even resource-constrained embedded platforms such as Windows CE support Unicode exclusively.
  • It is not common practice. Practically all leading user applications provide multi-language versions, and some Office products even have multi-language scripting built in. I can recall when it was common practice to use punch cards with their restriction of 80 characters per line; it is my understanding that Haskell is leading-edge rather than entrenched in legacy.

Note that there is a difference between supporting internationalization (writing one application, typically in English, that supports many languages and cultures in its GUI) and national language support, where an application is simply written in a non-English language. Using \xUUUU can be acceptable for an internationalized application that needs only a few special symbols and keeps everything else in external resource files. It really isn't acceptable in a non-English application if pressing the non-English keys on the keyboard while defining an identifier or typing in a string causes the compiler to fail.
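
To illustrate the difference, here is a small sketch of the same three Greek letters written first with numeric escapes and then typed directly. It assumes the compiler accepts UTF-8 source; the names `escaped`, `direct`, `sameText` and `codePoints` are made up for this example.

```haskell
import Data.Char (ord)

-- The text written with hexadecimal character escapes.
escaped :: String
escaped = "\x3b1\x3b2\x3b3"

-- The same text typed directly on a Greek keyboard layout.
direct :: String
direct = "αβγ"

-- Both literals denote the same string value.
sameText :: Bool
sameText = escaped == direct

-- The code points behind the directly typed letters.
codePoints :: [Int]
codePoints = map ord direct
```

The two literals are interchangeable to the compiler, but only the second is readable to someone who writes in that script, which is why escapes do not scale to a whole application.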

It is sad to see that most Haskell compilers and interpreters do not implement the Haskell language with regard to Unicode (except perhaps JHC, but I haven't succeeded in getting it to compile on Windows yet; my own fault, no doubt). It is even sadder to see that this affects applications such as Pivotal, which would be a truly wonderful teaching and research aid with Unicode support and user-provided translations into major languages.

My experience is from South-East Asia, where written languages are markedly different from English and other European languages. The alphabets are different. In Khmer, for example, there is no space between words; spaces are used somewhat like commas in English, and even the symbol for the full stop "." is different. The symbols for numbers are different (though English numbers are widely understood), and in Thailand the official calendar is Thai Buddhist (current year 2549), not Christian. These aspects are supported in Microsoft .NET language applications; consequently Microsoft .NET is being widely taught here now! I understand the situation to be much the same in Arabic countries, and then there is China and Japan…

I would like to see things taken one step further by having compilers support translation tables for keywords, system-defined function names, and even external user-defined identifiers in libraries, so that automatic translation of source code would be possible, but that would probably be outside the scope of Haskell'. It would, however, go a long way towards solving the problem of debugging an application written in a human language that one does not understand. This can be implemented in an IDE or with a preprocessor, but then error messages would refer to the English identifiers, so compiler support is preferable.
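
A very rough sketch of the preprocessor variant of this idea follows. The table and its German spellings are invented purely for illustration, and `keywordTable`, `translateToken` and `translateLine` are hypothetical names; nothing like this exists in any Haskell implementation that I know of.

```haskell
-- Hypothetical translation table: localized spelling -> Haskell keyword.
-- The German spellings are invented for illustration only.
keywordTable :: [(String, String)]
keywordTable =
  [ ("sei", "let")
  , ("wobei", "where")
  , ("falls", "if")
  , ("dann", "then")
  , ("sonst", "else")
  ]

-- Replace a token if the table has an entry for it; otherwise keep it.
translateToken :: [(String, String)] -> String -> String
translateToken table tok = maybe tok id (lookup tok table)

-- Translate one source line token by token.
-- Splitting on whitespace is a simplification, not real lexing:
-- it collapses spacing and would also rewrite words inside string literals.
translateLine :: [(String, String)] -> String -> String
translateLine table = unwords . map (translateToken table) . words
```

For example, `translateLine keywordTable "falls x > 0 dann x sonst 0"` gives `"if x > 0 then x else 0"`. Because the compiler only ever sees the translated text, its error messages would use the English spellings, which is the drawback of the preprocessor approach mentioned above.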

As for myself, I will continue to use Mathematica, where I can define my operators and symbols exactly as they are in the literature, which I consider to be in the spirit of the literate programming style. But I continue to keep an eye on Haskell and Pivotal, hoping they will support Unicode one day; I have done that since Gofer.

I hope this is enough argument to make it clear that, while it is easier to implement an ASCII-only compiler, there are benefits to using Unicode for non-English speakers when programming too.

comment:7 Changed 13 years ago by autrijus@…

Status: changed from new to assigned

comment:8 Changed 13 years ago by autrijus@…

Description: modified (diff)

comment:9 Changed 13 years ago by

Maybe obvious, but it might be an idea to state explicitly that, to be Haskell compatible, a Haskell implementation has to allow overloading of the mathematical operators, i.e. Unicode characters 0x2200 to 0x22ff, as defined here: http://www.unicode.org/charts/PDF/U2200.pdf. This would prevent some glorious Haskell implementation from claiming it "… supports the entire Haskell 98 language …" when Unicode is not implemented.

It could also be an idea to have equivalent ASCII names for Unicode operators, for those who for whatever reason might prefer not to use Unicode. This could be the all-capitals Unicode name with underscores instead of spaces. That would be simple and somewhat compatible with the choice made in the Fortress language specification, see: http://research.sun.com/projects/plrg/fortress0866.pdf
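
The two suggestions above could be sketched roughly as follows. The function names `isMathOperator` and `asciiName` are hypothetical, and mapping hyphens to underscores as well as spaces is an extra assumption, since some Unicode character names contain hyphens.

```haskell
import Data.Char (ord, toUpper)

-- True for characters in the Mathematical Operators block, U+2200 to U+22FF.
isMathOperator :: Char -> Bool
isMathOperator c = ord c >= 0x2200 && ord c <= 0x22FF

-- Turn a Unicode character name into an all-capitals ASCII name,
-- with underscores in place of spaces (and, by assumption, hyphens).
asciiName :: String -> String
asciiName = map convert
  where
    convert ' ' = '_'
    convert '-' = '_'
    convert ch  = toUpper ch
```

For instance, the Unicode name of U+2208 is ELEMENT OF, so `asciiName "ELEMENT OF"` yields `"ELEMENT_OF"`, and `isMathOperator '\x2208'` is `True`.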

comment:10 Changed 10 years ago by (none)

Milestone: Scope Defined

Milestone Scope Defined deleted

comment:11 Changed 3 years ago by Herbert Valerio Riedel

Milestone:

moving non-milestoned many year old legacy tickets out of the way

comment:12 Changed 3 years ago by Herbert Valerio Riedel

Priority: changed from major to normal

Set default priority (as this confuses Trac otherwise)

comment:13 Changed 3 years ago by Herbert Valerio Riedel

Owner: autrijus@… deleted
Status: changed from assigned to new

remove owners from legacy tickets
