I think Python should support UCS-4 by default

I think that on platforms with enough bits, Python should be compiled with support for all Unicode code points. This of course wastes a bit of memory, which in turn has some impact on I/O and so on, but I think the cost is worth it.
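To get a rough feel for that memory cost, here is a minimal sketch that compares how many bytes the characters of a string take at 16 bits per code unit (as in a narrow build) and at 32 bits per code point (as in a wide build). The helper name storage_bytes is my own, and the numbers only approximate the raw character data; they ignore the per-object overhead of a real str.

```python
def storage_bytes(text):
    """Return (utf16_bytes, utf32_bytes) for the character data of text.

    Hypothetical helper: the encoded sizes stand in for the internal
    buffers of a narrow (16-bit) and a wide (32-bit) build.
    """
    utf16 = len(text.encode("utf-16-le"))  # 2 bytes per code unit, no BOM
    utf32 = len(text.encode("utf-32-le"))  # 4 bytes per code point, no BOM
    return utf16, utf32


# Plain ASCII text doubles in size with 32-bit storage.
print(storage_bytes("Hello, world"))  # (24, 48)

# A character outside the Basic Multilingual Plane costs 4 bytes either way.
print(storage_bytes(chr(0x1D400)))  # (4, 4)
```

So for mostly-BMP text the wide build roughly doubles the character data, which is the price I am arguing is acceptable.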

I downloaded Python 3.0.1 and ran configure with --with-wide-unicode. That did the trick (it solved the problems I described here and here). More good information about how to install and run parallel versions of Python can be found, for example, at Farm Development.

At least the Mac OS X build should be UCS-4.


$ /usr/local/bin/python
Python 3.0.1 (r301:69556, Apr  6 2009, 20:51:21)
[GCC 4.0.1 (Apple Inc. build 5484)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.maxunicode
1114111
>>> a=chr(0x01D400)
>>> len(a)
1
>>> import unicodedata
>>> unicodedata.name(a)
'MATHEMATICAL BOLD CAPITAL A'
>>> b=unicodedata.normalize('NFKC',a)
>>> hex(ord(b))
'0x41'
>>>
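
The session above is from the wide build. As a minimal sketch of how to tell the two kinds of build apart from inside Python, the code below checks sys.maxunicode and counts the 16-bit code units a string needs in UTF-16. As far as I understand it, that count is what len() reports on a narrow build. The function names is_wide_build and utf16_code_units are my own, not part of any library.

```python
import sys


def is_wide_build():
    """True when a str element can hold any Unicode code point."""
    return sys.maxunicode > 0xFFFF


def utf16_code_units(text):
    """Number of 16-bit code units text needs when encoded as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


a = chr(0x1D400)  # MATHEMATICAL BOLD CAPITAL A, outside the BMP

print(is_wide_build())
print(len(a))               # 1 when the build handles all code points
print(utf16_code_units(a))  # 2, because UTF-16 needs a surrogate pair
```

On an interpreter where is_wide_build() is true, len(a) and utf16_code_units(a) differ for any character above U+FFFF, which is exactly the case that caused my problems.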