
Unicode character
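U+FEFF is the byte order mark (BOM), used to distinguish big-endian from little-endian UTF-16 encoding. If you decode the web page with the correct codec, Python will remove it for you. Example: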
    #!python2
    #coding: utf8
    u = u'ABC'
    e8 = u.encode('utf-8')         # encode without BOM
    e8s = u.encode('utf-8-sig')    # encode with BOM
    e16 = u.encode('utf-16')       # encode with BOM
    e16le = u.encode('utf-16le')   # encode without BOM
    e16be = u.encode('utf-16be')   # encode without BOM
    print 'utf-8     %r' % e8
    print 'utf-8-sig %r' % e8s
    print 'utf-16    %r' % e16
    print 'utf-16le  %r' % e16le
    print 'utf-16be  %r' % e16be
    print
    print 'utf-8  w/ BOM decoded with utf-8     %r' % e8s.decode('utf-8')
    print 'utf-8  w/ BOM decoded with utf-8-sig %r' % e8s.decode('utf-8-sig')
    print 'utf-16 w/ BOM decoded with utf-16    %r' % e16.decode('utf-16')
    print 'utf-16 w/ BOM decoded with utf-16le  %r' % e16.decode('utf-16le')

Note that EF BB BF is the UTF-8 encoding of the BOM. UTF-8 does not need it; it serves only as a signature (typically on Windows).
Output:

    utf-8     'ABC'
    utf-8-sig '\xef\xbb\xbfABC'
    utf-16    '\xff\xfeA\x00B\x00C\x00'    # Adds BOM and encodes using native processor endian-ness.
    utf-16le  'A\x00B\x00C\x00'
    utf-16be  '\x00A\x00B\x00C'

    utf-8  w/ BOM decoded with utf-8     u'\ufeffABC'    # doesn't remove BOM if present.
    utf-8  w/ BOM decoded with utf-8-sig u'ABC'          # removes BOM if present.
    utf-16 w/ BOM decoded with utf-16    u'ABC'          # *requires* BOM to be present.
    utf-16 w/ BOM decoded with utf-16le  u'\ufeffABC'    # doesn't remove BOM if present.
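
The EF BB BF signature can be checked directly. The following is a minimal sketch, assuming Python 3 (the script above targets Python 2) and using only the standard `codecs` module; the variable names are illustrative and not part of the original example.

```python
import codecs

# U+FEFF encoded as UTF-8 yields the three signature bytes EF BB BF.
utf8_bom = "\ufeff".encode("utf-8")
print(utf8_bom == codecs.BOM_UTF8)

# Build the same text with and without the signature.
with_bom = codecs.BOM_UTF8 + "ABC".encode("utf-8")
without_bom = "ABC".encode("utf-8")

# utf-8-sig strips the signature when present and is harmless when absent.
decoded_with = with_bom.decode("utf-8-sig")
decoded_without = without_bom.decode("utf-8-sig")

# Plain utf-8 keeps U+FEFF as the first character of the result.
decoded_plain = with_bom.decode("utf-8")

print(repr(decoded_with), repr(decoded_without), repr(decoded_plain))
```

Because `utf-8-sig` accepts input both with and without the signature, it is usually a reasonable choice when decoding UTF-8 files of unknown origin.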
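Note that the utf-16 codec needs the BOM to be present to detect the byte order; without it, Python has no way of knowing whether the data is big-endian or little-endian (CPython then falls back to the platform's native byte order).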
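
One way to act on this is to inspect the leading bytes before choosing a codec. The sketch below assumes Python 3; `sniff_bom` and `decode_with_bom` are hypothetical helper names introduced here for illustration, and the UTF-8 fallback is an assumption rather than a rule. UTF-32 BOMs are deliberately not handled.

```python
import codecs


def sniff_bom(data: bytes, default: str = "utf-8") -> str:
    """Return a codec name based on a leading BOM (hypothetical helper)."""
    if data.startswith(codecs.BOM_UTF8):
        # utf-8-sig removes the EF BB BF signature while decoding.
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The utf-16 codec reads the BOM, picks the byte order, and drops it.
        return "utf-16"
    # No BOM found: fall back to an assumed default encoding.
    return default


def decode_with_bom(data: bytes, default: str = "utf-8") -> str:
    """Decode bytes with the codec suggested by sniff_bom."""
    return data.decode(sniff_bom(data, default))


big_endian = codecs.BOM_UTF16_BE + "ABC".encode("utf-16be")
little_endian = codecs.BOM_UTF16_LE + "ABC".encode("utf-16le")
print(decode_with_bom(big_endian), decode_with_bom(little_endian))
```

Both inputs decode to the same text without a leading U+FEFF, because the `utf-16` codec consumes the BOM while working out the byte order.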