u'\ufeff' in a Python string


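The Unicode character U+FEFF is the byte order mark, or BOM, and is used to tell big-endian from little-endian UTF-16 encoding. If you decode the web page with the right codec, Python removes it for you. Example: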

    #!python2
    #coding: utf8
    u = u'ABC'
    e8 = u.encode('utf-8')        # encode without BOM
    e8s = u.encode('utf-8-sig')   # encode with BOM
    e16 = u.encode('utf-16')      # encode with BOM
    e16le = u.encode('utf-16le')  # encode without BOM
    e16be = u.encode('utf-16be')  # encode without BOM
    print 'utf-8     %r' % e8
    print 'utf-8-sig %r' % e8s
    print 'utf-16    %r' % e16
    print 'utf-16le  %r' % e16le
    print 'utf-16be  %r' % e16be
    print
    print 'utf-8  w/ BOM decoded with utf-8     %r' % e8s.decode('utf-8')
    print 'utf-8  w/ BOM decoded with utf-8-sig %r' % e8s.decode('utf-8-sig')
    print 'utf-16 w/ BOM decoded with utf-16    %r' % e16.decode('utf-16')
    print 'utf-16 w/ BOM decoded with utf-16le  %r' % e16.decode('utf-16le')

Note that EF BB BF is the UTF-8-encoded BOM. UTF-8 does not require it; it serves only as a signature (usually on Windows).
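A quick way to check this is sketched below in Python 3 (the example above targets Python 2). It assumes nothing beyond the standard `codecs` module; the variable names are illustrative only. Encoding U+FEFF as UTF-8 should yield exactly the three bytes EF BB BF, and the `utf-8-sig` codec should accept input both with and without that signature.

```python
import codecs

# U+FEFF encoded as UTF-8 gives the three-byte signature.
bom_bytes = '\ufeff'.encode('utf-8')
print(' '.join('%02X' % b for b in bom_bytes))   # EF BB BF
print(bom_bytes == codecs.BOM_UTF8)              # True

# 'utf-8-sig' strips the signature when present and is a plain
# UTF-8 decode when it is absent, so the BOM is optional on input.
with_bom = codecs.BOM_UTF8 + b'ABC'
without_bom = b'ABC'
print(with_bom.decode('utf-8-sig') == without_bom.decode('utf-8-sig'))  # True
```

The plain `utf-8` codec, by contrast, leaves the signature in the decoded text as a leading U+FEFF character.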

Output:

    utf-8     'ABC'
    utf-8-sig '\xef\xbb\xbfABC'
    utf-16    '\xff\xfeA\x00B\x00C\x00'    # Adds BOM and encodes using native processor endian-ness.
    utf-16le  'A\x00B\x00C\x00'
    utf-16be  '\x00A\x00B\x00C'

    utf-8  w/ BOM decoded with utf-8     u'\ufeffABC'    # doesn't remove BOM if present.
    utf-8  w/ BOM decoded with utf-8-sig u'ABC'          # removes BOM if present.
    utf-16 w/ BOM decoded with utf-16    u'ABC'          # *requires* BOM to be present.
    utf-16 w/ BOM decoded with utf-16le  u'\ufeffABC'    # doesn't remove BOM if present.

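Note that the utf-16 codec relies on the BOM being present: without it, Python has no way of knowing whether the data is big-endian or little-endian (CPython's plain decode then assumes the platform's native byte order).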
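When the encoding of incoming bytes is not known in advance, one possible approach is to look at the leading bytes and choose the codec from them. The following is a minimal Python 3 sketch, not a complete detector: `decode_with_bom` and its `default` parameter are hypothetical names introduced here, and UTF-32 is deliberately ignored (its little-endian BOM begins with the same two bytes as the UTF-16 one).

```python
import codecs

# Hypothetical helper, not part of the standard library.
def decode_with_bom(data, default='utf-8'):
    """Decode bytes, using a leading BOM (if any) to choose the codec."""
    if data.startswith(codecs.BOM_UTF8):
        return data.decode('utf-8-sig')   # strips EF BB BF
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return data.decode('utf-16')      # reads the BOM, then drops it
    return data.decode(default)           # no BOM: fall back to the caller's guess

for raw in ('ABC'.encode('utf-8-sig'), 'ABC'.encode('utf-16'), b'ABC'):
    print(repr(decode_with_bom(raw)))
```

In each of the three cases the decoded text contains no leading U+FEFF, because the codec that consumes the BOM is the one doing the decoding.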

Feel free to share; when reposting, please credit the source: 内存溢出

Original URL: https://54852.com/zaji/5630087.html
