u'\ufeff' in a Python string

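The Unicode character U+FEFF is the byte order mark, or BOM, and is used to distinguish big-endian from little-endian UTF-16 encoding. If you decode the web page using the right codec, Python will remove it for you. Example: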

    #!python2
    #coding: utf8
    u = u'ABC'
    e8 = u.encode('utf-8')        # encode without BOM
    e8s = u.encode('utf-8-sig')   # encode with BOM
    e16 = u.encode('utf-16')      # encode with BOM
    e16le = u.encode('utf-16le')  # encode without BOM
    e16be = u.encode('utf-16be')  # encode without BOM
    print 'utf-8     %r' % e8
    print 'utf-8-sig %r' % e8s
    print 'utf-16    %r' % e16
    print 'utf-16le  %r' % e16le
    print 'utf-16be  %r' % e16be
    print
    print 'utf-8  w/ BOM decoded with utf-8     %r' % e8s.decode('utf-8')
    print 'utf-8  w/ BOM decoded with utf-8-sig %r' % e8s.decode('utf-8-sig')
    print 'utf-16 w/ BOM decoded with utf-16    %r' % e16.decode('utf-16')
    print 'utf-16 w/ BOM decoded with utf-16le  %r' % e16.decode('utf-16le')

Note that EF BB BF is the UTF-8-encoded BOM. UTF-8 does not require it; it serves only as a signature (usually on Windows).

Output:

    utf-8     'ABC'
    utf-8-sig '\xef\xbb\xbfABC'
    utf-16    '\xff\xfeA\x00B\x00C\x00'    # Adds BOM and encodes using native processor endian-ness.
    utf-16le  'A\x00B\x00C\x00'
    utf-16be  '\x00A\x00B\x00C'

    utf-8  w/ BOM decoded with utf-8     u'\ufeffABC'    # doesn't remove BOM if present.
    utf-8  w/ BOM decoded with utf-8-sig u'ABC'          # removes BOM if present.
    utf-16 w/ BOM decoded with utf-16    u'ABC'          # *requires* BOM to be present.
    utf-16 w/ BOM decoded with utf-16le  u'\ufeffABC'    # doesn't remove BOM if present.
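The same behaviour can be reproduced in Python 3, where text strings are Unicode by default. The following is only a minimal sketch: it assumes Python 3, and the helper names `decode_utf8_dropping_bom` and `strip_leading_bom` are hypothetical, introduced here for illustration. It shows two ways to get rid of a leading U+FEFF: decoding with `utf-8-sig` up front, or stripping the character from a string that was already decoded with plain `utf-8`.

```python
import codecs


def decode_utf8_dropping_bom(data: bytes) -> str:
    """Decode UTF-8 bytes, dropping a leading EF BB BF signature if present."""
    # 'utf-8-sig' skips a leading BOM and otherwise behaves like 'utf-8'.
    return data.decode("utf-8-sig")


def strip_leading_bom(text: str) -> str:
    """Remove a single leading U+FEFF from an already-decoded string."""
    return text[1:] if text.startswith("\ufeff") else text


# codecs.BOM_UTF8 is the byte sequence EF BB BF.
with_bom = codecs.BOM_UTF8 + "ABC".encode("utf-8")

print(repr(with_bom.decode("utf-8")))                     # '\ufeffABC'
print(repr(decode_utf8_dropping_bom(with_bom)))           # 'ABC'
print(repr(strip_leading_bom(with_bom.decode("utf-8"))))  # 'ABC'
```

Decoding with `utf-8-sig` is usually the simpler choice when the bytes are still available, since it also accepts input that has no BOM at all.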

Note that the utf-16 codec expects a BOM to be present; without one, Python has no way to know from the data whether it is big- or little-endian.
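One way to make the byte-order decision explicit is to check for the BOM yourself and fall back to a byte order known from elsewhere. This is a hedged sketch, assuming Python 3; the function `decode_utf16` and its `default` parameter are hypothetical names chosen for this example, not part of any library.

```python
import codecs


def decode_utf16(data: bytes, default: str = "utf-16-le") -> str:
    """Decode UTF-16 bytes, using the BOM when present.

    `default` is an assumed fallback codec for BOM-less input; the caller
    has to know the byte order from some other source.
    """
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The generic 'utf-16' codec reads the BOM, picks the byte order,
        # and drops the BOM from the result.
        return data.decode("utf-16")
    # No BOM: decode with the explicitly chosen byte order.
    return data.decode(default)


print(decode_utf16(b"\xff\xfeA\x00B\x00C\x00"))                   # ABC
print(decode_utf16(b"\xfe\xff\x00A\x00B\x00C"))                   # ABC
print(decode_utf16(b"\x00A\x00B\x00C", default="utf-16-be"))      # ABC
```

Passing the fallback explicitly keeps the result independent of the machine's native byte order.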



Source: 内存溢出, https://54852.com/zaji/5630087.html
