perl - Creating UTF-16LE files with BOM and CRLF line separators on Windows

I need to generate some UTF-16LE encoded files with CRLF line separators on a Windows 7 machine. (Currently using Strawberry Perl 5.20.1.)

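It took me a long time to get correct output, and I wonder whether my solution is the right way to do it, because it seems overly complicated by Perl's usual standards. In particular: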

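>Why does Perl produce valid big-endian UTF-16 with a correct BOM when I use encoding(UTF-16), while there is no BOM if I use either UTF-16LE or UTF-16BE without the additional package File::BOM?
>Why does the out-of-the-box CRLF handling seem to be buggy (it outputs 0D 0A 00 instead of 0D 00 0A 00) unless I fiddle with filters? I doubt this can be a real bug in a language with so many users...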

Below are my commented attempts; the one I found to be correct is the last statement.

    use strict;
    use warnings;
    use utf8;
    use File::BOM;
    use feature 'say';

    my $UTF;
    my $data = "Hello,héhé,中文.\nsecond line : my 2€"; # 中文 = zhong wen = chinese

    # UTF16 BE + BOM but incorrect CRLF: "0D 0A 00" instead of "0D 00 0A 00"
    open $UTF, ">:encoding(UTF-16)", "utf-16-std-be.txt" or die $!;
    say $UTF $data;
    close $UTF;

    # same as UTF-16BE (no BOM, incorrect CRLF)
    open $UTF, ">:encoding(ucs2)", "utf-ucs2.txt" or die $!;
    say $UTF $data;
    close $UTF;

    # UTF16 BE, no BOM, incorrect CRLF
    open $UTF, ">:encoding(UTF-16BE)", "utf-16-be-nobom.txt" or die $!;
    say $UTF $data;
    close $UTF;

    # UTF16 LE, no BOM, incorrect CRLF
    open $UTF, ">:encoding(UTF-16LE)", "utf-16-le-nobom-wrongcrlf.txt" or die $!;
    say $UTF $data;
    close $UTF;

    # UTF16 LE, BOM OK but still incorrect CRLF
    open $UTF, ">:encoding(UTF-16LE):via(File::BOM)", "utf-16-le-bom-wrongcrlf.txt" or die $!;
    say $UTF $data;
    close $UTF;

    # UTF16 LE non raw incorrect
    # (crlf by default on windows) -> 0A => 0D 0A
    # NOTE: the layer string is missing in the source; assumed to match the previous attempt
    open $UTF, ">:encoding(UTF-16LE):via(File::BOM)", "utf-16-le-bom-wrongcrlf2.txt" or die $!;
    print $UTF $data, "\x0a"; # 0A is magically expanded to 0D 0A but wrong
    close $UTF;

    # UTF16 LE + BOM + LF
    # raw -> 0A => 0A
    # Could be correct on UNIX but I need CRLF
    open $UTF, ">:raw:encoding(UTF-16LE):via(File::BOM)", "utf-16-le-bom-wrongcrlf3.txt" or die $!;
    say $UTF $data;
    close $UTF;

    # manual BOM, but CRLF OK
    open $UTF, ">:raw:encoding(UTF-16LE):crlf", "utf-16-le-bommanual-crlfok.txt" or die $!;
    print $UTF "\x{FEFF}";
    say $UTF $data;
    close $UTF;

    # auto BOM, CRLF OK ?
    # incorrect, says utf8 "\xA9" does not map to Unicode at c:/perl/DWimperl-5.14/perl/lib/Encode.pm line 176.
    # But I cannot see where the A9 comes from ??!
    #~ open $UTF, ">:raw:encoding(UTF-16LE):via(File::BOM):crlf", "utf-16-le-autobom-crlfok1.txt" or die $!;
    #~ print $UTF $data;
    #~ say $UTF $data;
    #~ close $UTF;

    # WTF? \n becomes 0D 00 0D 0A 00
    open $UTF, ">:encoding(UTF-16LE):crlf:via(File::BOM)", "utf-16-le-autobom-crlf2.txt" or die $!;
    say $UTF $data;
    close $UTF;

    # CORRECT WAY?? : automatic BOM, CRLF is OK
    open $UTF, ">:raw:encoding(UTF-16LE):crlf:via(File::BOM)", "utf-16-le-autobom-crlfok3.txt" or die $!;
    say $UTF $data;
    close $UTF;

Solution

manual BOM, but CRLF OK

Yes, the following is indeed correct:

:raw:encoding(UTF-16LE):crlf + manual BOM

>:raw "clears" the existing :crlf and :encoding layers.
>:encoding converts between bytes and code points.
>:crlf converts between CRLF and LF.

So,

                                   Read
             ===================================================>

                                   Code                 Code
    +------+   bytes   +------+   Points   +-------+   Points   +------+
    | file |-----------| :enc |------------| :crlf |------------| Code |
    +------+           +------+    CRLF    +-------+     LF     +------+

             <===================================================
                                   Write

You want the CRLF⇔LF conversion to be performed on code points (not on bytes), which is what this setup does.
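To make the effect of layer order concrete, here is a minimal sketch that writes the same text through two layer stacks and dumps the resulting bytes. The helper name bytes_written and the use of a temporary file are illustrative assumptions, not part of the original answer. With :crlf stacked below :encoding, the newline is expanded after encoding, which should reproduce the 0D 0A 00 sequence reported in the question.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write $text through the given layer string and return the file's raw
# bytes as space-separated hex. (Hypothetical helper for illustration.)
sub bytes_written {
    my ($layers, $text) = @_;
    my ($tmp, $path) = tempfile(UNLINK => 1);
    close $tmp;

    open(my $out, ">$layers", $path) or die "open $path: $!";
    print {$out} $text;
    close($out) or die "close $path: $!";

    # Read the file back without any translation.
    open(my $in, '<:raw', $path) or die "open $path: $!";
    my $bytes = do { local $/; <$in> };
    close $in;
    return join ' ', map { sprintf '%02X', ord } split //, $bytes;
}

# :crlf above :encoding, BOM printed by hand as U+FEFF.
my $good = bytes_written(':raw:encoding(UTF-16LE):crlf', "\x{FEFF}a\n");

# :crlf below :encoding, so "\n" is expanded on the already-encoded bytes.
my $bad = bytes_written(':raw:crlf:encoding(UTF-16LE)', "\x{FEFF}a\n");

print "crlf on code points: $good\n";   # FF FE 61 00 0D 00 0A 00
print "crlf on bytes:       $bad\n";
```

Because the layer string starts with :raw, the sketch does not depend on the platform's default layers, so it should behave the same on Windows and on Unix-like systems.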

CORRECT WAY?? : automatic BOM, CRLF is OK

While :raw:encoding(UTF-16LE):crlf:via(File::BOM) may work for a write handle, it looks wrong (I would have expected :raw:via(File::BOM,UTF-16LE):crlf), and it fails miserably for a read handle (at least for me, with Perl 5.16.3).

I just had a look, and the code behind :via(File::BOM) does some very questionable things. I wouldn't use it.
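For the read side, one possible approach that avoids File::BOM is to open the file with the same layers and strip a leading U+FEFF by hand. This is only a sketch under that assumption: read_utf16le_lines is a hypothetical helper name, and the temporary file exists only to make the example self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read a UTF-16LE file with CRLF line endings and return its lines,
# dropping a leading BOM by hand. (Hypothetical helper for illustration.)
sub read_utf16le_lines {
    my ($path) = @_;
    open(my $in, '<:raw:encoding(UTF-16LE):crlf', $path)
        or die "open $path: $!";
    my @lines = <$in>;
    close $in;
    # With UTF-16LE the BOM decodes as an ordinary U+FEFF character.
    $lines[0] =~ s/\A\x{FEFF}// if @lines;
    chomp @lines;
    return @lines;
}

# Round trip through a temp file written with the layers from the answer.
my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp;
open(my $out, '>:raw:encoding(UTF-16LE):crlf', $path) or die "open $path: $!";
print {$out} "\x{FEFF}first\nsecond\n";
close($out) or die "close $path: $!";

my @lines = read_utf16le_lines($path);
print scalar(@lines), " lines read\n";
```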

why Perl is making a valid UTF-16 big-endian with correct BOM with encoding(UTF-16) while there is no BOM if I use either UTF-16LE or UTF-16BE without using an additional package File::BOM

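Because you probably don't need a BOM: the names UTF-16LE and UTF-16BE already state the byte order, whereas plain UTF-16 needs a BOM to signal it.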
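The difference can be checked without any file I/O by calling Encode directly. The sketch below encodes a single character with each of the three encoding names; hex_bytes is an illustrative helper, not something from the original post.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Return the bytes of a string as space-separated hex. (Illustrative helper.)
sub hex_bytes { join ' ', map { sprintf '%02X', ord } split //, shift }

# Plain UTF-16 prepends a big-endian BOM; the LE/BE variants emit none.
print 'UTF-16:   ', hex_bytes(encode('UTF-16',   'a')), "\n";   # FE FF 00 61
print 'UTF-16BE: ', hex_bytes(encode('UTF-16BE', 'a')), "\n";   # 00 61
print 'UTF-16LE: ', hex_bytes(encode('UTF-16LE', 'a')), "\n";   # 61 00
```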