Package details

iconv-lite

pillarjs · 572.2m · MIT · 0.7.1

Convert character encodings in pure javascript.

iconv, convert, charset, icu

Readme

iconv-lite: Pure JS character encoding conversion


  • No need for native code compilation. Quick to install, works on Windows, Web, and in sandboxed environments.
  • Used in popular projects like Express.js (body-parser), Grunt, Nodemailer, Yeoman and others.
  • Faster than node-iconv (see below for performance comparison).
  • Intuitive encode/decode API, including streaming support.
  • In-browser usage via browserify or webpack (~180kb gzip compressed with Buffer shim included).
  • TypeScript type definition file included.
  • React Native is supported (the stream module must be installed to enable the Streaming API).

Usage

Basic API

var iconv = require('iconv-lite');

// Convert from an encoded buffer to a js string.
var str = iconv.decode(Buffer.from([0x68, 0x65, 0x6c, 0x6c, 0x6f]), 'win1251'); // 'hello'

// Convert from a js string to an encoded buffer.
var buf = iconv.encode("Sample input string", 'win1251');

// Check if encoding is supported
iconv.encodingExists("us-ascii"); // true

Streaming API

var http = require('http');
var fs = require('fs');
var assert = require('assert');
var iconv = require('iconv-lite');

// Decode stream (from binary data stream to js strings)
http.createServer(function(req, res) {
    var converterStream = iconv.decodeStream('win1251');
    req.pipe(converterStream);

    converterStream.on('data', function(str) {
        console.log(str); // Do something with decoded strings, chunk-by-chunk.
    });
    converterStream.on('end', function() {
        res.end(); // Respond once the whole body has been decoded.
    });
});

// Convert encoding streaming example
fs.createReadStream('file-in-win1251.txt')
    .pipe(iconv.decodeStream('win1251'))
    .pipe(iconv.encodeStream('ucs2'))
    .pipe(fs.createWriteStream('file-in-ucs2.txt'));

// Sugar: all encode/decode streams have a .collect(cb) method to accumulate data.
http.createServer(function(req, res) {
    req.pipe(iconv.decodeStream('win1251')).collect(function(err, body) {
        if (err) { res.statusCode = 400; return res.end(); }
        assert(typeof body == 'string');
        console.log(body); // full request body string
        res.end();
    });
});
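Streaming decoders have to keep state between chunks, because a multi-byte character can be split across two `data` events. The sketch below uses Node's built-in `string_decoder` module (not iconv-lite) to show the problem that a streaming decoder must solve; `decodeChunks` is a hypothetical helper used only for illustration.

```javascript
const { StringDecoder } = require('string_decoder');

// Hypothetical helper: decode an array of UTF-8 chunks two ways.
function decodeChunks(chunks) {
  // Naive: each chunk is decoded on its own, so a split character is corrupted.
  const naive = chunks.map((c) => c.toString('utf8')).join('');

  // Stateful: StringDecoder buffers an incomplete byte sequence until the next chunk.
  const decoder = new StringDecoder('utf8');
  let stateful = '';
  for (const chunk of chunks) {
    stateful += decoder.write(chunk);
  }
  stateful += decoder.end(); // flush anything still buffered
  return { naive, stateful };
}

// "€" is E2 82 AC in UTF-8; cut the buffer right after its first byte.
const bytes = Buffer.from('price: €5', 'utf8');
const chunks = [bytes.subarray(0, 8), bytes.subarray(8)];
console.log(decodeChunks(chunks));
```

The naive result contains replacement characters where the euro sign should be, while the stateful result is the original string.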

Supported encodings

  • All Node.js native encodings: utf8, ucs2 / utf16-le, ascii, binary, base64, hex.
  • Additional Unicode encodings: utf16, utf16-be, utf-7, utf-7-imap, utf32, utf32-le, and utf32-be.
  • All widespread singlebyte encodings: Windows 125x family, ISO-8859 family, IBM/DOS codepages, Macintosh family, KOI8 family, and all others supported by the iconv library. Aliases like 'latin1' and 'us-ascii' are also supported.
  • All widespread multibyte encodings: CP932, CP936, CP949, CP950, GB2312, GBK, GB18030, Big5, Shift_JIS, EUC-JP.

See the full list of supported encodings on the wiki.
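Aliases such as 'us-ascii' or 'Win-1251' work because encoding names are normalized before lookup. The following is a minimal sketch of that idea, assuming a rule of "lowercase, drop a trailing :YYYY edition suffix, strip non-alphanumerics"; iconv-lite's internal rule may differ in details, and `canonicalize`, `resolve` and the `aliases` table are hypothetical.

```javascript
// Hypothetical normalization: lowercase, drop a trailing ":YYYY" edition suffix,
// and strip every character that is not a letter or digit.
function canonicalize(name) {
  return String(name).toLowerCase().replace(/:\d{4}$|[^0-9a-z]/g, '');
}

// Hypothetical alias table (null prototype, so inherited keys never match).
const aliases = Object.assign(Object.create(null), {
  latin1: 'iso88591',
  usascii: 'ascii',
  win1251: 'windows1251',
});

function resolve(name) {
  let key = canonicalize(name);
  // Follow alias chains until the key is no longer an alias.
  while (aliases[key] !== undefined) key = aliases[key];
  return key;
}

console.log(resolve('US-ASCII'));     // 'ascii'
console.log(resolve('Win-1251'));     // 'windows1251'
console.log(resolve('Windows_1251')); // 'windows1251'
```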

Most singlebyte encodings are generated automatically from node-iconv. Thank you, Ben Noordhuis and the libiconv authors!

Multibyte encodings are generated from Unicode.org mappings and WHATWG Encoding Standard mappings. Thank you, respective authors!

Encoding/decoding speed

Comparison with the node-iconv module (1000x256kb, on a MacBook Pro, Core i5/2.6 GHz, Node v0.12.0). Note: your results may vary, so always benchmark on your own hardware.

operation             iconv@2.1.4   iconv-lite@0.4.7
----------------------------------------------------------
encode('win1251')     ~96 Mb/s      ~320 Mb/s
decode('win1251')     ~95 Mb/s      ~246 Mb/s

BOM handling

  • Decoding: the BOM is stripped by default, unless overridden by passing stripBOM: false in options (e.g. iconv.decode(buf, enc, {stripBOM: false})). A callback can also be passed as the stripBOM option; it is called if a BOM character was actually found.
  • To detect a UTF-8 BOM when decoding other encodings, use the node-autodetect-decoder-stream module.
  • Encoding: no BOM is added unless overridden with the addBOM: true option.
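The decoding rule above (strip a leading BOM by default, optionally notify a callback) can be sketched for UTF-8 input with plain Node.js buffers. This is not iconv-lite's code: `decodeUtf8` is hypothetical, and its `stripBOM` option merely mirrors the documented option so the three cases (default, false, callback) are visible side by side.

```javascript
// Hypothetical decoder mirroring the documented stripBOM behavior for UTF-8.
// stripBOM: true (default) strips, false keeps, a function strips and is notified.
function decodeUtf8(buf, options = {}) {
  const stripBOM = options.stripBOM === undefined ? true : options.stripBOM;
  const str = buf.toString('utf8');
  // Buffer#toString keeps the BOM: bytes EF BB BF decode to the character U+FEFF.
  if (stripBOM && str.charCodeAt(0) === 0xfeff) {
    if (typeof stripBOM === 'function') stripBOM();
    return str.slice(1);
  }
  return str;
}

const withBom = Buffer.from([0xef, 0xbb, 0xbf, 0x68, 0x69]); // BOM + "hi"
console.log(decodeUtf8(withBom));                             // 'hi'
console.log(decodeUtf8(withBom, { stripBOM: false }).length); // 3 (BOM kept)
decodeUtf8(withBom, { stripBOM: () => console.log('BOM found') });
```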

UTF-16 Encodings

This library supports UTF-16LE, UTF-16BE and UTF-16 encodings. The first two are straightforward; UTF-16 tries to be smart about endianness in the following ways:

  • Decoding: uses the BOM and a 'spaces heuristic' to determine input endianness. The default is UTF-16LE, but it can be overridden with the defaultEncoding: 'utf-16be' option. Strips the BOM unless stripBOM: false.
  • Encoding: uses UTF-16LE and writes a BOM by default. Use addBOM: false to override.
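The BOM-plus-heuristic detection can be sketched as follows. Per the 0.4.10 changelog entry, the heuristic considers ASCII characters in general, not just spaces: in UTF-16LE an ASCII character is a non-zero byte followed by a zero byte, and in UTF-16BE it is the reverse. `guessUtf16Endianness` is a hypothetical helper; the library's actual sample size and tie-breaking may differ.

```javascript
// Hypothetical UTF-16 endianness detection: BOM first, then an ASCII-byte heuristic.
function guessUtf16Endianness(buf, defaultEncoding = 'utf-16le') {
  if (buf.length >= 2) {
    if (buf[0] === 0xff && buf[1] === 0xfe) return 'utf-16le'; // BOM FF FE
    if (buf[0] === 0xfe && buf[1] === 0xff) return 'utf-16be'; // BOM FE FF
  }
  let le = 0; // code units that look like ASCII when read little-endian
  let be = 0; // code units that look like ASCII when read big-endian
  for (let i = 0; i + 1 < buf.length; i += 2) {
    const b0 = buf[i];
    const b1 = buf[i + 1];
    if (b1 === 0 && b0 !== 0 && b0 < 0x80) le++;
    if (b0 === 0 && b1 !== 0 && b1 < 0x80) be++;
  }
  if (le > be) return 'utf-16le';
  if (be > le) return 'utf-16be';
  return defaultEncoding; // no evidence either way
}

// Node encodes UTF-16LE natively; swap16() flips each code unit to get UTF-16BE.
const leBuf = Buffer.from('hello', 'utf16le');
const beBuf = Buffer.from('hello', 'utf16le').swap16();
console.log(guessUtf16Endianness(leBuf)); // 'utf-16le'
console.log(guessUtf16Endianness(beBuf)); // 'utf-16be'
```

When the input has no BOM and no ASCII characters, the function falls back to the caller's default, which is the role the defaultEncoding option plays above.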

UTF-32 Encodings

This library supports UTF-32LE, UTF-32BE and UTF-32 encodings. Like the UTF-16 encoding above, UTF-32 defaults to UTF-32LE but uses the BOM and a 'spaces heuristic' to determine input endianness.

  • Decoding: the default of UTF-32LE can be overridden with the defaultEncoding: 'utf-32be' option. Strips the BOM unless stripBOM: false.
  • Encoding: uses UTF-32LE and writes a BOM by default. Use addBOM: false to override. (defaultEncoding: 'utf-32be' can also be used here to change the encoding.)
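At the byte level, UTF-32 stores every code point in four bytes, and the BOM is simply U+FEFF encoded the same way. The sketch below shows what addBOM and a big-endian target correspond to; `encodeUtf32` and its options are hypothetical, built only on Node's Buffer, and are not iconv-lite's implementation.

```javascript
// Hypothetical UTF-32 encoder: 4 bytes per code point, optional BOM (U+FEFF).
function encodeUtf32(str, { bigEndian = false, addBOM = true } = {}) {
  const codePoints = [];
  if (addBOM) codePoints.push(0xfeff);
  // for...of iterates by code point, so a surrogate pair becomes one value.
  for (const ch of str) codePoints.push(ch.codePointAt(0));

  const buf = Buffer.alloc(codePoints.length * 4);
  codePoints.forEach((cp, i) => {
    if (bigEndian) buf.writeUInt32BE(cp, i * 4);
    else buf.writeUInt32LE(cp, i * 4);
  });
  return buf;
}

console.log(encodeUtf32('A'));                      // <Buffer ff fe 00 00 41 00 00 00>
console.log(encodeUtf32('A', { bigEndian: true })); // <Buffer 00 00 fe ff 00 00 00 41>
console.log(encodeUtf32('😀', { addBOM: false }));  // <Buffer 00 f6 01 00>
```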

Other notes

When decoding, be sure to pass a Buffer (or, since 0.6.1, a Uint8Array) to the decode() method; otherwise bad things usually happen.
Untranslatable characters are set to � or ?. No transliteration is currently supported.
Node versions 0.10.31 and 0.11.13 are buggy; don't use them (see #65, #77).
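When encoding into a singlebyte charset, "untranslatable" means the target code page has no byte for a character, so a substitute byte is written instead, and without transliteration 'é' becomes '?' rather than 'e'. A minimal sketch, assuming a 7-bit ASCII target and '?' (0x3F) as the substitute; `encodeAsciiWithFallback` is hypothetical.

```javascript
// Hypothetical singlebyte encoder: ASCII target, '?' for anything unmappable.
function encodeAsciiWithFallback(str, substitute = 0x3f /* '?' */) {
  const bytes = [];
  // for...of walks code points, so an emoji yields one '?' rather than two.
  for (const ch of str) {
    const cp = ch.codePointAt(0);
    bytes.push(cp < 0x80 ? cp : substitute);
  }
  return Buffer.from(bytes);
}

console.log(encodeAsciiWithFallback('héllo 😀').toString('latin1')); // 'h?llo ?'
```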

Testing

git clone git@github.com:ashtuchkin/iconv-lite.git
cd iconv-lite
npm install
npm test

# To view performance:
npm run test:performance

# To view test coverage: 
npm run test:cov
open coverage/index.html

Changelog

0.7.1

🚀 Improvements

0.7.0

🐞 Bug fixes

  • Handle split surrogate pairs when encoding utf8 - by @yosion-p and @ashtuchkin in #282:

    Handle a case where streaming utf8 encoder (converting js strings -> buffers) encounters surrogate pairs split between chunks (last character of one chunk is high surrogate and first character of the next chunk is a low surrogate).

  • Avoid false positives in encodingExists by using objects without a prototype - by @bjohansebas in #328

    The encodingExists method could return incorrect results if the lookup matched properties inherited from the prototype of the object that stores the encodings, such as constructor. This change replaces that object with one that has no prototype, ensuring that only encodings explicitly defined in the library are considered. The same fix is also applied to the internal cache system to avoid the same kind of false positives.
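The utf8 surrogate-pair fix (#282) can be illustrated with a small stateful encoder: if a chunk ends with a high surrogate, hold it back and prepend it to the next chunk instead of encoding it alone, which would otherwise produce a replacement character. `createUtf8ChunkEncoder` is a hypothetical helper built on Node's Buffer, not the code from that pull request.

```javascript
// Hypothetical streaming UTF-8 encoder that carries a trailing high surrogate over.
function createUtf8ChunkEncoder() {
  let pending = ''; // high surrogate left over from the previous chunk
  return {
    write(str) {
      str = pending + str;
      pending = '';
      const last = str.charCodeAt(str.length - 1);
      if (last >= 0xd800 && last <= 0xdbff) {
        // High surrogate at the end: its low half may arrive in the next chunk.
        pending = str[str.length - 1];
        str = str.slice(0, -1);
      }
      return Buffer.from(str, 'utf8');
    },
    end() {
      // A lone high surrogate at end of input cannot be paired; Node writes U+FFFD.
      const rest = pending;
      pending = '';
      return rest ? Buffer.from(rest, 'utf8') : Buffer.alloc(0);
    },
  };
}

const emoji = '😀'; // '\ud83d\ude00'
const enc = createUtf8ChunkEncoder();
const out = Buffer.concat([enc.write('a' + emoji[0]), enc.write(emoji[1] + 'b'), enc.end()]);
console.log(out.equals(Buffer.from('a😀b', 'utf8'))); // true
```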
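The encodingExists false positive (#328) is easy to reproduce: a lookup on a plain object literal also sees inherited properties such as constructor. The sketch contrasts an object literal with `Object.create(null)`; the `codecsWithProto`/`codecsNoProto` tables and the `exists*` functions are hypothetical, not iconv-lite internals.

```javascript
// Hypothetical codec registries: one inherits from Object.prototype, one does not.
const codecsWithProto = { utf8: {}, win1251: {} };
const codecsNoProto = Object.assign(Object.create(null), { utf8: {}, win1251: {} });

// Existence checks: any truthy property counts, including inherited ones.
const existsNaive = (name) => Boolean(codecsWithProto[name]);
const existsSafe = (name) => Boolean(codecsNoProto[name]);

console.log(existsNaive('constructor')); // true (inherited Object.prototype.constructor)
console.log(existsSafe('constructor'));  // false
console.log(existsSafe('win1251'));      // true
```

An alternative is an own-property check such as Object.prototype.hasOwnProperty.call(table, name), but a null-prototype object fixes every lookup site at once.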

🚀 Improvements

  • Make explicit that decode() method supports Uint8Array input - by @jardicc in #271
  • Remove compatibility check for StringDecoder.end method - by @bjohansebas in #331
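Uint8Array input can be handled cheaply because a Node.js Buffer is itself a Uint8Array subclass, and `Buffer.from(arrayBuffer, byteOffset, length)` creates a view over the same memory without copying. The sketch shows that zero-copy wrapping with a hypothetical `asBuffer` helper; whether iconv-lite does exactly this internally is an assumption (0.6.2 notes it decodes Uint8Arrays without converting them at all).

```javascript
// Hypothetical helper: wrap a Uint8Array as a Buffer that shares its memory.
function asBuffer(u8) {
  if (Buffer.isBuffer(u8)) return u8;
  // Respect byteOffset/byteLength: u8 may be a view into a larger ArrayBuffer.
  return Buffer.from(u8.buffer, u8.byteOffset, u8.byteLength);
}

const backing = new Uint8Array([0x00, 0x68, 0x69, 0x00]);
const view = backing.subarray(1, 3); // bytes 68 69 ("hi")
const buf = asBuffer(view);
console.log(buf.toString('latin1')); // 'hi'
backing[1] = 0x48;                   // mutate the original memory...
console.log(buf.toString('latin1')); // 'Hi' (no copy was made)
```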

0.6.3 / 2021-05-23

  • Fix HKSCS encoding to prefer Big5 codes if both Big5 and HKSCS codes are possible (#264)

0.6.2 / 2020-07-08

  • Support Uint8Array-s decoding without conversion to Buffers, plus fix an edge case.

0.6.1 / 2020-06-28

  • Support Uint8Array-s directly when decoding (#246, by @gyzerok)
  • Unify package.json version ranges to be strictly semver-compatible (#241)
  • Fix minor issue in UTF-32 decoder's endianness detection code.

0.6.0 / 2020-06-08

  • Updated 'gb18030' encoding to :2005 edition (see https://github.com/whatwg/encoding/issues/22).
  • Removed iconv.extendNodeEncodings() mechanism. It was deprecated 5 years ago and didn't work in recent Node versions.
  • Reworked Streaming API behavior in browser environments to fix #204. Streaming API will be excluded by default in browser packs, saving ~100Kb bundle size, unless enabled explicitly using iconv.enableStreamingAPI(require('stream')).
  • Updates to development environment & tests:
    • Added ./test/webpack private package to test complex new use cases that need custom environment. It's tested as a separate job in Travis CI.
    • Updated generation code for the new EUC-KR index file format from Encoding Standard.
    • Removed Buffer() constructor in tests (#197 by @gabrielschulhof).

0.5.2 / 2020-06-08

  • Added iconv.getEncoder() and iconv.getDecoder() methods to typescript definitions (#229).
  • Fixed semver version to 6.1.2 to support Node 8.x (by @tanandara).
  • Capped iconv version to 2.x as 3.x has dropped support for older Node versions.
  • Switched from istanbul to c8 for code coverage.

0.5.1 / 2020-01-18

  • Added cp720 encoding (#221, by @kr-deps)
  • (minor) Changed Changelog.md formatting to use h2.

0.5.0 / 2019-06-26

  • Added UTF-32 encoding, both little-endian and big-endian variants (UTF-32LE, UTF-32BE). If endianness is not provided for decoding, it's deduced automatically from the stream using a heuristic similar to what we use in UTF-16. (great work in #216 by @kshetline)
  • Several minor updates to README (#217 by @oldj, plus some more)
  • Added Node versions 10 and 12 to Travis test harness.

0.4.24 / 2018-08-22

  • Added MIK encoding (#196, by @Ivan-Kalatchev)

0.4.23 / 2018-05-07

  • Fix deprecation warning in Node v10 due to the last usage of new Buffer (#185, by @felixbuenemann)
  • Switched from NodeBuffer to Buffer in typings (#155 by @felixfbecker, #186 by @larssn)

0.4.22 / 2018-05-05

  • Use older semver style for dependencies to be compatible with Node version 0.10 (#182, by @dougwilson)
  • Fix tests to accommodate fixes in Node v10 (#182, by @dougwilson)

0.4.21 / 2018-04-06

  • Fix encoding canonicalization (#156)
  • Fix the paths in the "browser" field in package.json (#174 by @LMLB)
  • Removed "contributors" section in package.json - see Git history instead.

0.4.20 / 2018-04-06

  • Updated new Buffer() usages with recommended replacements as it's being deprecated in Node v10 (#176, #178 by @ChALkeR)

0.4.19 / 2017-09-09

  • Fixed iso8859-1 codec regression in handling untranslatable characters (#162, caused by #147)
  • Re-generated windows1255 codec, because it was updated in iconv project
  • Fixed grammar in error message when iconv-lite is loaded with encoding other than utf8

0.4.18 / 2017-06-13

  • Fixed CESU-8 regression in Node v8.

0.4.17 / 2017-04-22

  • Updated typescript definition file to support Angular 2 AoT mode (#153 by @larssn)

0.4.16 / 2017-04-22

  • Added support for React Native (#150)
  • Changed iso8859-1 encoding to use the internal 'binary' encoding, as it's the same thing (#147 by @mscdex)
  • Fixed typo in Readme (#138 by @jiangzhuo)
  • Fixed build for Node v6.10+ by making correct version comparison
  • Added a warning if iconv-lite is not loaded as utf-8 (see #142)

0.4.15 / 2016-11-21

  • Fixed typescript type definition (#137)

0.4.14 / 2016-11-20

  • Preparation for v1.0
  • Added Node v6 and latest Node versions to Travis CI test rig
  • Deprecated Node v0.8 support
  • Typescript typings (@larssn)
  • Fix encoding of Euro character in GB 18030 (inspired by @lygstate)
  • Add ms prefix to dbcs windows encodings (@rokoroku)

0.4.13 / 2015-10-01

  • Fix silly mistake in deprecation notice.

0.4.12 / 2015-09-26

  • Node v4 support:
    • Added CESU-8 decoding (#106)
    • Added deprecation notice for extendNodeEncodings
    • Added Travis tests for Node v4 and io.js latest (#105 by @Mithgol)

0.4.11 / 2015-07-03

  • Added CESU-8 encoding.

0.4.10 / 2015-05-26

  • Changed UTF-16 endianness heuristic to take into account any ASCII chars, not just spaces. This should minimize the importance of "default" endianness.

0.4.9 / 2015-05-24

  • Streamlined BOM handling: strip BOM by default, add BOM when encoding if addBOM: true. Added docs to Readme.
  • UTF16 now uses UTF16-LE by default.
  • Fixed minor issue with big5 encoding.
  • Added io.js testing on Travis; updated node-iconv version to test against. Now we just skip testing SBCS encodings that node-iconv doesn't support.
  • (internal refactoring) Updated codec interface to use classes.
  • Use strict mode in all files.

0.4.8 / 2015-04-14

  • added alias UNICODE-1-1-UTF-7 for UTF-7 encoding (#94)

0.4.7 / 2015-02-05

  • stop official support of Node.js v0.8. It should still work, but there are no guarantees. Reason: packages needed for testing are hard to get on Travis CI.
  • work in environments where Object.prototype is monkey-patched with enumerable props (#89).

0.4.6 / 2015-01-12

  • fix rare aliases of single-byte encodings (thanks @mscdex)
  • double the timeout for dbcs tests to make them less flaky on travis

0.4.5 / 2014-11-20

  • fix windows-31j and x-sjis encoding support (@nleush)
  • minor fix: undefined variable reference when internal error happens

0.4.4 / 2014-07-16

  • added encodings UTF-7 (RFC2152) and UTF-7-IMAP (RFC3501 Section 5.1.3)
  • fixed streaming base64 encoding

0.4.3 / 2014-06-14

  • added encodings UTF-16BE and UTF-16 with BOM

0.4.2 / 2014-06-12

  • don't throw exception if extendNodeEncodings() is called more than once

0.4.1 / 2014-06-11

  • codepage 808 added

0.4.0 / 2014-06-10

  • code is rewritten from scratch
  • all widespread encodings are supported
  • streaming interface added
  • browserify compatibility added
  • (optional) extend core primitive encodings to make usage even simpler
  • moved from vows to mocha as the testing framework