Package details

node-expat

astro | 566.8k | MIT | 2.4.1

NodeJS binding for fast XML parsing.

xml, sax, expat, libexpat

Readme

node-expat

Motivation

You use Node.js for speed? You process XML streams? Then you want the fastest XML parser: libexpat!

Install

npm install node-expat

Usage

Important events emitted by a parser:

(function () {
  "use strict";

  var expat = require('node-expat')
  var parser = new expat.Parser('UTF-8')

  parser.on('startElement', function (name, attrs) {
    console.log(name, attrs)
  })

  parser.on('endElement', function (name) {
    console.log(name)
  })

  parser.on('text', function (text) {
    console.log(text)
  })

  parser.on('error', function (error) {
    console.error(error)
  })

  parser.write('<html><head><title>Hello World</title></head><body><p>Foobar</p></body></html>')

}())

API

  • #on('startElement', function (name, attrs) {})
  • #on('endElement', function (name) {})
  • #on('text', function (text) {})
  • #on('processingInstruction', function (target, data) {})
  • #on('comment', function (s) {})
  • #on('xmlDecl', function (version, encoding, standalone) {})
  • #on('startCdata', function () {})
  • #on('endCdata', function () {})
  • #on('entityDecl', function (entityName, isParameterEntity, value, base, systemId, publicId, notationName) {})
  • #on('error', function (e) {})
  • #stop() pauses
  • #resume() resumes
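
As a rough illustration of how the events listed above fit together, the sketch below assembles a nested object tree from 'startElement', 'text' and 'endElement'. buildTree and the node shape ({ name, attrs, children, text }) are hypothetical and not part of node-expat; the helper only assumes an emitter that uses the event names from the list.

```javascript
'use strict'

// Hypothetical helper, not part of node-expat: builds a plain object
// tree from SAX-style events. `parser` is any emitter that uses the
// event names 'startElement', 'text' and 'endElement'.
function buildTree (parser, done) {
  var root = null
  var stack = []

  parser.on('startElement', function (name, attrs) {
    var node = { name: name, attrs: attrs, children: [], text: '' }
    if (stack.length > 0) {
      stack[stack.length - 1].children.push(node)
    } else {
      root = node
    }
    stack.push(node)
  })

  parser.on('text', function (text) {
    // Text may arrive in several chunks, so append rather than assign.
    if (stack.length > 0) {
      stack[stack.length - 1].text += text
    }
  })

  parser.on('endElement', function () {
    stack.pop()
    // The document element has closed once the stack is empty again.
    if (stack.length === 0) {
      done(root)
    }
  })
}
```

With node-expat installed, one would presumably pass new expat.Parser('UTF-8') as parser, register the callback, and then feed the document with parser.write().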

Error handling

parse() does not emit an error event, because libexpat doesn't use a callback for errors either. Instead, check that parse() returns true. If it does not, a descriptive string can be obtained via getError() to provide user feedback.

Alternatively, use the Parser like a node Stream. write() will emit error events.
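
A minimal sketch of the first approach, assuming only the two methods named above: parse(), which returns true on success, and getError(), which returns a descriptive string. parseOrThrow is a hypothetical helper, not part of node-expat.

```javascript
'use strict'

// Hypothetical helper, not part of node-expat: feeds one chunk to a
// parser and turns a failed parse() into a thrown Error that carries
// the string reported by getError().
function parseOrThrow (parser, chunk) {
  if (!parser.parse(chunk)) {
    throw new Error('XML parse error: ' + parser.getError())
  }
}
```

Throwing keeps the check next to the call site. Code that prefers the stream style would instead register an 'error' listener and call write().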

Namespace handling

A word about special parsing of xmlns: the parser does none. It is not necessary in a bare SAX parser like this one, because the DOM replacement you are using (if any) is not relevant to the parser.
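
If consumer code does need namespace URIs, it can resolve prefixes itself. The sketch below assumes that element names arrive as "prefix:local" and that xmlns declarations show up as ordinary attributes, which is what one would expect from a parser that does no namespace processing. resolveName and the scopes stack are hypothetical names, not part of node-expat.

```javascript
'use strict'

// Hypothetical helper, not part of node-expat: resolves the prefix of
// an element name against the xmlns declarations seen so far.
// `scopes` is a stack of { prefix: uri } maps, innermost last.
function resolveName (name, attrs, scopes) {
  // Collect the namespace declarations made on this element.
  var scope = {}
  Object.keys(attrs).forEach(function (key) {
    if (key === 'xmlns') {
      scope[''] = attrs[key]
    } else if (key.indexOf('xmlns:') === 0) {
      scope[key.slice('xmlns:'.length)] = attrs[key]
    }
  })
  scopes.push(scope)

  // Split "prefix:local" into its parts; no colon means no prefix.
  var colon = name.indexOf(':')
  var prefix = colon === -1 ? '' : name.slice(0, colon)
  var local = colon === -1 ? name : name.slice(colon + 1)

  // Look the prefix up from the innermost scope outwards.
  for (var i = scopes.length - 1; i >= 0; i--) {
    if (Object.prototype.hasOwnProperty.call(scopes[i], prefix)) {
      return { uri: scopes[i][prefix], local: local }
    }
  }
  return { uri: null, local: local }
}
```

A caller would invoke this from a 'startElement' handler and pop one entry off scopes in the matching 'endElement' handler.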

Benchmark

npm run benchmark

module       ops/sec   native   XML compliant   stream
sax-js        99,412
node-xml     130,631
libxmljs     276,136
node-expat   322,769

Higher is better.

Testing

npm install -g standard
npm test

Windows

If you fail to install node-expat as a dependency of node-xmpp, please update node-xmpp as it doesn't use node-expat anymore.

Install the dependencies for node-gyp: https://github.com/TooTallNate/node-gyp#installation

See https://github.com/astro/node-expat/issues/78 if you are getting errors about not finding nan.h.

expat.vcproj

VCBUILD : error : project file 'node-expat\build\deps\libexpat\expat.vcproj' was not found or not a valid project file. [C:\Users\admin\AppData\Roaming\npm\node_modules\node-expat\build\binding.sln]

Install Visual Studio C++ 2012 and run npm with the --msvs_version=2012 flag, for example: npm install node-expat --msvs_version=2012

Changelog

NOTE: We are looking for help with a few things: https://github.com/libexpat/libexpat/labels/help%20wanted If you can help, please get in touch. Thanks!

Release 2.2.1 Sat June 17 2017

    Security fixes:
              CVE-2017-9233 -- External entity infinite loop DoS
                Details: https://libexpat.github.io/doc/cve-2017-9233/
                Commit c4bf96bb51dd2a1b0e185374362ee136fe2c9d7f
  [MOX-002]   CVE-2016-9063 -- Detect integer overflow; commit
                d4f735b88d9932bd5039df2335eefdd0723dbe20
                (Fixed version of existing downstream patches!)
(SF.net) #539 Fix regression from fix to CVE-2016-0718 cutting off
                longer tag names; commits
                * 896b6c1fd3b842f377d1b62135dccf0a579cf65d
                * af507cef2c93cb8d40062a0abe43a4f4e9158fb2
         #16    * 0dbbf43fdb20f593ddf4fa1ff67288000dd4a7fd
         #25  More integer overflow detection (function poolGrow); commits
                * 810b74e4703dcfdd8f404e3cb177d44684775143
                * 44178553f3539ce69d34abee77a05e879a7982ac
  [MOX-002]   Detect overflow from len=INT_MAX call to XML_Parse; commits
                * 4be2cb5afcc018d996f34bbbce6374b7befad47f
                * 7e5b71b748491b6e459e5c9a1d090820f94544d8
[MOX-005] #30 Use high quality entropy for hash initialization:
                * arc4random_buf on BSD, systems with libbsd
                  (when configured with --with-libbsd), CloudABI
                * RtlGenRandom on Windows XP / Server 2003 and later
                * getrandom on Linux 3.17+
                In a way, that's still part of CVE-2016-5300.
                https://github.com/libexpat/libexpat/pull/30/commits
  [MOX-005]   For the low quality entropy extraction fallback code,
                the parser instance address can no longer leak, commit
                04ad658bd3079dd15cb60fc67087900f0ff4b083
  [MOX-003]   Prevent use of uninitialised variable; commit
  [MOX-004]     a4dc944f37b664a3ca7199c624a98ee37babdb4b
              Add missing parameter validation to public API functions
                and dedicated error code XML_ERROR_INVALID_ARGUMENT:
  [MOX-006]     * NULL checks; commits
                  * d37f74b2b7149a3a95a680c4c4cd2a451a51d60a (merge/many)
                  * 9ed727064b675b7180c98cb3d4f75efba6966681
                  * 6a747c837c50114dfa413994e07c0ba477be4534
                * Negative length (XML_Parse); commit
  [MOX-002]       70db8d2538a10f4c022655d6895e4c3e78692e7f
[MOX-001] #35 Change hash algorithm to William Ahern's version of SipHash
                to go further with fixing CVE-2012-0876.
                https://github.com/libexpat/libexpat/pull/39/commits

    Bug fixes:
         #32  Fix sharing of hash salt across parsers;
                relevant where XML_ExternalEntityParserCreate is called
                prior to XML_Parse, in particular (e.g. FBReader)
         #28  xmlwf: Auto-disable use of memory-mapping (and parsing
                as a single chunk) for files larger than ~1 GB (2^30 bytes)
                rather than failing with error "out of memory"
          #3  Fix double free after malloc failure in DTD code; commit
                7ae9c3d3af433cd4defe95234eae7dc8ed15637f
         #17  Fix memory leak on parser error for unbound XML attribute
                prefix with new namespaces defined in the same tag;
                found by Google's OSS-Fuzz; commits
                * 16f87daae5a16132e479e4f71862128c7a915c73
                * b47dbc9745932c160893d433220e462bd605f8cd
              xmlwf on Windows: Add missing calls to CloseHandle

    New features:
         #30  Introduced environment switch EXPAT_ENTROPY_DEBUG=1
                for runtime debugging of entropy extraction

    Other changes:
              Increase code coverage
         #33  Reject use of XML_UNICODE_WCHAR_T with sizeof(wchar_t) != 2;
                XML_UNICODE_WCHAR_T was never meant to be used outside
                of Windows; 4-byte wchar_t is common on Linux

(SF.net) #538 Start using -fno-strict-aliasing
(SF.net) #540 Support compilation against cloudlibc of CloudABI
              Allow MinGW cross-compilation
(SF.net) #534 CMake: Introduce option "BUILD_doc" (enabled by default)
                to bypass compilation of the xmlwf.1 man page
 (SF.net) pr2 CMake: Introduce option "INSTALL" (enabled by default)
                to bypass installation of expat files
              CMake: Fix ninja support
              Autotools: Add parameters --enable-xml-context [COUNT]
                and --disable-xml-context; default of context of 1024
                bytes enabled unchanged
         #14  Drop AmigaOS 4.x code and includes
         #14  Drop ancient build systems:
                * Borland C++ Builder
                * OpenVMS
                * Open Watcom
                * Visual Studio 6.0
                * Pre-X Mac OS (MPW Makefile)
                If you happen to rely on some of these, please get in
                touch for joining with maintenance.
         #10  Move from WIN32 to _WIN32
         #13  Fix "make run-xmltest" order instability
              Address compile warnings
              Bump version info from 7:2:6 to 7:3:6
              Add AUTHORS file

    Infrastructure:
          #1  Migrate from SourceForge to GitHub (except downloads):
                https://github.com/libexpat/
          #1  Re-create http://libexpat.org/ project website
              Start utilizing Travis CI

    Special thanks to:
        Andy Wang
        Don Lewis
        Ed Schouten
        Karl Waclawek
        Pascal Cuoq
        Rhodri James
        Sergei Nikulov
        Tobias Taschner
        Viktor Szakats
             and
        Core Infrastructure Initiative
        Mozilla Foundation (MOSS Track 3: Secure Open Source)
        Radically Open Security

Release 2.2.0 Tue June 21 2016

    Security fixes:
        #537  CVE-2016-0718 -- Fix crash on malformed input
              CVE-2016-4472 -- Improve insufficient fix to CVE-2015-1283 /
                               CVE-2015-2716 introduced with Expat 2.1.1
        #499  CVE-2016-5300 -- Use more entropy for hash initialization
                               than the original fix to CVE-2012-0876
        #519  CVE-2012-6702 -- Resolve troublesome internal call to srand
                               that was introduced with Expat 2.1.0
                               when addressing CVE-2012-0876 (issue #496)

    Bug fixes:
              Fix uninitialized reads of size 1
                (e.g. in little2_updatePosition)
              Fix detection of UTF-8 character boundaries

    Other changes:
        #532  Fix compilation for Visual Studio 2010 (keyword "C99")
              Autotools: Resolve use of "$<" to better support bmake
              Autotools: Add QA script "qa.sh" (and make target "qa")
              Autotools: Respect CXXFLAGS if given
              Autotools: Fix "make run-xmltest"
              Autotools: Have "make run-xmltest" check for expected output
         p90  CMake: Fix static build (BUILD_shared=OFF) on Windows
        #536  CMake: Add soversion, support -DNO_SONAME=yes to bypass
        #323  CMake: Add suffix "d" to differentiate debug from release
              CMake: Define WIN32 with CMake on Windows
              Annotate memory allocators for GCC
              Address all currently known compile warnings
              Make sure that API symbols remain visible despite
                -fvisibility=hidden
              Remove executable flag from source files
              Resolve COMPILED_FROM_DSP in favor of WIN32

    Special thanks to:
        Björn Lindahl
        Christian Heimes
        Cristian Rodríguez
        Daniel Krügler
        Gustavo Grieco
        Karl Waclawek
        László Böszörményi
        Marco Grassi
        Pascal Cuoq
        Sergei Nikulov
        Thomas Beutlich
        Warren Young
        Yann Droneaud

Release 2.1.1 Sat March 12 2016

    Security fixes:
        #582: CVE-2015-1283 - Multiple integer overflows in XML_GetBuffer

    Bug fixes:
        #502: Fix potential null pointer dereference
        #520: Symbol XML_SetHashSalt was not exported
        Output of "xmlwf -h" was incomplete

    Other changes:
        #503: Document behavior of calling XML_SetHashSalt with salt 0
        Minor improvements to man page xmlwf(1)
        Improvements to the experimental CMake build system
        libtool now invoked with --verbose

Release 2.1.0 Sat March 24 2012

    - Security fixes:
      #2958794: CVE-2012-1148 - Memory leak in poolGrow.
      #2895533: CVE-2012-1147 - Resource leak in readfilemap.c.
      #3496608: CVE-2012-0876 - Hash DOS attack.
      #2894085: CVE-2009-3560 - Buffer over-read and crash in big2_toUtf8().
      #1990430: CVE-2009-3720 - Parser crash with special UTF-8 sequences.
    - Bug Fixes:
      #1742315: Harmful XML_ParserCreateNS suggestion.
      #1785430: Expat build fails on linux-amd64 with gcc version>=4.1 -O3.
      #1983953, 2517952, 2517962, 2649838: 
            Build modifications using autoreconf instead of buildconf.sh.
      #2815947, #2884086: OBJEXT and EXEEXT support while building.
      #2517938: xmlwf should return non-zero exit status if not well-formed.
      #2517946: Wrong statement about XMLDecl in xmlwf.1 and xmlwf.sgml.
      #2855609: Dangling positionPtr after error.
      #2990652: CMake support.
      #3010819: UNEXPECTED_STATE with a trailing "%" in entity value.
      #3206497: Uninitialized memory returned from XML_Parse.
      #3287849: make check fails on mingw-w64.
    - Patches:
      #1749198: pkg-config support.
      #3010222: Fix for bug #3010819.
      #3312568: CMake support.
      #3446384: Report byte offsets for attr names and values.
    - New Features / API changes:
      Added new API member XML_SetHashSalt() that allows setting an initial
            value (salt) for hash calculations. This is part of the fix for
            bug #3496608 to randomize hash parameters.
      When compiled with XML_ATTR_INFO defined, adds new API member
            XML_GetAttributeInfo() that allows retrieving the byte
            offsets for attribute names and values (patch #3446384).
      Added CMake build system.
            See bug #2990652 and patch #3312568.
      Added run-benchmark target to Makefile.in - relies on testdata module
            present in the same relative location as in the repository.

Release 2.0.1 Tue June 5 2007

    - Fixed bugs #1515266, #1515600: The character data handler's calling
      of XML_StopParser() was not handled properly; if the parser was
      stopped and the handler set to NULL, the parser would segfault.
    - Fixed bug #1690883: Expat failed on EBCDIC systems as it assumed
      some character constants to be ASCII encoded.
    - Minor cleanups of the test harness.
    - Fixed xmlwf bug #1513566: "out of memory" error on file size zero.
    - Fixed outline.c bug #1543233: missing a final XML_ParserFree() call.
    - Fixes and improvements for Windows platform:
      bugs #1409451, #1476160, #1548182, #1602769, #1717322.
    - Build fixes for various platforms:
      HP-UX, Tru64, Solaris 9: patch #1437840, bug #1196180.
      All Unix: #1554618 (refreshed config.sub/config.guess).
                #1490371, #1613457: support both, DESTDIR and INSTALL_ROOT,
                without relying on GNU-Make specific features.
      #1647805: Patched configure.in to work better with Intel compiler.
    - Fixes to Makefile.in to have make check work correctly:
      bugs #1408143, #1535603, #1536684.
    - Added Open Watcom support: patch #1523242.

Release 2.0.0 Wed Jan 11 2006

    - We no longer use the "check" library for C unit testing; we
      always use the (partial) internal implementation of the API.
    - Report XML_NS setting via XML_GetFeatureList().
    - Fixed headers for use from C++.
    - XML_GetCurrentLineNumber() and  XML_GetCurrentColumnNumber()
      now return unsigned integers.
    - Added XML_LARGE_SIZE switch to enable 64-bit integers for
      byte indexes and line/column numbers.
    - Updated to use libtool 1.5.22 (the most recent).
    - Added support for AmigaOS.
    - Some mostly minor bug fixes. SF issues include: #1006708,
      #1021776, #1023646, #1114960, #1156398, #1221160, #1271642.

Release 1.95.8 Fri Jul 23 2004

    - Major new feature: suspend/resume.  Handlers can now request
      that a parse be suspended for later resumption or aborted
      altogether.  See "Temporarily Stopping Parsing" in the
      documentation for more details.
    - Some mostly minor bug fixes, but compilation should no
      longer generate warnings on most platforms.  SF issues
      include: #827319, #840173, #846309, #888329, #896188, #923913,
      #928113, #961698, #985192.

Release 1.95.7 Mon Oct 20 2003

    - Fixed enum XML_Status issue (reported on SourceForge many
      times), so compilers that are properly picky will be happy.
    - Introduced an XMLCALL macro to control the calling
      convention used by the Expat API; this macro should be used
      to annotate prototypes and definitions of callback
      implementations in code compiled with a calling convention
      other than the default convention for the host platform.
    - Improved ability to build without the configure-generated
      expat_config.h header.  This is useful for applications
      which embed Expat rather than linking in the library.
    - Fixed a variety of bugs: see SF issues #458907, #609603,
      #676844, #679754, #692878, #692964, #695401, #699323, #699487,
      #820946.
    - Improved hash table lookups.
    - Added more regression tests and improved documentation.

Release 1.95.6 Tue Jan 28 2003

    - Added XML_FreeContentModel().
    - Added XML_MemMalloc(), XML_MemRealloc(), XML_MemFree().
    - Fixed a variety of bugs: see SF issues #615606, #616863,
      #618199, #653180, #673791.
    - Enhanced the regression test suite.
    - Man page improvements: includes SF issue #632146.

Release 1.95.5 Fri Sep 6 2002

    - Added XML_UseForeignDTD() for improved SAX2 support.
    - Added XML_GetFeatureList().
    - Defined XML_Bool type and the values XML_TRUE and XML_FALSE.
    - Use an incomplete struct instead of a void* for the parser
      (may not retain).
    - Fixed UTF-8 decoding bug that caused legal UTF-8 to be rejected.
    - Finally fixed bug where default handler would report DTD
      events that were already handled by another handler.
      Initial patch contributed by Darryl Miles.
    - Removed unnecessary DllMain() function that caused static
      linking into a DLL to be difficult.
    - Added VC++ projects for building static libraries.
    - Reduced line-length for all source code and headers to be
      no longer than 80 characters, to help with AS/400 support.
    - Reduced memory copying during parsing (SF patch #600964).
    - Fixed a variety of bugs: see SF issues #580793, #434664,
      #483514, #580503, #581069, #584041, #584183, #584832, #585537,
      #596555, #596678, #598352, #598944, #599715, #600479, #600971.

Release 1.95.4 Fri Jul 12 2002

    - Added support for VMS, contributed by Craig Berry.  See
      vms/README.vms for more information.
    - Added Mac OS (classic) support, with a makefile for MPW,
      contributed by Thomas Wegner and Daryle Walker.
    - Added Borland C++ Builder 5 / BCC 5.5 support, contributed
      by Patrick McConnell (SF patch #538032).
    - Fixed a variety of bugs: see SF issues #441449, #563184,
      #564342, #566334, #566901, #569461, #570263, #575168, #579196.
    - Made skippedEntityHandler conform to SAX2 (see source comment)
    - Re-implemented WFC: Entity Declared from XML 1.0 spec and
      added a new error "entity declared in parameter entity":
      see SF bug report #569461 and SF patch #578161
    - Re-implemented section 5.1 from XML 1.0 spec:
      see SF bug report #570263 and SF patch #578161

Release 1.95.3 Mon Jun 3 2002

    - Added a project to the MSVC workspace to create a wchar_t
      version of the library; the DLLs are named libexpatw.dll.
    - Changed the name of the Windows DLLs from expat.dll to
      libexpat.dll; this fixes SF bug #432456.
    - Added the XML_ParserReset() API function.
    - Fixed XML_SetReturnNSTriplet() to work for element names.
    - Made the XML_UNICODE builds usable (thanks, Karl!).
    - Allow xmlwf to read from standard input.
    - Install a man page for xmlwf on Unix systems.
    - Fixed many bugs; see SF bug reports #231864, #461380, #464837,
      #466885, #469226, #477667, #484419, #487840, #494749, #496505,
      #547350.  Other bugs which we can't test as easily may also
      have been fixed, especially in the area of build support.

Release 1.95.2 Fri Jul 27 2001

    - More changes to make MSVC happy with the build; add a single
      workspace to support both the library and xmlwf application.
    - Added a Windows installer for Windows users; includes
      xmlwf.exe.
    - Added compile-time constants that can be used to determine the
      Expat version
    - Removed a lot of GNU-specific dependencies to aid portability
      among the various Unix flavors.
    - Fix the UTF-8 BOM bug.
    - Cleaned up warning messages for several compilers.
    - Added the -Wall, -Wstrict-prototypes options for GCC.

Release 1.95.1 Sun Oct 22 15:11:36 EDT 2000

    - Changes to get expat to build under Microsoft compiler
    - Removed all aborts and instead return an UNEXPECTED_STATE error.
    - Fixed a bug where a stray '%' in an entity value would cause an
      abort.
    - Defined XML_SetEndNamespaceDeclHandler. Thanks to Darryl Miles for
      finding this oversight.
    - Changed default patterns in lib/Makefile.in to fit non-GNU makes
      Thanks to robin@unrated.net for reporting and providing an
      account to test on.
    - The reference had the wrong label for XML_SetStartNamespaceDecl.
      Reported by an anonymous user.

Release 1.95.0 Fri Sep 29 2000

    - XML_ParserCreate_MM
            Allows you to set a memory management suite to replace the
            standard malloc,realloc, and free.
    - XML_SetReturnNSTriplet
            If you turn this feature on when namespace processing is in
            effect, then qualified, prefixed element and attribute names
            are returned as "uri|name|prefix" where '|' is whatever
            separator character is used in namespace processing.
    - Merged in features from perl-expat
            o XML_SetElementDeclHandler
            o XML_SetAttlistDeclHandler
            o XML_SetXmlDeclHandler
            o XML_SetEntityDeclHandler
            o StartDoctypeDeclHandler takes 3 additional parameters:
                    sysid, pubid, has_internal_subset
            o Many paired handler setters (like XML_SetElementHandler)
              now have corresponding individual handler setters
            o XML_GetInputContext for getting the input context of
              the current parse position.
    - Added reference material
    - Packaged into a distribution that builds a sharable library