feedparser

danmactough | 70.1k | MIT | 2.2.10

Robust RSS, Atom, and RDF feed parsing using sax-js

rss, feed, atom, rdf

readme

Feedparser - Robust RSS, Atom, and RDF feed parsing in Node.js

Join the chat at https://gitter.im/danmactough/node-feedparser

Feedparser is for parsing RSS, Atom, and RDF feeds in node.js.

It has a couple of features you don't usually see in other feed parsers:

  1. It resolves relative URLs (such as those seen in Tim Bray's "ongoing" feed).
  2. It properly handles XML namespaces (including those in unusual feeds that define a non-default namespace for the main feed elements).

Installation

npm install feedparser

Usage

This example is just to briefly demonstrate basic concepts.

Please also review the complete example for a thorough working example that is a suitable starting point for your app.


var FeedParser = require('feedparser');
var fetch = require('node-fetch'); // for fetching the feed

var req = fetch('http://somefeedurl.xml');
var feedparser = new FeedParser(); // optionally pass an options object (see "options")

req.then(function (res) {
  if (res.status !== 200) {
    throw new Error('Bad status code');
  }
  else {
    // The response `body` -- res.body -- is a stream
    res.body.pipe(feedparser);
  }
}, function (err) {
  // handle any request errors
});

feedparser.on('error', function (error) {
  // always handle errors
});

feedparser.on('readable', function () {
  // This is where the action is!
  var stream = this; // `this` is `feedparser`, which is a stream
  var meta = this.meta; // **NOTE** the "meta" is always available in the context of the feedparser instance
  var item;

  while (item = stream.read()) {
    console.log(item);
  }
});

You can also check out this nice working implementation that demonstrates one way to handle all the hard and annoying stuff.

options

  • normalize - Set to false to override Feedparser's default behavior, which is to parse feeds into an object that contains the generic properties patterned after (although not identical to) the RSS 2.0 format, regardless of the feed's format.

  • addmeta - Set to false to override Feedparser's default behavior, which is to add the feed's meta information to each article.

  • feedurl - The url (string) of the feed. FeedParser is very good at resolving relative urls in feeds. But some feeds use relative urls without declaring the xml:base attribute any place in the feed. This is perfectly valid, but we don't know the feed's url before we start parsing the feed and trying to resolve those relative urls. If we discover the feed's url, we will go back and resolve the relative urls we've already seen, but this takes a little time (not much). If you want to be sure we never have to re-resolve relative urls (or if FeedParser is failing to properly resolve relative urls), you should set the feedurl option. Otherwise, feel free to ignore this option.

  • resume_saxerror - Set to false to override Feedparser's default behavior, which is to emit any SAXError on error and then automatically resume parsing. In my experience, SAXErrors are not usually fatal, so this is usually helpful behavior. If you want total control over handling these errors and optionally aborting parsing the feed, use this option.
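
As a rough illustration of the options above, the sketch below builds an options object and hands it to the constructor. The helper name `createParser` and the example URL are hypothetical, and the constructor is passed in as an argument only so the snippet can run without the package installed. The option names and their defaults are taken from the list above.

```javascript
// Hypothetical helper: builds a parser configured with the documented options.
// `FeedParserCtor` is expected to be the export of require('feedparser').
function createParser(FeedParserCtor, feedUrl) {
  const options = {
    normalize: true,        // default behavior: generic, RSS 2.0-like properties
    addmeta: true,          // default behavior: add the feed meta to each article
    feedurl: feedUrl,       // known feed url, so relative urls need no re-resolving
    resume_saxerror: true   // default behavior: emit SAXErrors, then resume parsing
  };
  return new FeedParserCtor(options);
}

// Usage (assumes the package is installed):
//   const FeedParser = require('feedparser');
//   const feedparser = createParser(FeedParser, 'https://example.com/feed.xml');
```

Setting `feedurl` is the only option here that changes anything relative to the defaults; the others are spelled out just to show where they go.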

Examples

See the examples directory.

API

Transform Stream

Feedparser is a transform stream operating in "object mode": XML in -> JavaScript objects out. Each readable chunk is an object representing an article in the feed.

Events Emitted

  • meta - called with feed meta when it has been parsed
  • error - called with error whenever there is a Feedparser error of any kind (SAXError, Feedparser error, etc.)
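
A minimal sketch of listening for these events alongside the readable side of the stream. `collectFeed` is a hypothetical helper, not part of the package; it only assumes an object that emits `meta`, `error` and `readable` and has a `read()` method, as a feedparser instance does.

```javascript
// Hypothetical helper: gathers the feed meta, articles, and errors from a
// feedparser-like stream into one plain object.
function collectFeed(feedparser) {
  const result = { meta: null, articles: [], errors: [] };

  feedparser.on('meta', function (meta) {
    result.meta = meta;            // feed-level information
  });

  feedparser.on('error', function (error) {
    result.errors.push(error);     // SAXErrors and Feedparser errors arrive here
  });

  feedparser.on('readable', function () {
    let item;
    // read() returns null once the internal buffer is drained
    while ((item = this.read()) !== null) {
      result.articles.push(item);  // one object per article
    }
  });

  return result;
}
```

The returned object is filled in as events arrive, so in real use it should only be treated as complete once the stream has emitted `end`.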

What is the parsed output produced by feedparser?

Feedparser parses each feed into a meta portion (emitted on the meta event) and one or more articles (emitted on the data event, or available from read() once the readable event has been emitted).

Regardless of the format of the feed, the meta and each article contain a uniform set of generic properties patterned after (although not identical to) the RSS 2.0 format, as well as all of the properties originally contained in the feed. So, for example, an Atom feed may have a meta.description property, but it will also have a meta['atom:subtitle'] property.

The purpose of the generic properties is to provide the user a uniform interface for accessing a feed's information without needing to know the feed's format (i.e., RSS versus Atom) or having to worry about handling the differences between the formats. However, the original information is also there, in case you need it. In addition, Feedparser supports some popular namespace extensions (or portions of them), such as portions of the itunes, media, feedburner and pheedo extensions. So, for example, if a feed article contains either an itunes:image or media:thumbnail, the url for that image will be contained in the article's image.url property.

All generic properties are "pre-initialized" to null (or empty arrays or objects for certain properties). This should save you from having to do a lot of checking for undefined, such as, for example, when you are using jade templates.
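
To show what the pre-initialization buys you, here is a small sketch. The article object is hand-written for illustration and is not real parser output; which properties start as `null`, as an empty array or as an empty object is an assumption based on the property lists in this README.

```javascript
// Hypothetical article: generic properties exist even though the feed
// supplied nothing for them.
const article = {
  title: 'Hello',
  author: null,        // assumed to be pre-initialized to null
  categories: [],      // assumed to be pre-initialized to an empty array
  image: {}            // assumed to be pre-initialized to an empty object
};

// No checks for undefined are needed before touching these properties.
function byline(a) {
  const author = a.author || 'unknown author';
  const tags = a.categories.length ? a.categories.join(', ') : 'no categories';
  const image = a.image.url || 'no image';
  return a.title + ' by ' + author + ' [' + tags + '] (' + image + ')';
}

console.log(byline(article));
// prints "Hello by unknown author [no categories] (no image)"
```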

In addition, all properties (and namespace prefixes) use only lowercase letters, regardless of how they were capitalized in the original feed. ("xmlUrl" and "pubDate" also are still used to provide backwards compatibility.) This decision places ease-of-use over purity -- hopefully, you will never need to think about whether you should camelCase "pubDate" ever again.

The title and description properties of meta and the title property of each article have any HTML stripped if you let feedparser normalize the output. If you really need the HTML in those elements, there are always the originals: e.g., meta['atom:subtitle']['#'].
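
The choice between the normalized value and the original can be sketched as follows. `feedDescription` is a hypothetical helper and the sample meta object is hand-written; only the property names `description` and `atom:subtitle`, and the `'#'` key holding the original text, come from the paragraphs above.

```javascript
// Hypothetical helper: returns the feed description, optionally with the
// original markup taken from the format-specific property.
function feedDescription(meta, keepHtml) {
  const original = meta['atom:subtitle'];
  if (keepHtml && original && original['#']) {
    return original['#'];   // text of the original element, HTML intact
  }
  return meta.description;  // generic property, HTML stripped
}

// Hand-written sample, not real parser output.
const sampleMeta = {
  description: 'News and notes',
  'atom:subtitle': { '#': 'News <em>and</em> notes' }
};

console.log(feedDescription(sampleMeta, false)); // prints "News and notes"
console.log(feedDescription(sampleMeta, true));  // prints "News <em>and</em> notes"
```

For an RSS feed there is no `atom:subtitle`, so the helper simply falls back to the generic property.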

List of meta properties

  • title
  • description
  • link (website link)
  • xmlurl (the canonical link to the feed, as specified by the feed)
  • date (most recent update)
  • pubdate (original published date)
  • author
  • language
  • image (an Object containing url and title properties)
  • favicon (a link to the favicon -- only provided by Atom feeds)
  • copyright
  • generator
  • categories (an Array of Strings)

List of article properties

  • title
  • description (frequently, the full article content)
  • summary (frequently, an excerpt of the article content)
  • link
  • origlink (when FeedBurner or Pheedo puts a special tracking url in the link property, origlink contains the original link)
  • permalink (when an RSS feed has a guid field and the isPermaLink attribute is not set to false, permalink contains the value of guid)
  • date (most recent update)
  • pubdate (original published date)
  • author
  • guid (a unique identifier for the article)
  • comments (a link to the article's comments section)
  • image (an Object containing url and title properties)
  • categories (an Array of Strings)
  • source (an Object containing url and title properties pointing to the original source for an article; see the RSS Spec for an explanation of this element)
  • enclosures (an Array of Objects, each representing a podcast or other enclosure and having a url property and possibly type and length properties)
  • meta (an Object containing all the feed meta properties; especially handy when using the EventEmitter interface to listen to article emissions)
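
As one possible use of the properties listed above, the sketch below reduces an article to a small record that a feed reader might store. `toRecord` and the names of the record's fields are hypothetical; the article property names are the ones from the list.

```javascript
// Hypothetical helper: picks a few generic properties off an article.
function toRecord(article) {
  return {
    id: article.guid || article.link,
    title: article.title,
    // origlink holds the original url when FeedBurner or Pheedo rewrote `link`
    url: article.origlink || article.link,
    published: article.pubdate,
    updated: article.date,
    tags: article.categories,
    media: article.enclosures.map(function (enclosure) {
      return enclosure.url;
    }),
    // meta is only present when the addmeta option is left enabled
    feedTitle: article.meta ? article.meta.title : null
  };
}
```

Because `categories` and `enclosures` are arrays even when empty, the `map` call needs no guard.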

Help

  • Don't be afraid to report an issue.
  • You can drop by Gitter, too.

Contributors

View all the contributors.

Although node-feedparser no longer shares any code with node-easyrss, it was the original inspiration and a starting point.

License

(The MIT License)

Copyright (c) 2011-2020 Dan MacTough and contributors

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the 'Software'), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

changelog

2.2.10 / 2020-05-01

  • Changes a direct use of hasOwnProperty builtin to call it from Object.prototype instead
  • Update mri
  • Update readable-stream
  • Update npm audit fixes
  • Remove unused Makefile
  • Update examples to use node-fetch in place of request
  • Update mocha v7
  • Update eslint v6
  • Replace iconv with iconv-lite
  • Update travis config; drop support for unmaintained node versions
  • Merge pull request #271 from jakutis/readme-use-https
  • README: make links use https: instead of http: protocol
  • Update copyright
  • Update README
  • Merge pull request #255 from danmactough/greenkeeper/mocha-5.0.0
  • chore(package): update mocha to version 5.0.0

2.2.9 / 2018-01-27

  • Skip illegally-nested items
  • Add failing test for illegally nested items

2.2.8 / 2018-01-07

  • Fix meta['#ns'] array to avoid duplicates

2.2.7 / 2017-12-11

  • Enhance cli to take feedparser options as cli parameters
  • Improve relative url resolution in RSS feeds
  • Add issue template
  • Add link to Dave Winer's demo to README

2.2.6 / 2017-12-10

  • Prioritize alternate links for item.link

2.2.5 / 2017-12-09

  • Fix reresolve helper to correctly resolve relative URLs in RSS channel image

2.2.4 / 2017-11-08

  • Fix reresolve logic
  • Add failing test - no reresolving first link in feed
  • Add a test assertion for xml base resolution

2.2.3 / 2017-10-25

  • Update npm package to minimize dist size

2.2.2 / 2017-10-12

  • Update devDependencies
  • Update sax v1.2.4
  • Update travis - node 7->8
  • Make sure that all links are parsed, not only text/html
  • docs(readme): add Greenkeeper badge
  • chore(package): update dependencies

2.2.1 / 2017-06-22

  • fix: pin sax to 1.2.3
  • Update mocha to version 3.4.1

2.2.0 / 2017-04-11

  • support for g:image_link attribute

2.1.0 / 2017-01-18

  • Keep optional media:content attributes in the enclosures default property

2.0.0 / 2016-12-26

  • Make bin script useful as command line tool and rename to "feedparser"
  • Add lint script and run lint before tests
  • Update README to clarify the importance of the compressed example
  • Drop support for Node 0.10 and 0.12
  • Fix xml declaration parsing to handle extra whitespace
  • Fix assignment by reference in options parsing
  • Remove unnecessary method
  • Replace bespoke helpers with lodash equivalents where possible
  • Move feedparser to lib
  • Move helpers lib
  • Remove weird comment
  • Update copyright
  • Update addressparser v1.0.1
  • Update dependency readable-stream v2.2.2 and update tests to conform to api change
  • Update dev-dependency (iconv v2.2.1)
  • Update dev-dependency (mocha v3.2.0)
  • Add eslint/editorconfig and linting

1.1.5 / 2016-09-24

  • Handles line breaks in xml declaration.
  • Update README to remove suggestion to use IRC
  • Add Gitter badge
  • Update examples to work with current versions of request module

1.1.4 / 2015-10-24

  • Display nested objects.

1.1.3 / 2015-06-12

  • Prefer atom link elements with type=text/html

1.1.2 / 2015-06-02

  • Be more careful about assigning item.link from atom:link elements

1.1.1 / 2015-05-28

  • Add license attribute

1.1.0 / 2015-05-21

  • Fix channel link selection when there is a mixture of rss and atom. Closes #142

1.0.1 / 2015-04-07

  • Fix category parsing to avoid null in results. Resolves #136

1.0.0 / 2015-02-26

  • Bump mocha devDependency to v2.1.x
  • Cleanup package.json
  • Update copyright year in README
  • Remove node v0.8 support
  • Merge pull request #134 from designfrontier/master
  • added a testing environment for node v0.12
  • removed resanitize as a dependency since the only thing in use was a 4 line function. Moved the function to utils

v0.19.2 / 2014-09-02

  • Change ispermalink value check to be case-insensitive. Closes #123.
  • Whoops. Remove debugging from example

v0.19.1 / 2014-07-31

  • Add compressed example
  • Refactor iconv example

v0.19.0 / 2014-07-30

  • Remove unnecessary code to trigger saxparser error. Apparently, calling the callback with an error will trigger an error anyway. Totally undocumented. So, this was actually calling double error emitting.
  • Manually trigger end when an exception is caught. We can't continue parsing after an exception is thrown. Also update test.
  • Use native try/catch. Other method is not a performance enhancement.
  • Wrap sax write and end methods in try/catch. Resolves #112 sax >= v0.6.0 can throw if a gzipped data stream containing certain characters gets written to the parser. This is a user error (to pipe gzipped data), but sometimes servers send gzipped data even when you've told them not to. So, we try to let the user handle this more gracefully.
  • Add failing test case for sax throwing

v0.18.1 / 2014-06-20

  • Don't assume el is not an array when defining attrs hash. Resolves #113
  • Add failing test for #113

v0.18.0 / 2014-06-18

  • Enforce de-duping on atom enclosures
  • Fix modification by reference defeating indexOf checking
  • Fix inverted index checking
  • Update test and fixture with tougher test case suggested by #111
  • Revert "test for different enclosure type"
  • test for different enclosure type

v0.17.0 / 2014-05-27

  • Improve tests
  • Use readable-stream instead of core stream; update dependencies.
  • Update README
  • Add permalink property for RSS feeds
  • Add nodeico badge
  • Remove unnecessary test server
  • Only colorize dump output if outputting to a terminal.
  • Fix small typo.

v0.16.6 / 2014-02-12

  • Update README to improve example code.
  • Fix error check in handleEnd method.
  • Remove unused dependency.
  • Add to namespaces and prettify.
  • Update iconv example to remove event-stream dependency.
  • Cleanup iconv example
  • Add gitignore
  • Merge branch 'kof-iconv'
  • Refactor iconv example to be more explicit.
  • Create a localhost server for example.
  • Refactor getParams method.
  • Move tips for url fetching to example script
  • Remove gitignore
  • complicated example using iconv and request

v0.16.5 / 2013-12-29

  • Workaround addressparser failing to parse strings ending with a colon. Closes #94.

v0.16.4 / 2013-12-26

  • Fix bad logic setting meta.image properties.
  • Fix TypeError in utils.reresolve failing to check for existence of parameter. Resolves #92.

v0.16.3 / 2013-10-27

  • Merge remote-tracking branch 'PaulMougel/master'
  • Updated readable side highWaterMark to be forward-compatible with node.
  • Improved stream watermark and buffering.
  • Reduced memory consumption.

v0.16.2 / 2013-10-08

  • Bump dependencies
  • Merge pull request #75 from jcrugzz/request-depend
  • [fix] remove unneeded dependency request
  • Update README.md
  • Update example code

v0.16.1 / 2013-06-13

  • Update travis config
  • Only emit meta once. title is a required channel element, so a feed without it is broken, but emitting more than once is still a no-no. Closes #69
  • Bump version: v0.16.0
  • Update README
  • Remove legacy libxml-like helpers
  • Update dump script
  • Update examples
  • Update tests
  • Emit SAXErrors and allow consumer to handle or bail on SAXErrors
  • Update copyright notices
  • Merge branch 'AndreasMadsen-transform-stream'
  • Change stream test to not require additional dependency
  • make feedparser a transform stream

v0.16.0 / 2013-06-11

  • Update README
  • Remove legacy libxml-like helpers
  • Update dump script
  • Update examples
  • Update tests
  • Emit SAXErrors and allow consumer to handle or bail on SAXErrors
  • Update copyright notices
  • Merge branch 'AndreasMadsen-transform-stream'
  • Change stream test to not require additional dependency
  • make feedparser a transform stream

v0.15.8 / 2013-10-08

  • Fix package.json

v0.15.7 / 2013-09-26

  • Bump dependencies

v0.15.6 / 2013-09-24

  • Bump dependencies
  • Update travis config

v0.15.5 / 2013-06-13

  • Only emit meta once. title is a required channel element, so a feed without it is broken, but emitting more than once is still a no-no. Closes #69
  • Update copyright notices

v0.15.4 / 2013-06-04

  • Fix processing instruction handler to avoid interpreting extraneous whitespace as attribute names.
  • Use item source for xmlurl, if absent. Closes #63
  • Add more xml:base fallbacks. Resolves #64
  • Merge branch 'unexpected-arrays'
  • Fix date parsing. Don't trust that the dates are not arrays.
  • Make tests run on v0.10. Closes #61.

v0.15.3 / 2013-05-05

  • Update README to point to contributors graph
  • Merge pull request #59 from AndreasMadsen/rss-category
  • do not separate rss categories by comma

v0.15.2 / 2013-04-16

  • Be more forgiving of poorly-formatted feeds. Closes #58

v0.15.1 / 2013-04-15

  • Fix for no Content-Type header

v0.15.0 / 2013-04-11

  • Tweak #content-type; add #xml to meta
  • Tweak stream api test
  • Fix missing scope
  • Linting
  • Fix typo in README code example
  • Update README to add link to Issues page and IRC

v0.14.0 / 2013-03-25

  • Update examples
  • Update README
  • Remove nextEmit. Only use nextTick on parseString (other methods don't need it).
  • Remove _setCallback and set the callback directly. Don't use nextTick.
  • Add basic test for writable stream input api
  • Add basic tests for callback and event apis
  • Implement naive v0.8-style Stream API
  • Fix README (incorrect stream pipe examples)
  • Merge pull request #52 from supahgreg/master
  • Correcting a typo in README.md

v0.13.4 / 2013-03-15

  • Fix unsafe usage of 'in' when variable may be not an object. Closes #51.

v0.13.3 / 2013-03-14

  • Fix reresolve function to not assume that node property is a string. Closes #50.

v0.13.2 / 2013-02-21

  • Fix issue where namespaced elements with the same local part as a root element were being treated as having the same name, e.g., atom:link in an rss feed being part of the 'link' element.
  • Remove stray console.log from test

v0.13.1 / 2013-02-21

  • Deal with the astonishing fact that someone thinks a feed with 4 different cloud/pubsubhubbub elements is helpful. Resolves #49.

v0.13.0 / 2013-02-18

  • Remove old API. Update docs, examples and tests.
  • Fix .parseUrl url parameter processing. Throw early if no valid url is given. Also pass all options to request. Add tests. Closes #44 and #46.
  • Add url to error when possible. Change "Not a feed" error message because it's not always a remote server. Update tests. Closes #43.
  • Raise default sax.MAX_BUFFER_LENGTH to 16M and allow it to be set in options. Closes #38.
  • Strip HTML from meta.title, meta.description and item.title

v0.12.0 / 2013-02-12

  • Expose rssCloud/pubsubhubbub on meta.cloud property. Resolves #47.
  • Expose "has" util

v0.11.0 / 2013-02-03

  • Dedupe enclosures. Resolves #45.
  • Change test to be more lenient about which error code is returned as it seems to differ for no known reason
  • Drop support for node pre-v0.8.x
  • Refactor tests to not fetch remote URLs
  • Tell TravisCI to only run tests on master
  • Enable silencing the deprecation warnings

v0.10.13 / 2013-01-08

  • Bump sax version

v0.10.12 / 2012-12-31

  • Expose HTTP response on FeedParser instance

v0.10.11 / 2012-12-28

  • Update tests
  • Change HTTP Content-Type head checking to allow parsing valid feeds with incorrect Content-Type header. Add value of Content-Type header to meta.

v0.10.10 / 2012-12-28

  • Add example and test for passing request headers to .parseUrl()
  • Enable FeedParser.parseUrl to accept a Request object with headers
  • Update utils.merge() to be safer about relying on Object properties
  • Skip failing test that's not failing. Maybe the remote server changed something.
  • Cleanup 5f642af. Don't overwrite media:thumbnail array.
  • Increase test timeout. Fix incorrect test usage of deepEqual instead of strictEqual.
  • Merge pull request #41 from rborn/master
  • fix for multiple media:thumbnail
  • Add test for fetching uncompressed feed.

v0.10.9 / 2012-12-03

  • Add "Accept-Encoding: identity" header on HTTP requests to only fetch uncompressed data. Resolves issue #36.
  • Merge pull request #37 from jchris/patch-1
  • make example work with new api

v0.10.8 / 2012-11-06

  • Ensure we only emit end once. Bump version.
  • Change FeedParser.parseStream so it doesn't try to attach to a stream that is not defined. A user could pass in a stream thinking it's valid, but the stream has been destroyed. Try not to throw.
  • Change FeedParser#handleError to not remove 'error' listeners on this.stream

v0.10.7 / 2012-11-01

  • Fix issue #34 .parseString() emitting too soon. All emit() and callback() are wrapped in process.nextTick(). Bump version.

v0.10.6 / 2012-10-27

  • Fix issue #33 uncaught exception trying to get the text string for an HTTP status code.

v0.10.5 / 2012-10-26

  • Bump version. Update README with additional dependency. Add History.md.
  • Fix issue #32 - parse RSS item:author. Enhance RSS authorish elements with parsed properties via addressparser.

v0.10.4 / 2012-10-25

  • Bump version
  • Fix major bug in parseString, parseFile, and parseStream -- failed to return the event emitter.
  • Refactor dump script to use new API
  • Fix dump script for API change

v0.10.3 / 2012-10-24

  • Bump version
  • Update documentation
  • Rename 'notModified' event to '304'
  • Add deprecation warnings to prototype methods. Reorganize .parseUrl and handleResponse.
  • Update tests for new static methods
  • Fix initialization of saxstream. Rename parser to feedparser. Add doc to parseString static.
  • Refactor options and init parsing. Refine error handling. Fix bug in handleSaxError. Add static methods for parseString, parseFile and parseStream.
  • Initial refactor of error handling
  • Reorganize some code
  • Rename FeedParser#_reset to FeedParser#init
  • Change module.exports to use an instance of FeedParser. Add non-prototype-based parseUrl.
  • :gem: Travis CI image/link in readme :gem:
  • :gem: Added travis.yml file :gem:

v0.10.2 / 2012-10-17

  • Add static callback methods
  • Move reresolve to utils
  • Update inline documentation of public api
  • Bump version
  • Refactor (part 2) to eliminate scope-passing (just moves things around in the class)
  • Refactor (part 1) to eliminate scope-passing

v0.10.1 / 2012-10-05

  • Bump version. Fix issue #25; add test. Add ability to pass "strict" boolean option to Sax.
  • Fix failing test

v0.10.0-beta / 2012-09-13

  • Mark package as beta version
  • Add more namespaces and sort sort-of alphabetically
  • Add brief description and usage info
  • Bump version
  • Add more namespaces
  • Handle namespaced elements that use nondefault namespace prefixes
  • Add more namespace-awareness tests
  • Add test for issue #23 (non-default namespaces)
  • Refactor to handle use of nondefault namespaces
  • Add Makefile to run tests
  • Add nsprefix function for getting the "default" prefix for a given namespace uri.
  • Add 'xml' to default namespaces lookup table.
  • Add nslookup function for checking whether a uri matches the default for a namespace.
  • Add default namespaces lookup table.
  • Add script to dump parsed feeds to console. Useful for debugging.