Package details

@javivelasco/isbot

omrilotan · 14 · Unlicense · 3.3.3

🤖 detect bots/crawlers/spiders via the user agent.

bot, crawlers, spiders, googlebot

readme

isbot 🤖/👨‍🦰

Detect bots/crawlers/spiders using the user agent string.

Usage

import isbot from 'isbot'

// Node.js HTTP (incoming request headers are exposed with lower-cased names)
isbot(request.headers['user-agent'])

// ExpressJS
isbot(req.get('user-agent'))

// Browser
isbot(navigator.userAgent)

// User Agent string
isbot('Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)') // true
isbot('Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36') // false

Additional functionality

Extend: Add user agent patterns

Add rules to the user agent match RegExp by passing an array of strings.

isbot('Mozilla/5.0') // false
isbot.extend([
    'istat',
    '^mozilla/\\d\\.\\d$'
])
isbot('Mozilla/5.0') // true
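
To illustrate the idea behind extending a pattern list, here is a minimal sketch that joins string patterns into one case-insensitive RegExp and rebuilds it when patterns are added. The `createMatcher` helper and its starting list are hypothetical and are not taken from the isbot source.

```javascript
// Hypothetical sketch, not isbot's actual implementation.
// Joins string patterns into a single case-insensitive RegExp.
function createMatcher(patterns) {
  const list = [...patterns];
  let regex = new RegExp(list.join('|'), 'i');

  // Non-string input is cast to a string before testing.
  const matcher = (userAgent) => regex.test(String(userAgent));

  // Add new patterns (skipping duplicates) and rebuild the RegExp.
  matcher.extend = (additions) => {
    for (const pattern of additions) {
      if (!list.includes(pattern)) list.push(pattern);
    }
    regex = new RegExp(list.join('|'), 'i');
  };

  return matcher;
}

const detect = createMatcher(['bot', 'crawler', 'spider']);
detect('Mozilla/5.0') // false
detect.extend(['istat', '^mozilla/\\d\\.\\d$'])
detect('Mozilla/5.0') // true
```

Because each entry becomes one alternative of the combined expression, anchors such as `^` and `$` apply only to the pattern that contains them.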

Exclude: Remove matches of known crawlers

Remove rules from the user agent match RegExp (see the existing rules in the src/list.json file).

isbot('Chrome-Lighthouse') // true
isbot.exclude(['chrome-lighthouse']) // pattern is case insensitive
isbot('Chrome-Lighthouse') // false

Find: Verbose result

Return the part of the user agent string that matched a bot rule.

isbot.find('Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0 DejaClick/2.9.7.2') // 'DejaClick'
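
A verbose lookup like this can be sketched with `String.prototype.match`, which returns the matched text exactly as it appears in the input. The `findMatch` helper, its two-entry pattern list, and the `null` result for non-matches are assumptions made for illustration only.

```javascript
// Hypothetical sketch, not isbot's actual implementation.
// Returns the matched substring (original casing preserved), or null.
function findMatch(patterns, userAgent) {
  const regex = new RegExp(patterns.join('|'), 'i');
  const match = String(userAgent).match(regex);
  return match ? match[0] : null;
}

const sampleAgent = 'Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0 DejaClick/2.9.7.2';
findMatch(['dejaclick', 'bot'], sampleAgent) // 'DejaClick'
```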

Spawn: Create new instances

Create new instances of isbot. Each instance is spawned using the spawner's list as its base.

const one = isbot.spawn()
const two = isbot.spawn()

two.exclude(['chrome-lighthouse'])
one('Chrome-Lighthouse') // true
two('Chrome-Lighthouse') // false

Create an isbot instance using a custom list (instead of the maintained list)

const lean = isbot.spawn([ 'bot' ])
lean('Googlebot') // true
lean('Chrome-Lighthouse') // false
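
The isolation between spawned instances can be pictured as each instance holding a private copy of its pattern list. The sketch below assumes that model; `spawnDetector` and the two-entry base list are hypothetical names and data, not part of isbot.

```javascript
// Hypothetical sketch, not isbot's actual implementation.
// Each detector keeps a private copy of its pattern list.
function spawnDetector(base) {
  const list = [...base];

  const detector = (userAgent) =>
    list.length > 0 && new RegExp(list.join('|'), 'i').test(String(userAgent));

  // Remove patterns, comparing case-insensitively.
  detector.exclude = (removals) => {
    const drop = removals.map((pattern) => pattern.toLowerCase());
    for (let i = list.length - 1; i >= 0; i--) {
      if (drop.includes(list[i].toLowerCase())) list.splice(i, 1);
    }
  };

  // A child starts from a custom list or from a copy of the parent's list.
  detector.spawn = (custom) => spawnDetector(custom || list);

  return detector;
}

const root = spawnDetector(['bot', 'chrome-lighthouse']);
const first = root.spawn();
const second = root.spawn();

second.exclude(['Chrome-Lighthouse']);
first('Chrome-Lighthouse') // true
second('Chrome-Lighthouse') // false
```

Copying the list at spawn time is what keeps `second.exclude` from affecting `first` or `root`.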

Definitions

  • Bot. An autonomous program that imitates or replaces some aspect of human behaviour, performing repetitive tasks much faster than human users could.
  • Good bot. An automated program that visits websites in order to collect useful information. Web crawlers, site scrapers, stress testers, preview builders and similar programs are welcome on most websites because they serve a mutually beneficial purpose.
  • Bad bot. A program designed to perform malicious actions that ultimately hurt businesses, such as testing credential databases, DDoS attacks, and spam.

Clarifications

What does "isbot" do?

This package aims to identify "Good bots": those that voluntarily identify themselves with a unique, preferably descriptive, user agent, usually by setting a dedicated request header.

What doesn't "isbot" do?

It does not try to recognise malicious bots or programs disguising themselves as real users.

Why would I want to identify good bots?

Recognising good bots such as web crawlers is useful for multiple purposes. Although it is not recommended to serve different content to web crawlers like Googlebot, you can still elect to:

  • Flag pageviews for consideration in business analysis.
  • Prefer to serve cached content and relieve service load.
  • Omit third party solutions' code (tags, pixels) and reduce costs.
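
The three options above could be wired into request handling roughly as follows. This is a sketch only: `planResponse` and `naiveIsBot` are hypothetical names, and the naive predicate stands in for a real detector such as isbot.

```javascript
// Hypothetical sketch: derive handling decisions from a bot check.
// `isBot` is any predicate taking a user agent string (e.g. isbot).
function planResponse(userAgent, isBot) {
  const bot = isBot(userAgent);
  return {
    flaggedAsBot: bot,           // flag the pageview for business analysis
    preferCache: bot,            // serve cached content to relieve service load
    includeThirdPartyTags: !bot  // omit tags and pixels for bots
  };
}

// Stand-in predicate for illustration; far less thorough than a maintained list.
const naiveIsBot = (userAgent) => /bot|crawler|spider/i.test(String(userAgent));

planResponse('Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)', naiveIsBot)
// { flaggedAsBot: true, preferCache: true, includeThirdPartyTags: false }
```

Note that the page content itself is the same for bots and people; only caching, analytics flags, and third party extras differ.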

    It is not recommended to whitelist requests for any reason based on the user agent header alone. Instead, add other methods of identification, such as a reverse DNS lookup.
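
A forward-confirmed reverse DNS check is one way to do this: resolve the client IP to a hostname, check that the hostname belongs to a domain you trust, then resolve that hostname back and confirm it yields the same IP. The sketch below uses Node's `dns.promises.reverse` and `dns.promises.resolve4` (so IPv4 only). The `verifyCrawlerIp` name, the `resolver` parameter (added so the check can run against a stub), and the sample hostname and IP are assumptions for illustration; this is not part of isbot.

```javascript
// Hypothetical sketch of forward-confirmed reverse DNS (IPv4 only).
// `resolver` is optional and exists so a stub can replace real DNS.
async function verifyCrawlerIp(ip, allowedSuffixes, resolver) {
  const dns = resolver || (await import('node:dns')).promises;

  let hostnames;
  try {
    hostnames = await dns.reverse(ip); // IP -> hostnames
  } catch {
    return false; // no PTR record or lookup failure
  }

  for (const hostname of hostnames) {
    const trusted = allowedSuffixes.some(
      (suffix) => hostname === suffix || hostname.endsWith('.' + suffix)
    );
    if (!trusted) continue;

    try {
      const addresses = await dns.resolve4(hostname); // hostname -> IPs
      if (addresses.includes(ip)) return true;
    } catch {
      // forward lookup failed; try the next hostname
    }
  }
  return false;
}

// Offline usage with a stub resolver (sample data, not real lookups):
const stubResolver = {
  reverse: async () => ['crawl-66-249-66-1.googlebot.com'],
  resolve4: async () => ['66.249.66.1']
};
verifyCrawlerIp('66.249.66.1', ['googlebot.com'], stubResolver)
  .then((verified) => console.log(verified)); // true
```

Matching on `'.' + suffix` rather than a bare `endsWith(suffix)` keeps look-alike hosts such as `fakegooglebot.com` from passing.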

Data sources

We use external data sources on top of our own lists to keep up to date

Crawlers user agents:

Non bot user agents:

Missing something? Please open an issue

Breaking changes in major releases (full changelog)

Version 3

Remove testing for node 6 and 8

Version 2

Change return value for isbot: true instead of matched string

Version 1

No functional change

Real world data

Execution times in milliseconds

changelog

Changelog

3.3.3

  • Add generic patterns (name/version), reducing pattern list size by >20%
  • Internal formatting

3.3.2

  • Remove const keyword from build (Fix)

3.3.1

  • Fix in type definition

3.3.0

  • Add "spawn" interface

3.2.4

  • Add some RSS readers detection

3.2.3

  • Refine amiga user agent detection

3.2.2

  • One more duckduckgo pattern

3.2.1

  • Add bitdiscovery, Invision bot, ddg_android (duckduckgo), Braze, gobuster

3.2.0

New features

  • TypeScript definition (isbot) supports any: a non-string argument is cast to a string before execution

3.1.0

New features

  • Native support for ESM and CommonJS
  • Start maintaining a security policy

List updates

  • Remove WAPCHOI from bot list
  • Recognise Google/google user agent for Android webview

3.0.27

  • Add a few known crawlers

3.0.26

  • Recognise open source projects whose user agent points to github.com

3.0.25

  • Address webview "Channel/googleplay", "GoogleApp/"
  • Add 4 more bot patterns
  • Stop treating Splash browser as bot

3.0.24

  • Add Prometheus new user agent (prometheus)
  • Add RestSharp .NET HTTP client
  • Add M2E Pro Cron Service
  • Add Deluge
  • Deprecate asafaweb.com (EOL)

3.0.23

  • Recognise Mozilla MozacFetch as natural non bot browser

3.0.22

  • Add generic term: "manager"

3.0.21

  • Reduce pattern complexity

3.0.20

  • Add Anonymous and bit.ly

3.0.19

  • Fix: It's not needed to download fixtures at postinstall

3.0.18

3.0.17

  • Add Neustar WPM
  • Internal change accommodates TypeScript compiler

3.0.16

  • Add pagespeed (Serf)
  • Add SmallProxy
  • Add CaptiveNetworkSupport

3.0.15

  • Recognise a bunch more bots
  • Optimise some of the list so we still have the same length

3.0.14

  • Add Gozilla
  • Add PerimeterX Integration Services

3.0.13

  • Add Kubernetes probe bot (ping and health-check) @simonecorsi

3.0.12

3.0.11

  • Add 5538 known crawler user agent strings from myip.ms
  • Reduce complexity by 79 by introducing "https?:" pattern

3.0.10

3.0.9

  • Add Shared Web Credentials tool
  • Add Java runtime request
  • Add 2GDPR
  • Add GetRight
  • Add Pompos

3.0.8

  • Add SignalR client
  • Add FirePHP
  • Reduce complexity for UAs containing "amiga" (by 3)
  • Reduce complexity for UAs containing "download" (by 2)

3.0.7

  • Reduce pattern complexity by 14

3.0.6

  • Respond to crawler user agents added to user-agents.net/bots
  • ApplicationHealthService: Ping Service

3.0.5

3.0.4

  • Hexometer
  • Respond to crawler user agents added to user-agents.net/bots
  • Add an "ignoreList" to exclude user agents from user-agents.net

3.0.3

Add bots

  • Respond to crawler user agents added to user-agents.net/bots

3.0.2

Optimise pattern list

Combine all Google products: Google browsers' user agents do not contain the word "Google".

Add bots

  • M4A1-WAPCHOI/2.0 (Java; U; MIDP-2.0; vi; NokiaC5-00.2) WAPCHOI/1.0.0 UCPro/9.4.1.377 U2/1.0.0 Mobile UNTRUSTED/1.0 3gpp-gba
  • Mozilla/5.0 (compatible; Domains Project/1.0.3; +https://github.com/tb0hdan/domains)

Overall reduces list by 25 rules (from 345 rules to 320)

3.0.1

Crawlers list update

Add patterns for:

  • Google WebLight Proxy
  • HighWinds Content Delivery System
  • Hydra by addthis
  • RebelMouse
  • Scanners: Jorgee Vulnerability, ClamAV Website, Burp Collaborator
  • Monitoring services: Xymon, AlertSite, Hobbit, updown.io, Monit, Dotcom

Testing

  • Add some legit browser user-agent strings
  • Fix periodic tests environment
  • Add a tester page to check user agents easily

3.0.0: Maintainability and performance through automation

The API and code have not changed

Breaking changes

  • Remove testing on node 6 and 8
  • Some crawlers list updates can potentially change identification

Non breaking changes

  • Improve efficiency of rule by optimising some parts and removing others

Testing

  • Automatically download crawlers lists for verification
  • Add tests to improve efficiency