Package details

isbot

omrilotan · 4.7m · Unlicense · 5.1.27

🤖/👨‍🦰 Recognise bots/crawlers/spiders using the user agent string.

bot, crawlers, spiders, googlebot

README

isbot 🤖/👨‍🦰

Identify bots, crawlers, and spiders using the user agent string.

Usage

Install

npm i isbot

Straightforward usage

import { isbot } from "isbot";

// Request
isbot(request.headers.get("User-Agent"));

// Nodejs HTTP
isbot(request.headers["user-agent"]);

// ExpressJS
isbot(req.get("user-agent"));

// Browser
isbot(navigator.userAgent);

// User Agent string
isbot(
  "Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
); // true

isbot(
  "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36",
); // false

Using the jsDelivr CDN, you can import isbot directly in the browser

See specific versions and instructions: https://www.jsdelivr.com/package/npm/isbot

ESM

<script type="module">
  import { isbot } from "https://cdn.jsdelivr.net/npm/isbot@5/+esm";
  isbot(navigator.userAgent);
</script>

UMD

<script src="https://cdn.jsdelivr.net/npm/isbot@5"></script>
<script>
  // isbot is now global
  isbot(navigator.userAgent);
</script>

All named imports

import               Type                            Description
isbot                (string?): boolean              Check if the user agent is a bot
isbotNaive           (string?): boolean              Check if the user agent is a bot using a naive pattern (less accurate)
getPattern           (): RegExp                      The regular expression used to identify bots
list                 string[]                        List of all individual pattern parts
isbotMatch           (string?): string | null        The substring matched by the regular expression
isbotMatches         (string?): string[]             All substrings matched by the regular expression
isbotPattern         (string?): string | null        The regular expression used to identify bot substring in the user agent
isbotPatterns        (string?): string[]             All regular expressions used to identify bot substrings in the user agent
createIsbot          (RegExp): (string?): boolean    Create a custom isbot function
createIsbotFromList  (string[]): (string?): boolean  Create a custom isbot function from a list of string representation patterns
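
To make the difference between the "matches" and "patterns" helpers concrete, here is a minimal sketch of how such helpers could be derived from a list of pattern parts. It is an illustration of the idea, not the package's code: `sampleList`, `matchesFromList` and `patternsFromList` are hypothetical names, and the sample entries are not the real patterns.

```typescript
// Hypothetical stand-in for the package's `list` export.
// These entries are illustrative and are not the real patterns.
const sampleList: string[] = ["googlebot", "bot\\b", "crawl", "chrome-lighthouse"];

// All substrings of the user agent matched by any pattern part.
function matchesFromList(parts: string[], userAgent: string): string[] {
  const found: string[] = [];
  for (const part of parts) {
    const match = userAgent.match(new RegExp(part, "i"));
    if (match) found.push(match[0]);
  }
  return found;
}

// All pattern parts (as strings) that match the user agent.
function patternsFromList(parts: string[], userAgent: string): string[] {
  return parts.filter((part) => new RegExp(part, "i").test(userAgent));
}

const sampleAgent =
  "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";

// Substrings exactly as they appear in the user agent.
const matchedSubstrings = matchesFromList(sampleList, sampleAgent);
// The pattern strings themselves, as they appear in the list.
const matchedPatterns = patternsFromList(sampleList, sampleAgent);
```

In this sketch the first helper returns text taken from the user agent, while the second returns entries of the list, which is the form needed when filtering the list itself.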

Example usages of helper functions

Create a custom isbot that does not consider Chrome Lighthouse user agents to be bots.

import { createIsbotFromList, isbotMatches, list } from "isbot";

const ChromeLighthouseUserAgentStrings: string[] = [
  "mozilla/5.0 (macintosh; intel mac os x 10_15_7) applewebkit/537.36 (khtml, like gecko) chrome/94.0.4590.2 safari/537.36 chrome-lighthouse",
  "mozilla/5.0 (linux; android 7.0; moto g (4)) applewebkit/537.36 (khtml, like gecko) chrome/94.0.4590.2 mobile safari/537.36 chrome-lighthouse",
];
const patternsToRemove = new Set<string>(
  ChromeLighthouseUserAgentStrings.map(isbotMatches).flat(),
);
const isbot: (ua: string) => boolean = createIsbotFromList(
  list.filter(
    (record: string): boolean => patternsToRemove.has(record) === false,
  ),
);

Create a custom isbot that also treats an additional pattern, one not originally included in the package, as a bot.

import { createIsbotFromList, list } from "isbot";

const isbot = createIsbotFromList(list.concat("shmulik"));
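
The two customisations above both come down to editing a list of pattern strings and building a new detector from it. The following sketch shows that idea in isolation, under the assumption that the parts are joined with "|" into one case-insensitive regular expression. `detectorFromList`, `Detector` and `baseList` are hypothetical names and the list entries are illustrative.

```typescript
// A detector accepts a missing header value (null or undefined) as well as a string.
type Detector = (userAgent?: string | null) => boolean;

// Hypothetical helper: join pattern parts into one case-insensitive RegExp.
// Note: an empty list would produce an empty pattern, which matches everything.
function detectorFromList(parts: string[]): Detector {
  const joined = new RegExp(parts.join("|"), "i");
  return (userAgent) =>
    typeof userAgent === "string" && userAgent !== "" && joined.test(userAgent);
}

// Illustrative entries only, not the package's real list.
const baseList: string[] = ["bot\\b", "crawl", "spider", "chrome-lighthouse"];

// Remove a pattern so that matching agents are no longer treated as bots.
const withoutLighthouse = detectorFromList(
  baseList.filter((part) => part !== "chrome-lighthouse"),
);

// Add a pattern that the base list does not contain.
const withShmulik = detectorFromList(baseList.concat("shmulik"));
```

Because the detector is rebuilt from the edited list, the original list stays untouched and several differently tuned detectors can coexist.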

Definitions

  • Bot. An autonomous program that imitates or replaces some aspect of human behaviour, performing repetitive tasks much faster than human users could.
  • Good bot. An automated program that visits websites in order to collect useful information. Web crawlers, site scrapers, stress testers, preview builders and similar programs are welcome on most websites because they serve a mutually beneficial purpose.
  • Bad bot. A program designed to perform malicious actions, ultimately hurting businesses. Examples: testing credential databases, DDoS attacks, spam bots.

Clarifications

What does "isbot" do?

This package aims to identify "Good bots": those that voluntarily identify themselves with a unique, preferably descriptive, user agent, usually by setting a dedicated request header.

What doesn't "isbot" do?

It does not try to recognise malicious bots or programs disguising themselves as real users.

Why would I want to identify good bots?

Recognising good bots such as web crawlers is useful for multiple purposes. Although it is not recommended to serve different content to web crawlers like Googlebot, you can still elect to:

  • Flag pageviews to consider with business analysis.
  • Prefer to serve cached content and relieve service load.
  • Omit third party solutions' code (tags, pixels) and reduce costs.

It is not recommended to whitelist requests, for any reason, based on the user agent header alone. Instead, add other methods of identification, such as a reverse DNS lookup.
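
The options listed above could be wired up roughly as follows. This is a sketch under stated assumptions: `planResponse`, `ResponsePlan` and `Detector` are hypothetical names, and the detector is passed in as a parameter so the caller can supply the package's `isbot` export or a custom function.

```typescript
// Any function with the same shape as isbot: (string?) => boolean.
type Detector = (userAgent?: string | null) => boolean;

// Hypothetical description of how a request should be handled.
interface ResponsePlan {
  flaggedAsBot: boolean; // mark the pageview for business analysis
  preferCache: boolean; // serve cached content to relieve service load
  includeThirdPartyTags: boolean; // omit tags and pixels for bots
}

function planResponse(
  userAgent: string | null | undefined,
  detect: Detector,
): ResponsePlan {
  const bot = detect(userAgent);
  return {
    flaggedAsBot: bot,
    preferCache: bot,
    includeThirdPartyTags: !bot,
  };
}

// Stand-in detector for demonstration only; an application would pass isbot.
const demoDetector: Detector = (userAgent) =>
  typeof userAgent === "string" && /bot\b/i.test(userAgent);

const crawlerPlan = planResponse("Googlebot/2.1", demoDetector);
const visitorPlan = planResponse("Mozilla/5.0 (Windows NT 6.1)", demoDetector);
```

None of these decisions grants access or trust; they only adjust analytics, caching and third-party code, which keeps the user agent check in the advisory role described above.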

How isbot maintains accuracy

isbot is only useful if it identifies bots accurately from the user agent string. It builds its regular expression from expansive, regularly updated lists of user agent strings, aiming to match bots and only bots.

Above everything else, it is maintained by a community of contributors who help keep the list up to date.

Fallback

The pattern uses lookbehind assertions, which are not supported in all environments. A fallback pattern is provided for environments that do not support lookbehind; it is less accurate. The test suite defines the accuracy deemed acceptable for the fallback: 1% false positives and 75% bot coverage.
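
One common way to select such a fallback is to attempt to compile the full pattern and catch the syntax error that engines without lookbehind support throw. The sketch below illustrates that technique only; it is not taken from the package. `compileWithFallback`, `FULL_PATTERN_SOURCE` and `NAIVE_PATTERN` are hypothetical names, and the patterns are tiny illustrative stand-ins.

```typescript
// Illustrative patterns only; they are not the package's patterns.
// The full pattern uses a negative lookbehind so that "Cubot" is not a bot.
const FULL_PATTERN_SOURCE = "(?<!cu)bot|crawl|spider";
// The naive pattern avoids lookbehind and is therefore less precise.
const NAIVE_PATTERN = /bot|crawl|spider/i;

// Compile the preferred pattern; return the fallback if the engine rejects it.
function compileWithFallback(source: string, fallback: RegExp): RegExp {
  try {
    return new RegExp(source, "i");
  } catch {
    // Unsupported syntax surfaces as an error at construction time.
    return fallback;
  }
}

const activePattern = compileWithFallback(FULL_PATTERN_SOURCE, NAIVE_PATTERN);

// Detector built on whichever pattern compiled successfully.
const looksLikeBot = (userAgent?: string | null): boolean =>
  typeof userAgent === "string" && activePattern.test(userAgent);
```

The pattern is compiled once and reused, so the cost of the feature check is paid a single time rather than on every call.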

Data sources

We use external data sources on top of our own lists to keep up to date

Crawlers user agents

Non bot user agents

Missing something? Please open an issue

Major releases breaking changes (full changelog)

Version 5

Remove named export "pattern" from the interface, instead use "getPattern" method

Version 4

Remove isbot function default export in favour of a named export.

import { isbot } from "isbot";

Version 3

Remove testing for node 6 and 8

Version 2

Change return value for isbot: true instead of matched string

Version 1

No functional change

Changelog

5.1.27

  • [Pattern] Pattern update

5.1.26

  • [Pattern] Pattern update

5.1.25

  • [Pattern] Pattern update: Reduce complexity

5.1.24

  • [Pattern] Pattern update: Add generic pattern, remove some specific patterns

5.1.23

  • [Pattern] Pattern updates

5.1.22

  • [Pattern] Pattern updates

5.1.21

  • [Pattern] Pattern updates

5.1.20

  • [Pattern] Pattern updates

5.1.19

  • [Pattern] Pattern updates

5.1.18

  • [Pattern] Pattern updates

5.1.17

  • [Pattern] Pattern updates for better recognition

5.1.16

  • [Pattern] Treat CCleaner browser as an actual browser, not a bot

5.1.15

  • [Pattern] Pattern updates for better recognition

5.1.14

  • [Pattern] More accurate patterns for some substrings

5.1.13

5.1.12

5.1.11

  • [Pattern] Pattern updates

5.1.10

  • [Pattern] Pattern updates

5.1.9

  • [Pattern] A more careful match for RSS substring

5.1.8

  • [Pattern] Recognise a timestamp in the user agent string, which is used to generate a unique string for each request

5.1.7

  • [Pattern] Ignore NewsSapphire in-app browser (news app)
  • [Pattern] Ignore locales with calendar in user agent

5.1.6

  • [FIX] Browser files (jsDelivr): UMD is global and ESM is named

5.1.5

  • Add substring "watch" to pattern

5.1.4

  • Recognise search providers' in-app browsers
  • Ignore Crosswalk project: An old project that is no longer maintained and has insignificant usage
  • PDRL Analyzer

5.1.3

  • Recognise browsers: Ecosia ios in-app browser, Phantom in-app browser

5.1.2

  • Add bots: Cypress, Detectify, InternetMeasurement, BuiltWith
  • Recognise browser: Zip Recruiter job search app, Ecosia android in-app browser

5.1.1

  • Reduce pattern size by introducing the substring ".com" and improve generic pattern

5.1.0

  • Build is now compatible with an older JavaScript version: es2016

5.0.0

  • Remove named export "pattern" from the interface, instead use "getPattern" method
  • Add a couple of bot patterns

4.4.0

  • Add a naive fallback pattern for engines that do not support lookbehind in regular expressions
  • Add isbotNaive function to identify bots using a naive approach (simpler and faster)

4.3.0

  • Accept undefined in place of user agent string to allow headers property to be used "as is" (request.headers["user-agent"])

4.2.0

  • Accept null in place of user agent string to allow header value to be used "as is" (request.headers.get("user-agent"))

4.1.1

  • Recognise browsers with GMS Core (Google's Play Services) as natural non-bot browsers
  • A slightly neater TypeScript declaration file
  • Adjust "bot" pattern to recognise bot as a standalone word or word suffix (excluding "Cubot")
  • Recognise "rest-client" as a bot

4.1.0

  • Add createIsbotFromList: Create a custom isbot function from a list of string representation patterns
  • Recognise browsers with HMS Core (Huawei Mobile Services) as natural non-bot browsers

4.0.1

  • Pattern optimisation (performance improvement)

4.0.0

Breaking changes

This change is meant to reduce the size of the package and improve performance by building the regular expression in build time instead of runtime.

  • Change interface

    • Remove default import. Use named import instead: import { isbot } from "isbot";
    • Drop isbot attached functions from the interface. isbot.<SOMETHING> is no longer supported
  • Drop support for EOL node versions

New features

import { <SOMETHING> }  from "isbot";
import         Type                                               Description
pattern        {RegExp}                                           The regular expression used to identify bots
list           {string[]}                                         List of all individual pattern parts
isbotMatch     {(userAgent: string): string | null}               The substring matched by the regular expression
isbotMatches   {(userAgent: string): string[]}                    All substrings matched by the regular expression
isbotPattern   {(userAgent: string): string | null}               The regular expression used to identify bot substring in the user agent
isbotPatterns  {(userAgent: string): string[]}                    All regular expressions used to identify bot substrings in the user agent
createIsbot    {(pattern: RegExp): (userAgent: string): boolean}  Create a custom isbot function

3.8.0

  • Add "isbot.isbot" property and "isbot" named export to allow easier migration to version 4

3.7.1

  • Replace "ghost" with "inspect" to avoid false positives

3.7.0

  • Expose iife and support jsDelivr CDN

3.6.13

3.6.12

  • mem: Make a group non capturing

3.6.11

3.6.10

  • Adjust the "client" substring pattern

3.6.9

  • Adjust GOGGalaxy pattern
  • Update built files

3.6.8

3.6.7

  • Add PhantomJS substring

3.6.6

  • Add CryptoAPI to known bots list
  • Add Pageburst

3.6.5

  • Improvement: List reduced by >50 patterns for a better one-word pattern

3.6.4

3.6.3

  • Adjust single word pattern: Add brackets

3.6.2

  • Recognise Uptime-Kuma/1.18.0
  • Reintroduce Yandex Search app exclusion

3.6.1

  • Edit list and exception patterns (more bots, simpler pattern)

3.6.0

  • Expose a copy of the regular expression pattern via isbot.pattern getter

3.5.4

  • Add strings starting with the word "nginx"

3.5.3

  • Fix for "Google Pixel" combination
  • Add strings starting with "custom"

3.5.2

  • Build supports more interpolation (transform class etc.)

3.5.1

  • Add SERP (Search Engine Results Pages) Reputation Management tools

3.5.0

  • Specify browser and node entries for require and import (resolves issue with jest 28)

3.4.8

  • Replace single space pattern with literal white space, which is more efficient
  • Add a more generic identifier to simplified user agent names

3.4.7

  • Add Zoom Webhook

3.4.6

  • Add nodejs native agent (undici)
  • Add random long string

3.4.5

  • Add CF-UC web crawler
  • Add TagInspector
  • Add Request-Promise

3.4.4

3.4.3

  • Add Postman

3.4.2

  • Add generic term: "proxy"
  • Optimise "email" rule
  • Add Rexx

3.4.1

  • Add recognised bots user agent patterns

3.4.0

  • Add "matches" and "clear" to interface
  • Recognise axios/ user agent as bot

3.3.4

3.3.3

  • Add generic patterns (name/version) reduces pattern list size by >20%
  • Internal formatting

3.3.2

  • Remove const keyword from build (Fix)

3.3.1

  • Fix in type definition

3.3.0

  • Add "spawn" interface

3.2.4

  • Add some RSS readers detection

3.2.3

  • Refine amiga user agent detection

3.2.2

  • One more duckduckgo pattern

3.2.1

  • Add bitdiscovery, Invision bot, ddg_android (duckduckgo), Braze, gobuster

3.2.0

New features

  • TypeScript definition (isbot) supports any; a non-string argument is cast to a string before execution

3.1.0

New features

  • Native support for ESM and CommonJS
  • Start maintaining a security policy

List updates

  • Remove WAPCHOI from bot list
  • Recognise Google/google user agent for Android webview

3.0.27

  • Add a few known crawlers

3.0.26

  • Open source projects with indication to github.com

3.0.25

  • Address webview "Channel/googleplay", "GoogleApp/"
  • Add 4 more bot patterns
  • Stop treating Splash browser as bot

3.0.24

  • Add Prometheus new user agent (prometheus)
  • Add RestSharp .NET HTTP client
  • Add M2E Pro Cron Service
  • Add Deluge
  • Deprecate asafaweb.com (EOL)

3.0.23

  • Recognise Mozilla MozacFetch as natural non bot browser

3.0.22

  • Add generic term: "manager"

3.0.21

  • Reduce pattern complexity

3.0.20

  • Add Anonymous and bit.ly

3.0.19

  • Fix: It's not needed to download fixtures at postinstall

3.0.18

3.0.17

  • Add Neustar WPM
  • Internal change accommodates TypeScript compiler

3.0.16

  • Add pagespeed (Serf)
  • Add SmallProxy
  • Add CaptiveNetworkSupport

3.0.15

  • Recognise a bunch more bots
  • Optimise some of the list so we still have the same length

3.0.14

  • Add Gozilla
  • Add PerimeterX Integration Services

3.0.13

  • Add Kubernetes probe bot (ping and health-check) @simonecorsi

3.0.12

3.0.11

  • Add 5538 known crawler user agent strings from myip.ms
  • Reduce complexity by 79 by introducing "https?:" pattern

3.0.10

3.0.9

  • Add Shared Web Credentials tool
  • Add Java runtime request
  • Add 2GDPR
  • Add GetRight
  • Add Pompos

3.0.8

  • Add SignalR client
  • Add FirePHP
  • Reduce complexity for UAs containing "amiga" (by 3)
  • Reduce complexity for UAs containing "download" (by 2)

3.0.7

  • Reduce pattern complexity by 14

3.0.6

  • Respond to crawler user agents added to user-agents.net/bots
  • ApplicationHealthService: Ping Service

3.0.5

3.0.4

  • Hexometer
  • Respond to crawler user agents added to user-agents.net/bots
  • Add an "ignoreList" to exclude user agents from user-agents.net

3.0.3

Add bots

  • Respond to crawler user agents added to user-agents.net/bots

3.0.2

Optimise pattern list

Combine all Google products: Google browsers' user agents do not contain the word "Google".

Add bots

  • M4A1-WAPCHOI/2.0 (Java; U; MIDP-2.0; vi; NokiaC5-00.2) WAPCHOI/1.0.0 UCPro/9.4.1.377 U2/1.0.0 Mobile UNTRUSTED/1.0 3gpp-gba
  • Mozilla/5.0 (compatible; Domains Project/1.0.3; +https://github.com/tb0hdan/domains)

Overall reduces list by 25 rules (from 345 rules to 320)

3.0.1

Crawlers list update

Add patterns for:

  • Google WebLight Proxy
  • HighWinds Content Delivery System
  • Hydra by addthis
  • RebelMouse
  • Scanners: Jorgee Vulnerability, ClamAV Website, Burp Collaborator
  • Monitoring services: Xymon, AlertSite, Hobbit, updown.io, Monit, Dotcom

Testing

  • Add some legit browser user-agent strings
  • Fix periodic tests environment
  • Add a tester page to check user agents easily

3.0.0: Maintainability and performance through automation

The API and code have not changed

Breaking changes

  • Remove testing on node 6 and 8
  • Some crawlers list updates can potentially change identification

Non breaking changes

  • Improve efficiency of rule by optimising some parts and removing others

Testing

  • Automatically download crawlers lists for verification
  • Add tests to improve efficiency