Google-img-scrap

Scrape images from Google Images with custom, pre-filled dorking options.

Update

See changelog

Found a bug?

Report it in my GitHub issues, don't be afraid :)

Installation

npm i google-img-scrap

Import

const {
  GOOGLE_IMG_SCRAP,
  GOOGLE_IMG_INVERSE_ENGINE_URL,
  GOOGLE_IMG_INVERSE_ENGINE_UPLOAD,
  GOOGLE_QUERY
} = require('google-img-scrap');
// OR
import {
  GOOGLE_IMG_SCRAP,
  GOOGLE_IMG_INVERSE_ENGINE_URL,
  GOOGLE_IMG_INVERSE_ENGINE_UPLOAD,
  GOOGLE_QUERY
} from 'google-img-scrap';

Options definition

"search" string: what you want to search for
"proxy" AxiosProxyConfig: proxy configuration, in the axios proxy format
"excludeWords" string[]: words to exclude from the search
"domains" string[]: only return images from these domains
"excludeDomains" string[]: exclude images from these domains
"safeSearch" boolean: enable or disable safe search (for example, to filter out NSFW results)
"custom" string: extra query parameters to add to the request
"urlMatch" string[][]: only keep images whose URL matches the given strings (example: "cdn"); see the example below
"filterByTitles" string[][]: filter images by title; see the example below
"query" GoogleQuery: set query filters (TYPE, DATE, COLOR, SIZE, LICENCE, EXTENSION) using the GOOGLE_QUERY items; see the example below
"limit" number: maximum number of results to return

Result

{
  url: 'https://images.google.com/search?tbm=isch&tbs=&q=cats',
  search: "cats",
  result: [
    {
      id: 'K6Qd9XWnQFQCoM',
      title: 'Domestic cat',
      url: 'https://i.natgeofe.com/n/548467d8-c5f1-4551-9f58-6817a8d2c45e/NationalGeographic_2572187_2x1.jpg',
      originalUrl: 'https://www.nationalgeographic.com/animals/mammals/facts/domestic-cat',
      height: 1536,
      width: 3072
    },
    {
      id: 'HkevFQZ5DYu7oM',
      title: 'Cat - Wikipedia',
      url: 'https://upload.wikimedia.org/wikipedia/commons/1/15/Cat_August_2010-4.jpg',
      originalUrl: 'https://en.wikipedia.org/wiki/Cat',
      height: 2226,
      width: 3640
    },
    ...
  ]
}
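Since every entry in result carries width and height, post-processing needs nothing beyond plain JavaScript. A small sketch, using the two entries from the example above, that keeps landscape images and sorts them by pixel area. largestLandscape is a hypothetical helper for illustration, not part of the library.

```javascript
// Two entries copied from the Result example (url/originalUrl omitted for brevity).
// In practice, use the object returned by GOOGLE_IMG_SCRAP instead.
const response = {
  search: 'cats',
  result: [
    { id: 'K6Qd9XWnQFQCoM', title: 'Domestic cat', height: 1536, width: 3072 },
    { id: 'HkevFQZ5DYu7oM', title: 'Cat - Wikipedia', height: 2226, width: 3640 }
  ]
};

// Hypothetical helper: keep landscape images (wider than tall)
// and sort them by pixel area, largest first.
function largestLandscape(images) {
  return images
    .filter((img) => img.width > img.height)
    .sort((a, b) => b.width * b.height - a.width * a.height);
}

const sorted = largestLandscape(response.result);
console.log(sorted.map((img) => img.title)); // [ 'Cat - Wikipedia', 'Domestic cat' ]
```

filter returns a new array, so sorting it does not reorder the original result array.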

How to use

Simple example

Search for cat images

const test = await GOOGLE_IMG_SCRAP({
  search: 'cats'
});

console.log(test);

Reverse search engine

The second parameter accepts the same options as GOOGLE_IMG_SCRAP, except search (its type is Omit<Config, "search">).

With a URL (cost: 2 requests)

const test = await GOOGLE_IMG_INVERSE_ENGINE_URL(
  'https://upload.wikimedia.org/wikipedia/commons/1/15/Cat_August_2010-4.jpg',
  { limit: 5 }
);

console.log(test);

With a local image (cost: 3 requests)

import fs from 'fs'; // or: const fs = require('fs');

const imageBuffer = fs.readFileSync('demonSlayer.png');
const test = await GOOGLE_IMG_INVERSE_ENGINE_UPLOAD(imageBuffer, {
  limit: 5
});

console.log(test);

Custom query

All query options are optional and must be written in uppercase. You can combine as many as you want. All possible query options are listed in the Google query section below.

const test = await GOOGLE_IMG_SCRAP({
  search: 'cats',
  query: {
    TYPE: GOOGLE_QUERY.TYPE.CLIPART,
    LICENCE: GOOGLE_QUERY.LICENCE.COMMERCIAL_AND_OTHER,
    EXTENSION: GOOGLE_QUERY.EXTENSION.JPG
  }
});

console.log(test);

Limit result size

const test = await GOOGLE_IMG_SCRAP({
  search: 'cats',
  limit: 5
});

console.log(test);

Proxy

See the axios documentation to set up the proxy.

const test = await GOOGLE_IMG_SCRAP({
  search: 'cats',
  proxy: {
    protocol: 'https',
    host: 'example.com',
    port: 8080
  }
});

console.log(test);

Domains

Only scrape from specific domains

const test = await GOOGLE_IMG_SCRAP({
  search: 'cats',
  domains: ['alamy.com', 'istockphoto.com', 'vecteezy.com']
});

console.log(test);

Exclude domains

const test = await GOOGLE_IMG_SCRAP({
  search: 'cats',
  excludeDomains: ['istockphoto.com', 'alamy.com']
});

console.log(test);

Exclude words

If you don't like black cats and white cats

const test = await GOOGLE_IMG_SCRAP({
  search: 'cats',
  excludeWords: ['black', 'white'] //If you don't like black cats and white cats
});

console.log(test);
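The domains, excludeDomains and excludeWords options correspond to standard Google search operators. The sketch below shows one plausible way to turn them into a search string, assuming the "-word", "site:domain" and "-site:domain" syntax. buildSearchQuery is a hypothetical helper, not part of google-img-scrap, and the library's actual query format may differ.

```javascript
// Hypothetical helper, NOT part of google-img-scrap's API.
// Assumes the standard Google operators: -word, site:, -site: and OR.
function buildSearchQuery({ search, excludeWords = [], domains = [], excludeDomains = [] }) {
  const parts = [search];

  // Exclude each unwanted word with a leading minus sign.
  for (const word of excludeWords) parts.push(`-${word}`);

  // Restrict results to any of the listed domains: site:a OR site:b
  if (domains.length > 0) {
    parts.push(domains.map((domain) => `site:${domain}`).join(' OR '));
  }

  // Exclude each unwanted domain.
  for (const domain of excludeDomains) parts.push(`-site:${domain}`);

  return parts.join(' ');
}

const q = buildSearchQuery({
  search: 'cats',
  excludeWords: ['black', 'white'],
  excludeDomains: ['alamy.com']
});
console.log(q); // cats -black -white -site:alamy.com

// Base URL and tbm=isch are taken from the url field of the Result example.
const url = new URL('https://images.google.com/search');
url.searchParams.set('tbm', 'isch');
url.searchParams.set('q', q);
console.log(url.toString());
// https://images.google.com/search?tbm=isch&q=cats+-black+-white+-site%3Aalamy.com
```

URLSearchParams handles the encoding (spaces become "+", ":" becomes "%3A"), so the operators survive being placed in the URL.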

Safe search (no NSFW)

const test = await GOOGLE_IMG_SCRAP({
  search: 'cats',
  safeSearch: true
});

console.log(test);

Custom query params

const test = await GOOGLE_IMG_SCRAP({
  search: 'cats',
  custom: 'name=content&name2=content2'
});

console.log(test);
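The custom value uses the usual key=value&key=value query-string format. Node.js ships the WHATWG URLSearchParams class, which is a convenient way to check or build such a string. This sketch only manipulates the string itself and assumes nothing about how the library attaches it to the request; name and name2 are the placeholder keys from the example above.

```javascript
// The example string from above; name/name2 are placeholder keys.
const custom = 'name=content&name2=content2';

// Parse it to see the individual parameters.
const params = new URLSearchParams(custom);
for (const [key, value] of params) {
  console.log(`${key} -> ${value}`);
}
// name -> content
// name2 -> content2

// Build the same string from an object; values are URL-encoded for you.
const built = new URLSearchParams({ name: 'content', name2: 'content2' }).toString();
console.log(built === custom); // true
```

Building the string from an object avoids hand-escaping values that contain spaces or "&".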

How urlMatch and filterByTitles work

const test = await GOOGLE_IMG_SCRAP({
  search: 'cats',
  //will build something like this "(draw and white) or (albino and white)"
  filterByTitles: [
    ['draw', 'white'],
    ['albino', 'white']
  ],
  //will build something like this "(cdn and wikipedia) or (cdn and istockphoto)"
  urlMatch: [
    ['cdn', 'wikipedia'],
    ['cdn', 'istockphoto']
  ]
});

console.log(test);
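Going by the comments in the example above, each inner array is an AND group and the outer array ORs the groups together. A minimal sketch of that matching logic; the helper name matchesAny and the use of case-insensitive substring matching are assumptions, not confirmed library behavior. The first URL below is made up for illustration.

```javascript
// Hypothetical helper: true when `text` contains every word of at least
// one group, i.e. (a AND b) OR (c AND d).
// Assumption: matching is case-insensitive substring matching.
function matchesAny(text, groups) {
  const haystack = text.toLowerCase();
  return groups.some((group) =>
    group.every((word) => haystack.includes(word.toLowerCase()))
  );
}

const urlMatch = [
  ['cdn', 'wikipedia'],
  ['cdn', 'istockphoto']
];
const filterByTitles = [
  ['draw', 'white'],
  ['albino', 'white']
];

// Made-up URL containing both "cdn" and "wikipedia".
console.log(matchesAny('https://cdn.example.com/wikipedia/cat.jpg', urlMatch)); // true
// Contains "wikipedia" but no "cdn", so no group is fully matched.
console.log(matchesAny('https://upload.wikimedia.org/wikipedia/commons/1/15/Cat_August_2010-4.jpg', urlMatch)); // false

console.log(matchesAny('Albino White Cat', filterByTitles)); // true
console.log(matchesAny('Domestic cat', filterByTitles)); // false
```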

Google query

{
  SIZE: {
    LARGE,
    MEDIUM,
    ICON
  },
  COLOR: {
    BLACK_AND_WHITE,
    TRANSPARENT,
    RED,
    BLUE,
    PURPLE,
    ORANGE,
    YELLOW,
    GREEN,
    TEAL,
    PINK,
    WHITE,
    GRAY,
    BLACK,
    BROWN
  },
  TYPE: {
    CLIPART,
    DRAW,
    GIF
  },
  EXTENSION: {
    JPG,
    GIF,
    BMP,
    PNG,
    SVG,
    WEBP,
    ICO,
    RAW
  },
  DATE: {
    DAY,
    WEEK,
    MONTH,
    YEAR
  },
  LICENCE: {
    CREATIVE_COMMONS,
    COMMERCIAL_AND_OTHER
  }
}

Changelog

1.1.4

Fixed user agent to avoid bad image quality, errors and captchas (gohoski)

1.1.3

Some fixes

1.1.2

Fixed empty result
Removed average color

1.1.1

Fixed empty result

1.1.0

Added google image inverse search engine. You can now search images with a local image or with an image url.

1.0.9

Fixed many bugs
filterByTitles now works
Added urlMatch to the types
All the code has been rewritten in TypeScript with a new structure
Removed execute
Added proxy configuration
Rewrote all tests with Jest

1.0.8

Fixed "ERROR: Cannot assign to "queryName" because it is a constant" (by GaspardCulis)
Removed gstatic url
Added average color, id, title and originalUrl

1.0.7

Readme update

1.0.6

Fixed types
Added the limit option to cap the number of results

1.0.5

Added types (by christophe77)

1.0.4

New option urlMatch: only keep images whose URL matches a string (example: "cdn")
New option filterByTitles: filter images by title

1.0.3

New option execute: lets you run a function, for example to remove "gstatic.com" domains

1.0.2

'domains' and 'excludeDomains' cannot be set at the same time
Fixed some bugs
New option excludeWords

1.0.1

Added the missing dependency
