

npm install compromise








import nlp from 'compromise'
let doc = nlp('she sells seashells by the seashore.')
doc.verbs().toPastTense()
doc.text()
// 'she sold seashells by the seashore.'

if (doc.has('simon says #Verb')) {
return true
}

let doc = nlp(entireNovel)
doc.match('the #Adjective of times').text()
// "the blurst of times?"


import plg from 'compromise-speech'
nlp.extend(plg)
let doc = nlp('Milwaukee has certainly had its share of visitors..')
doc.compute('syllables')
doc.places().json()
/*
[{
"text": "Milwaukee",
"terms": [{
"normal": "milwaukee",
"syllables": ["mil", "wau", "kee"]
}]
}]
*/

avoid the problems of brittle parsers:
let doc = nlp("we're not gonna take it..")
doc.has('gonna') // true
doc.has('going to') // true (implicit)
// transform
doc.contractions().expand()
doc.text()
// 'we are not going to take it..'


let doc = nlp('ninety five thousand and fifty two')
doc.numbers().add(20)
doc.text()
// 'ninety five thousand and seventy two'


let doc = nlp('the purple dinosaur')
doc.nouns().toPlural()
doc.text()
// 'the purple dinosaurs'

Use it on the client-side:
<script src="https://unpkg.com/compromise"></script>
<script>
var doc = nlp('two bottles of beer')
doc.numbers().minus(1)
document.body.innerHTML = doc.text()
// 'one bottle of beer'
</script>
or likewise:
import nlp from 'compromise'
var doc = nlp('London is calling')
doc.verbs().toNegative()
doc.text()
// 'London is not calling'
compromise is ~250kb (minified).
it's pretty fast. It can run on keypress.
it works mainly by conjugating all forms of a basic word list.
The final lexicon is ~14,000 words.

you can read more about how it works, here. it's weird.
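
To get a feel for how conjugating a word list can grow a lexicon, here is a toy sketch in plain JavaScript. It is not compromise's code: the suffix rules only cover regular verbs, and the helper names (`conjugateRegular`, `buildToyLexicon`) are made up for illustration.

```javascript
// Toy sketch: expand a short list of regular verbs into a word -> tag lexicon.
// The suffix rules below are simplified assumptions, not compromise's real data.
const baseVerbs = ['walk', 'jump', 'talk']

// Produce the four basic forms of a regular verb, keyed by tag name.
function conjugateRegular(verb) {
  return {
    Infinitive: verb,
    PresentTense: verb + 's',
    PastTense: verb + 'ed',
    Gerund: verb + 'ing',
  }
}

// Flatten every conjugated form into one lookup table.
function buildToyLexicon(verbs) {
  const lexicon = {}
  for (const verb of verbs) {
    for (const [tag, form] of Object.entries(conjugateRegular(verb))) {
      lexicon[form] = tag
    }
  }
  return lexicon
}

const toyLexicon = buildToyLexicon(baseVerbs)
console.log(Object.keys(toyLexicon).length) // 12 entries from 3 base words
console.log(toyLexicon['walked']) // 'PastTense'
```

Three base words become twelve lexicon entries here, which hints at how a modest word list can cover a much larger vocabulary.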
okay -
compromise/one
A tokenizer of words, sentences, and punctuation.

import nlp from 'compromise/one'
let doc = nlp("Wayne's World, party time")
let data = doc.json()
/* [{
normal:"wayne's world party time",
terms:[{ text: "Wayne's", normal: "wayne" },
...
]
}]
*/
compromise/one tokenizes your text, and does nothing else.


compromise/two
A part-of-speech tagger, and grammar-interpreter.

import nlp from 'compromise/two'
let doc = nlp("Wayne's World, party time")
let str = doc.match('#Possessive #Noun').text()
// "Wayne's World"

doc.debug()
you can see the reasoning for each tag with nlp.verbose('tagger').
if you prefer Penn tags, you can derive them with:
let doc = nlp('welcome thrillho')
doc.compute('penn')
doc.json()

compromise/three
Phrase and sentence tooling.

import nlp from 'compromise/three'
let doc = nlp("Wayne's World, party time")
let str = doc.people().normalize().text()
// "wayne"
.numbers(), for example, grabs all the numbers in a document and extends them with new methods, like .subtract().
When you have a phrase, or group of words, you can see additional metadata about it with .json():
let doc = nlp('four out of five dentists')
console.log(doc.fractions().json())
/*[{
text: 'four out of five',
terms: [ [Object], [Object], [Object], [Object] ],
fraction: { numerator: 4, denominator: 5, decimal: 0.8 }
}
]*/
let doc = nlp('$4.09CAD')
doc.money().json()
/*[{
text: '$4.09CAD',
terms: [ [Object] ],
number: { prefix: '$', num: 4.09, suffix: 'cad'}
}
]*/

(these methods are on the main nlp object)
- nlp.tokenize(str) - parse text without running POS-tagging
- nlp.lazy(str, match) - scan through a text with minimal analysis
- nlp.plugin({}) - mix in a compromise-plugin
- nlp.parseMatch(str) - pre-parse any match statements into json
- nlp.world() - grab or change library internals
- nlp.model() - grab all current linguistic data
- nlp.methods() - grab or change internal methods
- nlp.hooks() - see which compute methods run automatically
- nlp.verbose(mode) - log our decision-making for debugging
- nlp.version - current semver version of the library
- nlp.addWords(obj, isFrozen?) - add new words to the lexicon
- nlp.addTags(obj) - add new tags to the tagSet
- nlp.typeahead(arr) - add words to the auto-fill dictionary
- nlp.buildTrie(arr) - compile a list of words into a fast lookup form
- nlp.buildNet(arr) - compile a list of matches into a fast match form
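
For intuition about what a "fast lookup form" for a word list can look like, here is a minimal trie sketch in plain JavaScript. It is only an illustration of the general idea: `buildToyTrie` and `hasToyWord` are hypothetical helpers, and the structure nlp.buildTrie actually returns may look nothing like this.

```javascript
// Minimal trie sketch - hypothetical helpers, not compromise's internal format.
// Each node is a plain object keyed by single characters.
function buildToyTrie(words) {
  const root = {}
  for (const word of words) {
    let node = root
    for (const char of word) {
      node[char] = node[char] || {}
      node = node[char]
    }
    node.end = true // marks the end of a complete word
  }
  return root
}

// Walk the trie one character at a time; bail out on the first missing branch.
function hasToyWord(trie, word) {
  let node = trie
  for (const char of word) {
    node = node[char]
    if (!node) return false
  }
  return node.end === true
}

const toyTrie = buildToyTrie(['kermit', 'gonzo', 'fozzie'])
console.log(hasToyWord(toyTrie, 'gonzo')) // true
console.log(hasToyWord(toyTrie, 'gon')) // false (a prefix, not a full word)
```

A lookup costs one step per character of the query, no matter how many words were compiled in.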


##### Nouns
- .nouns().toPlural() - 'football captain' → 'football captains'
- .nouns().toSingular() - 'turnovers' → 'turnover'
- .nouns().adjectives() - get any adjectives describing this noun
##### Verbs
- .verbs() - return any subsequent terms tagged as a Verb
- .verbs().json() - overloaded output with verb metadata
- .verbs().parse() - get tokenized verb-phrase
- .verbs().subjects() - what is doing the verb action
- .verbs().adverbs() - return the adverbs describing this verb.
- .verbs().isSingular() - return singular verbs like 'spencer walks'
- .verbs().isPlural() - return plural verbs like 'we walk'
- .verbs().isImperative() - only instruction verbs like 'eat it!'
- .verbs().toPastTense() - 'will go' → 'went'
- .verbs().toPresentTense() - 'walked' → 'walks'
- .verbs().toFutureTense() - 'walked' → 'will walk'
- .verbs().toInfinitive() - 'walks' → 'walk'
- .verbs().toGerund() - 'walks' → 'walking'
- .verbs().toPastParticiple() - 'drive' → 'had driven'
- .verbs().conjugate() - return all conjugations of these verbs
- .verbs().isNegative() - return verbs with 'not', 'never' or 'no'
- .verbs().isPositive() - only verbs without 'not', 'never' or 'no'
- .verbs().toNegative() - 'went' → 'did not go'
- .verbs().toPositive() - "didn't study" → 'studied'
##### Numbers
- .numbers() - grab all written and numeric values
- .numbers().parse() - get tokenized number phrase
- .numbers().get() - get a simple javascript number
- .numbers().json() - overloaded output with number metadata
- .numbers().toNumber() - convert 'five' to 5
- .numbers().toLocaleString() - add commas, or nicer formatting for numbers
- .numbers().toText() - convert '5' to 'five'
- .numbers().toOrdinal() - convert 'five' to 'fifth' or '5th'
- .numbers().toCardinal() - convert 'fifth' to 'five' or '5'
- .numbers().isOrdinal() - return only ordinal numbers
- .numbers().isCardinal() - return only cardinal numbers
- .numbers().isEqual(n) - return numbers with this value
- .numbers().greaterThan(min) - return numbers bigger than n
- .numbers().lessThan(max) - return numbers smaller than n
- .numbers().between(min, max) - return numbers between min and max
- .numbers().isUnit(unit) - return only numbers in the given unit, like 'km'
- .numbers().set(n) - set number to n
- .numbers().add(n) - increase number by n
- .numbers().subtract(n) - decrease number by n
- .numbers().increment() - increase number by 1
- .numbers().decrement() - decrease number by 1
- .money() - things like '$2.50'
- .money().get() - retrieve the parsed amount(s) of money
- .money().json() - currency + number info
- .money().currency() - which currency the money is in
- .fractions() - like '2/3rds' or 'one out of five'
- .fractions().parse() - get tokenized fraction
- .fractions().get() - simple numerator, denominator data
- .fractions().json() - json method overloaded with fractions data
- .fractions().toDecimal() - '2/3' -> '0.66'
- .fractions().normalize() - 'four out of 10' -> '4/10'
- .fractions().toText() - '4/10' -> 'four tenths'
- .fractions().toPercentage() - '4/10' -> '40%'
- .percentages() - like '2.5%'
- .percentages().get() - return the percentage number / 100
- .percentages().json() - json overloaded with percentage information
- .percentages().toFraction() - '80%' -> '8/10'
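
The fraction and percentage conversions above are plain arithmetic once a phrase has been parsed. As a rough sketch of that arithmetic (not the library's code), the hypothetical `parseSimpleFraction` helper below only understands 'n/d' strings, and returns the same numerator / denominator / decimal shape that the .fractions().json() example shows.

```javascript
// Sketch of the arithmetic behind the fraction conversions - not compromise's parser.
// `parseSimpleFraction` is a hypothetical helper that only handles 'n/d' strings.
function parseSimpleFraction(str) {
  const [numerator, denominator] = str.split('/').map(Number)
  return { numerator, denominator, decimal: numerator / denominator }
}

// Round to a whole percent to sidestep floating-point noise.
function toPercentString(fraction) {
  return `${Math.round(fraction.decimal * 100)}%`
}

const parsedFraction = parseSimpleFraction('4/10')
console.log(parsedFraction) // { numerator: 4, denominator: 10, decimal: 0.4 }
console.log(toPercentString(parsedFraction)) // '40%'
```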
##### Sentences
- .sentences() - return a sentence class with additional methods
- .sentences().json() - overloaded output with sentence metadata
- .sentences().toPastTense() - 'he walks' -> 'he walked'
- .sentences().toPresentTense() - 'he walked' -> 'he walks'
- .sentences().toFutureTense() - 'he walks' -> 'he will walk'
- .sentences().toInfinitive() - verb root-form: 'he walks' -> 'he walk'
- .sentences().toNegative() - 'he walks' -> "he didn't walk"
- .sentences().isQuestion() - return questions with a '?'
- .sentences().isExclamation() - return sentences with a '!'
- .sentences().isStatement() - return sentences without '?' or '!'
##### Adjectives
- .adjectives() - things like 'quick'
- .adjectives().json() - get adjective metadata
- .adjectives().conjugate() - return all inflections of these adjectives
- .adjectives().adverbs() - get adverbs describing this adjective
- .adjectives().toComparative() - 'quick' -> 'quicker'
- .adjectives().toSuperlative() - 'quick' -> 'quickest'
- .adjectives().toAdverb() - 'quick' -> 'quickly'
- .adjectives().toNoun() - 'quick' -> 'quickness'
##### Misc selections
- .clauses() - split up sentences into multi-term phrases
- .chunks() - split up sentences into noun-phrases and verb-phrases
- .hyphenated() - all terms connected with a hyphen or dash like 'wash-out'
- .phoneNumbers() - things like '(939) 555-0113'
- .hashTags() - things like '#nlp'
- .emails() - things like 'hi@compromise.cool'
- .emoticons() - things like :)
- .emojis() - things like 💋
- .atMentions() - things like '@nlp_compromise'
- .urls() - things like 'compromise.cool'
- .pronouns() - things like 'he'
- .conjunctions() - things like 'but'
- .prepositions() - things like 'of'
- .abbreviations() - things like 'Mrs.'
- .people() - names like 'John F. Kennedy'
- .people().json() - get person-name metadata
- .people().parse() - get person-name interpretation
- .places() - like 'Paris, France'
- .organizations() - like 'Google, Inc'
- .topics() - people() + places() + organizations()
- .adverbs() - things like 'quickly'
- .adverbs().json() - get adverb metadata
- .acronyms() - things like 'FBI'
- .acronyms().strip() - remove periods from acronyms
- .acronyms().addPeriods() - add periods to acronyms
- .parentheses() - return anything inside (parentheses)
- .parentheses().strip() - remove brackets
- .possessives() - things like "Spencer's"
- .possessives().strip() - "Spencer's" -> "Spencer"
- .quotations() - return any terms inside paired quotation marks
- .quotations().strip() - remove quotation marks
- .slashes() - return any terms grouped by slashes
- .slashes().split() - turn 'love/hate' into 'love hate'

.extend():
This library comes with a considerate, common-sense baseline for English grammar.
You're free to change, or lay waste to, any settings - which is the fun part, actually.
The easiest way is just to suggest tags for any given words:
let myWords = {
kermit: 'FirstName',
fozzie: 'FirstName',
}
let doc = nlp(muppetText, myWords)
or make heavier changes with a compromise-plugin.
import nlp from 'compromise'
nlp.extend({
// add new tags
tags: {
Character: {
isA: 'Person',
notA: 'Adjective',
},
},
// add or change words in the lexicon
words: {
kermit: 'Character',
gonzo: 'Character',
},
// change inflections
irregulars: {
get: {
pastTense: 'gotten',
gerund: 'gettin',
},
},
// add new methods to compromise
api: View => {
View.prototype.kermitVoice = function () {
this.sentences().prepend('well,')
this.match('i [(am|was)]').prepend('um,')
return this
}
},
})


Docs:
gentle introduction:

Documentation:

Talks:
- Language as an Interface - by Spencer Kelly
- Coding Chat Bots - by KahWee Teng
- On Typing and data - by Spencer Kelly
Articles:
- Geocoding Social Conversations with NLP and JavaScript - by Microsoft
- Microservice Recipe - by Eventn
- Adventure Game Sentence Parsing with Compromise
- Building Text-Based Games - by Matt Eland
- Fun with javascript in BigQuery - by Felipe Hoffa
- Natural Language Processing... in the Browser? - by Charles Landau
Some fun Applications:
- Automated Bechdel Test - by The Guardian
- Story generation framework - by Jose Phrocca
- Tumblr blog of lists - horse-ebooks-like lists - by Michael Paulukonis
- Video Editing from Transcription - by New Theory
- Browser extension Fact-checking - by Alexander Kidd
- Siri shortcut - by Michael Byrns
- Amazon skill - by Tajddin Maghni
- Tasking Slack-bot - by Kevin Suh [see more]
Comparisons


Plugins:
These are some helpful extensions:
Dates
npm install compromise-dates
- .dates() - find dates like 'June 8th' or '03/03/18'
- .dates().get() - simple start/end json result
- .dates().json() - overloaded output with date metadata
- .dates().format('') - convert the dates to specific formats
- .dates().toShortForm() - convert 'Wednesday' to 'Wed', etc
- .dates().toLongForm() - convert 'Feb' to 'February', etc
- .durations() - '2 weeks' or '5mins'
- .durations().get() - return simple json for duration
- .durations().json() - overloaded output with duration metadata
- .times() - '4:30pm' or 'half past five'
- .times().get() - return simple json for times
- .times().json() - overloaded output with time metadata
Stats
npm install compromise-stats
- .tfidf({}) - rank words by frequency and uniqueness
- .ngrams({}) - list all repeating sub-phrases, by word-count
- .unigrams() - n-grams with one word
- .bigrams() - n-grams with two words
- .trigrams() - n-grams with three words
- .startgrams() - n-grams including the first term of a phrase
- .endgrams() - n-grams including the last term of a phrase
- .edgegrams() - n-grams including the first or last term of a phrase
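
If n-grams are new to you, this plain-JavaScript sketch shows the idea behind the methods above: slide a window of n words over the text and count repeats. It is an assumption-laden illustration rather than the compromise-stats implementation; the whitespace tokenizer and the helper names (`toyNgrams`, `countGrams`) are invented here.

```javascript
// Plain-JavaScript n-gram sketch - not the compromise-stats implementation.
// Splits on whitespace only, which is far cruder than a real tokenizer.
function toyNgrams(text, size) {
  const words = text.toLowerCase().split(/\s+/).filter(Boolean)
  const grams = []
  for (let i = 0; i + size <= words.length; i += 1) {
    grams.push(words.slice(i, i + size).join(' '))
  }
  return grams
}

// Tally how often each n-gram occurs.
function countGrams(grams) {
  const counts = new Map()
  for (const gram of grams) {
    counts.set(gram, (counts.get(gram) || 0) + 1)
  }
  return counts
}

const toyBigrams = toyNgrams('the cat saw the cat', 2)
console.log(toyBigrams) // ['the cat', 'cat saw', 'saw the', 'the cat']
console.log(countGrams(toyBigrams).get('the cat')) // 2
```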
Speech
npm install compromise-speech
- .syllables() - split each term by its typical pronunciation
- .soundsLike() - produce an estimated pronunciation
Wikipedia
npm install compromise-wikipedia
- .wikipedia() - compressed article reconciliation

Typescript
we're committed to typescript/deno support, both in main and in the official-plugins:
import nlp from 'compromise'
import stats from 'compromise-stats'
const nlpEx = nlp.extend(stats)
nlpEx('This is type safe!').ngrams({ min: 1 })

Limitations:
- slash-support: We currently split slashes up as different words, like we do for hyphens, so things like this don't work:
  nlp('the koala eats/shoots/leaves').has('koala leaves') //false
- inter-sentence match: By default, sentences are the top-level abstraction. Inter-sentence, or multi-sentence, matches aren't supported without a plugin:
  nlp("that's it. Back to Winnipeg!").has('it back') //false
- nested match syntax: the ~~danger~~ beauty of regex is that you can recurse indefinitely. Our match syntax is much weaker. Things like this are not (yet) possible:
  doc.match('(modern (major|minor))? general')
  Complex matches must be achieved with successive .match() statements.
- dependency parsing: Proper sentence transformation requires understanding the syntax tree of a sentence, which we don't currently do. We should! Help wanted with this.
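
To make the nested-match limitation concrete, here is a small plain-JavaScript comparison. The first part uses a native RegExp, which accepts the nested optional group directly; the second breaks the same check into successive flat steps, similar in spirit to chaining simpler matches. `matchesInSteps` is a hypothetical helper and neither part calls compromise.

```javascript
// Native RegExp allows a group nested inside an optional group.
const nestedPattern = /^(modern (major|minor) )?general$/
console.log(nestedPattern.test('modern major general')) // true
console.log(nestedPattern.test('general')) // true
console.log(nestedPattern.test('modern general')) // false

// The same check, done as successive flat steps with no nesting.
function matchesInSteps(str) {
  // step 1: the phrase must end with 'general'
  if (!/general$/.test(str)) return false
  // step 2: whatever comes before must be empty, or 'modern major' / 'modern minor'
  const prefix = str.replace(/ ?general$/, '')
  return prefix === '' || /^modern (major|minor)$/.test(prefix)
}

console.log(matchesInSteps('modern minor general')) // true
console.log(matchesInSteps('modern general')) // false
```

The stepwise version is longer, but each step stays simple enough to express without nested groups.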
FAQ
-
Only if it's water-proof!
Read quick start for running compromise in workers, mobile apps, and all sorts of funny environments.
-
we do offer a tokenize-only build, which has the POS-tagger pulled-out.
but otherwise, compromise isn't easily tree-shaken.
the tagging methods are competitive, and greedy, so it's not recommended to pull things out.
Note that without full POS-tagging, the contraction-parser won't work perfectly: it can't tell a contraction ("spencer's cool") from a possessive ("spencer's house").
It's recommended to run the library fully.
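
Here is a toy illustration of why the contraction-parser leans on tagging. The tag table and the `interpretApostropheS` helper are hand-written assumptions for this sketch, not compromise's tagger: the point is only that the word after "'s" decides the reading, so without tags there is nothing to decide with.

```javascript
// Toy disambiguation of "'s" - the tag table is a hand-written assumption.
const toyTags = { cool: 'Adjective', tired: 'Adjective', house: 'Noun', dog: 'Noun' }

function interpretApostropheS(phrase) {
  const [first, next] = phrase.toLowerCase().split(' ')
  if (!first.endsWith("'s")) return null
  const owner = first.slice(0, -2)
  // An adjective after "'s" suggests a contracted 'is'.
  if (toyTags[next] === 'Adjective') {
    return { type: 'contraction', text: `${owner} is ${next}` }
  }
  // A noun after "'s" suggests possession, so the text is left alone.
  if (toyTags[next] === 'Noun') {
    return { type: 'possessive', text: `${owner}'s ${next}` }
  }
  return null // no tag available, so no safe interpretation
}

console.log(interpretApostropheS("spencer's cool")) // { type: 'contraction', text: 'spencer is cool' }
console.log(interpretApostropheS("spencer's house")) // { type: 'possessive', text: "spencer's house" }
console.log(interpretApostropheS("spencer's zorp")) // null
```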

See Also:
- en-pos - very clever javascript pos-tagger by Alex Corvi
- naturalNode - fancier statistical nlp in javascript
- winkJS - POS-tagger, tokenizer, machine-learning in javascript
- dariusk/pos-js - fastTag fork in javascript
- compendium-js - POS and sentiment analysis in javascript
- nodeBox linguistics - conjugation, inflection in javascript
- reText - very impressive text utilities in javascript
- superScript - conversation engine in js
- jsPos - javascript build of the time-tested Brill-tagger
- spaCy - speedy, multilingual tagger in C/python
- Prose - quick tagger in Go by Joseph Kato
- TextBlob - python tagger
MIT