Initial commit
eilvelia committed Apr 28, 2024
0 parents commit a26ad71
Showing 18 changed files with 60,162 additions and 0 deletions.
2 changes: 2 additions & 0 deletions .gitattributes
@@ -0,0 +1,2 @@
dist/*.js linguist-generated=true
*.flow linguist-language=JavaScript
21 changes: 21 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,21 @@
name: CI
on: [push, pull_request]
jobs:
  build-and-test:
    name: 'Test / Node v${{ matrix.node }} / ${{ matrix.os }}'
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os:
          - ubuntu-latest
        node:
          - 20
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node }}
          cache: npm
      - run: npm install
      - run: npm test
      - run: npm run lint
3 changes: 3 additions & 0 deletions .gitignore
@@ -0,0 +1,3 @@
node_modules/
/*.txt
.DS_Store
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,5 @@
# Changelog

## v0.1.0

Initial release.
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
The MIT License

Copyright (c) 2024 https://github.com/Bannerets

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
82 changes: 82 additions & 0 deletions README.md
@@ -0,0 +1,82 @@
# porter2   [![npm](https://img.shields.io/npm/v/porter2.svg)](https://www.npmjs.com/package/porter2) [![CI](https://github.com/Bannerets/porter2.js/actions/workflows/ci.yml/badge.svg)](https://github.com/Bannerets/porter2.js/actions/workflows/ci.yml)

Fast JavaScript implementation of the [porter2] English [stemming] algorithm.

```console
$ npm install porter2
```

[porter2]: https://snowballstem.org/algorithms/english/stemmer.html
[stemming]: https://en.wikipedia.org/wiki/Stemming

## Usage

The package is simple: it has no dependencies and exports a single function
named `stem`.

Import using CommonJS:

```javascript
const { stem } = require('porter2')
```

Or, import using ECMAScript Modules (through interoperability with CommonJS):

```javascript
import { stem } from 'porter2'
```

Use the stemmer:

```javascript
const word = stem('animadversion')
console.log(word) //=> animadvers
```

This stemmer expects lowercase text.
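
Because the stemmer expects lowercase input, running text has to be lowercased and split into words before each word is passed to `stem`. Below is a minimal sketch of that preprocessing, assuming a simple tokenizer that treats a word as a run of ASCII letters or apostrophes. `stemText` is a hypothetical helper, not part of this package; it takes the stemmer as a parameter so it can be used with the `stem` export or any other stemming function.

```javascript
// Hypothetical helper (not part of porter2): lowercase the input,
// split it into words, and stem each word.
// Assumption: a "word" is a run of ASCII letters or apostrophes.
function stemText(text, stem) {
  const words = text.toLowerCase().match(/[a-z']+/g)
  // match() returns null when there are no words at all
  return words === null ? [] : words.map(word => stem(word))
}
```

With the package installed, it would be called as `stemText(text, stem)` using the `stem` import shown above.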

The code is compatible with ES5. TypeScript type declarations are included.

## Benchmarks

On my machine, stemming the 29.4k-word test suite takes ~10 ms in a hot loop
(~3M words/s) and ~70 ms on the first run.

The benchmark code is in `bench/index.js`.
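
`bench/index.js` stems the whole word list 30 times, treats runs 1 to 4 as warmup, and keeps a running mean of the timings of runs 5 to 30. Since the times are in milliseconds, the word count divided by the mean time is the throughput in thousands of words per second (the kops/s figures in the tables). The sketch below packages the same measurement as a reusable function; `measureThroughput` and its parameters are hypothetical names, not part of this repository, and it assumes `fn` returns a string, as a stemmer does.

```javascript
// Hypothetical reusable version of the benchmark loop (not part of the repo).
// Runs `fn` over every input `runs` times, ignores the first `warmup` runs,
// and returns the mean time per run (ms) and the throughput in k ops/s.
const { performance } = require('node:perf_hooks')

function measureThroughput(fn, inputs, runs = 30, warmup = 4) {
  if (runs <= warmup) throw new RangeError('runs must exceed warmup')
  let average = 0
  let samples = 0
  let sink = 0 // accumulate result lengths so the return values are used
  for (let run = 1; run <= runs; run++) {
    const start = performance.now()
    for (let i = 0; i < inputs.length; i++) {
      sink += fn(inputs[i]).length
    }
    const elapsed = performance.now() - start
    if (run > warmup) {
      // Incremental mean: fold one more timing into the running average
      samples++
      average += (elapsed - average) / samples
    }
  }
  // Inputs per millisecond equals thousands of inputs per second
  const kOpsPerSec = inputs.length / average
  return { averageMs: average, kOpsPerSec, sink }
}
```

With the package, this would be called as `measureThroughput(stem, words)`, where `words` is the word list the benchmark reads.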

Here is a comparison with some other libraries (take it with a grain of
salt):

| library | throughput |
| ------------------------------------ | ------------- |
| porter2.js | 3120 kops/s |
| [stemr][] | 354 kops/s |
| [wink-porter2-stemmer][] [^1] | 168 kops/s |

[stemr]: https://github.com/localvoid/stemr
[wink-porter2-stemmer]: https://github.com/winkjs/wink-porter2-stemmer

Here are libraries that implement the older porter 1 algorithm (note that its
behavior is not identical to porter2):

| library | throughput |
| ------------------------------------ | ------------- |
| [porter-stemmer-js][] [^2] | 1430 kops/s |
| [stemmer][] [^3] | 1121 kops/s |
| [@stdlib/nlp-porter-stemmer][] | 839 kops/s |
| [porter-stemmer][] | 514 kops/s |

[porter-stemmer-js]: https://github.com/evi1Husky/PorterStemmer
[stemmer]: https://github.com/words/stemmer
[@stdlib/nlp-porter-stemmer]: https://github.com/stdlib-js/nlp-porter-stemmer
[porter-stemmer]: https://github.com/jedp/porter-stemmer

These results were measured with Node.js v20.12.2. Bun v1.1.4 shows slightly
different but comparable results.

[^1]: 99.97% porter2 compliant (fails only on cases involving `'`)

[^2]: That one has similar goals and, surprisingly, was published just 3 days
before this package was released (and after I started working on it)!

[^3]: ESM only
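
Footnote 1 refers to porter2's apostrophe handling. In the algorithm description linked at the top, a word of two letters or less is left as it is; otherwise an initial `'` is removed, and Step 0 removes the longest of the suffixes `'s'`, `'s` and `'`. The sketch below covers only those apostrophe-related steps (the exceptional-word lists, the y/Y marking and steps 1 to 5 are omitted); `stripApostrophes` is a hypothetical name and not this package's internal code.

```javascript
// Hypothetical sketch of porter2's apostrophe-related steps only
// (not this package's internals; all other steps are omitted).
function stripApostrophes(word) {
  // Words of two letters or less are left as they are
  if (word.length <= 2) return word
  // Remove an initial apostrophe, if present
  if (word.startsWith("'")) word = word.slice(1)
  // Step 0: the suffixes are listed longest first,
  // so the first match is the longest one
  for (const suffix of ["'s'", "'s", "'"]) {
    if (word.endsWith(suffix)) return word.slice(0, -suffix.length)
  }
  return word
}
```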
43 changes: 43 additions & 0 deletions bench/index.js
@@ -0,0 +1,43 @@
const fs = require('node:fs')
const path = require('node:path')

// Library to benchmark; defaults to this package's build output
const toRequire = process.env.LIBRARY ?? '../dist'
console.log(`Requiring ${toRequire}`)
let stem = require(toRequire)
// The libraries export their stemmer function in different ways
if (toRequire === 'wink-porter2-stemmer' || toRequire === 'porter-stemmer-js')
  ; // the module itself is the stemmer function
else if (toRequire === 'porter-stemmer')
  stem = stem.stemmer
else
  stem = stem.stem

let words = fs.readFileSync(path.join(__dirname, '..', 'test', 'english.txt'))
  .toString()
  .trim()
  .split(/\r?\n/)

let average = null
// gc()
console.profile('aa')
for (let run = 1; run <= 30; run++) {
  // words = words.map(x => x)
  const startTime = performance.now()
  // let out = ''
  let i = 0
  for (; i < words.length; i++) {
    const result = stem(words[i])
    // out += result
  }
  const endTime = performance.now()
  // console.log(out)
  const elapsed = endTime - startTime
  console.log(elapsed)
  // Runs 1-4 are warmup; from run 5 on, keep a running mean of the timings
  if (run >= 5)
    average = average == null
      ? elapsed
      : ((run - 5) * average + elapsed) / (run - 4)
}
console.profileEnd('aa')

// Words per millisecond = thousands of words per second
const ops = (words.length / average).toFixed(2)
console.log(`Average: ${average.toFixed(6)}ms (warmup: 4), ${ops}k ops/s`)