Compare commits
No commits in common. 'master' and 'develop' have entirely different histories.
@@ -1,5 +0,0 @@
## Security contact information

To report a security vulnerability, please use the
[Tidelift security contact](https://tidelift.com/security).
Tidelift will coordinate the fix and disclosure.
@@ -1,29 +0,0 @@
# This workflow will do a clean install of node dependencies, build the source code and run tests across different versions of node
# For more information see: https://help.github.com/actions/language-and-framework-guides/using-nodejs-with-github-actions

name: Node.js CI

on:
  push:
    branches: [ master ]
  pull_request:
    branches: [ master ]

jobs:
  build:

    runs-on: ubuntu-latest

    strategy:
      matrix:
        node-version: [14.x, 16.x]

    steps:
    - uses: actions/checkout@v2
    - name: Use Node.js ${{ matrix.node-version }}
      uses: actions/setup-node@v1
      with:
        node-version: ${{ matrix.node-version }}
    - run: npm ci
    - run: npm run lint
    - run: npm test
@@ -1,2 +0,0 @@
FROM gitpod/workspace-full
RUN sudo apt-get update && sudo apt-get install -y libgtk-3-0 libx11-xcb1 libnss3 libxss1 libasound2
@@ -1,9 +1,7 @@
image:
  file: .gitpod.Dockerfile
tasks:
  - command: gp await-port 3000 && sleep 3 && gp preview $(gp url 3000)/examples/browser/demo.html
  - init: npm install
    command: npm start
ports:
  - port: 3000
    onOpen: ignore
    onOpen: ignore
@@ -0,0 +1,7 @@
language: node_js
node_js:
  - "lts/*" # Use LTS version

script:
  - npm run lint
  - npm test
@@ -1,18 +1,17 @@
# Image Format

The main Tesseract.js functions (e.g. recognize, detect) take an `image` parameter. The image formats and data types supported are listed below.
Supported formats: **bmp, jpg, png, pbm**

Supported image formats: **bmp, jpg, png, pbm, webp**
The main Tesseract.js functions (e.g. recognize, detect) take an `image` parameter, which should be image-like. What counts as "image-like" depends on whether the code runs in the browser or in Node.js.

For browser and Node, supported data types are:
- a string containing a base64-encoded image (matching the `data:image\/([a-zA-Z]*);base64,([^"]*)` regexp)
- a buffer
In the browser, an image can be:
- an `img`, `video`, or `canvas` element
- a `File` object (from a file `<input>`)
- a `Blob` object
- a path or URL to an accessible image
- a base64-encoded image matching the `data:image\/([a-zA-Z]*);base64,([^"]*)` regexp

For browser only, supported data types are:
- a `File` or `Blob` object
- an `img` or `canvas` element

For Node only, supported data types are:
- a string containing a path to a local image

Note: images must be a supported image format **and** a supported data type. For example, a buffer containing a png image is supported. A buffer containing raw pixel data is not supported.
In Node.js, an image can be:
- a path to a local image
- a `Buffer` storing a binary image
- a base64-encoded image matching the `data:image\/([a-zA-Z]*);base64,([^"]*)` regexp
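
As a rough illustration of the lists above, the sketch below classifies an `image` argument by data type. It reuses the `data:image` regular expression quoted in this document. `describeImageInput` is a hypothetical helper written for this example and is not part of the Tesseract.js API, and it only covers the string and `Buffer` cases that can be checked without a browser.

```javascript
// Regular expression quoted in the text above for base64-encoded images.
const DATA_URL_RE = /data:image\/([a-zA-Z]*);base64,([^"]*)/;

// Hypothetical helper (an assumption for illustration, not a Tesseract.js
// function): report which kind of `image` argument was passed.
function describeImageInput(image) {
  if (typeof image === 'string') {
    const match = DATA_URL_RE.exec(image);
    if (match) {
      // match[1] is the image subtype, match[2] the base64 payload.
      return { kind: 'base64', subtype: match[1], payloadLength: match[2].length };
    }
    // Any other string is treated as a path or URL.
    return { kind: 'path-or-url' };
  }
  // Buffer only exists in Node.js, so guard the lookup.
  if (typeof Buffer !== 'undefined' && Buffer.isBuffer(image)) {
    return { kind: 'buffer', byteLength: image.length };
  }
  // Browser-only types (File, Blob, DOM elements) are not inspected here.
  return { kind: 'other' };
}
```

For example, `describeImageInput('data:image/png;base64,AAAA')` reports a `base64` input with subtype `png`, while a plain file path falls through to `path-or-url`.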
@@ -1,3 +1,3 @@
# Tesseract Languages

Please check [HERE](https://tesseract-ocr.github.io/tessdoc/Data-Files#data-files-for-version-400-november-29-2016) for supported languages
Please check [HERE](https://github.com/tesseract-ocr/tesseract/wiki/Data-Files#data-files-for-version-400-november-29-2016) for supported languages
@@ -1,37 +0,0 @@
<!DOCTYPE HTML>
<html>
  <head>
    <script src="/dist/tesseract.dev.js"></script>
  </head>
  <body>
    <input type="file" id="uploader">
    <script>
      const recognize = function(evt){
        const files = evt.target.files;
        const worker = Tesseract.createWorker({
          /*
           * As Edge doesn't support WebAssembly,
           * here we force the asm.js version.
           */
          corePath: '../../node_modules/tesseract.js-core/tesseract-core.asm.js',
          logger: function(m){console.log(m);},
          /*
           * As there is no IndexedDB in earlier versions
           * of Edge, here we disable the cache.
           */
          cacheMethod: 'none',
        });
        Promise.resolve()
          .then(() => worker.load())
          .then(() => worker.loadLanguage('eng'))
          .then(() => worker.initialize('eng'))
          .then(() => worker.recognize(files[0]))
          .then((ret) => {
            console.log(ret.data.text);
          });
      }
      const elm = document.getElementById('uploader');
      elm.addEventListener('change', recognize);
    </script>
  </body>
</html>
@@ -1,33 +0,0 @@
<html>
  <head>
    <script src="/dist/tesseract.dev.js"></script>
  </head>
  <body>
    <textarea id="message">Working...</textarea>

    <script>
      const { createWorker } = Tesseract;
      const worker = createWorker();
      (async () => {
        await worker.load();
        await worker.loadLanguage('eng');
        await worker.initialize('eng');

        const fileArr = ["../data/meditations.jpg", "../data/tyger.jpg", "../data/testocr.png"];
        let timeTotal = 0;
        for (let file of fileArr) {
          let time1 = Date.now();
          for (let i=0; i < 10; i++) {
            await worker.recognize(file);
          }
          let time2 = Date.now();
          const timeDif = (time2 - time1) / 1e3;
          timeTotal += timeDif;
          document.getElementById('message').innerHTML += "\n" + file + " [x10] runtime: " + timeDif + "s";
        }
        document.getElementById('message').innerHTML += "\nTotal runtime: " + timeTotal + "s";

      })();
    </script>
  </body>
</html>
@@ -1,27 +0,0 @@
#!/usr/bin/env node
const path = require('path');
const { createWorker } = require('../../');

const worker = createWorker();

(async () => {
  await worker.load();
  await worker.loadLanguage('eng');
  await worker.initialize('eng');
  const fileArr = ["../data/meditations.jpg", "../data/tyger.jpg", "../data/testocr.png"];
  let timeTotal = 0;
  for (let file of fileArr) {
    let time1 = Date.now();
    for (let i=0; i < 10; i++) {
      await worker.recognize(file)
    }
    let time2 = Date.now();
    const timeDif = (time2 - time1) / 1e3;
    timeTotal += timeDif;

    console.log(file + " [x10] runtime: " + timeDif + "s");
  }
  console.log("Total runtime: " + timeTotal + "s");

  await worker.terminate();
})();
@@ -1,20 +1,13 @@
#!/usr/bin/env node
const path = require('path');
const { createWorker } = require('../../');
const Tesseract = require('../../');

const [,, imagePath] = process.argv;
const image = path.resolve(__dirname, (imagePath || '../../tests/assets/images/cosmic.png'));

console.log(`Recognizing ${image}`);
const worker = createWorker({
  logger: m => console.log(m),
});

(async () => {
  await worker.load();
  await worker.loadLanguage('eng');
  await worker.initialize('eng');
  const { data: { text } } = await worker.recognize(image);
  console.log(text);
  await worker.terminate();
})();
Tesseract.recognize(image, 'eng', { logger: m => console.log(m) })
  .then(({ data: { text } }) => {
    console.log(text);
  });
@@ -1,13 +0,0 @@
import commonjs from "@rollup/plugin-commonjs";

export default [
  {
    input: "dist/tesseract.min.js",
    output: {
      file: "dist/tesseract.esm.min.js",
      format: "esm",
      banner: "/* eslint-disable */",
    },
    plugins: [commonjs()],
  },
];
@@ -1,218 +0,0 @@
/*
 * languages with existing tesseract traineddata
 * https://tesseract-ocr.github.io/tessdoc/Data-Files#data-files-for-version-400-november-29-2016
 */

/**
 * @typedef {object} Languages
 * @property {string} AFR Afrikaans
 * @property {string} AMH Amharic
 * @property {string} ARA Arabic
 * @property {string} ASM Assamese
 * @property {string} AZE Azerbaijani
 * @property {string} AZE_CYRL Azerbaijani - Cyrillic
 * @property {string} BEL Belarusian
 * @property {string} BEN Bengali
 * @property {string} BOD Tibetan
 * @property {string} BOS Bosnian
 * @property {string} BUL Bulgarian
 * @property {string} CAT Catalan; Valencian
 * @property {string} CEB Cebuano
 * @property {string} CES Czech
 * @property {string} CHI_SIM Chinese - Simplified
 * @property {string} CHI_TRA Chinese - Traditional
 * @property {string} CHR Cherokee
 * @property {string} CYM Welsh
 * @property {string} DAN Danish
 * @property {string} DEU German
 * @property {string} DZO Dzongkha
 * @property {string} ELL Greek, Modern (1453-)
 * @property {string} ENG English
 * @property {string} ENM English, Middle (1100-1500)
 * @property {string} EPO Esperanto
 * @property {string} EST Estonian
 * @property {string} EUS Basque
 * @property {string} FAS Persian
 * @property {string} FIN Finnish
 * @property {string} FRA French
 * @property {string} FRK German Fraktur
 * @property {string} FRM French, Middle (ca. 1400-1600)
 * @property {string} GLE Irish
 * @property {string} GLG Galician
 * @property {string} GRC Greek, Ancient (-1453)
 * @property {string} GUJ Gujarati
 * @property {string} HAT Haitian; Haitian Creole
 * @property {string} HEB Hebrew
 * @property {string} HIN Hindi
 * @property {string} HRV Croatian
 * @property {string} HUN Hungarian
 * @property {string} IKU Inuktitut
 * @property {string} IND Indonesian
 * @property {string} ISL Icelandic
 * @property {string} ITA Italian
 * @property {string} ITA_OLD Italian - Old
 * @property {string} JAV Javanese
 * @property {string} JPN Japanese
 * @property {string} KAN Kannada
 * @property {string} KAT Georgian
 * @property {string} KAT_OLD Georgian - Old
 * @property {string} KAZ Kazakh
 * @property {string} KHM Central Khmer
 * @property {string} KIR Kirghiz; Kyrgyz
 * @property {string} KOR Korean
 * @property {string} KUR Kurdish
 * @property {string} LAO Lao
 * @property {string} LAT Latin
 * @property {string} LAV Latvian
 * @property {string} LIT Lithuanian
 * @property {string} MAL Malayalam
 * @property {string} MAR Marathi
 * @property {string} MKD Macedonian
 * @property {string} MLT Maltese
 * @property {string} MSA Malay
 * @property {string} MYA Burmese
 * @property {string} NEP Nepali
 * @property {string} NLD Dutch; Flemish
 * @property {string} NOR Norwegian
 * @property {string} ORI Oriya
 * @property {string} PAN Panjabi; Punjabi
 * @property {string} POL Polish
 * @property {string} POR Portuguese
 * @property {string} PUS Pushto; Pashto
 * @property {string} RON Romanian; Moldavian; Moldovan
 * @property {string} RUS Russian
 * @property {string} SAN Sanskrit
 * @property {string} SIN Sinhala; Sinhalese
 * @property {string} SLK Slovak
 * @property {string} SLV Slovenian
 * @property {string} SPA Spanish; Castilian
 * @property {string} SPA_OLD Spanish; Castilian - Old
 * @property {string} SQI Albanian
 * @property {string} SRP Serbian
 * @property {string} SRP_LATN Serbian - Latin
 * @property {string} SWA Swahili
 * @property {string} SWE Swedish
 * @property {string} SYR Syriac
 * @property {string} TAM Tamil
 * @property {string} TEL Telugu
 * @property {string} TGK Tajik
 * @property {string} TGL Tagalog
 * @property {string} THA Thai
 * @property {string} TIR Tigrinya
 * @property {string} TUR Turkish
 * @property {string} UIG Uighur; Uyghur
 * @property {string} UKR Ukrainian
 * @property {string} URD Urdu
 * @property {string} UZB Uzbek
 * @property {string} UZB_CYRL Uzbek - Cyrillic
 * @property {string} VIE Vietnamese
 * @property {string} YID Yiddish
 */

/**
 * @type {Languages}
 */
module.exports = {
  AFR: 'afr',
  AMH: 'amh',
  ARA: 'ara',
  ASM: 'asm',
  AZE: 'aze',
  AZE_CYRL: 'aze_cyrl',
  BEL: 'bel',
  BEN: 'ben',
  BOD: 'bod',
  BOS: 'bos',
  BUL: 'bul',
  CAT: 'cat',
  CEB: 'ceb',
  CES: 'ces',
  CHI_SIM: 'chi_sim',
  CHI_TRA: 'chi_tra',
  CHR: 'chr',
  CYM: 'cym',
  DAN: 'dan',
  DEU: 'deu',
  DZO: 'dzo',
  ELL: 'ell',
  ENG: 'eng',
  ENM: 'enm',
  EPO: 'epo',
  EST: 'est',
  EUS: 'eus',
  FAS: 'fas',
  FIN: 'fin',
  FRA: 'fra',
  FRK: 'frk',
  FRM: 'frm',
  GLE: 'gle',
  GLG: 'glg',
  GRC: 'grc',
  GUJ: 'guj',
  HAT: 'hat',
  HEB: 'heb',
  HIN: 'hin',
  HRV: 'hrv',
  HUN: 'hun',
  IKU: 'iku',
  IND: 'ind',
  ISL: 'isl',
  ITA: 'ita',
  ITA_OLD: 'ita_old',
  JAV: 'jav',
  JPN: 'jpn',
  KAN: 'kan',
  KAT: 'kat',
  KAT_OLD: 'kat_old',
  KAZ: 'kaz',
  KHM: 'khm',
  KIR: 'kir',
  KOR: 'kor',
  KUR: 'kur',
  LAO: 'lao',
  LAT: 'lat',
  LAV: 'lav',
  LIT: 'lit',
  MAL: 'mal',
  MAR: 'mar',
  MKD: 'mkd',
  MLT: 'mlt',
  MSA: 'msa',
  MYA: 'mya',
  NEP: 'nep',
  NLD: 'nld',
  NOR: 'nor',
  ORI: 'ori',
  PAN: 'pan',
  POL: 'pol',
  POR: 'por',
  PUS: 'pus',
  RON: 'ron',
  RUS: 'rus',
  SAN: 'san',
  SIN: 'sin',
  SLK: 'slk',
  SLV: 'slv',
  SPA: 'spa',
  SPA_OLD: 'spa_old',
  SQI: 'sqi',
  SRP: 'srp',
  SRP_LATN: 'srp_latn',
  SWA: 'swa',
  SWE: 'swe',
  SYR: 'syr',
  TAM: 'tam',
  TEL: 'tel',
  TGK: 'tgk',
  TGL: 'tgl',
  THA: 'tha',
  TIR: 'tir',
  TUR: 'tur',
  UIG: 'uig',
  UKR: 'ukr',
  URD: 'urd',
  UZB: 'uzb',
  UZB_CYRL: 'uzb_cyrl',
  VIE: 'vie',
  YID: 'yid',
};
@@ -1,21 +1,10 @@
const isElectron = require('is-electron');

module.exports = (key) => {
  const env = {};

  if (typeof WorkerGlobalScope !== 'undefined') {
    env.type = 'webworker';
  } else if (isElectron()) {
    env.type = 'electron';
  } else if (typeof window === 'object') {
    env.type = 'browser';
  } else if (typeof process === 'object' && typeof require === 'function') {
    env.type = 'node';
  }
  const env = {
    type: (typeof window !== 'undefined') && (typeof window.document !== 'undefined') ? 'browser' : 'node',
  };

  if (typeof key === 'undefined') {
    return env;
  }

  return env[key];
};
@@ -0,0 +1 @@
module.exports = require('resolve-url');
@@ -0,0 +1 @@
module.exports = s => s;
@@ -1,18 +0,0 @@
<html>
  <head>
    <meta charset="utf-8">
    <link rel="stylesheet" href="../node_modules/mocha/mocha.css">
  </head>
  <body>
    <div id="mocha"></div>
    <script src="../node_modules/mocha/mocha.js"></script>
    <script src="../node_modules/expect.js/index.js"></script>
    <script src="../dist/tesseract.dev.js"></script>
    <script src="./constants.js"></script>
    <script>mocha.setup('bdd');</script>
    <script src="./FS.test.js"></script>
    <script>
      mocha.run();
    </script>
  </body>
</html>
@@ -1,37 +0,0 @@
const { createWorker } = Tesseract;
const FS_WAIT = 500;
const worker = createWorker(OPTIONS);
before(function cb() {
  this.timeout(0);
  return worker.load();
});

describe('FS', async () => {
  it('should write and read text from FS (using FS only)', () => {
    [
      SIMPLE_TEXT,
    ].forEach(async (text) => {
      const path = 'tmp.txt';
      await worker.FS('writeFile', [path, SIMPLE_TEXT]);
      setTimeout(async () => {
        const { data } = await worker.FS('readFile', [path]);
        await worker.FS('unlink', [path]);
        expect(data.toString()).to.be(text);
      }, FS_WAIT);
    });
  }).timeout(TIMEOUT);

  it('should write and read text from FS (using writeFile, readFile)', () => {
    [
      SIMPLE_TEXT,
    ].forEach(async (text) => {
      const path = 'tmp2.txt';
      await worker.writeText(path, SIMPLE_TEXT);
      setTimeout(async () => {
        const { data } = await worker.readText(path);
        await worker.removeFile(path);
        expect(data.toString()).to.be(text);
      }, FS_WAIT);
    });
  }).timeout(TIMEOUT);
});