diff --git a/README.md b/README.md index dbc5e10..f3977cb 100644 --- a/README.md +++ b/README.md @@ -1,12 +1,13 @@ -Parse (jquery.parse) Plugin -=========================== +Parse CSV with Javascript +======================================== -The jQuery Parse plugin is a robust and efficient CSV (character-separated values) parser with these features: +Papa Parse (formerly the jQuery Parse Plugin) is a robust and powerful CSV (character-separated values) parser with these features: - Parses delimited text strings without any fuss - Attach to `` elements to load and parse files from disk - Automatically detects delimiter (or specify a delimiter yourself) -- Header row support +- Supports streaming large inputs +- Utilize the header row, if present - Gracefully handles malformed data - Optional dynamic typing so that numeric data is parsed as numbers - Descriptive and contextual errors @@ -16,380 +17,16 @@ The jQuery Parse plugin is a robust and efficient CSV (character-separated value Demo ---- -**[jsFIDDLE DEMO](http://jsfiddle.net/mholt/nCaee/)** - -Or download the repository and open `index.html` in your browser. - +Visit **[PapaParse.com](http://papaparse.com/#demo)** to give Papa a whirl! Get Started ----------- -For production: [jquery.parse.min.js](https://github.com/mholt/jquery.parse/blob/master/jquery.parse.min.js) - -For debug/dev: [jquery.parse.js](https://github.com/mholt/jquery.parse/blob/master/jquery.parse.js) - - - -### Config object - -Any time you invoke the parser, you may customize it using a "config" object. It supports these properties: - -| Option | Default | Description -|-------------------- | ------- | --------------- -| **`delimiter`** | ` ` | The delimiting character. Leave blank to auto-detect. If you specify a delimiter, it must be a string of length 1, and cannot be `\n`, `\r`, or `"`. 
-| **`header`** | `true` | If true, interpret the first row of parsed data as column titles; fields are returned separately from the data, and data will be returned keyed to its field name. Duplicate field names would be problematic. If false, the parser simply returns an array (list) of arrays (rows), including the first row. -| **`dynamicTyping`** | `true` | If true, fields that are only numeric will be converted to a number type. If false, each parsed datum is returned as a string. -| **`preview`** | `0` | If preview > 0, only that many rows will be parsed. - - - - - - -### Parsing strings - -To parse a delimited text string with default settings, simply do: - -```javascript -var results = $.parse(csvString); -``` - -Or to customize the settings, pass in a config object with any properties you wish to change: - -```javascript -var results = $.parse(csvString, { - delimiter: "\t", - header: false, - dynamicTyping: false, - preview: 10 -}); -``` - - - - -### Parsing files - -You can parse multiple files from multiple `` elements like so, where each property is optional: - -```javascript -$('input[type=file]').parse({ - config: { - // base settings to use for each file - }, - before: function(file, inputElem) - { - // executed before parsing each file begins; - // see documentation for how return values - // affect the behavior of the plugin - }, - error: function(err, file, inputElem) - { - // executed if an error occurs during loading the file, - // or if the file being iterated is the wrong type, - // or if the input element has no files selected - }, - complete: function(results, file, inputElem, event) - { - // executed when parsing each file completes; - // this function receives the parse results - } -}); -``` - -In order to be parsed, a file must have "text" in its MIME type. - - - - -#### Callbacks - -As indicated above, there are callbacks you can use when parsing files. 
- - - -##### `before(file, inputElem)` - -If the next file in the queue is found to be some sort of "text" MIME type, this callback will be executed immediately before setting up the FileReader, loading the file, and parsing it. It receives the file object and the `` element so you can inspect the file to be parsed. - -You can change what happens next depending on what you return: - -- Return `"skip"` to skip parsing this file. -- Return `false` to abort parsing this and all other files in the queue. -- Return a config object to alter the options for parsing this file only. - -Returning anything else, including `undefined`, continues without any changes. - - -##### `error(err, file, inputElem)` - -Invoked if there is an error loading the file. It receives an object that implements the [`DOMError`](https://developer.mozilla.org/en-US/docs/Web/API/DOMError) interface (i.e. call `err.name` to get the error), the file object at hand, and the `` element from which the file was selected. - -Errors can occur before reading the file if: - -- the HTML element has no files chosen -- a file chosen is not a "text" type (e.g. "text/csv" or "text/plain") -- a user-defined callback function (`before`) aborted the process - -Otherwise, errors are invoked by FileReader when opening the file. *Parse errors are not reported here, but are reported in the results later on.* - +Use [jquery.parse.min.js](https://github.com/mholt/jquery.parse/blob/master/jquery.parse.min.js) for production. -##### `complete(results, file, inputElem, event)` - -Invoked when parsing a file completes. It receives the results of the parse (including errors), the file object, the `` element from which the file was chosen, and the FileReader-generated event. - - - - -Output ------- - -Whether you're parsing strings or files, the results returned by the parser are the same since, under the hood, the FileReader loads a file as a string. 
- -The results will always have this basic structure: - -```javascript -{ - results: // parse results - errors: // parse errors, keyed by row -} -``` - -If no delimiter is specified and a delimiter cannot be auto-detected, an error keyed by "config" will be produced, and a default delimiter will be chosen. - -**Example input:** - - Item,SKU,Cost,Quantity - Book,ABC1234,10.95,4 - Movie,DEF5678,29.99,3 - - -### Results if `header: true` and `dynamicTyping: true` - -With a header row, each value is keyed to its field name, so the result is an object with `fields` and `rows`. The fields are an array of strings, and the rows are an array of objects: - -```json -{ - "results": { - "fields": [ - "Item", - "SKU", - "Cost", - "Quantity" - ], - "rows": [ - { - "Item": "Book", - "SKU": "ABC1234", - "Cost": 10.95, - "Quantity": 4 - }, - { - "Item": "Movie", - "SKU": "DEF5678", - "Cost": 29.99, - "Quantity": 3 - } - ] - }, - "errors": { - "length": 0 - } -} -``` - -Notice how the numeric values were converted to numbers. That is what `dynamicTyping` does. - -With a header row, the field count must be the same on each row, or a FieldMismatch error will be produced for that row. (Without a header row, lines can have variable number of fields without errors.) - - -### Results if `header: false` and `dynamicTyping: false` - -Without a header row, the result is an array (list) of arrays (rows). - -```json -{ - "results": [ - [ - "Item", - "SKU", - "Cost", - "Quantity" - ], - [ - "Book", - "ABC1234", - "10.95", - "4" - ], - [ - "Movie", - "DEF5678", - "29.99", - "3" - ] - ], - "errors": { - "length": 0 - } -} -``` - -Notice how, since dynamic typing is disabled, the numeric values are strings. - -If you are concerned about optimizing the performance of the parser, disable dynamic typing. That should speed things up by at least 2x. - - -Parse Errors ------------- - -Parse errors are returned alongside the results as an array of objects. 
Here is the structure of an error object: - -```javascript -{ - type: "", // Either "Quotes" or "FieldMismatch" - code: "", // Standardized error code like "UnexpectedQuotes" - message: "", // Human-readable error details - line: 0, // Line of original input - row: 0, // Row index of parsed data where error is - index: 0 // Character index within original input -} -``` - -Assuming the default settings, suppose the input is malformed: - - Item,SKU,Cost,Quantity - Book,"ABC1234,10.95,4 - Movie,DEF5678,29.99,3 - -Notice the stray quotes on the second line. This is the output: - -```json -{ - "results": { - "fields": [ - "Item", - "SKU", - "Cost", - "Quantity" - ], - "rows": [ - { - "Item": "Book", - "SKU": "ABC1234,10.95,4\nMovie,DEF5678,29.99,3" - } - ] - }, - "errors": { - "0": [ - { - "type": "FieldMismatch", - "code": "TooFewFields", - "message": "Too few fields: expected 4 fields but parsed 2", - "line": 2, - "row": 0, - "index": 66 - }, - { - "type": "Quotes", - "code": "MissingQuotes", - "message": "Unescaped or mismatched quotes", - "line": 2, - "row": 0, - "index": 66 - } - ], - "length": 2 - } -} -``` - -If the header row is disabled, field counting does not occur because there is no need to key the data to the field name. Thus we only get a Quotes error: - -```json -{ - "results": [ - [ - "Item", - "SKU", - "Cost", - "Quantity" - ], - [ - "Book", - "ABC1234,10.95,4\nMovie,DEF5678,29.99,3" - ] - ], - "errors": { - "1": [ - { - "type": "Quotes", - "code": "MissingQuotes", - "message": "Unescaped or mismatched quotes", - "line": 2, - "row": 1, - "index": 66 - } - ], - "length": 1 - } -} -``` - -Suppose a field value with a delimiter is not escaped: - - Item,SKU,Cost,Quantity - Book,ABC1234,10,95,4 - Movie,DEF5678,29.99,3 - -Again, notice the second line, "10,95" instead of "10.95". 
This field *should* be quoted: `"10,95"` but the parser handles the problem gracefully: - -```json -{ - "results": { - "fields": [ - "Item", - "SKU", - "Cost", - "Quantity" - ], - "rows": [ - { - "Item": "Book", - "SKU": "ABC1234", - "Cost": 10, - "Quantity": 95, - "__parsed_extra": [ - "4" - ] - }, - { - "Item": "Movie", - "SKU": "DEF5678", - "Cost": 29.99, - "Quantity": 3 - } - ] - }, - "errors": { - "0": [ - { - "type": "FieldMismatch", - "code": "TooManyFields", - "message": "Too many fields: expected 4 fields but parsed 5", - "line": 2, - "row": 0, - "index": 43 - } - ], - "length": 1 - } -} -``` - -Since files with headers are supposed to have the same number of fields per row, any extra fields are parsed into a special array field named "__parsed_extra" in the order that the remaining line was parsed. +For usage instructions, see the [homepage](http://papaparse.com) and, for more detail, the [documentation](http://papaparse.com/docs.html). @@ -400,14 +37,7 @@ The Parser component is under test. Download this repository and open `tests.htm -The Parser function -------------------- - -Inside this jQuery plugin is a `Parser` function that performs the parsing of delimited text. It does not depend upon jQuery. This plugin uses jQuery to attach to `` elements and to make it more convenient to activate and use the parsing mechanism. - - - Contributing -------------------- +------------ -Please feel free to chip in! If you'd like to see a feature or fix, pull down the code and submit a pull request. But remember, if you're changing anything in the Parser function, a pull request, *with test*, is best. (All changes to the parser component should be validated with tests.) \ No newline at end of file +If you'd like to see a feature or bug fix, pull down the code and submit a pull request. But remember, if you're changing anything in the Parser function, a pull request, *with test*, is best. (All changes to the parser component should be validated with tests.) 
You may also open issues for discussion or join in on Twitter with [#PapaParse](https://twitter.com/search?q=%23PapaParse&src=typd&f=realtime) \ No newline at end of file diff --git a/index.html b/index.html deleted file mode 100644 index b349b15..0000000 --- a/index.html +++ /dev/null @@ -1,214 +0,0 @@ - - - - Parse jQuery Plugin - - - - - -
- Delimiter: (Tab) -
- or -

- - - \ No newline at end of file diff --git a/jquery.parse.min.js b/jquery.parse.min.js index af230ab..1f68e85 100644 --- a/jquery.parse.min.js +++ b/jquery.parse.min.js @@ -1,6 +1,6 @@ /* - jQuery Parse Plugin - v1.1.1 + Papa Parse + v2.0.0 https://github.com/mholt/jquery.parse */ -;(function(e){"use strict";function t(e){return typeof e==="function"}function n(e){return typeof e!=="undefined"}function r(e){function a(e){if(typeof e.delimiter!=="string"||e.delimiter.length!=1)e.delimiter=o.delimiter;if(e.deimiter=='"'||e.delimiter=="\n")e.delimitelr=o.delimiter;if(typeof e.header!=="boolean")e.header=o.header;if(typeof e.dynamicTyping!=="boolean")e.dynamicTyping=o.dynamicTyping;if(typeof e.preview!=="number")e.preview=o.preview;return e}function f(e){var t=[","," ","|",";"];var n,s,o;for(var u in t){var a=t[u];var f=0,l=0;var c=(new r({delimiter:a,header:false,dynamicTyping:false,preview:10})).parse(e);for(var h in c.results){var p=c.results[h].length;l+=p;if(typeof o==="undefined"){o=p;continue}else if(p>1){f+=Math.abs(p-o);o=p}}l/=c.results.length;if((typeof s==="undefined"||f1.99){s=f;n=a}}i.delimiter=n;return!!n}function l(){return{i:0,lineNum:1,field:0,fieldVal:"",line:"",ch:"",inQuotes:false,parsed:i.header?{fields:[],rows:[]}:[[]],errors:{length:0}}}function c(){var e=s.i>0&&v(s.i-1)||s.i==0;var t=s.i=n.length)return false;var t=n[e];if(t==i.delimiter||t=="\n"||t=="\r"&&e=n.length)return false;if(e0)s.parsed.rows.push({});else s.parsed.push([]);s.lineNum++;s.line="";s.field=0}function b(){g();var e=E();if(!e&&i.header)S()}function w(e){var t=u.floats.test(e);return t?parseFloat(e):e}function E(){if(u.empty.test(s.line)){if(i.header){if(s.lineNum==1){s.parsed.fields=[];s.lineNum--}else s.parsed.rows.splice(s.parsed.rows.length-1,1)}else s.parsed.splice(s.parsed.length-1,1);return true}return false}function S(){if(!i.header)return true;if(s.parsed.rows.length==0)return true;var e=s.parsed.fields.length;var t=0;var 
n=s.parsed.rows[s.parsed.rows.length-1];for(var r in n)if(n.hasOwnProperty(r))t++;if(te)return x("FieldMismatch","TooManyFields","Too many fields: expected "+e+" fields but parsed "+t);return true}function x(e,t,n,r){var o=i.header?s.parsed.rows.length?s.parsed.rows.length-1:undefined:s.parsed.length-1;var u=r||o;if(typeof s.errors[u]==="undefined")s.errors[u]=[];s.errors[u].push({type:e,code:t,message:n,line:s.lineNum,row:o,index:s.i});s.errors.length++;return false}function T(){return{results:s.parsed,errors:s.errors}}function N(e){s=l();n=e}var t=this;var n="";var i={};var s=l();var o={delimiter:"",header:true,dynamicTyping:true,preview:0};var u={floats:/^\s*-?(\d*\.?\d+|\d+\.?\d*)(e[-+]?\d+)?\s*$/i,empty:/^\s*$/};this.parse=function(e){if(typeof e!=="string")return T();N(e);if(!i.delimiter&&!f(e)){x("Delimiter","UndetectableDelimiter","Unable to auto-detect delimiting character; defaulted to comma","config");i.delimiter=","}for(s.i=0;s.i0&&s.lineNum>i.preview)break;s.ch=n[s.i];s.line+=s.ch;if(s.ch=='"')c();else if(s.inQuotes)h();else d()}b();if(s.inQuotes)x("Quotes","MissingQuotes","Unescaped or mismatched quotes");return T()};this.setOptions=function(e){e=a(e);i={delimiter:e.delimiter,header:e.header,dynamicTyping:e.dynamicTyping,preview:e.preview}};this.getOptions=function(){return{delimiter:i.delimiter,header:i.header,dynamicTyping:i.dynamicTyping,preview:i.preview}};this.setOptions(e)}e.fn.parse=function(r){function i(e,n,i){if(t(r.error))r.error({name:e},n,i)}var s=n(r.config)?r.config:{};this.each(function(o){var u=e(this).prop("tagName").toUpperCase()=="INPUT"&&e(this).attr("type")=="file"&&window.FileReader;if(!u)return true;var a={delimiter:s.delimiter,header:s.header,dynamicTyping:s.dynamicTyping};if(!this.files||this.files.length==0){i("NoFileError",undefined,this);return true}for(var f=0;f-1){s=n.substring(r+1);n=n.substring(0,r)}var u=o.parse(n);if(i>=e.size)return f(t);else if(u.errors.abort)return;else c()}function f(n){if(typeof 
t.onComplete==="function")t.onComplete(undefined,e,t.inputElem,n)}function l(){if(typeof t.onFileError==="function")t.onFileError(u.error,e,t.inputElem)}function c(){if(i1){f+=Math.abs(p-s);s=p}}l/=c.results.length;if((typeof i==="undefined"||f1.99){i=f;n=a}}u.delimiter=n;return!!n}function p(){var e=a.i>0&&g(a.i-1)||a.i==0;var t=a.i=i.length)return false;var t=i[e];if(t==u.delimiter||t=="\n"||t=="\r"&&e=i.length)return false;if(e0){if(S())a.parsed.rows=[{}];else a.parsed.rows.push({})}else{if(S())a.parsed=[[]];else a.parsed.push([])}a.lineNum++;a.line="";a.field=0}function E(){if(o)return;b();var e=T();if(!e&&u.header)N();if(S()&&(!u.header||u.header&&a.parsed.rows.length>0)){var t=u.step(k());if(t===false)o=true}}function S(){return typeof u.step==="function"}function x(e){var t=l.floats.test(e);return t?parseFloat(e):e}function T(){if(l.empty.test(a.line)){if(u.header){if(a.lineNum==1){a.parsed.fields=[];a.lineNum--}else a.parsed.rows.splice(a.parsed.rows.length-1,1)}else a.parsed.splice(a.parsed.length-1,1);return true}return false}function N(){if(!u.header)return true;if(a.parsed.rows.length==0)return true;var e=a.parsed.fields.length;var t=0;var n=a.parsed.rows[a.parsed.rows.length-1];for(var r in n)if(n.hasOwnProperty(r))t++;if(te)return C("FieldMismatch","TooManyFields","Too many fields: expected "+e+" fields but parsed "+t);return true}function C(e,t,n,r){var i=u.header?a.parsed.rows.length?a.parsed.rows.length-1:undefined:a.parsed.length-1;var o=r||i;if(typeof a.errors[o]==="undefined")a.errors[o]=[];a.errors[o].push({type:e,code:t,message:n,line:a.lineNum,row:i,index:a.i+s});a.errors.length++;return false}function k(){return{results:a.parsed,errors:a.errors}}function L(e){n++;if(n>1&&S())s+=e.length;a=A();i=e}function A(){var e;if(u.header){e={fields:S()?a.parsed.fields||[]:[],rows:S()&&n>1?[{}]:[]}}else e=[[]];return{i:0,lineNum:S()?a.lineNum:1,field:0,fieldVal:"",line:"",ch:"",inQuotes:false,parsed:e,errors:{length:0}}}var t=this;var n=0;var i="";var 
s=0;var o=false;var u={};var a=A();var f={delimiter:"",header:true,dynamicTyping:true,preview:0};var l={floats:/^\s*-?(\d*\.?\d+|\d+\.?\d*)(e[-+]?\d+)?\s*$/i,empty:/^\s*$/};e=c(e);u={delimiter:e.delimiter,header:e.header,dynamicTyping:e.dynamicTyping,preview:e.preview,step:e.step};this.parse=function(e){if(typeof e!=="string")return k();L(e);if(!u.delimiter&&!h(e)){C("Delimiter","UndetectableDelimiter","Unable to auto-detect delimiting character; defaulted to comma","config");u.delimiter=","}for(a.i=0;a.i0&&a.lineNum>u.preview)break;a.ch=i[a.i];a.line+=a.ch;if(a.ch=='"')p();else if(a.inQuotes)d();else m()}if(o)C("Abort","ParseAbort","Parsing was aborted by the user's step function","abort");else{E();if(a.inQuotes)C("Quotes","MissingQuotes","Unescaped or mismatched quotes")}return k()};this.getOptions=function(){return{delimiter:u.delimiter,header:u.header,dynamicTyping:u.dynamicTyping,preview:u.preview,step:u.step}}}e.fn.parse=function(r){function o(i){var s=a,o;if(t(r.error))o=function(){r.error(c.error,i.file,i.inputElem)};if(t(r.complete))s=function(e,t,n,i){r.complete(e,t,n,i);a()};if(i.file.type.indexOf("text")<0){u("TypeMismatchError",i.file,i.inputElem);return a()}if(t(r.before)){var f=r.before(i.file,i.inputElem);if(typeof f==="object")i.instanceConfig=e.extend(i.instanceConfig,f);else if(f==="skip")return a();else if(f===false){u("AbortError",i.file,i.inputElem);return}}if(i.instanceConfig.step){var l=new n(i.file,{inputElem:i.inputElem,config:e.extend({},i.instanceConfig)});l.stream(s,o)}else{var c=new FileReader;c.onerror=o;c.onload=function(t){var n=t.target.result;var r=e.parse(n,i.instanceConfig);s(r,i.file,i.inputElem,t)};c.readAsText(i.file)}}function u(e,n,i){if(t(r.error))r.error({name:e},n,i)}function a(){s.splice(0,1);if(s.length>0)o(s[0])}var i=r.config?r.config:{};var s=[];this.each(function(t){var n=e(this).prop("tagName").toUpperCase()=="INPUT"&&e(this).attr("type")=="file"&&window.FileReader;if(!n)return true;var 
r=e.extend({},i);if(!this.files||this.files.length==0){u("NoFileError",undefined,this);return true}for(var a=0;a0)o(s[0])});return this};e.parse=function(e,t){var n=new r(t);return n.parse(e)}})(jQuery); diff --git a/tests.html b/tests.html index d9de57d..acf8998 100644 --- a/tests.html +++ b/tests.html @@ -1,7 +1,7 @@ - jQuery Parse Plugin Tests + Parser Tests