
Handling extra fields better; updated read me

pull/1/head 0.5.2
Matthew Holt 11 years ago
commit 12fbcc6252
  1. README.md (218)
  2. jquery.parse.js (7)
  3. jquery.parse.min.js (4)
  4. parse.jquery.json (2)

218
README.md

@@ -1,56 +1,102 @@
jquery.parse
============
Robust, efficient CSV parsing (with nearly any delimiting character)
Robust, efficient CSV parsing (with nearly any delimiting character). Malformed CSV files are especially common, and this parser is an attempt to handle parsing errors more robustly and parse CSV text more efficiently.
Basic usage
-----------
The second argument is optional, but here it is with the defaults:
```javascript
results = $.parse(csvString, {
delimiter: "\t",
header: true
delimiter: ",",
header: true,
dynamicTyping: true
});
```
The default delimiter is `,` but can be set to anything except `"` or `\n`.
### Config options
By default, a header row is expected. The output and error handling depend on whether you include a header row with your data.
| Option | Description
|------------------ | -----------------
| `delimiter` | The delimiting character. Usually just a comma or tab. Can be set to anything except `"` or `\n`.
| `header` | If true, interpret the first row of parsed data as a header row; fields are returned separately from the data, and each row is returned keyed to its field names. If false, the parser simply returns an array (list) of arrays (rows), including the first row.
| `dynamicTyping` | If true, fields that are strictly numeric will be converted to a number type. If false, each parsed datum is returned as a string.
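
To make "strictly numeric" concrete, here is a minimal sketch of the conversion that `dynamicTyping` appears to perform. The pattern is the one visible in this commit's minified build; the function name `convertIfNumeric` is hypothetical and used only for illustration.

```javascript
// Sketch of the dynamicTyping conversion, assuming the pattern seen in
// the minified build of this commit. `convertIfNumeric` is a made-up name.
function convertIfNumeric(value) {
  // Digits, optionally followed by a decimal point and more digits.
  var isNumeric = /^\d+(\.\d+)?$/.test(value);
  return isNumeric ? parseFloat(value) : value;
}

console.log(typeof convertIfNumeric("10.95"));   // number
console.log(typeof convertIfNumeric("4"));       // number
console.log(typeof convertIfNumeric("ABC1234")); // string
console.log(typeof convertIfNumeric("-3"));      // string (the pattern has no sign)
```

Under this pattern, signed values, exponents, and values with surrounding whitespace stay strings.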
**If `header: true`, the output looks like:**
### Output
```javascript
The output and error handling depend on whether you include a header row with your data. If you have a header, each row must have the same number of fields as the header row, or an error will be produced.
**Example input:**
Item,SKU,Cost,Quantity
Book,ABC1234,10.95,4
Movie,DEF5678,29.99,3
**With header and dynamic typing:**
```json
{
errors: [
// errors, if any (parsing should not throw exceptions)
"results": {
"fields": [
"Item",
"SKU",
"Cost",
"Quantity"
],
results: {
fields: [
// field names from the header row
],
rows: [
// objects, where each field value is keyed to the field name
]
"rows": [
{
"Item": "Book",
"SKU": "ABC1234",
"Cost": 10.95,
"Quantity": 4
},
{
"Item": "Movie",
"SKU": "DEF5678",
"Cost": 29.99,
"Quantity": 3
}
]
},
"errors": []
}
```
**Without headers and without dynamic typing:**
**If `header: false`, the output looks like:**
```javascript
```json
{
errors: [
// errors, if any (parsing should not throw exceptions)
"results": [
[
"Item",
"SKU",
"Cost",
"Quantity"
],
[
"Book",
"ABC1234",
"10.95",
"4"
],
results: [
// each row is itself an array of values separated by delimiter
[
"Movie",
"DEF5678",
"29.99",
"3"
]
],
"errors": []
}
```
**Errors look like:**
Errors
------
Here is the structure of an error:
```javascript
{
@@ -60,3 +106,127 @@ By default, a header row is expected. The output and error handling depends on w
index: 0 // Character index within original input
}
```
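
Because parsing reports problems in the `errors` array rather than throwing, callers need to inspect that array themselves. The sketch below groups error messages by their `row` value so that affected rows can be flagged or skipped. The helper name `errorsByRow` is hypothetical, and the sample objects only assume the `message`, `line`, `row`, and `index` fields used in this README's error examples.

```javascript
// Hypothetical helper: collect error messages per row index.
function errorsByRow(errors) {
  var grouped = {};
  errors.forEach(function (err) {
    if (!grouped[err.row]) grouped[err.row] = [];
    grouped[err.row].push(err.message);
  });
  return grouped;
}

// Illustrative errors in the shape described above.
var sampleErrors = [
  { message: "Too few fields; expected 4 fields, parsed 2", line: 2, row: 0, index: 66 },
  { message: "Unescaped or mismatched quotes", line: 2, row: 0, index: 66 }
];

var grouped = errorsByRow(sampleErrors);
console.log(grouped[0].length); // 2
```

Grouping by `row` rather than `line` ties each error to an entry in the parsed results, which is usually what a caller wants to flag or discard.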
(Assume again that the default config is used.) Suppose the input is malformed:
Item,SKU,Cost,Quantity
Book,"ABC1234,10.95,4
Movie,DEF5678,29.99,3
Notice the stray quotes on the second line. This is the output:
```json
{
"results": {
"fields": [
"Item",
"SKU",
"Cost",
"Quantity"
],
"rows": [
{
"Item": "Book",
"SKU": "ABC1234,10.95,4\nMovie,DEF5678,29.99,3"
}
]
},
"errors": [
{
"message": "Too few fields; expected 4 fields, parsed 2",
"line": 2,
"row": 0,
"index": 66
},
{
"message": "Unescaped or mismatched quotes",
"line": 2,
"row": 0,
"index": 66
}
]
}
```
If the header row is disabled, field counting does not occur, because there is no need to key the data to the field name:
```json
{
"results": [
[
"Item",
"SKU",
"Cost",
"Quantity"
],
[
"Book",
"ABC1234,10.95,4\nMovie,DEF5678,29.99,3"
]
],
"errors": [
{
"message": "Unescaped or mismatched quotes",
"line": 2,
"row": 1,
"index": 66
}
]
}
```
But you will still be notified about the stray quotes, as shown above.
Suppose a field value with a delimiter is not escaped:
Item,SKU,Cost,Quantity
Book,ABC1234,10,95,4
Movie,DEF5678,29.99,3
Again, notice the second line: "10,95" instead of "10.95". This field *should* be quoted (`"10,95"`), but the parser handles the problem gracefully:
```json
{
"results": {
"fields": [
"Item",
"SKU",
"Cost",
"Quantity"
],
"rows": [
{
"Item": "Book",
"SKU": "ABC1234",
"Cost": 10,
"Quantity": 95,
"__parsed_extra": [
"4"
]
},
{
"Item": "Movie",
"SKU": "DEF5678",
"Cost": 29.99,
"Quantity": 3
}
]
},
"errors": [
{
"message": "Too many fields; expected 4 fields, found extra value: '4'",
"line": 2,
"row": 0,
"index": 43
},
{
"message": "Too few fields; expected 4 fields, parsed 5",
"line": 2,
"row": 0,
"index": 43
}
]
}
```
As you can see, when a header row is used, any "extra" fields at the end of a line are appended to a special field named `__parsed_extra`, in the order in which they were parsed.
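
The extra-field handling can be sketched in isolation as follows. This is a simplified stand-in that mirrors the logic added to `jquery.parse.js` in this commit, not the plugin's actual code path. The helper name `keyRowToFields` is hypothetical, and it takes an already-split array of values instead of parsing character by character.

```javascript
// Hypothetical helper: key a row's values to the header's field names,
// collecting any values beyond the header length under `__parsed_extra`.
function keyRowToFields(fields, values) {
  var row = {};
  values.forEach(function (value, i) {
    var fieldName = fields[i];
    if (fieldName) {
      row[fieldName] = value;
    } else {
      if (typeof row.__parsed_extra === "undefined")
        row.__parsed_extra = [];
      row.__parsed_extra.push(value);
    }
  });
  return row;
}

var row = keyRowToFields(
  ["Item", "SKU", "Cost", "Quantity"],
  ["Book", "ABC1234", 10, 95, "4"]
);
console.log(row.Quantity);              // 95
console.log(row.__parsed_extra.length); // 1
```

A row with no surplus values never gets a `__parsed_extra` key, so callers can test for the key's presence to detect rows that had too many fields.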

7
jquery.parse.js

@@ -1,6 +1,6 @@
/*
jQuery Parse plugin
v0.5.1
v0.5.2
https://github.com/mholt/jquery.parse
*/
@@ -225,9 +225,14 @@
currentRow[fieldName] = _state.fieldVal;
}
else
{
if (typeof currentRow.__parsed_extra === 'undefined')
currentRow.__parsed_extra = [];
currentRow.__parsed_extra.push(_state.fieldVal);
addError("Too many fields; expected " + _state.parsed.fields.length + " fields, found extra value: '" + _state.fieldVal + "'");
}
}
}
else
{
if (_config.dynamicTyping)

4
jquery.parse.min.js vendored

@@ -1,6 +1,6 @@
/*
jQuery Parse plugin
v0.5.0
v0.5.2
https://github.com/mholt/jquery.parse
*/
;(function(e){function n(e){e.delimeter=e.delimiter||t.delimiter;e.header=typeof e.header==="undefined"?t.header:e.header;if(e.delimiter=='"'||e.delimiter=="\n")e.delimiter=t.delimiter;if(e.delimiter.length>1)e.delimiter=e.delimiter[0];return e}function r(e,t){function u(e){return e?{fields:[],rows:[]}:[[]]}function a(){return{i:0,line:1,field:0,fieldVal:"",ch:"",inQuotes:false,parsed:u(t.header)}}function f(){if(o.i<r.length-1){if(r[o.i+1]=='"'&&o.inQuotes){o.fieldVal+='"';o.i++}else if(r[o.i+1]!=i.delimiter&&o.inQuotes)return g("Unescaped quote in field value")}o.inQuotes=!o.inQuotes}function l(){c()}function c(){o.fieldVal+=o.ch}function h(){if(o.ch==i.delimiter){p()}else if(o.ch=="\n"){p();d()}else{c()}}function p(){if(i.header){if(o.line==1){o.parsed.fields.push(o.fieldVal)}else{var e=o.parsed.rows[o.parsed.rows.length-1];var t=o.parsed.fields[o.field];if(t)e[t]=o.fieldVal;else g("Too many fields; expected "+o.parsed.fields.length+" fields, found extra value: '"+o.fieldVal+"'")}}else o.parsed[o.parsed.length-1].push(o.fieldVal);o.fieldVal="";o.field++}function d(){v();if(i.header){m();if(o.line>0)o.parsed.rows.push({})}else o.parsed.push([]);o.line++;o.field=0}function v(){if(i.header){if(o.line==1){if(o.parsed.fields.length==1&&o.parsed.fields[0].length==0){o.parsed.fields=[];o.line--}}else{var e=o.parsed.rows[o.parsed.rows.length-1];if(!e[o.parsed.fields[0]])o.parsed.rows.splice(o.parsed.rows.length-1,1)}}else{var e=o.parsed[o.parsed.length-1];if(e.length==0||e[0].length==0)o.parsed.splice(o.parsed.length-1,1)}}function m(){if(!i.header)return true;if(o.parsed.rows.length==0)return true;var e=o.parsed.fields.length;var t=Object.keys(o.parsed.rows[o.parsed.rows.length-1]).length;if(e!=t)return g("Too few fields; expected "+e+" fields, parsed "+t);return true}function g(e){s.push({message:e,line:o.line,row:i.header?o.parsed.rows.length-1:o.parsed.length-1,index:o.i});return false}var n=this;var r=e;var i=t;var s=[];var o=a();this.parse=function(e){if(typeof 
e==="object")n.setConfig(e);else if(typeof e==="string")n.setInput(e);s=[];o=a();for(o.i=0;o.i<r.length;o.i++){o.ch=r[o.i];if(o.ch=='"')f();else if(o.inQuotes)l();else h()}p();v();m();if(o.inQuotes)g("Unescaped or mismatched quotes");return n.getParsed()};this.getDelimiter=function(){return t.delimiter};this.setDelimiter=function(e){var t=",";e=e?e=='"'||e=="\n"?t:e:t;i.delimiter=e[0]};this.setConfig=function(e){if(typeof e.header!=="undefined"&&e.header!=t.header||typeof e.delimiter!=="undefined"&&e.delimiter!=t.delimiter){o.parsed=u(e.header)}i=e};this.getInput=function(){return r};this.setInput=function(e){r=e};this.getParsed=function(){return o.parsed};this.getErrors=function(){return s}}var t={delimiter:",",header:true};e.parse=function(e,t){t=n(t);var i=new r(e,t);return{results:i.parse(),errors:i.getErrors()}}})(jQuery);
;(function(e){"use strict";function n(e){e.delimeter=e.delimiter||t.delimiter;e.header=typeof e.header==="undefined"?t.header:e.header;e.dynamicTyping=typeof e.dynamicTyping==="undefined"?t.dynamicTyping:e.dynamicTyping;if(e.delimiter=='"'||e.delimiter=="\n")e.delimiter=t.delimiter;if(e.delimiter.length>1)e.delimiter=e.delimiter[0];return e}function r(e,t){function u(e){return e?{fields:[],rows:[]}:[[]]}function a(){return{i:0,line:1,field:0,fieldVal:"",ch:"",inQuotes:false,parsed:u(t.header)}}function f(){var e=o.i>0&&p(r[o.i-1]);var t=o.i<r.length-1&&p(r[o.i+1]);var n=o.i<r.length-1&&r[o.i+1]=='"';if(o.inQuotes&&n){o.fieldVal+='"';o.i++}else if(e||t){o.inQuotes=!o.inQuotes}else{b("Unexpected quotes")}}function l(){c()}function c(){o.fieldVal+=o.ch}function h(){if(o.ch==i.delimiter){d()}else if(o.ch=="\n"){d();v()}else{c()}}function p(e){return e==i.delimiter||e=="\n"}function d(){if(i.header){if(o.line==1){o.parsed.fields.push(o.fieldVal)}else{var e=o.parsed.rows[o.parsed.rows.length-1];var t=o.parsed.fields[o.field];if(t){if(i.dynamicTyping)o.fieldVal=m(o.fieldVal);e[t]=o.fieldVal}else{if(typeof e.__parsed_extra==="undefined")e.__parsed_extra=[];e.__parsed_extra.push(o.fieldVal);b("Too many fields; expected "+o.parsed.fields.length+" fields, found extra value: '"+o.fieldVal+"'")}}}else{if(i.dynamicTyping)o.fieldVal=m(o.fieldVal);o.parsed[o.parsed.length-1].push(o.fieldVal)}o.fieldVal="";o.field++}function v(){g();if(i.header){y();if(o.line>0)o.parsed.rows.push({})}else o.parsed.push([]);o.line++;o.field=0}function m(e){var t=/^\d+(\.\d+)?$/.test(e);return t?parseFloat(e):e}function g(){if(i.header){if(o.line==1){if(o.parsed.fields.length==1&&o.parsed.fields[0].length==0){o.parsed.fields=[];o.line--}}else{var e=o.parsed.rows[o.parsed.rows.length-1];if(!e[o.parsed.fields[0]])o.parsed.rows.splice(o.parsed.rows.length-1,1)}}else{var e=o.parsed[o.parsed.length-1];if(e.length==0||e[0].length==0)o.parsed.splice(o.parsed.length-1,1)}}function y(){if(!i.header)return 
true;if(o.parsed.rows.length==0)return true;var e=o.parsed.fields.length;var t=Object.keys(o.parsed.rows[o.parsed.rows.length-1]).length;if(e!=t)return b("Too few fields; expected "+e+" fields, parsed "+t);return true}function b(e){s.push({message:e,line:o.line,row:i.header?o.parsed.rows.length-1:o.parsed.length-1,index:o.i});return false}var n=this;var r=e;var i=t;var s=[];var o=a();this.parse=function(e){if(typeof e==="object")n.setConfig(e);else if(typeof e==="string")n.setInput(e);s=[];o=a();for(o.i=0;o.i<r.length;o.i++){o.ch=r[o.i];if(o.ch=='"')f();else if(o.inQuotes)l();else h()}d();g();y();if(o.inQuotes)b("Unescaped or mismatched quotes");return n.getParsed()};this.getDelimiter=function(){return t.delimiter};this.setDelimiter=function(e){var t=",";e=e?e=='"'||e=="\n"?t:e:t;i.delimiter=e[0]};this.setConfig=function(e){if(typeof e.header!=="undefined"&&e.header!=t.header||typeof e.delimiter!=="undefined"&&e.delimiter!=t.delimiter){o.parsed=u(e.header)}i=e};this.getInput=function(){return r};this.setInput=function(e){r=e};this.getParsed=function(){return o.parsed};this.getErrors=function(){return s}}var t={delimiter:",",header:true,dynamicTyping:false};e.parse=function(e,t){t=n(t);var i=new r(e,t);return{results:i.parse(),errors:i.getErrors()}}})(jQuery);

2
parse.jquery.json

@@ -1,6 +1,6 @@
{
"name": "parse",
"version": "0.5.0",
"version": "0.5.2",
"title": "jQuery Parse plugin",
"description": "Parse delimited text (like CSV or tab-delimited) into a usable data structure",
"keywords": [
