Documentation
Convert CSV to JSON
Delimited data can be parsed out of strings or files. Files that are parsed can be local or remote. Local files are opened with FileReader, and remote files are downloaded with XMLHttpRequest.
Parse string
Papa.parse(csvString[, config])
- csvString is a string of delimited text to be parsed.
- config is an optional config object.
- Returns a parse results object (if not streaming or using a worker).
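As a rough illustration of the synchronous form, the sketch below parses a string with header: true and reads the rows from the returned object. parseRows is a hypothetical helper, not part of Papa Parse, and the papa parameter stands in for the global Papa object so the sketch makes no assumption about how the library was loaded.

```javascript
// Hypothetical helper: parse a CSV string and return its rows plus an error count.
// `papa` is expected to be the Papa Parse global.
function parseRows(papa, csvString) {
  // Without step, chunk, or worker, parse() returns the results object directly.
  var results = papa.parse(csvString, { header: true });
  return {
    rows: results.data,
    errorCount: results.errors.length
  };
}

// Only runs where the library is actually present.
if (typeof Papa !== "undefined") {
  var parsed = parseRows(Papa, "Column 1,Column 2\nfoo,bar\nabc,def");
  console.log(parsed.rows, parsed.errorCount);
}
```

A non-zero error count does not by itself mean the rows are unusable, so the helper reports it rather than throwing.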
Parse local file
Papa.parse(file, config)
- file is a File object obtained from the DOM.
- config is a config object which contains a callback.
- Doesn't return anything. Results are provided asynchronously to a callback function.
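Because results arrive through a callback, it can be convenient to wrap the call in a Promise. The following is a minimal sketch under stated assumptions: parseLocalFile is a hypothetical wrapper, papa stands in for the global Papa object, and a Promise implementation is assumed to be available in the browser.

```javascript
// Hypothetical wrapper: turn the callback-style file parse into a Promise.
// `papa` stands in for the Papa global; `file` is a File taken from an <input>.
function parseLocalFile(papa, file, baseConfig) {
  return new Promise(function (resolve, reject) {
    // Copy the caller's settings, then attach the two callbacks.
    var config = Object.assign({}, baseConfig, {
      complete: function (results) { resolve(results); },
      error: function (err) { reject(err); }
    });
    papa.parse(file, config);
  });
}

// Browser-only usage; skipped where there is no DOM or no Papa global.
if (typeof document !== "undefined" && typeof Papa !== "undefined") {
  var input = document.querySelector("input[type=file]");
  if (input) {
    input.addEventListener("change", function () {
      if (!input.files.length) { return; }
      parseLocalFile(Papa, input.files[0], { header: true })
        .then(function (results) { console.log(results.data); })
        .catch(function (err) { console.error(err); });
    });
  }
}
```

The wrapper suits non-streaming parses only, since results are not passed to complete when streaming.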
Parse remote file
Papa.parse(url, {
	download: true
	// config ...
})
- url is the path or URL to the file to download.
- The second argument is a config object where download: true is set.
- Doesn't return anything. Results are provided asynchronously to a callback function.
Using jQuery to select files
$('input[type=file]').parse({
	config: {
		// base config to use for each file
	},
	before: function(file, inputElem)
	{
		// executed before parsing each file begins;
		// what you return here controls the flow
	},
	error: function(err, file, inputElem, reason)
	{
		// executed if an error occurs while loading the file,
		// or if before callback aborted for some reason
	},
	complete: function()
	{
		// executed after all files are complete
	}
});
- Select the file input elements containing files you want to parse.
- before is an optional callback that lets you inspect each file before parsing begins. Return an object like:
  {
  	action: "abort",
  	reason: "Some reason",
  	config: // altered config...
  }
  to alter the flow of parsing. Actions can be "abort" to skip this and all other files in the queue, "skip" to skip just this file, or "continue" to carry on (equivalent to returning nothing). reason can be a reason for aborting. config can be a modified configuration for parsing just this file.
- The complete callback shown here is executed after all files are finished and does not receive any data. Use the complete callback in config for per-file results.
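The flow control offered by before can be sketched as a small function that inspects each file and returns an action object. The size limit and the .csv name check below are assumptions made for illustration; neither is a Papa Parse setting.

```javascript
// Hypothetical size limit for this sketch; not a Papa Parse setting.
var MAX_FILE_BYTES = 50 * 1024 * 1024;

// A `before` callback: decide per file whether to parse, skip, or abort.
function beforeParse(file, inputElem) {
  if (!/\.csv$/i.test(file.name)) {
    return { action: "skip" }; // skip just this file
  }
  if (file.size > MAX_FILE_BYTES) {
    // Abort this file and every other file still in the queue.
    return { action: "abort", reason: "File too large: " + file.name };
  }
  // Returning nothing is equivalent to "continue".
}

// Browser-only usage; skipped unless jQuery and the parse plugin are loaded.
if (typeof $ !== "undefined" && $.fn && typeof $.fn.parse === "function") {
  $("input[type=file]").parse({
    config: { header: true },
    before: beforeParse,
    error: function (err, file, inputElem, reason) {
      console.error(err, reason);
    }
  });
}
```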
Convert JSON to CSV
Papa's unparse utility correctly writes out delimited text strings given an array of arrays or an array of objects.
Unparse
Papa.unparse(data[, config])
Examples
// Two-line, comma-delimited file
var csv = Papa.unparse([
	["1-1", "1-2", "1-3"],
	["2-1", "2-2", "2-3"]
]);
// With header row (all objects should look alike)
var csv = Papa.unparse([
	{
		"Column 1": "foo",
		"Column 2": "bar"
	},
	{
		"Column 1": "abc",
		"Column 2": "def"
	}
]);
// Specifying fields and data manually
var csv = Papa.unparse({
	fields: ["Column 1", "Column 2"],
	data: [
		["foo", "bar"],
		["abc", "def"]
	]
});
- Returns the resulting delimited text as a string.
- data can be one of:
  - An array of arrays
  - An array of objects
  - An object with fields and data
- config is an object with any of these properties:
  // defaults shown
  {
  	quotes: false,
  	delimiter: ",",
  	newline: "\r\n"
  }
  Set quotes to true to force enclosing each datum in quotes, or to an array of true/false values corresponding to the specific columns to force-quote. The delimiter can be any valid delimiting character. The newline character(s) may also be customized.
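The per-column form of quotes can be built from field names rather than written by hand. This is a minimal sketch: quoteFlags is a hypothetical helper, and the field names and sample row are made up for illustration.

```javascript
// Hypothetical helper: build the per-column `quotes` array for unparse,
// marking which of `fields` should always be quoted.
function quoteFlags(fields, quotedFields) {
  return fields.map(function (name) {
    return quotedFields.indexOf(name) !== -1;
  });
}

var fields = ["id", "name", "note"];
var unparseConfig = {
  quotes: quoteFlags(fields, ["name", "note"]), // [false, true, true]
  delimiter: ";",
  newline: "\n"
};

// Only runs where the library is actually present.
if (typeof Papa !== "undefined") {
  var csv = Papa.unparse({
    fields: fields,
    data: [[1, "Ann", "likes; semicolons"]]
  }, unparseConfig);
  console.log(csv);
}
```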
The Config Object
Every call to parse receives a configuration object. Its properties define settings, behavior, and callbacks used during parsing.
Default config
{
	delimiter: "",
	header: false,
	dynamicTyping: false,
	preview: 0,
	step: undefined,
	encoding: "",
	worker: false,
	comments: false,
	complete: undefined,
	error: undefined,
	download: false,
	keepEmptyRows: false,
	chunk: undefined
}
delimiter
The delimiting character. Leave blank to auto-detect. If specified, it must be a string of length 1, and cannot be found in Papa.BAD_DELIMITERS.
header
If true, the first row of parsed data will be interpreted as field names. Fields will be returned in the meta, and each row will be an object of data keyed by field name. If false, the parser simply returns an array of arrays, including the first row.
dynamicTyping
If true, numeric and boolean data will be converted to their type instead of remaining strings.
preview
If > 0, only that many rows will be parsed.
step
To stream the input, define a callback function to receive results row-by-row rather than together at the end:
step: function(results, handle) {
	console.log("Row data:", results.data);
	console.log("Row errors:", results.errors);
}
Except when using a worker, you can call handle.abort() to stop parsing, handle.pause() to pause it, or handle.resume() to resume.
encoding
The encoding to use when opening files locally.
worker
Whether or not to use a worker thread. Using a worker will keep your page reactive, but may be slightly slower.
comments
Specify a comment character (like "#") if your CSV file has commented lines, and Papa will skip them. This feature is disabled by default.
complete
A callback to execute when parsing is complete. Results are passed in, and if parsing a file, the file is, too:
complete: function(results, file) {
	console.log("Parsing complete:", results, file);
}
If streaming, results will not be available in this function.
error
A callback to execute if FileReader encounters an error. The function should receive two arguments: the error and the File.
download
If true, this indicates that the string you passed in is actually a URL from which to download a file and parse it.
keepEmptyRows
If true, rows that are empty will be included in the results as an empty array. This is useful if you want to maintain line (or at least row) parity with the original input.
chunk
A callback, much like step, which activates streaming and is executed after every chunk (piece) is loaded and parsed. Works only with local and remote files. Do not use both chunk and step callbacks together. This function can be used to receive results one chunk at a time rather than one row at a time. If your file has a million rows, this results in, say, 10,000 function invocations rather than 1,000,000. In some cases, this may be faster.
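Two of the rules above lend themselves to a quick check before calling parse: the delimiter constraint, and the advice not to combine step with chunk. The sketch below is a hypothetical validator, not something Papa Parse provides. The badDelimiters argument would normally be Papa.BAD_DELIMITERS; the list used in the usage lines is a stand-in, and the real contents may differ.

```javascript
// Hypothetical validator for a parse config, based on the rules described above.
function checkConfig(config, badDelimiters) {
  var problems = [];
  var d = config.delimiter;
  // An empty or missing delimiter means "auto-detect", so only check real values.
  if (d !== undefined && d !== "") {
    if (typeof d !== "string" || d.length !== 1) {
      problems.push("delimiter must be a string of length 1");
    } else if (badDelimiters.indexOf(d) !== -1) {
      problems.push("delimiter is listed in BAD_DELIMITERS");
    }
  }
  if (typeof config.step === "function" && typeof config.chunk === "function") {
    problems.push("use either step or chunk, not both");
  }
  return problems;
}

// Stand-in list for illustration; pass Papa.BAD_DELIMITERS in real use.
var sampleBadDelimiters = ["\r", "\n", "\""];
console.log(checkConfig({
  delimiter: "\n",
  step: function () {},
  chunk: function () {}
}, sampleBadDelimiters)); // reports two problems
```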
The Results Object
Parse results are always (even when streaming) provided in a roughly consistent format: an object with data, errors, and meta. When streaming, results.data contains only one row.
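To make the streaming case concrete, here is a minimal sketch of a step callback that looks for the first matching row and then stops. makeRowFinder is a hypothetical helper. It assumes, as described above, that results.data inside step is an array holding a single row, and that handle.abort() is available (that is, no worker is in use).

```javascript
// Hypothetical helper: build a step callback that stops at the first row
// for which `predicate` returns true.
function makeRowFinder(predicate) {
  var state = { found: null, rowsSeen: 0 };
  state.step = function (results, handle) {
    var row = results.data[0]; // when streaming, data holds one row
    state.rowsSeen += 1;
    if (predicate(row)) {
      state.found = row;
      handle.abort(); // stop parsing; not available when using a worker
    }
  };
  return state;
}

// Only runs where the library is actually present.
if (typeof Papa !== "undefined") {
  var finder = makeRowFinder(function (row) { return row[0] === "abc"; });
  Papa.parse("Column 1,Column 2\nfoo,bar\nabc,def", { step: finder.step });
}
```

Keeping the state on the returned object avoids collecting every row in memory, which is the main reason to stream in the first place.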
Results structure
{
	data:   // array of parse results
	errors: // array of errors
	meta:   // object with extra info
}
- data is an array of rows. Rows are either arrays (if header: false) or objects (if header: true). Inside a step function, data will only contain one row.
- errors is an array of errors.
- meta contains extra information about the parse, such as delimiter used, number of lines, whether the process was aborted, etc.
results.data
// Example (without header)
[
	["Column 1", "Column 2"],
	["foo", "bar"],
	["abc", "def"]
]
// Example (with header)
[
	{
		"Column 1": "foo",
		"Column 2": "bar"
	},
	{
		"Column 1": "abc",
		"Column 2": "def"
	}
]
- If header row is enabled, and more fields are found on a row of data than in the header row, an extra field will appear in the results called __parsed_extra. It contains an array of all data parsed from that row that was wider than the header row.
- Using dynamicTyping: true will turn numeric and boolean data into number and boolean types, respectively. Otherwise, all parsed data is string.
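Rows that were wider than the header can be picked out by looking for the __parsed_extra field. The sketch below uses a hypothetical helper and a hand-written sample array shaped like results.data with header: true; it is not output captured from the library.

```javascript
// Hypothetical helper: list rows that carried more fields than the header row.
function findOverlongRows(data) {
  var found = [];
  data.forEach(function (row, index) {
    if (Array.isArray(row.__parsed_extra)) {
      found.push({ index: index, extra: row.__parsed_extra });
    }
  });
  return found;
}

// Hand-written sample in the shape described above.
var sampleRows = [
  { "Column 1": "foo", "Column 2": "bar" },
  { "Column 1": "abc", "Column 2": "def", __parsed_extra: ["ghi", "jkl"] }
];
console.log(findOverlongRows(sampleRows));
```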
results.errors
// Error structure
{
	type: "",     // A generalization of the error
	code: "",     // Standardized error code
	message: "",  // Human-readable details
	line: 0,      // Line of original input
	row: 0,       // Row index of parsed data where error is
	index: 0      // Character index within original input
}
- The error type will be one of "Abort", "Quotes", "Delimiter", or "FieldMismatch".
- The code may be "ParseAbort", "MissingQuotes", "UnexpectedQuotes", "UndetectableDelimiter", "TooFewFields", or "TooManyFields" (depending on the error type).
- line and index may not be available on all error messages because some errors are only generated after parsing is already complete.
- Just because errors are generated does not necessarily mean that parsing failed! Papa is strong, and usually parsing only bombs hard if the input has sloppy quotes.
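Since errors do not necessarily mean failure, it can help to summarize them before deciding what to do. The following sketch tallies errors by code and flags quote problems separately. Both helpers are hypothetical, and the sample errors are hand-written with placeholder messages.

```javascript
// Hypothetical helper: count errors by their standardized code.
function countErrorsByCode(errors) {
  var counts = {};
  errors.forEach(function (e) {
    counts[e.code] = (counts[e.code] || 0) + 1;
  });
  return counts;
}

// Hypothetical helper: quote errors are the ones most likely to ruin a parse.
function hasQuoteErrors(errors) {
  return errors.some(function (e) { return e.type === "Quotes"; });
}

// Hand-written sample; the messages are placeholders.
var sampleErrors = [
  { type: "FieldMismatch", code: "TooFewFields", message: "placeholder", row: 1 },
  { type: "FieldMismatch", code: "TooFewFields", message: "placeholder", row: 4 },
  { type: "FieldMismatch", code: "TooManyFields", message: "placeholder", row: 7 }
];
console.log(countErrorsByCode(sampleErrors)); // { TooFewFields: 2, TooManyFields: 1 }
console.log(hasQuoteErrors(sampleErrors));    // false
```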
results.meta
{
	lines:      // Number of lines parsed
	delimiter:  // Delimiter used
	aborted:    // Whether process was aborted
	fields:     // Array of field names
	truncated:  // Whether preview consumed all input
}
- Not all meta properties will always be available. For instance, fields is only given when header: true is set.
Extras
There are a few other things that Papa exposes for you that weren't explained above.
These are provided as a convenience and should remain read-only, but feel free to use them:
- Papa.BAD_DELIMITERS: An array of characters that are not allowed as delimiters (or comment characters).
- Papa.RECORD_SEP: The true delimiter. Invisible. ASCII code 30. Should be doing the job we strangely rely upon commas and tabs for.
- Papa.UNIT_SEP: Also sometimes used as a delimiting character. ASCII code 31.
- Papa.WORKERS_SUPPORTED: Whether or not the browser supports HTML5 Web Workers. If false, worker: true will have no effect.
Some settings you may change:
- Papa.LocalChunkSize: The size in bytes of each file chunk. Used when streaming files obtained from the DOM that exist on the local computer. Default 10 MB.
- Papa.RemoteChunkSize: Same as LocalChunkSize, but for downloading files from remote locations. Default 5 MB.
- Papa.DefaultDelimiter: The delimiter used when one is not specified and it cannot be detected automatically. Default is comma ",".
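The chunk sizes determine how many pieces a file is read in, and therefore how many times a chunk callback would run. A quick way to estimate that is sketched below. chunkCount is a hypothetical helper, and the sketch assumes that "MB" in the defaults above means 1024 * 1024 bytes.

```javascript
// Assumption: "MB" in the documented defaults means 1024 * 1024 bytes.
var MB = 1024 * 1024;

// Hypothetical helper: how many chunks a file of `fileBytes` is split into.
function chunkCount(fileBytes, chunkBytes) {
  return Math.ceil(fileBytes / chunkBytes);
}

console.log(chunkCount(25 * MB, 10 * MB)); // 3 (local default chunk size)
console.log(chunkCount(25 * MB, 5 * MB));  // 5 (remote default chunk size)

// The setting itself is a plain assignment; only runs where Papa is loaded.
if (typeof Papa !== "undefined") {
  Papa.LocalChunkSize = 2 * MB;
}
```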
The following items are for internal use and testing only. It is not recommended that you use them unless you're familiar with the underlying code base:
- Papa.Parser: The core parsing component.
- Papa.ParserHandle: A wrapper over the Parser which provides dynamic typing and header row support.
- Papa.NetworkStreamer: Facilitates downloading and parsing files in chunks over the network with XMLHttpRequest.
- Papa.FileStreamer: Similar to NetworkStreamer, but for local files, and using the HTML5 FileReader.