<i class="fa fa-download"></i> Get Papa Parse on GitHub
</a>
<a href="/docs" class="button red">
<i class="fa fa-book"></i> Documentation
</a>
</div>
</div>
</div>
</main>
</main>
<footer>
<footer>
<!--<div class="footer-top">
<h3>Make Your Papa Proud</h3>
<h4><ahref="https://github.com/mholt/PapaParse">Star</a> and <ahref="https://github.com/mholt/PapaParse/blob/gh-pages/resources/js/lovers.js">shout</a> if you love #PapaParse</h4>
Delimited data can be parsed out of strings or files. Files that are parsed can be local or remote. Local files are opened with FileReader, and remote files are downloaded with XMLHttpRequest.
</p>
</div>
<hr>
<div class="grid-100">
<div class="grid-100">
<h3 id="csv-to-json">Convert CSV to JSON</h3>
<h5 id="strings">Parse string</h5>
<p>Delimited data can be parsed out of strings or files. Files that are parsed can be local or remote. Local files are opened with FileReader, and remote files are downloaded with XMLHttpRequest.</p>
to alter the flow of parsing. Actions can be <code>"abort"</code> to skip this and all other files in the queue, <code>"skip"</code> to skip just this file, or <code>"continue"</code> to carry on (equivalent to returning nothing). <code>reason</code> can be a reason for aborting. <code>config</code> can be a modified <ahref="#config">configuration</a> for parsing just this file.
to alter the flow of parsing. Actions can be <code>"abort"</code> to skip this and all other files in the queue, <code>"skip"</code> to skip just this file, or <code>"continue"</code> to carry on (equivalent to returning nothing). <code>reason</code> can be a reason for aborting. <code>config</code> can be a modified <ahref="#config">configuration</a> for parsing just this file.</li>
</li>
</li>
<li>The <code>complete</code> callback shown here is executed after <i>all</i> files are finished and does not receive any data. Use the complete callback in <ahref="#config">config</a> for per-file results.
<li>The <code>complete</code> callback shown here is executed after <i>all</i> files are finished and does not receive any data. Use the complete callback in <ahref="#config">config</a> for per-file results.</li>
</ul>
</ul>
</div>
</div>
<divclass="clear"></div>
<divclass="clear"></div>
</div>
</section>
@ -208,32 +210,65 @@
<section>
<divclass="grid-container">
<divclass="grid-100">
<h4id="json-to-csv">Convert JSON to CSV</h4>
<p>
Papa's <code>unparse</code> utility writes out correct delimited text strings given an array of arrays or an array of objects.
<hr>
</p>
</div>
<divclass="grid-100">
<divclass="grid-100">
<h3id="json-to-csv">Convert JSON to CSV</h3>
<h5id="unparse">Unparse</h5>
<p>Papa's <code>unparse</code> utility correctly writes out delimited text strings given an array of arrays or an array of objects.</p>
Set <code>quotes</code> to <code>true</code> to always enclose each field in quotes, or to an array of true/false values indicating which columns to force-quote. The <code>delimiter</code> can be any valid delimiting character. The <code>newline</code> character(s) may also be customized.
Set <code>quotes</code> to <code>true</code> to force each datum to be enclosed in quotes, or to an array of true/false values indicating which columns to force-quote. The <code>delimiter</code> can be any valid delimiting character. The <code>newline</code> character(s) may also be customized.
</li>
</ul>
</div>
</div>
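<p>The effect of <code>quotes</code>, <code>delimiter</code>, and <code>newline</code> can be illustrated with a small stand-alone sketch. This is <i>not</i> Papa's <code>unparse</code> implementation: the function name <code>serializeRows</code>, its defaults, and the exact quoting rules are assumptions made for illustration only.</p>

```javascript
// Illustrative sketch only: NOT Papa Parse's implementation of unparse.
// serializeRows, its defaults, and its quoting rules are assumptions.
function serializeRows(rows, options) {
  var opts = options || {};
  var delimiter = opts.delimiter || ",";
  var newline = opts.newline || "\r\n";
  var quotes = opts.quotes || false;

  function mustQuote(value, columnIndex) {
    // quotes: true forces every field; an array forces specific columns.
    if (quotes === true) return true;
    if (Array.isArray(quotes) && quotes[columnIndex] === true) return true;
    // Otherwise quote only when the field would be ambiguous.
    return value.indexOf(delimiter) !== -1 ||
      value.indexOf('"') !== -1 ||
      value.indexOf("\n") !== -1 ||
      value.indexOf("\r") !== -1;
  }

  return rows.map(function (row) {
    return row.map(function (field, columnIndex) {
      var value = String(field);
      if (!mustQuote(value, columnIndex)) return value;
      // Embedded quotes are escaped by doubling them.
      return '"' + value.replace(/"/g, '""') + '"';
    }).join(delimiter);
  }).join(newline);
}

var rows = [["id", "name"], [1, "Ann, Jr."]];
// Default settings: only the ambiguous field is quoted.
var plain = serializeRows(rows);
// Force-quote the second column, with a custom delimiter and newline.
var forced = serializeRows(rows, { quotes: [false, true], delimiter: ";", newline: "\n" });
```

<p>With an array for <code>quotes</code>, only the flagged columns are force-quoted; the other columns are quoted only when their content requires it.</p>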
<divclass="clear"></div>
<divclass="clear"></div>
</div>
</section>
@ -295,19 +311,25 @@ var csv = Papa.unparse({
<hr>
<section>
<divclass="grid-container">
<divclass="grid-100">
<divclass="grid-100">
<h3 id="config">The Config Object</h3>
<h4 id="config">The Parse Config Object</h4>
<p>Every call to <code>parse</code> receives a configuration object. Its properties define settings, behavior, and callbacks used during parsing.</p>
<p>
The <code>parse</code> function may be passed a configuration object. It defines settings, behavior, and callbacks used during parsing. Any properties left unspecified will resort to their default values.
</p>
</div>
</div>
<divclass="grid-100">
<h5id="config-default">Default Config With All Options</h5>
</div>
<divclass="grid-50">
<h4id="config-default">Default config</h4>
<divclass="prefix-25 grid-50 suffix-25">
<code class="block">{
<pre><code class="language-javascript">{
delimiter: "", // auto-detect
delimiter: "", // auto-detect
newline: "", // auto-detect
newline: "", // auto-detect
header: false,
header: false,
@ -323,46 +345,154 @@ var csv = Papa.unparse({
skipEmptyLines: false,
skipEmptyLines: false,
chunk: undefined,
chunk: undefined,
fastMode: false
fastMode: false
}</code>
}</code></pre>
<h4id="config-details">Config options</h4>
<ul>
<li><code>delimiter</code> The delimiting character. Leave blank to auto-detect. If specified, it must be a string of length 1, and cannot be found in <ahref="#extras">Papa.BAD_DELIMITERS</a>.</li>
<li><code>newline</code> The newline sequence. Leave blank to auto-detect. Must be one of \r, \n, or \r\n.</li>
<li><code>header</code> If true, the first row of parsed data will be interpreted as field names. Fields will be returned in the <ahref="#meta">meta</a>, and each row will be an object of data keyed by field name. If false, the parser simply returns an array of arrays, including the first row.</li>
<li><code>dynamicTyping</code> If true, numeric and boolean data will be converted to their type instead of remaining strings.</li>
<li><code>preview</code> If > 0, only that many rows will be parsed.</li>
<li><code>encoding</code> The encoding to use when opening files locally.</li>
<li><code>worker</code> Whether or not to use a <a href="/faq#workers">worker thread</a>. Using a worker will keep your page responsive, but may be slightly slower.</li>
<li><code>comments</code> Specify a string that indicates a comment (like "#" or "//"). If your CSV file has commented lines, Papa will skip them. This feature is disabled by default.</li>
</ul>
</div>
</div>
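<p>The constraints on <code>delimiter</code> and <code>newline</code> listed above can be checked before calling <code>parse</code>. The helper below is a hypothetical sketch, not part of Papa Parse; the sample list of bad delimiters is also an assumption, and in a page you would pass <code>Papa.BAD_DELIMITERS</code> instead.</p>

```javascript
// Hypothetical helper, not part of Papa Parse: checks a config against the
// rules stated in the option list before handing it to parse().
function checkParseConfig(config, badDelimiters) {
  var problems = [];

  var delimiter = config.delimiter;
  // An empty or missing delimiter means "auto-detect", which is always valid.
  if (delimiter !== undefined && delimiter !== "") {
    if (typeof delimiter !== "string" || delimiter.length !== 1) {
      problems.push("delimiter must be a string of length 1");
    } else if (badDelimiters.indexOf(delimiter) !== -1) {
      problems.push("delimiter is listed in BAD_DELIMITERS");
    }
  }

  var newline = config.newline;
  // An empty or missing newline also means "auto-detect".
  if (newline !== undefined && newline !== "" &&
      ["\r", "\n", "\r\n"].indexOf(newline) === -1) {
    problems.push("newline must be \\r, \\n, or \\r\\n");
  }

  return problems;
}

// Assumed sample list; the real contents of Papa.BAD_DELIMITERS may differ.
var sampleBadDelimiters = ["\r", "\n", '"'];
var ok = checkParseConfig({ delimiter: "\t", newline: "\n", header: true }, sampleBadDelimiters);
var bad = checkParseConfig({ delimiter: "::", newline: "\n\n" }, sampleBadDelimiters);
```

<p>Here <code>ok</code> is an empty array, while <code>bad</code> lists one problem for the two-character delimiter and one for the unsupported newline sequence.</p>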
<divclass="grid-50">
<divclass="clear"></div>
<ul>
<liid="step">
<divclass="grid-100">
<code>step</code> To <ahref="/faq#streaming">stream</a> the input, define a callback function to receive <ahref="#results">results</a> row-by-row rather than together at the end:
The delimiting character. Leave blank to auto-detect. If specified, it must be a string of length 1 which cannot be found in <ahref="#readonly">Papa.BAD_DELIMITERS</a>.
</td>
</tr>
<tr>
<td>
<code>newline</code>
</td>
<td>
The newline sequence. Leave blank to auto-detect. Must be one of \r, \n, or \r\n.
</td>
</tr>
<tr>
<td>
<code>header</code>
</td>
<td>
If true, the first row of parsed data will be interpreted as field names. Fields will be returned in the <ahref="#meta">meta</a>, and each row will be an object of data keyed by field name. If false, the parser simply returns an array of arrays, including the first row.
</td>
</tr>
<tr>
<td>
<code>dynamicTyping</code>
</td>
<td>
If true, numeric and boolean data will be converted to their type instead of remaining strings.
</td>
</tr>
<tr>
<td>
<code>preview</code>
</td>
<td>
If > 0, only that many rows will be parsed.
</td>
</tr>
<tr>
<td>
<code>encoding</code>
</td>
<td>
The encoding to use when opening local files. If specified, it must be a value supported by the FileReader API.
</td>
</tr>
<tr>
<td>
<code>worker</code>
</td>
<td>
Whether or not to use a <a href="/faq#workers">worker thread</a>. Using a worker will keep your page responsive, but may be slightly slower.
</td>
</tr>
<tr>
<td>
<code>comments</code>
</td>
<td>
A string that indicates a comment (for example, "#" or "//"). When Papa encounters a line starting with this string, it will skip the line.
</td>
</tr>
<tr>
<td>
<code>step</code>
</td>
<td>
To <ahref="/faq#streaming">stream</a> the input, define a callback function:
Except when using a worker, you can call <code>handle.abort()</code> to stop parsing, <code>handle.pause()</code> to pause it, or <code>handle.resume()</code> to resume.
Streaming is necessary for large files which would otherwise crash the browser. Except when using a <ahref="/faq#worker">Web Worker</a>, you can call <code>handle.abort()</code> to stop parsing, <code>handle.pause()</code> to pause it, or <code>handle.resume()</code> to resume.
</li>
</td>
<li>
</tr>
<code>complete</code> A callback to execute when parsing is complete. Results are passed in, and if parsing a file, the file is, too:
The callback to execute when parsing is complete. It receives the parse <ahref="#results">results</a>. If parsing a local file, the <ahref="https://developer.mozilla.org/en-US/docs/Web/API/File">File</a> is passed in, too:
When streaming, parse results are <i>not</i> available in this callback.
If streaming, results will <i>not</i> be available in this function.
</td>
</li>
</tr>
<li><code>error</code> A callback to execute if FileReader encounters an error. The function should receive two arguments: the error and the File.</li>
<tr>
<li><code>download</code> If true, this indicates that the string you passed in is actually a URL from which to download a file and parse it.</li>
<td>
<li><code>skipEmptyLines</code> If true, lines that are completely empty will be skipped. An empty line is defined to be one which evaluates to empty string.</li>
<code>error</code>
<li><code>chunk</code> A callback, much like step, which activates streaming and is executed after every whole chunk of the file is loaded and parsed, rather than every row. Works only with local and remote files. Do not use both chunk and step callbacks together. As arguments, it receives the results, the streamer, and if parsing a local file, the File object. You can pause, resume, and abort parsing from within this function.</li>
</td>
<li><code>fastMode</code> When enabled, fast mode executes parsing much more quickly. Only use this if you know your input won't have quoted fields.
<td>
</ul>
A callback to execute if FileReader encounters an error. The function is passed two arguments: the error and the File.
</td>
</tr>
<tr>
<td>
<code>download</code>
</td>
<td>
If true, this indicates that the string you passed as the first argument to <code>parse()</code> is actually a URL from which to download a file and parse its contents.
</td>
</tr>
<tr>
<td>
<code>skipEmptyLines</code>
</td>
<td>
If true, lines that are completely empty will be skipped. An empty line is defined to be one which evaluates to empty string.
</td>
</tr>
<tr>
<td>
<code>chunk</code>
</td>
<td>
A callback function, identical to step, which activates streaming. However, this function is executed after every <i>chunk</i> of the file is loaded and parsed rather than every row. Works only with local and remote files. Do not use both chunk and step callbacks together. For the function signature, see the documentation for the step function.
</td>
</tr>
<tr>
<td>
<code>fastMode</code>
</td>
<td>
When enabled, fast mode executes parsing much more quickly. However, this only works for input without quoted fields. If you cannot guarantee that, do not enable fast mode.
</td>
</tr>
</table>
</div>
</div>
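<p>A <code>step</code> callback that stops parsing early might look like the sketch below. The <code>(results, handle)</code> argument order is assumed from the description above, and <code>makeLimitedStep</code> and the simulated driver loop are hypothetical names that stand in for the parser so the example runs on its own.</p>

```javascript
// Sketch of a step callback that handles rows one at a time and stops early.
// The (results, handle) signature is assumed from the option description.
function makeLimitedStep(maxRows, onRow) {
  var seen = 0;
  return function step(results, handle) {
    // When streaming, results.data holds a single row.
    onRow(results.data[0]);
    seen += 1;
    if (seen >= maxRows) {
      handle.abort(); // not available when using a worker
    }
  };
}

// Simulated driver standing in for the parser (an assumption for this demo).
var collected = [];
var aborted = false;
var fakeHandle = { abort: function () { aborted = true; } };
var step = makeLimitedStep(2, function (row) { collected.push(row); });

var input = [["a", "b"], ["c", "d"], ["e", "f"]];
for (var i = 0; i < input.length; i++) {
  if (aborted) break;
  step({ data: [input[i]], errors: [], meta: {} }, fakeHandle);
}
```

<p>In real use, the returned function would be passed as <code>step</code> in the config, and the third row would never reach the callback because parsing is aborted after the second.</p>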
<divclass="clear"></div>
</div>
</section>
@ -385,45 +515,55 @@ var csv = Papa.unparse({
<hr>
<section>
<divclass="grid-container">
<divclass="grid-100">
<divclass="grid-100">
<h3id="results">The Results Object</h3>
<h4id="results">The Parse Result Object</h4>
<p>
A parse result always contains three objects: data, errors, and meta. Data and errors are arrays, and meta is an object. In the step callback, the data array will only contain one element.
</p>
</div>
<p>Parse results are always (even when streaming) provided in a roughly consistent format: an object with data, errors, and meta. When streaming, <code>results.data</code> contains only one row.</p>
<divclass="grid-100">
<h5id="results-structure">Result Structure</h5>
</div>
</div>
<divclass="grid-50">
<divclass="grid-50">
<h4id="results-structure">Results structure</h4>
<pre><codeclass="language-javascript">{
<codeclass="block">{
data: // array of parsed data
data: <spanclass="comment">// array of parse results</span>
errors: // array of errors
errors: <spanclass="comment">// array of errors</span>
meta: // object with extra info
meta: <spanclass="comment">// object with extra info</span>
}</code></pre>
}</code>
</div>
</div>
<divclass="grid-50">
<divclass="grid-50">
<ul>
<ul>
<li><code>data</code> is an array of rows. Rows are either arrays (if <code>header: false</code>) or objects (if <code>header: true</code>). Inside a <ahref="#config">step</a> function, data will only contain one row.</li>
<li><code>data</code> is an array of rows. If header is false, rows are arrays; otherwise they are objects of data keyed by the field name.</li>
<li><code>errors</code> is an array of errors.</li>
<li><code>errors</code> is an array of <ahref="#errors">errors</a>.</li>
<li><code>meta</code> contains extra information about the parse, such as delimiter used, number of lines, whether the process was aborted, etc.
<li><code>meta</code> contains extra information about the parse, such as delimiter used, the newline sequence, whether the process was aborted, etc. Properties in this object are not guaranteed to exist in all situations.</li>
</ul>
</ul>
</div>
</div>
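<p>Because properties in <code>meta</code> are not guaranteed to exist, code that reads a result object may want to guard each access. The sketch below uses a hypothetical helper, <code>summarizeResults</code>; the property names <code>meta.delimiter</code> and <code>meta.aborted</code> are assumed from the description above.</p>

```javascript
// Hypothetical helper: turns a results object into a short summary.
// meta.delimiter and meta.aborted are assumed property names.
function summarizeResults(results) {
  var meta = results.meta || {};
  var firstRow = results.data[0];
  return {
    rowCount: results.data.length,
    errorCount: results.errors.length,
    // Rows are arrays without a header row and objects with one.
    rowsAreObjects: firstRow !== undefined && !Array.isArray(firstRow),
    // meta properties may be missing, so fall back explicitly.
    delimiter: meta.delimiter !== undefined ? meta.delimiter : null,
    aborted: meta.aborted === true
  };
}

var sample = {
  data: [{ "Column 1": "foo", "Column 2": "bar" }],
  errors: [],
  meta: { delimiter: "," }
};
var summary = summarizeResults(sample);
```

<p>The same helper works inside a step function, where <code>data</code> contains a single row.</p>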
<divclass="clear"></div>
<divclass="clear"></div>
<divclass="grid-100">
<h5id="data">Data</h5>
</div>
<divclass="grid-50">
<divclass="grid-50">
<h4id="data">results.data</h4>
<pre><codeclass="language-javascript">// Example (header: false)
<codeclass="block"><spanclass="comment">// Example (without header)</span>
[
[
["Column 1", "Column 2"],
["Column 1", "Column 2"],
["foo", "bar"],
["foo", "bar"],
["abc", "def"]
["abc", "def"]
]
]
<spanclass="comment">// Example (with header)</span>
// Example (header: true)
[
[
{
{
"Column 1": "foo",
"Column 1": "foo",
@ -433,56 +573,64 @@ var csv = Papa.unparse({
"Column 1": "abc",
"Column 1": "abc",
"Column 2": "def"
"Column 2": "def"
}
}
]</code>
]</code></pre>
</div>
</div>
<divclass="grid-50">
<divclass="grid-50">
<ul>
<ul>
<li>If header row is enabled, and more fields are found on a row of data than in the header row, an extra field will appear in the results called <code>__parsed_extra</code>. It contains an array of all data parsed from that row that was wider than the header row.</li>
<li>If header row is enabled and more fields are found on a row of data than in the header row, an extra field will appear in that row called <code>__parsed_extra</code>. It contains an array of all data parsed from that row that extended beyond the header row.</li>
<li>Using <code>dynamicTyping: true</code> will turn numeric and boolean data into number and boolean types, respectively. Otherwise, all parsed data is string.</li>
row: 0, <spanclass="comment">// Row index of parsed data where error is</span>
row: 0, // Row index of parsed data where error is
<!--index: 0 <span class="comment">// Character index within original input</span>-->
<!--index: 0 // Character index within original input-->
}</code>
}</code></pre>
</div>
</div>
<divclass="grid-50">
<divclass="grid-50">
<ul>
<ul>
<li>The error <code>type</code> will be one of "Quotes", "Delimiter", or "FieldMismatch".</li>
<li>The error <code>type</code> will be one of "Quotes", "Delimiter", or "FieldMismatch".</li>
<li>The <code>code</code> may be "MissingQuotes", "UndetectableDelimiter", "TooFewFields", or "TooManyFields" (depending on the error type).</li>
<li>The <code>code</code> may be "MissingQuotes", "UndetectableDelimiter", "TooFewFields", or "TooManyFields" (depending on the error type).</li>
<!--<li><code>index</code> may not be available on all error messages because some errors are only generated after parsing is already complete.</li>-->
<!--<li><code>index</code> may not be available on all error messages because some errors are only generated after parsing is already complete.</li>-->
<li>Just because errors are generated does not necessarily mean that parsing failed! Papa is strong, and usually parsing only bombs hard if the input has sloppy quotes. In other words, MissingQuotes is usually a bad sign.</li>
<li>Just because errors are generated does not necessarily mean that parsing failed. The worst error you can get is probably MissingQuotes.</li>
<p>There are a few other things that Papa exposes for you that weren't explained above.</p>
<p>
There are a few other things that Papa exposes to you that weren't explained above.
</p>
</div>
</div>
<divclass="grid-100">
<h5id="readonly">Read-Only</h5>
</div>
<divclass="grid-50">
<p>
<divclass="grid-100">
These are provided as a convenience and should remain read-only, but <b>feel free to use them</b>:
<table>
</p>
<tr>
<ul>
<th>Read-Only Property</th>
<li>
<th>Explanation</th>
<code>Papa.BAD_DELIMITERS</code>
</tr>
An array of characters that are not allowed as delimiters (or comment characters).
<tr>
</li>
<td><code>Papa.BAD_DELIMITERS</code></td>
<li>
<td>
<code>Papa.RECORD_SEP</code>
An array of characters that are not allowed as delimiters.
</td>
</tr>
<tr>
<td><code>Papa.RECORD_SEP</code></td>
<td>
The true delimiter. Invisible. ASCII code 30. Should be doing the job we strangely rely upon commas and tabs for.
The true delimiter. Invisible. ASCII code 30. Should be doing the job we strangely rely upon commas and tabs for.
</li>
</td>
<li>
</tr>
<code>Papa.UNIT_SEP</code>
<tr>
<td><code>Papa.UNIT_SEP</code></td>
<td>
Also sometimes used as a delimiting character. ASCII code 31.
Also sometimes used as a delimiting character. ASCII code 31.
</li>
</td>
<li>
</tr>
<code>Papa.WORKERS_SUPPORTED</code>
<tr>
<td><code>Papa.WORKERS_SUPPORTED</code></td>
<td>
Whether or not the browser supports HTML5 Web Workers. If false, <code>worker: true</code> will have no effect.
Whether or not the browser supports HTML5 Web Workers. If false, <code>worker: true</code> will have no effect.
</li>
</td>
</ul>
</tr>
<tr>
<td><code>Papa.SCRIPT_PATH</code></td>
<td>
The relative path to Papa Parse. This is automatically detected when Papa Parse is loaded synchronously. However, if you load Papa Parse asynchronously (e.g. with RequireJS), you need to set this variable manually in order to use Web Workers.
</td>
</tr>
</table>
</div>
<p>
Some settings you may change:
</p>
<ul>
<divclass="grid-100">
<li>
<h5id="configurable">Configurable</h5>
<code>Papa.LocalChunkSize</code>
</div>
<divclass="grid-100">
<table>
<tr>
<th>Configurable Property</th>
<th>Explanation</th>
</tr>
<tr>
<td><code>Papa.LocalChunkSize</code></td>
<td>
The size in bytes of each file chunk. Used when streaming files obtained from the DOM that exist on the local computer. Default 10 MB.
The size in bytes of each file chunk. Used when streaming files obtained from the DOM that exist on the local computer. Default 10 MB.
</li>
</td>
<li>
</tr>
<code>Papa.RemoteChunkSize</code>
<tr>
<td><code>Papa.RemoteChunkSize</code></td>
<td>
Same as LocalChunkSize, but for downloading files from remote locations. Default 5 MB.
Same as LocalChunkSize, but for downloading files from remote locations. Default 5 MB.
</li>
</td>
<li>
</tr>
<code>Papa.DefaultDelimiter</code>
<tr>
The delimiter used when one is not specified and it cannot be detected automatically. Default is comma <code>","</code>.
<td><code>Papa.DefaultDelimiter</code></td>
</li>
<td>
The delimiter used when one is not specified and it cannot be detected automatically. Default is comma.
</td>
</tr>
<tr>
<td><code>Papa.WORKERS_SUPPORTED</code></td>
<td>
Whether or not the browser supports HTML5 Web Workers. If false, <code>worker: true</code> will have no effect.
</td>
</tr>
</table>
</div>
</div>
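<p>To get a feel for the chunk size settings, the arithmetic below estimates how many chunks a file would be read in. It is a rough sketch: <code>chunkCount</code> is a hypothetical helper, and it assumes that "MB" means 1024 * 1024 bytes.</p>

```javascript
// Assumption: "MB" in the table means 1024 * 1024 bytes.
var MB = 1024 * 1024;

// Hypothetical helper: number of chunks needed to cover a file.
function chunkCount(fileSizeBytes, chunkSizeBytes) {
  if (fileSizeBytes === 0) return 0;
  return Math.ceil(fileSizeBytes / chunkSizeBytes);
}

// A 25 MB file with the documented defaults (10 MB local, 5 MB remote).
var localChunks = chunkCount(25 * MB, 10 * MB);
var remoteChunks = chunkCount(25 * MB, 5 * MB);
```

<p>A smaller chunk size lowers peak memory use at the cost of more read operations or requests.</p>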
<divclass="grid-50">
<p>
<divclass="grid-100">
The following items are for internal use and testing only. <b>It is not recommended that you use them unless you're familiar with the underlying code base:</b>
<h5id="internal">For Internal Use Only</h5>
</p>
</div>
<ul>
<li>
<divclass="grid-100">
<code>Papa.Parser</code>
<table>
The core parsing component. Careful, it's fast.
<tr>
</li>
<th>Internal Property</th>
<li>
<th>Explanation</th>
<code>Papa.ParserHandle</code>
</tr>
<tr>
<td><code>Papa.Parser</code></td>
<td>
The core parsing component. Careful, it's fast and under rigorous test.
</td>
</tr>
<tr>
<td><code>Papa.ParserHandle</code></td>
<td>
A wrapper over the Parser which provides dynamic typing and header row support.
A wrapper over the Parser which provides dynamic typing and header row support.
</li>
</td>
<li>
</tr>
<code>Papa.NetworkStreamer</code>
<tr>
<td><code>Papa.NetworkStreamer</code></td>
<td>
Facilitates downloading and parsing files in chunks over the network with XMLHttpRequest.
Facilitates downloading and parsing files in chunks over the network with XMLHttpRequest.
</li>
</td>
<li>
</tr>
<code>Papa.FileStreamer</code>
<tr>
<td><code>Papa.FileStreamer</code></td>
<td>
Similar to NetworkStreamer, but for local files, and using the HTML5 FileReader.
Similar to NetworkStreamer, but for local files, and using the HTML5 FileReader.
</li>
</td>
</ul>
</tr>
</table>
</div>
</div>
<divclass="clear"></div>
<divclass="clear"></div>
</div>
</div>
</section>
</main>
</main>
<footer>
<footer>
<!--<div class="footer-top">
<h3>Make Your Papa Proud</h3>
<h4><ahref="https://github.com/mholt/PapaParse">Star</a> and <ahref="https://github.com/mholt/PapaParse/blob/gh-pages/resources/js/lovers.js">shout</a> if you love #PapaParse</h4>
There are a thousand CSV libraries for JavaScript. Papa is different. It's written with correctness and performance in mind. Papa is the first (and so far only) multi-threaded CSV parser that runs on web pages. It can parse files gigabytes in size without crashing the browser. It correctly handles malformed or edge-case CSV text. It can parse files on the local file system or download them over the Internet. Papa is boss.
There are a thousand CSV libraries for JavaScript. Papa is different. It's written with correctness and performance in mind. Papa is the first (and so far only) multi-threaded CSV parser that runs on web pages. It can parse files gigabytes in size without crashing the browser. It correctly handles malformed or edge-case CSV text. It can parse files on the local file system or download them over the Internet. Papa is boss.
</p>
</p>
@ -68,38 +72,43 @@
As of version 4, Papa Parse is the <ahref="http://jsperf.com/javascript-csv-parsers/4">fastest CSV parser</a> for the browser, whereas it used to be the slowest.
As of version 4, Papa Parse is the <ahref="http://jsperf.com/javascript-csv-parsers/4">fastest CSV parser</a> for the browser, whereas it used to be the slowest.
</p>
</p>
<h4id="nodejs">Can I use Papa Parse server-side with Node.js?</h4>
<h6id="nodejs">Can I use Papa Parse server-side with Node.js?</h6>
<p>
<p>
There's a fork of Papa called <ahref="https://github.com/Rich-Harris/BabyParse"target="_blank">Baby Parse</a> which is <ahref="https://www.npmjs.org/package/babyparse">published on npm</a>. Some features are unavailable (like worker threads and file opening/downloading), but the core parser is functional.
There's a fork of Papa called <ahref="https://github.com/Rich-Harris/BabyParse"target="_blank">Baby Parse</a> which is <ahref="https://www.npmjs.org/package/babyparse">published on npm</a>. Some features are unavailable (like worker threads and file opening/downloading), but the core parser is functional.
</p>
</p>
<h4id="dependencies">Does Papa Parse have any dependencies?</h4>
<h6id="dependencies">Does Papa Parse have any dependencies?</h6>
<p>
<p>
No. Papa Parse has no dependencies. If jQuery is present, however, it plugs in to make it easier to select files from the DOM.
No. Papa Parse has no dependencies. If jQuery is present, however, it plugs in to make it easier to select files from the DOM.
</p>
</p>
<h4id="browsers">Which browsers is it compatible with?</h4>
<h6id="browsers">Which browsers is it compatible with?</h6>
<p>
<p>
All modern, competent browsers should support all of the features. However, as usual, use IE at your own risk. It looks like IE 10+ and Safari 6+ should support all the features. Firefox and Chrome should work with all features back to versions 3 and 4. Opera 11 and up should be fine. If you really need to use Papa in old IE or Opera, then keep the fancy features off and you may be in luck.
All modern, competent browsers should support all of the features. However, as usual, use IE at your own risk. It looks like IE 10+ and Safari 6+ should support all the features. Firefox and Chrome should work with all features back to versions 3 and 4. Opera 11 and up should be fine. If you really need to use Papa in old IE or Opera, then keep the fancy features off and you may be in luck.
</p>
</p>
<h4id="async">Can Papa Parse be loaded asynchronously (after the page loads)?</h4>
<h6id="async">Can Papa Parse be loaded asynchronously (after the page loads)?</h6>
<p>
Yes. But if you want to use Web Workers, you'll need to specify the relative path to Papa Parse. To do this, set <ahref="/docs#readonly">Papa.SCRIPT_PATH</a> to the relative path of the Papa Parse file. In synchronous loading, this is automatically detected.
</p>
<h6id="combine">Can I build Papa Parse into the same file with other JS dependencies?</h6>
<p>
<p>
Not without some minor modifications. When Papa Parse loads, it has to obtain the script's relative path in order to facilitate worker threads. If you don't need this feature, Papa can be loaded asynchronously by removing or commenting out a couple of lines. If you do need workers, you can just hardcode the script's path. See <a href="https://github.com/mholt/PapaParse/issues/69#issuecomment-49886575">issue 69</a> and <a href="https://github.com/mholt/PapaParse/issues/87">issue 87</a> for more information.
Yes, but then don't use the Web Worker feature unless your other dependencies are battle-hardened for worker threads.
</p>
</p>
<h4id="open-source">Is it open source? (Can I contribute something?)</h4>
<h6id="open-source">Is it open source? (Can I contribute something?)</h6>
<p>
<p>
Yes, please! I don't want to do this all by myself. Head over to the <ahref="https://github.com/mholt/PapaParse">GitHub project page</a> and hack away. If you're making a significant change, open an issue first so we can talk about it.
Yes, please! I don't want to do this all by myself. Head over to the <ahref="https://github.com/mholt/PapaParse">GitHub project page</a> and hack away. If you're making a significant change, open an issue first so we can talk about it.
</p>
</p>
<h4id="fast-mode">Why wouldn't I always enable fast mode?</h4>
<h6id="fast-mode">Why wouldn't I always enable fast mode?</h6>
<p>
<p>
Fast mode makes <ahref="http://jsperf.com/javascript-csv-parsers/3">Papa Parse screaming fast</a>, but you wouldn't want to use it if there are (or may be) quoted fields in your input. Fast mode is fast because it makes one major assumption: no quoted fields. But if you know that your input has no quotes, turn that sucker on. With fast mode on, 1 GB files can be parsed in about 20 seconds.
Fast mode makes <ahref="http://jsperf.com/javascript-csv-parsers/3">Papa Parse screaming fast</a>, but you wouldn't want to use it if there are (or may be) quoted fields in your input. Fast mode is fast because it makes one major assumption: no quoted fields. But if you know that your input has no quotes, turn that sucker on. With fast mode on, 1 GB files can be parsed in about 20 seconds.
</p>
</p>
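<p>If the whole input is available as a string, one way to decide whether fast mode is safe is to look for quote characters first. This is only a sketch: <code>canUseFastMode</code> is a hypothetical helper, not a Papa Parse function, and it cannot be applied to streamed files whose contents are not known up front.</p>

```javascript
// Hypothetical helper: fast mode assumes there are no quoted fields,
// so any double quote in the input rules it out.
function canUseFastMode(text) {
  return text.indexOf('"') === -1;
}

var unquotedInput = "a,b\n1,2";
var quotedInput = 'a,b\n"1,5",2';

// The resulting objects could be passed as the parse config.
var configA = { fastMode: canUseFastMode(unquotedInput) };
var configB = { fastMode: canUseFastMode(quotedInput) };
```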
@ -112,28 +121,26 @@
<br><br>
<h4id="streaming">Streaming</h4>
<h6>Can Papa load and parse huge files?</h6>
<hr>
<h3id="streaming">Streaming</h3>
<h4>Can Papa load and parse huge files?</h4>
<p>
<p>
Yes. Parsing huge text files is facilitated by <i>streaming</i>, where the file is loaded a little bit at a time, parsed, and the results are sent to your <ahref="/docs#config">step</a> callback function, row-by-row.
Yes. Parsing huge text files is facilitated by <i>streaming</i>, where the file is loaded a little bit at a time, parsed, and the results are sent to your <ahref="/docs#config">step</a> callback function, row-by-row.
</p>
</p>
<h4>How do I stream my input?</h4>
<h6>How do I stream my input?</h6>
<p>
<p>
Just specify a <ahref="/docs#config">step</a> callback function. Results will <i>not</i> be available after parsing is finished, however. You have to inspect the results one row at a time.
Just specify a <ahref="/docs#config">step</a> callback function. Results will <i>not</i> be available after parsing is finished, however. You have to inspect the results one row at a time.
</p>
</p>
<h4>What is a stream and when should I stream files?</h4>
<h6>What is a stream and when should I stream files?</h6>
<p>
<p>
A stream is a unique data structure which, given infinite time, gives you infinite space.
A stream is a unique data structure which, given infinite time, gives you infinite space.
So if you're short on memory (as browsers often are), use a stream.
So if you're short on memory (as browsers often are), use a stream.
</p>
</p>
<h4>Wait, does that mean streaming takes more time?</h4>
<h6>Wait, does that mean streaming takes more time?</h6>
<p>
<p>
Yes and no. Typically, when we gain speed, we pay with space. The opposite is true, too. Streaming uses significantly less memory with large inputs, but since the reading happens in chunks and results are processed at each row instead of at the very end, yes, it can be slower.
Yes and no. Typically, when we gain speed, we pay with space. The opposite is true, too. Streaming uses significantly less memory with large inputs, but since the reading happens in chunks and results are processed at each row instead of at the very end, yes, it can be slower.
</p>
</p>
@ -144,26 +151,26 @@
So unless your clients have <ahref="http://google.com/fiber">a fiber line</a> and you have a scalable cloud application, local parsing by streaming is nearly guaranteed to be faster.
So unless your clients have <ahref="http://google.com/fiber">a fiber line</a> and you have a scalable cloud application, local parsing by streaming is nearly guaranteed to be faster.
</p>
</p>
<h4>How do I get all the results together after streaming?</h4>
<h6>How do I get all the results together after streaming?</h6>
<p>
<p>
You don't. Unless you assemble it manually. And really, don't do that... it defeats the purpose of using a stream. Just take the parts you need as they come through.
You don't. Unless you assemble it manually. And really, don't do that... it defeats the purpose of using a stream. Just take the parts you need as they come through.
</p>
</p>
<h4>Does Papa use a true stream?</h4>
<h6>Does Papa use a true stream?</h6>
<p>
<p>
Papa uses HTML5's FileReader API which uses a stream. FileReader doesn't technically allow us to hook into the underlying stream, but it does let us load the file in pieces. But fortunately you don't have to worry about that; it's all taken care of for you. Just take the results one row at a time.
Papa uses HTML5's FileReader API which uses a stream. FileReader doesn't technically allow us to hook into the underlying stream, but it does let us load the file in pieces. But fortunately you don't have to worry about that; it's all taken care of for you. Just take the results one row at a time.
</p>
</p>
<h4>Can I stream files over a network or the Internet?</h4>
<h6>Can I stream files over a network or the Internet?</h6>
<p>
<p>
Yes, Papa Parse supports this. It will download a file in pieces using HTTP's standard Range header, then pass the parsed results to your step function just like a local file. However, these requests may not work cross-origin (different domain/hostname), depending on the server's configuration.
Yes, Papa Parse supports this. It will download a file in pieces using HTTP's standard Range header, then pass the parsed results to your step function just like a local file. However, these requests may not work cross-origin (different domain/hostname), depending on the server's configuration.
</p>
</p>
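<p>The piecewise download described above can be pictured as a series of Range request headers. The sketch below only computes the header values. <code>rangeHeaders</code> is a hypothetical helper, and it assumes the total size is known in advance, which a real client typically learns from the server's response.</p>

```javascript
// Sketch: the byte ranges a chunked download could request.
// Assumes totalBytes is already known; this is for illustration only.
function rangeHeaders(totalBytes, chunkSize) {
  var headers = [];
  for (var start = 0; start < totalBytes; start += chunkSize) {
    // Range ends are inclusive, hence the "- 1".
    var end = Math.min(start + chunkSize, totalBytes) - 1;
    headers.push("bytes=" + start + "-" + end);
  }
  return headers;
}

// A 25-byte resource fetched in 10-byte pieces.
var ranges = rangeHeaders(25, 10);
```

<p>Each value would be sent as the <code>Range</code> header of one request, and the last range is shorter when the size is not a multiple of the chunk size.</p>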
<p>
Streaming remote files also requires the Content-Range header in the server's response. Most production-ready servers support this header, but Python's SimpleHTTPServer does not. If you need a quick and easy server, <a href="https://github.com/rif/spark">spark</a> will do the trick: <code>$ ./spark</code>
</p>
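<p>
As a rough sketch of the remote case: the helper below assumes the Papa Parse object is passed in as <code>papa</code> (in a page this would be the global <code>Papa</code>) and relies on the <code>download: true</code> config option, which comes from the Papa Parse docs rather than from this FAQ. The names <code>streamRemoteCsv</code>, <code>onRow</code>, and <code>onDone</code> are made up for illustration.
</p>

```javascript
// Hypothetical helper: stream a remote CSV file one row at a time.
// `papa` is the Papa Parse object, passed in rather than read from a global.
function streamRemoteCsv(papa, url, onRow, onDone) {
	papa.parse(url, {
		download: true, // assumption: tells Papa the string is a URL to fetch
		step: function (results) {
			onRow(results.data); // parsed data for the current row
		},
		complete: function () {
			onDone(); // runs once the whole file has been streamed
		}
	});
}
```

<p>
In a page, a call might look like <code>streamRemoteCsv(Papa, "/data/big.csv", handleRow, finish)</code>; the URL and callback names are placeholders.
</p>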
<h4>Can I pause and resume parsing?</h4>
<p>
Yes, as long as you are streaming and not using a worker. Your <a href="/docs#step">step callback</a> is passed a ParserHandle, which has pause, resume, and abort functions.
</p>
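<p>
A minimal sketch of pausing, assuming the handle arrives as the second argument of the step callback and that <code>papa</code> is the Papa Parse object passed in by the caller. <code>parseOneRowAtATime</code> and <code>processRow</code> are hypothetical names; <code>processRow</code> is assumed to call its <code>done</code> argument later, for example after a network request finishes.
</p>

```javascript
// Hypothetical helper: hold the parser while each row is handled asynchronously.
// `processRow(data, done)` is assumed to call `done` once it has finished.
function parseOneRowAtATime(papa, input, processRow) {
	papa.parse(input, {
		// No `worker: true` here: pausing is not available in a worker.
		step: function (results, parser) {
			parser.pause(); // stop reading until this row is dealt with
			processRow(results.data, function () {
				parser.resume(); // carry on with the next row
			});
		}
	});
}
```

<p>
Calling <code>parser.abort()</code> instead of <code>parser.resume()</code> would stop the parse early.
</p>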
<hr>
<h3 id="workers">Multi-threading (Workers)</h3>
<h4>What is a web worker? Why use one?</h4>
<p>
<a href="https://developer.mozilla.org/en-US/docs/Web/API/Worker">HTML5 Web Workers</a> facilitate basic multi-threading in the browser. This means that a web page can spawn a new thread in the operating system that runs JavaScript code. This is highly beneficial for long-running scripts that would otherwise lock up the web page.
</p>
<h4>How do I use a worker?</h4>
<p>
Just specify <code>worker: true</code> in your <a href="/docs#config">config</a>. You'll also need to provide a <code>complete</code> callback (unless you're streaming) to get the results, because using a worker makes the parse function asynchronous.
</p>
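<p>
Roughly, that could look like the sketch below. It assumes the Papa Parse object is handed in as <code>papa</code> and that the <code>complete</code> callback receives the results object as its first argument; <code>parseInWorker</code> and <code>onResults</code> are illustrative names only.
</p>

```javascript
// Hypothetical helper: parse a CSV string off the main thread.
function parseInWorker(papa, csvString, onResults) {
	papa.parse(csvString, {
		worker: true, // run the parse in a worker thread
		complete: function (results) {
			// With a worker the call is asynchronous, so the results
			// arrive here rather than as a return value.
			onResults(results);
		}
	});
}
```

<p>
In a page, <code>parseInWorker(Papa, csvText, function (results) { console.log(results); })</code> would log the results once the worker has finished.
</p>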
<h4>When should I use a worker?</h4>
<p>
That's up to you. The most typical reason to use a web worker is that your web page becomes unresponsive during parsing: it freezes, you can't click things, or scrolling becomes choppy. When that happens, some browsers (like Firefox) will warn the user that a script has become unresponsive or is taking a long time (even if it's working properly). If this happens to you or some of your users, consider using a web worker, at least for large inputs.
</p>
<p>
However, read the next answer for more info. Using workers has performance implications (both good and bad).
</p>
<h4>What are the performance implications of using a worker thread?</h4>
<p>
Using a worker will be a little slower. In JavaScript, threads don't share memory. That's really annoying because sharing memory is the primary reason for multi-threading. As such, all parse results in a worker thread need to be <i>copied</i> to the main thread. And if you're parsing a string in a worker thread, that string also needs to be copied into the worker in the first place. (Files will be opened or downloaded by the worker itself, so the input doesn't need to be copied from the main thread in those cases.)
</p>
@ -209,26 +213,57 @@
</p>
</p>
<h4>Can I stream and use a worker at the same time?</h4>
<p>
Yup. If the input is too large to fit in memory (or large enough to crash the browser), streaming is <i>always</i> the answer, even in a worker thread. Workers keep the page responsive. Streaming lets the input fit in memory. Use both if you need to.
</p>
</div>
</div>
</div>
</div>
</main>
</main>
<br><br><br>
<footer>
<footer>
<!--<div class="footer-top">
<h3>Make Your Papa Proud</h3>
<h4><a href="https://github.com/mholt/PapaParse">Star</a> and <a href="https://github.com/mholt/PapaParse/blob/gh-pages/resources/js/lovers.js">shout</a> if you love #PapaParse</h4>
<i class="fa fa-download"></i> Get Papa Parse on GitHub
<p class="lover">
</a>
<a href="https://smartystreets.com">SmartyStreets</a> verifies addresses, many of which are in CSV files. Papa Parse can process huge files in the browser. <i>"We rapidly built an awesome client-side file processor with Papa Parse."</i>
<a href="/demo" class="button red">
</p>
<i class="fa fa-bolt"></i> Try the demo
</a>
</div>
</div>
<div class="grid-33">
<p class="lover">
<a href="http://jannah.github.io/MetaReader/">MetaReader</a> helps you see your data from a meta level before you start detailed analysis. <i>"Papa Parse made it very easy to load and ready user CSV files in the browser on the client side."</i>
</p>
</div>
<div class="grid-33">
<p class="lover">
<a href="https://github.com/JamesJansson/EpiML">EpiML</a> is an agent-based mathematical model for the web, still in its early stages of development. <i>"Papa makes it so easy to use CSV, which is good for scientists."</i>
</p>
</div>
<div class="clear"></div>
<div class="clear"></div>
<div class="grid-100 text-center">
<br>
<b><a href="https://github.com/mholt/PapaParse/blob/gh-pages/resources/js/lovers.js" class="add-lover-link subheader"><i class="fa fa-plus-square"></i> Add your link (it's free)</a></b>
</div>
</div>
</section>
<section id="parse">
<div class="grid-container narrow-grid">
<div class="grid-100">
<h4>CSV Parsing</h4>
<h5>"Isn't parsing CSV just <code>String.split(',')</code>?"</h5>
<div class="grid-40 suffix-5">
<p>Heavens, no. Papa does it right. Just pass in the CSV string with an optional <a href="/docs#config">configuration</a>.</p>
That's okay. Papa will scan the first few rows of input to find the right delimiter for you. You can also set the delimiting character manually. Either way, the delimiter used is returned with every result set.
<h5>"Great, but I have a <i>file</i> to parse."</h5>
<p>Then give Papa a <a href="https://developer.mozilla.org/en-US/docs/Web/API/File">File</a> instead of a string. Since file parsing is asynchronous, don't forget a callback.</p>
<p>That's what streaming is for. Specify a <code>step</code> callback to receive the results row-by-row. This way, you won't load the whole file into memory and crash the browser.</p>
<p>That happens when a long-running script is executing in the same thread as the page. Use a <a href="https://developer.mozilla.org/en-US/docs/Web/API/Worker">Worker</a> thread by specifying <code>worker: true</code>. It may take slightly longer, but your page will stay responsive.</p>
</p>
</div>
<div class="grid-55">
<pre><code class="language-javascript">Papa.parse(bigFile, {
	worker: true,
	step: function(row) {
		console.log("Row:", row.data);
	},
	complete: function() {
		console.log("All done!");
	}
});</code></pre>
</div>
</div>
<div class="clear"></div>
</div>
<hr>
</section>
<div class="grid-40 suffix-5">
<div class="note" id="header">Header Rows</div>
<section id="header">
<h4>"Great! Now I want data keyed by field name."</h4>
<div class="grid-container narrow-grid">
<p>
<div class="grid-100">
You can tell Papa that there is a header row.
<h4>Header Row</h4>
</p>
<h5>"Great! Now I want data keyed by field name."</h5>
</div>
<div class="grid-55">
<p>If you tell Papa there is a header row, each row will be organized by field name instead of index.</p>
<code class="block"><span class="comment">// Key data by field name instead of index/position</span>
<pre><code class="language-javascript">// Key data by field name instead of index/position
<h4>"Hey, these numbers are all parsed as strings."</h4>
<div class="grid-container narrow-grid">
<p>
<div class="grid-100">
Everything is parsed as strings. If you need the convenience, you can have numeric and boolean data automatically converted to number and boolean types.
<h4>Type Conversion</h4>
</p>
<h5>"Hey, these numbers are parsed as strings."</h5>
</div>
<div class="grid-55">
<p><i>Everything</i> is parsed as strings. If you want numbers and booleans, you can enable dynamic typing to do the conversion for you.</p>
<pre><code class="language-javascript">// All parsed data is normally returned as a string.
// Dynamic typing converts numbers to numbers
// and booleans to booleans.
var results = Papa.parse(csv, {
	dynamicTyping: true
});</code></pre>
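<p>
To make the idea concrete, here is a small sketch of what this kind of conversion amounts to. It is an illustration only, not Papa's implementation, and the rules it applies (exact <code>"true"</code>/<code>"false"</code> and plain decimal numbers) are assumptions chosen for the example. <code>convertField</code> and <code>convertRow</code> are hypothetical names.
</p>

```javascript
// Illustration only: NOT Papa's implementation or its exact rules.
function convertField(value) {
	if (value === "true") return true;
	if (value === "false") return false;
	// Plain decimal numbers such as "42" or "-3.5"
	if (/^-?\d+(\.\d+)?$/.test(value)) return parseFloat(value);
	return value; // anything else stays a string
}

// Apply the conversion to every field in one parsed row.
function convertRow(row) {
	return row.map(convertField);
}
```

<p>
With these rules, <code>convertRow(["7", "true", "x"])</code> gives <code>[7, true, "x"]</code>.
</p>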
</div>
</div>
<div class="clear"></div>
</div>
<hr>
</section>
<div class="grid-40 suffix-5">
<div class="note" id="comments">Comments</div>
<h4>"I forgot to mention: my CSV files have comments in them."</h4>
<section id="comments">
<p>
<div class="grid-container narrow-grid">
Okay, first off: that's really weird. But you can skip those lines... just specify the comment character.
<div class="grid-100">
</p>
<h4>Comments</h4>
</div>
<h5>"I forgot to mention: my CSV files have comments in them."</h5>
<div class="grid-55">
<p>Okay, first off: that's really weird. But fortunately, you can skip those lines... just specify the comment string.</p>
<pre><code class="language-javascript">// Mostly found in academia, some CSV files
// may have commented lines in them
var results = Papa.parse(csv, {
	comments: "#"
});</code></pre>
</div>
</div>
<div class="clear"></div>
</div>
<hr>
</section>
<div class="grid-40 suffix-5">
<div class="note" id="errors">Error handling</div>
<section id="errors">
<h4>"I'm getting tired, are we done—aw, shoot. Errors."</h4>
<div class="grid-container narrow-grid">
<p>
<div class="grid-100">
Yeah, almost done. Fortunately, Papa handles errors pretty well. The <a href="http://tools.ietf.org/html/rfc4180">CSV standard</a> is somewhat <strike>loose</strike> ambiguous, so Papa tries to consider the edge cases. For example, mismatched fields aren't always the end of the world.
<h4>Error Handling</h4>
</p>
<h5>"Aw, shoot. Errors."</h5>
</div>
<div class="grid-55">
<p>Papa handles errors pretty well. The <a href="http://tools.ietf.org/html/rfc4180">CSV standard</a> is somewhat <strike>loose</strike> ambiguous, so Papa is designed for edge cases. For example, mismatched fields won't break parsing.</p>
<pre><code class="language-javascript">// Example error:
{
	type: "FieldMismatch",
	code: "TooManyFields",
	message: "Expected 3 fields, but parsed 4",
	row: 1
}</code></pre>
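<p>
One way to act on errors like the example above is to collect the rows that reported them. The sketch below assumes the parse results carry an <code>errors</code> array of such objects (that property name is taken from the Papa Parse docs, not from this page); <code>rowsWithErrors</code> is a hypothetical helper.
</p>

```javascript
// Hypothetical helper: list the row indexes that reported at least one error.
// Assumes `results.errors` is an array of objects shaped like the example error.
function rowsWithErrors(results) {
	var seen = {};
	var rows = [];
	results.errors.forEach(function (err) {
		// Some errors may not be tied to a row; skip those.
		if (typeof err.row === "number" && !seen[err.row]) {
			seen[err.row] = true;
			rows.push(err.row);
		}
	});
	return rows;
}
```

<p>
The returned indexes can then be used to skip or flag those rows instead of discarding the whole parse.
</p>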
</div>
</div>
<div class="clear"></div>
</div>
<hr>
</section>
<div class="grid-40 suffix-5">
<div class="note" id="jquery">jQuery Plugin</div>
<section id="jquery">
<h4>"Can I use Papa with jQuery?"</h4>
<div class="grid-container narrow-grid">
<p>
<div class="grid-100">
Sure, but it's not required. You can use jQuery to select file input elements and then parse their files. Papa exposes its file parsing API as a jQuery plugin only when jQuery is defined. Papa Parse has <b>no dependencies</b>.
<h4>jQuery Plugin</h4>
</p>
<h5>"Can I use Papa with jQuery?"</h5>
</div>
<div class="grid-55">
<p>Sure, but it's not required. You can use jQuery to select file input elements and then parse their files. Papa exposes its file parsing API as a jQuery plugin only when jQuery is defined. Papa Parse has <b>no dependencies</b>.</p>
<i class="fa fa-download"></i> Get Papa Parse on GitHub
<i class="fa fa-github"></i> GitHub
</a>
</a>
<a href="/demo" class="button red">
<a href="/demo" class="button red">
<i class="fa fa-bolt"></i> Try the demo
<i class="fa fa-magic"></i> Demo
</a>
<a href="/docs" class="button gray">
<i class="fa fa-book"></i> Documentation
</a>
</a>
</div>
</div>
<div class="clear"></div>
</section>
</div>
</main>
</main>
<footer>
<footer>
<!--<div class="footer-top">
<h3>Make Your Papa Proud</h3>
<h4><a href="https://github.com/mholt/PapaParse">Star</a> and <a href="https://github.com/mholt/PapaParse/blob/gh-pages/resources/js/lovers.js">shout</a> if you love #PapaParse</h4>
description:"verifies addresses, many of which are submitted in CSV files. Papa Parse can process files with over a million addresses in the browser.",
quote:"Because of Papa Parse, we rapidly built an awesome client-side list processing service."
},
{
link:"http://jannah.github.io/MetaReader",
name:"MetaReader",
description:"helps you see your data from a meta perspective before you start detailed analysis.",
quote:"Papa Parse made it very easy to load and ready user CSV files in the browser on the client side."
},
{
link:"https://github.com/JamesJansson/EpiML",
name:"EpiML",
description:"is an agent-based mathematical model for the web, still in its early stages of development.",
quote:"Papa makes it so easy to use CSV, which is good for scientists."
},
{
link:"https://wikipedia.org",
name:"Wikipedia",
description:"uses Papa Parse in VisualEditor to help article editors effortlessly build data tables from text files."