LOAD CSV Cypher clause
The LOAD CSV
clause enables you to load and use data from a CSV file of your
choosing in a row-based manner, within a query. We support the Excel CSV dialect,
as it's the most commonly used one.
The syntax of the clause is:
LOAD CSV FROM <csv-file-path> ( WITH | NO ) HEADER [IGNORE BAD] [DELIMITER <delimiter-string>] [QUOTE <quote-string>] AS <variable-name>
<csv-file-path
is a string holding the path to the CSV file. There are no restrictions on where in your filesystem the file can be located, as long as the path is valid (i.e. the file exists).( WITH | NO ) HEADER
flag specifies whether the CSV file is to be parsed as though it has or hasn't got a header.IGNORE BAD
flag specifies whether rows containing errors should be ignored or not. If it's set, the parser attempts to return the first valid row from the CSV file. If it isn't set, an exception will be thrown on first invalid row encountered.DELIMITER <delimiter-string>
option enables you to specify the CSV delimiter character. If it isn't set, the default delimiter character,
is assumed.QUOTE <quote-string>
option enables you to specify the CSV quote character. If it isn't set, the default quote character"
is assumed.<variable-name>
is a symbolic name representing the variable to which the contents of the parsed row will be bound to, enabling access to the row contents later in the query.
The clause reads row by row from a CSV file, and binds the contents of the parsed row to the variable you specified.
It's important to note that the parser parses the values as strings. It's up to
the user to convert the parsed row values to the appropriate type. This can be
done using the built-in conversion functions such as ToInteger
, ToFloat
,
ToBoolean
etc. Consult the documentation on the available conversion functions.
Depending on how you set the HEADER
option (WITH
or NO
), a row will
be parsed as either a map or a list.
If the WITH HEADER
option is set, the very first line in the file will be
parsed as the header, and any remaining rows will be parsed as regular rows.
The value bound to the row variable will be a map of the form:
{ ( "header_field" : "row_value" )? ( , "header_field" : "row_value" )* }
To access a given field, you can use the property lookup syntax. Let's assume that the CSV file contents are as follows:
x|y|z
1|2|3
4|5|6
The following query will load row by row from the file, and create a new node for each row with properties based on the parsed row values:
LOAD CSV FROM "xyz.csv" WITH HEADER DELIMITER "|" AS row
CREATE (n:A {x: ToInteger(row.x), y: ToInteger(row.y), z: ToInteger(row.z)}) ;
If the NO HEADER
option is set, then each row is parsed as a list of values.
The contents of the row can be accessed using the list index syntax. Note that
in this mode, there are no restrictions on the number of values a row contains.
This isn't recommended, as the user must manually handle the varying number of
values in a row.
Let's assume that the CSV file contents are as follows:
1|2|3
4|5|6
The following query will load row by row from the file, and create a new node for each row with properties based on the parsed row values:
LOAD CSV FROM "xyz.csv" NO HEADER DELIMITER "|" AS row
CREATE (n:A {x: ToInteger(row[0]), y: ToInteger(row[1]), z: ToInteger(row[2])}) ;
The clause can't stand on its own, meaning there has to be at least one more clause in the query, in addition to it. In other words, the following query will throw an exception:
LOAD CSV FROM "file.csv" WITH HEADER AS row ;
On the other hand, the following query is valid:
MERGE (n:A) LOAD CSV FROM "file.csv" WITH HEADER AS row ;
Note that the row variable doesn't have to be used in any subsequent clause.
Also, it's important to note how the LOAD CSV
result exhaustion works. Let's
assume the following query:
MATCH (n)
LOAD CSV FROM "file.csv" WITH HEADER as row
SET n.p = row ;
Let's say the MATCH (n)
clause finds five nodes, whereas the "file.csv" has
only 2 rows. Then only the first two nodes returned by the MATCH (n)
will have
their properties set, using the two rows from the file.
Similarly, if the MATCH (n)
clause finds two nodes, whereas the "file.csv" has
five rows, the two nodes returned by MATCH (n)
will have their properties
set with values from the first two rows of the file.
In general, the clause that exhausts its results sooner will dictate how many
times the "loop" will be executed.
Finally, note that the LOAD CSV
clause can be used at most once per query.
Trying to use multiple LOAD CSV
clauses within a single query will throw an exception.
In other words, queries such as
LOAD CSV FROM "x.csv" WITH HEADER as x
LOAD CSV FROM "y.csv" WITH HEADER as y
CREATE (n:A { p1 : x, p2 : y });
are invalid.