Example Cassandra CSV Import w.Spark

I've been asked a few times over the last couple of months if there's a way to import CSV files into Cassandra without too much work. There are a number of different strategies, but I thought I would look at using Spark to do this. The Spark CSV library reads CSV files into Spark RDDs; combined with the DataStax Spark Cassandra Connector, this makes the import simple and requires very little code.

This example prompts you on the command line for the name of a CSV file and imports it into Cassandra. The example currently writes to a CQL table called products with the following columns:

id [int]	code [text]	description [text]	enabled [boolean]
1	FORKAN	Fork 'andles	true
2	FOOTPU	Foot Pumps	true
...	...	...	...

Obviously you can customise this to suit your needs.

Setup

To get started create the keyspace for the demo using cqlsh and the provided DDL script against your chosen Cassandra instance (local or remote):

cqlsh {cassandra-host} -f create-demo.cql

Once completed successfully you can now execute the program and import data.

NOTE: In order to build the project sbt must be installed.

Executing

To run from the command line, start sbt:

$ sbt

Then once in sbt run:

> clean

> compile

> run {csv.file=?} {spark.master=?} {cassandra.host=?} {spark.executor.memory=?} {processing.cores=?}

If you did not specify the path of the CSV file, you'll be prompted to enter it:

Please enter the path of the file you wish to import > _

An example file called sample-products.csv is included with the project. Enter its name at the prompt and press return.

The default parameters configured within the app are as follows:

Name	Default value
spark.master	local[2]
cassandra.host	127.0.0.1
executor.memory	512m
processing.cores	2
csv.file	N/A (prompted if not present)

You may override these on the command line as indicated above.
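As a rough illustration of how key=value overrides like these can be layered over defaults, here is a minimal sketch. It is not the project's actual parsing code: the object name ImportSettings and the parse method are hypothetical, and the keys are simply copied from the table above.

```scala
object ImportSettings {
  // Defaults copied from the table above; csv.file has no default.
  // Note: the run command spells the memory setting spark.executor.memory,
  // and which spelling the app actually reads is not confirmed here.
  val defaults: Map[String, String] = Map(
    "spark.master"     -> "local[2]",
    "cassandra.host"   -> "127.0.0.1",
    "executor.memory"  -> "512m",
    "processing.cores" -> "2"
  )

  // Split each "key=value" argument at the first '=' and layer the
  // resulting pairs over the defaults. Malformed arguments are ignored.
  def parse(args: Seq[String]): Map[String, String] = {
    val overrides = args.flatMap { arg =>
      arg.split("=", 2) match {
        case Array(key, value) if key.trim.nonEmpty => Some(key.trim -> value.trim)
        case _                                      => None
      }
    }
    defaults ++ overrides
  }
}
```

With this sketch, ImportSettings.parse(Seq("csv.file=sample-products.csv")) keeps every default and adds the file path; a caller would fall back to prompting when the csv.file key is absent.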

Results

Once the program has completed successfully you should see the following when performing a select:

Connected to SparkCluster at localhost:9160.
[cqlsh 4.1.1 | Cassandra 2.0.11.83 | DSE 4.6.0 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.
cqlsh> select * from demo.products;

 id | code  | description   | disabled
----+-------+---------------+----------
  5 | DEAPA |   Dead Parrot |    False
  1 | CHSGR | Cheese Grater |    False
  2 | CANOP |    Tin Opener |    False
  4 | FORCA |   Fork Handle |    False
  3 | TEAST |  Foot Pumps   |    False

(5 rows)

If this is not the result you see then check the application output (stdout) for errors.

Packaging

If you wish to run the application standalone without sbt then run the following sbt command:

sbt universal:packageBin

This will produce a zip file called spark-csv-cassandra-import-0.1.zip under the target/universal directory. Unzip this file to a directory of your choice and then, from that directory, execute the following:

Linux

bin/spark-csv-cassandra-import {cmd-line-args}

Windows (untested)

bin/spark-csv-cassandra-import.bat {cmd-line-args}

Again, if you do not specify the name of the CSV file you wish to import, you'll be prompted to enter it.

Example Template

Look at src/main/scala/com/datastax/demo/mapper/ProductCSVMapAndSave.scala as a starting point to implement your own CSV mapping class.
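To give a feel for what such a mapping involves, the sketch below converts one CSV line into a typed row. It does not reproduce ProductCSVMapAndSave.scala: the names ProductRow and ProductCsv are hypothetical, the field names follow the column list given at the top of this README, and the comma delimiter and column order are assumptions about the input file.

```scala
import scala.util.Try

// Hypothetical row type mirroring the products columns described earlier.
final case class ProductRow(id: Int, code: String, description: String, enabled: Boolean)

object ProductCsv {
  // Convert one comma-separated line into a ProductRow.
  // Returns None when the line has the wrong number of fields or a field
  // fails to convert (for example, a header line).
  // This is a naive split: quoted fields containing commas are not handled.
  def parseLine(line: String): Option[ProductRow] =
    line.split(",", -1).map(_.trim) match {
      case Array(id, code, description, enabled) =>
        Try(ProductRow(id.toInt, code, description, enabled.toBoolean)).toOption
      case _ => None
    }
}
```

In a Spark job, a function like parseLine could be applied to each line of the file, and the resulting RDD of case class instances written with the connector's saveToCassandra method, which maps case class fields to columns by name. Adjust the field names to match the columns of your own table.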

