
Working with Datasets from JDBC Data Sources (and PostgreSQL)

Start spark-shell with the JDBC driver for the database you want to use. In our case, it is the PostgreSQL JDBC Driver.

Note
Download the jar for the PostgreSQL JDBC Driver 42.1.1 directly from the Maven repository.
Tip

Execute ./bin/spark-shell --packages org.postgresql:postgresql:42.1.1 to have spark-shell itself download the jar into the ~/.ivy2/jars directory.

The full path to the driver jar is then similar to /Users/jacek/.ivy2/jars/org.postgresql_postgresql-42.1.1.jar.
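The file name in that path follows the pattern <groupId>_<artifactId>-<version>.jar, which can be read off the example above. Below is a minimal Scala sketch that rebuilds the path for the current user. It assumes the default ~/.ivy2 location, a jar without a classifier, and sys.props("user.home") as the home directory. ivyJarPath is a hypothetical helper, not part of Spark.

```scala
import java.nio.file.{Path, Paths}

// Hypothetical helper: rebuilds the location where spark-shell --packages
// copies a resolved Maven artifact, following the
// <groupId>_<artifactId>-<version>.jar naming seen in the example path.
// Assumes the default Ivy home (~/.ivy2) and an artifact without a classifier.
def ivyJarPath(home: String, groupId: String, artifactId: String, version: String): Path =
  Paths.get(home, ".ivy2", "jars", s"${groupId}_$artifactId-$version.jar")

// Candidate value for --driver-class-path for PostgreSQL JDBC Driver 42.1.1.
val postgresJar = ivyJarPath(sys.props("user.home"), "org.postgresql", "postgresql", "42.1.1")
println(postgresJar)
```

Computing the path avoids typos when the same jar has to be passed to --driver-class-path later.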

You should see the Ivy dependency resolution log while spark-shell downloads the driver.

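Start ./bin/spark-shell with the --driver-class-path command-line option set to the driver jar, e.g. ./bin/spark-shell --driver-class-path /Users/jacek/.ivy2/jars/org.postgresql_postgresql-42.1.1.jar.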

This puts the driver jar on the classpath of the spark-shell JVM, so Spark can find the driver when it accesses PostgreSQL over JDBC.

Use the jdbc data source (spark.read.format("jdbc")) to access the projects table in the sparkdb database.

Note
Use the user and password options to specify the credentials if needed.
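Below is a minimal Scala sketch of such a read. It assumes PostgreSQL listens on localhost at the default port 5432, so that the URL jdbc:postgresql:sparkdb resolves to the sparkdb database. The option names url, dbtable, user and password are standard options of Spark's jdbc data source. The helper names projectsOptions and readProjects are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Options for Spark's jdbc data source. jdbc:postgresql:sparkdb targets the
// sparkdb database on localhost at the default port (an assumption here).
def projectsOptions(user: Option[String] = None,
                    password: Option[String] = None): Map[String, String] = {
  val base = Map("url" -> "jdbc:postgresql:sparkdb", "dbtable" -> "projects")
  // Credentials are added only when they are given.
  base ++ user.map(u => "user" -> u) ++ password.map(p => "password" -> p)
}

// Loads the projects table as a DataFrame. The PostgreSQL driver must already
// be on the classpath, e.g. via --driver-class-path.
def readProjects(spark: SparkSession, opts: Map[String, String]): DataFrame =
  spark.read.format("jdbc").options(opts).load()
```

In spark-shell, where spark is predefined, readProjects(spark, projectsOptions()).show() should print the table. Pass Some(...) values to projectsOptions when the server requires credentials. Keeping the options in a plain Map makes them easy to inspect before any connection is attempted.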

Troubleshooting

If something can go wrong, sooner or later it will. Here is a list of possible issues and their solutions.

java.sql.SQLException: No suitable driver

Ensure that the JDBC driver jar is on the CLASSPATH. Use --driver-class-path as described above (--packages or --jars do not work).
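The message comes from java.sql.DriverManager, which throws it when no registered driver accepts the JDBC URL. Below is a small diagnostic sketch to run in spark-shell before loading any data. hasSuitableDriver is a hypothetical helper built only on the JDK's DriverManager.getDriver.

```scala
import java.sql.{DriverManager, SQLException}

// Returns true when a JDBC driver registered with DriverManager accepts the
// URL. When none does, DriverManager.getDriver throws
// java.sql.SQLException: No suitable driver.
def hasSuitableDriver(url: String): Boolean =
  try {
    DriverManager.getDriver(url)
    true
  } catch {
    case _: SQLException => false
  }
```

With the jar passed through --driver-class-path, hasSuitableDriver("jdbc:postgresql:sparkdb") is expected to return true. A false result points back at the classpath setup.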

PostgreSQL Setup

Note
I’m on Mac OS X so YMMV (aka Your Mileage May Vary).

Follow the sections below to set up a properly configured PostgreSQL database.

Installation

Install PostgreSQL as described in…TK

Caution
This page serves as a cheatsheet for the author so he does not have to search the Internet for the installation steps.

Starting Database Server

Note
Consult section 17.3, Starting the Database Server, in the official PostgreSQL documentation.
Tip

Enable logging of all statements in PostgreSQL to see the queries the server executes.

Add log_statement = 'all' to /usr/local/var/postgres/postgresql.conf on Mac OS X with PostgreSQL installed using brew.

Start the database server using pg_ctl.

Alternatively, you can run the database server using postgres.

Create Database

Tip
Consult createdb in the official documentation.

Accessing Database

Use psql sparkdb to access the database.

Execute SELECT version() to find out the version of the database server you are connected to.

Use \h for help and \q to leave a session.
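To check the same connection from the JVM side, you can also send the SELECT version() query over JDBC, for example from spark-shell. Below is a sketch that uses only java.sql and assumes the PostgreSQL driver is on the classpath. serverVersion is a hypothetical helper, and the URL, user and password are placeholders to adapt.

```scala
import java.sql.{DriverManager, SQLException}

// Sends SELECT version() over JDBC. Right holds the server's version string;
// Left holds the error message, e.g. "No suitable driver found for ..." when
// the driver is missing, or a connection error when the server is down.
def serverVersion(url: String, user: String, password: String): Either[String, String] =
  try {
    val conn = DriverManager.getConnection(url, user, password)
    try {
      val rs = conn.createStatement().executeQuery("SELECT version()")
      if (rs.next()) Right(rs.getString(1)) else Left("no rows returned")
    } finally conn.close() // also closes the statement and result set
  } catch {
    case e: SQLException => Left(e.getMessage)
  }
```

Called with jdbc:postgresql:sparkdb and your credentials, it should return a Right with the same string psql shows. A Left that starts with No suitable driver means the driver jar is missing from the classpath.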

Creating Table

Create the projects table using the CREATE TABLE command.

Insert rows to populate the table with data.

Execute select * from projects; to verify that the projects table contains the rows you inserted.

Dropping Database

Tip
Consult dropdb in the official documentation.

Stopping Database Server
