Adapter: Databricks

Installation

harlequin-databricks depends on harlequin, so installing this package will also install Harlequin.

Using pip

To install this adapter into an activated virtual environment:

$ pip install harlequin-databricks

Using poetry

$ poetry add harlequin-databricks

Using pipx

If you do not already have Harlequin installed:

$ pipx install harlequin[databricks]

If you would like to add the Databricks adapter to an existing Harlequin installation:

$ pipx inject harlequin harlequin-databricks

As an Extra

Alternatively, you can install Harlequin with the databricks extra:

$ pip install harlequin[databricks]
$ poetry add harlequin[databricks]
$ pipx install harlequin[databricks]

Usage and Configuration

A minimal connection requires three options:

  • server-hostname
  • http-path
  • access-token

$ harlequin -a databricks --server-hostname my_databricks.cloud.databricks.com --http-path /sql/1.0/endpoints/1234567890abcdef --access-token dapi***

Authentication is also possible using a username and password (known as basic authentication):

$ harlequin -a databricks --server-hostname my_databricks.cloud.databricks.com --http-path /sql/1.0/endpoints/1234567890abcdef --username my_user --password my_pass

Or by using OAuth user-to-machine (U2M) authentication:

$ harlequin -a databricks --server-hostname my_databricks.cloud.databricks.com --http-path /sql/1.0/endpoints/1234567890abcdef --auth-type databricks-oauth
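
To avoid retyping connection details, and to keep the access token out of your shell history, the token-based command can be wrapped in a small shell function. This is a minimal sketch, not part of Harlequin or the adapter: the function name hq_databricks and the variables DATABRICKS_HOST, DATABRICKS_HTTP_PATH and DATABRICKS_TOKEN are names chosen for this example, and the function only forwards their values to the documented CLI options.

```shell
# Hypothetical helper: the function name and variable names below are
# assumptions made for this sketch, not something Harlequin defines.
hq_databricks() {
  # Refuse to launch if any of the three required values is missing.
  if [ -z "${DATABRICKS_HOST:-}" ] || [ -z "${DATABRICKS_HTTP_PATH:-}" ] || [ -z "${DATABRICKS_TOKEN:-}" ]; then
    echo "Set DATABRICKS_HOST, DATABRICKS_HTTP_PATH and DATABRICKS_TOKEN first" >&2
    return 1
  fi
  # Forward the values to the documented options; "$@" passes along any
  # extra flags given to the function.
  harlequin -a databricks \
    --server-hostname "$DATABRICKS_HOST" \
    --http-path "$DATABRICKS_HTTP_PATH" \
    --access-token "$DATABRICKS_TOKEN" \
    "$@"
}
```

With the three variables set in your shell profile or a secrets manager, running hq_databricks launches Harlequin with the same options as the access-token example above.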

For more details on command line options, run:

$ harlequin --help

Using Unity Catalog and experiencing slow legacy hive_metastore indexing?

Indexing legacy metastores is slow on Databricks because it requires a SQL call for every table in the legacy metastore to extract column metadata. This means refreshing Harlequin’s Data Catalog pane takes a long time for Databricks instances with lots of tables in legacy metastores like hive_metastore.

If your Databricks instance runs Unity Catalog, and you only want Unity Catalog assets listed in the Data Catalog pane, supply the --skip-legacy-indexing CLI flag when launching Harlequin.

With this flag, only Unity Catalog catalogs are indexed; legacy metastores will not appear in the Data Catalog pane.

Indexing Unity Catalog is fast: thanks to Information Schema, Harlequin needs to send only two SQL queries to Databricks.
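
If you always want legacy indexing skipped, the flag described above can be baked into a small shell function. This is only a sketch: hq_unity_only is a hypothetical name chosen here, not something shipped with Harlequin, and every other connection option is passed through unchanged.

```shell
# Hypothetical helper (not part of Harlequin): always launch the Databricks
# adapter with legacy metastore indexing skipped.
hq_unity_only() {
  # "$@" forwards the remaining options (hostname, HTTP path, auth) as given.
  harlequin -a databricks --skip-legacy-indexing "$@"
}
```

It would then be called with the usual connection options, for example:

$ hq_unity_only --server-hostname my_databricks.cloud.databricks.com --http-path /sql/1.0/endpoints/1234567890abcdef --auth-type databricks-oauth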

Issues and Contributing

To report an issue or contribute, head over to the alexmalins/harlequin-databricks repo on GitHub.