Generating Data-Intensive Insights

Activating & Accessing Personalized Data via OP+Databricks Delta Sharing

Written by Edgar Nuñez
Updated over a year ago

In collaboration with our partners at Databricks, Omega Point provides secure access to computational resources, such as Python notebooks, that generate data-intensive insights. For mutual customers with Databricks accounts, this access is delivered through Databricks Delta Sharing.

This guide will help you activate the data share, including how to retrieve your Databricks workspace ID and recipient ID to provide to Omega Point.

Prerequisites

  • Databricks Account: If you use Databricks, please have your Workspace ID ready.

  • Permissions: Ensure you have permission within your organization's Databricks account to accept data shares and to modify Unity Catalog settings in your Databricks workspace.

Steps to Activate and Access the Data Share

Available Data Shares

  1. Ask your Omega Point customer success manager for a list of available data shares. Each share is scoped to a specific 'release topic' that manages its own versions.

    For example, Omega Point's Thematic Beta Package is available as a data share, and will be separate from other topics.

Provide your Databricks information to Omega Point

  1. Work with your Omega Point customer success manager and provide them with your Databricks Workspace ID.

    You can find your Workspace ID in Databricks by navigating to Settings > Workspace settings. The Workspace ID will appear as a unique alphanumeric identifier.

  2. Look up your Databricks Recipient ID and provide this to your Omega Point customer success manager.
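As a cross-check on the Workspace ID from step 1: on many Databricks deployments the ID also appears in the browser address bar as the `o` query parameter. The sketch below extracts it from a copied URL. The example URL is hypothetical, and the `o=` convention is an assumption that may not hold for every cloud or deployment type, so treat the value shown in Workspace settings as authoritative.

```python
from urllib.parse import urlparse, parse_qs


def workspace_id_from_url(url):
    """Return the value of the `o` query parameter, or None if it is absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get("o")
    return values[0] if values else None


# Hypothetical workspace URL, for illustration only.
print(workspace_id_from_url("https://example.cloud.databricks.com/?o=1234567890123456"))
```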

Verify Data in Databricks Catalog

  1. After you provide your information to Omega Point (OP), your requested data share will be sent securely to your Databricks environment.

  2. In Databricks, navigate to your Data section and select Catalog.

    1. Your newly activated data share should appear under Shared Data.

  3. If the data does not appear immediately, allow a few moments for processing and refresh your catalog.
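The catalog check above can also be done from a notebook. This is a minimal sketch, assuming the share has already been mounted as a catalog in your metastore and that you know the catalog name (the placeholder below is hypothetical). It relies only on the `SHOW CATALOGS` SQL command and the `spark` session that Databricks notebooks predefine.

```python
def visible_catalogs(spark):
    """Return the names of the catalogs visible to the current user."""
    rows = spark.sql("SHOW CATALOGS").collect()
    # Each row has a single column holding the catalog name.
    return [row[0] for row in rows]


def share_is_visible(spark, catalog_name):
    """Return True if a catalog with the given name is visible."""
    return catalog_name in visible_catalogs(spark)


# In a Databricks notebook, `spark` is already defined:
# share_is_visible(spark, "<your_shared_catalog_name>")
```

If the function returns False shortly after activation, wait a moment and run it again, as with refreshing the catalog in the UI.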

Configuring Your Databricks Cluster

Once notebooks are received and cloned into your workspace, you may need to configure your Databricks cluster to access Omega Point's omegapoint-utils library, which is available via a Docker image. The steps below walk through setting up a new cluster.

Navigation

On the left-hand side, select the “Compute” option.

Within the “Compute” page, click the “Create Compute” button.

On this page you will create a new compute cluster. Use the configuration in the following section.

Recommended Minimal Configuration

  • General Settings

    • Cluster Name: <your_cluster_name>

    • Policy: Unrestricted

  • Access Mode

    • Mode: Single user

    • User: <user_name>

  • Performance

    • Databricks Runtime Version: 14.3 LTS (Scala 2.12, Spark 3.5.0)

    • Node Type: i3.xlarge (30.5 GB Memory, 4 Cores, 1 Driver)

      • This is the minimal recommended cluster size

      • Additional workers and more cores / RAM will improve runtimes (to an extent)

    • Enable Autoscaling Local Storage: Checked

    • Terminate After: 120 minutes of inactivity

      • Recommended to reduce billing for unused resources

    • Photon Acceleration: Not enabled

  • Advanced Options

    • Docker Configuration:

      • Use Your Own Docker Container: Checked

      • Docker Image URL: omegapointresearch/omegapoint-utils:latest

      • Authentication: Default

  • IAM Role Passthrough: Not enabled
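For teams that create clusters programmatically rather than through the UI, the settings above can be written as a cluster specification. The following is a sketch only: the field names follow the Databricks Clusters API as we understand it, the runtime key `14.3.x-scala2.12` is assumed to correspond to 14.3 LTS, and `num_workers=1` is an assumption because the list above does not state a worker count. Verify each field against your workspace before using it.

```python
import json


def build_cluster_spec(cluster_name, user_name, num_workers=1):
    """Build a cluster spec mirroring the recommended minimal configuration.

    num_workers is an assumption; the recommended settings do not specify it.
    """
    return {
        "cluster_name": cluster_name,
        "spark_version": "14.3.x-scala2.12",   # 14.3 LTS (Scala 2.12, Spark 3.5.0)
        "node_type_id": "i3.xlarge",
        "num_workers": num_workers,
        "data_security_mode": "SINGLE_USER",   # Access mode: Single user
        "single_user_name": user_name,
        "enable_elastic_disk": True,            # Autoscaling local storage
        "autotermination_minutes": 120,
        "runtime_engine": "STANDARD",          # Photon not enabled
        "docker_image": {
            # No authentication block: the image requires none.
            "url": "omegapointresearch/omegapoint-utils:latest",
        },
    }


print(json.dumps(build_cluster_spec("<your_cluster_name>", "<user_name>"), indent=2))
```

The printed JSON can be compared field by field with the cluster's JSON view in the Databricks UI to confirm the two match.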


Example Configuration

Notes

  • Mutual customers must run the provided notebook code on a Databricks cluster.

    • Confirm that the cluster is a new cluster that is enabled with Databricks Unity Catalog.

  • The cluster must be configured to use the omegapoint-utils Docker image.

    • The omegapoint-utils Docker image requires no authentication to use.

  • The user must provide a recipient ID to gain access to the Thematic Analysis Delta Sharing release topic (which contains sample notebooks).

Troubleshooting

I'm getting a ModuleNotFoundError. What can I do?

  • The cluster must be configured to use the omegapoint-utils Docker image.

    • The omegapoint-utils Docker image requires no authentication to use.
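To confirm whether the library is present on the attached cluster before running a full notebook, a quick check like the one below can help. It is a sketch that assumes the library's importable module name is `omegapoint_utils`; substitute the module name shown in your error message if it differs.

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module can be found on the current Python path."""
    return importlib.util.find_spec(module_name) is not None


# Assumed module name; adjust to match the name in your ModuleNotFoundError.
if not is_importable("omegapoint_utils"):
    print("omegapoint_utils not found: check the cluster's Docker image setting.")
```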

My cluster is not enabled with Unity Catalog. What should I do?

  • We recommend creating a new cluster using the settings above. A new cluster should be enabled with Unity Catalog, not the Hive metastore.

I'm getting an execution failure: "Failure starting repl. Try detaching and re-attaching the notebook"

  • The Databricks Runtime version should be 14.3 LTS (Scala 2.12, Spark 3.5.0).
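If a notebook does attach, the runtime version can be checked from a cell. This sketch assumes the cluster exposes its version through the `DATABRICKS_RUNTIME_VERSION` environment variable and that the value begins with the major and minor version (for example `14.3`). If the variable is missing, check the version on the cluster's configuration page instead.

```python
import os


def runtime_matches(expected_prefix="14.3", env=None):
    """Return True if the reported Databricks Runtime version starts with expected_prefix."""
    env = os.environ if env is None else env
    version = env.get("DATABRICKS_RUNTIME_VERSION")
    return version is not None and version.startswith(expected_prefix)


if not runtime_matches():
    print("Runtime is not 14.3: recreate the cluster with 14.3 LTS.")
```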

I'm getting a "failed to create catalog" error.

  • Work with your Databricks administrator to ensure you have the right permissions in your Databricks environment.
