
Azure Databricks is a powerful platform for big data analytics and machine learning, and Azure Data Lake Storage (ADLS) provides a highly scalable and secure data lake to back those workloads. In this blog post, we will explore how to connect Azure Databricks to Azure Data Lake Storage using a Shared Access Signature (SAS) token and how to list the contents of a container, specifically focusing on CSV file data.
Prerequisites
- An active Azure subscription.
- Azure Databricks workspace.
- Azure Data Lake Storage Gen2 account with a container.
Step 1: Generate a SAS Token for Azure Data Lake
- Navigate to your Azure Data Lake Storage account in the Azure portal.
- Select the container you want to connect to.
- Click the three-dot (…) menu next to the container and select ‘Generate SAS’.

- Set the required permissions, expiry time, and other parameters.
- Click ‘Generate SAS token and URL’.
- Copy the SAS token (the part after the ‘?’ in the generated URL).
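If you prefer to script this step instead of using the portal, a container-level SAS token can also be generated programmatically. The sketch below is only an illustration: it assumes the azure-storage-blob package is installed and that you have the storage account key, and the account name, container name, and key value are placeholders.

# Minimal sketch: generate a container-level SAS token with the Azure SDK for Python.
# "mystorageaccount", "mycontainer", and the account key are placeholders.
from datetime import datetime, timedelta, timezone
from azure.storage.blob import generate_container_sas, ContainerSasPermissions

sas_token = generate_container_sas(
    account_name="mystorageaccount",
    container_name="mycontainer",
    account_key="<account-key>",
    permission=ContainerSasPermissions(read=True, list=True),
    expiry=datetime.now(timezone.utc) + timedelta(hours=24),
)
print(sas_token)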
Step 2: Configure Azure Databricks
- Launch your Azure Databricks workspace.
- Create a new cluster or use an existing one.
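Optionally, before moving on, you can store the SAS token from Step 1 in a Databricks secret scope so the notebook never contains the raw token; Step 3 below reads it from such a scope. The sketch assumes the databricks-sdk package is installed and authenticated against your workspace, and the scope name ‘adls-sas’ and key name ‘sas-token’ are hypothetical.

# Minimal sketch: store the SAS token in a Databricks secret scope.
# Scope "adls-sas" and key "sas-token" are placeholder names.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
w.secrets.create_scope(scope="adls-sas")
w.secrets.put_secret(scope="adls-sas", key="sas-token", string_value="<your-sas-token>")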
Step 3: Connect to Azure Data Lake Storage Using SAS Token
To connect Azure Databricks to your Azure Data Lake Storage using a SAS token, you need to configure the SAS token and use it in your Databricks notebook.
- Open a new Databricks notebook.
- Set the SAS token and the storage account details as follows:
spark.conf.set("fs.azure.account.auth.type..dfs.core.windows.net", "SAS")
spark.conf.set("fs.azure.sas.token.provider.type..dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider")
spark.conf.set("fs.azure.sas.fixed.token..dfs.core.windows.net", "SAS_TOKEN")
- Replace <storage-account> with the Azure Storage account name.
- Replace <scope> with the Azure Databricks secret scope name.
- Replace <sas-token-key> with the name of the key containing the Azure storage SAS token.
Below is an example:

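As a minimal sketch, here is the same configuration with hypothetical values filled in; the storage account name ‘mystorageaccount’, scope ‘adls-sas’, and key ‘sas-token’ are placeholders.

# Minimal sketch: configure ABFS access to ADLS Gen2 with a fixed SAS token.
# "mystorageaccount", "adls-sas", and "sas-token" are placeholder names.
storage_account = "mystorageaccount"
sas_token = dbutils.secrets.get(scope="adls-sas", key="sas-token")

spark.conf.set(f"fs.azure.account.auth.type.{storage_account}.dfs.core.windows.net", "SAS")
spark.conf.set(f"fs.azure.sas.token.provider.type.{storage_account}.dfs.core.windows.net",
               "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider")
spark.conf.set(f"fs.azure.sas.fixed.token.{storage_account}.dfs.core.windows.net", sas_token)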
Step 4: List Contents of the Azure Data Lake Container

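With the configuration from Step 3 in place, you can list the container and read CSV files over the abfss:// path. The sketch below assumes a container named ‘mycontainer’ in the ‘mystorageaccount’ account and a CSV file at ‘data/sales.csv’; adjust these names to your environment.

# Minimal sketch: list the container contents and load a CSV file.
# "mycontainer", "mystorageaccount", and "data/sales.csv" are placeholder names.
container = "mycontainer"
storage_account = "mystorageaccount"
base_path = f"abfss://{container}@{storage_account}.dfs.core.windows.net/"

# List files and folders at the root of the container.
for file_info in dbutils.fs.ls(base_path):
    print(file_info.name, file_info.size)

# Read a CSV file from the container into a Spark DataFrame and preview it.
df = spark.read.option("header", "true").csv(base_path + "data/sales.csv")
df.show(5)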