[vc_row][vc_column][vc_column_text css=”” woodmart_inline=”no” text_larger=”no”]
Creating an external file format in T-SQL is essential when working with Azure SQL Data Warehouse and Azure Synapse Analytics. It enables users to define a structure for reading data from files stored in Azure Blob Storage or Azure Data Lake, which is particularly useful when working with large datasets in distributed environments. In this blog, we’ll walk through the steps to create an external file format in T-SQL and discuss its key components, prerequisites, and potential use cases.
Table of Contents
- 1. Introduction to External File Formats
- 2. Prerequisites
- 3. Syntax
- 4. Example: Creating an External File Format for a CSV File
1. Introduction to External File Formats
External file formats in T-SQL allow you to define the structure of files stored externally in services like Azure Blob Storage or Azure Data Lake. This structure guides SQL queries to understand how to read and parse the data in these files, whether they’re CSV, Parquet, ORC, or other file types. With Azure SQL Data Warehouse or Synapse Analytics, this capability enables efficient analysis on large datasets stored externally, without needing to move data directly into your database.
Why Use External File Formats?
- Data Accessibility: Read data directly from external storage without importing it.
- Resource Optimization: Minimize storage use and reduce costs by keeping data in cost-effective storage solutions.
- Performance: Gain the ability to perform large-scale queries and analytics on big data stored outside of SQL Database.
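To see where a file format fits once it exists, here is a minimal sketch of an external table that references one; all object names, columns, and the location path are placeholder assumptions:

```sql
-- Sketch: an external table tying a data source and file format together.
-- dbo.SalesExternal, MyBlobSource, and CsvFileFormat are placeholder names.
CREATE EXTERNAL TABLE dbo.SalesExternal (
    SaleId   INT,
    Amount   DECIMAL(10, 2),
    SaleDate DATE
)
WITH (
    LOCATION = '/sales/2024/',          -- folder or file path within the data source
    DATA_SOURCE = MyBlobSource,         -- an existing external data source
    FILE_FORMAT = CsvFileFormat         -- an existing external file format
);
```

Queries against dbo.SalesExternal then read the files in place, using the file format to parse them.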
<a name="prerequisites"></a>
2. Prerequisites
Before you can create an external file format in T-SQL, ensure that:
- You have access to Azure SQL Data Warehouse or Azure Synapse Analytics.
- The external data source (e.g., Azure Blob Storage or Azure Data Lake) is properly set up.
- You’ve created an External Data Source in T-SQL. This step defines the endpoint and credentials required to access the external storage.
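As a sketch of the third prerequisite, an external data source for Azure Blob Storage might be defined as follows; the credential name, SAS token, and storage URL are placeholders:

```sql
-- Placeholder SAS-based credential for the storage account.
CREATE DATABASE SCOPED CREDENTIAL BlobCredential
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
     SECRET = '<your-sas-token>';

-- External data source pointing at a blob container (placeholder URL).
CREATE EXTERNAL DATA SOURCE MyBlobSource
WITH (
    LOCATION = 'wasbs://mycontainer@mystorageaccount.blob.core.windows.net',
    CREDENTIAL = BlobCredential
);
```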
3. Syntax
Creating an external file format in T-SQL involves defining:
- The type of file (e.g., DELIMITEDTEXT, PARQUET, ORC).
- Specific formatting details, like delimiters or row terminators, depending on the file type.
Several arguments must be defined, and they vary by format type; the tabs below show the full syntax for each supported format.
[/vc_column_text][vc_tta_tabs alignment=”center” active_section=”1″ css=”.vc_custom_1730577325512{margin-top: -50px !important;}”][vc_tta_section title=”Delimited Text” tab_id=”1730574921644-eea7dd96-7447″][woodmart_text_block text_color_scheme=”dark” woodmart_css_id=”67268073e0df1″ woodmart_inline=”no” responsive_spacing=”eyJwYXJhbV90eXBlIjoid29vZG1hcnRfcmVzcG9uc2l2ZV9zcGFjaW5nIiwic2VsZWN0b3JfaWQiOiI2NzI2ODA3M2UwZGYxIiwic2hvcnRjb2RlIjoid29vZG1hcnRfdGV4dF9ibG9jayIsImRhdGEiOnsidGFibGV0Ijp7fSwibW9iaWxlIjp7fX19″ parallax_scroll=”no” wd_hide_on_desktop=”no” wd_hide_on_tablet=”no” wd_hide_on_mobile=”no”]-- Create an external file format for DELIMITED (CSV/TSV) files.
CREATE EXTERNAL FILE FORMAT file_format_name
WITH (
    FORMAT_TYPE = DELIMITEDTEXT
    [ , FORMAT_OPTIONS ( <format_options> [ ,...n ] ) ]
    [ , DATA_COMPRESSION = {
        'org.apache.hadoop.io.compress.GzipCodec'
      }
    ]);

<format_options> ::=
{
    FIELD_TERMINATOR = field_terminator
  | STRING_DELIMITER = string_delimiter
  | FIRST_ROW = integer -- Applies to: Azure Synapse Analytics and SQL Server 2022 and later versions
  | DATE_FORMAT = datetime_format
  | USE_TYPE_DEFAULT = { TRUE | FALSE }
  | ENCODING = {'UTF8' | 'UTF16'}
  | PARSER_VERSION = {'parser_version'}
}[/woodmart_text_block][/vc_tta_section][vc_tta_section title=”ORC” tab_id=”1730574921645-bed10e6a-04bf”][woodmart_text_block text_color_scheme=”dark” woodmart_css_id=”672680dd4d354″ woodmart_inline=”no” responsive_spacing=”eyJwYXJhbV90eXBlIjoid29vZG1hcnRfcmVzcG9uc2l2ZV9zcGFjaW5nIiwic2VsZWN0b3JfaWQiOiI2NzI2ODBkZDRkMzU0Iiwic2hvcnRjb2RlIjoid29vZG1hcnRfdGV4dF9ibG9jayIsImRhdGEiOnsidGFibGV0Ijp7fSwibW9iaWxlIjp7fX19″ parallax_scroll=”no” wd_hide_on_desktop=”no” wd_hide_on_tablet=”no” wd_hide_on_mobile=”no”]
--Create an external file format for ORC file.
CREATE EXTERNAL FILE FORMAT file_format_name
WITH (
    FORMAT_TYPE = ORC
    [ , DATA_COMPRESSION = {
        'org.apache.hadoop.io.compress.SnappyCodec'
      | 'org.apache.hadoop.io.compress.DefaultCodec' }
    ]);
[/woodmart_text_block][/vc_tta_section][vc_tta_section title=”RC” tab_id=”1730575109876-a727f15b-d647″][woodmart_text_block text_color_scheme=”dark” woodmart_css_id=”672680afaeaff” woodmart_inline=”no” responsive_spacing=”eyJwYXJhbV90eXBlIjoid29vZG1hcnRfcmVzcG9uc2l2ZV9zcGFjaW5nIiwic2VsZWN0b3JfaWQiOiI2NzI2ODBhZmFlYWZmIiwic2hvcnRjb2RlIjoid29vZG1hcnRfdGV4dF9ibG9jayIsImRhdGEiOnsidGFibGV0Ijp7fSwibW9iaWxlIjp7fX19″ parallax_scroll=”no” wd_hide_on_desktop=”no” wd_hide_on_tablet=”no” wd_hide_on_mobile=”no”]-- Create an external file format for RC files.
CREATE EXTERNAL FILE FORMAT file_format_name
WITH (
    FORMAT_TYPE = RCFILE,
    SERDE_METHOD = {
        'org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe'
      | 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
    }
    [ , DATA_COMPRESSION = 'org.apache.hadoop.io.compress.DefaultCodec' ]);[/woodmart_text_block][/vc_tta_section][vc_tta_section title=”JSON” tab_id=”1730575645153-e769e4b8-2281″][woodmart_text_block text_color_scheme=”dark” woodmart_css_id=”6726810b0aac1″ woodmart_inline=”no” responsive_spacing=”eyJwYXJhbV90eXBlIjoid29vZG1hcnRfcmVzcG9uc2l2ZV9zcGFjaW5nIiwic2VsZWN0b3JfaWQiOiI2NzI2ODEwYjBhYWMxIiwic2hvcnRjb2RlIjoid29vZG1hcnRfdGV4dF9ibG9jayIsImRhdGEiOnsidGFibGV0Ijp7fSwibW9iaWxlIjp7fX19″ parallax_scroll=”no” wd_hide_on_desktop=”no” wd_hide_on_tablet=”no” wd_hide_on_mobile=”no”]
-- Create an external file format for JSON files.
CREATE EXTERNAL FILE FORMAT file_format_name
WITH (
    FORMAT_TYPE = JSON
    [ , DATA_COMPRESSION = {
        'org.apache.hadoop.io.compress.SnappyCodec'
      | 'org.apache.hadoop.io.compress.GzipCodec'
      | 'org.apache.hadoop.io.compress.DefaultCodec' }
    ]);[/woodmart_text_block][/vc_tta_section][vc_tta_section title=”Parquet” tab_id=”1730575659929-95ffc0d4-9693″][woodmart_text_block text_color_scheme=”dark” woodmart_css_id=”6726813a6b35f″ woodmart_inline=”no” responsive_spacing=”eyJwYXJhbV90eXBlIjoid29vZG1hcnRfcmVzcG9uc2l2ZV9zcGFjaW5nIiwic2VsZWN0b3JfaWQiOiI2NzI2ODEzYTZiMzVmIiwic2hvcnRjb2RlIjoid29vZG1hcnRfdGV4dF9ibG9jayIsImRhdGEiOnsidGFibGV0Ijp7fSwibW9iaWxlIjp7fX19″ parallax_scroll=”no” wd_hide_on_desktop=”no” wd_hide_on_tablet=”no” wd_hide_on_mobile=”no”]
-- Create an external file format for PARQUET files.
CREATE EXTERNAL FILE FORMAT file_format_name
WITH (
    FORMAT_TYPE = PARQUET
    [ , DATA_COMPRESSION = {
        'org.apache.hadoop.io.compress.SnappyCodec'
      | 'org.apache.hadoop.io.compress.GzipCodec' }
    ]);[/woodmart_text_block][/vc_tta_section][vc_tta_section title=”Delta table” tab_id=”1730576792556-c7d1cdc9-2030″][woodmart_text_block text_color_scheme=”dark” woodmart_css_id=”672681d63687d″ woodmart_inline=”no” responsive_spacing=”eyJwYXJhbV90eXBlIjoid29vZG1hcnRfcmVzcG9uc2l2ZV9zcGFjaW5nIiwic2VsZWN0b3JfaWQiOiI2NzI2ODFkNjM2ODdkIiwic2hvcnRjb2RlIjoid29vZG1hcnRfdGV4dF9ibG9jayIsImRhdGEiOnsidGFibGV0Ijp7fSwibW9iaWxlIjp7fX19″ parallax_scroll=”no” wd_hide_on_desktop=”no” wd_hide_on_tablet=”no” wd_hide_on_mobile=”no”]
-- Create an external file format for Delta table files (serverless SQL pools in Azure Synapse Analytics and SQL Server 2022).
CREATE EXTERNAL FILE FORMAT file_format_name
WITH (
    FORMAT_TYPE = DELTA
);[/woodmart_text_block][/vc_tta_section][/vc_tta_tabs][/vc_column][/vc_row][vc_row][vc_column][vc_column_text css=”” woodmart_inline=”no” text_larger=”no”]
Arguments
- file_format_name: The name of the external file format being created.
- FORMAT_TYPE: Specifies the file type (e.g., DELIMITEDTEXT, RCFILE, ORC, PARQUET, JSON, DELTA).
- DATA_COMPRESSION: Optionally specifies the compression codec if the data is compressed; the default is uncompressed data.
- FORMAT_OPTIONS: Format-specific options, such as the field terminator or string delimiter for delimited text.
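For instance, the DATA_COMPRESSION argument can declare that delimited files arrive gzip-compressed; in this sketch the format name is a placeholder:

```sql
-- Reads gzip-compressed CSV files; GzipCsvFormat is a placeholder name.
CREATE EXTERNAL FILE FORMAT GzipCsvFormat
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ','),
    DATA_COMPRESSION = 'org.apache.hadoop.io.compress.GzipCodec'
);
```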
4. Example: Creating an External File Format for a CSV File
Let’s create an external file format for a CSV file stored in Azure Blob Storage. In this case, we’ll define the format as DELIMITEDTEXT, with FORMAT_OPTIONS specifying a comma as the field terminator.
Step 1: Create an External Data Source
This step defines the connection to your Azure Blob Storage or Data Lake.
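Putting the steps together, a sketch might look like the following; the data source name, storage URL, and format name are placeholder assumptions:

```sql
-- Step 1 (sketch): the connection to the storage account. Names/URL are placeholders.
CREATE EXTERNAL DATA SOURCE CsvBlobSource
WITH (
    LOCATION = 'wasbs://data@mystorageaccount.blob.core.windows.net'
);

-- Step 2: the DELIMITEDTEXT file format with a comma field terminator,
-- skipping the header row and using double quotes as the string delimiter.
CREATE EXTERNAL FILE FORMAT CsvFileFormat
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (
        FIELD_TERMINATOR = ',',
        STRING_DELIMITER = '"',
        FIRST_ROW = 2,
        USE_TYPE_DEFAULT = TRUE
    )
);
```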
[/vc_column_text][/vc_column][/vc_row]