Spark Session
SparkSession is the entry point for PySpark applications. Introduced in Spark 2.0, it provides a unified API that replaces the separate SparkContext, SQLContext, and HiveContext entry points.
# Databricks notebook source
# MAGIC %md
# MAGIC # Creating Session
# MAGIC
# COMMAND ----------
from pyspark.sql import SparkSession
spark = SparkSession.builder \
    .appName("My spark app") \
    .getOrCreate()
# COMMAND ----------
The SparkSession.builder object exposes several methods for configuring the SparkSession before it is created. The most important ones are:
appName(name): Sets the application name, which will be displayed in the Spark web user interface.
config(key, value): Sets a configuration property with the specified key and value. You can use this method multiple times to set multiple configuration properties.
config(conf): Sets the Spark configuration object (SparkConf) to be used for building the SparkSession.
enableHiveSupport(): Enables Hive support, including connectivity to a persistent Hive metastore, support for Hive SerDes, and Hive user-defined functions (UDFs).
getOrCreate(): Retrieves an existing SparkSession or, if there is none, creates a new one based on the options set via the builder.