
Introduction to PySpark: The Easy Way

Spark Session

SparkSession is the entry point for PySpark applications. Introduced in Spark 2.0, it provides a single, unified API, removing the need to create separate SparkContext, SQLContext, and HiveContext objects.

# Databricks notebook source
# MAGIC %md
# MAGIC # Creating Session 
# MAGIC 

# COMMAND ----------

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("My spark app") \
    .getOrCreate()


# COMMAND ----------

The SparkSession.builder object exposes several methods for configuring the SparkSession before it is created. The most important ones are:

appName(name): Sets the application name, which will be displayed in the Spark web user interface.

config(key, value): Sets a configuration property with the specified key and value. You can use this method multiple times to set multiple configuration properties.

config(conf): Sets the Spark configuration object (SparkConf) to be used for building the SparkSession.

enableHiveSupport(): Enables Hive support, including connectivity to a persistent Hive metastore, support for Hive SerDes, and Hive user-defined functions (UDFs).

getOrCreate(): Retrieves an existing SparkSession or, if there is none, creates a new one based on the options set via the builder.
