Uncategorized

Mastering SQL & PySpark Interview Questions: Find the N-th Saturday on or After a Given Date

Mastering SQL & PySpark Interview Questions: Find the N-th Saturday on or After a Given Date

Introduction

In the world of data engineering, date calculations are common challenges that arise during data transformations and report generation. Whether it’s scheduling events, analyzing sales trends by weeks, or generating reminders, being able to compute specific future weekdays relative to a given date is a powerful skill. This problem is not just academic; it has practical implications in financial analytics, time-based segmentation, and ETL pipelines.

Problem Statement

Find the date of the N-th Saturday on or after the given date.

Dataset (SQL – Notebook Friendly)

-- Create table
CREATE TABLE sample_dates (
    today_date DATE,
    n INT
);

-- Insert sample data
INSERT INTO sample_dates VALUES
('2023-07-10', 1),
('2023-12-01', 4),
('2024-01-25', 2),
('2023-11-30', 3),
('2023-08-05', 5);

Sample Data (Tabular View)

today_date n
2023-07-10 1
2023-12-01 4
2024-01-25 2
2023-11-30 3
2023-08-05 5

Expected Output

today_date n nth_saturday_date
2023-07-10 1 2023-07-15
2023-12-01 4 2023-12-23
2024-01-25 2 2024-02-10
2023-11-30 3 2023-12-16
2023-08-05 5 2023-09-02

SQL Solution (Notebook Ready)

-- SQL solution to find the N-th Saturday on or after the given date
WITH base AS (
    SELECT today_date, n,
    -- Calculate weekday number for today_date (1=Sunday, 7=Saturday in some DBs, check for your SQL flavor)
    -- Using DAYOFWEEK() here assumes Sunday=1 ... Saturday=7 (MySQL/PostgreSQL variant)
    -- Adjust if your DB uses different numbering
    DAYOFWEEK(today_date) AS wd
    FROM sample_dates
),

offset_calc AS (
    SELECT
        today_date,
        n,
        wd,
        -- Days to next Saturday (7) including today if it's Saturday
        CASE WHEN wd = 7 THEN 0 ELSE (7 - wd) END AS days_to_first_saturday
    FROM base
)

SELECT
    today_date,
    n,
    -- Add days to first Saturday plus (n-1)*7 to get nth Saturday
    DATE_ADD(today_date, days_to_first_saturday + (n - 1) * 7) AS nth_saturday_date
FROM offset_calc;

PySpark Solution (Databricks Ready)

# PySpark solution to find the N-th Saturday on or after the given date
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, dayofweek, expr

spark = SparkSession.builder.appName("NthSaturday").getOrCreate()

# Sample data
sample_data = [
    ("2023-07-10", 1),
    ("2023-12-01", 4),
    ("2024-01-25", 2),
    ("2023-11-30", 3),
    ("2023-08-05", 5)
]

schema = ["today_date", "n"]
df = spark.createDataFrame(sample_data, schema)

# Convert today_date to date type
from pyspark.sql.functions import to_date

df = df.withColumn("today_date", to_date(col("today_date"), "yyyy-MM-dd"))

# Calculate day of week (1=Sunday, 7=Saturday in Spark)
df = df.withColumn("wd", dayofweek(col("today_date")))

# Calculate days to next Saturday, 0 if today is Saturday
# Saturday = 7, so days_to_first_saturday = (7 - wd) if wd != 7, else 0

df = df.withColumn("days_to_first_saturday", expr("CASE WHEN wd = 7 THEN 0 ELSE 7 - wd END"))

# Calculate nth Saturday date
# date_add(today_date, days_to_first_saturday + (n-1) * 7)

from pyspark.sql.functions import date_add

result_df = df.withColumn(
    "nth_saturday_date",
    date_add(
        col("today_date"),
        col("days_to_first_saturday") + (col("n") - 1) * 7
    )
).select("today_date", "n", "nth_saturday_date")

result_df.show()

Explanation

The core idea is to first find the nearest Saturday on or after the given today_date. To do this, we determine the day of the week for the given date.

In most SQL dialects and PySpark, DAYOFWEEK or dayofweek() returns integers where Sunday is 1 and Saturday is 7. So:

  • If the day is Saturday (7), the offset to the first Saturday is zero.
  • Otherwise, we subtract the current weekday number from 7 to get the days until the next Saturday.

Once the first Saturday is located, the N-th Saturday is simply calculated by adding (N-1)*7 days to the first Saturday.

This approach handles any date and N value robustly without loops or complex joins, making it efficient and scalable.

Run in Notebook

Download full notebook here: nth_saturday_notebook.ipynb

Conclusion

Mastering date calculations like finding the N-th Saturday after a specific date is a practical skill for data engineers. These problems help you understand date functions, conditional logic, and date arithmetic crucial for ETL and analytics tasks.

In interviews, clarify assumptions (e.g., when week starts and ends), explain your approach, and write clean, efficient code. Both SQL and PySpark are powerful tools—knowing how to implement the same logic in both is a strong advantage.

Leave a Reply

Your email address will not be published. Required fields are marked *