Mastering SQL & PySpark Interview Questions: Find the N-th Saturday on or After a Given Date
Introduction
In the world of data engineering, date calculations are common challenges that arise during data transformations and report generation. Whether it’s scheduling events, analyzing sales trends by weeks, or generating reminders, being able to compute specific future weekdays relative to a given date is a powerful skill. This problem is not just academic; it has practical implications in financial analytics, time-based segmentation, and ETL pipelines.
Problem Statement
Find the date of the N-th Saturday on or after the given date.
—
Dataset (SQL – Notebook Friendly)
-- Create table
CREATE TABLE sample_dates (
today_date DATE,
n INT
);
-- Insert sample data
INSERT INTO sample_dates VALUES
('2023-07-10', 1),
('2023-12-01', 4),
('2024-01-25', 2),
('2023-11-30', 3),
('2023-08-05', 5);
Sample Data (Tabular View)
| today_date | n |
|---|---|
| 2023-07-10 | 1 |
| 2023-12-01 | 4 |
| 2024-01-25 | 2 |
| 2023-11-30 | 3 |
| 2023-08-05 | 5 |
Expected Output
| today_date | n | nth_saturday_date |
|---|---|---|
| 2023-07-10 | 1 | 2023-07-15 |
| 2023-12-01 | 4 | 2023-12-23 |
| 2024-01-25 | 2 | 2024-02-10 |
| 2023-11-30 | 3 | 2023-12-16 |
| 2023-08-05 | 5 | 2023-09-02 |
SQL Solution (Notebook Ready)
-- SQL solution to find the N-th Saturday on or after the given date
WITH base AS (
SELECT today_date, n,
-- Calculate weekday number for today_date (1=Sunday, 7=Saturday in some DBs, check for your SQL flavor)
-- Using DAYOFWEEK() here assumes Sunday=1 ... Saturday=7 (MySQL/PostgreSQL variant)
-- Adjust if your DB uses different numbering
DAYOFWEEK(today_date) AS wd
FROM sample_dates
),
offset_calc AS (
SELECT
today_date,
n,
wd,
-- Days to next Saturday (7) including today if it's Saturday
CASE WHEN wd = 7 THEN 0 ELSE (7 - wd) END AS days_to_first_saturday
FROM base
)
SELECT
today_date,
n,
-- Add days to first Saturday plus (n-1)*7 to get nth Saturday
DATE_ADD(today_date, days_to_first_saturday + (n - 1) * 7) AS nth_saturday_date
FROM offset_calc;
PySpark Solution (Databricks Ready)
# PySpark solution to find the N-th Saturday on or after the given date
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, dayofweek, expr
spark = SparkSession.builder.appName("NthSaturday").getOrCreate()
# Sample data
sample_data = [
("2023-07-10", 1),
("2023-12-01", 4),
("2024-01-25", 2),
("2023-11-30", 3),
("2023-08-05", 5)
]
schema = ["today_date", "n"]
df = spark.createDataFrame(sample_data, schema)
# Convert today_date to date type
from pyspark.sql.functions import to_date
df = df.withColumn("today_date", to_date(col("today_date"), "yyyy-MM-dd"))
# Calculate day of week (1=Sunday, 7=Saturday in Spark)
df = df.withColumn("wd", dayofweek(col("today_date")))
# Calculate days to next Saturday, 0 if today is Saturday
# Saturday = 7, so days_to_first_saturday = (7 - wd) if wd != 7, else 0
df = df.withColumn("days_to_first_saturday", expr("CASE WHEN wd = 7 THEN 0 ELSE 7 - wd END"))
# Calculate nth Saturday date
# date_add(today_date, days_to_first_saturday + (n-1) * 7)
from pyspark.sql.functions import date_add
result_df = df.withColumn(
"nth_saturday_date",
date_add(
col("today_date"),
col("days_to_first_saturday") + (col("n") - 1) * 7
)
).select("today_date", "n", "nth_saturday_date")
result_df.show()
Explanation
The core idea is to first find the nearest Saturday on or after the given today_date. To do this, we determine the day of the week for the given date.
In most SQL dialects and PySpark, DAYOFWEEK or dayofweek() returns integers where Sunday is 1 and Saturday is 7. So:
- If the day is Saturday (7), the offset to the first Saturday is zero.
- Otherwise, we subtract the current weekday number from 7 to get the days until the next Saturday.
Once the first Saturday is located, the N-th Saturday is simply calculated by adding (N-1)*7 days to the first Saturday.
This approach handles any date and N value robustly without loops or complex joins, making it efficient and scalable.
Run in Notebook
Download full notebook here: nth_saturday_notebook.ipynb
Conclusion
Mastering date calculations like finding the N-th Saturday after a specific date is a practical skill for data engineers. These problems help you understand date functions, conditional logic, and date arithmetic crucial for ETL and analytics tasks.
In interviews, clarify assumptions (e.g., when week starts and ends), explain your approach, and write clean, efficient code. Both SQL and PySpark are powerful tools—knowing how to implement the same logic in both is a strong advantage.