Mastering SQL & PySpark Interview Questions: How to Find Users’ Most Visited Floor, Total Visits & Unique Resources

Introduction

In data engineering, understanding user interaction patterns across different locations or resources is crucial for business insights, optimizing experiences, and resource allocation. This problem demonstrates how to aggregate user visit data, identify the most frequented floor per user, calculate total visits, and list unique resources used, using powerful SQL and PySpark techniques.

Problem Statement

For each user in the dataset, determine the following:

The most visited floor number.
The total number of visits made.
The list of unique resources used by the user.

This helps to summarize user activity and resources interaction by aggregating across various visits.

—

Dataset (SQL – Notebook Friendly)

-- Create table
CREATE TABLE user_visits (
    name STRING,
    location STRING,
    email STRING,
    floor INT,
    resource STRING
);

-- Insert sample data
INSERT INTO user_visits VALUES
('John','New York','john@example.com',3,'Laptop'),
('John','New York','john.doe@example.com',2,'Monitor'),
('John','New York','johnd@example.com',3,'Laptop'),
('Emily','San Francisco','emily.sf@example.com',1,'Desktop'),
('Emily','San Francisco','emily.e@example.com',1,'Laptop'),
('Emily','San Francisco','emily.sf2@example.com',2,'Tablet'),
('Michael','Chicago','michael.c@example.com',4,'Projector'),
('Michael','Chicago','mike@example.com',4,'Projector'),
('Michael','Chicago','michael.c@example.com',3,'Desktop');

Sample Data (Tabular View)

name	location	email	floor	resource
John	New York	john@example.com	3	Laptop
John	New York	john.doe@example.com	2	Monitor
John	New York	johnd@example.com	3	Laptop
Emily	San Francisco	emily.sf@example.com	1	Desktop
Emily	San Francisco	emily.e@example.com	1	Laptop
Emily	San Francisco	emily.sf2@example.com	2	Tablet
Michael	Chicago	michael.c@example.com	4	Projector
Michael	Chicago	mike@example.com	4	Projector
Michael	Chicago	michael.c@example.com	3	Desktop

Expected Output

name	most_visited_floor	total_visits	used_resources
John	3	3	“Laptop,Monitor”
Emily	1	3	“Desktop,Laptop,Tablet”
Michael	4	3	“Desktop,Projector”

SQL Solution (Notebook Ready)

WITH floor_counts AS (
    SELECT
        name,
        floor,
        COUNT(*) AS visit_count
    FROM user_visits
    GROUP BY name, floor
),
ranked_floors AS (
    SELECT
        name,
        floor,
        visit_count,
        ROW_NUMBER() OVER (PARTITION BY name ORDER BY visit_count DESC, floor ASC) AS rn
    FROM floor_counts
),
total_visits AS (
    SELECT
        name,
        COUNT(*) AS total_visits
    FROM user_visits
    GROUP BY name
),
resources_list AS (
    SELECT
        name,
        STRING_AGG(DISTINCT resource, ',') AS used_resources
    FROM user_visits
    GROUP BY name
)
SELECT
    t.name,
    rf.floor AS most_visited_floor,
    t.total_visits,
    r.used_resources
FROM total_visits t
JOIN ranked_floors rf ON t.name = rf.name AND rf.rn = 1
JOIN resources_list r ON t.name = r.name
ORDER BY t.name;

PySpark Solution (Databricks Ready)

# PySpark DataFrame solution
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, count, row_number, collect_set, concat_ws
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("UserVisitsAnalysis").getOrCreate()

# Sample data
data = [
    ('John','New York','john@example.com',3,'Laptop'),
    ('John','New York','john.doe@example.com',2,'Monitor'),
    ('John','New York','johnd@example.com',3,'Laptop'),
    ('Emily','San Francisco','emily.sf@example.com',1,'Desktop'),
    ('Emily','San Francisco','emily.e@example.com',1,'Laptop'),
    ('Emily','San Francisco','emily.sf2@example.com',2,'Tablet'),
    ('Michael','Chicago','michael.c@example.com',4,'Projector'),
    ('Michael','Chicago','mike@example.com',4,'Projector'),
    ('Michael','Chicago','michael.c@example.com',3,'Desktop')
]

columns = ['name', 'location', 'email', 'floor', 'resource']
df = spark.createDataFrame(data, schema=columns)

# Calculate visits per floor per user
floor_visits = df.groupBy('name', 'floor').agg(count('*').alias('visit_count'))

# Identify most visited floor using window function
window_spec = Window.partitionBy('name').orderBy(col('visit_count').desc(), col('floor').asc())
floor_ranked = floor_visits.withColumn('rank', row_number().over(window_spec))
most_visited_floor = floor_ranked.filter(col('rank') == 1).select('name', col('floor').alias('most_visited_floor'))

# Total visits per user
total_visits = df.groupBy('name').agg(count('*').alias('total_visits'))

# Unique resources per user
resources = df.groupBy('name').agg(concat_ws(",", collect_set('resource')).alias('used_resources'))

# Join all results
result = total_visits.join(most_visited_floor, 'name').join(resources, 'name').orderBy('name')

result.show(truncate=False)

Explanation

– First, we count the number of visits each user made to each floor. This helps us identify the floor with the highest visits per user using a window function ordered by visit count descending and floor ascending (to break ties).

– Next, we calculate total visits for each user by counting their visits across all floors.

– To get a unique list of resources used by each user, we collect distinct resource values and concatenate them into a comma-separated string.

– Finally, we join these intermediate results on the user’s name to produce the final output showing the most visited floor, total visits, and used resources.

Run in Notebook

Download full notebook here:
https://example.com/sql-pyspark-user-visits-notebook

Conclusion

This problem demonstrates critical SQL and PySpark concepts like window functions, aggregation, string aggregation, and joins in a straightforward use case similar to real-world user activity summaries. It’s a great interview question to understand your ability to manipulate grouped data and derive meaningful insights efficiently. Make sure you practice both SQL and PySpark approaches to stand out in data engineering interviews!

5 thoughts on “Mastering SQL & PySpark Interview Questions: How to Find Users’ Most Visited Floor, Total Visits & Unique Resources”

imkerei says:

May 16, 2026 at 3:46 pm

Sweet blog! I found it while surfing around on Yahoo News.
Do you have any tips on how to get listed in Yahoo
News? I’ve been trying for a while but I never seem to get
there! Thank you

imker portal says:

May 18, 2026 at 8:02 am

Greetings, I do think your website may be having internet browser compatibility
problems. When I look at your web site in Safari, it
looks fine however, if opening in I.E., it has some overlapping issues.
I simply wanted to give you a quick heads up!

Apart from that, great site!

dessousmaus.de says:

May 31, 2026 at 3:41 pm

My spouse and I absolutely love your blog and find the majority of your post’s to be exactly what I’m
looking for. Do you offer guest writers to write content in your case?
I wouldn’t mind producing a post or elaborating on a number of the subjects you write related to here.

Again, awesome web log!

xmxxtube bokep says:

June 5, 2026 at 12:15 am

Great article, totawlly what I wasnted too find.

Feel frde too visit myy wbpage :: xmxxtube bokep

cnhub.xyz says:

June 5, 2026 at 2:20 am

Petty niice post. I just stumbped upoin your weblpog andd
wished tto ssay that I havge reallly enjoyed surfiing aroumd yoyr blig posts.
After alll I’ll be subscribiing tto your feeed aand I hhope you
rite agaun soon!

my wweb page cnhub.xyz