Data Engineer Interview SQL Question-Day 2

Hey there, future data engineers! I’m Adarsh Yadav, a data engineer who’s navigated the tough world of tech interviews and learned what it takes to succeed. Welcome to Day 1 of my 60-day SQL challenge, designed to help you master SQL for data engineer interviews and land your dream job at top companies like Meta, Google, Amazon, Netflix, and more. SQL is the heart of data engineering, accounting for 40–50% of interview questions. Over the next 60 days, I’ll share one real SQL interview question daily, sourced from candidate experiences on platforms like Glassdoor and LeetCode, or drawn from my own work as a data engineer.

This challenge will cover everything you need to excel in data engineer interviews, from basic joins and aggregations to advanced topics like subqueries, window functions, and query optimization. Each post will include a problem, a detailed solution, a schema/dataset for practice, and tips to boost your SQL skills. You can practice in any database—online tools like SQL Fiddle or offline ones like SQL Server, PostgreSQL or MySQL. I’ll explain solutions clearly in the comments and answer your questions to keep you motivated. If you stick with this challenge, I guarantee you’ll be ready to crush any SQL interview. Let’s dive into our first question from []

Why SQL Matters for Data Engineer Interviews

Data engineers are the backbone of data pipelines, transforming raw data into insights that drive business decisions. SQL is your go-to tool for querying large datasets, building reports, and designing data models. In data engineer interviews, companies test your ability to write efficient, accurate queries and solve real-world problems. For example, Meta might ask you to analyze social network data, Google could focus on user behavior metrics, and Amazon often tests e-commerce analytics. SQL questions make up 40–50% of the technical interview, so mastering it is critical.

This challenge is built to prepare you for those high-pressure moments. Each question will teach you how to think like a data engineer—logically, methodically, and practically. I will share practical tips from my experience, like how to structure queries for clarity or optimize for performance. By the end, you’ll not only write flawless SQL but also explain your solutions confidently, a key skill for data engineer interviews. Let’s get started with today’s [] problem!

Note: As You are preparing SQL for data engineer this question you can feel little challenging but if your basics are clear then there will be not any problem in understanding. And if you are not aware of advance SQL data engineering concepts then don’t worry, we will explain them as we use in questions so let’s go for our 2nd question of SQL for data engineer challenge.

Problem for Data Engineer:

Company: Meta Amazon

Level: Hard (As we are data engineers, we will make it easy)

Concepts Covered: Window Function | LEAD/LAG | Format | Round function

Problem statement for data engineer:

You have a table named transactions that records purchase transactions with their dates and amounts.

Your task is to find out how the revenue changes each month compared to the previous month.

Expected Output:

Schema and Data Setup for Practice for Data Engineer

use SQL_learning
CREATE TABLE transactions(id INT, created_at datetime, value INT, purchase_id INT);
insert into transactions values
(1, '2019-01-01 00:00:00',  172692, 43), 
(2,'2019-01-05 00:00:00',  177194, 36),
(3, '2019-01-09 00:00:00',  109513, 30),
(4, '2019-01-13 00:00:00',  164911, 30),
(5, '2019-01-17 00:00:00',  198872, 39), 
(6, '2019-01-21 00:00:00',  184853, 31),
(7, '2019-01-25 00:00:00',  186817, 26),
(8, '2019-01-29 00:00:00',  137784, 22),
(9, '2019-02-02 00:00:00',  140032, 25), 
(10, '2019-02-06 00:00:00', 116948, 43), 
(11, '2019-02-10 00:00:00', 162515, 25), 
(12, '2019-02-14 00:00:00', 114256, 12), 
(13, '2019-02-18 00:00:00', 197465, 48), 
(14, '2019-02-22 00:00:00', 120741, 20), 
(15, '2019-02-26 00:00:00', 100074, 49), 
(16, '2019-03-02 00:00:00', 157548, 19), 
(17, '2019-03-06 00:00:00', 105506, 16), 
(18, '2019-03-10 00:00:00', 189351, 46), 
(19, '2019-03-14 00:00:00', 191231, 29), 
(20, '2019-03-18 00:00:00', 120575, 44), 
(21, '2019-03-22 00:00:00', 151688, 47), 
(22, '2019-03-26 00:00:00', 102327, 18), 
(23, '2019-03-30 00:00:00', 156147, 25);

A Data Engineer Approach:

Hey data engineers do not directly jump to my approach please try on your own

Ok so you need to

Calculate the total revenue for each month.

Compute the month-over-month percentage change in revenue.
Show the results with two columns: the year-month (formatted as YYYY-MM) and the percentage change.
The percentage change for the first month should be empty or NULL since there is no previous month to compare.
Make sure the percentage change is rounded to two decimal places.
Sort the results from the earliest month to the latest.

Solution:

with monthly_revenvue_table as (
select FORMAT(created_at,'yyyy-MM') as "Month",
sum(value)as monthly_revenue
from transactions 
group by FORMAT(created_at,'yyyy-MM')
),------
prcentagee_change as (
select month, monthly_revenue as 'current_month_value',
lag(monthly_revenue) over (order by Month)  as 'Previous_month_value'
from monthly_revenvue_table
)
select month,current_month_value,
case
when Previous_month_value is null then null
else
round((current_month_value-previous_month_value)*100.0/previous_month_value,2)
end as 'perctange change'
from prcentagee_change

Hey Data Engineer Family did not get the solution? I think you are not aware with advance SQL for data engineer Lets Discuss that first

Advance SQL for Data engineer:

Lead/ Lag Function:

Lead and Lag function is one of the most used window functions by data engineers and in every interview of SQL for data engineer it is asked so lets discuss Kya bala Hai ye Lead /Lag

Think of your data like a list of months with values (e.g., revenue).

LAG() lets you look backward — What was the previous month’s value?
LEAD() lets you look forward — What is the next month’s value?

They are like peeking at the previous or next row in a dataset without writing a JOIN.

OK Adrsh we understand how we will use this

How to use Lead/Lag:

The LEAD() and LAG() functions help you look ahead or behind in a dataset without using self joins.

✅ Syntax:

LEAD(column, offset, default) OVER (ORDER BY column)
LAG(column, offset, default) OVER (ORDER BY column)

LEAD() -gets next row’s value.
LAG() -gets previous row’s value.
offset -how many rows ahead/behind (default is 1).
default– value to show if no row exists.

Let’s See example:

Select * from transactions order by created_at;
---If you want access the next row you have to use lead or lag 
select id , value,created_at,LEAD(value,1,NULL)  over (order by created_at)as 'Next Value', 
Lag(Value,1,Null) over (order by created_at ) As 'Previous value'
from transactions

Explanation:

Creating The Monthly Revenue Table using group by and month (we are extracting the year and month from timestamp)
Getting previous month value as column
calculating the change

Tadda you are done. I hope you like it if yes then please subscribe.

I Hope you have solved Facebook meta interview SQL for data engineers’ day 2 question if not then pleas solve and let’s do that and complete 60-day challenge.

If you compete this SQL for data engineers’ series, then i can assure you will never fail SQL for data engineers any interview

Day 3: coming soon