Tricky SQL interview question (only 15 minutes to solve)

The given table (order info):
client, Date, Product, order_amt
1001, 2020-01-01, Desktop1, 100
1001, 2020-01-01, Mobile2, 200
1001, 2020-01-01, Mobile2, 100
1002, 2020-01-02, Mobile1, 100
1002, 2020-01-01, Mobile1, 100
1003, 2020-01-01, Desktop1, 100
1003, 2020-01-02, Desktop2, 100
1004, 2020-01-02, Mobile, 100
The result should give the following information:
For each date, how many clients bought only one type of product (mobile_only or desktop_only), and the total order amount for each of those types
AND
For each date, how many clients bought both types of product, and their total order amount.
So the result table should look like this:
Date product_type total_amount number_of_clients
2020-01-01 mobile_only 100 1
2020-01-01 desktop_only 100 1
2020-01-01 both 400 1
2020-01-02 mobile_only 200 2
2020-01-02 desktop_only 100 1
I solved it by creating multiple tables, but the interviewer gave only 15 minutes, so I'd like to see a simpler way to solve it.

You can first "classify" each client per date by the first letter of the products they bought (M, D, or B for both) and then group:
select date_, class, sum(amt), count(client)
from (
    select date_, client, sum(order_amt) amt,
           case when min(substr(product, 1, 1)) <> max(substr(product, 1, 1)) then 'B'
                else min(substr(product, 1, 1))
           end class
    from orders
    group by date_, client)
group by date_, class
order by date_, class
dbfiddle for Oracle
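The classify-then-group idea can be tried locally with a minimal sketch like the one below. It assumes Python's built-in sqlite3 module as a stand-in for Oracle, and it spells out the labels (both, mobile_only, desktop_only) instead of single letters; the per_client alias is only illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (client INTEGER, date_ TEXT, product TEXT, order_amt INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1001, "2020-01-01", "Desktop1", 100),
        (1001, "2020-01-01", "Mobile2", 200),
        (1001, "2020-01-01", "Mobile2", 100),
        (1002, "2020-01-02", "Mobile1", 100),
        (1002, "2020-01-01", "Mobile1", 100),
        (1003, "2020-01-01", "Desktop1", 100),
        (1003, "2020-01-02", "Desktop2", 100),
        (1004, "2020-01-02", "Mobile", 100),
    ],
)

query = """
SELECT date_, class, SUM(amt) AS total_amount, COUNT(*) AS number_of_clients
FROM (
    -- One row per (date, client): label the client by the product types bought.
    SELECT date_, client, SUM(order_amt) AS amt,
           CASE
               WHEN MIN(substr(product, 1, 1)) <> MAX(substr(product, 1, 1)) THEN 'both'
               WHEN MIN(substr(product, 1, 1)) = 'M' THEN 'mobile_only'
               ELSE 'desktop_only'
           END AS class
    FROM orders
    GROUP BY date_, client
) AS per_client
GROUP BY date_, class
ORDER BY date_, class
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The inner query does the classification once per client and date, so the outer query is a plain GROUP BY with no self-joins or temporary tables.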

This seems to be an ill-designed table. What if a product happens to be a vegetable? I think you should sanity-check the data and report an error in that case.
select Date_, product_type, sum(total_amount) as total_amount, count(*) as number_of_clients
from (
    select
        Date_, sum(order_amt) as total_amount,
        case
            when sum(case when SUBSTRING(Product,1,7) = 'Desktop'
                            or SUBSTRING(Product,1,6) = 'Mobile'
                          then 0 else 1 end) > 0 then 'Error'
            when count(distinct SUBSTRING(Product,1,6)) = 2 then 'both'
            when min(SUBSTRING(Product,1,6)) = 'Mobile' then 'mobile_only'
            else 'desktop_only'
        end as product_type
    from orders
    group by Date_, client
) x
group by Date_, product_type
order by Date_, product_type desc
output:
Date_ product_type total_amount number_of_clients
2020-01-01 mobile_only 100 1
2020-01-01 desktop_only 100 1
2020-01-01 both 400 1
2020-01-02 mobile_only 200 2
2020-01-02 desktop_only 100 1

Related

Calculating the cumulative sum with some conditions (gaps-and-islands problem)

Sorry if the title is a bit vague; please suggest a better one if you can think of a title that articulates the problem. I'll start with the data I have and the end result I'm trying to get, and then the TL;DR:
This is the table I have:
Each row is a transaction. Outgoing amounts are negative, incomings are positive. The transactions can either be someone spending money ('spend' event) or it can be a loan disbursement into their account (amount > 0 and event = 'loan') or it can be them paying back their loan (amount < 0 and event = 'loan').
row number  id  created     amount  event
1           1   2022-01-01  -200    spend
2           1   2022-01-02  1000    loan
3           1   2022-01-03  -200    spend
4           1   2022-01-04  -500    spend
5           1   2022-01-05  -500    loan
6           1   2022-01-06  100     spend
7           1   2022-01-07  -500    spend
8           1   2022-01-08  1000    loan
9           1   2022-01-09  -100    spend
I'm trying to make:
row number  id  created     amount  event  cumulative_sum
1           1   2022-01-01  -200    spend  -200
2           1   2022-01-02  1000    loan   1000
3           1   2022-01-03  -200    spend  800
4           1   2022-01-04  -500    spend  300
5           1   2022-01-05  -500    loan   300
6           1   2022-01-06  100     spend  300
7           1   2022-01-07  -500    spend  -200
8           1   2022-01-08  1000    loan   1000
9           1   2022-01-09  -100    spend  900
Required logic:
I want a special cumulative sum that adds the amount only when:
(the amount is < 0 AND the event is spend) OR (the amount is > 0 AND the event is loan).
I want the cumulative sum to start at the first positive loan amount. I don't care about anything before the positive loan amount; if those rows are counted, they will obscure the results. The requirement is to select the rows that the loan enabled (if the loan is 1000, we want to select the rows that add up to -1000, but only when the event is spend and the amount is < 0).
my attempt
WITH tmp AS (
SELECT
1 AS id,
'2022-01-01' AS created,
-200 AS amount,
'spend' AS scheme
UNION ALL
SELECT
1 AS id,
'2022-01-02' AS created,
1000 AS amount,
'loan' AS scheme
UNION ALL
SELECT
1 AS id,
'2022-01-03' AS created,
-200 AS amount,
'spend' AS scheme
UNION ALL
SELECT
1 AS id,
'2022-01-04' AS created,
-500 AS amount,
'spend' AS scheme
UNION ALL
SELECT
1 AS id,
'2022-01-05' AS created,
-500 AS amount,
'loan' AS scheme
UNION ALL
SELECT
1 AS id,
'2022-01-06' AS created,
100 AS amount,
'spend' AS scheme
UNION ALL
SELECT
1 AS id,
'2022-01-07' AS created,
-500 AS amount,
'spend' AS scheme
UNION ALL
SELECT
1 AS id,
'2022-01-08' AS created,
1000 AS amount,
'loan' AS scheme
UNION ALL
SELECT
1 AS id,
'2022-01-09' AS created,
-100 AS amount,
'spend' AS scheme
)
SELECT
*,
SUM(CASE WHEN (scheme != 'loan' AND amount<0) OR (scheme = 'loan' AND amount > 0) THEN amount ELSE 0 END)
OVER (PARTITION BY id ORDER BY created ASC) AS cumulative_sum_spend
FROM tmp
Question
How do I make the cumulative sum reset at row 2 (not conditional to the row number - the requirement is the positive loan amount)?
That's a gaps-and-islands problem if I am understanding this correctly.
Islands start with a positive loan; within each island, you want to compute a running sum over a subset of rows.
We can identify the islands in a subquery with a window count of positive loans, then do the maths in each group with a conditional expression:
select id, created, amount, event,
       sum(case when (event = 'loan' and amount > 0) or (event = 'spend' and amount < 0) then amount end)
           over(partition by id, grp order by created) as cumulative_sum
from (
    select t.*,
           sum(case when event = 'loan' and amount > 0 then 1 else 0 end)
               over(partition by id order by created) grp
    from tmp t
) t
order by id, created
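To check the island logic against the expected cumulative_sum column from the question, here is a minimal sketch. It assumes Python's built-in sqlite3 module (SQLite 3.25 or later for window functions) and a tmp table whose fourth column is named event, as in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tmp (id INTEGER, created TEXT, amount INTEGER, event TEXT)")
conn.executemany(
    "INSERT INTO tmp VALUES (?, ?, ?, ?)",
    [
        (1, "2022-01-01", -200, "spend"),
        (1, "2022-01-02", 1000, "loan"),
        (1, "2022-01-03", -200, "spend"),
        (1, "2022-01-04", -500, "spend"),
        (1, "2022-01-05", -500, "loan"),
        (1, "2022-01-06", 100, "spend"),
        (1, "2022-01-07", -500, "spend"),
        (1, "2022-01-08", 1000, "loan"),
        (1, "2022-01-09", -100, "spend"),
    ],
)

query = """
SELECT id, created, amount, event,
       -- Running sum restarts in every island because grp is part of the partition.
       SUM(CASE WHEN (event = 'loan' AND amount > 0)
                  OR (event = 'spend' AND amount < 0) THEN amount END)
           OVER (PARTITION BY id, grp ORDER BY created) AS cumulative_sum
FROM (
    -- grp counts the positive loans seen so far, so it increases at each island start.
    SELECT t.*,
           SUM(CASE WHEN event = 'loan' AND amount > 0 THEN 1 ELSE 0 END)
               OVER (PARTITION BY id ORDER BY created) AS grp
    FROM tmp AS t
) AS t
ORDER BY id, created
"""
rows = conn.execute(query).fetchall()
cumulative = [row[4] for row in rows]
print(cumulative)
```

Rows before the first positive loan fall into island 0 and get their own running sum, which is why row 1 shows -200 rather than being dropped.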
One option would be something like this:
SELECT
*,
SUM(CASE WHEN cnt >= 1 AND ((scheme != 'loan' AND amount<0) OR (scheme = 'loan' AND amount > 0)) THEN amount ELSE 0 END)
OVER (PARTITION BY id ORDER BY created ASC) AS cumulative_sum_spend
FROM (
SELECT *, SUM(CASE WHEN amount > 0 THEN 1 ELSE 0 END) OVER (PARTITION BY id ORDER BY created) cnt
FROM tmp
) a
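The idea here is that the inner query's window function counts the positive amounts seen so far (up to and including the current row). The outer query then adds the check cnt >= 1 to its conditional sum, so it only considers rows from the first positive amount onward. Unlike the expected output, the sum does not restart at a later positive loan (row 8), because the outer window is still partitioned by id alone.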

Get orders for each customer after a specific date for each customer

Forgive me if I word this poorly.
And sorry if it has already been asked, but I was not able to find an answer here.
I'm using Snowflake to try and do the below.
Basically, I'm trying to find out, for each customer, how many orders they have placed after a specific date (the date differs per customer).
Scenario:
We want to see if customers continue to shop with us after they have been short-shipped (received 1 or more items less than they ordered).
So for example:
Customer 1 places an order on 01/01/2020 and this was a short-shipment.
They then go on to place orders on 06/06/2020 and 02/02/2021.
So this customer has a total of 2 additional orders since they were short-shipped on 01/01/2020.
Customer 2 places an order on 02/03/2020 and this was short-shipped.
Customer 2 has not placed an order since, so they will have 0 additional orders.
Data available:
cust_id  ord_id  order_date
1        0123    01/01/2020
1        0456    06/06/2020
1        0789    02/02/2021
2        1011    01/01/2020
Desired output:
cust_id  number_of_orders
1        2
2        0
So using a boosted version of your data:
with data_cte( cust_id, ord_id, order_date, short_order_flg) as (
select * from values
(1, '1', '2018-06-06'::date, false),
(1, '2', '2019-01-01'::date, true),
(1, '3', '2019-06-06'::date, false),
(1, '4', '2019-12-02'::date, false),
(1, '5', '2020-01-01'::date, true),
(1, '6', '2020-06-06'::date, false),
(1, '7', '2021-02-02'::date, false),
(2, '8', '2020-01-01'::date, true)
)
This includes a "valid" purchase before any short shipment and multiple "short ships", to show how to batch the orders:
SELECT
cust_id,
min(order_date) as short_date,
count(*) -1 as follow_count
FROM (
select
cust_id
,order_date
,CONDITIONAL_TRUE_EVENT(short_order_flg) over(partition by cust_id order by order_date ) as edge
from data_cte
)
where edge > 0
group by 1, edge
order by 1,2;
gives:
CUST_ID  SHORT_DATE  FOLLOW_COUNT
1        2019-01-01  2
1        2020-01-01  2
2        2020-01-01  0
The key thing to note: CONDITIONAL_TRUE_EVENT increments each time the event happens, which makes (cust_id, edge) a batch key. Rows before the first event have an edge of zero, hence the WHERE filter.
Finally, since each "post short" batch includes the short-shipped order itself, we need to subtract one from the count.
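Outside Snowflake, the same batching can be approximated with a running SUM over a 0/1 flag. The sketch below is not the Snowflake function itself; it assumes Python's built-in sqlite3 module (SQLite 3.25 or later), a hypothetical table name orders_flagged, and a short_order_flg column stored as 0 or 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders_flagged "
    "(cust_id INTEGER, ord_id TEXT, order_date TEXT, short_order_flg INTEGER)"
)
conn.executemany(
    "INSERT INTO orders_flagged VALUES (?, ?, ?, ?)",
    [
        (1, "1", "2018-06-06", 0),
        (1, "2", "2019-01-01", 1),
        (1, "3", "2019-06-06", 0),
        (1, "4", "2019-12-02", 0),
        (1, "5", "2020-01-01", 1),
        (1, "6", "2020-06-06", 0),
        (1, "7", "2021-02-02", 0),
        (2, "8", "2020-01-01", 1),
    ],
)

query = """
SELECT cust_id,
       MIN(order_date) AS short_date,
       COUNT(*) - 1 AS follow_count   -- minus the short-shipped order itself
FROM (
    -- Running total of the flag: increases by one at every short shipment.
    SELECT cust_id, order_date,
           SUM(short_order_flg) OVER (PARTITION BY cust_id ORDER BY order_date) AS edge
    FROM orders_flagged
) AS batched
WHERE edge > 0                        -- drop orders placed before the first short shipment
GROUP BY cust_id, edge
ORDER BY cust_id, short_date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

For a flag that is never NULL, the running total changes at the same rows as the CONDITIONAL_TRUE_EVENT column in the query above, so the batches come out the same.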
Try this
with CTE as (
select 1 as cust_id, '0123' as ord_id, '2020-01-01'::date as order_date, 1 as short_order_flg union all
select 1 as cust_id, '0456' as ord_id, '2020-06-06'::date as order_date, 0 as short_order_flg union all
select 1 as cust_id, '0789' as ord_id, '2021-02-02'::date as order_date, 0 as short_order_flg union all
select 2 as cust_id, '1011' as ord_id, '2020-01-01'::date as order_date, 1 as short_order_flg
),
following_orders as (
select cust_id, short_order_flg, count(ord_id) over (partition by cust_id order by order_date rows between current row and unbounded following) - 1 as number_of_orders
from cte
order by cust_id, order_date
)
select cust_id, number_of_orders
from following_orders
where short_order_flg = 1
;
I added column short_order_flg to indicate which record represents the short order. Then I used window function count(ord_id) over(...) to calculate the number of orders following each order, subtracting 1 to exclude the current record itself. Finally, I applied a filter to select only the short order records.

Get value based on date

I am trying to get the price point of a product based on future pricing. I can't really do it by Max (Expiration Date) since the price points are different. And I also can't do a Max(Price) since the high price might be the one expiring or it could be the new one. My data would look like this:
Supplier    Product  Price  Effective Date  Expiration Date
Supplier 1  A        800    04-01-2121      12-31-2023
Supplier 1  A        1000   01-01-2121      03-31-2023
Supplier 1  B        500    04-01-2121      12-31-2023
Supplier 1  B        400    01-01-2121      03-31-2023
Supplier 2  D        200    01-01-2121      12-31-2023
Supplier 2  C        600    01-01-2121      12-31-2023
The result I am trying to get is below:
Supplier    Product  Price  Effective Date  Expiration Date
Supplier 1  A        800    04-01-2121      12-31-2023
Supplier 1  B        500    04-01-2121      12-31-2023
Supplier 2  D        200    01-01-2121      12-31-2023
Supplier 2  C        600    01-01-2121      12-31-2023
Any ideas?
To get the row with the greatest expirationdate for each product, we can number the rows within each product in descending order of expirationdate and keep the row numbered 1, or we can use a correlated subquery to select the rows where expirationdate = max(expirationdate) within the product group.
The first approach is usually more efficient, but if your DBMS doesn't support row_number(), you can use the second approach.
Schema:
create table mydata(Supplier varchar(30),Product varchar(30), Price int,EffectiveDate date, ExpirationDate date);
insert into mydata values('Supplier 1', 'A', 800 ,'04-01-2121', '12-31-2023');
insert into mydata values('Supplier 1', 'A', 1000 ,'01-01-2121', '03-31-2023');
insert into mydata values('Supplier 1', 'B', 500 ,'04-01-2121', '12-31-2023');
insert into mydata values('Supplier 1', 'B', 400 ,'01-01-2121', '03-31-2023');
insert into mydata values('Supplier 2', 'D', 200 ,'01-01-2121', '12-31-2023');
insert into mydata values('Supplier 2', 'C', 600 ,'01-01-2121', '12-31-2023');
Query#1
WITH cte
     AS (SELECT supplier,
                product,
                price,
                effectivedate,
                expirationdate,
                ROW_NUMBER ()
                   OVER (PARTITION BY product
                         ORDER BY expirationdate DESC) rn
           FROM mydata)
SELECT supplier,
       product,
       price,
       effectivedate,
       expirationdate
  FROM cte
 WHERE rn = 1
Output:
supplier    product  price  effectivedate  expirationdate
Supplier 1  A        800    2121-04-01     2023-12-31
Supplier 1  B        500    2121-04-01     2023-12-31
Supplier 2  C        600    2121-01-01     2023-12-31
Supplier 2  D        200    2121-01-01     2023-12-31
Query#2 for older versions of the DBMS:
SELECT supplier,
       product,
       price,
       effectivedate,
       expirationdate
  FROM mydata m
 WHERE expirationdate =
       (SELECT MAX(expirationdate) FROM mydata md WHERE m.product = md.product)
Output:
supplier    product  price  effectivedate  expirationdate
Supplier 2  D        200    2121-01-01     2023-12-31
Supplier 2  C        600    2121-01-01     2023-12-31
Supplier 1  B        500    2121-04-01     2023-12-31
Supplier 1  A        800    2121-04-01     2023-12-31
It looks like you can accomplish what you need with a simple row_number(): number the rows of each product in descending order of ExpirationDate and keep the first row in each group (your_table stands for your table name).
select Supplier, Product, Price, EffectiveDate, ExpirationDate
from (
    select Supplier, Product, Price, EffectiveDate, ExpirationDate,
           row_number() over (partition by Product order by ExpirationDate desc) Seq
    from your_table
) t
where Seq = 1
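The two approaches discussed above (row_number() versus a correlated max subquery) can be compared side by side with a minimal sketch. It assumes Python's built-in sqlite3 module (SQLite 3.25 or later) and stores the dates as ISO yyyy-mm-dd text so that they sort correctly as strings; an ORDER BY product is added only to make the two results comparable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mydata (supplier TEXT, product TEXT, price INTEGER, "
    "effectivedate TEXT, expirationdate TEXT)"
)
conn.executemany(
    "INSERT INTO mydata VALUES (?, ?, ?, ?, ?)",
    [
        ("Supplier 1", "A", 800, "2121-04-01", "2023-12-31"),
        ("Supplier 1", "A", 1000, "2121-01-01", "2023-03-31"),
        ("Supplier 1", "B", 500, "2121-04-01", "2023-12-31"),
        ("Supplier 1", "B", 400, "2121-01-01", "2023-03-31"),
        ("Supplier 2", "D", 200, "2121-01-01", "2023-12-31"),
        ("Supplier 2", "C", 600, "2121-01-01", "2023-12-31"),
    ],
)

# Approach 1: number rows per product, latest expiration first, keep row 1.
by_row_number = conn.execute("""
WITH cte AS (
    SELECT supplier, product, price, effectivedate, expirationdate,
           ROW_NUMBER() OVER (PARTITION BY product ORDER BY expirationdate DESC) AS rn
    FROM mydata
)
SELECT supplier, product, price, effectivedate, expirationdate
FROM cte
WHERE rn = 1
ORDER BY product
""").fetchall()

# Approach 2: correlated subquery on the latest expiration date per product.
by_subquery = conn.execute("""
SELECT supplier, product, price, effectivedate, expirationdate
FROM mydata AS m
WHERE expirationdate =
      (SELECT MAX(expirationdate) FROM mydata AS md WHERE md.product = m.product)
ORDER BY product
""").fetchall()

for row in by_row_number:
    print(row)
```

One difference worth keeping in mind: if two rows of the same product share the latest expiration date, the subquery version returns both, while row_number() keeps only one of them.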

Closing balance of the previous day as an Opening balance of today

I am developing a database application for a small electronics business. I need a SQL query which takes the closing balance of previous day as an opening balance of current day. I have following data tables
Expensis
ExpenseID Date Expense
1 2019-03-01 2,000
2 2019-03-02 1,000
3 2019-03-03 500
Income
IncomeID Date Income
1 2019-03-01 10,000
2 2019-03-02 13,000
3 2019-03-03 10,000
Required result
Date Opening Balance Income Expense Closing Balance
2019-03-01 0 10,000 2,000 8,000
2019-03-02 8,000 13,000 1,000 20,000
2019-03-03 20,000 10,000 500 29,500
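You can use sum as a cumulative window aggregate to get the closing balance and derive the opening balance from it (the lag window function cannot be used in SQL Server 2008). Note that sum(...) over (order by ...) itself requires SQL Server 2012 or later.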
with Expensis( ExpenseID, Date, Expense ) as
(
select 1, '2019-03-01', 2000 union all
select 2, '2019-03-02', 1000 union all
select 3, '2019-03-03', 500
), Income( IncomeID, Date, Income ) as
(
select 1, '2019-03-01', 10000 union all
select 2, '2019-03-02', 13000 union all
select 3, '2019-03-03', 10000
), t as
(
select i.date,
i.income,
e.expense,
sum(i.income-e.expense) over (order by i.date) as closing_balance
from income i
join expensis e on e.date = i.date
)
select date,
( closing_balance - income + expense ) as opening_balance,
income, expense, closing_balance
from t;
date opening balance income expense closing balance
---------- --------------- ------ ------- ---------------
2019-03-01 0 10000 2000 8000
2019-03-02 8000 13000 1000 20000
2019-03-03 20000 10000 500 29500
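The running-total idea can be reproduced outside SQL Server with a minimal sketch. It assumes Python's built-in sqlite3 module (SQLite 3.25 or later) and simplified two-column versions of the Income and Expensis tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE income (date TEXT, income INTEGER)")
conn.execute("CREATE TABLE expensis (date TEXT, expense INTEGER)")
conn.executemany(
    "INSERT INTO income VALUES (?, ?)",
    [("2019-03-01", 10000), ("2019-03-02", 13000), ("2019-03-03", 10000)],
)
conn.executemany(
    "INSERT INTO expensis VALUES (?, ?)",
    [("2019-03-01", 2000), ("2019-03-02", 1000), ("2019-03-03", 500)],
)

query = """
SELECT date,
       -- Opening balance = closing balance with today's movement undone.
       closing_balance - income + expense AS opening_balance,
       income, expense, closing_balance
FROM (
    SELECT i.date, i.income, e.expense,
           -- Running total of the daily net amount up to and including this day.
           SUM(i.income - e.expense) OVER (ORDER BY i.date) AS closing_balance
    FROM income AS i
    JOIN expensis AS e ON e.date = i.date
) AS t
ORDER BY date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because the opening balance is derived from the same running total, no self-join to the previous day is needed. The inner join does assume that every date appears in both tables.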
Here is one way you could do it.
You have to treat income and expenses differently (expenses are entered as negative amounts):
WITH INCOME AS
(
SELECT '2018-01-05' AS DT, 200 AS INC, 1 AS TP
UNION ALL
SELECT '2018-01-06' AS DT, 300 AS INC, 1 AS TP
UNION ALL
SELECT '2018-01-07' AS DT, 400 AS INC, 1 AS TP
)
, EXPENSES AS
(
SELECT '2018-01-05' AS DT, -100 AS EXPS, 2 AS TP
UNION ALL
SELECT '2018-01-06' AS DT, -500 AS EXPS, 2 AS TP
UNION ALL
SELECT '2018-01-07' AS DT, -30 AS EXPS, 2 AS TP
)
, UN AS
(
SELECT * FROM INCOME
UNION ALL
SELECT * FROM EXPENSES
)
SELECT *, [1]+[2] AS END_BALANCE FROM UN
PIVOT
(
SUM(INC)
FOR TP IN ([1],[2])
) AS P

Incremental count

I have a table with a list of Customer Numbers and Order Dates, and I want to add a count against each Customer Number, restarting from 1 each time the customer number changes. I've sorted the table into Customer then Date order and need to add an order count column.
CASE WHEN 'Customer Number' on This row = 'Customer Number' on Previous Row then ( Count = Count on Previous Row + 1 )
Else Count = 1
What is the best way to approach this?
Customer and Dates in Customer then Date order:
Customer Date Count
0001 01/05/18 1
0001 02/05/18 2
0001 03/05/18 3
0002 03/05/18 1 <- back to one here as Customer changed
0002 04/05/18 2
0003 05/05/18 1 <- back to one again
I've just tried COUNT(*) OVER (PARTITION BY Customer) as COUNT, but for some reason it doesn't start from 1 when the Customer changes.
It's hard to tell what you want, but "to add a count against each Customer number, restarting from 1 each time the customer number changes" sounds as if you simply want:
count(*) over (partition by customer_number)
or maybe that should be the count "up-to" the date of the row:
count(*) over (partition by customer_number order by order_date)
It sounds like you just want an analytic row_number() call:
select customer_number,
order_date,
row_number() over (partition by customer_number order by order_date) as num
from your_table
order by customer_number,
order_date
Using an analytic count also works, as @horse_with_no_name suggested:
count(*) over (partition by customer_number order by order_date) as num
Quick demo showing both, with your sample data in a CTE:
with your_table (customer_number, order_date) as (
select '0001', date '2018-05-01' from dual
union all select '0001', date '2018-05-03' from dual
union all select '0001', date '2018-05-02' from dual
union all select '0002', date '2018-05-03' from dual
union all select '0002', date '2018-05-04' from dual
union all select '0003', date '2018-05-05' from dual
)
select customer_number,
order_date,
row_number() over (partition by customer_number order by order_date) as num1,
count(*) over (partition by customer_number order by order_date) as num2
from your_table
order by customer_number,
order_date
/
CUST ORDER_DATE NUM1 NUM2
---- ---------- ---------- ----------
0001 2018-05-01 1 1
0001 2018-05-02 2 2
0001 2018-05-03 3 3
0002 2018-05-03 1 1
0002 2018-05-04 2 2
0003 2018-05-05 1 1
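The two columns agree here because no customer has two orders on the same date. A minimal sketch of what happens with a tie follows, assuming Python's built-in sqlite3 module (SQLite 3.25 or later) and a made-up duplicate order date for customer 0001.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (customer_number TEXT, order_date TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [
        ("0001", "2018-05-01"),
        ("0001", "2018-05-02"),
        ("0001", "2018-05-02"),  # hypothetical second order on the same day
        ("0002", "2018-05-03"),
    ],
)

query = """
SELECT customer_number, order_date,
       ROW_NUMBER() OVER (PARTITION BY customer_number ORDER BY order_date) AS num1,
       COUNT(*)     OVER (PARTITION BY customer_number ORDER BY order_date) AS num2
FROM your_table
ORDER BY customer_number, order_date, num1
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With an ORDER BY and no explicit frame, count(*) uses the default RANGE frame, which includes all rows tied on order_date, so both same-day orders get 3. row_number() still gives each row a distinct number, which is usually what an incremental counter needs.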