SQL interview question: select, join, grouping

In an interview I got this question:
Which products did clients buy before their first order of brand "Brand 1"? Select the top 5 by number of orders.
Tables:
Items:
RezonItemID;BrandName
5555613;Brand 1
2315946;Brand 2
9132648;Brand 3
3125847;Brand 1
3126548;Brand 5
Orders:
ClientID;ClientOrderID;RezonItemID;FactMoment
00611847;4562145;5555613;2021-01-09
00798451;7987465;1321321;2021-08-10
00914751;3154844;9132648;2021-07-01
00975418;9797451;1312125;2021-09-09
00978461;9413235;9754512;2021-10-29
My solution:
WITH first_order AS (
SELECT ClientID, MIN(FactMoment) as o_date
FROM orders
JOIN items
USING(RezonItemID)
WHERE BrandName = 'Brand 1'
GROUP BY ClientID
)
SELECT RezonItemID, COUNT(*) AS n_orders
FROM orders
JOIN items
USING(RezonItemID)
JOIN first_order
USING(ClientID)
WHERE FactMoment < o_date
GROUP BY RezonItemID
ORDER BY n_orders DESC
LIMIT 5
Is it possible to solve this with window functions? Is there a better solution?

Given the following tables:
RezonItemID;BrandName
5555613;Brand 1
2315946;Brand 2
9132648;Brand 3
3125847;Brand 1
3126548;Brand 5

ClientID;ClientOrderID;RezonItemID;FactMoment
00611847;4562145;5555613;2021-01-09
00798451;7987465;1321321;2021-08-10
00914751;3154844;9132648;2021-07-01
00975418;9797451;1312125;2021-09-09
00978461;9413235;9754512;2021-10-29
It seems that if the question is "Which products did the clients buy before the first order for an item of Brand 1?", an SQL query may not be necessary. Assuming that FactMoment is the timestamp of the order, we can see that the first order has the earliest date (2021-01-09) and the RezonItemID 5555613. That item has the brand "Brand 1".
So the answer would be that no items were purchased before the first purchase of an item with BrandName = 'Brand 1'.

This is a puzzling question. The sample data is not particularly useful, since it yields no testable results.
If you want to use window functions, that's certainly possible.
The following yields no rows on the sample data and should work, but without proper test data it's hard to be sure!
Note that USING is not supported by some databases (SQL Server, for example), so an explicit ON condition is more portable.
select top(5) BrandName from (
select o.ClientID, i.BrandName, o.FactMoment,
Min(case when i.BrandName='Brand 1' then FactMoment end) over() earliest,
Count(*) over(partition by ClientID) qty
from Orders o left join Items i on i.RezonItemID=o.RezonItemID
)o
where FactMoment<earliest
order by qty desc
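Since the sample data returns no rows, here is a minimal sketch of a per-client window-function variant that can actually be checked. It assumes that "first order" means each client's own first Brand 1 order (as in the question's CTE), runs on SQLite 3.25+ through Python's sqlite3 module, and uses made-up item IDs, client IDs and dates that are not from the original post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Items  (RezonItemID INTEGER, BrandName TEXT);
CREATE TABLE Orders (ClientID TEXT, RezonItemID INTEGER, FactMoment TEXT);
""")

# Hypothetical test data (not from the original post).
conn.executemany("INSERT INTO Items VALUES (?, ?)",
                 [(1, "Brand 1"), (2, "Brand 2"), (3, "Brand 3")])
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)", [
    ("A", 2, "2021-01-01"), ("A", 3, "2021-01-02"),
    ("A", 1, "2021-01-05"), ("A", 2, "2021-01-10"),  # last row is after A's first Brand 1 order
    ("B", 2, "2021-02-01"), ("B", 1, "2021-02-03"),
    ("C", 3, "2021-03-01"),                          # C never ordered Brand 1
])

query = """
WITH tagged AS (
    SELECT o.ClientID, o.RezonItemID, o.FactMoment,
           -- date of each client's first Brand 1 order, NULL if there is none
           MIN(CASE WHEN i.BrandName = 'Brand 1' THEN o.FactMoment END)
               OVER (PARTITION BY o.ClientID) AS first_brand1
    FROM Orders o
    LEFT JOIN Items i ON i.RezonItemID = o.RezonItemID
)
SELECT RezonItemID, COUNT(*) AS n_orders
FROM tagged
WHERE FactMoment < first_brand1      -- a NULL first_brand1 filters the row out
GROUP BY RezonItemID
ORDER BY n_orders DESC, RezonItemID
LIMIT 5
"""
rows = conn.execute(query).fetchall()
print(rows)
```

On this data item 2 is counted twice (clients A and B) and item 3 once; client C is dropped because the comparison with NULL is never true.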

Related

How to consecutively count everything greater than or equal to itself in SQL?

Let's say if I have a table that contains Equipment IDs of equipments for each Equipment Type and Equipment Age, how can I do a Count Distinct of Equipment IDs that have at least that Equipment Age.
For example, let's say this is all the data we have:
equipment_type  equipment_id  equipment_age
Screwdriver     A123          1
Screwdriver     A234          2
Screwdriver     A345          2
Screwdriver     A456          2
Screwdriver     A567          3
I would like the output to be:
equipment_type  equipment_age  count_of_equipment_at_least_this_age
Screwdriver     1              5
Screwdriver     2              4
Screwdriver     3              1
The reason is that there are 5 screwdrivers that are at least 1 day old, 4 screwdrivers that are at least 2 days old, and only 1 screwdriver that is at least 3 days old.
So far I have only been able to count the equipment that falls within each equipment_age (with the query shown below), but not "at least that equipment_age".
SELECT
equipment_type,
equipment_age,
COUNT(DISTINCT equipment_id) as count_of_equipments
FROM equipment_table
GROUP BY 1, 2

Consider the join-less solution below:
select distinct
equipment_type,
equipment_age,
count(*) over equipment_at_least_this_age as count_of_equipment_at_least_this_age
from equipment_table
window equipment_at_least_this_age as (
partition by equipment_type
order by equipment_age
range between current row and unbounded following
)
If applied to the sample data in your question, the output matches the desired result: counts of 5, 4 and 1 for ages 1, 2 and 3.
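The window frame can be checked against the question's sample rows with a small sketch. It runs on SQLite (3.25 or later) through Python's sqlite3 module, which is an assumption of this example and not the asker's environment, and it writes the window inline instead of using a named WINDOW clause. Note that COUNT(*) counts rows, so it equals the distinct count only if each equipment_id appears once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE equipment_table (
    equipment_type TEXT, equipment_id TEXT, equipment_age INTEGER)""")
conn.executemany("INSERT INTO equipment_table VALUES (?, ?, ?)", [
    ("Screwdriver", "A123", 1),
    ("Screwdriver", "A234", 2),
    ("Screwdriver", "A345", 2),
    ("Screwdriver", "A456", 2),
    ("Screwdriver", "A567", 3),
])

query = """
SELECT DISTINCT
    equipment_type,
    equipment_age,
    -- RANGE frames include all peers of the current row (same age),
    -- plus every row with a greater age
    COUNT(*) OVER (
        PARTITION BY equipment_type
        ORDER BY equipment_age
        RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    ) AS count_of_equipment_at_least_this_age
FROM equipment_table
ORDER BY equipment_type, equipment_age
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

RANGE (rather than ROWS) matters here: with ROWS the three age-2 rows would get different counts, and DISTINCT would no longer collapse them to one line.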

Use a self-join approach:
SELECT
e1.equipment_type,
e1.equipment_age,
COUNT(DISTINCT e2.equipment_id) AS count_of_equipments
FROM equipment_table e1
INNER JOIN equipment_table e2
ON e2.equipment_type = e1.equipment_type AND
e2.equipment_age >= e1.equipment_age
GROUP BY 1, 2
ORDER BY 1, 2;

GROUP BY restricts the scope of COUNT to the rows in the group, i.e. it will not let you reach other rows (rows with equipment_age greater than that of the current group). So you need a subquery or windowing functions to get those. One way:
SELECT
equipment_type,
equipment_age,
(Select COUNT(*)
from equipment_table cnt
where cnt.equipment_type = a.equipment_type
AND cnt.equipment_age >= a.equipment_age
) as count_of_equipments
FROM equipment_table a
GROUP BY 1, 2
I am not sure if your environment supports this syntax, though. If not, let us know and we will find another way.

Case Statement for multiple criteria

I would like to ignore some of the results of my query because, for all intents and purposes, they are duplicates. Because of the way the request was made we need to use this hierarchy, and although we are seeing different Company_Name values, we need to ignore one of the results.
Query:
SELECT
COUNT(DISTINCT A12.Company_name) AS Customer_Name_Count,
Company_Name,
SUM(Total_Sales) AS Total_Sales
FROM
some_table AS A12
GROUP BY
2
ORDER BY
3 ASC, 2 ASC
This code omits half a dozen joins and WHERE clauses that are not germane to this question.
Results:
   Customer_Name_Count  Company_Name        Total_Sales
-------------------------------------------------------
1  3                    Blockbuster         1,000
2  6                    Jimmy's Bar         1,500
3  6                    Jimmy's Restaurant  1,500
4  9                    Impala Hotel        2,000
5  12                   Sports Drink        2,500
In the above set, we can see that numbers 2 & 3 have the same count and the same total_sales number and similar company names. Is there a way to create a case statement that takes these 3 factors into consideration and then drops one or the other for Jimmy's enterprises? The other issue is that this has to be variable as there are other instances where this happens. And I would only want this to happen if the count and sales number match each other with a similar name in the company name.
Desired result:
   Customer_Name_Count  Company_Name        Total_Sales
-------------------------------------------------------
1  3                    Blockbuster         1,000
2  6                    Jimmy's Bar         1,500
3  9                    Impala Hotel        2,000
4  12                   Sports Drink        2,500

It looks like the other answers are accurate if you assume that the Company_IDs are the same for both.
If the Company_IDs are different for Jimmy's Bar and Jimmy's Restaurant, you can use something like this. I suggest you get functional users involved and do some data clean-up, or else you'll be maintaining this every time the issue arises:
SELECT
COUNT(DISTINCT CASE
WHEN A12.Company_Name = 'Name2' THEN 'Name1'
ELSE A12.Company_Name
END) AS Customer_Name_Count
,CASE
WHEN A12.Company_Name = 'Name2' THEN 'Name1'
ELSE A12.Company_Name
END AS Company_Name
,SUM(A12.Total_Sales) AS Total_Sales
FROM some_table AS A12
GROUP BY CASE
WHEN A12.Company_Name = 'Name2' THEN 'Name1'
ELSE A12.Company_Name
END

Your problem is that the joins you are using are multiplying the number of rows. Somewhere along the way, multiple names are associated with exactly the same entity (which is why the numbers are the same). You can fix this by aggregating by the right id:
SELECT COUNT(DISTINCT A12.Company_name) AS Customer_Name_Count,
MAX(Company_Name) as Company_Name,
SUM(Total_Sales) AS Total_Sales
FROM some_table AS A12
GROUP BY Company_id -- I'm guessing the column is something like this
ORDER BY 3 ASC, 2 ASC;
This might actually overstate the sales (I don't know). Better would be fixing the join so it only returned one name. One possibility is that it is a type-2 dimension, meaning that there is a time component for values that change over time. You may need to restrict the join to a single time period.

You need a function (user-defined; GetCommonName below is one you would have to write) that returns a common name for the companies, and then use DISTINCT:
SELECT DISTINCT
Customer_Name_Count,
dbo.GetCommonName(Company_Name) as Company_Name,
Total_Sales
FROM dbo.theTable

You can use the ROW_NUMBER window function to number the rows within each (Customer_Name_Count, Total_Sales) pair and then keep only rn = 1:
SELECT * FROM (
SELECT *,ROW_NUMBER() OVER(PARTITION BY Customer_Name_Count,Total_Sales ORDER BY Company_Name) rn
FROM (
SELECT
COUNT(DISTINCT A12.Company_name) AS Customer_Name_Count,
Company_Name,
SUM(Total_Sales) AS Total_Sales
FROM
some_table AS A12
GROUP BY
Company_Name
)t1
)t2
WHERE rn = 1
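As a rough check of the ROW_NUMBER idea, the sketch below applies it to the aggregated rows shown in the question. The table name summary and its lowercase column names are hypothetical stand-ins for the aggregated subquery, and SQLite (3.25+) via Python's sqlite3 is assumed only so the example is runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "summary" is a hypothetical stand-in for the aggregated subquery.
conn.execute("""CREATE TABLE summary (
    customer_name_count INTEGER, company_name TEXT, total_sales INTEGER)""")
conn.executemany("INSERT INTO summary VALUES (?, ?, ?)", [
    (3, "Blockbuster", 1000),
    (6, "Jimmy's Bar", 1500),
    (6, "Jimmy's Restaurant", 1500),
    (9, "Impala Hotel", 2000),
    (12, "Sports Drink", 2500),
])

query = """
SELECT customer_name_count, company_name, total_sales
FROM (
    SELECT *,
           -- number the rows inside each (count, sales) pair alphabetically
           ROW_NUMBER() OVER (
               PARTITION BY customer_name_count, total_sales
               ORDER BY company_name
           ) AS rn
    FROM summary
) ranked
WHERE rn = 1
ORDER BY total_sales
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

This keeps "Jimmy's Bar" and drops "Jimmy's Restaurant". Be aware that it does not look at the names at all: two unrelated companies that happen to share the same count and sales total would also be collapsed into one row.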

SQL counting query

Sorry if this is a basic question.
Basically, I have a table that is as follows, below is a basic sample
store-ProdCode-result
13p I10x 5
13p I20x 7
13p I30x 8
14a K38z 23
17a K38z 23
My data set has nearly 100,000 records.
What I'm trying to do is, for every store find the top 10 prodCode.
I am unsure of how to do this but what I tried was:
select s_code as store, prod_code,count (prod_code)
from top10_secondary
where prod_code is not null
group by store,prod_code
order by count(prod_code) desc limit 10
This is giving me something completely different and I'm unsure how to go about achieving my final result.
All help is appreciated.
Thanks
The expected output should be: for every store(s_code) display the top 10 prodcode
so:
store--prodcode--result
1a abc 5
1a abd 4
2a dgf 1
2a ldk 6
.(10 times until next store code)

You can use the table twice in the FROM clause: once for the data, and once to count how many records have a lower result for that store.
SELECT a.s_code, a.prod_code, count(*)
FROM top10_secondary a
LEFT OUTER JOIN top10_secondary b
ON a.s_code = b.s_code
AND b.result < a.result
GROUP BY a.s_code, a.prod_code
HAVING count(*) < 10
With this technique, though, you may get more than 10 records per store if the 10th result value exists multiple times, because the rule is simply "include the record as long as there are fewer than 10 records with result values lower than mine".
It looks like in your case "result" is a ranking, so the values would not be duplicated per store.

This is a good case for window functions.
SELECT
s_code,
prod_code,
prod_count
FROM
(
SELECT
s_code,
prod_code,
prod_count,
RANK() OVER (PARTITION BY s_code ORDER BY prod_Count DESC) as prod_rank
FROM
(SELECT s_code, prod_code, count(prod_code) AS prod_count FROM top10_secondary GROUP BY s_code, prod_code) t1
) t2
WHERE prod_rank <= 10
The innermost query gets the count of each product at each store. The middle query determines the rank of those products within each store based on that count. The outermost query then limits the results based on that rank.
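The three query levels described above can be tried on a few rows with the sketch below. It assumes SQLite 3.25+ through Python's sqlite3, uses invented store and product codes, and keeps the top 2 products per store instead of the top 10 so the sample stays short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE top10_secondary (s_code TEXT, prod_code TEXT)")
# Hypothetical rows: one row per sale of a product in a store.
conn.executemany("INSERT INTO top10_secondary VALUES (?, ?)", [
    ("13p", "I10x"), ("13p", "I10x"), ("13p", "I10x"),
    ("13p", "I20x"), ("13p", "I20x"),
    ("13p", "I30x"),
    ("14a", "K38z"), ("14a", "K38z"),
    ("14a", "K40z"),
])

TOP_N = 2  # the question asks for 10; 2 keeps the example small

query = """
SELECT s_code, prod_code, prod_count
FROM (
    SELECT s_code, prod_code, prod_count,
           RANK() OVER (PARTITION BY s_code ORDER BY prod_count DESC) AS prod_rank
    FROM (
        -- innermost level: how often each product occurs per store
        SELECT s_code, prod_code, COUNT(prod_code) AS prod_count
        FROM top10_secondary
        GROUP BY s_code, prod_code
    ) t1
) t2
WHERE prod_rank <= ?
ORDER BY s_code, prod_rank, prod_code
"""
rows = conn.execute(query, (TOP_N,)).fetchall()
for row in rows:
    print(row)
```

RANK gives tied products the same rank, so a store can return more than TOP_N rows when counts tie at the cutoff; ROW_NUMBER would cap the result at exactly TOP_N rows per store at the cost of an arbitrary choice among ties.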

How can I SELECT the max row in a table SQL?

I have a little problem.
My table is:
Bill | Product ID | Units Sold
-----|------------|-----------
1    | 10         | 25
1    | 20         | 30
2    | 30         | 11
3    | 40         | 40
3    | 20         | 20
I want to SELECT the product which has sold the most units; in this sample case, it should be the product with ID 20, showing 50 units.
I have tried this:
SELECT
SUM(pv."Units sold")
FROM
"Products" pv
GROUP BY
pv.Product ID;
But this shows all the products, how can I select only the product with the most units sold?

Leaving aside for the moment the possibility of having multiple products with the same number of units sold, you can always sort your results by the sum, highest first, and take the first row:
SELECT pv."Product ID", SUM(pv."Units sold")
FROM "Products" pv
GROUP BY pv."Product ID"
ORDER BY SUM(pv."Units sold") DESC
LIMIT 1
I'm not quite sure whether the double-quote syntax for column and table names will work - exact syntax will depend on your specific RDBMS.
Now, if you do want to get multiple rows when more than one product has the same sum, then the SQL will become a bit more complicated:
SELECT pv.`Product ID`, SUM(pv.`Units sold`)
FROM `Products` pv
GROUP BY pv.`Product ID`
HAVING SUM(pv.`Units sold`) = (
select max(sums)
from (
SELECT SUM(pv2.`Units sold`) as "sums"
FROM `Products` pv2
GROUP BY pv2.`Product ID`
) as subq
)
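A small runnable sketch of the tie-aware version, using the sample rows from the question. To sidestep the quoting differences between databases, it assumes the columns are renamed to bill, product_id and units_sold in a table called products (names chosen for this example), and it runs on SQLite through Python's sqlite3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names are simplified for this sketch (no spaces, no quoting needed).
conn.execute("CREATE TABLE products (bill INTEGER, product_id INTEGER, units_sold INTEGER)")
conn.executemany("INSERT INTO products VALUES (?, ?, ?)", [
    (1, 10, 25),
    (1, 20, 30),
    (2, 30, 11),
    (3, 40, 40),
    (3, 20, 20),
])

query = """
SELECT product_id, SUM(units_sold) AS total_units
FROM products
GROUP BY product_id
-- keep every product whose total equals the highest total
HAVING SUM(units_sold) = (
    SELECT MAX(total_units)
    FROM (
        SELECT SUM(units_sold) AS total_units
        FROM products
        GROUP BY product_id
    ) AS per_product
)
ORDER BY product_id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

With the sample data only product 20 (30 + 20 = 50 units) is returned; if a second product also reached 50, both rows would come back, which ORDER BY with LIMIT 1 cannot do.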

SELECT pv."Product ID", SUM(pv."Units sold") AS total_units
FROM "Products" pv
GROUP BY pv."Product ID"
ORDER BY total_units DESC
LIMIT 1
In short: ORDER BY combined with LIMIT 1.

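Another option is the MAX function.
Here is the general syntax of MAX:
SELECT MAX(ID) AS id
FROM Products;
and in your case:
SELECT MAX("Units Sold") FROM "Products";
Note that on its own this returns the largest value in a single row (40 in the sample data), not the highest per-product total, so it has to be combined with SUM and GROUP BY as in the other answers.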

Oracle SQL find popular query

I have a table called Questionnaire with columns named ID, Newspaper and CreditCards. I need to output the newspaper that is most popular among the IDs that have at least 3 credit cards.
Example:
ID Credit Cards Newspaper
----------------------------------------
10354 3 The Independent
12154 4 The Independent
11354 2 The Times
14587 3 The Daily Mail
19874 5 The Sunday news
16847 1 The Independent
Can you please help with an SQL query that produces the output described above?

select *
from (
select newspaper,
rank() over (order by count(*) desc) as rnk
from Questionnaire
where credit_cards >= 3
group by newspaper
) t
where rnk = 1
If two newspapers have the same "popularity" both will be returned.
SQLFiddle demo: http://sqlfiddle.com/#!4/16dcb/1
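For readers without Oracle at hand, here is a sketch of the same ranking idea on the question's sample rows, run on SQLite (3.25+) through Python's sqlite3. It is an adaptation and not the answer's exact query: the counts are computed in a derived table first and ranked in a second step, and the column is assumed to be named credit_cards as in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Questionnaire (id INTEGER, credit_cards INTEGER, newspaper TEXT)")
conn.executemany("INSERT INTO Questionnaire VALUES (?, ?, ?)", [
    (10354, 3, "The Independent"),
    (12154, 4, "The Independent"),
    (11354, 2, "The Times"),
    (14587, 3, "The Daily Mail"),
    (19874, 5, "The Sunday news"),
    (16847, 1, "The Independent"),
])

query = """
SELECT newspaper, readers
FROM (
    SELECT newspaper, readers,
           RANK() OVER (ORDER BY readers DESC) AS rnk
    FROM (
        -- count only respondents with at least 3 credit cards
        SELECT newspaper, COUNT(*) AS readers
        FROM Questionnaire
        WHERE credit_cards >= 3
        GROUP BY newspaper
    ) counts
) ranked
WHERE rnk = 1
ORDER BY newspaper
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Two of the four qualifying respondents read The Independent, so it is the only row with rank 1; the respondent with a single credit card does not count toward its total.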

If you want only the most popular newspaper(s), this query will do it:
select
newspaper
, count(1) as fct
from
Questionnaire
where
CreditCards >= 3
group by newspaper
having count(1) =
(
select
max(ct)
from
(
select
newspaper
,count(1) as ct
from
Questionnaire
where
CreditCards >= 3
group by newspaper
)
)
/
