Case Statement for multiple criteria - sql

I would like to ignore some of the results of my query because, for all intents and purposes, they are duplicates. Because of the way the request was made, we need to use this hierarchy, and although the rows show different Company_Name values, we need to ignore one of them.
Query:
SELECT
COUNT(DISTINCT A12.Company_name) AS Customer_Name_Count,
Company_Name,
SUM(Total_Sales) AS Total_Sales
FROM
some_table AS A12
GROUP BY
2
ORDER BY
3 ASC, 2 ASC
This code omits half a dozen joins and WHERE clauses that are not germane to this question.
Results:
Customer_Name_Count Company_Name Total_Sales
-------------------------------------------------------------
1 3 Blockbuster 1,000
2 6 Jimmy's Bar 1,500
3 6 Jimmy's Restaurant 1,500
4 9 Impala Hotel 2,000
5 12 Sports Drink 2,500
In the above set, we can see that numbers 2 & 3 have the same count and the same total_sales number and similar company names. Is there a way to create a case statement that takes these 3 factors into consideration and then drops one or the other for Jimmy's enterprises? The other issue is that this has to be variable as there are other instances where this happens. And I would only want this to happen if the count and sales number match each other with a similar name in the company name.
Desired result:
Customer_Name_Count Company_Name Total_Sales
--------------------------------------------------------------
1 3 Blockbuster 1,000
2 6 Jimmy's Bar 1,500
3 9 Impala Hotel 2,000
4 12 Sports Drink 2,500

The other answers look accurate, assuming the Company_IDs are the same for both.
If the Company_IDs are different for Jimmy's Bar and Jimmy's Restaurant, you can use something like this. I suggest you get functional users involved and do some data clean-up; otherwise you'll be maintaining this query every time the issue arises:
SELECT
COUNT(DISTINCT CASE
WHEN A12.Company_Name = 'Name2' THEN 'Name1'
ELSE A12.Company_Name
END) AS Customer_Name_Count
,CASE
WHEN A12.Company_Name = 'Name2' THEN 'Name1'
ELSE A12.Company_Name
END AS Company_Name
,SUM(A12.Total_Sales) AS Total_Sales
FROM some_table AS A12
GROUP BY CASE
WHEN A12.Company_Name = 'Name2' THEN 'Name1'
ELSE A12.Company_Name
END
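As a rough check of the CASE approach, the sketch below runs it through Python's built-in sqlite3 module. The rows and sales figures are made up, and treating "Jimmy's Restaurant" as a variant of "Jimmy's Bar" is an assumption for illustration only.

```python
import sqlite3

# Made-up rows: the two Jimmy's names are assumed to be the same business.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (Company_Name TEXT, Total_Sales INTEGER)")
conn.executemany(
    "INSERT INTO some_table VALUES (?, ?)",
    [
        ("Blockbuster", 1000),
        ("Jimmy's Bar", 700),
        ("Jimmy's Restaurant", 800),
        ("Impala Hotel", 2000),
    ],
)

# Fold the variant name into the canonical one, then group on the folded name.
query = """
SELECT CASE
           WHEN A12.Company_Name = 'Jimmy''s Restaurant' THEN 'Jimmy''s Bar'
           ELSE A12.Company_Name
       END AS Company_Name,
       SUM(A12.Total_Sales) AS Total_Sales
FROM some_table AS A12
GROUP BY 1
ORDER BY 2, 1
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The two Jimmy's rows collapse into one group whose sales are summed, which is only what you want if the underlying rows are genuinely separate sales rather than join duplicates.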

Your problem is that the joins you are using are multiplying the number of rows. Somewhere along the way, multiple names are associated with exactly the same entity (which is why the numbers are the same). You can fix this by aggregating by the right id:
SELECT COUNT(DISTINCT A12.Company_name) AS Customer_Name_Count,
MAX(Company_Name) as Company_Name,
SUM(Total_Sales) AS Total_Sales
FROM some_table AS A12
GROUP BY Company_id -- I'm guessing the column is something like this
ORDER BY 3 ASC, 2 ASC;
This might actually overstate the sales (I don't know). Better would be fixing the join so it only returned one name. One possibility is that it is a type-2 dimension, meaning that there is a time component for values that change over time. You may need to restrict the join to a single time period.

You need a function that returns a common name for the companies, and then use DISTINCT:
SELECT DISTINCT
Customer_Name_Count,
dbo.GetCommonName(Company_Name) as Company_Name,
Total_Sales
FROM dbo.theTable

You can use ROW_NUMBER as a window function to number the rows within each Customer_Name_Count and Total_Sales pair, then keep only rn = 1:
SELECT * FROM (
SELECT *,ROW_NUMBER() OVER(PARTITION BY Customer_Name_Count,Total_Sales ORDER BY Company_Name) rn
FROM (
SELECT
COUNT(DISTINCT A12.Company_name) AS Customer_Name_Count,
Company_Name,
SUM(Total_Sales) AS Total_Sales
FROM
some_table AS A12
GROUP BY
Company_Name
)t1
)t2
WHERE rn = 1
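To see what this ROW_NUMBER filter keeps, here is a small sketch using Python's sqlite3 (window functions need SQLite 3.25 or later). The `results` table is a hypothetical stand-in for the aggregated output shown in the question. Note that the filter only compares the count and the sales total; it does not check that the names are similar.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding the aggregated rows from the question.
conn.execute(
    "CREATE TABLE results (Customer_Name_Count INTEGER, Company_Name TEXT, Total_Sales INTEGER)"
)
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [
        (3, "Blockbuster", 1000),
        (6, "Jimmy's Bar", 1500),
        (6, "Jimmy's Restaurant", 1500),
        (9, "Impala Hotel", 2000),
        (12, "Sports Drink", 2500),
    ],
)

# Number the rows inside each (count, sales) pair and keep the first by name.
query = """
SELECT Customer_Name_Count, Company_Name, Total_Sales
FROM (
    SELECT r.*,
           ROW_NUMBER() OVER (
               PARTITION BY Customer_Name_Count, Total_Sales
               ORDER BY Company_Name
           ) AS rn
    FROM results AS r
) AS ranked
WHERE rn = 1
ORDER BY Total_Sales, Company_Name
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```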


How can I count some values for data in a table based on the same key in another table in BigQuery?

I have one table like the one below. Each id is unique.
id       times_of_going_out
----------------------------
fef666   2
S335gg   1
9a2c50   1
and another table like this one ↓. In this second table the "id" is not unique; a single id can have several "category_name" values.
id       category_name           city
--------------------------------------
S335gg   Games & Game Supplies   tk
9a2c50   Telephone Companies     os
9a2c50   Recreation Centers      ky
fef666   Recreation Centers      ky
I want to find the difference between the destinations (category_name) of people who go out often (times_of_going_out > 5) and people who don't go out often (times_of_going_out <= 5).
** Both tables are a small sample of large tables.
 ・ Where do people who go out twice often go?
 ・ Where do people who go out 6 times often go?
Thank you
The expected result could be something like:
less than 5 | more than 5
top ten "category_name" for uids with "times_of_going_out" less than 5 times | top ten "category_name" for uids with "times_of_going_out" more than 5 times
Steps:
1. Combine the data and aggregate the total times_of_going_out per category.
2. Create the categories that you need: less than or equal to 5, and more than 5. If you don't need "equal to 5", you can adjust the code.
3. Rank both categories with dense_rank(). This produces a rank starting from 1 within each category, based on the total times_of_going_out.
4. Filter so that only the top 10 ranks are kept for both categories.
with main as (
select
category_name,
sum(coalesce(times_of_going_out,0)) as total_time_per_category
from table1 as t1
left join table2 as t2
on t1.id = t2.id
group by 1
),
category as (
select
*,
if(total_time_per_category > 5, 'more than 5', 'less than equal to 5') as is_more_than_5_times
from main
),
ranking_ as (
select *,
case when
is_more_than_5_times = 'more than 5' then
dense_rank() over (partition by is_more_than_5_times order by total_time_per_category desc)
else NULL
end AS rank_more_than_5,
case when
is_more_than_5_times = 'less than equal to 5' then
dense_rank() over (partition by is_more_than_5_times order by total_time_per_category)
else NULL
end AS rank_less_than_equal_5
from category
)
select
is_more_than_5_times,
string_agg(category_name,',') as list
from ranking_
where rank_less_than_equal_5 <=10 or rank_more_than_5 <= 10
group by 1
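The first two steps (join and aggregate, then bucket) can be tried locally with a small sketch. It uses Python's sqlite3 rather than BigQuery, so `if(...)` is written as a CASE expression. The first three ids are the sample from the question; the id `zz0001` with 6 outings is made up so that one category lands in the "more than 5" bucket.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id TEXT PRIMARY KEY, times_of_going_out INTEGER)")
conn.execute("CREATE TABLE table2 (id TEXT, category_name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [("fef666", 2), ("S335gg", 1), ("9a2c50", 1), ("zz0001", 6)],  # zz0001 is made up
)
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?, ?)",
    [
        ("S335gg", "Games & Game Supplies", "tk"),
        ("9a2c50", "Telephone Companies", "os"),
        ("9a2c50", "Recreation Centers", "ky"),
        ("fef666", "Recreation Centers", "ky"),
        ("zz0001", "Recreation Centers", "ky"),  # made-up row
    ],
)

# Step 1: total outings per category. Step 2: bucket each category by that total.
query = """
WITH main AS (
    SELECT t2.category_name,
           SUM(COALESCE(t1.times_of_going_out, 0)) AS total_time_per_category
    FROM table1 AS t1
    LEFT JOIN table2 AS t2 ON t1.id = t2.id
    GROUP BY t2.category_name
)
SELECT category_name,
       total_time_per_category,
       CASE WHEN total_time_per_category > 5 THEN 'more than 5'
            ELSE 'less than equal to 5'
       END AS is_more_than_5_times
FROM main
ORDER BY category_name
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Keep in mind that this buckets each category by its summed outings, which is the approach taken in the steps above; bucketing each id first and then counting categories per bucket would be a different query.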

How to consecutively count everything greater than or equal to itself in SQL?

Say I have a table that contains the Equipment ID of each piece of equipment, along with its Equipment Type and Equipment Age. How can I do a Count Distinct of the Equipment IDs that have at least a given Equipment Age?
For example, let's say this is all the data we have:
equipment_type   equipment_id   equipment_age
----------------------------------------------
Screwdriver      A123           1
Screwdriver      A234           2
Screwdriver      A345           2
Screwdriver      A456           2
Screwdriver      A567           3
I would like the output to be:
equipment_type   equipment_age   count_of_equipment_at_least_this_age
----------------------------------------------------------------------
Screwdriver      1               5
Screwdriver      2               4
Screwdriver      3               1
Reason is there are 5 screwdrivers that are at least 1 day old, 4 screwdrivers at least 2 days old and only 1 screwdriver at least 3 days old.
So far I have only been able to count the equipment that falls within each equipment_age (with the query shown below), but not "at least that equipment_age".
SELECT
equipment_type,
equipment_age,
COUNT(DISTINCT equipment_id) as count_of_equipments
FROM equipment_table
GROUP BY 1, 2
Consider the join-less solution below:
select distinct
equipment_type,
equipment_age,
count(*) over equipment_at_least_this_age as count_of_equipment_at_least_this_age
from equipment_table
window equipment_at_least_this_age as (
partition by equipment_type
order by equipment_age
range between current row and unbounded following
)
Applied to the sample data in your question, the output matches the desired result: counts of 5, 4 and 1 for ages 1, 2 and 3.
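A window query like this one can be checked against the sample rows. The sketch below uses Python's sqlite3 (SQLite 3.25 or later) instead of BigQuery, on the assumption that the named WINDOW clause and the RANGE frame behave the same way in both for this case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE equipment_table (equipment_type TEXT, equipment_id TEXT, equipment_age INTEGER)"
)
conn.executemany(
    "INSERT INTO equipment_table VALUES (?, ?, ?)",
    [
        ("Screwdriver", "A123", 1),
        ("Screwdriver", "A234", 2),
        ("Screwdriver", "A345", 2),
        ("Screwdriver", "A456", 2),
        ("Screwdriver", "A567", 3),
    ],
)

# RANGE frames include peer rows (same age), so every row of a given age
# sees all rows of that age or older.
query = """
SELECT DISTINCT
       equipment_type,
       equipment_age,
       COUNT(*) OVER at_least_this_age AS count_of_equipment_at_least_this_age
FROM equipment_table
WINDOW at_least_this_age AS (
    PARTITION BY equipment_type
    ORDER BY equipment_age
    RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
)
ORDER BY equipment_type, equipment_age
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```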
Use a self join approach:
SELECT
e1.equipment_type,
e1.equipment_age,
COUNT(DISTINCT e2.equipment_id) AS count_of_equipments -- COUNT(*) would over-count when several e1 rows share an age
FROM equipment_table e1
INNER JOIN equipment_table e2
ON e2.equipment_type = e1.equipment_type AND
e2.equipment_age >= e1.equipment_age
GROUP BY 1, 2
ORDER BY 1, 2;
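With a self join of this shape, every e1 row in a group contributes its own set of matching e2 rows, so counting joined rows and counting distinct IDs can disagree. The sketch below (Python's sqlite3, sample rows from the question) puts both counts side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE equipment_table (equipment_type TEXT, equipment_id TEXT, equipment_age INTEGER)"
)
conn.executemany(
    "INSERT INTO equipment_table VALUES (?, ?, ?)",
    [
        ("Screwdriver", "A123", 1),
        ("Screwdriver", "A234", 2),
        ("Screwdriver", "A345", 2),
        ("Screwdriver", "A456", 2),
        ("Screwdriver", "A567", 3),
    ],
)

# joined_rows counts every (e1, e2) pair; distinct_ids counts each e2 item once.
query = """
SELECT e1.equipment_type,
       e1.equipment_age,
       COUNT(*) AS joined_rows,
       COUNT(DISTINCT e2.equipment_id) AS distinct_ids
FROM equipment_table AS e1
INNER JOIN equipment_table AS e2
    ON e2.equipment_type = e1.equipment_type
   AND e2.equipment_age >= e1.equipment_age
GROUP BY e1.equipment_type, e1.equipment_age
ORDER BY e1.equipment_type, e1.equipment_age
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

For age 2 there are three e1 rows, each matching four e2 rows, so the pair count is 12 while the distinct count is the wanted 4.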
GROUP BY restricts the scope of COUNT to the rows in the group, i.e. it will not let you reach other rows (rows with equipment_age greater than that of the current group). So you need a subquery or windowing functions to get those. One way:
SELECT
equipment_type,
equipment_age,
(Select COUNT(*)
from equipment_table cnt
where cnt.equipment_type = a.equipment_type
AND cnt.equipment_age >= a.equipment_age
) as count_of_equipments
FROM equipment_table a
GROUP BY 1, 2, 3
I am not sure if your environment supports this syntax, though. If not, let us know and we will find another way.

SQL interview question: select, join, grouping

In an interview I got this question:
Which products did clients buy before their first order of brand "Brand 1"? Select the top 5 by number of orders.
Tables:
Items:
RezonItemID;BrandName
5555613;Brand 1
2315946;Brand 2
9132648;Brand 3
3125847;Brand 1
3126548;Brand 5
Orders:
ClientID;ClientOrderID;RezonItemID;FactMoment
00611847;4562145;5555613;2021-01-09
00798451;7987465;1321321;2021-08-10
00914751;3154844;9132648;2021-07-01
00975418;9797451;1312125;2021-09-09
00978461;9413235;9754512;2021-10-29
My solution:
WITH first_order AS (
SELECT ClientID, MIN(FactMoment) as o_date
FROM orders
JOIN items
USING(RezonItemID)
WHERE BrandName = 'Brand 1'
GROUP BY ClientID
)
SELECT RezonItemID, COUNT(*) AS n_orders
FROM orders
JOIN items
USING(RezonItemID)
JOIN first_order
USING(ClientID)
WHERE FactMoment < o_date
GROUP BY RezonItemID
ORDER BY n_orders DESC
LIMIT 5
Is it possible to solve this with window functions? Maybe there is a better solution?
Given the following tables
Items:
RezonItemID;BrandName
5555613;Brand 1
2315946;Brand 2
9132648;Brand 3
3125847;Brand 1
3126548;Brand 5
Orders:
ClientID;ClientOrderID;RezonItemID;FactMoment
00611847;4562145;5555613;2021-01-09
00798451;7987465;1321321;2021-08-10
00914751;3154844;9132648;2021-07-01
00975418;9797451;1312125;2021-09-09
00978461;9413235;9754512;2021-10-29
It seems that if the question is "Which products did the clients buy before the first order for an item of Brand 1," an SQL query may not be necessary. Assuming that FactMoment is the timestamp of the order, we can see that the first order has the earliest date (2021-01-09) and has RezonItemID 5555613. That item has the brand "Brand 1".
So the answer would be that no items were purchased before the first purchase of an item with BrandName = 'Brand 1'.
This is a puzzling question. The sample data is not particularly useful, since it yields no testable results.
If you want to use window functions then that's certainly possible.
The following successfully yields no rows and should work, but without proper test data it's hard to be sure!
Note that USING is not supported by some databases (SQL Server, for example), so the query below joins with an explicit ON clause.
select top(5) BrandName from (
select o.ClientID, i.BrandName, o.FactMoment,
Min(case when i.BrandName='Brand 1' then FactMoment end) over() earliest,
Count(*) over(partition by ClientID) qty
from Orders o left join Items i on i.RezonItemID=o.RezonItemID
)o
where FactMoment<earliest
order by qty desc
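Since the sample data yields no rows, a window-function variant is easier to judge on data that does. The sketch below uses Python's sqlite3 with entirely made-up items and orders, and it assumes the per-client reading of the question (each client's own first Brand 1 order), as in the asker's CTE. It uses LIMIT rather than TOP because it runs on SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Items (RezonItemID INTEGER, BrandName TEXT)")
conn.execute(
    "CREATE TABLE Orders (ClientID TEXT, ClientOrderID INTEGER, RezonItemID INTEGER, FactMoment TEXT)"
)
# All rows below are invented for illustration.
conn.executemany("INSERT INTO Items VALUES (?, ?)", [(1, "Brand 1"), (2, "Brand 2"), (3, "Brand 3")])
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?, ?)",
    [
        ("c1", 101, 2, "2021-01-01"),
        ("c1", 102, 3, "2021-01-05"),
        ("c1", 103, 1, "2021-02-01"),  # c1's first Brand 1 order
        ("c1", 104, 2, "2021-03-01"),  # after it, so not counted
        ("c2", 201, 2, "2021-01-10"),
        ("c2", 202, 1, "2021-01-20"),  # c2's first Brand 1 order
        ("c3", 301, 3, "2021-01-01"),  # c3 never bought Brand 1
    ],
)

# The windowed MIN attaches each client's first Brand 1 date to all of that
# client's orders; clients without one get NULL and drop out of the WHERE.
query = """
SELECT RezonItemID, COUNT(*) AS n_orders
FROM (
    SELECT o.ClientID, o.RezonItemID, o.FactMoment,
           MIN(CASE WHEN i.BrandName = 'Brand 1' THEN o.FactMoment END)
               OVER (PARTITION BY o.ClientID) AS first_brand1
    FROM Orders AS o
    LEFT JOIN Items AS i ON i.RezonItemID = o.RezonItemID
) AS t
WHERE FactMoment < first_brand1
GROUP BY RezonItemID
ORDER BY n_orders DESC, RezonItemID
LIMIT 5
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```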

SQL counting query

Sorry if this is a basic question.
Basically, I have a table that is as follows, below is a basic sample
store-ProdCode-result
13p I10x 5
13p I20x 7
13p I30x 8
14a K38z 23
17a K38z 23
my data set has nearly 100,000 records.
What I'm trying to do is, for every store find the top 10 prodCode.
I am unsure of how to do this but what I tried was:
select s_code as store, prod_code,count (prod_code)
from top10_secondary
where prod_code is not null
group by store,prod_code
order by count(prod_code) desc limit 10
This is giving me something completely different, and I'm unsure how to go about achieving my final result.
All help is appreciated.
Thanks
The expected output should be: for every store(s_code) display the top 10 prodcode
so:
store--prodcode--result
1a abc 5
1a abd 4
2a dgf 1
2a ldk 6
.(10 times until next store code)
You can use the table twice in the FROM clause, once for the data, and once to get a count of how many records have a lower result for that store.
SELECT a.s_code, a.prod_code, COUNT(b.s_code) AS lower_results -- counts only matched rows from b
FROM top10_secondary a
LEFT OUTER JOIN top10_secondary b
ON a.s_code = b.s_code
AND b.result < a.result
GROUP BY a.s_code, a.prod_code
HAVING COUNT(b.s_code) < 10
With this technique, though, you may get more than 10 records per store if the 10th result value occurs multiple times, because the rule is simply "include a record as long as fewer than 10 records have result values lower than mine".
It looks like in your case, "result" is a ranking, so they would not be duplicated per store.
This is a good case for Window functions.
SELECT
s_code,
prod_code,
prod_count
FROM
(
SELECT
s_code,
prod_code,
prod_count,
RANK() OVER (PARTITION BY s_code ORDER BY prod_Count DESC) as prod_rank
FROM
(SELECT s_code, prod_code, COUNT(prod_code) AS prod_count FROM top10_secondary GROUP BY s_code, prod_code) t1
) t2
WHERE prod_rank <= 10
The innermost query gets the count of each product at each store. The middle query ranks those products within each store by that count. The outermost query then limits the results based on that rank.
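The three layers can be tried on a handful of rows. The sketch below uses Python's sqlite3 (SQLite 3.25 or later for RANK) with made-up rows in `top10_secondary`, and keeps the top 2 per store instead of the top 10 so the cut-off is visible; `TOP_N` is a name introduced here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE top10_secondary (s_code TEXT, prod_code TEXT)")
# Made-up rows: one row per occurrence of a product at a store.
conn.executemany(
    "INSERT INTO top10_secondary VALUES (?, ?)",
    [
        ("13p", "I10x"), ("13p", "I10x"), ("13p", "I10x"),
        ("13p", "I20x"), ("13p", "I20x"),
        ("13p", "I30x"),
        ("14a", "K38z"),
    ],
)

TOP_N = 2  # the thread asks for 10; 2 keeps the demo small

# Innermost: count per store and product. Middle: rank within each store.
# Outermost: keep the rows whose rank is within the limit.
query = """
SELECT s_code, prod_code, prod_count
FROM (
    SELECT s_code, prod_code, prod_count,
           RANK() OVER (PARTITION BY s_code ORDER BY prod_count DESC) AS prod_rank
    FROM (
        SELECT s_code, prod_code, COUNT(prod_code) AS prod_count
        FROM top10_secondary
        GROUP BY s_code, prod_code
    ) AS t1
) AS t2
WHERE prod_rank <= ?
ORDER BY s_code, prod_rank, prod_code
"""
rows = conn.execute(query, (TOP_N,)).fetchall()
for row in rows:
    print(row)
```

Because RANK gives tied counts the same rank, a store can return more than `TOP_N` rows when there is a tie at the cut-off; ROW_NUMBER would cap it strictly.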

Simple SQL query with select and group by

I am having trouble understanding something.
I have the following table:
ID PROD PRICE
1 A 10
2 B 20
3 C 30
4 A 1
5 B 12
6 C 2
7 A 7
8 B 8
9 C 9
10 A 5
11 B 2
I want to get all the minimum prices of all the prod, meaning I want to get 3 records, the minimum price for every prod.
From the example above, this is what I want to get:
ID PROD MIN(PRICE)
4 A 1
11 B 2
6 C 2
This is the query I wrote:
select id, prod, min(price)
from A1
group by(prod);
But these are the records I got:
ID PROD MIN(PRICE)
1 A 1
2 B 2
3 C 2
As you can see, the ID value is wrong; it only gives me some kind of line counter and not the actual ID value.
You can check it at the next link
What am I doing wrong?
SELECT a.*
FROM A1 a
INNER JOIN
(
SELECT Prod, MIN(Price) minPrice
FROM A1
GROUP BY Prod
) b ON a.Prod = b.Prod AND
a.Price = b.minPrice
For MSSQL
SELECT ID, Prod, Price
FROM
(
SELECT ID, Prod, Price,
ROW_NUMBER() OVER(Partition BY Prod ORDER BY Price ASC) s
FROM A1
) a
WHERE s = 1
You must be using MySQL or perhaps PostgreSQL.
In standard SQL, all non-aggregate columns in the select-list must be cited in the GROUP BY clause.
I'm not clear whether you need the ID column. If not, then use:
SELECT prod, MIN(price) AS min_price
FROM A1
GROUP BY prod;
If you need the matching ID number, then that becomes a sub-query:
SELECT A1.id, A1.prod, A1.price -- qualified, because prod exists in both A1 and A2
FROM A1
JOIN (SELECT prod, MIN(price) AS min_price
FROM A1
GROUP BY prod
) AS A2 ON A1.prod = A2.prod AND A1.price = A2.min_price;
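The join-back approach can be checked against the table from the question. The sketch below uses Python's sqlite3; the table name `A1` and the rows come from the question, while the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A1 (id INTEGER PRIMARY KEY, prod TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO A1 VALUES (?, ?, ?)",
    [
        (1, "A", 10), (2, "B", 20), (3, "C", 30),
        (4, "A", 1), (5, "B", 12), (6, "C", 2),
        (7, "A", 7), (8, "B", 8), (9, "C", 9),
        (10, "A", 5), (11, "B", 2),
    ],
)

# Find the minimum price per prod, then join back to pick up the matching row(s).
query = """
SELECT A1.id, A1.prod, A1.price
FROM A1
JOIN (
    SELECT prod, MIN(price) AS min_price
    FROM A1
    GROUP BY prod
) AS A2
    ON A1.prod = A2.prod AND A1.price = A2.min_price
ORDER BY A1.prod, A1.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

If two rows of the same prod share the minimum price, both are returned; the ROW_NUMBER variant would keep only one of them.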
Can you please explain what is the problem with what I wrote, and yes I need the ID column.
select id, prod, min(price)
from A1
group by(prod);
In standard SQL (and in most SQL DBMSs, standard or not), you would get an error message.
Where you are allowed to omit the ID column from the GROUP BY clause, then you get a quasi-random value for ID for the correct prod and MIN(price) values. Basically, the optimizer will choose any convenient ID that it knows about, based on its whims. Specifically, it does not do the sub-query and join that the full answer does. For example, it might do a sequential scan, and the ID it returns might be the first, or last, that it encounters for the given prod value, or it might be some other value — I'm not even sure whether the ID returned for prod = 'A' has to be an ID that was associated with prod = 'A'; you'd have to read the manual carefully. Basically, your query is indeterminate, so many return values are permissible and 'correct' (but not what you wanted).
Note that if you grouped by ID and not prod, then the result in prod would be determinate. That's because the ID column is a candidate key (unique identifier) for the table. (I believe PostgreSQL distinguishes between the two cases — but I'm not certain of that; MySQL does not.)