Simple SQL query with select and group by - sql

I have some kind of problem to understand something.
I have the next table:
ID PROD PRICE
1 A 10
2 B 20
3 C 30
4 A 1
5 B 12
6 C 2
7 A 7
8 B 8
9 C 9
10 A 5
11 B 2
I want to get all the minimum prices of all the prod, meaning I want to get 3 records, the minimum price for every prod.
From the example above, this is what I want to get:
ID PROD MIN(PRICE)
4 A 1
11 B 2
6 C 2
This is the query I wrote:
select id, prod, min(price)
from A1
group by(prod);
But this is the records I got:
ID PROD MIN(PRICE)
1 A 1
2 B 2
3 C 2
As you can see the ID value is wrong, it is only give me some kind of line counter and not the actual ID value.
You can check it at the next link
What I'm doing wrong?

SELECT a.*
FROM A1 a
INNER JOIN
(
SELECT Prod, MIN(Price) minPrice
FROM A1
GROUP BY Prod
) b ON a.Prod = b.Prod AND
a.Price = b.minPrice
SQLFiddle Demo
For MSSQL
SELECT ID, Prod, Price
FROM
(
SELECT ID, Prod, Price,
ROW_NUMBER() OVER(Partition BY Prod ORDER BY Price ASC) s
FROM A1
) a
WHERE s = 1
SQLFiddle Demo

You must be using MySQL or perhaps PostgreSQL.
In standard SQL, all non-aggregate columns in the select-list must be cited in the GROUP BY clause.
I'm not clear whether you need the ID column. If not, then use:
SELECT prod, MIN(price) AS min_price
FROM A1
GROUP BY prod;
If you need the matching ID number, then that becomes a sub-query:
SELECT id, prod, price
FROM A1
JOIN (SELECT prod, MIN(price) AS min_price
FROM A1
GROUP BY prod
) AS A2 ON A1.prod = A2.prod AND A1.price = A2.min_price;
Can you please explain what is the problem with what I wrote, and yes I need the ID column.
select id, prod, min(price)
from A1
group by(prod);
In standard SQL, you would get an error message (or, if not standard, in most SQL DBMS).
Where you are allowed to omit the ID column from the GROUP BY clause, then you get a quasi-random value for ID for the correct prod and MIN(price) values. Basically, the optimizer will choose any convenient ID that it knows about, based on its whims. Specifically, it does not do the sub-query and join that the full answer does. For example, it might do a sequential scan, and the ID it returns might be the first, or last, that it encounters for the given prod value, or it might be some other value — I'm not even sure whether the ID returned for prod = 'A' has to be an ID that was associated with prod = 'A'; you'd have to read the manual carefully. Basically, your query is indeterminate, so many return values are permissible and 'correct' (but not what you wanted).
Note that if you grouped by ID and not prod, then the result in prod would be determinate. That's because the ID column is a candidate key (unique identifier) for the table. (I believe PostgreSQL distinguishes between the two cases — but I'm not certain of that; MySQL does not.)

Related

Get occurrence count of specific categories in a table

Looking to get the transition count of categories from a table. For Name type B, category transitions from Good to Bad so count is 2. For Name type A, it transitions from Good - Moderate - Good - Moderate - Bad, hence gets a count of 5.
Any help would be appreciated.
This is my input data:
Name
order no
category
A
1
Good
A
2
Good
A
3
MODERATE
A
4
Good
A
5
MODERATE
A
6
Bad
A
7
Bad
B
1
Good
B
2
Good
B
3
Good
B
4
BAD
And this is my desired output:
Name
category_transition_count
A
5
B
2
select name
,count(cnt) as category_transition_count
from
(select name
,case when category <> lag(category) over(partition by Name order by order_no) or lag(category) over(partition by Name order by order_no) is null then 1 end as cnt
from t) t
group by name
name
category_transition_count
A
5
B
2
Fiddle
You could use the lag window function to get the category of the previous row, and then compare it with the current row to see if it changed, and count those occurrences. Note that by definition the lag of the first value is null, which can't be different from the current value. so you'll need to handle that explicitly:
SELECT name, COUNT(changed) + 1
FROM (SELECT name,
CASE WHEN category <> LAG(category) OVER (PARTITION BY name ORDER BY order_no ASC)
THEN 1
END AS changed
FROM mytable) t
GROUP BY name
SQLFiddle (PostgreSQL) demo

Case Statement for multiple criteria

I would like to ignore some of the results of my query as for all intents and purposes, some of the results are a duplicate, but based on the way the request was made, we need to use this hierarchy and although we are seeing different 'Company_Name' 's, we need to ignore one of the results.
Query:
SELECT
COUNT(DISTINCT A12.Company_name) AS Customer_Name_Count,
Company_Name,
SUM(Total_Sales) AS Total_Sales
FROM
some_table AS A12
GROUP BY
2
ORDER BY
3 ASC, 2 ASC
This code omits half a doze joins and where statements that are not germane to this question.
Results:
Customer_Name_Count Company_Name Total_Sales
-------------------------------------------------------------
1 3 Blockbuster 1,000
2 6 Jimmy's Bar 1,500
3 6 Jimmy's Restaurant 1,500
4 9 Impala Hotel 2,000
5 12 Sports Drink 2,500
In the above set, we can see that numbers 2 & 3 have the same count and the same total_sales number and similar company names. Is there a way to create a case statement that takes these 3 factors into consideration and then drops one or the other for Jimmy's enterprises? The other issue is that this has to be variable as there are other instances where this happens. And I would only want this to happen if the count and sales number match each other with a similar name in the company name.
Desired result:
Customer_Name_Count Company_Name Total_Sales
--------------------------------------------------------------
1 3 Blockbuster 1,000
2 6 Jimmy's Bar 1,500
3 9 Impala Hotel 2,000
4 12 Sports Drink 2,500
Looks like other answers are accurate based on assumption that Company_IDs are the same for both.
If Company_IDs are different for both Jimmy's Bar and Jimmy's Restaurant then you can use something like this. I suggest you get functional users involved and do some data clean-up else you'll be maintaining this every time this issue arise:
SELECT
COUNT(DISTINCT CASE
WHEN A12.Company_Name = 'Name2' THEN 'Name1'
ELSE A12.Company_Name
END) AS Customer_Name_Count
,CASE
WHEN A12.Company_Name = 'Name2' THEN 'Name1'
ELSE A12.Company_Name
END AS Company_Name
,SUM(A12.Total_Sales) AS Total_Sales
FROM some_table er
GROUP BY CASE
WHEN A12.Company_Name = 'Name2' THEN 'Name1'
ELSE A12.Company_Name
END
Your problem is that the joins you are using are multiplying the number of rows. Somewhere along the way, multiple names are associated with exactly the same entity (which is why the numbers are the same). You can fix this by aggregating by the right id:
SELECT COUNT(DISTINCT A12.Company_name) AS Customer_Name_Count,
MAX(Company_Name) as Company_Name,
SUM(Total_Sales) AS Total_Sales
FROM some_table AS A12
GROUP BY Company_id -- I'm guessing the column is something like this
ORDER BY 3 ASC, 2 ASC;
This might actually overstate the sales (I don't know). Better would be fixing the join so it only returned one name. One possibility is that it is a type-2 dimension, meaning that there is a time component for values that change over time. You may need to restrict the join to a single time period.
You need to have function to return a common name for the companies and then use DISTINCT:
SELECT DISTINCT
Customer_Name_Count,
dbo.GetCommonName(Company_Name) as Company_Name,
Total_Sales
FROM dbo.theTable
You can try to use ROW_NUMBER with window function to make row number by Customer_Name_Count and Total_Sales then get rn = 1
SELECT * FROM (
SELECT *,ROW_NUMBER() OVER(PARTITION BY Customer_Name_Count,Total_Sales ORDER BY Company_Name) rn
FROM (
SELECT
COUNT(DISTINCT A12.Company_name) AS Customer_Name_Count,
Company_Name,
SUM(Total_Sales) AS Total_Sales
FROM
some_table AS A12
GROUP BY
Company_Name
)t1
)t1
WHERE rn = 1

SQL - Count Results of 2 Columns

I have the following table which contains ID's and UserId's.
ID UserID
1111 11
1111 300
1111 51
1122 11
1122 22
1122 3333
1122 45
I'm trying to count the distinct number of 'IDs' so that I get a total, but I also need to get a total of ID's that have also seen the that particular ID as well... To get the ID's, I've had to perform a subquery within another table to get ID's, I then pass this into the main query... Now I just want the results to be displayed as follows.
So I get a Total No for ID and a Total Number for Users ID - Also would like to add another column to get average as well for each ID
TotalID Total_UserID Average
2 7 3.5
If Possible I would also like to get an average as well, but not sure how to calculate that. So I would need to count all the 'UserID's for an ID add them altogether and then find the AVG. (Any Advice on that caluclation would be appreciated.)
Current Query.
SELECT DISTINCT(a.ID)
,COUNT(b.UserID)
FROM a
INNER JOIN b ON someID = someID
WHERE a.ID IN ( SELECT ID FROM c WHERE GROUPID = 9999)
GROUP BY a.ID
Which then Lists all the IDs and COUNT's all the USERID.. I would like a total of both columns. I've tried warpping the query in a
SELECT COUNT(*) FROM (
but this only counts the ID's which is great, but how do I count the USERID column as well
You seem to want this:
SELECT COUNT(DISTINCT a.ID), COUNT(b.UserID),
COUNT(b.UserID) * 1.0 / COUNT(DISTINCT a.ID)
FROM a INNER JOIN
b
ON someID = someID
WHERE a.ID IN ( SELECT ID FROM c WHERE GROUPID = 9999);
Note: DISTINCT is not a function. It applies to the whole row, so it is misleading to put an expression in parentheses after it.
Also, the GROUP BY is unnecessary.
The 1.0 is because SQL Server does integer arithmetic and this is a simple way to convert a number to a decimal form.
You can use
SELECT COUNT(DISTINCT a.ID) ...
to count all distinct values
Read details here
I believe you want this:
select TotalID,
Total_UserID,
sum(Total_UserID+TotalID) as Total,
Total_UserID/TotalID as Average
from (
SELECT (DISTINCT a.ID) as TotalID
,COUNT(b.UserID) as Total_UserID
FROM a
INNER JOIN b ON someID = someID
WHERE a.ID IN ( SELECT ID FROM c WHERE GROUPID = 9999)
) x

SQL counting query

Sorry if this is a basic question.
Basically, I have a table that is as follows, below is a basic sample
store-ProdCode-result
13p I10x 5
13p I20x 7
13p I30x 8
14a K38z 23
17a K38z 23
my data set has nearly 100,000 records.
What I'm trying to do is, for every store find the top 10 prodCode.
I am unsure of how to do this but what I tried was:
select s_code as store, prod_code,count (prod_code)
from top10_secondary
where prod_code is not null
group by store,prod_code
order by count(prod_code) desc limit 10
this is giving me something completely different and i'm unsure on how I go about achieving my final result.
All help is appreciated.
Thanks
The expected output should be: for every store(s_code) display the top 10 prodcode
so:
store--prodcode--result
1a abc 5
1a abd 4
2a dgf 1
2a ldk 6
.(10 times until next store code)
You can use the table twice in the FROM clause, once for the data, and once to get a count of how many records have fewer results for that store.
SELECT a.s_code, a.prod_code, count(*)
FROM top10_secondary a
LEFT OUTER JOIN top10_secondary b
ON a.s_code = b.s_code
AND b.result < a.result
GROUP BY a.s_code, a.prod_code
HAVING count(*) < 10
With this technique though, you may get more than 10 records per store if the 10th result value exists multiple times. Because the limit rule is simply "include record as long as there are less than 10 records with result values than mine"
It looks like in your case, "result" is a ranking, so they would not be duplicated per store.
This is a good case for Window functions.
SELECT
s_code,
prod_code,
prod_count
FROM
(
SELECT
s_code,
prod_code,
prod_count,
RANK() OVER (PARTITION BY s_code ORDER BY prod_Count DESC) as prod_rank
FROM
(SELECT s_code as store, prod_code, count(prod_Code) prod_count FROM table GROUP BY s_code, prod_code) t1
) t2
WHERE prod_rank <= 10
The inner most query gets the count of each product at the store. The second inner more query determines the rank for those products for each store based on that count. Then the outer most query limits the results based on that rank.
o

SQL Server find the missing number

I have a table like below
id name year
--------------
1 A 2000
2 B 2000
2 B 2000
2 B 2000
5 C 2000
1 D 2001
3 E 2001
as well as you see in the year 2000 we missed id '3' and id '4' and in the year 2001 we missed id '2'. I want to generate my second table which includes missed items.
2nd table :
From-id to-id name year
--------------------------------
3 4 null 2000
2 null null 2001
Which method in a SQL query can solve my problem?
Gaps and Islands in Sequences is the name of this problem. you read this article
Here's something to get you started:
WITH cte AS
(
SELECT *
FROM
(VALUES
(1),(2),(3),(4),(5)
) Tally(number)
), cte2 as
(
SELECT DISTINCT [year]
FROM
(VALUES
(2000),(2000),(2001)
)tbl([year])
), cte3 as
(
SELECT *
FROM cte
CROSS JOIN cte2
)
SELECT *
FROM cte3
LEFT OUTER JOIN YourTable ON cte3.number = YourTable.id AND cte3.[year] = YourTable[year)
A few notes: please avoid using reserved keywords as column names (such as year).
Furthermore, since I didn't know how you'd handle multiple missing ranges I did not format the output to reflect a range. For example: What would be your expected output if only one row with id=3 would be in your table?
I'd probably use ROW_NUMBER for this
This query gives you what the correct ID should be (if I interpreted your question right):
SELECT
ROW_NUMBER() OVER (PARTITION BY yr ORDER BY name, yr) as "Correct ID", *
FROM misorder
It assigns a row number (so a number starting from 1 increasing by 1 every time the year is the same).
And to let you know which ones are missing I think this should be a working solution:
WITH missing AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY yr ORDER BY name, yr) as "Correct ID", *
FROM misorder
)
SELECT * FROM missing
WHERE "Correct ID" != "id"
It takes the first query as a base to select only those records where the assumed correct ID is not equal to the currently assigned ID. You can turn this into a query to include the ranges you mentioned, but not sure if that is really necessary.
Hope this helps.