SQL Group by main category in self-referenced table - sql

I need to get a list of total sales grouped by the main category and Seller. Note that there can be sales on the main category (this is the best example I can think of at the moment).
Source table
+--------------------------------------+
|ID |Name |Seller|Qty|ParentID|
+--------------------------------------+
|10 |Egg |John |5 |NULL |
|10 |Egg |Anna |2 |NULL |
|10-01|Egg - Small |John |3 |10 |
|10-01|Egg - Small |Anna |4 |10 |
|10-02|Egg - Medium|John |2 |10 |
|10-02|Egg - Medium|Bob |11 |10 |
|10-03|Egg - Large |Anna |7 |10 |
+--------------------------------------+
Desired output
+------------------+
|ID|Name|Seller|Qty|
+------------------+
|10|Egg |John |10 | <- SUM of all sales John has made for any type of egg
|10|Egg |Anna |13 |
|10|Egg |Bob |11 |
+------------------+
I'm getting close with this query, but if someone has not made a sale on the main category, they will get the wrong Name when I use MIN(Name).
Current query
SELECT
SUBSTRING(t1.ID, 1, 2) AS 'ID',
MIN(t1.Name) AS 'Name',
t1.Seller,
SUM(t1.Qty) AS 'Qty'
FROM EggTest t1
GROUP BY
SUBSTRING(t1.ID, 1, 2),
t1.Seller
Current output
+--------------------------+
|ID|Name |Seller|Qty|
+--------------------------+
|10|Egg |Anna |13 |
|10|Egg - Medium|Bob |11 | <- Bob has not made sales on the main category
|10|Egg |John |10 |
+--------------------------+
EDIT: Several answers have already suggested SUBSTRING(Name, 1, 3), but that will not work for me: Name does not always start with "Egg".
Update:
Now trying this query:
WITH report AS(
SELECT
ID = CASE WHEN s.ParentID IS NOT NULL THEN s.ParentID ELSE s.ID END,
Name = CASE WHEN s.ParentID IS NOT NULL THEN p.Name ELSE s.Name END,
s.Seller,
s.Qty
FROM EggTest s
LEFT JOIN EggTest p ON p.ID = s.ParentID
)
SELECT ID, Name, Seller, SUM(Qty) AS 'Total'
FROM report
GROUP BY ID, Name, Seller;
But I am getting this strange result:
+--------------------+
|ID|Name|Seller|Total|
+--------------------+
|10|Egg |Anna |24 | <- Wrong (Should be 13)
|10|Egg |Bob |22 | <- Wrong (Should be 11)
|10|Egg |John |15 | <- Correct(!!)
+--------------------+
In the report-table I'm getting some duplicates:
+------------------+
|ID|Name|Seller|Qty|
+------------------+
|10|Egg |John |5 |
|10|Egg |Anna |2 |
|10|Egg |John |3 |
|10|Egg |John |3 |
|10|Egg |Anna |4 |
|10|Egg |Anna |4 |
|10|Egg |John |2 |
|10|Egg |John |2 |
|10|Egg |Anna |7 |
|10|Egg |Anna |7 |
|10|Egg |Bob |11 |
|10|Egg |Bob |11 |
+------------------+

I will consider the source table name as [Sales]
You can use the following
with report as(
select ID = case when s.ParentID is not null then s.ParentID else s.ID end,
Name= case when s.ParentID is not null then p.Name else s.Name end,
s.Seller,
s.Qty
from Sales s
left join Sales p on p.ID = s.ParentID and p.Seller = s.Seller
)
select ID,Name,Seller,sum(Qty) as Qty
from report
group by ID,Name,Seller
Here is a demo using DISTINCT.
Here is a demo that includes the Seller in the left join. It gives a NULL Name for seller Bob, because Bob has no row on the main category. The left join should work if the data has proper integrity, which means separate tables for the items and the categories.
Replying to your last comment: here is a demo of how to clean up your data.
Hope this helps.
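The duplicates in the question come from the parent ID 10 having one row per seller, so every child row matches two parent rows. A minimal sketch of one way around that, assuming it is acceptable to look the parent name up from a de-duplicated list of main categories. It uses Python's sqlite3 as a stand-in for the real database, and the CTE name parents is made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EggTest (ID TEXT, Name TEXT, Seller TEXT, Qty INTEGER, ParentID TEXT)"
)
conn.executemany(
    "INSERT INTO EggTest VALUES (?, ?, ?, ?, ?)",
    [
        ("10", "Egg", "John", 5, None),
        ("10", "Egg", "Anna", 2, None),
        ("10-01", "Egg - Small", "John", 3, "10"),
        ("10-01", "Egg - Small", "Anna", 4, "10"),
        ("10-02", "Egg - Medium", "John", 2, "10"),
        ("10-02", "Egg - Medium", "Bob", 11, "10"),
        ("10-03", "Egg - Large", "Anna", 7, "10"),
    ],
)

query = """
WITH parents AS (
    -- One row per main category, however many sellers sold it.
    SELECT DISTINCT ID, Name FROM EggTest WHERE ParentID IS NULL
)
SELECT COALESCE(s.ParentID, s.ID) AS ID,
       COALESCE(p.Name, s.Name)   AS Name,
       s.Seller,
       SUM(s.Qty)                 AS Qty
FROM EggTest s
LEFT JOIN parents p ON p.ID = s.ParentID
GROUP BY COALESCE(s.ParentID, s.ID), COALESCE(p.Name, s.Name), s.Seller
ORDER BY s.Seller
"""
totals = conn.execute(query).fetchall()
for row in totals:
    print(row)
```

Because the join target has exactly one row per parent ID, each child row is counted once, and Bob still gets the name "Egg" even though he has no sale on the main category.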

Try this query. If you need an explanation, ask :) But it's a rather simple query :)
SELECT MAX(SUBSTRING(ID, 1, 2)) AS ID,
SUBSTRING(Name, 1, 3) AS Name,
Seller,
SUM(Qty) AS Qty
FROM TABLE_NAME
GROUP BY Seller, SUBSTRING(Name, 1, 3)

I'm not sure whether ID is always in the format nn[-nn], or whether Name can hold things other than eggs...
This should work in any case:
;with
m as (
select *, nullif(charindex('-', ID), 0) div_id, nullif(charindex(' - ', name), 0) div_cat
from EggTest
),
c as (
select *,
SUBSTRING(ID, 1, isnull(div_id-1, 1000)) main_ID,
SUBSTRING(name, 1, isnull(div_cat-1, 1000)) main_cat,
nullif(SUBSTRING(name, isnull(div_cat, 1000)+3, 1000), '') sub_cat -- +3 skips the three characters of ' - '
from m
)
select main_ID ID, main_cat [Name], Seller, sum(qty) Qty
from c
group by main_ID, main_cat, seller
Outputs:
ID Name Seller Qty
10 Egg Anna 13
10 Egg Bob 11
10 Egg John 10

Related

Fill Future date for null groupby sql(presto)

This might be easier than I'm thinking, but essentially I want to fill in values that would be null for ID 2. Example below. Thanks.
Given Table:
|ID| food category | time |
|--|---------------|------------|
|1 |italian | 2021-10-01|
|1 | indian | 2021-10-23|
|1 | american| 2021-10-05|
|1 | mexican | 2021-10-07|
|1 | Chinese | 2021-10-09|
|1 | vietnamese| 2021-10-11|
|1 | thai | 2021-10-12|
|1 | Moroccan| 2021-9-01|
|1 | russian | 2021-7-01|
|1 | korean | 2021-4-30|
|1 | canadian| 2021-7-01|
|2 |italian | 2020-10-11|
|2 | indian | 2021-04-23|
|2 | american| 2021-10-25|
|2 | mexican | 2021-10-27|
I'd like to transform the table above so that every ID has a row for every food category. Where ID 2 has no record for a category (Chinese, Vietnamese, Thai, Moroccan, Russian, Korean, and Canadian), the time would be null; I'd like those rows to still appear, with the time replaced by a date one year from now (date_add('year', 1, now())). Example of desired results below. Thank you for the help.
Desired Table:
|ID| food category | time |
|--|---------------|------------|
|1 |italian | 2021-10-01|
|1 | indian | 2021-10-23|
|1 | american| 2021-10-05|
|1 | mexican | 2021-10-07|
|1 | Chinese | 2021-10-09|
|1 | vietnamese| 2021-10-11|
|1 | thai | 2021-10-12|
|1 | Moroccan| 2021-9-01|
|1 | russian | 2021-7-01|
|1 | korean | 2021-4-30|
|1 | canadian| 2021-7-01|
|2 |italian | 2020-10-11|
|2 | indian | 2021-04-23|
|2 | american| 2021-10-25|
|2 | mexican | 2021-10-27|
|2 | Chinese | 2022-11-23|
|2 | vietnamese| 2022-11-23|
|2 | thai | 2022-11-23|
|2 | Moroccan| 2022-11-23|
|2 | russian | 2022-11-23|
|2 | korean | 2022-11-23|
|2 | canadian| 2022-11-23|
You can use the following query:
SELECT COALESCE(t1.ID,t2.ID) as ID,
COALESCE(t1.foodcategory,t2.foodcategory) as foodcategory,
CAST(COALESCE(t2.time,dateadd(year, 1, getdate())) AS DATE) time
FROM
(SELECT *
FROM
(SELECT foodcategory
FROM testTB
GROUP BY foodcategory) t1
JOIN
(SELECT id
FROM testTB
GROUP BY id) t2 on 1=1) t1
LEFT JOIN testTB t2 on t1.ID = t2.ID and t1.foodcategory = t2.foodcategory
or
WITH cte AS (
select distinct foodcategory from testTB
)
SELECT t2.ID,t1.foodcategory,CAST(COALESCE(t3.time,dateadd(year, 1, getdate())) AS DATE) time
FROM cte t1
FULL OUTER JOIN (
select distinct [ID] from testTB
) t2 on 1=1
left join testTB t3 on t2.ID = t3.ID and t1.foodcategory = t3.foodcategory
order by t2.id
demo in db<>fiddle
Use a CTE to gather the list of food categories first. Then gather the list of IDs.
WITH cteCat AS (
select distinct [food category] from table
)
, cteID AS (
select distinct [ID] from table
)
SELECT id.[ID], cat.[food category],
COALESCE(t.[time], dateadd(year, 1, getdate())) as [time]
FROM cteCat cat
CROSS JOIN cteID id
LEFT OUTER JOIN table t
ON t.[ID] = id.[ID]
AND t.[food category] = cat.[food category]
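The pattern shared by the answers above (build every ID and category pair, left join the real rows, then fill the gaps) can be sketched end to end. This is only an illustration under assumptions: it runs on SQLite through Python's sqlite3, so date('now', '+1 year') stands in for the date_add('year', 1, now()) mentioned in the question, and the table holds a cut-down sample of the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE testTB (ID INTEGER, foodcategory TEXT, time TEXT)")
conn.executemany(
    "INSERT INTO testTB VALUES (?, ?, ?)",
    [
        (1, "italian", "2021-10-01"),
        (1, "thai", "2021-10-12"),
        (2, "italian", "2020-10-11"),  # ID 2 has no "thai" row
    ],
)

query = """
SELECT i.ID,
       c.foodcategory,
       -- Missing combinations get a date one year from today.
       COALESCE(t.time, date('now', '+1 year')) AS time
FROM (SELECT DISTINCT ID FROM testTB) AS i
CROSS JOIN (SELECT DISTINCT foodcategory FROM testTB) AS c
LEFT JOIN testTB AS t
       ON t.ID = i.ID AND t.foodcategory = c.foodcategory
ORDER BY i.ID, c.foodcategory
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The explicit CROSS JOIN keeps both derived tables visible to the ON clause of the LEFT JOIN, which a comma join followed by an explicit join does not guarantee in every dialect.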

Group by a period Postgresql

I'm trying to group the following table (example) by a period of time:
------------------
|month|year|value|
------------------
|7 |2019|1.2 |
|8 |2019|1.7 |
|9 |2019|1.5 |
|10 |2019|0.7 |
|11 |2019|0.2 |
|12 |2019|1.7 |
|1 |2020|1.0 |
|2 |2020|0.1 |
|3 |2020|2.1 |
|4 |2020|1.2 |
|5 |2020|1.2 |
|6 |2020|1.7 |
|7 |2020|2.1 |
|8 |2020|1.7 |
|9 |2020|1.5 |
|10 |2020|0.7 |
|11 |2020|0.2 |
|12 |2020|1.7 |
|1 |2021|1.0 |
|2 |2021|0.1 |
|3 |2021|2.1 |
|4 |2021|1.2 |
|5 |2021|1.7 |
|6 |2021|1.5 |
Etc..
I have to group every 12 months, from July (7) to June (6) of the next year.
I have already tried some solutions found online, but nothing works for me. Does anyone have a solution?
I'm using PostgreSQL.
Thanks in advance.
One way is to use arithmetic
select floor((year * 12 + month - 7) / 12) as effective_year, avg(value)
from t
group by effective_year;
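To see why the arithmetic above works: shifting the month index by 7 makes July the first month of each 12-month block, so integer division by 12 gives the year in which the period starts. A small sketch, assuming integer year and month columns and using SQLite (through Python's sqlite3) as a stand-in for PostgreSQL; the sample rows are a subset picked to straddle the boundaries:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (month INTEGER, year INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (6, 2019, 1.0),   # belongs to the period starting July 2018
        (7, 2019, 1.2),   # first month of the period starting July 2019
        (12, 2019, 1.7),
        (6, 2020, 1.7),   # last month of the period starting July 2019
        (7, 2020, 2.1),   # first month of the next period
    ],
)

query = """
SELECT (year * 12 + month - 7) / 12 AS effective_year,  -- integer division
       COUNT(*) AS months
FROM t
GROUP BY effective_year
ORDER BY effective_year
"""
periods = conn.execute(query).fetchall()
for row in periods:
    print(row)
```

June 2019 falls into the period labelled 2018, while July 2019 through June 2020 all share the label 2019.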
@GordonLinoff has the appropriate solution; it is missing only the actual period covered by the effective_year. However, that period is easily derived from effective_year and a couple of built-in functions: daterange and make_date.
select daterange(make_date(effective_year ,07 ,01)
,make_date(effective_year+1,06 ,30)
,'[]'
)
, avg_value
from (select floor((year * 12 + month - 7) / 12)::integer as effective_year
, avg(value) avg_value
from test_data
group by effective_year
) da
order by effective_year;
See full example.

Adding an additional column to SQL UNION SELECT

Suppose I have the following mapped and normalized tables;
Group User Contact BelongsTo
+-------+------+------+ +-------+------+------+ +-------+------+------+ +-------+------+
| gID| name| col3| | uID| fname| sname| | cID| name| col3| | accID| gID|
+-------+------+------+ +-------+------+------+ +-------+------+------+ +-------+------+
|1 |ABC |? | |1 |JJ |BB | |4 |ABCD |? | |1 |2 |
+-------+------+------+ +-------+------+------+ +-------+------+------+ +-------+------+
|2 |BCD |? | |2 |CC |LL | |5 |BCDE |? | |3 |2 |
+-------+------+------+ +-------+------+------+ +-------+------+------+ +-------+------+
|3 |DEF |? | |3 |RR |NN | |6 |CDEF |? | |5 |3 |
+-------+------+------+ +-------+------+------+ +-------+------+------+ +-------+------+
Using EERM, User and Contact are subclasses of an "Account" superclass (not shown). An account can belong to many groups, thus the "BelongsTo" table records the M:N relationship between Accounts and Group membership.
I would like an SQL statement which will allow me to query all the users and contacts that have membership in a Group matching conditions as follows:
SELECT
tc."cID" AS "accID",
tc."name" AS "accName",
tbt."gID"
FROM "tblContact" tc
INNER JOIN "tblBelongsTo" tbt
ON tbt."accID" = tc."cID"
UNION SELECT
tu."uID" AS "accID",
CONCAT (tu."fname", ' ', tu."sname") AS "accName",
tbt."gID"
FROM "tblUser" tu
INNER JOIN "tblBelongsTo" tbt
ON tbt."accID" = tu."uID"
ORDER BY "accID" ASC;
The above works. I have combined the two queries with UNION, since the number of columns matches on either side once I CONCAT the forename and surname together, resulting in global "accName" and "accID" columns.
My question is this: How would I go about adding an extra column so that I can see what the group name is?
ie from this:
Result
+-------+-------+------+
| accID|accName| gID|
+-------+-------+------+
|1 |JJBB |2 |
+-------+-------+------+
|3 |RRNN |2 |
+-------+-------+------+
|5 |BCDE |3 |
+-------+-------+------+
to this:
Result (2)
+-------+-------+------+------+
| accID|accName| gID| name|
+-------+-------+------+------+
|1 |JJBB |2 | BCD|
+-------+-------+------+------+
|3 |RRNN |2 | BCD|
+-------+-------+------+------+
|5 |BCDE |3 | DEF|
+-------+-------+------+------+
It seems everything I have tried causes the UNION SELECT to break (because of unmatched column). Likewise, I had little luck in combining sub-queries. I am probably missing something very obvious...
Thanks in advance.
You can join it to the group table to get the name.
select x.*, g."name"
from "tblGroup" g join -- "tblGroup" is assumed from the question's table naming; GROUP alone is a reserved word
(
SELECT
tc."cID" AS "accID",
tc."name" AS "accName",
tbt."gID"
FROM "tblContact" tc
INNER JOIN "tblBelongsTo" tbt
ON tbt."accID" = tc."cID"
UNION
SELECT
tu."uID" AS "accID",
CONCAT (tu."fname", ' ', tu."sname") AS "accName",
tbt."gID"
FROM "tblUser" tu
INNER JOIN "tblBelongsTo" tbt
ON tbt."accID" = tu."uID"
) x on x."gID" = g."gID"
order by x."accID";
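A runnable sketch of the same idea, wrapping the UNION in a derived table and joining it to the group table. It makes a few assumptions: SQLite through Python's sqlite3 stands in for the real database, the group table is called tblGroup by analogy with the other table names, and || replaces CONCAT because that is how SQLite concatenates strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblGroup     (gID INTEGER, name TEXT);
CREATE TABLE tblUser      (uID INTEGER, fname TEXT, sname TEXT);
CREATE TABLE tblContact   (cID INTEGER, name TEXT);
CREATE TABLE tblBelongsTo (accID INTEGER, gID INTEGER);
INSERT INTO tblGroup     VALUES (1, 'ABC'), (2, 'BCD'), (3, 'DEF');
INSERT INTO tblUser      VALUES (1, 'JJ', 'BB'), (2, 'CC', 'LL'), (3, 'RR', 'NN');
INSERT INTO tblContact   VALUES (4, 'ABCD'), (5, 'BCDE'), (6, 'CDEF');
INSERT INTO tblBelongsTo VALUES (1, 2), (3, 2), (5, 3);
""")

query = """
SELECT x.accID, x.accName, x.gID, g.name
FROM (
    -- Both branches expose the same three columns, so the UNION stays valid.
    SELECT tc.cID AS accID, tc.name AS accName, tbt.gID
    FROM tblContact tc
    JOIN tblBelongsTo tbt ON tbt.accID = tc.cID
    UNION
    SELECT tu.uID, tu.fname || ' ' || tu.sname, tbt.gID
    FROM tblUser tu
    JOIN tblBelongsTo tbt ON tbt.accID = tu.uID
) AS x
JOIN tblGroup g ON g.gID = x.gID  -- the extra column is added outside the UNION
ORDER BY x.accID
"""
members = conn.execute(query).fetchall()
for row in members:
    print(row)
```

Adding the group name after the UNION means neither branch has to change its column list, which is what was breaking the earlier attempts.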

Complicated min/max multi-table query

I need to get the min and max score of group ids, but only if they are enabled:
cdu_group_sl: cdu_group_cc: cdu_group_ph:
-------------------- -------------------- --------------------
|id |name |enabled | |id |name |enabled | |id |name |enabled |
-------------------- -------------------- --------------------
|1 |sl_1 |1 | |1 |cc_1 |1 | |1 |ph_1 |0 |
|2 |sl_3 |1 | |2 |cc_2 |0 | |2 |ph_2 |1 |
|3 |sl_4 |1 | |3 |cc_3 |1 | |3 |ph_3 |1 |
-------------------- -------------------- --------------------
Scores are found in a separate table:
cdu_user_progress
----------------------------------
|id |group_type |group_id |score |
----------------------------------
|1 |sl |1 |50 |
|1 |cc |1 |10 |
|1 |ph |1 |20 |
|1 |sl |2 |80 |
|1 |sl |3 |20 |
|1 |cc |3 |30 |
|1 |sl |1 |40 |
|1 |ph |1 |50 |
|1 |cc |1 |40 |
|1 |ph |2 |90 |
----------------------------------
I need to get a max and min score for each type of group for only enabled groups (for each type):
---------------------------------------------
|group_type |group_id |min_score |max_score |
---------------------------------------------
|sl |1 |40 |50 |
|sl |2 |80 |80 |
|sl |3 |20 |20 |
|cc |1 |10 |40 |
|cc |3 |30 |30 |
|ph |1 |20 |50 |
|ph |2 |90 |90 |
---------------------------------------------
Any idea what the query might be??? So far I have:
SELECT * FROM cdu_user_progress
JOIN cdu_group_sl ON (cdu_group_sl.id = cdu_user_progress.group_id AND cdu_user_progress.group_type = 'sl')
JOIN cdu_group_cc ON (cdu_group_cc.id = cdu_user_progress.group_id AND cdu_user_progress.group_type = 'cc')
JOIN cdu_group_ph ON (cdu_group_ph.id = cdu_user_progress.group_id AND cdu_user_progress.group_type = 'ph')
WHERE cdu_user_progress.uid = $student->uid
AND (cdu_user_progress.group_type = 'sl' AND cdu_group_sl.enabled = 1)
AND (cdu_user_progress.group_type = 'cc' AND cdu_group_cc.enabled = 1)
AND (cdu_user_progress.group_type = 'ph' AND cdu_group_ph.enabled = 1)
Probably completely wrong...
What about using a union to pick the groups you are interested in? Something like:
select scr.group_type, scr.group_id, min(score) min_score, max(score) max_score
from (
select id, 'sl' grp from cdu_group_sl where enabled = 1
union all
select id, 'cc' from cdu_group_cc where enabled = 1
union all
select id, 'ph' from cdu_group_ph where enabled = 1
) grps join cdu_user_progress scr
on grps.id = scr.group_id and grps.grp = scr.group_type
group by scr.group_type, scr.group_id
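The union approach can be checked against the sample data from the question. This is a sketch under assumptions: it runs on SQLite via Python's sqlite3, and it leaves out the per-student filter from the question's attempt because the sample progress table has no uid column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cdu_group_sl (id INTEGER, name TEXT, enabled INTEGER);
CREATE TABLE cdu_group_cc (id INTEGER, name TEXT, enabled INTEGER);
CREATE TABLE cdu_group_ph (id INTEGER, name TEXT, enabled INTEGER);
CREATE TABLE cdu_user_progress (id INTEGER, group_type TEXT, group_id INTEGER, score INTEGER);
INSERT INTO cdu_group_sl VALUES (1, 'sl_1', 1), (2, 'sl_3', 1), (3, 'sl_4', 1);
INSERT INTO cdu_group_cc VALUES (1, 'cc_1', 1), (2, 'cc_2', 0), (3, 'cc_3', 1);
INSERT INTO cdu_group_ph VALUES (1, 'ph_1', 0), (2, 'ph_2', 1), (3, 'ph_3', 1);
INSERT INTO cdu_user_progress VALUES
    (1, 'sl', 1, 50), (1, 'cc', 1, 10), (1, 'ph', 1, 20), (1, 'sl', 2, 80),
    (1, 'sl', 3, 20), (1, 'cc', 3, 30), (1, 'sl', 1, 40), (1, 'ph', 1, 50),
    (1, 'cc', 1, 40), (1, 'ph', 2, 90);
""")

query = """
SELECT scr.group_type, scr.group_id,
       MIN(scr.score) AS min_score, MAX(scr.score) AS max_score
FROM (
    -- Tag each enabled group with its type so one join covers all three tables.
    SELECT id, 'sl' AS grp FROM cdu_group_sl WHERE enabled = 1
    UNION ALL
    SELECT id, 'cc' FROM cdu_group_cc WHERE enabled = 1
    UNION ALL
    SELECT id, 'ph' FROM cdu_group_ph WHERE enabled = 1
) AS grps
JOIN cdu_user_progress AS scr
  ON scr.group_id = grps.id AND scr.group_type = grps.grp
GROUP BY scr.group_type, scr.group_id
ORDER BY scr.group_type, scr.group_id
"""
score_ranges = conn.execute(query).fetchall()
for row in score_ranges:
    print(row)
```

With this sample data, group ph 1 does not appear in the result, because ph_1 is marked as disabled in cdu_group_ph.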
The following is probably the fastest way to filter the rows for this query; aggregate the result with MIN and MAX per group_type and group_id as needed. To optimize it, you should have an index on (id, enabled) in each of the three "sl", "cc", and "ph" tables:
select cup.*
from cdu_user_progress cup
where (cup.group_type = 'sl' and
exists (select 1
from cdu_group_sl sl
where sl.id = cup.group_id and
sl.enabled = 1
)
) or
(cup.group_type = 'cc' and
exists (select 1
from cdu_group_cc cc
where cc.id = cup.group_id and
cc.enabled = 1
)
) or
(cup.group_type = 'ph' and
exists (select 1
from cdu_group_ph ph
where ph.id = cup.group_id and
ph.enabled = 1
)
)
As a note, having three tables with the same structure is usually a sign of a poor database schema. These three tables should probably be combined into a single table, which would make this query much easier to write.
If you are just starting up this project, I would recommend refining your data structure. Based on what you showed, you could benefit from only one cdu_groups table with a reference to a new cdu_group_types table, and removing the group_type column from cdu_user_progress.
If this is an established project, where changing the structure would be too disruptive... then one of the other answers showing a query would be a better/easier fit.
Otherwise, you could simplify things with restructured tables and end up with a query like:
SELECT group_type,
group_id,
MIN(score) as min_score,
MAX(score) as max_score
FROM cdu_user_progress c
INNER JOIN cdu_groups g
ON c.group_id=g.id
INNER JOIN cdu_group_types t
ON g.group_type_id=t.id
WHERE enabled=1
GROUP BY group_type, group_id
This is shown, with expected results, in this SQLFiddle. With this structure you can add new group types as you want (and also cut down on amount of tables and joins). Tables would be (simplified in this code below, no FKs or anything):
CREATE TABLE cdu_user_progress
(id INT, group_id INT, score INT);
CREATE TABLE cdu_group_types
(id INT, group_type VARCHAR(3));
CREATE TABLE cdu_groups
(id INT, group_type_id INT, name VARCHAR(10), enabled BIT NOT NULL DEFAULT 1);
Granted, moving data to a new structure may be a pain or not reasonable... but I wanted to throw this out there as a possibility, or just something to chew on.

count and distinct over multiple columns

I have a database table containing two costs. I want to find the distinct costs over these two columns, and also how many times each cost appears. The table may look like
|id|cost1|cost2|
|1 |50 |60 |
|2 |20 |50 |
|3 |50 |70 |
|4 |20 |30 |
|5 |50 |60 |
In this case I want a result that is distinct over both columns and counts the number of times each value appears. So the result I would like is
|distinctCost|count|
|20 |2 |
|30 |1 |
|50 |4 |
|60 |2 |
|70 |1 |
and ideally ordered
|distinctCost|count|
|50 |4 |
|60 |2 |
|20 |2 |
|70 |1 |
|30 |1 |
I can get the distinct over two columns by doing something like
select DISTINCT c FROM (SELECT cost1 AS c FROM my_costs UNION SELECT cost2 AS c FROM my_costs);
and I can get the count for each column by doing
select cost1, count(*)
from my_costs
group by cost1
order by count(*) desc;
My problem is how can I get the count for both columns? I am stuck on how to do the count over each individual column and then add it up.
Any pointers would be appreciated.
I am using Oracle DB.
Thanks
By combining your two queries..
select cost, count(*)
from
(
SELECT id, cost1 AS cost FROM my_costs
UNION ALL
SELECT id, cost2 AS cost FROM my_costs
) v
group by cost
order by count(*) desc;
(If a row has equal cost1 and cost2 and you want to count it once rather than twice, change the UNION ALL to a UNION.)
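A quick way to check the combined query against the sample table from the question. This sketch runs on SQLite through Python's sqlite3 rather than on Oracle, and it adds cost as a secondary sort key only to make the order of ties predictable:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_costs (id INTEGER, cost1 INTEGER, cost2 INTEGER)")
conn.executemany(
    "INSERT INTO my_costs VALUES (?, ?, ?)",
    [(1, 50, 60), (2, 20, 50), (3, 50, 70), (4, 20, 30), (5, 50, 60)],
)

query = """
SELECT cost, COUNT(*) AS n
FROM (
    -- Stack both cost columns into one column; UNION ALL keeps every occurrence.
    SELECT id, cost1 AS cost FROM my_costs
    UNION ALL
    SELECT id, cost2 AS cost FROM my_costs
) AS v
GROUP BY cost
ORDER BY n DESC, cost
"""
cost_counts = conn.execute(query).fetchall()
for row in cost_counts:
    print(row)
```

The value 50 is counted four times: three times from cost1 and once from cost2.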
You can use the UNPIVOT clause:
select *
from
(
SELECT cost , count(*) as num_of_costs
FROM my_costs
UNPIVOT
(
cost
FOR cost_num IN (cost1,cost2)
)
group by cost
)
order by num_of_costs desc;