How to groupby by aggregating different keys in SQL - sql

I have tables like below.
I would like to groupby by generating new keys like D,Dmeans AorB
In this case,countin D is 2 becauseAandBhas 1 record each.
Are there any way to generate new keys and groupby by using this?
product sex age
A M 10
B M 20
C F 30
My desired result is like below.
product count
C 1
D (A orB) 2
If you have same experience please let me know.
Thanks

Instead of the column product you must group by a derived column that matches your condition:
select
case when product in ('A', 'B') then 'D' else product end product,
count(*)
from tablename
group by case when product in ('A', 'B') then 'D' else product end
See the demo.
Results:
| newproduct | count(*) |
| ---------- | -------- |
| C | 1 |
| D | 2 |

ANSI SQL compliant query, use a case expression in a derived table to put A and B into D. GROUP BY its result:
select product, count(*)
from
(
select case when product in ('A', 'B') then 'D' else product end product
from tablename
) dt
group by product

Related

How do I transform the specific row value into column headers in hive [duplicate]

I tried to search posts, but I only found solutions for SQL Server/Access. I need a solution in MySQL (5.X).
I have a table (called history) with 3 columns: hostid, itemname, itemvalue.
If I do a select (select * from history), it will return
+--------+----------+-----------+
| hostid | itemname | itemvalue |
+--------+----------+-----------+
| 1 | A | 10 |
+--------+----------+-----------+
| 1 | B | 3 |
+--------+----------+-----------+
| 2 | A | 9 |
+--------+----------+-----------+
| 2 | C | 40 |
+--------+----------+-----------+
How do I query the database to return something like
+--------+------+-----+-----+
| hostid | A | B | C |
+--------+------+-----+-----+
| 1 | 10 | 3 | 0 |
+--------+------+-----+-----+
| 2 | 9 | 0 | 40 |
+--------+------+-----+-----+
I'm going to add a somewhat longer and more detailed explanation of the steps to take to solve this problem. I apologize if it's too long.
I'll start out with the base you've given and use it to define a couple of terms that I'll use for the rest of this post. This will be the base table:
select * from history;
+--------+----------+-----------+
| hostid | itemname | itemvalue |
+--------+----------+-----------+
| 1 | A | 10 |
| 1 | B | 3 |
| 2 | A | 9 |
| 2 | C | 40 |
+--------+----------+-----------+
This will be our goal, the pretty pivot table:
select * from history_itemvalue_pivot;
+--------+------+------+------+
| hostid | A | B | C |
+--------+------+------+------+
| 1 | 10 | 3 | 0 |
| 2 | 9 | 0 | 40 |
+--------+------+------+------+
Values in the history.hostid column will become y-values in the pivot table. Values in the history.itemname column will become x-values (for obvious reasons).
When I have to solve the problem of creating a pivot table, I tackle it using a three-step process (with an optional fourth step):
select the columns of interest, i.e. y-values and x-values
extend the base table with extra columns -- one for each x-value
group and aggregate the extended table -- one group for each y-value
(optional) prettify the aggregated table
Let's apply these steps to your problem and see what we get:
Step 1: select columns of interest. In the desired result, hostid provides the y-values and itemname provides the x-values.
Step 2: extend the base table with extra columns. We typically need one column per x-value. Recall that our x-value column is itemname:
create view history_extended as (
select
history.*,
case when itemname = "A" then itemvalue end as A,
case when itemname = "B" then itemvalue end as B,
case when itemname = "C" then itemvalue end as C
from history
);
select * from history_extended;
+--------+----------+-----------+------+------+------+
| hostid | itemname | itemvalue | A | B | C |
+--------+----------+-----------+------+------+------+
| 1 | A | 10 | 10 | NULL | NULL |
| 1 | B | 3 | NULL | 3 | NULL |
| 2 | A | 9 | 9 | NULL | NULL |
| 2 | C | 40 | NULL | NULL | 40 |
+--------+----------+-----------+------+------+------+
Note that we didn't change the number of rows -- we just added extra columns. Also note the pattern of NULLs -- a row with itemname = "A" has a non-null value for new column A, and null values for the other new columns.
Step 3: group and aggregate the extended table. We need to group by hostid, since it provides the y-values:
create view history_itemvalue_pivot as (
select
hostid,
sum(A) as A,
sum(B) as B,
sum(C) as C
from history_extended
group by hostid
);
select * from history_itemvalue_pivot;
+--------+------+------+------+
| hostid | A | B | C |
+--------+------+------+------+
| 1 | 10 | 3 | NULL |
| 2 | 9 | NULL | 40 |
+--------+------+------+------+
(Note that we now have one row per y-value.) Okay, we're almost there! We just need to get rid of those ugly NULLs.
Step 4: prettify. We're just going to replace any null values with zeroes so the result set is nicer to look at:
create view history_itemvalue_pivot_pretty as (
select
hostid,
coalesce(A, 0) as A,
coalesce(B, 0) as B,
coalesce(C, 0) as C
from history_itemvalue_pivot
);
select * from history_itemvalue_pivot_pretty;
+--------+------+------+------+
| hostid | A | B | C |
+--------+------+------+------+
| 1 | 10 | 3 | 0 |
| 2 | 9 | 0 | 40 |
+--------+------+------+------+
And we're done -- we've built a nice, pretty pivot table using MySQL.
Considerations when applying this procedure:
what value to use in the extra columns. I used itemvalue in this example
what "neutral" value to use in the extra columns. I used NULL, but it could also be 0 or "", depending on your exact situation
what aggregate function to use when grouping. I used sum, but count and max are also often used (max is often used when building one-row "objects" that had been spread across many rows)
using multiple columns for y-values. This solution isn't limited to using a single column for the y-values -- just plug the extra columns into the group by clause (and don't forget to select them)
Known limitations:
this solution doesn't allow n columns in the pivot table -- each pivot column needs to be manually added when extending the base table. So for 5 or 10 x-values, this solution is nice. For 100, not so nice. There are some solutions with stored procedures generating a query, but they're ugly and difficult to get right. I currently don't know of a good way to solve this problem when the pivot table needs to have lots of columns.
SELECT
hostid,
sum( if( itemname = 'A', itemvalue, 0 ) ) AS A,
sum( if( itemname = 'B', itemvalue, 0 ) ) AS B,
sum( if( itemname = 'C', itemvalue, 0 ) ) AS C
FROM
bob
GROUP BY
hostid;
Another option,especially useful if you have many items you need to pivot is to let mysql build the query for you:
SELECT
GROUP_CONCAT(DISTINCT
CONCAT(
'ifnull(SUM(case when itemname = ''',
itemname,
''' then itemvalue end),0) AS `',
itemname, '`'
)
) INTO #sql
FROM
history;
SET #sql = CONCAT('SELECT hostid, ', #sql, '
FROM history
GROUP BY hostid');
PREPARE stmt FROM #sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
FIDDLE
Added some extra values to see it working
GROUP_CONCAT has a default value of 1000 so if you have a really big query change this parameter before running it
SET SESSION group_concat_max_len = 1000000;
Test:
DROP TABLE IF EXISTS history;
CREATE TABLE history
(hostid INT,
itemname VARCHAR(5),
itemvalue INT);
INSERT INTO history VALUES(1,'A',10),(1,'B',3),(2,'A',9),
(2,'C',40),(2,'D',5),
(3,'A',14),(3,'B',67),(3,'D',8);
hostid A B C D
1 10 3 0 0
2 9 0 40 5
3 14 67 0 8
Taking advantage of Matt Fenwick's idea that helped me to solve the problem (a lot of thanks), let's reduce it to only one query:
select
history.*,
coalesce(sum(case when itemname = "A" then itemvalue end), 0) as A,
coalesce(sum(case when itemname = "B" then itemvalue end), 0) as B,
coalesce(sum(case when itemname = "C" then itemvalue end), 0) as C
from history
group by hostid
I edit Agung Sagita's answer from subquery to join.
I'm not sure about how much difference between this 2 way, but just for another reference.
SELECT hostid, T2.VALUE AS A, T3.VALUE AS B, T4.VALUE AS C
FROM TableTest AS T1
LEFT JOIN TableTest T2 ON T2.hostid=T1.hostid AND T2.ITEMNAME='A'
LEFT JOIN TableTest T3 ON T3.hostid=T1.hostid AND T3.ITEMNAME='B'
LEFT JOIN TableTest T4 ON T4.hostid=T1.hostid AND T4.ITEMNAME='C'
use subquery
SELECT hostid,
(SELECT VALUE FROM TableTest WHERE ITEMNAME='A' AND hostid = t1.hostid) AS A,
(SELECT VALUE FROM TableTest WHERE ITEMNAME='B' AND hostid = t1.hostid) AS B,
(SELECT VALUE FROM TableTest WHERE ITEMNAME='C' AND hostid = t1.hostid) AS C
FROM TableTest AS T1
GROUP BY hostid
but it will be a problem if sub query resulting more than a row, use further aggregate function in the subquery
If you could use MariaDB there is a very very easy solution.
Since MariaDB-10.02 there has been added a new storage engine called CONNECT that can help us to convert the results of another query or table into a pivot table, just like what you want:
You can have a look at the docs.
First of all install the connect storage engine.
Now the pivot column of our table is itemname and the data for each item is located in itemvalue column, so we can have the result pivot table using this query:
create table pivot_table
engine=connect table_type=pivot tabname=history
option_list='PivotCol=itemname,FncCol=itemvalue';
Now we can select what we want from the pivot_table:
select * from pivot_table
More details here
My solution :
select h.hostid, sum(ifnull(h.A,0)) as A, sum(ifnull(h.B,0)) as B, sum(ifnull(h.C,0)) as C from (
select
hostid,
case when itemName = 'A' then itemvalue end as A,
case when itemName = 'B' then itemvalue end as B,
case when itemName = 'C' then itemvalue end as C
from history
) h group by hostid
It produces the expected results in the submitted case.
I make that into Group By hostId then it will show only first row with values,
like:
A B C
1 10
2 3
I figure out one way to make my reports converting rows to columns almost dynamic using simple querys. You can see and test it online here.
The number of columns of query is fixed but the values are dynamic and based on values of rows. You can build it So, I use one query to build the table header and another one to see the values:
SELECT distinct concat('<th>',itemname,'</th>') as column_name_table_header FROM history order by 1;
SELECT
hostid
,(case when itemname = (select distinct itemname from history a order by 1 limit 0,1) then itemvalue else '' end) as col1
,(case when itemname = (select distinct itemname from history a order by 1 limit 1,1) then itemvalue else '' end) as col2
,(case when itemname = (select distinct itemname from history a order by 1 limit 2,1) then itemvalue else '' end) as col3
,(case when itemname = (select distinct itemname from history a order by 1 limit 3,1) then itemvalue else '' end) as col4
FROM history order by 1;
You can summarize it, too:
SELECT
hostid
,sum(case when itemname = (select distinct itemname from history a order by 1 limit 0,1) then itemvalue end) as A
,sum(case when itemname = (select distinct itemname from history a order by 1 limit 1,1) then itemvalue end) as B
,sum(case when itemname = (select distinct itemname from history a order by 1 limit 2,1) then itemvalue end) as C
FROM history group by hostid order by 1;
+--------+------+------+------+
| hostid | A | B | C |
+--------+------+------+------+
| 1 | 10 | 3 | NULL |
| 2 | 9 | NULL | 40 |
+--------+------+------+------+
Results of RexTester:
http://rextester.com/ZSWKS28923
For one real example of use, this report bellow show in columns the hours of departures arrivals of boat/bus with a visual schedule. You will see one additional column not used at the last col without confuse the visualization:
** ticketing system to of sell ticket online and presential
This isn't the exact answer you are looking for but it was a solution that i needed on my project and hope this helps someone. This will list 1 to n row items separated by commas. Group_Concat makes this possible in MySQL.
select
cemetery.cemetery_id as "Cemetery_ID",
GROUP_CONCAT(distinct(names.name)) as "Cemetery_Name",
cemetery.latitude as Latitude,
cemetery.longitude as Longitude,
c.Contact_Info,
d.Direction_Type,
d.Directions
from cemetery
left join cemetery_names on cemetery.cemetery_id = cemetery_names.cemetery_id
left join names on cemetery_names.name_id = names.name_id
left join cemetery_contact on cemetery.cemetery_id = cemetery_contact.cemetery_id
left join
(
select
cemetery_contact.cemetery_id as cID,
group_concat(contacts.name, char(32), phone.number) as Contact_Info
from cemetery_contact
left join contacts on cemetery_contact.contact_id = contacts.contact_id
left join phone on cemetery_contact.contact_id = phone.contact_id
group by cID
)
as c on c.cID = cemetery.cemetery_id
left join
(
select
cemetery_id as dID,
group_concat(direction_type.direction_type) as Direction_Type,
group_concat(directions.value , char(13), char(9)) as Directions
from directions
left join direction_type on directions.type = direction_type.direction_type_id
group by dID
)
as d on d.dID = cemetery.cemetery_id
group by Cemetery_ID
This cemetery has two common names so the names are listed in different rows connected by a single id but two name ids and the query produces something like this
CemeteryID Cemetery_Name Latitude
1 Appleton,Sulpher Springs 35.4276242832293
You can use a couple of LEFT JOINs. Kindly use this code
SELECT t.hostid,
COALESCE(t1.itemvalue, 0) A,
COALESCE(t2.itemvalue, 0) B,
COALESCE(t3.itemvalue, 0) C
FROM history t
LEFT JOIN history t1
ON t1.hostid = t.hostid
AND t1.itemname = 'A'
LEFT JOIN history t2
ON t2.hostid = t.hostid
AND t2.itemname = 'B'
LEFT JOIN history t3
ON t3.hostid = t.hostid
AND t3.itemname = 'C'
GROUP BY t.hostid
I'm sorry to say this and maybe I'm not solving your problem exactly but PostgreSQL is 10 years older than MySQL and is extremely advanced compared to MySQL and there's many ways to achieve this easily. Install PostgreSQL and execute this query
CREATE EXTENSION tablefunc;
then voila! And here's extensive documentation: PostgreSQL: Documentation: 9.1: tablefunc or this query
CREATE EXTENSION hstore;
then again voila! PostgreSQL: Documentation: 9.0: hstore

Improving query that counts distinct values that have particular values in another column

Say I have a table in the format:
| id | category|
|----|---------|
| 10 | A |
| 10 | B |
| 10 | C |
| 2 | C |
I want to count the number of distinct id's that have all three values A, B, and C in the category variable. In this case, the query would return 1 since only for id = 10 is this true.
My intuition is to write the following query to get this value:
SELECT
COUNT(DISTINCT id),
SUM(CASE WHEN category = 'A' THEN 1 else 0 END) AS A,
SUM(CASE WHEN category = 'B' THEN 1 else 0 END) AS B,
SUM(CASE WHEN category = 'C' THEN 1 else 0 END) AS C
FROM
table
GROUP BY
id
HAVING
A >= 1
AND
B >= 1
AND
C >= 1
This feels a bit overwrought though -- is there a simpler way to achieve the desired outcome?
You are close, but you need two levels of aggregation. Assuming no duplicate rows:
SELECT COUNT(*)
FROM (SELECT id
FROM t
WHERE Category IN ('A', 'B', 'C')
GROUP BY id
HAVING COUNT(*) = 3
) t;
I assume this is part of a larger table, your id and categories can appear multiple times and still be distinct due to other fields, and that you know how many categories you're looking for.
SELECT ID, COUNT(ID)
FROM(
SELECT DISTINCT ID, CATEGORY
FROM TABLE)
GROUP BY ID
HAVING COUNT(ID) = 3 --or however many categories you want
Your subquery here removes extraneous info and forces your id to show up once per category. You then count up the number of times it shows up and look up the ones that show up 3 or however many times you want.

Sum two rows with different IDs

I have the following data:
ID | Field A | Quantity
1 | A | 10
2 | B | 20
3 | C | 30
I would like to sum the fields with ids 1 and 3 in a way that result will be:
ID | Field A | Quantity
1 | A | 40
2 | B | 20
Sounds to me more like code manipulation rather than SQL, but still want to try it.
My DMBS is sql-server.
you can try by using case when
select case when id in(1,3) then 'A' else 'B' end as field,sum(Quantity) as Quantity
from tablename group by case when id in(1,3) then 'A' else 'B' end
I think simple aggregation does what you want:
select min(id) as id, min(fieldA) as fieldA, sum(quantity) as quantity
from t
group by (case when id in (1, 3) then 1 else id end);
Not sure what method to use to combine 1,3 ID but you could try case:
select
case when id in (1,3) then 1 else id end
, min("Field A") "Field A"
, sum(quantity) quantity from myTable
group by case when id in (1,3) then 1 else id end
The above uses group by to aggregate the data. In this case it organizes the ID field using some logic to combine 1,3. All other unique ID will have its own group.
Aggregate functions take care of the other fields, including logic to take the min() value for Field A which seems to fit your requirement

How to use SUM over GROUP BY in SQL Server?

I currently have:
SELECT Name, COUNT(*) as Total
FROM DataTable
WHERE Name IN ('A', 'B', 'C')
GROUP BY Name
Resulting output:
Name Total
--------------
A 2
B 5
C 3
Instead I want this:
Name Total
--------------
A 10
B 10
C 10
Here 10 is a total of 2 + 5 + 3 (total number of records with name = A/B/C)
How do I do this?
To get your desired result you can use SUM() OVER () on the grouped COUNT(*). Demo
SELECT Name,
SUM(COUNT(*)) OVER () as Total
FROM DataTable
WHERE Name IN ('A', 'B', 'C')
GROUP BY Name
Get rid of the group by and use distinct:
select distinct Name, count(*) over() as Total
from t
where name in ('A', 'B', 'C')
rextester demo: http://rextester.com/WDMT68119
returns:
+------+-------+
| name | Total |
+------+-------+
| A | 10 |
| B | 10 |
| C | 10 |
+------+-------+
If you count all of the records and then do a cross join on all the different names
SELECT a.NAME
,x.Total
FROM DataTable a
CROSS JOIN (
COUNT(*) AS Total FROM DataTable
) x
GROUP BY a.NAME
,x.Total

Query to identify records without certain values

I have a table of data that looks something like this:
ID Num | Code
-------------
1 | A
1 | B
1 | C
1 | D
2 | A
2 | B
3 | A
3 | B
3 | D
4 | B
5 | A
5 | B
5 | E
And I need to be able to write an SQL query to show me all ID Numbers that do not have Codes C or D associated with them. (Which in this example would be ID Numbers 2, 4, & 5.)
Thanks in advance for any help you can provide!
I would use NOT IN:
SELECT DISTINCT ID_Num
FROM t
WHERE ID_Num NOT IN
(SELECT ID_Num
FROM t
WHERE code = 'C'
OR code = 'D')
I like to approach this type of question using group by and having:
select id_num
from t
group by id_num
having sum(case when code in ('C', 'D') then 1 else 0 end) = 0;
you can use 'not exists' (often more performant of not in)
SELECT DISTINCT ID_Num
FROM yourtable f1
WHERE not exists
(
SELECT * FROM yourtable f2
WHERE f2.code in ('C', 'D') and f2.ID_Num=f1.ID_Num
)
you can use 'left outer join lateral' and take not founded row like this:
SELECT DISTINCT f1.ID_Num
FROM yourtable f1
LEFT OUTER JOIN LATERAL
(
SELECT f2.ID_Num FROM yourtable f2
WHERE f2.code in ('C', 'D') AND f2.ID_Num=f1.ID_Num
FETCH FIRST ROWS ONLY
) f3 on 1=1
WHERE f3.ID_Num is null