HiveQL - aggregate multi-row data to single row

I'm struggling to collapse multi-row data into a single row. To give an example, here is the table I have:
site country users country_rank
cnn.com 840 10000 1
cnn.com 31 4000 3
cnn.com 556 6000 2
rt.com 840 200 3
rt.com 33 6000 2
rt.com 400 10000 1
and the result I am trying to get takes the number of users for the top 2 countries of each site and puts them in a single row:
site country_1 country_1_share country_2 country_2_share
cnn.com 840 10000 556 6000
rt.com 400 10000 33 6000
I've tried to do this a few different ways:
select site, country_1, country_1_share,country_2,country_2_share
from (
select site
,max(CASE WHEN country_rank = 1 THEN country END) AS country_1
,max(CASE WHEN country_rank = 1 THEN users END) as country_1_share
,max(CASE WHEN country_rank = 2 THEN country END) AS country_2
,max(CASE WHEN country_rank = 2 THEN users END) as country_2_share
from t1
group by site
)
and also:
select a.site, a.country_1, b.country_1_share,c.country_2,d.country_2_share
from (
select site, country as country_1
from t1
where max(CASE WHEN country_rank = 1 THEN country END)) a
JOIN (
select site, users as country_1_share
from t1
where max(CASE WHEN country_rank = 1 THEN users END)) b on (a.site=b.site)
JOIN (
select site, country as country_2
from t1
where max(CASE WHEN country_rank = 2 THEN country END)) c on (a.site = c.site)
JOIN (
select site, users as country_2_share
from t1
where max(CASE WHEN country_rank = 2 THEN users END)) d on (a.site = c.site)
Any insight would be much appreciated!

Your first query works fine on Hive 1.2.1 once the subquery in the FROM clause is given an alias (s below):
drop table if exists t1;
create table t1
as
select 'cnn.com' site, 840 country , 10000 users, 1 country_rank union all
select 'cnn.com' site, 31 country , 4000 users, 3 country_rank union all
select 'cnn.com' site, 556 country , 6000 users, 2 country_rank union all
select 'rt.com' site, 840 country , 200 users, 3 country_rank union all
select 'rt.com' site, 33 country , 6000 users, 2 country_rank union all
select 'rt.com' site, 400 country , 10000 users, 1 country_rank;
select site, country_1, country_1_share,country_2,country_2_share
from (
select site
,max(CASE WHEN country_rank = 1 THEN country END) AS country_1
,max(CASE WHEN country_rank = 1 THEN users END) as country_1_share
,max(CASE WHEN country_rank = 2 THEN country END) AS country_2
,max(CASE WHEN country_rank = 2 THEN users END) as country_2_share
from t1
group by site
)s;
OK
site country_1 country_1_share country_2 country_2_share
cnn.com 840 10000 556 6000
rt.com 400 10000 33 6000
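The conditional-aggregation pattern in this answer is not specific to Hive. As a rough sketch, the snippet below replays it with Python's built-in sqlite3 module. SQLite is only an assumption here, used as a stand-in so the example runs without a Hive cluster; the table, columns and rows are the ones from the question.

```python
import sqlite3

# SQLite stands in for Hive here (an assumption made for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t1 (site TEXT, country INTEGER, users INTEGER, country_rank INTEGER)"
)
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?, ?)",
    [
        ("cnn.com", 840, 10000, 1),
        ("cnn.com", 31, 4000, 3),
        ("cnn.com", 556, 6000, 2),
        ("rt.com", 840, 200, 3),
        ("rt.com", 33, 6000, 2),
        ("rt.com", 400, 10000, 1),
    ],
)

# Each CASE keeps a value only for the wanted rank; MAX then picks that one
# non-NULL value per site, turning ranked rows into columns.
query = """
SELECT site,
       MAX(CASE WHEN country_rank = 1 THEN country END) AS country_1,
       MAX(CASE WHEN country_rank = 1 THEN users   END) AS country_1_share,
       MAX(CASE WHEN country_rank = 2 THEN country END) AS country_2,
       MAX(CASE WHEN country_rank = 2 THEN users   END) AS country_2_share
FROM t1
GROUP BY site
ORDER BY site
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Rows with country_rank = 3 match none of the CASE branches, so they contribute only NULLs and are ignored by MAX.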

Related

To select data from multiple records in SQL Server having a common ID

I need to select/concat data from 2 tables in SQL Server. I'm using a LEFT JOIN, but the data is returned as multiple records.
Below are the sample tables:
Table1
Id Name Age
1 Sk 20
2 Rb 30
Table2
ID Bike Price Table1Id
1 RX 200 1
2 CD 250 1
3 FZ 300 1
4 R1 400 2
The desired output is
ID Name Age Bike1 Price1 Bike2 Price2 Bike3 Price3
1 Sk 20 RX 200 CD 250 FZ 300
2 Rb 30 R1 400 NULL NULL NULL NULL
A sample format of the query I'm using
SELECT A.ID, A.Name, B.Bike, B.Price FROM Table1 A LEFT JOIN Table2 B ON
A.id = B.Table1Id order by A.id
The output I'm getting from the above query is
ID Name Age Bike Price
1 Sk 20 RX 200
1 Sk 20 CD 250
1 Sk 20 FZ 300
2 Rb 30 R1 400
I need the data as one record per ID rather than multiple records (as seen in the desired output). I tried using OFFSET, but that returns only a limited result, not all of the records.
Any suggestions on how this can be achieved?

If you know the maximum number of bikes per person, you can use conditional aggregation:
SELECT ID, Name, Age,
MAX(CASE WHEN seqnum = 1 THEN Bike END) as bike_1,
MAX(CASE WHEN seqnum = 1 THEN Price END) as price_1,
MAX(CASE WHEN seqnum = 2 THEN Bike END) as bike_2,
MAX(CASE WHEN seqnum = 2 THEN Price END) as price_2,
MAX(CASE WHEN seqnum = 3 THEN Bike END) as bike_3,
MAX(CASE WHEN seqnum = 3 THEN Price END) as price_3
FROM (SELECT A.ID, A.Name, A.Age, B.Bike, B.Price,
ROW_NUMBER() OVER (PARTITION BY A.id ORDER BY B.Price) as seqnum
FROM Table1 A LEFT JOIN
Table2 B
ON A.id = B.Table1Id
) ab
GROUP BY ID, Name, Age
ORDER BY ID
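To check how the row numbering and the conditional aggregation fit together, here is a minimal sketch that runs the same idea against the sample tables from the question. It uses Python's sqlite3 module in place of SQL Server, which is an assumption made only so the example is self-contained; window functions need SQLite 3.25 or newer.

```python
import sqlite3

# SQLite stands in for SQL Server (assumption); ROW_NUMBER needs SQLite >= 3.25.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (Id INTEGER, Name TEXT, Age INTEGER);
CREATE TABLE Table2 (ID INTEGER, Bike TEXT, Price INTEGER, Table1Id INTEGER);
INSERT INTO Table1 VALUES (1, 'Sk', 20), (2, 'Rb', 30);
INSERT INTO Table2 VALUES
  (1, 'RX', 200, 1), (2, 'CD', 250, 1), (3, 'FZ', 300, 1), (4, 'R1', 400, 2);
""")

# The inner query numbers each person's bikes by price; the outer query
# spreads bike number 1, 2 and 3 into separate columns.
query = """
SELECT ID, Name, Age,
       MAX(CASE WHEN seqnum = 1 THEN Bike  END) AS bike_1,
       MAX(CASE WHEN seqnum = 1 THEN Price END) AS price_1,
       MAX(CASE WHEN seqnum = 2 THEN Bike  END) AS bike_2,
       MAX(CASE WHEN seqnum = 2 THEN Price END) AS price_2,
       MAX(CASE WHEN seqnum = 3 THEN Bike  END) AS bike_3,
       MAX(CASE WHEN seqnum = 3 THEN Price END) AS price_3
FROM (SELECT A.Id AS ID, A.Name, A.Age, B.Bike, B.Price,
             ROW_NUMBER() OVER (PARTITION BY A.Id ORDER BY B.Price) AS seqnum
      FROM Table1 A
      LEFT JOIN Table2 B ON A.Id = B.Table1Id) ab
GROUP BY ID, Name, Age
ORDER BY ID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

A person with fewer than three bikes simply gets NULL (None in Python) in the unused columns, which matches the desired output in the question.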

Trouble ordering GROUP BY, ORDER BY AND JOIN

I'm having trouble ordering a query.
I have this table (AttendanceLog):
ClassID | StudentPin | Status
69 1 YES
8 2 NO
10 2 NO
17 3 NO
43 5 YES
58 6 YES
and this table (Students):
STUDENTPIN | FNAME | LNAME | INTERNATIONAL
1 X X NO
2 X X YES
3 X X NO
4 X X YES
I want to find out which INTERNATIONAL students (Fname, Lname and StudentPIN) have missed 10 or more classes (AttendanceLog status being NO).
Currently I have the query below, which tells me the StudentPIN and the number of classes attended and not attended by each student, but I am unable to join the two tables together.
SELECT
ATTENDANCELOG.studentpin,
SUM(CASE WHEN status = 'YES' THEN 1 ELSE 0 END) AS number_of_yes,
SUM(CASE WHEN status = 'NO' THEN 1 ELSE 0 END) AS number_of_no
FROM attendancelog
GROUP BY ATTENDANCELOG.studentpin
ORDER BY ATTENDANCELOG.studentpin
Thanks!

You could use a join:
SELECT
ATTENDANCELOG.studentpin,
Students.FNAME,
Students.LNAME,
SUM(CASE WHEN status = 'YES' THEN 1 ELSE 0 END) AS number_of_yes,
SUM(CASE WHEN status = 'NO' THEN 1 ELSE 0 END) AS number_of_no
FROM attendancelog
INNER JOIN Students ON Students.STUDENTPIN = attendancelog.StudentPin
and INTERNATIONAL='YES'
GROUP BY ATTENDANCELOG.studentpin, Students.FNAME, Students.LNAME
ORDER BY ATTENDANCELOG.studentpin

Join on student pin, put your international = 'YES' filter in the where clause, and filter for 10 or more misses in a having clause. You can also shorten the case expressions a little:
select a.studentpin
, s.fname, s.lname, s.international
, count(case a.status when 'YES' then 1 end) as attended
, count(case a.status when 'NO' then 1 end) as missed
from attendancelog a
join students s on s.studentpin = a.studentpin
where s.international = 'YES'
group by s.fname, s.lname, s.international, a.studentpin
having count(case a.status when 'NO' then 1 end) >= 10
order by s.fname, s.lname, a.studentpin;
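As a quick way to see the WHERE and HAVING filters working together, the sketch below loads the sample rows from the question into an in-memory SQLite database (an assumption; the question does not name its database). Because no student in the sample data has 10 misses, the threshold is passed in as a hypothetical min_missed parameter and lowered to 2 for the demonstration.

```python
import sqlite3

# SQLite is an assumption here; the sample rows are those from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE attendancelog (classid INTEGER, studentpin INTEGER, status TEXT);
CREATE TABLE students (studentpin INTEGER, fname TEXT, lname TEXT, international TEXT);
INSERT INTO attendancelog VALUES
  (69, 1, 'YES'), (8, 2, 'NO'), (10, 2, 'NO'),
  (17, 3, 'NO'), (43, 5, 'YES'), (58, 6, 'YES');
INSERT INTO students VALUES
  (1, 'X', 'X', 'NO'), (2, 'X', 'X', 'YES'),
  (3, 'X', 'X', 'NO'), (4, 'X', 'X', 'YES');
""")

# WHERE filters rows before grouping; HAVING filters the groups afterwards.
query = """
SELECT a.studentpin, s.fname, s.lname,
       COUNT(CASE a.status WHEN 'YES' THEN 1 END) AS attended,
       COUNT(CASE a.status WHEN 'NO'  THEN 1 END) AS missed
FROM attendancelog a
JOIN students s ON s.studentpin = a.studentpin
WHERE s.international = 'YES'
GROUP BY a.studentpin, s.fname, s.lname
HAVING COUNT(CASE a.status WHEN 'NO' THEN 1 END) >= ?
ORDER BY a.studentpin
"""
min_missed = 2  # hypothetical threshold; the question asks for 10
rows = conn.execute(query, (min_missed,)).fetchall()
print(rows)
```

Student 3 also has a missed class but is filtered out by the WHERE clause, and student 4 has no attendance rows at all, so the inner join drops them.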

SQL query sum of total corresponding rows

I have the two tables below. The caseId from the first table is referenced in the second table along with the accidents. I am trying to get the total number of accidents of each rating per case type. The sample data and the expected result are shown below.
Table case:
caseId CaseType
1 AB
2 AB
3 AB
4 CD
5 CD
6 DE
Table CaseAccidents:
AccidentId caseID AccidentRating
1 1 High
2 1 High
3 1 Medium
4 1 LOW
5 2 High
6 2 Medium
7 2 LOW
8 5 High
9 5 High
10 5 Medium
11 5 LOW
Result should look like:
CaseType TotalHIghrating TotalMediumRating TotalLOWRating
AB 3 2 2
CD 2 1 1
DE 0 0 0

To get the count for every rating, you can use SUM(CASE WHEN ...), adding 1 for every record that matches the rating.
You also want to see every distinct CaseType, including those without accidents. A RIGHT JOIN does that, because it keeps all records of the case table.
select case.CaseType,
sum(case when caseAccidents.AccidentRating = 'High' then 1 else 0 end) as TotalHighRating,
sum(case when caseAccidents.AccidentRating = 'Medium' then 1 else 0 end) as TotalMediumRating,
sum(case when caseAccidents.AccidentRating = 'LOW' then 1 else 0 end) as TotalLowRating
from caseAccidents
right join case on case.caseId = caseAccidents.caseID
group by case.CaseType;
+----------+-----------------+-------------------+----------------+
| CaseType | TotalHighRating | TotalMediumRating | TotalLowRating |
+----------+-----------------+-------------------+----------------+
| AB | 3 | 2 | 2 |
+----------+-----------------+-------------------+----------------+
| CD | 2 | 1 | 1 |
+----------+-----------------+-------------------+----------------+
| DE | 0 | 0 | 0 |
+----------+-----------------+-------------------+----------------+
Check it: http://rextester.com/MCGJA9193
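For readers who want to reproduce the counts locally, here is a small sketch using Python's sqlite3 module. Several things in it are assumptions made for illustration: SQLite itself, the table names cases and case_accidents (CASE is a reserved word, so the original table name is avoided), and a LEFT JOIN from cases in place of the RIGHT JOIN, which is equivalent here and also works on older SQLite versions.

```python
import sqlite3

# Table names `cases` and `case_accidents` are hypothetical stand-ins.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cases (caseId INTEGER, CaseType TEXT);
CREATE TABLE case_accidents (AccidentId INTEGER, caseID INTEGER, AccidentRating TEXT);
INSERT INTO cases VALUES (1,'AB'), (2,'AB'), (3,'AB'), (4,'CD'), (5,'CD'), (6,'DE');
INSERT INTO case_accidents VALUES
  (1,1,'High'), (2,1,'High'), (3,1,'Medium'), (4,1,'LOW'),
  (5,2,'High'), (6,2,'Medium'), (7,2,'LOW'),
  (8,5,'High'), (9,5,'High'), (10,5,'Medium'), (11,5,'LOW');
""")

# Starting from `cases` keeps case types that have no accidents; their
# AccidentRating is NULL, so every CASE falls through to ELSE 0.
query = """
SELECT c.CaseType,
       SUM(CASE WHEN ca.AccidentRating = 'High'   THEN 1 ELSE 0 END) AS TotalHighRating,
       SUM(CASE WHEN ca.AccidentRating = 'Medium' THEN 1 ELSE 0 END) AS TotalMediumRating,
       SUM(CASE WHEN ca.AccidentRating = 'LOW'    THEN 1 ELSE 0 END) AS TotalLowRating
FROM cases c
LEFT JOIN case_accidents ca ON ca.caseID = c.caseId
GROUP BY c.CaseType
ORDER BY c.CaseType
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With an inner join instead, the DE row would disappear, because case 6 has no matching accidents.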

Have you used CASE in a SELECT clause before?
select C.CaseType,
sum(case when CA.AccidentRating = 'High' then 1 else 0 end)
from Case C join CaseAccidents CA on C.CaseId = CA.CaseId
group by C.CaseType

Please see this. Here is a script that creates the sample tables, followed by the query:
create table #case(caseid int,casetype varchar(5))
insert into #case (caseid,casetype)
select 1,'AB' union all
select 2,'AB' union all
select 3,'AB' union all
select 4,'CD' union all
select 5,'CD' union all
select 6,'DE'
create table #CaseAccidents(AccidentId int, CaseId int,AccidentRating varchar(10))
insert into #CaseAccidents(AccidentId, CaseId, AccidentRating)
select 1,1,'High' union all
select 2,1,'High' union all
select 3,1,'Medium' union all
select 4,1,'Low' union all
select 5,2,'High' union all
select 6,2,'Medium' union all
select 7,2,'Low' union all
select 8,5,'High' union all
select 9,5,'High' union all
select 10,5,'Medium' union all
select 11,5,'Low'
My script
select c.casetype,
sum(case when ca.AccidentRating='High' then 1 else 0 end) as TotalHighRating,
sum(case when ca.AccidentRating='Medium' then 1 else 0 end) as TotalMediumRating,
sum(case when ca.AccidentRating='Low' then 1 else 0 end) as TotalLowRating
from #case c
Left join #CaseAccidents ca
on c.Caseid=ca.Caseid
group by c.casetype
Hope this helps!

Another approach, using the PIVOT operator:
SELECT casetype,
[High],
[Medium],
[Low]
FROM (SELECT c.casetype,
AccidentRating
FROM case c
LEFT JOIN CaseAccidents ca
ON ca.CaseId = c.caseid)a
PIVOT (Count(AccidentRating)
FOR AccidentRating IN ([High],
[Medium],
[Low]) ) p

Try this:
select casetype,
sum(case when ca.AccidentRating='High' then 1 else 0 end ) as TotalHIghrating,
sum(case when ca.AccidentRating='Medium' then 1 else 0 end ) as TotalMediumRating ,
sum(case when ca.AccidentRating='Low' then 1 else 0 end ) as TotalLOWRating
from #case c
left join #CaseAccidents ca on c.caseid=ca.CaseId
group by casetype

PostgreSQL select query 'join multiple rows to one row'

I am new to PostgreSQL. My case is that I have a table where I store my data. The data come from a file as one row and are getting saved in the database as 5 rows. What I want is to make a SELECT statement where it will combine the 5 rows again into one.
e.g.
id id2 id3 year code value
4 1 1 1642 radio 30
4 1 1 1642 tv 56
4 1 1 1642 cable 67
4 1 1 1642 dine 70
I want to have a query where it will return the following:
id id2 id3 year radio tv cable dine
4 1 1 1642 30 56 67 70
The values of the code column become column names, and the entries of the value column become the values in those columns.
Is this possible?

You could use conditional aggregation:
SELECT m.id, m.id2, m.id3, m.year,
SUM(CASE WHEN m.code = 'radio' THEN m.value END) as radio,
SUM(CASE WHEN m.code = 'tv' THEN m.value END) as tv,
SUM(CASE WHEN m.code = 'cable' THEN m.value END) as cable,
SUM(CASE WHEN m.code = 'dine' THEN m.value END) as dine
FROM MyTable m
GROUP BY m.id, m.id2, m.id3, m.year

Convert Column Heading to row data - sql

My SQL query prints a result like this:
North South West East Central
0 280 0 41 36
But I want it like this:
North 0
South 280
West 0
East 41
Central 36
SQL:
Select Count(Case When Region=1 Then 1 Else Null End)[North],
Count(Case When Region=2 Then 1 Else Null End)[South],
Count(Case When Region=3 Then 1 Else Null End)[West],
Count(Case When Region=4 Then 1 Else Null End)[East],
Count(Case When Region=5 Then 1 Else Null End)[Central]
From ATM Where ATMStatus=0 And Bank=1

Use GROUP BY:
SELECT
CASE Region
WHEN 1 THEN 'North'
WHEN 2 THEN 'South'
WHEN 3 THEN 'West'
WHEN 4 THEN 'East'
WHEN 5 THEN 'Central' END AS Region
, COUNT(ID) --or your primary key if it is different
FROM ATM
WHERE ATMStatus = 0 AND Bank = 1
GROUP BY Region

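select case Region
when 1 then 'North'
when 2 then 'South'
when 3 then 'West'
when 4 then 'East'
when 5 then 'Central'
end, count(*)
From ATM Where ATMStatus=0 And Bank=1
group by Region
Do you have a region table? That would make it simpler, especially if the 0 rows are important, because GROUP BY alone returns no row for a region that has no matching ATMs.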
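A region lookup table could be used roughly as sketched below. The Region table, its RegionID and RegionName columns, and the use of SQLite via Python's sqlite3 module are all assumptions for illustration; the ATM rows are generated so that they reproduce the counts from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical lookup table; the question does not show one.
CREATE TABLE Region (RegionID INTEGER PRIMARY KEY, RegionName TEXT);
CREATE TABLE ATM (ID INTEGER PRIMARY KEY, Region INTEGER, ATMStatus INTEGER, Bank INTEGER);
INSERT INTO Region VALUES (1,'North'), (2,'South'), (3,'West'), (4,'East'), (5,'Central');
""")

# Generate ATM rows that reproduce the question's counts for ATMStatus=0, Bank=1.
atm_rows = []
for region_id, n in {2: 280, 4: 41, 5: 36}.items():
    atm_rows.extend((region_id, 0, 1) for _ in range(n))
atm_rows.append((1, 1, 1))  # a North ATM with another status; must not be counted
conn.executemany("INSERT INTO ATM (Region, ATMStatus, Bank) VALUES (?, ?, ?)", atm_rows)

# The filters sit in the ON clause, not in WHERE, so regions without a
# matching ATM survive the LEFT JOIN and are reported with a count of 0.
query = """
SELECT r.RegionName, COUNT(a.ID) AS atm_count
FROM Region r
LEFT JOIN ATM a
       ON a.Region = r.RegionID AND a.ATMStatus = 0 AND a.Bank = 1
GROUP BY r.RegionID, r.RegionName
ORDER BY r.RegionID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

COUNT(a.ID) counts only non-NULL IDs, which is why an unmatched region yields 0 rather than 1.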
