How to make column and row as header in HiveQL

How to make column and row as header in HiveQL - hive

I've a question here about setting column and row as header in HiveQL. So, my expected output is like this (Table 1):
But what I could do so far is just like this (Table 2):
with this query:
SELECT
sum(case when grade ='A' and class = 'I' then 1 else 0 end) as 'Grade_A_I',
sum(case when grade ='B' and class = 'I' then 1 else 0 end) as 'Grade_B_I',
sum(case when grade ='C' and class = 'I' then 1 else 0 end) as 'Grade_C_I',
sum(case when grade ='A' and class = 'II' then 1 else 0 end) as 'Grade_A_II', etc...
FROM
grade_table
where
.......
I didn't find any references on the Internet to do this, so I'm just wondering is there any ways to achieve the Table 1 instead of Table 2?
Really appreciate any inputs from you all, thanks in advance!

something like this would work?
select
case when class = 'I' then 'Class I'
when class = 'II' then 'Class II' end as class,
sum(case when grade ='A' then 1 else 0 end) as Grade_A,
sum(case when grade ='B' then 1 else 0 end) as Grade_B,
sum(case when grade ='C' then 1 else 0 end) as Grade_C
from
grade_table
group by 1
union all
select 'Total' as class,
sum(case when grade ='A' then 1 else 0 end) as Grade_A,
sum(case when grade ='B' then 1 else 0 end) as Grade_B,
sum(case when grade ='C' then 1 else 0 end) as Grade_C
from
grade_table

Related

Trying to introduce a simple new condition to a JOIN query

I am using SQLite and I had someone help me construct this JOIN query which works quite well, but now I need to add another condition but I am having trouble introducing it to the query without it breaking.
In both tables used in the JOIN there is a column called EventId and I want to introduce the simple condition...
WHERE EventId = 123456
Below you can see a working example of the query itself along with two comments where I have tried to introduce the new condition and failed (because I'm bad at SQL).
SELECT t.MicrosoftId,
SUM(CASE WHEN name = 'necktie' THEN 1 ELSE 0 END) as 'necktie',
SUM(CASE WHEN name = 'shirt' THEN 1 ELSE 0 END) as 'shirt',
SUM(CASE WHEN name = 'suit' THEN 1 ELSE 0 END) as 'suit',
SUM(CASE WHEN name = 'man' THEN 1 ELSE 0 END) as 'man',
SUM(CASE WHEN name = 'male' THEN 1 ELSE 0 END) as 'male'
FROM TagsMSCV t
/* <---- WHERE t.EventId = 123456 (fails here...) */
LEFT JOIN
(SELECT i.MicrosoftId
FROM Images i
GROUP BY i.MicrosoftId) i
ON i.MicrosoftId = t.MicrosoftId
WHERE t.name IN ('necktie','shirt','suit','man','male')
/* <---- AND WHERE t.EventId = 123456 (fails here too...) */
GROUP BY t.MicrosoftId

try like below
select t1.* from ( SELECT t.MicrosoftId,
SUM(CASE WHEN name = 'necktie' THEN 1 ELSE 0 END) as 'necktie',
SUM(CASE WHEN name = 'shirt' THEN 1 ELSE 0 END) as 'shirt',
SUM(CASE WHEN name = 'suit' THEN 1 ELSE 0 END) as 'suit',
SUM(CASE WHEN name = 'man' THEN 1 ELSE 0 END) as 'man',
SUM(CASE WHEN name = 'male' THEN 1 ELSE 0 END) as 'male'
FROM TagsMSCV t WHERE t.EventId = 123456
and name IN ('necktie','shirt','suit','man','male') group by t.MicrosoftId
) t1
You did mistake to create subquery and as 2nd subquery no need group by as there no aggregate function used

It should be in WHERE section, but without second WHERE keyword:
SELECT t.MicrosoftId,
SUM(CASE WHEN name = 'necktie' THEN 1 ELSE 0 END) as 'necktie',
SUM(CASE WHEN name = 'shirt' THEN 1 ELSE 0 END) as 'shirt',
SUM(CASE WHEN name = 'suit' THEN 1 ELSE 0 END) as 'suit',
SUM(CASE WHEN name = 'man' THEN 1 ELSE 0 END) as 'man',
SUM(CASE WHEN name = 'male' THEN 1 ELSE 0 END) as 'male'
FROM TagsMSCV t
LEFT JOIN
(SELECT i.MicrosoftId
FROM Images i
GROUP BY i.MicrosoftId) i
ON i.MicrosoftId = t.MicrosoftId
WHERE t.name IN ('necktie','shirt','suit','man','male')
AND t.EventId = 123456
GROUP BY t.MicrosoftId

Exclude records that have sum greater than 1

I have query returning details of customers that are subscribed to channel xyz or all other channels.
To generate this results i am using the following query:
select customerID
,sum(case when channel='xyz' then 1 else 0 end) as 'xyz Count'
,sum(case when channel<>'xyz' then bundle_qty else 0 end) as 'Other'
From temptable
So my Question is, how do i Exclude customers that are subscribed to 2 channels, where one is xyz and one is another channel.

select customerID
,sum(case when channel='xyz' then 1 else 0 end) as 'xyz Count'
,sum(case when channel<>'xyz' then bundle_qty else 0 end) as 'Other'
From temptable
group by customerID
having sum(case when channel= 'xyz' then 1 else 0 end) > 0
and sum(case when channel<>'xyz' then 1 else 0 end) > 0

First, your query is not correct. It needs a group by. Second, you can do what you want using having:
select customerID,
sum(case when channel = 'xyz' then 1 else 0 end) as xyz_Count,
sum(case when channel<>'xyz' then bundle_qty else 0 end) as Other
From temptable
group by customerID
having count(*) = 2 and
sum(case when channel = 'xyz' then 1 else 0 end) = 1;
If customers can subscribe to the same channel multiple times, and you still want only "xyz" and another channel, then:
having count(distinct channel) = 2 and
(min(channel) = 'xyz' or max(channel) = 'xyz')

How to do mathematical comparision with SQL CASE statement

I have a student table around 100k records and I have two types of data in it: student name and level type with selection values primary, secondary, intermediate & university
I want to filter out the student from this table, whose have count > 0, in all level primary, secondary, intermediate & university
I was able to find the sum for each student in each level using the following query
SELECT
student_id,
SUM(CASE WHEN lev_type = 'primary' THEN 1 ELSE 0 END) AS primary,
SUM(CASE WHEN lev_type = 'secondary' THEN 1 ELSE 0 END) AS secondary,
SUM(CASE WHEN lev_type = 'intermediate' THEN 1 ELSE 0 END) AS intermediate,
SUM(CASE WHEN level_type = 'university' THEN 1 ELSE 0 END) AS university
FROM
student_details
GROUP BY
student_id
and I am getting a result like (note that my result is 92242 row(s))
attendee_id primary secondary intermediate uni
student1 0 1 1 2
student2 0 1 1 0
student3 88 209 92 32
student4 0 1 1 0
student5 0 1 1 0
How to filter out student3 from this result?

You can simply add a where statement as follows:
SELECT student_id,
SUM(case when lev_type = 'primary' then 1 else 0 end) as primary,
SUM(case when lev_type = 'secondary' then 1 else 0 end) as secondary ,
SUM(case when lev_type = 'intermediate' then 1 else 0 end) as intermediate ,
SUM(case when lev_type = 'university' then 1 else 0 end) as university
FROM student_details
GROUP BY student_id
WHERE primary = 0 OR secondary = 0 OR intermediate = 0 OR university = 0

HAVING might get you what you want. For example:
SELECT student_id,
sum(case when lev_type = 'primary' then 1 else 0 end) as primary,
sum(case when lev_type = 'secondary' then 1 else 0 end) as secondary ,
sum(case when lev_type = 'intermediate' then 1 else 0 end) as intermediate ,
sum(case when level_type = 'university' then 1 else 0 end) as university
from student_details
group by student_id
-- Put the criteria here by which you want to filter
having sum(case when lev_type = 'primary' then 1 else 0 end) = 0
and sum(case when lev_type = 'secondary' then 1 else 0 end) = 0
and sum(case when lev_type = 'intermediate' then 1 else 0 end) = 0
and sum(case when level_type = 'university' then 1 else 0 end) = 0

How do you create a conditional count across multiple fields?

I have the huge table of over 100 million rows of data which is joined to another reference table that I want to create a conditional count for.
The first table is the large one which is an audit log and contains data which lists data on countries and contains a date of audit.
The second table is a smaller table which contains relational data to the audit log.
The first part is the easy bit which is to identify which audit data I want to see. I have the following code to identify this:
select aud.*
from audit_log aud
join database db on db.id=aud.release_id
where aud.event_description like '% opted in'
and r.creation_source = 'system_a'
This gives me the data in the following format:
Country Event Description Audit Date
Czech Republic Czech Republic has been automatically opted in 11-AUG-14 07.01.52.606000000
Denmark Denmark has been automatically opted in 12-AUG-15 07.01.53.239000000
Denmark Denmark has been automatically opted in 11-SEP-15 07.01.53.902000000
Dominican Republic Dominican Republic has been automatically opted in 11-SEP-15 07.01.54.187000000
Ecuador Ecuador has been automatically opted in 11-DEC-14 07.01.54.427000000
Ecuador Ecuador has been automatically opted in 11-NOV-14 07.01.54.679000000
The number of results from this query still returns over 5 million rows so I cannot export the data to Excel to create a count.
My two main issues are the number of rows and the date format of the 'Audit Date' field.
Ideally I want to create a count which shows the data as:
Country |Aug-14|Nov-14|Dec-14|Aug-15|Sep-15
Czech Republic | 1 | | | |
Denmark | | | | 1 | 1
Dominican Republic | | | | | 1
Ecuador | | 1 | 1 | |
Any idea's on how I extract the month and year and drop the figures into column by country?
Thanks
Edit - Thank you xQbert for you solution, it worked perfectly!
The problem now is that I have run into a new problem.
I need to constrain the count by another query, but there is no unique identifier between the tables involved.
For example, I amended your query to fit my db:
select cty.country_name,
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='AUG-2014' then 1 else 0 end) as "AUG-14",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='SEP-2014' then 1 else 0 end) as "SEP-14",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='OCT-2014' then 1 else 0 end) as "OCT-14",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='NOV-2014' then 1 else 0 end) as "NOV-14",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='DEC-2014' then 1 else 0 end) as "DEC-14",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='JAN-2015' then 1 else 0 end) as "JAN-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='FEB-2015' then 1 else 0 end) as "FEB-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='MAR-2015' then 1 else 0 end) as "MAR-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='APR-2015' then 1 else 0 end) as "APR-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='MAY-2015' then 1 else 0 end) as "MAY-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='JUN-2015' then 1 else 0 end) as "JUN-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='JUL-2015' then 1 else 0 end) as "JUL-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='AUG-2015' then 1 else 0 end) as "AUG-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='SEP-2015' then 1 else 0 end) as "SEP-15"
from dschd.audit_trail aud
join dschd.release r on r.id=aud.release_id
join dschd.country cty on aud.EVENT_COUNTRY_ID=cty.id
where aud.event_description like '% opted in'
and r.creation_source = 'DSCHED'
GROUP BY cty.COUNTRY_name
My second query is:
select *
from DSCHD.RELEASE_COUNTRY_RIGHT rcr
join dschd.release r on rcr.RELEASE_ID=r.ID
join dschd.country cty on rcr.COUNTRY_ID=cty.id
where r.release_status in ('DRAFT', 'SCHEDULED', 'FINAL', 'DELIVERED')
and r.is_active = 'Y'
and rcr.MARKETING_RIGHT = 'Y'
and rcr.OPT_OUT = 'N'
and r.creation_source = 'DSCHED'
The problem is that I have many countries which can relate to one ID (Release_ID) but there is no unique identifier between the tables on a country level. Each country has an ID though.
So for query 1, to identify each unique row I would need the 'aud.Release_ID' and the 'aud.Event_country_id' and for query 2 to achieve the same I would need to use the 'rcr.Release_ID' and 'rcr.country_id'.
select cty.country_name,
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='AUG-2014' then 1 else 0 end) as "AUG-14",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='SEP-2014' then 1 else 0 end) as "SEP-14",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='OCT-2014' then 1 else 0 end) as "OCT-14",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='NOV-2014' then 1 else 0 end) as "NOV-14",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='DEC-2014' then 1 else 0 end) as "DEC-14",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='JAN-2015' then 1 else 0 end) as "JAN-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='FEB-2015' then 1 else 0 end) as "FEB-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='MAR-2015' then 1 else 0 end) as "MAR-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='APR-2015' then 1 else 0 end) as "APR-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='MAY-2015' then 1 else 0 end) as "MAY-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='JUN-2015' then 1 else 0 end) as "JUN-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='JUL-2015' then 1 else 0 end) as "JUL-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='AUG-2015' then 1 else 0 end) as "AUG-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='SEP-2015' then 1 else 0 end) as "SEP-15"
from dschd.audit_trail aud
join dschd.release r on r.id=aud.release_id
join dschd.country cty on aud.EVENT_COUNTRY_ID=cty.id
where aud.event_description like '% opted in'
and ***** in (select ******
from DSCHD.RELEASE_COUNTRY_RIGHT rcr
join dschd.release r on rcr.RELEASE_ID=r.ID
join dschd.country cty on rcr.COUNTRY_ID=cty.id
where r.release_status in ('DRAFT', 'SCHEDULED', 'FINAL', 'DELIVERED')
and r.is_active = 'Y'
and rcr.MARKETING_RIGHT = 'Y'
and rcr.OPT_OUT = 'N'
and r.creation_source = 'DSCHED')
GROUP BY cty.COUNTRY_name
The bit I am stuck at are the two parts which are indicated by '*****' as the join criteria is two fields.
Any ideas?

Quick and dirty, not dynamic floating based on a 12 month cylce or anything...
select country,
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='AUG-2014' then 1 else 0 end) as "AUG-14",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='SEP-2014' then 1 else 0 end) as "SEP-14",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='OCT-2014' then 1 else 0 end) as "OCT-14",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='NOV-2014' then 1 else 0 end) as "NOV-14",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='DEC-2014' then 1 else 0 end) as "DEC-14",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='JAN-2015' then 1 else 0 end) as "JAN-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='FEB-2015' then 1 else 0 end) as "FEB-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='MAR-2015' then 1 else 0 end) as "MAR-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='APR-2015' then 1 else 0 end) as "APR-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='MAY-2015' then 1 else 0 end) as "MAY-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='JUN-2015' then 1 else 0 end) as "JUN-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='JUL-2015' then 1 else 0 end) as "JUL-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='AUG-2015' then 1 else 0 end) as "AUG-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='SEP-2015' then 1 else 0 end) as "SEP-15"
from audit_log aud
join database db on db.id=aud.release_id
where aud.event_description like '% opted in'
and r.creation_source = 'system_a'
GROUP BY COUNTRY
Ideally we'd simply use a Pivot statement or base it on earliest date in range and go on... Such as found in this prior stack article Dynamic pivot in oracle sql
update based on changing requirements you do know you can join on multiple criteria right? :P
Note we created an inline view with your second query alias it as z table name and then add the two columns desired to match on as part of the results. Then we join it as if it were a table!
select cty.country_name,
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='AUG-2014' then 1 else 0 end) as "AUG-14",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='SEP-2014' then 1 else 0 end) as "SEP-14",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='OCT-2014' then 1 else 0 end) as "OCT-14",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='NOV-2014' then 1 else 0 end) as "NOV-14",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='DEC-2014' then 1 else 0 end) as "DEC-14",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='JAN-2015' then 1 else 0 end) as "JAN-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='FEB-2015' then 1 else 0 end) as "FEB-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='MAR-2015' then 1 else 0 end) as "MAR-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='APR-2015' then 1 else 0 end) as "APR-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='MAY-2015' then 1 else 0 end) as "MAY-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='JUN-2015' then 1 else 0 end) as "JUN-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='JUL-2015' then 1 else 0 end) as "JUL-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='AUG-2015' then 1 else 0 end) as "AUG-15",
SUM(CASE WHEN to_char(Audit_Date,'MON-YYYY') ='SEP-2015' then 1 else 0 end) as "SEP-15"
from dschd.audit_trail aud
join dschd.release r on r.id=aud.release_id
join dschd.country cty on aud.EVENT_COUNTRY_ID=cty.id
join (select Release_ID, country_id
from DSCHD.RELEASE_COUNTRY_RIGHT rcr
join dschd.release r on rcr.RELEASE_ID=r.ID
join dschd.country cty on rcr.COUNTRY_ID=cty.id
where r.release_status in ('DRAFT', 'SCHEDULED', 'FINAL', 'DELIVERED')
and r.is_active = 'Y'
and rcr.MARKETING_RIGHT = 'Y'
and rcr.OPT_OUT = 'N'
and r.creation_source = 'DSCHED') Z
ON aud.Release_ID = z.Realease_ID and
aud.Event_country_id = z.country_id
where aud.event_description like '% opted in'
GROUP BY cty.COUNTRY_name

Sum data for many different results for same field

I am trying to find a better way to write this sql server code 2008. It works and data is accurate. Reason i ask is that i will be asked to do this for several other reports going forward and want to reduce the amount of code to upkeep going forward.
How can i take a field where i sum for the yes/no/- (dash) in each field without doing an individual sum as i have in code. Each table is a month of detail data which i sum using in a CTE. i changed the table name for each month and Union All to put data together. Is there a better way to do this. This is a small sample of code. Thanks for the help.
WITH H AS (
SELECT 'August' AS Month_Name
, SUM(CASE WHEN G.FFS = '-' THEN 1 ELSE 0 END) AS FFS_Dash
, SUM(CASE WHEN G.FFS = 'Yes' THEN 1 ELSE 0 END) AS FFS_Yes
, SUM(CASE WHEN G.FFS = 'No' THEN 1 ELSE 0 END) AS FFS_No
, SUM(CASE WHEN G.DNA = '-' THEN 1 ELSE 0 END) AS DNA_Dash
, SUM(CASE WHEN G.DNA = 'Yes' THEN 1 ELSE 0 END) AS DNA_Yes
, SUM(CASE WHEN G.DNA = 'No' THEN 1 ELSE 0 END) AS DNA_No
FROM table08 G )
, G AS (
SELECT 'July' AS Month_Name
, SUM(CASE WHEN G.FFS = '-' THEN 1 ELSE 0 END) AS FFS_Dash
, SUM(CASE WHEN G.FFS = 'Yes' THEN 1 ELSE 0 END) AS FFS_Yes
, SUM(CASE WHEN G.FFS = 'No' THEN 1 ELSE 0 END) AS FFS_No
, SUM(CASE WHEN G.DNA = '-' THEN 1 ELSE 0 END) AS DNA_Dash
, SUM(CASE WHEN G.DNA = 'Yes' THEN 1 ELSE 0 END) AS DNA_Yes
, SUM(CASE WHEN G.DNA = 'No' THEN 1 ELSE 0 END) AS DNA_No
FROM table07 G )
select * from H
UNION ALL
select * from G

How about:
SELECT Month_Name,
SUM(CASE WHEN G.FFS = '-' THEN 1 ELSE 0 END) AS FFS_Dash,
SUM(CASE WHEN G.FFS = 'Yes' THEN 1 ELSE 0 END) AS FFS_Yes,
SUM(CASE WHEN G.FFS = 'No' THEN 1 ELSE 0 END) AS FFS_No,
SUM(CASE WHEN G.DNA = '-' THEN 1 ELSE 0 END) AS DNA_Dash,
SUM(CASE WHEN G.DNA = 'Yes' THEN 1 ELSE 0 END) AS DNA_Yes,
SUM(CASE WHEN G.DNA = 'No' THEN 1 ELSE 0 END) AS DNA_No
FROM ((select 'July' as Month_Name, G.*
from table07 G
) union all
(select 'August', H.*
from table08 H
)
) gh
GROUP BY Month_Name;
However, having tables with the same structure is usually a sign of poor database design. You should have a single table with a column representing the month.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

How to make column and row as header in HiveQL - hive

Related

Trying to introduce a simple new condition to a JOIN query

Exclude records that have sum greater than 1

How to do mathematical comparision with SQL CASE statement

How do you create a conditional count across multiple fields?

Sum data for many different results for same field

Categories

Resources