Hive: pivot and count

I have a table that I want to pivot, counting how often each grade occurs per name and subject.
The example may not be ideal, but the output shows exactly what I want.
Example input:
name |chinese|math|english
tom |A |A |B
tom |B |A |C
peter|B |C |C
peter|A |B |C
Example output:
name |object |A|B|C
tom |chinese|1|1|0
tom |math |2|0|0
tom |english|0|1|1
peter|chinese|1|1|0
peter|math |0|1|1
peter|english|0|0|2

Use UNION ALL with aggregation.
Demo:
with your_table as (
select stack(4,
'tom' ,'A','A','B',
'tom' ,'B','A','C',
'peter','B','C','C',
'peter','A','B','C'
) as (name,chinese,math,english)
)
select name, 'chinese' as object,
count(case when chinese='A' then 1 end) as A,
count(case when chinese='B' then 1 end) as B,
count(case when chinese='C' then 1 end) as C
from your_table
group by name
UNION ALL
select name, 'math' as object,
count(case when math='A' then 1 end) as A,
count(case when math='B' then 1 end) as B,
count(case when math='C' then 1 end) as C
from your_table
group by name
UNION ALL
select name, 'english' as object,
count(case when english='A' then 1 end) as A,
count(case when english='B' then 1 end) as B,
count(case when english='C' then 1 end) as C
from your_table
group by name;
Result:
name object a b c
peter chinese 1 1 0
tom chinese 1 1 0
peter math 0 1 1
tom math 2 0 0
peter english 0 0 2
tom english 0 1 1
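As a variation on the query above, the three subject columns can be unpivoted first and then counted with a single GROUP BY. The sketch below is only an illustration under assumptions: it runs in SQLite through Python's sqlite3 module rather than in Hive, and the table name scores is hypothetical. The SQL itself should carry over to Hive, but that is untested here.

```python
import sqlite3

# Hypothetical table name "scores"; the rows are copied from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, chinese TEXT, math TEXT, english TEXT)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?, ?)",
    [("tom", "A", "A", "B"), ("tom", "B", "A", "C"),
     ("peter", "B", "C", "C"), ("peter", "A", "B", "C")],
)

# Step 1 (inner query): turn each subject column into (name, object, grade) rows.
# Step 2 (outer query): count each grade once per (name, object) group.
PIVOT_SQL = """
SELECT name, object,
       COUNT(CASE WHEN grade = 'A' THEN 1 END) AS a,
       COUNT(CASE WHEN grade = 'B' THEN 1 END) AS b,
       COUNT(CASE WHEN grade = 'C' THEN 1 END) AS c
FROM (
    SELECT name, 'chinese' AS object, chinese AS grade FROM scores
    UNION ALL
    SELECT name, 'math', math FROM scores
    UNION ALL
    SELECT name, 'english', english FROM scores
) AS unpivoted
GROUP BY name, object
ORDER BY name, object
"""
rows = conn.execute(PIVOT_SQL).fetchall()
for row in rows:
    print(row)
```

The counting expressions are written once instead of three times, which may be easier to maintain if more grades are added.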

Related

Distinct Conditional Counting to Avoid Overlap

Consider this table:
[Table1]
------------------------
| Person_ID | Yes | No |
|-----------|-----|----|
| 1 | 1 | 0 |
|-----------|-----|----|
| 1 | 1 | 0 |
|-----------|-----|----|
| 2 | 0 | 1 |
|-----------|-----|----|
| 2 | 0 | 1 |
|-----------|-----|----|
| 3 | 1 | 0 |
|-----------|-----|----|
| 3 | 1 | 0 |
|-----------|-----|----|
| 3 | 0 | 1 |
|-----------|-----|----|
| 3 | 1 | 0 |
------------------------
I need a distinct count on Person_ID to get the number of people that are marked Yes and No. However, if someone has a single instance of No, they should be counted as a No and not be included in the Yes count no matter how many Yes they have.
My first thought was to try something similar to:
select count(distinct (case when Yes = 1 then Person_ID else null end)) Yes_People
, count(distinct (case when No = 1 then Person_ID else null end)) No_People
from Table1
but this will result in 3 being counted in both the Yes and No counts.
My desired output would be:
--------------------------
| Yes_People | No_People |
|------------|-----------|
| 1 | 2 |
--------------------------
I'm hoping to avoid the performance hit from having to evaluate a subquery against each row but if it has to be the way to go I will accept that.
Aggregate first at the person level and then overall:
select sum(yes_only) as yes_only,
sum(1 - yes_only) as no
from (select person_id,
(case when max(yes) = min(yes) and max(yes) = 1
then 1 else 0 -- else 0 keeps (1 - yes_only) from being NULL
end) as yes_only
from t
group by person_id
) t
You can first group the rows by person.
Then the CASE for the Yes people can additionally require that the person has no No.
SELECT
COUNT(CASE WHEN No = 0 AND Yes = 1 THEN Person_ID END) AS Yes_People,
COUNT(CASE WHEN No = 1 THEN Person_ID END) AS No_People
FROM
(
select Person_ID
, MAX(Yes) as Yes
, MAX(No) as No
FROM Table1
GROUP BY Person_ID
) q
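The two-step idea (collapse to one row per person, then count) can be checked against the sample data. This is a minimal sketch, assuming SQLite through Python's sqlite3 module; the table name answers and the column names yes_flag and no_flag are hypothetical stand-ins for Table1, Yes and No, chosen to avoid keyword clashes.

```python
import sqlite3

# Hypothetical names: "answers", "yes_flag", "no_flag" stand in for Table1, Yes, No.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE answers (person_id INTEGER, yes_flag INTEGER, no_flag INTEGER)")
conn.executemany(
    "INSERT INTO answers VALUES (?, ?, ?)",
    [(1, 1, 0), (1, 1, 0), (2, 0, 1), (2, 0, 1),
     (3, 1, 0), (3, 1, 0), (3, 0, 1), (3, 1, 0)],
)

# Inner query: one row per person with "ever yes" / "ever no" flags.
# Outer query: a person counts as Yes only if they never had a No.
COUNT_SQL = """
SELECT COUNT(CASE WHEN any_no = 0 AND any_yes = 1 THEN 1 END) AS yes_people,
       COUNT(CASE WHEN any_no = 1 THEN 1 END) AS no_people
FROM (
    SELECT person_id, MAX(yes_flag) AS any_yes, MAX(no_flag) AS any_no
    FROM answers
    GROUP BY person_id
) AS per_person
"""
yes_people, no_people = conn.execute(COUNT_SQL).fetchone()
print(yes_people, no_people)
```

Person 3 has one No among several Yes rows, so the inner MAX marks them as a No and they drop out of the Yes count.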
You could use a window function to rank the rows for a single person_id so that a 'No' takes priority over a 'Yes', but that will require a subquery:
select count(case when yes = 1 then 1 end) as yes_count,
count(case when no = 1 then 1 end) as no_count
from (
select person_id, yes, no,
row_number() over (partition by person_id order by no desc, yes desc) as rn
from table1
) ranked
where rn = 1
The inner subquery plus the where filter will get you a single row per person_id, giving priority to the 'no' records.
This of course assumes yes/no are mutually exclusive, and if that's true, you should probably change the model to a single field.
I think you need to pre-check every person with a window function:
with t as (select 1 p_id, 1 yes, 0 no from dual
union all select 1 p_id, 1 yes, 0 no from dual
union all select 2 p_id, 0 yes, 1 no from dual
union all select 2 p_id, 0 yes, 1 no from dual
union all select 3 p_id, 1 yes, 0 no from dual
union all select 3 p_id, 0 yes, 1 no from dual
union all select 3 p_id, 1 yes, 0 no from dual)
, chk as (
select max(no) over (partition by p_id) n
, max(yes) over (partition by p_id) y
, p_id
from t)
-- select * from chk;
select count(distinct decode(y-n,1,p_id,null )) yes_people
, count(distinct decode(n,1,p_id,null )) no_people
from chk
group by 1;
You can use conditional aggregation as follows:
with table1 as (select 1 PERSON_ID, 1 yes, 0 no from dual
union all select 1 PERSON_ID, 1 yes, 0 no from dual
union all select 2 PERSON_ID, 0 yes, 1 no from dual
union all select 2 PERSON_ID, 0 yes, 1 no from dual
union all select 3 PERSON_ID, 1 yes, 0 no from dual
union all select 3 PERSON_ID, 0 yes, 1 no from dual
union all select 3 PERSON_ID, 1 yes, 0 no from dual)
SELECT
SUM(CASE WHEN NOS = 0 AND YES > 0 THEN 1 END) YES_PEOPLE,
SUM(CASE WHEN NOS > 0 THEN 1 END) NO_PEOPLE
FROM
(
SELECT
SUM(NO) NOS,
PERSON_ID,
SUM(YES) YES
FROM TABLE1
GROUP BY PERSON_ID
);
Result:
YES_PEOPLE NO_PEOPLE
---------- ----------
1 2

How to select rows by condition when several rows share the same column value

I want to select data based on a condition.
I have this table:
Id|Name |Age
1 |David |1
2 |Aic |2
3 |Owen |2
4 |Aic |3
5 |Phuc |3
6 |Aic |4
7 |Ronaldo |4
8 |Ronaldo |5
9 |Ronaldo |6
How can I write a query so that, when two records have the same age (for example 2), only the record whose name is "Aic" is selected, while records whose age is not shared are all returned? The result should look like this:
Id|Name |Age
1 |David |1
2 |Aic |2
4 |Aic |3
6 |Aic |4
8 |Ronaldo |5
9 |Ronaldo |6
You can try using row_number():
select id,name,age from
(
select id,name,age,row_number() over(partition by age order by name) as rn
from tablename
)A where rn=1
OUTPUT:
id name age
1 David 1
2 Aic 2
4 Aic 3
6 Aic 4
8 Ronaldo 5
9 Ronaldo 6
You can use row_number(); here test1 stands for your table name:
select id,name,age from ( select id,name,age,row_number()
over(partition by age order by name) as rownumber from test1 )A where rownumber=1
One simple method is:
select t.*
from t
where t.name = 'Aic'
union all
select t.*
from t
where t.name <> 'Aic' and
not exists (select 1 from t t2 where t2.age = t.age and t2.name = 'Aic');
This selects all "Aic"s and then all other ages with no Aic. This can actually be combined into a single query:
select t.*
from t
where t.name = 'Aic' or
(t.name <> 'Aic' and
not exists (select 1 from t t2 where t2.age = t.age and t2.name = 'Aic')
);
If you want to use window functions, I might suggest:
select t.*
from (select t.*,
sum(case when name = 'Aic' then 1 else 0 end) over (partition by age) as num_aic
from t
) t
where name = 'Aic' or num_aic = 0
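The single-query NOT EXISTS form above can be checked against the sample data. This is a minimal sketch, assuming SQLite through Python's sqlite3 module; the table name people is hypothetical, and the redundant name <> 'Aic' test is dropped because the OR already covers it.

```python
import sqlite3

# Hypothetical table name "people"; the rows are copied from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [(1, "David", 1), (2, "Aic", 2), (3, "Owen", 2), (4, "Aic", 3), (5, "Phuc", 3),
     (6, "Aic", 4), (7, "Ronaldo", 4), (8, "Ronaldo", 5), (9, "Ronaldo", 6)],
)

# Keep every 'Aic' row, plus any row whose age has no 'Aic' row at all.
SELECT_SQL = """
SELECT p.id, p.name, p.age
FROM people AS p
WHERE p.name = 'Aic'
   OR NOT EXISTS (SELECT 1
                  FROM people AS p2
                  WHERE p2.age = p.age AND p2.name = 'Aic')
ORDER BY p.id
"""
selected = conn.execute(SELECT_SQL).fetchall()
for row in selected:
    print(row)
```

Unlike the row_number() answers that order by name, this form does not rely on 'Aic' happening to sort first alphabetically. It also returns every row of an age group that contains no 'Aic', rather than just one.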
Since you changed your question, use a subquery:
select * from table where age in ( select age from table where name ='Aic')
Since you changed your question again, it would now be:
with cte as (
select *,
row_number() over(partition by age order by name) as rn
from table
) select * from cte where rn=1

SQL query sum of total corresponding rows

I have the two tables below. The caseId from the first table is referenced in the second table, which lists accidents. I am trying to get the total number of accidents of each rating per case type. The sample data and the expected result are shown below.
Table case:
caseId CaseType
1 AB
2 AB
3 AB
4 CD
5 CD
6 DE
Table CaseAccidents:
AccidentId caseID AccidentRating
1 1 High
2 1 High
3 1 Medium
4 1 LOW
5 2 High
6 2 Medium
7 2 LOW
8 5 High
9 5 High
10 5 Medium
11 5 LOW
Result should look like:
CaseType TotalHIghrating TotalMediumRating TotalLOWRating
AB 3 2 2
CD 2 1 1
DE 0 0 0
To get the total for each rating, you can use SUM(CASE WHEN ...), which adds 1 for every record that matches the rating.
You also want every distinct CaseType to appear, which a RIGHT JOIN provides: it keeps all records of the case table, even those without accidents.
-- "case" is a reserved word, so the table name is bracket-quoted here (SQL Server syntax assumed)
select c.CaseType,
sum(case when ca.AccidentRating = 'High' then 1 else 0 end) as TotalHighRating,
sum(case when ca.AccidentRating = 'Medium' then 1 else 0 end) as TotalMediumRating,
sum(case when ca.AccidentRating = 'LOW' then 1 else 0 end) as TotalLowRating
from caseAccidents ca
right join [case] c on c.caseId = ca.caseID
group by c.CaseType;
+----------+-----------------+-------------------+----------------+
| CaseType | TotalHighRating | TotalMediumRating | TotalLowRating |
+----------+-----------------+-------------------+----------------+
| AB | 3 | 2 | 2 |
+----------+-----------------+-------------------+----------------+
| CD | 2 | 1 | 1 |
+----------+-----------------+-------------------+----------------+
| DE | 0 | 0 | 0 |
+----------+-----------------+-------------------+----------------+
Check it: http://rextester.com/MCGJA9193
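The effect of the outer join can be seen by running the same aggregation with and without it. This is a minimal sketch, assuming SQLite through Python's sqlite3 module; the table name cases is a hypothetical replacement for case (a reserved word), the helper totals is made up for illustration, and a LEFT JOIN from the case table is used as the mirror image of the RIGHT JOIN.

```python
import sqlite3

# Hypothetical table name "cases" (instead of the reserved word "case").
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cases (caseId INTEGER, CaseType TEXT);
CREATE TABLE CaseAccidents (AccidentId INTEGER, caseID INTEGER, AccidentRating TEXT);
""")
conn.executemany(
    "INSERT INTO cases VALUES (?, ?)",
    [(1, "AB"), (2, "AB"), (3, "AB"), (4, "CD"), (5, "CD"), (6, "DE")],
)
conn.executemany(
    "INSERT INTO CaseAccidents VALUES (?, ?, ?)",
    [(1, 1, "High"), (2, 1, "High"), (3, 1, "Medium"), (4, 1, "LOW"),
     (5, 2, "High"), (6, 2, "Medium"), (7, 2, "LOW"),
     (8, 5, "High"), (9, 5, "High"), (10, 5, "Medium"), (11, 5, "LOW")],
)


def totals(join_kind):
    """Run the conditional sums with the given join keyword ('LEFT' or 'INNER')."""
    sql = f"""
    SELECT c.CaseType,
           SUM(CASE WHEN ca.AccidentRating = 'High' THEN 1 ELSE 0 END),
           SUM(CASE WHEN ca.AccidentRating = 'Medium' THEN 1 ELSE 0 END),
           SUM(CASE WHEN ca.AccidentRating = 'LOW' THEN 1 ELSE 0 END)
    FROM cases AS c
    {join_kind} JOIN CaseAccidents AS ca ON ca.caseID = c.caseId
    GROUP BY c.CaseType
    ORDER BY c.CaseType
    """
    return conn.execute(sql).fetchall()


outer_rows = totals("LEFT")   # keeps case types that have no accidents
inner_rows = totals("INNER")  # drops them
print(outer_rows)
print(inner_rows)
```

With the inner join the DE row disappears, because case 6 has no matching accident rows; the outer join keeps it with zeros in every column.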
Have you used CASE in a SELECT clause before?
select C.CaseType,
sum(case when CA.AccidentRating = 'High' then 1 else 0 end) as TotalHighRating
from [Case] C join CaseAccidents CA on C.CaseId = CA.CaseId
group by C.CaseType
Please see the following: sample DDL and data for the tables, followed by the query.
create table #case(caseid int,casetype varchar(5))
insert into #case (caseid,casetype)
select 1,'AB' union all
select 2,'AB' union all
select 3,'AB' union all
select 4,'CD' union all
select 5,'CD' union all
select 6,'DE'
create table #CaseAccidents(AccidentId int, CaseId int,AccidentRating varchar(10))
insert into #CaseAccidents(AccidentId, CaseId, AccidentRating)
select 1,1,'High' union all
select 2,1,'High' union all
select 3,1,'Medium' union all
select 4,1,'Low' union all
select 5,2,'High' union all
select 6,2,'Medium' union all
select 7,2,'Low' union all
select 8,5,'High' union all
select 9,5,'High' union all
select 10,5,'Medium' union all
select 11,5,'Low'
My query:
select c.casetype,
sum(case when ca.AccidentRating='High' then 1 else 0 end) as TotalHighRating,
sum(case when ca.AccidentRating='Medium' then 1 else 0 end) as TotalMediumRating,
sum(case when ca.AccidentRating='Low' then 1 else 0 end) as TotalLowRating
from #case c
Left join #CaseAccidents ca
on c.Caseid=ca.Caseid
group by c.casetype
Another approach using Pivot operator
SELECT casetype,
[High],
[Medium],
[Low]
FROM (SELECT c.casetype,
AccidentRating
FROM [case] c
LEFT JOIN CaseAccidents ca
ON ca.CaseId = c.caseid)a
PIVOT (Count(AccidentRating)
FOR AccidentRating IN ([High],
[Medium],
[Low]) ) p
Try this code:
select casetype,
sum(case when ca.AccidentRating='High' then 1 else 0 end ) as TotalHIghrating,
sum(case when ca.AccidentRating='Medium' then 1 else 0 end ) as TotalMediumRating ,
sum(case when ca.AccidentRating='Low' then 1 else 0 end ) as TotalLOWRating
from #case c
left join #CaseAccidents ca on c.caseid=ca.CaseId
group by casetype

sql to sum rows of booleans into rows

I'm trying to work out if it's possible to do the following transformation in SQL:
+--------+--------+--------+
|POLICY_1|POLICY_2|POLICY_3|
+--------+--------+--------+
|T |T |F |
+--------+--------+--------+
|F |T |F |
+--------+--------+--------+
|T |T |T |
+--------+--------+--------+
Is it possible to query this table and end up with a result set that looks like:
+------+-----+
|POLICY|COUNT|
+------+-----+
|1 |2 |
+------+-----+
|2 |3 |
+------+-----+
|3 |1 |
+------+-----+
I'm wondering in general SQL terms, but in case it matters, I'm using Postgres (9.2).
Union All, Aggregate and CASE Version
select 1 as POLICY, SUM(case when POLICY_1 = 'T' THEN 1 ELSE 0 end) as COUNT
from POLICIES
union all
select 2 as POLICY, SUM(case when POLICY_2 = 'T' THEN 1 ELSE 0 end) as COUNT
from POLICIES
union all
select 3 as POLICY, SUM(case when POLICY_3 = 'T' THEN 1 ELSE 0 end) as COUNT
from POLICIES
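Because the UNION ALL version repeats one SELECT per policy column, the statement can be generated when there are many columns. This is a minimal sketch, assuming SQLite through Python's sqlite3 module; the helper policy_counts_sql and the output column name cnt are hypothetical, and the POLICY_n naming is taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE POLICIES (POLICY_1 TEXT, POLICY_2 TEXT, POLICY_3 TEXT)")
conn.executemany(
    "INSERT INTO POLICIES VALUES (?, ?, ?)",
    [("T", "T", "F"), ("F", "T", "F"), ("T", "T", "T")],
)


def policy_counts_sql(n_policies):
    """Build one SELECT per POLICY_n column and glue them with UNION ALL."""
    parts = [
        f"SELECT {i} AS policy, "
        f"SUM(CASE WHEN POLICY_{i} = 'T' THEN 1 ELSE 0 END) AS cnt "
        f"FROM POLICIES"
        for i in range(1, n_policies + 1)
    ]
    return "\nUNION ALL\n".join(parts) + "\nORDER BY policy"


policy_counts = conn.execute(policy_counts_sql(3)).fetchall()
print(policy_counts)
```

Only the integer n_policies is interpolated into the SQL text, so the string building here does not touch user-supplied values.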
Unpivot version (Microsoft T-SQL specific):
If you want to turn the single row of totals into one row per policy, you can use the PIVOT/UNPIVOT functionality.
SELECT ROW_NUMBER() OVER (ORDER BY PolicyName) AS Row, *
FROM ( select SUM(CASE WHEN Policy_1 = 'T' THEN 1 ELSE 0 END) as Policy_1,
SUM(CASE WHEN Policy_2 = 'T' THEN 1 ELSE 0 END) as Policy_2,
SUM(CASE WHEN Policy_3 = 'T' THEN 1 ELSE 0 END) as Policy_3
from POLICIES
)p
UNPIVOT ( T_Count FOR PolicyName in ([Policy_1], [Policy_2], [Policy_3]))unpvt
Unnest version (PostgreSQL specific). All credit goes to Francis, the original poster; I am just posting it here.
select
UNNEST((select array_agg(generate_series)
from generate_series(1,3))) as policy_name,
UNNEST(array[
sum(case when policy_1 = 't' then 1 else 0 end),
sum(case when policy_2 = 't' then 1 else 0 end),
sum(case when policy_3 = 't' then 1 else 0 end)
]) as count from POLICY
As suggested, here is the UNNEST version for PostgreSQL:
select
UNNEST((select array_agg(generate_series) from generate_series(1,3))) as policy_name,
UNNEST(array[
sum(case when policy_1 = 't' then 1 else 0 end),
sum(case when policy_2 = 't' then 1 else 0 end),
sum(case when policy_3 = 't' then 1 else 0 end)
]) as count from POLICY

How to group sums by weekday in MySQL?

I have a table like this:
id | timestamp | type
-----------------------
1 2010-11-20 A
2 2010-11-20 A
3 2010-11-20 B
4 2010-11-21 A
5 2010-11-21 C
6 2010-11-27 B
and I need to count the rows for each type, grouped by weekday; like this:
weekday | A | B | C
--------------------------
5 2 2 0 -- the B column equals 2 because nov 20 and nov 27 are saturday
6 1 0 1
What would be the simplest solution for this?
I don't mind using views, variables, subqueries, etc.
Use:
SELECT WEEKDAY(t.timestamp) AS weekday,
SUM(CASE WHEN t.type = 'A' THEN 1 ELSE 0 END) AS a,
SUM(CASE WHEN t.type = 'B' THEN 1 ELSE 0 END) AS b,
SUM(CASE WHEN t.type = 'C' THEN 1 ELSE 0 END) AS c
FROM YOUR_TABLE t
GROUP BY WEEKDAY(t.timestamp)
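MySQL's WEEKDAY() numbers days from 0 (Monday) to 6 (Sunday), which is why the Saturday rows land in group 5. The sketch below reproduces the query outside MySQL as a check, assuming SQLite through Python's sqlite3 module; the table name events and column name ts are hypothetical, and since SQLite has no WEEKDAY(), its Sunday-based strftime('%w') value is shifted to the Monday-based numbering.

```python
import sqlite3

# Hypothetical table/column names "events" and "ts"; rows copied from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, ts TEXT, type TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(1, "2010-11-20", "A"), (2, "2010-11-20", "A"), (3, "2010-11-20", "B"),
     (4, "2010-11-21", "A"), (5, "2010-11-21", "C"), (6, "2010-11-27", "B")],
)

# strftime('%w') yields 0 = Sunday .. 6 = Saturday; (w + 6) % 7 converts that
# to the 0 = Monday .. 6 = Sunday numbering that MySQL's WEEKDAY() uses.
WEEKDAY_SQL = """
SELECT (CAST(strftime('%w', ts) AS INTEGER) + 6) % 7 AS weekday,
       SUM(CASE WHEN type = 'A' THEN 1 ELSE 0 END) AS a,
       SUM(CASE WHEN type = 'B' THEN 1 ELSE 0 END) AS b,
       SUM(CASE WHEN type = 'C' THEN 1 ELSE 0 END) AS c
FROM events
GROUP BY weekday
ORDER BY weekday
"""
by_weekday = conn.execute(WEEKDAY_SQL).fetchall()
for row in by_weekday:
    print(row)
```

Weekdays with no rows at all do not appear in the output; if all seven days are needed, the result would have to be joined against a list of weekday numbers.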