How to separate column values into different columns using Hive - hive

For input:
name year run
1. a 2008 4
2. a 2009 3
3. a 2008 4
4. b 2009 8
5. b 2008 5
Output in hive:
name 2008 2009
1. a 8 3
2. b 5 8

For fixed years:
select name,
max(case when year=2008 then run end) as year_2008,
max(case when year=2009 then run end) as year_2009
-- ... and so on for the remaining years
from my_table
group by name;
It is not possible in Hive to generate such columns dynamically, but you can select distinct years first and generate this SQL using shell.
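The "select distinct years first, then generate the SQL" idea could be sketched as follows. This is only an illustration under some assumptions: it uses Python with an in-memory SQLite database instead of a shell script and Hive, the helper name `build_pivot_sql` is made up for this example, and it uses `sum` rather than `max` because the expected output in the question adds up the runs.

```python
import sqlite3

def build_pivot_sql(years, table="my_table"):
    """Build a pivot query with one sum(case ...) column per year (hypothetical helper)."""
    cols = ",\n".join(
        f"       sum(case when year = {int(y)} then run else 0 end) as year_{int(y)}"
        for y in years
    )
    return f"select name,\n{cols}\nfrom {table}\ngroup by name\norder by name"

# In-memory stand-in for the Hive table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("create table my_table (name text, year integer, run integer)")
conn.executemany(
    "insert into my_table values (?, ?, ?)",
    [("a", 2008, 4), ("a", 2009, 3), ("a", 2008, 4), ("b", 2009, 8), ("b", 2008, 5)],
)

# Step 1: discover the years that actually occur in the data.
years = [row[0] for row in conn.execute("select distinct year from my_table order by year")]

# Step 2: generate the pivot query for those years and run it.
pivot_sql = build_pivot_sql(years)
pivot_rows = conn.execute(pivot_sql).fetchall()
print(pivot_sql)
print(pivot_rows)
```

With Hive, the same two steps would apply: run the distinct-years query, build the statement text, and submit the generated SQL as a second query.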

As I understand it, you need the sum of runs per year pivoted into year columns.
You need the sum function, not max:
select name,
sum(case when year=2008 then run else 0 end) as `2008_run`,
sum(case when year=2009 then run else 0 end) as `2009_run`
from my_table t1
group by name;
To find the top 5 run scorers in each year:
with table1 as
(
select name, year, sum(run) as RunsPerYear from my_table group by name, year
),
table2 as
(
select name, year, RunsPerYear, dense_rank() over (partition by year order by RunsPerYear desc) as rnk from table1
)
select name, year, RunsPerYear from table2 where rnk<=5;

Related

SQL: Add number of unique values in a group

I have a table such as:
date id value
2020/4/4 1 a
2020/4/4 1 a
2020/4/4 1 b
2020/4/4 2 t
2020/4/4 2 u
2020/5/4 3 u
I want to find out how many IDs have more than one unique value at a particular date.
So this is what I should get for the table from above:
2020/4/4: 1 (=> only ID=1 has more than one unique value (a+b))
2020/5/4: 0
I tried to get it with:
SELECT date, SUM(CASE WHEN COUNT(DISTINCT value)>1 THEN 1 ELSE 0 END)
FROM table
GROUP BY date, id
But it did not work. How do I do it right?
Some databases will let you count "tuples", allowing this...
SELECT
date,
CASE WHEN COUNT(*) > COUNT(DISTINCT (id, value)) THEN 1 ELSE 0 END AS has_duplicate_values
FROM
table
GROUP BY
date
Otherwise your style of approach works, but you need to aggregate twice using sub-queries.
SELECT date, SUM(has_duplicate_values) AS num_ids_with_duplicates
FROM
(
SELECT date, id, CASE WHEN COUNT(*) > COUNT(DISTINCT value) THEN 1 ELSE 0 END AS has_duplicate_values
FROM table
GROUP BY date, id
)
AS date_id_aggregate
GROUP BY date
Modify your query as follows:
SELECT date, CASE WHEN COUNT(DISTINCT value)>1 THEN 1 ELSE 0 END FROM `table` GROUP BY date
I want to find out how many IDs have more than one unique value at a particular date.
You can aggregate twice:
SELECT date,
SUM(CASE WHEN num > num_distinct_values THEN 1 ELSE 0 END) as num_ids_with_duplicates
FROM (SELECT date, id,
COUNT(DISTINCT value) as num_distinct_values,
COUNT(*) as num
FROM table
GROUP BY date, id
) di
GROUP BY date;
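As a quick check of the "aggregate twice" query, here is a small sketch that runs it against the sample rows from the question. It assumes an in-memory SQLite database and a made-up table name `readings` (the question just calls it `table`); an `order by` is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table readings (date text, id integer, value text)")
conn.executemany("insert into readings values (?, ?, ?)", [
    ("2020/4/4", 1, "a"), ("2020/4/4", 1, "a"), ("2020/4/4", 1, "b"),
    ("2020/4/4", 2, "t"), ("2020/4/4", 2, "u"), ("2020/5/4", 3, "u"),
])

# Inner query: one row per (date, id) with total and distinct value counts.
# Outer query: per date, count the ids whose total exceeds the distinct count.
query = """
select date,
       sum(case when num > num_distinct_values then 1 else 0 end) as num_ids_with_duplicates
from (select date, id,
             count(distinct value) as num_distinct_values,
             count(*) as num
      from readings
      group by date, id) di
group by date
order by date
"""
duplicate_counts = conn.execute(query).fetchall()
print(duplicate_counts)
```

Only ID 1 on 2020/4/4 has a repeated value (`a` twice), so that date gets a count of 1 and the other date gets 0.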

How to display row data as column

I want to display row values as column values, and also display the final total at the end of the table.
To do that I'm using the data set below.
I want to show this data as columns.
I used this SQL query to do that, but I don't know how to get the Hours Total column:
select *
from
(select EMP_NO,SUM(Hours) total
from Employee_Attendence
group by EMP_NO)
pivot
(sum(total)
for WAGE_Type in ('Absence', 'Normal'))
Final output should display as below.
select EMP_NO, Absence, Normal, nvl(Absence, 0) + nvl(Normal, 0) as Total
from
(select EMP_NO, WAGE_Type, Hours
from Employee_Attendence)
pivot
(sum(Hours)
for WAGE_Type in ('Absence' as Absence, 'Normal' as Normal))
A simple conditional aggregation should do the trick
Example
Select Emp_ID
,Absence = sum(case when Wage_Type ='Absence' then Hours else 0 end)
,Normal = sum(case when Wage_Type ='Normal' then Hours else 0 end)
,Total = sum(Hours)
From YourTable
Group By Emp_ID
Results
Emp_ID Absence Normal Total
4000 8 32 40
EDIT - If you'd rather a PIVOT
Select Emp_ID
,Absence
,Normal
,Total = Absence + Normal
From YourTable
Pivot (sum( Hours ) for Wage_Type in ([Absence],[Normal] ) ) pvt
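For readers who want to try the conditional-aggregation variant without a SQL Server instance, here is a minimal sketch using an in-memory SQLite database. The table name `attendance` and the sample rows are hypothetical; the rows were chosen only so that the totals match the Results shown in the answer (8, 32, 40). SQLite does not accept the `alias = expression` form, so `as` is used instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table attendance (emp_id integer, wage_type text, hours integer)")
# Hypothetical rows: four 8-hour 'Normal' entries and one 8-hour 'Absence' entry.
conn.executemany("insert into attendance values (?, ?, ?)", [
    (4000, "Normal", 8), (4000, "Normal", 8), (4000, "Normal", 8),
    (4000, "Normal", 8), (4000, "Absence", 8),
])

# One sum(case ...) per wage type, plus an unconditional sum for the total.
query = """
select emp_id,
       sum(case when wage_type = 'Absence' then hours else 0 end) as absence,
       sum(case when wage_type = 'Normal'  then hours else 0 end) as normal,
       sum(hours) as total
from attendance
group by emp_id
"""
attendance_rows = conn.execute(query).fetchall()
print(attendance_rows)
```

Because the total is computed with its own `sum(hours)`, it also covers wage types that are not listed as explicit columns.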

New table with row counts

I have 5 different tables stored in Hive, and I would like to know how to create a new table, called total_counts which has 5 columns each with the total row count from the individual tables. So, something like
My data is road flights for each year from 2015 to 2019, so I would like a table which just gives me the total number of accidents in each year.
I have tried variations of the following:
create table total_counts
as select COUNT(*)
from flights_2014 as "2014_count", flights_2015 as "2015_count;
I can get the counts for an individual year, but I can't seem to give the columns a heading, nor can I figure out how to do it for all my tables.
Thanks.
Calculate the counts in sub-queries and cross join them if you want to store the data in columns:
CREATE TABLE total_counts AS
SELECT 2015_count.cnt as 2015_count, 2016_count.cnt as 2016_count, ...
FROM (SELECT COUNT(*) cnt FROM flights_2015) AS 2015_count
CROSS JOIN
(SELECT COUNT(*) cnt FROM flights_2016) AS 2016_count
...
Or the same using UNION ALL + aggregation:
CREATE TABLE total_counts AS
SELECT max(case when yr=2015 then cnt else 0 end) 2015_count,
max(case when yr=2016 then cnt else 0 end) 2016_count,
...
FROM (
SELECT COUNT(*) cnt, 2015 yr FROM flights_2015
UNION ALL
SELECT COUNT(*) cnt, 2016 yr FROM flights_2016
...
) u
CREATE TABLE total_counts AS
SELECT
(SELECT COUNT(*) FROM flights_2015) AS 2015_count,
(SELECT COUNT(*) FROM flights_2016) AS 2016_count;
etc.
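The cross-join approach can be tried out with the sketch below. It is an illustration only: it uses an in-memory SQLite database rather than Hive, the `flights_2015` and `flights_2016` tables are filled with made-up rows, and the column aliases start with a letter (`count_2015`) to avoid any quoting questions around identifiers that begin with a digit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical per-year tables with a single made-up column.
conn.execute("create table flights_2015 (flight_id integer)")
conn.execute("create table flights_2016 (flight_id integer)")
conn.executemany("insert into flights_2015 values (?)", [(1,), (2,), (3,)])
conn.executemany("insert into flights_2016 values (?)", [(4,), (5,)])

# Each sub-query returns exactly one row, so the cross join also yields one row.
conn.execute("""
create table total_counts as
select c15.cnt as count_2015, c16.cnt as count_2016
from (select count(*) as cnt from flights_2015) c15
cross join
     (select count(*) as cnt from flights_2016) c16
""")
total_counts_row = conn.execute("select count_2015, count_2016 from total_counts").fetchone()
print(total_counts_row)
```

Adding another year means adding one more single-row sub-query to the join and one more column to the select list.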

Pull records with unique flags in an efficient way

From the table below, I want to write a query that extracts the records where the flag first occurs. As an example, from the table below, I would want to pull the Nov 8 record, Dec 6 record, and Jan 10 record into a separate table. Any thoughts on how to best approach this? I'm not tied to having the flag column being a count - ideally it could be binary, but I'm not sure... the flag column is computed and not part of the raw data.
Date Location KPI Flag
11/8/2017 A 5 1
11/15/2017 A 5 1
11/22/2017 A 5 1
11/29/2017 A 5 1
12/6/2017 A 10 2
12/13/2017 A 10 2
12/20/2017 A 10 2
12/27/2017 A 10 2
1/3/2018 A 10 2
1/10/2018 A 15 3
1/17/2018 A 15 3
1/24/2018 A 15 3
Often the fastest method is a correlated subquery:
select t.*
from t
where t.date = (select min(t2.date)
from t t2
where t2.location = t.location and
t2.kpi = t.kpi
);
In particular, this can make use of an index on (location, kpi, date).
That said, if you want the rows where kpi changes, then you might want lag():
select t.*
from (select t.*,
lag(kpi) over (partition by location order by date) as prev_kpi
from t
) t
where prev_kpi is null or prev_kpi <> kpi;
In particular, this will allow kpi values to repeat at different times -- and you will get one for each group of adjacent values.
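The `lag()` approach can be checked against the sample data with a short sketch like this one. It assumes an in-memory SQLite database with window-function support (SQLite 3.25 or later), a made-up table name `kpi_log`, and dates rewritten in ISO format so that ordering by the text column matches chronological order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table kpi_log (date text, location text, kpi integer)")
conn.executemany("insert into kpi_log values (?, ?, ?)", [
    ("2017-11-08", "A", 5), ("2017-11-15", "A", 5), ("2017-11-22", "A", 5),
    ("2017-11-29", "A", 5), ("2017-12-06", "A", 10), ("2017-12-13", "A", 10),
    ("2017-12-20", "A", 10), ("2017-12-27", "A", 10), ("2018-01-03", "A", 10),
    ("2018-01-10", "A", 15), ("2018-01-17", "A", 15), ("2018-01-24", "A", 15),
])

# Keep a row when it is the first for its location or its kpi differs
# from the kpi of the previous row at the same location.
query = """
select date, location, kpi
from (select k.*,
             lag(kpi) over (partition by location order by date) as prev_kpi
      from kpi_log k) t
where prev_kpi is null or prev_kpi <> kpi
order by date
"""
first_rows = conn.execute(query).fetchall()
print(first_rows)
```

The query never reads a flag column, so the computed flag from the question is not needed to find the rows where the KPI changes.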
You can use PARTITION BY along with ROW_NUMBER().
The query below works with your data:
SELECT [Date], [Flag] FROM (
SELECT [Date], [Flag], ROW_NUMBER() OVER
( PARTITION BY [Flag]
ORDER BY [Date]) row_num
FROM #test) t
WHERE t.row_num = 1
As I understand it, you want to extract the oldest date from each flag category.
select * from (
select
Date,
Location,
KPI,
Flag,
row_number() over(partition by Flag order by Date asc) as RN
from
Your_Table
) t
where t.RN = 1
This solution uses a partition to get the expected data.

SUM of grouped COUNT in SQL Query

I have a table with 2 fields:
ID Name
-- -------
1 Alpha
2 Beta
3 Beta
4 Beta
5 Charlie
6 Charlie
I want to group them by name, with 'count', and a row 'SUM'
Name Count
------- -----
Alpha 1
Beta 3
Charlie 2
SUM 6
How would I write a query to add SUM row below the table?
SELECT name, COUNT(name) AS count
FROM table
GROUP BY name
UNION ALL
SELECT 'SUM' name, COUNT(name)
FROM table
OUTPUT:
name count
-------------------------------------------------- -----------
alpha 1
beta 3
Charlie 2
SUM 6
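One caveat with the `UNION ALL` approach is that, without an `ORDER BY`, the database is free to return the total row anywhere. The sketch below shows one way to pin the `SUM` row to the bottom. It assumes an in-memory SQLite database, a made-up table name `names`, and a helper column `is_total` that exists only for sorting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table names (id integer, name text)")
conn.executemany("insert into names values (?, ?)", [
    (1, "Alpha"), (2, "Beta"), (3, "Beta"), (4, "Beta"), (5, "Charlie"), (6, "Charlie"),
])

# is_total is 0 for the per-name rows and 1 for the grand total,
# so sorting on it first keeps the total as the last row.
query = """
select name, cnt
from (select name, count(name) as cnt, 0 as is_total from names group by name
      union all
      select 'SUM', count(name), 1 from names) u
order by is_total, name
"""
counts_with_total = conn.execute(query).fetchall()
print(counts_with_total)
```

The sort column is left out of the outer select list, so the result has the same two columns as the output shown in the answer.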
SELECT name, COUNT(name) AS count, SUM(COUNT(name)) OVER() AS total_count
FROM Table GROUP BY name
You have not specified which RDBMS you are using, but have a look at this SQL Fiddle demo:
SELECT Name, COUNT(1) as Cnt
FROM Table1
GROUP BY Name
UNION ALL
SELECT 'SUM' Name, COUNT(1)
FROM Table1
That said, I would recommend that the total be added by your presentation layer, and not by the database.
This is more of a SQL Server version, summarizing the data using ROLLUP (SQL Fiddle demo):
SELECT CASE WHEN (GROUPING(NAME) = 1) THEN 'SUM'
ELSE ISNULL(NAME, 'UNKNOWN')
END Name,
COUNT(1) as Cnt
FROM Table1
GROUP BY NAME
WITH ROLLUP
Try this:
SELECT ISNULL(Name,'SUM'), count(*) as Count
FROM table_name
Group By Name
WITH ROLLUP
All of the solutions here are great, but they cannot necessarily be implemented on old MySQL servers (at least in my case). So you can use sub-queries instead, which I think is less complicated.
select sum(t1.cnt) from
(SELECT column, COUNT(column) as cnt
FROM
table
GROUP BY
column
HAVING
COUNT(column) > 1) as t1 ;
Please run the query below:
Select sum(Count)
from (select Name,
count(Name) as Count
from YourTable
group by Name) t; -- 6
The way I interpreted this question is needing the subtotal value of each group of answers. Subtotaling turns out to be very easy, using PARTITION:
SUM(COUNT(0)) OVER (PARTITION BY [Grouping]) AS [MY_TOTAL]
This is what my full SQL call looks like:
SELECT MAX(GroupName) [name], MAX(AUX2)[type],
COUNT(0) [count], SUM(COUNT(0)) OVER(PARTITION BY GroupId) AS [total]
FROM [MyView]
WHERE Active=1 AND Type='APP' AND Completed=1
AND [Date] BETWEEN '01/01/2014' AND GETDATE()
AND Id = '5b9xxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' AND GroupId IS NOT NULL
GROUP BY AUX2, GroupId
The data returned from this looks like:
name type count total
Training Group 2 Cancelation 1 52
Training Group 2 Completed 41 52
Training Group 2 No Show 6 52
Training Group 2 Rescheduled 4 52
Training Group 3 NULL 4 10535
Training Group 3 Cancelation 857 10535
Training Group 3 Completed 7923 10535
Training Group 3 No Show 292 10535
Training Group 3 Rescheduled 1459 10535
Training Group 4 Cancelation 2 27
Training Group 4 Completed 24 27
Training Group 4 Rescheduled 1 27
You can use UNION ALL to append the total row.
select Name, count(*) as Count from yourTable group by Name
union all
select 'SUM' as Name, count(*) as Count from yourTable
For SQL Server you can try this one:
SELECT ISNULL([NAME],'SUM'),Count([NAME]) AS COUNT
FROM TABLENAME
GROUP BY [NAME] WITH CUBE
with cttmp
as
(
select Col_Name, count(*) as ctn from tab_name group by Col_Name having count(Col_Name)>1
)
select sum(ctn) from cttmp;
You can use ROLLUP
select nvl(name, 'SUM'), count(*)
from table
group by rollup(name)
Use it as
select Name, count(Name) as Count from YourTable
group by Name
union
Select 'SUM' , COUNT(Name) from YourTable
I am using SQL server and the following should work for you:
select cast(name as varchar(16)) as 'Name', count(name) as 'Count'
from Table1
group by Name
union all
select 'Sum:', count(name)
from Table1
I also needed having count(*) > 1, so I wrote my own query after referring to some of the queries above.
SYNTAX:
select sum(count) from (select count(`table_name`.`id`) as `count` from `table_name` where {some condition} group by {some_column} having count(`table_name`.`id`) > 1) as `tmp`;
Example:
select sum(count) from (select count(`table_name`.`id`) as `count` from `table_name` where `table_name`.`name` IS NOT NULL and `table_name`.`name` != '' group by `table_name`.`name` having count(`table_name`.`id`) > 1) as `tmp`;
You can try group by on name and count the ids in that group.
SELECT name, count(id) as COUNT FROM table group by name
After the query, run below to get the total row count
select @@ROWCOUNT
select sum(s) from
(select count(Col_name) as s from Tab_name group by Col_name having count(*)>1)c