Hive table transpose operation using SQL


I want to transpose a table using HiveQL.
This is the source table:
subject | roll_1 | roll_2 | roll_3 | roll_4 | roll_5
--------+--------+--------+--------+--------+-------
MATHS   | 80     | 90     | 78     | 95     | 68
ENGLISH | 78     | 78     | 67     | 75     | 54
I want the result in the format shown below:
subject | MATHS | ENGLISH
--------+-------+--------
roll_1  | 80    | 78
roll_2  | 90    | 78
roll_3  | 78    | 67
roll_4  | 95    | 75
roll_5  | 68    | 54
Please help me resolve this.

This is the closest I can get to making it generic: as new columns are added, you only have to change the CONCAT() inside map():
select pos1+1 AS rollnum, mat, eng from (
select collect_list(a.group_map['MATHS']) as MATHS,
collect_list(a.group_map['ENGLISH']) as ENGLISH
from ( select map(subject, CONCAT(roll_1,',',roll_2,',',roll_3,',',roll_4,',',
roll_5)) as group_map
from db_name.tbl_name) a) b
lateral view posexplode(split(b.MATHS[0],',')) MATHS AS pos1, mat
lateral view posexplode(split(b.ENGLISH[0],',')) ENGLISH AS pos2, eng
WHERE pos1 = pos2
This might affect efficiency somewhat, though.

Below is the SQL to get the desired output.
select 'roll_1' as subject,sum(case when subject='MATHS' then roll_1 else 0 end) maths,sum(case when subject='ENGLISH' then roll_1 else 0 end) english from your_table_2
union all
select 'roll_2' as subject,sum(case when subject='MATHS' then roll_2 else 0 end) maths,sum(case when subject='ENGLISH' then roll_2 else 0 end) english from your_table_2
union all
select 'roll_3' as subject,sum(case when subject='MATHS' then roll_3 else 0 end) maths,sum(case when subject='ENGLISH' then roll_3 else 0 end) english from your_table_2
union all
select 'roll_4' as subject,sum(case when subject='MATHS' then roll_4 else 0 end) maths,sum(case when subject='ENGLISH' then roll_4 else 0 end) english from your_table_2
union all
select 'roll_5' as subject,sum(case when subject='MATHS' then roll_5 else 0 end) maths,sum(case when subject='ENGLISH' then roll_5 else 0 end) english from your_table_2
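To try the UNION ALL approach without a Hive cluster, here is a minimal sketch that runs the same kind of query against an in-memory SQLite database from Python. The table name marks and the helper list roll_columns are assumptions made for this illustration, and SQLite only stands in for Hive here. Generating one branch per roll column avoids typing each SELECT by hand.

```python
import sqlite3

# In-memory SQLite stands in for Hive; the table name "marks" is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE marks (subject TEXT, roll_1 INT, roll_2 INT, "
    "roll_3 INT, roll_4 INT, roll_5 INT)"
)
conn.executemany(
    "INSERT INTO marks VALUES (?, ?, ?, ?, ?, ?)",
    [("MATHS", 80, 90, 78, 95, 68), ("ENGLISH", 78, 78, 67, 75, 54)],
)

# One SELECT per source column; each turns a column into an output row.
roll_columns = ["roll_1", "roll_2", "roll_3", "roll_4", "roll_5"]
branch = (
    "SELECT '{c}' AS roll, "
    "SUM(CASE WHEN subject = 'MATHS' THEN {c} ELSE 0 END) AS maths, "
    "SUM(CASE WHEN subject = 'ENGLISH' THEN {c} ELSE 0 END) AS english "
    "FROM marks"
)
query = " UNION ALL ".join(branch.format(c=c) for c in roll_columns)
query += " ORDER BY roll"  # make the row order explicit

transposed = conn.execute(query).fetchall()
for row in transposed:
    print(row)
```

Each new roll column needs one more entry in roll_columns; each new subject needs one more SUM(CASE ...) expression in the branch template.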

Related

How to get column wise sum in SQL?

I have a complex query where I am getting the count of various categories in separate columns.
Here's the output of my query:
district | colA | colB | colC
------------------------------------
DistA | 1 | 1 | 3
DistB | 2 | 0 | 2
DistC | 2 | 1 | 0
DistD | 0 | 3 | 4
..
And here's my query:
select
q1."district",
coalesce(max(case q1."type" when 'colA' then q1."type_count" else 0 end), 0) as "colA",
coalesce(max(case q1."type" when 'colB' then q1."type_count" else 0 end), 0) as "colB",
coalesce(max(case q1."type" when 'colC' then q1."type_count" else 0 end), 0) as "colC"
from (
select
d."name" as "district",
t."name" as "type",
count(t.id) as "type_count"
from
main_entity as m
inner join type_entity as t on
m."type_id" = t.id
inner join district as d on
m."district_id" = d.id
where
m."delete_at" is null
group by
d."name",
t.id
) as q1
group by
q1."district"
I want to modify this query so that I can get the sum of each column in the last row, something like this:
district | colA | colB | colC
------------------------------------
DistA | 1 | 1 | 3
DistB | 2 | 0 | 2
DistC | 2 | 1 | 0
DistD | 0 | 3 | 4
..
Total | 5 | 5 | 9
I have tried using group by + rollup with the above query by just adding the following:
...
group by rollup (q1."district")
It adds a row at the bottom, but its values are the same as the values of the row before it, not the sum of all the rows above, so basically something like this:
district | colA | colB | colC
------------------------------------
DistA | 1 | 1 | 3
..
DistD | 0 | 3 | 4
Total | 0 | 3 | 4
So, how can I get the column-wise sum from my query?

Try this:
With temp as
( --your query from above
select
q1."district",
coalesce(max(case q1."type" when 'colA' then q1."type_count" else 0 end), 0) as "colA",
coalesce(max(case q1."type" when 'colB' then q1."type_count" else 0 end), 0) as "colB",
coalesce(max(case q1."type" when 'colC' then q1."type_count" else 0 end), 0) as "colC"
from (
select
d."name" as "district",
t."name" as "type",
count(t.id) as "type_count"
from
main_entity as m
inner join type_entity as t on
m."type_id" = t.id
inner join district as d on
m."district_id" = d.id
where
m."delete_at" is null
group by
d."name",
t.id
) as q1
group by
q1."district"
)
select t.* from temp t
UNION ALL
select 'Total', sum(t1."colA"), sum(t1."colB"), sum(t1."colC") from temp t1
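As a rough, self-contained check of the "append a totals row" idea, the sketch below uses Python's sqlite3 module with a small table holding the already-pivoted counts from the question. The table name pivoted and the sort_key column are assumptions for this illustration; sort_key is only there to keep the Total row at the bottom, since a UNION on its own does not promise any row order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "pivoted" is a hypothetical stand-in for the result of the pivot query.
conn.execute("CREATE TABLE pivoted (district TEXT, colA INT, colB INT, colC INT)")
conn.executemany(
    "INSERT INTO pivoted VALUES (?, ?, ?, ?)",
    [("DistA", 1, 1, 3), ("DistB", 2, 0, 2), ("DistC", 2, 1, 0), ("DistD", 0, 3, 4)],
)

query = """
SELECT district, colA, colB, colC
FROM (
    SELECT 0 AS sort_key, district, colA, colB, colC FROM pivoted
    UNION ALL
    -- The totals branch must supply a value for every column, including the label.
    SELECT 1, 'Total', SUM(colA), SUM(colB), SUM(colC) FROM pivoted
)
ORDER BY sort_key, district
"""
rows_with_total = conn.execute(query).fetchall()
for row in rows_with_total:
    print(row)
```

Both branches of the union have to return the same number of columns, which is why the totals branch carries the literal 'Total' in the district position.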

Count most occurring word in row SQL Server

I'm trying to get the number of times a certain word occurs in each row of a query result.
For example :
Name | Chemistry | Physics | Biology | Maths
-------+-----------+-----------+-----------+--------
John | Excellent | Good | Good | Poor
Kelvin | Excellent | Excellent | Excellent | Poor
I want to get something for each row like
Name | Excellent | Good | Poor
-------+-----------+------+-------
John | 1 | 2 | 1
Kelvin | 3 | 0 | 1

Just add them up using case expressions:
select name,
(case when chemistry = 'Excellent' then 1 else 0 end +
case when physics = 'Excellent' then 1 else 0 end +
case when biology = 'Excellent' then 1 else 0 end +
case when maths = 'Excellent' then 1 else 0 end
) as num_excellents,
. . .
from t;
A fancier method would use apply and aggregation:
select t.name, v.*
from t cross apply
(select sum(case when marks = 'Excellent' then 1 else 0 end) as excellent,
sum(case when marks = 'Good' then 1 else 0 end) as good,
sum(case when marks = 'Poor' then 1 else 0 end) as poor
from (values (chemistry), (physics), (biology), (maths)
) m(marks)
) v;
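CROSS APPLY is specific to SQL Server, so it cannot be tried in most other engines. As a portable sketch of the same "unpivot, then aggregate" idea, the example below swaps in a UNION ALL subquery and runs it in SQLite from Python. The table name report is an assumption for this illustration, and this is a substitute technique, not the CROSS APPLY query itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "report" is a hypothetical table name holding the question's sample rows.
conn.execute(
    "CREATE TABLE report (name TEXT, chemistry TEXT, physics TEXT, "
    "biology TEXT, maths TEXT)"
)
conn.executemany(
    "INSERT INTO report VALUES (?, ?, ?, ?, ?)",
    [
        ("John", "Excellent", "Good", "Good", "Poor"),
        ("Kelvin", "Excellent", "Excellent", "Excellent", "Poor"),
    ],
)

query = """
SELECT name,
       SUM(CASE WHEN grade = 'Excellent' THEN 1 ELSE 0 END) AS excellent,
       SUM(CASE WHEN grade = 'Good' THEN 1 ELSE 0 END) AS good,
       SUM(CASE WHEN grade = 'Poor' THEN 1 ELSE 0 END) AS poor
FROM (
    -- Unpivot: one (name, grade) row per subject column.
    SELECT name, chemistry AS grade FROM report
    UNION ALL SELECT name, physics FROM report
    UNION ALL SELECT name, biology FROM report
    UNION ALL SELECT name, maths FROM report
)
GROUP BY name
ORDER BY name
"""
grade_counts = conn.execute(query).fetchall()
for row in grade_counts:
    print(row)
```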

Counting sum of items of type

What I want to do is go from this:
|type|quantity|
+----+--------+
|shoe| 10 |
|hat | 2 |
|shoe| 7 |
|shoe| 1 |
|hat | 5 |
to get this :
|shoes|hats|
+-----+----+
| 18 | 7 |
How can I do that? So far I haven't come up with a working query. I think it should look something like this:
SELECT
SUM(CASE type WHEN 'shoe' then quantity ELSE 0 END) AS "shoes",
SUM(CASE type WHEN 'hat' then quantity ELSE 0 END) AS "hats"
FROM items
GROUP BY type

Just drop the group by. You want only one row:
SELECT
SUM(CASE type WHEN 'shoe' then quantity ELSE 0 END) AS "shoes",
SUM(CASE type WHEN 'hat' then quantity ELSE 0 END) AS "hats"
FROM items ;
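To see why dropping the GROUP BY matters, this small sketch runs the query both ways against an in-memory SQLite table loaded with the sample rows from the question. Only the Python variable names are introduced here; the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (type TEXT, quantity INT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [("shoe", 10), ("hat", 2), ("shoe", 7), ("shoe", 1), ("hat", 5)],
)

pivot = (
    "SELECT SUM(CASE type WHEN 'shoe' THEN quantity ELSE 0 END) AS shoes, "
    "SUM(CASE type WHEN 'hat' THEN quantity ELSE 0 END) AS hats "
    "FROM items"
)

# Without GROUP BY the aggregates run over the whole table: one row.
single_row = conn.execute(pivot).fetchall()

# With GROUP BY type there is one row per type, each half filled with zeros.
per_type = conn.execute(pivot + " GROUP BY type ORDER BY type").fetchall()

print(single_row)
print(per_type)
```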

SQL: Count() based on column value

I have a table as follows:
CallID | CompanyID | OutcomeID
----------------------------------
1234 | 3344 | 36
1235 | 3344 | 36
1236 | 3344 | 36
1237 | 3344 | 37
1238 | 3344 | 39
1239 | 6677 | 37
1240 | 6677 | 37
I would like to create a SQL script that counts the number of Sales outcomes and the number of all the other attempts (anything <> 36), something like:
CompanyID | SalesCount | NonSalesCount
------------------------------------------
3344 | 3 | 1
6677 | 0 | 2
Is there a way to do a COUNT() that contains a condition like COUNT(CallID WHERE OutcomeID = 36)?

You can use a CASE expression with your aggregate to get a total based on the outcomeId value:
select companyId,
sum(case when outcomeid = 36 then 1 else 0 end) SalesCount,
sum(case when outcomeid <> 36 then 1 else 0 end) NonSalesCount
from yourtable
group by companyId;

Something like this:
SELECT companyId,
COUNT(CASE WHEN outcomeid = 36 THEN 1 END) SalesCount,
COUNT(CASE WHEN outcomeid <> 36 THEN 1 END) NonSalesCount
FROM
yourtable
GROUP BY
companyId
should work, because COUNT() counts only non-NULL values.

Yes. Count doesn't count NULL values, so you can do this:
select
COUNT('x') as Everything,
COUNT(case when OutcomeID = 36 then 'x' else NULL end) as Sales,
COUNT(case when OutcomeID <> 36 then 'x' else NULL end) as Other
from
YourTable
Alternatively, you can use SUM, like bluefeet demonstrated.

SELECT
companyId, SalesCount, TotalCount-SalesCount AS NonSalesCount
FROM
(
select
companyId,
COUNT(case when outcomeid = 36 then 1 else NULL end) SalesCount,
COUNT(*) AS TotalCount
from yourtable
group by companyId
) X;
Using this mutually exclusive pattern with COUNT(*) avoids the (very small) overhead of evaluating a second conditional COUNT and gives correct values if outcomeid can be NULL.
See @bluefeet's SQLFiddle with added NULLs.
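The NULL behaviour can be checked with a short sketch. It loads the rows from the question into SQLite via Python and adds one extra call (1241) whose outcome is NULL; that extra row, the table name calls, and the snake_case column names are assumptions added for this illustration. The conditional COUNT skips the NULL row because NULL <> 36 is not true, while the COUNT(*) subtraction treats it as a non-sale.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (call_id INT, company_id INT, outcome_id INT)")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?)",
    [
        (1234, 3344, 36), (1235, 3344, 36), (1236, 3344, 36),
        (1237, 3344, 37), (1238, 3344, 39),
        (1239, 6677, 37), (1240, 6677, 37),
        (1241, 6677, None),  # hypothetical extra row with a NULL outcome
    ],
)

# Two conditional counts: the NULL outcome matches neither condition.
conditional = conn.execute("""
    SELECT company_id,
           COUNT(CASE WHEN outcome_id = 36 THEN 1 END) AS sales,
           COUNT(CASE WHEN outcome_id <> 36 THEN 1 END) AS non_sales
    FROM calls GROUP BY company_id ORDER BY company_id
""").fetchall()

# Total minus sales: every row that is not a sale counts as a non-sale.
subtracted = conn.execute("""
    SELECT company_id, sales, total - sales AS non_sales
    FROM (
        SELECT company_id,
               COUNT(CASE WHEN outcome_id = 36 THEN 1 END) AS sales,
               COUNT(*) AS total
        FROM calls GROUP BY company_id
    )
    ORDER BY company_id
""").fetchall()

print(conditional)
print(subtracted)
```

Which result is "correct" depends on whether a call with no recorded outcome should count as a non-sale.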

Knowing COUNT() and SUM() only count non-null values and the following rule:
true or null = true
false or null = null
For fiddling around, you can take Taryn's answer and circumvent CASE altogether in a super-dirty and error-prone way!
select companyId,
count(outcomeid = 36 or null) SalesCount,
count(outcomeid <> 36 or null) NonSalesCount
from yourtable
group by companyId;
Forget to add an or null and you'll be counting everything!

Multiple sum/counts across multiple tables in PostgreSQL

I've searched through several suggestions on this site and haven't quite been able to get what I'm after. I suspect there's just a syntax/punctuation issue that I'm just missing.
I work on a database using phpPgAdmin that tracks lots of information related to a population of baboons being studied. I'm trying to make a query to identify, for each individual baboon, how many tissue samples of different types we have collected for them and how many DNA samples we have of different types for each of them. There are three tables that are pertinent to my problem:
Table: "biograph" has basic info about all the animals in the group, though the name is all I care about here.
name | birth
-----+-----------
A21 | 1968-07-01
AAR | 2002-03-30
ABB | 1998-09-10
ABD | 2005-03-15
ABE | 1986-01-01
Table: "babtissue" tracks information, including the below three columns, about different tissues that have been collected over the years. Some lines in this table represent tissue samples that we no longer have, but are still referred to elsewhere in the database, so the "avail" column helps us screen for samples that we still have around.
name | sample_type | avail
-----+-------------+------
A21 | BLOOD | Y
A21 | BLOOD | Y
A21 | TISSUE | N
ABB | BLOOD | Y
ABB | TISSUE | Y
Table: "dna" is similar to babtissue.
name | sample_type | avail
-----+-------------+------
ABB | GDNA | N
ABB | WGA | Y
ACC | WGA | N
ALE | GDNA | Y
ALE | GDNA | Y
Altogether, I'm trying to write a query that returns every name from biograph and tells me, in one column each, how many 'BLOOD', 'TISSUE', 'GDNA', and 'WGA' samples I have for each individual. Something like...
name | bloodsamps | tissuesamps | gdnas | wgas | avail
-----+------------+-------------+-------+------+------
A21 | 2 | 0 | 0 | 0 | ?
AAR | 0 | 0 | 0 | 0 | ?
ABB | 1 | 1 | 0 | 1 | ?
ACC | 0 | 0 | 0 | 0 | ?
ALE | 0 | 0 | 2 | 0 | ?
(Apologies for the weird formatting above, I'm not very familiar with writing this way)
The latest version of the query that I've tried:
select b.name,
sum(case when t.sample_type='BLOOD' and t.avail='Y' then 1 else 0 end) as bloodsamps,
sum(case when t.sample_type='TISSUE' and t.avail='Y' then 1 else 0 end) as tissuesamps,
sum(case when d.sample_type='GDNA' and d.avail='Y' then 1 else 0 end) as gdnas,
sum(case when d.sample_type='WGA' and d.avail='Y' then 1 else 0 end) as wgas
from biograph b
left join babtissue t on b.name=t.name
left join dna d on b.name=d.name
where b.name is not NULL
group by b.name
order by b.name
I don't receive any errors when doing it this way, but I know the numbers it gives me are wrong--too high. I figure this has something to do with my use of more than one join, and that something about my join syntax needs to change.
Any ideas?

The numbers are too high because you're joining to babtissue and then also to dna, which is going to cause duplicates.
You can try to break it up. I don't know if this syntax will work for your database, but I believe that it follows ANSI standards, so give it a shot...
SELECT
SQ.name,
SUM(CASE WHEN T.sample_type = 'BLOOD' AND T.avail = 'Y' THEN 1 ELSE 0 END) AS bloodsamps,
SUM(CASE WHEN T.sample_type = 'TISSUE' AND T.avail = 'Y' THEN 1 ELSE 0 END) AS tissuesamps,
SQ.gdnas,
SQ.wgas
FROM
(
SELECT
B.name,
SUM(CASE WHEN D.sample_type = 'GDNA' AND D.avail = 'Y' THEN 1 ELSE 0 END) AS gdnas,
SUM(CASE WHEN D.sample_type = 'WGA' AND D.avail = 'Y' THEN 1 ELSE 0 END) AS wgas
FROM
biograph B
LEFT JOIN dna D ON D.name = B.name
GROUP BY
B.name
) AS SQ
LEFT JOIN babtissue T on T.name = SQ.name
WHERE SQ.name is not NULL
GROUP BY SQ.name, SQ.gdnas, SQ.wgas
ORDER BY SQ.name
Can the name really be NULL?

I don't know about the "avail" column, but this should give you the other columns you're looking for:
SELECT b.name,
COALESCE (t.bloodsamps, 0) AS bloodsamps,
COALESCE (t.tissuesamps, 0) AS tissuesamps,
COALESCE (d.gdnas, 0) AS gdnas,
COALESCE (d.wgas, 0) AS wgas
FROM biograph b
LEFT JOIN (
SELECT name,
SUM(CASE WHEN sample_type = 'BLOOD' THEN 1 ELSE 0 END) AS bloodsamps,
SUM(CASE WHEN sample_type = 'TISSUE' THEN 1 ELSE 0 END) AS tissuesamps
FROM babtissue
WHERE avail = 'Y'
GROUP BY name
) t
ON (t.name = b.name)
LEFT JOIN (
SELECT name,
SUM(CASE WHEN sample_type = 'GDNA' THEN 1 ELSE 0 END) AS gdnas,
SUM(CASE WHEN sample_type = 'WGA' THEN 1 ELSE 0 END) AS wgas
FROM dna
WHERE avail = 'Y'
GROUP BY name
) d
ON (d.name = b.name)
;
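The inflation caused by joining two one-to-many tables at once can be reproduced with the sample rows from the question. The sketch below uses SQLite through Python rather than PostgreSQL, so treat it as an approximation; the Python variable names are introduced for this illustration. ABB has two babtissue rows and two dna rows, so the double join produces four rows for it and every count is doubled, while aggregating each table before joining keeps one row per name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE biograph (name TEXT);
CREATE TABLE babtissue (name TEXT, sample_type TEXT, avail TEXT);
CREATE TABLE dna (name TEXT, sample_type TEXT, avail TEXT);
INSERT INTO biograph VALUES ('A21'), ('AAR'), ('ABB'), ('ABD'), ('ABE');
INSERT INTO babtissue VALUES ('A21','BLOOD','Y'), ('A21','BLOOD','Y'),
    ('A21','TISSUE','N'), ('ABB','BLOOD','Y'), ('ABB','TISSUE','Y');
INSERT INTO dna VALUES ('ABB','GDNA','N'), ('ABB','WGA','Y'),
    ('ACC','WGA','N'), ('ALE','GDNA','Y'), ('ALE','GDNA','Y');
""")

# Double join: each tissue row is repeated once per matching dna row.
naive_sql = """
SELECT b.name,
       SUM(CASE WHEN t.sample_type = 'BLOOD' AND t.avail = 'Y' THEN 1 ELSE 0 END),
       SUM(CASE WHEN t.sample_type = 'TISSUE' AND t.avail = 'Y' THEN 1 ELSE 0 END),
       SUM(CASE WHEN d.sample_type = 'GDNA' AND d.avail = 'Y' THEN 1 ELSE 0 END),
       SUM(CASE WHEN d.sample_type = 'WGA' AND d.avail = 'Y' THEN 1 ELSE 0 END)
FROM biograph b
LEFT JOIN babtissue t ON t.name = b.name
LEFT JOIN dna d ON d.name = b.name
GROUP BY b.name ORDER BY b.name
"""

# Aggregate each child table first, then join the one-row-per-name results.
preagg_sql = """
SELECT b.name,
       COALESCE(t.bloodsamps, 0), COALESCE(t.tissuesamps, 0),
       COALESCE(d.gdnas, 0), COALESCE(d.wgas, 0)
FROM biograph b
LEFT JOIN (
    SELECT name,
           SUM(CASE WHEN sample_type = 'BLOOD' THEN 1 ELSE 0 END) AS bloodsamps,
           SUM(CASE WHEN sample_type = 'TISSUE' THEN 1 ELSE 0 END) AS tissuesamps
    FROM babtissue WHERE avail = 'Y' GROUP BY name
) t ON t.name = b.name
LEFT JOIN (
    SELECT name,
           SUM(CASE WHEN sample_type = 'GDNA' THEN 1 ELSE 0 END) AS gdnas,
           SUM(CASE WHEN sample_type = 'WGA' THEN 1 ELSE 0 END) AS wgas
    FROM dna WHERE avail = 'Y' GROUP BY name
) d ON d.name = b.name
ORDER BY b.name
"""

naive = {row[0]: row[1:] for row in conn.execute(naive_sql)}
fixed = {row[0]: row[1:] for row in conn.execute(preagg_sql)}
print("double join, ABB:", naive["ABB"])
print("pre-aggregated, ABB:", fixed["ABB"])
```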
