Transpose data in HIVE - sql

I have the following dataset in Hive, and I would like to transpose rows into columns.
Customer  Status  Quantity
25        Paid    5
25        N Paid  2
67        Open    12
67        Paid    4
45        N Paid  3
45        Open    2
After the transpose, I would like a new table with only one row per customer and one column per Status, e.g.
Customer  Paid  N Paid  Open
25        5     2       0
67        4     0       12
45        0     3       2
I tried some examples I found on the Internet, but I could not make them work. Here, for the sake of simplicity, I listed only three statuses, but in fact I could have more than that.
In SAS, I used to do something like the following:
proc transpose
data = imputtable
out = outputtable;
by customer;
id status;
var quantity;
run;
SAS gets all the existing statuses and pivots them into columns. I was looking to do the same in Hive.
Regards,
Marcio

Use conditional aggregation:
select Customer,
       sum(case when Status = 'Paid' then Quantity else 0 end) as Paid,
       sum(case when Status = 'N Paid' then Quantity else 0 end) as `N Paid`,
       sum(case when Status = 'Open' then Quantity else 0 end) as Open
from your_table
group by Customer
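Since the question mentions that there can be more than three statuses, here is a minimal sketch of generating the same conditional aggregation from whatever statuses exist in the data. It is only an illustration: it runs against SQLite through Python's built-in sqlite3 module rather than Hive, the table name `orders` and the helper `build_pivot_sql` are made up for this example, and it assumes the status values are trusted and contain no quotes or backticks. In Hive you would run the first query to list the statuses and then submit the generated query text.

```python
import sqlite3

# Sample rows from the question: (Customer, Status, Quantity).
ROWS = [
    (25, "Paid", 5), (25, "N Paid", 2),
    (67, "Open", 12), (67, "Paid", 4),
    (45, "N Paid", 3), (45, "Open", 2),
]


def build_pivot_sql(statuses, table="orders"):
    """Return a query with one SUM(CASE ...) column per status.

    Hypothetical helper: status values are pasted into the SQL text,
    so they must come from a trusted source.
    """
    cols = ",\n".join(
        "  SUM(CASE WHEN Status = '{0}' THEN Quantity ELSE 0 END) AS `{0}`".format(s)
        for s in statuses
    )
    return "SELECT Customer,\n{}\nFROM {}\nGROUP BY Customer".format(cols, table)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (Customer INTEGER, Status TEXT, Quantity INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ROWS)

# Step 1: discover the statuses that actually occur.
statuses = [r[0] for r in conn.execute(
    "SELECT DISTINCT Status FROM orders ORDER BY Status")]

# Step 2: generate and run the pivot query.
pivot_sql = build_pivot_sql(statuses)
result = conn.execute(pivot_sql + " ORDER BY Customer").fetchall()

print(pivot_sql)
for row in result:
    print(row)
```

The generated text has the same shape as the hand-written query above, just with the column list built from the data, so adding a new status does not require editing the query by hand.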

Related

SQL transform table with sum based on values

I have a table like this:
operation_id  order_id  qty  qty_type  detail_type
1             1         240  ready     glued
1             1         199  ready     unglued
1             1         100  done      glued
1             2         50   ready     glued
I would like to transform it into the following, i.e. add four columns that sum qty from the table above based on conditions such as detail_type = 'glued', qty_type = 'ready', etc.
operation_id  order_id  qty_glued_ready  qty_unglued_ready  qty_glued_done  qty_unglued_done
1             1         240              199                10              10
Can somebody help me with what the query should look like?
I assume the expected output in your question is only an illustration, since it does not match the table data you posted.
I don't understand how your qty_glued_done is 10.
But here is something you can start working from:
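SELECT o.`operation_id`, o.`order_id`,
SUM(CASE WHEN o.`detail_type`='glued' AND o.`qty_type`='ready' THEN o.`qty` ELSE 0 END) AS qty_glued_ready,
SUM(CASE WHEN o.`detail_type`='unglued' AND o.`qty_type`='ready' THEN o.`qty` ELSE 0 END) AS qty_unglued_ready,
SUM(CASE WHEN o.`detail_type`='glued' AND o.`qty_type`='done' THEN o.`qty` ELSE 0 END) AS qty_glued_done,
SUM(CASE WHEN o.`detail_type`='unglued' AND o.`qty_type`='done' THEN o.`qty` ELSE 0 END) AS qty_unglued_done
FROM `operation_table` o
GROUP BY o.`operation_id`, o.`order_id`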
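To see what this kind of query returns on the posted rows, here is a small sketch that runs it in SQLite via Python's sqlite3 module. It assumes the intended grouping is one row per (operation_id, order_id) pair, since the desired output shows both columns; the helper `sum_where` is hypothetical and only saves typing the four SUM(CASE ...) columns by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE operation_table ("
    "operation_id INTEGER, order_id INTEGER, qty INTEGER, "
    "qty_type TEXT, detail_type TEXT)"
)
# The four rows from the question.
conn.executemany(
    "INSERT INTO operation_table VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 240, "ready", "glued"),
        (1, 1, 199, "ready", "unglued"),
        (1, 1, 100, "done", "glued"),
        (1, 2, 50, "ready", "glued"),
    ],
)


def sum_where(detail_type, qty_type):
    """Build one SUM(CASE ...) column for a (detail_type, qty_type) pair."""
    return (
        "SUM(CASE WHEN detail_type = '{d}' AND qty_type = '{q}' "
        "THEN qty ELSE 0 END) AS qty_{d}_{q}".format(d=detail_type, q=qty_type)
    )


pairs = [("glued", "ready"), ("unglued", "ready"),
         ("glued", "done"), ("unglued", "done")]
query = (
    "SELECT operation_id, order_id, "
    + ", ".join(sum_where(d, q) for d, q in pairs)
    + " FROM operation_table"
    + " GROUP BY operation_id, order_id"  # every non-aggregated column is a key
    + " ORDER BY operation_id, order_id"
)
totals = conn.execute(query).fetchall()
for row in totals:
    print(row)
```

On these rows order 1 comes out with qty_glued_done = 100 and qty_unglued_done = 0, which is consistent with the remark above that the 10s in the expected output cannot be derived from the posted data.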

Show two different sum columns based on a single column

Show two different sum columns based on another column.
For this table:
ID Item Quantity Location
1 1 10 A
2 1 10 B
3 1 10 A
4 2 10 A
5 2 10 A
6 2 10 B
7 3 10 A
8 3 20 A
I need to see the total quantities for both location A and location B (to compare which is higher), but only for items that have a location B:
Expected result:
Item Quantity A Quantity B
1 20 10
2 20 10
I've been trying this but getting errors:
SELECT st.item, st.qty ALIAS(stqty),
(SELECT SUM(dc.qty)
FROM table dc
WHERE st.item = dc.item) ALIAS(dcqty))
FROM table st
WHERE location ='b'
I can do this easily with two queries obviously, but I was hoping for a way to do it in one.
You can use a sum with a case expression to do the pivot, then a having clause to exclude rows with no total for B.
Here is the fiddle:
https://www.db-fiddle.com/f/rS8fgvWoFxn879Utc2CKbu/0
select Item,
       sum(case when Location = 'A' then Quantity else 0 end) as quantity_a,
       sum(case when Location = 'B' then Quantity else 0 end) as quantity_b
from myTable
group by Item
having sum(case when Location = 'B' then Quantity else 0 end) > 0
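If you want to check the effect of the having clause without the fiddle, the following sketch loads the eight rows from the question into an in-memory SQLite database through Python's sqlite3 module and runs the same shape of query. The column aliases `quantity_a` and `quantity_b` are names picked for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable (ID INTEGER, Item INTEGER, Quantity INTEGER, Location TEXT)"
)
# Rows copied from the question.
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?, ?)",
    [
        (1, 1, 10, "A"), (2, 1, 10, "B"), (3, 1, 10, "A"),
        (4, 2, 10, "A"), (5, 2, 10, "A"), (6, 2, 10, "B"),
        (7, 3, 10, "A"), (8, 3, 20, "A"),
    ],
)

QUERY = """
SELECT Item,
       SUM(CASE WHEN Location = 'A' THEN Quantity ELSE 0 END) AS quantity_a,
       SUM(CASE WHEN Location = 'B' THEN Quantity ELSE 0 END) AS quantity_b
FROM myTable
GROUP BY Item
-- HAVING filters whole groups after aggregation, so item 3 (no B rows) drops out
HAVING SUM(CASE WHEN Location = 'B' THEN Quantity ELSE 0 END) > 0
ORDER BY Item
"""

cur = conn.execute(QUERY)
columns = [c[0] for c in cur.description]
items_with_b = cur.fetchall()

print(columns)
for row in items_with_b:
    print(row)
```

Item 3 has a location A total of 30 but no location B rows, so its B sum is 0 and the group is filtered out, leaving the two rows from the expected result.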

SQL: create another column that calculates ratio

So I have a table that looks like the following:
car owner  non car owner  have dog  num ppl
1          0              1         60
0          1              1         80
1          0              0         90
1          0              0         98
I am trying to add another column with the ratios. For example, the total number of car owners is 110. To find the ratio of people who own a car and have a dog, I have to divide 60 by 110 for the first row. Likewise, the total number of non car owners is 98, so to find that ratio I need to divide 80 by 98 for the second row, and so on.
So far, I have tried the following code:
with a as (
select
id,
case when car_owner = 1 then 1 else 0 end car_owner,
case when non_car_owner = 1 then 1 else 0 end as non_car_owner = 1
from `xyz_table`
),
b as (select
car_owner,
non_car_owner,
case when have_dog = 1 then 1 else 0 end have_dog,
count(distinct id) num_ppl
from `xyz_table`
join a using (id)
group by 1,2,3
order by 4 desc
)
select *, num_ppl/(select (case when dog_owner = 1 then 110 else 0 end) as ratio
from a)
from b
Unfortunately, it throws the following error:
Scalar subquery produced more than one element
Any help would be appreciated.
PS. I am running this code on Google BigQuery.
"If I want to find the ratio of people who own car and have dog"
You can use avg():
select avg(car_owner * have_dog)
from t;
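If what you need is a ratio on every row of the already aggregated table, as the question describes, one possible alternative to the avg() approach is to divide num_ppl by a window sum over the car_owner group. The sketch below is an assumption about the intent, not the answerer's method. It runs in SQLite (3.25 or later, for window functions) through Python's sqlite3 module rather than BigQuery, and it uses the underscore column names from the asker's own query. The group totals are computed from the four rows as listed, so they come out as 248 and 80 rather than the 110 and 98 quoted in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (car_owner INTEGER, non_car_owner INTEGER, "
    "have_dog INTEGER, num_ppl INTEGER)"
)
# The four rows of the aggregated table shown in the question.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(1, 0, 1, 60), (0, 1, 1, 80), (1, 0, 0, 90), (1, 0, 0, 98)],
)

RATIO_SQL = """
SELECT car_owner, non_car_owner, have_dog, num_ppl,
       -- "* 1.0" forces floating-point division in SQLite
       num_ppl * 1.0 / SUM(num_ppl) OVER (PARTITION BY car_owner) AS ratio
FROM t
ORDER BY num_ppl
"""

rows = conn.execute(RATIO_SQL).fetchall()
for row in rows:
    print(row)
```

Because the denominator is computed per partition inside the same query, there is no scalar subquery that could return more than one row, which is what triggered the original error.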

Need resultset on following resultset in SQL

I have result set as follows:
Employer_id Type Amount
1 penalty 100
1 interest 120
2 penalty 50
2 interest 60
2 contribution 70
1 contribution 140
I need result as:
Employer_id penalty interest contribution
1 100 120 140
2 50 60 70
How can I do this in SQL?
Please take a moment and learn how to format your questions.
A simple conditional aggregation should do the trick
Select Employer_id
,penalty = sum(case when [type]='penalty' then amount else 0 end)
,interest = sum(case when [type]='interest' then amount else 0 end)
,contribution = sum(case when [type]='contribution' then amount else 0 end)
From YourTable
Group By Employer_id
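Note that the `alias = expression` form used above is SQL Server syntax. If you are on another database, the same query can be written with the standard `expression AS alias` form. Here is a quick sketch of that variant, run on the sample rows in SQLite through Python's sqlite3 module purely as a way to check it; `YourTable` is the placeholder name from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (Employer_id INTEGER, Type TEXT, Amount INTEGER)")
# Rows copied from the question.
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [
        (1, "penalty", 100), (1, "interest", 120),
        (2, "penalty", 50), (2, "interest", 60),
        (2, "contribution", 70), (1, "contribution", 140),
    ],
)

# Same conditional aggregation, written with "expr AS alias".
PORTABLE_SQL = """
SELECT Employer_id,
       SUM(CASE WHEN Type = 'penalty' THEN Amount ELSE 0 END) AS penalty,
       SUM(CASE WHEN Type = 'interest' THEN Amount ELSE 0 END) AS interest,
       SUM(CASE WHEN Type = 'contribution' THEN Amount ELSE 0 END) AS contribution
FROM YourTable
GROUP BY Employer_id
ORDER BY Employer_id
"""

by_employer = conn.execute(PORTABLE_SQL).fetchall()
for row in by_employer:
    print(row)
```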

Presto SQL pivoting (for lack of a better word) data

I am working with some course data in a Presto database. The data in the table looks like:
student_id period score completed
1 2016_Q1 3 Y
1 2016_Q3 4 Y
3 2017_Q1 4 Y
4 2018_Q1 2 N
I would like to format the data so that it looks like:
student_id 2018_Q1_score 2018_Q1_completed 2017_Q3_score
1 0 N 5
3 4 Y 4
4 2 N 2
I know that I could do this by joining to the table for each time period, but I wanted to ask here to see if any gurus had a recommendation for a more scalable solution (e.g. perhaps not having to manually create a new join for each period). Any suggestions?
You can just use conditional aggregation:
select student_id,
max(case when period = '2018_Q1' then score else 0 end) as score_2018q1,
max(case when period = '2018_Q1' then completed else 'N' end) as completed_2018q1,
max(case when period = '2017_Q3' then score else 0 end) as score_2017q3
from t
group by student_id
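A short sketch of how the max(case ...) pattern behaves on the four sample rows, run in SQLite through Python's sqlite3 module instead of Presto. To have something to show, it pivots the periods 2017_Q1 and 2018_Q1, which actually occur in the posted rows; the choice of periods and the column aliases are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (student_id INTEGER, period TEXT, score INTEGER, completed TEXT)"
)
# Rows copied from the question.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, "2016_Q1", 3, "Y"),
        (1, "2016_Q3", 4, "Y"),
        (3, "2017_Q1", 4, "Y"),
        (4, "2018_Q1", 2, "N"),
    ],
)

PIVOT_SQL = """
SELECT student_id,
       MAX(CASE WHEN period = '2017_Q1' THEN score ELSE 0 END) AS score_2017q1,
       -- string MAX: 'Y' sorts after 'N', so a real 'Y' beats the 'N' default
       MAX(CASE WHEN period = '2017_Q1' THEN completed ELSE 'N' END) AS completed_2017q1,
       MAX(CASE WHEN period = '2018_Q1' THEN score ELSE 0 END) AS score_2018q1,
       MAX(CASE WHEN period = '2018_Q1' THEN completed ELSE 'N' END) AS completed_2018q1
FROM t
GROUP BY student_id
ORDER BY student_id
"""

per_student = conn.execute(PIVOT_SQL).fetchall()
for row in per_student:
    print(row)
```

Students with no row for a period fall back to the defaults (0 and 'N'). One caveat of this design is that the 0 default is only safe as long as real scores are never negative; leaving out the else branch would give NULL instead, which keeps "no row" distinguishable from a real score of 0.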