HiveQL - join multi-level subtotals to existing table


My goal is to determine the size of various organizations at various levels. Let's assume we have three organisations 'A', 'B', and 'C', each consisting of multiple departments that are further subdivided into teams with members, as outlined below:
Org. Dep. Tm. Member
A 1 I name1
A 1 I name2
A 1 I name3
A 1 II name4
A 2 I name5
A 2 I name6
B 1 I name7
B 1 II name8
B 1 II name9
B 1 II name10
B 2 I name11
B 2 I name12
B 2 II name13
B 2 II name14
B 2 III name15
B 2 III name16
C 1 I name17
C 1 I name18
C 1 I name19
C 1 I name20
C 1 I name21
Now, I'd like to know for each member how large their respective Org., Dep. and Tm. are, like this:
Org. Dep. Tm. Member org dep tm
A 1 I name1 6 4 3
A 1 I name2 6 4 3
A 1 I name3 6 4 3
A 1 II name4 6 4 1
A 2 I name5 6 2 2
A 2 I name6 6 2 2
B 1 I name7 10 4 1
B 1 II name8 10 4 3
B 1 II name9 10 4 3
B 1 II name10 10 4 3
B 2 I name11 10 6 2
B 2 I name12 10 6 2
B 2 II name13 10 6 2
B 2 II name14 10 6 2
B 2 III name15 10 6 2
B 2 III name16 10 6 2
C 1 I name17 5 5 5
C 1 I name18 5 5 5
C 1 I name19 5 5 5
C 1 I name20 5 5 5
C 1 I name21 5 5 5
My original idea was to do this with multiple LEFT JOINS to aggregate the different levels, but this scales very poorly as you need a new join for every aggregation level. Is there a way to do this efficiently in a single statement?

Use window functions:
select org, dep, tm,
count(*) over (partition by org) as org_cnt,
count(*) over (partition by org, dep) as dep_cnt,
count(*) over (partition by org, dep, tm) as tm_cnt
from t;
The columns are hierarchical, so the partitions for dep and tm must also include the higher levels of the hierarchy.
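To check the window-function query against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module in place of Hive. The table name t comes from the answer; the column name member and the use of SQLite (3.25 or later, for window functions) are assumptions made for illustration.

```python
import sqlite3

# Sample rows from the question: (org, dep, tm, member).
data = [
    ("A", 1, "I", "name1"), ("A", 1, "I", "name2"), ("A", 1, "I", "name3"),
    ("A", 1, "II", "name4"), ("A", 2, "I", "name5"), ("A", 2, "I", "name6"),
    ("B", 1, "I", "name7"), ("B", 1, "II", "name8"), ("B", 1, "II", "name9"),
    ("B", 1, "II", "name10"), ("B", 2, "I", "name11"), ("B", 2, "I", "name12"),
    ("B", 2, "II", "name13"), ("B", 2, "II", "name14"), ("B", 2, "III", "name15"),
    ("B", 2, "III", "name16"), ("C", 1, "I", "name17"), ("C", 1, "I", "name18"),
    ("C", 1, "I", "name19"), ("C", 1, "I", "name20"), ("C", 1, "I", "name21"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (org TEXT, dep INTEGER, tm TEXT, member TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", data)

# Each count uses a wider partition key as we go down the hierarchy.
query = """
SELECT member,
       COUNT(*) OVER (PARTITION BY org)          AS org_cnt,
       COUNT(*) OVER (PARTITION BY org, dep)     AS dep_cnt,
       COUNT(*) OVER (PARTITION BY org, dep, tm) AS tm_cnt
FROM t
"""

# Map each member to its (org, dep, tm) group sizes.
sizes = {member: (o, d, t) for member, o, d, t in conn.execute(query)}
print(sizes["name1"], sizes["name13"])
```

Every input row is preserved, so each member gets all three sizes without any join.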
EDIT:
If Hive doesn't support count(distinct) as a window function and you need distinct counts, then you can do:
select org, dep, tm,
sum(case when seqnum_o = 1 then 1 else 0 end) over (partition by org) as org_cnt,
sum(case when seqnum_od = 1 then 1 else 0 end) over (partition by org, dep) as dep_cnt,
sum(case when seqnum_odt = 1 then 1 else 0 end) over (partition by org, dep, tm) as tm_cnt
from (select t.*,
row_number() over (partition by org, memberid order by org) as seqnum_o,
row_number() over (partition by org, dep, memberid order by org) as seqnum_od,
row_number() over (partition by org, dep, tm, memberid order by org) as seqnum_odt
from t
) t;
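The row_number() trick can be tried out on a small table that contains repeated members. This is only a sketch: the five rows below are invented for illustration, SQLite stands in for Hive, and the subquery alias x is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (org TEXT, dep INTEGER, tm TEXT, memberid INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    ("A", 1, "I", 1),
    ("A", 1, "I", 1),   # exact duplicate row
    ("A", 1, "II", 1),  # same member listed in a second team
    ("A", 1, "II", 2),
    ("A", 2, "I", 3),
])

# Only the first row of each (level, memberid) group contributes 1 to the sum,
# so each sum is a distinct-member count for that level.
query = """
SELECT org, dep, tm,
       SUM(CASE WHEN seqnum_o = 1 THEN 1 ELSE 0 END)
           OVER (PARTITION BY org) AS org_cnt,
       SUM(CASE WHEN seqnum_od = 1 THEN 1 ELSE 0 END)
           OVER (PARTITION BY org, dep) AS dep_cnt,
       SUM(CASE WHEN seqnum_odt = 1 THEN 1 ELSE 0 END)
           OVER (PARTITION BY org, dep, tm) AS tm_cnt
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY org, memberid ORDER BY org) AS seqnum_o,
             ROW_NUMBER() OVER (PARTITION BY org, dep, memberid ORDER BY org) AS seqnum_od,
             ROW_NUMBER() OVER (PARTITION BY org, dep, tm, memberid ORDER BY org) AS seqnum_odt
      FROM t
     ) x
"""

# Collapse repeated output rows so each team appears once.
distinct_counts = set(conn.execute(query))
for row in sorted(distinct_counts):
    print(row)
```

Member 1 appears three times in organisation A but is counted once at each level.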

Related

Format data from a table into another formatted table using Oracle SQL

I have one table with 3 columns: a reference number, a version of that reference, and the items belonging to that reference and version.
I want to display a table with 3 columns: the reference number, the list of items on version 1, and the list of items on the last (final) version.
Original data:
refno version item
1 1 ABC123
1 1 XYZ123
1 2 EFG123
2 1 UIO123
2 1 JKL123
3 1 ABC123
3 2 ABC123
3 2 HJF123
3 2 IKJ123
3 2 EEK123
3 2 EEK123
4 1 GFD123
4 1 YUI123
4 2 YUI123
5 1 TYP123
6 1 GHS123
7 1 TEP123
7 1 SLS123
7 2 TEP123
7 2 SLS123
7 2 AEE123
7 3 AAL123
7 4 QEF123
How I want it to be formatted:
refno  Original Item  Final Item
1      ABC123         EFG123
1      XYZ123
2      UIO123         UIO123
2      JKL123         JKL123
3      ABC123         ABC123
3                     HJF123
3                     IKJ123
3                     EEK123
3                     EEK123
4      GFD123         YUI123
4      YUI123
5      TYP123         TYP123
6      GHS123         GHS123
7      TEP123         QEF123
7      SLS123
Any tips on how to do this in SQL (specifically Oracle SQL)?

It looks like you want a full self-join:
select coalesce(a.refno, b.refno) refno, a.item OriginalItem, b.item FinalItem
from (
    select *
    from t
    where version = 1
) a
full join (
    select *
    from (
        -- Oracle requires a qualified t.* when other columns follow in the select list
        select t.*, rank() over (partition by refno order by version desc) rn
        from t
        where version > 1
    ) t
    where rn = 1
) b
on a.refno = b.refno
order by coalesce(a.refno, b.refno), a.item, b.item
Returns
refno OriginalItem FinalItem
1 ABC123 EFG123
1 XYZ123 EFG123
2 JKL123
2 UIO123
3 ABC123 ABC123
3 ABC123 EEK123
3 ABC123 EEK123
3 ABC123 HJF123
3 ABC123 IKJ123
4 GFD123 YUI123
4 YUI123 YUI123
5 TYP123
6 GHS123
7 SLS123 QEF123
7 TEP123 QEF123

How to order "group" of row?

This is the query:
SELECT WorkTypeId, WorktypeWorkID, LevelID
FROM Worktypes as w
LEFT JOIN WorktypesWorks as ww on w.ID = ww.WorktypeID
LEFT JOIN WorktypesWorksLevels as wwl on ww.ID = wwl.WorktypeWorkID
This is the result:
WorkTypeId WorktypeWorkID LevelID
1 1 1
1 1 2
1 1 3
1 2 1
1 2 2
1 2 3
1 3 1
1 4 1
1 4 2
1 5 1
NULL NULL NULL
3 19 2
4 6 1
4 7 1
4 7 2
4 7 3
4 17 1
4 17 2
4 18 1
4 18 2
NULL NULL NULL
I'd like to keep the rows of each WorktypeWorkID together as a block and order the blocks so that those whose highest LevelID is lowest come first.
Here's the result that I'd like to get:
WorkTypeId WorktypeWorkID LevelID
NULL NULL NULL
NULL NULL NULL
1 3 1 // blocks whose highest LevelID is 1
1 5 1
4 6 1
1 4 1 // blocks whose highest LevelID is 2
1 4 2
3 19 2
4 17 1
4 17 2
4 18 1
4 18 2
1 1 1 // blocks whose highest LevelID is 3
1 1 2
1 1 3
1 2 1
1 2 2
1 2 3
4 7 1
4 7 2
4 7 3

I think this is what you are looking for:
SELECT WorkTypeId, WorktypeWorkID, LevelID,
       MAX(LevelID) OVER (PARTITION BY WorktypeWorkID) AS maxLevelID
FROM Worktypes AS w
LEFT JOIN WorktypesWorks AS ww ON w.ID = ww.WorktypeID
LEFT JOIN WorktypesWorksLevels AS wwl ON ww.ID = wwl.WorktypeWorkID
-- Sort blocks by their highest LevelID; the remaining keys keep each block together
ORDER BY maxLevelID, WorkTypeId, WorktypeWorkID, LevelID
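To see the block ordering on the rows from the question, here is a small sketch using Python's sqlite3 module. It assumes the three-way join result has already been stored in a single hypothetical table named joined. The extra sort keys after the block maximum are an addition made here so that each block stays contiguous. SQLite sorts NULLs first in ascending order, which matches the desired result; other databases may need an explicit NULLS FIRST.

```python
import sqlite3

# Rows of the join result shown in the question; None stands for NULL.
data = [
    (1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 2, 1), (1, 2, 2), (1, 2, 3),
    (1, 3, 1), (1, 4, 1), (1, 4, 2), (1, 5, 1), (None, None, None),
    (3, 19, 2), (4, 6, 1), (4, 7, 1), (4, 7, 2), (4, 7, 3),
    (4, 17, 1), (4, 17, 2), (4, 18, 1), (4, 18, 2), (None, None, None),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE joined (WorkTypeId INTEGER, WorktypeWorkID INTEGER, LevelID INTEGER)"
)
conn.executemany("INSERT INTO joined VALUES (?, ?, ?)", data)

# Compute each block's highest LevelID in a subquery, then sort by it.
query = """
SELECT WorkTypeId, WorktypeWorkID, LevelID
FROM (SELECT j.*,
             MAX(LevelID) OVER (PARTITION BY WorktypeWorkID) AS maxLevelID
      FROM joined j)
ORDER BY maxLevelID, WorkTypeId, WorktypeWorkID, LevelID
"""

ordered = conn.execute(query).fetchall()
for row in ordered:
    print(row)
```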

summarising a 3 months sales report across 2 branches into top 3 product for each month

I have the following REPORT table
m = month,
pid = product_id,
bid = branch_id,
s = sales
m pid bid s
--------------------------
1 1 1 20
1 3 1 11
1 2 1 14
1 4 1 16
1 5 1 31
1 1 2 30
1 3 2 10
1 2 2 24
1 4 2 17
1 5 2 41
2 3 1 43
2 5 1 21
2 4 1 10
2 1 1 5
2 2 1 12
2 3 2 22
2 5 2 10
2 4 2 5
2 1 2 4
2 2 2 10
3 3 1 21
3 5 1 10
3 4 1 44
3 1 1 4
3 2 1 14
3 3 2 10
3 5 2 5
3 4 2 6
3 1 2 7
3 2 2 10
I'd like to summarise this sales table by showing, for each month, the top 3 products by total sales across all branches, something like this:
m pid total
---------------------
1 5 72
1 1 50
1 4 33
2 3 65
2 5 31
2 2 22
3 4 50
3 3 31
3 2 24
So in month 1, product #5 has the highest total sales with 72, followed by product #1 with 50, and so on. If I could separate them into a different table for each month, that would be even better.
So far, all I can do is produce a summary for one month that shows every product rather than the top 3:
select pid, sum(s)
from report
where m = 1
group by pid
order by sum(s);
thanks a lot!

Most databases support the ANSI standard window functions. You can do what you want with row_number():
select m, pid, s
from (select r.m, r.pid, sum(s) as s,
row_number() over (partition by m order by sum(s) desc) as seqnum
from report r
group by r.m, r.pid
) r
where seqnum <= 3
order by m, s desc;
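The row_number() approach can be run against the REPORT data from the question. The sketch below uses Python's sqlite3 module and, as an assumption made for clarity, splits the aggregation into a CTE named totals instead of nesting sum() inside the window's ORDER BY; the two forms are intended to be equivalent.

```python
import sqlite3

# (m, pid, bid, s) rows from the question's REPORT table.
data = [
    (1, 1, 1, 20), (1, 3, 1, 11), (1, 2, 1, 14), (1, 4, 1, 16), (1, 5, 1, 31),
    (1, 1, 2, 30), (1, 3, 2, 10), (1, 2, 2, 24), (1, 4, 2, 17), (1, 5, 2, 41),
    (2, 3, 1, 43), (2, 5, 1, 21), (2, 4, 1, 10), (2, 1, 1, 5), (2, 2, 1, 12),
    (2, 3, 2, 22), (2, 5, 2, 10), (2, 4, 2, 5), (2, 1, 2, 4), (2, 2, 2, 10),
    (3, 3, 1, 21), (3, 5, 1, 10), (3, 4, 1, 44), (3, 1, 1, 4), (3, 2, 1, 14),
    (3, 3, 2, 10), (3, 5, 2, 5), (3, 4, 2, 6), (3, 1, 2, 7), (3, 2, 2, 10),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE report (m INTEGER, pid INTEGER, bid INTEGER, s INTEGER)")
conn.executemany("INSERT INTO report VALUES (?, ?, ?, ?)", data)

query = """
WITH totals AS (
    -- Total sales per month and product across all branches
    SELECT m, pid, SUM(s) AS total
    FROM report
    GROUP BY m, pid
),
ranked AS (
    -- Number the products within each month, best seller first
    SELECT m, pid, total,
           ROW_NUMBER() OVER (PARTITION BY m ORDER BY total DESC) AS seqnum
    FROM totals
)
SELECT m, pid, total
FROM ranked
WHERE seqnum <= 3
ORDER BY m, total DESC
"""

top3 = conn.execute(query).fetchall()
for row in top3:
    print(row)
```

On this data, third place in month 1 goes to product 2 (14 + 24 = 38) rather than product 4 (33). To get one result set per month, the same query can be filtered with a WHERE on m.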

SQL(Oracle) select the most common value (multiple tables)

I want to know which seat was sold most often in each hall.
TICKETS
IDTICKET MOVIE_IDMOVIE HALL_IDHALL PRICE SEAT ROW
1 10 2 4 10 6
2 5 2 4 10 5
3 5 2 4 10 4
4 8 5 4 3 1
5 7 5 4 4 15
6 10 7 4 7 9
7 6 2 4 14 3
HALLS
IDHALL PLACE_IDPLACE NAME NUMSEATS EQUIPMENT
1 5 A1 250 high
2 5 B1 200 medium
3 5 B2 200 medium
4 5 C2 180 medium
5 5 C2 180 medium
6 9 old hall 120 low
The output should look like:
B1 10
C2 3
...

SELECT b.NAME, a.SEAT
FROM (SELECT HALL_IDHALL, SEAT, COUNT(1) AS SeatCount,
             -- Rank seats within each hall by number of tickets sold
             RANK() OVER (PARTITION BY HALL_IDHALL ORDER BY COUNT(1) DESC) AS SeatRank
      FROM TICKETS
      GROUP BY HALL_IDHALL, SEAT) a
INNER JOIN HALLS b
ON a.HALL_IDHALL = b.IDHALL
WHERE a.SeatRank = 1

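select h.name, t.hall_idhall, h.idhall, max(t.seat)
from tickets t, halls h
where t.hall_idhall = h.idhall
group by h.name, t.hall_idhall, h.idhall
Try the query above. Note that max(t.seat) returns the highest seat number sold in each hall, not the seat sold most often.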

SQL query for fetching data

Hi, I have the following situation in SQL:
table name: case_details
caseid refno clientid report_date
1 1/1 1007 08-05-2013
2 1/2 1007 01-06-2013
3 1/3 1007 12-07-2013
4 1/4 1012 17-07-13
5 1/6 1009 08-07-13
table name: case_check_detail
caseid checkid alert_val
1 1 1
1 2 2
1 3 1
1 4 2
2 5 4
2 6 3
2 7 2
2 8 1
3 9 2
3 10 1
3 11 2
3 12 1
4 13 3
4 14 3
4 15 3
4 16 4
5 17 1
5 18 2
5 19 1
5 20 2
I want to count how many cases there are for clientid 1007 whose highest alert_val is 2 and whose report_date falls between 01-05-2013 and 18-07-2013.
In this example, that would be caseid 1 and caseid 3.

Try
SELECT d.caseid
FROM case_details d JOIN case_check_detail c
ON d.caseid = c.caseid
WHERE d.clientid = 1007
AND d.report_date BETWEEN '20130501' AND '20130718'
GROUP BY d.caseid
HAVING MAX(c.alert_val) = 2
Output:
| CASEID |
----------
| 1 |
| 3 |
If you want to count them
SELECT COUNT(*) total
FROM
(
SELECT d.caseid
FROM case_details d JOIN case_check_detail c
ON d.caseid = c.caseid
WHERE d.clientid = 1007
AND d.report_date BETWEEN '20130501' AND '20130718'
GROUP BY d.caseid
HAVING MAX(c.alert_val) = 2
) q
Output:
| TOTAL |
---------
| 2 |
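The GROUP BY ... HAVING MAX(...) pattern can be checked on the sample rows with Python's sqlite3 module. This sketch assumes report_date is stored as an ISO-8601 string (YYYY-MM-DD), which is what makes the BETWEEN comparison work in SQLite; the sample dates have been rewritten in that form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE case_details (caseid INTEGER, refno TEXT, clientid INTEGER, report_date TEXT)"
)
conn.execute(
    "CREATE TABLE case_check_detail (caseid INTEGER, checkid INTEGER, alert_val INTEGER)"
)
conn.executemany("INSERT INTO case_details VALUES (?, ?, ?, ?)", [
    (1, "1/1", 1007, "2013-05-08"), (2, "1/2", 1007, "2013-06-01"),
    (3, "1/3", 1007, "2013-07-12"), (4, "1/4", 1012, "2013-07-17"),
    (5, "1/6", 1009, "2013-07-08"),
])
conn.executemany("INSERT INTO case_check_detail VALUES (?, ?, ?)", [
    (1, 1, 1), (1, 2, 2), (1, 3, 1), (1, 4, 2),
    (2, 5, 4), (2, 6, 3), (2, 7, 2), (2, 8, 1),
    (3, 9, 2), (3, 10, 1), (3, 11, 2), (3, 12, 1),
    (4, 13, 3), (4, 14, 3), (4, 15, 3), (4, 16, 4),
    (5, 17, 1), (5, 18, 2), (5, 19, 1), (5, 20, 2),
])

# Keep only cases whose highest alert_val is exactly 2.
query = """
SELECT d.caseid
FROM case_details d
JOIN case_check_detail c ON d.caseid = c.caseid
WHERE d.clientid = 1007
  AND d.report_date BETWEEN '2013-05-01' AND '2013-07-18'
GROUP BY d.caseid
HAVING MAX(c.alert_val) = 2
ORDER BY d.caseid
"""

case_ids = [row[0] for row in conn.execute(query)]
total = len(case_ids)
print(case_ids, total)
```

Case 5 also has a maximum alert_val of 2 but belongs to client 1009, so the clientid filter excludes it.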

SELECT COUNT(*)
FROM case_check_detail AS ccd
JOIN case_details AS cd ON cd.caseid=ccd.caseid
WHERE alert_val=2 AND report_date BETWEEN '2013-05-01' AND '2013-07-18'
