HIVE: Finding running totals

I have a table called Program with the following columns:
ProgDate(Date)
Episode(String)
Impression_id(int)
ProgName(String)
I want to find the total impressions for each date and episode. The query below does that and works fine:
Select progdate, episode, count(distinct impression_id) Impression from Program where progname='BBC' group by progdate, episode order by progdate, episode;
Result:
ProgDate Episode Impression
20160919 1 5
20160920 1 15
20160921 1 10
20160922 1 5
20160923 2 25
20160924 2 10
20160925 2 25
But I also want to find the cumulative total for each episode. I searched for how to compute a running total, but what I found adds up all the previous totals across the whole result. I want the running total to restart for each episode, like below:
Date Episode Impression CumulativeImpressionsPerChannel
20160919 1 5 5
20160920 1 15 20
20160921 1 10 30
20160922 1 5 35
20160923 2 25 25
20160924 2 10 35
20160925 2 25 60

Recent versions of Hive HQL support windowed analytic functions (ref 1) (ref 2), including SUM() OVER().
Assuming you have such a version, I have mimicked the syntax using PostgreSQL at SQL Fiddle:
CREATE TABLE d
(ProgDate int, Episode int, Impression int)
;
INSERT INTO d
(ProgDate, Episode, Impression)
VALUES
(20160919, 1, 5),
(20160920, 1, 15),
(20160921, 1, 10),
(20160922, 1, 5),
(20160923, 2, 25),
(20160924, 2, 10),
(20160925, 2, 25)
;
Query 1:
select
    ProgDate, Episode, Impression
  , sum(Impression) over(partition by Episode order by ProgDate) CumImpsPerChannel
  , sum(Impression) over(order by ProgDate) CumOverall
from (
    select progdate, episode, count(distinct impression_id) Impression
    from Program
    where progname='BBC'
    group by progdate, episode
) d
order by ProgDate, Episode
Results:
| progdate | episode | impression | cumimpsperchannel | cumoverall |
|----------|---------|------------|-------------------|------------|
| 20160919 | 1 | 5 | 5 | 5 |
| 20160920 | 1 | 15 | 20 | 20 |
| 20160921 | 1 | 10 | 30 | 30 |
| 20160922 | 1 | 5 | 35 | 35 |
| 20160923 | 2 | 25 | 25 | 60 |
| 20160924 | 2 | 10 | 35 | 70 |
| 20160925 | 2 | 25 | 60 | 95 |
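As a hedged sketch that is not part of the original answer: when a window has ORDER BY but no explicit frame, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so rows that tie on the sort key share one running total. Writing a ROWS frame makes the row-by-row accumulation explicit. The query below assumes the pre-aggregated rows are in the table d from the fiddle setup above and that your Hive version accepts window frames.

```sql
-- Running total per episode with the window frame written out.
-- Assumption: d holds one row per (ProgDate, Episode), as in the setup above.
SELECT ProgDate,
       Episode,
       Impression,
       SUM(Impression) OVER (
           PARTITION BY Episode          -- restart the total for each episode
           ORDER BY ProgDate             -- accumulate in date order
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS CumImpsPerChannel
FROM d
ORDER BY Episode, ProgDate;
```

Because each (ProgDate, Episode) pair appears only once after grouping, this should give the same per-episode totals as the shorter form without the frame clause.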

Related

sql group by monthly sum and sum year by month

RDBMS - Latest Oracle
I'm out of my element here. I need to organize account transaction information by account and by month, and also use another column to show summed transactions for year to date. Here is a depiction of what I'm trying to get
ACCT_ID | ACCT_MM | FISCAL_YYYY | FISCAL_MM_AMT | YTD_AMT
------------------------------------------------------------
1 | 11 | 2018 | 25 | 100
1 | 12 | 2018 | 50 | 150
1 | 01 | 2019 | 20 | 20
I know you can get FISCAL_MM_AMT with a group by ACCT_MM, FISCAL_YYYY
this is all I have figured out so far.
SELECT ACCT_ID,ACCT_MM,FISCAL_YYYY,SUM(NVL(ACCT_TRNSCTN_AMT,0))
FROM TBL_ACCT_DETAIL
GROUP BY ACCT_ID,ACCT_MM,FISCAL_YYYY
Now, how to combine this with an additional column YTD_AMT that adds up all totals for that year up to the current month is what has me baffled. sql noob ftw.

Try the analytic function SUM, as follows:
SELECT T.*,
SUM(FISCAL_MM_AMT)
OVER (PARTITION BY ACCT_ID, FISCAL_YYYY
ORDER BY ACCT_MM) AS YTD_AMT
FROM
(SELECT ACCT_ID,ACCT_MM,FISCAL_YYYY,SUM(NVL(ACCT_TRNSCTN_AMT,0)) AS FISCAL_MM_AMT
FROM TBL_ACCT_DETAIL
GROUP BY ACCT_ID,ACCT_MM,FISCAL_YYYY);
Cheers!!

You can use the cumulative sum window function:
SELECT ACCT_ID, ACCT_MM, FISCAL_YYYY,
COALESCE(SUM(ACCT_TRNSCTN_AMT), 0),
SUM(SUM(ACCT_TRNSCTN_AMT)) OVER (PARTITION BY ACCT_ID, FISCAL_YYYY ORDER BY ACCT_MM) AS YTD
FROM TBL_ACCT_DETAIL
GROUP BY ACCT_ID, ACCT_MM, FISCAL_YYYY

Do you want to have one additional column bringing the sum of the year with it, and a year-to-date column?
OLAP functions are a prerequisite. But every respectable RDBMS should offer those by now.
Then, I think (adding what I presume should be the input) ... you should go:
WITH
---- this is just the input so I have example data
input( acct_id,acct_mm,fiscal_yyyy,fiscal_mm_amt) AS (
SELECT 1, 1, 2018, 5
UNION ALL SELECT 1, 3, 2018, 5
UNION ALL SELECT 1, 4, 2018, 5
UNION ALL SELECT 1, 5, 2018, 10
UNION ALL SELECT 1, 6, 2018, 10
UNION ALL SELECT 1, 7, 2018, 10
UNION ALL SELECT 1, 8, 2018, 10
UNION ALL SELECT 1, 9, 2018, 10
UNION ALL SELECT 1, 10, 2018, 10
UNION ALL SELECT 1, 11, 2018, 25
UNION ALL SELECT 1, 12, 2018, 50
UNION ALL SELECT 1, 01, 2019, 20
)
---- end of input -----
SELECT
*
, SUM(fiscal_mm_amt) OVER(
PARTITION BY acct_id,fiscal_yyyy
) AS fiscal_yy_amt
, SUM(fiscal_mm_amt) OVER(
PARTITION BY acct_id,fiscal_yyyy
ORDER BY acct_mm
) AS ytd_amt
FROM input;
-- out acct_id | acct_mm | fiscal_yyyy | fiscal_mm_amt | fiscal_yy_amt | ytd_amt
-- out ---------+---------+-------------+---------------+---------------+---------
-- out 1 | 1 | 2018 | 5 | 150 | 5
-- out 1 | 3 | 2018 | 5 | 150 | 10
-- out 1 | 4 | 2018 | 5 | 150 | 15
-- out 1 | 5 | 2018 | 10 | 150 | 25
-- out 1 | 6 | 2018 | 10 | 150 | 35
-- out 1 | 7 | 2018 | 10 | 150 | 45
-- out 1 | 8 | 2018 | 10 | 150 | 55
-- out 1 | 9 | 2018 | 10 | 150 | 65
-- out 1 | 10 | 2018 | 10 | 150 | 75
-- out 1 | 11 | 2018 | 25 | 150 | 100
-- out 1 | 12 | 2018 | 50 | 150 | 150
-- out 1 | 1 | 2019 | 20 | 20 | 20
-- out (12 rows)
-- out
-- out Time: First fetch (12 rows): 76.235 ms. All rows formatted: 76.288 ms

How to get last value for each user_id (postgreSQL)

A user's current ratio is the last ratio inserted for that user in the table "Ratio History":
user_id | year | month | ratio
For example if user with ID 1 has two rows
1 | 2019 | 2 | 10
1 | 2019 | 3 | 15
his ratio is 15.
Here is a slice of the development table:
user_id | year | month | ratio
1 | 2018 | 7 | 10
2 | 2018 | 8 | 20
3 | 2018 | 8 | 30
1 | 2019 | 1 | 40
2 | 2019 | 2 | 50
3 | 2018 | 10 | 60
2 | 2019 | 3 | 70
I need a query that returns one row per user_id, containing that user's last ratio.
The query should return the following rows:
user_id | year | month | ratio
1 | 2019 | 1 | 40
2 | 2019 | 3 | 70
3 | 2018 | 10 | 60
I tried this query:
select rh1.user_id, ratio, rh1.year, rh1.month from ratio_history rh1
join (
select user_id, max(year) as maxYear, max(month) as maxMonth
from ratio_history group by user_id
) rh2 on rh1.user_id = rh2.user_id and rh1.year = rh2.maxYear and rh1.month = rh2.maxMonth
but I got only one row.

Use distinct on:
select distinct on (user_id) rh.*
from ratio_history rh
order by user_id, year desc, month desc;
distinct on is a very convenient Postgres extension. It returns one row for each combination of the key values in parentheses. Which row? The first one according to the sort criteria. Note that the sort criteria need to start with the expressions in parentheses.
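For databases without distinct on, a window function can play the same role. The following is a minimal sketch, assuming the ratio_history table and columns from the question; the alias names ranked and rn are made up for illustration.

```sql
-- Number each user's rows from newest to oldest, then keep only the newest.
SELECT user_id, year, month, ratio
FROM (
    SELECT rh.*,
           ROW_NUMBER() OVER (
               PARTITION BY user_id
               ORDER BY year DESC, month DESC
           ) AS rn              -- rn = 1 marks the latest (year, month) per user
    FROM ratio_history rh
) ranked
WHERE rn = 1
ORDER BY user_id;
```

With the sample rows from the question this should return (1, 2019, 1, 40), (2, 2019, 3, 70) and (3, 2018, 10, 60), the same rows the distinct on query picks.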

Postgres FIFO query calculate profit margin

I am working on a query for my inventory system to calculate profit based on FIFO (First-In-First-Out) in PostgreSQL (9.3+). Most of the replies I found target MS SQL Server, so I am not sure how to go about it in PostgreSQL. I have tried using window functions but am getting stuck at calculating the profit (I'm not sure if we need to, or can, use cursors, as I have not used them before).
Sales (negative quantity) are around (20*4 + 30*1) = 110
Cost of Goods sold based on FIFO are (5*2 + 10*2 + 10*1) = 40
Profit should be 110 - 40 = 70
So far I have managed to calculate running totals. Could someone help with this?
http://sqlfiddle.com/#!15/50b12/6
product_id product_name product_price purchase_date product_quantity
1 Notebook 5 2017-05-05 00:00:00 2
1 Notebook 10 2017-05-06 00:00:00 4
1 Notebook 15 2017-05-07 00:00:00 6
1 Notebook 20 2017-05-08 00:00:00 -4 (this is sale)
1 Notebook 30 2017-05-09 00:00:00 -1 (this is sale)
Desired results should display the Sales and profit margin. As long as I can get the profit margin it would fix my issue.

select *,
sum(price_sold - price_purchased) over(order by rn) as profit
from
(
select
row_number() over(order by purchase_date, product_id) as rn,
product_id, product_price as price_purchased
from inv_test, generate_series(1, abs(product_quantity))
where product_quantity > 0
) p
full join
(
select
row_number() over(order by purchase_date, product_id) as rn,
product_id, product_price as price_sold
from inv_test, generate_series(1, abs(product_quantity))
where product_quantity < 0
) s using (rn, product_id)
;
rn | product_id | price_purchased | price_sold | profit
----+------------+-----------------+------------+--------
1 | 1 | 5 | 20 | 15
2 | 1 | 5 | 20 | 30
3 | 1 | 10 | 20 | 40
4 | 1 | 10 | 20 | 50
5 | 1 | 10 | 30 | 70
6 | 1 | 10 | | 70
7 | 1 | 15 | | 70
8 | 1 | 15 | | 70
9 | 1 | 15 | | 70
10 | 1 | 15 | | 70
11 | 1 | 15 | | 70
12 | 1 | 15 | | 70

ACCESS SQL : How to calculate wage (multiply function) and count no.of working days (count, distinct) of each staff between 2 dates

I need to create a form that summarizes the wage of each employee between the Date_start and Date_end that I select.
I have 3 related tables.
tbl_labor
lb_id | lb_name | lb_OT ($/day) | If_social_sec
1 | John | 10 | yes
2 | Mary | 10 | no
tbl_production
pdtn_date | lb_id | pdtn_qty(pcs) | pd_making_id
5/9/12 | 1 | 200 | 12
5/9/12 | 1 | 40 | 13
5/9/12 | 2 | 300 | 12
7/9/12 | 1 | 48 | 13
13/9/12 | 2 | 220 | 14
15/9/12 | 1 | 20 | 12
20/9/12 | 1 | 33 | 14
21/9/12 | 2 | 55 | 14
21/9/12 | 1 | 20 | 12
tbl_pdWk_process
pd_making_id | pd_cost($/dozen) | pd_id
12 | 2 | 001
13 | 5 | 001
14 | 6 | 002
The outcome will look like this:
lb_name | no.working days | Total($)| OT payment | Social_sec($)| Net Wage |
John | 4 | 98.83 | 20 (2x10) | 15 | 103.83 (98.83+20-15)|
Mary | 2 | 160 | 10 (1x10) | 0 | 170 (160+10-0) |
My conditions are:
Wages are calculated between 2 dates that I specify, e.g. 5/9/12 - 20/9/12.
Wage must be calculated from (pd_cost * pdtn_qty). However, pdtn_qty is kept in pieces whereas pd_cost is per dozen, so the formula should be (pdtn_qty * pd_cost)/12.
Add OT * no. of OT days that each worker did (e.g. John had 2 OT days, Mary 1 OT day).
A deduction (e.g. $15) must be taken from the total wage if If_social_sec is "TRUE".
Count the no. of working days that each employee worked.
I've tried, but I couldn't combine all these conditions in one SQL statement, so could you please help me? Thank you.

OK, this is really messy, mainly because Access has no COUNT(DISTINCT ) option, so counting the working days is a mess. If you can skip that, then you can drop all the pdn1, pdn2, pdn3 stuff. But it does work. A couple of notes:
1. I think your maths is not quite right in the example given; I make it like this:
2. I've used the IIF function to simulate 2 OT days for John and 1 for Mary. You'll need to change that. Good luck.
select
lab.lb_name,
max(days),
sum(prod.pdtn_qty * pdWk.pd_cost / 12) as Total ,
max(lab.lb_OT * iif(lab.lb_id=1,2,1)) as OTPayment,
max(iif(lab.if_social_sec='yes' , 15,0 ) ) as Social_Sec,
sum(prod.pdtn_qty * pdWk.pd_cost / 12.00) +
max(lab.lb_OT * iif(lab.lb_id=1,2,1)) -
max(iif(lab.if_social_sec='yes' , 15,0 ) ) as NetWage
from
tbl_labor as lab,
tbl_production as prod,
tbl_pdWk_process as pdwk,
(select pdn2.lb_id, count(pdn2.lb_id) as days from
(select lb_id
from tbl_production pdn1
where pdn1.pdtn_date >= #2012-09-05#
and pdn1.pdtn_date <= #2012-09-20#
group by lb_id, pdtn_date ) as pdn2
group by pdn2.lb_id) as pdn3
where prod.pdtn_date >= #2012-09-05#
and prod.pdtn_date <= #2012-09-20#
and prod.lb_id = lab.lb_id
and prod.pd_making_id = pdwk.pd_making_id
and lab.lb_id = pdn3.lb_id
group by lab.lb_name
OK, to add the staff who have no rows in the production table for the period, you'll need to append something like this:
Union
select lab.lb_name,
0,
0,
max(lab.lb_OT * iif(lab.lb_id=1,2,1)) ,
max(iif(lab.if_social_sec='yes' , 15,0 ) ),0
from tbl_labor lab
where lb_id not in ( select lb_id from tbl_production where pdtn_date >= #2012-09-05# and pdtn_date <= #2012-09-20# )
group by lab.lb_name
Hope this helps.
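The day-counting workaround in this answer (needed because Access has no COUNT(DISTINCT )) can be looked at on its own. This is a minimal sketch rather than the answer's exact code: it swaps the inner GROUP BY for SELECT DISTINCT, and the alias names distinct_days and working_days are made up for illustration.

```sql
SELECT lb_id, COUNT(*) AS working_days
FROM (
    -- one row per worker per day, however many production rows that day has
    SELECT DISTINCT lb_id, pdtn_date
    FROM tbl_production
    WHERE pdtn_date >= #2012-09-05#
      AND pdtn_date <= #2012-09-20#
) AS distinct_days
GROUP BY lb_id;
```

For the sample data and the period 5/9/12 to 20/9/12 this should give 4 days for lb_id 1 (John) and 2 days for lb_id 2 (Mary), matching the expected outcome in the question.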

How to insert additional values in between a GROUP BY

I am currently making a monthly report using MySQL. I have a table named "monthly" that looks something like this:
id | date | amount
10 | 2009-12-01 22:10:08 | 7
9 | 2009-11-01 22:10:08 | 78
8 | 2009-10-01 23:10:08 | 5
7 | 2009-07-01 21:10:08 | 54
6 | 2009-03-01 04:10:08 | 3
5 | 2009-02-01 09:10:08 | 456
4 | 2009-02-01 14:10:08 | 4
3 | 2009-01-01 20:10:08 | 20
2 | 2009-01-01 13:10:15 | 10
1 | 2008-12-01 10:10:10 | 5
Then, when I make a monthly report (grouped by month of each year), I get something like this:
yearmonth | total
2008-12 | 5
2009-01 | 30
2009-02 | 460
2009-03 | 3
2009-07 | 54
2009-10 | 5
2009-11 | 78
2009-12 | 7
I used this query to achieve the result:
SELECT substring( date, 1, 7 ) AS yearmonth, sum( amount ) AS total
FROM monthly
GROUP BY substring( date, 1, 7 )
But I need something like this:
yearmonth | total
2008-01 | 0
2008-02 | 0
2008-03 | 0
2008-04 | 0
2008-05 | 0
2008-06 | 0
2008-07 | 0
2008-08 | 0
2008-09 | 0
2008-10 | 0
2008-11 | 0
2008-12 | 5
2009-01 | 30
2009-02 | 460
2009-03 | 3
2009-04 | 0
2009-05 | 0
2009-06 | 0
2009-07 | 54
2009-08 | 0
2009-09 | 0
2009-10 | 5
2009-11 | 78
2009-12 | 7
Something that would display zeroes for the months that don't have any value. Is it even possible to do that in a MySQL query?

You should generate a dummy rowsource and LEFT JOIN with it:
SELECT year, month, COALESCE(SUM(m.amount), 0) AS total
FROM (
SELECT 1 AS month
UNION ALL
SELECT 2
…
UNION ALL
SELECT 12
) months
CROSS JOIN
(
SELECT 2008 AS year
UNION ALL
SELECT 2009 AS year
) years
LEFT JOIN
mydata m
ON m.date >= CONCAT_WS('.', year, month, 1)
AND m.date < CONCAT_WS('.', year, month, 1) + INTERVAL 1 MONTH
GROUP BY
year, month
You can create these as tables on disk rather than generate them each time.
MySQL is the only system of the major four that does not have an easy way to generate arbitrary resultsets.
Oracle, SQL Server and PostgreSQL do have those (CONNECT BY, recursive CTEs and generate_series, respectively).

Quassnoi is right, and I'll add a comment about how to recognize when you need something like this:
You want '2008-01' in your result, yet nothing in the source table has a date in January, 2008. Result sets have to come from the tables you query, so the obvious conclusion is that you need an additional table - one that contains each month you want as part of your result.
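The "additional table" idea can be sketched as follows. The table name report_months and its yearmonth column are hypothetical; the monthly table and its date and amount columns come from the question. Only four months are loaded here to keep the example short.

```sql
-- Hypothetical helper table: one row per month that must appear in the report.
CREATE TABLE report_months (
    yearmonth CHAR(7) NOT NULL PRIMARY KEY    -- e.g. '2008-11'
);

INSERT INTO report_months (yearmonth) VALUES
    ('2008-11'), ('2008-12'), ('2009-01'), ('2009-02');

-- The LEFT JOIN keeps every month from report_months; months with no
-- matching rows in monthly get a total of 0 instead of disappearing.
SELECT rm.yearmonth,
       COALESCE(SUM(m.amount), 0) AS total
FROM report_months rm
LEFT JOIN monthly m
       ON DATE_FORMAT(m.date, '%Y-%m') = rm.yearmonth
GROUP BY rm.yearmonth
ORDER BY rm.yearmonth;
```

With the sample rows from the question this should return 0 for 2008-11, 5 for 2008-12, 30 for 2009-01 and 460 for 2009-02. Joining on DATE_FORMAT is simple but prevents an index on date from being used; the range comparison in the answer above avoids that.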
