Summing dynamic rows using OVER (PARTITION BY) in Postgres - SQL

On Postgres 9.2:
| payer | effective_status | 1  | 2    | 3  | 4+ |
|-------|------------------|----|------|----|----|
| p1    | foo              | 8  | 6000 | 4  | 1  |
| p1    | bar              | 10 | 5200 | 9  | 2  |
| p1    | baz              | 11 | 5200 | 11 | 2  |
| p1    | zip              | 9  | 4500 | 14 | 4  |
| p1    | zap              | 7  | 4200 | 45 | 5  |
| p1    | status_n         | 2  | 3900 | 71 | 1  |
Suppose the above is the output of my query. I am trying to sum columns 1, 2, 3 and 4+ by payer across all effective statuses: for p1 there would be a total for each column covering all of its effective_status values, and p2 would get its own group total.
| payer | effective_status | 1  | 2    | 3  | 4+ | 1 total | 2 total | 3 total | 4+ total |
|-------|------------------|----|------|----|----|---------|---------|---------|----------|
| p1    | foo              | 8  | 6000 | 4  | 1  | 47      | 29000   | 154     | 15       |
| p1    | bar              | 10 | 5200 | 9  | 2  | 47      | 29000   | 154     | 15       |
| p1    | baz              | 11 | 5200 | 11 | 2  | 47      | 29000   | 154     | 15       |
| p1    | zip              | 9  | 4500 | 14 | 4  | 47      | 29000   | 154     | 15       |
| p1    | zap              | 7  | 4200 | 45 | 5  | 47      | 29000   | 154     | 15       |
| p1    | status_n         | 2  | 3900 | 71 | 1  | 47      | 29000   | 154     | 15       |
How would I calculate the total columns? I have tried:
SELECT payer
,effective_status
,status_check1
,SUM(status_check1) OVER (PARTITION BY payer) AS status_check1_total
,status_check2
,SUM(status_check2) OVER (PARTITION BY payer) AS status_check2_total
,status_check3
,SUM(status_check3) OVER (PARTITION BY payer) AS status_check3_total
,status_check4
,SUM(status_check4) OVER (PARTITION BY payer) AS status_check4_total
FROM t;
This seems to work most of the time, but on occasion the totals are wrong. Is this the correct approach?
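As a sanity check of the windowed approach, here is a minimal sketch that runs the same SUM(...) OVER (PARTITION BY payer) query from Python against an in-memory SQLite database. It assumes SQLite 3.25 or later (for window functions) as a stand-in for Postgres 9.2, and reuses the sample rows from the CREATE TABLE/INSERT script further down, including the p2 rows. On this data each p1 row should carry 47, 29000, 154 and 15, and each p2 row 6, 8500, 13 and 3.

```python
import sqlite3

# In-memory SQLite (3.25+ for window functions) as a stand-in for Postgres.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (payer TEXT, effective_status TEXT, status_check1 INT,"
    " status_check2 INT, status_check3 INT, status_check4 INT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("p1", "foo", 8, 6000, 4, 1),
        ("p1", "bar", 10, 5200, 9, 2),
        ("p1", "baz", 11, 5200, 11, 2),
        ("p1", "zip", 9, 4500, 14, 4),
        ("p1", "zap", 7, 4200, 45, 5),
        ("p1", "status_n", 2, 3900, 71, 1),
        ("p2", "foo", 5, 3500, 12, 2),
        ("p2", "zip", 1, 5000, 1, 1),
    ],
)

# SUM(...) OVER (PARTITION BY payer) attaches the payer-wide total to every
# row of that payer without collapsing the rows the way GROUP BY would.
query = """
SELECT payer, effective_status,
       status_check1, SUM(status_check1) OVER (PARTITION BY payer) AS status_check1_total,
       status_check2, SUM(status_check2) OVER (PARTITION BY payer) AS status_check2_total,
       status_check3, SUM(status_check3) OVER (PARTITION BY payer) AS status_check3_total,
       status_check4, SUM(status_check4) OVER (PARTITION BY payer) AS status_check4_total
FROM t
ORDER BY payer, effective_status
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Unlike GROUP BY, the window version keeps every detail row, which is what the desired output in the question needs.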

If I understand correctly, you can use UNION ALL to combine a totals result set with your original table, then ORDER BY payer and the grp column so that each total row follows its payer's detail rows.
CREATE TABLE T(
payer varchar(50),
effective_status varchar(50),
status_check1 int,
status_check2 int,
status_check3 int,
status_check4 int
);
INSERT INTO T VALUES ('p1', 'foo',8 ,6000,4,1);
INSERT INTO T VALUES ('p1', 'bar',10,5200,9,2);
INSERT INTO T VALUES ('p1', 'baz',11,5200,11,2);
INSERT INTO T VALUES ('p1', 'zip',9 ,4500,14,4);
INSERT INTO T VALUES ('p1', 'zap',7 ,4200,45,5);
INSERT INTO T VALUES ('p1', 'status_n',2 ,3900,71,1);
INSERT INTO T VALUES ('p2', 'foo',5 ,3500,12,2);
INSERT INTO T VALUES ('p2', 'zip',1 ,5000,1,1);
Query 1:
SELECT *
FROM (
SELECT t1.payer
,effective_status
,status_check1
,status_check2
,status_check3
,status_check4
,1 grp
FROM T t1
UNION ALL
SELECT payer,
'',
SUM(status_check1),
SUM(status_check2),
SUM(status_check3),
SUM(status_check4),
2
FROM T
GROUP BY payer
) t1
ORDER BY payer,grp
Results:
| payer | effective_status | status_check1 | status_check2 | status_check3 | status_check4 | grp |
|-------|------------------|---------------|---------------|---------------|---------------|-----|
| p1 | foo | 8 | 6000 | 4 | 1 | 1 |
| p1 | bar | 10 | 5200 | 9 | 2 | 1 |
| p1 | baz | 11 | 5200 | 11 | 2 | 1 |
| p1 | zip | 9 | 4500 | 14 | 4 | 1 |
| p1 | zap | 7 | 4200 | 45 | 5 | 1 |
| p1 | status_n | 2 | 3900 | 71 | 1 | 1 |
| p1 | | 47 | 29000 | 154 | 15 | 2 |
| p2 | foo | 5 | 3500 | 12 | 2 | 1 |
| p2 | zip | 1 | 5000 | 1 | 1 | 1 |
| p2 | | 6 | 8500 | 13 | 3 | 2 |

I'm not sure why you are using window functions. This would appear to be union all:
select payer, effective_status, status_check1, status_check2, status_check3, status_check4
from t
union all
select payer, null, sum(status_check1), sum(status_check2), sum(status_check3), sum(status_check4)
from t
group by payer
order by payer, effective_status nulls last;
Postgres 9.5 supports grouping sets which simplifies such logic.
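The UNION ALL approach can be checked the same way. This is a sketch that assumes SQLite via Python's sqlite3 module rather than Postgres; SQLite has no GROUPING SETS, so it sticks to UNION ALL. SQLite only accepts result columns in the ORDER BY of a compound SELECT, so the union is wrapped in a subquery and sorted on `effective_status IS NULL` as a portable substitute for NULLS LAST.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (payer TEXT, effective_status TEXT, status_check1 INT,"
    " status_check2 INT, status_check3 INT, status_check4 INT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("p1", "foo", 8, 6000, 4, 1), ("p1", "bar", 10, 5200, 9, 2),
        ("p1", "baz", 11, 5200, 11, 2), ("p1", "zip", 9, 4500, 14, 4),
        ("p1", "zap", 7, 4200, 45, 5), ("p1", "status_n", 2, 3900, 71, 1),
        ("p2", "foo", 5, 3500, 12, 2), ("p2", "zip", 1, 5000, 1, 1),
    ],
)

# Detail rows UNION ALL one aggregated row per payer (effective_status NULL).
# "effective_status IS NULL" is 0 for detail rows and 1 for the total row,
# so each payer's total sorts after its details, like NULLS LAST in Postgres.
query = """
SELECT * FROM (
    SELECT payer, effective_status,
           status_check1, status_check2, status_check3, status_check4
    FROM t
    UNION ALL
    SELECT payer, NULL,
           SUM(status_check1), SUM(status_check2),
           SUM(status_check3), SUM(status_check4)
    FROM t
    GROUP BY payer
) u
ORDER BY payer, effective_status IS NULL, effective_status
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```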

I'm not entirely sure what you are trying to do, but if you want the result grouped by payer and effective_status, it could look like this:
select
payer as p,
effective_status as es,
(sum(col1) + sum(col2) + sum(col3) + sum(col4)) as sum
from table_name
group by p, es
Hope this helps.

Related

PostgreSQL: comparing two tables and comparing the result with a third table

TABLE 2: trip_delivery_sales_lines
+-------+---------------------+------------+----------+------------+-------------+--------+
| Sl no | Order_date | Partner_id | Route_id | Product_id | Product qty | amount |
+-------+---------------------+------------+----------+------------+-------------+--------+
| 1 | 2020-08-01 04:25:35 | 34567 | 152 | 432 | 2 | 100 |
| 2 | 2021-09-11 02:25:35 | 34572 | 130 | 312 | 4 | 150 |
| 3 | 2020-05-10 04:25:35 | 34567 | 152 | 432 | 3 | 123 |
| 4 | 2021-02-16 01:10:35 | 34572 | 130 | 432 | 5 | 123 |
| 5 | 2020-02-19 01:10:35 | 34567 | 152 | 432 | 2 | 600 |
| 6 | 2021-03-20 01:10:35 | 34569 | 152 | 123 | 1 | 123 |
| 7 | 2021-04-23 01:10:35 | 34570 | 152 | 432 | 4 | 200 |
| 8 | 2021-07-08 01:10:35 | 34567 | 152 | 432 | 3 | 32 |
| 9 | 2019-06-28 01:10:35 | 34570 | 152 | 432 | 2 | 100 |
| 10 | 2018-11-14 01:10:35 | 34570 | 152 | 432 | 5 | 20 |
+-------+---------------------+------------+----------+------------+-------------+--------+
From Table 2 we need to find the partners on route 152 and, for each of them, the sum of product_qty over their last 2 sales (selected by descending order_date). The result is shown in Table 3.
34567 – Serial number [ 1,8]
34570 – Serial number [ 7,9]
34569 – Serial number [6]
TABLE 3 : RESULT OBTAINED FROM TABLE 1,2
+------------+-------+
| Partner_id | count |
+------------+-------+
| 34567 | 5 |
| 34569 | 1 |
| 34570 | 6 |
+------------+-------+
From Table 4 we want the leaf count for each of the above partner_ids.
TABLE 4: coupon_leaf
+------------+-------+
| Partner_id | Leaf |
+------------+-------+
| 34567 | XYZ1 |
| 34569 | XYZ2 |
| 34569 | DDHC |
| 34567 | DVDV |
| 34570 | DVFDV |
| 34576 | FVFV |
| 34567 | FVV |
+------------+-------+
From that we get:
34567 – 3
34569 – 2
34570 – 1
TABLE 5: result obtained from TABLE 4
+------------+-------+
| Partner_id | count |
+------------+-------+
| 34567 | 3 |
| 34569 | 2 |
| 34570 | 1 |
+------------+-------+
Now we want to compare Tables 3 and 5: if a partner's count in Table 3 is greater than its count in Table 5, print that partner_id.
I want a single query that does all of these operations.
The distinct partner_ids can be found from Table 1 with:
SELECT DISTINCT partner_id
FROM trip_delivery_sales ts
WHERE ts.route_id='152'
GROUP BY ts.partner_id
This answers the original version of the problem.
You seem to want to compare totals after aggregating tables 2 and 3. I don't know what Table 1 is for; it doesn't seem to do anything.
So:
select *
from (select partner_id, sum(product_qty) as sum_quantity
      from (select tdsl.*,
                   row_number() over (partition by tdsl.partner_id order by order_date desc) as seqnum
            from trip_delivery_sales_lines tdsl
            where tdsl.route_id = 152
           ) tdsl
      where seqnum <= 2
      group by tdsl.partner_id
     ) tdsl left join
     (select cl.partner_id, count(*) as leaf_cnt
      from coupon_leaf cl
      group by cl.partner_id
     ) cl
     on cl.partner_id = tdsl.partner_id
where leaf_cnt is null or sum_quantity > leaf_cnt;
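To check this logic against the sample data in the question, here is a sketch that loads Table 2 and Table 4 into SQLite from Python and runs the same query shape. It assumes the quantity column is named product_qty and that order_date is stored as ISO-8601 text (so text order matches time order); the route 152 filter comes from the question. On this data it should keep 34567 (5 > 3) and 34570 (6 > 1) and drop 34569 (1 < 2).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trip_delivery_sales_lines (sl_no INT, order_date TEXT,"
    " partner_id INT, route_id INT, product_id INT, product_qty INT, amount INT)"
)
conn.executemany(
    "INSERT INTO trip_delivery_sales_lines VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "2020-08-01 04:25:35", 34567, 152, 432, 2, 100),
        (2, "2021-09-11 02:25:35", 34572, 130, 312, 4, 150),
        (3, "2020-05-10 04:25:35", 34567, 152, 432, 3, 123),
        (4, "2021-02-16 01:10:35", 34572, 130, 432, 5, 123),
        (5, "2020-02-19 01:10:35", 34567, 152, 432, 2, 600),
        (6, "2021-03-20 01:10:35", 34569, 152, 123, 1, 123),
        (7, "2021-04-23 01:10:35", 34570, 152, 432, 4, 200),
        (8, "2021-07-08 01:10:35", 34567, 152, 432, 3, 32),
        (9, "2019-06-28 01:10:35", 34570, 152, 432, 2, 100),
        (10, "2018-11-14 01:10:35", 34570, 152, 432, 5, 20),
    ],
)
conn.execute("CREATE TABLE coupon_leaf (partner_id INT, leaf TEXT)")
conn.executemany(
    "INSERT INTO coupon_leaf VALUES (?, ?)",
    [(34567, "XYZ1"), (34569, "XYZ2"), (34569, "DDHC"), (34567, "DVDV"),
     (34570, "DVFDV"), (34576, "FVFV"), (34567, "FVV")],
)

query = """
SELECT tdsl.partner_id, tdsl.sum_quantity, cl.leaf_cnt
FROM (
    -- sum of product_qty over each partner's 2 most recent sales on route 152
    SELECT partner_id, SUM(product_qty) AS sum_quantity
    FROM (SELECT s.*,
                 ROW_NUMBER() OVER (PARTITION BY s.partner_id
                                    ORDER BY s.order_date DESC) AS seqnum
          FROM trip_delivery_sales_lines s
          WHERE s.route_id = 152) ranked
    WHERE seqnum <= 2
    GROUP BY partner_id
) tdsl
LEFT JOIN (
    -- number of coupon leaves per partner
    SELECT partner_id, COUNT(*) AS leaf_cnt
    FROM coupon_leaf
    GROUP BY partner_id
) cl ON cl.partner_id = tdsl.partner_id
WHERE cl.leaf_cnt IS NULL OR tdsl.sum_quantity > cl.leaf_cnt
ORDER BY tdsl.partner_id
"""
result = conn.execute(query).fetchall()
print(result)
```

The LEFT JOIN plus `leaf_cnt IS NULL` keeps partners that have no coupon leaves at all; drop that condition if such partners should be excluded.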

How to fill forward time series data in Postgres

I am looking to join three tables together and fill forward null values on the resulting table.
Three tables:
Table 1 (raw.fb_historical_data) - this is the main table onto which I would like to join the other two. Each row of this table is related to one or more rows in the other two tables through a combination of the columns id, clk and timestamp (mkt_id and row_id in the other tables).
+---------------------+-----+-----+--------------+
| timestamp | clk | id | some_columns |
+---------------------+-----+-----+--------------+
| 2016-06-19 06:11:13 | 123 | 126 | a |
| 2016-06-19 06:16:13 | 124 | 127 | b |
| 2016-06-19 06:21:13 | 234 | 126 | c |
| 2016-06-19 06:41:13 | 456 | 127 | d |
| ... | ... | ... | ... |
+---------------------+-----+-----+--------------+
Table 2 (raw.fb_runner_changes) - this table essentially gives price changes for a wide range of different markets
+---------------------+--------+--------+-------+
| timestamp | row_id | mkt_id | price |
+---------------------+--------+--------+-------+
| 2016-06-19 06:11:13 | 123 | 126 | 1 |
| 2016-06-19 06:21:13 | 123 | 126 | 2 |
| 2016-06-19 06:41:13 | 123 | 126 | 3 |
| 2016-06-06 18:54:06 | 124 | 127 | 1 |
| 2016-06-06 18:56:06 | 124 | 127 | 2 |
| 2016-06-06 18:57:06 | 124 | 127 | 3 |
| ... | ... | ... | ... |
+---------------------+--------+--------+-------+
Table 3 (raw.fb_runners) - a table with extra information about market changes that I would like to join
+---------------------+--------+--------+---------------+
| timestamp | row_id | mkt_id | other_columns |
+---------------------+--------+--------+---------------+
| 2016-06-19 06:15:13 | 234 | 126 | ab |
| 2016-06-19 06:31:13 | 234 | 126 | cd |
| 2016-06-19 06:56:13 | 234 | 126 | ef |
| 2016-06-06 18:54:06 | 456 | 127 | gh |
| 2016-06-06 18:56:06 | 456 | 127 | jk |
| 2016-06-06 18:57:06 | 456 | 127 | lm |
| ... | ... | ... | ... |
+---------------------+--------+--------+---------------+
Essentially what I want to do is fill NULL information forward (ordered by timestamp) while grouping by market id.
So far, I have tried to join the tables together using
SELECT *
FROM raw.fb_historical_data AS h
LEFT JOIN raw.fb_runner_changes AS rc
ON rc.row_id = h.clk
AND rc.timestamp = h.timestamp
AND rc.mkt_id = h.id
LEFT JOIN raw.fb_runners AS r
ON r.row_id = h.clk
AND r.timestamp = h.timestamp
AND r.mkt_id = h.id
This has worked as intended, but now there are NULLs in the resulting dataset, which I'd like to fill in with the last available value for that market.
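In some other SQL dialects, fill forward can be done with the window function last_value combined with the IGNORE NULLS option.
Since this is not supported in PostgreSQL (see the note at the bottom of the Window Functions page of the PostgreSQL documentation), we use a two-step workaround: a running count of non-NULL values assigns each row to a group, and the group's single non-NULL value is then spread across the whole group.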
select ts, val, val_seq, min(val) over (partition by val_seq) val_fill_fw
from (select ts, val, count(val) over(order by ts) as val_seq
from t
) t
+----+----------+---------+-------------+
| ts | val | val_seq | val_fill_fw |
+----+----------+---------+-------------+
| 1 | (null) | 0 | (null) |
| 2 | (null) | 0 | (null) |
| 3 | hello | 1 | hello |
| 4 | (null) | 1 | hello |
| 5 | (null) | 1 | hello |
| 6 | darkness | 2 | darkness |
| 7 | my | 3 | my |
| 8 | (null) | 3 | my |
| 9 | old | 4 | old |
| 10 | (null) | 4 | old |
| 11 | (null) | 4 | old |
| 12 | (null) | 4 | old |
| 13 | friend | 5 | friend |
| 14 | (null) | 5 | friend |
+----+----------+---------+-------------+
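The same count/min trick can be applied per market, which is what the question needs: adding the market column to both PARTITION BY clauses keeps a value from leaking from one market into the next. This sketch assumes a hypothetical table prices(mkt_id, ts, price) with made-up sample rows and runs on SQLite from Python; the SQL itself uses only standard window functions that Postgres also supports.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and made-up rows: one price series per market, with gaps.
conn.execute("CREATE TABLE prices (mkt_id INT, ts TEXT, price INT)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [
        (126, "2016-06-19 06:11:13", None),
        (126, "2016-06-19 06:16:13", 1),
        (126, "2016-06-19 06:21:13", None),
        (126, "2016-06-19 06:31:13", None),
        (126, "2016-06-19 06:41:13", 2),
        (127, "2016-06-06 18:54:06", 3),
        (127, "2016-06-06 18:56:06", None),
    ],
)

# Step 1: COUNT(price) counts only non-NULL values, so the running count
# (per market, in time order) stays constant across a run of NULLs; each
# non-NULL price starts a new group together with the NULLs that follow it.
# Step 2: each group holds at most one non-NULL price, so MIN() over the
# group returns that price (or NULL for leading NULLs before any price).
query = """
SELECT mkt_id, ts, price,
       MIN(price) OVER (PARTITION BY mkt_id, grp) AS price_fill_fw
FROM (SELECT mkt_id, ts, price,
             COUNT(price) OVER (PARTITION BY mkt_id ORDER BY ts) AS grp
      FROM prices) s
ORDER BY mkt_id, ts
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

For the joined query in the question, the subquery would wrap that LEFT JOIN result, with h.id as the market and h.timestamp as the ordering column.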
This seems to do a correct 'forward fill' in Postgres. However, I am a Postgres newbie, so I would appreciate feedback if it's wrong.
DROP TABLE IF EXISTS example;
create temporary table example(id int, str text, val integer);
insert into example values
(1, 'a', null),
(1, null, 1),
(2, 'b', 2),
(2, null, null);
select * from example;
-- lag(..., 1) looks back exactly one row, so a run of consecutive NULLs is
-- not fully filled; rows sharing the same id also have no guaranteed order
select id, (case
when str is null
then lag(str,1) over (order by id)
else str
end) as str,
(case
when val is null
then lag(val,1) over (order by id)
else val
end) as val
from example;

Incremental update in Hive

I have a Hive external table (Hive version earlier than 0.14) with data such as:
+--------+------+------+------+
| id | A | B | C |
+--------+------+------+------+
| 10011 | 10 | 3 | 0 |
| 10012 | 9 | 0 | 40 |
| 10015 | 10 | 3 | 0 |
| 10017 | 9 | 0 | 40 |
+--------+------+------+------+
I also have a delta file with the following data:
+--------+------+------+------+
| id | A | B | C |
+--------+------+------+------+
| 10012 | 50 | 3 | 10 | --> update
| 10013 | 29 | 0 | 40 | --> insert
| 10014 | 10 | 3 | 0 | --> insert
| 10013 | 19 | 0 | 40 | --> update
| 10015 | 70 | 3 | 0 | --> update
| 10016 | 17 | 0 | 40 | --> insert
+--------+------+------+------+
How can I update my Hive table with the delta file without using Sqoop? Any help on how to proceed would be great. Thanks.
This happens because there are duplicates in the file. How do you know which one to keep? The last one?
In that case you can, for example, number the rows of each id with row_number and keep the row with the highest number, something like this:
SELECT coalesce(tmp.id,initial.id) as id,
coalesce(tmp.A, initial.A) as A,
coalesce(tmp.B,initial.B) as B,
coalesce(tmp.C, initial.C) as C
FROM
table_a initial
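FULL OUTER JOIN
-- note: without an ORDER BY in the OVER clause, row_number() does not
-- guarantee that the highest number goes to the last row of the file
( SELECT *, row_number() over( partition by id ) as row_num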
,COUNT(*) OVER (PARTITION BY id) AS cnt
FROM temp_table
) tmp
ON initial.id=tmp.id
WHERE row_num=cnt
OR row_num IS NULL;
Output:
+--------+-----+----+-----+
| id | a | b | c |
+--------+-----+----+-----+
| 10011 | 10 | 3 | 0 |
| 10012 | 50 | 3 | 10 |
| 10013 | 19 | 0 | 40 |
| 10014 | 10 | 3 | 0 |
| 10015 | 70 | 3 | 0 |
| 10016 | 17 | 0 | 40 |
| 10017 | 9 | 0 | 40 |
+--------+-----+----+-----+
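Hive before 0.14 has no UPDATE statement, so the merge has to be a query whose result then replaces the table's data. The sketch below emulates that query in SQLite from Python under two stated assumptions: the delta rows carry a hypothetical file_pos column recording their position in the file (without one, row_number() cannot reliably tell which duplicate is last), and the full outer join is swapped for an equivalent UNION ALL of the latest delta rows plus the base rows whose id is absent from the delta. That swap is equivalent here because the delta rows have no NULL columns, and it avoids FULL OUTER JOIN, which older SQLite versions lack.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE base (id INT, a INT, b INT, c INT)")
conn.executemany(
    "INSERT INTO base VALUES (?, ?, ?, ?)",
    [(10011, 10, 3, 0), (10012, 9, 0, 40), (10015, 10, 3, 0), (10017, 9, 0, 40)],
)
# file_pos is a hypothetical column: each row's position in the delta file.
conn.execute("CREATE TABLE delta (file_pos INT, id INT, a INT, b INT, c INT)")
conn.executemany(
    "INSERT INTO delta VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10012, 50, 3, 10), (2, 10013, 29, 0, 40), (3, 10014, 10, 3, 0),
        (4, 10013, 19, 0, 40), (5, 10015, 70, 3, 0), (6, 10016, 17, 0, 40),
    ],
)

query = """
WITH latest AS (
    -- keep only the last occurrence of each id in the delta file
    SELECT id, a, b, c
    FROM (SELECT d.*,
                 ROW_NUMBER() OVER (PARTITION BY id ORDER BY file_pos DESC) AS rn
          FROM delta d) x
    WHERE rn = 1
)
SELECT id, a, b, c FROM latest            -- updated and newly inserted rows
UNION ALL
SELECT id, a, b, c FROM base              -- rows the delta does not touch
WHERE id NOT IN (SELECT id FROM delta)
ORDER BY id
"""
merged = conn.execute(query).fetchall()
for row in merged:
    print(row)
```

In Hive the same SELECT would typically feed an INSERT OVERWRITE into the target table.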
You can load the file into a temporary table in Hive and then run a FULL OUTER JOIN between the two tables.
Query Example:
SELECT coalesce(tmp.id,initial.id) as id,
coalesce(tmp.A, initial.A) as A,
coalesce(tmp.B,initial.B) as B,
coalesce(tmp.C, initial.C) as C
FROM
table_a initial
FULL OUTER JOIN
temp_table tmp on initial.id=tmp.id;
Output
+--------+-----+----+-----+
| id | a | b | c |
+--------+-----+----+-----+
| 10011 | 10 | 3 | 0 |
| 10012 | 50 | 3 | 10 |
| 10013 | 29 | 0 | 40 |
| 10013 | 19 | 0 | 40 |
| 10014 | 10 | 3 | 0 |
| 10015 | 70 | 3 | 0 |
| 10016 | 17 | 0 | 40 |
| 10017 | 9 | 0 | 40 |
+--------+-----+----+-----+

How to update table 2 from the inserted data in table 1?

Can you help me with a query that updates one table with data from another?
For example, I have two tables:
tbl_med_take
| id | name   | med  | qty |
|----|--------|------|-----|
| 1  | jayson | med2 | 3   |
| 2  | may    | med2 | 4   |
| 3  | jenny  | med3 | 6   |
| 4  | joel   | med3 | 4   |
tbl_med
| id | med  | stocks |
|----|------|--------|
| 1  | med1 | 20     |
| 2  | med2 | 17     |
| 3  | med3 | 24     |
The output that I want in tbl_med:
tbl_med
| id | med  | stocks |
|----|------|--------|
| 1  | med1 | 20     |
| 2  | med2 | 10     |
| 3  | med3 | 14     |
First get the total consumed per medicine from tbl_med_take:
select med, sum(qty) as total from tbl_med_take group by med
Then left join tbl_med to that result and subtract:
select m.id, m.med, (m.stocks - coalesce(n.total, 0)) as stocks from tbl_med m
left join
(select med, sum(qty) as total from tbl_med_take group by med) n
on m.med = n.med
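If the goal is to actually change tbl_med rather than only display the new stock, one option is a single UPDATE with a correlated subquery, sketched here in SQLite via Python with the sample rows from the question. Note that running the statement twice subtracts the same takes twice, so in practice it would be applied once per batch of new tbl_med_take rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_med_take (id INT, name TEXT, med TEXT, qty INT)")
conn.executemany(
    "INSERT INTO tbl_med_take VALUES (?, ?, ?, ?)",
    [(1, "jayson", "med2", 3), (2, "may", "med2", 4),
     (3, "jenny", "med3", 6), (4, "joel", "med3", 4)],
)
conn.execute("CREATE TABLE tbl_med (id INT, med TEXT, stocks INT)")
conn.executemany(
    "INSERT INTO tbl_med VALUES (?, ?, ?)",
    [(1, "med1", 20), (2, "med2", 17), (3, "med3", 24)],
)

# For each medicine, subtract the total quantity taken; COALESCE turns the
# NULL sum of a medicine with no takes (med1) into 0, leaving its stock as is.
conn.execute("""
UPDATE tbl_med
SET stocks = stocks - COALESCE(
    (SELECT SUM(t.qty) FROM tbl_med_take t WHERE t.med = tbl_med.med), 0)
""")
result = conn.execute("SELECT id, med, stocks FROM tbl_med ORDER BY id").fetchall()
print(result)
```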

Query for computing a column using other computed columns

Table: Project Details
+-----+------------------+------------+--------------+------------+
| GPN | EmployeePosition | Project.No | ChargedHours | PayPerHour |
+-----+------------------+------------+--------------+------------+
| 2 | B | 101 | 50 | 57 |
| 3 | C | 100 | 75 | 44 |
| 4 | D | 100 | 100 | 24.75 |
| 5 | E | 103 | 125 | 19.25 |
| 6 | F | 101 | 150 | 16 |
| 7 | C | 100 | 175 | 44 |
+-----+------------------+------------+--------------+------------+
I need to find the total pay for each project. So first I have to find the total pay per employee and then group it by Project.No.
The table below shows the total pay per employee, computed from the other two existing columns:
+-----+-------------+---------+------------+----------+----------------+
| GPN | EmpPosition | Proj.No | ChargedHrs | PayPerHr | TotalPayPerEmp |
+-----+-------------+---------+------------+----------+----------------+
| 2 | B | 101 | 50 | 57 | 993.75 |
| 3 | C | 100 | 75 | 44 | 2850 |
| 4 | D | 100 | 100 | 24.75 | 3300 |
| 5 | E | 103 | 125 | 19.25 | 2406.25 |
| 6 | F | 101 | 150 | 16 | 2400 |
| 7 | C | 100 | 175 | 44 | 7700 |
+-----+-------------+---------+------------+----------+----------------+
My Query:
Select EngNumber, SUM([CharHrs])[SumOfChargedHours], Levell, CostPH,
SUM([CharHrs])*CostPH [TotalPayPerEmployee]
FROM data1.dbo.PayedPerHour
GROUP BY EngNumber, Levell, CostPH
ORDER BY EngNumber;
Update data1.dbo.PayedPerHour
SET CostPH = CASE Levell
WHEN 'Associate Director' THEN '79.75'
WHEN 'Senior Manager' THEN '57'
WHEN 'Manager' THEN '44'
WHEN 'Senior' THEN '24.75'
WHEN 'Staff 2, 3 & 4' THEN '19.25'
WHEN 'Staff 1' THEN '16'
ELSE NULL
END
WHERE Levell IN('Associate Director', 'Senior Manager','Manager', 'Senior',
'Staff 2, 3 & 4', 'Staff 1');
I want to group TotalPayPerEmp by Proj.No, but I can't get it to work.
I have probably made silly mistakes in the query since I'm very new to SQL, so please excuse them.
Expected table:
+---------+--------------------+
| Proj.No | TotalPayPerProject |
+---------+--------------------+
| 100 | 14093.75 |
| 101 | 5250 |
| 103 | 4881.25 |
+---------+--------------------+
I think this can be done with the same calculation as yours, just aggregated at the ProjectNo level:
SELECT ProjectNo
,SUM(ChargedHours*PayPerHour) [TotalPayPerProject]
FROM ProjectDetails
GROUP BY ProjectNo
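This gives the output:
+-----------+--------------------+
| ProjectNo | TotalPayPerProject |
+-----------+--------------------+
| 100 | 13475 |
| 101 | 5250 |
| 103 | 2406.25 |
+-----------+--------------------+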
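This differs from your expected output for projects 100 and 103. Note that in your per-employee table, the TotalPayPerEmp values for GPNs 2, 3 and 4 (993.75, 2850 and 3300) do not equal ChargedHours × PayPerHour (2850, 3300 and 2475).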
Here's a SQL fiddle: http://sqlfiddle.com/#!6/21a33/2/0
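As an independent check of those figures, here is a small sketch that recomputes the per-project totals from the Project Details rows in the question, using SQLite from Python. The snake_case column names (project_no, charged_hours, pay_per_hour) are assumptions standing in for Project.No, ChargedHours and PayPerHour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE project_details (gpn INT, employee_position TEXT,"
    " project_no INT, charged_hours INT, pay_per_hour REAL)"
)
conn.executemany(
    "INSERT INTO project_details VALUES (?, ?, ?, ?, ?)",
    [
        (2, "B", 101, 50, 57), (3, "C", 100, 75, 44),
        (4, "D", 100, 100, 24.75), (5, "E", 103, 125, 19.25),
        (6, "F", 101, 150, 16), (7, "C", 100, 175, 44),
    ],
)

# Multiply per row first, then sum per project: no intermediate
# per-employee column has to be stored or grouped on.
query = """
SELECT project_no, SUM(charged_hours * pay_per_hour) AS total_pay_per_project
FROM project_details
GROUP BY project_no
ORDER BY project_no
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

This reproduces 13475, 5250 and 2406.25, i.e. ChargedHours × PayPerHour summed per project.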