Query last unit price before with detail record - SQL

sample
+---------+------------+----------+------------+
| prdt_no | order_date | quantity | unit_price |
+---------+------------+----------+------------+
| A001 | 2020-01-01 | 100 | 10 |
| A001 | 2020-01-10 | 200 | 10 |
| A001 | 2020-02-01 | 100 | 20 |
| A001 | 2020-02-05 | 100 | 20 |
| A001 | 2020-02-07 | 100 | 20 |
| A001 | 2020-02-10 | 100 | 15 |
| A002 | 2020-01-01 | 100 | 10 |
| A002 | 2020-01-10 | 200 | 10 |
| A002 | 2020-02-01 | 100 | 20 |
| A002 | 2020-02-05 | 100 | 20 |
| A002 | 2020-02-07 | 100 | 20 |
| A002 | 2020-02-10 | 100 | 15 |
+---------+------------+----------+------------+
expected
If the query condition is order_date between 2020-02-02 and 2020-02-10, then the expected result is:
+---------+------------+----------+------------+------------------------+-----------------+-------------+-----------------------------+
| prdt_no | order_date | quantity | unit_price | last_unit_price_before | unit_price_diff | cost_reduce | last_unit_price_change_date |
+---------+------------+----------+------------+------------------------+-----------------+-------------+-----------------------------+
| A001 | 2020-02-05 | 100 | 20 | 10 | 10 | 1000 | 2020-02-01 |
| A001 | 2020-02-07 | 100 | 20 | 10 | 10 | 1000 | 2020-02-01 |
| A001 | 2020-02-10 | 100 | 15 | 20 | -5 | -500 | 2020-02-10 |
| A002 | 2020-02-05 | 100 | 20 | 10 | 10 | 1000 | 2020-02-01 |
| A002 | 2020-02-07 | 100 | 20 | 10 | 10 | 1000 | 2020-02-01 |
| A002 | 2020-02-10 | 100 | 15 | 20 | -5 | -500 | 2020-02-10 |
+---------+------------+----------+------------+------------------------+-----------------+-------------+-----------------------------+
logic
For each row, I want to get the same product's last unit price before the current order and use it to calculate the price difference.
The table actually contains over 200K records.
Test demo: SQL Server 2012 | db<>fiddle

You can use OUTER APPLY to get the last row with a different price, then compute the price difference (and cost_reduce as quantity times the difference):
SELECT *,
       unit_price_diff = T.[unit_price] - L.[last_unit_price_before],
       cost_reduce     = T.[quantity] * (T.[unit_price] - L.[last_unit_price_before])
FROM T
OUTER APPLY
(
    SELECT TOP 1
           last_unit_price_before      = x.[unit_price],
           last_unit_price_change_date = x.[order_date]
    FROM T x
    WHERE x.[prdt_no]     = T.[prdt_no]
      AND x.[order_date]  < T.[order_date]
      AND x.[unit_price] <> T.[unit_price]
    ORDER BY x.[order_date] DESC
) L
WHERE T.[order_date] >= '2020-02-02'
  AND T.[order_date] <= '2020-02-10'
db<>fiddle
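Since the real table holds over 200K rows and the OUTER APPLY runs a correlated TOP 1 lookup per outer row, an index that covers that lookup usually helps; a sketch, assuming the table really is named T with the columns shown in the demo:
CREATE NONCLUSTERED INDEX IX_T_prdt_no_order_date
    ON T (prdt_no, order_date DESC)
    INCLUDE (unit_price);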

Related

cumulative amount to current_date

base_table
| month      | id    | sales | cumulative_sales |
+------------+-------+-------+------------------+
| 2021-01-01 | 33205 | 10    | 10               |
| 2021-02-01 | 33205 | 15    | 25               |
Based on the base table above, I would like to add more rows up to the current month, even if there are no sales in a given month.
Expected table
| month      | id    | sales | cumulative_sales |
+------------+-------+-------+------------------+
| 2021-01-01 | 33205 | 10    | 10               |
| 2021-02-01 | 33205 | 15    | 25               |
| 2021-03-01 | 33205 | 0     | 25               |
| 2021-04-01 | 33205 | 0     | 25               |
| 2021-05-01 | 33205 | 0     | 25               |
| .........  |       |       |                  |
| 2021-11-01 | 33205 | 0     | 25               |
My query stops at
select month, id, sales,
sum(sales) over (partition by id
order by month
rows between unbounded preceding and current row) as cumulative_sales
from base_table
This works. It assumes the month column is constrained to hold only "first of the month" dates. Use the desired hard-coded start date, or use another CTE to get the earliest date from base_table:
with base_table as (
select *
from (values
('2021-01-01'::date,33205,10)
,('2021-02-01' ,33205,15)
,('2021-01-01' ,12345,99)
,('2021-04-01' ,12345,88)
) dat("month",id,sales)
)
select cal.dt::date
,list.id
,coalesce(dat.sales,0) as sales
,coalesce(sum(dat.sales) over (partition by list.id order by cal.dt),0) as cumulative_sales
from generate_series('2020-06-01' /* use desired start date here */,current_date,'1 month') cal(dt)
cross join (select distinct id from base_table) list
left join base_table dat on dat."month" = cal.dt and dat.id = list.id
;
Results:
| dt | id | sales | cumulative_sales |
+------------+-------+-------+------------------+
| 2020-06-01 | 12345 | 0 | 0 |
| 2020-07-01 | 12345 | 0 | 0 |
| 2020-08-01 | 12345 | 0 | 0 |
| 2020-09-01 | 12345 | 0 | 0 |
| 2020-10-01 | 12345 | 0 | 0 |
| 2020-11-01 | 12345 | 0 | 0 |
| 2020-12-01 | 12345 | 0 | 0 |
| 2021-01-01 | 12345 | 99 | 99 |
| 2021-02-01 | 12345 | 0 | 99 |
| 2021-03-01 | 12345 | 0 | 99 |
| 2021-04-01 | 12345 | 88 | 187 |
| 2021-05-01 | 12345 | 0 | 187 |
| 2021-06-01 | 12345 | 0 | 187 |
| 2021-07-01 | 12345 | 0 | 187 |
| 2021-08-01 | 12345 | 0 | 187 |
| 2021-09-01 | 12345 | 0 | 187 |
| 2021-10-01 | 12345 | 0 | 187 |
| 2021-11-01 | 12345 | 0 | 187 |
| 2020-06-01 | 33205 | 0 | 0 |
| 2020-07-01 | 33205 | 0 | 0 |
| 2020-08-01 | 33205 | 0 | 0 |
| 2020-09-01 | 33205 | 0 | 0 |
| 2020-10-01 | 33205 | 0 | 0 |
| 2020-11-01 | 33205 | 0 | 0 |
| 2020-12-01 | 33205 | 0 | 0 |
| 2021-01-01 | 33205 | 10 | 10 |
| 2021-02-01 | 33205 | 15 | 25 |
| 2021-03-01 | 33205 | 0 | 25 |
| 2021-04-01 | 33205 | 0 | 25 |
| 2021-05-01 | 33205 | 0 | 25 |
| 2021-06-01 | 33205 | 0 | 25 |
| 2021-07-01 | 33205 | 0 | 25 |
| 2021-08-01 | 33205 | 0 | 25 |
| 2021-09-01 | 33205 | 0 | 25 |
| 2021-10-01 | 33205 | 0 | 25 |
| 2021-11-01 | 33205 | 0 | 25 |
The cross join pairs every date output by generate_series() with every id value from base_table.
The left join ensures that no dt+id pairs get dropped from the output when no such record exists in base_table.
The coalesce() functions ensure that the sales and cumulative_sales show 0 instead of null for dt+id combinations that don't exist in base_table. Remove them if you don't mind seeing nulls in those columns.
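As suggested above, instead of hard-coding the series start you can derive the earliest month from base_table, for example with a scalar subquery feeding generate_series() (a minimal variation on the query above, not run against real data):
select cal.dt::date
      ,list.id
      ,coalesce(dat.sales,0) as sales
      ,coalesce(sum(dat.sales) over (partition by list.id order by cal.dt),0) as cumulative_sales
from generate_series((select min("month") from base_table) /* earliest month instead of a literal */
                    ,current_date,'1 month') cal(dt)
cross join (select distinct id from base_table) list
left join base_table dat on dat."month" = cal.dt and dat.id = list.id
;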

SQL percent of total and weighted average

I have the following PostgreSQL table, stock, where the structure is the following:
| column | pk |
+--------+-----+
| date | yes |
| id | yes |
| type | yes |
| qty | |
| fee | |
The table looks like this:
| date | id | type | qty | fee |
+------------+-----+------+------+------+
| 2015-01-01 | 001 | CB04 | 500 | 2 |
| 2015-01-01 | 002 | CB04 | 1500 | 3 |
| 2015-01-01 | 003 | CB04 | 500 | 1 |
| 2015-01-01 | 004 | CB04 | 100 | 5 |
| 2015-01-01 | 001 | CB02 | 800 | 6 |
| 2015-01-02 | 002 | CB03 | 3100 | 1 |
I want to create a view or query, so that the result looks like this.
The table will show the t_qty, % of total Qty, and weighted fee for each day and each type:
% of total Qty = qty / t_qty
weighted fee = fee * % of total Qty
| date | id | type | qty | fee | t_qty | % of total Qty | weighted fee |
+------------+-----+------+------+------+-------+----------------+--------------+
| 2015-01-01 | 001 | CB04 | 500 | 2 | 2600 | 0.19 | 0.38 |
| 2015-01-01 | 002 | CB04 | 1500 | 3 | 2600 | 0.58 | 1.73 |
| 2015-01-01 | 003 | CB04 | 500 | 1 | 2600 | 0.19 | 0.192 |
| 2015-01-01 | 004 | CB04 | 100 | 5 | 2600 | 0.04 | 0.192 |
| | | | | | | | |
I could do this in Excel, but the dataset is too big to process.
You can use SUM() as a window function and a little calculation to get this.
SELECT *,
       SUM(qty) OVER (PARTITION BY date ORDER BY date) t_qty,
       qty::numeric / SUM(qty) OVER (PARTITION BY date ORDER BY date),
       fee * (qty::numeric / SUM(qty) OVER (PARTITION BY date ORDER BY date))
FROM T
If you want rounding, you can use the ROUND() function:
SELECT *,
       SUM(qty) OVER (PARTITION BY date ORDER BY date) t_qty,
       ROUND(qty::numeric / SUM(qty) OVER (PARTITION BY date ORDER BY date), 3) "% of total Qty",
       ROUND(fee * (qty::numeric / SUM(qty) OVER (PARTITION BY date ORDER BY date)), 3) "weighted fee"
FROM T
sqlfiddle
[Results]:
| date | id | type | qty | fee | t_qty | % of total Qty | weighted fee |
|------------|-----|------|------|-----|-------|----------------|--------------|
| 2015-01-01 | 001 | CB04 | 500 | 2 | 2600 | 0.192 | 0.385 |
| 2015-01-01 | 002 | CB04 | 1500 | 3 | 2600 | 0.577 | 1.731 |
| 2015-01-01 | 003 | CB04 | 500 | 1 | 2600 | 0.192 | 0.192 |
| 2015-01-01 | 004 | CB04 | 100 | 5 | 2600 | 0.038 | 0.192 |
| 2015-01-02 | 002 | CB03 | 3100 | 1 | 3100 | 1 | 1 |
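The queries above partition by date only. The question asks for totals per day and per type, so if rows like the 2015-01-01 / CB02 one should form their own group, partitioning by both columns is a small variation (an assumption about the intended grouping):
SELECT *,
       SUM(qty) OVER (PARTITION BY date, type) t_qty,
       ROUND(qty::numeric / SUM(qty) OVER (PARTITION BY date, type), 3) "% of total Qty",
       ROUND(fee * (qty::numeric / SUM(qty) OVER (PARTITION BY date, type)), 3) "weighted fee"
FROM T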

SQL multiple sum by PARTITION

I have the following PostgreSQL table, stock, where the structure is the following:
| column | pk |
+--------+-----+
| date | yes |
| id | yes |
| type | yes |
| qty | |
| fee | |
The table looks like this:
| date | id | type | qty | fee |
+------------+-----+------+------+------+
| 2015-01-01 | 001 | CB04 | 500 | 2 |
| 2015-01-01 | 002 | CB04 | 1500 | 3 |
| 2015-01-01 | 003 | CB04 | 500 | 1 |
| 2015-01-01 | 004 | CB04 | 100 | 5 |
| 2015-01-01 | 001 | CB02 | 800 | 6 |
| 2015-01-02 | 002 | CB03 | 3100 | 1 |
| | | | | |
I want to create a view or query, so that the result looks like this.
| date | type | t_qty | total_weighted_fee |
+------------+------+-------+--------------------+
| 2015-01-01 | CB04 | 2600 | 2.5 |
| 2015-01-02 | CB03 | 3100 | 1 |
| | | | |
What I did is this:
http://sqlfiddle.com/#!17/39fb8a/18
But this is not the output I want.
The subquery's result table looks like this:
% of total Qty = qty / t_qty
weighted fee = fee * % of total Qty
| date | id | type | qty | fee | t_qty | % of total Qty | weighted fee |
+------------+-----+------+------+-----+-------+----------------+--------------+
| 2015-01-01 | 001 | CB04 | 500 | 2 | 2600 | 0.19 | 0.38 |
| 2015-01-01 | 002 | CB04 | 1500 | 3 | 2600 | 0.58 | 1.73 |
| 2015-01-01 | 003 | CB04 | 500 | 1 | 2600 | 0.19 | 0.192 |
| 2015-01-01 | 004 | CB04 | 100 | 5 | 2600 | 0.04 | 0.192 |
| 2015-01-02 | 002 | CB03 | 3100 | 1 | 3100 | 1 | 1 |
| | | | | | | | |
You can use aggregation . . . I don't think you are far off:
select date, type,
       sum(qty) as t_qty,
       sum(fee * qty * 1.0) / nullif(sum(qty), 0) as total_weighted_fee
from t
group by date, type;
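As a sanity check against the sample data, for 2015-01-01 / CB04: t_qty = 500 + 1500 + 500 + 100 = 2600, and total_weighted_fee = (2*500 + 3*1500 + 1*500 + 5*100) / 2600 = 6500 / 2600 = 2.5, which matches the expected output (and equals the sum of the per-row weighted fees from the subquery table).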

How to update column with average weekly value for each day in sql

I have the following table. I added a column named weekAverage, and I want to fill it, for each row, with the weekly average value of impressioncnt for that row's category.
Like:
+-------------------------+----------+---------------+--------------+
| Date | category | impressioncnt | weekAverage |
+-------------------------+----------+---------------+--------------+
| 2014-02-06 00:00:00.000 | a | 123 | 100 |
| 2014-02-06 00:00:00.000 | b | 121 | 200 |
| 2014-02-06 00:00:00.000 | c | 99 | 300 |
| 2014-02-07 00:00:00.000 | a | 33 | 100 |
| 2014-02-07 00:00:00.000 | b | 456 | 200 |
| 2014-02-07 00:00:00.000 | c | 54 | 300 |
| 2014-02-08 00:00:00.000 | a | 765 | 100 |
| 2014-02-08 00:00:00.000 | b | 78 | 200 |
| 2014-02-08 00:00:00.000 | c | 12 | 300 |
| ..... | | | |
| 2014-03-01 00:00:00.000 | a | 123 | 111 |
| 2014-03-01 00:00:00.000 | b | 121 | 222 |
| 2014-03-01 00:00:00.000 | c | 99 | 333 |
| 2014-03-02 00:00:00.000 | a | 33 | 111 |
| 2014-03-02 00:00:00.000 | b | 456 | 222 |
| 2014-03-02 00:00:00.000 | c | 54 | 333 |
| 2014-03-03 00:00:00.000 | a | 765 | 111 |
| 2014-03-03 00:00:00.000 | b | 78 | 222 |
| 2014-03-03 00:00:00.000 | c | 12 | 333 |
+-------------------------+----------+---------------+--------------+
I tried
update [dbo].[RetailTS]
set Week = datepart(day, dateDiff(day, 0, [Date])/7 *7)/7 +1
to get the week numbers, and then tried to group by week number, date and category, but this doesn't seem correct. How do I write the SQL query? Thanks!
Given that you may be adding more data in the future, thus requiring another update, you might want to just select out the weekly averages:
SELECT
    Date,
    category,
    impressioncnt,
    AVG(impressioncnt) OVER
        (PARTITION BY category, DATEDIFF(d, 0, Date) / 7) AS weekAverage
FROM RetailTS
ORDER BY
    Date, category;
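If you do need to persist the values into the new column rather than just select them, one option in SQL Server is to update through a CTE that wraps the window function; a sketch, assuming the table and column names from the sample above:
WITH weekly AS
(
    SELECT weekAverage,
           AVG(impressioncnt) OVER
               (PARTITION BY category, DATEDIFF(d, 0, [Date]) / 7) AS wk_avg
    FROM RetailTS
)
UPDATE weekly
SET weekAverage = wk_avg;
-- note: AVG over an integer column returns an integer; cast impressioncnt to decimal if you need fractions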

Aggregating tsrange values into day buckets with a tie-breaker

So I've got a schema that lets people donate money to a set of organizations, with each donation tied to an arbitrary period of time. I'm working on a report that looks at each day and, for each organization, shows the total number of donations and the total cumulative value of those donations for that organization on that day.
For example, here's a mockup of 3 donors, Alpha (orange), Bravo (green), and Charlie (blue), donating to 2 different organizations (Foo and Bar) over various time periods:
I've created a SQLFiddle that implements the above example in a schema that somewhat reflects what I'm working with in reality: http://sqlfiddle.com/#!17/88969/1
(The schema is broken out into more tables than what you'd come up with given the problem statement to better reflect the real-life version I'm working with)
So far, the query that I've managed to put together looks like this:
WITH report_dates AS (
    SELECT '2018-01-01'::date + g AS date
    FROM generate_series(0, 14) g
), organizations AS (
    SELECT id AS organization_id FROM users
    WHERE type = 'Organization'
)
SELECT * FROM report_dates rd
CROSS JOIN organizations o
LEFT JOIN LATERAL (
    SELECT
        COALESCE(sum(doa.amount_cents), 0) AS total_donations_cents,
        COALESCE(count(doa.*), 0) AS total_donors
    FROM users
    LEFT JOIN donor_organization_amounts doa ON doa.organization_id = users.id
    LEFT JOIN donor_amounts da ON da.id = doa.donor_amounts_id
    LEFT JOIN donor_schedules ds ON ds.donor_amounts_id = da.id
    WHERE (users.id = o.organization_id)
      AND (ds.period && tsrange(rd.date::timestamp, rd.date::timestamp + INTERVAL '1 day', '[)'))
) o2 ON true;
With the results looking like this:
| date | organization_id | total_donations_cents | total_donors |
|------------|-----------------|-----------------------|--------------|
| 2018-01-01 | 1 | 0 | 0 |
| 2018-01-02 | 1 | 250 | 1 |
| 2018-01-03 | 1 | 250 | 1 |
| 2018-01-04 | 1 | 1750 | 3 |
| 2018-01-05 | 1 | 1750 | 3 |
| 2018-01-06 | 1 | 1750 | 3 |
| 2018-01-07 | 1 | 750 | 2 |
| 2018-01-08 | 1 | 850 | 2 |
| 2018-01-09 | 1 | 850 | 2 |
| 2018-01-10 | 1 | 500 | 1 |
| 2018-01-11 | 1 | 500 | 1 |
| 2018-01-12 | 1 | 500 | 1 |
| 2018-01-13 | 1 | 1500 | 2 |
| 2018-01-14 | 1 | 1000 | 1 |
| 2018-01-15 | 1 | 0 | 0 |
| 2018-01-01 | 2 | 0 | 0 |
| 2018-01-02 | 2 | 250 | 1 |
| 2018-01-03 | 2 | 250 | 1 |
| 2018-01-04 | 2 | 1750 | 2 |
| 2018-01-05 | 2 | 1750 | 2 |
| 2018-01-06 | 2 | 1750 | 2 |
| 2018-01-07 | 2 | 1750 | 2 |
| 2018-01-08 | 2 | 2000 | 2 |
| 2018-01-09 | 2 | 2000 | 2 |
| 2018-01-10 | 2 | 1500 | 1 |
| 2018-01-11 | 2 | 1500 | 1 |
| 2018-01-12 | 2 | 0 | 0 |
| 2018-01-13 | 2 | 1000 | 2 |
| 2018-01-14 | 2 | 500 | 1 |
| 2018-01-15 | 2 | 0 | 0 |
That's pretty close; however, the problem with this query is that on days where a donation ends and that same donor begins a new one, it should only count that donor's donation one time, using the higher-amount donation as a tie-breaker for the cumulative $ count. An example of that is on 2018-01-13 for organization Foo: total_donors should be 1 and total_donations_cents should be 1000.
I tried to implement a tie-breaker using DISTINCT ON but got off into the weeds... any help would be appreciated!
Also, should I be worried about the performance implications of my implementation so far, given the CTEs and the CROSS JOIN?
Figured it out using DISTINCT ON: http://sqlfiddle.com/#!17/88969/4
WITH report_dates AS (
    SELECT '2018-01-01'::date + g AS date
    FROM generate_series(0, 14) g
), organizations AS (
    SELECT id AS organization_id FROM users
    WHERE type = 'Organization'
), donors_by_date AS (
    SELECT * FROM report_dates rd
    CROSS JOIN organizations o
    LEFT JOIN LATERAL (
        SELECT DISTINCT ON (date, da.donor_id)
            da.donor_id,
            doa.id,
            doa.donor_amounts_id,
            doa.amount_cents
        FROM users
        LEFT JOIN donor_organization_amounts doa ON doa.organization_id = users.id
        LEFT JOIN donor_amounts da ON da.id = doa.donor_amounts_id
        LEFT JOIN donor_schedules ds ON ds.donor_amounts_id = da.id
        WHERE (users.id = o.organization_id)
          AND (ds.period && tsrange(rd.date::timestamp, rd.date::timestamp + INTERVAL '1 day', '[)'))
        ORDER BY date, da.donor_id, doa.amount_cents DESC
    ) foo ON true
)
SELECT
    date,
    organization_id,
    COALESCE(SUM(amount_cents), 0) AS total_donations_cents,
    COUNT(*) FILTER (WHERE donor_id IS NOT NULL) AS total_donors
FROM donors_by_date
GROUP BY date, organization_id
ORDER BY organization_id, date;
Result:
| date | organization_id | total_donations_cents | total_donors |
|------------|-----------------|-----------------------|--------------|
| 2018-01-01 | 1 | 0 | 0 |
| 2018-01-02 | 1 | 250 | 1 |
| 2018-01-03 | 1 | 250 | 1 |
| 2018-01-04 | 1 | 1750 | 3 |
| 2018-01-05 | 1 | 1750 | 3 |
| 2018-01-06 | 1 | 1750 | 3 |
| 2018-01-07 | 1 | 750 | 2 |
| 2018-01-08 | 1 | 850 | 2 |
| 2018-01-09 | 1 | 850 | 2 |
| 2018-01-10 | 1 | 500 | 1 |
| 2018-01-11 | 1 | 500 | 1 |
| 2018-01-12 | 1 | 500 | 1 |
| 2018-01-13 | 1 | 1000 | 1 |
| 2018-01-14 | 1 | 1000 | 1 |
| 2018-01-15 | 1 | 0 | 0 |
| 2018-01-01 | 2 | 0 | 0 |
| 2018-01-02 | 2 | 250 | 1 |
| 2018-01-03 | 2 | 250 | 1 |
| 2018-01-04 | 2 | 1750 | 2 |
| 2018-01-05 | 2 | 1750 | 2 |
| 2018-01-06 | 2 | 1750 | 2 |
| 2018-01-07 | 2 | 1750 | 2 |
| 2018-01-08 | 2 | 2000 | 2 |
| 2018-01-09 | 2 | 2000 | 2 |
| 2018-01-10 | 2 | 1500 | 1 |
| 2018-01-11 | 2 | 1500 | 1 |
| 2018-01-12 | 2 | 0 | 0 |
| 2018-01-13 | 2 | 1000 | 2 |
| 2018-01-14 | 2 | 500 | 1 |
| 2018-01-15 | 2 | 0 | 0 |
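On the performance question: the report_dates and organizations CTEs are tiny, so the CROSS JOIN itself is cheap; the real cost is the lateral subquery executed once per date/organization pair. The && test against ds.period can be supported by a GiST index on the range column (a general suggestion, not benchmarked against this schema):
CREATE INDEX donor_schedules_period_idx
    ON donor_schedules USING gist (period);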