There is one column called Price and another called Date_1, which holds dates from now to about one year ahead.
I want to find the mean value of Price over different periods, for example 2 weeks from now, 1 month from now, 6 months from now...
Can I use a CASE WHEN expression to do it?
Given:
Location_id | Date_1 | Price
------------+-------------+------
L_1 | 20-JUL-2016 | 105
L_1 | 21-JUL-2016 | 117
... | ... | ...
L_1 | 16-MAY-2017 | 103
L_2 | 20-JUL-2016 | 99
L_2 | 21-JUL-2016 | 106
... | ... | ...
L_2 | 16-MAY-2017 | 120
To get:
Location_id | Period | Average_Price
------------+----------+--------------
L_1 | 2 weeks | ...
L_1 | 6 months | ...
L_1 | 1 year | ...
L_2 | 2 weeks | ...
L_2 | 6 months | ...
L_2 | 1 year | ...
Where in "Period", '2 weeks' means 2 weeks from start date (sysdate). And "Average_Price" is the mean value of price across that period.
Thanks! This problem is solved, but I came across an additional one:
There is another table that contains date information :
Location_id | Ex_start_date | Ex_end_date
------------+-----------------+--------------
L_1 | 08-JUN-16 | 30-AUG-16
L_1 | 21-SEP-16 | 25-SEP-16
L_1 | 08-MAY-17 | 12-MAY-17
L_2 | 08-AUG-16 | 21-AUG-16
L_2 | 24-OCT-16 | 29-OCT-16
L_2 | 15-MAR-17 | 19-MAR-17
Anything outside the range from "Ex_Start_date" to "Ex_End_date" is a 'Non_Ex' period. After I obtain the averages for the 2 weeks and 6 months periods, I would like to add one more column, to obtain the mean price for the 'Non_Ex' and 'Ex' conditions described above.
Hopefully, a table as below can be obtained:
Location_id | Period | Ex_Condition | Average_Price
------------+----------------+---------------+--------------
L_1 | 2 weeks | Ex period | ...
L_1 | 2 weeks | Non-Ex period | ...
L_1 | 6 months | Ex period | ...
L_1 | 6 months | Non-Ex period | ...
L_2 | 2 weeks | Ex period | ...
L_2 | 2 weeks | Non-Ex period | ...
L_2 | 6 months | Ex period | ...
L_2 | 6 months | Non-Ex period | ...
The average price should be null if no dates fall in the Ex period or the Non-Ex period.
And how can I make it happen? Thanks!
You could do it like this:
select location_id,
period,
sum(in_period * price) / nullif(sum(in_period), 0) as avg_price
from (select location_id,
price,
period,
case when mydate - days < sysdate then 1 else 0 end in_period
from localprice,
( select '2 weeks' as period, 14 as days from dual
union
select '6 months', 183 from dual
) intervals
) detail
group by location_id,
period
Replace localprice with the name of your table (you did not provide its name in your question).
Replace mydate with the actual name of your date column. I don't expect you called it date, as that is a reserved word and would require you to always quote it -- don't do that: choose another name.
dual is a standard object available in Oracle, which can be used to introduce rows in a query - rows which you don't have in a table somewhere.
Alternatively, you could create a table with all periods that interest you (2 weeks, 4 weeks, ..., together with the number of days they represent) and use that instead of the union select on dual.
Here is an SQL fiddle. Note that it runs on Postgres, because the Oracle instance is not available at this moment. For that reason I created dual explicitly and used current_date instead of sysdate. But for the rest it is the same.
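To make the arithmetic behind sum(in_period * price) / nullif(sum(in_period), 0) easier to follow, here is a minimal sketch in plain Python rather than SQL. The sample rows, the L_3 location, and the fixed "today" date are made up for illustration; only the in_period flag and the division guard mirror the query above.

```python
from datetime import date, timedelta

# Hypothetical sample rows: (location_id, date, price)
rows = [
    ("L_1", date(2016, 7, 20), 105),
    ("L_1", date(2016, 7, 21), 117),
    ("L_1", date(2016, 12, 1), 90),
    ("L_2", date(2016, 7, 20), 99),
    ("L_2", date(2017, 5, 16), 120),
    ("L_3", date(2017, 5, 16), 50),   # no rows inside either period
]
intervals = [("2 weeks", 14), ("6 months", 183)]


def average_by_period(rows, intervals, today):
    """Average price per (location, period), or None when nothing is in the period."""
    totals = {}
    for location_id, day, price in rows:
        for period, days in intervals:
            # Same test as: case when mydate - days < sysdate then 1 else 0 end
            in_period = 1 if day - timedelta(days=days) < today else 0
            price_sum, count = totals.get((location_id, period), (0, 0))
            totals[(location_id, period)] = (price_sum + in_period * price,
                                             count + in_period)
    # nullif(sum(in_period), 0): a zero count gives NULL instead of a division error
    return {key: (s / n if n else None) for key, (s, n) in totals.items()}


result = average_by_period(rows, intervals, today=date(2016, 7, 20))
```

Every row contributes to every period, but rows outside a period contribute zero to both the sum and the count, which is why the group still appears in the output with a null average.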
NOT TESTED because OP didn't provide input data in usable format.
You probably want something along the lines of
select location_id, '2 weeks' as period, avg(price) as average_price
from base_table
where price is not null
and
"date" between SYSDATE and SYSDATE + 13
-- or however you want to define the two week interval
group by location_id
union all
select location_id, '6 months' as period, avg(price) as average_price
from base_table
where price is not null
and
"date" between SYSDATE and add_months(SYSDATE, 6) - 1
-- or however you want to define the six month interval
group by location_id
;
Note that date is a reserved Oracle keyword which should not be used as a column name; if you do, you'll have to use double-quotes, match case (upper and lower) exactly, and you may still run into various problems later. Better to only use table and column names that are not reserved words.
This is a rephrased version of @trincot's answer. It should be faster over a bigger dataset.
Unwanted rows are skipped rather than zeroed and still processed. As a consequence, you no longer get a result row if no localprice rows match an interval's criteria.
It still scans localprice only once, unlike @mathguy's answer.
If the real localprice table has a highly selective index on the date column, that index can be used.
Un-commenting the line in the WHERE clause will help discard rows early, i.e. before the intervals table is considered. The ORDERED hint may well be unnecessary in real life, but it demonstrates the correct explain plan when using this line with this data.
Use UNION ALL rather than UNION when gluing together rows that are already unique.
As usual, don't believe any answer until you've proved it in your circumstances.
WITH
localprice AS
( SELECT 'L_1' Location_id, TO_DATE('20-JUN-2016', 'DD-MON-YYYY') "DATE", 105 Price FROM DUAL
UNION ALL
SELECT 'L_1' Location_id, TO_DATE('16-MAY-2017', 'DD-MON-YYYY') "DATE", 103 Price FROM DUAL
UNION ALL
SELECT 'L_2' Location_id, TO_DATE('20-JUN-2016', 'DD-MON-YYYY') "DATE", 99 Price FROM DUAL
UNION ALL
SELECT 'L_2' Location_id, TO_DATE('16-MAY-2017', 'DD-MON-YYYY') "DATE", 120 Price FROM DUAL
),
intervals AS
( SELECT '2 weeks' AS period, 14 AS days FROM dual
UNION ALL
SELECT '6 months', 183 FROM dual
)
SELECT /*+ ORDERED */
location_id, period,
AVG(price) AS avg_price
FROM
localprice
CROSS JOIN
intervals
WHERE "DATE" >= SYSDATE - days
-- AND "DATE" >= SYSDATE - (SELECT MAX(days) FROM intervals)
GROUP BY location_id, period
Related
I don't think I properly titled this, but in essence I'm wanting to be able to count distinct users but have those previous distinct users be considered as time goes on. As an example, say we have a dataset of user purchases over time:
Date | User
-----------------
2/3/22 | A
2/4/22 | B
2/22/22 | C
3/2/22 | A
3/4/22 | D
3/15/22 | A
4/30/22 | B
Generally, if I were to count distincts grouped by months as would be normal we would get:
Date | Count
-----------------
2/1/22 | 3
3/1/22 | 2
4/1/22 | 1
But what I'm really wanting to see would be how the total number of distinct users increases over the time period.
Date | Count
-----------------
2/1/22 | 3
3/1/22 | 4
4/1/22 | 4
As such it would be 3 distinct users for the first month. Then 4 for the second month considering the total number of distinct users grew by one with the addition of "D" while "A" isn't counted because it was already recognized as a distinct user in the previous month. The third month would then still be 4 because no new distinct user performed an action that month.
Any help would be greatly appreciated (even if it is just a better title so that it reaches more people more appropriately haha)
Here's a solution based on a running sum in Postgres that should translate well to Vertica.
select date_trunc('month', "Date") as "Date"
,sum(count(case rn when 1 then 1 end)) over (order by date_trunc('month', "Date")) as "Count"
from (
select "Date"
,"User"
,row_number() over(partition by "User" order by "Date") as rn
from t
) t
group by date_trunc('month', "Date")
order by "Date"
Date | Count
--------------------+------
2022-02-01 00:00:00 | 3
2022-03-01 00:00:00 | 4
2022-04-01 00:00:00 | 4
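The same idea, counting each user only in the month of their first purchase and then taking a running total, can be sketched in plain Python. This is only an illustration of the logic using the sample data from the question, not a replacement for the query; the function name is made up.

```python
from datetime import date

purchases = [
    (date(2022, 2, 3), "A"), (date(2022, 2, 4), "B"), (date(2022, 2, 22), "C"),
    (date(2022, 3, 2), "A"), (date(2022, 3, 4), "D"), (date(2022, 3, 15), "A"),
    (date(2022, 4, 30), "B"),
]


def cumulative_distinct_by_month(purchases):
    """Return [(month_start, distinct users seen so far)] in month order."""
    first_seen = {}
    for day, user in sorted(purchases):
        # Equivalent of keeping only the rn = 1 row per user
        first_seen.setdefault(user, day.replace(day=1))
    months = sorted({day.replace(day=1) for day, _ in purchases})
    running, result = 0, []
    for month in months:
        # New users this month, added to the running sum
        running += sum(1 for m in first_seen.values() if m == month)
        result.append((month, running))
    return result


counts = cumulative_distinct_by_month(purchases)
```

April still appears with a count of 4 because the months are taken from all purchases, not only from first purchases.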
I've been reading the related questions here, and so far the solutions require that there are no missing months. Would love to get some help on what I can do if there are missing months?
For example, I'd like to calculate the 3 month rolling average of orders per item. If there is a missing month for an item, the calculation assumes that the number of orders for that item for that month is 0. If there are fewer than three months left, the rolling average isn't so important (it can be null or otherwise).
MONTH | ITEM | ORDERS | ROLLING_AVG
2021-04 | A | 5 | 3.33
2021-04 | B | 4 | 3
2021-03 | A | 3 | 1.66
2021-03 | B | 5 | null
2021-02 | A | 2 | null
2021-01 | B | 2 | null
Big thanks in advance!
Also, is there a way to "add" the missing month rows without using a cross join with a list of items? For example if I have 10 million items, the cross join takes quite a while to execute.
You can use a range window frame -- and some conditional logic:
select t.*,
(case when min(month) over (partition by item) <= month - interval '2 month'
then sum(orders) over (partition by item
order by month
range between interval '2 month' preceding and current row
) / 3.0
end) as rolling_average
from t;
Here is a db<>fiddle. The results are slightly different from what is in your question, because there is not enough info for A in 2021-03 but there is enough for B in 2021-03.
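As a cross-check of that logic, here is a small Python sketch that applies the same rule to the rows from the question: a value is produced only when the item's first month is at least two months back, and missing months count as 0. The helper names are made up for illustration.

```python
def month_index(ym):
    """Turn 'YYYY-MM' into a running month number so months can be subtracted."""
    year, month = ym.split("-")
    return int(year) * 12 + int(month) - 1


def rolling_average(rows):
    """rows: (month, item, orders). Returns {(month, item): average or None}."""
    by_item = {}
    for month, item, orders in rows:
        by_item.setdefault(item, {})[month_index(month)] = orders
    result = {}
    for month, item, _ in rows:
        idx = month_index(month)
        orders = by_item[item]
        if min(orders) <= idx - 2:            # enough history for this item
            window = range(idx - 2, idx + 1)  # two months preceding plus current
            result[(month, item)] = sum(orders.get(i, 0) for i in window) / 3.0
        else:
            result[(month, item)] = None
    return result


rows = [
    ("2021-04", "A", 5), ("2021-04", "B", 4),
    ("2021-03", "A", 3), ("2021-03", "B", 5),
    ("2021-02", "A", 2), ("2021-01", "B", 2),
]
averages = rolling_average(rows)
```

No rows are added for the missing months here either: the window is defined by the month values themselves, so a gap simply contributes nothing to the sum while the divisor stays at 3.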
I'm pretty new with SQL, and I'm struggling to figure out a seemingly simple task.
Here's the situation:
I'm working with two data sets
Data Set A, which is the most accurate but only refreshes every quarter
Data Set B, which has all the data, including the most recent data, but is overall less accurate
My goal is to combine both data sets where I would have Data Set A for all data up to the most recent quarter and Data Set B for anything after (i.e., all recent data not captured in Data Set A)
For example:
Data Set A captures anything from Q1 2020 (January to March)
Let's say we are April 15th
Data Set B captures anything from Q1 2020 to the most current date, April 15th
My goal is to use Data Set A for all data from January to March 2020 (Q1) and then Data Set B for all data from April 1 to 15
Any thoughts or advice on how to do this? Potentially a join function along with a date one?
Any help would be much appreciated.
Thanks in advance for the help.
I hope I got your question right.
I put in some sample data that might match your description: a date and an amount. To keep it simple, there is one row per month. You can extract the quarter from a date, and keep that as an additional column, and then filter by that down the line.
WITH
-- some sample data: date and amount ...
indata(dt,amount) AS (
SELECT DATE '2020-01-15', 234.45
UNION ALL SELECT DATE '2020-02-15', 344.45
UNION ALL SELECT DATE '2020-03-15', 345.45
UNION ALL SELECT DATE '2020-04-15', 346.45
UNION ALL SELECT DATE '2020-05-15', 347.45
UNION ALL SELECT DATE '2020-06-15', 348.45
UNION ALL SELECT DATE '2020-07-15', 349.45
UNION ALL SELECT DATE '2020-08-15', 350.45
UNION ALL SELECT DATE '2020-09-15', 351.45
UNION ALL SELECT DATE '2020-10-15', 352.45
UNION ALL SELECT DATE '2020-11-15', 353.45
UNION ALL SELECT DATE '2020-12-15', 354.45
)
-- real query starts here ...
SELECT
EXTRACT(QUARTER FROM dt) AS the_quarter
, CAST(
TIMESTAMPADD(
QUARTER
, CAST(EXTRACT(QUARTER FROM dt) AS INTEGER)-1
, TRUNC(dt,'YEAR')
)
AS DATE
) AS qtr_start
, *
FROM indata;
-- out the_quarter | qtr_start | dt | amount
-- out -------------+------------+------------+--------
-- out 1 | 2020-01-01 | 2020-01-15 | 234.45
-- out 1 | 2020-01-01 | 2020-02-15 | 344.45
-- out 1 | 2020-01-01 | 2020-03-15 | 345.45
-- out 2 | 2020-04-01 | 2020-04-15 | 346.45
-- out 2 | 2020-04-01 | 2020-05-15 | 347.45
-- out 2 | 2020-04-01 | 2020-06-15 | 348.45
-- out 3 | 2020-07-01 | 2020-07-15 | 349.45
-- out 3 | 2020-07-01 | 2020-08-15 | 350.45
-- out 3 | 2020-07-01 | 2020-09-15 | 351.45
-- out 4 | 2020-10-01 | 2020-10-15 | 352.45
-- out 4 | 2020-10-01 | 2020-11-15 | 353.45
-- out 4 | 2020-10-01 | 2020-12-15 | 354.45
If you filter by quarter, you can group your data by that column ...
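One way the final filtering step could look is sketched below in plain Python. It assumes that the cutoff is the start of the quarter following the latest date found in Data Set A; that rule, the function names, and the sample rows for Data Set B are assumptions for illustration, not something stated in the question.

```python
from datetime import date


def next_quarter_start(d):
    """First day of the quarter after the one containing d."""
    quarter_start_month = 3 * ((d.month - 1) // 3) + 1
    if quarter_start_month == 10:
        return date(d.year + 1, 1, 1)
    return date(d.year, quarter_start_month + 3, 1)


def combine(set_a, set_b):
    """Rows are (date, amount). Use A before the cutoff and B from the cutoff on."""
    cutoff = next_quarter_start(max(dt for dt, _ in set_a))
    from_a = [row for row in set_a if row[0] < cutoff]
    from_b = [row for row in set_b if row[0] >= cutoff]
    return sorted(from_a + from_b)


set_a = [(date(2020, 1, 15), 234.45), (date(2020, 2, 15), 344.45),
         (date(2020, 3, 15), 345.45)]
set_b = [(date(2020, 1, 15), 230.00), (date(2020, 3, 15), 340.00),
         (date(2020, 4, 10), 346.45)]
combined = combine(set_a, set_b)
```

In SQL the same shape would be a UNION ALL of the two sets, each with its own date filter on the cutoff.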
I have a SOME_DELTA table which records all party related transactions with amount change
Ex.:
PARTY_ID | SOME_DATE | AMOUNT
--------------------------------
party_id_1 | 2019-01-01 | 100
party_id_1 | 2019-01-15 | 30
party_id_1 | 2019-01-15 | -60
party_id_1 | 2019-01-21 | 80
party_id_2 | 2019-01-02 | 50
party_id_2 | 2019-02-01 | 100
I have a case where an MVC controller accepts a map someMap(party_id, some_date), and I need to get a list of party_id values with the amount summed up to the specific some_date.
In this case if I send mapOf("party_id_1" to Date(2019 - 1 - 15), "party_id_2" to Date(2019 - 1 - 2))
I should get a list of party_id values with the amount summed up to each some_date.
Output should look like:
party_id_1 | 70
party_id_2 | 50
Currently code is:
select sum(amount) from SOME_DELTA where party_id=:partyId and some_date <= :someDate
But in this case I need to iterate through the map and make a separate DB call for the summed amount of each party_id up to its some_date, which feels wrong.
Is there a more elegant way to get this in one select query (to avoid 100+ DB calls)?
You can use a lateral join for this:
select map.party_id,
c.amount
from (
values
('party_id_1', date '2019-01-15'),
('party_id_2', date '2019-01-02')
) map (party_id, cutoff_date)
join lateral (
select sum(amount) amount
from some_delta sd
where sd.party_id = map.party_id
and sd.some_date <= map.cutoff_date
) c on true
order by map.party_id;
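What the lateral join computes can be sketched in plain Python with the sample data from the question: each party gets its own cutoff date, and the sum covers only rows on or before that date. The function name is made up for illustration.

```python
from datetime import date

deltas = [
    ("party_id_1", date(2019, 1, 1), 100),
    ("party_id_1", date(2019, 1, 15), 30),
    ("party_id_1", date(2019, 1, 15), -60),
    ("party_id_1", date(2019, 1, 21), 80),
    ("party_id_2", date(2019, 1, 2), 50),
    ("party_id_2", date(2019, 2, 1), 100),
]
cutoffs = {
    "party_id_1": date(2019, 1, 15),
    "party_id_2": date(2019, 1, 2),
}


def sums_until(deltas, cutoffs):
    """One pass over the rows, one running total per requested party."""
    # Like SUM over zero rows, a party with no matching rows stays None (NULL)
    totals = {party: None for party in cutoffs}
    for party, day, amount in deltas:
        if party in cutoffs and day <= cutoffs[party]:
            totals[party] = (totals[party] or 0) + amount
    return totals


totals = sums_until(deltas, cutoffs)
```

The point of the single query is the same as the single pass here: the whole map goes in once instead of one round trip per party.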
I have a data table like this:
datetime data
-----------------------
...
2017/8/24 6.0
2017/8/25 5.0
...
2017/9/24 6.0
2017/9/25 6.2
...
2017/10/24 8.1
2017/10/25 8.2
I want to write a SQL statement to sum the data using group by the 24th of every two neighboring months in certain range of time such as : from 2017/7/20 to 2017/10/25 as above.
How to write this SQL statement? I'm using SQL Server 2008 R2.
The expected results table is like this:
datetime_range data_sum
------------------------------------
...
2017/8/24~2017/9/24 100.9
2017/9/24~2017/10/24 120.2
...
One conceptual way to proceed here is to redefine a "month" as ending on the 24th of each calendar month, so that any date after the 24th belongs to the following month's period. Shifting the month with date arithmetic, rather than with MONTH(datetime) + 1, keeps December dates after the 24th in January of the next year instead of producing month 13. Then we can aggregate by this shifted month to obtain the sum of data.
WITH cte AS (
    SELECT
        data,
        -- first day of the calendar month in which the accounting period ends
        DATEADD(month,
                DATEDIFF(month, 0, datetime)
                    + CASE WHEN DAY(datetime) > 24 THEN 1 ELSE 0 END,
                0) AS period_month
    FROM yourTable
)
SELECT
    -- the period runs from the 25th of the previous month to the 24th of period_month
    CONVERT(varchar(10), DATEADD(day, 24, DATEADD(month, -1, period_month)), 111) +
    '~' +
    CONVERT(varchar(10), DATEADD(day, 23, period_month), 111) AS datetime_range,
    SUM(data) AS data_sum
FROM cte
GROUP BY
    period_month;
Note that your suggested ranges seem to include the 24th on both ends, which does not make sense from an accounting point of view. I assume that the month includes and ends on the 24th (i.e. the 25th is the first day of the next accounting period).
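To check the period boundaries by hand, the assignment rule can be sketched in plain Python. It assumes, as stated above, that a period runs from the 25th through the 24th; the function name is made up, and the December case is handled explicitly so the year rolls over.

```python
from datetime import date


def accounting_period(d):
    """Return (start, end) of the 25th-to-24th period that contains d."""
    year, month = d.year, d.month
    if d.day > 24:
        # After the 24th the date belongs to the period ending next month
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    end = date(year, month, 24)
    prev_year, prev_month = (year - 1, 12) if month == 1 else (year, month - 1)
    return date(prev_year, prev_month, 25), end


examples = {
    d: accounting_period(d)
    for d in (date(2017, 8, 25), date(2017, 9, 24),
              date(2017, 12, 25), date(2018, 1, 24))
}
```

The first two dates land in the same period, as do the last two, which straddle the year boundary.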
I would suggest dynamically building some date range rows so that you can then join your data to those for aggregation, like this example:
+----+---------------------+---------------------+----------------+
| | period_start_dt | period_end_dt | your_data_here |
+----+---------------------+---------------------+----------------+
| 1 | 24.04.2017 00:00:00 | 24.05.2017 00:00:00 | 1 |
| 2 | 24.05.2017 00:00:00 | 24.06.2017 00:00:00 | 1 |
| 3 | 24.06.2017 00:00:00 | 24.07.2017 00:00:00 | 1 |
| 4 | 24.07.2017 00:00:00 | 24.08.2017 00:00:00 | 1 |
| 5 | 24.08.2017 00:00:00 | 24.09.2017 00:00:00 | 1 |
| 6 | 24.09.2017 00:00:00 | 24.10.2017 00:00:00 | 1 |
| 7 | 24.10.2017 00:00:00 | 24.11.2017 00:00:00 | 1 |
| 8 | 24.11.2017 00:00:00 | 24.12.2017 00:00:00 | 1 |
| 9 | 24.12.2017 00:00:00 | 24.01.2018 00:00:00 | 1 |
| 10 | 24.01.2018 00:00:00 | 24.02.2018 00:00:00 | 1 |
| 11 | 24.02.2018 00:00:00 | 24.03.2018 00:00:00 | 1 |
| 12 | 24.03.2018 00:00:00 | 24.04.2018 00:00:00 | 1 |
+----+---------------------+---------------------+----------------+
DEMO
declare @start_dt date;
set @start_dt = '20170424';
select
    period_start_dt, period_end_dt, sum(1) as your_data_here
from (
    select
        dateadd(month,m.n,start_dt) period_start_dt
      , dateadd(month,m.n+1,start_dt) period_end_dt
    from (
        select @start_dt start_dt ) seed
cross join (
select 0 n union all
select 1 union all
select 2 union all
select 3 union all
select 4 union all
select 5 union all
select 6 union all
select 7 union all
select 8 union all
select 9 union all
select 10 union all
select 11
) m
) r
-- LEFT JOIN YOUR DATA
-- ON yourdata.date >= r.period_start_dt and yourdata.date < r.period_end_dt
group by
period_start_dt, period_end_dt
Please don't be tempted to use "between" when joining to your data. Follow the note above and use yourdata.date >= r.period_start_dt and yourdata.date < r.period_end_dt; otherwise you could double count rows, because between includes both its lower and upper boundaries.
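The double counting is easy to reproduce. The sketch below, in plain Python with made-up sample dates, matches three rows against two adjacent periods, once with a half-open range and once with an inclusive upper bound as BETWEEN would use.

```python
from datetime import date

periods = [
    (date(2017, 4, 24), date(2017, 5, 24)),
    (date(2017, 5, 24), date(2017, 6, 24)),
]
# Hypothetical data rows; the middle one sits exactly on a period boundary
rows = [date(2017, 5, 23), date(2017, 5, 24), date(2017, 5, 25)]


def count_matches(rows, periods, inclusive_end):
    """Number of rows joined to each period."""
    counts = []
    for start, end in periods:
        if inclusive_end:
            n = sum(1 for d in rows if start <= d <= end)   # BETWEEN start AND end
        else:
            n = sum(1 for d in rows if start <= d < end)    # >= start AND < end
        counts.append(n)
    return counts


half_open = count_matches(rows, periods, inclusive_end=False)
between = count_matches(rows, periods, inclusive_end=True)
```

With the half-open range the three rows are counted three times in total; with the inclusive range the boundary row is counted in both periods, giving four.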
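I think the simplest way is to subtract 24 days, so that each period running from the 25th through the 24th falls entirely within one calendar month, and aggregate by that month:
select year(dateadd(day, -24, datetime)) as yr,
       month(dateadd(day, -24, datetime)) as mon,
       sum(data) as data_sum
from t
group by year(dateadd(day, -24, datetime)),
         month(dateadd(day, -24, datetime));
Subtract 23 days instead if your periods start on the 24th. Here yr and mon identify the calendar month in which the period starts. You can format yr and mon to get the dates for the specific ranges, but this does the aggregation (and the yr/mon columns might be sufficient).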
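Which offset lines up with which period boundary is easy to get wrong by one day, so here is a quick check in plain Python. It walks every day of every period in one year (2017 by default, an arbitrary choice) and tests whether the shifted dates all fall in a single calendar month. The function names are made up for illustration.

```python
from datetime import date, timedelta


def shifted_month(d, offset_days):
    """(year, month) of d after subtracting offset_days."""
    shifted = d - timedelta(days=offset_days)
    return (shifted.year, shifted.month)


def offset_aligns(offset_days, period_start_day, year=2017):
    """True if every period starting on period_start_day maps to one month."""
    for month in range(1, 13):
        start = date(year, month, period_start_day)
        if month == 12:
            next_start = date(year + 1, 1, period_start_day)
        else:
            next_start = date(year, month + 1, period_start_day)
        buckets = set()
        d = start
        while d < next_start:
            buckets.add(shifted_month(d, offset_days))
            d += timedelta(days=1)
        if len(buckets) != 1:
            return False
    return True


checks = {
    "25th start, minus 24": offset_aligns(24, 25),
    "24th start, minus 23": offset_aligns(23, 24),
    "25th start, minus 25": offset_aligns(25, 25),
}
```

The rule of thumb that comes out of it: subtract one day less than the day of the month on which the period starts.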
Step 0: Build a calendar table. Every database needs a calendar table eventually to simplify this sort of calculation.
In this table you may have columns such as:
Date (primary key)
Day
Month
Year
Quarter
Half-year (e.g. 1 or 2)
Day of year (1 to 366)
Day of week (numeric or text)
Is weekend (seems redundant now, but is a huge time saver later on)
Fiscal quarter/year (if your company's fiscal year doesn't start on Jan. 1)
Is Holiday
etc.
If your company starts its month on the 24th, then you can add a "Fiscal Month" column that represents that.
Step 1: Join on the calendar table
Step 2: Group by the columns in the calendar table.
Calendar tables sound weird at first, but once you realize that they are in fact tiny even if they span a couple hundred years they quickly become a major asset.
Don't try to cheap out on disk space by using computed columns. You want real columns because they are much faster and can be indexed if necessary. (Though honestly, usually just the PK index is enough for even wide calendar tables.)
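As a rough idea of what populating such a table involves, here is a sketch in plain Python that generates one row per day with a few of the columns listed above. The column names and the rule that a fiscal month starts on the 24th are assumptions for illustration; in practice the rows would be inserted into a real table once and reused.

```python
from datetime import date, timedelta


def build_calendar(start, end):
    """One dict per day from start to end inclusive."""
    rows = []
    d = start
    while d <= end:
        # Hypothetical fiscal rule: the fiscal month starts on the 24th
        if d.day >= 24:
            fiscal_start = date(d.year, d.month, 24)
        elif d.month == 1:
            fiscal_start = date(d.year - 1, 12, 24)
        else:
            fiscal_start = date(d.year, d.month - 1, 24)
        rows.append({
            "date": d,
            "day": d.day,
            "month": d.month,
            "year": d.year,
            "quarter": (d.month - 1) // 3 + 1,
            "day_of_year": d.timetuple().tm_yday,
            "is_weekend": d.weekday() >= 5,      # Saturday = 5, Sunday = 6
            "fiscal_month_start": fiscal_start,
        })
        d += timedelta(days=1)
    return rows


calendar = build_calendar(date(2017, 12, 20), date(2018, 1, 25))
by_date = {row["date"]: row for row in calendar}
```

Grouping by fiscal_month_start after a join on the date column then replaces the date arithmetic in the query itself.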