I'm pretty new with SQL, and I'm struggling to figure out a seemingly simple task.
Here's the situation:
I'm working with two data sets:
Data Set A, which is the most accurate but only refreshes every quarter
Data Set B, which has all the data, including the most recent data, but is overall less accurate
My goal is to combine both data sets where I would have Data Set A for all data up to the most recent quarter and Data Set B for anything after (i.e., all recent data not captured in Data Set A)
For example:
Data Set A captures anything from Q1 2020 (January to March)
Let's say today is April 15th
Data Set B captures anything from Q1 2020 to the most current date, April 15th
My goal is to use Data Set A for all data from January to March 2020 (Q1) and then Data Set B for all data from April 1 to 15
Any thoughts or advice on how to do this? Potentially a join function along with a date one?
Any help would be much appreciated.
Thanks in advance for the help.
I hope I got your question right.
I put in some sample data that might match your description: a date and an amount. To keep it simple, there is one row per month. You can extract the quarter from a date, keep it as an additional column, and then filter by it down the line.
WITH
-- some sample data: date and amount ...
indata(dt,amount) AS (
SELECT DATE '2020-01-15', 234.45
UNION ALL SELECT DATE '2020-02-15', 344.45
UNION ALL SELECT DATE '2020-03-15', 345.45
UNION ALL SELECT DATE '2020-04-15', 346.45
UNION ALL SELECT DATE '2020-05-15', 347.45
UNION ALL SELECT DATE '2020-06-15', 348.45
UNION ALL SELECT DATE '2020-07-15', 349.45
UNION ALL SELECT DATE '2020-08-15', 350.45
UNION ALL SELECT DATE '2020-09-15', 351.45
UNION ALL SELECT DATE '2020-10-15', 352.45
UNION ALL SELECT DATE '2020-11-15', 353.45
UNION ALL SELECT DATE '2020-12-15', 354.45
)
-- real query starts here ...
SELECT
EXTRACT(QUARTER FROM dt) AS the_quarter
, CAST(
TIMESTAMPADD(
QUARTER
, CAST(EXTRACT(QUARTER FROM dt) AS INTEGER)-1
, TRUNC(dt,'YEAR')
)
AS DATE
) AS qtr_start
, *
FROM indata;
-- out the_quarter | qtr_start | dt | amount
-- out -------------+------------+------------+--------
-- out 1 | 2020-01-01 | 2020-01-15 | 234.45
-- out 1 | 2020-01-01 | 2020-02-15 | 344.45
-- out 1 | 2020-01-01 | 2020-03-15 | 345.45
-- out 2 | 2020-04-01 | 2020-04-15 | 346.45
-- out 2 | 2020-04-01 | 2020-05-15 | 347.45
-- out 2 | 2020-04-01 | 2020-06-15 | 348.45
-- out 3 | 2020-07-01 | 2020-07-15 | 349.45
-- out 3 | 2020-07-01 | 2020-08-15 | 350.45
-- out 3 | 2020-07-01 | 2020-09-15 | 351.45
-- out 4 | 2020-10-01 | 2020-10-15 | 352.45
-- out 4 | 2020-10-01 | 2020-11-15 | 353.45
-- out 4 | 2020-10-01 | 2020-12-15 | 354.45
If you filter by quarter, you can group your data by that column ...
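To tie this back to the question, one way to stitch the two sets together is a UNION ALL with a cutoff date. This is only a minimal sketch: dataset_a, dataset_b and the src column are hypothetical names, both tables are assumed to have dt and amount columns, and Data Set A is assumed to be complete up to its latest date. If A can stop before the end of its quarter, compute the cutoff from the quarter boundary instead of MAX(dt).

```sql
-- Hypothetical tables: dataset_a(dt, amount), dataset_b(dt, amount).
-- Everything A covers comes from A; B only contributes rows newer than A's latest date.
SELECT a.dt, a.amount, 'A' AS src
FROM dataset_a a
UNION ALL
SELECT b.dt, b.amount, 'B' AS src
FROM dataset_b b
WHERE b.dt > (SELECT MAX(a2.dt) FROM dataset_a a2)
ORDER BY dt;
```

The src column is only there so you can see which set each row came from; drop it if you don't need it.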
I am running SQL Server 2016 and have the following problem which seems quite basic but I cannot figure it out. I have a table Prices, which holds prices of different securities, with columns
idTag varchar(12) NOT NULL
ts datetime2 NOT NULL
price float NOT NULL
I also have another table Data with columns idTag and ts, where tags match exactly, but timestamps don't. I would like to find the corresponding prices for each row of the Data table (equivalent to constant interpolation in time).
For example, sample values in Prices may be
idTag | ts | price
=================================
IBM | 2020-01-01 13:00 | 100.23
IBM | 2020-01-01 13:05 | 100.34
IBM | 2020-01-01 13:10 | 100.45
IBM | 2020-01-01 13:15 | 100.29
IBM | 2020-01-01 13:20 | 100.31
and the sample values of the Data table may be
idTag | ts
========================
IBM | 2020-01-01 13:01
IBM | 2020-01-01 13:03
IBM | 2020-01-01 13:17
IBM | 2020-01-01 13:18
IBM | 2020-01-01 13:20
The expected output would be
idTag | ts | price
=================================
IBM | 2020-01-01 13:01 | 100.23
IBM | 2020-01-01 13:03 | 100.23
IBM | 2020-01-01 13:17 | 100.29
IBM | 2020-01-01 13:18 | 100.29
IBM | 2020-01-01 13:20 | 100.31
If the timestamps in both tables matched, I could write an INNER JOIN, but here they don't. I could also do this in code, e.g. Python or Java, but Prices has more than 150 million rows, so I would rather not read that in.
Is there a way to do this in SQL?
Thank you very much
You can get the latest price for a date in a subquery.
select
idtag, ts,
(
select top(1) price
from prices p
where p.idtag = d.idtag
and p.ts <= d.ts
order by p.ts desc
) as price
from data d
order by idtag, ts;
(You could also move this subquery to the FROM clause and use CROSS APPLY).
Recommended index:
create index idx on prices(idtag, ts, price);
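The APPLY form mentioned above could look roughly like this. It is a sketch against the same prices and data tables from the question. I used OUTER APPLY rather than CROSS APPLY on the assumption that rows in data with no earlier price should still be returned (with a NULL price), which matches what the scalar subquery does.

```sql
-- Same lookup as the scalar subquery, moved into the FROM clause.
-- OUTER APPLY keeps data rows that have no earlier price (price is NULL);
-- CROSS APPLY would drop those rows instead.
select d.idtag, d.ts, p.price
from data d
outer apply (
    select top (1) pr.price
    from prices pr
    where pr.idtag = d.idtag
      and pr.ts <= d.ts
    order by pr.ts desc
) p
order by d.idtag, d.ts;
```

The APPLY form is handy if you later need more than one column from the matching price row, for example the price timestamp as well.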
Sure, use an analytic function to copy the next value of ts into the current row, then use a range predicate:
select *
from
(select *, lead(ts) over(partition by idtag order by ts) as nextts from prices) p
inner join data d
on
d.idtag = p.idtag and
d.ts >= p.ts and
(d.ts < p.nextts or p.nextts is null) -- the latest price has no next row
where
d.idtag = 'IBM'
Might take a while to do on hundreds of millions of rows.
I have a data table like this:
datetime data
-----------------------
...
2017/8/24 6.0
2017/8/25 5.0
...
2017/9/24 6.0
2017/9/25 6.2
...
2017/10/24 8.1
2017/10/25 8.2
I want to write a SQL statement that sums the data in groups running from the 24th of one month to the 24th of the next, within a given time range, for example from 2017/7/20 to 2017/10/25 as above.
How do I write this SQL statement? I'm using SQL Server 2008 R2.
The expected results table is like this:
datetime_range data_sum
------------------------------------
...
2017/8/24~2017/9/24 100.9
2017/9/24~2017/10/24 120.2
...
One conceptual way to proceed here is to redefine a "month" as ending on the 24th of each normal month. Using SQL Server's date functions, we assign any date occurring after the 24th to the next month (which also rolls December over into January of the next year). Then we can aggregate by this shifted month to obtain the sum of data.
WITH cte AS (
SELECT
data,
-- first day of the accounting month; dates after the 24th roll into the next month
DATEADD(month,
DATEDIFF(month, 0, datetime) + CASE WHEN DAY(datetime) > 24 THEN 1 ELSE 0 END,
0) AS period_month
FROM yourTable
)
SELECT
-- the period runs from the 25th of the previous month to the 24th of period_month
CONVERT(varchar(10), DATEADD(day, 24, DATEADD(month, -1, period_month)), 111) + '~' +
CONVERT(varchar(10), DATEADD(day, 23, period_month), 111) AS datetime_range,
SUM(data) AS data_sum
FROM cte
GROUP BY
period_month;
Note that your suggested ranges seem to include the 24th on both ends, which does not make sense from an accounting point of view. I assume that the month includes and ends on the 24th (i.e. the 25th is the first day of the next accounting period).
I would suggest dynamically building some date range rows so that you can then join your data to those for aggregation, like this example:
+----+---------------------+---------------------+----------------+
| | period_start_dt | period_end_dt | your_data_here |
+----+---------------------+---------------------+----------------+
| 1 | 24.04.2017 00:00:00 | 24.05.2017 00:00:00 | 1 |
| 2 | 24.05.2017 00:00:00 | 24.06.2017 00:00:00 | 1 |
| 3 | 24.06.2017 00:00:00 | 24.07.2017 00:00:00 | 1 |
| 4 | 24.07.2017 00:00:00 | 24.08.2017 00:00:00 | 1 |
| 5 | 24.08.2017 00:00:00 | 24.09.2017 00:00:00 | 1 |
| 6 | 24.09.2017 00:00:00 | 24.10.2017 00:00:00 | 1 |
| 7 | 24.10.2017 00:00:00 | 24.11.2017 00:00:00 | 1 |
| 8 | 24.11.2017 00:00:00 | 24.12.2017 00:00:00 | 1 |
| 9 | 24.12.2017 00:00:00 | 24.01.2018 00:00:00 | 1 |
| 10 | 24.01.2018 00:00:00 | 24.02.2018 00:00:00 | 1 |
| 11 | 24.02.2018 00:00:00 | 24.03.2018 00:00:00 | 1 |
| 12 | 24.03.2018 00:00:00 | 24.04.2018 00:00:00 | 1 |
+----+---------------------+---------------------+----------------+
declare @start_dt date;
set @start_dt = '20170424';
select
period_start_dt, period_end_dt, sum(1) as your_data_here
from (
select
dateadd(month,m.n,start_dt) period_start_dt
, dateadd(month,m.n+1,start_dt) period_end_dt
from (
select @start_dt start_dt ) seed
cross join (
select 0 n union all
select 1 union all
select 2 union all
select 3 union all
select 4 union all
select 5 union all
select 6 union all
select 7 union all
select 8 union all
select 9 union all
select 10 union all
select 11
) m
) r
-- LEFT JOIN YOUR DATA
-- ON yourdata.date >= r.period_start_dt and yourdata.date < r.period_end_dt
group by
period_start_dt, period_end_dt
Please don't be tempted to use "between" when joining to your data. Follow the note above and use yourdata.date >= r.period_start_dt and yourdata.date < r.period_end_dt, otherwise you could double count information, as between is inclusive of both the lower and upper boundaries.
I think the simplest way is to subtract 25 days and aggregate by the month:
select year(dateadd(day, -25, datetime)) as yr,
month(dateadd(day, -25, datetime)) as mon,
sum(data)
from t
group by year(dateadd(day, -25, datetime)), month(dateadd(day, -25, datetime));
You can format yr and mon to get the dates for the specific ranges, but this does the aggregation (and the yr/mon columns might be sufficient).
Step 0: Build a calendar table. Every database needs a calendar table eventually to simplify this sort of calculation.
In this table you may have columns such as:
Date (primary key)
Day
Month
Year
Quarter
Half-year (e.g. 1 or 2)
Day of year (1 to 366)
Day of week (numeric or text)
Is weekend (seems redundant now, but is a huge time saver later on)
Fiscal quarter/year (if your company's fiscal year doesn't start on Jan. 1)
Is Holiday
etc.
If your company starts its month on the 24th, then you can add a "Fiscal Month" column that represents that.
Step 1: Join on the calendar table
Step 2: Group by the columns in the calendar table.
Calendar tables sound weird at first, but once you realize that they are in fact tiny even if they span a couple hundred years they quickly become a major asset.
Don't try to cheap out on disk space by using computed columns. You want real columns because they are much faster and can be indexed if necessary. (Though honestly, usually just the PK index is enough for even wide calendar tables.)
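As a rough illustration of steps 0 to 2, here is a sketch for SQL Server. The calendar table, its column names (cal_date, fiscal_year, fiscal_month) and the idea that fiscal months start on the 25th are all my assumptions; yourTable, datetime and data stand in for the asker's table. Populating the calendar rows is left out.

```sql
-- Step 0: hypothetical calendar table with real (not computed) fiscal columns.
-- Rows must be loaded once, one per day, with fiscal_year and fiscal_month
-- filled in according to the company's rule (e.g. a month starting on the 25th).
CREATE TABLE calendar (
    cal_date     date NOT NULL PRIMARY KEY,
    fiscal_year  int  NOT NULL,
    fiscal_month int  NOT NULL
);

-- Step 1: join on the calendar table. Step 2: group by its columns.
SELECT c.fiscal_year, c.fiscal_month, SUM(t.data) AS data_sum
FROM yourTable t
JOIN calendar c
  ON c.cal_date = CAST(t.datetime AS date)
GROUP BY c.fiscal_year, c.fiscal_month
ORDER BY c.fiscal_year, c.fiscal_month;
```

The nice part is that the "what counts as a month" rule lives in one table instead of being repeated as date arithmetic in every query.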
I am trying to track the usage of material with my SQL. There is no way in our database to link when a part is used to the order it originally came from. A part simply ends up in a bin after an order arrives, and then usage of parts basically just creates a record for the number of parts used at a time of transaction. I am attempting to, as best I can, link usage to an order number by summing over the data and sequentially assigning it to order numbers.
My sub queries have gotten me this far. Each order number is received on a date. I then join the usage table records based on the USEDATE needing to be equal to or greater than the RECEIVEDATE of the order. The data produced by this is as such:
| ORDERNUM | PARTNUM | RECEIVEDATE | ORDERQTY | USEQTY | USEDATE |
|----------|----------|-------------------------|-----------|---------|------------------------|
| 4412 | E1125 | 10/26/2016 1:32:25 PM | 1 | 1 | 11/18/2016 1:40:55 PM |
| 4412 | E1125 | 10/26/2016 1:32:25 PM | 1 | 3 | 12/26/2016 2:19:32 PM |
| 4412 | E1125 | 10/26/2016 1:32:25 PM | 1 | 1 | 1/3/2017 8:31:21 AM |
| 4111 | E1125 | 10/28/2016 2:54:13 PM | 1 | 1 | 11/18/2016 1:40:55 PM |
| 4111 | E1125 | 10/28/2016 2:54:13 PM | 1 | 3 | 12/26/2016 2:19:32 PM |
| 4111 | E1125 | 10/28/2016 2:54:13 PM | 1 | 1 | 1/3/2017 8:31:21 AM |
| 0393 | E1125 | 12/22/2016 11:52:04 AM | 3 | 3 | 12/26/2016 2:19:32 PM |
| 0393 | E1125 | 12/22/2016 11:52:04 AM | 3 | 1 | 1/3/2017 8:31:21 AM |
| 7812 | E1125 | 12/27/2016 10:56:01 AM | 1 | 1 | 1/3/2017 8:31:21 AM |
| 1191 | E1125 | 1/5/2017 1:12:01 PM | 2 | 0 | null |
The query for the above section looks as such:
SELECT
B.*,
NVL(B2.QTY, 0) USEQTY,
B2.USEDATE USEDATE
FROM <<Sub Query B>> B
LEFT JOIN USETABLE B2 ON B.PARTNUM = B2.PARTNUM AND B2.USEDATE >= B.RECEIVEDATE
My ultimate goal here is to join USEQTY records sequentially until they have filled enough ORDERQTY’s. I also need to add an ORDERUSE column that represents what QTY from the USEQTY column was actually applied to that record. Not really sure how to word this any better so here is example of what I need to happen based on the table above:
| ORDERNUM | PARTNUM | RECEIVEDATE | ORDERQTY | USEQTY | USEDATE | ORDERUSE |
|----------|----------|-------------------------|-----------|---------|------------------------|-----------|
| 4412 | E1125 | 10/26/2016 1:32:25 PM | 1 | 1 | 11/18/2016 1:40:55 PM | 1 |
| 4111 | E1125 | 10/28/2016 2:54:13 PM | 1 | 3 | 12/26/2016 2:19:32 PM | 1 |
| 0393 | E1125 | 12/22/2016 11:52:04 AM | 3 | 2 | 12/26/2016 2:19:32 PM | 2 |
| 0393 | E1125 | 12/22/2016 11:52:04 AM | 3 | 1 | 1/3/2017 8:31:21 AM | 1 |
| 7812 | E1125 | 12/27/2016 10:56:01 AM | 1 | 0 | null | 0 |
| 1191 | E1125 | 1/5/2017 1:12:01 PM | 2 | 0 | null | 0 |
If I can get the query to pull the information like above, I will then be able to group the records together and sum the ORDERUSE column which would get me the information I need to know what orders have been used and which have not been fully used. So in the example above, if I were to sum the ORDERUSE column for each of the ORDERNUMs, orders 4412, 4111, 0393 would all show full usage. Orders 7812, 1191 would show not being fully used.
If I am reading this correctly, you want to determine how many parts have been used. In your example it looks like 5 parts have been used against 5 orders totaling 8 parts, with the following orders having been used:
4412 - one part - one used
4111 - one part - one used
7812 - one part - one used
0393 - three parts - two used
After a bit of hacking away I came up with the following SQL. I'm not sure if this works outside of your sample data, since that's the only thing I used to test, and I am no expert.
WITH data
AS (SELECT *
FROM (SELECT *
FROM sub_b1
join (SELECT ROWNUM rn
FROM dual
CONNECT BY LEVEL < 15) a
ON a.rn <= sub_b1.orderqty
ORDER BY receivedate)
WHERE ROWNUM <= (SELECT SUM(useqty)
FROM sub_b2))
SELECT sub_b1.ordernum,
partnum,
receivedate,
orderqty,
usage
FROM sub_b1
join (SELECT ordernum,
Max(rn) AS usage
FROM data
GROUP BY ordernum) b
ON sub_b1.ordernum = b.ordernum
You are looking for "FIFO" inventory accounting.
The proper data model should have two tables, one for "received" parts and the other for "delivered" or "used". Each table should show an order number, a part number and quantity (received or used) for that order, and a timestamp or date-time. I model both in CTEs in my query below, but in your business they should be two separate tables. Also, a trigger or similar should enforce the constraint that a part cannot be used until it is available in stock (that is: for each part id, the total quantity used since inception, at any point in time, should not exceed the total quantity received since inception, also at the same point in time). I assume that the two input tables do, in fact, satisfy this condition, and I don't check it in the solution.
The output shows a timeline of quantity used, by timestamp, matching "received" and "delivered" (used) quantities for each part_id. In the sample data I illustrate a single part_id, but the query will work with multiple part_id's, and orders (both for received and for delivered or used) that include multiple parts (part id's) with different quantities.
with
received ( order_id, part_id, ts, qty ) as (
select '0030', '11A4', timestamp '2015-03-18 15:00:33', 20 from dual union all
select '0032', '11A4', timestamp '2015-03-22 15:00:33', 13 from dual union all
select '0034', '11A4', timestamp '2015-03-24 10:00:33', 18 from dual union all
select '0036', '11A4', timestamp '2015-04-01 15:00:33', 25 from dual
),
delivered ( order_id, part_id, ts, qty ) as (
select '1200', '11A4', timestamp '2015-03-18 16:30:00', 14 from dual union all
select '1210', '11A4', timestamp '2015-03-23 10:30:00', 8 from dual union all
select '1220', '11A4', timestamp '2015-03-23 11:30:00', 7 from dual union all
select '1230', '11A4', timestamp '2015-03-23 11:30:00', 4 from dual union all
select '1240', '11A4', timestamp '2015-03-26 15:00:33', 1 from dual union all
select '1250', '11A4', timestamp '2015-03-26 16:45:11', 3 from dual union all
select '1260', '11A4', timestamp '2015-03-27 10:00:33', 2 from dual union all
select '1270', '11A4', timestamp '2015-04-03 15:00:33', 16 from dual
),
(end of test data; the SQL query begins below - just add the word WITH at the top)
-- with
combined ( part_id, rec_ord, rec_ts, rec_sum, del_ord, del_ts, del_sum) as (
select part_id, order_id, ts,
sum(qty) over (partition by part_id order by ts, order_id),
null, cast(null as date), cast(null as number)
from received
union all
select part_id, null, cast(null as date), cast(null as number),
order_id, ts,
sum(qty) over (partition by part_id order by ts, order_id)
from delivered
),
prep ( part_id, rec_ord, del_ord, del_ts, qty_sum ) as (
select part_id, rec_ord, del_ord, del_ts, coalesce(rec_sum, del_sum)
from combined
)
select part_id,
last_value(rec_ord ignore nulls) over (partition by part_id
order by qty_sum desc) as rec_ord,
last_value(del_ord ignore nulls) over (partition by part_id
order by qty_sum desc) as del_ord,
last_value(del_ts ignore nulls) over (partition by part_id
order by qty_sum desc) as used_date,
qty_sum - lag(qty_sum, 1, 0) over (partition by part_id
order by qty_sum, del_ts) as used_qty
from prep
order by qty_sum
;
Output:
PART_ID REC_ORD DEL_ORD USED_DATE USED_QTY
------- ------- ------- ----------------------------------- ----------
11A4 0030 1200 18-MAR-15 04.30.00.000000000 PM 14
11A4 0030 1210 23-MAR-15 10.30.00.000000000 AM 6
11A4 0032 1210 23-MAR-15 10.30.00.000000000 AM 2
11A4 0032 1220 23-MAR-15 11.30.00.000000000 AM 7
11A4 0032 1230 23-MAR-15 11.30.00.000000000 AM 4
11A4 0032 1230 23-MAR-15 11.30.00.000000000 AM 0
11A4 0034 1240 26-MAR-15 03.00.33.000000000 PM 1
11A4 0034 1250 26-MAR-15 04.45.11.000000000 PM 3
11A4 0034 1260 27-MAR-15 10.00.33.000000000 AM 2
11A4 0034 1270 03-APR-15 03.00.33.000000000 PM 12
11A4 0036 1270 03-APR-15 03.00.33.000000000 PM 4
11A4 0036 21
12 rows selected.
Notes: (1) One needs to be careful if at one moment the cumulative used quantity exactly matches the cumulative received quantity. All rows must be included in all the intermediate results, otherwise there will be bad data in the output; but this may result (as you can see in the output above) in a few rows with a "used quantity" of 0. Depending on how this output is consumed (for further processing, for reporting, etc.) these rows may be left as they are, or they may be discarded in a further outer query with the condition where used_qty > 0.
(2) The last row shows a quantity of 21 with no used_date and no del_ord. This is, in fact, the "current" quantity in stock for that part_id as of the last date in both tables - available for future use. Again, if this is not needed, it can be removed in an outer query. There may be one or more rows like this at the end of the table.
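The solution assumes, without checking, that nothing is used before it is in stock. If you want to verify that on your own data first, something along these lines might do. It is only a sketch in Oracle syntax, and it assumes received and delivered are available as tables (or are defined as CTEs in front of it, as in the test data above) with the same columns.

```sql
-- Lists every delivery whose running "used" total exceeds what had been
-- received for that part up to the same moment. An empty result means the
-- stock constraint described above holds.
select d.part_id, d.order_id, d.ts, d.del_sum
from (select part_id, order_id, ts,
             sum(qty) over (partition by part_id order by ts, order_id) as del_sum
      from delivered) d
where d.del_sum > (select coalesce(sum(r.qty), 0)
                   from received r
                   where r.part_id = d.part_id
                     and r.ts <= d.ts)
order by d.part_id, d.ts, d.order_id;
```

On the sample data above this should come back empty, since the cumulative used quantity never gets ahead of the cumulative received quantity.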
There is one column called Price, and another column called Date_1, which include data from now to about one year later.
I want to find the mean value of Price over different periods, e.g. 2 weeks from now, 1 month from now, 6 months from now...
Can I use a CASE WHEN expression to do it?
Given:
Location_id | Date_1 | Price
------------+-------------+------
L_1 | 20-JUL-2016 | 105
L_1 | 21-JUL-2016 | 117
... | ... | ...
L_1 | 16-MAY-2017 | 103
L_2 | 20-JUL-2016 | 99
L_2 | 21-JUL-2016 | 106
... | ... | ...
L_2 | 16-MAY-2017 | 120
To get:
Location_id | Period | Average_Price
------------+----------+--------------
L_1 | 2 weeks | ...
L_1 | 6 months | ...
L_1 | 1 year | ...
L_2 | 2 weeks | ...
L_2 | 6 months | ...
L_2 | 1 year | ...
Where in "Period", '2 weeks' means 2 weeks from start date (sysdate). And "Average_Price" is the mean value of price across that period.
Thanks! This problem is solved. And I came across an additional one:
There is another table that contains date information :
Location_id | Ex_start_date | Ex_end_date
------------+-----------------+--------------
L_1 | 08-JUN-16 | 30-AUG-16
L_1 | 21-SEP-16 | 25-SEP-16
L_1 | 08-MAY-17 | 12-MAY-17
L_2 | 08-AUG-16 | 21-AUG-16
L_2 | 24-OCT-16 | 29-OCT-16
L_2 | 15-MAR-17 | 19-MAR-17
Anything outside "Ex_start_date" to "Ex_end_date" is the 'Non_Ex' period. After I obtain the averages for the 2 weeks and 6 months periods, I would like to add one more column, to obtain the mean price for the 'Non_Ex' and 'Ex' conditions as above.
Hopefully, a table as below can be obtained:
Location_id | Period   | Ex_Condition  | Average_Price
------------+----------+---------------+--------------
L_1         | 2 weeks  | Ex period     | ...
L_1         | 2 weeks  | Non-Ex period | ...
L_1         | 6 months | Ex period     | ...
L_1         | 6 months | Non-Ex period | ...
L_2         | 2 weeks  | Ex period     | ...
L_2         | 2 weeks  | Non-Ex period | ...
L_2         | 6 months | Ex period     | ...
L_2         | 6 months | Non-Ex period | ...
The average price will return 'null' if there is no dates falling in EX Period or Non-Ex Period.
And how can I make it happen? Thanks!
You could do it like this:
select location_id,
period,
sum(in_period * price) / nullif(sum(in_period), 0) as avg_price
from (select location_id,
price,
period,
case when mydate - days < sysdate then 1 else 0 end in_period
from localprice,
( select '2 weeks' as period, 14 as days from dual
union
select '6 months', 183 from dual
) intervals
) detail
group by location_id,
period
Replace localprice with the name of your table (you did not provide its name in your question).
Replace mydate with the actual name of your date column. I don't expect you called it date, as that is a reserved word and would require you to always quote it -- don't do that: choose another name.
dual is a standard object available in Oracle, which can be used to introduce rows in a query - rows which you don't have in a table somewhere.
Alternatively, you could create a table with all periods that interest you (2 weeks, 4 weeks, ..., together with the number of days they represent) and use that instead of the union select on dual.
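For what it's worth, the lookup-table variant could be sketched like this in Oracle. The table name intervals and its columns are just my choice of names, and localprice and mydate are the same placeholders as in the query above.

```sql
-- Hypothetical lookup table replacing the inline UNION on dual.
create table intervals (
  period varchar2(20) primary key,
  days   number not null
);
insert into intervals (period, days) values ('2 weeks', 14);
insert into intervals (period, days) values ('6 months', 183);

-- Same aggregation as before, now driven by the rows in intervals.
select location_id,
       period,
       sum(in_period * price) / nullif(sum(in_period), 0) as avg_price
from (select lp.location_id,
             lp.price,
             i.period,
             case when lp.mydate - i.days < sysdate then 1 else 0 end as in_period
      from localprice lp
      cross join intervals i
     ) detail
group by location_id,
         period;
```

Adding a new period is then just one more insert into intervals, with no change to the query.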
Here is an SQL fiddle. Note that it runs on Postgres, because the Oracle instance is not available at this moment. For that reason I created dual explicitly and used current_date instead of sysdate. But for the rest it is the same.
NOT TESTED because OP didn't provide input data in usable format.
You probably want something along the lines of
select location_id, '2 weeks' as period, avg(price) as average_price
from base_table
where price is not null
and
"date" between SYSDATE and SYSDATE + 13
-- or however you want to define the two week interval
group by location_id
union all
select location_id, '6 months' as period, avg(price) as average_price
from base_table
where price is not null
and
"date" between SYSDATE and add_months(SYSDATE, 6) - 1
-- or however you want to define the six month interval
group by location_id
;
Note that date is a reserved Oracle keyword which should not be used as a column name; if you do, you'll have to use double-quotes, match case (upper and lower) exactly, and you may still run into various problems later. Better to only use table and column names that are not reserved words.
This is a re-phrased version of the @trincot answer. It should be faster over a bigger dataset.
Unwanted rows are skipped rather than zeroed and still used. You no longer get a result row if no localprice rows match an interval's criteria.
It still only scans localprice once, unlike the @mathguy answer.
If the real localprice table has a highly selective index on the date column then it can be used.
Un-commenting the line in the WHERE clause will help discard lines early i.e. before the intervals table is considered. The ORDERED hint may well be unnecessary in real life but it demonstrates the correct explain plan when using this line with this data.
Use UNION ALL rather than UNION when gluing together rows which are going to be unique.
As usual, don't believe any answer until you've proved it in your circumstances.
WITH
localprice AS
( SELECT 'L_1' Location_id, TO_DATE('20-JUN-2016') "DATE", 105 Price FROM DUAL
UNION ALL
SELECT 'L_1' Location_id, TO_DATE('16-MAY-2017') "DATE", 103 Price FROM DUAL
UNION ALL
SELECT 'L_2' Location_id, TO_DATE('20-JUN-2016') "DATE", 99 Price FROM DUAL
UNION ALL
SELECT 'L_2' Location_id, TO_DATE('16-MAY-2017') "DATE", 120 Price FROM DUAL
),
intervals AS
( SELECT '2 weeks' AS period, 14 AS days FROM dual
UNION ALL
SELECT '6 months', 183 FROM dual
)
SELECT /*+ ORDERED */
location_id, period,
AVG(price) AS avg_price
FROM
localprice
CROSS JOIN
intervals
WHERE "DATE" < SYSDATE + days -- same window as mydate - days < sysdate in the @trincot answer
-- AND "DATE" < SYSDATE + (SELECT MAX(days) FROM intervals)
GROUP BY location_id, period
I've got a working query which is fine for now - and just about does what I'm looking for. I'm wanting to consult, however, as to whether this is the most sensible way of manipulating my data to have it spit out what I need:
I've got a table REPORTS which stores report data. One row gets inserted when a report is run, and another when a report is confirmed. Confirming a report simply involves inserting a reserved name TRUE with the same date as the report to be confirmed. Ugly, yes. But unfortunately, it's not up to me to decide...
Table structure:
Reports
UID (char)
Report (char)
Date (date)
On having run a report, the table REPORTS might look a little like this:
+------+--------+---------------------+
| UID | Report | Date |
+------+--------+---------------------+
| 0001 | runX | 2014-01-02 03:04:59 |
| 0001 | runY | 2014-01-02 03:05:58 |
| 0001 | runX | 2014-01-02 03:06:20 |
+------+--------+---------------------+
On action 'report confirm', the following rows would be inserted:
+------+--------+---------------------+
| UID | Report | Date |
+------+--------+---------------------+
| 0001 | TRUE | 2014-01-02 03:04:59 |
| 0001 | TRUE | 2014-01-02 03:05:58 |
| 0001 | TRUE | 2014-01-02 03:06:20 |
+------+--------+---------------------+
As you can see, when a report is marked TRUE (ie correct), there are two rows with exactly the same DATE:
+------+--------+---------------------+
| UID | Report | Date |
+------+--------+---------------------+
| 0001 | runX | 2014-01-02 03:04:59 |
| 0001 | TRUE | 2014-01-02 03:04:59 |
| 0001 | runY | 2014-01-02 03:05:58 |
| 0001 | TRUE | 2014-01-02 03:05:58 |
| 0001 | runX | 2014-01-02 03:06:20 |
| 0001 | TRUE | 2014-01-02 03:06:20 |
+------+--------+---------------------+
To return all reports which are 'correct', i.e. which have a TRUE row with the same date/time as the named report row (e.g. 'runX'), I do the following:
SELECT * FROM REPORTS T1
LEFT JOIN REPORTS T2
ON T1.DATE = T2.DATE
WHERE T1.REPORT = 'TRUE'
AND T1.REPORT != T2.REPORT;
This gives me something I can at least work with. I know, however, that there must be a more elegant way of doing this? The last clause, for example: not putting that in has it spit out a cartesian product, meaning I've created a cartesian product and am then filtering it. Presumably there must be a way of avoiding it completely and not creating it in the first place?
If I understand correctly, you want to pull the name from the record at the same time as the TRUE record and only return reports that actually have a TRUE record:
select uid,
max(case when Report <> 'TRUE' then Report end) as Report,
date
from reports r
group by uid, date
having sum(case when Report = 'TRUE' then 1 else 0 end) > 0;
Note: Doing equality comparisons on dates with a time component seems dangerous. The process that creates these tables should be putting some other link to the right report in the record. For instance, it could update a check flag column rather than create a new row.
EDIT:
Why is joining on dates (with times) a bad idea? Often, dates are shown as only dates, without the time component. That means that two dates can look the same in output, but really be different. Or, two dates can be in different time zones and look different but be the same.
Oracle mitigates the first problem by storing dates up to the second, in an exact format. Two dates that look the same to the second are the same. Equivalent data types in other databases sometimes include milliseconds -- although these are rarely printed out with the value. Two dates with times up to the second can look the same and still be different. In Oracle, you could say that two dates with times up to the minute can look the same and still be different.
The same phenomenon happens with floating point data types -- 1.0000000 and 0.9999999 are different, but they look the same when shown as 1.000. A join on these values would fail, even though looking at the values would suggest that it would succeed.
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE reports ( "UID", Report, "Date" ) AS
SELECT '0001', 'runX', TO_DATE( '2014-01-02 03:04:59', 'yyyy-mm-dd hh24:mi:ss' ) FROM DUAL
UNION ALL SELECT '0001', 'TRUE', TO_DATE( '2014-01-02 03:04:59', 'yyyy-mm-dd hh24:mi:ss' ) FROM DUAL
UNION ALL SELECT '0001', 'runY', TO_DATE( '2014-01-02 03:05:58', 'yyyy-mm-dd hh24:mi:ss' ) FROM DUAL
UNION ALL SELECT '0001', 'TRUE', TO_DATE( '2014-01-02 03:05:58', 'yyyy-mm-dd hh24:mi:ss' ) FROM DUAL
UNION ALL SELECT '0001', 'runX', TO_DATE( '2014-01-02 03:06:20', 'yyyy-mm-dd hh24:mi:ss' ) FROM DUAL
UNION ALL SELECT '0001', 'TRUE', TO_DATE( '2014-01-02 03:06:20', 'yyyy-mm-dd hh24:mi:ss' ) FROM DUAL;
Query 1:
SELECT "UID",
MAX( CASE Report WHEN 'TRUE' THEN NULL ELSE Report END ) AS Report,
"Date"
FROM reports
GROUP BY "UID", "Date"
HAVING MAX( CASE Report WHEN 'TRUE' THEN 1 ELSE 0 END ) = 1
Results:
| UID | REPORT | DATE |
|------|--------|--------------------------------|
| 0001 | runX | January, 02 2014 03:04:59+0000 |
| 0001 | runY | January, 02 2014 03:05:58+0000 |
| 0001 | runX | January, 02 2014 03:06:20+0000 |
Query 2:
Assuming that a report is marked as FALSE when it is not correct, you could do:
SELECT "UID",
Report,
"Date"
FROM ( SELECT "UID",
Report,
LEAD( Report )
OVER (
PARTITION BY "UID", "Date"
ORDER BY CASE Report
WHEN 'TRUE' THEN 2
WHEN 'FALSE' THEN 1
ELSE 0
END ) AS Result,
"Date"
FROM Reports )
WHERE Result = 'TRUE'
Results:
| UID | REPORT | DATE |
|------|--------|--------------------------------|
| 0001 | runX | January, 02 2014 03:04:59+0000 |
| 0001 | runY | January, 02 2014 03:05:58+0000 |
| 0001 | runX | January, 02 2014 03:06:20+0000 |
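For comparison with the two queries above, the same filter can also be written with a correlated EXISTS, which is close to what the original self-join was trying to say and never builds a product of rows. This is a sketch against the same reports table from the schema setup, not something I have run through the fiddle.

```sql
-- Keep each real report row only if a 'TRUE' row exists
-- for the same "UID" at exactly the same "Date".
SELECT r."UID",
       r.Report,
       r."Date"
FROM   reports r
WHERE  r.Report <> 'TRUE'
AND    EXISTS ( SELECT 1
                FROM   reports c
                WHERE  c."UID"  = r."UID"
                AND    c."Date" = r."Date"
                AND    c.Report = 'TRUE' );
```

Unlike the GROUP BY version, this returns the original rows as they are, so if two different reports share one timestamp you get both of them back rather than just the MAX.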