How to average values in one table based on the condition involving another table in SQL? - sql

I have two tables. One defines time intervals (beginning and end). Time intervals are not equal in length. Another contains product ID, start and end date of the product.
TableOne:
Interval StartDateTime EndDateTime
202020201 2020-01-01 00:00:00 2020-02-10 00:00:00
202020202 2020-02-10 00:00:00 2020-02-20 00:00:00
TableTwo
ProductID ProductStartDateTime ProductEndDateTime
ASSDWE1 2018-01-04 00:12:00 2020-04-10 20:00:30
ADFGHER 2020-01-05 00:11:30 2020-01-19 00:00:00
ASDFVBN 2017-10-10 00:12:10 2020-02-23 00:23:23
I need to compute the average length of the products from TableTwo that existed during the time intervals defined in TableOne. If a product was still in existence at the end of an interval from TableOne, then its length for that interval is defined as the time from its start date to the end of the interval.
I tried the following
select
a.*,
(select
AVG(datediff(day, b.ProductStartDateTime, IIF (b.ProductEndDateTime> a.EndDateTime, a.EndDateTime
,b.ProductEndDateTime))) --compute average length of the products
FROM #TableTwo b
WHERE ( not (b.ProductEndDateTime <= a.StartDateTime ) and not (b.ProductStartDateTime >= a.EndDateTime) )
-- select products that existed during interval from #TableOne
) as AverageProductLength
from #TableOne a
I get the error "Multiple columns are specified in an aggregated expression containing an outer reference. If an expression being aggregated contains an outer reference, then that outer reference must be the only column referenced in the expression."
The result I want:
Interval StartDateTime EndDateTime AverageProductLength
202020201 2020-01-01 00:00:00 2020-02-10 00:00:00 23
202020202 2020-02-10 00:00:00 2020-02-20 00:00:00 34.5
Is there a way I can do the averaging?
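One common way around this error is to replace the correlated subquery with a plain join followed by GROUP BY, so the aggregate no longer mixes outer and inner references. Below is a minimal sketch of that idea, run through Python's built-in sqlite3 module rather than SQL Server. The assumptions: SQLite's two-argument MIN() stands in for IIF, julianday(date(...)) differences stand in for datediff(day, ...), and the column IntervalID is a renamed version of Interval.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableOne (IntervalID TEXT, StartDateTime TEXT, EndDateTime TEXT);
CREATE TABLE TableTwo (ProductID TEXT, ProductStartDateTime TEXT, ProductEndDateTime TEXT);
INSERT INTO TableOne VALUES
  ('202020201', '2020-01-01 00:00:00', '2020-02-10 00:00:00'),
  ('202020202', '2020-02-10 00:00:00', '2020-02-20 00:00:00');
INSERT INTO TableTwo VALUES
  ('ASSDWE1', '2018-01-04 00:12:00', '2020-04-10 20:00:30'),
  ('ADFGHER', '2020-01-05 00:11:30', '2020-01-19 00:00:00'),
  ('ASDFVBN', '2017-10-10 00:12:10', '2020-02-23 00:23:23');
""")

# Join each interval to the products that overlap it, clip the product end
# to the interval end, then average the day counts per interval.
rows = conn.execute("""
SELECT a.IntervalID,
       AVG(julianday(date(MIN(b.ProductEndDateTime, a.EndDateTime)))
           - julianday(date(b.ProductStartDateTime))) AS AverageProductLength
FROM TableOne a
JOIN TableTwo b
  ON b.ProductEndDateTime > a.StartDateTime
 AND b.ProductStartDateTime < a.EndDateTime
GROUP BY a.IntervalID
ORDER BY a.IntervalID
""").fetchall()

for interval_id, avg_days in rows:
    print(interval_id, round(avg_days, 2))
```

On the sample rows this gives roughly 544.67 and 820 days, so the figures in the desired result above appear to be illustrative rather than derived from these rows. In SQL Server itself, the same join and GROUP BY shape with datediff and IIF should apply, though that has not been run here.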

Related

SQL - Fuzzy JOIN on Timestamp columns within X amount of time

Say I have two tables:
a:
timestamp precipitation
2015-08-03 21:00:00 UTC 3
2015-08-03 22:00:00 UTC 3
2015-08-04 3:00:00 UTC 4
2016-02-04 18:00:00 UTC 4
and b:
timestamp loc
2015-08-03 21:23:00 UTC San Francisco
2016-02-04 16:04:00 UTC New York
I want a fuzzy join in which every row in b is matched to at most one row in a. Criteria:
The time is within 60 minutes. If a match does not exist within 60 minutes, do not include that row in the output.
In the case of a tie where some row in b could join onto two rows in a, pick the closest one in terms of time.
Example Output:
timestamp loc precipitation
2015-08-03 21:00:00 UTC San Francisco 3
What you need is an ASOF join. I don't think there is an easy way to do this with BigQuery. Other databases like Kinetica (and I think Clickhouse) support ASOF functions that can be used to perform 'fuzzy' joins.
The syntax for Kinetica would be something like the following.
SELECT *
FROM a
LEFT JOIN b
ON ASOF(a.timestamp, b.timestamp, INTERVAL '0' MINUTES, INTERVAL '60' MINUTES, MIN)
The ASOF function above sets up an interval of 60 minutes within which to look for matches on the right side table. When there are multiple matches, it selects the one that is closest (MAX would pick the one that is farthest away).
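To make the matching rule concrete, here is a small pure-Python sketch of the behaviour the question asks for: for each row of b, keep the row of a that is nearest in time, provided it lies within 60 minutes. It illustrates the semantics only and is not Kinetica's ASOF implementation. The helper names parse and fuzzy_join are hypothetical, and the symmetric window is an assumption taken from the question's wording.

```python
from datetime import datetime, timedelta

def parse(ts):
    """Parse a 'YYYY-MM-DD HH:MM:SS' string (UTC assumed, suffix dropped)."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")

a = [
    (parse("2015-08-03 21:00:00"), 3),
    (parse("2015-08-03 22:00:00"), 3),
    (parse("2015-08-04 03:00:00"), 4),
    (parse("2016-02-04 18:00:00"), 4),
]
b = [
    (parse("2015-08-03 21:23:00"), "San Francisco"),
    (parse("2016-02-04 16:04:00"), "New York"),
]

def fuzzy_join(left, right, window=timedelta(minutes=60)):
    """For each left row, attach the right row closest in time within the window."""
    out = []
    for ts_l, loc in left:
        # Collect every right row inside the window, keyed by time distance.
        candidates = [
            (abs(ts_l - ts_r), ts_r, value)
            for ts_r, value in right
            if abs(ts_l - ts_r) <= window
        ]
        if candidates:  # rows without any match are dropped
            _, ts_r, value = min(candidates)
            out.append((ts_r, loc, value))
    return out

result = fuzzy_join(b, a)
print(result)
```

San Francisco matches the 21:00 reading (23 minutes away, closer than the 22:00 one), while New York has no reading within 60 minutes and is left out, which agrees with the example output in the question.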
As per my understanding and based on the data you provided I think the below query should work for your use case.
create temporary table a as(
select TIMESTAMP('2015-08-03 21:00:00 UTC') as ts, 3 as precipitation union all
select TIMESTAMP('2015-08-03 22:00:00 UTC'), 3 union all
select TIMESTAMP('2015-08-04 3:00:00 UTC'), 4 union all
select TIMESTAMP('2016-02-04 18:00:00 UTC'), 4
);
create temporary table b as(
select TIMESTAMP('2015-08-03 21:23:00 UTC') as ts,'San Francisco' as loc union all
select TIMESTAMP('2016-02-04 16:04:00 UTC') as ts,'New York' as loc
);
select b_ts,a_ts,loc,precipitation,diff_time_sec
from(
select b.ts b_ts,a.ts a_ts,
ABS(TIMESTAMP_DIFF(b.ts,a.ts, SECOND)) as diff_time_sec,
*
from b
inner join a on b.ts between TIMESTAMP_SUB(a.ts, interval 60 MINUTE) and TIMESTAMP_ADD(a.ts, interval 60 MINUTE)
)
qualify RANK() OVER(partition by b_ts ORDER BY diff_time_sec) = 1

PGSQL query to get a list of sequential dates from today

I have a calendar table containing the list of dates on which no action should be performed.
The table is as follows and the date format is YYYY-MM-DD
date
2021-01-01
2021-04-05
2021-04-06
2021-04-07
2021-08-10
2021-11-22
2021-11-23
2021-11-24
2021-12-25
2021-12-31
Considering today is 2021-11-24.
The expected output is
date
2021-11-24
2021-11-23
2021-11-22
And considering today is 2021-12-25,
the expected output is
date
2021-12-25
And considering today is 2021-12-27,
the output should contain no data.
date
It should return the unbroken sequence of dates that ends at today's date, in descending order.
I searched various posts and found some related to my question, but the queries were rather complex, with nested subqueries. Is there a way to achieve the output in a more optimized way? I am new to pgsql.
Create example table:
CREATE TABLE calendar (d date);
INSERT INTO calendar VALUES ('2021-11-23'),('2021-11-20');
Query:
SELECT * FROM
(SELECT CURRENT_DATE - '1 day'::interval * generate_series(0,10) AS d) a
LEFT JOIN calendar c ON (c.d=a.d);
a.d | c.d
---------------------+------------
2021-11-14 00:00:00 | Null
2021-11-15 00:00:00 | Null
2021-11-16 00:00:00 | Null
2021-11-17 00:00:00 | Null
2021-11-18 00:00:00 | Null
2021-11-19 00:00:00 | Null
2021-11-20 00:00:00 | 2021-11-20
2021-11-21 00:00:00 | Null
2021-11-22 00:00:00 | Null
2021-11-23 00:00:00 | 2021-11-23
2021-11-24 00:00:00 | Null
Subquery "a" generates a date series, and then we join it to the table.
You can add conditions, for example "WHERE c.d IS NULL" or "WHERE c.d IS NOT NULL", depending on the filtering you want.
You can simply filter by a date range, building it by subtracting 2 days from today:
select "date"
from maintenance_dates_70099898
where "date" <= now()::date --you want to see today and 2 days prior; Last 3 days total
and "date" >= now()::date - '2 days'::interval
order by 1 desc;
With a runnable test:
drop table if exists maintenance_dates_70099898;
create table maintenance_dates_70099898 ("date" date);
insert into maintenance_dates_70099898
("date")
values
('2021-01-01'),
('2021-04-05'),
('2021-04-06'),
('2021-04-07'),
('2021-08-10'),
('2021-11-22'),
('2021-11-23'),
('2021-11-24'),
('2021-12-25'),
('2021-12-31');
select "date"
from maintenance_dates_70099898
where "date" <= now()::date --you want to see today and 2 days prior; Last 3 days total
and "date" >= now()::date - '2 days'::interval
order by 1 desc;
-- date
--------------
-- 2021-11-24
-- 2021-11-23
-- 2021-11-22
--(3 rows)
select "date"
from maintenance_dates_70099898
where "date" >= '2021-12-25'::date - '2 days'::interval
and "date" <= '2021-12-25'::date
order by 1 desc;
-- date
--------------
-- 2021-12-25
--(1 row)
I assume that for 2021-12-27 you do want to see 2021-12-25, as it's within the 3-day range prior. For 2021-12-28 the query returns nothing:
select "date"
from maintenance_dates_70099898
where "date" >= '2021-12-28'::date - '2 days'::interval
and "date" <= '2021-12-28'::date
order by 1 desc;
-- date
--------
--(0 rows)
The main issue appears to be that the number of days is not known in advance, which rules out a simple range selection. A RECURSIVE CTE comes to the rescue: it picks off each previous date that is exactly 1 day before the last one found and terminates when that no longer holds.
with recursive no_action(no_act_dt) as
( select no_act_dt
from no_action_calendar
where no_act_dt = :parm_date::date
union all
select c.no_act_dt
from no_action_calendar c
join no_action a
on (c.no_act_dt = a.no_act_dt - 1)
)
select *
from no_action
order by no_act_dt desc;
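The recursion above can be tried out without a PostgreSQL server. The sketch below runs the same shape of query through Python's built-in sqlite3 module, using the dates from the question. The assumptions: dates are stored as ISO text, SQLite's date(x, '-1 day') replaces PostgreSQL's date minus integer arithmetic, and the helper name consecutive_dates is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE no_action_calendar (no_act_dt TEXT)")
conn.executemany(
    "INSERT INTO no_action_calendar VALUES (?)",
    [(d,) for d in [
        "2021-01-01", "2021-04-05", "2021-04-06", "2021-04-07", "2021-08-10",
        "2021-11-22", "2021-11-23", "2021-11-24", "2021-12-25", "2021-12-31",
    ]],
)

QUERY = """
WITH RECURSIVE no_action(no_act_dt) AS (
    -- anchor: the starting date, only if it is in the calendar
    SELECT no_act_dt FROM no_action_calendar WHERE no_act_dt = :parm_date
    UNION ALL
    -- step: the calendar date exactly one day before the last one found
    SELECT c.no_act_dt
    FROM no_action_calendar c
    JOIN no_action a ON c.no_act_dt = date(a.no_act_dt, '-1 day')
)
SELECT no_act_dt FROM no_action ORDER BY no_act_dt DESC
"""

def consecutive_dates(parm_date):
    """Return the unbroken run of calendar dates ending at parm_date."""
    return [row[0] for row in conn.execute(QUERY, {"parm_date": parm_date})]

print(consecutive_dates("2021-11-24"))
print(consecutive_dates("2021-12-25"))
print(consecutive_dates("2021-12-27"))
```

The three calls correspond to the three cases in the question: a run of three dates, a single date, and an empty result when the starting date is not in the table.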
If you use this often or from several points, you can parametrize it with a SQL function. (see demo for both).
create or replace
function consective_no_action_dates (date_in date)
returns setof date
language sql
as $$
with recursive no_action(no_act_dt) as
( select no_act_dt
from no_action_calendar
where no_act_dt = date_in
union all
select c.no_act_dt
from no_action_calendar c
join no_action a
on (c.no_act_dt = a.no_act_dt - 1)
)
select *
from no_action
order by no_act_dt desc;
$$;

How can I extract the values of the last aggregation date in sql

I have the following table.
id user time_stamp
1 Mike 2020-02-13 00:00:00 UTC
2 John 2020-02-13 00:00:00 UTC
3 Levy 2020-02-12 00:00:00 UTC
4 Sam 2020-02-12 00:00:00 UTC
5 Frodo 2020-02-11 00:00:00 UTC
Let's say 2020-02-13 00:00:00 UTC is the last day, and I would like to query this table to display only the last day's results. I want to create a view in BigQuery so that I always get only the last day's results.
So that in the end I get something like this (For last day which is 2020-02-13 00:00:00 UTC )
id user time_stamp
1 Mike 2020-02-13 00:00:00 UTC
2 John 2020-02-13 00:00:00 UTC
You can use window functions:
select t.* except (seqnum)
from (select t.*,
dense_rank() over (order by time_stamp desc) as seqnum
from t
) t
where seqnum = 1;
This may not work well on a large amount of data -- because of the way that BQ implements window functions with no partitioning. So, you might find that this works better (especially if the above runs out of resources):
select t.*
from t join
(select max(time_stamp) as max_time_stamp
from t
) tt
on t.time_stamp = max_time_stamp;
Also, if the timestamps actually have date components, then you will want to convert to a date or remove the time component somehow.
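The point about time components can be seen in a small sketch run through Python's built-in sqlite3 module (not BigQuery). The rows follow the question's table, but the times of day are changed on purpose, which is an assumption made only to show the difference between matching the exact maximum timestamp and matching the latest calendar day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, user TEXT, time_stamp TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "Mike", "2020-02-13 08:15:00"),   # hypothetical times of day
        (2, "John", "2020-02-13 17:40:00"),
        (3, "Levy", "2020-02-12 00:00:00"),
        (4, "Sam", "2020-02-12 09:30:00"),
        (5, "Frodo", "2020-02-11 00:00:00"),
    ],
)

# Exact match on the maximum timestamp: only the single latest row survives.
exact = conn.execute("""
    SELECT id FROM t
    WHERE time_stamp = (SELECT MAX(time_stamp) FROM t)
    ORDER BY id
""").fetchall()

# Match on the date part: every row from the latest day survives.
latest_day = conn.execute("""
    SELECT id, user, time_stamp FROM t
    WHERE date(time_stamp) = (SELECT MAX(date(time_stamp)) FROM t)
    ORDER BY id
""").fetchall()

print(exact)
print(latest_day)
```

In BigQuery the date truncation would be written with its own date functions rather than SQLite's date(); the comparison logic is the same.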

Can not understand the logic of this query

This query is trying to get the s1ppmp (the price of the product) for each s1ilie (size), each s1iref (reference) and s1ydat (the latest date) for the price, because one product can have more than one price on different dates, for example a Black Friday price and the normal price for the other days.
The anmoisjour comes from the calendar table (calendrier), but there is no connection between the calendar table and the main table msk110, so I don't understand the logic of this query.
SELECT s1isoc,
s1ilie,
s1iref,
s1ydat,
anmoisjour,
s1ppmp
FROM msk110
INNER JOIN (SELECT s1isoc AS isoc,
s1ilie AS ilie,
s1iref AS iref,
MAX(s1ydat) AS ydat,
anmoisjour
FROM calendrier,
msk110
WHERE s1ydat <= anmoisjour
AND anmoisjour BETWEEN 20100101 AND 20302131
GROUP BY s1isoc,
s1ilie,
s1iref,
anmoisjour) a ON s1isoc = isoc
AND s1ilie = ilie
AND s1iref = iref
AND s1ydat = ydat
WHERE s1isoc = 1
AND anmoisjour BETWEEN 20100101 AND 20302131
ORDER BY anmoisjour,
s1ydat;
s1isoc, s1ilie, s1iref, s1ydat, and s1ppmp come from msk110
and
anmoisjour belongs to the calendar table, which is a date table.
I believe the confusion is the way that the calendar table is joined.
If anmoisjour is the day column of the calendar table and this table holds 1 row per day, the WHERE filter anmoisjour BETWEEN 20100101 AND 20302131 makes calendrier hold a row for each day for 20 years (2010 to 2030).
The product prices table msk110 is linked to the calendar table calendrier not directly by date, but with a max-date condition (msk110.s1ydat <= calendrier.anmoisjour). This means that, for example, a row whose msk110.s1ydat is 2015-01-01 will join against every row of the calendar table that's between 2015-01-01 and 2030-12-31.
The GROUP BY includes the calendar table's date (calendrier.anmoisjour). This means that if a particular product, size and price appears on several dates, say only on 2015-01-01, 2017-01-01 and 2020-01-01, the result of the GROUP BY would be the following (ordered by calendar date, with NULL rows shown only for illustration):
MAX(s1ydat) anmoisjour
null 2010-01-01
null ...
null 2014-12-31
2015-01-01 2015-01-01
2015-01-01 2015-01-02
2015-01-01 ...
2015-01-01 2016-01-01
2015-01-01 ...
2017-01-01 2017-01-01
2017-01-01 2017-01-02
2017-01-01 ...
2017-01-01 2019-12-31
2020-01-01 2020-01-01
2020-01-01 2025-01-01
2020-01-01 ...
What your query is showing is the contents of the product table with the last date that that particular product had that particular price, for each day over 20 years, also where s1isoc = 1 (which I don't know what that means).
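A scaled-down version of the join described above may make the logic easier to follow. The sketch below uses Python's built-in sqlite3 module with one product reference, two price dates and a five-day calendar. The prices, the reference 'REF1' and the shortened column list are made up for illustration; only the join shape is taken from the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE msk110 (s1iref TEXT, s1ydat INTEGER, s1ppmp REAL);
CREATE TABLE calendrier (anmoisjour INTEGER);
-- hypothetical data: the price changes on 2015-01-03
INSERT INTO msk110 VALUES ('REF1', 20150101, 10.0), ('REF1', 20150103, 8.0);
INSERT INTO calendrier VALUES (20141231), (20150101), (20150102), (20150103), (20150104);
""")

# Inner query: for every calendar day, the latest price date on or before it.
# Outer join: fetch the price row that carries that date.
rows = conn.execute("""
SELECT a.anmoisjour, p.s1ydat, p.s1ppmp
FROM msk110 p
JOIN (SELECT m.s1iref AS iref, MAX(m.s1ydat) AS ydat, c.anmoisjour
      FROM calendrier c, msk110 m
      WHERE m.s1ydat <= c.anmoisjour
      GROUP BY m.s1iref, c.anmoisjour) a
  ON p.s1iref = a.iref AND p.s1ydat = a.ydat
ORDER BY a.anmoisjour
""").fetchall()

for day, price_date, price in rows:
    print(day, price_date, price)
```

Each calendar day from 2015-01-01 onward gets the most recent price in force on that day, and 2014-12-31 produces no row because no price exists yet.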

Oracle SQL query about Date

I have a database table named availableTimeslot with fields pk, startDate, endDate, e.g.
PK startDate endDate
1. 2017-03-07 09:00:00 2017-03-07 18:00:00
2. 2017-03-07 18:00:00 2017-03-07 21:00:00
3. 2017-03-08 09:00:00 2017-03-08 18:00:00
Records running from 09:00:00 to 18:00:00 are morning time slots, while those from 18:00:00 to 23:00:00 are afternoon time slots.
The table stores the available time slot dates (e.g. 2017-03-06, 2017-03-08) from which the customer can choose one.
Can I use one query to get exactly 10 available time slots dates starting on the day after the order date?
e.g. if I order a product on 2017-03-07, then the query returns
2017-03-08 09:00:00
2017-03-08 18:00:00
2017-03-09 09:00:00
2017-03-09 18:00:00
2017-03-10 ...
2017-03-11 ...
2017-03-13 ...
as 2017-03-12 is a public holiday and not in the table.
In short, it returns 10 time slots (5 days, each with an am and a pm session).
Remark: the available time slot dates are in order, but may not be consecutive.
select available_date
from ( select available_date, row_number() over (order by available_date) as rn
from your_table
where available_date > :order_date
)
where rn <= 5;
:order_date is a bind variable - the date entered by the user/customer through the interface.
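As a runnable illustration of the same idea, the sketch below uses Python's built-in sqlite3 module instead of Oracle, so LIMIT stands in for the row_number() filter. Two further assumptions: the limit is set to 10 to match the question's wording, and the filter compares against the day after the order date, so that slots later on the order date itself are excluded. The slot data is generated to resemble the question's example, with 2017-03-12 missing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE availableTimeslot (pk INTEGER PRIMARY KEY, startDate TEXT)")

# Two slots per day (09:00 and 18:00) for 7-14 March 2017, skipping the 12th.
slots = [
    (f"2017-03-{day:02d} {start}",)
    for day in (7, 8, 9, 10, 11, 13, 14)
    for start in ("09:00:00", "18:00:00")
]
conn.executemany("INSERT INTO availableTimeslot (startDate) VALUES (?)", slots)

def next_slots(order_date, how_many=10):
    """Return the first how_many slot start times from the day after order_date."""
    rows = conn.execute(
        """
        SELECT startDate
        FROM availableTimeslot
        WHERE startDate >= date(:order_date, '+1 day')
        ORDER BY startDate
        LIMIT :how_many
        """,
        {"order_date": order_date, "how_many": how_many},
    )
    return [r[0] for r in rows]

result = next_slots("2017-03-07")
print(result)
```

For an order on 2017-03-07 this yields the ten slots from 2017-03-08 09:00:00 through 2017-03-13 18:00:00. In Oracle the cutoff would be expressed with its own date arithmetic and the row limit with row_number() or rownum, as in the answers here.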
Do you want 5 for a single customer?
select ts.*
from (select ts.*
from customer c join
timeslots ts
on ts.date > c.orderdate
where c.customerid = v_customerid
order by ts.date asc
) ts
where rownum <= 5