Oracle SELECT with multiple AND conditions for multiple columns (INTERSECT alternative) - sql

How do I solve the following problem:
Imagine we have a large building with about 100 temperature readers and each one collects the temperature every minute.
I have a rather large table (~100m) rows with the following columns:
Table TempEvents:
Timestamp - one entry per minute
Reader ID - about 100 separate readers
Temperature - Integer (-40 -> +40)
Timestamp and Reader ID are primary+secondary keys to the table. I want to perform a query which finds all the timestamps wherereader_01 = 10 degrees,reader_02 = 15 degrees andreader_03 = 20 degrees.
In other words something like this:
SELECT Timestamp FROM TempEvents
WHERE (readerID=01 AND temperature=10)
AND (readerID=02 AND temperature=15)
AND (readerID=03 AND temperature=20)
==> Resulting in a list of timestamps:
Timestamp::
2016-01-01 05:45:00
2016-02-01 07:23:00
2016-03-01 11:56:00
2016-04-01 23:21:00
The above query returns nothing since a single row does not include all conditions at once. Using OR in between the conditions is also not producing the desired result since all readers should match the condition.
Using INTERSECT, I can get the result by:
SELECT * FROM
(SELECT Timestamp FROM TempEvents WHERE readerID=01 AND temperature=10
INTERSECT SELECT Timestamp FROM TempEvents WHERE readerID=02 AND temperature=15
INTERSECT SELECT Timestamp FROM TempEvents WHERE readerID=03 AND temperature=20
)
GROUP BY Timestamp ORDER BY Timestamp ASC;
The above query is extremely costly and takes about 5 minutes to execute.
Is there a better (quicker) way to get the result?

I just tried this in Oracle DB and it seems to work:
SELECT Timestamp FROM TempEvents
WHERE (readerID=01 AND temperature=10)
OR (readerID=02 AND temperature=15)
OR (readerID=03 AND temperature=20)
Make sure to only change the AND outside of parenthesis

Try this:
with Q(readerID,temperature) as(
select 01, 10 from dual
union all
select 02,15 from dual
union all
select 03,20 from dual
)
select Timestamp FROM TempEvents T, Q
where T.readerID=Q.readerID and T.temperature=Q.temperature
group by Timestamp
having count(1)=(select count(1) from Q)
Perhaps this will give a better plan than using OR or IN clause.

If the number of readers you have to query is not too large you might try using a join-query like
select distinct Timestamp
from TempEvents t1
join TempEvents t2 using(Timestamp)
join TempEvents t3 using(Timestamp)
where t1.readerID=01 and t1.temperature = 10
and t2.readerID=02 and t2.temperature = 15
and t3.readerID=03 and t3.temperature = 20
But to be honest I doubt it will perform better than your INTERSECT-query.

Related

How to make a group query to select multiple rows?

I have a DateTime column (timestamp 2022-05-22 10:10:12) with a batch of stamps per each day.
I need to filter the rows where stamp is before 9am (here is no problem) and I'm using this code:
SELECT * FROM tickets
WHERE date_part('hour'::text, tickets.date_in) < 9::double precision;
The output is the list of the rows where the time in timestamp is less than 9 am (50 rows from 2000).
date_in
2022-05-22 08:10:12
2022-04-23 07:11:13
2022-06-15 08:45:26
Then I need to find all the days where at least one row has a stamp before 9 am - and here I'm stuck. Any idea how to select all the days where at least one stamp was before 9 am?
The code I'm trying:
SELECT * into temp1 FROM tickets
WHERE date_part('hour'::text, tickets.date_in) < 9::double precision
ORDER BY date_part('day'::text, date_in);
Select * into temp2
from tickets, temp1
where date_part('day'::text, tickets.date_in) = date_part('day'::text, temp1.date_in);
Update temp2 set distorted_route = 1;
But this is giving me nothing.
Expected output is to get all the days where at least one route was done before 9am:
date_in
2022-05-22 08:10:12
2022-05-22 10:11:45
2022-05-22 12:14:59
2022-04-23 07:11:13
2022-04-23 11:42:25
2022-06-15 08:45:26
2022-06-15 15:10:57
Should I make an additional table (temp1) to feed it with the first query result (just the rows before 9am) and then make a cross table query to find in the source table public.tickets all the days which are equal to the public.temp1?
Select * from tickets, temp1
where TO_Char(tickets.date_in, 'YYYY-MM-DD')
= TO_Char(temp1.date_in, 'YYYY-MM-DD');
or like this:
SELECT *
FROM tickets
WHERE EXISTS (
SELECT date_in FROM TO_Char(tickets.date_in, 'YYYY-MM-DD') = TO_Char(temp1.date_in, 'YYYY-MM-DD')
);
Ideally, I'd want to avoid using a temporary table and make a request just for one table.
After that, I need to create a view or update and add some remarks to the source table.
Assuming you mean:
How to select all rows where at least one row exists with a timestamp before 9 am of the same day?
SELECT *
FROM tickets t
WHERE EXISTS (
SELECT FROM tickets t1
WHERE t1.date_in::date = t.date_in::date -- same day
AND t1.date_in::time < time '9:00' -- time before 9:00
AND t1.id <> t.id -- exclude self
)
ORDER BY date_id; -- optional, but typically helpful
id being the PK column of your undisclosed table.
But be aware that ...
... typically you'll want to work with timestamptz instead of timestamp. See:
Ignoring time zones altogether in Rails and PostgreSQL
https://wiki.postgresql.org/wiki/Don%27t_Do_This#Don.27t_use_timestamp_.28without_time_zone.29
... this query is slow for big tables, because it cannot use a plain index on (date_id) (not "sargable"). Related:
How do you do date math that ignores the year?
There are various ways to optimize performance. The best way depends on undisclosed information for performance questions.

Other efficient way to write multiple select

I have following queries:
select * from
( select volume as vol1 from table1 where code='A1' and daytime='12-may-2012') a,
( select volume as vol2 from table2 where code='A2' and daytime='12-may-2012') b,
( select volume as vol3 from table3 where code='A3' and daytime='12-may-2012') c
result:
vol1 vol2 vol3
20 45
What would be other efficient way to write this query(in real case it could be up to 15 sub queries), assuming data not always exists in any of these tables for selected date? I think it could be join but not sure.
thanks,
S
If the concern is that data might not exist, then cross join is not the right operator. If any subquery returns zero rows, then you will get an empty result set.
Assuming at most one row is returned per query, just use subqueries in the select:
select (select volume from table1 where code = 'A1' and daytime = date '2012-05-12') as vol1,
(select volume from table2 where code = 'A2' and daytime = date '2012-05-12') as vol2,
(select volume from table3 where code = 'A3' and daytime = date '2012-05-12') as vol3
from dual;
If a value is missing, it will be NULL. If a subquery returns more than one row, then you'll get an error.
I much prefer ANSI standard formats, which is why I use the date keyword.
I am highly suspicious of comparing a field called datetime to a date constant with no time component. I would double check the logic on this. Perhaps you intend trunc(daytime) = date '2012-05-12' or something similar.
I should also note that if performance is an issue, then you want an index on each table on (code, daytime, volume).

generate each minute string for a day within specified time limit

My aim is to generate per minute count of all records existing in a table like this.
SELECT
COUNT(*) as RECORD_COUNT,
to_Char(MY_DATE,'HH24:MI') MINUTE_GAP
FROM
TABLE_A
WHERE
BLAH='Blah! Blah!!'
GROUP BY
to_Char(MY_DATE,'HH24:MI')
However, This query doesn't give me the minutes where there were no results.
To get the desired result it, I'm to using the following query to fill the gaps in the original query by doing a JOIN between these two results.
SELECT
*
FROM
( SELECT
TO_CHAR(TRUNC(SYSDATE)+( (ROWNUM-1) /1440) ,'HH24:MI') as MINUTE_GAP,
0 as COUNT
FROM
SOME_LARGE_TABLE_B
WHERE
rownum<=1440
)
WHERE
minute_gap>'07:00' /*I want only the data starting from 7:00AM*/
This works for me, But
I can't rely on SOME_LARGE_TABLE_B to generate the minutes
because it might have no records at some point in future
The query doesn't look like a professional solution.
Is there any easier way to do this?
NOTE:I don't want any new tables created with static values for all the minutes just for one query.
Just generate your timestamps and left join your grouped data to it:
SELECT MINUTE, ....
FROM (
SELECT TO_CHAR(TO_DATE((LEVEL + 419) * 60, 'SSSSS'), 'HH24:MI') MINUTE /* 07:00 - 23:59 */ FROM DUAL CONNECT BY LEVEL <= 1020)
LEFT JOIN (
<your grouped subquery>
) ON MINUTE = MINUTE_GAP

Get average interval between pairs of rows in a table

I have a table with the following data (paypal transactions):
txn_type | date | subscription_id
----------------+----------------------------+---------------------
subscr_signup | 2014-01-01 07:53:20 | S-XXX01
subscr_signup | 2014-01-05 10:37:26 | S-XXX02
subscr_signup | 2014-01-08 08:54:00 | S-XXX03
subscr_eot | 2014-03-01 08:53:57 | S-XXX01
subscr_eot | 2014-03-05 08:58:02 | S-XXX02
I want to get the average subscription length overall for a given time period (subscr_eot is the end of a subscription). In the case of a subscription that is still ongoing ('S-XXX03') I want it to be included from it's start date until now in the average.
How would I go about doing this with an SQL statement in Postgres?
SQL Fiddle. Subscription length for each subscription:
select
subscription_id,
coalesce(t2.date, current_timestamp) - t1.date as subscription_length
from
(
select *
from t
where txn_type = 'subscr_signup'
) t1
left join
(
select *
from t
where txn_type = 'subscr_eot'
) t2 using (subscription_id)
order by t1.subscription_id
The average:
select
avg(coalesce(t2.date, current_timestamp) - t1.date) as subscription_length_avg
from
(
select *
from t
where txn_type = 'subscr_signup'
) t1
left join
(
select *
from t
where txn_type = 'subscr_eot'
) t2 using (subscription_id)
I used a couple of common table expressions; you can take the pieces apart pretty easily to see what they do.
One of the reasons this SQL is complicated is because you're storing column names as data. (subscr_signup and subscr_eot are actually column names, not data.) This is a SQL anti-pattern; expect it to cause you much pain.
with subscription_dates as (
select
p1.subscription_id,
p1.date as subscr_start,
coalesce((select min(p2.date)
from paypal_transactions p2
where p2.subscription_id = p1.subscription_id
and p2.txn_type = 'subscr_eot'
and p2.date > p1.date), current_date) as subscr_end
from paypal_transactions p1
where txn_type = 'subscr_signup'
), subscription_days as (
select subscription_id, subscr_start, subscr_end, (subscr_end - subscr_start) + 1 as subscr_days
from subscription_dates
)
select avg(subscr_days) as avg_days
from subscription_days
-- add your date range here.
avg_days
--
75.6666666666666667
I didn't add your date range as a WHERE clause, because it's not clear to me what you mean by "a given time period".
Using the window function lag(), this becomes considerably shorter:
SELECT avg(ts_end - ts) AS avg_subscr
FROM (
SELECT txn_type, ts, lag(ts, 1, localtimestamp)
OVER (PARTITION BY subscription_id ORDER BY txn_type) AS ts_end
FROM t
) sub
WHERE txn_type = 'subscr_signup';
SQL Fiddle.
lag() conveniently takes a default value for missing rows. Exactly what we need here, so we don't need COALESCE in addition.
The query builds on the fact that subscr_eot sorts before subscr_signup.
Probably faster than presented alternatives so far because it only needs a single sequential scan - even though the window functions add some cost.
Using the column ts instead of date for three reasons:
Your "date" is actually a timestamp.
"date" is a reserved word in standard SQL (even if it's allowed in Postgres).
Never use basic type names as identifiers.
Using localtimestamp instead of now() or current_timestamp since you are obviously operating with timestamp [without time zone].
Also, your columns txn_type and subscription_id should not be text
Maybe an enum for txn_type and integer for subscription_id. That would make table and indexes considerably smaller and faster.
For the query at hand, the whole table has to be read an indexes won't help - except for a covering index in Postgres 9.2+, if you need the read performance:
CREATE INDEX t_foo_idx ON t (subscription_id, txn_type, ts);

SQL Average Inter-arrival Time, Time Between Dates

I have a table with sequential timestamps:
2011-03-17 10:31:19
2011-03-17 10:45:49
2011-03-17 10:47:49
...
I need to find the average time difference between each of these(there could be dozens) in seconds or whatever is easiest, I can work with it from there. So for example the above inter-arrival time for only the first two times would be 870 (14m 30s). For all three times it would be: (870 + 120)/2 = 445 (7m 25s).
A note, I am using postgreSQL 8.1.22 .
EDIT: The table I mention above is from a different query that is literally just a one-column list of timestamps
Not sure I understood your question completely, but this might be what you are looking for:
SELECT avg(difference)
FROM (
SELECT timestamp_col - lag(timestamp_col) over (order by timestamp_col) as difference
FROM your_table
) t
The inner query calculates the distance between each row and the preceding row. The result is an interval for each row in the table.
The outer query simply does an average over all differences.
i think u want to find avg(timestamptz).
my solution is avg(current - min value). but since result is interval, so add it to min value again.
SELECT avg(target_col - (select min(target_col) from your_table))
+ (select min(target_col) from your_table)
FROM your_table
If you cannot upgrade to a version of PG that supports window functions, you
may compute your table's sequential steps "the slow way."
Assuming your table is "tbl" and your timestamp column is "ts":
SELECT AVG(t1 - t0)
FROM (
-- All this silliness would be moot if we could use
-- `` lead(ts) over (order by ts) ''
SELECT tbl.ts AS t0,
next.ts AS t1
FROM tbl
CROSS JOIN
tbl next
WHERE next.ts = (
SELECT MIN(ts)
FROM tbl subquery
WHERE subquery.ts > tbl.ts
)
) derived;
But don't do that. Its performance will be terrible. Please do what
a_horse_with_no_name suggests, and use window functions.