I am trying to exclude the results of an inner query in SQL (I am currently working on Google's Cloud Platform). I have the following table:
date | name
-----------+------------
2019-09-10 | gas_300x10
2019-09-10 | gas_250x10
2019-09-10 | gas_3x3
2019-09-11 | gas_300x10
2019-09-11 | gas_250x10
2019-09-11 | gas_4x4
I am trying to exclude the rows where the name is equal to gas_300x10 or gas_250x10, but only for the date 2019-09-10!
I want to keep the other values from that date, and also keep the rows where gas_300x10 and gas_250x10 occur on other days, for example on 2019-09-11.
I have the following query, which selects exactly the rows I want excluded - the two values for 2019-09-10:
SELECT *
FROM my_table
WHERE date = '2019-09-10'
AND (name = 'gas_300x10' OR name = 'gas_250x10')
This query would essentially return those values I do not want - how can I embed this as an inner query so that these results are excluded from the rest of the data?
I have tried using EXCEPT and NOT IN as a subquery but have not had any luck!
I think the code would work like this but I am unsure:
SELECT *
FROM my_table
EXCEPT
SELECT *
FROM my_table
WHERE date = '2019-09-10'
AND (name = 'gas_300x10' OR name = 'gas_250x10')
Use a combined expression:
select *
from mytable
where not (date = date '2019-09-10' and name in ('gas_300x10', 'gas_250x10'));
or
select *
from mytable
where date <> date '2019-09-10' or name not in ('gas_300x10', 'gas_250x10');
I would suggest NOT:
SELECT *
FROM my_table
WHERE NOT (date = '2019-09-10' AND
name IN ('gas_300x10', 'gas_250x10')
);
Note the use of IN to simplify the logic.
Alternatively, you can write this as:
SELECT *
FROM my_table
WHERE date <> '2019-09-10' OR
name NOT IN ('gas_300x10', 'gas_250x10');
Both of these assume that date and name are not NULL. The logic can be tweaked pretty easily to handle NULLs, if they are possible.
I would NOT recommend using EXCEPT. First, it removes duplicates, so it does not do exactly the same logic. Second, it is doing much more work than necessary, matching the results of two subqueries rather than just filtering one table.
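To see the NOT (...) filter in action end to end, here is a minimal sketch using Python's sqlite3 with an in-memory table mirroring the example data (SQLite here is just a stand-in for BigQuery; the predicate itself is portable):

```python
import sqlite3

# Build a throwaway table with the rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (date TEXT, name TEXT)")
conn.executemany("INSERT INTO my_table VALUES (?, ?)", [
    ("2019-09-10", "gas_300x10"),
    ("2019-09-10", "gas_250x10"),
    ("2019-09-10", "gas_3x3"),
    ("2019-09-11", "gas_300x10"),
    ("2019-09-11", "gas_250x10"),
    ("2019-09-11", "gas_4x4"),
])

# Negate the combined condition: drop a row only when BOTH the date
# and the name match the unwanted combination.
rows = conn.execute("""
    SELECT * FROM my_table
    WHERE NOT (date = '2019-09-10'
               AND name IN ('gas_300x10', 'gas_250x10'))
""").fetchall()
print(rows)
```

Only the two unwanted 2019-09-10 rows disappear; 2019-09-11 keeps both gas names, which is exactly the behavior the question asks for.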
I am trying to get all results from an Oracle DB using SQL Developer, filtered by date.
My data:
ID | date_time_of_identification
--------------------------------------------
1240088696 | 22-SEP-19 06.24.23.432000000 AM
1239485087 | 21-SEP-19 09.25.45.912000000 AM
1239228398 | 21-SEP-19 07.18.40.555000000 AM
1239223300 | 21-SEP-19 07.16.39.812000000 AM
1233224199 | 18-SEP-19 10.54.04.023000000 AM
1232432331 | 18-SEP-19 05.06.40.383000000 AM
1231492850 | 17-SEP-19 01.06.05.316000000 PM
So, to get all rows from 21.09.2019, I write:
select * from mytable where date_time_of_identification = TO_DATE('2019/09/21', 'yyyy/mm/dd'); -- no result
Now I am trying to write a better query:
select * from mytable
where to_char(date_time_of_identification, 'yyyy/mm/dd') = to_char(TO_DATE('2019/09/21', 'yyyy/mm/dd'), 'yyyy/mm/dd');
It returns the right result, but is there a better solution?
You'll have to truncate the date in your column to lose the time part:
select *
from mytable
where trunc(date_time_of_identification) = TO_DATE('2019/09/21', 'yyyy/mm/dd');
Assuming that your predicate is reasonably selective (i.e. the number of rows on a particular day is a small fraction of the number of rows in the table), you'd generally want your query to be able to use an index on date_time_of_identification. If you apply a function to that column, you won't be able to use an index. So you'd generally want to write this as
select *
from myTable
where date_time_of_identification >= date '2019-09-21'
and date_time_of_identification < date '2019-09-22'
The alternative would be to create a function-based index on date_time_of_identification and then use that function in the query.
create index fbi_myTable
    on myTable( trunc( date_time_of_identification ) );
select *
from myTable
where trunc( date_time_of_identification ) = date '2019-09-21';
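The half-open range predicate can be sketched end to end with Python's sqlite3 (a stand-in for Oracle; timestamps are stored here as ISO-8601 text, so the comparison is lexicographic but equivalent):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, date_time_of_identification TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)", [
    (1240088696, "2019-09-22 06:24:23"),
    (1239485087, "2019-09-21 09:25:45"),
    (1239228398, "2019-09-21 07:18:40"),
    (1233224199, "2019-09-18 10:54:04"),
])
# A plain index on the column remains usable because the column itself
# is never wrapped in a function in the WHERE clause.
conn.execute("CREATE INDEX idx_dt ON mytable (date_time_of_identification)")

# Half-open range: >= midnight of the day, < midnight of the next day.
rows = conn.execute("""
    SELECT id FROM mytable
    WHERE date_time_of_identification >= '2019-09-21'
      AND date_time_of_identification < '2019-09-22'
    ORDER BY id
""").fetchall()
print(rows)
```

The query picks up exactly the two 21-Sep rows, including their time-of-day components, without truncating the column.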
I have an Employee table like below
ID Name Mobile Ondate Address
1 Ankit 1234567895 2016-11-08 10:10:04.540 abc
2 Amit 4521545258 2016-11-08 11:10:04.540 bcd
3 Amit2 7541258562 2016-11-08 12:10:04.540 gfd
Now when I write a select query like below, it gives all records of the Employee table
select * from Employee where convert(date,ondate)='2016-11-08 12:10:04.540'
but when I pass getdate() directly in the where condition, it gives an empty result
select * from Employee where convert(date,ondate)=getdate()
while select getdate() returns 2016-11-08 12:10:04.540,
so please give a proper explanation of why this happens.
This is data type precedence at work. In your first query, in the WHERE clause you have a date on one side of a comparison and a varchar on the other. date wins, your string is converted to a date, the time is ignored and every row matches.
In your second query, you have a date on one side of the comparison and a datetime on the other side. datetime wins, the date is converted (back) into a datetime, and the datetimes don't match on their time components.
If you want to select values for today, use something like:
select * from Employee
where ondate >= DATEADD(day,DATEDIFF(day,0,GETDATE()),0) and
ondate < DATEADD(day,DATEDIFF(day,0,GETDATE()),1)
Where the DATEADD/DATEDIFF expressions are effectively computing "midnight at the start of today" and "midnight at the start of tomorrow". Both expressions will be computed once, and any index on the ondate column can then be used, if one exists, and we avoid excessively transforming column data.
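What the DATEADD/DATEDIFF pair computes can be sketched in plain Python (not T-SQL): floor "now" to midnight, then add one day for the exclusive upper bound. The sample datetimes below are taken from the question:

```python
from datetime import datetime, timedelta

def day_bounds(now: datetime):
    """Midnight at the start of `now`'s day, and midnight of the next day."""
    start = datetime(now.year, now.month, now.day)   # time component dropped
    return start, start + timedelta(days=1)          # exclusive upper bound

# "now" as in the question: 2016-11-08 12:10:04
start, end = day_bounds(datetime(2016, 11, 8, 12, 10, 4))
print(start, end)

# An ondate value from the table falls inside the half-open interval.
ondate = datetime(2016, 11, 8, 10, 10, 4)
print(start <= ondate < end)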
Try this:
select * from Employee where convert(date,ondate)=Convert(date,getdate())
SELECT * from Employee where DATEDIFF(DAY,Ondate,'2016-11-08 10:10:04.540') = 0
OR
SELECT * from Employee where DATEDIFF(DAY,Ondate,GETDATE()) = 0
My question is specific to my problem at hand, so I would try to explain the scenario first. I need to write a sql query. Following is the scenario:
Table columns for table1:
effective_date
expire_date
amount_value
state_id
amount_entry_type
Case 1, Input values:
input_date
I've achieved it using following sql query:
Sample Query:
select state_id, sum(amount_value)
from table1
where state_id = 3 and (input_date between effective_date and expire_date)
group by state_id;
My Question:
Now I've a date range and I wish to achieve the above for all the dates between date range.
Input values 2:
input_start_date
input_end_date
Expected Output:
Find the sum of amount_value grouped by state, where input_date is between effective_date and expire_date, for each input_date between input_start_date and input_end_date.
So the query would give following sample result for date range 2016-07-07 and 2016-07-08 :
state amount_sum date
California 100 2016-07-07
Florida 200 2016-07-08
I'm using postgres as database and django for querying and processing the result.
Options:
1. Fetch all the data and process using python.
2. Loop over given date range and fire the query above as:
for input_date in date_range(input_start_date, input_end_date):
//Execute above query
Both of the above solutions might have performance issues, so I was wondering if I could achieve this with a single SQL query.
You can indeed do this with a single query, using the generate_series() set-returning function to make the list of days. If you are sure that all dates have corresponding rows for the state then you can use a regular JOIN, otherwise use a LEFT JOIN as below.
SELECT state_id, sum(amount_value), dt AS "date"
FROM generate_series(input_start_date, input_end_date, interval '1 day') dates(dt)
LEFT JOIN table1 ON state_id = 3 AND (dt BETWEEN effective_date AND expire_date)
GROUP BY state_id, dt;
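The shape of this series-plus-LEFT-JOIN query can be sketched with Python's sqlite3. SQLite has no generate_series(), so a recursive CTE stands in for it, and the two sample rows and date range below are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE table1 (
    effective_date TEXT, expire_date TEXT, amount_value INTEGER, state_id INTEGER)""")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", [
    ("2016-07-01", "2016-07-07", 100, 3),   # expires on the 7th
    ("2016-07-05", "2016-07-31",  50, 3),   # still active on both days
])

rows = conn.execute("""
    WITH RECURSIVE dates(dt) AS (          -- stand-in for generate_series
        SELECT '2016-07-07'
        UNION ALL
        SELECT date(dt, '+1 day') FROM dates WHERE dt < '2016-07-08'
    )
    SELECT dt, sum(amount_value)
    FROM dates
    LEFT JOIN table1
      ON state_id = 3 AND dt BETWEEN effective_date AND expire_date
    GROUP BY dt
    ORDER BY dt
""").fetchall()
print(rows)
```

Each generated day gets its own sum: both rows are active on 2016-07-07 (150), only the second on 2016-07-08 (50). A day with no active rows would still appear, with a NULL sum, thanks to the LEFT JOIN.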
How do I solve the following problem:
Imagine we have a large building with about 100 temperature readers and each one collects the temperature every minute.
I have a rather large table (~100M rows) with the following columns:
Table TempEvents:
Timestamp - one entry per minute
Reader ID - about 100 separate readers
Temperature - Integer (-40 -> +40)
Timestamp and Reader ID are primary+secondary keys to the table. I want to perform a query which finds all the timestamps where reader_01 = 10 degrees, reader_02 = 15 degrees and reader_03 = 20 degrees.
In other words something like this:
SELECT Timestamp FROM TempEvents
WHERE (readerID=01 AND temperature=10)
AND (readerID=02 AND temperature=15)
AND (readerID=03 AND temperature=20)
==> Resulting in a list of timestamps:
Timestamp::
2016-01-01 05:45:00
2016-02-01 07:23:00
2016-03-01 11:56:00
2016-04-01 23:21:00
The above query returns nothing since a single row does not include all conditions at once. Using OR in between the conditions is also not producing the desired result since all readers should match the condition.
Using INTERSECT, I can get the result by:
SELECT * FROM
(SELECT Timestamp FROM TempEvents WHERE readerID=01 AND temperature=10
INTERSECT SELECT Timestamp FROM TempEvents WHERE readerID=02 AND temperature=15
INTERSECT SELECT Timestamp FROM TempEvents WHERE readerID=03 AND temperature=20
)
GROUP BY Timestamp ORDER BY Timestamp ASC;
The above query is extremely costly and takes about 5 minutes to execute.
Is there a better (quicker) way to get the result?
I just tried this in Oracle DB and it seems to work:
SELECT Timestamp FROM TempEvents
WHERE (readerID=01 AND temperature=10)
OR (readerID=02 AND temperature=15)
OR (readerID=03 AND temperature=20)
Make sure to change only the ANDs outside of the parentheses.
Try this:
with Q(readerID,temperature) as(
select 01, 10 from dual
union all
select 02,15 from dual
union all
select 03,20 from dual
)
select Timestamp FROM TempEvents T, Q
where T.readerID=Q.readerID and T.temperature=Q.temperature
group by Timestamp
having count(1)=(select count(1) from Q)
Perhaps this will give a better plan than using OR or IN clause.
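The technique above (join against the wanted pairs, then keep only timestamps that matched every pair) can be sketched with Python's sqlite3, using a VALUES list in place of Oracle's selects from dual. The sample rows are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TempEvents (Timestamp TEXT, readerID INTEGER, temperature INTEGER)")
conn.executemany("INSERT INTO TempEvents VALUES (?, ?, ?)", [
    ("2016-01-01 05:45:00", 1, 10),   # all three readers match here
    ("2016-01-01 05:45:00", 2, 15),
    ("2016-01-01 05:45:00", 3, 20),
    ("2016-01-01 05:46:00", 1, 10),   # readers 2 and 3 don't match here
    ("2016-01-01 05:46:00", 2, 99),
])

rows = conn.execute("""
    WITH Q(readerID, temperature) AS (
        VALUES (1, 10), (2, 15), (3, 20)
    )
    SELECT T.Timestamp
    FROM TempEvents T JOIN Q
      ON T.readerID = Q.readerID AND T.temperature = Q.temperature
    GROUP BY T.Timestamp
    HAVING count(*) = (SELECT count(*) FROM Q)
""").fetchall()
print(rows)
```

Only 05:45:00 survives: it is the only timestamp whose match count equals the number of wanted pairs. This is the classic relational-division pattern, so adding a fourth reader only means adding one row to Q.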
If the number of readers you have to query is not too large you might try using a join-query like
select distinct Timestamp
from TempEvents t1
join TempEvents t2 using(Timestamp)
join TempEvents t3 using(Timestamp)
where t1.readerID=01 and t1.temperature = 10
and t2.readerID=02 and t2.temperature = 15
and t3.readerID=03 and t3.temperature = 20
But to be honest, I doubt it will perform better than your INTERSECT query.
I have a table with the following data (paypal transactions):
txn_type | date | subscription_id
----------------+----------------------------+---------------------
subscr_signup | 2014-01-01 07:53:20 | S-XXX01
subscr_signup | 2014-01-05 10:37:26 | S-XXX02
subscr_signup | 2014-01-08 08:54:00 | S-XXX03
subscr_eot | 2014-03-01 08:53:57 | S-XXX01
subscr_eot | 2014-03-05 08:58:02 | S-XXX02
I want to get the average subscription length overall for a given time period (subscr_eot is the end of a subscription). In the case of a subscription that is still ongoing ('S-XXX03') I want it to be included from its start date until now in the average.
How would I go about doing this with an SQL statement in Postgres?
SQL Fiddle. Subscription length for each subscription:
select
subscription_id,
coalesce(t2.date, current_timestamp) - t1.date as subscription_length
from
(
select *
from t
where txn_type = 'subscr_signup'
) t1
left join
(
select *
from t
where txn_type = 'subscr_eot'
) t2 using (subscription_id)
order by t1.subscription_id
The average:
select
avg(coalesce(t2.date, current_timestamp) - t1.date) as subscription_length_avg
from
(
select *
from t
where txn_type = 'subscr_signup'
) t1
left join
(
select *
from t
where txn_type = 'subscr_eot'
) t2 using (subscription_id)
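The signup/eot self-join can be sketched with Python's sqlite3. SQLite has no interval type, so julianday() differences (days, as floats) stand in for Postgres interval arithmetic, and a fixed reference date '2014-04-01' replaces current_timestamp to keep the output reproducible, both assumptions of this sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (txn_type TEXT, date TEXT, subscription_id TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    ("subscr_signup", "2014-01-01", "S-XXX01"),
    ("subscr_signup", "2014-01-05", "S-XXX02"),
    ("subscr_signup", "2014-01-08", "S-XXX03"),   # no eot row: still ongoing
    ("subscr_eot",    "2014-03-01", "S-XXX01"),
    ("subscr_eot",    "2014-03-05", "S-XXX02"),
])

# Per-subscription length in days; the LEFT JOIN leaves t2.date NULL for
# ongoing subscriptions, so coalesce substitutes the reference date.
rows = conn.execute("""
    SELECT t1.subscription_id,
           julianday(coalesce(t2.date, '2014-04-01')) - julianday(t1.date)
    FROM (SELECT * FROM t WHERE txn_type = 'subscr_signup') t1
    LEFT JOIN (SELECT * FROM t WHERE txn_type = 'subscr_eot') t2
           USING (subscription_id)
    ORDER BY t1.subscription_id
""").fetchall()
print(rows)
```

Both closed subscriptions ran 59 days, the open one is counted at 83 days up to the reference date; wrapping the expression in avg() gives the overall average, as in the Postgres query above.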
I used a couple of common table expressions; you can take the pieces apart pretty easily to see what they do.
One of the reasons this SQL is complicated is because you're storing column names as data. (subscr_signup and subscr_eot are actually column names, not data.) This is a SQL anti-pattern; expect it to cause you much pain.
with subscription_dates as (
select
p1.subscription_id,
p1.date as subscr_start,
coalesce((select min(p2.date)
from paypal_transactions p2
where p2.subscription_id = p1.subscription_id
and p2.txn_type = 'subscr_eot'
and p2.date > p1.date), current_date) as subscr_end
from paypal_transactions p1
where txn_type = 'subscr_signup'
), subscription_days as (
select subscription_id, subscr_start, subscr_end, (subscr_end - subscr_start) + 1 as subscr_days
from subscription_dates
)
select avg(subscr_days) as avg_days
from subscription_days
-- add your date range here.
avg_days
--
75.6666666666666667
I didn't add your date range as a WHERE clause, because it's not clear to me what you mean by "a given time period".
Using the window function lag(), this becomes considerably shorter:
SELECT avg(ts_end - ts) AS avg_subscr
FROM (
SELECT txn_type, ts, lag(ts, 1, localtimestamp)
OVER (PARTITION BY subscription_id ORDER BY txn_type) AS ts_end
FROM t
) sub
WHERE txn_type = 'subscr_signup';
SQL Fiddle.
lag() conveniently takes a default value for missing rows. Exactly what we need here, so we don't need COALESCE in addition.
The query builds on the fact that subscr_eot sorts before subscr_signup.
Probably faster than presented alternatives so far because it only needs a single sequential scan - even though the window functions add some cost.
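The lag() trick can be sketched with Python's sqlite3 (SQLite 3.25+ has window functions, including lag() with a default argument). As above, 'subscr_eot' sorts before 'subscr_signup' within each subscription_id, so on the signup row lag() picks up the eot date; the fixed default '2014-04-01' stands in for localtimestamp to keep the output stable:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (txn_type TEXT, ts TEXT, subscription_id TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    ("subscr_signup", "2014-01-01", "S-XXX01"),
    ("subscr_eot",    "2014-03-01", "S-XXX01"),
    ("subscr_signup", "2014-01-08", "S-XXX03"),   # ongoing, no eot row
])

rows = conn.execute("""
    SELECT subscription_id,
           julianday(ts_end) - julianday(ts) AS len_days
    FROM (
        SELECT txn_type, ts, subscription_id,
               lag(ts, 1, '2014-04-01')       -- 3rd arg: default for missing row
                 OVER (PARTITION BY subscription_id ORDER BY txn_type) AS ts_end
        FROM t
    )
    WHERE txn_type = 'subscr_signup'
    ORDER BY subscription_id
""").fetchall()
print(rows)
```

A single scan of t is enough: the closed subscription comes out at 59 days, the open one at 83 days relative to the default date, with no join and no COALESCE.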
Using the column ts instead of date for three reasons:
Your "date" is actually a timestamp.
"date" is a reserved word in standard SQL (even if it's allowed in Postgres).
Never use basic type names as identifiers.
Using localtimestamp instead of now() or current_timestamp since you are obviously operating with timestamp [without time zone].
Also, your columns txn_type and subscription_id should not be text.
Maybe use an enum for txn_type and integer for subscription_id. That would make the table and indexes considerably smaller and faster.
For the query at hand, the whole table has to be read and indexes won't help - except for a covering index in Postgres 9.2+, if you need the read performance:
CREATE INDEX t_foo_idx ON t (subscription_id, txn_type, ts);