I have the following query:
select * from
( select volume as vol1 from table1 where code='A1' and daytime='12-may-2012') a,
( select volume as vol2 from table2 where code='A2' and daytime='12-may-2012') b,
( select volume as vol3 from table3 where code='A3' and daytime='12-may-2012') c
result:
vol1 vol2 vol3
20 45
What would be a more efficient way to write this query (in the real case there could be up to 15 subqueries), given that data does not always exist in every one of these tables for the selected date? I think it could be a join, but I'm not sure.
thanks,
S
If the concern is that data might not exist, then a cross join is not the right operator: if any subquery returns zero rows, you will get an empty result set.
Assuming each subquery returns at most one row, just use scalar subqueries in the select:
select (select volume from table1 where code = 'A1' and daytime = date '2012-05-12') as vol1,
(select volume from table2 where code = 'A2' and daytime = date '2012-05-12') as vol2,
(select volume from table3 where code = 'A3' and daytime = date '2012-05-12') as vol3
from dual;
If a value is missing, it will be NULL. If a subquery returns more than one row, then you'll get an error.
I much prefer ANSI standard formats, which is why I use the date keyword.
I am highly suspicious of comparing a field called daytime to a date constant with no time component. I would double-check the logic on this. Perhaps you intend trunc(daytime) = date '2012-05-12' or something similar.
I should also note that if performance is an issue, then you want an index on each table on (code, daytime, volume).
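For reference, the index for table1 would look something like this (a sketch; the index name is illustrative, and you would repeat it per table):
create index table1_code_day_vol_idx on table1 (code, daytime, volume);
Including volume as the trailing column makes this a covering index, so each scalar subquery can be answered from the index alone without touching the table.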
I have 2 columns of phone numbers, and the requirement is to get the numbers which have the same last 8 digits. ColumnA's numbers have 11 digits and columnB's numbers have 9 or 10 digits.
I tried to use SUBSTR, LIKE, and LEFT/RIGHT functions to solve this, but the data set is too big for that approach.
select trunc(ta.timeA), ta.columnA
from table1A ta,
tableB tb
WHERE substr(ta.columnA,-8) LIKE substr(tb.columnB,-8)
and trunc(ta.timeA) = trunc(tb.timeB)
AND trunc(ta.timeA) >= TO_DATE('01/01/2018', 'dd/mm/yyyy')
AND trunc(ta.timeA) < TO_DATE('01/01/2018', 'dd/mm/yyyy') + 1
GROUP BY ta.columnA, trunc(ta.timeA)
You want to select from tableA, so select from tableA only; don't join. You only want the tableA rows that have a match in tableB, so place an EXISTS clause in your WHERE clause.
select trunc(timea), columna
from table1a ta
where trunc(timea) >= date '2018-01-01'
and trunc(timea) < date '2018-01-02'
and exists
(
select *
from tableb tb
where trunc(tb.timeb) = trunc(ta.timea)
and substr(tb.columnb, -8) = substr(ta.columna, -8)
)
order by trunc(timea), columna;
In order to have this run fast, create the following indexes:
create index idxa on table1a( trunc(timea), substr(columna, -8) );
create index idxb on tableb( trunc(timeb), substr(columnb, -8) );
I don't see, however, why you are so eager to have this run fast. Do you want to keep all data as is and run the query again and again? There should be a better solution. Splitting the area code and number into two separate columns is the first thing that comes to mind.
UPDATE: Faster still than the suggested idxa would be a covering index for table1A:
create index idxa on table1a( trunc(timea), substr(columna, -8), columna );
Here the DBMS can work with the index alone and doesn't have to access the table at all. So in case the above is still a bit too slow for you, you can try this altered index.
And as Alex Poole has pointed out in the comments, it should simply be
where trunc(timea) = date '2018-01-01'
if the range you are looking at is always a single day, as in the example.
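As for splitting the number into its own column: in Oracle you could get much of the benefit without touching existing rows by adding a virtual column for the last 8 digits and indexing it (a sketch with illustrative names; repeat for tableB/columnb):
alter table table1a add (last8 varchar2(8) generated always as (substr(columna, -8)) virtual);
create index idxa_last8 on table1a (last8);
Queries can then compare last8 values directly and use the index.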
You can try the query below, using the = operator instead of the LIKE operator, since you want to match the last 8 digits exactly:
select trunc(ta.timeA),ta.columnA
from table1A ta inner join tableB tb
on substr(ta.columnA,-8) = substr(tb.columnB,-8)
and trunc(ta.timeA) = trunc(tb.timeB)
AND trunc(ta.timeA) >= TO_DATE('01/01/2018', 'dd/mm/yyyy')
AND trunc(ta.timeA) < TO_DATE('01/01/2018', 'dd/mm/yyyy') + 1
GROUP BY ta.columnA, trunc(ta.timeA)
It would be easier to help if you were more specific about your SQL environment; below is some advice on this query that would apply in most environments.
When dealing with large data sets, performance becomes even more critical, and small changes in technique can have a big impact.
For example:
LIKE is normally used for a partial match with wildcards; do you not mean equals? LIKE is slower than equals, so if you're not using wildcards I recommend testing for equality instead.
Also, you initially start with a cross (Cartesian product) join, but your where clause then defines very specific match criteria (matching time fields). If you need a matching time field, make it part of the table join; this will reduce the number of join results, which significantly shrinks the data set that the other criteria then need to be applied to.
Also, having calculated date values in your where clause is slow. It is better to set #fromDate and #toDate parameters before the query and then use them in the where clause; they are then effectively literals that don't need to be recalculated for every row.
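For example, in Oracle the date boundaries could be passed as bind variables computed once before the query runs (a sketch; :fromDate and :toDate are illustrative names, not from the question):
select trunc(ta.timeA), ta.columnA
from table1A ta
join tableB tb
on substr(ta.columnA, -8) = substr(tb.columnB, -8)
and trunc(ta.timeA) = trunc(tb.timeB)
where ta.timeA >= :fromDate
and ta.timeA < :toDate
group by ta.columnA, trunc(ta.timeA)
Comparing the raw ta.timeA against the boundaries, rather than trunc(ta.timeA), also leaves the optimizer free to use a plain index on timeA.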
Similar to another question I've posted, given the following table...
Promo EffectiveDate
------ -------------
PromoA 1/1/2016
PromoB 4/1/2016
PromoC 7/1/2016
PromoD 10/1/2016
PromoE 1/1/2017
What is the easiest way to transform it into start and end dates, like so...
Promo StartDate EndDate
------ --------- ---------
PromoA 1/1/2016 4/1/2016
PromoB 4/1/2016 7/1/2016
PromoC 7/1/2016 10/1/2016
PromoD 10/1/2016 1/1/2017
PromoE 1/1/2017 null (ongoing until a new Effective Date is added)
Update
Correlated queries seem to be the simplest solution, but as I understand it, they are extremely inefficient since the subquery has to run once per row of the outer select.
What I was thinking of as a potential solution was something along the lines of selecting the values from the table a second time, eliminating the first result, then pairing the rows up with the first select by ordinal index with a simple left outer join.
As an example, substituting letters for the dates above, the first select would yield A, B, C, D, E and the second would yield B, C, D, E (the first select minus the first record, 'A'). Pairing them up by ordinal index with a simple left outer join would then give A-B, B-C, C-D, D-E, E-null. However, I couldn't figure out the syntax to make that work.
A correlated sub-query can lookup the additional field you need.
SELECT
yourTable.*,
(
SELECT MIN(lookup.EffectiveDate)
FROM yourTable AS lookup
WHERE lookup.EffectiveDate > yourTable.EffectiveDate
) AS EndDate
FROM
yourTable
EDIT
The notion of "has to run once per row" is a misunderstanding of how SQL generates the execution plan that actually runs. The same could be said of joining one table to another; the join also has to be run at least once per row... There is indeed a larger cost to a correlated sub-query, but with appropriate indexes it won't be "extremely high", and the functionality described does warrant it.
If you had another field that was guaranteed to be sequential, then it would be trivial, but do not try to re-use the existing Promo field for that additional purpose.
SELECT
this.*,
next.EffectiveDate
FROM
yourTable this
LEFT JOIN
yourTable next
ON next.sequential_id = this.sequential_id + 1
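If there is no such column, one can be derived on the fly where window functions are supported (a sketch; numbered and rn are illustrative names):
WITH numbered AS (
SELECT Promo, EffectiveDate,
ROW_NUMBER() OVER (ORDER BY EffectiveDate) AS rn
FROM yourTable
)
SELECT this.Promo,
this.EffectiveDate AS StartDate,
next.EffectiveDate AS EndDate
FROM numbered this
LEFT JOIN numbered next
ON next.rn = this.rn + 1
This is exactly the "pairing by ordinal index" idea from the question.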
Yes, you can use a correlated query with LIMIT:
SELECT t.promo,t.effectiveDate as start_date,
(SELECT s.effectiveDate FROM YourTable s
WHERE s.effectiveDate > t.effectiveDate
ORDER BY s.effectiveDate
LIMIT 1) as end_date
FROM YourTable t
EDIT: Here is a solution with a join:
SELECT t.promo,t.effectiveDate as start_date,
MIN(s.effectiveDate) as end_date
FROM YourTable t
LEFT JOIN YourTable s
ON (t.effectiveDate < s.effectiveDate)
GROUP BY t.promo,t.effectiveDate
Something like this, using a subquery:
select
p.promo,
p.EffectiveDate as "Start",
(select n.EffectiveDate from table_promo n where n.EffectiveDate >
p.EffectiveDate order by n.EffectiveDate limit 1) as "End"
from table_promo p
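If your database supports window functions, LEAD() expresses this most directly (a sketch, reusing the table_promo name from the answer above):
select promo,
EffectiveDate as "Start",
lead(EffectiveDate) over (order by EffectiveDate) as "End"
from table_promo
LEAD() returns NULL for the last row by default, which matches the desired "ongoing" marker for PromoE.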
How do I solve the following problem:
Imagine we have a large building with about 100 temperature readers and each one collects the temperature every minute.
I have a rather large table (~100M rows) with the following columns:
Table TempEvents:
Timestamp - one entry per minute
Reader ID - about 100 separate readers
Temperature - Integer (-40 -> +40)
Timestamp and Reader ID are primary+secondary keys to the table. I want to perform a query which finds all the timestamps where reader_01 = 10 degrees, reader_02 = 15 degrees, and reader_03 = 20 degrees.
In other words something like this:
SELECT Timestamp FROM TempEvents
WHERE (readerID=01 AND temperature=10)
AND (readerID=02 AND temperature=15)
AND (readerID=03 AND temperature=20)
==> Resulting in a list of timestamps:
Timestamp:
2016-01-01 05:45:00
2016-02-01 07:23:00
2016-03-01 11:56:00
2016-04-01 23:21:00
The above query returns nothing since a single row does not include all conditions at once. Using OR in between the conditions is also not producing the desired result since all readers should match the condition.
Using INTERSECT, I can get the result by:
SELECT * FROM
(SELECT Timestamp FROM TempEvents WHERE readerID=01 AND temperature=10
INTERSECT SELECT Timestamp FROM TempEvents WHERE readerID=02 AND temperature=15
INTERSECT SELECT Timestamp FROM TempEvents WHERE readerID=03 AND temperature=20
)
GROUP BY Timestamp ORDER BY Timestamp ASC;
The above query is extremely costly and takes about 5 minutes to execute.
Is there a better (quicker) way to get the result?
I just tried this in Oracle DB and it seems to work:
SELECT Timestamp FROM TempEvents
WHERE (readerID=01 AND temperature=10)
OR (readerID=02 AND temperature=15)
OR (readerID=03 AND temperature=20)
Make sure to change only the ANDs outside of the parentheses.
Try this:
with Q(readerID,temperature) as(
select 01, 10 from dual
union all
select 02,15 from dual
union all
select 03,20 from dual
)
select Timestamp FROM TempEvents T, Q
where T.readerID=Q.readerID and T.temperature=Q.temperature
group by Timestamp
having count(1)=(select count(1) from Q)
Perhaps this will give a better plan than using an OR or IN clause.
If the number of readers you have to query is not too large, you might try a join query like
select distinct Timestamp
from TempEvents t1
join TempEvents t2 using(Timestamp)
join TempEvents t3 using(Timestamp)
where t1.readerID=01 and t1.temperature = 10
and t2.readerID=02 and t2.temperature = 15
and t3.readerID=03 and t3.temperature = 20
But to be honest, I doubt it will perform better than your INTERSECT query.
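Another option worth benchmarking (a sketch; the multi-column IN list is valid in Oracle) is to filter down to the candidate rows once and keep only the timestamps where all three readers matched:
SELECT Timestamp
FROM TempEvents
WHERE (readerID, temperature) IN ((01, 10), (02, 15), (03, 20))
GROUP BY Timestamp
HAVING COUNT(DISTINCT readerID) = 3
Since (Timestamp, readerID) is the primary key, each reader matches at most once per timestamp, so a count of 3 means all three conditions held simultaneously.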
I have 2 similar queries which both work on the same table, and I essentially want to combine their results such that the second query supplies default values for what the first query doesn't return. I've simplified the problem as much as possible here. I'm using Oracle btw.
The table has account information in it for a number of accounts, and there are multiple entries for each account, with a commit_date to tell when the account information was inserted. I need to get the account info which was current for a certain date.
The queries take a list of account ids and a date.
Here is the query:
-- Select the row which was current for the accounts for the given date. (won't return anything for an account which didn't exist for the given date)
SELECT actr.*
FROM Account_Information actr
WHERE actr.account_id in (30000316, 30000350, 30000351)
AND actr.commit_date <= to_date('2010-DEC-30', 'YYYY-MON-DD')
AND actr.commit_date =
(
SELECT MAX(actrInner.commit_date)
FROM Account_Information actrInner
WHERE actrInner.account_id = actr.account_id
AND actrInner.commit_date <= to_date('2010-DEC-30', 'YYYY-MON-DD')
)
This looks a little ugly, but it returns a single row for each account which was current for the given date. The problem is that it doesn't return anything if the account didn't exist until after the given date.
Selecting the earliest account info for each account is trivial - I don't need to supply a date for this one:
-- Select the earliest row for the accounts.
SELECT actr.*
FROM Account_Information actr
WHERE actr.account_id in (30000316, 30000350, 30000351)
AND actr.commit_date =
(
SELECT MIN(actrInner.commit_date)
FROM Account_Information actrInner
WHERE actrInner.account_id = actr.account_id
)
But I want to merge the result sets in such a way that:
For each account, if there is account info for it in the first result set - use that.
Otherwise, use the account info from the second result set.
I've researched all of the joins I can use without success. Unions almost do it but they will only merge for unique rows. I want to merge based on the account id in each row.
Sql Merging two result sets - my case is obviously more complicated than that
SQL to return a merged set of results - I might be able to adapt that technique? I'm a programmer being forced to write SQL and I can't quite follow that example well enough to see how I could modify it for what I need.
The standard way to do this is with a left outer join and coalesce. That is, your overall query will look like this:
SELECT ...
FROM defaultQuery
LEFT OUTER JOIN currentQuery ON ...
If you did a SELECT *, each row would correspond to the current account data plus your defaults. With me so far?
Now, instead of SELECT *, for each column you want to return, you do a COALESCE() on matched pairs of columns:
SELECT COALESCE(currentQuery.columnA, defaultQuery.columnA) ...
This will choose the current account data if present, otherwise it will choose the default data.
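Put together for the tables in the question, the pattern might look like this (a sketch, untested; it reuses the two queries from the question as inline views and shows only two of the coalesced columns):
SELECT COALESCE(cur.account_id, def.account_id) AS account_id,
COALESCE(cur.commit_date, def.commit_date) AS commit_date
FROM (
-- earliest row per account: the default data
SELECT actr.*
FROM Account_Information actr
WHERE actr.account_id IN (30000316, 30000350, 30000351)
AND actr.commit_date = (SELECT MIN(i.commit_date)
FROM Account_Information i
WHERE i.account_id = actr.account_id)
) def
LEFT OUTER JOIN (
-- row current as of the given date: may be absent for accounts created later
SELECT actr.*
FROM Account_Information actr
WHERE actr.account_id IN (30000316, 30000350, 30000351)
AND actr.commit_date = (SELECT MAX(i.commit_date)
FROM Account_Information i
WHERE i.account_id = actr.account_id
AND i.commit_date <= to_date('2010-DEC-30', 'YYYY-MON-DD'))
) cur ON cur.account_id = def.account_id;
Extend the COALESCE pairs to every column you need returned.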
You can do this more directly using analytic functions:
select *
from (SELECT actr.*, max(commit_date) over (partition by account_id) as maxCommitDate,
max(case when commit_date <= to_date('2010-DEC-30', 'YYYY-MON-DD') then commit_date end) over
(partition by account_id) as MaxCommitDate2
FROM Account_Information actr
WHERE actr.account_id in (30000316, 30000350, 30000351)
) t
where (MaxCommitDate2 is not null and Commit_date = MaxCommitDate2) or
(MaxCommitDate2 is null and Commit_Date = MaxCommitDate)
The subquery calculates two values, the two possibilities of commit dates. The where clause then chooses the appropriate row, using the logic that you want.
I've combined the other answers. Tried it out at apex.oracle.com. Here's some explanation.
MAX(CASE WHEN commit_date <= to_date('2010-DEC-30', 'YYYY-MON-DD') THEN commit_date END) will give us the latest date that is not after Dec 30th, or NULL if there isn't one. Combining that with a COALESCE, we get
COALESCE(MAX(CASE WHEN commit_date <= to_date('2010-DEC-30', 'YYYY-MON-DD') THEN commit_date END), MAX(commit_date)).
Now we take the account id and commit date we have and join them with the original table to get all the other fields. Here's the whole query that I came up with:
SELECT *
FROM Account_Information
JOIN (SELECT account_id,
COALESCE(MAX(CASE WHEN commit_date <=
to_date('2010-DEC-30', 'YYYY-MON-DD')
THEN commit_date END),
MAX(commit_date)) AS commit_date
FROM Account_Information
WHERE account_id in (30000316, 30000350, 30000351)
GROUP BY account_id)
USING (account_id, commit_date);
Note that if you do use USING, you have to use * instead of actr.*.
I have 2 tables with no relation between them. Both tables have different numbers of columns, but a few columns are the same, though they hold different data. I was able to create a function or view of only the data I wanted, but when I try to count the data filtered by date, I always get the wrong count back. Let me explain by showing the 2 functions and what I am trying to do:
Function 1
ID - number from 1 to 8
data sent - YES or NO
Date - date value
Function 2
ID - number from 1 to 8
data sent - yes or no
date - date value
Upon running both separately, I get all the rows from the tables and everything looks good.
Then I try to add the following to each function:
select
count([data sent]), ID
from function1
Where (date between #date1 and #date2)
group by ID
The above statement works great and gives me the right result for each function.
Now I thought: what if I combine those 2 functions into one and get the counts from both functions on one page?
So I created the following function:
Function 3
select
count(Function1.[data sent]) as Expr1,
Function1.id,
count(Function2.[data sent]) as Expr2,
Function1.date
from
Function1
LEFT OUTER JOIN
Function2 on Function1.id = Function2.id
Where
(Function1.date between #date1 and #date2)
group by
Function1.id
Upon running the above, I get the following table:
ID Expr1 Expr2
On both Expr1 and Expr2, I get results that I cannot account for. I guess something is being multiplied by 100000, since one table holds almost 15000 rows and the other around 5000 rows.
What I would like to know first is whether it is possible at all to filter by date and count records from both tables at the same time. If anyone needs more information, please let me know and I will be glad to share and explain more.
Thank you
The LEFT OUTER JOIN is taking each row of the left table, finding ALL of the rows in the right table with the same id field, and creating that many rows in the result table. Since id isn't what we usually think of as an identity field (it looks more like a "deviceId" or something), you'll get lots of matches for each one. Repeat 15000 times and you get your combinatorial explosion.
Tip: To debug things like this, you can create sample tables with a tiny subset of the real data, say 10 rows from each, and run your query on them. You'll see the issue immediately.
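For instance, assuming a SQL Server-style environment (the bracketed column names suggest one), two tiny scratch tables could be built like this (the _sample names are illustrative):
-- copy 10 arbitrary rows from each source into scratch tables
SELECT TOP (10) * INTO Function1_sample FROM Function1;
SELECT TOP (10) * INTO Function2_sample FROM Function2;
Run the problem query against the samples and the multiplied rows become easy to inspect by eye.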
It's possible to filter by date. It's hard to recommend an actual solution without better understanding your phrase "I want to add those 2 functions into one and get the count from both functions on 1 page".
Why can't you create a temporary table for each function then join them together?
Maybe subqueries can help you to achieve what you want:
SELECT
ID = COALESCE(f1.ID, f2.ID),
Date = COALESCE(f1.Date, f2.Date),
f1.Expr1,
f2.Expr2
FROM (
SELECT
ID,
Date,
Expr1 = COUNT([data sent])
FROM Function1
WHERE Date BETWEEN #date1 AND #date2
GROUP BY
ID,
Date
) f1
FULL JOIN (
SELECT
ID,
Date,
Expr2 = COUNT([data sent])
FROM Function2
WHERE Date BETWEEN #date1 AND #date2
GROUP BY
ID,
Date
) f2
ON f1.ID = f2.ID AND f1.Date = f2.Date
This query also uses a full (outer) join instead of a left join, in case the right side of the join contains rows that have no match on the left side (and you want those rows).