comparison query taking ages - sql

My query is quite simple:
select a.ID, a.adres, a.place, a.postalcode
from COMPANIES a, COMPANIES b
where a.Postcode = b.Postcode
and a.Adres = b.Adres
and (
select COUNT(COMPANYID)
from USERS
where COMPANYID=a.ID
)>(
select COUNT(COMPANYID)
from USERS
where COMPANYID=b.ID
)
Database: SQL Server 2008 R2
What I'm trying to do:
The COMPANIES table contains duplicate entries. I want to know which of the duplicates are connected to the largest number of users, so that I only have to change the foreign keys of the ones with the fewest. (I already know the IDs of the duplicates.)
Right now it takes a long time to complete. I was wondering if it could be done faster.

Try this version. It will probably be only a little faster, because the correlated COUNT subqueries are the slow part. I've added a.ID <> b.ID to rule out comparing a company with itself early.
select a.ID, a.adres, a.place, a.postalcode
from COMPANIES a INNER JOIN COMPANIES b
ON
a.ID <> b.ID
and a.Postcode = b.Postcode
and a.Adres = b.Adres
and (
select COUNT(COMPANYID)
from USERS
where COMPANYID=a.ID
)>(
select COUNT(COMPANYID)
from USERS
where COMPANYID=b.ID
)
The FROM ... INNER JOIN ... ON ... form is the preferred SQL construct for joining tables. It may be faster too, although for inner joins the optimizer often produces the same plan for both forms.

One approach would be to pre-calculate the COMPANYID count before doing the join, since otherwise you'll be calculating it repeatedly in the main query, e.g. something like:
select COMPANYID as ID, COUNT(COMPANYID) as IDCount
into #CompanyCount
from USERS
group by COMPANYID
Then your main query:
select a.ID, a.adres, a.place, a.postalcode
from COMPANIES a
inner join #CompanyCount aCount on aCount.ID = a.ID
inner join COMPANIES b on b.Postcode = a.Postcode and b.Adres = a.Adres
inner join #CompanyCount bCount on bCount.ID = b.ID and aCount.IDCount > bCount.IDCount
If you want all instances of a even though there is no corresponding b then you'd need to have left outer joins to b and bCount.
However, you need to look at the query plan to see which indexes are being used. As a minimum you probably want indexes on the IDs and on the Postcode and Adres columns, since you're joining on them.
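A minimal sketch of the pre-aggregation idea, using Python's built-in sqlite3 module as an in-memory stand-in for SQL Server (so a SQLite TEMP table plays the role of #CompanyCount). The table and column names follow the question; the sample rows, the USERID column and the index name are invented for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE COMPANIES (ID INTEGER PRIMARY KEY, Adres TEXT, Place TEXT, Postcode TEXT);
CREATE TABLE USERS (USERID INTEGER PRIMARY KEY, COMPANYID INTEGER);
-- Index on the join columns, as suggested above (hypothetical name).
CREATE INDEX ix_companies_addr ON COMPANIES (Postcode, Adres);
-- Companies 1 and 2 are duplicates (same address); 3 is unique.
INSERT INTO COMPANIES VALUES (1, 'Main St 1', 'Town', '1234AB'),
                             (2, 'Main St 1', 'Town', '1234AB'),
                             (3, 'Side St 9', 'Town', '5678CD');
INSERT INTO USERS (COMPANYID) VALUES (1), (1), (1), (2), (3);
-- Pre-calculate the user count per company once, instead of once per joined row.
CREATE TEMP TABLE CompanyCount AS
    SELECT COMPANYID AS ID, COUNT(COMPANYID) AS IDCount
    FROM USERS
    GROUP BY COMPANYID;
""")

# Join each company to its duplicates and keep the one with more users.
rows = conn.execute("""
SELECT a.ID, a.Adres, a.Place, a.Postcode
FROM COMPANIES a
INNER JOIN CompanyCount aCount ON aCount.ID = a.ID
INNER JOIN COMPANIES b ON b.Postcode = a.Postcode AND b.Adres = a.Adres
INNER JOIN CompanyCount bCount ON bCount.ID = b.ID AND aCount.IDCount > bCount.IDCount
""").fetchall()
print(rows)  # [(1, 'Main St 1', 'Town', '1234AB')]
```

Company 1 has three users and its duplicate (company 2) has one, so only company 1 is returned. Companies with no users at all drop out of the inner joins, which is where the left outer joins mentioned above would come in.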

Build an index on postcode and adres.
The database probably executes the subselects for every row. (I'm just guessing here; verify it in the explain plan.) If this is the case, you can rewrite the query to join with inline views (note this is how it would look in Oracle; I hope it works in SQL Server as well):
select distinct a.ID, a.adres, a.place, a.postalcode
from
COMPANIES a,
COMPANIES b,
(
select COUNT(COMPANYID) cnt, companyid
from USERS
group by companyid) cntA,
(
select COUNT(COMPANYID) cnt, companyid
from USERS
group by companyid) cntb
where a.Postcode = b.Postcode
and a.Adres = b.Adres
and a.ID<>b.ID
and cnta.companyid = a.ID
and cntb.companyid = b.ID
and cnta.cnt>cntb.cnt

Related

Is there any alternative way to write the following query in Oracle?

select
a.id,
a.name,
b.group,
a.accountno,
(
select ci.cardno
from taccount ac, tcardinfo ci
where ac.accountno = ci.accountno
) as card_no
from tstudent a, tgroup b
where a.id = b.id
And how can I select more than one field from (select ci.cardno from taccount ac,tcardinfo ci where ac.accountno = ci.accountno), or is there another way?
Please note that there is no explicit relation between the two queries (main and subquery), yet the subquery's value depends on the data of the main query. The main query is a set of data from joining multiple tables, and the subquery is also a set of data from joining multiple tables.
In essence, you are describing a lateral join. Oracle supports this since version 12c (as LATERAL, CROSS APPLY and OUTER APPLY).
Your query is rather unclear about which table each column comes from (I made assumptions that you might need to review), and you seem to be missing a join condition in the subquery (I put question marks in that spot). But the idea is:
select
s.id,
s.name,
g.group,
s.accountno,
x.*
from tstudent s
inner join tgroup g on g.id = s.id
outer apply (
select ci.cardno
from taccount ac
inner join tcardinfo ci on ????
where ac.accountno = s.accountno
) x
You can then return more columns from the subquery, and they will show up in the result set.

Use result of multiple rows to do arithmetic operation

I'm writing a query to multiply the count that I receive from a subquery by the fees amount, but I don't know how to do that. Any help/suggestion?
Oracle query is:
select courseid,coursename,fees*tmp
from course c join registration r on
r.courseid=c.courseid
and tmp IN (select count(*)
from course c join registration r on
r.courseid=c.courseid group by coursename);
I tried to use tmp like a variable, but I don't think that works in an Oracle query. Is there an alternative way to do this?
You can't do that, because you can only select data from the tables that appear between FROM and WHERE. The IN operator is a shorthand that saves you writing a bunch of OR conditions; it cannot establish a variable in the outer query.
Instead do something like:
select c.courseid, c.coursename, c.fees * COUNT(r.courseid) OVER (PARTITION BY c.coursename)
from course c join registration r on
r.courseid=c.courseid
Edit/update: you noted that this query produces too many rows and you only want to see distinct course names. In that case it would be better to just use the registrations table to count the number of people on each course and then multiply that by the fees:
SELECT
c.courseid, c.coursename, c.fees * COALESCE(r.numberOfstudents, 0) as courseWorth
FROM
course c
LEFT OUTER JOIN
(select courseid, COUNT(*) as numberofstudents FROM registration GROUP BY courseid) r
ON c.courseID = r.courseid
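A small runnable sketch of the aggregate-then-join query above, using Python's sqlite3 module in place of Oracle. The sample courses, fees and the regid column are invented. Courses without registrations still appear, with a worth of 0, thanks to the LEFT OUTER JOIN plus COALESCE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE course (courseid INTEGER PRIMARY KEY, coursename TEXT, fees INTEGER);
CREATE TABLE registration (regid INTEGER PRIMARY KEY, courseid INTEGER);
INSERT INTO course VALUES (1, 'SQL', 100), (2, 'Java', 200), (3, 'Go', 300);
-- Two students on SQL, one on Java, none on Go.
INSERT INTO registration (courseid) VALUES (1), (1), (2);
""")

# Count registrations per course first, then multiply by the course fees.
rows = conn.execute("""
SELECT c.courseid, c.coursename, c.fees * COALESCE(r.numberofstudents, 0) AS courseWorth
FROM course c
LEFT OUTER JOIN
    (SELECT courseid, COUNT(*) AS numberofstudents FROM registration GROUP BY courseid) r
    ON c.courseid = r.courseid
ORDER BY c.courseid
""").fetchall()
print(rows)  # [(1, 'SQL', 200), (2, 'Java', 200), (3, 'Go', 0)]
```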
You can use a windowing function as Caius did, or you can use a join like this:
select c.courseid, c.coursename, c.fees * COALESCE(sub.cnt,0)
from course c
join registration r on r.courseid=c.courseid
left join (
select coursename, count(*) as cnt
from course c2
join registration r2 on r2.courseid=c2.courseid
group by coursename
) sub on sub.coursename = c.coursename;
Note: I make no claim that your joins are correct. I'm basing this query on your example, not on any knowledge of your data model.

How to return rows matched in a table without multiple EXISTS clauses?

I want to pull back results from one table that match ALL specified values where the specified values are in another table. I can do it like this:
SELECT * FROM Contacts
WHERE
EXISTS (SELECT 1 FROM dbo.ContactClassifications WHERE ContactID = Contacts.ID AND ClassificationID = '8C62E5DE-00FC-4994-8127-000B02E10DA5')
AND EXISTS (SELECT 1 FROM dbo.ContactClassifications WHERE ContactID = Contacts.ID AND ClassificationID = 'D2E90AA0-AC93-4406-AF93-0020009A34BA')
AND EXISTS etc...
However that falls over when I get up to about 40 EXISTS clauses. The error message is "The query processor ran out of internal resources and could not produce a query plan. This is a rare event and only expected for extremely complex queries or queries that reference a very large number of tables or partitions. Please simplify the query."
The gist of this is to:
1. Select all contacts with any GUID from the IN list.
2. Use COUNT(DISTINCT ...) to count the matching GUIDs for each contact ID.
3. Use HAVING to keep only those contacts whose count equals the number of GUIDs you've put in the IN list.
SQL Statement
SELECT *
FROM dbo.Contacts c
INNER JOIN (
SELECT c.ID
FROM dbo.Contacts c
INNER JOIN dbo.ContactClassifications cc ON c.ID = cc.ContactID
WHERE cc.ClassificationID IN ('..', '..', 38 other GUIDS)
GROUP BY
c.ID
HAVING COUNT(DISTINCT cc.ClassificationID) = 40
) cc ON cc.ID = c.ID
Test script at data.stackexchange
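A minimal sketch of the GROUP BY / HAVING COUNT(DISTINCT ...) approach, run against SQLite through Python's sqlite3 module. Short strings like 'A' and 'B' stand in for the GUIDs, and the contact names are invented. Binding the wanted values as parameters keeps the statement the same shape however many classifications you pass; this assumes the wanted list has no duplicates, since its length is compared with the distinct count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Contacts (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE ContactClassifications (ContactID INTEGER, ClassificationID TEXT);
INSERT INTO Contacts VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
INSERT INTO ContactClassifications VALUES
    (1, 'A'), (1, 'B'), (1, 'C'),   -- Ann has both wanted classifications
    (2, 'A'),                       -- Bob has only one
    (3, 'B'), (3, 'A');             -- Cid has both
""")

wanted = ["A", "B"]  # stand-ins for the GUIDs; must be distinct values
placeholders = ", ".join("?" for _ in wanted)
sql = f"""
SELECT c.*
FROM Contacts c
INNER JOIN (
    SELECT cc.ContactID
    FROM ContactClassifications cc
    WHERE cc.ClassificationID IN ({placeholders})
    GROUP BY cc.ContactID
    -- keep only contacts that matched every wanted value
    HAVING COUNT(DISTINCT cc.ClassificationID) = ?
) m ON m.ContactID = c.ID
ORDER BY c.ID
"""
rows = conn.execute(sql, [*wanted, len(wanted)]).fetchall()
print(rows)  # [(1, 'Ann'), (3, 'Cid')]
```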
One solution is to demand that no wanted classification exists without a matching row for the contact. That's a double negation:
select *
from contacts c
where not exists
(
select *
from ContactClassifications cc
where cc.ClassificationID in (...)
and not exists
(
select *
from ContactClassifications cc2
where cc2.ContactID = c.ID
and cc2.ClassificationID = cc.ClassificationID
)
)
This type of problem is known as relational division.
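The double-negation form of relational division can be tried out in SQLite via Python's sqlite3 module. This sketch assumes the required classifications sit in a hypothetical helper table called Wanted (in SQL Server this could be a temp table or table variable filled with the GUIDs). A contact is kept when no wanted classification is missing for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Contacts (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE ContactClassifications (ContactID INTEGER, ClassificationID TEXT);
-- Hypothetical helper table holding the classifications a contact must have.
CREATE TABLE Wanted (ClassificationID TEXT PRIMARY KEY);
INSERT INTO Contacts VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
INSERT INTO ContactClassifications VALUES
    (1, 'A'), (1, 'B'), (1, 'C'), (2, 'A'), (3, 'B'), (3, 'A');
INSERT INTO Wanted VALUES ('A'), ('B');
""")

# Keep a contact if there is no wanted classification it lacks.
rows = conn.execute("""
SELECT c.*
FROM Contacts c
WHERE NOT EXISTS (
    SELECT *
    FROM Wanted w
    WHERE NOT EXISTS (
        SELECT *
        FROM ContactClassifications cc2
        WHERE cc2.ContactID = c.ID
          AND cc2.ClassificationID = w.ClassificationID
    )
)
ORDER BY c.ID
""").fetchall()
print(rows)  # [(1, 'Ann'), (3, 'Cid')]
```

Note that with an empty Wanted table every contact qualifies, because there is nothing left to be missing.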
SELECT c.*
FROM Contacts c
INNER JOIN
(SELECT cc.ContactID, COUNT(DISTINCT cc.ClassificationID) as num_class
FROM ContactClassifications cc
WHERE cc.ClassificationID IN (....)
GROUP BY cc.ContactID
) b ON c.ID = b.ContactID
WHERE b.num_class = [number of distinct values - how many different values you put in "IN"]
If you run SQL Server 2005 or later, you can do much the same with CROSS APPLY, supposedly more efficiently.

Oracle SQL: Joining at most one associated entity

I have tables Building and Address, where each Building is associated with 0..n Addresses.
I'd like to list Buildings with an associated Address. If a Building has several entrances, and thus several Addresses, I don't care which one is displayed. If a Building has no known addresses, the address fields should be null.
That is, I want something like a left join that joins each row at most once.
How can I express this in Oracle SQL?
PS: My query will include rather involved restrictions on both tables. Therefore, I'd like to avoid repeating those restrictions in the query text.
I would consider querying the address in the SELECT clause, e.g.:
SELECT b.*
,(SELECT a.text
FROM addresses a
WHERE a.buildingid = b.id
AND ROWNUM=1) as atext
FROM building b;
The ROWNUM=1 means "just get one if there are any, don't care which".
The advantage of this approach is that it will probably perform better than most alternatives, as long as a suitable index on addresses.buildingid exists. It will stop looking for more addresses as soon as it finds one for each building queried.
The downside of this approach is that if you want multiple columns from the address table, you can't, although you can concatenate them together into one string.
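Here is a minimal sketch of the scalar-subquery idea run through Python's sqlite3 module. SQLite has no ROWNUM, so LIMIT 1 is swapped in to mean "just get one if there are any". The sample buildings, addresses and the index name are invented. Since the chosen address is arbitrary, the result for a building with several entrances may be either of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE building (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE addresses (id INTEGER PRIMARY KEY, buildingid INTEGER, text TEXT);
-- The index on addresses.buildingid that makes the lookup cheap.
CREATE INDEX ix_addresses_building ON addresses (buildingid);
INSERT INTO building VALUES (1, 'Library'), (2, 'Shed');
-- The library has two entrances; the shed has no known address.
INSERT INTO addresses VALUES (10, 1, 'North St 1'), (11, 1, 'South St 2');
""")

rows = conn.execute("""
SELECT b.*,
       (SELECT a.text
        FROM addresses a
        WHERE a.buildingid = b.id
        LIMIT 1) AS atext   -- SQLite's LIMIT 1 plays the role of ROWNUM = 1
FROM building b
ORDER BY b.id
""").fetchall()
print(rows)
```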
Because you don't care which of many addresses is displayed:
Oracle 9i+:
WITH summary AS (
SELECT b.*,
a.*,
ROW_NUMBER() OVER (PARTITION BY b.building_id ORDER BY NULL) rn
FROM BUILDINGS b
LEFT JOIN ADDRESSES a ON a.building_id = b.building_id)
SELECT s.*
FROM summary s
WHERE s.rn = 1
Non-Subquery Factoring Equivalent:
SELECT s.*
FROM (SELECT b.*,
a.*,
ROW_NUMBER() OVER (PARTITION BY b.building_id ORDER BY NULL) rn
FROM BUILDINGS b
LEFT JOIN ADDRESSES a ON a.building_id = b.building_id) s
WHERE s.rn = 1
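The ROW_NUMBER approach can be tried out in SQLite (version 3.25 or later is needed for window functions) through Python's sqlite3 module. This sketch lists explicit columns instead of b.*, a.* to avoid two building_id columns. It also orders by a hypothetical address_id column so the pick is repeatable; ordering by NULL, as in the Oracle version, would make it arbitrary. Sample data is invented.

```python
import sqlite3

# Window functions require SQLite 3.25+ (bundled with recent Python builds).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE BUILDINGS (building_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE ADDRESSES (address_id INTEGER PRIMARY KEY, building_id INTEGER, line1 TEXT);
INSERT INTO BUILDINGS VALUES (1, 'Library'), (2, 'Shed');
INSERT INTO ADDRESSES VALUES (10, 1, 'North St 1'), (11, 1, 'South St 2');
""")

rows = conn.execute("""
WITH summary AS (
    SELECT b.building_id, b.name, a.address_id, a.line1,
           -- number the addresses within each building; rn = 1 is the one we keep
           ROW_NUMBER() OVER (PARTITION BY b.building_id ORDER BY a.address_id) AS rn
    FROM BUILDINGS b
    LEFT JOIN ADDRESSES a ON a.building_id = b.building_id
)
SELECT building_id, name, address_id, line1
FROM summary
WHERE rn = 1
ORDER BY building_id
""").fetchall()
print(rows)  # [(1, 'Library', 10, 'North St 1'), (2, 'Shed', None, None)]
```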
What you could do is put a restriction on the addresses that you join.
For instance, by requiring that there is no address with a lower id:
select *
from building b
left join addresses a on (a.buildingid = b.id)
where not exists (select 1 from addresses a2
where a2.buildingid = b.id and a2.id < a.id)
In this case you will get at most one address per building.
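A quick check of the "no address with a lower id" restriction, run in SQLite through Python's sqlite3 module with invented sample rows (the line1 column is hypothetical). The building with two addresses keeps only the lower id. The building with none is still returned with NULL address columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE building (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE addresses (id INTEGER PRIMARY KEY, buildingid INTEGER, line1 TEXT);
INSERT INTO building VALUES (1, 'Library'), (2, 'Shed');
INSERT INTO addresses VALUES (10, 1, 'North St 1'), (11, 1, 'South St 2');
""")

rows = conn.execute("""
SELECT b.id, b.name, a.id, a.line1
FROM building b
LEFT JOIN addresses a ON a.buildingid = b.id
-- Keep the joined address only if the building has no address with a lower id.
-- A building without addresses has no a2 rows either, so NOT EXISTS succeeds
-- and the building is kept with NULL address columns.
WHERE NOT EXISTS (SELECT 1 FROM addresses a2
                  WHERE a2.buildingid = b.id AND a2.id < a.id)
ORDER BY b.id
""").fetchall()
print(rows)  # [(1, 'Library', 10, 'North St 1'), (2, 'Shed', None, None)]
```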
select b.id, max(a.id) as aid
from building b
left outer join addresses a on (a.buildingid = b.id)
group by b.id
or
select b.*, maxid
from building b
left outer join
(
select buildingid, max(id) as maxid
from addresses
group by buildingid
) a on (a.buildingid = b.id)
Meriton,
This approach uses nested inline views. I've proven this approach on large data sets; it performs very well.
The best way to understand the query is to start from the innermost "M" inline view. I added the count for the sake of debugging and clarity. This identifies the maximum (i.e. most recent?) address id for each building:
select maxa.b_id, max(maxa.a_id) a_id, count(*) c
from address maxa
group by maxa.b_id;
The next "A" inline view uses the above "M" inline view to decide which address to get, then joins to that address id to return a set of address fields:
select ma.b_id, ma.a_id, ma.addr1, ma.addr2, ma.addr3, m.c
from address ma,
( select maxa.b_id, max(maxa.a_id) a_id, count(*) c
from address maxa
group by maxa.b_id ) m
where ma.a_id = m.a_id;
The above "A" inline view delivers a transformed set of addresses to the final query. Whereas the relationship between BUILDING and ADDRESS is 1 to 0..n, the relationship between BUILDING and "A" is 1 to 0..1, a basic outer-join:
select b.b_id, b.b_code, b.b_name, a.*
from building b,
( select ma.b_id, ma.a_id, ma.addr1, ma.addr2, ma.addr3, m.c
from address ma,
( select maxa.b_id, max(maxa.a_id) a_id, count(*) c
from address maxa
group by maxa.b_id ) m
where ma.a_id = m.a_id ) a
where b.b_id = a.b_id (+);
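The nested "M" and "A" inline views can be exercised in SQLite through Python's sqlite3 module. SQLite does not support Oracle's (+) outer-join operator, so this sketch uses an ANSI LEFT JOIN instead; the sample buildings and addresses are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE building (b_id INTEGER PRIMARY KEY, b_code TEXT, b_name TEXT);
CREATE TABLE address (a_id INTEGER PRIMARY KEY, b_id INTEGER,
                      addr1 TEXT, addr2 TEXT, addr3 TEXT);
INSERT INTO building VALUES (1, 'LIB', 'Library'), (2, 'SHD', 'Shed');
INSERT INTO address VALUES (10, 1, 'North St 1', 'Town', 'NL'),
                           (11, 1, 'South St 2', 'Town', 'NL');
""")

rows = conn.execute("""
SELECT b.b_id, b.b_code, b.b_name, a.a_id, a.addr1, a.c
FROM building b
LEFT JOIN (
    -- "A": the full row of the highest-id address per building
    SELECT ma.b_id, ma.a_id, ma.addr1, ma.addr2, ma.addr3, m.c
    FROM address ma
    JOIN (
        -- "M": highest address id and address count per building
        SELECT maxa.b_id, MAX(maxa.a_id) AS a_id, COUNT(*) AS c
        FROM address maxa
        GROUP BY maxa.b_id
    ) m ON ma.a_id = m.a_id
) a ON a.b_id = b.b_id
ORDER BY b.b_id
""").fetchall()
print(rows)  # [(1, 'LIB', 'Library', 11, 'South St 2', 2), (2, 'SHD', 'Shed', None, None, None)]
```

Because "A" holds at most one row per building, the outer join cannot multiply building rows, and the result is the same on every run.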
The key advantages of this approach are:
- Delivers any number of address columns.
- Deterministic: returns exactly the same results each time it is run.
- Does not place undue complexity on your final query, which will surely be more complex than this one.
The "A" inline view can be easily encapsulated within a database view, perhaps call it the LATEST_ADDRESS view:
create view latest_address (b_id, a_id, addr1, addr2, addr3, c) as
select ma.b_id, ma.a_id, ma.addr1, ma.addr2, ma.addr3, m.c
from address ma,
( select maxa.b_id, max(maxa.a_id) a_id, count(*) c
from address maxa
group by maxa.b_id ) m
where ma.a_id = m.a_id;
select b.b_id, b.b_code, b.b_name, a.*
from building b, latest_address a
where b.b_id = a.b_id (+);
Enjoy!
Matthew

Complex join with nested group-by/having clause?

I ultimately need a list of "import" records that include "album" records which only have one "song" each.
This is what I'm using now:
select i.id, i.created_at
from imports i
where i.id in (
select a.import_id
from albums a inner join songs s on a.id = s.album_id
group by a.id having 1 = count(s.id)
);
The nested select (with the join) is blazing fast, but the outer "in" clause is excruciatingly slow.
I tried to make the entire query a single (no nesting) join but ran into problems with the group/having clauses. The best I could do was a list of "import" records with dupes, which is not acceptable.
Is there a more elegant way to compose this query?
How's this?
SELECT i.id,
i.created_at
FROM imports i
INNER JOIN (SELECT a.import_id
FROM albums a
INNER JOIN songs s
ON a.id = s.album_id
GROUP BY a.id
HAVING Count(* ) = 1) AS TEMP
ON i.id = TEMP.import_id;
In many database systems, the JOIN works a lot faster than doing a WHERE ... IN.
SELECT i.id, i.created_at, COUNT(s.album_id)
FROM imports AS i
INNER JOIN albums AS a
ON i.id = a.import_id
INNER JOIN songs AS s
ON a.id = s.album_id
GROUP BY i.id, i.created_at
HAVING COUNT(s.album_id) = 1
(You might not need to include the COUNT in the SELECT list itself. SQL Server doesn't require it, but it's possible that a different RDBMS might.)
Untested:
select
i.id, i.created_at
from
imports i
where
exists (select *
from
albums a
join
songs s on a.id = s.album_id
where
a.import_id = i.id
group by
a.id
having
count(*) = 1)
OR
select
i.id, i.created_at
from
imports i
where
exists (select *
from
albums a
join
songs s on a.id = s.album_id
group by
a.import_id, a.id
having
count(*) = 1 AND a.import_id = i.id)
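A runnable sketch of the first correlated EXISTS variant, using SQLite through Python's sqlite3 module with invented sample rows. Import 3 owns two single-song albums but still appears only once, because EXISTS only asks whether at least one such album exists. This avoids the dupes mentioned in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE imports (id INTEGER PRIMARY KEY, created_at TEXT);
CREATE TABLE albums (id INTEGER PRIMARY KEY, import_id INTEGER);
CREATE TABLE songs (id INTEGER PRIMARY KEY, album_id INTEGER);
INSERT INTO imports VALUES (1, '2009-01-01'), (2, '2009-01-02'), (3, '2009-01-03');
INSERT INTO albums VALUES (100, 1), (101, 1), (200, 2), (300, 3), (301, 3);
-- Albums 100, 300 and 301 have exactly one song; 101 and 200 have two.
INSERT INTO songs (album_id) VALUES (100), (101), (101), (200), (200), (300), (301);
""")

rows = conn.execute("""
SELECT i.id, i.created_at
FROM imports i
WHERE EXISTS (SELECT *
              FROM albums a
              JOIN songs s ON a.id = s.album_id
              WHERE a.import_id = i.id     -- correlate with the outer import
              GROUP BY a.id
              HAVING COUNT(*) = 1)         -- some album of this import has one song
ORDER BY i.id
""").fetchall()
print(rows)  # [(1, '2009-01-01'), (3, '2009-01-03')]
```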
All three suggested techniques should be faster than your WHERE IN:
- Exists with a correlated subquery (gbn)
- Subquery that is inner joined (achinda99)
- Inner joining all three tables (luke)
(All should work, too, so +1 for all of them. Please let us know if one of them does not work!)
Which one actually turns out to be the fastest depends on your data and the execution plan. It's an interesting example of the different ways of expressing the same thing in SQL.
I tried to make the entire query a single (no nesting) join but ran into problems with the group/having clauses.
You can join to the subquery using a CTE (Common Table Expression) if you are using SQL Server 2005/2008.
As far as I know, a CTE is simply an expression that works like a virtual view available only within a single statement, so you will be able to do the following.
I usually find that using a CTE improves query performance as well.
with AlbumSongs as (
select a.import_id
from albums a inner join songs s on a.id = s.album_id
group by a.id
having 1 = count(s.id)
)
select i.id, i.created_at
from imports i
inner join AlbumSongs A on A.import_id = i.id
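A sketch of the CTE-and-join version, run in SQLite through Python's sqlite3 module with invented sample rows. One difference from the query above is deliberate: a plain join returns an import once per single-song album it owns, so this sketch adds DISTINCT to avoid the dupes the question rules out. It also groups by a.id and a.import_id so the selected column is part of the grouping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE imports (id INTEGER PRIMARY KEY, created_at TEXT);
CREATE TABLE albums (id INTEGER PRIMARY KEY, import_id INTEGER);
CREATE TABLE songs (id INTEGER PRIMARY KEY, album_id INTEGER);
INSERT INTO imports VALUES (1, '2009-01-01'), (2, '2009-01-02'), (3, '2009-01-03');
INSERT INTO albums VALUES (100, 1), (101, 1), (200, 2), (300, 3), (301, 3);
-- Albums 100, 300 and 301 have exactly one song; 101 and 200 have two.
INSERT INTO songs (album_id) VALUES (100), (101), (101), (200), (200), (300), (301);
""")

rows = conn.execute("""
WITH AlbumSongs AS (
    SELECT a.import_id
    FROM albums a
    INNER JOIN songs s ON a.id = s.album_id
    GROUP BY a.id, a.import_id
    HAVING COUNT(s.id) = 1
)
SELECT DISTINCT i.id, i.created_at   -- an import may own several single-song albums
FROM imports i
INNER JOIN AlbumSongs A ON A.import_id = i.id
ORDER BY i.id
""").fetchall()
print(rows)  # [(1, '2009-01-01'), (3, '2009-01-03')]
```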