I have two tables, task_list_sharee and task_list_assignee. They both have a reference to a task_list table.
There's also a task table that has a reference to the task_list table since a task always exists within a task_list.
Given a task, I want to find out if either task_list_sharee OR task_list_assignee have values. Right now I'm doing it as two SQL statements, like so:
SELECT count(*)
FROM task_list_assignee a
INNER JOIN task_list l ON l.uid = a.task_list_uid
INNER JOIN task t ON t.task_list_uid = l.uid
WHERE t.uid = ?
SELECT count(*)
FROM task_list_sharee s
INNER JOIN task_list l ON l.uid = s.task_list_uid
INNER JOIN task t ON t.task_list_uid = l.uid
WHERE t.uid = ?
and if either is non-zero I punt. I'm thinking this has to be doable as just a single SQL statement but I'm a bit stumped.
Performance of a full count on multiple joins (even more so for LEFT JOIN) can deteriorate quickly. While all you need is proof for the existence of a single row, there is no need for this. Use EXISTS - true to its name - to allow an optimal query plan:
SELECT EXISTS (
SELECT 1
FROM task t
WHERE t.uid = ? -- provide uid here
AND (
EXISTS (
SELECT 1
FROM task_list_assignee
WHERE task_list_uid = t.task_list_uid
)
OR EXISTS (
SELECT 1
FROM task_list_sharee
WHERE task_list_uid = t.task_list_uid
)
)
);
Should be substantially faster than a full count.
I also cut out the middleman. Joining to task_list only establishes that the related row in task_list exists - which is a waste of time given that:
a task always exists within a task_list.
Ideally implemented with FK constraints to enforce referential integrity.
In the absence of actual table definitions my educated guess will have to do.
To make this fast for any table size, you need 3 btree (default) indexes on
task(uid, task_list_uid)
task_list_assignee(task_list_uid)
task_list_sharee(task_list_uid)
If you start with Task as your base table and do a series of left joins, you should be able to determine which tasks have values for both assignee and sharee with a coalesce:
select count(*)
from task as t
left join
task_list as l
on t.task_list_uid = l.uid
left join
task_list_assignee as a
on l.uid = a.task_list_uid
left join
task_list_sharee as s
on l.uid = s.task_list_uid
where coalesce( a.task_list_uid, s.task_list_uid ) is not null
and t.uid = ?
SQL Fiddle here
Related
select
a.id,
a.name,
b.group,
a.accountno,
(
select ci.cardno
from taccount ac, tcardinfo ci
where ac.accountno = ci.accountno
) as card_no
from tstudent a, tgroup b
where a.id = b.id
And how to select more than one field from (select ci.cardno from taccount ac,tcardinfo ci where ac.accountno = ci.accountno) or any others way
Please note that the is not a relation in two queries (main and subquery). Sub-query value depends on the data of the main query. Main query is set of data by joining multiple table and sub-query is also a set of data by joining multiple table
In essence, you are describing a lateral join. This is available in oracle since version 12.
Your query is rather unclear about from which table each column comes from (I made assumptions, that you might need to review), and you seem to be missing a join condition in the subquery (I added question marks in that spot)... But the idea is:
select
s.id,
s.name,
g.group,
s.accountno,
x.*
from tstudent s
inner join tgroup g on g.id = s.id
outer apply (
select ci.cardno
from taccount ac
inner join tcardinfo ci on ????
where ac.accountno = s.accountno
) x
You can then return more columns to the subquery, and then will show up in the resultset.
I'm using a left join to check if certain types of information have been stored in the database.
I'm wondering if a lot of resources will be wasted if the joined table contains a lot of rows which matches the JOIN clause.
i.e.:
SELECT Applications.*
FROM Applications
LEFT JOIN SomeFeatureRows ON (SomeFeatureRows.ApplicationId = Applications.Id)
WHERE SomeFeatureRows.Id IS NULL;
Do the DB scan through all rows in SomeFeatureRows to see if there is a row where Id is NULL?
I just want to check if there is a row or not in that table (with the specified application id).
Edit, might as well include the real SQL statement:
SELECT organizations.id AS OrganizationId,
organizations.Name,
Application.Id as ApplicationId,
Application.Name as ApplicationName,
Account.id AS AccountId,
Account.Email,
Account.Username ,
SentEmails. SentAtUtc
FROM organizations
INNER JOIN applications ON ( organizations.id = applications.organizationid )
LEFT JOIN Incidents ON ( organizations.id = Incidents.organizationid )
LEFT JOIN SentEmails ON ( organizations.id = SentEmails.organizationid AND EmailTypeName = 'IncidentsReminder')
CROSS apply (SELECT accounts.id,
accounts.email,
accounts.username
FROM accounts,
organizationmembers
WHERE accounts.id = organizationmembers.accountid
AND organizationmembers.organizationid =
organizations.id)
Account
WHERE Incidents.id IS NULL
Here is a very good article explaining the different techniques and performance benefits of using: Not Exists vs. Not In vs. Left join / Is null
To summarize:
LEFT JOIN / IS NULL is less efficient, since it makes no attempt to skip the already matched values in the right table, returning all results and filtering them out instead. Use Not Exists for best performance as it will create a LEFT ANTI SEMI JOIN in the execution plan.
I am trying to filter a single table (master) by the values in multiple other tables (filter1, filter2, filter3 ... filterN) using only joins.
I want the following rules to apply:
(A) If one or more rows exist in a filter table, then include only those rows from the master that match the values in the filter table.
(B) If no rows exist in a filter table, then ignore it and return all the rows from the master table.
(C) This solution should work for N filter tables in combination.
(D) Static SQL using JOIN syntax only, no Dynamic SQL.
I'm really trying to get rid of dynamic SQL wherever possible, and this is one of those places I truly think it's possible, but just can't quite figure it out. Note: I have solved this using Dynamic SQL already, and it was fairly easy, but not particularly efficient or elegant.
What I have tried:
Various INNER JOINS between master and filter tables - works for (A) but fails on (B) because the join removes all records from the master (left) side when the filter (right) side has no rows.
LEFT JOINS - Always returns all records from the master (left) side. This fails (A) when some filter tables have records and some do not.
What I really need:
It seems like what I need is to be able to INNER JOIN on each filter table that has 1 or more rows and LEFT JOIN (or not JOIN at all) on each filter table that is empty.
My question: How would I accomplish this without resorting to Dynamic SQL?
In SQL Server 2005+ you could try this:
WITH
filter1 AS (
SELECT DISTINCT
m.ID,
HasMatched = CASE WHEN f.ID IS NULL THEN 0 ELSE 1 END,
AllHasMatched = MAX(CASE WHEN f.ID IS NULL THEN 0 ELSE 1 END) OVER ()
FROM masterdata m
LEFT JOIN filtertable1 f ON join_condition
),
filter2 AS (
SELECT DISTINCT
m.ID,
HasMatched = CASE WHEN f.ID IS NULL THEN 0 ELSE 1 END,
AllHasMatched = MAX(CASE WHEN f.ID IS NULL THEN 0 ELSE 1 END) OVER ()
FROM masterdata m
LEFT JOIN filtertable2 f ON join_condition
),
…
SELECT m.*
FROM masterdata m
INNER JOIN filter1 f1 ON m.ID = f1.ID AND f1.HasMatched = f1.AllHasMatched
INNER JOIN filter2 f2 ON m.ID = f2.ID AND f2.HasMatched = f2.AllHasMatched
…
My understanding is, filter tables without any matches simply must not affect the resulting set. The output should only consist of those masterdata rows that have matched all the filters where matches have taken place.
SELECT *
FROM master_table mt
WHERE (0 = (select count(*) from filter_table_1)
OR mt.id IN (select id from filter_table_1)
AND (0 = (select count(*) from filter_table_2)
OR mt.id IN (select id from filter_table_2)
AND (0 = (select count(*) from filter_table_3)
OR mt.id IN (select id from filter_table_3)
Be warned that this could be inefficient in practice. Unless you have a specific reason to kill your existing, working, solution, I would keep it.
Do inner join to get results for (A) only and do left join to get results for (B) only (you will have to put something like this in the where clause: filterN.column is null) combine results from inner join and left join with UNION.
Left Outer Join - gives you the MISSING entries in master table ....
SELECT * FROM MASTER M
INNER JOIN APPRENTICE A ON A.PK = M.PK
LEFT OUTER JOIN FOREIGN F ON F.FK = M.PK
If FOREIGN has keys that is not a part of MASTER you will have "null columns" where the slots are missing
I think that is what you looking for ...
Mike
First off, it is impossible to have "N number of Joins" or "N number of filters" without resorting to dynamic SQL. The SQL language was not designed for dynamic determination of the entities against which you are querying.
Second, one way to accomplish what you want (but would be built dynamically) would be something along the lines of:
Select ...
From master
Where Exists (
Select 1
From filter_1
Where filter_1 = master.col1
Union All
Select 1
From ( Select 1 )
Where Not Exists (
Select 1
From filter_1
)
Intersect
Select 1
From filter_2
Where filter_2 = master.col2
Union All
Select 1
From ( Select 1 )
Where Not Exists (
Select 1
From filter_2
)
...
Intersect
Select 1
From filter_N
Where filter_N = master.colN
Union All
Select 1
From ( Select 1 )
Where Not Exists (
Select 1
From filter_N
)
)
I have previously posted a - now deleted - answer based on wrong assumptions on you problems.
But I think you could go for a solution where you split your initial search problem into a matter of constructing the set of ids from the master table, and then select the data joining on that set of ids. Here I naturally assume you have a kind of ID on your master table. The filter tables contains the filter values only. This could then be combined into the statement below, where each SELECT in the eligble subset provides a set of master ids, these are unioned to avoid duplicates and that set of ids are joined to the table with data.
SELECT * FROM tblData INNER JOIN
(
SELECT id FROM tblData td
INNER JOIN fa on fa.a = td.a
UNION
SELECT id FROM tblData td
INNER JOIN fb on fb.b = td.b
UNION
SELECT id FROM tblData td
INNER JOIN fc on fc.c = td.c
) eligible ON eligible.id = tblData.id
The test has been made against the tables and values shown below. These are just an appendix.
CREATE TABLE tblData (id int not null primary key identity(1,1), a varchar(40), b datetime, c int)
CREATE TABLE fa (a varchar(40) not null primary key)
CREATE TABLE fb (b datetime not null primary key)
CREATE TABLE fc (c int not null primary key)
Since you have filter tables, I am assuming that these tables are probably dynamically populated from a front-end. This would mean that you have these tables as #temp_table (or even a materialized table, doesn't matter really) in your script before filtering on the master data table.
Personally, I use the below code bit for filtering dynamically without using dynamic SQL.
SELECT *
FROM [masterdata] [m]
INNER JOIN
[filter_table_1] [f1]
ON
[m].[filter_column_1] = ISNULL(NULLIF([f1].[filter_column_1], ''), [m].[filter_column_1])
As you can see, the code NULLs the JOIN condition if the column value is a blank record in the filter table. However, the gist in this is that you will have to actively populate the column value to blank in case you do not have any filter records on which you want to curtail the total set of the master data. Once you have populated the filter table with a blank, the JOIN condition NULLs in those cases and instead joins on itself with the same column from the master data table. This should work for all the cases you mentioned in your question.
I have found this bit of code to be faster in terms of performance.
Hope this helps. Please let me know in the comments.
This query works about 3 minutes and returns 7279 rows:
SELECT identity(int,1,1) as id, c.client_code, a.account_num,
c.client_short_name, u.uso, us.fio, null as new, null as txt
INTO #ttable
FROM accounts a INNER JOIN Clients c ON
c.id = a.client_id INNER JOIN Uso u ON c.uso_id = u.uso_id INNER JOIN
Magazin m ON a.account_id = m.account_id LEFT JOIN Users us ON
m.user_id = us.user_id
WHERE m.status_id IN ('1','5','9') AND m.account_new_num is null
AND u.branch_id = #branch_id
ORDER BY c.client_code;
The type of 'client_code' field is VARCHAR(6).
Is it possible to somehow optimize this query?
Insert the records in the Temporary table without using Order by Clause and then Sort them using the c.client_code. Hope it should help you.
Create table #temp
(
your columns...
)
and Insert the records in this table Without Using the Order by Clause. Now run the select with Order by Clause
Do you have indexes set up for your tables? An index on foreign key columns as well as Magazin.status might help.
Make sure there is an index on every field used in the JOINs and in the WHERE clause
If one or the tables you select from are actually views, the problem may be in the performance of these views.
Always try to list tables earlier if they are referenced in the where clause - it cuts off row combinations as early as possible. In this case, the Magazin table has some predicates in the where clause, but is listed way down in the tables list. This means that all the other joins have to be made before the Magazin rows can be filtered - possibly millions of extra rows.
Try this (and let us know how it went):
SELECT ...
INTO #ttable
FROM accounts a
INNER JOIN Magazin m ON a.account_id = m.account_id
INNER JOIN Clients c ON c.id = a.client_id
INNER JOIN Uso u ON c.uso_id = u.uso_id
LEFT JOIN Users us ON m.user_id = us.user_id
WHERE m.status_id IN ('1','5','9')
AND m.account_new_num is null
AND u.branch_id = #branch_id
ORDER BY c.client_code;
This kind of optimization can greatly improve query performance.
I have a case where I wanna choose any database entry that have an invalid Country, Region, or Area ID, by invalid, I mean an ID for a country or region or area that no longer exists in my tables, I have four tables: Properties, Countries, Regions, Areas.
I was thinking to do it like this:
SELECT * FROM Properties WHERE
Country_ID NOT IN
(
SELECT CountryID FROM Countries
)
OR
RegionID NOT IN
(
SELECT RegionID FROM Regions
)
OR
AreaID NOT IN
(
SELECT AreaID FROM Areas
)
Now, is my query right? and what do you suggest that i can do and achieve the same result with better performance?!
Your query in fact is optimal.
LEFT JOIN's proposed by others are worse, as they select ALL values and then filter them out.
Most probably your subquery will be optimized to this:
SELECT *
FROM Properties p
WHERE NOT EXISTS
(
SELECT 1
FROM Countries i
WHERE i.CountryID = p.CountryID
)
OR
NOT EXISTS
(
SELECT 1
FROM Regions i
WHERE i.RegionID = p.RegionID
)
OR
NOT EXISTS
(
SELECT 1
FROM Areas i
WHERE i.AreaID = p.AreaID
)
, which you should use.
This query selects at most 1 row from each table, and jumps to the next iteration right as it finds this row (i. e. if it does not find a Country for a given Property, it will not even bother checking for a Region).
Again, SQL Server is smart enough to build the same plan for this query and your original one.
Update:
Tested on 512K rows in each table.
All corresponding ID's in dimension tables are CLUSTERED PRIMARY KEY's, all measure fields in Properties are indexed.
For each row in Property, PropertyID = CountryID = RegionID = AreaID, no actual missing rows (worst case in terms of execution time).
NOT EXISTS 00:11 (11 seconds)
LEFT JOIN 01:08 (68 seconds)
You could rewrite it differently as follows:
SELECT p.*
FROM Properties p
LEFT JOIN Countries c ON p.Country_ID = c.CountryID
LEFT JOIN Regions r on p.RegionID = r.RegionID
LEFT JOIN Areas a on p.AreaID = a.AreaID
WHERE c.CountryID IS NULL
OR r.RegionID IS NULL
OR a.AreaID IS NULL
Test the performance difference (if there is any - there should be as NOT IN is a nasty search, especially over a lot of items as it HAS to test every single one).
You can also make this faster by indexing the IDS being searched - in each master table (Country, Region, Area) they should be clustered primary keys.
Since this seems to be cleanup sql, this should be ok. But how about using foreign keys so that it does not bother you next time around?
Well, you could try things like UNION (instead of OR) - but I expect that the optimizer is already doing the best it can given the information available:
SELECT * FROM Properties
WHERE NOT EXISTS (SELECT 1 FROM Areas WHERE Areas.AreaID = Properties.AreaID)
UNION
SELECT * FROM Properties
WHERE NOT EXISTS (SELECT 1 FROM Regions WHERE Regions.RegionID = Properties.RegionID)
UNION
SELECT * FROM Properties
WHERE NOT EXISTS (SELECT 1 FROM Countries WHERE Countries.CountryID = Properties.CountryID)
Subqueries in the conditions can be quite inefficient. Instead you can do left joins against the related tables. Where there are no matching record you get a null value. You can use this in the condition to select only the records where there is a matching record missing:
select p.*
from Properties p
left join Countries c on c.CountryID = p.Country_ID
left join Regions r on r.RegionID = p.RegionID
left join Areas a on a.AreaID = p.AreaID
where c.CountryID is null or r.RegionID is null or a.AreaID is null
If you're not grabbing the row data from countries/regions/areas you can try using "exists":
SELECT Properties.*
FROM Properties
WHERE Properties.CountryID IS NOT NULL AND NOT EXISTS (SELECT 1 FROM Countries WHERE Countries.CountryID = Properties.CountryID)
OR Properties.RegionID IS NOT NULL AND NOT EXISTS (SELECT 1 FROM Regions WHERE Regions.RegionID = Properties.RegionID)
OR Properties.AreaID IS NOT NULL AND NOT EXISTS (SELECT 1 FROM Areas WHERE Areas.AreaID = Properties.AreaID)
This will typically hint to use the pkey indices of countries et al for the existence check... but whether that is an improvement depends on your data stats, you simply have to plug it into query analyzer and try it.