I have an "article" table and a "used" table for registering rentals.
I want to know which articles are free, or in other words, the ones that have never been rented (table article) or the ones that have been returned (table used).
I have 2 separate queries and they work the way I expected, but I'd like to combine them into a single query.
First query
SELECT a.article_id, a.mark, a.type, a.description
FROM article a
INNER JOIN used u ON u.article_id = a.article_id
WHERE return_date IS NOT NULL
Second query
SELECT article_id, mark, type
FROM article
WHERE NOT EXISTS
(SELECT *
FROM used
WHERE article.article_id = used.article_id)
The first query returns 25 records, while the second query returns 113 records. The final output should return 138 records.
How can I do it?
Thanks in advance for your help.
This is typically done with the UNION ALL operator, which appends the rows of one query to the rows of the other. Make sure both queries return the same number of columns, with compatible data types in corresponding positions.
SELECT a.article_id, a.mark, a.type
FROM article a
INNER JOIN used u ON u.article_id = a.article_id
WHERE u.return_date IS NOT NULL
UNION ALL
SELECT article_id, mark, type
FROM article
WHERE NOT EXISTS
(SELECT *
FROM used
WHERE article.article_id = used.article_id)
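That said, it seems you can simplify the whole thing into a single LEFT JOIN and avoid running two queries. Keep the rows that have no match in used, plus the rows whose rental has been returned:
SELECT a.article_id, a.mark, a.type
FROM article a
LEFT JOIN used u
ON u.article_id = a.article_id
WHERE u.article_id IS NULL
OR u.return_date IS NOT NULL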
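To check that the two approaches agree, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are made up for illustration, and the LEFT JOIN version assumes the filter "no match in used, or a non-null return_date":

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (article_id INTEGER PRIMARY KEY, mark TEXT, type TEXT);
    CREATE TABLE used (article_id INTEGER, return_date TEXT);
    INSERT INTO article VALUES (1, 'A', 'x'), (2, 'B', 'y'), (3, 'C', 'z');
    -- article 1 was rented and returned, article 2 is still out, article 3 was never rented
    INSERT INTO used VALUES (1, '2020-01-05'), (2, NULL);
""")

union_sql = """
    SELECT a.article_id FROM article a
    INNER JOIN used u ON u.article_id = a.article_id
    WHERE u.return_date IS NOT NULL
    UNION ALL
    SELECT article_id FROM article
    WHERE NOT EXISTS (SELECT 1 FROM used WHERE used.article_id = article.article_id)
"""

left_join_sql = """
    SELECT a.article_id FROM article a
    LEFT JOIN used u ON u.article_id = a.article_id
    WHERE u.article_id IS NULL OR u.return_date IS NOT NULL
"""

union_ids = sorted(row[0] for row in conn.execute(union_sql))
left_join_ids = sorted(row[0] for row in conn.execute(left_join_sql))
print(union_ids, left_join_ids)
```

Both queries return articles 1 and 3 here. Note that either form lists an article once per returned rental, so an article rented and returned several times shows up several times.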
I have a query that requires me to join/refers to the same table, however, I am unable to get a result using the query.
Below is a sample of my query
SELECT a."column1", b."column1" as anotherColumn
FROM table1 AS a, table2 AS b
where a."x" = b."x"
AND NOT a."y" = b."y"
This query takes forever to load. However, if I just run:
SELECT a."column1"
FROM table1 AS A
it only takes 14sec.
I'm currently using PostgreSQL with pgAdmin. table1 currently has 1.4 million rows.
Is it because there is a lock on the table 1 when it was first referred to as a?
EDIT : Each row contains the record of "author","book published" and in this case, there might be many authors for a book hence being collaborators. What I am trying to achieve is to find out the number of collaborators for each author
Something like this would count the number of authors, and I guess where that number is greater than 1, the number of collaborators is that number - 1
select b.name, count(a.*)-1 as num_collaborators
from books b
inner join authors a on b.id = a.book_id
group by b.name
having count(a.*) > 1
--original
SELECT a."column1", b."column1" as anotherColumn
FROM table1 AS a, table2 AS b
;
--amended
SELECT a."column1", b."column1" as anotherColumn
FROM table1 AS a, table2 AS b
where a."x" = b."x"
AND NOT a."y" = b."y"
Over 25 years ago ANSI standards for SQL introduced a more "explicit" syntax for joins and using this is well established as "best practice" now.
One of the greatest benefits of this "explicit join syntax" is that accidentally forgetting the join condition becomes much harder, unlike the original query, which did forget the joining predicate. (When that happens, an unexpected Cartesian product is produced.)
So I encourage you to stop using commas between table names. Taking that simple step will push you towards better join syntax.
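As a rough illustration of the difference, here is a small sketch with Python's sqlite3 module. The table contents are invented, and the join on column x is an assumption taken from the amended query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (x INTEGER, y TEXT);
    CREATE TABLE table2 (x INTEGER, y TEXT);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO table2 VALUES (1, 'p'), (2, 'q'), (3, 'r');
""")

# Comma syntax with the predicate forgotten: every row pairs with every row.
cartesian = conn.execute(
    "SELECT COUNT(*) FROM table1 AS a, table2 AS b"
).fetchone()[0]

# Explicit syntax: the ON clause sits right next to the JOIN it belongs to.
joined = conn.execute(
    "SELECT COUNT(*) FROM table1 AS a INNER JOIN table2 AS b ON a.x = b.x"
).fetchone()[0]
print(cartesian, joined)
```

With three rows per table the comma version yields 9 rows and the explicit join yields 3. On a table with 1.4 million rows the same mistake produces a result far too large to ever finish loading.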
I often see something like...
SELECT events.id, events.begin_on, events.name
FROM events
WHERE events.user_id IN ( SELECT contacts.user_id
FROM contacts
WHERE contacts.contact_id = '1')
OR events.user_id IN ( SELECT contacts.contact_id
FROM contacts
WHERE contacts.user_id = '1')
Is it okay to have a query inside a query? Is it an "inner query"? A "sub-query"? Does it count as three queries (my example)? If it's bad to do so... how can I rewrite my example?
Your example isn't too bad. The biggest problems usually come from cases where there is what's called a "correlated subquery". That's when the subquery is dependent on a column from the outer query. These are particularly bad because the subquery effectively needs to be rerun for every row in the potential results.
You can rewrite your subqueries using joins and GROUP BY, but as you have it performance can vary, especially depending on your RDBMS.
It varies from database to database, especially depending on whether the columns compared are indexed or not and nullable or not. But generally, if your query is not using columns from the table joined to, you should be using either IN or EXISTS:
SELECT e.id, e.begin_on, e.name
FROM EVENTS e
WHERE EXISTS (SELECT NULL
FROM CONTACTS c
WHERE ( c.contact_id = '1' AND c.user_id = e.user_id )
OR ( c.user_id = '1' AND c.contact_id = e.user_id ))
Using a JOIN (INNER or OUTER) can inflate records if the child table has more than one record related to a parent table record. That's fine if you need that information, but if not then you need to use either GROUP BY or DISTINCT to get a result set of unique values -- and that can cost you when you review the query costs.
EXISTS
Though EXISTS clauses look like correlated subqueries, they do not execute as such (RBAR: Row By Agonizing Row). EXISTS returns a boolean based on the criteria provided, and exits on the first instance that is true -- this can make it faster than IN when dealing with duplicates in a child table.
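The row inflation is easy to reproduce. A small sketch with Python's sqlite3 module, using made-up rows where two contact rows relate to the same event:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (id INTEGER PRIMARY KEY, user_id TEXT, name TEXT);
    CREATE TABLE contacts (user_id TEXT, contact_id TEXT);
    INSERT INTO events VALUES (10, '2', 'party');
    -- users 1 and 2 list each other, so two contact rows relate to the same event
    INSERT INTO contacts VALUES ('1', '2'), ('2', '1');
""")

# Same matching condition for both queries.
condition = """
    (c.contact_id = '1' AND c.user_id = e.user_id)
    OR (c.user_id = '1' AND c.contact_id = e.user_id)
"""

join_rows = conn.execute(
    f"SELECT e.id FROM events e JOIN contacts c ON {condition}"
).fetchall()
exists_rows = conn.execute(
    f"SELECT e.id FROM events e WHERE EXISTS (SELECT NULL FROM contacts c WHERE {condition})"
).fetchall()
print(join_rows, exists_rows)
```

The JOIN returns event 10 twice, once per matching contact row, while EXISTS returns it once without needing DISTINCT or GROUP BY.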
You could JOIN to the Contacts table instead:
SELECT events.id, events.begin_on, events.name
FROM events
JOIN contacts
ON (events.user_id = contacts.contact_id AND contacts.user_id = '1')
OR (events.user_id = contacts.user_id AND contacts.contact_id = '1')
GROUP BY events.id
-- exercise: without the GROUP BY, how many duplicate rows can you end up with?
This leaves the following question up to the database: "Should we look through all the contacts table and find all the '1's in the various columns, or do something else?" where your original SQL didn't give it much choice.
The most common term for this sort of query is "subquery." There is nothing inherently wrong with using them, and they can make your life easier. However, performance can often be improved by rewriting queries with subqueries to use JOINs instead, because the server can find optimizations.
In your example, three queries are executed: the main SELECT query, and the two SELECT subqueries.
SELECT events.id, events.begin_on, events.name
FROM events
JOIN contacts
ON (events.user_id = contacts.contact_id AND contacts.user_id = '1')
OR (events.user_id = contacts.user_id AND contacts.contact_id = '1')
GROUP BY events.id
In your case, I believe the JOIN version will be better as you can avoid two SELECT queries on contacts, opting for the JOIN instead.
See the MySQL docs on the topic.
Here's my tables:
tblBusiness
BusinessID, BusinessName
tblTags
TagID, Tag
tblBusinessTagLink
BusinessID, TagID
Any business can have multiple tags applied to it. Now let's say a user is filtering down so that they find only businesses that are tagged 'Office Supplies' and 'Technology'.
What SQL statement should I use? Is there a better design for my tables than what I've presented here?
SELECT
b.BusinessId,
b.BusinessName
FROM
tblBusiness AS b
INNER JOIN tblBusinessTagLink AS l ON l.BusinessId = b.BusinessId
INNER JOIN tblTags AS t ON t.TagId = l.TagId
WHERE
t.Tag IN ('Technology', 'Office Supplies')
GROUP BY
b.BusinessId,
b.BusinessName
This selects all businesses that are in either one of the categories. To select only those in both categories, you could append a
HAVING COUNT(*) = 2
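Here is a minimal sketch of the complete query, run through Python's sqlite3 module. The businesses and tag assignments are invented, and the column is named Tag as in the question's table listing:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblBusiness (BusinessID INTEGER PRIMARY KEY, BusinessName TEXT);
    CREATE TABLE tblTags (TagID INTEGER PRIMARY KEY, Tag TEXT);
    CREATE TABLE tblBusinessTagLink (
        BusinessID INTEGER, TagID INTEGER, PRIMARY KEY (BusinessID, TagID)
    );
    INSERT INTO tblBusiness VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
    INSERT INTO tblTags VALUES (1, 'Office Supplies'), (2, 'Technology'), (3, 'Food');
    INSERT INTO tblBusinessTagLink VALUES (1, 1), (1, 2), (2, 2), (3, 1), (3, 3);
""")

both_tags = [
    row[0]
    for row in conn.execute("""
        SELECT b.BusinessName
        FROM tblBusiness AS b
        INNER JOIN tblBusinessTagLink AS l ON l.BusinessID = b.BusinessID
        INNER JOIN tblTags AS t ON t.TagID = l.TagID
        WHERE t.Tag IN ('Technology', 'Office Supplies')
        GROUP BY b.BusinessID, b.BusinessName
        HAVING COUNT(*) = 2
    """)
]
print(both_tags)
```

Only Acme carries both tags, so it is the only row returned. COUNT(*) = 2 is only safe because the link table's primary key rules out duplicate (BusinessID, TagID) pairs; without that guarantee, COUNT(DISTINCT t.Tag) = 2 would be the more careful test.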
The method you are using (three tables to represent a m:n relationship) is the standard way to solve this task, you can keep that.
Personally, I would not use "Hungarian notation" for table names (i.e. no "tbl") and I would not use plural table names (i.e. not "Tags"), especially when the other tables are not plural either.
Answering the first comment below:
For larger data sets, the performance of this query relies on indexes. All the primary keys need an index, naturally. In tblBusinessTagLink you should have a composite index covering both fields and one additional index for the field that does not come first in the composite index.
The WHERE keywords LIKE '%technology%' idea is a bad one, mostly because for any LIKE conditions other than start-of-field searches an index cannot be used (i.e. performance will degrade rapidly as your data set grows) and partly because it should be WHERE ','+keywords+',' LIKE '%,technology,%' to begin with or you will get partial matches/false positives.
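The false-positive problem can be seen in a small sketch with Python's sqlite3 module. The business table with a comma-separated keywords column is hypothetical, and SQLite concatenates strings with || where the expression above uses +:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE business (name TEXT, keywords TEXT);
    INSERT INTO business VALUES
        ('Acme', 'technology,office supplies'),
        ('BioLab', 'biotechnology,research');
""")

# Plain substring match: 'biotechnology' also contains 'technology'.
naive = [r[0] for r in conn.execute(
    "SELECT name FROM business WHERE keywords LIKE '%technology%' ORDER BY name"
)]

# Wrap the list in delimiters so only whole keywords match.
delimited = [r[0] for r in conn.execute(
    "SELECT name FROM business "
    "WHERE ',' || keywords || ',' LIKE '%,technology,%' ORDER BY name"
)]
print(naive, delimited)
```

The plain pattern matches both businesses, while the delimited pattern matches only Acme. Neither version can use an ordinary index, which is the main argument for the link-table design.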
Also, it might be a bit more efficient to query by TagId. This way you can remove one table from the JOIN entirely:
FROM
tblBusiness AS b
INNER JOIN tblBusinessTagLink AS l ON l.BusinessId = b.BusinessId
WHERE
l.TagId IN (1, 2)
If you intend to query by tag name, however, an index on that field will be absolutely necessary as well.
You can use a simple JOIN to get the records:
SELECT t.Tag, b.BusinessName
FROM tblBusiness b, tblTags t, tblBusinessTagLink l
WHERE t.TagID = l.TagID
AND l.BusinessID = b.BusinessID
AND t.Tag = 'Office Supplies'
You can use the INTERSECT set operation to combine the 2 queries (one for 'Office Supplies' and one for 'Technology').
However, if you are using a MySQL version that does not support INTERSECT (it was only added in 8.0.31), you can use a UNION ALL with a 'HAVING COUNT(*) = 2' instead.
EDIT:
You can also use the second option without the UNION ALL like so:
select BusinessName from tblBusiness
left join tblBusinessTagLink on tblBusinessTagLink.BusinessID = tblBusiness.BusinessID
left join tblTags on tblTags.TagID = tblBusinessTagLink.TagID
where Tag = 'Office Supplies' or Tag = 'Technology'
group by BusinessName
having count(BusinessName) = 2;
I'm doing some basic SQL on a few tables I have, using a union (rightly or wrongly),
but I need to remove the duplicates. Any ideas?
select * from calls
left join users a on calls.assigned_to= a.user_id
where a.dept = 4
union
select * from calls
left join users r on calls.requestor_id= r.user_id
where r.dept = 4
Union will remove duplicates. Union All does not.
Using UNION automatically removes duplicate rows unless you specify UNION ALL:
http://msdn.microsoft.com/en-us/library/ms180026(SQL.90).aspx
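A quick way to see the difference is a sketch like the following, using Python's sqlite3 module. The two single-column tables are hypothetical stand-ins for the two halves of the query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE first_part (call_id INTEGER);
    CREATE TABLE second_part (call_id INTEGER);
    -- call 1 appears twice in the first part and once more in the second
    INSERT INTO first_part VALUES (1), (1), (2);
    INSERT INTO second_part VALUES (1), (3);
""")

union_rows = sorted(r[0] for r in conn.execute(
    "SELECT call_id FROM first_part UNION SELECT call_id FROM second_part"
))
union_all_rows = sorted(r[0] for r in conn.execute(
    "SELECT call_id FROM first_part UNION ALL SELECT call_id FROM second_part"
))
print(union_rows, union_all_rows)
```

UNION returns each call once, even though call 1 was already duplicated inside the first part, while UNION ALL keeps all five rows. UNION compares whole rows, so with select * two rows for the same call still count as different if any joined column differs.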
Others have already answered your direct question, but perhaps you could simplify the query to eliminate the question (or have I missed something, and a query like the following will really produce substantially different results?):
select *
from calls c join users u
on c.assigned_to = u.user_id
or c.requestor_id = u.user_id
where u.dept = 4
Since you are still getting duplicates using only UNION, I would check:
That they are exact duplicates. I mean, if you run
SELECT DISTINCT * FROM (<your query>) AS subquery
do you get fewer rows?
That the apparent duplicates do not come from the joined columns. UNION compares entire rows, and it also removes duplicates already present within the first part of the query (for example, ones generated by the left join), so two rows for the same call survive only if some column differs.
If you are using T-SQL then it appears from previous posts that UNION removes duplicates. But if you are not, you could use distinct. This doesn't quite feel right to me either but it could get you the result you are looking for
SELECT DISTINCT *
FROM
(
select * from calls
left join users a on calls.assigned_to= a.user_id
where a.dept = 4
union
select * from calls
left join users r on calls.requestor_id= r.user_id
where r.dept = 4
)a
If you are using T-SQL you could use a temporary table in a stored procedure and update or insert the records of your query accordingly.
I have a query that shows me a listing of ALL opportunities in one query
I have a query that shows me a listing of EXCLUSION opportunities, ones we want to eliminate from the results
I need to produce a query that will take everything from the first query minus the second query...
SELECT DISTINCT qryMissedOpportunity_ALL_Clients.*
FROM qryMissedOpportunity_ALL_Clients INNER JOIN qryMissedOpportunity_Exclusions ON
([qryMissedOpportunity_ALL_Clients].[ClientID] <> [qryMissedOpportunity_Exclusions].[ClientID])
AND
([qryMissedOpportunity_Exclusions].[ClientID] <> [qryMissedOpportunity_Exclusions].[BillingCode])
The initial query works as intended and exclusions successfully lists all the hits, but I get the full listing when I query with the above which is obviously wrong. Any tips would be appreciated.
EDIT - Two originating queries
qryMissedOpportunity_ALL_Clients (1)
SELECT MissedOpportunities.MOID, PriceList.BillingCode, Client.ClientID, Client.ClientName, PriceList.WorkDescription, PriceList.UnitOfWork, MissedOpportunities.Qty, PriceList.CostPerUnit AS Our_PriceList_Cost, ([MissedOpportunities].[Qty]*[PriceList].[CostPerUnit]) AS At_Cost, MissedOpportunities.fBegin
FROM PriceList INNER JOIN (Client INNER JOIN MissedOpportunities ON Client.ClientID = MissedOpportunities.ClientID) ON PriceList.BillingCode = MissedOpportunities.BillingCode
WHERE (((MissedOpportunities.fBegin)=#10/1/2009#));
qryMissedOpportunity_Exclusions
SELECT qryMissedOpportunity_ALL_Clients.*, MissedOpportunity_Exclusions.Exclusion, MissedOpportunity_Exclusions.Comments
FROM qryMissedOpportunity_ALL_Clients INNER JOIN MissedOpportunity_Exclusions ON (qryMissedOpportunity_ALL_Clients.BillingCode = MissedOpportunity_Exclusions.BillingCode) AND (qryMissedOpportunity_ALL_Clients.ClientID = MissedOpportunity_Exclusions.ClientID)
WHERE (((MissedOpportunity_Exclusions.Exclusion)=True));
One group needs to see everything; the other needs to see only the things they haven't deemed a "valid" missed opportunity, as in, we've seen it, verified why it's there and don't need to bother critiquing it every single month.
Generally you can exclude a table by doing a left join and comparing against null:
SELECT t1.* FROM t1 LEFT JOIN t2 on t1.id = t2.id where t2.id is null;
It should be pretty easy to adapt this to your situation.
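For this case the exclusion is keyed on two columns, so the join condition needs both. A minimal sketch with Python's sqlite3 module, where the all_opportunities and exclusions tables are simplified, hypothetical stand-ins for the two saved queries:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE all_opportunities (MOID INTEGER, ClientID INTEGER, BillingCode TEXT);
    CREATE TABLE exclusions (ClientID INTEGER, BillingCode TEXT);
    INSERT INTO all_opportunities VALUES (1, 100, 'A'), (2, 100, 'B'), (3, 200, 'A');
    INSERT INTO exclusions VALUES (100, 'A');
""")

# Left join on both key columns, then keep only the rows that found no exclusion.
remaining = [r[0] for r in conn.execute("""
    SELECT o.MOID
    FROM all_opportunities AS o
    LEFT JOIN exclusions AS e
        ON o.ClientID = e.ClientID AND o.BillingCode = e.BillingCode
    WHERE e.ClientID IS NULL
    ORDER BY o.MOID
""")]
print(remaining)
```

Only the opportunity for client 100 with billing code A is dropped; the other two rows remain. The join uses equality and the "not excluded" test is moved into the WHERE clause as an IS NULL check, instead of joining on <>.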
Looking at your query rewritten to use table aliases so I can read it...
SELECT DISTINCT c.*
FROM qryMissedOpportunity_ALL_Clients c
JOIN qryMissedOpportunity_Exclusions e
ON c.ClientID <> e.ClientID
AND e.ClientID <> e.BillingCode
This query will produce a Cartesian product of sorts: each and every row in qryMissedOpportunity_ALL_Clients will match and join with every row in qryMissedOpportunity_Exclusions where the ClientIDs do not match. Is this what you want? Generally, join conditions are based on a column in one table being equal to the value of a column in the other table. Joining where they are not equal is unusual.
Second, the second inequality in the join conditions is between columns in the same table (qryMissedOpportunity_Exclusions). Are you sure this is what you want? If it is, it is not a join condition, it is a WHERE clause condition.
Third, your question mentions two queries, but there is only the one query (above) in your question. Where is the second one?