Many SQL queries vs 1 complex query - sql

I have a database with two tables that are similar to:
table1
------
Id : long, auto increment
Title : string(50)
ParentId : long
table2
------
Id : long, auto increment
FirstName : string(20)
LastName : string(30)
Zip : string(5)
table2 has a one-to-many relationship with table1 where many includes zero.
I also have the following query (that works correctly, so ignore typos an the like, it is an example):
SELECT t1.Id AS tid, t1.Title, t2.Id AS oid, t2.FirstName, t2.LastName
FROM table t1
INNER JOIN table2 t2 ON t1.ParentId = t2.Id
WHERE t2.Id IN
(SELECT Id FROM table2
WHERE Zip IN ('zip1', 'zip2', 'etc'))
ORDER BY t2.Id DESC
The query finds all items in table1 that belong to a person in table2, where the person is in one of the listed zip codes.
The problem I have now is: I want to show all the users (with their items if available) in the listed zip codes, not just the ones with items.
So, I am wondering, should I just do something simple with a lot more queries, like:
SELECT Id AS oid, FirstName, LastName FROM table2 WHERE Zip in ('zip1', 'zip2', 'etc')
foreach(result) {
SELECT Id AS tid, Title FROM table2 WHERE ParentId = oid
}
Or should I come up with a more elaborate single SQL statement? And if so, can I get a little help? Thanks!

If I understand correctly, changing your INNER JOIN to a RIGHT JOIN should return all users regardless of whether they have an item or not, the item columns will just be null for those that don't.

Look into Right Joins and Group By. That will most likely get you the query you are after.

I agree with (and have upvoted) #Lee D and #Bueller. However, I generally advocate LEFT OUTER JOINS, because I find it easier to conceptualized what's going on with them, particularly when you are joining three or more tables. Consider it like so:
Start with what you know you want in the final result set:
FROM table2 t2
and then add in the "optional" data.
FROM table2 t2
left outer join table1 t1
on t1.ParentId = t2.Id
Whether or not matches are found, whatever gets selected from table2 will always appears.

In general, you should prefer the "many queries" approach if (and only if)
it gets you simpler code in total
is fast enough (which you should find out by testing)
In this case, I suspect, both conditions may not apply.

You should come up with a more elaborate single SQL statement and then process the results with your favorite programming language.

What you've described is called an N + 1 query. You have 1 initial query that returns N results, then 1 query for each of your N results. If N is small, the performance difference may not be noticeable - but there will be a larger and larger performance hit as N grows.

If I understand correctly, I think you are looking for something like this
SELECT t1.Id AS tid, t1.Title, t2.Id AS oid, t2.FirstName, t2.LastName
FROM table t1
RIGHT OUTER JOIN table2 t2 ON t1.ParentId = t2.Id AND Zip IN ('zip1', 'zip2', 'etc'))
ORDER BY t2.Id DESC
You can have multiple conditions on your JOIN and RIGHT OUTER will give you all the rows in table2 even if they don't match in table1

Related

MS-Access Select 1 row from GROUP BY query

Always had a hard time wrapping my head around GROUP BY functionality, and this one is no exception.
I have a simple Join query as such
Select t1.g1, t1.g2, t2.id, t2.datetime, t3.name
From ((table1 t1 Inner Join table2 t2 on t1.fld1=t2.fld1)
Inner Join table3 t3 on t1.fld2=t3.fld2)
Order By t2.datetime, t2.id
This returns my data as expected. Here are some sample rows that illustrate what I am trying to retrieve with Group By...
t1.g1
t2.g2
t2.id
t2.datetime
t3.name
726
4506
32
9/12/2021
nameA
726
4506
33
9/12/2021
nameB
726
4506
30
9/13/2021
nameC
I want to grab ONLY the first row in each Group of t1.g1, t1.g2.
So, I try the following:
Select t1.g1, t1.g2, FIRST(t2.id), FIRST(t2.datetime), FIRST(t3.name)
From ((table1 t1 Inner Join table2 t2 on t1.fld1=t2.fld1)
Inner Join table3 t3 on t1.fld2=t3.fld2)
Group By t1.g1, t1.g2
Order By FIRST(t2.datetime), FIRST(t2.id)
For the example Group above, this returns the following record...
t1.g1
t2.g2
t2.id
t2.datetime
t3.name
726
4506
30
9/13/2021
nameC
So, Order By operates after the Grouping is done, not before. Or so it seems. Perhaps the reason for the order of the SQL keywords (Select, From, Where, Group By, Order By). Ok, makes sense if my assumption is correct. I think it finds t2.id=30 ahead of the other 726/4506 records because t2.id is a primary key on table2.
So, now I try a nested Query, wherein my first query above returns the data in the correct order and the outside query groups and grabs the first record.
Select t1.g1, t1.g2, FIRST(t2.id), FIRST(t2.datetime), FIRST(t3.name)
FROM (
Select t1.g1, t1.g2, t2.id, t2.datetime, t3.name
From ((table1 t1 Inner Join table2 t2 on t1.fld1=t2.fld1)
Inner Join table3 t3 on t1.fld2=t3.fld2)
Order By t2.datetime, t2.id
)
Group By t1.g1, t1.g2
Order By FIRST(t2.datetime), FIRST(t2.id)
Same results! I am at a loss to understand how this is happening. So, if anyone can shed light on the order of functioning under-the-covers for Access SQL in this instance I would love to know. On my 2nd query (nested Select), it seems as though I am ordering the target data such that after Grouping the FIRST() aggregate function should select the first row found in the inner result set. But that is not happening.
And of course, if anyone can tell me how to return the row I am after ...
t1.g1
t2.g2
t2.id
t2.datetime
t3.name
726
4506
32
9/12/2021
nameA
That is all I really need.
I want to grab ONLY the first row in each Group of t1.g1, t1.g2.
You don't want aggregation. You want to filter the data. In this case, a correlated subquery does what you want:
Select t1.g1, t1.g2, t2.id, t2.datetime, t3.name
From (table1 t1 Inner Join
table2 t2
on t1.fld1 = t2.fld1
) Inner Join
table3 t3
on t1.fld2 = t3.fld2
where t2.id = (select top 1 tt2.id
from (table1 tt1 Inner Join
table2 tt2
on tt1.fld1 = tt2.fld1
) Inner Join
table3 tt3
on tt1.fld2 = tt3.fld2
where tt1.g1 = t1.g1 and tt1.g2 = t1.g2
order by tt2.datetime, tt2.id
);
Here is a solution that scales well (6s on 250k recs in t2) and does what I am asking for.
I could not get Gordon's answer to work in Access. Seems like it should have however. And I have my doubts about how well it would perform with 250k recs in t2. I would love to test a solution like Gordon's, if I could figure out how to get Access to take it.
See problem description for an example on exactly which record I am after. I only need t2.id from the result set. This was not stated originally, but I don't see how that changes the problem statement or solution. I could be wrong there. I still need t3.name, but it can be retrieved later using t2.id.
But I still need to pick the record GROUP'd BY t1.g1, t1.g2 that comes first when all records are sorted by t2.dateandtime, t2.id. Or stated another way, amongst all records with the same t1.g1+t1.g2, I need exactly the first record when the group is sorted by "t2.dateandtime, t2.id".
Perhaps I am thinking about this solution to my problem all wrong, and there are better ways to resolve this with SQL; if so, I would love to hear it.
I seem to have learned that GROUP BY does group records together based on this SQL clause, but this grouping loses any concept of individual records at this point; e.g. you can only extract other fields by using an Aggregate Function (MIN, MAX, SUM, etc), but - and importantly - FIRST does not get the value of the record that you can predict, as the ORDER BY clause has not been performed yet.
With all that said, here is my solution that works.
I removed reference to the Join on t3 as with t2.id I can retrieve all the other info I need from t3 after the fact, using t2.id.
Don't need to select 't1.g1, t1.g2', that is superfluous. I originally thought that any Group By fields had to be specified in the Select clause also.
I combine t2.dateandtime and t2.id into a Text field and use MIN to Select the data/record I am after once it is GROUP'd BY. No need to Sort my result set, as the record with the MIN value of t2.dateandtime, then t2.id has been chosen! Thus satisfying my condition and selection of the correct record.
Since all I need is t2.id returned for further processing, I extract t2.id from the String built in #3 and convert back to Long data type.
Here is the brief and simple query:
Select
MIN(Format(t2.dateandtime, "yyyymmddhhmmss") & '_' & Format(t2.id, '000000')) as dt_id,
CLNG(MID(dt_id, INSTR(dt_id, '_') + 1)) as id
From
(table1 t1 Inner Join table2 t2 on t1.fld1=t2.fld1)
Group By
t1.g1, t1.g2

DISTINCT COLUMN VALUES FROM ANOTHER TABLE IN JOIN

select t1.key,t2.design,t1.price,t1.gender,t1.store
from table1 t1,table2 t2
where t1.key=t2.key;
This is my query. In this query, column KEY is distinct. I need result with distinct DESIGN values. Help me with this.
from table1 t1 join table2 t2 on t1.key = t2.key
where design like "specific design"
Your request is not precise. There are two things possible:
1) You are talking about duplicate records, i.e. for two records with the same design the columns key, price, gender and store are guaranteed to be equal. Two ways to solve it:
select DISTINCT t1.key,t2.design,t1.price,t1.gender,t1.store
from table1 t1,table2 t2
where t1.key=t2.key;
or
select t1.key,t2.design,t1.price,t1.gender,t1.store
from table1 t1, (select distinct key, design from table2) t2
where t1.key=t2.key;
2) You are talking of duplicate designs and ambigious information related, i.e. for two records for the same design you may get different prices etc. Then you must think about what information you want to get per design. The maximum price? The sum of prices? ...
select t1.key,t2.design,sum(t1.price),max(t1.gender),max(t1.store)
from table1 t1,table2 t2
where t1.key=t2.key
group by t1.key,t2.design;
This gives you records per key and design. If you want records per design only then you would group only by design and decide which key you want to show with it.
A last recommendation: Use explicit join syntax. It is easier to read and less error-prone.
select t1.key, t2.design, t1.price, t1.gender, t1.store
from table1 t1
inner join table2 t2 on t1.key = t2.key;

SQL query, where = value of another table

I want to make a query that simply makes this, this may sound really dumb but i made a lot of research and couldn't understand nothing.
Imagine that i have two tables (table1 and table2) and two columns (table1.column1 and table2.column2).
What i want to make is basically this:
SELECT column1 FROM table1 where table2.column2 = '0'
I don't know if this is possible.
Thanks in advance,
You need to apply join between two talbes and than you can apply your where clause will do work for you
select column1 from table1
inner join table2 on table1.column = table2.column
where table2.columne=0
for join info you can see this
Reading this original article on The Code Project will help you a lot: Visual Representation of SQL Joins.
Find original one at: Difference between JOIN and OUTER JOIN in MySQL.
SELECT column1 FROM table1 t1
where exists (select 1 from table2 t2
where t1.id = t2.table1_id and t2.column2 = '0')
assuming table1_id in table2 is a foreign key refering to id of table1 which is the primary key
You don't have any kind of natural join between two tables.
You're asking for
Select Houses.DoorColour from Houses, Cars where Cars.AreFourWheelDrive = '1'
You'd need to think about why you're selecting anything from the first table, there must be a shared piece of information between tables 1 and 2 otherwise a join is pointless and probably dangerous.

How can I speed up MySQL query with multiple joins

Here is my issue, I am selecting and doing multiple joins to get the correct items...it pulls in a fair amount of rows, above 100,000. This query takes more than 5mins when the date range is set to 1 year.
I don't know if it's possible but I am afraid that the user might extend the date range to like ten years and crash it.
Anyone know how I can speed this up? Here is the query.
SELECT DISTINCT t1.first_name, t1.last_name, t1.email
FROM table1 AS t1
INNER JOIN table2 AS t2 ON t1.CU_id = t2.O_cid
INNER JOIN table3 AS t3 ON t2.O_ref = t3.I_oref
INNER JOIN table4 AS t4 ON t3.I_pid = t4.P_id
INNER JOIN table5 AS t5 ON t4.P_cat = t5.C_id
WHERE t1.subscribe =1
AND t1.Cdate >= $startDate
AND t1.Cdate <= $endDate
AND t5.store =2
I am not the greatest with mysql so any help would be appreciated!
Thanks in advance!
UPDATE
Here is the explain you asked for
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE t5 ref PRIMARY,C_store_type,C_id,C_store_type_2 C_store_type_2 1 const 101 Using temporary
1 SIMPLE t4 ref PRIMARY,P_cat P_cat 5 alphacom.t5.C_id 326 Using where
1 SIMPLE t3 ref I_pid,I_oref I_pid 4 alphacom.t4.P_id 31
1 SIMPLE t2 eq_ref O_ref,O_cid O_ref 28 alphacom.t3.I_oref 1
1 SIMPLE t1 eq_ref PRIMARY PRIMARY 4 alphacom.t2.O_cid 1 Using where
Also I added an index to table5 rows and table4 rows because they don't really change, however the other tables get around 500-1000 entries a month... I heard you should add an index to a table that has that many new entries....is this true?
I'd try the following:
First, ensure there are indexes on the following tables and columns (each set of columns in parentheses should be a separate index):
table1 : (subscribe, CDate)
(CU_id)
table2 : (O_cid)
(O_ref)
table3 : (I_oref)
(I_pid)
table4 : (P_id)
(P_cat)
table5 : (C_id, store)
Second, if adding the above indexes didn't improve things as much as you'd like, try rewriting the query as
SELECT DISTINCT t1.first_name, t1.last_name, t1.email FROM
(SELECT CU_id, t1.first_name, t1.last_name, t1.email
FROM table1
WHERE subscribe = 1 AND
CDate >= $startDate AND
CDate <= $endDate) AS t1
INNER JOIN table2 AS t2
ON t1.CU_id = t2.O_cid
INNER JOIN table3 AS t3
ON t2.O_ref = t3.I_oref
INNER JOIN table4 AS t4
ON t3.I_pid = t4.P_id
INNER JOIN (SELECT C_id FROM table5 WHERE store = 2) AS t5
ON t4.P_cat = t5.C_id
I'm hoping here that the first sub-select would cut down significantly on the number of rows to be considered for joining, hopefully making the subsequent joins do less work. Ditto the reasoning behind the second sub-select on table5.
In any case, mess with it. I mean, ultimately it's just a SELECT - you can't really hurt anything with it. Examine the plans that are generated by each different permutation and try to figure out what's good or bad about each.
Share and enjoy.
Make sure your date columns and all the columns you are joining on are indexed.
Doing an unequivalence operator on your dates means it checks every row, which is inherently slower than an equivalence.
Also, using DISTINCT adds an extra comparison to the logic that your optimizer is running behind the scenes. Eliminate that if possible.
Well, first, make a subquery to decimate table1 down to just the records you actually want to go to all the trouble of joining...
SELECT DISTINCT t1.first_name, t1.last_name, t1.email
FROM (
SELECT first_name, last_name, email, CU_id FROM table1 WHERE
table1.subscribe = 1
AND table1.Cdate >= $startDate
AND table1.Cdate <= $endDate
) AS t1
INNER JOIN table2 AS t2 ON t1.CU_id = t2.O_cid
INNER JOIN table3 AS t3 ON t2.O_ref = t3.I_oref
INNER JOIN table4 AS t4 ON t3.I_pid = t4.P_id
INNER JOIN table5 AS t5 ON t4.P_cat = t5.C_id
WHERE t5.store = 2
Then start looking at modifying the directionality of the joins.
Additionally, if t5.store is only very rarely 2, then flip this idea around: construct the t5 subquery, then join it back and back and back.
At present, your query is returning all matching rows on table2-table5, just to establish whether t5.store = 2. If any of table2-table5 have a significantly higher row count than table1, this may be greatly increasing the number of rows processed - consequently, the following query may perform significantly better:
SELECT DISTINCT t1.first_name, t1.last_name, t1.email
FROM table1 AS t1
WHERE t1.subscribe =1
AND t1.Cdate >= $startDate
AND t1.Cdate <= $endDate
AND EXISTS
(SELECT NULL FROM table2 AS t2
INNER JOIN table3 AS t3 ON t2.O_ref = t3.I_oref
INNER JOIN table4 AS t4 ON t3.I_pid = t4.P_id
INNER JOIN table5 AS t5 ON t4.P_cat = t5.C_id AND t5.store =2
WHERE t1.CU_id = t2.O_cid);
Try adding indexes on the fields that you join. It may or may not improve the performance.
Moreover it also depends on the engine that you are using. If you are using InnoDB check your configuration params. I had faced a similar problem, as the default configuration of innodb wont scale much as myisam's default configuration.
As everyone says, make sure you have indexes.
You can also check if your server is set up properly so it can contain more of, of maybe the entire, dataset in memory.
Without an EXPLAIN, there's not much to work by. Also keep in mind that MySQL will look at your JOIN, and iterate through all possible solutions before executing the query, which can take time. Once you have the optimal JOIN order from the EXPLAIN, you could try and force this order in your query, eliminating this step from the optimizer.
It sounds like you should think about delivering subsets (paging) or limit the results some other way unless there is a reason that the users need every row possible all at once. Typically 100K rows is more than the average person can digest.

Getting distinct rows from a left outer join

I am building an application which dynamically generates sql to search for rows of a particular Table (this is the main domain class, like an Employee).
There are three tables Table1, Table2 and Table1Table2Map.
Table1 has a many to many relationship with Table2, and is mapped through Table1Table2Map table. But since Table1 is my main table the relationship is virtually like a one to many.
My app generates a sql which basically gives a result set containing rows from all these tables. The select clause and joins dont change whereas the where clause is generated based on user interaction. In any case I dont want duplicate rows of Table1 in my result set as it is the main table for result display. Right now the query that is getting generated is like this:
select distinct Table1.Id as Id, Table1.Name, Table2.Description from Table1
left outer join Table1Table2Map on (Table1Table2Map.Table1Id = Table1.Id)
left outer join Table2 on (Table2.Id = Table1Table2Map.Table2Id)
For simplicity I have excluded the where clause. The problem is when there are multiple rows in Table2 for Table1 even though I have said distinct of Table1.Id the result set has duplicate rows of Table1 as it has to select all the matching rows in Table2.
To elaborate more, consider that for a row in Table1 with Id = 1 there are two rows in Table1Table2Map (1, 1) and (1, 2) mapping Table1 to two rows in Table2 with ids 1, 2. The above mentioned query returns duplicate rows for this case. Now I want the query to return Table1 row with Id 1 only once. This is because there is only one row in Table2 that is like an active value for the corresponding entry in Table1 (this information is in Mapping table).
Is there a way I can avoid getting duplicate rows of Table1.
I think there is some basic problem in the way I am trying to solve the problem, but I am not able to find out what it is. Thanks in advance.
Try:
left outer join (select distinct YOUR_COLUMNS_HERE ...) SUBQUERY_ALIAS on ...
In other words, don't join directly against the table, join against a sub-query that limits the rows you join against.
You can use GROUP BY on Table1.Id ,and that will get rid off the extra rows. You wouldn't need to worry about any mechanics on join side.
I came up with this solution in a huge query and it this solution didnt effect the query time much.
NOTE : I'm answering this question 3 years after its been asked but this may help someone i believe.
You can re-write your left joins to be outer applies, so that you can use a top 1 and an order by as follows:
select Table1.Id as Id, Table1.Name, Table2.Description
from Table1
outer apply (
select top 1 *
from Table1Table2Map
where (Table1Table2Map.Table1Id = Table1.Id) and Table1Table2Map.IsActive = 1
order by somethingCol
) t1t2
outer apply (
select top 1 *
from Table2
where (Table2.Id = Table1Table2Map.Table2Id)
) t2;
Note that an outer apply without a "top" or an "order by" is exactly equivalent to a left outer join, it just gives you a little more control. (cross apply is equivalent to an inner join).
You can also do something similar using the row_number() function:
select * from (
select distinct Table1.Id as Id, Table1.Name, Table2.Description,
rowNum = row_number() over ( partition by table1.id order by something )
from Table1
left outer join Table1Table2Map on (Table1Table2Map.Table1Id = Table1.Id)
left outer join Table2 on (Table2.Id = Table1Table2Map.Table2Id)
) x
where rowNum = 1;
Most of this doesn't apply if the IsActive flag can narrow down your other tables to one row, but they might come in useful for you.
To elaborate on one point: you said that there is only one "active" row in Table2 per row in Table1. Is that row not marked as active such that you could put it in the where clause? Or is there some magic in the dynamic conditions supplied by the user that determines what's active and what isn't.
If you don't need to select anything from Table2 the solution is relatively simply in that you can use the EXISTS function but since you've put TAble2.Description in the clause I'll assume that's not the case.
Basically what separates the relevant rows in Table2 from the irrelevant ones? Is it an active flag or a dynamic condition? The first row? That's really how you should be removing duplicates.
DISTINCT clauses tend to be overused. That may not be the case here but it sounds like it's possible that you're trying to hack out the results you want with DISTINCT rather than solving the real problem, which is a fairly common problem.
You have to include activity clause into your join (and no need for distinct):
select Table1.Id as Id, Table1.Name, Table2.Description from Table1
left outer join Table1Table2Map on (Table1Table2Map.Table1Id = Table1.Id) and Table1Table2Map.IsActive = 1
left outer join Table2 on (Table2.Id = Table1Table2Map.Table2Id)
If you want to display multiple rows from table2 you will have duplicate data from table1 displayed. If you wanted to you could use an aggregate function (IE Max, Min) on table2, this would eliminate the duplicate rows from table1, but would also hide some of the data from table2.
See also my answer on question #70161 for additional explanation