What would be the best way to write this query - sql

I have a table in my database that has 1.1MM records. I have another table in my database that has about 2000 records under the field name "NAME". What I want to do is search Table 1 using the smaller table and pull the records that match the smaller table's records. For example, Table 1 has First Name and Last Name. Table 2 has Name. I want to find every record in Table 1 that contains any of the Table 2 names in either the first name field or the last name field. I tried just making an Access query but my computer froze. Any thoughts would be appreciated.

Have you considered the following:
Select Table1.FirstName, Table1.LastName
from Table1
where EXISTS(Select * from Table2 WHERE Name = Table1.FirstName)
or EXISTS(Select * from Table2 WHERE Name = Table1.LastName)
I have found before that on large tables this might work better than an inner join.

Be sure to create indexes on Table1.first_name, Table1.last_name, and Table2.name. They will dramatically speed up your query.
Edit: For Microsoft Access 2007, see CREATE INDEX.
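As a rough sketch of the index advice above, here is how the three indexes could be created. It uses Python's standard sqlite3 module with SQLite standing in for Access, so the exact CREATE INDEX options in Access may differ. The index names are made up for the example, and the column names follow the first answer's spelling.

```python
import sqlite3

# SQLite stands in for Access; column names follow the first answer (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (FirstName TEXT, LastName TEXT);
    CREATE TABLE Table2 (Name TEXT);

    -- One index per column that the lookup compares on (index names are hypothetical).
    CREATE INDEX idx_table1_firstname ON Table1 (FirstName);
    CREATE INDEX idx_table1_lastname  ON Table1 (LastName);
    CREATE INDEX idx_table2_name      ON Table2 (Name);
""")

# Read the index names back from SQLite's catalog to confirm they exist.
index_names = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
)
print(index_names)
```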

See the previous notes about indexes, but I believe from your description you want something like:
select table1.* from table1
inner join
table2 on (table1.first_name = table2.name OR table1.last_name = table2.name);

It should go something like this,
Select Table1.FirstName, Table1.LastName
from Table1
where Table1.FirstName IN (Select Distinct Name from Table2)
or Table1.LastName IN (Select Distinct Name from Table2)
There are various other ways to run this same query. I would suggest you look at the execution plan for each of these queries to find out which one is the fastest. In addition, creating indexes on the columns used in the WHERE condition will also speed up the query.

I agree with astander. In my experience, using EXISTS instead of IN is a lot faster.
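To make the difference between the suggested forms concrete, here is a small sketch on made-up data, using Python's sqlite3 module (SQLite is only a stand-in, so timings and plans in Access will differ). It shows that the EXISTS and IN versions return each Table1 row once, while the INNER JOIN with OR returns a row once per matching name, so it may need DISTINCT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (FirstName TEXT, LastName TEXT);
    CREATE TABLE Table2 (Name TEXT);
    -- Sample rows are invented for illustration.
    INSERT INTO Table1 VALUES ('John', 'Smith'), ('Mary', 'Jones'),
                              ('Smith', 'John'), ('Alice', 'Brown');
    INSERT INTO Table2 VALUES ('John'), ('Smith');
""")

exists_rows = conn.execute("""
    SELECT FirstName, LastName FROM Table1
    WHERE EXISTS (SELECT 1 FROM Table2 WHERE Name = Table1.FirstName)
       OR EXISTS (SELECT 1 FROM Table2 WHERE Name = Table1.LastName)
    ORDER BY FirstName
""").fetchall()

in_rows = conn.execute("""
    SELECT FirstName, LastName FROM Table1
    WHERE FirstName IN (SELECT Name FROM Table2)
       OR LastName  IN (SELECT Name FROM Table2)
    ORDER BY FirstName
""").fetchall()

# The join yields one output row per (Table1 row, matching Table2 row) pair.
join_rows = conn.execute("""
    SELECT Table1.FirstName, Table1.LastName FROM Table1
    INNER JOIN Table2
        ON Table1.FirstName = Table2.Name OR Table1.LastName = Table2.Name
    ORDER BY Table1.FirstName
""").fetchall()

print(exists_rows)
print(in_rows)
print(join_rows)
```

Which form is fastest still depends on the engine, so checking the execution plan remains the deciding step.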

Related

Avoid multiple SELECT while updating a table's column relatively to another table's one

I am quite new to SQL queries, but I need to modify a column of one table based on a column of another table. For now I have the following query working:
UPDATE table1
SET date1=(
SELECT last_day(max(date2))+1
FROM table2
WHERE id=123
)
WHERE id=123
AND date1=to_date('31/12/9999', 'dd/mm/yyyy');
The problem with this structure is that, I suppose, the SELECT query will be executed for every line of the table1. So I tried to create another query but this one has a syntax error somewhere after the FROM keyword:
UPDATE t1
SET t1.date1=last_day(max(t2.date2))+1
FROM table1 t1
INNER JOIN table2 t2
ON t1.id=t2.id
WHERE t1.id=123
AND t1.date1=to_date('31/12/9999', 'dd/mm/yyyy');
AND besides that I don't even know if this one is faster than the first one...
Do you have any idea how I can handle this issue?
Thanks a lot!
Kind regards,
Julien
The first code you wrote is fine. It won't be executed for every row of table1 as you fear. It will do the following:
It will run the subquery once to find the value you want to use in your UPDATE statement, searching through table2. Since you have stated the exact id, it should be as fast as possible, as long as you have created an index on that column (I guess it is a primary key).
It will run the outer query, finding the single row you want to update. As before, it should be as fast as possible since you have stated the exact id, as long as there is an index on that column.
To summarize: if those IDs are unique, both your subquery and your query should return only one row and the statement should execute as fast as possible. If you think that execution is not fast enough (at least, that it takes longer than the amount of data would justify), check whether those columns have unique values and whether they have unique indexes on them.
In fact, it would be best to add those indexes regardless of this problem, if they do not exist and if these columns have unique values, as it would drastically improve the performance of every query that searches these tables by the id columns.
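Here is a minimal sketch of that first UPDATE, run through Python's sqlite3 module. SQLite has no last_day or to_date, so as an assumption the sketch stores dates as ISO text and uses date(max(date2), 'start of month', '+1 month'), which gives the same day as last_day(max(date2)) + 1. The sample rows and the index name are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, date1 TEXT);
    CREATE TABLE table2 (id INTEGER, date2 TEXT);
    CREATE INDEX idx_table2_id ON table2 (id);  -- hypothetical index name
    INSERT INTO table1 VALUES (123, '9999-12-31'), (456, '9999-12-31');
    INSERT INTO table2 VALUES (123, '2020-01-15'), (123, '2020-03-10'),
                              (456, '2021-06-01');
""")

# The subquery does not reference table1, so it yields one value
# that is reused for every row the outer WHERE selects.
cursor = conn.execute("""
    UPDATE table1
    SET date1 = (
        SELECT date(max(date2), 'start of month', '+1 month')
        FROM table2
        WHERE id = 123
    )
    WHERE id = 123
      AND date1 = '9999-12-31'
""")
updated_rows = cursor.rowcount
result = conn.execute("SELECT id, date1 FROM table1 ORDER BY id").fetchall()
print(updated_rows, result)
```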
Please try to use MERGE
MERGE INTO (
SELECT id,
date1
FROM table1
WHERE date1 = to_date('31/12/9999', 'dd/mm/yyyy')
AND id = 123
) t1
USING (
SELECT id,
last_day(max(date2))+1 max_date
FROM table2
WHERE id=123
GROUP BY id
) t2 ON (t1.id = t2.id)
WHEN MATCHED THEN
UPDATE SET t1.date1 = t2.max_date
;

How to find duplicates in a table using Access SQL?

I try to use an SQL query in Access but it doesn't work. Why?
SELECT * FROM table
EXCEPT
SELECT DISTINCT name FROM table;
I have a syntax error in FROM statement.
MS Access does not support the EXCEPT keyword. You can try using a LEFT JOIN like this:
select t1.* from table t1 left join table t2 on t1.name = t2.name
EDIT:
If you want to find the duplicates in your table then you can try this:
SELECT name, COUNT(*)
FROM table
GROUP BY name
HAVING COUNT(*) > 1
You can also refer to "Create a Query in Microsoft Access to Find Duplicate Entries in a Table" and follow the steps to find the duplicates in your table.
First open the MDB (Microsoft Database) containing the table you want
to check for duplicates. Click on the Queries tab and New.
This will open the New Query dialog box. Highlight Find Duplicates
Query Wizard then click OK.
Now highlight the table you want to check for duplicate data. You can
also choose Queries or both Tables and Queries. I have never seen a
use for searching Queries … but perhaps it would come in handy for
another’s situation. Once you’ve highlighted the appropriate table
click Next.
Here we will choose the field or fields within the table we want to
check for duplicate data. Try to avoid generalized fields.
Name the Query and hit Finish. The Query will run right away and pop
up the results. Also the Query is saved in the Queries section of
Access.
Depending upon the selected tables and fields your results will look
something similar to the shots below which show I have nothing
duplicated in the first shot and the results of duplicates in the
other.
Use a HAVING COUNT(name) > 1 clause. The subquery must return only the name column for IN to work:
SELECT * FROM Table1
WHERE [name] IN
(SELECT name
FROM Table1
GROUP BY name
HAVING COUNT(name)>1)
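The two duplicate queries above can be tried on a few made-up rows. This is a sketch using Python's sqlite3 module rather than Access, and the id column is an assumption added so the duplicate rows can be told apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO Table1 (id, name) VALUES
        (1, 'Ann'), (2, 'Bob'), (3, 'Ann'), (4, 'Cid'), (5, 'Bob'), (6, 'Ann');
""")

# Which names occur more than once, and how often.
dup_counts = conn.execute("""
    SELECT name, COUNT(*)
    FROM Table1
    GROUP BY name
    HAVING COUNT(*) > 1
    ORDER BY name
""").fetchall()

# Every full row whose name is duplicated; the subquery returns one column only.
dup_rows = conn.execute("""
    SELECT id, name FROM Table1
    WHERE name IN (SELECT name FROM Table1 GROUP BY name HAVING COUNT(name) > 1)
    ORDER BY id
""").fetchall()

print(dup_counts)
print(dup_rows)
```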
You can use LEFT JOIN or EXISTS
LEFT JOIN
SELECT DISTINCT t1.NAME FROM table1 as t1
LEFT JOIN table2 as t2 on t1.name=t2.name
WHERE t2.name is null
;
NOT EXISTS
SELECT T1.NAME FROM table1 as t1 where not exists
(SELECT T2.NAME FROM table2 as t2 where t1.name=t2.name)
Whether Access supports except or not is one issue. The other is that you are not using it properly. You have select * above the word except and select name below. That is not valid sql. If you tried that in SQL Server, your error message would be All queries combined using a UNION, INTERSECT or EXCEPT operator must have an equal number of expressions in their target lists.
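For comparison, here is a sketch in Python's sqlite3 module. SQLite does support EXCEPT, so it can show the LEFT JOIN and NOT EXISTS replacements next to the real thing, and also that mismatched column counts are rejected there too (the error wording differs from SQL Server). The table contents are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT);
    CREATE TABLE table2 (name TEXT);
    INSERT INTO table1 VALUES (1, 'Ann'), (2, 'Ann'), (3, 'Bob'), (4, 'Cid');
    INSERT INTO table2 VALUES ('Bob');
""")

def names(sql):
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql)]

except_names = names(
    "SELECT name FROM table1 EXCEPT SELECT name FROM table2 ORDER BY name")

left_join_names = names("""
    SELECT DISTINCT t1.name FROM table1 AS t1
    LEFT JOIN table2 AS t2 ON t1.name = t2.name
    WHERE t2.name IS NULL
    ORDER BY t1.name
""")

# Without DISTINCT, NOT EXISTS keeps repeated names from table1.
not_exists_names = names("""
    SELECT t1.name FROM table1 AS t1
    WHERE NOT EXISTS (SELECT 1 FROM table2 AS t2 WHERE t2.name = t1.name)
    ORDER BY t1.name
""")

# Two columns on the left, one on the right: the engine refuses the query.
mismatch_error = None
try:
    conn.execute("SELECT * FROM table1 EXCEPT SELECT DISTINCT name FROM table2")
except sqlite3.Error as exc:
    mismatch_error = str(exc)

print(except_names, left_join_names, not_exists_names)
print(mismatch_error)
```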

Search table based on information from another table

I have created a temporary table that has been populated correctly, but now I want to search another table based on two fields that are contained in my temporary table. These fields are Forename and Surname. I want to search for multiple student names and quantities and return specified data. I think the problem will be better explained in the images below:
My Temporary Table
The Table I would like to search (Table2)
Once I have searched for each student name, I want to get back the student's Forename, Surname, Address, Pin and Score.
Below is how I have been trying to achieve this, without any luck:
Select TempTable.Forname, TempTable.Surmname, Table2.Address, Table2.Pin
from TempTable
Where Exists ( Select * from Table2
where Table2.Forname=TempTable.Forname and
Table2.Surname=TempTable.Surname
)
But it is returning me no results and I don't know why!
If I understand your question correctly, the way to do it is just a simple join:
select TempTable.Forename, TempTable.Surname, Table2.Address, Table2.Pin
from TempTable
inner join Table2 on Table2.Forename = TempTable.Forename and Table2.Surname = TempTable.Surname
Though I recommend having a primary key on the "Persons" table (Table2) and using this primary key to reference the records in TempTable.
EXISTS is only used to "filter" the result; its columns aren't available outside the EXISTS.
You need a JOIN!
Select TempTable.Forname, TempTable.Surmname, Table2.Address, Table2.Pin
from TempTable JOIN Table2 ON Table2.Forname=TempTable.Forname and Table2.Surname=TempTable.Surname;
Assuming you really have called the columns "Forname" you just need a simple join. This does it explicitly to keep close to your original:
SELECT tt.Forname, tt.Surname, t2.Address, t2.Pin
FROM TempTable tt, Table2 t2
WHERE tt.Forname = t2.Forname
AND tt.Surname = t2.Surname;
You could do the same with INNER JOIN.
This is all assuming that student names are unique.
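A short sketch of the point made in these answers, using Python's sqlite3 module with invented student rows. It assumes the columns are spelled Forename and Surname. EXISTS can only filter TempTable, while the join is what brings Address, Pin and Score across from Table2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TempTable (Forename TEXT, Surname TEXT);
    CREATE TABLE Table2 (Forename TEXT, Surname TEXT, Address TEXT,
                         Pin INTEGER, Score INTEGER);
    INSERT INTO TempTable VALUES ('Amy', 'Lee'), ('Ben', 'Ng');
    INSERT INTO Table2 VALUES
        ('Amy', 'Lee', '1 Elm St', 1111, 90),
        ('Ben', 'Ng',  '2 Oak Rd', 2222, 75),
        ('Cat', 'Wu',  '3 Ash Ln', 3333, 60);
""")

# EXISTS only filters: the select list can use TempTable columns, nothing else.
filtered = conn.execute("""
    SELECT tt.Forename, tt.Surname
    FROM TempTable AS tt
    WHERE EXISTS (SELECT 1 FROM Table2 AS t2
                  WHERE t2.Forename = tt.Forename AND t2.Surname = tt.Surname)
    ORDER BY tt.Forename
""").fetchall()

# The join makes Table2's columns available to the select list.
joined = conn.execute("""
    SELECT tt.Forename, tt.Surname, t2.Address, t2.Pin, t2.Score
    FROM TempTable AS tt
    INNER JOIN Table2 AS t2
        ON t2.Forename = tt.Forename AND t2.Surname = tt.Surname
    ORDER BY tt.Forename
""").fetchall()

print(filtered)
print(joined)
```

If two students shared the same name, the join would return one row per matching Table2 record, which is why a primary key is the safer reference.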

SQL Query CREATE TABLE on multiple conditions

I am trying to deduplicate a large table where values are present but broken into several rows.
For example:
Table 1: Client_Code,Account#, First and last names, address.
Table 2: Client_Code,Account#, First and last names, address, TAX_ID.
Now what I want to do may seem pretty obvious at this point.
I want my results to pull from Table 1 into a new table, and the query to be "Select from Table 1 where client code and account# from Table 1 match client code and account# from Table 2." Table 2 has all values populated; Table 1 has everything except TAX_ID.
The code I tried looked like this.
CREATE TABLE Dedupe_1 AS SELECT * FROM `TABLE 1`
WHERE `TABLE 1`.`Client_Code`=`TABLE 2`.`Client_Code`
AND
WHERE `TABLE 1`.`account#`=`TABLE 2`.`account#`
ORDER BY `TABLE 2`.`account#`
I keep getting a syntax error. I am very new to this programming language so I apologize if this question is hard to understand.
I was just under the impression that I could call to a field from another table by simply using the 'WHERE' statement.
I think you want to use an exists clause:
CREATE TABLE Dedupe_1 AS
SELECT *
FROM `TABLE 1` t1
WHERE EXISTS (select 1
from `TABLE 2` t2
where t2.Client_Code = t1.Client_Code and t2.`account#` = t1.`account#`
);
You may want to use a join to connect the two tables. You can use a column that is common to both tables in the join condition. The common syntax goes like this:
SELECT table1.column1, table2.column2 -- and as many columns as you want from either table
FROM table1
INNER JOIN table2
ON table1.commoncolumn = table2.commoncolumn;
You may learn more about joins here.
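Here is a rough, runnable version of the EXISTS approach from the first answer, using Python's sqlite3 module. The question's backticks suggest MySQL, so as an assumption SQLite is used as a stand-in and the awkward names are quoted with double quotes. The name and address columns and all sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "TABLE 1" (Client_Code TEXT, "account#" TEXT,
                            full_name TEXT, address TEXT);
    CREATE TABLE "TABLE 2" (Client_Code TEXT, "account#" TEXT,
                            full_name TEXT, address TEXT, TAX_ID TEXT);
    INSERT INTO "TABLE 1" VALUES
        ('C1', 'A1', 'Ann Lee', '1 Elm St'),
        ('C2', 'A2', 'Bob Ng',  '2 Oak Rd'),
        ('C3', 'A3', 'Cat Wu',  '3 Ash Ln');
    INSERT INTO "TABLE 2" VALUES
        ('C1', 'A1', 'Ann Lee', '1 Elm St', 'T-001'),
        ('C3', 'A3', 'Cat Wu',  '3 Ash Ln', 'T-003');

    -- Keep only the TABLE 1 rows that have a matching code and account in TABLE 2.
    CREATE TABLE Dedupe_1 AS
    SELECT t1.*
    FROM "TABLE 1" AS t1
    WHERE EXISTS (
        SELECT 1
        FROM "TABLE 2" AS t2
        WHERE t2.Client_Code = t1.Client_Code
          AND t2."account#" = t1."account#"
    );
""")

dedupe_rows = conn.execute(
    'SELECT Client_Code, "account#" FROM Dedupe_1 ORDER BY "account#"'
).fetchall()
print(dedupe_rows)
```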

SQL: Optimization problem, has rows?

I have a query with five joins on some rather large tables (the largest table has 10 million records), and I want to know if any rows exist. So far I've done this to check if rows exist:
SELECT TOP 1 tbl.Id
FROM table tbl
INNER JOIN ... ON ... = ... (x5)
WHERE tbl.xxx = ...
Running this query in a stored procedure takes 22 seconds and I would like it to be close to "instant". Is this even possible? What can I do to speed it up?
I have indexes on the fields that I'm joining on and the fields in the WHERE clause.
Any ideas?
Switch to the EXISTS predicate. In general I have found it to be faster than selecting TOP 1 etc.
So you could write it like this: IF EXISTS (SELECT * FROM table tbl INNER JOIN table tbl2 .. do your stuff
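As a sketch of the EXISTS check, here it is in Python's sqlite3 module. SQLite has no IF EXISTS statement, so as a stand-in the query wraps the join in SELECT EXISTS (...), which returns 1 or 0. The parent and child tables and the rows_exist helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE child (id INTEGER PRIMARY KEY, parent_id INTEGER, status TEXT);
    INSERT INTO parent VALUES (1, 'north'), (2, 'south');
    INSERT INTO child VALUES (10, 1, 'open'), (11, 2, 'closed');
""")

def rows_exist(connection, status):
    """Return True if at least one joined row has the given status."""
    (found,) = connection.execute(
        """
        SELECT EXISTS (
            SELECT 1
            FROM child AS c
            INNER JOIN parent AS p ON p.id = c.parent_id
            WHERE c.status = ?
        )
        """,
        (status,),
    ).fetchone()
    return bool(found)

print(rows_exist(conn, "open"))
print(rows_exist(conn, "missing"))
```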
Depending on your RDBMS you can check what parts of the query are taking a long time and which indexes are being used (so you can know they're being used properly).
In MSSQL, you can see a diagram of the execution plan of any query you submit.
In Oracle you can use EXPLAIN PLAN FOR, and in MySQL the EXPLAIN keyword, to get details about how the query is executed.
But it might just be that 22 seconds is the best you can do with your query. We can't answer that, only the execution details provided by your RDBMS can. If you tell us which RDBMS you're using we can tell you how to find the information you need to see what the bottleneck is.
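To give an idea of what such execution details look like, here is a small sketch using Python's sqlite3 module, where the equivalent command is EXPLAIN QUERY PLAN. The orders table and index name are hypothetical, and the wording of the plan text varies between SQLite versions, so treat the output as illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    CREATE INDEX idx_orders_customer ON orders (customer_id);
""")

# Each plan row ends with a human-readable description of one step.
plan_rows = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM orders WHERE customer_id = ?", (42,)
).fetchall()
plan_details = [row[3] for row in plan_rows]
for detail in plan_details:
    print(detail)

# If the index is being used, its name shows up in the plan text.
uses_index = any("idx_orders_customer" in detail for detail in plan_details)
print(uses_index)
```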
4 options
Try COUNT(*) in place of TOP 1 tbl.id
An index per column may not be good enough: you may need to use composite indexes
Are you on SQL Server 2005? If so, you can find missing indexes. Or try the database tuning advisor
Also, it's possible that you don't need 5 joins.
Assuming parent-child-grandchild etc, then grandchild rows can't exist without the parent rows (assuming you have foreign keys)
So your query could become
SELECT TOP 1
tbl.Id --or count(*)
FROM
grandchildtable tbl
INNER JOIN
anothertable ON ... = ...
WHERE
tbl.xxx = ...
Try EXISTS.
Either for the 5 tables or for the assumed hierarchy
SELECT TOP 1 --or count(*)
tbl.Id
FROM
grandchildtable tbl
WHERE
tbl.xxx = ...
AND
EXISTS (SELECT *
FROM
anothertable T2
WHERE
tbl.key = T2.key /* AND T2 condition*/)
-- or
SELECT TOP 1 --or count(*)
tbl.Id
FROM
mytable tbl
WHERE
tbl.xxx = ...
AND
EXISTS (SELECT *
FROM
anothertable T2
WHERE
tbl.key = T2.key /* AND T2 condition*/)
AND
EXISTS (SELECT *
FROM
yetanothertable T3
WHERE
tbl.key = T3.key /* AND T3 condition*/)
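Here is a runnable sketch of the chained EXISTS form, using Python's sqlite3 module. SQLite has no TOP, so as an assumption LIMIT 1 takes its place. The table names follow the answer, while the join_key column and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (id INTEGER PRIMARY KEY, join_key TEXT, xxx TEXT);
    CREATE TABLE anothertable (join_key TEXT);
    CREATE TABLE yetanothertable (join_key TEXT);
    INSERT INTO mytable VALUES (1, 'a', 'v'), (2, 'b', 'v'), (3, 'c', 'w');
    INSERT INTO anothertable VALUES ('a'), ('b');
    INSERT INTO yetanothertable VALUES ('b');
""")

# Each EXISTS only asks "is there a match?", so no joined rows are produced.
first_match = conn.execute("""
    SELECT tbl.id
    FROM mytable AS tbl
    WHERE tbl.xxx = ?
      AND EXISTS (SELECT 1 FROM anothertable AS t2
                  WHERE t2.join_key = tbl.join_key)
      AND EXISTS (SELECT 1 FROM yetanothertable AS t3
                  WHERE t3.join_key = tbl.join_key)
    ORDER BY tbl.id
    LIMIT 1
""", ("v",)).fetchone()

has_rows = first_match is not None
print(first_match, has_rows)
```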
Doing a filter early on your first select will help if you can do it; as you filter the data in the first instance all the joins will join on reduced data.
Select top 1 tbl1.id
From
(
Select top 1 * from
table tbl1
Where Key = Key
) tbl1
inner join ...
After that you will likely need to provide more of the query to understand how it works.
Maybe you could offload/cache this fact-finding mission. If it doesn't need to be done dynamically or at runtime, just cache the result in a much smaller table and then query that. Also, make sure all the tables you're querying have an appropriate clustered index. Granted, you may be using these tables for other types of queries, but for the absolute fastest way to go, you can tune all your clustered indexes for this one query.
Edit: Yes, what other people said. Measure, measure, measure! Your query plan estimate can show you what your bottleneck is.
Put the table with the most rows first in every join. If the WHERE clause has more
than one condition, the order of the conditions matters: put the condition
that gives you the most rows first.
Use filters very carefully when optimizing a query.