SQL query optimisation: JOIN on multiple columns

I have two tables in Microsoft Access: T_DATAS (about 200,000 rows) and T_REAF (about 1,000 rows).
T_DATAS has a lot of columns (about 30) and T_REAF has about 10.
I should mention that I am not allowed to change these tables or create other tables. I have to work with them as they are.
Both tables share 6 identical columns. I need to join the tables on these 6 columns and select ALL the columns from T_DATAS plus the columns that are in T_REAF but not in T_DATAS.
My query is:
SELECT A.*, B.CARROS_NEW, B.SEGT_NEW, B.ATTR
INTO FINALTABLE
FROM T_DATAS A LEFT JOIN T_REAF B ON
A.REGION LIKE B.REGION AND
A.PAYS LIKE B.PAYS AND
A.MARQUE LIKE B.MARQUE AND
A.MODELE LIKE B.MODELE AND
A.CARROS LIKE B.CARROS AND
A.SEGT LIKE B.SEGT
The query returns the result I need, but it takes far too long (about 3 minutes).
I know that T_DATAS contains a lot of rows (200,000), but 3 minutes still seems excessive for this query.
Could you please tell me what is wrong with it?
Thanks a lot for your help

Two steps for this. One is changing the query to use =. I'm not 100% sure if this is necessary, but it can't hurt. The second is to create an index.
So:
SELECT D.*, R.CARROS_NEW, R.SEGT_NEW, R.ATTR
INTO FINALTABLE
FROM T_DATAS D LEFT JOIN
T_REAF R
ON D.REGION = R.REGION AND
D.PAYS = R.PAYS AND
D.MARQUE = R.MARQUE AND
D.MODELE = R.MODELE AND
D.CARROS = R.CARROS AND
D.SEGT = R.SEGT;
Second, you want an index on T_REAF:
CREATE INDEX IDX_REAF_6 ON T_REAF(REGION, PAYS, MARQUE, MODELE, CARROS, SEGT);
MS Access can then use the index for the JOIN, speeding the query.
Note that I changed the table aliases to be abbreviations for the table names. This makes it easier to follow the logic in the query.

I assume those 6 columns have the same datatypes as well.
Note: the equals (=) operator is a comparison operator that compares two values for equality. So replace LIKE with = in your query and see how the execution time changes.
SELECT A.*
,B.CARROS_NEW
,B.SEGT_NEW
,B.ATTR
INTO FINALTABLE
FROM T_DATAS A
LEFT JOIN T_REAF B
ON A.REGION = B.REGION
AND A.PAYS = B.PAYS
AND A.MARQUE = B.MARQUE
AND A.MODELE = B.MODELE
AND A.CARROS = B.CARROS
AND A.SEGT = B.SEGT

Related

Sub-query works but would a join or other alternative be better?

I am trying to select rows from one table where the id referenced in those rows matches the unique id from another table that relates to it like so:
SELECT *
FROM booklet_tickets
WHERE bookletId = (SELECT id
FROM booklets
WHERE bookletNum = 2000
AND seasonId = 9
AND bookletTypeId = 3)
With the bookletNum/seasonId/bookletTypeId being filled in by a user form and inserted into the query.
This works and returns what I want but seems messy. Is a join better to use in this type of scenario?
If there is even a possibility of your subquery returning multiple values, you should use IN instead:
SELECT *
FROM booklet_tickets
WHERE bookletId in (SELECT id
FROM booklets
WHERE bookletNum = 2000
AND seasonId = 9
AND bookletTypeId = 3)
But I would prefer EXISTS over IN:
SELECT *
FROM booklet_tickets bt
WHERE EXISTS (SELECT 1
FROM booklets b
WHERE bookletNum = 2000
AND seasonId = 9
AND bookletTypeId = 3
AND b.id = bt.bookletId)
It is not possible to give a flat "yes, it's better" or "no, it's not" answer for this type of scenario.
My personal rule of thumb: if the number of rows in a table is less than 1 million, I do not bother optimising "SELECT ... WHERE IN" queries, as the SQL Server query optimizer is smart enough to pick an appropriate plan.
In reality, however, you often need more values from the joined table in the final resultset, so a JOIN with a filtering WHERE clause might make more sense, such as:
SELECT BT.*, B.SeasonId
FROM booklet_tickets BT
INNER JOIN booklets B ON BT.bookletId = B.id
WHERE B.bookletNum = 2000
AND B.seasonId = 9
AND B.bookletTypeId = 3
To me it comes down to a question of style rather than anything else: write your code so that it will be easier for you to understand months later. So pick a certain style and then stick to it :)
The question, however, is as old as time itself :)
SQL JOIN vs IN performance?
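One semantic difference worth keeping in mind when choosing between the two forms (a sketch with assumed data, not from the question): a JOIN can multiply rows when the joined table has several matches, while EXISTS never changes the row count of the outer table.

```sql
-- If booklets ever had two rows matching the filter, the JOIN version
-- would return each matching booklet_tickets row twice:
SELECT BT.*
FROM booklet_tickets BT
INNER JOIN booklets B ON BT.bookletId = B.id
WHERE B.bookletNum = 2000;

-- The EXISTS version returns each booklet_tickets row at most once,
-- no matter how many booklets rows match:
SELECT BT.*
FROM booklet_tickets BT
WHERE EXISTS (SELECT 1
              FROM booklets B
              WHERE B.id = BT.bookletId
                AND B.bookletNum = 2000);
```

With a unique key on booklets.id this difference never shows up in practice, which is why it becomes a matter of style.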

LEFT JOIN in MS-Access with multiple match criteria (AND(OR)) bogging down

I'm working in MS-Access from Office 365.
t1 is a table with about 1,000 rows. I'm trying to LEFT JOIN t1 with t2 where t2 has a little under 200k rows. I'm trying to match up rows using Short Text strings in multiple fields, and all the relevant fields are indexed. The strings are relatively short, with the longest fields (the street fields) being about 15 characters on average.
Here is my query:
SELECT one.ID, two.ACCOUNT
FROM split_lct_2 AS one LEFT JOIN split_parcel AS two
ON (
nz(one.mySTREET) = nz(two.pSTREET)
OR nz(one.mySTREET_2) = nz(two.pSTREET)
OR nz(one.mySTREET_3) = nz(two.pSTREET)
)
AND (nz(one.myDIR) = nz(two.pDIR))
AND (nz(one.myHOUSE) = nz(two.pHOUSE));
The query works; however, it behaves like a 3-year-old. The query table appears after several seconds but remains sluggish indefinitely. For example, selecting a cell in the table takes 3-7 seconds. Exporting the query table as a .dbf takes about 8 minutes.
My concern is that this is just a sample file to build the queries, the actual t1 will have over 200k rows to process.
Is there a way to structure this query that will significantly improve performance?
I don't know if it will help but
(
nz(one.mySTREET) = nz(two.pSTREET)
OR nz(one.mySTREET_2) = nz(two.pSTREET)
OR nz(one.mySTREET_3) = nz(two.pSTREET)
)
is the same as
nz(two.pSTREET) IN (nz(one.mySTREET),nz(one.mySTREET_2),nz(one.mySTREET_3))
and the optimizer might be able to handle this better.
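Applied to the original query, that rewrite would look like this (a sketch, untested; note that Access can be picky about non-equality conditions inside a LEFT JOIN's ON clause, so it may need adjusting):

```sql
SELECT one.ID, two.ACCOUNT
FROM split_lct_2 AS one LEFT JOIN split_parcel AS two
ON nz(two.pSTREET) IN (nz(one.mySTREET), nz(one.mySTREET_2), nz(one.mySTREET_3))
AND (nz(one.myDIR) = nz(two.pDIR))
AND (nz(one.myHOUSE) = nz(two.pHOUSE));
```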
Definitely, joining tables on text fields is not something you want.
But life is life.
If there is no possibility of converting the text strings into integers (for example, an additional table with street_name and street_id), try this:
SELECT one.ID, two.ACCOUNT
FROM split_lct_2 AS one LEFT JOIN split_parcel AS two
ON (nz(one.mySTREET) = nz(two.pSTREET))
AND (nz(one.myDIR) = nz(two.pDIR))
AND (nz(one.myHOUSE) = nz(two.pHOUSE))
UNION
SELECT one.ID, two.ACCOUNT
FROM split_lct_2 AS one LEFT JOIN split_parcel AS two
ON (nz(one.mySTREET_2) = nz(two.pSTREET))
AND (nz(one.myDIR) = nz(two.pDIR))
AND (nz(one.myHOUSE) = nz(two.pHOUSE))
UNION
SELECT one.ID, two.ACCOUNT
FROM split_lct_2 AS one LEFT JOIN split_parcel AS two
ON (nz(one.mySTREET_3) = nz(two.pSTREET))
AND (nz(one.myDIR) = nz(two.pDIR))
AND (nz(one.myHOUSE) = nz(two.pHOUSE));
I suspect that using Nz() prevents the indexes from being used. Try to avoid it. If the data has no NULLs in the join key fields, then Nz() can be safely removed from the query, and that should help. But if the data has NULLs, you will probably need to change them, for example by replacing all NULLs with empty strings so the fields join without Nz(); that is additional data processing outside of this query.
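That NULL clean-up step might look like this (a sketch; the column list is assumed from the query above, and you should back up the tables before running mass updates):

```sql
-- Replace NULLs with empty strings in the join key fields of split_parcel:
UPDATE split_parcel SET pSTREET = "" WHERE pSTREET IS NULL;
UPDATE split_parcel SET pDIR    = "" WHERE pDIR    IS NULL;
UPDATE split_parcel SET pHOUSE  = "" WHERE pHOUSE  IS NULL;
-- ... and likewise for mySTREET, mySTREET_2, mySTREET_3, myDIR and myHOUSE
-- in split_lct_2, after which the Nz() calls can be dropped from the join.
```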

How can I do a SQL join to get a value 4 tables farther from the value provided?

My title is probably not very clear, so I made a little schema to explain what I'm trying to achieve. The xxxx_uid labels are foreign keys linking two tables.
Goal: Retrieve a column from the grids table by giving a proj_uid value.
I'm not very good with SQL joins and I don't know how to build a single query that will achieve that.
Actually, I'm doing 3 queries to perform the operation:
1) This gives me a res_uid to work with:
select res_uid from results where results.proj_uid = VALUE order by res_uid asc limit 1
2) This gives me a rec_uid to work with:
select rec_uid from receptor_results
inner join results on results.res_uid = receptor_results.res_uid
where receptor_results.res_uid = res_uid_VALUE order by rec_uid asc limit 1
3) Get the grid column I want from the grids table:
select grid_name from grids
inner join receptors on receptors.grid_uid = grids.grid_uid
where receptors.rec_uid = rec_uid_VALUE;
Is it possible to write a single SQL query that gives me the same results as the 3 I'm currently running?
You're not limited to one JOIN in a query:
select grids.grid_name
from grids
inner join receptors
on receptors.grid_uid = grids.grid_uid
inner join receptor_results
on receptor_results.rec_uid = receptors.rec_uid
inner join results
on results.res_uid = receptor_results.res_uid
where results.proj_uid = VALUE;
select g.grid_name
from results r
join receptor_results rr on r.res_uid = rr.res_uid
join receptors rec on rec.rec_uid = rr.rec_uid
join grids g on g.grid_uid = rec.grid_uid
where r.proj_uid = VALUE
A small note about names: typically in SQL a table is named for a single item, not the group, thus "result" rather than "results", "receptor" rather than "receptors", and so on. As you work with SQL this will start to make sense, and names like yours will seem strange. Also, it's one less character to type!

Optimize SQL query with many left join

I have a SQL query with many left joins
SELECT COUNT(DISTINCT po.o_id)
FROM T_PROPOSAL_INFO po
LEFT JOIN T_PLAN_TYPE tp ON tp.plan_type_id = po.Plan_Type_Fk
LEFT JOIN T_PRODUCT_TYPE pt ON pt.PRODUCT_TYPE_ID = po.cust_product_type_fk
LEFT JOIN T_PROPOSAL_TYPE prt ON prt.PROPTYPE_ID = po.proposal_type_fk
LEFT JOIN T_BUSINESS_SOURCE bs ON bs.BUSINESS_SOURCE_ID = po.CONT_AGT_BRK_CHANNEL_FK
LEFT JOIN T_USER ur ON ur.Id = po.user_id_fk
LEFT JOIN T_ROLES ro ON ur.roleid_fk = ro.Role_Id
LEFT JOIN T_UNDERWRITING_DECISION und ON und.O_Id = po.decision_id_fk
LEFT JOIN T_STATUS st ON st.STATUS_ID = po.piv_uw_status_fk
LEFT OUTER JOIN T_MEMBER_INFO mi ON mi.proposal_info_fk = po.O_ID
WHERE 1 = 1
AND po.CUST_APP_NO LIKE '%100010233976%'
AND 1 = 1
AND po.IS_STP <> 1
AND po.PIV_UW_STATUS_FK != 10
The performance seems to be not good and I would like to optimize the query.
Any suggestions please?
Try this one -
SELECT COUNT(DISTINCT po.o_id)
FROM T_PROPOSAL_INFO po
WHERE PO.CUST_APP_NO LIKE '%100010233976%'
AND PO.IS_STP <> 1
AND po.PIV_UW_STATUS_FK != 10
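One further observation (not part of the original answer): the leading wildcard in CUST_APP_NO LIKE '%100010233976%' prevents any index on CUST_APP_NO from being used for a seek, forcing a full scan. If the search term is always a complete application number, or at least a known prefix, an index-friendly predicate is possible:

```sql
-- Equality match, assuming the full application number is known:
SELECT COUNT(DISTINCT po.o_id)
FROM T_PROPOSAL_INFO po
WHERE po.CUST_APP_NO = '100010233976'
  AND po.IS_STP <> 1
  AND po.PIV_UW_STATUS_FK != 10;

-- Prefix match can also use an index; only a leading '%' defeats it:
-- WHERE po.CUST_APP_NO LIKE '100010233976%'
```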
First, check your indexes. Are they old? Did they get fragmented? Do they need rebuilding?
Then, check your "execution plan" (varies depending on the SQL Engine): are all joins properly understood? Are some of them 'out of order'? Do some of them transfer too many data?
Then, check your plan and indexes: are all important columns covered? Are there any outstandingly lengthy table scans or joins? Are the columns in indexes IN ORDER with the query?
Then, revise your query:
- can you extract some parts that normally would quickly generate small rowset?
- can you add new columns to indexes so join/filter expressions will get covered?
- or reorder them so they match the query better?
And, supporting the solution from #Devart:
Can you eliminate some tables along the way? Does the WHERE touch the other tables at all? Does the data in the other tables modify the count significantly? If neither SELECT nor WHERE ever touches the other joined columns, and if the exact COUNT value is not that important (i.e. "does that T_PROPOSAL_INFO row exist?"), then you might remove all the joins completely, as Devart suggested. LEFT JOINs never reduce the number of rows; they only copy/expand/multiply them.
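That row-multiplication point is also why the original query needs COUNT(DISTINCT po.o_id) at all. A sketch with an assumed cardinality:

```sql
-- Hypothetical cardinality: proposal 42 has three T_MEMBER_INFO rows.
SELECT COUNT(po.o_id)              -- would count proposal 42 three times
FROM T_PROPOSAL_INFO po
LEFT JOIN T_MEMBER_INFO mi ON mi.proposal_info_fk = po.O_ID;

SELECT COUNT(DISTINCT po.o_id)     -- counts proposal 42 once
FROM T_PROPOSAL_INFO po
LEFT JOIN T_MEMBER_INFO mi ON mi.proposal_info_fk = po.O_ID;
```

Dropping the one-to-many joins removes both the multiplication and the need for DISTINCT.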

Improvement in SQL Query - Access - Join / IN / Exists

This is my SQL query, used in Access. It gives the desired result.
But I just wanted an opinion on whether the approach is correct,
and how it can be sped up.
SELECT INVDETAILS2.F5
, INVDETAILS2.F16
, ExpectedResult.DLID
, ExpectedResult.NumRows
FROM INVDETAILS2
INNER JOIN (INVDL INNER JOIN ExpectedResult ON INVDL.DLID = ExpectedResult.DLID)
ON (INVDETAILS2.F14 = ROUND(ExpectedResult.Total))
AND (INVDETAILS2.F1 = INVDL.RegionCode)
WHERE INVDETAILS2.F29 ='2013-03-06'
AND INVDETAILS2.F5 IN (SELECT INVDETAILS2.F5
FROM (ExpectedResult
INNER JOIN INVDL
ON ExpectedResult.DLID = INVDL.DLID)
INNER JOIN INVDETAILS2
ON INVDL.RegionCode = INVDETAILS2.F1
AND round(ExpectedResult.Total)
= INVDETAILS2.F14
WHERE INVDETAILS2.F29='2013-03-06'
GROUP BY INVDETAILS2.F5
HAVING Count(ExpectedResult.DLID)<2
)
;
Approximate number of rows:
"ExpectedResult" - millions
"INVDL" - 80,000
"INVDETAILS2" - 300,000 total; about 10,000 for one date, and fewer again per region per date.
Please provide a better query if possible.
Two things you could investigate that might help speed things up:
Indexing
Make sure that you have indexed all of the columns involved in JOINs, WHERE clauses, and GROUP BY clauses.
JOIN expressions involving functions
A couple of your JOINs use Round(ExpectedResult.Total), so if you have an index on ExpectedResult.Total your query won't be able to use it. You may get a performance boost if you add a RoundedTotal column (Long Integer, Indexed), populate it with
UPDATE [ExpectedResult] SET [RoundedTotal]=Round([Total])
and then use the RoundedTotal column in your JOINs.
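The full change might be sketched as follows (Access DDL; the RoundedTotal column name comes from the answer above, the index name is made up):

```sql
-- Add the precomputed column and index it:
ALTER TABLE ExpectedResult ADD COLUMN RoundedTotal LONG;
CREATE INDEX IDX_ER_ROUNDEDTOTAL ON ExpectedResult (RoundedTotal);

-- Populate it once (re-run whenever Total changes):
UPDATE ExpectedResult SET RoundedTotal = Round(Total);
```

The join condition then becomes INVDETAILS2.F14 = ExpectedResult.RoundedTotal, which can use the new index instead of evaluating Round() per row.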