LEFT JOIN in MS-Access with multiple match criteria (AND(OR)) bogging down - sql

I'm working in MS-Access from Office 365.
t1 is a table with about 1,000 rows. I'm trying to LEFT JOIN t1 with t2 where t2 has a little under 200k rows. I'm trying to match up rows using Short Text strings in multiple fields, and all the relevant fields are indexed. The strings are relatively short, with the longest fields (the street fields) being about 15 characters on average.
Here is my query:
SELECT one.ID, two.ACCOUNT
FROM split_lct_2 AS one LEFT JOIN split_parcel AS two
ON (
nz(one.mySTREET) = nz(two.pSTREET)
OR nz(one.mySTREET_2) = nz(two.pSTREET)
OR nz(one.mySTREET_3) = nz(two.pSTREET)
)
AND (nz(one.myDIR) = nz(two.pDIR))
AND (nz(one.myHOUSE) = nz(two.pHOUSE));
The query works, however it behaves like a 3-year-old. The query table appears after several seconds, but remains sluggish indefinitely. For example, selecting a cell in the talble takes 3-7 seconds. Exporting the query table as a .dbf takes about 8 minutes.
My concern is that this is just a sample file to build the queries, the actual t1 will have over 200k rows to process.
Is there a way to structure this query that will significantly improve performance?

I don't know if it will help but
(
nz(one.mySTREET) = nz(two.pSTREET)
OR nz(one.mySTREET_2) = nz(two.pSTREET)
OR nz(one.mySTREET_3) = nz(two.pSTREET)
)
is the same as
nz(two.pSTREET) IN (nz(one.mySTREET),nz(one.mySTREET_2),nz(one.mySTREET_3))
it might be the optimizer can handle this better.

Definetely, joining tables using text fields is not something You are hoping for.
But, life is life.
If there is no possibility to convert text strings into integers (for example additional table with street_name and street_id), try this:
SELECT one.ID, two.ACCOUNT
FROM split_lct_2 AS one LEFT JOIN split_parcel AS two
ON (nz(one.mySTREET) = nz(two.pSTREET))
AND (nz(one.myDIR) = nz(two.pDIR))
AND (nz(one.myHOUSE) = nz(two.pHOUSE))
UNION
SELECT one.ID, two.ACCOUNT
FROM split_lct_2 AS one LEFT JOIN split_parcel AS two
ON (nz(one.mySTREET_2) = nz(two.pSTREET))
AND (nz(one.myDIR) = nz(two.pDIR))
AND (nz(one.myHOUSE) = nz(two.pHOUSE)
UNION
SELECT one.ID, two.ACCOUNT
FROM split_lct_2 AS one LEFT JOIN split_parcel AS two
ON (nz(one.mySTREET_3) = nz(two.pSTREET)
)
AND (nz(one.myDIR) = nz(two.pDIR))
AND (nz(one.myHOUSE) = nz(two.pHOUSE));

I suppose using Nz() does not allow to use index. Try to avoid them. If data has no NULLs in join key fields then Nz() should be safely removed from query and it should help. But if data has NULLs, you probably need to change this - for example to replace all NULLs with empty strings to make them join-able without Nz() - that's additional data processing outside of this query.

Related

Should I use an SQL full outer join for this?

Consider the following tables:
Table A:
DOC_NUM
DOC_TYPE
RELATED_DOC_NUM
NEXT_STATUS
...
Table B:
DOC_NUM
DOC_TYPE
RELATED_DOC_NUM
NEXT_STATUS
...
The DOC_TYPE and NEXT_STATUS columns have different meanings between the two tables, although a NEXT_STATUS = 999 means "closed" in both. Also, under certain conditions, there will be a record in each table, with a reference to a corresponding entry in the other table (i.e. the RELATED_DOC_NUM columns).
I am trying to create a query that will get data from both tables that meet the following conditions:
A.RELATED_DOC_NUM = B.DOC_NUM
A.DOC_TYPE = "ST"
B.DOC_TYPE = "OT"
A.NEXT_STATUS < 999 OR B.NEXT_STATUS < 999
A.DOC_TYPE = "ST" represents a transfer order to transfer inventory from one plant to another. B.DOC_TYPE = "OT" represents a corresponding receipt of the transferred inventory at the receiving plant.
We want to get records from either table where there is an ST/OT pair where either or both entries are not closed (i.e. NEXT_STATUS < 999).
I am assuming that I need to use a FULL OUTER join to accomplish this. If this is the wrong assumption, please let me know what I should be doing instead.
UPDATE (11/30/2021):
I believe that #Caius Jard is correct in that this does not need to be an outer join. There should always be an ST/OT pair.
With that I have written my query as follows:
SELECT <columns>
FROM A LEFT JOIN B
ON
A.RELATED_DOC_NUM = B.DOC_NUM
WHERE
A.DOC_TYPE IN ('ST') AND
B.DOC_TYPE IN ('OT') AND
(A.NEXT_STATUS < 999 OR B.NEXT_STATUS < 999)
Does this make sense?
UPDATE 2 (11/30/2021):
The reality is that these are DB2 database tables being used by the JD Edwards ERP application. The only way I know of to see the table definitions is by using the web site http://www.jdetables.com/, entering the table ID and hitting return to run the search. It comes back with a ton of information about the table and its columns.
Table A is really F4211 and table B is really F4311.
Right now, I've simplified the query to keep it simple and keep variables to a minimum. This is what I have currently:
SELECT CAST(F4211.SDDOCO AS VARCHAR(8)) AS SO_NUM,
F4211.SDRORN AS RELATED_PO,
F4211.SDDCTO AS SO_DOC_TYPE,
F4211.SDNXTR AS SO_NEXT_STATUS,
CAST(F4311.PDDOCO AS VARCHAR(8)) AS PO_NUM,
F4311.PDRORN AS RELATED_SO,
F4311.PDDCTO AS PO_DOC_TYPE,
F4311.PDNXTR AS PO_NEXT_STATUS
FROM PROD2DTA.F4211 AS F4211
INNER JOIN PROD2DTA.F4311 AS F4311
ON F4211.SDRORN = CAST(F4311.PDDOCO AS VARCHAR(8))
WHERE F4211.SDDCTO IN ( 'ST' )
AND F4311.PDDCTO IN ( 'OT' )
The other part of the story is that I'm using a reporting package that allows you to define "virtual" views of the data. Virtual views allow the report developer to specify the SQL to use. This is the application where I am using the SQL. When I set up the SQL, there is a validation step that must be performed. It will return a limited set of results if the SQL is validated.
When I enter the query above and validate it, it says that there are no results, which makes no sense. I'm guessing the data casting is causing the issue, but not sure.
UPDATE 3 (11/30/2021):
One more twist to the story. The related doc number is not only defined as a string value, but it contains leading zeros. This is true in both tables. The main doc number (in both tables) is defined as a numeric value and therefore has no leading zeros. I have no idea why those who developed JDE would have done this, but that is what is there.
So, there are matching records between the two tables that meet the criteria, but I think I'm getting no results because when I convert the numeric to a string, it does not match, because one value is, say "12345", while the other is "00012345".
Can I pad the numeric -> string value with zeros before doing the equals check?
UPDATE 4 (12/2/2021):
Was able to finally get the query to work by converting the numeric doc num to a left zero padded string.
SELECT <columns>
FROM PROD2DTA.F4211 AS F4211
INNER JOIN PROD2DTA.F4311 AS F4311
ON F4211.SDRORN = RIGHT(CONCAT('00000000', CAST(F4311.PDDOCO AS VARCHAR(8))), 8)
WHERE F4211.SDDCTO IN ( 'ST' )
AND F4311.PDDCTO IN ( 'OT' )
AND ( F4211.SDNXTR < 999
OR F4311.PDNXTR < 999 )
You should write your query as follows:
SELECT <columns>
FROM A INNER JOIN B
ON
A.RELATED_DOC_NUM = B.DOC_NUM
WHERE
A.DOC_TYPE IN ('ST') AND
B.DOC_TYPE IN ('OT') AND
(A.NEXT_STATUS < 999 OR B.NEXT_STATUS < 999)
LEFT join is a type of OUTER join; LEFT JOIN is typically a contraction of LEFT OUTER JOIN). OUTER means "one side might have nulls in every column because there was no match". Most critically, the code as posted in the question (with a LEFT JOIN, but then has WHERE some_column_from_the_right_table = some_value) runs as an INNER join, because any NULLs inserted by the LEFT OUTER process, are then quashed by the WHERE clause
See Update 4 for details of how I resolved the "data conversion or mapping" error.

Finding differences in queries using two criteria

I'm trying to find a way to find a way to compare two queries that use a combine sent of criteria. In this case we have Prefixes (Two letter code like DA) and Pack number 1234567. In the query I've created a field that combines these two things so it appears 1234567DA this is done with each of the queries from the separate tables they are pulled from. The idea is that if this is in one table and not the other it would show up as "False". I tried to use an Unmatched query but that doesn't seem to work. What I have currently is as follows:
SELECT
[1LagoTest].Prefix,
[1BigPicPackPref].BigPicPP,
IIf([BigPicPP]=[LagoPP],"True","False") AS Compare,
[1LagoTest].RETAIL,
[1LagoTest].MEDIA
FROM 1LagoTest
LEFT JOIN 1BigPicPackPref
ON [1LagoTest].[Prefix] = [1BigPicPackPref].[BigPicPP]
WHERE (((IIf([BigPicPP]=[LagoPP],"True","False")) Like "False")
AND (([1LagoTest].MEDIA) Not Like "*2019 FL*"))
ORDER BY [1LagoTest].RETAIL;
Right now it will show whats missing from LagoPP but doesn't give me anything from missing packs in BigPicPP. Any help in the right direction would be greatly appreciated.
Thanks!!
This gets a little tricky in Access without FULL OUTER JOIN, but the general idea to is replicate a FULL OUTER JOIN using UNION ALL, then filter from that.
Something like this:
SELECT I.Prefix,
I.BigPicPP,
I.Compare,
I.Retail,
I.Media
FROM (SELECT L.Prefix,
B.BigPicPP,
IIf([BigPicPP]=[LagoPP],"True","False") as Compare,
L.Retail,
L.Media
FROM 1LagoTest L
JOIN 1BigPicPackPref B ON L.Prefix = B.BigPicPP
WHERE L.Media NOT LIKE "*2019 FL*"
UNION ALL
SELECT L.Prefix,
B.BigPicPP,
"False", --Missing records from 1BigPicPackPref
L.Retail,
L.Media
FROM 1LagoTest L
LEFT JOIN 1BigPicPackPref B ON L.Prefix = B.BigPicPP
AND L.Media NOT LIKE "*2019 FL*"
WHERE B.Prefix IS NULL
UNION ALL
SELECT B.Prefix,
B.BigPicPP,
"False", --Missing records from 1LagoTest
L.Retail,
L.Media
FROM 1LagoTest L
RIGHT JOIN 1BigPicPackPref B ON L.Prefix = B.BigPicPP
AND L.Media NOT LIKE "*2019 FL*"
WHERE L.Prefix IS NULL
) AS I
You only need IFF in the first part of the union because in the second two parts one side will always be NULL, so we know the compare will always fail and be False.
You shouldn't need this part of your current WHERE clause at all (((IIf([BigPicPP]=[LagoPP],"True","False")) Like "False"). But if you only want to see False records, just add WHERE I.Compare = "False" to the bottom of the outer select.
The reason the "Unmatched" query (assuming through the Wizard) does not work, is because you are attempting to see the values of two separate tables / queries that do not match either table / query. This is not how the "Unmatched" works. All that will give you is a single table / query that does not match another single table / query.
This can most likely be done any number of ways, but this would probably get you where you want to be (or close to it):
SELECT
a.Prefix,
b.BigPicPP,
IIf([BigPicPP]=[LagoPP],"True","False") AS Compare,
a.RETAIL,
a.MEDIA
FROM [1LagoTest] a
LEFT JOIN [1BigPicPackPref] b ON a.Prefix = b.BigPicPP
WHERE a.MEDIA Not Like "*2019 FL*"
AND b.BigPicPP IS NULL
ORDER BY a.RETAIL
UNION
SELECT
a.Prefix,
b.BigPicPP,
IIf([BigPicPP]=[LagoPP],"True","False") AS Compare,
a.RETAIL,
a.MEDIA
FROM [1LagoTest] a
RIGHT JOIN [1BigPicPackPref] b ON a.Prefix = b.BigPicPP
WHERE a.MEDIA Not Like "*2019 FL*"
AND a.Prefix IS NULL
ORDER BY a.RETAIL
NOTE: Depending on the data structure, the ORDER BY may cause some issues.
So the way I got this to finally work was to build two separate queries. One looking at what was missing from Lago and One that was looking at what was missing from BigPic. It was the only way I could get it to give me both sets of missing data. If I can find a better way to do it through one query I will report back as I'm still gonna play around with it.

how to join multiple tables without showing repeated data?

I pop into a problem recently, and Im sure its because of how I Join them.
this is my code:
select LP_Pending_Info.Service_Order,
LP_Pending_Info.Pending_Days,
LP_Pending_Info.Service_Type,
LP_Pending_Info.ASC_Code,
LP_Pending_Info.Model,
LP_Pending_Info.IN_OUT_WTY,
LP_Part_Codes.PartCode,
LP_PS_Codes.PS,
LP_Confirmation_Codes.SO_NO,
LP_Pending_Info.Engineer_Code
from LP_Pending_Info
join LP_Part_Codes
on LP_Pending_Info.Service_order = LP_Part_Codes.Service_order
join LP_PS_Codes
on LP_Pending_Info.Service_Order = LP_PS_Codes.Service_Order
join LP_Confirmation_Codes
on LP_Pending_Info.Service_Order = LP_Confirmation_Codes.Service_Order
order by LP_Pending_Info.Service_order, LP_Part_Codes.PartCode;
For every service order I have 5 part code maximum.
If the service order have only one value it show the result correctly but when it have more than one Part code the problem begin.
for example: this service order"4182134076" has only 2 part code, first'GH81-13601A' and second 'GH96-09938A' so it should show the data 2 time but it repeat it for 8 time. what seems to be the problem?
If your records were exactly the same the distinct keyword would have solved it.
However in rows 2 and 3 which have the same Service_Order and Part_Code if you check the SO_NO you see it is different - that is why distinct won't work here - the rows are not identical.
I say you have some problem in one of the conditions in your joins. The different data is in the SO_NO column so check the raw data in the LP_Confirmation_Codes table for that Service_Order:
select * from LP_Confirmation_Codes where Service_Order = 4182134076
I assume you are missing an and with the value from the LP_Part_Codes or LP_PS_Codes (but can't be sure without seeing those tables and data myself).
By this sentence If the service order have only one value it show the result correctly but when it have more than one Part code the problem begin. - probably you are missing and and with the LP_Part_Codes table
Based on your output result, here are the following data that caused multiple output.
Service Order: 4182134076 has :
2 PartCode which are GH81-13601A and GH96-09938A
2 PS which are U and P
2 SO_NO which are 1.00024e+09 and 1.00022e+09
Therefore 2^3 returns 8 rows. I believe that you need to check where you should join your tables.
Use DINTINCT
select distinct LP_Pending_Info.Service_Order,LP_Pending_Info.Pending_Days,
LP_Pending_Info.Service_Type,LP_Pending_Info.ASC_Code,LP_Pending_Info.Model,
LP_Pending_Info.IN_OUT_WTY, LP_Part_Codes.PartCode,LP_PS_Codes.PS,
LP_Confirmation_Codes.SO_NO,LP_Pending_Info.Engineer_Code
from LP_Pending_Info
join LP_Part_Codes on LP_Pending_Info.Service_order = LP_Part_Codes.Service_order
join LP_PS_Codes on LP_Part_Codes.Service_Order = LP_PS_Codes.Service_Order
join LP_Confirmation_Codes on LP_PS_Codes.Service_Order = LP_Confirmation_Codes.Service_Order
order by LP_Pending_Info.Service_order, LP_Part_Codes.PartCode;
distinct will not return duplicates based on your select. So if a row is same, it will only return once.

How can I do a SQL join to get a value 4 tables farther from the value provided?

My title is probably not very clear, so I made a little schema to explain what I'm trying to achieve. The xxxx_uid labels are foreign keys linking two tables.
Goal: Retrieve a column from the grids table by giving a proj_uid value.
I'm not very good with SQL joins and I don't know how to build a single query that will achieve that.
Actually, I'm doing 3 queries to perform the operation:
1) This gives me a res_uid to work with:
select res_uid from results where results.proj_uid = VALUE order by res_uid asc limit 1"
2) This gives me a rec_uid to work with:
select rec_uid from receptor_results
inner join results on results.res_uid = receptor_results.res_uid
where receptor_results.res_uid = res_uid_VALUE order by rec_uid asc limit 1
3) Get the grid column I want from the grids table:
select grid_name from grids
inner join receptors on receptors.grid_uid = grids.grid_uid
where receptors.rec_uid = rec_uid_VALUE;
Is it possible to perform a single SQL that will give me the same results the 3 I'm actually doing ?
You're not limited to one JOIN in a query:
select grids.grid_name
from grids
inner join receptors
on receptors.grid_uid = grids.grid_uid
inner join receptor_results
on receptor_results.rec_uid = receptors.rec_uid
inner join results
on results.res_uid = receptor_results.res_uid
where results.proj_uid = VALUE;
select g.grid_name
from results r
join resceptor_results rr on r.res_uid = rr.res_uid
join receptors rec on rec.rec_uid = rr.rec_uid
join grids g on g.grid_uid = rec.grid_uid
where r.proj_uid = VALUE
a small note about names, typically in sql the table is named for a single item not the group. thus "result" not "results" and "receptor" not "receptors" etc. As you work with sql this will make sense and names like you have will seem strange. Also, one less character to type!

SQL query Optimisation JOIN multiple column

I have two tables on Microsoft Access: T_DATAS (about 200 000 rows) and T_REAF (about 1000 rows).
T_DATAS has a lot of columns (about 30 columns) and T_REAF has about 10 columns.
I have to tell you that I am not allowed to change those tables nor to create other tables. I have to work with it.
Both tables have 6 columns that are the same. I need to join the tables on these 6 columns, to select ALL the columns from T_DATAS AND the columns that are in T_REAF but not in T_DATAS.
My query is :
SELECT A.*, B.CARROS_NEW, B.SEGT_NEW, B.ATTR
INTO FINALTABLE
FROM T_DATAS A LEFT JOIN T_REAF B ON
A.REGION LIKE B.REGION AND
A.PAYS LIKE B.PAYS AND
A.MARQUE LIKE B.MARQUE AND
A.MODELE LIKE B.MODELE AND
A.CARROS LIKE B.CARROS AND
A.SEGT LIKE B.SEGT
I have the result I need but the problem is that this query is taking way too long to give the result (about 3 minutes).
I know that T_DATAS contains a lot of rows (200 000) but I think that 3 minutes is too long for this query.
Could you please tell me what is wrong with this query?
Thanks a lot for your help
Two steps for this. One is changing the query to use =. I'm not 100% sure if this is necessary, but it can't hurt. The second is to create an index.
So:
SELECT D.*, R.CARROS_NEW, R.SEGT_NEW, R.ATTR
INTO FINALTABLE
FROM T_DATAS D LEFT JOIN
T_REAF R
ON D.REGION = R.REGION AND
D.PAYS = R.PAYS AND
D.MARQUE = R.MARQUE AND
D.MODELE = R.MODELE AND
D.CARROS = R.CARROS AND
D.SEGT = R.SEGT;
Second, you want an index on T_REAF:
CREATE INDEX IDX_REAF_6 ON T_REAF(REGION, PAYS, MARQUE, MODELE, CARROS, SEGT);
MS Access can then use the index for the JOIN, speeding the query.
Note that I changed the table aliases to be abbreviations for the table names. This makes it easier to follow the logic in the query.
I assume that those 6 columns are same may have same datatype also.
Note: Equals (=) operator is a comparison operator - that compares two values for equality. So in your query replace LIKE with = and see the result time.
SELECT A.*
,B.CARROS_NEW
,B.SEGT_NEW
,B.ATTR
INTO FINALTABLE
FROM T_DATAS A
LEFT JOIN T_REAF B
ON A.REGION = B.REGION
AND A.PAYS = B.PAYS
AND A.MARQUE = B.MARQUE
AND A.MODELE = B.MODELE
AND A.CARROS = B.CARROS
AND A.SEGT = B.SEGT