Improve sql query without using 'union' - sql

I want to improve this sql query without using UNION. When table_xx.uniqeId = 1 i want to use table_yy, when 2 i want to use table_ww. In fact i want this dynamic. i want to use an another table for uniqeId-table match because for future i can add extra table for example for table_xx.uniqeId = 3, use table_qq or something else. Can you suggest some idea, What can i use ?
Thanks.
SELECT
xx.info,
yy.value
FROM
table_xx xx,
table_yy yy
WHERE
xx.uniqeId = 1
UNION ALL
SELECT
xx.info,
yy.value
FROM
table_xx xx,
table_ww yy
WHERE
xx.uniqeId = 2

This would do it, provided that value is mandatory in yy and ww. But like Gordon Linoff already commented, a query like this doesn't necessary perform better. Most likely it won't, and it isn't much more readable as well. I would choose the union.
SELECT
xx.info,
nvl(yy.value, ww.value) as value
FROM
table_xx xx
LEFT JOIN table_yy yy ON xx.uniqeId = 1
LEFT JOIN table_ww ww ON xx.uniqeId = 2
WHERE
nvl(yy.value, ww.value) IS NOT NULL
But I think if the introduction of an extra xx_id also requires an extra table, it's a sign that your database structure is incorrect.
A better idea would be to add the uniqueid to the 'value' table as well, so you can easily join the right values on xx. This way, you can add extra ids and matching values as much as you like, without having to modify the database to add extra value tables.
SELECT
xx.info,
yy.value
FROM
table_xx xx
INNER JOIN table_yy yy ON yy.xx_uniqueid = xx.uniqueid

Related

Sub-query works but would a join or other alternative be better?

I am trying to select rows from one table where the id referenced in those rows matches the unique id from another table that relates to it like so:
SELECT *
FROM booklet_tickets
WHERE bookletId = (SELECT id
FROM booklets
WHERE bookletNum = 2000
AND seasonId = 9
AND bookletTypeId = 3)
With the bookletNum/seasonId/bookletTypeId being filled in by a user form and inserted into the query.
This works and returns what I want but seems messy. Is a join better to use in this type of scenario?
If there is even a possibility for your subquery to return multiple value you should use in instead:
SELECT *
FROM booklet_tickets
WHERE bookletId in (SELECT id
FROM booklets
WHERE bookletNum = 2000
AND seasonId = 9
AND bookletTypeId = 3)
But I would prefer exists over in :
SELECT *
FROM booklet_tickets bt
WHERE EXISTS (SELECT 1
FROM booklets b
WHERE bookletNum = 2000
AND seasonId = 9
AND bookletTypeId = 3
AND b.id = bt.bookletId)
It is not possible to give a "Yes it's better" or "no it's not" answer for this type of scenario.
My personal rule of thumb if number of rows in a table is less than 1 million, I do not care optimising "SELECT WHERE IN" types of queries as SQL Server Query Optimizer is smart enough to pick an appropriate plan for the query.
In reality however you often need more values from a joined table in the final resultset so a JOIN with a filter WHERE clause might make more sense, such as:
SELECT BT.*, B.SeasonId
FROM booklet_tickes BT
INNER JOIN booklets B ON BT.bookletId = B.id
WHERE B.bookletNum = 2000
AND B.seasonId = 9
AND B.bookletTypeId = 3
To me it comes down to a question of style rather than anything else, write your code so that it'll be easier for you to understand it months later. So pick a certain style and then stick to it :)
The question however is old as the time itself :)
SQL JOIN vs IN performance?

Should I use an SQL full outer join for this?

Consider the following tables:
Table A:
DOC_NUM
DOC_TYPE
RELATED_DOC_NUM
NEXT_STATUS
...
Table B:
DOC_NUM
DOC_TYPE
RELATED_DOC_NUM
NEXT_STATUS
...
The DOC_TYPE and NEXT_STATUS columns have different meanings between the two tables, although a NEXT_STATUS = 999 means "closed" in both. Also, under certain conditions, there will be a record in each table, with a reference to a corresponding entry in the other table (i.e. the RELATED_DOC_NUM columns).
I am trying to create a query that will get data from both tables that meet the following conditions:
A.RELATED_DOC_NUM = B.DOC_NUM
A.DOC_TYPE = "ST"
B.DOC_TYPE = "OT"
A.NEXT_STATUS < 999 OR B.NEXT_STATUS < 999
A.DOC_TYPE = "ST" represents a transfer order to transfer inventory from one plant to another. B.DOC_TYPE = "OT" represents a corresponding receipt of the transferred inventory at the receiving plant.
We want to get records from either table where there is an ST/OT pair where either or both entries are not closed (i.e. NEXT_STATUS < 999).
I am assuming that I need to use a FULL OUTER join to accomplish this. If this is the wrong assumption, please let me know what I should be doing instead.
UPDATE (11/30/2021):
I believe that #Caius Jard is correct in that this does not need to be an outer join. There should always be an ST/OT pair.
With that I have written my query as follows:
SELECT <columns>
FROM A LEFT JOIN B
ON
A.RELATED_DOC_NUM = B.DOC_NUM
WHERE
A.DOC_TYPE IN ('ST') AND
B.DOC_TYPE IN ('OT') AND
(A.NEXT_STATUS < 999 OR B.NEXT_STATUS < 999)
Does this make sense?
UPDATE 2 (11/30/2021):
The reality is that these are DB2 database tables being used by the JD Edwards ERP application. The only way I know of to see the table definitions is by using the web site http://www.jdetables.com/, entering the table ID and hitting return to run the search. It comes back with a ton of information about the table and its columns.
Table A is really F4211 and table B is really F4311.
Right now, I've simplified the query to keep it simple and keep variables to a minimum. This is what I have currently:
SELECT CAST(F4211.SDDOCO AS VARCHAR(8)) AS SO_NUM,
F4211.SDRORN AS RELATED_PO,
F4211.SDDCTO AS SO_DOC_TYPE,
F4211.SDNXTR AS SO_NEXT_STATUS,
CAST(F4311.PDDOCO AS VARCHAR(8)) AS PO_NUM,
F4311.PDRORN AS RELATED_SO,
F4311.PDDCTO AS PO_DOC_TYPE,
F4311.PDNXTR AS PO_NEXT_STATUS
FROM PROD2DTA.F4211 AS F4211
INNER JOIN PROD2DTA.F4311 AS F4311
ON F4211.SDRORN = CAST(F4311.PDDOCO AS VARCHAR(8))
WHERE F4211.SDDCTO IN ( 'ST' )
AND F4311.PDDCTO IN ( 'OT' )
The other part of the story is that I'm using a reporting package that allows you to define "virtual" views of the data. Virtual views allow the report developer to specify the SQL to use. This is the application where I am using the SQL. When I set up the SQL, there is a validation step that must be performed. It will return a limited set of results if the SQL is validated.
When I enter the query above and validate it, it says that there are no results, which makes no sense. I'm guessing the data casting is causing the issue, but not sure.
UPDATE 3 (11/30/2021):
One more twist to the story. The related doc number is not only defined as a string value, but it contains leading zeros. This is true in both tables. The main doc number (in both tables) is defined as a numeric value and therefore has no leading zeros. I have no idea why those who developed JDE would have done this, but that is what is there.
So, there are matching records between the two tables that meet the criteria, but I think I'm getting no results because when I convert the numeric to a string, it does not match, because one value is, say "12345", while the other is "00012345".
Can I pad the numeric -> string value with zeros before doing the equals check?
UPDATE 4 (12/2/2021):
Was able to finally get the query to work by converting the numeric doc num to a left zero padded string.
SELECT <columns>
FROM PROD2DTA.F4211 AS F4211
INNER JOIN PROD2DTA.F4311 AS F4311
ON F4211.SDRORN = RIGHT(CONCAT('00000000', CAST(F4311.PDDOCO AS VARCHAR(8))), 8)
WHERE F4211.SDDCTO IN ( 'ST' )
AND F4311.PDDCTO IN ( 'OT' )
AND ( F4211.SDNXTR < 999
OR F4311.PDNXTR < 999 )
You should write your query as follows:
SELECT <columns>
FROM A INNER JOIN B
ON
A.RELATED_DOC_NUM = B.DOC_NUM
WHERE
A.DOC_TYPE IN ('ST') AND
B.DOC_TYPE IN ('OT') AND
(A.NEXT_STATUS < 999 OR B.NEXT_STATUS < 999)
LEFT join is a type of OUTER join; LEFT JOIN is typically a contraction of LEFT OUTER JOIN). OUTER means "one side might have nulls in every column because there was no match". Most critically, the code as posted in the question (with a LEFT JOIN, but then has WHERE some_column_from_the_right_table = some_value) runs as an INNER join, because any NULLs inserted by the LEFT OUTER process, are then quashed by the WHERE clause
See Update 4 for details of how I resolved the "data conversion or mapping" error.

How to join YEARS from 2 tables with different format

I am trying to create a custom view in excel for a pivot, I cant join the YEARSF because the tb1 format is 2016-17 and tb2 is 2016/17, how can i join these, see code below... what is the appropriate way of doing this
TB1.YEARF = TB2.YEARF
Seems to be the issue
SELECT TB1.YEARF,
TB1.MC,
TB1.CATEG,
TB1.ID,
TB1.TY,
TB1.CAT,
TB1.LOC,
TB2.HD0_NAME,
TB2.HD1_NAME,
TB2.HD2_NAME,
TB2.HD3_NAME,
TB2.HD4_NAME
FROM DB.TB2 TB2, DB.TB1 TB1
WHERE TB2.CAT = TB1.CAT AND TB2.LOC = TB1.LOC AND TB1.TY = TB2.TY AND TB1.YEARF = TB2.YEARF AND TB1.ID = TB2.ID
It happens that both Oracle and MySQL have REPLACE() functions.
If you change the part of your query that reads
AND TB1.YEARF = TB2.YEARF AND
to
AND TB1.YEARF = REPLACE(TB2.YEARF, '/', '-') AND
you may be able to join these tables. It's not going to be fast.
In general, to do this kind of inexact matching, you have to
figure out the rules for matching. For example, if you want abcd/ef to match abcd-ef then you can use what I wrote above. If you want abcd/ef to match abcd, that's a different rule.
Write a SQL expression to implement your rules.
But, you know, step 1 must come before step 2.
If all you care about is matching the abcd parts of abcd/ef and abcd-ef you can write a rule for that.
AND SUBSTR(TB1.YEARF, 1, 4) = SUBSTR(TB2.YEARF, 1, 4) AND
WHERE Format(TB1.YEARF, "yyyy-dd") = Format(TB2.YEARF, "yyyy-dd")

Ways to Clean-up messy records in sql

I have the following sql data:
ID Company Name Customer Address 1 City State Zip Date
0108500 AAA Test Mish~Sara Newa Claims Chtiana CO 123 06FE0046
0108500 AAA.Test Mish~Sara Newa Claims Chtiana CO 123 06FE0046
1802600 AAA Test Company Ban, Adj.~Gorge PO Box 83 MouLaurel CA 153 09JS0025
1210600 AAA Test Company Biwel~Brce 97kehst ve Jacn CA 153 04JS0190
AAA Test, AAA.Test and AAA Test Company are considered as one company.
Since their data is messy I'm thinking either to do this:
Is there a way to search all the records in the DB wherein it will search the company name with almost the same name then re-name it to the longest name?
In this case, the AAA Test and AAA.Test will be AAA Test Company.
OR Is there a way to filter only record with company name that are almost the same then they can have option to change it?
If there's no way to do it via sql query, what are your suggestions so that we can clean-up the records? There are almost 1 million records in the database and it's hard to clean it up manually.
Thank you in advance.
You could use String matching algorithm like Jaro-Winkler. I've written an SQL version that is used daily to deduplicate People's names that have been typed in differently. It can take awhile but it does work well for the fuzzy match you're looking for.
Something like a self join? || is ANSI SQL concat, some products have a concat function instead.
select *
from tablename t1
join tablename t2 on t1.companyname like '%' || t2.companyname || '%'
Depending on datatype you may have to remove blanks from the t2.companyname, use TRIM(t2.companyname) in that case.
And, as Miguel suggests, use REPLACE to remove commas and dots etc.
Use case-insensitive collation. SOUNDEX can be used etc etc.
I think most Database Servers support Full-Text search ability, and if so there are some functions related to Full-Text search that support Proximity.
for example there is a Near function in SqlServer and here is its documentation https://msdn.microsoft.com/en-us/library/ms142568.aspx
You can do the clean-up in several stages.
Create new columns
Convert everything to upper case, remove punctuation & whitespace, then match on the first 6 to 10 characters (using self join). Assuming your table is called "vendor": add two columns, "status", "dupstr", then update as follows
/** Populate dupstr column for fuzzy match **/
update vendor v
set v.dupstr = left(upper(regex_replace(regex_replace(v.companyname,'.',''),' ','')),6)
;
Identify duplicate records
Add an index on the dupstr column, then do an update like this to identify "good" records:
/** Mark the good duplicates **/
update vendor v
set v.status = 'keep' --indicate keeper record
where
--dupes to clean up
exists ( select 1 from vendor v1 where v.dupstr = v1.dupstr
and v.id != v1.id )
and
( --keeper has longest name
length(v.companyname) =
( select max(length(v2.companyname)) from vendor v2
where v.dupstr = v2.dupstr
)
or
--keeper has latest record (assuming ID is sequential)
v.id =
( select max(v3.id) from vendor v3
where v.dupstr = v3.dupstr
)
)
group by v.dupstr
;
The above SQL can be refined to add "dupe" status to other records , or you can do a separate update.
Clean Up Stragglers
Report any remaining partial matches to be reviewed by a human (i.e. dupe records without a keeper record)
You can use SQL query with SOUDEX of DIFFRENCE
For example:
SELECT DIFFERENCE ('AAA Test','AAA Test Company')
DIFFERENCE returns 0 - 4 ( 4 = almost the same, 0 - totally diffrent)
See also: https://learn.microsoft.com/en-us/sql/t-sql/functions/difference-transact-sql?view=sql-server-2017

SQL statement structuring

I have two tables AgentFare and SalesHeader having hdrGuid and hdrGuid , DocumentNumber as columns respectively.
I have to perform a query based on Document number, however I am not able to figure out which of the two following queries are most suitable.
SELECT agf.Fare , agf.Tax . agf.Commission
FROM AgentFare as agf
INNER JOIN SalesHeader as h ON agf.hdrGuid = h.hdrGuid AND h.DocumentNumber = 'XYZ'
OR
SELECT agf.Fare , agf.Tax . agf.Commission
FROM AgentFare as agf
INNER JOIN SalesHeader as h ON agf.hdrGuid = h.hdrGuid
WHERE h.DocumentNumber = 'XYZ'
Which of the 2 are more appropriate ?
First method is more appropriate.Since it will take less time as compared to second query.
The "ON" statement should be used to define on what columns 2 tables should be joined.
The "WHERE" statement is intended to filter your results.
The second option is the more appropriate to use.
You might find that this is a case of personal preference. My preference is for the second one. It's clear to me what you are joining on and what you are filtering with. If you change your code so that you replace 'XYZ' with a parameter, then for me the second way is much easier. I expect to find the 'filters' at the bottom.