SQL join best matcher - sql

I have table A (shown below).
I need to join (SQL) this table onto table B, using ProductName as the join key, but with the following order of priorities:
Country on its own, selected as a single row if it has a value (with Standard being null)
The combination of Country and Standard
Standard by itself (if Country is null).
I have tried looking around a lot. I hope the problem statement is clear.
Table A:
|ProductName|Country|Standard|Reportable|
|ProductA|Null|Value1|Y|
|ProductA|Value2|Value1|N|
|ProductA|Value2|Null|N|
The above is just a subset of the data, but basically the country and standard determine if the output is reportable. Product A could have 1 line or 3, depending on the data required.
Table B:
|ProductName|Year|
|ProductA|2006|
So the final join for the above should be:
|ProductName|Year|Country|Standard|Reportable|
|ProductA|2006|Value2|Null|N|

Perhaps something like this. It is pseudo code at this time, but hopefully it gets the general concepts across.
It assumes the A and B tables are related on product through an inner join, not an outer join.
The 1st CTE (Common Table Expression) sets your priority.
The 2nd CTE assigns a row number based on your priorities.
Then the final query filters for the first row number encountered.
Obviously this is untested, as we have no sample data or structure to test with at this time.
WITH CTE AS (
SELECT B.ProductName, B.Year, A.Country, A.Standard, A.Reportable
-- lower number = higher priority
, CASE WHEN A.Country IS NOT NULL AND A.Standard IS NULL THEN 1
WHEN A.Country IS NOT NULL AND A.Standard IS NOT NULL THEN 2
WHEN A.Country IS NULL AND A.Standard IS NOT NULL THEN 3
ELSE 4 END AS priority
FROM A
INNER JOIN B
ON A.ProductName = B.ProductName),
CTE2 AS (
SELECT CTE.*
-- number the candidates per product row, best priority first
, ROW_NUMBER() OVER (PARTITION BY ProductName, Year ORDER BY priority) RN
FROM CTE)
SELECT *
FROM CTE2
WHERE RN = 1
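The priority-then-row-number idea can be tried end to end with the three sample rows from the question. This is only a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for the real database (window functions need an SQLite build of 3.25 or later); the CTE name ranked and the ELSE 4 fallback for rows with neither value are illustrative choices, not part of the question.

```python
import sqlite3

# In-memory database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (ProductName TEXT, Country TEXT, Standard TEXT, Reportable TEXT);
CREATE TABLE B (ProductName TEXT, Year INTEGER);
INSERT INTO A VALUES
  ('ProductA', NULL,     'Value1', 'Y'),
  ('ProductA', 'Value2', 'Value1', 'N'),
  ('ProductA', 'Value2', NULL,     'N');
INSERT INTO B VALUES ('ProductA', 2006);
""")

query = """
WITH ranked AS (
    SELECT B.ProductName, B.Year, A.Country, A.Standard, A.Reportable,
           ROW_NUMBER() OVER (
               PARTITION BY B.ProductName, B.Year
               -- lower number = higher priority
               ORDER BY CASE
                   WHEN A.Country IS NOT NULL AND A.Standard IS NULL     THEN 1
                   WHEN A.Country IS NOT NULL AND A.Standard IS NOT NULL THEN 2
                   WHEN A.Country IS NULL     AND A.Standard IS NOT NULL THEN 3
                   ELSE 4  -- assumption: rows with neither value rank last
               END
           ) AS rn
    FROM A
    INNER JOIN B ON A.ProductName = B.ProductName
)
SELECT ProductName, Year, Country, Standard, Reportable
FROM ranked
WHERE rn = 1
"""

best_rows = conn.execute(query).fetchall()
print(best_rows)  # the Country-only row wins for ProductA
```

Partitioning by the columns of B means each row of B keeps exactly one candidate from A, which is what the expected output in the question shows.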

Something like this, maybe?
SELECT s.*, p.*
FROM source_table s
OUTER APPLY ( SELECT p.*
FROM product_table p
WHERE p.product_name = s.product_name
AND ( ( p.country = s.country AND s.standard IS NULL )
OR ( p.country = s.country AND p.standard = s.standard )
OR ( s.country IS NULL AND p.standard = s.standard ) )
ORDER BY CASE
WHEN ( p.country = s.country AND s.standard IS NULL ) THEN 1
WHEN ( p.country = s.country AND p.standard = s.standard ) THEN 2
WHEN ( s.country IS NULL AND p.standard = s.standard ) THEN 3
ELSE 99
END
FETCH FIRST 1 ROW ONLY ) p
OUTER APPLY (instead of CROSS APPLY) keeps your source_table result even if there is no product match. That may not be your desired outcome. If not, switch to CROSS APPLY.
There are probably ways to shorten the conditions and the sort order using NVL(). But I think this is the clearest way to start.
Also, if this is always a product match using one of those three conditions, you can shorten the WHERE clause in the OUTER APPLY.

Related

DB2 SQL HOW TO RETURN 1 ROW ON JOIN TABLE FROM INNER JOIN TABLE

SELECT GARAGE, MAKE
FROM NEIGHBORHOOD_TABLE A
JOIN VEHICLES_TABLE B ON B.MAKE = A.MAKE DISTINCT BY B.MAKE
WHERE A.ZIPCODE = MY_ZIP_CODE
;
I want to return all the garages in my zip code with a FORD make. The vehicles table can have multiple models of the FORD make on the join (FUSION, RANGER, F150, ...), but I only want to return 1 row for the FORD make, not one per model. It is probably a bad example, but the idea is that I want to return multiple rows from table A that match table B; table B may have multiple rows that match table A, but I want only 1 of them, sort of distinct by B.MAKE.
This is a DB2 SQL database.
Thanks if you can figure out what I am asking.
The simplest solution I can think of is to use ROW_NUMBER(). For example
select *
from (
select
a.garage,
a.make,
row_number() over(partition by a.garage, a.make order by b.id) as rn
from neighborhood_table a
join vehicles_table b on b.make = a.make
where a.zipcode = 12345
) x
where rn = 1
Note that you need to add an ordering criterion to decide which row to pick when there are multiple ones per make; I added order by b.id but you should change it according to your criteria.
Alternatively, you can use a lateral query to get a single row per group.
Try this:
SELECT A.GARAGE, A.MAKE
FROM NEIGHBORHOOD_TABLE A
JOIN
(
SELECT DISTINCT MAKE
FROM VEHICLES_TABLE
) B ON B.MAKE = A.MAKE
WHERE A.ZIPCODE = MY_ZIP_CODE

SQL Left Join - OR clause

I am trying to join two tables. I want to join where all three identifiers (contract id, company code and book id) match in both tables; if they do not, match using contract id and company code; and as a last step just look at contract id.
Can the join be performed using all three parameters first, then, if that finds no match, the two parameters, and then just the contract id?
Code:
SELECT *
INTO #prem_claim_wtauto_test
FROM #contract_detail A
LEFT JOIN #claim_total C
ON ( ( C.contract_id_2 = A.contract_id
AND C.company_cd_2 = A.company_cd
AND C.book_id_2 = A.book_id )
OR ( C.contract_id_2 = A.contract_id
AND C.company_cd_2 = A.company_cd )
OR ( C.contract_id_2 = A.contract_id ) )
Your ON clause boils down to C.contract_id_2 = A.contract_id. This gets you all matches, no matter whether the most precise match including company and book or a lesser one. What you want is a ranking. Two methods come to mind:
Join on C.contract_id_2 = A.contract_id, then rank the rows with ROW_NUMBER and keep the best ranked ones.
Use a lateral join in order to only join the best match with TOP.
Here is the second option. You forgot to tell us which DBMS you are using. SELECT INTO looks like SQL Server. I hope I got the syntax right:
SELECT *
INTO #prem_claim_wtauto_test
FROM #contract_detail A
OUTER APPLY
(
SELECT TOP(1) *
FROM #claim_total C
WHERE C.contract_id_2 = A.contract_id
ORDER BY
CASE
WHEN C.company_cd_2 = A.company_cd AND C.book_id_2 = A.book_id THEN 1
WHEN C.company_cd_2 = A.company_cd THEN 2
ELSE 3
END
) C;
If you want to join all rows in case of ties (e.g. many rows matching contract, company and book), then make this TOP(1) WITH TIES.
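The first option mentioned above (join on contract id only, rank with ROW_NUMBER, keep the best-ranked row) can be sketched as follows. This is an illustration under stated assumptions rather than a tested SQL Server query: it runs on Python's built-in sqlite3 module (SQLite 3.25 or later), the temp tables are replaced by plain tables, and the amount column and all sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contract_detail (contract_id INTEGER, company_cd TEXT, book_id TEXT);
-- "amount" is a hypothetical payload column, added only to tell the claims apart.
CREATE TABLE claim_total (contract_id_2 INTEGER, company_cd_2 TEXT, book_id_2 TEXT, amount INTEGER);
INSERT INTO contract_detail VALUES (1, 'X', 'B1'), (2, 'Y', 'B2'), (3, 'Z', 'B3');
INSERT INTO claim_total VALUES
  (1, 'X', 'B1', 100),  -- matches contract, company and book
  (1, 'X', 'B9', 200),  -- matches contract and company
  (1, 'Q', 'B9', 300),  -- matches contract only
  (2, 'Y', 'B7', 400),  -- matches contract and company
  (2, 'W', 'B2', 500);  -- matches contract only (company differs)
""")

query = """
SELECT contract_id, company_cd, book_id, amount
FROM (
    SELECT A.*, C.amount,
           ROW_NUMBER() OVER (
               PARTITION BY A.contract_id, A.company_cd, A.book_id
               ORDER BY CASE
                   WHEN C.company_cd_2 = A.company_cd
                        AND C.book_id_2 = A.book_id THEN 1
                   WHEN C.company_cd_2 = A.company_cd THEN 2
                   ELSE 3
               END
           ) AS rn
    FROM contract_detail A
    LEFT JOIN claim_total C ON C.contract_id_2 = A.contract_id
) AS ranked
WHERE rn = 1
ORDER BY contract_id
"""

best_matches = conn.execute(query).fetchall()
print(best_matches)
```

Contract 3 has no claims at all, so the LEFT JOIN still returns it with a NULL amount, which mirrors what OUTER APPLY does in the second option.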

Returning IDs from two other tables or null if no IDs found using a left join SQL Server

I am wondering if someone could help me. I am trying to join two tables and return an id if one is there; if there is no id, I want to return null but still return the row for that product rather than ignore it. My query below returns twice the number of records, and I cannot figure out why.
SELECT
T2.ProductID, FirstChild.SupplierID, SecondChild.AccountID
FROM
Products T2
LEFT OUTER JOIN
(
SELECT TOP(1) SupplierID, Reference,CompanyID, Row_Number() OVER (Partition By SupplierID Order By SupplierID) AS RowNo FROM Suppliers
)
FirstChild ON T2.SupplierReference = FirstChild.Reference AND RowNo = 1AND FirstChild.CompanyID =T2.CompanyID
LEFT OUTER JOIN
(
SELECT TOP(1) AccountID, SageKey,CompanyID, Row_Number() OVER (Partition By AccountID Order By AccountID) AS RowNo2 FROM Accounts
)
SecondChild ON T2.ProductAccountReference = SecondChild.Reference AND RowNo2 = 1 AND SecondChild.CompanyID =T2.CompanyID
Example of what I am trying to do
ProductID SupplierID AccountID
1 5 2
2 6 NULL
3 NULL NULL
OUTER APPLY and ditching the ROW_NUMBER seems like a better choice here:
SELECT
p.ProductId
,FirstChild.SupplierId
,SecondChild.AccountId
FROM
Products p
OUTER APPLY (SELECT TOP (1) s.SupplierId
FROM
Suppliers s
WHERE
p.SupplierReference = s.Reference
AND p.CompanyId = s.CompanyId
ORDER BY
s.SupplierId
) FirstChild
OUTER APPLY (SELECT TOP (1) a.AccountId
FROM
Accounts a
WHERE
p.ProductAccountReference = a.Reference
AND p.CompanyId = a.CompanyId
ORDER BY
a.AccountID
) SecondChild
The way your query is written above, there is no correlation for the derived tables. This means you would always get whatever SupplierId SQL Server chooses based on optimization, and if that row doesn't happen to be the one matching the product, you won't get the value. You need to relate the derived table to your outer table and select TOP 1; adding an ORDER BY in the derived table is like identifying the row number you want.
If it's just showing duplicate records, wouldn't an inelegant solution just be to add distinct in the select line?

Complex Query duplicating Result (same id, different columns values)

I have this query, working great:
SELECT * FROM
(
select
p.id,
comparestrings('marco', pc.value) as similarity
from
unit u, person p
inner join person_field pc ON (p.id = pc.id_person)
inner join field c ON (pc.id_field = c.id AND c.flag_name = true)
where ( u.id = 1 ) AND p.id_unit = u.id
) as subQuery
where
similarity is not null
AND
similarity > 0.35
order by
similarity desc;
Let me explain the situation.
TABLES:
person: has ID as a column.
field: a table that represents a column, like name, varchar (something like that).
person_field: represents the value of that person and that field.
unit: not relevant for this question.
E.g.:
Person id 1
Field id 1 (name, e.g.)
Value "Marco Noronha"
So the function "comparestrings" returns a double from 0 to 1, where 1 is an exact match ('Marco' == 'Marco').
So, I need all persons that have a similarity above 0.35, and I also need their similarity.
No problem, the query works fine and as it was supposed to. But now I have a new requirement: the table "person_field" will contain an alteration date, to keep track of the changes to those rows.
Eg.:
Person ID 1
Field ID 1
Value "Marco Noronha"
Date - 01/25/2013
Person ID 1
Field ID 1
Value "Marco Tulio Jacovine Noronha"
Date - 02/01/2013
So what I need to do, is consider ONLY the LATEST row!!
If I execute the same query the result would be (eg):
1, 0.8
1, 0.751121
2, 0.51212
3, 0.42454
//other results here, other 'person's
And let's suppose that the value I want to bring back is 1, 0.751121 (which is the latest value by DATE).
I think I should do something like order by date desc limit 1...
But if I do something like that, the query will return only ONE person =/
Like:
1, 0.751121
When I really want:
1, 0.751121
2, 0.51212
3, 0.42454
You can use DISTINCT ON(p.id) on the sub-query:
SELECT * FROM
(
select
DISTINCT ON(p.id)
p.id,
comparestrings('marco', pc.value) as similarity
from
unit u, person p
inner join person_field pc ON (p.id = pc.id_person)
inner join field c ON (pc.id_field = c.id AND c.flag_name = true)
where ( u.id = 1 ) AND p.id_unit = u.id
ORDER BY p.id, pc.alt_date DESC
) as subQuery
where
similarity is not null
AND
similarity > 0.35
order by
similarity desc;
Notice that, to make it work I needed to add ORDER BY p.id, pc.alt_date DESC:
p.id: required by DISTINCT ON (if you use ORDER BY, the first fields must be exactly the same as DISTINCT ON);
pc.alt_date DESC: the alter date you mentioned (we order desc, so we get the latest ones by each p.id)
By the way, seems that you don't need a sub-query at all (just make sure comparestrings is marked as stable or immutable, and it'll be fast enough):
SELECT
DISTINCT ON(p.id)
p.id,
comparestrings('marco', pc.value) as similarity
FROM
unit u, person p
inner join person_field pc ON (p.id = pc.id_person)
inner join field c ON (pc.id_field = c.id AND c.flag_name = true)
WHERE ( u.id = 1 ) AND p.id_unit = u.id
AND COALESCE(comparestrings('marco', pc.value), 0.0) > 0.35
ORDER BY p.id, pc.alt_date DESC, similarity DESC;
Change the reference to person_field to a subquery as in the following example (the subquery is the one called pc; alt_date stands in for the alteration date column, whose name the question does not give):
. . .
from unit u cross join
person p inner join
(select pc.*
from (select pc.*,
row_number() over (partition by id_person, id_field order by alt_date desc) as seqnum
from person_field pc
) pc
where seqnum = 1
) pc ON (p.id = pc.id_person)
. . .
This uses the row_number() function to identify the last row. I've used an additional subquery to limit the result just to the most recent. You could also include this in an on clause or a where clause.
I also changed the , to an explicit cross join.
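To see the "latest row per person" step in isolation, here is a minimal sketch that runs on Python's built-in sqlite3 module (SQLite 3.25 or later) instead of PostgreSQL. The column name alt_date, the ISO date format, and the second person's row are assumptions made for the example; comparestrings is left out because it is a custom function from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- alt_date is an assumed column name; ISO text dates sort chronologically.
CREATE TABLE person_field (id_person INTEGER, id_field INTEGER, value TEXT, alt_date TEXT);
INSERT INTO person_field VALUES
  (1, 1, 'Marco Noronha',                '2013-01-25'),
  (1, 1, 'Marco Tulio Jacovine Noronha', '2013-02-01'),
  (2, 1, 'Marcos Silva',                 '2013-01-10');  -- hypothetical second person
""")

query = """
SELECT id_person, value
FROM (
    SELECT pf.*,
           ROW_NUMBER() OVER (
               PARTITION BY id_person, id_field
               ORDER BY alt_date DESC   -- newest alteration first
           ) AS seqnum
    FROM person_field pf
) AS latest
WHERE seqnum = 1
ORDER BY id_person
"""

latest_values = conn.execute(query).fetchall()
print(latest_values)
```

Each person keeps exactly one value per field, so a similarity filter applied afterwards sees only the most recent name and still returns every matching person.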

SQL Query Help Part 2 - Add filter to joined tables and get max value from filter

I asked this question on SO. However, I wish to extend it further. I would like to find the max value of the 'Reading' column only where the 'state' is of value 'XX' for example.
So if I join the two tables, how do I get the row with max(Reading) value from the result set. Eg.
SELECT s.*, g1.*
FROM Schools AS s
JOIN Grades AS g1 ON g1.id_schools = s.id
WHERE s.state = 'SA' -- how do I get row with max(Reading) column from this result set
The table details are:
Table1 = Schools
Columns: id(PK), state(nvarchar(100)), schoolname
Table2 = Grades
Columns: id(PK), id_schools(FK), Year, Reading, Writing...
I'd think about using a common table expression:
WITH SchoolsInState (id, state, schoolname)
AS (
SELECT id, state, schoolname
FROM Schools
WHERE state = 'XX'
)
SELECT *
FROM SchoolsInState AS s
JOIN Grades AS g
ON s.id = g.id_schools
WHERE g.Reading = max(g.Reading)
The nice thing about this is that it creates this SchoolsInState pseudo-table which wraps all the logic about filtering by state, leaving you free to write the rest of your query without having to think about it.
I'm guessing [Reading] is some form of numeric value.
SELECT TOP (1)
s.[Id],
s.[State],
s.[SchoolName],
MAX(g.[Reading]) Reading
FROM
[Schools] s
JOIN [Grades] g on g.[id_schools] = s.[Id]
WHERE s.[State] = 'SA'
Group By
s.[Id],
s.[State],
s.[SchoolName]
Order By
MAX(g.[Reading]) DESC
UPDATE:
Looking at Tom's answer, I don't think that would work (an aggregate such as max() is not allowed directly in a WHERE clause), but here is a modified version that does.
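WITH [HighestGrade] (Reading)
AS (
-- highest Reading among schools in the state of interest only
SELECT
MAX(g.[Reading]) Reading
FROM
[Grades] g
JOIN [Schools] s ON s.[id] = g.[id_schools]
WHERE s.[state] = 'SA'
)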
SELECT
s.*,
g.*
FROM
[HighestGrade] hg
JOIN [Grades] AS g ON g.[Reading] = hg.[Reading]
JOIN [Schools] AS s ON s.[id] = g.[id_schools]
WHERE s.state = 'SA'
This CTE method should give you what you want. I also had it break down by year (grade_year in my code to avoid the reserved word). You should be able to remove that easily enough if you want to. This method also accounts for ties (you'll get both rows back if there is a tie):
;WITH MaxReadingByStateYear AS (
SELECT
S.id,
S.school_name,
S.state,
G.grade_year,
RANK() OVER(PARTITION BY S.state, G.grade_year ORDER BY Reading DESC) AS ranking
FROM
dbo.Grades G
INNER JOIN Schools S ON
S.id = G.id_schools
)
SELECT
id,
state,
school_name,
grade_year
FROM
MaxReadingByStateYear
WHERE
state = 'AL' AND
ranking = 1
One way would be this:
SELECT...
FROM...
WHERE...
AND g1.Reading = (select max(G2.Reading)
from Grades G2
inner join Schools s2
on s2.id = g2.id_schools
and s2.state = s.state)
There are certainly more.
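The correlated-subquery variant can be checked against a few rows. The sketch below assumes Python's built-in sqlite3 module in place of SQL Server, and the school names and Reading values are made up for the illustration; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Schools (id INTEGER PRIMARY KEY, state TEXT, schoolname TEXT);
CREATE TABLE Grades (id INTEGER PRIMARY KEY, id_schools INTEGER, Year INTEGER, Reading INTEGER);
-- Hypothetical sample data: two schools in SA, one in another state.
INSERT INTO Schools VALUES (1, 'SA', 'Alpha'), (2, 'SA', 'Beta'), (3, 'XX', 'Gamma');
INSERT INTO Grades VALUES
  (1, 1, 2010, 70),
  (2, 1, 2011, 85),
  (3, 2, 2010, 80),
  (4, 3, 2010, 99);  -- highest overall, but not in SA
""")

query = """
SELECT s.schoolname, g1.Year, g1.Reading
FROM Schools AS s
JOIN Grades AS g1 ON g1.id_schools = s.id
WHERE s.state = 'SA'
  AND g1.Reading = (SELECT MAX(g2.Reading)
                    FROM Grades g2
                    INNER JOIN Schools s2
                            ON s2.id = g2.id_schools
                           AND s2.state = s.state)  -- max within the same state
"""

top_rows = conn.execute(query).fetchall()
print(top_rows)
```

Because the subquery is restricted to the outer row's state, the 99 recorded in another state does not hide the best SA reading. If two SA rows tied for the maximum, both would come back.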