SQL: getting the first matched row

I have two huge database tables named "AR" and "All", and I am trying to match records in "AR" to records in "All". There is no unique identifier, so I am doing a kind of fuzzy matching on first name, last name, DOB and SSN to get the matches. My match query works.
The "All" table has a column "MID" that I want to fetch for every matched record, but when I add it to my query I get thousands of records. I searched a lot online but could not figure it out.
I am trying to get the first matched record from the "All" table, along with the corresponding "MID", for each record in my "AR" table. Can anyone help me out? My query is below:
Select distinct a.*,
r."MID"
from "public"."AR" a
inner join "public"."All" r
On ( a."cDOB" = r."cDOB"
and right(a."SSN",4) = right(r."SSN",4)
and left(a."Last Name",4) = left(r."LastName",4)
and (a."SSN"!='' or r."SSN"!='')
)
OR
( left(a."First Name",4) = left(r."FirstName",4)
and ( left(a."Last Name",4) = left(r."LastName",4)
OR right(a."Last Name",4) = right(r."LastName",4)
)
and ( right(a."SSN",4) = r."SSN"
OR a."cDOB" = r."cDOB"
)
and ( a."SSN"!=''
OR r."SSN"!=''
)
)
OR
( a."MelID (Original) " = r."Prp"
and a."cDOB" = r."cDOB"
and r."Prp"!=''
);
The query gives me the correct output if I remove r."MID" from the select list, but when I fetch r."MID" the output contains many duplicate records.

To fetch the "first" MID from All for every row in AR, you can use DISTINCT ON:
SELECT DISTINCT ON (a.undisclosed_pk_column)
a.*, r."MID"
FROM ...
...
ORDER BY a.undisclosed_pk_column, r.undisclosed_columns_defining_first;
Related:
Select first row in each GROUP BY group?
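DISTINCT ON is PostgreSQL-specific. On other engines the same "first match per row" idea is usually written with ROW_NUMBER(). Below is a minimal sketch of that variant, wrapped in Python's sqlite3 module only so that it is self-contained and runnable. The tables ar and all_people and their columns are made up for illustration, "first" is assumed to mean the lowest mid, and it needs an SQLite build with window functions (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for "AR" and "All"; not the asker's real schema.
    CREATE TABLE ar (id INTEGER PRIMARY KEY, last_name TEXT);
    CREATE TABLE all_people (mid INTEGER PRIMARY KEY, last_name TEXT);
    INSERT INTO ar VALUES (1, 'Smith'), (2, 'Jones'), (3, 'Nomatch');
    INSERT INTO all_people VALUES (10, 'Smith'), (11, 'Smithson'), (12, 'Jones');
""")

# Number the matches of each ar row, then keep only match number 1.
# The ORDER BY inside OVER() is what defines "first" (here: lowest mid).
first_matches = conn.execute("""
    SELECT id, last_name, mid
    FROM (
        SELECT a.id, a.last_name, r.mid,
               ROW_NUMBER() OVER (PARTITION BY a.id ORDER BY r.mid) AS rn
        FROM ar AS a
        JOIN all_people AS r
          ON substr(a.last_name, 1, 4) = substr(r.last_name, 1, 4)
    ) AS ranked
    WHERE rn = 1
    ORDER BY id
""").fetchall()

for row in first_matches:
    print(row)
```

'Smith' fuzzy-matches both 'Smith' and 'Smithson', but only one row per ar.id survives. Rows of ar without any match are dropped because the sketch uses an inner join.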

I think the problem is that you're doing an inner join with 3 OR conditions, so you get duplicates when a record matches on more than one of them. Try the query below, which left joins to the "All" table 3 times (once per condition) and only keeps results where at least one join matched.
Select distinct a.*,
coalesce(r."MID", r2."MID", r3."MID") as MID
from "public"."AR" a
left join "public"."All" r
On ( a."cDOB" = r."cDOB"
and right(a."SSN",4) = right(r."SSN",4)
and left(a."Last Name",4) = left(r."LastName",4)
and (a."SSN"!='' or r."SSN"!='')
)
left join "public"."All" r2
On ( left(a."First Name",4) = left(r2."FirstName",4)
and ( left(a."Last Name",4) = left(r2."LastName",4)
OR right(a."Last Name",4) = right(r2."LastName",4)
)
and ( right(a."SSN",4) = r2."SSN"
OR a."cDOB" = r2."cDOB"
)
and ( a."SSN"!=''
OR r2."SSN"!=''
)
)
left join "public"."All" r3
On ( a."MelID (Original) " = r3."Prp"
and a."cDOB" = r3."cDOB"
and r3."Prp"!=''
)
WHERE (r."MID" IS NOT NULL OR r2."MID" IS NOT NULL OR r3."MID" IS NOT NULL)
;

Related

SQL split repeating rows caused by UNION

I am writing a query to get two separate averages based on different WHERE conditions.
I tried two SELECT statements but ended up with lots of duplicates.
Now I have a UNION which works pretty well, although my two fields come back in alternating rows instead of separate columns.
Can anyone suggest a fix? Sorry for the dodgy code!
SELECT
tblSkillName.skillName,
tblTestScores.skillUID,
AVG(tblTestScores.percentage) AS `cohortPercentage`
FROM
(
(
(
tblTestScores
INNER JOIN tblUsers ON tblUsers.email = tblTestScores.email
)
INNER JOIN tblTestDetails ON tblTestScores.testDetailsID = tblTestDetails.testDetailsID
)
INNER JOIN tblSkillName ON tblSkillName.skillUID = tblTestScores.skillUID
)
WHERE
teacherGroup = '9JS2/Cp'
AND tblTestScores.testDetailsID = 1
GROUP BY
skillName
UNION ALL
SELECT
tblSkillName.skillName,
tblTestScores.skillUID,
AVG(tblTestScores.percentage) AS `groupPercentage`
FROM
(
(
(
tblTestScores
INNER JOIN tblUsers ON tblUsers.email = tblTestScores.email
)
INNER JOIN tblTestDetails ON tblTestScores.testDetailsID = tblTestDetails.testDetailsID
)
INNER JOIN tblSkillName ON tblSkillName.skillUID = tblTestScores.skillUID
)
WHERE
tblTestScores.testDetailsID = 1
GROUP BY
skillName
ORDER BY
skillUID ASC
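One common way to get two averages side by side instead of in alternating rows is conditional aggregation: a single GROUP BY with a CASE expression inside one of the AVG calls. The sketch below is only an illustration of that idea under assumptions. It uses a made-up single table scores instead of the four joined tables, and it runs through Python's sqlite3 module so it is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical flattened table; the real query joins four tables.
    CREATE TABLE scores (skill TEXT, teacher_group TEXT, percentage REAL);
    INSERT INTO scores VALUES
        ('algebra', '9JS2/Cp', 40.0),
        ('algebra', '9JS2/Cp', 60.0),
        ('algebra', 'other',   80.0),
        ('graphs',  'other',   70.0);
""")

# AVG ignores NULLs, so the CASE without ELSE restricts the first average
# to one teacher group while the second average covers every row.
averages = conn.execute("""
    SELECT skill,
           AVG(CASE WHEN teacher_group = '9JS2/Cp' THEN percentage END) AS group_avg,
           AVG(percentage) AS overall_avg
    FROM scores
    GROUP BY skill
    ORDER BY skill
""").fetchall()

for row in averages:
    print(row)
```

Each skill comes back as one row with both averages as columns. A skill with no scores in the chosen teacher group gets NULL (None in Python) in the first average.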

SQL Query from multiple tables show all records

Please advise me on what I am doing wrong with the query below.
Scenario:
The data is in three tables. One table contains the complete set of records; the other two have missing records. I want a query based on Table A (where all records are present) that shows the matching data from the other two tables, like an INDEX/MATCH in Excel. The problem is that the query filters the data correctly but shows only those records that match in all three tables. I want it to show data even if it is available in only one table.
Query I have generated so far:
SELECT primaryitemtable.itemcode,
primaryitemtable.itemdescription,
primaryitemtable.article,
itembatchlist.batchno,
Sum(tempscmstock.qty) AS SumOfQty,
Sum(stockmastertemp.qty) AS SumOfQty1,
tempscmstock.batch,
stockmastertemp.batch
FROM (stockmastertemp
INNER JOIN (tempscmstock
INNER JOIN primaryitemtable
ON tempscmstock.itemcode =
primaryitemtable.itemcode)
ON ( stockmastertemp.itemcode = primaryitemtable.itemcode )
AND ( stockmastertemp.itemcode = primaryitemtable.itemcode ))
INNER JOIN itembatchlist
ON primaryitemtable.itemcode = itembatchlist.itemcode
GROUP BY primaryitemtable.itemcode,
primaryitemtable.itemdescription,
primaryitemtable.article,
itembatchlist.batchno,
tempscmstock.batch,
stockmastertemp.batch
HAVING ( ( ( tempscmstock.batch ) = [itembatchlist] ! [batchno] )
AND ( ( stockmastertemp.batch ) = [itembatchlist] ! [batchno] ) )
ORDER BY primaryitemtable.itemcode;
Try LEFT JOIN, because INNER JOIN returns only those records that have a match in every joined table:
SELECT primaryitemtable.itemcode,
primaryitemtable.itemdescription,
primaryitemtable.article,
itembatchlist.batchno,
Sum(tempscmstock.qty) AS SumOfQty,
Sum(stockmastertemp.qty) AS SumOfQty1,
tempscmstock.batch,
stockmastertemp.batch
FROM (stockmastertemp
LEFT JOIN (tempscmstock
LEFT JOIN primaryitemtable
ON tempscmstock.itemcode = primaryitemtable.itemcode)
ON ( stockmastertemp.itemcode = primaryitemtable.itemcode )
AND ( stockmastertemp.itemcode = primaryitemtable.itemcode ))
LEFT JOIN itembatchlist
ON primaryitemtable.itemcode = itembatchlist.itemcode
GROUP BY primaryitemtable.itemcode,
primaryitemtable.itemdescription,
primaryitemtable.article,
itembatchlist.batchno,
tempscmstock.batch,
stockmastertemp.batch
HAVING ( ( ( tempscmstock.batch ) = [itembatchlist] ! [batchno] )
AND ( ( stockmastertemp.batch ) = [itembatchlist] ! [batchno] ) )
ORDER BY primaryitemtable.itemcode
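Two details matter when switching to LEFT JOIN: the complete table has to be on the left side, and any filter on the optional table has to go into the ON clause, because a WHERE or HAVING condition on its columns discards the NULL-extended rows again. Here is a minimal sketch of that behaviour, assuming two made-up tables items and stock rather than the real schema, run through Python's sqlite3 module so it is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables: items is complete, stock has gaps.
    CREATE TABLE items (itemcode TEXT PRIMARY KEY);
    CREATE TABLE stock (itemcode TEXT, batch TEXT, qty INTEGER);
    INSERT INTO items VALUES ('A'), ('B'), ('C');
    INSERT INTO stock VALUES ('A', 'b1', 5), ('A', 'b1', 7), ('B', 'b2', 3);
""")

def totals(join_and_filter):
    """Sum stock per item using the given join (and optional WHERE) clause."""
    sql = f"""
        SELECT i.itemcode, SUM(s.qty)
        FROM items AS i
        {join_and_filter}
        GROUP BY i.itemcode
        ORDER BY i.itemcode
    """
    return conn.execute(sql).fetchall()

# Only items that have stock rows.
inner_rows = totals("INNER JOIN stock AS s ON s.itemcode = i.itemcode")
# Every item; missing stock shows up as NULL.
left_rows = totals("LEFT JOIN stock AS s ON s.itemcode = i.itemcode")
# Filter in ON: every item is kept, only matching batches are summed.
on_filter_rows = totals(
    "LEFT JOIN stock AS s ON s.itemcode = i.itemcode AND s.batch = 'b1'")
# Filter in WHERE: rows without a matching batch are removed again.
where_filter_rows = totals(
    "LEFT JOIN stock AS s ON s.itemcode = i.itemcode WHERE s.batch = 'b1'")
```

Comparing left_rows with where_filter_rows shows why a batch comparison in HAVING can make a LEFT JOIN behave like an INNER JOIN.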

Want to get only 1 record back from an inner join that can pass back multiple records

I have the following SQL query:
SELECT *
FROM My_TABL wr
INNER JOIN His_TABL pk ON (wr.Company = pk.company AND wr.NUMBER = pk.number)
WHERE wr.NUMBER = 'L00499233'
AND wr.S_CODE IN ('in', 'ji', 'je')
I want to get back 1 record but found out that it can pass back multiple records because a record could have more than 1 field with 'in', 'ji' and 'je'
How can I just pick the first one? Thanks.
If the goal is to join to the top 1 match on the join (ultimately returning several rows), use an OUTER APPLY:
SELECT *
FROM My_TABL wr
OUTER APPLY ( SELECT TOP 1 * FROM His_TABL pk WHERE wr.Company = pk.company
AND wr.NUMBER = pk.number ) AS pk2
WHERE wr.NUMBER = 'L00499233'
AND wr.S_CODE IN ( 'in', 'ji', 'je' );
However, if the goal is to return only a single row in your result set, use Stuart's suggestion.
If it doesn't matter which row you want, you can use TOP 1:
select TOP 1 * from My_TABL wr
inner join His_TABL pk on (wr.Company = pk.company and wr.NUMBER = pk.number)
where wr.NUMBER = 'L00499233' and wr.S_CODE in ('in', 'ji', 'je')
Note that you should get out of the habit of using SELECT * and be more precise about the rows and columns you want to retrieve.

How to increment a column based on two tables that are joined

I am trying to increment a column on a SQL Server table based on the join between the initial table and the joined table. The idea is to update tblForm10Objectives, setting the ObjectiveNumber column to an incrementing number starting at 1, based on the rows returned from the join of tblForm10GoalsObjectives and tblForm10Objectives where ID_Form10Goal equals a given number. Example query so far:
Update tblForm10Objectives
Set ObjectiveNumber = rn
From (
Select ROW_NUMBER() over (PARTITION by OG.ID_Form10Goal) as rn
, *
From (
Select *
From tblForm10GoalsObjectives OG
Join tblForm10Objectives O On OG.ID_Form10Objective = O.ID_Form10Objective
Where OG.ID_Form10Goal = 4
Order by O.ID_Form10Objective
) as tblForm10Objectives;
If the select portion of the query is performed the columns are displayed so you can see the ObjectiveNumber is currently 0 where ID_Form10Goal = 4
Once the update runs I need for the ObjectiveNumber to show 1 , 2; since there are two rows for ID_Form10Goal = 4.
I had to introduce a new table to the logic of this update statement; the table name is tblForm10Goals. The objectives now need to be pulled by ID_Agency instead of ID_Form10Goal. I am getting an error message stating that the multi-part identifier 'dbo.tblForm10Objectives.ID_Form10Objective = rns.ID_Form10Objective' could not be bound. I am using the following SQL update statement:
UPDATE dbo.tblForm10Objectives
SET ObjectiveNumber = rn
FROM tblForm10Goals As g
Left Join tblForm10GoalsObjectives gobs ON g.ID_Form10Goal = gobs.ID_Form10Goal
Right Join
(
SELECT
ROW_NUMBER() OVER (PARTITION BY g.ID_Agency
ORDER BY OB.ID_Form10Objective) AS rn,
OB.ID_Form10Objective
FROM tblForm10Goals g
LEFT JOIN dbo.tblForm10GoalsObjectives gobs ON g.ID_Form10Goal = gobs.ID_Form10Goal
RIGHT JOIN dbo.tblForm10Objectives OB ON gobs.ID_Form10Objective = OB.ID_Form10Objective
Where g.ID_Agency = 2
) rns ON dbo.tblForm10Objectives.ID_Form10Object = rns.ID_Form10Objective
Your example seems to be missing a closing parenthesis somewhere, and without the table structures to look at, I can't be certain of my answer. It seems you have two tables:
tblForm10Objectives
-------------------
ID_Form10Objective
ObjectiveNumber
...
and
tblForm10GoalsObjectives
------------------------
ID_Form10Goal
ID_Form10Objective
...
If this is the case, the following query should give you the results you desire:
UPDATE dbo.tblForm10Objectives
SET ObjectiveNumber = rn
FROM dbo.tblForm10Objectives INNER JOIN
(
SELECT
ROW_NUMBER() OVER (PARTITION BY OG.ID_Form10Goal
ORDER BY O.ID_Form10Objective) AS rn,
O.ID_Form10Objective
FROM dbo.tblForm10Objectives O INNER JOIN
dbo.tblForm10GoalsObjectives OG ON OG.ID_Form10Objective = O.ID_Form10Objective
Where OG.ID_Form10Goal = 4
) rns ON dbo.tblForm10Objectives.ID_Form10Objective = rns.ID_Form10Objective
If you run the inner SELECT statement, you will see the desired ObjectiveNumber values and the corresponding ID_Form10Objective that will get updated with those values.
If you post your table structures, I or someone else may be able to be of more help.
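To see the numbering idea in isolation, here is a small self-contained sketch run through Python's sqlite3 module. The table and column names are shortened, made-up stand-ins for the ones in the question. Since the T-SQL form UPDATE ... FROM ... JOIN is not available in every engine, the sketch uses a correlated subquery instead, so it shows the same technique in a different syntax rather than the query above. It needs SQLite 3.25 or later for ROW_NUMBER().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, shortened versions of the two tables.
    CREATE TABLE objectives (
        objective_id INTEGER PRIMARY KEY,
        objective_number INTEGER DEFAULT 0
    );
    CREATE TABLE goals_objectives (goal_id INTEGER, objective_id INTEGER);
    INSERT INTO objectives (objective_id) VALUES (7), (8), (9);
    INSERT INTO goals_objectives VALUES (4, 7), (4, 9), (5, 8);
""")

# Number the objectives of goal 4 in objective_id order, then copy each
# row number onto the matching objectives row. The outer WHERE keeps
# objectives of other goals untouched.
conn.execute("""
    UPDATE objectives
    SET objective_number = (
        SELECT numbered.rn
        FROM (
            SELECT g.objective_id,
                   ROW_NUMBER() OVER (PARTITION BY g.goal_id
                                      ORDER BY g.objective_id) AS rn
            FROM goals_objectives AS g
            WHERE g.goal_id = 4
        ) AS numbered
        WHERE numbered.objective_id = objectives.objective_id
    )
    WHERE objective_id IN (
        SELECT objective_id FROM goals_objectives WHERE goal_id = 4
    )
""")

numbered_rows = conn.execute(
    "SELECT objective_id, objective_number FROM objectives ORDER BY objective_id"
).fetchall()
print(numbered_rows)
```

The two objectives linked to goal 4 end up numbered 1 and 2, and the objective linked to goal 5 keeps its original 0.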

Limit join to one row

I have the following query:
SELECT sum((select count(*) as itemCount) * "SalesOrderItems"."price") as amount, 'rma' as
"creditType", "Clients"."company" as "client", "Clients".id as "ClientId", "Rmas".*
FROM "Rmas" JOIN "EsnsRmas" on("EsnsRmas"."RmaId" = "Rmas"."id")
JOIN "Esns" on ("Esns".id = "EsnsRmas"."EsnId")
JOIN "EsnsSalesOrderItems" on("EsnsSalesOrderItems"."EsnId" = "Esns"."id" )
JOIN "SalesOrderItems" on("SalesOrderItems"."id" = "EsnsSalesOrderItems"."SalesOrderItemId")
JOIN "Clients" on("Clients"."id" = "Rmas"."ClientId" )
WHERE "Rmas"."credited"=false AND "Rmas"."verifyStatus" IS NOT null
GROUP BY "Clients".id, "Rmas".id;
The problem is that the table "EsnsSalesOrderItems" can have the same EsnId in different entries. I want to restrict the query to only pull the last entry in "EsnsSalesOrderItems" that has the same "EsnId".
By "last" entry I mean the following:
The one that appears last in the table "EsnsSalesOrderItems". So for example if "EsnsSalesOrderItems" has two entries with "EsnId" = 6 and "createdAt" = '2012-06-19' and '2012-07-19' respectively it should only give me the entry from '2012-07-19'.
SELECT (count(*) * sum(s."price")) AS amount
, 'rma' AS "creditType"
, c."company" AS "client"
, c.id AS "ClientId"
, r.*
FROM "Rmas" r
JOIN "EsnsRmas" er ON er."RmaId" = r."id"
JOIN "Esns" e ON e.id = er."EsnId"
JOIN (
SELECT DISTINCT ON ("EsnId") *
FROM "EsnsSalesOrderItems"
ORDER BY "EsnId", "createdAt" DESC
) es ON es."EsnId" = e."id"
JOIN "SalesOrderItems" s ON s."id" = es."SalesOrderItemId"
JOIN "Clients" c ON c."id" = r."ClientId"
WHERE r."credited" = FALSE
AND r."verifyStatus" IS NOT NULL
GROUP BY c.id, r.id;
Your query in the question has an illegal aggregate over another aggregate:
sum((select count(*) as itemCount) * "SalesOrderItems"."price") as amount
Simplified and converted to legal syntax:
(count(*) * sum(s."price")) AS amount
But do you really want to multiply with the count per group?
I retrieve the single row per group in "EsnsSalesOrderItems" with DISTINCT ON. Detailed explanation:
Select first row in each GROUP BY group?
I also added table aliases and formatting to make the query easier to parse for human eyes. If you could avoid camel case you could get rid of all the double quotes clouding the view.
Something like:
join (
select "EsnId", "SalesOrderItemId",
row_number() over (partition by "EsnId" order by "createdAt" desc) as rn
from "EsnsSalesOrderItems"
) t ON t."EsnId" = "Esns"."id" and rn = 1
This will select the latest row per "EsnId" from "EsnsSalesOrderItems", based on the column "createdAt". You can use any column that allows you to define an order on the rows that suits you.
But remember that the concept of the "last row" is only valid if you specify an order for the rows. A table as such is not ordered, nor is the result of a query unless you specify an ORDER BY.
Necromancing because the answers are outdated.
Take advantage of the LATERAL keyword introduced in PG 9.3
left | right | inner JOIN LATERAL
I'll explain with an example:
Assuming you have a table "Contacts".
Now contacts have organisational units.
They can have one OU at a point in time, but N OUs at N points in time.
Now, if you have to query contacts and OU in a time period (not a reporting date, but a date range), you could N-fold increase the record count if you just did a left join.
So, to display the OU, you need to just join the first OU for each contact (where what shall be first is an arbitrary criterion - when taking the last value, for example, that is just another way of saying the first value when sorted by descending date order).
In SQL Server, you would use CROSS APPLY (or rather OUTER APPLY, since we need a left join), which evaluates the subquery or table-valued function once for each row it has to join.
SELECT * FROM T_Contacts
--LEFT JOIN T_MAP_Contacts_Ref_OrganisationalUnit ON MAP_CTCOU_CT_UID = T_Contacts.CT_UID AND MAP_CTCOU_SoftDeleteStatus = 1
--WHERE T_MAP_Contacts_Ref_OrganisationalUnit.MAP_CTCOU_UID IS NULL -- 989
-- CROSS APPLY -- = INNER JOIN
OUTER APPLY -- = LEFT JOIN
(
SELECT TOP 1
--MAP_CTCOU_UID
MAP_CTCOU_CT_UID
,MAP_CTCOU_COU_UID
,MAP_CTCOU_DateFrom
,MAP_CTCOU_DateTo
FROM T_MAP_Contacts_Ref_OrganisationalUnit
WHERE MAP_CTCOU_SoftDeleteStatus = 1
AND MAP_CTCOU_CT_UID = T_Contacts.CT_UID
/*
AND
(
(@in_DateFrom <= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateTo)
AND
(@in_DateTo >= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateFrom)
)
*/
ORDER BY MAP_CTCOU_DateFrom
) AS FirstOE
In PostgreSQL, starting from version 9.3, you can do that, too - just use the LATERAL keyword to achieve the same:
SELECT * FROM T_Contacts
--LEFT JOIN T_MAP_Contacts_Ref_OrganisationalUnit ON MAP_CTCOU_CT_UID = T_Contacts.CT_UID AND MAP_CTCOU_SoftDeleteStatus = 1
--WHERE T_MAP_Contacts_Ref_OrganisationalUnit.MAP_CTCOU_UID IS NULL -- 989
LEFT JOIN LATERAL
(
SELECT
--MAP_CTCOU_UID
MAP_CTCOU_CT_UID
,MAP_CTCOU_COU_UID
,MAP_CTCOU_DateFrom
,MAP_CTCOU_DateTo
FROM T_MAP_Contacts_Ref_OrganisationalUnit
WHERE MAP_CTCOU_SoftDeleteStatus = 1
AND MAP_CTCOU_CT_UID = T_Contacts.CT_UID
/*
AND
(
(__in_DateFrom <= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateTo)
AND
(__in_DateTo >= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateFrom)
)
*/
ORDER BY MAP_CTCOU_DateFrom
LIMIT 1
) AS FirstOE
Try using a subquery in your ON clause. An abstract example:
SELECT
*
FROM table1
JOIN table2 ON table2.id = (
SELECT id FROM table2 WHERE table2.table1_id = table1.id LIMIT 1
)
WHERE
...
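A LIMIT 1 without ORDER BY returns an arbitrary matching row, so if "last" matters the subquery needs an explicit order. Here is a minimal runnable sketch of the subquery-in-ON approach with that added, run through Python's sqlite3 module. The created_at column and the sample data are assumptions for illustration and are not part of the abstract example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- created_at is a hypothetical column used to define "last".
    CREATE TABLE table1 (id INTEGER PRIMARY KEY);
    CREATE TABLE table2 (
        id INTEGER PRIMARY KEY,
        table1_id INTEGER,
        created_at TEXT
    );
    INSERT INTO table1 VALUES (1), (2);
    INSERT INTO table2 VALUES
        (10, 1, '2012-06-19'),
        (11, 1, '2012-07-19'),
        (12, 2, '2012-05-01');
""")

# For each table1 row, the correlated subquery picks the id of its newest
# table2 row; the join then matches exactly that one row.
latest_rows = conn.execute("""
    SELECT t1.id, t2.id, t2.created_at
    FROM table1 AS t1
    JOIN table2 AS t2
      ON t2.id = (
          SELECT inner_t2.id
          FROM table2 AS inner_t2
          WHERE inner_t2.table1_id = t1.id
          ORDER BY inner_t2.created_at DESC, inner_t2.id DESC
          LIMIT 1
      )
    ORDER BY t1.id
""").fetchall()

for row in latest_rows:
    print(row)
```

The second sort key (id DESC) is only a tie-breaker so the result stays deterministic when two rows share the same created_at.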