Join to grab only non-matching records in SQL

I have some data that I'm trying to clean so I can run further analysis on it. One of the columns in the table is called record_type, and it can be either NEW or DEL. A NEW record might be added first, and then a DEL record could come in later to say that the record has expired (NEW and DEL records are matched on record_id). However, both the NEW and the DEL record stay in the data; neither one gets deleted.
So what I had planned to do is to create two CTEs, one for DEL records only and one for NEW records only:
WITH deleted_rec AS(
SELECT *
FROM main_table
WHERE record_type = 'DEL'
)
, new_rec AS(
SELECT *
FROM main_table
WHERE record_type = 'NEW'
)
Then I would outer join the two CTEs on the record_id column:
SELECT *
FROM new_rec
FULL OUTER JOIN deleted_rec ON deleted_rec.record_id = new_rec.record_id
The goal was for the output to include only records that haven't had a DEL record come in for their record_id. That way I can guarantee that none of the NEW records in my final table ever had a DEL record come in for them, so they are all active. However, I had forgotten that FULL OUTER JOIN returns everything rather than just the rows that didn't match. Is there a way around this to get my desired output?

I would just use a single query with NOT EXISTS logic:
SELECT *
FROM main_table t1
WHERE t1.record_type = 'NEW' AND
NOT EXISTS (SELECT 1 FROM main_table t2
WHERE t2.record_id = t1.record_id AND t2.record_type = 'DEL');
In plain English, the query finds all NEW records for which no other record with the same record_id has record_type DEL.
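A quick way to check the NOT EXISTS idea is to run it against a tiny in-memory SQLite table. The sketch below is an illustration under assumptions, not the asker's setup: the sample rows are made up, and it matches on record_id. It also shows that the question's own CTEs work once the FULL OUTER JOIN is replaced by a LEFT JOIN plus an IS NULL filter, which is the other common way to write an anti-join.

```python
import sqlite3

# Hypothetical sample data: record 1 was added and later expired,
# record 2 was only ever added, so only record 2 is still active.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE main_table (record_id INTEGER, record_type TEXT)")
conn.executemany(
    "INSERT INTO main_table VALUES (?, ?)",
    [(1, "NEW"), (1, "DEL"), (2, "NEW")],
)

# Anti-join written with NOT EXISTS.
not_exists_sql = """
SELECT t1.record_id
FROM main_table t1
WHERE t1.record_type = 'NEW'
  AND NOT EXISTS (SELECT 1 FROM main_table t2
                  WHERE t2.record_id = t1.record_id
                    AND t2.record_type = 'DEL')
ORDER BY t1.record_id
"""

# Same anti-join built from the question's CTEs: the LEFT JOIN keeps every
# NEW row, and the IS NULL filter keeps only rows with no DEL partner.
left_join_sql = """
WITH deleted_rec AS (SELECT * FROM main_table WHERE record_type = 'DEL'),
     new_rec     AS (SELECT * FROM main_table WHERE record_type = 'NEW')
SELECT new_rec.record_id
FROM new_rec
LEFT JOIN deleted_rec ON deleted_rec.record_id = new_rec.record_id
WHERE deleted_rec.record_id IS NULL
ORDER BY new_rec.record_id
"""

active_not_exists = [row[0] for row in conn.execute(not_exists_sql)]
active_left_join = [row[0] for row in conn.execute(left_join_sql)]
print(active_not_exists, active_left_join)  # [2] [2]
```

Both forms return the same rows here. Which one is faster depends on the database and its indexes.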

Related

Query with Left outer join and group by returning duplicates

To begin with, I have a table in my db that is fed with SalesForce info. When I run this example query it returns 2 rows:
select * from SalesForce_INT_Account__c where ID_SAP_BAYER__c = '3783513'
When I run this next query on the same table I obtain one of the rows, which is what I need:
SELECT MAX(ID_SAP_BAYER__c) FROM SalesForce_INT_Account__c where ID_SAP_BAYER__c = '3783513' GROUP BY ID_SAP_BAYER__c
Now, I have another table (PedidosEspecialesZarateCabeceras) which has a field (NroClienteDireccionEntrega) that I can match with the field I've been using in the SalesForce table (ID_SAP_BAYER__c). This table has a key that consists of just 1 field (NroPedido).
What I need to do is join these 2 tables to obtain a row from PedidosEspecialesZarateCabeceras with additional fields coming from the SalesForce table. If those additional fields are not available, they should come back as NULL values, so I'm using a LEFT OUTER JOIN.
The problem is that I have to match NroClienteDireccionEntrega to ID_SAP_BAYER__c, and there are 2 rows in the SalesForce table with the same ID_SAP_BAYER__c. Because of that, my query returns 2 duplicate rows from PedidosEspecialesZarateCabeceras (both with the same NroPedido).
This is an example query that returns duplicates:
SELECT
cab.CUIT AS CUIT,
convert(nvarchar(4000), cab.NroPedido) AS NroPedido,
sales.BillingCity__c as Localidad,
sales.BillingState__c as IdProvincia,
sales.BillingState__c_Desc as Provincia,
sales.BillingStreet__c as Calle,
sales.Billing_Department__c as Distrito,
sales.Name as RazonSocial,
cab.NroCliente as ClienteId
FROM PedidosEspecialesZarateCabeceras AS cab WITH (NOLOCK)
LEFT OUTER JOIN
SalesForce_INT_Account__c AS sales WITH (NOLOCK) ON
cab.NroClienteDireccionEntrega = sales.ID_SAP_BAYER__c
and sales.ID_SAP_BAYER__c in
( SELECT MAX(ID_SAP_BAYER__c)
FROM SalesForce_INT_Account__c
GROUP BY ID_SAP_BAYER__c
)
WHERE cab.NroPedido ='5320'
Even though the join uses MAX and GROUP BY, this returns 2 duplicate rows with different SalesForce information (because of the 2 SalesForce rows with the same ID_SAP_BAYER__c), which I thought should not be possible.
What I need is for the left outer join in my query to pick only ONE of the SalesForce rows, to prevent the duplication that is happening right now. For some reason the SELECT MAX with the GROUP BY is not working.
Maybe I should join these tables in a different way. Can anyone give me some other ideas on how to join the two tables so that the query returns just 1 row? It doesn't matter if the SalesForce row that gets picked out of the 2 isn't the correct one; I just need it to pick one of them.
Your IN clause is not actually doing anything, since...
SELECT MAX(ID_SAP_BAYER__c)
FROM SalesForce_INT_Account__c
GROUP BY ID_SAP_BAYER__c
... returns every distinct ID_SAP_BAYER__c value. (The GROUP BY says you want one row per unique ID_SAP_BAYER__c. Each group then contains exactly one distinct value, so MAX simply returns that value.)
You will want to change your query to operate on a value that is actually different between the two rows you are trying to differentiate (probably the MAX(ID) for the relevant ID_SAP_BAYER__c). Plus, you will want to link that inner query to your outer query.
You could probably do something like:
...
LEFT OUTER JOIN
SalesForce_INT_Account__c sales
ON cab.NroClienteDireccionEntrega = sales.ID_SAP_BAYER__c
and sales.ID in
(
SELECT MAX(ID)
FROM SalesForce_INT_Account__c sales2
WHERE sales2.ID_SAP_BAYER__c = cab.NroClienteDireccionEntrega
)
WHERE cab.NroPedido ='5320'
By using sales.ID in ... SELECT MAX(ID) ... instead of sales.ID_SAP_BAYER__c in ... SELECT MAX(ID_SAP_BAYER__c) ... this ensures you only match one of the two rows for that ID_SAP_BAYER__c. The WHERE sales2.ID_SAP_BAYER__c = cab.NroClienteDireccionEntrega condition links the inner query to the outer query.
There are multiple ways of doing the above, especially if you don't care which of the relevant rows you match on. You can use the above as a starting point and make it match your preferred style.
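The sketch below reproduces the duplicate with made-up data in an in-memory SQLite database, first with the question's IN (SELECT MAX(ID_SAP_BAYER__c) ... GROUP BY ...) filter and then with the correlated MAX(ID) filter. It assumes, as the answer does, that the SalesForce table has a unique ID column. SQL Server specifics such as WITH (NOLOCK) are left out because SQLite does not support them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SalesForce_INT_Account__c (ID INTEGER, ID_SAP_BAYER__c TEXT, Name TEXT);
CREATE TABLE PedidosEspecialesZarateCabeceras (NroPedido TEXT, NroClienteDireccionEntrega TEXT);
-- Hypothetical data: two SalesForce rows share one ID_SAP_BAYER__c.
INSERT INTO SalesForce_INT_Account__c VALUES (1, '3783513', 'Old name'), (2, '3783513', 'New name');
INSERT INTO PedidosEspecialesZarateCabeceras VALUES ('5320', '3783513');
""")

# The question's filter: the subquery returns every distinct key, so both rows still match.
dup_sql = """
SELECT cab.NroPedido, sales.Name
FROM PedidosEspecialesZarateCabeceras AS cab
LEFT OUTER JOIN SalesForce_INT_Account__c AS sales
  ON cab.NroClienteDireccionEntrega = sales.ID_SAP_BAYER__c
 AND sales.ID_SAP_BAYER__c IN (SELECT MAX(ID_SAP_BAYER__c)
                               FROM SalesForce_INT_Account__c
                               GROUP BY ID_SAP_BAYER__c)
WHERE cab.NroPedido = '5320'
"""

# The answer's filter: a correlated MAX(ID) lets only one SalesForce row through.
dedup_sql = """
SELECT cab.NroPedido, sales.Name
FROM PedidosEspecialesZarateCabeceras AS cab
LEFT OUTER JOIN SalesForce_INT_Account__c AS sales
  ON cab.NroClienteDireccionEntrega = sales.ID_SAP_BAYER__c
 AND sales.ID IN (SELECT MAX(sales2.ID)
                  FROM SalesForce_INT_Account__c AS sales2
                  WHERE sales2.ID_SAP_BAYER__c = cab.NroClienteDireccionEntrega)
WHERE cab.NroPedido = '5320'
"""

duplicates = conn.execute(dup_sql).fetchall()
deduplicated = conn.execute(dedup_sql).fetchall()
print(len(duplicates), deduplicated)  # 2 [('5320', 'New name')]
```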
An alternative might be to use OUTER APPLY with TOP 1. Something like:
SELECT
...
FROM PedidosEspecialesZarateCabeceras AS cab
OUTER APPLY(
SELECT TOP 1 *
FROM SalesForce_INT_Account__c s1
WHERE cab.NroClienteDireccionEntrega = s1.ID_SAP_BAYER__c
) sales
WHERE cab.NroPedido ='5320'
Without an ORDER BY, the row that TOP 1 chooses is arbitrary, but I think that's what you want anyway. (If not, you could add an ORDER BY.)

sql join: Detect changes

Periodically, I want to compare a global sql table (called "resource") with a local backup one (called "region_db") to see if a field has been changed. The field I'm monitoring this way is called "state", and the primary key is called "id". Currently I'm doing
SELECT id, state FROM resource
Then I manually go through the resulting rows in a loop. For each (id, state) tuple, I run
SELECT state FROM region_db WHERE id = ?  -- ? is the id of the current tuple
And check if the state from the local region_db matches the one from the global resource db. I'm able to detect two cases this way: 1) when a new id is added to resource, and 2) when the state of an existing row changes.
However, I'm missing the case where a row is deleted from the resource table.
I'm thinking about using JOINs but not sure about how to efficiently distinguish between the three cases (modify existing, add new, and delete row from resource table) while minimizing the number of JOINs / DB operations.
You can use full join:
select coalesce(r.id, reg.id) as id,
(case when r.id is null then 'DELETED'
when reg.id is null then 'CREATED'
else 'UPDATED'
end) as change_type
from resource r full join
region_db reg
on r.id = reg.id
where r.id is null or reg.id is null or r.state <> reg.state; -- something changed
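To see the three cases side by side, here is a small SQLite sketch with invented rows. SQLite only added FULL OUTER JOIN in version 3.39.0, so to stay portable across Python builds this version emulates the full join with two LEFT JOINs combined by UNION ALL. The classification logic is the same as in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE resource  (id INTEGER PRIMARY KEY, state TEXT);
CREATE TABLE region_db (id INTEGER PRIMARY KEY, state TEXT);
-- Invented data: id 1 unchanged, id 2 changed, id 3 new in resource,
-- id 4 deleted from resource but still in the backup.
INSERT INTO resource  VALUES (1, 'up'), (2, 'down'), (3, 'up');
INSERT INTO region_db VALUES (1, 'up'), (2, 'up'), (4, 'up');
""")

# First half: resource rows that are new or whose state differs.
# Second half: backup rows with no partner in resource (deleted).
changes_sql = """
SELECT r.id AS id,
       CASE WHEN reg.id IS NULL THEN 'CREATED' ELSE 'UPDATED' END AS change_type
FROM resource r LEFT JOIN region_db reg ON r.id = reg.id
WHERE reg.id IS NULL OR r.state <> reg.state
UNION ALL
SELECT reg.id, 'DELETED'
FROM region_db reg LEFT JOIN resource r ON r.id = reg.id
WHERE r.id IS NULL
ORDER BY 1
"""
changes = conn.execute(changes_sql).fetchall()
print(changes)  # [(2, 'UPDATED'), (3, 'CREATED'), (4, 'DELETED')]
```

One database round trip covers all three cases, which is the point of the full-join approach.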
WITH joined AS (
SELECT
resource.id AS id,
region_db.state AS region_state,
resource.state AS global_state
FROM
resource
INNER JOIN
region_db
ON
resource.id = region_db.id
) SELECT * FROM joined WHERE region_state <> global_state;
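This query returns the rows whose state has changed. If you use a LEFT JOIN instead of an INNER JOIN in the CTE, you will also get records that were added to resource but have not been backed up to region_db yet. Likewise, with a RIGHT JOIN you would get records that were deleted from resource but whose deletion has not been propagated yet. In both cases the WHERE clause must also keep the unmatched rows (OR region_state IS NULL for the LEFT JOIN, OR global_state IS NULL for the RIGHT JOIN), because a <> comparison against NULL is never true.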
Hopefully this helps.
You could use UNION ALL to find the differences between the tables. After the GROUP BY, a count(*) of 1 means that (id, state) pair appears in only one of the two tables, i.e. the rows don't match.
SELECT id,state
FROM (
SELECT id, state FROM resource
UNION ALL
SELECT id,state FROM region_db
) tbl
GROUP BY id, state
HAVING count(*) = 1
ORDER BY id;
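The sketch below runs the UNION ALL query on invented data in SQLite. It orders by id and state (the answer orders by id only) so the output is deterministic. Note the shape of the result: a changed row shows up twice, once with each state, while added and deleted rows show up once. So this approach tells you which ids differ, but not which table each row came from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE resource  (id INTEGER PRIMARY KEY, state TEXT);
CREATE TABLE region_db (id INTEGER PRIMARY KEY, state TEXT);
-- Invented data: id 1 unchanged, id 2 changed, id 3 added, id 4 deleted.
INSERT INTO resource  VALUES (1, 'up'), (2, 'down'), (3, 'up');
INSERT INTO region_db VALUES (1, 'up'), (2, 'up'), (4, 'up');
""")

# Identical (id, state) pairs appear twice and are dropped by HAVING count(*) = 1.
diff_sql = """
SELECT id, state
FROM (
    SELECT id, state FROM resource
    UNION ALL
    SELECT id, state FROM region_db
) tbl
GROUP BY id, state
HAVING count(*) = 1
ORDER BY id, state
"""
differences = conn.execute(diff_sql).fetchall()
print(differences)  # [(2, 'down'), (2, 'up'), (3, 'up'), (4, 'up')]
```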

Number of Records don't match when Joining three tables

Despite going through all the material I could possibly find on the internet, I haven't been able to solve this issue myself. I am new to MS Access and would really appreciate any pointers.
Here's my problem - I have three tables
Source1084 with columns - Department, Sub-Dept, Entity, Account, +few more
R12CAOmappingTable with columns - Account, R12_Account
Table4 with columns - R12_Account, Department, Sub-Dept, Entity, New Dept, LOB +few more
I have a total of 1084 records in Source and the result table must also contain 1084 records. I need to draw a table with all the columns from Source + R12_account from R12CAOmappingTable + all columns from Table4.
Here is the query I wrote. It yields the right columns, but depending on which join options I try, it returns more or fewer records than expected.
SELECT rmt.r12_account,
srb.version,
srb.fy,
srb.joblevel,
srb.scenario,
srb.department,
srb.[sub-department],
srb.[job function],
srb.entity,
srb.employee,
table4.lob,
table4.product,
table4.newacct,
table4.newdept,
srb.[beg balance],
srb.jan,
srb.feb,
srb.mar,
srb.apr,
srb.may,
srb.jun,
srb.jul,
srb.aug,
srb.sep,
srb.oct,
srb.nov,
srb.dec,
rmt.r12_account
FROM (source1084 AS srb
LEFT JOIN r12caomappingtable AS rmt
ON srb.account = rmt.account)
LEFT JOIN table4
ON ( srb.department = table4.dept )
AND ( srb.[sub-department] = table4.subdept )
AND ( srb.entity = table4.entity )
WHERE ( ( ( srb.[sub-department] ) = table4.subdept )
AND ( ( srb.entity ) = table4.entity )
AND ( ( rmt.r12_account ) = table4.r12_account ) );
In this simple example, Table1 contains 3 rows with unique fld1 values. Table2 contains one row, and the fld1 value in that row matches one of those in Table1. Therefore this query returns 3 rows.
SELECT *
FROM
Table1 AS t1
LEFT JOIN Table2 AS t2
ON t1.fld1 = t2.fld1;
However if I add the WHERE clause as below, that version of the query returns only one row --- the row where the fld1 values match.
SELECT *
FROM
Table1 AS t1
LEFT JOIN Table2 AS t2
ON t1.fld1 = t2.fld1
WHERE t1.fld1 = t2.fld1;
In other words, that WHERE clause counteracts the LEFT JOIN because it excludes rows where t2.fld1 is Null. If that makes sense, notice that the second query is functionally equivalent to this ...
SELECT *
FROM
Table1 AS t1
INNER JOIN Table2 AS t2
ON t1.fld1 = t2.fld1;
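The Table1/Table2 behaviour is easy to confirm. This sketch uses Python's built-in sqlite3 module instead of Access (the join syntax is the same for this two-table case), with made-up values 1, 2, 3 in Table1 and 2 in Table2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (fld1 INTEGER);
CREATE TABLE Table2 (fld1 INTEGER);
INSERT INTO Table1 VALUES (1), (2), (3);
INSERT INTO Table2 VALUES (2);
""")

def row_count(sql):
    """Return the single COUNT(*) value produced by sql."""
    return conn.execute(sql).fetchone()[0]

base = "SELECT COUNT(*) FROM Table1 AS t1 {join} Table2 AS t2 ON t1.fld1 = t2.fld1"
left_join = row_count(base.format(join="LEFT JOIN"))
# The WHERE clause throws away the rows where t2.fld1 is NULL...
left_join_where = row_count(base.format(join="LEFT JOIN") + " WHERE t1.fld1 = t2.fld1")
# ...which leaves exactly what an INNER JOIN returns.
inner_join = row_count(base.format(join="INNER JOIN"))
print(left_join, left_join_where, inner_join)  # 3 1 1
```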
Your situation is similar. I suggest you first eliminate the WHERE clause and confirm this query returns at least your expected 1084 rows.
SELECT Count(*) AS CountOfRows
FROM (source1084 AS srb
LEFT JOIN r12caomappingtable AS rmt
ON srb.account = rmt.account)
LEFT JOIN table4
ON ( srb.department = table4.dept )
AND ( srb.[sub-department] = table4.subdept )
AND ( srb.entity = table4.entity );
After you get the query returning the correct number of rows, you can alter the SELECT list to return the columns you want. But the columns aren't really the issue until you can get the correct rows.
Without knowing your table values, it is hard to give a complete answer. Based on how you described it, the problem is more than likely caused by the types of joins you are using.
The best way I have found to understand which type of join to use is to look at a Venn diagram of the different join types.
Jeff Atwood also has a really good explanation of SQL joins on his site that uses the same approach.
Best to just use the query builder. Drop in your main table and choose the columns you want. Then, for each of the other lookup values, drop in the other table, draw the join line(s), double-click the line and choose a left join. You can do this for 2 or 30 columns that need to "grab" or look up values from other tables. The number of ORIGINAL rows returned from the base table should then remain the same, provided each lookup table has at most one matching row per join key (duplicate keys in a lookup table will multiply rows).
So just use the query builder and follow the above.
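A LEFT JOIN keeps every base row, but it can also add rows: if a lookup table has more than one row for the same key, the base row is repeated once per match. This SQLite sketch uses two hypothetical tables, source and mapping, as stand-ins for Source1084 and R12CAOmappingTable. It ends with a GROUP BY ... HAVING query, which is one way to find the duplicated keys in the mapping table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source  (account TEXT);
CREATE TABLE mapping (account TEXT, r12_account TEXT);
INSERT INTO source VALUES ('A'), ('B'), ('C');
-- Hypothetical mapping: 'A' is mapped twice, 'C' is not mapped at all.
INSERT INTO mapping VALUES ('A', 'R1'), ('A', 'R2'), ('B', 'R3');
""")

# 3 source rows come back as 4: 'A' is duplicated, 'C' is kept with a NULL.
rows = conn.execute("""
SELECT s.account, m.r12_account
FROM source s LEFT JOIN mapping m ON s.account = m.account
ORDER BY s.account, m.r12_account
""").fetchall()

# Keys that appear more than once in the lookup table.
duplicated_keys = conn.execute("""
SELECT account, COUNT(*) FROM mapping GROUP BY account HAVING COUNT(*) > 1
""").fetchall()

print(len(rows), rows)   # 4 [('A', 'R1'), ('A', 'R2'), ('B', 'R3'), ('C', None)]
print(duplicated_keys)   # [('A', 2)]
```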
The parentheses in your posted SQL are not the problem: Access requires the first join to be nested inside () when more than two tables are joined (the query builder will generate them for you; they tend to be quite messy but work).
Just drop the WHERE clause and use this:
FROM (source1084 AS srb
LEFT JOIN r12caomappingtable AS rmt
ON srb.account = rmt.account)
LEFT JOIN table4
ON ( srb.department = table4.dept )
AND ( srb.[sub-department] = table4.subdept )
AND ( srb.entity = table4.entity )
As noted, I don't see why you are "repeating" the conditions again in the where clause.

Joining SQL queries with where clause

I have 2 tables in a MySQL database which look like this:
Klant:
ID Naam Email Soort Status
6 test test test2
7 status test test test 20
8 soort test test test
9 soort test 2 test2 Museum
Mail:
ID Content Datum Titel
1 (lots of encoded HTML) 18-03-13 test
2 (lots of encoded HTML) 18-03-13 test2
4 (lots of encoded HTML) 18-03-13 alles weer testen
(yes, I'm still testing a lot ^^)
Now I have a SQL query that selects all from 'Klant' with a where clause which gets the ID from a previous page:
$strSQL = "SELECT * FROM Klant WHERE ID = '".$_GET["ID"]."' ";
What I want is to JOIN this query with the following query:
SELECT ID, Titel FROM Mail;
EDIT:
From all your answers and comments, I'm beginning to think my question may be totally wrong. I'll explain what I need it for; maybe I don't even need a JOIN. I currently have a table which includes the data from 'Klant' and looks like this:
What I mean is that I want to add another table which includes all the IDs and titles from 'Mail'. I'm sorry for any confusion I may have caused, since I wasn't very clear with my question. I hope this clears up what I want, and that you can tell me whether I even need a JOIN here or can do something else.
I am still a student and this is the first time I've had to use JOIN and I can't figure this out. If anyone can show me how to do this or push me in the right direction it would be great!
SELECT * FROM Klant t1
JOIN
(SELECT ID, Titel FROM Mail) t2
ON t1.ID = t2.ID
WHERE t1.Naam = 'test'
To have the desired result do the following:
SELECT * FROM Klant t1
JOIN
(SELECT ID, Titel FROM Mail) t2
ON t1.ID = t2.ID
And if you want a specific row, just add the WHERE clause:
WHERE t1.ID = 6
or
WHERE t1.Naam = 'test'
and so on
It is difficult to see how a JOIN is applicable in the example in your question.
A JOIN lets you pull information from more than one table based on a relationship. As far as I can see, your tables don't have any way to link a row in one with a row in the other, unless SteveP is correct and your IDs provide that relationship.
For example, if your klant table had a mail_id column then you could do
SELECT *
FROM klant
JOIN mail ON klant.mail_id = mail.id
and this would return a row for every matching pair of rows in the two tables. Alternatively you could use a LEFT OUTER JOIN to pull back all rows from the table on the left of the JOIN and optionally data from a matching row on the right.
If there is nothing joining the tables, you can use a CROSS JOIN, which returns the full Cartesian product: every row in table1 paired with every row in table2.
Something people often confuse with a JOIN is a UNION, which lets you write 2 SELECT statements and returns the result sets of both combined, but each query must return the same number of columns with compatible types (e.g. selecting NULL in place of a column that one of the queries doesn't pull data for).
I'm guessing that you want to join on the ID field, which is common to both tables.
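Using only the IDs and titles from the sample data in the question (the other Klant columns are left out), this SQLite sketch shows why a join on ID returns nothing here and what a CROSS JOIN returns instead. The requested ID is passed as a bound parameter rather than concatenated into the SQL string; klant_id is a hypothetical stand-in for $_GET["ID"].

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Klant (ID INTEGER PRIMARY KEY);
CREATE TABLE Mail  (ID INTEGER PRIMARY KEY, Titel TEXT);
-- IDs and titles taken from the sample tables in the question.
INSERT INTO Klant VALUES (6), (7), (8), (9);
INSERT INTO Mail  VALUES (1, 'test'), (2, 'test2'), (4, 'alles weer testen');
""")

klant_id = 6  # hypothetical stand-in for $_GET["ID"]

# No Klant.ID equals any Mail.ID in this data, so the ID join matches nothing.
matched = conn.execute(
    "SELECT k.ID, m.ID, m.Titel FROM Klant k JOIN Mail m ON m.ID = k.ID WHERE k.ID = ?",
    (klant_id,),
).fetchall()

# CROSS JOIN pairs the selected Klant row with every Mail row.
paired = conn.execute(
    "SELECT k.ID, m.ID, m.Titel FROM Klant k CROSS JOIN Mail m WHERE k.ID = ? ORDER BY m.ID",
    (klant_id,),
).fetchall()

print(matched)  # []
print(paired)   # [(6, 1, 'test'), (6, 2, 'test2'), (6, 4, 'alles weer testen')]
```

If the goal is simply to show one Klant row next to a list of all mails, two separate queries are often clearer than a CROSS JOIN.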
select * from Klant, Mail where Klant.ID = '".$_GET["ID"]."' and Klant.ID = Mail.ID
You can also do
select * from Klant
join Mail on Mail.ID = Klant.ID
where Klant.ID = '".$_GET["ID"]."'
You can do this directly by using the following query:
select k.ID,k.Naam, k.Email,k.Soort,k.Status, m.ID,m.Titel from Klant k, Mail m where k.ID = m.ID and k.ID = '".$_GET["ID"]."'

How to update table based on row index?

I made a copy of an existing table like this:
select * into table_copy from table
Since then I've made some schema changes to table (added/removed columns, changed order of columns etc). Now I need to run an update statement to populate a new column I added like this:
update t
set t.SomeNewColumn = copy.SomeOldColumn
from t
However, how do I get the second table in here based on row index instead of some column value matching up?
Note: Both tables still have the same number of rows, in their original positions.
You cannot join the tables without a key that uniquely identifies each row; the position of the data in the table has no bearing on the situation.
If your tables do not have a primary key, you need to define one.
If you have an ID on it, you can do this:
update t set
t.SomeNewColumn = copy.SomeOldColumn
from
table t
inner join table_copy copy on
t.id = copy.id
If you have no way to uniquely identify the row and are relying on the order of the rows, you're out of luck, as row order is not reliable in any version of SQL Server (nor most other RDBMSes).
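Here is a small SQLite sketch of matching on a key rather than on row position. The table names original_table and table_copy and the sample values are hypothetical (table itself is a reserved word). SQLite only supports UPDATE ... FROM from version 3.33.0, so the sketch uses the equivalent correlated subquery instead of the SQL Server join syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE original_table (id INTEGER PRIMARY KEY, SomeNewColumn TEXT);
CREATE TABLE table_copy     (id INTEGER PRIMARY KEY, SomeOldColumn TEXT);
INSERT INTO original_table (id) VALUES (1), (2), (3);
INSERT INTO table_copy VALUES (1, 'a'), (2, 'b'), (3, 'c');
""")

# Each row looks up its value in the copy by id, never by physical position.
conn.execute("""
UPDATE original_table
SET SomeNewColumn = (SELECT c.SomeOldColumn
                     FROM table_copy c
                     WHERE c.id = original_table.id)
""")

rows = conn.execute("SELECT id, SomeNewColumn FROM original_table ORDER BY id").fetchall()
print(rows)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```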
You could use this to update them by matching ids
UPDATE
t
SET
t.SomeNewColumn = copy.SomeOldColumn
FROM
original_table t
INNER JOIN
other_table copy
ON
t.id = copy.id
or, if you don't have the ids, you might be able to pull something out by using the ROW_NUMBER function to enumerate the records, but that's a long shot (I haven't checked whether it's possible).
If you're updating, you'll need a primary key to join on. Usually in that case, the others' answers will suffice. If for some reason you still need to update the table with a resultset in a certain order, you can do this:
UPDATE t SET t.SomeNewColumn = copy.SomeOldColumn
FROM table t
JOIN (SELECT ROW_NUMBER() OVER(ORDER BY id) AS row, id, SomeNewColumn FROM table) t2
ON t2.Id = t.Id
JOIN (SELECT ROW_NUMBER() OVER(ORDER BY id) AS row, SomeOldColumn FROM table_copy) copy
ON copy.row = t2.row
You get the new table and its row numbers in the order you want, join the old table and its row numbers in the order you want, and join back to the new table so the query has something to directly update.