T-SQL Match records 1 to 1 without join condition

T-SQL Match records 1 to 1 without join condition - sql

I have a group of enitities which need to have another record associated with them from another table.
When I try to output an Id for the table to be matched on it doesn't work because you can only output from inserted, updated etc.
DECLARE #SignatureGlobalIdsTbl table (ID int,
CompanyBankAccountId int);
INSERT INTO GlobalIds (TypeId)
-- I Cannot output cba.Id into the table since its not from inserted
OUTPUT Inserted.Id,
cba.Id
INTO #SignatureGlobalIdsTbl (ID,
CompanyBankAccountId)
SELECT (#DocumentsGlobalTypeKey)
FROM CompanyBankAccounts cba
INNER JOIN Companies c ON c.CompanyId = cba.CompanyId
WHERE SignatureDocumentId IS NULL
AND (SignatureFile IS NOT NULL
AND SignatureFile != '');
INSERT INTO Documents (DocumentPath,
DocumentType,
DocumentIsExternal,
OwnerGlobalId,
OwnerGlobalTypeID,
DocumentName,
Extension,
GlobalId)
SELECT SignatureFile,
#SignatureDocumentTypeKey,
1,
CompanyGlobalId,
#OwnerGlobalTypeKey,
[dbo].[fnGetFileNameWithoutExtension](SignatureFile),
[dbo].[fnGetFileExtension](SignatureFile),
documentGlobalId
FROM (SELECT c.GlobalId AS CompanyGlobalId,
cba.*,
s.ID AS documentGlobalId
FROM CompanyBankAccounts cba
INNER JOIN Companies c ON c.CompanyId = cba.CompanyId
CROSS JOIN #SignatureGlobalIdsTbl s) info
WHERE SignatureDocumentId IS NULL
AND (SignatureFile IS NOT NULL
AND SignatureFile != '');
I Tried to use cross join to prevent cartesian production but that did not work. I also tried to output the rownumber over some value but I could not get that to be stored in the table either.
If I have two seperate queries which return the same amount of records, how can I pair the records together without creating cartesian production?

'When I try to output an Id for the table ... it doesn't work.'
This seems to be because one of the columns you want to OUTPUT is not actually part of the insert. It's an annoying problem and I wish SQL Server would allow us to do it.
Someone may have a much better answer for this than I do, but the way I usually approach this is
Create a temporary table/etc of the data I want to insert, with a column for ID (starts blank)
Do an insert of the correct amount of rows, and get the IDs out into another temporary table,
Assign the IDs as appropriate within the original temporary table
Go back and update the inserted rows with any additional data needed (though that's probably not needed here given you're just inserting a constant)
What this does is to flag/get the IDs ready for you to use, then you allocate them to your data as needed, then fill in the table with the data. It's relatively simple although it does do 2 table hits rather than 1.
Also consider doing it all within a transaction to keep the data consistent (though also probably not needed here).
How can I pair the records together?
A cross join unfortunately multiplies the rows (number of rows on left times the number of rows on the right). It is useful in some instances, but possibly not here.
I suggest when you do your inserts above, you get an identifier (e.g., companyID) in your temp table and join on that.
If you don't have a matching record and just want to assign them in order, you can use an answer similar to my answer in another recent question How to update multiple rows in a temp table with multiple values from another table using only one ID common between them?
Further notes
I suggest avoiding table variables (e.g., DECLARE #yourtable TABLE) and use temporary tables (CREATE TABLE #yourtable) instead - for performance reasons. If it's only a small amount of rows it's OK, but it gets worse as it gets larger as SQL Server assumed that table variables only have 1 row
In your bottom statement, why is there the SELECT statement in the FROM clause? Couldn't you just get rid of that select statement and have the FROM clause list the tables you want?

I figured out a way to have access to the output, by using a merge statement.
DECLARE #LogoGlobalIdsTbl TABLE (ID INT, companyBankAccountID INT)
MERGE GlobalIds
USING
(
SELECT (cba.CompanyBankAccountId)
FROM CompanyBankAccounts cba
INNER JOIN Companies c on c.CompanyId = cba.CompanyId
WHERE cba.LogoDocumentId IS NULL AND (cba.LogoFile IS NOT NUll AND cba.LogoFile != '')
) src ON (1=0)
WHEN NOT MATCHED
THEN INSERT ( TypeId )
VALUES (#DocumentsGlobalTypeKey)
OUTPUT [INSERTED].[Id], src.CompanyBankAccountId
INTO #LogoGlobalIdsTbl;

Related

Best way to combine two tables, remove duplicates, but keep all other non-duplicate values in SQL

I am looking for the best way to combine two tables in a way that will remove duplicate records based on email with a priority of replacing any duplicates with the values in "Table 2", I have considered full outer join and UNION ALL but Union all will be too large as each table has several 1000 columns. I want to create this combination table as my full reference table and save as a view so I can reference it without always adding a union or something to that effect in my already complex statements. From my understanding, a full outer join will not necessarily remove duplicates. I want to:
a. Create table with ALL columns from both tables (fields that don't apply to records in one table will just have null values)
b. Remove duplicate records from this master table based on email field but only remove the table 1 records and keep the table 2 duplicates as they have the information that I want
c. A left-join will not work as both tables have unique records that I want to retain and I would like all 1000+ columns to be retained from each table
I don't know how feasible this even is but thank you so much for any answers!

If I understand your question correctly you want to join two large tables with thousands of columns that (hopefully) are the same between the two tables using the email column as the join condition and replacing duplicate records between the two tables with the records from Table 2.
I had to do something similar a few days ago so maybe you can modify my query for your purposes:
WITH only_in_table_1 AS(
SELECT *
FROM table_1 A
WHERE NOT EXISTS
(SELECT * FROM table_2 B WHERE B.email_field = A.email_field))
SELECT * FROM table_2
UNION ALL
SELECT * FROM only_in_table_1
If the columns/fields aren't the same between tables you can use a full outer join on only_in_table_1 and table_2

try using a FULL OUTER JOIN between the two tables and then a COALESCE function on each resultset column to determine from which table/column the resultset column is populated

How do I invert my join critera in TSQL?

I think this is a relatively basic operation in SQL, but I'm having a hard time figuring it out.
I have two tables, a source table I'm trying to SELECT my data from, and a reference table containing serial #'s and transaction #'s (the source table also has these columns, and more). I want to do two different SELECTs, one where the serial/trans number pair exist in the source table and one where the serial/trans number do not exist. (the combination of serial and trans number are the primary keys for both these tables)
Intially I'm doing this with a join like this:
SELECT * FROM source s
INNER JOIN reference r ON s.serial = r.serial AND s.trans = r.trans
I would think this should give me everything from the source table that has a serial/trans pair matching with one in the reference table. I'm not positive this is correct, but it is returning a reasonable number of results and I think it looks good.
When I go to do the opposite, get everything from source where the serial/trans pair do not match up with one in reference, I encounter a problem. I tried the following query:
SELECT * FROM source s
INNER JOIN reference r ON s.serial <> r.serial AND s.trans <> r.trans
When I run this the query goes on forever, it starts returning way more results than it should, more than are actually in the entire source table. It eventually ends with an OOM exception, I let it run for 20 min+. For some persepctive the source table I'm dealing with has about 13 million records, and reference table has about 105,000.
So how do I get the results I'm looking for? If it is not already clear, the number of results from the first query + results from second query should equal the total number of records in my source table.
Thanks for any help!

I think you need something like NOT EXISTS:
SELECT *
FROM source s
WHERE NOT EXISTS (SELECT 1
FROM reference r
WHERE s.serial = r.serial AND s.trans = r.trans)
The above query will get everything from source where the serial/trans pair do not match up with one in reference.
As cited in comment from #Dan below NOT EXISTS generally has a performance advantage over LEFT JOIN in a situation like this (see this for example).

"LEFT JOIN" match records that match together, and return NULL for those that doesn't fit.
So, non-matching records all have NULL in column coming from reference table. This way, you can than add a where clause to return only records with null value in a non-nullable column of table "reference" (like the primarykey).
SELECT * FROM source s
LEFT JOIN reference r ON s.serial = r.serial AND s.trans = r.trans
WHERE R.serial IS NULL

insert multiple records into multiple columns of a table from many tables

I want to insert multiple records into multiple columns of a table from many tables. Below is my query, but I just get to insert the records into the first column. The other columns populate with nulls. Can you let me know what am I doing wrong?
INSERT INTO [dbo].[dim_one_staging] ([Parent], [Child], [Child_Alias], [Operator])
SELECT
p.[Parent], c.[Child], a.[Child_Alias], o.[Child_Operator]
FROM
[dbo].[Staging_Parent] AS p
INNER JOIN
[dbo].[Staging_Child] AS c ON p.[id] = c.[id]
INNER JOIN
[dbo].[Staging_Child_Alias] AS a ON c.[id] = a.[id]
INNER JOIN
[dbo].[Staging_Operator] AS o ON a.[id] = o.[id]

Your query is syntactically correct. That doesn't mean it does what you want it to do.
It could be that you have no values in
,c.[Child]
,a.[Child_Alias]
,o.[Child_Operator]
for the records that meet the rest of the query conditions and thus null is the correct value.
It could be that you have no valaues in the join tables for those fields but you should have values, in which case there is a bug in the way the data in being entered into these tables.
Or it could be that you are trying to get values froma table where the value is not required and put them into a table where it is and thus need to use coalesce (or default values) to define what should go in there if the value is null.
Yet another possibility is that there is trigger on the table that is nulling the values out.
Only you can detrmine what the problem is from teh data structure you have and the meaning attached to the data. I don't know how to fix your problem because I don't actually understand your datamodel as far as meaning (as opposed structure.)

Why does my left join in Access have fewer rows than the left table?

I have two tables in an MS Access 2010 database: TBLIndividuals and TblIndividualsUpdates. They have a lot of the same data, but the primary key may not be the same for a given person's record in both tables. So I'm doing a join between the two tables on names and birthdates to see which records correspond. I'm using a left join so that I also get rows for the people who are in TblIndividualsUpdates but not in TBLIndividuals. That way I know which records need to be added to TBLIndividuals to get it up to date.
SELECT TblIndividuals.PersonID AS OldID,
TblIndividualsUpdates.PersonID AS UpdateID
FROM TblIndividualsUpdates LEFT JOIN TblIndividuals
ON ( (TblIndividuals.FirstName = TblIndividualsUpdates.FirstName)
and (TblIndividuals.LastName = TblIndividualsUpdates.LastName)
AND (TblIndividuals.DateBorn = TblIndividualsUpdates.DateBorn
or (TblIndividuals.DateBorn is null
and (TblIndividuals.MidName is null and TblIndividualsUpdates.MidName is null
or TblIndividuals.MidName = TblIndividualsUpdates.MidName))));
TblIndividualsUpdates has 4149 rows, but the query returns only 4103 rows. There are about 50 new records in TblIndividualsUpdates, but only 4 rows in the query result where OldID is null.
If I export the data from Access to PostgreSQL and run the same query there, I get all 4149 rows.
Is this a bug in Access? Is there a difference between Access's left join semantics and PostgreSQL's? Is my database corrupted (Compact and Repair doesn't help)?

ON (
TblIndividuals.FirstName = TblIndividualsUpdates.FirstName
and
TblIndividuals.LastName = TblIndividualsUpdates.LastName
AND (
TblIndividuals.DateBorn = TblIndividualsUpdates.DateBorn
or
(
TblIndividuals.DateBorn is null
and
(
TblIndividuals.MidName is null
and TblIndividualsUpdates.MidName is null
or TblIndividuals.MidName = TblIndividualsUpdates.MidName
)
)
)
);
What I would do is systematically remove all the join conditions except the first two until you find the records drop off. Then you will know where your problem is.

This should never happen. Unless rows are being inserted/deleted in the meantime,
the query:
SELECT *
FROM a LEFT JOIN b
ON whatever ;
should never return less rows than:
SELECT *
FROM a ;
If it happens, it's a bug. Are you sure the queries are exactly like this (and you have't omitted some detail, like a WHERE clause)? Are you sure that the first returns 4149 rows and the second one 4103 rows? You could make another check by changing the * above to COUNT(*).

Drop any indexes from both tables which include those JOIN fields (FirstName, LastName, and DateBorn). Then see whether you get the expected
4,149 rows with this simplified query.
SELECT
i.PersonID AS OldID,
u.PersonID AS UpdateID
FROM
TblIndividualsUpdates AS u
LEFT JOIN TblIndividuals AS i
ON
(
(i.FirstName = u.FirstName)
AND (i.LastName = u.LastName)
AND (i.DateBorn = u.DateBorn)
);

For whatever it is worth, since this seems to be a deceitful bug and any additional information could help resolving it, I have had the same problem.
The query is too big to post here and I don't have the time to reduce it now to something suitable, but I can report what I found. In the below, all joins are left joins.
I was gradually refining and changing my query. It had a derived table in it (D). And the whole thing was made into a derived table (T) and then joined to a last table (L). In any case, at one point in its development, no field in T that originated in D participated in the join to L. It was then the problem occurred, the total number of rows mysteriously became less than the main table, which should be impossible. As soon as I again let a field from D participate (via T) in the join to L, the number increased to normal again.
It was as if the join condition to D was moved to a WHERE clause when no field in it was participating (via T) in the join to L. But I don't really know what the explanation is.

Comparing two tables for similar values and Inserting values

I am working on databases and now I need some advice's from you guys..
I have 2 Tables with many rows and columns and these db's contain addresses of customers. Names of the tables are Data, Orders.
Now the problem is I have to search the addresses present in Table Orders with the addresses in Data using email as the criteria.
If there is a match in emails then its ok....or else we should insert the addresses of the table Orders in table Data. ...
I made this query but i am getting some error.
INSERT INTO orders (orders_id, customers_id, customers_cid, customers_vat_id, customers_name, customers_email_address) VALUES( (select o.* from Test.dbo.orders o where o.customers_email_address not in ( select a.email0 from CobraDemoData.dbo.Data a)))
Any help is much appreciated..
Thanks,
subash

You can insert the values directly from a select statement--don't use values when you want to do that. Additionally, you can use not exists in lieu of not in, as SQL Server usually runs that much faster, but it's case-by-case, so you can look at the query plan if it's really an issue.
insert into orders (orders_id, customers_id, customers_cid, customers_vat_id, customers_name, customers_email_address)
select
o.*
from
Test.dbo.orders o
where
not exists (
select 1
from
CobraDemoData.dbo.Data a
where
a.email0 = o.customers_email_address
)
Also, you probably want to specify the columns in the select statement, just to make sure the right columns are transposed.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas