Access Removing CERTAIN PARTS of Duplicates in Union Query - sql

I'm working in Access 2007 and know nothing about SQL and very, very little VBA. I am trying to do a union query to join two tables, and delete the duplicates.
BUT, a lot of my duplicates have info in one entry that's not in the other. It's not a 100% exact duplicate.
Example,
Row 1: A, B, BLANK
Row 2: A, BLANK, C
I want it to MERGE both of these to end up as one row of A, B, C.
I found a similar question on here but I don't understand the answer at all. Any help would be greatly appreciated.

I would suggest a query like this:
select
coalesce(t1.a, t2.a) as a,
coalesce(t1.b, t2.b) as b,
coalesce(t1.c, t2.c) as c
from
table1 t1
inner join table2 t2 on t1.key = t2.key
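Here, I have used the coalesce function, which returns the first non-null value in a list of values. Access SQL itself does not provide coalesce; the Nz function plays the same role there for two arguments. Also note that I have used key to indicate the column that is the same between the two rows. From your example it looks like A, but I cannot be sure. Because this is an inner join, rows whose key appears in only one of the tables are dropped.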

If your first table has all the key values, then you can do:
select t1.a, nz(t1.b, t2.b) as b, nz(t1.c, t2.c) as c
from table1 as t1 left join
table2 as t2
on t1.a = t2.a;
If this isn't the case, you can use this rather arcane looking construct:
select t1.a, nz(t1.b, t2.b) as b, nz(t1.c, t2.c) as c
from table1 as t1 left join
table2 as t2
on t1.a = t2.a
union all
select t2.a, t2.b, t2.c
from table2 as t2
where not exists (select 1 from table1 as t1 where t1.a = t2.a)
The first part of the union gets the rows where there is a key value in the first table. The second gets the rows where the key value is in the second but not the first.
Note this is much harder in Access than in other (dare I say "real") databases. MS Access doesn't support common table expressions (CTEs), unions in subqueries, or full outer join -- all of which would help simplify the query.
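For anyone who wants to try the left join plus union all pattern outside Access, here is a minimal sketch using Python's built-in sqlite3 module. It is not Access SQL: SQLite's COALESCE stands in for Nz, and the sample rows (including the extra keys X and Q) are made up for illustration. It assumes column a is the key shared by both tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (a TEXT, b TEXT, c TEXT);
    CREATE TABLE table2 (a TEXT, b TEXT, c TEXT);
    -- 'A' exists in both tables with complementary blanks;
    -- 'X' exists only in table1 and 'Q' only in table2 (hypothetical rows).
    INSERT INTO table1 VALUES ('A', 'B', NULL), ('X', 'Y', 'Z');
    INSERT INTO table2 VALUES ('A', NULL, 'C'), ('Q', 'R', NULL);
""")

merged = conn.execute("""
    -- Part 1: every key in table1, filling blanks from table2 where possible.
    SELECT t1.a, COALESCE(t1.b, t2.b) AS b, COALESCE(t1.c, t2.c) AS c
    FROM table1 AS t1
    LEFT JOIN table2 AS t2 ON t1.a = t2.a
    UNION ALL
    -- Part 2: keys that appear only in table2.
    SELECT t2.a, t2.b, t2.c
    FROM table2 AS t2
    WHERE NOT EXISTS (SELECT 1 FROM table1 AS t1 WHERE t1.a = t2.a)
    ORDER BY 1
""").fetchall()

for row in merged:
    print(row)
```

The two A rows collapse into a single A, B, C row, while the rows that exist in only one table pass through unchanged.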

Related

How can this query be optimized please?

I have performance issues with the following query:
SELECT A,B,C,D,E,F
FROM TABLE1 T1
INNER JOIN TABLE2 T2
ON (((T1.E IS NULL OR T2.E IS NULL) AND T1.F= T2.F)
OR((T1.E IS NOT NULL OR T2.E IS NOT NULL) AND T1.E = T2.E))
More than 30 min to return about 1000 rows
I've tried this :
SELECT A,B,C,D,E,F
FROM TABLE1 T1
INNER JOIN TABLE2 T2
ON (((COALESCE(T1.E,-1) = COALESCE(T2.E,-1)
AND ((T1.F= T2.F)
OR(T1.E = T2.E)))))
but it returns fewer rows than the first one.
Can you help me find another way to write it in order to reduce the execution time, please?
I'm using SQL Server 2016
Try this:
SELECT A,B,C,D,E,F
FROM TABLE1 T1
INNER JOIN TABLE2 T2 ON T1.F = T2.F
WHERE T1.E IS NULL OR T2.E IS NULL
UNION
SELECT A,B,C,D,E,F
FROM TABLE1 T1
INNER JOIN TABLE2 T2 ON T1.E = T2.E
WHERE COALESCE(T1.E, T2.E) IS NOT NULL
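You might want a UNION ALL instead. The two branches can never return the same joined pair of rows (the first needs a NULL E on one side, the second needs E to be non-NULL on both sides), so UNION ALL keeps the original row counts, whereas UNION also collapses any duplicate output rows.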
This also exposes an interesting quirk in the original logic you may want to reconsider. If the E field from one table is NULL, but not the other, the original code would make checks on both the E and F fields. Which is interesting, because for the E field we know one side is null, but the other is not, so that case can't ever be true... but the logic says to still make the comparison.
It's hard to know what you're doing with the generic names, but there's definitely room to clean up that conditional check. Before worrying about matching to your first results, go back and make sure those first results clearly and accurately state what you want to accomplish, even if that means making the query even slower or longer.
Then, only when you are sure you have a query that both produces accurate results and describes them in an understandable way, you can start looking for different or clever ways to express the same logic that might perform better. But if you don't first take the step of better-defining your logic, you won't be able to validate your optimizations and you'll risk quickly producing incorrect data.
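One way to do that validation is to run both forms against a small data set and compare the results. The sketch below does this with Python's sqlite3 module rather than SQL Server, so it only checks the logic, not the performance. The table layout (id, e, f) and the sample rows are assumptions, since the question does not say which table each of A..F comes from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, e INTEGER, f INTEGER);
    CREATE TABLE table2 (id INTEGER, e INTEGER, f INTEGER);
    INSERT INTO table1 VALUES (1, 10, 100), (2, NULL, 200), (3, 30, 300);
    INSERT INTO table2 VALUES (1, 10, 999), (2, 20, 200), (3, NULL, 300), (4, 40, 400);
""")

# The original join condition with OR.
original = conn.execute("""
    SELECT t1.id AS t1_id, t2.id AS t2_id
    FROM table1 t1
    INNER JOIN table2 t2
      ON ((t1.e IS NULL OR t2.e IS NULL) AND t1.f = t2.f)
      OR ((t1.e IS NOT NULL OR t2.e IS NOT NULL) AND t1.e = t2.e)
    ORDER BY 1, 2
""").fetchall()

# The same logic split into two equality joins.
rewritten = conn.execute("""
    SELECT t1.id AS t1_id, t2.id AS t2_id
    FROM table1 t1
    INNER JOIN table2 t2 ON t1.f = t2.f
    WHERE t1.e IS NULL OR t2.e IS NULL
    UNION ALL
    SELECT t1.id, t2.id
    FROM table1 t1
    INNER JOIN table2 t2 ON t1.e = t2.e
    ORDER BY 1, 2
""").fetchall()

print(original)
print(rewritten)
print(original == rewritten)
```

The sketch uses UNION ALL because the two branches are mutually exclusive; the second branch needs no extra WHERE clause, since T1.E = T2.E can only be true when both values are non-NULL.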
Non-equality conditions -- such as OR -- pretty much kill JOIN performance, especially in databases such as SQL Server that do not use indexes in such cases.
I would recommend a two-join approach, but you are going to have to fix the SELECT because it is not clear where the columns come from.
SELECT --A, B, C, D, E, F,
T1.A,
COALESCE(T2_1.B, T2_2.B) as B,
. . .
FROM TABLE1 T1 LEFT JOIN
TABLE2 T2_1
ON T2_1.F = T1.F AND
(T1.E IS NULL OR T2_1.E IS NULL) LEFT JOIN
TABLE2 T2_2
ON T2_2.E = T1.E -- E cannot be NULL
WHERE T2_1.F IS NOT NULL OR T2_2.E IS NOT NULL; -- checks for a match for either condition
Then for performance, you want indexes on TABLE2(F, E) and TABLE2(E).
An OR in the join condition can drastically increase execution time. Try to get rid of it. Maybe something like this would do:
SELECT A,B,C,D,E,F
FROM TABLE1 T1
LEFT JOIN TABLE2 T2
ON T1.E = T2.E
LEFT JOIN TABLE2 T22
ON T1.F= T22.F
AND T2.E IS NULL
WHERE NOT (T2.E IS NULL AND T22.F IS NULL)

Is it possible to attach the table alias to column names to figure out where columns are coming from?

I have a query that I'm trying to rework that returns over 1,000 columns when I SELECT * FROM several tables. I want to know if there is a way in SQL to tag each column name with its table alias so I can tell which table each column comes from. It looks like the following:
SELECT *
FROM table1 t1
join table2 t2
join table3 t3
join table4 t4
Current column output:
id, id, id, id, name, name, name, name, order, order, order, order
Desired Column output:
t1.id, t1.name, t1.order, t2.id, t2.name, t2.order,t3.id, t3.name, t3.order, t4.id, t4.name, t4.order
This is a very simple example, but you can imagine trying to fish the column you need out of a sea of 1,000 columns while figuring out which table it came from! Any ideas?
I'm not aware of a way to prefix each column with the table alias. However, I do know how you could easily break the columns into groups that would allow you to figure out which table each column comes from.
SELECT 'T1' as [Table1]
, t1.*
, 'T2' as [Table2]
, t2.*
, 'T3' as [Table3]
, t3.*
, 'T4' as [Table4]
, t4.*
FROM table1 t1
join table2 t2
join table3 t3
join table4 t4
This would break the columns into groups by table, with a little bookmark column before each group to help you see where they're coming from.
I know it's not exactly what you asked for, but I believe it would help you a lot in figuring out which columns come from which tables.
Your other option, as others have said, is specifying the prefix on every column, which it sounds like you don't want to do. However, it can be a lot quicker if you drag the columns from the Object Explorer and use ALT+SHIFT to add the prefix to each column.
Here's an article about copying columns from object explorer - https://www.qumio.com/Blog/Lists/Posts/Post.aspx?ID=56
Here's an article about adjusting code using ALT+SHIFT - https://blogs.msdn.microsoft.com/sql_pfe_blog/2017/04/11/quick-tip-shiftalt-for-multiple-line-edits/
The first method would take less than a minute; the second I could see taking less than 10 minutes even for 1,000 columns.
You have to assign non-default column aliases manually:
select t1.id as t1_id, t1.name as t1_name, t1.order as t1_order,
t2.id as t2_id, t2.name as t2_name, t2.order as t2_order,
. . .
You might find that a spreadsheet or query can help, if you have a lot of columns.
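As a rough illustration of the "query can help" idea, the sketch below builds the aliased select list from table metadata instead of typing it by hand. It uses Python's sqlite3 module and SQLite's PRAGMA table_info; on SQL Server you would read the column names from a catalog view such as INFORMATION_SCHEMA.COLUMNS instead. The helper name aliased_select_list and the two sample tables are hypothetical.

```python
import sqlite3

def aliased_select_list(conn, tables):
    """Build 'alias."col" AS "alias_col"' entries for every column.

    `tables` maps a table alias to its table name (hypothetical helper).
    """
    parts = []
    for alias, table in tables.items():
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        for row in conn.execute(f'PRAGMA table_info("{table}")'):
            col = row[1]
            parts.append(f'{alias}."{col}" AS "{alias}_{col}"')
    return ",\n       ".join(parts)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT, "order" INTEGER);
    CREATE TABLE table2 (id INTEGER, name TEXT, "order" INTEGER);
""")

select_list = aliased_select_list(conn, {"t1": "table1", "t2": "table2"})
print("SELECT " + select_list)
```

The printed text can be pasted in place of SELECT * and then trimmed down to the columns you actually need.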
Some products may have exceptions, but generally no, you can't do that. You either have to use wildcards (SELECT *) or specify the columns you wish returned by full and complete name.
If you specify columns, you can "alias" them, that is, set the column name to something other than the source name. For example (pseudo-code, leaving out the "ON" clause):
SELECT
T1.Id as T1_Id
,T2.Id as T2_Id
from table1 T1
join table2 T2
Note that you can combine table aliases with wildcards. For example:
SELECT
T2.*
from table1 T1
join table2 T2
join table3 T3
join table4 T4
will return all the columns from table2, and only from table2. This might help in revising your query by getting a list of the available columns in each table.

Get rid of matching records and add the non-matching data

I have the following tables:
Table a:
Name
T1
T2
T3
T4
Table b:
Name
T1
T2
T3
T4
T5
T6
I need to select all from table a and add what is not in table a from table b, result below:
T1
T2
T3
T4
T5
T6
Thanks for help
If you want all unique names from both the tables, use UNION:
select name from table_a
union
select name from table_b;
Here is another way:
select ta.name from ta
union all
select tb.name from tb
left join ta
on tb.name = ta.name
where ta.name is null
I would do this with an anti-join (a NOT IN condition). As written below, it will not work correctly if NULL is possible in that column in table a (in that case, the anti-join should be written with a NOT EXISTS condition). I assume the column is NOT NULL.
An anti-join is faster than a join, because as soon as a value from table b is also found in a, the joining for that row of table b stops and processing moves on to the next row. In a join, the joining continues, there is no such short-circuiting.
Oto's solution uses a join rather than an anti-join. However, I believe the Oracle query optimizer recognizes, in this simple case, that an anti-join is sufficient, and it will rewrite the query to use an anti-join. This is something you can verify by running Explain Plan on both queries. With that said, in a similar but much more complicated problem, the optimizer may not be able to "see" this shortcut; this is why I believe it's best to write anti-joins (and semi-joins, where we use IN or EXISTS conditions) explicitly, rather than rely on the optimizer.
The query should be
select name from a
union all
select name from b where name not in ( select name from a );
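To see why the NULL caveat above matters, here is a small sketch using Python's sqlite3 module (not Oracle, though the NULL semantics of NOT IN are standard SQL). The sample rows are made up, and table a deliberately contains a NULL name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (name TEXT);
    CREATE TABLE b (name TEXT);
    INSERT INTO a VALUES ('T1'), ('T2'), (NULL);  -- note the NULL
    INSERT INTO b VALUES ('T1'), ('T2'), ('T5'), ('T6');
""")

# NOT IN: 'T5' NOT IN ('T1', 'T2', NULL) evaluates to unknown, so no row passes.
not_in = conn.execute("""
    SELECT name FROM b
    WHERE name NOT IN (SELECT name FROM a)
    ORDER BY name
""").fetchall()

# NOT EXISTS: the NULL in table a simply never matches, so T5 and T6 survive.
not_exists = conn.execute("""
    SELECT name FROM b
    WHERE NOT EXISTS (SELECT 1 FROM a WHERE a.name = b.name)
    ORDER BY name
""").fetchall()

print(not_in)
print(not_exists)
```

With a NOT NULL column in table a, both forms return the same rows.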
Here's one way to do that:
Select distinct Name
from (
select Name from a
UNION ALL
select Name from b
) t

T-SQL "Where not in" using two columns

I want to select all records from a table T1 where the values in columns A and B have no matching tuple for the columns C and D in table T2.
In mysql “Where not in” using two columns I can read how to accomplish that using the form select A,B from T1 where (A,B) not in (SELECT C,D from T2), but that fails in T-SQL for me resulting in "Incorrect syntax near ','.".
So how do I do this?
Use a correlated sub-query:
...
WHERE
NOT EXISTS (
SELECT * FROM SecondaryTable WHERE c = FirstTable.a AND d = FirstTable.b
)
Make sure there's a composite index on SecondaryTable over (c, d), unless that table does not contain many rows.
You can't do this using a WHERE IN type statement.
Instead you could LEFT JOIN to the target table (T2) and select where T2.ID is NULL.
For example
SELECT
T1.*
FROM
T1 LEFT OUTER JOIN T2
ON T1.A = T2.C AND T1.B = T2.D
WHERE
T2.PrimaryKey IS NULL
will only return rows from T1 that don't have a corresponding row in T2.
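The NOT EXISTS and LEFT JOIN forms can be compared side by side on a toy data set. This is a sketch using Python's sqlite3 module rather than SQL Server, with made-up rows. Because the question's T2 has no stated primary key, it tests T2.C IS NULL instead, which is safe here since a matched row always has a non-NULL C.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (A INTEGER, B INTEGER);
    CREATE TABLE T2 (C INTEGER, D INTEGER);
    INSERT INTO T1 VALUES (1, 1), (1, 2), (2, 1), (3, 3);
    INSERT INTO T2 VALUES (1, 1), (2, 1), (3, 9);
""")

# Correlated subquery: keep T1 rows with no (C, D) pair equal to (A, B).
not_exists = conn.execute("""
    SELECT A, B FROM T1
    WHERE NOT EXISTS (
        SELECT 1 FROM T2 WHERE T2.C = T1.A AND T2.D = T1.B
    )
    ORDER BY A, B
""").fetchall()

# LEFT JOIN on both columns, then keep only the rows that found no partner.
left_join = conn.execute("""
    SELECT T1.A, T1.B
    FROM T1
    LEFT OUTER JOIN T2 ON T1.A = T2.C AND T1.B = T2.D
    WHERE T2.C IS NULL
    ORDER BY T1.A, T1.B
""").fetchall()

print(not_exists)
print(left_join)
```

Note that (3, 3) is returned even though T2 contains a row with C = 3, because the pair as a whole has to match.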
I used this in MySQL, because MySQL has no "EXCLUDE" statement.
This code:
Concatenates fields C and D of table T2 into one new field to make it easier to compare the columns.
Concatenates fields A and B of table T1 into one new field to make it easier to compare the columns.
Selects all records where the value of the new field of T1 is not equal to the value of the new field of T2.
SQL-Statement:
SELECT T1.* FROM T1
WHERE CONCAT(T1.A,'Seperator', T1.B) NOT IN
(SELECT CONCAT(T2.C,'Seperator', T2.D) FROM T2)
Here is an example of the answer that worked for me:
SELECT Count(1)
FROM LCSource as s
JOIN FileTransaction as t
ON s.TrackingNumber = t.TrackingNumber
WHERE NOT EXISTS (
SELECT * FROM LCSourceFileTransaction
WHERE [LCSourceID] = s.[LCSourceID] AND [FileTransactionID] = t.[FileTransactionID]
)
You see both columns exist in LCSourceFileTransaction, but one occurs in LCSource and one occurs in FileTransaction and LCSourceFileTransaction is a mapping table. I want to find all records where the combination of the two columns is not in the mapping table. This works great. Hope this helps someone.

How to get rid of NOT EXISTS

I have a SQL query that is not very complex, but it is confusing enough that I can't tell whether my rewrite is truly equivalent or whether the counts are the same only by coincidence.
SQL1:
SELECT a, b
FROM table1
WHERE NOT EXISTS(
SELECT a, c
FROM TABLE2
WHERE table2.a != table1.a)
SQL2
SELECT table1.a, table1.b
FROM table1
LEFT JOIN table2 ON table2.a = table1.a
WHERE table2.a IS NULL
The counts from the two are identical, but I'm not sure whether this is by chance, and I want to make sure the conversion does not change the original functionality.
That doesn't look the same - but it's close. Your LEFT JOIN syntax is the same as:
SELECT a, b
FROM table1
WHERE NOT EXISTS(
SELECT a, c
FROM TABLE2
WHERE table2.a = table1.a)
Note the "=" instead of "!=" though. Are you sure that's not what you have?
Your actual query translates to something like "where no non-matching rows exist", which would be odd, but could be expressed by changing the JOIN condition:
SELECT a, b
FROM table1
LEFT JOIN table2 ON table2.a != table1.a
WHERE table2.a IS NULL
The first query, as you have it, returns all rows of TABLE1 where a matches all values of a in TABLE2. Therefore, it will return zero rows, unless there's a single not-null value for a in TABLE2, and that value exists in TABLE1. In that case, it will return as many rows as there are in TABLE1 with that value of a.
The second query is completely different. It will simply return all rows of TABLE1 where a does not exist in TABLE2.
So it's "matches all" (query 1) vs. "does not match any" (query 2). The fact that you are getting the same number of rows is pure coincidence.
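A quick way to convince yourself is to run both queries on data where the counts differ. The sketch below uses Python's sqlite3 module with made-up rows; the table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (a INTEGER, b TEXT);
    CREATE TABLE table2 (a INTEGER, c TEXT);
    INSERT INTO table1 VALUES (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO table2 VALUES (1, 'p'), (2, 'q');
""")

# Query 1 as posted: NOT EXISTS with !=
sql1 = conn.execute("""
    SELECT a, b FROM table1
    WHERE NOT EXISTS (SELECT a, c FROM table2 WHERE table2.a != table1.a)
""").fetchall()

# Query 2 as posted: LEFT JOIN ... IS NULL
sql2 = conn.execute("""
    SELECT table1.a, table1.b FROM table1
    LEFT JOIN table2 ON table2.a = table1.a
    WHERE table2.a IS NULL
""").fetchall()

# Query 1 with = instead of !=
sql1_fixed = conn.execute("""
    SELECT a, b FROM table1
    WHERE NOT EXISTS (SELECT a, c FROM table2 WHERE table2.a = table1.a)
""").fetchall()

print(sql1)
print(sql2)
print(sql1_fixed)
```

Here table2 holds two distinct values of a, so every table1 row has at least one non-matching table2 row and query 1 returns nothing, while query 2 and the corrected query 1 both return the row with a = 3.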
Your queries would be equivalent if you changed != for = in the first one, like this:
SELECT a, b
FROM table1
WHERE NOT EXISTS(
SELECT a, c
FROM TABLE2
WHERE table2.a = table1.a)
That gets you the values of a in table1 that don't exist in table2. This is EXACTLY the same as:
SELECT table1.a, b
FROM table1
LEFT JOIN table2 ON table2.a = table1.a
WHERE table2.a IS NULL
As you have it though, they are NOT equivalent. You must change != for = in the first one to make them so.
For the first query i.e.
SELECT a, b
FROM table1
WHERE NOT EXISTS(
SELECT a, c
FROM TABLE2
WHERE table2.a != table1.a)
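Leaving NULLs aside, this returns a row of table1 only if table2 contains no row whose a differs from that row's a. So it returns every row of table1 when table2 is the empty set, and when all rows in table2 share a single value of a it returns the table1 rows with that value. If table2 contains two or more distinct values of a, the result will be the empty set.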
The same cannot be said of your second query.
SELECT a, b, c , d
FROM table1 t1
WHERE NOT EXISTS( SELECT * FROM table2 nx
WHERE nx.y = t1.a
)
;
There is one big advantage of this ("correlated subquery") method: table table2 is not visible from the outside query, and cannot pollute it, or confuse your thinking. The subquery just produces one bit of information: either it exists, or does not exist. to be or not to be ....
In that respect, the LEFT JOIN idiom is nastier, since you'll have to check the xxx IS NULL condition in the outer query, while the xxx references the table2 from the inner query.
Technically, there is no difference.