Combine two tables on 4 similar columns and keep the unique columns - sql

I have two tables, one table has columns
"DATE","PRODUCT","SUBPRODUCT","ACTUALS","RANDOM1"
The 2nd table has
"DATE","PRODUCT","SUBPRODUCT","ACTUALS","RANDOM2"
DATE, PRODUCT, SUBPRODUCT and ACTUALS hold the same values in both tables, while RANDOM1 and RANDOM2 are unique to each table.
I want the final table to be
"DATE","PRODUCT","SUBPRODUCT","ACTUALS","RANDOM1","RANDOM2"

A simple way in standard SQL uses join with the using clause:
select *
from table1 t1 join
table2 t2
using (col1, col2, col3, col4) ;
SQL Server doesn't support the using clause, so alas, you need to list columns. A typical method is:
select t1.*, t2.random2
from table1 t1 join
table2 t2
on t1.date = t2.date and
t1.product = t2.product and
t1.subproduct = t2.subproduct and
t1.actuals = t2.actuals;
Note: This only returns rows that match between both tables. If you want all rows, you can use a full join and tweak the select.
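To make the difference between the inner join and the "all rows" variant concrete, here is a minimal sketch that runs the queries against SQLite through Python's sqlite3 module. The sample rows are invented for illustration, and the full join is emulated with LEFT JOIN plus UNION ALL (an assumption made so the sketch also runs on SQLite builds without FULL JOIN; SQL Server has FULL JOIN natively).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (date TEXT, product TEXT, subproduct TEXT, actuals INTEGER, random1 TEXT);
CREATE TABLE table2 (date TEXT, product TEXT, subproduct TEXT, actuals INTEGER, random2 TEXT);
-- Hypothetical sample data: only the first row of each table has a partner.
INSERT INTO table1 VALUES ('2020-01-01', 'A', 'a1', 10, 'r1-x'),
                          ('2020-01-02', 'B', 'b1', 20, 'r1-y');
INSERT INTO table2 VALUES ('2020-01-01', 'A', 'a1', 10, 'r2-x'),
                          ('2020-01-03', 'C', 'c1', 30, 'r2-z');
""")

# The four-column match condition shared by all queries below.
key_match = """t1.date = t2.date AND t1.product = t2.product
           AND t1.subproduct = t2.subproduct AND t1.actuals = t2.actuals"""

# Inner join: only rows present in both tables.
inner_rows = conn.execute(f"""
    SELECT t1.*, t2.random2
    FROM table1 t1 JOIN table2 t2 ON {key_match}
""").fetchall()

# Emulated full join: every table1 row, plus table2 rows without a partner.
full_rows = conn.execute(f"""
    SELECT t1.date, t1.product, t1.subproduct, t1.actuals, t1.random1, t2.random2
    FROM table1 t1 LEFT JOIN table2 t2 ON {key_match}
    UNION ALL
    SELECT t2.date, t2.product, t2.subproduct, t2.actuals, NULL, t2.random2
    FROM table2 t2
    WHERE NOT EXISTS (SELECT 1 FROM table1 t1 WHERE {key_match})
    ORDER BY 1
""").fetchall()

print(inner_rows)
print(full_rows)
```

With this data the inner join yields one combined row, while the emulated full join yields three rows, with NULL in whichever RANDOM column has no source.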

You can also join the tables with a WHERE clause (the older implicit-join syntax), like below:
select t1.DATE, t1.PRODUCT, t1.SUBPRODUCT, t1.ACTUALS, t1.RANDOM1, t2.RANDOM2
from table1 t1, table2 t2
where t1.DATE = t2.DATE and t1.PRODUCT = t2.PRODUCT and t1.SUBPRODUCT = t2.SUBPRODUCT and t1.ACTUALS = t2.ACTUALS

Related

postgres select values from a table with an extra dependency

I would like the following query to work in Postgres:
SELECT name
FROM Table1 as T1
WHERE T1.id = (
SELECT id
FROM Table2 AS T2
WHERE T2.active=true)
So, I need to get all the values from the first table, whose id matches the ones set as active in another table.
You don't need a subquery for this. Use a join; it is often more efficient. (If Table2 can hold several active rows for the same id, the join returns one row per match.)
SELECT T1.name
FROM Table1 as T1
INNER JOIN Table2 as T2 ON T2.id = T1.id AND T2.active=true
The equality operator requires the subquery to return a single record. You want IN instead, which accepts a result set:
SELECT T1.name
FROM Table1 as T1
WHERE T1.id IN (SELECT id FROM Table2 AS T2 WHERE T2.active)
This can also be expressed with EXISTS:
SELECT T1.name
FROM Table1 as T1
WHERE EXISTS (SELECT 1 FROM Table2 AS T2 WHERE T2.id = T1.id AND T2.active)
Note that in Postgres the condition T2.active = true can be shortened to T2.active.
For performance, you want an index on Table2(id, active) and another on Table1(id).
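As a quick check that the IN and EXISTS forms select the same rows, here is a small sketch using SQLite through Python's sqlite3 module as a stand-in for Postgres. The sample rows and the index name are made up, and the boolean column is stored as 0/1 because SQLite has no separate boolean type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Table2 (id INTEGER, active INTEGER);
-- Hypothetical data: ids 1 and 3 are active, id 2 is not.
INSERT INTO Table1 VALUES (1, 'alpha'), (2, 'beta'), (3, 'gamma');
INSERT INTO Table2 VALUES (1, 1), (2, 0), (3, 1);
-- Index suggested in the answer (the name is an assumption).
CREATE INDEX idx_table2_id_active ON Table2 (id, active);
""")

in_names = [row[0] for row in conn.execute("""
    SELECT T1.name
    FROM Table1 AS T1
    WHERE T1.id IN (SELECT id FROM Table2 AS T2 WHERE T2.active)
    ORDER BY T1.name
""")]

exists_names = [row[0] for row in conn.execute("""
    SELECT T1.name
    FROM Table1 AS T1
    WHERE EXISTS (SELECT 1 FROM Table2 AS T2 WHERE T2.id = T1.id AND T2.active)
    ORDER BY T1.name
""")]

print(in_names, exists_names)
```

Both lists contain the names whose id is marked active in Table2.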

SQL select rows based on 2 col criteria in a separate table

I wish to select some rows from a table based on values from another table:
Table1 (wish to select from here)
Columns Date, Name, Pay
Table2 (contains a 'list' that determines what is selected from Table1)
Columns Date, Name
The query I wish to write is to:
Select Date,Name,Pay from Table1 where Date,Name is present in Table2
I got as far as being able to do it on one value
SELECT Date,Name,Pay FROM Table1 WHERE Table1.Name IN (Select Table2.name from Table2)
but I'm stuck on how to add the date qualifier. The names in either table are not unique; what makes them unique is the date and name combination.
If I understood your question correctly, you want a join:
select t1.Date,t1.Name,t1.Pay FROM Table1 t1 inner join Table2 t2
ON t1.Name = t2.Name and t1.Date = t2.Date
The generic SQL solution uses exists:
Select Date, Name, Pay
from Table1 t1
where exists (select 1 from table2 t2 where t2.date = t1.date and t2.name = t1.name);
This will not match values in table 2 if they are NULL. For that, you would need a NULL-safe comparison operator. The ANSI standard one is IS NOT DISTINCT FROM.
Some databases support in with tuples. In those databases, you can write:
Select Date, Name, Pay
from Table1 t1
where (t1.date, t1.name) in (select t2.date, t2.name from table2 t2);
Once again, this might have an issue with NULL values, depending on how you want to treat them.
Interestingly, you could extend your logic by using a correlated subquery:
SELECT Date, Name, Pay
FROM Table1 t1
WHERE t1.Name IN (Select t2.name from Table2 t2 where t2.date = t1.date);
Although this does what you want, I think the previous two approaches are clearer in their intent.
I should note that you could use a join for this. However, that would return duplicate values if you had duplicates in table2. For that reason, I prefer the exists or in methods, because these have no risk of duplicating values.
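The duplication risk can be seen in a small sketch, run here on SQLite through Python's sqlite3 module. The rows are invented, and Table2 deliberately contains the same (Date, Name) pair twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (Date TEXT, Name TEXT, Pay INTEGER);
CREATE TABLE Table2 (Date TEXT, Name TEXT);
INSERT INTO Table1 VALUES ('2021-05-01', 'Ann', 100),
                          ('2021-05-02', 'Ann', 120),
                          ('2021-05-01', 'Bob', 90);
-- Hypothetical duplicate entry in the lookup table.
INSERT INTO Table2 VALUES ('2021-05-01', 'Ann'), ('2021-05-01', 'Ann');
""")

# A join emits one output row per matching Table2 row.
join_rows = conn.execute("""
    SELECT t1.Date, t1.Name, t1.Pay
    FROM Table1 t1
    JOIN Table2 t2 ON t2.Date = t1.Date AND t2.Name = t1.Name
""").fetchall()

# EXISTS only tests for a match, so each Table1 row appears at most once.
exists_rows = conn.execute("""
    SELECT Date, Name, Pay
    FROM Table1 t1
    WHERE EXISTS (SELECT 1 FROM Table2 t2
                  WHERE t2.Date = t1.Date AND t2.Name = t1.Name)
""").fetchall()

print(join_rows)
print(exists_rows)
```

Here the join returns Ann's 2021-05-01 row twice, while EXISTS returns it once.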
You can use aliases (and a join instead of a subquery) to make the relationship between the tables easier to see:
SELECT a.Date, a.Name, a.Pay
FROM Table1 a
inner join Table2 b on a.name = b.name and a.date = b.date
In this case the date is taken from Table1; change the alias, or add both columns, if you need more.

Adding rows of one table to rows of another table where two tables are matched by ID

In an Access 2013 database, I have a table t1 and another table t2. They both have the same number of columns, and the column names are also the same. Table t2 has a number of overlaps with the id variable of table t1. I am trying to make a new table t3 that contains all the rows of t1 plus only those rows of t2 whose id is not already present in t1. I used something like
Create Table t3 As Select * From (Select t1.* From t1 Inner Join t2 on t1.ID_Number = t2. ID_Number)
This throws a syntax error. However, even if it worked, it would select the rows whose ID_Number matches in both tables. I have tried various other queries and browsed through many other relevant Stack Overflow posts but could not resolve it.
Try this:
SELECT t1.*
INTO t3
FROM t1
INNER JOIN t2
ON t1.ID_Number = t2.ID_Number
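I am not sure about Access syntax, but can this 2-step solution work? First copy all of t1 into the new table t3, then append the t2 rows whose ID_Number does not appear in t1 (a second SELECT ... INTO would fail because t3 already exists):
select t1.* into t3 from t1
insert into t3 select t2.* from t2 where t2.ID_Number not in (select t1.ID_Number from t1)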
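The result the question asks for (all rows of t1 plus the t2 rows whose ID_Number is absent from t1) can be sketched as follows. This runs on SQLite through Python's sqlite3 module rather than Access, so CREATE TABLE ... AS SELECT stands in for SELECT ... INTO; the val column and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (ID_Number INTEGER, val TEXT);
CREATE TABLE t2 (ID_Number INTEGER, val TEXT);
-- Hypothetical data: ID_Number 2 exists in both tables.
INSERT INTO t1 VALUES (1, 't1-a'), (2, 't1-b');
INSERT INTO t2 VALUES (2, 't2-b'), (3, 't2-c');

-- Step 1: copy every row of t1 into the new table t3.
CREATE TABLE t3 AS SELECT * FROM t1;

-- Step 2: append only the t2 rows whose ID_Number is not in t1.
INSERT INTO t3
SELECT * FROM t2
WHERE ID_Number NOT IN (SELECT ID_Number FROM t1);
""")

t3_rows = conn.execute(
    "SELECT ID_Number, val FROM t3 ORDER BY ID_Number"
).fetchall()
print(t3_rows)
```

The overlapping ID_Number 2 is taken from t1 only, and ID_Number 3 comes from t2. Note that NOT IN returns no rows if the subquery yields a NULL, so this sketch assumes ID_Number is never NULL.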

query in sql server 2005

There are two tables. Now I need:
1st condition:
all the records in the table
2nd condition:
in table2, I need only the records which have data
I want one query for the above two conditions.
SELECT
*
FROM Table1 t1
INNER JOIN Table2 t2 on t1.PK = t2.FK
This will return all rows in table1 that have at least one corresponding row in table2
But if you want all rows from t1 no matter what then this may be what you want
SELECT
*
FROM Table1 t1
LEFT JOIN Table2 t2 on t1.PK = t2.FK
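A small sketch of how the INNER JOIN and LEFT JOIN results differ, run on SQLite through Python's sqlite3 module as a stand-in for SQL Server. The label and detail columns and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (PK INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE Table2 (FK INTEGER, detail TEXT);
-- Hypothetical data: PK 1 has two detail rows, PK 2 has none.
INSERT INTO Table1 VALUES (1, 'one'), (2, 'two');
INSERT INTO Table2 VALUES (1, 'd1'), (1, 'd2');
""")

# INNER JOIN: only Table1 rows with at least one Table2 match.
inner_rows = conn.execute("""
    SELECT t1.PK, t1.label, t2.detail
    FROM Table1 t1
    INNER JOIN Table2 t2 ON t1.PK = t2.FK
    ORDER BY t1.PK, t2.detail
""").fetchall()

# LEFT JOIN: every Table1 row; unmatched rows get NULL for Table2 columns.
left_rows = conn.execute("""
    SELECT t1.PK, t1.label, t2.detail
    FROM Table1 t1
    LEFT JOIN Table2 t2 ON t1.PK = t2.FK
    ORDER BY t1.PK, t2.detail
""").fetchall()

print(inner_rows)
print(left_rows)
```

The left join keeps PK 2 with a NULL detail, which the inner join drops.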
Finally, as I don't know the structure in place, perhaps table1 and table2 have similar structures. If so, you may want a union of the two:
SELECT
*
FROM Table1 t1
UNION ALL
SELECT
*
FROM Table2 t2

Best self join technique when checking for duplicates

I'm trying to optimize a production query that is taking a long time. The goal is to find duplicate records based on matching field values and then delete them. The current query uses a self join via inner join on t1.col1 = t2.col1, then a where clause to check the values.
select * from table t1
inner join table t2 on t1.col1 = t2.col1
where t1.col2 = t2.col2 ...
What would be a better way to do this? Or is it all the same based on indexes? Maybe
select * from table t1, table t2
where t1.col1 = t2.col1 and t1.col2 = t2.col2 ...
This table has 100m+ rows.
MS SQL, SQL Server 2008 Enterprise
select distinct t2.id
from table1 t1 with (nolock)
inner join table1 t2 with (nolock) on t1.ckid=t2.ckid
left join table2 t3 on t1.cid = t3.cid and t1.typeid = t3.typeid
where
t2.id > #Max_id and
t2.timestamp > t1.timestamp and
t2.rid = 2 and
isnull(t1.col1,'') = isnull(t2.col1,'') and
isnull(t1.cid,-1) = isnull(t2.cid,-1) and
isnull(t1.rid,-1) = isnull(t2.rid,-1) and
isnull(t1.typeid,-1) = isnull(t2.typeid,-1) and
isnull(t1.cktypeid,-1) = isnull(t2.cktypeid,-1) and
isnull(t1.oid,'') = isnull(t2.oid,'') and
isnull(t1.stypeid,-1) = isnull(t2.stypeid,-1)
and (
(
t3.uniqueoid = 1
)
or
(
t3.uniqueoid is null and
isnull(t1.col1,'') = isnull(t2.col1,'') and
isnull(t1.col2,'') = isnull(t2.col2,'') and
isnull(t1.rdid,-1) = isnull(t2.rdid,-1) and
isnull(t1.stid,-1) = isnull(t2.stid,-1) and
isnull(t1.huaid,-1) = isnull(t2.huaid,-1) and
isnull(t1.lpid,-1) = isnull(t2.lpid,-1) and
isnull(t1.col3,-1) = isnull(t2.col3,-1)
)
)
Why a self join? This is an aggregation question.
Hope you have an index on col1, col2, ...
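--DELETE table
--WHERE KeyCol NOT IN (
select
MIN(KeyCol) AS RowToKeep,
col1, col2
from
table
GROUP BY
col1, col2
HAVING
COUNT(*) > 1
--)
-- Note: as a DELETE filter, the subquery must return only MIN(KeyCol) and drop the HAVING clause;
-- otherwise rows that have no duplicates would be deleted too.
However, this will take some time. Have a look at bulk delete techniques.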
You can use ROW_NUMBER() to find duplicate rows in one table.
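One way the ROW_NUMBER() idea might look is sketched below, run on SQLite through Python's sqlite3 module as a stand-in for SQL Server (window functions need SQLite 3.25 or later). The table t, its id key, and the sample rows are assumptions; the rule "keep the lowest id in each group" is also just one possible choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT);
-- Hypothetical data: ('a','x') appears three times.
INSERT INTO t (col1, col2)
VALUES ('a', 'x'), ('a', 'x'), ('b', 'y'), ('a', 'x'), ('b', 'z');
""")

# Number the rows inside each (col1, col2) group; rn = 1 is the row to keep.
numbered = """
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY id) AS rn
    FROM t
"""

# Every row with rn > 1 is a duplicate of an earlier row in its group.
duplicate_ids = [row[0] for row in conn.execute(
    f"SELECT id FROM ({numbered}) WHERE rn > 1 ORDER BY id"
)]

# Delete the duplicates, keeping the first row of each group.
conn.execute(f"DELETE FROM t WHERE id IN (SELECT id FROM ({numbered}) WHERE rn > 1)")
remaining = conn.execute("SELECT id, col1, col2 FROM t ORDER BY id").fetchall()

print(duplicate_ids)
print(remaining)
```

No self join is involved: the table is scanned once to number the rows, and the outer filter picks out the extras.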
The two methods you give should be equivalent. I think most SQL engines would do exactly the same thing in both cases.
And, by the way, this won't work. You have to have at least one field that is different, or every record will match itself.
You might want to try something more like:
select col1, col2, col3
from table
group by col1, col2, col3
having count(*)>1
For a table with 100m+ rows, using GROUP BY together with holding tables tends to perform better, even though it translates into four queries.
STEP 1: create a holding key:
SELECT col1, col2, col3=count(*)
INTO holdkey
FROM t1
GROUP BY col1, col2
HAVING count(*) > 1
STEP 2: Push all the duplicate entries into the holddups. This is required for Step 4.
SELECT DISTINCT t1.*
INTO holddups
FROM t1, holdkey
WHERE t1.col1 = holdkey.col1
AND t1.col2 = holdkey.col2
STEP 3: Delete the duplicate rows from the original table.
DELETE t1
FROM t1, holdkey
WHERE t1.col1 = holdkey.col1
AND t1.col2 = holdkey.col2
STEP 4: Put the unique rows back in the original table. For example:
INSERT t1 SELECT * FROM holddups
To detect duplicates, you don't need to join:
SELECT col1, col2
FROM table
GROUP BY col1, col2
HAVING COUNT(*) > 1
That should be much faster.
In my experience, SQL Server performance is really bad with OR conditions. Probably it is not the self join but that with table3 that causes the bad performance. But without seeing the plan, I would not be sure.
In this case, it might help to split your query into two:
One with a WHERE condition t3.uniqueoid = 1 and one with a WHERE condition for the other conditions on table3, and then use UNION ALL to append one to the other.