Multiple columns in where clause SQL

I was wondering about the efficiency of a couple of different queries. The task is to join tables on several columns that must all be equal. I am curious about the best way to approach this from an efficiency standpoint.
I have already checked this out, but it doesn't say anything about WHERE clauses with multiple columns:
SQL WHERE.. IN clause multiple columns
and this one doesn't comment on the efficiency or best practices of its solution, and doesn't include a solution where the final query is a join of the two tables:
Two columns in subquery in where clause
select a.ID, a.col1, a.col2, a.col3
from table1 a
left join
(select ID, col1, col2, col3 from table2) b on a.col1 = b.col1
where a.col2 = b.col2
and a.col3 = b.col3
or
select a.ID, a.col1, a.col2, a.col3
from table1 a
left join
(select ID, col1, col2, col3
from table2) b on a.col1 = b.col1
and a.col2 = b.col2
and a.col3 = b.col3

You do not need to do a join on a sub-select. You were very close with the second sample query. Because the join uses 3 columns, I would make sure that the second table has a single composite index covering all 3 columns for optimal performance, e.g. an index on (col1, col2, col3) rather than 3 individual indexes, one per column.
Also, try not to use aliases like a, b, c unless they really correlate to the names of your tables, like "Accounts a", "Business b", "Customers c". Choose aliases whose abbreviation more closely matches the source table.
select
t1.ID,
t1.col1,
t1.col2,
t1.col3,
t2.WhatColumnFromSecondTable,
t2.AnotherColumnFromTable2,
t2.AnythingElse
from
table1 t1
left join table2 t2
on t1.col1 = t2.col1
and t1.col2 = t2.col2
and t1.col3 = t2.col3
Then, if you are only looking for specific things within, you would add a WHERE clause to further filter your data down.
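The difference between the two queries in the question is more than style. With a LEFT JOIN, conditions in WHERE are applied after the unmatched rows have been NULL-extended, so they throw those rows away and the query behaves like an INNER JOIN. A minimal sketch using Python's built-in sqlite3 module (SQLite standing in for whatever engine the question targets; the sample rows, the index name and the `extra` column are hypothetical) runs both versions side by side and creates the single composite index suggested above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (ID INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT, col3 TEXT);
CREATE TABLE table2 (ID INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT, col3 TEXT,
                     extra TEXT);
-- One composite index covering all three join columns (hypothetical name).
CREATE INDEX ix_table2_col1_col2_col3 ON table2 (col1, col2, col3);

INSERT INTO table1 VALUES (1, 'a', 'x', 'p'), (2, 'b', 'y', 'q');
-- Row 10 matches table1 row 1 on all columns; row 11 matches row 2 on col1 only.
INSERT INTO table2 VALUES (10, 'a', 'x', 'p', 'match'), (11, 'b', 'zz', 'q', 'partial');
""")

# All three conditions in ON: unmatched table1 rows are kept with NULLs.
on_rows = conn.execute("""
SELECT t1.ID, t2.extra
FROM table1 t1
LEFT JOIN table2 t2
  ON t1.col1 = t2.col1 AND t1.col2 = t2.col2 AND t1.col3 = t2.col3
ORDER BY t1.ID
""").fetchall()

# col2/col3 moved to WHERE: the NULL-extended row fails the filter and
# disappears, so the LEFT JOIN effectively becomes an INNER JOIN.
where_rows = conn.execute("""
SELECT t1.ID, t2.extra
FROM table1 t1
LEFT JOIN table2 t2 ON t1.col1 = t2.col1
WHERE t1.col2 = t2.col2 AND t1.col3 = t2.col3
ORDER BY t1.ID
""").fetchall()

print(on_rows)     # [(1, 'match'), (2, None)]
print(where_rows)  # [(1, 'match')]
```

If you really do want only matching rows, an INNER JOIN with all three conditions states that intent more clearly than a LEFT JOIN filtered in WHERE.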

Related

Get rows with cross value

I am trying to select all rows whose col1 value appears as col2 in another row, where that same row's col1 equals this row's col2. This implies that, for a given row, a "mirror" row exists.
For example, I would select both rows if:
col1 col2
A B
B A
But not select if the table contains only one of those 2 rows.
I tried the following query without success:
SELECT distinct t1.*
FROM table AS t1
INNER JOIN table t2 ON (t1.col1 = t2.col2 AND t2.col1 = t1.col2)
Any help would be very appreciated.
You can join the table to itself (using different aliases) to match "opposing" rows. For example:
select a.*
from t a
join t b on b.col2 = a.col1 and b.col1 = a.col2
Result:
col1 col2
----- ----
A B
B A
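A small runnable sketch of the same self join, using Python's sqlite3 module with hypothetical data: (A, B) and (B, A) mirror each other, while (C, D) has no mirror and is therefore excluded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (col1 TEXT, col2 TEXT);
-- (A, B) and (B, A) mirror each other; (C, D) has no mirror row.
INSERT INTO t VALUES ('A', 'B'), ('B', 'A'), ('C', 'D');
""")

# Each row a is kept only if some row b has the swapped pair of values.
rows = conn.execute("""
SELECT a.*
FROM t a
JOIN t b ON b.col2 = a.col1 AND b.col1 = a.col2
ORDER BY a.col1
""").fetchall()
print(rows)  # [('A', 'B'), ('B', 'A')]
```

One caveat: a row whose two values are equal, such as ('C', 'C'), joins to itself. If that should not count as a mirror, add a condition that excludes the same row, for example on a primary key.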
See running example at DB Fiddle.

How to use WITH CLAUSE ...INSERT query in SAP HANA?

Here I used a WITH ... AS clause. If I use it with a SELECT query it works fine, but if I use it with an INSERT query it gives a syntax error.
Can we use WITH ... INSERT in SAP HANA?
Code:
WITH t1 as
(
Select
col1,
col2,
col3
from table1),
t2 as
(
select
a.col4,
a.col5,
a.col1,
b.col3
from table2 a
left outer join t1 b
on a.col1 = b.col1)
insert into table3
select
c.col4,
c.col5,
c.col3
from t2 c;
In addition to Serban's correct answer, a general workaround for lack of CTE functionality is to create views instead.
In your case that could be:
create view t1 as
(select
col1,
col2,
col3
from
table1);
create view t2 as
(select
a.col4,
a.col5,
a.col1,
b.col3
from
table2 a
left outer join t1 b
on a.col1 = b.col1);
insert into table3
select
c.col4,
c.col5,
c.col3
from t2 c;
Based on my knowledge of HANA, CTEs (~ WITH-based queries) are currently not supported in INSERT statements. This means that you should use sub-queries directly instead, where possible.
IMO, the only kind of query that is impossible to write without CTEs is a recursive query (which HANA does not support at all). As your query is not recursive, you can rewrite and simplify it as follows:
INSERT INTO TABLE3
SELECT T2.COL4, T2.COL5, T1.COL3
FROM TABLE2 AS T2
LEFT OUTER JOIN TABLE1 AS T1
ON T2.COL1 = T1.COL1
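HANA is not available here, so as a hedged check the sketch below uses SQLite (via Python's sqlite3 module) to confirm that the CTE formulation from the question and a flattened join return the same rows. The sample data is hypothetical. The point it illustrates is that table2 must stay on the preserved (left) side of the LEFT OUTER JOIN, as in the original t2 CTE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (col1 INTEGER, col2 TEXT, col3 TEXT);
CREATE TABLE table2 (col1 INTEGER, col4 TEXT, col5 TEXT);
CREATE TABLE table3 (col4 TEXT, col5 TEXT, col3 TEXT);
INSERT INTO table1 VALUES (1, 'x', 'one'), (2, 'y', 'two');
-- The table2 row with col1 = 3 has no partner in table1.
INSERT INTO table2 VALUES (1, 'a', 'b'), (3, 'c', 'd');
""")

# CTE formulation from the question, run as a plain SELECT for comparison.
cte_rows = conn.execute("""
WITH t1 AS (SELECT col1, col2, col3 FROM table1),
     t2 AS (SELECT a.col4, a.col5, a.col1, b.col3
            FROM table2 a LEFT OUTER JOIN t1 b ON a.col1 = b.col1)
SELECT c.col4, c.col5, c.col3 FROM t2 c ORDER BY c.col4
""").fetchall()

# Same result without any CTE: table2 stays on the preserved (left) side.
conn.execute("""
INSERT INTO table3
SELECT T2.col4, T2.col5, T1.col3
FROM table2 AS T2
LEFT OUTER JOIN table1 AS T1 ON T2.col1 = T1.col1
""")
flat_rows = conn.execute("SELECT * FROM table3 ORDER BY col4").fetchall()
print(cte_rows == flat_rows, flat_rows)  # True [('a', 'b', 'one'), ('c', 'd', None)]
```

Swapping the two tables in the join would drop the ('c', 'd', None) row and add rows driven by table1 instead, so the direction matters.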

compare two table values

I have two tables, table A and table B. I need to check that every row in table B exactly matches a row in table A: if a row exists in table B, the same row must exist in table A too. Table A may also have rows that are not in table B. If a row exists in table B but not in table A, an alert should be displayed showing which element is extra in table B.
Can we do this using a join? If so, what would the SQL code be?
You probably want to have a look at the following article
SQL SERVER – Introduction to JOINs – Basic of JOINs
This should give you a very clear understanding of JOINs in SQL.
From there you should be able to find the solution.
As an example, given two tables with the same columns:
TABLE1 (Col1, Col2, Col3, Col4)
TABLE2 (Col1, Col2, Col3, Col4)
you would have to look at something like:
--all rows that match
SELECT *
FROM TABLE1 t1 INNER JOIN
TABLE2 t2 ON t1.Col1 = t2.Col1
AND t1.Col2 = t2.Col2
...
AND t1.Col3 = t2.Col3
--rows only in TABLE1
SELECT *
FROM TABLE1 t1 LEFT JOIN
TABLE2 t2 ON t1.Col1 = t2.Col1
AND t1.Col2 = t2.Col2
...
AND t1.Col3 = t2.Col3
WHERE t2.Col1 IS NULL
--rows only in TABLE2
SELECT *
FROM TABLE2 t2 LEFT JOIN
TABLE1 t1 ON t1.Col1 = t2.Col1
AND t1.Col2 = t2.Col2
...
AND t1.Col3 = t2.Col3
WHERE t1.Col1 IS NULL
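A runnable sketch of the two anti-joins (LEFT JOIN plus IS NULL), using Python's sqlite3 module with hypothetical two-column tables. The second query is the one the question needs: it lists rows that exist in TABLE2 but not in TABLE1, because TABLE2 is the preserved side of the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TABLE1 (Col1 INTEGER, Col2 TEXT);
CREATE TABLE TABLE2 (Col1 INTEGER, Col2 TEXT);
INSERT INTO TABLE1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
-- (4, 'd') is the "extra" row that exists only in TABLE2.
INSERT INTO TABLE2 VALUES (1, 'a'), (2, 'b'), (4, 'd');
""")

# TABLE1 is the preserved side: its rows without a partner get NULLs from TABLE2.
only_in_table1 = conn.execute("""
SELECT t1.Col1, t1.Col2
FROM TABLE1 t1
LEFT JOIN TABLE2 t2 ON t1.Col1 = t2.Col1 AND t1.Col2 = t2.Col2
WHERE t2.Col1 IS NULL
""").fetchall()

# Anti-join in the other direction: TABLE2 is now the preserved side.
only_in_table2 = conn.execute("""
SELECT t2.Col1, t2.Col2
FROM TABLE2 t2
LEFT JOIN TABLE1 t1 ON t1.Col1 = t2.Col1 AND t1.Col2 = t2.Col2
WHERE t1.Col1 IS NULL
""").fetchall()

print(only_in_table1)  # [(3, 'c')]
print(only_in_table2)  # [(4, 'd')]
```

This assumes the compared columns contain no NULLs: NULL = NULL is not true in a join condition, so two rows that are identical but contain a NULL would be reported as unmatched.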
If you want to compare based on a single column, you can do something like this:
SELECT B.ID FROM B LEFT JOIN A ON B.ID = A.ID WHERE A.ID IS NULL;
The above query gives you the records that are present in B but not in A.
Instead if you want to compare the entire row, you can use the following approach:
SELECT COUNT(*) FROM B;
SELECT COUNT(*) FROM A;
SELECT COUNT(*) FROM (
SELECT * FROM B UNION SELECT * FROM A
) AS u;
If all three queries return the same count, you can assume that both tables are equal, provided neither table contains duplicate rows (UNION removes duplicates, so they would go unnoticed).
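To see why the duplicate-row caveat matters, here is a hedged sketch of the three-count test in Python's sqlite3 module. The helper `looks_equal` and the tables A, B, C and D are hypothetical. A and B are genuinely equal; C and D differ, yet pass the test because C's duplicate row is collapsed by UNION.

```python
import sqlite3

def looks_equal(conn, a, b):
    """Hypothetical helper: the three-count test from the answer.

    Table names are interpolated directly, so only pass trusted constants.
    """
    count_a = conn.execute(f"SELECT COUNT(*) FROM {a}").fetchone()[0]
    count_b = conn.execute(f"SELECT COUNT(*) FROM {b}").fetchone()[0]
    count_u = conn.execute(
        f"SELECT COUNT(*) FROM (SELECT * FROM {b} UNION SELECT * FROM {a}) AS u"
    ).fetchone()[0]
    return count_a == count_b == count_u

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (id INTEGER, name TEXT);
CREATE TABLE B (id INTEGER, name TEXT);
CREATE TABLE C (id INTEGER, name TEXT);
CREATE TABLE D (id INTEGER, name TEXT);
INSERT INTO A VALUES (1, 'x'), (2, 'y');
INSERT INTO B VALUES (2, 'y'), (1, 'x');
-- C holds a duplicate row, D holds a row that C lacks.
INSERT INTO C VALUES (1, 'x'), (1, 'x');
INSERT INTO D VALUES (1, 'x'), (2, 'y');
""")

print(looks_equal(conn, "A", "B"))  # True: genuinely equal
print(looks_equal(conn, "C", "D"))  # True as well, although C and D differ
```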

Best self join technique when checking for duplicates

I'm trying to optimize a production query that takes a long time. The goal is to find duplicate records based on matching field values and then delete them. The current query uses a self join via an inner join on t1.col1 = t2.col1, then a WHERE clause to check the other values.
select * from table t1
inner join table t2 on t1.col1 = t2.col1
where t1.col2 = t2.col2 ...
What would be a better way to do this? Or is it all the same based on indexes? Maybe
select * from table t1, table t2
where t1.col1 = t2.col1 and t1.col2 = t2.col2 ...
This table has 100M+ rows.
MS SQL, SQL Server 2008 Enterprise
select distinct t2.id
from table1 t1 with (nolock)
inner join table1 t2 with (nolock) on t1.ckid=t2.ckid
left join table2 t3 on t1.cid = t3.cid and t1.typeid = t3.typeid
where
t2.id > #Max_id and
t2.timestamp > t1.timestamp and
t2.rid = 2 and
isnull(t1.col1,'') = isnull(t2.col1,'') and
isnull(t1.cid,-1) = isnull(t2.cid,-1) and
isnull(t1.rid,-1) = isnull(t2.rid,-1) and
isnull(t1.typeid,-1) = isnull(t2.typeid,-1) and
isnull(t1.cktypeid,-1) = isnull(t2.cktypeid,-1) and
isnull(t1.oid,'') = isnull(t2.oid,'') and
isnull(t1.stypeid,-1) = isnull(t2.stypeid,-1)
and (
(
t3.uniqueoid = 1
)
or
(
t3.uniqueoid is null and
isnull(t1.col1,'') = isnull(t2.col1,'') and
isnull(t1.col2,'') = isnull(t2.col2,'') and
isnull(t1.rdid,-1) = isnull(t2.rdid,-1) and
isnull(t1.stid,-1) = isnull(t2.stid,-1) and
isnull(t1.huaid,-1) = isnull(t2.huaid,-1) and
isnull(t1.lpid,-1) = isnull(t2.lpid,-1) and
isnull(t1.col3,-1) = isnull(t2.col3,-1)
)
)
Why a self join? This is an aggregation question.
I hope you have an index on (col1, col2, ...).
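--Inspect the duplicate groups and the key to keep in each:
select
MIN(KeyCol) AS RowToKeep,
col1, col2
from
table
GROUP BY
col1, col2
HAVING
COUNT(*) > 1
--To delete every row except the lowest KeyCol per group:
--DELETE table
--WHERE KeyCol NOT IN (
--select MIN(KeyCol) from table GROUP BY col1, col2
--)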
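A minimal sketch of the GROUP BY / MIN(KeyCol) approach, run in Python's sqlite3 module. The table name `items` is hypothetical (`table` itself is a reserved word). Note that the DELETE's subquery must return only MIN(KeyCol) and must not carry the HAVING filter; otherwise rows that were never duplicated are deleted too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (KeyCol INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT);
INSERT INTO items (col1, col2) VALUES
  ('a', 'x'), ('a', 'x'), ('b', 'y'), ('a', 'x'), ('b', 'z');
""")

# Which groups are duplicated, and which key survives in each?
dupes = conn.execute("""
SELECT MIN(KeyCol) AS RowToKeep, col1, col2, COUNT(*) AS n
FROM items
GROUP BY col1, col2
HAVING COUNT(*) > 1
""").fetchall()
print(dupes)  # [(1, 'a', 'x', 3)]

# Keep the lowest KeyCol of every group (no HAVING in the subquery).
conn.execute("""
DELETE FROM items
WHERE KeyCol NOT IN (SELECT MIN(KeyCol) FROM items GROUP BY col1, col2)
""")
remaining = conn.execute("SELECT * FROM items ORDER BY KeyCol").fetchall()
print(remaining)  # [(1, 'a', 'x'), (3, 'b', 'y'), (5, 'b', 'z')]
```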
However, this will take some time on a large table. Have a look at bulk delete techniques.
You can use ROW_NUMBER() to find duplicate rows in one table.
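A hedged sketch of the ROW_NUMBER() approach, again in Python's sqlite3 module (window functions need SQLite 3.25 or later; the table `items` and its data are hypothetical). Rows are numbered inside each (col1, col2) group, and any row numbered above 1 is a duplicate of an earlier one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (KeyCol INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT);
INSERT INTO items (col1, col2) VALUES
  ('a', 'x'), ('a', 'x'), ('b', 'y'), ('a', 'x');
""")

# Number the rows inside each (col1, col2) group; rn > 1 marks a duplicate.
dupes = conn.execute("""
SELECT KeyCol, col1, col2
FROM (
  SELECT KeyCol, col1, col2,
         ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY KeyCol) AS rn
  FROM items
) AS numbered
WHERE rn > 1
ORDER BY KeyCol
""").fetchall()
print(dupes)  # [(2, 'a', 'x'), (4, 'a', 'x')]
```

On SQL Server the same numbering is commonly wrapped in a CTE and deleted through it (`DELETE FROM cte WHERE rn > 1`), which avoids the self join entirely.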
The two methods you give should be equivalent. I think most SQL engines would do exactly the same thing in both cases.
And, by the way, this won't work as written. You need at least one field that differs between the two rows, or every record will match itself.
You might want to try something more like:
select col1, col2, col3
from table
group by col1, col2, col3
having count(*)>1
For a table with 100M+ rows, using GROUP BY with a holding table should perform better, even though it translates into four queries.
STEP 1: create a holding key:
SELECT col1, col2, col3=count(*)
INTO holdkey
FROM t1
GROUP BY col1, col2
HAVING count(*) > 1
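STEP 2: Copy the duplicated rows into holddups; SELECT DISTINCT keeps one copy of each identical row. This is required for Step 4. It only works if rows that share the same (col1, col2) are identical in every column; otherwise DISTINCT keeps several versions of a key and Step 4 reinserts all of them.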
SELECT DISTINCT t1.*
INTO holddups
FROM t1, holdkey
WHERE t1.col1 = holdkey.col1
AND t1.col2 = holdkey.col2
STEP 3: Delete the duplicate rows from the original table.
DELETE t1
FROM t1, holdkey
WHERE t1.col1 = holdkey.col1
AND t1.col2 = holdkey.col2
STEP 4: Put the unique rows back in the original table. For example:
INSERT t1 SELECT * FROM holddups
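The four steps can be sketched end to end in Python's sqlite3 module. SQLite has no SELECT ... INTO or DELETE ... FROM join, so this translation (an assumption, not the SQL Server syntax above) uses CREATE TABLE ... AS and a correlated EXISTS instead. The sample rows are hypothetical and, as Step 2 requires, duplicates are identical in every column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (col1 TEXT, col2 TEXT);
INSERT INTO t1 VALUES ('a', 'x'), ('a', 'x'), ('b', 'y'), ('c', 'z'), ('c', 'z');

-- STEP 1: keys that occur more than once.
CREATE TABLE holdkey AS
SELECT col1, col2, COUNT(*) AS col3
FROM t1 GROUP BY col1, col2 HAVING COUNT(*) > 1;

-- STEP 2: one copy of every duplicated row.
CREATE TABLE holddups AS
SELECT DISTINCT t1.*
FROM t1 JOIN holdkey ON t1.col1 = holdkey.col1 AND t1.col2 = holdkey.col2;

-- STEP 3: remove every row whose key is duplicated.
DELETE FROM t1
WHERE EXISTS (SELECT 1 FROM holdkey
              WHERE holdkey.col1 = t1.col1 AND holdkey.col2 = t1.col2);

-- STEP 4: put the single copies back.
INSERT INTO t1 SELECT * FROM holddups;
""")

result = conn.execute("SELECT * FROM t1 ORDER BY col1").fetchall()
print(result)  # [('a', 'x'), ('b', 'y'), ('c', 'z')]
```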
To detect duplicates, you don't need to join:
SELECT col1, col2
FROM table
GROUP BY col1, col2
HAVING COUNT(*) > 1
That should be much faster.
In my experience, SQL Server performs poorly with OR conditions. It is probably not the self join but the join with table3 (combined with the OR) that causes the bad performance, but without seeing the execution plan I can't be sure.
In this case, it might help to split your query into two:
One query with the WHERE condition t3.uniqueoid = 1, and one with the WHERE conditions of the other branch (t3.uniqueoid is null plus the remaining comparisons); then use UNION ALL to append one result to the other.
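A cut-down sketch of that rewrite in Python's sqlite3 module, using hypothetical data and only a few of the question's columns. It checks that the OR query and the UNION ALL of its two branches return the same ids. UNION ALL is safe here only because the branches are mutually exclusive (t3.uniqueoid cannot be 1 and NULL at once) and every t2 row reaches table3 through a single cid; otherwise use UNION. SQLite cannot show the performance difference, which depends on SQL Server's plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER, ckid INTEGER, cid INTEGER, col2 TEXT);
CREATE TABLE table3 (cid INTEGER, uniqueoid INTEGER);
INSERT INTO table1 VALUES (1, 7, 100, 'p'), (2, 7, 100, 'zz'),
                          (3, 8, 200, 'q'), (4, 8, 200, 'q'),
                          (5, 9, 200, 'r'), (6, 9, 200, 's');
INSERT INTO table3 VALUES (100, 1);  -- cid 200 has no table3 row
""")

# Shared part of both queries: a simplified version of the question's self join.
base = """
SELECT DISTINCT t2.id
FROM table1 t1
JOIN table1 t2 ON t1.ckid = t2.ckid AND t1.cid = t2.cid
LEFT JOIN table3 t3 ON t1.cid = t3.cid
WHERE t2.id > t1.id AND {extra}
"""

with_or = conn.execute(base.format(
    extra="(t3.uniqueoid = 1 OR (t3.uniqueoid IS NULL AND t1.col2 = t2.col2))"
)).fetchall()

# Each OR branch becomes its own query; UNION ALL appends the results.
split = conn.execute(
    base.format(extra="t3.uniqueoid = 1")
    + " UNION ALL "
    + base.format(extra="t3.uniqueoid IS NULL AND t1.col2 = t2.col2")
).fetchall()

print(sorted(with_or), sorted(split))  # [(2,), (4,)] [(2,), (4,)]
```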

Update multiple columns in SQL with bound multi-part identifier

I'm trying to update multiple columns in a MS SQL statement using a sub-query. A search led me to something like:
UPDATE table1
SET col1 = a.col1, col2 = a.col2, col3 = a.col3 FROM
(SELECT col1, col2, col3 from table2 where <expression>) AS a
WHERE table1.col1 <expression>
My problem is that in the inner WHERE expression I need a reference to a specific field in table1:
UPDATE table1
SET col1 = a.col1, col2 = a.col2, col3 = a.col3 FROM
(SELECT col1, col2, col3 from table2 where table1.col0 = table2.col0) AS a
WHERE table1.col1 <expression>
When I run that query I get "The multi-part identifier "table1.col0" could not be bound." Apparently, with that syntax SQL Server cannot bind the current table1 record inside the derived table. Right now I am repeating the subquery for each field, using the syntax:
UPDATE table1
SET col1 = (subquery), col2 = (subquery)...
But that executes the subquery (which is very expensive) once per column, which I would like to avoid.
Any ideas?
In SQL Server, you can use a FROM clause in an UPDATE query. Join the tables as you would in a SELECT. The table you are updating must be included in the joins.
update table_1
set field_1 = table_2.value_1
from table_1
inner join table_2
on (table_1.id = table_2.id)
Or if you dislike the join syntax this will also work:
UPDATE table1
SET col1 = a.col1, col2 = a.col2, col3 = a.col3
FROM table1, table2 as a
WHERE table1.col0 = a.col0
AND table1.col1 <expression>
You can use CROSS APPLY to update multiple columns from a sub-select:
UPDATE t1
SET t1.col1 = a.col1, t1.col2 = a.col2, t1.col3 = a.col3
FROM table1 t1
CROSS APPLY
(SELECT col1, col2, col3 from table2 where table2.col0 = t1.col0) a(col1,col2,col3)
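Outside SQL Server, some engines let a single correlated subquery assign several columns at once, which addresses the question's concern about running an expensive subquery once per column. The sketch below uses SQLite's row-value form `SET (a, b, c) = (SELECT ...)` via Python's sqlite3 module; this is SQLite syntax, not T-SQL, and the sample data is hypothetical. The WHERE EXISTS guard leaves rows without a partner untouched instead of setting them to NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (col0 INTEGER, col1 TEXT, col2 TEXT, col3 TEXT);
CREATE TABLE table2 (col0 INTEGER, col1 TEXT, col2 TEXT, col3 TEXT);
INSERT INTO table1 VALUES (1, 'old', 'old', 'old'), (2, 'old', 'old', 'old');
INSERT INTO table2 VALUES (1, 'new1', 'new2', 'new3');
""")

# One correlated subquery supplies all three columns for each table1 row.
conn.execute("""
UPDATE table1
SET (col1, col2, col3) = (
    SELECT t2.col1, t2.col2, t2.col3
    FROM table2 AS t2
    WHERE t2.col0 = table1.col0
)
WHERE EXISTS (SELECT 1 FROM table2 AS t2 WHERE t2.col0 = table1.col0)
""")

result = conn.execute("SELECT * FROM table1 ORDER BY col0").fetchall()
print(result)  # [(1, 'new1', 'new2', 'new3'), (2, 'old', 'old', 'old')]
```

On SQL Server itself, the UPDATE ... FROM join or the CROSS APPLY form above remains the way to do this.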