Optimal query writing - sql

I have 3 tables t1,t2,t3 each having 35K records.
select t1.col1,t2.col2,t3.col3
from table1 t1,table2 t2,table3 t3
where t1.col1 = t2.col1
and t1.col1 = 100
and t3.col3 = t2.col3
and t3.col4 = 101
and t1.col2 = 102;
It takes a long time to return the result (15 seconds). I have proper indexes.
What is the optimal way of rewriting it?

It's probably best to run your query with EXPLAIN EXTENDED placed in front of it. That will give you a good idea of which indexes it is or isn't using. Include the output in your question if you need help parsing the results.

If you have an index on t1.Col1 or t1.Col2, put that condition first in your WHERE clause. The STRAIGHT_JOIN keyword then tells MySQL to join the tables in exactly the order I've listed them here. Yes, this is the older ANSI join syntax, which is still completely valid (you used it originally too), and it should return a result quickly. The first two conditions of the WHERE clause immediately restrict the dataset, while the rest complete the joins to the other tables:
select STRAIGHT_JOIN
t1.Col1,
t2.Col2,
t3.Col3
from
table1 t1,
table2 t2,
table3 t3
where
t1.Col1 = 100
and t1.Col2 = 102
and t1.col1 = t2.col1
and t2.col3 = t3.col3
and t3.Col4 = 101

Related

Table values as parameters for SQL Server stored procedure

I have table1 :col1, col2, col3 and table2: col1, col2, col3
My goal is to get all records
where
t2.col1 like t1.col1 and
t2.col2 like t1.col2 and
t2.col3 like t1.col3
........................................
One variant is the inner join method
select * from t2 inner join t1 on
t2.col1 like t1.col1 and
t2.col2 like t1.col2 and
t2.col3 like t1.col3
........................................
Another variant is a stored procedure based on the 'where' clause:
select * from t2
where t2.col1 like parameter1 and
t2.col2 like parameter2 and
t2.col3 like parameter3
Then I call the procedure from VBA and use a For...Next loop to go through all the values/parameters from an Excel copy of table1.
........................................
The join method is slower (by about 20-30%) than the VBA + stored procedure method, but unfortunately, for a big set of parameters, Excel freezes.
........................................
Is it possible to apply the loop method inside SQL Server, in a SQL script, going through the table1 values as parameters for the stored procedure, with no VBA, C++, Perl, etc.?
I am a user with no access to db/tables design.
Thank you
First of all, your two queries in the question are not equivalent:
select * from t2 inner join t1 on
t1.col1 like t2.col1 and
t1.col2 like t2.col2 and
t1.col3 like t2.col3
Here you have t1 like t2
select * from t2
where t2.col1 like parameter1 and
t2.col2 like parameter2 and
t2.col3 like parameter3
Here it is the other way around: t2 like t1.
The end result would be different.
Based on the sample data it looks like it should be t2 like t1.
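To see why the direction of LIKE matters, here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server. The sample values are hypothetical (the question's data is not shown here), and it assumes that t1 holds the wildcard patterns and t2 holds the stored values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def like(value, pattern):
    """Evaluate `value LIKE pattern` in SQLite and return a bool."""
    (result,) = conn.execute("SELECT ? LIKE ?", (value, pattern)).fetchone()
    return bool(result)

# Hypothetical sample values: t1 holds a pattern, t2 holds a stored value.
t1_col1 = "ab%"
t2_col1 = "abc"

value_like_pattern = like(t2_col1, t1_col1)  # t2.col1 LIKE t1.col1
pattern_like_value = like(t1_col1, t2_col1)  # t1.col1 LIKE t2.col1

print(value_like_pattern, pattern_like_value)  # True False
```

Only the right-hand operand is interpreted as a pattern, so swapping the operands changes which rows match.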
You can try to re-write the query using CROSS APPLY instead of JOIN, but it is unlikely to make any difference in performance.
SELECT *
FROM
t1
CROSS APPLY
(
SELECT t2.*
FROM t2
WHERE
t2.col1 like t1.col1
and t2.col2 like t1.col2
and t2.col3 like t1.col3
) AS A
;
This query structure mimics your stored procedure approach where for each row from t1 you select a set of rows from t2.

Convert OUTER APPLY to LEFT JOIN

We have a query which is slow in production (for some internal reason):
SELECT T2.Date
FROM Table1 T1
OUTER APPLY
(
SELECT TOP 1 T2.[DATE]
FROM Table2 T2
WHERE T1.Col1 = T2.Col1
AND T1.Col2 = T2.Col2
ORDER BY T2.[Date] DESC
) T2
But when I convert it to a LEFT JOIN it becomes fast:
SELECT Max(T2.[Date])
FROM Table1 T1
LEFT JOIN Table2 T2
ON T1.Col1 = T2.Col1
AND T1.Col2 = T2.Col2
GROUP BY T1.Col1, T1.Col2
Can we say that both queries are equivalent? If not, how do I convert it properly?
The queries are not exactly the same. It is important to understand the differences.
If t1.col1/t1.col2 are duplicated, then the first query returns a separate row for each duplication. The second combines them into a single row.
If either t1.col1 or t1.col2 is NULL, the equality conditions match nothing, so both queries return NULL for the date. The first still returns one row per such row of t1, while the second groups rows with the same keys (NULLs included) into a single row.
That said, the two queries should have similar performance, particularly if you have an index on table2(col1, col2, date). I should note that under some circumstances the apply method is faster than joins, so relative performance depends on circumstances.
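The duplicate-row difference can be reproduced with a small experiment. This is only a sketch in SQLite (via Python's sqlite3), not SQL Server: SQLite has no OUTER APPLY, so a correlated subquery with LIMIT 1 stands in for it. The sample rows, the `dt` column name, and the `id` key on table1 are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: `id` is an assumed unique key on table1.
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, col1 INTEGER, col2 TEXT);
    CREATE TABLE table2 (col1 INTEGER, col2 TEXT, dt TEXT);
    INSERT INTO table1 (col1, col2) VALUES (1, 'a'), (1, 'a'), (2, 'b');
    INSERT INTO table2 VALUES (1, 'a', '2020-01-01'), (1, 'a', '2020-02-01');
""")

# Stand-in for OUTER APPLY (SELECT TOP 1 ... ORDER BY date DESC):
# one output row per row of table1, NULL when nothing matches.
apply_like = [r[0] for r in conn.execute("""
    SELECT (SELECT t2.dt FROM table2 t2
            WHERE t2.col1 = t1.col1 AND t2.col2 = t1.col2
            ORDER BY t2.dt DESC LIMIT 1)
    FROM table1 t1
    ORDER BY t1.id
""")]

# LEFT JOIN grouped by the join columns: duplicate keys collapse.
grouped_by_cols = [r[0] for r in conn.execute("""
    SELECT MAX(t2.dt)
    FROM table1 t1
    LEFT JOIN table2 t2 ON t1.col1 = t2.col1 AND t1.col2 = t2.col2
    GROUP BY t1.col1, t1.col2
    ORDER BY t1.col1
""")]

# LEFT JOIN grouped by the unique key: one row per table1 row again.
grouped_by_key = [r[0] for r in conn.execute("""
    SELECT MAX(t2.dt)
    FROM table1 t1
    LEFT JOIN table2 t2 ON t1.col1 = t2.col1 AND t1.col2 = t2.col2
    GROUP BY t1.id
    ORDER BY t1.id
""")]

print(apply_like)       # ['2020-02-01', '2020-02-01', None]
print(grouped_by_cols)  # ['2020-02-01', None]
print(grouped_by_key)   # ['2020-02-01', '2020-02-01', None]
```

If table1 has a unique key, grouping by that key instead of by (col1, col2) keeps one output row per table1 row, which is what the apply form returns.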

How to implement MINUS operator in Google Big Query

I am trying to implement a MINUS operation in Google Big Query, but it looks like there is no documentation for it in the Query Reference. Can somebody share their thoughts on this? I have done it in regular SQL in the past, but I am not sure whether Google offers it in Big Query. Your input is appreciated. Thank you.
Just adding an update here since this post still comes up in Google Search. BigQuery now supports the EXCEPT set operator.
https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#except
select * from t1
EXCEPT DISTINCT
select * from t2;
If BigQuery does not offer minus or except, you can do the same thing with not exists:
select t1.*
from table1 t1
where not exists (select 1
from table2 t2
where t2.col1 = t1.col1 and t2.col2 = t1.col2 . . .
);
This works correctly for non-NULL values. For NULL values, you need a bit more effort. And, this can also be written as a left join:
select t1.*
from table1 t1 left join
table2 t2
on t2.col1 = t1.col1 and t2.col2 = t1.col2
where t2.col1 is null;
One of these should be acceptable to BigQuery.
What I usually do is similar to Linoff's answer and always works, regardless of NULL fields:
SELECT t1.*
FROM table1 t1 LEFT JOIN
(SELECT 1 AS aux, * FROM table2) t2
ON t2.col1 = t1.col1 and t2.col2 = t1.col2
WHERE t2.aux IS NULL;
This solves the problems with nullable fields.
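The NULL caveat on the compared columns is easy to check on a toy dataset. The sketch below uses SQLite through Python's sqlite3, not BigQuery, and the rows are hypothetical. SQLite's `IS` operator is used as a null-safe equality; other engines spell that comparison differently, so treat it as an assumption of this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (col1 INTEGER, col2 TEXT);
    CREATE TABLE table2 (col1 INTEGER, col2 TEXT);
    INSERT INTO table1 VALUES (1, 'x'), (2, NULL), (3, 'z');
    INSERT INTO table2 VALUES (2, NULL), (3, 'z');
""")

# Set difference: NULLs are treated as equal to each other.
except_rows = conn.execute("""
    SELECT * FROM table1
    EXCEPT
    SELECT * FROM table2
    ORDER BY col1
""").fetchall()

# NOT EXISTS with plain '=': NULL = NULL is not true, so (2, NULL) survives.
not_exists_rows = conn.execute("""
    SELECT t1.* FROM table1 t1
    WHERE NOT EXISTS (SELECT 1 FROM table2 t2
                      WHERE t2.col1 = t1.col1 AND t2.col2 = t1.col2)
    ORDER BY t1.col1
""").fetchall()

# NOT EXISTS with a null-safe comparison (SQLite's IS) matches EXCEPT here.
null_safe_rows = conn.execute("""
    SELECT t1.* FROM table1 t1
    WHERE NOT EXISTS (SELECT 1 FROM table2 t2
                      WHERE t2.col1 IS t1.col1 AND t2.col2 IS t1.col2)
    ORDER BY t1.col1
""").fetchall()

print(except_rows)      # [(1, 'x')]
print(not_exists_rows)  # [(1, 'x'), (2, None)]
print(null_safe_rows)   # [(1, 'x')]
```

Note that EXCEPT also removes duplicate rows, which an anti-join does not, so the two approaches can still differ when table1 contains duplicates.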
Notice: even though this is an old thread, I'm commenting just for sake of completeness if somebody gets to this page in the future.

Explanation about nested loop

Nested Loop Join
In this kind of join operation, the engine processes each row from the outer input and loops through all rows of the inner input to search for matching rows based on the join column.
Nested loops joins perform a search on the inner table for each row of the outer table, typically using an index.
example:
Select T1.Col2
From Table1 T1
Inner Join Table2 T2 ON T1.Col1 = T2.Col1 AND T1.Col1 between 1 AND 36
Can you please explain which is the outer input and which is the inner input? Here we have two conditions, T1.Col1 = T2.Col1 and T1.Col1 between 1 AND 36. Which condition is the table filtered by first?
I would rather write the query in this way:
SELECT T1.Col2
FROM Table1 T1
INNER JOIN Table2 T2 ON T1.Col1 = T2.Col1
WHERE T1.Col1 BETWEEN 1 AND 36
The second condition is not a join condition, but a where condition (Table2 is not involved in solving that condition).
The optimizer of your database should be able to decide whether filtering Table1 first is faster than joining Table2 and then filtering. I imagine that the latter can be true if Table2 is quite small. Indexes can also change the query plan.
Anyway if you want to be sure about how your database is executing your query just check the query plan.
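As a rough mental model of the outer and inner inputs, here is a sketch of a naive nested loop join in Python. It assumes Table1 is chosen as the outer input and that the range filter is applied before probing Table2. A real optimizer makes both choices itself and would normally use an index for the inner lookup, so the function and sample rows are purely illustrative.

```python
def nested_loop_join(outer_rows, inner_rows, outer_filter, join_match):
    """Naive nested loop join: probe the inner input once per outer row."""
    result = []
    for outer in outer_rows:             # outer input: scanned once
        if not outer_filter(outer):      # non-join predicate, checked first
            continue
        for inner in inner_rows:         # inner input: scanned per outer row
            if join_match(outer, inner):
                result.append((outer, inner))
    return result

# Hypothetical sample rows.
table1 = [
    {"Col1": 1, "Col2": "a"},
    {"Col1": 36, "Col2": "b"},
    {"Col1": 40, "Col2": "c"},
]
table2 = [{"Col1": 1}, {"Col1": 36}, {"Col1": 40}]

joined = nested_loop_join(
    table1,
    table2,
    outer_filter=lambda t1: 1 <= t1["Col1"] <= 36,       # BETWEEN 1 AND 36
    join_match=lambda t1, t2: t1["Col1"] == t2["Col1"],  # T1.Col1 = T2.Col1
)
col2_values = [t1["Col2"] for t1, _ in joined]
print(col2_values)  # ['a', 'b']
```

Filtering the outer rows first means the inner loop runs only for rows that can contribute to the result, which is why the placement of the range condition usually does not matter to a decent optimizer.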
SELECT T1.Col2
FROM Table1 T1
INNER JOIN Table2 T2 ON T1.Col1 = T2.Col1
WHERE T1.Col1 >= 1 AND T1.Col1 <= 36
You'll find a better explanation of joins at the following link:
http://www.codinghorror.com/blog/2007/10/a-visual-explanation-of-sql-joins.html

SQL programming

How can I update a table (MyTable) from a particular database (DB1) so that it equals MyTable from a different database (DB2), given that both tables have the same name? I tried using aliases in my query, but it doesn't work. Please help.
A SQL Server answer is
UPDATE t2
SET t2.Col1 = t1.Col1, t2.Col2 = t1.Col2
FROM DB1.dbo.MyTable AS t1 INNER JOIN
DB2.dbo.MyTable AS t2 ON t1.PK = t2.PK
(Obviously alter the joining condition and update column list as required)