SQL SELECT compare values from two tables (without UNION ALL) - sql

I have table T1:
ID IMPACT
1 3
I have table T2
PRIORITY URGENCY
1 2
I need to do the SELECT from T1 table.
I would like to get all the rows from T1 where IMPACT is greater than PRIORITY from T2.
I am working in some IBM application where it is only possible to start with SQL statement after the WHERE clause from the first table T1.
So query (unfortunately) must always start with "SELECT * FROM T1 WHERE..."
This cannot be changed (please have that in mind).
This means that I cannot use some JOIN or UNION ALL statement after the "FROM T1" part because I can start to write SQL query only after the WHERE clause.
SELECT * FROM T1
WHERE
IMPACT> SELECT PRIORITY FROM T2 WHERE URGENCY=2
But I am getting an error for this statement.
Is it possible to write a SQL query starting with:
SELECT * FROM T1
WHERE

You want a subquery, so all you need are parentheses:
SELECT *
FROM T1
WHERE IMPACT > (SELECT T2.PRIORITY FROM T2 WHERE T2.URGENCY = 2)
This assumes that the subquery returns one row (or zero rows, in which case nothing is returned). If the subquery can return more than one row, you should ask another question and be very explicit about what you want done.
One reasonable interpretation (for more than one row) is:
SELECT *
FROM T1
WHERE IMPACT > (SELECT MAX(T2.PRIORITY) FROM T2 WHERE T2.URGENCY = 2)
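A minimal sketch of the parenthesized subquery, run through Python's built-in sqlite3 module as a stand-in for the IBM application's database. The second row in each table is hypothetical, added only so the filter has something to exclude. SQLite silently uses the first row of a multi-row scalar subquery instead of raising an error, so the sketch sticks to the MAX form:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (ID INTEGER, IMPACT INTEGER);
    CREATE TABLE T2 (PRIORITY INTEGER, URGENCY INTEGER);
    INSERT INTO T1 VALUES (1, 3), (2, 1);  -- row (2, 1) is hypothetical
    INSERT INTO T2 VALUES (1, 2), (2, 2);  -- row (2, 2) is hypothetical
""")

# The subquery sits in parentheses; MAX collapses several T2 rows into one value.
rows_max = conn.execute("""
    SELECT *
    FROM T1
    WHERE IMPACT > (SELECT MAX(T2.PRIORITY) FROM T2 WHERE T2.URGENCY = 2)
""").fetchall()

print(rows_max)
```

Only the T1 row whose IMPACT exceeds the highest matching PRIORITY survives the filter.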

I would use exists:
select t1.*
from t1
where exists (select 1 from t2 where t1.IMPACT > t2.PRIORITY);
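Note that the EXISTS form keeps a T1 row if IMPACT is greater than at least one PRIORITY, which is not the same as the MAX interpretation. A small sketch of the difference, using sqlite3 as a stand-in database and hypothetical sample rows (neither query filters on URGENCY here, to keep the comparison direct):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (ID INTEGER, IMPACT INTEGER);
    CREATE TABLE T2 (PRIORITY INTEGER, URGENCY INTEGER);
    INSERT INTO T1 VALUES (1, 3), (2, 2), (3, 1);  -- hypothetical rows
    INSERT INTO T2 VALUES (1, 2), (2, 2);          -- hypothetical rows
""")

# EXISTS: IMPACT must beat at least one PRIORITY (i.e. the smallest one).
exists_ids = [row[0] for row in conn.execute("""
    SELECT T1.ID
    FROM T1
    WHERE EXISTS (SELECT 1 FROM T2 WHERE T1.IMPACT > T2.PRIORITY)
    ORDER BY T1.ID
""")]

# MAX: IMPACT must beat every PRIORITY.
max_ids = [row[0] for row in conn.execute("""
    SELECT T1.ID
    FROM T1
    WHERE T1.IMPACT > (SELECT MAX(T2.PRIORITY) FROM T2)
    ORDER BY T1.ID
""")]

print(exists_ids, max_ids)
```

Which one is right depends on whether "greater than PRIORITY" means greater than any row or greater than all rows in T2.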

Related

works fine in one case / (column ambiguously defined)error in another

I have 2 tables with a column named the same. Column is BAN_KEY
when I run this query
with
t1 as
(
select *
from table1
),
t2 as
(
select *
from table2
),
t3 as
(
select *
from t1, t2
where t1.c1 = t2.c2
)
select * from t3
I get error column ambiguously defined, but when I do it this way
with
t1 as
(
select *
from table1
),
t2 as
(
select *
from table2
)
select *
from t1, t2
where t1.c1 = t2.c2
The result looks like this
BAN_KEY | BAN_KEY_1 | other columns
some values...
What's the reason for this?
First, learn to use proper JOIN syntax. Simple rule: Never use commas in the FROM clause. Always use proper, explicit JOINs.
That has nothing to do with your question. The answer is much simpler. For a CTE (or table), Oracle needs to be able to assign column names to the result so they can be accessed subsequently. It accepts the column names that you provide, assuming that your intention is correct. Duplicate column names are not allowed because the reference would be ambiguous; hence the error.
Why doesn't this happen for a result set? Oracle does not require that the columns in the result set of a query be unique. For convenience, though, it distinguishes between columns with the same name.
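One way around the error is to list the columns explicitly inside the CTE and give the two BAN_KEY columns distinct aliases. The sketch below assumes hypothetical table contents and alias names (ban_key_1, ban_key_2), and it runs on SQLite through Python's sqlite3 module. SQLite is only a stand-in here: it does not reproduce the Oracle error, it just shows the shape of the fix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (BAN_KEY INTEGER, c1 INTEGER);
    CREATE TABLE table2 (BAN_KEY INTEGER, c2 INTEGER);
    INSERT INTO table1 VALUES (10, 1);  -- hypothetical row
    INSERT INTO table2 VALUES (20, 1);  -- hypothetical row
""")

# Every column of t3 gets a unique name, so later references are unambiguous.
cur = conn.execute("""
    WITH t3 AS (
        SELECT t1.BAN_KEY AS ban_key_1,
               t2.BAN_KEY AS ban_key_2,
               t1.c1,
               t2.c2
        FROM table1 t1
        JOIN table2 t2 ON t1.c1 = t2.c2
    )
    SELECT ban_key_1, ban_key_2 FROM t3
""")
column_names = [d[0] for d in cur.description]
rows = cur.fetchall()

print(column_names, rows)
```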

Basic difference between two tables

I am attempting a very basic difference function in postgresql. Table 1 and Table 2 have identical columns. Only difference is Table 1 has some surplus rows. I would like to select for surplus rows only:
SELECT *
FROM table1
WHERE NOT EXISTS (SELECT * from table2);
The query above returns nothing when I know there are surplus rows.
I think you are looking for except:
select t1.*
from table1 t1
except
select t2.*
from table2 t2;
Note that the two tables must have the same number of columns, and the columns must all be of the same type. You can review the documentation here.
If you wish to use NOT EXISTS you're missing the joining of your table's keys in the inner where clause. Try:
SELECT *
FROM table1 t1
WHERE NOT EXISTS (SELECT * from table2 t2 WHERE t2.id = t1.id);
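To see why the original query returns nothing while the other two work, here is a small sketch with hypothetical rows. It uses Python's sqlite3 module rather than PostgreSQL, but EXCEPT and NOT EXISTS behave the same way for this case:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT);
    CREATE TABLE table2 (id INTEGER, name TEXT);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');  -- row 3 is the surplus
    INSERT INTO table2 VALUES (1, 'a'), (2, 'b');
""")

# Uncorrelated: the subquery returns rows for every outer row, so NOT EXISTS is always false.
uncorrelated = conn.execute(
    "SELECT * FROM table1 WHERE NOT EXISTS (SELECT * FROM table2)"
).fetchall()

# EXCEPT compares whole rows.
except_rows = conn.execute(
    "SELECT t1.* FROM table1 t1 EXCEPT SELECT t2.* FROM table2 t2"
).fetchall()

# Correlated NOT EXISTS compares on the key only.
not_exists_rows = conn.execute("""
    SELECT *
    FROM table1 t1
    WHERE NOT EXISTS (SELECT * FROM table2 t2 WHERE t2.id = t1.id)
""").fetchall()

print(uncorrelated, except_rows, not_exists_rows)
```

The two working queries differ when a row has the same id but different values in other columns: EXCEPT reports it as surplus, the keyed NOT EXISTS does not.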

Should I avoid IN() because slower than EXISTS() [duplicate]

Possible Duplicate:
SQL Server IN vs. EXISTS Performance
Should I avoid IN() because slower than EXISTS()?
SELECT * FROM TABLE1 t1 WHERE EXISTS (SELECT 1 FROM TABLE2 t2 WHERE t1.ID = t2.ID)
VS
SELECT * FROM TABLE1 t1 WHERE t1.ID IN(SELECT t2.ID FROM TABLE2 t2)
From my investigation with SHOWPLAN_ALL set, I get the same execution plan and estimated cost for both queries. The primary key index is used, with a seek in both queries. No difference.
In what other scenarios would the two queries differ significantly? Is the optimizer smart enough that I get the same execution plan either way?
Do neither. Do this:
SELECT DISTINCT T1.*
FROM TABLE1 t1
JOIN TABLE2 t2 ON t1.ID = t2.ID;
This will outperform anything else by orders of magnitude.
Both queries will produce the same execution plan (assuming no indexes were created): two table scans and one nested loop (join).
The join suggested by Bohemian will do a Hash Match instead of the loop, which I've always heard (and here is a proof: Link) is the worst kind of join.
Between IN and EXISTS (your actual question), EXISTS gives better performance (take a look at: Link).
If your table T2 has a lot of records, EXISTS is the better approach hands down, because as soon as the database finds a record that matches your requirement, the condition evaluates to true and the scan of T2 stops. With the IN clause, however, you're scanning Table2 for every row in Table1.
IN is better than EXISTS when you have a literal list of values, or when the subquery returns only a few values.
I've expanded my answer a little, based on an Ask Tom answer:
In a Select with in, for example:
Select * from T1 where x in ( select y from T2 )
is usually processed as:
select *
from t1, ( select distinct y from t2 ) t2
where t1.x = t2.y;
The subquery is evaluated, distinct'ed, indexed (or hashed or sorted) and then joined to the original table (typically).
In an exist query like:
select * from t1 where exists ( select null from t2 where y = x )
That is processed more like:
for x in ( select * from t1 )
loop
if ( exists ( select null from t2 where y = x.x ) )
then
OUTPUT THE RECORD
end if
end loop
It always results in a full scan of T1 whereas the first query can make use of an index on T1(x).
When is WHERE EXISTS appropriate, and when is IN appropriate?
Use EXISTS when... Subquery T2 is huge and takes a long time, T1 is relatively small, and executing (select null from t2 where y = x.x ) is very very fast.
Use IN when... The result of the subquery is small -- then IN is typically more appropriate.
If both the subquery and the outer table are huge -- either might work as well as the other -- depends on the indexes and other factors.
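Whatever the plan, the result sets are worth checking too. The sketch below, on hypothetical data and using sqlite3 as a stand-in for SQL Server, shows that IN and EXISTS return the same rows, and that the JOIN needs its DISTINCT as soon as TABLE2 contains a repeated ID. It says nothing about which form is faster; that depends on the engine and the indexes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE1 (ID INTEGER PRIMARY KEY);
    CREATE TABLE TABLE2 (ID INTEGER);
    INSERT INTO TABLE1 VALUES (1), (2), (3);
    INSERT INTO TABLE2 VALUES (1), (1), (2);  -- ID 1 appears twice on purpose
""")

def ids(sql):
    """Run a query and return the first column as a list."""
    return [row[0] for row in conn.execute(sql)]

exists_ids = ids("""SELECT t1.ID FROM TABLE1 t1
                    WHERE EXISTS (SELECT 1 FROM TABLE2 t2 WHERE t1.ID = t2.ID)
                    ORDER BY t1.ID""")
in_ids = ids("""SELECT t1.ID FROM TABLE1 t1
                WHERE t1.ID IN (SELECT t2.ID FROM TABLE2 t2)
                ORDER BY t1.ID""")
join_ids = ids("""SELECT t1.ID FROM TABLE1 t1
                  JOIN TABLE2 t2 ON t1.ID = t2.ID
                  ORDER BY t1.ID""")
distinct_join_ids = ids("""SELECT DISTINCT t1.ID FROM TABLE1 t1
                           JOIN TABLE2 t2 ON t1.ID = t2.ID
                           ORDER BY t1.ID""")

print(exists_ids, in_ids, join_ids, distinct_join_ids)
```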

SQLite table aliases affecting the performance of queries

How does SQLite treat a table alias internally?
Does creating a table name alias internally create a copy of the table, or does it just refer to the same table without creating a copy?
When I create multiple aliases of the same table in my code, the performance of the query takes a severe hit!
In my case, I have one table, call it MainTable, with 2 columns, namely name and value.
I want to select multiple values in one row as different columns. For example:
Name: a,b,c,d,e,f
Value: p,q,r,s,t,u
such that a corresponds to p and so on.
I want to select values for names a,b,c and d in one row => p,q,r,s
So I write a query
SELECT t1.name, t2.name, t3.name, t4.name
FROM MainTable t1, MainTable t2, MainTable t3, MainTable t4
WHERE t1.name = 'a' and t2.name = 'b' and t3.name = 'c' and t4.name = 'd';
Writing the query this way kills performance as the table grows, as Larry rightly pointed out.
Is there an efficient way to retrieve this result? I am bad at SQL queries :(
If you list the same table more than once in your SQL statement and do not supply conditions on which to JOIN the tables, you are creating a cartesian JOIN in your result set and it will be enormous:
SELECT * FROM MyTable A, MyTable B;
If MyTable has 1000 records, this will create a result set with one million records. Any other selection criteria you include will then have to be evaluated across all one million records.
I'm not sure that's what you're doing (your question is very unclear), but it may be a start on solving your problem.
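The row count is easy to confirm. A quick sketch with Python's sqlite3 module and a hypothetical one-column MyTable holding 1000 rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (id INTEGER)")
conn.executemany("INSERT INTO MyTable VALUES (?)", [(i,) for i in range(1000)])

# No join condition between A and B: every row pairs with every row.
pair_count = conn.execute("SELECT COUNT(*) FROM MyTable A, MyTable B").fetchone()[0]

print(pair_count)
```

With four aliases instead of two, the same table would produce 1000 to the fourth power combinations before any filter applies.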
Updated answer now that the poster has added the query that is being executed.
You're going to have to get a little tricky to get the results you want. You need to use CASE and MAX and, unfortunately, the syntax for CASE is a little verbose:
SELECT MAX(CASE WHEN name='a' THEN value ELSE NULL END),
MAX(CASE WHEN name='b' THEN value ELSE NULL END),
MAX(CASE WHEN name='c' THEN value ELSE NULL END),
MAX(CASE WHEN name='d' THEN value ELSE NULL END)
FROM MainTable WHERE name IN ('a','b','c','d');
Please give that a try against your actual database and see what you get (of course, you want to make sure the column name is indexed).
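Here is that query run against the sample data from the question, as a small sketch using Python's sqlite3 module. The index name is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MainTable (name TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO MainTable VALUES (?, ?)",
    list(zip("abcdef", "pqrstu")),  # a->p, b->q, ... f->u, as in the question
)
conn.execute("CREATE INDEX idx_maintable_name ON MainTable (name)")  # hypothetical name

# One pass over the table; each MAX(CASE ...) picks out the value for one name.
pivot_rows = conn.execute("""
    SELECT MAX(CASE WHEN name='a' THEN value ELSE NULL END),
           MAX(CASE WHEN name='b' THEN value ELSE NULL END),
           MAX(CASE WHEN name='c' THEN value ELSE NULL END),
           MAX(CASE WHEN name='d' THEN value ELSE NULL END)
    FROM MainTable WHERE name IN ('a','b','c','d')
""").fetchall()

print(pivot_rows)
```

Because there is no self-join, the work grows with the size of the table, not with its fourth power.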
Assuming you have table dbo.Customers with a million rows
SELECT * from dbo.Customers A
does not result in a copy of the table being created.
As Larry pointed out, the query as it stands is doing a cartesian product across your table four times which, as you have observed, kills your performance.
The updated question states the desire is to have 4 values from different queries in a single row. That's fairly simple, assuming this syntax is valid for SQLite.
You can see that the following four queries when run in serial produce the desired value but in 4 rows.
SELECT t1.name
FROM MainTable t1
WHERE t1.name='a';
SELECT t2.name
FROM MainTable t2
WHERE t2.name='b';
SELECT t3.name
FROM MainTable t3
WHERE t3.name='c';
SELECT t4.name
FROM MainTable t4
WHERE t4.name='d';
The trick is simply to run them as subqueries, like so. There are 5 queries: 1 driver query and 4 subqueries doing all the work. This pattern only works if each subquery returns one row.
SELECT
(
SELECT t1.name
FROM MainTable t1
WHERE t1.name='a'
) AS t1_name
,
(
SELECT t2.name
FROM MainTable t2
WHERE t2.name='b'
) AS t2_name
,
(
SELECT t3.name
FROM MainTable t3
WHERE t3.name='c'
) AS t3_name
,
(
SELECT t4.name
FROM MainTable t4
WHERE t4.name='d'
) AS t4_name
Aliasing a table results in a reference to the original table that exists for the duration of the SQL statement.

SQL query to limit number of rows having distinct values

Is there a way in SQL to use a query that is equivalent to the following:
select * from table1, table2 where some_join_condition
and some_other_condition and count(distinct(table1.id)) < some_number;
Let us say table1 is an employee table. Then a join will cause data about a single employee to be spread across multiple rows. I want to limit the number of distinct employees returned to some number. A condition on row number or something similar will not be sufficient in this case.
So what is the best way to get the output intended by the above query?
select *
from (select * from employee where rownum < some_number and some_id_filter), table2
where some_join_condition and some_other_condition;
This will work for nearly all DBs
SELECT *
FROM table1 t1
INNER JOIN table2 t2
ON some_join_condition
AND some_other_condition
INNER JOIN (
SELECT t1.id
FROM table1 t1
GROUP BY t1.id
HAVING
count(t1.ID) > someNumber
) t3 ON t3.id = t1.id
Some DBs have special syntax to make this a little bit easier.
I may not have a full understanding of what you're trying to accomplish. But let's say you're trying to get down to 1 row per employee, each join is causing multiple rows per employee, and grouping by employee name and other fields is still not unique enough to get a single row. Then you can try ranking and partitioning, and select the rank you prefer within each employee partition.
See example: http://msdn.microsoft.com/en-us/library/ms176102.aspx
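For the original goal (cap the number of distinct employees, not the number of rows), one possible approach is to pick the allowed ids in a subquery first and then join as usual. The sketch below is an assumption-laden illustration, not any of the answers above: the table and column names (employee, assignment, employee_id, project) are hypothetical, and it uses SQLite's LIMIT through Python's sqlite3 module where Oracle would use rownum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE assignment (employee_id INTEGER, project TEXT);
    INSERT INTO employee VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO assignment VALUES (1, 'x'), (1, 'y'), (2, 'z'), (3, 'w');
""")

max_employees = 2  # plays the role of some_number

# The inner query chooses which employees are allowed (only those that join at all);
# the outer query then returns every joined row for those employees.
rows = conn.execute("""
    SELECT e.id, e.name, a.project
    FROM employee e
    JOIN assignment a ON a.employee_id = e.id
    WHERE e.id IN (
        SELECT DISTINCT e2.id
        FROM employee e2
        JOIN assignment a2 ON a2.employee_id = e2.id
        ORDER BY e2.id
        LIMIT ?
    )
    ORDER BY e.id, a.project
""", (max_employees,)).fetchall()

distinct_ids = sorted({row[0] for row in rows})
print(rows, distinct_ids)
```

Any extra filter conditions belong in both the inner and the outer query, otherwise the limit may be spent on employees that the outer filter later removes.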