How do I put multiple criteria for a column in a where clause? - sql

I have five results to retrieve from a table and I want to write a store procedure that will return all desired rows.
I can write the query like that temporarily:
Select * from Table where Id = 1 OR Id = 2 or Id = 3
I supposed I need to receive a list of Ids to split, but how do I write the WHERE clause?

So, if you're just trying to learn SQL, this is a short and good example to get to know the IN operator. The following query has the same result as your attempt.
SELECT *
FROM TABLE
WHERE ID IN (SELECT ID FROM TALBE2)
This translates into what is your attempt. And judging by your attempt, this might be the simplest version for you to understand. Although, in the future I would recommend using a JOIN.
A JOIN has the same functionality as the previous code, but will be a better alternative. If you are curious to read more about JOINs, here are a few links from the most important sources
Joins - wikipedia
and also a visual representation of how different types of JOIN work

Another way to do it. The inner join will only include rows from T1 that match up with a row from T2 via the Id field.
select T1.* from T1 inner join T2 on T1.Id = T2.Id
In practice, inner joins are usually preferable to subqueries for performance reasons.

Related

Is there some equivalent to subquery correlation when making a derived table?

I need to flatten out 2 rows in a vertical table (and then join to a third table) I generally do this by making a derived table for each field I need. There's only two fields, I figure this isn't that unreasonable.
But I know that the rows I want back in the derived table, are the subset that's in my join with my third table.
So I'm trying to figure out the best derived tables to make so that the query runs most efficiently.
I figure the more restrictive I make the derived table's where clause, the smaller the derived table will be, the better response I'll get.
Really what I want is to correlate the where clause of the derived table with the join with the 3rd table, but you can't do that in sql, which is too bad. But I'm no sql master, maybe there's some trick I don't know about.
The other option is just to make the derived table(s) with no where clause and it just ends up joining the entire table twice (once for each field), and when I do my join against them the join filters every thing out.
So really what I'm asking I guess is what's the best way to make a derived table where I know pretty much specifically what rows I want, but sql won't let me get at them.
An example:
table1
------
id tag value
-- ----- -----
1 first john
1 last smith
2 first sally
2 last smithers
table2
------
id occupation
-- ----------
1 carpenter
2 homemaker
select table2.occupation, firsttable.first, lasttable.last from
table2, (select value as first from table1 where tag = 'first') firsttable,
(select value as last from table1 where tag = 'last') lasttable
where table2.id = firsttable.id and table2.id = lasttable.id
What I want to do is make the firsttable where clause where tag='first' and id = table2.id
DERIVED tables are not to store the intermediate results as you expect. These are just a way to make code simpler. Using derived table doesnt mean that the derived table expression will be executed first and output of that will be used to join with remaining tables.Optimizer will automaticaly faltten derived tables in most of the cases.
However,There are cases where the optimizer might want to store the results of the subquery and thus materilize instead of flattening.It usually happens when you have some kind of aggregate functions or like that.But in your case the query is too simple and thus optimizer will flatten query
Also,storing derived table expression wont make your query fast it will in turn could make it worse.Your real problem is too much normalization.Fix that query will be just a join of two tables.
Why you have this kind of normalization?Why you are storing col values as rows.Try to denormalize table1 so that it has two columns first and last.That will be best solution for this.
Also, do you have proper indexes on id and tag column? if yes then a merge join is quite good for your query.
Please provide index details on these tables and the plan generated by your query.
Your query will be used like an inner join query.
select table2.occupation, first.valkue as first, last.value as last
from
table2
inner join table1 first
on first.tag = 'first'
and first.id =table2.id
inner join table1 last
on last.tag = 'last'
and table2.id = last.id
I think what you're asking for is a COMMON TABLE EXPRESSION. If your platform doesn't implement them, then a temporary table may be the best alternative.
I'm a little confused. Your query looks okay . . . although it looks better with proper join syntax.
select table2.occupation, firsttable.first, lasttable.last
from table2 join
(select value as first from table1 where tag = 'first') firsttable
on table2.id = firsttable.id join
(select value as last from table1 where tag = 'last') lasttable
on table2.id = lasttable.id
This query does what you are asking it to do. SQL is a declarative language, not a procedural language. This means that you describe the result set and rely on the database SQL compiler to turn it into the right set of commands. (That said, sometimes how a query is structured does make it easier or harder for some engines to produce efficient query plans.)

Why is it that the IsEqual (=) operator is working faster than the IsNotEqual (<>) operator in Oracle?

Like the title says, if anyone has the answer I would like to know. I've been googling but couldn't find a straight answer.
Example:
This works
SELECT COUNT(*) FROM Table1 TB1, Table2 TB2
WHERE TB1.Field1 = TB2.Table2
This seems to take hours
SELECT COUNT(*) FROM Table1 TB1, Table2 TB2
WHERE TB1.Field1 <> TB2.Table2
Because they are different SQL sentences. In the first one, you are joining two tables using Field1 and Table2 fields. Probably returning a few records.
In the second one, your query is probably returning a lot of records, since you are doing a cross join, and a lot of rows will satisfy your Field1 <> Table2 condition.
A very simplified example
Table1
Field1
------
1
2
5
9
Table2
Table2
------
3
4
5
6
9
Query1 will return 2 since only 5 and 9 are common.
Query2 will return 18 since a lot of rows from cross join will count.
If you have table with a lot of records, it will take a while to process your second query.
It's important to realize that SQL is a declarative language and not an imperative one. You describe what conditions you want your data to fit and not how those comparisons should be executed. It's the job of the database to find the fastest way to give you an answer (a task taken over by the query optimizer). This means that a seemingly small change in your query can result in a wildly different query plan, which in turn results in a wildly different runtime behaviour.
The = comparison can be converted to and optimized the same way as a simple join on the two fields. This means that normal indices can be used to execute the query very fast, probably without reading the actual data and using only the indices instead.
A <> comparison on the other hand requires a full cartesian product to be calculated and checked for the condition, usually (there might be a way to optimize this with the correct index, but usually an index won't help here). It will also usually return a lot more results, which adds to the execution time.
Probably, the second query processes way more rows than the first one.
(Thinking back to a similar question)
Are you trying to count the rows in Table1 for which there is no matching record in Table2?
If so you could use this
SELECT COUNT(*) FROM Table1 TB1
WHERE NOT EXISTS
(SELECT * FROM Table2 TB2
WHERE TB1.Field1 = TB2.Field2 )
or this for example
SELECT COUNT(*)
FROM
(
SELECT Field1 FROM Table1
MINUS
SELECT Field2 FROM Table2
) T

Execute MySQL update query on 750k rows

I've added a field to a MySQL table. I need to populate the new column with the value from another table. Here is the query that I'd like to run:
UPDATE table1 t1
SET t1.user_id =
(
SELECT t2.user_id
FROM table2 t2
WHERE t2.usr_id = t1.usr_id
)
I ran that query locally on 239K rows and it took about 10 minutes. Before I do that on the live environment I wanted to ask if what I am doing looks ok i.e. does 10 minutes sound reasonable. Or should I do it another way, a php loop? a better query?
Use an UPDATE JOIN! This will provide you a native inner join to update from, rather than run the subquery for every bloody row. It tends to be much faster.
update table1 t1
inner join table2 t2 on
t1.usr_id = t2.usr_id
set t1.user_id = t2.user_id
Ensure that you have an index on each of the usr_id columns, too. That will speed things up quite a bit.
If you have some rows that don't match up, and you want to set t1.user_id = null, you will need to do a left join in lieu of an inner join. If the column is null already, and you're just looking to update it to the values in t2, use an inner join, since it's faster.
I should make mention, for posterity, that this is MySQL syntax only. The other RDBMS's have different ways of doing an update join.
There are two rather important pieces of information missing:
What type of tables are they?
What indexes exist on them?
If table2 has an index that contains user_id and usr_id as the first two columns and table1 is indexed on user_id, it shouldn't be that bad.
You don't have an index on t2.usr_id.
Create this index and run your query again, or a multiple-table UPDATE proposed by #Eric (with LEFT JOIN, of course).
Note that MySQL lacks other JOIN methods than NESTED LOOPS, so it's index that matters, not the UPDATE syntax.
However, the multiple table UPDATE is more readable.

Sql Server query syntax

I need to perform a query like this:
SELECT *,
(SELECT Table1.Column
FROM Table1
INNER JOIN Table2 ON Table1.Table2Id = Table2.Id
) as tmp
FROM Table2 WHERE tmp = 1
I know I can take a workaround but I would like to know if this syntax is possible as it is (I think) in Mysql.
The query you posted won't work on sql server, because the sub query in your select clause could possibly return more than one row. I don't know how MySQL will treat it, but from what I'm reading MySQL will also yield an error if the sub query returns any duplicates. I do know that SQL Server won't even compile it.
The difference is that MySQL will at least attempt to run the query and if you're very lucky (Table2Id is unique in Table1) it will succeed. More probably is will return an error. SQL Server won't try to run it at all.
Here is a query that should run on either system, and won't cause an error if Table2Id is not unique in Table1. It will return "duplicate" rows in that case, where the only difference is the source of the Table1.Column value:
SELECT Table2.*, Table1.Column AS tmp
FROM Table1
INNER JOIN Table2 ON Table1.Table2Id = Table2.Id
WHERE Table1.Column = 1
Perhaps if you shared what you were trying to accomplish we could help you write a query that does it.
SELECT *
FROM (
SELECT t.*,
(
SELECT Table1.Column
FROM Table1
INNER JOIN
Table2
ON Table1.Table2Id = Table2.Id
) as tmp
FROM Table2 t
) q
WHERE tmp = 1
This is valid syntax, but it will fail (both in MySQL and in SQL Server) if the subquery returns more than 1 row
What exactly are you trying to do?
Please provide some sample data and desired resultset.
I agree with Joel's solution but I want to discuss why your query would be a bad idea to use (even though the syntax is essentially valid). This is a correlated subquery. The first issue with these is that they don't work if the subquery could possibly return more than one value for a record. The second and more critical problem (in my mind) is that they must work row by row rather than on the set of data. This means they will virtually always affect performance. So correlated subqueries should almost never be used in a production system. In this simple case, the join Joel showed is the correct solution.
If the subquery is more complicated, you may want to turn it into a derived table instead (this also fixes the more than one value associated to a record problem). While a derived table looks a lot like a correlated subquery to the uninitated, it does not perform the same way because it acts on the set of data rather than row-by row and thus will often be significantly faster. You are essentially making the query a table in the join.
Below is an example of your query re-written as a derived table. (Of course in production code you would not use select * either especially in a join, spell out the fields you need)
SELECT *
FROM Table2 t2
JOIN
(SELECT Table1.[Column], Table1.Table2Id as tmp
FROM Table1
INNER JOIN Table2 ON Table1.Table2Id = Table2.Id ) as t
ON t.Table2Id = Table2.Id
WHERE tmp = 1
You've already got a variety of answers, some of them more useful than others. But to answer your question directly:
No, SQL Server will not allow you to reference the column alias (defined in the select list) in the predicate (the WHERE clause). I think that is sufficient to answer the question you asked.
Additional details:
(this discussion goes beyond the original question you asked.)
As you noted, there are several workarounds available.
Most problematic with the query you posted (as others have already pointed out) is that we aren't guaranteed that the subquery in the SELECT list returns only one row. If it does return more than one row, SQL Server will throw a "too many rows" exception:
Subquery returned more than 1 value.
This is not permitted when the subquery
follows =, !=, , >= or when the
subquery is used as an expression.
For the following discussion, I'm going to assume that issue is already sufficiently addressed.
Sometimes, the easiest way to make the alias available in the predicate is to use an inline view.
SELECT v.*
FROM ( SELECT *
, (SELECT Table1.Column
FROM Table1
JOIN Table2 ON Table1.Table2Id = Table2.Id
WHERE Table1.Column = 1
) as tmp
FROM Table2
) v
WHERE v.tmp = 1
Note that SQL Server won't push the predicate for the outer query (WHERE v.tmp = 1) into the subquery in the inline view. So you need to push that in yourself, by including the WHERE Table1.Column = 1 predicate in the subquery, particularly if you're depending on that to make the subquery return only one value.
That's just one approach to working around the problem, there are others. I suspect that query plan for this SQL Server query is not going to be optimal, for performance, you probably want to go with a JOIN or an EXISTS predicate.
NOTE: I'm not an expert on using MySQL. I'm not all that familiar with MySQL support for subqueries. I do know (from painful experience) that subqueries weren't supported in MySQL 3.23, which made migrating an application from Oracle 8 to MySQL 3.23 particularly painful.
Oh and btw... of no interest to anyone in particular, the Teradata DBMS engine DOES have an extension that allows for the NAMED keyword in place of the AS keyword, and a NAMED expression CAN be referenced elsewhere in the QUERY, including the WHERE clause, the GROUP BY clause and the ORDER BY clause. Shuh-weeeet
That kind of syntax is basically valid (you need to move the where tmp=... to on outer "select * from (....)", though), although it's ambiguous since you have two sets named "Table2"- you should probably define aliases on at least one of your usages of that table to clear up the ambiguity.
Unless you intended that to return a column from table1 corresponding to columns in table2 ... in which case you might have wanted to simply join the tables?

Nested records is sql server

select Table1.colID, Table1.colName,
(select * from Table2 where Table2.colID = Table1.colID) as NestedRows
from Table1
The above query gives you this error:
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used.....
Can anybody explain why this limitation exist?
I had this idea that this kind of multidimentional queries would be nice for building OO objects directly from the database with 1 query
EDIT:
This question is pretty theoretical. To solve this practical I would use a join or simply done 2 queries, but I wondered if there was anything stopping you from returning a column as a table type (In sql server 2008 you can create table types).
Say you have corrensponding classes in code, think Linq2Sql
public class Table1
{
public int colID,
public string colName,
public List<Table2> table2s;
}
I would like to be able to fill instances of this class directly with 1 query
It appears as though you want a recordset (multiple columns and multiple rows) returned from Table2 for each row in table1. If this is correct, perhaps you could return the data as XML from the DB. Something like this...
select Table1.colID, Table1.colName, Table2.*
from Table1
Inner Join Table2
On Table1.ColId = Table2.ColId
Order By Table1.ColId
For XML Auto
Then, for each row in Table1, you'll get multiple sub-nodes in your XML for table2 data.
There's likely to be performance implications with returning XML from your database, as well as loading your data structure on the front end. I'm not necessarily suggesting this is the best approach, but it's probably worth investigating.
Because the subquery in a select clause must be "inserted" into a column value in every row of the result set from the outer query. You cannot put a set of values into a single cell (a single column of a single row) of the result set.
You need to use an inner join. the multiple rows returned by joined table will be output as multiple rows in the final result set.
You would be better off using an INNER JOIN between the two tables and simply selecting the rows you want from each table.
SELECT tab1.colID, tab1.colName, tab2.Column1, tab2.column2
FROM dbo.Table1 AS tab1
INNER JOIN dbo.Table2 AS tab2
ON tab1.colID = tab2.colID
However, remember that the data from table1 will be repeated for each matching record in table2. Although I believe the query I posted will get the data in the form you are looking for, I don't think it's the best method for querying the database. I would either execute separate queries, or put the separate queries into a stored procedure and return multiple result sets.
I think the query you're looking for is probably:
select Table1.colID, Table1.colName,Table2.*
from Table1 inner join Table2 ON Table1.colID = Table2.colID
Subqueries are more typically used (at least by me) in the WHERE clause.