Few doubts on writing SQL queries

Few doubts on writing SQL queries - sql

Few questions i have here
I always see some SQL written like below (not sure if i get it right)
SELECT a.column_1, a.column_2 FROM table_name WHERE b.column_a = 'some value'
i don't quite understand the SQL written in such way. Is it similar to using object in programming, where you can define an object and variables within the object? If it is, where is the definition of a and b for the SQL above (assuming i got the query right)?
I want to make comparisons between 3 columns (say C1 C2 C3) in 3 different tables, say T1 T2 and T3. The condition is to get the values from the C1 in T1, that exists in C2 in T2, but not exists in C3 in T3. Both columns are practically the same, just that some might different or lesser records than the other columns in the other 2 tables, and i want to know what the differences are. Is the query below the right way to do it?
select distinct C1 from T1
and (C1) not in (select C2 from T2)
and (C1) in (select C3 from T3)
order by C1;
And is it possible to extend the condition if i want to include more tables into comparison using the query above?
If i were to customize the query above into something similar to the first question, is the query below the right way to do it?
select a.C1 from T1 a
and (a.C1) not in (select b.C2 from T2 b)
and (a.C1) in (select c.C3 from T3 c)
order by a.C1;
What are the advantanges of writing query in object way (like above), compared to writing it in traditional way? I feel like even if you define a table name as a variable, the variable only can be used within the query where it is defined, and cannot be extended to the other queries.
Thanks

the first point is a and b are "table aliases" (shortcut reference to the table(s) involved in THAT query) e.g.
SELECT a.column_1, a.column_2
FROM table_name_a a ------------------------------- table alias a defined here
INNER JOIN table_name_b b -------------------------- table alias b defined here
ON a.id = b.id
WHERE b.column_a = 'some value'
Your second query has a syntax issue: You need WHERE as shown in uppercase. It also has and performance implications. Distinct adds effort to a query, using IN() is really a syntax shortcut for a series of ORs (it might not scale well). But with the syntax it is valid.
select distinct C1
from T1
WHERE (C1) not in (select C2 from T2)
and (C1) in (select C3 from T3)
order by C1;
Yes (with performance reservations) you could add more tables into that comparison.
You introduce table aliases, done correctly, into your third query - but there is no real advantage in that query structure. Aside from just making code more convenient, aliases serve to distinguish between items that would be ambiguous. In my first query above ON a.id = b.id shows possible ambiguity in that 2 tables both have a field of the same name. Prefixing the field name by a table or table alias solves that ambiguity.

For your first point.
I always see some SQL written like below (not sure if i get it right)
SELECT a.column_1, a.column_2 FROM table_name WHERE b.column_a = 'some value'
This query is wrong. It should be like this -
SELECT a.column_1, a.column_2
FROM table_name a INNER JOIN --(There might be another join also like left join etc..)
table_name b
ON a.id = b.id WHERE b.column_a = 'some value'
so you noted in the above query that a and b are just table alias. Well, there are some cases you must use them, like when you need to join to the same table twice in one query.
For the second point. you can also do it like this
SELECT DISTINCT C1 FROM T1 t1
WHERE NOT EXISTS (
SELECT C2 FROM T2 t2 where t2.C2 = t1.C1)
AND WHERE EXISTS (
SELECT C3 FROM T3 t3 where t3.C3 = t1.C1)
ORDER BY C1;
Personally I prefer aliases, and unless I have a lot of tables they tend to be single letter ones.

I'm not 100% sure on this one so don't quote me, but ill give it my best shot.
I think that when alias' are used it is because if you don't use them your statement can he huge and hard to understand, here is a comparison with two SQL query's, one using alias' and the other without:
SELECT o.OrderID, o.OrderDate, c.CustomerName
FROM Customers AS c, Orders AS o
WHERE c.CustomerName="Around the Horn" AND c.CustomerID=o.CustomerID;
Without:
SELECT Orders.OrderID, Orders.OrderDate, Customers.CustomerName
FROM Customers, Orders
WHERE Customers.CustomerName="Around the Horn" AND Customers.CustomerID=Orders.CustomerID;
As you can see one looks much easier to understand and read than the other and makes your query's much smaller.
Aliases can be useful when:
There are more than one table involved in a query
Functions are used in the query
Column names are big or not very readable
Two or more columns are combined together
http://www.w3schools.com/sql/sql_alias.asp

You have to define the table aliases in the FROM clause, so:
SELECT a.column_1, b.column_1
FROM table1 a, table2 b
WHERE a.id = b.id
It should be possible to make your comparisons in the way you have written your query above, however it is also possible to make aliasing easier by using subqueries in the FROM clause ie:
SELECT tab1.id, tab2.id, tab3.id
FROM table1 as tab1,
(select * from table2)as tab2,
(select * from table3)as tab3
This way you can choose any columns from any of the tables using tab1.xxx etc. and then use the WHERE clause to say NOT IN tab2.column_1 etc...

Related

How to efficiently retrieve data in one to many relationships

I am running into an issue where I have a need to run a Query which should get some rows from a main table, and have an indicator if the key of the main table exists in a subtable (relation one to many).
The query might be something like this:
select a.index, (select count(1) from second_table b where a.index = b.index)
from first_table a;
This way I would get the result I want (0 = no depending records in second_table, else there are), but I'm running a subquery for each record I get from the database. I need to get such an indicator for at least three similar tables, and the main query is already some inner join between at least two tables...
My question is if there is some really efficient way to handle this. I have thought of keeping record in a new column the "first_table", but the dbadmin don't allow triggers and keeping track of it by code is too risky.
What would be a nice approach to solve this?
The application of this query will be for two things:
Indicate that at least one row in second_table exists for a given row in first_table. It is to indicate it in a list. If no row in the second table exists, I won't turn on this indicator.
To search for all rows in first_table which have at least one row in second_table, or which don't have rows in the second table.
Another option I just found:
select a.index, b.index
from first_table a
left join (select distinct(index) as index from second_table) b on a.index = b.index
This way I will get null for b.index if it doesn' exist (display can finally be adapted, I'm concerned on query performance here).
The final objective of this question is to find a proper design approach for this kind of case. It happens often, a real application culd be a POS system to show all clients and have one icon in the list as an indicator wether the client has open orders.

Try using EXISTS, I suppose, for such case it might be better then joining tables. On my oracle db it's giving slightly better execution time then the sample query, but this may be db-specific.
SELECT first_table.ID, CASE WHEN EXISTS (SELECT * FROM second_table WHERE first_table.ID = second_table.ID) THEN 1 ELSE 0 END FROM first_table

why not try this one
select a.index,count(b.[table id])
from first_table a
left join second_table b
on a.index = b.index
group by a.index

Two ideas: one that doesn't involve changing your tables and one that does. First the one that uses your existing tables:
SELECT
a.index,
b.index IS NOT NULL,
c.index IS NOT NULL
FROM
a_table a
LEFT JOIN
b_table b ON b.index = a.index
LEFT JOIN
c_table c ON c.index = a.index
GROUP BY
a.index, b.index, c.index
Worth noting that this query (and likely any that resemble it) will be greatly helped if b_table.index and c_table.index are either primary keys or are otherwise indexed.
Now the other idea. If you can, instead of inserting a row into b_table or c_table to indicate something about the corresponding row in a_table, indicate it directly on the a_table row. Add exists_in_b_table and exists_in_c_table columns to a_table. Whenever you insert a row into b_table, set a_table.exists_in_b_table = true for the corresponding row in a_table. Deletes are more work since in order to update the a_table row you have to check if there are any rows in b_table other than the one you just deleted with the same index. If deletes are infrequent, though, this could be acceptable.

Or you can avoid join altogether.
WITH comb AS (
SELECT index
, 'N' as exist_ind
FROM first_table
UNION ALL
SELECT DISTINCT
index
, 'Y' as exist_ind
FROM second_table
)
SELECT index
, MAX(exist_ind) exist_ind
FROM comb
GROUP BY index

The application of this query will be for two things:
Indicate that at least one row in second_table exists for a given row in first_table. It is to indicate it in a list.
To search for all rows in first_table which have at least one row in second_table.
Here you go:
SELECT a.index, 1 as c_check -- 1: at least one row in second_table exists for a given row in first_table
FROM first_table a
WHERE EXISTS
(
SELECT 1
FROM second_table b
WHERE a.index = b.index
);

I am assuming that you can't change the table definitions, e.g. partitioning the columns.
Now, to get a good performance you need to take into account other tables which are getting joined to your main table.
It all depends on data demographics.
If the other joins will collapse the rows by high factor, you should consider doing a join between your first table and second table. This will allow the optimizer to pick best join order , i.e, first joining with other tables then the resulting rows joined with your second table gaining the performance.
Otherwise, you can take subquery approach (I'll suggest using exists, may be Mikhail's solution).
Also, you may consider creating a temporary table, if you need such queries more than once in same session.

I am not expert in using case, but will recommend the join...
that works even if you are using three tables or more..
SELECT t1.ID,t2.name, t3.date
FROM Table1 t1
LEFT OUTER JOIN Table2 t2 ON t1.ID = t2.ID
LEFT OUTER JOIN Table3 t3 ON t2.ID = t3.ID
--WHERE t1.ID = #ProductID -- this is optional condition, if want specific ID details..
this will help you fetch the data from Normalized(BCNF) tables.. as they always categorize data with type of nature in separate tables..
I hope this will do...

Select rows where particular records are removed based on another table

I have the following table A:
Cell1 Time1
a1 t1
a1 t2
And another table B:
Cell2 Time2 SomeColumn2
a1 t1 c1
a1 t3 c2
I want values from table B where (Cell2,Time2) combination is not in Table A as (Cell1,Time1).
e.g in the given case the output would be :
Cell2 Time2 SomeColumn2
a1 t3 c2
What would be the solution in TSQL or ms-access sql ?

SELECT b.*
FROM TableB b
LEFT JOIN TableA a ON b.Cell2=a.Cell1 AND b.Time2=A.Time1
WHERE a.Cell1 IS NULL

select *
from TableB as TB
where not exists(select *
from TableA as TA
where TB.Cell2 = TA.Cell1 and
TB.Time2 = TA.Time1)
Not tested in Microsoft Access.

The relational operator you require is antijoin.
SQL lacks an explicit antijoin operator or JOIN type or keyword; the same applies to Access (ACE, Jet, whatever). Note the truly relational language Tutorial D uses NOT MATCHING for its antijoin operator, for example.
An antijoin can of course be written using other SQL operators. The most commonly seen use EXISTS or IN (subquery). Depending on the data, it may be possible to use an outer join with a restriction on a key from the outer table testing for null.
Personally, I prefer to use EXISTS in SQL for antijoin because the join clauses are closer together in the written code and doesn't result in projection over the joined table. This is the approach used in #Mikael Eriksson's answer.

Oracle rename columns from select automatically?

I have 2 tables with the following fields.
Table1
AA
BB
CC
DD
Table2
AA
CC
EE
Query
Select t1.*,
t2.*
from table1 t1,
join table2 t2 on table1.DD = table2.EE
My data columns back with the following column names:
AA, BB, CC, DD, **AA_1**, **CC_1**, EE
I don't want the column names like that. I want them to have the table name prefixed in the names of common (or all columns). I could fix this with:
select t1.AA as t1_AA, t1.BB as t1_BB, t1.CC as t1_CC, t1.DD as t1_DD,
t2.AA as t2_AA, t2.CC as t2_CC, t2.EE as t2_EEE
from table1 t1,
inner join table2 t2
on table1.DD = table2.EE
But that means every select everywhere becomes 500 lines longer. Is there a magic way to do this in oracle? Basically I want to write my code like
select t1.* as t1_*, t2.* as t2_*
from table1 t1,
inner join table2 t2
on table1.DD = table2.EE
But of course that is not valid SQL

In Oracle SELECT syntax, there is currently no way to assign column aliases to multiple columns based on some expression. You have to assign an alias to each individual column.

Is there a magic way to do this in oracle?
Not that I'm aware of. Your options amount to:
Address the column naming scheme - you'd need to use ALTER TABLE statements like:
ALTER TABLE table_name
RENAME COLUMN old_name to new_name;
Use column aliases
You could use views to save on the work & effort of defining column aliases, but it's not a recommended practice because of the bad performance when layering views on top of one another.

Is creating a view an option?
What is the software you're using that does this to you? I don't see this behavior in SQL*Plus or PL/SQL Developer in 10g. PL/SQL won't let you build a cursor with this ambiguity in it.

Try this
select t1.AA "t1_AA", t2.AA "t2.AA"
from table1 t1,
inner join table2 t2
on table1.DD = table2.EE
As he said before, you need to do it per column

I Need some sort of Conditional Join

Okay, I know there are a few posts that discuss this, but my problem cannot be solved by a conditional where statement on a join (the common solution).
I have three join statements, and depending on the query parameters, I may need to run any combination of the three. My Join statement is quite expensive, so I want to only do the join when the query needs it, and I'm not prepared to write a 7 combination IF..ELSE.. statement to fulfill those combinations.
Here is what I've used for solutions thus far, but all of these have been less than ideal:
LEFT JOIN joinedTable jt
ON jt.someCol = someCol
WHERE jt.someCol = conditions
OR #neededJoin is null
(This is just too expensive, because I'm performing the join even when I don't need it, just not evaluating the join)
OUTER APPLY
(SELECT TOP(1) * FROM joinedTable jt
WHERE jt.someCol = someCol
AND #neededjoin is null)
(this is even more expensive than always left joining)
SELECT #sql = #sql + ' INNER JOIN joinedTable jt ' +
' ON jt.someCol = someCol ' +
' WHERE (conditions...) '
(this one is IDEAL, and how it is written now, but I'm trying to convert it away from dynamic SQL).
Any thoughts or help would be great!
EDIT:
If I take the dynamic SQL approach, I'm trying to figure out what would be most efficient with regards to structuring my query. Given that I have three optional conditions, and I need the results from all of them my current query does something like this:
IF condition one
SELECT from db
INNER JOIN condition one
UNION
IF condition two
SELECT from db
INNER JOIN condition two
UNION
IF condition three
SELECT from db
INNER JOIN condition three
My non-dynamic query does this task by performing left joins:
SELECT from db
LEFT JOIN condition one
LEFT JOIN condition two
LEFT JOIN condition three
WHERE condition one is true
OR condition two is true
OR condition three is true
Which makes more sense to do? since all of the code from the "SELECT from db" statement is the same? It appears that the union condition is more efficient, but my query is VERY long because of it....
Thanks!

LEFT JOIN
joinedTable jt ON jt.someCol = someCol AND jt.someCol = conditions AND #neededjoin ...
...
OR
LEFT JOIN
(
SELECT col1, someCol, col2 FROM joinedTable WHERE someCol = conditions AND #neededjoin ...
) jt ON jt.someCol = someCol
...
OR
;WITH jtCTE AS
(SELECT col1, someCol, col2 FROM joinedTable WHERE someCol = conditions AND #neededjoin ...)
SELECT
...
LEFT JOIN
jtCTE ON jtCTE.someCol = someCol
...
To be honest, there is no such construct as a conditional JOIN unless you use literals.
If it's in the SQL statement it's evaluated... so don't have it in the SQL statement by using dynamic SQL or IF ELSE

the dynamic sql solution is usually the best for these situations, but if you really need to get away from that a series of if statments in a stroed porc will do the job. It's a pain and you have to write much more code but it will be faster than trying to make joins conditional in the statement itself.

I would go for a simple and straightforward approach like this:
DECLARE #ret TABLE(...) ;
IF <coondition one> BEGIN ;
INSERT INTO #ret() SELECT ...
END ;
IF <coondition two> BEGIN ;
INSERT INTO #ret() SELECT ...
END ;
IF <coondition three> BEGIN ;
INSERT INTO #ret() SELECT ...
END ;
SELECT DISTINCT ... FROM #ret ;
Edit: I am suggesting a table variable, not a temporary table, so that the procedure will not recompile every time it runs. Generally speaking, three simpler inserts have a better chance of getting better execution plans than one big huge monster query combining all three.
However, we can not guess-timate performance. we must benchmark to determine it. Yet simpler code chunks are better for readability and maintainability.

Try this:
LEFT JOIN joinedTable jt
ON jt.someCol = someCol
AND jt.someCol = conditions
AND #neededJoin = 1 -- or whatever indicates join is needed
I think you'll find it is good performance and does what you need.
Update
If this doesn't give the performance I claimed, then perhaps that's because the last time I did this using joins to a table. The value I needed could come from one of 3 tables, based on 2 columns, so I built a 'join-map' table like so:
Col1 Col2 TableCode
1 2 A
1 4 A
1 3 B
1 5 B
2 2 C
2 5 C
1 11 C
Then,
SELECT
V.*,
LookedUpValue =
CASE M.TableCode
WHEN 'A' THEN A.Value
WHEN 'B' THEN B.Value
WHEN 'C' THEN C.Value
END
FROM
ValueMaster V
INNER JOIN JoinMap M ON V.Col1 = M.oOl1 AND V.Col2 = M.Col2
LEFT JOIN TableA A ON M.TableCode = 'A'
LEFT JOIN TableB B ON M.TableCode = 'B'
LEFT JOIN TableC C ON M.TableCode = 'C'
This gave me a huge performance improvement querying these tables (most of them dozens or hundreds of million-row tables).
This is why I'm asking if you actually get improved performance. Of course it's going to throw a join into the execution plan and assign it some cost, but overall it's going to do a lot less work than some plan that just indiscriminately joins all 3 tables and then Coalesce()s to find the right value.
If you find that compared to dynamic SQL it's only 5% more expensive to do the joins this way, but with the indiscriminate joins is 100% more expensive, it might be worth it to you to do this because of the correctness, clarity, and simplicity over dynamic SQL, all of which are probably more valuable than a small improvement (depending on what you're doing, of course).
Whether the cost scales with the number of rows is also another factor to consider. If even with a huge amount of data you only save 200ms of CPU on a query that isn't run dozens of times a second, it's a no-brainer to use it.
The reason I keep hammering on the fact that I think it's going to perform well is that even with a hash match, it wouldn't have any rows to probe with, or it wouldn't have any rows to create a hash of. The hash operation is going to stop a lot earlier compared to using the WHERE clause OR-style query of your initial post.

The dynamic SQL solution is best in most respects; you are trying to run different queries with different numbers of joins without rewriting the query to do different numbers of joins - and that doesn't work very well in terms of performance.
When I was doing this sort of stuff an æon or so ago (say the early 90s), the language I used was I4GL and the queries were built using its CONSTRUCT statement. This was used to generate part of a WHERE clause, so (based on the user input), the filter criteria it generated might look like:
a.column1 BETWEEN 1 AND 50 AND
b.column2 = 'ABCD' AND
c.column3 > 10
In those days, we didn't have the modern JOIN notations; I'm going to have to improvise a bit as we go. Typically there is a core table (or a set of core tables) that are always part of the query; there are also some tables that are optionally part of the query. In the example above, I assume that 'c' is the alias for the main table. The way the code worked would be:
Note that table 'a' was referenced in the query:
Add 'FullTableName AS a' to the FROM clause
Add a join condition 'AND a.join1 = c.join1' to the WHERE clause
Note that table 'b' was referenced...
Add bits to the FROM clause and WHERE clause.
Assemble the SELECT statement from the select-list (usually fixed), the FROM clause and the WHERE clause (occasionally with decorations such as GROUP BY, HAVING or ORDER BY too).
The same basic technique should be applied here - but the details are slightly different.
First of all, you don't have the string to analyze; you know from other circumstances which tables you need to add to your query. So, you still need to design things so that they can be assembled, but...
The SELECT clause with its select-list is probably fixed. It will identify the tables that must be present in the query because values are pulled from those tables.
The FROM clause will probably consist of a series of joins.
One part will be the core query:
FROM CoreTable1 AS C1
JOIN CoreTable2 AS C2
ON C1.JoinColumn = C2.JoinColumn
JOIN CoreTable3 AS M
ON M.PrimaryKey = C1.ForeignKey
Other tables can be added as necessary:
JOIN AuxilliaryTable1 AS A
ON M.ForeignKey1 = A.PrimaryKey
Or you can specify a full query:
JOIN (SELECT RelevantColumn1, RelevantColumn2
FROM AuxilliaryTable1
WHERE Column1 BETWEEN 1 AND 50) AS A
In the first case, you have to remember to add the WHERE criterion to the main WHERE clause, and trust the DBMS Optimizer to move the condition into the JOIN table as shown. A good optimizer will do that automatically; a poor one might not. Use query plans to help you determine how able your DBMS is.
Add the WHERE clause for any inter-table criteria not covered in the joining operations, and any filter criteria based on the core tables. Note that I'm thinking primarily in terms of extra criteria (AND operations) rather than alternative criteria (OR operations), but you can deal with OR too as long as you are careful to parenthesize the expressions sufficiently.
Occasionally, you may have to add a couple of JOIN conditions to connect a table to the core of the query - that is not dreadfully unusual.
Add any GROUP BY, HAVING or ORDER BY clauses (or limits, or any other decorations).
Note that you need a good understanding of the database schema and the join conditions. Basically, this is coding in your programming language the way you have to think about constructing the query. As long as you understand this and your schema, there aren't any insuperable problems.
Good luck...

Just because no one else mentioned this, here's something that you could use (not dynamic). If the syntax looks weird, it's because I tested it in Oracle.
Basically, you turn your joined tables into sub-selects that have a where clause that returns nothing if your condition does not match. If the condition does match, then the sub-select returns data for that table. The Case statement lets you pick which column is returned in the overall select.
with m as (select 1 Num, 'One' Txt from dual union select 2, 'Two' from dual union select 3, 'Three' from dual),
t1 as (select 1 Num from dual union select 11 from dual),
t2 as (select 2 Num from dual union select 22 from dual),
t3 as (select 3 Num from dual union select 33 from dual)
SELECT m.*
,CASE 1
WHEN 1 THEN
t1.Num
WHEN 2 THEN
t2.Num
WHEN 3 THEN
t3.Num
END SelectedNum
FROM m
LEFT JOIN (SELECT * FROM t1 WHERE 1 = 1) t1 ON m.Num = t1.Num
LEFT JOIN (SELECT * FROM t2 WHERE 1 = 2) t2 ON m.Num = t2.Num
LEFT JOIN (SELECT * FROM t3 WHERE 1 = 3) t3 ON m.Num = t3.Num

SQL (any) Request for insight on a query optimization

I have a particularly slow query due to the vast amount of information being joined together. However I needed to add a where clause in the shape of id in (select id from table).
I want to know if there is any gain from the following, and more pressing, will it even give the desired results.
select a.* from a where a.id in (select id from b where b.id = a.id)
as an alternative to:
select a.* from a where a.id in (select id from b)
Update:
MySQL
Can't be more specific sorry
table a is effectively a join between 7 different tables.
use of * is for examples
Edit, b doesn't get selected

Your question was about the difference between these two:
select a.* from a where a.id in (select id from b where b.id = a.id)
select a.* from a where a.id in (select id from b)
The former is a correlated subquery. It may cause MySQL to execute the subquery for each row of a.
The latter is a non-correlated subquery. MySQL should be able to execute it once and cache the results for comparison against each row of a.
I would use the latter.

Both queries you list are the equivalent of:
select a.*
from a
inner join b on b.id = a.id
Almost all optimizers will execute them in the same way.
You could post a real execution plan, and someone here might give you a way to speed it up. It helps if you specify what database server you are using.

YMMV, but I've often found using EXISTS instead of IN makes queries run faster.
SELECT a.* FROM a WHERE EXISTS (SELECT 1 FROM b WHERE b.id = a.id)
Of course, without seeing the rest of the query and the context, this may not make the query any faster.
JOINing may be a more preferable option, but if a.id appears more than once in the id column of b, you would have to throw a DISTINCT in there, and you more than likely go backwards in terms of optimization.

I would never use a subquery like this. A join would be much faster.
select a.*
from a
join b on a.id = b.id
Of course don't use select * either (especially never use it when doing a join as at least one field is repeated) and it wastes network resources to send unnneeded data.

Have you looked at the execution plan?
How about
select a.*
from a
inner join b
on a.id = b.id
presumably the id fields are primary keys?

Select a.* from a
inner join (Select distinct id from b) c
on a.ID = c.AssetID
I tried all 3 versions and they ran about the same. The execution plan was the same (inner join, IN (with and without where clause in subquery), Exists)
Since you are not selecting any other fields from B, I prefer to use the Where IN(Select...) Anyone would look at the query and know what you are trying to do (Only show in a if in b.).

your problem is most likely in the seven tables within "a"
make the FROM table contain the "a.id"
make the next join: inner join b on a.id = b.id
then join in the other six tables.
you really need to show the entire query, list all indexes, and approximate row counts of each table if you want real help

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas