SQL join on many queries - sql

I have a table with a structure such as table(PK, a, b, c, d, e, f, g).
And I have many queries that I want to join:
select PK, sum(c) where...
JOIN
select PK, sum(e) where...
JOIN
select PK, sum(g) where ...
JOIN
select PK,a,b,d,f
Every sum(c|e|g) is actually a
select sum(c|e|g) from .... where...
because there are many conditions involved (not all c|e|g's must be added)
Is this the best way to do it? It was suggested that I write it in PL/SQL, which I would have to learn. If that's the way to go, I'll do it, but I'm not sure what's wrong with the solution shown above.
Edit
I'm pretty sure it's a join. Here's what I want.
I need to get a result set in the form:
PK, a,b,COMPLEX_SUM_ON_C,d,COMPLEX_SUM_ON_E,f,COMPLEX_SUM_ON_G
so I thought of joining many queries to get this result.
Each of the COMPLEX... is another select (select sum...). This is a very big table, and writing
select a,b,(select sum..),d,(select sum...),f,(select sum...)
will yield bad performance (so I was told to remove it)
I've edited my query above.
Thanks in advance.

I think you mean "UNION", not "JOIN". Whether it is the best way depends on what you're trying to achieve.

This is not a well-defined problem (yet).
Assuming PK is your primary key (i.e. unique, by definition), then
SELECT PK, SUM(c)
FROM tbl
GROUP BY PK
is ALWAYS the same as
SELECT PK, c
FROM tbl
So grouping (and aggregating) is relatively meaningless.
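The equivalence claimed above can be checked with a small sketch. It uses Python's built-in sqlite3 module rather than Oracle, and the table name tbl and its three sample rows are hypothetical, so treat it as an illustration only:

```python
import sqlite3

# In-memory SQLite database; tbl and its rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (pk INTEGER PRIMARY KEY, c INTEGER)")
conn.executemany("INSERT INTO tbl (pk, c) VALUES (?, ?)",
                 [(1, 10), (2, 20), (3, 30)])

# Grouping by a unique key puts exactly one row in each group ...
grouped = conn.execute(
    "SELECT pk, SUM(c) FROM tbl GROUP BY pk ORDER BY pk").fetchall()
# ... so the aggregate returns the same values as the plain select.
plain = conn.execute("SELECT pk, c FROM tbl ORDER BY pk").fetchall()

print(grouped == plain)
```

Because every group holds a single row, SUM(c) has nothing to add up. The sums only become meaningful when the grouping key repeats or when the sum is taken over some other related set of rows.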
In your expected results:
PK, a,b,COMPLEX_SUM_ON_C,d,COMPLEX_SUM_ON_E,f,COMPLEX_SUM_ON_G
How are COMPLEX_SUM_ON_C, COMPLEX_SUM_ON_E, COMPLEX_SUM_ON_G related to PK?
We know how a, b, d, f are related to PK, because for each PK, one can identify the one and only a, b, d, f on the same row.

An example of a JOIN is the following:
Select a.col1, b.col2
FROM table1 a, table2 b
WHERE a.key = b.key;
which can also be written:
SELECT a.col1, b.col2
FROM table1 a
INNER JOIN table2 b
ON a.key = b.key;
Edit:
After reading your re-edit of the original question, you can probably use a JOIN. JOINs can be used when you have related data in more than one table, or you can specify the same table multiple times. I have used both kinds with Oracle. Here's an example of the latter kind which will hopefully help you:
SELECT t1.a, t1.b, SUM(t3.c), t2.d, SUM(t4.e), t1.f, SUM(t5.g)
FROM table1 t1, table1 t2, table1 t3, table1 t4, table1 t5
WHERE t1.a = 'hello'
AND t2.a = 'world'
AND t3.c = 10
AND t4.e = 20
AND t5.g = 100
GROUP BY t1.a, t1.b, t2.d, t1.f;

Related

Get rid of matching records and add non-matching data

I have following tables:
Table a:
Name
T1
T2
T3
T4
Table b:
Name
T1
T2
T3
T4
T5
T6
I need to select all from table a and add what is not in table a from table b, result below:
T1
T2
T3
T4
T5
T6
Thanks for help
If you want all unique names from both the tables, use UNION:
select name from table_a
union
select name from table_b;
Here is another way:
select ta.name from ta
union all
select tb.name from tb
left join ta
on tb.name = ta.name
where ta.name is null
I would do this with an anti-join (a NOT IN condition). As written below, it will not work correctly if NULL is possible in that column in table a (in that case, the anti-join should be written with a NOT EXISTS condition). I assume the column is NOT NULL.
An anti-join is faster than a join, because as soon as a value from table b is also found in a, the joining for that row of table b stops and processing moves on to the next row. In a join, the joining continues, there is no such short-circuiting.
Oto's solution uses a join rather than an anti-join. However, I believe the Oracle query optimizer recognizes, in this simple case, that an anti-join is sufficient, and it will rewrite the query to use an anti-join. This is something you can verify by running Explain Plan on both queries. With that said, in a similar but much more complicated problem, the optimizer may not be able to "see" this shortcut; this is why I believe it's best to write anti-joins (and semi-joins, where we use IN or EXISTS conditions) explicitly, rather than rely on the optimizer.
The query should be
select name from a
union all
select name from b where name not in ( select name from a );
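As a rough check of the anti-join query above, here is a sketch that runs it against the sample data from the question. It uses Python's built-in sqlite3 module instead of Oracle, and it also runs a NOT EXISTS variant, which is my own rewrite of the same idea rather than something taken from the answers:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (name TEXT NOT NULL)")
conn.execute("CREATE TABLE b (name TEXT NOT NULL)")
conn.executemany("INSERT INTO a VALUES (?)", [(f"T{i}",) for i in range(1, 5)])
conn.executemany("INSERT INTO b VALUES (?)", [(f"T{i}",) for i in range(1, 7)])

# Anti-join written with NOT IN (safe here because name is NOT NULL).
not_in_sql = """
    SELECT name FROM a
    UNION ALL
    SELECT name FROM b WHERE name NOT IN (SELECT name FROM a)
"""
# The same anti-join written with a correlated NOT EXISTS.
not_exists_sql = """
    SELECT name FROM a
    UNION ALL
    SELECT name FROM b WHERE NOT EXISTS (SELECT 1 FROM a WHERE a.name = b.name)
"""

not_in_rows = sorted(r[0] for r in conn.execute(not_in_sql))
not_exists_rows = sorted(r[0] for r in conn.execute(not_exists_sql))
print(not_in_rows)
print(not_exists_rows)
```

Both forms give T1 through T6 on this data. Whether a given optimizer actually executes them as an anti-join is something to confirm in the execution plan of the database in use.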
Here's one way to do that:
Select distinct Name
from (
select Name from a
UNION ALL
select Name from b
) t

Multiple reusable SQL queries

Sorry this is a newbie question. I have one very long SQL query that is getting harder to manage. In fact there are some sub-queries that are being used multiple times. What is the best way to break up the query? I would prefer to keep it in the database, rather than take it out into the calling program. It goes something like this.
Select A, B, C
from (select D from Table_1 where ...)
Union Select E, F
from Table_2
Inner Join (Select D, E from Table_1 where...)..
So what I would like to do is
Result1 = select D,E from Table_1 where....
Result2 = Select A,B,C from Result1 Union Select E,F from Table_2 Inner Join Result1 ...
What is the best way to do this? I can't use Views because I don't have privileges. How can I use the results from the first query in the second query? Can cursors be used in this case?
Using a CTE you can access the same subquery multiple times (this is the main difference to Derived Tables):
with CTE as
(Select D, E from Table_1 where...)
Select A, B, C
from CTE
Union
Select E, F
from Table_2
Inner Join CTE ..
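Since the query above is elided, here is a self-contained sketch of the same pattern: one CTE referenced from both branches of a UNION. It runs on SQLite through Python's sqlite3 module. The column types, sample rows, the d >= 2 filter and the k/src output columns are all hypothetical and only there to make the example runnable:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (d INTEGER, e INTEGER);
    CREATE TABLE table_2 (e INTEGER, f TEXT);
    INSERT INTO table_1 VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO table_2 VALUES (10, 'x'), (20, 'y'), (40, 'z');
""")

sql = """
    WITH cte AS (
        SELECT d, e FROM table_1 WHERE d >= 2   -- hypothetical filter
    )
    SELECT d AS k, 'from cte' AS src            -- first use of the CTE
    FROM cte
    UNION
    SELECT t2.e AS k, t2.f AS src               -- second use, inside a join
    FROM table_2 t2
    INNER JOIN cte ON cte.e = t2.e
    ORDER BY k
"""
rows = conn.execute(sql).fetchall()
print(rows)
```

The subquery text is written once and named, so a change to its filter applies to both branches. Note that both sides of a UNION must return the same number of columns, which is why each branch here selects exactly two.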

Access Removing CERTAIN PARTS of Duplicates in Union Query

I'm working in Access 2007 and know nothing about SQL and very, very little VBA. I am trying to do a union query to join two tables, and delete the duplicates.
BUT, a lot of my duplicates have info in one entry that's not in the other. It's not a 100% exact duplicate.
Example,
Row 1: A, B, BLANK
Row 2: A, BLANK, C
I want it to MERGE both of these to end up as one row of A, B, C.
I found a similar question on here but I don't understand the answer at all. Any help would be greatly appreciated.
I would suggest a query like this:
select
coalesce(t1.a, t2.a) as a,
coalesce(t1.b, t2.b) as b,
coalesce(t1.c, t2.c) as c
from
table1 t1
inner join table2 t2 on t1.key = t2.key
Here, I have used the keyword coalesce. This will take the first non null value in a list of values. Also note that I have used key to indicate the column that is the same between the two rows. From your example it looks like A but I cannot be sure.
If your first table has all the key values, then you can do:
select t1.a, nz(t1.b, t2.b), nz(t1.c, t2.c) as c
from table1 as t1 left join
table2 as t2
on t1.a = t2.a;
If this isn't the case, you can use this rather arcane looking construct:
select t1.a, nz(t1.b, t2.b), nz(t1.c, t2.c) as c
from table1 as t1 left join
table2 as t2
on t1.a = t2.a
union all
select t2.a, t2.b, t2.c
from table2 as t2
where not exists (select 1 from table1 as t1 where t1.a = t2.a)
The first part of the union gets the rows where there is a key value in the first table. The second gets the rows where the key value is in the second but not the first.
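The two-part union described above can be tried out with the following sketch. It runs on SQLite through Python's sqlite3 module, not on Access, so COALESCE stands in for Nz. The sample rows are hypothetical and column a is assumed to be the matching key:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (a TEXT, b TEXT, c TEXT);
    CREATE TABLE table2 (a TEXT, b TEXT, c TEXT);
    INSERT INTO table1 VALUES ('A', 'B', NULL), ('X', 'Y', 'Z');
    INSERT INTO table2 VALUES ('A', NULL, 'C'), ('P', 'Q', 'R');
""")

sql = """
    -- Part 1: every key in table1, with gaps filled from table2.
    SELECT t1.a, COALESCE(t1.b, t2.b) AS b, COALESCE(t1.c, t2.c) AS c
    FROM table1 AS t1
    LEFT JOIN table2 AS t2 ON t1.a = t2.a
    UNION ALL
    -- Part 2: keys that exist only in table2.
    SELECT t2.a, t2.b, t2.c
    FROM table2 AS t2
    WHERE NOT EXISTS (SELECT 1 FROM table1 AS t1 WHERE t1.a = t2.a)
    ORDER BY 1
"""
rows = conn.execute(sql).fetchall()
print(rows)
```

The partial duplicates for key A merge into a single row, while X (only in table1) and P (only in table2) pass through unchanged.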
Note this is much harder in Access than in other (dare I say "real") databases. MS Access doesn't support common table expressions (CTEs), unions in subqueries, or full outer join -- all of which would help simplify the query.

NOT IN operator issue Oracle

Here is my query:
Select a.* from Table1 a, Table2 b
Where
a.tid=b.tid and
b.createddate=(Select max(createddate) from Table2) and
a.tid not in (Select distinct tid from Table3);
The problem is that I know this should return some valid output, but it does not. The issue is with the last line, a.tid not in (Select distinct tid from Table3); if I replace Select distinct tid from Table3 with hard-coded values like ('T001','T002','T003','T004'), then it works fine and returns data.
What's wrong? Am I missing something? Please help.
Try this:
Select a.* from Table1 a, Table2 b
Where
a.tid=b.tid and
b.createddate=(Select max(createddate) from Table2) and
a.tid not in (Select tid from Table3 where tid is not null);
As all the people in the comments mentioned, if there is at least one row with a null value for tid in table3 you will get no rows returned. This is because to Oracle null is like saying "I don't know what this value is". Oracle can't say with certainty that the value you are searching for is definitely not in your sub-select because it doesn't know what this "not-known" value actually is. Also, the documentation says it works that way:
http://docs.oracle.com/cd/B28359_01/server.111/b28286/conditions013.htm
Another option would be to write the query as:
Select a.* from Table1 a, Table2 b
Where
a.tid=b.tid and
b.createddate=(Select max(createddate) from Table2) and
not exists (Select null from Table3 t3 where t3.tid = a.tid);
The handling of nulls is one of the major differences between not exists and not in.
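The difference is easy to reproduce. The sketch below uses SQLite through Python's sqlite3 module, which applies the same three-valued logic to NOT IN as described for Oracle. The tables are cut down to the tid column and the sample values are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (tid TEXT);
    CREATE TABLE table3 (tid TEXT);
    INSERT INTO table1 VALUES ('T001'), ('T005');
    INSERT INTO table3 VALUES ('T001'), (NULL);   -- one NULL tid
""")

# NOT IN against a list containing NULL never evaluates to TRUE.
not_in = conn.execute(
    "SELECT tid FROM table1 "
    "WHERE tid NOT IN (SELECT tid FROM table3)").fetchall()

# Filtering the NULLs out of the subquery restores the expected row.
not_in_filtered = conn.execute(
    "SELECT tid FROM table1 "
    "WHERE tid NOT IN (SELECT tid FROM table3 WHERE tid IS NOT NULL)").fetchall()

# NOT EXISTS compares row by row, so the NULL row simply never matches.
not_exists = conn.execute(
    "SELECT tid FROM table1 a "
    "WHERE NOT EXISTS (SELECT 1 FROM table3 t3 WHERE t3.tid = a.tid)").fetchall()

print(not_in, not_in_filtered, not_exists)
```

For 'T005' the unfiltered NOT IN evaluates to unknown rather than true, so the row is dropped. The other two forms both return it.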
Your query, slightly rewritten:
Select a.*
from Table1 a join
Table2 b
on a.tid=b.tid
where b.createddate=(Select max(createddate) from Table2) and
a.tid not in (Select distinct tid from Table3)
What this tells me is that the tid with the maximum create date from Table2 is in Table3.
To test this, get the maximum create date from table2. Then get all records in table1 that correspond to this max. You will find that these are also in table3.
If I had to speculate, you might want the max create date per tid in Table2, rather than the overall max.
By the way, in Oracle (and most other databases) the distinct in the last subquery is redundant. The database should be smart enough to remove duplicates in this case.

When to use CTEs to encapsulate sub-results, and when to let the RDBMS worry about massive joins

This is a SQL theory question. I can provide an example, but I don't think it's needed to make my point. Anyone experienced with SQL will immediately know what I'm talking about.
Usually we use joins to narrow down the number of records by matching left and right rows. However, under certain conditions, joining tables causes a multiplication of results, where the result contains every combination of the left and right records.
I have a database which has 3 or 4 such joins. This turns what would be a few records into a multitude. My concern is that the tables will be large in production, so the number of these joined rows will be immense. Further, heavy math is performed on each row, and the idea of performing math on duplicate rows is enough to make anyone shudder.
I have two questions. The first is, is this something I should care about, or will SQL Server intelligently realize these rows are all duplicates and optimize all processing accordingly?
The second is, is there any advantage to grouping each part of the query so as to get only the distinct values going into the next part of the query, using something like:
WITH t1 AS (
SELECT DISTINCT... [or GROUP BY]
),
t2 AS (
SELECT DISTINCT...
),
t3 AS (
SELECT DISTINCT...
)
SELECT...
I have often seen the use of DISTINCT applied to subqueries. There is obviously a reason for doing this. However, I'm talking about something a little different and perhaps more subtle and tricky.
Are you talking about a query like this?
You can see in the plan that SQL Server does the computation on the small number of rows pre join rather than the large number post join.
CREATE TABLE #BigTable
(
n INT PRIMARY KEY
);
WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1), --2
E02(N) AS (SELECT 1 FROM E00 a, E00 b), --4
E04(N) AS (SELECT 1 FROM E02 a, E02 b), --16
E08(N) AS (SELECT 1 FROM E04 a, E04 b), --256
E16(N) AS (SELECT 1 FROM E08 a, E08 b) --65,536
INSERT INTO #BigTable
SELECT TOP 10000 ROW_NUMBER() OVER (ORDER BY (SELECT 0))
FROM E16
CREATE TABLE #SmallTable
(
n INT PRIMARY KEY
);
insert into #SmallTable select top 20 * from #BigTable ORDER BY n
SELECT SIN(COS(LOG(#SmallTable.n)))
FROM #SmallTable join #BigTable on #BigTable.n > #SmallTable.n
I'm not quite sure of the question, to be honest...
There is no difference between a CTE and a derived table. The CTE is just a macro.
WITH
t1 AS (SELECT DISTINCT... [or GROUP BY]),
t2 AS (SELECT DISTINCT...)
SELECT * FROM t1 JOIN t2 ON ...
is the same as
SELECT
*
FROM
(SELECT DISTINCT... [or GROUP BY]) t1
JOIN
(SELECT DISTINCT...) t2 ON ...
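A small sketch of the two equivalent forms above, with the elided parts filled in by hypothetical tables x and y. It runs on SQLite through Python's sqlite3 module, so it only shows that the results agree and says nothing about SQL Server's execution plans:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE x (k INTEGER);
    CREATE TABLE y (k INTEGER);
    INSERT INTO x VALUES (1), (1), (2);
    INSERT INTO y VALUES (1), (2), (2), (3);
""")

cte_sql = """
    WITH t1 AS (SELECT DISTINCT k FROM x),
         t2 AS (SELECT DISTINCT k FROM y)
    SELECT t1.k FROM t1 JOIN t2 ON t1.k = t2.k ORDER BY t1.k
"""
derived_sql = """
    SELECT t1.k
    FROM (SELECT DISTINCT k FROM x) t1
    JOIN (SELECT DISTINCT k FROM y) t2 ON t1.k = t2.k
    ORDER BY t1.k
"""
# Joining the raw tables multiplies the duplicate keys.
raw_count = conn.execute(
    "SELECT COUNT(*) FROM x JOIN y ON x.k = y.k").fetchone()[0]

cte_rows = conn.execute(cte_sql).fetchall()
derived_rows = conn.execute(derived_sql).fetchall()
print(cte_rows == derived_rows, raw_count)
```

On this data the deduplicated join returns two rows in either form, while the raw join returns four. That multiplication is the effect the question is worried about.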
Where you can run into issues is the associativity of the joins:
FROM
t1
LEFT JOIN
t2 ON t1. = t2.
JOIN
t3 ON t2. = t3.
can be different to
FROM
t1
LEFT JOIN
(
SELECT *
FROM
t2
JOIN
t3 ON t2. = t3.
) Td ON t1. = Td.
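To make the difference concrete, here is a sketch with the elided join columns replaced by hypothetical ones (id and t3_id). It runs on SQLite through Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER);
    CREATE TABLE t2 (id INTEGER, t3_id INTEGER);
    CREATE TABLE t3 (id INTEGER);
    INSERT INTO t1 VALUES (1), (2);
    INSERT INTO t2 VALUES (1, 100);
    INSERT INTO t3 VALUES (100);
""")

# Form 1: the inner join to t3 is applied after the left join, so the
# NULL-extended row for t1.id = 2 fails t2.t3_id = t3.id and is dropped.
flat = conn.execute("""
    SELECT t1.id, t2.id
    FROM t1
    LEFT JOIN t2 ON t1.id = t2.id
    JOIN t3 ON t2.t3_id = t3.id
    ORDER BY t1.id
""").fetchall()

# Form 2: t2 and t3 are joined first, then left-joined to t1 as a unit,
# so every t1 row survives.
nested = conn.execute("""
    SELECT t1.id, td.id
    FROM t1
    LEFT JOIN (
        SELECT t2.id AS id
        FROM t2
        JOIN t3 ON t2.t3_id = t3.id
    ) AS td ON t1.id = td.id
    ORDER BY t1.id
""").fetchall()

print(flat)
print(nested)
```

The flat form behaves like an inner join overall and returns only the row for id 1, while the nested form also keeps id 2 with a NULL on the right-hand side.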
However, if you need DISTINCTs in line, then the question could be "why are you using EXISTS" or "why do you have cartesian joins".