Joining tables based on the maximum id - sql

I have found three questions which all seem to ask a similar question:
Getting max value from rows and joining to another table
Select only rows by join tables max value
Joining tables based on the maximum value
But I'm having a hard time wrapping my head around how exactly to join tables keeping only the maximum row of one of the tables when the maximum is in the id or index field itself.
I am looking for an answers that only require joins because this will allow the solution to work in a tool which generates queries for which it is easy to get it to generate the corresponding joins, although sub-queries are probably doable as well with a bit more effort. I found the answer below to be of particular interest:
SELECT DISTINCT b.id, b.filename, a1.name
FROM a a1
JOIN b
ON b.id = a1.id
LEFT JOIN a a2
ON a2.id = a1.id
AND a2.rank > a1.rank
WHERE a2.id IS NULL
However, in my case the ranking column is also the index, e.g. "id". I cannot compare for equality and greater than at the same time, because they will never be true at the same time!
Also, potentially complicating the situation is that a typical query in which I have need of this may join several tables (3-5 is not uncommon). So as a simplified example of my query:
SELECT
table1.field1, table1.field2, table1.field3,
table2.field1, table2.field2, table2.field3,
table3.field1, table3.field2, table3.field3,
table4.field1, table4.field2, table4.field3
FROM table1
INNER JOIN table2 ON
table1.field1 = table2.field1
AND table1.field2 = table2.field2
AND table2.field3 < 0
INNER JOIN table3 ON
table2.field1 = table3.field1
AND table2.field4 = table3.field4
INNER JOIN table4 ON
table1.field1 = table4.field1
AND table1.field2 = table4.field2
And what I want to do is to eliminate duplicates in table3 by only getting the row with the maximum id (e.g. MAX(table3.id)) for each unique combination of all the other fields. That is to say, the above query is returning something like this:
+-------+-------+-------+---------+
| table1| table2| table4|table3 |
+-------+-------+-------+---------+
| A | A | A | 1,... |
| A | A | A | 2,... |
| A | A | A | 3,... |
| A | A | A | MAX2,...|
| B | B | B | 1,... |
| B | B | B | 2,... |
| B | B | B | 3,... |
| B | B | B | MAX2,...|
+-------+-------+-------+---------+
(I'm just using A and B to denote that I'm talking about all the same values for the fields in table1, table2, and table4 for a particular set of rows.)
and I want to reduce it to this:
+-------+-------+-------+---------+
| table1| table2| table4|table3 |
+-------+-------+-------+---------+
| A | A | A | MAX1,...|
| B | B | B | MAX2,...|
+-------+-------+-------+---------+

You can add a derived table to reduce the matching rows in TABLE3 to one per group. Another method would use a window function but you asked for a JOIN only
SELECT
table1.field1, table1.field2, table1.field3,
table2.field1, table2.field2, table2.field3,
table3.field1, table3.field2, table3.field3,
table4.field1, table4.field2, table4.field3
FROM table1
INNER JOIN table2 ON
table1.field1 = table2.field1
AND table1.field2 = table2.field2
AND table2.field3 < 0
INNER JOIN table3 ON
table2.field1 = table3.field1
AND table2.field4 = table3.field4
--here is the added derived table. Change column names as needed
INNER JOIN (select UID, ID = max(ID) from Table3 group by UID) x
on x.UID = table3.UID and x.mx = table3.ID
INNER JOIN table4 ON
table1.field1 = table4.field1
AND table1.field2 = table4.field2
Or, perhaps... something like below. It really depends on your schema and that's hard to understand with the sample data.
INNER JOIN (select field1, field4, mx = max(ID) from Table3 group by field1, field4) x
on x.field1 = table3.field1 and x.field4 = table3.field4 and x.mx = table3.ID
Here is an example. You'll notice that the last three column pairs are identical. You only want the last one, which is the max(id) for that grouping. What ever makes a row unique in relation to the rest of your data (not your primary key, but what you are joining with) is what you'd want to include int he derived table and join condition.
declare #table table (id int identity(1,1), f1 char(1), f2 char(1))
insert into #table
values
('a','b'),
('a','c'),
('a','a'),
('b','b'),
('b','b'),
('b','b')
select * from #table
select t1.*
from #table t1
inner join
(select f1, f2, mx = max(id) from #table group by f1, f2) t2 on
t1.f1 = t2.f1
and t1.f2 = t2.f2
and t1.id = t2.mx

Related

How to create a table with multiple calculations?

If I do one calculation with one join:
SELECT
SUM(friends_made) as calc1, table2.group_id
FROM
friends_made_table as table1
INNER JOIN
grouped_users as table2 ON table1.user_id = table2.user_id
GROUP BY
table2.group_id
The result I get is:
calc1 | group_id
-----------------
400 | 1
320 | 2
330 | 3
But I also need another calculation (calc2) with the same inner join on table1 but with a different table (table3)
SELECT
SUM(request_accept) AS calc2, table1.group_id
FROM
friends_accept_table AS table3
INNER JOIN
grouped_users as table1 ON table1.user_id = table3.user_id
GROUP BY
table1.group_id
Result is:
calc2 | group_id
-----------------
100 | 1
150 | 2
120 | 3
How can I join these two queries and create a new table showing both of the calculations (calc1, calc2)?
calc1 |calc2 | group_id
-----------------------
400 | 100 | 1
320 | 150. | 2
330 | 120. | 3
EDITED to show tables/results and take out rounding
A join will suffice as long as there is a common set of group_ids across the two results. You may otherwise need a left/right join or full join.
with data1 as (
SELECT SUM(friends_made) as calc1, table2.group_id
FROM friends_made_table as table1 INNER JOIN grouped_users as table2
ON table1.user_id = table2.user_id
GROUP BY table2.group_id
), data2 as (
SELECT SUM(request_accept) as calc2, table1.group_id
FROM friends_accept_table as table3 INNER JOIN grouped_users as table1
ON table1.user_id = table3.user_id
GROUP BY table1.group_id
)
select calc1, calc2, d1.group_id
from data1 d1 inner join data2 d2 on d2.group_id = d1.group_id;
This does assume that your platform supports CTE syntax. If it doesn't there are probably similar rewrites.

Find values where related must have list of values

I'm trying to find a simple solution for my SQL Server problem.
I have two tables look like this:
table1
--id
-- data
table2
--id
--table1_id
--value
I have some records like this:
Table1
+-----------------------+
| id | data |
+-----------------------+
| 1 | ? |
+-----------------------+
| 2 | ? |
+-----------------------+
Table2
+-----------------------+
|id | table1_id | value |
+-----------------------+
| 1 | 1 | 'a' |
+-----------------------+
| 2 | 1 | 'b' |
+-----------------------+
| 3 | 2 | 'a' |
+-----------------------+
Now I want to get table1 with all it's additional values where the relation to table2 has 'a' AND 'b' as values.
So I would get the id 1 of table1.
Currently I have an query like this:
SELECT t1.[id], t1.[data]
FROM [table1] t1,
(SELECT [id]
FROM [table1] t1
JOIN [table2] t2 ON t1.[id] = t2.[table1_id] AND t2.[Value] IN('a', 'b')
GROUP BY t1[id]
HAVING COUNT(t2.[Value]) = 2) x
WHERE t1.id = x.id
Has anyone an idea on how to achieve my goal in a simpler way?
One way uses exists:
select t1.*
from table1 t1
where exists (select 1
from table2 t2
where t2.table1_id = t1.id and t2.value = 'a'
) and
exists (select 1
from table2 t2
where t2.table1_id = t1.id and t2.value = 'b'
);
This can take advantage of an index on table2(table1_id, value).
You could also write:
select t1.*
from table1 t1
where (select count(distinct t2.value)
from table2 t2
where t2.table1_id = t1.id and t2.value in ('a', 'b')
) = 2 ;
This would probably also have very good performance with the index, if table2 doesn't have duplicates.
SELECT T1.[id], T1.[data]
FROM table1 AS T1
JOIN table2 AS T2
ON T1.[id]=T2.[table1_id]
JOIN table2 AS T3
ON T1.[id]=T3.[table1_id]
WHERE
T2.[Value] ='a'
AND T3.[Value] = 'b'
As Gordon Linoff suggested, exists clause usage works as well and could be performance efficient depending on the data you are playing with.
you have to do several steps to solve the problem:
established which records are related to table 1 and table 2 and which of these are of value (A or B) and eliminate the repeated ones with the group by(InfoRelationate )
validate that only those related to a and b were allowed by means of a count in the table above (ValidateAYB)
see what data meets the condition of table1 and table 2 and joined table 1
this query meets the conditions
with InfoRelationate as
(
select Table2.table1_id,value
from Table2 inner join
Table1 on Table2.table1_id=Table1.id and Table2.value IN('a', 'b')
group by Table2.table1_id,value
),
ValidateAYB as
(
select InfoRelationate.table1_id
from InfoRelationate
group by InfoRelationate.table1_id
having count (1)=2
)
select InfoRelationate.table1_id,InfoRelationate.value
from InfoRelationate
inner join ValidateAYB on InfoRelationate.table1_id=ValidateAYB.table1_id
union all
select id,data
from Table1
Example code

What is the correct way from performance perspective to match(replace) every value in every row in temp table using SQL Server 2016 or 2017?

I am wondering what should I use in SQL Server 2016 or 2017 (CTE, LOOP, JOINS, CURSOR, REPLACE, etc) to match (replace) every value in every row in temp table? What is the best solution from performance perspective?
Source Table
|id |id2|
| 1 | 2 |
| 2 | 1 |
| 1 | 1 |
| 2 | 2 |
Mapping Table
|id |newid|
| 1 | 3 |
| 2 | 4 |
Expected result
|id |id2|
| 3 | 4 |
| 4 | 3 |
| 3 | 3 |
| 4 | 4 |
You may join the second table to the first table twice:
WITH cte AS (
SELECT
t1.id AS id_old,
t1.id2 AS id2_old,
t2a.newid AS id_new,
t2b.newid AS id2_new
FROM table1 t1
LEFT JOIN table2 t2a
ON t1.id = t2a.id
LEFT JOIN table2 t2b
ON t1.id2 = t2b.id
)
UPDATE cte
SET
id_old = id_new,
id2_old = id2_new;
Demo
Not sure if you want just a select here, or maybe an update, or an insert into another table. In any case, the core logic I gave above should work for all these cases.
You'd need to apply joins on update query. Something like this:
Update tblA set column1 = 'something', column2 = 'something'
from actualName tblA
inner join MappingTable tblB
on tblA.ID = tblB.ID
this query will compare eachrow with ids and if matched then it will update/replace the value of the column as you desire. :)
Do the self join only
SELECT t1.id2 as id, t2.id2
FROM table1 t
INNER JOIN table2 t1 on t1.id = t.id
INNER JOIN table2 t2 on t2.id = t.id2
This may have best performance from solutions posted here if you have indexes set appropriately:
select (select [newid] from MappingTable where id = [ST].[id]) [id],
(select [newid] from MappingTable where id = [ST].[id2]) [id2]
from SourecTable [ST]

Using the same table alias twice in a query

My coworker, who is new to ANSI join syntax, recently wrote a query like this:
SELECT count(*)
FROM table1 t1
JOIN table2 t2 ON
(t1.col_a = t2.col_a)
JOIN table3 t3 ON
(t2.col_b = t3.col_b)
JOIN table3 t3 ON
(t3.col_c = t1.col_c);
Note that table3 is joined to both table1 and table2 on different columns, but the two JOIN clauses use the same table alias for table3.
The query runs, but I'm unsure of it's validity. Is this a valid way of writing this query?
I thought the join should be like this:
SELECT count(*)
FROM table1 t1
JOIN table2 t2 ON
(t1.col_a = t2.col_a)
JOIN table3 t3 ON
(t2.col_b = t3.col_b AND
t3.col_c = t1.col_c);
Are the two versions functionally identical? I don't really have enough data in our database yet to be sure.
Thanks.
The first query is a join of 4 tables, the second one is a join of 3 tables. So I don't expect that both queries return the same numbers of rows.
SELECT *
FROM table1 t1
JOIN table2 t2 ON
(t1.col_a = t2.col_a)
JOIN table3 t3 ON
(t2.col_b = t3.col_b)
JOIN table3 t3 ON
(t3.col_c = t1.col_c);
The alias t3 is only used in the ON clause. The alias t3 refers to the table before the ON keyword. I found this out by experimenting. So the pervious query is equvivalent to
SELECT *
FROM table1 t1
JOIN table2 t2 ON
(t1.col_a = t2.col_a)
JOIN table3 t3 ON
(t2.col_b = t3.col_b)
JOIN table3 t4 ON
(t4.col_c = t1.col_c);
and this can be transfotmed in a traditional join
SELECT *
FROM table1 t1,
table2 t2,
table3 t3,
table3 t4
where (t1.col_a = t2.col_a)
and (t2.col_b = t3.col_b)
and (t4.col_c = t1.col_c);
The second query is
SELECT *
FROM table1 t1
JOIN table2 t2 ON
(t1.col_a = t2.col_a)
JOIN table3 t3 ON
(t2.col_b = t3.col_b AND
t3.col_c = t1.col_c);
This can also transformed in a traditional join
SELECT *
FROM table1 t1,
table2 t2,
table3 t3
where (t1.col_a = t2.col_a)
and (t2.col_b = t3.col_b)
AND (t3.col_c = t1.col_c);
These queries seem to be different. To proof their difference we use the following example:
create table table1(
col_a number,
col_c number
);
create table table2(
col_a number,
col_b number
);
create table table3(
col_b number,
col_c number
);
insert into table1(col_a, col_c) values(1,3);
insert into table1(col_a, col_c) values(4,3);
insert into table2(col_a, col_b) values(1,2);
insert into table2(col_a, col_b) values(4,2);
insert into table3(col_b, col_c) values(2,3);
insert into table3(col_b, col_c) values(2,5);
insert into table3(col_b, col_c) values(7,9);
commit;
We get the following output
SELECT *
FROM table1 t1
JOIN table2 t2 ON
(t1.col_a = t2.col_a)
JOIN table3 t3 ON
(t2.col_b = t3.col_b)
JOIN table3 t3 ON
(t3.col_c = t1.col_c)
| COL_A | COL_C | COL_A | COL_B | COL_B | COL_C | COL_B | COL_C |
|-------|-------|-------|-------|-------|-------|-------|-------|
| 1 | 3 | 1 | 2 | 2 | 3 | 2 | 3 |
| 4 | 3 | 4 | 2 | 2 | 3 | 2 | 3 |
| 1 | 3 | 1 | 2 | 2 | 5 | 2 | 3 |
| 4 | 3 | 4 | 2 | 2 | 5 | 2 | 3 |
SELECT *
FROM table1 t1
JOIN table2 t2 ON
(t1.col_a = t2.col_a)
JOIN table3 t3 ON
(t2.col_b = t3.col_b AND
t3.col_c = t1.col_c)
| COL_A | COL_C | COL_A | COL_B | COL_B | COL_C |
|-------|-------|-------|-------|-------|-------|
| 4 | 3 | 4 | 2 | 2 | 3 |
| 1 | 3 | 1 | 2 | 2 | 3 |
The number of rows retrieved is different and so count(*) is different.
The usage of the aliases was surprising. at least for me.
The following query works because t1 in the where_clause references table2.
select *
from table1 t1 join table2 t1 on(1=1)
where t1.col_b<0;
The following query works because t1 in the where_clause references table1.
select *
from table1 t1 join table2 t1 on(1=1)
where t1.col_c<0;
The following query raises an error because both table1 and table2 contain a column col_a.
select *
from table1 t1 join table2 t1 on(1=1)
where t1.col_a<0;
The error thrown is
ORA-00918: column ambiguously defined
The following query works, the alias t1 refers to two different tables in the same where_clause.
select *
from table1 t1 join table2 t1 on(1=1)
where t1.col_b<0 and t1.col_c<0;
These and more examples can be found here: http://sqlfiddle.com/#!4/84feb/12
The smallest counter example
The smallest counter example is
table1
col_a col_c
1 2
table2
col_a col_b
1 3
table3
col_b col_c
3 5
6 2
Here the second query has an empty result set and the first query returns one row. It can be shown that the count(*) of the second query never exeeds the count(*)of the first query.
A more detailed explanation
This behaviour will became more clear if we analyze the following statement in detail.
SELECT t.col_b, t.col_c
FROM table1 t
JOIN table2 t ON
(t.col_b = t.col_c) ;
Here is the reduced syntax for this query in Backus–Naur form derived from the syntax descriptions in the SQL Language Reference of Oracle 12.2. Note that under each syntax diagram there is a link to the Backus–Naur form of this diagram, e.g Description of the illustration select.eps. "reduced" means that I left out all the possibilities that where not used, e,g. the select is defined as
select::=subquery [ for_update_clause ] ;
Our query does not use the optional for_update_clause, so I reduced the rule to
select::=subquery
The only exemption is the optional where-clause. I didn't remove it so that this reduced rules can be used to analyze the above query even if we add a where_clause.
These reduced rule will define only a subset of all possible select statements.
select::=subquery
subquery::=query_block
query_block::=SELECT select_list FROM join_clause [ where_clause ]
join_clause::=table_reference inner_cross_join_clause ...
table_reference::=query_table_expression t_alias query_table_expression::=table
inner_cross_join_clause::=JOIN table_reference ON condition
So our select statement is a query_block and the join_clause is of type
table_reference inner_cross_join_clause
where table_reference is table1 t and inner_cross_join_clause is JOIN table2 t ON (t.col_b = t.col_c). The ellipsis ... means that there could be additional inner_cross_join_clauses, but we do not need this here.
in the inner_cross_join_clause the alias t refers to table2. Only if these references cannot be satisfied the aliasmust be searched in an outer scope. So all the following expressions in the ONcondition are valid:
t.col_b = t.col_c
Here t.col_b is table2.col_b because t refers to the alias of its inner_cross_join_clause, t.col_c is table1.col_c. t of the inner_cross_join_clause (refering to table2) has no column col_c so the outer scope will be searched and an appropriate alias will be found.
If we have the clause
t.col_a = t.col_a
the alias can be found as alias defined in the inner_cross_join_clause to which this ON-condition belongs so t will be resolved to table2.
if the select list consists of
t.col_c, t.col_b, t.col_a
instead of * then the join_clause will be searched for an alias and t.col_c will be resolved to table1.col_c (table2 does not contain a column col_c), t.col_b will be resolved to table2.col_b (table1 does not contain a col_b) but t.col_a will raise the error
ORA-00918: column ambiguously defined
because for the select_list none of the aias definition has a precedenve over the other. If our query also has a where_clause then the aliases are resolved in the same way as if they are used in the select_list.
With more data, it will produce different results.
Your colleagues query is same as this.
select * from table3 where t3.col_b = 'XX'
union
select * from table3 where t3.col_c = 'YY'
or
select * from table3 where t3.col_b = 'XX' or t3.col_c = 'YY'
while your query is like this.
select * from table3 where t3.col_b ='XX' and t3.col_c='YY'
First one is like data where (xx or yy) while second one is data where ( xx and yy)

SQL Join Not Exists

I have the following between these two tables, id column and case column
Left Table Right Table
| id1 | case#1 | | case#1 | id1 |
| id2 | case#1 | | case#1 | id3 |
| id3 | case#2 | | case#2 | id1 |
As you can see case#1 is not assigned to id2 on the right table, therefore I would like to pull the (id2, case#1) record from the left table.
You would need to perform an OUTER JOIN with a multiple JOIN condition:
left_table as l right outer join right_table as r
on l.id = r.id and l.case_ = r.case_
where l.id IS NULL;
This will return non-matching rows from the left table
Have you tried a left join?
LEFT OUTER JOIN table ON table1.field1 = table2.field2
Depends on what you want. If you do INNER JOIN, you get the rows that match in Table1 AND Table2. If you do OUTER JOIN, you get all of Table1 and only the matching rows from Table2.
I've used LEFT JOIN and selected the record with NULL id in the Right Table using WHERE clause. See my query and demo below:
SELECT
*
FROM LeftTable A
LEFT JOIN RightTable B ON A.id=B.id
WHERE B.id IS NULL
DEMO HERE
I would use NOT EXISTS operation in that case.
SELECT l.id, l.value
FROM LeftTable l
WHERE NOT EXISTS (SELECT 1 FROM RightTable r WHERE r.id = l.id and r.case_value = l.case_value)
Efficiency comparing and the difference between NOT EXISTS vs. NOT IN vs. LEFT JOIN WHERE IS NULL are here