Choosing MAX values by the "where=" statement - sql

Suppose I have table A with columns a1, a2 and table B with columns b1, b2.
I need to join them like this:
proc sql;
create C as
select a1, b1
from A as t1
left join B( where=(b1=max(select b1 from B)) as t2
on t1.a2 = t2.b2
run;
The problem is in where=(a1=max(select a1 from A)). For some reason it doesn't work. I need a where= solution because B is big and where= is really fast.

Your condition is on the first table. Hence, in a left join, such a condition usually goes in the where clause. Conditions on the second table would go in the on clause.
One method of doing what you want is to use a subquery:
proc sql;
create table C as
select a1, b1
from A t1 left join
B t2
on t1.a2 = t2.b2
where t1.a1 = (select max(tt1.a1) from A tt1);
quit;

It seems you only got the syntax wrong. This gets you the B record where b2 matches a2 and b1 is the maximum b1 value in the table.
create table c as
select a.a1, b.b1
from a
left join b on b.b2 = a.a2
and b.b1 = (select max(b1) from b);
Or are you simply trying to get the maximum b1 from all B records where b2 matches a2? That would be:
create table c as
select a.a1, max(b.b1)
from a
left join b on b.b2 = a.a2
group by a.a1;
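To see how the two queries above differ, here is a minimal sketch that runs both through Python's built-in sqlite3 module rather than SAS. The sample rows are made up for illustration; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (a1 INTEGER, a2 TEXT);
    CREATE TABLE b (b1 INTEGER, b2 TEXT);
    -- Hypothetical sample data, not from the question.
    INSERT INTO a VALUES (1, 'x'), (2, 'y');
    INSERT INTO b VALUES (10, 'x'), (30, 'x'), (20, 'y');
""")

# Variant 1: a B row can only match if it holds the table-wide maximum b1.
on_filter = conn.execute("""
    SELECT a.a1, b.b1
    FROM a
    LEFT JOIN b ON b.b2 = a.a2
               AND b.b1 = (SELECT MAX(b1) FROM b)
    ORDER BY a.a1
""").fetchall()

# Variant 2: the largest b1 among the B rows that match each A row.
per_group = conn.execute("""
    SELECT a.a1, MAX(b.b1)
    FROM a
    LEFT JOIN b ON b.b2 = a.a2
    GROUP BY a.a1
    ORDER BY a.a1
""").fetchall()

print(on_filter)  # [(1, 30), (2, None)]
print(per_group)  # [(1, 30), (2, 20)]
```

With this data the second A row gets no b1 from the first query, because its only matching B row does not hold the overall maximum, while the second query still returns that row's own maximum.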

Related

Select entries that have non repeating values on a specific column (although other columns may have repeating or non repeating values) (SQL)

Let's say I have the following table:
A   B   C   D
a1  b1  c1  d1
a1  b1  c1  d2
a2  b2  c3  d3
a2  b2  c4  d3
I want to filter and see all four columns for entries that have the same value in column A but a different value in column C, so I get only this as a result:
A   B   C   D
a2  b2  c3  d3
a2  b2  c4  d3
I don't really care if the values in columns B and D are the same or different, although I would like to have them in my table to do further analysis later.
Using the DISTINCT statement would give me all the rows as a result, since every row differs from the others in at least one column, so that doesn't work for me.
I read some questions (like this one) and the answers recommended using the row_number() over(partition by...) clause, although the use they gave it doesn't quite fit my problem (I think), as it would also return the first row with a repeating value on column C.
Any ideas how this could be done?
You can use exists:
select t.*
from t
where exists (select 1
from t t2
where t2.a = t.a and t2.c <> t.c
)
order by t.a;
You could use a self join
select t1.*
from t t1
join t t2 on t1.a=t2.a and t1.c<>t2.c
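The two answers agree on the question's sample data, but they can diverge once a value of A has three or more distinct values of C. The sketch below, using Python's sqlite3 module, adds three hypothetical 'a3' rows (not in the question) to show this: the self join then returns each of those rows more than once, which DISTINCT can undo as long as the table has no fully identical rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (a TEXT, b TEXT, c TEXT, d TEXT);
    INSERT INTO t VALUES
        ('a1', 'b1', 'c1', 'd1'),
        ('a1', 'b1', 'c1', 'd2'),
        ('a2', 'b2', 'c3', 'd3'),
        ('a2', 'b2', 'c4', 'd3'),
        -- Hypothetical extra rows: three different c values for one a.
        ('a3', 'b3', 'c5', 'd5'),
        ('a3', 'b3', 'c6', 'd5'),
        ('a3', 'b3', 'c7', 'd5');
""")

# EXISTS keeps each qualifying row exactly once.
exists_rows = conn.execute("""
    SELECT t.* FROM t
    WHERE EXISTS (SELECT 1 FROM t t2 WHERE t2.a = t.a AND t2.c <> t.c)
    ORDER BY t.a, t.c
""").fetchall()

# The plain self join emits one output row per matching partner row.
join_rows = conn.execute("""
    SELECT t1.* FROM t t1
    JOIN t t2 ON t1.a = t2.a AND t1.c <> t2.c
    ORDER BY t1.a, t1.c
""").fetchall()

# DISTINCT collapses the repeated rows again.
distinct_join_rows = conn.execute("""
    SELECT DISTINCT t1.* FROM t t1
    JOIN t t2 ON t1.a = t2.a AND t1.c <> t2.c
    ORDER BY t1.a, t1.c
""").fetchall()

print(len(exists_rows), len(join_rows), len(distinct_join_rows))  # 5 8 5
```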

Update values from one table in another

I have tables A,B and C. I need to select certain values from all rows in Table A and then for every row in A, I need to update values in Table B and Table C.
Pseudo Code would look like this:
SELECT A1, A2, A3, A4 FROM Table A
UPDATE Table B SET B2=A2, B3=A3, B4=A4 WHERE B1 = A1;
UPDATE Table C SET C2=A2, C3=A3, C4=A4 WHERE C1 = A1;
How can I achieve this?
In Oracle, you would use two update statements:
update b
set (b2, b3, b4) = (select a2, a3, a4 from a where a.a1 = b.b1);
update c
set (c2, c3, c4) = (select a2, a3, a4 from a where a.a1 = c.c1);
You need two update statements. You could use an inline view:
update (
select a.a2, a.a3, a.a4 , b.b2, b.b3, b.b4
from a
inner join b on b.b1 = a.a1
) u
set u.b2 = u.a2, u.b3 = u.a3, u.b4 = u.a4
The upside is that this only updates matching rows. With the correlated subquery technique, rows without a match are set to NULL unless you repeat the subquery in the where clause.
Another neat syntax that does what you want in Oracle is merge:
merge into b
using a on (a.a1 = b.b1)
when matched then update set b.b2 = a.a2, b.b3 = a.a3, b.b4 = a.a4;
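The remark about matching rows is easy to miss, so here is a rough sketch of it. It uses SQLite through Python's sqlite3 module as a stand-in for Oracle (an assumption; reasonably recent SQLite versions accept the same row-value UPDATE form), with invented sample rows. Row 2 of b has no partner in a.

```python
import sqlite3

def make_db():
    """Build a fresh in-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE a (a1 INTEGER PRIMARY KEY, a2 TEXT, a3 TEXT, a4 TEXT);
        CREATE TABLE b (b1 INTEGER PRIMARY KEY, b2 TEXT, b3 TEXT, b4 TEXT);
        INSERT INTO a VALUES (1, 'new2', 'new3', 'new4');
        INSERT INTO b VALUES (1, 'old', 'old', 'old'), (2, 'keep', 'keep', 'keep');
    """)
    return conn

# Correlated subquery without a WHERE clause: every row of b is updated,
# so the unmatched row receives NULLs.
conn = make_db()
conn.execute("""
    UPDATE b
    SET (b2, b3, b4) = (SELECT a2, a3, a4 FROM a WHERE a.a1 = b.b1)
""")
unguarded = conn.execute("SELECT * FROM b ORDER BY b1").fetchall()

# Repeating the correlation in WHERE EXISTS restricts the update to matches.
conn = make_db()
conn.execute("""
    UPDATE b
    SET (b2, b3, b4) = (SELECT a2, a3, a4 FROM a WHERE a.a1 = b.b1)
    WHERE EXISTS (SELECT 1 FROM a WHERE a.a1 = b.b1)
""")
guarded = conn.execute("SELECT * FROM b ORDER BY b1").fetchall()

print(unguarded)  # row 2 ends up as (2, None, None, None)
print(guarded)    # row 2 keeps its original values
```

The inline-view and merge forms avoid the problem by construction, since they only touch rows that join.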

Using AND in an INNER JOIN

I am fairly new to SQL and would like to understand the logic below.
SELECT *
FROM Table A A1
INNER JOIN TABLE B B1 ON B1.ID = A1.ID AND A1 = 'TASK';
Not sure if this is a clear detail but please let me know. Thanks!
SELECT *
FROM Table A A1
INNER JOIN TABLE B B1 ON B1.ID = A1.ID AND A1.Column = 'TASK'
is the same as
SELECT *
FROM Table A A1
INNER JOIN TABLE B B1 ON B1.ID = A1.ID
WHERE A1.Column = 'TASK'
It's even the same performance-wise; it's just a different way to write the query. In very large queries it can be more readable to put the AND directly on the INNER JOIN instead of "hiding" it in the WHERE part.
This wouldn't run at all
SELECT *
FROM Table A A1 INNER JOIN
TABLE B B1
ON B1.ID = A1.ID AND A1 = 'TASK';
This will run because I added a column name (SomeColumn):
SELECT *
FROM Table A A1 INNER JOIN
TABLE B B1
ON B1.ID = A1.ID AND A1.SomeColumn = 'TASK';
And is the same as this
SELECT *
FROM Table A A1 INNER JOIN
TABLE B B1
ON B1.ID = A1.ID
WHERE A1.SomeColumn = 'TASK';
Whenever you join to a constant it is pretty much the same as adding an additional criterion to the where clause. The only reason to put it up with the join is for code clarity.
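A quick way to convince yourself of this equivalence for an INNER JOIN is to run both forms against the same data. The sketch below uses Python's sqlite3 module; the table names table_a and table_b, the column names, and the rows are all hypothetical stand-ins for the question's placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER, some_column TEXT);
    CREATE TABLE table_b (id INTEGER, note TEXT);
    INSERT INTO table_a VALUES (1, 'TASK'), (2, 'OTHER'), (3, 'TASK');
    INSERT INTO table_b VALUES (1, 'first'), (2, 'second'), (4, 'orphan');
""")

# Constant comparison written as part of the join condition.
in_on = conn.execute("""
    SELECT a1.id, a1.some_column, b1.note
    FROM table_a a1
    INNER JOIN table_b b1 ON b1.id = a1.id AND a1.some_column = 'TASK'
    ORDER BY a1.id
""").fetchall()

# Same comparison moved to the WHERE clause.
in_where = conn.execute("""
    SELECT a1.id, a1.some_column, b1.note
    FROM table_a a1
    INNER JOIN table_b b1 ON b1.id = a1.id
    WHERE a1.some_column = 'TASK'
    ORDER BY a1.id
""").fetchall()

print(in_on)     # [(1, 'TASK', 'first')]
print(in_where)  # [(1, 'TASK', 'first')]
```

Both queries return only the row with id 1: id 2 fails the 'TASK' test and id 3 has no partner in table_b.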
SELECT * -- Select all the columns
FROM TABLE A A1 -- From table A. A1 is an alias (a nickname) you are giving table A. Instead of typing A.ColumnName (A could be a very long name) you just type A1.ColumnName
INNER JOIN TABLE B B1 -- You are inner joining tables A and B. Again, B1 is just an alias.
ON B1.ID = A1.ID -- This is the column that the two tables have in common (the relationship column). The values need to match for rows to be joined.
AND A1 = 'TASK' -- This is saying you are joining only where A1 equals 'TASK'; note that A1 is the table alias, so a column name is missing here

Concatenating two tables distributively

I'm not 100% sure how to phrase the question, but I'm pretty much trying to do this:
say I have two tables:
table a:
a1
a2
and
table b:
b1
b2
I want to combine them and create a table such as:
a1 b1
a1 b2
a2 b1
a2 b2
(for every row in table a, create one row per row in table b, sort of)
I figure I'd be able to do this using a loop of some sort, but I was wondering if there was any way to do this with set logic?
The syntax you're looking for is a cross join:
SELECT a.*, b.*
FROM a
CROSS JOIN b
You don't need any loops.
This is a very simple task in SQL.
You can do:
select a.*, b.*
from a
cross join b
or:
select a.*, b.*
from a
inner join b on (1=1)
No need for a loop; a simple one-line query will work.
SELECT a.*, b.* FROM a,b
Note: A comma-separated table list with no join condition is a cross join by default, so there is no need to write the CROSS JOIN keyword.
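For anyone who wants to check that the three spellings really give the same rows, here is a small sketch using Python's sqlite3 module. The single column name val is an assumption, since the question does not name the columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (val TEXT);
    CREATE TABLE b (val TEXT);
    INSERT INTO a VALUES ('a1'), ('a2');
    INSERT INTO b VALUES ('b1'), ('b2');
""")

# The three forms suggested in the answers, each with a fixed ordering
# so the results can be compared directly.
queries = {
    "cross join": "SELECT a.val, b.val FROM a CROSS JOIN b ORDER BY a.val, b.val",
    "inner join on 1=1": "SELECT a.val, b.val FROM a INNER JOIN b ON (1=1) ORDER BY a.val, b.val",
    "comma list": "SELECT a.val, b.val FROM a, b ORDER BY a.val, b.val",
}
results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}

for name, rows in results.items():
    print(name, rows)
# Each form prints [('a1', 'b1'), ('a1', 'b2'), ('a2', 'b1'), ('a2', 'b2')]
```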

filter duplicates in SQL join

When using a SQL join, is it possible to keep only rows that have a single row for the left table?
For example:
select * from A, B where A.id = B.a_id;
a1 b1
a2 b1
a2 b2
In this case, I want to remove all except the first row, where a single row from A matched exactly 1 row from B.
I'm using MySQL.
This should work in MySQL:
select * from A, B where A.id = B.a_id GROUP BY A.id HAVING COUNT(*) = 1;
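For those of you not using MySQL, you will need to use aggregate functions (like min() or max()) on all the columns (except A.id) so your database engine doesn't complain. The same applies to MySQL itself when the ONLY_FULL_GROUP_BY SQL mode is enabled, which is the default from MySQL 5.7.5 onward.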
It helps if you specify the keys of your tables when asking a question such as this. It isn't obvious from your example what the key of B might be (assuming it has one).
Here's a possible solution assuming that ID is a candidate key of table B.
SELECT *
FROM A, B
WHERE B.id =
(SELECT MIN(B.id)
FROM B
WHERE A.id = B.a_id);
First, I would recommend using the JOIN syntax instead of the outdated syntax of separating tables by commas. Second, if A.id is the primary key of the table A, then you need only inspect table B for duplicates:
Select ...
From A
Join B
On B.a_id = A.id
Where Exists (
Select 1
From B B2
Where B2.a_id = A.id
Having Count(*) = 1
)
The following query avoids the cost of counting matching rows, which can be expensive for large tables.
As usual, when comparing possible solutions, benchmarking and comparing the execution plans is suggested.
select
*
from
A
join B on A.id = B.a_id
where
not exists (
select
1
from
B B2
where
A.id = b2.a_id
and b2.id != b.id
)
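Here is a minimal sketch comparing the counting approach with the NOT EXISTS approach, run through Python's sqlite3 module instead of MySQL. The name columns and the extra row a3 (which has no match in B) are assumptions added for illustration. The counting variant is written as a scalar COUNT(*) comparison rather than HAVING without GROUP BY, because some SQLite versions reject the latter; it is meant to express the same condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE B (id INTEGER PRIMARY KEY, a_id INTEGER, name TEXT);
    -- a1 matches one B row, a2 matches two, a3 (hypothetical) matches none.
    INSERT INTO A VALUES (1, 'a1'), (2, 'a2'), (3, 'a3');
    INSERT INTO B VALUES (1, 1, 'b1'), (2, 2, 'b1'), (3, 2, 'b2');
""")

# Counting variant: keep the join row only if A has exactly one match in B.
by_count = conn.execute("""
    SELECT A.name, B.name
    FROM A
    JOIN B ON B.a_id = A.id
    WHERE (SELECT COUNT(*) FROM B B2 WHERE B2.a_id = A.id) = 1
""").fetchall()

# NOT EXISTS variant: keep the join row only if no other B row
# points at the same A row.
by_not_exists = conn.execute("""
    SELECT A.name, B.name
    FROM A
    JOIN B ON A.id = B.a_id
    WHERE NOT EXISTS (
        SELECT 1 FROM B B2
        WHERE B2.a_id = A.id AND B2.id != B.id
    )
""").fetchall()

print(by_count)       # [('a1', 'b1')]
print(by_not_exists)  # [('a1', 'b1')]
```

Note that the NOT EXISTS form relies on B.id uniquely identifying rows of B, as the earlier answer also assumed.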