A join that "allocates" from available rows - sql

I have a table X I want to update according to the entries in another table Y. The join between them is not unique. However, I want each entry in Y to update a different entry in X.
So if I have table X:
i (unique)  k           v
----------  ----------  ----------
p           100         b
q           101         a
r           202         x
s           301         a
and table Y:
k (unique)  v
----------  ----------
0           a
1           b
2           a
3           c
4           a
I want to end up with table X like:
i           k           v
----------  ----------  ----------
p           1           b
q           0           a
r           202         x
s           2           a
The important result here is that the two rows in X with v = 'a' have been updated to two distinct values of k from Y. (It doesn't matter which ones.)
Currently, this result is achieved by an extra column and a program roughly like:
UPDATE X SET used = FALSE;
for Yk, Yv in Y:
    UPDATE X
    SET k = Yk,
        used = TRUE
    WHERE i IN (SELECT i FROM X
                WHERE v = Yv AND NOT used
                LIMIT 1);
In other words, the distinctness is achieved by "using up" the rows in X, one per row of Y. This doesn't scale well, because it issues one UPDATE per row of Y.
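For reference, the loop above can be written as a runnable baseline. This is only a sketch: it assumes an in-memory SQLite database, the sample data from the question, and an integer `used` column (0/1); the variable names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE X (i TEXT PRIMARY KEY, k INTEGER, v TEXT, used INTEGER DEFAULT 0);
    CREATE TABLE Y (k INTEGER PRIMARY KEY, v TEXT);
    INSERT INTO X (i, k, v) VALUES ('p',100,'b'), ('q',101,'a'), ('r',202,'x'), ('s',301,'a');
    INSERT INTO Y (k, v) VALUES (0,'a'), (1,'b'), (2,'a'), (3,'c'), (4,'a');
""")

# Reset the bookkeeping column before allocating.
conn.execute("UPDATE X SET used = 0")

# Fetch Y up front so we are not reading a cursor while updating X.
for yk, yv in conn.execute("SELECT k, v FROM Y ORDER BY k").fetchall():
    # Claim at most one unused row of X with a matching v.
    conn.execute(
        """
        UPDATE X SET k = ?, used = 1
        WHERE i IN (SELECT i FROM X WHERE v = ? AND NOT used LIMIT 1)
        """,
        (yk, yv),
    )

result = dict(conn.execute("SELECT i, k FROM X"))
print(result)
```

Which of `q` and `s` receives which value is not specified, because the inner `LIMIT 1` has no `ORDER BY`; only the distinctness is guaranteed.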
(I'm using SQLite3 and Python, but don't let that limit you.)

This can be solved by using rowids to pair up the results of a join. Window functions aren't necessary. (Thanks to xQbert for pointing me in this direction.)
First, we sort the two tables by v to make tables with rowids in a suitable order for the join.
CREATE TEMPORARY TABLE Xv AS SELECT * FROM X ORDER BY v;
CREATE TEMPORARY TABLE Yv AS SELECT * FROM Y ORDER BY v;
Then we can pick out the minimum rowid for each value of v in order to create a "zip join" for that value, pairing up the rows.
SELECT i, Yv.k, Xv.v
FROM Xv JOIN Yv USING (v)
JOIN (SELECT v, min(Xv.rowid) AS r FROM Xv GROUP BY v) AS xmin USING (v)
JOIN (SELECT v, min(Yv.rowid) AS r FROM Yv GROUP BY v) AS ymin
ON ymin.v = Xv.v AND Xv.rowid - xmin.r = Yv.rowid - ymin.r;
The clause Xv.rowid - xmin.r = Yv.rowid - ymin.r is the trick: it does a pairwise match of rows with the same value of v, essentially allocating one to the other. The result:
i           k           v
----------  ----------  ----------
q           0           a
s           2           a
p           1           b
It's then a simple matter to use the result of this query in an UPDATE.
WITH changes AS (<the SELECT above>)
UPDATE X SET k = (SELECT k FROM changes WHERE i = X.i)
WHERE i IN (SELECT i FROM changes);
The temporary tables could be restricted to the common values of v, and possibly indexed on v, if the tables are large.
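Putting the steps together, here is a minimal end-to-end sketch using Python's sqlite3 module. It assumes an in-memory database and the sample data from the question. Like the answer, it relies on CREATE TABLE ... AS SELECT ... ORDER BY assigning rowids in sorted order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE X (i TEXT PRIMARY KEY, k INTEGER, v TEXT);
    CREATE TABLE Y (k INTEGER PRIMARY KEY, v TEXT);
    INSERT INTO X VALUES ('p',100,'b'), ('q',101,'a'), ('r',202,'x'), ('s',301,'a');
    INSERT INTO Y VALUES (0,'a'), (1,'b'), (2,'a'), (3,'c'), (4,'a');

    -- Sorted copies whose rowids follow the order of v.
    CREATE TEMPORARY TABLE Xv AS SELECT * FROM X ORDER BY v;
    CREATE TEMPORARY TABLE Yv AS SELECT * FROM Y ORDER BY v;

    -- Pair the n-th row of each v-group in Xv with the n-th row of the
    -- same v-group in Yv, then write the paired k back into X.
    WITH changes AS (
        SELECT Xv.i AS i, Yv.k AS k
        FROM Xv JOIN Yv USING (v)
        JOIN (SELECT v, min(rowid) AS r FROM Xv GROUP BY v) AS xmin USING (v)
        JOIN (SELECT v, min(rowid) AS r FROM Yv GROUP BY v) AS ymin
          ON ymin.v = Xv.v AND Xv.rowid - xmin.r = Yv.rowid - ymin.r
    )
    UPDATE X SET k = (SELECT k FROM changes WHERE i = X.i)
    WHERE i IN (SELECT i FROM changes);
""")

result = dict(conn.execute("SELECT i, k FROM X"))
print(result)
```

Row `r` keeps its original k because no row of Y has v = 'x'. The two 'a' rows receive two different k values taken from Y's 'a' rows; which two is left to the sort's handling of ties.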
I'd welcome refinements (or bugs!)

Related

"SELECT column ... WHERE", knowing value of other column, can't use subquery

I'm working on this SELECT query:
SELECT * FROM x
JOIN y ON x.struktorg = y.ref
WHERE y.rodzic = (SELECT ref FROM y WHERE y.symbol = 'the value i know')
The goal is to not use subselect. I know the value of symbol column, but the table that I need to get results from doesn't use it. It uses the reference number of that value.
You can join to y one more time:
SELECT * FROM x
JOIN y y1 ON x.struktorg = y1.ref
join y y2
ON y1.rodzic = y2.ref
and y2.symbol = 'the value i know'
But I don't see any benefit in using a join over a subquery in this scenario.
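To check that the self-join returns the same rows as the subquery form, here is a small sketch. SQLite is used only for convenience (the question concerns another engine), and the `id` column, the sample rows, and the symbol 'HQ' are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: y is a hierarchy, rodzic points at the parent's ref.
    CREATE TABLE y (ref INTEGER PRIMARY KEY, rodzic INTEGER, symbol TEXT);
    CREATE TABLE x (id INTEGER PRIMARY KEY, struktorg INTEGER);
    INSERT INTO y VALUES (1, NULL, 'HQ'), (2, 1, 'SALES'), (3, 1, 'IT'), (4, 2, 'EU');
    INSERT INTO x VALUES (10, 2), (11, 3), (12, 4), (13, 1);
""")

# Original form: look up the parent's ref with a scalar subquery.
subquery_ids = [row[0] for row in conn.execute("""
    SELECT x.id FROM x
    JOIN y ON x.struktorg = y.ref
    WHERE y.rodzic = (SELECT ref FROM y WHERE y.symbol = ?)
    ORDER BY x.id
""", ("HQ",))]

# Rewritten form: join y a second time under a different alias.
selfjoin_ids = [row[0] for row in conn.execute("""
    SELECT x.id FROM x
    JOIN y AS y1 ON x.struktorg = y1.ref
    JOIN y AS y2 ON y1.rodzic = y2.ref AND y2.symbol = ?
    ORDER BY x.id
""", ("HQ",))]

print(subquery_ids, selfjoin_ids)
```

Both queries select the x rows whose unit has 'HQ' as its parent. The two aliases y1 and y2 are what keep the "child" and "parent" roles of y apart.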
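If the subquery table y is the same as the joined y, you can fold everything into one join, but this only matches a y row that is its own parent (rodzic = ref) and carries the symbol itself, which is narrower than the original query: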
SELECT *
FROM x
JOIN y ON x.struktorg = y.ref and y.rodzic = y.ref and y.symbol = 'the value i know'
If the subquery table y is different from the joined y, you can do this, renaming the subquery table y to z:
SELECT * FROM x
JOIN y ON x.struktorg = y.ref
JOIN z ON y.rodzic = z.ref and z.symbol = 'the value i know'
I would go around the sub-select by creating a temporary table first, like in the example below:
SELECT ref INTO #TEMP_TABLE FROM y WHERE y.symbol = 'the value i know'
Then I would join on that temporary table I created like in the example here:
SELECT * FROM x
JOIN y ON x.struktorg = y.ref
JOIN #TEMP_TABLE z on z.ref = y.rodzic
Having said that, I am sure that the above solution works effectively for SQL Server. However, I've never used Firebird, so the principles there might be different.

Using the result from a subquery elsewhere in the query

I have the following pseudo-sqlite call:
SELECT x, y,
(SELECT --very long SQL call--) AS z,
(SELECT a FROM diff_table_name WHERE b = z) AS e
FROM table_name
WHERE c = d
Essentially I want to use the z variable result from the first subquery in the second subquery, but I get a
no such column: z
error when I do. I can repeat the very long SQL call in the second subquery and that works, but I was hoping to not have to do that. Or maybe there's a way to return both a and z from one subquery?
This part of your query:
SELECT x, y,
(SELECT --very long SQL call--) AS z
FROM table_name
WHERE c = d
can be safely wrapped inside a CTE, after which the outer query can use the value of z:
WITH cte AS (
SELECT x, y,
(SELECT --very long SQL call--) AS z
FROM table_name
WHERE c = d
)
SELECT x, y, z,
(SELECT a FROM diff_table_name WHERE b = z) AS e
FROM cte
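A runnable sketch of the same pattern follows. The column types, the sample rows, and the expression `(SELECT x * 10)` are stand-ins, since the actual schema and the very long subquery are not given in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema and data, for illustration only.
    CREATE TABLE table_name (x INTEGER, y INTEGER, c INTEGER, d INTEGER);
    CREATE TABLE diff_table_name (a TEXT, b INTEGER);
    INSERT INTO table_name VALUES (1, 5, 7, 7), (2, 6, 7, 8), (3, 9, 4, 4);
    INSERT INTO diff_table_name VALUES ('ten', 10), ('twenty', 20);
""")

rows = conn.execute("""
    WITH cte AS (
        SELECT x, y,
               (SELECT x * 10) AS z   -- stand-in for the very long subquery
        FROM table_name
        WHERE c = d
    )
    SELECT x, y, z,
           (SELECT a FROM diff_table_name WHERE b = z) AS e
    FROM cte
    ORDER BY x
""").fetchall()

print(rows)
```

Inside the outer SELECT, z is an ordinary column of cte, so the second subquery can reference it without repeating the long expression. A row whose z has no match in diff_table_name gets NULL for e.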

Does a join operation differ when the distinct count is 1 versus n?

Let's say I have two tables, df and df2, that I want to join.
df : name , classId
df2: classId, time
a) df.classid.distinct().count() = 1
b) df.classid.distinct().count() = n , n < 500
c) df.classid.distinct().count() = n , n > 100000
If I want to perform a join, will it be different for these 3 scenarios?
The way a join is written does not depend on the number of rows or distinct values in a table. It's always written in the same way. In your case you are looking for:
select df.name, df.classid, df2.time
from df
join df2 on df.classid = df2.classid;

How to use union in select clause?

I have two tables (created here for example) like:
X1 Y1
a 1
b 2
c 3
d 4
and
X2 Y2
a 5
m 6
n 7
b 4
And I want the output column:
X Y1 y2
a 1 5
b 2 4
c 3 0
d 4 0
m 0 6
n 0 7
What I tried is:
SELECT (A.X1 UNION B.X1) AS X, A.Y1,B.Y2
FROM A FULL OUTER JOIN B
ON A.X1 = B.X2
(the query mentioned above is just sample).
After executing this query I am getting error message:
Syntax error: near UNION in select clause
Can someone tell me what is wrong here. Is there any other option to get the output table in the mentioned format?
union is used to join results one after another. You're attempting to join results side by side (which you already did!). The only thing you're missing is a coalesce call to handle the missing values:
SELECT COALESCE(a.x1, b.x2) AS x,
COALESCE(a.y1, 0) AS y1,
COALESCE(b.y2, 0) AS y2
FROM a
FULL OUTER JOIN b on a.x1 = b.x2
You can try COALESCE.
The COALESCE function returns the first of its arguments that is not null. Null is returned only if all arguments are null.
SELECT COALESCE(A.X1,B.X2) AS X, COALESCE(A.Y1, 0) AS Y1, COALESCE(B.Y2, 0) AS Y2
FROM A FULL OUTER JOIN B
ON A.X1 = B.X2
SELECT COALESCE(a.x1, b.x2) AS X, COALESCE(a.y1, 0) AS Y1, COALESCE(b.y2, 0) AS Y2
FROM a
FULL OUTER JOIN
b ON a.x1 = b.x2
You don't need UNION here; UNION appends the result set of one SELECT to the result set of a different SELECT.
You just need your join with the correct ON clause (which you already have), and to take x from whichever table supplies it, since x1 equals x2 within a matched row.
EDIT: Added coalesce statements to my query to return value for x if a.x1 does not exist but b.x2 does exist, also added 0 if a field doesn't exist for y1 or y2
The error occurs because UNION is not an operator that can be used in the column list; it works at the set level. You can UNION two SELECTs like:
SELECT * FROM table1
UNION
SELECT * FROM table2
They just need to have the same number of columns, with compatible types.
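To tie the two ideas together: on an engine without FULL OUTER JOIN (assumed here; older SQLite releases lack it), the same output can be produced by stacking a LEFT JOIN and the unmatched rows of the second table with UNION ALL. This is a substitute technique, not the query from the answers above, and the sketch assumes tables named a and b holding the sample data from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x1 TEXT, y1 INTEGER);
    CREATE TABLE b (x2 TEXT, y2 INTEGER);
    INSERT INTO a VALUES ('a',1), ('b',2), ('c',3), ('d',4);
    INSERT INTO b VALUES ('a',5), ('m',6), ('n',7), ('b',4);
""")

rows = conn.execute("""
    -- Every row of a, with y2 filled in where b has a match.
    SELECT a.x1 AS x, a.y1 AS y1, COALESCE(b.y2, 0) AS y2
    FROM a LEFT JOIN b ON a.x1 = b.x2
    UNION ALL
    -- Rows of b that have no partner in a.
    SELECT b.x2, 0, b.y2
    FROM b
    WHERE NOT EXISTS (SELECT 1 FROM a WHERE a.x1 = b.x2)
    ORDER BY x
""").fetchall()

for row in rows:
    print(row)
```

Here UNION ALL does what it is designed for, placing one result set after another, while the side-by-side pairing still comes from the join.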

Execute query for each pair of values from a list

I have a list of value pairs over which I iterate and run a query, the skeleton of which could be thought of like this.
list of pairs - ((x1,y1), (x2,y2), ... (xn,yn)) xi, yi are not all distinct.
q is an Oracle query which returns a single value for any (xi,yi)
global_table is a single row table with
id col deleted
1 Y NULL
A few rows from 'table':
id col deleted pid did
1 NULL Y 25 1
81 N NULL NULL 149
101 Y NULL 22 149
61 Y NULL NULL NULL
Also, there is a UNIQUE constraint on (pid, did, deleted) in table.
The query q goes like this.
select w.finalcol from
(select coalesce(a.col,b.col,c.col,d.col) as finalcol from
(select * from global_table where deleted is null) a
left outer join
(select * from table where deleted is null) b
on b.pid is null and b.did is null
left outer join
(select * from table where deleted is null) c
on c.pid is null and c.did = xi
left outer join
(select * from table where deleted is null) d
on d.pid = yi and d.did = xi
) w
n = 60
n is determined by another query which returns the list of value pairs.
for element in (list of pairs)
q(xi,yi) (xi and yi might be used any number of times in the query)
I am trying to reduce the number of times I run this query. (from n times to 1)
I could try isolating the individual lists from the list of pairs and passing them to the query, but the catch is that not all pairs are present in the table(s) being queried. You still get a value for pairs that don't exist in the table(s), since there is a default case at play here (not important).
The default case is when
select * from table where deleted is null
and c.pid is null and c.did = xi
select * from table where deleted is null
and c.pid = yi and c.did = xi
don't return any rows.
I want my result to be of the form
x1 y1 q(x1,y1)
x2 y2 q(x2,y2)
.
.
.
xn yn q(xn,yn)
(no pair must be left out, given that a few pairs might actually not be in the table)
How do I achieve this using just a single query?
Thanks