Execute query for each pair of values from a list - SQL

I have a list of value pairs over which I iterate, running a query for each pair; the skeleton could be thought of like this.
List of pairs: ((x1,y1), (x2,y2), ... (xn,yn)); the xi and yi are not all distinct.
q is an Oracle query which returns a single value for any (xi,yi).
global_table is a single-row table:

id  col  deleted
1   Y    NULL
A few rows from 'table':

id   col   deleted  pid   did
1    NULL  Y        25    1
81   N     NULL     NULL  149
101  Y     NULL     22    149
61   Y     NULL     NULL  NULL
Also, there is a UNIQUE constraint on (pid, did, deleted) in table.
The query q goes like this:

select w.finalcol from
(select coalesce(a.col, b.col, c.col, d.col) as finalcol from
    (select * from global_table where deleted is null) a
    left outer join
    (select * from table where deleted is null) b
      on b.pid is null and b.did is null
    left outer join
    (select * from table where deleted is null) c
      on c.pid is null and c.did = xi
    left outer join
    (select * from table where deleted is null) d
      on d.pid = yi and d.did = xi
) w
n = 60; n is determined by another query, which returns the list of value pairs. The loop is:

for each (xi, yi) in (list of pairs):
    q(xi, yi)    -- xi and yi might be used any number of times in the query
I am trying to reduce the number of times I run this query, from n times to 1.
I could isolate the xs and ys from the list of pairs and pass them to the query as individual lists, but the catch is that not all pairs are present in the table(s) being queried. Even so, you do get a value from the table for pairs that don't exist in the table(s), since there is a default case at play here (the details aren't important).
The default case is when neither

select * from table where deleted is null
  and pid is null and did = xi

nor

select * from table where deleted is null
  and pid = yi and did = xi

returns any rows.
I want my result to be of the form:

x1  y1  q(x1,y1)
x2  y2  q(x2,y2)
...
xn  yn  q(xn,yn)

No pair may be left out, given that a few pairs might actually not be in the table.
How do I achieve this using just a single query?
Thanks
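One common shape for collapsing the loop into a single query is to materialize the list of pairs as a driving table (in Oracle, a collection or a global temporary table) and LEFT JOIN it to the data, so every pair produces a row and COALESCE supplies the default. Here's a minimal sketch of the idea in SQLite via Python; the table names (pairs, lookup) and the default value are illustrative, not from the question:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE lookup (x INTEGER, y INTEGER, col TEXT);
INSERT INTO lookup VALUES (1, 10, 'A'), (2, 20, 'B');
CREATE TABLE pairs (x INTEGER, y INTEGER);           -- the list of (xi, yi)
INSERT INTO pairs VALUES (1, 10), (2, 20), (3, 30);  -- (3, 30) has no match
""")

# Driving the query from the pairs table guarantees one output row per
# pair; COALESCE supplies the default for pairs absent from lookup.
rows = con.execute("""
    SELECT p.x, p.y, COALESCE(l.col, 'DEFAULT') AS q
    FROM pairs p
    LEFT JOIN lookup l ON l.x = p.x AND l.y = p.y
    ORDER BY p.x
""").fetchall()
print(rows)  # [(1, 10, 'A'), (2, 20, 'B'), (3, 30, 'DEFAULT')]
```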

Related

A join that "allocates" from available rows

I have a table X I want to update according to the entries in another table Y. The join between them is not unique. However, I want each entry in Y to update a different entry in X.
So if I have table X:
i (unique)  k           v
----------  ----------  ----------
p           100         b
q           101         a
r           202         x
s           301         a
and table Y:
k (unique)  v
----------  ----------
0           a
1           b
2           a
3           c
4           a
I want to end up with table X like:
i           k           v
----------  ----------  ----------
p           1           b
q           0           a
r           202         x
s           2           a
The important result here is that the two rows in X with v = 'a' have been updated to two distinct values of k from Y. (It doesn't matter which ones.)
Currently, this result is achieved by an extra column and a program roughly like:
UPDATE X SET X.used = FALSE;

for Yk, Yv in Y:
    UPDATE X
    SET X.k = Yk,
        X.used = TRUE
    WHERE X.i IN (SELECT X.i FROM X
                  WHERE X.v = Yv AND NOT X.used
                  LIMIT 1);
In other words, the distinctness is achieved by "using up" the rows in Y. This doesn't scale well.
(I'm using SQLite3 and Python, but don't let that limit you.)
This can be solved by using rowids to pair up the results of a join. Window functions aren't necessary. (Thanks to xQbert for pointing me in this direction.)
First, we sort the two tables by v to make tables with rowids in a suitable order for the join.
CREATE TEMPORARY TABLE Xv AS SELECT * FROM X ORDER BY v;
CREATE TEMPORARY TABLE Yv AS SELECT * FROM Y ORDER BY v;
Then we can pick out the minimum rowid for each value of v in order to create a "zip join" for that value, pairing up the rows.
SELECT i, Yv.k, Xv.v
FROM Xv JOIN Yv USING (v)
JOIN (SELECT v, min(Xv.rowid) AS r FROM Xv GROUP BY v) AS xmin USING (v)
JOIN (SELECT v, min(Yv.rowid) AS r FROM Yv GROUP BY v) AS ymin
ON ymin.v = Xv.v AND Xv.rowid - xmin.r = Yv.rowid - ymin.r;
The clause Xv.rowid - xmin.r = Yv.rowid - ymin.r is the trick: it does a pairwise match of rows with the same value of v, essentially allocating one to the other. The result:
i           k           v
----------  ----------  ----------
q           0           a
s           2           a
p           1           b
It's then a simple matter to use the result of this query in an UPDATE.
WITH changes AS (<the SELECT above>)
UPDATE X SET k = (SELECT k FROM changes WHERE i = X.i)
WHERE i IN (SELECT i FROM changes);
The temporary tables could be restricted to the common values of v and possibly indexed on v if the query is large.
I'd welcome refinements (or bugs!)
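For what it's worth, the whole technique can be checked end to end with Python's sqlite3 module, using the X and Y from the question. The exact pairing among equal v values depends on scan order, so the check below only asserts that the allocation is valid:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE X (i TEXT PRIMARY KEY, k INTEGER, v TEXT);
CREATE TABLE Y (k INTEGER PRIMARY KEY, v TEXT);
INSERT INTO X VALUES ('p',100,'b'), ('q',101,'a'), ('r',202,'x'), ('s',301,'a');
INSERT INTO Y VALUES (0,'a'), (1,'b'), (2,'a'), (3,'c'), (4,'a');
CREATE TEMPORARY TABLE Xv AS SELECT * FROM X ORDER BY v;
CREATE TEMPORARY TABLE Yv AS SELECT * FROM Y ORDER BY v;
""")

# The offset equality Xv.rowid - xmin.r = Yv.rowid - ymin.r pairs the
# nth X row with the nth Y row inside each group of equal v.
changes = con.execute("""
    SELECT i, Yv.k, Xv.v
    FROM Xv JOIN Yv USING (v)
    JOIN (SELECT v, min(rowid) AS r FROM Xv GROUP BY v) AS xmin USING (v)
    JOIN (SELECT v, min(rowid) AS r FROM Yv GROUP BY v) AS ymin
      ON ymin.v = Xv.v AND Xv.rowid - xmin.r = Yv.rowid - ymin.r
""").fetchall()
print(changes)
```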

Finding the number of people in a table that are NOT in other tables in SQL Server

I have 3 tables (let's say A, B, and C), and I have a common key column in all 3, called G.
I need a script to find the number of G that are in A (the main table - Level 1) that are not in either of B or C (level 2 tables). Basically, I want to left join a table on the result of full join of other 2 tables.
I tried the left join but the result is not correct. I used the following script:
SELECT COUNT(DISTINCT A.G)
FROM A LEFT JOIN B ON A.G = B.G
FULL JOIN C ON A.G = C.G
WHERE (B.G IS NULL) OR (C.G IS NULL)
Appreciate your help.
P.S.
Choice of the correct answer is based on superiority in processing time. I ran both alternatives (EXISTS vs. LEFT JOIN) on my data set, which is relatively large and time-consuming to process.
The LEFT JOIN approach (the selected answer) is far more efficient than EXISTS: the former took 0:23 minutes, compared to 7:52 minutes for the latter.
using not exists() to count() rows where G does not exist in B or C:
select count(*)
from A
where not exists (select 1 from B where A.G = B.G)
or not exists (select 1 from C where A.G = C.G)
If you want to count() rows where G does not exist in both B and C, change or to and in the above code.
rextester example demo: http://rextester.com/MSVVN6153
You are close - you need a LEFT JOIN to both the B and C tables, as well as AND instead of OR in your WHERE clause:
SELECT COUNT(DISTINCT A.G)
FROM A
LEFT JOIN B ON A.G = B.G
LEFT JOIN C ON A.G = C.G
WHERE B.G IS NULL
AND C.G IS NULL;
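As a sanity check, the LEFT JOIN anti-join and the double NOT EXISTS agree on toy data. The question is SQL Server, but the logic is portable; here it is sketched in SQLite via Python:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE A (G INTEGER);
CREATE TABLE B (G INTEGER);
CREATE TABLE C (G INTEGER);
INSERT INTO A VALUES (1), (2), (3), (4);
INSERT INTO B VALUES (1), (3);
INSERT INTO C VALUES (1), (2);
""")

# G = 4 is the only value present in A but absent from both B and C.
left_join = con.execute("""
    SELECT COUNT(DISTINCT A.G) FROM A
    LEFT JOIN B ON A.G = B.G
    LEFT JOIN C ON A.G = C.G
    WHERE B.G IS NULL AND C.G IS NULL
""").fetchone()[0]

not_exists = con.execute("""
    SELECT COUNT(*) FROM A
    WHERE NOT EXISTS (SELECT 1 FROM B WHERE A.G = B.G)
      AND NOT EXISTS (SELECT 1 FROM C WHERE A.G = C.G)
""").fetchone()[0]

print(left_join, not_exists)  # 1 1
```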
I suspect that you want:
select count(*)
from a
where not exists (select 1 from b where a.g = b.g) and
not exists (select 1 from c where a.g = c.g);
You specify: "in A . . . that are not in either of B or C ". This suggests that G doesn't exist in B and doesn't exist in C. Hence the and rather than or (as in your version of the query).
I also removed the count(distinct). Instinct suggests that a.g is unique; if not, then count(distinct a.g) is correct.
And one last option using set operators you should learn. I'll leave counting out just to demonstrate the correctness:
set nocount on;
declare @a table (id smallint not null);
declare @b table (id smallint not null, xx smallint not null);
declare @c table (id smallint not null, yy smallint not null);
insert @a(id) values (1), (2), (3), (4);
insert @b(id, xx) values (1,0), (1,1), (3,0);
insert @c(id, yy) values (1,9), (2,1), (2,2), (5,1); -- notice the value 5 does not exist in @a

select id from @a except
select id from @b except
select id from @c;
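The EXCEPT chain evaluates left to right, so it keeps the ids of the first table that are missing from both of the others. The same statement runs unchanged in SQLite (with ordinary tables standing in for the table variables), which makes it easy to verify via Python:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE a (id INTEGER NOT NULL);
CREATE TABLE b (id INTEGER NOT NULL, xx INTEGER NOT NULL);
CREATE TABLE c (id INTEGER NOT NULL, yy INTEGER NOT NULL);
INSERT INTO a VALUES (1), (2), (3), (4);
INSERT INTO b VALUES (1,0), (1,1), (3,0);
INSERT INTO c VALUES (1,9), (2,1), (2,2), (5,1);
""")

# EXCEPT is left-associative: ({1,2,3,4} - {1,3}) - {1,2,5} = {4}
rows = con.execute("""
    SELECT id FROM a
    EXCEPT SELECT id FROM b
    EXCEPT SELECT id FROM c
""").fetchall()
print(rows)  # [(4,)]
```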

Why is Selecting From Table Variable Far Slower than List of Integers

I have a pretty big MSSQL stored procedure in which I need to conditionally check for certain IDs:
Select SomeColumns
From BigTable b
Join LotsOfTables l on b.LongStringField = l.LongStringField
Where b.SomeID in (1,2,3,4,5)
I wanted to conditionally check the SomeID field, so I did the following:
if @enteredText = 'This'
    INSERT INTO @AwesomeIDs
    VALUES (1), (2), (3)

if @enteredText = 'That'
    INSERT INTO @AwesomeIDs
    VALUES (4), (5)

Select SomeColumns
From BigTable b
Join LotsOfTables l on b.LongStringField = l.LongStringField
Where b.SomeID in (Select ID from @AwesomeIDs)
Nothing else has changed, yet I can't even get the latter query to grab 5 records. The top query returns 5000 records in less than 3 seconds. Why is selecting from a table variable so drastically slower?
Two other possible options you can consider
Option 1
Select SomeColumns
From BigTable b
Join LotsOfTables l on b.LongStringField = l.LongStringField
Where ( b.SomeID IN (1,2,3) AND @enteredText = 'This')
   OR ( b.SomeID IN (4,5)   AND @enteredText = 'That')
Option 2
Select SomeColumns
From BigTable b
Join LotsOfTables l on b.LongStringField = l.LongStringField
Where EXISTS (Select 1
              from @AwesomeIDs
              WHERE b.SomeID = ID)
Mind you, for table variables SQL Server always assumes there is only ONE row in the table (except in SQL Server 2014, where the assumption is 100 rows), and that can affect the estimated and actual plans. But 1 row against 3 is not really a deal breaker.
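Option 1 is easy to sanity-check, since the branch is folded into the WHERE clause and no ID table is needed at all. Here's a sketch in SQLite via Python, with a toy BigTable standing in for the real one (the join to LotsOfTables is omitted, and SQLite has no table variables, so only the WHERE-clause pattern carries over):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE BigTable (SomeID INTEGER, name TEXT)")
con.executemany("INSERT INTO BigTable VALUES (?, ?)",
                [(i, f"row{i}") for i in range(1, 7)])

def pick(entered_text):
    # Option 1: the branch lives in the WHERE clause, so one query
    # serves both cases without materializing a table of IDs.
    return con.execute("""
        SELECT SomeID FROM BigTable
        WHERE (SomeID IN (1,2,3) AND ? = 'This')
           OR (SomeID IN (4,5)   AND ? = 'That')
        ORDER BY SomeID
    """, (entered_text, entered_text)).fetchall()

print(pick('This'))  # [(1,), (2,), (3,)]
print(pick('That'))  # [(4,), (5,)]
```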

return column name of the maximum value in sql server 2012

My table looks like this (totally different names):

ID   Column1   Column2   Column3   ...   Column30
X    0         2         6         ...   31

I want to find the second maximum value of Column1 to Column30 and put the column name in a separate column. The first row would then look like:

ID   Column1   Column2   Column3   ...   Column30   SecondMax
X    0         2         6         ...   31         Column3

Query:

Update Table
Set SecondMax = (select Column_Name from table where ...)
with unpvt as (
    select id, c, m
    from T
    unpivot (c for m in (c1, c2, c3, ..., c30)) as u /* <-- your list of columns */
)
update T
set SecondMax = (
    select top 1 m
    from unpvt as u1
    where u1.id = T.id
      and u1.c < (select max(c) from unpvt as u2 where u2.id = u1.id)
    order by c desc, m
)
I really don't like relying on top but this isn't a standard sql question anyway. And it doesn't do anything about ties other than returning the first column name by order of alphabetical sort.
You could use a modification via the condition below to get the "third maximum". (Obviously the constant 2 comes from 3 - 1.) Your version of SQL Server lets you use a variable there as well. SQL Server 2012 also supports the OFFSET ... FETCH syntax if that's preferable to top. And since it should work for top 0 and top 1 as well, you might just be able to run this query in a loop to populate all of your "maximums" from first to thirtieth.
Once you start having ties you'll eventually get a "thirtieth maximum" that's null. Make sure you cover those cases though.
and u1.c < all (
    select distinct top 2 c from unpvt as u2 where u2.id = u1.id
)
And after I think about it. If you're going to rank and update so many columns it would probably make even more sense to use a proper ranking function and do the update all at once. You'll also handle the ties a lot better even if the alphabetic sorting is still arbitrary.
with unpvt as (
    select id, c, m,
           row_number() over (partition by id order by c desc, m) as nth_max
    from T
    unpivot (c for m in (c1, c2, c3, ..., c30)) as u /* <-- your list of columns */
)
update T set
    FirstMax  = (select m from unpvt as u where u.id = T.id and nth_max = 1),
    SecondMax = (select m from unpvt as u where u.id = T.id and nth_max = 2),
    ...
    NthMax    = (select m from unpvt as u where u.id = T.id and nth_max = N)
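The ranking approach can be checked in SQLite (3.25+ for window functions), emulating UNPIVOT with a UNION ALL; this sketch uses three columns instead of thirty and only populates SecondMax:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE T (id TEXT PRIMARY KEY, c1 INT, c2 INT, c3 INT, SecondMax TEXT);
INSERT INTO T (id, c1, c2, c3) VALUES ('X', 0, 2, 6);
""")

# UNION ALL plays the role of UNPIVOT; row_number() ranks each row's
# columns by value, and the correlated subquery picks the 2nd-ranked name.
con.execute("""
    WITH unpvt AS (
        SELECT id, c, m,
               row_number() OVER (PARTITION BY id ORDER BY c DESC, m) AS nth_max
        FROM (SELECT id, c1 AS c, 'c1' AS m FROM T
              UNION ALL SELECT id, c2, 'c2' FROM T
              UNION ALL SELECT id, c3, 'c3' FROM T)
    )
    UPDATE T SET SecondMax =
        (SELECT m FROM unpvt WHERE unpvt.id = T.id AND nth_max = 2)
""")
result = con.execute("SELECT id, SecondMax FROM T").fetchall()
print(result)  # [('X', 'c2')]
```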

Select random record within groups sqlite

I can't seem to get my head around this. I have a single table in SQLite, from which I need to select a random() record for EACH group. So, considering a table such as:
id   link   chunk
2    a      me1
3    b      me1
4    c      me1
5    d      you2
6    e      you2
7    f      you2
I need SQL that will return a random link value for each chunk. One run might give:

me1  | a
you2 | f

and the next time maybe:

me1  | c
you2 | d
I know similar questions have been answered but I'm not finding a derivation of one that applies here.
UPDATE:
Nuts, follow-up question: so now I need to EXCLUDE rows where a new field "qcinfo" is set to 'Y'.
The attempt below, of course, hides rows whenever the random ID hits one where qcinfo = 'Y', which is wrong. I need to exclude the row from being considered in the chunk, but still generate a random record for the chunk if any records have qcinfo <> 'Y'.
select t.chunk, t.id, t.qcinfo, t.link
from table1 t
inner join
(
    select chunk, cast(min(id) + abs(random() % (max(id)-min(id))) as int) AS random_id
    from table1
    group by chunk
) sq
  on t.chunk = sq.chunk
 and t.id = sq.random_id
where qcinfo <> 'Y'
A bit hackish, but it works... See sql fiddle http://sqlfiddle.com/#!2/81e75/7
select t.chunk, t.link
from table1 t
inner join
(
    select chunk, FLOOR(min(id) + RAND() * (max(id)-min(id))) AS random_id
    from table1
    group by chunk
) sq
  on t.chunk = sq.chunk
 and t.id = sq.random_id
Sorry, I thought that you said MySQL.
Here is the fiddle and the code for SQLite
http://sqlfiddle.com/#!5/81e75/12
select t.chunk, t.link
from table1 t
inner join
(
    select chunk, cast(min(id) + abs(random() % (max(id)-min(id))) as int) AS random_id
    from table1
    group by chunk
) sq
  on t.chunk = sq.chunk
 and t.id = sq.random_id
Note that SQLite returns the first value in a group when we do a group by without an aggregate:

select link, chunk from table1 group by chunk;

Running that you'll get this:

me1  | a
you2 | d

Now you can make the first value random by randomly sorting the table and then grouping. Here's the final solution:

select link, chunk from (select * from table1 order by random()) group by chunk;
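Here's a quick check of the final query through Python's sqlite3, on the sample data from the question. The bare link column relies on SQLite's documented (but nonstandard) bare-column behavior in GROUP BY, so the assertions only check that each chunk gets one of its own links:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE table1 (id INTEGER PRIMARY KEY, link TEXT, chunk TEXT);
INSERT INTO table1 VALUES
    (2,'a','me1'), (3,'b','me1'), (4,'c','me1'),
    (5,'d','you2'), (6,'e','you2'), (7,'f','you2');
""")

# The subquery shuffles the rows; GROUP BY then keeps one row per chunk.
rows = con.execute("""
    SELECT link, chunk
    FROM (SELECT * FROM table1 ORDER BY random())
    GROUP BY chunk
""").fetchall()
print(rows)  # e.g. [('c', 'me1'), ('d', 'you2')]
```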