I've got a SQL table like below where one value is linked to a second value and vice versa.
ROW  ID1  ID2
1    1    2
2    2    1
3    3    4
4    4    3
....
This might be some bad design but this is what I'm stuck with. I need to produce a SQL query in SQL Server to return only the following (doesn't matter which order):
ROW  ID1  ID2
1    1    2
3    3    4
....
OR
ROW  ID1  ID2
2    2    1
4    4    3
....
I've got a list of IDs (1, 2, 3, 4) which I used to query the table against the ID1 field or the ID2 field, but it always returns all the rows because those IDs exist in both columns.
I've tried eliminating one row of each pair by checking whether the value in one field also exists in the other column, but then I get no results. Obviously.
One solution that could work is to look at the ROW field and keep only the even or odd rows. But this feels hacky. Also, there might be other values in the table that are not part of my IN list, so that could possibly miss some rows?
Is there anything elegant to consider from a T-SQL perspective?
Here's one (quite cumbersome but pretty effective) way to do it.
First, create and populate a sample table (please save us this step in future questions):
CREATE TABLE Table1 (
[ROW] int,
[ID1] int,
[ID2] int
);
INSERT INTO Table1 ([ROW], [ID1], [ID2]) VALUES
(1, 1, 2),
(2, 2, 1),
(3, 3, 4),
(4, 4, 3),
(5, 1, 4);
Note: The last row is not part of the sample data you've provided, but I assumed you would also like the results to include records where only one row holds the connection between Id1 and Id2.
Then, use a couple of common table expressions to get the minimum row number of any pair of Id1 and Id2, regardless of the order of the ids, and then query the original table joined to the second CTE:
WITH CTE1 AS
(
SELECT Row,
IIF(Id1 < Id2, Id1, Id2) As Small,
IIF(Id1 < Id2, Id2, Id1) As Big
FROM Table1
), CTE2 AS
(
SELECT Min(Row) As MinRow
FROM CTE1
GROUP BY Small, Big
)
SELECT Row, Id1, Id2
FROM Table1
JOIN CTE2
ON Row = MinRow;
Results:
Row Id1 Id2
1 1 2
3 3 4
5 1 4
You can see a live demo on DB<>Fiddle
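For comparison, here is a minimal sketch of a different way to get the same rows: keep a row unless its mirror image (Id1 and Id2 swapped) already appears on an earlier row. It is not the answer's CTE method. It runs on an in-memory SQLite database through Python purely so it can be tried anywhere, and the column name RowNum is my stand-in for [ROW]. The NOT EXISTS predicate itself should port to T-SQL unchanged, though I have not run it on SQL Server.

```python
import sqlite3

# In-memory SQLite stand-in for the SQL Server table; RowNum replaces [ROW].
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (RowNum INTEGER, Id1 INTEGER, Id2 INTEGER);
    INSERT INTO Table1 (RowNum, Id1, Id2) VALUES
        (1, 1, 2), (2, 2, 1), (3, 3, 4), (4, 4, 3), (5, 1, 4);
""")

# Keep a row unless a row with Id1 and Id2 swapped exists with a smaller RowNum.
rows = conn.execute("""
    SELECT t.RowNum, t.Id1, t.Id2
    FROM Table1 AS t
    WHERE NOT EXISTS (
        SELECT 1
        FROM Table1 AS r
        WHERE r.Id1 = t.Id2
          AND r.Id2 = t.Id1
          AND r.RowNum < t.RowNum
    )
    ORDER BY t.RowNum
""").fetchall()

for row in rows:
    print(row)
```

Unlike the GROUP BY version, this only removes mirrored pairs. Two rows holding the same pair in the same order would both survive.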
Context:
I have two tables: markettypewagerlimitgroups (mtwlg) and stakedistributionindicators (sdi). When a mtwlg is created, 2 rows are created in the sdi table which are linked to the mtwlg - each row with the same values bar 2, the id and another field (let's call it column X) which must contain a 0 for one row and 1 for the other.
There was a bug present in our codebase which prevented this happening automatically, so any mtwlg's created during the time that bug was present do not have the related sdi's, causing NPE's in various places.
To fix this, a patch needs to be written to loop through the mtwlg table and for each ID, search the sdi table for the 2 related rows. If both rows are present, do nothing; if there is only 1 row, check whether column X is 0 or 1, and insert a row with the other value; if neither row is present, insert them both. This needs to be done for every mtwlg, and a unique ID needs to be inserted too.
Pseudocode:
For each market type wager limit group ID
Check if there are 2 rows with that id in the stake distributions table, 1 where column X = 0 and one where column X = 1
if none
create 2 rows in the stake distributions table with unique id's; 1 for each X value
if one
create the missing row in the stake distributions table with a unique id
if 2
do nothing
If it helps at all - the patch will be applied using liquibase.
Anyone with any advice or thoughts as to if and how this will be possible to write in SQL/a liquibase patch?
Thanks in advance, let me know of any other information you need.
EDIT:
I've actually just been advised to do this using PL/SQL, do you have any thoughts/suggestions in regards to this?
Thanks again.
Oooooh, an excellent job for MERGE.
Here's your pseudo code again:
For each market type wager limit group ID
Check if there are 2 rows with that id in the stake distributions table,
1 where column X = 0 and one where column X = 1
if none
create 2 rows in the stake distributions table with unique id's;
1 for each X value
if one
create the missing row in the stake distributions table with a unique id
if 2
do nothing
Here's the MERGE variant (still pseudo-code'ish as I don't know how your data really looks):
MERGE INTO stake_distributions d
USING (
SELECT limit_group_id, 0 AS x
FROM market_type_wagers
UNION ALL
SELECT limit_group_id, 1 AS x
FROM market_type_wagers
) t
ON (
d.limit_group_id = t.limit_group_id AND d.x = t.x
)
WHEN NOT MATCHED THEN INSERT (d.limit_group_id, d.x)
VALUES (t.limit_group_id, t.x);
No loops, no PL/SQL, no conditional statements, just plain beautiful SQL.
Nice alternative suggested by Boneist in the comments uses a CROSS JOIN rather than UNION ALL in the USING clause, which is likely to perform better (unverified):
MERGE INTO stake_distributions d
USING (
SELECT w.limit_group_id, x.x
FROM market_type_wagers w
CROSS JOIN (
SELECT 0 AS x FROM DUAL
UNION ALL
SELECT 1 AS x FROM DUAL
) x
) t
ON (
d.limit_group_id = t.limit_group_id AND d.x = t.x
)
WHEN NOT MATCHED THEN INSERT (d.limit_group_id, d.x)
VALUES (t.limit_group_id, t.x);
Answer: you don't. There is absolutely no need to loop through anything - you can do it in a single insert. All you need to do is identify the rows that are missing, and then you just need to add them in.
Here is an example:
drop table t1;
drop table t2;
drop sequence t2_seq;
create table t1 (cola number,
colb number,
colc number);
create table t2 (id number,
cola number,
colb number,
colc number,
colx number);
create sequence t2_seq
START WITH 1
INCREMENT BY 1
MAXVALUE 99999999
MINVALUE 1
NOCYCLE
CACHE 20
NOORDER;
insert into t1 values (1, 10, 100);
insert into t2 values (t2_seq.nextval, 1, 10, 100, 0);
insert into t2 values (t2_seq.nextval, 1, 10, 100, 1);
insert into t1 values (2, 20, 200);
insert into t2 values (t2_seq.nextval, 2, 20, 200, 0);
insert into t1 values (3, 30, 300);
insert into t2 values (t2_seq.nextval, 3, 30, 300, 1);
insert into t1 values (4, 40, 400);
commit;
insert into t2 (id, cola, colb, colc, colx)
with dummy as (select 1 id from dual union all
select 0 id from dual)
select t2_seq.nextval,
t1.cola,
t1.colb,
t1.colc,
d.id
from t1
cross join dummy d
left outer join t2 on (t2.cola = t1.cola and d.id = t2.colx)
where t2.id is null;
commit;
select * from t2
order by t2.cola;
ID COLA COLB COLC COLX
---------- ---------- ---------- ---------- ----------
1 1 10 100 0
2 1 10 100 1
3 2 20 200 0
5 2 20 200 1
7 3 30 300 0
4 3 30 300 1
6 4 40 400 0
8 4 40 400 1
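Since this is destined for a patch, it may be worth checking that the statement is safe to run more than once. The sketch below replays the same cross join plus anti-join on an in-memory SQLite database through Python, which is an assumption made only so it runs anywhere. SQLite has no sequences or DUAL, so an auto-assigned INTEGER PRIMARY KEY stands in for t2_seq, and the helper name fill_gaps is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (cola INTEGER, colb INTEGER, colc INTEGER);
    -- INTEGER PRIMARY KEY auto-assigns ids, standing in for the Oracle sequence.
    CREATE TABLE t2 (id INTEGER PRIMARY KEY, cola INTEGER, colb INTEGER,
                     colc INTEGER, colx INTEGER);
    INSERT INTO t1 VALUES (1, 10, 100), (2, 20, 200), (3, 30, 300), (4, 40, 400);
    INSERT INTO t2 (cola, colb, colc, colx) VALUES
        (1, 10, 100, 0), (1, 10, 100, 1), (2, 20, 200, 0), (3, 30, 300, 1);
""")

# Every (t1 row, x) combination with no matching t2 row gets inserted.
FILL_GAPS = """
    INSERT INTO t2 (cola, colb, colc, colx)
    SELECT t1.cola, t1.colb, t1.colc, d.x
    FROM t1
    CROSS JOIN (SELECT 0 AS x UNION ALL SELECT 1) AS d
    LEFT JOIN t2 ON t2.cola = t1.cola AND t2.colx = d.x
    WHERE t2.id IS NULL
"""

def count_rows(connection):
    return connection.execute("SELECT COUNT(*) FROM t2").fetchone()[0]

def fill_gaps(connection):
    """Hypothetical helper: run the gap-filling insert once."""
    connection.execute(FILL_GAPS)
    connection.commit()

before = count_rows(conn)
fill_gaps(conn)
after_first = count_rows(conn)
fill_gaps(conn)  # second run should find nothing left to insert
after_second = count_rows(conn)

pairs = conn.execute(
    "SELECT cola, colx FROM t2 ORDER BY cola, colx"
).fetchall()
print(before, after_first, after_second)
```

The second call adds no rows because every (cola, colx) combination already exists, which is the property you want from a patch that might be applied twice.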
If the processing logic is too gnarly to be encapsulated in a single SQL statement, you may need to resort to cursor for loops and row types - basically allows you to do things like the following:
DECLARE
r_mtwlg markettypewagerlimitgroups%ROWTYPE;
BEGIN
FOR r_mtwlg IN (
SELECT mtwlg.*
FROM markettypewagerlimitgroups mtwlg
)
LOOP
-- do stuff here
-- refer to elements of the current row like this
DBMS_OUTPUT.PUT_LINE(r_mtwlg.id);
END LOOP;
END;
/
You can obviously nest another loop inside this one that hits the stakedistributionindicators table, but I'll leave that as an exercise for you. You could also left join to stakedistributionindicators a couple of times in this first cursor so that you only return rows that don't already have an x=1 and x=0, again you can probably work that bit out for yourself.
If you would rather write your logic in Java vs. PL/SQL, Liquibase allows you to create custom changes. The custom change points to a Java class you write that can do whatever logic you need. A simple example can be found here
Trying to figure out if it's possible to write a single, set based query to return what I want with data in one single table. The below is just an example, and I need something that could easily work if most (but not all) of combinations 1 to 9 (or 1 to 20 etc) exist.
Table AllCovered has two columns. ID1 and ID2. There are 16 rows in this table, each containing a combination of the numbers 1 to 4 (so 1,1 1,2 1,3 1,4 2,1 .... 4,3 4,4)
Table SomeGaps has the same structure but only has 12 rows, again each row is a combination of 1 to 4, but with some of the combinations missing.
SELECT ID1, ID2, COUNT(ID1) as THIS
FROM AllCovered
GROUP BY ID1, ID2
- this query returns 16 rows, each combination with 1 in the 3rd column (THIS)
SELECT ID1, ID2, COUNT(ID1) as THIS
FROM SomeGaps
GROUP BY ID1, ID2
- this returns the 12 rows. How can I create a query that will return 16 rows, one for each combination, but with 0 in THIS for the combinations that are missing in SomeGaps?
ID1 ID2 THIS
1 1 1
1 2 0 (1,2 combination does NOT exist in SomeGaps)
1 3 1
1 4 1
2 1 1
2 2 0 (2,2 combination does NOT exist in SomeGaps)
Obviously I've tried using a cross join to get all combinations of ID1 and ID2 but the COUNT is, as expected, vastly inflated.
Hope this makes sense. Apologies if it's an easy solution, I can't seem to crack it!
You can do this by cross-joining all the distinct values for the two columns. Then use left outer join and aggregation to get the counts for all combinations:
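-- Build every (id1, id2) combination, then count matching SomeGaps rows.
-- Selecting and grouping by the cross-joined columns keeps missing pairs visible;
-- count(sg.id1) ignores the NULLs produced by the left join, giving 0 for gaps.
select ac1.id1, ac2.id2, count(sg.id1) as cnt
from (select distinct id1 from AllCovered) ac1 cross join
     (select distinct id2 from AllCovered) ac2 left join
     SomeGaps sg
     on sg.id1 = ac1.id1 and sg.id2 = ac2.id2
group by ac1.id1, ac2.id2;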
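To see the cross join plus left join idea produce zeros for the gaps, here is a small sketch run against an in-memory SQLite database through Python. The four missing pairs are made up for illustration (the question only names (1,2) and (2,2)), and the aliases a1, a2 and sg are mine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE AllCovered (id1 INTEGER, id2 INTEGER);
    CREATE TABLE SomeGaps (id1 INTEGER, id2 INTEGER);
""")

all_pairs = [(a, b) for a in range(1, 5) for b in range(1, 5)]
missing = {(1, 2), (2, 2), (3, 4), (4, 1)}  # hypothetical gaps
conn.executemany("INSERT INTO AllCovered VALUES (?, ?)", all_pairs)
conn.executemany(
    "INSERT INTO SomeGaps VALUES (?, ?)",
    [pair for pair in all_pairs if pair not in missing],
)

# All combinations come from the cross join; the count comes from SomeGaps.
rows = conn.execute("""
    SELECT a1.id1, a2.id2, COUNT(sg.id1) AS this
    FROM (SELECT DISTINCT id1 FROM AllCovered) AS a1
    CROSS JOIN (SELECT DISTINCT id2 FROM AllCovered) AS a2
    LEFT JOIN SomeGaps AS sg
           ON sg.id1 = a1.id1 AND sg.id2 = a2.id2
    GROUP BY a1.id1, a2.id2
    ORDER BY a1.id1, a2.id2
""").fetchall()

zero_pairs = {(id1, id2) for id1, id2, this in rows if this == 0}
for row in rows:
    print(row)
```

The count has to be taken on a column of the left-joined table. COUNT(*) would report 1 for the missing combinations, because the outer join still emits one row for each of them.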
I'm probably missing something obvious, but I'll take a bite anyway:
create table #AllCovered (id1 int, id2 int);
insert #AllCovered values
(1,1),(1,2),(1,3),(1,4),(2,1),(2,2),(2,3),(2,4),(3,1),(3,2),(3,3),(3,4),(4,1),(4,2),(4,3),(4,4);
create table #gaps (id1 int, id2 int);
insert #gaps values(1,1),(1,2),(1,3),(1,4),(2,1),(2,4),(3,1),(3,2),(3,3),(4,1),(4,2),(4,4);
select #AllCovered.id1, #AllCovered.id2,
count(#gaps.id1) as this
from #AllCovered
left outer join #gaps
on #AllCovered.id1 = #gaps.id1 and #AllCovered.id2 = #gaps.id2
group by #AllCovered.id1, #AllCovered.id2;
drop table #AllCovered, #gaps
From your narrative, there are no duplicate combinations of (id1, id2) in either table, and AllCovered contains all possible combinations. Otherwise I would use distinct subqueries to fabricate AllCovered.
I am trying to run below 2 queries on the same table and hoping to get results in 2 different columns.
Query 1: select ID as M from table where field = 1
returns:
1
2
3
Query 2: select ID as N from table where field = 2
returns:
4
5
6
My goal is to get
Column1 - Column2
-----------------
1 4
2 5
3 6
Any suggestions? I am using SQL Server 2008 R2
Thanks
There has to be a primary key to foreign key relationship to JOIN data between two tables.
That is the idea about relational algebra and normalization. Otherwise, the correlation of the data is meaningless.
http://en.wikipedia.org/wiki/Database_normalization
The CROSS JOIN will give you all possibilities. (1,4), (1,5), (1, 6) ... (3,6). I do not think that is what you want.
You can always use a ROW_NUMBER() OVER () function to generate a surrogate key in both tables. Order the data the way you want inside the OVER () clause. However, this is still not in any Normal form.
In short. Why do this?
Quick test database. Stores products from sporting goods and home goods using non-normal form.
The results of the SELECT do not mean anything.
-- Just play
use tempdb;
go
-- Drop table
if object_id('abnormal_form') > 0
drop table abnormal_form
go
-- Create table
create table abnormal_form
(
Id int,
Category int,
Name varchar(50)
);
-- Load store products
insert into abnormal_form values
(1, 1, 'Bike'),
(2, 1, 'Bat'),
(3, 1, 'Ball'),
(4, 2, 'Pot'),
(5, 2, 'Pan'),
(6, 2, 'Spoon');
-- Sporting Goods
select * from abnormal_form where Category = 1
-- Home Goods
select * from abnormal_form where Category = 2
-- Does not mean anything to me
select Id1, Id2 from
(select ROW_NUMBER () OVER (ORDER BY ID) AS Rid1, Id as Id1
from abnormal_form where Category = 1) as s
join
(select ROW_NUMBER () OVER (ORDER BY ID) AS Rid2, Id as Id2
from abnormal_form where Category = 2) as h
on s.Rid1 = h.Rid2
We definitely need more information from the user.
I have table with 2 columns ....
id id2
1 1
1 2
1 3
2 1
2 2
2 4
3 2
3 3
3 4
I want to return the ids that have all of the id2 values in a given list, for example (1, 2, 4), not just some of them.
In this above case it would return id = 2. Is this possible?
select id
from MyTable
where id2 in (1, 2, 4)
group by id
having count(distinct id2) = 3 --this must match the number of elements in IN clause
Update:
If the list of IDs is variable, then you should create an additional table that contains the varying sets of IDs, which you can then JOIN against to do your filtering.
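A minimal sketch of that variable-list idea, assuming a helper table (called Wanted here, a made-up name) that holds the required id2 values. The required count then comes from the helper table instead of a hard-coded number. It runs on in-memory SQLite through Python only so that it is easy to try, and ids_with_all is a hypothetical helper function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyTable (id INTEGER, id2 INTEGER);
    INSERT INTO MyTable VALUES
        (1, 1), (1, 2), (1, 3),
        (2, 1), (2, 2), (2, 4),
        (3, 2), (3, 3), (3, 4);
    -- Hypothetical helper table holding the required id2 values.
    CREATE TABLE Wanted (id2 INTEGER PRIMARY KEY);
""")

def ids_with_all(connection, wanted):
    """Return the ids that have every id2 value listed in `wanted`."""
    connection.execute("DELETE FROM Wanted")
    connection.executemany(
        "INSERT INTO Wanted VALUES (?)", [(value,) for value in set(wanted)]
    )
    rows = connection.execute("""
        SELECT m.id
        FROM MyTable AS m
        JOIN Wanted AS w ON w.id2 = m.id2
        GROUP BY m.id
        HAVING COUNT(DISTINCT m.id2) = (SELECT COUNT(*) FROM Wanted)
        ORDER BY m.id
    """).fetchall()
    return [row[0] for row in rows]

result_124 = ids_with_all(conn, [1, 2, 4])
result_23 = ids_with_all(conn, [2, 3])
print(result_124, result_23)
```

With the sample data, only id 2 has all of 1, 2 and 4, while ids 1 and 3 both have 2 and 3.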
Are you alluding to relational division? e.g. the supplier who supplies all products, the pilot who can fly all the planes in the hangar, etc.?
If so, this article has many example implementations in SQL.
Do a self-join to test different rows on the same table in one go:
SELECT t0.id  -- qualify the column: a bare id is ambiguous across t0, t1 and t2
FROM t AS t0
JOIN t AS t1 ON t1.id=t0.id
JOIN t AS t2 ON t2.id=t1.id
WHERE t0.id2=1
AND t1.id2=2
AND t2.id2=4
Can somebody give a hint on this one? :
I have a table, let's say tblA, where I have id1 and id2 as columns and index(id1,id2).
I want to select the id1s whose id2s belong to several sets. So I would want to say
select id1 from tblA
where id2 in (val1,val2,val3 ...)
union
select id1 from tblA
where id2 in (val4,val2,val3 ...)
union
(...)*
Let's say we have in table A the following:
(1,1)
(1,2)
(1,3)
(1,4)
(1,5)
(2,1)
(2,2)
(2,3)
Now I want all the id1s that have id2 in (3,4).
So what I want to get is id1 = 1.
2 shouldn't appear because although we have a relation (2,3) we don't have (2,4).
Any ideas how to perform this query? I guess the way above has a performance problem if the (...) grows too much!? Thanks.
greets
You should create a temporary table like this:
CREATE TABLE temp (id INT NOT NULL PRIMARY KEY) ENGINE MEMORY;
, fill it with the values you are searching for (3 and 4 in your example):
INSERT
INTO temp
VALUES (3), (4)
and issue this query:
SELECT ad.id1
FROM (
SELECT DISTINCT id1
FROM a
) ad
WHERE NOT EXISTS
(
SELECT NULL
FROM temp
WHERE NOT EXISTS
(
SELECT NULL
FROM a
WHERE a.id1 = ad.id1
AND a.id2 = temp.id
)
)
You should create a composite index on (id1, id2) for this to work.
For each id1, this probes each value in temp against a at most once, and rejects that id1 as soon as it finds the first value in temp that has no matching (id1, id2) row in a.
Here's the plan for the query:
1, 'PRIMARY', '<derived2>', 'ALL', '', '', '', '', 2, 'Using where'
3, 'DEPENDENT SUBQUERY', 'temp', 'ALL', '', '', '', '', 2, 'Using where'
4, 'DEPENDENT SUBQUERY', 'a', 'eq_ref', 'PRIMARY', 'PRIMARY', '8', 'ad.id1,test.temp.id', 1, 'Using index'
2, 'DERIVED', 'a', 'range', '', 'PRIMARY', '4', '', 3, 'Using index for group-by'
, no temporary, no filesort.
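To check the double NOT EXISTS logic against the question's sample data, here is a small sketch on an in-memory SQLite database through Python (an assumption made for portability, since the answer targets MySQL). The lookup table is named wanted here because TEMP is a keyword in SQLite; otherwise the query has the same shape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id1 INTEGER NOT NULL, id2 INTEGER NOT NULL,
                    PRIMARY KEY (id1, id2));
    INSERT INTO a VALUES
        (1, 1), (1, 2), (1, 3), (1, 4), (1, 5),
        (2, 1), (2, 2), (2, 3);
    -- Stand-in for the answer's temp table of searched values.
    CREATE TABLE wanted (id INTEGER NOT NULL PRIMARY KEY);
    INSERT INTO wanted VALUES (3), (4);
""")

# An id1 qualifies when no wanted value lacks a matching (id1, id2) row.
rows = conn.execute("""
    SELECT ad.id1
    FROM (SELECT DISTINCT id1 FROM a) AS ad
    WHERE NOT EXISTS (
        SELECT 1
        FROM wanted AS w
        WHERE NOT EXISTS (
            SELECT 1
            FROM a
            WHERE a.id1 = ad.id1
              AND a.id2 = w.id
        )
    )
    ORDER BY ad.id1
""").fetchall()

matches = [row[0] for row in rows]
print(matches)
```

id1 = 2 is rejected because the wanted value 4 has no (2, 4) row, which is the behaviour the question asks for.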
The union is gonna kill your performance. Use something like this:
select id1 from tblA where id2 in (val1,val2,val3 ...) or id2 in (val4,val2,val3)
Can you combine all the sets into one large set?
If the order is not important, this would seem to be the fastest way.
First, remember that
select id1 from tblA where id2 in (val1, val2, val3) union
select id1 from tblA where id2 in (val4, val5, val6)
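should give the same id1 values as the single query below (UNION also removes duplicates, so add DISTINCT if you need exactly the same rows)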
select id1 from tblA where id2 in (val1, val2, val3, val4, val5, val6)
so you can perhaps improve efficiency by formulating a single query rather than using a union.
Secondly (and independent of the above) you should add an index on id2 to tblA. Without it the id2 values are randomly distributed through both the existing index and the table data, so the optimizer will have no option but to perform a linear scan - of the index, if you are lucky.
But all these queries give back both ids from column id1! I think Robert meant that as a result he just wants "1" from column id1:
id1 id2
1 | 1
1 | 2
1 | 3
1 | 4 --> id1s that have id2 with 3 and 4
1 | 5
2 | 1
2 | 2
2 | 3
Because id1=2 does not have 3 AND 4 it should not be a result.
Please correct me if I misunderstood...
I was trying to do a statement but I could not get just the id1=1 back, but I am as well very interested in an efficient solution to this!
You need to create a separate index on column 'id2', because the combined index on (id1,id2) will not be used when looking up by id2 alone.
This query does what you mentioned
SELECT id1 FROM tblA WHERE id2 IN (?,?,?,?)
GROUP BY id1 HAVING COUNT(id2)=4
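NOTE: Adjust the COUNT(id2) condition in the HAVING clause to the number of values in the IN clause. Here I used four '?' placeholders to represent four values, which is why I wrote COUNT(id2)=4. If (id1, id2) pairs can repeat in the table, use COUNT(DISTINCT id2) instead so that duplicates are not counted twice.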
For the scenario which you mentioned in the comment, query will look like following
SELECT id1 FROM tblA WHERE id2 IN (3,4)
GROUP BY id1 HAVING COUNT(id2)=2
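One caveat with counting rows: the question describes an index on (id1, id2) but does not say it is unique. The sketch below assumes duplicates are possible and deliberately inserts (2, 3) twice to show the difference between COUNT(id2) and COUNT(DISTINCT id2). It uses in-memory SQLite through Python, and matching_ids is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblA (id1 INTEGER, id2 INTEGER);
    -- The pair (2, 3) is deliberately duplicated; (2, 4) never appears.
    INSERT INTO tblA VALUES (1, 3), (1, 4), (2, 3), (2, 3);
""")

def matching_ids(count_expr):
    """Run the GROUP BY / HAVING query with the given counting expression."""
    sql = f"""
        SELECT id1 FROM tblA WHERE id2 IN (3, 4)
        GROUP BY id1 HAVING {count_expr} = 2
        ORDER BY id1
    """
    return [row[0] for row in conn.execute(sql)]

plain = matching_ids("COUNT(id2)")
distinct = matching_ids("COUNT(DISTINCT id2)")
print(plain, distinct)
```

With the duplicate present, COUNT(id2) wrongly accepts id1 = 2, while COUNT(DISTINCT id2) returns only id1 = 1. If (id1, id2) is guaranteed unique, both forms give the same answer.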