All possible combinations of records in a table (SQL Server)

I have a table:
create table #table (t varchar(50), d varchar(50), activ varchar(10), groupid int, rownum int)
insert into #table values('ALK','ceri', '0.2',1,1)
insert into #table values('ALK','criz', '24',1,2)
insert into #table values('EGFR','erlo', '2',2,3)
insert into #table values('EGFR','gefi', '57',2,4)
insert into #table values('EGFR','ibru', '5.6',2,5)
insert into #table values('EGFR','ceri', '900',2,6)
insert into #table values('EGFR','cetu', 'NULL',2,7)
insert into #table values('EGFR','afat', '10',2,8)
insert into #table values('EGFR','lapa', '10.8',2,9)
insert into #table values('EGFR','pani', 'NULL',2,10)
insert into #table values('ERBB2','pert', 'NULL',3,11)
insert into #table values('ERBB2','tras', 'NULL',3,12)
insert into #table values('ERBB2','lapa', '9.2',3,13)
insert into #table values('ERBB2','ado-', 'NULL',3,14)
insert into #table values('ERBB2','afat', '14',3,15)
insert into #table values('ERBB2','ibru', '9.4',3,16)
In the output I need all combinations by groupid (or t), in the format
t, d, t, d, t, d, activ and so on; then I will qualify the best combinations.
Any help will be appreciated. This will show doctors the optimum combination of drugs for cancer patients. The table is dynamic and different for every patient.
Thank you

For all possible combinations, you would use CROSS JOIN (which takes no ON clause):
SELECT * FROM table1 AS t1
CROSS JOIN table2 AS t2
Keep in mind this returns one row for every pair of rows, i.e. rows(table1) * rows(table2), which is O(n^2) for a self join and likely to be huge for large sets of data.
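For the question's data, "all combinations by groupid" most likely means one drug from each group, which is a cross join of the groups with each other. A recursive CTE can build that without hard-coding how many groups a patient has. The sketch below is an illustration under assumptions, not the answerer's method: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the table name tt, the DENSE_RANK renumbering of groupid, and the comma-joined path column are hypothetical choices. activ is left out to keep the path short.

```python
import sqlite3

# Sample rows from the question, reduced to (t, d, groupid).
ROWS = [
    ("ALK", "ceri", 1), ("ALK", "criz", 1),
    ("EGFR", "erlo", 2), ("EGFR", "gefi", 2), ("EGFR", "ibru", 2),
    ("EGFR", "ceri", 2), ("EGFR", "cetu", 2), ("EGFR", "afat", 2),
    ("EGFR", "lapa", 2), ("EGFR", "pani", 2),
    ("ERBB2", "pert", 3), ("ERBB2", "tras", 3), ("ERBB2", "lapa", 3),
    ("ERBB2", "ado-", 3), ("ERBB2", "afat", 3), ("ERBB2", "ibru", 3),
]

# Renumber groups 1..N, then extend each partial combination with every
# drug of the next group until the last group has been added.
COMBOS_SQL = """
WITH RECURSIVE
  ranked AS (
    SELECT t, d, DENSE_RANK() OVER (ORDER BY groupid) AS grp FROM tt
  ),
  combo(grp, path) AS (
    SELECT grp, t || ',' || d FROM ranked WHERE grp = 1
    UNION ALL
    SELECT r.grp, c.path || ',' || r.t || ',' || r.d
    FROM combo AS c
    JOIN ranked AS r ON r.grp = c.grp + 1
  )
SELECT path FROM combo
WHERE grp = (SELECT MAX(grp) FROM ranked)
ORDER BY path
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tt (t TEXT, d TEXT, groupid INTEGER)")
conn.executemany("INSERT INTO tt VALUES (?, ?, ?)", ROWS)
combos = [row[0] for row in conn.execute(COMBOS_SQL)]

print(len(combos))  # 2 * 8 * 6 = 96
print(combos[0])
```

In SQL Server the same shape should work with a plain WITH (no RECURSIVE keyword) and + or CONCAT for the string building, with path cast to varchar(max) in both the anchor and the recursive member so their types match. The row count is the product of the group sizes, so a patient with many large groups can produce a very large result.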

I will use #TT to represent the table, since calling it #table may be a bit confusing.
I also changed the datatype of activ to float.
There are really 3 possible cross joins
-- #1 -- producing 256 rows
select * from #TT as T1
cross join #TT as T2
-- #2 -- produces 104 rows
select * from #TT as T1
cross join #TT as T2
where T1.GroupID = T2.GroupID
-- #3 -- produces 104
select * from #TT as T1
cross join #TT as T2
where T1.t = T2.t
The 1st is a true cross join on the whole table.
The 2nd and 3rd are cross joins within each GroupID and each t respectively, but they are identical, since Group 1 represents t='ALK', and so on. This is easily confirmed, since a UNION of #2 and #3 also produces 104 rows.
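The row counts follow from the group sizes of 2, 8 and 6: the whole-table cross join has 16 * 16 = 256 rows, and each filtered version has 2*2 + 8*8 + 6*6 = 104. A quick check with Python's sqlite3, keeping only the t and groupid columns of the sample data (a simplification), could look like this:

```python
import sqlite3

# (t, groupid) for the 16 sample rows: 2 ALK, 8 EGFR, 6 ERBB2.
ROWS = [("ALK", 1)] * 2 + [("EGFR", 2)] * 8 + [("ERBB2", 3)] * 6

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tt (t TEXT, groupid INTEGER)")
conn.executemany("INSERT INTO tt VALUES (?, ?)", ROWS)

def count(sql):
    # Return the single COUNT(*) value produced by the query.
    return conn.execute(sql).fetchone()[0]

full = count("SELECT COUNT(*) FROM tt AS t1 CROSS JOIN tt AS t2")
by_group = count(
    "SELECT COUNT(*) FROM tt AS t1 CROSS JOIN tt AS t2 WHERE t1.groupid = t2.groupid"
)
by_t = count("SELECT COUNT(*) FROM tt AS t1 CROSS JOIN tt AS t2 WHERE t1.t = t2.t")
print(full, by_group, by_t)  # 256 104 104
```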
However, select * on a self join is largely redundant, as becomes obvious if you change select * to
select T1.*, '===', T2.*
You can see that the t and groupid columns to the left of '===' repeat those to the right of '==='.
Since GroupID is an integer I would write the cross join as
select T1.* from #TT as T1
cross join #TT as T2
where T1.GroupID = T2.GroupID
Now, since the poster wants to pick combinations based on the smallest total activ, I think it makes sense to group the result by GroupID, T and D, report the sum of Activ, and order by GroupID and sum(Activ).
-- #4 adding group by and sum -- 16 rows generated
select T1.groupid, T1.t, T1.d, sum(T1.activ) as SumActiv
from #TT as T1
cross join #TT as T2
where T1.groupid = T2.groupid
group by T1.t, T1.groupid, T1.d
order by groupid, sum(T1.Activ)
Now you are getting close, except that no CROSS JOIN is needed at all.
-- #5 remove the cross join
select T1.groupid, T1.t, T1.d, sum(T1.activ) as SumActiv
from #TT as T1
group by T1.t, T1.groupid, T1.d
order by groupid, sum(T1.activ)
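When I remove the cross join portion of the query I get the same 16 GroupID/T/D rows; the only difference is that #4 multiplies each SumActiv by the size of its group, because the cross join repeats every T1 row once for each T2 row in that group. I think we finally have what is wanted, with the possible exception of removing all but the first row for each combination of GroupID and d.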
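Comparing #4 and #5 directly shows where they differ: the grouping keys match, but in #4 each T1 row is repeated once per row of its group before summing. This sqlite3 sketch assumes activ was converted to float as above and, additionally, that the 'NULL' strings became real NULLs; the group sizes 2, 8 and 6 come from the sample data.

```python
import math
import sqlite3

# Sample rows as (t, d, activ, groupid); 'NULL' strings stored as None.
ROWS = [
    ("ALK", "ceri", 0.2, 1), ("ALK", "criz", 24.0, 1),
    ("EGFR", "erlo", 2.0, 2), ("EGFR", "gefi", 57.0, 2),
    ("EGFR", "ibru", 5.6, 2), ("EGFR", "ceri", 900.0, 2),
    ("EGFR", "cetu", None, 2), ("EGFR", "afat", 10.0, 2),
    ("EGFR", "lapa", 10.8, 2), ("EGFR", "pani", None, 2),
    ("ERBB2", "pert", None, 3), ("ERBB2", "tras", None, 3),
    ("ERBB2", "lapa", 9.2, 3), ("ERBB2", "ado-", None, 3),
    ("ERBB2", "afat", 14.0, 3), ("ERBB2", "ibru", 9.4, 3),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tt (t TEXT, d TEXT, activ REAL, groupid INTEGER)")
conn.executemany("INSERT INTO tt VALUES (?, ?, ?, ?)", ROWS)

Q4 = """
SELECT t1.groupid, t1.d, SUM(t1.activ)
FROM tt AS t1 CROSS JOIN tt AS t2
WHERE t1.groupid = t2.groupid
GROUP BY t1.t, t1.groupid, t1.d
"""
Q5 = "SELECT groupid, d, SUM(activ) FROM tt GROUP BY t, groupid, d"

q4 = {(g, d): s for g, d, s in conn.execute(Q4)}
q5 = {(g, d): s for g, d, s in conn.execute(Q5)}
group_size = {1: 2, 2: 8, 3: 6}

# Same keys, but #4's sums are scaled by the group size (NULL stays NULL).
assert q4.keys() == q5.keys()
for key, s5 in q5.items():
    if s5 is None:
        assert q4[key] is None
    else:
        assert math.isclose(q4[key], s5 * group_size[key[0]])
print(q5[(1, "ceri")], q4[(1, "ceri")])
```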

Related

If one join works per rep id, don't join next

I am matching two datasets that I imported into a Redshift DB: both are at rep id level.
This is my initial query to match the two datasets:
select *
from #t t
join #t2 t2
on lower(trim(t.unique_id))=lower(trim(t2.unique_id))
or lower(trim(t.email))=lower(trim(t2.email))
or lower(trim(split_part(t.first_name,',',1))||trim(split_part(t.last_name,',',1)))=lower(trim(split_part(t2.first_name,',',1))||trim(split_part(t2.last_name,',',1)))
#t is the source of truth I am matching to, and unique_id is supposedly the universal identifier for rep id (the internal identifier), though it only matches about 60% of the time. However, in some cases the #t2 table (incorrectly) has multiple unique_ids per rep, and (incorrectly) multiple emails.
How can I change it so that it is more restrictive, i.e. when a rep matches by unique_id, don't match any further records for that rep; when it matches by email, don't match any further records for that rep; and only then join by first name/last name?
Thank you!
I think there are a few ways to skin this cat. As one option, you could add a rank for each join condition with a CASE expression, and then keep only the matches with the minimum rank for each rep:
SELECT *
FROM
(
SELECT sub1.*,
-- partition by whatever column identifies a rep in #t
min(ranktest) OVER (PARTITION BY sub1.unique_id) as minrank
FROM
(
select t.*,
-- alias the #t2 columns you need so they don't clash with t.*
t2.unique_id AS t2_unique_id,
t2.email AS t2_email,
CASE WHEN lower(trim(t.unique_id))=lower(trim(t2.unique_id)) THEN 1
WHEN lower(trim(t.email))=lower(trim(t2.email)) THEN 2
WHEN lower(trim(split_part(t.first_name,',',1))||trim(split_part(t.last_name,',',1)))=lower(trim(split_part(t2.first_name,',',1))||trim(split_part(t2.last_name,',',1))) THEN 3
END as ranktest
from #t t
join #t2 t2
on lower(trim(t.unique_id))=lower(trim(t2.unique_id))
or lower(trim(t.email))=lower(trim(t2.email))
or lower(trim(split_part(t.first_name,',',1))||trim(split_part(t.last_name,',',1)))=lower(trim(split_part(t2.first_name,',',1))||trim(split_part(t2.last_name,',',1)))
) sub1
) sub2
WHERE ranktest = minrank;
You could also do this by querying the CTE twice, once to get your data and once to get the min(ranktest). It will almost certainly be slower, but it's a little prettier:
WITH subquery AS
(
select t.*,
t2.unique_id AS t2_unique_id,
t2.email AS t2_email,
CASE WHEN lower(trim(t.unique_id))=lower(trim(t2.unique_id)) THEN 1
WHEN lower(trim(t.email))=lower(trim(t2.email)) THEN 2
WHEN lower(trim(split_part(t.first_name,',',1))||trim(split_part(t.last_name,',',1)))=lower(trim(split_part(t2.first_name,',',1))||trim(split_part(t2.last_name,',',1))) THEN 3
END as ranktest
from #t t
join #t2 t2
on lower(trim(t.unique_id))=lower(trim(t2.unique_id))
or lower(trim(t.email))=lower(trim(t2.email))
or lower(trim(split_part(t.first_name,',',1))||trim(split_part(t.last_name,',',1)))=lower(trim(split_part(t2.first_name,',',1))||trim(split_part(t2.last_name,',',1)))
)
SELECT *
FROM subquery t1
WHERE t1.ranktest = (SELECT min(ranktest) FROM subquery WHERE subquery.unique_id = t1.unique_id)
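Alternatively, you could run this as a UNION ALL, testing the join conditions differently in each branch so that the same pair is not returned by more than one branch. Note that this only de-duplicates pairs: a rep that matched one #t2 row on unique_id can still match a different #t2 row on email, and a NULL unique_id or email makes the <> test unknown, which drops that row: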
select *
from #t t
join #t2 t2
on lower(trim(t.unique_id))=lower(trim(t2.unique_id))
UNION ALL
select *
from #t t
join #t2 t2
on lower(trim(t.unique_id))<>lower(trim(t2.unique_id))
AND lower(trim(t.email))=lower(trim(t2.email))
UNION ALL
select *
FROM #t t
join #t2 t2
ON lower(trim(t.unique_id))<>lower(trim(t2.unique_id))
AND lower(trim(t.email))<>lower(trim(t2.email))
AND lower(trim(split_part(t.first_name,',',1))||trim(split_part(t.last_name,',',1)))=lower(trim(split_part(t2.first_name,',',1))||trim(split_part(t2.last_name,',',1)))
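The rank-and-keep-the-minimum idea from the first query can be tried out with Python's sqlite3 module. This is a sketch, not Redshift: split_part does not exist in SQLite, so the name key is simplified to the trimmed first and last names, and the sample rows and output column names (t_id, t2_id) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for name in ("t", "t2"):
    conn.execute(
        f"CREATE TABLE {name} (unique_id TEXT, email TEXT, first_name TEXT, last_name TEXT)"
    )

# Hypothetical sample rows.
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    ("U1", "a@x.com", "Ann", "Lee"),
    ("U2", "b@x.com", "Bob", "Ray"),
])
conn.executemany("INSERT INTO t2 VALUES (?, ?, ?, ?)", [
    ("U1", "ann@old.com", "Ann", "Lee"),  # matches U1 on unique_id (rank 1)
    ("U9", " A@x.com", "Ann", "Lee"),     # matches U1 on email (rank 2)
    ("U8", "bob@x.com", "Bob", "Ray"),    # matches U2 on name only (rank 3)
])

# A joined pair that is neither an id nor an email match must be a name
# match, so ELSE 3 is safe here.
MATCH_SQL = """
SELECT t_id, t2_id, ranktest
FROM (
  SELECT sub1.*, MIN(ranktest) OVER (PARTITION BY t_id) AS minrank
  FROM (
    SELECT t.unique_id AS t_id, t2.unique_id AS t2_id,
           CASE WHEN lower(trim(t.unique_id)) = lower(trim(t2.unique_id)) THEN 1
                WHEN lower(trim(t.email)) = lower(trim(t2.email)) THEN 2
                ELSE 3
           END AS ranktest
    FROM t
    JOIN t2
      ON lower(trim(t.unique_id)) = lower(trim(t2.unique_id))
      OR lower(trim(t.email)) = lower(trim(t2.email))
      OR lower(trim(t.first_name) || trim(t.last_name))
         = lower(trim(t2.first_name) || trim(t2.last_name))
  ) AS sub1
) AS sub2
WHERE ranktest = minrank
ORDER BY t_id, t2_id
"""
best = conn.execute(MATCH_SQL).fetchall()
print(best)  # [('U1', 'U1', 1), ('U2', 'U8', 3)]
```

Ties survive: if a rep has two #t2 rows that both match on unique_id, both are returned. If exactly one row per rep is needed, ROW_NUMBER() ordered by ranktest and filtered to 1 is one way to break the tie.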

SQL - How to select distinct and join multiple tables without duplicating data

I have the following table setup/data:
create table #temp (irecordid int, sdocumentno varchar(20), dtfileddate datetime, mnyconsideration money)
insert into #temp values (1, '3731572', '6-30-2014', 120.00)
Create table #temp2 (irecordid int, address varchar(255))
insert into #temp2 values (1, '406 N CUSTER')
insert into #temp2 values (1, '2015 E HANSON')
Create table #temp3 (irecordid int, srdocumentno varchar(25))
insert into #temp3 values (1, '55489')
insert into #temp3 values (1, '99809')
I am trying to select so I only get a distinct instance of each table. I am trying:
select distinct sdocumentno, address, srdocumentno
from #temp t1
join #temp2 t2 on t1.irecordid = t2.irecordid
join #temp3 t3 on t1.irecordid = t3.irecordid
And my results are as follows:
3731572 2015 E HANSON 55489
3731572 2015 E HANSON 99809
3731572 406 N CUSTER 55489
3731572 406 N CUSTER 99809
I would really like only the distinct data from each table like this:
3731572 2015 E HANSON 55489
3731572 406 N CUSTER 99809
Is there a way I can accomplish this?
Thanks!
I am guessing that you want to join on a "row number", which doesn't exist in your tables, but you can generate one with row_number() and then join on it:
select sdocumentno, address, srdocumentno
from #temp t1 join
(select t2.*,
row_number() over (partition by irecordid order by (select NULL)) as seqnum
from #temp2 t2
) t2
on t1.irecordid = t2.irecordid join
(select t3.*,
row_number() over (partition by irecordid order by (select NULL)) as seqnum
from #temp3 t3
) t3
on t1.irecordid = t3.irecordid and t2.seqnum = t3.seqnum;
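You can use a full outer join if the lists are of different lengths. Note that order by (select NULL) pairs the rows arbitrarily; order each row_number() by a real column if it matters which address lands next to which srdocumentno.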
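Here is a sketch of the generated-row-number join using Python's sqlite3 as a stand-in for SQL Server. Ordering each row_number() by address and srdocumentno is an assumption, chosen because it reproduces the output asked for in the question; the tables are renamed temp1/temp2/temp3 because SQLite has no # temp-table prefix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temp1 (irecordid INTEGER, sdocumentno TEXT)")
conn.execute("CREATE TABLE temp2 (irecordid INTEGER, address TEXT)")
conn.execute("CREATE TABLE temp3 (irecordid INTEGER, srdocumentno TEXT)")
conn.execute("INSERT INTO temp1 VALUES (1, '3731572')")
conn.executemany("INSERT INTO temp2 VALUES (?, ?)",
                 [(1, "406 N CUSTER"), (1, "2015 E HANSON")])
conn.executemany("INSERT INTO temp3 VALUES (?, ?)", [(1, "55489"), (1, "99809")])

# Number the rows of temp2 and temp3 within each irecordid, then pair
# them up by that number instead of joining every address to every doc.
PAIR_SQL = """
SELECT t1.sdocumentno, t2.address, t3.srdocumentno
FROM temp1 AS t1
JOIN (SELECT irecordid, address,
             ROW_NUMBER() OVER (PARTITION BY irecordid ORDER BY address) AS seqnum
      FROM temp2) AS t2
  ON t1.irecordid = t2.irecordid
JOIN (SELECT irecordid, srdocumentno,
             ROW_NUMBER() OVER (PARTITION BY irecordid ORDER BY srdocumentno) AS seqnum
      FROM temp3) AS t3
  ON t1.irecordid = t3.irecordid AND t2.seqnum = t3.seqnum
ORDER BY t2.seqnum
"""
rows = conn.execute(PAIR_SQL).fetchall()
print(rows)
```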

What's the best way to select data only appearing in one of two tables?

If I have two tables such as this:
CREATE TABLE #table1 (id INT, name VARCHAR(10))
INSERT INTO #table1 VALUES (1,'John')
INSERT INTO #table1 VALUES (2,'Alan')
INSERT INTO #table1 VALUES (3,'Dave')
INSERT INTO #table1 VALUES (4,'Fred')
CREATE TABLE #table2 (id INT, name VARCHAR(10))
INSERT INTO #table2 VALUES (1,'John')
INSERT INTO #table2 VALUES (3,'Dave')
INSERT INTO #table2 VALUES (5,'Steve')
And I want to see all rows which only appear in one of the tables, what would be the best way to go about this?
All I can think of is to either do:
(SELECT * FROM #table1 EXCEPT SELECT * FROM #table2)
UNION
(SELECT * FROM #table2 EXCEPT SELECT * FROM #table1)
Or something along the lines of:
SELECT id,MAX(name) as name FROM
(
SELECT *,1 as count from #table1 UNION ALL
SELECT *,1 as count from #table2
) data
group by id
HAVING SUM(count) =1
Which would return Alan, Fred and Steve in this case.
But these feel really clunky - is there a more efficient way of approaching this?
select coalesce(t1.id, t2.id) id,
coalesce(t1.name, t2.name) name
from #table1 t1
full outer join #table2 t2
on t1.id = t2.id
where t1.id is null
or t2.id is null
The full outer join keeps records from both sides of the join. Any record that is not present on both sides (the ones you are looking for) will have NULLs on one side or the other. That's why we filter for NULL.
The COALESCE is there to guarantee that the non-NULL value is displayed.
Finally, it's worth highlighting that repetitions are detected by id. If you also want them detected by name, add name to the JOIN; if you want them detected by name only, join on name alone. This solution (using JOIN) gives you that flexibility.
BTW, since you provided the CREATE and INSERT code, I actually ran it, and the code above works as-is.
You can use EXCEPT and INTERSECT:
-- All rows
SELECT * FROM #table1
UNION
SELECT * FROM #table2
EXCEPT -- except
(
-- those in both tables
SELECT * FROM #table1
INTERSECT
SELECT * FROM #table2
)
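Not sure if this is any better than your EXCEPT and UNION example... Note that checksum(*) can return the same value for different rows, so a collision could hide a row that appears in only one table.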
select id, name
from
(select *, count(*) over(partition by checksum(*)) as cc
from (select *
from #table1
union all
select *
from #table2
) as T
) as T
where cc = 1
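Grouping matters when mixing these operators: in SQL Server, UNION and EXCEPT have the same precedence and are evaluated left to right (only INTERSECT binds tighter), so an ungrouped table1 EXCEPT table2 UNION table2 EXCEPT table1 returns only Steve. SQLite also evaluates compound operators left to right but does not accept parenthesized compound members, so this Python sqlite3 sketch wraps each EXCEPT in a derived table instead:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE table2 (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)",
                 [(1, "John"), (2, "Alan"), (3, "Dave"), (4, "Fred")])
conn.executemany("INSERT INTO table2 VALUES (?, ?)",
                 [(1, "John"), (3, "Dave"), (5, "Steve")])

# Each EXCEPT is evaluated inside its own derived table before the UNION.
SYM_DIFF_SQL = """
SELECT id, name FROM (SELECT id, name FROM table1 EXCEPT SELECT id, name FROM table2)
UNION
SELECT id, name FROM (SELECT id, name FROM table2 EXCEPT SELECT id, name FROM table1)
ORDER BY id
"""
only_one = conn.execute(SYM_DIFF_SQL).fetchall()

# Without grouping this runs as ((t1 EXCEPT t2) UNION t2) EXCEPT t1.
NAIVE_SQL = """
SELECT id, name FROM table1 EXCEPT SELECT id, name FROM table2
UNION
SELECT id, name FROM table2 EXCEPT SELECT id, name FROM table1
ORDER BY id
"""
naive = conn.execute(NAIVE_SQL).fetchall()
print(only_one)  # [(2, 'Alan'), (4, 'Fred'), (5, 'Steve')]
print(naive)     # [(5, 'Steve')]
```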

SQLServer join two tables

I've got a question for you: I'm having a hard time combining two tables, and I can't manage to find the correct query.
I have two tables:
T1: 1column, Has X records
T2: 1column, Has Y records
Note: Y can never be greater than X, but it is often less.
I want to join those tables in order to have a table with two columns
t3: ColumnFromT1, columnFromT2.
When Y is less than X, the T2 values get repeated and spread over all my other values, but I want to get NULL once ALL the values from T2 have been used.
How could I achieve that?
Thanks
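You could give each table a row number in a subquery. Then you can left join on that row number. To recycle rows from the second table, take the modulus % of the first table's row number. If you would rather get NULLs once the second table runs out, join on Sub1.rn = Sub2.rn without the modulus.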
Example:
select Sub1.col1
, Sub2.col1
from (
select row_number() over (order by col1) as rn
, *
from #T1
) Sub1
left join
(
select row_number() over (order by col1) as rn
, *
from #T2
) Sub2
on (Sub1.rn - 1) % (select count(*) from #T2) + 1 = Sub2.rn
Test data:
create table #t1 (col1 int)
create table #t2 (col1 datetime)
insert #t1 values (1), (2), (3), (4), (5)
insert #t2 values ('2010-01-01'), ('2012-02-02')
This prints:
1 2010-01-01
2 2012-02-02
3 2010-01-01
4 2012-02-02
5 2010-01-01
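Both variants of the row-number join can be tried side by side with Python's sqlite3 module standing in for SQL Server. The datetime column is stored as ISO-8601 text (an assumption, since SQLite has no datetime type), and recycle is a hypothetical parameter that switches between the modulus join and a plain rn = rn join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (col1 INTEGER)")
conn.execute("CREATE TABLE t2 (col1 TEXT)")  # dates kept as ISO-8601 text
conn.executemany("INSERT INTO t1 VALUES (?)", [(n,) for n in range(1, 6)])
conn.executemany("INSERT INTO t2 VALUES (?)", [("2010-01-01",), ("2012-02-02",)])

def pair_rows(recycle):
    # recycle=True wraps around t2 with the modulus;
    # recycle=False leaves NULLs once t2 runs out.
    if recycle:
        join_on = "(s1.rn - 1) % (SELECT COUNT(*) FROM t2) + 1 = s2.rn"
    else:
        join_on = "s1.rn = s2.rn"
    sql = f"""
    SELECT s1.col1, s2.col1
    FROM (SELECT ROW_NUMBER() OVER (ORDER BY col1) AS rn, col1 FROM t1) AS s1
    LEFT JOIN (SELECT ROW_NUMBER() OVER (ORDER BY col1) AS rn, col1 FROM t2) AS s2
      ON {join_on}
    ORDER BY s1.rn
    """
    return conn.execute(sql).fetchall()

print(pair_rows(True))
print(pair_rows(False))
```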
You are looking for a LEFT JOIN (http://www.w3schools.com/sql/sql_join_left.asp), e.g. T1 LEFT JOIN T2.
Say they both have the column CustomerID in common:
SELECT *
FROM T1
LEFT JOIN
T2 on t1.CustomerId = T2.CustomerId
This will return all records in T1 and those that match in T2 with nulls for the T2 values where they do not match.
Make sure you are joining the tables on a common column (or a common set of columns, if more than one column is needed to perform the join). If not, you are doing a cartesian join (http://ezinearticles.com/?What-is-a-Cartesian-Join?&id=3560672).

Join Tables with no Join Criteria

This seems so simple, but I just can't figure it out. I want to simply join 2 tables together. I don't care which values are paired with which. Using TSQL, here is an example:
create table #tbl1 (id int)
create table #tbl2 (id int)
insert #tbl1 values(1)
insert #tbl1 values(2)
insert #tbl2 values(3)
insert #tbl2 values(4)
insert #tbl2 values(5)
select * from #tbl1, #tbl2
This returns 6 rows, but what kind of query will generate this (just slap the tables side-by-side):
1 3
2 4
null 5
You can give each table row numbers and then join on the row numbers:
WITH
Table1WithRowNumber as (
select row_number() over (order by id) as RowNumber, id from Table1
),
Table2WithRowNumber as (
select row_number() over (order by id) as RowNumber, id from Table2
)
SELECT Table1WithRowNumber.Id, Table2WithRowNumber.Id as Id2
FROM Table1WithRowNumber
FULL OUTER JOIN Table2WithRowNumber ON Table1WithRowNumber.RowNumber = Table2WithRowNumber.RowNumber
Edit: Modified to use FULL OUTER JOIN, so you get all rows (with NULLs).
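SQLite only gained FULL OUTER JOIN in version 3.39.0, so this Python sqlite3 sketch of the row-number approach emulates it with a LEFT JOIN plus the unmatched rows of the second table (UNION ALL with an anti-join). The rn column is carried through only so the output can be ordered, and the table names tbl1/tbl2 are assumptions mirroring the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl1 (id INTEGER)")
conn.execute("CREATE TABLE tbl2 (id INTEGER)")
conn.executemany("INSERT INTO tbl1 VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO tbl2 VALUES (?)", [(3,), (4,), (5,)])

# LEFT JOIN covers every tbl1 row; the second branch adds tbl2 rows
# whose row number has no partner in tbl1.
SIDE_BY_SIDE_SQL = """
WITH a AS (SELECT ROW_NUMBER() OVER (ORDER BY id) AS rn, id FROM tbl1),
     b AS (SELECT ROW_NUMBER() OVER (ORDER BY id) AS rn, id FROM tbl2)
SELECT a.rn AS rn, a.id AS id1, b.id AS id2
FROM a LEFT JOIN b ON a.rn = b.rn
UNION ALL
SELECT b.rn, NULL, b.id
FROM b LEFT JOIN a ON a.rn = b.rn
WHERE a.rn IS NULL
ORDER BY rn
"""
pairs = [(id1, id2) for _, id1, id2 in conn.execute(SIDE_BY_SIDE_SQL)]
print(pairs)  # [(1, 3), (2, 4), (None, 5)]
```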
Use Cross Join
Select * From tableA Cross Join TableB
But understand you will get a row in the output for every combination of rows in TableA with every Row in TableB...
So if Table A has 8 rows, and TableB has 4 rows, you will get 32 rows of data...
If you want fewer rows than that, you have to specify some join criteria that will filter out the extra rows from the output.
Well, this will work:
Select A.ID, B.ID From
(SELECT ROW_NUMBER () OVER (ORDER BY ID) AS RowNumber, ID FROM Tbl1 ) A
full outer join
(SELECT ROW_NUMBER () OVER (ORDER BY ID) AS RowNumber, ID FROM Tbl2 ) B
on (A.RowNumber=B.RowNumber)
The SQL1-style (comma-separated) cross join also applies here, with the same every-combination result:
Select *
From tableA, TableB