Matching rows: Do I need a cursor? (SQL Server 2005)

I have two tables, call them G and T, from which I am selecting records by matching on a number of fields.
SELECT
g.ID, t.ID
FROM
g JOIN t
ON (g.Field1 = t.Field1
AND g.Field2 = t.Field2
AND .... )
There can be more than one record that matches each side e.g. rows t1 and t2 are identical on the fields used for matching, as are g1 and g2 and they match each other, giving
t1 g1
t1 g2
t2 g1
t2 g2
(the actual ids are ints, but you get the idea)
What we want is for each T record to match to only one G record (we don't care which as long as they are different ones) e.g. either of
t1 g1
t2 g2
OR
t1 g2
t2 g1
would be acceptable, but NOT
t1 g1
t2 g1
And not both resultsets - we only want the 2 rows total (in this example).
There might be (say) 30,000 rows in the initial selection from each table. Not everything will have a match, this is fine.
Can this be done set-wise or do I have to use a cursor?
EDITED in response to answer.

You can use ROW_NUMBER() to assign some arbitrary identifiers to do the matching on:
;With TOrdered as (
select ID,Field1,Field2,
ROW_NUMBER() OVER (PARTITION BY Field1,Field2 ORDER BY ID) as rn
from T
), GOrdered as (
select ID,Field1,Field2,
ROW_NUMBER() OVER (PARTITION BY Field1,Field2 ORDER BY ID) as rn
from G
)
SELECT
g.ID, t.ID
FROM
GOrdered g
JOIN
TOrdered t
ON (g.Field1 = t.Field1
AND g.Field2 = t.Field2
AND g.rn = t.rn )
(If there are mismatches on counts between the two tables, some rows will not appear in the final result at all - but you haven't really indicated whether they should or not, or how they should be dealt with)
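For anyone who wants to try the pairing without a SQL Server instance, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module. That is an assumption on my part: the question is about SQL Server 2005, and SQLite needs version 3.25 or later for ROW_NUMBER(). The helper name pair_rows and the sample IDs are made up for illustration.

```python
import sqlite3

def pair_rows(t_rows, g_rows):
    """Pair T and G rows one-to-one within each (Field1, Field2) group."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE T (ID INTEGER, Field1 TEXT, Field2 TEXT)")
    con.execute("CREATE TABLE G (ID INTEGER, Field1 TEXT, Field2 TEXT)")
    con.executemany("INSERT INTO T VALUES (?, ?, ?)", t_rows)
    con.executemany("INSERT INTO G VALUES (?, ?, ?)", g_rows)
    sql = """
        WITH TOrdered AS (
            SELECT ID, Field1, Field2,
                   ROW_NUMBER() OVER (PARTITION BY Field1, Field2 ORDER BY ID) AS rn
            FROM T
        ), GOrdered AS (
            SELECT ID, Field1, Field2,
                   ROW_NUMBER() OVER (PARTITION BY Field1, Field2 ORDER BY ID) AS rn
            FROM G
        )
        SELECT t.ID, g.ID
        FROM GOrdered g
        JOIN TOrdered t
          ON g.Field1 = t.Field1
         AND g.Field2 = t.Field2
         AND g.rn = t.rn
        ORDER BY t.ID
    """
    rows = con.execute(sql).fetchall()
    con.close()
    return rows

# Hypothetical data: two T rows and three G rows share the key ("x", "y");
# T row 3 has no counterpart in G at all.
t_rows = [(1, "x", "y"), (2, "x", "y"), (3, "p", "q")]
g_rows = [(10, "x", "y"), (11, "x", "y"), (12, "x", "y")]
pairs = pair_rows(t_rows, g_rows)
print(pairs)
```

Each T row is paired with a different G row, and the surplus rows (T 3 and G 12 here) simply drop out of the inner join.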

Related

Joining two tables together strictly by order

If I have two tables(t1, t2), each with one column
t1
letters
a
b
c
t2
nums
1
2
3
Is it possible to "join" the two together in a way that produces a two-column result set that looks like this:
letters nums
a 1
b 2
c 3
Requirements for the solution:
- Must combine each table's data in a specified order, so being able to order each table's data before joining
- Doesn't use any functions, like row_number, to add an extra column to join on
Bonus points:
- If two tables have different row counts, final result set is count of the max of the two tables, and the "missing" data is nulls.
Just wondering if this is possible given the constraints.
You want to use row_number(). However, SQL tables represent unordered sets, so you need a column that specifies the ordering.
The idea is:
select l.letter, n.number
from (select l.*, row_number() over (order by ?) as seqnum
from letters l
) l join
(select n.*, row_number() over (order by ?) as seqnum
from numbers n
) n
on l.seqnum = n.seqnum;
The ? is for the column that specifies the ordering.
If you want all rows in both tables, use full join rather than an inner join.
EDIT:
row_number() is the obvious solution, but you can do this with a correlated subquery assuming the values are unique:
select l.letter, n.num
from (select l.*,
(select count(*) from letters l2 where l2.letter <= l.letter) as seqnum
from letters l
) l join
(select n.*,
(select count(*) from numbers n2 where n2.num <= n.num) as seqnum
from numbers n
) n
on l.seqnum = n.seqnum;
I find the restriction on not using row_number() to be rather absurd, given that it is an ISO/ANSI standard function supported by almost all databases.
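The correlated-subquery version can be checked quickly with Python's built-in sqlite3 module. This is only a sketch under assumptions: the table and column names (letters.letter, numbers.num) follow the answer's query rather than the question's t1/t2, the helper name zip_by_rank is invented, and the values are assumed unique as the answer requires.

```python
import sqlite3

def zip_by_rank(letters, nums):
    """Join two single-column tables by rank, without row_number()."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE letters (letter TEXT)")
    con.execute("CREATE TABLE numbers (num INTEGER)")
    con.executemany("INSERT INTO letters VALUES (?)", [(x,) for x in letters])
    con.executemany("INSERT INTO numbers VALUES (?)", [(n,) for n in nums])
    sql = """
        SELECT l.letter, n.num
        FROM (SELECT src.letter,
                     -- rank = how many letters sort at or before this one
                     (SELECT COUNT(*) FROM letters l2
                      WHERE l2.letter <= src.letter) AS seqnum
              FROM letters src) l
        JOIN (SELECT src.num,
                     -- rank = how many numbers sort at or before this one
                     (SELECT COUNT(*) FROM numbers n2
                      WHERE n2.num <= src.num) AS seqnum
              FROM numbers src) n
          ON l.seqnum = n.seqnum
        ORDER BY l.seqnum
    """
    rows = con.execute(sql).fetchall()
    con.close()
    return rows

# Insertion order is deliberately scrambled; the ranks restore the order.
result = zip_by_rank(["c", "a", "b"], [3, 1, 2])
print(result)
```

Because this is an inner join, a table with extra rows just loses them; the "bonus" requirement would need a full join instead.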
If your version of SQL supports an ASCII function (which can generate an ASCII code for each lowercase letter), then you may join on the ASCII code shifted downwards by 96:
SELECT
t1.letters,
t2.nums
FROM table1 t1
INNER JOIN table2 t2
ON t2.nums = ASCII(t1.letters) - 96;

SQL Find most rows that match between two tables

I am using SQL Server 2012. I have two tables like the following.
Table1 and Table2 both have many groups, indicated by the group column. The name of the group may match in both tables, but it may not. What is important is finding the group on Table2 that has the most members that match members in a group on Table1.
I first tried doing this with a vlookup, but the problem is vlookup pulls the first entry in the Group column that has a match, not the group with the most matches. Below vlookup would pull BBB, but the correct result is CCC.
Ties may occur. There might be more than one group on Table2 that matches Table1 with the same number of members, so the best thing may be to count the number of matches, but there are thousands of groups, so it's not ideal to sort and sift through a column of counts. I need something like a CASE statement where, if there is a MAX(match), Table1 would show the group name with MAX(match) in the derived column BestMatch. Ideally the column would display all the groups on Table2 that have MAX(match), which may be one or more. Perhaps it could be comma separated.
If not, the column could just say "tie" and I could look for the tie. If this is the best option, it would be ideal for the word "tie" to repeat beside every member that matches, so I will know to look for which groups are matching on which accounts and how many matched.
We really could do with some expected output to help clarify the question.
If I understand you correctly however, this query will get you close to the results you require:
;with cte as
( SELECT t1a.[group] AS Group1
, t2a.[Group] AS Group2
, RANK() OVER(PARTITION BY t1a.[group]
ORDER BY COUNT(t2a.[Group]) DESC) AS MatchRank
FROM Table1 t1a
JOIN Table2 t2a
ON t1a.member = t2a.member
GROUP BY t1a.[group], t2a.[Group])
SELECT *
FROM cte
WHERE MatchRank=1
The query doesn't identify ties, but it will display any tied results...
If you are new to common table expressions (the ;with statement), there is a useful description here.
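To see how the RANK() approach behaves when there is a tie, here is a small sketch using Python's built-in sqlite3 module (an assumption; the question targets SQL Server 2012, and SQLite needs 3.25+ for window functions). I split the counting and the ranking into two CTEs, which should be equivalent to the single-CTE query above but is not the author's exact text. The sample groups and members are invented, loosely following the AAA/BBB/CCC names in the question.

```python
import sqlite3

def best_matches(table1, table2):
    """Return (Group1, Group2, Matches) for the top-ranked Table2 group(s)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Table1 ([group] TEXT, member INTEGER)")
    con.execute("CREATE TABLE Table2 ([group] TEXT, member INTEGER)")
    con.executemany("INSERT INTO Table1 VALUES (?, ?)", table1)
    con.executemany("INSERT INTO Table2 VALUES (?, ?)", table2)
    sql = """
        WITH counts AS (
            -- one row per (Table1 group, Table2 group) with shared members
            SELECT t1.[group] AS Group1, t2.[group] AS Group2,
                   COUNT(*) AS Matches
            FROM Table1 t1
            JOIN Table2 t2 ON t1.member = t2.member
            GROUP BY t1.[group], t2.[group]
        ), ranked AS (
            -- RANK() gives tied counts the same rank
            SELECT Group1, Group2, Matches,
                   RANK() OVER (PARTITION BY Group1
                                ORDER BY Matches DESC) AS MatchRank
            FROM counts
        )
        SELECT Group1, Group2, Matches
        FROM ranked
        WHERE MatchRank = 1
        ORDER BY Group1, Group2
    """
    rows = con.execute(sql).fetchall()
    con.close()
    return rows

# Hypothetical data: AAA shares 1 member with BBB and 2 each with CCC and DDD.
table1 = [("AAA", 123), ("AAA", 456), ("AAA", 789), ("AAA", 555), ("EEE", 999)]
table2 = [("BBB", 123), ("CCC", 123), ("CCC", 456),
          ("DDD", 789), ("DDD", 555), ("FFF", 999)]
matches = best_matches(table1, table2)
print(matches)
```

Both CCC and DDD come back for AAA, one row each, so a tie is visible as more than one row per Table1 group.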
select *
from Table1 t1
outer apply
(
select top 1 t2.[Group]
from Table2 t2
where t2.Member = t1.Member
group by t2.[Group]
order by count(*) desc
) m
It may not be the most elegant solution but I think it could do the work:
select *
from
(select t1.[group] as t1group, t1.member, t2.[group] as t2group
from Table1 t1 inner join Table2 t2 on t1.member = t2.member)a
where member = (select max(t1.member)
from Table1 t1 inner join Table2 t2 on t1.member = t2.member)
In case of 2 rows from Table2 matching the maximum members in Table1, both results would be displayed
PS: an example of your desired results would have been helpful
Count member matches per group pair and rank them so the group pairs with the highest match count get rank #1. Once you have found these, you can select the related records from table1 and table2.
select t1.grp, t1.member, t2.grp
from t1
join
(
select
t1.grp as grp1,
t2.grp as grp2,
rank() over (partition by t1.grp order by count(*) desc) as rnk
from t1
join t2 on t2.member = t1.member
group by t1.grp, t2.grp
) grps on grps.rnk = 1 and grps.grp1 = t1.grp
left join t2 on t2.grp = grps.grp2 and t2.member = t1.member
order by t1.grp, t1.member, t2.grp;
This gives you ties in separate rows, e.g. for AAA having four different members (123,456,789,555) with two matches both in CCC and DDD:
grp1 member grp2
AAA 123 CCC
AAA 123 DDD
AAA 456 CCC
AAA 789
AAA 555 DDD
If you want one row per grp1 and member with all matching grp2 in a string then you need some clumsy STUFF trick in SQL Server as far as I am aware. Look up "GROUP_CONCAT in SQL Server" to find the technique needed.

SQL index relative to another field

I have, as part of a query, a bunch of distinct pairs of values:
a d
a e
a f
b g
b h
c i
I'd like to be able to calculate a counter relative to the first field:
a d 1
a e 2
a f 3
b g 1
b h 2
c i 1
I can't use the position in the temporary table - apart from anything else it goes too high, whereas the value I need can't go over 2 digits (and there aren't going to be more than 50 entries with the same first field). Are there any methods or techniques to help?
Thanks for any help!
select t1.c1 , t1.c2 , count(t2.c1) cnt
from mytable t1
join mytable t2 on t1.c1 = t2.c1 and t1.c2 >= t2.c2
group by t1.c1, t1.c2
order by t1.c1, cnt
Explanation
This query assumes that the pair (c1,c2) is unique.
To rank each row (c1,c2), the query counts the rows within the same c1 group whose c2 is less than or equal to that row's c2. For example, for (a,e), there are 2 rows within the group a whose c2 is less than or equal to e (namely d and e), so (a,e) gets the counter 2.
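The self-join count is easy to verify against the sample pairs from the question. A minimal sketch using Python's built-in sqlite3 module (my choice for a quick test, the question does not name a DBMS); the helper name running_index is made up, and mytable/c1/c2 follow the query above.

```python
import sqlite3

def running_index(pairs):
    """Number each (c1, c2) pair within its c1 group via a self-join count."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mytable (c1 TEXT, c2 TEXT)")
    con.executemany("INSERT INTO mytable VALUES (?, ?)", pairs)
    sql = """
        SELECT t1.c1, t1.c2, COUNT(t2.c1) AS cnt
        FROM mytable t1
        -- t2 ranges over rows in the same c1 group at or before t1's c2
        JOIN mytable t2 ON t1.c1 = t2.c1 AND t1.c2 >= t2.c2
        GROUP BY t1.c1, t1.c2
        ORDER BY t1.c1, cnt
    """
    rows = con.execute(sql).fetchall()
    con.close()
    return rows

pairs = [("a", "d"), ("a", "e"), ("a", "f"), ("b", "g"), ("b", "h"), ("c", "i")]
indexed = running_index(pairs)
print(indexed)
```

If (c1,c2) were not unique, duplicate pairs would collapse in the GROUP BY and inflate the count, which is why the uniqueness assumption matters.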
You didn't specify your DBMS, so this is ANSI SQL:
select a,
b,
row_number() over (partition by a order by b) as idx
from the_table;
SQLFiddle: http://sqlfiddle.com/#!15/4cf96/1
row_number() is a window function which will generate a unique number based on the "grouping" and ordering defined with the partition by clause. The Postgres manual has a nice introduction to window functions: http://www.postgresql.org/docs/current/static/tutorial-window.html
This is going to be much faster than a self join.

sql to combine two unrelated tables into one

I have tables
table1
col1 col2
a b
c d
and table2
mycol1 mycol2
e f
g h
i j
k l
I want to combine the two tables, which have no common field into one table looking like:
table 3
col1 col2 mycol1 mycol2
a b e f
c d g h
null null i j
null null k l
ie, it is like putting the two tables side by side.
I'm stuck! Please help!
Get a row number for each row in each table, then do a full join using those row numbers:
WITH CTE1 AS
(
SELECT ROW_NUMBER() OVER(ORDER BY col1) AS ROWNUM, * FROM Table1
),
CTE2 AS
(
SELECT ROW_NUMBER() OVER (ORDER BY mycol1) AS ROWNUM, * FROM Table2
)
SELECT col1, col2, mycol1, mycol2
FROM CTE1 FULL JOIN CTE2 ON CTE1.ROWNUM = CTE2.ROWNUM
This is assuming SQL Server >= 2005.
It's really good if you put in a description of why this problem needs to be solved. I'm guessing it is just to practice sql syntax?
Anyway, since the rows don't have anything connecting them, we have to create a connection. I chose the ordering of their values. Since nothing connects them, that also raises the question of why you would want to put them next to each other in the first place.
Here is the complete solution: http://sqlfiddle.com/#!6/67e4c/1
The select code looks like this:
WITH rankedt1 AS
(
SELECT col1
,col2
,row_number() OVER (order by col1,col2) AS rn1
FROM table1
)
,rankedt2 AS
(
SELECT mycol1
,mycol2
,row_number() OVER (order by mycol1,mycol2) AS rn2
FROM table2
)
SELECT
col1,col2,mycol1,mycol2
FROM rankedt1
FULL OUTER JOIN rankedt2
ON rn1=rn2
Option 1: Single Query
You have to join the two tables, and if you want each row in table1 to match only one row in table2, you have to restrict the join somehow. Calculate row numbers in each table and join on that column. Row numbers are database-specific; here is a solution for MySQL:
SELECT
t1.col1, t1.col2, t2.mycol1, t2.mycol2
FROM
(SELECT col1, col2, @t1_row := @t1_row + 1 AS rownum FROM table1, (SELECT @t1_row := 0) AS r1) AS t1
LEFT JOIN
(SELECT mycol1, mycol2, @t2_row := @t2_row + 1 AS rownum FROM table2, (SELECT @t2_row := 0) AS r2) AS t2
ON t1.rownum = t2.rownum;
This assumes table1 is longer than table2; if table2 is longer, either use RIGHT JOIN or switch the order of the t1 and t2 sub-selects. Also note that you can specify the order of each table separately using an ORDER BY clause in the sub-selects.
(See select increment counter in mysql)
Option 2: Post-processing
Consider making two selects, and then concatenating the results with your favorite scripting language. This is a much more reasonable approach.
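As a rough illustration of the post-processing route, here is a sketch in Python (one possible scripting language; the answer does not name one). It assumes the two result sets have already been fetched as lists of tuples, and the function name side_by_side and its width parameters are invented for this example.

```python
from itertools import zip_longest

def side_by_side(rows1, rows2, width1, width2):
    """Pair rows positionally; pad the shorter result set with None (NULL)."""
    pad1 = (None,) * width1
    pad2 = (None,) * width2
    combined = []
    for r1, r2 in zip_longest(rows1, rows2):
        # zip_longest yields None once one side has run out of rows
        left = pad1 if r1 is None else tuple(r1)
        right = pad2 if r2 is None else tuple(r2)
        combined.append(left + right)
    return combined

# The sample data from the question, as if returned by two separate SELECTs.
table1 = [("a", "b"), ("c", "d")]
table2 = [("e", "f"), ("g", "h"), ("i", "j"), ("k", "l")]
table3 = side_by_side(table1, table2, 2, 2)
for row in table3:
    print(row)
```

The row order is whatever order the two SELECTs returned, so each query should carry its own ORDER BY if the pairing needs to be repeatable.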

Selecting distinct identity rows based on the lowest value of a joined priority column

Simplified table structures, all INT columns and no PKs outside of the identity columns:
Nodes (n) table: id
Attributes (a) table: id, node_id, type_id
Type (t) table: id, priority
I'm trying to select a set of attributes, each of which has the lowest type.priority for its respective node. Though there are multiple attributes per node_id, I only want to select the one with the lowest priority value:
a1 n1 t1 p0 *
a2 n1 t2 p1
a3 n2 t2 p1 *
a4 n2 t3 p2
This is the basic query that I'm working from, at which point I'm also getting stuck:
SELECT *
FROM a
LEFT JOIN t ON a.type_id = t.id
GROUP BY node_id
My first thought was to use an aggregate, MIN, but I'm then having problems matching up the lowest priority for a node_id with the correct attribute.
This question is a variation of the "greatest-n-per-group" problem, but you're looking for the least instead of the greatest, and your criteria are in a lookup table (Type) instead of the principal table (Attributes).
So you want the rows (a1) from Attributes such that no other row with the same node_id is associated with a lower priority.
SELECT a1.*
FROM Attributes a1 INNER JOIN Type t1 ON (a1.type_id = t1.id)
LEFT OUTER JOIN
(Attributes a2 INNER JOIN Type t2 ON (a2.type_id = t2.id))
ON (a1.node_id = a2.node_id AND t1.priority > t2.priority)
WHERE a2.node_id IS NULL;
Note that this can result in ties. You haven't described how you would resolve ties if two Attributes referenced Types with the same priority. In other words, in the following examples, which attributes should be selected?
a1 n1 t1 p0
a2 n1 t1 p0
a3 n2 t2 p1
a4 n2 t3 p1
PS: I hope you don't mind I added the "greatest-n-per-group" tag to your question. Click that tag to see other questions on SO that I have tagged similarly.
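The anti-join idea, and the tie behaviour noted above, can be tried on the sample rows with Python's built-in sqlite3 module. This is a sketch under assumptions: SQLite stands in for whatever DBMS the asker uses, the nested join is written as a derived table (alias other) rather than a parenthesized join, and the helper name lowest_priority_attributes is made up.

```python
import sqlite3

def lowest_priority_attributes(attributes, types):
    """Return attributes for which no same-node attribute has a lower priority."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Attributes (id INTEGER, node_id INTEGER, type_id INTEGER)")
    con.execute("CREATE TABLE Type (id INTEGER, priority INTEGER)")
    con.executemany("INSERT INTO Attributes VALUES (?, ?, ?)", attributes)
    con.executemany("INSERT INTO Type VALUES (?, ?)", types)
    sql = """
        SELECT a1.id, a1.node_id, a1.type_id
        FROM Attributes a1
        JOIN Type t1 ON a1.type_id = t1.id
        LEFT JOIN (
            SELECT a2.node_id AS node_id, t2.priority AS priority
            FROM Attributes a2
            JOIN Type t2 ON a2.type_id = t2.id
        ) other
          ON a1.node_id = other.node_id AND t1.priority > other.priority
        -- keep a1 only when nothing with a lower priority was found
        WHERE other.node_id IS NULL
        ORDER BY a1.id
    """
    rows = con.execute(sql).fetchall()
    con.close()
    return rows

# First example from the question: a1 and a3 are the starred rows.
attrs = [(1, 1, 1), (2, 1, 2), (3, 2, 2), (4, 2, 3)]
winners = lowest_priority_attributes(attrs, [(1, 0), (2, 1), (3, 2)])

# Tie example: every attribute shares its node's lowest priority.
tied_attrs = [(1, 1, 1), (2, 1, 1), (3, 2, 2), (4, 2, 3)]
tied = lowest_priority_attributes(tied_attrs, [(1, 0), (2, 1), (3, 1)])
print(winners, tied)
```

In the tie case all four attributes come back, which is the ambiguity the answer asks about.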
Use a tie-breaker query (not tested):
SELECT n.*, a.*
FROM Nodes n
LEFT JOIN Attributes a
ON a.id = (SELECT x.id --//TOP 1 x.id
FROM Attributes x
INNER JOIN Type t
ON x.type_id = t.id
WHERE x.node_id = n.id
ORDER BY t.priority ASC,
--//just in case there are 2 attributes
--//with the same priority, order also on x.id
x.id ASC
LIMIT 1
)
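Since the query above is marked as not tested, here is a quick check of the LIMIT 1 form using Python's built-in sqlite3 module. Treat it as a sketch: SQLite is my stand-in engine (on SQL Server the commented-out TOP 1 form would be needed instead), and the helper name pick_one_attribute plus the sample rows are invented for illustration.

```python
import sqlite3

def pick_one_attribute(nodes, attributes, types):
    """Return (node id, chosen attribute id), breaking priority ties on attribute id."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Nodes (id INTEGER)")
    con.execute("CREATE TABLE Attributes (id INTEGER, node_id INTEGER, type_id INTEGER)")
    con.execute("CREATE TABLE Type (id INTEGER, priority INTEGER)")
    con.executemany("INSERT INTO Nodes VALUES (?)", [(n,) for n in nodes])
    con.executemany("INSERT INTO Attributes VALUES (?, ?, ?)", attributes)
    con.executemany("INSERT INTO Type VALUES (?, ?)", types)
    sql = """
        SELECT n.id, a.id
        FROM Nodes n
        LEFT JOIN Attributes a
          ON a.id = (SELECT x.id
                     FROM Attributes x
                     JOIN Type t ON x.type_id = t.id
                     WHERE x.node_id = n.id
                     -- lowest priority first, then lowest attribute id
                     ORDER BY t.priority ASC, x.id ASC
                     LIMIT 1)
        ORDER BY n.id
    """
    rows = con.execute(sql).fetchall()
    con.close()
    return rows

# Tie data: attributes 1 and 2 tie on node 1, attributes 3 and 4 tie on node 2;
# node 3 has no attributes at all.
attrs = [(1, 1, 1), (2, 1, 1), (3, 2, 2), (4, 2, 3)]
types = [(1, 0), (2, 1), (3, 1)]
chosen = pick_one_attribute([1, 2, 3], attrs, types)
print(chosen)
```

Exactly one attribute comes back per node, and the LEFT JOIN keeps nodes without attributes as a NULL row.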