Find 'Most Similar' Items in Table by Foreign Key

Find 'Most Similar' Items in Table by Foreign Key - sql

I have a child table with a number of charact/value pairs for a given 'material' (MaterialID). Any material can have a number of charact values and may have several of the same name (see id's 2,3).
The table has a large number of records (8+ million). What I'm trying to do is find the materials that are the most similar to a supplied material. That is, when I supply a MaterialID, I would like an ordered list of the most similar other materials (those with the most matching charact/value pairs).
I've done some research but, I may be missing some key terms or just not conceptualizing the problem correctly.
Any hints as to how to go about this would be very much appreciated.
ID MaterialID Charact Value
1 1 ROT_DIR CCW
2 1 SPECIAL_FEATURE CATALOG_CP
3 1 SPECIAL_FEATURE CHROME
4 1 SCHEDULE 80
5 2 BEARING_TYPE SB
6 2 SCHEDULE 80
7 3 ROT_DIR CCW
8 3 SPECIAL_FEATURE CATALOG_HSB
9 3 BEARING_TYPE SP
10 4 NDE_STYLE W_FAN
11 4 BEARING_TYPE SB
12 4 ROT_DIR CW*

You can do this with a self join:
select t.materialid, count(*) as nummatches
from t join
t tmat
on t.Charact = tmat.Charact and t.value = tmat.value
where tmat.materialid = #MaterialId
group by t.materialid
order by nummatches desc;
Notes:
You might want to remove the specified material, by adding where t.MaterialId <> tmat.MaterialId to the where clause.
If you want all materials, then make the join a left join and move the where condition to the on clause.
If you want only one material with the most matches, use select top 1.
If you want all materials with the most matches when there are ties, use `select top (1) with ties.

Related

Selecting Rows and removing duplicates based on some fields based on two fields and limit to Top Ten?

Having this table:
Row Athlete Event Mark Meet
1 1 3 10 A
2 2 2 5 A
3 3 3 3 A
4 4 4 7 A
5 2 2 4 A
6 3 2 5 B
7 1 1 10 C
How can I select all rows but remove duplicate rows with have the athlete in the same event (Fields Athlete and Event), and pick the lowest (or highest Mark for that athlete), I would also like to limit each event to top 10 athletes (not shown in results)
Expected Output (choosing highest mark), (row 5 is removed)
Row Athlete Event Mark Meet
1 1 3 10 A
2 2 2 5 A
3 3 3 3 A
4 4 4 7 A
6 3 2 5 B
7 1 1 10 C
Thanks for the help the query that did what I wanted (minus the top ten) is:
SELECT [tblPerformanceData-FieldBoys].Eventnum, [tblPerformanceData- FieldBoys].Mark, [tblPerformanceData-FieldBoys].Meet, [tblPerformanceData-FieldBoys].CY, [tblPerformanceData-FieldBoys].AthleteID, [tblPerformanceData-FieldBoys].MeetID
FROM [tblPerformanceData-FieldBoys] INNER JOIN MaxAthleteByEventBoysField ON ([tblPerformanceData-FieldBoys].AthleteID = MaxAthleteByEventBoysField.AthleteID) AND ([tblPerformanceData-FieldBoys].Mark = MaxAthleteByEventBoysField.MaxOfMark) AND ([tblPerformanceData-FieldBoys].Eventnum = MaxAthleteByEventBoysField.Eventnum)
GROUP BY [tblPerformanceData-FieldBoys].Eventnum, [tblPerformanceData-FieldBoys].Mark, [tblPerformanceData-FieldBoys].Meet, [tblPerformanceData-FieldBoys].CY, [tblPerformanceData-FieldBoys].AthleteID, [tblPerformanceData-FieldBoys].MeetID
ORDER BY [tblPerformanceData-FieldBoys].Mark DESC;

You can do it using cascading queries. Try running a group-by query on the main table that only includes the athlete, event, and mark. The max or min clause would be applied to the mark depending on the outcome you're looking for. Use this query as the source for a second query where you link back to the initial table using direct links between the athlete, event, and Mark field. what the second query should look like
That solves the first part. I'm not sure how to get the top ten for each event using queries.

I don't own or have access to MS Access, but I can give you SQL, and hope Access will support some basic syntax.
Option 1: it's easier if Row is your primary key but you do not need to return it in the result; in this case you can even get both MIN and MAX of the Mark for the same athlete in the same row using a simple query:
SELECT
Athlete, Event, Meet, MAX(Mark) AS HighestMark, MIN(Mark) AS LowestMark
FROM
MyTable
GROUP BY
Athlete, Event, Meet
Note: I assumed you also want to group by Meet, but if that's not the case, you could remove it from GROUP BY, but then its value loses meaning in the result.
Option 2: Row is primary key, but you do need to return it - obviously in this case min and max cannot be returned in the same row and the query looks quite different:
SELECT
Row, Athlete, Event, Mark, Meet
FROM
MyTable m0
WHERE m0.Row IN
(SELECT MAX(Row)
FROM MyTable m1
WHERE
Athlete = m0.Athlete AND
Event = m0.Event AND
Meet = m0.Meet
Mark = (SELECT MAX(Mark)
FROM MyTable
WHERE
Athlete = m1.Athlete AND
Event = m1.Event AND
Meet = m1.Meet)
GROUP BY
Athlete, Event, Meet, Mark)
Few notes:
above query returns MAX(Mark); change it to MIN(Mark) to return lowest values
this query could be rewritten with JOINs as well; I'm not sure which method Access likes better (i.e. runs faster)
it has 2 sub-queries; the top sub-query MAX(Row) is there to make sure only 1 row is selected if the same athlete in the same meet and event gets the same Mark; in this case, the greater Row is returned
it is possible to return both MIN and MAX with one query (as separate rows) at the expense of additional sub-queries, but that you didn't ask for

Access "Not In" query not working while only In is working correctly

I have below given query which is working fine but I want to use "Not In" operator instead of "In" but its giving no results:
SELECT DISTINCT OrderProdDetails.Priority
FROM OrderProdDetails
WHERE (((OrderProdDetails.Priority) In (SELECT DISTINCT OrderProdDetails.Priority
FROM OrderProdDetails WHERE (((OrderProdDetails.OrdID)=[Forms]![UpdateOrder]![OdrID])))));
Desired Query:
SELECT DISTINCT OrderProdDetails.Priority
FROM OrderProdDetails
WHERE (((OrderProdDetails.Priority) Not In (SELECT DISTINCT OrderProdDetails.Priority
FROM OrderProdDetails WHERE (((OrderProdDetails.OrdID)=[Forms]![UpdateOrder]![OdrID])))));
Basically it is referencing a control on parent form and based on that in a subform I want to populate the priority numbers i.e 1,2,3 and if for that record 1 is entered I want to get only 2 and 3 as drop-down option.
ReocordID OrdID Brand Name Priority
2 1 Org 1 2
3 2 Org 2 1
4 1 Org 1 1
6 1 Org 1 3
7 3 Org 3 1
8 4 Org 1 1
9 5 Org 2 1
10 5 Org 2 2
11 6 Org 1 1
12 6 Org 2 2
If there is any other better approach for the same please suggest.
Thanks in advance for your help.

In all likelihood, your problem is that Priority can take on NULL values. In that case, NOT IN doesn't work as expected (although it does work technically). The usual advice is to always use NOT EXISTS with subqueries rather than NOT IN.
But, in your case, I would suggest conditional aggregation instead:
SELECT opd.Priority
FROM OrderProdDetails as opd
GROUP BY opd.Priority
HAVING SUM(IIF(opd.OrdID = [Forms]![UpdateOrder]![OdrID], 1, 0)) = 0;
The HAVING clause counts the number of times the forms OdrId is in the orders. The = 0 means it is never there. Plus, you no longer need a select distinct.

Thanks for your prompt answers however I figured out what the problem was and the answer to problem is.
SELECT DISTINCT OrderProdDetails.Priority
FROM OrderProdDetails
WHERE (((OrderProdDetails.Priority) Not In (SELECT OrderProdDetails.Priority
FROM OrderProdDetails WHERE (((OrderProdDetails.OrdID)=[Forms]![UpdateOrder]![OdrID])
and ((OrderProdDetails.Priority) Is not null) ))));
I realized that the problem was happening only to those where there was a null value in priority so I puth the check of not null and it worked fine.
Thanks

Get MAX() on repeating IDs

This is how my query results look like currently. How can I get the MAX() value for each unique id ?
IE,
for 5267139 is 8.
for 5267145 is 4
5267136 5
5267137 8
5267137 2
5267139 8
5267139 5
5267139 3
5267141 4
5267141 3
5267145 4
5267145 3
5267146 1
5267147 2
5267152 3
5267153 3
5267155 8
SELECT DISTINCT st.ScoreID, st.ScoreTrackingTypeID
FROM ScoreTrackingType stt
LEFT JOIN ScoreTracking st
ON stt.ScoreTrackingTypeID = st.ScoreTrackingTypeID
ORDER BY st.ScoreID, st.ScoreTrackingTypeID DESC

GROUP BY will partition your table into separate blocks based on the column(s) you specify. You can then apply an aggregate function (MAX in this case) against each of the blocks -- this behavior applies by default with the below syntax:
SELECT First_column, MAX(Second_column) AS Max_second_column
FROM Table
GROUP BY First_column
EDIT: Based on the query above, it looks like you don't really need the ScoreTrackingType table at all, but leaving it in place, you could use:
SELECT st.ScoreID, MAX(st.ScoreTrackingTypeID) AS ScoreTrackingTypeID
FROM ScoreTrackingType stt
LEFT JOIN ScoreTracking st ON stt.ScoreTrackingTypeID = st.ScoreTrackingTypeID
GROUP BY st.ScoreID
ORDER BY st.ScoreID
The GROUP BY will obviate the need for DISTINCT, MAX will give you the value you are looking for, and the ORDER BY will still apply, but since there will only be a single ScoreTrackingTypeID value for each ScoreID you can pull it out of the ordering.

Finding contiguous regions in a sorted MS Access query

I am a long time fan of Stack Overflow but I've come across a problem that I haven't found addressed yet and need some expert help.
I have a query that is sorted chronologically with a date-time compound key (unique, never deleted) and several pieces of data. What I want to know is if there is a way to find the start (or end) of a region where a value changes? I.E.
DateTime someVal1 someVal2 someVal3 target
1 3 4 A
1 2 4 A
1 3 4 A
1 2 4 B
1 2 5 B
1 2 5 A
and my query returns rows 1, 4 and 6. It finds the change in col 5 from A to B and then from B back to A? I have tried the find duplicates method and using min and max in the totals property however it gives me the first and last overall instead of the local max and min? Any similar problems?

I didn't see any purpose for the someVal1, someVal2, and someVal3 fields, so I left them out. I used an autonumber as the primary key instead of your date/time field; but this approach should also work with your date/time primary key. This is the data in my version of your table.
pkey_field target
1 A
2 A
3 A
4 B
5 B
6 A
I used a correlated subquery to find the previous pkey_field value for each row.
SELECT
m.pkey_field,
m.target,
(SELECT Max(pkey_field)
FROM YourTable
WHERE pkey_field < m.pkey_field)
AS prev_pkey_field
FROM YourTable AS m;
Then put that in a subquery which I joined to another copy of the base table.
SELECT
sub.pkey_field,
sub.target,
sub.prev_pkey_field,
prev.target AS prev_target
FROM
(SELECT
m.pkey_field,
m.target,
(SELECT Max(pkey_field)
FROM YourTable
WHERE pkey_field < m.pkey_field)
AS prev_pkey_field
FROM YourTable AS m) AS sub
LEFT JOIN YourTable AS prev
ON sub.prev_pkey_field = prev.pkey_field
WHERE
sub.prev_pkey_field Is Null
OR prev.target <> sub.target;
This is the output from that final query.
pkey_field target prev_pkey_field prev_target
1 A
4 B 3 A
6 A 5 B

Here is a first attempt,
SELECT t1.Row, t1.target
FROM t1 WHERE (((t1.target)<>NZ((SELECT TOP 1 t2.target FROM t1 AS t2 WHERE t2.DateTimeId<t1.DateTimeId ORDER BY t2.DateTimeId DESC),"X")));

Why does this query return "incorrect" results?

I have 3 tables:
'CouponType' table:
AutoID Code Name
1 CouT001 SunCoupon
2 CouT002 GdFriCoupon
3 CouT003 1for1Coupon
'CouponIssued' table:
AutoID CouponNo CouponType_AutoID
1 Co001 1
2 Co002 1
3 Co003 1
4 Co004 2
5 Co005 2
6 Co006 2
'CouponUsed' table:
AutoID Coupon_AutoID
1 2
2 3
3 5
I am trying to join 3 tables together using this query below but apparently I am not getting right values for CouponIssued column:
select CouponType.AutoID, Code, Name, Count(CouponIssued.CouponType_AutoID), count(CouponUsed.Coupon_AutoID)
from (CouponType left join CouponIssued
on (CouponType.AutoID = CouponIssued.CouponType_AutoID))
left join CouponUsed
on (couponUsed.Coupon_AutoID = CouponIssued.AutoID)
group by CouponType.AutoID, code, name
order by code
The expected result should be like:
**Auto ID Code Name Issued used**
1 CouT001 SunCoupon 3 2
2 CouT002 GdFriCoupon 3 1
3 CouT003 1for1Coupon 0 0
Thanks!

SELECT t.AutoID
,t.Code
,t.Name
,count(i.CouponType_AutoID) AS issued
,count(u.Coupon_AutoID) AS used
FROM CouponType t
LEFT JOIN CouponIssued i ON i.CouponType_AutoID = t.AutoID
LEFT JOIN CouponUsed u ON u.Coupon_AutoID = i.AutoID
GROUP BY 1,2,3;
You might consider using less confusing names for your table columns. I have made very good experiences with using the same name for the same data across tables (as far as sensible).
In your example, AutoID is used for three different columns, two of which appear a second time in another table under a different name. This would still make sense if Coupon_AutoID was named CouponIssued_AutoID instead.

change count(Coupon.CouponType_AutoID) to count(CouponIssued.CouponType_AutoID) and count(Coupon.Coupon_AutoID) to count(CouponUsed.Coupon_AutoID)

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Find 'Most Similar' Items in Table by Foreign Key - sql

Related

Selecting Rows and removing duplicates based on some fields based on two fields and limit to Top Ten?

Access "Not In" query not working while only In is working correctly

Get MAX() on repeating IDs

Finding contiguous regions in a sorted MS Access query

Why does this query return "incorrect" results?

Categories

Resources