Quantifying the unique number of records using fuzzy matching - sql

I am currently inner joining a customer table using the mds.mdq.similarity function in SQL Server to fuzzy match the customer name records:
Select a.CUST_ID as a_CUST_ID
,a.CU_NAME as a_CU_NAME
,b.CUST_ID as b_CUST_ID
,b.CU_NAME as b_CU_NAME
from #tmp a
inner join #tmp b
on a.CUST_ID > b.CUST_ID
and (mds.mdq.Similarity (a.CU_NAME, b.CU_NAME, 2, 0, 0)) > 0.9
Now, running this query gives me the following sample table:
a_CUST_ID a_CU_NAME b_CUST_ID b_CU_NAME
112 abc 111 abbc
113 abc- 111 abbc
111 abbc 110 abc_
112 abc 110 abc_
114 xyz 115 xyz-
What I would like to find is a way to quantify the number of "unique" CU_NAMEs from this ("unique" being as per the mds.mdq.similarity matching logic).
In the above sample, we would say 110 ~ 111 ~ 112 ~ 113 and 114 ~ 115. Hence, there would be 2 "unique" CU_NAMEs. The expected outcome would be:
Number_of_Unique_CU_NAME
2
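One way to think about this: each matched pair is an edge in a graph, and the number of "unique" names is the number of connected groups. Below is a minimal Python sketch of that idea using a union-find structure over the sample pairs above. The function name `count_groups` and the variable names are made up for illustration. It only sees customers that appear in at least one matched pair, so customers with no fuzzy match at all would have to be counted separately.

```python
def count_groups(pairs):
    """Count connected groups among IDs linked by matched pairs (union-find)."""
    parent = {}

    def find(x):
        # Register unseen IDs as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two groups

    return len({find(x) for x in list(parent)})


# (a_CUST_ID, b_CUST_ID) pairs taken from the sample result table
pairs = [(112, 111), (113, 111), (111, 110), (112, 110), (114, 115)]
number_of_unique_cu_name = count_groups(pairs)
print(number_of_unique_cu_name)
```

The same grouping could presumably be done inside SQL Server with a recursive CTE, but pulling the matched pairs out and grouping them client-side keeps the logic easy to check.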

Related

Compare values in one table basing on previous dates

I have a problem that I cannot resolve in DAX. I have a table in report view.
The real table has other values and contains 10000+ records, but below is an example of what I would like to achieve:
(in the real table, the difference between dates does not always equal 1)
user | username | salary | date
1 | x | 123 | 14-10-2022
2 | y | 455 | 11-10-2022
3 | z | 333 | 13-10-2022
4 | t | 222 | 12-10-2022
5 | h | 111 | 10-10-2022
desired output:
user | username | salary | date | salary (date-1 day) | salary (date-3 days)
1 | x | 123 | 14-10-2022 | 333 | 455
2 | y | 455 | 11-10-2022 | 111 |
3 | z | 333 | 13-10-2022 | 222 | 111
4 | t | 222 | 12-10-2022 | 455 |
5 | h | 111 | 10-10-2022 | |
I know that one way could be a self join like
on table1.user = table2.user and table1.date = table2.date - 1 and table1.date = table2.date - 3
but is there another way to achieve the desired table without doing many joins?
Thank you very much in advance.
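To make the desired logic concrete, here is a small Python sketch (not DAX) of a lookup-based approach: build one date-to-salary map, then look up each offset, instead of joining once per offset. It assumes there is at most one row per date, which holds in the sample but may not in the real table. The function name `with_offsets` is made up for illustration.

```python
from datetime import date, timedelta

rows = [
    (1, "x", 123, date(2022, 10, 14)),
    (2, "y", 455, date(2022, 10, 11)),
    (3, "z", 333, date(2022, 10, 13)),
    (4, "t", 222, date(2022, 10, 12)),
    (5, "h", 111, date(2022, 10, 10)),
]


def with_offsets(rows, offsets=(1, 3)):
    """Append, for each row, the salary found N days earlier (None if absent)."""
    # Assumption: dates are unique, so one salary per date.
    salary_by_date = {d: salary for _, _, salary, d in rows}
    result = []
    for user, username, salary, d in rows:
        lookups = tuple(
            salary_by_date.get(d - timedelta(days=n)) for n in offsets
        )
        result.append((user, username, salary, d) + lookups)
    return result


output = with_offsets(rows)
for row in output:
    print(row)
```

In DAX the equivalent would presumably be one calculated column per offset that looks up the salary at the shifted date, which also avoids explicit joins.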

Query returns a few extra records

I have these tables and this query in an Access database:
samples
hole_id | depth_from | depth_to |
DH001 100 105
DH001 105 120
DH001 110 115
DH001 115 120
overlapping_samples (and therefore the correct output)
hole_id | depth_from | depth_to |
DH001 110 115
DH001 115 120
query
SELECT a.*
FROM samples AS a
INNER JOIN overlapping_samples AS o
ON a.hole_id=o.hole_id
WHERE a.hole_id=o.hole_id AND a.depth_to=o.depth_to
;
results
hole_id | depth_from | depth_to |
DH001 100 105
DH001 110 115
DH001 115 120
It's very simple. The result is almost OK, but it includes some extra records from the left table (i.e. samples). In fact, the example above may not necessarily return the extra row; only a small percentage of rows are affected.
In case it is not obvious, I want to return all the records from the left table that match the right table. The right table is actually a subset of the left, so the query should return the same number of records as the right table. It's intended for a DELETE statement, but
I've changed your query to:
SELECT a.hole_id as ahole_id, a.depth_from as adepth_from, a.depth_to as adepth_to,o.hole_id as ohole_id, o.depth_from as odepth_from, o.depth_to as odepth_to
FROM samples AS a
LEFT JOIN overlapping_samples AS o ON a.hole_id=o.hole_id AND a.depth_to=o.depth_to AND a.depth_from=o.depth_from
WHERE a.hole_id=o.hole_id AND a.depth_to=o.depth_to;
and it gave me this result:
ahole_id | adepth_from | adepth_to | ohole_id | odepth_from | odepth_to |
DH001 110 115 DH001 110 115
DH001 115 120 DH001 115 120
Is that what you were looking for?
This may work:
SELECT a.*
FROM samples a
JOIN overlapping_samples o ON a.hole_id = o.hole_id
WHERE a.depth_from = o.depth_from
AND a.depth_to = o.depth_to;
I fixed a problem in the WHERE clause, changing it from:
a.hole_id=o.hole_id
to:
a.depth_from = o.depth_from
The hole_id condition is already present in JOIN ... ON a.hole_id = o.hole_id, so repeating it in WHERE adds nothing.
If you still don't get the correct count, you may need to look at your data and add an extra condition in either the WHERE or the JOIN clause.
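As a quick check of the point above, here is a sketch that loads the sample rows into an in-memory SQLite database (standing in for Access, so syntax details may differ) and compares a join on hole_id and depth_to only with a join on all three columns. The variable names `loose` and `strict` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE samples (hole_id TEXT, depth_from INTEGER, depth_to INTEGER);
CREATE TABLE overlapping_samples (hole_id TEXT, depth_from INTEGER, depth_to INTEGER);
INSERT INTO samples VALUES
  ('DH001', 100, 105), ('DH001', 105, 120),
  ('DH001', 110, 115), ('DH001', 115, 120);
INSERT INTO overlapping_samples VALUES
  ('DH001', 110, 115), ('DH001', 115, 120);
""")

# Matching on hole_id and depth_to only: any sample that merely shares
# an end depth with an overlapping sample is returned as well.
loose = conn.execute("""
    SELECT a.* FROM samples AS a
    JOIN overlapping_samples AS o
      ON a.hole_id = o.hole_id AND a.depth_to = o.depth_to
    ORDER BY a.depth_from
""").fetchall()

# Matching on the full key (hole_id, depth_from, depth_to).
strict = conn.execute("""
    SELECT a.* FROM samples AS a
    JOIN overlapping_samples AS o
      ON a.hole_id = o.hole_id
     AND a.depth_from = o.depth_from
     AND a.depth_to = o.depth_to
    ORDER BY a.depth_from
""").fetchall()

print(len(loose), len(strict))
```

On this data the loose join also picks up the 105 to 120 sample because it shares depth_to = 120 with an overlapping sample, which is the kind of extra row described in the question.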

Using multiple and interdependent CROSS APPLY across multiple tables

How can I use either CROSS APPLY (or INNER JOIN) to get data from one table based on the values in other tables?
I.e. I have the following tables:
Table Descriptions:
ProdID | Description | TrackNum
361 | Test 1 | 499
388 | Test 2 | 003
004 | | 5599
238 | Test 3 | 499
361 | Test 10 | 555
004 | Test 40 | 555
Table Products:
ProdID | ProductName | Price
361 | P1 | 5.00
388 | P2 | 5.00
004 | P3 | 12.00
238 | P4 | 6.00
515 | P5 | 7.00
636 | P6 | 7.00
775 | P7 | 7.00
858 | P8 | 8.00
Table Invoices:
ProdID | TrackNum | InvoiceID
361 | 499 | 718
388 | 199 | 718
004 | 499 | 718
238 | 499 | 718
361 | 555 | 333
004 | 555 | 444
361 | 111 | 444
388 | 222 | 333
616 | 116 | 565
717 | 116 | 565
361 | 003 | 221
388 | 003 | 221
004 | 5599 | 728
What I need my query to do is:
1. Go into the Invoices table first and get only the records that match the specified InvoiceID and TrackNum.
2. Then go into the Products table and get only the rows whose ProdID matches the data pulled out in Step #1.
3. Finally, get all columns from the Descriptions table, but only for the rows from Step #2 that match on ProdID.
What I need to have at the end is something like this (if I get more columns that is fine, but I do not want to get more rows):
ProdID | Description | TrackNum
361 | Test 1 | 499
004 | | 5599
238 | Test 3 | 499
I have the following query (and I have tried using INNER JOIN and CROSS APPLY), but it returns more rows than I need:
SELECT * FROM [Descriptions] AS [DES]
CROSS APPLY
(
select * from [Invoices] AS [INV] where [INV].[TrackNum] = '499' AND [INV].[InvoiceID] = '718'
) [INV]
CROSS APPLY
(
select * from [Products] AS [GP]
WHERE [GP].[ProdID] = [INV].[ProdID]
) [GP2]
WHERE
[DES].[ProdID] = [GP2].[ProdID]
order by [DES].[ProdID] asc
SELECT
*
FROM
invoices AS i
LEFT JOIN
descriptions AS d
ON d.prodid = i.prodid
AND d.tracknum = i.tracknum -- you don't have this, but I think it's required.
LEFT JOIN
products AS p
ON p.prodid = i.prodid
WHERE
i.invoiceid = 718
AND i.tracknum = 499
ORDER BY
i.prodid
One thing that concerns me is that both the invoices and descriptions have a column named tracknum, but your query and expected data indicate that you don't want to include that in the join? That's very confusing and either a poor column name, or a mistake in your query and example results.
Based on what you describe you want the following, start with your Invoices table and a where clause to get the right rows, then join on Products and Descriptions.
I'm also guessing that you want to match the Description on TrackNum? Since it appears you have a unique Description per ProdId/TrackNum combination.
select [INV].[ProdID], [DES].[Description], [INV].[TrackNum]
from [Invoices] as [INV]
inner join [Products] as [GP] on [GP].[ProdID] = [INV].[ProdID]
inner join [Descriptions] as [DES] on [DES].[ProdID] = [GP].[ProdID] and [DES].[TrackNum] = [INV].[TrackNum]
where [INV].[TrackNum] = '499' AND [INV].[InvoiceID] = '718'
order by [DES].[ProdID] asc;
Note: You normally only use a 'CROSS APPLY' for queries where you want to run/evaluate something per row in your main table.
In this case the INNER JOIN is sufficient; you don't need to use CROSS APPLY.
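To see where the extra rows come from, here is a sketch that loads the three sample tables into an in-memory SQLite database (standing in for SQL Server, so bracket quoting is dropped) and runs the INNER JOIN twice: once matching Descriptions on ProdID only, once on ProdID and TrackNum. IDs are stored as text to keep leading zeros. The names `BASE`, `by_prod`, and `by_prod_and_track` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Descriptions (ProdID TEXT, Description TEXT, TrackNum TEXT);
CREATE TABLE Products (ProdID TEXT, ProductName TEXT, Price REAL);
CREATE TABLE Invoices (ProdID TEXT, TrackNum TEXT, InvoiceID TEXT);
INSERT INTO Descriptions VALUES
  ('361','Test 1','499'), ('388','Test 2','003'), ('004',NULL,'5599'),
  ('238','Test 3','499'), ('361','Test 10','555'), ('004','Test 40','555');
INSERT INTO Products VALUES
  ('361','P1',5.0), ('388','P2',5.0), ('004','P3',12.0), ('238','P4',6.0),
  ('515','P5',7.0), ('636','P6',7.0), ('775','P7',7.0), ('858','P8',8.0);
INSERT INTO Invoices VALUES
  ('361','499','718'), ('388','199','718'), ('004','499','718'),
  ('238','499','718'), ('361','555','333'), ('004','555','444'),
  ('361','111','444'), ('388','222','333'), ('616','116','565'),
  ('717','116','565'), ('361','003','221'), ('388','003','221'),
  ('004','5599','728');
""")

# Start from Invoices, then join Products and Descriptions.
# {extra} optionally adds the TrackNum match to the Descriptions join.
BASE = """
SELECT d.ProdID, d.Description, d.TrackNum
FROM Invoices AS i
INNER JOIN Products AS p ON p.ProdID = i.ProdID
INNER JOIN Descriptions AS d ON d.ProdID = p.ProdID {extra}
WHERE i.TrackNum = '499' AND i.InvoiceID = '718'
ORDER BY d.ProdID, d.TrackNum
"""

by_prod = conn.execute(BASE.format(extra="")).fetchall()
by_prod_and_track = conn.execute(
    BASE.format(extra="AND d.TrackNum = i.TrackNum")
).fetchall()

print(len(by_prod), by_prod_and_track)
```

On the sample data, joining on ProdID alone returns five rows because products 361 and 004 each have two descriptions. Adding the TrackNum match returns only 238 and 361; product 004 drops out because its descriptions carry TrackNum 5599 and 555, so the expected output in the question cannot be produced by either join as written.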

MS Access SQL Query unmatched rows from two tables

Let's say I have Table_A and Table_B with following rows:
Table_A:
ID PART_ID KIT_ID
---------------------
1 1 340
2 12 340
3 19 340
4 30 340
5 1 348
6 19 348
7 27 348
...
Table_B:
PART_ID REQ
-------------
1 Y
12 Y
19 Y
27 Y
30 Y
...
How do I get the following result in Table_C?
Table_C:
PART_ID KIT_ID
----------------
27 340
12 348
30 348
...
I've tried the Query Wizard with the Unmatched Rows option and for some reason cannot get any results that resemble what I need. For example, a customer orders a kit and each kit contains a bunch of parts (some required and some not); how do I find the missing parts for each kit?
Generate all combinations of kits and parts, then keep only the ones that don't exist in Table_A:
select k.kit_id, p.part_id
from (select distinct kit_id from table_a) as k, -- no cross join in MS ACCESS
table_b p
where not exists (select 1
from table_a as a
where a.kit_id = k.kit_id and a.part_id = p.part_id
);
You may need the condition REQ = "Y" in the outer where clause. I'm not sure if that is important.
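As a sanity check, the same query can be run against the sample rows in an in-memory SQLite database (standing in for Access, so minor syntax may differ). This is only a sketch; the variable name `missing` is made up, and an ORDER BY is added so the output is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (id INTEGER, part_id INTEGER, kit_id INTEGER);
CREATE TABLE table_b (part_id INTEGER, req TEXT);
INSERT INTO table_a VALUES
  (1, 1, 340), (2, 12, 340), (3, 19, 340), (4, 30, 340),
  (5, 1, 348), (6, 19, 348), (7, 27, 348);
INSERT INTO table_b VALUES (1,'Y'), (12,'Y'), (19,'Y'), (27,'Y'), (30,'Y');
""")

# Every (kit, part) combination, minus the ones already present in table_a.
missing = conn.execute("""
    SELECT p.part_id, k.kit_id
    FROM (SELECT DISTINCT kit_id FROM table_a) AS k, table_b AS p
    WHERE NOT EXISTS (SELECT 1
                      FROM table_a AS a
                      WHERE a.kit_id = k.kit_id AND a.part_id = p.part_id)
    ORDER BY k.kit_id, p.part_id
""").fetchall()

print(missing)
```

The rows come back as (PART_ID, KIT_ID) pairs to match the Table_C layout in the question.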

group by column not having specific value

I am trying to obtain a list of Case_Id's where the case does not contain a specific RoleId using Microsoft SQL Server 2012.
For example, I would like to obtain a collection of Case_Id's that do not contain a RoleId of 4.
So from the data set below the query would exclude Case_Id's 49, 50, and 53.
Id RoleId Person_Id Case_Id
--------------------------------------
108 4 108 49
109 1 109 49
110 4 110 50
111 1 111 50
112 1 112 51
113 2 113 52
114 1 114 52
115 7 115 53
116 4 116 53
117 3 117 53
So far I have tried the following:
SELECT Case_Id
FROM [dbo].[caseRole] cr
WHERE cr.RoleId!=4
GROUP BY Case_Id ORDER BY Case_Id
The NOT EXISTS operator seems to fit your need exactly:
SELECT DISTINCT Case_Id
FROM [dbo].[caseRole] cr
WHERE NOT EXISTS (SELECT *
FROM [dbo].[caseRole] cr_inner
WHERE cr_inner.Case_Id = cr.case_id
AND cr_inner.RoleId = 4);
Just use a HAVING clause instead of WHERE:
SELECT Case_Id
FROM [dbo].[caseRole] cr
GROUP BY Case_Id
HAVING SUM(case when cr.RoleId = 4 then 1 else 0 end) = 0
ORDER BY Case_Id;
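To compare the approaches on the sample data, here is a sketch using an in-memory SQLite database (standing in for SQL Server, so the [dbo] schema prefix is dropped). It runs the original WHERE attempt, the NOT EXISTS version, and the HAVING version. The variable names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE caseRole (Id INTEGER, RoleId INTEGER, Person_Id INTEGER, Case_Id INTEGER);
INSERT INTO caseRole VALUES
  (108, 4, 108, 49), (109, 1, 109, 49), (110, 4, 110, 50), (111, 1, 111, 50),
  (112, 1, 112, 51), (113, 2, 113, 52), (114, 1, 114, 52), (115, 7, 115, 53),
  (116, 4, 116, 53), (117, 3, 117, 53);
""")

# Original attempt: WHERE removes individual rows, not whole cases.
where_attempt = [r[0] for r in conn.execute("""
    SELECT Case_Id FROM caseRole
    WHERE RoleId != 4
    GROUP BY Case_Id ORDER BY Case_Id
""")]

# NOT EXISTS: keep a case only if no row of that case has RoleId 4.
not_exists = [r[0] for r in conn.execute("""
    SELECT DISTINCT Case_Id FROM caseRole AS cr
    WHERE NOT EXISTS (SELECT 1 FROM caseRole AS cr_inner
                      WHERE cr_inner.Case_Id = cr.Case_Id
                        AND cr_inner.RoleId = 4)
    ORDER BY Case_Id
""")]

# HAVING: count RoleId 4 rows per case and keep cases with zero.
having = [r[0] for r in conn.execute("""
    SELECT Case_Id FROM caseRole
    GROUP BY Case_Id
    HAVING SUM(CASE WHEN RoleId = 4 THEN 1 ELSE 0 END) = 0
    ORDER BY Case_Id
""")]

print(where_attempt, not_exists, having)
```

The WHERE attempt still returns cases 49, 50, and 53 because each of them has at least one row with a RoleId other than 4; the other two queries evaluate the condition per case rather than per row.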