Access 2016 Using Top in a Union Query - sql
I am trying to create a query that will output the top 6 results based on a union of two other queries. The two queries are as follows:
SELECT GameData.GameID,
GameData.DivisionID,
GameData.SeasonID,
GameData.HomeTeamID AS TeamID,
GameData.GameDate,
IIf([FullTimeResult] = 'H', 1, 0) AS W,
IIf([FullTimeResult] = 'D', 1, 0) AS D,
IIf([FullTimeResult] = 'A', 1, 0) AS L,
IIf([HalfTimeResult] = 'H', 1, 0) AS WHT,
IIf([HalfTimeResult] = 'D', 1, 0) AS DHT,
IIf([HalfTimeResult] = 'A', 1, 0) AS LHT,
GameData.FullTimeHomeGoals AS GS,
GameData.FullTimeAwayGoals AS GC,
IIf([FullTimeResult] = 'H', 3, IIf([FullTimeResult] = 'D', 1, 0)) AS P,
GameData.HalfTimeHomeGoals AS GSHT,
GameData.HalfTimeAwayGoals AS GCHT,
GameData.HomeShots AS Sh,
GameData.AwayShots AS ShA,
GameData.HomeShotsOnTarget AS ShOnT,
GameData.AwayShotsOnTarget AS ShAOnT,
GameData.HomeFouls AS FM,
GameData.AwayFouls AS FA,
GameData.HomeCorners AS C,
GameData.AwayCorners AS CA,
GameData.HomeYellowCards AS YC,
GameData.AwayYellowCards AS YCA,
GameData.HomeRedCards AS RC,
GameData.AwayRedCards AS RCA
FROM TeamsDivSea
INNER JOIN GameData ON TeamsDivSea.TeamID = GameData.HomeTeamID
WHERE (
(
(GameData.GameID) IN (
SELECT TOP 3 GameID
FROM GameData AS Dupe
WHERE Dupe.HomeTeamID = GameData.HomeTeamID
ORDER BY Dupe.GameDate DESC,
Dupe.GameID DESC
)
)
);
SELECT GameData.GameID,
GameData.DivisionID,
GameData.SeasonID,
GameData.AwayTeamID AS TeamID,
GameData.GameDate,
IIf([FullTimeResult] = 'A', 1, 0) AS W,
IIf([FullTimeResult] = 'D', 1, 0) AS D,
IIf([FullTimeResult] = 'H', 1, 0) AS L,
IIf([HalfTimeResult] = 'A', 1, 0) AS WHT,
IIf([HalfTimeResult] = 'D', 1, 0) AS DHT,
IIf([HalfTimeResult] = 'H', 1, 0) AS LHT,
GameData.FullTimeAwayGoals AS GS,
GameData.FullTimeHomeGoals AS GC,
IIf([FullTimeResult] = 'A', 3, IIf([FullTimeResult] = 'D', 1, 0)) AS P,
GameData.HalfTimeAwayGoals AS GSHT,
GameData.HalfTimeHomeGoals AS GCHT,
GameData.AwayShots AS Sh,
GameData.HomeShots AS ShA,
GameData.AwayShotsOnTarget AS ShOnT,
GameData.HomeShotsOnTarget AS ShAOnT,
GameData.AwayFouls AS FM,
GameData.HomeFouls AS FA,
GameData.AwayCorners AS C,
GameData.HomeCorners AS CA,
GameData.AwayYellowCards AS YC,
GameData.HomeYellowCards AS YCA,
GameData.AwayRedCards AS RC,
GameData.HomeRedCards AS RCA
FROM TeamsDivSea
INNER JOIN GameData ON TeamsDivSea.TeamID = GameData.AwayTeamID
WHERE (
(
(GameData.GameID) IN (
SELECT TOP 3 GameID
FROM GameData AS Dupe
WHERE Dupe.AwayTeamID = GameData.AwayTeamID
ORDER BY Dupe.GameDate DESC,
Dupe.GameID DESC
)
)
);
Is it possible to create a union query only using one SELECT TOP statement so only the top 6 are output from the joined results?
Many thanks
This should work - basically UNION your two queries and select the top 6.
SELECT TOP 6 GameID, DivisionID, SeasonID, TeamID, GameDate, W, D, L, WHT, DHT, LHT, GS, GC, P, GSHT, GCHT, Sh, ShA, ShOnT, ShAOnT, FM, FA, C, CA, YC, YCA, RC, RCA
FROM (
SELECT GameData.GameID,
GameData.DivisionID,
GameData.SeasonID,
GameData.HomeTeamID AS TeamID,
GameData.GameDate,
IIf([FullTimeResult] = 'H', 1, 0) AS W,
IIf([FullTimeResult] = 'D', 1, 0) AS D,
IIf([FullTimeResult] = 'A', 1, 0) AS L,
IIf([HalfTimeResult] = 'H', 1, 0) AS WHT,
IIf([HalfTimeResult] = 'D', 1, 0) AS DHT,
IIf([HalfTimeResult] = 'A', 1, 0) AS LHT,
GameData.FullTimeHomeGoals AS GS,
GameData.FullTimeAwayGoals AS GC,
IIf([FullTimeResult] = 'H', 3, IIf([FullTimeResult] = 'D', 1, 0)) AS P,
GameData.HalfTimeHomeGoals AS GSHT,
GameData.HalfTimeAwayGoals AS GCHT,
GameData.HomeShots AS Sh,
GameData.AwayShots AS ShA,
GameData.HomeShotsOnTarget AS ShOnT,
GameData.AwayShotsOnTarget AS ShAOnT,
GameData.HomeFouls AS FM,
GameData.AwayFouls AS FA,
GameData.HomeCorners AS C,
GameData.AwayCorners AS CA,
GameData.HomeYellowCards AS YC,
GameData.AwayYellowCards AS YCA,
GameData.HomeRedCards AS RC,
GameData.AwayRedCards AS RCA
FROM TeamsDivSea
INNER JOIN GameData ON TeamsDivSea.TeamID = GameData.HomeTeamID
WHERE (
(
(GameData.GameID) IN (
SELECT TOP 3 GameID
FROM GameData AS Dupe
WHERE Dupe.HomeTeamID = GameData.HomeTeamID
ORDER BY Dupe.GameDate DESC,
Dupe.GameID DESC
)
)
)
UNION ALL SELECT GameData.GameID,
GameData.DivisionID,
GameData.SeasonID,
GameData.AwayTeamID AS TeamID,
GameData.GameDate,
IIf([FullTimeResult] = 'A', 1, 0),
IIf([FullTimeResult] = 'D', 1, 0),
IIf([FullTimeResult] = 'H', 1, 0),
IIf([HalfTimeResult] = 'A', 1, 0),
IIf([HalfTimeResult] = 'D', 1, 0),
IIf([HalfTimeResult] = 'H', 1, 0),
GameData.FullTimeAwayGoals,
GameData.FullTimeHomeGoals,
IIf([FullTimeResult] = 'A', 3, IIf([FullTimeResult] = 'D', 1, 0)),
GameData.HalfTimeAwayGoals,
GameData.HalfTimeHomeGoals,
GameData.AwayShots,
GameData.HomeShots,
GameData.AwayShotsOnTarget,
GameData.HomeShotsOnTarget,
GameData.AwayFouls,
GameData.HomeFouls,
GameData.AwayCorners,
GameData.HomeCorners,
GameData.AwayYellowCards,
GameData.HomeYellowCards,
GameData.AwayRedCards,
GameData.HomeRedCards
FROM TeamsDivSea
INNER JOIN GameData ON TeamsDivSea.TeamID = GameData.AwayTeamID
WHERE (
(
(GameData.GameID) IN (
SELECT TOP 3 GameID
FROM GameData AS Dupe
WHERE Dupe.AwayTeamID = GameData.AwayTeamID
ORDER BY Dupe.GameDate DESC,
Dupe.GameID DESC
)
)
)
) AS Combined
ORDER BY GameDate DESC, GameID DESC;
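The same "union first, then take the top N" pattern can be tried out quickly outside Access. The following is only a minimal sketch, assuming a cut-down GameData table with made-up rows and a single team of interest; it runs on SQLite through Python's sqlite3 module, where LIMIT plays the role of Access's TOP and CASE replaces IIf.

```python
import sqlite3

# In-memory database with a simplified, hypothetical GameData table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE GameData (GameID INTEGER, HomeTeamID INTEGER, "
    "AwayTeamID INTEGER, GameDate TEXT, FullTimeResult TEXT)"
)
# Made-up sample data: team 1 plays at home in odd games, away in even games.
games = [
    (i, 1 if i % 2 else 2, 2 if i % 2 else 1, f"2016-01-{i:02d}", "H")
    for i in range(1, 11)
]
conn.executemany("INSERT INTO GameData VALUES (?, ?, ?, ?, ?)", games)

query = """
SELECT GameID, TeamID, GameDate, W
FROM (
    SELECT GameID, HomeTeamID AS TeamID, GameDate,
           CASE WHEN FullTimeResult = 'H' THEN 1 ELSE 0 END AS W
    FROM GameData
    UNION ALL
    SELECT GameID, AwayTeamID, GameDate,
           CASE WHEN FullTimeResult = 'A' THEN 1 ELSE 0 END
    FROM GameData
) AS Combined
WHERE TeamID = 1
ORDER BY GameDate DESC, GameID DESC
LIMIT 6  -- SQLite's counterpart of SELECT TOP 6
"""
top_six = conn.execute(query).fetchall()
for row in top_six:
    print(row)
```

The outer ORDER BY is what makes "top 6" mean "the six most recent games"; without it the rows that survive the limit are not guaranteed.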
Related
How much unique data is there, put it all in a table
I would like to query in SQL how many unique values there are in each column and how many rows there are in total. In Python I can do it as shown below, but how do I do this in SQL so that I get the result shown at the bottom?
In Python I could do the following:
import pandas as pd
d = {'sellerid': [1, 1, 1, 2, 2, 3, 3, 3],
     'modelnumber': [85, 45, 85, 12, 85, 74, 85, 12],
     'modelgroup': [2, 3, 2, 1, 2, 3, 2, 1]}
df = pd.DataFrame(data=d)
display(df.head(10))
df['Dataframe'] = 'df'
unique_sellerid = df['sellerid'].nunique()
print("unique_sellerid", unique_sellerid)
unique_modelnumber = df['modelnumber'].nunique()
print("unique_modelnumber", unique_modelnumber)
unique_modelgroup = df['modelgroup'].nunique()
print("unique_modelgroup", unique_modelgroup)
total_rows = df.shape[0]
print("total_rows", total_rows)
[OUT]
unique_sellerid 3
unique_modelnumber 4
unique_modelgroup 3
total_rows 8
I want a query that returns the same counts. Here is the dummy table:
CREATE TABLE cars (
  sellerid INT NOT NULL,
  modelnumber INT NOT NULL,
  modelgroup INT
);
INSERT INTO cars (sellerid, modelnumber, modelgroup)
VALUES (1, 85, 2), (1, 45, 3), (1, 85, 2), (2, 12, 1), (2, 85, 2), (3, 74, 3), (3, 85, 2), (3, 12, 1);
You could use the count(distinct column) aggregate function, like:
select count(distinct col1) as nunique_col1, count(distinct col2) as nunique_col2, count(1) as nb_rows
from database
Here col1 and col2 are your columns and database stands for your table name.
Also, in pandas you can apply the nunique() function to the whole dataset rather than to each column:
df.nunique()
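To check the count(distinct ...) approach against the dummy cars table from the question, here is a small sketch that loads the same rows into an in-memory SQLite database via Python's sqlite3 module. SQLite is only an assumption for the demonstration; the aggregate itself is standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cars (sellerid INT NOT NULL, modelnumber INT NOT NULL, modelgroup INT)"
)
# Same rows as the INSERT statement in the question.
conn.executemany(
    "INSERT INTO cars (sellerid, modelnumber, modelgroup) VALUES (?, ?, ?)",
    [(1, 85, 2), (1, 45, 3), (1, 85, 2), (2, 12, 1),
     (2, 85, 2), (3, 74, 3), (3, 85, 2), (3, 12, 1)],
)

counts = conn.execute(
    """
    SELECT COUNT(DISTINCT sellerid)    AS unique_sellerid,
           COUNT(DISTINCT modelnumber) AS unique_modelnumber,
           COUNT(DISTINCT modelgroup)  AS unique_modelgroup,
           COUNT(*)                    AS total_rows
    FROM cars
    """
).fetchone()
print(counts)  # (3, 4, 3, 8)
```

This matches the pandas output in the question: 3 sellers, 4 model numbers, 3 model groups, 8 rows.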
Invalid identifier error when using RIGHT JOIN inside FROM clause
I want to use, within a FROM clause, a subset of 2 tables built with a RIGHT JOIN (from that subset I want all the rows of ITV2_VEHICULOS whose ID is not in ITV2_HIST_VEHICULOS), so that the SELECT "takes" the data from there and the WHERE can filter it.
My query:
SELECT *
FROM ITV2_INSPECCIONES I,
     ITV2_HORAS_INSPECCION HI_FIN,
     ITV2_INSPECCIONES I_SIG,
     ITV2_HORAS_INSPECCION HI_SIG_INI,
     ITV2_HIST_VEHICULOS VH,
     ITV2_CATEGORIAS_VEHICULO CAT,
     ITV2_CLASIF_VEH_CONS CVC,
     ITV2_CLASIF_VEH_USO CVU,
     ( SELECT *
       FROM ITV2_HIST_VEHICULOS VH
       RIGHT JOIN ITV2_VEHICULOS V ON VH.C_VEHICULO_ID = V.C_VEHICULO_ID ) VI
WHERE I.C_TIPO_INSPECCION = 1
AND I.F_DESFAVORABLE IS NOT NULL
AND I.C_RESULTADO IN( 3, 4 )
AND I.C_VEHICULO_ID = VI.C_VEHICULO_ID
AND VI.C_CATEGORIA_ID = CAT.C_CATEGORIA_ID
AND VI.C_CLASIF_VEH_CONS_ID = CVC.C_CLASIF_VEH_CONS_ID
AND VI.C_CLASIF_VEH_USO_ID = CVU.C_CLASIF_VEH_USO_ID
-- HOURS
AND I.C_ESTACION_ID = HI_FIN.C_ESTACION_ID
AND I.C_INSPECCION_ID = HI_FIN.C_INSPECCION_ID
AND I.N_ANNO = HI_FIN.N_ANNO
AND HI_FIN.C_TIPO_HORA_ID = 6
-- NEXT INSPECTION
AND I.C_ESTACION_ID = I_SIG.C_ESTACION_ID_FASE_ANT
AND I.C_INSPECCION_ID = I_SIG.C_INSPECCION_ID_FASE_ANT
AND I.N_ANNO = I_SIG.N_ANNO_FASE_ANT
--
AND I_SIG.N_ANNO IN( 2013, 2014, 2015, 2016, 2017, 2018 )
AND I_SIG.C_ESTACION_ID IN( 3, 21, 22, 26, 28, 32, 34, 37, 41, 47, 53, 59, 60 )
AND I_SIG.F_INSPECCION >= '01/09/2015'
AND I_SIG.F_INSPECCION <= '30/09/2018'
--
AND I_SIG.F_DESFAVORABLE IS NULL
AND I_SIG.C_RESULTADO IN( 1, 2 )
-- AND HOURS
AND I_SIG.C_ESTACION_ID = HI_SIG_INI.C_ESTACION_ID
AND I_SIG.C_INSPECCION_ID = HI_SIG_INI.C_INSPECCION_ID
AND I_SIG.N_ANNO = HI_SIG_INI.N_ANNO
AND HI_SIG_INI.C_TIPO_HORA_ID = 1
--GROUP BY...
I expect the following columns in the output:
C_ESTACION_ID (from I) | C_VEHICULO_ID (from I) | C_TIPO_HORA_ID (from HI_FIN) | F_HORA (from I_FIN) | A_MATRICULA (from V) | F_CAMBIO (from VH, if history data for V exists)
This is what your query would look like if you use "explicit join syntax" instead of just some commas between table names:
SELECT *
FROM ITV2_INSPECCIONES I
INNER JOIN ITV2_HORAS_INSPECCION HI_FIN ON I.C_ESTACION_ID = HI_FIN.C_ESTACION_ID
    AND I.C_INSPECCION_ID = HI_FIN.C_INSPECCION_ID
    AND I.N_ANNO = HI_FIN.N_ANNO
INNER JOIN ITV2_INSPECCIONES I_SIG ON I.C_ESTACION_ID = I_SIG.C_ESTACION_ID_FASE_ANT
    AND I.C_INSPECCION_ID = I_SIG.C_INSPECCION_ID_FASE_ANT
    AND I.N_ANNO = I_SIG.N_ANNO_FASE_ANT
INNER JOIN ITV2_HORAS_INSPECCION HI_SIG_INI ON I_SIG.C_ESTACION_ID = HI_SIG_INI.C_ESTACION_ID
    AND I_SIG.C_INSPECCION_ID = HI_SIG_INI.C_INSPECCION_ID
    AND I_SIG.N_ANNO = HI_SIG_INI.N_ANNO
WHERE I.C_TIPO_INSPECCION = 1
AND I.F_DESFAVORABLE IS NOT NULL
AND I.C_RESULTADO IN (3, 4)
AND HI_FIN.C_TIPO_HORA_ID = 6
-- NEXT INSPECTION
AND HI_SIG_INI.C_TIPO_HORA_ID = 1
AND I_SIG.F_INSPECCION >= '01/09/2015'
AND I_SIG.F_INSPECCION <= '30/09/2018'
AND I_SIG.F_DESFAVORABLE IS NULL
AND I_SIG.N_ANNO IN (2013, 2014, 2015, 2016, 2017, 2018)
AND I_SIG.C_ESTACION_ID IN (3, 21, 22, 26, 28, 32, 34, 37, 41, 47, 53, 59, 60)
AND I_SIG.C_RESULTADO IN (1, 2)
-- AND HOURS
Now, I had to pull several tables and the subquery out of that because, frankly, they don't make much sense to me:
ITV2_HIST_VEHICULOS VH, << no join conditions to preceding tables
ITV2_CATEGORIAS_VEHICULO CAT, << no join conditions to preceding tables
ITV2_CLASIF_VEH_CONS CVC, << no join conditions to preceding tables
ITV2_CLASIF_VEH_USO CVU, << no join conditions to preceding tables
( SELECT *
  FROM ITV2_VEHICULOS V
  LEFT JOIN ITV2_HIST_VEHICULOS VH ON VH.C_VEHICULO_ID = V.C_VEHICULO_ID ) VI
AND I.C_VEHICULO_ID = VI.C_VEHICULO_ID
AND VI.C_CATEGORIA_ID = CAT.C_CATEGORIA_ID
AND VI.C_CLASIF_VEH_CONS_ID = CVC.C_CLASIF_VEH_CONS_ID
AND VI.C_CLASIF_VEH_USO_ID = CVU.C_CLASIF_VEH_USO_ID
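The LEFT JOIN form of the vehicle subquery can be illustrated in isolation. This is a minimal sketch, assuming two tiny stand-in tables (VEHICULOS and HIST_VEHICULOS, with made-up rows) in SQLite via Python's sqlite3 module rather than the real Oracle schema. It also lists the columns explicitly instead of using SELECT *, so that the shared C_VEHICULO_ID column appears only once in the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, cut-down versions of ITV2_VEHICULOS and ITV2_HIST_VEHICULOS.
conn.executescript(
    """
    CREATE TABLE VEHICULOS (C_VEHICULO_ID INTEGER, A_MATRICULA TEXT);
    CREATE TABLE HIST_VEHICULOS (C_VEHICULO_ID INTEGER, F_CAMBIO TEXT);
    INSERT INTO VEHICULOS VALUES (1, 'AAA111'), (2, 'BBB222');
    INSERT INTO HIST_VEHICULOS VALUES (1, '2016-05-01');
    """
)

# Every vehicle is kept; F_CAMBIO is NULL (None) when no history row exists.
vehicles = conn.execute(
    """
    SELECT V.C_VEHICULO_ID, V.A_MATRICULA, VH.F_CAMBIO
    FROM VEHICULOS V
    LEFT JOIN HIST_VEHICULOS VH ON VH.C_VEHICULO_ID = V.C_VEHICULO_ID
    ORDER BY V.C_VEHICULO_ID
    """
).fetchall()
print(vehicles)
```

Vehicle 2 has no history row, so it still appears, with None in the F_CAMBIO position.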
Pandas apply on index of dataframe with a multi-index
I have a dataframe with an index like
MultiIndex(levels=[['A', 'B', 'C', 'D', 'E', 'F', 'G'], [0, 1]],
           labels=[[0, 0, 1, 1, 2, 3, 3, 4, 4, 5, 5, 6, 6], [0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1]],
           names=['I1', 'I2'])
Now I would like to apply a function to index level I1. If it were a simple column I would do something like
df['I1'] = df['I1'].apply(lambda x: ...)
How can I apply this to an index level in a df with a multi-index?
I believe you need rename:
df = df.rename(lambda x: 'a' + x, level=0)
Or use Index.get_level_values to select a level of the MultiIndex, map over it, and then create a new index with MultiIndex.from_arrays:
idx = df.index.get_level_values('I1').map(lambda x: 'a' + x)
df.index = pd.MultiIndex.from_arrays([idx, df.index.get_level_values('I2')])
This is needed because with set_levels I get:
df.index = df.index.set_levels(idx, level='I1')
ValueError: Level values must be unique: ['aA', 'aA', 'aB', 'aB', 'aC', 'aD', 'aD', 'aE', 'aE', 'aF', 'aF', 'aG', 'aG'] on level 0
Sample (in pandas 0.24 and later the labels argument is named codes):
mux = pd.MultiIndex(levels=[['A', 'B', 'C', 'D', 'E', 'F', 'G'], [0, 1]],
                    labels=[[0, 0, 1, 1, 2, 3, 3, 4, 4, 5, 5, 6, 6], [0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1]],
                    names=['I1', 'I2'])
df = pd.DataFrame([0] * 13, index=mux, columns=['a'])
df = df.rename(lambda x: 'a' + x, level=0)
print(df)
       a
I1 I2
aA 0   0
   1   0
aB 0   0
   1   0
aC 1   0
aD 0   0
   1   0
aE 0   0
   1   0
aF 0   0
   1   0
aG 0   0
   1   0
Converting formula from Crystal Reports to SSRS
I'll try to keep this as short as possible: I'm trying to convert a formula cell from a Crystal Reports report to SSRS. Here is the query:
SELECT
    (SELECT START_DATE FROM APPS.GL_PERIODS WHERE PERIOD_TYPE = 'Month' AND TRUNC(SYSDATE-:Days) BETWEEN START_DATE AND END_DATE) STR_DATE,
    (SELECT END_DATE FROM APPS.GL_PERIODS WHERE PERIOD_TYPE = 'Month' AND TRUNC(SYSDATE-:Days) BETWEEN START_DATE AND END_DATE) END_DATE,
    DECODE(RT.ORGANIZATION_ID, 104, 'LPD', RT.ORGANIZATION_ID) ORG,
    SUBSTR(POV.VENDOR_NAME, 1, 24) VENDOR_NAME,
    DECODE(SUBSTR(PHA.SEGMENT1, 2, 1), 'E', 'EXPENSE', 'e', 'EXPENSE', 'P', 'PRODUCT', 'p', 'PRODUCT', ' OTHER') PO_TYPE,
    DECODE(SIGN(TRUNC(RT.TRANSACTION_DATE) - TRUNC(NVL(PLL.PROMISED_DATE - 3, PLL.NEED_BY_DATE - 3))), -1, 'LATE', 'ON TIME') PERFORMANCE,
    COUNT(*) LINE_COUNT
FROM APPS.RCV_TRANSACTIONS RT,
     APPS.PO_HEADERS_ALL PHA,
     APPS.PO_LINES_ALL PLA,
     APPS.PO_LINE_LOCATIONS_ALL PLL,
     APPS.PO_VENDORS POV
WHERE RT.ORGANIZATION_ID = 104
AND RT.TRANSACTION_DATE >= (SELECT START_DATE FROM APPS.GL_PERIODS WHERE PERIOD_TYPE = 'Month' AND TRUNC(SYSDATE-:Days) BETWEEN START_DATE AND END_DATE)
AND RT.TRANSACTION_DATE < (SELECT END_DATE + 1 FROM APPS.GL_PERIODS WHERE PERIOD_TYPE = 'Month' AND TRUNC(SYSDATE-:Days) BETWEEN START_DATE AND END_DATE)
AND RT.TRANSACTION_TYPE = 'RECEIVE'
AND RT.PO_HEADER_ID = PLL.PO_HEADER_ID
AND RT.PO_LINE_LOCATION_ID = PLL.LINE_LOCATION_ID
AND RT.PO_LINE_ID = PLL.PO_LINE_ID
AND RT.ORGANIZATION_ID = PLL.SHIP_TO_ORGANIZATION_ID
AND PLA.PO_LINE_ID = PLL.PO_LINE_ID
AND PLA.PO_HEADER_ID = PLL.PO_HEADER_ID
AND PHA.PO_HEADER_ID = PLA.PO_HEADER_ID
AND PHA.VENDOR_ID = POV.VENDOR_ID
GROUP BY DECODE(RT.ORGANIZATION_ID, 104, 'LPD', RT.ORGANIZATION_ID),
    SUBSTR(POV.VENDOR_NAME, 1, 24),
    DECODE(SUBSTR(PHA.SEGMENT1, 2, 1), 'E', 'EXPENSE', 'e', 'EXPENSE', 'P', 'PRODUCT', 'p', 'PRODUCT', ' OTHER'),
    DECODE(SIGN(TRUNC(RT.TRANSACTION_DATE) - TRUNC(NVL(PLL.PROMISED_DATE - 3, PLL.NEED_BY_DATE - 3))), -1, 'LATE', 'ON TIME')
ORDER BY ORG, VENDOR_NAME, PO_TYPE, PERFORMANCE
In Crystal the formula is
SUM({query.LINE_COUNT},{query.PERFORMANCE}) % SUM({query.LINE_COUNT}, {query.PO_TYPE})
This cell just calculates the percentage of on-time deliveries and late ones.
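To make the arithmetic behind the Crystal formula concrete, here is a small sketch in plain Python. It assumes that the PERFORMANCE group is nested inside the PO_TYPE group, so the formula divides the LINE_COUNT total of each PERFORMANCE group by the total of its enclosing PO_TYPE group and expresses the result as a percentage. The rows are made-up sample data, not output of the query above.

```python
from collections import defaultdict

# Made-up (PO_TYPE, PERFORMANCE, LINE_COUNT) rows standing in for the query result.
rows = [
    ("PRODUCT", "ON TIME", 30),
    ("PRODUCT", "LATE", 10),
    ("EXPENSE", "ON TIME", 5),
    ("EXPENSE", "LATE", 15),
]

totals_by_type = defaultdict(int)        # SUM(LINE_COUNT) per PO_TYPE group
totals_by_type_perf = defaultdict(int)   # SUM(LINE_COUNT) per PERFORMANCE group within a PO_TYPE

for po_type, performance, line_count in rows:
    totals_by_type[po_type] += line_count
    totals_by_type_perf[(po_type, performance)] += line_count

# Inner group total as a percentage of the outer group total.
percentages = {
    key: round(100.0 * count / totals_by_type[key[0]], 2)
    for key, count in totals_by_type_perf.items()
}
for key, pct in sorted(percentages.items()):
    print(key, pct)
```

Any SSRS expression that replaces the Crystal formula needs the same two ingredients: a sum scoped to the inner group and a sum scoped to the outer group.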
SQL Server Fuzzy Search with Percentage of match
I am using SQL Server 2008 R2 SP1. I have a table with about 36034 records of customers. I am trying to implement fuzzy search on the Customer Name field. Here is the function for fuzzy search:
ALTER FUNCTION [Party].[FuzySearch]
    (
      @Reference VARCHAR(200) ,
      @Target VARCHAR(200)
    )
RETURNS DECIMAL(5, 2)
    WITH SCHEMABINDING
AS
    BEGIN
        DECLARE @score DECIMAL(5, 2)
        SELECT @score = CASE
            WHEN @Reference = @Target THEN CAST(100 AS NUMERIC(5, 2))
            WHEN @Reference IS NULL OR @Target IS NULL THEN CAST(0 AS NUMERIC(5, 2))
            ELSE ( SELECT [Score %] = CAST(SUM(LetterScore) * 100.0 / MAX(WordLength * WordLength) AS NUMERIC(5, 2))
                   FROM ( -- do
                          SELECT seq = t1.n ,
                                 ref.Letter ,
                                 v.WordLength ,
                                 LetterScore = v.WordLength - ISNULL(MIN(tgt.n), v.WordLength)
                          FROM ( -- v
                                 SELECT Reference = LEFT(@Reference + REPLICATE('_', WordLength), WordLength) ,
                                        Target = LEFT(@Target + REPLICATE('_', WordLength), WordLength) ,
                                        WordLength = WordLength
                                 FROM ( -- di
                                        SELECT WordLength = MAX(WordLength)
                                        FROM ( VALUES ( DATALENGTH(@Reference)), ( DATALENGTH(@Target)) ) d ( WordLength )
                                      ) di
                               ) v
                          CROSS APPLY ( -- t1
                                 SELECT TOP ( WordLength ) n
                                 FROM ( VALUES
                                        (1), (2), (3), (4), (5), (6), (7), (8), (9), (10), (11), (12), (13), (14), (15), (16), (17), (18), (19), (20),
                                        (21), (22), (23), (24), (25), (26), (27), (28), (29), (30), (31), (32), (33), (34), (35), (36), (37), (38), (39), (40),
                                        (41), (42), (43), (44), (45), (46), (47), (48), (49), (50), (51), (52), (53), (54), (55), (56), (57), (58), (59), (60),
                                        (61), (62), (63), (64), (65), (66), (67), (68), (69), (70), (71), (72), (73), (74), (75), (76), (77), (78), (79), (80),
                                        (81), (82), (83), (84), (85), (86), (87), (88), (89), (90), (91), (92), (93), (94), (95), (96), (97), (98), (99), (100),
                                        (101), (102), (103), (104), (105), (106), (107), (108), (109), (110), (111), (112), (113), (114), (115), (116), (117), (118), (119), (120),
                                        (121), (122), (123), (124), (125), (126), (127), (128), (129), (130), (131), (132), (133), (134), (135), (136), (137), (138), (139), (140),
                                        (141), (142), (143), (144), (145), (146), (147), (148), (149), (150), (151), (152), (153), (154), (155), (156), (157), (158), (159), (160),
                                        (161), (162), (163), (164), (165), (166), (167), (168), (169), (170), (171), (172), (173), (174), (175), (176), (177), (178), (179), (180),
                                        (181), (182), (183), (184), (185), (186), (187), (188), (189), (190), (191), (192), (193), (194), (195), (196), (197), (198), (199), (200)
                                      ) t2 ( n )
                               ) t1
                          CROSS APPLY ( SELECT Letter = SUBSTRING(Reference, t1.n, 1) ) ref
                          OUTER APPLY ( -- tgt
                                 SELECT TOP ( WordLength ) n = ABS(t1.n - t2.n)
                                 FROM ( VALUES
                                        (1), (2), (3), (4), (5), (6), (7), (8), (9), (10), (11), (12), (13), (14), (15), (16), (17), (18), (19), (20),
                                        (21), (22), (23), (24), (25), (26), (27), (28), (29), (30), (31), (32), (33), (34), (35), (36), (37), (38), (39), (40),
                                        (41), (42), (43), (44), (45), (46), (47), (48), (49), (50), (51), (52), (53), (54), (55), (56), (57), (58), (59), (60),
                                        (61), (62), (63), (64), (65), (66), (67), (68), (69), (70), (71), (72), (73), (74), (75), (76), (77), (78), (79), (80),
                                        (81), (82), (83), (84), (85), (86), (87), (88), (89), (90), (91), (92), (93), (94), (95), (96), (97), (98), (99), (100),
                                        (101), (102), (103), (104), (105), (106), (107), (108), (109), (110), (111), (112), (113), (114), (115), (116), (117), (118), (119), (120),
                                        (121), (122), (123), (124), (125), (126), (127), (128), (129), (130), (131), (132), (133), (134), (135), (136), (137), (138), (139), (140),
                                        (141), (142), (143), (144), (145), (146), (147), (148), (149), (150), (151), (152), (153), (154), (155), (156), (157), (158), (159), (160),
                                        (161), (162), (163), (164), (165), (166), (167), (168), (169), (170), (171), (172), (173), (174), (175), (176), (177), (178), (179), (180),
                                        (181), (182), (183), (184), (185), (186), (187), (188), (189), (190), (191), (192), (193), (194), (195), (196), (197), (198), (199), (200)
                                      ) t2 ( n )
                                 WHERE SUBSTRING(@Target, t2.n, 1) = ref.Letter
                               ) tgt
                          GROUP BY t1.n ,
                                   ref.Letter ,
                                   v.WordLength
                        ) do
                 )
            END
        RETURN @score
    END
Here is the query to call the function:
select [Party].[FuzySearch]('First Name Middle Name Last Name', C.FirstName) from dbo.Customer C
This takes about 2 minutes 22 seconds to give me the percentage of fuzzy match for all customers. How can I get this to run in less than a second? Any suggestions to make my function more robust are welcome. Expected output is 45.34, 40.00, 100.00, 23.00, 81.23.....
The best I have been able to do is simplify some of the query and change it to a table-valued function. Scalar functions are notoriously poor performers, and the benefit of an inline TVF is that the query definition is expanded out into the main query, much like a view. This reduced the execution time significantly in the tests I have done.
ALTER FUNCTION dbo.FuzySearchTVF (@Reference VARCHAR(200), @Target VARCHAR(200))
RETURNS TABLE
AS
RETURN
(   WITH N (n) AS
    (   SELECT TOP (ISNULL(CASE WHEN DATALENGTH(@Reference) > DATALENGTH(@Target)
                                THEN DATALENGTH(@Reference)
                                ELSE DATALENGTH(@Target)
                           END, 0))
               ROW_NUMBER() OVER(ORDER BY n1.n)
        FROM (VALUES (1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) AS N1 (n)
        CROSS JOIN (VALUES (1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) AS N2 (n)
        CROSS JOIN (VALUES (1), (1)) AS N3 (n)
        WHERE @Reference IS NOT NULL AND @Target IS NOT NULL
    ), Src AS
    (   SELECT Reference = CASE WHEN DATALENGTH(@Reference) > DATALENGTH(@Target) THEN @Reference
                                ELSE @Reference + REPLICATE('_', DATALENGTH(@Target) - DATALENGTH(@Reference))
                           END,
               Target = CASE WHEN DATALENGTH(@Target) > DATALENGTH(@Reference) THEN @Target
                             ELSE @Target + REPLICATE('_', DATALENGTH(@Reference) - DATALENGTH(@Target))
                        END,
               WordLength = CASE WHEN DATALENGTH(@Reference) > DATALENGTH(@Target) THEN DATALENGTH(@Reference)
                                 ELSE DATALENGTH(@Target)
                            END
        WHERE @Reference IS NOT NULL
        AND @Target IS NOT NULL
        AND @Reference != @Target
    ), Scores AS
    (   SELECT seq = t1.n ,
               Letter = SUBSTRING(s.Reference, t1.n, 1),
               s.WordLength ,
               LetterScore = s.WordLength - ISNULL(MIN(ABS(t1.n - t2.n)), s.WordLength)
        FROM Src AS s
        CROSS JOIN N AS t1
        INNER JOIN N AS t2 ON SUBSTRING(@Target, t2.n, 1) = SUBSTRING(s.Reference, t1.n, 1)
        WHERE @Reference IS NOT NULL
        AND @Target IS NOT NULL
        AND @Reference != @Target
        GROUP BY t1.n, SUBSTRING(s.Reference, t1.n, 1), s.WordLength
    )
    SELECT [Score] = 100
    WHERE @Reference = @Target
    UNION ALL
    SELECT 0
    WHERE @Reference IS NULL OR @Target IS NULL
    UNION ALL
    SELECT CAST(SUM(LetterScore) * 100.0 / MAX(WordLength * WordLength) AS NUMERIC(5, 2))
    FROM Scores
    WHERE @Reference IS NOT NULL
    AND @Target IS NOT NULL
    AND @Reference != @Target
    GROUP BY WordLength
);
And this would be called as:
SELECT f.Score
FROM dbo.Customer AS c
CROSS APPLY [dbo].[FuzySearchTVF]('First Name Middle Name Last Name', c.FirstName) AS f
It is still a fairly complex function though, and, depending on the number of records in your customer table, I think getting it down to 1 second is going to be a bit of a challenge.
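For readers who want to see what the scoring in these functions computes without reading the nested SQL, here is a rough Python sketch of the same idea. It is an interpretation, not a port: the name fuzzy_score is hypothetical, and it assumes exact, case-sensitive character comparison, whereas SQL Server's comparison depends on the column collation. Each reference position scores the word length minus the distance to the nearest matching character in the target (0 if there is none), and the total is divided by the squared word length.

```python
def fuzzy_score(reference, target):
    """Positional letter-match score in percent (hypothetical helper name)."""
    if reference is None or target is None:
        return 0.0
    if reference == target:
        return 100.0
    word_length = max(len(reference), len(target))
    # Pad the reference with '_' up to the longer length, as the SQL does.
    padded_reference = reference.ljust(word_length, "_")
    total = 0
    for i, letter in enumerate(padded_reference):
        # Distances from position i to every occurrence of this letter in the target.
        distances = [abs(i - j) for j, ch in enumerate(target) if ch == letter]
        if distances:
            total += word_length - min(distances)
    return round(total * 100.0 / (word_length * word_length), 2)


print(fuzzy_score("abc", "abd"))
print(fuzzy_score("ab", "ba"))
```

The nested loop makes the quadratic cost per pair of strings visible, which is one reason the per-row function call is slow over tens of thousands of customers.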
This is how I could accomplish this. It is explained further in "SQL Server Fuzzy Search - Levenshtein Algorithm".
Create the file below using any editor of your choice:
using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

public partial class StoredFunctions
{
    [Microsoft.SqlServer.Server.SqlFunction(IsDeterministic = true, IsPrecise = false)]
    public static SqlDouble Levenshtein(SqlString stringOne, SqlString stringTwo)
    {
        #region Handle for Null value
        if (stringOne.IsNull) stringOne = new SqlString("");
        if (stringTwo.IsNull) stringTwo = new SqlString("");
        #endregion

        #region Convert to Uppercase
        string strOneUppercase = stringOne.Value.ToUpper();
        string strTwoUppercase = stringTwo.Value.ToUpper();
        #endregion

        #region Quick Check and quick match score
        int strOneLength = strOneUppercase.Length;
        int strTwoLength = strTwoUppercase.Length;
        int[,] dimention = new int[strOneLength + 1, strTwoLength + 1];
        int matchCost = 0;
        if (strOneLength + strTwoLength == 0) { return 100; }
        else if (strOneLength == 0) { return 0; }
        else if (strTwoLength == 0) { return 0; }
        #endregion

        #region Levenshtein Formula
        for (int i = 0; i <= strOneLength; i++) dimention[i, 0] = i;
        for (int j = 0; j <= strTwoLength; j++) dimention[0, j] = j;
        for (int i = 1; i <= strOneLength; i++)
        {
            for (int j = 1; j <= strTwoLength; j++)
            {
                if (strOneUppercase[i - 1] == strTwoUppercase[j - 1]) matchCost = 0;
                else matchCost = 1;
                dimention[i, j] = System.Math.Min(System.Math.Min(dimention[i - 1, j] + 1, dimention[i, j - 1] + 1), dimention[i - 1, j - 1] + matchCost);
            }
        }
        #endregion

        // Calculate Percentage of match
        double percentage = System.Math.Round((1.0 - ((double)dimention[strOneLength, strTwoLength] / (double)System.Math.Max(strOneLength, strTwoLength))) * 100.0, 2);
        return percentage;
    }
};
Name it levenshtein.cs.
Go to the Command Prompt, change to the directory that contains levenshtein.cs, then call
csc.exe /t:library /out:UserFunctions.dll levenshtein.cs
You may have to give the full path of csc.exe from .NET Framework 2.0.
Once your DLL is ready, add it to the assemblies: Database >> Programmability >> Assemblies >> New Assembly.
Create the function in your database:
CREATE FUNCTION dbo.LevenshteinSVF
    (
      @S1 NVARCHAR(200) ,
      @S2 NVARCHAR(200)
    )
RETURNS FLOAT
AS EXTERNAL NAME UserFunctions.StoredFunctions.Levenshtein
GO
In my case I had to enable CLR:
sp_configure 'clr enabled', 1
GO
reconfigure
GO
Test the function:
SELECT dbo.LevenshteinSVF('James','James Bond')
Result: 50 % match
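The percentage logic of the CLR function can be checked outside SQL Server with a short Python sketch. The function name levenshtein_percentage is hypothetical, and the sketch keeps only two rows of the distance table instead of the full matrix used in the C# code, but it follows the same steps: treat NULL as empty, uppercase both strings, compute the edit distance, and scale by the longer length.

```python
def levenshtein_percentage(string_one, string_two):
    """Match percentage based on Levenshtein distance (hypothetical helper name)."""
    one = (string_one or "").upper()   # None is treated like an empty string
    two = (string_two or "").upper()
    if not one and not two:
        return 100.0
    if not one or not two:
        return 0.0

    # previous[j] holds the distance between one[:i-1] and two[:j].
    previous = list(range(len(two) + 1))
    for i, ch_one in enumerate(one, start=1):
        current = [i]
        for j, ch_two in enumerate(two, start=1):
            cost = 0 if ch_one == ch_two else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current

    distance = previous[-1]
    return round((1.0 - distance / max(len(one), len(two))) * 100.0, 2)


print(levenshtein_percentage("James", "James Bond"))  # 50.0
```

"James" and "James Bond" differ by five insertions over a maximum length of ten, which gives the 50 % match shown above.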