Fuzzy matching a List in SQL - sql

I have a set of data that is riddled with duplicates. The company names are written either as their workplace name, e.g. Amazon, or as their legal name, e.g. Amazon.com Inc. Both entries have information I need.
The issue is that I am running a subquery to generate the correct list of companies to search for, but the LIKE operator only seems to work with a single fixed value, not with a list returned by a subquery.
FROM CRM.organizations
WHERE name LIKE (SELECT org_name FROM CRM.deals WHERE UUID IS NOT NULL AND status = 'won')
The code above returns the following error: 'Error: Scalar subquery produced more than one element'
I am trying to understand whether there is a function that can help, or whether I will need to create the list manually with: 'companyAinc';'companyBllc';....

The LIKE operator doesn't support passing a list of values to match against directly. You can use CROSS APPLY to map each value to a fuzzy match in your statement.
You can refer to this example, which uses multiple clauses with the LIKE operator.
Alternatively, you can try user-defined functions/routines, in which you combine all the returned values with the LIKE and OR operators and return the required query as a string.

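-- Exact matches (the select list is not shown in the original; * is a placeholder):
SELECT * FROM CRM.organizations
WHERE name IN (SELECT org_name FROM CRM.deals WHERE UUID IS NOT NULL AND status = 'won');
-- Pattern matches (org_name must contain wildcards, otherwise LIKE acts like =):
SELECT * FROM CRM.organizations
WHERE EXISTS (SELECT 1 FROM CRM.deals WHERE UUID IS NOT NULL AND status = 'won' AND organizations.name LIKE deals.org_name);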
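The EXISTS approach only becomes a fuzzy match once each deal name is wrapped in wildcards. The following is a minimal sketch of that idea, run against an in-memory SQLite database through Python's sqlite3 module as a stand-in for the real CRM tables. The table contents are invented for illustration, and the concatenation syntax may differ in your engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE organizations (name TEXT);
    CREATE TABLE deals (org_name TEXT, uuid TEXT, status TEXT);
    -- Hypothetical sample rows, not taken from the original data.
    INSERT INTO organizations VALUES ('Amazon'), ('Amazon.com Inc'), ('Initech LLC');
    INSERT INTO deals VALUES ('Amazon', 'u-1', 'won'), ('Initech', NULL, 'won');
""")

# Wrap each deal name in % wildcards so LIKE performs a substring match.
# Without wildcards, LIKE only matches the whole string.
rows = conn.execute("""
    SELECT o.name
    FROM organizations AS o
    WHERE EXISTS (
        SELECT 1
        FROM deals AS d
        WHERE d.uuid IS NOT NULL
          AND d.status = 'won'
          AND o.name LIKE '%' || d.org_name || '%'
    )
    ORDER BY o.name
""").fetchall()

matched_names = [name for (name,) in rows]
print(matched_names)
```

Both spellings of Amazon are returned because each contains the deal name, while Initech is skipped because its deal has no UUID. Substring matching can also produce false positives for short names, so it is worth reviewing the matches before relying on them.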

Related

SQL Query to get all the data from tables when there is no parameter

I am trying to get all of the data when there are no filters selected. I have made an array that contains the selections. If there are no selections, the array will contain just '', i.e. an empty string rather than null.
SELECT * FROM Skills WHERE person IN ('Technology', 'Drilling');
For example, this query returns the required, filtered data, because my array contains Technology and Drilling. If the user selects nothing as a filter, the query would look like:
SELECT * FROM Skills WHERE person IN ('');
In this case the table is returning nothing in SQL Server. I want it to return everything from the table without any filters.
I would really like to get some help here and maybe some resources that might help me achieve the required thing.
The array is being filled in javascript.
It seems really strange to have a column called person compared to values like "Drilling". But you would do something like:
SELECT *
FROM Skills
WHERE person IN (<whatever>) OR <whatever> = '';
Often NULL is used to mean everything, so that would be:
WHERE person IN (<whatever>) OR <whatever> IS NULL;
And "whatever" might be a delimited string, so this might look like:
-- ',' is an assumed delimiter; string_split requires one.
WHERE person IN (SELECT s.value FROM string_split(@params, ',') s) OR
@params IS NULL;
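Here is a small runnable sketch of the same "empty filter means everything" idea, using Python's sqlite3 module with an in-memory table. SQLite has no string_split, so this version binds one placeholder per selected value instead. The sample rows and the find_skills helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Skills (person TEXT);
    -- Hypothetical sample rows for illustration.
    INSERT INTO Skills VALUES ('Technology'), ('Drilling'), ('Geology');
""")

def find_skills(selections):
    """Return matching rows, or every row when nothing is selected."""
    # Drop the empty-string entry that stands for "nothing selected".
    wanted = [s for s in selections if s != ""]
    # One bound placeholder per value; fall back to NULL so the IN list
    # is never empty (an empty IN list is not valid in most engines).
    placeholders = ", ".join("?" for _ in wanted) or "NULL"
    sql = f"""
        SELECT person FROM Skills
        WHERE ? = 0 OR person IN ({placeholders})
        ORDER BY person
    """
    # The first parameter is the number of selections: 0 disables the filter.
    rows = conn.execute(sql, [len(wanted), *wanted]).fetchall()
    return [person for (person,) in rows]

print(find_skills([""]))
print(find_skills(["Technology", "Drilling"]))
```

Only the placeholders are generated dynamically. The values themselves are always bound as parameters, which keeps user input out of the SQL text.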

SQL Parameter to Include All on ID Column

I'm just taking a look at the following query
select * from tablename
where id like '%%';
so that it can handle parameters to include all of the data, or filtered data like below:
select * from tablename
where id like '%1%';
This is fine for most parameters I use, but it seems wrong for an ID because it will return all rows whose ID contains 1, which I don't want.
To get around this I could append the where clause only if the ID is given, but that seems like a pain in the butt.
Is it possible to use a different type of where clause, so that a wildcard can be used in a where-equals clause instead of a where-like clause? For example:
select * from tablename
where id = '*';
So that the same query can be used to return all or filtered data? Pass parameter '*' for all or parameter '1' for ID 1 specifically
(I'm not sure if it matters for this case but I'm using PostgreSQL 9.6.12 in this example)
This would often be expressed as:
where (id = :id or :id is null)
null is the "magic" value that represents all rows.
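As a rough illustration, the pattern can be tried out with Python's sqlite3 module, which accepts the same :id named-parameter style. The table contents and the fetch_ids helper are made up for this sketch. Other drivers may need an explicit type cast on the parameter when it is null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tablename (id INTEGER PRIMARY KEY, label TEXT);
    -- Hypothetical rows: note that 11 and 21 both contain the digit 1.
    INSERT INTO tablename VALUES (1, 'one'), (11, 'eleven'), (21, 'twenty-one');
""")

# The same statement serves both cases: a concrete id filters,
# and NULL (None in Python) switches the filter off.
QUERY = "SELECT id FROM tablename WHERE (id = :id OR :id IS NULL) ORDER BY id"

def fetch_ids(id_filter=None):
    rows = conn.execute(QUERY, {"id": id_filter}).fetchall()
    return [row[0] for row in rows]

print(fetch_ids())   # no filter: every row
print(fetch_ids(1))  # exact match only, unlike LIKE '%1%'
```

Because the comparison uses equality, passing 1 returns only the row with ID 1 and not 11 or 21.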

SQL inner join list split for an SSRS report

I have a field that lists all language descriptions that a product has and the field can contain data like:
EN;FR;DE
It will always be a two-letter language code followed by a semicolon.
I then have a stored procedure that looks for all products with a particular language code. Simply done by:
WHERE
ext.languages LIKE '%' + @language + '%'
The @language variable might just represent the letters EN, for example. When I want to find a product that has both French and English, I need to pass in 'FR, EN' for the language variable. I have a custom function in SQL that splits the language variable into rows, so I effectively have:
Row 1-EN
Row 2-FR
Now I need to check my ext.language field to see if both those values exist.
I have attempted to do:
INNER JOIN MyFunctionsDatabase.dbo.listSplit(@language) as var1
ON ext.language LIKE '%'+var1.StringLiteral+'%'
This only brings back products that contain either French or English. I need it to bring back products that contain both English and French.
Any advice would be greatly appreciated:
Try the script below. I wrote it for 3 languages, but it can be made generic.
Declare @Product AS Table(ProductID INT, [Language] Varchar(500))
Insert Into @Product Values(1,'EN;FR;DE'),(2,'EN'),(3,'EN;DE'),(4,'EN;FR')
SELECT * FROM
(
Select P.ProductID,L.Value
From @Product P
CROSS APPLY dbo.[udfSplit]([Language],';') L
) Product
PIVOT
(
Count(Value)
For Value in (EN,FR,DE)
)
AS PV
Where EN=1 AND FR=1
I'd be inclined to use a function that accepts a delimited string containing the language codes to check for and the string to check. It checks that each language code is in the string and returns false as soon as one of the desired languages isn't found. If everything is found it returns true.
Your sql would look like
select *
from mytable
where CheckHasAllLanguages(language, @languagesToCheck) = 1
I would make your parameter a multi-select and have each individual language be a selection. You could even feed the parameter with values from the database so it would automatically update if there is a new language. I'm going to call this parameter @LangMultiSelect.
Since you only want items that match all of the selections, you need to pass in a second parameter with the number of items that have been selected. In the properties of your dataset you can add another parameter that is set by an expression. Name it @LangCount and use the expression:
=Parameters!LangMultiSelect.Count
Then use a SQL query similar to this:
SELECT Name
FROM (
SELECT Name,
COUNT(*) OVER(PARTITION BY pt.id) AS lCount
FROM ProductTable AS pt
INNER JOIN MyFunctionsDatabase.dbo.listSplit(@language) AS var1 ON var1.id=pt.id
WHERE pt.language IN (@LangMultiSelect)
) AS t
WHERE t.lCount = @LangCount
That query uses the COUNT() aggregate as a window function to determine the number of matches the item has and then only returns results that match all of the selections in the multi-select parameter.
It works because I am splitting the count by a field that is the same for all of the item names that are the same item but in a different language. If you don't have a field like that this won't work.
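The "match all selected languages" requirement can also be sketched with GROUP BY and HAVING once the languages are stored one per row. The example below uses Python's sqlite3 module and an invented product_language table that holds the already-split version of the sample data above. The products_with_all helper is hypothetical, and it uses an aggregate count rather than the window function shown earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed layout: one row per (product, language) pair.
    CREATE TABLE product_language (product_id INTEGER, lang TEXT);
    INSERT INTO product_language VALUES
        (1, 'EN'), (1, 'FR'), (1, 'DE'),
        (2, 'EN'),
        (3, 'EN'), (3, 'DE'),
        (4, 'EN'), (4, 'FR');
""")

def products_with_all(languages):
    """Return ids of products that have every language in `languages`."""
    wanted = sorted(set(languages))
    if not wanted:
        raise ValueError("at least one language is required")
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT product_id
        FROM product_language
        WHERE lang IN ({placeholders})
        GROUP BY product_id
        -- Keep a product only if it matched every requested language.
        HAVING COUNT(DISTINCT lang) = ?
        ORDER BY product_id
    """
    rows = conn.execute(sql, [*wanted, len(wanted)]).fetchall()
    return [row[0] for row in rows]

print(products_with_all(["EN", "FR"]))
```

The count of requested languages plays the same role as the @LangCount parameter: a product qualifies only when the number of distinct matches equals the number of selections.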

SQL Server where column in where clause is null

Let's say that we have a table named Data with Id and Weather columns. Other columns in that table are not important to this problem. The Weather column can be null.
I want to display all rows where Weather fits a condition, but if there is a null value in weather then display null value.
My SQL so far:
SELECT *
FROM Data d
WHERE (d.Weather LIKE '%'+COALESCE(NULLIF('',''),'sunny')+'%' OR d.Weather IS NULL)
My results are wrong, because the statement also shows rows where Weather is null when the condition matches nothing (let's say the user mistyped the value).
I found a similar topic, but I did not find an appropriate answer there:
SQL WHERE clause not returning rows when field has NULL value
Please help me out.
Your query is correct for the general task of treating NULLs as a match. If you wish to suppress NULLs when there are no other results, you can add an AND EXISTS ... condition to your query, like this:
SELECT *
FROM Data d
WHERE d.Weather LIKE '%'+COALESCE(NULLIF('',''),'sunny')+'%'
OR (d.Weather IS NULL AND EXISTS (SELECT * FROM Data dd WHERE dd.Weather LIKE '%'+COALESCE(NULLIF('',''),'sunny')+'%'))
The additional condition ensures that NULLs are treated as matches only if other matching records exist.
You can also use a common table expression to avoid duplicating the query, like this:
WITH cte AS
(
SELECT *
FROM Data d
WHERE d.Weather LIKE '%'+COALESCE(NULLIF('',''),'sunny')+'%'
)
SELECT * FROM cte
UNION ALL
SELECT * FROM Data WHERE weather is NULL AND EXISTS (SELECT * FROM cte)
"statement show also values where Weather is null if condition is not correct (let say that users typed wrong sunny)."
This suggests that the constant 'sunny' is coming from end-user's input. If that is the case, you need to parameterize your query to avoid SQL injection attacks.
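A minimal sketch of a parameterized version, assuming the search term arrives as user input. It uses Python's sqlite3 module and an in-memory copy of the Data table with invented rows. The search helper and the matches CTE name are hypothetical, and the concatenation operator is || here rather than + as in SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Data (Id INTEGER PRIMARY KEY, Weather TEXT);
    -- Hypothetical sample rows, including one NULL.
    INSERT INTO Data VALUES (1, 'sunny'), (2, 'partly sunny'), (3, 'rain'), (4, NULL);
""")

# :term is bound by the driver, so the user's text is never spliced into the SQL.
QUERY = """
    WITH matches AS (
        SELECT Id, Weather FROM Data WHERE Weather LIKE '%' || :term || '%'
    )
    SELECT Id, Weather FROM matches
    UNION ALL
    -- NULL rows are included only when at least one real match exists.
    SELECT Id, Weather FROM Data
    WHERE Weather IS NULL AND EXISTS (SELECT 1 FROM matches)
    ORDER BY Id
"""

def search(term):
    return conn.execute(QUERY, {"term": term}).fetchall()

print(search("sunny"))
print(search("sunnny"))  # mistyped: no matches, so no NULL rows either
```

Binding the term prevents injection, but % and _ inside the term still act as LIKE wildcards, so they may need escaping if users should be able to search for them literally.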

Replace value in result by a specific value

I need to write a SQL query to collect some data from a database. In this data there is one value that is used as a collective value: the IDs of the courses given. Sometimes a course covers, for example, Office as a whole, but people can take it for Word, Excel, PowerPoint, and so on, all within one course given by one tutor. For statistics, I still need to know whether they took the course for Word, Excel, PowerPoint, and so on.
Is it possible to replace values in the result set? By this I mean something like:
if value = courseValue ==> replace value with specific courseValue (I can get the value via a subquery)
I hope this makes my problem clear, and I appreciate all the help!
You can use a case statement in your select to return something other than the course id that is on the row. For example:
SELECT
field1 AS 'Name',
CASE
WHEN field2 = 'Foo'
THEN 'Bar'
WHEN field2 = 'Lorem'
THEN 'Ipsum'
ELSE 'Some Value'
END
AS 'Type',
field3 AS 'Description'
FROM table
If I understand you correctly, you will need something along the lines of this:
Create a new table with "courseID" and "replacementID" columns, fill it for the cases where there is a replacement
In your query do an outer join with this table over the courseID fields and also return the "replacementID", which can be null if there is no replacement
Use either the replacementID if it isn't null or the courseID
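The steps above can be sketched as a LEFT JOIN combined with COALESCE. The example runs on an in-memory SQLite database through Python's sqlite3 module. The participation and course_replacement tables, their rows, and the effective_course alias are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables: who took which course, plus the replacement mapping.
    CREATE TABLE participation (person TEXT, courseID TEXT);
    CREATE TABLE course_replacement (courseID TEXT PRIMARY KEY, replacementID TEXT);
    INSERT INTO participation VALUES ('Ann', 'OFFICE'), ('Bob', 'SQL101');
    INSERT INTO course_replacement VALUES ('OFFICE', 'WORD');
""")

rows = conn.execute("""
    SELECT p.person,
           -- Use the replacement when one exists, otherwise the original id.
           COALESCE(r.replacementID, p.courseID) AS effective_course
    FROM participation AS p
    LEFT JOIN course_replacement AS r ON r.courseID = p.courseID
    ORDER BY p.person
""").fetchall()

print(rows)
```

Compared with a CASE expression, the mapping lives in data rather than in the query text, so new replacements can be added without editing the query.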