I have a need to build a string from Last Name, First Name, Middle Initial according to the following rules:
If the Last Name is unique, just return the Last Name
If the Last Name isn't unique, but the first letter of the First Name is unique, return Last Name + first letter of First Name
If the Last Name and first letter of the First Name are not unique, return the Last Name + first letter of First Name + Middle Initial.
For example, the table might be:
MDC MDLast MDFirst MDInit
3 Jones Fred A
21 Smith Sam D
32 Brown Tom E
42 Brown Ted A
55 Smith Al D
The query should return:
MDC MDFormattedName
3 Jones
21 Smith S
32 Brown TE
42 Brown TA
55 Smith A
I've written up a query that almost works, but it uses several nested queries, would need several more to (possibly) become a workable solution, and is very inefficient. I'm sure there is a 'proper' way to implement this (for SQL Server 2005, BTW).
This is what I've got so far. It doesn't work: because of the aggregations I lose the IDs and can't do the final join to get ID/Name pairs.
select
CASE
WHEN CountLastFirst > 1 THEN
CASE WHEN MDInit IS NOT NULL THEN MDLastFirst + LEFT(MDInit,1) ELSE MDLastFirst END
WHEN CountLastFirst = 1 AND CountLast > 1 THEN MDLastFirst
ELSE MDLast
END as MDName
FROM
(
select x.MDLast, CountLast, MDLastFirst, CountLastFirst FROM
(
select MDLast,Count(MDLast) as CountLast FROM
MDList
GROUP BY MDLast) as x
INNER JOIN
(select MDLast, MDLastFirst,Count(MDLastFirst) as CountLastFirst FROM
(
select MDLast,
MDLast + ' ' + LEFT(MDFirst,1) as MDLastFirst
From MDList
) as a
GROUP BY MDLastFirst, MDLast) as y ON x.MDLast = y.MDLast
) as z
Assuming a table name of MDCTable, this should work:
SELECT MDCTable.MDC,
CASE MDCCount.NameCount
WHEN 1
THEN MDCTable.MDLast
ELSE
CASE MDFormat1Count
WHEN 1
THEN MDFormat1.MDFormat1Name
ELSE MDCTable.MDLast + ' ' + upper(left(MDCTable.MDFirst, 1)) +
MDCTable.MDInit
END
END AS MDFormattedName
FROM MDCTable
INNER JOIN
(
SELECT COUNT(MDLast) as NameCount, MDLast
FROM MDCTable
GROUP BY MDLast
) MDCCount ON MDCCount.MDLast = MDCTable.MDLast
INNER JOIN (
SELECT COUNT(MDLast + left(MDFirst, 1)) as MDFormat1Count, MDLast + ' ' +
left(MDFirst, 1) AS MDFormat1Name
FROM MDCTable
GROUP BY MDLast + ' ' + left(MDFirst, 1)
) MDFormat1 ON MDCTable.MDLast + ' ' + left(MDCTable.MDFirst, 1) =
MDFormat1.MDFormat1Name
ORDER BY MDCTable.MDC
Have you considered performing this operation in your application instead of directly in an SQL statement? Unless you have a good reason to do this directly in SQL, this is almost always the preferable approach for situations like this.
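If you do go the application route, the three rules come down to two counting passes over the rows. Here is a minimal Python sketch, assuming the rows arrive as (MDC, MDLast, MDFirst, MDInit) tuples; the function name format_names is made up for illustration and a missing middle initial is treated as an empty string:

```python
from collections import Counter

def format_names(rows):
    """Apply the three naming rules to (MDC, MDLast, MDFirst, MDInit) tuples."""
    # Pass 1: how often each last name, and each last name + first letter, occurs.
    last_counts = Counter(last for _, last, _, _ in rows)
    last_first_counts = Counter((last, first[:1].upper()) for _, last, first, _ in rows)

    result = []
    for mdc, last, first, init in rows:
        letter = first[:1].upper()
        if last_counts[last] == 1:
            name = last                               # rule 1: last name is unique
        elif last_first_counts[(last, letter)] == 1:
            name = f"{last} {letter}"                 # rule 2: first letter disambiguates
        else:
            name = f"{last} {letter}{init or ''}"     # rule 3: add the middle initial
        result.append((mdc, name))
    return result

rows = [
    (3, "Jones", "Fred", "A"),
    (21, "Smith", "Sam", "D"),
    (32, "Brown", "Tom", "E"),
    (42, "Brown", "Ted", "A"),
    (55, "Smith", "Al", "D"),
]
for mdc, name in format_names(rows):
    print(mdc, name)
```

With the sample table from the question this prints the same five ID/name pairs the query is expected to return.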
I have 2 tables name and match. The name and match table have columns type.
The columns and data in the name table
ID| type |
--| ---- |
1| 1ABC |
2| 2DEF |
3| 3DEF |
4| 4IJK |
The columns and data in match table is
type | DATA
---- | ----
NOT %ABC% AND NOT %DEF% | NOT ABC AND NOT DEF
%DEF% | DEF ONLY
NOT %DEF% AND NOT %IJK% | NOT DEF AND NOT IJK
I have tried using case statement. The first 3 characters will be NOT if there is a NOT in the type in match table.
The below query is giving me a missing keyword error. I am not sure what I am missing here
SELECT s.id, s.type, m.data
where case when substr(m.type1,3)='NOT' then s.type not in (REPLACE(REPLACE(m.type,'NOT',''),'AND',','))
ELSE s.type in (m.type) end
from source s, match m;
I need the output to match the type in source column and display the data in match column.
The output should be
ID|type|DATA
1 |1ABC|NOT DEF AND NOT IJK
2 |2DEF|DEF ONLY
3 |3DEF|DEF ONLY
4 |4IJK|NOT ABC AND NOT DEF
The biggest problem with your attempted query is that SQL requires the WHERE clause to come after the FROM clause.
But your query is flawed in other ways as well. Although it can have complicated logic within it, including subqueries, a CASE expression must ultimately return a single value. Conditions within it are not applied as if they were in the WHERE clause of the main query (as you appear to be trying to do).
My recommendation would be to not store the match table as you currently are. It seems much preferable to have something that contains each condition you want to evaluate. Assuming that's not possible, I suggest a CTE (or even a view) that breaks it down that way first.
This query (based on Nefreo's answer for breaking strings into multiple rows)...
SELECT
data,
regexp_count(m.type, ' AND ') + 1 num,
CASE WHEN REGEXP_SUBSTR(m.type,'(.*?)( AND |$)',1,levels.column_value) like 'NOT %' THEN 1 ELSE 0 END negate,
replace(replace(REGEXP_SUBSTR(m.type,'(.*?)( AND |$)',1,levels.column_value), 'NOT '), ' AND ') match
FROM match m INNER JOIN
table(cast(multiset(select level from dual connect by level <= regexp_count(m.type, ' AND ') + 1) as sys.OdciNumberList)) levels
ON 1=1
... breaks your match table into something more like:
|DATA|NUM|NEGATE|MATCH|
|-|-|-|-|
|NOT ABC AND NOT DEF|2|1|%ABC%|
|NOT ABC AND NOT DEF|2|1|%DEF%|
|DEF ONLY|1|0|%DEF%|
|NOT DEF AND NOT IJK|2|1|%DEF%|
|NOT DEF AND NOT IJK|2|1|%IJK%|
So we now know each specific like condition, whether it should be negated, and the number of conditions that need to be matched for each MATCH row. (For simplicity, I am using match.data as essentially a key for this since it is unique for each row in match and is what we want to return anyway, but if you were actually storing the data this way you'd probably use a sequence of some sort and not repeat the human-readable text.)
That way, your final query can be quite simple:
SELECT name.id, name.type, criteria.data
FROM name INNER JOIN criteria
ON
(criteria.negate = 0 AND name.type LIKE criteria.match)
OR
(criteria.negate = 1 AND name.type NOT LIKE criteria.match)
GROUP BY name.id, name.type, criteria.data
HAVING COUNT(*) = MAX(criteria.num)
ORDER BY name.id
The conditions in the ON do the appropriate LIKE or NOT LIKE (matches one condition from the CRITERIA view/CTE), and the condition in the HAVING makes sure we had the correct number of total matches to return the row (makes sure we matched all the conditions in one row of the MATCH table).
You can see the entire thing...
WITH criteria AS
(
SELECT
data,
regexp_count(m.type, ' AND ') + 1 num,
CASE WHEN REGEXP_SUBSTR(m.type,'(.*?)( AND |$)',1,levels.column_value) like 'NOT %' THEN 1 ELSE 0 END negate,
replace(replace(REGEXP_SUBSTR(m.type,'(.*?)( AND |$)',1,levels.column_value), 'NOT '), ' AND ') match
FROM match m INNER JOIN
table(cast(multiset(select level from dual connect by level <= regexp_count(m.type, ' AND ') + 1) as sys.OdciNumberList)) levels
ON 1=1
)
SELECT name.id, name.type, criteria.data
FROM name INNER JOIN criteria
ON
(criteria.negate = 0 AND name.type LIKE criteria.match)
OR
(criteria.negate = 1 AND name.type NOT LIKE criteria.match)
GROUP BY name.id, name.type, criteria.data
HAVING COUNT(*) = MAX(criteria.num)
ORDER BY name.id
... working in this fiddle.
As a one-off, I don't think this is significantly different than the other answer already provided, but I wanted to do this since I think this is probably more maintainable if the complexity of your conditions changes.
It already handles arbitrary numbers of conditions, mixes of NOT and not-NOT within the same row of MATCH, and allows for the % signs (for the like) to be placed arbitrarily (e.g. startswith%, %endswith, %contains%, start%somewhere%end, exactmatch should all work as expected). If in the future you want to add different types of conditions or handle ORs, I think the general ideas here will apply.
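For comparison, the same split-then-evaluate idea can be sketched outside the database. This is a rough Python illustration rather than a translation of the Oracle query: the helper names (like_to_regex, parse_criteria, matches) are invented here, and it assumes conditions are joined only by ' AND ' and negated only by a leading 'NOT ', as in the sample data.

```python
import re

def like_to_regex(pattern):
    """Translate SQL LIKE wildcards (% and _) into an anchored regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)

def parse_criteria(type_expr):
    """Split 'NOT %ABC% AND NOT %DEF%' into [(negate, compiled_pattern), ...]."""
    criteria = []
    for term in type_expr.split(" AND "):
        negate = term.startswith("NOT ")
        pattern = term[4:] if negate else term
        criteria.append((negate, like_to_regex(pattern)))
    return criteria

def matches(value, type_expr):
    """True when every condition in the row holds (LIKE or NOT LIKE as flagged)."""
    return all(
        (rx.fullmatch(value) is None) == negate
        for negate, rx in parse_criteria(type_expr)
    )

names = [(1, "1ABC"), (2, "2DEF"), (3, "3DEF"), (4, "4IJK")]
match_rows = [
    ("NOT %ABC% AND NOT %DEF%", "NOT ABC AND NOT DEF"),
    ("%DEF%", "DEF ONLY"),
    ("NOT %DEF% AND NOT %IJK%", "NOT DEF AND NOT IJK"),
]
results = [
    (name_id, name_type, data)
    for name_id, name_type in names
    for type_expr, data in match_rows
    if matches(name_type, type_expr)
]
for row in results:
    print(row)
```

Requiring all conditions of a row to hold plays the same role as the HAVING COUNT(*) = MAX(criteria.num) check in the SQL.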
Not knowing the possible other rules for selecting rows, just with your data from the question, maybe you could use this:
WITH
tbl_name AS
(
Select 1 "ID", '1ABC' "A_TYPE" From Dual Union All
Select 2 "ID", '2DEF' "A_TYPE" From Dual Union All
Select 3 "ID", '3DEF' "A_TYPE" From Dual Union All
Select 4 "ID", '4IJK' "A_TYPE" From Dual
),
tbl_match AS
(
Select 'NOT %ABC% AND NOT %DEF%' "A_TYPE", 'NOT ABC AND NOT DEF' "DATA" From Dual Union All
Select '%DEF%' "A_TYPE", 'DEF ONLY' "DATA" From Dual Union All
Select 'NOT %DEF% AND NOT %IJK%' "A_TYPE", 'NOT DEF AND NOT IJK' "DATA" From Dual
)
Select
n.ID "ID",
n.A_TYPE,
m.DATA
From
tbl_match m
Inner Join
tbl_name n ON (1=1)
Where
(
INSTR(m.A_TYPE, 'NOT %' || SubStr(n.A_TYPE, 2) || '%', 1, 1) = 0
AND
INSTR(m.A_TYPE, 'NOT %' || SubStr(n.A_TYPE, 2) || '%', 1, 2) = 0
AND
Length(m.A_TYPE) > Length(SubStr(n.A_TYPE, 2)) + 2
)
OR
(
Length(m.A_TYPE) = Length(SubStr(n.A_TYPE, 2)) + 2
AND
'%' || SubStr(n.A_TYPE, 2) || '%' = m.A_TYPE
)
Order By n.ID
Result:
|ID|A_TYPE|DATA|
|-|-|-|
|1|1ABC|NOT DEF AND NOT IJK|
|2|2DEF|DEF ONLY|
|3|3DEF|DEF ONLY|
|4|4IJK|NOT ABC AND NOT DEF|
Any other format of condition should be evaluated separately ...
Regards...
WITH match_cte AS (
SELECT m.data
,m.type
,decode(instr(m.type,'NOT')
,1 -- found at position 1
,0
,1) should_find_str_1
,substr(m.type
,instr(m.type,'%',1,1) + 1
,instr(m.type,'%',1,2) - instr(m.type,'%',1,1) - 1) str_1
,decode(instr(m.type,'NOT',instr(m.type,'%',1,2))
,0 -- no second NOT
,1
,0) should_find_str_2
,substr(m.type
,instr(m.type,'%',1,3) + 1
,instr(m.type,'%',1,4) - instr(m.type,'%',1,3) - 1) str_2
FROM match m
)
SELECT s.id
,s.type
,m.data
FROM source s
CROSS JOIN match_cte m
WHERE m.should_find_str_1 = sign(instr(s.type,m.str_1))
AND (m.str_2 IS NULL
OR m.should_find_str_2 = sign(instr(s.type, m.str_2))
)
ORDER BY s.id, m.data
MATCH_CTE
|DATA|TYPE|SHOULD_FIND_STR_1|STR_1|SHOULD_FIND_STR_2|STR_2|
|-|-|-|-|-|-|
|NOT ABC AND NOT DEF|NOT %ABC% AND NOT %DEF%|0|ABC|0|DEF|
|DEF|%DEF%|1|DEF|1|NULL|
|NOT DEF AND NOT IJK|NOT %DEF% AND NOT %IJK%|0|DEF|0|IJK|
I'm looking for an explanation for why 1 of the following 3 queries isn't returning what I am expecting.
-- Query 1
SELECT ANNo, ANCpr
FROM Anmodning
WHERE LEFT(ANCpr,6) + '-' + RIGHT(ANCpr,4) NOT IN (SELECT PSCpr FROM Person)
-- Query 2
SELECT ANNo, ANCpr
FROM Anmodning a
LEFT JOIN Person p ON p.PSCpr = LEFT(a.ANCpr,6) + '-' + RIGHT(a.ANCpr,4)
WHERE p.PSNo IS NULL
-- Query 3
SELECT ANNo, ANCpr
FROM Anmodning
WHERE ANNo NOT IN
(
SELECT ANNo
FROM Anmodning
WHERE LEFT(ANCpr,6) + '-' + RIGHT(ANCpr,4) IN (SELECT PSCpr FROM Person)
)
Assume the following:
Anmodning with ANNo=1, ANCpr=1111112222
And the Person table doesn't have a row with PSCpr=111111-2222
Queries are executed in Management Studio against a SQL Server 2017.
Queries 2 and 3 return the Anmodning row as expected but query 1 does not.
Why is that?
I suspect the issue with the first query is a null-safety problem. If there are null values in Person(PSCpr), then the not in condition filters out all Anmodning rows, regardless of other values in Person.
Consider this simple example:
select 1 where 1 not in (select 2 union all select null)
Returns no rows, while:
select 1 where 1 not in (select 2 union all select 3)
Returns 1 as you would expect.
This problem does not happen when you use left join, as in the second query.
You could also phrase this with not exists, which is null-safe and is what I would recommend here:
SELECT ANNo, ANCpr
FROM Anmodning a
WHERE NOT EXISTS (SELECT 1 FROM Person p WHERE p.PSCpr = LEFT(a.ANCpr,6) + '-' + RIGHT(a.ANCpr,4))
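The difference is easy to reproduce in a throwaway database. The sketch below uses Python's sqlite3 module as a stand-in for SQL Server (an assumption: SQLite applies the same three-valued logic to NOT IN, but it has no LEFT/RIGHT, so substr is used instead). The Person rows are invented test data, one of them with a NULL PSCpr.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Anmodning (ANNo INTEGER, ANCpr TEXT);
    CREATE TABLE Person (PSNo INTEGER, PSCpr TEXT);
    INSERT INTO Anmodning VALUES (1, '1111112222');
    -- No Person row matches 111111-2222, but one row has a NULL PSCpr.
    INSERT INTO Person VALUES (1, '333333-4444'), (2, NULL);
""")

# SQLite spelling of LEFT(ANCpr,6) + '-' + RIGHT(ANCpr,4)
formatted = "substr(a.ANCpr, 1, 6) || '-' || substr(a.ANCpr, -4)"

# NOT IN: the comparison against NULL is unknown, so the row is filtered out.
not_in_rows = conn.execute(
    f"SELECT a.ANNo FROM Anmodning a "
    f"WHERE {formatted} NOT IN (SELECT PSCpr FROM Person)"
).fetchall()

# NOT EXISTS: the NULL row simply never matches, so the row is returned.
not_exists_rows = conn.execute(
    f"SELECT a.ANNo FROM Anmodning a "
    f"WHERE NOT EXISTS (SELECT 1 FROM Person p WHERE p.PSCpr = {formatted})"
).fetchall()

print("NOT IN:    ", not_in_rows)
print("NOT EXISTS:", not_exists_rows)
```

If the NULL row is removed from Person, both queries return the Anmodning row.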
I need some help with my query...I am trying to get a count of names in each house, all the col#'s are names.
Query:
SELECT House#,
COUNT(CASE WHEN col#1 IS NOT NULL THEN 1 ELSE 0 END) +
COUNT(CASE WHEN col#2 IS NOT NULL THEN 1 ELSE 0 END) +
COUNT(CASE WHEN col#3 IS NOT NULL THEN 1 ELSE 0 END) as count
FROM myDB
WHERE House# in (house#1,house#2,house#3)
GROUP BY House#
Desired results:
house 1 - the count is 3 /
house 2 - the count is 2 /
house 3 - the count is 1
...with my current query the results for count would be just 3's
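A likely reason every house shows 3: COUNT counts non-NULL values, and the ELSE 0 branch makes every CASE result non-NULL, so each COUNT returns the number of rows rather than the number of names. The sketch below reproduces this with Python's sqlite3 module; the table houses and columns name1 to name3 are hypothetical stand-ins for myDB and col#1 to col#3, and it assumes one row per house.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE houses (house TEXT, name1 TEXT, name2 TEXT, name3 TEXT);
    INSERT INTO houses VALUES
        ('house 1', 'peter', 'paul',  'mary'),
        ('house 2', 'sarah', 'sally', NULL),
        ('house 3', 'joe',   NULL,    NULL);
""")

# ELSE 0 is non-NULL, so COUNT counts it: every house reports 3.
with_else = conn.execute("""
    SELECT house,
           COUNT(CASE WHEN name1 IS NOT NULL THEN 1 ELSE 0 END) +
           COUNT(CASE WHEN name2 IS NOT NULL THEN 1 ELSE 0 END) +
           COUNT(CASE WHEN name3 IS NOT NULL THEN 1 ELSE 0 END)
    FROM houses GROUP BY house ORDER BY house
""").fetchall()

# Without ELSE the CASE yields NULL for missing names, which COUNT skips.
without_else = conn.execute("""
    SELECT house,
           COUNT(CASE WHEN name1 IS NOT NULL THEN 1 END) +
           COUNT(CASE WHEN name2 IS NOT NULL THEN 1 END) +
           COUNT(CASE WHEN name3 IS NOT NULL THEN 1 END)
    FROM houses GROUP BY house ORDER BY house
""").fetchall()

print(with_else)
print(without_else)
```

Replacing COUNT with SUM while keeping ELSE 0 would give the same corrected counts.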
In this case, it seems that counting names is the same as counting the commas (,) plus one:
SELECT House_Name,
LEN(Names) - LEN(REPLACE(Names,',','')) + 1 as Names
FROM dbo.YourTable;
Another option since Lamak stole my thunder, would be to split it and normalize your data, and then aggregate. This uses a common split function but you could use anything, including STRING_SPLIT for SQL Server 2016+ or your own...
declare @table table (house varchar(16), names varchar(256))
insert into @table
values
('house 1','peter, paul, mary'),
('house 2','sarah, sally'),
('house 3','joe')
select
t.house
,NumberOfNames = count(s.Item)
from
@table t
cross apply dbo.DelimitedSplit8K(names,',') s
group by
t.house
Notice how the answers you are getting are quite complex for what they're doing? That's because relational databases are not designed to store data that way.
On the other hand, if you change your data structure to something like this:
house name
1 peter
1 paul
1 mary
2 sarah
2 sally
3 joe
The query now is:
select house, count(name)
from housenames
group by house
So my recommendation is to do that: use a design that's more suitable for SQL Server to work with, and your queries become simpler and more efficient.
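To make the comparison concrete, here is a small sketch of that normalization using Python's sqlite3 module as a stand-in database (an assumption; the same idea applies to SQL Server). The comma-separated input is the sample data used elsewhere in this thread, and the housenames table follows the layout shown above.

```python
import sqlite3

# Denormalized input: one comma-separated string of names per house.
raw = [
    ("house 1", "peter, paul, mary"),
    ("house 2", "sarah, sally"),
    ("house 3", "joe"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE housenames (house TEXT, name TEXT)")

# Split once at load time so every name becomes its own row.
conn.executemany(
    "INSERT INTO housenames VALUES (?, ?)",
    [(house, name.strip()) for house, names in raw for name in names.split(",")],
)

counts = conn.execute(
    "SELECT house, COUNT(name) FROM housenames GROUP BY house ORDER BY house"
).fetchall()
print(counts)
```

The splitting happens once when the data is loaded, and the counting query stays a plain GROUP BY.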
One dirty trick is to replace commas with empty strings and compare the lengths:
SELECT house +
' has ' +
CAST((LEN(names) - LEN(REPLACE(names, ',', '')) + 1) AS VARCHAR) +
' names'
FROM mytable
You can parse using xml and find count as below:
Select *, a.xm.value('count(/x)','int') from (
Select *, xm = CAST('<x>' + REPLACE((SELECT REPLACE(names,', ','$$$SSText$$$') AS [*] FOR XML PATH('')),'$$$SSText$$$','</x><x>')+ '</x>' AS XML) from #housedata
) a
select House, 'has '+cast((LEN(Names)-LEN(REPLACE(Names, ',', ''))+1) as varchar)+' names'
from TempTable
Hello I need help with the following Scenario.
There is a table with Company_Cd and Company_Name. All I need is the first 2 words of the company name if it has 3 or more words, and the first word if it has 2 words.
Example:
Company_Cd Company_Name
123 ABC SOLUTIONS INC
345 XYZ GLOBAL TECH SOLUTIONS
899 NOWHERE COMPANY INC LTD
654 QSW SOLUTIONS
Desired Output:
Company_Cd Company_Name
123 ABC SOLUTIONS
345 XYZ GLOBAL
899 NOWHERE COMPANY
654 QSW
You can use the instr function to find the 1st and 2nd occurrence of a space and then use substr accordingly:
SELECT c.company_name,
(
CASE
WHEN instr(c.company_name,' ',1,2) >0 THEN SUBSTR(c.company_name, 1, instr(c.company_name,' ',1,2) - 1)
WHEN instr(c.company_name,' ',1,2) =0 AND instr(c.company_name,' ',1,1) >0 THEN SUBSTR(c.company_name, 1, instr(c.company_name,' ',1,1) - 1)
ELSE c.company_name
END)
FROM customer c
Please find below query for your use:
SELECT Company_Cd, IF((length(Company_Name) - length(replace(Company_Name, ' ', '')) + 1) >= 3, SUBSTRING_INDEX(Company_Name, ' ', 2), IF((length(Company_Name) - length(replace(Company_Name, ' ', '')) + 1) >= 2, SUBSTRING_INDEX(Company_Name, ' ', 1), Company_Name)) as result FROM company LIMIT 20;
You can also use Regular Expression:
SELECT Company_Cd,
regexp_replace(company_name,'(((\w+)\s){'||CASE WHEN regexp_count(trim(company_name),' ') IN (0,1) THEN 1
ELSE 2
END||'}).*','\1' )
FROM customer;
select company_cd,
trim(substr(company_name, 1, instr(company_name || '  ', ' ', 1, 2) - 1))
from company_tbl;
This solution begins by adding two spaces at the end of company_name. It then finds the position of the second space in this extended string, removes that space and everything after it, and finally trims the remaining space at the end (only needed if the company name was a single word; if all company names were guaranteed to be at least two words, the solution would be even simpler).
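The padding trick is easier to follow when traced step by step. The Python sketch below mirrors it in first_two_words (a name chosen for illustration). Like the query it mirrors, it keeps both words of a two-word name, so a second function, shorten, shows one way to apply the rule as stated in the question (two words when there are three or more, otherwise one); both are illustrations, not tested Oracle code.

```python
def first_two_words(name):
    """Mirror of the padded instr/substr query: keep at most the first two words."""
    padded = name + "  "                      # guarantee at least two spaces exist
    first = padded.index(" ")                 # position of the first space
    second = padded.index(" ", first + 1)     # position of the second space
    return padded[:second].strip()            # cut there, then trim padding

def shorten(name):
    """Rule from the question: 2 words if the name has 3 or more, else 1 word."""
    words = name.split()
    keep = 2 if len(words) >= 3 else 1
    return " ".join(words[:keep])

companies = [
    "ABC SOLUTIONS INC",
    "XYZ GLOBAL TECH SOLUTIONS",
    "NOWHERE COMPANY INC LTD",
    "QSW SOLUTIONS",
]
for company in companies:
    print(f"{company!r}: padded trick -> {first_two_words(company)!r}, "
          f"question's rule -> {shorten(company)!r}")
```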
Just curious: if I wanted to send strings to a database (perhaps MS SQL Server), can anyone provide insight on the best way to return a result set that is sorted and "scored" on its closeness to the string passed in?
So, if I sent a query for :
SELECT name FROM table where name LIKE 'Jon'
and then get back 1000 results that look like:
100 Jon
98 John
80 Jonathan
32 Nathan
Views, indexes, stored procedures, coded solution? What is the recommendation?
You could, but you'd need to use another function to do it. Levenshtein ratio or Jaro distance would be the most common solutions. I'm not sure what, if anything, SQL Server includes builtin for this. If nothing else I think you can use the SimMetrics library as described here. Regardless, it would look something like this.
select top 1000
jaro('John', name) as score, name
from table
where name like '%John%'
order by 1 desc
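To show what such a scoring function computes, here is a small Python sketch of the classic dynamic-programming Levenshtein distance, plus a 0 to 100 score derived from it. The scaling (distance divided by the longer length) and the case-insensitive comparison are assumptions made for illustration; they are not what SimMetrics or any SQL Server function does.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))            # distances from "" to prefixes of b
    for i, ca in enumerate(a, 1):
        cur = [i]                             # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,                  # delete ca
                cur[j - 1] + 1,               # insert cb
                prev[j - 1] + (ca != cb),     # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]

def score(query, candidate):
    """Scale the distance to 0..100, where 100 means identical (ignoring case)."""
    longest = max(len(query), len(candidate)) or 1
    distance = levenshtein(query.lower(), candidate.lower())
    return round(100 * (1 - distance / longest))

names = ["Nathan", "Jonathan", "John", "Jon"]
ranked = sorted(names, key=lambda n: score("Jon", n), reverse=True)
for name in ranked:
    print(score("Jon", name), name)
```

In practice a function like this would usually run in the application, or be registered with the database as a user-defined function, after a cheaper filter has narrowed the candidate rows.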
EDIT
Due to some persistent prodding from the comments, I present here an implementation of the Levenshtein distance calculation in SQL. TSQL for SQL Server 2005+ is used here, but the technique can be converted to other DBMS as well. Maximum score is 100.
;with tbl as (
select 'Jon' AS Name union all
select 'Jonathan' union all
select 'Jonny' union all
select 'John' union all
select 'Bone' union all
select 'BJon' union all
select 'Nathan' union all
select 'Jonne')
SELECT *, SCORE_Levenshtein + SCORE_SOUNDEX TotalScore
FROM
(
SELECT name,
CAST(50 /
(
select 1.0 + MAX(LDist)
FROM
(
select startAt.number,
LEN(longer) -
sum(case when SUBSTRING(longer, startAt.number+offset.number, 1)
= SUBSTRING(shorter, 1+offset.number, 1) then 1 else 0 end ) LDist
FROM
(select case when LEN(Name) < LEN(LookFor) then Name else LookFor end shorter) shorter
cross join
(select case when LEN(Name) >= LEN(LookFor) then Name else LookFor end longer) longer
inner join master..spt_values startAt
on startAt.type='P' and startAt.number between 1 and len(longer) - LEN(shorter) + 1
inner join master..spt_values offset
on offset.type='P' and offset.number between 0 and LEN(shorter)-1
group by startAt.number, longer, shorter
) X
) AS NUMERIC(16,4)) SCORE_Levenshtein
,
CAST(50 / (5- -- inversely proportional to soundex difference
(
SELECT 0.0 +
case when Substring(A,1,1)=Substring(B,1,1) then 1 else 0 end
+
case when Substring(A,2,1)=Substring(B,2,1) then 1 else 0 end
+
case when Substring(A,3,1)=Substring(B,3,1) then 1 else 0 end
+
case when Substring(A,4,1)=Substring(B,4,1) then 1 else 0 end
FROM (select soundex(name) as A, SOUNDEX(LookFor) as B) X
)) AS NUMERIC(16,4)) AS SCORE_SOUNDEX
FROM tbl
CROSS JOIN (SELECT 'Jon' as LookFor) LookFor
) Scored
Order by SCORE_Levenshtein + SCORE_SOUNDEX DESC
Note - This line CROSS JOIN (SELECT 'Jon' as LookFor) LookFor is used so that the input 'Jon' does not need to be repeated many times in the query. One could also define a variable instead and use it where LookFor is used in the query.
Output
It is worth noting that together with SOUNDEX, Jonny gets to score higher than Bone which won't happen with Levenshtein alone.
name SCORE_Levenshtein SCORE_SOUNDEX TotalScore
Jon 50.0000 50.0000 100.0000
John 12.5000 50.0000 62.5000
Jonny 8.3333 50.0000 58.3333
Jonne 8.3333 50.0000 58.3333
Bone 10.0000 25.0000 35.0000
BJon 10.0000 12.5000 22.5000
Jonathan 5.5556 16.6667 22.2223
Nathan 7.1429 12.5000 19.6429
Original answer follows, based on pre-filtering the input based on LIKE '%x%' which collapses the Levenshtein to a simple Len(column) - Len(Like-expression) calculation
Have a look at this example - it tests the length and SOUNDEX differences, for lack of better measures.
The maximum score is 100.
;with tbl as (
select 'Jon' AS Name union all
select 'Jonathan' union all
select 'Jonny' union all
select 'John' union all -- doesn't match LIKE
select 'BJon' union all
select 'Jonne')
SELECT name,
50 / (Len(Name) - LEN('Jon') + 1.0) -- inversely proportional to length difference
+
50 / (5- -- inversely proportional to soundex difference
(
SELECT 0.0 +
case when Substring(A,1,1)=Substring(B,1,1) then 1 else 0 end
+
case when Substring(A,2,1)=Substring(B,2,1) then 1 else 0 end
+
case when Substring(A,3,1)=Substring(B,3,1) then 1 else 0 end
+
case when Substring(A,4,1)=Substring(B,4,1) then 1 else 0 end
FROM (select soundex(name) as A, SOUNDEX('Jon') as B) X
)) AS SCORE
FROM tbl
where name LIKE '%Jon%'
Order by SCORE DESC
Output
name SCORE
Jon 100.00000000000000000
Jonny 66.66666666666660000
Jonne 66.66666666666660000
BJon 37.50000000000000000
Jonathan 24.99999999999996666
Something like this might help:
http://www.mombu.com/microsoft/microsoft/t-equivalent-sql-server-functions-for-match-against-in-mysql-2292412.html