display the length of the first word of a dataset - sql

Hi, I'm having trouble changing the FirstWordLength column so that it shows the number of characters in each first word instead of the word itself. This is my current query:
SELECT InvoiceLineItemDescription,
LEFT(InvoiceLineItemDescription,
CASE
WHEN charindex(' ', InvoiceLineItemDescription) = 0 THEN LEN(InvoiceLineItemDescription)
ELSE charindex(' ', InvoiceLineItemDescription) - 1 END)
AS FirstWordLength
FROM InvoiceLineItems
ORDER BY FirstWordLength desc;
It should look something like this:
InvoiceLineItemDescription FirstWordLength
citi bank 4

You can get the first word length using charindex():
SELECT InvoiceLineItemDescription,
CHARINDEX(' ', InvoiceLineItemDescription + ' ') - 1 as FirstWordLength
FROM InvoiceLineItems
ORDER BY FirstWordLength desc;
As in your question, this assumes that only spaces are used for delimiting words. You can use PATINDEX() to support more separator characters.
The case in your code should work, but you are using it to extract the first word, rather than just the length.
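As a rough sketch of the PATINDEX() variant mentioned above, the query below treats a space, comma, semicolon, or slash as a word separator. The separator set and the inline sample rows are assumptions for illustration, not part of the original table.

```sql
-- Hypothetical sample rows; in practice, select from InvoiceLineItems instead.
SELECT d.InvoiceLineItemDescription,
       -- Appending a space guarantees a match, so single-word values
       -- return their full length instead of -1.
       PATINDEX('%[ ,;/]%', d.InvoiceLineItemDescription + ' ') - 1 AS FirstWordLength
FROM (VALUES ('citi bank'), ('acme,inc'), ('single')) AS d(InvoiceLineItemDescription)
ORDER BY FirstWordLength DESC;
```

With these rows the lengths are 4 for 'citi bank', 4 for 'acme,inc', and 6 for 'single'.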

Related

How to remove Roman letter and numeric value from column in SQL

Using SQL Server, I have a column whose values may end in a number or a Roman numeral. How do I remove just the number without specifying its position?
Job_Title
Data Analyst 2
Manager 50
Robotics 1615
Software Engineer
DATA ENGINEER III
I tried using this query:
SELECT
CASE
WHEN PATINDEX('%[0-9 ]%', job_title) > 0
THEN RTRIM(SUBSTRING(Job_title, 1, PATINDEX('%[0-9 ]%', job_title) - 1))
ELSE JOB_TITLE
END
FROM
my_table
WHERE
PATINDEX('%[0-9]%', JOB_TITLE) <> 0
But the result I'm getting is:
Job_Title
Data
Manager
Robotics
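Use the TRANSLATE function to replace each digit with a space, then RTRIM to strip the trailing spaces. In SQL Server the second and third arguments must have the same length, so the replacement string needs ten spaces:
SELECT RTRIM(TRANSLATE(Job_title, '0123456789', SPACE(10))) AS JOB_TITLE
from my_table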
You should remove the space character from the bracketed set in the pattern. The new code would be:
SELECT case when patindex('%[0-9]%', job_title) > 0 then
rtrim(substring(Job_title,1, patindex('%[0-9]%', job_title) - 1))
else
JOB_TITLE
end
from my_table
WHERE PATINDEX('%[0-9]%',JOB_TITLE) <>0
I think you're trying to remove numbers from the end of a job title, not exclude rows. So, as others have mentioned, you need to take the space out of the brackets in the pattern and put it in front of the brackets, to say that the number is separated from what precedes it by a space. I think you also need to drop the WHERE clause and remove the wildcard from the right side of the pattern, so that the number has to be at the end of the job title, like this:
SELECT case when patindex('% [0-9]', job_title) > 0 then
rtrim(substring(Job_title,1, patindex('% [0-9]', job_title) - 1))
else
JOB_TITLE
end
from my_table
But, you also mention roman numerals... and... that's tougher if it's possible for a job title to end in something like " X" where it means "X" and not "10". If that's not possible, you should just be able to do [0-9IVXivx] to replace all the bracketed segments.
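One thing to watch: [0-9] matches exactly one character, so a pattern anchored to the end of the string such as '% [0-9]' only matches a single trailing digit and leaves values like 'Manager 50' unchanged. A possible sketch that handles multi-digit suffixes is to locate the last space and strip the final token only when it consists entirely of digits. It assumes the titles have no trailing spaces and uses inline sample rows in place of my_table; Roman numerals are deliberately left alone here.

```sql
SELECT t.Job_Title,
       CASE
           -- Strip the last token only if it exists and contains no non-digit character.
           WHEN p.last_space > 0
                AND SUBSTRING(t.Job_Title, p.last_space + 1, LEN(t.Job_Title)) NOT LIKE '%[^0-9]%'
           THEN RTRIM(LEFT(t.Job_Title, p.last_space - 1))
           ELSE t.Job_Title
       END AS Cleaned
FROM (VALUES ('Data Analyst 2'), ('Manager 50'), ('Robotics 1615'),
             ('Software Engineer'), ('DATA ENGINEER III')) AS t(Job_Title)  -- sample rows
CROSS APPLY (
    -- Position of the last space, or 0 when the title has no space at all.
    SELECT CASE WHEN CHARINDEX(' ', t.Job_Title) = 0 THEN 0
                ELSE LEN(t.Job_Title) - CHARINDEX(' ', REVERSE(t.Job_Title)) + 1
           END AS last_space
) AS p;
```

For the sample rows this yields 'Data Analyst', 'Manager', and 'Robotics', while 'Software Engineer' and 'DATA ENGINEER III' come back unchanged.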

RIGHT of CHARINDEX not selecting correctly

I am trying to parse out a last name field that may have two last names that are separated by either a blank space ' ' or a hyphen '-' or it may only have one name.
Here is what I'm using to do that:
select top 1000
BENE_FIRST_NAME,
BENE_LAST_NAME,
FirstNm =
case
when BENE_FIRST_NAME like '% %' then
left(BENE_FIRST_NAME, CHARINDEX(' ', BENE_FIRST_NAME))
when BENE_FIRST_NAME like '%-%' then
left(BENE_FIRST_NAME, CHARINDEX('-', BENE_FIRST_NAME))
else BENE_FIRST_NAME
end,
LastNm =
case
when BENE_LAST_NAME like '% %' then
right(BENE_LAST_NAME, CHARINDEX(' ', BENE_LAST_NAME))
when BENE_LAST_NAME like '%-%' then
right(BENE_LAST_NAME, CHARINDEX('-', BENE_LAST_NAME))
else BENE_LAST_NAME
end,
CharIndxDash = CHARINDEX('-', BENE_LAST_NAME),
CharIndxSpace = CHARINDEX(' ', BENE_LAST_NAME)
from xMIUR_Elig_Raw_v3
Here are some results:
BENE_FIRST_NAME  BENE_LAST_NAME      FirstNm   LastNm     CharIndxDash  CharIndxSpace
JUANA            PEREZ-MARTINEZ      JUANA     RTINEZ     6             0
EMILIANO         PICENO ESPINOZA     EMILIANO  SPINOZA    0             7
JULIAN           NIETO-CARRENO       JULIAN    ARRENO     6             0
EMILY            SALMERON TERRIQUEZ  EMILY     TERRIQUEZ  0             9
The CHARINDEX seems to be selecting the correct position but it is not bringing in all of the CHARs to the right of that position. Sometimes it works like in the last record. But sometimes it is off by 1. And sometimes it is off by 2. Any ideas?
If you need to select the part of a last name after the space/hyphen, you need to take the right part of the string with length = total_length - delimiter_position:
...
LastNm =
case
when BENE_LAST_NAME like '% %' then
right(BENE_LAST_NAME, LEN(BENE_LAST_NAME) - CHARINDEX(' ', BENE_LAST_NAME))
when BENE_LAST_NAME like '%-%' then
right(BENE_LAST_NAME, LEN(BENE_LAST_NAME) -CHARINDEX('-', BENE_LAST_NAME))
else BENE_LAST_NAME
end,
...
Your last name logic doesn't make sense..
RIGHT takes N chars from the right of the string
CHARINDEX gives the position of a char from the left of the string
You can't use it to find a position from left and then take that number of chars from the right of the string
Here's a name:
JOHN MALKOVICH
The space is at 5. If you take 5 chars from the right, you get OVICH. The shorter the name before the space and the longer the name after the space, the fewer chars you get from the last name.
Perhaps you meant to put a LEN in there so you take the string length minus the index of the space. You can also use the index in a call to SUBSTRING as the start position, and tell SQL Server to take 9999 chars (or any number longer than the remaining string); it will take up to the end of the string:
SUBSTRING(name, CHARINDEX(' ', name)+1, 9999)
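To make the arithmetic concrete, here is a small sketch that puts the original expression next to the two corrected forms. The inline sample names and the column aliases are made up for illustration, and PATINDEX with the set [- ] is used here as one way to find either delimiter in a single call.

```sql
SELECT n.LastName,
       p.pos,
       RIGHT(n.LastName, p.pos)                          AS WrongRight,   -- takes pos chars from the right
       RIGHT(n.LastName, LEN(n.LastName) - p.pos)        AS FixedRight,   -- takes what follows the delimiter
       SUBSTRING(n.LastName, p.pos + 1, LEN(n.LastName)) AS ViaSubstring -- same result via SUBSTRING
FROM (VALUES ('PEREZ-MARTINEZ'), ('PICENO ESPINOZA'),
             ('SALMERON TERRIQUEZ'), ('LOPEZ')) AS n(LastName)            -- sample rows
CROSS APPLY (
    -- Position of the first hyphen or space; 0 when neither is present.
    SELECT PATINDEX('%[- ]%', n.LastName) AS pos
) AS p;
```

'SALMERON TERRIQUEZ' is 18 characters long with the space at position 9, so 18 - 9 happens to equal 9 and the wrong expression returns the right answer by coincidence. For 'PEREZ-MARTINEZ' (length 14, hyphen at 6) the wrong expression returns 'RTINEZ' and the fixed ones return 'MARTINEZ'. A name without a delimiter, such as 'LOPEZ', comes back whole from both fixed expressions.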
I think you can simplify your code a lot. Consider the query below, which uses different but representative sample data:
with data (name) as
(select 'first-last' union select 'first last' union select 'firstlast'),
data_prepped (name, indx) as
-- position of the delimiter, or len(name) + 1 when there is none;
-- assumes a name contains at most one of the two delimiters
(select name, coalesce(nullif(charindex(' ', name) + charindex('-', name), 0), len(name) + 1)
from data)
select name,
left(name, indx - 1) as part1,
substring(name, indx + 1, len(name)) as part2 -- everything after the delimiter
from data_prepped

Highlighting rows with trailing and leading space/s

I want to highlight rows with trailing and leading spaces. I have the query below, but I'd like to know if there is a better, more efficient way to achieve this.
SELECT *
FROM DummyTable lc
WHERE
(lc.Code LIKE '% ' OR lc.Code LIKE ' %' or lc.Code like '% % %' OR lc.Code like '% % %')
AND (lc.StartDate <= getdate() AND lc.EndDate > getdate())
AND (lc.CodeTypeID <> 27)
ORDER BY 4 DESC
Please note that I don't want to remove space from the field "Code" but just highlight in my result set.
There is not a more efficient method, but you can reduce the number of comparisons:
WHERE CONCAT(' ', lc.Code, ' ') LIKE '%  %' AND -- the pattern contains two consecutive spaces
lc.StartDate <= getdate() AND
lc.EndDate > getdate() AND
lc.CodeTypeID <> 27
This adds a space to the beginning and end and then looks for two spaces in a row (which seems to be your intention despite how the question is phrased).
Unfortunately, there is little you can do to improve performance because all the comparisons are inequalities.
You could just use string functions LEFT() and RIGHT() to check if the string starts or ends with a space, like:
SELECT *
FROM DummyTable
WHERE
(LEFT(Code, 1) = ' ' OR RIGHT(Code, 1) = ' ')
AND StartDate <= getdate()
AND EndDate > getdate()
AND CodeTypeID <> 27
ORDER BY 4 desc
As commented by Martin Smith, (LEFT(Code, 1) = ' ' OR RIGHT(Code, 1) = ' ') can be simplified as ' ' IN (LEFT(Code, 1), RIGHT(Code, 1)).
NB: a few simplifications to your query:
- you don't need to prefix the columns with the table name, since only one table is involved in the query
- you don't need to surround individual conditions with parentheses (just make sure to surround the ORed conditions with parentheses to separate them from the ANDed conditions)

trim the column value string

In a SQL query, I need to select the values shown below from my column.
The result has to be the text after the first space ' ' and before the first '('.
Source Column
create Table Test_Table (Column1 Varchar(50))
Insert into Test_Table Values
('0636 KAVITHI (LOC)'),
('0638 SRI KRISHNA (NAT)'),
('0639 SELVAM'),
('0643 GOOD SERVICE (LOC)'),
('0644 FINA CARE EVENT (LOC)')
I need to get the string found between the first ' ' and the '('
Expected Result
KAVITHI
SRI KRISHNA
SELVAM
GOOD SERVICE
FINA CARE EVENT
Another approach without using an OUTER APPLY.
SELECT CASE WHEN Column1 LIKE '%(%'
THEN SUBSTRING(RIGHT(Column1,LEN(Column1)-CHARINDEX(' ',Column1)),0,
CHARINDEX('(',RIGHT(Column1,LEN(Column1)-CHARINDEX(' ',Column1)),0))
ELSE RIGHT(Column1,LEN(Column1)-CHARINDEX(' ',Column1))
END AS Trimmed
FROM Test_Table
OUTPUT
Trimmed
KAVITHI
SRI KRISHNA
SELVAM
GOOD SERVICE
FINA CARE EVENT
SQL Fiddle: http://sqlfiddle.com/#!3/69dd1/20/0
CHARINDEX() can be used to find the position of specific characters.
OUTER APPLY can be used to find the position of the space and brace characters, and store them in a place that you can re-use them.
SUBSTRING() can be used to find the text between the space and the brace.
EDIT: Added CASE to cope with values that contain no (.
SELECT
SUBSTRING(
test_table.column1, -- the field we're searching
stats.idx_space + 1, -- starting from the character after the first space
CASE
WHEN stats.idx_brace > stats.idx_space
THEN stats.idx_brace - 1 -- stop one character before the brace so that '(' is not included
ELSE stats.idx_eos
END
-
stats.idx_space -- for as many characters as there are between the space and the brace
)
FROM
test_table
OUTER APPLY
(
SELECT
CHARINDEX(' ', test_table.column1) AS idx_space, -- position of the first space
CHARINDEX('(', test_table.column1) AS idx_brace, -- position of the first brace
LEN(test_table.column1) AS idx_eos -- position of the end-of-string
)
AS stats
EDIT: A single "line", as requested.
Do note that forcing this as a single line does make this harder to read, maintain and adapt. One of APPLY's strongest use-cases is to maintain DRY (Don't Repeat Yourself) principles.
This query repeats several parts several times:
- find the first space repeated 3 times
- find the first brace repeated 2 times
SELECT
SUBSTRING(
test_table.column1,
CHARINDEX(' ', test_table.column1) + 1,
CASE
WHEN CHARINDEX('(', test_table.column1) > CHARINDEX(' ', test_table.column1)
THEN CHARINDEX('(', test_table.column1) - 1 -- stop just before the brace
ELSE LEN(test_table.column1)
END
-
CHARINDEX(' ', test_table.column1) -- subtract the position of the space, as in the APPLY version
)
FROM
test_table
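A shorter variant is possible by appending a sentinel '(' before searching, so that rows without a brace need no CASE at all. This is only a sketch: the alias names are made up, only two of the sample rows are inlined, and it assumes that any '(' appears after the first space.

```sql
SELECT t.Column1,
       -- Take the text between the first space and the first brace (or end of string),
       -- then drop the space that precedes the brace.
       RTRIM(SUBSTRING(t.Column1, s.idx_space + 1, s.idx_end - s.idx_space - 1)) AS Trimmed
FROM (VALUES ('0636 KAVITHI (LOC)'), ('0639 SELVAM')) AS t(Column1)  -- sample rows
CROSS APPLY (
    SELECT CHARINDEX(' ', t.Column1)       AS idx_space,  -- position of the first space
           CHARINDEX('(', t.Column1 + '(') AS idx_end     -- first brace, or one past the end
) AS s;
```

For '0636 KAVITHI (LOC)' the positions are 5 and 14, giving 'KAVITHI'; for '0639 SELVAM' they are 5 and 12, giving 'SELVAM'.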

SQL Server : find percentage match of LIKE string

I'm trying to write a query to find the percentage match of a search string in a notes or TEXT column.
This is what I'm starting with:
SELECT *
FROM NOTES
WHERE UPPER(NARRATIVE) LIKE 'PAID CALLED RECEIVED'
Ultimately, what I want to do is:
Split the search string by spaces and search individually for all words in the string
Order the results descending based on percentage match
For example, in the above scenario, each word in the search string would constitute 33.333% of the total. A NARRATIVE with 3 matches (100%) should be at the top of the results, while a match containing 2 of the keywords (66.666%) would be lower, and a match containing 1 of the keywords (33.333%) would be even lower.
I then want to display the resulting percentage match for that row in a column, along with all the other columns from that table (*).
Hopefully, this makes sense and can be done. Any thoughts on how to proceed? This MUST all be done in SQL Server, and I would prefer not to write any CTEs.
Thank you in advance for any guidance.
Here is what I came up with:
DECLARE @VISIT VARCHAR(25) = '999232'
DECLARE @KEYWORD VARCHAR(100) = 'PAID,CALLED,RECEIVED'
DECLARE SPLIT_CURSOR CURSOR FOR
SELECT RTRIM(LTRIM(VALUE)) FROM Rpt_Split(@KEYWORD, ',')
IF OBJECT_ID('tempdb..#NOTES_FF_SEARCH') IS NOT NULL DROP TABLE #NOTES_FF_SEARCH
SELECT N.VISIT_NO
,N.CREATE_DATE
,N.CREATE_BY
,N.NARRATIVE
,0E8 AS PERCENTAGE
INTO #NOTES_FF_SEARCH
FROM NOTES_FF AS N
WHERE N.VISIT_NO = @VISIT
DECLARE @KEYWORD_VALUE AS VARCHAR(255)
OPEN SPLIT_CURSOR
FETCH NEXT FROM SPLIT_CURSOR INTO @KEYWORD_VALUE
WHILE @@FETCH_STATUS = 0
BEGIN
-- 100.0 avoids integer division (100 / 3 would add 33 instead of 33.33)
UPDATE #NOTES_FF_SEARCH
SET PERCENTAGE = PERCENTAGE + ( 100.0 / @@CURSOR_ROWS )
WHERE UPPER(NARRATIVE) LIKE '%' + UPPER(@KEYWORD_VALUE) + '%'
FETCH NEXT FROM SPLIT_CURSOR INTO @KEYWORD_VALUE
END
CLOSE SPLIT_CURSOR
DEALLOCATE SPLIT_CURSOR
SELECT * FROM #NOTES_FF_SEARCH
WHERE PERCENTAGE > 0
ORDER BY PERCENTAGE DESC, CREATE_DATE DESC -- best matches first
There may be a more efficient way to do this but every other road I started down ended in a dead-end. Thanks for your help
If you want to do a "percentage" match, you need to do two things: calculate the number of words in the string and calculate the number of words you care about. Before giving some guidance, I will say that full text search probably does everything you want and much more efficiently.
Assuming the search string has space delimited words, you can count the words with the expression:
(len(narrative) - len(replace(narrative, ' ', '')) + 1) as NumWords
You can count the matching words with successive replaces. So, for keywords, it would be something like removing each keyword, fixing the spaces, and counting the words that remain.
The overall code is best represented with subqueries. The resulting query is something like:
select n.*
from (select n.*,
(len(narrative) - len(replace(narrative, ' ', '')) + 1.0) as NumWords,
ltrim(rtrim(replace(replace(replace(narrative + ' ', @keyword1 + ' ', ''),
@keyword2 + ' ', ''),
@keyword3 + ' ', ''))) as NoKeywords
from notes n
) n
order by 1 - (len(NoKeywords) - len(replace(NoKeywords, ' ', '')) + 1.0) / NumWords desc;
SQL Server -- as with many databases -- is not particularly good at parsing strings. You can do that outside the query and assign the @keyword variables accordingly.
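For what it's worth, on SQL Server 2016 or later (database compatibility level 130 or higher) the built-in STRING_SPLIT function allows a set-based version without a cursor or a CTE. The following is only a sketch under that assumption: the sample narratives and the variable names are made up for illustration, and like the other LIKE-based approaches it matches substrings, so 'PAID' would also match 'UNPAID'.

```sql
DECLARE @Keywords varchar(100) = 'PAID,CALLED,RECEIVED';
-- Total number of keywords, used as the denominator of the percentage.
DECLARE @Total int = (SELECT COUNT(*) FROM STRING_SPLIT(@Keywords, ','));

SELECT n.Narrative,
       100.0 * COUNT(k.value) / @Total AS PercentMatch
FROM (VALUES ('Patient called and paid; payment received'),
             ('Called twice, no answer'),
             ('No contact')) AS n(Narrative)   -- hypothetical rows standing in for NOTES
JOIN STRING_SPLIT(@Keywords, ',') AS k         -- one joined row per matching keyword
  ON UPPER(n.Narrative) LIKE '%' + UPPER(LTRIM(RTRIM(k.value))) + '%'
GROUP BY n.Narrative
ORDER BY PercentMatch DESC;
```

With these rows the first narrative matches all three keywords (100 percent), the second matches one (about 33.3 percent), and the third is dropped by the inner join because it matches none. Against a real table you would group by the key column rather than the narrative, especially if the narrative is a TEXT column, which cannot be used in GROUP BY.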