How to divide the text into two columns by matching text using SQL select query - sql

I have table with a column Note / Reason - like this:
Note / Reason
test
test2
test REASON:ANOTHER PROVIDER
this is a test REASON:NO FITTER
I want to split this text into two columns, Note and Reason: the part that starts with REASON: goes into Reason, and everything before it goes into Note, like this:
Note                Reason
---------------------------------------------------
test
test2
test                REASON:ANOTHER PROVIDER
this is a test      REASON:NO FITTER

Using Sean's sample data, as I mentioned in the comments, use CHARINDEX, LEFT and STUFF:
SELECT LEFT(NoteReason, CHARINDEX('REASON:', NoteReason + 'REASON:') - 1) AS Note,
       STUFF(NoteReason, 1, CHARINDEX('REASON:', NoteReason) - 1, '') AS Reason
FROM @NoteReason;
Considering you have extra white space, you may also want to wrap the expressions in a TRIM.
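As a rough sketch of that last suggestion, the same expressions could be wrapped like this. It assumes a table variable named @NoteReason holding two of the sample rows, and it uses LTRIM(RTRIM(...)) rather than TRIM so that it also runs on versions before SQL Server 2017. For rows without a REASON: marker, STUFF receives a negative length and, per its documentation, returns NULL, so Reason should come back as NULL for those rows.

```sql
-- Sample data: a table variable standing in for the real table (assumption).
DECLARE @NoteReason table (NoteReason varchar(100));
INSERT INTO @NoteReason VALUES ('test'), ('test REASON:ANOTHER PROVIDER');

-- Same LEFT/STUFF split as above, with leading and trailing spaces removed.
SELECT LTRIM(RTRIM(LEFT(NoteReason, CHARINDEX('REASON:', NoteReason + 'REASON:') - 1))) AS Note,
       LTRIM(RTRIM(STUFF(NoteReason, 1, CHARINDEX('REASON:', NoteReason) - 1, ''))) AS Reason
FROM @NoteReason;
```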

It would be better if you provided sample data in a consumable format. That way it is easy for others to use and it is also more precise so others aren't guessing or making assumptions about your tables and sample data. Given the sparse information in your question something like this should be somewhat close.
declare @NoteReason table (NoteReason varchar(100))
insert @NoteReason values
('test')
, ('test2')
, ('test REASON:ANOTHER PROVIDER')
, ('this is a test REASON:NO FITTER')
select Note = case when charindex('REASON', n.NoteReason) = 0 then n.NoteReason
    else left(n.NoteReason, charindex('REASON', n.NoteReason) - 1)
    end
, Reason = case when charindex('REASON', n.NoteReason) > 0 then substring(n.NoteReason, charindex('REASON', n.NoteReason), len(n.NoteReason)) else '' end
from @NoteReason n

Related

Remove hyphen from end of string in SQL Server

I'd like to remove hyphen from string values using SQL.
I have a column that contains data like:
middle-to high-income parents
Assets -
business – 1 year in their Certified program
explain your assets
10-12-15 years -
and this is what I need from those string:
middle-to high-income parents
Assets
business – 1 year in their Certified program
explain your assets
10-12-15 years
I tried
rtrim(ltrim(replace(bal_sheet_item, '-', ' ')))
but it removed all hyphens not just the ones at the end of the string.
You can accomplish this with a little gymnastics, however why not just have your presentation layer remove any trailing dashes?
SELECT bal_sheet_item = LEFT(bal_sheet_item, LEN(bal_sheet_item)
- CASE WHEN RIGHT(bal_sheet_item,1) = '-' THEN 1 ELSE 0 END)
FROM
(
SELECT bal_sheet_item = RTRIM(bal_sheet_item),
lenA = LEN(bal_sheet_item)
FROM dbo.YourTable
) AS x;
In SQL Server 2017 and later you can just say (note that this strips hyphens and spaces from both ends of the string):
SELECT bal_sheet_item = TRIM('- ' FROM bal_sheet_item)
FROM dbo.YourTable;
Working example in this fiddle.
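One thing to keep in mind about the TRIM form: it removes every character in the list from both ends of the string, not just a single trailing hyphen. The sketch below uses made-up literal values rather than the real table and puts the two approaches side by side. For '- Assets -' the TRIM column drops the leading hyphen as well, while the LEFT/RIGHT column keeps it. SQL Server 2022 also documents a TRAILING keyword for TRIM (at compatibility level 160), which may be worth checking if only the end of the string should be touched.

```sql
-- Hypothetical sample values; TRIM with a character list needs SQL Server 2017+.
SELECT v.bal_sheet_item,
       -- Strips any run of '-' and ' ' from BOTH ends.
       TrimBothEnds = TRIM('- ' FROM v.bal_sheet_item),
       -- Strips trailing spaces, then at most one trailing '-'.
       TrailingOnly = LEFT(x.r, LEN(x.r) - CASE WHEN RIGHT(x.r, 1) = '-' THEN 1 ELSE 0 END)
FROM (VALUES ('Assets -'), ('- Assets -'), ('10-12-15 years - ')) AS v(bal_sheet_item)
CROSS APPLY (VALUES (RTRIM(v.bal_sheet_item))) AS x(r);
```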
Seems like you were on the right track with a simple replace()... just make it unique
Example
Select *
,NewValue = replace(replace(rtrim(bal_sheet_item)+'>~<','->~<',''),'>~<','')
From YourTable
@JavaLifeLove - It would be helpful if, when you posted data examples, you posted them as readily consumable data, like the following...
--===== If the temp table exists, drop it to make reruns easier in SSMS.
DROP TABLE IF EXISTS #TestTable;
GO
--===== Create and populate the temp table on-the-fly for such simple data.
-- Do a separate CREATE TABLE and an INSERT/VALUES for something more complex.
SELECT v.*
INTO #TestTable
FROM (VALUES
('middle-to high-income parents')
,('Assets -')
,('business – 1 year in their Certified program')
,('explain your assets')
,(' 10-12-15 years - ')
)v(bal_sheet_item)
;
GO
That'll not only help explain things but it will help the people that are trying to help you because a lot of them like to actually test their code before they publish a possible answer to your question. It'll entice people to help more quickly, as well. :D
I'd also check where the data is coming from. People don't just casually end something with a dash. It can be a pretty strong indication that something got dropped along the way, and I'd let someone know about that possibility before writing any code that might perpetuate the bad data.
If they say "Just do it", then I'd turn a solution into a bit of repeatable code that others can use because you're probably not the only one that will need to deal with such garbage data.
With that being said, here's a high performance iTVF (inline Table Valued Function) using one of many methods to solve this problem. The usage example is in the flower box of the function.
CREATE FUNCTION dbo.DropTrailingCharacter
/**********************************************************************************************************************
Purpose:
Remove the last given @pCharacter from the given @pString even if @pCharacter is followed by trailing spaces.
The final result is trimmed for both leading and trailing spaces.
Works for 2012+.
-----------------------------------------------------------------------------------------------------------------------
WARNING:
Modifying original data may not be the correct thing to do. For example, people don't arbitrarily add dashes to the
end of data. It may be an indication that a part of the original data may be missing. Check with the people that are
providing the data to ensure that's not what's happening.
-----------------------------------------------------------------------------------------------------------------------
Usage Example:
--===== Test table. This is not a part of the solution. It's just an example for usage.
DROP TABLE IF EXISTS #TestTable
;
SELECT v.*
INTO #TestTable
FROM (VALUES
('middle-to high-income parents')
,('Assets -')
,('business – 1 year in their Certified program')
,('explain your assets')
,(' 10-12-15 years - ')
)v(bal_sheet_item)
;
--===== Remove single trailing dash even if followed by spaces.
SELECT tt.bal_sheet_item
,bal_sheet_item_cleaned = ca.Cleaned
FROM #TestTable tt
CROSS APPLY dbo.DropTrailingCharacter(tt.bal_sheet_item,'-')ca
;
-----------------------------------------------------------------------------------------------------------------------
Revision History:
Ref 00 - 30 Oct 2022 - Jeff Moden
- Initial creation and unit test.
- Ref: https://stackoverflow.com/questions/74255805/remove-hyphen-from-end-of-string-in-sql-server/74256248
**********************************************************************************************************************/
(@pString VARCHAR(8000), @pCharacter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
SELECT Cleaned = LTRIM(RTRIM(
IIF(RIGHT(ca.Trimmed,1) = @pCharacter
,LEFT(ca.Trimmed,LEN(ca.Trimmed)-1) --Character found and removed from end of string.
,ca.Trimmed))) --Character not found. Keep everything.
FROM (VALUES(RTRIM(@pString)))ca(Trimmed) --DRYs out the code above.
;
GO
You can do this with a LIKE test combined with SUBSTRING.
The following table-creation script is what I used to verify the scenario.
CREATE TABLE dbo.hyphentest(bal_sheet_item varchar(255));
INSERT dbo.hyphentest(bal_sheet_item) VALUES
('middle-to high-income parents'),
('Assets -'),
('business – 1 year in their Certified program'),
('explain your assets'),
('10-12-15 years - ');
To remove the trailing hyphen, first TRIM the column so that any space after the hyphen is gone. Then use LIKE with a leading % (the string can start with anything) to keep only the values that end in '-', and take a SUBSTRING of length LEN - 1.
SELECT
substring(TRIM(bal_sheet_item), 1, (len(TRIM(bal_sheet_item)) - 1))
FROM dbo.hyphentest WHERE TRIM(bal_sheet_item) LIKE '%-'
If you want all rows back, not just the ones that end with a hyphen, use the following query with CASE:
SELECT CASE WHEN TRIM(bal_sheet_item) LIKE '%-' THEN substring(TRIM(bal_sheet_item), 1, (len(TRIM(bal_sheet_item)) - 1))
ELSE bal_sheet_item END AS bal_sheet_item FROM dbo.hyphentest

How can I find only one distinct digit in a cell in SQL

I have customer data with mobile phone numbers where '1' has been entered 10 times or more in a cell to bypass the customer onboarding system validation. For example '1111111111'
I used below condition in my where clause but that didn't really help.
AND p.mobile_no LIKE '%[1111111111]%'
It is possible that users might enter 1 multiple number of times in the new customer form to bypass validation. To find only 0 values in the cell I used %[^0]% in the WHERE clause and I was hoping to use something similar to find 1s where regardless of how many times it has been entered in the field, as long as it only has 1 in it it will skim out the data for me.
How can I find these instances in my data using a SQL query?
The goal is to find these anomalies and remove them.
Using: Microsoft SQL Server 2016 (SP2).
I think you are looking for the following, which tests if at least 1 '1' exists, and that no other characters exist.
select Number
from (values ('111'),('121'),('1-2'),('22')) x (Number)
-- Test that at least 1 '1' exists
where Number like '%1%'
-- And that no other allowable characters exist - expand to cover all options
and Number not like '%[0,2-9,-]%'
Using a table to define invalid phone numbers:
Declare @invalidPhoneNumbers Table (PhoneNumber char(10));
Insert Into @invalidPhoneNumbers (PhoneNumber)
Values ('0000000000'), ('1111111111'), ('2222222222'), ('3333333333'), ('4444444444')
, ('5555555555'), ('6666666666'), ('7777777777'), ('8888888888'), ('9999999999');
Select ...
From ...
Where ...
And p.mobile_no Not In (Select i.PhoneNumber From @invalidPhoneNumbers i)
Or - using NOT EXISTS which may perform better:
Declare @invalidPhoneNumbers Table (PhoneNumber char(10));
Insert Into @invalidPhoneNumbers (PhoneNumber)
Values ('0000000000'), ('1111111111'), ('2222222222'), ('3333333333'), ('4444444444')
, ('5555555555'), ('6666666666'), ('7777777777'), ('8888888888'), ('9999999999');
Select ...
From ...
Where ...
And Not Exists (Select * From @invalidPhoneNumbers i Where i.PhoneNumber = p.mobile_no)
When declaring the table - make sure the data type defined matches exactly the defined data type of p.mobile_no. This will make sure there are no implicit conversions that can cause issues.
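If the goal is to catch any value made of one repeated digit without listing each one, a different technique (not the one used in the answers above) is to remove every occurrence of the first character and see whether anything is left. This is only a sketch on inline sample values; the column name mobile_no comes from the question, while the minimum length of 10 is an assumption.

```sql
-- Inline sample values stand in for the real customer table (assumption).
SELECT v.mobile_no
FROM (VALUES ('1111111111'), ('0000000000'), ('1211111111'), ('5551234567')) AS v(mobile_no)
WHERE LEN(v.mobile_no) >= 10                              -- assumed minimum length
  AND v.mobile_no NOT LIKE '%[^0-9]%'                     -- digits only
  AND REPLACE(v.mobile_no, LEFT(v.mobile_no, 1), '') = ''; -- nothing left once the first digit is removed
```

Here the rows returned are the suspicious ones ('1111111111' and '0000000000' in this sample), which suits the stated goal of finding and removing them.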

Query to ignore rows which have non hex values within field

Initial situation
I have a relatively large table (about 700,000 records) where an nvarchar field "MediaID" mostly contains media IDs in proper hexadecimal notation (as it should).
Within my "sequential" query (each query depends on the output of the query before, this is all in pure T-SQL) I have to convert these hexadecimal values into decimal bigint values in order to do further calculations and filtering on these calculated values for the subsequent queries.
--> So far, no problem. The "sequential" query works fine.
Problem
Unfortunately, some of these media IDs contain non-hex characters, most probably because of typing errors by the people who added them or import errors from the previous business system.
Because of these non-hex chars, the whole query fails (of course) because the conversion hits an error.
For my current purpose, such rows must be skipped/ignored, as they are clearly wrong and cannot be used (the current business system has no media / data carriers whose IDs can contain non-hex characters).
Manual editing of the data is not an option as there are too many errors and it is not clear with what the data must be replaced.
Challenge
To create a query which only returns records which have valid hex values within the media ID field.
(Unfortunately, my SQL skills are not enough to create the above query. Your help is highly appreciated.)
The relevant section of the larger query looks like this (xxxx is where your help comes in :-))
select
pureMediaID
, mediaID
, CUSTOMERID
,CONTRACT_CUSTOMERID
from
(
select concat('0x', Replace(Ltrim(Replace(mediaID, '0', ' ')), ' ', '0')) AS pureMediaID
--, CUSTOMERID
, *
from M_T_CONTRACT_CUSTOMERS
where mediaID is not null
and mediaID like '0%'
and xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
) as inner1
EDIT: As per request I have added here some good and some bad data:
Good:
4335463357
4335459809
1426427996
4335463509
4335515039
4335465134
4427370396
4335415661
4427369036
4335419089
004BB03433
004e7cf9c6
00BD23133
00EE13D8C1
00CCB5522C
00C46522C
00dbbe3433
Bad:
4564589+
AB6B8BFC.8
7B498DFCnm
DB218DFChb
d<tgfh8CFC
CB9E8AFCzj
B458DFCjhl
rytzju8DFC
BFCtdsjshj
DB9888FCgf
9BC08CFCyx
EB198DFCzj
4B628CFChj
7B2B8DFCgg
After I upgraded the compatibility level of the SQL instance to SQL 2016 (it was below 2012 before), I could use try_convert with the same syntax as the original convert function, as donPablo pointed out. With that, the query runs all the way through and every MediaID that is not a valid hex value is converted to NULL, which is really, really nice.
Exactly what I needed.
Unfortunately, the solution of ALICE... didn't work out for me, as it (strangely) also returned records that had the "+" character in them.
Edit: The added comment of Alice... where you create a calculated field like this:
CASE WHEN "KEY" LIKE '%[^0-9A-F]%' THEN 0 ELSE 1 end as xyz
and then filter in the next query like this:
where xyz = 1
also works with SQL instances with compatibility level < SQL 2012.
Great addition for people who still have to work with older SQL instances.
An option (although not ideal in terms of performance) is to check the characters in the MediaID with a CASE expression and a LIKE pattern.
Hexadecimal values cannot contain characters other than A-F and the digits 0 to 9.
CASE WHEN MediaID LIKE '%[0-9A-F]%' THEN 1 ELSE 0 END
I would recommend writing a function that first checks whether MediaID is hexadecimal, and then running the conversion query.
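Note that LIKE '%[0-9A-F]%' is true as soon as the value contains at least one hex character, which would explain why rows such as 4564589+ still came through. The rule stated above needs the negated class instead: reject any value that contains a character outside 0-9 and A-F. A minimal sketch on a few values from the question follows; the zero-padding to 16 characters and the style 2 conversion are my own assumptions about how the conversion to bigint could be done, not the original query.

```sql
-- Inline values copied from the question's good and bad samples.
SELECT v.mediaID,
       -- 1 only when no character falls outside the hex alphabet.
       IsHex    = CASE WHEN v.mediaID LIKE '%[^0-9A-Fa-f]%' THEN 0 ELSE 1 END,
       -- Left-pad to 16 hex digits (8 bytes) so the digit count is even,
       -- then convert; style 2 reads a hex string without the 0x prefix.
       -- TRY_CONVERT yields NULL instead of an error for invalid input.
       AsBigint = CONVERT(bigint,
                    TRY_CONVERT(varbinary(8), RIGHT(REPLICATE('0', 16) + v.mediaID, 16), 2))
FROM (VALUES ('004BB03433'), ('00BD23133'), ('004e7cf9c6'),
             ('4564589+'), ('AB6B8BFC.8'), ('7B498DFCnm')) AS v(mediaID);
```

Rows with IsHex = 0 should come back with AsBigint NULL, so either column can serve as the filter in the next query.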

Can 2 character length variables cause SQL injection vulnerability?

I am taking a text input from the user, then converting it into 2 character length strings (2-Grams)
For example
RX480 becomes
"rx","x4","48","80"
Now, if I query the server directly like below, can they somehow perform SQL injection?
select *
from myTable
where myVariable in ('rx', 'x4', '48', '80')
SQL injection is not a matter of length of anything.
It happens when someone adds code to your existing query. They do this by sending in the malicious extra code as a form submission (or something). When your SQL code executes, it doesn't realize that there is more than one thing to do. It just executes what it's told.
You could start with a simple query like:
select *
from thisTable
where something=$something
So you could end up with a query that looks like:
select *
from thisTable
where something=; DROP TABLE employees;
This is an odd example. But it does more or less show why it's dangerous. The first query will fail, but who cares? The second one will actually work. And if you have a table named "employees", well, you don't anymore.
Two characters in this case are sufficient to cause an error in the query and possibly reveal some information about it. For example, try the string ')480 and watch how your application behaves.
Although not much of an answer, this really doesn't fit in a comment.
Your code scans a table checking to see if a column value matches any pair of consecutive characters from a user supplied string. Expressed in another way:
declare @SearchString as VarChar(10) = 'Voot';
select Buffer, case
    when DataLength( Buffer ) != 2 then 0 -- NB: Len() right trims.
    when PatIndex( '%' + Buffer + '%', @SearchString ) != 0 then 1
    else 0 end as Match
from ( values
    ( 'vo' ), ( 'go' ), ( 'n ' ), ( 'po' ), ( 'et' ), ( 'ry' ),
    ( 'oo' ) ) as Samples( Buffer );
In this case you could simply pass the value of @SearchString as a parameter and avoid the issue of the IN clause.
Alternatively, the character pairs could be passed as a table parameter and used with IN: where Buffer in ( select CharacterPair from @CharacterPairs ).
As far as SQL injection goes, limiting the text to character pairs does preclude adding complete statements. It does, as others have noted, allow for corrupting the query and causing it to fail. That, in my mind, constitutes a problem.
I'm still trying to imagine a use-case for this rather odd pattern matching. It won't match a column value longer (or shorter) than two characters against a search string.
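A rough sketch of the first suggestion, passing the whole search string as one parameter and deriving the character pairs inside the query, might look like this. The recursive CTE name Positions and the inline sample rows are made up for illustration; myVariable is the column name from the question.

```sql
DECLARE @SearchString varchar(50) = 'rx480';  -- in practice, a query parameter

WITH Positions AS (
    -- One row per starting position of a 2-character pair.
    SELECT 1 AS n WHERE LEN(@SearchString) >= 2
    UNION ALL
    SELECT n + 1 FROM Positions WHERE n + 1 < LEN(@SearchString)
)
SELECT t.myVariable
FROM (VALUES ('rx'), ('x4'), ('zz')) AS t(myVariable)   -- stand-in for myTable
WHERE t.myVariable IN (SELECT SUBSTRING(@SearchString, p.n, 2) FROM Positions AS p);
```

Because the user's text only ever travels as the value of @SearchString, it is never concatenated into the statement text.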
There definitely should be a canonical answer to all these innumerable "if I have [some special kind of data treatment], will my query still be vulnerable?" questions.
First of all, you should ask yourself: why are you looking to buy yourself such an indulgence? What is the reason? Why do you want to add an exception to your data processing? Why separate your data into the sheep and the goats, telling yourself "this data is 'safe', so I won't process it properly, and that data is unsafe, so I'll have to do something"?
The only reason such a question could even appear is your application architecture. Or, rather, the lack of architecture. Only in spaghetti code, where user input is added directly to the query, can such a question ever occur. Otherwise, your database layer should be able to process any kind of data, being totally ignorant of its nature, origin or alleged "safety".

SQL query - LEFT 1 = char, RIGHT 3-5 = numbers in Name

I need to filter out junk data in a SQL Server 2008 table. I need to identify these records, and pull them out.
Char[0] = A..Z, a..z
Char[1] = 0..9
Char[2] = 0..9
Char[3] = 0..9
Char[4] = 0..9
{No blanks allowed}
Basically, a clean record will look like this:
T1234, U2468, K123, P50054 (4 record examples)
Junk data looks like this:
T12.., .T12, MARK, TP1, SP2, BFGL, BFPL (7 record examples)
Can someone please assist with a SQL query to do a LEFT and RIGHT method and extract those characters, and do a LIKE IN or something?
A function would be great though!
The following should work in a few different systems:
SELECT *
FROM TheTable
WHERE Data LIKE '[A-Za-z][0-9][0-9][0-9][0-9]%'
AND Data NOT LIKE '% %'
This approach will indeed match P2343, P23423JUNK, and other similar text but requires that the format is A0000*.
Now, if the OP implies a format of 1st position is a character and all succeeding positions are numeric, as in A0+, then use the following (in SQL Server and a good deal of other database systems):
SELECT *
FROM TheTable
WHERE SUBSTRING(Data, 1, 1) LIKE '[A-Za-z]'
AND SUBSTRING(Data, 2, LEN(Data) - 1) NOT LIKE '%[^0-9]%'
AND LEN(Data) >= 5
To incorporate this into a SQL Server 2008 function, since this appears to be what you'd like most, you can write:
CREATE FUNCTION ufn_IsProperFormat(@data VARCHAR(50))
RETURNS BIT
AS
BEGIN
    RETURN
    CASE
        WHEN SUBSTRING(@data, 1, 1) LIKE '[A-Za-z]'
         AND SUBSTRING(@data, 2, LEN(@data) - 1) NOT LIKE '%[^0-9]%'
         AND LEN(@data) >= 5 THEN 1
        ELSE 0
    END
END
...and call into it like so:
SELECT *
FROM TheTable
WHERE dbo.ufn_IsProperFormat(Data) = 1
...this query needs to change for Oracle queries because Oracle doesn't appear to support bracket notation in LIKE clauses:
SELECT *
FROM TheTable
WHERE REGEXP_LIKE(Data, '^[A-Za-z]\d{4,}$')
This is the expansion gbn is doing in his answer, but these versions allow for varying string lengths without the OR conditions.
EDIT: Updated to support examples in SQL Server and Oracle for ensuring the format A0+, so that A1324, A2342388, and P2342 match but A2342JUNK and A234 do not.
The Oracle REGEXP_LIKE code was borrowed from Mark's post but updated to support 4 or more numeric digits.
Added a custom SQL Server 2008 approach which implements these techniques.
Depends on your database. Many have regex functions (note examples not tested so check)
e.g. Oracle
SELECT x
FROM table
WHERE REGEXP_LIKE(x, '^[A-Za-z][[:digit:]]{4}$')
Sybase uses LIKE
Given that you're allowing between 3 and 6 digits for the number in your examples then it's probably better to use the ISNUMERIC() function on the 2nd character onwards:
SELECT *
FROM TheTable
-- start with a letter
WHERE Data LIKE '[A-Za-z]%'
-- everything from 2nd character onwards is a number
AND ISNUMERIC( SUBSTRING( Data, 2, 50 ) ) = 1
-- number doesn't have a decimal place
AND Data NOT LIKE '%.%'
For more information look at the ISNUMERIC function on MSDN.
Also note that:
I've limited the 2nd part with the number to 50 characters maximum, change this to suit your needs.
Strictly speaking you should check for currency symbols etc, as ISNUMERIC allows them, as well as +/- and some others
A better option might be to create a function that checks that each character after the first is between 0 and 9 (or 1 and 0 if you're using ASCII codes).
You can't use Regular Expressions in SQL Server, so you have to use OR. Correcting David Andres' answer...
WHERE
(
Data LIKE '[A-Za-z][0-9][0-9][0-9]'
OR
Data LIKE '[A-Za-z][0-9][0-9][0-9][0-9]'
OR
Data LIKE '[A-Za-z][0-9][0-9][0-9][0-9][0-9]'
)
David's answer allows "D1234junk" through
You also only need "[A-Z]" if you don't have case sensitivity
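For what it's worth, the same one-letter-plus-digits rule can be written without the OR list by checking the prefix, the total length, and the absence of non-digits separately. This sketch assumes 3 to 5 digits are allowed (as in the three patterns above) and runs against inline sample values taken from the question.

```sql
-- Inline sample values from the question, plus the 'D1234junk' case.
SELECT v.Data
FROM (VALUES ('T1234'), ('K123'), ('P50054'), ('T12..'), ('.T12'),
             ('MARK'), ('TP1'), ('D1234junk')) AS v(Data)
WHERE v.Data LIKE '[A-Za-z][0-9][0-9][0-9]%'        -- letter, then at least 3 digits
  AND LEN(v.Data) BETWEEN 4 AND 6                   -- 1 letter + 3 to 5 digits (assumed range)
  AND SUBSTRING(v.Data, 2, 5) NOT LIKE '%[^0-9]%';  -- nothing but digits after the letter
```

With these samples only T1234, K123 and P50054 should pass. Changing the range means adjusting the BETWEEN bounds and the SUBSTRING length together.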