I have a column in a SQL Server table that has strings of varying lengths. I need to find the position of the first occurrence of the string , -- that's not enclosed in single quotes or square brackets.
For example, in the following two strings, I've bolded the portion I would like to get the position of. Notice that in the first string, the first time , -- appears on its own (without being between single-quote or square-bracket delimiters) is at position 14, and in the second string it's at position 18.
'a, --'[, --]**, --**[, --]
[a, --b]aaaaaaa_ **, --**', --'
Also I should mention that , -- itself could appear multiple times in the string.
Here's a simple query that shows the strings and my desired output.
SELECT
t.string, t.desired_pos
FROM
(VALUES (N'''a, --''[, --], --[, --]', 14),
(N'[a, --b]aaaaaaa_ , --'', --''', 18)) t(string, desired_pos)
Is there any way to accomplish this using a SELECT query (or multiple) without using a function?
Thank you in advance!
I've tried variations of SUBSTRING, CHARINDEX, and even some CROSS APPLYs but I can't seem to get the result I'm looking for.
Before I write down my solution, I must warn you: DON'T USE IT. Use a function, or do this in some other language. This code is probably buggy.
It doesn't handle things like escaped quotes, etc.
The idea is to first remove the stuff inside brackets [] and quotes '' and then just do a "simple" CHARINDEX.
To remove the delimited parts, I'm using a recursive CTE that loops over every pair of matching delimiters and replaces the delimiters and their content with placeholder characters.
One important point is that quotes and brackets might be embedded in each other, so you have to try both variants and choose the one that starts earliest.
WITH CTE AS (
SELECT *
FROM
(VALUES (N'''a, --''[, --], --[, --]', 14),
(N'[a, --b]aaaaaaa_ , --'', --''', 18)) t(string, desired_pos)
)
, cte2 AS (
select x.start
, x.finish
, case when x.start > 0 THEN STUFF(string, x.start, x.finish - x.start + 1, REPLICATE('a', x.finish - x.start + 1)) ELSE string END AS newString
, 1 as level
, string as orig
, desired_pos
from cte
CROSS APPLY (
SELECT *
, ROW_NUMBER() OVER(ORDER BY case when start > 0 THEN 0 ELSE 1 END, start) AS sortorder
FROM (
SELECT charindex('[', string) AS start
, charindex(']', string) AS finish
UNION ALL
SELECT charindex('''', string) AS startQ
, charindex('''', string, charindex('''', string) + 1) AS finishQ
) x
) x
WHERE x.sortorder = 1
UNION ALL
select x.start
, x.finish
, STUFF(newString, x.start, x.finish - x.start + 1, REPLICATE('a', x.finish - x.start + 1))
, level + 1 as level -- increment so the final SELECT can pick the last iteration
, orig
, desired_pos
from cte2
CROSS APPLY (
SELECT *
, ROW_NUMBER() OVER(ORDER BY case when start > 0 THEN 0 ELSE 1 END, start) AS sortorder
FROM (
SELECT charindex('[', newString) AS start
, charindex(']', newString) AS finish
UNION ALL
SELECT charindex('''', newString) AS startQ
, charindex('''', newString, charindex('''', newString) + 1) AS finishQ
) x
) x
WHERE x.sortorder = 1
AND x.start > 0
AND cte2.start > 0 -- Must have been a match
)
SELECT PATINDEX('%, --%', newString), *
from (
select *, row_number() over(partition by orig order by level desc) AS sort
from cte2
) x
where x.sort = 1
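For cross-checking what any set-based query returns, a plain left-to-right scan is a handy reference. The following is only a sketch under simple assumptions: whichever delimiter opens first hides everything up to its matching closer, there are no escaped quotes, and an unclosed delimiter hides the rest of the string. The variable names (@s, @closer, @pos) are made up for illustration, and because it is procedural T-SQL it does not meet the "SELECT only" requirement from the question.

```sql
DECLARE @s nvarchar(100) = N'''a, --''[, --], --[, --]';  -- first sample string
DECLARE @i int = 1;
DECLARE @pos int = 0;             -- 0 means "not found"
DECLARE @closer nchar(1) = NULL;  -- closing delimiter we are waiting for, if any
DECLARE @c nchar(1);

WHILE @i <= LEN(@s) AND @pos = 0
BEGIN
    SET @c = SUBSTRING(@s, @i, 1);

    IF @closer IS NOT NULL
    BEGIN
        -- Inside a quoted or bracketed region: only look for its end.
        IF @c = @closer SET @closer = NULL;
    END
    ELSE IF @c = N'''' SET @closer = N''''
    ELSE IF @c = N'['  SET @closer = N']'
    ELSE IF SUBSTRING(@s, @i, 4) = N', --' SET @pos = @i;

    SET @i += 1;
END;

SELECT @pos AS pos;  -- 14 for the first sample string
```

Swapping in the second sample string should give 18, which matches the desired_pos values in the question.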
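Try this approach. I'm replacing the delimited strings you don't need with another string of the same length, and then looking for the position of the string of interest. Note that this only handles delimited sections that contain exactly , -- and nothing else.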
SELECT string, desired_pos,
CHARINDEX(', --', REPLACE(REPLACE(string, ''', --''', '******'), '[, --]', '******')
) start_index
FROM (VALUES (N''', --''[, --], --[, --]', 13),
(N'[, --]aaaaaaa_ , --'', --''', 16)) t(string, desired_pos)
I don't know if a C# solution makes sense for you, but this class for CSV is a nice little parser: TextFieldParser
Then you just define the delimiters etc., and assuming the input is escaped consistently, all is good.
I'm late to the game here, but this kind of thing is simple in SQL Server when leveraging NGrams8k. You don't need regex, a CLR, or C#. Furthermore, NGrams8k will be the fastest by far; in 8 years nobody has produced anything remotely as fast. This code will also be faster and far less complex than a recursive CTE solution (which are almost always slow in SQL Server).
;--==== Sample Data
DECLARE @T TABLE (String VARCHAR(100));
INSERT @T
VALUES (N'''a, --''[, --], --[, --]'),
(N'[a, --b]aaaaaaa_ , --'', --''');
;--==== Solution
SELECT
t.String, ng.Position
FROM @T AS t
CROSS APPLY (VALUES(REPLACE(t.String,'[',CHAR(1)))) AS f(S)
CROSS APPLY samd.NGrams8k(f.S,4) AS ng
CROSS APPLY (VALUES(SUBSTRING(f.S,ng.Position-2,7))) AS g(String)
WHERE ng.Token = ', --'
AND g.String NOT LIKE '%''%''%'
AND g.String NOT LIKE '%'+CHAR(1)+'%]%';
Results:
String Position
----------------------------- --------------------
'a, --'[, --], --[, --] 14
[a, --b]aaaaaaa_ , --', --' 18
I have a column with different length strings which has dashes (-) that separates alphanumeric strings.
The string could look like "A1-2-3".
I need to order by first "A1" then "2" then "3"
I want to achieve the following order for the column:
A1
A1-1-1
A1-1-2
A1-1-3
A1-2-1
A1-2-2
A1-2-3
A1-7
A2-1-1
A2-1-2
A2-1-3
A2-2-1
A2-2-2
A2-2-3
A2-10-1
A2-10-2
A2-10-3
A10-1-1
A10-1-2
A10-1-3
A10-2-1
A10-2-2
A10-2-3
I can separate the string with the following code:
declare @string varchar(max) = 'A1-2-3'
declare @first varchar(max) = SUBSTRING(@string,1,charindex('-',@string)-1)
declare @second varchar(max) = substring(@string, charindex('-',@string) + 1, charindex('-',reverse(@string))-1)
declare @third varchar(max) = right(@string,charindex('-',reverse(@string))-1)
select @first, @second, @third
With the above logic I thought that I could use the following:
Note this only regards strings with 2 dashes
select barcode from tabelWithBarcodes
order by
case when len(barcode) - len(replace(barcode,'-','')) = 2 then
len(SUBSTRING(barcode,1,charindex('-',barcode)-1))
end
, case when len(barcode) - len(replace(barcode,'-','')) = 2 then
SUBSTRING(barcode,1,(charindex('-',barcode)-1))
end
, case when len(barcode) - len(replace(barcode,'-','')) = 2 then
len(substring(barcode, charindex('-',barcode) + 1, charindex('-',reverse(barcode))-1))
end
, case when len(barcode) - len(replace(barcode,'-','')) = 2 then
substring(barcode, charindex('-',barcode) + 1, charindex('-',reverse(barcode))-1)
end
, case when len(barcode) - len(replace(barcode,'-','')) = 2 then
len(right(barcode,charindex('-',reverse(barcode))-1))
end
, case when len(barcode) - len(replace(barcode,'-','')) = 2 then
right(barcode,charindex('-',reverse(barcode))-1)
end
But the sorting is not working for the second and third section of the string.
(I haven't added the code for checking if the string has only 1 or no dash in it for simplicity)
Not sure if I'm on the right path here.
Is anybody able to solve this?
This is not pretty, however...
USE Sandbox;
GO
WITH VTE AS(
SELECT V.SomeString
--Randomised order
FROM (VALUES ('A1-1-1'),
('A10-1-3'),
('A10-2-2'),
('A1-1-3'),
('A10-2-1'),
('A2-2-2'),
('A1-2-1'),
('A1-2-2'),
('A2-1-1'),
('A10-1-2'),
('B2-1-2'),
('A1'),
('A2-2-1'),
('A2-10-3'),
('A10-2-3'),
('A2-1-2'),
('B1-4'),
('A2-10-2'),
('A2-2-3'),
('A10-1-1'),
('A1-A1-3'),
('A1-7'),
('A2-10-1'),
('A2-1-3'),
('A1-1-2'),
('A1-2-3')) V(SomeString)),
Splits AS(
SELECT V.SomeString,
DS.Item,
DS.ItemNumber,
CONVERT(int,STUFF((SELECT '' + NG.token
FROM dbo.NGrams8k(DS.item,1) NG
WHERE TRY_CONVERT(int, NG.Token) IS NOT NULL
ORDER BY NG.position
FOR XML PATH('')),1,0,'')) AS NumericPortion
FROM VTE V
CROSS APPLY dbo.DelimitedSplit8K(V.SomeString,'-') DS),
Pivoted AS(
SELECT S.SomeString,
MIN(CASE V.P1 WHEN S.Itemnumber THEN REPLACE(S.Item, S.NumericPortion,'') END) AS P1Alpha,
MIN(CASE V.P1 WHEN S.Itemnumber THEN S.NumericPortion END) AS P1Numeric,
MIN(CASE V.P2 WHEN S.Itemnumber THEN REPLACE(S.Item, S.NumericPortion,'') END) AS P2Alpha,
MIN(CASE V.P2 WHEN S.Itemnumber THEN S.NumericPortion END) AS P2Numeric,
MIN(CASE V.P3 WHEN S.Itemnumber THEN REPLACE(S.Item, S.NumericPortion,'') END) AS P3Alpha,
MIN(CASE V.P3 WHEN S.Itemnumber THEN S.NumericPortion END) AS P3Numeric
FROM Splits S
CROSS APPLY (VALUES(1,2,3)) AS V(P1,P2,P3)
GROUP BY S.SomeString)
SELECT P.SomeString
FROM Pivoted P
ORDER BY P.P1Alpha,
P.P1Numeric,
P.P2Alpha,
P.P2Numeric,
P.P3Alpha,
P.P3Numeric;
This outputs:
A1
A1-1-1
A1-1-2
A1-1-3
A1-2-1
A1-2-2
A1-2-3
A1-7
A1-A1-3
A2-1-1
A2-1-2
A2-1-3
A2-2-1
A2-2-2
A2-2-3
A2-10-1
A2-10-2
A2-10-3
A10-1-1
A10-1-2
A10-1-3
A10-2-1
A10-2-2
A10-2-3
B1-4
B2-1-2
This makes use of 2 user-defined functions: DelimitedSplit8k or DelimitedSplit8k_Lead (I used DelimitedSplit8k as I don't have the other on my sandbox at the moment), and NGrams8k.
I really should explain how this works, but yuck... (edit coming).
OK... (/sigh) What it does. Firstly, we split the data into its relevant parts using DelimitedSplit8k(_Lead). Then, within the SELECT we use FOR XML PATH to get (only) the numerical part of that string (for example, for 'A10' we get '10') and we convert it to a numerical value (an int).
Then we pivot that data out into its respective parts: the alphabetical part and the numerical part of each item. So, for the value 'A10-A1-12' we end up with the row:
'A', 10, 'A', 1, '', 12
Then, now that we've pivoted the data, we sort it by each column individually. And voila.
This will fall over if you have a value like 'A1A' or '1B1', and honestly, I'm not changing it to cater for that. This was messy, and really isn't what the RDBMS should be doing.
Up to 3 dashes can be covered by fiddling with replace & parsename & patindex:
declare @TabelWithBarcodes table (id int primary key identity(1,1), barcode varchar(20) not null, unique (barcode));
insert into @TabelWithBarcodes (barcode) values
('2-2-3'),('A2-2-2'),('A2-2-1'),('A2-10-3'),('A2-10-2'),('A2-10-1'),('A2-1-3'),('A2-1-2'),('A2-1-1'),
('A10-2-3'),('A10-2-2'),('A10-2-10'),('A10-1-3'),('AA10-A111-2'),('A10-1-1'),
('A1-7'),('A1-2-3'),('A1-2-12'),('A1-2-1'),('A1-1-3'),('B1-1-2'),('A1-1-1'),('A1'),('A10-10-1'),('A12-10-1'), ('AB1-2-E1') ;
with cte as
(
select barcode,
replace(BarCode, '-', '.')
+ replicate('.0', 3 - (len(BarCode)-len(replace(BarCode, '-', '')))) as x
from @TabelWithBarcodes
)
)
select *
, substring(parsename(x,4), 1, patindex('%[0-9]%',parsename(x,4))-1)
,cast(substring(parsename(x,4), patindex('%[0-9]%',parsename(x,4)), 10) as int)
,substring(parsename(x,3), 1, patindex('%[0-9]%',parsename(x,3))-1)
,cast(substring(parsename(x,3), patindex('%[0-9]%',parsename(x,3)), 10) as int)
,substring(parsename(x,2), 1, patindex('%[0-9]%',parsename(x,2))-1)
,cast(substring(parsename(x,2), patindex('%[0-9]%',parsename(x,2)), 10) as int)
,substring(parsename(x,1), 1, patindex('%[0-9]%',parsename(x,1))-1)
,cast(substring(parsename(x,1), patindex('%[0-9]%',parsename(x,1)), 10) as int)
from cte
order by
substring(parsename(x,4), 1, patindex('%[0-9]%',parsename(x,4))-1)
,cast(substring(parsename(x,4), patindex('%[0-9]%',parsename(x,4)), 10) as int)
,substring(parsename(x,3), 1, patindex('%[0-9]%',parsename(x,3))-1)
,cast(substring(parsename(x,3), patindex('%[0-9]%',parsename(x,3)), 10) as int)
,substring(parsename(x,2), 1, patindex('%[0-9]%',parsename(x,2))-1)
,cast(substring(parsename(x,2), patindex('%[0-9]%',parsename(x,2)), 10) as int)
,substring(parsename(x,1), 1, patindex('%[0-9]%',parsename(x,1))-1)
,cast(substring(parsename(x,1), patindex('%[0-9]%',parsename(x,1)), 10) as int)
extend each barcode to 4 groups by adding trailing .0 if missing
split each barcode in 4 groups
split each group in leading characters and trailing digits
sort by the leading character first
then by casting the digits as numeric
See db<>fiddle
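To see what the padding and PARSENAME steps produce before any sorting happens, here is a small standalone sketch. The two sample values and the column aliases (padded, g1 to g4) are chosen for illustration only. PARSENAME handles at most four parts, which is why the approach tops out at 3 dashes.

```sql
SELECT v.barcode,
       p.x AS padded,              -- dashes become dots, missing groups become .0
       PARSENAME(p.x, 4) AS g1,    -- PARSENAME counts parts from the right
       PARSENAME(p.x, 3) AS g2,
       PARSENAME(p.x, 2) AS g3,
       PARSENAME(p.x, 1) AS g4
FROM (VALUES ('A1'), ('A2-10-3')) AS v(barcode)
CROSS APPLY (VALUES (
    REPLACE(v.barcode, '-', '.')
    + REPLICATE('.0', 3 - (LEN(v.barcode) - LEN(REPLACE(v.barcode, '-', ''))))
)) AS p(x);
-- A1       -> A1.0.0.0   -> A1, 0, 0, 0
-- A2-10-3  -> A2.10.3.0  -> A2, 10, 3, 0
```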
An alternative approach would be to use your technique to split the string into its 3 component parts, then left-pad those strings with leading zeros (or characters of your choice). That avoids any issues where a part may contain letters rather than just digits. However, it does mean that parts with different numbers of letters may not be sorted as you may expect... Here's the code to play with (using the definitions from @dnoeth's excellent answer):
;with cte as
(
select barcode
, case
when barcode like '%-%' then
substring(barcode,1,charindex('-',barcode)-1)
else
barcode
end part1
, case
when barcode like '%-%' then
substring(barcode, charindex('-',barcode) + 1, case
when barcode like '%-%-%' then
(charindex('-',barcode,charindex('-',barcode) + 1)) - 1
else
len(barcode)
end
- charindex('-',barcode))
else
''
end part2
, case
when barcode like '%-%-%' then
right(barcode,charindex('-',reverse(barcode))-1) --note: assumes you don't have %-%-%-%
else
''
end part3
from @TabelWithBarcodes
)
select barcode
, part1, part2, part3
, right('0000000000' + coalesce(part1,''), 10) lpad1
, right('0000000000' + coalesce(part2,''), 10) lpad2
, right('0000000000' + coalesce(part3,''), 10) lpad3
from cte
order by lpad1, lpad2, lpad3
DBFiddle Example
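The caveat about letters is easier to see with a few values. A minimal sketch, using made-up sample parts and assuming a typical collation in which digits sort before letters:

```sql
SELECT v.part,
       RIGHT('0000000000' + v.part, 10) AS lpad
FROM (VALUES ('A10'), ('A2'), ('B1')) AS v(part)
ORDER BY lpad;
-- A2   00000000A2
-- B1   00000000B1
-- A10  0000000A10
```

A2 correctly comes before A10, but B1 also lands before A10, because the padded strings are compared character by character and '0' sorts before 'A'.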
I have a scenario where I pull out data from multiple tables and the output is fixed width format. The fixed width output will look like:
Current output:
1001RJOHNKEITH25 20181017 NA
1002CDWANEKANE36 20181010 RR
1003CMIKAYLAGN44 20181011 RR
Desired output:
1001RJOHNKEITH25 20181017 NA
1002CDWANEKANE36 NA
1003RMIKAYLAGN44 20181010 RR
In this output, 1001 is the Person ID, R/C is the hard-coded indicator, then comes the name, age and registration date, record type. There is a condition for Registration date. If the record indicator is R, the registration date will show up. Otherwise, it should be null. I am not sure how to write a condition based on the fixed width field.
Rextester demo attached : https://rextester.com/MKESI50760
Any help?!
OK, well this is a little messy. But because your output is fixed width, you can always make the query into a view or a CTE (shown below) and then access specific positions in the string via the SUBSTRING function.
There are a LOT of drawbacks to doing this. If anybody changes the order or size of the fields being concatenated... it all breaks. So, in the spirit of answering your question, this is a way to do it. But I don't think it's a good way.
WITH BaseQuery as
(
select
t.Cid,
cast
(
concat(
LEFT(CONCAT(isnull(t.Cid,''),space(5)),5), -- PersonID
LEFT(CONCAT(isnull
((case when t.registeredonline = '1' and t.recordtype = 'NA' then 'R'
else 'C' end),''),space(10)),10),-- Record Indicator
LEFT(CONCAT(isnull(t.name,''),space(14)),14), --name
LEFT(CONCAT(isnull(t.age,''),space(5)),5), --age
LEFT(CONCAT(isnull(t.registrationdate,''),space(14)),14), -- Registration date should show up when record indicator is 'R'
LEFT(CONCAT(isnull(t.recordtype,''),space(3)),3) --Record type
) as nvarchar(max)
) result
from #temp t
)
SELECT
CONCAT(
SUBSTRING(result, 1, 34) -- portion before the 'registration date' region
, CASE WHEN SUBSTRING(result, 6, 1) = 'R' THEN SUBSTRING(result, 35, 14) ELSE SPACE(14) END -- registration date region (positions 35-48)
, SUBSTRING(result, 49, 3) -- record type
)
FROM
BaseQuery
this gives the result:
1001 R JOHNKEITH 25 2018-10-17 NA
1002 C DWANEKANE 36 RR
1003 C JOHNKEITH 44 RR
The line
LEFT(CONCAT(isnull(t.registrationdate,''),space(14)),14)
becomes
CASE WHEN t.registeredonline = '1' and t.recordtype = 'NA' THEN LEFT(CONCAT(isnull(t.registrationdate,''),space(14)),14) ELSE SPACE(14) END, -- Registration date should show up when record indicator is 'R'
This just wraps the original line in a condition that checks whether the record indicator is 'R'.
The condition is the same one the query from your link already uses to compute the indicator.
You just need to update one line in your query:
LEFT(CONCAT(isnull(t.registrationdate,''),space(14)),14), -- Registration date should show up when record indicator is 'R'
becomes
LEFT(CONCAT(isnull(CASE WHEN t.registeredonline = '1' and t.recordtype = 'NA' THEN CONVERT(char(10), t.registrationdate,126) ELSE NULL END,''),space(14)),14), -- Registration date should show up when record indicator is 'R'
This applies the same logic as the record indicator and puts in spaces instead of a date when the indicator does not evaluate to 'R'.
The CONVERT is needed because otherwise ISNULL implicitly converts the empty string to the date type, and the NULL date ends up showing as 1900-01-01.
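The 1900-01-01 effect can be reproduced in isolation. This is a small sketch, assuming registrationdate is a date (or datetime) column; the variable @d and the column aliases are stand-ins made up for the example.

```sql
DECLARE @d date = NULL;  -- stand-in for a NULL registration date

SELECT ISNULL(@d, '')                         AS without_convert,  -- 1900-01-01
       ISNULL(CONVERT(char(10), @d, 126), '') AS with_convert;     -- blank text, not a date
```

ISNULL returns the data type of its first argument, so in the first column the empty string is converted to a date, while in the second it stays character data.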
Hope it helps.
Dealing with fixed width data:
Data in a fixed-width text file or string is arranged in rows and
columns, with one entry per row. Each column has a fixed width,
specified in characters, which determines the maximum amount of data
it can contain. No delimiters are used to separate the fields in the
file.
To parse that data in T-SQL you can use SUBSTRING:
https://learn.microsoft.com/en-us/sql/t-sql/functions/substring-transact-sql?view=sql-server-2017
SUBSTRING ( expression ,start , length )
Here's an example:
DECLARE @SampleData TABLE
(
[LineData] NVARCHAR(255)
);
INSERT INTO @SampleData (
[LineData]
)
VALUES ( '1001RJOHNKEITH25 20181017 NA' )
, ( '1002CDWANEKANE36 20181010 RR' )
, ( '1003CMIKAYLAGN44 20181011 RR' );
SELECT SUBSTRING([LineData], 1, 4) AS [PersonId]
, SUBSTRING([LineData], 5, 1) AS [Indicator]
, SUBSTRING([LineData], 6, 9) AS [Name]
, SUBSTRING([LineData], 15, 2) AS [Age]
, SUBSTRING([LineData], 18, 8) AS [RegDate]
, SUBSTRING([LineData], 27, 2) AS [RecordType]
, *
FROM @SampleData;
So if, in your example, you want to evaluate whether or not the "Indicator" is 'R', you can get to that value with:
SUBSTRING([LineData], 5, 1)
Not sure how that fits into what you have been tasked with. Based on other comments there's more to how this "Indicator" is determined.
Not ideal, but you could parse out all the fields and then put them back together, doing the evaluation on that indicator field, or use STUFF in a CASE expression to replace the date with blanks when the indicator in the string is not R.
DECLARE @SampleData TABLE
(
[LineData] NVARCHAR(255)
);
INSERT INTO @SampleData (
[LineData]
)
VALUES ( '1001RJOHNKEITH25 20181017 NA' )
, ( '1002CDWANEKANE36 20181010 RR' )
, ( '1003CMIKAYLAGN44 20181011 RR' );
--We check for R using substring
--when not equal to R we replace the 8-character registration date in the string with 8 blanks, so the width stays fixed.
SELECT CASE WHEN SUBSTRING([LineData], 5, 1) = 'R' THEN [LineData]
ELSE STUFF([LineData], 18, 8, SPACE(8))
END AS [LineData]
FROM @SampleData;
Select ColA, CASE WHEN ColB (Criteria here) THEN NULL ELSE ColB END AS ColB, ColC
Thank you in advance.
I want to remove the part of the string after the ., including the . itself, but the string can be of any length.
1)Example:
Input:- SCC0204.X and FRK0005.X and RF0023.X and ADF1010.A and HGT9010.V
Output: SCC0204 and FRK0005 and RF0023 and ADF1010.A and HGT9010.V
I tried using CHARINDEX, but as the length keeps changing I wasn't able to do it. I only want to trim the values that end with X.
Any help will be greatly appreciated.
Assuming there is only one dot
UPDATE your_table
SET column_name = left(column_name, charindex('.', column_name) - 1)
For SELECT
select left(column_name, charindex('.', column_name) - 1) AS col
from your_table
Hope this helps. The code only trims the string when the value contains a dot (".") and ends with .X
;WITH cte_TestData(Code) AS
(
SELECT 'SCC0204.X' UNION ALL
SELECT 'FRK0005.X' UNION ALL
SELECT 'RF0023.X' UNION ALL
SELECT 'ADF1010.A' UNION ALL
SELECT 'HGT9010.V' UNION ALL
SELECT 'SCC0204' UNION ALL
SELECT 'FRK0005'
)
SELECT CASE
WHEN CHARINDEX('.', Code) > 0 AND RIGHT(Code,2) = '.X'
THEN SUBSTRING(Code, 1, CHARINDEX('.', Code) - 1)
ELSE Code
END
FROM cte_TestData
If the only requirement is to remove .X, then this should probably also work
;WITH cte_TestData(Code) AS
(
SELECT 'SCC0204.X' UNION ALL
SELECT 'FRK0005.X' UNION ALL
SELECT 'RF0023.X' UNION ALL
SELECT 'ADF1010.A' UNION ALL
SELECT 'HGT9010.V' UNION ALL
SELECT 'SCC0204' UNION ALL
SELECT 'FRK0005'
)
SELECT REPLACE (Code,'.X','')
FROM cte_TestData
Use LEFT String function :
DECLARE @String VARCHAR(100) = 'SCC0204.XXXXX'
SELECT LEFT(@String,CHARINDEX('.', @String) - 1)
I think your best bet here is to create a function that parses the string and uses regex. I hope this old post helps:
Perform regex (replace) in an SQL query
However, if the value you need to trim is always ".X", then you could simply use
select replace(string, '.X', '')
Please check the below code. I think this will help you.
DECLARE @String VARCHAR(100) = 'SCC0204.X'
IF (SELECT RIGHT(@String,2)) ='.X'
SELECT LEFT(@String,CHARINDEX('.', @String) - 1)
ELSE
SELECT @String
Update: I just missed one of the comments where the OP clarifies the requirement. What I put together below is how you would deal with a requirement to remove everything after the first dot on strings ending with X. I leave this here for reference.
;WITH cte_TestData(Code) AS
(
SELECT 'SCC0204.X' UNION ALL -- ends with '.X'
SELECT 'FRK.000.X' UNION ALL -- ends with '.X', contains multiple dots
SELECT 'RF0023.AX' UNION ALL -- ends with '.AX'
SELECT 'ADF1010.A' UNION ALL -- ends with '.A'
SELECT 'HGT9010.V' UNION ALL -- ends with '.V'
SELECT 'SCC0204.XF' UNION ALL -- ends with '.XF'
SELECT 'FRK0005' UNION ALL -- totally clean
SELECT 'ABCX' -- ends with 'X', not dots
)
SELECT
orig_string = code,
newstring =
SUBSTRING
(
code, 1,
CASE
WHEN code LIKE '%X'
THEN ISNULL(NULLIF(CHARINDEX('.',code)-1, -1), LEN(code))
ELSE LEN(code)
END
)
FROM cte_TestData;
FYI: on SQL Server 2012+ you could simplify this code with IIF, like this:
SELECT
orig_string = code,
newstring =
SUBSTRING(code, 1,IIF(code LIKE '%X', ISNULL(NULLIF(CHARINDEX('.',code)-1, -1), LEN(code)), LEN(code)))
FROM cte_TestData;
With SUBSTRING you can achieve your requirement with the code below.
SELECT SUBSTRING(column_name, 0, CHARINDEX('.', column_name)) AS col
FROM your_table
If you want to remove fixed .X from string you can also use REPLACE function.
SELECT REPLACE(column_name, '.X', '') AS col
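REPLACE removes every occurrence of .X, not just a trailing one. If only a trailing .X should go, one option is to test the end of the string explicitly. The following is a sketch with made-up sample values (AB.XY.X is included to show the difference); whether 'x' and 'X' are treated as equal depends on the column's collation.

```sql
SELECT v.Code,
       REPLACE(v.Code, '.X', '') AS replace_all,
       CASE WHEN v.Code LIKE '%.X'
            THEN LEFT(v.Code, LEN(v.Code) - 2)   -- drop only the final two characters
            ELSE v.Code
       END AS trim_suffix_only
FROM (VALUES ('SCC0204.X'), ('ADF1010.A'), ('AB.XY.X'), ('FRK0005')) AS v(Code);
-- SCC0204.X -> replace_all SCC0204,   trim_suffix_only SCC0204
-- ADF1010.A -> replace_all ADF1010.A, trim_suffix_only ADF1010.A
-- AB.XY.X   -> replace_all ABY,       trim_suffix_only AB.XY
-- FRK0005   -> replace_all FRK0005,   trim_suffix_only FRK0005
```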
I have to check a string against the following scenarios in a WHERE condition.
The ProductId data stored in the database can look like:
7314-3337 (with the - symbol and not prefixed with 19)
73143337 (without the symbol and not prefixed with 19)
1973143337 (correct format)
197314-3337 (prefixed with 19 but with the - symbol)
I need to filter records by ProductId, and the input is always in the correct format, i.e. 1973143337:
WHERE P.ProductId = @ProductId
How can I filter if the data is stored in one of the other 3 formats?
How can I strip the - and add the 19 prefix, if it is missing, in SQL Server?
Please check these 2 approaches.
The first is very simple and the second uses a small trick. (I think you should go with the second option, which covers everything.)
declare @t table (ProductId varchar(100))
insert into @t
values
('7314-3337')
,('73143337')
,('1973143337')
,('197314-3337')
,('73683337')
,('73143338')
declare @valuetosearch varchar(100) = '1973143337'
--this is very simple, but does not work in every scenario. the second approach is fine.
--select CHARINDEX ( '19','1973143337'), SUBSTRING('1973143337',3,len('1973143337'))
--select * from
--@t
--where
--replace(REPLACE(ProductId ,'-','') ,'19','') = replace(REPLACE(@valuetosearch ,'-','') ,'19','')
select * from
@t
where
REPLACE( case when CHARINDEX ( '19',ProductId) = 1
then SUBSTRING( ProductId ,3,LEN(ProductId))
else ProductId
end ,'-','')
=
REPLACE ( case when CHARINDEX ( '19',@valuetosearch) = 1
then SUBSTRING( @valuetosearch ,3,LEN(@valuetosearch))
else @valuetosearch
end ,'-','')
You should first sanitize your data, if it is not consistent then you won't be able to get the correct results.
For prefixing with 19:
UPDATE foo
SET ProductId = '19' + ProductId
WHERE Left(ProductID, 2) <> '19'
For removing the '-':
UPDATE foo
SET ProductId = REPLACE(ProductId, '-', '')
Then you should be able to get the results you want.
UPDATE:
You could construct a CTE with the results in a single format, and then, filter that CTE:
WITH cte (
FormattedPID
,ProductId
)
AS (
SELECT CASE
WHEN LEFT(ProductId, 2) = '19'
THEN REPLACE(ProductId, '-', '')
ELSE '19' + REPLACE(ProductId, '-', '')
END
,ProductId
FROM foo
)
SELECT FormattedPID
,ProductId
FROM cte
WHERE FormattedPID = @ProductID
You could make sure the column is in the correct format like this:
Remove the - by replacing it with an empty string (197314-3337 -> 1973143337, 7314-3337 -> 73143337).
Add 19 at the beginning (1973143337 -> 191973143337, 73143337 -> 1973143337).
Take the 10 rightmost characters of the result and compare them to the input (191973143337 -> 1973143337, 1973143337 -> 1973143337).
In Transact-SQL:
WHERE RIGHT('19' + REPLACE(P.ProductId, '-', ''), 10) = @ProductId
Of course, this means no index seek for you, because we are applying functions to the column.
An alternative to that would be to produce the three non-standard formats out of the input:
cut off the initial 19 (1973143337 -> 73143337);
insert the - (1973143337 -> 197314-3337);
insert the - and cut off the 19 (1973143337 -> 197314-3337 -> 7314-3337).
In Transact-SQL:
WHERE P.ProductId IN (
@ProductId,
SUBSTRING(@ProductId, 3, 999999999),
STUFF(@ProductId, 7, 0, '-'),
SUBSTRING(STUFF(@ProductId, 7, 0, '-'), 3, 999999999)
)
This way if there is an index on P.ProductId, it will be used efficiently.
Both approaches assume that the length of the correct format is fixed.
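Both predicates can be checked side by side against the four formats from the question. This is only a sketch: the inline sample rows, the extra non-matching value 73143338, and the column aliases are assumptions made for the demonstration.

```sql
DECLARE @ProductId varchar(10) = '1973143337';

SELECT v.ProductId,
       -- Approach 1: normalize the stored value, then compare.
       CASE WHEN RIGHT('19' + REPLACE(v.ProductId, '-', ''), 10) = @ProductId
            THEN 1 ELSE 0 END AS matches_normalized,
       -- Approach 2: expand the input into all four stored formats.
       CASE WHEN v.ProductId IN (
                @ProductId,
                SUBSTRING(@ProductId, 3, 999999999),
                STUFF(@ProductId, 7, 0, '-'),
                SUBSTRING(STUFF(@ProductId, 7, 0, '-'), 3, 999999999))
            THEN 1 ELSE 0 END AS matches_in_list
FROM (VALUES ('7314-3337'), ('73143337'), ('1973143337'),
             ('197314-3337'), ('73143338')) AS v(ProductId);
-- The first four rows return 1 in both columns; 73143338 returns 0 in both.
```

In a real query only the second form leaves P.ProductId untouched, which is what allows an index seek.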