How to substring records with variable length - sql

I have a table which has a column with doc locations, such as AA/BB/CC/EE
I am trying to get only one of these parts, let's say just the CC part (which has variable length). So far I've tried the following:
SELECT RIGHT(doclocation,CHARINDEX('/',REVERSE(doclocation),0)-1)
FROM Table
WHERE doclocation LIKE '%CC %'
But I'm not getting the expected result.

Use the PARSENAME function like this:
DECLARE @s VARCHAR(100) = 'AA/BB/CC/EE'
SELECT PARSENAME(replace(@s, '/', '.'), 2)

This is painful to do in SQL Server. One method is a series of string operations. I find this simplest using outer apply (unless I need subqueries for a different reason):
select *
from t outer apply
(select stuff(t.doclocation, 1, charindex('/', t.doclocation, charindex('/', t.doclocation) + 1), '') as doclocation2) t2 outer apply
(select left(t2.doclocation2, charindex('/', t2.doclocation2 + '/') - 1) as cc
) t3;

The PARSENAME function is meant to return a specified part of an object name and should not be used for this purpose, as it only parses strings with at most four parts (see the SQL Server PARSENAME documentation at MSDN).
SQL Server 2016 has a new function, STRING_SPLIT, but if you don't use SQL Server 2016 you have to fall back on the solutions described here: How do I split a string so I can access item x?

The question is not entirely clear. Can you please specify which value you need? If you need the values after CC, you can do the CHARINDEX on "CC". Also, the query does not seem correct: the string you provided is "AA/BB/CC/EE", which contains no spaces, but your query searches for a space with WHERE doclocation LIKE '%CC %'.
SELECT SUBSTRING(doclocation,CHARINDEX('CC',doclocation)+2,LEN(doclocation))
FROM Table
WHERE doclocation LIKE '%CC%'

Related

SQL group by middle part of string

I have string column that looks usually approximately like this:
https://mapy.cz/zakladni?x=16.3360208&y=49.6718038&z=8&source=firm&id=13123554
https://mapy.cz/turisticka?x=15.9380354&y=50.1990211&z=11&source=base&id=2197
https://mapy.cz/turisticka?x=12.8611357&y=49.8051338&z=16&source=base&id=1703157
I would like to group data by source which is part of the string - four letters behind "source=" (in the case above: firm) and then simply count them. Is there a way to achieve this directly in SQL code? I am using hadoop.
The data is a set of strings like the ones above. My expected result is a summary table with two columns: 1) Each type of source (there are about 20 possible values and their lengths differ, so I cannot use a simple substring). Ideally I am looking for a solution that says: for the grouping, use the four letters that come after "source=". 2) The count of their occurrences across all the strings.
There is just one source type in each string.
You can use regexp_extract():
select substr(regexp_extract(url, 'source[^&]+'), 8)
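To go from the extracted value to the requested summary table, the same kind of expression can be used in a GROUP BY. The following is a minimal sketch that assumes HiveQL and a hypothetical table named urls with a string column url (neither name comes from the question). It uses a capture group with an explicit group index instead of the substr() call:

```sql
-- Hypothetical table and column names: urls(url)
SELECT regexp_extract(url, 'source=([^&]+)', 1) AS source,  -- text after "source=" up to the next "&"
       COUNT(*) AS occurrences
FROM urls
GROUP BY regexp_extract(url, 'source=([^&]+)', 1);
```

Because the capture group stops at the next &, this does not depend on the source being exactly four letters long.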
You can use CHARINDEX in MSSQL to get the position of the string and extract the value:
;with cte as (
SELECT SUBSTRING('https://mapy.cz/zakladni?x=16.3360208&y=49.6718038&z=8&source=firm&id=13123554',
charindex('&source=','https://mapy.cz/zakladni?x=16.3360208&y=49.6718038&z=8&source=firm&id=13123554')
+8,4) AS ExtractString )
select ExtractString,count(ExtractString) as count from cte group by ExtractString;
HiveQL has an equivalent function, LOCATE, for CHARINDEX.

SQL append in middle of column data in update query

When we do a data refresh, we need to change the server name in paths such as \\prod01\\Test\Load so that they become \\prod01.qa.com\\Test\Load.
How do I write an update query for this? There can be different server names; all I need is an update script that appends qa.com to the server name.
This is my query that gives all results that have server location.
select * from AppSetting where Value like '%\\%\%' or Value like '%//%/%';
My Prod data looks like this:
Value
\\prod01\Images\Load
\\prod01prod6253\Images\Load
\\server05ser\Images\Delete
\\pgdg1076\Email
\\pgdg1076ythg\Test\Load
http://prod7/delta/
My QA data should look like this after the update query:
Value
\\prod01.qa.com\Images\Load
\\prod01prod6253.qa.com\Images\Load
\\server05ser.qa.com\Images\Delete
\\pgdg1076.qa.com\Email
\\pgdg1076ythg.qa.com\Test\Load
http://prod7.qa.com/delta/
This is the update query I have. Can I write a generic query?
UPDATE eroom.AppSetting
SET Location = REPLACE(Location, '\\prod01\', '\\prod01.qa.tbc.com\')
WHERE Location like '%\\prod01\%';
UPDATE eroom.AppSetting
SET Location = REPLACE(Location, '\\server05ser\', '\\server05ser.qa.tbc.com\')
WHERE Location like '%\\server05ser\%';
I'm posting this as a new answer, as the OP moved the goal posts quite a bit. Instead, I now use CHARINDEX to find the position of each slash (forward or back). Fortunately, the injection needs to happen just before the 3rd slash, so we can use that to our advantage:
SELECT STUFF(V.Value,CI3.I,0,'.qa.tbc.com') AS NewValue,*
FROM (VALUES('\\prod01\Images\Load'),
('\\prod01prod6253\Images\Load'),
('\\server05ser\Images\Delete'),
('\\pgdg1076\Email'),
('\\pgdg1076ythg\Test\Load'),
('http://prod7/delta/'))V([value])
CROSS APPLY (VALUES(CASE WHEN V.[value] LIKE '%/%' THEN '/' ELSE '\' END)) L(C) --So I don't have to keep checking what character I need
CROSS APPLY (VALUES(CHARINDEX(L.C,V.[value]))) CI1(I)
CROSS APPLY (VALUES(CHARINDEX(L.C,V.[value],CI1.I+1))) CI2(I)
CROSS APPLY (VALUES(CHARINDEX(L.C,V.[value],CI2.I+1))) CI3(I);
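If the same expression is needed as an UPDATE rather than a SELECT, one possible shape is sketched below. It assumes the table is eroom.AppSetting with the path stored in a column named Value (the question uses both Value and Location, so adjust as needed). The guards in the WHERE clause are additions for illustration and are not part of the query above:

```sql
UPDATE A
SET    Value = STUFF(A.Value, CI3.I, 0, '.qa.tbc.com')  -- insert the domain just before the 3rd slash
FROM   eroom.AppSetting AS A                            -- assumed table and column names
CROSS APPLY (VALUES(CASE WHEN A.Value LIKE '%/%' THEN '/' ELSE '\' END)) L(C)
CROSS APPLY (VALUES(CHARINDEX(L.C, A.Value))) CI1(I)
CROSS APPLY (VALUES(CHARINDEX(L.C, A.Value, CI1.I + 1))) CI2(I)
CROSS APPLY (VALUES(CHARINDEX(L.C, A.Value, CI2.I + 1))) CI3(I)
WHERE  CI2.I > 0 AND CI3.I > CI2.I           -- skip values that do not contain three slashes
  AND  A.Value NOT LIKE '%.qa.tbc.com%';     -- skip rows that already carry the domain
```

Running the SELECT version first is a cheap way to check which rows would change before committing to the update.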
This is one method. I put the expressions for the PATINDEX and CHARINDEX in the FROM, as I find it far easier to read, and means less repetition:
SELECT V.[value],
ISNULL(STUFF(V.Value,ISNULL(CI.fs,CI.bs),0,'.qa.tbc.com'),V.[value]) AS NewValue
FROM (VALUES('\\prod01\Images\Load'),
('\\prod05\Images\Delete'),
('\\prod10\Email'),
('//http://prod7/delta/'))V([value])
CROSS APPLY (VALUES(NULLIF(PATINDEX('%prod[0-9]%',V.value),0)))PI(I)
CROSS APPLY (VALUES(NULLIF(CHARINDEX('/',V.[value],PI.I),0),NULLIF(CHARINDEX('\',V.[value],PI.I),0))) CI(fs,bs);
This answers the original version of the question.
The simplest way might be with stuff() and a case expression:
select (case when location like '\\prod[0-9][0-9]\%'
then stuff(location, 9, 0, '.qa.tbc.com')
else location
end)
The "prod" portion looks to be fixed length, so you don't need to search for a pattern.

Count occurrences of a pattern in SQL Server column

I have a varchar column in SQL Server 2012 with 3-letter patterns that are concatenated, like this value:
DECLARE @str VARCHAR(MAX) = 'POKPOKPOKHRSPOKPOKPOKPOKPOKPOIHEFHEFPOKPOHHRTHRT'
I need a query to search and count the occurrences of the pattern POK in that string. The trick is, all POK that are together must be counted as one. So, in the string above there are 3 "chains" of POK:
POKPOKPOK, interrupted by a HRS
POKPOKPOKPOKPOK, interrupted by a POI
POK, interrupted by a POH
So, my desired result is 3. If I use the following query, I get 9, that are the total POKs in string, which is not what I need.
SELECT (LEN(@str) - LEN(REPLACE(@str, 'POK', '')))/LEN('POK')
I think I need some sort of regexp to isolate the POKs and then count, but couldn't find a way to apply that in SQL Server. Any help much appreciated.
This is really not something that you want to do in SQL. But you can. Here is one method to reduce the adjacent 'POK's to a single POK:
select replace(replace(@str, 'POK', '<POK>'), 'POK><', '')
Well, this actually creates a '<POK>', but that is fine for our purposes.
Now, you can search in that:
select (len(replace(replace(@str, 'POK', '<POK>'), 'POK><', '')) -
len(replace(replace(replace(@str, 'POK', '<POK>'), 'POK><', ''), 'POK', ''))
) / 3
Here is a SQL Fiddle.
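To make the intermediate step visible, here is a small self-contained sketch of the same idea using the string from the question. The variable name @collapsed is introduced only for illustration:

```sql
DECLARE @str VARCHAR(MAX) = 'POKPOKPOKHRSPOKPOKPOKPOKPOKPOIHEFHEFPOKPOHHRTHRT';

-- Wrap every POK in angle brackets, then delete the seam 'POK><' between
-- neighbours, so that each run of adjacent POKs shrinks to a single '<POK>'.
DECLARE @collapsed VARCHAR(MAX) = REPLACE(REPLACE(@str, 'POK', '<POK>'), 'POK><', '');

SELECT @collapsed AS collapsed,   -- <POK>HRS<POK>POIHEFHEF<POK>POHHRTHRT
       (LEN(@collapsed) - LEN(REPLACE(@collapsed, 'POK', ''))) / LEN('POK') AS chain_count;  -- 3
```

Note that this relies on the characters < and > not appearing in the original data.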

Is there a way to query JSON column in SQL Server ignoring capitalization of keys?

I am trying to query a JSON column that has mixed capitalization. For instance, some rows have keys that are all lower case like below:
{"name":"Screening 1","type":"template","pages":[{"pageNumber":1,...}
However, some of the rows have keys that are capitalized on its first letter like this:
{"Type":"template","Name":"Screening2","Pages":[{"PageNumber":1,...}
Unfortunately, SQL Server seems to support only case-sensitive JSON paths. Therefore, I can't query all rows successfully. If I use a lower-case path like '$.pages' in a query like the one below:
SELECT ST.Id AS Screening_Tool_Id
, ST.Name AS Screening_Tool_Name
, ST.Description AS Screening_Tool_Description
, COUNT(JSON_VALUE (SRQuestions.value, '$.id')) AS Question_Specific_Id
FROM dbo.ScreeningTemplate AS ST
CROSS APPLY OPENJSON(ST.Workflow, '$.pages') AS SRPages
CROSS APPLY OPENJSON(SRPages.Value, '$.sections') AS SRSections
I miss any row that has capitalized keys. Is there any way to query all rows ignoring their capitalization?
According to Microsoft, it looks like you're stuck with a case-sensitive query:
When OPENJSON parses a JSON array, the function returns the indexes of
the elements in the JSON text as keys. The comparison used to match
path steps with the properties of the JSON expression is
case-sensitive and collation-unaware (that is, a BIN2 comparison).
https://learn.microsoft.com/en-us/sql/t-sql/functions/openjson-transact-sql
If the only variations are in the capitalization of the first character, you could try to work around this limitation by creating queries with the variants and UNION the results together.
Maybe you can just lowercase the JSON (note that this lowercases the values as well as the keys):
COUNT(JSON_VALUE (lower(SRQuestions.value), '$.id')) AS Question_Specific_Id
Old question but I came across this when googling a similar issue so I will chip in with my solution:
SELECT @pb = PB from
OPENJSON(@PropertyBagsAsJson, '$."$values"')
WITH (
PbId1 nvarchar(MAX) 'lax $.Id',
PbId2 nvarchar(MAX) 'lax $.id',
PB nvarchar(MAX) '$' AS JSON
)
WHERE COALESCE(PbId1,PbId2) = @PropertyBagId
I hope the example is clear. Basically, I add every possible casing of the property and then use COALESCE to filter the results.
You can use openjson. Instead of
JSON_VALUE (SRQuestions.value, '$.id')
you can write
(select Value
from openjson( SRQuestions.value )
where [Key] collate Latin1_General_CI_AS = 'id')
You must use a case-insensitive ("_CI") collation here, such as Latin1_General_CI_AS. "database_default" works too if the database uses a CI collation.
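A quick way to try the key lookup in isolation is sketched below. The JSON document is made up for illustration, and Latin1_General_CI_AS is assumed as the case-insensitive collation; any other CI collation available on the server should behave the same way for this purpose:

```sql
DECLARE @json NVARCHAR(MAX) = N'{"Id":7,"Name":"Screening2"}';  -- made-up sample document

-- OPENJSON without a WITH clause returns one row per top-level property
-- in the columns [key], [value] and [type].
SELECT [value]
FROM   OPENJSON(@json)
WHERE  [key] COLLATE Latin1_General_CI_AS = 'id';  -- matches "Id", "id", "ID", ...
```

OPENJSON requires database compatibility level 130 or higher.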

Regular expressions inside SQL Server

I have stored values in my database that look like 5XXXXXX, where X can be any digit. In other words, I need to match incoming SQL query strings like 5349878.
Does anyone have an idea how to do it?
I have different cases like XXXX7XX for example, so it has to be generic. I don't care about representing the pattern in a different way inside the SQL Server.
I'm working with c# in .NET.
You can write queries like this in SQL Server:
--each [0-9] matches a single digit, this would match 5xx
SELECT * FROM YourTable WHERE SomeField LIKE '5[0-9][0-9]'
stored value in DB is: 5XXXXXX [where x can be any digit]
You don't mention data types - if numeric, you'll likely have to use CAST/CONVERT to change the data type to [n]varchar.
Use:
WHERE CHARINDEX('5', column) = 1
AND CHARINDEX('.', column) = 0 --to stop decimals if needed
AND ISNUMERIC(column) = 1
References:
CHARINDEX
ISNUMERIC
i have also different cases like XXXX7XX for example, so it has to be generic.
Use:
WHERE PATINDEX('%7%', column) = 5
AND CHARINDEX('.', column) = 0 --to stop decimals if needed
AND ISNUMERIC(column) = 1
References:
PATINDEX
Regex Support
SQL Server 2005+ supports regex through CLR integration, but the catch is that you have to create a CLR user-defined function before you can use it. There are numerous articles providing example code if you search for them. Once you have that in place, you can use:
5\d{6} for your first example
\d{4}7\d{2} for your second example
For more info on regular expressions, I highly recommend this website.
Try this
select * from mytable
where p1 not like '%[^0-9]%' and substring(p1,1,1)='5'
Of course, you'll need to adjust the substring value, but the rest should work...
In order to match a digit, you can use [0-9].
So you could use 5[0-9][0-9][0-9][0-9][0-9][0-9] and [0-9][0-9][0-9][0-9]7[0-9][0-9]. I do this a lot for zip codes.
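Since the question asks for something generic, the repeated [0-9] classes can also be built with REPLICATE instead of being typed out. This is only a sketch; the table variable and its sample values are invented for illustration:

```sql
DECLARE @t TABLE (val VARCHAR(20));  -- hypothetical sample data
INSERT INTO @t (val) VALUES ('5349878'), ('1234756'), ('53498'), ('5A49878');

-- 5XXXXXX: a 5 followed by exactly six digits (matches only '5349878')
SELECT val FROM @t WHERE val LIKE '5' + REPLICATE('[0-9]', 6);

-- XXXX7XX: four digits, a 7, then two digits (matches only '1234756')
SELECT val FROM @t WHERE val LIKE REPLICATE('[0-9]', 4) + '7' + REPLICATE('[0-9]', 2);
```

Because LIKE without leading or trailing % must match the whole value, strings that are too short or contain a non-digit are rejected.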
SQL Wildcards are enough for this purpose. Follow this link: http://www.w3schools.com/SQL/sql_wildcards.asp
You need to use a query like this:
select * from mytable where msisdn like '%7%'
or
select * from mytable where msisdn like '56655%'