Regex to find a pattern using sql - sql

I have a column called 'user_details' in a SQL table 'customers' with the below value in it:
{4:"2021-06-07T16:17:26.327+02:00",5:"1623075805735.phna3uyo",6:"www.abc.com/connexion",10:"loggedOut",12:"567879",2:"1026530505.1619610156",3:"event"}
{4:"2021-06-01T13:11:34.742+02:00",5:"1622545894742.ml2cyuw",6:"www.seigneuriegauthier.com/connexion",10:"loggedOut",12:"",2:"435305774.1622545085",3:"event"}
{4:"2021-06-01T10:13:30.85+02:00",5:"1622535210085.vlowlxmj",6:"www.seigneuriegauthier.com/connexion",10:"loggedOut",12:"278356",2:"1381684281.1622534907",3:"event"}
{4:"2021-06-01T10:24:51.808+02:00",5:"1622536405142.h45exkgg",6:"seigneuriegauthier.com/connexion",10:"loggedOut",12:"251666",2:"1019448131.1621925108",3:"event"}
{4:"2021-06-01T14:13:54.476+02:00",5:"1622551449049.k14838ij",6:"www.seigneuriegauthier.com/connexion",10:"loggedOut",12:"601322",2:"1975087820.1622548509",3:"event"}
I'm trying to extract the id after number 12:" without the "" i.e. 567879,278356 etc into another column.
Since there are many repetitive "" I'm unable to build the regex expression.
I tried the below but didn't get the exact match
(?<=:)"[0-9][0-9][0-9][0-9][0-9][0-9]
How can I write a SQL query to retrieve this. Pls help.

On SQL Server, which has no regex support, we can try using the base string functions SUBSTRING along with CHARINDEX:
SELECT
user_details,
SUBSTRING(user_details,
CHARINDEX('12:"', user_details) + 4,
CHARINDEX('"', user_details, CHARINDEX('12:"', user_details) + 4) -
CHARINDEX('12:"', user_details) - 4) AS val
FROM customers;
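The index arithmetic in the query above can be checked outside the database. The following is only a sketch in Python that mimics the 1-based behaviour of CHARINDEX and SUBSTRING; the helper names charindex, substring and extract_id are hypothetical and exist only for this illustration, and the two rows are copied from the question.

```python
def charindex(needle: str, haystack: str, start: int = 1) -> int:
    """Mimic T-SQL CHARINDEX: 1-based position, 0 when not found."""
    return haystack.find(needle, max(start, 1) - 1) + 1


def substring(text: str, start: int, length: int) -> str:
    """Mimic T-SQL SUBSTRING for start >= 1 and length >= 0."""
    return text[start - 1:start - 1 + length]


def extract_id(user_details: str) -> str:
    # Position of the first character after the marker 12:"
    start = charindex('12:"', user_details) + 4
    # Position of the closing quote that ends the value
    end = charindex('"', user_details, start)
    return substring(user_details, start, end - start)


rows = [
    '{4:"2021-06-07T16:17:26.327+02:00",5:"1623075805735.phna3uyo",'
    '6:"www.abc.com/connexion",10:"loggedOut",12:"567879",'
    '2:"1026530505.1619610156",3:"event"}',
    '{4:"2021-06-01T13:11:34.742+02:00",5:"1622545894742.ml2cyuw",'
    '6:"www.seigneuriegauthier.com/connexion",10:"loggedOut",12:"",'
    '2:"435305774.1622545085",3:"event"}',
]
ids = [extract_id(row) for row in rows]
```

The second row has an empty value for key 12, so the computed length is zero and the result is an empty string rather than an error.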

Related

SQL group by middle part of string

I have string column that looks usually approximately like this:
https://mapy.cz/zakladni?x=16.3360208&y=49.6718038&z=8&source=firm&id=13123554
https://mapy.cz/turisticka?x=15.9380354&y=50.1990211&z=11&source=base&id=2197
https://mapy.cz/turisticka?x=12.8611357&y=49.8051338&z=16&source=base&id=1703157
I would like to group data by source which is part of the string - four letters behind "source=" (in the case above: firm) and then simply count them. Is there a way to achieve this directly in SQL code? I am using hadoop.
Data is a set of strings that look like above. My expected result is a summary table with two columns: 1) Each type of the source (there are about 20 possible and their length is different so I cannot use a simple substring). Ideally I am looking for a solution that says: For the grouping use four letters that come after "source=" 2) Count of their occurrences in all the strings.
There is just one source type in each string.
You can use regexp_extract():
select substr(regexp_extract(url, 'source[^&]+'), 8)
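To see what that expression produces before grouping, here is a rough Python model of the same two steps: match 'source[^&]+' and then drop the first seven characters ('source='), which is what a 1-based substr(..., 8) does. The function name source_of is hypothetical, and the URLs are the three from the question.

```python
import re
from collections import Counter

urls = [
    "https://mapy.cz/zakladni?x=16.3360208&y=49.6718038&z=8&source=firm&id=13123554",
    "https://mapy.cz/turisticka?x=15.9380354&y=50.1990211&z=11&source=base&id=2197",
    "https://mapy.cz/turisticka?x=12.8611357&y=49.8051338&z=16&source=base&id=1703157",
]


def source_of(url: str) -> str:
    # Same pattern as in the query: 'source' followed by everything up to '&'
    match = re.search(r"source[^&]+", url)
    if match is None:
        return ""
    # substr(x, 8) is 1-based, so it skips the 7 characters of 'source='
    return match.group(0)[7:]


# Equivalent of GROUP BY the extracted value with COUNT(*)
counts = Counter(source_of(url) for url in urls)
```

Because the pattern stops at the next '&', the extracted value can have any length, so the roughly 20 source types of different lengths mentioned in the question are handled without a fixed-width substring.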
You can use charindex in MSSQL to get position of string and extract record
;with cte as (
SELECT SUBSTRING('https://mapy.cz/zakladni?x=16.3360208&y=49.6718038&z=8&source=firm&id=13123554',
charindex('&source=','https://mapy.cz/zakladni?x=16.3360208&y=49.6718038&z=8&source=firm&id=13123554')
+8,4) AS ExtractString )
select ExtractString,count(ExtractString) as count from cte group by ExtractString;
HiveQL has an equivalent function, LOCATE, for charindex.

How to substring records with variable length

I have a table which has a column with doc locations, such as AA/BB/CC/EE
I am trying to get only one of these parts, lets say just the CC part (which has variable length). Until now I've tried as follows:
SELECT RIGHT(doclocation,CHARINDEX('/',REVERSE(doclocation),0)-1)
FROM Table
WHERE doclocation LIKE '%CC %'
But I'm not getting the expected result
Use PARSENAME function like this,
DECLARE @s VARCHAR(100) = 'AA/BB/CC/EE'
SELECT PARSENAME(replace(@s, '/', '.'), 2)
This is painful to do in SQL Server. One method is a series of string operations. I find this simplest using outer apply (unless I need subqueries for a different reason):
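select *
from t outer apply
-- strip everything up to and including the second '/'
(select stuff(t.doclocation, 1, charindex('/', t.doclocation, charindex('/', t.doclocation) + 1), '') as doclocation2) t2 outer apply
-- keep what precedes the next '/' (one is appended in case none remains)
(select left(t2.doclocation2, charindex('/', t2.doclocation2 + '/') - 1) as cc
) t3;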
The PARSENAME function is used to get the specified part of an object name, and should not be used for this purpose, as it will only parse strings with at most 4 parts (see SQL Server PARSENAME documentation at MSDN)
SQL Server 2016 has a new function STRING_SPLIT, but if you don't use SQL Server 2016 you have to fall back on the solutions described here: How do I split a string so I can access item x?
The question is not clear I guess. Can you please specify which value you need? If you need the values after CC, then you can do the CHARINDEX on "CC". Also the query does not seem correct as the string you provided is "AA/BB/CC/EE" which does not have a space between it, but in the query you are searching for space WHERE doclocation LIKE '%CC %'
SELECT SUBSTRING(doclocation,CHARINDEX('CC',doclocation)+2,LEN(doclocation))
FROM Table
WHERE doclocation LIKE '%CC %'

how to use substr in SQL Server?

I have the following extract of a code used in SAS and wanted to write it in SQL Server to extract data.
substr(zipname,1,4) in("2000","9000","3000","1000");run;
How do I write this in SQL Server ?
I tried and got this error:
An expression of non-boolean type specified in a context where a
condition is expected
In sql server, there's no substr function (it's substring)
by the way, you need a complete query...
select blabla
from blibli
where substring(zipname, 1, 4) in ('2000', '9000', '3000', '1000')
assuming zipname is a varchar or something like that...
You need a table that you are getting the records from, and zipname would be a column in the table. The statement would be something like this:
select * from tablename where substring(zipname,1,4) in ('2000','9000','3000','1000')
Since you want the first x characters, you can also use the left() function.
where left(zipname, 4) in (values go here)
Bear in mind that your values have to be single quoted. Your question has double quotes.

SQL Server substring

I need a good expression in order to select correctly parts of a field.
For example, the field can be of the type: "google_organic" or "google_campaign_HereGoesMyCode" . The part I am interested in is "organic" or "campaign" without any other addition.
So far I select with this:
substring(Referer, charIndex('_',Referer)+1, len(Referer))
But in the case of "campaign" I select the whole thing... I don't know how to manage the existence or non-existence of the second underscore...
thank you
One way is to basically create a lastIndex type search using the below SQL and use the result as the length:
len(Referer) - (charindex('_', reverse(Referer))-1)
You can then rewrite your query as follows, although you need the result of the first charIndex so this is fairly intense:
substring(Referer, charIndex('_',Referer)+1, (len(Referer) - (charindex('_', reverse(Referer))-1)) - (charIndex('_',Referer)+1))
I realize that this will now only work if you have 2 underscores. But you can filter which query to run based off a CASE/WHEN statement.
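One way to avoid branching on the number of underscores is to treat the end of the string as if it were a second underscore. The sketch below models that idea in Python; middle_part is a hypothetical name, and it stops at the second underscore rather than the last one, which gives the same result for the two shapes in the question.

```python
def middle_part(referer: str) -> str:
    """Return the text between the first underscore and the next one,
    or up to the end of the string when there is no second underscore."""
    first = referer.find("_")
    if first == -1:
        return ""  # no underscore at all: nothing to extract
    second = referer.find("_", first + 1)
    if second == -1:
        second = len(referer)  # pretend an underscore sits at the end
    return referer[first + 1:second]


examples = ["google_organic", "google_campaign_HereGoesMyCode"]
parts = [middle_part(value) for value in examples]
```

In T-SQL the same effect could probably be had by searching in Referer + '_' with CHARINDEX, starting after the first underscore, though that variant is an untested assumption here.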

Implement an IN Query using XQuery in MSSQLServer 2005

I'm trying to query an xml column using an IN expression. I have not found a native XQuery way of doing such a query so I have tried two work-arounds:
Implement the IN query as a concatenation of ORs like this:
WHERE Data.exist('/Document/ParentKTMNode[text() = sql:variable("@Param1368320145") or
text() = sql:variable("@Param2043685301") or ...
Implement the IN query with the String fn:contains(...) method like this:
WHERE Data.exist('/Document/Field2[fn:contains(sql:variable("@Param1412022317"), .)]') = 1
Where the given parameter is a (long) string with the values separated by "|"
The problem is that Version 1. doesn't work for more than about 50 arguments. The server throws an out of memory exception. Version 2. works, but is very, very slow.
Does anyone have a third idea? To phrase the problem more completely: Given a list of values, of any SQL native type, select all rows whose xml column has one of the given values at a specific field in the xml.
Try to insert all your parameters in a table and query using sql:column clause:
SELECT Mytable.Column FROM MyTable
CROSS JOIN (SELECT '@Param1' T UNION ALL SELECT '@Param2') B
WHERE Data.exist('/Document/ParentKTMNode[text() = sql:column("T")]') = 1