How to group through a string part? - sql

I've a table which contains logs from a web portal, it contains the url visited, the request duration, the referer...
One of these columns is the path info and contains strings like following:
/admin/
/export/
/project2/
/project1/news
/project1/users
/user/id/1
/user/id/1/history
/user/id/2
/forum/topic/14/post/456
I would like to calculate some stats on this column with SQL queries, so how can I build an aggregate based on the first part of the path info?
It would let me count the number of URLs starting with /admin/, /export/, /project1/, /project2/, /user/, /forum/, ...
Doing this in a programming language would be easy with a regex, but I read that no similar function exists in SQL Server.

I would use CHARINDEX() with a start position of 2 to find the first "/" AFTER the leading '/' character, then use LEFT() to strip off anything after that second slash.
select
LEFT( pathInfo, CHARINDEX( '/', pathInfo, 2 )) as RootLevelPath,
count(*) as Hits
from
temp
group by
LEFT( pathInfo, CHARINDEX( '/', pathInfo, 2 ))
Working result from SQLFiddle
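For checking the expected counts outside the database, the same LEFT/CHARINDEX logic can be sketched in Python. This is only an illustration: the function name `root_level_path` is made up here, and unlike the SQL it falls back to the whole path when there is no second slash.

```python
from collections import Counter


def root_level_path(path_info: str) -> str:
    """Return the path up to and including the second '/'."""
    # find(..., 1) skips the leading '/', like CHARINDEX with start position 2.
    second_slash = path_info.find("/", 1)
    # Assumption: keep the whole path when no second '/' exists.
    if second_slash == -1:
        return path_info
    return path_info[: second_slash + 1]


paths = [
    "/admin/", "/export/", "/project2/", "/project1/news", "/project1/users",
    "/user/id/1", "/user/id/1/history", "/user/id/2", "/forum/topic/14/post/456",
]

# Equivalent of GROUP BY RootLevelPath with COUNT(*).
hits = Counter(root_level_path(p) for p in paths)
for root, count in sorted(hits.items()):
    print(root, count)
```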

DRapp's is perfect for grouping on the first fragment of the URL. If you need to group by other levels it might get unwieldy to manage the nested LEFT/CHARINDEX statements.
Here's one way to group by a parameterized level:
declare @t table (pathId int identity(1,1) primary key, somePath varchar(100));
insert into @t
select '/admin/' union all
select '/export/' union all
select '/project2/' union all
select '/project1/news' union all
select '/project1/users' union all
select '/user/id/1' union all
select '/user/id/1/history' union all
select '/user/id/2' union all
select '/forum/topic/14/post/456' union all
select '/forum/topic/14/post/789' union all
select '/forum/topic/14/post/789'
declare @level int = 1;
;with fragments as
( select pathId,
[n] = x.query('.'),
[Fragment] = x.value('.', 'varchar(100)')
from ( select PathId,
cast('<r>' + replace(stuff(somePath, 1, 1, ''), '/', '</r><r>') + '</r>' as xml)
.query('r[position()<=sql:variable("@level")]')
from @t
) d (PathId, X)
)
select count(*), [path] = max(r.v)
from fragments f
cross
apply ( select '/' + p.n.value('.', 'varchar(100)')
from fragments
cross
apply n.nodes('r')p(n)
where PathId = f.PathId
for xml path('')
) r(v)
group
by fragment;
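The idea of grouping by a chosen number of leading fragments can also be sketched in Python, which may help when verifying what the XML query should return. The helper name `path_prefix` is hypothetical, and this sketch drops empty fragments, so a trailing slash is not preserved.

```python
from collections import Counter


def path_prefix(path: str, level: int) -> str:
    """Return the first `level` fragments of a path, joined with '/'."""
    # Assumption: empty fragments (from leading or trailing '/') are ignored.
    fragments = [f for f in path.split("/") if f]
    return "/" + "/".join(fragments[:level])


paths = [
    "/admin/", "/export/", "/project2/", "/project1/news", "/project1/users",
    "/user/id/1", "/user/id/1/history", "/user/id/2",
    "/forum/topic/14/post/456", "/forum/topic/14/post/789",
    "/forum/topic/14/post/789",
]

# Group by the first two fragments instead of only the first.
counts_level_2 = Counter(path_prefix(p, 2) for p in paths)
for prefix, count in sorted(counts_level_2.items()):
    print(prefix, count)
```

Changing the second argument plays the same role as the level variable in the query above.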

Related

Query everything that comes after the '@'

I am writing a new query but got stuck on SQL string functions. I have some records containing email addresses, and all I want is to return everything that comes after the '@'.
For example:
cesarcastillo88@hotmail.com ==> as a result I should get the following: hotmail.com.
This was not complicated at all, because the record contains only one email.
But what if the record includes the following emails:
cesarcastillo88@hotmail.com ; laura23@gmail.com ; test@compliance.com
I did it perfectly for the cases with only 1 email in a single record.
I used the following formula:
substring(columnName, charindex('@', sfe.columnName), len(sfe.columnName))
However, how am I supposed to do it with 3 emails in a single record?
My desired outcome is the following:
hotmail.com ; gmail.com ; compliance.com
Here is a possible solution based on the assumption that you have some sort of ID column that could help to identify each unique row:
;with smpl as (
select *
from (values
(1, 'cesarcastillo88@hotmail.com ; laura23@gmail.com ; test@compliance.com'),
(2, 'abc@cde.net'),
(3, 'laura23@gmail.com ; test@compliance.com')) x(id, email)
), split(id, A, B) as (
select distinct id, CAST(LEFT(email, CHARINDEX(';',email+';')-1) as varchar(100)),
CAST(STUFF(email, 1, CHARINDEX(';',email+';'), '') as varchar(100))
from smpl
union all
select id, CAST(LEFT(B, CHARINDEX(';',B+';')-1) as varchar(100)),
CAST(STUFF(B, 1, CHARINDEX(';',B+';'), '') as varchar(100))
from split
where B > ''
), clr as (
select ID, substring(LTRIM(RTRIM(A)), charindex('@', LTRIM(RTRIM(A))) + 1, len(LTRIM(RTRIM(A)))) cleanEmail
--into #tempTbl
from split
), ccat as (
SELECT DISTINCT ST2.ID,
SUBSTRING(
(
SELECT ';'+ST1.cleanEmail AS [text()]
FROM clr ST1
WHERE ST1.ID = ST2.ID
ORDER BY ST1.ID
FOR XML PATH ('')
), 2, 1000) Emails
FROM clr ST2
)
select * from ccat
And here is some explanation on how this all works:
The split CTE breaks each record into separate rows, one per email, using ; as the separator.
The clr CTE is based on your formula: it removes the recipient from each email address and leaves only the domain.
The ccat CTE concatenates everything back, using the same ; as the separator. Feel free to add extra spaces around it if that's your preferred output.
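The three steps described above (split, strip the recipient, concatenate) can be sketched in Python to check the expected result for a single record. This is a rough illustration that assumes the addresses use the usual @ between recipient and domain; the function name `domains_only` is invented for the example.

```python
def domains_only(record: str, sep: str = ";") -> str:
    """Reduce a separator-delimited list of emails to a list of domains."""
    # Step 1: split the record into individual addresses (the split CTE).
    addresses = [part.strip() for part in record.split(sep) if part.strip()]
    # Step 2: keep only what follows the first '@' (the clr CTE).
    domains = [addr.split("@", 1)[1] for addr in addresses if "@" in addr]
    # Step 3: join the domains back with the same separator (the ccat CTE).
    return sep.join(domains)


record = "cesarcastillo88@hotmail.com ; laura23@gmail.com ; test@compliance.com"
print(domains_only(record))
```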
You don't say what version of SQL Server, but I'll assume 2016 or newer. The key is the STRING_SPLIT function. To join it to your data, you'll want to use CROSS APPLY.
create table #a (
id int identity(1,1),
email varchar(max)
)
insert #a
values ('cesarcastillo88@hotmail.com ; laura23@gmail.com ; test@compliance.com')
, ('dannyboy@irish.com')
select id
, email
, substring(email, CHARINDEX('@', email) + 1, len(email)) as domain
from #a
select a.id
, substring(ltrim(rtrim(b.value)), CHARINDEX('@', ltrim(rtrim(b.value))) + 1, len(ltrim(rtrim(b.value)))) as domain
from #a a
cross apply string_split(email, ';') b
drop table #a

How to split multiple strings and insert SQL Server FN_SplitStr

I have 2 strings and one integer:
@categoryID int = 163,
@Ids nvarchar(2000) = '1,2,3',
@Names nvarchar(2000) = 'Bob,Joe,Alex'
I need to select 3 columns and 3 rows. The closest I have got is 3 rows and 2 columns:
select @categoryID,items from FN_SplitStr(@Ids,',')
resulting:
163,1
163,2
163,3
But I can't figure out how to split both strings.
I tried many ways like:
select @categoryID,items from FN_SplitStr((@Ids,@Names),',')
select @categoryID,items from FN_SplitStr(@Ids,','),items from FN_SplitStr(@Names,',')
EXPECTED OUTPUT:
163,1,Bob
163,2,Joe
163,3,Alex
NOTE1: I looked over tens of questions the most similar is:
How to split string and insert values into table in SQL Server AND SQL Server : split multiple strings into one row each but this question is different.
NOTE2: FN_SplitStr is a function for splitting strings in SQL, and I'm trying to create a stored procedure.
Based on your expected output, you have to use cross apply and then create some sort of ranking to make sure that you get the right value. As IDs and Names don't have any relationship, cross apply creates every combination of the two (one row per ID for each name).
There might be a better way, but this returns one row per ID/name pair. You can swap string_split for your local function.
The first DENSE_RANK numbers the unique names, and the second DENSE_RANK ranks the IDs within each name; outside the subquery, comparing the two ranks keeps only 3 rows. Be aware that both ranks follow sort order, not list position, so with this sample the names come back paired alphabetically (1 with Alex, 2 with Bob, 3 with Joe).
Declare @categoryID int = 163,
@Ids nvarchar(2000) = '1,2,3',
@Names nvarchar(2000) = 'Bob,Joe,Alex'
select ConcatenatedValue, CategoryID, IDs, Names from (
select concat(@categoryID,',',a.value,',',b.value) ConcatenatedValue, @categoryID CategoryID,
A.value as IDs, b.value as Names , DENSE_RANK() over (order by b.value) as Rn,
DENSE_RANK() over (partition by b.value order by a.value) as Ranked
from string_split(@IDs,',') a
cross apply string_split(@names,',') B ) t
where Rn - Ranked = 0
Inside your stored procedure, do a string split of @Ids and insert the result into a #temp1 table with an identity(1,1) column rowid. You will get:
163,1,1
163,2,2
163,3,3
Then do the second string split of @Names and insert the result into a #temp2 table with an identity(1,1) column rowid. You will get:
Bob,1
Joe,2
Alex,3
You can then do an inner join with #temp1 and #temp2 on #temp1.rowid = #temp2.rowid and get:
163,1,Bob
163,2,Joe
163,3,Alex
I hope this solves your problem.
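The row-id join described in this answer amounts to pairing the two lists by position. A small Python sketch of that idea, useful for checking the expected rows, is below; the function name `pair_by_position` is hypothetical, and the length check is an added assumption rather than part of the answer.

```python
def pair_by_position(category_id: int, ids_csv: str, names_csv: str):
    """Pair the n-th id with the n-th name, like joining on an identity column."""
    ids = ids_csv.split(",")
    names = names_csv.split(",")
    # Assumption: both lists must have the same number of items.
    if len(ids) != len(names):
        raise ValueError("ids and names must contain the same number of items")
    # zip walks both lists in step, which is what the rowid join achieves.
    return [(category_id, int(i), name) for i, name in zip(ids, names)]


rows = pair_by_position(163, "1,2,3", "Bob,Joe,Alex")
for row in rows:
    print(row)
```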
You can do this with a recursive CTE:
with cte as (
select @categoryId as categoryId,
convert(varchar(max), left(@ids, charindex(',', @ids + ',') - 1)) as id,
convert(varchar(max), left(@names, charindex(',', @names + ',') - 1)) as name,
convert(varchar(max), stuff(@ids, 1, charindex(',', @ids + ','), '')) as rest_ids,
convert(varchar(max), stuff(@names, 1, charindex(',', @names + ','), '')) as rest_names
union all
select categoryId,
convert(varchar(max), left(rest_ids, charindex(',', rest_ids + ',') - 1)) as id,
convert(varchar(max), left(rest_names, charindex(',', rest_names + ',') - 1)) as name,
convert(varchar(max), stuff(rest_ids, 1, charindex(',', rest_ids + ','), '')) as rest_ids,
convert(varchar(max), stuff(rest_names, 1, charindex(',', rest_names + ','), '')) as rest_names
from cte
where rest_ids <> ''
)
select categoryid, id, name
from cte;
Here is a db<>fiddle.
You need to split each CSV value together with a record number. Use the ROW_NUMBER() function to generate a per-record ID column such as "RID" while you split the CSV values into rows, then join the two results on it.
You can use a table-valued split function, or XML as used below.
Please check this and let us know whether it solves your problem.
DECLARE
@categoryID int = 163,
@Ids nvarchar(2000) = '1,2,3',
@Names nvarchar(2000) = 'Bob,Joe,Alex'
SELECT
@categoryID AS categoryID,
q.Id,
w.Names
FROM
(
SELECT
-- ORDER BY (SELECT NULL) keeps the XML node order in practice (not formally guaranteed);
-- ordering by the value itself would pair sorted IDs with sorted names instead of by position
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS RID,
f.value('.','VARCHAR(10)') AS Id
FROM
(
SELECT
CAST('<a>' + REPLACE(@Ids,',','</a><a>') + '</a>' AS XML) AS idXML
) x
CROSS APPLY x.idXML.nodes('a') AS e(f)
) q
INNER JOIN
(
SELECT
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS RID,
h.value('.','VARCHAR(10)') AS Names
FROM
(
SELECT
CAST('<a>' + REPLACE(@Names,',','</a><a>') + '</a>' AS XML) AS namesXML
) y
CROSS APPLY y.namesXML.nodes('a') AS g(h)
) w ON w.RID = q.RID

Select text from between two characters in string

I have data in a database; an example is below:
folder/subfolder/file/doc
folder/subfolder/doc
How do I get the first instance of the characters between the '/'?
I want to extract 'folder/subfolder'.
I have tried the following, but it is not what I need; it returns 'folder/':
LEFT([Cat], CHARINDEX('/', [Cat]) ) as 'doc_cat',
and the one below returns the last part:
RIGHT([Cat], CHARINDEX('/', [Cat]) ) as 'doc_cat2',
I want to get the first and second parts of the string.
Here is one method:
select left(doc_cat_1, charindex('/', doc_cat_1) - 1)
from t cross apply
(select stuff(cat, 1, charindex('/', cat), '') as doc_cat_1
) v1;
The string handling capabilities of SQL Server are pretty lousy. Apply at least makes it easier to handle intermediate results.
You can use LEFT and CHARINDEX
LEFT([Cat],charindex('/',[Cat],charindex('/',[Cat])+1)-1) AS 'doc_cat'
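The nested CHARINDEX call finds the second '/' by starting the search just after the first one. A Python sketch of the same lookup is below, for checking expected values; the name `first_two_parts` is made up, and returning the whole string when there are fewer than two slashes is an assumption the SQL expression does not make.

```python
def first_two_parts(cat: str) -> str:
    """Return everything before the second '/' in the string."""
    first = cat.find("/")
    # Search for the second '/' starting right after the first one.
    second = cat.find("/", first + 1) if first != -1 else -1
    # Assumption: with fewer than two '/' characters, keep the whole string.
    return cat if second == -1 else cat[:second]


for value in ["folder/subfolder/file/doc", "folder/subfolder/doc"]:
    print(first_two_parts(value))
```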
One more way to accomplish this, using XML:
declare @s table(patterns nvarchar(100))
insert into @s
values ('folder/subfolder/file/doc'), ('folder/subfolder/doc'),('folder/subfolder')
select cast(concat('<x>', REPLACE(patterns, '/', '</x><x>'), '</x>') as xml).value('/x[1]','varchar(100)') + '/'
+ cast(concat('<x>', REPLACE(patterns, '/', '</x><x>'), '</x>') as xml).value('/x[2]','varchar(100)')
from @s
If you're on SQL 2016 or newer, you could use STRING_SPLIT()
WITH cte AS (
SELECT cat, value, ROW_NUMBER() OVER (PARTITION BY cat ORDER BY cat) rn
FROM someTable CROSS APPLY
STRING_SPLIT(cat,'/')
)
SELECT cat, value FROM cte WHERE rn = 2;
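The advantage here is that rn could be any number you need. Be aware that STRING_SPLIT does not guarantee the order of its output rows, so rn is not certain to match the position in the string; on SQL Server 2022 and later, the optional enable_ordinal argument returns an ordinal column you can filter on instead.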
Fiddle here.

how to extract a particular id from the string using sql

I want to extract particular IDs from the records in a table. For example, I have the table below:
Id stringvalue
1 test (ID 123) where another ID 2596
2 next ID145 and the condition I(ID 635,897,900)
I want the result set as below
ID SV
1 123,2596
2 145,635,897,900
I have tried the query below, which extracts only one ID from the string:
Select Left(substring(string,PATINDEX('%[0-9]%',string),Len(string)),3) from Table1
I seriously don't encourage the T-SQL approach (as SQL is not meant to do this), however, a working version is presented below -
Try this
DECLARE @T TABLE(ID INT IDENTITY,StringValue VARCHAR(500))
INSERT INTO @T
SELECT 'test (ID 123) where another ID 2596' UNION ALL
SELECT 'next ID145 and the condition I(ID 635,897,900)'
;WITH SplitCTE AS(
SELECT
F1.ID,
X.SplitData
,Position = PATINDEX('%[0-9]%', X.SplitData)
FROM (
SELECT *,
CAST('<X>'+REPLACE(REPLACE(StringValue,' ',','),',','</X><X>')+'</X>' AS XML) AS XmlFilter
FROM @T F
)F1
CROSS APPLY
(
SELECT fdata.D.value('.','varchar(50)') AS SplitData
FROM f1.xmlfilter.nodes('X') AS fdata(D)) X
WHERE PATINDEX('%[0-9]%', X.SplitData) > 0),
numericCTE AS(
SELECT
ID
,AllNumeric = LEFT(SUBSTRING(SplitData, Position, LEN(SplitData)), PATINDEX('%[^0-9]%', SUBSTRING(SplitData, Position, LEN(SplitData)) + 't') - 1)
FROM SplitCTE
)
SELECT
ID
,STUFF(( SELECT ',' + c1.AllNumeric
FROM numericCTE c1
WHERE c1.ID = c2.ID
FOR XML PATH(''),TYPE)
.value('.','NVARCHAR(MAX)'),1,1,'') AS SV
FROM numericCTE c2
GROUP BY ID
/*
Result
ID SV
1 123,2596
2 145,635,897,900
*/
However, I completely agree with @Giorgi Nakeuri. It is better to use a programming language (if you have one at your disposal) and a regular expression for this. Note that I have used the REPLACE function twice: first to replace the blank spaces and then to replace the commas (,).
Hope this gives you some idea of how to move on.
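As a rough illustration of the regular-expression route recommended above, here is a minimal Python sketch. It assumes every run of digits in the string counts as an ID, which holds for the sample rows but may be too broad for other data; the function name `extract_ids` is invented for the example.

```python
import re


def extract_ids(text: str) -> str:
    """Collect every run of digits in the text as a comma-separated string."""
    # Assumption: any sequence of digits is an ID.
    return ",".join(re.findall(r"\d+", text))


samples = {
    1: "test (ID 123) where another ID 2596",
    2: "next ID145 and the condition I(ID 635,897,900)",
}
for row_id, string_value in samples.items():
    print(row_id, extract_ids(string_value))
```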

Select rows using in with comma-separated string parameter

I'm converting a stored procedure from MySql to SQL Server. The procedure has one input parameter nvarchar/varchar which is a comma-separated string, e.g.
'1,2,5,456,454,343,3464'
I need to write a query that will retrieve the relevant rows, in MySql I'm using FIND_IN_SET and I wonder what the equivalent is in SQL Server.
I also need to order the ids as in the string.
The original query is:
SELECT *
FROM table_name t
WHERE FIND_IN_SET(id,p_ids)
ORDER BY FIND_IN_SET(id,p_ids);
The equivalent is LIKE for the WHERE clause and then charindex() for the ORDER BY:
select *
from table_name t
where ','+p_ids+',' like '%,'+cast(id as varchar(255))+',%'
order by charindex(',' + cast(id as varchar(255)) + ',', ',' + p_ids + ',');
Well, you could use charindex() for both, but the like will work in most databases.
Note that I've added delimiters to the beginning and end of the string, so 464 will not accidentally match 3464.
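Why the added delimiters matter can be seen in a small Python sketch of the same comparison. The helper names `wrapped_contains` and `wrapped_position` are made up for illustration; they only mimic the LIKE and charindex() expressions and are not SQL Server functions.

```python
def wrapped_contains(item_id: int, id_list: str) -> bool:
    """True if item_id is a whole element of the comma-separated list."""
    # Wrapping both sides in commas prevents partial matches.
    return f",{item_id}," in f",{id_list},"


def wrapped_position(item_id: int, id_list: str) -> int:
    """Offset of the element in the wrapped list, or -1 when absent."""
    return f",{id_list},".find(f",{item_id},")


p_ids = "1,2,5,456,454,343,3464"
print("464" in p_ids)                 # naive substring test matches inside 3464
print(wrapped_contains(464, p_ids))   # delimiter-wrapped test does not
# Sorting by the offset reproduces the order of the ids in the string.
print(sorted([343, 5, 1], key=lambda i: wrapped_position(i, p_ids)))
```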
You would need to write a FIND_IN_SET function, as it does not exist. The closest mechanism I can think of to convert a delimited string into a joinable object would be to create a table-valued function and use the result in a standard IN statement. It would need to be similar to:
DECLARE @MyParam NVARCHAR(3000)
SET @MyParam='1,2,5,456,454,343,3464'
SELECT
*
FROM
MyTable
WHERE
MyTableID IN (SELECT ID FROM dbo.MySplitDelimitedString(@MyParam,','))
And you would need to create a MySplitDelimitedString table-valued function that splits a string and returns a TABLE (ID INT) object.
A set-based solution that splits the ids into ints and joins them with the base table, which lets the query use the index on the base table id. I assumed the id would be an int; otherwise just remove the cast.
declare @ids nvarchar(100) = N'1,2,5,456,454,343,3464';
with nums as ( -- Generate numbers
select top (len(@ids)) row_number() over (order by (select 0)) n
from sys.messages
)
, pos1 as ( -- Get comma positions
select c.ci
from nums n
cross apply (select charindex(',', @ids, n.n) as ci) c
group by c.ci
)
, pos2 as ( -- Distinct positions plus start and end
select ci
from pos1
union select 0
union select len(@ids) + 1
)
, pos3 as ( -- add row number for join
select ci, row_number() over (order by ci) as r
from pos2
)
, ids as ( -- id's and row id for ordering
select cast(substring(@ids, p1.ci + 1, p2.ci - p1.ci - 1) as int) id, row_number() over (order by p1.ci) r
from pos3 p1
inner join pos3 p2 on p2.r = p1.r + 1
)
select *
from ids i
inner join table_name t on t.id = i.id
order by i.r;
If you are on Oracle rather than SQL Server, you can also use a regex to get the input values from the comma-separated string (regexp_substr, dual and connect by are Oracle syntax):
select * from table_name where id in (
select regexp_substr(p_ids,'[^,]+', 1, level) from dual
connect by regexp_substr(p_ids, '[^,]+', 1, level) is not null );