Analogue of SUM and GROUP BY for string concatenation

Analogue of SUM and GROUP BY for string concatenation - sql

Suppose I have a really long query:
select T.TipoVers As TypeVers,
sum(T.ImpTot) As N,
C.DataCalendario As Date,
from ( SELECT ... )
group by C.DataCalendario, T.TipoVers
This produce output like:
TypeVers N Date
================================
Type1 1 2012-09-10
Type2 47 2012-09-10
Type3 5 2012-09-11
I almost done but the final touch will be: Rows with the same date needs to be concatenate (for string value) and summed (for numeric value - right now this is the only part working), i.e.:
TypeVers N Date
====================================
Type1,Type2 48 2012-09-10
Type3 5 2012-09-11
I have read round here about XML path. The problem with that solution is great amount of code (I should rewrite my query inside the STUFF clause generating really long query).
What alternatives I have?

If you do not want to write your query again, then I would suggesting using a CTE so you can self-reference it in the STUFF/FOR XML PATH query:
;with cte as
(
select typeVers, n, date -- your current query will go here
from yd
)
select STUFF((SELECT DISTINCT ', ' + TypeVers
FROM cte t
WHERE c.Date = t.Date
FOR XML PATH('')), 1, 1, '') TypeVers,
sum(n) n,
date
from cte c
group by date;
See SQL Fiddle with Demo

You don't have to write the long code twice. Write your really long code that gets the result set you want your inner for xml path('') query to work on as a common table expression:
--Creates a common table expression that we can reference more than once in the select.
WITH myLongQueryCTE AS
(
SELECT
ROW_NUMBER() over (order by something) AS SOME_UNIQUE_COLUMN --Put you unique column here or create a new one with row_number
,TipoVers
,ImpTot
,DataCalendario
...
GROUP BY DataCalendario
)
SELECT
STUFF((SELECT ', ' + TipoVers FROM myLongQueryCTE AS a WHERE a.SOME_UNIQUE_COLUMN = b.SOME_UNIQUE_COLUMN FOR XML PATH('')),1,1,'') as TypeVers
,SUM(ImpTot) AS N
,DataCalendario AS Date
FROM myLongQueryCTE AS b

Related

Count Similar Substrings SQL query

I've tried a few scenarios and googled a lot, but still can't find a solution.
I have a table of user names with entries something like the below:
UserName
Cakes420
18Jack01
18Jack04
16Jack22
22Jack16
Mapple7609
Chrom44
chrom22
chrom77
013Cake
016Cake
122Cake
123Cake87
So I need a query that checks for all records that share 4 or more (in sequence) characters in the table.
So I need to return something like :
Characters
Times Used
Names Sharing
Cake
5
Cakes420, 013Cake, 016Cake, 122Cake, 123Cake87
Chro
3
Chrom44, chrom22, chrom77
or anything similar as I'd prefer not to repeat patterns, but hey, at this stage if it returns the values properly, I don't mind.
The shared characters can naturally appear in any place in the string, which is what makes this so difficult.

Should you do this in T-SQL? Probably not.
Can you do this in T-SQL? Yes.
Sample data
create table Names
(
Name nvarchar(20)
);
insert into Names (Name) values
('Cakes420'),
('18Jack01'),
('18Jack04'),
('16Jack22'),
('22Jack16'),
('Mapple7609'),
('Chrom44'),
('chrom22'),
('chrom77'),
('013Cake'),
('016Cake'),
('122Cake'),
('123Cake87');
Solution
Using STRING_AGG() for easy concatenation. Available from SQL Server 2017. Alternatives available for older SQL versions (use the search box on this site, there are many examples).
with rcte as
(
select n.Name,
convert(nvarchar(4), substring(n.Name, 1, 4)) as Part,
1 as PartFrom
from Names n
where len(n.Name) >= 4
union all
select r.Name,
convert(nvarchar(4), substring(r.Name, r.PartFrom+1, r.PartFrom+4)),
r.PartFrom+1
from rcte r
where len(r.Name) >= r.PartFrom+4
),
cte_count as
(
select r.Part,
count(1) as PartCount
from rcte r
where r.Part not like '%[0-9]%' -- exclude parts with numbers in them
group by r.Part
having count(1) > 1
)
select c.Part,
c.PartCount,
string_agg(r.Name, ', ') as Names
from cte_count c
join rcte r
on r.Part = c.Part
group by c.Part,
c.PartCount
order by c.Part;
Result
Part PartCount Names
---- --------- ----------------------------------------------
Cake 5 Cakes420, 123Cake87, 122Cake, 016Cake, 013Cake
Chro 3 Chrom44, chrom22, chrom77
hrom 3 chrom77, chrom22, Chrom44
Jack 4 22Jack16, 16Jack22, 18Jack04, 18Jack01
Fiddle to see it in action with the intermediate CTE results.

Let's use Itzik Ben-Gan's Tally Function to break out a list of substrings, then group them. This is called N-Gram, after the more common Trigram which is 3-character substrings.
I've removed one extra cross-join from the function to speed it up slightly, it's now good for up to varchar(65536):
CREATE OR ALTER FUNCTION dbo.GetNums(#num AS BIGINT)
RETURNS TABLE
AS
RETURN
WITH
L0 AS ( SELECT 1 AS c
FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),
(1),(1),(1),(1),(1),(1),(1),(1)) AS D(c) ),
L1 AS ( SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B ),
L2 AS ( SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B ),
Nums AS ( SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS rownum
FROM L2 )
SELECT TOP(#num)
rownum AS rn
FROM Nums
ORDER BY rownum;
GO
DECLARE #substringLen int = 4;
SELECT
Characters,
[Times Used] = COUNT(*),
[Names Sharing] = STRING_AGG(Username, ', ')
FROM (
SELECT DISTINCT
-- remove DISTINCT if you want to know about multiple in a single username
t.Username,
Characters = SUBSTRING(t.Username, n.rn, #substringLen)
FROM myTable t
CROSS APPLY dbo.GetNums (LEN(t.UserName) - #substringLen + 1) n
) t
GROUP BY t.Characters
HAVING COUNT(*) > 1

How to retrieve single column separated by comma with multiples values, and apply join in sql

I have the following two tables in sql.I want to get the calendarId from calenderschedule and join with calendar table to get the calendarcode for each productId. Output format is described below.
MS SQL Server 2012 version string_split is not working. Please help to get the desired output.
Table1: calenderschedule
productid, calendarid
100 1,2,3
200 1,2
Table2: calendar
calendarid, calendarCode
1 SIB
2 SIN
3 SIS
Output:
productId, calendarCode
100 SIB,SIN,SIS
200 SIB,SIN

You can normalize the data by converting to XML and then using CROSS APPLY to split it. Once it's normalized, use the STUFF function to combine the calendar codes into a comma-separated list. Try this:
;WITH normalized_data as (
SELECT to_xml.productid
,split.split_calendarid
FROM
(
SELECT *,
cast('<X>'+replace(cs.calendarid,',','</X><X>')+'</X>' as XML) as xmlfilter
FROM calendarschedule cs
) to_xml
CROSS APPLY
(
SELECT new.D.value('.','varchar(50)') as split_calendarid
FROM to_xml.xmlfilter.nodes('X') as new(D)
) split
) select distinct
n.productid
,STUFF(
(SELECT distinct ', ' + c.calendarCode
FROM calendar c
JOIN normalized_data n2 on n2.split_calendarid = c.calendarid
WHERE n2.productid = n.productid
FOR XML PATH ('')), 1, 1, '') calendarCode
from normalized_data n
I feel like this solution is a bit overly complex, but it's the only way I got it to work. If anybody knows how to simplify it, I'd love to hear some feedback.

substring query

I want to get the substring out of a cell value wrt following eg-
Input: "J.H.Ambani.School"-----------School
Output: "H.Ambani"-----------------MidName
That is all the text that comes between the first and the last dots. Length of string or number of dots in string can be any. I am trying to form a query for above input column "School" to get the output column "MidName".What can be the sql query for it?

For Oracle Database:
SELECT
REGEXP_REPLACE(yourColumn, '^[^.]*.|.[^.]*$', '') AS yourAlias
FROM yourTable

If is correctly understood your problem by your statement
"That is all the text that comes between the first and the last dots". Then below is solution to your problem is as given below. Below is working solution in SQL SERVER, for other databases i could not check because of lack of time.
#SourceString : this is your input
#DestinationString : this is your output
declare #SourceString varchar(100)='J.H.Ambani.School'
declare #DestinationString varchar(100)
;with result as
(
select ROW_NUMBER()over (order by (select 100))SNO,d from(
select t.c.value('.','varchar(100)')as d from
(select cast('<a>'+replace(#SourceString,'.','</a><a>')+'</a>' as xml)data)as A cross apply data.nodes('/a') as t(c))B
)
select #DestinationString=COALESCE(#DestinationString+'.','')+ISNULL(d,'') from result where SNO>(select top 1 SNO from result order by SNO)
and SNO<(select top 1 SNO from result order by SNO desc)
select #DestinationString

Substring in a column

I have a column that has several items in which I need to count the times it is called, my column table looks something like this:
Table Example
Id_TR Triggered
-------------- ------------------
A1_6547 R1:23;R2:0;R4:9000
A2_1235 R2:0;R2:100;R3:-100
A3_5436 R1:23;R2:100;R4:9000
A4_1245 R2:0;R5:150
And I would like the result to be like this:
Expected Results
Triggered Count(1)
--------------- --------
R1:23 2
R2:0 3
R2:100 2
R3:-100 1
R4:9000 2
R5:150 1
I've tried to do some substring, but cant seem to find how to solve this problem. Can anyone help?

This solution is X3 times faster than the CONNECT BY solution
performance: 15K records per second
with cte (token,suffix)
as
(
select substr(triggered||';',1,instr(triggered,';')-1) as token
,substr(triggered||';',instr(triggered,';')+1) as suffix
from t
union all
select substr(suffix,1,instr(suffix,';')-1) as token
,substr(suffix,instr(suffix,';')+1) as suffix
from cte
where suffix is not null
)
select token,count(*)
from cte
group by token
;

with x as (
select listagg(Triggered, ';') within group (order by Id_TR) str from table
)
select regexp_substr(str,'[^;]+',1,level) element, count(*)
from x
connect by level <= length(regexp_replace(str,'[^;]+')) + 1
group by regexp_substr(str,'[^;]+',1,level);
First concatenate all values of triggered into one list using listagg then parse it and do group by.
Another methods of parsing list you can find here or here

This is a fair solution.
performance: 5K records per second
select triggered
,count(*) as cnt
from (select id_tr
,regexp_substr(triggered,'[^;]+',1,level) as triggered
from t
connect by id_tr = prior id_tr
and level <= regexp_count(triggered,';')+1
and prior sys_guid() is not null
) t
group by triggered
;

This is just for learning purposes.
Check my other solutions.
performance: 1K records per second
select x.triggered
,count(*)
from t
,xmltable
(
'/r/x'
passing xmltype('<r><x>' || replace(triggered,';', '</x><x>') || '</x></r>')
columns triggered varchar(100) path '.'
) x
group by x.triggered
;

Index number for records within a pipe-delimited field inside a csv

I have a csv that I'm bringing into a SQL table. The csv has a field within it for CrimeType. That field is pipe delimited. So, I'm using cross apply to break up the pipe, like this:
SELECT CrimeRecords.CaseNum, CrimeRecords.Offense, PrimaryCrime.PrimaryCrime
FROM (SELECT CaseNum ,x.i.value('.','varchar(20)') AS Offense
FROM (SELECT CaseNum, CONVERT(XML,'<i>'+REPLACE(CrimeType, '|', '</i><i>') + '</i>') AS d
FROM CrimeView.dbo.tblCrimeData)x1 CROSS APPLY d.nodes('i') AS x(i)) AS CrimeRecords
Can someone help me add a step to create a field for a sequence number? Basically I just want to return the order of the items in the pipe.
For rows like:
1, Burglary|Assault
2, Burglary
3, Assault|Assault-Weapon|Theft
My result table would look like this:
CaseNum CrimeType SeqNum
1 Burglary 1
1 Assault 2
2 Burglary 1
3 Assault 1
3 Assault-Weapon 2
3 Theft 3
Edit to show that the Sequence Number resets for each CaseNum.
Edit tags to clarify that this is Microsoft SQL, not MySQL.

Try including the ROW_NUMBER() function in your SELECT statement (http://technet.microsoft.com/en-us/library/ms186734.aspx).
i.e.
SELECT ROW_NUMBER() OVER (PARTITION BY CrimeRecords.CaseNum ORDER BY CrimeRecords.CaseNum) As Idx, CrimeRecords.CaseNum, CrimeRecords.Offense, PrimaryCrime.PrimaryCrime
FROM (SELECT CaseNum ,x.i.value('.','varchar(20)') AS Offense
FROM (SELECT CaseNum, CONVERT(XML,'<i>'+REPLACE(CrimeType, '|', '</i><i>') + '</i>') AS d
FROM CrimeView.dbo.tblCrimeData)x1 CROSS APPLY d.nodes('i') AS x(i)) AS CrimeRecords
Edit: Included Partition By to reset the sequence for each case.

if you have a simple table CrimeRecords like CaseNum | CrimeType
you have to do something like this
SELECT CaseNum,CrimeType, #row:=#row+1 SeqNum
FROM CrimeRecords a JOIN (SELECT #row := 0) b;
ok.. I cant see crearly in your query and i cant try it in a db, buy try to use the query i shared.
It is just an example to show how you can add numbers in order 1,2,3...x from some elements in the rows.. so try to mix it code in your query and reestart the #row each time the group change..
so you ll get it

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Analogue of SUM and GROUP BY for string concatenation - sql

Related

Count Similar Substrings SQL query

How to retrieve single column separated by comma with multiples values, and apply join in sql

substring query

Substring in a column

Index number for records within a pipe-delimited field inside a csv

Categories

Resources