How to find the number of continuous times a character appears in a string - sql

I have a string 11111122111131111111
I want to create an array of the lengths of each consecutive run of 1s, i.e. the first 6 characters are 1s -> two 2s -> four 1s -> one 3 -> seven 1s.
So the output that I want is [6,4,7].
I know how to find the number of times a character appears in a string, but how do I find the lengths of the contiguous runs in which it appears?

The example below is for BigQuery Standard SQL
#standardSQL
WITH `project.dataset.table` AS (
SELECT '11111122111131111111' line
)
SELECT line, ARRAY(SELECT LENGTH(e) FROM UNNEST(REGEXP_EXTRACT_ALL(line, r'1+')) e) result
FROM `project.dataset.table`
with result
[
{
"line": "11111122111131111111",
"result": [
"6",
"4",
"7"
]
}
]
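The same regex idea (extract every maximal run of the character, then take each run's length) can be sketched outside the database. This is only an illustration in Python; the function name run_lengths is made up for this example.

```python
import re

def run_lengths(line: str, ch: str = "1") -> list[int]:
    """Return the length of every maximal run of `ch` in `line`."""
    # re.escape keeps characters such as '*' or '.' literal in the pattern.
    pattern = re.escape(ch) + "+"
    return [len(run) for run in re.findall(pattern, line)]

print(run_lengths("11111122111131111111"))  # [6, 4, 7]
```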

The question does not say which RDBMS is in use; the example below is for SQL Server.
Here we use an ad-hoc tally table (any table of adequate size will do), then apply the standard Gaps-and-Islands technique.
Example
Declare @S varchar(500) = '11111122111131111111'
Declare @C varchar(10) = '1'
Select Seq=Row_Number() over (Order by Seq)
      ,Cnt=count(*)
 From (
        Select N
              ,S = substring(@S,N,1)
              ,Seq = N - Row_Number() over (Order by N)
         From ( Select Top (len(@S))
                       N=Row_Number() Over (Order By (Select NULL))
                 From master..spt_values n1
              ) A
         Where substring(@S,N,1)=@C
      ) A
 Group By Seq
 Order By Seq
Returns
Seq Cnt
1 6
2 4
3 7
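Why the Gaps-and-Islands trick works: among the positions that match the character, "position minus running row number" stays constant inside one run and jumps whenever a gap is crossed, so grouping by that difference isolates each run. A minimal Python sketch of that reasoning (the function name island_lengths is hypothetical, not part of the answer):

```python
from itertools import groupby

def island_lengths(s: str, ch: str) -> list[int]:
    """Lengths of consecutive runs of `ch`, found via the N - ROW_NUMBER() key."""
    # 1-based positions where the character matches (the WHERE clause).
    positions = [n for n, c in enumerate(s, start=1) if c == ch]
    # Position minus running row number: constant within a run.
    keyed = [(n - rn, n) for rn, n in enumerate(positions, start=1)]
    # Keys never decrease, so consecutive grouping is equivalent to GROUP BY.
    return [len(list(group)) for _, group in groupby(keyed, key=lambda p: p[0])]

print(island_lengths("11111122111131111111", "1"))  # [6, 4, 7]
```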

;WITH splitString(val) AS
(
-- convert the string to xml, separating the elements by spaces
SELECT CAST('<r><i>' + REPLACE(@string,' ','</i><i>') + '</i></r>' AS XML)
)
SELECT [Key],
COUNT(*) [WordCount]
FROM ( -- select all of the values from the xml created in the cte
SELECT p.value('.','varchar(100)') AS [Key]
FROM splitString
CROSS APPLY val.nodes('//i') t (p)) AS t
GROUP BY [Key]

Related

Alphanumeric sort on nvarchar(50) column

I am trying to write a query that will return data sorted by an alphanumeric column, Code.
Below is my query:
SELECT *
FROM <<TableName>>
CROSS APPLY (SELECT PATINDEX('[A-Z, a-z][0-9]%', [Code]),
CHARINDEX('', [Code]) ) ca(PatPos, SpacePos)
CROSS APPLY (SELECT CONVERT(INTEGER, CASE WHEN ca.PatPos = 1 THEN
SUBSTRING([Code], 2,ISNULL(NULLIF(ca.SpacePos,0)-2, 8000)) ELSE NULL END),
CASE WHEN ca.PatPos = 1 THEN LEFT([Code],
ISNULL(NULLIF(ca.SpacePos,0)-0,1)) ELSE [Code] END) ca2(OrderBy2, OrderBy1)
WHERE [TypeID] = '1'
OUTPUT:
FFS1
FFS2
...
FFS12
FFS1.1
FFS1.2
...
FFS1.1E
FFS1.1R
...
FFS12.1
FFS12.2
FFS.12.1E
FFS12.1R
FFS12.2E
FFS12.2R
DESIRED OUTPUT:
FFS1
FFS1.1
FFS1.1E
FFS1.1R
....
FFS12
FFS12.1
FFS12.1E
FFS12.1R
What am I missing or overlooking?
EDIT:
Let me try to detail the table contents a little better. There are records for FFS1 - FFS12. Those are broken into X subs, i.e., FFS1.1 - FFS1.X to FFS12.1 - FFS12.X. The E and the R were not typos; each sub record has two codes associated with it: FFS1.1E & FFS1.1R.
Additionally I tried using ORDER BY but it sorted as
FFS1
...
FFS10
FFS2
This will work for any count of parts separated by dots. The sorting is alphanumerical for each part separately.
DECLARE @YourValues TABLE(ID INT IDENTITY, SomeVal VARCHAR(100));
INSERT INTO @YourValues VALUES
('FFS1')
,('FFS2')
,('FFS12')
,('FFS1.1')
,('FFS1.2')
,('FFS1.1E')
,('FFS1.1R')
,('FFS12.1')
,('FFS12.2')
,('FFS.12.1E')
,('FFS12.1R')
,('FFS12.2E')
,('FFS12.2R');
--The query
WITH Splittable AS
(
SELECT ID
,SomeVal
,CAST(N'<x>' + REPLACE(SomeVal,'.','</x><x>') + N'</x>' AS XML) AS Casted
FROM @YourValues
)
)
,Parted AS
(
SELECT Splittable.*
,ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS PartNmbr
,A.part.value(N'text()[1]','nvarchar(max)') AS Part
FROM Splittable
CROSS APPLY Splittable.Casted.nodes(N'/x') AS A(part)
)
,AddSortCrit AS
(
SELECT ID
,SomeVal
,(SELECT LEFT(x.Part + REPLICATE(' ',10),10) AS [*]
FROM Parted AS x
WHERE x.ID=Parted.ID
ORDER BY PartNmbr
FOR XML PATH('')
) AS SortColumn
FROM Parted
GROUP BY ID,SomeVal
)
SELECT ID
,SomeVal
FROM AddSortCrit
ORDER BY SortColumn;
The result
ID SomeVal
10 FFS.12.1E
1 FFS1
4 FFS1.1
6 FFS1.1E
7 FFS1.1R
5 FFS1.2
3 FFS12
8 FFS12.1
11 FFS12.1R
9 FFS12.2
12 FFS12.2E
13 FFS12.2R
2 FFS2
Some explanation:
The first CTE transforms your codes to XML, which makes it possible to address each part separately.
The second CTE returns each part together with a number.
The third CTE re-concatenates your code, with each part padded to a length of 10 characters.
The final SELECT uses this new single-string-per-row in the ORDER BY.
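The padding step can be sketched in a few lines of Python to show what the generated sort column looks like. This is an illustration only: the function name padded_sort_key is made up, and Python compares strings by code point while SQL Server uses the column's collation, so the order can differ for other data.

```python
def padded_sort_key(code: str, width: int = 10) -> str:
    """Pad (or cut) every dot-separated part to `width` characters and re-join."""
    return "".join(part.ljust(width)[:width] for part in code.split("."))

codes = ["FFS1", "FFS2", "FFS12", "FFS1.1", "FFS1.2", "FFS1.1E", "FFS1.1R",
         "FFS12.1", "FFS12.2", "FFS.12.1E", "FFS12.1R", "FFS12.2E", "FFS12.2R"]

for code in sorted(codes, key=padded_sort_key):
    print(code)
```

For this sample the order matches the result listed above. Each part is still compared as text, which is why FFS12 sorts before FFS2.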
Final hint:
This design is bad! You should not store these values in concatenated strings... Store them in separate columns and fiddle them together just for the output/presentation layer. Doing so avoids this rather ugly fiddle...

ORDER BY specific numerical value in string [SQL]

I have a column ID that I would like to ORDER in a specific way. The column has a varchar data type and always starts with an alphabetic character, typically P, followed by three to four digits, possibly followed by an underscore or another alphabetic character. I have tried a few options and none of them return what I want.
SELECT [ID] FROM MYTABLE
ORDER BY <one of the options below>
Options tried:
(1) LEN(ID), ID ASC
(2) LEFT(ID,2)
(3) SUBSTRING(ID,2,4) ASC
(4) ROW_NUMBER() OVER (ORDER BY SUBSTRING(ID,2,4))
(5) SUBSTRING(ID,PATINDEX('%[0-9]%',ID),LEN(ID))
(6) LEFT(ID, PATINDEX('%[0-9]%', ID)-1)
Option 1 seems to be closest to what I am looking for, except when an _ or an alphabetic character follows the numeric value. See the results from Option 1 below:
P100
P208
P218
P301
P305
P306
P4200
P4510
P4511
P4512
P5011
P1400A
P4125H
P4202A
P4507L
P4706A
P1001_2
P2103_B
P4368_RL
I would like to see:
P100
P208
P218
P301
P305
P306
P1001_2
P1400A
P2103_B
P4125H
P4200
P4202A
P4368_RL
P4507L
P4510
P4511
P4512
P4706A
P5011
ORDER BY
CAST(SUBSTRING(id, 2, 4) AS INT),
SUBSTRING(id, 6, 3)
http://sqlfiddle.com/#!6/9eecb7db59d16c80417c72d1e1f4fbf1/9464
And one that's still less complex than a getOnlyNumbers() UDF, but copes with varying length of numeric part.
CROSS APPLY
(
SELECT
tail_start = PATINDEX('%[0-9][^0-9]%', id + '_')
)
stats
CROSS APPLY
(
SELECT
numeric = CAST(SUBSTRING(id, 2, stats.tail_start-1) AS INT),
alpha = RIGHT(id, LEN(id) - stats.tail_start)
)
id_tuple
ORDER BY
id_tuple.numeric,
id_tuple.alpha
http://sqlfiddle.com/#!6/9eecb7db59d16c80417c72d1e1f4fbf1/9499
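The idea behind these queries (sort by the integer that follows the prefix character, then by whatever trails it) can be sketched in Python. It is not a translation of the PATINDEX logic; the function name id_sort_key is hypothetical, and the sketch assumes the digits directly follow the first character.

```python
import re

def id_sort_key(id_value: str):
    """Sort key (has_number, number, trailing text) for IDs such as 'P4202A'."""
    # Skip the first character, then capture the leading digits and the rest.
    match = re.match(r".(\d*)(.*)", id_value, flags=re.DOTALL)
    if match is None:  # empty string: there is no prefix character at all
        return (False, 0, "")
    digits, tail = match.groups()
    # IDs without digits sort first, mirroring how SQL Server orders NULLs.
    return (bool(digits), int(digits) if digits else 0, tail)

ids = ["P100", "P208", "P218", "P301", "P305", "P306", "P4200", "P4510",
       "P4511", "P4512", "P5011", "P1400A", "P4125H", "P4202A", "P4507L",
       "P4706A", "P1001_2", "P2103_B", "P4368_RL"]

for value in sorted(ids, key=id_sort_key):
    print(value)
```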
Finally, one that can cope with there being no number at all (but still assumes the first character exists and should be ignored).
CROSS APPLY
(
SELECT
tail_start = NULLIF(PATINDEX('%[0-9][^0-9]%', id + '_'), 0)
)
stats
CROSS APPLY
(
SELECT
numeric = CAST(SUBSTRING(id, 2, stats.tail_start-1) AS INT),
alpha = RIGHT(id, LEN(id) - ISNULL(stats.tail_start, 1))
)
id_tuple
ORDER BY
id_tuple.numeric,
id_tuple.alpha
http://sqlfiddle.com/#!6/9eecb7db59d16c80417c72d1e1f4fbf1/9507
This is a rather strange way to sort but now that I understand it I figured out a solution. I am using a table valued function here to strip out only the numbers from a string. Since the function returns all numeric characters I also need to check for the _ and only pass in the part of the string before that.
Here is the function.
create function GetOnlyNumbers
(
@SearchVal varchar(8000)
) returns table as return
with MyValues as
(
select substring(@SearchVal, N, 1) as number
, t.N
from cteTally t
where N <= len(@SearchVal)
and substring(@SearchVal, N, 1) like '[0-9]'
)
select distinct NumValue = STUFF((select number + ''
from MyValues mv2
order by mv2.N
for xml path('')), 1, 0, '')
from MyValues mv
This function is using a tally table. If you have one you can tweak that code slightly to fit. Here is my tally table. I keep it as a view.
create View [dbo].[cteTally] as
WITH
E1(N) AS (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)),
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS
(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
)
select N from cteTally
GO
Next of course we need to have some data to work with. In this case I just created a table variable to represent your actual table.
declare @Something table
(
SomeVal varchar(10)
)
insert @Something values
('P100')
, ('P208')
, ('P218')
, ('P301')
, ('P305')
, ('P306')
, ('P4200')
, ('P4510')
, ('P4511')
, ('P4512')
, ('P5011')
, ('P1400A')
, ('P4125H')
, ('P4202A')
, ('P4507L')
, ('P4706A')
, ('P1001_2')
, ('P2103_B')
, ('P4368_RL')
With all the legwork and setup behind us we can get to the actual query needed to accomplish this.
select s.SomeVal
from @Something s
cross apply dbo.GetOnlyNumbers(case when charindex('_', s.SomeVal) = 0 then s.SomeVal else left(s.SomeVal, charindex('_', s.SomeVal) - 1) end) x
order by convert(int, x.NumValue)
This returns the rows in the order you listed them in your question.
You can break down ID in steps to extract the number. Then, order by the number and ID. I like to break down long string manipulation into steps using CROSS APPLY. You can do it inline (it'd be long) or bundle it into an inline TVF.
SELECT t.*
FROM MYTABLE t
CROSS APPLY (SELECT NoP = STUFF(ID, 1, 1, '')) nop
CROSS APPLY (SELECT FindNonNumeric = LEFT(NoP, ISNULL(NULLIF(PATINDEX('%[^0-9]%', NoP)-1, -1), LEN(NoP)))) fnn
CROSS APPLY (SELECT Number = CONVERT(INT, FindNonNumeric)) num
ORDER BY Number
, ID;
I think your best bet is to create a function that strips the numbers out of the string, like this one, and then sort by that. Even better, as @SeanLange suggested, would be to use that function to store the number value in a new column and sort by that.

Query Split string into rows

I have a table that looks like this:
ID Value
1 1,10
2 7,9
I want my result to look like this:
ID Value
1 1
1 2
1 3
1 4
1 5
1 6
1 7
1 8
1 9
1 10
2 7
2 8
2 9
I'm after both a range between 2 numbers with , as the delimiter (there can only be one delimiter in the value) and how to split this into rows.
Splitting the comma separated numbers is a small part of this problem. The parsing should be done in the application and the range stored in separate columns. For more than one reason: Storing numbers as strings is a bad idea. Storing two attributes in a single column is a bad idea. And, actually, storing unsanitized user input in the database is also often a bad idea.
In any case, one way to generate the list of numbers is to use a recursive CTE:
with t as (
select t.*, cast(left(value, charindex(',', value) - 1) as int) as first,
cast(substring(value, charindex(',', value) + 1, 100) as int) as last
from table t
),
cte as (
select t.id, t.first as value, t.last
from t
union all
select cte.id, cte.value + 1, cte.last
from cte
where cte.value < cte.last
)
select id, value
from cte
order by id, value;
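You may need to raise MAXRECURSION if the ranges are really big: SQL Server stops a recursive CTE after 100 levels by default, so a range of more than about 100 numbers needs OPTION (MAXRECURSION n) on the final SELECT.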
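If the parsing is moved into the application, as recommended above, the expansion itself is short. A minimal Python sketch, assuming each value holds exactly two integers separated by one comma (the function name expand_ranges is made up for this example):

```python
def expand_ranges(rows):
    """Turn (id, 'first,last') pairs into one (id, value) row per number."""
    expanded = []
    for row_id, value in rows:
        # Exactly one comma is assumed, as stated in the question.
        first, last = (int(part) for part in value.split(","))
        expanded.extend((row_id, n) for n in range(first, last + 1))
    return sorted(expanded)

for row in expand_ranges([(1, "1,10"), (2, "7,9")]):
    print(row)
```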
Any table that has a field holding multiple values like this is a design problem. The only ways to deal with these records as they are: split the values on the delimiter and put them into a temporary table, implement custom splitting code, use a CTE as noted, or redesign the original table so that the comma-delimited values go into separate fields, e.g.
ID LOWLIMIT HILIMIT
1 1 10
Similar to Gordon Linoff's variant, but with some differences:
--create table variable for data sample
DECLARE @Yourdata AS TABLE ( id INT, VALUE VARCHAR(20) )
INSERT @Yourdata
( id, VALUE )
VALUES ( 1, '1,10' ),
( 2, '7,9' )
--final query
;WITH Tally
AS ( SELECT MIN(CONVERT(INT, SUBSTRING(y.VALUE, 1, CHARINDEX(',', y.value) - 1))) AS MinV ,
MAX(CONVERT(INT, SUBSTRING(y.VALUE, CHARINDEX(',', y.value) + 1, 18))) AS MaxV
FROM @Yourdata AS y
UNION ALL
SELECT MinV = MinV + 1 , MaxV
FROM Tally
WHERE MinV < Maxv
)
SELECT y.id , t.minV AS value
FROM @Yourdata AS y
JOIN tally AS t ON t.MinV BETWEEN CONVERT(INT, SUBSTRING(y.VALUE, 1, CHARINDEX(',', y.value) - 1))
AND CONVERT(INT, SUBSTRING(y.VALUE, CHARINDEX(',', y.value) + 1, 18))
ORDER BY id, minV
OPTION ( MAXRECURSION 999 ) --change it if required

Find all possible combinations of array without permutations

Input is an array of 'n' length.
I need all combinations inside this array stored into new array.
IN: j='{A, B, C ..}'
OUT: k='{A, B, C, AB, AC, BC, ABC ..}'
Without repetitions, so without BA, CA etc.
Generic solution using a recursive CTE
Works for any number of elements and any base data type that supports the > operator.
WITH RECURSIVE t(i) AS (SELECT * FROM unnest('{A,B,C}'::text[])) -- provide array
, cte AS (
SELECT i::text AS combo, i, 1 AS ct
FROM t
UNION ALL
SELECT cte.combo || t.i::text, t.i, ct + 1
FROM cte
JOIN t ON t.i > cte.i
)
SELECT ARRAY (
SELECT combo
FROM cte
ORDER BY ct, combo
) AS result;
Result is an array of text in the example.
Note that you can have any number of additional non-recursive CTEs when using the RECURSIVE keyword.
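The recursion can be read as "start with every single element, then repeatedly append only elements greater than the last one appended". A rough Python sketch of that reading, assuming unique, comparable elements (the function name combos_by_extension is hypothetical):

```python
def combos_by_extension(items):
    """Build combinations level by level, like the recursive CTE above."""
    items = list(items)
    # Seed level: one-element combinations as (combo, last_element, count).
    level = [(x, x, 1) for x in items]
    found = []
    while level:
        found.extend(level)
        # Extend each combination only with elements greater than its last
        # one, which rules out reorderings such as BA or CA.
        level = [(combo + x, x, ct + 1)
                 for combo, last, ct in level
                 for x in items if x > last]
    found.sort(key=lambda row: (row[2], row[0]))  # ORDER BY ct, combo
    return [combo for combo, _, _ in found]

print(combos_by_extension(["A", "B", "C"]))
```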
More generic yet
If any of the following apply:
Array elements are non-unique (like '{A,B,B}').
The base data type does not support the > operator (like json).
Array elements are very big - for better performance.
Use a row number instead of comparing elements:
WITH RECURSIVE t AS (
SELECT i::text, row_number() OVER () AS rn
FROM unnest('{A,B,B}'::text[]) i -- duplicate element!
)
, cte AS (
SELECT i AS combo, rn, 1 AS ct
FROM t
UNION ALL
SELECT cte.combo || t.i, t.rn, ct + 1
FROM cte
JOIN t ON t.rn > cte.rn
)
SELECT ARRAY (
SELECT combo
FROM cte
ORDER BY ct, combo
) AS result;
Or use WITH ORDINALITY in Postgres 9.4+:
PostgreSQL unnest() with element number
Special case: generate decimal numbers
To generate decimal numbers with 5 digits along these lines:
WITH RECURSIVE t AS (
SELECT i
FROM unnest('{1,2,3,4,5}'::int[]) i
)
, cte AS (
SELECT i AS nr, i
FROM t
UNION ALL
SELECT cte.nr * 10 + t.i, t.i
FROM cte
JOIN t ON t.i > cte.i
)
SELECT ARRAY (
SELECT nr
FROM cte
ORDER BY nr
) AS result;
SQL Fiddle demonstrating all.
If n is small (say n < 20), all possible combinations can be found using a bitmask approach. There are 2^n different combinations, and each number from 0 to 2^n - 1 represents one of them.
E.g. for n = 3:
0 = 000b represents {}, the empty combination
2^3 - 1 = 7 = 111b represents all elements, abc
Pseudocode:
for b=0 to 2^n - 1 do   # each combination
  res=""
  for i=0 to (n-1) do   # which elements are included
    if ((b & (1<<i)) != 0)
      res=res+arr[i]
    end
  end
  print res
end
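A runnable version of the bitmask idea, as a minimal Python sketch (the function name combos_by_bitmask is made up; the empty combination for mask 0 is skipped because the question does not ask for it):

```python
def combos_by_bitmask(arr):
    """Return every non-empty combination of `arr`, one per bitmask value."""
    n = len(arr)
    combos = []
    for mask in range(1, 1 << n):  # 1 .. 2^n - 1; mask 0 is the empty set
        # Bit i set in the mask means arr[i] is part of this combination.
        combos.append("".join(arr[i] for i in range(n) if mask & (1 << i)))
    return combos

print(combos_by_bitmask(["A", "B", "C"]))
# ['A', 'B', 'AB', 'C', 'AC', 'BC', 'ABC']
```

The combinations come out in mask order, not grouped by length; sort the result by length if the order from the question is needed.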

Select records where column has n character occurrences

I was wondering if this is possible in sqlite.
SELECT * FROM tbl WHERE substr_count(f, '*') = 5
It should return records that have 5 asterisks in the "f" column, like
a*b**c**
****a*
and so on
SELECT * FROM tbl WHERE length(f) - length(replace(f,'*','')) = 5
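The count comes from comparing the string's length with its length after the character has been removed, so both operands need length(). A quick check using Python's built-in sqlite3 module, with a hypothetical in-memory table tbl and made-up sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (f TEXT)")
conn.executemany(
    "INSERT INTO tbl (f) VALUES (?)",
    [("a*b**c**",), ("****a*",), ("a*b",), ("no stars",)],
)

# Original length minus length without asterisks = number of asterisks.
query = """
    SELECT f FROM tbl
    WHERE length(f) - length(replace(f, '*', '')) = 5
    ORDER BY rowid
"""
rows = [row[0] for row in conn.execute(query)]
print(rows)  # ['a*b**c**', '****a*']
```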
This solution is easy if you have a tally or numbers table which simply contains a sequential list of integers. This would be a table you populated once but has many uses. With that you have:
Create Table Tally ( N int );
Insert Tally( N )
...
Select Z.<PrimaryKeyCol>, Sum( Z.Val )
From (
Select <PrimaryKeyCol>, 1 As Val
From tbl
Cross Join Tally As T
Where substr( tbl.f, T.N, 1 ) = '*'
) As Z
Group By Z.<PrimaryKeyCol>
Having Sum( Z.Val ) = 5
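To try the tally-table variant end to end, here is a sketch using Python's sqlite3 module. Everything outside the query shape is an assumption for illustration: the id primary key column, the sample rows, and the recursive CTE that fills Tally with 1..100 (the answer does not show how it populates the table).

```python
import sqlite3

tally_conn = sqlite3.connect(":memory:")
tally_conn.executescript("""
    CREATE TABLE tbl (id INTEGER PRIMARY KEY, f TEXT);
    INSERT INTO tbl (f) VALUES ('a*b**c**'), ('****a*'), ('a*b');

    CREATE TABLE Tally (N INTEGER);
    -- Assumed population step: the integers 1..100.
    WITH RECURSIVE seq(n) AS (
        SELECT 1 UNION ALL SELECT n + 1 FROM seq WHERE n < 100
    )
    INSERT INTO Tally (N) SELECT n FROM seq;
""")

# One row per (record, position) where that position holds an asterisk.
tally_query = """
    SELECT Z.id, SUM(Z.Val)
    FROM (
        SELECT tbl.id, 1 AS Val
        FROM tbl
        CROSS JOIN Tally AS T
        WHERE substr(tbl.f, T.N, 1) = '*'
    ) AS Z
    GROUP BY Z.id
    HAVING SUM(Z.Val) = 5
    ORDER BY Z.id
"""
tally_rows = tally_conn.execute(tally_query).fetchall()
print(tally_rows)  # [(1, 5), (2, 5)]
```

Tally has to contain at least as many integers as the longest string has characters, otherwise asterisks beyond that position are not counted.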