SQL: how to get all the distinct characters in a column, across all rows

Is there an elegant way in SQL Server to find all the distinct characters in a single varchar(50) column, across all rows?
Bonus points if it can be done without cursors :)
For example, say my data contains 3 rows:
productname
-----------
product1
widget2
nicknack3
The distinct inventory of characters would be "productwigenka123"

Here's a query that returns each character as a separate row, along with the number of occurrences, assuming your table is called 'Products':
WITH ProductChars(aChar, remain) AS (
SELECT LEFT(productName,1), RIGHT(productName, LEN(productName)-1)
FROM Products WHERE LEN(productName)>0
UNION ALL
SELECT LEFT(remain,1), RIGHT(remain, LEN(remain)-1) FROM ProductChars
WHERE LEN(remain)>0
)
SELECT aChar, COUNT(*) FROM ProductChars
GROUP BY aChar
To combine them all into a single row (as stated in the question), change the final SELECT to:
SELECT aChar AS [text()] FROM
(SELECT DISTINCT aChar FROM ProductChars) base
FOR XML PATH('')
The above uses a nice hack I found here, which emulates the GROUP_CONCAT from MySQL.
The first level of recursion is unrolled so that the query doesn't return empty strings in the output.
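For anyone who wants to try the character-peeling recursion without a SQL Server instance, here is a minimal sketch in Python using the standard library's sqlite3 module. It is an adaptation, not the query above: SQLite has no LEFT/RIGHT, so substr() is used instead, and the sample rows are just the three from the question.

```python
import sqlite3

# Sample data mirroring the three rows in the question (assumed table layout).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (productName TEXT)")
conn.executemany(
    "INSERT INTO Products VALUES (?)",
    [("product1",), ("widget2",), ("nicknack3",)],
)

# Each recursion step peels one character off the front of what remains.
QUERY = """
WITH RECURSIVE ProductChars(aChar, remain) AS (
    SELECT substr(productName, 1, 1), substr(productName, 2)
    FROM Products WHERE length(productName) > 0
    UNION ALL
    SELECT substr(remain, 1, 1), substr(remain, 2)
    FROM ProductChars WHERE length(remain) > 0
)
SELECT aChar, COUNT(*) FROM ProductChars GROUP BY aChar ORDER BY aChar
"""

char_counts = dict(conn.execute(QUERY).fetchall())
distinct_chars = "".join(sorted(char_counts))
print(distinct_chars)
```

The characters come out sorted rather than in first-seen order, but it is the same set of 17 characters as in the question.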

Use this (it should work, with minor syntax changes, on any RDBMS that supports recursive CTEs):
select x.v into prod from (values('product1'),('widget2'),('nicknack3')) as x(v);
Test Query:
with a as
(
select v, '' as x, 0 as n from prod
union all
select v, substring(v,n+1,1) as x, n+1 as n from a where n < len(v)
)
select v, x, n from a -- where n > 0
order by v, n
option (maxrecursion 0)
Final Query:
with a as
(
select v, '' as x, 0 as n from prod
union all
select v, substring(v,n+1,1) as x, n+1 as n from a where n < len(v)
)
select distinct x from a where n > 0
order by x
option (maxrecursion 0)
Oracle version:
with a(v,x,n) as
(
select v, '' as x, 0 as n from prod
union all
select v, substr(v,n+1,1) as x, n+1 as n from a where n < length(v)
)
select distinct x from a where n > 0

Given that your column is varchar, it means it can only store characters from codes 0 to 255, on whatever code page you have. If you only use the 32-128 ASCII code range, then you can simply see if you have any of the characters 32-128, one by one. The following query does that, looking in sys.objects.name:
with cteDigits as (
select 0 as Number
union all select 1 as Number
union all select 2 as Number
union all select 3 as Number
union all select 4 as Number
union all select 5 as Number
union all select 6 as Number
union all select 7 as Number
union all select 8 as Number
union all select 9 as Number)
, cteNumbers as (
select U.Number + T.Number*10 + H.Number*100 as Number
from cteDigits U
cross join cteDigits T
cross join cteDigits H)
, cteChars as (
select CHAR(Number) as Char
from cteNumbers
where Number between 32 and 128)
select cteChars.Char as [*]
from cteChars
cross apply (
select top(1) *
from sys.objects
where CHARINDEX(cteChars.Char, name, 0) > 0) as o
for xml path('');
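The same "probe each candidate character" idea can be tried out quickly in Python with sqlite3. This is only a sketch under assumptions: it uses SQLite's char() and instr() in place of CHAR and CHARINDEX, restricts itself to the printable ASCII codes 32 to 126, and looks in a made-up Products table rather than sys.objects.

```python
import sqlite3

# Hypothetical sample table; the answer above searches sys.objects.name instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (productName TEXT)")
conn.executemany(
    "INSERT INTO Products VALUES (?)",
    [("product1",), ("widget2",), ("nicknack3",)],
)

# Generate the candidate character codes, then keep each character
# that occurs in at least one row.
QUERY = """
WITH RECURSIVE codes(n) AS (
    SELECT 32
    UNION ALL
    SELECT n + 1 FROM codes WHERE n < 126
)
SELECT char(n) FROM codes
WHERE EXISTS (
    SELECT 1 FROM Products WHERE instr(productName, char(n)) > 0
)
ORDER BY n
"""

present = "".join(row[0] for row in conn.execute(QUERY))
print(present)
```

The cost is one existence check per candidate character, so it only makes sense when the candidate set is small and known up front.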

If you have a Numbers or Tally table which contains a sequential list of integers you can do something like:
Select Distinct '' + Substring(Products.ProductName, N.Value, 1)
From dbo.Numbers As N
Cross Join dbo.Products
Where N.Value <= Len(Products.ProductName)
For Xml Path('')
If you are using SQL Server 2005 and beyond, you can generate your Numbers table on the fly using a CTE:
With Numbers As
(
Select Row_Number() Over ( Order By c1.object_id ) As Value
From sys.columns As c1
Cross Join sys.columns As c2
)
Select Distinct '' + Substring(Products.ProductName, N.Value, 1)
From Numbers As N
Cross Join dbo.Products
Where N.Value <= Len(Products.ProductName)
For Xml Path('')

Building on mdma's answer, this version gives you a single string, but decodes some of the changes that FOR XML will make, like & -> &amp;.
WITH ProductChars(aChar, remain) AS (
SELECT LEFT(productName,1), RIGHT(productName, LEN(productName)-1)
FROM Products WHERE LEN(productName)>0
UNION ALL
SELECT LEFT(remain,1), RIGHT(remain, LEN(remain)-1) FROM ProductChars
WHERE LEN(remain)>0
)
SELECT (
SELECT N'' + aChar AS [text()]
FROM (SELECT DISTINCT aChar FROM ProductChars) base
ORDER BY aChar
FOR XML PATH(''), TYPE).value(N'.[1]', N'nvarchar(max)')
-- No STUFF() here: there is no leading separator to strip, so it would drop the first character.
-- Allow for a lot of recursion. Set to 0 to remove the limit.
OPTION (MAXRECURSION 365)

Related

How do I create a list of all possible anagrams of a word/string in PostgreSQL

How do I create a list of all possible anagrams of a word/string in PostgreSQL.
For example if String is 'act'
then the desired output should be:
act,
atc,
cta,
cat,
tac,
tca
I have one table, 'tbl_words', which contains millions of words.
Then I want to check/search for only valid words in my database table from this anagrams list.
Like from above list of anagrams valid words are : act, cat.
Is there any way to do this?
Update 1:
I need output like this (all permutations of the given word).
Any ideas?
The following query generates all permutations of a 3-element set:
with recursive numbers as (
select generate_series(1, 3) as i
),
rec as (
select i, array[i] as p
from numbers
union all
select n.i, p || n.i
from numbers n
join rec on cardinality(p) < 3 and not n.i = any(p)
)
select p as permutation
from rec
where cardinality(p) = 3
order by 1
permutation
-------------
{1,2,3}
{1,3,2}
{2,1,3}
{2,3,1}
{3,1,2}
{3,2,1}
(6 rows)
Modify the final query to generate permutations of the letters of a given word:
with recursive numbers as (
select generate_series(1, 3) as i
),
rec as (
select i, array[i] as p
from numbers
union all
select n.i, p || n.i
from numbers n
join rec on cardinality(p) < 3 and not n.i = any(p)
)
select a[p[1]] || a[p[2]] || a[p[3]] as result
from rec
cross join regexp_split_to_array('act', '') as a
where cardinality(p) = 3
order by 1
result
--------
act
atc
cat
cta
tac
tca
(6 rows)
Here is a solution:
with recursive params as (
select *
from (values ('cata')) v(str)
),
nums as (
select str, 1 as n
from params
union all
select str, 1 + n
from nums
where n < length(str)
),
pos as (
select str, array[n] as poses, array_remove(array_agg(n) over (partition by str), n) as rests, 1 as lev
from nums
union all
select pos.str, array_append(pos.poses, nums.n), array_remove(rests, nums.n), lev + 1
from pos join
nums
on pos.str = nums.str and array_position(pos.rests, nums.n) > 0
where cardinality(rests) > 0
)
select distinct pos.str , string_agg(substr(pos.str, thepos, 1), '')
from pos cross join lateral
unnest(pos.poses) thepos
where cardinality(rests) = 0
group by pos.str, pos.poses;
This is quite tricky, particularly when there are repeated letters in the string. The approach taken here generates all permutations of the numbers from 1 to n, where n is the length of the string. It then uses these as indexes to extract characters from the original string.
Those who are keen will notice that this uses select distinct with group by. That seems like the easiest way to avoid duplication in the resultant strings.
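To make the index-permutation idea concrete outside the database, here is a small Python sketch. It permutes the positions 0..n-1, maps each position back to its letter, and relies on a set to drop the duplicates that repeated letters produce. The valid_words set is a made-up stand-in for tbl_words.

```python
from itertools import permutations


def anagrams(word):
    """Return the distinct letter arrangements of word, sorted."""
    positions = range(len(word))
    # Permute positions, not letters, then look each position up in the word.
    # The set comprehension removes duplicates caused by repeated letters.
    return sorted({"".join(word[i] for i in p) for p in permutations(positions)})


# Hypothetical stand-in for the tbl_words table.
valid_words = {"act", "cat", "dog"}

all_act = anagrams("act")
found = [w for w in all_act if w in valid_words]
print(all_act)
print(found)
```

Note that the number of permutations grows as n!, so this is only practical for short words.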

How to Read Data Number by Number

I have a field that contains numbers such as the examples below in #Numbers. Each number within each row in #Numbers relates
to many different values that are contained within the #Area table.
I need to make a relationship from #Numbers to #Area using each number within each row.
CREATE TABLE #Numbers
(
Number int
)
INSERT INTO #Numbers
(
Number
)
SELECT 102 UNION
SELECT 1 UNION
SELECT 2
select * from #Numbers
CREATE TABLE #Area
(
Number int,
Area varchar(50)
)
INSERT INTO #Area
(
Number,
Area
)
SELECT 0,'Area1' UNION
SELECT 1,'Area2' UNION
SELECT 1,'Area3' UNION
SELECT 1,'Area5' UNION
SELECT 1,'Area8' UNION
SELECT 1,'Area9' UNION
SELECT 2,'Area12' UNION
SELECT 2,'Area43' UNION
SELECT 2,'Area25'
select * from #Area
It would return the following for 102:
102,Area2
102,Area3
102,Area5
102,Area8
102,Area9
102,Area1
102,Area12
102,Area43
102,Area25
For 1 it would return:
1,Area2
1,Area3
1,Area5
1,Area8
1,Area9
For 2 it would return:
2,Area12
2,Area43
2,Area25
Note how the numbers match up to the individual Areas and return the values accordingly.
Well, the OP marked an answer already, which even got votes. Maybe he will not read this, but here is another option using a direct, simple select, which (according to the execution plan) seems to use a lot fewer resources:
SELECT *
FROM #Numbers t1
LEFT JOIN #Area t2 ON CONVERT(VARCHAR(10), t1.Number) like '%' + CONVERT(CHAR(1), t2.Number) + '%'
GO
Note! According to the execution plan, this solution is estimated at only 27% of the combined cost while the selected answer (written by Squirrel) is estimated at 73%. The execution plan can be misleading, though, so you should also check the IO and TIME statistics using the real table structure and real data.
It looks like you need to extract each individual digit from #Numbers and then use it to join to #Area:
; with tally as
(
select n = 1
union all
select n = n + 1
from tally
where n < 10
)
select n.Number, a.Area
from #Numbers n
cross apply
(
-- here it convert n.Number to string
-- then extract 1 digit
-- and finally convert back to integer
select num = convert(int,
substring(convert(varchar(10), n.Number),
t.n,
1)
)
from tally t
where t.n <= len(convert(varchar(10), n.Number))
) d
inner join #Area a on d.num = a.Number
order by n.Number
Or, if you prefer to do it with arithmetic rather than string operations:
; with Num as
(
select Number, n = 0, Num = Number / power(10, 0) % 10
from #Numbers
union all
select Number, n = n + 1, Num = Number / power(10, n + 1) % 10
from Num
where Number >= power(10, n + 1) -- >= so the leading digit of 10, 100, ... is not skipped
)
select n.Number, a.Area
from Num n
inner join #Area a on n.Num = a.Number
order by n.Number
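If it helps to see the arithmetic digit extraction on its own, here is a minimal Python sketch. It is not a translation of the query above, just the same idea: repeatedly take the number modulo 10 and divide by 10, then match each digit against the area rows from the question. The function names are made up for illustration.

```python
def digits(number):
    """Return the decimal digits of a non-negative integer, least significant first."""
    if number == 0:
        return [0]
    out = []
    while number > 0:
        number, d = divmod(number, 10)  # d is the current last digit
        out.append(d)
    return out


# The #Area rows from the question.
area_rows = [
    (0, "Area1"),
    (1, "Area2"), (1, "Area3"), (1, "Area5"), (1, "Area8"), (1, "Area9"),
    (2, "Area12"), (2, "Area43"), (2, "Area25"),
]


def areas_for(number):
    """Mimic the join: one output row per (digit, matching area) pair."""
    return [(number, area) for d in digits(number) for n, area in area_rows if n == d]


print(areas_for(102))
```

Like the join, this returns an area once per occurrence of its digit, so a number such as 11 would list each matching area twice.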
Here is my idea. In theory, it should work.
Have a table (temp or permanent) with each value and its translation
I.E.
ID value
1 Area1, Area2, Area7, Area8, Area15
2 Area28, Area35
etc
Take each row and put a special character between each digit. Use a function like string_split with that character to turn it into a column of values.
E.g. 0123 will then become something like 0|1|2|3, and when you run that through string_split you would get
0
1
2
3
Now join each value to your lookup table and return the Value.
Now you have a row for each of the values that you want. Use another technique like STUFF with FOR XML to put those values back into a single column.
This doesn't sound very efficient, but it is one way of achieving what you want.
Another is to do a replace(), but that would be very messy!
Create a third table called n which contains a single column also called n that contains integers from 1 to the maximum number of digits in your number. Make it 1000 if you like, doesn't matter. Then:
select #numbers.number, substring(convert(varchar,#numbers.number),n,1) as chr, Area
from #numbers
join n on n>0 and n <=len(convert(varchar,number))
join #area on #area.number=substring(convert(varchar,#numbers.number),n,1)
The middle column chr is just there to show you what it's doing, and would be removed from the final result.

select statement to list numbers in range

In DB2, I have this query to list numbers 1-x:
select level from SYSIBM.SYSDUMMY1 connect by level <= "some number"
But this maxes out due to SQL20450N Recursion limit exceeded within a hierarchical query.
How can I generate a list of numbers between 1 and x using a select statement when x is not known until runtime?
I found an answer based on this post:
WITH d AS
(SELECT LEVEL - 1 AS dig FROM SYSIBM.SYSDUMMY1 CONNECT BY LEVEL <= 10)
SELECT t1.n
FROM (SELECT (d7.dig * 1000000) +
(d6.dig * 100000) +
(d5.dig * 10000) +
(d4.dig * 1000) +
(d3.dig * 100) +
(d2.dig * 10) +
d1.dig AS n
FROM d d1
CROSS JOIN d d2
CROSS JOIN d d3
CROSS JOIN d d4
CROSS JOIN d d5
CROSS JOIN d d6
CROSS JOIN d d7) t1
JOIN ("subselect that returns desired value as i") t2
ON t1.n <= t2.i
ORDER BY t1.n
This is how I usually create lists. For your example:
with numberlist (num) as
(
select min(1) from anytable
union all
select num + 1 from numberlist
where num < x
)
select num from numberlist
I did something like this when I wanted a list of values to correspond with months:
with t1 (mon) as (
values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12)
)
select * from t1
It seems a bit kludgy, but for a small list like 1-12, or even 1-50, it did what I needed it to.
It's nice to see someone else tagging their questions with DB2.
If you have any table known to have more than x rows, you can always do:
select * from (
select row_number() over () num
from my_big_table
) where num <= x
or, per bhamby's suggestion:
select row_number() over () num
from my_big_table
fetch first X rows only
For DB2 you can use recursive common table expressions (cf. IBM documentation on recursive CTE):
with max(num) as (
select 1 from sysibm.sysdummy1
)
,result (num) as (
select num from max
union ALL
select result.num+1
from result
where result.num<=100
)
select * from result;
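Here is a quick way to experiment with the recursive CTE pattern and a runtime bound, sketched in Python with sqlite3. This is SQLite, not DB2, so treat it as an illustration of the shape of the query only. The function name is made up, and the bound is passed as a query parameter.

```python
import sqlite3


def number_list(x):
    """Return the integers 1..x produced by a recursive CTE (assumes x >= 1)."""
    conn = sqlite3.connect(":memory:")
    rows = conn.execute(
        """
        WITH RECURSIVE result(num) AS (
            SELECT 1
            UNION ALL
            -- Stop once num reaches the bound; with <= the list would end at x + 1.
            SELECT num + 1 FROM result WHERE num < ?
        )
        SELECT num FROM result
        """,
        (x,),
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]


print(number_list(5))
```

The anchor row is always emitted, so a bound below 1 still yields a single row containing 1.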

Find all possible combinations of array without permutations

Input is an array of 'n' length.
I need all combinations inside this array stored into new array.
IN: j='{A, B, C ..}'
OUT: k='{A, B, C, AB, AC, BC, ABC ..}'
Without repetitions, so without BA, CA etc.
Generic solution using a recursive CTE
Works for any number of elements and any base data type that supports the > operator.
WITH RECURSIVE t(i) AS (SELECT * FROM unnest('{A,B,C}'::text[])) -- provide array
, cte AS (
SELECT i::text AS combo, i, 1 AS ct
FROM t
UNION ALL
SELECT cte.combo || t.i::text, t.i, ct + 1
FROM cte
JOIN t ON t.i > cte.i
)
SELECT ARRAY (
SELECT combo
FROM cte
ORDER BY ct, combo
) AS result;
Result is an array of text in the example.
Note that you can have any number of additional non-recursive CTEs when using the RECURSIVE keyword.
More generic yet
If any of the following apply:
Array elements are non-unique (like '{A,B,B}').
The base data type does not support the > operator (like json).
Array elements are very big - for better performance.
Use a row number instead of comparing elements:
WITH RECURSIVE t AS (
SELECT i::text, row_number() OVER () AS rn
FROM unnest('{A,B,B}'::text[]) i -- duplicate element!
)
, cte AS (
SELECT i AS combo, rn, 1 AS ct
FROM t
UNION ALL
SELECT cte.combo || t.i, t.rn, ct + 1
FROM cte
JOIN t ON t.rn > cte.rn
)
SELECT ARRAY (
SELECT combo
FROM cte
ORDER BY ct, combo
) AS result;
Or use WITH ORDINALITY in Postgres 9.4+:
PostgreSQL unnest() with element number
Special case: generate decimal numbers
To generate decimal numbers with 5 digits along these lines:
WITH RECURSIVE t AS (
SELECT i
FROM unnest('{1,2,3,4,5}'::int[]) i
)
, cte AS (
SELECT i AS nr, i
FROM t
UNION ALL
SELECT cte.nr * 10 + t.i, t.i
FROM cte
JOIN t ON t.i > cte.i
)
SELECT ARRAY (
SELECT nr
FROM cte
ORDER BY nr
) AS result;
SQL Fiddle demonstrating all.
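For a quick local test of the row-number variant, here is a sketch in Python with sqlite3. It is an adaptation under assumptions: SQLite has no arrays or unnest(), so the elements are loaded into a scratch table with an explicit position column, and the result comes back as a Python list rather than a Postgres array.

```python
import sqlite3


def combos(items):
    """Return all non-empty combinations of items, concatenated in input order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (rn INTEGER PRIMARY KEY, i TEXT)")
    conn.executemany(
        "INSERT INTO t (rn, i) VALUES (?, ?)",
        list(enumerate(items, start=1)),
    )
    rows = conn.execute(
        """
        WITH RECURSIVE cte(combo, rn, ct) AS (
            SELECT i, rn, 1 FROM t
            UNION ALL
            -- Only extend with elements at a later position: no BA, CA, ...
            SELECT cte.combo || t.i, t.rn, cte.ct + 1
            FROM cte JOIN t ON t.rn > cte.rn
        )
        SELECT combo FROM cte ORDER BY ct, combo
        """
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]


print(combos(["A", "B", "C"]))
print(combos(["A", "B", "B"]))
```

Because positions are compared instead of values, duplicate elements are treated as distinct, so '{A,B,B}' yields AB twice.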
If n is small (less than about 20), all possible combinations can be found using a bitmask approach. There are 2^n different combinations, and each number from 0 to (2^n - 1) represents one of them.
E.g. for n=3:
0 = 000b represents {}, the empty combination
2^3-1 = 7 = 111b represents all elements, abc
Pseudocode as follows:
for b=0 to 2^n - 1 do # each combination
res=""
for i=0 to (n-1) do # which elements are included
if ((b & (1<<i)) != 0) # bitwise AND: is bit i set in b?
res= res+arr[i]
end
end
print res
end
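The bitmask idea can be written as runnable Python along these lines. It is a sketch with a made-up function name: bit i of the counter b decides whether arr[i] is included.

```python
def all_combinations(arr):
    """Return every combination of arr (including the empty one) as a string."""
    n = len(arr)
    result = []
    for b in range(1 << n):  # b runs from 0 to 2^n - 1
        # Include arr[i] exactly when bit i of b is set.
        res = "".join(arr[i] for i in range(n) if b & (1 << i))
        result.append(res)
    return result


print(all_combinations(["a", "b", "c"]))
```

The first entry is the empty combination (b = 0); skip it if only non-empty combinations are wanted.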

Select records where column has n character occurrences

I was wondering if this is possible in sqlite.
SELECT * FROM tbl WHERE substr_count(f, '*') = 5
It should return records that have 5 asterisks in the "f" column, like
a*b**c**
****a*
and so on
SELECT * FROM tbl WHERE length(f) - length(replace(f,'*','')) = 5
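Counting occurrences as the difference between the original length and the length after removing the character can be checked quickly with Python's sqlite3 module. This is a minimal sketch: the table name and column come from the question, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (f TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?)",
    [("a*b**c**",), ("****a*",), ("a*b",), ("******",)],
)

# Occurrences of '*' = original length minus length with every '*' removed.
QUERY = """
SELECT f FROM tbl
WHERE length(f) - length(replace(f, '*', '')) = 5
ORDER BY rowid
"""

matches = [row[0] for row in conn.execute(QUERY)]
print(matches)
```

For a multi-character search string, the difference would also need to be divided by the length of that string.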
This solution is easy if you have a tally or numbers table which simply contains a sequential list of integers. This would be a table you populated once but has many uses. With that you have:
Create Table Tally ( N int );
Insert Tally( N )
...
Select Z.<PrimaryKeyCol>, Sum( Z.Val )
From (
Select <PrimaryKeyCol>, 1 As Val
From tbl
Cross Join Tally As T
Where substr( tbl.f, T.N, 1 ) = '*'
) As Z
Group By Z.<PrimaryKeyCol>
Having Sum( Z.Val ) = 5