Remove spaces in words - sql

I have a lot of strings in my database (PostgreSQL), for example:
with mystrings as (
select 'H e l l o, how are you'::varchar string union all
select 'I am fine, t h a n k you'::varchar string union all
select 'This is s t r a n g e text'::varchar string union all
select 'With c r a z y space b e t w e e n characters'::varchar string
)
select * from mystrings
Is there a way to remove the spaces between characters in words? For my example the result should be:
Hello, how are you
I am fine, thank you
This is strange text
With crazy space between characters
I started with replace, but there are many such words with spaces between characters and I cannot even find them all.
Because it might be difficult to concatenate characters meaningfully, it might be a better idea to just get a list of concatenation candidates. Using the example data, the result should be:
H e l l o
t h a n k
s t r a n g e
c r a z y
b e t w e e n
Such a query should find and return all substrings of a string where there are at least three individual characters separated by two spaces (and continue for as long as the pattern [space] individual character occurs):
He l l o how are you --> llo
H e l l o how are you --> Hello
C r a z y space b e t w e e n --> {crazy, between}

As per your edited question, the query below gets all the possible candidates that have at least three individual characters separated by two spaces:
SELECT
    data || ' --> {' || replace_candidates || '}'
FROM (
    SELECT
        data,
        ( SELECT
              array_to_string( array_agg( data ), ',' )
          FROM (
              SELECT
                  data,
                  length( data )
              FROM (
                  SELECT
                      replace( data, ' ', '' ) AS data
                  FROM
                      regexp_split_to_table( data, '\S{2,}' ) AS data
              ) t
              WHERE length( data ) > 2
          ) t ) AS replace_candidates
    FROM
        mystrings
) T
WHERE
    replace_candidates IS NOT NULL
Working
Start with the innermost query (the one with regexp_split_to_table).
The regex \S{2,} matches every run of two or more consecutive non-space characters.
regexp_split_to_table splits on those matches, so it returns the text between them, i.e. the inverse of the match (more on it here).
The spaces are then replaced with an empty string, and only the pieces longer than 2 characters are kept.
The remaining array aggregate functions take care of the formatting you asked for (more on this here).
Results
H e l l o how are you --> {Hello}
I am fine, t h a n k you --> {thank}
This is s t r a n g e text --> {strange}
With c r a z y space b e t w e e n characters --> {crazy,between}
SOME MORE TEST T E X T --> {TEXT}
SQLFIDDLE
Note: It considers characters that appear as [space][char][space], but you can modify it to suit your needs, e.g. [space][space][char][space] or [space][char][special_char][space] ...
Hope this helps ;p
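The splitting idea described under "Working" can be cross-checked outside the database. The following is only a minimal Python sketch of the same logic, not part of the answer's query; the function name replace_candidates is made up for illustration.

```python
import re


def replace_candidates(text: str) -> list[str]:
    """Return spaced-out words, mirroring the regexp_split_to_table query."""
    # Split on every run of two or more non-space characters; what is left
    # are the stretches made only of single characters and spaces.
    pieces = re.split(r"\S{2,}", text)
    # Drop the spaces, then keep only the pieces longer than two characters.
    squeezed = (piece.replace(" ", "") for piece in pieces)
    return [word for word in squeezed if len(word) > 2]


if __name__ == "__main__":
    for line in ["H e l l o how are you", "SOME MORE TEST T E X T"]:
        print(line, "-->", replace_candidates(line))
```

Note that punctuation attached to a letter counts as a non-space character, so "o," is treated as a two-character run and removed by the split.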

You can use a resource such as an online dictionary: if the word exists, you don't have to remove the spaces; otherwise, remove them. Alternatively, you can keep a table of all the valid strings and check against that table. Hope you got my point.

The following finds possible concatenation candidates:
with mystrings as (
select 'H e l l o, how are you'::varchar string union all
select 'I am fine, t h a n k you'::varchar string union all
select 'This is s t r a n g e text'::varchar string union all
select 'With c r a z y space b e t w e e n characters'::varchar string
)
, u as (
select string, strpart[rn] as strpart, rn
from (
select *, generate_subscripts(strpart, 1) as rn
from (
select string, string_to_array(replace(string,',',''), ' ') as strpart
from mystrings
) x
) y
)
,w as (
select
string,strpart,rn,
case when length(strpart) = 1 then 1 else 0 end as indchar ,
case when coalesce(length(lag(strpart) over (partition by string order by rn)),0) <> 1 and length(strpart) = 1 then 1 else 0 end as strstart,
case when coalesce(length(lead(strpart) over (partition by string order by rn)),0) <> 1 and length(strpart) = 1 then 1 else 0 end as strend
from u
)
,x as (
select
string,rn,strpart,indchar,strstart,
sum(strstart) over (order by string, rn) as strid
from w
where indchar = 1 and not (strstart = 1 and strend = 1)
)
select string, array_to_string(array_agg(strpart),'') as candidate from x group by string, strid
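For comparison, here is a rough Python sketch of what the query above is doing: split each string into space-separated tokens, then join every run of two or more consecutive single-character tokens. The function name candidates is hypothetical, and this is an illustration of the rule rather than a replacement for the SQL.

```python
from itertools import groupby


def candidates(text: str) -> list[str]:
    """Join runs of two or more consecutive single-character tokens."""
    # Same preprocessing as the query: drop commas, split on single spaces.
    tokens = text.replace(",", "").split(" ")
    found = []
    # groupby clusters consecutive tokens by whether they are one character long.
    for is_single, run in groupby(tokens, key=lambda tok: len(tok) == 1):
        run = list(run)
        # A lone single character (such as "I") is both the start and the end
        # of its run, so it is skipped, like "not (strstart = 1 and strend = 1)".
        if is_single and len(run) >= 2:
            found.append("".join(run))
    return found
```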

Related

How do I create a list of all possible anagrams of a word/string in PostgreSQL

How do I create a list of all possible anagrams of a word/string in PostgreSQL.
For example if String is 'act'
then the desired output should be:
act,
atc,
cta,
cat,
tac,
tca
I have one table, 'tbl_words', which contains millions of words.
Then I want to search for only the valid words in my database table from this list of anagrams.
For example, from the list of anagrams above the valid words are: act, cat.
Is there any way to do this?
Update 1:
I need output like this (all permutations of the given word).
Any ideas?
The following query generates all permutations of a 3-element set:
with recursive numbers as (
select generate_series(1, 3) as i
),
rec as (
select i, array[i] as p
from numbers
union all
select n.i, p || n.i
from numbers n
join rec on cardinality(p) < 3 and not n.i = any(p)
)
select p as permutation
from rec
where cardinality(p) = 3
order by 1
permutation
-------------
{1,2,3}
{1,3,2}
{2,1,3}
{2,3,1}
{3,1,2}
{3,2,1}
(6 rows)
Modify the final query to generate permutations of the letters of a given word:
with recursive numbers as (
select generate_series(1, 3) as i
),
rec as (
select i, array[i] as p
from numbers
union all
select n.i, p || n.i
from numbers n
join rec on cardinality(p) < 3 and not n.i = any(p)
)
select a[p[1]] || a[p[2]] || a[p[3]] as result
from rec
cross join regexp_split_to_array('act', '') as a
where cardinality(p) = 3
order by 1
result
--------
act
atc
cat
cta
tac
tca
(6 rows)
Here is a solution:
with recursive params as (
select *
from (values ('cata')) v(str)
),
nums as (
select str, 1 as n
from params
union all
select str, 1 + n
from nums
where n < length(str)
),
pos as (
select str, array[n] as poses, array_remove(array_agg(n) over (partition by str), n) as rests, 1 as lev
from nums
union all
select pos.str, array_append(pos.poses, nums.n), array_remove(rests, nums.n), lev + 1
from pos join
nums
on pos.str = nums.str and array_position(pos.rests, nums.n) > 0
where cardinality(rests) > 0
)
select distinct pos.str , string_agg(substr(pos.str, thepos, 1), '')
from pos cross join lateral
unnest(pos.poses) thepos
where cardinality(rests) = 0
group by pos.str, pos.poses;
This is quite tricky, particularly when there are repeated letters in the string. The approach taken here generates all permutations of the numbers from 1 to n, where n is the length of the string. It then uses these as indexes to extract characters from the original string.
Those who are keen will notice that this uses select distinct with group by. That seems like the easiest way to avoid duplication in the resultant strings.
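To make the index-permutation idea concrete, here is a small Python sketch under the same assumptions: permute the positions 0..n-1, map each position back to its character, and deduplicate the resulting strings. The names anagrams and valid_words, and the in-memory set standing in for tbl_words, are illustrative only.

```python
from itertools import permutations


def anagrams(word: str) -> list[str]:
    """All distinct rearrangements of the letters in word, sorted."""
    # Permute positions rather than letters, then look each position up in
    # the original string; the set removes duplicates caused by repeated letters.
    positions = range(len(word))
    results = {"".join(word[i] for i in perm) for perm in permutations(positions)}
    return sorted(results)


def valid_words(word: str, dictionary: set[str]) -> list[str]:
    """Keep only the anagrams present in a (hypothetical) word list."""
    return [candidate for candidate in anagrams(word) if candidate in dictionary]
```

For a word of length n this builds n! permutations, so it is only practical for short words, which is equally true of the recursive SQL.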

Find overlapping range in PL/SQL

Sample data below
id start end
a 1 3
a 5 6
a 8 9
b 2 4
b 6 7
b 9 10
c 2 4
c 6 7
c 9 10
I'm trying to come up with a query that will return all the ranges (start and end inclusive) where a, b, and c overlap, and that can be extended to more ids. The expected result looks like this:
start end
2 3
6 6
9 9
The only way I can picture this is with a custom aggregate function that tracks the current valid intervals and then computes the new intervals during the iterate phase. However, I can't see this approach being practical when working with large datasets. So if some bright mind out there has a query or some built-in function that I'm not aware of, I would greatly appreciate the help.
You can do this with a self-join, using greatest() and least() to compute the overlap. Assuming no internal overlaps for "a" and "b":
select greatest(ta.start, tb.start) as start,
least(ta.end, tb.end) as end
from t ta join
t tb
on ta.start <= tb.end and ta.end >= tb.start and
ta.id = 'a' and tb.id = 'b';
This is a lot uglier and more complex than Gordon's solution, but I think it gives the expected answer better and should extend to work with more ids:
WITH NUMS(N) AS ( --GENERATE NUMBERS N FROM THE SMALLEST START VALUE TO THE LARGEST END VALUE
SELECT MIN("START") N FROM T
UNION ALL
SELECT N+1 FROM NUMS WHERE N < (SELECT MAX("END") FROM T)
),
SEQS(N,START_RANK,END_RANK) AS (
SELECT N,
CASE WHEN IS_START=1 THEN ROW_NUMBER() OVER (PARTITION BY IS_START ORDER BY N) ELSE 0 END START_RANK, --ASSIGN A RANK TO EACH RANGE START
CASE WHEN IS_END=1 THEN ROW_NUMBER() OVER (PARTITION BY IS_END ORDER BY N) ELSE 0 END END_RANK --ASSIGN A RANK TO EACH RANGE END
FROM (
SELECT N,
CASE WHEN NVL(LAG(N) OVER (ORDER BY N),N) + 1 <> N THEN 1 ELSE 0 END IS_START, --MARK N AS A RANGE START
CASE WHEN NVL(LEAD(N) OVER (ORDER BY N),N) -1 <> N THEN 1 ELSE 0 END IS_END /* MARK N AS A RANGE END */
FROM (
SELECT DISTINCT N FROM ( --GET THE SET OF NUMBERS N THAT ARE INCLUDED IN ALL ID RANGES
SELECT NUMS.*,T.*,COUNT(*) OVER (PARTITION BY N) N_CNT,COUNT(DISTINCT "ID") OVER () ID_CNT
FROM NUMS
JOIN T ON (NUMS.N >= T."START" AND NUMS.N <= T."END")
) WHERE N_CNT=ID_CNT
)
) WHERE IS_START + IS_END > 0
)
SELECT STARTS.N "START",ENDS.N "END" FROM SEQS STARTS
JOIN SEQS ENDS ON (STARTS.START_RANK=ENDS.END_RANK AND STARTS.N <= ENDS.N) ORDER BY "START"; --MATCH CORRESPONDING RANGE START/END VALUES
First we generate all the numbers between the smallest start value and the largest end value.
Then we find the numbers that are included in all the provided "id" ranges by joining our generated numbers to the ranges, and selecting each number "n" that appears once for each "id".
Then we determine whether each of these values "n" starts or ends a range. To determine that, for each N we say:
If the previous value of N does not exist or is not 1 less than current N, current N starts a range. If the next value of N does not exist or is not 1 greater than current N, current N ends a range.
Next, we assign a "rank" to each start and end value so we can match them up.
Finally, we self-join where the ranks match (and where the start <= the end) to get our result.
EDIT: After some searching, I came across this question which shows a better way to find the start/ends and refactored the query to:
WITH NUMS(N) AS ( --GENERATE NUMBERS N FROM THE SMALLEST START VALUE TO THE LARGEST END VALUE
SELECT MIN("START") N FROM T
UNION ALL
SELECT N+1 FROM NUMS WHERE N < (SELECT MAX("END") FROM T)
)
SELECT MIN(N) "START",MAX(N) "END" FROM (
SELECT N,ROW_NUMBER() OVER (ORDER BY N)-N GRP_ID
FROM (
SELECT DISTINCT N FROM ( --GET THE SET OF NUMBERS N THAT ARE INCLUDED IN ALL ID RANGES
SELECT NUMS.*,T.*,COUNT(*) OVER (PARTITION BY N) N_CNT,COUNT(DISTINCT "ID") OVER () ID_CNT
FROM NUMS
JOIN T ON (NUMS.N >= T."START" AND NUMS.N <= T."END")
) WHERE N_CNT=ID_CNT
)
)
GROUP BY GRP_ID ORDER BY "START";
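The two steps of the refactored query (keep the numbers covered by every id, then group consecutive numbers into ranges) can be sketched in Python as follows. This is an illustration with a hypothetical function name, common_ranges, and it compares sets of ids instead of row counts, so unlike N_CNT=ID_CNT it does not depend on an id having no internally overlapping ranges.

```python
from itertools import groupby


def common_ranges(rows):
    """rows: (id, start, end) tuples with inclusive integer bounds."""
    ids = {row_id for row_id, _, _ in rows}
    lo = min(start for _, start, _ in rows)
    hi = max(end for _, _, end in rows)
    # Keep every number N that is covered by at least one range of every id.
    covered = [
        n for n in range(lo, hi + 1)
        if {row_id for row_id, start, end in rows if start <= n <= end} == ids
    ]
    # Consecutive numbers share the same value of (position - N), which is
    # the same trick as ROW_NUMBER() OVER (ORDER BY N) - N in the query.
    result = []
    for _, island in groupby(enumerate(covered), key=lambda pair: pair[0] - pair[1]):
        numbers = [n for _, n in island]
        result.append((numbers[0], numbers[-1]))
    return result
```

With the sample data from the question this yields the ranges 2 to 3, 6 to 6 and 9 to 9.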

Counting characters in an Access database column using SQL

I have the following table
col1 col2 col3 col4
==== ==== ==== ====
1233 4566 ABCD CDEF
1233 4566 ACD1 CDEF
1233 4566 D1AF CDEF
I need to count the characters in col3, so from the data in the previous table it would be:
char count
==== =====
A 3
B 1
C 2
D 3
F 1
1 2
Is this possible to achieve by using SQL only?
At the moment I am thinking of passing a parameter into the SQL query, counting the characters one position at a time, and then summing the results. However, I have not started the VBA part yet, and frankly would rather not do that.
This is my query at the moment:
PARAMETERS X Long;
SELECT First(Mid(TABLE.col3,X,1)) AS [col3 Field], Count(Mid(TABLE.col3,X,1)) AS Dcount
FROM TEST
GROUP BY Mid(TABLE.col3,X,1)
HAVING (((Count(Mid([TABLE].[col3],[X],1)))>=1));
Ideas and help are much appreciated, as I don't usually work with Access and SQL.
You can accomplish your task in pure Access SQL by using a Numbers table. In this case, the Numbers table must contain integer values from 1 to some number larger than the longest string of characters in your source data. In this example, the strings of characters to be processed are in [CharacterData]:
CharacterList
-------------
GORD
WAS
HERE
and the [Numbers] table is simply
n
--
1
2
3
4
5
If we use a cross join to extract the characters (eliminating any empty strings that result from n exceeding Len(CharacterList))...
SELECT
Mid(cd.CharacterList, nb.n, 1) AS c
FROM
CharacterData cd,
Numbers nb
WHERE Mid(cd.CharacterList, nb.n, 1) <> ""
...we get ...
c
--
G
W
H
O
A
E
R
S
R
D
E
Now we can just wrap that in an aggregation query
SELECT c AS Character, COUNT(*) AS CountOfCharacter
FROM
(
SELECT
Mid(cd.CharacterList, nb.n, 1) AS c
FROM
CharacterData cd,
Numbers nb
WHERE Mid(cd.CharacterList, nb.n, 1) <> ""
)
GROUP BY c
which gives us
Character CountOfCharacter
--------- ----------------
A 1
D 1
E 2
G 1
H 1
O 1
R 2
S 1
W 1
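If it helps to see the Numbers-table trick outside Access, the following Python sketch mimics the cross join and the aggregation. The function name count_characters is invented for the example, and the slice s[n - 1:n] stands in for Mid(s, n, 1).

```python
from collections import Counter


def count_characters(strings, numbers):
    """Count characters by pairing every string with every position n."""
    # Cross join, like "FROM CharacterData cd, Numbers nb".
    chars = (s[n - 1:n] for s in strings for n in numbers)
    # The slice is '' once n exceeds len(s), which mirrors the
    # WHERE Mid(...) <> "" filter in the query.
    return Counter(c for c in chars if c != "")


if __name__ == "__main__":
    counts = count_characters(["GORD", "WAS", "HERE"], range(1, 6))
    for char in sorted(counts):
        print(char, counts[char])
```

As in the SQL version, the numbers must reach at least the length of the longest string, otherwise trailing characters are silently skipped.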
Knowing that col3 has a fixed length of 4, this problem is quite easy.
Assume there is a view V with four columns, each for one character in column 3.
V(c1, c2, c3, c4)
Unfortunately, I'm not familiar with Access-specific SQL, but this is the general SQL statement you would need:
SELECT c, COUNT(*) FROM
(
SELECT c1 AS c FROM V
UNION ALL
SELECT c2 FROM V
UNION ALL
SELECT c3 FROM V
UNION ALL
SELECT c4 FROM V
)
GROUP BY c
It's a shame that you don't want to consider using VBA; you don't need as much as you might think:
Public charCounts As Dictionary
Function LoadCounts(s As String) As Long
If charCounts Is Nothing Then Init
Dim length As Integer, i As Variant
length = Len(s)
For i = 1 To length
Dim currentChar As String
currentChar = Mid(s, i, 1)
If Not charCounts.Exists(currentChar) Then charCounts(currentChar) = 0
charCounts(currentChar) = charCounts(currentChar) + 1
Next
LoadCounts = length 'a query can only call a Function, so return a value
End Function
Sub Init()
Set charCounts = New Scripting.Dictionary
charCounts.CompareMode = TextCompare 'for case-insensitive comparisons; otherwise use BinaryCompare
End Sub
Then, you execute the query once:
SELECT LoadCounts(col3)
FROM Table1
Finally, you read out the values in the Dictionary:
Dim key As Variant
For Each key In charCounts
Debug.Print key, charCounts(key)
Next
Note that between query executions you have to call Init to clear out the old values.
Please try this; I hope it will work:
with cte as
(
select row_number() over(order by (select null)) as i from Charactor_Count
)
select substring( name, i, 1 ) as char, count(*) as count
from Charactor_Count, cte
where cte.i <= len(Charactor_Count.name)
group by substring(name,i,1)
order by substring(name,i,1)

Oracle procedure is griping about a select query that runs fine outside of the procedure

WITH Q (L) AS
(
SELECT 1 FROM DUAL
UNION ALL
SELECT L + 1
FROM Q
WHERE L < 99
)
SELECT MIN(L)
INTO next_priority
FROM Q
LEFT JOIN gxrdird on gxrdird_priority = L
and gxrdird_pidm = aPidm_in and gxrdird_ap_ind = 'Y'
WHERE L NOT IN (select gxrdird_priority
from gxrdird where gxrdird_pidm = aPidm_in);
This query returns the results that I want when run manually. I'm trying to put it in a package procedure, but I get:
51/5 PL/SQL: SQL Statement ignored
55/22 PL/SQL: ORA-00932: inconsistent datatypes: expected NUMBER got -
That corresponds to the line "SELECT L + 1", at the column where L is. Is there any way to declare L as a NUMBER specifically, inside the WITH clause? I've been googling for an hour, and the few examples of WITH clauses I can find that have parameters do not declare them as any type.
This is driving me nuts, and there's no simpler query I can come up with that gives me the correct results.
Edit, adding context:
CURSOR xxx_cur IS
SELECT ROWID, GXRDIRD_PRIORITY
FROM GXRDIRD
WHERE GXRDIRD_PIDM = aPidm_in
AND GXRDIRD_AP_IND = 'A'
AND GXRDIRD_ATYP_CODE IS NULL
AND GXRDIRD_ADDR_SEQNO IS NULL
ORDER BY GXRDIRD_PRIORITY DESC;
xxx_rec xxx_cur%ROWTYPE;
next_priority NUMBER;
BEGIN
OPEN xxx_cur;
LOOP
FETCH xxx_cur INTO xxx_rec;
EXIT WHEN xxx_cur%NOTFOUND;
-- Here we should update that particular row, but we can't just increment it.
WITH Q (L) AS
(
SELECT 1 FROM DUAL
UNION ALL
SELECT L + 1
FROM Q
WHERE L < 99
)
SELECT MIN(L)
INTO next_priority
FROM Q
LEFT JOIN gxrdird on gxrdird_priority = L and gxrdird_pidm = aPidm_in and gxrdird_ap_ind = 'Y'
WHERE L NOT IN (select gxrdird_priority from gxrdird where gxrdird_pidm = aPidm_in);
-- The above query found the lowest-numbered unused priority, and now we'll set this record to that.
UPDATE GXRDIRD SET GXRDIRD_PRIORITY = next_priority WHERE ROWID = xxx_rec.ROWID;
-- If the above record was originally 7 and the lowest was 15, now 7 is free and will be used if we loop
-- again.
DBMS_OUTPUT.PUT_LINE(OBJECT_NAME || '.P_RESEQUENCE_INACTV_ACCNTS - Changed priority ' || xxx_rec.GXRDIRD_PRIORITY || ' into ' || next_priority);
END LOOP;
Line 51: WITH Q (L) AS
Line 55: SELECT L + 1
It looks like you're trying to generate dummy rows with consecutive numbers. My preferred way of doing that would be:
WITH Q AS
(
SELECT rownum AS l
FROM dual
CONNECT BY level < 100
)
SELECT MIN(L)
INTO next_priority
FROM Q
...
Please try if this works for you.
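As a sanity check on what the query is meant to compute (generate the numbers 1 to 99, then take the smallest one that is not already used as a priority), here is a tiny Python sketch. The function name next_free_priority and the plain set of used priorities are assumptions made for the example; the join on gxrdird_ap_ind is left out.

```python
def next_free_priority(used, limit=99):
    """Smallest number in 1..limit that is not in used, or None if all are taken."""
    # range(1, limit + 1) plays the role of the generated rows 1..99.
    candidates = range(1, limit + 1)
    # min(..., default=None) behaves like MIN(L) returning NULL on no rows.
    return min((n for n in candidates if n not in used), default=None)
```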

SQL: how to get all the distinct characters in a column, across all rows

Is there an elegant way in SQL Server to find all the distinct characters in a single varchar(50) column, across all rows?
Bonus points if it can be done without cursors :)
For example, say my data contains 3 rows:
productname
-----------
product1
widget2
nicknack3
The distinct inventory of characters would be "productwigenka123"
Here's a query that returns each character as a separate row, along with the number of occurrences, assuming your table is called 'Products':
WITH ProductChars(aChar, remain) AS (
SELECT LEFT(productName,1), RIGHT(productName, LEN(productName)-1)
FROM Products WHERE LEN(productName)>0
UNION ALL
SELECT LEFT(remain,1), RIGHT(remain, LEN(remain)-1) FROM ProductChars
WHERE LEN(remain)>0
)
SELECT aChar, COUNT(*) FROM ProductChars
GROUP BY aChar
To combine them all to a single row, (as stated in the question), change the final SELECT to
SELECT aChar AS [text()] FROM
(SELECT DISTINCT aChar FROM ProductChars) base
FOR XML PATH('')
The above uses a nice hack I found here, which emulates the GROUP_CONCAT from MySQL.
The first level of recursion is unrolled so that the query doesn't return empty strings in the output.
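The recursive CTE peels one character off the front of each remaining string per step. A rough Python equivalent, with made-up function names char_counts and distinct_chars, might look like this; it sorts by code point, which may differ from the order your SQL Server collation produces.

```python
from collections import Counter


def char_counts(names):
    """Count characters by repeatedly taking the first character of each string."""
    counts = Counter()
    # Anchor member: start from the non-empty names only.
    work = [name for name in names if len(name) > 0]
    while work:
        next_round = []
        for text in work:
            counts[text[0]] += 1   # LEFT(text, 1)
            remain = text[1:]      # RIGHT(text, LEN(text) - 1)
            if remain:             # WHERE LEN(remain) > 0
                next_round.append(remain)
        work = next_round
    return counts


def distinct_chars(names):
    """Concatenate the distinct characters into one string."""
    return "".join(sorted(char_counts(names)))
```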
Use this (it should work on any CTE-capable RDBMS):
select x.v into prod from (values('product1'),('widget2'),('nicknack3')) as x(v);
Test Query:
with a as
(
select v, '' as x, 0 as n from prod
union all
select v, substring(v,n+1,1) as x, n+1 as n from a where n < len(v)
)
select v, x, n from a -- where n > 0
order by v, n
option (maxrecursion 0)
Final Query:
with a as
(
select v, '' as x, 0 as n from prod
union all
select v, substring(v,n+1,1) as x, n+1 as n from a where n < len(v)
)
select distinct x from a where n > 0
order by x
option (maxrecursion 0)
Oracle version:
with a(v,x,n) as
(
select v, '' as x, 0 as n from prod
union all
select v, substr(v,n+1,1) as x, n+1 as n from a where n < length(v)
)
select distinct x from a where n > 0
Given that your column is varchar, it means it can only store characters from codes 0 to 255, on whatever code page you have. If you only use the 32-128 ASCII code range, then you can simply see if you have any of the characters 32-128, one by one. The following query does that, looking in sys.objects.name:
with cteDigits as (
select 0 as Number
union all select 1 as Number
union all select 2 as Number
union all select 3 as Number
union all select 4 as Number
union all select 5 as Number
union all select 6 as Number
union all select 7 as Number
union all select 8 as Number
union all select 9 as Number)
, cteNumbers as (
select U.Number + T.Number*10 + H.Number*100 as Number
from cteDigits U
cross join cteDigits T
cross join cteDigits H)
, cteChars as (
select CHAR(Number) as Char
from cteNumbers
where Number between 32 and 128)
select cteChars.Char as [*]
from cteChars
cross apply (
select top(1) *
from sys.objects
where CHARINDEX(cteChars.Char, name, 0) > 0) as o
for xml path('');
If you have a Numbers or Tally table which contains a sequential list of integers you can do something like:
Select Distinct '' + Substring(Products.ProductName, N.Value, 1)
From dbo.Numbers As N
Cross Join dbo.Products
Where N.Value <= Len(Products.ProductName)
For Xml Path('')
If you are using SQL Server 2005 and beyond, you can generate your Numbers table on the fly using a CTE:
With Numbers As
(
Select Row_Number() Over ( Order By c1.object_id ) As Value
From sys.columns As c1
Cross Join sys.columns As c2
)
Select Distinct '' + Substring(Products.ProductName, N.Value, 1)
From Numbers As N
Cross Join dbo.Products
Where N.Value <= Len(Products.ProductName)
For Xml Path('')
Building on mdma's answer, this version gives you a single string, but decodes some of the changes that FOR XML will make, like & -> &amp;.
WITH ProductChars(aChar, remain) AS (
SELECT LEFT(productName,1), RIGHT(productName, LEN(productName)-1)
FROM Products WHERE LEN(productName)>0
UNION ALL
SELECT LEFT(remain,1), RIGHT(remain, LEN(remain)-1) FROM ProductChars
WHERE LEN(remain)>0
)
SELECT STUFF((
SELECT N'' + aChar AS [text()]
FROM (SELECT DISTINCT aChar FROM ProductChars) base
ORDER BY aChar
FOR XML PATH, TYPE).value(N'.[1]', N'nvarchar(max)'),1, 1, N'')
-- Allow for a lot of recursion. Set to 0 for infinite recursion
OPTION (MAXRECURSION 365)