Counting characters in an Access database column using SQL

I have the following table
col1 col2 col3 col4
==== ==== ==== ====
1233 4566 ABCD CDEF
1233 4566 ACD1 CDEF
1233 4566 D1AF CDEF
I need to count the characters in col3, so for the data in the previous table the result would be:
char count
==== =====
A 3
B 1
C 2
D 3
F 1
1 2
Is this possible to achieve by using SQL only?
At the moment I am thinking of passing a parameter into the SQL query, counting the characters one position at a time, and then summing the results. However, I have not started the VBA part yet, and frankly I would rather not.
This is my query at the moment:
PARAMETERS X Long;
SELECT First(Mid(TEST.col3,X,1)) AS [col3 Field], Count(Mid(TEST.col3,X,1)) AS Dcount
FROM TEST
GROUP BY Mid(TEST.col3,X,1)
HAVING (((Count(Mid([TEST].[col3],[X],1)))>=1));
Ideas and help are much appreciated, as I don't usually work with Access and SQL.

You can accomplish your task in pure Access SQL by using a Numbers table. In this case, the Numbers table must contain integer values from 1 up to at least the length of the longest string in your source data. In this example, the strings to be processed are in the [CharacterList] column of the [CharacterData] table:
CharacterList
-------------
GORD
WAS
HERE
and the [Numbers] table is simply
n
--
1
2
3
4
5
If we use a cross join to extract the characters (eliminating any empty strings that result from n exceeding Len(CharacterList))...
SELECT
Mid(cd.CharacterList, nb.n, 1) AS c
FROM
CharacterData cd,
Numbers nb
WHERE Mid(cd.CharacterList, nb.n, 1) <> ""
...we get ...
c
--
G
W
H
O
A
E
R
S
R
D
E
Now we can just wrap that in an aggregation query:
SELECT c AS Character, COUNT(*) AS CountOfCharacter
FROM
(
SELECT
Mid(cd.CharacterList, nb.n, 1) AS c
FROM
CharacterData cd,
Numbers nb
WHERE Mid(cd.CharacterList, nb.n, 1) <> ""
)
GROUP BY c
which gives us:
Character CountOfCharacter
--------- ----------------
A 1
D 1
E 2
G 1
H 1
O 1
R 2
S 1
W 1
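The Numbers-table technique above can be tried outside Access. Below is a minimal sketch that uses Python's built-in sqlite3 module, assuming SQLite's substr() as a stand-in for Access's Mid() and '' in place of Access's "" literal. The table and column names match the example above. The function name count_characters is hypothetical.

```python
import sqlite3


def count_characters(strings):
    """Count every character in `strings` using a Numbers-table cross join."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE CharacterData (CharacterList TEXT)")
    con.executemany("INSERT INTO CharacterData VALUES (?)", [(s,) for s in strings])

    # The Numbers table must reach at least the length of the longest string.
    longest = max((len(s) for s in strings), default=0)
    con.execute("CREATE TABLE Numbers (n INTEGER PRIMARY KEY)")
    con.executemany("INSERT INTO Numbers VALUES (?)", [(i,) for i in range(1, longest + 1)])

    rows = con.execute("""
        SELECT c AS Character, COUNT(*) AS CountOfCharacter
        FROM (
            -- cross join: one row per (string, position); substr() plays the role of Mid()
            SELECT substr(cd.CharacterList, nb.n, 1) AS c
            FROM CharacterData cd, Numbers nb
            WHERE substr(cd.CharacterList, nb.n, 1) <> ''  -- drop positions past the end
        )
        GROUP BY c
        ORDER BY c
    """).fetchall()
    con.close()
    return dict(rows)


print(count_characters(["GORD", "WAS", "HERE"]))
print(count_characters(["ABCD", "ACD1", "D1AF"]))  # the col3 values from the question
```

The first call should reproduce the CountOfCharacter table above. The second call applies the same query to the question's col3 data.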

Given that col3 has a fixed length of 4, this problem is quite easy.
Assume there is a view V with four columns, one for each character of col3:
V(c1, c2, c3, c4)
Unfortunately, I'm not familiar with Access-specific SQL, but this is the general SQL statement you would need:
SELECT c, COUNT(*) FROM
(
SELECT c1 AS c FROM V
UNION ALL
SELECT c2 FROM V
UNION ALL
SELECT c3 FROM V
UNION ALL
SELECT c4 FROM V
) AS chars
GROUP BY c
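Here is a minimal sketch of the general query above, run in SQLite through Python's sqlite3 module. The view V is defined with substr(). In Access, V would presumably be a saved query that uses Mid(col3, 1, 1) AS c1 and so on; that definition is an assumption, because the answer does not show it. The table name Table1 is also assumed.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Table1 (col1 INTEGER, col2 INTEGER, col3 TEXT, col4 TEXT)")
con.executemany("INSERT INTO Table1 VALUES (?, ?, ?, ?)", [
    (1233, 4566, "ABCD", "CDEF"),
    (1233, 4566, "ACD1", "CDEF"),
    (1233, 4566, "D1AF", "CDEF"),
])

# View V: one column per character position of the fixed-length (4-char) col3.
con.execute("""
    CREATE VIEW V AS
    SELECT substr(col3, 1, 1) AS c1, substr(col3, 2, 1) AS c2,
           substr(col3, 3, 1) AS c3, substr(col3, 4, 1) AS c4
    FROM Table1
""")

# Stack the four columns with UNION ALL (keeps duplicates), then count per character.
counts = dict(con.execute("""
    SELECT c, COUNT(*) FROM (
        SELECT c1 AS c FROM V
        UNION ALL SELECT c2 FROM V
        UNION ALL SELECT c3 FROM V
        UNION ALL SELECT c4 FROM V
    ) AS chars
    GROUP BY c
""").fetchall())
print(sorted(counts.items()))
```

UNION ALL matters here. A plain UNION would remove duplicates and undercount.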

It's a shame that you don't want to consider using VBA; you don't need as much code as you might think:
' Requires a reference to "Microsoft Scripting Runtime" (Tools > References)
Public charCounts As Scripting.Dictionary
' Must be a Public Function in a standard module: queries can call functions, not Subs
Public Function LoadCounts(ByVal s As Variant) As Long
If charCounts Is Nothing Then Init
Dim i As Long, currentChar As String
s = Nz(s, "") ' treat Null values as empty strings
For i = 1 To Len(s)
currentChar = Mid(s, i, 1)
If Not charCounts.Exists(currentChar) Then charCounts(currentChar) = 0
charCounts(currentChar) = charCounts(currentChar) + 1
Next
LoadCounts = Len(s)
End Function
Public Sub Init()
Set charCounts = New Scripting.Dictionary
charCounts.CompareMode = TextCompare 'for case-insensitive comparisons; otherwise use BinaryCompare
End Sub
Then, you execute the query once:
SELECT LoadCounts(col3)
FROM Table1
Finally, you read out the values in the Dictionary:
Dim key As Variant
For Each key In charCounts
Debug.Print key, charCounts(key)
Next
Note that between query executions you have to call Init to clear out the old values.
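For comparison, the same "function called once per row fills a dictionary" pattern can be sketched with Python's sqlite3 module. The sketch registers a Python function with the database, runs the query once, and then reads the tallies. The function names load_counts and init are hypothetical mirrors of the VBA routines. Note that Counter is case-sensitive, unlike the TextCompare mode used above.

```python
import sqlite3
from collections import Counter

# Module-level tally, playing the role of the public charCounts Dictionary.
char_counts = Counter()


def init():
    """Clear old tallies; call between query executions (like the VBA Init)."""
    char_counts.clear()


def load_counts(s):
    """Called by the query once per row: tally every character of s."""
    if s is None:  # NULL values contribute nothing
        return 0
    char_counts.update(s)  # updating a Counter with a string counts each character
    return len(s)


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Table1 (col3 TEXT)")
con.executemany("INSERT INTO Table1 VALUES (?)", [("ABCD",), ("ACD1",), ("D1AF",)])
# Expose the Python function to SQL under the name used in the query.
con.create_function("LoadCounts", 1, load_counts)

init()
con.execute("SELECT LoadCounts(col3) FROM Table1").fetchall()  # fetchall() runs it for every row
for key in sorted(char_counts):
    print(key, char_counts[key])
```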

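Please try this; I hope it works. Note that it is SQL Server (T-SQL) syntax, not Access SQL:
with cte as
(
-- sys.all_objects is used only as a row source for generating positions 1, 2, 3, ...;
-- it must have at least as many rows as the longest value in name
select row_number() over(order by (select null)) as i from sys.all_objects
)
select substring(name, i, 1) as [char], count(*) as [count]
from Charactor_Count, cte
where cte.i <= len(Charactor_Count.name)
group by substring(name, i, 1)
order by substring(name, i, 1);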
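The answer above generates the positions with ROW_NUMBER(). As a swapped-in alternative, positions can be generated on the fly with a recursive CTE bounded by the longest value, so no separate numbers source is needed. Here is a minimal SQLite sketch run through Python's sqlite3 module. The table and column names (Charactor_Count, name) come from the answer; the sample rows are the question's col3 values.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Charactor_Count (name TEXT)")
con.executemany("INSERT INTO Charactor_Count VALUES (?)", [("ABCD",), ("ACD1",), ("D1AF",)])

rows = con.execute("""
    WITH RECURSIVE positions(i) AS (
        SELECT 1
        UNION ALL
        -- stop once we reach the length of the longest string
        SELECT i + 1 FROM positions
        WHERE i < (SELECT MAX(length(name)) FROM Charactor_Count)
    )
    SELECT substr(name, i, 1) AS ch, COUNT(*) AS cnt
    FROM Charactor_Count
    JOIN positions ON positions.i <= length(Charactor_Count.name)
    GROUP BY ch
    ORDER BY ch
""").fetchall()
print(rows)
```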

Related

Case statement with four columns, i.e. attributes

I have a table with values "1", "0" or "". The table has four columns: p, q, r and s.
I need help creating a case statement that returns values when the attribute is equal to 1.
For ID 5 the case statement should return "p s".
For ID 14 the case statement should return "s".
For ID 33 the case statement should return "p r s". And so on.
Do I need to come up with a case statement that has every possible combination? Or is there a simpler way? Below is what I have come up with so far.
case
when p = 1 and q =1 then "p q"
when p = 1 and r =1 then "p r"
when p = 1 and s =1 then "p s"
when r = 1 then r
when q = 1 then q
when r = 1 then r
when s = 1 then s
else ''
end
One solution is to use a separate CASE for each attribute and concatenate the results, use regexp_replace to insert a space after each letter, and wrap everything in TRIM to remove the trailing space.
with tbl(id, p, q, r, s) as (
select 5,1,0,0,1 from dual union all
select 14,0,0,0,1 from dual
)
select id,
trim(regexp_replace(case p when 1 then 'p' end ||
case q when 1 then 'q' end ||
case r when 1 then 'r' end ||
case s when 1 then 's' end, '(.)', '\1 '))
from tbl;
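A variation on the same idea avoids the regular expression: each CASE appends the letter followed by a space, and TRIM removes the final one. Here is a minimal sketch in SQLite through Python's sqlite3 module, using the sample rows from this thread. Unlike Oracle, SQLite returns NULL for NULL || 'x', so every CASE needs ELSE ''.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tbl (id INTEGER, p INTEGER, q INTEGER, r INTEGER, s INTEGER)")
con.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?, ?)",
                [(5, 1, 0, 0, 1), (14, 0, 0, 0, 1), (33, 1, 0, 1, 1)])

# One CASE per attribute, each appending "letter + space"; trim() drops the last space.
# ELSE '' is required in SQLite, because NULL || 'x' is NULL there.
rows = con.execute("""
    SELECT id,
           trim(CASE p WHEN 1 THEN 'p ' ELSE '' END ||
                CASE q WHEN 1 THEN 'q ' ELSE '' END ||
                CASE r WHEN 1 THEN 'r ' ELSE '' END ||
                CASE s WHEN 1 THEN 's ' ELSE '' END) AS attrs
    FROM tbl
    ORDER BY id
""").fetchall()
print(rows)
```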
The real solution would be to fix the database design. This design technically violates fourth normal form (4NF), in that it contains more than one independent attribute. The fact that an ID "has" or "is part of" attribute p, q, etc. should be split out. This design should be 3 tables: the main table with the ID; a lookup table containing information about the attributes that an ID could have (p, q, r or s); and an associative table that joins the two where appropriate (assuming an ID can have more than one attribute and an attribute can belong to more than one ID). This is how a many-to-many relationship is modeled.
main_tbl
ID  col1  col2
5
14

main_attr
main_id  attr_id
5        1
5        4
14       4

attribute_lookup
attr_id  attr_desc
1        p
2        q
3        r
4        s
Then it would be simple to query this model to build your list, and easy to maintain if an attribute description changes (there is only one place to change it).
Select from it like this:
select m.ID, m.col1, listagg(al.attr_desc, ' ') within group (order by al.attr_desc) as attr_desc
from main_tbl m
join main_attr ma
on m.ID = ma.main_id
join attribute_lookup al
on ma.attr_id = al.attr_id
group by m.id, m.col1;
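Here is a minimal sketch of the three-table model above in SQLite, through Python's sqlite3 module. The joins are done in SQL. Instead of relying on group_concat ordering, Python's itertools.groupby stands in for LISTAGG(... ' ') WITHIN GROUP (ORDER BY attr_desc). The rows are the sample rows from the tables above.

```python
import sqlite3
from itertools import groupby

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE main_tbl (ID INTEGER PRIMARY KEY, col1 TEXT);
    CREATE TABLE attribute_lookup (attr_id INTEGER PRIMARY KEY, attr_desc TEXT);
    CREATE TABLE main_attr (main_id INTEGER REFERENCES main_tbl(ID),
                            attr_id INTEGER REFERENCES attribute_lookup(attr_id),
                            PRIMARY KEY (main_id, attr_id));
    INSERT INTO main_tbl VALUES (5, NULL), (14, NULL);
    INSERT INTO attribute_lookup VALUES (1, 'p'), (2, 'q'), (3, 'r'), (4, 's');
    INSERT INTO main_attr VALUES (5, 1), (5, 4), (14, 4);
""")

rows = con.execute("""
    SELECT m.ID, al.attr_desc
    FROM main_tbl m
    JOIN main_attr ma ON m.ID = ma.main_id
    JOIN attribute_lookup al ON ma.attr_id = al.attr_id
    ORDER BY m.ID, al.attr_desc
""").fetchall()

# groupby needs rows sorted by the key; ORDER BY m.ID guarantees that.
result = {id_: " ".join(desc for _, desc in grp)
          for id_, grp in groupby(rows, key=lambda row: row[0])}
print(result)
```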
You can concatenate the results of decode() calls:
select id, decode(p,1,'p','')||decode(q,1,'q','')
||decode(r,1,'r','')||decode(s,1,'s','') as "String"
from t;
If you need spaces between the letters, consider using:
with t(id,p,q,r,s) as
(
select 5,1,0,0,1 from dual union all
select 14,0,0,0,1 from dual union all
select 31,null,0,null,1 from dual union all
select 33,1,0,1,1 from dual
), t2 as
(
select id, decode(p,1,'p','')||decode(q,1,'q','')
||decode(r,1,'r','')||decode(s,1,'s','') as str
from t
), t3 as
(
select id, substr(str,level,1) as str, level as lvl
from t2
connect by level <= length(str)
and prior id = id
and prior sys_guid() is not null
)
select id, listagg(str,' ') within group (order by lvl) as "String"
from t3
group by id;
In my opinion, it's bad practice to use columns for relationships.
You should have two tables, one called arts and another called mapping. arts looks like this:
ID - ART
1 - p
2 - q
3 - r
4 - s
...
and mapping maps your base IDs to your art IDs and looks like this:
MYID - ARTID
5 - 1
5 - 4
Afterwards, you can make use of Oracle's PIVOT operator; it is more dynamic.

looping in sql with delimiter

I was wondering: how can I loop in SQL?
For example, I have this column:
PARAMETER_VALUE
E,C;S,C;I,X;G,T;S,J;S,F;C,S;
I want to store every value before a comma (,) in one temp column, and every value after the comma (up to the next semicolon) in another column,
and keep going until there are no more values after the last semicolon (;).
Expected output for this example:
COL1 E S I G S S C
COL2 C C X T J F S
etc.
You can get this by using the regexp_substr() function with a CONNECT BY LEVEL <= ... clause to generate one row per pair, and then aggregating the pieces with listagg():
with t1(PARAMETER_VALUE) as
(
select 'E,C;S,C;I,X;G,T;S,J;S,F;C,S;' from dual
), t2 as
(
select level as rn,
regexp_substr(PARAMETER_VALUE,'([^,]+)',1,level) as str1,
regexp_substr(PARAMETER_VALUE,'([^;]+)',1,level) as str2
from t1
connect by level <= regexp_count(PARAMETER_VALUE,';')
)
select listagg( regexp_substr(str1,'([^;]+$)') ,' ') within group (order by rn) as col1,
listagg( regexp_substr(str2,'([^,]+$)') ,' ') within group (order by rn) as col2
from t2;
COL1 COL2
------------- -------------
E S I G S S C C C X T J F S
Assuming that you need to separate the input into rows, at the ; delimiters, and then into columns at the , delimiter, you could do something like this:
-- WITH clause included to simulate input data. Not part of the solution;
-- use actual table and column names in the SELECT statement below.
with
t1(id, parameter_value) as (
select 1, 'E,C;S,C;I,X;G,T;S,J;S,F;C,S;' from dual union all
select 2, ',U;,;V,V;' from dual union all
select 3, null from dual
)
-- End of simulated input data
select id,
level as ord,
regexp_substr(parameter_value, '(;|^)([^,]*),', 1, level, null, 2) as col1,
regexp_substr(parameter_value, ',([^;]*);' , 1, level, null, 1) as col2
from t1
connect by level <= regexp_count(parameter_value, ';')
and id = prior id
and prior sys_guid() is not null
order by id, ord
;
ID  ORD  COL1  COL2
--  ---  ----  ----
1   1    E     C
1   2    S     C
1   3    I     X
1   4    G     T
1   5    S     J
1   6    S     F
1   7    C     S
2   1          U
2   2
2   3    V     V
3   1
Note - this is not the most efficient way to split the inputs (nothing will be very efficient - the data model, which is in violation of First Normal Form, is the reason). This can be improved using standard instr and substr, but the query will be more complicated, and for that reason, harder to maintain.
I generated more input data, to illustrate a few things. You may have several inputs that must be broken up at the same time; that must be done with care. (Note the additional conditions in CONNECT BY). I also illustrate the handling of NULL - if a comma comes right after a semicolon, that means that the "column 1" part of that pair must be NULL. That is shown in the output.
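The same row-per-pair split, including the empty-part-becomes-NULL behavior, can be sketched with a regular expression in Python. This is a minimal illustration, not a replacement for the Oracle query. The function name split_pairs is hypothetical, and None stands in for Oracle's NULL (Oracle treats '' as NULL).

```python
import re

# One match per "col1,col2;" pair; either part may be empty.
PAIR = re.compile(r"([^,;]*),([^;]*);")


def split_pairs(parameter_value):
    """Return (ord, col1, col2) rows, turning empty parts into None like Oracle's NULL."""
    if parameter_value is None:
        # The Oracle query above still returns one all-NULL row for a NULL input.
        return [(1, None, None)]
    return [
        (ord_, col1 or None, col2 or None)
        for ord_, (col1, col2) in enumerate(PAIR.findall(parameter_value), start=1)
    ]


for value in ["E,C;S,C;I,X;G,T;S,J;S,F;C,S;", ",U;,;V,V;", None]:
    print(split_pairs(value))
```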

Find overlapping range in PL/SQL

Sample data below
id start end
a 1 3
a 5 6
a 8 9
b 2 4
b 6 7
b 9 10
c 2 4
c 6 7
c 9 10
I'm trying to come up with a query that will return all the overlapping start-end ranges (inclusive) shared by a, b, and c (but extendable to more ids). So the expected data will look like the following:
start end
2 3
6 6
9 9
The only way I can picture this is with a custom aggregate function that tracks the current valid intervals and then computes the new intervals during the iterate phase. However, I can't see this approach being practical for large datasets. So if some bright mind out there has a query or a built-in function that I'm not aware of, I would greatly appreciate the help.
You can do this with a self-join. Assuming there are no overlaps within "a" or within "b":
select greatest(ta."START", tb."START") as "START",
least(ta."END", tb."END") as "END"
from t ta join
t tb
on ta."START" <= tb."END" and ta."END" >= tb."START" and
ta.id = 'a' and tb.id = 'b';
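The self-join extends to more ids by adding one copy of the table per id. A set of intervals has a common overlap exactly when the largest start is less than or equal to the smallest end. Here is a minimal sketch for a, b and c in SQLite through Python's sqlite3 module, with the sample data from the question. SQLite's multi-argument max()/min() stand in for Oracle's GREATEST()/LEAST().

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE t (id TEXT, "start" INTEGER, "end" INTEGER)')
con.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    ("a", 1, 3), ("a", 5, 6), ("a", 8, 9),
    ("b", 2, 4), ("b", 6, 7), ("b", 9, 10),
    ("c", 2, 4), ("c", 6, 7), ("c", 9, 10),
])

# One table copy per id; the ranges overlap when max(starts) <= min(ends).
rows = con.execute('''
    SELECT max(ta."start", tb."start", tc."start") AS "start",
           min(ta."end",   tb."end",   tc."end")   AS "end"
    FROM t ta
    JOIN t tb ON tb.id = 'b'
    JOIN t tc ON tc.id = 'c'
    WHERE ta.id = 'a'
      AND max(ta."start", tb."start", tc."start") <= min(ta."end", tb."end", tc."end")
    ORDER BY 1
''').fetchall()
print(rows)
```

The drawback is that the query text must change whenever the set of ids changes. The number-generation answer avoids that.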
This is a lot uglier and more complex than Gordon's solution, but I think it gives the expected answer better and should extend to work with more ids:
WITH NUMS(N) AS ( --GENERATE NUMBERS N FROM THE SMALLEST START VALUE TO THE LARGEST END VALUE
SELECT MIN("START") N FROM T
UNION ALL
SELECT N+1 FROM NUMS WHERE N < (SELECT MAX("END") FROM T)
),
SEQS(N,START_RANK,END_RANK) AS (
SELECT N,
CASE WHEN IS_START=1 THEN ROW_NUMBER() OVER (PARTITION BY IS_START ORDER BY N) ELSE 0 END START_RANK, --ASSIGN A RANK TO EACH RANGE START
CASE WHEN IS_END=1 THEN ROW_NUMBER() OVER (PARTITION BY IS_END ORDER BY N) ELSE 0 END END_RANK --ASSIGN A RANK TO EACH RANGE END
FROM (
SELECT N,
CASE WHEN NVL(LAG(N) OVER (ORDER BY N),N) + 1 <> N THEN 1 ELSE 0 END IS_START, --MARK N AS A RANGE START
CASE WHEN NVL(LEAD(N) OVER (ORDER BY N),N) -1 <> N THEN 1 ELSE 0 END IS_END /* MARK N AS A RANGE END */
FROM (
SELECT DISTINCT N FROM ( --GET THE SET OF NUMBERS N THAT ARE INCLUDED IN ALL ID RANGES
SELECT NUMS.*,T.*,COUNT(*) OVER (PARTITION BY N) N_CNT,COUNT(DISTINCT "ID") OVER () ID_CNT
FROM NUMS
JOIN T ON (NUMS.N >= T."START" AND NUMS.N <= T."END")
) WHERE N_CNT=ID_CNT
)
) WHERE IS_START + IS_END > 0
)
SELECT STARTS.N "START",ENDS.N "END" FROM SEQS STARTS
JOIN SEQS ENDS ON (STARTS.START_RANK=ENDS.END_RANK AND STARTS.N <= ENDS.N) ORDER BY "START"; --MATCH CORRESPONDING RANGE START/END VALUES
First we generate all the numbers between the smallest start value and the largest end value.
Then we find the numbers that are included in all the provided "id" ranges by joining our generated numbers to the ranges, and selecting each number "n" that appears once for each "id".
Then we determine whether each of these values "n" starts or ends a range. To determine that, for each N we say:
If the previous value of N does not exist or is not 1 less than current N, current N starts a range. If the next value of N does not exist or is not 1 greater than current N, current N ends a range.
Next, we assign a "rank" to each start and end value so we can match them up.
Finally, we self-join where the ranks match (and where the start <= the end) to get our result.
EDIT: After some searching, I came across this question which shows a better way to find the start/ends and refactored the query to:
WITH NUMS(N) AS ( --GENERATE NUMBERS N FROM THE SMALLEST START VALUE TO THE LARGEST END VALUE
SELECT MIN("START") N FROM T
UNION ALL
SELECT N+1 FROM NUMS WHERE N < (SELECT MAX("END") FROM T)
)
SELECT MIN(N) "START",MAX(N) "END" FROM (
SELECT N,ROW_NUMBER() OVER (ORDER BY N)-N GRP_ID
FROM (
SELECT DISTINCT N FROM ( --GET THE SET OF NUMBERS N THAT ARE INCLUDED IN ALL ID RANGES
SELECT NUMS.*,T.*,COUNT(*) OVER (PARTITION BY N) N_CNT,COUNT(DISTINCT "ID") OVER () ID_CNT
FROM NUMS
JOIN T ON (NUMS.N >= T."START" AND NUMS.N <= T."END")
) WHERE N_CNT=ID_CNT
)
)
GROUP BY GRP_ID ORDER BY "START";
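The refactored query is an instance of the "gaps and islands" technique. Consecutive numbers share the same value of n minus ROW_NUMBER(), so grouping by that difference yields one row per unbroken range. Here is a minimal sketch in SQLite through Python's sqlite3 module, assuming an SQLite build with window functions (3.25 or later). As a deliberate change from the Oracle version, it counts distinct ids per number with GROUP BY/HAVING instead of the COUNT(*) window; this also tolerates overlaps within a single id.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE t (id TEXT, "start" INTEGER, "end" INTEGER)')
con.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    ("a", 1, 3), ("a", 5, 6), ("a", 8, 9),
    ("b", 2, 4), ("b", 6, 7), ("b", 9, 10),
    ("c", 2, 4), ("c", 6, 7), ("c", 9, 10),
])

rows = con.execute('''
    WITH RECURSIVE nums(n) AS (      -- every integer from the smallest start to the largest end
        SELECT MIN("start") FROM t
        UNION ALL
        SELECT n + 1 FROM nums WHERE n < (SELECT MAX("end") FROM t)
    ),
    covered AS (                     -- numbers that fall inside a range of every id
        SELECT nums.n
        FROM nums JOIN t ON nums.n BETWEEN t."start" AND t."end"
        GROUP BY nums.n
        HAVING COUNT(DISTINCT t.id) = (SELECT COUNT(DISTINCT id) FROM t)
    ),
    islands AS (                     -- consecutive numbers share the same n - row_number()
        SELECT n, n - ROW_NUMBER() OVER (ORDER BY n) AS grp FROM covered
    )
    SELECT MIN(n) AS "start", MAX(n) AS "end"
    FROM islands
    GROUP BY grp
    ORDER BY 1
''').fetchall()
print(rows)
```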

Remove spaces in words

I have a lot of strings in my database (PostgreSQL); for example:
with mystrings as (
select 'H e l l o, how are you'::varchar string union all
select 'I am fine, t h a n k you'::varchar string union all
select 'This is s t r a n g e text'::varchar string union all
select 'With c r a z y space b e t w e e n characters'::varchar string
)
select * from mystrings
Is there a way to remove the spaces between characters in words? For my example, the result should be:
Hello, how are you
I am fine, thank you
This is strange text
With crazy space between characters
I started with replace, but there are many such words with spaces between characters and I cannot even find them all.
Because it might be difficult to meaningfully concatenate characters, it might be a better idea to just get a list of concatenation candidates. Using the example data, the result should be:
H e l l o
t h a n k
s t r a n g e
c r a z y
b e t w e e n
Such a query should find and return all substrings of a string that consist of at least three individual characters separated by spaces (so at least two separating spaces), continuing as long as the pattern [space][individual character] repeats:
He l l o how are you --> llo
H e l l o how are you --> Hello
C r a z y space b e t w e e n --> {crazy, between}
As per your edited question, the query below gets all possible candidates that have at least three individual characters separated by spaces:
SELECT
data || ' --> {' || replace_candidates || '}'
FROM(
SELECT
data,
( SELECT
array_to_string( array_agg( data ),',' )
FROM (
SELECT
data,
length( data )
FROM (
SELECT
replace( data, ' ', '' ) AS data
FROM
regexp_split_to_table( data, '\S{2,}' ) AS data
) t
WHERE length( data ) > 2
) t ) AS replace_candidates
FROM
mystrings
) T
WHERE
replace_candidates IS NOT NULL
How it works:
Start by looking at the innermost query (the one with regexp_split_to_table).
The regex \S{2,} matches every run of two or more consecutive non-space characters.
regexp_split_to_table returns the text between those matches, i.e. the inverse of the match; more on it here.
Spaces are replaced with an empty string, and only pieces longer than 2 characters are kept.
The remaining parts are array aggregate functions that take care of the formatting, as per your requirement; more on this here.
Results
H e l l o how are you --> {Hello}
I am fine, t h a n k you --> {thank}
This is s t r a n g e text --> {strange}
With c r a z y space b e t w e e n characters --> {crazy,between}
SOME MORE TEST T E X T --> {TEXT}
Note: it considers characters that appear as [space][char][space], but you can modify it to suit your needs, e.g. [space][space][char][space] or [space][char][special_char][space].
Hope this helps ;p
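The rule stated in the question (at least three single characters, each separated by one space) can also be expressed as a single regular expression. Here is a minimal Python sketch, not the PostgreSQL answer itself. The pattern and the names SPACED_WORD, candidates and collapse are hypothetical. The lookarounds make sure every letter in the run stands alone, so "I am" is not caught.

```python
import re

# A standalone word character followed by at least two more " <standalone char>" steps,
# e.g. "t h a n k". (?<!\w) and (?!\w) stop the run from starting or ending inside a word.
SPACED_WORD = re.compile(r"(?<!\w)\w(?: \w(?!\w)){2,}")


def candidates(text):
    """Return each spaced-out run with its spaces removed."""
    return [m.group(0).replace(" ", "") for m in SPACED_WORD.finditer(text)]


def collapse(text):
    """Remove the spaces inside every spaced-out run, leaving other spaces alone."""
    return SPACED_WORD.sub(lambda m: m.group(0).replace(" ", ""), text)


for s in ["H e l l o, how are you", "I am fine, t h a n k you",
          "This is s t r a n g e text", "With c r a z y space b e t w e e n characters"]:
    print(candidates(s), "->", collapse(s))
```

In Python, \w also matches digits and underscores, so the pattern may need narrowing for real data.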
You can use a resource such as an online dictionary: if the joined word exists, remove the spaces; otherwise, leave them. Alternatively, you can keep a table of all valid words and check each candidate against that table. Hope you get my point.
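Here is a minimal sketch of the dictionary check, in Python. It finds spaced-out runs and collapses a run only when the joined result is a known word. VALID_WORDS is a hypothetical stand-in for a real dictionary or a table of valid words, and the run-finding regex is an assumption, not part of the answer.

```python
import re

# Runs of 3+ standalone characters separated by single spaces, e.g. "c r a z y".
SPACED_WORD = re.compile(r"(?<!\w)\w(?: \w(?!\w)){2,}")

# Hypothetical word list; in practice this would come from a dictionary or a table.
VALID_WORDS = {"hello", "thank", "strange", "crazy", "between"}


def collapse_known_words(text, valid_words=VALID_WORDS):
    def fix(match):
        joined = match.group(0).replace(" ", "")
        # Only collapse when the result is a known word; otherwise keep the original text.
        return joined if joined.lower() in valid_words else match.group(0)
    return SPACED_WORD.sub(fix, text)


print(collapse_known_words("With c r a z y space b e t w e e n characters"))
print(collapse_known_words("Grades: a b c"))  # "abc" is not a known word, so unchanged
```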
The following finds possible concatenation candidates:
with mystrings as (
select 'H e l l o, how are you'::varchar string union all
select 'I am fine, t h a n k you'::varchar string union all
select 'This is s t r a n g e text'::varchar string union all
select 'With c r a z y space b e t w e e n characters'::varchar string
)
, u as (
select string, strpart[rn] as strpart, rn
from (
select *, generate_subscripts(strpart, 1) as rn
from (
select string, string_to_array(replace(string,',',''), ' ') as strpart
from mystrings
) x
) y
)
,w as (
select
string,strpart,rn,
case when length(strpart) = 1 then 1 else 0 end as indchar ,
case when coalesce(length(lag(strpart) over (partition by string order by rn)),0) <> 1 and length(strpart) = 1 then 1 else 0 end as strstart,
case when coalesce(length(lead(strpart) over (partition by string order by rn)),0) <> 1 and length(strpart) = 1 then 1 else 0 end as strend
from u
)
,x as (
select
string,rn,strpart,indchar,strstart,
sum(strstart) over (order by string, rn) as strid
from w
where indchar = 1 and not (strstart = 1 and strend = 1)
)
select string, array_to_string(array_agg(strpart),'') as candidate from x group by string, strid

Using IN with convert in sql

I would like to use the IN clause, but with the convert function.
Basically, I have a table (A) with a column of type int.
But in the other table (B) I have values of type varchar.
Essentially, I am looking for something like this:
select *
from B
where myB_Column IN (select myA_Columng from A)
However, I am not sure if the int from table A, would map / convert / evaluate properly for the varchar in B.
I am using SQL Server 2008.
You can use a CASE expression in the WHERE clause so that you CAST only when the value is numeric, and otherwise return 0 or NULL, depending on your requirements (NULL is safer, because 0 would match a 0 in A):
SELECT *
FROM B
WHERE CASE ISNUMERIC(myB_Column) WHEN 1 THEN CAST(myB_Column AS INT) ELSE 0 END
IN (SELECT myA_Columng FROM A)
ISNUMERIC also returns 1 (true) for decimal values (and for strings such as '$' or '1e5'), so ideally you should implement your own IsInteger UDF. To do that, look at this question:
T-sql - determine if value is integer
Option #1
Select * from B where
Case When ISNUMERIC(myB_Column) = 1 Then Cast(myB_Column As Int) End IN
(
Select myA_Columng from A
)
Option #2
Select B.* from B
Inner Join
(
Select Distinct myA_Columng from A
) T
On T.myA_Columng = Case When ISNUMERIC(B.myB_Column) = 1 Then Cast(B.myB_Column As Int) End
Option #3
Select B.* from B
Left Join
(
Select Distinct myA_Columng from A
) T
On T.myA_Columng = Case When ISNUMERIC(B.myB_Column) = 1 Then Cast(B.myB_Column As Int) End
Where T.myA_Columng Is Not Null
I would opt for the third one, for the reason below.
Disadvantages of IN Predicate
Suppose I have two lists.
List 1 List 2
1 12
2 7
3 8
4 98
5 9
6 10
7 6
With IN (a Contains-style lookup), each List 1 item is searched for in List 2, which means the iteration will happen 7 x 7 = 49 times!
You can also use an EXISTS clause:
select *
from B
where EXISTS (select 1 from A WHERE CAST(myA_Columng AS VARCHAR(11)) = myB_Column)
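The direction of the conversion matters. Converting the int side to text compares string forms, so '007' does not match 7. Converting the text side to int compares numbers, so '007' does match. In SQL Server, an int has higher data type precedence than varchar, so a plain comparison converts the varchar side and raises an error for values such as 'abc'. Here is a minimal sketch in SQLite through Python's sqlite3 module, with hypothetical sample values. Unlike SQL Server, SQLite's CAST('abc' AS INTEGER) quietly returns 0 instead of raising an error.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE A (myA_Columng INTEGER)")
con.execute("CREATE TABLE B (myB_Column TEXT)")
con.executemany("INSERT INTO A VALUES (?)", [(7,), (42,)])
con.executemany("INSERT INTO B VALUES (?)", [("7",), ("007",), ("42",), ("abc",)])

# Int side converted to text: string comparison, so '007' does not match 7.
as_text = con.execute("""
    SELECT myB_Column FROM B
    WHERE EXISTS (SELECT 1 FROM A WHERE CAST(A.myA_Columng AS TEXT) = B.myB_Column)
    ORDER BY myB_Column
""").fetchall()

# Text side converted to int: numeric comparison, so '007' matches 7.
as_int = con.execute("""
    SELECT myB_Column FROM B
    WHERE EXISTS (SELECT 1 FROM A WHERE A.myA_Columng = CAST(B.myB_Column AS INTEGER))
    ORDER BY myB_Column
""").fetchall()
print(as_text, as_int)
```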
You can use the query below:
select B.*
from B
inner join (Select distinct MyA_Columng from A) AS X ON B.MyB_Column = CAST(x.MyA_Columng as NVARCHAR(50))
Try using CAST(). Note that this only works if every value in myB_Column can be converted to an integer:
SELECT *
FROM B
WHERE CAST(myB_Column AS INT) IN (
SELECT myA_Columng
FROM A
)