How do I insert a character after every 2 characters in a string? - sql

I have a table with a 50 CHAR column, with content like:
AABB
AA
CCXXDD
It's a string used like an array of 25 CHAR(2) elements.
I need to insert a comma after every 2 characters:
AA,BB
AA
CC,XX,DD
Is there a system function, or do I need to create one?

We can do a regex replacement here:
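SELECT col, RTRIM(REGEXP_REPLACE(RTRIM(col), '(..)', '\1,'), ',') AS col_out
FROM yourTable;
The above logic inserts a comma after every two characters. For inputs with an even number of characters, this leaves an unwanted dangling comma on the right, which we remove with the outer RTRIM(..., ','). The inner RTRIM(col) strips the trailing blanks that a fixed-width CHAR column is padded with; without it, the padding would also be split into pairs and get commas of its own.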
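To sanity-check the pattern outside the database, here is a minimal Python sketch that mirrors the same regex replacement with the standard `re` module. It assumes the value may carry CHAR-style blank padding, so it strips trailing spaces first; the function name `add_commas_every_two` is illustrative only.

```python
import re


def add_commas_every_two(value: str) -> str:
    """Insert a comma after every two characters, like the SQL above."""
    # Drop CHAR-style blank padding first; otherwise the padding spaces
    # would be paired up and receive commas too.
    trimmed = value.rstrip(" ")
    # '(..)' captures each pair; r'\1,' writes the pair back plus a comma.
    with_commas = re.sub(r"(..)", r"\1,", trimmed)
    # Even-length input leaves one dangling comma at the end; remove it.
    return with_commas.rstrip(",")


for sample in ["AABB", "AA", "CCXXDD", "AABB" + " " * 46]:
    print(repr(add_commas_every_two(sample)))
```

An odd-length value such as `ABC` comes out as `AB,C`: the last character is never matched as a pair, so no comma follows it.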

Related

How to remove any special characters from a string even with dot and comma and spaces

INPUT STRING:'HI every one. I want to (2-21-2022) remove the comma-dot and other any special character from string(123)'.
OUTPUT STRING:'HI every one I want to 2-21-2022 remove the #comma dot and other any special #character from string 123'
Thanks in advance.
If what you said in the title:
remove any special characters from a string even with dot and comma and spaces
means that you'd want to keep only digits and letters, then a regular expression like this might do:
with test (col) as
  (select 'HI every one. I want to (2-21-2022) remove the comma-dot and other any special character from string(123)' from dual)
select regexp_replace(col, '[^[:alnum:]]') result
from test;

RESULT
---------------------------------------------------------------------------------
HIeveryoneIwantto2212022removethecommadotandotheranyspecialcharacterfromstring123
On the other hand, that's not what the example you posted shows (as already pointed out in the comments).
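For a quick check outside Oracle, here is a Python sketch of the same "keep only letters and digits" idea. Python's `re` module does not support POSIX classes such as `[:alnum:]`, so this assumes ASCII letters and digits are all you need to keep; Oracle's `[:alnum:]` may also match non-ASCII letters depending on the character set.

```python
import re

text = ("HI every one. I want to (2-21-2022) remove the comma-dot "
        "and other any special character from string(123)")

# Delete every character that is not an ASCII letter or digit.
result = re.sub(r"[^A-Za-z0-9]", "", text)
print(result)
```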

Count specific characters in a column

I have a table with a list of titles. I am trying to figure out a way of creating a substring query that will let me count the number of times that a particular character occurs in the entire column. Such as, how many times does the letter 'A' occur? I am thinking of the substring since I want to know the count for letters A - I.
I need a new table that shows the substring letters (say A-Z) and next to them the total number of times that letter occurs in the entire column (not just in each row).
For the basic ASCII letters like A-Z (as mentioned) and a (typical) UTF-8 or LATIN* encoding (or most others):
SELECT chr(c) AS letter
     , sum(octet_length(col)
         - octet_length(translate(col, chr(c), ''))) AS total_count
FROM   generate_series (ascii('A'), ascii('Z')) c
CROSS  JOIN tbl
GROUP  BY 1;
translate() works for single-character replacements and is a bit faster than replace(), which you would use when looking for multi-character strings.
In (typical) UTF-8 or LATIN* encoding, basic ASCII letters are represented with a single byte. This allows the faster function octet_length(). To count characters encoded with more bytes, use length() instead, which counts characters instead of bytes.
Also, we can conveniently generate a range of letters like A-Z with generate_series(), because their byte representations line up in a continuous range in the mentioned encodings. Convert to integer with ascii() and back with chr().
Then CROSS JOIN to your table (tbl), measure the difference between original length and after removing the letter of interest, and sum.
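The length-difference idea can be tried on a few in-memory strings. This minimal Python sketch assumes a plain list of titles stands in for the table column (the sample titles are made up); `str.replace` plays the role of translate(), and `len()` counts characters rather than bytes.

```python
import string

# Hypothetical sample titles standing in for the column values.
titles = ["ABBA GOLD", "A CAT", "ZEBRA"]

# For each letter A-Z, sum over all rows: original length minus the length
# after removing that letter, i.e. how many times the letter occurs.
totals = {
    letter: sum(len(t) - len(t.replace(letter, "")) for t in titles)
    for letter in string.ascii_uppercase
}

# Show only letters that actually occur.
for letter, count in totals.items():
    if count:
        print(letter, count)
```

Like the SQL query, this makes one pass over the data per letter, which is why it gets slow when many letters are counted.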
But when counting many of the characters in your strings, this alternative approach is probably much faster:
SELECT letter, count(*) AS total_count
FROM tbl, unnest(string_to_array(col, NULL)) letter
WHERE ascii(letter) BETWEEN ascii('A') AND ascii('Z')
GROUP BY 1;
To count case-insensitive, throw in lower() or upper():
FROM tbl, unnest(string_to_array(upper(col), NULL)) letter
To check for multiple non-continuous ranges of characters:
WHERE letter ~ '^[a-zA-Z]$' -- a-z and A-Z separately (case-sensitive)
Or an arbitrary selection of characters:
WHERE 'abcXYZ' ~ letter
string_to_array() with separator NULL splits the string into an array of single characters; unnest() expands that array into rows (using an implicit CROSS JOIN LATERAL); the WHERE clause filters the characters of interest (again using their byte representation to make it fast). Then simply count per character.
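The split, filter and count pipeline maps naturally onto Python's `collections.Counter`. A minimal sketch, assuming a made-up list of titles as input: iterating over each title corresponds to unnest(string_to_array(col, NULL)), and the range check corresponds to the WHERE clause.

```python
from collections import Counter

# Hypothetical sample titles standing in for the column values.
titles = ["ABBA GOLD", "A CAT", "ZEBRA"]

# Split every title into single characters, keep only A-Z, count each one.
counts = Counter(
    ch
    for title in titles
    for ch in title          # one row per character, like unnest()
    if "A" <= ch <= "Z"      # like WHERE ascii(letter) BETWEEN ascii('A') AND ascii('Z')
)

print(counts.most_common(3))
```

For a case-insensitive count, iterate over `title.upper()` instead, just as upper() is added in the SQL.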
Related:
What is the difference between LATERAL and a subquery in PostgreSQL?
PostgreSQL 9.1 using collate in select statements

Insert comma after every 7th character using regex and hive sql

Insert a comma after every 7th character using regex in Hive SQL, and make sure the data ends up with a comma correctly placed after every 7th character.
Spaces should be ignored when counting to the 7th character.
Sample Input Data:
12F123f, 123asfH 0DB68ZZ, AG12453
112312f, 1212sfH 0DB68ZZ, AQ13463
Output:
12F123f,123asfH,0DB68ZZ,AG12453
112312f,1212sfH,0DB68ZZ,AQ13463
I tried the code below, but it didn't work; it doesn't insert the commas correctly.
select regexp_replace('12345 12456,12345 123', '(/(.{5})/g,"$1$")','')
I think you can use
select regexp_replace('12345 12456,12345 123', '(?!^)[\\s,]+([^\\s,]+)', ',$1')
Details
(?!^) - no match if at string start
[\s,]+ - one or more whitespace characters or commas
([^\s,]+) - Capturing group 1: one or more characters other than whitespace and commas.
The ,$1 replacement replaces the match with a comma and the value in Group 1.
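The same pattern can be exercised in Python to confirm what it does with the sample rows. In Python's `re` module the backreference in the replacement is written `\1` rather than the `$1` used by Hive (which follows Java regex syntax); the pattern itself is unchanged. A minimal sketch:

```python
import re

# Same pattern as the Hive query: a run of whitespace/commas (not at the
# start of the string) followed by a token; replace it with ',' + token.
PATTERN = re.compile(r"(?!^)[\s,]+([^\s,]+)")

rows = [
    "12F123f, 123asfH 0DB68ZZ, AG12453",
    "112312f, 1212sfH 0DB68ZZ, AQ13463",
]

for row in rows:
    print(PATTERN.sub(r",\1", row))
```

Note that this normalises the separators rather than counting characters, so it relies on each token already being 7 characters long.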
You just want to replace the space character with a comma, am I right? The SQL is below:
select regexp_replace('12F123f,123asfH 0DB68ZZ,AG12453',' ',',') as result;
+----------------------------------+--+
| result |
+----------------------------------+--+
| 12F123f,123asfH,0DB68ZZ,AG12453 |
+----------------------------------+--+

Regex to split values in PostgreSQL

I have a list of values coming from a PGSQL database that looks something like this:
198
199
1S
2
20
997
998
999
C1
C10
A
I'm looking to parse this field into individual components, which I assume would take two regexp_replace calls in my SQL. Essentially, any non-numeric characters that appear before the numeric ones need to be returned in one column, and the other column would show all non-numeric characters appearing AFTER the numeric ones.
The above list would then be split into this layout as the result from PG:
I have created a function that strips out the non-numeric characters (the last column) and casts it as an Integer, but I can't figure out the regex to return the string values prior to the number, or those found after the number.
All I could come up with so far, with my next-to-non-existent regex knowledge, was this: regexp_replace(fieldname, '[^A-Z]+', '', 'g'), which just strips out anything not A-Z, but I can't get it to work with strings before numeric values, or after them.
For extracting the characters before the digits:
regexp_replace(fieldname, '\d.*$', '')
For extracting the characters after the digits:
regexp_replace(fieldname, '^([^\d]*\d*)', '')
Note that:
if there are no digits, the first will return the original value and the second an empty string. This way you can be sure that the concatenation equals the original value in this case as well.
the concatenation of the three parts will not return the original if there are non-numeric characters surrounded by digits: those will be lost.
This also works for any non-alphanumeric characters like #, [, !, etc.
Final SQL
select
fieldname as original,
regexp_replace(fieldname, '\d.*$', '') as before_s,
regexp_replace(fieldname, '^([^\d]*\d*)', '') as after_s,
cast(nullif(regexp_replace(fieldname, '[^\d]', '', 'g'), '') as integer) as number
from mytable;
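To see how the three expressions behave on the sample values, here is a minimal Python sketch using the same patterns with `re.sub`; `\D` is shorthand for `[^\d]`. The function name `split_code` is hypothetical, and Python's `re` is only an approximation of PostgreSQL's regex engine for these simple patterns.

```python
import re
from typing import Optional, Tuple


def split_code(value: str) -> Tuple[str, str, Optional[int]]:
    """Return (text before the digits, text after the digits, digits as int or None)."""
    # Remove everything from the first digit to the end -> prefix.
    before = re.sub(r"\d.*$", "", value)
    # Remove leading non-digits plus the digits that follow -> suffix.
    after = re.sub(r"^(\D*\d*)", "", value)
    # Keep only digits; an empty result becomes None (like NULLIF(..., '')).
    digits = re.sub(r"\D", "", value)
    number = int(digits) if digits else None
    return before, after, number


for v in ["198", "1S", "2", "C1", "C10", "A"]:
    print(v, split_code(v))
```

The `A` row shows the no-digit case from the notes above: the prefix is the whole value and the suffix is empty.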
This answer relies on the information you provided:
Essentially, any non-numeric character that appears before numeric
ones needs to be returned for one column, and the other column would
show all non-numeric characters appearing AFTER numeric ones.
Everything non-numeric before a numeric value goes into column 1
Everything non-numeric after a numeric value goes into column 2
So it assumes that every value contains a numeric part.
select
val,
regexp_matches(val,'([a-zA-Z]*)\d+') AS before_numeric,
regexp_matches(val,'\d+([a-zA-Z]*)') AS after_numeric
from
val;
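regexp_matches() without the 'g' flag returns the capture groups of the first match. Python's `re.search` also finds the first match, so here is a rough sketch of the same two patterns; returning `None` when nothing matches corresponds to a value with no digits, which is why the answer assumes every value contains a number. The helper name `first_group` is hypothetical.

```python
import re
from typing import Optional

BEFORE = re.compile(r"([a-zA-Z]*)\d+")  # letters immediately before the digits
AFTER = re.compile(r"\d+([a-zA-Z]*)")   # letters immediately after the digits


def first_group(pattern: re.Pattern, value: str) -> Optional[str]:
    """Return capture group 1 of the first match, or None if nothing matches."""
    m = pattern.search(value)
    return m.group(1) if m else None


for v in ["198", "1S", "C10", "A"]:
    print(v, first_group(BEFORE, v), first_group(AFTER, v))
```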

Split a field and add these fields to another table

I have a table with a field that allows up to 120 characters. I want to split the field into three fields. If the field contains more than 40 and fewer than 80 characters, split it into two. The split point should be the first space character before the 40th character, and the two new fields should be added to another table. If the field is 120 characters long, split it into three.
I'd appreciate the help!
I guess you could do something along the lines of:
SELECT
SUBSTRING(MyCol,1,40),
NULLIF(SUBSTRING(MyCol,41,40), ''),
NULLIF(SUBSTRING(MyCol,81,40), '')
FROM MyTable; -- MyTable is a placeholder for your table name
This breaks your one column down into three parts for your INSERT statement. Note that it cuts at fixed 40-character positions; it does not look for a space.
NULLIF() sets a part to NULL whenever SUBSTRING() returns an empty string for it, i.e. when the value is too short to reach that segment.
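Here is a minimal Python sketch of the same fixed 40-character split, useful for checking which parts come out empty. Slicing past the end of a string returns `''` in Python, much like SUBSTRING() in this answer, and `or None` plays the role of NULLIF(..., ''). The function name `split_fixed` is hypothetical, and like the SQL it does not look for spaces.

```python
from typing import List, Optional


def split_fixed(value: str, width: int = 40, parts: int = 3) -> List[Optional[str]]:
    """Cut value into `parts` chunks of `width` characters; empty chunks become None."""
    return [value[i * width:(i + 1) * width] or None for i in range(parts)]


print(split_fixed("x" * 50))
```

If the parts should instead break at spaces, as the question asks, Python's `textwrap.wrap(value, width=40)` is one way to prototype that, since it breaks on whitespace so that no line exceeds 40 characters.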