Count specific characters in a column - sql

I have a table with a list of titles. I am trying to figure out a way of creating a substring query that will let me count the number of times that a particular character occurs in the entire column. For example, how many times does the letter 'A' occur? I am thinking of substring since I want to know the count for letters A - I.
I need a new table that shows the substring letters (say A-Z) and next to them the total number of times that letter occurs in the entire column (not just in each row).

For the basic ASCII letters like A-Z (as mentioned) and a (typical) UTF-8 or LATIN* encoding (or most others):
SELECT chr(c) AS letter
, sum(octet_length(col)
- octet_length(translate(col, chr(c), ''))) AS total_count
FROM generate_series (ascii('A'), ascii('Z')) c
CROSS JOIN tbl
GROUP BY 1;
translate() works for single-character replacements and is a bit faster than replace() - which you would use looking for multi-character strings.
In (typical) UTF-8 or LATIN* encoding, basic ASCII letters are represented with a single byte. This allows the faster function octet_length(). To count characters encoded with more bytes, use length() instead, which counts characters instead of bytes.
Also, we can conveniently generate a range of letters like A-Z with generate_series(), because their byte-representation lines up in a continuous range in the mentioned encodings. Convert to integer with ascii() and back with chr().
Then CROSS JOIN to your table (tbl), measure the difference between original length and after removing the letter of interest, and sum.
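The length-difference idea can be sanity-checked outside the database. The following is a minimal Python sketch of the same arithmetic, not the PostgreSQL query itself; the function name `count_by_length_difference` and the sample `titles` list are made up for illustration.

```python
import string

def count_by_length_difference(rows, letters=string.ascii_uppercase):
    """For each letter, sum len(row) - len(row without that letter) over all rows."""
    totals = {}
    for letter in letters:
        totals[letter] = sum(len(row) - len(row.replace(letter, "")) for row in rows)
    return totals

# Hypothetical sample data standing in for tbl.col
titles = ["ALABAMA", "BANANA Bread", "abc"]
counts = count_by_length_difference(titles)
print(counts["A"], counts["B"], counts["Z"])
```

As with the CROSS JOIN against generate_series(), every letter in the range gets an entry, including letters that never occur (count 0).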
But while counting many of the characters in your strings, this alternative approach is probably much faster:
SELECT letter, count(*) AS total_count
FROM tbl, unnest(string_to_array(col, NULL)) letter
WHERE ascii(letter) BETWEEN ascii('A') AND ascii('Z')
GROUP BY 1;
To count case-insensitive, throw in lower() or upper():
FROM tbl, unnest(string_to_array(upper(col), NULL)) letter
To check for multiple non-continuous ranges of characters:
WHERE letter ~ '^[a-zA-Z]$' -- a-z and A-Z separately (case-sensitive)
Or a random selection:
WHERE 'abcXYZ' ~ letter
string_to_array() with separator NULL splits the string into an array of single characters. unnest() (using an implicit CROSS JOIN LATERAL) expands that array into one row per character. Then filter the ones of interest (again, using their byte representation to make it fast) and simply count per character.
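The split-then-count approach can be sketched in Python as well, assuming the column values are available as a list of strings. The names `count_letters`, `wanted`, and `case_insensitive` are hypothetical and only mirror the WHERE filter and the optional upper() call described above.

```python
from collections import Counter

def count_letters(rows, wanted="ABCDEFGHIJKLMNOPQRSTUVWXYZ", case_insensitive=False):
    """Split every row into single characters, keep the wanted ones, count per character."""
    totals = Counter()
    for row in rows:
        if row is None:  # a NULL column value contributes no characters
            continue
        text = row.upper() if case_insensitive else row
        totals.update(ch for ch in text if ch in wanted)
    return totals

# Hypothetical sample data standing in for tbl.col
titles = ["ALABAMA", "BANANA Bread"]
sensitive = count_letters(titles)
insensitive = count_letters(titles, case_insensitive=True)
print(sensitive["A"], insensitive["A"])
```

Unlike the generate_series() variant, a letter that never occurs produces no entry at all, just as the unnest() query returns no row for it.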
Related:
What is the difference between LATERAL and a subquery in PostgreSQL?
PostgreSQL 9.1 using collate in select statements

Related

First 3 letters in caps and remaining letters in small case in a name -SQL

I have a question. In a name (e.g. Richard), the first 3 letters should be in capital letters and the remaining letters should be in lower case.
ANS: RIChard
Can you help me to get the query for this?
Microsoft SQL Server has UPPER() and LOWER() functions that change characters to upper case and lower case.
For your requirement, you need to use UPPER for the first 3 letters.
You can use the LEFT() or SUBSTRING() functions to get the first 3 letters.
For the remaining letters, you need to use the LOWER function.
To split off the remaining letters, use the RIGHT() or SUBSTRING() functions plus LEN() to calculate the number of remaining letters.
Select UPPER(Left(Name,3)) + LOWER(right(Name,len(Name)-3))
OR
Select UPPER(substring(Name,1,3)) + LOWER(substring(Name,4,len(Name)))
What you could do is the following if you want to update it:
UPDATE employees SET First_name = CONCAT(UPPER(LEFT(First_name,3)), LOWER(SUBSTRING(First_name,4)))
If you want to only have it in a select you can use:
SELECT CONCAT(UPPER(LEFT(First_name,3)), LOWER(SUBSTRING(First_name,4))) as First_name FROM employees;
What I'm doing in both cases is the following:
Get the first 3 characters, and convert them to uppercase
Get all the other characters, and convert them to lowercase
Concatenate the two strings together
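The three steps can be checked quickly with a small Python sketch. This is only an illustration of the logic, not a replacement for the SQL; the function name `cap_first_three` is made up.

```python
def cap_first_three(name):
    """Uppercase the first 3 characters, lowercase the rest, and concatenate."""
    head = name[:3].upper()   # first 3 characters
    tail = name[3:].lower()   # everything from the 4th character on
    return head + tail

print(cap_first_three("Richard"))
```

Python slicing simply returns a shorter string for names with fewer than 3 characters, so such names come back fully uppercased.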
Try this (in SQL Server, SUBSTRING needs a length argument):
SELECT UPPER(LEFT('richard',3))+LOWER(SUBSTRING('richard',4,LEN('richard')));
You can do it like this:
SELECT UPPER(LEFT('richard',3))+LOWER(SUBSTRING('richard',4,LEN('richard')));
Explanation: UPPER() capitalizes the letters. As you can see, I use it with LEFT(), which returns the first 3 characters from the left; 3 is the number of characters you want. SUBSTRING() extracts the remaining letters, starting from the fourth, and LOWER() converts them to lower case.

SQL Server - Regex pattern match only alphanumeric characters

I have an nvarchar(50) column myCol with values like these 16-digit, alphanumeric values, starting with '0':
0b00d60b8d6cfb19, 0b00d60b8d6cfb05, 0b00d60b8d57a2b9
I am trying to delete rows with myCol values that don't match those 3 criteria.
By following this article, I was able to select the records starting with '0'. However, despite the [a-z0-9] part of the regex, it also keeps selecting myCol values containing special characters like 00-d#!b8-d6/f&#b. Below is my select query:
SELECT * from Table
WHERE myCol LIKE '[0][a-z0-9]%' AND LEN(myCol) = 16
How should the expression be changed to select only rows with myCol values that don't contain special characters?
If the value must only contain a-z and digits, and must start with a 0 you could use the following:
SELECT *
FROM (VALUES(N'0b00d60b8d6cfb19'),
(N'0b00d60b8d6cfb05'),
(N'0b00d60b8d57a2b9'),
(N'00-d#!b8-d6/f&#b'))V(myCol)
WHERE V.myCol LIKE '0%' --Checks starts with a 0
AND V.myCol NOT LIKE '%[^0-9A-Za-z]%' --Checks only contains alphanumerical characters
AND LEN(V.myCol) = 16;
The second clause works because the LIKE pattern matches any value that contains at least one character that isn't alphanumeric. The NOT then reverses that, meaning that the expression only resolves to TRUE when the value contains nothing but alphanumeric characters.
Pattern matching in SQL Server is not awesome, and there is currently no real regex support.
The % in your pattern is what is including the special characters you show in your example. The [a-z0-9] is only matching a single character. If your character lengths are 16 and you're only interested in letters and numbers then you can include a pattern for each one:
SELECT *
FROM Table
WHERE myCol LIKE '[0][a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9]';
Note: you don't need the AND LEN(myCol) = 16 with this.
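The difference between the original pattern and the fully spelled-out one can be illustrated with Python regular expressions. This is a rough analogue only: LIKE is not regex, and whether LIKE is case-sensitive depends on the column collation, whereas the Python patterns below are case-sensitive. The names `matches_loose` and `matches_strict` are hypothetical.

```python
import re

# Analogue of LIKE '[0][a-z0-9]%': only the first two characters are constrained
LOOSE = re.compile(r"0[a-z0-9].*", re.DOTALL)
# Analogue of '[0]' followed by fifteen '[a-z0-9]' classes: every position is constrained
STRICT = re.compile(r"0[a-z0-9]{15}")

def matches_loose(value):
    """Mimics the original query: loose pattern plus a length check of 16."""
    return LOOSE.fullmatch(value) is not None and len(value) == 16

def matches_strict(value):
    """Mimics the spelled-out pattern; the length check is implied."""
    return STRICT.fullmatch(value) is not None

for value in ["0b00d60b8d6cfb19", "00-d#!b8-d6/f&#b"]:
    print(value, matches_loose(value), matches_strict(value))
```

The value with special characters passes the loose check because everything after the second character is unconstrained, and it fails the strict one.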

How to find letters in a string that contains mostly numbers in netezza

I have a field that is a string but should be mostly numbers. I need to be able to find if a letter is in this string. The letter can be in any spot in the string.
You can use:
select t.*
from t
where regexp_like(field, '[^0-9]');
That is, return any row where field has a non-digit.
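Note that "non-digit" is broader than "letter": a dash or a space also matches. A small Python sketch of the two checks, using Python's `re` module rather than Netezza's regexp_like, and with made-up function names:

```python
import re

def has_non_digit(value):
    """True if the string contains any character other than 0-9."""
    return re.search(r"[^0-9]", value) is not None

def has_letter(value):
    """True only if the string contains an ASCII letter."""
    return re.search(r"[A-Za-z]", value) is not None

for value in ["12345", "12a45", "12-45"]:
    print(value, has_non_digit(value), has_letter(value))
```

If only letters should be flagged, a character class such as [A-Za-z] in place of [^0-9] would express that more directly.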

Display certain sequence only in VARCHAR

I have a column error_desc with values like:
Failure occurred in (Class::Method) xxxxCalcModule::endCustomer. Fan id 111232 is not Effective or not present in BL9_XXXXX for date 20160XXX.
What SQL query can I use to display only the number 111232 from that column? The number is placed at 66th position in VARCHAR column and ends 71st.
SELECT substr(ERROR_DESC,66,6) as ABC FROM bl1_cycle_errors where error_desc like '%FAN%'
This solution uses regular expressions.
The challenge I faced was pulling out the alphanumerics. We have to retain only numbers and filter out strings, alphanumerics, and punctuation in this case, to detect the standalone number.
Pure strings and words not containing numbers can be easily filtered out using
[^[:digit:]]
Possible combinations of alphanumerics are :
1. Begins with characters, contains numbers, may end with characters or punctuation:
[a-zA-Z]+[0-9]+[[:punct:]]*[a-zA-Z]*[[:punct:]]*
2. Begins with numbers and then contains letters, may contain punctuation:
[0-9]+[[:punct:]]*[a-zA-Z]+[[:punct:]]*
3. Begins with numbers, then contains punctuation, may contain letters:
[0-9]+[a-zA-Z]*[[:punct:]]+[a-zA-Z]*
Combining these regular expressions using | operator we get:
select trim(REGEXP_REPLACE(error_desc,'[^[:digit:]]|[a-zA-Z]+[0-9]+[[:punct:]]*[a-zA-Z]*[[:punct:]]*|[0-9]+[[:punct:]]*[a-zA-Z]+[[:punct:]]*|[0-9]+[a-zA-Z]*[[:punct:]]+[a-zA-Z]*',' '))
from error_table;
This will work in most cases.
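A different and simpler technique, if the number always follows the literal text "Fan id", is to capture the digits directly instead of stripping everything else. The Python sketch below illustrates that idea under that assumption; it is not the replace-based query above, and the function name `extract_fan_id` is made up.

```python
import re

def extract_fan_id(error_desc):
    """Return the digits that follow the literal 'Fan id', or None if absent."""
    match = re.search(r"Fan id\s+(\d+)", error_desc)
    return match.group(1) if match else None

# Sample value taken from the question
sample = ("Failure occurred in (Class::Method) xxxxCalcModule::endCustomer. "
          "Fan id 111232 is not Effective or not present in BL9_XXXXX for date 20160XXX.")
print(extract_fan_id(sample))
```

Because the match is anchored on the surrounding text rather than on a character position, it keeps working if the class or method name changes length.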

SQL MAX function and strings

I have a column nr that contains strings in the format of 12345-12345. The numbers before and after the dash can be of any length. I would like to get the maximum value for nr taking into account only the part after the dash. I tried
SELECT MAX(nr) AS max_nr FROM table WHERE (nr LIKE '12345-%')
However, this works only for values < 10 (i.e. 12345-9 would be returned as max even if 12345-10 exists). I thought of removing the dash and doing a type conversion:
SELECT MAX(REPLACE(nr, '-', '')::int) AS max_nr FROM table WHERE (nr LIKE '12345-%')
However, this of course returns the result without the dash. What would be the best way to get the maximum value while including the dash and the number before the dash in the result?
PostgreSQL 9.1
I'm no expert in PostgreSQL, but you can use regexp_replace (for example, regexp_replace('foobarbaz', 'b..', 'X')) to extract the string after the dash and then convert it to an integer. The following query retrieves only one row, the nr from your table where nr is like 12345-%, sorted by the number after the dash in descending order (largest number first).
SELECT nr
FROM table WHERE (nr LIKE '12345-%')
ORDER BY regexp_replace(nr, '^\d+-', '')::integer DESC
LIMIT 1
The regular expression above removes the leading digits and the dash, leaving only the last set of digits. For example, 54352-12345 would become 12345.
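The effect of sorting by the numeric suffix instead of by the string can be seen in a short Python sketch. The sample values and the function name `max_by_suffix` are made up; the sketch assumes every value has the digits-dash-digits format described in the question.

```python
def max_by_suffix(values, prefix="12345-"):
    """Return the full value whose number after the dash is largest, or None."""
    candidates = [v for v in values if v.startswith(prefix)]
    if not candidates:
        return None
    # Compare by the integer after the dash, but return the whole string
    return max(candidates, key=lambda v: int(v.split("-", 1)[1]))

# Hypothetical sample data standing in for the nr column
values = ["12345-9", "12345-10", "12345-2", "999-50"]
string_max = max(v for v in values if v.startswith("12345-"))
numeric_max = max_by_suffix(values)
print(string_max, numeric_max)
```

The plain string maximum picks 12345-9 because '9' sorts after '1', which is the behavior described in the question; comparing on the integer suffix returns 12345-10 with the prefix and dash intact.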
Use the substring function with the position function:
http://www.postgresql.org/docs/8.1/static/functions-string.html
to extract the number after the dash, and then use this value in the MAX function as you have in your code now. You can also try the to_number function.
It will look similar to this (note the + 1, so that the dash itself is not included and the value is not cast to a negative number):
MAX(substring(nr from position('-' in nr) + 1)::int)