Extracting number of specific length from a string in Postgres - sql

I am trying to extract a set of numbers from comments like
"on april-17 transactions numbers are 12345 / 56789"
"on april-18 transactions numbers are 56789"
"on may-19 no transactions"
These are stored in a column called "com" in the table comments.
My requirement is to get the numbers of a specific length, in this case 5, so 12345 and 56789 from the above string, each separately. A comment may contain no five-digit number, or more than two of them.
I tried regexp_replace with the following result. I am looking for an efficient regex or other method to achieve this:
select regexp_replace(com, '[^0-9]',' ', 'g') from comments;
regexp_replace
----------------------------------------------------
17 12345 56789
I expect the result to contain only:
column1 | column2
12345 | 56789

There is no easy way to write a query that returns an arbitrary number of columns: a query cannot return one column for one input and two columns for the next.
For a fixed two columns:
SELECT
matches[1] AS col1,
matches[2] AS col2
FROM (
SELECT
array_agg(regexp_matches[1]) AS matches
FROM
regexp_matches(
'on april-17 transactions numbers are 12345 / 56789',
'\d{5}',
'g'
)
) s
regexp_matches() returns one row per match.
array_agg() collects all matches into one array.
The array elements can then be returned as separate columns.
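The same extraction logic can be sketched outside the database. The Python snippet below is only an illustration and not part of the original answer: the names `five_digit_numbers` and `as_two_columns` are made up, and the lookarounds are an added assumption so that part of a longer run of digits (for example a 7-digit number) is not reported as a 5-digit match, which a plain `\d{5}` pattern would do.

```python
import re

# Assumption: "number of length 5" means exactly five consecutive digits,
# not five digits taken from inside a longer number.
FIVE_DIGITS = re.compile(r"(?<!\d)\d{5}(?!\d)")


def five_digit_numbers(comment):
    """Return every standalone 5-digit number in the comment, in order."""
    return FIVE_DIGITS.findall(comment)


def as_two_columns(comment):
    """Mimic the fixed two-column result; missing matches become None."""
    found = five_digit_numbers(comment)
    padded = found + [None, None]
    return padded[0], padded[1]


if __name__ == "__main__":
    comments = [
        "on april-17 transactions numbers are 12345 / 56789",
        "on april-18 transactions numbers are 56789",
        "on may-19 no transactions",
    ]
    for comment in comments:
        print(as_two_columns(comment))
```

Padding with None plays the role of the NULL columns a fixed two-column query would return when fewer than two numbers are found.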

Related

Match similar column values in different rows

I have a table with an ID column (String) and I need to be able to find IDs that are similar between different rows. What is the SQL that will allow me to flag a row as similar? Note: There can be one-to-many rows like shown below (i.e. 12345, 12345RED, etc.)
Update: The IDs are "similar" in that there are typically leading numeric characters followed either directly by alpha characters, or by a space " ", hyphen "-", or forward slash "/" and then alpha characters: ####[a-zA-Z], #### [a-zA-Z], ####-[a-zA-Z], or ####/[a-zA-Z]. (I'm not sure how to indicate 1-to-many numeric characters.)
ID | Similar
12345 | Yes
12345RED (Could also be 12345-RED, 12345/RED, or 12345 RED) | Yes
12345BLU (Could also be 12345-BLU, 12345/BLU, or 12345 BLU) | Yes
12345GRN (Could also be 12345-GRN, 12345/GRN, or 12345 GRN) | Yes
12345BLK (Could also be 12345-BLK, 12345/BLK, or 12345 BLK) | Yes
123456 | No
123457 | No
Assuming "similar" means "have the same leading numerals"...
First, extract the numerals, for example with a regular expression. Then, using a window function, count how many other rows have the same leading numerals.
WITH
extract_numerals AS
(
SELECT
*,
REGEXP_EXTRACT(id, r'^\d+') AS leading_numerals
FROM
your_table
)
SELECT
*,
COUNT(*) OVER (PARTITION BY leading_numerals) - 1 AS similar_rows
FROM
extract_numerals
ORDER BY
leading_numerals
Any row where the count is zero (after having deducted one from the window function) has no "similar" rows.
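To make the counting step concrete, here is a small Python sketch of the same idea, assuming as above that "similar" means "same leading numerals". The function names are hypothetical and the sample IDs are taken from the question.

```python
import re
from collections import Counter


def leading_numerals(value):
    """Return the run of digits at the start of value, or None if there is none."""
    match = re.match(r"\d+", value)
    return match.group(0) if match else None


def similar_counts(ids):
    """Pair each ID with the number of OTHER IDs sharing its leading numerals."""
    keys = [leading_numerals(i) for i in ids]
    totals = Counter(keys)  # plays the role of COUNT(*) OVER (PARTITION BY ...)
    return [(i, totals[k] - 1) for i, k in zip(ids, keys)]


if __name__ == "__main__":
    sample = ["12345", "12345RED", "12345-BLU", "12345/GRN", "12345 BLK",
              "123456", "123457"]
    for id_value, others in similar_counts(sample):
        print(id_value, "Yes" if others > 0 else "No")
```

As in the query, an ID is flagged as similar only when the count of other rows in its group is greater than zero.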

SQLite3 Order by highest/lowest numerical value

I am trying to do a query in SQLite3 to order a column by numerical value. Instead of getting the rows ordered by the numerical value of the column, the rows are ordered alphabetically by the first digit's numerical value.
For example in the query below 110 appears before 2 because the first digit (1) is less than two. However the entire number 110 is greater than 2 and I need that to appear after 2.
sqlite> SELECT digit,text FROM test ORDER BY digit;
1|one
110|One Hundred Ten
2|TWO
3|Three
sqlite>
Is there a way to make 110 appear after 2?
It seems like digit is stored as a string, not as a number. You need to convert it to a number to get the proper ordering. A simple approach uses:
SELECT digit, text
FROM test
ORDER BY digit + 0
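The difference between the two orderings can be reproduced with Python's built-in sqlite3 module. This is a minimal sketch: the in-memory table mirrors the one in the question, with digit declared as TEXT (an assumption about the original schema), and the helper name `ordered_digits` is made up.

```python
import sqlite3


def ordered_digits(order_by):
    """Return the digit column ordered by the given ORDER BY expression.

    Sketch only: order_by is interpolated into the SQL text, so pass
    trusted expressions only.
    """
    conn = sqlite3.connect(":memory:")
    # Assumption: digit is stored as TEXT, which explains the string ordering.
    conn.execute('CREATE TABLE test (digit TEXT, "text" TEXT)')
    conn.executemany(
        "INSERT INTO test VALUES (?, ?)",
        [("1", "one"), ("110", "One Hundred Ten"), ("2", "TWO"), ("3", "Three")],
    )
    rows = conn.execute(f"SELECT digit FROM test ORDER BY {order_by}").fetchall()
    conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(ordered_digits("digit"))      # string comparison
    print(ordered_digits("digit + 0"))  # numeric comparison
```

An explicit `CAST(digit AS INTEGER)` in the ORDER BY gives the same numeric ordering and states the intent more directly than `+ 0`.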

SAP HANA SQL - Concatenate multiple result rows for a single column into a single row

I am pulling data and when I pull in the text field my results for the "distinct ID" are sometimes being duplicated when there are multiple results for that ID. Is there a way to concatenate the results into a single column/row rather than having them duplicated?
It looks like there are ways in other SQL platforms but I have not been able to find something that works in HANA.
Example
Select
Distinct ID
From Table1
If I pull only Distinct ID I get the following:
ID
1
2
3
4
However when I pull the following:
Example
Select
Distinct ID,Text
From Table1
I get something like
ID | Text
1 | Dog
2 | Cat
2 | Dog
3 | Fish
4 | Bird
4 | Horse
I am trying to Concat the Text field when there is more than 1 row for each ID.
What I need the results to be (Having a "break" between results so that they are on separate lines would be even better but at least a "," would work):
ID | Text
1 | Dog
2 | Cat,Dog
3 | Fish
4 | Bird,Horse
I see Kiran has just referred to another valid answer in the comment, but in your example this would work.
SELECT ID, STRING_AGG(Text, ',')
FROM TABLE1
GROUP BY ID;
You can replace the ',' with another separator, for example a line break (depending on how HANA treats '\n' in a string literal, you may need CHAR(10) instead).
I would caution against concatenating rows in this way unless you know your data well. Nothing limits the number of rows per ID or the length of the resulting string, but HANA does limit string length, so keep that in mind.
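For readers who want to see what the aggregation does to the sample data, here is a small Python sketch of the same grouping. It is an illustration only, not HANA code: the function name `concat_by_id` is hypothetical, and it joins values in input order, whereas STRING_AGG without an ORDER BY clause does not promise any particular order.

```python
def concat_by_id(rows, sep=","):
    """Group (id, text) pairs by id and join each group's texts with sep."""
    grouped = {}
    for id_value, text in rows:
        grouped.setdefault(id_value, []).append(text)
    # One output row per ID, sorted by ID for a stable result.
    return [(id_value, sep.join(texts)) for id_value, texts in sorted(grouped.items())]


if __name__ == "__main__":
    table1 = [(1, "Dog"), (2, "Cat"), (2, "Dog"), (3, "Fish"), (4, "Bird"), (4, "Horse")]
    for id_value, texts in concat_by_id(table1):
        print(id_value, texts)
```

Passing `sep="\n"` puts each value on its own line, which corresponds to the "break between results" variant mentioned in the question.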

T-SQL statement for number ranges

Create table temp
(
ID nvarchar(50)
)
ID contains numeric values, with leading zeros in some cases, so it is defined as nvarchar.
How can I get the values that start with 3555 to 3999 or 8000 to 9999? There is no rule that the length is always 4.
Eg:
3555
35688888888888
3590909
8000
85805667
All of these values are valid and should be fetched.
Please let me know the T-SQL statement for the above scenario.
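You can use a few LIKE expressions. If there is an index on ID, the query can use it, so it should be efficient. Something like this:
SELECT
ID
FROM
temp
WHERE
ID LIKE '355[5-9]%'
OR ID LIKE '35[6-9][0-9]%'
OR ID LIKE '3[6-9][0-9][0-9]%'
OR ID LIKE '[89][0-9][0-9][0-9]%'
LIKE '355[5-9]%' matches any string whose first four characters are 3555 to 3559. After these characters there can be zero or more other characters.
LIKE '35[6-9][0-9]%' covers 3560 to 3599, and LIKE '3[6-9][0-9][0-9]%' covers 3600 to 3999.
LIKE '[89][0-9][0-9][0-9]%' matches any string that starts with 8 or 9 followed by three more digits, with any number of characters after that.
A shorter pattern such as LIKE '3[5-9]%' is not enough here, because it would also match values starting with 3500 to 3554.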
Alternatively, you can extract the first four characters, convert them to a number and query like this:
SELECT
[ID]
FROM temp
WHERE convert(int,LEFT([ID],4)) BETWEEN 3555 AND 3999
OR convert(int,LEFT([ID],4)) BETWEEN 8000 AND 9999
For large tables this will be very slow, because the function call on ID prevents SQL Server from using an index on that column. If you need performance, I would recommend adding an indexed int column that stores the number formed by the first four digits of ID.
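The prefix check itself can be sketched in a few lines of Python, using the 3555 to 3999 and 8000 to 9999 ranges from the question. The names `RANGES` and `prefix_in_ranges` are hypothetical, and treating IDs shorter than four characters as non-matching is an assumption, not something the question states.

```python
# Inclusive ranges for the first four digits, as given in the question.
RANGES = [(3555, 3999), (8000, 9999)]


def prefix_in_ranges(id_value, ranges=RANGES):
    """Return True if the first four characters form a number in any range."""
    prefix = id_value[:4]
    # Assumption: IDs shorter than four characters or with non-digit
    # prefixes never match.
    if len(prefix) < 4 or not (prefix.isascii() and prefix.isdigit()):
        return False
    number = int(prefix)
    return any(low <= number <= high for low, high in ranges)


if __name__ == "__main__":
    for sample in ["3555", "35688888888888", "3590909", "8000", "85805667", "3500"]:
        print(sample, prefix_in_ranges(sample))
```

Because only the first four characters are inspected, an ID with a leading zero such as 0355512 does not match, which is the reason the column is compared as text rather than converted as a whole.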

How can I "dynamically" split a varchar column by specific characters?

I have a column that stores 2 values. Example below:
| Column 1 |
|some title1 =ExtractThis ; Source Title12 = ExtractThis2|
I want to extract 'ExtractThis' into one column and 'ExtractThis2' into another column. I've tried using a substring but it doesn't work, as the data in Column 1 is variable and therefore it doesn't always carve out my intended values. SQL below:
SELECT substring(d.Column1,13,24) FROM dbo.Table d
This returns 'ExtractThis', but for other rows it takes either too much or too little. Is there a function or combination of functions that will allow me to split consistently on the delimiter characters? These are consistent in my column, unlike the lengths.
select substring(col1,CHARINDEX('=',col1)+1,CHARINDEX (';',col1)-CHARINDEX ('=',col1)-1) Val1,
substring(col1,CHARINDEX('=',col1,CHARINDEX (';',col1))+1,LEN(col1)) Val2
from #data
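Some of the CHARINDEX calls are duplicated; the five calls per row could be reduced to three, but I would expect SQL Server to perform this simple optimization itself.
Note that the extracted values keep the spaces around them; wrap each expression in LTRIM(RTRIM(...)) if you need them trimmed.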
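The delimiter-based logic of the query can also be sketched in Python. This is an illustration rather than a translation: the function name `extract_values` is made up, and unlike the query it strips the surrounding spaces from both values.

```python
def extract_values(column_value):
    """Split on the first ';', then take the text after '=' in each half."""
    first_part, _, second_part = column_value.partition(";")
    # partition() returns (before, separator, after); index 2 is the text
    # after the first '='. strip() removes the surrounding spaces.
    val1 = first_part.partition("=")[2].strip()
    val2 = second_part.partition("=")[2].strip()
    return val1, val2


if __name__ == "__main__":
    row = "some title1 =ExtractThis ; Source Title12 = ExtractThis2"
    print(extract_values(row))
```

Splitting on the delimiters instead of fixed offsets is what makes the result independent of how long each title is.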