pgsql parse string to get a string after certain position - sql

I have a table column that has data like
NA_PTR_51000_LAT_CO-BOGOTA_S_A
NA_PTR_51000_LAT_COL_M_A
NA_PTR_51000_LAT_COL_S_A
NA_PTR_51000_LAT_COL_S_B
NA_PTR_51000_LAT_MX-MC_L_A
NA_PTR_51000_LAT_MX-MTY_M_A
I want to parse each column value so that I get the values in column_B. Thank you.
COLUMN_A COLUMN_B
NA_PTR_51000_LAT_CO-BOGOTA_S_A CO-BOGOTA
NA_PTR_51000_LAT_COL_M_A COL
NA_PTR_51000_LAT_COL_S_A COL
NA_PTR_51000_LAT_COL_S_B COL
NA_PTR_51000_LAT_MX-MC_L_A MX-MC
NA_PTR_51000_LAT_MX-MTY_M_A MX-MTY

I'm not sure of the Postgresql and I can't get SQL fiddle to accept the schema build...
substring and length may vary...
Select Column_A, substr(columN_A,18,length(columN_A)-17-4) from tableName
Ok how about this then:
http://sqlfiddle.com/#!15/ad0dd/56/0
Select column_A, b
from (
Select Column_A, b, row_number() OVER (ORDER BY column_A) AS k
FROM (
SELECT Column_A
, regexp_split_to_table(Column_A, '_') b
FROM test
) I
) X
Where k%7=5
Inside out:
Inner most select simply splits the data into multiple rows on _
middle select adds a row number so that we can use the use the mod operator to find all occurances of a 5th remainder.
This ASSUMES that the section of data you're after is always the 5th segment AND that there are always 7 segments...

Use regexp_matches() with a search pattern like 'NA_PTR_51000_LAT_(.+)_'
This should return everything after NA_PTR_51000_LAT_ before the next underscore, which would match the pattern you are looking for.

Related

Splitting the string into columns to extract values using BigQuery

How can i split the string by a specific character and extract the value of each. The idea is that i need to extract each word between the line including the start/end of the string as this information represents something. Is there a regex pattern ? or a way to split the info into columns ?
Name
A|B|C|D|E|F|G
Name col1 col2 col3 col4 col5 col6 col7
A|B|C|D|E|F|G A B C D E F G
I am using BigQuery for this and couldn't find a way to get the info of all of those. I tried the regex code which only works for the case where we have A|B|C.
I have to compare each column value and then create conditions using case when
CODE:
select
regexp_extract(name, "\\w+\\S(x|y)") as c2, -- gives either x or y
left(regexp_substr(name, "\\w+\\S\\w+\\S\\w+"),1) as c1,
right(regexp_extract(name, "\\w+\\S\\w+\\S\\w+"),1) as c3
from Table
Consider below approach
select * from (
select *
from your_table, unnest(split(name, '|')) value with offset
)
pivot(any_value(value) as col for offset in (0,1,2,3,4,5,6))
if applied to dummy data as in your question - output is
This seems like a use case for SPLIT().
select split(name,"|")[safe_offset(0)] as c1, split(name,"|")[safe_offset(1)] as c2, ..
from table
see https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions#split
Added use of safe_offset instead of offset per Array index 74 is out of bounds (overflow) google big query

SQL Select where column contains 1 value in set of words

I need a select which would return row if column A of that row contains any word from a list of words which get from user input
SELECT *
FROM MyTable
WHERE ColumnA CONTAINS ANY 'list of word'
Since the list of words has an unknown number of words, I store the whole list in the same string. each word can be separated with "_", "-" or white space.
You can try something like this if you are using oracle :
SELECT * FROM MyTable WHERE ColumnA in (select upper(regexp_substr('word1-
word2-word3','[^-]+',1,level)) from dual
connect by upper(regexp_substr('word1-word2-word3','[^-]+',1,level)) is
not null)
If you are using "_" then replace the hyphen with underscore is regexp_substr parameter.
I've came up with this solution:
SELECT *
from TableA tb
RIGHT JOIN STRING_SPLIT ( 'list of words' , 'seperator' ) v on tb.ColumnA = v.value
WHERE tb.ColumnA IS NOT NULL

Where x character equal value

How can I select records where in the column Value the 5th character is letter A?
For example the following records:
ID Value
-------------------------
1 1234A5636A6363
2 1234A4343B6363
3 1234B5353A6363
if I run
select * from table
where Value like '%A%'
this will return all records
but all I want is the first 2 where the 5th character is A, regardless if there are more A characters in the text or not
select *
from your_table
where substring(Value, 5, 1) = 'A'
The LIKE operator, in addition to %, which matches any number of any character, can use _, which matches any one single character. You may try:
SELECT *
FROM yourTable
WHERE Value LIKE '____A%'; -- 4 underscores here
use like below by using _(underscore)
LIKE '____A%'
SQL Server
select *
from YourTableName
where CHARINDEX('A', ColumnName) = 5
Note:- This finds where string 'A' starts at position 5
AND specify Your ColumnName

SQL Strings - Filter by Hypen(x number)

I am trying to formulate a query that will allow me to find all records from a single column with 3 hyphens. An example of a record would be like XXXX-RP-XXXAS1-P.
I need to be able to sort through 1000s of records with either 2 or 3 hyphens.
You can REPLACE the hyphens in the string with an empty string and compute the difference of the length of original string and the replaced string to check for the number of hyphens.
select *
from yourtable
where len(column_name)-len(replace(column_name,'-',''))=3
and substring(column_name,9,1) not like '%[0-9]%'
If your records have 2 or 3 hyphens, then just do:
where col like '%-%-%-%'
This will get 3 or more hyphens. For exactly 3:
where col like '%-%-%-%' and col not like '%-%-%-%-%'
try this,
declare #t table(col1 varchar(50))
insert into #t values ('A-B'),('A-B-C-D-E'),('A-B-C-D')
select * from
(SELECT *
,(len(col1) - len(replace(col1, '-', ''))
/ len('-')) col2
FROM #T)t4
where col2=3

Get group maxima from combined strings

I have a table with a column code containing multiple pieces of data like this:
001/2017/TT/000001
001/2017/TT/000002
001/2017/TN/000003
001/2017/TN/000001
001/2017/TN/000002
001/2016/TT/000001
001/2016/TT/000002
001/2016/TT/000001
002/2016/TT/000002
There are 4 items in 001/2016/TT/000001: 001, 2016, TT and 000001.
How can I extract the max for every group formed by the first 3 items? The result I want is this:
001/2017/TT/000003
001/2017/TN/000002
001/2016/TT/000002
002/2016/TT/000002
Edit
The subfield separator is /, and the length of subfields can vary.
I use PostgreSQL 9.3.
Obviously, you should normalize the table and split the combined string into 4 columns with proper data type. The function split_part() is the tool of choice if the separator '/' is constant in your string and the length of can vary.
CREATE TABLE tbl_better AS
SELECT split_part(code, '/', 1)::int AS col_1 -- better names?
, split_part(code, '/', 2)::int AS col_2
, split_part(code, '/', 3) AS col_3 -- text?
, split_part(code, '/', 4)::int AS col_4
FROM tbl_bad
ORDER BY 1,2,3,4 -- optionally cluster data.
Then the task is trivial:
SELECT col_1, col_2, col_3, max(col_4) AS max_nr
FROM tbl_better
GROUP BY 1, 2, 3;
Related:
Split comma separated column data into additional columns
Of course, you can do it on the fly, too. For varying subfield length you could use substring() with a regular expression like this:
SELECT max(substring(code, '([^/]*)$')) AS max_nr
FROM tbl_bad
GROUP BY substring(code, '^(.*)/');
Related (with basic explanation for regexp pattern):
Filter strings with regex before casting to numeric
Or to get only the complete string as result:
SELECT DISTINCT ON (substring(code, '^(.*)/'))
code
FROM tbl_bad
ORDER BY substring(code, '^(.*)/'), code DESC;
About DISTINCT ON:
Select first row in each GROUP BY group?
Be aware that data items cast to a suitable type may behave differently from their string representation. The max of 900001 and 1000001 is 900001 for text and 1000001 for integer ...
Use the LEFT and RIGHT functions.
SELECT MAX(RIGHT(code,6)) AS MAX_CODE
FROM yourtable
GROUP BY LEFT(code,12)
check this out, possible helpfull
select
distinct on (tab[4],tab[2]) tab[4],tab[3],tab[2],tab[1]
from
(
select
string_to_array(exe.x,'/') as tab,
exe.x
from
(
select
unnest
(
array
['001/2017/TT/000001',
'001/2017/TT/000002',
'001/2017/TN/000003',
'001/2017/TN/000001',
'001/2017/TN/000002',
'001/2016/TT/000001',
'001/2016/TT/000002',
'001/2016/TT/000001',
'002/2016/TT/000002']
) as x
) exe
) exe2
order by tab[4] desc,tab[2] desc,tab[3] desc;