How to grab certain value found in a string column? - sql

I have a column that contains several different values. This is not in JSON format. It is a string that is separated into different sections. I need to grab everything that is found under ID only.
In the examples below, I only want to grab the words "syntax" and "village".
select value.id
from TBL_A
The above does not work since the column is not JSON.
Does anyone know how to grab the full word that is found under the "id" section in that string column?

Even though it's a string, since it's in properly formatted JSON you can convert the string to a JSON variant like this:
select parse_json(VALUE);
You can then access its properties using standard colon and dot notations:
select parse_json(VALUE):id::string
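To check the idea outside the database, here is a minimal Python sketch of the same "parse the string as JSON, then read id" logic. The sample strings and the extract_id helper are assumptions for illustration, since the question's example values are not shown; only the words "syntax" and "village" come from the question.

```python
import json

def extract_id(value):
    """Return the "id" property of a JSON-formatted string, or None.

    Hypothetical helper: mirrors parse_json(VALUE):id::string in spirit only.
    """
    try:
        parsed = json.loads(value)
    except json.JSONDecodeError:
        # The string is not valid JSON, so there is nothing to extract.
        return None
    if isinstance(parsed, dict):
        return parsed.get("id")
    return None

# Assumed sample values; the real column contents are not in the question.
rows = [
    '{"id": "syntax", "lang": "en"}',
    '{"id": "village", "lang": "en"}',
    "not json at all",
]
ids = [extract_id(r) for r in rows]
print(ids)
```

If some rows come back as None here, those are the rows where the JSON approach would not apply and a string-splitting approach might be needed instead.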

I would go with Greg's option of treating it as JSON, because it sure looks like JSON. But if you know that in some situations it is definitely not JSON, you could use SPLIT_TO_TABLE and TRIM, provided you know that a comma never appears inside any of the strings:
SELECT t.value,
TRIM(t.value,'{}') as trim_value,
s.value as s_value,
split_part(s_value, ':', 1) as token,
split_part(s_value, ':', 2) as val
FROM TBL_A t
,LATERAL SPLIT_TO_TABLE(trim_value, ',') s
This can be compacted and filtered with QUALIFY to get just the rows you want:
SELECT
SPLIT_PART(s.value, ':', 2) AS val
FROM TBL_A t
,LATERAL SPLIT_TO_TABLE(TRIM(t.value,'{}'), ',') s
QUALIFY SPLIT_PART(s.value, ':', 1) = 'id'
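The trim, split on commas, split on colons steps can be sketched in Python to see what they do to a single value. This is only an illustration under assumptions: extract_field is a hypothetical helper, the sample strings are made up, and unlike the SQL above it also strips whitespace and double quotes around each token. It shares the same limitation: it breaks if a comma or colon appears inside a value.

```python
def extract_field(value, key="id"):
    """Return the value paired with `key` in a '{k:v,k:v}' style string."""
    # Equivalent of TRIM(value, '{}'): drop the surrounding braces.
    body = value.strip().strip("{}")
    # Equivalent of SPLIT_TO_TABLE(..., ','): one piece per key/value pair.
    for pair in body.split(","):
        # Equivalent of SPLIT_PART(pair, ':', 1) and SPLIT_PART(pair, ':', 2).
        token, _, val = pair.partition(":")
        # Extra step not in the SQL: ignore whitespace and double quotes.
        if token.strip().strip('"') == key:
            return val.strip().strip('"')
    return None

print(extract_field("{id:syntax,type:noun}"))
print(extract_field('{"id": "village", "count": 3}'))
```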

Related

Finding Unicode Characters in String Field with Regex in Oracle SQL

I have a string field (comments) that contains a user id such as 'THOMASAN'. However, the string field is dynamic and can have a plethora of things written in it. But it always has the pattern 'UserID'. I am trying to use the REGEXP_SUBSTR function in Oracle SQL to pull the name out.
I have tried REGEXP_SUBSTR(comments,'[A-Z]*') but it brings back null. In a string field how do I pull out this userid?
UPDATE:
For the specific Unicode character you mentioned:
with cte as ( SELECT ' the left padding thomsan the right padding' comments FROM dual),
cte2 as (select ASCIISTR(upper(comments)) cmt from cte)
SELECT replace(regexp_substr( cmt, '\F7FD[A-Z]+', 1), 'F7FD','') userid from cte2;

How to read &-separated data in a single column

Field Description:
User_id Unique identifier of every user following these creators
Creator_id List of creator ids separated by ‘&’
User_id,Creator_IDs
U100,A300&A301&A302
U101,A301&A302
U102,A302
U103,A303&A301&A302
U104,A304&A301
U105,A305&A301&A302
U106,A301&A302
U107,A302
Note: I have to remove the U and A prefixes from the values. I thought I could use substring for U, but what can I do for A, since the number of creator ids varies?
Moreover, going forward I have to use this data to get the distinct creator_id values and the users following each of them.
You could try using regexp_replace, e.g.:
select regexp_replace(User_id, "^U", "")
, regexp_replace(regexp_replace(Creator_IDs, "A", ""), '&', ',')
You can use the REPLACE function to remove A from the string. The call would be something like this -
SELECT REPLACE('A300&A301&A302', 'A','') AS NewString;
For the entire query -
select concat (REPLACE('U100', 'U',''),',',REPLACE('A300&A301&A302', 'A',''));
You can use this to see how it works. For your query of course you have to use the column names -
select concat (REPLACE(user_id, 'U',''),',',REPLACE(Creator_Id, 'A',''));
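For the follow-up need (distinct creator ids and the users following each), here is a rough Python sketch of the logic, using the sample rows from the question. The helper names strip_prefix and followers_by_creator are made up for illustration; in SQL the same result would need a split or explode step in whatever dialect is being used.

```python
from collections import defaultdict

# Sample rows copied from the question.
ROWS = [
    "U100,A300&A301&A302",
    "U101,A301&A302",
    "U102,A302",
    "U103,A303&A301&A302",
    "U104,A304&A301",
    "U105,A305&A301&A302",
    "U106,A301&A302",
    "U107,A302",
]

def strip_prefix(text, prefix):
    """Remove one leading prefix (like regexp_replace(col, '^U', ''))."""
    return text[len(prefix):] if text.startswith(prefix) else text

def followers_by_creator(rows):
    """Map each distinct creator id to the list of users following it."""
    followers = defaultdict(list)
    for row in rows:
        user_id, creator_ids = row.split(",", 1)
        user = strip_prefix(user_id, "U")
        # Split the '&'-separated list, then drop the 'A' from each id.
        for creator in creator_ids.split("&"):
            followers[strip_prefix(creator, "A")].append(user)
    return dict(followers)

followers = followers_by_creator(ROWS)
for creator in sorted(followers):
    print(creator, followers[creator])
```

Stripping only the leading character (rather than every A or U in the string) is a deliberate choice here, so ids that happen to contain those letters elsewhere would be left intact.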

Extracting values between two delimiters for an entire column in SQL

I am looking for a way to extract all substrings between two delimiters over an entire column. I have found ways to do this for each string separately, but I need something I can apply over the entire column.
For example if I have a column called "NAMES" that contains the below values:
1235_brandon_098410090
1242353_sam_1920420101222
134214_kristein_39402384
I want my output to be
brandon
sam
kristein
How do I do this?
I've tried this:
regex_substr(names,'_(.*?)_')
Query Error: error: function regex_substr(character varying, unknown) does not exist
In MySQL, I think you can use substring_index():
select substring_index(substring_index(names, '_', 2), '_', -1)
This extracts the second value delimited by underscores, which is what all the sample data suggests is needed.
EDIT:
Your error message looks like Postgres. This is the equivalent in Postgres:
select v.*, split_part(names, '_', 2)
from (values ('1242353_sam_1920420101222')) v(names);
In Postgres, you can use substring() with a pattern as well:
select v.*, substring(names from '[A-Za-z]+')
from (values ('1242353_sam_1920420101222')) v(names);
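To see how the two Postgres approaches differ, here is a small Python sketch run against the sample values from the question. second_field imitates split_part(names, '_', 2) and first_letters imitates substring(names from '[A-Za-z]+'); both function names are hypothetical. They agree on this data, but would diverge if the first segment ever contained letters.

```python
import re

NAMES = [
    "1235_brandon_098410090",
    "1242353_sam_1920420101222",
    "134214_kristein_39402384",
]

def second_field(name):
    """Return the second underscore-delimited field, or '' if missing."""
    parts = name.split("_")
    return parts[1] if len(parts) > 1 else ""

def first_letters(name):
    """Return the first run of ASCII letters, or None if there is none."""
    match = re.search(r"[A-Za-z]+", name)
    return match.group(0) if match else None

by_split = [second_field(n) for n in NAMES]
by_regex = [first_letters(n) for n in NAMES]
print(by_split)
print(by_regex)
```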

SQL / REGEX pattern matching

I want to use regex through SQL to query some data and return values. Of the values below, the only valid values returned would be "GB" and "LDN", or possibly "GB-LDN":
G-GB-LDN-TT-TEST
G-GB-LDNN-TT-TEST
G-GBS-LDN-TT-TEST
As written, the first set (GB) needs to have exactly 2 characters, and the second (LDN) needs to have exactly 3 characters. Both sets/groups are separated by a - symbol. I need to extract the data but at the same time ensure it matches that pattern. I took a look at regex but I can't see how to do it; it's like substring, but I can't work it out.
If I understand correctly, you could still use the substring() function to extract the string parts separated by -.
select left(parsename(a.string, 3), 2) +'-'+ left(parsename(a.string, 2) ,3) from
(
select replace(substring(data, 1, len(data)-charindex('-', reverse(data))), '-', '.') [string] from <table>
) a
As shown above, you can also define the length of each extracted part.
Result :
GB-LDN
GB-LDN
GB-LDN
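Note that the query above truncates each part to the desired length, which is why all three rows come back as GB-LDN. If the intent is the stricter reading of the question (only the first row is valid), a regular expression can extract and validate at the same time. Here is a minimal Python sketch of such a pattern; it assumes uppercase letters only and an arbitrary first segment, neither of which the question states explicitly, and extract_codes is a hypothetical name.

```python
import re

# First segment (anything without '-'), then exactly 2 letters,
# then exactly 3 letters, each followed by a '-' separator.
PATTERN = re.compile(r"^[^-]+-([A-Z]{2})-([A-Z]{3})-")

def extract_codes(value):
    """Return 'XX-YYY' if the value fits the pattern, else None."""
    match = PATTERN.match(value)
    if match is None:
        return None
    return match.group(1) + "-" + match.group(2)

samples = ["G-GB-LDN-TT-TEST", "G-GB-LDNN-TT-TEST", "G-GBS-LDN-TT-TEST"]
results = [extract_codes(s) for s in samples]
print(results)
```

The trailing - after each group is what rejects LDNN and GBS: the fixed-length group must be followed immediately by the separator.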

Text to List in SQL

Is there any way to convert a comma-separated text value to a list so that I can use it with 'IN' in SQL? I am using PostgreSQL for this.
Ex.:
select location from tbl where
location in (replace(replace(replace('[Location].[SG],[Location].[PH]', ',[Location].[', ','''), '[Location].[', ''''), ']',''''))
This query:
select (replace(replace(replace('[Location].[SG],[Location].[PH]', ',[Location].[', ','''), '[Location].[', ''''), ']',''''))
produces 'SG','PH'
I wanted to produce this query:
select location from tbl where location in ('SG','PH')
Nothing returned when I executed the first query. The table has been filled with location values 'SG' and 'PH'.
Can anyone help me on how to make this work without using PL/pgSQL?
So you're faced with a friendly and easy to use tool that won't let you get any work done, I feel your pain.
A slight modification of what you have combined with string_to_array should be able to get the job done.
First we'll replace your nested replace calls with slightly nicer replace calls:
=> select replace(replace(replace('[Location].[SG],[Location].[PH]', '[Location].', ''), '[', ''), ']', '');
replace
---------
SG,PH
So we strip out the [Location]. noise and then strip out the leftover brackets to get a comma delimited list of the two-character location codes you're after. There are other ways to get the SG,PH using PostgreSQL's other string and regex functions but replace(replace(replace(... will do fine for strings with your specific structure.
Then we can split that CSV into an array using string_to_array:
=> select string_to_array(replace(replace(replace('[Location].[SG],[Location].[PH]', '[Location].', ''), '[', ''), ']', ''), ',');
string_to_array
-----------------
{SG,PH}
to give us an array of location codes. Now that we have an array, we can use = ANY instead of IN to look inside an array:
=> select 'SG' = any (string_to_array(replace(replace(replace('[Location].[SG],[Location].[PH]', '[Location].', ''), '[', ''), ']', ''), ','));
?column?
----------
t
That t is a boolean TRUE BTW; if you said 'XX' = any (...) you'd get an f (i.e. FALSE) instead.
Putting all that together gives you a final query structured like this:
select location
from tbl
where location = any (string_to_array(...))
You can fill in the ... with the nested replace nastiness on your own.
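To make the difference concrete, here is a small Python sketch of both behaviours: the original nested replace produces one string that merely looks like a list, while the strip-then-split approach produces real separate elements to test membership against. The function names are hypothetical and the code only imitates the SQL string handling.

```python
SPEC = "[Location].[SG],[Location].[PH]"

def original_replace(spec):
    """Imitate the question's nested replace: yields ONE string."""
    return (spec.replace(",[Location].[", ",'")
                .replace("[Location].[", "'")
                .replace("]", "'"))

def parse_locations(spec):
    """Imitate string_to_array(replace(replace(replace(...))), ',')."""
    cleaned = spec.replace("[Location].", "").replace("[", "").replace("]", "")
    return cleaned.split(",")

single_string = original_replace(SPEC)
codes = parse_locations(SPEC)

print(single_string)            # one string, quotes and comma included
print("SG" == single_string)    # comparing against it never matches 'SG'
print(codes)                    # separate elements
print("SG" in codes)            # membership test, like = ANY (array)
```

This is why the first query returned nothing: IN received a single text value, so location was compared against the whole quoted string rather than against each code.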
Assuming we are dealing with a comma-separated list of elements in the form [Location].[XX],
I would expect this construct to perform best:
SELECT location
FROM tbl
JOIN (
SELECT substring(unnest(string_to_array('[Location].[SG],[Location].[PH]'::text, ',')), 13, 2) AS location
) t USING (location);
Step-by-step
Transform the comma-separated list into an array and split it to a table with unnest(string_to_array()).
You could do the same with regexp_split_to_table(). Slightly shorter but more expensive.
Extract the XX part with substring(). Very simple and fast.
JOIN to tbl instead of the IN expression. That's faster - and equivalent while there are no duplicates on either side.
I assign the same column alias location to enable an equijoin with USING.
Directly using location in ('something') works.
I have created a fiddle that uses an IN clause on a VARCHAR column:
http://sqlfiddle.com/#!12/cdf915/1