Get an element in JSON using PostgreSQL

I have this JSON (taken from a JSON dump) and I want to extract this part in SQL: '1000000007296871'
{"pixel_rule":"{\"and\":[{\"event\":{\"eq\":\"Purchase\"}},{\"or\":[{\"content_ids\":{\"i_contains\":\"1000000007296871\"}}]}]}"}
How can I do that?

Well, you can do this like so:
SELECT
(
(
(
(
(
'{"pixel_rule":"{\"and\":[{\"event\":{\"eq\":\"Purchase\"}},{\"or\":[{\"content_ids\":{\"i_contains\":\"1000000007296871\"}}]}]}"}'::json->>'pixel_rule'
)::json->>'and'
)::json->1
)::json->>'or'
)::json->0->>'content_ids'
)::json->>'i_contains';
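Note that your input is a bit unusual: the value of pixel_rule is itself a JSON document stored as a string, so it has to be cast back to json before you can navigate into it. The remaining casts are needed because ->> returns text.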
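To see why the value has to be decoded twice, here is a minimal Python sketch that mirrors the same steps outside the database. The variable names are made up for illustration; only the input string comes from the question.

```python
import json

# The outer document from the question; pixel_rule holds JSON encoded as a string.
raw = r'{"pixel_rule":"{\"and\":[{\"event\":{\"eq\":\"Purchase\"}},{\"or\":[{\"content_ids\":{\"i_contains\":\"1000000007296871\"}}]}]}"}'

outer = json.loads(raw)                 # first decode: the outer object
rule = json.loads(outer["pixel_rule"])  # second decode: the embedded JSON string

# Same path as the SQL: and -> element 1 -> or -> element 0 -> content_ids -> i_contains
content_id = rule["and"][1]["or"][0]["content_ids"]["i_contains"]
print(content_id)
```

The second json.loads plays the role of the first ::json cast after ->>'pixel_rule' in the query.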

One option is to flatten the object: trim the wrapping quotes from the pixel_rule value, remove the backslashes that escape the inner quotes, and then apply json_array_elements twice so that the #>> operator can extract the desired value by path. Note that removing every backslash is only safe when the inner values contain no escaped characters of their own:
SELECT json_array_elements
(
json_array_elements(
replace(trim((jsdata #>'{pixel_rule}')::text,'"'),'\','')::json -> 'and'
) -> 'or'
)::json #>> '{content_ids,i_contains}' AS "Value"
FROM tab

Another option is a regular expression. If no other part of the string contains digits, '\d+' matches the ID:
select substring('string' from '\d+')
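As a rough illustration of what the '\d+' pattern picks up, here is a small Python sketch. The helper name first_digit_run is hypothetical, and the approach only works under the assumption stated above: no other digits appear earlier in the string.

```python
import re

def first_digit_run(text):
    """Return the first run of consecutive digits in text, or None if there is none."""
    match = re.search(r"\d+", text)
    return match.group(0) if match else None

# The JSON text from the question, treated as a plain string.
raw = r'{"pixel_rule":"{\"and\":[{\"event\":{\"eq\":\"Purchase\"}},{\"or\":[{\"content_ids\":{\"i_contains\":\"1000000007296871\"}}]}]}"}'
print(first_digit_run(raw))
```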

Related

How to use regexp_replace in hive to remove strings

I have a table as:
column1 -> 101#1,102#2,103#3,104#4
I am trying to remove strings (101#,102#,103#,104#). The expected output is
column2 -> 1,2,3,4
I am trying to do this using regexp_replace.
Any help would be highly appreciated.
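Since regexp_replace replaces every match, a single call such as regexp_replace(column1, '[^,]*#', '') should also work. Alternatively, you can break the string into an array, transform each element (run a function on it), and finally concatenate the array back into a string. Note that transform with a lambda is a higher-order function from Spark SQL, so it may not be available in plain Hive: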
select concat_ws( ',' , transform ( split('101#1,102#2,103#3,104#4',','), x -> regexp_replace( x, '.*#','' )))
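The split/transform/join idea can be checked quickly outside the database. The following Python sketch compares it with a single global replacement; the variable names are illustrative, and Python's re module stands in for the SQL regex functions.

```python
import re

value = "101#1,102#2,103#3,104#4"

# Approach 1: split on commas, strip everything up to '#' in each element, join again.
per_element = ",".join(re.sub(r".*#", "", part) for part in value.split(","))

# Approach 2: one global replacement; [^,]* cannot cross a comma,
# so each match stays inside a single element.
single_pass = re.sub(r"[^,]*#", "", value)

print(per_element, single_pass)
```

Both approaches give the same string for this input, so the simpler one is usually enough.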

Snowflake; convert strings to an array

Using snowflake, I have a column named 'column_1'. The datatype is TEXT.
An example of a value in this column is here:
["Apple", "Pear","Chicken"]
I say:
select to_array(column_1) from fake_table; and I get:
[ "[\"Apple\",\"Pear\",\"Chicken\"]" ]
So it wrapped my text in an array instead of parsing it. But I want to convert the datatype. It seems like it should be simple.
I try strtok_to_array(column_1, ',') and get the same situation.
How can snowflake convert strings to an array?
Using PARSE_JSON:
SELECT PARSE_JSON('["Apple", "Pear","Chicken"]')::ARRAY;
DESC RESULT LAST_QUERY_ID();
Since that's valid JSON, you can use the PARSE_JSON function:
select parse_json('["Apple", "Pear","Chicken"]');
select parse_json('["Apple", "Pear","Chicken"]')[0]; -- Get first one
select parse_json('["Apple", "Pear","Chicken"]')[0]::string; -- Cast to string
I'd say parse_json is the way to go, but if you're concerned that some values might not be valid JSON, you could strip the double quotes and square brackets and split the resulting comma-separated string into an array:
select split(translate(col,$$"[]$$,''),',')
Note: enclosing the string in $$ makes it easier to include quotes and other special characters without escaping them.
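The "parse as JSON, fall back to stripping and splitting" logic can be sketched in Python as follows. The function name text_to_list is hypothetical, and unlike the SQL above this sketch also trims whitespace around each element.

```python
import json

def text_to_list(text):
    """Parse text as a JSON array; if that fails, strip quotes/brackets and split on commas."""
    try:
        parsed = json.loads(text)
        if isinstance(parsed, list):
            return parsed
    except json.JSONDecodeError:
        pass
    # Fallback: remove the characters " [ ] and split the rest on commas.
    cleaned = text.translate(str.maketrans("", "", '"[]'))
    return [item.strip() for item in cleaned.split(",")]

print(text_to_list('["Apple", "Pear","Chicken"]'))
print(text_to_list('[Apple, Pear,Chicken'))  # not valid JSON, handled by the fallback
```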

How to grab certain value found in a string column?

I have a column that contains several different values. This is not in JSON format. It is a string that is separated into different sections. I need to grab everything that is found under ID only.
In the examples below, I only want to grab the word: "syntax" and "village"
select value.id
from TBL_A
The above does not work since this is not a json.
Does anyone know how to grab the full word that is found under the "id" section in that string column?
Even though it's a string, since it's in properly formatted JSON you can convert the string to a JSON variant like this:
select parse_json(VALUE);
You can then access its properties using standard colon and dot notations:
select parse_json(VALUE):id::string
I would go with Greg's option and treat it as JSON, because it certainly looks like JSON. But if you know that in some situations it is definitely not JSON, you could use SPLIT_TO_TABLE and TRIM, provided a comma never appears inside any of the strings:
SELECT t.value,
TRIM(t.value,'{}') as trim_value,
s.value as s_value,
split_part(s_value, ':', 1) as token,
split_part(s_value, ':', 2) as val
FROM TBL_A t
,LATERAL SPLIT_TO_TABLE(TRIM(t.value,'{}'), ',') s
This can be compacted and filtered with WHERE to return just the rows you want (QUALIFY is meant for filtering on window functions):
SELECT
SPLIT_PART(s.value, ':', 2) AS val
FROM TBL_A t
,LATERAL SPLIT_TO_TABLE(TRIM(t.value,'{}'), ',') s
WHERE SPLIT_PART(s.value, ':', 1) = 'id'
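The same trim, split, and filter steps can be sketched in Python. The sample string below is hypothetical, since the question's example rows are not shown, and value_for_key is an illustrative helper name. As above, it assumes that commas never appear inside the values.

```python
def value_for_key(text, key):
    """Return the value paired with key in a '{k:v,k:v}' style string, or None."""
    body = text.strip("{}")  # drop leading/trailing braces, like TRIM(value, '{}')
    for pair in body.split(","):
        token, _, val = pair.partition(":")  # split on the first ':' only
        if token.strip() == key:
            return val.strip()
    return None

sample = "{id:syntax,type:noun}"  # hypothetical input, not taken from the question
print(value_for_key(sample, "id"))
```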

How to get the nth match from regexp_matches() as plain text

I have this code:
with demo as (
select 'WWW.HELLO.COM' web
union all
select 'hi.co.uk' web)
select regexp_matches(replace(lower(web),'www.',''),'([^\.]*)') from demo
And the table I get is:
regexp_matches
{hello}
{hi}
What I would like to do is:
with demo as (
select 'WWW.HELLO.COM' web
union all
select 'hi.co.uk' web)
select regexp_matches(replace(lower(web),'www.',''),'([^\.]*)')[1] from demo
Or even the BigQuery version:
with demo as (
select 'WWW.HELLO.COM' web
union all
select 'hi.co.uk' web)
select regexp_matches(replace(lower(web),'www.',''),'([^\.]*)')[offset(1)] from demo
But neither works. Is this possible? If it isn't clear, the result I would like is:
match
hello
hi
Use split_part() instead. Simpler, faster. To get the first word, before the first separator .:
WITH demo(web) AS (
VALUES
('WWW.HELLO.COM')
, ('hi.co.uk')
)
SELECT split_part(replace(lower(web), 'www.', ''), '.', 1)
FROM demo;
See:
Split comma separated column data into additional columns
regexp_matches() returns setof text[], i.e. 0-n rows of text arrays. (Because each regular expression can result in a set of multiple matching strings.)
In Postgres 10 or later, there is also the simpler variant regexp_match() that only returns the first match, i.e. text[]. Either way, the surrounding curly braces in your result are the text representation of the array literal.
You can take the first row and unnest the first element of the array, but since you neither want the set nor the array to begin with, use split_part() instead. Simpler, faster, and less versatile. But good enough for the purpose. And it returns exactly what you want to begin with: text.
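To make the comparison concrete, here is a Python sketch of both ideas: splitting on the separator versus matching with a regular expression. The function names are illustrative, and Python's str.split and re.match stand in for split_part() and regexp_match().

```python
import re

def first_label(web):
    """Lower-case, drop 'www.', and take the part before the first dot (split approach)."""
    host = web.lower().replace("www.", "")
    return host.split(".")[0]

def first_label_regex(web):
    """Same result via a regex: match everything up to the first dot."""
    host = web.lower().replace("www.", "")
    return re.match(r"[^.]*", host).group(0)

for web in ("WWW.HELLO.COM", "hi.co.uk"):
    print(first_label(web), first_label_regex(web))
```

Both return a plain string, which is what the split-based version gives you directly without unwrapping an array.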
I'm a little confused. Doesn't this do what you want?
with demo as (
select 'WWW.HELLO.COM' web
union all
select 'hi.co.uk' web
)
select (regexp_matches(replace(lower(web), 'www.',''), '([^\.]*)'))[1]
from demo
This is basically your query with extra parentheses so it does not generate a syntax error.
It returns the result you want.

Get everything between two strings using regexp_substr

I would like to write a query that gets everything between two strings
So for example getting everything between utm_source and the '&' sign. This is what I have tried:
select regexp_substr(full_utm,'%utm_source%','%&%') from db
However this is invalid syntax
Here is a sample of what I am trying to extract
?utm_source=Facebook&utm_medium=CPC&utm_campaign=April+LAL+-+All+SA+-+CAP+250&utm_content=01noprice
I have also tried this
regexp_substr(full_utm, 'utm_source=(.*)&',1)
but this returns this:
utm_source=Facebook&utm_medium=CPC&utm_campaign=April+LAL+-+All+SA+-+CAP+250&
I've also tried using split_part:
select split_part(split_part(full_utm,'%utm_source=%',1),'&',1)
The problem is this returns both sources and campaign (e.g utm_campaign=xyz)
You can use regexp_replace() instead. Use [^&]* for the capture group rather than .*, which is greedy and would run on to the last '&':
select regexp_replace(full_utm, '.*utm_source=([^&]*).*', '\1')
from (select '?utm_source=Facebook&utm_medium=CPC&utm_campaign=April+LAL+-+All+SA+-+CAP+250&utm_content=01noprice' as full_utm from dual) x;
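A quick way to check how the capture group behaves is to try both forms on the sample string. This Python sketch uses re.sub as a stand-in for regexp_replace(); the variable names are illustrative.

```python
import re

full_utm = ("?utm_source=Facebook&utm_medium=CPC"
            "&utm_campaign=April+LAL+-+All+SA+-+CAP+250&utm_content=01noprice")

# Greedy capture: (.*)& extends to the LAST '&' in the string.
greedy = re.sub(r".*utm_source=(.*)&.*", r"\1", full_utm)

# Bounded capture: [^&]* stops at the FIRST '&' after utm_source=.
bounded = re.sub(r".*utm_source=([^&]*).*", r"\1", full_utm)

print(greedy)
print(bounded)
```

The bounded form also works when utm_source is the last parameter, because it does not require a trailing '&'.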