postgresql: automated extracting strings from text - sql

I have the following table in a PostgreSQL database:
id | species
----+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 |[{"id":1,"animalName":"Lupo appennico","animalCode":"LUPO"},{"id":2,"animalName":"Orso bruno marsicano","animalCode":"ORSO"},{"id":3,"animalName":"Volpe","animalCode":"VOLPE"}]
2 |[{"id":1,"animalName":"Cinghiale","animalCode":"CINGHIALE"},{"id":2,"animalName":"Orso bruno marsicano","animalCode":"ORSO"},{"id":3,"animalName":"Cervo","animalCode":"CERVO"}]
I would like to extract only values after '"animalName":' and put them in a new field.
id | new_field
----+--------------------------------------------
1 | Lupo appennico, Orso bruno marsicano, Volpe
2 | Cinghiale, Orso bruno marsicano, Cervo
Unfortunately the field is a text type (not json or array). I've tried with regexp without success.

Your column is not of a json datatype, but it seems to contain valid json. If so, you can cast it and use json functions on it:
select id, string_agg(j.obj ->> 'animalName', ', ') new_field
from mytable t
cross join lateral jsonb_array_elements(t.species::jsonb) j(obj)
group by id
order by id
Demo on DB Fiddle:
id | new_field
-: | :------------------------------------------
1 | Lupo appennico, Orso bruno marsicano, Volpe
2 | Cinghiale, Orso bruno marsicano, Cervo
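
To sanity-check the expected result outside the database, the same extraction can be sketched in Python. This is only an illustration of the logic (parse the text as JSON, then join every animalName), not what PostgreSQL does internally. The helper name `animal_names` is made up for this example, and the sample string is row 1 from the question.

```python
import json

def animal_names(species_text: str) -> str:
    """Parse the JSON text and join every animalName with ', '."""
    items = json.loads(species_text)  # text -> list of dicts, comparable to species::jsonb
    return ", ".join(item["animalName"] for item in items)

# Row 1 of the question's table, stored as plain text
row_1 = (
    '[{"id":1,"animalName":"Lupo appennico","animalCode":"LUPO"},'
    '{"id":2,"animalName":"Orso bruno marsicano","animalCode":"ORSO"},'
    '{"id":3,"animalName":"Volpe","animalCode":"VOLPE"}]'
)

print(animal_names(row_1))
```

If `json.loads` raises on a value, the cast to jsonb in the query would most likely fail on that row too, so this can help spot malformed rows before running the query.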

Related

extract json in column

I have a table
id | status | outgoing
-------------------------
1 | paid | {"a945248027_14454878":"processing","old.a945248027_14454878":"cancelled"}
2 | pending| {"069e5248cf_45299995":"processing"}
I am trying to extract the values after each underscore in the outgoing column, e.g. from a945248027_14454878 I want 14454878.
Because the json data is not standardised I can't seem to figure it out.
You can extract the part of each json key after the underscore using the regexp version of substring:
select id, status, outgoing,
substring(key from '_([^_]+)$') as key
from the_table, lateral jsonb_object_keys(outgoing) as j(key);
See demo.
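
The pattern itself can be tried outside the database. Below is a minimal Python sketch of what `'_([^_]+)$'` captures: everything after the last underscore of each key. The function name `suffix_after_underscore` is hypothetical, and the dictionary is row 1 of the question.

```python
import re

def suffix_after_underscore(key: str):
    """Return the text after the last underscore, or None if nothing matches."""
    match = re.search(r"_([^_]+)$", key)
    return match.group(1) if match else None

# Row 1 of the question: the keys carry the value we want
outgoing = {
    "a945248027_14454878": "processing",
    "old.a945248027_14454878": "cancelled",
}

suffixes = [suffix_after_underscore(key) for key in outgoing]
print(suffixes)
```

Keys without an underscore give None here, which corresponds to the NULL that PostgreSQL's substring returns when the pattern does not match.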

Return only ALL CAPS strings in BigQuery

Pretty simple question, specific to BigQuery. I'm sure there's a command I'm missing. I'm used to using "collate" in another query which doesn't work here.
email
| -------- |
| eric#email.com |
| JOHN#EMAIL.COM |
| STACY#EMAIL.COM |
| tanya#email.com |
Desired return:
JOHN#EMAIL.COM,STACY#EMAIL.COM
Consider below
select *
from your_table
where upper(email) = email
Applied to the sample data in your question, this returns the two all-caps rows, JOHN#EMAIL.COM and STACY#EMAIL.COM.
If you want the output as a comma-separated list, use:
select string_agg(email) emails
from your_table
where upper(email) = email
This returns both addresses as a single comma-separated string.
For testing, you can prepend the CTE below (the exact sample data from your question) to either query:
with your_table as (
select 'eric#email.com' email union all
select 'JOHN#EMAIL.COM' union all
select 'STACY#EMAIL.COM' union all
select 'tanya#email.com'
)
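
One thing worth knowing about the `upper(email) = email` test is that it also accepts strings with no letters at all, since uppercasing leaves them unchanged. The Python sketch below mirrors the comparison on the sample data. It only illustrates the logic and is not BigQuery code, and the name `is_all_caps` is made up for this example.

```python
def is_all_caps(value: str) -> bool:
    """Same idea as upper(email) = email: uppercasing changes nothing."""
    return value.upper() == value

emails = ["eric#email.com", "JOHN#EMAIL.COM", "STACY#EMAIL.COM", "tanya#email.com"]

all_caps = [email for email in emails if is_all_caps(email)]
print(",".join(all_caps))

# A value without any letters passes the same test
print(is_all_caps("12345"))
```

If digit-only or punctuation-only values should be excluded, an extra condition that requires at least one letter would be needed.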

SQL LIKE using the same row value

I'm wondering how I can use a row value as a variable in my LIKE statement. For example:
ID | PID | DESCRIPTION
1 | 4124 | Hi4124
2 | 2451 | Test
3 | 1467 | Hello
4 | 9642 | Me9642
I have a table above, I want to return IDs 1 and 4 since DESCRIPTION contains PID.
I'm thinking it would be SELECT * from TABLE WHERE DESCRIPTION LIKE '%PID%' but I can't get it.
You can use CONCAT() to assemble the matching pattern, as in:
select *
from t
where description like concat('%', PID, '%')
We could also try using CHARINDEX here:
SELECT ID, PID, DESCRIPTION
FROM yourTable
WHERE CHARINDEX(PID, DESCRIPTION) > 0;
Demo
Note that I assume in the demo that the PID column is actually text, and not a numeric column. If PID is numeric, we might first have to cast it to text in order to use CHARINDEX (or any of the methods given in the other answers).
Use the CONCAT SQL function
SELECT *
FROM TABLE
WHERE DESCRIPTION LIKE CONCAT('%', PID, '%')
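
To try the idea of building the LIKE pattern from another column of the same row, here is a small self-contained sketch using Python's built-in sqlite3 module. It is an assumption-laden stand-in for your real database: SQLite builds the pattern with `||` rather than `CONCAT()`, and the table name `t` and the integer type of `pid` are choices made for this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, pid INTEGER, description TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 4124, "Hi4124"), (2, 2451, "Test"), (3, 1467, "Hello"), (4, 9642, "Me9642")],
)

# The pattern is assembled per row from that row's own pid value.
# SQLite uses || for concatenation; the idea matches CONCAT('%', PID, '%').
matching_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM t WHERE description LIKE '%' || pid || '%' ORDER BY id"
    )
]
conn.close()

print(matching_ids)
```

Note that if PID could ever contain `%` or `_`, those characters would act as wildcards inside the pattern.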

How to write SQL queries with respect to the following conditions?

I have a database table in which a column tags contains values such as:
"AutoMNRP, MNRP"
"Macro, MNRP"
"AutoMNRP, Micro"
"Macro, Micro"
where "...." represents a string.
I want to write a SQL query such that it filters out all results having MNRP tag in it. How can I do this?
I tried a not like operator of SQL on it, but if I want to remove MNRP tag, it also filters out AutoMNRP tag.
In the end, the query should return:
"AutoMNRP, Micro"
"Macro, Micro".
(Results when MNRP is filtered out.)
The right answer is to fix your design: you shouldn't store the data like this (comma separated). Each tag should be stored in its own row, like below (and the duplicates should be removed and handled too):
+----------+
| Data |
+----------+
| AutoMNRP |
| MNRP |
| Macro |
| MNRP |
| AutoMNRP |
| Micro |
| Macro |
| Micro |
+----------+
But... here is a way that may fit your requirements:
;WITH T(Str) AS
(
SELECT 'AutoMNRP, MNRP' UNION ALL
SELECT 'Macro, MNRP' UNION ALL
SELECT 'AutoMNRP, Micro' UNION ALL
SELECT 'Macro, Micro'
)
SELECT Str
FROM T
WHERE Str NOT LIKE '% MNRP,%'
AND
Str NOT LIKE '%, MNRP';
Returns:
+-----------------+
| Str |
+-----------------+
| AutoMNRP, Micro |
| Macro, Micro |
+-----------------+
Live Demo
You can also (as Larnu points out) do it like this:
;WITH T(Str) AS
(
SELECT 'AutoMNRP, MNRP' UNION ALL
SELECT 'Macro, MNRP' UNION ALL
SELECT 'AutoMNRP, Micro' UNION ALL
SELECT 'Macro, Micro'
)
SELECT Str
FROM T
WHERE CONCAT(', ', Str, ',') NOT LIKE '%, MNRP,%';
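In SQL Server 2016+ you can use the STRING_SPLIT function. It expands each record into one row per separated value in the tags column, so you can then apply a simple WHERE clause. Something like this:
WITH cte AS
(
-- STRING_SPLIT exposes each piece in a column named value; LTRIM drops the space after each comma
SELECT Id, LTRIM(value) AS SingleTag
FROM table_name CROSS APPLY STRING_SPLIT(tags, ',')
)
-- Keep only the rows whose tag list does not contain the exact tag MNRP
SELECT * FROM table_name
WHERE Id NOT IN (SELECT Id FROM cte WHERE SingleTag = 'MNRP')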
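
The difference between matching a whole tag and matching a substring is easy to see in a short Python sketch. It only mirrors the split-and-compare idea on the sample data and is not SQL Server code; the helper name `has_tag` is made up for this example.

```python
tag_rows = ["AutoMNRP, MNRP", "Macro, MNRP", "AutoMNRP, Micro", "Macro, Micro"]

def has_tag(tags: str, wanted: str) -> bool:
    """Split on commas, trim spaces, and compare whole tags only."""
    return wanted in [tag.strip() for tag in tags.split(",")]

# Whole-tag comparison: AutoMNRP does not count as MNRP
without_mnrp = [tags for tags in tag_rows if not has_tag(tags, "MNRP")]

# Naive substring test: also drops the AutoMNRP row, which is the problem in the question
naive = [tags for tags in tag_rows if "MNRP" not in tags]

print(without_mnrp)
print(naive)
```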

extracting a substring from a text column in hive

We have text data in a column named title like below
"id":"S-1-98-13474422323-33566802","name":"uid=Xzdpr0,ou=people,dc=vm,dc=com","shortName":"XZDPR0","displayName":"Jund Lee","emailAddress":"jund.lee#bm.com","title":"Leading Product Investor"
I need to extract just the display name (Jund Lee in this example) from the above text data in Hive. I have tried using the substring function, but it doesn't seem to work. Please help.
Use regexp_extract function with the matching regex to capture only the displayName from your title field value.
Ex:
hive> with tb as(select string('"id":"S-1-98-13474422323-33566802",
"name":"uid=Xzdpr0,ou=people,dc=vm,dc=com","shortName":"XZDPR0",
"displayName":"Jund Lee","emailAddress":"jund.lee#bm.com",
"title":"Leading Product Investor"')title)
select regexp_extract(title,'"displayName":"(.*?)"',1) title from tb;
+-----------+--+
| title |
+-----------+--+
| Jund Lee |
+-----------+--+
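
The same pattern can be checked quickly in Python before running it in Hive. The sketch below assumes the regex behaves the same way in both engines for this simple case (a lazy `(.*?)` capture up to the next double quote); the function name `extract_display_name` is hypothetical.

```python
import re

title = (
    '"id":"S-1-98-13474422323-33566802","name":"uid=Xzdpr0,ou=people,dc=vm,dc=com",'
    '"shortName":"XZDPR0","displayName":"Jund Lee",'
    '"emailAddress":"jund.lee#bm.com","title":"Leading Product Investor"'
)

def extract_display_name(text: str):
    """Return capture group 1 of the displayName pattern, or None if it is absent."""
    match = re.search(r'"displayName":"(.*?)"', text)
    return match.group(1) if match else None

display_name = extract_display_name(title)
print(display_name)
```

The lazy quantifier matters here: a greedy `(.*)` would run on to the last double quote in the string and capture far more than the name.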