Split JSON into columns in PostgreSQL

I have this simple table
create table orders
(
order_id int,
address json
);
insert into orders( order_id, address)
values
(1, '[{"purchase": "online", "address": {"state": "LA", "city": "Los Angeles", "order_create_date": "2022-12-01T14:37:48.222-06:00"}}]'),
(2, '[{"purchase": "instore", "address": {"state": "WA", "city": "Seattle", "order_create_date": "2022-12-01T14:37:48.222-06:00"}}]'),
(3, '[{"purchase": "online", "address": {"state": "GA", "city": "Atlanta", "order_create_date": "2022-12-01T14:37:48.222-06:00"}}]');
So far I have been able to split purchase and address into two columns:
select
order_id,
(address::jsonb)->0->>'purchase' as purchasetype,
(address::jsonb)->0->>'address' as address
from orders;
1 online {"city": "Los Angeles", "state": "LA", "order_create_date": "2022-12-01T14:37:48.222-06:00"}
2 instore {"city": "Seattle", "state": "WA", "order_create_date": "2022-12-01T14:37:48.222-06:00"}
3 online {"city": "Atlanta", "state": "GA", "order_create_date": "2022-12-01T14:37:48.222-06:00"}
but I was wondering if anyone can help with how I can also split the address into three columns (state, city, order_created_date).
I tried a subquery, but it did not work.
I would like to see something like this:
1 | online | Los Angeles | LA | 2022-12-01T14:37:48.222-06:00
2 | instore | Seattle | WA | 2022-12-01T14:37:48.222-06:00
3 | online | Atlanta | GA | 2022-12-01T14:37:48.222-06:00

Try this:
SELECT o.order_id
, a->>'purchase' AS purchasetype
, a->'address'->>'state' AS state
, a->'address'->>'city' AS city
, a->'address'->>'order_create_date' AS order_created_date
FROM orders o
CROSS JOIN LATERAL jsonb_path_query(o.address::jsonb, '$[*]') a
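Result:
+----------+--------------+-------+-------------+-------------------------------+
| order_id | purchasetype | state | city        | order_created_date            |
+----------+--------------+-------+-------------+-------------------------------+
| 1        | online       | LA    | Los Angeles | 2022-12-01T14:37:48.222-06:00 |
| 2        | instore      | WA    | Seattle     | 2022-12-01T14:37:48.222-06:00 |
| 3        | online       | GA    | Atlanta     | 2022-12-01T14:37:48.222-06:00 |
+----------+--------------+-------+-------------+-------------------------------+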
see dbfiddle
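If the flattening has to happen in application code instead, the same unnesting can be sketched in Python. This is only an illustration under assumptions: `rows` is a hypothetical stand-in for the result of `SELECT order_id, address FROM orders`, `flatten_orders` is a made-up helper name, and no database driver is involved.

```python
import json

# Hypothetical stand-in for rows fetched from the orders table (order_id, address as text).
rows = [
    (1, '[{"purchase": "online", "address": {"state": "LA", "city": "Los Angeles", "order_create_date": "2022-12-01T14:37:48.222-06:00"}}]'),
    (2, '[{"purchase": "instore", "address": {"state": "WA", "city": "Seattle", "order_create_date": "2022-12-01T14:37:48.222-06:00"}}]'),
    (3, '[{"purchase": "online", "address": {"state": "GA", "city": "Atlanta", "order_create_date": "2022-12-01T14:37:48.222-06:00"}}]'),
]


def flatten_orders(rows):
    """Yield one flat tuple per element of each JSON array, like the lateral join does."""
    for order_id, raw in rows:
        for item in json.loads(raw):
            addr = item.get("address", {})
            yield (
                order_id,
                item.get("purchase"),
                addr.get("state"),
                addr.get("city"),
                addr.get("order_create_date"),
            )


flat = list(flatten_orders(rows))
for record in flat:
    print(" | ".join(str(value) for value in record))
```

As with the lateral join, an array holding several elements would produce several output rows for the same order_id.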

Related

What the best way to store the format of a complex string, in a format table?

I am currently working on the design of a database that will store each component of an address in separate fields (street number, street name, city…)
I would like to have a tb_address_format table, that stores for each country the order in which the elements should be arranged to output a valid postal address.
I am thinking something like:
+──────────+───────────────────────────────────────────────────────────────+
| country | address_format |
+──────────+───────────────────────────────────────────────────────────────+
| us | FORMAT('%s\n%s %s\t%s', street, city, province, postal_code) |
| de | FORMAT('%s\n%s %s', street, postal_code, city) |
| | |
| | |
+──────────+───────────────────────────────────────────────────────────────+
Asking this question makes me realise I can store these two pieces of information in two different fields:
the format string
the list of fields
like so:
+──────────+──────────────────+──────────────────────────────────────+
| country | format_string | list_of_fields |
+──────────+──────────────────+──────────────────────────────────────+
| us | '%s\n%s %s\t%s' | street, city, province, postal_code |
| de | '%s\n%s %s' | street, postal_code, city |
+──────────+──────────────────+──────────────────────────────────────+
What else can I do to improve this design and its security?
EDIT 1: I am trying to evaluate the result directly in a calculated field, in SQL. That would mean also using some extended constant like in this question, on how to transform \n into a new line.
EDIT 2: street, city, province, postal_code are indeed table columns. I was first thinking about storing them as a VARCHAR and then evaluating the result.
Following the hint from @Bergi in the comment below about using pg_catalog, is it possible / desirable to store an array where each item references the pg_catalog.pg_attribute table?
EDIT 3: some code to play with
CREATE TABLE tb_address(
city varchar,
street varchar,
postal_code varchar,
county varchar,
state varchar,
country varchar);
CREATE TABLE tb_address_format(
country varchar,
address_format varchar,
list_of_fields text ARRAY);
INSERT INTO tb_address(
street, city, county, state, postal_code, country
)
VALUES
('150 5th Ave', 'New York', NULL, 'NY', '10011', 'USA'),
('Holzgasse 14', 'Köln', NULL, NULL, '50676', 'Germany');
INSERT INTO tb_address_format(
country, address_format, list_of_fields
)
VALUES
('USA', '%s\n%s %s %s', '{"street", "city", "state", "postal_code"}'),
('Germany', '%s\n%s %s\n\n%s', '{"street", "postal_code", "city", "country"}');
Expected result: the calc_formatted column is a SQL calculated column.
+--------------+----------+-------------+-------+---------+-------------------------------------+
| street | city | postal_code | state | country | calc_formatted |
+--------------+----------+-------------+-------+---------+-------------------------------------+
| 150 5th Ave | New York | 10011 | NY | USA | 150 5th Ave\nNew York NY 10011\nUSA |
+--------------+----------+-------------+-------+---------+-------------------------------------+
| Holzgasse 14 | Köln | 50676 | | Germany | Holzgasse 14\n50676 Köln\nGermany |
+--------------+----------+-------------+-------+---------+-------------------------------------+

Store list_of_fields as JSONB (it supports indexes and functions, stays readable, etc).
To ensure consistency, use one of the JSON schema implementations:
https://github.com/supabase/pg_jsonschema#prior-art
Blog post
UPD
DB level
App level
>>> import json
>>> json_str = '{"city": "par", "street": "str", "postal_code": 123}'
>>> address = json.loads(json_str)
>>> print(address)
{'city': 'par', 'street': 'str', 'postal_code': 123}
>>> print(address['city'])
par
UPD 1: CSV
DB:
+---------+-------------------------+-----------------------+
| country | headers | values |
+---------+-------------------------+-----------------------+
| us | city,street,postal_code | cty,"some street",123 |
+---------+-------------------------+-----------------------+
Python
import csv
headers = next(csv.reader(['city,street,postal_code']))
values = next(csv.reader(['cty,"some street",123']))
address = dict(zip(headers, values))
print(address)
# {'city': 'cty', 'street': 'some street', 'postal_code': '123'}
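On the security part of the question, one app-level option is to check list_of_fields against a fixed whitelist before applying the format string, so a format row can never pull in an unexpected column. The sketch below is an assumption-laden illustration, not part of the answer above: `ALLOWED_FIELDS` and `format_address` are hypothetical names, and the format string is written as a Python literal, so `\n` is a real newline here (in PostgreSQL a plain `'...\n...'` literal keeps the backslash unless written as `E'...'`).

```python
# Hypothetical whitelist: the only column names a format row may reference.
ALLOWED_FIELDS = {"street", "city", "county", "state", "postal_code", "country"}


def format_address(address, format_string, list_of_fields):
    """Fill format_string with the listed fields, rejecting unknown field names."""
    unknown = set(list_of_fields) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unknown field(s): {sorted(unknown)}")
    # Missing or NULL-like values become empty strings.
    values = tuple(address.get(field) or "" for field in list_of_fields)
    return format_string % values


us = {
    "street": "150 5th Ave",
    "city": "New York",
    "state": "NY",
    "postal_code": "10011",
    "country": "USA",
}
formatted = format_address(us, "%s\n%s %s %s", ["street", "city", "state", "postal_code"])
print(formatted)
```

Keeping the format string and the field list separate, as in the second table of the question, is what makes this kind of validation possible; a stored `FORMAT(...)` expression would have to be evaluated as code.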

Unnest array / JSON in Bigquery - Get value from key

I have an array like this
[{"name": "Nome da Empresa", "value": "Land ", "updated_at": "2022-09-02T22:30:58Z"}, {"name": "Nome do Representante", "value": "Thomas GT", "updated_at": "2022-09-02T22:30:58Z"}, {"name": "Email Representante", "value": "p#xyz.com", "updated_at": "2022-09-02T22:30:58Z"}, {"name": "Qual o plano do cliente?", "value": "Go", "updated_at": "2022-09-02T22:31:12Z"},{"name": "Forma de pagamento", "value": "Internet Banking", "updated_at": "2022-09-16T14:09:53Z"}, {"name": "Valor total da guia", "value": "227,63", "updated_at": "2022-09-16T14:09:59Z"}]
I'm trying to get values from some "fields" like Nome da Empresa or Email Representante.
I've already tried json_extract_scalar and unnest. json_extract_scalar returns a column with no values (blank), and unnest fails with the error: Values referenced in UNNEST must be arrays. UNNEST contains expression of type STRING
Query 1:
select
id,
fields,
json_extract_scalar(fields,'$.Email Representante') as categorias,
json_value(fields,'$.Nome da Empresa') as teste
from mytable
Query 2:
SELECT
id,
fields
from pipefy.cards_startup_pack, UNNEST(fields)
Any ideas? Thanks a lot!

You may try below query.
Query 1:
SELECT (SELECT JSON_VALUE(f, '$.value')
FROM UNNEST(JSON_QUERY_ARRAY(t.fields)) f
WHERE JSON_VALUE(f, '$.name') = 'Nome da Empresa'
) AS teste,
(SELECT JSON_VALUE(f, '$.value')
FROM UNNEST(JSON_QUERY_ARRAY(t.fields)) f
WHERE JSON_VALUE(f, '$.name') = 'Email Representante'
) AS categorias,
FROM mytable t;
# Query results
+-------+------------+
| teste | categorias |
+-------+------------+
| Land | p#xyz.com |
+-------+------------+
Query 2:
SELECT JSON_VALUE(f, '$.name') name, JSON_VALUE(f, '$.value') value
FROM mytable, UNNEST(JSON_QUERY_ARRAY(fields)) f;
# Query results
+--------------------------+------------------+
| name | value |
+--------------------------+------------------+
| Nome da Empresa | Land |
| Nome do Representante | Thomas GT |
| Email Representante | p#xyz.com |
| Qual o plano do cliente? | Go |
| Forma de pagamento | Internet Banking |
| Valor total da guia | "227,63" |
+--------------------------+------------------+
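The logic of Query 1 (scan the array and return the value of the entry whose name matches) can also be sketched outside BigQuery. The Python below is a hedged illustration only: `fields` is a shortened copy of the sample array from the question, and `value_for` is a hypothetical helper name.

```python
import json

# Shortened copy of the sample array from the question, stored as a JSON string.
fields = (
    '[{"name": "Nome da Empresa", "value": "Land ", "updated_at": "2022-09-02T22:30:58Z"},'
    ' {"name": "Email Representante", "value": "p#xyz.com", "updated_at": "2022-09-02T22:30:58Z"},'
    ' {"name": "Valor total da guia", "value": "227,63", "updated_at": "2022-09-16T14:09:59Z"}]'
)


def value_for(fields_json, name):
    """Return the value of the first entry whose name matches, or None if absent."""
    for entry in json.loads(fields_json):
        if entry.get("name") == name:
            return entry.get("value")
    return None


teste = value_for(fields, "Nome da Empresa")
categorias = value_for(fields, "Email Representante")
print(teste, categorias)
```

This also shows why a path like '$.Email Representante' returns nothing: the names are values inside array elements, not keys of the JSON document.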

How to use a wildcard on a jsonb column

ID | Day        | ftfm_profile
---+------------+--------------------------------------------------------------------------------------------------------
23 | 22/10/2020 | {"name": ["EGMHTMA", "EGMHCR", "EDYYFOX2", "EGTTFIR", "EGTTEI"],"type": ["AUA", "ES", "ES", "FIR"]}
24 | 22/10/2020 | {"name": ["LFBBRL1", "LFBMC2", "LFBBTT6", "LFTTN8", "EGTTEI"],"type": ["AUA", "ES", "ES", "FIR"]}
25 | 22/10/2020 | {"name": ["LFBGTH4", "LFBMC2", "LFFFE7", "LFTTN8", "EGTTEI"],"type": ["AUA", "ES", "ES", "FIR"]}
I have a table (named profile) in my Postgres database with 3 columns: ID, Day, and ftfm_profile (of type jsonb). I tried to extract the rows where a profile name (ftfm_profile->'name') begins with 'LFBB' (SQL: LFBB%) using a wildcard as follows:
select * from public.profile where ftfm_profile ->'name' ? 'LFBB%'
the expected result:
-------------------------------------------------------------------------------------------------
24 | 22/10/2020 | {"name": ["LFBBRL1", "LFBMC2", "LFBBTT6", "LFTTN8", "EGTTEI"],"type": ["AUA", "ES", "ES", "FIR"]}
-------------------------------------------------------------------------------------------------
I can't seem to find the solution, thanks for your help

One option unnests the JSON array in an exists subquery:
select *
from public.profile
where exists (
select 1
from jsonb_array_elements_text(ftfm_profile ->'name') as e(name)
where e.name like 'LFBB%'
)
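The jsonb `?` operator only tests whether an exact string exists as a key or array element, so it cannot do pattern matching; each element has to be compared individually, which is what the exists subquery does. The same "any element starts with the prefix" check can be sketched in Python. The `profiles` list and the `has_prefix` helper are hypothetical names, with data copied from the question.

```python
# Sample data from the question, keyed by ID (only the "name" arrays are needed here).
profiles = [
    (23, {"name": ["EGMHTMA", "EGMHCR", "EDYYFOX2", "EGTTFIR", "EGTTEI"]}),
    (24, {"name": ["LFBBRL1", "LFBMC2", "LFBBTT6", "LFTTN8", "EGTTEI"]}),
    (25, {"name": ["LFBGTH4", "LFBMC2", "LFFFE7", "LFTTN8", "EGTTEI"]}),
]


def has_prefix(profile, prefix):
    """True if at least one entry of the name array starts with prefix (like 'LFBB%')."""
    return any(name.startswith(prefix) for name in profile.get("name", []))


matching_ids = [pid for pid, profile in profiles if has_prefix(profile, "LFBB")]
print(matching_ids)
```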

How to subtract a column of string based on other columns in Hive?

With this table, I am trying to remove the parts of address that also appear in zip_code and city.
+----------------------------------------------+----------+------------+
| address | zip_code | city |
+----------------------------------------------+----------+------------+
| Oceans Group, 12 Pear Tree Road, Derby | DE23 6PY | Derby |
| 970 Stockport Road | M19 3NN | Manchester |
| Cartridge World Guiseley | | Edinburgh |
| 33-41 Kelvin Avenue | G52 4LT | Glasgow |
| Cartridge World Haymarket, 54 Dalry Road, UK | EH5 1HX | Edinburgh |
| 50 Otley Road, Leeds, LS20 8AH, UK | LS20 8AH | |
+----------------------------------------------+----------+------------+
something like
SUBSTR('Oceans Group, 12 Pear Tree Road, Derby', 'DE23 6PY', 'Derby') returns 'Oceans Group, 12 Pear Tree Road, '
SUBSTR('50 Otley Road, Leeds, LS20 8AH, UK', 'LS20 8AH', '') returns '50 Otley Road, Leeds, , UK'
I hope this piece of code saves you some time.
CREATE TABLE address_table(
address STRING
, zip_code STRING
, city STRING
);
INSERT INTO address_table VALUES ("Oceans Group, 12 Pear Tree Road, Derby", "DE23 6PY", "Derby");
INSERT INTO address_table VALUES ("970 Stockport Road", "M19 3NN", "Manchester");
INSERT INTO address_table VALUES ("Cartridge World Guiseley", "", "Edinburgh");
INSERT INTO address_table VALUES ("33-41 Kelvin Avenue", "G52 4LT", "Glasgow");
INSERT INTO address_table VALUES ("Cartridge World Haymarket, 54 Dalry Road, UK", "EH5 1HX", "Edinburgh");
INSERT INTO address_table VALUES ("50 Otley Road, Leeds, LS20 8AH, UK", "LS20 8AH", "");

Hive did not originally ship a plain string replace function (replace() was only added in later releases), but you can use regexp_replace() instead:
select
a.*,
regexp_replace(address, zip_code, '') new_address
from address_table a
If you want an update statement:
update address_table
set address = regexp_replace(address, zip_code, '')
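Two details are worth keeping in mind with this approach: regexp_replace() treats its second argument as a regular expression, so a value containing regex metacharacters would need escaping, and the queries above only remove zip_code, not city. The Python sketch below illustrates the intended result for both columns; it is not Hive code, and `strip_parts` is a hypothetical helper name.

```python
import re


def strip_parts(address, *parts):
    """Remove every literal occurrence of each non-empty part from address."""
    for part in parts:
        if part:  # skip empty zip_code / city values
            # re.escape makes the part match literally, not as a regex pattern.
            address = re.sub(re.escape(part), "", address)
    return address


first = strip_parts("Oceans Group, 12 Pear Tree Road, Derby", "DE23 6PY", "Derby")
second = strip_parts("50 Otley Road, Leeds, LS20 8AH, UK", "LS20 8AH", "")
print(repr(first))
print(repr(second))
```

Both results match the outputs requested in the question, including the leftover separators.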

split rows into table with different columns

Hello, I have a table whose rows contain strings like:
----------------------------------------------------------
"Continent": "Europe", "Nation": "Italy", "City": "Rome"
"Continent": "Asia", "Nation": "China", "City": "Beijing"
"Continent": "Europe", "Nation": "France", "City": "Paris"
"Continent": "Africa", "Nation": "Tunisia", "City": "Tunis"
-----------------------------------------------------------
And I would like to sort this out like:
ID | CONTINENT | NATION | CITY
-----------------------------------------------------
1 | Africa | Tunisia| Tunis
2 | Europe | Italy | Rome
3 | Europe | France | Paris
4 | Asia | China | Beijing
How can I do that in PostgreSQL?

Your string values are close to being valid JSON. Turn each one into JSON by enclosing it in {} and casting, then simply use the ->> operator to extract individual elements as columns.
with js as
(
select ('{'||str||'}')::json as j from t
) select j->>'Continent' as Continent,
j->>'Nation' as Nation,
j->>'City' as City FROM js;
Demo

This is much less elegant than the accepted answer, but in the interest of sharing alternatives:
select
row_number() over (partition by 1) as id,
substring ((regexp_split_to_array (rowdata, ','))[1] from ': "(.+)"$') as Continent,
substring ((regexp_split_to_array (rowdata, ','))[2] from ': "(.+)"$') as Nation,
substring ((regexp_split_to_array (rowdata, ','))[3] from ': "(.+)"$') as City
from t1
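The brace-wrapping trick from the first answer carries over to application code as well. The sketch below is an illustration under assumptions: `raw_rows` is a hypothetical stand-in for the string column, `parse_row` is a made-up helper, and the IDs are simply assigned in input order rather than in the order shown in the question.

```python
import json

# Hypothetical stand-in for the string column from the question.
raw_rows = [
    '"Continent": "Europe", "Nation": "Italy", "City": "Rome"',
    '"Continent": "Asia", "Nation": "China", "City": "Beijing"',
    '"Continent": "Europe", "Nation": "France", "City": "Paris"',
    '"Continent": "Africa", "Nation": "Tunisia", "City": "Tunis"',
]


def parse_row(text):
    """Wrap the key/value list in braces so it parses as a JSON object."""
    return json.loads("{" + text + "}")


records = [parse_row(text) for text in raw_rows]
# Number the rows in input order, similar to row_number() without a meaningful ordering.
table = [
    (i, rec["Continent"], rec["Nation"], rec["City"])
    for i, rec in enumerate(records, start=1)
]
for row in table:
    print(" | ".join(str(value) for value in row))
```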
