split rows into table with different columns - sql

Hello I have a table with an array of strings like:
----------------------------------------------------------
"Continent": "Europe", "Nation": "Italy", "City": "Rome"
"Continent": "Asia", "Nation": "China", "City": "Beijing"
"Continent": "Europe", "Nation": "France", "City": "Paris"
"Continent": "Africa", "Nation": "Tunisia", "City": "Tunis"
-----------------------------------------------------------
And I would like to sort this out like:
ID | CONTINENT | NATION | CITY
-----------------------------------------------------
1 | Africa | Tunisia| Tunis
2 | Europe | Italy | Rome
3 | Europe | France | Paris
4 | Asia | China | Beijing
How can I do that in POSTGRESQL?

Your string values are almost valid JSON. Wrap each one in {}, cast the result to json, and use the ->> operator to extract the individual elements as columns.
with js as
(
select ('{'||str||'}')::json as j from t
) select j->>'Continent' as Continent,
j->>'Nation' as Nation,
j->>'City' as City FROM js;
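The same idea can be checked outside the database. This is a minimal Python sketch, assuming each row is a string shaped like the ones in the question; the names rows and parse_row are made up for illustration.

```python
import json

# Sample rows copied from the question: each is the body of a JSON object without braces.
rows = [
    '"Continent": "Europe", "Nation": "Italy", "City": "Rome"',
    '"Continent": "Asia", "Nation": "China", "City": "Beijing"',
]

def parse_row(row):
    """Wrap the fragment in braces so it becomes a valid JSON object, then parse it."""
    return json.loads("{" + row + "}")

for row_id, row in enumerate(rows, start=1):
    record = parse_row(row)
    print(row_id, record["Continent"], record["Nation"], record["City"])
```

As with the SQL version, this only works while every row really is a well-formed list of "key": "value" pairs.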

This is much less elegant than the accepted answer, but in the interest of sharing alternatives:
select
row_number() over (partition by 1) as id,
substring ((regexp_split_to_array (rowdata, ','))[1] from ': "(.+)"$') as Continent,
substring ((regexp_split_to_array (rowdata, ','))[2] from ': "(.+)"$') as Nation,
substring ((regexp_split_to_array (rowdata, ','))[3] from ': "(.+)"$') as City
from t1
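For comparison, the regex approach could look roughly like this in Python. It is a sketch only, and it assumes that values never contain embedded double quotes; PAIR and split_row are hypothetical names.

```python
import re

# Matches one "key": "value" pair; assumes values contain no embedded double quotes.
PAIR = re.compile(r'"([^"]+)":\s*"([^"]*)"')

def split_row(rowdata):
    """Return the key/value pairs of one row as a dict."""
    return dict(PAIR.findall(rowdata))

record = split_row('"Continent": "Africa", "Nation": "Tunisia", "City": "Tunis"')
print(record["Continent"], record["Nation"], record["City"])
```

Unlike splitting on commas and picking elements by position, matching whole pairs does not depend on the order of the keys.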

Related

What is the best way to store the format of a complex string, in a format table?

I am currently working on the design of a database that will store each component of an address in separate fields (street number, street name, city…)
I would like to have a tb_address_format table, that stores for each country the order in which the elements should be arranged to output a valid postal address.
I am thinking something like:
+──────────+───────────────────────────────────────────────────────────────+
| country | address_format |
+──────────+───────────────────────────────────────────────────────────────+
| us | FORMAT('%s\n%s %s\t%s', street, city, province, postal_code) |
| de | FORMAT('%s\n%s %s', street, postal_code, city) |
| | |
| | |
+──────────+───────────────────────────────────────────────────────────────+
Asking this question makes me realise I can store these two pieces of information in two different fields:
the format string
the list of fields
like so:
+──────────+──────────────────+──────────────────────────────────────+
| country | format_string | list_of_fields |
+──────────+──────────────────+──────────────────────────────────────+
| us | '%s\n%s %s\t%s' | street, city, province, postal_code |
| de | '%s\n%s %s' | street, postal_code, city |
+──────────+──────────────────+──────────────────────────────────────+
What else can I do to improve this design and its security?
EDIT 1: I am trying to evaluate the result directly in a calculated field, in SQL. That would mean also using some extended constant like in this question, on how to transform \n into a new line.
EDIT 2: street, city, province, postal_code are indeed table columns. I was first thinking about storing them as a VARCHAR and then evaluating the result.
Following the hint from @Bergi in the comment below about using pg_catalog, is it possible / desirable to store an array where each item references the pg_catalog.pg_attribute table?
EDIT 3: some code to play with
CREATE TABLE tb_address(
city varchar,
street varchar,
postal_code varchar,
county varchar,
state varchar,
country varchar);
CREATE TABLE tb_address_format(
country varchar,
address_format varchar,
list_of_fields text ARRAY);
INSERT INTO tb_address(
street, city, county, state, postal_code, country
)
VALUES
('150 5th Ave', 'New York', NULL, 'NY', '10011', 'USA'),
('Holzgasse 14', 'Köln', NULL, NULL, '50676', 'Germany');
INSERT INTO tb_address_format(
country, address_format, list_of_fields
)
VALUES
('USA', '%s\n%s %s %s', '{"street", "city", "state", "postal_code"}'),
('Germany', '%s\n%s %s\n\n%s', '{"street", "postal_code", "city", "country"}');
Expected result: the calc_formatted column is a SQL calculated column.
+--------------+----------+-------------+-------+---------+-------------------------------------+
| street | city | postal_code | state | country | calc_formatted |
+--------------+----------+-------------+-------+---------+-------------------------------------+
| 150 5th Ave | New York | 10011 | NY | USA | 150 5th Ave\nNew York NY 10011\nUSA |
+--------------+----------+-------------+-------+---------+-------------------------------------+
| Holzgasse 14 | Köln | 50676 | | Germany | Holzgasse 14\n50676 Köln\nGermany |
+--------------+----------+-------------+-------+---------+-------------------------------------+
Store list_of_fields as JSONB (it can be indexed, has built-in functions, stays readable, etc.).
To ensure consistency, use one of the JSON Schema implementations:
https://github.com/supabase/pg_jsonschema#prior-art
Blog post
UPD
DB level
App level
>>> import json
>>> json_str = '{"city": "par", "street": "str", "postal_code": 123}'
>>> address = json.loads(json_str)
>>> print(address)
{'city': 'par', 'street': 'str', 'postal_code': 123}
>>> print(address['city'])
par
UPD 1: CSV
DB:
+---------+-------------------------+-----------------------+
| country | headers | values |
+---------+-------------------------+-----------------------+
| us | city,street,postal_code | cty,"some street",123 |
+---------+-------------------------+-----------------------+
Python
import csv
headers = next(csv.reader(['city,street,postal_code']))
values = next(csv.reader(['cty,"some street",123']))
address = dict(zip(headers, values))
print(address)
# {'city': 'cty', 'street': 'some street', 'postal_code': '123'}
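Tying this back to the format_string / list_of_fields design from the question, an application-level version might look like the sketch below. Everything here is an assumption for illustration: the dictionary address_formats stands in for tb_address_format, ALLOWED_FIELDS is a hypothetical whitelist of column names, and the Python literals contain real newlines, whereas '\n' in a plain PostgreSQL string literal is not one.

```python
# Hypothetical stand-in for tb_address_format: country -> (format string, list of fields).
address_formats = {
    "USA": ("%s\n%s %s %s", ["street", "city", "state", "postal_code"]),
    "Germany": ("%s\n%s %s\n\n%s", ["street", "postal_code", "city", "country"]),
}

# Only these column names may be referenced by a format row (assumed whitelist).
ALLOWED_FIELDS = {"street", "city", "county", "state", "postal_code", "country"}

def format_address(address):
    """Look up the format for the address's country and fill it with the listed fields."""
    fmt, fields = address_formats[address["country"]]
    unknown = set(fields) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unknown fields in format: {sorted(unknown)}")
    # Missing or NULL values are rendered as empty strings.
    values = tuple(address.get(field) or "" for field in fields)
    return fmt % values

us = {"street": "150 5th Ave", "city": "New York", "state": "NY",
      "postal_code": "10011", "country": "USA"}
print(format_address(us))
```

Checking field names against a fixed whitelist is one way to keep stored format rows from referencing arbitrary identifiers.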

Split JSON into columns in PostgreSQL

I have this simple table
create table orders
(
order_id int,
address json
);
insert into orders( order_id, address)
values
(1, '[{"purchase": "online", "address": {"state": "LA", "city": "Los Angeles", "order_create_date": "2022-12-01T14:37:48.222-06:00"}}]'),
(2, '[{"purchase": "instore", "address": {"state": "WA", "city": "Seattle", "order_create_date": "2022-12-01T14:37:48.222-06:00"}}]'),
(3, '[{"purchase": "online", "address": {"state": "GA", "city": "Atlanta", "order_create_date": "2022-12-01T14:37:48.222-06:00"}}]');
So far I have been able to split purchase and address into two columns:
select
order_id,
(address::jsonb)->0->>'purchase' as purchasetype,
(address::jsonb)->0->>'address' as address
from orders;
1 online {"city": "Los Angeles", "state": "LA", "order_create_date": "2022-12-01T14:37:48.222-06:00"}
2 instore {"city": "Seattle", "state": "WA", "order_create_date": "2022-12-01T14:37:48.222-06:00"}
3 online {"city": "Atlanta", "state": "GA", "order_create_date": "2022-12-01T14:37:48.222-06:00"}
but I was wondering if anyone can help me split the address into three columns as well (state, city, order_created_date).
I tried a subquery, but it didn't work.
I would like to see something like this:
1 | online | Los Angeles | LA | 2022-12-01T14:37:48.222-06:00
2 | instore | Seattle | WA | 2022-12-01T14:37:48.222-06:00
3 | online | Atlanta | GA | 2022-12-01T14:37:48.222-06:00
Try this :
SELECT o.order_id
, a->>'purchase' AS purchasetype
, a->'address'->>'state' AS state
, a->'address'->>'city' AS city
, a->'address'->>'order_create_date' AS order_created_date
FROM orders o
CROSS JOIN LATERAL jsonb_path_query(o.address::jsonb, '$[*]') a
Result:
order_id | purchasetype | state | city | order_created_date
1 | online | LA | Los Angeles | 2022-12-01T14:37:48.222-06:00
2 | instore | WA | Seattle | 2022-12-01T14:37:48.222-06:00
3 | online | GA | Atlanta | 2022-12-01T14:37:48.222-06:00
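To make the flattening explicit, here is a rough Python equivalent of the query above: one output row per element of the JSON array, with the nested address object spread into columns. The orders list is a hypothetical in-memory copy of two rows from the question, and flatten_orders is a made-up name.

```python
import json

# Hypothetical in-memory copy of two rows of the orders table: (order_id, address JSON text).
orders = [
    (1, '[{"purchase": "online", "address": {"state": "LA", "city": "Los Angeles", '
        '"order_create_date": "2022-12-01T14:37:48.222-06:00"}}]'),
    (2, '[{"purchase": "instore", "address": {"state": "WA", "city": "Seattle", '
        '"order_create_date": "2022-12-01T14:37:48.222-06:00"}}]'),
]

def flatten_orders(orders):
    """Emit one tuple per array element, like CROSS JOIN LATERAL over '$[*]'."""
    rows = []
    for order_id, address_json in orders:
        for element in json.loads(address_json):
            addr = element["address"]
            rows.append((order_id, element["purchase"], addr["state"],
                         addr["city"], addr["order_create_date"]))
    return rows

for row in flatten_orders(orders):
    print(" | ".join(str(value) for value in row))
```

If an array held several elements, each one would produce its own row, which is the difference from indexing with ->0.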

How to only get the rows in which every value is found from another table?

I have the following table schema:
Person:
Name | Year | Sports
Hans | 23 | Football
Hans | 23 | Baseball
Hans | 23 | Badminton
Albert | 25 | Baseball
Albert | 25 | Badminton
Sports:
Name | Tempo | Amount
Football | Fast | 5
Baseball | Slow | 3
Badminton | Fast | 4
Speed:
Name | Star
Fast | Good
Slow | Bad
The question I am trying to solve is: which sports are played by every person and also have the star value Good?
The result I want:
Albert | 25 | Badminton
My question would be: How can I realize this with a select statement? My current solution is:
SELECT * FROM speed JOIN
(SELECT * FROM person JOIN sports USING (name)) USING (name) WHERE STAR = 'good'
I don't know how to filter this more.
Alternative Tables
Country:
Name | Capital
USA | Washington
Germany | Berlin
France | Paris
Poland | Warsaw
Sports
Country | Sport
Germany | Football
Belgium | Baseball
Belgium | Football
France | Football
Poland | Baseball
Poland | Football
Region
Country | Area
Germany | Europe
Belgium | Europe
France | Europe
Poland | Europe
New Question: Which sport is played by every European country?
Output: Football, because it is played by Germany, France, Belgium and Poland.
This is a classic problem solved using the DIVISION relational algebra operation:
SP - "Good" sports used by persons
P - All persons

SP                   DIVIDE   P        =   S
------------------            ------       ---------
Name     Sports               Name         Sports
------------------            ------       ---------
Hans     Football             Hans         Badminton
Hans     Badminton            Albert
Albert   Badminton
Examples of explaining and expressing this statement in SQL can be found here:
Examples of DIVISION – RELATIONAL ALGEBRA and SQL
How to implement relational equivalent of the DIVIDE operation in SQL Server
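The division itself can be illustrated with plain Python sets. This is only a sketch of the operation on the small SP and P relations shown above, not of how a database evaluates it; divide is a hypothetical helper name.

```python
# SP: (person, sport) pairs restricted to sports whose speed is rated "Good".
sp = {("Hans", "Football"), ("Hans", "Badminton"), ("Albert", "Badminton")}
# P: all persons.
p = {"Hans", "Albert"}

def divide(pairs, divisor):
    """Return every sport that is paired with all members of divisor."""
    sports = {sport for _, sport in pairs}
    return {
        sport
        for sport in sports
        # Keep the sport only if the set of people playing it covers the divisor.
        if divisor <= {person for person, s in pairs if s == sport}
    }

print(divide(sp, p))
```

Football drops out because only Hans plays it, so its set of players does not cover P.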
An example of using the COUNT function to solve this problem:
SELECT p.sports
FROM person p
JOIN sports st ON st.name = p.sports
JOIN speed sd ON tempo = sd.name AND star = 'Good'
GROUP BY p.sports
HAVING COUNT(*) = (SELECT COUNT(DISTINCT name) FROM person)
Update for countries:
SELECT s.sport
FROM sports s
JOIN region r ON s.country = r.country AND r.area = 'Europe'
GROUP BY s.sport
HAVING COUNT(*) = (SELECT COUNT(*) FROM region WHERE area = 'Europe')
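To try the country query without a PostgreSQL instance, it can be run against an in-memory SQLite database from Python. This is a sketch that assumes the SQLite dialect is close enough for this particular statement; the table contents are copied from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sports (country TEXT, sport TEXT);
    CREATE TABLE region (country TEXT, area TEXT);
    INSERT INTO sports VALUES
        ('Germany', 'Football'), ('Belgium', 'Baseball'), ('Belgium', 'Football'),
        ('France', 'Football'), ('Poland', 'Baseball'), ('Poland', 'Football');
    INSERT INTO region VALUES
        ('Germany', 'Europe'), ('Belgium', 'Europe'),
        ('France', 'Europe'), ('Poland', 'Europe');
""")

# A sport qualifies when its number of European rows equals the number of European countries.
query = """
    SELECT s.sport
    FROM sports s
    JOIN region r ON s.country = r.country AND r.area = 'Europe'
    GROUP BY s.sport
    HAVING COUNT(*) = (SELECT COUNT(*) FROM region WHERE area = 'Europe')
"""
result = [row[0] for row in conn.execute(query)]
print(result)
```

Note that the COUNT(*) comparison relies on each (country, sport) pair appearing at most once; with duplicate rows, COUNT(DISTINCT s.country) would be the safer choice.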
All your joins are done using the name column, but as you can see, in each table the name column means something different. You have to use the on clause and specify which columns must match.
with person(name, year, sports) as (
select 'Hans', 23, 'Football' from dual union all
select 'Hans', 23, 'Baseball' from dual union all
select 'Hans', 23, 'Badminton' from dual union all
select 'Albert', 25, 'Baseball' from dual union all
select 'Albert', 25, 'Badminton' from dual
), sports(name, tempo, amount) as (
select 'Football', 'Fast', 5 from dual union all
select 'Baseball', 'Slow', 3 from dual union all
select 'Badminton', 'Fast', 4 from dual
), speed(name, star) as (
select 'Fast', 'Good' from dual union all
select 'Slow', 'Bad' from dual
)
select person.*
from person
join sports on person.sports = sports.name
join speed on sports.tempo = speed.name
where speed.star = 'Good'

How to subtract a column of string based on other columns in Hive?

With this table, I am trying to remove the parts of address that also appear in zip_code and city.
+----------------------------------------------+----------+------------+
| address | zip_code | city |
+----------------------------------------------+----------+------------+
| Oceans Group, 12 Pear Tree Road, Derby | DE23 6PY | Derby |
| 970 Stockport Road | M19 3NN | Manchester |
| Cartridge World Guiseley | | Edinburgh |
| 33-41 Kelvin Avenue | G52 4LT | Glasgow |
| Cartridge World Haymarket, 54 Dalry Road, UK | EH5 1HX | Edinburgh |
| 50 Otley Road, Leeds, LS20 8AH, UK | LS20 8AH | |
+----------------------------------------------+----------+------------+
something like
SUBSTR('Oceans Group, 12 Pear Tree Road, Derby', 'DE23 6PY', 'Derby') returns 'Oceans Group, 12 Pear Tree Road, '
SUBSTR('50 Otley Road, Leeds, LS20 8AH, UK', 'LS20 8AH', '') returns '50 Otley Road, Leeds, , UK'
Hope this piece of code saves you some time.
CREATE TABLE address_table(
address STRING
, zip_code STRING
, city STRING
);
INSERT INTO address_table VALUES ("Oceans Group, 12 Pear Tree Road, Derby", "DE23 6PY", "Derby");
INSERT INTO address_table VALUES ("970 Stockport Road", "M19 3NN", "Manchester");
INSERT INTO address_table VALUES ("Cartridge World Guiseley", "", "Edinburgh");
INSERT INTO address_table VALUES ("33-41 Kelvin Avenue", "G52 4LT", "Glasgow");
INSERT INTO address_table VALUES ("Cartridge World Haymarket, 54 Dalry Road, UK", "EH5 1HX", "Edinburgh");
INSERT INTO address_table VALUES ("50 Otley Road, Leeds, LS20 8AH, UK", "LS20 8AH", "");
Hive does not have a regular string replace function, but you can use regexp_replace() instead:
select
a.*,
regexp_replace(address, zip_code, '') new_address
from address_table a
If you want an update statement:
update address_table
set address = regexp_replace(address, zip_code, '')
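Because regexp_replace() interprets its second argument as a regular expression, column values containing regex metacharacters would be treated as patterns. The Python sketch below shows the intended behaviour for both zip_code and city with the values escaped first; strip_parts is a hypothetical helper, not a Hive function.

```python
import re

def strip_parts(address, *parts):
    """Remove each non-empty part from address, treating it as literal text."""
    for part in parts:
        if part:  # an empty zip_code or city removes nothing
            address = re.sub(re.escape(part), "", address)
    return address

print(repr(strip_parts("Oceans Group, 12 Pear Tree Road, Derby", "DE23 6PY", "Derby")))
print(repr(strip_parts("50 Otley Road, Leeds, LS20 8AH, UK", "LS20 8AH", "")))
```

In Hive, the same two-step removal would presumably be a nested regexp_replace() call, one for zip_code and one for city.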

SQL Server Full-Text search - get total field value

Let's say I have the following table "Addresses":
+----+-------------+---------------+------------------+
| ID | CompanyName | Street | City |
+----+-------------+---------------+------------------+
| 1 | Salvador | Hollywood 123 | Paradise City |
| 2 | Zer0 | Avenue 34 | Opportunity City |
+----+-------------+---------------+------------------+
If I make a full-text search like:
SELECT * FROM Addresses WHERE CONTAINS(*, 'Salv')
Is it possible to get back
the name of the column that contains the found value (in this example it would be "CompanyName")
the full value of the column that contains the found value (in this example it would be "Salvador")
I can suggest this:
SELECT
*,
CASE WHEN CONTAINS(CompanyName, 'Salv') THEN 'CompanyName'
WHEN CONTAINS(Street, 'Salv') THEN 'Street'
WHEN CONTAINS(City, 'Salv') THEN 'City'
END As ColumnName,
CASE WHEN CONTAINS(CompanyName, 'Salv') THEN CompanyName
WHEN CONTAINS(Street, 'Salv') THEN Street
WHEN CONTAINS(City, 'Salv') THEN City
END As FullText
FROM Addresses
WHERE CONTAINS(*, 'Salv')
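The logic of the two CASE expressions (report the first column that matches, together with its full value) can be sketched in Python as follows. A case-insensitive substring test is used here purely as a stand-in for CONTAINS, whose full-text matching rules are different; addresses, SEARCH_COLUMNS and first_match are hypothetical names.

```python
# Hypothetical in-memory copy of the Addresses table from the question.
addresses = [
    {"ID": 1, "CompanyName": "Salvador", "Street": "Hollywood 123", "City": "Paradise City"},
    {"ID": 2, "CompanyName": "Zer0", "Street": "Avenue 34", "City": "Opportunity City"},
]
# Columns are checked in this order, mirroring the order of the WHEN branches.
SEARCH_COLUMNS = ["CompanyName", "Street", "City"]

def first_match(row, term):
    """Return (column name, full value) for the first column containing term, else None."""
    for column in SEARCH_COLUMNS:
        if term.lower() in row[column].lower():
            return column, row[column]
    return None

hits = []
for row in addresses:
    match = first_match(row, "Salv")
    if match:
        hits.append((row["ID"],) + match)
print(hits)
```

Like the CASE version, this reports only the first matching column per row; collecting every match would need a list instead of an early return.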