Convert a JSON that is a list of dictionaries into column/row format in PostgreSQL

I have a JSON document that is a list of dictionaries with the following syntax:
[
  {
    "Date_and_Time": "Dec 29, 2017 15:35:37",
    "Componente": "Bar",
    "IP_Origen": "175.11.13.6",
    "IP_Destino": "81.18.119.864",
    "Country": "Brazil",
    "Age": "3"
  },
  {
    "Date_and_Time": "Dec 31, 2017 17:35:37",
    "Componente": "Foo",
    "IP_Origen": "176.11.13.6",
    "IP_Destino": "80.18.119.864",
    "Country": "France",
    "Id": "123456",
    "Car": "Ferrari"
  },
  {
    "Date_and_Time": "Dec 31, 2017 17:35:37",
    "Age": "1",
    "Country": "France",
    "Id": "123456",
    "Car": "Ferrari"
  },
  {
    "Date_and_Time": "Mar 31, 2018 14:35:37",
    "Componente": "Foo",
    "Country": "Germany",
    "Id": "2468",
    "Genre": "Male"
  }
]
The JSON is really big and each dictionary has a different number of key/value fields. What I want to do is create a table in PostgreSQL where each key becomes a column and each value a row. For the example above, I would like a table like this:
Date_and_Time         | Componente | IP_Origen   | IP_Destino    | Country | Id     | Car     | Age | Genre
Dec 29, 2017 15:35:37 | Bar        | 175.11.13.6 | 81.18.119.864 | Brazil  | -      | -       | 3   | -
Dec 31, 2017 17:35:37 | Foo        | 176.11.13.6 | 80.18.119.864 | France  | 123456 | Ferrari | -   | -
Dec 31, 2017 17:35:37 | -          | -           | -             | France  | 123456 | Ferrari | 1   | -
Mar 31, 2018 14:35:37 | Foo        | -           | -             | Germany | 2468   | -       | -   | Male
The only solution I can think of is inserting the values one by one, but that is not efficient at all.

You can use jsonb_to_recordset to create a record set out of your JSON and then use INSERT INTO to insert the records.
insert into your_table
select * from jsonb_to_recordset('<your json>'::jsonb)
as rec("Date_and_Time" timestamp, "Componente" text, "IP_Origen" text) -- specify all columns of the table here
Note that PostgreSQL has no datetime type, so the first column is declared as timestamp, and the column names must be double-quoted so they match the JSON keys exactly.
Sample DBFiddle
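Building on that, a fuller sketch for the sample data, assuming a hypothetical target table named events and keeping every field as text except the timestamp. Keys missing from a given dictionary simply come back as NULL, which matches the "-" cells above:
create table events (
    date_and_time timestamp,
    componente    text,
    ip_origen     text,
    ip_destino    text,
    country       text,
    id            text,
    car           text,
    age           text,
    genre         text
);
insert into events
select *
from jsonb_to_recordset('[
  {"Date_and_Time": "Dec 29, 2017 15:35:37", "Componente": "Bar", "IP_Origen": "175.11.13.6", "IP_Destino": "81.18.119.864", "Country": "Brazil", "Age": "3"},
  {"Date_and_Time": "Mar 31, 2018 14:35:37", "Componente": "Foo", "Country": "Germany", "Id": "2468", "Genre": "Male"}
]'::jsonb)
as rec(
    "Date_and_Time" timestamp,
    "Componente"    text,
    "IP_Origen"     text,
    "IP_Destino"    text,
    "Country"       text,
    "Id"            text,
    "Car"           text,
    "Age"           text,
    "Genre"         text
);
-- The columns of rec are matched to the JSON keys by exact (case-sensitive) name,
-- which is why they are double-quoted; select * returns them in declaration order,
-- matching the column order of events.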

Scala Unpivot Table

I have a table with this structure:
FName   | SName   | Email              | Jan 2021 | Feb 2021 | Mar 2021 | Total 2021
Micheal | Scott   | scarrel#gmail.com  | 4000     | 5000     | 3400     | 50660
Dwight  | Schrute | dschrute#gmail.com | 1200     | 6900     | 1000     | 35000
Kevin   | Malone  | kmalone#gmail.com  | 9000     | 6000     | 18000    | 32000
And I want to transform it so that each monthly/total value becomes its own row.
I tried with the 'stack' method but I couldn't get it to work.
Thanks
You can flatten the monthly/total columns via explode as shown below:
import org.apache.spark.sql.functions._
import spark.implicits._

val df = Seq(
  ("Micheal", "Scott", "scarrel#gmail.com", 4000, 5000, 3400, 50660),
  ("Dwight", "Schrute", "dschrute#gmail.com", 1200, 6900, 1000, 35000),
  ("Kevin", "Malone", "kmalone#gmail.com", 9000, 6000, 18000, 32000)
).toDF("FName", "SName", "Email", "Jan 2021", "Feb 2021", "Mar 2021", "Total 2021")

val moYrCols = Array("Jan 2021", "Feb 2021", "Mar 2021", "Total 2021") // (**)
val otherCols = df.columns diff moYrCols

// One struct per month/total column, splitting the column name into Month and Year
val structCols = moYrCols.map { c =>
  val moYr = split(lit(c), "\\s+")
  struct(moYr(1).as("Year"), moYr(0).as("Month"), col(c).as("Value"))
}

df.
  withColumn("flattened", explode(array(structCols: _*))).
  select(otherCols.map(col) :+ $"flattened.*": _*).
  show
/*
+-------+-------+------------------+----+-----+-----+
| FName| SName| Email|Year|Month|Value|
+-------+-------+------------------+----+-----+-----+
|Micheal| Scott| scarrel#gmail.com|2021| Jan| 4000|
|Micheal| Scott| scarrel#gmail.com|2021| Feb| 5000|
|Micheal| Scott| scarrel#gmail.com|2021| Mar| 3400|
|Micheal| Scott| scarrel#gmail.com|2021|Total|50660|
| Dwight|Schrute|dschrute#gmail.com|2021| Jan| 1200|
| Dwight|Schrute|dschrute#gmail.com|2021| Feb| 6900|
| Dwight|Schrute|dschrute#gmail.com|2021| Mar| 1000|
| Dwight|Schrute|dschrute#gmail.com|2021|Total|35000|
| Kevin| Malone| kmalone#gmail.com|2021| Jan| 9000|
| Kevin| Malone| kmalone#gmail.com|2021| Feb| 6000|
| Kevin| Malone| kmalone#gmail.com|2021| Mar|18000|
| Kevin| Malone| kmalone#gmail.com|2021|Total|32000|
+-------+-------+------------------+----+-----+-----+
*/
(**) If there are many such columns, derive the list by matching the column names against a regex instead; for example:
val moYrCols = df.columns.filter(_.matches("[A-Za-z]+\\s+\\d{4}"))
Another approach: pack every column into a struct, then explode an array of (value, date_year) structs built from the month columns:
import org.apache.spark.sql.functions._
import spark.implicits._

val data = Seq(
  ("Micheal", "Scott", "scarrel#gmail.com", 4000, 5000, 3400, 50660),
  ("Dwight", "Schrute", "dschrute#gmail.com", 1200, 6900, 1000, 35000),
  ("Kevin", "Malone", "kmalone#gmail.com", 9000, 6000, 18000, 32000))
val columns = Seq("FName", "SName", "Email", "Jan 2021", "Feb 2021", "Mar 2021", "Total 2021")
val newColumns = Array("FName", "SName", "Email", "Total 2021")
val df = spark.createDataFrame(data).toDF(columns: _*)

df
  .select(
    struct(
      (for { column <- df.columns } yield col(column)).toSeq: _*
    ).as("mystruct")) // create your data set with a column as a struct
  .select(
    $"mystruct.Fname", // refer to a sub-element of the struct with the '.' operator
    $"mystruct.sname",
    $"mystruct.Email",
    explode( // make rows for every entry in the array
      array(
        (for { column <- df.columns if !(newColumns contains column) } // filter out the columns we already selected
         yield // for each element yield the following expression (similar to map)
           struct(
             col(s"mystruct.$column").as("value"), // create the value column
             lit(column).as("date_year")) // create a date column
        ).toSeq: _*) // shorthand to pass a Scala array into the varargs of the array function
    )
  )
  .select(
    col("*"), // just being lazy instead of typing
    col("col.*") // create columns from the exploded struct; separating the year/month should be easy from here
  ).drop($"col")
  .show(false)
+--------------+--------------+------------------+-----+---------+
|mystruct.Fname|mystruct.sname|mystruct.Email |value|date_year|
+--------------+--------------+------------------+-----+---------+
|Micheal |Scott |scarrel#gmail.com |4000 |Jan 2021 |
|Micheal |Scott |scarrel#gmail.com |5000 |Feb 2021 |
|Micheal |Scott |scarrel#gmail.com |3400 |Mar 2021 |
|Dwight |Schrute |dschrute#gmail.com|1200 |Jan 2021 |
|Dwight |Schrute |dschrute#gmail.com|6900 |Feb 2021 |
|Dwight |Schrute |dschrute#gmail.com|1000 |Mar 2021 |
|Kevin |Malone |kmalone#gmail.com |9000 |Jan 2021 |
|Kevin |Malone |kmalone#gmail.com |6000 |Feb 2021 |
|Kevin |Malone |kmalone#gmail.com |18000|Mar 2021 |
+--------------+--------------+------------------+-----+---------+

Is there a way to "flatten" KQL results into summary columns?

Given the following dataset, is there a simple/efficient way to produce a summary table like the following using KQL, ideally without knowing the actual colours to be used in advance (i.e. column names are generated from the data values encountered)?
datatable ( name: string, colour: string )[
"alice", "blue",
"bob", "green",
"bob", "blue",
"alice", "red",
"charlie", "red",
"alice", "blue",
"charlie", "red",
"bob", "green"
]
+---------+------+-------+-----+
| name | blue | green | red |
+---------+------+-------+-----+
| alice | 2 | 0 | 1 |
| bob | 1 | 2 | 0 |
| charlie | 0 | 0 | 2 |
+---------+------+-------+-----+
Use the pivot plugin:
datatable ( name: string, colour: string )[
"alice", "blue",
"bob", "green",
"bob", "blue",
"alice", "red",
"charlie", "red",
"alice", "blue",
"charlie", "red",
"bob", "green"
]
| evaluate pivot(colour, count(), name)
+---------+------+-------+-----+
| name    | blue | green | red |
+---------+------+-------+-----+
| alice   | 2    | 0     | 1   |
| bob     | 1    | 2     | 0   |
| charlie | 0    | 0     | 2   |
+---------+------+-------+-----+
Fiddle

Ability to get the "index" (or ordinal value) for each array entry in BigQuery?

In a data column in BigQuery, I have a JSON object with the structure:
{
  "sections": [
    {
      "secName": "Flintstones",
      "fields": [
        { "fldName": "Fred", "age": 55 },
        { "fldName": "Barney", "age": 44 }
      ]
    },
    {
      "secName": "Jetsons",
      "fields": [
        { "fldName": "George", "age": 33 },
        { "fldName": "Elroy", "age": 22 }
      ]
    }
  ]
}
I'm hoping to use unnest() and json_extract() to get results that resemble:
id | section_num | section_name | field_num | field_name | field_age
----+--------------+--------------+-----------+------------+-----------
1 | 1 | Flintstones | 1 | Fred | 55
1 | 1 | Flintstones | 2 | Barney | 44
1 | 2 | Jetsons | 1 | George | 33
1 | 2 | Jetsons | 2 | Elroy | 22
So far, I have the query:
SELECT id,
json_extract_scalar(curSection, '$.secName') as section_name,
json_extract_scalar(curField, '$.fldName') as field_name,
json_extract_scalar(curField, '$.age') as field_age
FROM `tick8s.test2` AS tbl
LEFT JOIN unnest(json_extract_array(tbl.data, '$.sections')) as curSection
LEFT JOIN unnest(json_extract_array(curSection, '$.fields')) as curField
that yields:
id | section_name | field_name | field_age
----+--------------+------------+-----------
1 | Flintstones | Fred | 55
1 | Flintstones | Barney | 44
1 | Jetsons | George | 33
1 | Jetsons | Elroy | 22
QUESTION: How, if it's possible, can I get the section_num and field_num ordinal positions from their array index values?
(If you are looking to duplicate my results, I have a table named test2 with 2 columns:
id - INTEGER, REQUIRED
data - STRING, NULLABLE
and I insert the data with:
insert into tick8s.test2 values (1,
'{"sections": [' ||
'{' ||
'"secName": "Flintstones",' ||
'"fields": [' ||
'{ "fldName": "Fred", "age": 55 },' ||
'{ "fldName": "Barney", "age": 44 }' ||
']' ||
'},' ||
'{' ||
'"secName": "Jetsons",' ||
'"fields": [' ||
'{ "fldName": "George", "age": 33 },' ||
'{ "fldName": "Elroy", "age": 22 }' ||
']' ||
'}]}'
);
)
Do you just want WITH OFFSET?
SELECT id,
       json_extract_scalar(curSection, '$.secName') as section_name,
       n_s,
       json_extract_scalar(curField, '$.fldName') as field_name,
       json_extract_scalar(curField, '$.age') as field_age,
       n_c
FROM `tick8s.test2` tbl
LEFT JOIN unnest(json_extract_array(tbl.data, '$.sections')) curSection WITH OFFSET n_s
LEFT JOIN unnest(json_extract_array(curSection, '$.fields')) curField WITH OFFSET n_c;
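WITH OFFSET is zero-based, so to get the 1-based section_num and field_num shown in the desired output you can simply add 1 to each offset; a minimal variation:
SELECT id,
       n_s + 1 AS section_num, -- offsets start at 0, so shift to 1-based
       json_extract_scalar(curSection, '$.secName') AS section_name,
       n_c + 1 AS field_num,
       json_extract_scalar(curField, '$.fldName') AS field_name,
       json_extract_scalar(curField, '$.age') AS field_age
FROM `tick8s.test2` tbl
LEFT JOIN unnest(json_extract_array(tbl.data, '$.sections')) curSection WITH OFFSET n_s
LEFT JOIN unnest(json_extract_array(curSection, '$.fields')) curField WITH OFFSET n_c;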

Create nested json blobs in PostgreSQL

I'm trying to create a nested json from a table like this:
+----------+---------+------------------------------+
| unixtime | assetid | data |
+----------+---------+------------------------------+
| 10 | 80 | {"inflow": 10, "outflow": 2} |
| 20 | 90 | {"inflow": 10, "outflow": 2} |
| 10 | 80 | {"inflow": 10, "outflow": 2} |
| 20 | 90 | {"inflow": 10, "outflow": 2} |
+----------+---------+------------------------------+
and get something like this:
{
"10": {
"80": {"inflow": 10, "outflow": 2},
"90": {"inflow": 10, "outflow": 2}
},
"20": {
"80": {"inflow": 10, "outflow": 2},
"90": {"inflow": 10, "outflow": 2}
}
}
I've tried recursively converting the JSON data to text, aggregating with array_agg, and then converting the result back to JSON using json_object, but that eventually screwed up the JSON structure with escape slashes ( \ ).
Any help would be appreciated
Here's the link to the data:
https://dbfiddle.uk/?rdbms=postgres_11&fiddle=26734e87d4b9aea4ceded4e414acec4c
Thank you.
You can use the json_object_agg() function:
....
, m as (
    select unixdatetime,
           assetid,
           json_object(array_agg(description), array_agg(value::text)) as value
    from input_data
    group by unixdatetime, assetid
), j as (
    select json_object_agg("assetid", "value") as js,
           m."unixdatetime"
    from m
    group by "unixdatetime"
)
select json_object_agg("unixdatetime", js)
from j
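If the data column is already json or jsonb, as in the table shown in the question, a more direct sketch (assuming a table t with columns unixtime, assetid and data jsonb) avoids the text round-trip, and with it the escape slashes:
select jsonb_object_agg(unixtime, assets) as result
from (
    select unixtime,
           jsonb_object_agg(assetid, data) as assets -- one object per unixtime: {assetid: data, ...}
    from t
    group by unixtime
) per_time;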

How to create nested JSON return with aggregate function and dynamic key values using `jsonb_build_object`

This is what an example of the table looks like.
+---------------------+------------------+------------------+
| country_code | region | num_launches |
+---------------------+------------------+------------------+
| 'CA' | 'Ontario' | 5 |
+---------------------+------------------+------------------+
| 'CA' | 'Quebec' | 9 |
+---------------------+------------------+------------------+
| 'DE' | 'Bavaria' | 15 |
+---------------------+------------------+------------------+
| 'DE' | 'Saarland' | 12 |
+---------------------+------------------+------------------+
| 'DE' | 'Berlin' | 23 |
+---------------------+------------------+------------------+
| 'JP' | 'Tokyo' | 19 |
+---------------------+------------------+------------------+
I am able to write a query that returns each country_code with all regions nested within, but I am unable to get exactly what I am looking for.
My intended return looks like:
[
  { "CA": [
      { "Ontario": 5 },
      { "Quebec": 9 }
    ]
  },
  { "DE": [
      { "Bavaria": 15 },
      { "Saarland": 12 },
      { "Berlin": 23 }
    ]
  },
  { "JP": [
      { "Tokyo": 19 }
    ]
  }
]
How could this be calculated if num_launches were not available?
+---------------------+------------------+
| country_code | region |
+---------------------+------------------+
| 'CA' | 'Ontario' |
+---------------------+------------------+
| 'CA' | 'Ontario' |
+---------------------+------------------+
| 'CA' | 'Ontario' |
+---------------------+------------------+
| 'CA' | 'Quebec' |
+---------------------+------------------+
| 'CA' | 'Quebec' |
+---------------------+------------------+
| 'DE' | 'Bavaria' |
+---------------------+------------------+
| 'DE' | 'Bavaria' |
+---------------------+------------------+
| 'DE' | 'Bavaria' |
+---------------------+------------------+
| 'DE' | 'Bavaria' |
+---------------------+------------------+
| 'DE' | 'Saarland' |
+---------------------+------------------+
| 'DE' | 'Berlin' |
+---------------------+------------------+
| 'DE' | 'Berlin' |
+---------------------+------------------+
| 'JP' | 'Tokyo' |
+---------------------+------------------+
Expected Return
[
  { "CA": [
      { "Ontario": 3 },
      { "Quebec": 2 }
    ]
  },
  { "DE": [
      { "Bavaria": 4 },
      { "Saarland": 1 },
      { "Berlin": 2 }
    ]
  },
  { "JP": [
      { "Tokyo": 1 }
    ]
  }
]
Thanks
You can try using json_agg with the json_build_object function in a subquery to get each country's array, and then do it again in the main query.
Schema (PostgreSQL v9.6)
CREATE TABLE T(
    country_code varchar(50),
    region varchar(50),
    num_launches int
);
insert into t values ('CA','Ontario',5);
insert into t values ('CA','Quebec',9);
insert into t values ('DE','Bavaria',15);
insert into t values ('DE','Saarland',12);
insert into t values ('DE','Berlin',23);
insert into t values ('JP','Tokyo',19);
Query #1
select json_agg(json_build_object(country_code, arr)) results
from (
    select country_code,
           json_agg(json_build_object(region, num_launches)) arr
    from T
    group by country_code
) t1;
results
[{"CA":[{"Ontario":5},{"Quebec":9}]},{"DE":[{"Bavaria":15},{"Saarland":12},{"Berlin":23}]},{"JP":[{"Tokyo":19}]}]
View on DB Fiddle
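For the second part of the question, where num_launches is not stored, a count(*) per (country_code, region) in one more subquery can stand in for it; a sketch along the same lines:
select json_agg(json_build_object(country_code, arr)) as results
from (
    select country_code,
           json_agg(json_build_object(region, cnt)) as arr
    from (
        select country_code, region, count(*) as cnt -- derive the launch count per region
        from T
        group by country_code, region
    ) counted
    group by country_code
) t1;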