Convert a JSON that is a list of dictionaries into column/row format in PostgreSQL

I have a JSON document that is a list of dictionaries with the following syntax:
[
  {
    "Date_and_Time": "Dec 29, 2017 15:35:37",
    "Componente": "Bar",
    "IP_Origen": "175.11.13.6",
    "IP_Destino": "81.18.119.864",
    "Country": "Brazil",
    "Age": "3"
  },
  {
    "Date_and_Time": "Dec 31, 2017 17:35:37",
    "Componente": "Foo",
    "IP_Origen": "176.11.13.6",
    "IP_Destino": "80.18.119.864",
    "Country": "France",
    "Id": "123456",
    "Car": "Ferrari"
  },
  {
    "Date_and_Time": "Dec 31, 2017 17:35:37",
    "Age": "1",
    "Country": "France",
    "Id": "123456",
    "Car": "Ferrari"
  },
  {
    "Date_and_Time": "Mar 31, 2018 14:35:37",
    "Componente": "Foo",
    "Country": "Germany",
    "Id": "2468",
    "Genre": "Male"
  }
]
The JSON is really big and each dictionary has a different number of key/value fields. What I want to do is create a table in PostgreSQL where each key becomes a column and each value a row. For the example above, I would like a table like this:
Date_and_Time         | Componente | IP_Origen   | IP_Destino    | Country | Id     | Car     | Age | Genre
Dec 29, 2017 15:35:37 | Bar        | 175.11.13.6 | 81.18.119.864 | Brazil  | -      | -       | 3   | -
Dec 31, 2017 17:35:37 | Foo        | 176.11.13.6 | 80.18.119.864 | France  | 123456 | Ferrari | -   | -
Dec 31, 2017 17:35:37 | -          | -           | -             | France  | 123456 | Ferrari | 1   | -
Mar 31, 2018 14:35:37 | Foo        | -           | -             | Germany | 2468   | -       | -   | Male
The only solution I can think of is inserting the values one by one, but that is not efficient at all.

You can use jsonb_to_recordset to create a record set out of your JSON and then use INSERT INTO to insert the records.
insert into your_table
select * from jsonb_to_recordset('<your json>'::jsonb)
as rec("Date_and_Time" timestamp, "Componente" text, "IP_Origen" text) -- specify all columns of the table here
Note that PostgreSQL has no datetime type, so the first column is declared as timestamp, and the column names must be double-quoted so they match the JSON keys exactly.
Sample DBFiddle
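Building on that, a fuller sketch for the sample data, assuming a hypothetical target table named events and keeping every field as text except the timestamp. Keys missing from a given dictionary simply come back as NULL, which matches the "-" cells above:
create table events (
    date_and_time timestamp,
    componente    text,
    ip_origen     text,
    ip_destino    text,
    country       text,
    id            text,
    car           text,
    age           text,
    genre         text
);
insert into events
select *
from jsonb_to_recordset('[
  {"Date_and_Time": "Dec 29, 2017 15:35:37", "Componente": "Bar", "IP_Origen": "175.11.13.6", "IP_Destino": "81.18.119.864", "Country": "Brazil", "Age": "3"},
  {"Date_and_Time": "Mar 31, 2018 14:35:37", "Componente": "Foo", "Country": "Germany", "Id": "2468", "Genre": "Male"}
]'::jsonb)
as rec(
    "Date_and_Time" timestamp,
    "Componente"    text,
    "IP_Origen"     text,
    "IP_Destino"    text,
    "Country"       text,
    "Id"            text,
    "Car"           text,
    "Age"           text,
    "Genre"         text
);
-- The columns of rec are matched to the JSON keys by exact (case-sensitive) name,
-- which is why they are double-quoted; select * returns them in declaration order,
-- matching the column order of events.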

Scala Unpivot Table

I have a table with this structure:
FName   | SName   | Email              | Jan 2021 | Feb 2021 | Mar 2021 | Total 2021
Micheal | Scott   | scarrel#gmail.com  | 4000     | 5000     | 3400     | 50660
Dwight  | Schrute | dschrute#gmail.com | 1200     | 6900     | 1000     | 35000
Kevin   | Malone  | kmalone#gmail.com  | 9000     | 6000     | 18000    | 32000
And I want to transform it so that each monthly/total value becomes its own row.
I tried with the 'stack' method but I couldn't get it to work.
Thanks
You can flatten the monthly/total columns via explode as shown below:
import org.apache.spark.sql.functions._
import spark.implicits._

val df = Seq(
  ("Micheal", "Scott", "scarrel#gmail.com", 4000, 5000, 3400, 50660),
  ("Dwight", "Schrute", "dschrute#gmail.com", 1200, 6900, 1000, 35000),
  ("Kevin", "Malone", "kmalone#gmail.com", 9000, 6000, 18000, 32000)
).toDF("FName", "SName", "Email", "Jan 2021", "Feb 2021", "Mar 2021", "Total 2021")

val moYrCols = Array("Jan 2021", "Feb 2021", "Mar 2021", "Total 2021") // (**)
val otherCols = df.columns diff moYrCols

// One struct per month/total column, splitting the column name into Month and Year
val structCols = moYrCols.map { c =>
  val moYr = split(lit(c), "\\s+")
  struct(moYr(1).as("Year"), moYr(0).as("Month"), col(c).as("Value"))
}

df.
  withColumn("flattened", explode(array(structCols: _*))).
  select(otherCols.map(col) :+ $"flattened.*": _*).
  show
/*
+-------+-------+------------------+----+-----+-----+
| FName| SName| Email|Year|Month|Value|
+-------+-------+------------------+----+-----+-----+
|Micheal| Scott| scarrel#gmail.com|2021| Jan| 4000|
|Micheal| Scott| scarrel#gmail.com|2021| Feb| 5000|
|Micheal| Scott| scarrel#gmail.com|2021| Mar| 3400|
|Micheal| Scott| scarrel#gmail.com|2021|Total|50660|
| Dwight|Schrute|dschrute#gmail.com|2021| Jan| 1200|
| Dwight|Schrute|dschrute#gmail.com|2021| Feb| 6900|
| Dwight|Schrute|dschrute#gmail.com|2021| Mar| 1000|
| Dwight|Schrute|dschrute#gmail.com|2021|Total|35000|
| Kevin| Malone| kmalone#gmail.com|2021| Jan| 9000|
| Kevin| Malone| kmalone#gmail.com|2021| Feb| 6000|
| Kevin| Malone| kmalone#gmail.com|2021| Mar|18000|
| Kevin| Malone| kmalone#gmail.com|2021|Total|32000|
+-------+-------+------------------+----+-----+-----+
*/
(**) If there are many such columns, derive the list by matching the column names against a regex instead; for example:
val moYrCols = df.columns.filter(_.matches("[A-Za-z]+\\s+\\d{4}"))
Another approach: pack every column into a struct, then explode an array of (value, date_year) structs built from the month columns:
import org.apache.spark.sql.functions._
import spark.implicits._

val data = Seq(
  ("Micheal", "Scott", "scarrel#gmail.com", 4000, 5000, 3400, 50660),
  ("Dwight", "Schrute", "dschrute#gmail.com", 1200, 6900, 1000, 35000),
  ("Kevin", "Malone", "kmalone#gmail.com", 9000, 6000, 18000, 32000))
val columns = Seq("FName", "SName", "Email", "Jan 2021", "Feb 2021", "Mar 2021", "Total 2021")
val newColumns = Array("FName", "SName", "Email", "Total 2021")
val df = spark.createDataFrame(data).toDF(columns: _*)

df
  .select(
    struct(
      (for { column <- df.columns } yield col(column)).toSeq: _*
    ).as("mystruct")) // create your data set with a column as a struct
  .select(
    $"mystruct.Fname", // refer to a sub-element of the struct with the '.' operator
    $"mystruct.sname",
    $"mystruct.Email",
    explode( // make rows for every entry in the array
      array(
        (for { column <- df.columns if !(newColumns contains column) } // filter out the columns we already selected
         yield // for each element yield the following expression (similar to map)
           struct(
             col(s"mystruct.$column").as("value"), // create the value column
             lit(column).as("date_year")) // create a date column
        ).toSeq: _*) // shorthand to pass a Scala array into the varargs of the array function
    )
  )
  .select(
    col("*"), // just being lazy instead of typing
    col("col.*") // create columns from the exploded struct; separating the year/month should be easy from here
  ).drop($"col")
  .show(false)
+--------------+--------------+------------------+-----+---------+
|mystruct.Fname|mystruct.sname|mystruct.Email |value|date_year|
+--------------+--------------+------------------+-----+---------+
|Micheal |Scott |scarrel#gmail.com |4000 |Jan 2021 |
|Micheal |Scott |scarrel#gmail.com |5000 |Feb 2021 |
|Micheal |Scott |scarrel#gmail.com |3400 |Mar 2021 |
|Dwight |Schrute |dschrute#gmail.com|1200 |Jan 2021 |
|Dwight |Schrute |dschrute#gmail.com|6900 |Feb 2021 |
|Dwight |Schrute |dschrute#gmail.com|1000 |Mar 2021 |
|Kevin |Malone |kmalone#gmail.com |9000 |Jan 2021 |
|Kevin |Malone |kmalone#gmail.com |6000 |Feb 2021 |
|Kevin |Malone |kmalone#gmail.com |18000|Mar 2021 |
+--------------+--------------+------------------+-----+---------+

Is there a way to "flatten" KQL results into summary columns?

Given the following dataset, is there a simple/efficient way to produce a summary table like the following using KQL, ideally without knowing the actual colours to be used in advance (i.e. column names are generated from the data values encountered)?
datatable ( name: string, colour: string )[
"alice", "blue",
"bob", "green",
"bob", "blue",
"alice", "red",
"charlie", "red",
"alice", "blue",
"charlie", "red",
"bob", "green"
]
+---------+------+-------+-----+
| name | blue | green | red |
+---------+------+-------+-----+
| alice | 2 | 0 | 1 |
| bob | 1 | 2 | 0 |
| charlie | 0 | 0 | 2 |
+---------+------+-------+-----+
Use the pivot plugin:
datatable ( name: string, colour: string )[
"alice", "blue",
"bob", "green",
"bob", "blue",
"alice", "red",
"charlie", "red",
"alice", "blue",
"charlie", "red",
"bob", "green"
]
| evaluate pivot(colour, count(), name)
+---------+------+-------+-----+
| name    | blue | green | red |
+---------+------+-------+-----+
| alice   | 2    | 0     | 1   |
| bob     | 1    | 2     | 0   |
| charlie | 0    | 0     | 2   |
+---------+------+-------+-----+
Fiddle

Ability to get the "index" (or ordinal value) for each array entry in BigQuery?

In a data column in BigQuery, I have a JSON object with the structure:
{
  "sections": [
    {
      "secName": "Flintstones",
      "fields": [
        { "fldName": "Fred", "age": 55 },
        { "fldName": "Barney", "age": 44 }
      ]
    },
    {
      "secName": "Jetsons",
      "fields": [
        { "fldName": "George", "age": 33 },
        { "fldName": "Elroy", "age": 22 }
      ]
    }
  ]
}
I'm hoping to use unnest() and json_extract() to get results that resemble:
id | section_num | section_name | field_num | field_name | field_age
----+--------------+--------------+-----------+------------+-----------
1 | 1 | Flintstones | 1 | Fred | 55
1 | 1 | Flintstones | 2 | Barney | 44
1 | 2 | Jetsons | 1 | George | 33
1 | 2 | Jetsons | 2 | Elroy | 22
So far, I have the query:
SELECT id,
json_extract_scalar(curSection, '$.secName') as section_name,
json_extract_scalar(curField, '$.fldName') as field_name,
json_extract_scalar(curField, '$.age') as field_age
FROM `tick8s.test2` AS tbl
LEFT JOIN unnest(json_extract_array(tbl.data, '$.sections')) as curSection
LEFT JOIN unnest(json_extract_array(curSection, '$.fields')) as curField
that yields:
id | section_name | field_name | field_age
----+--------------+------------+-----------
1 | Flintstones | Fred | 55
1 | Flintstones | Barney | 44
1 | Jetsons | George | 33
1 | Jetsons | Elroy | 22
QUESTION: How, if it's possible, can I get the section_num and field_num ordinal positions from their array index values?
(If you are looking to duplicate my results, I have a table named test2 with 2 columns:
id - INTEGER, REQUIRED
data - STRING, NULLABLE
and I insert the data with:
insert into tick8s.test2 values (1,
'{"sections": [' ||
'{' ||
'"secName": "Flintstones",' ||
'"fields": [' ||
'{ "fldName": "Fred", "age": 55 },' ||
'{ "fldName": "Barney", "age": 44 }' ||
']' ||
'},' ||
'{' ||
'"secName": "Jetsons",' ||
'"fields": [' ||
'{ "fldName": "George", "age": 33 },' ||
'{ "fldName": "Elroy", "age": 22 }' ||
']' ||
'}]}'
);
)
Do you just want WITH OFFSET?
SELECT id,
       json_extract_scalar(curSection, '$.secName') as section_name,
       n_s,
       json_extract_scalar(curField, '$.fldName') as field_name,
       json_extract_scalar(curField, '$.age') as field_age,
       n_c
FROM `tick8s.test2` tbl
LEFT JOIN unnest(json_extract_array(tbl.data, '$.sections')) curSection WITH OFFSET n_s
LEFT JOIN unnest(json_extract_array(curSection, '$.fields')) curField WITH OFFSET n_c;
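WITH OFFSET is zero-based, so to get the 1-based section_num and field_num shown in the desired output you can simply add 1 to each offset; a minimal variation:
SELECT id,
       n_s + 1 AS section_num, -- offsets start at 0, so shift to 1-based
       json_extract_scalar(curSection, '$.secName') AS section_name,
       n_c + 1 AS field_num,
       json_extract_scalar(curField, '$.fldName') AS field_name,
       json_extract_scalar(curField, '$.age') AS field_age
FROM `tick8s.test2` tbl
LEFT JOIN unnest(json_extract_array(tbl.data, '$.sections')) curSection WITH OFFSET n_s
LEFT JOIN unnest(json_extract_array(curSection, '$.fields')) curField WITH OFFSET n_c;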

Create nested json blobs in PostgreSQL

I'm trying to create a nested json from a table like this:
+----------+---------+------------------------------+
| unixtime | assetid | data |
+----------+---------+------------------------------+
| 10 | 80 | {"inflow": 10, "outflow": 2} |
| 20 | 90 | {"inflow": 10, "outflow": 2} |
| 10 | 80 | {"inflow": 10, "outflow": 2} |
| 20 | 90 | {"inflow": 10, "outflow": 2} |
+----------+---------+------------------------------+
and get something like this:
{
"10": {
"80": {"inflow": 10, "outflow": 2},
"90": {"inflow": 10, "outflow": 2}
},
"20": {
"80": {"inflow": 10, "outflow": 2},
"90": {"inflow": 10, "outflow": 2}
}
}
I've tried recursively converting the JSON data to text, aggregating with array_agg, and then converting the result back to JSON using json_object, but that eventually screwed up the JSON structure with escape slashes ( \ ).
Any help would be appreciated
Here's the link to the data:
https://dbfiddle.uk/?rdbms=postgres_11&fiddle=26734e87d4b9aea4ceded4e414acec4c
Thank you.
You can use the json_object_agg() function:
....
, m as (
    select unixdatetime,
           assetid,
           json_object(array_agg(description), array_agg(value::text)) as value
    from input_data
    group by unixdatetime, assetid
), j as (
    select json_object_agg("assetid", "value") as js,
           m."unixdatetime"
    from m
    group by "unixdatetime"
)
select json_object_agg("unixdatetime", js)
from j
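If the data column is already json or jsonb, as in the table shown in the question, a more direct sketch (assuming a table t with columns unixtime, assetid and data jsonb) avoids the text round-trip, and with it the escape slashes:
select jsonb_object_agg(unixtime, assets) as result
from (
    select unixtime,
           jsonb_object_agg(assetid, data) as assets -- one object per unixtime: {assetid: data, ...}
    from t
    group by unixtime
) per_time;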

How to create nested JSON return with aggregate function and dynamic key values using `jsonb_build_object`

This is what an example of the table looks like.
+---------------------+------------------+------------------+
| country_code | region | num_launches |
+---------------------+------------------+------------------+
| 'CA' | 'Ontario' | 5 |
+---------------------+------------------+------------------+
| 'CA' | 'Quebec' | 9 |
+---------------------+------------------+------------------+
| 'DE' | 'Bavaria' | 15 |
+---------------------+------------------+------------------+
| 'DE' | 'Saarland' | 12 |
+---------------------+------------------+------------------+
| 'DE' | 'Berlin' | 23 |
+---------------------+------------------+------------------+
| 'JP' | 'Tokyo' | 19 |
+---------------------+------------------+------------------+
I am able to write a query that returns each country_code with all regions nested within, but I am unable to get exactly what I am looking for.
My intended return looks like:
[
  { "CA": [
      { "Ontario": 5 },
      { "Quebec": 9 }
    ]
  },
  { "DE": [
      { "Bavaria": 15 },
      { "Saarland": 12 },
      { "Berlin": 23 }
    ]
  },
  { "JP": [
      { "Tokyo": 19 }
    ]
  }
]
How could this be calculated if num_launches were not available?
+---------------------+------------------+
| country_code | region |
+---------------------+------------------+
| 'CA' | 'Ontario' |
+---------------------+------------------+
| 'CA' | 'Ontario' |
+---------------------+------------------+
| 'CA' | 'Ontario' |
+---------------------+------------------+
| 'CA' | 'Quebec' |
+---------------------+------------------+
| 'CA' | 'Quebec' |
+---------------------+------------------+
| 'DE' | 'Bavaria' |
+---------------------+------------------+
| 'DE' | 'Bavaria' |
+---------------------+------------------+
| 'DE' | 'Bavaria' |
+---------------------+------------------+
| 'DE' | 'Bavaria' |
+---------------------+------------------+
| 'DE' | 'Saarland' |
+---------------------+------------------+
| 'DE' | 'Berlin' |
+---------------------+------------------+
| 'DE' | 'Berlin' |
+---------------------+------------------+
| 'JP' | 'Tokyo' |
+---------------------+------------------+
Expected Return
[
  { "CA": [
      { "Ontario": 3 },
      { "Quebec": 2 }
    ]
  },
  { "DE": [
      { "Bavaria": 4 },
      { "Saarland": 1 },
      { "Berlin": 2 }
    ]
  },
  { "JP": [
      { "Tokyo": 1 }
    ]
  }
]
Thanks
You can try using json_agg with the json_build_object function in a subquery to get each country's array, and then do it again in the main query.
Schema (PostgreSQL v9.6)
CREATE TABLE T(
    country_code varchar(50),
    region varchar(50),
    num_launches int
);
insert into t values ('CA','Ontario',5);
insert into t values ('CA','Quebec',9);
insert into t values ('DE','Bavaria',15);
insert into t values ('DE','Saarland',12);
insert into t values ('DE','Berlin',23);
insert into t values ('JP','Tokyo',19);
Query #1
select json_agg(json_build_object(country_code, arr)) results
from (
    select country_code,
           json_agg(json_build_object(region, num_launches)) arr
    from T
    group by country_code
) t1;
results
[{"CA":[{"Ontario":5},{"Quebec":9}]},{"DE":[{"Bavaria":15},{"Saarland":12},{"Berlin":23}]},{"JP":[{"Tokyo":19}]}]
View on DB Fiddle
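For the second part of the question, where num_launches is not stored, a count(*) per (country_code, region) in one more subquery can stand in for it; a sketch along the same lines:
select json_agg(json_build_object(country_code, arr)) as results
from (
    select country_code,
           json_agg(json_build_object(region, cnt)) as arr
    from (
        select country_code, region, count(*) as cnt -- derive the launch count per region
        from T
        group by country_code, region
    ) counted
    group by country_code
) t1;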