How to get data from JSON into multiple columns - SQL

I want to choose which keys to extract and show each one in its own column. I want a result like this query gives, but I don't want to type a line like this for every single key:
Select *,
metrics::json ->> 'spend',
metrics::json ->> 'impressions',
metrics::json ->> 'clicks'
from t1
The query below returns null for every column, because the declared column names don't match the JSON keys. How do I extract only chosen keys, e.g. 'reach' and 'clicks', into columns, rather than everything in the JSON?
select *
from json_to_record('{"reach": 240, "spend": 3.34, "clicks": 10, "frequency": 1.0375}')
as x(a int, b text, d text, e text)
I referred to this Stack Overflow question.
My DEMO
EDIT: My main question is: how do I choose which keys to extract, without extracting all of them as in the 2nd query? The data has many rows, and each row has JSON; can I do that with json_to_record?

If you only want to select partial data from the JSON object, let's say 2 out of 4 keys, you can do so easily by omitting the rest of the keys from the anonymous table declaration. You do need to use the JSON keys as the column names.
select *
from json_to_record('{"reach": 240, "spend": 3.34, "clicks": 10, "frequency": 1.0375}')
as x(reach int, clicks int)
This gets you the columns you need with very little typing.
https://dbfiddle.uk/?rdbms=postgres_11&fiddle=dd39d912f6e696a8ace3670acf606959

As a_horse_with_no_name said, keep the record field names the same as the keys in the JSON, and use aliases in the select list if needed.
select
reach as a, spend as b, clicks as c, frequency as d
from json_to_record('{"reach": 240, "spend": 3.34, "clicks": 10, "frequency": 1.0375}')
as x(reach integer, spend numeric, clicks integer, frequency numeric);
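To address the EDIT: the same trick works when each row of a table carries its own JSON. A minimal sketch, assuming a hypothetical table t1 with a json (or text) column named metrics, joining each row laterally to json_to_record (PostgreSQL 9.4+):
-- expand each row's JSON once; only the declared keys come back as columns
select t.*, m.reach, m.clicks
from t1 t
cross join lateral json_to_record(t.metrics::json) as m(reach int, clicks int);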

Related

BigQuery Standard SQL - store query or UDF in table

Is it possible to store data in a table that can then be converted into either a SQL query or a UDF - like a javascript eval()?
The use case is that I have a list of clients where earnings are calculated in significantly different ways for each one, and this can change over time. So I would like a lookup table that can be updated with a formula for calculating this figure, rather than having to write, and then maintain, hundreds of queries (one for each client).
I have tried to think if there is a way of having a standard formula that would be flexible enough, but I really don't think it's possible unfortunately.
Sure! BigQuery can define and use JS UDFs. The good news is that eval() works as expected:
CREATE TEMP FUNCTION calculate(x FLOAT64, y FLOAT64, formula STRING)
RETURNS FLOAT64
LANGUAGE js AS """
return eval(formula);
""";
WITH table AS (
SELECT 1 AS x, 5 as y, 'x+y' formula
UNION ALL SELECT 2, 10, 'x-y'
UNION ALL SELECT 3, 15, 'x*y'
)
SELECT x, y, formula, calculate(x, y, formula) result
FROM table;
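For those three rows the UDF simply evaluates each formula with that row's x and y, so the query should return:
x | y | formula | result
1 | 5 | x+y | 6.0
2 | 10 | x-y | -8.0
3 | 15 | x*y | 45.0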

How can I aggregate Jsonb columns in postgres using another column type

I have data in a Postgres table, where data is a jsonb column. I would like to get a result like
[
{field_type: "Design", briefings_count: 1, meetings_count: 13},
{field_type: "Engineering", briefings_count: 1, meetings_count: 13},
{field_type: "Data Science", briefings_count: 0, meetings_count: 3}
]
Explanation
Use the jsonb_each_text function to extract the data from the jsonb column named data. Then aggregate rows using GROUP BY to get one row for each distinct field_type. Each aggregate also needs the meetings and briefings counts, which is done by selecting the maximum value with a CASE expression so that the two counts end up in two separate columns. On top of that, apply coalesce to return 0 instead of NULL when some information is missing - in your example that would be briefings for Data Science.
At the outer level of the statement, now that we have the results as a table of fields, we need to build a jsonb object per row and aggregate them all into one row. For that we use jsonb_build_object, passing it pairs consisting of a field name and its value. That leaves us with 3 rows of data, each holding a separate jsonb column. Since we want only one row (an aggregated JSON) in the output, we apply jsonb_agg on top of that. This gives the result you're looking for.
Code
Check LIVE DEMO to see how it works.
select
jsonb_agg(
jsonb_build_object('field_type', field_type,
'briefings_count', briefings_count,
'meetings_count', meetings_count
)
) as agg_data
from (
select
j.k as field_type
, coalesce(max(case when t.count_type = 'briefings_count' then j.v::int end),0) as briefings_count
, coalesce(max(case when t.count_type = 'meetings_count' then j.v::int end),0) as meetings_count
from tbl t,
jsonb_each_text(data) j(k,v)
group by j.k
) t
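For reference, the query above assumes a source table shaped roughly like this (a hypothetical reconstruction, since the original sample data is not reproduced here):
-- assumed shape: one row per count type, JSON keyed by field_type
create table tbl (count_type text, data jsonb);
insert into tbl values
('briefings_count', '{"Design": 1, "Engineering": 1}'),
('meetings_count', '{"Design": 13, "Engineering": 13, "Data Science": 3}');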
You can aggregate the columns like this and then insert the data into another table
select array_agg(data)
from the_table
Or use one of the built-in JSON functions to create a new JSON array, such as jsonb_agg(expression).
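A sketch against the same table, collapsing the whole jsonb column into a single JSON array:
select jsonb_agg(data) as all_data
from the_table;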

SQL command(s) to transform data

For the SQL language gurus...a challenge. Hopefully not too hard. If I have data that contains an asset identifier, followed by 200 data elements for that asset...what SQL snippet would transform that to a vertical format?
Current:
Column names:
Asset ID, Column Header 1, Column Header 2, ... Column Header "n"
Data Row:
abc123, 1234, 2345, 3456, ...
Desired:
Asset ID, Column Header 1, 1234
Asset ID, Column Header 2, 2345
Asset ID, Column Header 3, 3456
...
Asset ID, Column Header n, 9876
The SQL implementation that I am using (DashDB, based on DB2, in Bluemix) does not support a "pivot" command. I would also like the code snippet to work unchanged if column headers are changed or additional columns are added to the "current" data format, i.e. I would prefer not to hard-code a fixed list of columns.
What do you think? Can it be done with an SQL code snippet?
Thanks!
You can do this by composing a pivoted table for each row and performing a cartesian product between the source table and the composed table:
SELECT assetId, colname, colvalue
FROM yourtable T,
TABLE(VALUES ('ColumnHeader1', T.ColumnHeader1),
('ColumnHeader2', T.ColumnHeader2),
('ColumnHeader3', T.ColumnHeader3),
...
('ColumnHeaderN', T.ColumnHeaderN)
) as pivot(colname, colvalue);
This only requires a single scan of yourtable, so it is quite efficient.
The canonical way is union all:
select assetId, 'ColumnHeader1' as colname, ColumnHeader1 as value from t union all
select assetId, 'ColumnHeader2' as colname, ColumnHeader2 as value from t union all
. . .
There are other methods but this is usually the simplest to code. It will require reading the table once for each column, which could be an issue.
Note: You can construct such a query using a spreadsheet and formulas. Or, even construct it using another SQL query.
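As a sketch of the "construct it using another SQL query" idea: on DB2/DashDB you can generate the UNION ALL text from the SYSCAT.COLUMNS catalog view (the table and column names here are assumptions):
-- emit one SELECT ... UNION ALL line per column of YOURTABLE
SELECT 'SELECT assetId, ''' || COLNAME || ''' AS colname, ' || COLNAME || ' AS colvalue FROM yourtable UNION ALL'
FROM SYSCAT.COLUMNS
WHERE TABNAME = 'YOURTABLE' AND COLNAME <> 'ASSETID';
Paste the generated lines together and drop the final UNION ALL; the query then adapts automatically when columns are added or renamed.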

PostgreSQL: How to access column on anonymous record

I have a problem that I'm working on. Below is a simplified query to show the problem:
WITH the_table AS (
SELECT a, b
FROM (VALUES('data1', 2), ('data3', 4), ('data5', 6)) x (a, b)
), my_data AS (
SELECT 'data7' AS c, array_agg(ROW(a, b)) AS d
FROM the_table
)
SELECT c, d[array_upper(d, 1)]
FROM my_data
In the my_data CTE, you'll notice that I'm creating an array from multiple rows, and the array is returned in one row with other data. This array needs to contain the information for both a and b, and keep the two values linked together. What would seem to make sense would be to use an anonymous row or record (I want to avoid actually creating a composite type).
This all works well until I need to start pulling data back out. In the above instance, I need to access the last entry in the array, which is done easily by using array_upper, but then I need to access the value in what used to be the b column, which I cannot figure out how to do.
Essentially, right now the above query is returning:
"data7";"(data5,6)"
And I need to return
"data7";6
How can I do this?
NOTE: While in the above example I'm using text and integers as the types for my data, they are not the actual final types, but are rather used to simplify the example.
NOTE: This is using PostgreSQL 9.2
EDIT: For clarification, something like SELECT 'data7', 6 is not what I'm after. Imagine that the_table is actually pulling from database tables rather than the WITH statement that I put in for convenience, and that I don't readily know what data is in the table.
In other words, I want to be able to do something like this:
SELECT c, (d[array_upper(d, 1)]).b
FROM my_data
And get this back:
"data7";6
Essentially, once I've put something into an anonymous record by using the row() function, how do I get it back out? How do I split up the 'data5' part and the 6 part so that they don't both return in one column?
For another example:
SELECT ROW('data5', 6)
makes 'data5' and 6 return in one column. How do I take that one column and break it back into the original two?
I hope that clarifies things.
If you can install the hstore extension:
with the_table as (
select a, b
from (values('data1', 2), ('data3', 4), ('data5', 6)) x (a, b)
), my_data as (
select 'data7' as c, array_agg(row(a, b)) as d
from the_table
)
select c, (avals(hstore(d[array_upper(d, 1)])))[2]
from my_data
;
c | avals
-------+-------
data7 | 6
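If upgrading is an option, PostgreSQL 9.3+ can do this without an extension by serializing the anonymous record to json; the fields of an anonymous record come back named f1, f2, and so on. A sketch (note the json ->> operator only arrived in 9.3, so this won't run on the 9.2 in the question):
with the_table as (
select a, b
from (values('data1', 2), ('data3', 4), ('data5', 6)) x (a, b)
), my_data as (
select 'data7' as c, array_agg(row(a, b)) as d
from the_table
)
-- 'f2' is the auto-generated name of the record's second field
select c, row_to_json(d[array_upper(d, 1)])->>'f2' as b
from my_data;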
This is just something I threw together very quickly around a similar problem - not an answer to your question. But it appears to be one direction towards identifying the columns.
with x as (select 1 a, 2 b union all values (1,2),(1,2),(1,2))
select a from x;

Hive aggregation function that produces a map

I have the following hive table
ID, class, value
1, A, 0.3
1, B, 0.4
1, C, 0.5
2, B, 0.1
2, C, 0.2
I want to get
ID, class:value
1, [A:0.3, B:0.4, C:0.5]
2, [B:0.1, C:0.2]
I know that there is a collect_set() UDAF that produces a list of classes or a list of values; is there any way to get a list of key:value pairs?
NOTE:
I guess I could use two collect_set() calls, one for the class column and one for the value column, but I am not sure the two lists would be in the same order.
I've used the UnionUDAF from the Brickhouse library to do something similar. You create a map from each pair, and then union them all together during the aggregation.
ADD JAR brickhouse.jar;
create temporary function BH_union as 'brickhouse.udf.collect.UnionUDAF';
SELECT S.ID, BH_union(S.v_map)
FROM (SELECT ID, map(class, value) as v_map from mytable) S
GROUP by S.ID
You can also use custom Map/Reduce scripts, or collect_list() (available from Hive 0.13.0), to achieve the same.
Let me know if you need more help with this.
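If you only need the key:value display rather than a true map type, here is a built-ins-only sketch (Hive 0.13+, assuming your table is named mytable):
-- one string per pair keeps each class linked to its own value
SELECT ID, collect_list(concat(class, ':', cast(value AS string))) AS class_values
FROM mytable
GROUP BY ID;
Unlike two separate collect_set() calls, each class stays paired with its value inside a single string, so ordering is no longer a concern.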