Transform flat table into nested/repeated table - sql

In BigQuery we have dictionary tables which specify some mappings - example:
But this flat, typically relational structure is not very handy for working with mappings: when we write a query, multiple joins are required if we want the mappings for several fields. Our idea is to transform the dictionary table into a one-row nested table with repeated fields, so that mappings can be applied with a single join. The desired structure looks like:
Any idea how to transform the flat table into a nested one via standard SQL? Essentially, the values from the field column should become new attributes, and the key/value pairs should become repeated entries for each attribute. So the whole operation is similar to a pivot.
P.S.
I know that BigQuery recently introduced JSON structures. We considered that option, but JSON_QUERY doesn't support passing concatenated values as function parameters. As a result we are unable to look up values dynamically, so we dropped this solution as the more complicated one.
Error when trying to use a variable path name: JSONPath must be a string literal or query parameter

Consider the simple option below:
select * from your_table pivot (
array_agg(struct(key, value) ignore nulls)
for field in ('aaa','bbb')
)
If applied to the sample data in your question, the output is:

You can use the PIVOT operator along with the ARRAY_AGG() function to achieve this.
Here is a query that does this -
with temp as
(
select 'aaa' as field, 1 key, 2 value union all
select 'aaa' as field, 2 key, 5 value union all
select 'aaa' as field, 4 key, 15 value union all
select 'bbb' as field, 1 key, 23 value union all
select 'bbb' as field, 2 key, 36 value
)
select *
from
(
select field, key, value
from temp
)
pivot
(
array_agg(struct(key, value) ignore nulls)
for field in ('aaa','bbb')
)
In the PIVOT operator, as you can see, I have listed the field values as aaa, bbb. If you want to load them dynamically, you will need to build the list in a variable first and run the pivot with dynamic SQL, as sketched below.
For more details on storing the values in a variable and then using PIVOT, see this link: https://towardsdatascience.com/pivot-in-bigquery-4eefde28b3be
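For illustration, here is a minimal sketch of that dynamic variant using BigQuery scripting, assuming the same columns field, key and value in a table your_table (the variable name field_list is made up):
-- build the pivot IN-list from the data, then run the pivot dynamically
DECLARE field_list STRING;

SET field_list = (
  SELECT STRING_AGG(DISTINCT CONCAT("'", field, "'"))
  FROM your_table
);

EXECUTE IMMEDIATE FORMAT("""
  SELECT *
  FROM your_table
  PIVOT (ARRAY_AGG(STRUCT(key, value) IGNORE NULLS) FOR field IN (%s))
""", field_list);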


How to subscript a Postgres column

I have a Postgres query:
SELECT main
FROM (
SELECT
CASE WHEN 1=1 THEN (col_a, col_b)
END as main
FROM "table1"
LIMIT 100) inner_t
Which returns a single column of values in the format (value_a, value_b) in each row. I want the outer query to format those values so that all the value_a's and value_b's are in their own separate columns.
Is there an easy way to do this?
You can abuse row_to_json to do this, but it is probably best to avoid anonymous record types in the first place.
SELECT row_to_json(main)->>'f1', row_to_json(main)->>'f2'
FROM (
SELECT
CASE WHEN 1=1 THEN (col_a, col_b)
END as main
FROM "table1"
LIMIT 100) inner_t
To give a concrete example (after running pgbench -i):
SELECT row_to_json(main)->>'f1', row_to_json(main)->>'f2'
FROM (
SELECT
CASE WHEN 1=1 THEN (aid, bid)
END as main
FROM pgbench_accounts
LIMIT 100) inner_t;
But it only works in PostgreSQL v10 and up.
This is more of an explanation than an actual answer. But it won't fit into a comment.
The thing is, SQL is a strictly typed language. Postgres demands to know the number and data types in the SELECT list at call time. The *-expansion in SELECT * FROM .. is based on registered types. Postgres knows the columns of a table because the structure is saved in the catalog tables.
The expression nested in your construct (col_a, col_b) is short for ROW(col_a, col_b) and a ROW constructor creates an anonymous record. The manual:
By default, the value created by a ROW expression is of an anonymous record type. If necessary, it can be cast to a named composite type — either the row type of a table, or a composite type created with CREATE TYPE AS.
Postgres does not know how to expand an anonymous record. *-expansion does not work.
You could cast like the manual says. But that's only an option if the type is stable, i.e. you always put in the same number of columns with the same data types. And that still would not preserve column names.
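A minimal sketch of that cast route, assuming col_a and col_b are both integer (the type name two_cols is made up); note the column names come from the type definition, not from the original table:
CREATE TYPE two_cols AS (col_a int, col_b int);

-- *-expansion works now, because the record type is registered
SELECT (main).*
FROM (
SELECT CASE WHEN 1=1 THEN ROW(col_a, col_b)::two_cols END AS main
FROM "table1"
LIMIT 100) inner_t;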
So, for the best solution, first define:
Obviously you want to preserve column values.
Do you also want to preserve column names?
Do you also want to preserve column types?
And:
Is the number of columns in the expression always the same?
Are data types always the same?
Is the CASE condition stable, or based on other columns?
If the true aim of the game is to fit multiple values into a single CASE expression and you only care about the values, create a text array instead:
SELECT main[1] AS col_a, main[2] AS col_b
FROM (
SELECT CASE WHEN true THEN ARRAY[col_a::text, col_b::text] END AS main
FROM table1
LIMIT 100
) inner_t;
You lose name and type. You can cast and add aliases if you know name & type.
Else you have to describe your use case more closely - in the question.
Try the query below. It casts the record to text and splits it; note that this is fragile if the values themselves contain commas or quotes.
-- main::text looks like '(value_a,value_b)', so trim the parentheses
SELECT btrim(split_part(main::text, ',', 1), '()') AS val1, btrim(split_part(main::text, ',', 2), '()') AS val2
FROM (
SELECT
CASE WHEN 1=1 THEN (col_a, col_b)
END as main
FROM "table1"
LIMIT 100) inner_t

Select all existing json fields from a postgres table

In my table mytable I have a json field called data and I inserted json with a lot of keys & values.
I know that it's possible to select individual fields like so:
SELECT data->'mykey' as mykey from mytable
But how can I get an overview of all the JSON keys at a certain depth? I would have expected something like
SELECT data->* from mytable
but that doesn't work. Is there something similar?
You can use the json_object_keys() function to get all the top-level keys of a json value:
SELECT keys.*
FROM mytable, json_object_keys(mytable.data) AS keys (mykey);
If you want to search at a deeper level, then first extract that deeper level from the json value using the #> operator:
SELECT keys.*
FROM mytable, json_object_keys(mytable.data #> '{level1, level2}') AS keys (mykey);
Note that the function returns a set of text, so you should invoke the function as a row source.
If you are using the jsonb data type, then use the jsonb_object_keys() function.
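For example, a minimal self-contained sketch with a jsonb literal (the sample document is made up):
SELECT k
FROM jsonb_object_keys('{"a": 1, "b": {"c": 2}}'::jsonb) AS k;
-- returns two rows, a and b (top level only; combine with #> to go deeper)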

SQL: What does NULL as ColumnName imply

I understand that AS is used to create an alias. Therefore, it makes sense to alias a long name as a shorter one. However, I am seeing NULL as ColumnName in a SQL query.
What does this imply?
SELECT *, NULL as aColumn
Aliasing can be used in a number of ways, not just to shorten a long column name.
In this case, your example means you're returning a column that always contains NULL, and its alias/column name is aColumn.
Aliasing can also be used when you're using computed values, such as Column1 + Column2 AS Column3.
When unioning datasets, NULL AS [ColumnA] is a quick way to produce a complete, aligned column list that can be populated later, without having to create a new column in any of the source tables.
In the statement's result we get a column that contains only NULL values, and we can refer to that column by its alias.
In your case the query selects all records from the table, and each result record has an additional column containing only NULL values. If we want to refer to this result set and its additional column somewhere else later on, we use the alias.
It means that "aColumn" contains only NULL values. The column could be updated with actual values later, but it is empty when selected.
I'm not sure if you know about SSIS, but this mechanism is useful in SSIS to load a variable's value into the "empty" column.
When using SELECT you can pass a value to the column directly.
So something like :
SELECT ID, Name, 'None' AS Hobbies, 0 AS NumberOfPets, NULL AS Picture, '' AS Adress
Is valid.
It can be used to format nicely a query output when using UNION/UNION ALL.
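For instance, a small sketch (the table and column names are hypothetical):
-- both branches must return the same number of columns, so pad with NULL
SELECT id, name, email, NULL AS phone FROM customers
UNION ALL
SELECT id, name, NULL AS email, phone FROM leads;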
The query result can have a new column that contains only NULL values. In SQL Server we can do it like this:
SELECT *, CAST(NULL AS <data-type>) AS aColumn
e.g.
SELECT *, CAST(NULL AS BIGINT) AS aColumn
How about without using the AS keyword:
SELECT ID
, Name
, 'None' AS Hobbies
, 0 AS NumberOfPets
, NULL Picture
Usually adding NULL AS [Column] at the end of a SELECT * is used when inserting into another table a column that will later be calculated from the table you have just selected, e.g.
UPDATE #TempTable SET aColumn = Column1 + Column2 WHERE ...
Then exporting or saving the results to another table.

Can SQL determine which values from a set of possible column values do not exist?

I have a unique column. I also have a known set of elements that are possible values for the column. I need to know which of the possible values are not already in the table, and as such, are suitable for insertion.
Is this possible with SQL or is post processing required?
Currently, I am using the "in" operator to select all rows where the column value equals an element in my set. Then I remove all matched elements from my set via post processing.
Stick the allowed values in a temporary table allowed, then use a subquery using NOT IN:
SELECT *
FROM allowed
WHERE allowed.val NOT IN (
SELECT maintable.val
FROM maintable
)
Some DBs will allow you to build up a table "in-place", instead of having to create a separate table. E.g. in PostgreSQL (any version):
SELECT *
FROM (
SELECT 'foo' AS val
UNION ALL SELECT 'bar'
UNION ALL SELECT 'baz' -- etc.
) inplace_allowed
WHERE inplace_allowed.val NOT IN (
SELECT maintable.val
FROM maintable
)
More modern versions of PostgreSQL (and perhaps other DBs) will let you use the slightly nicer VALUES syntax to do the same thing.
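A sketch of that VALUES form (PostgreSQL 8.2+), reusing the maintable/val names from above:
SELECT *
FROM (VALUES ('foo'), ('bar'), ('baz')) AS allowed(val)
WHERE allowed.val NOT IN (
SELECT maintable.val
FROM maintable
);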
To do this entirely in SQL you will need to create a separate table with one column. Each row holds one value from the known set of elements. Assuming the table is called ElementList and the other table is called Existing:
SELECT * FROM ElementList WHERE Element NOT IN
(SELECT DISTINCT Element FROM Existing)
Depending on what database engine you're using you may be able to use a temporary table to create and hold the list without saving it permanently in the database. However, storing the list of allowed elements is valuable for constraining the Element column in the Existing table (and for presenting the user with allowed Elements in the user interface).
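For example, assuming Element is the primary key (or at least unique) in ElementList, the constraint could be a plain foreign key:
ALTER TABLE Existing
ADD CONSTRAINT fk_existing_element
FOREIGN KEY (Element) REFERENCES ElementList (Element);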

How to make result set from ('1','2','3')?

I have a question: how can I make a result set from just a list of values? For example I have these values: ('1','2','3')
And I want to write SQL that returns this table:
1
2
3
Thanks.
[Edit]
Sorry for the badly worded question.
Actually the list does not contain integers, it contains strings.
I currently need something like ('aa','bb','cc').
[/Edit]
If you want to write a SQL statement which takes a comma-separated list and generates an arbitrary number of actual rows, the only real way is to use a table function, i.e. a PL/SQL function which splits the input string and returns the elements as separate rows.
Check out this link for an intro to table-functions.
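A rough sketch of that approach in Oracle PL/SQL (the type and function names are made up, and REGEXP_COUNT/REGEXP_SUBSTR need a reasonably recent Oracle version):
CREATE OR REPLACE TYPE t_str_tab AS TABLE OF VARCHAR2(4000);
/
CREATE OR REPLACE FUNCTION split_csv(p_list IN VARCHAR2)
  RETURN t_str_tab PIPELINED
IS
BEGIN
  -- emit one row per comma-separated element
  FOR i IN 1 .. REGEXP_COUNT(p_list, '[^,]+') LOOP
    PIPE ROW (REGEXP_SUBSTR(p_list, '[^,]+', 1, i));
  END LOOP;
  RETURN;
END;
/
-- usage:
SELECT column_value FROM TABLE(split_csv('aa,bb,cc'));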
Alternatively, if you can construct the SQL statement programmatically in your client you can do:
SELECT 'aa' FROM DUAL
UNION
SELECT 'bb' FROM DUAL
UNION
SELECT 'cc' FROM DUAL
The best way I've found is using XML.
SELECT items.extract('/l/text()').getStringVal() item
FROM TABLE(xmlSequence(
EXTRACT(XMLType('<all><l>'||
REPLACE('aa,bb,cc',',','</l><l>')||'</l></all>')
,'/all/l'))) items;
Wish I could take credit, but alas: http://pbarut.blogspot.com/2006/10/binding-list-variable.html.
Basically what it does is convert the list to an XML document and then parse it back out.
The easiest way is to abuse a table that is guaranteed to have enough rows.
-- for Oracle
select rownum from tab where rownum < 4;
If that is not possible, check out Oracle Row Generator Techniques.
I like this one (requires 10g):
select integer_value
from dual
where 1=2
model
dimension by ( 0 as key )
measures ( 0 as integer_value )
rules upsert ( integer_value[ for key from 1 to 10 increment 1 ] = cv(key) )
;
One trick I've used in various database systems (not just SQL databases) is to have a table which just contains the first 100 or 1000 integers. Such a table is very easy to create programmatically, and your query then becomes:
SELECT value FROM numbers WHERE value < 4 ORDER BY value
You can use the table for lots of similar purposes.
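A sketch of how such a table could be created (Oracle syntax, matching the numbers table used in the query above):
CREATE TABLE numbers (value NUMBER PRIMARY KEY);

-- CONNECT BY LEVEL is a convenient one-off row generator for the load
INSERT INTO numbers (value)
SELECT LEVEL FROM dual CONNECT BY LEVEL <= 1000;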