Is there a melt command in Snowflake?

Is there a Snowflake command that will transform a table like this:
a,b,c
1,10,0.1
2,11,0.12
3,12,0.13
to a table like this:
key,value
a,1
a,2
a,3
b,10
b,11
b,12
c,0.1
c,0.12
c,0.13
?
This operation is often called melt in other tabular systems; the basic idea is to convert the table into a list of key-value pairs.
There is an UNPIVOT in SnowSQL, but as I understand it, UNPIVOT requires you to manually specify every single column (roughly as sketched below). This doesn't seem practical for a large number of columns.
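For reference, the manual form would look something like this, assuming the table is called t (and the listed columns must also have compatible types):
SELECT key, value
FROM t
UNPIVOT (value FOR key IN (a, b, c));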

Snowflake's SQL is powerful enough to perform such an operation without the help of third-party tools or other extensions.
Data prep:
CREATE OR REPLACE TABLE t(a INT, b INT, c DECIMAL(10,2))
AS
SELECT 1,10,0.1
UNION SELECT 2,11,0.12
UNION SELECT 3,12,0.13;
Query (aka a "dynamic" UNPIVOT):
SELECT f.KEY, f.VALUE
FROM (SELECT OBJECT_CONSTRUCT_KEEP_NULL(*) AS j FROM t) AS s
,TABLE(FLATTEN(input => s.j)) f
ORDER BY f.KEY;
Output:
KEY | VALUE
----+------
A   | 1
A   | 2
A   | 3
B   | 10
B   | 11
B   | 12
C   | 0.1
C   | 0.12
C   | 0.13
How does it work?
1. Transform each row into a JSON object (row 1 becomes { "A": 1, "B": 10, "C": 0.1 }).
2. Parse each JSON object into key-value pairs using FLATTEN.
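To see step 1 in isolation, you can run the inner subquery on its own:
SELECT OBJECT_CONSTRUCT_KEEP_NULL(*) AS j FROM t;
-- row 1 yields {"A": 1, "B": 10, "C": 0.1}; note Snowflake uppercases the column names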

Related

How to select a column whose name is a value in another column in POSTGRESQL?

I know this isn't valid SQL, but I'd like to do something like:
SELECT items.{SELECT items.preferred_column}
To elaborate, to achieve what I'm trying to achieve, I could write a long case when statement:
SELECT
CASE WHEN items.preferred_column = 'column_a' THEN items.column_a
     WHEN items.preferred_column = 'column_b' THEN items.column_b
     WHEN items.preferred_column = 'column_c' THEN items.column_c
     -- ... and so on ...
END
But that seems wrong. I would prefer to write a query that looks at the value of items.preferred_column and loads that column.
Is this possible?
My use case involves an Active Record (the ORM for Rails) query, which limits me. I'm not able to use "INTO" for example.
Doing this without creating a SQL function would be preferred, though if it's not possible without creating a SQL function, that would be good to know.
Thanks in advance for lending your expertise!
You can try transforming the table rows with row_to_json(); then, using jsonb_each(), you can join the resultant "key" field on the preferred_column:
WITH CTE AS (
  SELECT
    row_to_json(Z.*)::jsonb AS rcr,
    row_number() OVER (PARTITION BY null ORDER BY <whatever comparator clause>) AS rn,
    Z.*
  FROM items Z)
SELECT b.value, a.*
FROM CTE a, jsonb_each(a.rcr) b, CTE c
WHERE c.rn = a.rn AND b.key = c.preferred_column
Note that this essentially operates as a quasi-pivot, so you'll need to maintain an index (the row_number invocation) to self-join the table when extracting the appropriate key-value pairs from jsonb_each's set-returning output. Casting to jsonb is helpful in that the binary form alphabetizes the key-value pairs by key within the object itself.
If you need to get the resultant value as a text string instead of a json primitive, you can do
b.value #>>'{}'
instead of using jsonb_each_text(), which will preserve any json columns.
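Incidentally, if the self-join bookkeeping isn't needed, the same idea can be written more directly with a lateral join (a sketch, assuming PostgreSQL 9.5+ for to_jsonb()):
SELECT b.value, a.*
FROM items a
CROSS JOIN LATERAL jsonb_each(to_jsonb(a)) b
WHERE b.key = a.preferred_column;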

Selecting Columns which are not null in Athena (Metabase)

I have a table of 1000+ columns in Athena (Metabase), and I want to know how I can extract only those columns which are not null for a certain group of IDs.
Typically, this would need UNPIVOTING your columns to rows, checking which values are not null, and then PIVOTING back.
From the documentation, Athena may offer a simpler route, as documented here:
SELECT filter(ARRAY [-1, NULL, 10, NULL], q -> q IS NOT NULL)
Which returns:
[-1,10]
Unfortunately, since nothing can be dynamic until the values are gathered into an array, this looks like:
WITH dataset AS (
SELECT
ID,
ARRAY[field1, field2, field3, .....] AS fields
FROM
Your1000ColumnTable
)
SELECT ID, filter(fields, q -> q IS NOT NULL) AS non_null_fields
FROM dataset
If you need to access the column names from the array, use a mapping to field names when creating the array as seen here
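One way that mapping might look, as a sketch using Presto's MAP constructor and map_filter (the column names here are placeholders, and all mapped values must share a common type):
SELECT ID,
       map_filter(
         MAP(ARRAY['field1', 'field2', 'field3'],
             ARRAY[field1, field2, field3]),
         (k, v) -> v IS NOT NULL
       ) AS non_null_fields
FROM Your1000ColumnTable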

Hive - getting the column names count of a table

How can I get the Hive column count using HQL? I know we can use DESCRIBE tablename to get the names of the columns. How do we get their count?
create table mytable(i int,str string,dt date, ai array<int>,strct struct<k:int,j:int>);
select count(*)
from (select transform ('')
      using 'hive -e "desc mytable"'
      as col_name, data_type, comment
     ) t
;
5
Some additional playing around:
create table mytable (id int,first_name string,last_name string);
insert into mytable values (1,'Dudu',null);
select size(array(*)) from mytable limit 1;
This is not bulletproof, since not all combinations of column types can be combined into an array.
It also requires that the table contain at least 1 row.
Here is a more complex but also stronger solution (type-wise), though it too requires that the table contain at least 1 row:
select size(str_to_map(val)) from (select transform (struct(*)) using 'sed -r "s/.(.*)./\1/"' as val from mytable) t;
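A variant of the first trick that avoids parsing data types, assuming your Hive version supports SHOW COLUMNS (which emits one line per column name):
select count(*)
from (select transform ('')
      using 'hive -e "show columns in mytable"'
      as col_name
     ) t
;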

How to do calculations on json data in Postgres

I'm storing AdWords report data in Postgres. Each report is stored in a table named Reports, which has a jsonb column named 'data'. Each report has JSON stored in its 'data' field that looks like this:
[
  {
    "match_type": "exact",
    "search_query": "gm hubcaps",
    "conversions": 2,
    "cost": 1.24
  },
  {
    "match_type": "broad",
    "search_query": "gm auto parts",
    "conversions": 34,
    "cost": 21.33
  },
  {
    "match_type": "phrase",
    "search_query": "silverdo headlights",
    "conversions": 63,
    "cost": 244.05
  }
]
What I want to do is query off these data hashes and sum up the total number of conversions for a given report. I've looked through the PostgreSQL docs, and it looks like you can only really do calculations on hashes, not arrays of hashes like this. Is what I'm trying to do possible in Postgres? Do I need to make a temp table out of this array and do calculations off that? Or can I use a stored procedure?
I'm using Postgresql 9.4
EDIT
The reason I'm not just using a regular, normalized table is that this is just one example of how report data could be structured. In my project, reports have to allow arbitrary keys, because they are populated by users uploading CSV's with any columns they like. It's basically just a way to get around having arbitrarily many, user-created tables.
What I want to do is query off these data hashes and sum up the conversions
The fastest way should be with jsonb_populate_recordset(). But you need a registered row type for it.
CREATE TEMP TABLE report_data (
-- match_type text -- commented out, because we only need ..
-- , search_query text -- .. conversions for this query
conversions int
-- , cost numeric
);
A temp table is one way to register a row type ad-hoc. More explanation in this related answer:
jsonb query with nested objects in an array
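As an alternative to the temp table, a permanent composite type would register the row type just as well:
CREATE TYPE report_data AS (conversions int);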
Assuming a table report with report_id as PK for lack of information.
SELECT r.report_id, sum(d.conversions) AS sum_conversions
FROM report r
LEFT JOIN LATERAL jsonb_populate_recordset(null::report_data, r.data) d ON true
-- WHERE r.report_id = 12345 -- only for given report?
GROUP BY 1;
The LEFT JOIN ensures you get a result, even if data is NULL or empty or the JSON array is empty.
For a sum from a single row in the underlying table, this is faster:
SELECT d.sum_conversions
FROM report r
LEFT JOIN LATERAL (
SELECT sum(conversions) AS sum_conversions
FROM jsonb_populate_recordset(null::report_data, r.data)
) d ON true
WHERE r.report_id = 12345; -- enter report_id here
Alternative with jsonb_array_elements() (no need for a registered row type):
SELECT d.sum_conversions
FROM report r
LEFT JOIN LATERAL (
SELECT sum((value->>'conversions')::int) AS sum_conversions
FROM jsonb_array_elements(r.data)
) d ON true
WHERE r.report_id = 12345; -- enter report_id here
Normally you would implement this as plain, normalized table. I don't see the benefit of JSON here (except that your application seems to require it, like you added).
You could use unnest:
select sum(conv) from
  (select (d->>'conversions')::numeric as conv from
    (select unnest(data) as d from <your table>) all_data
  ) all_conv
Disclaimer: I don't have Pg 9.4 so I couldn't test it myself.
EDIT: this is assuming that the array you mentioned is a Postgresql array, i.e. that the data type of your data column is character varying[]. If you mean the data is a json array, you should be able to use json_array_elements instead of unnest.

Iterate through a list to get strings in SQL

I have a SQL table as shown below. I want to generate strings using the 2 fields in my table.
A   B
M1  tiger
M1  cat
M1  dog
M3  lion
I want to read in this table, count the number of rows, and store it in string variables like String1 = M1_tiger, String2 = M1_cat, etc. What's the best way to do this?
You could do a concat type query.
SELECT (Table.A + '_' + Table.B) AS A_B, COUNT(*) OVER () AS RowsCount FROM Table
I'm assuming your table name is "Table". The result with the strings you want would be in the column named A_B; each record will also carry a second column, RowsCount, which is always the same thing: the total number of records in your table.
The count part is kinda easy but check this link so you can use the specific count you need: http://www.w3schools.com/sql/sql_func_count.asp
You can try this:
SELECT CONCAT(A, '_', B) FROM yourtable
When you say "read in this table", do you mean read it into a programming language like C#? Or do you want to dynamically create sql variables?
You may want to use a table variable to store your strings rather than individual variables. Regarding getting the row number, you could use something like:
WITH CTE AS
(
  SELECT A, B,
    ROW_NUMBER() OVER (ORDER BY A) AS RowNumber -- substitute whichever column gives your desired order
  FROM MyTable
)
SELECT A, B, RowNumber FROM CTE
See this answer for more on how you may choose to use the table variable.
SQL: Dynamic Variable Names
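A minimal sketch of the table-variable approach in T-SQL (the table and ordering column are illustrative):
DECLARE @Strings TABLE (RowNumber int, Value varchar(100));

INSERT INTO @Strings (RowNumber, Value)
SELECT ROW_NUMBER() OVER (ORDER BY A), A + '_' + B
FROM MyTable;

SELECT RowNumber, Value FROM @Strings;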
If you are using Oracle, you can also do it like this:
select A ||'_'||B
from yourTable
Solution for PostgreSQL
CREATE SEQUENCE one;
SELECT array_to_string(array_agg(concat('String',nextval('one'),' = ',A,'_',B)), ', ')
AS result
FROM test_table;
DROP SEQUENCE one;
Explanation:
- Create a temporary sequence 'one' in order to use the nextval function.
- nextval('sequence') - advances the sequence and returns the new value.
- concat('input1', ...) - concatenates all arguments.
- array_agg('input1', ...) - input values, including nulls, concatenated into an array.
- array_to_string('array', 'delimiter') - concatenates array elements using the supplied delimiter and an optional null string.
- Drop the sequence 'one'.
The output of the query (for two test rows in test_table):
result
-------------------------------------------
String1 = M1_tiger, String2 = M1_cat
(1 row)
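For comparison, the same result can be had without a sequence, using row_number() and string_agg() (a sketch, assuming PostgreSQL 9.1+ for concat()):
SELECT string_agg(concat('String', rn, ' = ', A, '_', B), ', ') AS result
FROM (SELECT A, B, row_number() OVER () AS rn FROM test_table) t;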