Selecting Columns which are not null in Athena(Metabase) - sql

I have a table of 1000+ columns in Athena(Metabase), and I want to know how can I extract only those columns which are not null for a certain group of ID.

Typically, this would need an UNPIVOTING of your columns to rows and then check where not null and then back to PIVOT.
From the documentation, Athena may do it simpler.
As documented here
SELECT filter(ARRAY [-1, NULL, 10, NULL], q -> q IS NOT NULL)
Which returns:
[-1,10]
Unfortunately, since there is no ability to be dynamic until we get to an array, this looks like:
WITH dataset AS (
SELECT
ID,
ARRAY[field1, field2, field3, .....] AS fields
FROM
Your1000ColumnTable
)
SELECT ID, SELECT filter(fields, q -> q IS NOT NULL)
FROM dataset
If you need to access the column names from the array, use a mapping to field names when creating the array as seen here

Related

Extract nested values as columns Google BigQuery?

I have a table with nested values, like the following:
I'd like to grab the values, with keys as columns without multiple cross joins.
i.e.
SELECT
owner_id,
owner_type,
domain,
metafields.value AS name,
metafields.value AS image,
metafields.value AS location,
metafields.value AS draw
FROM
example_table
Obviously, the above won't work for this, but the following output would be desired:
In the actual table there are hundreds of metafields per owner_id, and hundreds of owner_ids, and owner_types. Multiple joins to other tables for owner_types is fine, but for the same owner type, I don't want to have to join multiple times.
Basically, I need to be able to select the key to which the column corresponds, and display the relevant value for that column. Without, having to display every metafield available.
Any way of doing this?
Consider below approach
select * except(id) from (
select t.* except(metafields),
to_json_string(t) id, key, value
from your_table t, unnest(metafields) kv
)
pivot (min(value) for key in ('name', 'image', 'location', 'draw'))
if applied to sample data in your question - output is
You can use the subqueries and SAFE_offset statement and get a value from an array at a specific location.
Also, you need to use STRING_AGG, which returns a value (either STRING or BYTES) obtained by concatenating non-null values.
With the information you shared, you can use the query below.
With this code, you will get all the columns separated by a comma:
WITH sequences AS
(
SELECT 1 as ID,"product" AS owner_type,"beta.com" AS domain,["name","image","lcation","draw"] AS metalfields_key, ["big","pic.png","utha","1"] AS metalfields_value
),
Val as(
SELECT distinct id, owner_type,domain, value FROM sequences, sequences.metalfields_value as value, sequences.metalfields_key
), text as(
SELECT
id, owner_type, domain,
STRING_AGG(value ORDER BY value) AS Text
FROM Val
GROUP BY owner_type, domain, id
)
In this code, you will get each element that is separated by a comma and return them by columns.
SELECT DISTINCT t1.id, t1.owner_type,domain,
split(t1.text, ',')[SAFE_offset(1)] as name,
split(t1.text, ',')[SAFE_offset(2)] as image,
split(t1.text, ',')[SAFE_offset(3)] as location,
split(t1.text, ',')[SAFE_offset(0)] as draw
from text as t1
You can see the result.

SELECT on JSON operations of Postgres array column?

I have a column of type jsonb[] (a Postgres array of jsonb objects) and I'd like to perform a SELECT on rows where a criteria is met on at least one of the objects. Something like:
-- Schema would be something like
mytable (
id UUID PRIMARY KEY,
col2 jsonb[] NOT NULL
);
-- Query I'd like to run
SELECT
id,
x->>'field1' AS field1
FROM
mytable
WHERE
x->>'field2' = 'user' -- for any x in the array stored in col2
I've looked around at ANY and UNNEST but it's not totally clear how to achieve this, since you can't run unnest in a WHERE clause. I also don't know how I'd specify that I want the field1 from the matching object.
Do I need a WITH table with the values expanded to join against? And how would I achieve that and keep the id from the other column?
Thanks!
You need to unnest the array and then you can access each json value
SELECT t.id,
c.x ->> 'field1' AS field1
FROM mytable t
cross join unnest(col2) as c(x)
WHERE c.x ->> 'field2' = 'user'
This will return one row for each json value in the array.

I need to SELECT a group of columns not null

Basically, i have a table that have a series of columns named:
ATTRIBUTE10, ATTRIBUTE11, ATTRIBUTE12 ... ATTRIBUTE50
I want a query that gives me all the columns from ATTRIBUTE10 to ATTRIBUTE50 not null
As others have commented we aren't exactly sure of your requirements, but if you want a list the UNPIVOT can do that...
SELECT attribute , value
FROM
(SELECT * from YourFile) p
UNPIVOT
(value FOR attribute IN
(attribute1, attribute2, attribute3, etc.)
)AS unpvt
May be you can use where condition for all columns Or use between operator as below.
For All Columns
where ATTRIBUTE10 is not null and ATTRIBUTE11 is not null ...... and ATTRIBUTE50 is not null
By using between operator
where ATTRIBUTE10 between ATTRIBUTE11 and ATTRIBUTE50
One way to approach the problem is to unfold your table-with-a-zillion-like-named-attributes into one in which you've got one attribute per row, with appropriate foreign keys back to the original table. So something like:
CREATE TABLE ATTR_TABLE AS
SELECT ID_ATTR, ID_TABLE_WITH_ATTRS, ATTR
FROM (SELECT ((ID_TABLE_WITH_ATTRS-1)*100)+1 AS ID_ATTR, ID_TABLE_WITH_ATTRS, ATTRIBUTE10 AS ATTR FROM TABLE_WITH_ATTRS UNION ALL
SELECT ((ID_TABLE_WITH_ATTRS-1)*100)+2, ID_TABLE_WITH_ATTRS, ATTRIBUTE11 FROM TABLE_WITH_ATTRS UNION ALL
SELECT ((ID_TABLE_WITH_ATTRS-1)*100)+3, ID_TABLE_WITH_ATTRS, ATTRIBUTE12 FROM TABLE_WITH_ATTRS);
This only unfolds ATTRIBUTE10, ATTRIBUTE11, and ATTRIBUTE12, but you should be able to get the idea - the rest of the attributes just requires a little cut-n-paste on your part.
You can then query this table to find your non-NULL attributes as
SELECT *
FROM ATTR_TABLE
WHERE ATTR IS NOT NULL
ORDER BY ID_ATTR
Hopefully the difficulty you're encountering in dealing with this table-with-a-zillion-repeated-fields teaches you a hard lesson about exactly why tables with repeated fields or groups of fields are a Bad Idea.
dbfiddle here

Hive - getting the column names count of a table

How can I get the hive column count names using HQL? I know we can use the describe.tablename to get the names of columns. How do we get the count?
create table mytable(i int,str string,dt date, ai array<int>,strct struct<k:int,j:int>);
select count(*)
from (select transform ('')
using 'hive -e "desc mytable"'
as col_name,data_type,comment
) t
;
5
Some additional playing around:
create table mytable (id int,first_name string,last_name string);
insert into mytable values (1,'Dudu',null);
select size(array(*)) from mytable limit 1;
This is not bulletproof since not all combinations of columns types can be combined into an array.
It also requires that the table will contain at least 1 row.
Here is a more complex but also stronger solution (types versa), but also requires that the table will contain at least 1 row
select size(str_to_map(val)) from (select transform (struct(*)) using 'sed -r "s/.(.*)./\1/' as val from mytable) t;

Iterate through a list to get strings in SQL

I have a SQL table as shown below. I want to generate strings using the 2 fields in my table.
A B
M1 tiger
M1 cat
M1 dog
M3 lion
I want to read in this table, count the number of rows, and store it in string variables like String1 = M1_tiger, String2 = M1_cat, etc. What's the best way to do this?
You could do a concat type query.
SELECT (Table.A + '_' + Table.B) AS A_B, COUNT(*) AS RowsCount FROM Table
I'm asuming the your table name is "Table", the result where you will find the strings you want would be the column named A_B, each record will have two things in each record, one would be the string you asked for, the other column would always be the same thing, the total number of records on you table.
The count part is kinda easy but check this link so you can use the specific count you need: http://www.w3schools.com/sql/sql_func_count.asp
You can try this:
SELECT CONCAT(A, '_', B) FROM yourtable
When you say "read in this table", do you mean read it into a programming language like C#? Or do you want to dynamically create sql variables?
You may want to use a table variable to store your strings rather than individual variables. Regarding getting the row number, you could use something like:
WITH CTE AS
(
SELECT A, B,
ROW_NUMBER() OVER (order by OrderDate) AS 'RowNumber'
FROM MyTable
)
SELECT A,B,RowNumber FROM CTE
See this answer for more on how you may choose to use the table variable.
SQL: Dynamic Variable Names
If your are using Oracle, you can also do it like:
select A ||'_'||B
from yourTable
Solution for PostgreSQL
CREATE SEQUENCE one;
SELECT array_to_string(array_agg(concat('String',nextval('one'),' = ',A,'_',B)), ', ')
AS result
FROM test_table;
DROP SEQUENCE one;
Explanation:
Create a temporary sequence 'one' in order to use nextval function.
nextval('sequence') - advance sequence and return new value.
concat('input1', ...) - concatenate all arguments.
array_agg('input1', ...); - input values, including nulls,
concatenated into an array.
array_to_string('array', 'delimiter') - concatenates array elements
using supplied delimiter and optional null string.
Drop the sequence 'one'.
The output of the query (for two test rows in test_table):
result
-------------------------------------------
String1 = M1_tiger, String2 = M1_cat
(1 row)