Does BigQuery have the concept of a ROW, similar to MySQL, Postgres, Oracle, or Snowflake? I know it uses one implicitly in an INSERT ... VALUES (...), for example:
INSERT dataset.Inventory (product, quantity)
VALUES('top load washer', 10),
('front load washer', 20)
Each of the values would implicitly be a ROW type of the Inventory table, but is this construction allowed elsewhere in BigQuery? Or is this a feature that doesn't exist in BQ?
I think the following is the simplest (naive) example of such a constructor in BigQuery, using a STRUCT:
with t1 as (
select 'top load washer' product, 10 quantity, 'a' type, 'x' category union all
select 'front load washer', 20, 'b', 'y'
), t2 as (
select 1 id, 'a' code, 'x' value union all
select 2, 'd', 'z'
)
select *
from t1
where (type, category) = (select as struct code, value from t2 where id = 1)
Besides simple queries, it can also be used in BQ scripts. Here is another simplistic example:
declare type, category string;
create temp table t2 as (
select 1 id, 'a' code, 'x' value union all
select 2, 'd', 'z'
);
set (type, category) = (select as struct code, value from t2 where id = 1);
I'm trying to summarize data in a table:
counting total rows
counting values on specific fields
getting the distinct values on specific fields
and, more importantly, I'm struggling with:
getting the count for each field nested in an object
given this data
COL1  COL2
----  ----
A     0
null  1
B     null
B     null
the expected result from this query would be:
with dummy as (
select 'A' as col1, 0 as col2
union all
select null, 1
union all
select 'B', null
union all
select 'B', null
)
select
count(1) as total
,count(col1) as col1
,array_agg(distinct col1) as dist_col1
--,object_construct(???) as col1_object_count
,count(col2) as col2
,array_agg(distinct col2) as dist_col2
--,object_construct(???) as col2_object_count
from
dummy
TOTAL  COL1  DIST_COL1   COL1_OBJECT_COUNT          COL2  DIST_COL2  COL2_OBJECT_COUNT
-----  ----  ----------  -------------------------  ----  ---------  ---------------------
4      3     ["A", "B"]  {"A": 1, "B": 2, null: 1}  2     [0, 1]     {0: 1, 1: 1, null: 2}
I've tried several functions inside OBJECT_CONSTRUCT mixed with ARRAY_AGG, but all of them failed.
OBJECT_CONSTRUCT can work with several columns, but only when given all of them (*); if you try a select statement inside it, it fails.
Another issue is that analytical functions are not easily accepted by the object or array functions in Snowflake.
You could use Snowflake Scripting or Snowpark for this but here's a solution that is somewhat flexible so you can apply it to different tables and column sets.
Create test table/view:
Create or Replace View dummy as (
select 'A' as col1, 0 as col2
union all
select null, 1
union all
select 'B', null
union all
select 'B', null
);
Set session variables for table and colnames.
set tbname = 'DUMMY';
set colnames = '["COL1", "COL2"]';
Create view that generates the required table_column_summary data:
Create or replace View table_column_summary as
with
-- Create table of required column names
cn as (
select VALUE::VARCHAR CNAME
from table(flatten(input => parse_json($colnames)))
)
-- Convert rows into objects
,ro as (
select
object_construct_keep_null(*) row_object
-- using identifier on session variable to dynamically supply table/view name
from identifier($tbname) )
-- Flatten row objects into key/values
,rof as (
select
key col_name,
ifnull(value,'null')::VARCHAR col_value
from ro, lateral flatten(input => row_object), cn
-- You will only need this filter if you need a subset
-- of columns from the source table/query summarised
where col_name = cn.cname)
-- Get the column value distinct value counts
,cdv as (
select col_name,
col_value,
sum(1) col_value_count
from rof
group by 1,2
)
-- and derive required column level stats and combine with cdv
,cv as (
select
(select count(1) from identifier($tbname)) total,
col_name,
object_construct('COL_COUNT', count(col_value) ,
'COL_DIST', array_agg(distinct col_value),
'COL_OBJECT_COUNT', object_agg(col_value,col_value_count)) col_values
from cdv
group by 1,2)
-- Return result
Select * from cv;
Use this final query if you want a solution that works flexibly with any table/columns provided as input...
Select total, object_agg(col_name, col_values) col_values_obj
From table_column_summary
Group by 1;
Or use this final query if you want the fixed columns output as described in your question...
Select total,
COL1[0]:COL_COUNT COL1,
COL1[0]:COL_DIST DIST_COL1,
COL1[0]:COL_OBJECT_COUNT COL1_OBJECT_COUNT,
COL2[0]:COL_COUNT COL2,
COL2[0]:COL_DIST DIST_COL2,
COL2[0]:COL_OBJECT_COUNT COL2_OBJECT_COUNT
from table_column_summary
PIVOT ( ARRAY_AGG ( col_values )
FOR col_name IN ( 'COL1', 'COL2' ) ) as pt (total, col1, col2);
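As a cross-check of the expected output, independent of Snowflake, the same summary can be computed in plain Python. This is only a sketch: the `summarize` helper and the use of `None` for SQL NULL are assumptions made for illustration, not part of the answer above.

```python
from collections import Counter

# Sample rows from the question as (col1, col2); None stands in for SQL NULL.
rows = [("A", 0), (None, 1), ("B", None), ("B", None)]


def summarize(values):
    """Return the non-null count, distinct non-null values, and per-value counts."""
    non_null = [v for v in values if v is not None]
    return {
        "count": len(non_null),                  # like count(col)
        "distinct": sorted(set(non_null)),       # like array_agg(distinct col)
        "object_count": dict(Counter(values)),   # per-value counts, None key = null
    }


total = len(rows)
col1 = summarize([r[0] for r in rows])
col2 = summarize([r[1] for r in rows])

print(total)
print(col1)
print(col2)
```

The `object_count` entries correspond to the COL1_OBJECT_COUNT and COL2_OBJECT_COUNT columns requested in the question.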
Say that I have the following table:
with data as (
select 'John' "name", 'A' "tag", 10 "count"
union all select 'John', 'B', 20
union all select 'Jane', 'A', 30
union all select 'Judith', 'A', 40
union all select 'Judith', 'B', 50
union all select 'Judith', 'C', 60
union all select 'Jason', 'D', 70
)
I know there are a number of distinct tag values, namely (A, B, C, D).
I would like to select the unique names that have only the tag A.
I can get close with:
-- wrong!
select
distinct("name")
from data
group by "name"
having count(distinct tag) = 1
However, this will include every name that has exactly one distinct tag, regardless of which tag it is.
I am using PostgreSQL, although having more generic solutions would be great.
You're almost there - you already have groups with one tag, now just test if it is the tag you want:
select
distinct("name")
from data
group by "name"
having count(distinct tag) = 1 and max(tag)='A'
(Note that max could just as well be min: the group holds only one distinct tag, so both return it. SQL just doesn't have a single() aggregate function, but that's a different story.)
You can use not exists here:
select distinct "name"
from data d
where "tag" = 'A'
and not exists (
select * from data d2
where d2."name" = d."name" and d2."tag" != d."tag"
);
This is one possible way of solving it:
select
distinct("name")
from data
where "name" not in (
-- create list of names we want to exclude
select distinct "name" from data where "tag" != 'A'
)
But I don't know if it's the best or most efficient one.
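To compare the three approaches above on the sample data, here is a small sketch that runs them in an in-memory SQLite database through Python's `sqlite3` module. SQLite is used only as a convenient stand-in for PostgreSQL (an assumption for this check); the queries are the ones from the answers, with the redundant `distinct` dropped from the grouped version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('create table data ("name" text, "tag" text, "count" integer)')
conn.executemany(
    "insert into data values (?, ?, ?)",
    [
        ("John", "A", 10), ("John", "B", 20),
        ("Jane", "A", 30),
        ("Judith", "A", 40), ("Judith", "B", 50), ("Judith", "C", 60),
        ("Jason", "D", 70),
    ],
)

queries = {
    # One distinct tag per name, and that tag is 'A'.
    "having": '''
        select "name" from data
        group by "name"
        having count(distinct "tag") = 1 and max("tag") = 'A'
    ''',
    # Rows tagged 'A' whose name has no row with a different tag.
    "not_exists": '''
        select distinct "name" from data d
        where "tag" = 'A'
          and not exists (
            select * from data d2
            where d2."name" = d."name" and d2."tag" != d."tag"
          )
    ''',
    # Exclude every name that has any tag other than 'A'.
    "not_in": '''
        select distinct "name" from data
        where "name" not in (select distinct "name" from data where "tag" != 'A')
    ''',
}

results = {
    label: sorted(row[0] for row in conn.execute(sql))
    for label, sql in queries.items()
}
for label, names in results.items():
    print(label, names)
```

On this data all three queries return only Jane. Which one is fastest depends on the engine and indexes, so it is worth checking the query plan on the real table.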
I have a union query like:
select count(id)
from table 1
where membernumber = 'x'
and castnumber = 'y'
union
select count(id)
from table 1
where membernumber = 'x'
and castnumber = 'y'
union
etc...
There will be over 200 unions, driven by a 2 x 200 table that holds the values for x and y in each row. So each union branch has to get its x and y from the corresponding row (in no particular order).
How can I achieve that?
Thanks
Try this:
DECLARE GLOBAL TEMPORARY TABLE
SESSION.PARAMETERS
(
MEMBERNUMBER INT
, CASTNUMBER INT
) DEFINITION ONLY WITH REPLACE
ON COMMIT PRESERVE ROWS NOT LOGGED;
-- Insert all the constants from your application with
INSERT INTO SESSION.PARAMETERS
(MEMBERNUMBER, CASTNUMBER)
VALUES (?, ?);
-- I don't know the meaning of the result you want to get
-- but it's equivalent
select distinct count(t.id)
from table1 t
join session.parameters p
on p.membernumber = t.membernumber
and p.castnumber = t.castnumber
group by t.membernumber, t.castnumber;
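The idea of replacing the unions with a join against a parameter table can be tried out with a minimal sketch in SQLite via Python's `sqlite3` module. The sample rows and the ordinary `parameters` table (in place of the DB2 temporary table) are hypothetical. It also uses a LEFT JOIN and returns the pair next to its count, so pairs with no matching rows still report 0; that is an assumption about the desired result, not something the question states.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table table1 (id integer primary key, membernumber integer, castnumber integer);
    create table parameters (membernumber integer, castnumber integer);
""")

# Hypothetical data: two rows match (10, 100), one matches (20, 200), none match (99, 999).
conn.executemany(
    "insert into table1 values (?, ?, ?)",
    [(1, 10, 100), (2, 10, 100), (3, 20, 200), (4, 30, 300)],
)
# One row per (x, y) pair that would otherwise need its own union branch.
conn.executemany(
    "insert into parameters values (?, ?)",
    [(10, 100), (20, 200), (99, 999)],
)

counts = conn.execute("""
    select p.membernumber, p.castnumber, count(t.id)
    from parameters p
    left join table1 t
      on t.membernumber = p.membernumber
     and t.castnumber = p.castnumber
    group by p.membernumber, p.castnumber
    order by p.membernumber, p.castnumber
""").fetchall()

for membernumber, castnumber, n in counts:
    print(membernumber, castnumber, n)
```

Keeping the pair in the output makes each count attributable to its parameters, which a bare union of counts does not.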
Here's some sample data:
record1: field1 = test2
record2: field1 = test3
The actual output I want is
record1: field1 = test2 | field2 = test3
I've looked around the net but can't find what I'm looking for. I can use a custom function to get it in this format but I'm trying to see if there's a way to make it work without resorting to that.
thanks a lot
You need to use pivot:
with t(id, d) as (
select 1, 'field1 = test2' from dual union all
select 2, 'field1 = test3' from dual
)
select *
from t
pivot (max (d) for id in (1, 2))
If you don't have the id field you can generate it, but then the result will be of XML type:
with t(d) as (
select 'field1 = test2' from dual union all
select 'field1 = test3' from dual
), t1(id, d) as (
select ROW_NUMBER() OVER(ORDER BY d), d from t
)
select *
from t1
pivot xml (max (d) for id in (select id from t1))
There are several ways to approach this - google pivot rows to columns. Here is one set of answers: http://www.dba-oracle.com/t_converting_rows_columns.htm
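One of those other ways is conditional aggregation, which does not need the PIVOT keyword. The sketch below runs it in SQLite through Python's `sqlite3` module as a stand-in for Oracle; the table `t` mirrors the sample above, and the output column names `col_1` and `col_2` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer, d text)")
conn.executemany(
    "insert into t values (?, ?)",
    [(1, "field1 = test2"), (2, "field1 = test3")],
)

# Each CASE keeps the value for one id and yields NULL otherwise;
# MAX then collapses the rows into a single row with one column per id.
row = conn.execute("""
    select max(case when id = 1 then d end) as col_1,
           max(case when id = 2 then d end) as col_2
    from t
""").fetchone()

print(" | ".join(row))
```

Like PIVOT with a fixed IN list, this requires knowing the ids in advance; it does not handle a variable number of rows.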
I have a SQL statement like the one below. How can I add a single row (code = 0, desc = 1) to the result of this statement without using the union keyword? Thanks.
select code, desc
from material
where material.ExpireDate ='2010/07/23'
You can always create a view over your table that itself uses the UNION keyword:
CREATE VIEW material_view AS SELECT code, desc, ExpireDate FROM material UNION SELECT '0', '1', NULL;
SELECT code, desc FROM material_view WHERE ExpireDate = '2010/07/23' OR code = '0';
WITH material AS
(
SELECT *
FROM
(VALUES (2, 'x', '2010/07/23'),
(3, 'y', '2009/01/01'),
(4, 'z', '2010/07/23')) vals (code, [desc], ExpireDate)
)
SELECT
COALESCE(m.code,x.code) AS code,
COALESCE(m.[desc],x.[desc]) AS [desc]
FROM material m
FULL OUTER JOIN (SELECT 0 AS code, '1' AS [desc] ) x ON 1=0
WHERE m.code IS NULL OR m.ExpireDate ='2010/07/23'
Gives
code desc
----------- ----
2 x
4 z
0 1
Since you don't want to use either a union or a view, I'd suggest adding a dummy row to the material table (with code = 0, desc = 1, and an ExpireDate that would never normally be selected, e.g. 01 January 1900), then using a query like the following:
select code, desc
from material
where material.ExpireDate ='2010/07/23' or
material.ExpireDate ='1900/01/01'
Normally, a Union would be my preferred option.