BigQuery ARRAY_AGG(STRUCT): splitting values based on a column value

I have a BigQuery table like this:
+------+------------+----------+-------+--------+
| Name | Date | Category | Value | Number |
+------+------------+----------+-------+--------+
| John | 2019-01-03 | Cat1 | AA | 10 |
| John | 2019-01-03 | Cat1 | AB | 11 |
| John | 2019-01-03 | Cat2 | NN | 12 |
| John | 2019-01-03 | Cat2 | MM | 13 |
+------+------------+----------+-------+--------+
The first two columns form the key, and I need to group the rows into an array based on those two columns.
Here is the sample statement:
WITH data AS (
SELECT "John" name, DATE("2019-01-03") date, "cat1" category, "AA" value, 10 number
UNION ALL
SELECT "John", DATE("2019-01-03"), "cat1", "AB", 11
UNION ALL
SELECT "John", DATE("2019-01-03"), "cat2", "NN", 12
UNION ALL
SELECT "John", DATE("2019-01-03"), "cat2", "MM", 13
)
SELECT * FROM data
The basic version of the query is very simple:
SELECT
name,
date,
ARRAY_AGG(
STRUCT<category STRING, value STRING, number INT64>(category,value,number)
) AS items
FROM data
GROUP BY 1,2
But in my case I need to split the grouped value/number pairs into two different columns, based on the category column.
I don't know whether the columns can be defined dynamically from the DISTINCT values of category, but in a simpler case I can use the fixed values cat1 and cat2.
Here is an example of the output I described:
+------+------------+--------------------+---------------------+--------------------+---------------------+
| Name | Date | cat1_grouped.value | cat1_grouped.number | cat2_grouped.value | cat2_grouped.number |
+------+------------+--------------------+---------------------+--------------------+---------------------+
| John | 2019-01-03 | AA | 10 | NN | 12 |
| | | AB | 11 | MM | 13 |
| | | | | | |
+------+------------+--------------------+---------------------+--------------------+---------------------+

Below is a working example for BigQuery Standard SQL:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'John' name, DATE '2019-01-03' dt, 'Cat1' category, 'AA' value, 10 number UNION ALL
SELECT 'John', '2019-01-03', 'Cat1', 'AB', 11 UNION ALL
SELECT 'John', '2019-01-03', 'Cat2', 'NN', 12 UNION ALL
SELECT 'John', '2019-01-03', 'Cat2', 'MM', 13
)
SELECT name, dt,
ARRAY_CONCAT_AGG(IF(category = 'Cat1', arr, [])) cat1_grouped,
ARRAY_CONCAT_AGG(IF(category = 'Cat2', arr, [])) cat2_grouped
FROM (
SELECT name, dt, category,
ARRAY_AGG(STRUCT<value STRING, number INT64>(value, number)) arr
FROM `project.dataset.table`
GROUP BY name, dt, category
)
GROUP BY name, dt
with the result:
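+-----+------+------------+--------------------+---------------------+--------------------+---------------------+
| Row | name | dt         | cat1_grouped.value | cat1_grouped.number | cat2_grouped.value | cat2_grouped.number |
+-----+------+------------+--------------------+---------------------+--------------------+---------------------+
| 1   | John | 2019-01-03 | AA                 | 10                  | NN                 | 12                  |
|     |      |            | AB                 | 11                  | MM                 | 13                  |
+-----+------+------------+--------------------+---------------------+--------------------+---------------------+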
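The query above needs the category names (Cat1, Cat2) to be fixed in advance. If the rows are post-processed in client code instead, the categories can be discovered from the data. The following is a minimal Python sketch of the same reshaping. It assumes the rows have already been fetched as plain tuples; the names `ROWS` and `group_by_category` are illustrative and not part of any BigQuery API.

```python
from collections import defaultdict

# Sample rows from the question: (name, date, category, value, number)
ROWS = [
    ("John", "2019-01-03", "Cat1", "AA", 10),
    ("John", "2019-01-03", "Cat1", "AB", 11),
    ("John", "2019-01-03", "Cat2", "NN", 12),
    ("John", "2019-01-03", "Cat2", "MM", 13),
]


def group_by_category(rows):
    """Collect (value, number) pairs per (name, date) key, one list per category.

    Categories are taken from the data, so no category name is hard-coded.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for name, dt, category, value, number in rows:
        grouped[(name, dt)][category].append({"value": value, "number": number})
    # Convert the nested defaultdicts to plain dicts for easier inspection.
    return {key: dict(cats) for key, cats in grouped.items()}


result = group_by_category(ROWS)
for (name, dt), categories in result.items():
    for category, items in sorted(categories.items()):
        print(name, dt, f"{category.lower()}_grouped", items)
```

The Python version keeps each list in input order. Without an ORDER BY inside ARRAY_AGG, BigQuery does not guarantee any element order.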

Related

Oracle SQL - Creating a view that groups data by timestamp range

I have a table that contains song names, their genres and the date when they were added.
I'd like to create a view which groups these table's records based on their timestamps. Each group should include records that were inserted at most one second later than the first record of the group.
Here's an example of my table:
+------+----------+-------+-------------------------+
| ID | name | genre | date_added |
+------+----------+-------+-------------------------+
| 1 | aaaa | aaaa | 21/05/21 14:21:54,010 |
| 2 | bbb | bbbb | 21/05/21 14:21:54,020 |
| 3 | qqq | cccc | 21/05/21 14:21:54,500 |
| 4 | ccc | dddd | 21/05/21 14:22:00,000 |
| 5 | www | eeee | 21/05/21 14:22:01,000 |
| 6 | s | ffff | 21/05/21 14:23:00,000 |
+------+----------+-------+-------------------------+
Here's an example of the expected view:
+------+-------------+----------+------------------------+
| ID | first_genre | ids | first_date |
+------+-------------+----------+------------------------+
| 1 | aaaa | [1,2,3] | 21/05/21 14:21:54,010 |
| 2 | dddd | [4,5] | 21/05/21 14:22:00,000 |
| 3 | ffff | [6] | 21/05/21 14:23:00,000 |
+------+-------------+----------+------------------------+
What I'm trying to do is select a record, get its date_added column, then proceed to check the consecutive records to see if they are within a 1 second range from the record that was just selected. If they are, it gets added to this group. Otherwise, I start a new group.
I haven't found a similar solution that builds a list like the one I want, which is why I'm asking this question. I'm a beginner at both SQL and Oracle SQL, so I don't know how to work properly with timestamp ranges, especially when grouping is involved.
Any help would be really appreciated.
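You can't truncate a timestamp to the second with TRUNC, because its finest format model is minutes. You can, however, cast it to DATE. The precision of DATE is one second, so any fractional seconds are dropped. Note that this buckets rows by calendar second rather than measuring one second from the first row of each group, so it only approximates the requirement. In the output below, for example, the rows at 14:22:00 and 14:22:01 land in different groups.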
CREATE TABLE test_data(id, sname, genre, date_added) AS
(
SELECT 1, 'aaaa', 'aaaa', TO_TIMESTAMP('21/05/21 14:21:54,010','DD/MM/YY HH24:MI:SS,FF3') FROM DUAL UNION ALL
SELECT 2, 'bbb', 'bbbb', TO_TIMESTAMP('21/05/21 14:21:54,010','DD/MM/YY HH24:MI:SS,FF3') FROM DUAL UNION ALL
SELECT 3, 'qqq', 'cccc', TO_TIMESTAMP('21/05/21 14:21:54,500','DD/MM/YY HH24:MI:SS,FF3') FROM DUAL UNION ALL
SELECT 4, 'ccc', 'dddd', TO_TIMESTAMP('21/05/21 14:22:00,000','DD/MM/YY HH24:MI:SS,FF3') FROM DUAL UNION ALL
SELECT 5, 'www', 'eeee', TO_TIMESTAMP('21/05/21 14:22:01,000','DD/MM/YY HH24:MI:SS,FF3') FROM DUAL UNION ALL
SELECT 6, 's', 'ffff', TO_TIMESTAMP('21/05/21 14:22:01,000','DD/MM/YY HH24:MI:SS,FF3') FROM DUAL
);
CREATE OR REPLACE VIEW genre_list_by_date (
date_added,
id_list,
first_genre
) AS
SELECT
cast(date_added as date),
LISTAGG(id, ', ') WITHIN GROUP(ORDER BY id) AS id_list,
MIN(genre) KEEP (DENSE_RANK FIRST ORDER BY date_added, id) AS first_genre
FROM
test_data
GROUP BY
cast(date_added as date);
select
TO_CHAR(date_added,'DD/MM/YY HH24:MI:SS'),
id_list,
first_genre
from genre_list_by_date;
date_added        | id_list | first_genre
21/05/21 14:21:54 | 1, 2, 3 | aaaa
21/05/21 14:22:00 | 4       | dddd
21/05/21 14:22:01 | 5, 6    | eeee
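The question's actual rule is "start a new group when a row is more than one second after the first row of the current group". That is a sequential, greedy scan, which is easy to express procedurally. The following is a minimal Python sketch of that rule on the question's sample data, next to a whole-second bucketing that mirrors the cast-to-DATE approach. The function names are illustrative, and the timestamp format string is an assumption based on the sample values.

```python
from datetime import datetime, timedelta

# Assumed format of the sample timestamps, e.g. "21/05/21 14:21:54,010"
FMT = "%d/%m/%y %H:%M:%S,%f"

# Sample rows from the question: (id, name, genre, date_added)
ROWS = [
    (1, "aaaa", "aaaa", "21/05/21 14:21:54,010"),
    (2, "bbb", "bbbb", "21/05/21 14:21:54,020"),
    (3, "qqq", "cccc", "21/05/21 14:21:54,500"),
    (4, "ccc", "dddd", "21/05/21 14:22:00,000"),
    (5, "www", "eeee", "21/05/21 14:22:01,000"),
    (6, "s", "ffff", "21/05/21 14:23:00,000"),
]


def _parse_sorted(rows):
    """Return (id, genre, datetime) tuples ordered by date_added, then id."""
    parsed = ((rid, genre, datetime.strptime(ts, FMT)) for rid, _name, genre, ts in rows)
    return sorted(parsed, key=lambda r: (r[2], r[0]))


def group_within(rows, window=timedelta(seconds=1)):
    """Greedy grouping: a row joins the current group if it is at most
    `window` after that group's first row; otherwise it starts a new group."""
    groups = []
    for rid, genre, ts in _parse_sorted(rows):
        if groups and ts - groups[-1]["first_date"] <= window:
            groups[-1]["ids"].append(rid)
        else:
            groups.append({"first_genre": genre, "ids": [rid], "first_date": ts})
    return groups


def group_by_whole_second(rows):
    """Bucket rows by calendar second, dropping fractional seconds
    (the idea behind the cast-to-DATE answer)."""
    buckets = {}
    for rid, _genre, ts in _parse_sorted(rows):
        buckets.setdefault(ts.replace(microsecond=0), []).append(rid)
    return list(buckets.values())


for g in group_within(ROWS):
    print(g["first_genre"], g["ids"], g["first_date"])
```

On this data the greedy scan yields the groups [1,2,3], [4,5] and [6] requested in the question. Whole-second bucketing splits 4 and 5 apart.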

Filtering a table via another table's values

I have 2 tables:
Value
+----+-------+
| id | name |
+----+-------+
| 1 | Peter |
| 2 | Jane |
| 3 | Joe |
+----+-------+
Filter
+----+---------+------+
| id | valueid | type |
+----+---------+------+
| 1 | 1 | A |
| 2 | 1 | B |
| 3 | 1 | C |
| 4 | 1 | D |
| 5 | 2 | A |
| 6 | 2 | C |
| 7 | 2 | E |
| 8 | 3 | A |
| 9 | 3 | D |
+----+---------+------+
I need to retrieve the rows from the Value table whose related Filter rows contain neither type 'B' nor 'C'.
In this small example, that would be only Joe.
Please note that this is a DB2 database and I only have permission to run SELECT statements.
You can use a NOT IN (fullselect) predicate:
With the sample data, the result is 'Joe', as expected:
WITH
-- your input, sans reserved words
val(id,nam) AS (
SELECT 1,'Peter' FROM sysibm.sysdummy1
UNION ALL SELECT 2,'Jane' FROM sysibm.sysdummy1
UNION ALL SELECT 3,'Joe' FROM sysibm.sysdummy1
)
,
filtr(id,valueid,typ) AS (
SELECT 1,1,'A' FROM sysibm.sysdummy1
UNION ALL SELECT 2,1,'B' FROM sysibm.sysdummy1
UNION ALL SELECT 3,1,'C' FROM sysibm.sysdummy1
UNION ALL SELECT 4,1,'D' FROM sysibm.sysdummy1
UNION ALL SELECT 5,2,'A' FROM sysibm.sysdummy1
UNION ALL SELECT 6,2,'C' FROM sysibm.sysdummy1
UNION ALL SELECT 7,2,'E' FROM sysibm.sysdummy1
UNION ALL SELECT 8,3,'A' FROM sysibm.sysdummy1
UNION ALL SELECT 9,3,'D' FROM sysibm.sysdummy1
)
-- real query starts here
SELECT
*
FROM val
WHERE id NOT IN (
SELECT valueid FROM filtr WHERE typ IN ('B','C')
)
;
-- out id | nam
-- out ----+-------
-- out 3 | Joe
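Alternatively, use a LEFT JOIN and keep only the rows for which the join fails (prepend the same WITH clause as in the previous query):
SELECT
val.*
FROM val
LEFT JOIN (
SELECT valueid FROM filtr WHERE typ IN ('B','C')
) f
ON f.valueid = val.id
WHERE f.valueid IS NULL
;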
You can use NOT EXISTS:
select *
from value v
where not exists (
select null from filter f
where f.valueid = v.id and f.type in ('B', 'C')
);
Result:
ID NAME
--- -----
3 Joe
See a running example at db<>fiddle.
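The three formulations above (NOT IN, LEFT JOIN ... IS NULL, NOT EXISTS) return the same rows on this data. However, NOT IN behaves differently when the subquery can return a NULL: under SQL's three-valued logic, `x NOT IN (..., NULL)` is never true. The following is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in engine (not DB2) to compare them. The extra row with a NULL `valueid` is hypothetical and exists only to show the pitfall.

```python
import sqlite3

QUERIES = {
    "not_in": """
        SELECT nam FROM val
        WHERE id NOT IN (SELECT valueid FROM filtr WHERE typ IN ('B', 'C'))
    """,
    "left_join": """
        SELECT val.nam FROM val
        LEFT JOIN (SELECT valueid FROM filtr WHERE typ IN ('B', 'C')) f
          ON f.valueid = val.id
        WHERE f.valueid IS NULL
    """,
    "not_exists": """
        SELECT v.nam FROM val v
        WHERE NOT EXISTS (
            SELECT 1 FROM filtr f
            WHERE f.valueid = v.id AND f.typ IN ('B', 'C'))
    """,
}


def anti_join_results(with_null_valueid=False):
    """Run each anti-join variant against the sample data; return names per variant."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE val (id INTEGER, nam TEXT);
        CREATE TABLE filtr (id INTEGER, valueid INTEGER, typ TEXT);
        INSERT INTO val VALUES (1, 'Peter'), (2, 'Jane'), (3, 'Joe');
        INSERT INTO filtr VALUES
            (1, 1, 'A'), (2, 1, 'B'), (3, 1, 'C'), (4, 1, 'D'),
            (5, 2, 'A'), (6, 2, 'C'), (7, 2, 'E'),
            (8, 3, 'A'), (9, 3, 'D');
    """)
    if with_null_valueid:
        # Hypothetical row with a NULL valueid, added only to show the NOT IN pitfall.
        con.execute("INSERT INTO filtr VALUES (10, NULL, 'B')")
    results = {name: [row[0] for row in con.execute(sql)] for name, sql in QUERIES.items()}
    con.close()
    return results


print(anti_join_results())
print(anti_join_results(with_null_valueid=True))
```

With the NULL row present, the NOT IN variant returns nothing, while the other two still return Joe. If `valueid` is nullable, NOT EXISTS (or adding `AND valueid IS NOT NULL` to the subquery) is the safer choice.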

Count appearances of value in second column based on unique value in first column

The easiest way to explain this is with an example. Given this table in Oracle SQL:
+-----------------+------------+
| COUNTRY | VALUE |
+-----------------+------------+
| England | A |
| England | A |
| England | A |
| England | B |
| England | B |
| France | A |
| France | A |
| France | B |
+-----------------+------------+
how would I produce the following result, which counts the A's and B's for each distinct value in the COUNTRY column?
+-----------+------------+------------+
| COUNTRY | COUNT(A) | COUNT(B) |
+-----------+------------+------------+
| England | 3 | 2 |
| France | 2 | 1 |
+-----------+------------+------------+
I'm sure this has already been answered, I just don't know how to ask the question.
Thanks
select country,
sum( case when value = 'A' then 1 else 0 end ) numA,
sum( case when value = 'B' then 1 else 0 end ) numB
from table_name
group by country
This is one example of conditional aggregation.
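Conditional aggregation is plain standard SQL, so it can be tried outside Oracle. The following is a minimal, runnable sketch that uses Python's built-in `sqlite3` module as a stand-in engine. The table name `table_name` follows the PIVOT answer; the column aliases `num_a` and `num_b` are illustrative.

```python
import sqlite3


def count_by_country():
    """Count A's and B's per country with SUM(CASE ...) conditional aggregation."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table_name (country TEXT, value TEXT)")
    con.executemany(
        "INSERT INTO table_name VALUES (?, ?)",
        [("England", "A")] * 3 + [("England", "B")] * 2
        + [("France", "A")] * 2 + [("France", "B")],
    )
    rows = con.execute("""
        SELECT country,
               SUM(CASE WHEN value = 'A' THEN 1 ELSE 0 END) AS num_a,
               SUM(CASE WHEN value = 'B' THEN 1 ELSE 0 END) AS num_b
        FROM table_name
        GROUP BY country
        ORDER BY country
    """).fetchall()
    con.close()
    return rows


print(count_by_country())
```

Each CASE expression turns a row into 1 or 0 for one value, so the SUM counts only the matching rows within each country group.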
Use PIVOT:
Oracle Setup:
CREATE TABLE table_name ( COUNTRY, VALUE ) AS
SELECT 'England', 'A' FROM DUAL UNION ALL
SELECT 'England', 'A' FROM DUAL UNION ALL
SELECT 'England', 'A' FROM DUAL UNION ALL
SELECT 'England', 'B' FROM DUAL UNION ALL
SELECT 'England', 'B' FROM DUAL UNION ALL
SELECT 'France', 'A' FROM DUAL UNION ALL
SELECT 'France', 'A' FROM DUAL UNION ALL
SELECT 'France', 'B' FROM DUAL;
Query:
SELECT *
FROM table_name
PIVOT ( COUNT(*) FOR value IN ( 'A' AS "COUNT(A)", 'B' AS "COUNT(B)" ) )
Output:
COUNTRY | COUNT(A) | COUNT(B)
:------ | -------: | -------:
England | 3 | 2
France | 2 | 1
db<>fiddle here

Only return top n results for each group in GROUPING SETS query

I have a rather complicated query that performs some aggregations using GROUPING SETS; it looks roughly like this:
SELECT
column1,
[... more columns here]
count(*)
FROM table_a
GROUP BY GROUPING SETS (
column1,
[... more columns here]
)
ORDER BY count DESC
This works very well in general, as long as the number of results for each group is reasonably small. But some columns in this query can have a large number of distinct values, which makes the query return a large number of rows.
I'm actually only interested in the top results for each group in the grouping set, but there doesn't seem to be an obvious way to limit the number of results per group in a query that uses grouping sets: LIMIT applies to the whole result, not to each group.
I'm using PostgreSQL 9.6, so I'm not restricted in which newer features I can use here.
My query currently returns something like this:
| column1 | column2 | count |
|---------|---------|-------|
| DE | | 32455 |
| US | | 3445 |
| FR | | 556 |
| GB | | 456 |
| RU | | 76 |
| | 12 | 10234 |
| | 64 | 9805 |
| | 2 | 6043 |
| | 98 | 2356 |
| | 65 | 1023 |
| | 34 | 501 |
What I actually want is something that only returns the top 3 results:
| column1 | column2 | count |
|---------|---------|-------|
| DE | | 32455 |
| US | | 3445 |
| FR | | 556 |
| | 12 | 10234 |
| | 64 | 9805 |
| | 2 | 6043 |
Use row_number() together with grouping():
select a, b, total
from (
select
a, b, total,
row_number() over(
partition by g
order by total desc
) as rn
from (
select a, b, count(*) as total, grouping ((a),(b)) as g
from t
group by grouping sets ((a),(b))
) s
) s
where rn <= 3
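The idea is to tag each row with the grouping set it came from, then rank within that tag. In PostgreSQL, `grouping(a, b)` returns 1 for rows of the (a) set and 2 for rows of the (b) set. SQLite has no GROUPING SETS, so the following sketch emulates them with one GROUP BY per set joined by UNION ALL, and tags them with those same values. It uses Python's built-in `sqlite3` module and assumes the bundled SQLite is 3.25 or newer, which is required for window functions. The sample rows are illustrative.

```python
import sqlite3


def top_n_per_grouping_set(n=3):
    """Return the top-n rows (by count) for each emulated grouping set."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (a TEXT, b TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?)", [
        ("kla", "k"), ("kle", "m"), ("foo", "k"), ("bar", "m"),
        ("bar", "k"), ("foo", "m"), ("foo", "p"), ("foo", "k"),
    ])
    rows = con.execute("""
        WITH grouped AS (
            -- One GROUP BY per grouping set, tagged with g
            -- (1 = grouped by a, 2 = grouped by b).
            SELECT a, NULL AS b, COUNT(*) AS total, 1 AS g FROM t GROUP BY a
            UNION ALL
            SELECT NULL, b, COUNT(*), 2 FROM t GROUP BY b
        ),
        ranked AS (
            -- Rank rows within each grouping set; a and b break ties deterministically.
            SELECT a, b, total, g,
                   ROW_NUMBER() OVER (PARTITION BY g ORDER BY total DESC, a, b) AS rn
            FROM grouped
        )
        SELECT a, b, total FROM ranked WHERE rn <= ? ORDER BY g, rn
    """, (n,)).fetchall()
    con.close()
    return rows


for row in top_n_per_grouping_set():
    print(row)
```

Adding `a, b` to the window's ORDER BY is a design choice: without it, ties on `total` (here kla and kle, both 1) could be ranked in any order.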
Something like this:
WITH T(column1 , column2, cnt) AS
(
SELECT 'kla', 'k', 10
UNION ALL
SELECT 'kle', 'm', 30
UNION ALL
SELECT 'foo', 'k', 10
UNION ALL
SELECT 'bar', 'm', 30
UNION ALL
SELECT 'bar', 'k', 20
UNION ALL
SELECT 'foo', 'm', 15
UNION ALL
SELECT 'foo', 'p', 10
),
tt AS (select column1, column2, COUNT(*) AS cnt from t GROUP BY GROUPING SETS( (column1), (column2)) )
(SELECT column1, NULL as column2, cnt FROM tt WHERE column1 IS NOT NULL ORDER BY cnt desc LIMIT 3)
UNION ALL
(SELECT NULL as column1, column2, cnt FROM tt WHERE column2 IS NOT NULL ORDER BY cnt desc LIMIT 3)

Create view from multiple tables, combine values from multiple rows into one row

I have 3 tables as below:
Area table:
UserID | Area
---------------
1 | 10001
2 | 10002
3 | 10003
Info table:
UserID | Info
-----------------
1 | U1_Info1
1 | U1_Info2
1 | U1_Info3
2 | U2_Info1
3 | U3_Info1
Company table:
UserID | Company
-----------------
1 | ComA
2 | ComB
3 | ComC
I want to combine them, grouped by UserID. My expected result is:
UserID | Area | Info1 | Info2 | Info3 | Company
----------------------------------------------------------
1 | 10001 | U1_Info1 | U1_Info2 | U1_Info3 | ComA
2 | 10002 | U2_Info1 | | | ComB
3 | 10003 | U3_Info1 | | | ComC
Users 2 and 3 don't have Info2 and Info3, so I set them to ' '.
Can I make a View like that?
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE Area ( UserID, Area ) AS
SELECT 1, 10001 FROM DUAL
UNION ALL SELECT 2, 10002 FROM DUAL
UNION ALL SELECT 3, 10003 FROM DUAL;
CREATE TABLE Info ( UserID, Info ) AS
SELECT 1, 'U1_Info1' FROM DUAL
UNION ALL SELECT 1, 'U1_Info2' FROM DUAL
UNION ALL SELECT 1, 'U1_Info3' FROM DUAL
UNION ALL SELECT 2, 'U2_Info1' FROM DUAL
UNION ALL SELECT 3, 'U3_Info1' FROM DUAL;
CREATE TABLE Company (UserID, Company ) AS
SELECT 1, 'ComA' FROM DUAL
UNION ALL SELECT 2, 'ComB' FROM DUAL
UNION ALL SELECT 3, 'ComC' FROM DUAL;
CREATE VIEW TEST AS
SELECT A.UserID,
MAX( A.Area ) AS Area,
MAX( CASE WHEN I.Info LIKE '%_Info1' THEN I.Info END ) AS Info1,
MAX( CASE WHEN I.Info LIKE '%_Info2' THEN I.Info END ) AS Info2,
MAX( CASE WHEN I.Info LIKE '%_Info3' THEN I.Info END ) AS Info3,
MAX( C.Company ) AS Company
FROM Area A
INNER JOIN
Company C
ON ( A.UserID = C.UserID )
LEFT OUTER JOIN
Info I
ON ( A.UserID = I.UserID )
GROUP BY
A.UserID
Query 1:
SELECT * FROM test
Results:
| USERID | AREA | INFO1 | INFO2 | INFO3 | COMPANY |
|--------|-------|----------|----------|----------|---------|
| 1 | 10001 | U1_Info1 | U1_Info2 | U1_Info3 | ComA |
| 2 | 10002 | U2_Info1 | (null) | (null) | ComB |
| 3 | 10003 | U3_Info1 | (null) | (null) | ComC |
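The view above relies on each Info value ending in `_Info1`, `_Info2` or `_Info3`. A variant that does not depend on the text of Info numbers each user's rows with ROW_NUMBER() and pivots positions 1 to 3 into columns. It also uses COALESCE to return ' ' instead of NULL, as the question asked. The following is a minimal sketch run through Python's built-in `sqlite3` module as a stand-in for Oracle. It assumes SQLite 3.25 or newer for window functions, and it assumes that ordering Info alphabetically is an acceptable definition of "first", "second" and "third".

```python
import sqlite3


def build_user_view():
    """Create the pivoted view and return its rows ordered by UserID."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Area (UserID INTEGER, Area INTEGER);
        CREATE TABLE Info (UserID INTEGER, Info TEXT);
        CREATE TABLE Company (UserID INTEGER, Company TEXT);
        INSERT INTO Area VALUES (1, 10001), (2, 10002), (3, 10003);
        INSERT INTO Info VALUES (1, 'U1_Info1'), (1, 'U1_Info2'), (1, 'U1_Info3'),
                                (2, 'U2_Info1'), (3, 'U3_Info1');
        INSERT INTO Company VALUES (1, 'ComA'), (2, 'ComB'), (3, 'ComC');

        -- Number each user's Info rows, then pivot positions 1..3 into columns.
        CREATE VIEW test AS
        WITH numbered AS (
            SELECT UserID, Info,
                   ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY Info) AS rn
            FROM Info
        )
        SELECT A.UserID,
               A.Area,
               COALESCE(MAX(CASE WHEN n.rn = 1 THEN n.Info END), ' ') AS Info1,
               COALESCE(MAX(CASE WHEN n.rn = 2 THEN n.Info END), ' ') AS Info2,
               COALESCE(MAX(CASE WHEN n.rn = 3 THEN n.Info END), ' ') AS Info3,
               C.Company
        FROM Area A
        JOIN Company C ON C.UserID = A.UserID
        LEFT JOIN numbered n ON n.UserID = A.UserID
        GROUP BY A.UserID, A.Area, C.Company;
    """)
    rows = con.execute("SELECT * FROM test ORDER BY UserID").fetchall()
    con.close()
    return rows


for row in build_user_view():
    print(row)
```

Grouping by Area and Company directly, instead of wrapping them in MAX(), assumes each user has exactly one Area row and one Company row, as in the sample data.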