Efficient Way to do Very Complicated SQL Grouping

Say you have a table like this:
ID | Type | Reference #1 | Reference #2
0 | 1 | [A] | {a}
1 | 2 | [B] | {b}
2 | 2 | [B] | {c}
3 | 1 | [C] | {d}
4 | 1 | [D] | {d}
5 | 1 | [E] | {d}
6 | 1 | [C] | {e}
Is there any good way to group by "Reference #1" and "Reference #2" as a "fallback", for lack of a better way of putting it...
For example, I would like to group the following IDs together:
{0} [Unique Reference #1],
{1,2} [Same Reference #1],
{3,4,5,6} [{3,4,5} have same Reference #2 and {3,6} have same Reference #1]
I am at a total loss as to how to do this... Any thoughts?

In mellamokb's query, the groupings depend on the order of the input.
i.e.
VALUES
(0, 1, '[A]', '{a}'),
(1, 2, '[B]', '{b}'),
(2, 2, '[B]', '{c}'),
(3, 1, '[C]', '{d}'), // group 3
(4, 1, '[D]', '{d}'), // group 3
(5, 1, '[E]', '{d}'), // group 3
(6, 1, '[C]', '{e}'); // group 3
produces a different result than
VALUES
(0, 1, '[A]', '{a}'),
(1, 2, '[B]', '{b}'),
(2, 2, '[B]', '{c}'),
(3, 1, '[C]', '{e}'), //group 3
(4, 1, '[D]', '{d}'), // group 4
(5, 1, '[E]', '{d}'), // group 4
(6, 1, '[C]', '{d}'); // group 3
This might be intended if there is some natural order to the References that you could specify, but it's a problem if there is not. One way to 'solve' this, or to restate the problem, is to say that all rows with an equal Reference1 form a set, and that the set also includes every row whose Reference2 equals that of at least one member of the set.
In SQL:
with groupings as (
select
ID,Reference1,Reference2,
(select min(ID) from Table1 t2
where t2.Reference1=t1.Reference1 or t2.Reference2=t1.Reference2 ) as minID
from
Table1 t1
)
select
t1.ID,t1.Reference1,t1.Reference2,t1.minid as round1,
(select min(t2.minid) from
groupings t2
where t2.Reference2=t1.Reference2
) as minID
from
groupings t1
This should produce the full grouping each time.
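To check the order-independence claim, here is a minimal sketch that runs the two-round query against both input orderings using Python's built-in sqlite3 module. The SQLite engine, the `group_ids` helper, and the plain WHERE filter on Reference2 in the second round are my assumptions, not part of the original answer.

```python
import sqlite3

# Two-round grouping: round 1 takes the smallest ID sharing either reference,
# round 2 takes the smallest round-1 value among rows sharing Reference2.
QUERY = """
with groupings as (
    select ID, Reference1, Reference2,
           (select min(ID) from Table1 t2
             where t2.Reference1 = t1.Reference1
                or t2.Reference2 = t1.Reference2) as minID
    from Table1 t1
)
select t1.ID,
       (select min(t2.minID) from groupings t2
         where t2.Reference2 = t1.Reference2) as groupID
from groupings t1
order by t1.ID
"""

def group_ids(rows):
    """Load rows into an in-memory table and return {ID: groupID}."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "create table Table1 (ID int, Type int, Reference1 text, Reference2 text)"
    )
    con.executemany("insert into Table1 values (?, ?, ?, ?)", rows)
    result = dict(con.execute(QUERY).fetchall())
    con.close()
    return result

common = [(0, 1, "[A]", "{a}"), (1, 2, "[B]", "{b}"), (2, 2, "[B]", "{c}")]
first = common + [(3, 1, "[C]", "{d}"), (4, 1, "[D]", "{d}"),
                  (5, 1, "[E]", "{d}"), (6, 1, "[C]", "{e}")]
second = common + [(3, 1, "[C]", "{e}"), (4, 1, "[D]", "{d}"),
                   (5, 1, "[E]", "{d}"), (6, 1, "[C]", "{d}")]

groups_first = group_ids(first)
groups_second = group_ids(second)
```

Both orderings put IDs 3 to 6 in the same group here. Note that this is only two rounds, so longer chains of shared references would presumably need more rounds or a recursive approach.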

Related

How to create an array from flattened data in BigQuery

There is a lot of information online about going from flattened data to arrays or structs, but I need to do the opposite and I am having a hard time achieving it. I am using Google BigQuery.
I have something like:
| Id | Value1 | Value2 |
| 1 | 1 | 2 |
| 1 | 3 | 4 |
| 2 | 5 | 6 |
| 2 | 7 | 8 |
I would like to get for the example above:
1, [(1, 2), (3, 4)]
2, [(5, 6), (7, 8)]
If I try to put an array in the SELECT with a GROUP BY, it is not a valid statement.
For example:
SELECT Id, [ STRUCT(Value1, Value2) ] as Value
FROM `table.dataset`
GROUP BY Id
Which returns:
1, (1, 2)
1, (3, 4)
2, (5, 6)
2, (7, 8)
Which is not what I am looking for. The structure I got is: Id, Value.Value1, Value.Value2 and I want Id, [ Value(V1, V2), Value(V1, V2), ... ]
You can do that with SELECT Id, ARRAY_AGG(STRUCT(Value1, Value2)) ... GROUP BY Id
Below is for BigQuery Standard SQL
#standardSQL
select id, array_agg((select as struct t.* except(id))) as `value`
from `project.dataset.table` t
group by id
Applied to the sample data in your question, this returns one row per id, each holding an array of (Value1, Value2) structs.
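For anyone who wants to see the shape of the result without a BigQuery project, here is a rough plain-Python analogue of ARRAY_AGG(STRUCT(Value1, Value2)) ... GROUP BY Id. The `array_agg_by_id` helper is a hypothetical name and tuples stand in for structs; it only illustrates the grouping, it is not BigQuery code.

```python
from collections import defaultdict

# Flattened sample rows from the question: (Id, Value1, Value2)
rows = [(1, 1, 2), (1, 3, 4), (2, 5, 6), (2, 7, 8)]

def array_agg_by_id(rows):
    """Collect (Value1, Value2) pairs into one list per Id."""
    grouped = defaultdict(list)
    for id_, value1, value2 in rows:
        grouped[id_].append((value1, value2))
    return dict(grouped)

result = array_agg_by_id(rows)
```

The Python version keeps input order inside each list. In BigQuery, ARRAY_AGG without an ORDER BY clause does not guarantee element order, so add one inside the call if order matters.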

Roll up multiple rows into one when joining in SQL Server

I have a table, Foo
ID | Name
-----------
1 | ONE
2 | TWO
3 | THREE
And another, Bar:
ID | FooID | Value
------------------
1 | 1 | Alpha
2 | 1 | Alpha
3 | 1 | Alpha
4 | 2 | Beta
5 | 2 | Gamma
6 | 2 | Beta
7 | 3 | Delta
8 | 3 | Delta
9 | 3 | Delta
I would like a query that joins these tables, returning one row for each row in Foo, rolling up the 'value' column from Bar. I can get back the first Bar.Value for each FooID:
SELECT * FROM Foo f OUTER APPLY
(
SELECT TOP 1 Value FROM Bar WHERE FooId = f.ID
) AS b
Giving:
ID | Name | Value
---------------------
1 | ONE | Alpha
2 | TWO | Beta
3 | THREE | Delta
But that's not what I want, and I haven't been able to find a variant that will bring back a rolled up value, that is the single Bar.Value if it is the same for each corresponding Foo, or a static string something like '(multiple)' if not:
ID | Name | Value
---------------------
1 | ONE | Alpha
2 | TWO | (multiple)
3 | THREE | Delta
I have found some solutions that would bring back concatenated values (albeit not very elegantly), such as 'Alpha, Alpha, Alpha', 'Beta, Gamma, Beta' &c, but that's not what I want either.
One method, using a CASE expression and assuming that [Value] cannot have a value of NULL:
WITH Foo AS
(SELECT *
FROM (VALUES (1, 'ONE'),
(2, 'TWO'),
(3, 'THREE')) V (ID, [Name])),
Bar AS
(SELECT *
FROM (VALUES (1, 1, 'Alpha'),
(2, 1, 'Alpha'),
(3, 1, 'Alpha'),
(4, 2, 'Beta'),
(5, 2, 'Gamma'),
(6, 2, 'Beta'),
(7, 3, 'Delta'),
(8, 3, 'Delta'),
(9, 3, 'Delta')) V (ID, FooID, [Value]))
SELECT F.ID,
F.[Name],
CASE COUNT(DISTINCT B.[Value]) WHEN 1 THEN MAX(B.Value) ELSE '(Multiple)' END AS [Value]
FROM Foo F
JOIN Bar B ON F.ID = B.FooID
GROUP BY F.ID,
F.[Name];
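The COUNT(DISTINCT ...) roll-up is portable enough to try outside SQL Server. Below is a minimal sketch that runs the same CASE expression in SQLite through Python's sqlite3 module; the engine choice and the `rolled_up` variable name are my assumptions, and the brackets around identifiers are dropped because SQLite does not need them.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table Foo (ID int, Name text);
create table Bar (ID int, FooID int, Value text);
insert into Foo values (1, 'ONE'), (2, 'TWO'), (3, 'THREE');
insert into Bar values
    (1, 1, 'Alpha'), (2, 1, 'Alpha'), (3, 1, 'Alpha'),
    (4, 2, 'Beta'),  (5, 2, 'Gamma'), (6, 2, 'Beta'),
    (7, 3, 'Delta'), (8, 3, 'Delta'), (9, 3, 'Delta');
""")

# One distinct value per Foo row -> show it; otherwise show the placeholder.
rolled_up = con.execute("""
select F.ID, F.Name,
       case count(distinct B.Value)
            when 1 then max(B.Value)
            else '(Multiple)'
       end as Value
from Foo F
join Bar B on F.ID = B.FooID
group by F.ID, F.Name
order by F.ID
""").fetchall()
con.close()
```

Because this is an inner join, a Foo row with no Bar rows would drop out of the result; a LEFT JOIN would keep it, with '(Multiple)' as its value since the distinct count would be 0.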
You can also try below:
SELECT F.ID, F.Name, (case when B.Value like '%,%' then '(Multiple)' else B.Value end) as Value
FROM Foo F
outer apply
(
select SUBSTRING((
SELECT distinct ', '+ isnull(Value,',') FROM Bar WHERE FooId = F.ID
FOR XML PATH('')
), 3 , 9999) as Value
) as B

Creating column for every group in group by

Suppose I have a table T which has entries as follows:
id | type | value |
-------------------------
1 | A | 7
1 | B | 8
2 | A | 9
2 | B | 10
3 | A | 11
3 | B | 12
1 | C | 13
2 | C | 14
For each type, I want a different column. Since the number of types is exhaustive, I would like all different types to be enumerated and a corresponding column for each. I wanted to make id a primary key for the table.
So, the desired output is something like:
id | A's value | B's value | C's value
------------------------------------------
1 | 7 | 8 | 13
2 | 9 | 10 | 14
3 | 11 | 12 | NULL
Please note that this is a simplified version. The actual table T is derived from a much bigger table using group by. And for each group, I would like a separate column. Is that even possible?
Use conditional aggregation:
select id,
max(case when type = 'A' then value end) as a_value,
max(case when type = 'B' then value end) as b_value,
max(case when type = 'C' then value end) as c_value
from t
group by id;
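Here is a small sketch of the conditional aggregation run against the sample data, using SQLite via Python's sqlite3 module (the engine and the `pivoted` name are assumptions on my part). It shows that a missing type, like C for id 3, simply comes back as NULL.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table t (id int, type text, value int);
insert into t values
    (1, 'A', 7),  (1, 'B', 8),  (2, 'A', 9),  (2, 'B', 10),
    (3, 'A', 11), (3, 'B', 12), (1, 'C', 13), (2, 'C', 14);
""")

# Each CASE yields the value only for its own type; MAX ignores the NULLs.
pivoted = con.execute("""
select id,
       max(case when type = 'A' then value end) as a_value,
       max(case when type = 'B' then value end) as b_value,
       max(case when type = 'C' then value end) as c_value
from t
group by id
order by id
""").fetchall()
con.close()
```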
I'd recommend looking into the PIVOT function:
https://docs.snowflake.com/en/sql-reference/constructs/pivot.html
The main blocker with this function, though, is that the list of values for the pivot_column needs to be pre-determined. To do this, I normally use the LISTAGG function:
https://docs.snowflake.com/en/sql-reference/functions/listagg.html
I've included a query below to show you how to build that string, and doing this together in a script like Python or even a Stored Procedure should be fairly straightforward (build the pivot_column, build the aggregate/pivot command, execute the aggregate/pivot command).
I hope this helps...Rich
CREATE OR REPLACE TABLE monthly_sales(
empid INT,
amount INT,
month TEXT)
AS SELECT * FROM VALUES
(1, 10000, 'JAN'),
(1, 400, 'JAN'),
(2, 4500, 'JAN'),
(2, 35000, 'JAN'),
(1, 5000, 'FEB'),
(1, 3000, 'FEB'),
(2, 200, 'FEB'),
(2, 90500, 'FEB'),
(1, 6000, 'MAR'),
(1, 5000, 'MAR'),
(2, 2500, 'MAR'),
(2, 9500, 'MAR'),
(1, 8000, 'APR'),
(1, 10000, 'APR'),
(2, 800, 'APR'),
(2, 4500, 'APR');
SELECT *
FROM monthly_sales
PIVOT(SUM(amount)
FOR month IN ('JAN', 'FEB', 'MAR', 'APR'))
AS p
ORDER BY empid;
SELECT LISTAGG( DISTINCT ''''||month||'''', ', ' )
FROM monthly_sales;
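As a rough illustration of the first two scripted steps (build the pivot list, then build the pivot command), here is a sketch in Python. The `quote_literal` and `build_pivot_query` helpers are hypothetical names, and the month values are passed in as a plain list rather than fetched from Snowflake; executing the generated statement is left to whatever client you use.

```python
def quote_literal(value):
    """Wrap a value in single quotes, doubling any embedded quote."""
    return "'" + value.replace("'", "''") + "'"

def build_pivot_query(months):
    """Build the PIVOT statement text from the observed month values."""
    # dict.fromkeys removes duplicates while keeping first-seen order.
    distinct_months = list(dict.fromkeys(months))
    in_list = ", ".join(quote_literal(m) for m in distinct_months)
    return (
        "SELECT * FROM monthly_sales "
        f"PIVOT(SUM(amount) FOR month IN ({in_list})) AS p "
        "ORDER BY empid"
    )

# Month column values as they might come back from the table (assumed input).
months = ["JAN", "JAN", "FEB", "FEB", "MAR", "MAR", "APR", "APR"]
query = build_pivot_query(months)
```

Quoting the values in the script plays the same role as the `''''||month||''''` concatenation in the LISTAGG query.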

Creating natural hierarchical order using recursive SQL

I have a table holding categories with an inner parent child relationship.
The table looks like this:
ID | ParentID | OrderID
---+----------+---------
1 | Null | 1
2 | Null | 2
3 | 2 | 1
4 | 1 | 1
OrderID is the order inside the current level.
I want to create a recursive SQL query to create the natural order of the table.
Meaning the output will be something like:
ID | Order
-----+-------
1 | 100
4 | 101
2 | 200
3 | 201
Appreciate any help.
Thanks
I am not really sure what you mean by "natural order", but the following query generates the results you want for this data:
with t as (
select v.*
from (values (1, NULL, 1), (2, NULL, 2), (3, 2, 1), (4, 1, 1)) v(ID, ParentID, OrderID)
)
select t.*,
(100 * coalesce(tp.orderid, t.orderid) + (case when t.parentid is null then 0 else 1 end)) as natural_order
from t left join
t tp
on t.parentid = tp.id
order by natural_order;
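The join above handles one level of nesting. If the hierarchy can go deeper, a recursive CTE is one option. The sketch below is a different technique from the query above: it builds a sortable path of zero-padded OrderIDs instead of the 100/101 numbers. It runs in SQLite via Python's sqlite3 module; the `Categories` table name, the `sort_path` column, and the assumption that OrderID stays below 1000 are all mine.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table Categories (ID int, ParentID int, OrderID int);
insert into Categories values (1, null, 1), (2, null, 2), (3, 2, 1), (4, 1, 1);
""")

# Roots start the path; each child appends its own zero-padded OrderID,
# so sorting by the path string walks the tree parent-first.
ordered = con.execute("""
with recursive tree(id, sort_path) as (
    select ID, printf('%03d', OrderID)
    from Categories
    where ParentID is null
    union all
    select c.ID, tree.sort_path || '.' || printf('%03d', c.OrderID)
    from Categories c
    join tree on c.ParentID = tree.id
)
select id, sort_path from tree order by sort_path
""").fetchall()
con.close()
```

The row order matches the desired output (1, 4, 2, 3). If you need a numeric Order column, you could number the sorted rows afterwards.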

SQL aggregates over 3 tables

Well, this is annoying the hell out of me. Any help would be much appreciated.
I'm trying to get a count of how many project Ids and Steps there are. The relationships are:
Projects (n-1) Pages
Pages (n-1) Status Steps
Sample Project Data
id name
1 est et
2 quia nihil
Sample Pages Data
id project_id workflow_step_id
1 1 1
2 1 1
3 1 2
4 1 1
5 2 3
6 2 3
7 2 4
Sample Steps Data
id name
1 a
2 b
3 c
4 d
Expected Output
project_id name count_steps
1 a 3
1 b 1
2 c 2
2 d 1
Thanks!
An approach to meet the expected result. See it also at SQL Fiddle
CREATE TABLE Pages
("id" int, "project_id" int, "workflow_step_id" int)
;
INSERT INTO Pages
("id", "project_id", "workflow_step_id")
VALUES
(1, 1, 1),
(2, 1, 1),
(3, 1, 2),
(4, 1, 1),
(5, 2, 3),
(6, 2, 3),
(7, 2, 4)
;
CREATE TABLE workflow_steps
("id" int, "name" varchar(1))
;
INSERT INTO workflow_steps
("id", "name")
VALUES
(1, 'a'),
(2, 'b'),
(3, 'c'),
(4, 'd')
;
CREATE TABLE Projects
("id" int, "name" varchar(10))
;
INSERT INTO Projects
("id", "name")
VALUES
(1, 'est et'),
(2, 'quia nihil')
;
Query 1:
select pg.project_id, s.name, pg.workflow_step_id, ws.count_steps
from (
select distinct project_id, workflow_step_id
from pages ) pg
inner join (
select workflow_step_id, count(*) count_steps
from pages
group by workflow_step_id
) ws on pg.workflow_step_id = ws.workflow_step_id
inner join workflow_steps s on pg.workflow_step_id = s.id
order by project_id, name, workflow_step_id
Results:
| project_id | name | workflow_step_id | count_steps |
|------------|------|------------------|-------------|
| 1 | a | 1 | 3 |
| 1 | b | 2 | 1 |
| 2 | c | 3 | 2 |
| 2 | d | 4 | 1 |
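As far as I can tell, Query 1 counts pages per step across all projects, which matches the expected output here only because no step is shared between projects. A possible alternative is to group directly by project and step. The sketch below tries that in SQLite via Python's sqlite3 module; the engine and the `counts` name are my assumptions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table Pages (id int, project_id int, workflow_step_id int);
create table workflow_steps (id int, name text);
insert into Pages values
    (1, 1, 1), (2, 1, 1), (3, 1, 2), (4, 1, 1),
    (5, 2, 3), (6, 2, 3), (7, 2, 4);
insert into workflow_steps values (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
""")

# Count pages per (project, step) pair, so a step shared between projects
# would still be counted separately for each project.
counts = con.execute("""
select pg.project_id, s.name, count(*) as count_steps
from Pages pg
join workflow_steps s on s.id = pg.workflow_step_id
group by pg.project_id, s.name
order by pg.project_id, s.name
""").fetchall()
con.close()
```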