Aggregate by aggregate (ARRAY_AGG)? - sql

Let's say I have a simple table agg_test with 3 columns - id, column_1 and column_2. Dataset, for example:
id|column_1|column_2
--------------------
1| 1| 1
2| 1| 2
3| 1| 3
4| 1| 4
5| 2| 1
6| 3| 2
7| 4| 3
8| 4| 4
9| 5| 3
10| 5| 4
A query like this (with self join):
SELECT
a1.column_1,
a2.column_1,
ARRAY_AGG(DISTINCT a1.column_2 ORDER BY a1.column_2)
FROM agg_test a1
JOIN agg_test a2 ON a1.column_2 = a2.column_2 AND a1.column_1 <> a2.column_1
WHERE a1.column_1 = 1
GROUP BY a1.column_1, a2.column_1
Will produce a result like this:
column_1|column_1|array_agg
---------------------------
1| 2| {1}
1| 3| {2}
1| 4| {3,4}
1| 5| {3,4}
We can see that for values 4 and 5 from the joined table we have the same result in the last column. So, is it possible to somehow group the results by it, e.g:
column_1|column_1|array_agg
---------------------------
1| {2}| {1}
1| {3}| {2}
1| {4,5}| {3,4}
Thanks for any answers. If anything isn't clear or can be presented in a better way - tell me in the comments and I'll try to make this question as readable as I can.

I'm not sure if you can aggregate by an array. If you can, here is one approach:
select col1, array_agg(col2), ar
from (SELECT a1.column_1 as col1, a2.column_1 as col2,
ARRAY_AGG(DISTINCT a1.column_2 ORDER BY a1.column_2) as ar
FROM agg_test a1 JOIN
agg_test a2
ON a1.column_2 = a2.column_2 AND a1.column_1 <> a2.column_1
WHERE a1.column_1 = 1
GROUP BY a1.column_1, a2.column_1
) t
group by col1, ar
The alternative is to use array_to_string to convert the array values into a string and group by that string.
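To make the two-level grouping easier to follow, here is a minimal Python sketch of the same idea outside the database. It is not part of the original answers; the helper name group_by_shared_values and the ROWS constant are made up for illustration, and the data is copied from the question's table.

```python
from collections import defaultdict

# (id, column_1, column_2) rows copied from the question's agg_test table.
ROWS = [
    (1, 1, 1), (2, 1, 2), (3, 1, 3), (4, 1, 4), (5, 2, 1),
    (6, 3, 2), (7, 4, 3), (8, 4, 4), (9, 5, 3), (10, 5, 4),
]

def group_by_shared_values(rows, anchor=1):
    """Hypothetical helper that mirrors the nested GROUP BY in plain Python."""
    anchor_values = {c2 for _, c1, c2 in rows if c1 == anchor}

    # Level 1 (inner query): for each other column_1, collect the column_2
    # values it shares with the anchor.
    shared = defaultdict(set)
    for _, c1, c2 in rows:
        if c1 != anchor and c2 in anchor_values:
            shared[c1].add(c2)

    # Level 2 (outer query): group the column_1 values by that aggregated set.
    # A sorted tuple plays the role of the ordered array used as a grouping key.
    grouped = defaultdict(list)
    for c1, values in shared.items():
        grouped[tuple(sorted(values))].append(c1)

    return sorted((anchor, sorted(c1s), list(values)) for values, c1s in grouped.items())

result = group_by_shared_values(ROWS)
print(result)
```

The key point is that the inner aggregate has to be in a canonical order (sorted here, ORDER BY inside ARRAY_AGG in the query) so that equal sets compare equal when used as the grouping key.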

You could also try something like this:
SELECT DISTINCT
a1.column_1,
ARRAY_AGG(a2.column_1) OVER (
PARTITION BY
a1.column_1,
ARRAY_AGG(DISTINCT a1.column_2 ORDER BY a1.column_2)
) AS "a2.column_1 agg",
ARRAY_AGG(DISTINCT a1.column_2 ORDER BY a1.column_2)
FROM agg_test a1
JOIN agg_test a2 ON a1.column_2 = a2.column_2 AND a1.column_1 <> a2.column_1
WHERE a1.column_1 = 1
GROUP BY a1.column_1, a2.column_1
;
(The parts that differ from the query in your question are the DISTINCT keyword and the window ARRAY_AGG column.)
The above uses a window ARRAY_AGG to combine the values of a2.column_1 alongside the other ARRAY_AGG, using the latter's result as one of the partitioning criteria. Without the DISTINCT, it would produce two {4,5} rows for your example, so DISTINCT is needed to eliminate the duplicates.
Here's a SQL Fiddle demo: http://sqlfiddle.com/#!1/df5c3/4
Note, though, that the window ARRAY_AGG cannot have an ORDER BY like its "normal" counterpart. That means the order of a2.column_1 values in the list would be indeterminate, although in the linked demo it does happen to match the one in your expected output.

Related

Updating different unique value to each group

I have a table where all rows that share the same classification_id and application_id have the same group_id.
id |classification_id |application_id |authorisation_id |group_id |
------------------------------------+------------------------------------+------------------------------------+------------------------------------+------------------------------------+
54f614f3-7582-4ae9-a07e-5ff6d29e7a3b|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|25a7e1f7-4d8c-4e12-a10f-3654d7ef5ee9|8e563f95-ff0c-41e7-b211-d5ac6f78d056|
a01571a1-4f04-4ff9-9a7b-3a720736b9ec|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|302b23f1-ce57-4219-bcae-7bdbc3b86cb4|8e563f95-ff0c-41e7-b211-d5ac6f78d056|
3e18f2d0-4d5f-41b3-baf5-ba0feac8f43e|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|5e3bce60-b0d8-436c-9d33-b3a1d4c9a308|8e563f95-ff0c-41e7-b211-d5ac6f78d056|
b2ebe2ee-ffed-4e32-8abe-cd8b7d400646|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|a4edd12d-c19e-4e0d-badd-d3cf5e6d6d82|8e563f95-ff0c-41e7-b211-d5ac6f78d056|
ef01e6f7-f6ad-4d4d-b129-9c756734bef5|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|5e3bce60-b0d8-436c-9d33-b3a1d4c9a308|8e563f95-ff0c-41e7-b211-d5ac6f78d056|
7d340811-b679-49fd-bdd6-32a1bb9bbfed|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|25a7e1f7-4d8c-4e12-a10f-3654d7ef5ee9|8e563f95-ff0c-41e7-b211-d5ac6f78d056|
c45d7bb6-2146-48d0-a804-929cc42484cd|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|a4edd12d-c19e-4e0d-badd-d3cf5e6d6d82|8e563f95-ff0c-41e7-b211-d5ac6f78d056|
ddec5929-a08f-4f48-97f8-ccc2b85531ac|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|302b23f1-ce57-4219-bcae-7bdbc3b86cb4|8e563f95-ff0c-41e7-b211-d5ac6f78d056|
ae9edbb2-def3-4c4e-9a27-72454a09e146|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|a4edd12d-c19e-4e0d-badd-d3cf5e6d6d82|8e563f95-ff0c-41e7-b211-d5ac6f78d056|
3a3fd904-1988-4f8c-bf27-8cdf349b8431|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|25a7e1f7-4d8c-4e12-a10f-3654d7ef5ee9|8e563f95-ff0c-41e7-b211-d5ac6f78d056|
27c669b9-763c-49cf-887a-b9b1f85dc1ab|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|302b23f1-ce57-4219-bcae-7bdbc3b86cb4|8e563f95-ff0c-41e7-b211-d5ac6f78d056|
03820732-32c4-4cd4-910b-4e27fdd44bdf|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|5e3bce60-b0d8-436c-9d33-b3a1d4c9a308|8e563f95-ff0c-41e7-b211-d5ac6f78d056|
I've managed to sort out subgroups of this group by authorisation_id and I've created a group_helper which basically shows my end goal - from this data set I want to get three different groups:
id |classification_id |application_id |authorisation_id |group_id |group_helper|
------------------------------------+------------------------------------+------------------------------------+------------------------------------+------------------------------------+------------+
54f614f3-7582-4ae9-a07e-5ff6d29e7a3b|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|25a7e1f7-4d8c-4e12-a10f-3654d7ef5ee9|8e563f95-ff0c-41e7-b211-d5ac6f78d056| 2|
a01571a1-4f04-4ff9-9a7b-3a720736b9ec|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|302b23f1-ce57-4219-bcae-7bdbc3b86cb4|8e563f95-ff0c-41e7-b211-d5ac6f78d056| 2|
3e18f2d0-4d5f-41b3-baf5-ba0feac8f43e|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|5e3bce60-b0d8-436c-9d33-b3a1d4c9a308|8e563f95-ff0c-41e7-b211-d5ac6f78d056| 2|
b2ebe2ee-ffed-4e32-8abe-cd8b7d400646|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|a4edd12d-c19e-4e0d-badd-d3cf5e6d6d82|8e563f95-ff0c-41e7-b211-d5ac6f78d056| 2|
ef01e6f7-f6ad-4d4d-b129-9c756734bef5|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|5e3bce60-b0d8-436c-9d33-b3a1d4c9a308|8e563f95-ff0c-41e7-b211-d5ac6f78d056| 3|
7d340811-b679-49fd-bdd6-32a1bb9bbfed|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|25a7e1f7-4d8c-4e12-a10f-3654d7ef5ee9|8e563f95-ff0c-41e7-b211-d5ac6f78d056| 3|
c45d7bb6-2146-48d0-a804-929cc42484cd|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|a4edd12d-c19e-4e0d-badd-d3cf5e6d6d82|8e563f95-ff0c-41e7-b211-d5ac6f78d056| 3|
ddec5929-a08f-4f48-97f8-ccc2b85531ac|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|302b23f1-ce57-4219-bcae-7bdbc3b86cb4|8e563f95-ff0c-41e7-b211-d5ac6f78d056| 3|
ae9edbb2-def3-4c4e-9a27-72454a09e146|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|a4edd12d-c19e-4e0d-badd-d3cf5e6d6d82|8e563f95-ff0c-41e7-b211-d5ac6f78d056| |
3a3fd904-1988-4f8c-bf27-8cdf349b8431|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|25a7e1f7-4d8c-4e12-a10f-3654d7ef5ee9|8e563f95-ff0c-41e7-b211-d5ac6f78d056| |
27c669b9-763c-49cf-887a-b9b1f85dc1ab|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|302b23f1-ce57-4219-bcae-7bdbc3b86cb4|8e563f95-ff0c-41e7-b211-d5ac6f78d056| |
03820732-32c4-4cd4-910b-4e27fdd44bdf|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|5e3bce60-b0d8-436c-9d33-b3a1d4c9a308|8e563f95-ff0c-41e7-b211-d5ac6f78d056| |
Now I want each of those groups to have a different group_id. The rows where group_helper is NULL don't need updating, since the group_id they keep is already unique to them. Every row with group_helper = 2 should get the same new UUID (different from the one kept by the NULL rows), every row with group_helper = 3 should share another new UUID (different from both), and so on. This has to work for any number of group_helper values, because there can be many more than the two shown here.
So my end goal would look like this:
id |classification_id |application_id |authorisation_id |group_id |group_helper|
------------------------------------+------------------------------------+------------------------------------+------------------------------------+------------------------------------+------------+
54f614f3-7582-4ae9-a07e-5ff6d29e7a3b|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|25a7e1f7-4d8c-4e12-a10f-3654d7ef5ee9|fd3e63d1-d59c-477f-b58b-3ae3726c7992| 2|
a01571a1-4f04-4ff9-9a7b-3a720736b9ec|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|302b23f1-ce57-4219-bcae-7bdbc3b86cb4|fd3e63d1-d59c-477f-b58b-3ae3726c7992| 2|
3e18f2d0-4d5f-41b3-baf5-ba0feac8f43e|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|5e3bce60-b0d8-436c-9d33-b3a1d4c9a308|fd3e63d1-d59c-477f-b58b-3ae3726c7992| 2|
b2ebe2ee-ffed-4e32-8abe-cd8b7d400646|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|a4edd12d-c19e-4e0d-badd-d3cf5e6d6d82|fd3e63d1-d59c-477f-b58b-3ae3726c7992| 2|
ef01e6f7-f6ad-4d4d-b129-9c756734bef5|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|5e3bce60-b0d8-436c-9d33-b3a1d4c9a308|ed3ff96c-2f93-4182-8e4f-4594cb20cbb6| 3|
7d340811-b679-49fd-bdd6-32a1bb9bbfed|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|25a7e1f7-4d8c-4e12-a10f-3654d7ef5ee9|ed3ff96c-2f93-4182-8e4f-4594cb20cbb6| 3|
c45d7bb6-2146-48d0-a804-929cc42484cd|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|a4edd12d-c19e-4e0d-badd-d3cf5e6d6d82|ed3ff96c-2f93-4182-8e4f-4594cb20cbb6| 3|
ddec5929-a08f-4f48-97f8-ccc2b85531ac|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|302b23f1-ce57-4219-bcae-7bdbc3b86cb4|ed3ff96c-2f93-4182-8e4f-4594cb20cbb6| 3|
ae9edbb2-def3-4c4e-9a27-72454a09e146|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|a4edd12d-c19e-4e0d-badd-d3cf5e6d6d82|8e563f95-ff0c-41e7-b211-d5ac6f78d056| |
3a3fd904-1988-4f8c-bf27-8cdf349b8431|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|25a7e1f7-4d8c-4e12-a10f-3654d7ef5ee9|8e563f95-ff0c-41e7-b211-d5ac6f78d056| |
27c669b9-763c-49cf-887a-b9b1f85dc1ab|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|302b23f1-ce57-4219-bcae-7bdbc3b86cb4|8e563f95-ff0c-41e7-b211-d5ac6f78d056| |
03820732-32c4-4cd4-910b-4e27fdd44bdf|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|5e3bce60-b0d8-436c-9d33-b3a1d4c9a308|8e563f95-ff0c-41e7-b211-d5ac6f78d056| |
You can create a CTE that generates a new group_id for each distinct group_helper value, then apply it with update ... from ... (see demo):
with grouper(helper, gid) as
(select distinct on (group_helper)
group_helper
, gen_random_uuid()
from sometable
where group_helper is not null
order by group_helper
) --select * from grouper
update sometable
set group_id = gid
from grouper
where helper = group_helper;
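If UPDATE ... FROM is not available in your environment, the same effect can be had client-side. The following is a minimal sketch, assuming a simplified sometable (integer id, text group_id) in an in-memory SQLite database; the regroup helper is hypothetical and the UUIDs are generated in Python rather than with gen_random_uuid().

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the real table: only the columns the update touches.
conn.execute(
    "CREATE TABLE sometable (id INTEGER PRIMARY KEY, group_id TEXT, group_helper INTEGER)"
)
old_gid = "8e563f95-ff0c-41e7-b211-d5ac6f78d056"
conn.executemany(
    "INSERT INTO sometable (group_id, group_helper) VALUES (?, ?)",
    [(old_gid, 2), (old_gid, 2), (old_gid, 3), (old_gid, 3), (old_gid, None)],
)

def regroup(conn):
    """Hypothetical helper: one fresh UUID per non-NULL group_helper value."""
    helpers = [h for (h,) in conn.execute(
        "SELECT DISTINCT group_helper FROM sometable WHERE group_helper IS NOT NULL"
    )]
    # Rows with a NULL group_helper are never matched, so they keep their group_id.
    conn.executemany(
        "UPDATE sometable SET group_id = ? WHERE group_helper = ?",
        [(str(uuid.uuid4()), h) for h in helpers],
    )
    return conn.execute(
        "SELECT group_helper, group_id FROM sometable ORDER BY id"
    ).fetchall()

rows = regroup(conn)
for helper, gid in rows:
    print(helper, gid)
```

Generating the UUID once per helper value, rather than once per row, is what keeps every row of a subgroup on the same group_id.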

Inserting records in one table getting records from 2 different tables

I am new to DB2. I want to generate surrogate keys starting from the current maximum in one table, and this is what I have so far:
SELECT *
FROM ( SELECT EMP_NAME ,
EMP_ID ,
( ROW_NUMBER() OVER ( ) ) g
FROM STG.EMPLOYEE AS A
LEFT JOIN PRD.INDIVIDUAL AS B ON A.EMP_ID = B.SRC_KEY
WHERE B.SRC_KEY IS NULL
) V
CROSS JOIN ( SELECT ( COALESCE(MAX(INDVL_ID), 0) + 1 ) mm
FROM PRD.INDIVIDUAL
) B;
The above statement is used in an INSERT statement.
In the above code I want to add the maximum (mm, from the last subquery) to the row number, something like this:
select EMP_NAME, EMP_ID, max(INDVL_ID) + (ROW_NUMBER() over ()) g
from STG.EMPLOYEE
I hope that makes sense. Thanks in advance.
Sample Data is here
First table data
STG.EMPLOYEE
EMP_ID|EMP_NAME|
3| def|
4| ghi|
Second table data, from which I have to get the maximum
PRD.INDIVIDUAL
INDVL_ID|INDVL_NAME|SRC_KEY|
1| abc| 1|
Output Table
INDVL_ID|INDVL_NAME|SRC_KEY|
2| def| 3|
3| ghi| 4|
Below is an example of allocating surrogate key values based on the current MAX value (EMP_NAME and EMP_ID from the staging table feed INDVL_NAME and SRC_KEY):
INSERT INTO PRD.INDIVIDUAL (INDVL_ID, INDVL_NAME, SRC_KEY)
SELECT ROW_NUMBER() OVER()
+ (SELECT COALESCE(MAX(INDVL_ID),0) FROM PRD.INDIVIDUAL) AS INDVL_ID
, EMP_NAME
, EMP_ID
FROM
STG.EMPLOYEE
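The following is a small runnable sketch of this pattern, using Python's sqlite3 module as a stand-in for DB2 (an assumption; it needs an SQLite build with window functions). The table names stg_employee and prd_individual are flattened versions of the schema-qualified names, and the NOT EXISTS filter is added here to skip employees that are already loaded, as the LEFT JOIN in the question does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg_employee (EMP_ID INTEGER, EMP_NAME TEXT);
    CREATE TABLE prd_individual (INDVL_ID INTEGER, INDVL_NAME TEXT, SRC_KEY INTEGER);
    INSERT INTO stg_employee VALUES (3, 'def'), (4, 'ghi');
    INSERT INTO prd_individual VALUES (1, 'abc', 1);
""")

# Row number (1, 2, ...) plus the current maximum key gives the new keys.
# ORDER BY inside OVER makes the assignment deterministic.
conn.execute("""
    INSERT INTO prd_individual (INDVL_ID, INDVL_NAME, SRC_KEY)
    SELECT ROW_NUMBER() OVER (ORDER BY e.EMP_ID)
             + (SELECT COALESCE(MAX(INDVL_ID), 0) FROM prd_individual),
           e.EMP_NAME,
           e.EMP_ID
    FROM stg_employee AS e
    WHERE NOT EXISTS (
        SELECT 1 FROM prd_individual AS p WHERE p.SRC_KEY = e.EMP_ID
    )
""")

result = conn.execute(
    "SELECT INDVL_ID, INDVL_NAME, SRC_KEY FROM prd_individual ORDER BY INDVL_ID"
).fetchall()
print(result)
```

Note that deriving keys from MAX is only safe when a single session loads the table at a time; concurrent loaders could read the same maximum.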

Access Select - Combine table1 with summed values of table2

I've two tables in Access 2010:
Tab1:
Key|ValTab1
1| 100
2| 200
3| 300
Tab2:
Key|ValTab2
1| 1000
1| 7000
3| 3000
4| 4000
Desired Result:
Key| Val
1| 8100
2| 200
3| 3300
Is it possible to do this without selecting everything into one table and then group everything (in Microsoft Access)? Something like
SELECT Tab1.Key,Sum(Tab1.ValTab1+IIF(Tab2.ValTab2 Is Null,0,Tab2.ValTab2)) AS Val
FROM Tab1 LEFT JOIN Tab2
ON Tab1.Key = Tab2.Key
GROUP BY Tab1.Key;
But this results in Key 1/Val 8200
Problem #2:
Extend Tab1 to
Cat|Key|ValTab1
1| 1| 100
1| 2| 200
1| 3| 300
2| 4| 20
3| 5| 1
Is it possible to make a connection from Cat using Tab1.Key=Tab2.Key to get Sum(ValTab1+ValTab2)?
Applying FuzzyTree's Solution max(tab1.val) + sum(tab2.val) for Problem #1:
This would mean something like
SELECT Tab1.Cat, Max(Tab1.ValTab1) + Sum(IIF(Tab2.ValTab2 Is Null,0,Tab2.ValTab2)) AS Val
FROM Tab1 LEFT JOIN Tab2
ON Tab1.Key = Tab2.Key
GROUP BY Tab1.Cat;
With the desired result:
Cat| Val
1|11600
2| 4020
3| 1
Thanks in advance!
I think a correlated subquery might be easiest in this case:
select tab1.*,
(tab1.valtab1 +
(select sum(valtab2)
from tab2
where tab2.key = tab1.key
)
) as val
from tab1;
You may need to use nz():
select tab1.*,
(tab1.valtab1 +
nz((select sum(valtab2)
from tab2
where tab2.key = tab1.key
), 0)
) as val
from tab1;
If the keys in tab1 are unique, then you should add their values only once, which you can do with max(tab1.val) + sum(tab2.val). Wrapping the sum in Nz() keeps keys that have no rows in Tab2 from returning Null,
i.e.
SELECT Tab1.Key, Max(Tab1.ValTab1) + Nz(Sum(Tab2.ValTab2), 0) AS Val
FROM Tab1 LEFT JOIN Tab2
ON Tab1.Key = Tab2.Key
GROUP BY Tab1.Key;
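To see why the Tab1 value must be counted once per key, here is a minimal check with Python's sqlite3 module standing in for Access (an assumption). COALESCE replaces Nz/IIF, and the Key column is renamed k to stay clear of reserved words; both are choices made for this sketch only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Tab1 (k INTEGER, ValTab1 INTEGER);
    CREATE TABLE Tab2 (k INTEGER, ValTab2 INTEGER);
    INSERT INTO Tab1 VALUES (1, 100), (2, 200), (3, 300);
    INSERT INTO Tab2 VALUES (1, 1000), (1, 7000), (3, 3000), (4, 4000);
""")

# Summing inside the join adds ValTab1 once per matching Tab2 row.
naive = conn.execute("""
    SELECT Tab1.k, SUM(Tab1.ValTab1 + COALESCE(Tab2.ValTab2, 0))
    FROM Tab1 LEFT JOIN Tab2 ON Tab1.k = Tab2.k
    GROUP BY Tab1.k
    ORDER BY Tab1.k
""").fetchall()

# MAX picks the single Tab1 value per key; SUM covers only the Tab2 rows.
fixed = conn.execute("""
    SELECT Tab1.k, MAX(Tab1.ValTab1) + COALESCE(SUM(Tab2.ValTab2), 0)
    FROM Tab1 LEFT JOIN Tab2 ON Tab1.k = Tab2.k
    GROUP BY Tab1.k
    ORDER BY Tab1.k
""").fetchall()

print(naive)  # key 1 is counted twice: 8200
print(fixed)  # key 1 matches the desired 8100
```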

For each operation

I have an audit table (AUDIT) as follows:
empid|division id|dept id|lastupdated
1| A| 20|xxxxx
3| C| 10|xxxxxx
6| D| 10|xxxxxx
1| D| 10|xxxxxx
1| B| 10|xxxxxx
3| E| 10|xxxxxx
For each row in this table, I want to compare the data with the immediately preceding record (based on the lastupdated date).
The result should keep only the records whose dept id differs between the two records being compared.
Pseudocode:
For each record in AUDIT t1
1.Select dept id,max(lastupdated) from AUDIT t2 where t1.lastupdated > t2.lastupdated
2.Select t1.empid if t1.deptid<> t2.deptid(from step 1)
Is this possible as a single sql query - rather than temp table operations?
select *
from AUDIT a
inner join AUDIT b
on a.empid = b.empid and a.[division id] = b.[division id]
and b.lastupdated = (select max(lastupdated) from AUDIT c where c.lastupdated < a.lastupdated)
where a.[dept id] <> b.[dept id]
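Where window functions are available, LAG is a different way to express the pseudocode's "previous record by lastupdated" directly. This is a swapped-in technique, not the join shown above, and it compares against the previous row of the whole table without matching on empid or division id. The sketch below uses Python's sqlite3 module; the column names division_id and dept_id and the timestamps are invented for illustration, since the question elides the real lastupdated values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE audit (empid INTEGER, division_id TEXT, dept_id INTEGER, lastupdated TEXT);
    -- Hypothetical timestamps, increasing in the order the rows are listed.
    INSERT INTO audit VALUES
        (1, 'A', 20, '2024-01-01'),
        (3, 'C', 10, '2024-01-02'),
        (6, 'D', 10, '2024-01-03'),
        (1, 'D', 10, '2024-01-04'),
        (1, 'B', 10, '2024-01-05'),
        (3, 'E', 10, '2024-01-06');
""")

# LAG fetches dept_id from the row just before the current one in lastupdated order.
# The first row has no predecessor (NULL), so the <> filter drops it.
changed = conn.execute("""
    SELECT empid, dept_id, prev_dept_id, lastupdated
    FROM (
        SELECT empid, dept_id, lastupdated,
               LAG(dept_id) OVER (ORDER BY lastupdated) AS prev_dept_id
        FROM audit
    ) AS t
    WHERE dept_id <> prev_dept_id
    ORDER BY lastupdated
""").fetchall()

print(changed)
```

With these made-up timestamps only the second row differs from its predecessor (dept 10 after dept 20).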

SQL; Only count the values specified in each column

In SQL I have a column called "answer", and the value can either be 1 or 2. I need to generate an SQL query which counts the number of 1's and 2's for each month. I have the following query, but it does not work:
SELECT MONTH(`date`), YEAR(`date`),COUNT(`answer`=1) as yes,
COUNT(`answer`=2) as nope, COUNT(*) as total
FROM results
GROUP BY YEAR(`date`), MONTH(`date`)
I would group by the year, the month, and in addition the answer itself. This results in two rows per month: one counting the occurrences of answer 1 and another for answer 2 (it is also generic for additional answer values):
SELECT MONTH(`date`), YEAR(`date`), answer, COUNT(*)
FROM results
GROUP BY YEAR(`date`), MONTH(`date`), answer
Try the SUM-CASE trick:
SELECT
MONTH(`date`),
YEAR(`date`),
SUM(case when `answer` = 1 then 1 else 0 end) as yes,
SUM(case when `answer` = 2 then 1 else 0 end) as nope,
COUNT(*) as total
FROM results
GROUP BY YEAR(`date`), MONTH(`date`)
SELECT year,
month,
answer,
COUNT(answer) AS quantity
FROM results
GROUP BY year, month, answer
year|month|answer|quantity
2001| 1| 1| 2
2001| 1| 2| 1
2004| 1| 1| 2
2004| 1| 2| 2
SELECT * FROM results;
year|month|answer
2001| 1| 1
2001| 1| 1
2001| 1| 2
2004| 1| 1
2004| 1| 1
2004| 1| 2
2004| 1| 2
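As a quick check of why the question's COUNT(answer = 1) does not work while the SUM-CASE form does, here is a minimal sketch that loads the sample rows above into an in-memory SQLite database via Python's sqlite3 module. SQLite is only a stand-in here (an assumption), and the table uses the year and month columns from the sample rather than a date column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE results (year INTEGER, month INTEGER, answer INTEGER);
    INSERT INTO results VALUES
        (2001, 1, 1), (2001, 1, 1), (2001, 1, 2),
        (2004, 1, 1), (2004, 1, 1), (2004, 1, 2), (2004, 1, 2);
""")

# COUNT(expr) counts rows where expr is not NULL. The comparison answer = 1
# evaluates to 0 or 1, never NULL here, so it counts every row in the group.
rows = conn.execute("""
    SELECT year, month,
           COUNT(answer = 1) AS count_of_comparison,
           SUM(CASE WHEN answer = 1 THEN 1 ELSE 0 END) AS yes,
           SUM(CASE WHEN answer = 2 THEN 1 ELSE 0 END) AS nope,
           COUNT(*) AS total
    FROM results
    GROUP BY year, month
    ORDER BY year, month
""").fetchall()

for row in rows:
    print(row)
```

In each row count_of_comparison equals total, which is the symptom described in the question, while yes and nope split the total as intended.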