In SQL I have a column called "answer", and the value can either be 1 or 2. I need to generate an SQL query which counts the number of 1's and 2's for each month. I have the following query, but it does not work:
SELECT MONTH(`date`), YEAR(`date`), COUNT(`answer`=1) as yes,
COUNT(`answer`=2) as nope, COUNT(*) as total
FROM results
GROUP BY YEAR(`date`), MONTH(`date`)
I would group by the year, the month, and, in addition, the answer itself. This results in two rows per month: one counting the occurrences of answer 1 and another for answer 2 (it also generalizes to additional answer values):
SELECT MONTH(`date`), YEAR(`date`), answer, COUNT(*)
FROM results
GROUP BY YEAR(`date`), MONTH(`date`), answer
Try the SUM-CASE trick:
SELECT
MONTH(`date`),
YEAR(`date`),
SUM(case when `answer` = 1 then 1 else 0 end) as yes,
SUM(case when `answer` = 2 then 1 else 0 end) as nope,
COUNT(*) as total
FROM results
GROUP BY YEAR(`date`), MONTH(`date`)
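If you happen to be on MySQL (the backticks suggest it), a boolean comparison already evaluates to 1 or 0, so SUM can take the comparison directly. A minimal sketch of that shorthand, assuming MySQL:
-- same idea as the SUM-CASE version; `answer` = 1 yields 1 or 0 in MySQL
SELECT MONTH(`date`), YEAR(`date`),
SUM(`answer` = 1) as yes,
SUM(`answer` = 2) as nope,
COUNT(*) as total
FROM results
GROUP BY YEAR(`date`), MONTH(`date`)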
SELECT year,
month,
answer,
COUNT(answer) AS quantity
FROM results
GROUP BY year, month, answer
year|month|answer|quantity
2001| 1| 1| 2
2001| 1| 2| 1
2004| 1| 1| 2
2004| 1| 2| 2
SELECT * FROM results;
year|month|answer
2001| 1| 1
2001| 1| 1
2001| 1| 2
2004| 1| 1
2004| 1| 1
2004| 1| 2
2004| 1| 2
I have a problem building an efficient query to get a running total of sales between two dates.
Now I have this query:
select SalesId,
sum(Sales) as number_of_sales,
Sales_DATE as SalesDate,
ADD_MONTHS(Sales_DATE , -12) as SalesDatePrevYear
from DWH.L_SALES
group by SalesId, Sales_DATE
With the result:
| SalesId| number_of_sales| SalesDate|SalesDatePrevYear|
|:---- |:------:| :-----:|-----:|
| 1000| 1| 20200101|20190101|
| 1001| 1| 20220101|20210101|
| 1002| 1| 20220201|20210201|
| 1003| 1| 20220301|20210301|
The preferred result is the following:
| SalesId| number_of_sales| running total of sales | SalesDate|SalesDatePrevYear|
|:---- |:------:| :-----:| :-----:|-----:|
| 1000| 1| 1 | 20200101|20190101|
| 1001| 1| 1 | 20220101|20210101|
| 1002| 1| 2| 20220201|20210201|
| 1003| 1| 3|20220301|20210301|
As you can see, I want the total of Sales between the two dates, but because I also need the lower level (SalesId), it always stays at 1.
How can I get this efficiently?
You have successfully gotten a result that gives you the start and end dates you care about, so you just need to take that result, join it back to the original data with an inequality join, and sum it up. I suggest looking into CTEs (Common Table Expressions), which are helpful for learning and debugging.
For example,
WITH CTE_BASE_RESULT AS
(
your query goes here
)
SELECT CTE_BASE_RESULT.SalesId, CTE_BASE_RESULT.SalesDate, SUM(Sales) AS Total_Sales_Prior_Year
FROM CTE_BASE_RESULT
INNER JOIN DWH.L_Sales
ON CTE_BASE_RESULT.SalesId = L_Sales.SalesId
-- keep only the sales whose date falls inside each row's trailing one-year window
AND CTE_BASE_RESULT.SalesDate >= L_Sales.Sales_DATE
AND CTE_BASE_RESULT.SalesDatePrevYear < L_Sales.Sales_DATE
GROUP BY CTE_BASE_RESULT.SalesId, CTE_BASE_RESULT.SalesDate
I also recommend a website like SQL Generator that can help write complex operations; it calls this pattern a Timeseries Aggregate.
This syntax works for Snowflake; I didn't see which system you're on.
Alternatively,
WITH BASIC_OFFSET_1YEAR AS (
    -- for every sale, total the sales by the same SalesId in the trailing year
    SELECT
        A.SalesId,
        A.Sales_DATE,
        SUM(B.Sales) as SUM_SALES_PAST1YEAR
    FROM
        DWH.L_Sales A
        INNER JOIN DWH.L_Sales B ON A.SalesId = B.SalesId
    WHERE
        B.Sales_DATE >= DATEADD(YEAR, -1, A.Sales_DATE)
        AND B.Sales_DATE <= A.Sales_DATE
    GROUP BY
        A.SalesId,
        A.Sales_DATE
)
SELECT
    src.*, BASIC_OFFSET_1YEAR.SUM_SALES_PAST1YEAR
FROM
    DWH.L_Sales src
    LEFT OUTER JOIN BASIC_OFFSET_1YEAR
        ON BASIC_OFFSET_1YEAR.Sales_DATE = src.Sales_DATE
        AND BASIC_OFFSET_1YEAR.SalesId = src.SalesId
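If your platform supports interval-based RANGE window frames (Oracle and PostgreSQL 11+ do, for example), a window function can also produce the rolling 12-month total directly, with no self join. This is only a sketch, assuming Sales_DATE is a real date/timestamp column (PostgreSQL-style interval syntax shown):
-- rolling total of all sales in the 12 months up to and including each row's date
SELECT SalesId,
       Sales_DATE as SalesDate,
       SUM(Sales) OVER (
           ORDER BY Sales_DATE
           RANGE BETWEEN INTERVAL '1 year' PRECEDING AND CURRENT ROW
       ) as running_total_of_sales
FROM DWH.L_SALES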
I have a table where everything that has the same classification_id and application_id has the same group_id.
id |classification_id |application_id |authorisation_id |group_id |
------------------------------------+------------------------------------+------------------------------------+------------------------------------+------------------------------------+
54f614f3-7582-4ae9-a07e-5ff6d29e7a3b|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|25a7e1f7-4d8c-4e12-a10f-3654d7ef5ee9|8e563f95-ff0c-41e7-b211-d5ac6f78d056|
a01571a1-4f04-4ff9-9a7b-3a720736b9ec|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|302b23f1-ce57-4219-bcae-7bdbc3b86cb4|8e563f95-ff0c-41e7-b211-d5ac6f78d056|
3e18f2d0-4d5f-41b3-baf5-ba0feac8f43e|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|5e3bce60-b0d8-436c-9d33-b3a1d4c9a308|8e563f95-ff0c-41e7-b211-d5ac6f78d056|
b2ebe2ee-ffed-4e32-8abe-cd8b7d400646|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|a4edd12d-c19e-4e0d-badd-d3cf5e6d6d82|8e563f95-ff0c-41e7-b211-d5ac6f78d056|
ef01e6f7-f6ad-4d4d-b129-9c756734bef5|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|5e3bce60-b0d8-436c-9d33-b3a1d4c9a308|8e563f95-ff0c-41e7-b211-d5ac6f78d056|
7d340811-b679-49fd-bdd6-32a1bb9bbfed|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|25a7e1f7-4d8c-4e12-a10f-3654d7ef5ee9|8e563f95-ff0c-41e7-b211-d5ac6f78d056|
c45d7bb6-2146-48d0-a804-929cc42484cd|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|a4edd12d-c19e-4e0d-badd-d3cf5e6d6d82|8e563f95-ff0c-41e7-b211-d5ac6f78d056|
ddec5929-a08f-4f48-97f8-ccc2b85531ac|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|302b23f1-ce57-4219-bcae-7bdbc3b86cb4|8e563f95-ff0c-41e7-b211-d5ac6f78d056|
ae9edbb2-def3-4c4e-9a27-72454a09e146|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|a4edd12d-c19e-4e0d-badd-d3cf5e6d6d82|8e563f95-ff0c-41e7-b211-d5ac6f78d056|
3a3fd904-1988-4f8c-bf27-8cdf349b8431|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|25a7e1f7-4d8c-4e12-a10f-3654d7ef5ee9|8e563f95-ff0c-41e7-b211-d5ac6f78d056|
27c669b9-763c-49cf-887a-b9b1f85dc1ab|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|302b23f1-ce57-4219-bcae-7bdbc3b86cb4|8e563f95-ff0c-41e7-b211-d5ac6f78d056|
03820732-32c4-4cd4-910b-4e27fdd44bdf|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|5e3bce60-b0d8-436c-9d33-b3a1d4c9a308|8e563f95-ff0c-41e7-b211-d5ac6f78d056|
I've managed to sort out subgroups of this group by authorisation_id, and I've created a group_helper column which basically shows my end goal: from this data set I want to get three different groups:
id |classification_id |application_id |authorisation_id |group_id |group_helper|
------------------------------------+------------------------------------+------------------------------------+------------------------------------+------------------------------------+------------+
54f614f3-7582-4ae9-a07e-5ff6d29e7a3b|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|25a7e1f7-4d8c-4e12-a10f-3654d7ef5ee9|8e563f95-ff0c-41e7-b211-d5ac6f78d056| 2|
a01571a1-4f04-4ff9-9a7b-3a720736b9ec|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|302b23f1-ce57-4219-bcae-7bdbc3b86cb4|8e563f95-ff0c-41e7-b211-d5ac6f78d056| 2|
3e18f2d0-4d5f-41b3-baf5-ba0feac8f43e|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|5e3bce60-b0d8-436c-9d33-b3a1d4c9a308|8e563f95-ff0c-41e7-b211-d5ac6f78d056| 2|
b2ebe2ee-ffed-4e32-8abe-cd8b7d400646|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|a4edd12d-c19e-4e0d-badd-d3cf5e6d6d82|8e563f95-ff0c-41e7-b211-d5ac6f78d056| 2|
ef01e6f7-f6ad-4d4d-b129-9c756734bef5|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|5e3bce60-b0d8-436c-9d33-b3a1d4c9a308|8e563f95-ff0c-41e7-b211-d5ac6f78d056| 3|
7d340811-b679-49fd-bdd6-32a1bb9bbfed|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|25a7e1f7-4d8c-4e12-a10f-3654d7ef5ee9|8e563f95-ff0c-41e7-b211-d5ac6f78d056| 3|
c45d7bb6-2146-48d0-a804-929cc42484cd|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|a4edd12d-c19e-4e0d-badd-d3cf5e6d6d82|8e563f95-ff0c-41e7-b211-d5ac6f78d056| 3|
ddec5929-a08f-4f48-97f8-ccc2b85531ac|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|302b23f1-ce57-4219-bcae-7bdbc3b86cb4|8e563f95-ff0c-41e7-b211-d5ac6f78d056| 3|
ae9edbb2-def3-4c4e-9a27-72454a09e146|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|a4edd12d-c19e-4e0d-badd-d3cf5e6d6d82|8e563f95-ff0c-41e7-b211-d5ac6f78d056| |
3a3fd904-1988-4f8c-bf27-8cdf349b8431|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|25a7e1f7-4d8c-4e12-a10f-3654d7ef5ee9|8e563f95-ff0c-41e7-b211-d5ac6f78d056| |
27c669b9-763c-49cf-887a-b9b1f85dc1ab|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|302b23f1-ce57-4219-bcae-7bdbc3b86cb4|8e563f95-ff0c-41e7-b211-d5ac6f78d056| |
03820732-32c4-4cd4-910b-4e27fdd44bdf|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|5e3bce60-b0d8-436c-9d33-b3a1d4c9a308|8e563f95-ff0c-41e7-b211-d5ac6f78d056| |
Now I want each of those groups to have a different group_id. I don't have to update the rows where group_helper is NULL, since their group is already unique. I want to give every row with group_helper = 2 the same UUID (but a different one from the rows where group_helper is NULL), every row with group_helper = 3 another shared UUID (different again from the NULL group and from group 2), and so on. This has to work for any number of group_helper values, because there can be many more than two.
So my end goal would look like this:
id |classification_id |application_id |authorisation_id |group_id |group_helper|
------------------------------------+------------------------------------+------------------------------------+------------------------------------+------------------------------------+------------+
54f614f3-7582-4ae9-a07e-5ff6d29e7a3b|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|25a7e1f7-4d8c-4e12-a10f-3654d7ef5ee9|fd3e63d1-d59c-477f-b58b-3ae3726c7992| 2|
a01571a1-4f04-4ff9-9a7b-3a720736b9ec|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|302b23f1-ce57-4219-bcae-7bdbc3b86cb4|fd3e63d1-d59c-477f-b58b-3ae3726c7992| 2|
3e18f2d0-4d5f-41b3-baf5-ba0feac8f43e|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|5e3bce60-b0d8-436c-9d33-b3a1d4c9a308|fd3e63d1-d59c-477f-b58b-3ae3726c7992| 2|
b2ebe2ee-ffed-4e32-8abe-cd8b7d400646|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|a4edd12d-c19e-4e0d-badd-d3cf5e6d6d82|fd3e63d1-d59c-477f-b58b-3ae3726c7992| 2|
ef01e6f7-f6ad-4d4d-b129-9c756734bef5|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|5e3bce60-b0d8-436c-9d33-b3a1d4c9a308|ed3ff96c-2f93-4182-8e4f-4594cb20cbb6| 3|
7d340811-b679-49fd-bdd6-32a1bb9bbfed|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|25a7e1f7-4d8c-4e12-a10f-3654d7ef5ee9|ed3ff96c-2f93-4182-8e4f-4594cb20cbb6| 3|
c45d7bb6-2146-48d0-a804-929cc42484cd|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|a4edd12d-c19e-4e0d-badd-d3cf5e6d6d82|ed3ff96c-2f93-4182-8e4f-4594cb20cbb6| 3|
ddec5929-a08f-4f48-97f8-ccc2b85531ac|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|302b23f1-ce57-4219-bcae-7bdbc3b86cb4|ed3ff96c-2f93-4182-8e4f-4594cb20cbb6| 3|
ae9edbb2-def3-4c4e-9a27-72454a09e146|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|a4edd12d-c19e-4e0d-badd-d3cf5e6d6d82|8e563f95-ff0c-41e7-b211-d5ac6f78d056| |
3a3fd904-1988-4f8c-bf27-8cdf349b8431|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|25a7e1f7-4d8c-4e12-a10f-3654d7ef5ee9|8e563f95-ff0c-41e7-b211-d5ac6f78d056| |
27c669b9-763c-49cf-887a-b9b1f85dc1ab|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|302b23f1-ce57-4219-bcae-7bdbc3b86cb4|8e563f95-ff0c-41e7-b211-d5ac6f78d056| |
03820732-32c4-4cd4-910b-4e27fdd44bdf|63a7b151-2b8d-4b6a-b9a1-108a80ae4cdf|cd3d597b-25d1-4b4b-92f0-2ad8fcb4698c|5e3bce60-b0d8-436c-9d33-b3a1d4c9a308|8e563f95-ff0c-41e7-b211-d5ac6f78d056| |
You can create a CTE that generates a new group_id, selecting a single value for each distinct group_helper, then use update ... from ... (see demo):
with grouper(helper, gid) as
(select distinct on (group_helper)
group_helper
, gen_random_uuid()
from sometable
where group_helper is not null
order by group_helper
) --select * from grouper
update sometable
set group_id = gid
from grouper
where helper = group_helper;
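To sanity-check the result afterwards, a query along these lines (illustrative only, reusing the sometable name from the demo) should show exactly one distinct group_id per group_helper value, with the NULL group keeping its original group_id:
select group_helper, count(distinct group_id) as distinct_group_ids, count(*) as row_count
from sometable
group by group_helper
order by group_helper;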
How can I return only the records with the latest upload_date(s) from the data below?
My data is as follows:
upload_date |day_name |rows_added|row_count_delta|days_since_last_update|
-----------------------+---------+----------+---------------+----------------------+
2022-05-01 00:00:00.000|Sunday | 526043| | |
2022-05-02 00:00:00.000|Monday | 467082| -58961| 1|
2022-05-02 15:58:54.094|Monday | 421427| -45655| 0|
2022-05-02 18:19:22.894|Monday | 421427| 0| 0|
2022-05-03 16:54:04.136|Tuesday | 496021| 74594| 1|
2022-05-03 18:17:27.502|Tuesday | 496021| 0| 0|
2022-05-04 18:19:26.392|Wednesday| 487154| -8867| 1|
2022-05-05 18:18:15.277|Thursday | 489713| 2559| 1|
2022-05-06 16:15:39.518|Friday | 489713| 0| 1|
2022-05-07 16:18:00.916|Saturday | 482955| -6758| 1|
My desired results should be:
upload_date |day_name |rows_added|row_count_delta|days_since_last_update|
-----------------------+---------+----------+---------------+----------------------+
2022-05-01 00:00:00.000|Sunday | 526043| | |
2022-05-02 18:19:22.894|Monday | 421427| 0| 0|
2022-05-03 18:17:27.502|Tuesday | 496021| 0| 0|
2022-05-04 18:19:26.392|Wednesday| 487154| -8867| 1|
2022-05-05 18:18:15.277|Thursday | 489713| 2559| 1|
2022-05-06 16:15:39.518|Friday | 489713| 0| 1|
2022-05-07 16:18:00.916|Saturday | 482955| -6758| 1|
NOTE: only the latest upload_date for 2022-05-02 and 2022-05-03 should be in the result set.
You can use a window function to PARTITION BY day (casting the timestamp to a date) and order the rows within each day by upload_date descending, so the most recent comes first. ROW_NUMBER() then assigns 1 to the most recent record per date; just filter on that row number. Note that I am assuming the datatype of upload_date is TIMESTAMP in this case.
SELECT
*
FROM (
SELECT
your_table.*,
ROW_NUMBER() OVER (PARTITION BY CAST(upload_date AS DATE)
ORDER BY upload_date DESC) rownum
FROM your_table
) t
WHERE rownum = 1
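If you happen to be on a database that supports QUALIFY (Snowflake, BigQuery, or Teradata, for example; PostgreSQL does not), the same idea works without the derived table. A sketch under that assumption:
SELECT *
FROM your_table
QUALIFY ROW_NUMBER() OVER (PARTITION BY CAST(upload_date AS DATE)
                           ORDER BY upload_date DESC) = 1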
demo
WITH cte AS (
SELECT
max(upload_date) OVER (PARTITION BY upload_date::date) AS max_upload_date,
upload_date,
day_name,
rows_added,
row_count_delta,
days_since_last_update
FROM test101 ORDER BY 1
)
SELECT
upload_date,
day_name,
rows_added,
row_count_delta,
days_since_last_update
FROM
cte
WHERE
max_upload_date = upload_date;
This is more verbose but I find it easier to read and build:
SELECT *
FROM mytable t1
JOIN (
SELECT CAST(upload_date AS DATE) day_date, MAX(upload_date) max_date
FROM mytable
GROUP BY day_date) t2
ON t1.upload_date = t2.max_date AND
CAST(t1.upload_date AS DATE) = t2.day_date;
I don't know about performance offhand, but I suspect the window function is worse because it needs to sort, which is usually a slow operation unless your table already has an index for it.
Use DISTINCT ON:
SELECT DISTINCT ON (date_trunc('day', upload_date))
to_char(upload_date, 'Day') AS weekday, *  -- optional: adds the weekday name
FROM tbl
ORDER BY date_trunc('day', upload_date), upload_date DESC;
db<>fiddle here
For a few rows per day (as your sample data suggests) it's the simplest and fastest solution possible. See:
Select first row in each GROUP BY group?
I dropped the redundant column day_name from the table. That's just a redundant representation of the timestamp. Storing it only adds cost, noise, and opportunities for inconsistent data. If you need the weekday displayed, use to_char(upload_date, 'Day') AS weekday as demonstrated above.
The query works for any number of days, not restricted to 7 weekdays.
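For bigger tables with many rows per day, a matching expression index lets DISTINCT ON pick the latest row per day without a full sort. A sketch, assuming PostgreSQL and that upload_date is a timestamp without time zone (date_trunc is only immutable, and therefore indexable, for that type):
-- supports ORDER BY date_trunc('day', upload_date), upload_date DESC
CREATE INDEX tbl_upload_day_idx ON tbl (date_trunc('day', upload_date), upload_date DESC);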
Let's say I have a simple table agg_test with 3 columns - id, column_1 and column_2. Dataset, for example:
id|column_1|column_2
--------------------
1| 1| 1
2| 1| 2
3| 1| 3
4| 1| 4
5| 2| 1
6| 3| 2
7| 4| 3
8| 4| 4
9| 5| 3
10| 5| 4
A query like this (with self join):
SELECT
a1.column_1,
a2.column_1,
ARRAY_AGG(DISTINCT a1.column_2 ORDER BY a1.column_2)
FROM agg_test a1
JOIN agg_test a2 ON a1.column_2 = a2.column_2 AND a1.column_1 <> a2.column_1
WHERE a1.column_1 = 1
GROUP BY a1.column_1, a2.column_1
Will produce a result like this:
column_1|column_1|array_agg
---------------------------
1| 2| {1}
1| 3| {2}
1| 4| {3,4}
1| 5| {3,4}
We can see that for values 4 and 5 from the joined table we have the same result in the last column. So, is it possible to somehow group the results by it, e.g:
column_1|column_1|array_agg
---------------------------
1| {2}| {1}
1| {3}| {2}
1| {4,5}| {3,4}
Thanks for any answers. If anything isn't clear or can be presented in a better way - tell me in the comments and I'll try to make this question as readable as I can.
I'm not sure if you can aggregate by an array. If you can, here is one approach:
select col1, array_agg(col2), ar
from (SELECT a1.column_1 as col1, a2.column_1 as col2,
ARRAY_AGG(DISTINCT a1.column_2 ORDER BY a1.column_2) as ar
FROM agg_test a1 JOIN
agg_test a2
ON a1.column_2 = a2.column_2 AND a1.column_1 <> a2.column_1
WHERE a1.column_1 = 1
GROUP BY a1.column_1, a2.column_1
) t
group by col1, ar
The alternative is to use array_to_string() to convert the array values into a string you can group by.
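For example, a sketch of that string-based variant, assuming PostgreSQL's array_to_string():
select col1, array_agg(col2), ar_text
from (SELECT a1.column_1 as col1, a2.column_1 as col2,
             array_to_string(ARRAY_AGG(DISTINCT a1.column_2 ORDER BY a1.column_2), ',') as ar_text
      FROM agg_test a1 JOIN
           agg_test a2
           ON a1.column_2 = a2.column_2 AND a1.column_1 <> a2.column_1
      WHERE a1.column_1 = 1
      GROUP BY a1.column_1, a2.column_1
     ) t
group by col1, ar_text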
You could also try something like this:
SELECT DISTINCT
a1.column_1,
ARRAY_AGG(a2.column_1) OVER (
PARTITION BY
a1.column_1,
ARRAY_AGG(DISTINCT a1.column_2 ORDER BY a1.column_2)
) AS "a2.column_1 agg",
ARRAY_AGG(DISTINCT a1.column_2 ORDER BY a1.column_2)
FROM agg_test a1
JOIN agg_test a2 ON a1.column_2 = a2.column_2 AND a1.column_1 <> a2.column_1
WHERE a1.column_1 = 1
GROUP BY a1.column_1, a2.column_1
;
(The window ARRAY_AGG and the added DISTINCT are the parts that differ from the query you've posted in your question.)
The above uses a window ARRAY_AGG to combine the values of a2.column_1 alongside the other ARRAY_AGG, using the latter's result as one of the partitioning criteria. Without the DISTINCT, it would produce two {4,5} rows for your example, so DISTINCT is needed to eliminate the duplicates.
Here's a SQL Fiddle demo: http://sqlfiddle.com/#!1/df5c3/4
Note, though, that the window ARRAY_AGG cannot have an ORDER BY inside the aggregate like its "normal" counterpart. That means the order of a2.column_1 values in the list is indeterminate, although in the linked demo it does happen to match the one in your expected output.
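If a deterministic order matters, one workaround (assuming PostgreSQL) is to move the ORDER BY into the OVER clause and widen the frame so the aggregate still covers the whole partition, i.e. replace the window ARRAY_AGG above with:
ARRAY_AGG(a2.column_1) OVER (
    PARTITION BY
        a1.column_1,
        ARRAY_AGG(DISTINCT a1.column_2 ORDER BY a1.column_2)
    ORDER BY a2.column_1
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS "a2.column_1 agg"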
I need to generate a report based on the following tables:
CALLS_FOR_PROPOSALS
ID|Name
--+------
1|Call 1
2|Call 2
3|Call 3
PROPOSALS
ID|Call ID|Title
--+-------+----------
1| 1|Proposal 1
2| 2|Proposal 2
3| 2|Proposal 3
PROPOSAL_STATUSES
ID|Proposal ID|Status ID
--+-----------+---------
1| 1| 1
2| 2| 1
3| 3| 1
4| 3| 2
STATUSES
ID|NAME
--+------------
1|Not Reviewed
2|Processing
3|Accepted
4|Rejected
With this sample data, there are 3 Calls for Proposals. There are three Proposals; one for Call 1, and two for Call 2. (Call 3 does not have any proposals.) Each proposal has at least one status assigned to it. When a row is inserted into the PROPOSALS table, a corresponding row is inserted into PROPOSAL_STATUSES, giving the Proposal an initial default status of 1 (Not Reviewed). Each time the status is changed, a new row is inserted into the PROPOSAL_STATUSES table, so that the history of status changes is preserved. I need to generate a report that shows for each Call, the number of Proposals submitted, and the number of Proposals that have had more than one status (i.e. the status has been changed from the default at least once.) For the sample data above, the results would look like this:
Call Name|Proposals Submitted|Proposals Reviewed|
---------+-------------------+------------------+
Call 1 | 1| 0|
Call 2 | 2| 1|
Call 3 | 0| 0|
How would I write the SQL query to generate this report based on the above table structure? Thanks for your help.
Something like this should do the trick: Demo
SELECT Name as 'Call name',
       SUM(submitted) as 'Proposals Submitted',
       SUM(CASE WHEN maxStatus > 1 THEN 1 ELSE 0 END) as 'Proposals Reviewed'
FROM
    (SELECT cfp.Name,
            CASE WHEN p.ID IS NULL THEN 0 ELSE 1 END as submitted,
            MAX(ps.Status_ID) as maxStatus
     FROM CALLS_FOR_PROPOSALS cfp
     LEFT JOIN PROPOSALS p on cfp.ID = p.CALL_ID
     LEFT JOIN PROPOSAL_STATUSES ps on ps.PROPOSAL_ID = p.ID
     GROUP BY cfp.Name, p.ID) AS s
GROUP BY Name