SQL - Invalid query because of aggregate function with simple calculations for each date and ID

I am very new to SQL (less than 100 hours). The problem case is described below. Every time I try a query, I either get incorrect output or the error "not contained in either an aggregate function or the GROUP BY clause".
I have searched for similar questions and examples but found nothing that fits, and I am lost now.
Please help.
I have three tables.
Table Calc
Source_id | date (yyyymmdd) | metric1 | metric2
-------------------------------------------------
1 | 20201010 | 2 | 3
2 | 20201010 | 4 | 5
3 | 20201010 | 6 | 7
1 | 20201011 | 8 | 9
2 | 20201011 | 10 | 11
3 | 20201011 | 12 | 13
1 | 20201012 | 14 | 15
2 | 20201012 | 16 | 17
3 | 20201012 | 18 | 19
Table Source
Source_id | Description
------------------------
1 | ABC
2 | DEF
3 | XYZ
Table Factor
Date | Factor
-----------------
20201010 | .3
20201011 | .5
20201012 | .7
If the user selects dates 20201010 to 20201012, then the result will be:
Required result
Source_id | Calculated Value
-------------------------------------------------------------------------------
ABC | ((2x3)x.3 + (8x9)x.5 + (14x15)x.7) / (number of dates selected, in this case 3)
DEF | ((4x5)x.3 + (10x11)x.5 + (16x17)x.7) / (number of dates selected, in this case 3)
XYZ | ((6x7)x.3 + (12x13)x.5 + (18x19)x.7) / (number of dates selected, in this case 3)
Dates will be user-defined input, so the calculated value should be the average over that many dates. The selected dates will always be a contiguous range rather than an arbitrary multiple selection.
In table Calc, Source_id and date together are unique.
Each date has a factor which is to be multiplied with every Source_id row for that date.
If the user selects dates from 20201010 to 20201011, then the result will be:
Source_id | Calculated Value
-------------------------------------------------------------------------------
ABC | ((2x3)x.3+(8x9)x.5)/2
DEF | ((4x5)x.3+(10x11)x.5)/2
XYZ | ((6x7)x.3+(12x13)x.5)/2
If the user selects only 20201012, then the result will be:
Source_id | Calculated Value
-------------------------------------------------------------------------------
ABC | (14x15)x.7
DEF | (16x17)x.7
XYZ | (18x19)x.7

Create a CTE to store the starting and ending dates and cross join it to the join of the tables, group by source and aggregate:
WITH cte AS (SELECT '20201010' min_date, '20201012' max_date)
SELECT s.Description,
ROUND(SUM(c.metric1 * c.metric2 * f.factor / (DATEDIFF(day, t.min_date, t.max_date) + 1)), 2) calculated_value
FROM cte t CROSS JOIN Source s
LEFT JOIN Calc c ON c.Source_id = s.Source_id AND c.date BETWEEN t.min_date AND t.max_date
LEFT JOIN Factor f ON f.date = c.date
GROUP BY s.Source_id, s.Description
Results:
| Description | calculated_value |
|-------------|------------------|
| ABC         | 61.60            |
| DEF         | 83.80            |
| XYZ         | 110.00           |
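Since the date range is user input, the hard-coded dates in the CTE are just placeholders; a sketch of how they could be supplied as parameters instead (assuming SQL Server variables filled in by your application; the variable names are hypothetical, and the yyyymmdd strings mirror the literals in the query above):
DECLARE @min_date char(8) = '20201010';  -- user-selected start date
DECLARE @max_date char(8) = '20201012';  -- user-selected end date

WITH cte AS (SELECT @min_date AS min_date, @max_date AS max_date)
SELECT s.Description,
       ROUND(SUM(c.metric1 * c.metric2 * f.factor / (DATEDIFF(day, t.min_date, t.max_date) + 1)), 2) AS calculated_value
FROM cte t
CROSS JOIN Source s
LEFT JOIN Calc c ON c.Source_id = s.Source_id AND c.date BETWEEN t.min_date AND t.max_date
LEFT JOIN Factor f ON f.date = c.date
GROUP BY s.Source_id, s.Description;
The division by DATEDIFF(day, min_date, max_date) + 1 is what averages over the number of selected dates, so a single-day range simply divides by 1.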


Summing all values with the same ID in a column gives me duplicated values in SQL?

I am trying to sum all the columns for rows that have the same ID number within a specified date range, but it always gives me duplicated values.
select pr.product_sku,
pr.product_name,
pr.brand,
pr.category_name,
pr.subcategory_name,
a.stock_on_hand,
sum(pr.pageviews) as page_views,
sum(acquired_subscriptions) as acquired_subs,
sum(acquired_subscription_value) as asv_value
from dwh.product_reporting pr
join dm_product.product_data_livefeed a
on pr.product_sku = a.product_sku
where pr.fact_day between '2022-05-01' and '2022-05-30' and pr.pageviews > '0' and pr.acquired_subscription_value > '0' and store_id = 1
group by pr.product_sku,
pr.product_name,
pr.brand,
pr.category_name,
pr.subcategory_name,
a.stock_on_hand;
This is supposed to give me:
The sum of all KPI values for each distinct product SKU.
Example table:
| Date       | product_sku | page_views | number_of_subs |
|------------|-------------|------------|----------------|
| 2022-01-01 | 1 | 110 | 50 |
| 2022-01-25 | 2 | 1000 | 40 |
| 2022-01-20 | 3 | 2000 | 10 |
| 2022-01-01 | 1 | 110 | 50 |
| 2022-01-25 | 2 | 1000 | 40 |
| 2022-01-20 | 3 | 2000 | 10 |
Expected Output:
| product_sku | page_views | number_of_subs |
|-------------|------------|----------------|
| 1 | 220 | 100 |
| 2 | 2000 | 80 |
| 3 | 4000 | 20 |
Sorry, I had to edit to add the table examples.
Since you're not listing the dupes (assuming they are truly appearing as duplicate rows, and not just multiple rows with different values), I'll offer that something else may be at play here. I would suggest applying TRIM(UPPER()) to every string value in your result set that is part of the GROUP BY clause, as you might be dealing with case differences or trailing blanks that are being treated as distinct values in the query.
Assuming all the columns are character based:
select trim(upper(pr.product_sku)),
trim(upper(pr.product_name)),
trim(upper(pr.brand)),
trim(upper(pr.category_name)),
trim(upper(pr.subcategory_name)),
sum(pr.pageviews) as page_views,
sum(acquired_subscriptions) as acquired_subs,
sum(acquired_subscription_value) as asv_value
from dwh.product_reporting pr
where pr.fact_day between '2022-05-01' and '2022-05-30' and pr.pageviews > '0' and pr.acquired_subscription_value > '0' and store_id = 1
group by trim(upper(pr.product_sku)),
trim(upper(pr.product_name)),
trim(upper(pr.brand)),
trim(upper(pr.category_name)),
trim(upper(pr.subcategory_name));
Thank you guys for all your help, I found out where the problem was. It was mainly in the GROUP BY: when I removed all the other column names and left only the product_sku column, it worked as required.
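For reference, a minimal sketch of the query shape that worked (assuming the descriptive columns always carry the same value for a given product_sku, so an aggregate such as MAX() can bring them along; the stock_on_hand join is left out for brevity):
select pr.product_sku,
       max(pr.product_name) as product_name,
       max(pr.brand) as brand,
       max(pr.category_name) as category_name,
       max(pr.subcategory_name) as subcategory_name,
       sum(pr.pageviews) as page_views,
       sum(pr.acquired_subscriptions) as acquired_subs,
       sum(pr.acquired_subscription_value) as asv_value
from dwh.product_reporting pr
where pr.fact_day between '2022-05-01' and '2022-05-30'
  and pr.pageviews > '0'
  and pr.acquired_subscription_value > '0'
  and store_id = 1
group by pr.product_sku;
Grouping by only product_sku removes the extra grouping keys that were splitting each SKU into multiple result rows.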

SQL: Get an aggregate (SUM) of a calculation of two fields (DATEDIFF) that has conditional logic (CASE WHEN)

I have a dataset that includes a bunch of stay data (at a hotel). Each row contains a start date and an end date, but no duration field. I need to get a sum of the durations.
Sample Data:
| Stay ID | Client ID | Start Date | End Date |
|---------|-----------|------------|----------|
| 1 | 38 | 01/01/2018 | 01/31/2019 |
| 2 | 16 | 01/03/2019 | 01/07/2019 |
| 3 | 27 | 01/10/2019 | 01/12/2019 |
| 4 | 27 | 05/15/2019 | NULL |
| 5 | 38 | 05/17/2019 | NULL |
There are some added complications:
I am using Crystal Reports and this is a SQL Expression, which obeys slightly different rules. Basically, it returns a single scalar value. Here is some more info: http://www.cogniza.com/wordpress/2005/11/07/crystal-reports-using-sql-expression-fields/
Sometimes, the end date field is blank (they haven't booked out yet). If blank, I would like to replace it with the current timestamp.
I only want to count nights that have occurred in the past year. If the start date of a given stay is more than a year ago, I need to adjust it.
I need to get a sum by Client ID
I'm not actually any good at SQL so all I have is guesswork.
The proper syntax for a Crystal Reports SQL Expression is something like this:
(
SELECT (CASE
WHEN StayDateStart < DATEADD(year,-1,CURRENT_TIMESTAMP) THEN DATEDIFF(day,DATEADD(year,-1,CURRENT_TIMESTAMP),ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
ELSE DATEDIFF(day,StayDateStart,ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
END)
)
And that's giving me the correct value for a single row, if I wanted to do this:
| Stay ID | Client ID | Start Date | End Date | Duration |
|---------|-----------|------------|----------|----------|
| 1 | 38 | 01/01/2018 | 01/31/2019 | 210 | // only days since June 4 2018 are counted
| 2 | 16 | 01/03/2019 | 01/07/2019 | 4 |
| 3 | 27 | 01/10/2019 | 01/12/2019 | 2 |
| 4 | 27 | 05/15/2019 | NULL | 21 |
| 5 | 38 | 05/17/2019 | NULL | 19 |
But I want to get the SUM of Duration per client, so I want this:
| Stay ID | Client ID | Start Date | End Date | Duration |
|---------|-----------|------------|----------|----------|
| 1 | 38 | 01/01/2018 | 01/31/2019 | 229 | // 210+19
| 2 | 16 | 01/03/2019 | 01/07/2019 | 4 |
| 3 | 27 | 01/10/2019 | 01/12/2019 | 23 | // 2+21
| 4 | 27 | 05/15/2019 | NULL | 23 |
| 5 | 38 | 05/17/2019 | NULL | 229 |
I've tried to just wrap a SUM() around my CASE but that doesn't work:
(
SELECT SUM(CASE
WHEN StayDateStart < DATEADD(year,-1,CURRENT_TIMESTAMP) THEN DATEDIFF(day,DATEADD(year,-1,CURRENT_TIMESTAMP),ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
ELSE DATEDIFF(day,StayDateStart,ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
END)
)
It gives me an error that StayDateEnd is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. But I don't even know what that means, so I'm not sure how to troubleshoot or where to go from here. And then the next step is to get the SUM by Client ID.
Any help would be greatly appreciated!
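In general SQL terms, that error means every column in the SELECT list that is not wrapped in an aggregate function must appear in the GROUP BY clause. A minimal illustration with hypothetical table and column names (it demonstrates the rule itself, not the Crystal Reports wrapping):
-- Fails: ClientID appears outside an aggregate and there is no GROUP BY,
-- so the engine does not know how to collapse the rows:
SELECT ClientID, SUM(DATEDIFF(day, StayDateStart, StayDateEnd)) AS total_nights
FROM Stays;

-- Works: every non-aggregated column (here ClientID) is listed in GROUP BY:
SELECT ClientID, SUM(DATEDIFF(day, StayDateStart, StayDateEnd)) AS total_nights
FROM Stays
GROUP BY ClientID;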
Although the explanation and the data set are hard to reconcile, I think this is an approximation of what you want.
declare @your_data table (StayId int, ClientId int, StartDate date, EndDate date)
insert into @your_data values
(1,38,'2018-01-01','2019-01-31'),
(2,16,'2019-01-03','2019-01-07'),
(3,27,'2019-01-10','2019-01-12'),
(4,27,'2019-05-15',NULL),
(5,38,'2019-05-17',NULL)
;with data as (
select *,
datediff(day,
case
when datediff(day,StartDate,getdate())>365 then dateadd(year,-1,getdate())
else StartDate
end,
isnull(EndDate,getdate())
) days
from @your_data
)
select *,
sum(days) over (partition by ClientId)
from data
https://rextester.com/HCKOR53440
You need a subquery that computes the sum grouped by client_id, and a join between your table and the subquery, e.g.:
select Stay_id, client_id, Start_date, End_date, t.sum_duration
from your_table
inner join (
select Client_id,
SUM(CASE
WHEN StayDateStart < DATEADD(year,-1,CURRENT_TIMESTAMP) THEN DATEDIFF(day,DATEADD(year,-1,CURRENT_TIMESTAMP),ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
ELSE DATEDIFF(day,StayDateStart,ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
END) sum_duration
from your_table
group by Client_id
) t on t.Client_id = your_table.client_id
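Since a Crystal Reports SQL Expression has to come back as a single scalar per report record, the same aggregate can also be written as a correlated subquery. A sketch under that assumption (the table name Stays is hypothetical, and the "Table"."Field" reference used to correlate to the current report row depends on how your data source names the table):
(
SELECT SUM(CASE
             WHEN s.StayDateStart < DATEADD(year, -1, CURRENT_TIMESTAMP)
               THEN DATEDIFF(day, DATEADD(year, -1, CURRENT_TIMESTAMP), ISNULL(s.StayDateEnd, CURRENT_TIMESTAMP))
             ELSE DATEDIFF(day, s.StayDateStart, ISNULL(s.StayDateEnd, CURRENT_TIMESTAMP))
           END)
FROM Stays s
WHERE s.ClientId = "Stays"."ClientId"  -- correlate to the report's current row
)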

Comparing two sets of data

Very sorry if this has been answered in some way. I have checked all over and can't figure it out.
I need to find a way in postgresql to compare data from week to week. All data exists in the same table, and has a Week number column. Data will not always completely overlap but I need to compare data within groups when they do.
Say these are the data sets:
Week 2
+--------+--------+------+---------+-------+
| group | num | color| ID | week #|
+--------+--------+------+---------+-------+
| a | 1 | red | a1red | 2 |
| a | 2 | blue | a2blue | 2 |
| b | 3 | blue | b3blue | 2 |
| c | 7 | black| c7black | 2 |
| d | 8 | black| d8black | 2 |
| d | 9 | red | d9red | 2 |
| d | 10 | gray | d10gray | 2 |
+--------+--------+------+---------+-------+
Week 3
+--------+--------+------+---------+-------+
| group | num | color| ID | week #|
+--------+--------+------+---------+-------+
| a | 1 | red | a1red | 3 |
| a | 2 | green| a2green | 3 |
| b | 3 | blue | b3blue | 3 |
| b | 5 | green| b5green | 3 |
| c | 7 | black| c7black | 3 |
| e | 11 | blue | d11blue | 3 |
| e | 12 | other| d12other| 3 |
| e | 14 | brown| d14brown| 3 |
+--------+--------+------+---------+-------+
Each row has an ID made out of the group, number, and color values.
I need the query to grab all groups from Week 3, then for any groups in Week 3 that exist in Week 2:
flag IDs within the group that have changed, like in group A;
flag if any IDs were added to or removed from the group, like in group B.
One function that would be nice to have, but is not essential, would be to have Week 3 compare against Week 1 for groups that do not exist in Week 2.
I have thought about trying to divide the two weeks up and use INTERSECT/EXCEPT to get results, but I can't quite wrap my head around how to get this to work correctly. Any tips would be much appreciated.
For just two (known) weeks you can do something like this:
select coalesce(w1.group_nr, w2.group_nr) as group_nr,
coalesce(w1.num, w2.num) as num,
case
when w1.group_nr is null then 'missing in first week'
when w2.group_nr is null then 'missing in second week'
when (w1.color, w1.id) is distinct from (w2.color, w2.id) then 'data has changed'
else 'no change'
end as status,
case
when
w1.group_nr is not null
and w2.group_nr is not null
and w1.color is distinct from w2.color then 'color is different'
end as color_change,
case
when
w1.group_nr is not null
and w2.group_nr is not null
and w1.id is distinct from w2.id then 'id is different'
end as id_change
from (
select group_nr, num, color, id
from data
where week = 2
) as w1
full outer join (
select group_nr, num, color, id
from data
where week = 3
) w2 on (w1.group_nr, w1.num) = (w2.group_nr, w2.num)
Getting the attributes that have changed is a bit clumsy. If you can live with a textual representation, you could use the hstore extension to display the differences:
select coalesce(w1.group_nr, w2.group_nr) as group_nr,
coalesce(w1.num, w2.num) as num,
case
when w1.group_nr is null then 'missing in first week'
when w2.group_nr is null then 'missing in second week'
when (w1.color, w1.id) is distinct from (w2.color, w2.id) then 'data has changed'
else 'no change'
end as status,
w2.attributes - w1.attributes as changed_attributes
from (
select group_nr, num, color, id, hstore(data) - 'week'::text as attributes
from data
where week = 2
) as w1
full outer join (
select group_nr, num, color, id, hstore(data) - 'week'::text as attributes
from data
where week = 3
) w2 on (w1.group_nr, w1.num) = (w2.group_nr, w2.num);
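Note that the hstore approach assumes the extension is available in the database; if it is not, it can be enabled once per database (sufficient privileges required):
CREATE EXTENSION IF NOT EXISTS hstore;
With hstore, w2.attributes - w1.attributes deletes from the week-3 record every key/value pair that is identical in week 2, so what remains is exactly the set of attributes that changed (or were added) in week 3.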

Subtract the value of a row from grouped result

I have a table supplier_account which has five columns: supplier_account_id (pk), supplier_id (fk), voucher_no, debit and credit. I want to get the sum of debit grouped by supplier_id, and then subtract the credit value of the rows in which voucher_no is not null, so that for each subsequent row the remaining sum of debit gets reduced. I have tried using a 'with' clause.
with debitdetails as(
select supplier_id,sum(debit) as amt
from supplier_account group by supplier_id
)
select acs.supplier_id,s.supplier_name,acs.purchase_voucher_no,acs.purchase_voucher_date,dd.amt-acs.credit as amount
from supplier_account acs
left join supplier s on acs.supplier_id=s.supplier_id
left join debitdetails dd on acs.supplier_id=dd.supplier_id
where voucher_no is not null
But here the debit value will be the same for all rows. After the subtraction in the first row, I want to carry the result into the second row and subtract the next credit value from that.
I know it is possible by using temporary tables. The problem is I cannot use temporary tables because the procedure is used to generate reports using Jasper Reports.
What you need is an implementation of a running total. The easiest way to do it is with the help of a window function:
with debitdetails as(
select id,sum(debit) as amt
from suppliers group by id
)
select s.id, purchase_voucher_no, dd.amt, s.credit,
dd.amt - sum(s.credit) over (partition by s.id order by purchase_voucher_no asc)
from suppliers s
left join debitdetails dd on s.id=dd.id
order by s.id, purchase_voucher_no
Results:
| id | purchase_voucher_no | amt | credit | ?column? |
|----|---------------------|-----|--------|----------|
| 1 | 1 | 43 | 5 | 38 |
| 1 | 2 | 43 | 18 | 20 |
| 1 | 3 | 43 | 8 | 12 |
| 2 | 4 | 60 | 5 | 55 |
| 2 | 5 | 60 | 15 | 40 |
| 2 | 6 | 60 | 30 | 10 |
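Mapped back to the original table and column names, a sketch of the same idea (assuming purchase_voucher_no defines the order in which the credits should be subtracted, and that only the rows with a non-null voucher_no take part):
with debitdetails as (
    select supplier_id, sum(debit) as amt
    from supplier_account
    group by supplier_id
)
select acs.supplier_id,
       s.supplier_name,
       acs.purchase_voucher_no,
       acs.purchase_voucher_date,
       dd.amt - sum(acs.credit) over (partition by acs.supplier_id
                                      order by acs.purchase_voucher_no) as amount
from supplier_account acs
left join supplier s on acs.supplier_id = s.supplier_id
left join debitdetails dd on acs.supplier_id = dd.supplier_id
where acs.voucher_no is not null
order by acs.supplier_id, acs.purchase_voucher_no;
Because the order by inside over() turns the sum into a running total, each row shows the remaining balance after all previous credits for that supplier have been subtracted, which is the carried-over behaviour described in the question.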

Select dynamic groups of lines in SQL (PostgreSQL)

My objective is to make dynamic groups of lines (of products, grouped by TYPE and COLOR).
I don't know if it's possible with just one select query.
But I want to create groups of lines (a PRODUCT is a TYPE and a COLOR) according to the NB_PER_GROUP column, and I want to do this grouping depending on the date order (ORDER BY DATE).
A leftover single product that cannot fill its NB_PER_GROUP (2, for example) is excluded from the final result.
Table :
-----------------------------------------------
NUM | TYPE | COLOR | NB_PER_GROUP | DATE
-----------------------------------------------
0 | 1 | 1 | 2 | ...
1 | 1 | 1 | 2 |
2 | 1 | 2 | 2 |
3 | 1 | 2 | 2 |
4 | 1 | 1 | 2 |
5 | 1 | 1 | 2 |
6 | 4 | 1 | 3 |
7 | 1 | 1 | 2 |
8 | 4 | 1 | 3 |
9 | 4 | 1 | 3 |
10 | 5 | 1 | 2 |
Results :
------------------------
GROUP_NUMBER | NUM |
------------------------
0 | 0 |
0 | 1 |
~~~~~~~~~~~~~~~~~~~~~~~~
1 | 2 |
1 | 3 |
~~~~~~~~~~~~~~~~~~~~~~~~
2 | 4 |
2 | 5 |
~~~~~~~~~~~~~~~~~~~~~~~~
3 | 6 |
3 | 8 |
3 | 9 |
If you have another way to solve this problem, I will accept it.
What about something like this?
select max(gn.group_number) group_number, ip.num
from products ip
join (
select date, type, color, row_number() over (order by date) - 1 group_number
from (
select op.num, op.type, op.color, op.nb_per_group, op.date, (row_number() over (partition by op.type, op.color order by op.date) - 1) % nb_per_group group_order
from products op
) sq
where sq.group_order = 0
) gn
on ip.type = gn.type
and ip.color = gn.color
and ip.date >= gn.date
group by ip.num
order by group_number, ip.num
This may only work if your nb_per_group values are the same for each combination of type and color. It may also require unique dates, but that could probably be worked around if required.
The innermost subquery partitions the rows by type and color, orders them by date, then calculates the row numbers modulo nb_per_group; this forms a 0-based count for the group that resets to 0 each time nb_per_group is exceeded.
The next-level subquery finds all of the 0 values we mapped in the lower subquery and assigns group numbers to them.
Finally, the outermost query ties each row in the products table to a group number, calculated as the highest group number that split off before this product's date.
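Since the question invites other ways to solve it, here is an alternative sketch (PostgreSQL, assuming the products table and column names used in the query above, and dates that are unique within each TYPE/COLOR): number the rows per (type, color) by date, bucket them with integer division by nb_per_group, and keep only the full buckets.
with numbered as (
    select num, type, color, nb_per_group, date,
           (row_number() over (partition by type, color order by date) - 1) / nb_per_group as bucket
    from products
),
buckets as (
    select *,
           count(*)  over (partition by type, color, bucket) as bucket_size,
           min(date) over (partition by type, color, bucket) as bucket_start
    from numbered
)
select dense_rank() over (order by bucket_start, type, color) - 1 as group_number,
       num
from buckets
where bucket_size = nb_per_group   -- incomplete buckets (e.g. a lone leftover row) are excluded
order by group_number, num;
The dense_rank() runs after the WHERE filter, so it numbers only the full buckets, in date order, which reproduces the 0-based GROUP_NUMBER of the expected results.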