How to add a monthly count average - SQL

I am looking for all counts where DimSystemID = -1, and I also want a new column that calculates the average per month. Below are my current query and result; I don't know how to add the new column that calculates the average per month.
Query:
select DimSystemID, EligibleYM, count(*)
from dbo.table1
where DimSystemID=-1
group by DimSystemID, EligibleYM
order by 2 desc, 1
Result table
DimSystemID EligibleYM (No column name)
-1 202001 75
-1 201912 70
-1 201911 67
-1 201910 67
-1 201909 59

Welcome to Stack Overflow. Assuming your data set contains values you want to average that aren't shown in your question, in MS SQL Server you would just add another computed column that does the math:
select DimSystemID, EligibleYM, count(*), [new computed column here as AVG]
from dbo.table1
where DimSystemID=-1
group by DimSystemID, EligibleYM
order by 2 desc, 1
For example, with a hypothetical numeric column named Amount in place of your month data:
select DimSystemID, EligibleYM, count(*), AVG(Amount)
An example (anonymized) of your data would help.
See the MS SQL Server documentation for AVG.
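If the goal is instead the average of the monthly counts themselves (one reading of the question, not confirmed by it), a window aggregate over the grouped counts can add it as a new column without a second query. The sketch below uses Python's built-in sqlite3 module as a stand-in engine (it assumes SQLite 3.25 or later for window functions, and drops the dbo. prefix because SQLite has no schemas). The rows are generated only to reproduce the counts in the question's result table; the names MonthCount and AvgPerMonth are illustrative.

```python
import sqlite3


def monthly_counts_with_avg(conn):
    """Count rows per month for DimSystemID = -1 and attach the average monthly count."""
    sql = """
        WITH monthly AS (
            SELECT DimSystemID, EligibleYM, COUNT(*) AS MonthCount
            FROM table1
            WHERE DimSystemID = -1
            GROUP BY DimSystemID, EligibleYM
        )
        SELECT DimSystemID, EligibleYM, MonthCount,
               -- empty OVER () = one window spanning all months, so every row gets the same value
               AVG(MonthCount) OVER () AS AvgPerMonth
        FROM monthly
        ORDER BY EligibleYM DESC
    """
    return conn.execute(sql).fetchall()


def build_demo_db():
    """Hypothetical sample data that reproduces the counts from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (DimSystemID INTEGER, EligibleYM INTEGER)")
    counts = {202001: 75, 201912: 70, 201911: 67, 201910: 67, 201909: 59}
    for ym, n in counts.items():
        conn.executemany("INSERT INTO table1 VALUES (-1, ?)", [(ym,)] * n)
    # Rows for another system, which the WHERE clause must ignore.
    conn.executemany("INSERT INTO table1 VALUES (1, ?)", [(202001,)] * 10)
    return conn


if __name__ == "__main__":
    for row in monthly_counts_with_avg(build_demo_db()):
        print(row)
```

One caveat when porting this back to SQL Server: AVG over an integer expression returns an integer there, so something like AVG(1.0 * MonthCount) OVER () is needed to keep the fractional part.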

Related

How to select rows whose cumulative sum of a column reaches a given value

I need to fetch data from PostgreSQL, selecting rows based on the condition below.
id type total_quantity created_dttm [desc]
1 1 10 30-Jun-2021
2 1 12 27-Jun-2021
3 1 32 26-Jun-2021
4 1 52 25-Jun-2021
I need to get the rows whose cumulative sum of total_quantity stays within a given value, for a given type. If I give the value 24 and type 1, I need all rows whose running total of total_quantity is <= 24, plus the immediately following row whose running total exceeds the given value; the remaining rows should be ignored. Rows are ordered by created_dttm desc.
So for the given value 24 and type = 1, I need to get only three rows.
id type total_quantity created_dttm [desc]
1 1 10 30-Jun-2021 [10, less than 24] fetch row
2 1 12 27-Jun-2021 [22 (sum of current and previous rows), less than 24] fetch row
3 1 32 26-Jun-2021 [54 (10+12+32), greater than 24] the threshold is exceeded here, so fetch this row and stop
4 1 52 25-Jun-2021 [query should not fetch this row, since the threshold was already exceeded at id 3]
I tried summing two columns, but that doesn't work, because I am looking for rows within a value range: all rows whose running total is below the given value, plus the next row that exceeds it, for the given type.
We can use SUM here as an analytic function:
WITH cte AS (
SELECT *, SUM(total_quantity) OVER (ORDER BY created_dttm DESC)
- total_quantity AS tq_sum
FROM yourTable
WHERE type = 1
)
SELECT id, type, total_quantity, created_dttm
FROM cte
WHERE tq_sum <= 24
ORDER BY created_dttm DESC;
The trick in the CTE works by excluding the current row's own total_quantity from its running total. The first row to exceed the threshold of 24 is therefore still included, because its own quantity is not yet counted in tq_sum.
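To check the "running total minus the current row" idea end to end, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for PostgreSQL (assumes SQLite 3.25 or later for window functions). It uses the question's rows with dates rewritten as ISO strings so text ordering matches date ordering, plus one hypothetical type-2 row to show the type filter. The explicit ROWS frame and the id tiebreaker are additions so that rows sharing a created_dttm still get distinct running totals; the function name rows_until_threshold is illustrative. It assumes total_quantity is positive.

```python
import sqlite3


def rows_until_threshold(conn, threshold, type_):
    """Return ids (newest first) up to and including the first row that pushes
    the running total of total_quantity past `threshold`."""
    sql = """
        WITH cte AS (
            SELECT id, created_dttm,
                   -- Running total of the rows *before* this one:
                   -- the window sum includes the current row, so subtract it.
                   SUM(total_quantity) OVER (
                       ORDER BY created_dttm DESC, id
                       ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
                   ) - total_quantity AS tq_sum
            FROM yourTable
            WHERE type = ?
        )
        SELECT id
        FROM cte
        WHERE tq_sum <= ?
        ORDER BY created_dttm DESC, id
    """
    return [row[0] for row in conn.execute(sql, (type_, threshold))]


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE yourTable "
        "(id INTEGER, type INTEGER, total_quantity INTEGER, created_dttm TEXT)"
    )
    conn.executemany(
        "INSERT INTO yourTable VALUES (?, ?, ?, ?)",
        [
            (1, 1, 10, "2021-06-30"),
            (2, 1, 12, "2021-06-27"),
            (3, 1, 32, "2021-06-26"),
            (4, 1, 52, "2021-06-25"),
            (5, 2, 5, "2021-06-29"),  # hypothetical row of another type, filtered out
        ],
    )
    return conn


if __name__ == "__main__":
    print(rows_until_threshold(build_demo_db(), 24, 1))
```

With the threshold 24 and type 1, the running totals before each row are 0, 10, 22 and 54, so ids 1, 2 and 3 are returned and id 4 is dropped.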

Need the sum of a column over a trailing date range for each row

I need the total sum of Defect between MAIN_DT and the date one year (12 months) before it, if any, for each ID.
The value needs to be populated for each row.
I have tried the queries below and also tried CSUM, but neither works:
1) select sum(Defect) as "sum",Id,MAIN_DT
from check_diff
where MAIN_DT between ADD_MONTHS(MAIN_DT,-12) and MAIN_DT group by 2,3;
2) select Defect,
Type1,
Type2,
Id,
MAIN_DT,
ADD_MONTHS(TIM_MAIN_DT,-12) year_old,
CSUM(Defect,MAIN_DT)
from check_diff
where
MAIN_DT between ADD_MONTHS(MAIN_DT,-12) and MAIN_DT group by id;
The expected output is as below:
Defect Type1 Type2 Id main_dt sum
1 a a 1 3/10/2017 1
99 a a 1 4/10/2018 99
0 a b 1 7/26/2018 99
1 a b 1 11/21/2018 100
1 a c 2 12/20/2018 1
Teradata doesn't support RANGE for cumulative sums, but you can rewrite it using a correlated scalar subquery:
select Defect, Type1, Type2, Id, MAIN_DT,
( select sum(t2.Defect)
from check_diff as t2
where t2.Id = t1.Id
and t2.MAIN_DT > ADD_MONTHS(t1.MAIN_DT,-12)
and t2.MAIN_DT <= t1.MAIN_DT
) as "sum"
from check_diff as t1;
Performance might be poor, depending on the total number of rows and the number of rows per ID.
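The correlated subquery can be checked against the expected output with a small sketch that uses Python's built-in sqlite3 module in place of Teradata. SQLite has no ADD_MONTHS, so date(t1.MAIN_DT, '-12 months') stands in for it (month-end dates such as Feb 29 may be normalized differently than in Teradata), and the dates are stored as ISO strings so plain comparisons work. The rows are the question's expected-output rows; defect_sum is an illustrative column name.

```python
import sqlite3


def trailing_year_defects(conn):
    """For each row, sum Defect for the same Id over the 12 months ending at MAIN_DT."""
    sql = """
        SELECT t1.Defect, t1.Type1, t1.Type2, t1.Id, t1.MAIN_DT,
               (SELECT SUM(t2.Defect)
                FROM check_diff AS t2
                WHERE t2.Id = t1.Id                                -- same ID only
                  AND t2.MAIN_DT > date(t1.MAIN_DT, '-12 months')  -- stand-in for ADD_MONTHS
                  AND t2.MAIN_DT <= t1.MAIN_DT) AS defect_sum
        FROM check_diff AS t1
        ORDER BY t1.Id, t1.MAIN_DT
    """
    return conn.execute(sql).fetchall()


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE check_diff "
        "(Defect INTEGER, Type1 TEXT, Type2 TEXT, Id INTEGER, MAIN_DT TEXT)"
    )
    conn.executemany(
        "INSERT INTO check_diff VALUES (?, ?, ?, ?, ?)",
        [
            (1, "a", "a", 1, "2017-03-10"),
            (99, "a", "a", 1, "2018-04-10"),
            (0, "a", "b", 1, "2018-07-26"),
            (1, "a", "b", 1, "2018-11-21"),
            (1, "a", "c", 2, "2018-12-20"),
        ],
    )
    return conn


if __name__ == "__main__":
    for row in trailing_year_defects(build_demo_db()):
        print(row)
```

The 2017-03-10 row falls outside the window of the 2018-04-10 row, which is why the second row's sum is 99 rather than 100.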

Hive: Replace empty results with 0 in GROUP BY statements

I'm a new Hive user and need to aggregate the sum of amounts for a given table. Consider the simplified example below:
SELECT day, sum(amount) FROM tableX WHERE columnA = 'RareValue' GROUP BY day;
For some dates, there may be no rows matching the condition in the WHERE clause, so the query result skips those days.
For example, this is the result I get:
date amount
2018-01-15 230
2018-01-13 210
2018-01-12 140
2018-01-11 222
But this is the desired result:
date amount
2018-01-15 230
2018-01-14 0
2018-01-13 210
2018-01-12 140
2018-01-11 222
I tried generating a sequence of dates and then using LEFT JOIN and COALESCE to fill the empty dates with zeros. However, the performance was terribly slow. What is the best approach for this?
Supposing that you are trying to exclude the whole day whenever your WHERE condition is true for it, you can do something like:
select
day,
if(max(mycondition) = 0, sum(amount), 0) as mysum from
(
select day, amount,
if(columnA = 'RareValue', 1, 0) as mycondition
FROM tableX
) t GROUP BY day;
I did not have the chance to test it :)
If I understood correctly, all the days you need are present in tableX. So I'd suggest selecting every day that has a non-'RareValue' row with an amount of 0, combining that with the matching rows using UNION ALL, and then aggregating by day:
SELECT day, sum(amount) AS amount FROM (
SELECT day, 0 AS amount FROM tableX WHERE columnA != 'RareValue'
UNION ALL
SELECT day, amount FROM tableX WHERE columnA = 'RareValue'
) t
GROUP BY day;
The outer GROUP BY collapses repeated days, so each day appears once, with 0 when it has no 'RareValue' rows.
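A related single-pass option is conditional aggregation: group all of tableX by day and sum only the 'RareValue' amounts, so days without a match come out as 0 and no date sequence or join is needed. Like the UNION answer, it assumes every needed day appears in tableX at least once. The sketch uses Python's built-in sqlite3 module as a stand-in for Hive; in Hive the same CASE WHEN expression should work, as should sum(if(columnA = 'RareValue', amount, 0)). The sample rows are made up to reproduce the question's desired output.

```python
import sqlite3


def daily_rare_sums(conn):
    """Sum 'RareValue' amounts per day; days with no matching rows get 0."""
    sql = """
        SELECT day,
               -- non-matching rows contribute 0 instead of being filtered out
               SUM(CASE WHEN columnA = 'RareValue' THEN amount ELSE 0 END) AS amount
        FROM tableX
        GROUP BY day
        ORDER BY day DESC
    """
    return conn.execute(sql).fetchall()


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tableX (day TEXT, columnA TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO tableX (day, columnA, amount) VALUES (?, ?, ?)",
        [
            ("2018-01-15", "RareValue", 200),
            ("2018-01-15", "RareValue", 30),
            ("2018-01-15", "Other", 999),
            ("2018-01-14", "Other", 999),  # no RareValue rows on this day
            ("2018-01-13", "RareValue", 210),
            ("2018-01-13", "Other", 5),
            ("2018-01-12", "RareValue", 140),
            ("2018-01-12", "Other", 7),
            ("2018-01-11", "RareValue", 222),
        ],
    )
    return conn


if __name__ == "__main__":
    for row in daily_rare_sums(build_demo_db()):
        print(row)
```

Because the table is scanned once and grouped once, this avoids both the generated date sequence and the UNION.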

SQL SUM and value conversion

I'm looking to transform data in my SUM query to account for some numeric values that are negative in nature, although they are not stored as negative.
I am looking up customer balances, and the data set also includes credit transactions that are not stored as negative in the database (all records with the value C for credit in the inv_type column should be treated as negative in the SQL SUM function). For example:
INVOICES
inv_no inv_type cust_no value
1 D 25 10
2 D 35 30
3 C 25 5
4 D 25 50
5 C 35 2
My simple SUM function would not give me the correct answer:
select cust_no, sum(value) from INVOICES
group by cust_no
This query would obviously give a balance of 65 for customer 25 and 32 for customer 35, although the expected answers are 10 - 5 + 50 = 55 and 30 - 2 = 28.
Should I perhaps use the CAST function somehow? Unfortunately I'm not sure which database engine is underneath, though there's a good chance it is of IBM origin. Most basic SQL has worked so far.
You can use the case expression inside of a sum(). The simplest syntax would be:
select cust_no,
sum(case when inv_type = 'C' then - value else value end) as total
from invoices
group by cust_no;
Note that VALUE may be a reserved word in your database, so you might need to quote the column name.
You can first write a projection (SELECT) that produces a signed value column based on inv_type, and then sum over that, like this:
select cust_no, sum(value) from (
select cust_no
, case when inv_type='D' then [value] else -[value] end [value]
from INVOICES
) SUMS
group by cust_no
You can put an expression in the sum that calculates a negative value if the invoice is a credit:
select
cust_no,
sum
(
case inv_type
when 'C' then -[value]
else [value]
end
) as [Total]
from INVOICES
group by cust_no
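The CASE-inside-SUM approach can be checked against the question's numbers with a minimal sketch that uses Python's built-in sqlite3 module as a stand-in engine (the actual engine in the question is unknown, possibly IBM). The column name "value" is double-quoted here, which is the standard SQL way to quote an identifier; note that on engines that fold unquoted names to upper case, such as DB2, a quoted lowercase "value" refers to a different name than unquoted value.

```python
import sqlite3


def customer_balances(conn):
    """Sum invoice values per customer, treating credits (inv_type = 'C') as negative."""
    sql = """
        SELECT cust_no,
               SUM(CASE WHEN inv_type = 'C' THEN -"value" ELSE "value" END) AS total
        FROM INVOICES
        GROUP BY cust_no
        ORDER BY cust_no
    """
    return conn.execute(sql).fetchall()


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE INVOICES (inv_no INTEGER, inv_type TEXT, cust_no INTEGER, "value" INTEGER)'
    )
    conn.executemany(
        "INSERT INTO INVOICES VALUES (?, ?, ?, ?)",
        [
            (1, "D", 25, 10),
            (2, "D", 35, 30),
            (3, "C", 25, 5),
            (4, "D", 25, 50),
            (5, "C", 35, 2),
        ],
    )
    return conn


if __name__ == "__main__":
    print(customer_balances(build_demo_db()))
```

This gives 55 for customer 25 and 28 for customer 35, matching the expected balances; no CAST is needed, only the sign flip inside the SUM.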

Finding average of data within a certain range

How do I find the average of a data set within a certain range? Specifically, I am looking to find the average of all data points that are within one standard deviation of the original average. Here is an example:
Student_ID Test_Scores
1 3
1 20
1 30
1 40
1 50
1 60
1 95
Average = 42.571
Standard Deviation = 29.854
I want to find all data points within one standard deviation of this original average, i.e. within the range (42.571-29.854) <= Data <= (42.571+29.854), and then calculate a new average from them.
So my desired data set is:
Student_ID Test_Scores
1 20
1 30
1 40
1 50
1 60
My desired new average is: 40
Here is my SQL code, which didn't give the desired result:
SELECT
Student_ID,
AVG(Test_Scores)
FROM
Student_Data
WHERE
Test_Scores BETWEEN (AVG(Test_Scores)-STDEV(Test_Scores)) AND (AVG(Test_Scores)+STDEV(Test_Scores))
ORDER BY
Student_ID
Anyone know how I could fix this?
Use either window functions or do the calculation in a subquery:
SELECT sd.Student_ID, sd.Test_Scores
FROM Student_Data sd CROSS JOIN
(SELECT AVG(Test_Scores) as avgts, STDEV(Test_Scores) as stdts
FROM Student_Data
) x
WHERE sd.Test_Scores BETWEEN avgts - stdts AND avgts + stdts
ORDER BY sd.Student_ID;
select avg(test_scores)
from Student_Data
where test_scores between
(select avg(test_scores) from Student_Data) - (select stddev(test_scores) from Student_Data)
and
(select avg(test_scores) from Student_Data) + (select stddev(test_scores) from Student_Data);
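The first answer mentions window functions but only shows the subquery form, and both answers compute the statistics over the whole table rather than per student. Below is a minimal sketch of a per-student window-function variant, using Python's built-in sqlite3 module as a stand-in engine (assumes SQLite 3.25 or later). SQLite has no STDEV, so the sample standard deviation is derived from AVG(x), AVG(x*x) and the row count, with square roots taken by a Python helper registered under the hypothetical name py_sqrt. Student 2 is made-up data added to show the per-student partitioning.

```python
import math
import sqlite3


def _sqrt(v):
    # SQLite passes NULL as None (e.g. a student with a single score);
    # clamp tiny negative values caused by floating-point rounding.
    return None if v is None else math.sqrt(max(v, 0.0))


def trimmed_averages(conn):
    """Per student: average of the scores within one sample std dev of that student's mean."""
    conn.create_function("py_sqrt", 1, _sqrt)
    sql = """
        WITH stats AS (
            SELECT Student_ID, Test_Scores,
                   AVG(Test_Scores) OVER (PARTITION BY Student_ID) AS m,
                   AVG(Test_Scores * Test_Scores) OVER (PARTITION BY Student_ID) AS m2,
                   COUNT(*) OVER (PARTITION BY Student_ID) AS n
            FROM Student_Data
        ),
        bounds AS (
            SELECT Student_ID, Test_Scores, m,
                   -- sample standard deviation (n - 1 denominator), like STDEV in SQL Server
                   py_sqrt((m2 - m * m) * n / (n - 1)) AS s
            FROM stats
        )
        SELECT Student_ID, AVG(Test_Scores)
        FROM bounds
        WHERE Test_Scores BETWEEN m - s AND m + s
        GROUP BY Student_ID
        ORDER BY Student_ID
    """
    return dict(conn.execute(sql).fetchall())


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Student_Data (Student_ID INTEGER, Test_Scores INTEGER)")
    scores = [(1, s) for s in (3, 20, 30, 40, 50, 60, 95)]
    scores += [(2, 10)] * 4  # hypothetical second student with identical scores
    conn.executemany("INSERT INTO Student_Data VALUES (?, ?)", scores)
    return conn


if __name__ == "__main__":
    print(trimmed_averages(build_demo_db()))
```

For student 1 the bounds are roughly 12.717 and 72.425, so 3 and 95 are dropped and the new average is 40, as in the question.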