Returning a single record per month - sql

I have a use case where a function needs to return only a single row for every end of month.
I tried using SELECT DISTINCT, but it shows multiple records for the same end of month:
SELECT DISTINCT CASE
WHEN eff_interest_balance < 0.01 THEN trial_balance_date
WHEN date_paid < trial_balance_date THEN date_paid
END as A
, period
FROM dbo.Intpayments
WHERE loan_number = 60023
ORDER BY period ASC
It should return a single date for each month.

DISTINCT returns unique rows; it does not group them. If two rows in the same period have different values of A, both are kept. You are looking to aggregate rows, which means using some combination of aggregate functions and GROUP BY.
What your current query is missing is logic for aggregating the rows that are in the same period. Do you want the sum of these values? The min? The max?
In any case, the basic idea looks like this. I don't think summing is what you want, but the query shows how aggregating and grouping work together:
SELECT
period
, SUM(eff_interest_balance) AS SumOfBalance
FROM dbo.Intpayments
WHERE loan_number = 60023
GROUP BY period
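To see the difference, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server. The rows are invented, dates are stored as ISO strings so they compare correctly as text, and MAX() is only one assumed choice of aggregate for collapsing each period to a single date.

```python
import sqlite3

# In-memory stand-in for dbo.Intpayments; the rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Intpayments (
    loan_number INTEGER, period INTEGER, trial_balance_date TEXT,
    date_paid TEXT, eff_interest_balance REAL)""")
conn.executemany("INSERT INTO Intpayments VALUES (?, ?, ?, ?, ?)", [
    (60023, 1, "2020-01-31", "2020-01-15", 0.00),
    (60023, 1, "2020-01-31", "2020-01-20", 5.00),
    (60023, 2, "2020-02-29", "2020-02-10", 0.00),
    (60023, 2, "2020-02-29", "2020-03-05", 3.00),
])

# The CASE expression from the question, reused in both queries.
case_expr = """CASE
    WHEN eff_interest_balance < 0.01 THEN trial_balance_date
    WHEN date_paid < trial_balance_date THEN date_paid
END"""

# DISTINCT: one row per distinct (A, period) pair, so a period can appear twice.
distinct_rows = conn.execute(f"""
    SELECT DISTINCT {case_expr} AS A, period
    FROM Intpayments WHERE loan_number = 60023
    ORDER BY period""").fetchall()

# GROUP BY period with an aggregate: exactly one row per period.
# MAX() ignores the NULL produced when neither WHEN branch matches.
grouped_rows = conn.execute(f"""
    SELECT period, MAX({case_expr}) AS A
    FROM Intpayments WHERE loan_number = 60023
    GROUP BY period ORDER BY period""").fetchall()

print(len(distinct_rows))  # 4
print(grouped_rows)        # [(1, '2020-01-31'), (2, '2020-02-29')]
```

Swapping MAX for MIN (or any other aggregate) changes which date is picked, but the GROUP BY still guarantees one row per period.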

SQL: Apply an aggregate result per day using window functions

Consider a time-series table with three columns: time of type timestamptz, balance of type numeric, and is_spent_column of type text.
The following query generates a valid result for the last day of the given interval.
SELECT
MAX(DATE_TRUNC('DAY', (time))) as last_day,
SUM(balance) FILTER ( WHERE is_spent_column is NULL ) AS value_at_last_day
FROM tbl
2010-07-12 18681.800775017498741407984000
However, I need an equivalent query based on window functions that reports the running total of the balance column for every day, up to and including that day.
Here is what I've tried so far, without a valid result:
SELECT
DATE_TRUNC('DAY', (time)) AS daily,
SUM(sum(balance) FILTER ( WHERE is_spent_column is NULL ) ) OVER ( ORDER BY DATE_TRUNC('DAY', (time)) ) AS total_value_per_day
FROM tbl
group by 1
order by 1 desc
2010-07-12 16050.496339044977568391974000
2010-07-11 13103.159119670350269890284000
2010-07-10 12594.525752964512456914454000
2010-07-09 12380.159588711091681327014000
2010-07-08 12178.119542536668113577014000
2010-07-07 11995.943973804127033140014000
EDIT:
Here is a sample dataset:
LINK REMOVED
The running total can be computed by applying the first query above to the dataset filtered up to and including the desired day. For example, for day 2009-01-31 the result is 97.13522530000000000000, and for day 2009-01-15, filtering with time < '2009-01-16 00:00:00', it returns 24.446144000000000000.
What I need is an alternative query that computes the running total for each day in a single query.
EDIT 2:
Thank you all so very much for your participation and support.
The differences between the queries' result sets came from the upstream ETL pipelines. Sorry for my ignorance!
Below is a sample schema for testing the queries:
https://www.db-fiddle.com/f/veUiRauLs23s3WUfXQu3WE/2
Now both queries above and the query in the answer below return the same result.
Consider calculating the running total with a window function after aggregating the data to day level. Since you aggregate with a single condition, the FILTER clause can be converted to a plain WHERE:
SELECT daily,
SUM(total_balance) OVER (ORDER BY daily) AS total_value_per_day
FROM (
SELECT
DATE_TRUNC('DAY', (time)) AS daily,
SUM(balance) AS total_balance
FROM tbl
WHERE is_spent_column IS NULL
GROUP BY 1
) AS daily_agg
ORDER BY daily
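As a sanity check, the sketch below runs the same aggregate-then-window pattern in SQLite through Python's sqlite3 module (SQLite 3.25+ is needed for window functions). SQLite has no DATE_TRUNC, so date(time) is assumed as the day-level equivalent, and the rows are invented. The loop then recomputes each day's value the way the question describes, by filtering the data up to and including that day, and checks that both approaches agree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (time TEXT, balance REAL, is_spent_column TEXT)")
# Hypothetical rows; the row marked as spent must be excluded.
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", [
    ("2009-01-14 08:00:00", 10.0, None),
    ("2009-01-14 17:30:00", 5.0, None),
    ("2009-01-15 09:00:00", 2.5, "spent"),
    ("2009-01-15 12:00:00", 4.0, None),
    ("2009-01-16 10:00:00", 1.0, None),
])

# Aggregate to one row per day first, then run the window SUM over the days.
rows = conn.execute("""
    SELECT daily,
           SUM(total_balance) OVER (ORDER BY daily) AS total_value_per_day
    FROM (
        SELECT date(time) AS daily, SUM(balance) AS total_balance
        FROM tbl
        WHERE is_spent_column IS NULL
        GROUP BY date(time)
    ) AS daily_agg
    ORDER BY daily""").fetchall()
print(rows)  # [('2009-01-14', 15.0), ('2009-01-15', 19.0), ('2009-01-16', 20.0)]

# Cross-check: the non-window query applied to all data up to each day.
for daily, running_total in rows:
    (expected,) = conn.execute(
        "SELECT SUM(balance) FROM tbl "
        "WHERE is_spent_column IS NULL AND date(time) <= ?", (daily,)).fetchone()
    assert expected == running_total
```

Because the inner query already has one row per day, the default window frame (everything up to the current row) is exactly the cumulative sum.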

SQL-How to Sum Data of Clients Over Time?

Goal: SUM/AVG Client Data over multiple dates/transactions.
Detailed Question: How do I properly group clients (PlayerID), then SUM the minutes played (MinsPlayed) and AVG the bet (AvgBet)?
Current Issue: my results show individual transactions day by day over the 90-day period instead of the SUM/AVG over the 90 days.
Current Script/Results: FirstName Riley is showing each individual daily transaction instead of one total SUM/AVG over the set time period.
First, you don't need DISTINCT, because you are aggregating the results with GROUP BY, so you can take it out.
You get a row for each transaction because your GROUP BY clause includes the columns you are trying to aggregate (e.g. TimePlayed). Typically you only want to GROUP BY the columns that are not being aggregated, so remove from the GROUP BY clause every column you aggregate with SUM or AVG (TimePlayed, PlayerSkill, etc.).
Here's your current SQL:
SELECT DISTINCT CDS_StatDetail.PlayerID,
StatType,
FirstName,
LastName,
Email,
SUM(TimePlayed)/60 AS MinsPlayed,
SUM(CashIn) AS AvgBet,
SUM(PlayerSkill) AS AvgSkillRating,
SUM(PlayerSpeed) AS Speed,
CustomFlag1
FROM CDS_Player INNER JOIN CDS_StatDetail
ON CDS_Player.Player_ID = CDS_StatDetail.PlayerID
WHERE StatType='PIT' AND CDS_StatDetail.GamingDate >= '1/02/17' and CDS_StatDetail.GamingDate <= '4/02/2017' AND CustomFlag1='N'
GROUP BY CDS_StatDetail.PlayerID, StatType, FirstName, LastName, Email, TimePlayed, CashIn, PlayerSkill, PlayerSpeed, CustomFlag1
ORDER BY CDS_StatDetail.PlayerID
You want something like:
SELECT CDS_StatDetail.PlayerID,
SUM(TimePlayed)/60 AS MinsPlayed,
AVG(CashIn) AS AvgBet,
AVG(PlayerSkill) AS AvgSkillRating,
SUM(PlayerSpeed) AS Speed
FROM CDS_Player INNER JOIN CDS_StatDetail
ON CDS_Player.Player_ID = CDS_StatDetail.PlayerID
WHERE StatType='PIT' AND CDS_StatDetail.GamingDate BETWEEN '2017-01-02' AND '2017-04-02' AND CustomFlag1='N'
GROUP BY CDS_StatDetail.PlayerID
Next time, please paste your query as text rather than linking to a screenshot.
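The effect of over-grouping is easy to reproduce. The sketch below uses Python's sqlite3 module with only a hypothetical CDS_StatDetail table (the join to CDS_Player is omitted, and TimePlayed is assumed to be in seconds, as the /60 suggests). Grouping by TimePlayed and CashIn as well as PlayerID leaves one row per transaction, while grouping by PlayerID alone gives one row per player.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE CDS_StatDetail (
    PlayerID INTEGER, GamingDate TEXT, TimePlayed INTEGER, CashIn REAL)""")
# Hypothetical transactions; the last one falls outside the date range.
conn.executemany("INSERT INTO CDS_StatDetail VALUES (?, ?, ?, ?)", [
    (1, "2017-01-05", 120, 50.0),
    (1, "2017-02-10", 240, 70.0),
    (1, "2017-03-15", 180, 60.0),
    (2, "2017-01-20", 60, 20.0),
    (2, "2017-05-01", 600, 99.0),
])

date_filter = "WHERE GamingDate BETWEEN '2017-01-02' AND '2017-04-02'"

# Aggregated columns also in GROUP BY: every transaction is its own group.
per_transaction = conn.execute(f"""
    SELECT PlayerID, SUM(TimePlayed)/60 AS MinsPlayed, AVG(CashIn) AS AvgBet
    FROM CDS_StatDetail {date_filter}
    GROUP BY PlayerID, TimePlayed, CashIn""").fetchall()

# Only the non-aggregated column in GROUP BY: one row per player.
per_player = conn.execute(f"""
    SELECT PlayerID, SUM(TimePlayed)/60 AS MinsPlayed, AVG(CashIn) AS AvgBet
    FROM CDS_StatDetail {date_filter}
    GROUP BY PlayerID ORDER BY PlayerID""").fetchall()

print(len(per_transaction))  # 4
print(per_player)            # [(1, 9, 60.0), (2, 1, 20.0)]
```

Note that SUM(TimePlayed)/60 is integer division when TimePlayed is an integer column, so partial minutes are dropped.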

Redshift: Find MAX in list disregarding non-incremental numbers

I work for a sports film analysis company. We have teams with unique team IDs, and I would like to find the number of consecutive weeks they have uploaded film to our site, counting backwards from today. Each upload has its own row in a separate table that I can join on teamid, which records the date it was uploaded. So far I have put together a simple query that pulls each unique DATEDIFF(week) value and groups on teamid.
Select teamid, MAX(weekdiff)
from (Select teamid, DATEDIFF(week, dateuploaded, GETDATE()) as weekdiff
from leroy_events
group by teamid, weekdiff) w
group by teamid
This gives me a list of team IDs and their unique weekly date differences. I would then like to find the max for each team ID without breaking an increment of 1. For example, if my data set is:
Team datediff
11453 0
11453 1
11453 2
11453 5
11453 7
11453 13
I would like the max value for team 11453 to be 2.
Any ideas would be awesome.
I have simplified your example by assuming a table that already has a weekdiff column; that is the value you compute with DATEDIFF.
First, I use the LAG() window function to attach the previous weekdiff value (in the ordered set) to the current row.
Then, with a WHERE condition, I retrieve the max(weekdiff) among rows whose previous value is current_value - 1, i.e. consecutive weekdiffs.
Data:
create table leroy_events ( teamid int, weekdiff int);
insert into leroy_events values (11453,0),(11453,1),(11453,2),(11453,5),(11453,7),(11453,13);
Code:
WITH initial_data AS (
Select
teamid,
weekdiff,
lag(weekdiff,1) over (partition by teamid order by weekdiff) as lag_weekdiff
from
leroy_events
)
SELECT
teamid,
max(weekdiff) AS max_weekdiff_consecutive
FROM
initial_data
WHERE weekdiff = lag_weekdiff + 1 -- keep only rows that directly follow their predecessor (weekdiff = previous + 1)
GROUP BY 1
SQLFiddle with your sample data to see how this code works.
Result:
teamid max_weekdiff_consecutive
11453 2
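Here is the query run against the sample data in SQLite (via Python's sqlite3; 3.25+ is needed for LAG) as a stand-in for Redshift. Team 99999 is hypothetical and shows a limitation worth knowing: because each row is compared only with its immediate predecessor, a later run such as 5, 6 also passes the filter, so that team gets 6 even though its run back from week 0 stops at 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE leroy_events (teamid INTEGER, weekdiff INTEGER)")
# Sample data from the question, plus a hypothetical team 99999.
conn.executemany("INSERT INTO leroy_events VALUES (?, ?)",
                 [(11453, w) for w in (0, 1, 2, 5, 7, 13)] +
                 [(99999, w) for w in (0, 1, 2, 5, 6)])

lag_query = """
WITH initial_data AS (
    SELECT teamid, weekdiff,
           LAG(weekdiff, 1) OVER (PARTITION BY teamid ORDER BY weekdiff) AS lag_weekdiff
    FROM leroy_events
)
SELECT teamid, MAX(weekdiff) AS max_weekdiff_consecutive
FROM initial_data
WHERE weekdiff = lag_weekdiff + 1  -- row directly follows its predecessor
GROUP BY teamid
ORDER BY teamid"""

result = conn.execute(lag_query).fetchall()
print(result)  # [(11453, 2), (99999, 6)]
```

If your data can contain a second run of consecutive weeks after a gap, the filter needs to be anchored to the run that starts at week 0.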
You can use SQL window functions to probe relationships between rows of the table. In this case the lag() function can be used to look at the previous row relative to a given order and grouping. That way you can determine whether a given row is part of a group of consecutive rows.
Overall, you still need to aggregate or filter to reduce the rows for each group of interest (i.e. each team) to one. Aggregating is convenient here. It might look like this:
select
team,
case min(datediff)
when 0 then max(datediff)
else -1
end as max_weeks
from (
select
team,
datediff,
case
when (lag(datediff) over (partition by team order by datediff) != datediff - 1)
then 0
else 1
end as is_consec
from diffs
) cd
where is_consec = 1
group by team
The inline view just adds an is_consec column to the data, marking whether each row's datediff is exactly one more than the previous row's for the same team. The outer query filters on that column (you cannot filter directly on a window function) and chooses the maximum datediff from the remaining rows for each team.
There are a few subtleties there:
The case expression in the inline view is written this way to exploit the fact that the lag() computed for the first row of each partition is NULL, and a comparison with NULL is neither true nor false. Thus the first row in each partition falls through to the else branch and is always marked consecutive.
The case testing min(datediff) in the outer select clause picks up teams that have no record with datediff = 0, and assigns -1 to column max_weeks for them.
It would also have been possible to mark rows non-consecutive if the first in their group did not have datediff = 0, but then you would lose such teams from the results altogether.
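The sketch below runs this query in SQLite (via Python's sqlite3; 3.25+ for window functions) on a hypothetical diffs table. Team 22222 has no week 0 and gets -1, as described. The second query swaps in a different technique, the gaps-and-islands trick with ROW_NUMBER(): assuming each team's datediff values are unique and non-negative, a row belongs to the unbroken run starting at 0 exactly when datediff = row number - 1. Because that test is anchored at 0, it also stops at the first gap for data such as 0, 1, 2, 5, 6 (hypothetical team 99999), where the LAG-based filter lets the later run 5, 6 through.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE diffs (team INTEGER, datediff INTEGER)")
# Hypothetical data: the question's team, a team with no week 0,
# and a team with a second consecutive run after a gap.
data = {11453: (0, 1, 2, 5, 7, 13), 22222: (3, 4), 99999: (0, 1, 2, 5, 6)}
conn.executemany("INSERT INTO diffs VALUES (?, ?)",
                 [(team, d) for team, ds in data.items() for d in ds])

lag_version = """
select team,
       case min(datediff) when 0 then max(datediff) else -1 end as max_weeks
from (
    select team, datediff,
           case when (lag(datediff) over (partition by team order by datediff) != datediff - 1)
                then 0 else 1 end as is_consec
    from diffs
) cd
where is_consec = 1
group by team
order by team"""

# Gaps-and-islands: inside the run starting at 0, datediff equals row number - 1.
row_number_version = """
select team,
       coalesce(max(case when datediff = rn - 1 then datediff end), -1) as max_weeks
from (
    select team, datediff,
           row_number() over (partition by team order by datediff) as rn
    from diffs
) numbered
group by team
order by team"""

lag_result = dict(conn.execute(lag_version).fetchall())
rn_result = dict(conn.execute(row_number_version).fetchall())
print(lag_result)  # {11453: 2, 22222: -1, 99999: 6}
print(rn_result)   # {11453: 2, 22222: -1, 99999: 2}
```

The ROW_NUMBER variant relies on the inner query already having one row per (team, datediff), which the question's GROUP BY teamid, weekdiff provides.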

Aggregate function AVG is not working in my SQL query

In my query I need to display date and average age:
SELECT (SYSDATE-rownum) AS DATE,
avg((SYSDATE - rownum)- create_time) as average_Age
FROM items
group by (SYSDATE-rownum)
But my output for average age is not correct. It simply displays (SYSDATE - rownum) - create_time for each row instead of averaging those values, even though I use avg((SYSDATE - rownum) - create_time).
Can someone tell me why the aggregate function AVG is not working in my query, and what a possible solution might be?
In the select clause you use both a non-aggregate expression and an aggregate expression. Because rownum is different for every row, GROUP BY (SYSDATE-rownum) puts each row into its own group, so AVG only ever averages a single value. By dropping the (SYSDATE-rownum) AS DATE expression, you get a result over the whole data set: the avg is calculated across all rows, not per retrieved record.
You can then drop the GROUP BY too, keeping only the avg expression:
SELECT avg((SYSDATE - rownum)- create_time) as average_Age
FROM items
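A minimal sketch of why the original query returned one value per row, using Python's sqlite3 in place of Oracle. SQLite has no SYSDATE or ROWNUM, so a fixed reference date and a unique id column are assumed instead, and age is measured in days with julianday(). Grouping by a value that is unique per row makes every group a single row, so AVG just returns that row's age; dropping the GROUP BY averages over the whole table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, create_time TEXT)")
conn.executemany("INSERT INTO items (create_time) VALUES (?)",
                 [("2024-01-11",), ("2024-01-21",), ("2024-01-11",), ("2024-01-01",)])

REFERENCE_DATE = "2024-01-31"  # hypothetical stand-in for SYSDATE
age = "julianday(?) - julianday(create_time)"  # age in days

# Grouping by a per-row unique key (like SYSDATE - rownum): one group per row.
per_row = [value for (_, value) in conn.execute(
    f"SELECT id, AVG({age}) FROM items GROUP BY id ORDER BY id",
    (REFERENCE_DATE,))]
print(per_row)  # [20.0, 10.0, 20.0, 30.0]

# No GROUP BY: a single average over the whole table.
(overall,) = conn.execute(f"SELECT AVG({age}) FROM items",
                          (REFERENCE_DATE,)).fetchone()
print(overall)  # 20.0
```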
First, decide which rows or groups you need the average over; that column goes in the GROUP BY clause. As a simple example, if there are 4 rows with ages 20, 10, 20 and 30, the average is 80/4 = 20. So I think you first need to calculate the age as (sysdate - create_time),
e.g. select months_between(sysdate, create_time)/12 cal3 from your_table
and then wrap that in an outer query that computes the avg per group.
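The inner/outer structure might look like this, sketched with Python's sqlite3 instead of Oracle. The category column, the fixed reference date, and the rows are hypothetical, and age is computed in days with julianday() because SQLite has no MONTHS_BETWEEN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (category TEXT, create_time TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [
    ("a", "2024-01-11"), ("a", "2024-01-21"),
    ("b", "2024-01-11"), ("b", "2024-01-01"),
])

# Inner query: one age per row. Outer query: AVG of those ages per group.
result = conn.execute("""
    SELECT category, AVG(age_days) AS average_age
    FROM (
        SELECT category,
               julianday('2024-01-31') - julianday(create_time) AS age_days
        FROM items
    ) AS ages
    GROUP BY category
    ORDER BY category""").fetchall()
print(result)  # [('a', 15.0), ('b', 25.0)]
```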

How would I get the average of a previous date and update it?

I want to write a query that computes an average (that won't be hard), but once I have that average I want to save it somewhere. Say I have an average saved from last month in table_a.last_month_average. Now I run the query again, and the result is the current_month_average. I want to compare these two columns and see whether current_month_average increased from last_month_average.
After comparing, I would like to output the bigger of the two averages. Then I would like to move current_month_average into last_month_average, so that it becomes the old average when the query runs next month.
Is this possible in SQL, or is there a better way to do this? Any suggestions will help.
As I understand it, this operation selects the maximum monthly average from all historical records, so you don't need to keep separate current_month_average and last_month_average values. Instead, a table holding every month's average is more useful. Assuming a table named monthaverage with columns (Id, Month, Average), you can query:
SELECT TOP 1 T1.*
, CASE WHEN
T1.Average > (SELECT TOP 1 T2.Average
FROM monthaverage T2
WHERE T2.Month < T1.Month
ORDER BY Month DESC)
THEN 'Increased'
ELSE 'Not Increased'
END
FROM monthaverage T1
ORDER BY T1.Average DESC
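If you can run it on SQL Server 2012 or later, you can use the LAG function instead of the subquery. (LAST_VALUE(Average) OVER (ORDER BY Month) does not work here: with the default window frame, which ends at the current row, it returns the current row's own Average.) The query looks like:
SELECT TOP 1 *, CASE WHEN Average > LAG(Average) OVER (ORDER BY Month) THEN 'Increased' ELSE 'Not Increased' END
FROM monthaverage
ORDER BY Average DESC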
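The sketch below checks the correlated-subquery approach against a LAG() window function in SQLite via Python's sqlite3, with LIMIT 1 standing in for T-SQL's TOP 1 and invented monthly averages. It also confirms that LAST_VALUE(Average) OVER (ORDER BY Month), under the default frame ending at the current row, simply returns each row's own Average, so it cannot detect an increase.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthaverage (Id INTEGER, Month TEXT, Average REAL)")
# Hypothetical monthly averages.
conn.executemany("INSERT INTO monthaverage VALUES (?, ?, ?)", [
    (1, "2024-01", 10.0), (2, "2024-02", 12.0), (3, "2024-03", 11.0),
])

# Correlated-subquery version (TOP 1 rewritten as LIMIT 1 for SQLite).
subquery_version = conn.execute("""
    SELECT T1.*,
           CASE WHEN T1.Average > (SELECT T2.Average FROM monthaverage T2
                                   WHERE T2.Month < T1.Month
                                   ORDER BY T2.Month DESC LIMIT 1)
                THEN 'Increased' ELSE 'Not Increased' END AS Trend
    FROM monthaverage T1
    ORDER BY T1.Average DESC LIMIT 1""").fetchone()

# Window-function version: LAG gives the previous month's Average.
# Window functions are evaluated over all rows before ORDER BY ... LIMIT.
lag_version = conn.execute("""
    SELECT *,
           CASE WHEN Average > LAG(Average) OVER (ORDER BY Month)
                THEN 'Increased' ELSE 'Not Increased' END AS Trend
    FROM monthaverage
    ORDER BY Average DESC LIMIT 1""").fetchone()

# LAST_VALUE with the default frame just echoes the current row's Average.
last_values = conn.execute("""
    SELECT Average, LAST_VALUE(Average) OVER (ORDER BY Month)
    FROM monthaverage ORDER BY Month""").fetchall()

print(subquery_version)  # (2, '2024-02', 12.0, 'Increased')
print(lag_version)       # (2, '2024-02', 12.0, 'Increased')
print(all(avg == last for avg, last in last_values))  # True
```

For the earliest month both versions compare against NULL, so they fall through to 'Not Increased'.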