I would like to create a query to do the following but I am having trouble:
I have a DB table with the columns:
TestYear (int, e.g. 2014)
Date (date, i.e. set of dates in a given year)
DailyWorstValue
RunningValue
Primary key is TestYear + Date
I would like to get the:
LAST RunningValue ordered by Date (i.e. the final value)
MINIMUM DailyWorstValue (i.e. the worst value)
Per TestYear
This will basically be a one-row summary per TestYear. Is it possible to do this using window functions? Thank you very much in advance for any help that you can give.
I'm not sure why you need a window function for this; an aggregate function plus CROSS APPLY will do the job for you:
SELECT testyear,
       MIN_DailyWorstValue = MIN(dailyworstvalue),
       RV.last_runningvalue
FROM   db_table A
       CROSS APPLY (SELECT TOP 1 Last_RunningValue = runningvalue
                    FROM   db_table B
                    WHERE  A.testyear = B.testyear
                    ORDER  BY date DESC) RV
GROUP  BY testyear,
          RV.last_runningvalue
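To answer the window-function part of the question directly: it is also possible in a single pass on SQL Server 2012+ using LAST_VALUE, at the cost of a DISTINCT. A sketch, reusing the db_table placeholder from above:

SELECT DISTINCT
       TestYear,
       MIN(DailyWorstValue) OVER (PARTITION BY TestYear) AS Min_DailyWorstValue,
       LAST_VALUE(RunningValue) OVER (
           PARTITION BY TestYear
           ORDER BY [Date]
           -- the explicit frame is required: the default frame stops at the current row
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS Last_RunningValue
FROM db_table;

DISTINCT collapses the output to one row per TestYear, since both windowed values are constant within each partition.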
Consider a time-series table that contains three fields: time of type timestamptz, balance of type numeric, and is_spent_column of type text.
The following query generates a valid result for the last day of the given interval.
SELECT
MAX(DATE_TRUNC('DAY', (time))) as last_day,
SUM(balance) FILTER ( WHERE is_spent_column is NULL ) AS value_at_last_day
FROM tbl
2010-07-12 18681.800775017498741407984000
However, I am in need of an equivalent query based on window functions to report the total value of the column named balance for all the days up to and including the given date.
Here is what I've tried so far, but without any valid result:
SELECT
    DATE_TRUNC('DAY', time) AS daily,
    SUM(SUM(balance) FILTER (WHERE is_spent_column IS NULL))
        OVER (ORDER BY DATE_TRUNC('DAY', time)) AS total_value_per_day
FROM tbl
GROUP BY 1
ORDER BY 1 DESC
2010-07-12 16050.496339044977568391974000
2010-07-11 13103.159119670350269890284000
2010-07-10 12594.525752964512456914454000
2010-07-09 12380.159588711091681327014000
2010-07-08 12178.119542536668113577014000
2010-07-07 11995.943973804127033140014000
EDIT:
Here is a sample dataset:
LINK REMOVED
The running total can be computed by applying the first query above to the entire dataset up to and including the desired day. For example, for day 2009-01-31 the result is 97.13522530000000000000, and for day 2009-01-15, filtering with time < '2009-01-16 00:00:00', it returns 24.446144000000000000.
What I need is an alternative query that computes the running total for each day in a single query.
EDIT 2:
Thank you all so very much for your participation and support.
The difference between the queries' result sets was caused by the preceding ETL pipelines. Sorry for my ignorance!
Below I've provided a sample schema to test the queries.
https://www.db-fiddle.com/f/veUiRauLs23s3WUfXQu3WE/2
Now both queries given above and the query given in the answer below return the same result.
Consider calculating the running total via a window function after aggregating the data to the day level. And since you aggregate with a single condition, the FILTER condition can be converted to a basic WHERE:
SELECT daily,
SUM(total_balance) OVER (ORDER BY daily) AS total_value_per_day
FROM (
SELECT
DATE_TRUNC('DAY', (time)) AS daily,
SUM(balance) AS total_balance
FROM tbl
WHERE is_spent_column IS NULL
GROUP BY 1
) AS daily_agg
ORDER BY daily
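As a sanity check against the single-day numbers quoted in the question (per the EDIT, 2009-01-15 should come back as 24.446144000000000000), the running total for one day can be read back by wrapping the query above once more; a sketch, assuming the tbl schema from the fiddle:

SELECT daily, total_value_per_day
FROM (
    SELECT daily,
           SUM(total_balance) OVER (ORDER BY daily) AS total_value_per_day
    FROM (
        SELECT DATE_TRUNC('DAY', time) AS daily,
               SUM(balance) AS total_balance
        FROM tbl
        WHERE is_spent_column IS NULL
        GROUP BY 1
    ) AS daily_agg
) AS running
WHERE daily = '2009-01-15';  -- the literal is cast to match daily's type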
Using the dataset hosted on Google (MLB data) as an example, here is what I am trying to accomplish: obtain the last 3 weeks' strikes for a given venue.
My aggregated dataset looks like this without the strikes_3wk column -
The logic for the strikes_3wk column is to partition the aggregated dataset by venueName, order by the YearWeek column, and then obtain the last 3 weeks' aggregated strikes data.
Here is the query I have written so far. I see that the windowing function is where I need to modify the logic. So, is there a way to add grouping within the windowing function? Is there any alternative way of doing this?
In the image I added a new column 'expected', showing values for two weeks.
select inr.*
,sum(inr.strikes) over (Venue_Week rows between current row and 2 following) as strikes_3wk
from
(
select seasonType
,gameStatus
,homeTeamName
,awayTeamName
,venueName
,CAST(
CONCAT(
CAST(EXTRACT(YEAR FROM createdAt) as string)
,CAST(EXTRACT(WEEK(Monday) FROM createdAt) as string)
) as INT64)
as YearWeek
,sum(homeFinalRuns) as homeFinalRuns
,sum(strikes) as strikes
from `bigquery-public-data.baseball.games_wide`
where createdAt is not null
group by seasonType
,gameStatus
,homeTeamName
,awayTeamName
,venueName
,YearWeek
) inr
window Venue_Week as (
partition by inr.venueName
order by inr.YearWeek desc
)
So you are looking for strikes per venue regardless of who did them, right?
Maybe something like:
SELECT INR.*, STATS.strikes_3wk
FROM `bigquery-public-data.baseball.games_wide` INR
LEFT JOIN (
SELECT venueName, SUM(strikes) as strikes_3wk
FROM `bigquery-public-data.baseball.games_wide` INR2
WHERE YearWeek IN (
SELECT YearWeek
FROM `bigquery-public-data.baseball.games_wide`
WHERE venueName = INR2.venueName
ORDER BY YearWeek DESC
LIMIT 3
)
GROUP BY venueName
) STATS
ON INR.venueName = STATS.venueName
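Not this answer's approach, but a sketch of a window-function alternative that runs on BigQuery: it first collapses the data to one row per venue and week, so the 3-week frame counts weeks rather than detail rows. It reuses the question's YearWeek derivation (note that the unpadded week number in the CONCAT can mis-order weeks, e.g. 20162 vs 201610; that caveat is kept as in the question):

WITH venue_week AS (
  SELECT venueName,
         CAST(CONCAT(
           CAST(EXTRACT(YEAR FROM createdAt) AS STRING),
           CAST(EXTRACT(WEEK(MONDAY) FROM createdAt) AS STRING)
         ) AS INT64) AS YearWeek,
         SUM(strikes) AS strikes
  FROM `bigquery-public-data.baseball.games_wide`
  WHERE createdAt IS NOT NULL
  GROUP BY venueName, YearWeek
)
SELECT venueName,
       YearWeek,
       strikes,
       -- current week plus the two preceding weeks (YearWeek is sorted descending)
       SUM(strikes) OVER (
         PARTITION BY venueName
         ORDER BY YearWeek DESC
         ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING
       ) AS strikes_3wk
FROM venue_week
ORDER BY venueName, YearWeek DESC;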
I'm quite new to SQL Server. I can't seem to figure this out. I have a table that looks like this.
I need to be able to calculate the percentage change in the number for each name, for each year, in the column p. So the end result should look like this.
You can easily calculate the % difference using lag():
select name, date, number,
       cast((number * 1.0 - lag(number, 1) over (partition by name order by date))
            / lag(number, 1) over (partition by name order by date) * 100 as int) as p
from table
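One edge case worth guarding: lag() returns NULL for each name's first row (which is fine), but a previous number of 0 would raise a divide-by-zero error. A sketch using NULLIF, keeping the same table placeholder as above:

select name, date, number,
       -- NULLIF turns a zero divisor into NULL instead of raising an error
       cast((number * 1.0 - lag(number, 1) over (partition by name order by date))
            / nullif(lag(number, 1) over (partition by name order by date), 0) * 100 as int) as p
from table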
In DAX/Power BI, I am wondering if it is possible to create an aggregate calculation on a subset of data within a dataset.
I have a listing of customer scores for a period of time, e.g.
date, customer, score
-----------------------
1.1.17, A, 12
2.1.17, A, 16
4.1.17, B, 10
5.1.17, B, 14
I would like to identify the max date per customer, e.g.:
date, customer, score, max date per client
-------------------------------------------
1.1.17, A, 12, 2.1.17
2.1.17, A, 11, 2.1.17
4.1.17, B, 10, 5.1.17
5.1.17, B, 14, 5.1.17
The SQL equivalent would be something like:
MAX(date) OVER (PARTITION BY customer).
In DAX/Power BI, I realise that a calculated column can be used in combination with EARLIER, but this is not suitable because a calculated column is not responsive to filtering from a slicer. That is, I would like to find the max date per customer, as illustrated above, for a filtered date range controlled from a slicer, not for the full dataset, which is what a calculated column gives. Is such a measure possible?
You will want a measure like this:
Max Date by Customer =
CALCULATE(
MAX(Table1[Date]),
FILTER(
ALLSELECTED(Table1),
Table1[customer] = MAX(Table1[customer])
)
)
The ALLSELECTED removes the local filter context while preserving any slicer filtering.
The filter Table1[customer] = MAX(Table1[customer]) is basically the measure equivalent of Table1[customer] = EARLIER(Table1[customer]) in a calculated column.
You can use a subquery:
select *, (select max(t1.date)
from table t1
where t1.customer = t.customer
) as max_date_per_client
from table t;
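For comparison, the window-function form mentioned in the question, written out as a complete query (t stands in for the real table with the columns shown):

select date, customer, score,
       max(date) over (partition by customer) as max_date_per_client
from t;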
I have a view in SQL Server with prices of items over time. My users will be passing a date variable and I want to return the closest record without going over, or if no such record exists return the oldest record present. For example, with the data below, if the user passes April for item A it will return the March record and for item B it will return the June record.
I've tried a lot of variations with Union All and Order by but keep getting a variety of errors. Is there a way to write this using a Case Statement?
example:
case when min(Month)>Input Date then min(Month)
else max(Month) where Month <= Input Date?
Sincere apologies for attaching the sample dataset as an image; I couldn't get it to format right otherwise.
Sample Dataset
You can use SELECT TOP (1) with an ORDER BY that combines the item filter and a date comparison. ORDER BY sorts the records by date, so you get the latest record at or before the input date if one exists, or the earliest record otherwise.
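A minimal sketch of that approach; since the sample data is only an image, the names PriceView, [Month], Price, @Item, and @InputDate are assumptions:

DECLARE @Item varchar(10) = 'A';       -- hypothetical parameters
DECLARE @InputDate date = '2019-04-01';

SELECT TOP (1) Item, [Month], Price
FROM PriceView                          -- hypothetical view name
WHERE Item = @Item
ORDER BY
    CASE WHEN [Month] <= @InputDate THEN 0 ELSE 1 END,    -- prefer months at or before the input
    CASE WHEN [Month] <= @InputDate THEN [Month] END DESC, -- the latest of those wins
    [Month] ASC;                                           -- otherwise fall back to the oldest month

With the sample data from the question, passing April for item A lands on the March record, and item B (whose earliest record is June) falls through to June.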
Here's a rough outline of a query (without more of your table it's hard to be exact):
WITH CTE AS
(
    SELECT
        ITEM,
        PRICE,
        MIN(ACTUAL_DATE) OVER (PARTITION BY ITEM) AS MIN_DATE,
        -- latest date at or before the input date; NULL when every date is later
        MAX(CASE WHEN ACTUAL_DATE <= @INPUT_DATE THEN ACTUAL_DATE END)
            OVER (PARTITION BY ITEM) AS MATCHED_DATE
    FROM TABLE
)
SELECT DISTINCT
    CTE.ITEM,
    CTE.PRICE,
    CASE
        WHEN CTE.MATCHED_DATE IS NOT NULL
        THEN CTE.MATCHED_DATE
        ELSE CTE.MIN_DATE
    END AS MOSTLY_MATCHED_DATE
FROM CTE
The idea is that in a Common Table Expression, you use PARTITION BY to identify the key dates for each item, record by record, and then test in aggregate to pull either your matched record or your default (oldest) record.