A different kind of running total in Teradata - sql

I have seen tickets about running totals, but this is a little different.
Let's say I have claims from January 2020 to max(date). I want to write a query to give me the claims totals for January 2020, then January to February 2020, then January to March 2020.... all the way to January to max(date), and all in the same query.
An additional month of data gets added each month. I would like the query to account for that and not hardcode anything.

This is a cumulative sum. Something like this:
select date, sum(claims) over (order by date) as running_claims
from t;
If you need to aggregate by month, then:
select extract(year from date), extract(month from date),
sum(claims) as claims_in_month,
sum(sum(claims)) over (order by min(date)) as running_claims
from t
group by extract(year from date), extract(month from date);
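To see the monthly running total on a few rows, here is a minimal sketch using Python's built-in sqlite3 (window functions need SQLite 3.25 or later). The table `t`, the column `claim_date`, and the sample rows are made up for illustration. It spells the same idea in two steps: aggregate per month in a CTE, then take a running sum over the monthly rows. `strftime` stands in for `extract`, which SQLite does not have.

```python
import sqlite3

# Hypothetical table and data, only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (claim_date text, claims integer)")
conn.executemany(
    "insert into t values (?, ?)",
    [("2020-01-05", 10), ("2020-01-20", 5), ("2020-02-11", 7), ("2020-03-02", 3)],
)

rows = conn.execute(
    """
    with monthly as (
        -- step 1: one row per month
        select strftime('%Y-%m', claim_date) as yyyymm,
               sum(claims) as claims_in_month
        from t
        group by strftime('%Y-%m', claim_date)
    )
    -- step 2: running sum over the monthly rows
    select yyyymm,
           claims_in_month,
           sum(claims_in_month) over (order by yyyymm) as running_claims
    from monthly
    order by yyyymm
    """
).fetchall()

for row in rows:
    print(row)
# ('2020-01', 15, 15)
# ('2020-02', 7, 22)
# ('2020-03', 3, 25)
```

Nothing is hardcoded: when a new month of rows is inserted, the query returns one more row.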

Related

Using () OVER or HAVING clause to get monthly aggregates of counts

I have a big dataset on ticket sales throughout a single year. The schema I am working with is:
ID
date_time_sale (Timestamp, yyyy-MM-dd hh-mm-ss)
weekday (varchar, Mon to Sun)
number_tickets (integer)
ticket_price (float)
total_price (float)
I am trying to get, for every month of the year, the weekday on which the highest number of tickets was sold. For example, the output would be:
year  month  weekday  total_tickets
2015  01     SAT      5400
2015  02     SUN      4300
2015  03     SUN      6400
I tried using the following, but admittedly SQL is not my strongest skill:
SELECT DISTINCT EXTRACT(YEAR FROM date_time_sale) AS YEAR,
EXTRACT(MONTH FROM date_time_sale) AS MONTH,
week_day,
RANK () OVER (PARTITION BY YEAR, MOMTH ORDER BY count(week_day) ASC) weekday_count
from ticket_sales
order by YEAR, MONTH
But I keep running into errors. I tried using a HAVING clause, but I couldn't get anywhere. Any tips on how to use RANK () OVER (PARTITION BY) effectively to get this output, please? Or do I need to use COUNT () OVER?
The analysis exception says:
`cannot resolve '`YEAR`' given input columns: [ticket_sales.YEAR, ticket_sales.MONTH, weekday]; line 1 pos 292;\n'Sort ['YEAR ASC NULLS FIRST, 'MONTH ASC NULLS FIRST], true\n+- Project [YEAR#342, MONTH#358
but then it is quite a long error.
Update:
So I tried this code:
SELECT DISTINCT year,
month,
week_day,
COUNT (week_day) OVER (PARTITION BY year, month, week_day) AS weekday_count
from ticket_sales
order by year, month, weekday_count DESC
What that did is return every weekday for every month, so the output has 12*7 rows instead of 12. I still have a way to go on this, but at least I am getting somewhere.
Try this query and let me know if it returns the desired result.
I'm not sure whether the field name is number_tickets or total_tickets; I used number_tickets.
First I sum the number of tickets by year, month and weekday, then return one row per year and month with the weekday on which the most tickets were sold.
WITH total_by_day AS (SELECT EXTRACT(YEAR FROM date_time_sale) AS YEAR,
EXTRACT(MONTH FROM date_time_sale) AS MONTH,
week_day,
SUM(number_tickets) AS number_tickets
FROM ticket_sales
GROUP BY YEAR, MONTH, week_day)
SELECT DISTINCT
YEAR,
MONTH,
FIRST_VALUE(week_day) OVER (PARTITION BY YEAR, MONTH ORDER BY number_tickets DESC) AS week_day,
FIRST_VALUE(number_tickets) OVER (PARTITION BY YEAR, MONTH ORDER BY number_tickets DESC) AS total_tickets
FROM total_by_day
ORDER BY YEAR, MONTH;
In PostgreSQL this gave me the desired result.
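As an alternative spelling of the same idea (not the answerer's query), the top weekday per month can also be picked with ROW_NUMBER() and a filter on rank 1 instead of FIRST_VALUE plus DISTINCT. The sketch below runs on Python's built-in sqlite3 (SQLite 3.25 or later) with invented sample rows; breaking ties alphabetically by `week_day` is an assumption, not something the question specifies.

```python
import sqlite3

# Invented sample data; only three of the question's columns are needed here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table ticket_sales (date_time_sale text, week_day text, number_tickets integer)"
)
conn.executemany(
    "insert into ticket_sales values (?, ?, ?)",
    [
        ("2015-01-03 10:00:00", "SAT", 30),
        ("2015-01-04 11:00:00", "SUN", 20),
        ("2015-01-10 12:00:00", "SAT", 25),
        ("2015-02-01 09:00:00", "SUN", 40),
        ("2015-02-07 09:30:00", "SAT", 10),
    ],
)

top_days = conn.execute(
    """
    with total_by_day as (
        select strftime('%Y', date_time_sale) as year,
               strftime('%m', date_time_sale) as month,
               week_day,
               sum(number_tickets) as total_tickets
        from ticket_sales
        group by strftime('%Y', date_time_sale),
                 strftime('%m', date_time_sale),
                 week_day
    ),
    ranked as (
        select *,
               row_number() over (
                   partition by year, month
                   order by total_tickets desc, week_day  -- assumed tie-break
               ) as rn
        from total_by_day
    )
    select year, month, week_day, total_tickets
    from ranked
    where rn = 1
    order by year, month
    """
).fetchall()

for row in top_days:
    print(row)
# ('2015', '01', 'SAT', 55)
# ('2015', '02', 'SUN', 40)
```

With the explicit tie-break, the weekday and its total always come from the same row, even when two weekdays sell the same number of tickets in a month.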

Presenting cumulative average in time series

I am trying to present a time series of a score to view the trend.
The score is the average of all scores from the first date in the table up to the end of each year-month.
i.e. Jan 2018 = where date < Jan 2018
Feb 2018 = where date < Feb 2018
I would like to present this as a monthly score for each year-month (Dec 2017, Jan 2018).
If the score were not an average, I could use the Cumulative option in the time series, but that does not work once Avg(Metric) is involved.
I am really scratching my head on this one. Any advice on how to structure the data and present this in Google Datastudio would be greatly appreciated.
I have access to the database, and we are utilizing Big query to create the views.
avg() should work. Something like this:
select t.*,
avg(val) over (partition by format_date('%Y%m', date))
from t;
Oops, this is the average for the current month. If you want the running average:
select format_date('%Y%m', date) as yyyymm,
(sum(sum(val)) over (order by min(date)) /
sum(count(*)) over (order by min(date))
) as running_avg
from t
group by yyyymm
order by yyyymm;
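The reason the running average divides a running sum by a running count, rather than averaging the monthly averages, is that months with more rows must weigh more. A minimal sketch of that arithmetic using Python's built-in sqlite3 (SQLite 3.25 or later), with a hypothetical table `t(d, val)` and invented rows; the monthly aggregation is done in a CTE first, which is an equivalent two-step form of the query above.

```python
import sqlite3

# Hypothetical table t(d, val) with invented rows.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (d text, val real)")
conn.executemany(
    "insert into t values (?, ?)",
    [
        ("2017-12-03", 10), ("2017-12-20", 20),
        ("2018-01-15", 30),
        ("2018-02-01", 20), ("2018-02-27", 40),
    ],
)

running = conn.execute(
    """
    with monthly as (
        select strftime('%Y-%m', d) as yyyymm,
               sum(val) as month_sum,
               count(*) as month_cnt
        from t
        group by strftime('%Y-%m', d)
    )
    select yyyymm,
           -- all values so far divided by the number of rows so far
           sum(month_sum) over (order by yyyymm) * 1.0
             / sum(month_cnt) over (order by yyyymm) as running_avg
    from monthly
    order by yyyymm
    """
).fetchall()

for row in running:
    print(row)
# ('2017-12', 15.0)
# ('2018-01', 20.0)
# ('2018-02', 24.0)
```

For February the result is 120 / 5 = 24.0, whereas the mean of the three monthly averages (15, 30, 30) would be 25.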

How do I correctly use the SQL Sum function with multiple variables and grouping?

I am trying to write an SQL statement based on the following code.
CREATE TABLE mytable (
year INTEGER,
month INTEGER,
day INTEGER,
hoursWorked INTEGER )
Assume that each employee works multiple days in each month over a 3-year period.
I need to write an sql statement that returns the total hours worked in each month, grouped by earliest year/month first.
I tried doing this, but I don't think it is correct:
SELECT Sum(hoursWorked) FROM mytable
ORDER BY(year,month)
GROUP BY(month);
I am a little confused about how to use the SUM function together with GROUP BY and ORDER BY. How does one go about doing this?
Try this:
SELECT year, month, SUM(hoursWorked)
FROM mytable
GROUP BY year, month
ORDER BY year, month
This way you will have for example:
2014 December 30
2015 January 12
2015 February 40
Every non-aggregated column in the SELECT list must also appear in GROUP BY. The reverse is not required, but you usually want the grouping columns in the SELECT list so you can tell the rows apart.
SELECT year, month, Sum(hoursWorked)as workedhours
FROM mytable
GROUP BY year,month
ORDER BY year,month;
You have to group by year and month.
Is this what you are trying to do? This will sum by year/month and order by year/month.
Select [Year], [Month], Sum(HoursWorked) as WorkedHours
From mytable
Group By [Year], [Month]
Order by [Year], [Month]
You have to group by year and month, otherwise you will have the hours you worked on March 2014 and 2015 in one record :)
SELECT Sum(hoursWorked) as hoursWorked, year, month
FROM mytable
GROUP BY year, month
ORDER BY year, month;
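To make the point about grouping by both year and month concrete, here is a small sketch on Python's built-in sqlite3 with invented rows for `mytable`. It runs the month-only grouping next to the year/month grouping so the merged March rows are visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table mytable (year integer, month integer, day integer, hoursWorked integer)"
)
# Invented rows: March appears in both 2014 and 2015.
conn.executemany(
    "insert into mytable values (?, ?, ?, ?)",
    [(2014, 3, 1, 8), (2015, 3, 1, 6), (2015, 3, 2, 4), (2015, 4, 1, 5)],
)

# Grouping by month alone folds March 2014 and March 2015 into one row.
by_month = conn.execute(
    "select month, sum(hoursWorked) from mytable group by month order by month"
).fetchall()

# Grouping by year and month keeps them apart, earliest first.
by_year_month = conn.execute(
    """
    select year, month, sum(hoursWorked) as workedhours
    from mytable
    group by year, month
    order by year, month
    """
).fetchall()

print(by_month)       # [(3, 18), (4, 5)]
print(by_year_month)  # [(2014, 3, 8), (2015, 3, 10), (2015, 4, 5)]
```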

How to group quantities in a period of time

I'm trying to group quantities over a period of time. I have the following table:
SALES
SALES_DATE
SALES_ITEM
SALES_QUANTITY
The query I'm running is:
SELECT DATE,ITEM,SUM(QUANTITY)
FROM SALES
WHERE DATE BETWEEN "DATE1" AND "DATE2";
The problem is that I don't need the DATE to appear. If I look for the sales of October, the result should show the sum for October without showing the date. Thank you very much for your help.
Example:
What i get...
DATE ITEM SALES
2012-06-12 14152 7
2012-06-14 14152 15
2012-06-16 14157 25
What i need: query between 06-12 and 06-16
ITEM SALES
14152 22
14157 25
Thank you very much
If you want the sum by month, you can include that in the group by expression. Here is one way:
SELECT extract(year from DATE) as yr, extract(month from date) as mon, ITEM, SUM(QUANTITY)
FROM SALES
WHERE DATE BETWEEN "DATE1" AND "DATE2"
group by extract(year from DATE), extract(month from date), ITEM
order by 1, 2
Although extract is standard SQL, not all databases support it. For instance, you might use to_char(date, 'YYYY-MM') in Oracle or datepart(month, date) in SQL Server.
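The expected output in the question has one row per item for the whole period, with no date column at all. A minimal sketch of that variant on Python's built-in sqlite3, assuming the columns are named as in the question's table listing (SALES_DATE, SALES_ITEM, SALES_QUANTITY) and using the question's three sample rows; the date bounds are passed as parameters rather than written into the SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table SALES (SALES_DATE text, SALES_ITEM integer, SALES_QUANTITY integer)"
)
# The three sample rows from the question.
conn.executemany(
    "insert into SALES values (?, ?, ?)",
    [("2012-06-12", 14152, 7), ("2012-06-14", 14152, 15), ("2012-06-16", 14157, 25)],
)

# Filter on the date, but group only by item so the date never appears.
totals = conn.execute(
    """
    select SALES_ITEM, sum(SALES_QUANTITY) as SALES
    from SALES
    where SALES_DATE between ? and ?
    group by SALES_ITEM
    order by SALES_ITEM
    """,
    ("2012-06-12", "2012-06-16"),
).fetchall()

print(totals)  # [(14152, 22), (14157, 25)]
```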

SQLCE: How to Count Datepart

I have been struggling with the following for a long time: I want to count datepart values. I use SQL Compact Edition 4.0 and have no idea how to get the following:
select datepart(week, CreateDate) as Week, count(*) from tblOrders
where CreateDate>'12 April 2010' and CreateDate<'25 June 2011'
This does not work obviously, but to give you an idea what I want to get as the result is:
- 2 columns,
one called "week" - that would be a week number
in the second column - how many orders I had per week
Thanks in advance,
Pete
You'll need to add a Group By to make the query syntax correct.
select datepart(week, CreateDate) as Week, count(*)
from tblOrders where CreateDate>'12 April 2010' and CreateDate<'25 June 2011'
group by datepart(week, CreateDate)
Does that help?
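One thing to watch with the date range in the question: it spans more than a year, so the same week number occurs in both 2010 and 2011, and grouping by week alone adds those orders together. The sketch below shows the effect on Python's built-in sqlite3 with invented order dates. `strftime('%W', ...)` is SQLite's week-of-year and only stands in for `datepart(week, ...)`; in SQL CE the analogous change would presumably be to add `datepart(year, CreateDate)` to both the SELECT list and the GROUP BY, which is an assumption not tested here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table tblOrders (CreateDate text)")
# Invented orders: two in one week of April 2010, one a week later,
# and one in the same calendar week of April 2011.
conn.executemany(
    "insert into tblOrders values (?)",
    [("2010-04-13",), ("2010-04-14",), ("2010-04-20",), ("2011-04-13",)],
)

date_filter = "CreateDate > '2010-04-12' and CreateDate < '2011-06-25'"

# Week only: orders from different years can land in the same group.
by_week = conn.execute(
    f"""
    select strftime('%W', CreateDate) as week, count(*) as orders
    from tblOrders
    where {date_filter}
    group by strftime('%W', CreateDate)
    order by week
    """
).fetchall()

# Year and week: each calendar week is counted on its own.
by_year_week = conn.execute(
    f"""
    select strftime('%Y', CreateDate) as year,
           strftime('%W', CreateDate) as week,
           count(*) as orders
    from tblOrders
    where {date_filter}
    group by strftime('%Y', CreateDate), strftime('%W', CreateDate)
    order by year, week
    """
).fetchall()

print(by_week)
print(by_year_week)
```

Whether the week-only grouping is acceptable depends on whether "orders per week" means per calendar week or per week number across years.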