Using sum function with a condition based on a returned value - sql

I have a set of months, each with a number of hours:
DATE HOURS
8/1/2013 3
9/1/2013 8
10/1/2013 2
11/1/2013 4
12/1/2013 1
I need to return, for each month, the sum of hours for that month and every month before it. In the example below, which starts in August, the sum for August is August only; for September, I'd need August + September.
DATE HOURS SUM
8/1/2013 3 3
9/1/2013 8 11
10/1/2013 2 13
11/1/2013 4 17
12/1/2013 1 18
I am not sure how to proceed, since the date condition is different for each line.
If anyone can help on this, it'd be greatly appreciated

You can do this in most SQL dialects using a correlated subquery (or a non-equijoin, but I find the subquery cleaner):
select date, hours,
       (select sum(t2.hours)
        from t t2
        where t2.date <= t.date
       ) as cum
from t;
Many SQL engines also support the cumulative sum function, which would typically look like this:
select date, hours, sum(hours) over (order by date) as cum
from t;
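As a quick check against the sample data in the question, the window-function version can be run on an inline copy of the table. This is only a sketch assuming PostgreSQL syntax: the CTE stands in for the real table t, and the column is called dt here (a hypothetical name) because date is a reserved word in some dialects.

```sql
-- Assumption: PostgreSQL syntax. The CTE stands in for the real table;
-- "dt" is a hypothetical name for the question's DATE column.
with t (dt, hours) as (
    values (date '2013-08-01', 3),
           (date '2013-09-01', 8),
           (date '2013-10-01', 2),
           (date '2013-11-01', 4),
           (date '2013-12-01', 1)
)
select dt,
       hours,
       -- running total of hours up to and including the current row
       sum(hours) over (order by dt) as cum
from t
order by dt;
```

For the five sample rows this should give the running totals 3, 11, 13, 17 and 18, matching the SUM column in the question. If several rows share the same date, the default window frame adds them all at once, which is the same behaviour as the correlated subquery with <=.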

Related

How to LEFT JOIN on ROW_NUM using WITH

Right now I'm in the testing phase of this query, so I'm only testing it on two queries. I'm stuck on the final part, where I want to left join everything (this will eventually have to be extended to 12 separate queries). The problem is basically what the title suggests: I want to join 12 queries on the created Row_Num column using WITH, instead of creating 12 separate tables and saving them in a database.
WITH Jan_Table AS
(SELECT ROW_NUMBER() OVER (ORDER BY a.SALE_DATE) as Row_ID, a.SALE_DATE, sum(a.revenue) as Jan_Rev
FROM ba.SALE_TABLE a
WHERE a.SALE_DATE BETWEEN '2015-01-01' and '2015-01-31'
GROUP BY a.SALE_DATE)
SELECT ROW_NUMBER() OVER (ORDER BY a.SALE_DATE) as Row_ID, a.SALE_DATE, sum(a.revenue) as Jun_Rev, j.Jan_Rev
FROM ba.SALE_TABLE a
LEFT JOIN Jan_Table j
on "j.Row_ID" = a.Row_ID
WHERE a.SALE_DATE BETWEEN '2015-06-01' and '2015-06-30'
GROUP BY a.SALE_DATE
And then I get this error message:
ERROR: column "j.Row_ID" does not exist
I put in the "j.Row_ID" because the previous message was:
ERROR: column a.row_id does not exist Hint: Perhaps you meant to
reference the column "j.row_id".
Each query works individually without the JOIN and WITH functions. I have one for every month of the year and want to join 12 of these together eventually.
The output should be a single column with ROW_NUM and 12 Monthly Revenues columns. Each row should be a day of the month. I know not every month has 31 days. So, for example, Feb only has 28 days, meaning I'd want days 29, 30, and 31 as NULLs. The query above still has the dates--but I will remove the "SALE_DATE" column after I can just get these two queries to join.
My initial thought was just to create 12 tables, but I think that'd be a really bad use of space and not the most logical solution to this problem if I were to extend it.
edit
Below are the separate outputs of the two queries above; the third table is what I'm trying to make. I can't give you the raw data. Everything above has been altered from the actual column names and purposes of the data that I'm using. And I don't know how to create a dataset; that's too far above my head in SQL.
Jan_Table (first five lines)
Row_Num Date Jan_Rev
1 2015-01-01 20
2 2015-01-02 20
3 2015-01-03 20
4 2015-01-04 20
5 2015-01-05 20
Jun_Table (first five lines)
Row_Num Date Jun_Rev
1 2015-06-01 30
2 2015-06-02 30
3 2015-06-03 30
4 2015-06-04 30
5 2015-06-05 30
JOINED_TABLE (first five lines)
Row_Num Date Jun_Rev Date Jan_Rev
1 2015-06-01 30 2015-01-01 20
2 2015-06-02 30 2015-01-02 20
3 2015-06-03 30 2015-01-03 20
4 2015-06-04 30 2015-01-04 20
5 2015-06-05 30 2015-01-05 20
It seems like you can just use group by and conditional aggregation for your full query:
select day(sale_date),
       max(case when month(sale_date) = 1 then sale_date end) as jan_date,
       max(case when month(sale_date) = 1 then revenue end) as jan_revenue,
       max(case when month(sale_date) = 2 then sale_date end) as feb_date,
       max(case when month(sale_date) = 2 then revenue end) as feb_revenue,
       . . .
from sale_table s
group by day(sale_date)
order by day(sale_date);
You haven't specified the database you are using. DAY() is a common function to get the day of the month; MONTH() is a common function to get the month of the year. However, those functions might have different names in your database.
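The error messages in the question look like PostgreSQL's, though that is a guess. If so, DAY() and MONTH() are not available and extract() plays the same role. The sketch below assumes PostgreSQL; the inline rows mirror the first lines of the question's output tables, and sum() is used instead of max() on the assumption that there can be several sales per day.

```sql
-- Assumption: PostgreSQL. The CTE is a stand-in for ba.SALE_TABLE,
-- filled with a few rows that mirror the question's sample output.
with sale_table (sale_date, revenue) as (
    values (date '2015-01-01', 20),
           (date '2015-01-02', 20),
           (date '2015-06-01', 30),
           (date '2015-06-02', 30)
)
select extract(day from sale_date) as day_of_month,
       -- one conditional aggregate per month; months without that day yield NULL
       sum(case when extract(month from sale_date) = 1 then revenue end) as jan_rev,
       sum(case when extract(month from sale_date) = 6 then revenue end) as jun_rev
from sale_table
where sale_date between date '2015-01-01' and date '2015-12-31'
group by extract(day from sale_date)
order by day_of_month;
```

The WHERE clause keeps a single year in play; without it, the same day and month from different years would be added together.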

Can I calculate an aggregate duration over multiple rows with a single row per day?

I'm creating an Absence Report for HR. The Absence Data is stored in the database as a single row per day (the columns are EmployeeId, Absence Date, Duration). So if I'm off work from Tuesday 11 February 2020 to Friday 21 February 2020 inclusive, there will be 9 rows in the table:
11 February 2020 - 1 day
12 February 2020 - 1 day
13 February 2020 - 1 day
14 February 2020 - 1 day
17 February 2020 - 1 day
18 February 2020 - 1 day
19 February 2020 - 1 day
20 February 2020 - 1 day
21 February 2020 - 1 day
(see screenshot below)
HR would like to see a single entry in the report for a contiguous period of absence:
My question is: without using a cursor, how can I calculate this in SQL? It's even more complicated because I have to do this using Linq to SQL, but I might be able to swap this out for a stored procedure. Note that the criterion for contiguous data is adjacent working days, EXCLUDING weekends and bank holidays. I hope I've made myself clear; apologies if not.
This is a form of gaps-and-islands. In this case, use lag() to check whether each absence starts the day after the previous one ends, and then a cumulative sum to number the groups:
select employee, min(absent_from), max(absent_to)
from (select t.*,
             sum(case when prev_absent_to = dateadd(day, -1, absent_from) then 0 else 1
                 end) over (partition by employee order by absent_to) as grp
      from (select t.*,
                   lag(absent_to) over (partition by employee order by absent_from) as prev_absent_to
            from t
           ) t
     ) t
group by employee, grp;
If you need to deal with holidays and weekends, then you need a calendar table.
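One way the calendar-table idea could look for the single-row-per-day layout in the question is sketched below. It swaps the lag() form for the row-number-difference variant of gaps-and-islands: each working day gets a consecutive number, and within a contiguous absence that number minus the absence's own row number stays constant. SQL Server syntax is assumed, every table and column name is hypothetical, and the recursive CTE only stands in for a real calendar table (which would also exclude bank holidays).

```sql
-- Assumptions: SQL Server syntax; all table and column names are hypothetical.
with calendar as (
    -- stand-in for a permanent calendar table
    select cast('2020-02-10' as date) as cal_date
    union all
    select dateadd(day, 1, cal_date)
    from calendar
    where cal_date < '2020-02-28'
),
working_days as (
    -- number the working days consecutively, skipping weekends;
    -- a real calendar table would also leave out bank holidays here
    select cal_date,
           row_number() over (order by cal_date) as workday_no
    from calendar
    where datediff(day, '19000101', cal_date) % 7 < 5  -- 0 = Monday ... 4 = Friday
),
absence as (
    -- the nine rows from the question, for one employee
    select v.employee_id, cast(v.absence_date as date) as absence_date
    from (values (1, '2020-02-11'), (1, '2020-02-12'), (1, '2020-02-13'),
                 (1, '2020-02-14'), (1, '2020-02-17'), (1, '2020-02-18'),
                 (1, '2020-02-19'), (1, '2020-02-20'), (1, '2020-02-21')
         ) v (employee_id, absence_date)
)
select a.employee_id,
       min(a.absence_date) as absent_from,
       max(a.absence_date) as absent_to,
       count(*) as working_days_absent
from (select ab.employee_id,
             ab.absence_date,
             -- constant within a run of consecutive working days
             w.workday_no
               - row_number() over (partition by ab.employee_id
                                    order by ab.absence_date) as grp
      from absence ab
      join working_days w on w.cal_date = ab.absence_date
     ) a
group by a.employee_id, a.grp;
```

With the question's sample rows, the weekend of 15 and 16 February has no working-day number, so the two weeks should collapse into a single period from 11 to 21 February. The count assumes each row represents one full day.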

How to convert separate year and month column into a single date and get the difference between two dates in terms of months/days

After joining two tables in Google BigQuery, I ended up with a table that has two sets of year and month in four separate columns. The first year and month pair should form one date and the second pair another date. I need to convert each pair into a single date, and then get the difference between the two dates in months or days.
Example of the table is provided below:
year month year month
0 2013 12 2014 2
1 2014 5 2014 9
2 2015 6 2015 8
If anyone can help code this in BigQuery, it would be really helpful.
Thanks in advance.
#standardSQL
WITH `project.dataset.table` AS (
SELECT 2013 year1, 12 month1, 2014 year2, 2 month2 UNION ALL
SELECT 2014, 5, 2014, 9 UNION ALL
SELECT 2015, 6, 2015, 8
)
SELECT
DATE(year1, month1, 1) date1,
DATE(year2, month2, 1) date2,
DATE_DIFF(DATE(year2, month2, 1), DATE(year1, month1, 1), DAY) diff_in_days
FROM `project.dataset.table`
with result
Row date1 date2 diff_in_days
1 2013-12-01 2014-02-01 62
2 2014-05-01 2014-09-01 123
3 2015-06-01 2015-08-01 61
To get the difference in months, you don't need to convert to dates. Just use arithmetic:
select (year2 * 12 + month2) - (year1 * 12 + month1) as diff_in_months
So you can call DATE(year, month, day) twice, passing the values from each pair of columns and 1 as the day since it doesn't matter, then call DATE_DIFF(date_expression, date_expression, date_part) with the two resulting dates and the date part you want returned. It accepts DAY, WEEK, ISOWEEK, MONTH, QUARTER, YEAR and ISOYEAR.
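To compare the two ways of getting the difference in months, the sketch below runs DATE_DIFF with the MONTH part next to plain arithmetic on the sample rows from the question. The CTE name sample and the column aliases are placeholders.

```sql
#standardSQL
-- "sample" is a placeholder for the joined table; the rows are the question's.
WITH sample AS (
  SELECT 2013 AS year1, 12 AS month1, 2014 AS year2, 2 AS month2 UNION ALL
  SELECT 2014, 5, 2014, 9 UNION ALL
  SELECT 2015, 6, 2015, 8
)
SELECT
  -- number of month boundaries between the two first-of-month dates
  DATE_DIFF(DATE(year2, month2, 1), DATE(year1, month1, 1), MONTH) AS diff_in_months,
  -- same value without building dates
  (year2 * 12 + month2) - (year1 * 12 + month1) AS diff_by_arithmetic
FROM sample
```

Both columns should come out as 2, 4 and 2 for the three rows, since every date is pinned to the first of its month.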

SQL Carryover from previous month

I have some data that I am trying to get some counts on. There are dates for when the record was entered and when it was closed, if it has been closed yet. I want to be able to get a count of how many records were still open from the previous month as of the first of the month. Here is an example. The first table is the data; the second table is the results I am looking for. In the second table, ignore the parentheses; they are just the IDs of the records that make up that count.
Position DateEntered DateClosed
1 12/15/2017 12/20/2017
11 12/20/2017 1/7/2018
2 1/23/2018 2/3/2018
3 1/24/2018
4 2/15/2018
5 2/20/2018 5/16/2018
6 3/3/2018 3/15/2018
7 3/23/2018 4/12/2018
8 4/11/2018 5/10/2018
9 4/12/2018 4/25/2018
10 5/4/2018
Year Month Carried Over
2018 January 1 (11)
2018 February 2 (2,3)
2018 March 3 (3,4,5)
2018 April 4 (3,4,5,7)
2018 May 4 (3,4,5,8)
2018 June 3 (3,4,10)
2018 July 3 (3,4,10)
2018 August 3 (3,4,10)
Is this possible, and if so, how? Been racking my brain on this one for a few hours.
For each month, you want the number of rows that start before that month and end after. I'm thinking:
with dates as (
      select cast('2018-01-01' as date) as dte
      union all
      select dateadd(month, 1, dte)
      from dates
      where dte < '2018-08-01'
     )
select d.dte,
       (select count(*)
        from t
        where t.dateentered < d.dte and
              (t.dateclosed > d.dte or t.dateClosed is null)
       ) as carriedover
from dates d;
Note that this puts the date in a single column, rather than splitting the year and month into separate columns. That is easily arranged, but I prefer to keep date components together.
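If the separate Year and Month columns from the question are wanted, one possible arrangement is sketched below, again assuming SQL Server. The inline copy of the question's data replaces the real table t, and the output aliases are placeholders.

```sql
-- Assumption: SQL Server. The first CTE is an inline copy of the question's data.
with t as (
    select v.position, v.dateentered, v.dateclosed
    from (values ( 1, cast('2017-12-15' as date), cast('2017-12-20' as date)),
                 (11, '2017-12-20', '2018-01-07'),
                 ( 2, '2018-01-23', '2018-02-03'),
                 ( 3, '2018-01-24', null),
                 ( 4, '2018-02-15', null),
                 ( 5, '2018-02-20', '2018-05-16'),
                 ( 6, '2018-03-03', '2018-03-15'),
                 ( 7, '2018-03-23', '2018-04-12'),
                 ( 8, '2018-04-11', '2018-05-10'),
                 ( 9, '2018-04-12', '2018-04-25'),
                 (10, '2018-05-04', null)
         ) v (position, dateentered, dateclosed)
),
dates as (
    -- first day of each month from January to August 2018
    select cast('2018-01-01' as date) as dte
    union all
    select dateadd(month, 1, dte)
    from dates
    where dte < '2018-08-01'
)
select year(d.dte) as yr,
       datename(month, d.dte) as month_name,
       -- records entered before the month started and not yet closed by then
       (select count(*)
        from t
        where t.dateentered < d.dte and
              (t.dateclosed > d.dte or t.dateclosed is null)
       ) as carried_over
from dates d
order by d.dte;
```

On the sample data this should reproduce the counts in the question's second table (1, 2, 3, 4, 4, 3, 3, 3). Note that datename() returns the month name in the session's language.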

SQL - Grouping results by custom 24 hour period

I need to create an Oracle 11g SQL report showing daily productivity: how many units were shipped during a 24 hour period. Each period starts at 6am and finishes at 5:59am the next day.
How could I group the results in such a way as to display this 24 hour period? I've tried grouping by day, but, a day is 00:00 - 23:59 and so the results are inaccurate.
The results will cover the past 2 months.
Many thanks.
group by trunc(your_date - 1/4)
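Days are whole numbers in Oracle, so 6 am is 0.25 of a day. Subtracting 0.25 before truncating puts everything before 6 am into the previous day's group, so:
select
    trunc(your_date - 0.25) as period, count(*) as units_shipped
from your_table
group by trunc(your_date - 0.25);
I haven't got an Oracle instance to try it on at the moment.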
Well, you could group by a calculated date.
So, subtract 6 hours from the dates and group by the truncated result; anything shipped before 6am then counts toward the previous day's period, which produces the correct results.
Assuming that you have a units column or similar on your table, perhaps something like this:
SELECT
    TRUNC(us.shipping_datetime - 0.25) + 0.25 period_start
  , TRUNC(us.shipping_datetime - 0.25) + 1 + (1/24 * 5) + (1/24/60 * 59) period_end
  , SUM(us.units) units
FROM units_shipped us
GROUP BY TRUNC(us.shipping_datetime - 0.25)
ORDER BY 1
This simply subtracts 6 hours (0.25 of a day) from each date. If the time is earlier than 6am, the subtraction will make it fall prior to midnight, and when the resultant value is truncated (time element is removed, the date at midnight is returned), it falls within the grouping for the previous day.
Results:
| PERIOD_START | PERIOD_END | UNITS |
-----------------------------------------------------------------------
| April, 22 2013 06:00:00+0000 | April, 23 2013 05:59:00+0000 | 1 |
| April, 23 2013 06:00:00+0000 | April, 24 2013 05:59:00+0000 | 3 |
| April, 24 2013 06:00:00+0000 | April, 25 2013 05:59:00+0000 | 1 |
The bit of dynamic maths in the SELECT is just to help readability of the results. If you don't have a units column to SUM() up, i.e. each row represents a single unit, then substitute COUNT(*) instead.
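To see how the 6am boundary behaves, the shift can be tried on a few hand-picked timestamps. This is a small sketch in Oracle syntax; the four timestamps are invented purely to sit on either side of the boundary and are not taken from any real table.

```sql
-- Oracle syntax; the timestamps are made-up boundary cases.
SELECT TO_CHAR(ts, 'YYYY-MM-DD HH24:MI')       AS shipped_at,
       -- shift back 6 hours, then drop the time part to get the period's day
       TO_CHAR(TRUNC(ts - 0.25), 'YYYY-MM-DD') AS period_day
FROM (
    SELECT TO_DATE('2013-04-23 05:59', 'YYYY-MM-DD HH24:MI') AS ts FROM dual UNION ALL
    SELECT TO_DATE('2013-04-23 06:00', 'YYYY-MM-DD HH24:MI') FROM dual UNION ALL
    SELECT TO_DATE('2013-04-23 23:30', 'YYYY-MM-DD HH24:MI') FROM dual UNION ALL
    SELECT TO_DATE('2013-04-24 02:15', 'YYYY-MM-DD HH24:MI') FROM dual
)
ORDER BY ts;
```

The 05:59 row should land in the period of 22 April, while the other three, including 02:15 on the 24th, should all land in the period of 23 April.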