Get Recent Quarters Without Dates - sql

I'm tasked with pulling the data for the four recent quarters. If I was dealing with dates this would be easy, but I'm not sure how to do so when I have a quarters table that looks like this:
| quarter | year |
+---------+------+
| 1 | 2016 |
| 2 | 2016 |
| 3 | 2016 |
...
I know that I can get the current quarter by doing something like this:
SELECT *
FROM quarters
WHERE quarter = (EXTRACT(QUARTER FROM CURRENT_DATE))
AND year = (EXTRACT(YEAR FROM CURRENT_DATE));
However, I'm not sure the best way to get the four most recent quarters. I thought about getting this quarter from last year, and selecting everything since then, but I don't know how to do that with tuples like this. My expected results would be:
| quarter | year |
+---------+------+
| 1 | 2017 |
| 2 | 2017 |
| 3 | 2017 |
| 4 | 2017 |
Keep in mind that they won't always be in the same year; in Q1 2018 this will change.
I've built a SQLFiddle that can be used to tinker with this - http://sqlfiddle.com/#!17/0561a/1

Here is one method:
select quarter, year
from quarters
order by year desc, quarter desc
fetch first 4 rows only;
This assumes that the quarters table only has quarters with data in it (as your sample data suggests). If the table has future quarters as well, then you need to compare the values to the current date:
select quarter, year
from quarters
where year < extract(year from current_date) or
(year = extract(year from current_date) and
quarter <= extract(quarter from current_date)
)
order by year desc, quarter desc
fetch first 4 rows only;
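The where clause above is effectively a lexicographic comparison on (year, quarter). As a rough sketch of the same logic outside SQL, assuming the rows are plain (quarter, year) pairs and using a hypothetical helper name recent_quarters:

```python
from datetime import date

def recent_quarters(rows, today, n=4):
    """Return the n most recent (quarter, year) rows that are not in the future."""
    current = (today.year, (today.month - 1) // 3 + 1)
    # Tuples compare lexicographically: year first, then quarter.
    past = [(q, y) for (q, y) in rows if (y, q) <= current]
    # Newest first, mirroring "order by year desc, quarter desc".
    return sorted(past, key=lambda r: (r[1], r[0]), reverse=True)[:n]

# Hypothetical table contents, including quarters that are still in the future.
rows = [(q, y) for y in (2016, 2017, 2018) for q in (1, 2, 3, 4)]
print(recent_quarters(rows, date(2018, 2, 15)))
```

Like the SQL, this returns the four newest rows that exist, so a gap in the table would pull in an older quarter.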

For the case that there can be gaps, like 2/2017 missing, and one would then want to return only three quarters instead of four, one can turn years and quarters into consecutive numbers by multiplying the year by four and adding the quarters.
select *
from quarters
where year * 4 + quarter
between extract(year from current_date) * 4 + extract(quarter from current_date) - 3
and extract(year from current_date) * 4 + extract(quarter from current_date)
order by year desc, quarter desc;
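The arithmetic behind year * 4 + quarter can be checked with a small sketch. The function names and sample rows here are illustrative assumptions, not part of the original schema:

```python
from datetime import date

def quarter_index(year, quarter):
    """Map (year, quarter) to consecutive integers: Q4 of one year is followed by Q1 of the next."""
    return year * 4 + quarter

def last_four_quarters(rows, today):
    """Keep rows whose quarter falls in the four-quarter window ending at today's quarter."""
    current = quarter_index(today.year, (today.month - 1) // 3 + 1)
    return [(q, y) for (q, y) in rows if current - 3 <= quarter_index(y, q) <= current]

# Q2 2017 is missing on purpose, so only three rows fall inside the window.
rows = [(3, 2016), (4, 2016), (1, 2017), (3, 2017), (4, 2017)]
print(last_four_quarters(rows, date(2017, 11, 1)))
```

Because the window is defined on the index rather than on row count, a missing quarter is simply absent from the result instead of being replaced by an older one.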


BigQuery: Repeat the same calculated value in multiple rows

I'm trying to combine several simple queries into one new table using Google BigQuery. The final table contains the existing revenue data per day (which I can simply draw from another table). I then want to calculate the average revenue per day of the current month and repeat this value for every remaining day until the end of the month. So the final table is updated every day and includes both actual data and forecasted data.
So far, I came up with the following, which generates an error message in combination: Scalar subquery produced more than one element
#This gives me the date, the revenue per day and the info that it's actual data
SELECT
date, sum(revenue), 'ACTUAL' as type from `project.dataset.table` where date >"2020-01-01" and date < current_date() group by date
union distinct
# This shall provide the remaining dates of the current month
SELECT
(select calendar_date FROM `project.dataset.calendar_table` where calendar_date >= current_date() and calendar_date <=DATE_SUB(DATE_TRUNC(DATE_ADD(CURRENT_DATE(), INTERVAL 1 MONTH), MONTH), INTERVAL 1 DAY)),
#This shall provide the average revenue per day so far and write this value for each day of the remaining month
(SELECT avg(revenue_daily) FROM
(select sum(revenue) as revenue_daily from `project.dataset.table` WHERE date > "2020-01-01" and extract(month from date) = extract (month from current_date()) group by date) as average_daily_revenue where calendar >= current_date()),
'FORECAST'
This is how I would like the final data to look:
+------------+------------+----------+
| date | revenue | type |
+------------+------------+----------+
| 01.04.2020 | 100 € | ACTUAL |
| … | 5.000 € | ACTUAL |
| 23.04.2020 | 200 € | ACTUAL |
| 24.04.2020 | 230,43 € | FORECAST |
| 25.04.2020 | 230,43 € | FORECAST |
| 26.04.2020 | 230,43 € | FORECAST |
| 27.04.2020 | 230,43 € | FORECAST |
| 28.04.2020 | 230,43 € | FORECAST |
| 29.04.2020 | 230,43 € | FORECAST |
| 30.04.2020 | 230,43 € | FORECAST |
+------------+------------+----------+
The forecast value is simply the sum of the actual revenue of the month divided by the number of days the month had so far.
Thanks for any hint on how to approach this.
I just figured something out, which creates the data I need. I'll still work on updating this every day automatically. But this is what I got so far:
select date, 'actual' as type, sum(revenue) as revenue
from `project.dataset.revenue`
where date >= "2020-01-01" and date < current_date()
group by date
union distinct
select calendar_date, 'forecast',
  (select avg(revenue_daily)
   from (select sum(revenue) as revenue_daily
         from `project.dataset.revenue`
         where extract(year from date) = extract(year from current_date())
           and extract(month from date) = extract(month from current_date())
         group by date
         order by date) as average_daily_revenue)
from `project.dataset.calendar`
where calendar_date >= current_date()
  and calendar_date <= DATE_SUB(DATE_TRUNC(DATE_ADD(CURRENT_DATE(), INTERVAL 1 MONTH), MONTH), INTERVAL 1 DAY)
order by date
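The shape of the result (actual rows followed by one repeated forecast value) can be sketched outside BigQuery. This is only an illustration under assumptions: daily_revenue is a hypothetical mapping from date to that day's summed revenue, and the average is taken over days that actually have data, which matches avg() over the grouped rows rather than a strict division by calendar days:

```python
from calendar import monthrange
from datetime import date

def actual_and_forecast(daily_revenue, today):
    """Build (date, type, revenue) rows: actuals before today, forecast from today to month end."""
    actual = [(d, "actual", r) for d, r in sorted(daily_revenue.items()) if d < today]
    this_month = [r for d, r in daily_revenue.items()
                  if d < today and (d.year, d.month) == (today.year, today.month)]
    if not this_month:
        return actual  # no data yet this month, so there is nothing to forecast from
    average = sum(this_month) / len(this_month)
    last_day = monthrange(today.year, today.month)[1]
    forecast = [(date(today.year, today.month, day), "forecast", average)
                for day in range(today.day, last_day + 1)]
    return actual + forecast

# Hypothetical sample data.
daily = {date(2020, 3, 31): 50, date(2020, 4, 1): 100, date(2020, 4, 2): 300}
for row in actual_and_forecast(daily, date(2020, 4, 28)):
    print(row)
```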

SQL Query Extract Totals by Month for Multiple Date Fields

I have an Oracle database in which I am trying to query multiple date fields and get totals by month and year as output.
This was my original query. This just gets what I want for the dates I want to input.
SELECT COUNT(*) as Total
FROM Some_Table s
WHERE (s.Start_DATE <= TO_Date ('2019/09/01', 'YYYY/MM/DD'))
AND (s.End_DATE IS NULL OR (s.End_DATE > TO_Date ('2019/08/31', 'YYYY/MM/DD')))
I would like to get an output where it gives me a count by Month and Year. The count would be the number between the Start_DATE (beginning of the month) and the End_DATE (end of the month).
I can't do
Edit: this was an example from another query and has no relation to the query above. I was just trying to provide an example of what I cannot do because I have two separate date fields. The example below was stating my knowledge of extracting month and year from a single date field. Sorry for the confusion.
SELECT extract(year from e.DATE_OCCURRED) as Year
,to_char(e.DATE_OCCURRED, 'MONTH') as Month
,count (*) as totals
because the Start_DATE and End_DATE are two separate fields.
Any help would be appreciated
Edit: Example would be
----------------------------------
| Name | Start_DATE | End_DATE |
----------------------------------
| John | 01/16/2018 | 07/09/2019 |
| Sue | 06/01/2015 | 09/01/2018 |
| Joe | 04/06/2016 | Null |
----------------------------------
I want to know my total number of workers that would have been working by month and year. Would want the output to look like.
------------------------
| Year | Month | Total |
------------------------
| 2016 | Aug | 2 |
| 2018 | May | 3 |
| 2019 | Aug | 2 |
------------------------
So I know I had two workers working in August 2016 and three in May 2018.
Do you want this?
SELECT count(*)
from some_table
where year(e.DATE_OCCURRED) > year(start_date)
and year(e.DATE_OCCURRED) < year(end_date)
and month(e.DATE_OCCURRED) > month(start_date)
and month(e.DATE_OCCURRED) < month(end_date)
note: using month and year functions is generally better when working with dates. If you convert to characters you might find that January comes after February (as an example) since J comes after F in the alphabet.
Are you looking for this?(Hoping that end_date > start_date)
select extract (year from end_dt2)- extract(YEAR from st_dt1) as YearDiff ,
extract (month from end_dt2)- extract (month from st_dt1) as monthDiff from tab;
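With two date fields, counting per month becomes an interval-overlap test: a worker counts for a month if they started before the next month begins and had not already left when the month began. A minimal sketch of that test, using the sample rows from the question; the function name and the exact boundary handling are assumptions:

```python
from datetime import date

def active_in_month(workers, year, month):
    """Count workers whose employment interval overlaps the given calendar month."""
    month_start = date(year, month, 1)
    # First day of the following month (rolls over into January after December).
    next_month = date(year + month // 12, month % 12 + 1, 1)
    return sum(
        1 for _name, start, end in workers
        # A missing end date (None, i.e. NULL) means the worker is still active.
        if start < next_month and (end is None or end >= month_start)
    )

workers = [
    ("John", date(2018, 1, 16), date(2019, 7, 9)),
    ("Sue", date(2015, 6, 1), date(2018, 9, 1)),
    ("Joe", date(2016, 4, 6), None),
]
print(active_in_month(workers, 2016, 8), active_in_month(workers, 2018, 5))
```

In SQL the same idea would typically be expressed by joining the table to a generated list of months on this overlap condition and grouping by month.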

How to write a SQL statement to sum data using group by the same day of every two neighboring months

I have a data table like this:
datetime data
-----------------------
...
2017/8/24 6.0
2017/8/25 5.0
...
2017/9/24 6.0
2017/9/25 6.2
...
2017/10/24 8.1
2017/10/25 8.2
I want to write a SQL statement to sum the data using group by the 24th of every two neighboring months in certain range of time such as : from 2017/7/20 to 2017/10/25 as above.
How to write this SQL statement? I'm using SQL Server 2008 R2.
The expected results table is like this:
datetime_range data_sum
------------------------------------
...
2017/8/24~2017/9/24 100.9
2017/9/24~2017/10/24 120.2
...
One conceptual way to proceed here is to redefine a "month" as ending on the 24th of each normal month. Any date occurring after the 24th is assigned to the next month; shifting with DATEADD rather than adding 1 to MONTH() also rolls late-December dates into January of the following year. Then we can aggregate by this shifted month to obtain the sum of data, labelling each group from the 25th of the previous month to the 24th.
WITH cte AS (
    SELECT
        data,
        -- First day of the calendar month in which the row's period ends:
        -- days 25-31 roll forward into the next month (and year, in December).
        DATEADD(month,
                DATEDIFF(month, 0, datetime)
                    + CASE WHEN DAY(datetime) > 24 THEN 1 ELSE 0 END,
                0) AS period_month
    FROM yourTable
)
SELECT
    -- Style 111 formats as yyyy/mm/dd.
    CONVERT(varchar(10), DATEADD(day, 24, DATEADD(month, -1, period_month)), 111) +
    '~' +
    CONVERT(varchar(10), DATEADD(day, 23, period_month), 111) AS datetime_range,
    SUM(data) AS data_sum
FROM cte
GROUP BY
    period_month;
Note that your suggested ranges seem to include the 24th on both ends, which does not make sense from an accounting point of view. I assume that the month includes and ends on the 24th (i.e. the 25th is the first day of the next accounting period).
I would suggest dynamically building some date range rows so that you can then join your data to those for aggregation, like this example:
+----+---------------------+---------------------+----------------+
| | period_start_dt | period_end_dt | your_data_here |
+----+---------------------+---------------------+----------------+
| 1 | 24.04.2017 00:00:00 | 24.05.2017 00:00:00 | 1 |
| 2 | 24.05.2017 00:00:00 | 24.06.2017 00:00:00 | 1 |
| 3 | 24.06.2017 00:00:00 | 24.07.2017 00:00:00 | 1 |
| 4 | 24.07.2017 00:00:00 | 24.08.2017 00:00:00 | 1 |
| 5 | 24.08.2017 00:00:00 | 24.09.2017 00:00:00 | 1 |
| 6 | 24.09.2017 00:00:00 | 24.10.2017 00:00:00 | 1 |
| 7 | 24.10.2017 00:00:00 | 24.11.2017 00:00:00 | 1 |
| 8 | 24.11.2017 00:00:00 | 24.12.2017 00:00:00 | 1 |
| 9 | 24.12.2017 00:00:00 | 24.01.2018 00:00:00 | 1 |
| 10 | 24.01.2018 00:00:00 | 24.02.2018 00:00:00 | 1 |
| 11 | 24.02.2018 00:00:00 | 24.03.2018 00:00:00 | 1 |
| 12 | 24.03.2018 00:00:00 | 24.04.2018 00:00:00 | 1 |
+----+---------------------+---------------------+----------------+
declare @start_dt date;
set @start_dt = '20170424';
select
    r.period_start_dt, r.period_end_dt, sum(1) as your_data_here
from (
    select
        dateadd(month, m.n, seed.start_dt) as period_start_dt
      , dateadd(month, m.n + 1, seed.start_dt) as period_end_dt
    from (
        select @start_dt as start_dt ) seed
    cross join (
        select 0 n union all
        select 1 union all
        select 2 union all
        select 3 union all
        select 4 union all
        select 5 union all
        select 6 union all
        select 7 union all
        select 8 union all
        select 9 union all
        select 10 union all
        select 11
    ) m
) r
-- LEFT JOIN yourdata
--   ON yourdata.date >= r.period_start_dt AND yourdata.date < r.period_end_dt
group by
    r.period_start_dt, r.period_end_dt
Please don't be tempted to use "between" when joining to your data. Follow the note above and use yourdata.date >= r.period_start_dt and yourdata.date < r.period_end_dt; otherwise you could double count rows, because between includes both the lower and the upper boundary.
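I think the simplest way is to shift each date back and aggregate by the month of the shifted date. Subtracting 24 days maps the 25th onto the 1st and the following 24th onto the last day of that month (use 23 instead if each period should start on the 24th):
select year(dateadd(day, -24, datetime)) as yr,
       month(dateadd(day, -24, datetime)) as mon,
       sum(data) as data_sum
from t
group by year(dateadd(day, -24, datetime)),
         month(dateadd(day, -24, datetime));
You can format yr and mon to get the dates for the specific ranges, but this does the aggregation (and the yr/mon columns might be sufficient).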
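Which offset makes a day shift line up with calendar months is easy to get wrong, so it is worth checking on a few dates. The sketch below assumes periods run from the 25th through the following 24th, in which case shifting back 24 days puts every period inside exactly one calendar month. The function name and sample rows are illustrative only:

```python
from collections import defaultdict
from datetime import date, timedelta

def sum_by_period(rows, shift_days=24):
    """Sum values per (year, month) of the shifted date."""
    totals = defaultdict(float)
    for day, value in rows:
        shifted = day - timedelta(days=shift_days)
        totals[(shifted.year, shifted.month)] += value
    return dict(totals)

rows = [
    (date(2017, 8, 25), 5.0),   # first day of the Aug 25 to Sep 24 period
    (date(2017, 9, 24), 6.0),   # last day of that same period
    (date(2017, 9, 25), 6.2),   # first day of the next period
    (date(2017, 10, 24), 8.1),
    (date(2017, 12, 25), 1.0),  # period that crosses the year boundary
    (date(2018, 1, 24), 2.0),
]
print(sum_by_period(rows))
```

Each key is the year and month in which the period starts, so the key (2017, 8) stands for 2017/8/25 through 2017/9/24.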
Step 0: Build a calendar table. Every database needs a calendar table eventually to simplify this sort of calculation.
In this table you may have columns such as:
Date (primary key)
Day
Month
Year
Quarter
Half-year (e.g. 1 or 2)
Day of year (1 to 366)
Day of week (numeric or text)
Is weekend (seems redundant now, but is a huge time saver later on)
Fiscal quarter/year (if your company's fiscal year doesn't start on Jan. 1)
Is Holiday
etc.
If your company starts its month on the 24th, then you can add a "Fiscal Month" column that represents that.
Step 1: Join on the calendar table
Step 2: Group by the columns in the calendar table.
Calendar tables sound weird at first, but once you realize that they are in fact tiny even if they span a couple hundred years they quickly become a major asset.
Don't try to cheap out on disk space by using computed columns. You want real columns because they are much faster and can be indexed if necessary. (Though honestly, usually just the PK index is enough for even wide calendar tables.)
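To make the idea concrete, here is a small sketch that generates calendar rows, including a fiscal month that starts on the 24th. The column names and the fiscal_month_start_day parameter are assumptions for illustration; in practice the rows would be loaded once into a real table:

```python
from datetime import date, timedelta

def calendar_rows(start, end, fiscal_month_start_day=24):
    """Yield one dict per day from start to end inclusive."""
    day = start
    while day <= end:
        # Days before the fiscal start day still belong to the previous fiscal month.
        if day.day >= fiscal_month_start_day:
            fiscal_month = (day.year, day.month)
        elif day.month > 1:
            fiscal_month = (day.year, day.month - 1)
        else:
            fiscal_month = (day.year - 1, 12)
        yield {
            "date": day,
            "day": day.day,
            "month": day.month,
            "year": day.year,
            "quarter": (day.month - 1) // 3 + 1,
            "day_of_year": day.timetuple().tm_yday,
            "is_weekend": day.weekday() >= 5,  # Monday is 0, so 5 and 6 are Sat/Sun
            "fiscal_month": fiscal_month,      # (year, month) in which the fiscal month starts
        }
        day += timedelta(days=1)

rows = list(calendar_rows(date(2018, 1, 1), date(2018, 1, 31)))
print(rows[0]["fiscal_month"], rows[23]["fiscal_month"])
```

Grouping by the fiscal_month column then replaces any date arithmetic in the reporting query.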

hive date dim - customize week number

My requirement is to populate a week number against each calendar date. The catch is that the week numbers start on October 1 and end on December 7.
So the week commencing October 1 will be treated as week 1, 7th October as week 2, and so on; the last week number will be populated against December 7. All other dates will have NULL in the week number column. How can I do this in Hive?
with t as (select date '2014-10-23' as dt)
select case
when dt between cast(concat(date_format(dt,'yyyy'),'-10-01') as date)
and cast(concat(date_format(dt,'yyyy'),'-12-07') as date)
then datediff (dt,cast(concat(date_format(dt,'yyyy'),'-10-01') as date)) div 7 + 1
end as week_number
from t
+-------------+
| week_number |
+-------------+
| 4 |
+-------------+
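The same calculation, written out as a sketch in Python to show the arithmetic (the function name is hypothetical): count the days since October 1 of the same year, integer-divide by 7, and add 1, returning nothing outside the October 1 to December 7 window.

```python
from datetime import date

def custom_week_number(dt):
    """Week number counted from October 1; None outside Oct 1 to Dec 7 of the same year."""
    season_start = date(dt.year, 10, 1)
    season_end = date(dt.year, 12, 7)
    if season_start <= dt <= season_end:
        return (dt - season_start).days // 7 + 1
    return None

print(custom_week_number(date(2014, 10, 23)))  # 22 days after Oct 1 -> 22 // 7 + 1 = 4
```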

How do I get a trailing week count for every day over a given period (on Postgres)?

Say I’ve got an events table with just the columns id and occurred (which is just a datetime).
I want to get, for every day in a given period, the number of events in the previous week. So, let’s say the period was Jan 1 through April 1. I’d want the results of this query to look like:
_______________
|count | date |
|------|------|
| 3 | 1/1 |
| 2 | 1/2 |
| 0 | 1/3 |
| 4 | 1/4 |
---------------
Where count is, for that date, the number of events that happened in the week prior. So, the 3 count for 1/1 is how many events happened between Dec 25th and Jan 1.
I could do this easily enough in code:
for (date in 1/1 to 4/1) {
start_date = date - 7 days
db.query('SELECT COUNT(1) FROM events WHERE occurred > start_date AND occurred < date')
}
Unfortunately, this would result in over a hundred separate queries. I’d like to figure out how to do this in one query.
Hmm, you can generate all the dates in the period using generate_series(). Then join in the data and do a rolling sum over the current day and the six days before it:
select dd.dte,
       sum(coalesce(e.cnt, 0)) over (order by dd.dte rows between 6 preceding and current row) as cnt_trailing_7days
from generate_series('2015-01-01'::timestamp, '2015-04-01'::timestamp, '1 day'::interval) dd(dte) left join
     (select date_trunc('day', occurred) as dte, count(*) as cnt
      from events e
      group by date_trunc('day', occurred)
     ) e
     on e.dte = dd.dte
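For comparison, the trailing-week count can be sketched in a single pass outside the database. This is an illustration under assumptions: occurred is a list of datetime values already fetched, the window is the current day plus the six days before it, and the function name is hypothetical:

```python
from collections import Counter
from datetime import date, datetime, timedelta

def trailing_week_counts(occurred, start, end):
    """For each day from start to end, count events on that day and the six days before."""
    per_day = Counter(ts.date() for ts in occurred)
    results = []
    day = start
    while day <= end:
        # Counter returns 0 for days without events, so empty days need no special case.
        window = sum(per_day[day - timedelta(days=k)] for k in range(7))
        results.append((day, window))
        day += timedelta(days=1)
    return results

events = [
    datetime(2014, 12, 30, 10, 0),
    datetime(2015, 1, 1, 9, 0),
    datetime(2015, 1, 1, 15, 0),
    datetime(2015, 1, 9, 8, 0),
]
for row in trailing_week_counts(events, date(2015, 1, 1), date(2015, 1, 9)):
    print(row)
```

Note that this sketch looks back before the start date, so the first days of the period include late-December events. A window over a generated series only sees rows inside the series, so a query would need the series to begin six days earlier to give the same numbers for those first days.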