I am trying to get a date range from the current date back three years.
The start of that three-year range should fall on January 1.
Below is the code snippet that I have tried.
from pyspark.sql import functions as F
dateDF = spark.sql("select current_date() as current_date, add_months(current_date(),-36) as end_date")
dateDF = dateDF.withColumn("end_date_first_date", F.trunc("end_date", "month")).withColumn("end_date_first_date_first_month", F.lit(''))
dateDF.show()
+------------+----------+-------------------+-------------------------------+
|current_date| end_date|end_date_first_date|end_date_first_date_first_month|
+------------+----------+-------------------+-------------------------------+
| 2021-04-09|2018-04-09| 2018-04-01| |
+------------+----------+-------------------+-------------------------------+
Here I was able to get the first day of the month, but how can I get the first month of the year? Is there a predefined function for this?
Expected output
+------------+----------+-------------------+-------------------------------+
|current_date| end_date|end_date_first_date|end_date_first_date_first_month|
+------------+----------+-------------------+-------------------------------+
| 2021-04-09|2018-04-09| 2018-04-01| 2018-01-01 |
+------------+----------+-------------------+-------------------------------+
Just use year instead of month in F.trunc:
dateDF = dateDF.withColumn(
    "end_date_first_date",
    F.trunc("end_date", "month")
).withColumn(
    "end_date_first_date_first_month",
    F.trunc("end_date", "year")
)
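To sanity-check the expected values outside Spark, the same date arithmetic can be sketched in plain Python. The helpers `add_months` and `trunc` below are hypothetical stand-ins written for this illustration; they only approximate the Spark functions of the same name and are not part of PySpark.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Shift a date by whole months, clamping the day to the target month's length."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def trunc(d, unit):
    """Truncate a date to the first day of its month or of its year."""
    if unit == "month":
        return date(d.year, d.month, 1)
    if unit == "year":
        return date(d.year, 1, 1)
    raise ValueError("unit must be 'month' or 'year'")


current = date(2021, 4, 9)
end_date = add_months(current, -36)
print(end_date, trunc(end_date, "month"), trunc(end_date, "year"))
```

With the sample date from the question, truncating to "year" gives the January 1 value that the expected output asks for.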
Related
I have a table that contains a week stored as a string (year-week) and a number. I want to add the number to the week and get the resulting week.
for example
tableA
week num
2022-1 1
2022-3 3
output
week num new_week
2022-1 1 2022-2
2022-3 3 2022-6
2022-52 2 2023-2
To do this, I converted the week into a date, added the number of weeks to the date, and finally converted the date back to a week. However, I have issues when converting the date back to a week. The SQL below is what I'm using:
CONCAT(YEAR(DATEADD('week', num, date)), WEEK(DATEADD('week', num, date)))
I am not using the calendar year: my weeks begin on the first Friday of each year, so this calculation is incorrect. Is it possible to avoid converting the week into a date and the date back into a week?
I wrote a small JS UDF to do your "week" math. It seems that if December 31 is a Thursday, then that year has 53 weeks. The good thing is that you don't need to convert your "year-week" values to dates.
create or replace function addweeks( spcweek VARCHAR, num VARCHAR ) returns VARCHAR
LANGUAGE JAVASCRIPT
AS
$$
  // Split "YYYY-W" into its year and week parts, then add the offset.
  var year = parseInt(SPCWEEK.substring(0, 4), 10);
  var week = parseInt(SPCWEEK.substring(5), 10) + parseInt(NUM, 10);
  // Rule used here: a year has 53 weeks if December 31 is a Thursday, else 52.
  var weekinyear = (new Date(year, 11, 31).getDay() == 4 ? 53 : 52);
  while (week > weekinyear) {
    week = week - weekinyear;
    // Move to the next year before recomputing its week count.
    year++;
    weekinyear = (new Date(year, 11, 31).getDay() == 4 ? 53 : 52);
  }
  return year + "-" + week;
$$
;
select myweek, num, addweeks( myweek, num) new_week
from mydata;
+---------+-----+----------+
| MYWEEK | NUM | NEW_WEEK |
+---------+-----+----------+
| 2022-1 | 1 | 2022-2 |
| 2022-3 | 3 | 2022-6 |
| 2022-52 | 2 | 2023-2 |
| 2020-52 | 2 | 2021-1 |
+---------+-----+----------+
I think you can correct my logic if there is an error in calculating the total weeks of the year.
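For checking the week arithmetic outside the database, here is a minimal Python sketch of the same rollover logic. The names `weeks_in_year` and `add_weeks` are made up for this illustration, and the 53-week rule (December 31 falling on a Thursday) is carried over from the answer above as an assumption, not verified against the first-Friday calendar in the question.

```python
from datetime import date


def weeks_in_year(year):
    # Assumption from the answer: 53 weeks when December 31 is a Thursday.
    # date.weekday() counts Monday as 0, so Thursday is 3.
    return 53 if date(year, 12, 31).weekday() == 3 else 52


def add_weeks(year_week, num):
    """Add num weeks to a 'YYYY-W' string without converting it to a date."""
    year_text, week_text = year_week.split("-")
    year = int(year_text)
    week = int(week_text) + int(num)
    # Roll over into following years while the week number overflows.
    while week > weeks_in_year(year):
        week -= weeks_in_year(year)
        year += 1
    return f"{year}-{week}"


for sample, offset in [("2022-1", 1), ("2022-3", 3), ("2022-52", 2), ("2020-52", 2)]:
    print(sample, offset, add_weeks(sample, offset))
```

If the real calendar uses a different rule for 53-week years, only `weeks_in_year` would need to change.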
With a bit of string fiddling you could do the calculation like this.
SELECT week, num, CONCAT( SUBSTRING(week FROM 1 for 5), num + SUBSTRING(week FROM INSTR(week, '-')+1))
FROM table;
I'm trying to combine several simple queries into one new table using Google BigQuery. The final table contains the existing revenue data per day (which I can simply draw from another table). I then want to calculate the average revenue per day of the current month and continue this value until the end of the month. So the final table is updated every day and includes actual data and forecasted data.
So far, I came up with the following, which generates an error message in combination: Scalar subquery produced more than one element
#This gives me the date, the revenue per day and the info that it's actual data
SELECT
date, sum(revenue), 'ACTUAL' as type from `project.dataset.table` where date >"2020-01-01" and date < current_date() group by date
union distinct
# This shall provide the remaining dates of the current month
SELECT
(select calendar_date FROM `project.dataset.calendar_table` where calendar_date >= current_date() and calendar_date <=DATE_SUB(DATE_TRUNC(DATE_ADD(CURRENT_DATE(), INTERVAL 1 MONTH), MONTH), INTERVAL 1 DAY)),
#This shall provide the average revenue per day so far and write this value for each day of the remaining month
(SELECT avg(revenue_daily) FROM
(select sum(revenue) as revenue_daily from `project.dataset.table` WHERE date > "2020-01-01" and extract(month from date) = extract (month from current_date()) group by date) as average_daily_revenue where calendar >= current_date()),
'FORECAST'
How I would like the final data to look:
+------------+------------+----------+
| date | revenue | type |
+------------+------------+----------+
| 01.04.2020 | 100 € | ACTUAL |
| … | 5.000 € | ACTUAL |
| 23.04.2020 | 200 € | ACTUAL |
| 24.04.2020 | 230,43 € | FORECAST |
| 25.04.2020 | 230,43 € | FORECAST |
| 26.04.2020 | 230,43 € | FORECAST |
| 27.04.2020 | 230,43 € | FORECAST |
| 28.04.2020 | 230,43 € | FORECAST |
| 29.04.2020 | 230,43 € | FORECAST |
| 30.04.2020 | 230,43 € | FORECAST |
+------------+------------+----------+
The forecast value is simply the sum of the actual revenue of the month divided by the number of days the month had so far.
Thanks for any hint on how to approach this.
I just figured something out, which creates the data I need. I'll still work on updating this every day automatically. But this is what I got so far:
select date, 'actual' as type, sum(revenue) as revenue
from `project.dataset.revenue`
where date >= "2020-01-01" and date < current_date()
group by date
union distinct
select calendar_date, 'forecast',
  (SELECT avg(revenue_daily) FROM
    (select sum(revenue) as revenue_daily
     from `project.dataset.revenue`
     WHERE extract(year from date) = extract(year from current_date())
       and extract(month from date) = extract(month from current_date())
     group by date order by date) as average_daily_revenue),
FROM `project.dataset.calendar`
where calendar_date >= current_date()
  and calendar_date <= DATE_SUB(DATE_TRUNC(DATE_ADD(CURRENT_DATE(), INTERVAL 1 MONTH), MONTH), INTERVAL 1 DAY)
order by date
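The forecast rule itself can be sketched in plain Python to check the numbers independently of BigQuery. The function name `actual_plus_forecast` and the input shape (a dict from date to that day's revenue total) are assumptions made for this illustration. The average is taken over days of the current month before today that actually have revenue.

```python
import calendar
from datetime import date


def actual_plus_forecast(daily_revenue, today):
    """Return (date, type, revenue) rows: actuals before today, forecast through month end."""
    rows = [(d, "actual", r) for d, r in sorted(daily_revenue.items()) if d < today]
    # Daily totals of the current month so far.
    month_values = [
        r for d, r in daily_revenue.items()
        if d.year == today.year and d.month == today.month and d < today
    ]
    if not month_values:
        return rows  # nothing to average yet, so no forecast rows
    forecast = sum(month_values) / len(month_values)
    last_day = calendar.monthrange(today.year, today.month)[1]
    for day in range(today.day, last_day + 1):
        rows.append((date(today.year, today.month, day), "forecast", forecast))
    return rows


sample = {date(2020, 3, 31): 50, date(2020, 4, 1): 100, date(2020, 4, 2): 200}
for row in actual_plus_forecast(sample, date(2020, 4, 28)):
    print(row)
```

Days without revenue rows are not counted as zero here; if they should be, divide by the number of elapsed days instead of the number of days with data.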
I have an Oracle database, and I am trying to query multiple date fields by date and get totals by month and year as output.
This was my original query. It returns what I want for the dates I supply.
SELECT COUNT(*) as Total
FROM Some_Table s
WHERE (s.Start_DATE <= TO_Date ('2019/09/01', 'YYYY/MM/DD'))
AND (s.End_DATE IS NULL OR (s.End_DATE > TO_Date ('2019/08/31', 'YYYY/MM/DD')))
I would like to get an output where it gives me a count by Month and Year. The count would be the number between the Start_DATE (beginning of the month) and the End_DATE (end of the month).
I can't do
Edit: this was an example from another query and has no relation to the query above. I was just trying to provide an example of what I cannot do because I have two separate date fields. The example below was stating my knowledge of extracting month and year from a single date field. Sorry for the confusion.
SELECT extract(year from e.DATE_OCCURRED) as Year
,to_char(e.DATE_OCCURRED, 'MONTH') as Month
,count (*) as totals
because the Start_DATE and End_DATE are two separate fields.
Any help would be appreciated
Edit: Example would be
----------------------------------
| Name | Start_DATE | End_DATE |
----------------------------------
| John | 01/16/2018 | 07/09/2019 |
| Sue | 06/01/2015 | 09/01/2018 |
| Joe | 04/06/2016 | Null |
----------------------------------
I want to know the total number of workers who were working in each month and year. I would like the output to look like this:
------------------------
| Year | Month | Total |
------------------------
| 2016 | Aug | 2 |
| 2018 | May | 3 |
| 2019 | Aug | 2 |
------------------------
So I know I had two workers working in August 2016 and three in May 2018.
Do you want this?
SELECT count(*)
from some_table
where year(e.DATE_OCCURRED) > year(start_date)
and year(e.DATE_OCCURRED) < year(end_date)
and month(e.DATE_OCCURRED) > month(start_date)
and month(e.DATE_OCCURRED) < month(end_date)
Note: using the month and year functions is generally better when working with dates. If you convert to characters, you might find that January sorts after February (for example), since J comes after F in the alphabet.
Are you looking for this? (Assuming that end_date > start_date.)
select extract (year from end_dt2)- extract(YEAR from st_dt1) as YearDiff ,
extract (month from end_dt2)- extract (month from st_dt1) as monthDiff from tab;
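As a way to check the expected totals from the question, the per-month count can be sketched in Python using the sample rows. The function name `active_count` is made up for this illustration, and the boundary conditions copy the original query (start on or before the first day of the following month, end either missing or after the last day of the month), which is an assumption about what "working in a month" means.

```python
import calendar
from datetime import date

# Sample rows from the question: (name, start_date, end_date or None).
WORKERS = [
    ("John", date(2018, 1, 16), date(2019, 7, 9)),
    ("Sue", date(2015, 6, 1), date(2018, 9, 1)),
    ("Joe", date(2016, 4, 6), None),
]


def active_count(workers, year, month):
    """Count workers active in the given month, mirroring the original WHERE clause."""
    last_day = date(year, month, calendar.monthrange(year, month)[1])
    if month == 12:
        first_of_next = date(year + 1, 1, 1)
    else:
        first_of_next = date(year, month + 1, 1)
    return sum(
        1
        for _, start, end in workers
        if start <= first_of_next and (end is None or end > last_day)
    )


for year, month in [(2016, 8), (2018, 5)]:
    print(year, calendar.month_abbr[month], active_count(WORKERS, year, month))
```

In SQL the same idea would need one row per month to test against, for example from a calendar table joined to the workers on these two conditions.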
I'm tasked with pulling the data for the four most recent quarters. If I were dealing with dates this would be easy, but I'm not sure how to do so when I have a quarters table that looks like this:
| quarter | year |
+---------+------+
| 1 | 2016 |
| 2 | 2016 |
| 3 | 2016 |
...
I know that I can get the current quarter by doing something like this:
SELECT *
FROM quarters
WHERE quarter = (EXTRACT(QUARTER FROM CURRENT_DATE))
AND year = (EXTRACT(YEAR FROM CURRENT_DATE));
However, I'm not sure the best way to get the four most recent quarters. I thought about getting this quarter from last year, and selecting everything since then, but I don't know how to do that with tuples like this. My expected results would be:
| quarter | year |
+---------+------+
| 1 | 2017 |
| 2 | 2017 |
| 3 | 2017 |
| 4 | 2017 |
Keep in mind they won't always be in the same year: in Q1 2018 this will change.
I've built a SQLFiddle that can be used to tinker with this - http://sqlfiddle.com/#!17/0561a/1
Here is one method:
select quarter, year
from quarters
order by year desc, quarter desc
fetch first 4 rows only;
This assumes that the quarters table only has quarters with data in it (as your sample data suggests). If the table has future quarters as well, then you need to compare the values to the current date:
select quarter, year
from quarters
where year < extract(year from current_date) or
(year = extract(year from current_date) and
quarter <= extract(quarter from current_date)
)
order by year desc, quarter desc
fetch first 4 rows only;
For the case that there can be gaps, like 2/2017 missing, and one would then want to return only three quarters instead of four, one can turn years and quarters into consecutive numbers by multiplying the year by four and adding the quarters.
select *
from quarters
where year * 4 + quarter
between extract(year from current_date) * 4 + extract(quarter from current_date) - 3
and extract(year from current_date) * 4 + extract(quarter from current_date)
order by year desc, quarter desc;
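The consecutive-number trick (year * 4 + quarter) can be illustrated with a short Python sketch. The function names are hypothetical; the sketch generates the four most recent (quarter, year) pairs for a given date rather than filtering a table.

```python
from datetime import date


def quarter_index(year, quarter):
    # Map (year, quarter) onto consecutive integers, as in the query above.
    return year * 4 + quarter


def last_four_quarters(today):
    """Return the four most recent (quarter, year) pairs, newest first."""
    current = quarter_index(today.year, (today.month - 1) // 3 + 1)
    result = []
    for n in range(current, current - 4, -1):
        # Invert the mapping: quarters run 1..4, so shift by one before dividing.
        year, quarter0 = divmod(n - 1, 4)
        result.append((quarter0 + 1, year))
    return result


print(last_four_quarters(date(2017, 11, 15)))
print(last_four_quarters(date(2018, 2, 1)))
```

Because the index is consecutive across year boundaries, a plain range of four values covers the Q1 2018 case without special handling.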
My requirement is to populate a week number against each calendar date. The catch is that week numbering starts on October 1 and ends on December 7.
So the week commencing October 1 is treated as week 1, October 7 as week 2, and so on; the last week number is populated against December 7. All other dates should have NULL in the week number column. How can I do this in Hive?
with t as (select date '2014-10-23' as dt)
select case
when dt between cast(concat(date_format(dt,'yyyy'),'-10-01') as date)
and cast(concat(date_format(dt,'yyyy'),'-12-07') as date)
then datediff (dt,cast(concat(date_format(dt,'yyyy'),'-10-01') as date)) div 7 + 1
end as week_number
from t
+-------------+
| week_number |
+-------------+
| 4 |
+-------------+
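The same calculation can be sketched in Python to check individual dates. The function name `week_number` is made up for this illustration; it follows the query above, counting 7-day blocks from October 1 and returning None outside the October 1 to December 7 window.

```python
from datetime import date


def week_number(d):
    """Week number counted from October 1 of d's year, or None outside Oct 1 to Dec 7."""
    season_start = date(d.year, 10, 1)
    season_end = date(d.year, 12, 7)
    if season_start <= d <= season_end:
        # Integer division by 7 matches Hive's "div 7" in the query.
        return (d - season_start).days // 7 + 1
    return None


for sample in [date(2014, 10, 1), date(2014, 10, 23), date(2014, 12, 7), date(2014, 9, 30)]:
    print(sample, week_number(sample))
```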