Netezza Grouping by Week Start (Sunday) AND Month Start - sql

I have a little bit of an unusual question. I'm using Python to write some data to a text file that I then use Tableau to read from and build visualizations. I'm grouping the query results by week in order to reduce the size of the output file. I think the SQL is pretty standard for that type of operation.
SELECT [Date] - EXTRACT(DOW FROM [Date]) + 1
[this gives me the Sunday of the week for any date]
However, I occasionally want to group by months rather than weeks, which is impossible with the current output. What I want is a modification to the query that groups by week EXCEPT when a week overlaps two months. If the week overlaps two months, the results should be split into the part of the week that falls in the first month and the part that falls in the second month. That way, we could use the output to show weekly results OR monthly/quarterly/yearly results simply by grouping the dates within Tableau.
Has anyone tackled a problem like this before?
As an illustration, consider the following values.
2016-08-21 1
2016-08-22 1
2016-08-23 1
2016-08-24 1
2016-08-25 1
2016-08-26 1
2016-08-27 1
2016-08-28 1
2016-08-29 1
2016-08-30 1
2016-08-31 1
2016-09-01 1
2016-09-02 1
2016-09-03 1
2016-09-04 1
... ...
I would like the code to output the following values:
2016-08-21 7
2016-08-28 4
2016-09-01 3
2016-09-04 1...
Would really appreciate any help!

Based on googled Netezza syntax, this could work:
select
min([Date]) as MinDate, count(*) as TotalDays
from YourTable
group by
extract(year from [Date]),
extract(month from [Date]),
(case
when extract(dow from [Date]) = 1 -- dow 1 is sunday
then extract(week from [Date]) + 1 -- week starts on monday
else extract(week from [Date])
end);
Or, as suggested in the comments, group on the Sunday:
select
min([Date]) as MinDate, count(*) as TotalDays
from YourTable
group by
([Date] - (extract(dow from [Date]) - 1));

Here's the final code that I used.
CASE
WHEN EXTRACT(MONTH FROM [Date]) <> EXTRACT(MONTH FROM [Date] - EXTRACT(DOW FROM [Date]) + 1)
THEN DATE_TRUNC('month', [Date])
ELSE [Date] - EXTRACT(DOW FROM [Date]) + 1 END
Then I grouped on that field.
The way it works is that it checks if the month of the date is equal to the month of the week start. If it isn't, it returns the first day of the month. If it is, it returns the week start. This code returns the values in the example from the original post.
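Since the data is produced from Python anyway, the same bucketing rule can be sketched there to sanity-check the SQL. This is only an illustrative sketch, not the code from the post: the function name week_or_month_start is made up for the example, and it assumes weeks start on Sunday as in the query above.

```python
from collections import Counter
from datetime import date, timedelta


def week_or_month_start(d):
    """Return the Sunday that starts d's week, or the 1st of d's month
    when that Sunday falls in the previous month."""
    # Python's weekday() is Monday=0 ... Sunday=6, so shift to find the Sunday.
    week_start = d - timedelta(days=(d.weekday() + 1) % 7)
    if week_start.month != d.month:
        return d.replace(day=1)
    return week_start


# The sample values from the question: one row per day, 2016-08-21 .. 2016-09-04.
start = date(2016, 8, 21)
days = [start + timedelta(days=i) for i in range(15)]
totals = Counter(week_or_month_start(d) for d in days)

for bucket in sorted(totals):
    print(bucket, totals[bucket])
```

Grouping on the returned date gives one row per week, with an extra row wherever a month boundary cuts a week in two.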

Related

Get count of day types between two dates

I am trying to get the count of each weekday between two dates, and I have not found a solution for this in BigQuery standard SQL. I tried the BigQuery date function DATE_DIFF(date_expression_a, date_expression_b, date_part) following published examples, but it did not give the result I need.
For example, I have two dates 2021-02-13 and 2021-03-31 and my desired outcome would be:
MON | TUE | WED | THUR | FRI | SAT | SUN
6 | 6 | 6 | 6 | 7 | 7 | 7
Consider the approach below:
with your_table as (
select date
from unnest(generate_date_array("2021-02-13", "2021-03-30")) AS date
)
select * from your_table
pivot (count(*) for format_date('%a', date) in ('Mon','Tue','Wed','Thu','Fri','Sat','Sun'))
Or you can just simply do
select
format_date('%a', date) day_of_week,
count(*) counts
from your_table
group by day_of_week
You can do the following:
SELECT
-- In BigQuery, DAYOFWEEK returns 1 for Sunday through 7 for Saturday
CASE EXTRACT(DAYOFWEEK FROM dates)
WHEN 1 THEN 'SUN'
WHEN 2 THEN 'MON'
WHEN 3 THEN 'TUE'
WHEN 4 THEN 'WED'
WHEN 5 THEN 'THU'
WHEN 6 THEN 'FRI'
WHEN 7 THEN 'SAT'
END AS day_of_week,
COUNT(*) AS day_count
FROM
UNNEST(GENERATE_DATE_ARRAY("2021-02-13", "2021-03-30")) AS dates
GROUP BY 1
The important part is the GENERATE_DATE_ARRAY function, which returns all the dates between the two dates you're interested in. UNNEST then returns one row for each date (instead of one row holding the array of all dates).
From there, you can extract the day of the week with the BigQuery date functions and count the number of occurrences with a GROUP BY day_of_week.
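For a quick cross-check outside BigQuery, the same idea (enumerate every date in the range, then count by weekday) can be sketched in Python. This is only an illustrative sketch: the function name weekday_counts and the DAY_NAMES tuple are made up for the example, and the end date is treated as inclusive, as GENERATE_DATE_ARRAY does.

```python
from collections import Counter
from datetime import date, timedelta

# Indexed by date.weekday(), which is Monday=0 ... Sunday=6.
DAY_NAMES = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def weekday_counts(start, end):
    """Count how many times each weekday occurs from start to end, inclusive."""
    n_days = (end - start).days + 1
    return Counter(
        DAY_NAMES[(start + timedelta(days=i)).weekday()] for i in range(n_days)
    )


counts = weekday_counts(date(2021, 2, 13), date(2021, 3, 30))
for name in DAY_NAMES:
    print(name, counts[name])
```

Using an explicit name table avoids depending on the locale or on which number a given database assigns to Sunday.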

how to count a column by month if the date column has time stamp?

I have two columns in a table:
id date
1 1/1/18 12:55:00 AM
2 1/2/18 01:34:00 AM
3 1/3/18 02:45:00 AM
How do I count the number of IDs per month if the time is appended into the date column?
The output would be:
Count month
3 1
In ANSI SQL, you would use:
select extract(month from date) as month, count(*)
from t
group by extract(month from date);
I think more databases support a month() function rather than extract(), though.
You have to extract the month and count using GROUP BY:
select DATE_PART('month', date) as month, count(id)
from yourtable
group by DATE_PART('month', date)
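The point of both answers is that the time portion does not matter once the month is extracted. A minimal Python sketch of the same step, using the three sample rows from the question, might look like this. The format string is an assumption (month/day/two-digit year with a 12-hour clock), and count_ids_per_month is a name made up for the example.

```python
from collections import Counter
from datetime import datetime

# Sample rows from the question: (id, date-with-time as text).
rows = [
    (1, "1/1/18 12:55:00 AM"),
    (2, "1/2/18 01:34:00 AM"),
    (3, "1/3/18 02:45:00 AM"),
]


def count_ids_per_month(rows, fmt="%m/%d/%y %I:%M:%S %p"):
    """Parse each timestamp and count rows per month number.

    Note: keying on the month alone merges the same month of different
    years, exactly like grouping on extract(month from date) does.
    """
    return Counter(datetime.strptime(ts, fmt).month for _id, ts in rows)


per_month = count_ids_per_month(rows)
for month, n in sorted(per_month.items()):
    print(n, month)
```

If the data spans several years, keying on the (year, month) pair instead keeps them apart.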

Select based on the next business day in a calendar table

I have a calendar table (Calendar_Date, Is_Business_Day) already filled in.
I have already managed to do a SELECT on this basis :
If today is before the 3rd day of the current month, select all the days before this day until the last day of the penultimate month
For example : Today is 2018-05-02, this is my output :
Calendar_Date | Is_Business_Day
2000-01-01 | 0
... |
2018-03-29 | 1
2018-03-30 | 1
2018-03-31 | 0
If today is after the 3rd day of the current month, select all the days before this day until the last day of the last month.
For example : Tomorrow, 2018-05-03 this will be my output :
Calendar_Date | Is_Business_Day
2000-01-01 | 0
... |
2018-04-28 | 0
2018-04-29 | 0
2018-04-30 | 1
This is my query :
SELECT Calendar_Date, Is_Business_Day
FROM Calendar_Table
WHERE (Calendar_Date <= (CASE WHEN DATEPART(day, GETDATE()) >= 3 THEN EOMONTH(DATEADD(MONTH, - 1, GETDATE())) ELSE EOMONTH(DATEADD(MONTH, - 2, GETDATE())) END))
This is working perfectly, but what I would like it to do now is switch after the first business day following the 3rd day of the month, instead of switching after the 3rd day of the month itself.
How can I use the information about business days in my calendar table to do this?
I think the following query should work.
;WITH CTE AS
(
SELECT Calendar_Date, Is_Business_Day
FROM Calendar_Table
WHERE (Calendar_Date <= (CASE WHEN DATEPART(day, GETDATE()) >= 3
THEN EOMONTH(DATEADD(MONTH, - 1, GETDATE()))
ELSE EOMONTH(DATEADD(MONTH, - 2, GETDATE())) END))
)
SELECT * FROM CTE
WHERE Calendar_Date >= (SELECT MIN(Calendar_Date) FROM CTE WHERE Is_Business_Day=1)
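To make the requested switching rule concrete, here is a small Python sketch of one possible reading of it: the cutoff moves forward on the first business day that falls on or after the 3rd of the month. That reading is an assumption, as are the names cutoff_date and end_of_previous_month, and the weekday-based is_business_day callback merely stands in for a lookup of Is_Business_Day in the calendar table.

```python
from datetime import date, timedelta


def end_of_previous_month(d, months_back=1):
    """Last day of the month that lies months_back months before d's month."""
    for _ in range(months_back):
        # The day before the 1st is the last day of the preceding month.
        d = d.replace(day=1) - timedelta(days=1)
    return d


def cutoff_date(today, is_business_day):
    """Upper bound for Calendar_Date under the assumed switching rule."""
    # First business day on or after the 3rd of the current month.
    switch_day = today.replace(day=3)
    while not is_business_day(switch_day):
        switch_day += timedelta(days=1)
    months_back = 1 if today >= switch_day else 2
    return end_of_previous_month(today, months_back)


def weekdays_only(d):
    # Hypothetical stand-in for the calendar table: Monday-Friday are business days.
    return d.weekday() < 5


print(cutoff_date(date(2018, 6, 3), weekdays_only))  # the 3rd is a Sunday
print(cutoff_date(date(2018, 6, 4), weekdays_only))  # first business day after it
```

In SQL the same switch day could be found with a MIN(Calendar_Date) over the business days of the current month from the 3rd onward, and then compared with today's date inside the CASE.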

How do I exclude SQL data that is in current Week of Year

Currently I have some SQL which works well, except that I am continuously having to remove the first line of the results, as it only contains (at this point, being Monday 17th) data for part of the week, and being grouped by this field it is showing 'fake data'
Here is the current code:
SELECT
YEAR(submitted) YEAR,
COUNT(request) Total_Requests,
DATEPART( wk, submitted) WEEK
FROM
wv_external_statistics
WHERE
userid <> 'anonymous'
GROUP BY
YEAR(submitted),
DATEPART( wk, submitted)
Here is some sample data:
YEAR | Total_Requests | WEEK
2017 | 361 | 28
2017 | 2486 | 27
2017 | 2860 | 26
2017 | 4521 | 25
2016 | 2600 | 52
2016 | 3028 | 51
....
As you can see the top row is the current week, and as we are only at the first day of the week the data is not complete, so I want to exclude this row from my results... I just tried the below, and immediately zero rows were found, so I am clearly doing something silly, which I am hoping someone can point out?
DATEPART( wk, submitted) <= DATEPART( wk, submitted)-1
NOTE: I need to keep all the data from 2016. Even though its week numbers are greater than this week's number, that data belongs to a previous year.
Cheers
If you want to remove the current week, why not just put this in the WHERE clause:
Where
Datepart( wk, submitted) != datepart(wk,getdate())
SELECT YEAR(submitted) AS [YEAR]
,COUNT(request) AS Total_Requests
,DATEPART(WEEK, submitted) AS [WEEK]
FROM wv_external_statistics
-- Exclude only rows from the current week of the current year
WHERE userid <> 'anonymous'
AND NOT (DATEPART(WEEK, submitted) = DATEPART(WEEK, GETDATE())
AND YEAR(submitted) = YEAR(GETDATE()))
GROUP BY YEAR(submitted), DATEPART(WEEK, submitted);
You need to add a HAVING condition (admittedly it's long, clumsy, and maybe not optimal):
SELECT
YEAR(submitted) YEAR,
COUNT(request) Total_Requests,
DATEPART( wk, submitted) WEEK
FROM
wv_external_statistics
WHERE
userid <> 'anonymous'
GROUP BY
YEAR(submitted),
DATEPART( wk, submitted)
having year(submitted) * 100 + datepart(wk, submitted) < (select max(year(submitted) * 100 + datepart(wk, submitted)) from wv_external_statistics)
Your original condition (DATEPART( wk, submitted) <= DATEPART( wk, submitted)-1) compares each row's week number with itself minus one, so it is false for every row in the query.
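The underlying idea in these answers is to identify a week by the pair (year, week) rather than by the week number alone, so that week 52 of 2016 is never confused with the current year. A rough Python sketch of that idea follows. It uses ISO week numbering, which is an assumption and does not necessarily match what DATEPART(wk, ...) returns in SQL Server, and weekly_counts_excluding_current is a name made up for the example.

```python
from collections import Counter
from datetime import date


def weekly_counts_excluding_current(dates, today):
    """Count rows per (ISO year, ISO week), dropping the week that contains today."""
    current = tuple(today.isocalendar()[:2])
    counts = Counter(tuple(d.isocalendar()[:2]) for d in dates)
    counts.pop(current, None)  # the partial current week is removed, if present
    return dict(counts)


submitted = [
    date(2016, 12, 28),
    date(2017, 7, 10),
    date(2017, 7, 11),
    date(2017, 7, 17),  # falls in the current, incomplete week
]
result = weekly_counts_excluding_current(submitted, today=date(2017, 7, 17))
for (year, week), total in sorted(result.items(), reverse=True):
    print(year, total, week)
```

Because the key includes the year, older weeks with a higher week number are kept untouched.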

How to get calculations from two rows

I started learning SQL recently and would like to know if it is possible to do the calculations as below.
Basically my table looks like this:
id Date Fill_nbr
1 01/01/2015 30
1 02/05/2015 30
1 03/02/2015 30
1 07/01/2015 30
1 07/26/2015 30
2 03/01/2015 30
....
And I'd like to create a table like this:
id Date Fill_nbr Date_last Gap First
1 01/01/2015 30 01/30/2015 0 1
1 02/05/2015 30 03/04/2015 5 0
1 03/02/2015 30 03/31/2015 0 0
1 07/01/2015 30 07/30/2015 91 1
1 07/26/2015 30 08/24/2015 0 0
2 03/01/2015 30 03/30/2015 0 1
....
The rule for column 'Date_last' is Date_last = Date + Fill_nbr, which is easy to get.
The difficult part for me is the 'Gap' column. The rules are:
Gap = 'Date' - the previous record's 'Date_last'. For example, the gap for the second row is calculated as Gap = 02/05/2015 - 01/30/2015;
Gap = 0 for everyone's first record or when the calculated gap < 0.
The rule for column 'First':
First = 1 for everyone's first record OR when gap > 60. Otherwise, First = 0.
It looks like this question has been all but abandoned, since important details are still missing, but I thought it would be interesting to at least find a solution. The solution below works for SQL Server 2012 or higher, since it uses LAG.
SELECT
id,
[Date],
Fill_nbr,
(CASE WHEN LAG (DATEADD(DD, Fill_nbr - 1, [Date]), 1, NULL) OVER (
PARTITION BY id ORDER BY [Date]) > [Date] THEN 0 ELSE
COALESCE(DATEDIFF(DD, LAG (DATEADD(DD, Fill_nbr - 1, [Date]), 1, NULL) OVER (
PARTITION BY id ORDER BY [Date]), [Date]) - 1, 0) END) AS Gap,
DATEADD(DD, Fill_nbr - 1, [Date]) AS Date_last,
CASE WHEN DATEPART(DD, [Date]) = 1 THEN 1 ELSE 0 END AS [First]
FROM Records
SQL Fiddle: http://sqlfiddle.com/#!6/a9b68/8
Thanks, Jason! I figured out that the LAG or LEAD functions would work for this problem. Here is my solution, which is similar to yours. Thanks again for your input!
select
id,
date,
fill_nbr,
date + fill_nbr - 1 AS date_last,
LAG(date_last) OVER (PARTITION BY id ORDER BY id, date) LagV,
date - LagV - 1 as gap,
ROW_NUMBER() OVER(PARTITION BY id ORDER BY id, date) AS rk,
CASE
WHEN (gap>30 or rk=1) then '1'
ELSE '0'
END AS first
FROM table;
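For readers who want to check the LAG logic by hand, here is a small Python sketch of the same calculation over the sample rows. It is an illustration under stated assumptions, not either poster's code: the function name add_gap_columns is made up, the inclusive end date (Date + Fill_nbr - 1) and the extra "- 1" in the gap are inferred from the sample output and from both queries above, and the threshold of 60 comes from the rule stated in the question (the final query above uses 30).

```python
from datetime import date, timedelta
from itertools import groupby


def add_gap_columns(rows, first_threshold=60):
    """rows: (id, fill_date, fill_nbr) tuples.
    Returns (id, fill_date, fill_nbr, date_last, gap, first) tuples."""
    out = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for _id, group in groupby(ordered, key=lambda r: r[0]):
        prev_last = None  # plays the role of LAG(date_last) within one id
        for rid, d, n in group:
            date_last = d + timedelta(days=n - 1)
            if prev_last is None:
                gap, first = 0, 1  # first record for this id
            else:
                gap = max((d - prev_last).days - 1, 0)  # negative gaps become 0
                first = 1 if gap > first_threshold else 0
            out.append((rid, d, n, date_last, gap, first))
            prev_last = date_last
    return out


sample = [
    (1, date(2015, 1, 1), 30),
    (1, date(2015, 2, 5), 30),
    (1, date(2015, 3, 2), 30),
    (1, date(2015, 7, 1), 30),
    (1, date(2015, 7, 26), 30),
    (2, date(2015, 3, 1), 30),
]
result = add_gap_columns(sample)
for row in result:
    print(*row)
```

The Gap and First values produced this way agree with the desired table in the question.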