I would like to create a monthly summary table that, for each month, sums the values of the rows in an existing table whose start and end dates enclose that month's date.
I've tried with the following SQL code but Snowflake returns this error: SQL compilation error: error line 9 at position 10 invalid identifier 'DATE'.
select
dateadd(month, seq, dt::date) as DATE,
year(DATE) as Y,
month(DATE) as M,
(select
sum(items.total_price_disc_conv)
from BI.MODELS.CONTR_LINES_ACC as items
where DATE between items.subscription_start_date and items.subscription_end_date
) as ARR
from (
select
seq4() as seq,
dateadd(month, 1, '2019-12-01'::date) as dt
from table(generator(rowcount => (12 * 5))
)
);
I think you are hitting this error because you are using an alias (DATE) in the WHERE clause of the subquery. Can you try this one?
select
dateadd(month, seq, dt::date) as DATE,
year(DATE) as Y,
month(DATE) as M,
(select
sum(items.total_price_disc_conv)
from BI.MODELS.CONTR_LINES_ACC as items
where dateadd(month, seq, dt::date) between items.subscription_start_date and items.subscription_end_date
) as ARR
from (
select
seq4() as seq,
dateadd(month, 1, '2019-12-01'::date) as dt
from table(generator(rowcount => (12 * 5))
)
);
Of course, it's better to use an alias such as MY_DATE instead of DATE, since DATE is also the name of a data type.
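To check the numbers such a query should return, the same logic can be sketched outside the database. This is only an illustration in Python, not Snowflake code: the function names `month_starts` and `monthly_arr` and the sample rows are made up, and the tuple layout (start date, end date, price) is an assumption standing in for the columns of CONTR_LINES_ACC.

```python
from datetime import date

def month_starts(first, count):
    """Yield `count` consecutive first-of-month dates starting at `first`."""
    year, month = first.year, first.month
    for _ in range(count):
        yield date(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1

def monthly_arr(items, first, count):
    """For each month start, sum the prices of the items whose
    [start, end] range contains it (inclusive, like SQL BETWEEN)."""
    result = []
    for d in month_starts(first, count):
        total = sum(price for start, end, price in items if start <= d <= end)
        result.append((d.year, d.month, total))
    return result

# Hypothetical rows: (subscription_start_date, subscription_end_date, total_price_disc_conv)
items = [
    (date(2020, 1, 1), date(2020, 3, 31), 100),
    (date(2020, 2, 15), date(2020, 4, 30), 50),
]
summary = monthly_arr(items, date(2020, 1, 1), 4)
for row in summary:
    print(row)
```

One difference to keep in mind: for a month with no matching rows this sketch reports 0, whereas SUM in the SQL subquery returns NULL.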
How can I get the whole row, or another column's value from the same row, for the row that produced the result of a window function in the OVER clause?
For example:
with o as (
select date from unnest(GENERATE_TIMESTAMP_ARRAY('2021-01-01 00:00:00',current_timestamp(),interval 1 hour)) as date
), p as (
select *,RAND()*100 as Number from o
), q as (
select *,max(number) over(order by date) as best from p
order by date
)
select * from q
With the above query, best holds the running maximum of Number over all rows up to the current one when ordered by timestamp.
I calculated the best value using the window function, but I also want the date on which that best value occurred.
Maybe this one?
with o as (
select date from unnest(GENERATE_TIMESTAMP_ARRAY('2021-01-01 00:00:00',current_timestamp(),interval 1 hour)) as date
), p as (
select *,RAND()*100 as Number from o
), q as (
select *,max(number) over(order by date) as best from p
)
select * except(date_new_best), max(date_new_best) over (order by date) as date_best
from (
select *, if(number=best, date, NULL) as date_new_best
from q
)
order by date
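The idea behind this query (remember the date each time the running maximum is set, then carry it forward) can be sketched procedurally. The following Python is an illustration only, not BigQuery code; the function name `running_best` and the sample values are invented for the example.

```python
def running_best(rows):
    """rows: iterable of (date, number) already sorted by date.
    Returns one (date, number, best, date_best) tuple per input row."""
    out = []
    best = None
    date_best = None
    for d, number in rows:
        # A new (or tied) maximum moves date_best to the current row,
        # mirroring if(number = best, date, NULL) followed by a running max.
        if best is None or number >= best:
            best, date_best = number, d
        out.append((d, number, best, date_best))
    return out

# Hypothetical hourly readings
rows = [
    ("2021-01-01 00:00", 10),
    ("2021-01-01 01:00", 40),
    ("2021-01-01 02:00", 25),
    ("2021-01-01 03:00", 70),
]
result = running_best(rows)
for row in result:
    print(row)
```

On ties the sketch reports the later date, as the query above does. Changing `>=` to `>` would keep the earliest date on which the maximum was reached.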
Consider below approach
with o as (
select date from unnest(GENERATE_TIMESTAMP_ARRAY('2021-01-01 00:00:00',current_timestamp(),interval 1 hour)) as date
), p as (
select *,RAND()*100 as Number from o
), q as (
select *,max(number) over(order by date) as best from p
)
select * except(best_date),
last_value(best_date ignore nulls) over(order by date) as best_date
from (
select *, if(best = lag(best) over(order by date), null, date) best_date
from q
)
I am trying to write a select statement which detects if a month is not existent and automatically inserts that month with a value 0. It should insert all missing months from the first entry to the last entry.
Example:
My table looks like this:
After the statement it should look like this:
You need a recursive CTE to get all the years in the table (and the missing ones if any) and another one to get all the month numbers 1-12.
The CROSS JOIN of these CTEs is then LEFT JOINed to the table, and the result is filtered so that rows before the first year/month and after the last year/month are left out:
WITH
limits AS (
SELECT MIN(year) min_year, -- min year in the table
MAX(year) max_year, -- max year in the table
MIN(DATEFROMPARTS(year, monthnum, 1)) min_date, -- min date in the table
MAX(DATEFROMPARTS(year, monthnum, 1)) max_date -- max date in the table
FROM tablename
),
years(year) AS ( -- recursive CTE to get all the years of the table (and the missing ones if any)
SELECT min_year FROM limits
UNION ALL
SELECT year + 1
FROM years
WHERE year < (SELECT max_year FROM limits)
),
months(monthnum) AS ( -- recursive CTE to get all the month numbers 1-12
SELECT 1
UNION ALL
SELECT monthnum + 1
FROM months
WHERE monthnum < 12
)
SELECT y.year, m.monthnum,
DATENAME(MONTH, DATEFROMPARTS(y.year, m.monthnum, 1)) month,
COALESCE(value, 0) value
FROM months m CROSS JOIN years y
LEFT JOIN tablename t
ON t.year = y.year AND t.monthnum = m.monthnum
WHERE DATEFROMPARTS(y.year, m.monthnum, 1)
BETWEEN (SELECT min_date FROM limits) AND (SELECT max_date FROM limits)
ORDER BY y.year, m.monthnum
See the demo.
You should not be storing date components in two separate columns; instead, you should have just one column, with a proper date-like datatype.
One approach is to use a recursive query to generate all starts of month between the earliest and latest date in the table, then bring in the table with a left join.
In SQL Server:
with cte as (
select min(datefromparts(year, monthnum, 1)) as dt,
max(datefromparts(year, monthnum, 1)) as dt_max
from mytable
union all
select dateadd(month, 1, dt), dt_max -- carry dt_max along so both branches return two columns
from cte
where dt < dt_max
)
select c.dt, coalesce(t.value, 0) as value
from cte c
left join mytable t on datefromparts(t.year, t.monthnum, 1) = c.dt
If your data spreads over more than 100 months, you need to add option(maxrecursion 0) at the end of the query, because the default recursion limit is 100.
You can extract the date components in the final select if you like:
select
year(c.dt) as yr,
month(c.dt) as monthnum,
datename(month, c.dt) as monthname,
coalesce(t.value, 0) as value
from ...
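Both answers boil down to the same two steps: generate every month between the first and last entry, then look each one up and default to 0. A minimal Python sketch of that logic follows. It is not SQL Server code, and the function name `fill_missing_months` and the sample data are assumptions made for illustration.

```python
def fill_missing_months(rows):
    """rows: dict mapping (year, monthnum) -> value.
    Returns (year, monthnum, value) for every month from the earliest
    to the latest key, using 0 where a month is missing."""
    if not rows:
        return []
    (y, m), last = min(rows), max(rows)
    filled = []
    while (y, m) <= last:
        # rows.get plays the role of LEFT JOIN + COALESCE(value, 0)
        filled.append((y, m, rows.get((y, m), 0)))
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)
    return filled

# Hypothetical table contents
data = {(2019, 11): 5, (2020, 2): 7}
filled = fill_missing_months(data)
for row in filled:
    print(row)
```

The tuple comparison `(y, m) <= last` does the same job as the BETWEEN filter on DATEFROMPARTS in the first answer.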
I have a table named Employees with Columns: PersonID, Name, StartDate. I want to calculate 1) difference in days between the newest and oldest employee and 2) the longest period of time (in days) without any new hires. I have tried to use DATEDIFF, however the dates are in a single column and I'm not sure what other method I should use. Any help would be greatly appreciated
Below is for BigQuery Standard SQL
#standardSQL
SELECT
SUM(days_before_next_hire) AS days_between_newest_and_oldest_employee,
MAX(days_before_next_hire) - 1 AS longest_period_without_new_hire
FROM (
SELECT
DATE_DIFF(
StartDate,
LAG(StartDate) OVER(ORDER BY StartDate),
DAY
) days_before_next_hire
FROM `project.dataset.your_table`
)
You can test and play with the above using dummy data, as in the example below
#standardSQL
WITH `project.dataset.your_table` AS (
SELECT DATE '2019-01-01' StartDate UNION ALL
SELECT '2019-01-03' StartDate UNION ALL
SELECT '2019-01-13' StartDate
)
SELECT
SUM(days_before_next_hire) AS days_between_newest_and_oldest_employee,
MAX(days_before_next_hire) - 1 AS longest_period_without_new_hire
FROM (
SELECT
DATE_DIFF(
StartDate,
LAG(StartDate) OVER(ORDER BY StartDate),
DAY
) days_before_next_hire
FROM `project.dataset.your_table`
)
with result
Row  days_between_newest_and_oldest_employee  longest_period_without_new_hire
1    12                                       9
Note the use of -1 in calculating longest_period_without_new_hire: whether to apply this adjustment depends on how you prefer to count gaps.
1) difference in days between the newest and oldest record
WITH table AS (
SELECT DATE(created_at) date, *
FROM `githubarchive.day.201901*`
WHERE _table_suffix<'2'
AND repo.name = 'google/bazel-common'
AND type='ForkEvent'
)
SELECT DATE_DIFF(MAX(date), MIN(date), DAY) max_minus_min
FROM table
2) the longest period of time (in days) without any new records
WITH table AS (
SELECT DATE(created_at) date, *
FROM `githubarchive.day.201901*`
WHERE _table_suffix<'2'
AND repo.name = 'google/bazel-common'
AND type='ForkEvent'
)
SELECT MAX(diff) max_diff
FROM (
SELECT DATE_DIFF(date, LAG(date) OVER(ORDER BY date), DAY) diff
FROM table
)
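For comparison, both quantities can be computed from a sorted list of dates, which is what LAG over ORDER BY does row by row. This Python sketch is an illustration under assumptions: the function name `hire_stats` is hypothetical and the dates are the same dummy values used in the first answer.

```python
from datetime import date

def hire_stats(start_dates):
    """Return (days between oldest and newest hire, longest gap in days
    between two consecutive hires)."""
    ordered = sorted(start_dates)
    if not ordered:
        return 0, 0
    span = (ordered[-1] - ordered[0]).days
    # Difference between each date and the previous one, like LAG(StartDate)
    gaps = [(b - a).days for a, b in zip(ordered, ordered[1:])]
    longest = max(gaps, default=0)
    return span, longest

span, longest = hire_stats([date(2019, 1, 13), date(2019, 1, 1), date(2019, 1, 3)])
print(span, longest)
```

Here the longest gap is the plain date difference (10 days for the sample data). Subtract 1 if you want to count only the full days strictly between two hires, as the first answer does.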
I have the following table.
DECLARE @TBL_RESULT Table
(
ID varchar(10),
CreateDate DateTime,
PEOPLE_CODE_ID varchar(10),
CONVERSION_DATE DateTime,
CAMPUS varchar(20),
DAYS_TOOK int
);
This table has records from January 1, 2013 to date of all the leads that were received and converted.
I initially needed to find the median time it took to convert leads that arrived in the last 10 weeks, grouped by campus. I was able to do that using the SQL query below:
WITH CTE_RESULT
AS ( SELECT *
FROM @TBL_RESULT
WHERE CreateDate > DATEADD(WEEK, -10, GETDATE())
)
SELECT Campus ,
AVG(DAYS_TOOK) AS MedianTime
FROM ( SELECT CAMPUS ,
Days_Took ,
ROW_NUMBER() OVER ( PARTITION BY Campus ORDER BY Days_Took ASC ) AS AgeRank ,
COUNT(*) OVER ( PARTITION BY CAMPUS ) AS CampusCount
FROM CTE_RESULT
) x
WHERE x.AgeRank IN ( x.CampusCount / 2 + 1, ( x.CampusCount + 1 ) / 2 )
GROUP BY x.Campus
I now need to plot this trend on a graph, i.e. compute the median for each earlier 10-week bucket and plot the medians on a line chart where each line is one campus (grouped by campus).
Is a cursor my only option? With a cursor I would find the leads of the first 10 weeks starting from January 1, run the above SQL query to get the median, push it to a temp table, then move on to the next 10 weeks, and so on.
Or is there anything better I can do?
Without trying to optimise your query, if you need to produce the same result across multiple 10-WEEK periods, you can expand your current (10 week ago to today) ranges to as many ranges as required, threading a PeriodEndDate throughout the query as shown below.
SQL Fiddle
MS SQL Server 2012 Schema Setup:
Query 1:
DECLARE @TBL_RESULT Table
(
ID varchar(10),
CreateDate DateTime,
PEOPLE_CODE_ID varchar(10),
CONVERSION_DATE DateTime,
CAMPUS varchar(20),
DAYS_TOOK int
);
-- fill the table with some dummy data from 2013-01-01
INSERT @TBL_RESULT (CreateDate, Campus, Days_Took)
SELECT DATEADD(D, A.Number, '20130101'), 'Campus' + Right(B.Number, 10),
ABS(CAST(NEWID() AS binary(6)) % 130) + 1
FROM master..spt_values A
JOIN master..spt_values B on B.type='P' and B.number < 50 -- 50 campuses
WHERE A.type='P'
AND DATEADD(D, A.Number, '20130101') <= GetDate();
-- This first CTE is used to create the required number of 10-week periods
WITH N(NUMBER) AS (
SELECT 0
union all
select number+1 from N
where Number <= DATEDIFF(WEEK, '20130101', GETDATE())
),
-- and from below here it's your query with the PeriodEndDate threaded through
CTE_RESULT AS (
SELECT DATEADD(WEEK, -Number, GETDATE()) PeriodEndDate,
T.*
FROM @TBL_RESULT T
CROSS JOIN N
-- you see the range built up dynamically here
WHERE CreateDate > DATEADD(WEEK, -Number-10, GETDATE())
AND CreateDate < DATEADD(WEEK, -Number, GETDATE()) +1
)
SELECT PeriodEndDate, Campus ,
AVG(DAYS_TOOK) AS MedianTime
FROM (
SELECT PeriodEndDate, CAMPUS ,
Days_Took ,
ROW_NUMBER() OVER ( PARTITION BY PeriodEndDate, Campus ORDER BY Days_Took ASC ) AS AgeRank ,
COUNT(*) OVER ( PARTITION BY PeriodEndDate, CAMPUS ) AS CampusCount
FROM CTE_RESULT
) x
WHERE x.AgeRank IN ( x.CampusCount / 2 + 1, ( x.CampusCount + 1 ) / 2 )
GROUP BY x.PeriodEndDate, x.Campus
ORDER BY x.PeriodEndDate, x.Campus
OPTION (MAXRECURSION 0); -- needed once the N CTE recurses more than 100 times (the default limit)
It seems that you solved the hard part of the problem.
To get what you want, you need to introduce a grouping variable. In this case, I measure the number of weeks in the past and divide by 10 (SQL Server does integer division so this produces an integer).
You then use this in the PARTITION BY and GROUP BY clauses:
WITH CTE_RESULT AS (
SELECT t.*,
DATEDIFF(week, CreateDate, GETDATE()) / 10 as groupnum
FROM @TBL_RESULT t
)
SELECT Campus, groupnum, MIN(CreateDate), MAX(CreateDate),
AVG(DAYS_TOOK) AS MedianTime
FROM (SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY groupnum, Campus ORDER BY Days_Took ASC ) AS AgeRank ,
COUNT(*) OVER (PARTITION BY groupnum, CAMPUS) AS CampusCount
FROM CTE_RESULT t
) x
WHERE x.AgeRank IN ( x.CampusCount / 2 + 1, ( x.CampusCount + 1 ) / 2 )
GROUP BY x.Campus, groupnum
I haven't tested this, so it might have a syntax error or two.
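The grouping-variable idea can be tried out on a handful of rows before running it against the real table. The Python sketch below is an approximation under stated assumptions, not a port of the query: `median_by_bucket` is a hypothetical name, the rows are invented, and weeks are counted as whole 7-day spans, whereas SQL Server's DATEDIFF(week, ...) counts week boundaries crossed, so rows near a boundary can land in a different bucket.

```python
from collections import defaultdict
from datetime import date
from statistics import median

def median_by_bucket(rows, today, weeks_per_bucket=10):
    """rows: iterable of (create_date, campus, days_took).
    Returns {(bucket_number, campus): median days_took}, where bucket 0
    is the most recent `weeks_per_bucket` weeks before `today`."""
    groups = defaultdict(list)
    for created, campus, days_took in rows:
        weeks_ago = (today - created).days // 7          # whole weeks in the past
        bucket = weeks_ago // weeks_per_bucket           # integer division, as in the SQL
        groups[(bucket, campus)].append(days_took)
    return {key: median(values) for key, values in sorted(groups.items())}

# Hypothetical leads
today = date(2024, 1, 1)
rows = [
    (date(2023, 12, 25), "A", 4),
    (date(2023, 12, 18), "A", 10),
    (date(2023, 12, 11), "A", 6),
    (date(2023, 10, 2), "A", 20),
    (date(2023, 12, 25), "B", 3),
    (date(2023, 12, 18), "B", 5),
]
medians = median_by_bucket(rows, today)
for key, value in medians.items():
    print(key, value)
```

For a group with an even number of rows, `statistics.median` averages the two middle values as a float. In the SQL versions, AVG over an int column returns an int, so the same case is truncated unless DAYS_TOOK is cast to a decimal type first.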
I'm currently looking for a SQL solution for the following problem:
SQLFiddle as guidance:
I have a list of not-nullable startdates and nullable enddates. Based on this list I need the total gap time between a given start and enddate.
Based on the SQLFiddle
If I would only have situation 1 in my database the result should be 2 days.
If I would have situation 2 and 3 in my database the result should be 1 day.
I have been pondering this for a couple of days now... any help would be much appreciated!
Regards,
Kyor
Notes: I'm running SQL 2012 ( should any special new features be required )
The best solution is to create a 'Dates' table and start from there; otherwise the solution will be unmaintainable. For each date in the specified range, you can check whether it is covered by any range in the 'dateranges' table and count the dates that are not.
Something like this:
SELECT COUNT(*)
FROM
Dates d
WHERE
d.Date BETWEEN @start AND @end
AND NOT EXISTS
(SELECT *
FROM dateranges r
WHERE d.date BETWEEN r.startdate and ISNULL(r.enddate, d.date)
)
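The same check can be sketched without a calendar table by walking the days of the requested period. This Python version is only an illustration: `gap_days` is a hypothetical name, the ranges are made-up sample data rather than the rows from the SQLFiddle, and an end date of None stands for a NULL enddate (an open-ended range).

```python
from datetime import date, timedelta

def gap_days(ranges, start, end):
    """Count the days in [start, end] that are not covered by any
    (startdate, enddate) pair; enddate None means the range is open-ended."""
    count = 0
    day = start
    while day <= end:
        # Same test as: d.date BETWEEN r.startdate AND ISNULL(r.enddate, d.date)
        covered = any(s <= day and (e is None or day <= e) for s, e in ranges)
        if not covered:
            count += 1
        day += timedelta(days=1)
    return count

# Hypothetical ranges
ranges = [
    (date(2013, 1, 1), date(2013, 1, 3)),
    (date(2013, 1, 6), None),
]
uncovered = gap_days(ranges, date(2013, 1, 1), date(2013, 1, 10))
print(uncovered)
```

With these ranges only January 4 and 5 are uncovered. The loop costs one pass over the ranges per day, which is fine for a quick check but is exactly the work a 'Dates' table with an index lets the database do for you.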
CREATE TABLE Dates (
dt DATETIME NOT NULL PRIMARY KEY);
INSERT INTO Dates VALUES('20081204');
INSERT INTO Dates VALUES('20081205');
INSERT INTO Dates VALUES('20090608');
INSERT INTO Dates VALUES('20090609');
-- missing ranges
SELECT DATEADD(DAY, 1, prev) AS start_gap,
DATEADD(DAY, -1, next) AS end_gap,
DATEDIFF(MONTH, DATEADD(DAY, 1, prev),
DATEADD(DAY, -1, next)) AS month_diff
FROM (
SELECT dt AS prev,
(SELECT MIN(dt)
FROM Dates AS B
WHERE B.dt > A.dt) AS next
FROM Dates AS A) AS T
WHERE DATEDIFF(DAY, prev, next) > 1;
-- existing ranges
SELECT MIN(dt) AS start_range,
MAX(dt) AS end_range
FROM (
SELECT dt,
DATEDIFF(DAY, ROW_NUMBER() OVER(ORDER BY dt), dt) AS grp
FROM Dates) AS D
GROUP BY grp;
DROP TABLE Dates;