How to use select, max date and declare in a sql query? - sql

I am trying to use declare and max in variable. This is the query below:
Declare @MAX_BUF Datetime
Set @OpeningStock = (
SELECT @MAX_BUF = MAX(end_date) FROM [IBIS].[buf_stk]
WHERE SUBSTRING(CONVERT(VARCHAR,end_date ,112 ),1,6)<SUBSTRING(CONVERT(VARCHAR,getdate() ,112 ),1,6);
SELECT COUNT(1) AS Opening_Stock
FROM [IBIS].[buf_stk] AS bs(NOLOCK)
WHERE CAST(end_date AS DATE)=@MAX_BUF
)
I am getting syntax error in '=' and '(end_date)'. Please let me know if this can be resolved.

Without seeing your table and data I can't be confident in rewriting your SQL correctly, however all you need to do to "make it work", I suspect, is amend slightly to:
SELECT @MAX_BUF = MAX(end_date) FROM [IBIS].[buf_stk]
WHERE SUBSTRING(CONVERT(VARCHAR,end_date ,112 ),1,6)<SUBSTRING(CONVERT(VARCHAR,getdate() ,112 ),1,6);
SELECT @OpeningStock=COUNT(*)
FROM [IBIS].[buf_stk] AS bs(NOLOCK) --<< Remove this!
WHERE CAST(end_date AS DATE)=@MAX_BUF
Having said that, you could combine these into a single query. Your manipulation of dates as strings is not sargable and will force the optimizer to scan your table/index.
If you want to find rows where end_date is prior to 1st day of the month then just compare as dates (assuming your end_date is an actual date data type)
where end_date < DATEADD(month, DATEDIFF(month, 0, GetDate()), 0)
And lose the nolock hint, unless you prefer your data to be randomly incorrect.
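To see the difference between the two predicate styles, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server. That substitution is an assumption made only so the example runs anywhere; the index name `ix_end_date` and the ISO-string dates are illustrative. The query plans show the string-manipulating predicate scanning, while the plain range comparison seeks into the index.

```python
import sqlite3

# SQLite stands in for SQL Server here (assumption, for a runnable demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE buf_stk (id INTEGER PRIMARY KEY, end_date TEXT)")
conn.execute("CREATE INDEX ix_end_date ON buf_stk (end_date)")

def plan(sql, params):
    """Return the detail column of the first EXPLAIN QUERY PLAN row."""
    row = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchone()
    return row[3]

# Wrapping the column in a function hides it from the index: every row is read.
plan_string = plan(
    "SELECT COUNT(*) FROM buf_stk WHERE substr(end_date, 1, 7) < ?",
    ("2024-05",),
)

# Comparing the bare column lets the engine seek on the index.
plan_range = plan(
    "SELECT COUNT(*) FROM buf_stk WHERE end_date < ?",
    ("2024-05-01",),
)

print(plan_string)
print(plan_range)
```

The exact plan text differs between SQLite versions, but the first plan should begin with SCAN and the second with SEARCH.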

Use two statements. I'm not a fan of converting dates to strings for date arithmetic. You seem to want the maximum date from before this month. So, DATEDIFF() provides one method (and there are more efficient methods if you have an index on end_date):
Set @MAX_BUF = (SELECT CAST(MAX(end_date) as date)
FROM [IBIS].[buf_stk]
WHERE DATEDIFF(month, end_date, getdate()) >= 1
);
Set @OpeningStock = (SELECT COUNT(1) AS Opening_Stock
FROM [IBIS].[buf_stk] bs
WHERE CAST(end_date AS DATE) = @MAX_BUF
);
You can also do this as a single statement:
select top (1)
@max_buf = CAST(end_date as date),
@OpeningStock = COUNT(*)
from [IBIS].[buf_stk] bs
where end_date < dateadd(day, 1 - day(getdate()), convert(date, getdate()))
group by CAST(end_date as date)
order by CAST(end_date as date) desc;
Note that this also changes the date comparison to be friendlier to the optimizer.

You can actually do this all in one statement and one scan of the table:
SELECT @MAX_BUF = MAX(end_date), @OpeningStock = COUNT(1)
FROM
(SELECT TOP (1) WITH TIES
CAST(end_date AS date) end_date
FROM [IBIS].[buf_stk]
WHERE end_date < DATEADD(month, DATEDIFF(month, 0, GETDATE()), 0)
ORDER BY CAST(end_date AS date) DESC
) t;
Notes:
Don't use NOLOCK, it has many unintended effects and can cause incorrect results
Switch round the WHERE predicate in order to hit an index if you have one. Don't make the server do algebra for you, it's not very good at it.
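The logic shared by the single-statement answers can be sketched in plain Python: keep only rows before the first day of the current month, then take the latest calendar day and the number of rows on it. The function name `opening_stock` and the sample rows are hypothetical; this illustrates the intent, it is not a replacement for the SQL.

```python
from collections import Counter
from datetime import datetime

def opening_stock(end_dates, now):
    """Return (latest day before the current month, number of rows on that day)."""
    month_start = datetime(now.year, now.month, 1)
    # Compare the raw datetime against the boundary, then bucket by calendar day.
    per_day = Counter(d.date() for d in end_dates if d < month_start)
    if not per_day:
        return None, 0
    max_buf = max(per_day)
    return max_buf, per_day[max_buf]

rows = [
    datetime(2024, 4, 29, 9, 0),
    datetime(2024, 4, 30, 8, 0),
    datetime(2024, 4, 30, 17, 30),
    datetime(2024, 5, 2, 12, 0),   # current month, excluded
]
max_buf, stock = opening_stock(rows, now=datetime(2024, 5, 15))
print(max_buf, stock)  # 2024-04-30 2
```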

Related

generate_series() equivalent in snowflake

I'm trying to find the snowflake equivalent of generate_series() (the PostgreSQL syntax).
SELECT generate_series(timestamp '2017-11-01', CURRENT_DATE, '1 day')
Just wanted to expand on Marcin Zukowski's comment to say that these gaps started to show up almost immediately after using a date range generated this way in a JOIN.
We ultimately ended up doing this instead!
select
dateadd(
day,
'-' || row_number() over (order by null),
dateadd(day, '+1', current_date())
) as date
from table (generator(rowcount => 90))
I had a similar problem and found an approach, which avoids the issue of a generator requiring a constant value by using a session variable in addition to the already great answers here. This is closest to the requirement of the OP to my mind.
-- set parameter to be used as generator "constant" including the start day
set num_days = (Select datediff(day, TO_DATE('2017-11-01','YYYY-MM-DD'), current_date()+1));
-- use parameter in bcrowell's answer now
select
dateadd(
day,
'-' || row_number() over (order by null),
dateadd(day, '+1', current_date())
) as date
from table (generator(rowcount => ($num_days)));
-- clean up previously set variable
unset num_days;
WITH RECURSIVE rec_cte AS (
-- start date
SELECT '2017-11-01'::DATE as dt
UNION ALL
SELECT DATEADD('day',1,dt) as dt
FROM rec_cte
-- end date (inclusive)
WHERE dt < current_date()
)
SELECT * FROM rec_cte
Adding this answer for completeness, in case you have an initial and last date:
select -1 + row_number() over(order by 0) i, start_date + i generated_date
from (select '2020-01-01'::date start_date, '2020-01-15'::date end_date)
join table(generator(rowcount => 10000 )) x
qualify i < 1 + end_date - start_date
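For comparison, the behaviour these queries aim for (an inclusive, gap-free run of days between two dates) can be written as a short Python sketch. The function name `generate_series` is borrowed from PostgreSQL purely for illustration.

```python
from datetime import date, timedelta

def generate_series(start, end):
    """Inclusive, gap-free list of days from start to end."""
    n_days = (end - start).days + 1   # same bound as: i < 1 + end_date - start_date
    return [start + timedelta(days=i) for i in range(max(n_days, 0))]

series = generate_series(date(2020, 1, 1), date(2020, 1, 15))
print(len(series), series[0], series[-1])  # 15 2020-01-01 2020-01-15
```

Because each day is derived from a dense counter rather than from a sequence that may skip values, no dates go missing.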
I found the generator function in Snowflake quite limiting for all but the simplest use cases. For example, it was not clear how to take a single row specification, explode it into a table of dates and join it back to the original spec table.
Here is an alternative that uses recursive CTEs.
-- A 2 row table that contains "specs" for a date range
create local temp table date_spec as
select 1 as id, '2022-04-01'::date as start_date, current_date() as end_date
union all
select 2, '2022-03-01', '2032-03-30'
;
with explode_date(id, date, next_date, end_date) as (
select
id
, start_date as date -- start_date is the first date
, date + 1 as next_date -- next_date is the date for the subsequent row in the recursive cte
, end_date
from date_spec
union all
select
ds.id
, ed.next_date -- the current_date is the value of next_date from above
, ed.next_date + 1
, ds.end_date
from date_spec ds
join explode_date ed
on ed.id = ds.id
where ed.date <= ed.end_date -- keep running until you hit the end_date
)
select * from explode_date
order by id, date desc
;
This is how I was able to generate a series of dates in Snowflake. I set the row count to 1095 to get 3 years' worth of dates; you can of course change that to whatever suits your use case.
select
dateadd(day, '-' || seq4(), current_date()) as dte
from
table
(generator(rowcount => 1095))
Originally found here
EDIT: This solution is not correct. seq4 does not guarantee a sequence without gaps. Please follow other answers, not this one. Thanks @Marcin Zukowski for pointing that out.

Optimizing GROUP BY performance

Is there a way to GROUP BY a column that is defined by an alias or is the result of a calculation? I think the following code computes MyMonth twice: once in the SELECT list and again in the GROUP BY clause, which may be wasted work. A plain GROUP BY MyMonth is not accepted. Is it possible to force only one calculation of month([MyDate])?
Update of code. Aggregate function is added.
SELECT month([MyDate]) AS MyMonth, count([MyDate]) AS HowMany
FROM tableA
WHERE [MyDate] BETWEEN '2014-01-01' AND '2014-12-31'
GROUP BY month([MyDate])
ORDER BY MyMonth
Your real problem likely stems from calling MONTH(...) on every row. This prevents the optimizer from using an index to fulfill the count (it can use it for the WHERE clause, but this will still be many rows).
Instead, you should turn this into a range query, that the optimizer could use for comparisons against an index. First we build a simple range table:
WITH Months as (SELECT MONTH(d) AS month,
d AS monthStart, DATEADD(month, 1, d) AS monthEnd
FROM (VALUES(CAST('20140101' AS DATE))) t(d)
UNION ALL
SELECT MONTH(monthEnd),
monthEnd, DATEADD(month, 1, monthEnd)
FROM Months
WHERE monthEnd < CAST('20150101' AS DATE))
(if you have an existing calendar table, you can base your query on that, but sometimes a simple ad-hoc one works best)
Once we have the range-table, you can then use it to constrain and bucket your data, like so:
SELECT Months.month, COUNT(*)
FROM TableA
JOIN Months
ON TableA.MyDate >= Months.monthStart
AND TableA.MyDate < Months.monthEnd
GROUP BY Months.month
Note: The start of the date range was changed to 2014-01-01, as it seems strange that you'd only include one day from January, when aggregating months...
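The bucketing idea can also be sketched in Python: build half-open [start, end) ranges, one per month, and count the rows that fall inside each range. The helper names and sample dates are hypothetical. Unlike the inner join above, this sketch also reports months with zero rows.

```python
from collections import Counter
from datetime import date

def month_ranges(year):
    """Half-open [start, end) ranges, one per month of the given year."""
    starts = [date(year, m, 1) for m in range(1, 13)] + [date(year + 1, 1, 1)]
    return list(zip(starts[:-1], starts[1:]))

def count_by_month(dates, year):
    counts = Counter()
    for start, end in month_ranges(year):
        # Same shape as the JOIN condition: MyDate >= monthStart AND MyDate < monthEnd
        counts[start.month] = sum(1 for d in dates if start <= d < end)
    return counts

sample = [date(2014, 1, 31), date(2014, 2, 1), date(2014, 2, 28), date(2015, 1, 1)]
counts = count_by_month(sample, 2014)
print(counts[1], counts[2], counts[3])  # 1 2 0
```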
No, you can't use column alias directly in the GROUP BY clause. Instead do a select in the from list, and use the result column in your group by.
select MyMonth, MAX(someothercolumn)
from
(
SELECT month([MyDate]) AS MyMonth,
someothercolumn
FROM tableA
WHERE [MyDate] BETWEEN '2014-01-31' AND '2014-12-31'
) AS t
GROUP BY MyMonth
ORDER BY MyMonth

SQL Get the gaps in dateranges when a list of ranges is provided

I'm currently looking for a SQL solution for the following problem:
SQLFiddle as guidance:
I have a list of not-nullable startdates and nullable enddates. Based on this list I need the total gap time between a given start and enddate.
Based on the SQLFiddle
If I would only have situation 1 in my database the result should be 2 days.
If I would have situation 2 and 3 in my database the result should be 1 day.
I have been pondering this for a couple of days now... any help would be much appreciated!
Regards,
Kyor
Notes: I'm running SQL 2012 ( should any special new features be required )
The best solution will be to create 'Dates' table and start from there, otherwise solution will be unmaintainable. For each date in specified range you can check whether it is covered by ranges in 'dateranges' table and get a count of dates that are not.
Something like this:
SELECT COUNT(*)
FROM
Dates d
WHERE
d.Date BETWEEN @start AND @end
AND NOT EXISTS
(SELECT *
FROM dateranges r
WHERE d.date BETWEEN r.startdate and ISNULL(r.enddate, d.date)
)
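As a rough illustration of what the query above computes, here is a Python sketch that walks every day in the window and counts the days not covered by any range. A missing end date is treated as open-ended, mirroring ISNULL(r.enddate, d.date). The function name `gap_days` and the sample ranges are hypothetical.

```python
from datetime import date, timedelta

def gap_days(ranges, start, end):
    """Count days in [start, end] not covered by any (startdate, enddate) range.

    An enddate of None is treated as open-ended.
    """
    gaps = 0
    day = start
    while day <= end:
        covered = any(s <= day and (e is None or day <= e) for s, e in ranges)
        if not covered:
            gaps += 1
        day += timedelta(days=1)
    return gaps

ranges = [(date(2024, 1, 1), date(2024, 1, 3)), (date(2024, 1, 6), None)]
print(gap_days(ranges, date(2024, 1, 1), date(2024, 1, 10)))  # 2
```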
CREATE TABLE Dates (
dt DATETIME NOT NULL PRIMARY KEY);
INSERT INTO Dates VALUES('20081204');
INSERT INTO Dates VALUES('20081205');
INSERT INTO Dates VALUES('20090608');
INSERT INTO Dates VALUES('20090609');
-- missing ranges
SELECT DATEADD(DAY, 1, prev) AS start_gap,
DATEADD(DAY, -1, next) AS end_gap,
DATEDIFF(MONTH, DATEADD(DAY, 1, prev),
DATEADD(DAY, -1, next)) AS month_diff
FROM (
SELECT dt AS prev,
(SELECT MIN(dt)
FROM Dates AS B
WHERE B.dt > A.dt) AS next
FROM Dates AS A) AS T
WHERE DATEDIFF(DAY, prev, next) > 1;
-- existing ranges
SELECT MIN(dt) AS start_range,
MAX(dt) AS end_range
FROM (
SELECT dt,
DATEDIFF(DAY, ROW_NUMBER() OVER(ORDER BY dt), dt) AS grp
FROM Dates) AS D
GROUP BY grp;
DROP TABLE Dates;
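The "existing ranges" query relies on the fact that, for consecutive days, the date minus its row number is constant. A small Python sketch of the same grouping trick, using the four dates inserted above (the function name `islands` is hypothetical):

```python
from datetime import date
from itertools import groupby

def islands(dates):
    """Collapse distinct dates into (start, end) runs of consecutive days."""
    ordered = sorted(set(dates))
    runs = []
    # Consecutive days share the same (ordinal - position) value,
    # which plays the role of the grp column in the SQL.
    for _, group in groupby(enumerate(ordered), key=lambda p: p[1].toordinal() - p[0]):
        days = [d for _, d in group]
        runs.append((days[0], days[-1]))
    return runs

dates = [date(2008, 12, 4), date(2008, 12, 5), date(2009, 6, 8), date(2009, 6, 9)]
for start, end in islands(dates):
    print(start, end)
```

The gaps are then simply the spans between the end of one run and the start of the next.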

SQL Logic Problem & Cross Apply Query

Given a start date and an end date, I need a count of instances between those two dates. So given the following:
Table:
Col 1 Start_Date End_Date
1 01/01/2010 02/01/2010
2 01/01/2010 04/01/2010
3 03/01/2010 04/01/2010
4 03/01/2010 04/01/2010
If I was looking between the 1st (01/01) and the 2nd (02/01) I would expect a count of 2. If I was looking for the 3rd to the 4th I would expect a count of 3. If I was looking across the whole date range then I would expect a count of 4. Make sense?
NOTE: The dates are already converted to midnight, no code needs to be added for this. Also, dates are in dd/MM/yyyy format throughout this question.
Currently I have something similar to the following:
SELECT COUNT(*), Group_Field
FROM MY_Table
WHERE Start_Date < DATEADD(DAY, 1, @StartDate) AND End_Date > @EndDate
GROUP BY Group_Field
I did at some point think that this was right, but I'm not convinced now...
I did previously have:
WITH Dates AS (
SELECT [Date] = @StartDate
UNION ALL SELECT [Date] = DATEADD(DAY, 1, [Date])
FROM Dates WHERE [Date] < @EndDate
)
SELECT COUNT(*), Group_Field -- In this case it is [Date]
FROM MY_Table
CROSS APPLY Dates
WHERE Start_Date < DATEADD(DAY, 1, @StartDate) AND End_Date > [Date]
GROUP BY Group_Field
But I am not sure that I am using CROSS APPLY properly in this case...
The questions:
1) Am I using Cross Apply right in the 2nd example (and the CTE for that matter)?
2) If so, which logic is right? (I think it's the 2nd)
/Discuss :)
If it is supposed to be inclusive, use <= and >=.
I believe either logic will work.
The solution ended up being:
WHERE [Date] BETWEEN Start_Date AND DATEADD(Day, -1, End_Date)
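To check that predicate against the sample table from the question, here is a small Python sketch. It counts, for a given day, the rows whose start date is on or before that day and whose end date minus one day is on or after it. The function name `active_on` is hypothetical.

```python
from datetime import date, timedelta

rows = [  # (id, start_date, end_date) from the question's table (dd/MM/yyyy)
    (1, date(2010, 1, 1), date(2010, 1, 2)),
    (2, date(2010, 1, 1), date(2010, 1, 4)),
    (3, date(2010, 1, 3), date(2010, 1, 4)),
    (4, date(2010, 1, 3), date(2010, 1, 4)),
]

def active_on(day):
    """Rows where day BETWEEN start_date AND end_date - 1 day."""
    return sum(1 for _, s, e in rows if s <= day <= e - timedelta(days=1))

print(active_on(date(2010, 1, 1)))  # 2
print(active_on(date(2010, 1, 3)))  # 3
```

Both results match the counts the question expects for the 1st and the 3rd.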

SQL Checking for NULL and incrementals

I'd like to check if there is anything to return given a number to check against, and if that query returns no entries, increase the number until an entry is reached and display that entry. Currently, the code looks like this :
SELECT *
FROM news
WHERE DATEDIFF(day, date, getdate() ) <= #url.d#
ORDER BY date desc
where #url.d# is an integer being passed through (say 31). If that returns no results, I'd like to increase the number stored in #url.d# by 1 until an entry is found.
This kind of incremental querying is just not efficient. You'll get better results by saying - "I'll never need more than 100 results so give me these" :
SELECT top 100 *
FROM news
ORDER BY date desc
Then filtering further on the client side if you want only a particular day's items (such as the items with a common date as the first item in the result).
Or, you could transform your multiple query request into a two query request:
DECLARE
@theDate datetime,
@theDate2 datetime
SET @theDate = (SELECT Max(date) FROM news)
--trim the time off of @theDate
SET @theDate = DateAdd(dd, DateDiff(dd, 0, @theDate), 0)
SET @theDate2 = DateAdd(dd, 1, @theDate)
SELECT *
FROM news
WHERE @theDate <= date AND date < @theDate2
ORDER BY date desc
In MySQL:
SELECT news.*,
(
SELECT COUNT(*)
FROM news
WHERE date < DATE_SUB(NOW(), INTERVAL #url.d# DAY)
)
FROM news
WHERE date >= DATE_SUB(NOW(), INTERVAL #url.d# DAY)
ORDER BY
date DESC
LIMIT 1
In SQL Server:
SELECT TOP 1
news.*,
(
SELECT COUNT(*)
FROM news
WHERE date < DATEADD(day, -#url.d#, GETDATE())
)
FROM news
WHERE date >= DATEADD(day, -#url.d#, GETDATE())
ORDER BY
date DESC
Note that using this syntax makes your query sargable, that is an index can be used to filter on date efficiently.
First, I think you will probably want to avoid using the DateDiff function in your where clause. Instead, compute the desired cutoff date and do not apply any computations to the date column within the where clause; this will be more efficient. So rather than
WHERE DATEDIFF(day, date, getdate() ) <= #url.d#
you would have something like
WHERE date >= @cutoffDate
where @cutoffDate is a computed date based on #url.d#
Now, as for grabbing the correct cutoff date. My assumption is that under normal circumstances, there will be articles returned from the request; otherwise you would just grab articles from the most recent date. So, the approach that I would take would be to grab the OLDEST of the computed cutoff date (based on #url.d#) and the MOST RECENT article date. Something like
-- @urld == #url.d#
-- compute the cutoff date as the OLDEST of the most recent article and
-- the date based on #url.d#
declare @cutoff datetime
select @cutoff = DateAdd(dd,-1*@urld,GetDate())
select @cutoff
select @cutoff = min(cutoffDate)
from
(SELECT Max(date) as cutoffDate from News
UNION
select @cutoff) Cutoff
-- grab the articles with dates that are more recent than the cutoff date
select *
from News
WHERE date >= @cutoff
I'm also guessing that you would probably want to round to midnight for the dates (which I didn't do here). This is a multi-query approach and should probably be implemented in a single stored procedure ... if this is what you are looking for.
Good luck with the project!
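The cutoff rule described in this answer can be sketched in Python as follows. The function name `select_articles` and the sample dates are hypothetical, and, as in the SQL, nothing is rounded to midnight.

```python
from datetime import datetime, timedelta

def select_articles(article_dates, days_back, now):
    """Return article dates on or after the effective cutoff, newest first."""
    if not article_dates:
        return []
    computed = now - timedelta(days=days_back)
    # Take the OLDEST of the computed cutoff and the most recent article date,
    # so the latest article always qualifies even when nothing is recent.
    cutoff = min(computed, max(article_dates))
    return sorted((d for d in article_dates if d >= cutoff), reverse=True)

now = datetime(2024, 5, 15)
old_only = [datetime(2024, 1, 10), datetime(2024, 2, 20)]
print(select_articles(old_only, 31, now))  # only the 2024-02-20 article
```

When recent articles exist, the computed cutoff is the older value and the function behaves like a plain "last N days" filter.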
If you wanted the one row:
SELECT t.*
FROM NEWS t
WHERE t.id = (SELECT MAX(n.id)
FROM NEWS n
WHERE n.date BETWEEN DATEADD(day, -:url.d, getDate()) AND getDate())
It might not be obvious that the DATEADD is using a negative in order to go back however many number of days desired.
If you wanted all the rows in that date:
SELECT t.*
FROM NEWS t
WHERE t.date BETWEEN DATEADD(day, -:url.d, getDate()) AND getDate()