Difference in years between start date and system time - sql

I have the dates on which employees began working at a zoo, and I need to work out how long each of them has been working there. I have done my research and know what I need to do, but I can't figure out the syntax for using the NOW() function inside the DATEDIFF function. I have to display all active employees and the number of years (to 2 decimal places) they have been working.
I have two columns, Joined (date) and Resigned (date, which is NULL if the employee is still active).
So let's say that someone started working on 1996-09-18 (yyyy-mm-dd).
Please help, thank you kindly.

(Assuming you are using MySQL, judging from the NOW() function.)
I think you need the COALESCE() function, so that either Resigned or the current date is used in the DATEDIFF() calculation:
DATEDIFF(COALESCE(Resigned, CURDATE()), Joined) AS days
So, you could have something like:
(DATEDIFF(COALESCE(Resigned, CURDATE()), Joined)) / 365 AS years
or:
(DATEDIFF(COALESCE(Resigned, CURDATE()), Joined)) / 365.25 AS years
If you want to be extremely accurate about extreme cases and leap years, a more complex calculation will be needed.
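To make the arithmetic concrete, here is a minimal Python sketch of the same calculation, outside the database. The function name `years_worked` and the fixed "today" are assumptions made for illustration; it only mirrors the COALESCE and DATEDIFF steps and rounds to 2 decimal places.

```python
from datetime import date
from typing import Optional


def years_worked(joined: date, resigned: Optional[date], today: date,
                 days_per_year: float = 365.25) -> float:
    """Mirror DATEDIFF(COALESCE(resigned, today), joined) / days_per_year."""
    end = resigned if resigned is not None else today  # COALESCE(Resigned, CURDATE())
    days = (end - joined).days                         # DATEDIFF(end, joined)
    return round(days / days_per_year, 2)


# Hypothetical "today", fixed so that the result is reproducible.
today = date(2024, 9, 18)
print(years_worked(date(1996, 9, 18), None, today))       # divides by 365.25
print(years_worked(date(1996, 9, 18), None, today, 365))  # divides by 365
```

Dividing by 365 slightly overstates long tenures because leap days are counted as extra time, which is why the two calls print different values.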

Years with 2 decimal places (Assuming TSQL)
SELECT ROUND(
CONVERT(FLOAT,
DATEDIFF(day,joined,ISNULL(resigned,GETDATE()) ))/365,2)


SQL - Automatically adjust Where clause to previous month in YYYY-MM format

(This is all steps in containers within an Alteryx flow that is connecting to a Teradata source)
My SQL is incredibly rusty, as it's been almost 8 years since I've needed to use it, and I know this is quite a basic question. I have several queries that need to be adjusted manually every month to shift the month, which is in YYYY-MM format. They look like this:
This is the main one, where I just move the month back by one:
select DB.TABLE.field1, DB.TABLE.Year_month
from DB.TABLE
where DB.TABLE.Year_month = '2023-01'
This is the secondary one, where I move one value back one month, and the others are the same month or one or more months later:
and A.B_MONTH in ('2022-12-01', '2023-01-01', '2023-02-01', '2023-03-01', '2023-04-01','2023-05-01')
and B.Year_month = '2023-01'
How do I write the WHERE clause so that it always uses the right dates relative to the current month?
Any help is greatly appreciated.
I tried using concat, but it choked for some reason.
You can try this (MySQL syntax; the format string has to match the YYYY-MM values stored in the column):
select DB.TABLE.field1, DB.TABLE.Year_month
from DB.TABLE
where DB.TABLE.Year_month = DATE_FORMAT( NOW() - INTERVAL 1 MONTH, '%Y-%m')
I don't understand your second requirement, but you can handle it in a similar way.
Just play with the X in NOW() - INTERVAL X MONTH.
Pretty basic stuff: use ADD_MONTHS to move your month around and TO_CHAR to get your desired format.
To get the previous month:
select to_char(add_months(current_date,-1), 'YYYY-MM')
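As an illustration of the relative-month arithmetic, here is a small Python sketch. The helper names `shift_month` and `year_month` and the sample date are assumptions; it shows how both the single previous-month value and the list of month starts from the second query can be derived from one reference date.

```python
from datetime import date


def shift_month(d: date, months: int) -> date:
    """Return the first day of the month that lies `months` away from d."""
    index = d.year * 12 + (d.month - 1) + months  # months counted from year 0
    return date(index // 12, index % 12 + 1, 1)


def year_month(d: date, months: int = 0) -> str:
    """Format the shifted month as YYYY-MM."""
    return shift_month(d, months).strftime("%Y-%m")


# Hypothetical run date; in a real job this would be date.today().
today = date(2023, 2, 15)
previous = year_month(today, -1)
# Month starts from two months back to three months ahead.
window = [shift_month(today, k).isoformat() for k in range(-2, 4)]
print(previous)
print(window)
```

Working with a single month index avoids special-casing January, where stepping back one month also changes the year.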

How to set a max range condition with timescale time_bucket_gapfill() in order to not fill real missing values?

I'd like some advice on whether what I need to do is achievable with Timescale functions.
I've just found out I can use time_bucket_gapfill() to fill in missing data, which is amazing! I need a data point every 5 minutes, but I can receive data every 10 minutes, 30 minutes or 1 hour. The function fills in the missing points so that I end up with one point every 5 minutes. I also use locf() to set each gapfilled value to the last value found.
My question is: can I set a maximum range for locf(), so that the last value found is never carried forward for more than 1 hour?
Example: if the last value found is more than 1 hour old, I don't want to fill the gap; I need to leave it empty to show that values really are missing there.
I think I'm close with the query below, but apparently I'm not allowed to use locf() twice in the same CASE expression.
ERROR: multiple interpolate/locf function calls per resultset column not supported
Does anybody have an idea how I can resolve this?
How to reproduce:
Create table powers
CREATE table powers (
delivery_point_id BIGINT NOT NULL,
at timestamp NOT NULL,
value BIGINT NOT NULL
);
Create hypertable
SELECT create_hypertable('powers', 'at');
Create indexes
CREATE UNIQUE INDEX idx_dpid_at ON powers(delivery_point_id, at);
CREATE INDEX index_at ON powers(at);
Insert data for one day, for one delivery point, with one point every 10 minutes
INSERT INTO powers SELECT 1, at, round(random()*10000) FROM generate_series(TIMESTAMP '2021-01-01 00:00:00', TIMESTAMP '2021-01-02 00:00:00', INTERVAL '10 minutes') AS at;
Remove three hours of data from 4am to 7am
DELETE FROM powers WHERE delivery_point_id = 1 AND at < '2021-01-01 07:00:00' AND at > '2021-01-01 04:00:00';
The query that needs to be fixed
SELECT
time_bucket_gapfill('5 minutes', at) AS point_five,
avg(value) AS avg,
CASE
WHEN (locf(at) - at) > interval '1 hour' THEN null
ELSE locf(avg(value))
END AS gapfilled
FROM powers
GROUP BY point_five, at
ORDER BY point_five;
Actual: ERROR: multiple interpolate/locf function calls per resultset column not supported
Expected: gapfilled values every 5 minutes, except between 4am and 7am (real missing values).
This is a great question! I'm going to provide a workaround using what's currently available, but I think it'd be great if you opened a GitHub issue as well, because there might be a way to add an option for this that doesn't require a workaround.
I also think your attempt was a good approach that just needs a few tweaks to get right!
The error you're seeing means that we can't have multiple locf calls in a single column. That limitation is easy to work around, as we can just shift both calls into a subquery, but that alone isn't enough. The other thing we need to change is that locf only works on aggregates. Right now you're trying to use it on a column (at) that isn't aggregated, which isn't going to work, because it wouldn't know which of the values of at in a time_bucket to "pull forward" for the gapfill.
You said you want to fill data as long as the previous point wasn't more than one hour ago, so we can take the last value of at in the bucket by using last(at, at). This is the same as max(at), so either aggregate would work. We put that into a CTE (common table expression, or WITH query) and then do the CASE statement outside, like so:
WITH filled as (SELECT
time_bucket_gapfill('5 minutes', at) AS point_five,
avg(value) AS avg,
locf(last(at, at)) as filled_from,
locf(avg(value)) as filled_avg
FROM powers
WHERE at BETWEEN '2021-01-01 01:30:00' AND '2021-01-01 08:30:00'
AND delivery_point_id = 1
GROUP BY point_five
ORDER BY point_five)
SELECT point_five,
avg,
filled_from,
CASE WHEN point_five - filled_from > '1 hour'::interval THEN NULL
ELSE filled_avg
END as gapfilled
FROM filled;
Note that I’ve tried to name my CTE expressively so that it’s a little easier to read!
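For readers who want to see the fill rule in isolation, here is a minimal Python sketch of "carry the last value forward, but only for up to one hour". It is not how Timescale implements gapfill; the function name `gapfill_locf` is hypothetical, and it assumes the samples have already been aligned to bucket start times.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple


def gapfill_locf(
    samples: Dict[datetime, float],
    start: datetime,
    end: datetime,
    step: timedelta = timedelta(minutes=5),
    max_age: timedelta = timedelta(hours=1),
) -> List[Tuple[datetime, Optional[float]]]:
    """Fill every bucket in [start, end) with the last observed value,
    unless that observation is older than max_age (then the bucket stays None)."""
    filled: List[Tuple[datetime, Optional[float]]] = []
    last_time: Optional[datetime] = None
    last_value: Optional[float] = None
    bucket = start
    while bucket < end:
        if bucket in samples:  # a real observation in this bucket
            last_time, last_value = bucket, samples[bucket]
        if last_time is not None and bucket - last_time <= max_age:
            filled.append((bucket, last_value))
        else:
            filled.append((bucket, None))  # real missing data, left empty
        bucket += step
    return filled


# Hypothetical data: one point at midnight, the next one three hours later.
day = datetime(2021, 1, 1)
points = {day: 1.0, day + timedelta(hours=3): 2.0}
for when, value in gapfill_locf(points, day, day + timedelta(hours=3, minutes=10)):
    print(when, value)
```

The comparison matches the CASE expression in the query: a bucket is blanked only when the source point is strictly more than one hour older than the bucket.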
Also, I wanted to point out a couple other hyperfunctions that you might think about using:
heartbeat_agg is a new/experimental one that will help you determine periods when your system is up or down, so if you're expecting points at least every hour, you can use it to find the periods where the delivery point was down or the like.
When you have more irregular sampling or want to deal with different data frequencies from different delivery points, I’d take a look at the time_weight family of functions. They can be more efficient than using something like gapfill to upsample, by instead letting you treat all the different sample rates similarly, without having to create more points and more work to do so. Even if you want to, for instance, compare sums of values, you’d use something like integral to get the time weighted sum over a period based on the locf interpolation.
Anyway, hope all that is helpful!

Teradata - YEARFRAC equivalence

I am having a hard time finding an equivalent of Excel's YEARFRAC for Teradata. I experimented with the query below, but what I want is for it to display the fraction of the year as well: instead of 37, I would want to see 37.033. If possible, I would like it to account for leap years, so I wouldn't want to just divide by 365. Any help would be greatly appreciated!
SELECT (CURRENT_DATE - CAST('1985-05-01' AS DATE)) YEAR
There is no direct function that gives the desired output.
Excel's YEARFRAC uses different logic to calculate the result depending on the optional basis parameter.
Syntax: YEARFRAC(start_date, end_date, [basis])
With the basis parameter set to 0 or omitted, you can achieve it in Teradata using the query below.
SELECT
DATE'2022-05-13' AS Till_Date
,DATE'1985-05-01' AS From_Date
,(Till_Date - From_Date) YEAR TO MONTH AS Year_To_Month
,EXTRACT(YEAR FROM Year_To_Month)
+EXTRACT(MONTH FROM Year_To_Month)*30.0000/360
+( EXTRACT(DAY FROM Till_Date)-EXTRACT(DAY FROM From_Date))*1.0000/360 AS YEARFRAC
With basis 0 or omitted, YEARFRAC uses the 30/360 day-count convention to calculate the difference.
You can find more details about the YEARFRAC logic at the link below.
https://support.microsoft.com/en-us/office/yearfrac-function-3844141e-c76d-4143-82b6-208454ddc6a8
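The 30/360 arithmetic used in the query can be checked with a short Python sketch. The function name `yearfrac_30_360` is an assumption, and the sketch mirrors the query only: it treats every month as 30 days and every year as 360, and it leaves out the end-of-month adjustments that Excel applies, so results can differ for dates falling on the 31st or at the end of February.

```python
from datetime import date


def yearfrac_30_360(start: date, end: date) -> float:
    """Year fraction on a simplified 30/360 basis (no end-of-month adjustments)."""
    days = (
        (end.year - start.year) * 360      # whole years, 360 days each
        + (end.month - start.month) * 30   # whole months, 30 days each
        + (end.day - start.day)            # remaining days
    )
    return days / 360


# The dates used in the query above.
print(round(yearfrac_30_360(date(1985, 5, 1), date(2022, 5, 13)), 3))
```

For those dates the calculation is 37 years plus 12/360 of a year, which is the 37.033 the question asks for.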

Datediff function in TeraData SQL?

I'm new here and it's all a bit confusing, so I apologize in advance if I do something wrong.
I have usually used MySQL, or sometimes Oracle, but now I have to switch to Teradata.
Simply put, I need to convert this:
SELECT FLOOR(DATEDIFF(NOW(),`startdate`)/365.25) AS `years`,
COUNT(FLOOR(DATEDIFF(NOW(),`startdate`)/365.25)) AS `numberofemployees`
FROM `employees`
WHERE 1
GROUP BY `years`
ORDER BY `years`;
into Teradata.
It would be great if someone could help :)
The equivalent of your query in Teradata would be:
SELECT FLOOR((CURRENT_DATE - startdate)/365.2500) AS years,
COUNT(FLOOR((CURRENT_DATE - startdate )/365.2500)) AS numberofemployees
FROM employees
--WHERE 1
GROUP BY years
ORDER BY years;
CURRENT_DATE is equivalent to NOW() without the time part (DATEDIFF would have ignored the time part anyway).
In Teradata you can simply subtract one date from another to get the number of days in between.
Also, I added two zeroes at the end of 365.25 to force Teradata to evaluate the division to 4 decimal places, because MySQL seems to perform it that way (https://dev.mysql.com/doc/refman/8.0/en/arithmetic-functions.html#:~:text=In%20division%20performed%20with%20/%2C%20the%20scale%20of%20the%20result%20when%20using%20two%20exact-value%20operands%20is%20the%20scale%20of%20the%20first%20operand%20plus%20the%20value%20of%20the%20div_precision_increment%20system%20variable%20(which%20is%204%20by%20default).
But I am not sure I understood your original query thoroughly:
What does WHERE 1 do?
Why do you count the years column and call it numberofemployees (why not simply use COUNT(*))?
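To show what the query computes, here is a minimal Python sketch of the same grouping: whole years of service obtained by flooring days / 365.25, then a count per bucket. The function name `tenure_histogram` and the sample dates are made up for illustration.

```python
from collections import Counter
from datetime import date
from math import floor
from typing import Iterable, List, Tuple


def tenure_histogram(start_dates: Iterable[date], today: date) -> List[Tuple[int, int]]:
    """Return (whole years of service, number of employees), sorted by years."""
    counts = Counter(floor((today - d).days / 365.25) for d in start_dates)
    return sorted(counts.items())


# Hypothetical start dates and a fixed "today" for reproducibility.
starts = [date(2020, 1, 1), date(2020, 3, 1), date(2020, 6, 1), date(2015, 3, 1)]
for years, employees in tenure_histogram(starts, date(2023, 1, 1)):
    print(years, employees)
```

Counting the rows in each bucket gives the same numbers as counting the computed years value, since that value is never NULL when startdate is set.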

Why is the result of datediff year in Firebird too high?

I have a question about the datediff function in Firebird. When I try to diff two dates such as 15.12.1999 and 30.06.2000 in SQL like this
SELECT
SUM(datediff (YEAR, W.FROM, W.TO)),
SUM(datediff (MONTH, W.FROM, W.TO)),
SUM(datediff (DAY, W.FROM, W.TO))
FROM WORKERS W
WHERE W.ID=1
I get a result of 1 year, 6 months and 198 days, but the year value is wrong (the result should of course be 0). How do I have to write my query to get the correct result for the year? The documentation at https://firebirdsql.org/refdocs/langrefupd21-intfunc-datediff.html mentions this case, but it does not say how to solve the problem.
The documentation is not very clear, but I'm pretty sure that datediff() counts the number of boundaries crossed between the two dates. (This is how the very similar function in SQL Server works.) Hence, for year, it counts the number of "Dec 31st/Jan 1st" boundaries. This is explicitly explained in the documentation.
If you want a more accurate count, you can use a smaller unit. The following is pretty close:
(datediff(day, w.from, w.to) / 365.25) as years_diff
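The difference between counting boundaries and counting completed years can be reproduced with a small Python sketch. The function names are hypothetical, and the boundary-counting functions only model the behaviour described above; they are not Firebird's implementation.

```python
from datetime import date


def datediff_year(start: date, end: date) -> int:
    """Number of year boundaries crossed, ignoring month and day."""
    return end.year - start.year


def datediff_month(start: date, end: date) -> int:
    """Number of month boundaries crossed, ignoring the day."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def datediff_day(start: date, end: date) -> int:
    """Number of days between the two dates."""
    return (end - start).days


def full_years(start: date, end: date) -> int:
    """Completed years: subtract one if the anniversary has not been reached yet."""
    years = end.year - start.year
    if (end.month, end.day) < (start.month, start.day):
        years -= 1
    return years


# The dates from the question.
start, end = date(1999, 12, 15), date(2000, 6, 30)
print(datediff_year(start, end), datediff_month(start, end), datediff_day(start, end))
print(full_years(start, end), round(datediff_day(start, end) / 365.25, 2))
```

The first line reproduces the values reported in the question, while the second shows zero completed years and the fractional value that the day-based division yields.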