Convert integer years or months into days in SQL impala - sql

I have two columns; both have integer values. One represents years and the other represents months.
My goal is to perform calculations in days (integer), so I have to convert both to calendar days, taking into consideration that some years have 365 days and others 366.
Example in pseudo code:
Select Convert(years_int) to days, Convert(months int) to days
from table.
Real Example:
if --> Years = 1 and Months = 12
1) Convert both to days to compare them: Years = 365 days; Months = 365 days
After conversion : (Years = Months) Returns TRUE.
The problem is that when we have years = 10 (for example), we must take into account the fact that at least two of them have 366 days. The same goes for months: some have 30 days and some have 31. So I need to compensate for that to get the most accurate possible value in days.
Thanks in advance

Converting integers to a timestamp can be done in PostgreSQL. I do not have Impala, but hopefully the script below will help you get this done in Impala:
with
year as (select 2022 as y union select 2023),
month as (select generate_series(1,12) as m),
day as(select generate_series(1,31) as d )
select y,m,d,dt from (
select
y,m,d,
to_date(ds,'YYYYMMDD')+(((d-1)::char(2))||' day')::interval dt
from ( select
*,
y::char(4)|| right('0'||m::char(2),2) || right('0'||0::char(2),2) as ds
from year,month,day
) x
) y
where extract(year from dt)=y and extract(month from dt)=m
order by dt
;
see: DBFIDDLE
Functions used in this query, and a way to convert them to Impala (remember that I do not use that tool/language/dialect):
function | Impala alternative
to_date(a,b) | Converts the string a to a date using the format b. In Impala you can use CAST(expression AS type FORMAT pattern).
y::char(4) | Casts y to a char(4). In Impala you can use CAST(expression AS type).
right(a,b) | Use right().
|| | Use concat().
generate_series(a,b) | Generates a series of numbers from a to (and including) b. A SQL alternative is to write SELECT 1 as x union SELECT 2 union SELECT 3, which generates the same series as generate_series(1,3) in PostgreSQL.
extract(year from a) | Gets the year from the datetime field a; see YEAR().
One special case is this expression: to_date(ds,'YYYYMMDD')+(((d-1)::char(2))||' day')::interval
It converts ds (with datatype CHAR(8)) to a date, and then adds (using +) a number of days (like '4 day').
Because I included all days up to 31, this overshoots into the next month for February, April, June, September and November, because those months have fewer than 31 days. This is corrected by the WHERE clause at the end (where extract(year from dt)=y and extract(month from dt)=m).
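The question's underlying point, that a number of years or months only has an exact length in days once it is anchored to a start date, can be sketched outside SQL. This is a minimal Python illustration, not Impala code; the helper names add_months and span_in_days are hypothetical, and clamping the day to the end of a shorter month is an assumption about the desired behaviour.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to the target month's length."""
    total = start.year * 12 + (start.month - 1) + months
    year, month0 = divmod(total, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return start.replace(year=year, month=month0 + 1, day=min(start.day, last_day))


def span_in_days(start: date, years: int = 0, months: int = 0) -> int:
    """Exact number of calendar days covered by `years` and `months` counted from `start`."""
    return (add_months(start, years * 12 + months) - start).days
```

With a start date of 2022-01-01, one year and twelve months both cover the same number of days, while a ten-year span starting on 2015-01-01 picks up the leap days of 2016, 2020 and 2024.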

Related

How to calculate the number of each weekday between 2 dates in PostgreSQL?

There is a column with dates. I would like to calculate the number of each weekday (Monday to Sunday) from those dates to the present date. On Stack Overflow and elsewhere, I found answers that involved creating functions; I was hoping there's some built-in function that would do it. I found another solution here, which mentions DATEPART('day', start - stop) AS days, but that didn't work. If this is a recent update in PostgreSQL then it won't work, because the tool we use at work for PostgreSQL doesn't accept some of the recent updates (for example, PostgreSQL now accepts negative indexing but the tool doesn't).
What I want:
start_date | day_of_week | no_of_days
2022-04-01 | 1 | 10
2022-04-01 | 2 | 9
2022-05-15 | 2 | 3
2022-06-01 | 5 | 1
The start_date is the column of dates, which when subtracted from current_date (the other way around) returns the number of each weekday between those two days. There were 10 Mondays between 1st April 2022 and 6th June 2022 (today), and that's the number I want for each day of the week.
How can I achieve this in PostgreSQL? I am on version 12.8.
This "simple" but optimized solution counts the number of occurrences for every weekday in the interval between start_date and the current date:
WITH cte(start_date) AS (
VALUES
('2022-04-01'::date)
, ('2022-05-15')
, ('2022-06-01')
)
SELECT c.start_date, sub.dow, sub.no_of_days
FROM cte c
CROSS JOIN LATERAL (
SELECT dow, COALESCE(ct, 0) AS no_of_days
FROM (
SELECT EXTRACT('isodow' FROM g)::int AS dow, count(*) AS ct
FROM generate_series(start_date, current_date, interval '1 day') g
GROUP BY 1
) g
RIGHT JOIN generate_series(1, 7) dow USING (dow)
) sub
ORDER BY 1, 2;
db<>fiddle here
The upper bound (current_date) is included.
Every weekday is included, even when no_of_days is 0.
For very old dates (resulting in long intervals), an arithmetic solution will be cheaper than simply counting generated days. A bit more challenging, but not that hard.
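The arithmetic alternative mentioned above can be sketched as follows. This is a minimal Python illustration of the idea rather than a PostgreSQL query; the function name weekday_counts is hypothetical. Every weekday occurs once per full week, and only the leftover days (at most six) need to be handled individually.

```python
from datetime import date


def weekday_counts(start: date, end: date) -> dict[int, int]:
    """Count each ISO weekday (1=Monday .. 7=Sunday) in [start, end], both bounds included."""
    total_days = (end - start).days + 1
    if total_days <= 0:
        return {dow: 0 for dow in range(1, 8)}
    # Every weekday occurs once per full week.
    full_weeks, extra = divmod(total_days, 7)
    counts = {dow: full_weeks for dow in range(1, 8)}
    # The remaining `extra` days start on the weekday of `start`.
    first = start.isoweekday()
    for offset in range(extra):
        counts[(first - 1 + offset) % 7 + 1] += 1
    return counts
```

For the question's sample rows, with 2022-06-06 as the current date, this gives 10 Mondays and 9 Tuesdays from 2022-04-01, 3 Tuesdays from 2022-05-15, and 1 Friday from 2022-06-01.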

I'm trying to calculate variance using sql between two same days of two years

I will try to be as simple as possible to make my question crystal clear. I have a table called 'fb_ads' on BigQuery (it's about different Facebook campaigns for different stores in the USA). It contains the following columns:
STORE : name of store
CLICKS: number of clicks.
IMPRESSIONS: number of impressions of the ad
COST: the ad cost
DATE: YYYY-MM-DD
Frequency: number of visitors of a store
So, I'm trying to calculate the variance between two years 2017 and 2018.
Here is the variance I'm trying to calculate:
Variance_Of_Frequency = ((Frequency in 2018 at date X) - (Frequency in 2017 at date X)) / (Frequency in 2017 at date X)
The problem is that I'll have to compare the same day of the week close to date X.
For example, if I have a campaign run on a Monday 2017-08-13, I'll need to compare it to another Monday in 2018 close to 2018-08-13 (it might be a Monday on 2018-08-15, for example).
This is a daily variance!
I tried to make a weekly variance calculating and I don't know if it's correct, here is how I did it:
I first started with aggregating my daily table to a weekly tables using the following query:
creating my weekly_table
SELECT
year_week,
STORE,
min(DATE ) as DATE ,
SUM(IMPRESSIONS ) AS FB_IMPRESSIONS ,
SUM(CLICKS ) AS FB_CLICKS ,
SUM(COST) AS FB_COST ,
SUM(Frequency) AS FREQUENCY,
FROM (
SELECT
*,
CONCAT(cast(ANNEE as string), LPAD(cast((extract(WEEK from date)) as string), 2, '0') ) AS year_week
FROM `fb_ads`)
GROUP BY
year_week,
STORE,
ORDER BY year_week
Then I tried to calculate the variance using this:
SELECT
base.*, (base.frequency-lw.frequency) / lw.frequency as VAR_FF
FROM
`weekly_table` base
JOIN (
SELECT
* EXCEPT (date),
DATE_ADD(DATE(TIMESTAMP(date)) , INTERVAL 1 Week)AS date
FROM
`weekly_table` ) lw
ON
base.date = lw.date
AND base.store= lw.store
Does anyone have any idea how to do the daily version, or whether my weekly queries are correct?
Thanks!
For a given date, you want to know the date of the nearest Monday to the same date in the following year...
SET @dt = '2017-08-17';
SELECT CASE WHEN WEEKDAY(@dt + INTERVAL 1 YEAR) > 3
THEN ADDDATE(ADDDATE(@dt + INTERVAL 1 YEAR,INTERVAL 1 WEEK),INTERVAL - WEEKDAY(@dt + INTERVAL 1 YEAR) DAY)
ELSE ADDDATE(@dt + INTERVAL 1 YEAR,INTERVAL - WEEKDAY(@dt + INTERVAL 1 YEAR) DAY)
END x;
Obviously, I could remove all those + INTERVAL 1 YEAR bits by defining @dt that way to begin with.
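The same idea can be generalised from "nearest Monday" to "nearest day with the same weekday as the original date", which is what the daily comparison in the question needs. The following is a minimal Python sketch under that reading, not BigQuery or MySQL code; the function name same_weekday_next_year is hypothetical, and mapping 29 February to 28 February is an assumption.

```python
from datetime import date, timedelta


def same_weekday_next_year(d: date) -> date:
    """Return the date one year after d, moved by at most 3 days so its weekday matches d's."""
    try:
        target = d.replace(year=d.year + 1)
    except ValueError:
        # 29 February has no counterpart in a non-leap year (assumption: fall back to the 28th).
        target = d.replace(year=d.year + 1, day=28)
    # Signed shift in the range -3..3 that lands on d's weekday.
    shift = (d.weekday() - target.weekday() + 3) % 7 - 3
    return target + timedelta(days=shift)
```

For Monday 2017-08-14 this returns Monday 2018-08-13. The frequency on the two dates can then be plugged into the variance formula from the question.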

Get month,days difference between two date columns

I'm trying to fix some problems in my database and I want to recalculate a column based on 2 other date columns. This column is a float, and I want to get the difference between the 2 dates in months, with a decimal part for the days.
For example, if I have the 2 dates '2016-01-15' and '2015-02-01', the difference should be 12.5: 12 for the months difference and 0.5 for the remaining 15 days.
Here is what I have tried so far based on my searches, but I think I'm missing something, because it reports an error saying that my date column doesn't exist:
Select EXTRACT(year FROM vehicle_delivery(date, vehicle_received_date))*12 + EXTRACT(month FROM vehicle_delivery(date, vehicle_received_date));
Here vehicle_delivery is my table name, date is my end date and vehicle_received_date is my start date.
The same thing happens with this SQL:
select extract('years' from vehicle_delivery) * 12 + extract('months' from vehicle_delivery) + extract('days' from vehicle_delivery) / 30
from (select age(date::timestamp, vehicle_received_date::timestamp)) a;
The SQL should look like this:
select extract(year from diff) * 12 + extract(month from diff) + extract(day from diff) / 30
from (select age(date::timestamp, vehicle_received_date::timestamp) as diff
from vehicle_delivery
) vd;
I don't know what the purpose of the / 30 is, but you appear to want it.
Notes:
The FROM clause references the table.
The first argument in extract() is a keyword, not a string.
You want to reference the age() value in the extract().
age() returns an interval, so it is rather redundant to take out the parts (only needed if you want them in separate columns).
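To make the arithmetic concrete, here is a minimal Python sketch of "whole months plus leftover days divided by 30". It only approximates what age() and extract() produce in PostgreSQL (the borrowing rules for month ends may differ), it assumes start <= end, and the names add_months and months_between are hypothetical.

```python
import calendar
from datetime import date


def add_months(d: date, n: int) -> date:
    """Shift d by n whole months, clamping the day to the target month's length."""
    total = d.year * 12 + (d.month - 1) + n
    year, month0 = divmod(total, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return d.replace(year=year, month=month0 + 1, day=min(d.day, last_day))


def months_between(start: date, end: date, days_per_month: float = 30.0) -> float:
    """Whole months from start to end, plus the leftover days as a fraction of a 30-day month."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if add_months(start, months) > end:
        months -= 1  # the last month is not complete yet
    leftover_days = (end - add_months(start, months)).days
    return months + leftover_days / days_per_month
```

For example, months_between(date(2015, 1, 1), date(2016, 1, 16)) is 12 whole months plus 15 leftover days, which comes out as 12.5.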

Convert decimal year to date

I have dates in a table that are stored as decimal years. An example is 2003.024658 which translates to January 9, 2003.
I would like to convert the decimal years to Oracle's date format.
I've found someone who's done this in Excel: Decimal year to date formula?
=DATE(INT(B1),1,MOD(B1,1)*(DATE(INT(B1)+1,1,1)-DATE(INT(B1),1,1)))
However, I can't quite figure out how to convert the logic to Oracle PL/SQL.
If you start from the assumption that the decimal portion was calculated according to the number of days in the given year (i.e. 365 or 366 depending on whether it was a leap year), you could do something like this:
with
q1 as (select 2003.024658 d from dual)
,q2 as (select d
,mod(d,1) as decimal_portion
,to_date(to_char(trunc(d),'0000')||'0101','YYYYMMDD') -- trunc() keeps to_char from rounding e.g. 2003.6 up to 2004
as jan01
from q1)
,q3 as (select q2.*
,add_months(jan01,12)-jan01 as days_in_year
from q2)
select d
,decimal_portion * days_in_year as days
,jan01 + (decimal_portion * days_in_year) as result
from q3;
d: 2003.024658
days: 9.00017
result: 10-JAN-2003 12:00am
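The same assumption (the decimal part is a fraction of that year's 365 or 366 days, counted from 1 January) can be checked with a short Python sketch. This is an illustration of the logic only, not Oracle code, and the function name decimal_year_to_datetime is hypothetical.

```python
from datetime import datetime, timedelta


def decimal_year_to_datetime(value: float) -> datetime:
    """Convert a decimal year such as 2003.024658 to a datetime."""
    year = int(value)  # integer part is the calendar year
    jan01 = datetime(year, 1, 1)
    # 365 or 366, depending on whether `year` is a leap year.
    days_in_year = (datetime(year + 1, 1, 1) - jan01).days
    return jan01 + timedelta(days=(value - year) * days_in_year)
```

As in the query above, the fraction is added to 1 January, so 2003.024658 (about 9.00017 days) lands just after midnight on 10 January 2003.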

How to calculate ages in BigQuery?

I have two TIMESTAMP columns in my table: customer_birthday and purchase_date. I want to create a query to show the number of purchases by customer age, to create a chart.
But how do I calculate ages, in years, using BigQuery? In other words, how do I get the difference in years between two TIMESTAMPs? The age calculation cannot be made using days or hours, because of leap years, so the function DATEDIFF(<timestamp1>,<timestamp2>) is not appropriate.
Thanks.
First of all, I'd really love BigQuery to have a function which calculates current age based on a date. That seems to be like a very common use case and it's not really easy due to the whole leap year thing.
I found a great article about this issue: https://towardsdatascience.com/how-to-accurately-calculate-age-in-bigquery-999a8417e973
Their final approach is similar to Lars Haugseth's and Saad's answer, but they do not use the DAYOFYEAR part in order to avoid issues with leap years. It also gives you the flexibility not only to calculate the current age, but also the age at a particular date that you pass to the function as argument:
CREATE OR REPLACE FUNCTION workspace.age_calculation(as_of_date DATE, date_of_birth DATE)
AS (
DATE_DIFF(as_of_date,date_of_birth, YEAR) -
IF(EXTRACT(MONTH FROM date_of_birth)*100 + EXTRACT(DAY FROM date_of_birth) >
EXTRACT(MONTH FROM as_of_date)*100 + EXTRACT(DAY FROM as_of_date)
,1,0)
)
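The month-and-day comparison used by this function can be illustrated with a small Python sketch. It is not BigQuery code, and the name age_in_years is hypothetical; comparing (month, day) tuples plays the same role as the MONTH*100 + DAY trick.

```python
from datetime import date


def age_in_years(as_of: date, born: date) -> int:
    """Year difference, minus one if the birthday has not yet been reached in as_of's year."""
    had_birthday = (as_of.month, as_of.day) >= (born.month, born.day)
    return as_of.year - born.year - (0 if had_birthday else 1)
```

Because only the month and day are compared, leap years do not shift the result: someone born on 2000-12-30 is 21 on 2022-12-14 and turns 22 on 2022-12-30.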
Regarding the difference between dates - you could consider user-defined functions (https://cloud.google.com/bigquery/user-defined-functions) with a JavaScript date library, such as Datejs or Moment.js
You can use DATE_DIFF to get the difference in years, but you need to subtract one if the birthday has not yet occurred this year:
IF(EXTRACT(DAYOFYEAR FROM CURRENT_DATE) < EXTRACT(DAYOFYEAR FROM birthdate),
DATE_DIFF(CURRENT_DATE, birthdate, YEAR) - 1,
DATE_DIFF(CURRENT_DATE, birthdate, YEAR)) AS age
Here it is in a user defined function:
CREATE TEMP FUNCTION calculateAge(birthdate DATE) AS (
DATE_DIFF(CURRENT_DATE, birthdate, YEAR) +
IF(EXTRACT(DAYOFYEAR FROM CURRENT_DATE) < EXTRACT(DAYOFYEAR FROM birthdate), -1, 0) -- subtract 1 if birthdate has not yet occurred this year
);
You can compute the number of days it would be if all years were 365 days long, take the difference, and divide by 365. For example:
SELECT (day2-day1)/365
FROM (
SELECT YEAR(t1) * 365 + DAYOFYEAR(t1) as day1,
YEAR(t2) * 365 + DAYOFYEAR(t2) as day2
FROM (
SELECT TIMESTAMP('20000201') as t1,
TIMESTAMP('20140201') as t2))
This returns 14.0, even though there are intervening leap years. If you want the final result as an integer instead of floating point, you can use the INTEGER() function to cast the result.
Note that if one of the dates is a leap day (Feb 29), it will appear to be exactly one year away from March 1 of the following year, but I think this sounds like the intended behavior.
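The "pretend every year has 365 days" arithmetic can be reproduced in a few lines of Python to see why the result is a whole number. This is only an illustration of the calculation, not BigQuery code, and the function names are hypothetical.

```python
from datetime import date


def flat_day_number(d: date) -> int:
    """Day count as if every year had 365 days: year * 365 + day of year."""
    return d.year * 365 + d.timetuple().tm_yday


def years_between(d1: date, d2: date) -> float:
    """Difference in years under the 365-day assumption."""
    return (flat_day_number(d2) - flat_day_number(d1)) / 365
```

For 2000-02-01 and 2014-02-01 the day-of-year terms cancel, leaving 14 * 365 / 365 = 14.0. The leap-day caveat shows up the same way: 2000-02-29 and 2001-03-01 are both day 60 of their year, so they appear exactly one year apart.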
Another way to calculate age that takes leap years into account is to:
1. Calculate the simple age based on the difference in years.
2. Either subtract 1 or not, as follows:
   a. Add the difference in years to the birthday (e.g. if today is 2022-12-14 and the birthday is 2000-12-30, then the "new" birthday becomes 2022-12-30).
   b. Do a DAY-based difference between today and the "new" birthday, which gives you either a positive number (birthday has passed for this year) or a negative number (still has a birthday this year).
   c. Subtract 1 year from the simple age calculation if the number is negative.
In BigQuery SQL code this looks like:
SELECT
bd AS birthday
,today
,DATE_DIFF(today, bd, YEAR) AS simpleAge
,DATE_DIFF(today, bd, YEAR) +
(CASE
WHEN DATE_DIFF(today, DATE_ADD(bd, INTERVAL DATE_DIFF(today, bd, YEAR) YEAR), DAY) >= 0
THEN 0
ELSE -1
END) AS age
FROM
(SELECT
PARSE_DATE("%Y-%m-%d", "2000-12-30") AS bd
,CURRENT_DATE("Asia/Tokyo") AS today
)
Outputs:
birthday | today | simpleAge | age
2000-12-30 | 2022-12-14 | 22 | 21