How to convert an int to DateTime in BigQuery - sql

I have an INT64 column called "Date" which contains many different numbers like: "20210209" or "20200305". I want to turn those numbers into a date with this format: MM-YYYY (so in these cases, 02-2021 and 03-2020). Ultimately I want to sum all the data in each month together. The problem is that BigQuery can't convert INT64 to date, only to strings. I'm not sure if I should convert to a string and then to a date or if there is a better way.

Converting to a string and then to a date works and is very concise, but over a large enough number of rows (which may well be the case in BigQuery) you may be better off using integer maths and DATE(year, month, day):
https://cloud.google.com/bigquery/docs/reference/standard-sql/date_functions#date
SELECT
DATE(
DIV( 20210209 , 10000), -- Which gives 2021
DIV(MOD(20210209, 10000), 100), -- Which gives 02
MOD(20210209, 100) -- Which gives 09
)
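The same digit-splitting can be checked outside BigQuery. Below is a minimal Python sketch, assuming the values are always eight-digit YYYYMMDD integers; the function name yyyymmdd_to_date is made up for illustration and is not part of any library.

```python
from datetime import date


def yyyymmdd_to_date(n: int) -> date:
    """Split an integer such as 20210209 into year, month and day."""
    year, rest = divmod(n, 10000)   # 20210209 -> (2021, 209)
    month, day = divmod(rest, 100)  # 209 -> (2, 9)
    return date(year, month, day)


d = yyyymmdd_to_date(20210209)
print(d.isoformat())        # 2021-02-09
print(d.strftime("%m-%Y"))  # 02-2021
```

The divmod calls mirror the DIV and MOD steps in the query above, so an invalid value (for example month 13) raises an error here just as DATE() would reject it.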

You can convert the value to a string and use parse_date():
select parse_date('%Y%m%d', cast(20210209 as string))

Another option
select date,
regexp_replace('' || date, r'(\d{4})(\d{2})(\d{2})', r'\2-\1') as MM_YYYY
from your_table
Applied to the sample data in your question, the output is 02-2021 and 03-2020.
Yet another option
select date,
format_date('%m-%Y', parse_date('%Y%m%d', '' || date)) as MM_YYYY
from your_table
This gives the same output.

Related

storing date in 'CCYYMMDD' format in Teradata

I would like to store dates in the format CCYYMMDD in Teradata, but I fail to do so. Find below what I tried so far:
query 1:
SEL CAST(CAST(CURRENT_DATE AS DATE FORMAT 'YYYYMMDD') AS VARCHAR(8))
-- Output: 20191230 ==> this works!
query 2:
SEL CAST(CAST(CURRENT_DATE AS DATE FORMAT 'CCYYMMDD') AS VARCHAR(8))
-- output: SELECT Failed. [3530] Invalid FORMAT string 'CCYYMMDD'.
It seems that the CCYYMMDD is not available in Teradata right away. Is there a workaround?
Tool used: Teradata SQL assistant
Internally, dates are stored as integers in Teradata. So when you say you want to store them in a different format, I don't think you can do that. But you can choose how to display / return the values.
I'm sure there's a cleaner way to get the format you want, but here's one way:
WITH cte (mydate) AS (
SELECT CAST(CAST(CURRENT_DATE AS DATE FORMAT 'YYYYMMDD') AS CHAR(8)) AS mydate
)
SELECT
CAST(
(CAST(SUBSTRING(mydate FROM 1 FOR 2) AS INTEGER) + 1) -- generate "century" value
AS CHAR(2) -- cast value as string
) || SUBSTRING(mydate FROM 3) AS new_date -- add remaining portion of date string
FROM cte
You'd have to add some extra logic to handle years before 1000 and after 9999. I don't have a TD system to test, but give it a try and let me know.

Convert/get varchar variable to YYYYMM

I have 4 CTEs in this query, and the third one contains a DATETIME converted to VARCHAR (formatted as the requirement demands) as startDate, in DD/MM/YYYY format. The last CTE does calculations on the generated data, and one of its columns needs to store a YYYYMM value based on startDate.
The problem is getting the year and the month from this converted DATETIME; using convert() it shows this:
IDPER
-------
01/01/ --DD/MM/
These 2 show YYYYMM correctly when startDate isn't converted:
Select *, left(convert(nvarchar(6),new_ini,112),6) as IDPER from table
Select *, convert(nvarchar(6),new_ini,112) as IDPER from table
How can I get the YYYYMM format once startDate has been converted? Or is there a smarter approach to the requirement?
If you have a string in the format DD/MM/YYYY and you want YYYYMM, then use string operations:
select right(new_ini, 4) + substring(new_ini, 4, 2)
You should be storing date values as dates or a related type, not as string. But given that you have already stored this as a string, string operations can do what you need.
My way would be slightly different
SELECT CONVERT(NVARCHAR(6), CONVERT(DATE, new_ini, 103), 112);
Here, I first convert the string to a date (style 103, DD/MM/YYYY), then format it as YYYYMMDD (style 112) and keep only the first 6 characters.
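To compare the two ideas (slicing the string versus parsing it as a date first), here is a small Python sketch. It assumes the input is always a well-formed DD/MM/YYYY string; both function names are hypothetical.

```python
from datetime import datetime


def yyyymm_by_slicing(s: str) -> str:
    """Take the last 4 characters (year) plus characters 4-5 (month)."""
    return s[-4:] + s[3:5]


def yyyymm_by_parsing(s: str) -> str:
    """Parse as a real date first, then format as YYYYMM."""
    return datetime.strptime(s, "%d/%m/%Y").strftime("%Y%m")


print(yyyymm_by_slicing("15/03/2021"))  # 202103
print(yyyymm_by_parsing("15/03/2021"))  # 202103
```

The parsing variant fails loudly on a malformed value such as "31/02/2021", while the slicing variant silently returns whatever characters happen to be in those positions.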
declare @date DATE = GETDATE();
select REPLACE(LEFT(CONVERT(DATE,@date,112),8),'-','') -- 1st approach
select FORMAT(@date,'yyyyMM') -- 2nd approach

Find data with specific date and month only

I am trying to find data using a WHERE clause on a specific day and month range, but I am receiving an error. Can anyone help me with this?
select *
from my_data
where date BETWEEN '11-20' AND '12-15'
MS SQL Server Management Studio
I am receiving an error
Conversion failed when converting date and/or time from character string
Most databases support functions to extract components of dates. So, one way of doing what you want is to convert the values to numbers and make a comparison like this:
where month(date) * 100 + day(date) between 1120 and 1215
The functions for extracting date parts differ by database, so your database might have somewhat different methods for doing this.
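The month * 100 + day trick turns every date into a number that sorts by position within the year, ignoring the year itself. Here is a short Python sketch of the same comparison, with made-up sample dates and a hypothetical helper name:

```python
from datetime import date


def month_day_key(d: date) -> int:
    """Encode month and day as one integer, e.g. 20 November -> 1120."""
    return d.month * 100 + d.day


# Made-up sample rows spanning several years
rows = [
    date(2014, 11, 19),
    date(2014, 11, 20),
    date(2015, 12, 15),
    date(2013, 12, 16),
    date(2015, 6, 1),
]

matches = [d for d in rows if 1120 <= month_day_key(d) <= 1215]
print(matches)  # the 20 Nov 2014 and 15 Dec 2015 rows
```

Note that this only works for a range that does not wrap around the new year; a range such as 20 December to 10 January would need two conditions joined with OR.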
The conversion is failing because you are not specifying a year. If you specify '11-20-2015' your query will work; just insert whatever year you need.
SELECT *
FROM my_data
WHERE date BETWEEN '11-20-2015' AND '12-15-2015'
Alternatively, if you want data from that range of dates across multiple years, I would use a WHILE loop to insert the rows into a # (temp) table and then read from that table. Depending on the amount of data this could be quick or slow. Here is an example:
DECLARE @mindatestart date, @mindateend date, @maxdatestart date
SET @mindatestart = '11-20-2010'
SET @mindateend = '12-15-2010'
SET @maxdatestart = '11-20-2015'
-- Create an empty temp table with the same columns plus an integer year column
SELECT top 0 *, [year] = 0
INTO #mydata
FROM my_data
-- <= so that the final year (2015) is included
WHILE @mindatestart <= @maxdatestart
BEGIN
INSERT INTO #mydata
SELECT *, YEAR(@mindatestart)
FROM my_data
where date between @mindatestart and @mindateend
SET @mindatestart = DATEADD(Year, 1, @mindatestart)
SET @mindateend = DATEADD(Year, 1, @mindateend)
END
This loops over the years 2010-2015, inserts the rows in that date range for each year, and adds an extra column at the end, so you can query the data and order by year like this:
SELECT * FROM #mydata order by YEAR
Hopefully some part of this helps!
FROM THE COMMENT BELOW
SELECT *
FROM my_data
WHERE DAY(RIGHT(date, 5)) between DAY(11-20) and DAY(12-15)
The reason '11-20' doesn't work is that it is a character string without a year, so it cannot be converted to a date. Without the quotes, 11-20 is plain integer arithmetic: MONTH() receives the number -9, which SQL Server treats as a day offset from 1900-01-01. That is also why you are not getting anything back using the method in the first answer: the '-year' part is subtracted as well, as you can see from these queries:
SELECT MONTH(11-20) --Returns 12
SELECT MONTH(11-20-2015) -- Returns 6
SELECT MONTH(11-20-2014) -- Returns 6
Using RIGHT(date, 5) you get only the month-day part, and you then take the day value of that, so DAY(RIGHT(date, 5)), and you should get something that in theory falls within those date ranges regardless of the year. However, I'm not sure how accurate the data will be, and it's a lot of work just to avoid adding 8 extra characters to your original query.
Since you only care about month and day, but not year, you need to use DATEPART to split up the date. Try this:
select *
from my_data
-- 20 Nov through 30 Nov, or 1 Dec through 15 Dec, in any year
WHERE (DATEPART(m, date) = 11 AND DATEPART(d, date) >= 20)
OR (DATEPART(m, date) = 12 AND DATEPART(d, date) <= 15)

Selecting YYYYMM of the previous month in HIVE

I am using Hive, so the SQL syntax might be slightly different. How do I get the data from the previous month? For example, if today is 2015-04-30, I need the data from March in this format 201503? Thanks!
select
employee_id, hours,
previous_month_date--YYYYMM,
from
employees
where
previous_month_date = cast(FROM_UNIXTIME(UNIX_TIMESTAMP(),'yyyy-MM-dd') as int)
From experience, it's safer to use DATE_ADD(Today, -Day(Today)) to compute the last day of the previous month without having to worry about edge cases. From there you can do what you want, e.g.
select
from_unixtime(unix_timestamp(), 'yyyy-MM-dd') as TODAY,
date_add(from_unixtime(unix_timestamp(), 'yyyy-MM-dd'), -cast(from_unixtime(unix_timestamp(), 'd') as int)) as LAST_DAY_PREV_MONTH,
substr(date_add(from_unixtime(unix_timestamp(), 'yyyy-MM-dd'), -cast(from_unixtime(unix_timestamp(), 'd') as int)), 1,7) as PREV_MONTH,
cast(substr(regexp_replace(date_add(from_unixtime(unix_timestamp(), 'yyyy-MM-dd'), -cast(from_unixtime(unix_timestamp(), 'd') as int)), '-',''), 1,6) as int) as PREV_MONTH_NUM
from WHATEVER limit 1
-- today last_day_prev_month prev_month prev_month_num
-- 2015-08-13 2015-07-31 2015-07 201507
See Hive documentation about date functions, string functions etc.
The following works across year boundaries without complex calculations:
date_format(add_months(current_date, -1), 'yyyyMM') --previous month's yyyyMM
In general:
date_format(add_months(current_date, -n), 'yyyyMM') --previous n-th month's yyyyMM
Use the sign that matches the direction you need (negative for back, positive for ahead).
You could do (year('2015-04-30')*100+month('2015-04-30'))-1 for the above mentioned date, it will return 201503 or something like (year(from_unixtime(unix_timestamp()))*100+month(from_unixtime(unix_timestamp())))-1 for today's previous month. Assuming your date column is in 'yyyy-mm-dd' format you can use the first example and substitute the date string with your table column name; for any other format the second example will do, add the column name in the unix_timestamp() operator.
Angelo's reply is a good start but it returns 201500 if the original date was 2015-01-XX. Building on his answer, I suggest using the following:
IF(month(${DATE}) = 1,
(year(${DATE})-1)*100 + 12,
year(${DATE})*100 + month(${DATE})-1
) as month_key
Provided you strip the hyphens from your input string, you can get the previous month's id in YYYYMM format with:
select if( ((${hiveconf:MonthId}-1)%100)=0 ,${hiveconf:MonthId}-89,${hiveconf:MonthId}-1 ) as PreviousMonthId;
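The January rollover discussed in the answers above is easy to get wrong, so here is a small Python sketch of the same integer arithmetic. It assumes the input is a valid YYYYMM integer; the function name is hypothetical.

```python
def previous_month_key(yyyymm: int) -> int:
    """Return the YYYYMM key of the month before the given YYYYMM key."""
    year, month = divmod(yyyymm, 100)
    if month == 1:
        # January rolls back to December of the previous year
        return (year - 1) * 100 + 12
    return year * 100 + month - 1


print(previous_month_key(201504))  # 201503
print(previous_month_key(201501))  # 201412
```

Subtracting 89 in the January case, as the last query does, is the same step written differently: 201501 - 89 = 201412.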

convert Excel Date Serial Number to Regular Date

I have a column called DateOfBirth in my CSV file that contains Excel date serial numbers.
Example:
36464
37104
35412
When I format the cells in Excel, these are converted as
36464 => 1/11/1999
37104 => 1/08/2001
35412 => 13/12/1996
I need to do this transformation in SSIS or in SQL. How can this be achieved?
In SQL:
select dateadd(d,36464,'1899-12-30')
-- or thanks to rcdmk
select CAST(36464 - 2 as SmallDateTime)
In SSIS, see here
http://msdn.microsoft.com/en-us/library/ms141719.aspx
The marked answer is not working fine, please change the date to "1899-12-30" instead of "1899-12-31".
select dateadd(d,36464,'1899-12-30')
You can cast it to a SQL SMALLDATETIME:
CAST(36464 - 2 as SMALLDATETIME)
SQL Server counts its dates from 01/01/1900 and Excel from 12/30/1899, a difference of 2 days.
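The 1899-12-30 base date can be sanity-checked with a few lines of Python. This is only a sketch: it assumes whole-number serials from Excel's default 1900 date system, on or after 1 March 1900, and the names EXCEL_EPOCH and excel_serial_to_date are made up for illustration.

```python
from datetime import date, timedelta

# Assumed base date for Excel's 1900 date system (serial 0)
EXCEL_EPOCH = date(1899, 12, 30)


def excel_serial_to_date(serial: int) -> date:
    """Add the serial number, as days, to the base date."""
    return EXCEL_EPOCH + timedelta(days=serial)


print(excel_serial_to_date(37104))  # 2001-08-01
print(excel_serial_to_date(35412))  # 1996-12-13
```

These match the 1/08/2001 and 13/12/1996 values quoted in the question.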
tldr:
select cast(@Input - 2e as datetime)
Explanation:
Excel stores datetimes as a floating point number that represents elapsed time since the beginning of the 20th century, and SQL Server can readily cast between floats and datetimes in the same manner. The difference between Excel and SQL server's conversion of this number to datetimes is 2 days (as of 1900-03-01, that is). Using a literal of 2e for this difference informs SQL Server to implicitly convert other datatypes to floats for very input-friendly and simple queries:
select
cast('43861.875433912' - 2e as datetime) as ExcelToSql, -- even varchar works!
cast(cast('2020-01-31 21:00:37.490' as datetime) + 2e as float) as SqlToExcel
-- Results:
-- ExcelToSql SqlToExcel
-- 2020-01-31 21:00:37.490 43861.875433912
This is what actually worked for me:
dateadd(mi,CONVERT(numeric(17,5),41869.166666666664)*1440,'1899-12-30')
(that is, with one more day subtracted from the base date), referring to the post that received the negative comments.
SSIS Solution
The DT_DATE data type is implemented using an 8-byte floating-point number. Days are represented by whole number increments, starting with 30 December 1899, and midnight as time zero. Hour values are expressed as the absolute value of the fractional part of the number. However, a floating point value cannot represent all real values; therefore, there are limits on the range of dates that can be presented in DT_DATE.
From the description above you can see that these values can be converted implicitly when mapped to a DT_DATE column, after first converting them to an 8-byte floating-point number (DT_R8).
Use a derived column transformation to convert this column to 8-byte floating-point number:
(DT_R8)[dateColumn]
Then map it to a DT_DATE column
Or cast it twice:
(DT_DATE)(DT_R8)[dateColumn]
You can check my full answer here:
Is there a better way to parse [Integer].[Integer] style dates in SSIS?
I found this topic so helpful that I created a quick SQL UDF for it.
CREATE FUNCTION dbo.ConvertExcelSerialDateToSQL
(
@serial INT
)
RETURNS DATETIME
AS
BEGIN
DECLARE @dt AS DATETIME
SELECT @dt =
CASE
WHEN @serial is not null THEN CAST(@serial - 2 AS DATETIME)
ELSE NULL
END
RETURN @dt
END
GO
I had to take this to the next level because my Excel dates also had times, so I had values like this:
42039.46406 --> 02/04/2015 11:08 AM
42002.37709 --> 12/29/2014 09:03 AM
42032.61869 --> 01/28/2015 02:50 PM
(also, to complicate it a little more, my numeric value with decimal was saved as an NVARCHAR)
The SQL I used to make this conversion is:
SELECT DATEADD(SECOND, (
CONVERT(FLOAT, t.ColumnName) -
FLOOR(CONVERT(FLOAT, t.ColumnName))
) * 86400,
DATEADD(DAY, CONVERT(FLOAT, t.ColumnName), '1899-12-30')
)
In postgresql, you can use the following syntax:
SELECT ((DATE('1899-12-30') + INTERVAL '1 day' * FLOOR(38242.7711805556)) + (INTERVAL '1 sec' * (38242.7711805556 - FLOOR(38242.7711805556)) * 3600 * 24)) as date
In this case, 38242.7711805556 represents 2004-09-12 18:30:30 in excel format
In addition to @Nick.McDermaid's answer, I would like to post this solution, which converts not only the day but also the hours, minutes and seconds:
SELECT DATEADD(s, (42948.123 - FLOOR(42948.123))*3600*24, dateadd(d, FLOOR(42948.123),'1899-12-30'))
For example
42948.123 to 2017-08-01 02:57:07.000
42818.7166666667 to 2017-03-24 17:12:00.000
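The split into whole days plus a fraction of a day can be reproduced in Python to check the two examples above. This is a sketch under the same assumptions (Excel's 1900 date system, base date 1899-12-30, positive serials); the names are hypothetical, and the seconds are truncated rather than rounded.

```python
from datetime import datetime, timedelta

# Assumed base date for Excel's 1900 date system (serial 0)
EXCEL_EPOCH = datetime(1899, 12, 30)


def excel_serial_to_datetime(serial: float) -> datetime:
    """Whole part = days since the base date, fraction = part of a day."""
    days = int(serial)                      # e.g. 42948
    seconds = int((serial - days) * 86400)  # fraction of a day, in whole seconds
    return EXCEL_EPOCH + timedelta(days=days, seconds=seconds)


print(excel_serial_to_datetime(42948.123))         # 2017-08-01 02:57:07
print(excel_serial_to_datetime(42818.7166666667))  # 2017-03-24 17:12:00
```

Because the fraction is a binary float, a value that should land exactly on a second boundary can come out one second low after truncation; rounding instead of truncating is one way to avoid that.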
You can do this if you just need to display the date in a view. CAST may be faster than CONVERT if you have a large amount of data; also remember to subtract 2 from the Excel date:
CAST(CAST(CAST([Column_With_Date]-2 AS INT)AS smalldatetime) AS DATE)
If you need to update the column to hold a date, you can either update through a join (a self join if necessary) or simply try the following.
You may not need to cast the Excel date as INT, but since the column I was working with was a varchar I had to do that conversion first. I also did not want the time element, so I removed it with the final cast to DATE.
UPDATE [Table_with_Date]
SET [Column_With_Excel_Date] = CAST(CAST(CAST([Column_With_Excel_Date]-2 AS INT)AS smalldatetime) AS DATE)
If you are unsure what you would like to do with this, test and re-test! Make a copy of your table if you need to. You can always create a view!
Google BigQuery solution
Standard SQL
Select Date, DATETIME_ADD(DATETIME(xy, xm, xd, 0, 0, 0), INTERVAL xonlyseconds SECOND) xaxsa
from (
Select Date, EXTRACT(YEAR FROM xonlydate) xy, EXTRACT(MONTH FROM xonlydate) xm, EXTRACT(DAY FROM xonlydate) xd, xonlyseconds
From (
Select Date
, DATE_ADD(DATE '1899-12-30', INTERVAL cast(FLOOR(cast(Date as FLOAT64)) as INT64) DAY ) xonlydate
, cast(FLOOR( ( cast(Date as FLOAT64) - cast(FLOOR( cast(Date as FLOAT64)) as INT64) ) * 86400 ) as INT64) xonlyseconds
FROM (Select '43168.682974537034' Date) -- 09.03.2018 16:23:28
) xx1
)
For those looking for how to do this in Excel (other than formatting the cell as a date), you can use the TEXT function: https://exceljet.net/excel-functions/excel-text-function
i.e.
A1 = 132134
=Text(A1,"MM-DD-YYYY") will result in a date
This worked for me, because the field was sometimes a numeric value that included the time portion.
Command:
dateadd(mi,CONVERT(numeric(17,5),41869.166666666664)*1440,'1899-12-30')