Failed to cast variant value to DATE (parquet/snowflake) - pandas

I have a Pandas dataframe which includes two datetime64[ns] columns, one with just a date and the other with a timestamp.
>>> df
date ts
0 2020-01-06 2020-01-06 03:12:45
1 2020-01-07 2020-01-07 12:56:52
2 2020-01-08 2020-01-08 15:09:59
>>> df.info()
# Column Dtype
--- --------- ------------
0 date datetime64[ns]
1 ts datetime64[ns]
The idea is to save this dataframe into a Parquet file hosted on S3:
df.to_parquet('s3://my-bucket-name/df.parquet', engine='fastparquet', compression='gzip')
The file is then used to COPY INTO a Snowflake table with two columns:
CREATE TABLE MY_TABLE (
date DATE,
ts TIMESTAMP
)
The command used to COPY is as follows, based on Snowflake's documentation:
copy into {schema}.{table}
from s3://my-bucket-name
credentials=(aws_key_id='{aws_key_id}' aws_secret_key='{aws_secret_key}')
match_by_column_name=case_insensitive
file_format=(type=parquet);
When executing the above command with a dataframe/file/table with only timestamp fields, everything runs fine. The problem comes when using it with a dataframe/file/table with a date field. In this case, an error shows up:
sqlalchemy.exc.ProgrammingError: (snowflake.connector.errors.ProgrammingError) 100071 (22000):
Failed to cast variant value "2020-06-16 00:00:00.000" to DATE
Is there a way to solve this issue?
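One possible workaround, offered as an untested sketch rather than a confirmed fix: the error message suggests the date column reaches Snowflake as a timestamp ("2020-06-16 00:00:00.000") and not as a plain date. Converting the column to Python datetime.date objects before writing may help, on the assumption that the Parquet engine then writes a DATE logical type. pyarrow does this for date objects; whether fastparquet does is not verified here. The sample frame below is made up to mirror the question.

```python
import datetime
import pandas as pd

# Sample frame mirroring the question (values are illustrative).
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-06", "2020-01-07", "2020-01-08"]),
    "ts": pd.to_datetime(
        ["2020-01-06 03:12:45", "2020-01-07 12:56:52", "2020-01-08 15:09:59"]
    ),
})

# Replace the datetime64[ns] values with plain datetime.date objects.
# The column dtype becomes object, but each value carries no time part.
df["date"] = df["date"].dt.date

# Assumption: with engine="pyarrow", date objects are written as a Parquet
# DATE (date32) column instead of a timestamp. Not executed here because it
# needs pyarrow, s3fs and access to the bucket.
# df.to_parquet("s3://my-bucket-name/df.parquet", engine="pyarrow", compression="gzip")
```

The ts column is left untouched, so it should still load into the TIMESTAMP column as before.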

Related

Converting dtype('O') to a date DD/MM/YYYY

I'm a beginner at coding and started a project for my small business.
I imported an xlsx file using pandas and got a table with dtype('O') in every column.
However, I can't find a way to get only the date in the format DD/MM/YYYY.
Any tips?
I tried this code:
tabela['dt_nasc'] = pd.to_datetime(tabela['dt_nasc'], format='%m/%d/%Y')
but the result was:
ValueError: time data '1988-10-24 00:00:00' does not match format '%m/%d/%Y' (match)
I also tried this code:
import datetime
def convert_to_date(x):
    return datetime.datetime.strptime(x, '%Y-%m-%d %H:%M:%S')
tabela.loc[:, 'dt_nasc'] = tabela['dt_nasc'].apply(convert_to_date)
# better to use a lambda function
tabela.loc[:, 'dt_nasc'] = tabela['dt_nasc'].apply(lambda x:datetime.datetime.strptime(x , '%Y-%m-%d %H:%M:%S'))
but I couldn't find a way to print the date in the format DD/MM/YYYY.
Example
s = pd.Series({0: '2022-11-29 13:00:00', 1: '2022-11-30 13:48:00'})
s
0 2022-11-29 13:00:00
1 2022-11-30 13:48:00
dtype: object <-- chk dtype
Code
object to datetime
pd.to_datetime(s)
result
0 2022-11-29 13:00:00
1 2022-11-30 13:48:00
dtype: datetime64[ns] <--chk dtype
convert result to DD-MM-YYYY HH:MM:SS
pd.to_datetime(s).dt.strftime('%d-%m-%Y %H:%M:%S')
0 29-11-2022 13:00:00
1 30-11-2022 13:48:00
dtype: object <--chk dtype
convert result to DD/MM/YYYY
pd.to_datetime(s).dt.strftime('%d/%m/%Y')
0 29/11/2022
1 30/11/2022
dtype: object <-- chk dtype
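The ValueError in the question arises because the format passed to pd.to_datetime has to describe the strings as they currently are ('1988-10-24 00:00:00'), not the layout wanted at the end. A minimal sketch, assuming the dt_nasc column holds strings like the one in the error message (the second row and the dt_nasc_fmt column name are made up for illustration):

```python
import pandas as pd

# Illustrative data; the first value is taken from the error message.
tabela = pd.DataFrame({"dt_nasc": ["1988-10-24 00:00:00", "1990-03-05 00:00:00"]})

# Step 1: parse using the format the strings actually have.
parsed = pd.to_datetime(tabela["dt_nasc"], format="%Y-%m-%d %H:%M:%S")

# Step 2: render as DD/MM/YYYY text. The result is a string column,
# so it is kept separate from the parsed datetime values.
tabela["dt_nasc_fmt"] = parsed.dt.strftime("%d/%m/%Y")

print(tabela["dt_nasc_fmt"].tolist())
```

Keeping the parsed datetime column around is usually worthwhile, since sorting and date arithmetic do not work on the formatted strings.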

Splitting unrecognized timestamp column into separate date and time columns

I have a column named TimeDate that I want to split into a time column and a date column.
TimeDate
00:00:00 (01/01/2018)
01:00:00 (01/01/2018)
02:00:00 (01/01/2018)
I tried using pandas' to_datetime, but it doesn't work:
pd.to_datetime(df["Time / Date."]).dt.date
Got this error
('Unknown string format:', '00:00:00 (01/01/2018)')
Any idea how should I approach this problem?
Looks like you can just pass the format:
pd.to_datetime(df['TimeDate'], format='%H:%M:%S (%m/%d/%Y)').dt.date
Output:
0 2018-01-01
1 2018-01-01
2 2018-01-01
Name: TimeDate, dtype: object
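Since the question asks for both a date and a time column, the same parse can feed both. A small sketch, assuming the column is called TimeDate and using date and time as hypothetical names for the new columns:

```python
import datetime
import pandas as pd

df = pd.DataFrame({
    "TimeDate": [
        "00:00:00 (01/01/2018)",
        "01:00:00 (01/01/2018)",
        "02:00:00 (01/01/2018)",
    ]
})

# Parse once with the explicit format, then split into two columns.
parsed = pd.to_datetime(df["TimeDate"], format="%H:%M:%S (%m/%d/%Y)")
df["date"] = parsed.dt.date  # datetime.date objects
df["time"] = parsed.dt.time  # datetime.time objects

print(df[["date", "time"]])
```

With only 01/01/2018 in the sample it is impossible to tell whether the source is month-first or day-first; swap %m and %d in the format if the data turns out to be day-first.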

Convert 6-digit int to yyyymm in pandas

I made a file that had three date columns:
pd.DataFrame({'yyyymm':[199501],'yyyy':[1995],'mm':[1],'Address':['AL1'],'Number':[12]})
yyyymm yyyy mm Address Number
0 199501 1995 1 AL1 12
and saved it as a file:
df.to_csv('complete.csv')
I read in the file with:
df=pd.read_csv('complete.csv')
and my three date columns are read as ints, not dates.
I tried to convert them back to dates with:
df['yyyymm']=df['yyyymm'].astype(str).dt.strftime('%Y%m')
df['yyyy']=df['yyyy'].dt.strftime('%Y')
df['mm']=df['mm'].dt.strftime('%m')
with the error:
AttributeError: Can only use .dt accessor with datetimelike values
Very odd, as the command I used to make the datetime column was:
df['yyyymm']=df['col2'].dt.strftime('%Y%m')
Am I missing something? How can I convert the 6-digit column back to a yyyymm datetime, the 4-digit column to a yyyy datetime, and the mm column back to a datetime?
The columns yyyymm, yyyy and mm are integers. Calling .astype(str) converts them to strings, but a string column has no .dt accessor; that accessor only exists for datetime-like values.
You can use pd.to_datetime(..) to parse the strings into datetime values:
df['yyyymm'] = pd.to_datetime(df['yyyymm'].astype(str), format='%Y%m')
Indeed, this gives us:
>>> pd.to_datetime(df['yyyymm'].astype(str), format='%Y%m')
0 1995-01-01
Name: yyyymm, dtype: datetime64[ns]
The same can be done for the yyyy and mm columns:
>>> pd.to_datetime(df['yyyy'].astype(str), format='%Y')
0 1995-01-01
Name: yyyy, dtype: datetime64[ns]
>>> pd.to_datetime(df['mm'].astype(str), format='%m')
0 1900-01-01
Name: mm, dtype: datetime64[ns]
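Parsing mm on its own falls back to the year 1900, as the last output shows. If the goal is one date per row, a possible alternative is to combine the yyyy and mm columns, or to keep yyyymm as a monthly period. A sketch under those assumptions (from_parts and as_period are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({"yyyymm": [199501], "yyyy": [1995], "mm": [1]})

# pd.to_datetime accepts a frame with year/month/day columns;
# day is fixed to 1 here because the data has no day component.
from_parts = pd.to_datetime(
    pd.DataFrame({"year": df["yyyy"], "month": df["mm"], "day": 1})
)

# A monthly Period keeps only year and month, which matches "yyyymm" more closely.
as_period = pd.to_datetime(df["yyyymm"].astype(str), format="%Y%m").dt.to_period("M")

print(from_parts.iloc[0], as_period.iloc[0])
```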

How to cast pandas datetime object to string without index

I'm trying to cast a datetime64 pandas object to a string without printing the index.
I have a csv file with the following
Dates
2019-06-01
2019-06-02
2019-06-03
When I import the CSV file via pandas, the column comes in as a plain object column, so I convert it:
df['Dates'] = pd.to_datetime(df['Dates'], format='%Y-%m-%d')
This provides a datetime64[ns] object. I tried printing this object with the following output.
>>> What is the date 0 2019-06-01
Name: Dates, dtype: datetime64[ns]
So I have to cast this object to a string. The documentation suggests I use dt.strftime().
s=df["Dates"].dt.strftime("%Y-%m-%d")
print(f"What is the date {s}")
The output for the above is:
>>> What is the date 0 2019-06-01
How do I remove the index from the output?
file = r'test.csv'
df = pd.read_csv(file)
df['Dates'] = pd.to_datetime(df['Dates'], format='%Y-%m-%d')
s = df[df["Dates"] < "2019-06-02"]
print(f"What is the date {s['Dates']}")
print(s["Dates"])
The expected output is the following:
>>> What is the date 2019-06-01
However I am getting the following
>>> What is the date 0 2019-06-01
You can try:
for x in s['Dates'].astype('str'):
    print(f"What is the date {x}")
gives:
What is the date 2019-06-01
What is the date 2019-06-02
What is the date 2019-06-03
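The index appears because s['Dates'] is still a Series, and formatting a Series prints its index and name. When a single value is expected, one option is to pull out the scalar first. A minimal sketch, with the CSV contents inlined as a frame and first and message as illustrative names:

```python
import pandas as pd

# Same data as the CSV in the question, inlined for a self-contained example.
df = pd.DataFrame({"Dates": ["2019-06-01", "2019-06-02", "2019-06-03"]})
df["Dates"] = pd.to_datetime(df["Dates"], format="%Y-%m-%d")

s = df[df["Dates"] < "2019-06-02"]

# .iloc[0] returns a single Timestamp instead of a Series,
# so nothing but the formatted date is printed.
first = s["Dates"].iloc[0]
message = f"What is the date {first.strftime('%Y-%m-%d')}"
print(message)
```

Note that .iloc[0] raises an IndexError if the filter matches no rows, so a length check may be needed in real code.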

Matplotlib Default date format?

I'm using Pandas to read a .csv file that has a 'Timestamp' date column in the format:
31/12/2016 00:00
I use the following line to convert it to a datetime64 dtype:
time = pd.to_datetime(df['Timestamp'])
The column has an entry for every 15 minutes for almost a year, and I've run into a problem when I want to plot more than one month's worth.
Pandas seems to change the format from ISO to US upon reading (so YYYY:MM:DD to YYYY:DD:MM), so my plots have 30 day gaps whenever the datetime represents a new day. A plot of the first 5 days looks like:
This is the raw data in the file either side of the jump:
01/01/2017 23:45
02/01/2017 00:00
If I print the values being plotted (after reading) around the 1st jump, I get:
2017-01-01 23:45:00
2017-02-01 00:00:00
So is there a way to get pandas to read the dates properly?
Thanks!
You can specify a format parameter in pd.to_datetime to tell pandas how to parse the date exactly, which I suppose is what you need:
time = pd.to_datetime(df['Timestamp'], format='%d/%m/%Y %H:%M')
pd.to_datetime('02/01/2017 00:00')
#Timestamp('2017-02-01 00:00:00')
pd.to_datetime('02/01/2017 00:00', format='%d/%m/%Y %H:%M')
#Timestamp('2017-01-02 00:00:00')
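If the exact layout is not known in advance, dayfirst=True is another option; it tells the parser to prefer day-first for ambiguous values. A quick way to confirm the parse is correct is to check that consecutive rows are 15 minutes apart. A sketch using the two raw rows from the question (raw, explicit and hinted are illustrative names):

```python
import pandas as pd

# The two rows on either side of the jump, as they appear in the file.
raw = pd.Series(["01/01/2017 23:45", "02/01/2017 00:00"])

# Explicit format, as in the answer above.
explicit = pd.to_datetime(raw, format="%d/%m/%Y %H:%M")

# Alternative: no format, but a hint that the day comes first.
hinted = pd.to_datetime(raw, dayfirst=True)

# Sanity check: the gap between the two rows should be 15 minutes, not 30 days.
print(explicit.diff().iloc[1])
```

The explicit format is the stricter choice, since it raises an error on rows that do not match, whereas dayfirst is only a preference.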