Splitting unrecognized timestamp column into separate date and time columns - pandas

I have a problem splitting a column named TimeDate: I want to split it into separate time and date columns.
TimeDate
00:00:00 (01/01/2018)
01:00:00 (01/01/2018)
02:00:00 (01/01/2018)
I tried using pandas' to_datetime, but it doesn't work:
pd.to_datetime(df["Time / Date."]).dt.date
Got this error
('Unknown string format:', '00:00:00 (01/01/2018)')
Any idea how I should approach this problem?

Looks like you can just pass the format:
pd.to_datetime(df['TimeDate'], format='%H:%M:%S (%m/%d/%Y)').dt.date
Output:
0 2018-01-01
1 2018-01-01
2 2018-01-01
Name: TimeDate, dtype: object
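If the goal is two separate columns rather than just the date, the same parse can feed both; a minimal sketch, assuming the column is named TimeDate as in the sample (the Date/Time column names are just placeholders):

import pandas as pd

df = pd.DataFrame({'TimeDate': ['00:00:00 (01/01/2018)',
                                '01:00:00 (01/01/2018)',
                                '02:00:00 (01/01/2018)']})

# Parse once, then split the result into two new columns
parsed = pd.to_datetime(df['TimeDate'], format='%H:%M:%S (%m/%d/%Y)')
df['Date'] = parsed.dt.date   # datetime.date objects
df['Time'] = parsed.dt.time   # datetime.time objects
print(df)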

Related

Failed to cast variant value to DATE (parquet/snowflake)

I have a Pandas dataframe which includes two datetime64[ns] columns, one with just a date and the other with a timestamp.
>>> df
date ts
0 2020-01-06 2020-01-06 03:12:45
1 2020-01-07 2020-01-07 12:56:52
2 2020-01-08 2020-01-08 15:09:59
>>> df.info()
 #   Column  Dtype
---  ------  --------------
 0   date    datetime64[ns]
 1   ts      datetime64[ns]
The idea is to save this dataframe into a Parquet file hosted on S3:
df.to_parquet('s3://my-bucket-name/df.parquet', engine='fastparquet', compression='gzip')
... and using this file to COPY INTO a Snowflake table with two columns:
CREATE TABLE MY_TABLE (
date DATE,
ts TIMESTAMP
)
The command used to COPY is as follows, based on Snowflake's documentation:
copy into {schema}.{table}
from s3://my-bucket-name
credentials=(aws_key_id='{aws_key_id}' aws_secret_key='{aws_secret_key}')
match_by_column_name=case_insensitive
file_format=(type=parquet);
When executing the above command with a dataframe/file/table with only timestamp fields, everything runs fine. The problem comes when using it with a dataframe/file/table with a date field. In this case, an error shows up:
sqlalchemy.exc.ProgrammingError: (snowflake.connector.errors.ProgrammingError) 100071 (22000):
Failed to cast variant value "2020-06-16 00:00:00.000" to DATE
Is there a way to solve this issue?
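One workaround worth sketching (an assumption, not a verified fix for this exact setup): avoid writing the date column as datetime64 in the Parquet file, for example by converting it to ISO-formatted strings first, so the variant value Snowflake sees is a plain date rather than a midnight timestamp.

import pandas as pd

# df as above: 'date' and 'ts' both datetime64[ns]
df = pd.DataFrame({
    'date': pd.to_datetime(['2020-01-06', '2020-01-07', '2020-01-08']),
    'ts': pd.to_datetime(['2020-01-06 03:12:45', '2020-01-07 12:56:52',
                          '2020-01-08 15:09:59']),
})

# Write the date column as 'YYYY-MM-DD' strings instead of timestamps
df['date'] = df['date'].dt.strftime('%Y-%m-%d')
df.to_parquet('s3://my-bucket-name/df.parquet', engine='fastparquet',
              compression='gzip')

Whether the COPY then casts the string cleanly to DATE depends on the Snowflake side; switching the writer to engine='pyarrow' is another variation sometimes tried.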

Change Date Format from an array in SQL SELECT Statement

I have a column updated_at that returns an array
["2019-01-05T17:28:32.506-05:00","2019-06-15T13:22:02.625-04:00"]
But I want the output in a date format like this: 2019-01-03.
How can I accomplish this in SQL on Databricks?
Thanks!
Try unnest and cast that as a date:
with ts_array as
(select array['2019-01-05T17:28:32.506-05:00','2019-06-15T13:22:02.625-04:00'] as tsa)
select unnest(tsa)::date from ts_array ;
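Note that unnest and the :: cast above are Postgres-flavored; on Databricks the array is usually exploded instead. A rough PySpark equivalent, assuming a DataFrame with the array column updated_at (a sketch, not a drop-in answer):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [(["2019-01-05T17:28:32.506-05:00", "2019-06-15T13:22:02.625-04:00"],)],
    ["updated_at"],
)

# Explode the array into one row per timestamp, then keep only the date part
result = (df
          .select(F.explode("updated_at").alias("ts"))
          .select(F.to_date("ts").alias("updated_date")))
result.show()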
You can use "date_trunc" SQL function to get the output in date format.
date_trunc(fmt, ts) - Returns timestamp ts truncated to the unit specified by the format model fmt. fmt should be one of [“YEAR”, “YYYY”, “YY”, “MON”, “MONTH”, “MM”, “DAY”, “DD”, “HOUR”, “MINUTE”, “SECOND”, “WEEK”, “QUARTER”]
Examples:
> SELECT date_trunc('YEAR', '2015-03-05T09:32:05.359');
2015-01-01 00:00:00
> SELECT date_trunc('MM', '2015-03-05T09:32:05.359');
2015-03-01 00:00:00
> SELECT date_trunc('DD', '2015-03-05T09:32:05.359');
2015-03-05 00:00:00
> SELECT date_trunc('HOUR', '2015-03-05T09:32:05.359');
2015-03-05 09:00:00
Reference: Databricks - SQL Functions.
Hope this helps.

Sanitize a string column with a mix of datetime and timestamps

I have a column on BigQuery of String datatype that has a mix of dates and timestamps as strings.
https://www.evernote.com/l/AmOc0thoaMRLJ7y1IFnLMxwLAeREujUtGRc
I tried SAFE_CAST and DATE_PARSE, but neither works.
I want to be able to query this column uniformly as timestamps.
Below is an example for BigQuery Standard SQL:
#standardSQL
SELECT close_date,
COALESCE(
TIMESTAMP_MILLIS(SAFE_CAST(close_date AS INT64)),
SAFE.PARSE_TIMESTAMP('%m/%d/%y', close_date)
) AS close_date_as_timestamp
FROM `project.dataset.table`
If applied to your sample data, the result is:
Row close_date close_date_as_timestamp
1 1556064000000 2019-04-24 00:00:00 UTC
2 01/24/19 2019-01-24 00:00:00 UTC
3 1548892800000 2019-01-31 00:00:00 UTC
4 11/27/18 2018-11-27 00:00:00 UTC
Note: You can add as many different patterns to COALESCE as you expect in your data.
For example, you can add the following to support 2019-01-01:
SAFE.PARSE_TIMESTAMP('%Y-%m-%d', close_date)
And so on ...
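If the same cleanup ever needs to happen client-side in pandas instead of in BigQuery, the COALESCE idea translates to trying the epoch-milliseconds parse first and falling back to the date pattern; a rough sketch (the close_date name is taken from the sample above):

import pandas as pd

s = pd.Series(['1556064000000', '01/24/19', '1548892800000', '11/27/18'],
              name='close_date')

# Try the epoch-milliseconds interpretation first ...
as_epoch = pd.to_datetime(pd.to_numeric(s, errors='coerce'), unit='ms')
# ... then fall back to the MM/DD/YY pattern where that produced NaT
as_mdy = pd.to_datetime(s, format='%m/%d/%y', errors='coerce')
print(as_epoch.fillna(as_mdy))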

SQL: swap minute and month values

While inserting into and updating a table using Java, I accidentally mixed up the minute and month values. Now I have entries in my table like:
end_date
12.01.2016 00:05:00
27.01.2017 00:09:00
16.01.2010 00:07:00
I can truncate the time part using:
UPDATE myTable
SET end_date = trunc(end_date)
WHERE someCondition;
which gives me
12.01.2016 00:00:00
27.01.2017 00:00:00
16.01.2010 00:00:00
but before I do that I want to replace the month value with the minute value, so that I finally have:
12.05.2016 00:00:00
27.09.2017 00:00:00
16.07.2010 00:00:00
How can I do this?
If the value is a date -- and the dates are valid in both directions -- then probably the simplest way is to go back and forth to strings:
update myTable
set end_date = to_date(to_char(end_date, 'DD.MI.YYYY'), 'DD.MM.YYYY')
where . . .;
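The same round-trip can be sanity-checked in Python's datetime, purely to illustrate how formatting the minute into the month slot and re-parsing performs the swap:

from datetime import datetime

broken = datetime(2016, 1, 12, 0, 5)   # stored as 12.01.2016 00:05:00
# Format with the minute (%M) where the month normally goes, then re-parse
# that position as the month (%m)
swapped = datetime.strptime(broken.strftime('%d.%M.%Y'), '%d.%m.%Y')
print(swapped)   # 2016-05-12 00:00:00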

Selecting only time from date

If I do:
SELECT PRESERV_STARTED
FROM HARVESTED_L;
I will get values like:
23-12-1999 00:00:00
21-03-2000 22:01:37
...
And so on. (PRESERV_STARTED has type DATE)
What I want is to select only the rows where the time part is not 00:00:00, so that I can omit those.
There is a lot of info about solutions to this, saying I can do something like:
select cast(AttDate as time) [time]
from yourtable
And for older versions of sql server:
select convert(char(5), AttDate, 108) [time]
from yourtable
And yet other proposals are:
SELECT CONVERT(VARCHAR(8),GETDATE(),108)
I tried all of these, among a few others, but no luck.
So my question is, having a date like: 23-12-1999 00:00:00, how do I select the time part?
What seems most intuitive to me (mixing the proposals I found) is something like:
SELECT CONVERT(VARCHAR(8), GETDATE(PRESERV_STARTED), 108) AS timePortion
FROM HARVESTED_L;
I get an error from this code, saying "Missing expression". In fact, this is the error I get from most of the proposals I tried.
I am using Oracle SQL Developer version 4.1.1.19
In Oracle you can just format the date(time) however you like:
SELECT TO_CHAR(preserv_started, 'HH24:MI:SS')
FROM harvested_l
If I understand correctly, you want to select only the rows for which the time part of the date column is not 00:00:00. You don't have to extract the time part in order to do this. You can use the TRUNC function, which (by default) returns the date with the time part truncated. Here's an example:
SQL> select * from t;
ID D
---------- -------------------
1 2016-01-01 00:00:00
2 2016-01-01 00:01:00
3 2016-01-01 00:01:23
3 rows selected.
SQL> select * from t where d <> trunc(d);
ID D
---------- -------------------
2 2016-01-01 00:01:00
3 2016-01-01 00:01:23
2 rows selected.
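For anyone doing the same filter client-side in pandas, the d <> trunc(d) comparison maps to comparing the column against its normalized (midnight) version; a small sketch with an assumed column name:

import pandas as pd

df = pd.DataFrame({'PRESERV_STARTED': pd.to_datetime(
    ['1999-12-23 00:00:00', '2000-03-21 22:01:37'])})

# Keep rows whose time-of-day is not midnight, i.e. value != truncated value
print(df[df['PRESERV_STARTED'] != df['PRESERV_STARTED'].dt.normalize()])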