This is my code. I did not put hours and seconds in my data, but the result I got includes a time component. Do you know if it is possible to remove the time?
input
fitness['Date'] = pd.to_datetime(fitness['Date'])
result
1970-01-01 00:00:00.000000000 1979-09-09
1970-01-01 00:00:00.000000001 1979-09-09
1970-01-01 00:00:00.000000002 1979-09-09
1970-01-01 00:00:00.000000003 1979-09-09
1970-01-01 00:00:00.000000004 1979-09-09
As discussed here, the dt.strftime accessor can be used to format the output.
fitness['Date_formatted'] = pd.to_datetime(fitness['Date']).dt.strftime('%Y-%m-%d')
This will format the output of to_datetime() to follow %Y-%m-%d. Note, however, that the column's dtype will be converted to string (object), not datetime64. You can find documentation for the function here.
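As a minimal sketch (the fitness frame below is a stand-in for the question's data), this shows the strftime approach next to dt.date, which also drops the time but yields datetime.date objects rather than formatted strings:

```python
import pandas as pd

# stand-in for the question's DataFrame
fitness = pd.DataFrame({'Date': ['1979-09-09', '1979-09-10']})
fitness['Date'] = pd.to_datetime(fitness['Date'])

# strftime formats each value as a string (object dtype)
fitness['Date_formatted'] = fitness['Date'].dt.strftime('%Y-%m-%d')

# dt.date yields datetime.date objects instead (also object dtype)
fitness['Date_only'] = fitness['Date'].dt.date

print(fitness)
```

Either way the datetime64 dtype is lost; if you only need the time zeroed out while keeping the dtype, dt.normalize() is another option.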
Related
Could someone help me understand why these two queries return different results in BigQuery?
select FORMAT_TIMESTAMP('%F %H:%M:%E*S', "2018-10-01 00:00:00" , 'Europe/London')
returns 2018-10-01 01:00:00
select PARSE_TIMESTAMP('%F %H:%M:%E*S', "2018-10-01 00:00:00", "Europe/London")
returns 2018-09-30 23:00:00 UTC
As 2018-10-01 falls within British Summer Time (UTC+1), I would've expected both queries to return 2018-09-30 23:00:00 UTC.
The first is given a timestamp which is in UTC. It then converts it to the corresponding time in Europe/London. The return value is a string representing the time in the local timezone.
The second takes a string representation and returns a UTC timestamp. The representation is assumed to be in Europe/London.
So, the two functions are going in different directions, one from UTC to the local time and the other from the local time to UTC.
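The same two directions can be sketched in Python with the standard-library zoneinfo module (Python 3.9+), which makes the asymmetry easy to see:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# FORMAT_TIMESTAMP direction: take a UTC instant, render it in Europe/London
utc_instant = datetime(2018, 10, 1, 0, 0, tzinfo=timezone.utc)
in_london = utc_instant.astimezone(ZoneInfo('Europe/London'))
print(in_london.strftime('%Y-%m-%d %H:%M:%S'))  # 2018-10-01 01:00:00

# PARSE_TIMESTAMP direction: treat the wall-clock string as Europe/London time,
# then convert that instant to UTC
wall_clock = datetime(2018, 10, 1, 0, 0, tzinfo=ZoneInfo('Europe/London'))
as_utc = wall_clock.astimezone(timezone.utc)
print(as_utc.strftime('%Y-%m-%d %H:%M:%S'))  # 2018-09-30 23:00:00
```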
A column in dataframe keeps date like:
2019-06-19 23:04:36
2018-06-29 20:06:56
2019-03-04 11:12:35
2019-07-12 21:16:44
I tried the code below, but it gives incorrect results:
df['timestamps'] = pd.to_datetime(df['datetimes']).astype('int64') / 10**9
The results are like these:
1.465506e+09
1.465516e+09
1.465503e+09
If I convert them back to dates, I get incorrect datetimes:
df['new'] = df['timestamps'].apply(lambda x: pd.Timestamp(x))
1970-01-01 00:00:01.465506396
1970-01-01 00:00:01.465506397
1970-01-01 00:00:01.465506397
Something is not correct...
What is the way to convert date time that is as string like "2019-06-19 23:04:36" to timestamp?
Thank you.
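For what it's worth, a likely cause of the 1970 dates above is that pd.Timestamp(x) interprets a bare number as nanoseconds since the epoch; a sketch of the round trip with an explicit unit='s' (column names as in the question):

```python
import pandas as pd

df = pd.DataFrame({'datetimes': ['2019-06-19 23:04:36', '2018-06-29 20:06:56']})

# string -> epoch seconds (integer division keeps whole seconds)
df['timestamps'] = pd.to_datetime(df['datetimes']).astype('int64') // 10**9

# pd.Timestamp(x) alone treats x as nanoseconds; unit='s' restores the dates
df['new'] = df['timestamps'].apply(lambda x: pd.Timestamp(x, unit='s'))
print(df['new'])
```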
I have the following code that WORKS, but I am unable to recreate it because I do not understand WHY it works. If you plug it into w3schools, it compiles successfully.
I do not understand how "1501532100" is parsed into a working date function. Individually, I can see how dateadd() and format() work, but why does it behave this way, and how can I reverse engineer the rest of the integers into proper dates?
SELECT FORMAT((dateadd(s, 1501532100, '1969-12-31 20:00')), 'MM.dd.yyy');
RETURNS: 07.31.2017
dateadd accepts 3 arguments: interval, number and date. When interval is s, it means that number will be treated as seconds, so it will add that many seconds to the date specified and return the result, which will then be displayed in the MM.dd.yyy format.
You can think of the first argument of dateadd as a measurement unit of the second one.
From: https://www.w3schools.com/sql/func_sqlserver_dateadd.asp
DATEADD(interval, number, date)
interval here is s - seconds,
number is 1501532100
date being 1969-12-31 20:00
All that does is add 1501532100 seconds to 1969-12-31 20:00.
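The same arithmetic can be checked in Python; the query's base of 1969-12-31 20:00 is just the Unix epoch shifted back four hours (i.e. the epoch in a UTC-4 local time):

```python
from datetime import datetime, timedelta

# the query's base date: four hours before the Unix epoch (1970-01-01 00:00)
base = datetime(1969, 12, 31, 20, 0)

# dateadd(s, 1501532100, base) is just base + that many seconds
result = base + timedelta(seconds=1501532100)

print(result.strftime('%m.%d.%Y'))  # 07.31.2017, matching the query's output
```

To reverse engineer other integers, do the same addition with each integer as the seconds argument.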
I have a column eventtime that only stores the time of day as string. Eg:
0445AM - means 04:45 AM. I am using the below query to convert to UNIX timestamp.
select unix_timestamp(eventtime,'hhmmaa'),eventtime from data_raw limit 10;
This seems to work fine for test data. I always thought a Unix timestamp is a combination of date and time, while here I only have the time. My question is: what date does it assume while executing the above function? The timestamps seem quite small.
A Unix timestamp is the bigint number of seconds since the Unix epoch (1970-01-01 00:00:00 UTC); it tracks time as a running total of seconds.
select unix_timestamp('0445AM','hhmmaa') as unixtimestamp
Returns
17100
And this is exactly 4 hours 45 minutes converted to seconds.
select 4*60*60 + 45*60
returns 17100
And to convert it back use from_unixtime function
select from_unixtime(17100,'hhmmaa')
returns:
0445AM
If you convert using format including date, you will see it assumes the date is 1970-01-01
select from_unixtime(17100,'yyyy-MM-dd hhmmaa')
returns:
1970-01-01 0445AM
See the Hive functions docs here.
There is also a very useful site about Unix timestamps.
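The same arithmetic can be reproduced in Python; note that Python's strptime defaults the missing date fields to 1900-01-01 rather than 1970-01-01, so the seconds here are computed from the parsed time fields directly:

```python
from datetime import datetime

# parse the time-only string; %I%M%p matches '0445AM' (12-hour clock + AM/PM)
t = datetime.strptime('0445AM', '%I%M%p')

# seconds since midnight, matching Hive's unix_timestamp result
seconds = t.hour * 3600 + t.minute * 60
print(seconds)  # 17100
```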
I am handling JSON data containing a date as per this example 'MON 2014-01-03 13:00:00 +GMT0000'
I need to compare records on date.
Is it best to load as strings and manipulate as and when required?
A requirement will be to select the highest and lowest dates for a particular criteria, and calculate the difference in seconds.
Thanks for looking.
The best solution for your problem is to use a Unix timestamp (seconds since the standard epoch of 1970-01-01).
The following is an example query showing how to parse the timestamp strings to a Unix timestamp.
select unix_timestamp(REGEXP_REPLACE('MON 2014-01-03 13:00:00 +GMT0000','GMT',''),
"EEE yyyy-MM-dd HH:mm:ss Z") as unixtime from reqtable;
You will find more details here: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions
You should also take a look at Java's SimpleDateFormat to match the exact timestamp string pattern.
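A rough Python equivalent of the query above, for experimenting with the pattern locally (strptime's directives stand in for SimpleDateFormat's: %a for EEE, %z for Z, and so on):

```python
from datetime import datetime

raw = 'MON 2014-01-03 13:00:00 +GMT0000'

# mirror the REGEXP_REPLACE in the query: drop 'GMT' so the offset reads +0000
cleaned = raw.replace('GMT', '')

# EEE yyyy-MM-dd HH:mm:ss Z  ->  %a %Y-%m-%d %H:%M:%S %z
dt = datetime.strptime(cleaned, '%a %Y-%m-%d %H:%M:%S %z')
unixtime = int(dt.timestamp())
print(unixtime)
```

With the offset attached, dt is timezone-aware, so dt.timestamp() gives the exact Unix time; differences in seconds between two such records are then plain integer subtraction.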