How can I convert a Matillion Job variable of type DateTime to a Python datetime.datetime?

I have a Job variable latest_updated of type DateTime. I'm trying to use it from a Python Script component. I expected it to show up as a datetime.datetime object in Python, but it doesn't:
print(latest_updated) # Sat Jan 01 00:00:00 UTC 2000
print(latest_updated.getClass()) # <type 'com.matillion.bi.emerald.server.scripting.MatillionDate'>
From what I can see, it is a Python bridge to a Java object of type MatillionDate.
How can I convert this to a regular datetime.datetime object?

It depends on what python interpreter you use:
Jython:
If you use Jython, Job variables of type DateTime are, as mentioned, instances of MatillionDate and not datetime. You can convert one to datetime.datetime by converting it to a Java Instant, taking its ISO 8601 string representation, and parsing that:
from dateutil.parser import parse
from dateutil.tz import tzutc

# latest_updated is the Job variable (a MatillionDate) exposed to the script
as_iso_string = str(latest_updated.toInstant())  # 2000-01-01T00:00:00Z
# In Jython, parse() yields tzinfo=tzlocal() here; it should be tzutc()
asdt = parse(as_iso_string)  # datetime.datetime(2000, 1, 1, 0, 0, tzinfo=tzlocal())
# Option 1: drop the timezone to get a naive datetime
asdt = parse(as_iso_string).replace(tzinfo=None)  # datetime.datetime(2000, 1, 1, 0, 0)
# Option 2: force UTC
asdt = parse(as_iso_string).replace(tzinfo=tzutc())  # datetime.datetime(2000, 1, 1, 0, 0, tzinfo=tzutc())
Make sure you get the timezone right. If your Job variable is meant to be in UTC, you need .replace(tzinfo=tzutc()), because for some reason dateutil.parser.parse() in Jython does not interpret the trailing Z as UTC (it does in regular Python 2.7.12).
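Since dateutil's handling of the trailing Z is unreliable in Jython, as noted above, the string can also be parsed with the standard library alone. This is only a minimal sketch, not part of Matillion's API: the UTC class and parse_instant_string are hypothetical helpers, and the code assumes the input looks like the Instant string shown above (always UTC, ending in Z, optionally with fractional seconds). It avoids anything newer than Python 2.7 syntax and modules.

```python
from datetime import datetime, timedelta, tzinfo


class UTC(tzinfo):
    """Fixed UTC tzinfo (hypothetical helper); avoids datetime.timezone, which Python 2.7 lacks."""

    def utcoffset(self, dt):
        return timedelta(0)

    def tzname(self, dt):
        return "UTC"

    def dst(self, dt):
        return timedelta(0)


def parse_instant_string(text):
    """Parse a string such as '2000-01-01T00:00:00Z' into an aware UTC datetime."""
    if not text.endswith("Z"):
        raise ValueError("expected a UTC instant ending in 'Z': %r" % (text,))
    body = text[:-1]
    if "." in body:
        whole, fraction = body.split(".", 1)
        # strptime's %f accepts at most six digits, so drop any nanosecond digits.
        body = whole + "." + fraction[:6]
        fmt = "%Y-%m-%dT%H:%M:%S.%f"
    else:
        fmt = "%Y-%m-%dT%H:%M:%S"
    return datetime.strptime(body, fmt).replace(tzinfo=UTC())


as_dt = parse_instant_string("2000-01-01T00:00:00Z")
print(as_dt.isoformat())  # 2000-01-01T00:00:00+00:00
```

In the Jython case you would pass str(latest_updated.toInstant()) instead of the literal string.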
Python 2 / Python 3:
The variable already has the right type, datetime.datetime, and it is naive (it carries no timezone):
print(repr(latest_updated)) # datetime.datetime(2000, 1, 1, 0, 0)

Related

How do I convert a UNIX timestamp to ISO in CMake

I have a UNIX-style timestamp that looks like 1587405820 -0600 which I would like to convert to an ISO style format, something like YYYY-MM-DDTHH:MM:SSZ
CMake has a string(TIMESTAMP ...) command at https://cmake.org/cmake/help/v3.12/command/string.html#timestamp but this only gets me the current time in a formatted string which does not work for my application. I need to be able to convert an existing time into ISO format.
Is there a way to do this?
UPDATE
Based on @squareskittles' answer, here's the test I ended up with, which does the right thing:
# Check that we get the current timestamp
string(TIMESTAMP TIME_T UTC)
message(STATUS ">>> T1: ${TIME_T}")
# Get the ISO string from our specific timestamp
set(ENV{SOURCE_DATE_EPOCH} 1587405820)
string(TIMESTAMP TIME_T UTC)
unset(ENV{SOURCE_DATE_EPOCH})
message(STATUS ">>> T2: ${TIME_T}")
# Check that we get the current timestamp again correctly
string(TIMESTAMP TIME_T UTC)
message(STATUS ">>> T3: ${TIME_T}")
Which gives me this output:
-- >>> T1: 2020-04-22T15:08:13Z
-- >>> T2: 2020-04-20T18:03:40Z
-- >>> T3: 2020-04-22T15:08:13Z
If you want string(TIMESTAMP ...) to use a specific time rather than the current time, you can set the environment variable SOURCE_DATE_EPOCH to the UNIX-style timestamp (an integer):
# Set the environment variable to a specific timestamp.
set(ENV{SOURCE_DATE_EPOCH} 1587405820)
# Convert to ISO format, and print it.
string(TIMESTAMP MY_TIME)
message(STATUS ${MY_TIME})
prints (for UTC -0600):
2020-04-20T12:03:40
If you need to adjust this time to UTC time, you can add the UTC argument:
set(ENV{SOURCE_DATE_EPOCH} 1587405820)
string(TIMESTAMP MY_TIME UTC)
message(STATUS ${MY_TIME})
prints:
2020-04-20T18:03:40Z
Note: If this SOURCE_DATE_EPOCH variable is used elsewhere in your CMake code, it is best to save the SOURCE_DATE_EPOCH value before modifying it, so it can be set back to its previous value when complete.
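To cross-check the values CMake prints, the same conversion can be reproduced outside CMake. The following Python sketch is only an independent sanity check, not part of the CMake solution; epoch_to_iso is a hypothetical helper, and the -6 hour offset is taken from the "-0600" in the question.

```python
from datetime import datetime, timedelta, timezone


def epoch_to_iso(epoch_seconds, utc_offset_hours=0):
    """Format a UNIX timestamp as YYYY-MM-DDTHH:MM:SS in a fixed UTC offset."""
    zone = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(epoch_seconds, tz=zone).strftime("%Y-%m-%dT%H:%M:%S")


utc_text = epoch_to_iso(1587405820) + "Z"
local_text = epoch_to_iso(1587405820, utc_offset_hours=-6)
print(utc_text)    # 2020-04-20T18:03:40Z
print(local_text)  # 2020-04-20T12:03:40
```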

Writing data to Excel give me 'ZIP does not support timestamps before 1980'

I hope this isn't a duplicate. I looked around (Stack Overflow and other forums) and found some similar questions, but none of them solved my problem.
I have a Python script whose only job is to query the DB, create a pandas DataFrame, and write it to an Excel file.
The code worked without problems locally, but when I deployed it to my server it started giving this error:
File "Test.py", line 34, in <module>
test()
File "Test.py", line 31, in test
ex.generate_file()
File "/home/carlo/Test/Utility/ExportExcell.py", line 96, in generate_file
writer.save()
File "/usr/local/lib/python2.7/dist-packages/pandas/io/excel.py", line 1952, in save
return self.book.close()
File "/usr/local/lib/python2.7/dist-packages/xlsxwriter/workbook.py", line 306, in close
self._store_workbook()
File "/usr/local/lib/python2.7/dist-packages/xlsxwriter/workbook.py", line 677, in _store_workbook
xlsx_file.write(os_filename, xml_filename)
File "/usr/lib/python2.7/zipfile.py", line 1135, in write
zinfo = ZipInfo(arcname, date_time)
File "/usr/lib/python2.7/zipfile.py", line 305, in __init__
raise ValueError('ZIP does not support timestamps before 1980')
ValueError: ZIP does not support timestamps before 1980
To check that everything was OK, I printed my DataFrame. It looks fine to me, especially since running the same code locally generates an Excel file without problems:
Computer_System_Memory_Size Count_of_HostName Disk_Total_Size Number_of_CPU OS_Family
0 5736053088256 70 6072238035456 282660 Windows
1 96159653888 607 96630589440 2451066 vCenter
2 0 9 0 36342 Virtualization
3 2469361287143424 37 2389533519619072 149406 Unix
4 3691651514368 90 5817485303808 363420 Linux
I don't see any timestamp here and this is part of my code:
pivot = pd.DataFrame.from_dict(pivot) #pivot= information extracted from DB
pd.to_numeric(pivot['Count_of_HostName'], downcast='signed')#try to enforce to be a numeric value in case it get confused with a datetime
pd.to_numeric(pivot['Disk_Total_Size'], downcast='signed')#try to enforce to be a numeric value in case it get confused with a datetime
pd.to_numeric(pivot['Computer_System_Memory_Size'], downcast='signed')#try to enforce to be a numeric value in case it get confused with a datetime
pd.to_numeric(pivot['Number_of_CPU'], downcast='signed')#try to enforce to be a numeric value in case it get confused with a datetime
print pivot
name = 'TempReport/Report.xlsx'#set-up file name
writer = pd.ExcelWriter(name, engine='xlsxwriter')#create excel with file name
pivot.to_excel(writer, 'Pivot', index=False)#introduce my data to excel
writer.save()#write to file, it's where it fail
Does anyone know why this fails with 'ZIP does not support timestamps before 1980' on an Ubuntu 16.04 server?
I checked many things, library version, ensure that there are no data
XlsxWriter sets the individual XML files that make up an XLSX file to a creation date of 1/1/1980, which is (I think) the ZIP epoch and the date used by Excel. This allows binary reproducibility of files created by XlsxWriter, provided the same input data and metadata are used.
It sets the date as follows (for the non-in-memory zipfile.py case):
timestamp = time.mktime((1980, 1, 1, 0, 0, 0, 0, 0, 0))
os.utime(os_filename, (timestamp, timestamp))
The error that you are seeing occurs when this fails in some way and the date is set before 1/1/1980.
I've only seen this happen once before in a situation where the user was using a container and the container had a different time to the host system.
Do you have a situation like this or where the timestamp may be set incorrectly for some reason?
Update: Try running this in the same environment as the example that fails:
import os
import time
filename = 'file.txt'
file = open(filename, 'w')
file.close()
timestamp = time.mktime((1980, 1, 1, 0, 0, 0, 0, 0, 0))
os.utime(filename, (timestamp, timestamp))
print(time.ctime(os.path.getmtime(filename)))
# Should give:
# Tue Jan 1 00:00:00 1980
Update: This issue is fixed in XlsxWriter >= 1.1.9.
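The failure mode described above can be reproduced in isolation, without pandas or XlsxWriter. This is a minimal sketch under stated assumptions: zip_error_for_year is a hypothetical helper, the file name sheet.xml is arbitrary, and the filesystem is assumed to accept a modification time in 1979. It sets a file's mtime and then adds the file to an archive with the standard zipfile module.

```python
import calendar
import os
import shutil
import tempfile
import zipfile


def zip_error_for_year(year):
    """Return the ValueError message zipfile raises for a file dated mid-`year`, or None."""
    workdir = tempfile.mkdtemp()
    try:
        source = os.path.join(workdir, "sheet.xml")
        with open(source, "w") as handle:
            handle.write("<x/>")
        # 1 June keeps the local-time year stable whatever the machine's timezone is.
        stamp = calendar.timegm((year, 6, 1, 0, 0, 0))
        os.utime(source, (stamp, stamp))
        try:
            with zipfile.ZipFile(os.path.join(workdir, "out.zip"), "w") as archive:
                archive.write(source, "sheet.xml")
        except ValueError as error:
            return str(error)
        return None
    finally:
        shutil.rmtree(workdir)


print(zip_error_for_year(1979))  # ZIP does not support timestamps before 1980
print(zip_error_for_year(2000))  # None
```

If the first call reports the error and the second does not, zipfile itself is behaving normally, and the thing to investigate is why the 1980 timestamp set by os.utime ends up earlier on the failing machine.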
Try using this engine:
df.to_excel('file_name.xlsx', engine='openpyxl')  # df is your DataFrame; to_excel is a DataFrame method
This issue is fixed in XlsxWriter 1.2.1!

Date parsing in Pandas from a pdf

I am totally new to pandas and I'm stuck. I have a PDF (in German) with my work schedule. I would like to read it into pandas, format the dates, and save it as a CSV so I can import it into a calendar (Google Calendar or similar). I am using pd.to_datetime, and my problem is that I cannot parse the Start Date column into a standard date format.
This is the format that I have:
Start Date         Start Time  End Time  Location  Subject
Do., 10. Mai 2018  10:00       11:40     Spain     Klettern
Any suggestions would be very much appreciated.
Check out the dateparser module as it should do a good job with these:
In [1]:
import dateparser
s = "Do., 10. Mai 2018"
dateparser.parse(s).date()
Out[1]: datetime.date(2018, 5, 10)
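If installing dateparser is not an option, a hand-rolled parser can cover this specific format. This is a different technique from the dateparser call above, offered only as a sketch: parse_german_date, GERMAN_MONTHS and DATE_PATTERN are hypothetical names, and the code assumes dates shaped like "Do., 10. Mai 2018", with the month either written out in full or abbreviated to a prefix of at least three letters.

```python
import re
from datetime import date

GERMAN_MONTHS = [
    "Januar", "Februar", "März", "April", "Mai", "Juni",
    "Juli", "August", "September", "Oktober", "November", "Dezember",
]

# day, a dot, the month word (optionally followed by a dot), then a four-digit year
DATE_PATTERN = re.compile(r"(\d{1,2})\.\s*([^\W\d_]+)\.?\s+(\d{4})")


def parse_german_date(text):
    """Extract a date from text such as 'Do., 10. Mai 2018'; the weekday is ignored."""
    match = DATE_PATTERN.search(text)
    if match is None:
        raise ValueError("no German-style date found in %r" % (text,))
    day, month_token, year = match.groups()
    token = month_token.lower()
    # Accept a full month name or an unambiguous prefix of three or more letters.
    candidates = [
        number
        for number, name in enumerate(GERMAN_MONTHS, start=1)
        if len(token) >= 3 and name.lower().startswith(token)
    ]
    if len(candidates) != 1:
        raise ValueError("unrecognised month %r" % (month_token,))
    return date(int(year), candidates[0], int(day))


print(parse_german_date("Do., 10. Mai 2018").isoformat())  # 2018-05-10
```

Once the schedule is in a DataFrame, the function could be applied to the column with df["Start Date"].apply(parse_german_date), where df is assumed to be your DataFrame.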

Using Time::Piece with Apache::Log::Parser

I am using Apache::Log::Parser to parse Apache log files.
I extracted the date from log file using the following code.
my $parser = Apache::Log::Parser->new(fast=>1);
my $log = $parser->parse($data);
$t = $log->{date};
Now I am trying to use Time::Piece to work with the date, but I can't get it to work:
print "$t->day_of_month";
This does not work. How can I use Time::Piece to parse the date?
You cannot call methods on objects inside of string interpolation. It will probably output something like this:
Sat Feb 18 12:44:47 2017->day_of_month
Remove the double quotes "" to call the method.
print $t->day_of_month;
Now the output is:
18
Note that $t has to be a Time::Piece object for this to work. Create one with localtime or gmtime if your log contains an epoch value, or with strptime if the date is a formatted timestamp string.

dse pig datetime functions

Can someone give a full example of the datetime functions, including the 'register' jar step? I have been trying to get CurrentTime() and ToDate() running without much success. I have the piggybank jar in the classpath and have registered it, but Pig always says the function has to be defined before usage.
I read this question comparing datetime in pig before this.
Datetime functions are available in native Pig, so you do not need the piggybank jar.
Example:
In this example I read a set of dates from the input file, get the current datetime, and calculate the number of days between each input date and the current date.
input.txt
2014-10-12T10:20:47
2014-08-12T10:20:47
2014-07-12T10:20:47
PigScript:
A = LOAD 'input.txt' AS (mydate:chararray);
B = FOREACH A GENERATE ToDate(mydate) AS prevDate,CurrentTime() AS currentDate,DaysBetween(CurrentTime(),ToDate(mydate)) AS diffDays;
DUMP B;
Output:
(2014-10-12T10:20:47.000+05:30, 2014-12-12T10:39:15.455+05:30, 61)
(2014-08-12T10:20:47.000+05:30, 2014-12-12T10:39:15.455+05:30, 122)
(2014-07-12T10:20:47.000+05:30, 2014-12-12T10:39:15.455+05:30, 153)
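The diffDays column above can be cross-checked outside Pig. The Python sketch below is only an arithmetic check, not Pig code: whole_days_between is a hypothetical helper, and it assumes that DaysBetween counts whole elapsed days and that all values share the same +05:30 offset, so naive datetimes are enough.

```python
from datetime import datetime

# The "current" timestamp printed in the output above (offset omitted; all values share it).
current = datetime(2014, 12, 12, 10, 39, 15, 455000)
inputs = ["2014-10-12T10:20:47", "2014-08-12T10:20:47", "2014-07-12T10:20:47"]


def whole_days_between(later, earlier):
    """Number of complete 24-hour days from `earlier` to `later`."""
    return (later - earlier).days


diff_days = [
    whole_days_between(current, datetime.strptime(text, "%Y-%m-%dT%H:%M:%S"))
    for text in inputs
]
print(diff_days)  # [61, 122, 153]
```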
You can find a few more examples in my older posts:
Human readable String date converted to date using Pig?
Storing Date and Time In PIG
how to convert UTC time to IST using pig