Date parsing in Pandas from a pdf - pandas

I am totally new to pandas and am stuck. I have a PDF (in German) with my work schedule. I would like to read it into pandas, format the dates, and save the result as a CSV that I can import into a calendar (Google Calendar or similar). I am using pd.to_datetime, and my problem is that I cannot parse the Start Date column into a standard date format.
This is the format that I have:
Start Date Start Time End Time Location Subject
Do., 10. Mai 2018 10:00 11:40 Spain Klettern
Any suggestions would be very much appreciated.

Check out the dateparser module as it should do a good job with these:
In [1]:
import dateparser
s = "Do., 10. Mai 2018"
dateparser.parse(s).date()
Out[1]: datetime.date(2018, 5, 10)
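If installing dateparser is not an option, a minimal standard-library sketch can handle the same input. The function name parse_german_date and the month table below are hypothetical helpers written for illustration, and the sketch assumes the dates always follow the "Do., 10. Mai 2018" pattern from the question:

```python
import re
from datetime import date

# First three letters of German month names mapped to month numbers.
# "mar" is included for text where the umlaut in "März" was lost.
GERMAN_MONTHS = {
    "jan": 1, "feb": 2, "mär": 3, "mar": 3, "apr": 4, "mai": 5, "jun": 6,
    "jul": 7, "aug": 8, "sep": 9, "okt": 10, "nov": 11, "dez": 12,
}

def parse_german_date(text):
    """Parse strings such as 'Do., 10. Mai 2018' into a datetime.date."""
    # Day number followed by a dot, a month word, and a four-digit year.
    match = re.search(r"(\d{1,2})\.\s*([A-Za-zäÄ]+)\.?\s*(\d{4})", text)
    if match is None:
        raise ValueError("unrecognised date: %r" % text)
    day, month_name, year = match.groups()
    month = GERMAN_MONTHS[month_name.lower()[:3]]
    return date(int(year), month, int(day))

print(parse_german_date("Do., 10. Mai 2018"))  # 2018-05-10
```

On a DataFrame, a function like this could presumably be applied to the column with Series.map, e.g. df["Start Date"].map(parse_german_date), before writing the CSV.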

Related

Read multiple csv's and concat to multiple dataframes based on filenames python

I have a list of CSV files with the same columns. Here is what the list looks like:
C:/Users/foo/bar/January01.csv
C:/Users/foo/bar/February01.csv
C:/Users/foo/bar/March01.csv
C:/Users/foo/bar/January02.csv
C:/Users/foo/bar/March02.csv
What I want is this: the data from all CSV files whose names start with January should go into a January dataframe, and likewise for every other month.
Can anyone help me on this?
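You can first iterate through your directory to collect the DataFrames for each month, then concatenate them per month and finally save them:
import os
import pandas as pd

dir_name = 'C:/Users/foo/bar'  # your directory
frames = {}  # month name -> list of DataFrames read for that month
for file in os.listdir(dir_name):
    if not file.endswith('.csv'):
        continue
    month = file[:-6]  # drop the 2-digit suffix and '.csv': 'January01.csv' -> 'January'
    frames.setdefault(month, []).append(pd.read_csv(os.path.join(dir_name, file)))
# pd.concat replaces DataFrame.append, which was removed in pandas 2.0
month_df = {month: pd.concat(dfs, ignore_index=True) for month, dfs in frames.items()}
for month in month_df:
    month_df[month].to_csv(month + '.csv', index=False)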
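The grouping step can also be checked on its own, without reading any files. The sketch below is only an illustration: group_by_month is a hypothetical helper, and it assumes every file name is a month name followed by digits and ".csv", as in the question's list:

```python
import re
from collections import defaultdict

def group_by_month(filenames):
    """Group names like 'January01.csv' under their month prefix."""
    groups = defaultdict(list)
    for name in filenames:
        # Letters (the month), then one or more digits, then '.csv'.
        match = re.match(r"([A-Za-z]+)\d+\.csv$", name)
        if match:
            groups[match.group(1)].append(name)
    return dict(groups)

names = ["January01.csv", "February01.csv", "March01.csv",
         "January02.csv", "March02.csv"]
for month, files in group_by_month(names).items():
    print(month, files)
```

Matching on a pattern rather than slicing a fixed number of characters keeps working if the numeric suffix ever has a different length.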

How can I convert a Matillion Job variable of type DateTime to a python datetime.datetime?

I have a Job variable latest_updated of type DateTime. I'm trying to use it from a Python Script component. I expected it to show up as a datetime.datetime object in Python, but it doesn't:
print(latest_updated) # Sat Jan 01 00:00:00 UTC 2000
print(latest_updated.getClass()) # <type 'com.matillion.bi.emerald.server.scripting.MatillionDate'>
From what I can see, it seems to be a Python bridge to a Java object of type MatillionDate.
How can I convert this to a regular datetime.datetime object?
It depends on what python interpreter you use:
Jython:
If you use Jython, Job variables of type DateTime are, as mentioned, instances of MatillionDate and not datetime. You can convert one to datetime.datetime by converting it to a Java Instant, getting its ISO 8601 representation, and parsing that:
from dateutil.parser import parse
from dateutil.tz import tzutc
as_iso_string = str(latest_updated.toInstant()) # 2000-01-01T00:00:00Z
asdt = parse(as_iso_string) # datetime.datetime(2000, 1, 1, 0, 0, tzinfo=tzlocal()); tzinfo should be tzutc(), not tzlocal()
asdt = parse(as_iso_string).replace(tzinfo=None) # naive datetime
asdt = parse(as_iso_string).replace(tzinfo=tzutc()) # datetime.datetime(2000, 1, 1, 0, 0, tzinfo=tzutc())
Make sure that you get the timezone right: if your Job variable was meant to be in UTC, you will need .replace(tzinfo=tzutc()), because for some reason dateutil.parser.parse() in Jython does not parse the Z as timezone UTC (it does in regular Python 2.7.12).
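If dateutil is unavailable, the ISO 8601 string can also be parsed with the standard library and tagged as UTC explicitly. This is a sketch under assumptions: the UTC class and parse_instant are hypothetical helpers, the code was not tried inside Matillion's Jython, and it assumes the string has at most six fractional digits (a Java Instant can print up to nine):

```python
from datetime import datetime, timedelta, tzinfo

class UTC(tzinfo):
    """Fixed UTC timezone that needs nothing beyond the datetime module."""
    def utcoffset(self, dt):
        return timedelta(0)

    def tzname(self, dt):
        return "UTC"

    def dst(self, dt):
        return timedelta(0)

def parse_instant(iso_string):
    """Parse an instant such as '2000-01-01T00:00:00Z' into an aware UTC datetime."""
    # Pick the format depending on whether fractional seconds are present.
    fmt = "%Y-%m-%dT%H:%M:%S.%fZ" if "." in iso_string else "%Y-%m-%dT%H:%M:%SZ"
    return datetime.strptime(iso_string, fmt).replace(tzinfo=UTC())

print(repr(parse_instant("2000-01-01T00:00:00Z")))
```

Attaching the timezone by hand sidesteps the question of how the parser interprets the trailing Z.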
Python 2 / Python 3:
The variable already has the right type, datetime.datetime, with no timezone:
print(repr(latest_updated)) # datetime.datetime(2000, 1, 1, 0, 0)

Writing data to Excel give me 'ZIP does not support timestamps before 1980'

I hope I am not creating a duplicate; I looked around (Stack Overflow and other forums) and found some similar questions, but none of them solved my problem.
I have Python code whose only job is to query the DB, create a pandas DataFrame and write it to an Excel file.
The code worked without problems locally, but when I deployed it to my server it started giving this error:
File "Test.py", line 34, in <module>
test()
File "Test.py", line 31, in test
ex.generate_file()
File "/home/carlo/Test/Utility/ExportExcell.py", line 96, in generate_file
writer.save()
File "/usr/local/lib/python2.7/dist-packages/pandas/io/excel.py", line 1952, in save
return self.book.close()
File "/usr/local/lib/python2.7/dist-packages/xlsxwriter/workbook.py", line 306, in close
self._store_workbook()
File "/usr/local/lib/python2.7/dist-packages/xlsxwriter/workbook.py", line 677, in _store_workbook
xlsx_file.write(os_filename, xml_filename)
File "/usr/lib/python2.7/zipfile.py", line 1135, in write
zinfo = ZipInfo(arcname, date_time)
File "/usr/lib/python2.7/zipfile.py", line 305, in __init__
raise ValueError('ZIP does not support timestamps before 1980')
ValueError: ZIP does not support timestamps before 1980
To make sure everything is OK, I printed my DataFrame. It looks good to me, especially since running the code locally generates an Excel file without problems:
Computer_System_Memory_Size Count_of_HostName Disk_Total_Size Number_of_CPU OS_Family
0 5736053088256 70 6072238035456 282660 Windows
1 96159653888 607 96630589440 2451066 vCenter
2 0 9 0 36342 Virtualization
3 2469361287143424 37 2389533519619072 149406 Unix
4 3691651514368 90 5817485303808 363420 Linux
I don't see any timestamp here and this is part of my code:
pivot = pd.DataFrame.from_dict(pivot)  # pivot = information extracted from the DB
# Try to force numeric values in case they get confused with datetimes
pd.to_numeric(pivot['Count_of_HostName'], downcast='signed')
pd.to_numeric(pivot['Disk_Total_Size'], downcast='signed')
pd.to_numeric(pivot['Computer_System_Memory_Size'], downcast='signed')
pd.to_numeric(pivot['Number_of_CPU'], downcast='signed')
print pivot
name = 'TempReport/Report.xlsx'  # set up the file name
writer = pd.ExcelWriter(name, engine='xlsxwriter')  # create the Excel writer for that file
pivot.to_excel(writer, 'Pivot', index=False)  # add my data to the workbook
writer.save()  # write to file; this is where it fails
Does anyone know why it fails on an Ubuntu 16.04 server with the 'ZIP does not support timestamps before 1980' error?
I checked many things, library version, ensure that there are no data
XlsxWriter sets the individual XML files that make up an XLSX file to a creation date of 1/1/1980, which is (I think) the ZIP epoch and the date used by Excel. This allows binary reproducibility of files created by XlsxWriter as long as the same input data and metadata are used.
It sets the date as follows (for the non-in-memory zipfile.py case):
timestamp = time.mktime((1980, 1, 1, 0, 0, 0, 0, 0, 0))
os.utime(os_filename, (timestamp, timestamp))
The error that you are seeing occurs when this fails in some way and the date is set before 1/1/1980.
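The check that raises this error lives in the standard zipfile module and can be reproduced in isolation. The sketch below is only an illustration: zip_timestamp_ok is a hypothetical helper and "sheet1.xml" is an arbitrary member name, not something taken from XlsxWriter:

```python
import zipfile

def zip_timestamp_ok(date_time):
    """Return True if zipfile accepts this (year, month, day, hour, minute, second) tuple."""
    try:
        # ZipInfo rejects any date_time whose year is before 1980.
        zipfile.ZipInfo("sheet1.xml", date_time=date_time)
        return True
    except ValueError:
        return False

print(zip_timestamp_ok((1980, 1, 1, 0, 0, 0)))
print(zip_timestamp_ok((1979, 12, 31, 23, 59, 59)))
```

A file whose modification time lands even one second before 1980 in the local timezone would therefore be enough to trigger the ValueError from the traceback.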
I've only seen this happen once before in a situation where the user was using a container and the container had a different time to the host system.
Do you have a situation like this or where the timestamp may be set incorrectly for some reason?
Update: Try running this in the same environment as the example that fails:
import os
import time
filename = 'file.txt'
file = open(filename, 'w')
file.close()
timestamp = time.mktime((1980, 1, 1, 0, 0, 0, 0, 0, 0))
os.utime(filename, (timestamp, timestamp))
print(time.ctime(os.path.getmtime(filename)))
# Should give:
# Tue Jan 1 00:00:00 1980
Update: This issue is fixed in XlsxWriter >= 1.1.9.
Try using this engine:
pivot.to_excel('file_name.xlsx', engine='openpyxl')
This issue is fixed in XlsxWriter 1.2.1!

How to Reset Date/Time M500 Sport DV Camera?

I recently bought an M500 Sport DV camera and I am unable to reset or change the date and time. According to the manual, the camera creates a SportDV.txt file on the SD card, and the date and time can be changed by editing that file.
But my camera does not create any SportDV.txt file. It only creates two folders: Data (which contains an empty base.dat file) and DCIM (which contains videos and images).
I tried creating the file manually, but that does not change the date/time. I also tried other file names such as times.txt, time.txt, timeset.txt, tag.txt and settime.txt, but nothing works.
The camera always shows the year 2158 instead of 2015.
Sample Date: 2158/8/14 22:10:22
I tried everything and failed, but eventually I found the solution.
Open Notepad and copy and paste the following:
SPORTS DV
UPDATE:N
FORMAT
EV:6
CTST:100
SAT:100
AWB:0
SHARPNESS:100
AudioVol:1
QUALITY:0
LIGHTFREQ:0
AE:0
RTCDisplay:1
year:2014
month:7
date:7
hour:16
minute:11
second:0
-------------------------------
Exposure(EV)
0 ~ 12, def:6
Contrast(CTST)
1 ~ 200, def:100
Saturation(SAT)
1 ~ 200, def:100
White Balance(AWB)
0 ~ 3, def:0, 0(auto), 1(Daylight), 2(Cloudy), 3(Fluorescent)
Sharpness
1 ~ 200, def:100
AudioVol
0 ~ 2, def:1, 0:Max 1:Mid 2:Min
QUALITY
0 ~ 2, def:0, 0:High 1:Middle 2:Low
LIGHTFREQ
0 ~ 1, def:0, 0:60Hz 1:50Hz
AUTO EXPOSURE(AE)
0 ~ 2, def:0, 0:Average 1:Center 2:Spot
RTCDisplay
0 ~ 1, def:1, 0:Off 1:On
year
2012 - 2038, def:2013
month
01 - 12, def:1
date
01 - 31, def:1
hour
00 - 23, def:0
minute
01 - 59, def:0
second
01 - 59, def:0
Set UPDATE:N to UPDATE:Y,
change year, month and date,
and save the file under the name SportDV with UTF-8 encoding.
For versions that have a time.bat file, putting an N at the end of the timestamp in the time.txt file removes the timestamp from the video, i.e. time.txt contains:
2015.11.13 20:13:31 N
I have the more recent version of the M500 mini camera, which doesn't use the SportDV.txt file.
It looks physically the same as the earlier one (same LEDs, same decals), but after being reset it has a time.bat file in the root of the card instead. Executing this on a Windows machine produced a file called time.txt, except that the format written by this batch file doesn't work.
I edited the time.txt file and restarted the camera, and it worked after following Andy's format from his posting on the dx.com site.
Choose Edit, then replace the contents (probably in a nonsense format) with a timestamp such as 2015.11.13 20:13:31, i.e. YYYY.MM.DD HH:MM:SS, and click Save. Turn off and eject the camera. Power it up while not connected to the PC and make a short capture. When you check the recording, the date/time should hopefully be right.
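Instead of typing the timestamp by hand, the line can be generated from the computer's clock. This is a rough sketch under assumptions: camera_timestamp and write_time_file are hypothetical helper names, the path to the SD card has to be supplied by you, and whether the camera needs a trailing newline is unknown:

```python
from datetime import datetime

def camera_timestamp(moment, hide_overlay=False):
    """Format a datetime as 'YYYY.MM.DD HH:MM:SS', optionally followed by ' N'."""
    stamp = moment.strftime("%Y.%m.%d %H:%M:%S")
    # The trailing N is the marker described above for hiding the on-video timestamp.
    return stamp + " N" if hide_overlay else stamp

def write_time_file(path, moment=None, hide_overlay=False):
    """Write the timestamp line to the given file, e.g. time.txt on the card."""
    with open(path, "w") as handle:
        handle.write(camera_timestamp(moment or datetime.now(), hide_overlay))

print(camera_timestamp(datetime(2015, 11, 13, 20, 13, 31)))        # 2015.11.13 20:13:31
print(camera_timestamp(datetime(2015, 11, 13, 20, 13, 31), True))  # 2015.11.13 20:13:31 N
```

Calling write_time_file with the path to time.txt on the mounted card would then replace its contents with the current time.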
As far as I know, there is no updated firmware for this version of the camera to change the 3-minute file length or to hide the time/date text :-(

dse pig datetime functions

Can someone give a full example of the datetime functions, including the REGISTER statement for the jar? I have been trying to get CurrentTime() and ToDate() running without much success. I have the piggybank jar on the classpath and have registered it, but Pig always says the function has to be defined before usage.
Before asking, I read the question "comparing datetime in pig".
Datetime functions are available natively in Pig; you do not need the piggybank jar.
Example:
In this example I read a set of dates from the input file, get the current datetime, and calculate the number of days between each date and the current date.
input.txt
2014-10-12T10:20:47
2014-08-12T10:20:47
2014-07-12T10:20:47
PigScript:
A = LOAD 'input.txt' AS (mydate:chararray);
B = FOREACH A GENERATE ToDate(mydate) AS prevDate,CurrentTime() AS currentDate,DaysBetween(CurrentTime(),ToDate(mydate)) AS diffDays;
DUMP B;
Output:
(2014-10-12T10:20:47.000+05:30, 2014-12-12T10:39:15.455+05:30, 61)
(2014-08-12T10:20:47.000+05:30, 2014-12-12T10:39:15.455+05:30, 122)
(2014-07-12T10:20:47.000+05:30, 2014-12-12T10:39:15.455+05:30, 153)
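The day counts in this output can be cross-checked outside Pig. The sketch below is a plain Python illustration, not part of the Pig script: days_between is a hypothetical helper that counts whole elapsed days, and the current time is hard-coded to the value shown in the output (milliseconds and the shared +05:30 offset are dropped, since they do not affect the result here):

```python
from datetime import datetime

def days_between(later, earlier):
    """Whole days elapsed from `earlier` to `later`."""
    return (later - earlier).days

# Current time copied from the Pig output above.
current = datetime(2014, 12, 12, 10, 39, 15)

for text in ["2014-10-12T10:20:47", "2014-08-12T10:20:47", "2014-07-12T10:20:47"]:
    previous = datetime.strptime(text, "%Y-%m-%dT%H:%M:%S")
    print(text, days_between(current, previous))
```

The three printed differences are 61, 122 and 153 days, matching the last field of each tuple.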
You can refer to a few examples from my old posts:
Human readable String date converted to date using Pig?
Storing Date and Time In PIG
how to convert UTC time to IST using pig