xlwings - early dates (<=1900's) - pandas

Note:
xw.__version__
Out[84]: '0.10.2'
pd.__version__
Out[85]: '0.16.2'
I have the following df:
>>> df.head()
data
1900-01-31 0.0315
1900-02-28 0.0314583333333
1900-03-31 0.0314166666667
1900-04-30 0.031375
1900-05-31 0.0313333333333
and when I run:
xw.sheets(str(sht)).range(k).value = d_of_dfs[k]
I see the following in excel:
data
1900-02-01 0.0315
1900-02-29 0.031458333
1900-03-31 0.031416667
1900-04-30 0.031375
1900-05-31 0.031333333
1900-06-30 0.031291667
1900-07-31 0.03125
1900-08-31 0.031208333
1900-09-30 0.031166667
1900-10-31 0.031125
Is xlwings hijacking the early dates and messing them up?
Also, xlwings cannot handle dates prior to 1900 at all.

This issue stems from the way Excel stores dates and a bug that harks back to Lotus 123.
From http://www.cpearson.com/excel/datetime.htm:
Dates
The integer portion of the number, ddddd, represents the number of
days since 1900-Jan-0. For example, the date 19-Jan-2000 is stored as
36,544, since 36,544 days have passed since 1900-Jan-0. The number 1
represents 1900-Jan-1. It should be noted that the number 0 does not
represent 1899-Dec-31. It does not. If you use the MONTH function
with the date 0, it will return January, not December. Moreover, the
YEAR function will return 1900, not 1899.
Actually, this number is one greater than the actual number of days.
This is because Excel behaves as if the date 1900-Feb-29 existed. It
did not. The year 1900 was not a leap year (the year 2000 is a leap
year). In Excel, the day after 1900-Feb-28 is 1900-Feb-29. In
reality, the day after 1900-Feb-28 was 1900-Mar-1 . This is not a
"bug". Indeed, it is by design. Excel works this way because it was
truly a bug in Lotus 123. When Excel was introduced, 123 had nearly
the entire market for spreadsheet software. Microsoft decided to
continue Lotus' bug, in order to be fully compatible. Users who switched
from 123 to Excel would not have to make any changes to their data.
As long as all your dates are later than 1900-Mar-1, this should be of no
concern.
This answer has more detail, though I've just been converting the date to a string, which Excel seems to parse correctly:
converted = str(datetime.date(1900, 1, 1))
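One plausible explanation for the one-day shift in the question, offered as an assumption rather than confirmed xlwings behaviour: if a date is turned into a serial number by counting days from 1899-12-30 (the epoch of OLE Automation dates), the result only agrees with what Excel displays from 1900-03-01 onward, because Excel also counts the fictitious 1900-02-29. The sketch below mimics Excel's display rule as described in the quote; `to_serial` and `excel_display` are hypothetical helper names.

```python
from datetime import date, timedelta

OLE_EPOCH = date(1899, 12, 30)  # assumed epoch used for the conversion


def to_serial(d):
    """Days since 1899-12-30; agrees with Excel only from 1900-03-01 onward."""
    return (d - OLE_EPOCH).days


def excel_display(serial):
    """Date that Excel's 1900 date system shows for a whole-number serial >= 1."""
    if serial < 1:
        raise ValueError("serials below 1 are not real dates in the 1900 system")
    if serial == 60:
        return "1900-02-29"  # the leap day Excel believes in but that never existed
    # Serials after 60 are one greater than the true day count.
    offset = serial if serial < 60 else serial - 1
    return (date(1899, 12, 31) + timedelta(days=offset)).isoformat()


for d in (date(1900, 1, 31), date(1900, 2, 28), date(1900, 3, 31)):
    print(d.isoformat(), "->", excel_display(to_serial(d)))
```

Under that assumption the first two rows come out one day late and the third is unchanged, which is the pattern shown in the question.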

Related

Pandas DateTimeSlicing for specific months per year

I have read a lot about pandas and datetime slicing, but I haven't found a solution for my problem yet. I hope you can give me some good advice!
I have a data frame with a DatetimeIndex and, for example, a single column of floats. The time series covers about 60 years.
For example:
idx = pd.Series(pd.date_range("2016-11-1", freq="M", periods=48))
dft = pd.DataFrame(np.random.randn(48,1),columns=["NW"], index=idx)
I want to aggregate the column "NW" as sum() per month. I have to solve two problems:
1. The year begins in November and ends in October.
2. I have two periods per 12 months to analyse:
   a) from November to the end of April in the following year, and
   b) from May to the end of October in the same year.
For example: "2019-11-1":"2020-4-30" and "2020-05-01":"2020-10-31"
I think I could write a function, but I wonder if there is an easier way to solve these problems with pandas methods.
Do you have any tips?
Best regards Tommi.
Here is some additional information:
The real data are daily observations. I want to show a scatter plot of a time series with only the sum() for every month from November to April along the timeline (60 years until now), and the same for the values from May to October.
This is my solution so far. Not the shortest way, I think, but it works fine.
from datetime import date

d_precipitation_winter = {}
# For each year except the current (last) one
for year in dft.index.year.unique()[:-1]:
    # Start and end dates that mark the winter months
    start_date = date(year, 11, 1)
    end_date = date(year + 1, 4, 30)
    dft_WH = dft.loc[start_date:end_date, :].sum()
    d_precipitation_winter[year] = dft_WH
df_precipitation_winter = pd.DataFrame(data=d_precipitation_winter)
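A loop-free alternative, sketched under assumptions: label every row with the November-based year it belongs to and with a season, then let `groupby` do the summing. The daily test data (all ones) and the names `season_year`, `season` and `sums` are made up for illustration, and the sketch sums per half-year as the loop above does, not per calendar month.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data: four November-to-October years, every value 1.0
idx = pd.date_range("2016-11-01", periods=1461, freq="D")
dft = pd.DataFrame({"NW": np.ones(len(idx))}, index=idx)

months = dft.index.month.to_numpy()
years = dft.index.year.to_numpy()

# January to October belong to the year that started the previous November
season_year = years - (months < 11).astype(int)
# November to April is "winter", May to October is "summer"
season = np.where((months >= 11) | (months <= 4), "winter", "summer")

# One row per November-based year, one column per season
sums = dft["NW"].groupby([season_year, season]).sum().unstack()
print(sums)
```

Because the test values are all 1.0, each cell of `sums` is simply the number of days in that period, which makes the grouping easy to check by hand.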

Querying data from quite old IBM iseries AS400 DB2 (date format issue)

I am new to querying databases and am having an issue with the date format in a very old IBM iSeries AS400 DB2 database (I don't know the exact version). The date is stored in three separate columns as whole numbers (a day column, a month column and a year column), and I need to connect to this database via ODBC in Excel and filter just a few rows by a desired date span (e.g. from 1st December 2019 until 31st December 2019). I don't want to use PowerQuery to do all the modifications, because the complete table has millions of rows. I want to specify the filter criteria within the SQL string so that PowerQuery doesn't have to load all the rows.
My approach was as follows:
I created 6 parameter cells in an Excel sheet where I defined Date From (e.g. cell 1 = '01', cell 2 = '12' and cell 3 = '2019') and Date To (same logic for parameter cells 4, 5 and 6). Then I referenced these parameter cells in the SQL string, where I defined:
(Day >= Parameter cell 1, Month >= Parameter cell 2, Year >= Parameter cell 3)
and
(Day <= Parameter cell 4 etc.)
This worked quite well for me, but only when I exported just a few hundred lines within the same year. Now I am facing an issue when I want to export data from 1st December 2019 to 31st January 2020. In this case my "logic" doesn't work, because Month From is '12' and Month To is '01'.
I tried another approach, using the concat SQL function to create a text column like '2019-12-01' and then converting this column to datetime format (with a cast to varchar8 first), but this doesn't work for me: every time I get an error which says: "Global variable DATETIME not found".
Before I post some of my code, could I ask for advice on whether you can think of a better solution or approach for my issue?
Many thanks and have a great day :-)
A simple solution would be
select * from table where year * 100 + month between 201912 and 202001
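The query above filters on whole months. The same idea extends to day precision by folding all three columns into one sortable integer (year * 10000 + month * 100 + day); that extension is an assumption, not part of the answer. A minimal Python sketch of the comparison, with made-up rows and a hypothetical `date_key` helper:

```python
def date_key(year, month, day):
    """Fold three integer columns into one sortable integer, e.g. 20191201."""
    return year * 10000 + month * 100 + day


# Hypothetical rows in (day, month, year) column order
rows = [(30, 11, 2019), (1, 12, 2019), (15, 1, 2020), (31, 1, 2020), (1, 2, 2020)]

low = date_key(2019, 12, 1)
high = date_key(2020, 1, 31)

# Keep only rows whose key falls inside the span, even across a year boundary
selected = [r for r in rows if low <= date_key(r[2], r[1], r[0]) <= high]
print(selected)
```

Because the key orders dates correctly as plain integers, a span that crosses a year boundary needs no special handling.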

SQL ORACLE Get week numbers from multiple datetime rows

I have 70.000 rows of data, including a date time column (YYYY-MM-DD HH24-MM-SS.).
I want to split this data into 3 separate columns: hour, day and week number.
The date time column name is 'REGISTRATIONDATE' from the table 'CONTRACTS'.
This is what I have so far for the day and hour columns:
SELECT substr(REGISTRATIONDATE, 0, 10) AS "Date",
substr(REGISTRATIONDATE, 11, 9) AS "Hour"
FROM CONTRACTS;
I have seen the options for getting a week number for specific dates, but this assignment concerns 70.000 dates, so that is not an option.
You (the OP) still have to explain what week number to assign to the first few days in a year, until the first Monday of the year. Do you assign a week number for the prior calendar year? In a Comment I asked about January 1, 2017, as an example; that was a Sunday. The week from January 2 to January 8 of 2017 is "week 1" according to your definition; what week number do you assign to Sunday, January 1, 2017?
The straightforward calculation below assigns to it week number 0. Other than that, the computation is trivial.
Notes: To find the Monday of the week for any given date dt, we can use trunc(dt, 'iw'). iw stands for ISO Week, standard week which starts on Monday and ends on Sunday.
Then: To find the first Monday of the year, we can start with the date January 7 and ask for the Monday of the week in which January 7 falls. (I won't explain that one - it's easy logic and it has nothing to do with programming.)
To input a fixed date, the best way is with the date literal syntax: date '2017-01-07' for January 7. Please check the Oracle documentation for "date literals" if you are not familiar with it.
So: to find the week number for any date dt, compute
1 + ( trunc(dt, 'iw') - trunc(date '2017-01-07', 'iw') ) / 7
This formula finds the Monday of the ISO Week of dt and subtracts the first Monday of the year - using Oracle date arithmetic, where the difference between two dates is the number of days between them. So to find the number of weeks we divide by 7; and to have the first Monday be assigned the number 1, instead of 0, we need to add 1 to the result of dividing by 7.
The other issue you will have to address is to convert your strings into dates. The best solution would be to fix the data model itself (change the data type of the column so that it is DATE instead of VARCHAR2); then all the bits of data you need could be extracted more easily, you would make sure you don't have dates like '2017-02-29 12:30:00' in your data (currently, if you do, you will have a very hard time making any date calculations work), queries will be a lot faster, etc. Anyway, that's an entirely different issue so I'll leave it out of this discussion.
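The week-number formula above can be checked outside the database. The sketch below mirrors it in Python, where subtracting `d.weekday()` days plays the role of `trunc(dt, 'iw')`. Using the year of the input date instead of the hard-coded 2017 is an assumption, and `monday_of_week` and `week_number` are hypothetical names.

```python
from datetime import date, timedelta


def monday_of_week(d):
    """Monday of the ISO week containing d (the role of trunc(dt, 'iw'))."""
    return d - timedelta(days=d.weekday())


def week_number(d):
    """Week 1 starts on the first Monday of d's year; earlier days get week 0."""
    first_monday = monday_of_week(date(d.year, 1, 7))
    return 1 + (monday_of_week(d) - first_monday).days // 7


for d in (date(2017, 1, 1), date(2017, 1, 2), date(2017, 1, 8), date(2017, 1, 9)):
    print(d.isoformat(), week_number(d))
```

Sunday, January 1, 2017 comes out as week 0, matching the behaviour described in the answer.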
Assuming your REGISTRATIONDATE is formatted as 'MM/DD/YYYY',
the simplest (and fastest) query is based on to_char(to_date(REGISTRATIONDATE,'MM/DD/YYYY'),'WW')
(otherwise convert your column to a proper date and then perform the conversion to week number):
SELECT substr(REGISTRATIONDATE, 0, 10) AS "Date",
substr(REGISTRATIONDATE, 11, 9) AS "Hour",
to_char(to_date(REGISTRATIONDATE,'MM/DD/YYYY'),'WW') as "Week"
FROM CONTRACTS;
This is messy, but it looks like it works:
to_char(
to_date(RegistrationDate,'YYYY-MM-DD HH24-MI-SS') +
to_number(to_char(trunc(to_date(RegistrationDate,'YYYY-MM-DD HH24-MI-SS'),'YEAR'),'D'))
- 2,
'WW')
On the outside you have the solution previously given by others, but using the correct date format. In the middle there is an adjustment of a certain number of days to account for where 1st Jan falls. The trunc part gets the first of Jan from the date, and the 'D' gets the weekday of 1st Jan. Since 1 represents Sunday, we have to use -2 to get what we need.
EDIT: I may delete this answer later, but it looks to me that the one from @mathguy is the best. See also the comments on that answer for how to extend it to a general solution.
But first you need to:
1. Decide what to do with dates in Jan before the first Monday, and
2. Resolve the underlying problems in the data which prevent it being converted to dates.
On point 1, if assigning week 0 is not acceptable (you want week 52/53) it gets a bit more complicated, but we'll still be able to help.
As I see it, on point 2, either there is something systematically wrong (perhaps they are timestamps and include fractions of a second) or there are isolated cases of invalid data.
Either the length, or the format, or the specific values don't compute. The error message you got suggests that at least some of the data is "too long", and the code in my comment should help you locate that.

How to get Month from ISO Date and Week

I'm looking for a way to convert an ISO date to a month, and I also need to convert an ISO week to a month.
I need to do this in Excel and Access.
Is there a simple way to accomplish this?
I found this Excel formula to convert an ISO date to a month (C2 = date) and it works perfectly:
=MONTH(DATE(YEAR(C2),MONTH(C2)+(WEEKDAY(C2,2)+(DAY(DATE(YEAR(C2),MONTH(C2)+1,0)))-(DAY(C2))<4),(((7-(WEEKDAY(C2,2)))+(DAY(C2)))>3)))
But when I modify it for an Access query it does not return the correct values:
Date_to_Month:MONTH(DATESERIAL(YEAR([WW_Index].[ISO_date]),MONTH([WW_Index].[ISO_date])+(WEEKDAY([WW_Index].[ISO_date],2)+(DAY(DATESERIAL(YEAR([WW_Index].[ISO_date]),MONTH([WW_Index].[ISO_date])+1,0)))-(DAY([WW_Index].[ISO_date]))<4),(((7-(WEEKDAY([WW_Index].[ISO_date],2)))+(DAY([WW_Index].[ISO_date])))>3)))
I also need to convert ISO week_year to Month. I found this formula but it does not work:
=MONTH(DATE(YEAR(C2),1,-2)-WEEKDAY(DATE(YEAR(C2),1,3))+D2*7)
Example: week 18, 2012 is the ISO week of Apr 30 through May 6, 2012. There are fewer than 4 April days in this week, thus week 18, 2012 is in May.
Any help would be greatly appreciated.
I don't know Access, but you can do the first part (date to month) much more easily in Excel with this formula:
=MONTH(C2-WEEKDAY(C2-1)+4)
That should be easier to convert for Access.
For the second part, your formula finds the Monday of the relevant week, hence you get the wrong month in some cases. The Thursday of the week (its midpoint) should always be within the correct month, so you can just add 3 to get that (-2 becomes 1), i.e.
=MONTH(DATE(YEAR(C2),1,1)-WEEKDAY(DATE(YEAR(C2),1,3))+D2*7)
I assume C2 is a date within the relevant year and D2 is the ISO week number,
but it is probably better to have C2 contain just the year, e.g. 2013, and then you can use
=MONTH(DATE(C2,1,1)-WEEKDAY(DATE(C2,1,3))+D2*7)
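Both formulas rest on the same rule: an ISO week belongs to the month that contains its Thursday. A minimal Python sketch of that rule, useful for cross-checking spreadsheet results; the function names are hypothetical, and `date.fromisocalendar` requires Python 3.8 or later.

```python
from datetime import date, timedelta


def month_of_date_by_iso_week(d):
    """Month that owns the ISO week containing d (the month of its Thursday)."""
    thursday = d - timedelta(days=d.weekday()) + timedelta(days=3)
    return thursday.month


def month_of_iso_week(iso_year, iso_week):
    """Month that owns a given ISO week; ISO weekday 4 is Thursday."""
    return date.fromisocalendar(iso_year, iso_week, 4).month


print(month_of_date_by_iso_week(date(2012, 4, 30)))  # Monday of week 18, 2012
print(month_of_iso_week(2012, 18))
```

For the example in the question, both calls place week 18 of 2012 in May, since its Thursday is May 3.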

Dateadd error when subtracting from 0:00

I am trying to convert a column of GMT times to the time zone specified by the user.
I get an error when VBA attempts to subtract 18000 seconds (GMT-5) from 01:00.
Selected_GMT = -18000
CellValue = "1/0/00 01:00"
New_Time = DateAdd("s", Selected_GMT,CellValue)
Is this error happening because VBA is unable to determine the hours before 00:00?
I have figured out the seconds for Selected_GMT, how can I use that to determine New_Time?
As ooo noted in a comment above, 1/0/00 is an invalid date code. However, even if that was a typo in your question, the fact that the date uses a two-digit year raises the question "WHICH year 00?" Apologies if you already know this, but below is a recap of how Excel dates work, extracted from something I've written elsewhere. The relevant part is "Day Zero And Before In Excel": if the "00" actually represents 1900 in the cell (as it will if you've just typed "01:00" as the cell entry), you're going to run into problems subtracting from that. In that case, consider explicitly entering the date and time (perhaps using the current date) and hiding the date component with formatting:
Excel uses a "date serial" system in which any date that you use in
calculations is represented as a positive integer value. That integer
value is calculated from an arbitrary starting date. Adding whole
numbers to a specific serial date moves you forward through the
calendar a day at a time, and subtracting whole numbers moves you
backwards... as long as you don't go past the starting date of the
serial number system and end up with a negative value. Times are
represented as fractions of a day; 0.25 for 6am, 0.5 for noon, 0.75
for 6pm and so on.
Excel Dates
In the case of Excel for Windows, the starting date is 1 January 1900. That is, if you enter the value 1 into a cell in Excel
and format it as a date, you'll see the value as 1 January 1900. 2
will be the 2nd of January 1900, 3 the 3rd of January, and so on. 367
represents 1 January 1901 because Excel treats 1900 as having been a
leap year with 366 days. In other words, every full day that passes
adds 1 to the serial date.
It's important to remember that the above relates to Excel only, and
not to Access, SQL Server or other database products (or Visual Basic,
for that matter). In Access, for example, the range of valid dates is
1 January 100 to 31 December 9999, the same range that can be stored
in a VB or VBA variable with a Date data type.
Excel And The Macintosh
Macintosh systems use a start date of 1 January 1904, neatly bypassing the 1900 leap year issue. However that
does mean that there's a 4 year discrepancy between the serial date
values in a workbook created in Excel for Windows, and one created in
Excel for the Mac. Fortunately under Tools -> Options-> Calculation
(on pre-2007 versions of Excel) you'll find a workbook option called
1904 Date System. If that's checked, Excel knows that the workbook
came from a Macintosh and will adjust its date calculations
accordingly.
Excel Times
As noted in the introduction, times are calculated as a
fraction of a day. For example 1.5 represents noon on 2 January 1900.
1.75 represents 6pm on 2 January 1900.
(Snipped a bit about the leap year bug in 1900)
From 1 March 1900 onward Excel's dates are correct, but if you format
the number 1 using the format dddd, mmmm dd, yyyy you'll get the
result Sunday, 1 January 1900. That is incorrect; 1 January 1900 was a
Monday, not a Sunday. This day of week error continues until you reach
1 March, which is the first truly correct date in the Excel calendar.
Day Zero And Before In Excel
If you use the value zero and display it
in date format you'll get the nonsense date Saturday 0 January 1900.
If you try to format a negative value as a date, you'll just get a
cell full of hash marks. Similarly if you try to obtain a date serial
number using Excel functions like DateValue, you can only do so for
dates on or after 1 January 1900. An attempt to specify an earlier
date will result in an error.
The 1904 (Macintosh) system starts from zero. (1 January 1904 has a
value of 0, not 1. Excel's on-line help describes the Mac system as
starting from January 2, but that's probably easier than explaining to
users why a serial date value of 0 works on the Mac but not Excel.)
Negative numbers won't generate an error, but the number will be
treated as absolute. That is, both 1 and -1 will be treated as 2
January 1904.
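To tie the recap back to the DateAdd question: a bare 01:00 is the serial fraction 1/24 on "day zero", so subtracting 18000 seconds pushes the serial below zero, where the 1900 system has no date to show. Anchoring the time to a real date avoids that. The Python sketch below only illustrates the arithmetic; the anchor date 2020-01-15 and the helper name `shift_serial` are made up.

```python
from datetime import datetime, timedelta

SECONDS_PER_DAY = 86400


def shift_serial(serial, offset_seconds):
    """Shift an Excel-style serial (days, with time as a fraction) by seconds."""
    return serial + offset_seconds / SECONDS_PER_DAY


# 01:00 with no date part: 1/24 of a day on "day zero"
bare_time = 1 / 24
shifted = shift_serial(bare_time, -18000)
print(shifted)  # negative, so it cannot be shown as a date in the 1900 system

# The same shift applied to a time anchored to an explicit (hypothetical) date
anchored = datetime(2020, 1, 15, 1, 0)
new_time = anchored + timedelta(seconds=-18000)
print(new_time.strftime("%H:%M"))  # only the time part needs to be displayed
```

With an explicit date the shift simply rolls back into the previous evening, and formatting can hide the date part.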