Errors while Copying Workbook using JExcel API - spreadsheet

I am trying to copy a large Excel file with multiple worksheets and a lot of formulas using the JExcel API; however, when I run the following code:
Workbook w = Workbook.getWorkbook(inputWorkbook);
WritableWorkbook wcopy = Workbook.createWorkbook(new File("C:/filename.xls"), w);
I receive the errors below...
jxl.common.AssertionFailed
    at jxl.common.Assert.verify(Assert.java:37)
    at jxl.write.biff.SheetCopier.copySheet(SheetCopier.java:329)
    at jxl.write.biff.WritableSheetImpl.copy(WritableSheetImpl.java:1584)
    at jxl.write.biff.WritableWorkbookImpl.copyWorkbook(WritableWorkbookImpl.java:971)
    at jxl.write.biff.WritableWorkbookImpl.<init>(WritableWorkbookImpl.java:343)
    at jxl.Workbook.createWorkbook(Workbook.java:339)
    at jxl.Workbook.createWorkbook(Workbook.java:320)
    at shortcut.ModifyCell.getValue(ModifyCell.java:66)
    at shortcut.ModifyCell.main(ModifyCell.java:88)
Exception in thread "main" jxl.common.AssertionFailed
    at jxl.common.Assert.verify(Assert.java:37)
    at jxl.write.biff.SheetCopier.copySheet(SheetCopier.java:329)
    at jxl.write.biff.WritableSheetImpl.copy(WritableSheetImpl.java:1584)
    at jxl.write.biff.WritableWorkbookImpl.copyWorkbook(WritableWorkbookImpl.java:971)
    at jxl.write.biff.WritableWorkbookImpl.<init>(WritableWorkbookImpl.java:343)
    at jxl.Workbook.createWorkbook(Workbook.java:339)
    at jxl.Workbook.createWorkbook(Workbook.java:320)
    at shortcut.ModifyCell.getValue(ModifyCell.java:66)
    at shortcut.ModifyCell.main(ModifyCell.java:88)
and a blank Excel file is created with the given name. I was wondering if there are any known limitations to copying spreadsheets with the JExcel API - for example, formats or formulas that cannot be read, size limits, macro limitations, etc.?
(When I ran the code on their sample spreadsheet, it copied perfectly fine, so I don't think it's a code issue, although I could be wrong.)
Edit: Here are the warnings I received (multiple of each):
Reading...
Warning: Property storage name for 5 is empty - setting to Root Entry
Warning: Usage of a local non-builtin name
Copying...
Warning: Shared template formula is null - trying most recent formula template
Warning: Cell F155 already contains data
Warning: Unknown shape type
Warning: Unknown shape type
Warning: Cell at H45 not present - adding a blank
Warning: Cell at J45 not present - adding a blank
Warning: Cell H45 already contains data
Warning: Cell J45 already contains data

Whenever JXL reads formulas, it emits a logger warning. If you change all of the formulas to plain numbers, the warnings will go away.

Related

Does Pandas DataFrame constructor (and helper construction functions) release the GIL while copying the source array?

Preamble
On another question I learned that pandas construction routines avoid copying a provided NumPy array only when the array has a single dtype for all entries. When the constructor is fed a structured NumPy array with different types per column, it makes a copy.
Implementation reference
df_dict = {}
for i in range(5):
    obj = Object(1000000)
    arr = obj.getNpArr()
    print(arr[:10])
    df_dict[i] = pandas.DataFrame.from_records(arr)

print("The DataFrames are:")
for i in range(5):
    print(df_dict[i].head(10))
Here Object(N) constructs an instance of Object which internally allocates and initializes a 2D array of shape (N,3), with dtypes 'f8','i4','i4' per row. Object manages the lifetime of this data, deallocating it on destruction. The function Object.getNpArr() returns an np.recarray pointing at the internal data, with the dtype mentioned above. Importantly, the returned array does not own the data; it is just a view.
Problem
The DataFrames printed at the end show corrupted data (with respect to the arrays printed inside the first loop). I am not expecting this behaviour, since the array fed to the pandas construction function is copied (I checked this separately).
I don't have many ideas about the cause or how to avoid the corruption. The only guess I can make is:
- the constructor starts allocating the memory for its own data, which takes long because of the large size, and then copies;
- before/during the allocation/copy, the GIL is released and taken back by the for loop;
- the for loop proceeds before the copy of the array is complete, going to the next iteration;
- at the next iteration the obj name is rebound to a new Object and the old one's memory is deallocated, which corrupts the copy into the DataFrame from the previous iteration, which is probably still running.
If this is really the cause of the issue, how can I find a workaround? Is there a way to let the GIL go through only when the copy of the array is effectively done?
Or, if my guess is wrong, what is the cause of the data corruption?
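Whatever the actual cause, one defensive workaround is to force an owning copy on the Python side before handing the view to pandas. The sketch below simulates Object with a plain structured buffer (a hypothetical stand-in, not the original class); `np.array(..., copy=True)` decouples the DataFrame's source from the buffer's lifetime:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Object.getNpArr(): a non-owning view over a
# structured buffer with dtypes 'f8', 'i4', 'i4'.
buf = np.zeros(5, dtype=[("a", "f8"), ("b", "i4"), ("c", "i4")])
view = buf.view(np.recarray)

# np.array(..., copy=True) produces an array that owns its memory, so the
# DataFrame can no longer be affected when the original buffer is freed
# or overwritten.
df = pd.DataFrame.from_records(np.array(view, copy=True))

buf["b"] = 99  # simulate the owner clobbering / freeing its memory
```

After the mutation of `buf`, `df` still holds the original zeros, because the copy owns its data.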

How does DataFrame.interpolate() work in its source code?

Since I could not find the declarations of the individual methods behind DataFrame.interpolate()'s "method" parameter, I am asking here:
How does pandas' DataFrame.interpolate() work in terms of the number of rows it considers - is it just the row before the NaNs and the row right after?
Or is it the whole DataFrame (and how does that work at 1 million rows)?
If you already know where to look, feel free to share the link to the source code, since https://github.com/pandas-dev/pandas/blob/06d230151e6f18fdb8139d09abf539867a8cd481/pandas/core/frame.py#L10916 doesn't include the declarations for "method" (for example "polynomial").
I found the attached in core/missing.py.
My interpretation is that interpolation is done either with np.interp or, if the method is only available in SciPy, with _interpolate_scipy_wrapper - a function I could not locate, but a reasonable guess is that it wraps SciPy's interpolators.
if method in NP_METHODS:
    # np.interp requires sorted X values, #21037
    indexer = np.argsort(indices[valid])
    yvalues[invalid] = np.interp(
        indices[invalid], indices[valid][indexer], yvalues[valid][indexer]
    )
else:
    yvalues[invalid] = _interpolate_scipy_wrapper(
        indices[valid],
        yvalues[valid],
        indices[invalid],
        method=method,
        fill_value=fill_value,
        bounds_error=bounds_error,
        order=order,
        **kwargs,
    )
yvalues[preserve_nans] = np.nan
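The neighbor-based behaviour of the default "linear" method is easy to observe directly (a minimal sketch, not from the original post): each NaN gap is filled by linear interpolation between the nearest valid points on either side of it.

```python
import numpy as np
import pandas as pd

# A tiny series with interior NaN gaps of different lengths.
s = pd.Series([1.0, np.nan, 3.0, np.nan, np.nan, 9.0])

# Linear interpolation fills each gap from the surrounding valid values:
# index 1 lands halfway between 1.0 and 3.0; indices 3 and 4 are evenly
# spaced between 3.0 and 9.0.
filled = s.interpolate(method="linear")
```

So although the code path above passes all valid points to np.interp at once, for the linear case each filled value depends only on the valid values bracketing its gap.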

Memory error with numpy.arange

I get a memory error when using numpy.arange with large numbers. My code is as follows:
import numpy as np

list = np.arange(0, 10**15, 10**3)
profit_list = []
for diff in list:
    x = do_some_calculation
    profit_list.append(x)
What can be a replacement so I can avoid getting the memory error?
If you replace list¹ with a generator, that is, you do
for diff in range(0, 10**15, 10**3):
    x = do_some_calculation
    profit_list.append(x)
then that will no longer cause MemoryErrors, as you no longer build the full array up front. In this world, though, profit_list will probably cause issues instead, as you are trying to append 10^12 items to it. Again, you can probably get around that by not storing the values explicitly, but rather yielding them as you need them, using generators.
¹: Side note: Don't use list as a variable name as it shadows a built-in.
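The fully lazy version can be sketched end to end; do_some_calculation from the question is stubbed here with a trivial placeholder computation:

```python
def profits(stop, step):
    """Yield one result per step without materializing the full range."""
    for diff in range(0, stop, step):
        # stand-in for the question's do_some_calculation
        yield diff * 2

# Consuming lazily: only one value exists in memory at a time, and the
# results are aggregated on the fly instead of stored in a list.
total = sum(profits(10**6, 10**3))
```

For the real 10^15 case the aggregation (sum, max, running statistics, ...) replaces profit_list entirely, so memory use stays constant regardless of the range size.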

pivot_table error - InvalidOperation: [<class 'decimal.InvalidOperation'>]

The above error is raised from a pivot_table operation for a variable set as the column grouping (if it matters, it fails in the format.py module).
/anaconda/lib/python3.4/site-packages/pandas/core/format.py in __call__(self, num)
2477 sign = 1
2478
-> 2479 if dnum < 0: # pragma: no cover
2480 sign = -1
2481 dnum = -dnum
(pandas 0.17.1)
If I create random values for the 'problem' variable via numpy there is no error.
Whilst I doubt it's an edge case for the pivot_table function, I can't figure out what might be causing the problem on the data side:
i) The variable is the first integer from a modest-sized sequence of integers (e.g. 2 from 246), obtained via df.var.str[0].
ii) pd.unique(df.var) returns the expected 1-9 values
iii) There are no NaNs: notnull(df.var).all() returns True
iv) The dtype is int64 (and if the integer is cast as a string, or set to labels, these alternatives still fail with the same error)
v) a period index is used - and that forms the index for pivot table.
vi) the aggregation is 'count'
Creating another variable with random values with those characteristics (1-9 values from numpy's random.randint), the pivot_table call works. If I cast it as a string, or use labels, it still works.
Likewise, I've been playing with this data set for a while - usually on some other position in the sequence, without issue. But today the first place is causing a problem.
Possibly it's a data issue - but why doesn't pivot_table return empty cells or NaNs, rather than failing at that point?
I'm at a loss after a day of exploring.
Any thoughts on why the above error is being raised would be much appreciated (it'll help me track down the data issue, if that is the cause).
Thanks,
Chris
The simplest solution is to reset pandas formatting options by
pd.set_option('display.float_format', None)
I had encountered the same problem. As a workaround, you can also filter the DataFrame being pivoted to avoid NaNs in the result.
My problem was related to the use of pd.set_eng_float_format(2, True). Without it, all pivots work fine.

In Gimp script-fu, how can you access QuickMask functionality?

In the Gimp GUI, the QuickMask is very useful for many things, but this functionality doesn't seem to be directly available through script-fu. No obvious equivalents were apparent to me in the procedure browser.
In particular, putting the (value/gray) pixels of a layer into the selection mask is the basic thing I need to do. I tried using gimp-image-get-selection to get the selection channel's id number, then gimp-edit-paste into it, but the following anchor operation caused Gimp to crash.
My other answer contains the "theoretical" way of doing it - however, the O.P. found a bug in GIMP, as of version 2.6.5, as can be seen in the comments to that answer.
I found a workaround for what the O.P. intends to do: paste the contents of a given image layer into the image selection. As noted, edit-copy -> edit-paste on the selection drawable triggers a program crash.
The workaround is to create a new image channel with the desired contents, through the copy and paste method, and then use gimp-selection-load to make the selection equal the channel contents:
The functions that need to be called are these (I won't paste Scheme code, as I am not proficient with all the parentheses - I did the tests using the Python console in GIMP):
>>> img = gimp.image_list()[0]
>>> ch = pdb.gimp_channel_new(img, img.width, img.height, "bla", 0, (0,0,0))
>>> ch
<gimp.Channel 'bla'>
>>> pdb.gimp_edit_copy(img.layers[0])
1
>>> pdb.gimp_image_add_channel(img, ch, 0)
>>> fl = pdb.gimp_edit_paste(ch, 0)
>>> fl
<gimp.Layer 'Pasted Layer'>
>>> pdb.gimp_floating_sel_anchor(fl)
>>> pdb.gimp_selection_load(ch)
Using the QuickMask through the user interface is exactly equivalent to drawing on the selection, treating the selection as a drawable object.
So, to get the equivalent of the QuickMask in script-fu, all one needs to do is retrieve the selection as a drawable and pass that as a parameter to the calls that will modify it.
And to get the selection, one just has to call 'gimp-image-get-selection'.