Xlsxwriter: Grouping using set_column after setting width with set_column - xlsxwriter

I'm using xlsxwriter to write data and afterwards autofit the columns to the maximum string length of every column.
For that I'm using something like for every single column (as every column has different max string lengths):
Sheet1.set_column(0, 0, 15)
At the end of my script I want to group a few columns together. Hence using something like this from the doc:
Sheet1.set_column(0, 10, None, None, {'level': 1})
The grouping shows but not for the desired columns. Am I doing something wrong? Interestingly, the formatting (i.e. the column width) of one of the grouped columns went away, somehow seems to get overwritten. Also I tried something like set_column('A:D', None, None, {'level': 1}) but doesn't work either.
When grouping an empty sheet, ie without writing any data, hence without applying any styles, it works. Isn't it possible to use consecutive set_columns on the same columns??
Thanks a lot in advance

Isn't it possible to use consecutive set_column() on the same columns??
No. Any call to set_column() will overwrite previous calls in the same range.
So you will need to group together all the options that you want to set, such as width, format or grouping, and apply them in one go.
Also, you will need to set overlapping ranges separately. Like this:
# Not like this!
worksheet.set_column(0, 9, 20, None, {'level': 1})
worksheet.set_column(4, 5, 30, None, {'level': 1})
# Use separate non-overlapping ranges.
worksheet.set_column(0, 3, 20, None, {'level': 1})
worksheet.set_column(4, 5, 30, None, {'level': 1})
worksheet.set_column(6, 9, 20, None, {'level': 1})

Related

How to change the order numpy stores the data?

I have a numpy array that I need to change the order of the axis.
To do that I am using moveaxis() method, which only returns a view of the input array, by changing only the strides of the array.
However, this does not change the order that the data are stored in the memory. This is problematic for me because I need to pass this reorderd array to a C code in which the order that the data are stored matters.
import numpy as np
a=np.arange(12).reshape((3,4))
a=np.moveaxis(a,1,0)
In this example, a is originally stored continuously in the memory as [0,1,2,...,11].
I would like to have it stored [0,4,8,1,5,9,2,6,10,3,7,11], and obviously moveaxis() did not do the trick
How could I force numpy to rewrite the array in the memory the way I want? I precise that contrary to my simple example, I am manipulating 3D or 4D data, so I cannot simply change the ordering from row to col major when I create it.
Thanks!
The order parameter of the numpy.reshape(...,order='F') function does exactly what you want
a=np.arange(12).reshape((4,3),order='F')
a.flatten()
array([ 0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11])

LGBMClassifier + Unbalanced data + GridSearchCV()

The dependent variable is binary, the unbalanced data is 1:10, the dataset has 70k rows, the scoring is the roc curve, and I'm trying to use LGBM + GridSearchCV to get a model. However, I'm struggling with the parameters as sometimes it doesn't recognize them even when I use the parameters as the documentation shows:
params = {'num_leaves': [10, 12, 14, 16],
'max_depth': [4, 5, 6, 8, 10],
'n_estimators': [50, 60, 70, 80],
'is_unbalance': [True]}
best_classifier = GridSearchCV(LGBMClassifier(), params, cv=3, scoring="roc_auc")
best_classifier.fit(X_train, y_train)
So:
What is the difference between putting the parameters in the GridsearchCV() and params?
As it's unbalanced data, I'm trying to use the roc_curve as the scoring metric as it's a metric that considers the unbalanced data. Should I use the argument scoring="roc_auc" put it in the params argument?
The difference between putting the parameters in GridsearchCV()or params is mentioned in the docs of GridSearch:
When you put it in params:
Dictionary with parameters names (str) as keys and lists of parameter settings to try as values, or a list of such dictionaries,
in which case the grids spanned by each dictionary in the list are
explored. This enables searching over any sequence of parameter
settings.
And yes you can put the scoring also in the params.

Identifying the index of an array inside a 2D array

I'm trying out an inventory system in python 3.8 using functions and numpy.
While I am new to numpy, I haven't found anything in the manuals for numpy for this problem.
My problem is this specifically:
I have a 2D array, in this case the unequipped inventory;
unequippedinv = [[""], [""], [""], [""], ["Iron greaves", 15, 10, 10]]
I have an if statement to ensure that the item selected is acceptable. I'm now trying to remove the entire index ["Iron greaves", 15, 10, 10] using unequippedinv.pop(unequippedinv.index(item)) but I keep getting the error ValueError: "'Iron greaves', 15, 10, 10" is not in list
I've tried using numpy's where and argwhere but instead just got [] as the outcome.
Is there a way to search for an entire array in a 2D array, such as how SQL has SELECT * IN y WHERE x IS b but in which it gives me the index for the entire row?
Note: I have now found out that it is something to do with easygui's choicebox, which, I assume, turns the chosen array into a string which is why it creates an error.

If I pass a ndarray view to a function I can find its base but how can I find the slice?

numpy slicing e.g. S=np.s_[1:-1]; V=A[1:-1], produces a view of the underlying array. I can find this underlying array by V.base. If I pass such a view to a function, e.g.
def f(x):
return x.base
then f(V) == A. But how can I find the slice information S? I am looking for an attribute something like base containing information on the slice that created this view. I would like to be able to write a function to which I can pass a view of an array and return another view of the same array calculated from the view. E.g. I would like to be able to shift the view to the right or left of a one dimensional array.
As far as I know the slicing information is not stored anywhere, but you might be able to deduce it from attributes of the view and base.
For example:
In [156]: x=np.arange(10)
In [157]: y=x[3:]
In [159]: y.base
Out[159]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [160]: y.data
Out[160]: <memory at 0xb1a16b8c>
In [161]: y.base.data
Out[161]: <memory at 0xb1a16bf4>
I like the __array_interface__ value better:
In [162]: y.__array_interface__['data']
Out[162]: (163056924, False)
In [163]: y.base.__array_interface__['data']
Out[163]: (163056912, False)
So y databuffer starts 12 bytes beyond x. And since y.itemsize is 4, this means that the slicing start is 3.
In [164]: y.shape
Out[164]: (7,)
In [165]: x.shape
Out[165]: (10,)
And comparing the shapes, I deduce that the slice stop is None (the end).
For 2d arrays, or stepped slicing you'd have to look at the strides as well.
But in practice it is probably easier, and safer, to pass the slicing object (tuple, slice, etc) to your function, rather than deduce it from the results.
In [173]: S=np.s_[1:-1]
In [174]: S
Out[174]: slice(1, -1, None)
In [175]: x[S]
Out[175]: array([1, 2, 3, 4, 5, 6, 7, 8])
That is, pass S itself, rather than deduce it. I've never seen it done before.

Numpy.trim_zeros for structured array without creating new array

Is it possible to trim zero 'records' of a structured numpy array without copying it; i.e. free allocated memory for the 'unused' zero entries at the beginning or the end; actually, I am only interested in trimming zeros at the end.
There is a builtin function numpy.trim_zeros() for 1d arrays. Its return value:
Returns:
trimmed : 1-D array or sequence
The result of trimming the input. The input data type is preserved.
However, I can't say from this whether this does not create a copy and only frees memory. I am not proficient enough to tell from its source code its behaviour.
More specifically, I have following code:
import numpy
edges = numpy.zeros(3, dtype=[('i', 'i4'), ('j', 'i4'), ('length', 'f4')])
# fill the first two records with sensible data:
edges[0]['i'] = 0
edges[0]['j'] = 1
edges[0]['length'] = 2.0
edges[1]['i'] = 1
edges[1]['j'] = 2
edges[1]['length'] = 2.0
# list memory adress and size
edges.__array_interface__
edges = numpy.trim_zeros(edges) # does not work for structured array
edges.__array_interface__
UPDATE
My question is somewhat 'twofold':
1) Does the builtin function simply frees memory or does it copy the array?
Answer: it copies creates a slice (=view); [ipython console] import numpy; numpy?? (see also Resize NumPy array to smaller size without copy and View onto a numpy array?)
2) What be a solution to have similar functionality for structured arrays?
Answer:
begin=(edges!=numpy.zeros(1,edges.dtype)).argmax()
end=len(edges)-(edges!=numpy.zeros(1,edges.dtype))[::-1].argmax()
# 1) create slice without copy but no memory is free
goodedges=edges[begin:end]
# 2) or copy and free memory (temporary both arrays exist)
goodedges=edges[begin:end].copy()
del edges
IMHO, there is two problem.
First, the trim_zeros function doesn't recognize zeroes on composite dtype.
You can locate them by begin=(edges!=zeros(1,edges.dtype)).argmax()
and end=len(edges)-(edges!=zeros(1,edges.dtype))[::-1].argmax(). Then goodedges=edges[begin:end] is the interresting data.
Second, the trim_zeros function doesn't free memory:
Returns -------
trimmed : 1-D array or sequence.
The result of trimming the input. The input data type is preserved.
So I think you must do it manually : goodedges=edges[begin:end].copy();del edges.
To expand on my comment, let's try trim_zeros on a simple integer array:
In [252]: arr = np.zeros(10,int)
In [253]: arr[3:8]=np.ones(5)
In [254]: arr
Out[254]: array([0, 0, 0, 1, 1, 1, 1, 1, 0, 0])
In [255]: arr1=np.trim_zeros(arr)
In [256]: arr1
Out[256]: array([1, 1, 1, 1, 1])
Now compare the __array_interface__ dictionaries:
In [257]: arr.__array_interface__
Out[257]:
{'descr': [('', '<i4')],
'shape': (10,),
'version': 3,
'strides': None,
'data': (150760432, False),
'typestr': '<i4'}
In [258]: arr1.__array_interface__
Out[258]:
{'descr': [('', '<i4')],
'shape': (5,),
'version': 3,
'strides': None,
'data': (150760444, False),
'typestr': '<i4'}
shape reflects the change we want. But look at the data pointer, ...432, and ...444. arr1 just points to 12 bytes (3 ints) further along the same buffer.
If I delete arr or reassign it (even arr=arr1), arr1 continues to point to this data buffer. numpy keeps some sort of reference count, and recycles a data buffer only when all references are gone.
The code for trim_zeros is (fetched in ipython with '??')
File: /usr/lib/python3/dist-packages/numpy/lib/function_base.py
def trim_zeros(filt, trim='fb'):
first = 0
trim = trim.upper()
if 'F' in trim:
for i in filt:
if i != 0.: break
else: first = first + 1
last = len(filt)
if 'B' in trim:
for i in filt[::-1]:
if i != 0.: break
else: last = last - 1
return filt[first:last]
The work is in the last line, and clearly returns a slice, a view. Most of the code handles the 2 trim options (F and B). Notice that it uses iteration to find the first and last non-zeros. That should be fine for arrays with just a few extra 0s at beginning or end. But it isn't the 'vectorized' kind of operation that SO questions often seek.
Before this question I didn't even know that trim_zeros existed, but I'm not at all surprised by its code and action.
On a side issue, here's a more compact way of creating your edges array.
In [259]: edges =np.zeros(3, dtype=[('i', 'i4'), ('j', 'i4'), ('length', 'f4')])
In [260]: edges[:2]=[(0,1,2.0),(1,2,2.0)]
To remove all the zero elements you could just use:
edges[edges!=numpy.zeros(1,edges.dtype)]
This is a copy. It does remove 'embedded' zeros as well, but that might not be an issue if the only zeros are those left at the end after filling in the earlier slots.
You may not need this trimming at all if you collect the edges data in a list, and build the array at the end:
edges1 = np.array([(0,1,2.0),(1,2,2.0)], dtype=edges.dtype)