Find a specific value in a multi-dimensional array list - arraylist

Let suppose there is an array of cells each containing a specific set of valid characters.
Valid characters are either "Null/No value", "*", "X", "0", "1".
For simplicity sake, lets assume the array is 3 rows and 3 columns
Col1
Col2
Col 3
X
NV
0
*
1
X
1
NV
1
Glist = [[X,,0],[*,1,X],[1,,1]] /*assuming this is the right method to represent the 3X3 from above
How do I use count() function of array to determine how many times a certain value exists in the array elements?
I tried Glist.count(1) and it returned a value of 0
Thanks in advance.

Related

How to change rows in pandas based on an attribute of the other rows

I have a dataframe with columns: A(continuous variable) and B(discrete 1 or 0). The df is initially sorted by A variable.
I need to order the dataframe so for each set of X rows, there are Y rows with value 1 in B column, and (X-Y) rows with 0 (B column) (when possible!). But these sets should have variable A in desceding order. X and Y are input by the user
Example:
X=4, Y=3
Rows 0-11 are ok, since the sets (0-3),(4-7) and (8-11) has 3 rows with 1 in column B and only one row with 0 AND variable A is descending. However, rows 12-15 are not ok, since there are 2 rows with 1(variable B) and two with 0. Row 17 would replace row 15 to make this set valid. There is no problem if the last rows has 0 in variable B, since there isn't any with value 1.
The code should be general enough to run on dataframes with different number of rows.
Any ideas?

Select rows where column value is a combination of numbers and letters

Having a dataset like this:
word
0 TBH46T
1 BBBB
2 5AAH
3 CAAH
4 AAB1
5 5556
Which would be the most efficient way to select the rows where column word is a combination of numbers and letters?
The output would be like this:
word
0 TBH46T
2 5AAH
4 AAB1
A possible solution would be to create a new column using apply and regex in which store if column word has the desired structure. But I'm curious about if this could be achieved in a more straightforward way.
Use Series.str.contains for chain mask for match numeric and for match non numeric with & for bitwise AND:
df = df[df['word'].str.contains('\d') & df['word'].str.contains('\D')]
print (df)
word
0 TBH46T
2 5AAH
4 AAB1

Single cell string to list to multiple rows

I have a pandas data frame,
Currently the list column is a string, I want to delimit this by spaces and replicate rows for each primary key would be associated with each item in the list. Can you please advise me on how I can achieve this?
Edit:
I need to copy down the value column after splitting and stacking the list column
If your data frame is df you can do:
df.List.str.split(' ').apply(pd.Series).stack()
and you will get
Primary Key
0 0 a
1 b
2 c
1 0 d
1 e
2 f
dtype: object
You are splitting the variable List on spaces, turning the resulting list into a series, and then stacking it to turn it into long format, indexed on the primary key, along with a sequence for each item obtained from the split.
My version:
df['List'].str.split().explode()
produces
0 a
0 b
0 c
1 d
1 e
1 f
With regards to the Edit of the question, the following tweak will give you want you need I think:
df['List'] = df['List'].str.split()
df.explode('List')
Here is a solution.
df = df.assign(**{'list':df['list'].str.split()}).explode('list')
df['cc'] = df.groupby(level=0)['list'].cumcount()
df.set_index(['cc'],append=True)

AttributeError: 'int' object has no attribute 'count' while using itertuples() method with dataframes

I am trying to iterate over rows in a Pandas Dataframe using the itertuples()-method, which works quite fine for my case. Now i want to check if a specific value ('x') is in a specific tuple. I used the count() method for that, as i need to use the number of occurences of x later.
The weird part is, for some Tuples that works just fine (i.e. in my case (namedtuple[7].count('x')) + (namedtuple[8].count('x')) ), but for some (i.e. namedtuple[9].count('x')) i get an AttributeError: 'int' object has no attribute 'count'
Would appreciate your help very much!
Apparently, some columns of your DataFrame are of object type (actually a string)
and some of them are of int type (more generally - numbers).
To count occurrences of x in each row, you should:
Apply a function to each row which:
checks whether the type of the current element is str,
if it is, return count('x'),
if not, return 0 (don't attempt to look for x in a number).
So far this function returns a Series, with a number of x in each column
(separately), so to compute the total for the whole row, this Series should
be summed.
Example of working code:
Test DataFrame:
C1 C2 C3
0 axxv bxy 10
1 vx cy 20
2 vv vx 30
Code:
for ind, row in df.iterrows():
print(ind, row.apply(lambda it:
it.count('x') if type(it).__name__ == 'str' else 0).sum())
(in my opinion, iterrows is more convenient here).
The result is:
0 3
1 1
2 1
So as you can see, it is possible to count occurrences of x,
even when some columns are not strings.

Get column index label based on values

I have the following:
C1 C2 C3
0 0 0 1
1 0 0 1
2 0 0 1
And i would like to get the corresponding column index value that has 1's, so the result
should be "C3".
I know how to do this by transposing the dataframe and then getting the index values, but this is not ideal for data in the dataframes i have, and i wonder there might be a more efficient solution?
I will save the result in a list because otherwise there could be more than one column with values ​​equal to 1. You can use DataFrame.loc
if all column values ​​must be 1 then you can use:
df.loc[:,df.eq(1).all()].columns.tolist()
Output:
['C3']
if this isn't necessary then use:
df.loc[:,df.eq(1).any()].columns.tolist()
or as suggested #piRSquared, you can select directly from df.columns:
[*df.columns[df.eq(1).all()]]