How to concatenate strings in an array of arrays with Ramda.js

Given a 2d matrix of strings, I would like to concatenate the elements in the sublists based on their position.
e.g. Given this data structure as input:
[['x', 'y', 'z'], ['A', 'B', 'C'], ['a', 'b', 'c']]
The algorithm would produce the output:
['xAa', 'yBb', 'zCc']
I tried R.transpose, which gets me [['x', 'A', 'a'], ['y', 'B', 'b'], ['z', 'C', 'c']], but I couldn't find a way to concatenate the nested strings.
I also tried R.zipWith, but it doesn't look ideal: R.zipWith takes only two lists, so I would need to apply it twice to get the output.

You weren't that far off, I think.
With transpose you managed to get this:
[['x','A','a'], ['y', 'B', 'b'], ['z', 'C', 'c']]
Then you just need to map over this array of arrays and join each of them.
e.g.
const { compose, map, join, transpose } = R;
const concat_arr = compose(map(join("")), transpose);
const result = concat_arr([['x', 'y', 'z'], ['A', 'B', 'C'], ['a', 'b', 'c']]);
console.log(result)
<script src="https://cdnjs.cloudflare.com/ajax/libs/ramda/0.27.1/ramda.min.js"></script>
Where compose, map, join and transpose are all Ramda functions.
The initial input goes to transpose.
The result of transpose(arr) is then passed to map(join("")), which applies join("") to every subarray.
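For comparison, here is the same transpose-then-join idea sketched in plain Python (not Ramda; purely to illustrate the algorithm):
rows = [['x', 'y', 'z'], ['A', 'B', 'C'], ['a', 'b', 'c']]
# zip(*rows) transposes the rows; ''.join concatenates each transposed tuple
print([''.join(group) for group in zip(*rows)])  # ['xAa', 'yBb', 'zCc']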

Related

split content of a column pandas

I have a Pandas DataFrame which can be generated using this list of dictionaries:
list_of_dictionaries = [
    {'Project': 'A', 'Hours': 2, 'people_ids': [16986725, 17612732]},
    {'Project': 'B', 'Hours': 2, 'people_ids': [17254707, 17567393, 17571668, 17613773]},
    {'Project': 'C', 'Hours': 3, 'people_ids': [17097009, 17530240, 17530242, 17543865, 17584457, 17595079]},
    {'Project': 'D', 'Hours': 2, 'people_ids': [17097009, 17584457, 17702185]}]
I have implemented something close to what I need, but it spreads the ids across new columns:
df['people_id1'] = [x[0] for x in df['people_ids'].tolist()]
df['people_id2'] = [x[1] for x in df['people_ids'].tolist()]
That gives me a separate column per people_id, but only up to the second element: when I try to extract the 3rd element into a third column, it crashes because the first row has no 3rd element.
What I am actually trying to do is to extract every people_id from the people_ids column, with each one keeping its associated value from the Project and Hours columns, so I get a dataset like this one:
Any idea on how could I get this output?
I think what you are looking for is explode on the 'people_ids' column.
df = df.explode('people_ids', ignore_index=True)
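For instance, applied to a subset of the sample data above (a quick illustrative sketch):
import pandas as pd

list_of_dictionaries = [
    {'Project': 'A', 'Hours': 2, 'people_ids': [16986725, 17612732]},
    {'Project': 'B', 'Hours': 2, 'people_ids': [17254707, 17567393, 17571668, 17613773]},
]
df = pd.DataFrame(list_of_dictionaries)

# each id gets its own row; Project and Hours are repeated alongside it
df = df.explode('people_ids', ignore_index=True)
print(df)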

Changing the value of a Numpy Array based on a probability and the value itself

I have a 2d Numpy Array:
a = np.reshape(np.random.choice(['b','c'],4), (2,2))
I want to go through each element and, with probability p=0.2, change the element: if the element is 'b' I want to change it to 'c', and vice versa.
I've tried all sorts (looping through with enumerate, where statements) but I can't seem to figure it out.
Any help would be appreciated.
You could generate a random mask with the wanted probability and use it to swap the values on a subset of the array:
# select 20% of cells
mask = np.random.choice([True, False], a.shape, p=[0.2, 0.8])
# swap the values for those
a[mask] = np.where(a=='b', 'c', 'b')[mask]
Example output:
array([['b', 'c'],
       ['c', 'c']], dtype='<U1')
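As a quick sanity check (purely illustrative), the fraction of flipped cells on a large array should come out close to p:
import numpy as np

a = np.random.choice(['b', 'c'], size=(1000, 1000))
before = a.copy()
# same approach as above: mark ~20% of cells, then swap only those
mask = np.random.choice([True, False], a.shape, p=[0.2, 0.8])
a[mask] = np.where(a == 'b', 'c', 'b')[mask]
print((a != before).mean())  # ~0.2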

Removing selected features from dataset

I am following this program: https://scikit-learn.org/dev/auto_examples/inspection/plot_permutation_importance_multicollinear.html
since I have a problem with highly correlated features in my model (different from the one shown in the example). In this step
selected_features = [v[0] for v in cluster_id_to_feature_ids.values()]
I can get information on the features that I will need to remove from my classifier. They are given as numbers ([0, 3, 5, 6, 8, 9, 10, 17]). How can I get the names of these features?
OK, I think there are two different parts to this problem.
First, you need to get a list of the column names. In the example code you linked, the list of feature names is stored like this:
data.feature_names
Once you have the feature names, you'd need a way to loop through them and grab only the ones you want. Something like this should work:
columns = ['a', 'b', 'c', 'd']
keep_index = [0, 3]
new_columns = [columns[i] for i in keep_index]
print(new_columns)
# ['a', 'b']
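Putting the two parts together for the linked example (assuming data is the Bunch returned by load_breast_cancer, as in that tutorial; since feature_names is a NumPy array, fancy indexing also works directly):
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
selected_features = [0, 3, 5, 6, 8, 9, 10, 17]

# NumPy fancy indexing picks out the names at the selected positions
selected_names = data.feature_names[selected_features]
print(selected_names)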

Why can't I merge the columns together

My goal is to transform the array into a DataFrame, and the error occurs only at the columns=... argument:
housing_extra = pd.DataFrame(housing_extra_attribs,
                             index=housing_num.index,
                             columns=[housing.columns, 'rooms_per_household', 'population_per_household', 'bedrooms_per_room'])
Consequently, it returns:
AssertionError: Number of manager items must equal union of block items
# manager items: 4, # tot_items: 12
The error says I passed only 4 columns, even though housing.columns itself has 9 columns.
Here is what I get when I run housing.columns:
Index(['longitude', 'latitude', 'housing_median_age', 'total_rooms',
       'total_bedrooms', 'population', 'households', 'median_income',
       'ocean_proximity'],
      dtype='object')
So my question is: how can I merge the existing columns in housing.columns with the 3 new columns ['rooms_per_household', 'population_per_household', 'bedrooms_per_room']?
You can use Index.union to add a list of columns to existing dataframe columns:
columns = housing.columns.union(
    ['rooms_per_household', 'population_per_household', 'bedrooms_per_room'],
    sort=False)
Or convert to a list and then add the remaining columns as a list:
columns = (housing.columns.tolist() +
           ['rooms_per_household', 'population_per_household', 'bedrooms_per_room'])
Then:
housing_extra = pd.DataFrame(housing_extra_attribs,
                             index=housing_num.index,
                             columns=columns)
An example:
Assume this df:
df = pd.util.testing.makeDataFrame()
print(df.columns)
#Index(['A', 'B', 'C', 'D'], dtype='object')
When you put this into a list:
[df.columns, 'E', 'F', 'G']
you get a list containing the whole Index object plus three strings, not a flat set of labels:
[Index(['A', 'B', 'C', 'D'], dtype='object'), 'E', 'F', 'G']
versus when you use union:
df.columns.union(['E','F','G'],sort=False)
You get:
Index(['A', 'B', 'C', 'D', 'E', 'F', 'G'], dtype='object')
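A minimal, self-contained sketch of why the flat index fixes the shape mismatch (the values array here is made up; housing would normally come from the actual dataset):
import numpy as np
import pandas as pd

housing_columns = pd.Index(['longitude', 'latitude', 'housing_median_age',
                            'total_rooms', 'total_bedrooms', 'population',
                            'households', 'median_income', 'ocean_proximity'])
columns = housing_columns.union(
    ['rooms_per_household', 'population_per_household', 'bedrooms_per_room'],
    sort=False)

# 12 flat labels now match 12 data columns
print(pd.DataFrame(np.zeros((2, 12)), columns=columns).shape)  # (2, 12)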

Splitting Dataframe with hierarchical index [duplicate]

This question already has answers here:
Splitting dataframe into multiple dataframes
(13 answers)
Closed 3 years ago.
I have a large dataframe with hierarchical indexing (a simplified example is provided in the code below). I would like to set up a loop/automated way of splitting the dataframe into subsets per unique index value, i.e. dfa, dfb, dfc etc. in the coded example below, and store them in a list.
I have tried the following, but unfortunately to no success. Any help appreciated!
data = pd.Series(np.random.randn(9), index=[['a', 'a', 'a', 'b',
                 'b', 'c', 'c', 'd', 'd'], [1, 2, 3, 1, 2, 1, 2, 2, 3]])
split = []
for value in data.index.unique():
    split.append(data[data.index == value])
I am not exactly sure if this is what you are looking for, but have you checked the pandas groupby function? The crucial part is that you can apply it across a MultiIndex, specifying which level of indexing (or which subset of levels) to group by. e.g.
split = {}
for value, split_group in data.groupby(level=0):
    split[value] = split_group
print(split)
As @jezrael points out, a simpler way to do it is:
split = dict(tuple(data.groupby(level=0)))
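Putting it together with the sample data from the question (a quick sketch to show the result):
import numpy as np
import pandas as pd

data = pd.Series(np.random.randn(9),
                 index=[['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd', 'd'],
                        [1, 2, 3, 1, 2, 1, 2, 2, 3]])

# one sub-Series per unique outer-level value, keyed by that value
split = dict(tuple(data.groupby(level=0)))
print(split['a'])  # the three rows indexed under 'a'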