My dataframe:
How can I sort the values to match the final sorted result? I don't know how to do this with pandas.
What you are really looking to do is concatenate columns A and B to get your number. An easy way to do this is to convert it to a string, add them together, and then convert it back to an integer.
import pandas as pd

# creating the dataframe
data = dict(
    A=[1, 2, 3, 4, 5, 6], B=[6, 5, 4, 2, 1, 3], values=["a", "b", "c", "d", "e", "f"])
so_data = pd.DataFrame(data)

# concatenate A and B as strings, then convert the result back to an integer
so_data["final values"] = (so_data["A"].astype(str) + so_data["B"].astype(str)).astype(int)
I now realize that the "values" column isn't sorted in an obvious way either. I'm not sure how that column is sorted; it seems like some information is missing.
I want to manipulate a data set to make it suitable for ANOVA testing. The current df is structured like df1, with many data points of several types, separated by contextual categories. As I understand it (which may be wrong), I need to change the structure of the df so that it more closely resembles df2. I'm sure it has something to do with melt and sort, but I'm not sure how to get all the way there. What's the way to do this, or is there a better way to do ANOVA testing on this kind of data?
The real df I'm using has hundreds of data points, and many more types and categories, so it has to be a solution that can be applied realistically to more than 6 values.
df1 = pd.DataFrame({'length': [1, 2, 3, 4, 5, 6],
                    'width': [1, 2, 3, 4, 5, 6],
                    'type': ['A', 'B', 'C', 'A', 'B', 'C'],
                    'type2': ['x', 'y', 'x', 'y', 'y', 'x']})
df2 = pd.DataFrame({'A(x) length': [*length values that are types A,X*],
                    'B(x) length': [*length values that are types B,X*],
                    'C(x) length': [*length values that are types C,X*]})
Edit: I've updated df2 to more accurately reflect what I'm asking. Maybe restructuring the df isn't the answer; how would I write the ANOVA call to apply the test to df1?
fvalue, pvalue = f_oneway(df2[*Axlength*], df2[*Bxlength*], df2[*Cxlength*])
The exact expected output remains unclear, but you might want:
# melt to long format: one row per (type, type2, variable, value)
df2 = df1.melt(['type', 'type2'])
# build a group label like "A(x) length"
group = df2['type'] + '(' + df2['type2'] + ') ' + df2['variable']
# collect each group's values into a list
df2 = df2.groupby(group)['value'].agg(list)
Output:
A(x) length [1]
A(x) width [1]
A(y) length [4]
A(y) width [4]
B(y) length [2, 5]
B(y) width [2, 5]
C(x) length [3, 6]
C(x) width [3, 6]
Name: value, dtype: object
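If the end goal is the ANOVA itself, you could feed those per-group lists straight into scipy. A minimal sketch, assuming scipy.stats.f_oneway is the test you want, restricted to the length groups:

from scipy.stats import f_oneway

# keep only the "length" groups and unpack their lists as separate samples
lengths = df2[df2.index.str.endswith('length')]
fvalue, pvalue = f_oneway(*lengths)
print(fvalue, pvalue)

Note that single-observation groups like A(x) make the F statistic fragile on this toy data; with hundreds of points per group, as in your real df, that shouldn't be an issue.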
My last question about filtering a df by a value list had a nice solution:
How to filter df by value list with Polars?
But now I have the inverse task.
I have a list with some int values: black_list = [45, 87, 555]
And I have df with some values in column cid1.
df = pl.DataFrame(
{
"cid1": [45, 99, 177],
"cid2": [4, 5, 6],
"cid3": [7, 8, 9],
}
)
How can I filter df by my black_list so that the resulting df contains only rows without blacklisted values in the "cid1" column?
I can't filter by some white_list, according to the conditions of my task.
The code .filter(pl.col("cid1").is_not(black_list)) is not suitable. I tried it, but it gives me an error, TypeError: Expr.is_not() takes 1 positional argument but 2 were given, and I can't find another way.
You can just add ~ to get a reversed Series of bool values:
df.filter(~pl.col("cid1").is_in(black_list))
or you can use .is_not() to reverse the bool values:
df.filter(pl.col("cid1").is_in(black_list).is_not())
Using pandas, I'd like to "groupby" and calculate the mean value for each group of my DataFrame. I do it like this:
import pandas as pd

data = {
    "group": ["A", "B", "C", "A", "A", "B", "B", "C", "A"],
    "value": [5, 6, 8, 7, 3, 9, 4, 6, 5]
}
df = pd.DataFrame(data)
print(df)

g = df.groupby('group').mean()
print(g)
Which gives me:
value
group
A 5.000000
B 6.333333
C 7.000000
However, I'd like to exclude groups which have, let's say, less than 3 entries (so that the mean has somewhat of a value). In this case, it would exclude group "C" from the results. How can I implement this?
Filter the groups based on their length, then take the mean.

# keep only the rows belonging to groups with at least 3 entries
filtered = df.groupby('group').filter(lambda x: len(x) >= 3)

# the mean group-wise after filtering out the small groups
result = filtered.groupby('group').mean().reset_index()
Output:
group value
0 A 5.000000
1 B 6.333333
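An equivalent approach, if you'd rather avoid the Python-level lambda, is to compute the group sizes with transform and mask the rows directly. A sketch on the same data:

# per-row size of the group that row belongs to
sizes = df.groupby('group')['value'].transform('size')

# keep rows from groups with at least 3 entries, then average per group
result = df[sizes >= 3].groupby('group').mean().reset_index()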
I'm creating a function that accepts 3 inputs: a dataframe, a column and a list of columns.
The function should apply a short calculation to the single column, and a different short calculation to the list of other columns. It should return a dataframe containing just the amended columns (and their amended rows) from the original dataframe.
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4], [1, 3, 5, 6], [4, 6, 7, 8], [5, 4, 3, 6]],
                  columns=['A', 'B', 'C', 'D'])
def pre_process(dataframe, y_col_name, x_col_names):
    # ...calculations go here...
    return new_dataframe
The calculation to be applied to y_col_name's rows is each value of y_col_name divided by the mean of y_col_name.
The calculation to be applied to each column in the x_col_names list is each value of that column divided by the column's standard deviation.
I would like some help to write the function. I think I need to use an "apply" or a "lambda" function but I'm unsure.
This is what calling the command would look like:
pre_process_data = pre_process(df, 'A', ['B', 'D'])
Thanks
def pre_process(dataframe, y_col_name, x_col_names):
    new_dataframe = dataframe.copy()
    # divide the y column by its mean
    new_dataframe[y_col_name] = new_dataframe[y_col_name] / new_dataframe[y_col_name].mean()
    # divide each x column by its own standard deviation
    new_dataframe[x_col_names] = new_dataframe[x_col_names] / new_dataframe[x_col_names].std()
    return new_dataframe
Is this what you mean?
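One small variation, since the question asks for a frame containing just the amended columns: you could slice the copy before returning. A sketch building on the answer above:

def pre_process(dataframe, y_col_name, x_col_names):
    new_dataframe = dataframe.copy()
    new_dataframe[y_col_name] = new_dataframe[y_col_name] / new_dataframe[y_col_name].mean()
    new_dataframe[x_col_names] = new_dataframe[x_col_names] / new_dataframe[x_col_names].std()
    # return only the columns that were actually transformed
    return new_dataframe[[y_col_name] + x_col_names]

pre_process_data = pre_process(df, 'A', ['B', 'D'])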
I want to create an array in Rails that contains every value of two columns, with each value appearing just once. So, for example, column "A" contains {1,5,7,1,7} and column "B" contains {3,2,3,1,4}.
If I just wanted an array with the unique elements of "A", I would write:
Model.uniq.pluck(:A)
And I would get {1,5,7}.
Is there an option in Rails to do the same thing with two columns, i.e. get each value that is contained in either of the two columns just once? (Here it would be {1,5,7,3,2,4})
Thanks for the help!
Yup, pass multiple column names to pluck:
Model.pluck(:A, :B)
#=> [[1, 3], [5, 2], [7, 3], [1, 1], [7, 4]]
But of course you want the values together and uniqued so:
Model.pluck(:A, :B).flatten.uniq
#=> [1, 3, 5, 2, 7, 4]
Doing Model.uniq.pluck(:A, :B).flatten won’t work since it will just get distinct rows (i.e. combinations of A & B), so you’d still have to uniq again after flattening.
records = []
Model.all.each { |e| records << [e.A, e.B] }
uniq_records = records.flatten.uniq
Hope this helps you.
Thanks