df.corr() does not work: it only considers one feature column - pandas

I wonder if someone could help me solve my problem. I have a data frame called df_normalized, a normal data frame with 17 columns. I want a correlation matrix based on the Spearman method to find out whether the feature columns are correlated with each other.
However, df_normalized.corr(method='spearman') only considers the sex column, as you can see in the uploaded screenshots of my code.

It would be nice if you could post the full code and at least part of the dataframe so it's easier to see what's wrong. It looks like there's only the sex column in your dataframe, but it's hard to tell.
You can find a nice example of how to do what you want here: https://datatofish.com/correlation-matrix-pandas/
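The question doesn't show the data, so this is only a guess at the cause: if the other columns are stored as text (object dtype), pandas versions before 2.0 silently drop them from corr(), which would leave only a numeric sex column. A minimal sketch, using a hypothetical three-column stand-in for df_normalized, that checks the dtypes and coerces every column to numbers before computing the Spearman matrix:

```python
import pandas as pd

# Hypothetical stand-in for df_normalized: numbers stored as strings (object dtype),
# one possible reason corr() ignores columns.
df_normalized = pd.DataFrame({
    "sex": [0, 1, 1, 0, 1],
    "age": ["0.1", "0.4", "0.35", "0.8", "0.6"],
    "bmi": ["0.2", "0.5", "0.3", "0.9", "0.7"],
})

# age and bmi show up as object here, not float64.
print(df_normalized.dtypes)

# Convert every column to a numeric dtype; entries that can't be parsed become NaN.
numeric = df_normalized.apply(pd.to_numeric, errors="coerce")

# Now every column takes part in the rank-based (Spearman) correlation matrix.
corr = numeric.corr(method="spearman")
print(corr)
```

If df_normalized.dtypes already shows numeric dtypes for all 17 columns, the cause lies elsewhere and the full code and data would be needed.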

Related

PALANTIR-FOUNDRY: How can I add a description for a dataframe in a transform?

I am producing a dataframe through a transform. In that transform I am able to add column descriptions the usual way:
out_all.write_dataframe(df, column_descriptions=mycols_dictionary)
My question is: can I add a dataframe description in a similar way?
Thank you in advance.
There's no way to do it from transforms at the moment. Unlike column descriptions, the dataset description is not branchable, meaning the description remains the same across branches (just like the dataset name).

Pandas - Splitting data (lists) from multiple columns into new rows

I have a DataFrame that has lists stored in each column. I am trying to unpack them so that each combination becomes a new row. Below is what my data looks like.
cust_id,prod_name,type,value
101,['car','bike','computer'],['t1','t2','t3'],['434','533','55']
102,['car','bike'],['t1','t3'],['533','55']
I am trying to convert the above dataframe to the following format:
cust_id,prod_name,type,value
101,car,t1,434
101,bike,t2,533
101,computer,t3,55
102,car,t1,533
102,bike,t3,55
There is a great answer for this question here: https://stackoverflow.com/a/53218939/6348485
I would mark this question as a duplicate but I don't think I have enough reputation to do it.
The above is a great answer offering almost 10 different methods.
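One of the simplest options, assuming pandas 1.3 or later (where DataFrame.explode accepts a list of columns), is to explode all three list columns in lockstep. A minimal sketch using the question's sample data, written as real Python lists:

```python
import pandas as pd

# The question's sample data, with each cell holding an actual list.
df = pd.DataFrame({
    "cust_id": [101, 102],
    "prod_name": [["car", "bike", "computer"], ["car", "bike"]],
    "type": [["t1", "t2", "t3"], ["t1", "t3"]],
    "value": [["434", "533", "55"], ["533", "55"]],
})

# Explode the three list columns together; the lists in each row must have equal lengths.
out = df.explode(["prod_name", "type", "value"], ignore_index=True)

# The values were strings in the lists; convert them to integers to match the target format.
out["value"] = out["value"].astype(int)
print(out)
```

If the data was read from a CSV file, each cell is probably a string such as "['car','bike']" rather than a list, and would first need converting, for example with ast.literal_eval.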

SSIS Fuzzy Grouping always returns the same result with different similarity thresholds

Can anyone tell me why my similarity is always 1?
My goal is, for example, for AAB and AAC to be placed in the same group.
Thanks
After trying different source data, I got the result I needed.
For sample data, I think it is better to use realistic, real-world examples.
Instead of AAA and AAC, use something like a Name column with Sara vs. Saraa; SSIS then puts them in the same group. However, I found that Don vs. Done are not grouped together. So it may not be a good idea to rely on this to filter records whose typos differ by a single letter?
Also, try using more than one column as your comparison columns.

Pandas conditions on count()

Hi, hoping this is not a silly question.
I have a dataframe from which I am plotting a chart of how many times each name appears, using the following code:
import matplotlib.pyplot as plt
df.groupby('name').name.count().plot.bar()
plt.xlabel('Name')
plt.ylabel('Number')
plt.title('Number of times name appears')
Is there a way to plot only the names that appear a certain number of times? I am guessing I need some kind of function, but I'm not sure where to start.
You can use value_counts:
df.name.value_counts().plot(kind='bar')
Edit: to plot only the names that appear at least 8 times, filter the counts first:
df.name.value_counts().loc[lambda s: s >= 8].plot(kind='bar')
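The filtering idea can be tried without plotting. This sketch uses made-up names and a hypothetical threshold of 2 (instead of 8) so the effect is visible on a tiny frame:

```python
import pandas as pd

# Hypothetical sample data standing in for the question's df.
df = pd.DataFrame({"name": ["ann", "bob", "ann", "cid", "ann", "bob", "dee"]})

# Hypothetical threshold, lowered so the small example keeps some names.
min_count = 2

# Count how often each name appears (sorted from most to least frequent).
counts = df["name"].value_counts()

# Keep only the names whose count meets the threshold.
frequent = counts[counts >= min_count]
print(frequent)
```

Calling frequent.plot(kind='bar') (with matplotlib installed) then draws bars only for those names, and the plt.xlabel, plt.ylabel, and plt.title calls from the question work unchanged.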

How to combine data from multiple rows of one column into one row

I need to combine the data from multiple rows of one column into a single row.
For example, from this format:
ID Interest
Sports
Cooking
Movie
Reading
to this format:
ID Interest
Sports,Cooking
Movie,Reading
I wonder whether this can be done in MS Access SQL. If anybody knows how, please help me.
Take a look at Allen Browne's approach: Concatenate values from related records
As for the normalization argument, I'm not suggesting you store concatenated values. But if you want to join them together for display purposes (like a report or form), I don't think you're violating the rules of normalization.
This is called de-normalizing data. It may be acceptable for final reporting. Apparently some experts believe it's good for something, as seen here.
(Mind you, kevchadder's question is right on.)
Have you looked into the SQL Pivot operation?
Take a look at this link:
http://technet.microsoft.com/en-us/library/ms177410.aspx
Just noticed you're using Access. Take a look at this article:
http://www.blueclaw-db.com/accessquerysql/pivot_query.htm
This is not something you should do in SQL, and it's most likely not possible at all.
Merging the rows in your application code shouldn't be too hard.
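As a sketch of merging the rows in application code, here is one way to do it in Python with pandas. The ID values in the question were lost, so the IDs 1 and 2 below are hypothetical:

```python
import pandas as pd

# Hypothetical rows; the IDs are made up for illustration.
rows = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "Interest": ["Sports", "Cooking", "Movie", "Reading"],
})

# Group by ID and join each group's Interest values with commas,
# keeping the original row order within each group.
merged = rows.groupby("ID")["Interest"].agg(",".join).reset_index()
print(merged)
```

The concatenated values are meant for display only; the underlying table stays normalized.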