Extra Row in Dataframe

I am attempting to create a dataframe comprised of two vectors.
Each vector is made up of SIX elements, which were the products of previous steps. Here they are:
CLsummary<-c(MaxCL, MinCL, MeanCL, MedianCL, RangeCL, SDCL)
PRsummary<-c(MaxPR, MinPR, MeanPR, MedianPR, RangePR, SDPR)
But when I create the data frame, like below, I get SEVEN rows of data:
FHsummary<-data.frame(CLsummary, PRsummary)
Specifically, the fifth row seems to be a duplicate of the MinCL and MinPR data.
What am I doing wrong?
Thank you!

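When I replicate your code, substituting strings for the objects in the vectors, the final data frame has 6 rows as expected. Take a look at the objects that make up the vectors. What type are they? How were they made? My guess is that one of the objects in each vector is itself a vector with two elements. For instance, if RangeCL and RangePR were computed with range(), note that R's range() returns both the minimum and the maximum, which would explain an extra row that repeats the Min values.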

Related

Slice dataframe according to unique values into many smaller dataframes

I have a large dataframe (14,000 rows). The columns include 'title', 'x' and 'y' as well as other random data.
For a particular title, I've written a code which basically performs an analysis using the x and y values for a subset of this data (but the specifics are unimportant for this).
For this title (which is something like "Part number Y1-17") there are about 80 rows.
At the moment I have only worked out how to get my code to work on 1 subset of titles (i.e. one set of rows with the same title) at a time. For this I've been making a smaller dataframe out of my big one using:
import pandas as pd
df = pd.read_excel(r"mydata.xlsx")
a = df.loc[df['title'].str.contains('Y1-17')]
But given there are about 180 of these smaller datasets I need to do this analysis on, I don't want to have to do it manually.
My question is: is there a way to make all of the smaller dataframes automatically, by slicing the data by the unique 'title' value? All the help I've found seems to require specifying the 'title' to make a subset. I want to subset all of it, and I don't want to have to list all the title names to do it.
I've searched quite a lot and haven't found anything, however I am a beginner so it's very possible I've missed some really basic way of doing this.
I'm not sure if it's important information, but the modules I'm working with are pandas and numpy.
Thanks for any help!
You can use pandas groupby.
For example:
df_dict = {title: group for title, group in df.groupby('title', sort=False)}
This creates a dictionary of DataFrames, keyed by title, each containing all the columns and only the rows belonging to one unique value of title.
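To make the idea concrete, here is a minimal sketch of looping over the grouped subsets. The small DataFrame and the `analyse` function are hypothetical stand-ins for the real spreadsheet and the real per-title analysis, which the question does not show.

```python
import pandas as pd

# Hypothetical stand-in for the data read from the spreadsheet.
df = pd.DataFrame({
    "title": ["Part number Y1-17", "Part number Y1-18", "Part number Y1-17"],
    "x": [1.0, 2.0, 3.0],
    "y": [10.0, 20.0, 30.0],
})


def analyse(group):
    """Placeholder analysis (an assumption): mean of x and y for one title."""
    return group[["x", "y"]].mean()


# One sub-DataFrame per unique title, in order of first appearance.
df_dict = {title: group for title, group in df.groupby("title", sort=False)}

# Run the same analysis on every subset without listing the titles by hand.
results = {title: analyse(group) for title, group in df_dict.items()}
```

If the per-title frames are not needed afterwards, iterating over `df.groupby("title")` directly avoids building the dictionary at all.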

Pandas, turn all the data frame to unique categorical values

I am relatively new to Pandas and to Python, and I am trying to find out how to convert the entire content of a Pandas DataFrame (all fields are strings) to categorical values.
All the values from rows and columns have to be treated as a big unique data set before turning them to categorical numbers.
So far I was able to write the following piece of code
for col_name in X.columns:
    if X[col_name].dtype == 'object':
        X[col_name] = X[col_name].astype('category')
        X[col_name] = X[col_name].cat.codes
This works on a data frame X with multiple columns: it takes the strings and turns them into unique numbers.
What I am not sure about is whether the codes are unique per column or across the whole data frame (the latter is what I want), since my for loop processes one column at a time.
Can you please advise how I can change my code to assign unique numbers across all the values of the data frame?
I would like to thank you in advance for your help.
Regards
Alex
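Use DataFrame.stack to reshape the columns into a single Series with a MultiIndex, so that the category codes are assigned over all values at once, then Series.unstack to restore the original shape: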
cols = df.select_dtypes('object').columns
df[cols] = df[cols].stack().astype('category').cat.codes.unstack()
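The difference between per-column codes and codes shared across the whole frame can be seen on a tiny example. This is only a sketch: the two-column frame `X` below is made up for illustration and assumes every field is a string with no missing values.

```python
import pandas as pd

# Hypothetical all-string frame; "y" appears in both columns.
X = pd.DataFrame({"a": ["x", "y"], "b": ["y", "z"]})

# Per-column encoding, as in the loop from the question:
# each column gets its own numbering, so "y" is 1 in "a" but 0 in "b".
per_column = X.apply(lambda s: s.astype("category").cat.codes)

# Frame-wide encoding: stack all values into one Series, encode once,
# then restore the original shape. "y" now gets the same code everywhere.
shared = X.stack().astype("category").cat.codes.unstack()
```

With the frame-wide version, the same string always maps to the same number regardless of the column it appears in.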

pandas dataframe for matrix of values

I have 3 things:
A time series 1D array of certain length.
A matrix of stellar flux values of equal column length as the time series (as each star in the field was observed according to the time array) but ~3000 rows deep as there are ~3000 observed stars in this field.
An array of ~3000 star ID's to go with the ~3000 time-series flux recordings mentioned above.
I'm trying to turn all of this into a pandas.DataFrame for extracting timeseries features using the module 'tsfresh'. Link here.
Does anyone have an idea on how to do this? It should read somewhat like a table with a row of ID's as headers, a column of time values and ~3000 columns of flux values for the stars.
I've seen examples of it being done on the page I've linked i.e. multiple 'value' columns (in this case they would be flux values). But no indication of how to construct them.
This data frame will then be used for machine learning if that makes any difference.
Many thanks for any help that can be offered!
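One possible way to assemble the table described above is sketched below. The array names (`time`, `flux`, `star_ids`) and the tiny sizes are assumptions for illustration, and whether tsfresh wants the wide or the long layout should be checked against its documentation.

```python
import numpy as np
import pandas as pd

# Hypothetical small stand-ins: 3 stars observed at 4 time steps.
time = np.array([0.0, 0.5, 1.0, 1.5])
star_ids = np.array(["star_a", "star_b", "star_c"])
flux = np.arange(12, dtype=float).reshape(3, 4)  # one row per star

# Wide layout: a time column plus one flux column per star ID.
# The flux matrix is transposed so that rows correspond to time steps.
wide = pd.DataFrame(flux.T, columns=star_ids)
wide.insert(0, "time", time)

# Long layout: one row per (time, star) pair with id / time / flux columns.
long = wide.melt(id_vars="time", var_name="id", value_name="flux")
```

The wide frame matches the table described in the question; the long frame holds the same data with one observation per row.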

Meaning of the colormap._lut list in matplotlib.color

If I understood correctly, a colormap is an object that defines a map from a set of numbers to colors. This map can be summarized by the "lut". To see the "lut" associated with a cmap, you have to call cmap._init(); then you can inspect cmap._lut. What does this mean? Isn't the lut an attribute of any cmap object? What precisely does ._init() do?
The lut is made up of N+3 rows and 4 columns, where N is the number of colours of the cmap.
The first N rows are the RGBA representation of the corresponding colours. What are the last three rows? What do they represent?
I hope my questions are not too stupid. Thanks!
I found an answer to the last question. The last three rows are the colours used for out-of-range low values, out-of-range high values, and masked values.

Why do we use multiple dimensional arrays?

I understand how multidimensional arrays work and how to use them, except for one thing: in what situations would we need to use them, and why?
Basically, multidimensional arrays are used when you want to put arrays inside an array.
Say you have 10 students and each writes 3 tests. You can create an array like: arr_name[10][3]
So, calling arr_name[0][0] gives you the result of student 1 on test 1.
Calling arr_name[5][2] gives you the result of student 6 on test 3.
You could do this with a 30-position array, but the multidimensional version is:
1) easier to understand
2) easier to debug.
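The student example can be sketched as follows, with nested lists standing in for a 2D array. The score values are invented for illustration.

```python
N_STUDENTS, N_TESTS = 10, 3

# 2D version: one inner list of test results per student.
scores = [[0] * N_TESTS for _ in range(N_STUDENTS)]
scores[0][0] = 71  # student 1, test 1
scores[5][2] = 88  # student 6, test 3

# Equivalent 30-position version: the index has to be computed by hand.
flat = [0] * (N_STUDENTS * N_TESTS)
flat[0 * N_TESTS + 0] = 71
flat[5 * N_TESTS + 2] = 88
```

Both hold the same data, but `scores[5][2]` states the intent directly, while `flat[17]` needs the reader to redo the arithmetic.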
Here are a couple of examples of arrays in familiar situations.
You might imagine a 2-dimensional array as a grid, so it is naturally useful when you're dealing with graphics. You might get a pixel from the screen by saying
pixel = screen[20][5] // get the pixel at row 20, column 5
That could also be done with a 3 dimensional array to represent 3d space.
An array could act like a spreadsheet. Here the rows are customers, and the columns are name, email, and date of birth.
name = customers[0][0]
email = customers[0][1]
dateofbirth = customers[0][2]
Really there is a more fundamental pattern underlying this: things have things, which have things, and so on. In a sense you're right to wonder whether you need multidimensional arrays, because there are other ways to represent that same pattern. They are just there for convenience. You could alternatively:
Have a single-dimensional array and do some math to make it act multidimensional. If you indexed pixels one by one, left to right and top to bottom, you would end up with a million or so elements. Divide an index by the width of the screen to get the row; the remainder is the column.
Use objects. Instead of using a multidimensional array in example 2, you could have a single-dimensional array of Customer objects. Each Customer object would have the attributes name, email and dob.
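Both alternatives can be sketched briefly. The screen size, helper names and customer data below are made up for illustration.

```python
from dataclasses import dataclass

# Alternative 1: a flat array plus index arithmetic.
WIDTH, HEIGHT = 4, 3  # hypothetical tiny "screen"
pixels = list(range(WIDTH * HEIGHT))


def to_row_col(index, width=WIDTH):
    """Quotient is the row, remainder is the column."""
    return divmod(index, width)


def to_index(row, col, width=WIDTH):
    """Inverse mapping from (row, col) back to the flat index."""
    return row * width + col


# Alternative 2: a one-dimensional list of objects accessed by name.
@dataclass
class Customer:
    name: str
    email: str
    dob: str


customers = [Customer("Ada", "ada@example.com", "1815-12-10")]
email = customers[0].email  # instead of customers[0][1]
```

The object version trades numeric column positions for attribute names, which tends to read more clearly when the columns mean different things.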
So there's rarely only one way to do something. Just choose the clearest one. With arrays you access elements by number; with objects you access them by name.
A multidimensional array is the intuitive choice when a data element is identified by a multidimensional vector, that is, when "which element" is determined by more than one "dimension".
Good uses for 2D arrays might be:
Matrix math, e.g. rotating things in space or on a plane, and more.
Maps, such as game maps (top or side views), holding either actual graphics or descriptive data.
Spreadsheet-like storage.
Multiple columns of display table data.
Various kinds of graphics work.
I know there could be many more, so maybe someone else can add to this list in their answers.