Quite a simple question, I hope. Basically, I want the same output without the first column.
import pandas as pd
ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
                     'kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
            'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
            'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
            'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)
df.loc[df['Team']=='Riders'].values.tolist()
Out [1]:
[['Riders', 1, 2014, 876],
['Riders', 2, 2015, 789],
['Riders', 2, 2016, 694],
['Riders', 2, 2017, 690]]
I want my output to be:
Out [1]:
[[1, 2014, 876],
[2, 2015, 789],
[2, 2016, 694],
[2, 2017, 690]]
You can do this,
df.loc[df['Team']=='Riders', ['Rank', 'Year', 'Points']].values.tolist()
Or if you want to select the columns without explicitly specifying column names,
columns = df.columns.values.tolist()[1:]
df.loc[df['Team']=='Riders', columns].values.tolist()
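As a sketch of yet another alternative, assuming you simply want everything except the filter column, you can also drop 'Team' by name after filtering:
# drop the filter column, then convert the remaining columns to a list
df.loc[df['Team'] == 'Riders'].drop(columns='Team').values.tolist()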
Use:
df.loc[df.Team == "Riders", df.columns[1:]].to_numpy().tolist()
to_numpy() is recommended instead of values according to the pandas documentation. Both give you NumPy arrays, so you can still use tolist().
You can select all columns except the first by position with DataFrame.iloc:
# pandas 0.24+
print(df.iloc[(df['Team']=='Riders').to_numpy(), 1:].to_numpy().tolist())
# older pandas versions
# print(df.iloc[(df['Team']=='Riders').values, 1:].values.tolist())
[[1, 2014, 876], [2, 2015, 789], [2, 2016, 694], [2, 2017, 690]]
Related
I have a dataframe containing the data, and another dataframe containing a single row of indices.
data = {'col_1': [4, 5, 6, 7], 'col_2': [3, 4, 9, 8],'col_3': [5, 5, 6, 9],'col_4': [8, 7, 6, 5]}
df = pd.DataFrame(data)
ind = {'ind_1': [2], 'ind_2': [1],'ind_3': [3],'ind_4': [2]}
ind = pd.DataFrame(ind)
Both have the same number of columns. I want to extract the values of df corresponding to the index stored in ind so that I get a single row at the end.
For this data it should be: [6, 4, 9, 6]. I tried df.loc[ind.loc[0]] but that of course gives me four different rows, not one.
The other idea I have is to zip columns and rows and iterate over them. But I feel there should be a simpler way.
You can go to the NumPy domain and index there:
In [14]: df.to_numpy()[ind, np.arange(len(df.columns))]
Out[14]: array([[6, 4, 9, 6]], dtype=int64)
This pairs up 2, 1, 3, 2 from ind with the column indices 0, 1, 2, 3 (0 to number of columns - 1), so we get the values at [2, 0], [1, 1] and so on.
There's also df.lookup but it's being deprecated, so...
In [19]: df.lookup(ind.iloc[0], df.columns)
~\Anaconda3\Scripts\ipython:1: FutureWarning: The 'lookup' method is deprecated and will be removed in a future version. You can use DataFrame.melt and DataFrame.loc as a substitute.
Out[19]: array([6, 4, 9, 6], dtype=int64)
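If you want the pairing spelled out without the deprecated lookup, the same idea can be written explicitly (a sketch, equivalent to the NumPy indexing above; rows and cols are just illustrative names):
import numpy as np
rows = ind.to_numpy().ravel()   # positional row indices: [2, 1, 3, 2]
cols = np.arange(df.shape[1])   # column positions:       [0, 1, 2, 3]
df.to_numpy()[rows, cols]       # -> array([6, 4, 9, 6])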
I have a list of pandas._libs.tslibs.timestamps.Timestamp values.
I need to convert them to datetime.datetime inside a pandas.core.series.Series.
to_pydatetime() works for one row, but not for the whole column.
df = pd.DataFrame({"year": [2015, 2016],
"month": [2, 3],
"day": [4, 5],
"hour": [2, 3]})
df=pd.to_datetime(df)
type(df.loc[0])
Out: pandas._libs.tslibs.timestamps.Timestamp
I want to change the values to datetime.datetime within the pandas.core.series.Series.
My desired output is below:
df
Out:
0 2015-02-04 02:00:00
1 2016-03-05 03:00:00
df.loc[0]
out: datetime.datetime(2015, 2, 4, 2, 0)
df.loc[1]
out: datetime.datetime(2016, 3, 5, 3, 0)
My question above may look strange, since I start from a DataFrame, end up with pandas._libs.tslibs.timestamps.Timestamp, and then try to go back to a DataFrame. However, I used this code just as an example.
What I actually have is a dataframe imported from an Excel file; I did some computation using pandas.tseries and finally got pandas._libs.tslibs.timestamps.Timestamp values, which I need to convert to datetime.datetime. I couldn't post my Excel files and the rest of the code, so I took the code above as an example.
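A minimal sketch of one way to do this, assuming the column is a datetime64 Series as produced above: the .dt accessor exposes to_pydatetime() for the whole column at once (s, py and out below are just illustrative names):
s = pd.to_datetime(df)                            # datetime64 Series, as in the question
py = s.dt.to_pydatetime()                         # NumPy array of datetime.datetime objects
out = pd.Series(py, index=s.index, dtype=object)  # keep Python datetimes, not Timestamps
type(out.loc[0])                                  # datetime.datetime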
I'm using this code to one-hot encode values:
idxs = np.array([1, 3, 2])
vals = np.zeros((idxs.size, idxs.max()+1))
vals[np.arange(idxs.size), idxs] = 1
But I would like to generalize it to k-hot encoding (where shape of vals would be same, but each row can contain k ones).
Unfortunately, I can't figure out how to index multiple columns from each row. I tried vals[0:2, [[0, 1], [3]]] to select the first and second columns from the first row and the third column from the second row, but it does not work.
It's called advanced indexing.
To "select first and second column from first row and third column from second row", you just need to pass the respective rows and columns in separate iterables (tuple, list):
In [9]: a
Out[9]:
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
In [10]: a[[0, 0, 1],[0, 1, 3]]
Out[10]: array([0, 1, 8])
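Applying the same idea to the k-hot case: flatten your per-row column lists into one row-index array and one column-index array, then assign in one shot (a sketch; hot, rows and cols are illustrative names):
import numpy as np
hot = [[0, 1], [3]]                                            # columns to switch on, per row
rows = np.repeat(np.arange(len(hot)), [len(c) for c in hot])   # [0, 0, 1]
cols = np.concatenate(hot)                                     # [0, 1, 3]
vals = np.zeros((len(hot), 4))
vals[rows, cols] = 1
# vals is now [[1., 1., 0., 0.],
#              [0., 0., 0., 1.]]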
I am having a lot of trouble understanding numpy indexing for multidimensional arrays. In this example that I am working with, let's say that I have a 2D array, A, which is 100x10. Then I have another array, B, which is a 100x1 1D array of values between 0-9 (indices for A). In MATLAB, I would use A(sub2ind(size(A), (1:size(A,1))', B)) to return, for each row of A, the value at the index stored in the corresponding row of B.
So, as a test case, let's say I have this:
A = np.random.rand(100,10)
B = np.int32(np.floor(np.random.rand(100)*10))
If I print their shapes, I get:
print A.shape returns (100L, 10L)
print B.shape returns (100L,)
When I try to index into A using B naively (incorrectly)
Test1 = A[:,B]
print Test1.shape returns (100L, 100L)
but if I do
Test2 = A[range(A.shape[0]),B]
print Test2.shape returns (100L,)
which is what I want. I'm having trouble understanding the distinction being made here. In my mind, A[:,5] and A[range(A.shape[0]),5] should return the same thing, but they don't here. How is : different from range(A.shape[0]), which just creates the indices 0 through A.shape[0] - 1?
Let's look at a simple array:
In [654]: X=np.arange(12).reshape(3,4)
In [655]: X
Out[655]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
With the slice we can pick 3 columns of X, in any order (and even repeated). In other words, take all the rows, but selected columns.
In [656]: X[:,[3,2,1]]
Out[656]:
array([[ 3, 2, 1],
[ 7, 6, 5],
[11, 10, 9]])
If instead I use a list (or array) of 3 values, it pairs them up with the column values, effectively picking 3 values, X[0,3],X[1,2],X[2,1]:
In [657]: X[[0,1,2],[3,2,1]]
Out[657]: array([3, 6, 9])
If instead I give it a column vector to index the rows, I get the same thing as with the slice:
In [659]: X[[[0],[1],[2]],[3,2,1]]
Out[659]:
array([[ 3, 2, 1],
[ 7, 6, 5],
[11, 10, 9]])
This amounts to picking 9 individual values, as generated by broadcasting:
In [663]: np.broadcast_arrays(np.arange(3)[:,None],np.array([3,2,1]))
Out[663]:
[array([[0, 0, 0],
[1, 1, 1],
[2, 2, 2]]),
array([[3, 2, 1],
[3, 2, 1],
[3, 2, 1]])]
NumPy indexing can be confusing, but a good starting point is this page: http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
I'd like to count the unique groups from the result of a Pandas group-by operation. For instance here is an example data frame.
In [98]: df = pd.DataFrame({'A': [1,2,3,1,2,3], 'B': [10,10,11,10,10,15]})
In [99]: df.groupby('A').groups
Out[99]: {1: [0, 3], 2: [1, 4], 3: [2, 5]}
The conceptual groups are {1: [10, 10], 2: [10, 10], 3: [11, 15]}, where the index locations in the groups above are substituted with the values from column B. The first problem I've run into is how to convert those positions (e.g. [0, 3]) into values from the B column.
Given the ability to convert the groups into the value groups from column B, I can compute the unique groups by hand, but a secondary question here is whether Pandas has a built-in routine for this; I haven't seen one.
Edit: updated with target output.
This is the output I would be looking for in the simplest case:
{1: [10, 10], 2: [10, 10], 3: [11, 15]}
And counting the unique groups would produce something equivalent to:
{[10, 10]: 2, [11, 15]: 1}
How about:
>>> df = pd.DataFrame({'A': [1,2,3,1,2,3], 'B': [10,10,11,10,10,15]})
>>> df.groupby("A")["B"].apply(tuple).value_counts()
(10, 10) 2
(11, 15) 1
dtype: int64
or maybe
>>> df.groupby("A")["B"].apply(lambda x: tuple(sorted(x))).value_counts()
(10, 10) 2
(11, 15) 1
dtype: int64
if you don't care about the order within the group.
You can trivially call .to_dict() if you'd like, e.g.
>>> df.groupby("A")["B"].apply(tuple).value_counts().to_dict()
{(11, 15): 1, (10, 10): 2}
Maybe:
>>> df.groupby('A')['B'].aggregate(lambda ts: list(ts.values)).to_dict()
{1: [10, 10], 2: [10, 10], 3: [11, 15]}
For counting the groups, you need to convert to tuples because lists are not hashable:
>>> ts = df.groupby('A')['B'].aggregate(lambda ts: tuple(ts.values))
>>> ts.value_counts().to_dict()
{(11, 15): 1, (10, 10): 2}