I have the following sample df:
   idd   x   y
0    1   2   3
1    1   3   4
2    1   5   6
3    2   7  10
4    2   9   8
5    3  11  12
6    3  13  14
7    3  15  16
8    3  17  18
I want to group by "idd", find the min of x and y within each group, and store the result in a new df along with "idd".
In the above df, I expect xmin for idd=1 to be 2 and ymin for idd=1 to be 3; for idd=2, xmin should be 7 and ymin should be 8, and so on.
Expecting df:
   idd  xmin  ymin
0    1     2     3
1    2     7     8
2    3    11    12
Code tried:
for idd, group in df.groupby("idd"):
    box = [df['x'].min(), df['y'].min()]
but it finds the min of x and y of the whole column and not as per "idd".
Here's a slightly different approach, without the rename:
df = df.groupby('idd').min().add_suffix('min').reset_index()
   idd  xmin  ymin
0    1     2     3
1    2     7     8
2    3    11    12
You can use groupby and then take min for each group.
df.groupby('idd').min().reset_index().rename(columns={'x':'xmin','y':'ymin'})
Out[105]:
   idd  xmin  ymin
0    1     2     3
1    2     7     8
2    3    11    12
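If you prefer to name the output columns explicitly in one step, named aggregation (available since pandas 0.25) is a sketch of the same idea:
# Named aggregation: each keyword becomes an output column name,
# paired with the (input column, aggregation function) to apply.
df.groupby('idd', as_index=False).agg(xmin=('x', 'min'), ymin=('y', 'min'))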
I have the following table.
Table_1
ID  12/1  1/1  2/1
X      1    2    3
Y      4    5    6
Z      7    8    9
I want the following table:
ID  Date  Forecast
X   12/1  1
X   1/1   2
X   2/1   3
Y   12/1  4
Y   1/1   5
Y   2/1   6
Z   12/1  7
Z   1/1   8
Z   2/1   9
Is there any way I can do this in SQL?
Any help will be appreciated!
Thanks in advance.
UNPIVOT or the VALUES approach would be more performant, but based on your column names I suspect you will have a variable/expanding set of columns over time.
Here is an approach that will dynamically unpivot your data without actually using dynamic SQL or having to specify the columns (only the ones to exclude).
Example
Select A.ID
      ,B.*
 From  YourTable A
 Cross Apply (
              -- Serialize the whole row to JSON, then shred it into
              -- key/value pairs: each remaining column name becomes
              -- Date and its value becomes Forecast.
              Select Date     = [Key]
                    ,Forecast = Value
               From  OpenJson((Select A.* For JSON Path, Without_Array_Wrapper, INCLUDE_NULL_VALUES))
               Where [Key] not in ('ID')
             ) B
Results
ID  Date  Forecast
X   12/1  1
X   1/1   2
X   2/1   3
Y   12/1  4
Y   1/1   5
Y   2/1   6
Z   12/1  7
Z   1/1   8
Z   2/1   9
x = df.groupby(["Customer ID", "Category"]).sum().sort_values(by="Value", ascending=False)
I want to group by Customer ID, but when I use the above code it duplicates customers...
Here is the result:
Source DF:
  Customer ID Category  Value
0           A        x      5
1           B        y      5
2           B        z      6
3           C        x      7
4           A        z      2
5           B        x      5
6           A        x      1
new: https://ufile.io/dpruz
I think you are looking for something like this:
df_out = df.groupby(['Customer ID', 'Category']).sum()
# Reorder the outer level by each customer's total Value
# (groupby(level=0).sum() replaces the deprecated df_out.sum(level=0)).
df_out.reindex(df_out.groupby(level=0).sum().sort_values('Value', ascending=False).index, level=0)
Output:
                      Value
Customer ID Category
B           x             5
            y             5
            z             6
A           x             6
            z             2
C           x             7
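A sketch of an equivalent, arguably more readable reordering, using the same df_out as above: compute per-customer totals, then select the outer labels in that order with .loc.
# Per-customer totals, sorted descending, give the desired outer-level order;
# .loc with a list of level-0 labels returns the groups in that order.
order = df_out['Value'].groupby(level=0).sum().sort_values(ascending=False)
df_out.loc[order.index]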
I am trying to use Pandas to represent motion-capture data, which has T measurements of the (x, y, z) locations of each of N markers. For example, with T=3 and N=4, the raw CSV data looks like:
T,Ax,Ay,Az,Bx,By,Bz,Cx,Cy,Cz,Dx,Dy,Dz
0,1,2,1,3,2,1,4,2,1,5,2,1
1,8,2,3,3,2,9,9,1,3,4,9,1
2,4,5,7,7,7,1,8,3,6,9,2,3
This is really simple to load into a DataFrame, and I've learned a few tricks that are easy (converting marker data to z-scores, or computing velocities, for example).
One thing I'd like to do, though, is convert the "flat" data shown above into a format that has a hierarchical index on the column (marker), so that there would be N columns at level 0 (one for each marker), and each one of those would have 3 columns at level 1 (one each for x, y, and z).
   A        B        C        D
   x  y  z  x  y  z  x  y  z  x  y  z
0  1  2  1  3  2  1  4  2  1  5  2  1
1  8  2  3  3  2  9  9  1  3  4  9  1
2  4  5  7  7  7  1  8  3  6  9  2  3
I know how to do this by loading up the flat file and then manipulating the Series objects directly, perhaps by using append or just creating a new DataFrame using a manually-created MultiIndex.
As a Pandas learner, it feels like there must be a way to do this with less effort, but it's hard to discover. Is there an easier way?
In your case, you basically just need to manipulate the column names.
Starting with your original DataFrame (and a tiny index manipulation):
from io import StringIO  # Python 3; on Python 2 this was "from StringIO import StringIO"
import pandas as pd

a = pd.read_csv(StringIO('''T,Ax,Ay,Az,Bx,By,Bz,Cx,Cy,Cz,Dx,Dy,Dz
0,1,2,1,3,2,1,4,2,1,5,2,1
1,8,2,3,3,2,9,9,1,3,4,9,1
2,4,5,7,7,7,1,8,3,6,9,2,3'''))
a.set_index('T', inplace=True)
So that:
>> a
   Ax  Ay  Az  Bx  By  Bz  Cx  Cy  Cz  Dx  Dy  Dz
T
0   1   2   1   3   2   1   4   2   1   5   2   1
1   8   2   3   3   2   9   9   1   3   4   9   1
2   4   5   7   7   7   1   8   3   6   9   2   3
Then simply create a list of tuples for your columns, and use MultiIndex.from_tuples:
# Each name like 'Ax' splits into marker ('A') and axis ('x');
# this assumes single-character marker names.
a.columns = pd.MultiIndex.from_tuples([(c[0], c[1]) for c in a.columns])
>> a
   A        B        C        D
   x  y  z  x  y  z  x  y  z  x  y  z
T
0  1  2  1  3  2  1  4  2  1  5  2  1
1  8  2  3  3  2  9  9  1  3  4  9  1
2  4  5  7  7  7  1  8  3  6  9  2  3
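If the marker names can ever be longer than one character, a small variation on the same idea still works (a sketch, assuming the axis letter is always the last character of each column name):
# Everything before the final character is the marker name;
# the final character is the axis (x, y, or z).
a.columns = pd.MultiIndex.from_tuples([(c[:-1], c[-1]) for c in a.columns])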
I'm trying to use pandas to select a single result from a group of results, where some column has a minimum value. An example table representing my data frame is:
ID   q  A  B  C  D
------------------
 1  10  1  2  3  4
 1   5  5  6  7  8
 2   1  9  1  2  3
 2   2  8  7  6  5
I would like to group by ID and then select the row that has the smallest q for each group, so the second row for ID=1 and the first row for ID=2 should be selected.
So far I can only manage to select the lowest values of each column, which is not what I need. Thanks a lot to anybody who can offer some guidance.
This should do what you're asking:
In [10]: df.groupby('ID').apply(lambda x: x.loc[x['q'].idxmin()])
Out[10]:
    ID  q  A  B  C  D
ID
1    1  5  5  6  7  8
2    2  1  9  1  2  3
Apply a function that, for each group, returns the row at the index of the group's minimum 'q' value.
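A sketch of an equivalent approach that avoids apply entirely, assuming the row labels of df are unique:
# idxmin returns, per group, the index label of the minimum q;
# .loc then pulls exactly those rows from the original frame.
df.loc[df.groupby('ID')['q'].idxmin()]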