Display rows only if the group of rows' sum is greater than 0 - SQL

I have a table like the one below. I would like to bring this data into SSRS (grouped by LineID and Product, with Hour as the column group) and show only those LineID/Product groups whose HourCount values are not all zero.
LineID Product Hour HourCount
3 A 0 0
3 A 1 0
3 A 2 0
3 A 3 0
3 A 4 0
3 A 5 0
3 B 0 65
3 B 1 56
3 B 2 45
3 B 3 34
3 B 4 43
3 B 5 45
4 A 0 54
4 A 1 34
4 A 2 45
4 A 3 44
4 A 4 55
4 A 5 44
4 B 0 0
4 B 1 0
4 B 2 0
4 B 3 0
4 B 4 0
4 B 5 0
5 A 0 45
5 A 1 77
5 A 2 66
5 A 3 55
5 A 4 0
5 A 5 0
5 B 0 0
5 B 1 0
5 B 2 45
5 B 3 0
5 B 4 0
5 B 5 0
Basically I would like this table to look like this before it's in SSRS:
LineID Product Hour HourCount
3 B 0 65
3 B 1 56
3 B 2 45
3 B 3 34
3 B 4 43
3 B 5 45
4 A 0 54
4 A 1 34
4 A 2 45
4 A 3 44
4 A 4 55
4 A 5 44
5 A 0 45
5 A 1 77
5 A 2 66
5 A 3 55
5 A 4 0
5 A 5 0
5 B 0 0
5 B 1 0
5 B 2 45
5 B 3 0
5 B 4 0
5 B 5 0
So a Product should be displayed for a line only if any of its Hours has an HourCount higher than 0.
Is there a query that could give me these results, or should I play with display settings in SSRS?

Something like this should work:
with NonZero as
(
    select *
        -- total HourCount for the row's LineID/Product group, repeated on every row
        , GroupTotal = sum(HourCount) over (partition by LineID, Product)
    from HourTable
)
select LineID
    , Product
    , [Hour]
    , HourCount
from NonZero
where GroupTotal > 0
SQL Fiddle with demo.
You could certainly do something similar in SSRS, but it's much easier and more intuitive to apply the filter at the T-SQL level.
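As an aside for the pandas-based questions below, the same group-level filter can be sketched with a grouped transform. This is only a minimal sketch; the DataFrame name hours and its few rows are made up to stand in for HourTable:
import pandas as pd

# Made-up frame standing in for a few rows of HourTable
hours = pd.DataFrame({
    'LineID':    [3, 3, 3, 3, 3, 3],
    'Product':   ['A', 'A', 'A', 'B', 'B', 'B'],
    'Hour':      [0, 1, 2, 0, 1, 2],
    'HourCount': [0, 0, 0, 65, 56, 45],
})

# Keep a row only if its LineID/Product group has a non-zero total
group_sum = hours.groupby(['LineID', 'Product'])['HourCount'].transform('sum')
print(hours[group_sum > 0])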

I think you are looking for:
SELECT LineID, Product, Hour, COUNT(Hour) AS HourCount
FROM abc
GROUP BY LineID, Product, Hour
HAVING COUNT(Hour) > 0

Related

Dataframe within a Dataframe - to create new column

For the following dataframe:
import pandas as pd

df = pd.DataFrame({'list_A': [3, 3, 3, 3, 3,
                              2, 2, 2, 2, 2, 2, 2,
                              4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4]})
How can 'list_A' be manipulated to give 'list_B'?
Desired output:
    list_A  list_B
0        3       1
1        3       1
2        3       1
3        3       0
4        2       1
5        2       1
6        2       0
7        2       0
8        4       1
9        4       1
10       4       1
11       4       1
12       4       0
13       4       0
14       4       0
15       4       0
16       4       0
As you can see, if list_A has the value 3, then the first 3 values of list_B are 1, after which list_B changes to 0 until list_A changes value again.
Use GroupBy.cumcount to compare each value with its running position within its group:
df['list_B'] = df['list_A'].gt(df.groupby('list_A').cumcount()).astype(int)
print(df)
Output
list_A list_B
0 3 1
1 3 1
2 3 1
3 3 0
4 3 0
5 2 1
6 2 1
7 2 0
8 2 0
9 2 0
10 2 0
11 2 0
12 4 1
13 4 1
14 4 1
15 4 1
16 4 0
17 4 0
18 4 0
19 4 0
20 4 0
21 4 0
22 4 0
23 4 0
EDIT: if the same value can appear in more than one consecutive run, group by the runs instead so the count restarts at each run:
blocks = df['list_A'].ne(df['list_A'].shift()).cumsum()
df['list_B'] = df['list_A'].gt(df.groupby(blocks).cumcount()).astype(int)
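A quick check with made-up data where the same value appears in two separate runs shows what the run-based grouping changes (the names demo and blocks here are just for illustration):
import pandas as pd

# Made-up example: the value 3 occurs in two separate runs
demo = pd.DataFrame({'list_A': [3, 3, 3, 3, 2, 2, 3, 3]})

# Every change in list_A starts a new block, so cumcount restarts per run
blocks = demo['list_A'].ne(demo['list_A'].shift()).cumsum()
demo['list_B'] = demo['list_A'].gt(demo.groupby(blocks).cumcount()).astype(int)
print(demo)
# list_B comes out as 1 1 1 0 1 1 1 1, i.e. the second run of 3s gets its own
# leading 1s instead of continuing the count from the first run of 3s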

Python - Sort column ascending - using groupby

The following code:
import pandas as pd
df_original = pd.DataFrame({
    'race_num': [1, 1, 1, 2, 2, 2, 2, 3, 3],
    'race_position': [2, 3, 0, 1, 0, 0, 2, 3, 0],
    'percentage_place': [77, 55, 88, 50, 34, 56, 99, 12, 75]
})
Gives an output of:
race_num  race_position  percentage_place
1         2              77
1         3              55
1         0              88
2         1              50
2         0              34
2         0              56
2         2              99
3         3              12
3         0              75
I need to manipulate this dataframe to keep the race_num grouping but sort percentage_place within each group (highest first, as in the desired output below), with race_position staying aligned with its original percentage_place.
Desired output is:
race_num  race_position  percentage_place
1         0              88
1         2              77
1         3              55
2         2              99
2         0              56
2         1              50
2         0              34
3         0              75
3         3              12
My attempt is:
df_new = df_1.groupby(['race_num','race_position'])['percentage_place'].nlargest().reset_index()
Thank you in advance.
Look into sort_values
In [137]: df_original.sort_values(['race_num', 'percentage_place'], ascending=[True, False])
Out[137]:
race_num race_position percentage_place
2 1 0 88
0 1 2 77
1 1 3 55
6 2 2 99
5 2 0 56
3 2 1 50
4 2 0 34
8 3 0 75
7 3 3 12
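If the original row labels (2, 0, 1, ...) are not wanted in the result, the same call can be followed by a reset of the index. A small sketch, reusing the df_original defined in the question:
# Same sort, with the old index dropped so rows are numbered 0..n-1
df_new = (df_original
          .sort_values(['race_num', 'percentage_place'], ascending=[True, False])
          .reset_index(drop=True))
print(df_new)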

Pandas groupby: divide the dataset into subgroups based on user input and assign label numbers to each subgroup

Here is my data:
ID Mnth Amt Flg
B 1 10 0
B 2 12 0
B 3 14 0
B 4 41 0
B 5 134 0
B 6 14 0
B 7 134 0
B 8 134 0
B 9 12 0
B 10 41 0
B 11 4 0
B 12 14 0
B 12 14 0
A 1 34 0
A 2 22 0
A 3 56 0
A 4 129 0
A 5 40 0
A 6 20 0
A 7 58 0
A 8 123 0
If I give 3 as input, my output should be:
ID Mnth Amt Flg Level_Flag
B 1 10 0 0
B 2 12 0 1
B 3 14 0 1
B 4 41 0 1
B 5 134 0 2
B 6 14 0 2
B 7 134 0 2
B 8 134 0 3
B 9 12 0 3
B 10 41 0 3
B 11 4 0 4
B 12 14 0 4
B 12 14 0 4
A 1 34 0 0
A 2 22 0 0
A 3 56 0 1
A 4 129 0 1
A 5 40 0 1
A 6 20 0 2
A 7 58 0 2
A 8 123 0 2
So basically I want to divide the data into subgroups of 3 rows each, working from the bottom up, and label those subgroups as shown in the Level_Flag column. I have other IDs such as A, C and so on, so I want to do this for each ID group. Thanks in advance.
Edit: I want the same thing done after grouping by ID.
First we determine the number of labels, nums, by dividing the length of your df by n and rounding up. Then we repeat each of those labels n times. Finally we reverse the array, chop it off at the length of df, and reverse it one more time so that the incomplete chunk ends up at the top with the lowest label.
import numpy as np

def create_flags(d, n):
    nums = np.ceil(len(d) / n)  # one label per chunk of n rows, rounded up
    # repeat labels n times, reverse, trim to the group length, reverse again
    level_flag = np.repeat(np.arange(nums), n)[::-1][:len(d)][::-1]
    return level_flag

df['Level_Flag'] = df.groupby('ID')['ID'].transform(lambda x: create_flags(x, 3))
ID Mnth Amt Flg Level_Flag
0 B 1 10 0 0.0
1 B 2 12 0 1.0
2 B 3 14 0 1.0
3 B 4 41 0 1.0
4 B 5 134 0 2.0
5 B 6 14 0 2.0
6 B 7 134 0 2.0
7 B 8 134 0 3.0
8 B 9 12 0 3.0
9 B 10 41 0 3.0
10 B 11 4 0 4.0
11 B 12 14 0 4.0
12 B 12 14 0 4.0
To remove the incomplete rows, use GroupBy.transform:
m = df.groupby(['ID', 'Level_Flag'])['Level_Flag'].transform('count').ge(3)
df = df[m]
ID Mnth Amt Flg Level_Flag
1 B 2 12 0 1.0
2 B 3 14 0 1.0
3 B 4 41 0 1.0
4 B 5 134 0 2.0
5 B 6 14 0 2.0
6 B 7 134 0 2.0
7 B 8 134 0 3.0
8 B 9 12 0 3.0
9 B 10 41 0 3.0
10 B 11 4 0 4.0
11 B 12 14 0 4.0
12 B 12 14 0 4.0
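A possible variation (not from the original answer) that yields integer labels instead of the floats above, using the same bottom-up chunking; create_flags_int is a hypothetical helper name:
import numpy as np

def create_flags_int(d, n):
    # Same bottom-up chunks of size n: the last n rows share the highest
    # label and any leftover rows at the top share label 0
    m = len(d)
    nums = -(-m // n)  # ceiling division without floats
    return nums - 1 - (m - 1 - np.arange(m)) // n

df['Level_Flag'] = df.groupby('ID')['ID'].transform(lambda x: create_flags_int(x, 3))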

Pandas get order of column value grouped by other column value

I have the following dataframe:
srch_id price
1 30
1 20
1 25
3 15
3 102
3 39
Now I want to create a third column in which I determine the price position grouped by the search id. This is the result I want:
srch_id price price_position
1 30 3
1 20 1
1 25 2
3 15 1
3 102 3
3 39 2
I think I need to use the transform function. However I can't seem to figure out how I should handle the argument I get using .transform():
def k(r):
    return min(r)

tmp = train.groupby('srch_id')['price']
train['min'] = tmp.transform(k)
Because r is either a list or an element?
You can use Series.rank() with df.groupby():
df['price_position']=df.groupby('srch_id')['price'].rank()
print(df)
srch_id price price_position
0 1 30 3.0
1 1 20 1.0
2 1 25 2.0
3 3 15 1.0
4 3 102 3.0
5 3 39 2.0
Or, with sort_values and cumcount:
df['price_position'] = df.sort_values('price').groupby('srch_id').price.cumcount() + 1
Out[1907]:
srch_id price price_position
0 1 30 3
1 1 20 1
2 1 25 2
3 3 15 1
4 3 102 3
5 3 39 2
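One more detail worth noting: rank averages ties by default, so two equal prices within the same srch_id would share a fractional position; method='first' breaks ties by order of appearance and keeps the result castable to int. A small self-contained sketch with the question's sample data:
import pandas as pd

df = pd.DataFrame({'srch_id': [1, 1, 1, 3, 3, 3],
                   'price': [30, 20, 25, 15, 102, 39]})

# Integer positions; ties (if any) are broken by order of appearance
df['price_position'] = (df.groupby('srch_id')['price']
                          .rank(method='first')
                          .astype(int))
print(df)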

Pandas Group By two columns and based on the value in one of them (categorical) write data into a specific column [duplicate]

This question already has answers here: How can I pivot a dataframe? (5 answers). Closed 4 years ago.
I have following dataframe:
df = pd.DataFrame([
    [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3],
    ['A', 'B', 'B', 'B', 'C', 'D', 'D', 'E', 'A', 'C', 'C', 'C', 'A', 'B', 'B', 'B', 'B', 'D', 'E'],
    [18, 25, 47, 27, 31, 55, 13, 19, 73, 55, 58, 14, 2, 46, 33, 35, 24, 60, 7]
]).T
df.columns = ['Brand_ID', 'Category', 'Price']
Brand_ID Category Price
0 1 A 18
1 1 B 25
2 1 B 47
3 1 B 27
4 1 C 31
5 1 D 55
6 1 D 13
7 1 E 19
8 2 A 73
9 2 C 55
10 2 C 58
11 2 C 14
12 3 A 2
13 3 B 46
14 3 B 33
15 3 B 35
16 3 B 24
17 3 D 60
18 3 E 7
What I need to do is to group by Brand_ID and category and count (similar to the first part of this question). However, I need instead to write the output into a different column depending on the category. So my Output should look like follows:
Brand_ID Category_A Category_B Category_C Category_D Category_E
0 1 1 3 1 2 1
1 2 1 0 3 0 0
2 3 1 4 0 1 1
Is there any possibility to do this directly with pandas?
Try:
df.groupby(['Brand_ID','Category'])['Price'].count()\
.unstack(fill_value=0)\
.add_prefix('Category_')\
.reset_index()\
.rename_axis([None], axis=1)
Output
Brand_ID Category_A Category_B Category_C Category_D Category_E
0 1 1 3 1 2 1
1 2 1 0 3 0 0
2 3 1 4 0 1 1
OR
pd.crosstab(df.Brand_ID, df.Category)\
.add_prefix('Category_')\
.reset_index()\
.rename_axis([None], axis=1)
You're describing a pivot_table:
df.pivot_table(index='Brand_ID', columns='Category', aggfunc='size', fill_value=0)
Output:
Category A B C D E
Brand_ID
1 1 3 1 2 1
2 1 0 3 0 0
3 1 4 0 1 1
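If the pivot_table route should also match the exact layout of the expected output (Category_ prefix, Brand_ID as a regular column), the same chaining used in the groupby answer should apply here too; a sketch:
out = (df.pivot_table(index='Brand_ID', columns='Category',
                      aggfunc='size', fill_value=0)
         .add_prefix('Category_')
         .reset_index()
         .rename_axis([None], axis=1))
print(out)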