I have a dataframe in the form of:
no ans freq
1 Yes 23
No 89
2 Yes 45
No 76
3 Yes 99
I would like to drop the groups that have only Yes or only No as the second index level (no and ans are the index levels). This would give:
no ans freq
1 Yes 23
No 89
2 Yes 45
No 76
You could group by "no" and use transform to get each group's size, then keep the rows whose group size is 2 and drop the rest:
df[df.groupby(level='no')['freq'].transform('size').eq(2)]
input:
freq
no ans
1 Yes 23
No 89
2 Yes 45
No 76
3 Yes 99
output:
freq
no ans
1 Yes 23
No 89
2 Yes 45
No 76
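To make the group-size filter concrete, here is a minimal, self-contained sketch that rebuilds the sample frame from the question and applies the same expression. The variable names idx, group_sizes and result are only illustrative. Note that a size check assumes each "no" group has at most one Yes row and one No row, as in the sample data.

```python
import pandas as pd

# Rebuild the sample frame with a two-level index (no, ans).
idx = pd.MultiIndex.from_tuples(
    [(1, "Yes"), (1, "No"), (2, "Yes"), (2, "No"), (3, "Yes")],
    names=["no", "ans"],
)
df = pd.DataFrame({"freq": [23, 89, 45, 76, 99]}, index=idx)

# For every row, compute how many rows share its "no" value.
group_sizes = df.groupby(level="no")["freq"].transform("size")

# Keep only the rows whose group has exactly two members.
result = df[group_sizes.eq(2)]
print(result)
```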
I have a mock table, table_a, as below:
id a b c d
1 11 22 33 44
2 22 33 44 55
3 33 44 55 66
4 44 55 66 77
5 55 66 77 88
6 66 77 88 99
7 77 88 99 100
8 88 99 11 22
Suppose the known values are c and d. If I want to get the entries with id 2 and 6, I can run
select * from table_a where (c, d) in ((44,55), (88,99))
Here is my question: if this table has 1 million rows and I want to get 1 thousand rows out just by knowing their c and d values, is there a better way to do it? My concern with the query above is performance. Thanks.
If you have an index on (c, d), then Oracle should use the index for the in query:
create index idx_table_a_c_d on table_a(c, d);
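With around a thousand (c, d) pairs, one alternative to a very long IN list is to load the pairs into a small lookup table and join against it, so the composite index can still serve each lookup. The sketch below illustrates that idea with SQLite through Python's sqlite3 module as a stand-in for Oracle, so it says nothing about Oracle's actual execution plan. The table name wanted is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_a (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, c INTEGER, d INTEGER)"
)
conn.executemany(
    "INSERT INTO table_a VALUES (?, ?, ?, ?, ?)",
    [
        (1, 11, 22, 33, 44), (2, 22, 33, 44, 55), (3, 33, 44, 55, 66),
        (4, 44, 55, 66, 77), (5, 55, 66, 77, 88), (6, 66, 77, 88, 99),
        (7, 77, 88, 99, 100), (8, 88, 99, 11, 22),
    ],
)

# Composite index on the two lookup columns, as in the answer above.
conn.execute("CREATE INDEX idx_table_a_c_d ON table_a (c, d)")

# Hypothetical lookup table holding the known (c, d) pairs.
conn.execute("CREATE TEMP TABLE wanted (c INTEGER, d INTEGER)")
conn.executemany("INSERT INTO wanted VALUES (?, ?)", [(44, 55), (88, 99)])

# Join on both columns instead of spelling out every pair in an IN list.
ids = [
    row[0]
    for row in conn.execute(
        "SELECT t.id FROM table_a AS t "
        "JOIN wanted AS w ON t.c = w.c AND t.d = w.d "
        "ORDER BY t.id"
    )
]
print(ids)
```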
I want to replace the last 2 values of one column with zero. I understand that for NaN values I can use .fillna(0), but I would also like to replace the row 6 value of the last column.
Weight Name Age d_id_max
0 45 Sam 14 2
1 88 Andrea 25 1
2 56 Alex 55 1
3 15 Robin 8 3
4 71 Kia 21 3
5 44 Sia 43 2
6 54 Ryan 45 1
7 34 Dimi 65 NaN
df.drop(df.tail(2).index,inplace=True)
Weight Name Age d_id_max
0 45 Sam 14 2
1 88 Andrea 25 1
2 56 Alex 55 1
3 15 Robin 8 3
4 71 Kia 21 3
5 44 Sia 43 2
6 54 Ryan 45 0
7 34 Dimi 65 0
Before pandas 0.20.0 this was a job for ix, but ix is now deprecated. So you can use
DataFrame.iloc to select the last rows, together with Index.get_loc to get the position of column d_id_max:
df.iloc[-2:, df.columns.get_loc('d_id_max')] = 0
print (df)
Weight Name Age d_id_max
0 45 Sam 14 2.0
1 88 Andrea 25 1.0
2 56 Alex 55 1.0
3 15 Robin 8 3.0
4 71 Kia 21 3.0
5 44 Sia 43 2.0
6 54 Ryan 45 0.0
7 34 Dimi 65 0.0
Or DataFrame.loc with indexing index values:
df.loc[df.index[-2:], 'd_id_max'] = 0
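As a quick check that the positional and label-based forms agree, the following sketch rebuilds the sample frame and applies both to separate copies. The names by_position and by_label are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Weight": [45, 88, 56, 15, 71, 44, 54, 34],
    "Name": ["Sam", "Andrea", "Alex", "Robin", "Kia", "Sia", "Ryan", "Dimi"],
    "Age": [14, 25, 55, 8, 21, 43, 45, 65],
    "d_id_max": [2, 1, 1, 3, 3, 2, 1, np.nan],
})

# Positional: last two rows, column located by its integer position.
by_position = df.copy()
by_position.iloc[-2:, by_position.columns.get_loc("d_id_max")] = 0

# Label-based: last two index labels, column selected by name.
by_label = df.copy()
by_label.loc[by_label.index[-2:], "d_id_max"] = 0

print(by_position)
```

Both forms assign in a single indexing step, which avoids the pitfalls of chained assignment.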
Try .iloc and get_loc
df.iloc[[-1,-2], df.columns.get_loc('d_id_max')] = 0
Out[232]:
Weight Name Age d_id_max
0 45 Sam 14 2.0
1 88 Andrea 25 1.0
2 56 Alex 55 1.0
3 15 Robin 8 3.0
4 71 Kia 21 3.0
5 44 Sia 43 2.0
6 54 Ryan 45 0.0
7 34 Dimi 65 0.0
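You can use the following, although it is chained assignment: it may raise SettingWithCopyWarning, and with Copy-on-Write enabled it does not modify df, so the single-step df.loc or df.iloc forms are safer: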
df['d_id_max'].iloc[-2:] = 0
Weight Name Age d_id_max
0 45 Sam 14 2.0
1 88 Andrea 25 1.0
2 56 Alex 55 1.0
3 15 Robin 8 3.0
4 71 Kia 21 3.0
5 44 Sia 43 2.0
6 54 Ryan 45 0.0
7 34 Dimi 65 0.0
I have a bit of a weird pandas question.
I have a master Dataframe:
a b c
0 22 44 55
1 22 45 22
2 44 23 56
3 45 22 33
I then have a second dataframe with a different shape, which holds some overlapping index values and column names:
index col_name new_value
0 a 111
3 b 234
If there is a match on both index and col_name in the master dataframe, I want to replace the value there with new_value.
So the output would be
a b c
0 111 44 55
1 22 45 22
2 44 23 56
3 45 234 33
I found combine_first, but it doesn't work unless I pivot the second dataframe (which I can't do in this scenario).
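This is a job for update (pivot takes keyword-only arguments in pandas 2.0 and later, so the columns are named explicitly):
df.update(updated.pivot(index='index', columns='col_name', values='new_value'))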
df
Out[479]:
a b c
0 111.0 44.0 55
1 22.0 45.0 22
2 44.0 23.0 56
3 45.0 234.0 33
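Or write into the underlying array directly (this relies on df.values being a writable view, which holds only when all columns share one dtype and Copy-on-Write is not enabled):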
df.values[updated['index'].values,df.columns.get_indexer(updated.col_name)]=updated.new_value.values
df
Out[495]:
a b c
0 111 44 55
1 22 45 22
2 44 23 56
3 45 234 33
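If neither pivoting nor writing through the underlying array is an option, a plain loop over the update rows with DataFrame.at is a third possibility. This is an alternative sketch, not one of the methods above, and it is only reasonable when the update table is small. The name patched is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [22, 22, 44, 45], "b": [44, 45, 23, 22], "c": [55, 22, 56, 33]})
updated = pd.DataFrame({"index": [0, 3], "col_name": ["a", "b"], "new_value": [111, 234]})

patched = df.copy()

# Each update row is an (index label, column name, new value) triple.
for row_label, col_name, new_value in updated.itertuples(index=False, name=None):
    # .at sets a single cell addressed by row label and column label.
    patched.at[row_label, col_name] = new_value

print(patched)
```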
I have a SQL Server table with a column price looking like this:
10
96
64
38
32
103
74
32
67
103
55
28
30
110
79
91
16
71
36
106
89
87
59
41
56
89
68
32
80
47
45
77
64
93
17
88
13
19
83
12
76
99
104
65
83
95
Now my aim is to create a new column giving a category from 1 to 10 to each of those values.
For instance the max value in my column is 110 the min is 10. Max-min = 100. Then if I want to have 10 categories I do 100/10= 10. Therefore here are the ranges:
10-20 1
21-30 2
31-40 3
41-50 4
51-60 5
61-70 6
71-80 7
81-90 8
91-100 9
101-110 10
Desired output:
my new column called cat should look like this:
price cat
-----------------
10 1
96 9
64 6
38 3
32 3
103 10
74 7
32 3
67 6
103 10
55 5
28 2
30 3
110 10
79 7
91 9
16 1
71 7
36 3
106 10
89 8
87 8
59 5
41 4
56 5
89 8
68 6
32 3
80 7
47 4
45 4
77 7
64 6
93 9
17 1
88 8
13 1
19 1
83 8
12 1
76 7
99 9
104 10
65 6
83 8
95 9
Is there a way to do this with T-SQL? Sorry if this question is too easy. I searched the web for a long time, so either the problem is not as simple as I imagine or I entered the wrong keywords.
Yes, almost exactly as you describe the calculation:
select price,
       1 + (price - min_price) * 10 / (max_price - min_price + 1) as decile
from (select price,
             min(price) over () as min_price,
             max(price) over () as max_price
      from t
     ) t;
The 1 + is because you want the values from 1 to 10, rather than 0 to 9.
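To see how the integer arithmetic maps prices onto the ranges in the question, here is a small Python sketch of the same formula. The function name price_category is hypothetical, and Python's // is used in place of SQL Server's integer division (the two agree for non-negative operands).

```python
def price_category(price, min_price, max_price, buckets=10):
    """Mirror of: 1 + (price - min_price) * 10 / (max_price - min_price + 1)."""
    span = max_price - min_price + 1  # 101 for min 10 and max 110
    return 1 + (price - min_price) * buckets // span


# Boundary values from the range table, plus a few prices from the column.
prices = [10, 20, 21, 30, 31, 96, 103, 110]
cats = [price_category(p, 10, 110) for p in prices]
print(list(zip(prices, cats)))
# → [(10, 1), (20, 1), (21, 2), (30, 2), (31, 3), (96, 9), (103, 10), (110, 10)]
```

With this formula 30 lands in category 2, which follows the 21-30 row of the range table.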
Yes - a case statement can do that.
select
price
,case
when price between 10 and 20 then 1
when price between 21 and 30 then 2
when price between 31 and 40 then 3
when price between 41 and 50 then 4
when price between 51 and 60 then 5
when price between 61 and 70 then 6
when price between 71 and 80 then 7
when price between 81 and 90 then 8
when price between 91 and 100 then 9
when price between 101 and 110 then 10
else null
end as cat
from [<enter your table name here>]
I have a table SUB_Inst with columns id, low and high. How would I query the low and high numbers to return a new column with one row for each number from low to high?
Current table SUB_Inst
id low High
1 55 63
2 232 234
3 4 7
etc.
Desired Results
id low High Num_list
1 55 63 55
1 55 63 56
1 55 63 57
1 55 63 58
1 55 63 59
1 55 63 60
1 55 63 61
1 55 63 62
1 55 63 63
2 232 234 232
2 232 234 233
2 232 234 234
3 4 7 4
3 4 7 5
3 4 7 6
3 4 7 7
etc.
I tried something like this:
SELECT Low, HIGH,
(SELECT CAST(number as varchar)+','
FROM NUMBERS
WHERE number >= Low and number <= High
FOR XML PATH(''))
FROM SUB_Inst
but it returned all the numbers in one field, like this, which won't work:
Low High Num_List
24 27 24,25,26,27,
34 36 34,35,36,
10 17 10,11,12,13,14,15,16,17,
34 36 34,35,36,
65 67 65,66,67,
502 504 502,503,504,
56 59 56,57,58,59,
Thank you.
I think you want this:
SELECT s.id, s.low, s.high, n.number AS Num_List
FROM SUB_Inst s
JOIN NUMBERS n ON n.number BETWEEN s.low AND s.high
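The range join can be tried end to end with the sketch below, which uses SQLite through Python's sqlite3 module as a stand-in for SQL Server. It assumes NUMBERS is a plain tally table with one integer per row (filled here with 1 to 1000), and it adds an ORDER BY only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

conn.execute("CREATE TABLE SUB_Inst (id INTEGER, low INTEGER, high INTEGER)")
conn.executemany(
    "INSERT INTO SUB_Inst VALUES (?, ?, ?)",
    [(1, 55, 63), (2, 232, 234), (3, 4, 7)],
)

# Assumed shape of the tally table: one row per integer.
conn.execute("CREATE TABLE NUMBERS (number INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO NUMBERS VALUES (?)", [(n,) for n in range(1, 1001)])

# Each SUB_Inst row is paired with every number inside its [low, high] range.
rows = conn.execute(
    """
    SELECT s.id, s.low, s.high, n.number AS Num_List
    FROM SUB_Inst AS s
    JOIN NUMBERS AS n ON n.number BETWEEN s.low AND s.high
    ORDER BY s.id, n.number
    """
).fetchall()

for row in rows:
    print(row)
```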