How to replace the last n values of a column with zero - pandas

I want to replace the last 2 values of one of the columns with zero. I understand that for NaN values I can use .fillna(0), but I would also like to replace the row 6 value of the last column.
Weight Name Age d_id_max
0 45 Sam 14 2
1 88 Andrea 25 1
2 56 Alex 55 1
3 15 Robin 8 3
4 71 Kia 21 3
5 44 Sia 43 2
6 54 Ryan 45 1
7 34 Dimi 65 NaN
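I tried
df.drop(df.tail(2).index,inplace=True)
but that deletes the last two rows instead of setting their values to zero. Expected output: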
Weight Name Age d_id_max
0 45 Sam 14 2
1 88 Andrea 25 1
2 56 Alex 55 1
3 15 Robin 8 3
4 71 Kia 21 3
5 44 Sia 43 2
6 54 Ryan 45 0
7 34 Dimi 65 0

Before pandas 0.20.0 this was a job for ix, but ix is now deprecated (and was removed in pandas 1.0). Instead you can use DataFrame.iloc to select the last rows, together with Index.get_loc to get the position of the column d_id_max:
df.iloc[-2:, df.columns.get_loc('d_id_max')] = 0
print (df)
Weight Name Age d_id_max
0 45 Sam 14 2.0
1 88 Andrea 25 1.0
2 56 Alex 55 1.0
3 15 Robin 8 3.0
4 71 Kia 21 3.0
5 44 Sia 43 2.0
6 54 Ryan 45 0.0
7 34 Dimi 65 0.0
Or use DataFrame.loc, selecting the last index labels:
df.loc[df.index[-2:], 'd_id_max'] = 0
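A self-contained sketch of both forms, generalized to the last n rows. The helper name zero_last_n is hypothetical, and the sketch assumes the target column is numeric. It guards n <= 0 because iloc[-0:] would select every row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Weight": [45, 88, 56, 15, 71, 44, 54, 34],
    "Name": ["Sam", "Andrea", "Alex", "Robin", "Kia", "Sia", "Ryan", "Dimi"],
    "Age": [14, 25, 55, 8, 21, 43, 45, 65],
    "d_id_max": [2, 1, 1, 3, 3, 2, 1, np.nan],
})


def zero_last_n(frame: pd.DataFrame, col: str, n: int) -> pd.DataFrame:
    """Hypothetical helper: return a copy of frame with the last n values of col set to 0."""
    out = frame.copy()
    if n <= 0:
        # iloc[-0:] is the same as iloc[0:], i.e. every row, so change nothing instead.
        return out
    # iloc is purely positional, so translate the column label into its position first.
    out.iloc[-n:, out.columns.get_loc(col)] = 0
    return out


by_position = zero_last_n(df, "d_id_max", 2)

# Label-based equivalent: take the last two index labels and assign through loc.
by_label = df.copy()
by_label.loc[by_label.index[-2:], "d_id_max"] = 0

print(by_position)
```

Because d_id_max contains a NaN it is a float64 column, and assigning 0 keeps it float64, which is why the outputs in this thread print 2.0, 1.0 and so on.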

Try .iloc together with get_loc:
df.iloc[[-1,-2], df.columns.get_loc('d_id_max')] = 0
Weight Name Age d_id_max
0 45 Sam 14 2.0
1 88 Andrea 25 1.0
2 56 Alex 55 1.0
3 15 Robin 8 3.0
4 71 Kia 21 3.0
5 44 Sia 43 2.0
6 54 Ryan 45 0.0
7 34 Dimi 65 0.0

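You can use:
df['d_id_max'].iloc[-2:] = 0
Note that this is chained assignment: it works in older pandas, recent 2.x releases may warn about it, and under Copy-on-Write (the default from pandas 3.0) it no longer modifies df at all. Assigning through df.loc or df.iloc in a single step avoids the problem.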
Weight Name Age d_id_max
0 45 Sam 14 2.0
1 88 Andrea 25 1.0
2 56 Alex 55 1.0
3 15 Robin 8 3.0
4 71 Kia 21 3.0
5 44 Sia 43 2.0
6 54 Ryan 45 0.0
7 34 Dimi 65 0.0

Related

How to interchange rows and columns in a pandas DataFrame

I'm reading a CSV file using the pandas library. I want to interchange rows and columns, but the main issue is the Status column: its values repeat every three rows, so transpose turns all the row values into columns. Instead I want only three columns, i.e. Confirmed, Recovered and Deceased, for every date. Please find the attachment where I have shown a sample input as well as the sample output.
It's a case for using stack() and unstack():
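import random
import pandas as pd

s = 10  # number of dates
d = pd.date_range("01-Jan-2021", periods=s)
cols = ["TT","AN","AP"]  # one column per state
# Build the long input: three Status rows per date, random counts per state
df = pd.DataFrame([{**{"Date":dd, "Status":st}, **{c:random.randint(1,50) for c in cols}}
                   for dd in d
                   for st in ["Confirmed","Recovered","Deceased"]])
# Move the state columns into the index (stack), then pull Status out into columns (unstack)
out = df.set_index(["Date","Status"]).rename_axis(columns="State").stack().unstack(1)
print(out)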
Before (the first 12 of the 30 generated rows; the values are random):
Date Status TT AN AP
0 2021-01-01 Confirmed 5 44 17
1 2021-01-01 Recovered 44 5 48
2 2021-01-01 Deceased 27 3 24
3 2021-01-02 Confirmed 33 14 38
4 2021-01-02 Recovered 21 15 6
5 2021-01-02 Deceased 15 37 8
6 2021-01-03 Confirmed 15 20 36
7 2021-01-03 Recovered 18 19 44
8 2021-01-03 Deceased 37 22 1
9 2021-01-04 Confirmed 16 35 37
10 2021-01-04 Recovered 30 45 49
11 2021-01-04 Deceased 35 7 18
After (first four dates):
Status Confirmed Deceased Recovered
Date State
2021-01-01 TT 5 27 44
AN 44 3 5
AP 17 24 48
2021-01-02 TT 33 15 21
AN 14 37 15
AP 38 8 6
2021-01-03 TT 15 37 18
AN 20 22 19
AP 36 1 44
2021-01-04 TT 16 35 30
AN 35 7 45
AP 37 18 49
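An equivalent reshape can also be sketched with melt followed by pivot, which some readers find easier to follow than stack/unstack. The small fixed frame below (two dates, values copied from the sample output) stands in for the CSV; passing a list as index to pivot needs pandas 1.1 or later.

```python
import pandas as pd

# Deterministic long-format input in the shape described in the question:
# three Status rows per date, one column per state.
df = pd.DataFrame({
    "Date": ["2021-01-01"] * 3 + ["2021-01-02"] * 3,
    "Status": ["Confirmed", "Recovered", "Deceased"] * 2,
    "TT": [5, 44, 27, 33, 21, 15],
    "AN": [44, 5, 3, 14, 15, 37],
    "AP": [17, 48, 24, 38, 6, 8],
})

# melt turns the state columns into rows: one row per (Date, Status, State).
long = df.melt(id_vars=["Date", "Status"], var_name="State", value_name="cases")

# pivot then spreads Status back out, giving one column per status.
wide = long.pivot(index=["Date", "State"], columns="Status", values="cases")
print(wide)
```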

BigQuery: count rows where two values are greater than or equal to the current row's values, and find the overall percentage

Let's say I have a table of millions of records resulting from a simulation; here is a sample:
TO Sim DUR Cost
1 1 20 145
1 2 24 120
1 3 27 176
1 4 30 170
1 5 23 173
1 6 26 148
1 7 21 175
1 8 22 171
1 9 23 169
1 10 23 178
2 1 23 172
2 2 29 152
2 3 25 162
2 4 20 179
2 5 26 154
2 6 27 137
2 7 27 131
2 8 28 148
2 9 25 156
2 10 22 169
How can I do this calculation in BigQuery, i.e. find the percentage of rows that satisfy two conditions? (I could write a UDF, but I would like it all to be plain SQL.)
The Excel equivalent of the new calculated column would be
=countifs($C$2:$C$21,">="&C2,$D$2:$D$21,">="&D2,$A$2:$A$21,A2) / countif($A$2:$A$21,A2)
that is, for each row, the fraction of rows with the same TO whose DUR and Cost are both greater than or equal to that row's values.
The results would look like:
TO Sim DUR Cost f0
1 1 20 145 0.90
1 2 24 120 0.40
1 3 27 176 0.10
1 4 30 170 0.10
1 5 23 173 0.30
1 6 26 148 0.30
1 7 21 175 0.30
1 8 22 171 0.40
1 9 23 169 0.50
1 10 23 178 0.10
2 1 23 172 0.10
2 2 29 152 0.10
2 3 25 162 0.10
2 4 20 179 0.10
2 5 26 154 0.10
2 6 27 137 0.30
2 7 27 131 0.40
2 8 28 148 0.20
2 9 25 156 0.20
2 10 22 169 0.20
Below is for BigQuery Standard SQL
#standardSQL
SELECT ANY_VALUE(a).*, COUNTIF(b.dur >= a.dur AND b.cost >= a.cost) / COUNT(1) calc
FROM `project.dataset.table` a
JOIN `project.dataset.table` b
USING (to_)
GROUP BY FORMAT('%t', a)
-- ORDER BY to_, sim
Applied to the sample data from your question, the result is:
Row to_ sim dur cost calc
1 1 1 20 145 0.9
2 1 2 24 120 0.4
3 1 3 27 176 0.1
4 1 4 30 170 0.1
5 1 5 23 173 0.3
6 1 6 26 148 0.3
7 1 7 21 175 0.3
8 1 8 22 171 0.4
9 1 9 23 169 0.5
10 1 10 23 178 0.1
11 2 1 23 172 0.1
12 2 2 29 152 0.1
13 2 3 25 162 0.1
14 2 4 20 179 0.1
15 2 5 26 154 0.1
16 2 6 27 137 0.3
17 2 7 27 131 0.4
18 2 8 28 148 0.2
19 2 9 25 156 0.2
20 2 10 22 169 0.2
Note: I am using the field name to_ instead of to, because TO is a reserved keyword in BigQuery and cannot be used as a column name unless it is quoted with backticks.
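To check the self-join logic locally, here is a sketch that runs the same idea in SQLite through Python's built-in sqlite3 module on the sample rows from the question. It is a translation, not BigQuery: SQLite has no COUNTIF, so SUM(CASE ...) counts the matching pairs, and it groups by the row's columns instead of FORMAT('%t', a), assuming (to_, sim) identifies a row. The table name sims is hypothetical.

```python
import sqlite3

# Sample rows from the question: (to_, sim, dur, cost).
rows = [
    (1, 1, 20, 145), (1, 2, 24, 120), (1, 3, 27, 176), (1, 4, 30, 170), (1, 5, 23, 173),
    (1, 6, 26, 148), (1, 7, 21, 175), (1, 8, 22, 171), (1, 9, 23, 169), (1, 10, 23, 178),
    (2, 1, 23, 172), (2, 2, 29, 152), (2, 3, 25, 162), (2, 4, 20, 179), (2, 5, 26, 154),
    (2, 6, 27, 137), (2, 7, 27, 131), (2, 8, 28, 148), (2, 9, 25, 156), (2, 10, 22, 169),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sims (to_ INTEGER, sim INTEGER, dur INTEGER, cost INTEGER)")
conn.executemany("INSERT INTO sims VALUES (?, ?, ?, ?)", rows)

# Pair every row a with every row b of the same to_ group, then count the pairs
# where b is >= a on both dur and cost. "* 1.0" forces real (not integer) division.
query = """
SELECT a.to_, a.sim, a.dur, a.cost,
       SUM(CASE WHEN b.dur >= a.dur AND b.cost >= a.cost THEN 1 ELSE 0 END) * 1.0
           / COUNT(*) AS calc
FROM sims AS a
JOIN sims AS b ON b.to_ = a.to_
GROUP BY a.to_, a.sim, a.dur, a.cost
ORDER BY a.to_, a.sim
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Because every row is joined to every row of its group, the work grows with the square of the group size, so with millions of rows per TO value this self-join can become expensive.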

Pandas Dataframe Merging

I have a bit of a weird pandas question.
I have a master Dataframe:
a b c
0 22 44 55
1 22 45 22
2 44 23 56
3 45 22 33
I then have a DataFrame in a different shape (long format) that has some overlapping index values and column names:
index col_name new_value
0 a 111
3 b 234
What I'm trying to say is: if a row's index and col_name match a cell in the master DataFrame, replace that cell's value with new_value.
So the output would be:
a b c
0 111 44 55
1 22 45 22
2 44 23 56
3 45 234 33
I've found combine_first, but it doesn't work unless I pivot the second DataFrame (which I can't do in this scenario).
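This is an update problem: pivot updated into the master's layout and let DataFrame.update overwrite the matching cells (since pandas 2.0, pivot only accepts keyword arguments):
df.update(updated.pivot(index='index', columns='col_name', values='new_value'))
print(df)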
a b c
0 111.0 44.0 55
1 22.0 45.0 22
2 44.0 23.0 56
3 45.0 234.0 33
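Or write straight into the underlying NumPy array. This treats updated['index'] as row positions (which matches the labels here only because df has the default RangeIndex) and relies on df.values being a writable view, which holds for a single-dtype frame but not under Copy-on-Write:
df.values[updated['index'].values,df.columns.get_indexer(updated.col_name)]=updated.new_value.values
print(df)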
a b c
0 111 44 55
1 22 45 22
2 44 23 56
3 45 234 33
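A label-based alternative, sketched under the assumption that every (index, col_name) pair in updated exists in the master frame: assign each cell with DataFrame.at. The frames are rebuilt from the question's sample data. In the update output above, columns a and b came back as floats; in this example the loop keeps them integer.

```python
import pandas as pd

master = pd.DataFrame({"a": [22, 22, 44, 45], "b": [44, 45, 23, 22], "c": [55, 22, 56, 33]})
updated = pd.DataFrame({"index": [0, 3], "col_name": ["a", "b"], "new_value": [111, 234]})

# Each (index, col_name) pair addresses exactly one cell by label, so no pivot
# and no positional assumptions about the index are needed.
for idx, col, value in zip(updated["index"], updated["col_name"], updated["new_value"]):
    master.at[idx, col] = value

print(master)
```

A Python loop is slower than update for very large update frames, but it touches only the listed cells and leaves the other dtypes alone.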

SQL Server: create a new category column according to the price column

I have a SQL Server table with a column price looking like this:
10
96
64
38
32
103
74
32
67
103
55
28
30
110
79
91
16
71
36
106
89
87
59
41
56
89
68
32
80
47
45
77
64
93
17
88
13
19
83
12
76
99
104
65
83
95
Now my aim is to create a new column giving a category from 1 to 10 to each of those values.
For instance, the max value in my column is 110 and the min is 10, so max - min = 100. If I want 10 categories, each one spans 100/10 = 10. Therefore the ranges are:
10-20 1
21-30 2
31-40 3
41-50 4
51-60 5
61-70 6
71-80 7
81-90 8
91-100 9
101-110 10
Desired output: my new column, called cat, should look like this:
price cat
-----------------
10 1
96 9
64 6
38 3
32 3
103 10
74 7
32 3
67 6
103 10
55 5
28 2
30 2
110 10
79 7
91 9
16 1
71 7
36 3
106 10
89 8
87 8
59 5
41 4
56 5
89 8
68 6
32 3
80 7
47 4
45 4
77 7
64 6
93 9
17 1
88 8
13 1
19 1
83 8
12 1
76 7
99 9
104 10
65 6
83 8
95 9
Is there a way to do this with T-SQL? Sorry if this question is too easy; I searched the web for a long time, so either the problem is not as simple as I imagine or I used the wrong keywords.
Yes, almost exactly as you describe the calculation:
select price,
1 + (price - min_price) * 10 / (max_price - min_price + 1) as decile
from (select price,
min(price) over () as min_price,
max(price) over () as max_price
from t
) t;
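The 1 + is because you want the values from 1 to 10, rather than 0 to 9. The + 1 in the denominator keeps the maximum price in category 10 instead of an eleventh category, and the division truncates only because price is an integer column; if price is decimal or float, wrap the quotient in FLOOR().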
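The window-function query can be exercised in SQLite, which also supports MIN/MAX ... OVER () (from SQLite 3.25) and, like SQL Server, truncates when dividing two integers. This sketch is a SQLite stand-in, not T-SQL: it loads the prices from the question into a table t and compares the result with the ranges listed in the question. The helper range_category is hypothetical and hard-codes those ranges (min 10, max 110).

```python
import sqlite3

prices = [10, 96, 64, 38, 32, 103, 74, 32, 67, 103, 55, 28, 30, 110, 79, 91, 16, 71, 36, 106,
          89, 87, 59, 41, 56, 89, 68, 32, 80, 47, 45, 77, 64, 93, 17, 88, 13, 19, 83, 12,
          76, 99, 104, 65, 83, 95]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (price INTEGER)")
conn.executemany("INSERT INTO t (price) VALUES (?)", [(p,) for p in prices])

# Same arithmetic as the answer; integer columns make "/" truncate.
query = """
SELECT price,
       1 + (price - min_price) * 10 / (max_price - min_price + 1) AS decile
FROM (SELECT price,
             MIN(price) OVER () AS min_price,
             MAX(price) OVER () AS max_price
      FROM t) AS sub
"""
result = conn.execute(query).fetchall()


def range_category(price):
    # Hypothetical: the question's table, 10-20 -> 1, 21-30 -> 2, ..., 101-110 -> 10.
    return max(1, (price - 1) // 10)


for price, decile in result:
    print(price, decile, range_category(price))
```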
Yes, a CASE expression can do that, with the ranges hard-coded:
select
price
,case
when price between 10 and 20 then 1
when price between 21 and 30 then 2
when price between 31 and 40 then 3
when price between 41 and 50 then 4
when price between 51 and 60 then 5
when price between 61 and 70 then 6
when price between 71 and 80 then 7
when price between 81 and 90 then 8
when price between 91 and 100 then 9
when price between 101 and 110 then 10
else null
end as cat
from [<enter your table name here>]

How to eliminate a unique record using SQL?

I have a table with 3 columns: cid is the user, when is the timestamp of a transaction, and the third column is my attempt at achieving my objective.
In DB2, using this query:
SELECT cid, when, ROW_NUMBER() OVER (PARTITION BY cid ORDER BY when ASC) AS cid_when_rank
FROM (SELECT DISTINCT cid, when FROM yrb_purchase ORDER BY cid) AS temp
I get this table:
CID WHEN CID_WHEN_RANK
1 1999-04-20-12.12.00.000000 1
1 2001-12-01-11.59.00.000000 2
2 1998-08-08-17.33.00.000000 1
2 1999-02-13-15.13.00.000000 2
2 1999-04-16-11.46.00.000000 3
2 2001-02-23-12.37.00.000000 4
2 2001-04-24-17.02.00.000000 5
2 2001-10-21-11.05.00.000000 6
2 2001-12-01-15.39.00.000000 7
3 1998-01-27-09.19.00.000000 1
3 2001-10-06-11.12.00.000000 2
4 2000-06-13-09.45.00.000000 1
4 2001-06-30-13.58.00.000000 2
4 2001-08-11-17.40.00.000000 3
5 2001-07-17-16.27.00.000000 1
6 2000-05-18-11.43.00.000000 1
6 2001-07-08-18.09.00.000000 2
6 2001-10-02-12.37.00.000000 3
7 1999-06-15-12.13.00.000000 1
7 2000-05-05-14.49.00.000000 2
7 2000-09-26-16.32.00.000000 3
8 1999-01-19-09.32.00.000000 1
8 1999-08-02-09.20.00.000000 2
8 2000-07-03-12.39.00.000000 3
8 2001-08-13-13.11.00.000000 4
8 2001-10-18-10.18.00.000000 5
9 2001-09-10-13.03.00.000000 1
10 2000-03-11-10.05.00.000000 1
10 2001-03-11-15.46.00.000000 2
10 2001-04-29-18.30.00.000000 3
11 2001-07-27-11.45.00.000000 1
12 1999-02-07-10.59.00.000000 1
12 2001-08-24-11.12.00.000000 2
13 1998-03-17-14.04.00.000000 1
13 2001-05-18-10.11.00.000000 2
13 2001-09-14-12.56.00.000000 3
14 2001-10-10-17.18.00.000000 1
15 2000-12-01-18.27.00.000000 1
16 2000-01-04-14.18.00.000000 1
16 2001-02-27-15.08.00.000000 2
16 2001-11-16-09.52.00.000000 3
17 1998-04-08-17.59.00.000000 1
17 1999-06-07-10.13.00.000000 2
17 2001-09-13-12.08.00.000000 3
18 2001-09-22-10.01.00.000000 1
19 1999-03-09-12.11.00.000000 1
19 2001-07-23-09.27.00.000000 2
19 2001-12-01-16.10.00.000000 3
20 1999-11-22-14.29.00.000000 1
20 2000-05-27-17.56.00.000000 2
20 2001-06-01-09.37.00.000000 3
21 1998-02-17-16.08.00.000000 1
21 2000-02-15-13.22.00.000000 2
21 2001-03-10-15.05.00.000000 3
21 2001-03-10-16.22.00.000000 4
21 2001-10-25-10.15.00.000000 5
21 2001-11-19-11.02.00.000000 6
22 2001-03-04-17.13.00.000000 1
22 2001-08-16-16.59.00.000000 2
22 2001-10-23-11.24.00.000000 3
23 1998-07-04-16.33.00.000000 1
23 2000-09-26-13.17.00.000000 2
23 2000-09-27-12.27.00.000000 3
23 2001-06-23-16.45.00.000000 4
23 2001-10-27-18.01.00.000000 5
24 2001-10-23-14.59.00.000000 1
25 2001-03-14-09.26.00.000000 1
25 2001-11-30-14.23.00.000000 2
26 2001-04-27-15.07.00.000000 1
26 2001-06-30-11.26.00.000000 2
26 2001-12-01-18.04.00.000000 3
27 2000-06-05-09.44.00.000000 1
28 1999-07-17-10.14.00.000000 1
28 2001-02-03-15.50.00.000000 2
28 2001-02-13-12.08.00.000000 3
28 2001-07-20-16.52.00.000000 4
29 2001-06-10-17.16.00.000000 1
29 2001-09-20-10.19.00.000000 2
30 1999-05-22-16.59.00.000000 1
30 2001-10-20-15.28.00.000000 2
30 2001-12-01-14.50.00.000000 3
32 1999-05-05-14.20.00.000000 1
32 2000-05-12-13.51.00.000000 2
32 2001-05-18-10.43.00.000000 3
33 1999-02-07-18.58.00.000000 1
33 1999-09-30-14.05.00.000000 2
33 2001-09-18-12.48.00.000000 3
34 1999-05-29-15.57.00.000000 1
35 2001-03-19-18.38.00.000000 1
35 2001-03-28-15.49.00.000000 2
36 1999-06-22-11.42.00.000000 1
36 1999-10-30-15.25.00.000000 2
36 2000-01-27-10.17.00.000000 3
36 2000-11-04-09.06.00.000000 4
37 1999-01-11-09.51.00.000000 1
37 2000-11-25-17.53.00.000000 2
37 2000-12-01-17.21.00.000000 3
37 2001-10-21-16.49.00.000000 4
38 1997-10-11-17.15.00.000000 1
39 2000-03-09-13.46.00.000000 1
39 2001-01-09-16.22.00.000000 2
39 2001-07-03-14.12.00.000000 3
40 1998-07-27-17.39.00.000000 1
40 1999-01-27-09.36.00.000000 2
40 1999-06-12-17.18.00.000000 3
40 2000-05-17-14.17.00.000000 4
40 2001-04-08-15.39.00.000000 5
40 2001-09-30-10.26.00.000000 6
41 1998-06-05-10.06.00.000000 1
41 1998-08-23-09.39.00.000000 2
41 1999-12-01-18.42.00.000000 3
41 2001-03-30-15.26.00.000000 4
41 2001-11-15-15.33.00.000000 5
42 2000-06-22-12.16.00.000000 1
42 2001-01-13-15.03.00.000000 2
42 2001-08-19-14.18.00.000000 3
43 1998-07-07-11.29.00.000000 1
43 1999-01-22-15.46.00.000000 2
43 2000-08-04-12.16.00.000000 3
43 2001-03-17-14.18.00.000000 4
44 1999-11-03-09.32.00.000000 1
44 2001-05-26-17.23.00.000000 2
44 2001-07-18-12.59.00.000000 3
44 2001-10-23-10.04.00.000000 4
44 2001-11-09-16.18.00.000000 5
45 2000-03-19-10.31.00.000000 1
45 2001-07-14-11.36.00.000000 2
I am trying to eliminate all the customers (cid) who have made only one purchase; cid=5 and cid=9 are examples. The logic is that if a customer has a cid_when_rank=1 but no cid_when_rank=2, I need to drop those tuples. I have been racking my brain with INTERSECT, EXCEPT, and conditions in the WHERE clause, but no luck. I searched online for how to eliminate non-duplicated records, but all I found was people discovering the DISTINCT keyword.
Please do not suggest hard-coding cid=5 or cid=9, as there are more such customers in the table.
Can you please suggest a simple SQL way to get this done? I am not very strong at SQL yet, so I would appreciate the most basic answer.
Thanks in advance!
EDIT #1
When I tried the first and second suggested answers, my result went from 127 records to 287. I am simply trying to remove the records where a cid has a rank of 1 and no rank of 2. Hope you can help.
Both suggested answers yield the same table:
CID WHEN CID_WHEN_RANK
1 1999-04-20-12.12.00.000000 1
1 2001-12-01-11.59.00.000000 2
1 2001-12-01-11.59.00.000000 3
1 2001-12-01-11.59.00.000000 4
1 2001-12-01-11.59.00.000000 5
2 1998-08-08-17.33.00.000000 1
2 1998-08-08-17.33.00.000000 2
2 1999-02-13-15.13.00.000000 3
2 1999-04-16-11.46.00.000000 4
2 2001-02-23-12.37.00.000000 5
2 2001-04-24-17.02.00.000000 6
2 2001-04-24-17.02.00.000000 7
2 2001-04-24-17.02.00.000000 8
2 2001-10-21-11.05.00.000000 9
2 2001-10-21-11.05.00.000000 10
2 2001-12-01-15.39.00.000000 11
3 1998-01-27-09.19.00.000000 1
3 1998-01-27-09.19.00.000000 2
3 1998-01-27-09.19.00.000000 3
3 2001-10-06-11.12.00.000000 4
3 2001-10-06-11.12.00.000000 5
3 2001-10-06-11.12.00.000000 6
3 2001-10-06-11.12.00.000000 7
3 2001-10-06-11.12.00.000000 8
4 2000-06-13-09.45.00.000000 1
4 2001-06-30-13.58.00.000000 2
4 2001-06-30-13.58.00.000000 3
4 2001-06-30-13.58.00.000000 4
4 2001-08-11-17.40.00.000000 5
5 2001-07-17-16.27.00.000000 1
5 2001-07-17-16.27.00.000000 2
5 2001-07-17-16.27.00.000000 3
5 2001-07-17-16.27.00.000000 4
5 2001-07-17-16.27.00.000000 5
5 2001-07-17-16.27.00.000000 6
5 2001-07-17-16.27.00.000000 7
6 2000-05-18-11.43.00.000000 1
6 2000-05-18-11.43.00.000000 2
6 2000-05-18-11.43.00.000000 3
6 2001-07-08-18.09.00.000000 4
6 2001-07-08-18.09.00.000000 5
6 2001-10-02-12.37.00.000000 6
7 1999-06-15-12.13.00.000000 1
7 1999-06-15-12.13.00.000000 2
7 2000-05-05-14.49.00.000000 3
7 2000-09-26-16.32.00.000000 4
8 1999-01-19-09.32.00.000000 1
8 1999-08-02-09.20.00.000000 2
8 2000-07-03-12.39.00.000000 3
8 2000-07-03-12.39.00.000000 4
8 2001-08-13-13.11.00.000000 5
8 2001-10-18-10.18.00.000000 6
8 2001-10-18-10.18.00.000000 7
9 2001-09-10-13.03.00.000000 1
9 2001-09-10-13.03.00.000000 2
9 2001-09-10-13.03.00.000000 3
9 2001-09-10-13.03.00.000000 4
9 2001-09-10-13.03.00.000000 5
9 2001-09-10-13.03.00.000000 6
9 2001-09-10-13.03.00.000000 7
9 2001-09-10-13.03.00.000000 8
10 2000-03-11-10.05.00.000000 1
10 2001-03-11-15.46.00.000000 2
10 2001-03-11-15.46.00.000000 3
10 2001-04-29-18.30.00.000000 4
10 2001-04-29-18.30.00.000000 5
11 2001-07-27-11.45.00.000000 1
11 2001-07-27-11.45.00.000000 2
11 2001-07-27-11.45.00.000000 3
11 2001-07-27-11.45.00.000000 4
11 2001-07-27-11.45.00.000000 5
12 1999-02-07-10.59.00.000000 1
12 2001-08-24-11.12.00.000000 2
12 2001-08-24-11.12.00.000000 3
12 2001-08-24-11.12.00.000000 4
13 1998-03-17-14.04.00.000000 1
13 2001-05-18-10.11.00.000000 2
13 2001-05-18-10.11.00.000000 3
13 2001-05-18-10.11.00.000000 4
13 2001-09-14-12.56.00.000000 5
14 2001-10-10-17.18.00.000000 1
14 2001-10-10-17.18.00.000000 2
14 2001-10-10-17.18.00.000000 3
14 2001-10-10-17.18.00.000000 4
14 2001-10-10-17.18.00.000000 5
14 2001-10-10-17.18.00.000000 6
14 2001-10-10-17.18.00.000000 7
14 2001-10-10-17.18.00.000000 8
15 2000-12-01-18.27.00.000000 1
15 2000-12-01-18.27.00.000000 2
15 2000-12-01-18.27.00.000000 3
15 2000-12-01-18.27.00.000000 4
15 2000-12-01-18.27.00.000000 5
16 2000-01-04-14.18.00.000000 1
16 2001-02-27-15.08.00.000000 2
16 2001-02-27-15.08.00.000000 3
16 2001-02-27-15.08.00.000000 4
16 2001-11-16-09.52.00.000000 5
16 2001-11-16-09.52.00.000000 6
16 2001-11-16-09.52.00.000000 7
17 1998-04-08-17.59.00.000000 1
17 1999-06-07-10.13.00.000000 2
17 2001-09-13-12.08.00.000000 3
17 2001-09-13-12.08.00.000000 4
17 2001-09-13-12.08.00.000000 5
18 2001-09-22-10.01.00.000000 1
18 2001-09-22-10.01.00.000000 2
18 2001-09-22-10.01.00.000000 3
19 1999-03-09-12.11.00.000000 1
19 1999-03-09-12.11.00.000000 2
19 1999-03-09-12.11.00.000000 3
19 2001-07-23-09.27.00.000000 4
19 2001-07-23-09.27.00.000000 5
19 2001-07-23-09.27.00.000000 6
19 2001-12-01-16.10.00.000000 7
19 2001-12-01-16.10.00.000000 8
19 2001-12-01-16.10.00.000000 9
19 2001-12-01-16.10.00.000000 10
19 2001-12-01-16.10.00.000000 11
20 1999-11-22-14.29.00.000000 1
20 1999-11-22-14.29.00.000000 2
20 2000-05-27-17.56.00.000000 3
20 2001-06-01-09.37.00.000000 4
20 2001-06-01-09.37.00.000000 5
21 1998-02-17-16.08.00.000000 1
21 2000-02-15-13.22.00.000000 2
21 2001-03-10-15.05.00.000000 3
21 2001-03-10-15.05.00.000000 4
21 2001-03-10-15.05.00.000000 5
21 2001-03-10-16.22.00.000000 6
21 2001-10-25-10.15.00.000000 7
21 2001-11-19-11.02.00.000000 8
21 2001-11-19-11.02.00.000000 9
21 2001-11-19-11.02.00.000000 10
21 2001-11-19-11.02.00.000000 11
22 2001-03-04-17.13.00.000000 1
22 2001-03-04-17.13.00.000000 2
22 2001-03-04-17.13.00.000000 3
22 2001-03-04-17.13.00.000000 4
22 2001-08-16-16.59.00.000000 5
22 2001-10-23-11.24.00.000000 6
23 1998-07-04-16.33.00.000000 1
23 2000-09-26-13.17.00.000000 2
23 2000-09-26-13.17.00.000000 3
23 2000-09-27-12.27.00.000000 4
23 2000-09-27-12.27.00.000000 5
23 2001-06-23-16.45.00.000000 6
23 2001-06-23-16.45.00.000000 7
23 2001-10-27-18.01.00.000000 8
23 2001-10-27-18.01.00.000000 9
23 2001-10-27-18.01.00.000000 10
23 2001-10-27-18.01.00.000000 11
24 2001-10-23-14.59.00.000000 1
24 2001-10-23-14.59.00.000000 2
24 2001-10-23-14.59.00.000000 3
25 2001-03-14-09.26.00.000000 1
25 2001-03-14-09.26.00.000000 2
25 2001-03-14-09.26.00.000000 3
25 2001-11-30-14.23.00.000000 4
26 2001-04-27-15.07.00.000000 1
26 2001-04-27-15.07.00.000000 2
26 2001-04-27-15.07.00.000000 3
26 2001-04-27-15.07.00.000000 4
26 2001-04-27-15.07.00.000000 5
26 2001-06-30-11.26.00.000000 6
26 2001-06-30-11.26.00.000000 7
26 2001-06-30-11.26.00.000000 8
26 2001-12-01-18.04.00.000000 9
26 2001-12-01-18.04.00.000000 10
26 2001-12-01-18.04.00.000000 11
27 2000-06-05-09.44.00.000000 1
27 2000-06-05-09.44.00.000000 2
28 1999-07-17-10.14.00.000000 1
28 2001-02-03-15.50.00.000000 2
28 2001-02-03-15.50.00.000000 3
28 2001-02-03-15.50.00.000000 4
28 2001-02-13-12.08.00.000000 5
28 2001-02-13-12.08.00.000000 6
28 2001-07-20-16.52.00.000000 7
28 2001-07-20-16.52.00.000000 8
29 2001-06-10-17.16.00.000000 1
29 2001-06-10-17.16.00.000000 2
29 2001-06-10-17.16.00.000000 3
29 2001-09-20-10.19.00.000000 4
29 2001-09-20-10.19.00.000000 5
29 2001-09-20-10.19.00.000000 6
30 1999-05-22-16.59.00.000000 1
30 2001-10-20-15.28.00.000000 2
30 2001-10-20-15.28.00.000000 3
30 2001-10-20-15.28.00.000000 4
30 2001-10-20-15.28.00.000000 5
30 2001-12-01-14.50.00.000000 6
30 2001-12-01-14.50.00.000000 7
32 1999-05-05-14.20.00.000000 1
32 1999-05-05-14.20.00.000000 2
32 2000-05-12-13.51.00.000000 3
32 2001-05-18-10.43.00.000000 4
32 2001-05-18-10.43.00.000000 5
32 2001-05-18-10.43.00.000000 6
32 2001-05-18-10.43.00.000000 7
32 2001-05-18-10.43.00.000000 8
33 1999-02-07-18.58.00.000000 1
33 1999-02-07-18.58.00.000000 2
33 1999-02-07-18.58.00.000000 3
33 1999-09-30-14.05.00.000000 4
33 1999-09-30-14.05.00.000000 5
33 1999-09-30-14.05.00.000000 6
33 2001-09-18-12.48.00.000000 7
33 2001-09-18-12.48.00.000000 8
34 1999-05-29-15.57.00.000000 1
34 1999-05-29-15.57.00.000000 2
35 2001-03-19-18.38.00.000000 1
35 2001-03-19-18.38.00.000000 2
35 2001-03-28-15.49.00.000000 3
35 2001-03-28-15.49.00.000000 4
36 1999-06-22-11.42.00.000000 1
36 1999-10-30-15.25.00.000000 2
36 1999-10-30-15.25.00.000000 3
36 1999-10-30-15.25.00.000000 4
36 2000-01-27-10.17.00.000000 5
36 2000-11-04-09.06.00.000000 6
37 1999-01-11-09.51.00.000000 1
37 1999-01-11-09.51.00.000000 2
37 1999-01-11-09.51.00.000000 3
37 2000-11-25-17.53.00.000000 4
37 2000-11-25-17.53.00.000000 5
37 2000-12-01-17.21.00.000000 6
37 2000-12-01-17.21.00.000000 7
37 2001-10-21-16.49.00.000000 8
38 1997-10-11-17.15.00.000000 1
38 1997-10-11-17.15.00.000000 2
38 1997-10-11-17.15.00.000000 3
38 1997-10-11-17.15.00.000000 4
38 1997-10-11-17.15.00.000000 5
38 1997-10-11-17.15.00.000000 6
39 2000-03-09-13.46.00.000000 1
39 2000-03-09-13.46.00.000000 2
39 2001-01-09-16.22.00.000000 3
39 2001-01-09-16.22.00.000000 4
39 2001-01-09-16.22.00.000000 5
39 2001-01-09-16.22.00.000000 6
39 2001-07-03-14.12.00.000000 7
40 1998-07-27-17.39.00.000000 1
40 1999-01-27-09.36.00.000000 2
40 1999-06-12-17.18.00.000000 3
40 1999-06-12-17.18.00.000000 4
40 2000-05-17-14.17.00.000000 5
40 2001-04-08-15.39.00.000000 6
40 2001-09-30-10.26.00.000000 7
40 2001-09-30-10.26.00.000000 8
41 1998-06-05-10.06.00.000000 1
41 1998-06-05-10.06.00.000000 2
41 1998-06-05-10.06.00.000000 3
41 1998-08-23-09.39.00.000000 4
41 1998-08-23-09.39.00.000000 5
41 1999-12-01-18.42.00.000000 6
41 1999-12-01-18.42.00.000000 7
41 1999-12-01-18.42.00.000000 8
41 2001-03-30-15.26.00.000000 9
41 2001-03-30-15.26.00.000000 10
41 2001-11-15-15.33.00.000000 11
42 2000-06-22-12.16.00.000000 1
42 2000-06-22-12.16.00.000000 2
42 2001-01-13-15.03.00.000000 3
42 2001-01-13-15.03.00.000000 4
42 2001-08-19-14.18.00.000000 5
42 2001-08-19-14.18.00.000000 6
42 2001-08-19-14.18.00.000000 7
42 2001-08-19-14.18.00.000000 8
43 1998-07-07-11.29.00.000000 1
43 1999-01-22-15.46.00.000000 2
43 2000-08-04-12.16.00.000000 3
43 2001-03-17-14.18.00.000000 4
43 2001-03-17-14.18.00.000000 5
43 2001-03-17-14.18.00.000000 6
44 1999-11-03-09.32.00.000000 1
44 2001-05-26-17.23.00.000000 2
44 2001-07-18-12.59.00.000000 3
44 2001-10-23-10.04.00.000000 4
44 2001-10-23-10.04.00.000000 5
44 2001-10-23-10.04.00.000000 6
44 2001-10-23-10.04.00.000000 7
44 2001-11-09-16.18.00.000000 8
45 2000-03-19-10.31.00.000000 1
45 2000-03-19-10.31.00.000000 2
45 2000-03-19-10.31.00.000000 3
45 2001-07-14-11.36.00.000000 4
287 record(s) selected.
Any suggestions?
You can use COUNT(*) as a window function to keep only the cids that have more than one row:
select cid,when,cid_when_rank
from (
SELECT cid, when, ROW_NUMBER() OVER(PARTITION BY cid ORDER BY when ASC) AS cid_when_rank
,COUNT(*) OVER(PARTITION BY cid) as cnt
FROM yrb_purchase
) t
where cnt > 1
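Edit: based on the OP's comment, count distinct (cid, when) pairs instead, so that repeated rows are not counted as extra purchases: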
select cid,when,cid_when_rank
from (
SELECT cid, when, ROW_NUMBER() OVER(PARTITION BY cid ORDER BY when ASC) AS cid_when_rank
,COUNT(*) OVER(PARTITION BY cid) as cnt
FROM (SELECT DISTINCT cid, when FROM yrb_purchase) tmp
) t
where cnt > 1
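A runnable sketch of the edited query in SQLite via Python's sqlite3 (window functions need SQLite 3.25 or later). The rows are a hypothetical miniature of yrb_purchase built from the question's timestamps, with duplicate rows added to mimic the raw table; because when is a keyword, SQLite needs it double-quoted.

```python
import sqlite3

# Hypothetical miniature: cid 1 has two distinct purchase times,
# cid 5 has one time repeated twice, cid 9 has a single row.
rows = [
    (1, "1999-04-20-12.12.00"), (1, "2001-12-01-11.59.00"), (1, "2001-12-01-11.59.00"),
    (5, "2001-07-17-16.27.00"), (5, "2001-07-17-16.27.00"),
    (9, "2001-09-10-13.03.00"),
]
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE yrb_purchase (cid INTEGER, "when" TEXT)')
conn.executemany("INSERT INTO yrb_purchase VALUES (?, ?)", rows)

# Deduplicate first, then rank and count per customer; keep customers with cnt > 1.
query = """
SELECT cid, "when", cid_when_rank
FROM (
    SELECT cid, "when",
           ROW_NUMBER() OVER (PARTITION BY cid ORDER BY "when") AS cid_when_rank,
           COUNT(*) OVER (PARTITION BY cid) AS cnt
    FROM (SELECT DISTINCT cid, "when" FROM yrb_purchase) AS tmp
) AS t
WHERE cnt > 1
ORDER BY cid, cid_when_rank
"""
result = conn.execute(query).fetchall()
print(result)
```

Without the DISTINCT subquery the repeated rows would be counted too, so cid 5 would get cnt = 2 and slip through, which is the kind of inflation behind the 287-row result in the question.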
Using COUNT(*) as a window function is a very good solution. One alternative that might return results faster is EXISTS:
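select p.*
from yrb_purchase p
-- keep a row only if the same customer has another row with a different timestamp
where exists (select 1 from yrb_purchase p2 where p2.cid = p.cid and p2.when <> p.when);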
Of course, if you need the row number as well, the extra cost of the windowed count is probably negligible.
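The EXISTS form only filters correctly when the subquery is correlated on cid; without p2.cid = p.cid it is true for every row as soon as any two rows in the table have different timestamps. The sketch below runs both forms side by side in SQLite on a hypothetical miniature of yrb_purchase (duplicate rows invented to mimic the raw table; when double-quoted because it is a keyword).

```python
import sqlite3

rows = [
    (1, "1999-04-20-12.12.00"), (1, "2001-12-01-11.59.00"), (1, "2001-12-01-11.59.00"),
    (5, "2001-07-17-16.27.00"), (5, "2001-07-17-16.27.00"),
    (9, "2001-09-10-13.03.00"),
]
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE yrb_purchase (cid INTEGER, "when" TEXT)')
conn.executemany("INSERT INTO yrb_purchase VALUES (?, ?)", rows)

# Correlated: another row for the SAME customer with a different timestamp must exist.
correlated = """
SELECT p.*
FROM yrb_purchase AS p
WHERE EXISTS (SELECT 1 FROM yrb_purchase AS p2
              WHERE p2.cid = p.cid AND p2."when" <> p."when")
ORDER BY p.cid, p."when"
"""

# Uncorrelated: any other customer's purchase satisfies the subquery.
uncorrelated = """
SELECT p.*
FROM yrb_purchase AS p
WHERE EXISTS (SELECT 1 FROM yrb_purchase AS p2 WHERE p2."when" <> p."when")
ORDER BY p.cid, p."when"
"""

kept = conn.execute(correlated).fetchall()
leaky = conn.execute(uncorrelated).fetchall()
print(kept)
print(leaky)
```

Like the answer, SELECT p.* keeps duplicate rows; add DISTINCT, or the ROW_NUMBER() from the question, if those matter.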