How to SUM in MySQL for every n records - sql
I have the following result from a query:
+---------------+------+------+------+------+------+------+------+-------+
| order_main_id | S36 | S37 | S38 | S39 | S40 | S41 | S42 | total |
+---------------+------+------+------+------+------+------+------+-------+
| 26 | 127 | 247 | 335 | 333 | 223 | 111 | 18 | 1394 |
| 26 | 323 | 606 | 772 | 765 | 573 | 312 | 154 | 3505 |
| 38 | 25 | 35 | 35 | 35 | 20 | NULL | NULL | 150 |
| 38 | 25 | 35 | 35 | 35 | 20 | NULL | NULL | 150 |
| 39 | 65 | 86 | 86 | 42 | 21 | NULL | NULL | 300 |
| 39 | 42 | 58 | 58 | 28 | 14 | NULL | NULL | 200 |
| 35 | 11 | 20 | 21 | 18 | 9 | 2 | NULL | 81 |
| 35 | 10 | 25 | 30 | 23 | 12 | 1 | NULL | 101 |
+---------------+------+------+------+------+------+------+------+-------+
I would like to insert a SUM row before each change of order_main_id, so the result would look like this:
+---------------+------+------+------+------+------+------+------+-------+
| order_main_id | S36 | S37 | S38 | S39 | S40 | S41 | S42 | total |
+---------------+------+------+------+------+------+------+------+-------+
| 26 | 127 | 247 | 335 | 333 | 223 | 111 | 18 | 1394 |
| 26 | 323 | 606 | 772 | 765 | 573 | 312 | 154 | 3505 |
| | 450 | 853 | 1107 | 1098 | 796 | 423 | 172 | 4899 |
| 38 | 25 | 35 | 35 | 35 | 20 | NULL | NULL | 150 |
| 38 | 25 | 35 | 35 | 35 | 20 | NULL | NULL | 150 |
| | 50 | 70 | 70 | 70 | 40 | NULL | NULL | 300 |
| 39 | 65 | 86 | 86 | 42 | 21 | NULL | NULL | 300 |
| 39 | 42 | 58 | 58 | 28 | 14 | NULL | NULL | 200 |
| | 107 | 144 | 144 | 70 | 35 | NULL | NULL | 500 |
| 35 | 11 | 20 | 21 | 18 | 9 | 2 | NULL | 81 |
| 35 | 10 | 25 | 30 | 23 | 12 | 1 | NULL | 101 |
| | 21 | 45 | 51 | 41 | 21 | 3 | NULL | 182 |
+---------------+------+------+------+------+------+------+------+-------+
How can I make this possible?
You'll need to write a second query that makes use of GROUP BY order_main_id.
Something like:
SELECT sum(S41+...) FROM yourTable GROUP BY orderMainId
You can actually do this in one query, but with a union all (really two queries, but the result sets are combined to make one awesome result set):
select
order_main_id,
S36,
S37,
S38,
S39,
S40,
S41,
S42,
S36 + S37 + S38 + S39 + S40 + S41 + S42 as total,
'Detail' as rowtype
from
tblA
union all
select
order_main_id,
sum(S36),
sum(S37),
sum(S38),
sum(S39),
sum(S40),
sum(S41),
sum(S42),
sum(S36 + S37 + S38 + S39 + S40 + S41 + S42),
'Summary' as rowtype
from
tblA
group by
order_main_id
order by
order_main_id, RowType
Remember that the order by affects the entirety of the union all, not just the last query. So, your resultset would look like this:
+---------------+------+------+------+------+------+------+------+-------+---------+
| order_main_id | S36 | S37 | S38 | S39 | S40 | S41 | S42 | total | rowtype |
+---------------+------+------+------+------+------+------+------+-------+---------+
| 26 | 127 | 247 | 335 | 333 | 223 | 111 | 18 | 1394 | Detail |
| 26 | 323 | 606 | 772 | 765 | 573 | 312 | 154 | 3505 | Detail |
| 26 | 450 | 853 | 1107 | 1098 | 796 | 423 | 172 | 4899 | Summary |
| 35 | 11 | 20 | 21 | 18 | 9 | 2 | NULL | 81 | Detail |
| 35 | 10 | 25 | 30 | 23 | 12 | 1 | NULL | 101 | Detail |
| 35 | 21 | 45 | 51 | 41 | 21 | 3 | NULL | 182 | Summary |
| 38 | 25 | 35 | 35 | 35 | 20 | NULL | NULL | 150 | Detail |
| 38 | 25 | 35 | 35 | 35 | 20 | NULL | NULL | 150 | Detail |
| 38 | 50 | 70 | 70 | 70 | 40 | NULL | NULL | 300 | Summary |
| 39 | 65 | 86 | 86 | 42 | 21 | NULL | NULL | 300 | Detail |
| 39 | 42 | 58 | 58 | 28 | 14 | NULL | NULL | 200 | Detail |
| 39 | 107 | 144 | 144 | 70 | 35 | NULL | NULL | 500 | Summary |
+---------------+------+------+------+------+------+------+------+-------+---------+
This way, you know what is and what isn't a detail or summary row, and the order_main_id that it's for. You could always (and probably should) hide this column in your presentation layer.
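The detail-plus-summary UNION ALL pattern can be sketched end to end in Python with SQLite. This is a minimal demo, not MySQL itself: the table name tblA comes from the answer, but the column list is trimmed to two size columns for brevity.

```python
import sqlite3

# In-memory stand-in for tblA, trimmed to two size columns for brevity
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblA (order_main_id INT, S36 INT, S37 INT)")
conn.executemany(
    "INSERT INTO tblA VALUES (?, ?, ?)",
    [(26, 127, 247), (26, 323, 606), (38, 25, 35), (38, 25, 35)],
)

# Detail rows UNION ALL per-order summary rows; 'Detail' sorts before
# 'Summary', so each group's total lands right under its detail rows.
rows = conn.execute("""
    SELECT order_main_id, S36, S37, S36 + S37 AS total, 'Detail' AS rowtype
    FROM tblA
    UNION ALL
    SELECT order_main_id, SUM(S36), SUM(S37), SUM(S36 + S37), 'Summary'
    FROM tblA
    GROUP BY order_main_id
    ORDER BY order_main_id, rowtype
""").fetchall()

for row in rows:
    print(row)
```

The same shape extends to all seven S-columns; only the select lists grow.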
For things like these I think you should use a reporting library (such as Crystal Reports); it'll save you a lot of trouble. Check JasperReports and similar projects on osalt.
Related
Select within grouped query to return last record of that group
I have a table which stores events along with event type and booking IDs. My goal is to group by BookingID and return the first EventDate, the last EventDate and the last BC_EventType:
+----+-----------+---------+--------------+-------------------------+
| ID | BookingID | VenueID | BC_EventType | EventDate               |
+----+-----------+---------+--------------+-------------------------+
| 12 | 1468656   | 94      | 1            | 2020-10-20 12:06:27.027 |
| 13 | 1468656   | 94      | 3            | 2020-10-20 12:06:27.060 |
| 14 | 1468656   | 94      | 4            | 2020-10-20 12:06:43.923 |
| 15 | 1468656   | 94      | 5            | 2020-10-20 12:06:49.603 |
| 16 | 1468656   | 94      | 6            | 2020-10-20 12:06:56.523 |
| 17 | 1468656   | 94      | 8            | 2020-10-20 12:07:09.203 |
| 18 | 1468656   | 94      | 12           | 2020-10-20 12:07:21.287 |
| 19 | 1468656   | 94      | 13           | 2020-10-20 12:07:26.167 |
| 20 | 1468656   | 94      | 17           | 2020-10-20 12:07:36.337 |
| 21 | 1468657   | 94      | 7            | 2020-10-20 13:54:48.697 |
| 22 | 1468657   | 94      | 1            | 2020-10-20 13:53:56.297 |
| 23 | 1468657   | 94      | 3            | 2020-10-20 13:53:56.330 |
| 24 | 1468657   | 94      | 4            | 2020-10-20 13:54:38.257 |
| 25 | 1468657   | 94      | 5            | 2020-10-20 13:54:40.333 |
| 26 | 1468657   | 94      | 6            | 2020-10-20 13:54:40.540 |
| 27 | 1468657   | 94      | 8            | 2020-10-20 13:54:51.193 |
| 28 | 1468657   | 94      | 12           | 2020-10-20 13:55:13.650 |
| 29 | 1468657   | 94      | 13           | 2020-10-20 13:55:13.727 |
| 30 | 1468657   | 94      | 14           | 2020-10-20 13:55:26.933 |
| 31 | 1468665   | 94      | 8            | 2020-10-20 15:00:41.043 |
| 32 | 1468665   | 94      | 9            | 2020-10-20 15:00:41.073 |
| 33 | 1468665   | 94      | 8            | 2020-10-20 15:00:41.090 |
| 34 | 1468665   | 94      | 9            | 2020-10-20 15:00:41.120 |
| 35 | 1468665   | 94      | 7            | 2020-10-20 15:00:41.137 |
| 36 | 1468665   | 94      | 1            | 2020-10-20 15:00:20.687 |
| 37 | 1468665   | 94      | 3            | 2020-10-20 15:00:20.703 |
| 38 | 1468665   | 94      | 4            | 2020-10-20 15:00:28.560 |
| 39 | 1468665   | 94      | 5            | 2020-10-20 15:00:32.617 |
| 40 | 1468665   | 94      | 6            | 2020-10-20 15:00:32.663 |
| 41 | 1468665   | 94      | 12           | 2020-10-20 15:00:48.680 |
| 42 | 1468665   | 94      | 15           | 2020-10-20 15:00:48.743 |
| 43 | 1468665   | 94      | 14           | 2020-10-20 15:00:56.247 |
| 44 | 1468665   | 94      | 17           | 2020-10-20 15:00:56.527 |
| 45 | 1468676   | 94      | 8            | 2020-10-20 15:35:14.870 |
| 46 | 1468676   | 94      | 9            | 2020-10-20 15:35:14.887 |
| 47 | 1468676   | 94      | 8            | 2020-10-20 15:35:14.917 |
| 48 | 1468676   | 94      | 9            | 2020-10-20 15:35:14.933 |
| 49 | 1468676   | 94      | 7            | 2020-10-20 15:35:14.947 |
| 50 | 1468676   | 94      | 1            | 2020-10-20 15:35:13.927 |
| 51 | 1468687   | 94      | 23           | 2020-10-20 16:11:38.820 |
| 52 | 1468687   | 94      | 8            | 2020-10-20 16:11:39.837 |
| 53 | 1468687   | 94      | 9            | 2020-10-20 16:11:39.853 |
| 54 | 1468687   | 94      | 8            | 2020-10-20 16:11:39.870 |
| 55 | 1468687   | 94      | 9            | 2020-10-20 16:11:39.883 |
| 56 | 1468687   | 94      | 7            | 2020-10-20 16:11:39.900 |
| 57 | 1468687   | 94      | 1            | 2020-10-20 16:11:39.493 |
| 58 | 1468687   | 94      | 8            | 2020-10-20 16:12:47.077 |
| 59 | 1468687   | 94      | 9            | 2020-10-20 16:12:47.093 |
| 60 | 1468687   | 94      | 8            | 2020-10-20 16:12:47.110 |
| 61 | 1468687   | 94      | 9            | 2020-10-20 16:12:47.123 |
| 62 | 1468687   | 94      | 7            | 2020-10-20 16:12:47.150 |
| 63 | 1468687   | 94      | 1            | 2020-10-20 16:12:36.270 |
+----+-----------+---------+--------------+-------------------------+
At the moment I have this SQL:
SELECT [BookingID]
      ,min(EventDate) as Min_Date
      ,max(EventDate) as Max_Date
FROM [dbo].[BC_Event]
Group By [BookingID]
Which returns something along the lines of:
+-----------+-------------------------+-------------------------+
| BookingID | Min_Date                | Max_Date                |
+-----------+-------------------------+-------------------------+
| 1468656   | 2020-10-20 12:06:27.027 | 2020-10-20 12:07:36.337 |
| 1468657   | 2020-10-20 13:53:56.297 | 2020-10-20 13:55:26.933 |
| 1468665   | 2020-10-20 15:00:20.687 | 2020-10-20 15:00:56.527 |
| 1468676   | 2020-10-20 15:35:13.927 | 2020-10-20 15:35:14.947 |
| 1468687   | 2020-10-20 16:11:38.820 | 2020-10-20 16:12:47.150 |
| 1468688   | 2020-10-20 16:13:53.390 | 2020-10-20 16:19:02.777 |
+-----------+-------------------------+-------------------------+
This is great; however, I need a column which displays the LAST BC_EventType ID, so in theory it would be the BC_EventType of the row where [EventDate] = Max_Date. How would I do this?
One method is conditional aggregation:
SELECT [BookingID],
       min(EventDate) as Min_Date,
       max(EventDate) as Max_Date,
       MAX(CASE WHEN seqnum_asc = 1 THEN BC_EventType END) as first_BC_EventType,
       MAX(CASE WHEN seqnum_desc = 1 THEN BC_EventType END) as last_BC_EventType
FROM (SELECT e.*,
             ROW_NUMBER() OVER (PARTITION BY BookingID ORDER BY EventDate ASC) as seqnum_asc,
             ROW_NUMBER() OVER (PARTITION BY BookingID ORDER BY EventDate DESC) as seqnum_desc
      FROM [dbo].[BC_Event] e
     ) e
GROUP BY [BookingID]
Here's an approach that uses 2 ROW_NUMBER functions. Something like this:
with bce_cte as (
    select *,
           row_number() over (partition by BookingID order by EventDate asc) as rn_asc,
           row_number() over (partition by BookingID order by EventDate desc) as rn_desc
    from [dbo].[BC_Event])
select BookingID,
       max(case when rn_asc = 1 then EventDate else null end) Min_Date,
       max(case when rn_desc = 1 then EventDate else null end) Max_Date,
       max(case when rn_desc = 1 then BC_EventType else null end) Last_BC_EventType
from bce_cte
group by BookingID;
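Both answers hinge on numbering each booking's events with ROW_NUMBER and then picking row 1 inside a conditional aggregate. That idea can be checked with SQLite's window functions (SQLite 3.25+); this is a toy sketch with VenueID omitted and only a handful of rows from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BC_Event (BookingID INT, BC_EventType INT, EventDate TEXT)")
conn.executemany("INSERT INTO BC_Event VALUES (?, ?, ?)", [
    (1468656, 1,  "2020-10-20 12:06:27.027"),
    (1468656, 3,  "2020-10-20 12:06:27.060"),
    (1468656, 17, "2020-10-20 12:07:36.337"),
    (1468657, 1,  "2020-10-20 13:53:56.297"),
    (1468657, 14, "2020-10-20 13:55:26.933"),
])

# Number each booking's events newest-first, then keep the type of row 1
rows = conn.execute("""
    SELECT BookingID,
           MIN(EventDate) AS Min_Date,
           MAX(EventDate) AS Max_Date,
           MAX(CASE WHEN seqnum_desc = 1 THEN BC_EventType END) AS last_BC_EventType
    FROM (SELECT e.*,
                 ROW_NUMBER() OVER (PARTITION BY BookingID
                                    ORDER BY EventDate DESC) AS seqnum_desc
          FROM BC_Event e) AS numbered
    GROUP BY BookingID
    ORDER BY BookingID
""").fetchall()
print(rows)
```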
How to reshape a table having multiple records for the same id into a table with one record per id without losing information?
Basically, I want to transform this (Initial) into this (Final). In other words, I want to "squash" the initial table so that it has only one record per id, and "dilate" it so that I don't lose any information: create a different column for every possible combination of source and column from the initial table (c1_A, c1_B, ...). I can work with the initial table as a csv in Python (maybe Pandas) and manually hardcode the mapping between the Initial and the Final table. However, I don't find this solution elegant at all and I'm much more interested in a SQL / SAS solution. Is there any way of doing that?
Edit: I want to change
+----+--------+------+-----+------+
| ID | source | c1   | c2  | c3   |
+----+--------+------+-----+------+
| 1  | A      | 432  | 56  | 1    |
| 1  | B      | 53   | 3   | 73   |
| 1  | C      | 7    | 342 | 83   |
| 1  | D      | 543  | 43  | 73   |
| 2  | A      | 8    | 882 | 39   |
| 2  | B      | 5    | 54  | 46   |
| 2  | C      | 8    | 3   | 2226 |
| 2  | D      | 87   | 2   | 45   |
| 3  | A      | 93   | 143 | 45   |
| 3  | B      | 1023 | 72  | 8    |
| 3  | C      | 3    | 3   | 704  |
| 4  | A      | 2    | 5   | 0    |
| 4  | B      | 78   | 888 | 2    |
| 4  | C      | 87   | 23  | 34   |
| 4  | D      | 112  | 7   | 712  |
+----+--------+------+-----+------+
into
+----+------+------+------+------+------+------+------+------+------+------+------+------+
| ID | c1_A | c1_B | c1_C | c1_D | c2_A | c2_B | c2_C | c2_D | c3_A | c3_B | c3_C | c3_D |
+----+------+------+------+------+------+------+------+------+------+------+------+------+
| 1  | 432  | 53   | 7    | 543  | 56   | 3    | 342  | 43   | 1    | 73   | 83   | 73   |
| 2  | 8    | 5    | 8    | 87   | 882  | 54   | 3    | 2    | 39   | 46   | 2226 | 45   |
| 3  | 93   | 1023 | 3    |      | 143  | 72   | 3    |      | 45   | 8    | 704  |      |
| 4  | 2    | 78   | 87   | 112  | 5    | 888  | 23   | 7    | 0    | 2    | 34   | 712  |
+----+------+------+------+------+------+------+------+------+------+------+------+------+
Abandon hope ... ?
data want;
  input ID source $ c1 c2 c3;
datalines;
1 A 432 56 1
1 B 53 3 73
1 C 7 342 83
1 D 543 43 73
2 A 8 882 39
2 B 5 54 46
2 C 8 3 2226
2 D 87 2 45
3 A 93 143 45
3 B 1023 72 8
3 C 3 3 704
4 A 2 5 0
4 B 78 888 2
4 C 87 23 34
4 D 112 7 712
;

* one to grow your data;
proc transpose data=want out=stage1;
  by id source;
  var c1-c3;
run;

* and one to shrink;
proc transpose data=stage1 out=want(drop=_name_) delim=_;
  by id;
  id _name_ source;
run;
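Since the question also mentions Python as a fallback, the same wide reshape can be sketched in pandas without any hardcoded mapping. This is a miniature of the example data (two IDs, two value columns) using pivot and a flattened column index:

```python
import pandas as pd

df = pd.DataFrame({
    "ID":     [1, 1, 2, 2],
    "source": ["A", "B", "A", "B"],
    "c1":     [432, 53, 8, 5],
    "c2":     [56, 3, 882, 54],
})

# One row per ID, one column per (value column, source) pair
wide = df.pivot(index="ID", columns="source", values=["c1", "c2"])
# Flatten the MultiIndex columns to c1_A, c1_B, ... as in the Final table
wide.columns = [f"{col}_{src}" for col, src in wide.columns]
wide = wide.reset_index()
print(wide)
```

Missing (ID, source) combinations, like source D for ID 3 in the question, simply come out as NaN.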
Get RMSE score while fetching data from the table directly. Write a query for that
I have a table in the database with many features; each feature has its own actual and predicted value, and there are two more columns, Id_partner and Id_accounts. My main goal is to get the RMSE score for each feature, for each account, in each partner. I have done that with a for loop, but it takes a very long time to complete in PySpark. Is there an efficient way of doing it directly with a query while reading the data, so that I get the RMSE score for each account in each partner?
My table is something like this:
Actual_Feature_1 = Act_F_1, Predicted_Feature_1 = Pred_F_1
Actual_Feature_2 = Act_F_2, Predicted_Feature_2 = Pred_F_2
Table 1:
ID_PARTNER | ID_ACCOUNT | Act_F_1 | Pred_F_1 | Act_F_2 | Pred_F_2 |
4          | 24         | 10      | 12       | 22      | 20       |
4          | 24         | 11      | 13       | 23      | 21       |
4          | 24         | 11      | 12       | 24      | 23       |
4          | 25         | 13      | 15       | 22      | 20       |
4          | 25         | 15      | 12       | 21      | 20       |
4          | 25         | 15      | 14       | 21      | 21       |
4          | 27         | 13      | 12       | 35      | 32       |
4          | 27         | 12      | 16       | 34      | 31       |
4          | 27         | 17      | 14       | 36      | 34       |
5          | 301        | 19      | 17       | 56      | 54       |
5          | 301        | 21      | 20       | 58      | 54       |
5          | 301        | 22      | 19       | 59      | 57       |
5          | 301        | 24      | 22       | 46      | 50       |
5          | 301        | 25      | 22       | 49      | 54       |
5          | 350        | 12      | 10       | 67      | 66       |
5          | 350        | 12      | 11       | 65      | 64       |
5          | 350        | 14      | 13       | 68      | 67       |
5          | 350        | 15      | 12       | 61      | 61       |
5          | 350        | 12      | 10       | 63      | 60       |
7          | 420        | 51      | 49       | 30      | 29       |
7          | 420        | 51      | 48       | 32      | 30       |
7          | 410        | 49      | 45       | 81      | 79       |
7          | 410        | 48      | 44       | 83      | 80       |
7          | 410        | 45      | 43       | 84      | 81       |
I need the RMSE score for each account in each partner in this format:
Resulted Table:
ID_PARTNER | ID_ACCOUNT | FEATURE_1  | FEATURE_2  |
4          | 24         | rmse_score | rmse_score |
4          | 25         | rmse_score | rmse_score |
4          | 27         | rmse_score | rmse_score |
5          | 301        | rmse_score | rmse_score |
5          | 350        | rmse_score | rmse_score |
7          | 420        | rmse_score | rmse_score |
7          | 410        | rmse_score | rmse_score |
Note: we need to group on both id_partner and id_account; as the table above shows, id_account alone is not enough, because different partners can have the same accounts. I need an SQL query that produces the resulted table directly while reading the table from the database.
Yes, you can calculate the root-mean-square-error in SQL.
SELECT ID_PARTNER, ID_ACCOUNT
     , SQRT( AVG( POWER(Act_F_1 - Pred_F_1, 2) ) ) as feature_1_rmse
FROM ...
GROUP BY ID_PARTNER, ID_ACCOUNT
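Since the question works in PySpark, the same per-group RMSE can be cross-checked in pandas. A toy sketch over the first two accounts of the table (one feature only):

```python
import pandas as pd

df = pd.DataFrame({
    "ID_PARTNER": [4, 4, 4, 4, 4, 4],
    "ID_ACCOUNT": [24, 24, 24, 25, 25, 25],
    "Act_F_1":    [10, 11, 11, 13, 15, 15],
    "Pred_F_1":   [12, 13, 12, 15, 12, 14],
})

# RMSE = sqrt(mean((actual - predicted)^2)), grouped per partner/account
rmse = (df.assign(sq_err=(df["Act_F_1"] - df["Pred_F_1"]) ** 2)
          .groupby(["ID_PARTNER", "ID_ACCOUNT"])["sq_err"]
          .mean()
          .pow(0.5)
          .rename("FEATURE_1")
          .reset_index())
print(rmse)
```

Repeating the `assign`/aggregate pair per feature column yields the FEATURE_2 column the same way.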
select all rows that match criteria if not get a random one
+----+---------------+--------------------+------------+----------+-----------------+
| id | restaurant_id | filename           | is_profile | priority | show_in_profile |
+----+---------------+--------------------+------------+----------+-----------------+
| 40 | 20            | 1320849687_390.jpg |            |          | 1               |
| 60 | 24            | 1320853501_121.png | 1          |          | 1               |
| 61 | 24            | 1320853504_847.png |            |          | 1               |
| 62 | 24            | 1320853505_732.png |            |          | 1               |
| 63 | 24            | 1320853505_865.png |            |          | 1               |
| 64 | 29            | 1320854617_311.png | 1          |          | 1               |
| 65 | 29            | 1320854617_669.png |            |          | 1               |
| 66 | 29            | 1320854618_636.png |            |          | 1               |
| 67 | 29            | 1320854619_791.png |            |          | 1               |
| 74 | 154           | 1320922653_259.png |            |          | 1               |
| 76 | 154           | 1320922656_332.png |            |          | 1               |
| 77 | 154           | 1320922657_106.png |            |          | 1               |
| 84 | 130           | 1321269380_960.jpg | 1          |          | 1               |
| 85 | 130           | 1321269383_555.jpg |            |          | 1               |
| 86 | 130           | 1321269384_251.jpg |            |          | 1               |
| 89 | 28            | 1321269714_303.jpg |            |          | 1               |
| 90 | 28            | 1321269716_938.jpg | 1          |          | 1               |
| 91 | 28            | 1321269717_147.jpg |            |          | 1               |
| 92 | 28            | 1321269717_774.jpg |            |          | 1               |
| 93 | 28            | 1321269717_250.jpg |            |          | 1               |
| 94 | 28            | 1321269718_964.jpg |            |          | 1               |
| 95 | 28            | 1321269719_830.jpg |            |          | 1               |
| 96 | 43            | 1321270013_629.jpg | 1          |          | 1               |
+----+---------------+--------------------+------------+----------+-----------------+
I have this table and I want to select the filename for a given list of restaurant ids. For example, for 24, 29, 154:
+--------------------+
| filename           |
+--------------------+
1320853501_121.png (has is_profile 1)
1320854617_311.png (has is_profile 1)
1320922653_259.png (chosen as profile picture because restaurant doesn't have a profile pic but has pictures)
I tried group by and case statements but I got nowhere. Also, if you use group by, it should be a full group by.
You can do this with aggregation and some logic:
select restaurant_id,
       coalesce(max(case when is_profile = 1 then filename end),
                max(filename)
               ) as filename
from t
where restaurant_id in (24, 29, 154)
group by restaurant_id;
First look for the/a profile filename. Next just choose an arbitrary one.
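A runnable sketch of that query against SQLite, with `t` standing in for the pictures table as in the answer (only the three needed columns):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (restaurant_id INT, filename TEXT, is_profile INT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (24,  "1320853501_121.png", 1),     # has a profile picture
    (24,  "1320853504_847.png", None),
    (154, "1320922653_259.png", None),  # no profile picture at all
    (154, "1320922656_332.png", None),
])

# Prefer the is_profile row; otherwise fall back to an arbitrary filename
rows = conn.execute("""
    SELECT restaurant_id,
           COALESCE(MAX(CASE WHEN is_profile = 1 THEN filename END),
                    MAX(filename)) AS filename
    FROM t
    WHERE restaurant_id IN (24, 154)
    GROUP BY restaurant_id
""").fetchall()
print(dict(rows))
```

Note that the fallback `MAX(filename)` picks the lexically largest filename, which is "arbitrary" in the sense of the answer, not random.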
Find the highest and lowest value locations within an interval on a column?
Given this pandas dataframe with two columns, 'Values' and 'Intervals': how do I get a third column 'MinMax' indicating whether the value is a maximum or a minimum within that interval? The challenge for me is that the interval length and the distance between intervals are not fixed, therefore I post the question.
import pandas as pd
import numpy as np
data = pd.DataFrame([
    [1879.289,np.nan],[1879.281,np.nan],[1879.292,1],[1879.295,1],[1879.481,1],[1879.294,1],[1879.268,1],
    [1879.293,1],[1879.277,1],[1879.285,1],[1879.464,1],[1879.475,1],[1879.971,1],[1879.779,1],
    [1879.986,1],[1880.791,1],[1880.29,1],[1879.253,np.nan],[1878.268,np.nan],[1875.73,1],[1876.792,1],
    [1875.977,1],[1876.408,1],[1877.159,1],[1877.187,1],[1883.164,1],[1883.171,1],[1883.495,1],
    [1883.962,1],[1885.158,1],[1885.974,1],[1886.479,np.nan],[1885.969,np.nan],[1884.693,1],[1884.977,1],
    [1884.967,1],[1884.691,1],[1886.171,1],[1886.166,np.nan],[1884.476,np.nan],[1884.66,1],[1882.962,1],
    [1881.496,1],[1871.163,1],[1874.985,1],[1874.979,1],[1871.173,np.nan],[1871.973,np.nan],[1871.682,np.nan],
    [1872.476,np.nan],[1882.361,1],[1880.869,1],[1882.165,1],[1881.857,1],[1880.375,1],[1880.66,1],
    [1880.891,1],[1880.377,1],[1881.663,1],[1881.66,1],[1877.888,1],[1875.69,1],[1875.161,1],
    [1876.697,np.nan],[1876.671,np.nan],[1879.666,np.nan],[1877.182,np.nan],[1878.898,1],[1878.668,1],[1878.871,1],
    [1878.882,1],[1879.173,1],[1878.887,1],[1878.68,1],[1878.872,1],[1878.677,1],[1877.877,1],
    [1877.669,1],[1877.69,1],[1877.684,1],[1877.68,1],[1877.885,1],[1877.863,1],[1877.674,1],
    [1877.676,1],[1877.687,1],[1878.367,1],[1878.179,1],[1877.696,1],[1877.665,1],[1877.667,np.nan],
    [1878.678,np.nan],[1878.661,1],[1878.171,1],[1877.371,1],[1877.359,1],[1878.381,1],[1875.185,1],
    [1875.367,np.nan],[1865.492,np.nan],[1865.495,1],[1866.995,1],[1866.672,1],[1867.465,1],[1867.663,1],
    [1867.186,1],[1867.687,1],[1867.459,1],[1867.168,1],[1869.689,1],[1869.693,1],[1871.676,1],
    [1873.174,1],[1873.691,np.nan],[1873.685,np.nan]
])
In the third column below you can see where the max and min is for each interval.
+-------+----------+-----------+---------+
| index | Value    | Intervals | Min/Max |
+-------+----------+-----------+---------+
| 0     | 1879.289 | np.nan    |         |
| 1     | 1879.281 | np.nan    |         |
| 2     | 1879.292 | 1         |         |
| 3     | 1879.295 | 1         |         |
| 4     | 1879.481 | 1         |         |
| 5     | 1879.294 | 1         |         |
| 6     | 1879.268 | 1         | min     |
| 7     | 1879.293 | 1         |         |
| 8     | 1879.277 | 1         |         |
| 9     | 1879.285 | 1         |         |
| 10    | 1879.464 | 1         |         |
| 11    | 1879.475 | 1         |         |
| 12    | 1879.971 | 1         |         |
| 13    | 1879.779 | 1         |         |
| 17    | 1879.986 | 1         |         |
| 18    | 1880.791 | 1         | max     |
| 19    | 1880.29  | 1         |         |
| 55    | 1879.253 | np.nan    |         |
| 56    | 1878.268 | np.nan    |         |
| 57    | 1875.73  | 1         |         |
| 58    | 1876.792 | 1         |         |
| 59    | 1875.977 | 1         | min     |
| 60    | 1876.408 | 1         |         |
| 61    | 1877.159 | 1         |         |
| 62    | 1877.187 | 1         |         |
| 63    | 1883.164 | 1         |         |
| 64    | 1883.171 | 1         |         |
| 65    | 1883.495 | 1         |         |
| 66    | 1883.962 | 1         |         |
| 67    | 1885.158 | 1         |         |
| 68    | 1885.974 | 1         | max     |
| 69    | 1886.479 | np.nan    |         |
| 70    | 1885.969 | np.nan    |         |
| 71    | 1884.693 | 1         |         |
| 72    | 1884.977 | 1         |         |
| 73    | 1884.967 | 1         |         |
| 74    | 1884.691 | 1         | min     |
| 75    | 1886.171 | 1         | max     |
| 76    | 1886.166 | np.nan    |         |
| 77    | 1884.476 | np.nan    |         |
| 78    | 1884.66  | 1         | max     |
| 79    | 1882.962 | 1         |         |
| 80    | 1881.496 | 1         |         |
| 81    | 1871.163 | 1         | min     |
| 82    | 1874.985 | 1         |         |
| 83    | 1874.979 | 1         |         |
| 84    | 1871.173 | np.nan    |         |
| 85    | 1871.973 | np.nan    |         |
| 86    | 1871.682 | np.nan    |         |
| 87    | 1872.476 | np.nan    |         |
| 88    | 1882.361 | 1         | max     |
| 89    | 1880.869 | 1         |         |
| 90    | 1882.165 | 1         |         |
| 91    | 1881.857 | 1         |         |
| 92    | 1880.375 | 1         |         |
| 93    | 1880.66  | 1         |         |
| 94    | 1880.891 | 1         |         |
| 95    | 1880.377 | 1         |         |
| 96    | 1881.663 | 1         |         |
| 97    | 1881.66  | 1         |         |
| 98    | 1877.888 | 1         |         |
| 99    | 1875.69  | 1         |         |
| 100   | 1875.161 | 1         | min     |
| 101   | 1876.697 | np.nan    |         |
| 102   | 1876.671 | np.nan    |         |
| 103   | 1879.666 | np.nan    |         |
| 111   | 1877.182 | np.nan    |         |
| 112   | 1878.898 | 1         |         |
| 113   | 1878.668 | 1         |         |
| 114   | 1878.871 | 1         |         |
| 115   | 1878.882 | 1         |         |
| 116   | 1879.173 | 1         | max     |
| 117   | 1878.887 | 1         |         |
| 118   | 1878.68  | 1         |         |
| 119   | 1878.872 | 1         |         |
| 120   | 1878.677 | 1         |         |
| 121   | 1877.877 | 1         |         |
| 122   | 1877.669 | 1         |         |
| 123   | 1877.69  | 1         |         |
| 124   | 1877.684 | 1         |         |
| 125   | 1877.68  | 1         |         |
| 126   | 1877.885 | 1         |         |
| 127   | 1877.863 | 1         |         |
| 128   | 1877.674 | 1         |         |
| 129   | 1877.676 | 1         |         |
| 130   | 1877.687 | 1         |         |
| 131   | 1878.367 | 1         |         |
| 132   | 1878.179 | 1         |         |
| 133   | 1877.696 | 1         |         |
| 134   | 1877.665 | 1         | min     |
| 135   | 1877.667 | np.nan    |         |
| 136   | 1878.678 | np.nan    |         |
| 137   | 1878.661 | 1         | max     |
| 138   | 1878.171 | 1         |         |
| 139   | 1877.371 | 1         |         |
| 140   | 1877.359 | 1         |         |
| 141   | 1878.381 | 1         |         |
| 142   | 1875.185 | 1         | min     |
| 143   | 1875.367 | np.nan    |         |
| 144   | 1865.492 | np.nan    |         |
| 145   | 1865.495 | 1         | max     |
| 146   | 1866.995 | 1         |         |
| 147   | 1866.672 | 1         |         |
| 148   | 1867.465 | 1         |         |
| 149   | 1867.663 | 1         |         |
| 150   | 1867.186 | 1         |         |
| 151   | 1867.687 | 1         |         |
| 152   | 1867.459 | 1         |         |
| 153   | 1867.168 | 1         |         |
| 154   | 1869.689 | 1         |         |
| 155   | 1869.693 | 1         |         |
| 156   | 1871.676 | 1         |         |
| 157   | 1873.174 | 1         | min     |
| 158   | 1873.691 | np.nan    |         |
| 159   | 1873.685 | np.nan    |         |
+-------+----------+-----------+---------+
isnull = data.iloc[:, 1].isnull()
minmax = data.groupby(isnull.cumsum()[~isnull])[0].agg(['idxmax', 'idxmin'])
data.loc[minmax['idxmax'], 'MinMax'] = 'max'
data.loc[minmax['idxmin'], 'MinMax'] = 'min'
data.MinMax = data.MinMax.fillna('')
print(data)
            0    1 MinMax
0    1879.289  NaN
1    1879.281  NaN
2    1879.292  1.0
3    1879.295  1.0
4    1879.481  1.0
5    1879.294  1.0
6    1879.268  1.0    min
7    1879.293  1.0
8    1879.277  1.0
9    1879.285  1.0
10   1879.464  1.0
11   1879.475  1.0
12   1879.971  1.0
13   1879.779  1.0
14   1879.986  1.0
15   1880.791  1.0    max
16   1880.290  1.0
17   1879.253  NaN
18   1878.268  NaN
19   1875.730  1.0    min
20   1876.792  1.0
21   1875.977  1.0
22   1876.408  1.0
23   1877.159  1.0
24   1877.187  1.0
25   1883.164  1.0
26   1883.171  1.0
27   1883.495  1.0
28   1883.962  1.0
29   1885.158  1.0
..        ...  ...    ...
85   1877.687  1.0
86   1878.367  1.0
87   1878.179  1.0
88   1877.696  1.0
89   1877.665  1.0    min
90   1877.667  NaN
91   1878.678  NaN
92   1878.661  1.0    max
93   1878.171  1.0
94   1877.371  1.0
95   1877.359  1.0
96   1878.381  1.0
97   1875.185  1.0    min
98   1875.367  NaN
99   1865.492  NaN
100  1865.495  1.0    min
101  1866.995  1.0
102  1866.672  1.0
103  1867.465  1.0
104  1867.663  1.0
105  1867.186  1.0
106  1867.687  1.0
107  1867.459  1.0
108  1867.168  1.0
109  1869.689  1.0
110  1869.693  1.0
111  1871.676  1.0
112  1873.174  1.0    max
113  1873.691  NaN
114  1873.685  NaN
[115 rows x 3 columns]
data.columns = ['Value', 'Interval']
data['Ingroup'] = (data['Interval'].notnull() + 0)
Use data['Interval'].notnull() to separate the groups, use cumsum() to number them with a groupno, then groupby(groupno). Finally you want something using apply/idxmax/idxmin to label the max/min. But of course a for-loop, as you suggested, is the non-Pythonic but possibly simpler hack.
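Putting the pieces from both answers together on a small frame: the group numbers come from cumsum() over the NaN separators, then idxmax/idxmin locate each interval's extremes (values here are a short slice of the question's data):

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "Value":    [1879.289, 1879.292, 1879.268, 1880.791, 1879.253, 1875.730, 1883.962],
    "Interval": [np.nan,   1,        1,        1,        np.nan,   1,        1],
})

# NaN rows separate intervals; cumsum over them numbers each interval,
# and dropping the NaN rows keeps only the in-interval positions
isnull = data["Interval"].isnull()
groups = isnull.cumsum()[~isnull]

# Locate the index of each interval's max and min, then label those rows
minmax = data.groupby(groups)["Value"].agg(["idxmax", "idxmin"])
data.loc[minmax["idxmax"], "MinMax"] = "max"
data.loc[minmax["idxmin"], "MinMax"] = "min"
data["MinMax"] = data["MinMax"].fillna("")
print(data)
```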