How to SUM from MySQL for every n records - sql

I have the following result from a query:
+---------------+------+------+------+------+------+------+------+-------+
| order_main_id | S36  | S37  | S38  | S39  | S40  | S41  | S42  | total |
+---------------+------+------+------+------+------+------+------+-------+
|            26 |  127 |  247 |  335 |  333 |  223 |  111 |   18 |  1394 |
|            26 |  323 |  606 |  772 |  765 |  573 |  312 |  154 |  3505 |
|            38 |   25 |   35 |   35 |   35 |   20 | NULL | NULL |   150 |
|            38 |   25 |   35 |   35 |   35 |   20 | NULL | NULL |   150 |
|            39 |   65 |   86 |   86 |   42 |   21 | NULL | NULL |   300 |
|            39 |   42 |   58 |   58 |   28 |   14 | NULL | NULL |   200 |
|            35 |   11 |   20 |   21 |   18 |    9 |    2 | NULL |    81 |
|            35 |   10 |   25 |   30 |   23 |   12 |    1 | NULL |   101 |
+---------------+------+------+------+------+------+------+------+-------+
I would like to insert a SUM row before each change of order_main_id, so the result would look like this:
+---------------+------+------+------+------+------+------+------+-------+
| order_main_id | S36  | S37  | S38  | S39  | S40  | S41  | S42  | total |
+---------------+------+------+------+------+------+------+------+-------+
|            26 |  127 |  247 |  335 |  333 |  223 |  111 |   18 |  1394 |
|            26 |  323 |  606 |  772 |  765 |  573 |  312 |  154 |  3505 |
|               |  450 |  853 | 1107 | 1098 |  796 |  423 |  172 |  4899 |
|            38 |   25 |   35 |   35 |   35 |   20 | NULL | NULL |   150 |
|            38 |   25 |   35 |   35 |   35 |   20 | NULL | NULL |   150 |
|               |   50 |   70 |   70 |   70 |   40 | NULL | NULL |   300 |
|            39 |   65 |   86 |   86 |   42 |   21 | NULL | NULL |   300 |
|            39 |   42 |   58 |   58 |   28 |   14 | NULL | NULL |   200 |
|               |  107 |  144 |  144 |   70 |   35 | NULL | NULL |   500 |
|            35 |   11 |   20 |   21 |   18 |    9 |    2 | NULL |    81 |
|            35 |   10 |   25 |   30 |   23 |   12 |    1 | NULL |   101 |
|               |   21 |   45 |   51 |   41 |   21 |    3 | NULL |   182 |
+---------------+------+------+------+------+------+------+------+-------+
How can I make this possible?

You'll need to write a second query which makes use of GROUP BY order_main_id.
Something like:
SELECT SUM(S36 + S37 + ...) FROM yourTable GROUP BY order_main_id
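For the data above, a sketch of the complete per-group totals query (assuming the table is named tblA, as in the answer below):

SELECT order_main_id,
       SUM(S36), SUM(S37), SUM(S38), SUM(S39), SUM(S40), SUM(S41), SUM(S42),
       SUM(COALESCE(S36, 0) + COALESCE(S37, 0) + COALESCE(S38, 0) +
           COALESCE(S39, 0) + COALESCE(S40, 0) + COALESCE(S41, 0) +
           COALESCE(S42, 0)) AS total  -- COALESCE so NULL sizes don't null the total
FROM tblA
GROUP BY order_main_id;

On its own this returns only the summary rows; interleaving them with the detail rows is what the UNION ALL answer below does.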

You can actually do this in one query, but with a union all (really two queries, but the result sets are combined to make one awesome result set):
select
    order_main_id,
    S36,
    S37,
    S38,
    S39,
    S40,
    S41,
    S42,
    S36 + S37 + S38 + S39 + S40 + S41 + S42 as total,
    'Detail' as rowtype
from
    tblA
union all
select
    order_main_id,
    sum(S36),
    sum(S37),
    sum(S38),
    sum(S39),
    sum(S40),
    sum(S41),
    sum(S42),
    sum(S36 + S37 + S38 + S39 + S40 + S41 + S42),
    'Summary' as rowtype
from
    tblA
group by
    order_main_id
order by
    order_main_id, rowtype
(If the S columns can be NULL, as in the sample data, wrap each one in COALESCE(..., 0) inside the additions; otherwise those row totals come out NULL.)
Remember that the order by affects the entirety of the union all, not just the last query. So, your result set would look like this:
+---------------+------+------+------+------+------+------+------+-------+---------+
| order_main_id | S36  | S37  | S38  | S39  | S40  | S41  | S42  | total | rowtype |
+---------------+------+------+------+------+------+------+------+-------+---------+
|            26 |  127 |  247 |  335 |  333 |  223 |  111 |   18 |  1394 | Detail  |
|            26 |  323 |  606 |  772 |  765 |  573 |  312 |  154 |  3505 | Detail  |
|            26 |  450 |  853 | 1107 | 1098 |  796 |  423 |  172 |  4899 | Summary |
|            35 |   11 |   20 |   21 |   18 |    9 |    2 | NULL |    81 | Detail  |
|            35 |   10 |   25 |   30 |   23 |   12 |    1 | NULL |   101 | Detail  |
|            35 |   21 |   45 |   51 |   41 |   21 |    3 | NULL |   182 | Summary |
|            38 |   25 |   35 |   35 |   35 |   20 | NULL | NULL |   150 | Detail  |
|            38 |   25 |   35 |   35 |   35 |   20 | NULL | NULL |   150 | Detail  |
|            38 |   50 |   70 |   70 |   70 |   40 | NULL | NULL |   300 | Summary |
|            39 |   65 |   86 |   86 |   42 |   21 | NULL | NULL |   300 | Detail  |
|            39 |   42 |   58 |   58 |   28 |   14 | NULL | NULL |   200 | Detail  |
|            39 |  107 |  144 |  144 |   70 |   35 | NULL | NULL |   500 | Summary |
+---------------+------+------+------+------+------+------+------+-------+---------+
This way, you know what is and what isn't a detail or summary row, and the order_main_id that it's for. You could always (and probably should) hide this column in your presentation layer.

For things like these I think you should use a reporting library (such as Crystal Reports); it'll save you a lot of trouble. Check JasperReports and similar projects on osalt.

Related

Select within grouped query to return last record of that group

I have a table which stores events along with their event type and booking IDs. My goal is to group by BookingID and return the first EventDate, the last EventDate, and the last BC_EventType.
| ID | BookingID | VenueID | BC_EventType | EventDate               |
+----+-----------+---------+--------------+-------------------------+
| 12 |   1468656 |      94 |            1 | 2020-10-20 12:06:27.027 |
| 13 |   1468656 |      94 |            3 | 2020-10-20 12:06:27.060 |
| 14 |   1468656 |      94 |            4 | 2020-10-20 12:06:43.923 |
| 15 |   1468656 |      94 |            5 | 2020-10-20 12:06:49.603 |
| 16 |   1468656 |      94 |            6 | 2020-10-20 12:06:56.523 |
| 17 |   1468656 |      94 |            8 | 2020-10-20 12:07:09.203 |
| 18 |   1468656 |      94 |           12 | 2020-10-20 12:07:21.287 |
| 19 |   1468656 |      94 |           13 | 2020-10-20 12:07:26.167 |
| 20 |   1468656 |      94 |           17 | 2020-10-20 12:07:36.337 |
| 21 |   1468657 |      94 |            7 | 2020-10-20 13:54:48.697 |
| 22 |   1468657 |      94 |            1 | 2020-10-20 13:53:56.297 |
| 23 |   1468657 |      94 |            3 | 2020-10-20 13:53:56.330 |
| 24 |   1468657 |      94 |            4 | 2020-10-20 13:54:38.257 |
| 25 |   1468657 |      94 |            5 | 2020-10-20 13:54:40.333 |
| 26 |   1468657 |      94 |            6 | 2020-10-20 13:54:40.540 |
| 27 |   1468657 |      94 |            8 | 2020-10-20 13:54:51.193 |
| 28 |   1468657 |      94 |           12 | 2020-10-20 13:55:13.650 |
| 29 |   1468657 |      94 |           13 | 2020-10-20 13:55:13.727 |
| 30 |   1468657 |      94 |           14 | 2020-10-20 13:55:26.933 |
| 31 |   1468665 |      94 |            8 | 2020-10-20 15:00:41.043 |
| 32 |   1468665 |      94 |            9 | 2020-10-20 15:00:41.073 |
| 33 |   1468665 |      94 |            8 | 2020-10-20 15:00:41.090 |
| 34 |   1468665 |      94 |            9 | 2020-10-20 15:00:41.120 |
| 35 |   1468665 |      94 |            7 | 2020-10-20 15:00:41.137 |
| 36 |   1468665 |      94 |            1 | 2020-10-20 15:00:20.687 |
| 37 |   1468665 |      94 |            3 | 2020-10-20 15:00:20.703 |
| 38 |   1468665 |      94 |            4 | 2020-10-20 15:00:28.560 |
| 39 |   1468665 |      94 |            5 | 2020-10-20 15:00:32.617 |
| 40 |   1468665 |      94 |            6 | 2020-10-20 15:00:32.663 |
| 41 |   1468665 |      94 |           12 | 2020-10-20 15:00:48.680 |
| 42 |   1468665 |      94 |           15 | 2020-10-20 15:00:48.743 |
| 43 |   1468665 |      94 |           14 | 2020-10-20 15:00:56.247 |
| 44 |   1468665 |      94 |           17 | 2020-10-20 15:00:56.527 |
| 45 |   1468676 |      94 |            8 | 2020-10-20 15:35:14.870 |
| 46 |   1468676 |      94 |            9 | 2020-10-20 15:35:14.887 |
| 47 |   1468676 |      94 |            8 | 2020-10-20 15:35:14.917 |
| 48 |   1468676 |      94 |            9 | 2020-10-20 15:35:14.933 |
| 49 |   1468676 |      94 |            7 | 2020-10-20 15:35:14.947 |
| 50 |   1468676 |      94 |            1 | 2020-10-20 15:35:13.927 |
| 51 |   1468687 |      94 |           23 | 2020-10-20 16:11:38.820 |
| 52 |   1468687 |      94 |            8 | 2020-10-20 16:11:39.837 |
| 53 |   1468687 |      94 |            9 | 2020-10-20 16:11:39.853 |
| 54 |   1468687 |      94 |            8 | 2020-10-20 16:11:39.870 |
| 55 |   1468687 |      94 |            9 | 2020-10-20 16:11:39.883 |
| 56 |   1468687 |      94 |            7 | 2020-10-20 16:11:39.900 |
| 57 |   1468687 |      94 |            1 | 2020-10-20 16:11:39.493 |
| 58 |   1468687 |      94 |            8 | 2020-10-20 16:12:47.077 |
| 59 |   1468687 |      94 |            9 | 2020-10-20 16:12:47.093 |
| 60 |   1468687 |      94 |            8 | 2020-10-20 16:12:47.110 |
| 61 |   1468687 |      94 |            9 | 2020-10-20 16:12:47.123 |
| 62 |   1468687 |      94 |            7 | 2020-10-20 16:12:47.150 |
| 63 |   1468687 |      94 |            1 | 2020-10-20 16:12:36.270 |
+----+-----------+---------+--------------+-------------------------+
At the moment I have this SQL:
SELECT [BookingID],
       min(EventDate) as Min_Date,
       max(EventDate) as Max_Date
FROM [dbo].[BC_Event]
GROUP BY [BookingID]
Which returns something along the lines of:
| BookingID | Min_Date                | Max_Date                |
+-----------+-------------------------+-------------------------+
|   1468656 | 2020-10-20 12:06:27.027 | 2020-10-20 12:07:36.337 |
|   1468657 | 2020-10-20 13:53:56.297 | 2020-10-20 13:55:26.933 |
|   1468665 | 2020-10-20 15:00:20.687 | 2020-10-20 15:00:56.527 |
|   1468676 | 2020-10-20 15:35:13.927 | 2020-10-20 15:35:14.947 |
|   1468687 | 2020-10-20 16:11:38.820 | 2020-10-20 16:12:47.150 |
|   1468688 | 2020-10-20 16:13:53.390 | 2020-10-20 16:19:02.777 |
+-----------+-------------------------+-------------------------+
This is great, however I need a column which displays the LAST BC_EventType ID, so in theory it would be the row where EventDate = Max_Date. How would I do this?
One method is conditional aggregation:
SELECT [BookingID], min(EventDate) as Min_Date, max(EventDate) as Max_Date,
       MAX(CASE WHEN seqnum_asc = 1 THEN BC_EventType END) as first_BC_EventType,
       MAX(CASE WHEN seqnum_desc = 1 THEN BC_EventType END) as last_BC_EventType
FROM (SELECT e.*,
             ROW_NUMBER() OVER (PARTITION BY BookingID ORDER BY EventDate ASC) as seqnum_asc,
             ROW_NUMBER() OVER (PARTITION BY BookingID ORDER BY EventDate DESC) as seqnum_desc
      FROM [dbo].[BC_Event] e
     ) e
GROUP BY [BookingID]
Here's an approach that uses two ROW_NUMBER functions. Something like this:
with bce_cte as (
    select *,
           row_number() over (partition by BookingID order by EventDate asc) as rn_asc,
           row_number() over (partition by BookingID order by EventDate desc) as rn_desc
    from [dbo].[BC_Event])
select BookingID,
       max(case when rn_asc = 1 then EventDate else null end) Min_Date,
       max(case when rn_desc = 1 then EventDate else null end) Max_Date,
       max(case when rn_desc = 1 then BC_EventType else null end) Last_BC_EventType
from bce_cte
group by BookingID;

How to reshape a table having multiple records for the same id into a table with one record per id without losing information?

Basically, I want to transform this (Initial) into this (Final). In other words, I want to
"squash" the initial table so that it will have only one record per id
"dilate" the initial table so that I won't lose any information: create a different column for every possible combination of source and column from the initial table (create c1_A, c1_B, ...).
I can work with the initial table as a CSV in Python (maybe Pandas) and manually hardcode the mapping between the Initial and the Final table. However, I don't find this solution elegant at all, and I'm much more interested in a SQL / SAS solution. Is there any way of doing that?
Edit: I want to change
+----+--------+------+-----+------+
| ID | source | c1   | c2  | c3   |
+----+--------+------+-----+------+
|  1 | A      |  432 |  56 |    1 |
|  1 | B      |   53 |   3 |   73 |
|  1 | C      |    7 | 342 |   83 |
|  1 | D      |  543 |  43 |   73 |
|  2 | A      |    8 | 882 |   39 |
|  2 | B      |    5 |  54 |   46 |
|  2 | C      |    8 |   3 | 2226 |
|  2 | D      |   87 |   2 |   45 |
|  3 | A      |   93 | 143 |   45 |
|  3 | B      | 1023 |  72 |    8 |
|  3 | C      |    3 |   3 |  704 |
|  4 | A      |    2 |   5 |    0 |
|  4 | B      |   78 | 888 |    2 |
|  4 | C      |   87 |  23 |   34 |
|  4 | D      |  112 |   7 |  712 |
+----+--------+------+-----+------+
into
+----+------+------+------+------+------+------+------+------+------+------+------+------+
| ID | c1_A | c1_B | c1_C | c1_D | c2_A | c2_B | c2_C | c2_D | c3_A | c3_B | c3_C | c3_D |
+----+------+------+------+------+------+------+------+------+------+------+------+------+
|  1 |  432 |   53 |    7 |  543 |   56 |    3 |  342 |   43 |    1 |   73 |   83 |   73 |
|  2 |    8 |    5 |    8 |   87 |  882 |   54 |    3 |    2 |   39 |   46 | 2226 |   45 |
|  3 |   93 | 1023 |    3 |      |  143 |   72 |    3 |      |   45 |    8 |  704 |      |
|  4 |    2 |   78 |   87 |  112 |    5 |  888 |   23 |    7 |    0 |    2 |   34 |  712 |
+----+------+------+------+------+------+------+------+------+------+------+------+------+
Abandon hope ... ?
data want;
  input ID source $ c1 c2 c3;
  datalines;
1 A 432 56 1
1 B 53 3 73
1 C 7 342 83
1 D 543 43 73
2 A 8 882 39
2 B 5 54 46
2 C 8 3 2226
2 D 87 2 45
3 A 93 143 45
3 B 1023 72 8
3 C 3 3 704
4 A 2 5 0
4 B 78 888 2
4 C 87 23 34
4 D 112 7 712
;
* one to grow you oh data;
proc transpose data=want out=stage1;
  by id source;
  var c1-c3;
run;
* and one to shrink;
proc transpose data=stage1 out=want(drop=_name_) delim=_;
  by id;
  id _name_ source;
run;
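If you'd rather do the reshape in SQL, the same pivot can be sketched with conditional aggregation; assuming the initial table is named initial:

SELECT ID,
       MAX(CASE WHEN source = 'A' THEN c1 END) AS c1_A,
       MAX(CASE WHEN source = 'B' THEN c1 END) AS c1_B,
       MAX(CASE WHEN source = 'C' THEN c1 END) AS c1_C,
       MAX(CASE WHEN source = 'D' THEN c1 END) AS c1_D,
       MAX(CASE WHEN source = 'A' THEN c2 END) AS c2_A,
       MAX(CASE WHEN source = 'B' THEN c2 END) AS c2_B,
       MAX(CASE WHEN source = 'C' THEN c2 END) AS c2_C,
       MAX(CASE WHEN source = 'D' THEN c2 END) AS c2_D,
       MAX(CASE WHEN source = 'A' THEN c3 END) AS c3_A,
       MAX(CASE WHEN source = 'B' THEN c3 END) AS c3_B,
       MAX(CASE WHEN source = 'C' THEN c3 END) AS c3_C,
       MAX(CASE WHEN source = 'D' THEN c3 END) AS c3_D
FROM initial
GROUP BY ID;

Missing combinations (such as ID 3 with source D) simply come out NULL, matching the blanks in the desired output.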

Get RMSE score while fetching data from the table directly. Write a query for that

I have a table in the database which has many features. Each feature has its own actual and predicted value, and there are two more columns, Id_partner and Id_accounts. My main goal is to get the RMSE score for each feature, for each account, in each partner. I have done that with a for loop, but it takes a very long time to complete in PySpark. Is there an efficient way of doing this directly with a query while reading the data, so that I get the RMSE score for each account in each partner?
My table is something like this:
Actual_Feature_1 = Act_F_1
Predicted_Feature_1 = Pred_F_1
Actual_Feature_2 = Act_F_2
Predicted_Feature_2 = Pred_F_2
Table 1:
ID_PARTNER | ID_ACCOUNT | Act_F_1 | Pred_F_1 | Act_F_2 | Pred_F_2 |
         4 |         24 |      10 |       12 |      22 |       20 |
         4 |         24 |      11 |       13 |      23 |       21 |
         4 |         24 |      11 |       12 |      24 |       23 |
         4 |         25 |      13 |       15 |      22 |       20 |
         4 |         25 |      15 |       12 |      21 |       20 |
         4 |         25 |      15 |       14 |      21 |       21 |
         4 |         27 |      13 |       12 |      35 |       32 |
         4 |         27 |      12 |       16 |      34 |       31 |
         4 |         27 |      17 |       14 |      36 |       34 |
         5 |        301 |      19 |       17 |      56 |       54 |
         5 |        301 |      21 |       20 |      58 |       54 |
         5 |        301 |      22 |       19 |      59 |       57 |
         5 |        301 |      24 |       22 |      46 |       50 |
         5 |        301 |      25 |       22 |      49 |       54 |
         5 |        350 |      12 |       10 |      67 |       66 |
         5 |        350 |      12 |       11 |      65 |       64 |
         5 |        350 |      14 |       13 |      68 |       67 |
         5 |        350 |      15 |       12 |      61 |       61 |
         5 |        350 |      12 |       10 |      63 |       60 |
         7 |        420 |      51 |       49 |      30 |       29 |
         7 |        420 |      51 |       48 |      32 |       30 |
         7 |        410 |      49 |       45 |      81 |       79 |
         7 |        410 |      48 |       44 |      83 |       80 |
         7 |        410 |      45 |       43 |      84 |       81 |
I need the RMSE score for each account in each partner, in this format:
Resulting table:
ID_PARTNER | ID_ACCOUNT | FEATURE_1  | FEATURE_2  |
         4 |         24 | rmse_score | rmse_score |
         4 |         25 | rmse_score | rmse_score |
         4 |         27 | rmse_score | rmse_score |
         5 |        301 | rmse_score | rmse_score |
         5 |        350 | rmse_score | rmse_score |
         7 |        420 | rmse_score | rmse_score |
         7 |        410 | rmse_score | rmse_score |
Note: we need to take both id_account and id_partner into consideration. Looking at the actual table above, id_account alone is not enough for grouping, because different partners can have the same account IDs.
I need a SQL query that produces the resulting table directly while reading the table from the database.
Yes, you can calculate the root-mean-square-error in SQL.
SELECT ID_PARTNER, ID_ACCOUNT,
       SQRT(AVG(POWER(Act_F_1 - Pred_F_1, 2))) as feature_1_rmse
FROM ...
GROUP BY ID_PARTNER, ID_ACCOUNT
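A sketch of the full query covering both features (table1 stands in for the actual table name):

SELECT ID_PARTNER,
       ID_ACCOUNT,
       SQRT(AVG(POWER(Act_F_1 - Pred_F_1, 2))) AS feature_1,  -- RMSE of feature 1
       SQRT(AVG(POWER(Act_F_2 - Pred_F_2, 2))) AS feature_2   -- RMSE of feature 2
FROM table1
GROUP BY ID_PARTNER, ID_ACCOUNT;

Grouping by the (ID_PARTNER, ID_ACCOUNT) pair covers the note in the question: the same account ID under two different partners lands in two different groups. From PySpark you can register the DataFrame as a temp view and run this through spark.sql(...) instead of looping.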

select all rows that match criteria if not get a random one

+----+---------------+--------------------+------------+----------+-----------------+
| id | restaurant_id | filename           | is_profile | priority | show_in_profile |
+----+---------------+--------------------+------------+----------+-----------------+
| 40 |            20 | 1320849687_390.jpg |            |          |               1 |
| 60 |            24 | 1320853501_121.png |          1 |          |               1 |
| 61 |            24 | 1320853504_847.png |            |          |               1 |
| 62 |            24 | 1320853505_732.png |            |          |               1 |
| 63 |            24 | 1320853505_865.png |            |          |               1 |
| 64 |            29 | 1320854617_311.png |          1 |          |               1 |
| 65 |            29 | 1320854617_669.png |            |          |               1 |
| 66 |            29 | 1320854618_636.png |            |          |               1 |
| 67 |            29 | 1320854619_791.png |            |          |               1 |
| 74 |           154 | 1320922653_259.png |            |          |               1 |
| 76 |           154 | 1320922656_332.png |            |          |               1 |
| 77 |           154 | 1320922657_106.png |            |          |               1 |
| 84 |           130 | 1321269380_960.jpg |          1 |          |               1 |
| 85 |           130 | 1321269383_555.jpg |            |          |               1 |
| 86 |           130 | 1321269384_251.jpg |            |          |               1 |
| 89 |            28 | 1321269714_303.jpg |            |          |               1 |
| 90 |            28 | 1321269716_938.jpg |          1 |          |               1 |
| 91 |            28 | 1321269717_147.jpg |            |          |               1 |
| 92 |            28 | 1321269717_774.jpg |            |          |               1 |
| 93 |            28 | 1321269717_250.jpg |            |          |               1 |
| 94 |            28 | 1321269718_964.jpg |            |          |               1 |
| 95 |            28 | 1321269719_830.jpg |            |          |               1 |
| 96 |            43 | 1321270013_629.jpg |          1 |          |               1 |
+----+---------------+--------------------+------------+----------+-----------------+
I have this table and I want to select the filename for a given list of restaurant ids.
For example, for 24, 29, 154:
+--------------------+
| filename           |
+--------------------+
| 1320853501_121.png |  (has is_profile 1)
| 1320854617_311.png |  (has is_profile 1)
| 1320922653_259.png |  (chosen as profile picture because the restaurant doesn't have a profile pic but has pictures)
+--------------------+
I tried GROUP BY and CASE statements but I got nowhere. Also, if you use GROUP BY, it should be a full GROUP BY.
You can do this with aggregation and some logic:
select restaurant_id,
       coalesce(max(case when is_profile = 1 then filename end),
                max(filename)) as filename
from t
where restaurant_id in (24, 29, 154)
group by restaurant_id;
First look for the profile filename; failing that, just choose an arbitrary one.
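Note that max(filename) is arbitrary but deterministic, not random. If you genuinely want a random fallback, a sketch with correlated subqueries (assuming MySQL, with the table named t as above):

select r.restaurant_id,
       coalesce((select filename
                 from t
                 where restaurant_id = r.restaurant_id and is_profile = 1
                 limit 1),
                (select filename
                 from t
                 where restaurant_id = r.restaurant_id
                 order by rand()
                 limit 1)) as filename
from (select distinct restaurant_id
      from t
      where restaurant_id in (24, 29, 154)) r;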

Find the highest and lowest value locations within an interval on a column?

Given this pandas dataframe with two columns, 'Value' and 'Intervals', how do I get a third column 'MinMax' indicating whether the value is a maximum or a minimum within that interval? The challenge for me is that the interval length and the distance between intervals are not fixed, therefore I post the question.
import pandas as pd
import numpy as np
data = pd.DataFrame([
[1879.289,np.nan],[1879.281,np.nan],[1879.292,1],[1879.295,1],[1879.481,1],[1879.294,1],[1879.268,1],
[1879.293,1],[1879.277,1],[1879.285,1],[1879.464,1],[1879.475,1],[1879.971,1],[1879.779,1],
[1879.986,1],[1880.791,1],[1880.29,1],[1879.253,np.nan],[1878.268,np.nan],[1875.73,1],[1876.792,1],
[1875.977,1],[1876.408,1],[1877.159,1],[1877.187,1],[1883.164,1],[1883.171,1],[1883.495,1],
[1883.962,1],[1885.158,1],[1885.974,1],[1886.479,np.nan],[1885.969,np.nan],[1884.693,1],[1884.977,1],
[1884.967,1],[1884.691,1],[1886.171,1],[1886.166,np.nan],[1884.476,np.nan],[1884.66,1],[1882.962,1],
[1881.496,1],[1871.163,1],[1874.985,1],[1874.979,1],[1871.173,np.nan],[1871.973,np.nan],[1871.682,np.nan],
[1872.476,np.nan],[1882.361,1],[1880.869,1],[1882.165,1],[1881.857,1],[1880.375,1],[1880.66,1],
[1880.891,1],[1880.377,1],[1881.663,1],[1881.66,1],[1877.888,1],[1875.69,1],[1875.161,1],
[1876.697,np.nan],[1876.671,np.nan],[1879.666,np.nan],[1877.182,np.nan],[1878.898,1],[1878.668,1],[1878.871,1],
[1878.882,1],[1879.173,1],[1878.887,1],[1878.68,1],[1878.872,1],[1878.677,1],[1877.877,1],
[1877.669,1],[1877.69,1],[1877.684,1],[1877.68,1],[1877.885,1],[1877.863,1],[1877.674,1],
[1877.676,1],[1877.687,1],[1878.367,1],[1878.179,1],[1877.696,1],[1877.665,1],[1877.667,np.nan],
[1878.678,np.nan],[1878.661,1],[1878.171,1],[1877.371,1],[1877.359,1],[1878.381,1],[1875.185,1],
[1875.367,np.nan],[1865.492,np.nan],[1865.495,1],[1866.995,1],[1866.672,1],[1867.465,1],[1867.663,1],
[1867.186,1],[1867.687,1],[1867.459,1],[1867.168,1],[1869.689,1],[1869.693,1],[1871.676,1],
[1873.174,1],[1873.691,np.nan],[1873.685,np.nan]
])
In the third column below you can see where the max and min is for each interval.
+-------+----------+-----------+---------+
| index | Value    | Intervals | Min/Max |
+-------+----------+-----------+---------+
|     0 | 1879.289 |    np.nan |         |
|     1 | 1879.281 |    np.nan |         |
|     2 | 1879.292 |         1 |         |
|     3 | 1879.295 |         1 |         |
|     4 | 1879.481 |         1 |         |
|     5 | 1879.294 |         1 |         |
|     6 | 1879.268 |         1 | min     |
|     7 | 1879.293 |         1 |         |
|     8 | 1879.277 |         1 |         |
|     9 | 1879.285 |         1 |         |
|    10 | 1879.464 |         1 |         |
|    11 | 1879.475 |         1 |         |
|    12 | 1879.971 |         1 |         |
|    13 | 1879.779 |         1 |         |
|    17 | 1879.986 |         1 |         |
|    18 | 1880.791 |         1 | max     |
|    19 |  1880.29 |         1 |         |
|    55 | 1879.253 |    np.nan |         |
|    56 | 1878.268 |    np.nan |         |
|    57 |  1875.73 |         1 |         |
|    58 | 1876.792 |         1 |         |
|    59 | 1875.977 |         1 | min     |
|    60 | 1876.408 |         1 |         |
|    61 | 1877.159 |         1 |         |
|    62 | 1877.187 |         1 |         |
|    63 | 1883.164 |         1 |         |
|    64 | 1883.171 |         1 |         |
|    65 | 1883.495 |         1 |         |
|    66 | 1883.962 |         1 |         |
|    67 | 1885.158 |         1 |         |
|    68 | 1885.974 |         1 | max     |
|    69 | 1886.479 |    np.nan |         |
|    70 | 1885.969 |    np.nan |         |
|    71 | 1884.693 |         1 |         |
|    72 | 1884.977 |         1 |         |
|    73 | 1884.967 |         1 |         |
|    74 | 1884.691 |         1 | min     |
|    75 | 1886.171 |         1 | max     |
|    76 | 1886.166 |    np.nan |         |
|    77 | 1884.476 |    np.nan |         |
|    78 |  1884.66 |         1 | max     |
|    79 | 1882.962 |         1 |         |
|    80 | 1881.496 |         1 |         |
|    81 | 1871.163 |         1 | min     |
|    82 | 1874.985 |         1 |         |
|    83 | 1874.979 |         1 |         |
|    84 | 1871.173 |    np.nan |         |
|    85 | 1871.973 |    np.nan |         |
|    86 | 1871.682 |    np.nan |         |
|    87 | 1872.476 |    np.nan |         |
|    88 | 1882.361 |         1 | max     |
|    89 | 1880.869 |         1 |         |
|    90 | 1882.165 |         1 |         |
|    91 | 1881.857 |         1 |         |
|    92 | 1880.375 |         1 |         |
|    93 |  1880.66 |         1 |         |
|    94 | 1880.891 |         1 |         |
|    95 | 1880.377 |         1 |         |
|    96 | 1881.663 |         1 |         |
|    97 |  1881.66 |         1 |         |
|    98 | 1877.888 |         1 |         |
|    99 |  1875.69 |         1 |         |
|   100 | 1875.161 |         1 | min     |
|   101 | 1876.697 |    np.nan |         |
|   102 | 1876.671 |    np.nan |         |
|   103 | 1879.666 |    np.nan |         |
|   111 | 1877.182 |    np.nan |         |
|   112 | 1878.898 |         1 |         |
|   113 | 1878.668 |         1 |         |
|   114 | 1878.871 |         1 |         |
|   115 | 1878.882 |         1 |         |
|   116 | 1879.173 |         1 | max     |
|   117 | 1878.887 |         1 |         |
|   118 |  1878.68 |         1 |         |
|   119 | 1878.872 |         1 |         |
|   120 | 1878.677 |         1 |         |
|   121 | 1877.877 |         1 |         |
|   122 | 1877.669 |         1 |         |
|   123 |  1877.69 |         1 |         |
|   124 | 1877.684 |         1 |         |
|   125 |  1877.68 |         1 |         |
|   126 | 1877.885 |         1 |         |
|   127 | 1877.863 |         1 |         |
|   128 | 1877.674 |         1 |         |
|   129 | 1877.676 |         1 |         |
|   130 | 1877.687 |         1 |         |
|   131 | 1878.367 |         1 |         |
|   132 | 1878.179 |         1 |         |
|   133 | 1877.696 |         1 |         |
|   134 | 1877.665 |         1 | min     |
|   135 | 1877.667 |    np.nan |         |
|   136 | 1878.678 |    np.nan |         |
|   137 | 1878.661 |         1 | max     |
|   138 | 1878.171 |         1 |         |
|   139 | 1877.371 |         1 |         |
|   140 | 1877.359 |         1 |         |
|   141 | 1878.381 |         1 |         |
|   142 | 1875.185 |         1 | min     |
|   143 | 1875.367 |    np.nan |         |
|   144 | 1865.492 |    np.nan |         |
|   145 | 1865.495 |         1 | max     |
|   146 | 1866.995 |         1 |         |
|   147 | 1866.672 |         1 |         |
|   148 | 1867.465 |         1 |         |
|   149 | 1867.663 |         1 |         |
|   150 | 1867.186 |         1 |         |
|   151 | 1867.687 |         1 |         |
|   152 | 1867.459 |         1 |         |
|   153 | 1867.168 |         1 |         |
|   154 | 1869.689 |         1 |         |
|   155 | 1869.693 |         1 |         |
|   156 | 1871.676 |         1 |         |
|   157 | 1873.174 |         1 | min     |
|   158 | 1873.691 |    np.nan |         |
|   159 | 1873.685 |    np.nan |         |
+-------+----------+-----------+---------+
# mark the gap rows: the interval column (column 1) is null between intervals
isnull = data.iloc[:, 1].isnull()
# cumsum over the nulls numbers each non-null run; group the values (column 0)
# by run and take the row labels of each run's max and min
minmax = data.groupby(isnull.cumsum()[~isnull])[0].agg(['idxmax', 'idxmin'])
data.loc[minmax['idxmax'], 'MinMax'] = 'max'
data.loc[minmax['idxmin'], 'MinMax'] = 'min'
data.MinMax = data.MinMax.fillna('')
print(data)
            0    1 MinMax
0    1879.289  NaN
1    1879.281  NaN
2    1879.292  1.0
3    1879.295  1.0
4    1879.481  1.0
5    1879.294  1.0
6    1879.268  1.0    min
7    1879.293  1.0
8    1879.277  1.0
9    1879.285  1.0
10   1879.464  1.0
11   1879.475  1.0
12   1879.971  1.0
13   1879.779  1.0
14   1879.986  1.0
15   1880.791  1.0    max
16   1880.290  1.0
17   1879.253  NaN
18   1878.268  NaN
19   1875.730  1.0    min
20   1876.792  1.0
21   1875.977  1.0
22   1876.408  1.0
23   1877.159  1.0
24   1877.187  1.0
25   1883.164  1.0
26   1883.171  1.0
27   1883.495  1.0
28   1883.962  1.0
29   1885.158  1.0
..        ...  ...    ...
85   1877.687  1.0
86   1878.367  1.0
87   1878.179  1.0
88   1877.696  1.0
89   1877.665  1.0    min
90   1877.667  NaN
91   1878.678  NaN
92   1878.661  1.0    max
93   1878.171  1.0
94   1877.371  1.0
95   1877.359  1.0
96   1878.381  1.0
97   1875.185  1.0    min
98   1875.367  NaN
99   1865.492  NaN
100  1865.495  1.0    min
101  1866.995  1.0
102  1866.672  1.0
103  1867.465  1.0
104  1867.663  1.0
105  1867.186  1.0
106  1867.687  1.0
107  1867.459  1.0
108  1867.168  1.0
109  1869.689  1.0
110  1869.693  1.0
111  1871.676  1.0
112  1873.174  1.0    max
113  1873.691  NaN
114  1873.685  NaN

[115 rows x 3 columns]
data.columns = ['Value', 'Interval']
data['Ingroup'] = (data['Interval'].notnull() + 0)
Use data['Interval'].notnull() to separate the groups...
Use cumsum() to number them with groupno...
Use groupby(groupno)...
Finally you want something using apply/idxmax/idxmin to label the max/min.
But of course a for-loop as you suggested is the non-Pythonic but possibly simpler hack.
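Putting those steps together, a minimal sketch (it assumes data is the two-column frame built in the question, with the columns renamed as in the two lines above):

# `data` is the DataFrame from the question; give its columns names
data.columns = ['Value', 'Interval']

ingroup = data['Interval'].notnull()
# every null row bumps the counter, so each contiguous non-null run
# of rows gets its own group number
groupno = (~ingroup).cumsum()[ingroup]

# row labels of the extremes within each group
minmax = data.loc[ingroup].groupby(groupno)['Value'].agg(['idxmin', 'idxmax'])
data['MinMax'] = ''
data.loc[minmax['idxmin'], 'MinMax'] = 'min'
data.loc[minmax['idxmax'], 'MinMax'] = 'max'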