Solar heating: data analytics for Grafana, advanced SQL query

I need some help with a very specific use case in my homelab.
I have solar panels on my roof and extract a lot of data points to my server. I use a specific app (ioBroker) that makes it easy to consume and automate that data. I save the data into a Postgres database. (Please no questions about why not InfluxDB or TimescaleDB; Postgres is what I have to live with...)
Everything runs on Docker right now and works perfectly. While I was able to create numerous dashboards in Grafana and display everything I like there, there is one specific "thing" I was unable to do, and after months of trying to get it done I am finally asking for help. I have a device that supports my heating by using generated power to warm up the water; it uses energy that we would normally feed back to the grid. The device updates the power it pushes to the heating element pretty much every second, and I pull the data from the device every second as well. However, I have the logging configured so that it only logs data when there is a difference to the previous datapoint.
One example:
Time                | consumption in W
--------------------+-----------------
2018-02-21 12:00:00 | 3500
2018-02-21 12:00:01 | 1470
2018-02-21 12:00:02 | 1470
2018-02-21 12:00:03 | 1470
2018-02-21 12:00:04 | 1600
The second and third entries with the value 1470 (12:00:02 and 12:00:03) would not exist!
So the first issue I have is missing data points. What I would like to achieve is a calculation showing the consumption per individual day, month, year, and all-time.
This does not need to happen inside Grafana, and I don't think Grafana can do it at all. There are options to do similar things in Grafana, but they do not provide an accurate result ($__unixEpochGroupAlias(ts,1s,previous)). I have every option needed to create the data and to store it back in the DB, so there should be no obstacle to your ideas.
The data is polled/stored every 1000 ms, so every second. The idea is to use Ws (watt-seconds) to calculate with accurate numbers and to display them more readably in Wh or kWh (1 Wh = 3600 Ws, 1 kWh = 3,600,000 Ws).
The DB can only be queried with SQL, but as mentioned, if the calculation needs to be done in a different language, that is also fine.
I have tried everything I could think of: SQL queries, searching numerous posts, all available SQL-based Grafana options. I guess I need custom code, but that is above my skill set.
Anything more you'd need to know? Let me know. Thanks in advance!
The data structure looks like the following:
id = identifier the application uses for the datapoint
ts = timestamp (Unix epoch in milliseconds)
val = value in Ws
The other values are not important, but I wanted to show them for completeness.
id | ts | val | ack | _from | q
----+---------------+------+-----+-------+---
23 | 1661439981910 | 1826 | t | 3 | 0
23 | 1661439982967 | 1830 | t | 3 | 0
23 | 1661439984027 | 1830 | t | 3 | 0
23 | 1661439988263 | 1828 | t | 3 | 0
23 | 1661439985088 | 1829 | t | 3 | 0
23 | 1661439987203 | 1829 | t | 3 | 0
23 | 1661439989322 | 1831 | t | 3 | 0
23 | 1661439990380 | 1830 | t | 3 | 0
23 | 1661439991439 | 1827 | t | 3 | 0
23 | 1661439992498 | 1829 | t | 3 | 0
23 | 1661440021097 | 1911 | t | 3 | 0
23 | 1661439993558 | 1830 | t | 3 | 0
23 | 1661440022156 | 1924 | t | 3 | 0
23 | 1661439994624 | 1830 | t | 3 | 0
23 | 1661440023214 | 1925 | t | 3 | 0
23 | 1661439995683 | 1828 | t | 3 | 0
23 | 1661440024273 | 1924 | t | 3 | 0
23 | 1661439996739 | 1830 | t | 3 | 0
23 | 1661440025332 | 1925 | t | 3 | 0
23 | 1661440052900 | 1694 | t | 3 | 0
23 | 1661439997797 | 1831 | t | 3 | 0
23 | 1661440026391 | 1927 | t | 3 | 0
23 | 1661439998855 | 1831 | t | 3 | 0
23 | 1661440027450 | 1925 | t | 3 | 0
23 | 1661439999913 | 1828 | t | 3 | 0
23 | 1661440028509 | 1925 | t | 3 | 0
23 | 1661440029569 | 1927 | t | 3 | 0
23 | 1661440000971 | 1830 | t | 3 | 0
23 | 1661440030634 | 1926 | t | 3 | 0
23 | 1661440002030 | 1838 | t | 3 | 0
23 | 1661440031694 | 1925 | t | 3 | 0
23 | 1661440053955 | 1692 | t | 3 | 0
23 | 1659399542399 | 0 | t | 3 | 0
23 | 1659399543455 | 1 | t | 3 | 0
23 | 1659399544511 | 0 | t | 3 | 0
23 | 1663581880895 | 2813 | t | 3 | 0
23 | 1663581883017 | 2286 | t | 3 | 0
23 | 1663581881952 | 2646 | t | 3 | 0
23 | 1663581884074 | 1905 | t | 3 | 0
23 | 1661440004144 | 1838 | t | 3 | 0
23 | 1661440032752 | 1926 | t | 3 | 0
23 | 1661440005202 | 1839 | t | 3 | 0
23 | 1661440034870 | 1924 | t | 3 | 0
23 | 1661440006260 | 1840 | t | 3 | 0
23 | 1661440035929 | 1922 | t | 3 | 0
23 | 1661440007318 | 1840 | t | 3 | 0
23 | 1661440036987 | 1918 | t | 3 | 0
23 | 1661440008377 | 1838 | t | 3 | 0
23 | 1661440038045 | 1919 | t | 3 | 0
23 | 1661440009437 | 1839 | t | 3 | 0
23 | 1661440039104 | 1900 | t | 3 | 0
23 | 1661440010495 | 1839 | t | 3 | 0
23 | 1661440040162 | 1877 | t | 3 | 0
23 | 1661440011556 | 1838 | t | 3 | 0
23 | 1661440041220 | 1862 | t | 3 | 0
23 | 1661440012629 | 1840 | t | 3 | 0
23 | 1661440042279 | 1847 | t | 3 | 0
23 | 1661440013687 | 1840 | t | 3 | 0
23 | 1661440043340 | 1829 | t | 3 | 0
23 | 1661440014746 | 1833 | t | 3 | 0
23 | 1661440044435 | 1817 | t | 3 | 0
23 | 1661440015804 | 1833 | t | 3 | 0
23 | 1661440045493 | 1789 | t | 3 | 0
23 | 1661440046551 | 1766 | t | 3 | 0
23 | 1661440016862 | 1846 | t | 3 | 0
23 | 1661440047610 | 1736 | t | 3 | 0
23 | 1661440048670 | 1705 | t | 3 | 0
23 | 1661440017920 | 1863 | t | 3 | 0
23 | 1661440049726 | 1694 | t | 3 | 0
23 | 1661440050783 | 1694 | t | 3 | 0
23 | 1661440018981 | 1876 | t | 3 | 0
23 | 1661440051840 | 1696 | t | 3 | 0
23 | 1661440055015 | 1692 | t | 3 | 0
23 | 1661440056071 | 1693 | t | 3 | 0
23 | 1661440322966 | 1916 | t | 3 | 0
23 | 1661440325082 | 1916 | t | 3 | 0
23 | 1661440326142 | 1926 | t | 3 | 0
23 | 1661440057131 | 1693 | t | 3 | 0
23 | 1661440327199 | 1913 | t | 3 | 0
23 | 1661440058189 | 1692 | t | 3 | 0
23 | 1661440328256 | 1915 | t | 3 | 0
23 | 1661440059247 | 1691 | t | 3 | 0
23 | 1661440329315 | 1923 | t | 3 | 0
23 | 1661440060306 | 1692 | t | 3 | 0
23 | 1661440330376 | 1912 | t | 3 | 0
23 | 1661440061363 | 1676 | t | 3 | 0
23 | 1661440331470 | 1913 | t | 3 | 0
23 | 1661440062437 | 1664 | t | 3 | 0
23 | 1663581885133 | 1678 | t | 3 | 0
23 | 1661440332530 | 1923 | t | 3 | 0
23 | 1661440064552 | 1667 | t | 3 | 0
23 | 1661440334647 | 1915 | t | 3 | 0
23 | 1661440335708 | 1913 | t | 3 | 0
23 | 1661440065608 | 1665 | t | 3 | 0
23 | 1661440066665 | 1668 | t | 3 | 0
23 | 1661440336763 | 1912 | t | 3 | 0
23 | 1661440337822 | 1913 | t | 3 | 0
23 | 1661440338879 | 1911 | t | 3 | 0
23 | 1661440068780 | 1664 | t | 3 | 0
23 | 1661440339939 | 1912 | t | 3 | 0
(100 rows)
iobroker=# \d ts_number
Table "public.ts_number"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
id | integer | | not null |
ts | bigint | | not null |
val | real | | |
ack | boolean | | |
_from | integer | | |
q | integer | | |
Indexes:
"ts_number_pkey" PRIMARY KEY, btree (id, ts)

You can do this with a mix of generate_series() and some window functions.
First we use generate_series() to get every per-second timestamp in the desired range. Then we left-join to our readings to see which consumption values we actually have. Each null row is grouped with its most recent non-null reading, and the consumption of that reading is then applied to the whole group.
So: if we have readings like this:
richardh=> SELECT * FROM readings;
id | ts | consumption
----+------------------------+-------------
1 | 2023-02-16 20:29:13+00 | 900
2 | 2023-02-16 20:29:16+00 | 1000
3 | 2023-02-16 20:29:20+00 | 925
(3 rows)
We can get all of the seconds we might want like this:
richardh=> SELECT generate_series(timestamptz '2023-02-16 20:29:13+00', timestamptz '2023-02-16 20:29:30+00', interval '1 second');
generate_series
------------------------
2023-02-16 20:29:13+00
2023-02-16 20:29:14+00
...etc...
2023-02-16 20:29:29+00
2023-02-16 20:29:30+00
(18 rows)
Then we join our complete set of timestamps to our readings:
WITH wanted_timestamps (ts) AS (
    SELECT generate_series(timestamptz '2023-02-16 20:29:13+00', timestamptz '2023-02-16 20:29:30+00', interval '1 second')
)
SELECT
    wt.ts
    , r.consumption
    , sum(CASE WHEN r.consumption IS NOT NULL THEN 1 ELSE 0 END)
        OVER (ORDER BY ts) AS group_num
FROM
    wanted_timestamps wt
    LEFT JOIN readings r USING (ts)
ORDER BY wt.ts;
ts | consumption | group_num
------------------------+-------------+-----------
2023-02-16 20:29:13+00 | 900 | 1
2023-02-16 20:29:14+00 | | 1
2023-02-16 20:29:15+00 | | 1
2023-02-16 20:29:16+00 | 1000 | 2
2023-02-16 20:29:17+00 | | 2
2023-02-16 20:29:18+00 | | 2
2023-02-16 20:29:19+00 | | 2
2023-02-16 20:29:20+00 | 925 | 3
2023-02-16 20:29:21+00 | | 3
2023-02-16 20:29:22+00 | | 3
2023-02-16 20:29:23+00 | | 3
2023-02-16 20:29:24+00 | | 3
2023-02-16 20:29:25+00 | | 3
2023-02-16 20:29:26+00 | | 3
2023-02-16 20:29:27+00 | | 3
2023-02-16 20:29:28+00 | | 3
2023-02-16 20:29:29+00 | | 3
2023-02-16 20:29:30+00 | | 3
(18 rows)
Finally, fill in the missing consumption values:
WITH wanted_timestamps (ts) AS (
    SELECT generate_series(timestamptz '2023-02-16 20:29:13+00', timestamptz '2023-02-16 20:29:30+00', interval '1 second')
), grouped_values AS (
    SELECT
        wt.ts
        , r.consumption
        , sum(CASE WHEN r.consumption IS NOT NULL THEN 1 ELSE 0 END)
            OVER (ORDER BY ts) AS group_num
    FROM wanted_timestamps wt
        LEFT JOIN readings r USING (ts)
)
SELECT
    gv.ts
    , first_value(gv.consumption) OVER (PARTITION BY group_num)
        AS consumption
FROM
    grouped_values gv
ORDER BY ts;
ts | consumption
------------------------+-------------
2023-02-16 20:29:13+00 | 900
2023-02-16 20:29:14+00 | 900
2023-02-16 20:29:15+00 | 900
2023-02-16 20:29:16+00 | 1000
2023-02-16 20:29:17+00 | 1000
2023-02-16 20:29:18+00 | 1000
2023-02-16 20:29:19+00 | 1000
2023-02-16 20:29:20+00 | 925
2023-02-16 20:29:21+00 | 925
2023-02-16 20:29:22+00 | 925
2023-02-16 20:29:23+00 | 925
2023-02-16 20:29:24+00 | 925
2023-02-16 20:29:25+00 | 925
2023-02-16 20:29:26+00 | 925
2023-02-16 20:29:27+00 | 925
2023-02-16 20:29:28+00 | 925
2023-02-16 20:29:29+00 | 925
2023-02-16 20:29:30+00 | 925
(18 rows)
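To tie this back to the original question: the ts_number table stores ts as a Unix epoch in milliseconds and val as the per-second power reading, so the approach above needs a small adaptation before it can be aggregated per day, month, or year. The sketch below is illustrative only and untested against the real data; the id filter, the date-range handling, and the kWh conversion are assumptions. It truncates each timestamp to whole seconds, gap-fills exactly as shown above, and sums the per-second watt-seconds into kWh per day.
WITH readings AS (
    -- ts is epoch milliseconds; integer division truncates to whole seconds.
    -- max() collapses the rare case of two samples landing in the same second.
    SELECT to_timestamp(ts / 1000) AS ts,
           max(val) AS consumption
    FROM ts_number
    WHERE id = 23
    GROUP BY 1
), wanted_timestamps (ts) AS (
    SELECT generate_series((SELECT min(ts) FROM readings),
                           (SELECT max(ts) FROM readings),
                           interval '1 second')
), grouped_values AS (
    SELECT wt.ts,
           r.consumption,
           sum(CASE WHEN r.consumption IS NOT NULL THEN 1 ELSE 0 END)
               OVER (ORDER BY wt.ts) AS group_num
    FROM wanted_timestamps wt
        LEFT JOIN readings r USING (ts)
), filled AS (
    -- each group contains exactly one non-null reading, so max() picks it
    SELECT gv.ts,
           max(gv.consumption) OVER (PARTITION BY group_num) AS consumption
    FROM grouped_values gv
)
SELECT date_trunc('day', ts) AS day,          -- swap 'day' for 'month' or 'year' as needed
       sum(consumption)             AS ws,    -- 1 W held for 1 s = 1 Ws
       sum(consumption) / 3600000.0 AS kwh
FROM filled
GROUP BY 1
ORDER BY 1;
Note that date_trunc() follows the session time zone, so day boundaries depend on that setting. Storing the result back into a separate table, as mentioned in the question, is then just a matter of wrapping this in INSERT INTO ... SELECT.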

Related

Redshift SQL - Count Sequences of Repeating Values Within Groups

I have a table that looks like this:
| id | date_start | gap_7_days |
| -- | ------------------- | --------------- |
| 1 | 2021-06-10 00:00:00 | 0 |
| 1 | 2021-06-13 00:00:00 | 0 |
| 1 | 2021-06-19 00:00:00 | 0 |
| 1 | 2021-06-27 00:00:00 | 0 |
| 2 | 2021-07-04 00:00:00 | 1 |
| 2 | 2021-07-11 00:00:00 | 1 |
| 2 | 2021-07-18 00:00:00 | 1 |
| 2 | 2021-07-25 00:00:00 | 1 |
| 2 | 2021-08-01 00:00:00 | 1 |
| 2 | 2021-08-08 00:00:00 | 1 |
| 2 | 2021-08-09 00:00:00 | 0 |
| 2 | 2021-08-16 00:00:00 | 1 |
| 2 | 2021-08-23 00:00:00 | 1 |
| 2 | 2021-08-30 00:00:00 | 1 |
| 2 | 2021-08-31 00:00:00 | 0 |
| 2 | 2021-09-01 00:00:00 | 0 |
| 2 | 2021-08-08 00:00:00 | 1 |
| 2 | 2021-08-15 00:00:00 | 1 |
| 2 | 2021-08-22 00:00:00 | 1 |
| 2 | 2021-08-23 00:00:00 | 1 |
For each ID, I check whether consecutive date_start values are 7 days apart, and put a 1 or 0 in gap_7_days accordingly.
I want to do the following (using Redshift SQL only):
Get the length of each sequence of consecutive 1s in gap_7_days for each ID
Expected output:
| id | date_start | gap_7_days | sequence_length |
| -- | ------------------- | --------------- | --------------- |
| 1 | 2021-06-10 00:00:00 | 0 | |
| 1 | 2021-06-13 00:00:00 | 0 | |
| 1 | 2021-06-19 00:00:00 | 0 | |
| 1 | 2021-06-27 00:00:00 | 0 | |
| 2 | 2021-07-04 00:00:00 | 1 | 6 |
| 2 | 2021-07-11 00:00:00 | 1 | 6 |
| 2 | 2021-07-18 00:00:00 | 1 | 6 |
| 2 | 2021-07-25 00:00:00 | 1 | 6 |
| 2 | 2021-08-01 00:00:00 | 1 | 6 |
| 2 | 2021-08-08 00:00:00 | 1 | 6 |
| 2 | 2021-08-09 00:00:00 | 0 | |
| 2 | 2021-08-16 00:00:00 | 1 | 3 |
| 2 | 2021-08-23 00:00:00 | 1 | 3 |
| 2 | 2021-08-30 00:00:00 | 1 | 3 |
| 2 | 2021-08-31 00:00:00 | 0 | |
| 2 | 2021-09-01 00:00:00 | 0 | |
| 2 | 2021-08-08 00:00:00 | 1 | 4 |
| 2 | 2021-08-15 00:00:00 | 1 | 4 |
| 2 | 2021-08-22 00:00:00 | 1 | 4 |
| 2 | 2021-08-23 00:00:00 | 1 | 4 |
Get the number of sequences for each ID
Expected output:
| id | num_sequences |
| -- | ------------------- |
| 1 | 0 |
| 2 | 3 |
How can I achieve this?
If you want the number of sequences, just look at the previous value. When the current value is "1" and the previous is NULL or 0, then you have a new sequence.
So:
select id,
       sum( (gap_7_days = 1 and coalesce(prev_gap_7_days, 0) = 0)::int ) as num_sequences
from (select t.*,
             lag(gap_7_days) over (partition by id order by date_start) as prev_gap_7_days
      from t
     ) t
group by id;
If you actually want the lengths of the sequences, as in the intermediate results, then ask a new question. That information is not needed for this question.
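Although the answer suggests asking a separate question for the sequence lengths, here is one possible sketch for completeness. It is not from the original answer and is untested on Redshift; it assumes the same table t and columns as in the question. Each run of consecutive 1s is numbered with a running sum of the "new sequence" flag, and the 1s are then counted per run:
with flagged as (
    select t.*,
           case when gap_7_days = 1
                 and coalesce(lag(gap_7_days) over (partition by id order by date_start), 0) = 0
                then 1 else 0
           end as seq_start          -- 1 marks the first row of a run of 1s
    from t
), numbered as (
    select f.*,
           sum(seq_start) over (partition by id order by date_start
                                rows unbounded preceding) as seq_num
    from flagged f
)
select id, date_start, gap_7_days,
       case when gap_7_days = 1
            then sum(gap_7_days) over (partition by id, seq_num)  -- count of 1s in this run
       end as sequence_length
from numbered
order by id, date_start;
Counting the distinct seq_num values where gap_7_days = 1 per id gives the same num_sequences as the query above.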

How I can I add a count to rank null values in SQL Hive?

This is what I have right now:
| time | car_id | order | in_order |
|-------|--------|-------|----------|
| 12:31 | 32 | null | 0 |
| 12:33 | 32 | null | 0 |
| 12:35 | 32 | null | 0 |
| 12:37 | 32 | 123 | 1 |
| 12:38 | 32 | 123 | 1 |
| 12:39 | 32 | 123 | 1 |
| 12:41 | 32 | 123 | 1 |
| 12:43 | 32 | 123 | 1 |
| 12:45 | 32 | null | 0 |
| 12:47 | 32 | null | 0 |
| 12:49 | 32 | 321 | 1 |
| 12:51 | 32 | 321 | 1 |
I'm trying to rank orders, including those with null values, in this case by car_id.
This is the result I'm looking for:
| time | car_id | order | in_order | row |
|-------|--------|-------|----------|-----|
| 12:31 | 32 | null | 0 | 1 |
| 12:33 | 32 | null | 0 | 1 |
| 12:35 | 32 | null | 0 | 1 |
| 12:37 | 32 | 123 | 1 | 2 |
| 12:38 | 32 | 123 | 1 | 2 |
| 12:39 | 32 | 123 | 1 | 2 |
| 12:41 | 32 | 123 | 1 | 2 |
| 12:43 | 32 | 123 | 1 | 2 |
| 12:45 | 32 | null | 0 | 3 |
| 12:47 | 32 | null | 0 | 3 |
| 12:49 | 32 | 321 | 1 | 4 |
| 12:51 | 32 | 321 | 1 | 4 |
I just don't know how to manage a count for the null values.
Thanks!
You can count the number of non-NULL values before each row and then use dense_rank():
select t.*,
       dense_rank() over (partition by car_id order by grp) as row
from (select t.*,
             count(`order`) over (partition by car_id order by time) as grp
      from t
     ) t;

Cumulative sum of multiple window functions V3

I have this table:
id | date | player_id | score | all_games | all_wins | n_games | n_wins
============================================================================================
6747 | 2018-08-10 | 1 | 0 | 1 | | 1 |
6751 | 2018-08-10 | 1 | 0 | 2 | 0 | 2 |
6764 | 2018-08-10 | 1 | 0 | 3 | 0 | 3 |
6783 | 2018-08-10 | 1 | 0 | 4 | 0 | 4 |
6804 | 2018-08-10 | 1 | 0 | 5 | 0 | 5 |
6821 | 2018-08-10 | 1 | 0 | 6 | 0 | 6 |
6828 | 2018-08-10 | 1 | 0 | 7 | 0 | 7 |
17334 | 2018-08-23 | 1 | 0 | 8 | 0 | 8 | 0
17363 | 2018-08-23 | 1 | 0 | 9 | 0 | 9 | 0
17398 | 2018-08-23 | 1 | 0 | 10 | 0 | 10 | 0
17403 | 2018-08-23 | 1 | 0 | 11 | 0 | 11 | 0
17409 | 2018-08-23 | 1 | 0 | 12 | 0 | 12 | 0
33656 | 2018-09-13 | 1 | 0 | 13 | 0 | 13 | 0
33687 | 2018-09-13 | 1 | 0 | 14 | 0 | 14 | 0
45393 | 2018-09-27 | 1 | 0 | 15 | 0 | 15 | 0
45402 | 2018-09-27 | 1 | 0 | 16 | 0 | 16 | 0
45422 | 2018-09-27 | 1 | 1 | 17 | 0 | 17 | 0
45453 | 2018-09-27 | 1 | 0 | 18 | 1 | 18 | 0
45461 | 2018-09-27 | 1 | 0 | 19 | 1 | 19 | 0
45474 | 2018-09-27 | 1 | 0 | 20 | 1 | 20 | 0
57155 | 2018-10-11 | 1 | 0 | 21 | 1 | 21 | 1
57215 | 2018-10-11 | 1 | 0 | 22 | 1 | 22 | 1
57225 | 2018-10-11 | 1 | 0 | 23 | 1 | 23 | 1
69868 | 2018-10-25 | 1 | 0 | 24 | 1 | 24 | 1
The issue I now need to solve is that n_games should be a rolling count of the games over the last days (a user can play multiple games per day); at present it is just the same as row_number() OVER all_games.
The other issue is that the n_wins column only sums the wins of the rolling window up to the previous day, so if a user wins a couple of games early in a day, those wins are not added to n_wins until the next day.
I have an example DEMO.
I have tried this query:
SELECT id,
       date,
       player_id,
       score,
       row_number() OVER all_races AS all_games,
       sum(score) OVER all_races AS all_wins,
       row_number() OVER last_n AS n_games,
       sum(score) OVER last_n AS n_wins
FROM scores
WINDOW
    all_races AS (PARTITION BY player_id ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),
    last_n AS (PARTITION BY player_id ORDER BY date ASC RANGE BETWEEN interval '7 days' PRECEDING AND interval '1 day' PRECEDING);
Ideally I need a query that will output something like this table
id | date | player_id | score | all_games | all_wins | n_games | n_wins
============================================================================================
6747 | 2018-08-10 | 1 | 0 | 1 | | 1 |
6751 | 2018-08-10 | 1 | 0 | 2 | 0 | 2 |
6764 | 2018-08-10 | 1 | 0 | 3 | 0 | 3 |
6783 | 2018-08-10 | 1 | 0 | 4 | 0 | 4 |
6804 | 2018-08-10 | 1 | 0 | 5 | 0 | 5 |
6821 | 2018-08-10 | 1 | 0 | 6 | 0 | 6 |
6828 | 2018-08-10 | 1 | 0 | 7 | 0 | 7 |
17334 | 2018-08-23 | 1 | 0 | 8 | 0 | 1 | 0
17363 | 2018-08-23 | 1 | 0 | 9 | 0 | 2 | 0
17398 | 2018-08-23 | 1 | 0 | 10 | 0 | 3 | 0
17403 | 2018-08-23 | 1 | 0 | 11 | 0 | 4 | 0
17409 | 2018-08-23 | 1 | 0 | 12 | 0 | 5 | 0
33656 | 2018-09-13 | 1 | 1 | 13 | 1 | 6 | 0
33687 | 2018-09-13 | 1 | 0 | 14 | 1 | 7 | 1
45393 | 2018-09-27 | 1 | 0 | 15 | 1 | 1 | 1
45402 | 2018-09-27 | 1 | 0 | 16 | 1 | 2 | 1
45422 | 2018-09-27 | 1 | 1 | 17 | 1 | 3 | 1
45453 | 2018-09-27 | 1 | 0 | 18 | 2 | 4 | 2
45461 | 2018-09-27 | 1 | 0 | 19 | 2 | 5 | 2
45474 | 2018-09-27 | 1 | 0 | 20 | 2 | 6 | 1
57155 | 2018-10-11 | 1 | 0 | 21 | 2 | 7 | 1
57215 | 2018-10-11 | 1 | 0 | 22 | 2 | 1 | 1
57225 | 2018-10-11 | 1 | 0 | 23 | 2 | 2 | 1
69868 | 2018-10-25 | 1 | 0 | 24 | 2 | 3 | 1

Find the highest and lowest value locations within an interval on a column?

Given this pandas dataframe with two columns, 'Values' and 'Intervals', how do I get a third column 'MinMax' indicating whether the value is a maximum or a minimum within that interval? The challenge for me is that the interval length and the distance between intervals are not fixed, which is why I am posting the question.
import pandas as pd
import numpy as np
data = pd.DataFrame([
[1879.289,np.nan],[1879.281,np.nan],[1879.292,1],[1879.295,1],[1879.481,1],[1879.294,1],[1879.268,1],
[1879.293,1],[1879.277,1],[1879.285,1],[1879.464,1],[1879.475,1],[1879.971,1],[1879.779,1],
[1879.986,1],[1880.791,1],[1880.29,1],[1879.253,np.nan],[1878.268,np.nan],[1875.73,1],[1876.792,1],
[1875.977,1],[1876.408,1],[1877.159,1],[1877.187,1],[1883.164,1],[1883.171,1],[1883.495,1],
[1883.962,1],[1885.158,1],[1885.974,1],[1886.479,np.nan],[1885.969,np.nan],[1884.693,1],[1884.977,1],
[1884.967,1],[1884.691,1],[1886.171,1],[1886.166,np.nan],[1884.476,np.nan],[1884.66,1],[1882.962,1],
[1881.496,1],[1871.163,1],[1874.985,1],[1874.979,1],[1871.173,np.nan],[1871.973,np.nan],[1871.682,np.nan],
[1872.476,np.nan],[1882.361,1],[1880.869,1],[1882.165,1],[1881.857,1],[1880.375,1],[1880.66,1],
[1880.891,1],[1880.377,1],[1881.663,1],[1881.66,1],[1877.888,1],[1875.69,1],[1875.161,1],
[1876.697,np.nan],[1876.671,np.nan],[1879.666,np.nan],[1877.182,np.nan],[1878.898,1],[1878.668,1],[1878.871,1],
[1878.882,1],[1879.173,1],[1878.887,1],[1878.68,1],[1878.872,1],[1878.677,1],[1877.877,1],
[1877.669,1],[1877.69,1],[1877.684,1],[1877.68,1],[1877.885,1],[1877.863,1],[1877.674,1],
[1877.676,1],[1877.687,1],[1878.367,1],[1878.179,1],[1877.696,1],[1877.665,1],[1877.667,np.nan],
[1878.678,np.nan],[1878.661,1],[1878.171,1],[1877.371,1],[1877.359,1],[1878.381,1],[1875.185,1],
[1875.367,np.nan],[1865.492,np.nan],[1865.495,1],[1866.995,1],[1866.672,1],[1867.465,1],[1867.663,1],
[1867.186,1],[1867.687,1],[1867.459,1],[1867.168,1],[1869.689,1],[1869.693,1],[1871.676,1],
[1873.174,1],[1873.691,np.nan],[1873.685,np.nan]
])
In the third column below you can see where the max and min are for each interval.
+-------+----------+-----------+---------+
| index | Value | Intervals | Min/Max |
+-------+----------+-----------+---------+
| 0 | 1879.289 | np.nan | |
| 1 | 1879.281 | np.nan | |
| 2 | 1879.292 | 1 | |
| 3 | 1879.295 | 1 | |
| 4 | 1879.481 | 1 | |
| 5 | 1879.294 | 1 | |
| 6 | 1879.268 | 1 | min |
| 7 | 1879.293 | 1 | |
| 8 | 1879.277 | 1 | |
| 9 | 1879.285 | 1 | |
| 10 | 1879.464 | 1 | |
| 11 | 1879.475 | 1 | |
| 12 | 1879.971 | 1 | |
| 13 | 1879.779 | 1 | |
| 17 | 1879.986 | 1 | |
| 18 | 1880.791 | 1 | max |
| 19 | 1880.29 | 1 | |
| 55 | 1879.253 | np.nan | |
| 56 | 1878.268 | np.nan | |
| 57 | 1875.73 | 1 | |
| 58 | 1876.792 | 1 | |
| 59 | 1875.977 | 1 | min |
| 60 | 1876.408 | 1 | |
| 61 | 1877.159 | 1 | |
| 62 | 1877.187 | 1 | |
| 63 | 1883.164 | 1 | |
| 64 | 1883.171 | 1 | |
| 65 | 1883.495 | 1 | |
| 66 | 1883.962 | 1 | |
| 67 | 1885.158 | 1 | |
| 68 | 1885.974 | 1 | max |
| 69 | 1886.479 | np.nan | |
| 70 | 1885.969 | np.nan | |
| 71 | 1884.693 | 1 | |
| 72 | 1884.977 | 1 | |
| 73 | 1884.967 | 1 | |
| 74 | 1884.691 | 1 | min |
| 75 | 1886.171 | 1 | max |
| 76 | 1886.166 | np.nan | |
| 77 | 1884.476 | np.nan | |
| 78 | 1884.66 | 1 | max |
| 79 | 1882.962 | 1 | |
| 80 | 1881.496 | 1 | |
| 81 | 1871.163 | 1 | min |
| 82 | 1874.985 | 1 | |
| 83 | 1874.979 | 1 | |
| 84 | 1871.173 | np.nan | |
| 85 | 1871.973 | np.nan | |
| 86 | 1871.682 | np.nan | |
| 87 | 1872.476 | np.nan | |
| 88 | 1882.361 | 1 | max |
| 89 | 1880.869 | 1 | |
| 90 | 1882.165 | 1 | |
| 91 | 1881.857 | 1 | |
| 92 | 1880.375 | 1 | |
| 93 | 1880.66 | 1 | |
| 94 | 1880.891 | 1 | |
| 95 | 1880.377 | 1 | |
| 96 | 1881.663 | 1 | |
| 97 | 1881.66 | 1 | |
| 98 | 1877.888 | 1 | |
| 99 | 1875.69 | 1 | |
| 100 | 1875.161 | 1 | min |
| 101 | 1876.697 | np.nan | |
| 102 | 1876.671 | np.nan | |
| 103 | 1879.666 | np.nan | |
| 111 | 1877.182 | np.nan | |
| 112 | 1878.898 | 1 | |
| 113 | 1878.668 | 1 | |
| 114 | 1878.871 | 1 | |
| 115 | 1878.882 | 1 | |
| 116 | 1879.173 | 1 | max |
| 117 | 1878.887 | 1 | |
| 118 | 1878.68 | 1 | |
| 119 | 1878.872 | 1 | |
| 120 | 1878.677 | 1 | |
| 121 | 1877.877 | 1 | |
| 122 | 1877.669 | 1 | |
| 123 | 1877.69 | 1 | |
| 124 | 1877.684 | 1 | |
| 125 | 1877.68 | 1 | |
| 126 | 1877.885 | 1 | |
| 127 | 1877.863 | 1 | |
| 128 | 1877.674 | 1 | |
| 129 | 1877.676 | 1 | |
| 130 | 1877.687 | 1 | |
| 131 | 1878.367 | 1 | |
| 132 | 1878.179 | 1 | |
| 133 | 1877.696 | 1 | |
| 134 | 1877.665 | 1 | min |
| 135 | 1877.667 | np.nan | |
| 136 | 1878.678 | np.nan | |
| 137 | 1878.661 | 1 | max |
| 138 | 1878.171 | 1 | |
| 139 | 1877.371 | 1 | |
| 140 | 1877.359 | 1 | |
| 141 | 1878.381 | 1 | |
| 142 | 1875.185 | 1 | min |
| 143 | 1875.367 | np.nan | |
| 144 | 1865.492 | np.nan | |
| 145 | 1865.495 | 1 | max |
| 146 | 1866.995 | 1 | |
| 147 | 1866.672 | 1 | |
| 148 | 1867.465 | 1 | |
| 149 | 1867.663 | 1 | |
| 150 | 1867.186 | 1 | |
| 151 | 1867.687 | 1 | |
| 152 | 1867.459 | 1 | |
| 153 | 1867.168 | 1 | |
| 154 | 1869.689 | 1 | |
| 155 | 1869.693 | 1 | |
| 156 | 1871.676 | 1 | |
| 157 | 1873.174 | 1 | min |
| 158 | 1873.691 | np.nan | |
| 159 | 1873.685 | np.nan | |
+-------+----------+-----------+---------+
# Rows where the interval column is null mark the boundaries between intervals.
isnull = data.iloc[:, 1].isnull()
# cumsum() over the null flags gives each non-null run its own group number;
# within each group, find the index positions of the max and min value.
minmax = data.groupby(isnull.cumsum()[~isnull])[0].agg(['idxmax', 'idxmin'])
data.loc[minmax['idxmax'], 'MinMax'] = 'max'
data.loc[minmax['idxmin'], 'MinMax'] = 'min'
data.MinMax = data.MinMax.fillna('')
print(data)
0 1 MinMax
0 1879.289 NaN
1 1879.281 NaN
2 1879.292 1.0
3 1879.295 1.0
4 1879.481 1.0
5 1879.294 1.0
6 1879.268 1.0 min
7 1879.293 1.0
8 1879.277 1.0
9 1879.285 1.0
10 1879.464 1.0
11 1879.475 1.0
12 1879.971 1.0
13 1879.779 1.0
14 1879.986 1.0
15 1880.791 1.0 max
16 1880.290 1.0
17 1879.253 NaN
18 1878.268 NaN
19 1875.730 1.0 min
20 1876.792 1.0
21 1875.977 1.0
22 1876.408 1.0
23 1877.159 1.0
24 1877.187 1.0
25 1883.164 1.0
26 1883.171 1.0
27 1883.495 1.0
28 1883.962 1.0
29 1885.158 1.0
.. ... ... ...
85 1877.687 1.0
86 1878.367 1.0
87 1878.179 1.0
88 1877.696 1.0
89 1877.665 1.0 min
90 1877.667 NaN
91 1878.678 NaN
92 1878.661 1.0 max
93 1878.171 1.0
94 1877.371 1.0
95 1877.359 1.0
96 1878.381 1.0
97 1875.185 1.0 min
98 1875.367 NaN
99 1865.492 NaN
100 1865.495 1.0 min
101 1866.995 1.0
102 1866.672 1.0
103 1867.465 1.0
104 1867.663 1.0
105 1867.186 1.0
106 1867.687 1.0
107 1867.459 1.0
108 1867.168 1.0
109 1869.689 1.0
110 1869.693 1.0
111 1871.676 1.0
112 1873.174 1.0 max
113 1873.691 NaN
114 1873.685 NaN
[115 rows x 3 columns]
data.columns=['Value','Interval']
data['Ingroup'] = (data['Interval'].notnull() + 0)
Use data['Interval'].notnull() to separate the groups...
Use cumsum() to number them with `groupno`...
Use groupby(groupno)..
Finally you want something using apply/idxmax/idxmin to label the max/min
But of course a for-loop as you suggested is the non-Pythonic but possibly simpler hack.

SQL Performance multiple exclusion from the same table

I have a table with a list of people; let's say I have 100 people listed in that table.
I need to filter the people using different criteria and put them in groups. The problem is that when I start excluding at the 4th-5th level, performance issues come up and it becomes slow.
with lst_tous_movements as (
    select
        t1.refid_eClinibase
        ,t1.[dthrfinmouvement]
        ,t1.[unite_service_id]
        ,t1.[unite_service_suiv_id]
    from sometable t1
)
,lst_patients_hospitalisés as (
    select distinct
        t1.refid_eClinibase
    from lst_tous_movements t1
    where
        t1.[dthrfinmouvement] = '4000-01-01'
)
,lst_patients_admisUIB_transferes as (
    select distinct
        t1.refid_eClinibase
    from lst_tous_movements t1
        left join lst_patients_hospitalisés t2 on t1.refid_eClinibase = t2.refid_eClinibase
    where
        t1.[unite_service_id] = 4
        and t1.[unite_service_suiv_id] <> 0
        and t2.refid_eClinibase is null
)
,lst_patients_admisUIB_nonTransferes as (
    select distinct
        t1.refid_eClinibase
    from lst_tous_movements t1
        left join lst_patients_admisUIB_transferes t2 on t1.refid_eClinibase = t2.refid_eClinibase
        left join lst_patients_hospitalisés t3 on t1.refid_eClinibase = t3.refid_eClinibase
    where
        t1.[unite_service_id] = 4
        and t1.[unite_service_suiv_id] = 0
        and t2.refid_eClinibase is null
        and t3.refid_eClinibase is null
)
,lst_patients_autres as (
    select distinct
        t1.refid_eClinibase
    from lst_patients t1
        left join lst_patients_admisUIB_transferes t2 on t1.refid_eClinibase = t2.refid_eClinibase
        left join lst_patients_hospitalisés t3 on t1.refid_eClinibase = t3.refid_eClinibase
        left join lst_patients_admisUIB_nonTransferes t4 on t1.refid_eClinibase = t4.refid_eClinibase
    where
        t2.refid_eClinibase is null
        and t3.refid_eClinibase is null
        and t4.refid_eClinibase is null
)
As you can see, I have multi-level filtering going on here...
1st I get the people where t1.[dthrfinmouvement] = '4000-01-01'
2nd I get the people matching another criterion, EXCLUDING the 1st group
3rd I get the people matching yet another criterion, EXCLUDING the 1st and 2nd groups
etc.
When I get to the 4th level, my query takes 6-10 seconds to complete.
Is there any way to speed this up?
This is the dataset I'm working with:
+------------------+-------------------------------+------------------+------------------+-----------------------+
| refid_eClinibase | nodossierpermanent_eClinibase | dthrfinmouvement | unite_service_id | unite_service_suiv_id |
+------------------+-------------------------------+------------------+------------------+-----------------------+
| 25611 | P0017379 | 2013-04-27 | 58 | 0 |
| 25611 | P0017379 | 2013-05-02 | 4 | 2 |
| 25611 | P0017379 | 2013-05-18 | 2 | 0 |
| 85886 | P0077918 | 2013-04-10 | 58 | 0 |
| 85886 | P0077918 | 2013-05-06 | 6 | 12 |
| 85886 | P0077918 | 4000-01-01 | 12 | 0 |
| 91312 | P0083352 | 2013-07-24 | 3 | 14 |
| 91312 | P0083352 | 2013-07-24 | 14 | 3 |
| 91312 | P0083352 | 2013-07-30 | 3 | 8 |
| 91312 | P0083352 | 4000-01-01 | 8 | 0 |
| 93835 | P0085879 | 2013-04-30 | 58 | 0 |
| 93835 | P0085879 | 2013-05-07 | 4 | 2 |
| 93835 | P0085879 | 2013-05-16 | 2 | 0 |
| 93835 | P0085879 | 2013-05-22 | 58 | 0 |
| 93835 | P0085879 | 2013-05-24 | 4 | 0 |
| 93835 | P0085879 | 2013-05-31 | 58 | 0 |
| 93836 | P0085880 | 2013-05-20 | 58 | 0 |
| 93836 | P0085880 | 2013-05-22 | 4 | 2 |
| 93836 | P0085880 | 2013-05-31 | 2 | 0 |
| 97509 | P0089576 | 2013-04-09 | 58 | 0 |
| 97509 | P0089576 | 2013-04-11 | 4 | 0 |
| 102787 | P0094886 | 2013-04-08 | 58 | 0 |
| 102787 | P0094886 | 2013-04-11 | 4 | 2 |
| 102787 | P0094886 | 2013-05-21 | 2 | 0 |
| 103029 | P0095128 | 2013-04-04 | 58 | 0 |
| 103029 | P0095128 | 2013-04-10 | 4 | 1 |
| 103029 | P0095128 | 2013-05-03 | 1 | 0 |
| 103813 | P0095922 | 2013-07-02 | 58 | 0 |
| 103813 | P0095922 | 2013-07-03 | 4 | 6 |
| 103813 | P0095922 | 2013-08-14 | 6 | 0 |
| 105106 | P0097215 | 2013-08-09 | 58 | 0 |
| 105106 | P0097215 | 2013-08-13 | 4 | 0 |
| 105106 | P0097215 | 2013-08-14 | 58 | 0 |
| 105106 | P0097215 | 4000-01-01 | 4 | 0 |
| 106223 | P0098332 | 2013-06-11 | 1 | 0 |
| 106223 | P0098332 | 2013-08-01 | 58 | 0 |
| 106223 | P0098332 | 4000-01-01 | 1 | 0 |
| 106245 | P0098354 | 2013-04-02 | 58 | 0 |
| 106245 | P0098354 | 2013-05-24 | 58 | 0 |
| 106245 | P0098354 | 2013-05-29 | 4 | 1 |
| 106245 | P0098354 | 2013-07-12 | 1 | 0 |
| 106280 | P0098389 | 2013-04-07 | 58 | 0 |
| 106280 | P0098389 | 2013-04-09 | 4 | 0 |
| 106416 | P0098525 | 2013-04-19 | 58 | 0 |
| 106416 | P0098525 | 2013-04-23 | 4 | 0 |
| 106444 | P0098553 | 2013-04-22 | 58 | 0 |
| 106444 | P0098553 | 2013-04-25 | 4 | 0 |
| 106609 | P0098718 | 2013-05-08 | 58 | 0 |
| 106609 | P0098718 | 2013-05-10 | 4 | 11 |
| 106609 | P0098718 | 2013-07-24 | 11 | 12 |
| 106609 | P0098718 | 4000-01-01 | 12 | 0 |
| 106616 | P0098725 | 2013-05-09 | 58 | 0 |
| 106616 | P0098725 | 2013-05-09 | 4 | 1 |
| 106616 | P0098725 | 2013-07-27 | 1 | 0 |
| 106698 | P0098807 | 2013-05-16 | 58 | 0 |
| 106698 | P0098807 | 2013-05-22 | 4 | 6 |
| 106698 | P0098807 | 2013-06-14 | 6 | 1 |
| 106698 | P0098807 | 2013-06-28 | 1 | 0 |
| 106714 | P0098823 | 2013-05-20 | 58 | 0 |
| 106714 | P0098823 | 2013-05-21 | 58 | 0 |
| 106714 | P0098823 | 2013-05-24 | 58 | 0 |
| 106729 | P0098838 | 2013-05-21 | 58 | 0 |
| 106729 | P0098838 | 2013-05-23 | 4 | 1 |
| 106729 | P0098838 | 2013-06-03 | 1 | 0 |
| 107038 | P0099147 | 2013-06-25 | 58 | 0 |
| 107038 | P0099147 | 2013-06-28 | 4 | 1 |
| 107038 | P0099147 | 2013-07-04 | 1 | 0 |
| 107038 | P0099147 | 2013-08-13 | 58 | 0 |
| 107038 | P0099147 | 2013-08-15 | 4 | 6 |
| 107038 | P0099147 | 4000-01-01 | 6 | 0 |
| 107082 | P0099191 | 2013-06-29 | 58 | 0 |
| 107082 | P0099191 | 2013-07-04 | 4 | 6 |
| 107082 | P0099191 | 2013-07-19 | 6 | 0 |
| 107157 | P0099267 | 4000-01-01 | 13 | 0 |
| 107336 | P0099446 | 4000-01-01 | 6 | 0 |
+------------------+-------------------------------+------------------+------------------+-----------------------+
thanks.
It is hard to understand exactly what all your rules are from the question, but the general approach should be to add a "Grouping" column in a single query that uses a CASE statement to categorize the people.
The conditions in a CASE are evaluated in order, so if the first criterion is met, the subsequent ones are not even evaluated for that row.
Here is some code to get you started....
select t1.refid_eClinibase
       ,t1.[dthrfinmouvement]
       ,t1.[unite_service_id]
       ,t1.[unite_service_suiv_id]
       ,CASE WHEN [dthrfinmouvement] = '4000-01-01' THEN 'Group1 Label'
             WHEN condition2 = something THEN 'Group2 Label'
             ....
             WHEN conditionN = something THEN 'GroupN Label'
             ELSE 'Catch All Label'
        END as person_category
from sometable t1
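To make that concrete for the groups in the question (purely a sketch; the group rules and labels below are inferred from the CTE names above and may not match the real business rules), the same evaluate-in-order idea can be applied per person with conditional aggregation, so the whole classification happens in a single pass over sometable instead of four levels of anti-joins:
-- Sketch: one row per person; the first matching WHEN wins.
select t1.refid_eClinibase
       ,CASE WHEN max(CASE WHEN t1.[dthrfinmouvement] = '4000-01-01' THEN 1 ELSE 0 END) = 1
                THEN 'hospitalisés'
             WHEN max(CASE WHEN t1.[unite_service_id] = 4
                            AND t1.[unite_service_suiv_id] <> 0 THEN 1 ELSE 0 END) = 1
                THEN 'admisUIB_transferes'
             WHEN max(CASE WHEN t1.[unite_service_id] = 4
                            AND t1.[unite_service_suiv_id] = 0 THEN 1 ELSE 0 END) = 1
                THEN 'admisUIB_nonTransferes'
             ELSE 'autres'
        END as person_category
from sometable t1
group by t1.refid_eClinibase;
Because each person is evaluated once and the WHEN branches are checked in order, the earlier groups automatically exclude people from the later ones, which is what the chained LEFT JOIN ... IS NULL conditions were doing.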