Find the highest and lowest value locations within an interval on a column?

Find the highest and lowest value locations within an interval on a column? - pandas

Given this pandas dataframe with two columns, 'Values' and 'Intervals'. How do I get a third column 'MinMax' indicating whether the value is a maximum or a minimum within that interval? The challenge for me is that the interval length and the distance between intervals are not fixed, therefore I post the question.
import pandas as pd
import numpy as np
data = pd.DataFrame([
[1879.289,np.nan],[1879.281,np.nan],[1879.292,1],[1879.295,1],[1879.481,1],[1879.294,1],[1879.268,1],
[1879.293,1],[1879.277,1],[1879.285,1],[1879.464,1],[1879.475,1],[1879.971,1],[1879.779,1],
[1879.986,1],[1880.791,1],[1880.29,1],[1879.253,np.nan],[1878.268,np.nan],[1875.73,1],[1876.792,1],
[1875.977,1],[1876.408,1],[1877.159,1],[1877.187,1],[1883.164,1],[1883.171,1],[1883.495,1],
[1883.962,1],[1885.158,1],[1885.974,1],[1886.479,np.nan],[1885.969,np.nan],[1884.693,1],[1884.977,1],
[1884.967,1],[1884.691,1],[1886.171,1],[1886.166,np.nan],[1884.476,np.nan],[1884.66,1],[1882.962,1],
[1881.496,1],[1871.163,1],[1874.985,1],[1874.979,1],[1871.173,np.nan],[1871.973,np.nan],[1871.682,np.nan],
[1872.476,np.nan],[1882.361,1],[1880.869,1],[1882.165,1],[1881.857,1],[1880.375,1],[1880.66,1],
[1880.891,1],[1880.377,1],[1881.663,1],[1881.66,1],[1877.888,1],[1875.69,1],[1875.161,1],
[1876.697,np.nan],[1876.671,np.nan],[1879.666,np.nan],[1877.182,np.nan],[1878.898,1],[1878.668,1],[1878.871,1],
[1878.882,1],[1879.173,1],[1878.887,1],[1878.68,1],[1878.872,1],[1878.677,1],[1877.877,1],
[1877.669,1],[1877.69,1],[1877.684,1],[1877.68,1],[1877.885,1],[1877.863,1],[1877.674,1],
[1877.676,1],[1877.687,1],[1878.367,1],[1878.179,1],[1877.696,1],[1877.665,1],[1877.667,np.nan],
[1878.678,np.nan],[1878.661,1],[1878.171,1],[1877.371,1],[1877.359,1],[1878.381,1],[1875.185,1],
[1875.367,np.nan],[1865.492,np.nan],[1865.495,1],[1866.995,1],[1866.672,1],[1867.465,1],[1867.663,1],
[1867.186,1],[1867.687,1],[1867.459,1],[1867.168,1],[1869.689,1],[1869.693,1],[1871.676,1],
[1873.174,1],[1873.691,np.nan],[1873.685,np.nan]
])
In the third column below you can see where the max and min is for each interval.
+-------+----------+-----------+---------+
| index | Value | Intervals | Min/Max |
+-------+----------+-----------+---------+
| 0 | 1879.289 | np.nan | |
| 1 | 1879.281 | np.nan | |
| 2 | 1879.292 | 1 | |
| 3 | 1879.295 | 1 | |
| 4 | 1879.481 | 1 | |
| 5 | 1879.294 | 1 | |
| 6 | 1879.268 | 1 | min |
| 7 | 1879.293 | 1 | |
| 8 | 1879.277 | 1 | |
| 9 | 1879.285 | 1 | |
| 10 | 1879.464 | 1 | |
| 11 | 1879.475 | 1 | |
| 12 | 1879.971 | 1 | |
| 13 | 1879.779 | 1 | |
| 17 | 1879.986 | 1 | |
| 18 | 1880.791 | 1 | max |
| 19 | 1880.29 | 1 | |
| 55 | 1879.253 | np.nan | |
| 56 | 1878.268 | np.nan | |
| 57 | 1875.73 | 1 | |
| 58 | 1876.792 | 1 | |
| 59 | 1875.977 | 1 | min |
| 60 | 1876.408 | 1 | |
| 61 | 1877.159 | 1 | |
| 62 | 1877.187 | 1 | |
| 63 | 1883.164 | 1 | |
| 64 | 1883.171 | 1 | |
| 65 | 1883.495 | 1 | |
| 66 | 1883.962 | 1 | |
| 67 | 1885.158 | 1 | |
| 68 | 1885.974 | 1 | max |
| 69 | 1886.479 | np.nan | |
| 70 | 1885.969 | np.nan | |
| 71 | 1884.693 | 1 | |
| 72 | 1884.977 | 1 | |
| 73 | 1884.967 | 1 | |
| 74 | 1884.691 | 1 | min |
| 75 | 1886.171 | 1 | max |
| 76 | 1886.166 | np.nan | |
| 77 | 1884.476 | np.nan | |
| 78 | 1884.66 | 1 | max |
| 79 | 1882.962 | 1 | |
| 80 | 1881.496 | 1 | |
| 81 | 1871.163 | 1 | min |
| 82 | 1874.985 | 1 | |
| 83 | 1874.979 | 1 | |
| 84 | 1871.173 | np.nan | |
| 85 | 1871.973 | np.nan | |
| 86 | 1871.682 | np.nan | |
| 87 | 1872.476 | np.nan | |
| 88 | 1882.361 | 1 | max |
| 89 | 1880.869 | 1 | |
| 90 | 1882.165 | 1 | |
| 91 | 1881.857 | 1 | |
| 92 | 1880.375 | 1 | |
| 93 | 1880.66 | 1 | |
| 94 | 1880.891 | 1 | |
| 95 | 1880.377 | 1 | |
| 96 | 1881.663 | 1 | |
| 97 | 1881.66 | 1 | |
| 98 | 1877.888 | 1 | |
| 99 | 1875.69 | 1 | |
| 100 | 1875.161 | 1 | min |
| 101 | 1876.697 | np.nan | |
| 102 | 1876.671 | np.nan | |
| 103 | 1879.666 | np.nan | |
| 111 | 1877.182 | np.nan | |
| 112 | 1878.898 | 1 | |
| 113 | 1878.668 | 1 | |
| 114 | 1878.871 | 1 | |
| 115 | 1878.882 | 1 | |
| 116 | 1879.173 | 1 | max |
| 117 | 1878.887 | 1 | |
| 118 | 1878.68 | 1 | |
| 119 | 1878.872 | 1 | |
| 120 | 1878.677 | 1 | |
| 121 | 1877.877 | 1 | |
| 122 | 1877.669 | 1 | |
| 123 | 1877.69 | 1 | |
| 124 | 1877.684 | 1 | |
| 125 | 1877.68 | 1 | |
| 126 | 1877.885 | 1 | |
| 127 | 1877.863 | 1 | |
| 128 | 1877.674 | 1 | |
| 129 | 1877.676 | 1 | |
| 130 | 1877.687 | 1 | |
| 131 | 1878.367 | 1 | |
| 132 | 1878.179 | 1 | |
| 133 | 1877.696 | 1 | |
| 134 | 1877.665 | 1 | min |
| 135 | 1877.667 | np.nan | |
| 136 | 1878.678 | np.nan | |
| 137 | 1878.661 | 1 | max |
| 138 | 1878.171 | 1 | |
| 139 | 1877.371 | 1 | |
| 140 | 1877.359 | 1 | |
| 141 | 1878.381 | 1 | |
| 142 | 1875.185 | 1 | min |
| 143 | 1875.367 | np.nan | |
| 144 | 1865.492 | np.nan | |
| 145 | 1865.495 | 1 | max |
| 146 | 1866.995 | 1 | |
| 147 | 1866.672 | 1 | |
| 148 | 1867.465 | 1 | |
| 149 | 1867.663 | 1 | |
| 150 | 1867.186 | 1 | |
| 151 | 1867.687 | 1 | |
| 152 | 1867.459 | 1 | |
| 153 | 1867.168 | 1 | |
| 154 | 1869.689 | 1 | |
| 155 | 1869.693 | 1 | |
| 156 | 1871.676 | 1 | |
| 157 | 1873.174 | 1 | min |
| 158 | 1873.691 | np.nan | |
| 159 | 1873.685 | np.nan | |
+-------+----------+-----------+---------+

isnull = data.iloc[:, 1].isnull()
minmax = data.groupby(isnull.cumsum()[~isnull])[0].agg(['idxmax', 'idxmin'])
data.loc[minmax['idxmax'], 'MinMax'] = 'max'
data.loc[minmax['idxmin'], 'MinMax'] = 'min'
data.MinMax = data.MinMax.fillna('')
print(data)
0 1 MinMax
0 1879.289 NaN
1 1879.281 NaN
2 1879.292 1.0
3 1879.295 1.0
4 1879.481 1.0
5 1879.294 1.0
6 1879.268 1.0 min
7 1879.293 1.0
8 1879.277 1.0
9 1879.285 1.0
10 1879.464 1.0
11 1879.475 1.0
12 1879.971 1.0
13 1879.779 1.0
14 1879.986 1.0
15 1880.791 1.0 max
16 1880.290 1.0
17 1879.253 NaN
18 1878.268 NaN
19 1875.730 1.0 min
20 1876.792 1.0
21 1875.977 1.0
22 1876.408 1.0
23 1877.159 1.0
24 1877.187 1.0
25 1883.164 1.0
26 1883.171 1.0
27 1883.495 1.0
28 1883.962 1.0
29 1885.158 1.0
.. ... ... ...
85 1877.687 1.0
86 1878.367 1.0
87 1878.179 1.0
88 1877.696 1.0
89 1877.665 1.0 min
90 1877.667 NaN
91 1878.678 NaN
92 1878.661 1.0 max
93 1878.171 1.0
94 1877.371 1.0
95 1877.359 1.0
96 1878.381 1.0
97 1875.185 1.0 min
98 1875.367 NaN
99 1865.492 NaN
100 1865.495 1.0 min
101 1866.995 1.0
102 1866.672 1.0
103 1867.465 1.0
104 1867.663 1.0
105 1867.186 1.0
106 1867.687 1.0
107 1867.459 1.0
108 1867.168 1.0
109 1869.689 1.0
110 1869.693 1.0
111 1871.676 1.0
112 1873.174 1.0 max
113 1873.691 NaN
114 1873.685 NaN
[115 rows x 3 columns]

data.columns=['Value','Interval']
data['Ingroup'] = (data['Interval'].notnull() + 0)
Use data['Interval'].notnull() to separate the groups...
Use cumsum() to number them with `groupno`...
Use groupby(groupno)..
Finally you want something using apply/idxmax/idxmin to label the max/min
But of course a for-loop as you suggested is the non-Pythonic but possibly simpler hack.

Related

Pandas - Grouping Rows With Same Value in Dataframe

Here is the dataframe in question:
|City|District|Population| Code | ID |
| A | 4 | 2000 | 3 | 21 |
| A | 8 | 7000 | 3 | 21 |
| A | 38 | 3000 | 3 | 21 |
| A | 7 | 2000 | 3 | 21 |
| B | 34 | 3000 | 6 | 84 |
| B | 9 | 5000 | 6 | 84 |
| C | 4 | 9000 | 1 | 28 |
| C | 21 | 1000 | 1 | 28 |
| C | 32 | 5000 | 1 | 28 |
| C | 46 | 20 | 1 | 28 |
I want to regroup the population counts by city to have this kind of output:
|City|Population| Code | ID |
| A | 14000 | 3 | 21 |
| B | 8000 | 6 | 84 |
| C | 15020 | 1 | 28 |

df = df.groupby(['City', 'Code', 'ID'])['Population'].sum()
You can make a group by 'City', 'Code' and 'ID then make sum of 'population'.

How to make sql hive when i have input this?

input:
| a.user_id | a_stream_length | b_stream_length | subtract_inactive |
-----------------------------------------------------------------------------
| a | 11 | 1686 | 22 |
| a | 1686 | 328 | 12 |
| a | 328 | 732 | 22 |
| a | 732 | 11 | 1699 |
| a | 11 | 2123 | 18 |
| a | 2123 | 160 | 2 |
| a | 160 | 1358 | 0 |
| a | 1358 | 129 | 1 |
| a | 129 | 4042 | 109334 |
output:
| a | (1686+11+328+732) (if subtract_inactive < 1000) |
| a | 732(a_stream_length) if subtract_inactive > 1000) |

select all rows that match criteria if not get a random one

+----+---------------+--------------------+------------+----------+-----------------+
| id | restaurant_id | filename | is_profile | priority | show_in_profile |
+----+---------------+--------------------+------------+----------+-----------------+
| 40 | 20 | 1320849687_390.jpg | | | 1 |
| 60 | 24 | 1320853501_121.png | 1 | | 1 |
| 61 | 24 | 1320853504_847.png | | | 1 |
| 62 | 24 | 1320853505_732.png | | | 1 |
| 63 | 24 | 1320853505_865.png | | | 1 |
| 64 | 29 | 1320854617_311.png | 1 | | 1 |
| 65 | 29 | 1320854617_669.png | | | 1 |
| 66 | 29 | 1320854618_636.png | | | 1 |
| 67 | 29 | 1320854619_791.png | | | 1 |
| 74 | 154 | 1320922653_259.png | | | 1 |
| 76 | 154 | 1320922656_332.png | | | 1 |
| 77 | 154 | 1320922657_106.png | | | 1 |
| 84 | 130 | 1321269380_960.jpg | 1 | | 1 |
| 85 | 130 | 1321269383_555.jpg | | | 1 |
| 86 | 130 | 1321269384_251.jpg | | | 1 |
| 89 | 28 | 1321269714_303.jpg | | | 1 |
| 90 | 28 | 1321269716_938.jpg | 1 | | 1 |
| 91 | 28 | 1321269717_147.jpg | | | 1 |
| 92 | 28 | 1321269717_774.jpg | | | 1 |
| 93 | 28 | 1321269717_250.jpg | | | 1 |
| 94 | 28 | 1321269718_964.jpg | | | 1 |
| 95 | 28 | 1321269719_830.jpg | | | 1 |
| 96 | 43 | 1321270013_629.jpg | 1 | | 1 |
+----+---------------+--------------------+------------+----------+-----------------+
I have this table and I want to select the filename for a given list of restaurants ids.
For example for 24,29,154:
+----+---------------
| filename |
+----+---------------
1320853501_121.png (has is_profile 1)
1320854617_311.png (has is_profile 1)
1320922653_259.png (chosen as profile picture because restaurant doesn't have a profile pic but has pictures)
I tried group by and case statements but I got nowhere.Also if you use group by it should be a full group by.

You can do this with aggregation and some logic:
select restaurant_id,
coalesce(max(case when is_profile = 1 then filename end),
max(filename)
) as filename
from t
where restaurant_id in (24, 29, 154)
group by restaurant_id;
First look for the/a profile filename. Next just choose an arbitrary one.

Get rows from matches with different teamId than playing user

So what I am trying to do:
I want to get every row with the enemy champion (different teamId) for every match the summoner with the id 42999456 has played.
For example:
summonerId: 42999456
matchId: 2528256239
will return
+------------+------------+------------+--------+--------+
| matchId | summonerId | championId | teamId | winner |
+------------+------------+------------+--------+--------+
| 2528256239 | 23364213 | 412 | 200 | 0 |
| 2528256239 | 34928637 | 429 | 200 | 0 |
| 2528256239 | 40909308 | 4 | 200 | 0 |
| 2528256239 | 50717471 | 122 | 200 | 0 |
| 2528256239 | 52439549 | 60 | 200 | 0 |
(note that I do not want to query for only 1 matchId, I want to query for all matches).
The table looks like that:
+------------+------------+------------+--------+--------+
| matchId | summonerId | championId | teamId | winner |
+------------+------------+------------+--------+--------+
| 2528256239 | 23364213 | 412 | 200 | 0 |
| 2528256239 | 23949601 | 32 | 100 | 1 |
| 2528256239 | 30032566 | 127 | 100 | 1 |
| 2528256239 | 34519064 | 236 | 100 | 1 |
| 2528256239 | 34928637 | 429 | 200 | 0 |
| 2528256239 | 35157572 | 91 | 100 | 1 |
| 2528256239 | 40909308 | 4 | 200 | 0 |
| 2528256239 | 42999456 | 201 | 100 | 1 |
| 2528256239 | 50717471 | 122 | 200 | 0 |
| 2528256239 | 52439549 | 60 | 200 | 0 |
| 2528415543 | 26264559 | 236 | 100 | 1 |
| 2528415543 | 30032566 | 79 | 100 | 1 |
| 2528415543 | 30066298 | 203 | 200 | 0 |
| 2528415543 | 30144484 | 201 | 200 | 0 |
| 2528415543 | 30420315 | 81 | 200 | 0 |
| 2528415543 | 35666995 | 238 | 100 | 1 |
| 2528415543 | 42999456 | 117 | 100 | 1 |
| 2528415543 | 55777006 | 75 | 100 | 1 |
| 2528415543 | 71020371 | 114 | 200 | 0 |
| 2528415543 | 75067455 | 1 | 200 | 0 |
| 2528508209 | 20859869 | 41 | 200 | 1 |
| 2528508209 | 20926263 | 51 | 100 | 0 |
| 2528508209 | 21489056 | 81 | 200 | 1 |
| 2528508209 | 30032566 | 62 | 100 | 0 |
| 2528508209 | 31429371 | 19 | 200 | 1 |
| 2528508209 | 34198484 | 103 | 100 | 0 |
| 2528508209 | 39520185 | 32 | 100 | 0 |
| 2528508209 | 42954909 | 201 | 200 | 1 |
| 2528508209 | 42999456 | 40 | 100 | 0 |
| 2528508209 | 44449359 | 236 | 200 | 1 |
| 2528567430 | 22896699 | 64 | 100 | 0 |
| 2528567430 | 27716534 | 90 | 200 | 1 |
| 2528567430 | 30032566 | 157 | 200 | 1 |
| 2528567430 | 30161338 | 12 | 100 | 0 |
| 2528567430 | 33288363 | 30 | 100 | 0 |
| 2528567430 | 38554025 | 81 | 100 | 0 |
| 2528567430 | 40124474 | 62 | 200 | 1 |
| 2528567430 | 42999456 | 26 | 200 | 1 |
| 2528567430 | 61287205 | 92 | 100 | 0 |
| 2528567430 | 69117699 | 104 | 200 | 1 |
| 2528778889 | 19128606 | 102 | 100 | 0 |
| 2528778889 | 21226478 | 16 | 100 | 0 |
| 2528778889 | 24671894 | 74 | 100 | 0 |
| 2528778889 | 30032566 | 31 | 200 | 1 |
| 2528778889 | 42728001 | 157 | 200 | 1 |
| 2528778889 | 42999456 | 201 | 200 | 1 |
| 2528778889 | 43160768 | 236 | 200 | 1 |
| 2528778889 | 44918136 | 55 | 100 | 0 |
| 2528778889 | 52104644 | 51 | 100 | 0 |
| 2528778889 | 52420228 | 24 | 200 | 1 |
| 2529734554 | 19611148 | 412 | 100 | 1 |
| 2529734554 | 27427187 | 420 | 100 | 1 |
| 2529734554 | 27926072 | 117 | 200 | 0 |
| 2529734554 | 30032566 | 77 | 200 | 0 |
| 2529734554 | 33899765 | 4 | 200 | 0 |
| 2529734554 | 40093026 | 245 | 100 | 1 |
| 2529734554 | 42999456 | 40 | 200 | 0 |
| 2529734554 | 44385431 | 67 | 200 | 0 |
| 2529734554 | 49240203 | 81 | 100 | 1 |
| 2529734554 | 80637139 | 25 | 100 | 1 |
| 2529747312 | 19648659 | 90 | 200 | 1 |
| 2529747312 | 30032566 | 114 | 200 | 1 |
| 2529747312 | 33120079 | 67 | 200 | 1 |
| 2529747312 | 35688371 | 32 | 100 | 0 |
| 2529747312 | 35817488 | 106 | 200 | 1 |
| 2529747312 | 36068030 | 41 | 100 | 0 |
| 2529747312 | 40406867 | 412 | 100 | 0 |
| 2529747312 | 42999456 | 44 | 200 | 1 |
| 2529747312 | 43212358 | 236 | 100 | 0 |
| 2529747312 | 52238049 | 8 | 100 | 0 |
| 2529802806 | 20929372 | 105 | 200 | 1 |
| 2529802806 | 24507439 | 236 | 200 | 1 |
| 2529802806 | 24849750 | 238 | 100 | 0 |
| 2529802806 | 28026768 | 117 | 100 | 0 |
| 2529802806 | 30032566 | 223 | 200 | 1 |
| 2529802806 | 31689726 | 67 | 100 | 0 |
| 2529802806 | 35685814 | 92 | 100 | 0 |
| 2529802806 | 40621123 | 254 | 100 | 0 |
| 2529802806 | 42999456 | 40 | 200 | 1 |
| 2529802806 | 56868633 | 64 | 200 | 1 |
| 2530087947 | 108807 | 16 | 200 | 1 |
| 2530087947 | 19409641 | 84 | 100 | 0 |
| 2530087947 | 21422420 | 81 | 100 | 0 |
| 2530087947 | 23851356 | 112 | 200 | 1 |
| 2530087947 | 25847381 | 96 | 200 | 1 |
| 2530087947 | 27575895 | 11 | 200 | 1 |
| 2530087947 | 39058809 | 64 | 100 | 0 |
| 2530087947 | 39409025 | 61 | 100 | 0 |
| 2530087947 | 42999456 | 44 | 100 | 0 |
| 2530087947 | 54220113 | 41 | 200 | 1 |
| 2537795256 | 19118675 | 40 | 200 | 1 |
| 2537795256 | 20071645 | 42 | 200 | 1 |
| 2537795256 | 29826523 | 11 | 100 | 0 |
| 2537795256 | 30032566 | 15 | 100 | 0 |
| 2537795256 | 37639463 | 31 | 100 | 0 |
| 2537795256 | 37741313 | 245 | 200 | 1 |
| 2537795256 | 42999456 | 117 | 100 | 0 |
| 2537795256 | 46537422 | 80 | 200 | 1 |
| 2537795256 | 71466951 | 238 | 200 | 1 |
| 2537795256 | 76797025 | 75 | 100 | 0 |

A way of doing it is with NOT EXISTS:
SELECT matchId, summonerId, championId, teamId, winner
FROM mytable AS t1
WHERE matchId = 2528256239 AND
NOT EXISTS (SELECT 1
FROM mytable AS t2
WHERE t2.summonerId = 42999456 AND
t1.matchId = t2.matchId AND
t1.teamId = t2.teamId)
Demo here
You can also use a LEFT JOIN:
SELECT t1.matchId, t1.summonerId, t1.championId, t1.teamId, t1.winner
FROM mytable AS t1
LEFT JOIN mytable AS t2
ON t1.matchId = t2.matchId AND t1.teamId = t2.teamId AND t2.summonerId = 42999456
WHERE t1.matchId = 2528256239 AND t2.matchId IS NULL
Demo here

SQL Performance multiple exclusion from the same table

I have a table where I have a list of people, lets say i have 100 people listed in that table
I need to filter out the people using different criteria's and put them in groups, problem is when i start excluding on the 4th-5th level, performance issues come up and it becomes slow
with lst_tous_movements as (
select
t1.refid_eClinibase
t1.[dthrfinmouvement]
t1.[unite_service_id]
t1.[unite_service_suiv_id]
from sometable t1
)
,lst_patients_hospitalisés as (
select distinct
t1.refid_eClinibase
from lst_tous_movements t1
where
t1.[dthrfinmouvement] = '4000-01-01'
)
,lst_patients_admisUIB_transferes as (
select distinct
t1.refid_eClinibase
from lst_tous_movements t1
left join lst_patients_hospitalisés t2 on t1.refid_eClinibase = t2.refid_eClinibase
where
t1.[unite_service_id] = 4
and t1.[unite_service_suiv_id] <> 0
and t2.refid_eClinibase is null
)
,lst_patients_admisUIB_nonTransferes as (
select distinct
t1.refid_eClinibase
from lst_tous_movements t1
left join lst_patients_admisUIB_transferes t2 on t1.refid_eClinibase = t2.refid_eClinibase
left join lst_patients_hospitalisés t3 on t1.refid_eClinibase = t3.refid_eClinibase
where
t1.[unite_service_id] = 4
and t1.[unite_service_suiv_id] = 0
and t2.refid_eClinibase is null
and t3.refid_eClinibase is null
)
,lst_patients_autres as (
select distinct
t1.refid_eClinibase
from lst_patients t1
left join lst_patients_admisUIB_transferes t2 on t1.refid_eClinibase = t2.refid_eClinibase
left join lst_patients_hospitalisés t3 on t1.refid_eClinibase = t3.refid_eClinibase
left join lst_patients_admisUIB_nonTransferes t4 on t1.refid_eClinibase = t4.refid_eClinibase
where
t2.refid_eClinibase is null
and t3.refid_eClinibase is null
and t4.refid_eClinibase is null
)
as you can see i have a multi level filtering out going on here...
1st i get the people where t1.[dthrfinmouvement] = '4000-01-01'
2nd i get the people with another criteria EXCLUDING the 1st group
3rd i get the people with yet another criteria EXCLUDING the 1st and
the 2nd group
etc..
when i get to the 4th level, my query takes 6 - 10 seconds to complete
is there any way to speed this up ?
this is my dataset i'm working with:
+------------------+-------------------------------+------------------+------------------+-----------------------+
| refid_eClinibase | nodossierpermanent_eClinibase | dthrfinmouvement | unite_service_id | unite_service_suiv_id |
+------------------+-------------------------------+------------------+------------------+-----------------------+
| 25611 | P0017379 | 2013-04-27 | 58 | 0 |
| 25611 | P0017379 | 2013-05-02 | 4 | 2 |
| 25611 | P0017379 | 2013-05-18 | 2 | 0 |
| 85886 | P0077918 | 2013-04-10 | 58 | 0 |
| 85886 | P0077918 | 2013-05-06 | 6 | 12 |
| 85886 | P0077918 | 4000-01-01 | 12 | 0 |
| 91312 | P0083352 | 2013-07-24 | 3 | 14 |
| 91312 | P0083352 | 2013-07-24 | 14 | 3 |
| 91312 | P0083352 | 2013-07-30 | 3 | 8 |
| 91312 | P0083352 | 4000-01-01 | 8 | 0 |
| 93835 | P0085879 | 2013-04-30 | 58 | 0 |
| 93835 | P0085879 | 2013-05-07 | 4 | 2 |
| 93835 | P0085879 | 2013-05-16 | 2 | 0 |
| 93835 | P0085879 | 2013-05-22 | 58 | 0 |
| 93835 | P0085879 | 2013-05-24 | 4 | 0 |
| 93835 | P0085879 | 2013-05-31 | 58 | 0 |
| 93836 | P0085880 | 2013-05-20 | 58 | 0 |
| 93836 | P0085880 | 2013-05-22 | 4 | 2 |
| 93836 | P0085880 | 2013-05-31 | 2 | 0 |
| 97509 | P0089576 | 2013-04-09 | 58 | 0 |
| 97509 | P0089576 | 2013-04-11 | 4 | 0 |
| 102787 | P0094886 | 2013-04-08 | 58 | 0 |
| 102787 | P0094886 | 2013-04-11 | 4 | 2 |
| 102787 | P0094886 | 2013-05-21 | 2 | 0 |
| 103029 | P0095128 | 2013-04-04 | 58 | 0 |
| 103029 | P0095128 | 2013-04-10 | 4 | 1 |
| 103029 | P0095128 | 2013-05-03 | 1 | 0 |
| 103813 | P0095922 | 2013-07-02 | 58 | 0 |
| 103813 | P0095922 | 2013-07-03 | 4 | 6 |
| 103813 | P0095922 | 2013-08-14 | 6 | 0 |
| 105106 | P0097215 | 2013-08-09 | 58 | 0 |
| 105106 | P0097215 | 2013-08-13 | 4 | 0 |
| 105106 | P0097215 | 2013-08-14 | 58 | 0 |
| 105106 | P0097215 | 4000-01-01 | 4 | 0 |
| 106223 | P0098332 | 2013-06-11 | 1 | 0 |
| 106223 | P0098332 | 2013-08-01 | 58 | 0 |
| 106223 | P0098332 | 4000-01-01 | 1 | 0 |
| 106245 | P0098354 | 2013-04-02 | 58 | 0 |
| 106245 | P0098354 | 2013-05-24 | 58 | 0 |
| 106245 | P0098354 | 2013-05-29 | 4 | 1 |
| 106245 | P0098354 | 2013-07-12 | 1 | 0 |
| 106280 | P0098389 | 2013-04-07 | 58 | 0 |
| 106280 | P0098389 | 2013-04-09 | 4 | 0 |
| 106416 | P0098525 | 2013-04-19 | 58 | 0 |
| 106416 | P0098525 | 2013-04-23 | 4 | 0 |
| 106444 | P0098553 | 2013-04-22 | 58 | 0 |
| 106444 | P0098553 | 2013-04-25 | 4 | 0 |
| 106609 | P0098718 | 2013-05-08 | 58 | 0 |
| 106609 | P0098718 | 2013-05-10 | 4 | 11 |
| 106609 | P0098718 | 2013-07-24 | 11 | 12 |
| 106609 | P0098718 | 4000-01-01 | 12 | 0 |
| 106616 | P0098725 | 2013-05-09 | 58 | 0 |
| 106616 | P0098725 | 2013-05-09 | 4 | 1 |
| 106616 | P0098725 | 2013-07-27 | 1 | 0 |
| 106698 | P0098807 | 2013-05-16 | 58 | 0 |
| 106698 | P0098807 | 2013-05-22 | 4 | 6 |
| 106698 | P0098807 | 2013-06-14 | 6 | 1 |
| 106698 | P0098807 | 2013-06-28 | 1 | 0 |
| 106714 | P0098823 | 2013-05-20 | 58 | 0 |
| 106714 | P0098823 | 2013-05-21 | 58 | 0 |
| 106714 | P0098823 | 2013-05-24 | 58 | 0 |
| 106729 | P0098838 | 2013-05-21 | 58 | 0 |
| 106729 | P0098838 | 2013-05-23 | 4 | 1 |
| 106729 | P0098838 | 2013-06-03 | 1 | 0 |
| 107038 | P0099147 | 2013-06-25 | 58 | 0 |
| 107038 | P0099147 | 2013-06-28 | 4 | 1 |
| 107038 | P0099147 | 2013-07-04 | 1 | 0 |
| 107038 | P0099147 | 2013-08-13 | 58 | 0 |
| 107038 | P0099147 | 2013-08-15 | 4 | 6 |
| 107038 | P0099147 | 4000-01-01 | 6 | 0 |
| 107082 | P0099191 | 2013-06-29 | 58 | 0 |
| 107082 | P0099191 | 2013-07-04 | 4 | 6 |
| 107082 | P0099191 | 2013-07-19 | 6 | 0 |
| 107157 | P0099267 | 4000-01-01 | 13 | 0 |
| 107336 | P0099446 | 4000-01-01 | 6 | 0 |
+------------------+-------------------------------+------------------+------------------+-----------------------+
thanks.

It is hard to understand exactly what all your rules are from the question, but the general approach should be to add a "Grouping" column to a singl query that uses a CASE statement to categorize the people.
The conditions in a CASE are evaluated in order, so that if the first criteria is met, then the subsequent criteria are not even evaluated for that row.
Here is some code to get you started....
select t1.refid_eClinibase
,t1.[dthrfinmouvement]
,t1.[unite_service_id]
,t1.[unite_service_suiv_id]
CASE WHEN [dthrfinmouvement] = '4000-01-01' THEN 'Group1 Label'
WHEN condition2 = something THEN 'Group2 Label'
....
WHEN conditionN = something THEN 'GroupN Label'
ELSE 'Catch All Label'
END as person_category
from sometable t1

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Find the highest and lowest value locations within an interval on a column? - pandas

Related

Pandas - Grouping Rows With Same Value in Dataframe

How to make sql hive when i have input this?

select all rows that match criteria if not get a random one

Get rows from matches with different teamId than playing user

SQL Performance multiple exclusion from the same table

Categories

Resources