How to write a Hive SQL query for this input? - sql

input:
| a.user_id | a_stream_length | b_stream_length | subtract_inactive |
-----------------------------------------------------------------------------
| a | 11 | 1686 | 22 |
| a | 1686 | 328 | 12 |
| a | 328 | 732 | 22 |
| a | 732 | 11 | 1699 |
| a | 11 | 2123 | 18 |
| a | 2123 | 160 | 2 |
| a | 160 | 1358 | 0 |
| a | 1358 | 129 | 1 |
| a | 129 | 4042 | 109334 |
output:
| a | (1686+11+328+732) (if subtract_inactive < 1000) |
| a | 732 (a_stream_length) (if subtract_inactive > 1000) |

Related

SQL: Filter rows with specific pattern

I'm a bit new to SQL.
I have the following table:
+-----+---------+------------------------+
| ID | ID_TEST | FILE_PATH |
+-----+---------+------------------------+
| 575 | 3 | Landscapes_001_h_A.jpg |
| 576 | 3 | Landscapes_001_h_B.jpg |
| 577 | 3 | Landscapes_001_h_C.jpg |
| 578 | 3 | Landscapes_001_h_D.jpg |
| 579 | 3 | Landscapes_001_h_E.jpg |
| 580 | 3 | Landscapes_002_h_A.jpg |
| 581 | 3 | Landscapes_002_h_B.jpg |
| 582 | 3 | Landscapes_002_h_C.jpg |
| 583 | 3 | Landscapes_002_h_D.jpg |
| 584 | 3 | Landscapes_002_h_E.jpg |
+-----+---------+------------------------+
The pattern for a picture name is Landscapes_XXX_h_Y.jpg
where
XXX is a number from 1 to 185 and Y is the quality version from A to E.
I want to select each image name with a different quality.
The output should be
+-----+---------+------------------------+
| ID | ID_TEST | FILE_PATH |
+-----+---------+------------------------+
| 575 | 3 | Landscapes_001_h_A.jpg |
| 576 | 3 | Landscapes_002_h_E.jpg |
| 577 | 3 | Landscapes_003_h_C.jpg |
| 578 | 3 | Landscapes_004_h_B.jpg |
| 579 | 3 | Landscapes_005_h_D.jpg |
| 580 | 3 | Landscapes_006_h_A.jpg |
| 581 | 3 | Landscapes_007_h_E.jpg |
| 582 | 3 | Landscapes_008_h_C.jpg |
| 583 | 3 | Landscapes_009_h_B.jpg |
| 584 | 3 | Landscapes_010_h_E.jpg |
+-----+---------+------------------------+
but of course for 185 elements.
I'm using 5.5.60-MariaDB.
How do I write the SELECT statement? Should I use REGEXP?
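The naming pattern described above can be checked outside the database. The following is only a sketch in Python, not a MariaDB query: it assumes that "a different quality" means any one quality version per image number, and the random choice, the `pick_one_per_image` helper, and the `seed` parameter are all hypothetical.

```python
import random
import re

# Landscapes_XXX_h_Y.jpg: XXX is a three-digit image number, Y is a quality from A to E
PATTERN = re.compile(r"^Landscapes_(\d{3})_h_([A-E])\.jpg$")

def pick_one_per_image(file_paths, seed=0):
    """Group file names by image number and keep one randomly chosen quality for each."""
    by_image = {}
    for path in file_paths:
        match = PATTERN.match(path)
        if match:  # names that do not follow the pattern are skipped
            by_image.setdefault(match.group(1), []).append(path)
    rng = random.Random(seed)  # seeded so that the selection is repeatable
    return [rng.choice(by_image[number]) for number in sorted(by_image)]

# File names built the same way as the rows in the question's table
paths = [f"Landscapes_{n:03d}_h_{q}.jpg" for n in (1, 2) for q in "ABCDE"]
picked = pick_one_per_image(paths)
print(picked)
```

Each image number appears exactly once in the result; which quality is kept depends on the seed.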

Properly using PERCENTILE_CONT - Oracle SQL

I am trying to calculate the following:
The average of the dataset
The median of the dataset
The top 20% of the dataset
The bottom 20% of the dataset
My dataset looks like this:
| Part | Step | Step_Start | Part_Finish | TheTime |
|:----:|:----:|:----------:|:-----------:|:-----------:|
| 1 | 200 | 15-Aug-18 | 19-Jun-19 | 307.4926273 |
| 2 | 200 | 7-Jun-19 | 19-Jun-19 | 11.4434375 |
| 3 | 200 | 17-Sep-18 | 4-Feb-19 | 139.4360417 |
| 4 | 200 | 30-Jan-19 | 4-Feb-19 | 4.356666667 |
| 5 | 200 | 1-Oct-18 | 18-Feb-19 | 139.4528009 |
| 6 | 200 | 13-Feb-19 | 18-Feb-19 | 4.50375 |
| 7 | 200 | 17-Oct-18 | 28-Mar-19 | 161.7007176 |
| 8 | 200 | 12-Nov-18 | 28-Mar-19 | 135.630625 |
| 9 | 200 | 25-Oct-18 | 26-Feb-19 | 123.6026968 |
| 10 | 200 | 22-Feb-19 | 26-Feb-19 | 3.628090278 |
| 11 | 200 | 30-Oct-18 | 3-Jan-19 | 64.51466435 |
| 12 | 200 | 12-Dec-18 | 3-Jan-19 | 21.48703704 |
| 13 | 200 | 15-Nov-18 | 14-Jan-19 | 59.41373843 |
| 14 | 200 | 7-Jan-19 | 14-Jan-19 | 6.621828704 |
| 15 | 200 | 15-Nov-18 | 12-Jan-19 | 57.62283565 |
| 16 | 200 | 8-Jan-19 | 12-Jan-19 | 3.264398148 |
| 17 | 200 | 15-Nov-18 | 7-Mar-19 | 111.5082523 |
| 18 | 200 | 4-Mar-19 | 7-Mar-19 | 2.153587963 |
| 19 | 200 | 16-Nov-18 | 23-May-19 | 187.6931481 |
| 20 | 200 | 16-Nov-18 | 3-Jan-19 | 47.47916667 |
| 21 | 200 | 17-Dec-18 | 3-Jan-19 | 16.62722222 |
| 22 | 200 | 20-Nov-18 | 14-Feb-19 | 85.6115625 |
| 23 | 200 | 9-Feb-19 | 14-Feb-19 | 4.520787037 |
| 24 | 200 | 19-Nov-18 | 14-Jan-19 | 55.53342593 |
| 25 | 200 | 9-Jan-19 | 14-Jan-19 | 4.721400463 |
| 26 | 200 | 26-Nov-18 | 9-Jan-19 | 43.50748843 |
| 27 | 200 | 4-Jan-19 | 9-Jan-19 | 4.417164352 |
| 28 | 200 | 26-Nov-18 | 21-Jan-19 | 55.59988426 |
| 29 | 200 | 13-Jan-19 | 21-Jan-19 | 7.535 |
| 30 | 200 | 16-Jan-19 | 21-Jan-19 | 4.618796296 |
| 31 | 200 | 26-Nov-18 | 11-Jan-19 | 45.42148148 |
| 32 | 200 | 4-Jan-19 | 11-Jan-19 | 6.316921296 |
| 33 | 200 | 4-Dec-18 | 24-Jan-19 | 50.3669213 |
| 34 | 200 | 18-Jan-19 | 24-Jan-19 | 5.589467593 |
| 35 | 200 | 4-Dec-18 | 31-Jan-19 | 57.26877315 |
| 36 | 200 | 22-Jan-19 | 31-Jan-19 | 8.240034722 |
| 37 | 200 | 5-Dec-18 | 28-Jun-19 | 204.5283912 |
| 38 | 200 | 26-Jun-19 | 28-Jun-19 | 1.508252315 |
| 39 | 200 | 9-Feb-19 | 19-Feb-19 | 9.532893519 |
| 40 | 200 | 7-Dec-18 | 14-Feb-19 | 68.51900463 |
| 41 | 200 | 5-Feb-19 | 14-Feb-19 | 8.641076389 |
| 42 | 200 | 11-Dec-18 | 25-Jan-19 | 44.50501157 |
| 43 | 200 | 22-Jan-19 | 25-Jan-19 | 2.511435185 |
| 44 | 200 | 13-Dec-18 | 17-Jan-19 | 34.43806713 |
| 45 | 200 | 14-Jan-19 | 17-Jan-19 | 2.210972222 |
| 46 | 200 | 13-Dec-18 | 24-Jan-19 | 41.38921296 |
| 47 | 200 | 17-Jan-19 | 24-Jan-19 | 6.444664352 |
| 48 | 200 | 10-Jan-19 | 7-Feb-19 | 27.43130787 |
| 49 | 200 | 1-Feb-19 | 7-Feb-19 | 5.349189815 |
| 50 | 200 | 18-Dec-18 | 4-Feb-19 | 47.50416667 |
| 51 | 200 | 29-Jan-19 | 4-Feb-19 | 5.481979167 |
| 52 | 200 | 3-Jan-19 | 30-Jan-19 | 26.46112269 |
| 53 | 200 | 23-Jan-19 | 30-Jan-19 | 6.712175926 |
| 54 | 200 | 4-Jan-19 | 5-Feb-19 | 31.49590278 |
| 55 | 200 | 30-Jan-19 | 5-Feb-19 | 5.385798611 |
| 56 | 200 | 23-Jan-19 | 20-Mar-19 | 55.296875 |
| 57 | 200 | 21-Feb-19 | 20-Mar-19 | 26.06854167 |
| 58 | 200 | 22-Jan-19 | 14-Mar-19 | 50.57989583 |
| 59 | 200 | 8-Mar-19 | 14-Mar-19 | 5.147303241 |
| 60 | 200 | 22-Jan-19 | 21-Feb-19 | 29.46405093 |
| 61 | 200 | 14-Feb-19 | 21-Feb-19 | 6.701724537 |
| 62 | 200 | 24-Jan-19 | 23-Apr-19 | 88.50689815 |
| 63 | 200 | 17-Apr-19 | 23-Apr-19 | 5.725405093 |
| 64 | 200 | 28-Jan-19 | 21-Feb-19 | 23.50082176 |
| 65 | 200 | 13-Feb-19 | 21-Feb-19 | 7.115717593 |
| 66 | 200 | 31-Jan-19 | 28-Feb-19 | 27.55881944 |
| 67 | 200 | 25-Feb-19 | 28-Feb-19 | 2.633738426 |
| 68 | 200 | 31-Jan-19 | 27-Feb-19 | 26.46105324 |
| 69 | 200 | 23-Feb-19 | 27-Feb-19 | 3.531423611 |
| 70 | 200 | 1-Feb-19 | 28-Feb-19 | 26.45835648 |
| 71 | 200 | 27-Feb-19 | 28-Feb-19 | 0.471296296 |
| 72 | 200 | 6-Feb-19 | 27-Feb-19 | 20.54436343 |
| 73 | 200 | 23-Feb-19 | 27-Feb-19 | 3.598854167 |
| 74 | 200 | 6-Feb-19 | 5-Mar-19 | 26.54347222 |
| 75 | 200 | 28-Feb-19 | 5-Mar-19 | 4.303773148 |
| 76 | 200 | 12-Feb-19 | 6-Mar-19 | 21.56993056 |
| 77 | 200 | 1-Mar-19 | 6-Mar-19 | 4.597615741 |
| 78 | 200 | 12-Feb-19 | 14-Mar-19 | 29.50417824 |
| 79 | 200 | 7-Mar-19 | 14-Mar-19 | 6.083541667 |
| 80 | 200 | 28-Feb-19 | 28-Mar-19 | 27.5291088 |
| 81 | 200 | 25-Mar-19 | 28-Mar-19 | 2.637824074 |
| 82 | 200 | 29-Jan-19 | 28-Feb-19 | 29.34280093 |
| 83 | 200 | 21-Feb-19 | 28-Feb-19 | 6.233831019 |
| 84 | 200 | 19-Feb-19 | 30-Apr-19 | 69.51832176 |
| 85 | 200 | 7-Feb-19 | 5-Mar-19 | 25.74865741 |
| 86 | 200 | 27-Feb-19 | 5-Mar-19 | 5.380034722 |
| 87 | 200 | 21-Feb-19 | 21-Mar-19 | 27.56310185 |
| 88 | 200 | 19-Mar-19 | 21-Mar-19 | 1.161828704 |
| 89 | 200 | 26-Feb-19 | 28-Mar-19 | 29.41315972 |
| 90 | 200 | 22-Mar-19 | 28-Mar-19 | 5.673703704 |
| 91 | 200 | 26-Feb-19 | 28-Mar-19 | 29.5131713 |
| 92 | 200 | 20-Mar-19 | 28-Mar-19 | 7.073414352 |
| 93 | 200 | 28-Feb-19 | 15-Apr-19 | 45.63513889 |
| 94 | 200 | 5-Apr-19 | 15-Apr-19 | 9.479456019 |
| 95 | 200 | 1-Mar-19 | 29-Mar-19 | 27.54568287 |
| 96 | 200 | 25-Mar-19 | 29-Mar-19 | 3.044340278 |
| 97 | 200 | 4-Mar-19 | 27-Mar-19 | 22.52392361 |
| 98 | 200 | 21-Mar-19 | 27-Mar-19 | 5.074421296 |
| 99 | 200 | 14-Feb-19 | 19-Mar-19 | 32.54349537 |
| 100 | 200 | 13-Mar-19 | 19-Mar-19 | 5.265266204 |
My current SQL query looks like this:
SELECT
Step,
ROUND(MEDIAN(Part_Finish - Step_Start), 2) AS "The_Median",
ROUND(AVG(Part_Finish - Step_Start), 2) AS "The_Average",
PERCENTILE_CONT(0.20) WITHIN GROUP (ORDER BY (Part_Finish - Step_Start) ASC) AS "Best_Time",
PERCENTILE_CONT(0.80) WITHIN GROUP (ORDER BY (Part_Finish - Step_Start) ASC) AS "Worst_Time"
FROM
myTbl
GROUP BY
Step
However, I am not sure if my results are correct, because I don't think I am using PERCENTILE_CONT() correctly. How can I use PERCENTILE_CONT() (or another method) to find the average or median (whichever is easier) "time to complete" based on the best 20% of the data, and the worst 20% of the data?
I would expect some results to look like this:
| Step | The_Average | The_Median | Best_Time | Worst_Time |
|:----:|:-----------:|:----------:|:---------:|:----------:|
| 200 | < value > | < value > | < value > | < value > |
where the < value > fields are the properly calculated average, median, and best and worst of the dataset. Best and worst being calculated by finding the average or median of the top 20% of the data (i.e., the smallest times) or the worst 20% of the data (i.e., the largest times)
PERCENTILE_CONT can be used as an analytic (window) function by adding an OVER clause, so if you just want a result set consisting of a single record with scalar values, you may try selecting distinct:
SELECT DISTINCT
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY Part_Finish - Step_Start) OVER () AS "The_median",
ROUND(AVG(Part_Finish - Step_Start) OVER (), 2) AS "The_Average",
PERCENTILE_CONT(0.20) WITHIN GROUP (ORDER BY Part_Finish - Step_Start) OVER () AS "Best_Time",
PERCENTILE_CONT(0.80) WITHIN GROUP (ORDER BY Part_Finish - Step_Start) OVER () AS "Worst_Time"
FROM myTbl;
The reason for the above approach is that selecting PERCENTILE_CONT as an analytic function over your entire table returns one row for every record of the table. But, as used here, the values are the same for each record. Therefore, we can just take the distinct value to get a single result.
If you instead expect a different report for each Step value, then you should use PARTITION BY in the OVER clause of each call (and select Step as well), e.g.
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY Part_Finish - Step_Start)
OVER (PARTITION BY Step) AS "The_median"
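Note that PERCENTILE_CONT(0.20) returns the interpolated cut-off value at the 20th percentile, not the average of the best 20% of the rows, which is what the question asks for. A minimal Python sketch of the difference follows; the function names and the ten sample values are made up for illustration, and rounding the slice size to the nearest whole row is an assumption.

```python
import math
from statistics import mean

def percentile_cont(values, p):
    """Linear-interpolation percentile, the same idea as PERCENTILE_CONT."""
    ordered = sorted(values)
    pos = p * (len(ordered) - 1)          # zero-based fractional row position
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def mean_of_best_and_worst(values, fraction=0.20):
    """Average of the smallest and of the largest `fraction` of the values."""
    ordered = sorted(values)
    k = max(1, round(len(ordered) * fraction))  # rows in each slice (assumption)
    return mean(ordered[:k]), mean(ordered[-k:])

times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # made-up durations in days
cutoff_20 = percentile_cont(times, 0.20)
median = percentile_cont(times, 0.50)
best_avg, worst_avg = mean_of_best_and_worst(times)
print(cutoff_20, median, best_avg, worst_avg)
```

For these values the 20th-percentile cut-off is about 2.8, while the average of the best 20% (the two smallest times) is 1.5, so the two numbers answer different questions.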

Find the highest and lowest value locations within an interval on a column?

Given this pandas DataFrame with two columns, 'Values' and 'Intervals', how do I get a third column 'MinMax' indicating whether the value is a maximum or a minimum within that interval? The challenge for me is that neither the interval length nor the distance between intervals is fixed, which is why I am posting the question.
import pandas as pd
import numpy as np
data = pd.DataFrame([
[1879.289,np.nan],[1879.281,np.nan],[1879.292,1],[1879.295,1],[1879.481,1],[1879.294,1],[1879.268,1],
[1879.293,1],[1879.277,1],[1879.285,1],[1879.464,1],[1879.475,1],[1879.971,1],[1879.779,1],
[1879.986,1],[1880.791,1],[1880.29,1],[1879.253,np.nan],[1878.268,np.nan],[1875.73,1],[1876.792,1],
[1875.977,1],[1876.408,1],[1877.159,1],[1877.187,1],[1883.164,1],[1883.171,1],[1883.495,1],
[1883.962,1],[1885.158,1],[1885.974,1],[1886.479,np.nan],[1885.969,np.nan],[1884.693,1],[1884.977,1],
[1884.967,1],[1884.691,1],[1886.171,1],[1886.166,np.nan],[1884.476,np.nan],[1884.66,1],[1882.962,1],
[1881.496,1],[1871.163,1],[1874.985,1],[1874.979,1],[1871.173,np.nan],[1871.973,np.nan],[1871.682,np.nan],
[1872.476,np.nan],[1882.361,1],[1880.869,1],[1882.165,1],[1881.857,1],[1880.375,1],[1880.66,1],
[1880.891,1],[1880.377,1],[1881.663,1],[1881.66,1],[1877.888,1],[1875.69,1],[1875.161,1],
[1876.697,np.nan],[1876.671,np.nan],[1879.666,np.nan],[1877.182,np.nan],[1878.898,1],[1878.668,1],[1878.871,1],
[1878.882,1],[1879.173,1],[1878.887,1],[1878.68,1],[1878.872,1],[1878.677,1],[1877.877,1],
[1877.669,1],[1877.69,1],[1877.684,1],[1877.68,1],[1877.885,1],[1877.863,1],[1877.674,1],
[1877.676,1],[1877.687,1],[1878.367,1],[1878.179,1],[1877.696,1],[1877.665,1],[1877.667,np.nan],
[1878.678,np.nan],[1878.661,1],[1878.171,1],[1877.371,1],[1877.359,1],[1878.381,1],[1875.185,1],
[1875.367,np.nan],[1865.492,np.nan],[1865.495,1],[1866.995,1],[1866.672,1],[1867.465,1],[1867.663,1],
[1867.186,1],[1867.687,1],[1867.459,1],[1867.168,1],[1869.689,1],[1869.693,1],[1871.676,1],
[1873.174,1],[1873.691,np.nan],[1873.685,np.nan]
])
In the third column below you can see where the max and min is for each interval.
+-------+----------+-----------+---------+
| index | Value | Intervals | Min/Max |
+-------+----------+-----------+---------+
| 0 | 1879.289 | np.nan | |
| 1 | 1879.281 | np.nan | |
| 2 | 1879.292 | 1 | |
| 3 | 1879.295 | 1 | |
| 4 | 1879.481 | 1 | |
| 5 | 1879.294 | 1 | |
| 6 | 1879.268 | 1 | min |
| 7 | 1879.293 | 1 | |
| 8 | 1879.277 | 1 | |
| 9 | 1879.285 | 1 | |
| 10 | 1879.464 | 1 | |
| 11 | 1879.475 | 1 | |
| 12 | 1879.971 | 1 | |
| 13 | 1879.779 | 1 | |
| 17 | 1879.986 | 1 | |
| 18 | 1880.791 | 1 | max |
| 19 | 1880.29 | 1 | |
| 55 | 1879.253 | np.nan | |
| 56 | 1878.268 | np.nan | |
| 57 | 1875.73 | 1 | |
| 58 | 1876.792 | 1 | |
| 59 | 1875.977 | 1 | min |
| 60 | 1876.408 | 1 | |
| 61 | 1877.159 | 1 | |
| 62 | 1877.187 | 1 | |
| 63 | 1883.164 | 1 | |
| 64 | 1883.171 | 1 | |
| 65 | 1883.495 | 1 | |
| 66 | 1883.962 | 1 | |
| 67 | 1885.158 | 1 | |
| 68 | 1885.974 | 1 | max |
| 69 | 1886.479 | np.nan | |
| 70 | 1885.969 | np.nan | |
| 71 | 1884.693 | 1 | |
| 72 | 1884.977 | 1 | |
| 73 | 1884.967 | 1 | |
| 74 | 1884.691 | 1 | min |
| 75 | 1886.171 | 1 | max |
| 76 | 1886.166 | np.nan | |
| 77 | 1884.476 | np.nan | |
| 78 | 1884.66 | 1 | max |
| 79 | 1882.962 | 1 | |
| 80 | 1881.496 | 1 | |
| 81 | 1871.163 | 1 | min |
| 82 | 1874.985 | 1 | |
| 83 | 1874.979 | 1 | |
| 84 | 1871.173 | np.nan | |
| 85 | 1871.973 | np.nan | |
| 86 | 1871.682 | np.nan | |
| 87 | 1872.476 | np.nan | |
| 88 | 1882.361 | 1 | max |
| 89 | 1880.869 | 1 | |
| 90 | 1882.165 | 1 | |
| 91 | 1881.857 | 1 | |
| 92 | 1880.375 | 1 | |
| 93 | 1880.66 | 1 | |
| 94 | 1880.891 | 1 | |
| 95 | 1880.377 | 1 | |
| 96 | 1881.663 | 1 | |
| 97 | 1881.66 | 1 | |
| 98 | 1877.888 | 1 | |
| 99 | 1875.69 | 1 | |
| 100 | 1875.161 | 1 | min |
| 101 | 1876.697 | np.nan | |
| 102 | 1876.671 | np.nan | |
| 103 | 1879.666 | np.nan | |
| 111 | 1877.182 | np.nan | |
| 112 | 1878.898 | 1 | |
| 113 | 1878.668 | 1 | |
| 114 | 1878.871 | 1 | |
| 115 | 1878.882 | 1 | |
| 116 | 1879.173 | 1 | max |
| 117 | 1878.887 | 1 | |
| 118 | 1878.68 | 1 | |
| 119 | 1878.872 | 1 | |
| 120 | 1878.677 | 1 | |
| 121 | 1877.877 | 1 | |
| 122 | 1877.669 | 1 | |
| 123 | 1877.69 | 1 | |
| 124 | 1877.684 | 1 | |
| 125 | 1877.68 | 1 | |
| 126 | 1877.885 | 1 | |
| 127 | 1877.863 | 1 | |
| 128 | 1877.674 | 1 | |
| 129 | 1877.676 | 1 | |
| 130 | 1877.687 | 1 | |
| 131 | 1878.367 | 1 | |
| 132 | 1878.179 | 1 | |
| 133 | 1877.696 | 1 | |
| 134 | 1877.665 | 1 | min |
| 135 | 1877.667 | np.nan | |
| 136 | 1878.678 | np.nan | |
| 137 | 1878.661 | 1 | max |
| 138 | 1878.171 | 1 | |
| 139 | 1877.371 | 1 | |
| 140 | 1877.359 | 1 | |
| 141 | 1878.381 | 1 | |
| 142 | 1875.185 | 1 | min |
| 143 | 1875.367 | np.nan | |
| 144 | 1865.492 | np.nan | |
| 145 | 1865.495 | 1 | max |
| 146 | 1866.995 | 1 | |
| 147 | 1866.672 | 1 | |
| 148 | 1867.465 | 1 | |
| 149 | 1867.663 | 1 | |
| 150 | 1867.186 | 1 | |
| 151 | 1867.687 | 1 | |
| 152 | 1867.459 | 1 | |
| 153 | 1867.168 | 1 | |
| 154 | 1869.689 | 1 | |
| 155 | 1869.693 | 1 | |
| 156 | 1871.676 | 1 | |
| 157 | 1873.174 | 1 | min |
| 158 | 1873.691 | np.nan | |
| 159 | 1873.685 | np.nan | |
+-------+----------+-----------+---------+
isnull = data.iloc[:, 1].isnull()
minmax = data.groupby(isnull.cumsum()[~isnull])[0].agg(['idxmax', 'idxmin'])
data.loc[minmax['idxmax'], 'MinMax'] = 'max'
data.loc[minmax['idxmin'], 'MinMax'] = 'min'
data.MinMax = data.MinMax.fillna('')
print(data)
0 1 MinMax
0 1879.289 NaN
1 1879.281 NaN
2 1879.292 1.0
3 1879.295 1.0
4 1879.481 1.0
5 1879.294 1.0
6 1879.268 1.0 min
7 1879.293 1.0
8 1879.277 1.0
9 1879.285 1.0
10 1879.464 1.0
11 1879.475 1.0
12 1879.971 1.0
13 1879.779 1.0
14 1879.986 1.0
15 1880.791 1.0 max
16 1880.290 1.0
17 1879.253 NaN
18 1878.268 NaN
19 1875.730 1.0 min
20 1876.792 1.0
21 1875.977 1.0
22 1876.408 1.0
23 1877.159 1.0
24 1877.187 1.0
25 1883.164 1.0
26 1883.171 1.0
27 1883.495 1.0
28 1883.962 1.0
29 1885.158 1.0
.. ... ... ...
85 1877.687 1.0
86 1878.367 1.0
87 1878.179 1.0
88 1877.696 1.0
89 1877.665 1.0 min
90 1877.667 NaN
91 1878.678 NaN
92 1878.661 1.0 max
93 1878.171 1.0
94 1877.371 1.0
95 1877.359 1.0
96 1878.381 1.0
97 1875.185 1.0 min
98 1875.367 NaN
99 1865.492 NaN
100 1865.495 1.0 min
101 1866.995 1.0
102 1866.672 1.0
103 1867.465 1.0
104 1867.663 1.0
105 1867.186 1.0
106 1867.687 1.0
107 1867.459 1.0
108 1867.168 1.0
109 1869.689 1.0
110 1869.693 1.0
111 1871.676 1.0
112 1873.174 1.0 max
113 1873.691 NaN
114 1873.685 NaN
[115 rows x 3 columns]
data.columns=['Value','Interval']
data['Ingroup'] = (data['Interval'].notnull() + 0)
Use data['Interval'].notnull() to separate the groups.
Use cumsum() on the null mask to number them with `groupno`.
Use groupby(groupno).
Finally, use apply/idxmax/idxmin to label the max/min.
Of course, a for-loop as you suggested is non-Pythonic but possibly a simpler hack.
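The steps outlined above can be sketched as follows. This is one possible reading, not the answerer's exact code: the `label_min_max` helper and the small `demo` frame are hypothetical, and the column names 'Value' and 'Interval' follow the renaming shown above.

```python
import numpy as np
import pandas as pd

def label_min_max(df, value_col="Value", interval_col="Interval"):
    """Return 'min'/'max' labels for the extremes of each run of non-null intervals."""
    in_group = df[interval_col].notnull()
    # Every NaN row bumps the counter, so each run of non-null rows shares one number.
    group_no = (~in_group).cumsum()
    labels = pd.Series("", index=df.index, name="MinMax")
    for _, run in df.loc[in_group, value_col].groupby(group_no[in_group]):
        labels.loc[run.idxmax()] = "max"
        labels.loc[run.idxmin()] = "min"
    return labels

# Made-up data: two runs of non-null intervals separated by NaN rows
demo = pd.DataFrame({
    "Value": [5.0, 1.0, 3.0, 2.0, 9.0, 9.5, 4.0, 8.0, 6.0],
    "Interval": [np.nan, 1, 1, 1, np.nan, np.nan, 1, 1, 1],
})
demo["MinMax"] = label_min_max(demo)
print(demo)
```

If a run contains a single row, that row is both the maximum and the minimum; in this sketch it ends up labeled 'min' because that label is written last.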

Get rows from matches with different teamId than playing user

So what I am trying to do:
I want to get every row with the enemy champion (different teamId) for every match the summoner with the id 42999456 has played.
For example:
summonerId: 42999456
matchId: 2528256239
will return
+------------+------------+------------+--------+--------+
| matchId | summonerId | championId | teamId | winner |
+------------+------------+------------+--------+--------+
| 2528256239 | 23364213 | 412 | 200 | 0 |
| 2528256239 | 34928637 | 429 | 200 | 0 |
| 2528256239 | 40909308 | 4 | 200 | 0 |
| 2528256239 | 50717471 | 122 | 200 | 0 |
| 2528256239 | 52439549 | 60 | 200 | 0 |
(Note that I do not want to query for only one matchId; I want to query for all matches.)
The table looks like this:
+------------+------------+------------+--------+--------+
| matchId | summonerId | championId | teamId | winner |
+------------+------------+------------+--------+--------+
| 2528256239 | 23364213 | 412 | 200 | 0 |
| 2528256239 | 23949601 | 32 | 100 | 1 |
| 2528256239 | 30032566 | 127 | 100 | 1 |
| 2528256239 | 34519064 | 236 | 100 | 1 |
| 2528256239 | 34928637 | 429 | 200 | 0 |
| 2528256239 | 35157572 | 91 | 100 | 1 |
| 2528256239 | 40909308 | 4 | 200 | 0 |
| 2528256239 | 42999456 | 201 | 100 | 1 |
| 2528256239 | 50717471 | 122 | 200 | 0 |
| 2528256239 | 52439549 | 60 | 200 | 0 |
| 2528415543 | 26264559 | 236 | 100 | 1 |
| 2528415543 | 30032566 | 79 | 100 | 1 |
| 2528415543 | 30066298 | 203 | 200 | 0 |
| 2528415543 | 30144484 | 201 | 200 | 0 |
| 2528415543 | 30420315 | 81 | 200 | 0 |
| 2528415543 | 35666995 | 238 | 100 | 1 |
| 2528415543 | 42999456 | 117 | 100 | 1 |
| 2528415543 | 55777006 | 75 | 100 | 1 |
| 2528415543 | 71020371 | 114 | 200 | 0 |
| 2528415543 | 75067455 | 1 | 200 | 0 |
| 2528508209 | 20859869 | 41 | 200 | 1 |
| 2528508209 | 20926263 | 51 | 100 | 0 |
| 2528508209 | 21489056 | 81 | 200 | 1 |
| 2528508209 | 30032566 | 62 | 100 | 0 |
| 2528508209 | 31429371 | 19 | 200 | 1 |
| 2528508209 | 34198484 | 103 | 100 | 0 |
| 2528508209 | 39520185 | 32 | 100 | 0 |
| 2528508209 | 42954909 | 201 | 200 | 1 |
| 2528508209 | 42999456 | 40 | 100 | 0 |
| 2528508209 | 44449359 | 236 | 200 | 1 |
| 2528567430 | 22896699 | 64 | 100 | 0 |
| 2528567430 | 27716534 | 90 | 200 | 1 |
| 2528567430 | 30032566 | 157 | 200 | 1 |
| 2528567430 | 30161338 | 12 | 100 | 0 |
| 2528567430 | 33288363 | 30 | 100 | 0 |
| 2528567430 | 38554025 | 81 | 100 | 0 |
| 2528567430 | 40124474 | 62 | 200 | 1 |
| 2528567430 | 42999456 | 26 | 200 | 1 |
| 2528567430 | 61287205 | 92 | 100 | 0 |
| 2528567430 | 69117699 | 104 | 200 | 1 |
| 2528778889 | 19128606 | 102 | 100 | 0 |
| 2528778889 | 21226478 | 16 | 100 | 0 |
| 2528778889 | 24671894 | 74 | 100 | 0 |
| 2528778889 | 30032566 | 31 | 200 | 1 |
| 2528778889 | 42728001 | 157 | 200 | 1 |
| 2528778889 | 42999456 | 201 | 200 | 1 |
| 2528778889 | 43160768 | 236 | 200 | 1 |
| 2528778889 | 44918136 | 55 | 100 | 0 |
| 2528778889 | 52104644 | 51 | 100 | 0 |
| 2528778889 | 52420228 | 24 | 200 | 1 |
| 2529734554 | 19611148 | 412 | 100 | 1 |
| 2529734554 | 27427187 | 420 | 100 | 1 |
| 2529734554 | 27926072 | 117 | 200 | 0 |
| 2529734554 | 30032566 | 77 | 200 | 0 |
| 2529734554 | 33899765 | 4 | 200 | 0 |
| 2529734554 | 40093026 | 245 | 100 | 1 |
| 2529734554 | 42999456 | 40 | 200 | 0 |
| 2529734554 | 44385431 | 67 | 200 | 0 |
| 2529734554 | 49240203 | 81 | 100 | 1 |
| 2529734554 | 80637139 | 25 | 100 | 1 |
| 2529747312 | 19648659 | 90 | 200 | 1 |
| 2529747312 | 30032566 | 114 | 200 | 1 |
| 2529747312 | 33120079 | 67 | 200 | 1 |
| 2529747312 | 35688371 | 32 | 100 | 0 |
| 2529747312 | 35817488 | 106 | 200 | 1 |
| 2529747312 | 36068030 | 41 | 100 | 0 |
| 2529747312 | 40406867 | 412 | 100 | 0 |
| 2529747312 | 42999456 | 44 | 200 | 1 |
| 2529747312 | 43212358 | 236 | 100 | 0 |
| 2529747312 | 52238049 | 8 | 100 | 0 |
| 2529802806 | 20929372 | 105 | 200 | 1 |
| 2529802806 | 24507439 | 236 | 200 | 1 |
| 2529802806 | 24849750 | 238 | 100 | 0 |
| 2529802806 | 28026768 | 117 | 100 | 0 |
| 2529802806 | 30032566 | 223 | 200 | 1 |
| 2529802806 | 31689726 | 67 | 100 | 0 |
| 2529802806 | 35685814 | 92 | 100 | 0 |
| 2529802806 | 40621123 | 254 | 100 | 0 |
| 2529802806 | 42999456 | 40 | 200 | 1 |
| 2529802806 | 56868633 | 64 | 200 | 1 |
| 2530087947 | 108807 | 16 | 200 | 1 |
| 2530087947 | 19409641 | 84 | 100 | 0 |
| 2530087947 | 21422420 | 81 | 100 | 0 |
| 2530087947 | 23851356 | 112 | 200 | 1 |
| 2530087947 | 25847381 | 96 | 200 | 1 |
| 2530087947 | 27575895 | 11 | 200 | 1 |
| 2530087947 | 39058809 | 64 | 100 | 0 |
| 2530087947 | 39409025 | 61 | 100 | 0 |
| 2530087947 | 42999456 | 44 | 100 | 0 |
| 2530087947 | 54220113 | 41 | 200 | 1 |
| 2537795256 | 19118675 | 40 | 200 | 1 |
| 2537795256 | 20071645 | 42 | 200 | 1 |
| 2537795256 | 29826523 | 11 | 100 | 0 |
| 2537795256 | 30032566 | 15 | 100 | 0 |
| 2537795256 | 37639463 | 31 | 100 | 0 |
| 2537795256 | 37741313 | 245 | 200 | 1 |
| 2537795256 | 42999456 | 117 | 100 | 0 |
| 2537795256 | 46537422 | 80 | 200 | 1 |
| 2537795256 | 71466951 | 238 | 200 | 1 |
| 2537795256 | 76797025 | 75 | 100 | 0 |
A way of doing it is with NOT EXISTS:
SELECT matchId, summonerId, championId, teamId, winner
FROM mytable AS t1
WHERE matchId = 2528256239 AND
NOT EXISTS (SELECT 1
FROM mytable AS t2
WHERE t2.summonerId = 42999456 AND
t1.matchId = t2.matchId AND
t1.teamId = t2.teamId)
You can also use a LEFT JOIN:
SELECT t1.matchId, t1.summonerId, t1.championId, t1.teamId, t1.winner
FROM mytable AS t1
LEFT JOIN mytable AS t2
ON t1.matchId = t2.matchId AND t1.teamId = t2.teamId AND t2.summonerId = 42999456
WHERE t1.matchId = 2528256239 AND t2.matchId IS NULL
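Both queries above are restricted to a single matchId, while the question asks for all matches. A sketch of one way to cover every match the summoner played is an inner self-join, which is a different technique from the NOT EXISTS and LEFT JOIN shown above. It is run here through Python's sqlite3 module as a stand-in for the real database; the rows for matchId 1 are hypothetical and only show that matches without the summoner are ignored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (matchId INTEGER, summonerId INTEGER, "
    "championId INTEGER, teamId INTEGER, winner INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?, ?)",
    [
        # A few rows copied from the question's table
        (2528256239, 23364213, 412, 200, 0),
        (2528256239, 23949601, 32, 100, 1),
        (2528256239, 34928637, 429, 200, 0),
        (2528256239, 42999456, 201, 100, 1),
        (2528567430, 22896699, 64, 100, 0),
        (2528567430, 27716534, 90, 200, 1),
        (2528567430, 42999456, 26, 200, 1),
        # Hypothetical match without the summoner, to show it is ignored
        (1, 111, 1, 100, 1),
        (1, 222, 2, 200, 0),
    ],
)

ENEMIES_SQL = """
SELECT t1.matchId, t1.summonerId, t1.championId, t1.teamId, t1.winner
FROM mytable AS t1
JOIN mytable AS t2
  ON t2.matchId = t1.matchId      -- same match
 AND t2.summonerId = ?            -- the row of the player we care about
 AND t2.teamId <> t1.teamId       -- keep only the opposing team
ORDER BY t1.matchId, t1.summonerId
"""

enemies = conn.execute(ENEMIES_SQL, (42999456,)).fetchall()
for row in enemies:
    print(row)
```

Because the join requires a row for the summoner in the same match, matches the summoner did not play in produce no output, which a NOT EXISTS without the matchId filter would not guarantee.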

SQL Performance multiple exclusion from the same table

I have a table with a list of people; let's say 100 people are listed in it.
I need to filter the people using different criteria and put them in groups. The problem is that when I start excluding on the 4th or 5th level, performance issues come up and the query becomes slow.
with lst_tous_movements as (
select
t1.refid_eClinibase,
t1.[dthrfinmouvement],
t1.[unite_service_id],
t1.[unite_service_suiv_id]
from sometable t1
)
,lst_patients_hospitalisés as (
select distinct
t1.refid_eClinibase
from lst_tous_movements t1
where
t1.[dthrfinmouvement] = '4000-01-01'
)
,lst_patients_admisUIB_transferes as (
select distinct
t1.refid_eClinibase
from lst_tous_movements t1
left join lst_patients_hospitalisés t2 on t1.refid_eClinibase = t2.refid_eClinibase
where
t1.[unite_service_id] = 4
and t1.[unite_service_suiv_id] <> 0
and t2.refid_eClinibase is null
)
,lst_patients_admisUIB_nonTransferes as (
select distinct
t1.refid_eClinibase
from lst_tous_movements t1
left join lst_patients_admisUIB_transferes t2 on t1.refid_eClinibase = t2.refid_eClinibase
left join lst_patients_hospitalisés t3 on t1.refid_eClinibase = t3.refid_eClinibase
where
t1.[unite_service_id] = 4
and t1.[unite_service_suiv_id] = 0
and t2.refid_eClinibase is null
and t3.refid_eClinibase is null
)
,lst_patients_autres as (
select distinct
t1.refid_eClinibase
from lst_patients t1
left join lst_patients_admisUIB_transferes t2 on t1.refid_eClinibase = t2.refid_eClinibase
left join lst_patients_hospitalisés t3 on t1.refid_eClinibase = t3.refid_eClinibase
left join lst_patients_admisUIB_nonTransferes t4 on t1.refid_eClinibase = t4.refid_eClinibase
where
t2.refid_eClinibase is null
and t3.refid_eClinibase is null
and t4.refid_eClinibase is null
)
As you can see, I have multi-level filtering going on here:
1st, I get the people where t1.[dthrfinmouvement] = '4000-01-01'
2nd, I get the people matching another criterion, EXCLUDING the 1st group
3rd, I get the people matching yet another criterion, EXCLUDING the 1st and the 2nd group
etc.
When I get to the 4th level, my query takes 6 - 10 seconds to complete.
Is there any way to speed this up?
This is the dataset I'm working with:
+------------------+-------------------------------+------------------+------------------+-----------------------+
| refid_eClinibase | nodossierpermanent_eClinibase | dthrfinmouvement | unite_service_id | unite_service_suiv_id |
+------------------+-------------------------------+------------------+------------------+-----------------------+
| 25611 | P0017379 | 2013-04-27 | 58 | 0 |
| 25611 | P0017379 | 2013-05-02 | 4 | 2 |
| 25611 | P0017379 | 2013-05-18 | 2 | 0 |
| 85886 | P0077918 | 2013-04-10 | 58 | 0 |
| 85886 | P0077918 | 2013-05-06 | 6 | 12 |
| 85886 | P0077918 | 4000-01-01 | 12 | 0 |
| 91312 | P0083352 | 2013-07-24 | 3 | 14 |
| 91312 | P0083352 | 2013-07-24 | 14 | 3 |
| 91312 | P0083352 | 2013-07-30 | 3 | 8 |
| 91312 | P0083352 | 4000-01-01 | 8 | 0 |
| 93835 | P0085879 | 2013-04-30 | 58 | 0 |
| 93835 | P0085879 | 2013-05-07 | 4 | 2 |
| 93835 | P0085879 | 2013-05-16 | 2 | 0 |
| 93835 | P0085879 | 2013-05-22 | 58 | 0 |
| 93835 | P0085879 | 2013-05-24 | 4 | 0 |
| 93835 | P0085879 | 2013-05-31 | 58 | 0 |
| 93836 | P0085880 | 2013-05-20 | 58 | 0 |
| 93836 | P0085880 | 2013-05-22 | 4 | 2 |
| 93836 | P0085880 | 2013-05-31 | 2 | 0 |
| 97509 | P0089576 | 2013-04-09 | 58 | 0 |
| 97509 | P0089576 | 2013-04-11 | 4 | 0 |
| 102787 | P0094886 | 2013-04-08 | 58 | 0 |
| 102787 | P0094886 | 2013-04-11 | 4 | 2 |
| 102787 | P0094886 | 2013-05-21 | 2 | 0 |
| 103029 | P0095128 | 2013-04-04 | 58 | 0 |
| 103029 | P0095128 | 2013-04-10 | 4 | 1 |
| 103029 | P0095128 | 2013-05-03 | 1 | 0 |
| 103813 | P0095922 | 2013-07-02 | 58 | 0 |
| 103813 | P0095922 | 2013-07-03 | 4 | 6 |
| 103813 | P0095922 | 2013-08-14 | 6 | 0 |
| 105106 | P0097215 | 2013-08-09 | 58 | 0 |
| 105106 | P0097215 | 2013-08-13 | 4 | 0 |
| 105106 | P0097215 | 2013-08-14 | 58 | 0 |
| 105106 | P0097215 | 4000-01-01 | 4 | 0 |
| 106223 | P0098332 | 2013-06-11 | 1 | 0 |
| 106223 | P0098332 | 2013-08-01 | 58 | 0 |
| 106223 | P0098332 | 4000-01-01 | 1 | 0 |
| 106245 | P0098354 | 2013-04-02 | 58 | 0 |
| 106245 | P0098354 | 2013-05-24 | 58 | 0 |
| 106245 | P0098354 | 2013-05-29 | 4 | 1 |
| 106245 | P0098354 | 2013-07-12 | 1 | 0 |
| 106280 | P0098389 | 2013-04-07 | 58 | 0 |
| 106280 | P0098389 | 2013-04-09 | 4 | 0 |
| 106416 | P0098525 | 2013-04-19 | 58 | 0 |
| 106416 | P0098525 | 2013-04-23 | 4 | 0 |
| 106444 | P0098553 | 2013-04-22 | 58 | 0 |
| 106444 | P0098553 | 2013-04-25 | 4 | 0 |
| 106609 | P0098718 | 2013-05-08 | 58 | 0 |
| 106609 | P0098718 | 2013-05-10 | 4 | 11 |
| 106609 | P0098718 | 2013-07-24 | 11 | 12 |
| 106609 | P0098718 | 4000-01-01 | 12 | 0 |
| 106616 | P0098725 | 2013-05-09 | 58 | 0 |
| 106616 | P0098725 | 2013-05-09 | 4 | 1 |
| 106616 | P0098725 | 2013-07-27 | 1 | 0 |
| 106698 | P0098807 | 2013-05-16 | 58 | 0 |
| 106698 | P0098807 | 2013-05-22 | 4 | 6 |
| 106698 | P0098807 | 2013-06-14 | 6 | 1 |
| 106698 | P0098807 | 2013-06-28 | 1 | 0 |
| 106714 | P0098823 | 2013-05-20 | 58 | 0 |
| 106714 | P0098823 | 2013-05-21 | 58 | 0 |
| 106714 | P0098823 | 2013-05-24 | 58 | 0 |
| 106729 | P0098838 | 2013-05-21 | 58 | 0 |
| 106729 | P0098838 | 2013-05-23 | 4 | 1 |
| 106729 | P0098838 | 2013-06-03 | 1 | 0 |
| 107038 | P0099147 | 2013-06-25 | 58 | 0 |
| 107038 | P0099147 | 2013-06-28 | 4 | 1 |
| 107038 | P0099147 | 2013-07-04 | 1 | 0 |
| 107038 | P0099147 | 2013-08-13 | 58 | 0 |
| 107038 | P0099147 | 2013-08-15 | 4 | 6 |
| 107038 | P0099147 | 4000-01-01 | 6 | 0 |
| 107082 | P0099191 | 2013-06-29 | 58 | 0 |
| 107082 | P0099191 | 2013-07-04 | 4 | 6 |
| 107082 | P0099191 | 2013-07-19 | 6 | 0 |
| 107157 | P0099267 | 4000-01-01 | 13 | 0 |
| 107336 | P0099446 | 4000-01-01 | 6 | 0 |
+------------------+-------------------------------+------------------+------------------+-----------------------+
thanks.
It is hard to understand exactly what all your rules are from the question, but the general approach should be to add a "Grouping" column to a single query that uses a CASE expression to categorize the people.
The conditions in a CASE are evaluated in order, so that if the first criterion is met, the subsequent criteria are not even evaluated for that row.
Here is some code to get you started:
select t1.refid_eClinibase
,t1.[dthrfinmouvement]
,t1.[unite_service_id]
,t1.[unite_service_suiv_id]
,CASE WHEN [dthrfinmouvement] = '4000-01-01' THEN 'Group1 Label'
WHEN condition2 = something THEN 'Group2 Label'
....
WHEN conditionN = something THEN 'GroupN Label'
ELSE 'Catch All Label'
END as person_category
from sometable t1
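The CASE above classifies each row, whereas the groups in the question are per person. One possible extension, sketched here under assumptions, aggregates a flag per person first and then applies the same ordered CASE. It runs through Python's sqlite3 module as a stand-in for the asker's database; the rows are a subset of the question's data, the conditions are one reading of the question's CTEs, and the category labels are hypothetical names borrowed from those CTEs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sometable (refid_eClinibase INTEGER, dthrfinmouvement TEXT, "
    "unite_service_id INTEGER, unite_service_suiv_id INTEGER)"
)
conn.executemany(
    "INSERT INTO sometable VALUES (?, ?, ?, ?)",
    [
        # Subset of the rows shown in the question
        (25611, "2013-05-02", 4, 2),
        (25611, "2013-05-18", 2, 0),
        (85886, "2013-05-06", 6, 12),
        (85886, "4000-01-01", 12, 0),
        (93835, "2013-05-07", 4, 2),
        (93835, "2013-05-24", 4, 0),
        (97509, "2013-04-09", 58, 0),
        (97509, "2013-04-11", 4, 0),
        (105106, "2013-08-13", 4, 0),
        (105106, "4000-01-01", 4, 0),
        (106714, "2013-05-20", 58, 0),
    ],
)

# One pass over the table: a flag per person for each rule, tested in priority order.
CATEGORY_SQL = """
SELECT refid_eClinibase,
       CASE
           WHEN MAX(CASE WHEN dthrfinmouvement = '4000-01-01'
                         THEN 1 ELSE 0 END) = 1 THEN 'hospitalises'
           WHEN MAX(CASE WHEN unite_service_id = 4 AND unite_service_suiv_id <> 0
                         THEN 1 ELSE 0 END) = 1 THEN 'admisUIB_transferes'
           WHEN MAX(CASE WHEN unite_service_id = 4 AND unite_service_suiv_id = 0
                         THEN 1 ELSE 0 END) = 1 THEN 'admisUIB_nonTransferes'
           ELSE 'autres'
       END AS person_category
FROM sometable
GROUP BY refid_eClinibase
ORDER BY refid_eClinibase
"""

categories = conn.execute(CATEGORY_SQL).fetchall()
for refid, category in categories:
    print(refid, category)
```

Since the first matching WHEN wins, a person who meets an earlier rule is never placed in a later group, which replaces the chain of exclusion joins with a single scan and GROUP BY.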