Cummulative sum by year_month, location, state in Hive SQL

Cummulative sum by year_month, location, state in Hive SQL - sql

I want to make a cumulative count using Hive SQL in recorrencia column according to the other ones.
+------------+---------+-------+--------------+--+
| t.ano_mes | t.site | t.uf | recorrencia |
+------------+---------+-------+--------------+--+
| 202001 | 174 | AM | 1 |
| 202002 | 174 | AM | 1 |
| 202003 | 174 | AM | 1 |
| 202004 | 174 | AM | 1 |
| 202005 | 174 | AM | 1 |
| 202006 | 174 | AM | 1 |
| 202007 | 174 | AM | 1 |
| 202008 | 174 | AM | 1 |
| 202005 | 1JN | SP | 1 |
| 202006 | 1JN | SP | 1 |
| 202005 | 1LJ | SP | 1 |
| 202009 | 1LJ | SP | 1 |
| 202001 | 1RG | SP | 1 |
| 202002 | 1RG | SP | 1 |
| 202003 | 1RG | SP | 1 |
| 202004 | 1RG | SP | 1 |
| 202005 | 1RG | SP | 1 |
| 202006 | 1RG | SP | 1 |
| 202007 | 1RG | SP | 1 |
Desired output
+------------+---------+-------+--------------+--------+
| t.ano_mes | t.site | t.uf | recorrencia |cum_rec
+------------+---------+-------+--------------+--------+
| 202001 | 174 | AM | 1 |1
| 202002 | 174 | AM | 1 |2
| 202003 | 174 | AM | 1 |3
| 202004 | 174 | AM | 1 |4
| 202005 | 174 | AM | 1 |5
| 202006 | 174 | AM | 1 |6
| 202007 | 174 | AM | 1 |7
| 202008 | 174 | AM | 1 |8
| 202005 | 1JN | SP | 1 |1
| 202006 | 1JN | SP | 1 |2
| 202005 | 1LJ | SP | 1 |1
| 202009 | 1LJ | SP | 1 |2
| 202001 | 1RG | SP | 1 |1
| 202002 | 1RG | SP | 1 |2
| 202003 | 1RG | SP | 1 |3
| 202004 | 1RG | SP | 1 |4
| 202005 | 1RG | SP | 1 |5
| 202006 | 1RG | SP | 1 |6
| 202007 | 1RG | SP | 1 |7
I've tried a lot of functions like COUNT(*) OVER (t.ano_mes) and COUNT(*) OVER (t.site) but it runs the sum until the end of table, and do not restarts as the t.site changes.
As soon as t.site changes, the counter should restart.

That would be:
sum(recorrencia) over(partition by t.site order by t.ano_mes) as cum_rec
The partition by clause causes the sum to reset every time the site changes.
Note that if recorrencia is always 1, as shown in your sample data, then row_number() is sufficient:
row_number() over(partition by t.site order by t.ano_mes) as cum_rec

Related

Oracle SQL get last year of dates excluding weekends

I'd expect this to work to get me a list of calendar dates over the past 12 months excluding weekends; but it just gives me the entire list of dates - which I suppose is fine - but want to know why the below is incorrect.
SELECT ADD_MONTHS(TRUNC(SYSDATE,'MM'),-12) - 1 + rownum AS CalendarDate
FROM all_objects
WHERE ADD_MONTHS(TRUNC(SYSDATE,'MM'),-12) - 1 + rownum <= sysdate
AND to_char(sysdate,'DY') NOT IN ('SAT','SUN')

Because you're doing this:
AND to_char(sysdate,'DY') NOT IN ('SAT','SUN')
And today isn't Saturday or Sunday. You need to look at the calculated CalendarDate value; but you can't do that in the same level of subquery. You could try to recalculate it:
AND to_char(ADD_MONTHS(TRUNC(SYSDATE,'MM'),-12) - 1 + rownum,'DY') NOT IN ('SAT','SUN')
but this will return no rows - at least when run at the moment. As it happens, March 1st 2020 was a Sunday, so that is excluded; and because of when and how rownum is generated, that result is excluded, and the next one sees the same value, which is excluded, and so on.
You can use an inline view to avoid both issues:
SELECT CalendarDate
FROM (
SELECT ADD_MONTHS(TRUNC(SYSDATE,'MM'),-12) - 1 + rownum AS CalendarDate
FROM all_objects
WHERE ADD_MONTHS(TRUNC(SYSDATE,'MM'),-12) - 1 + rownum <= sysdate
)
WHERE to_char(CalendarDate,'DY','NLS_DATE_LANGUAGE=ENGLISH') NOT IN ('SAT','SUN')
CALENDARDATE
02-MAR-20
03-MAR-20
04-MAR-20
05-MAR-20
06-MAR-20
09-MAR-20
10-MAR-20
...
db<>fiddle
I've chucked in a language modifier to stop it behaving differently for users with sessions not set to English.
Querying against all_objects isn't ideal though, it would be better to use a hierarcical query:
SELECT *
FROM (
SELECT ADD_MONTHS(TRUNC(SYSDATE,'MM'),-12) - 1 + level AS CalendarDate
FROM dual
CONNECT BY level <= TRUNC(SYSDATE) - ADD_MONTHS(TRUNC(SYSDATE,'MM'),-12) + 1
)
WHERE to_char(CalendarDate,'DY','NLS_DATE_LANGUAGE=ENGLISH') NOT IN ('SAT','SUN')
ORDER BY CalendarDate
db<>fiddle
or a recursive CTE, if you're 11gR2+:
WITH rcte (CalendarDate) AS (
SELECT ADD_MONTHS(TRUNC(SYSDATE,'MM'),-12)
FROM dual
UNION ALL
SELECT rcte.CalendarDate + interval '1' day
FROM rcte
WHERE rcte.CalendarDate < TRUNC(SYSDATE)
)
SELECT CalendarDate
FROM rcte
WHERE to_char(CalendarDate,'DY','NLS_DATE_LANGUAGE=ENGLISH') NOT IN ('SAT','SUN')
ORDER BY CalendarDate
db<>fiddle (as 18c to avoid a couple of issues with the patch level in the 11g version it uses).

You checking whether today is sunday or monday with to_char(sysdate,'DY'). you need to check CalendarDate which is not available in your window. You can use cte to calculate the calendar then you can remove weekends with your condition as below.
with cte (CalendarDate) as
(
SELECT ADD_MONTHS(TRUNC(SYSDATE,'MM'),-12) - 1 + rownum AS CalendarDate
FROM all_objects
WHERE ADD_MONTHS(TRUNC(SYSDATE,'MM'),-12) - 1 + rownum <= sysdate
)
select * from cte where
to_char(CalendarDate,'DY') not in ('SAT','SUN');
| CALENDARDATE |
| :----------- |
| 02-MAR-20 |
| 03-MAR-20 |
| 04-MAR-20 |
| 05-MAR-20 |
| 06-MAR-20 |
| 09-MAR-20 |
| 10-MAR-20 |
| 11-MAR-20 |
| 12-MAR-20 |
| 13-MAR-20 |
| 16-MAR-20 |
| 17-MAR-20 |
| 18-MAR-20 |
| 19-MAR-20 |
| 20-MAR-20 |
| 23-MAR-20 |
| 24-MAR-20 |
| 25-MAR-20 |
| 26-MAR-20 |
| 27-MAR-20 |
| 30-MAR-20 |
| 31-MAR-20 |
| 01-APR-20 |
| 02-APR-20 |
| 03-APR-20 |
| 06-APR-20 |
| 07-APR-20 |
| 08-APR-20 |
| 09-APR-20 |
| 10-APR-20 |
| 13-APR-20 |
| 14-APR-20 |
| 15-APR-20 |
| 16-APR-20 |
| 17-APR-20 |
| 20-APR-20 |
| 21-APR-20 |
| 22-APR-20 |
| 23-APR-20 |
| 24-APR-20 |
| 27-APR-20 |
| 28-APR-20 |
| 29-APR-20 |
| 30-APR-20 |
| 01-MAY-20 |
| 04-MAY-20 |
| 05-MAY-20 |
| 06-MAY-20 |
| 07-MAY-20 |
| 08-MAY-20 |
| 11-MAY-20 |
| 12-MAY-20 |
| 13-MAY-20 |
| 14-MAY-20 |
| 15-MAY-20 |
| 18-MAY-20 |
| 19-MAY-20 |
| 20-MAY-20 |
| 21-MAY-20 |
| 22-MAY-20 |
| 25-MAY-20 |
| 26-MAY-20 |
| 27-MAY-20 |
| 28-MAY-20 |
| 29-MAY-20 |
| 01-JUN-20 |
| 02-JUN-20 |
| 03-JUN-20 |
| 04-JUN-20 |
| 05-JUN-20 |
| 08-JUN-20 |
| 09-JUN-20 |
| 10-JUN-20 |
| 11-JUN-20 |
| 12-JUN-20 |
| 15-JUN-20 |
| 16-JUN-20 |
| 17-JUN-20 |
| 18-JUN-20 |
| 19-JUN-20 |
| 22-JUN-20 |
| 23-JUN-20 |
| 24-JUN-20 |
| 25-JUN-20 |
| 26-JUN-20 |
| 29-JUN-20 |
| 30-JUN-20 |
| 01-JUL-20 |
| 02-JUL-20 |
| 03-JUL-20 |
| 06-JUL-20 |
| 07-JUL-20 |
| 08-JUL-20 |
| 09-JUL-20 |
| 10-JUL-20 |
| 13-JUL-20 |
| 14-JUL-20 |
| 15-JUL-20 |
| 16-JUL-20 |
| 17-JUL-20 |
| 20-JUL-20 |
| 21-JUL-20 |
| 22-JUL-20 |
| 23-JUL-20 |
| 24-JUL-20 |
| 27-JUL-20 |
| 28-JUL-20 |
| 29-JUL-20 |
| 30-JUL-20 |
| 31-JUL-20 |
| 03-AUG-20 |
| 04-AUG-20 |
| 05-AUG-20 |
| 06-AUG-20 |
| 07-AUG-20 |
| 10-AUG-20 |
| 11-AUG-20 |
| 12-AUG-20 |
| 13-AUG-20 |
| 14-AUG-20 |
| 17-AUG-20 |
| 18-AUG-20 |
| 19-AUG-20 |
| 20-AUG-20 |
| 21-AUG-20 |
| 24-AUG-20 |
| 25-AUG-20 |
| 26-AUG-20 |
| 27-AUG-20 |
| 28-AUG-20 |
| 31-AUG-20 |
| 01-SEP-20 |
| 02-SEP-20 |
| 03-SEP-20 |
| 04-SEP-20 |
| 07-SEP-20 |
| 08-SEP-20 |
| 09-SEP-20 |
| 10-SEP-20 |
| 11-SEP-20 |
| 14-SEP-20 |
| 15-SEP-20 |
| 16-SEP-20 |
| 17-SEP-20 |
| 18-SEP-20 |
| 21-SEP-20 |
| 22-SEP-20 |
| 23-SEP-20 |
| 24-SEP-20 |
| 25-SEP-20 |
| 28-SEP-20 |
| 29-SEP-20 |
| 30-SEP-20 |
| 01-OCT-20 |
| 02-OCT-20 |
| 05-OCT-20 |
| 06-OCT-20 |
| 07-OCT-20 |
| 08-OCT-20 |
| 09-OCT-20 |
| 12-OCT-20 |
| 13-OCT-20 |
| 14-OCT-20 |
| 15-OCT-20 |
| 16-OCT-20 |
| 19-OCT-20 |
| 20-OCT-20 |
| 21-OCT-20 |
| 22-OCT-20 |
| 23-OCT-20 |
| 26-OCT-20 |
| 27-OCT-20 |
| 28-OCT-20 |
| 29-OCT-20 |
| 30-OCT-20 |
| 02-NOV-20 |
| 03-NOV-20 |
| 04-NOV-20 |
| 05-NOV-20 |
| 06-NOV-20 |
| 09-NOV-20 |
| 10-NOV-20 |
| 11-NOV-20 |
| 12-NOV-20 |
| 13-NOV-20 |
| 16-NOV-20 |
| 17-NOV-20 |
| 18-NOV-20 |
| 19-NOV-20 |
| 20-NOV-20 |
| 23-NOV-20 |
| 24-NOV-20 |
| 25-NOV-20 |
| 26-NOV-20 |
| 27-NOV-20 |
| 30-NOV-20 |
| 01-DEC-20 |
| 02-DEC-20 |
| 03-DEC-20 |
| 04-DEC-20 |
| 07-DEC-20 |
| 08-DEC-20 |
| 09-DEC-20 |
| 10-DEC-20 |
| 11-DEC-20 |
| 14-DEC-20 |
| 15-DEC-20 |
| 16-DEC-20 |
| 17-DEC-20 |
| 18-DEC-20 |
| 21-DEC-20 |
| 22-DEC-20 |
| 23-DEC-20 |
| 24-DEC-20 |
| 25-DEC-20 |
| 28-DEC-20 |
| 29-DEC-20 |
| 30-DEC-20 |
| 31-DEC-20 |
| 01-JAN-21 |
| 04-JAN-21 |
| 05-JAN-21 |
| 06-JAN-21 |
| 07-JAN-21 |
| 08-JAN-21 |
| 11-JAN-21 |
| 12-JAN-21 |
| 13-JAN-21 |
| 14-JAN-21 |
| 15-JAN-21 |
| 18-JAN-21 |
| 19-JAN-21 |
| 20-JAN-21 |
| 21-JAN-21 |
| 22-JAN-21 |
| 25-JAN-21 |
| 26-JAN-21 |
| 27-JAN-21 |
| 28-JAN-21 |
| 29-JAN-21 |
| 01-FEB-21 |
| 02-FEB-21 |
| 03-FEB-21 |
| 04-FEB-21 |
| 05-FEB-21 |
| 08-FEB-21 |
| 09-FEB-21 |
| 10-FEB-21 |
| 11-FEB-21 |
| 12-FEB-21 |
| 15-FEB-21 |
| 16-FEB-21 |
| 17-FEB-21 |
| 18-FEB-21 |
| 19-FEB-21 |
| 22-FEB-21 |
| 23-FEB-21 |
| 24-FEB-21 |
| 25-FEB-21 |
| 26-FEB-21 |
| 01-MAR-21 |
| 02-MAR-21 |
| 03-MAR-21 |
| 04-MAR-21 |
| 05-MAR-21 |
| 08-MAR-21 |
| 09-MAR-21 |
db<>fiddle here

Stored Procedure IF Exist Update Else Insert, if match

I got a lot of help form different articles, for which i thank you a lot.
But, right now i have a case were i need your support namely related with an SQL procedure which not only is checking if data exists (for update or insert it, but i have to match it with other 2 tables to check if data is matching then insert the same row with different values for some columns on a separate table).
I hope to be more explicit, so the example below I hope to help
Main row data : db.Table1
|rowID|PurchaseDate|ProducID | ProductName | CustomerID | Qty | UnitType|
|row1 |09.09.2018 |206 | Prod1 | 1 | 10 | bl. |
|row2 |09.09.2018 |207 | Prod2 | 2 | 15 | bl. |
|row3 |12.09.2018 |203 | Prod5 | 5 | 5 | lk. |
|row4 |15.09.2018 |207 | Prod2 | 6 | 10 | lk. |
|row5 |20.09.2018 |207 | Prod2 | 8 | 3 | Pk. |
|row6 |20.09.2018 |203 | Prod5 | 8 | 6 | Pk. |
|row7 |20.09.2018 |205 | Prod0 | 2 | 5 | J. |
to match with: db.Table2
|CustomerID| CustomerName|
|1 | Customer1 |
|2 | Customer2 |
|3 | Customer3 |
|4 | Customer4 |
|5 | Customer5 |
to match with: db.Table3
|ProducID| ProductName| SubProdNAME| |
|205 | Prod0 | Prod101 |
|205 | Prod0 | Prod202 |
|204 | Prod01 | Prod1001 |
|204 | Prod01 | Prod2002 |
to final table: db.TableFIN
|rowID| PurchaseDate|ProducID|ProductName|CustomerID|Qty|UnitType |Stage|
|row1 | 09.09.2018 | 206 | Prod1 | 1 |10 | bl. | DONE|
|row1 | 09.09.2018 | 206 | Prod1 | 1 |10 | bl. | NONE|
|row2 | 09.09.2018 | 207 | Prod2 | 2 |15 | bl. | DONE|
|row2 | 09.09.2018 | 207 | Prod2 | 2 |15 | bl. | NONE|
|row3 | 12.09.2018 | 203 | Prod5 | 5 |5 | lk. | DONE|
|row3 | 12.09.2018 | 203 | Prod5 | 5 |5 | lk. | NONE|
|row4 | 15.09.2018 | 207 | Prod2 | 6 |10 | lk. | DONE|
|row4 | 15.09.2018 | 207 | Prod2 | 6 |0 | lk. | NONE|
|row5 | 20.09.2018 | 207 | Prod2 | 8 |3 | Pk. | DONE|
|row5 | 20.09.2018 | 207 | Prod2 | 8 |0 | Pk. | NONE|
|row6 | 20.09.2018 | 203 | Prod5 | 8 |6 | Pk. | DONE|
|row6 | 20.09.2018 | 203 | Prod5 | 8 |0 | Pk. | NONE|
|row7 | 20.09.2018 | 205 | Prod101 | 3 |5 | bundle| DONE|
|row7 | 20.09.2018 | 205 | Prod101 | 3 |5 | bundle| NONE|
|row7 | 20.09.2018 | 205 | Prod202 | 3 |5 | bundle| DONE|
|row7 | 20.09.2018 | 205 | Prod202 | 3 |5 | bundle| NONE|
So, basically what i need is to insert data by row depending on Stage, one row with stage DONE and second with NONE - plus, in case the consumerID it matches then Qty value it's equal in both cases, otherwise for NONE value = 0 and for DONE the original value.
FOR ProducID if it matches the product, then we have to insert 4 rows. as on above table. Again matching consumerID & Prod updating/inserting stage/values.
YOur support, is highly appreciated.
Thank you in advance!

How to conditionally count rows from another table WITHOUT USING A CORRELATED SUBQUERY?

I have a dataset for which I have to conditionally count rows from table B that are between two dates in table A. I have to do this without the use of a correlated subquery in the SELECT clause, as this is not supported in Netezza - docs: https://www.ibm.com/support/knowledgecenter/en/SSULQD_7.0.3/com.ibm.nz.dbu.doc/c_dbuser_correlated_subqueries_ntz_sql.html.
Background on tables: Users can log in to a site (logins). When they log in, they can take actions, which are in (actions_taken). The desired output is a count of rows that are between the actions_taken action_date and lag_action_date.
Data and attempt found here: http://rextester.com/NLDH13254
Table: actions_taken (with added calculations - see RexTester.)
| user_id | action_type | action_date | lag_action_date | elapsed_days |
|---------|---------------|-------------|-----------------|--------------|
| 12345 | action_type_1 | 6/27/2017 | 3/3/2017 | 116 |
| 12345 | action_type_1 | 3/3/2017 | 2/28/2017 | 3 |
| 12345 | action_type_1 | 2/28/2017 | NULL | NULL |
| 12345 | action_type_2 | 3/6/2017 | 3/3/2017 | 3 |
| 12345 | action_type_2 | 3/3/2017 | 3/25/2016 | 343 |
| 12345 | action_type_2 | 3/25/2016 | NULL | NULL |
| 12345 | action_type_4 | 3/6/2017 | 3/3/2017 | 3 |
| 12345 | action_type_4 | 3/3/2017 | NULL | NULL |
| 99887 | action_type_1 | 4/1/2017 | 2/11/2017 | 49 |
| 99887 | action_type_1 | 2/11/2017 | 1/28/2017 | 14 |
| 99887 | action_type_1 | 1/28/2017 | NULL | NULL |
Table: logins
| user_id | login_date |
|---------|------------|
| 12345 | 6/27/2017 |
| 12345 | 6/26/2017 |
| 12345 | 3/7/2017 |
| 12345 | 3/6/2017 |
| 12345 | 3/3/2017 |
| 12345 | 3/2/2017 |
| 12345 | 3/1/2017 |
| 12345 | 2/28/2017 |
| 12345 | 2/27/2017 |
| 12345 | 2/25/2017 |
| 12345 | 3/25/2016 |
| 12345 | 3/23/2016 |
| 12345 | 3/20/2016 |
| 99887 | 6/27/2017 |
| 99887 | 6/26/2017 |
| 99887 | 6/24/2017 |
| 99887 | 4/2/2017 |
| 99887 | 4/1/2017 |
| 99887 | 3/30/2017 |
| 99887 | 3/8/2017 |
| 99887 | 3/6/2017 |
| 99887 | 3/3/2017 |
| 99887 | 3/2/2017 |
| 99887 | 2/28/2017 |
| 99887 | 2/11/2017 |
| 99887 | 1/28/2017 |
| 99887 | 1/26/2017 |
| 99887 | 5/28/2016 |
DESIRED OUTPUT: cnt_logins_between_action_dates field
| user_id | action_type | action_date | lag_action_date | elapsed_days | cnt_logins_between_action_dates |
|---------|---------------|-------------|-----------------|--------------|---------------------------------|
| 12345 | action_type_1 | 6/27/2017 | 3/3/2017 | 116 | 5 |
| 12345 | action_type_1 | 3/3/2017 | 2/28/2017 | 3 | 4 |
| 12345 | action_type_1 | 2/28/2017 | NULL | NULL | 1 |
| 12345 | action_type_2 | 3/6/2017 | 3/3/2017 | 3 | 2 |
| 12345 | action_type_2 | 3/3/2017 | 3/25/2016 | 343 | 7 |
| 12345 | action_type_2 | 3/25/2016 | NULL | NULL | 1 |
| 12345 | action_type_4 | 3/6/2017 | 3/3/2017 | 3 | 2 |
| 12345 | action_type_4 | 3/3/2017 | NULL | NULL | 1 |
| 99887 | action_type_1 | 4/1/2017 | 2/11/2017 | 49 | 8 |
| 99887 | action_type_1 | 2/11/2017 | 1/28/2017 | 14 | 2 |
| 99887 | action_type_1 | 1/28/2017 | NULL | NULL | 1 |

You don't need a correlated sub-query. Get the previous date using lag and join the logins table to count the actions between dates.
with prev_dates as (select at.*
,coalesce(lag(action_date) over(partition by user_id,action_type order by action_date)
,action_date) as lag_action_date
from actions_taken at
)
select at.user_id,at.action_type,at.action_date,at.lag_action_date
,at.action_date-at.lag_action_date as elapsed_days
,count(*) as cnt
from prev_dates at
join login l on l.user_id=at.user_id and l.login_date<=at.action_date and l.login_date>=at.lag_action_date
group by at.user_id,at.action_type,at.action_date,at.lag_action_date
order by 1,2,3

How do i can convert Column Values Into Row?

Here I'm having table 'TABLE_1' as
|ID | Name | ACID1 | ACVALUE1 | ACID2 | ACVALUE2 | ACID3 | ACVALUE3 |
|----------------------------------------------------------------------
| 1 |ABC | 10 | 82.50 | 20 | 175.95 | 40 | 125.75 |
| 2 |IJK | 30 | 120.55 | 20 | 68.30 | 50 | 25.45 |
| 3 |LMN | 40 | 62.50 | 10 | 87.25 | 30 | 40.50 |
----------------------------------------------------------------------
Another table is AC_TABLE whose ID is recored in above table as ACID1,ACID2,...
___________________
|ID | Name |
|-------------------
|10 | AC1 |
|20 | AC2 |
|30 | AC3 |
|40 | AC4 |
|50 | AC5 |
-------------------
Now what all i want the result in following format
_____________________________
ID | Name | ACName | ACVALUE
------------------------------
1 | ABC | AC1 | 82.50
1 | ABC | AC2 | 175.95
1 | ABC | AC4 | 125.75
2 | IJK | AC3 | 120.55
2 | IJK | AC2 | 68.30
2 | IJK | AC5 | 25.45
3 | LMN | AC4 | 62.50
3 | LMN | AC1 | 87.25
3 | LMN | AC3 | 40.50
-------------------------------
Please help me to get the desired result.

Use Cross Apply to unpivot multiple columns.
First unpivot the Table_1 using cross apply then join the result with the ac_table table. Try this.
SELECT a.id,
a.name,
b.name AS ACName,
cs.ACValue
FROM Table_1 a
CROSS apply (VALUES (ACVALUE1,ACID1),
(ACVALUE2,ACID2),
(ACVALUE3,ACID3))cs(acvalue, ac)
JOIN ac_table b
ON cs.ac = b.id
SQLFIDDLE DEMO

SQL Performance multiple exclusion from the same table

I have a table where I have a list of people, lets say i have 100 people listed in that table
I need to filter out the people using different criteria's and put them in groups, problem is when i start excluding on the 4th-5th level, performance issues come up and it becomes slow
with lst_tous_movements as (
select
t1.refid_eClinibase
t1.[dthrfinmouvement]
t1.[unite_service_id]
t1.[unite_service_suiv_id]
from sometable t1
)
,lst_patients_hospitalisés as (
select distinct
t1.refid_eClinibase
from lst_tous_movements t1
where
t1.[dthrfinmouvement] = '4000-01-01'
)
,lst_patients_admisUIB_transferes as (
select distinct
t1.refid_eClinibase
from lst_tous_movements t1
left join lst_patients_hospitalisés t2 on t1.refid_eClinibase = t2.refid_eClinibase
where
t1.[unite_service_id] = 4
and t1.[unite_service_suiv_id] <> 0
and t2.refid_eClinibase is null
)
,lst_patients_admisUIB_nonTransferes as (
select distinct
t1.refid_eClinibase
from lst_tous_movements t1
left join lst_patients_admisUIB_transferes t2 on t1.refid_eClinibase = t2.refid_eClinibase
left join lst_patients_hospitalisés t3 on t1.refid_eClinibase = t3.refid_eClinibase
where
t1.[unite_service_id] = 4
and t1.[unite_service_suiv_id] = 0
and t2.refid_eClinibase is null
and t3.refid_eClinibase is null
)
,lst_patients_autres as (
select distinct
t1.refid_eClinibase
from lst_patients t1
left join lst_patients_admisUIB_transferes t2 on t1.refid_eClinibase = t2.refid_eClinibase
left join lst_patients_hospitalisés t3 on t1.refid_eClinibase = t3.refid_eClinibase
left join lst_patients_admisUIB_nonTransferes t4 on t1.refid_eClinibase = t4.refid_eClinibase
where
t2.refid_eClinibase is null
and t3.refid_eClinibase is null
and t4.refid_eClinibase is null
)
as you can see i have a multi level filtering out going on here...
1st i get the people where t1.[dthrfinmouvement] = '4000-01-01'
2nd i get the people with another criteria EXCLUDING the 1st group
3rd i get the people with yet another criteria EXCLUDING the 1st and
the 2nd group
etc..
when i get to the 4th level, my query takes 6 - 10 seconds to complete
is there any way to speed this up ?
this is my dataset i'm working with:
+------------------+-------------------------------+------------------+------------------+-----------------------+
| refid_eClinibase | nodossierpermanent_eClinibase | dthrfinmouvement | unite_service_id | unite_service_suiv_id |
+------------------+-------------------------------+------------------+------------------+-----------------------+
| 25611 | P0017379 | 2013-04-27 | 58 | 0 |
| 25611 | P0017379 | 2013-05-02 | 4 | 2 |
| 25611 | P0017379 | 2013-05-18 | 2 | 0 |
| 85886 | P0077918 | 2013-04-10 | 58 | 0 |
| 85886 | P0077918 | 2013-05-06 | 6 | 12 |
| 85886 | P0077918 | 4000-01-01 | 12 | 0 |
| 91312 | P0083352 | 2013-07-24 | 3 | 14 |
| 91312 | P0083352 | 2013-07-24 | 14 | 3 |
| 91312 | P0083352 | 2013-07-30 | 3 | 8 |
| 91312 | P0083352 | 4000-01-01 | 8 | 0 |
| 93835 | P0085879 | 2013-04-30 | 58 | 0 |
| 93835 | P0085879 | 2013-05-07 | 4 | 2 |
| 93835 | P0085879 | 2013-05-16 | 2 | 0 |
| 93835 | P0085879 | 2013-05-22 | 58 | 0 |
| 93835 | P0085879 | 2013-05-24 | 4 | 0 |
| 93835 | P0085879 | 2013-05-31 | 58 | 0 |
| 93836 | P0085880 | 2013-05-20 | 58 | 0 |
| 93836 | P0085880 | 2013-05-22 | 4 | 2 |
| 93836 | P0085880 | 2013-05-31 | 2 | 0 |
| 97509 | P0089576 | 2013-04-09 | 58 | 0 |
| 97509 | P0089576 | 2013-04-11 | 4 | 0 |
| 102787 | P0094886 | 2013-04-08 | 58 | 0 |
| 102787 | P0094886 | 2013-04-11 | 4 | 2 |
| 102787 | P0094886 | 2013-05-21 | 2 | 0 |
| 103029 | P0095128 | 2013-04-04 | 58 | 0 |
| 103029 | P0095128 | 2013-04-10 | 4 | 1 |
| 103029 | P0095128 | 2013-05-03 | 1 | 0 |
| 103813 | P0095922 | 2013-07-02 | 58 | 0 |
| 103813 | P0095922 | 2013-07-03 | 4 | 6 |
| 103813 | P0095922 | 2013-08-14 | 6 | 0 |
| 105106 | P0097215 | 2013-08-09 | 58 | 0 |
| 105106 | P0097215 | 2013-08-13 | 4 | 0 |
| 105106 | P0097215 | 2013-08-14 | 58 | 0 |
| 105106 | P0097215 | 4000-01-01 | 4 | 0 |
| 106223 | P0098332 | 2013-06-11 | 1 | 0 |
| 106223 | P0098332 | 2013-08-01 | 58 | 0 |
| 106223 | P0098332 | 4000-01-01 | 1 | 0 |
| 106245 | P0098354 | 2013-04-02 | 58 | 0 |
| 106245 | P0098354 | 2013-05-24 | 58 | 0 |
| 106245 | P0098354 | 2013-05-29 | 4 | 1 |
| 106245 | P0098354 | 2013-07-12 | 1 | 0 |
| 106280 | P0098389 | 2013-04-07 | 58 | 0 |
| 106280 | P0098389 | 2013-04-09 | 4 | 0 |
| 106416 | P0098525 | 2013-04-19 | 58 | 0 |
| 106416 | P0098525 | 2013-04-23 | 4 | 0 |
| 106444 | P0098553 | 2013-04-22 | 58 | 0 |
| 106444 | P0098553 | 2013-04-25 | 4 | 0 |
| 106609 | P0098718 | 2013-05-08 | 58 | 0 |
| 106609 | P0098718 | 2013-05-10 | 4 | 11 |
| 106609 | P0098718 | 2013-07-24 | 11 | 12 |
| 106609 | P0098718 | 4000-01-01 | 12 | 0 |
| 106616 | P0098725 | 2013-05-09 | 58 | 0 |
| 106616 | P0098725 | 2013-05-09 | 4 | 1 |
| 106616 | P0098725 | 2013-07-27 | 1 | 0 |
| 106698 | P0098807 | 2013-05-16 | 58 | 0 |
| 106698 | P0098807 | 2013-05-22 | 4 | 6 |
| 106698 | P0098807 | 2013-06-14 | 6 | 1 |
| 106698 | P0098807 | 2013-06-28 | 1 | 0 |
| 106714 | P0098823 | 2013-05-20 | 58 | 0 |
| 106714 | P0098823 | 2013-05-21 | 58 | 0 |
| 106714 | P0098823 | 2013-05-24 | 58 | 0 |
| 106729 | P0098838 | 2013-05-21 | 58 | 0 |
| 106729 | P0098838 | 2013-05-23 | 4 | 1 |
| 106729 | P0098838 | 2013-06-03 | 1 | 0 |
| 107038 | P0099147 | 2013-06-25 | 58 | 0 |
| 107038 | P0099147 | 2013-06-28 | 4 | 1 |
| 107038 | P0099147 | 2013-07-04 | 1 | 0 |
| 107038 | P0099147 | 2013-08-13 | 58 | 0 |
| 107038 | P0099147 | 2013-08-15 | 4 | 6 |
| 107038 | P0099147 | 4000-01-01 | 6 | 0 |
| 107082 | P0099191 | 2013-06-29 | 58 | 0 |
| 107082 | P0099191 | 2013-07-04 | 4 | 6 |
| 107082 | P0099191 | 2013-07-19 | 6 | 0 |
| 107157 | P0099267 | 4000-01-01 | 13 | 0 |
| 107336 | P0099446 | 4000-01-01 | 6 | 0 |
+------------------+-------------------------------+------------------+------------------+-----------------------+
thanks.

It is hard to understand exactly what all your rules are from the question, but the general approach should be to add a "Grouping" column to a singl query that uses a CASE statement to categorize the people.
The conditions in a CASE are evaluated in order, so that if the first criteria is met, then the subsequent criteria are not even evaluated for that row.
Here is some code to get you started....
select t1.refid_eClinibase
,t1.[dthrfinmouvement]
,t1.[unite_service_id]
,t1.[unite_service_suiv_id]
CASE WHEN [dthrfinmouvement] = '4000-01-01' THEN 'Group1 Label'
WHEN condition2 = something THEN 'Group2 Label'
....
WHEN conditionN = something THEN 'GroupN Label'
ELSE 'Catch All Label'
END as person_category
from sometable t1

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Cummulative sum by year_month, location, state in Hive SQL - sql

Related

Oracle SQL get last year of dates excluding weekends

Stored Procedure IF Exist Update Else Insert, if match

How to conditionally count rows from another table WITHOUT USING A CORRELATED SUBQUERY?

How do i can convert Column Values Into Row?

SQL Performance multiple exclusion from the same table

Categories

Resources