How can one assign a rank that increases for each later group of values previously encountered, rather than reusing the same value as rank() and dense_rank() do? - sql

date        id         b  bc    x
2017-06-01  a35b3y26f  3  0.19  1
2017-06-02  a35b3y26f  3  0.19  1
2017-06-03  a35b3y26f  3  0.23  2
2017-06-04  a35b3y26f  3  0.12  3
2017-06-05  a35b3y26f  3  0.21  4
2017-06-06  a35b3y26f  3  0.19  5
2017-06-07  a35b3y26f  3  0.28  6
2017-06-08  a35b3y26f  3  0     7
2017-06-09  a35b3y26f  3  0     7
2017-06-10  a35b3y26f  3  0.15  8
2017-06-11  a35b3y26f  3  0.3   9
2017-06-12  a35b3y26f  3  0.17  10
2017-06-13  a35b3y26f  3  0.27  11
2017-06-14  a35b3y26f  3  0.28  12
2017-06-15  a35b3y26f  3  0.18  13
2017-06-16  a35b3y26f  3  0     14
2017-06-17  a35b3y26f  3  0.2   15
2017-06-18  a35b3y26f  3  0     16
2017-06-19  a35b3y26f  3  0.28  17
2017-06-20  a35b3y26f  3  0.25  18
2017-06-21  a35b3y26f  3  0.19  19
2017-06-22  a35b3y26f  3  0.23  20
2017-06-23  a35b3y26f  3  0     21
2017-06-24  a35b3y26f  3  0     21
2017-06-25  a35b3y26f  3  0.13  22
Above, column x represents the values that I wish to have output in the result set.
Is there a way using the existing windowing functions provided by PostgreSQL that I can obtain this outcome?

One way is to use sum and lag functions:
SELECT "date", "id", "b", "bc", "x",
SUM( xxxxx ) OVER (order by "date") As X
FROM (
SELECT *,
CASE "bc"
WHEN lag( "bc" ) over (order by "date")
THEN 0 ELSE 1
END as xxxxx
FROM table1
) x
Demo: http://sqlfiddle.com/#!17/8dab6/4
| date | id | b | bc | x | x |
|----------------------|-----------|---|------|----|----|
| 2017-06-01T00:00:00Z | a35b3y26f | 3 | 0.19 | 1 | 1 |
| 2017-06-02T00:00:00Z | a35b3y26f | 3 | 0.19 | 1 | 1 |
| 2017-06-03T00:00:00Z | a35b3y26f | 3 | 0.23 | 2 | 2 |
| 2017-06-04T00:00:00Z | a35b3y26f | 3 | 0.12 | 3 | 3 |
| 2017-06-05T00:00:00Z | a35b3y26f | 3 | 0.21 | 4 | 4 |
| 2017-06-06T00:00:00Z | a35b3y26f | 3 | 0.19 | 5 | 5 |
| 2017-06-07T00:00:00Z | a35b3y26f | 3 | 0.28 | 6 | 6 |
| 2017-06-08T00:00:00Z | a35b3y26f | 3 | 0 | 7 | 7 |
| 2017-06-09T00:00:00Z | a35b3y26f | 3 | 0 | 7 | 7 |
| 2017-06-10T00:00:00Z | a35b3y26f | 3 | 0.15 | 8 | 8 |
| 2017-06-11T00:00:00Z | a35b3y26f | 3 | 0.3 | 9 | 9 |
| 2017-06-12T00:00:00Z | a35b3y26f | 3 | 0.17 | 10 | 10 |
| 2017-06-13T00:00:00Z | a35b3y26f | 3 | 0.27 | 11 | 11 |
| 2017-06-14T00:00:00Z | a35b3y26f | 3 | 0.28 | 12 | 12 |
| 2017-06-15T00:00:00Z | a35b3y26f | 3 | 0.18 | 13 | 13 |
| 2017-06-16T00:00:00Z | a35b3y26f | 3 | 0 | 14 | 14 |
| 2017-06-17T00:00:00Z | a35b3y26f | 3 | 0.2 | 15 | 15 |
| 2017-06-18T00:00:00Z | a35b3y26f | 3 | 0 | 16 | 16 |
| 2017-06-19T00:00:00Z | a35b3y26f | 3 | 0.28 | 17 | 17 |
| 2017-06-20T00:00:00Z | a35b3y26f | 3 | 0.25 | 18 | 18 |
| 2017-06-21T00:00:00Z | a35b3y26f | 3 | 0.19 | 19 | 19 |
| 2017-06-22T00:00:00Z | a35b3y26f | 3 | 0.23 | 20 | 20 |
| 2017-06-23T00:00:00Z | a35b3y26f | 3 | 0 | 21 | 21 |
| 2017-06-24T00:00:00Z | a35b3y26f | 3 | 0 | 21 | 21 |
| 2017-06-25T00:00:00Z | a35b3y26f | 3 | 0.13 | 22 | 22 |
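A slightly more compact variant of the same idea casts the "value changed" test straight to an integer; the following is only a sketch against the same table1. Note that, unlike the CASE version, IS DISTINCT FROM treats two consecutive NULL values of bc as belonging to the same group:
SELECT "date", "id", "b", "bc",
       SUM(changed) OVER (ORDER BY "date") AS x
FROM (
    SELECT *,
           ("bc" IS DISTINCT FROM lag("bc") OVER (ORDER BY "date"))::int AS changed
    FROM table1
) s;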

Related

How to drop duplicate rows from a PostgreSQL table

date | window | points | actual_bool | previous_bool | creation_time | source
------------+---------+---------+---------------------+---------------------------------+----------------------------+--------
2021-02-11 | 110 | 0.6 | 0 | 0 | 2021-02-14 09:20:57.51966 | bldgh
2021-02-11 | 150 | 0.7 | 1 | 0 | 2021-02-14 09:20:57.51966 | fiata
2021-02-11 | 110 | 0.7 | 1 | 0 | 2021-02-14 09:20:57.51966 | nfiws
2021-02-11 | 150 | 0.7 | 1 | 0 | 2021-02-14 09:20:57.51966 | fiata
2021-02-11 | 110 | 0.6 | 0 | 0 | 2021-02-14 09:20:57.51966 | bldgh
2021-02-11 | 110 | 0.3 | 0 | 1 | 2021-02-14 09:22:22.969014 | asdg1
2021-02-11 | 110 | 0.6 | 0 | 0 | 2021-02-14 09:22:22.969014 | j
2021-02-11 | 110 | 0.3 | 0 | 1 | 2021-02-14 09:22:22.969014 | aba
2021-02-11 | 110 | 0.5 | 0 | 1 | 2021-02-14 09:22:22.969014 | fg
2021-02-11 | 110 | 0.6 | 1 | 0 | 2021-02-14 09:22:22.969014 | wdda
2021-02-11 | 110 | 0.7 | 1 | 1 | 2021-02-14 09:23:21.977685 | dda
2021-02-11 | 110 | 0.5 | 1 | 0 | 2021-02-14 09:23:21.977685 | dd
2021-02-11 | 110 | 0.6 | 1 | 1 | 2021-02-14 09:23:21.977685 | so
2021-02-11 | 110 | 0.5 | 1 | 1 | 2021-02-14 09:23:21.977685 | dar
2021-02-11 | 110 | 0.6 | 1 | 1 | 2021-02-14 09:23:21.977685 | firr
2021-02-11 | 110 | 0.8 | 1 | 1 | 2021-02-14 09:24:15.831411 | xim
2021-02-11 | 110 | 0.8 | 1 | 1 | 2021-02-14 09:24:15.831411 | cxyy
2021-02-11 | 110 | 0.3 | 0 | 1 | 2021-02-14 09:24:15.831411 | bisd
2021-02-11 | 110 | 0.1 | 0 | 1 | 2021-02-14 09:24:15.831411 | cope
2021-02-11 | 110 | 0.2 | 0 | 1 | 2021-02-14 09:24:15.831411 | sand
...
I have the following dataset in a postgresql table called testtable in testdb.
I have accidentally copied over the database and duplicated rows.
How can I delete the duplicates?
Row 1 and row 5 are copies in this frame and row 2 and row 4 are copies too.
I have never used SQL before to drop duplicates, so I have no idea where to start.
I tried
select creation_time, count(creation_time) from classification group by creation_time having count (creation_time)>1 order by source;
But all it did was show me how many duplicates I had for each creation_time, like this:
creation_time | count
----------------------------+-------
2021-02-14 09:20:57.51966 | 10
2021-02-14 09:22:22.969014 | 10
2021-02-14 09:23:21.977685 | 10
2021-02-14 09:24:15.831411 | 10
2021-02-14 09:24:27.733763 | 10
2021-02-14 09:24:38.41793 | 10
2021-02-14 09:27:04.432466 | 10
2021-02-14 09:27:21.62256 | 10
2021-02-14 09:27:22.677763 | 10
2021-02-14 09:27:37.996054 | 10
2021-02-14 09:28:09.275041 | 10
2021-02-14 09:28:22.649391 | 10
...
There should only be 5 unique records for each creation_time.
It doesn't show me the duplicates themselves, and even if it did, I would have no idea how to drop them.
That is a lot of rows to delete. I would suggest just recreating the table:
create table new_classification as
select distinct c.*
from classification c;
After you have validated the data, you can reload it if you really want:
truncate table classification;
insert into classification
select *
from new_classification;
This process should be much faster than deleting 90% of the rows.
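If recreating the table is not an option (for example because of foreign keys or permissions), an in-place alternative in PostgreSQL keeps one physical copy per duplicate group using the system column ctid. This is only a sketch: it assumes the columns shown above are the complete column list and that rows matching on all of them are true duplicates.
DELETE FROM classification a
USING classification b
WHERE a.ctid < b.ctid                                  -- keep one physical copy per group
  AND a."date"        IS NOT DISTINCT FROM b."date"
  AND a."window"      IS NOT DISTINCT FROM b."window"
  AND a.points        IS NOT DISTINCT FROM b.points
  AND a.actual_bool   IS NOT DISTINCT FROM b.actual_bool
  AND a.previous_bool IS NOT DISTINCT FROM b.previous_bool
  AND a.creation_time IS NOT DISTINCT FROM b.creation_time
  AND a.source        IS NOT DISTINCT FROM b.source;
IS NOT DISTINCT FROM is used instead of = so that NULLs in any column still compare as equal.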

Get sum per month from daily data into new column while keeping daily data

I have a df with daily data and some levels per day:
date | value1 | value2 | level
2020-01-01 | 1 | 2 | "a"
2020-01-01 | 3 | 10 | "b"
2020-01-01 | 2 | 3 | "c"
2020-01-02 | 1 | 2 | "a"
2020-01-02 | 3 | 10 | "b"
2020-01-02 | 2 | 3 | "c"
... | ... | ... | ...
2021-02-01 | 10 | 1 | "a"
2021-02-01 | 8 | 4 | "b"
2021-02-01 | 1 | 5 | "c"
2021-02-03 | 10 | 1 | "a"
2021-02-03 | 8 | 4 | "b"
2021-02-03 | 1 | 5 | "c"
I need the sum per month of value1 and value2 in a new column while keeping the daily rows, like:
date | value1 | value2 | level | value1_permonth | value2_permonth
2020-01-01 | 1 | 2 | "a" | 12 | 30
2020-01-01 | 3 | 10 | "b" | 12 | 30
2020-01-01 | 2 | 3 | "c" | 12 | 30
2020-01-02 | 1 | 2 | "a" | 12 | 30
2020-01-02 | 3 | 10 | "b" | 12 | 30
2020-01-02 | 2 | 3 | "c" | 12 | 30
... | ... | ... | ... | ... | ...
2021-02-01 | 10 | 1 | "a" | 38 | 20
2021-02-01 | 8 | 4 | "b" | 38 | 20
2021-02-01 | 1 | 5 | "c" | 38 | 20
2021-02-03 | 10 | 1 | "a" | 38 | 20
2021-02-03 | 8 | 4 | "b" | 38 | 20
2021-02-03 | 1 | 5 | "c" | 38 | 20
How can I do this with pandas?
Use pd.Grouper with GroupBy.transform to create new columns filled with the aggregated values:
cols = ['value1','value2']
df1 = df.groupby(pd.Grouper(freq='MS', key='date'))[cols].transform('sum')
Or DataFrame.resample:
cols = ['value1','value2']
df1 = df.resample('MS', on='date')[cols].transform('sum')
Or use monthly periods via Series.dt.to_period passed to groupby:
cols = ['value1','value2']
df1 = df.groupby(df['date'].dt.to_period('m'))[cols].transform('sum')
print (df1)
df2 = df.join(df1.add_suffix('_permonth'))
print (df2)
date value1 value2 level value1_permonth value2_permonth
0 2020-01-01 1 2 a 12 30
1 2020-01-01 3 10 b 12 30
2 2020-01-01 2 3 c 12 30
3 2020-01-02 1 2 a 12 30
4 2020-01-02 3 10 b 12 30
5 2020-01-02 2 3 c 12 30
6 2021-02-01 10 1 a 38 20
7 2021-02-01 8 4 b 38 20
8 2021-02-01 1 5 c 38 20
9 2021-02-03 10 1 a 38 20
10 2021-02-03 8 4 b 38 20
11 2021-02-03 1 5 c 38 20
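All three variants assume date is already a datetime column; if the frame was loaded from text (an assumption about how the data arrives), convert it first:
import pandas as pd

# Only needed if 'date' arrived as plain strings (assumption about the input);
# pd.Grouper, resample and .dt.to_period all require a real datetime column.
df['date'] = pd.to_datetime(df['date'])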

How to detect missing values if the missing rows have already been removed from the dataframe?

I have a dataframe which contains time series data for 30 consecutive days. Each day is supposed to contain data for 24 hours, from 0 to 23, so there should be 24*30 = 720 rows in the dataframe. However, some rows with missing records in the column "Fooo" have already been removed from the dataframe.
Index | DATE(YYYY/MM/DD) | Hour | Fooo
0 | 2015/01/01 | 0 | x
1 | 2015/01/01 | 1 | xy
2 | ... | ... | z
23 | 2015/01/01 | 23 | z
24 | 2015/01/02 | 0 | z
25 | 2015/01/02 | 2 | bz
... | ... | ... | z
46 | 2015/01/02 | 23 | zz
...
...
680 | 2015/01/30 | 1 | z
681 | 2015/01/30 | 3 | bz
... | ... | ... | z
701 | 2015/01/30 | 23 | zz
I would like to rewrite the dataframe so that it contains the full 720 rows, with missing values in the column "Fooo" filled with "NA".
Index | DATE(YYYY/MM/DD) | Hour | Fooo
0 | 2015/01/01 | 0 | x
1 | 2015/01/01 | 1 | xy
2 | ... | ... | z
23 | 2015/01/01 | 23 | z
24 | 2015/01/02 | 0 | z
25 | 2015/01/02 | 1 | NA
26 | 2015/01/02 | 2 | bz
... | ... | ... | z
47 | 2015/01/02 | 23 | zz
...
...
690 | 2015/01/30 | 0 | NA
691 | 2015/01/30 | 1 | z
692 | 2015/01/30 | 2 | NA
693 | 2015/01/30 | 3 | bz
... | ... | ... | z
719 | 2015/01/30 | 23 | zz
How can I do that in pandas? I tried to create another dataframe with one column "Hour" like this:
Index | Hour |
0 | 0 |
1 | 1 |
2 | ... |
23 | 23 |
24 | 0 |
25 | 1 |
26 | 2 |
... | ...
47 | 23 |
...
...
690 | 0 |
691 | 1 |
692 | 2
693 | 3 |
... | |
719 | 23 |
then outer join it with the original one, but it did not work.
Create a helper DataFrame with itertools.product and use DataFrame.merge with a left join:
from itertools import product
df['DATE(YYYY/MM/DD)'] = pd.to_datetime(df['DATE(YYYY/MM/DD)'])
df1 = pd.DataFrame(list(product(df['DATE(YYYY/MM/DD)'].unique(), range(24))),
                   columns=['DATE(YYYY/MM/DD)','Hour'])
df = df1.merge(df, how='left')
print (df.head(10))
DATE(YYYY/MM/DD) Hour Fooo
0 2015-01-01 0 x
1 2015-01-01 1 xy
2 2015-01-01 2 NaN
3 2015-01-01 3 NaN
4 2015-01-01 4 NaN
5 2015-01-01 5 NaN
6 2015-01-01 6 NaN
7 2015-01-01 7 NaN
8 2015-01-01 8 NaN
9 2015-01-01 9 NaN
Or create a MultiIndex with MultiIndex.from_product and use DataFrame.reindex to append the missing rows:
df['DATE(YYYY/MM/DD)'] = pd.to_datetime(df['DATE(YYYY/MM/DD)'])
mux = pd.MultiIndex.from_product([df['DATE(YYYY/MM/DD)'].unique(), range(24)],
                                 names=['DATE(YYYY/MM/DD)','Hour'])
df = df.set_index(['DATE(YYYY/MM/DD)','Hour']).reindex(mux).reset_index()
print (df.head(10))
DATE(YYYY/MM/DD) Hour Fooo
0 2015-01-01 0 x
1 2015-01-01 1 xy
2 2015-01-01 2 NaN
3 2015-01-01 3 NaN
4 2015-01-01 4 NaN
5 2015-01-01 5 NaN
6 2015-01-01 6 NaN
7 2015-01-01 7 NaN
8 2015-01-01 8 NaN
9 2015-01-01 9 NaN
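The desired output shows the literal string "NA" in Fooo, while both approaches above leave NaN. If the string marker is really wanted (an assumption; NaN is usually more convenient for later processing), fill it afterwards:
# Replace the missing markers produced by merge/reindex with the literal "NA".
df['Fooo'] = df['Fooo'].fillna('NA')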

Count records for "empty" rows in multiple columns and joins

I have searched a lot through the site trying to find a solution to my problem, and I have found similar problems, but I haven't managed to find a solution that works in my case.
I have a tickets table like this (which has a lot more data than this):
TICKET:
+---------+--------------+------------+------------+
| ticketid| report_date | impact | open |
+---------+--------------+------------+------------+
| 1 | 29/01/2019 | 1 | true |
| 2 | 29/01/2019 | 2 | true |
| 3 | 30/01/2019 | 4 | true |
| 4 | 27/01/2019 | 1 | true |
| 5 | 29/01/2019 | 1 | true |
| 6 | 30/01/2019 | 2 | true |
+---------+--------------+------------+------------+
There is another table that holds the possible values for the impact column in the table above:
IMPACT:
+---------+
| impact |
+---------+
| 1 |
| 2 |
| 3 |
| 4 |
+---------+
My objective is to extract a result set from the ticket table where I group by the impact, report_date and open flag and count the number of tickets in each group. Therefore, for the example above, I would like to extract the following result set.
+--------------+------------+------------+-----------+
| report_date | impact | open | tkt_count |
+--------------+------------+------------+-----------+
| 27/01/2019 | 1 | true | 1 |
| 27/01/2019 | 1 | false | 0 |
| 27/01/2019 | 2 | true | 0 |
| 27/01/2019 | 2 | false | 0 |
| 27/01/2019 | 3 | true | 0 |
| 27/01/2019 | 3 | false | 0 |
| 27/01/2019 | 4 | true | 0 |
| 27/01/2019 | 4 | false | 0 |
| 29/01/2019 | 1 | true | 2 |
| 29/01/2019 | 1 | false | 0 |
| 29/01/2019 | 2 | true | 1 |
| 29/01/2019 | 2 | false | 0 |
| 29/01/2019 | 3 | true | 0 |
| 29/01/2019 | 3 | false | 0 |
| 29/01/2019 | 4 | true | 0 |
| 29/01/2019 | 4 | false | 0 |
| 30/01/2019 | 1 | true | 0 |
| 30/01/2019 | 1 | false | 0 |
| 30/01/2019 | 2 | true | 1 |
| 30/01/2019 | 2 | false | 0 |
| 30/01/2019 | 3 | true | 0 |
| 30/01/2019 | 3 | false | 0 |
| 30/01/2019 | 4 | true | 1 |
| 30/01/2019 | 4 | false | 0 |
+--------------+------------+------------+-----------+
It seems simple enough, but the problem is with the "zero" rows.
For the example that I showed here, there are no tickets with impact 3, and no tickets with the open flag set to false, for the range of dates given. And I cannot come up with a query that will show me all the counts, even when there are no rows for some values.
Can anyone help me?
Thanks in advance.
To solve this type of problem, one way to proceed is to generate an intermediate result set that contains all the combinations for which a value needs to be computed, and then LEFT JOIN it with the original data, using aggregation.
SELECT
dt.report_date,
i.impact,
op.[open],
COUNT(t.report_date) tkt_count
FROM
(SELECT DISTINCT report_date FROM ticket) dt
CROSS JOIN impact i
CROSS JOIN (SELECT 'true' [open] UNION ALL SELECT 'false') op
LEFT JOIN ticket t
ON t.report_date = dt.report_date
AND t.impact = i.impact
AND t.[open] = op.[open]
GROUP BY
dt.report_date,
i.impact,
op.[open]
This query generates the intermediate result set as follows:
report_date: all distinct dates in the original data (report_date)
impact: all values from the impact table
open: a fixed list containing true and false (it could also have been built from the distinct values in the original data, but the value false is not present in your sample data)
You can change the above rules; the logic remains the same. For example, if there are gaps in report_date, another widely used option is to create a calendar table (see the sketch after the demo output below).
Demo on DB Fiddle:
report_date | impact | open | tkt_count
:------------------ | -----: | :---- | --------:
27/01/2019 00:00:00 | 1 | false | 0
27/01/2019 00:00:00 | 1 | true | 1
27/01/2019 00:00:00 | 2 | false | 0
27/01/2019 00:00:00 | 2 | true | 0
27/01/2019 00:00:00 | 3 | false | 0
27/01/2019 00:00:00 | 3 | true | 0
27/01/2019 00:00:00 | 4 | false | 0
27/01/2019 00:00:00 | 4 | true | 0
29/01/2019 00:00:00 | 1 | false | 0
29/01/2019 00:00:00 | 1 | true | 2
29/01/2019 00:00:00 | 2 | false | 0
29/01/2019 00:00:00 | 2 | true | 1
29/01/2019 00:00:00 | 3 | false | 0
29/01/2019 00:00:00 | 3 | true | 0
29/01/2019 00:00:00 | 4 | false | 0
29/01/2019 00:00:00 | 4 | true | 0
30/01/2019 00:00:00 | 1 | false | 0
30/01/2019 00:00:00 | 1 | true | 0
30/01/2019 00:00:00 | 2 | false | 0
30/01/2019 00:00:00 | 2 | true | 1
30/01/2019 00:00:00 | 3 | false | 0
30/01/2019 00:00:00 | 3 | true | 0
30/01/2019 00:00:00 | 4 | false | 0
30/01/2019 00:00:00 | 4 | true | 1
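As mentioned above, when report_date can have gaps, the (SELECT DISTINCT report_date FROM ticket) derived table can be swapped for a calendar table. A sketch of that substitution, assuming a pre-populated calendar table with one day_date row per day (both the table name and the column name are hypothetical); the rest of the query is unchanged:
SELECT
    c.day_date AS report_date,
    i.impact,
    op.[open],
    COUNT(t.report_date) tkt_count
FROM calendar c                                         -- hypothetical one-row-per-day table
CROSS JOIN impact i
CROSS JOIN (SELECT 'true' [open] UNION ALL SELECT 'false') op
LEFT JOIN ticket t
    ON  t.report_date = c.day_date
    AND t.impact = i.impact
    AND t.[open] = op.[open]
WHERE c.day_date BETWEEN '2019-01-27' AND '2019-01-30'  -- whatever range is needed
GROUP BY
    c.day_date,
    i.impact,
    op.[open]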
I queried against a start-and-end calendar of days, cross joined all available impact/open combinations, and finally brought in the ticket data, counting the non-null matches.
DECLARE @Impact TABLE(Impact INT)
INSERT @Impact VALUES(1),(2),(3),(4)
DECLARE @Tickets TABLE(report_date DATETIME, Impact INT, IsOpen BIT)
INSERT @Tickets VALUES
('01/29/2019',1,1),('01/29/2019',2,1),('01/30/2019',3,1),('01/27/2019',4,1),('01/29/2019',5,1),('01/30/2019',6,1)
DECLARE @StartDate DATETIME='01/01/2019'
DECLARE @EndDate DATETIME='02/01/2019'
;WITH AllDates AS
(
SELECT Date = @StartDate
UNION ALL
SELECT Date = DATEADD(DAY, 1, Date) FROM AllDates WHERE DATEADD(DAY, 1, Date) <= @EndDate
)
,AllImpacts AS
(
SELECT DISTINCT Impact, IsOpen = 1 FROM @Impact
UNION
SELECT DISTINCT Impact, IsOpen = 0 FROM @Impact
),
AllData AS
(
SELECT D.Date,A.impact,A.IsOpen
FROM AllDates D
CROSS APPLY AllImpacts A
)
SELECT
A.Date,A.Impact,A.IsOpen,
GroupCount = COUNT(T.Impact)
FROM
AllData A
LEFT OUTER JOIN @Tickets T ON T.report_date = A.Date AND T.Impact = A.Impact AND T.IsOpen = A.IsOpen
GROUP BY
A.Date,A.Impact,A.IsOpen
ORDER BY
A.Date,A.Impact,A.IsOpen
OPTION (MAXRECURSION 0);
GO
Date | Impact | IsOpen | GroupCount
:------------------ | -----: | -----: | ---------:
01/01/2019 00:00:00 | 1 | 0 | 0
01/01/2019 00:00:00 | 1 | 1 | 0
01/01/2019 00:00:00 | 2 | 0 | 0
01/01/2019 00:00:00 | 2 | 1 | 0
01/01/2019 00:00:00 | 3 | 0 | 0
01/01/2019 00:00:00 | 3 | 1 | 0
01/01/2019 00:00:00 | 4 | 0 | 0
01/01/2019 00:00:00 | 4 | 1 | 0
02/01/2019 00:00:00 | 1 | 0 | 0
02/01/2019 00:00:00 | 1 | 1 | 0
02/01/2019 00:00:00 | 2 | 0 | 0
02/01/2019 00:00:00 | 2 | 1 | 0
02/01/2019 00:00:00 | 3 | 0 | 0
02/01/2019 00:00:00 | 3 | 1 | 0
02/01/2019 00:00:00 | 4 | 0 | 0
02/01/2019 00:00:00 | 4 | 1 | 0
03/01/2019 00:00:00 | 1 | 0 | 0
03/01/2019 00:00:00 | 1 | 1 | 0
03/01/2019 00:00:00 | 2 | 0 | 0
03/01/2019 00:00:00 | 2 | 1 | 0
03/01/2019 00:00:00 | 3 | 0 | 0
03/01/2019 00:00:00 | 3 | 1 | 0
03/01/2019 00:00:00 | 4 | 0 | 0
03/01/2019 00:00:00 | 4 | 1 | 0
04/01/2019 00:00:00 | 1 | 0 | 0
04/01/2019 00:00:00 | 1 | 1 | 0
04/01/2019 00:00:00 | 2 | 0 | 0
04/01/2019 00:00:00 | 2 | 1 | 0
04/01/2019 00:00:00 | 3 | 0 | 0
04/01/2019 00:00:00 | 3 | 1 | 0
04/01/2019 00:00:00 | 4 | 0 | 0
04/01/2019 00:00:00 | 4 | 1 | 0
05/01/2019 00:00:00 | 1 | 0 | 0
05/01/2019 00:00:00 | 1 | 1 | 0
05/01/2019 00:00:00 | 2 | 0 | 0
05/01/2019 00:00:00 | 2 | 1 | 0
05/01/2019 00:00:00 | 3 | 0 | 0
05/01/2019 00:00:00 | 3 | 1 | 0
05/01/2019 00:00:00 | 4 | 0 | 0
05/01/2019 00:00:00 | 4 | 1 | 0
06/01/2019 00:00:00 | 1 | 0 | 0
06/01/2019 00:00:00 | 1 | 1 | 0
06/01/2019 00:00:00 | 2 | 0 | 0
06/01/2019 00:00:00 | 2 | 1 | 0
06/01/2019 00:00:00 | 3 | 0 | 0
06/01/2019 00:00:00 | 3 | 1 | 0
06/01/2019 00:00:00 | 4 | 0 | 0
06/01/2019 00:00:00 | 4 | 1 | 0
07/01/2019 00:00:00 | 1 | 0 | 0
07/01/2019 00:00:00 | 1 | 1 | 0
07/01/2019 00:00:00 | 2 | 0 | 0
07/01/2019 00:00:00 | 2 | 1 | 0
07/01/2019 00:00:00 | 3 | 0 | 0
07/01/2019 00:00:00 | 3 | 1 | 0
07/01/2019 00:00:00 | 4 | 0 | 0
07/01/2019 00:00:00 | 4 | 1 | 0
08/01/2019 00:00:00 | 1 | 0 | 0
08/01/2019 00:00:00 | 1 | 1 | 0
08/01/2019 00:00:00 | 2 | 0 | 0
08/01/2019 00:00:00 | 2 | 1 | 0
08/01/2019 00:00:00 | 3 | 0 | 0
08/01/2019 00:00:00 | 3 | 1 | 0
08/01/2019 00:00:00 | 4 | 0 | 0
08/01/2019 00:00:00 | 4 | 1 | 0
09/01/2019 00:00:00 | 1 | 0 | 0
09/01/2019 00:00:00 | 1 | 1 | 0
09/01/2019 00:00:00 | 2 | 0 | 0
09/01/2019 00:00:00 | 2 | 1 | 0
09/01/2019 00:00:00 | 3 | 0 | 0
09/01/2019 00:00:00 | 3 | 1 | 0
09/01/2019 00:00:00 | 4 | 0 | 0
09/01/2019 00:00:00 | 4 | 1 | 0
10/01/2019 00:00:00 | 1 | 0 | 0
10/01/2019 00:00:00 | 1 | 1 | 0
10/01/2019 00:00:00 | 2 | 0 | 0
10/01/2019 00:00:00 | 2 | 1 | 0
10/01/2019 00:00:00 | 3 | 0 | 0
10/01/2019 00:00:00 | 3 | 1 | 0
10/01/2019 00:00:00 | 4 | 0 | 0
10/01/2019 00:00:00 | 4 | 1 | 0
11/01/2019 00:00:00 | 1 | 0 | 0
11/01/2019 00:00:00 | 1 | 1 | 0
11/01/2019 00:00:00 | 2 | 0 | 0
11/01/2019 00:00:00 | 2 | 1 | 0
11/01/2019 00:00:00 | 3 | 0 | 0
11/01/2019 00:00:00 | 3 | 1 | 0
11/01/2019 00:00:00 | 4 | 0 | 0
11/01/2019 00:00:00 | 4 | 1 | 0
12/01/2019 00:00:00 | 1 | 0 | 0
12/01/2019 00:00:00 | 1 | 1 | 0
12/01/2019 00:00:00 | 2 | 0 | 0
12/01/2019 00:00:00 | 2 | 1 | 0
12/01/2019 00:00:00 | 3 | 0 | 0
12/01/2019 00:00:00 | 3 | 1 | 0
12/01/2019 00:00:00 | 4 | 0 | 0
12/01/2019 00:00:00 | 4 | 1 | 0
13/01/2019 00:00:00 | 1 | 0 | 0
13/01/2019 00:00:00 | 1 | 1 | 0
13/01/2019 00:00:00 | 2 | 0 | 0
13/01/2019 00:00:00 | 2 | 1 | 0
13/01/2019 00:00:00 | 3 | 0 | 0
13/01/2019 00:00:00 | 3 | 1 | 0
13/01/2019 00:00:00 | 4 | 0 | 0
13/01/2019 00:00:00 | 4 | 1 | 0
14/01/2019 00:00:00 | 1 | 0 | 0
14/01/2019 00:00:00 | 1 | 1 | 0
14/01/2019 00:00:00 | 2 | 0 | 0
14/01/2019 00:00:00 | 2 | 1 | 0
14/01/2019 00:00:00 | 3 | 0 | 0
14/01/2019 00:00:00 | 3 | 1 | 0
14/01/2019 00:00:00 | 4 | 0 | 0
14/01/2019 00:00:00 | 4 | 1 | 0
15/01/2019 00:00:00 | 1 | 0 | 0
15/01/2019 00:00:00 | 1 | 1 | 0
15/01/2019 00:00:00 | 2 | 0 | 0
15/01/2019 00:00:00 | 2 | 1 | 0
15/01/2019 00:00:00 | 3 | 0 | 0
15/01/2019 00:00:00 | 3 | 1 | 0
15/01/2019 00:00:00 | 4 | 0 | 0
15/01/2019 00:00:00 | 4 | 1 | 0
16/01/2019 00:00:00 | 1 | 0 | 0
16/01/2019 00:00:00 | 1 | 1 | 0
16/01/2019 00:00:00 | 2 | 0 | 0
16/01/2019 00:00:00 | 2 | 1 | 0
16/01/2019 00:00:00 | 3 | 0 | 0
16/01/2019 00:00:00 | 3 | 1 | 0
16/01/2019 00:00:00 | 4 | 0 | 0
16/01/2019 00:00:00 | 4 | 1 | 0
17/01/2019 00:00:00 | 1 | 0 | 0
17/01/2019 00:00:00 | 1 | 1 | 0
17/01/2019 00:00:00 | 2 | 0 | 0
17/01/2019 00:00:00 | 2 | 1 | 0
17/01/2019 00:00:00 | 3 | 0 | 0
17/01/2019 00:00:00 | 3 | 1 | 0
17/01/2019 00:00:00 | 4 | 0 | 0
17/01/2019 00:00:00 | 4 | 1 | 0
18/01/2019 00:00:00 | 1 | 0 | 0
18/01/2019 00:00:00 | 1 | 1 | 0
18/01/2019 00:00:00 | 2 | 0 | 0
18/01/2019 00:00:00 | 2 | 1 | 0
18/01/2019 00:00:00 | 3 | 0 | 0
18/01/2019 00:00:00 | 3 | 1 | 0
18/01/2019 00:00:00 | 4 | 0 | 0
18/01/2019 00:00:00 | 4 | 1 | 0
19/01/2019 00:00:00 | 1 | 0 | 0
19/01/2019 00:00:00 | 1 | 1 | 0
19/01/2019 00:00:00 | 2 | 0 | 0
19/01/2019 00:00:00 | 2 | 1 | 0
19/01/2019 00:00:00 | 3 | 0 | 0
19/01/2019 00:00:00 | 3 | 1 | 0
19/01/2019 00:00:00 | 4 | 0 | 0
19/01/2019 00:00:00 | 4 | 1 | 0
20/01/2019 00:00:00 | 1 | 0 | 0
20/01/2019 00:00:00 | 1 | 1 | 0
20/01/2019 00:00:00 | 2 | 0 | 0
20/01/2019 00:00:00 | 2 | 1 | 0
20/01/2019 00:00:00 | 3 | 0 | 0
20/01/2019 00:00:00 | 3 | 1 | 0
20/01/2019 00:00:00 | 4 | 0 | 0
20/01/2019 00:00:00 | 4 | 1 | 0
21/01/2019 00:00:00 | 1 | 0 | 0
21/01/2019 00:00:00 | 1 | 1 | 0
21/01/2019 00:00:00 | 2 | 0 | 0
21/01/2019 00:00:00 | 2 | 1 | 0
21/01/2019 00:00:00 | 3 | 0 | 0
21/01/2019 00:00:00 | 3 | 1 | 0
21/01/2019 00:00:00 | 4 | 0 | 0
21/01/2019 00:00:00 | 4 | 1 | 0
22/01/2019 00:00:00 | 1 | 0 | 0
22/01/2019 00:00:00 | 1 | 1 | 0
22/01/2019 00:00:00 | 2 | 0 | 0
22/01/2019 00:00:00 | 2 | 1 | 0
22/01/2019 00:00:00 | 3 | 0 | 0
22/01/2019 00:00:00 | 3 | 1 | 0
22/01/2019 00:00:00 | 4 | 0 | 0
22/01/2019 00:00:00 | 4 | 1 | 0
23/01/2019 00:00:00 | 1 | 0 | 0
23/01/2019 00:00:00 | 1 | 1 | 0
23/01/2019 00:00:00 | 2 | 0 | 0
23/01/2019 00:00:00 | 2 | 1 | 0
23/01/2019 00:00:00 | 3 | 0 | 0
23/01/2019 00:00:00 | 3 | 1 | 0
23/01/2019 00:00:00 | 4 | 0 | 0
23/01/2019 00:00:00 | 4 | 1 | 0
24/01/2019 00:00:00 | 1 | 0 | 0
24/01/2019 00:00:00 | 1 | 1 | 0
24/01/2019 00:00:00 | 2 | 0 | 0
24/01/2019 00:00:00 | 2 | 1 | 0
24/01/2019 00:00:00 | 3 | 0 | 0
24/01/2019 00:00:00 | 3 | 1 | 0
24/01/2019 00:00:00 | 4 | 0 | 0
24/01/2019 00:00:00 | 4 | 1 | 0
25/01/2019 00:00:00 | 1 | 0 | 0
25/01/2019 00:00:00 | 1 | 1 | 0
25/01/2019 00:00:00 | 2 | 0 | 0
25/01/2019 00:00:00 | 2 | 1 | 0
25/01/2019 00:00:00 | 3 | 0 | 0
25/01/2019 00:00:00 | 3 | 1 | 0
25/01/2019 00:00:00 | 4 | 0 | 0
25/01/2019 00:00:00 | 4 | 1 | 0
26/01/2019 00:00:00 | 1 | 0 | 0
26/01/2019 00:00:00 | 1 | 1 | 0
26/01/2019 00:00:00 | 2 | 0 | 0
26/01/2019 00:00:00 | 2 | 1 | 0
26/01/2019 00:00:00 | 3 | 0 | 0
26/01/2019 00:00:00 | 3 | 1 | 0
26/01/2019 00:00:00 | 4 | 0 | 0
26/01/2019 00:00:00 | 4 | 1 | 0
27/01/2019 00:00:00 | 1 | 0 | 0
27/01/2019 00:00:00 | 1 | 1 | 0
27/01/2019 00:00:00 | 2 | 0 | 0
27/01/2019 00:00:00 | 2 | 1 | 0
27/01/2019 00:00:00 | 3 | 0 | 0
27/01/2019 00:00:00 | 3 | 1 | 0
27/01/2019 00:00:00 | 4 | 0 | 0
27/01/2019 00:00:00 | 4 | 1 | 1
28/01/2019 00:00:00 | 1 | 0 | 0
28/01/2019 00:00:00 | 1 | 1 | 0
28/01/2019 00:00:00 | 2 | 0 | 0
28/01/2019 00:00:00 | 2 | 1 | 0
28/01/2019 00:00:00 | 3 | 0 | 0
28/01/2019 00:00:00 | 3 | 1 | 0
28/01/2019 00:00:00 | 4 | 0 | 0
28/01/2019 00:00:00 | 4 | 1 | 0
29/01/2019 00:00:00 | 1 | 0 | 0
29/01/2019 00:00:00 | 1 | 1 | 1
29/01/2019 00:00:00 | 2 | 0 | 0
29/01/2019 00:00:00 | 2 | 1 | 1
29/01/2019 00:00:00 | 3 | 0 | 0
29/01/2019 00:00:00 | 3 | 1 | 0
29/01/2019 00:00:00 | 4 | 0 | 0
29/01/2019 00:00:00 | 4 | 1 | 0
30/01/2019 00:00:00 | 1 | 0 | 0
30/01/2019 00:00:00 | 1 | 1 | 0
30/01/2019 00:00:00 | 2 | 0 | 0
30/01/2019 00:00:00 | 2 | 1 | 0
30/01/2019 00:00:00 | 3 | 0 | 0
30/01/2019 00:00:00 | 3 | 1 | 1
30/01/2019 00:00:00 | 4 | 0 | 0
30/01/2019 00:00:00 | 4 | 1 | 0
31/01/2019 00:00:00 | 1 | 0 | 0
31/01/2019 00:00:00 | 1 | 1 | 0
31/01/2019 00:00:00 | 2 | 0 | 0
31/01/2019 00:00:00 | 2 | 1 | 0
31/01/2019 00:00:00 | 3 | 0 | 0
31/01/2019 00:00:00 | 3 | 1 | 0
31/01/2019 00:00:00 | 4 | 0 | 0
31/01/2019 00:00:00 | 4 | 1 | 0
01/02/2019 00:00:00 | 1 | 0 | 0
01/02/2019 00:00:00 | 1 | 1 | 0
01/02/2019 00:00:00 | 2 | 0 | 0
01/02/2019 00:00:00 | 2 | 1 | 0
01/02/2019 00:00:00 | 3 | 0 | 0
01/02/2019 00:00:00 | 3 | 1 | 0
01/02/2019 00:00:00 | 4 | 0 | 0
01/02/2019 00:00:00 | 4 | 1 | 0
db<>fiddle here

Rank based on a condition in Redshift

I have the following data set:
id | bool_col | datetime_col
1 | N | 2017-01-01 00:01:00
2 | N | 2017-01-01 00:02:00
3 | N | 2017-01-01 00:03:00
4 | Y | 2017-01-01 00:04:00
5 | N | 2017-01-01 00:05:00
6 | N | 2017-01-01 00:06:00
7 | N | 2017-01-01 00:07:00
8 | Y | 2017-01-01 00:08:00
9 | N | 2017-01-01 00:09:00
10 | N | 2017-01-01 00:10:00
11 | N | 2017-01-01 00:11:00
12 | N | 2017-01-01 00:12:00
13 | Y | 2017-01-01 00:13:00
I need to add an extra column with a rank that separates each chunk that ends with a Y in the bool_col:
id | bool_col | datetime_col | rank
1 | N | 2017-01-01 00:01:00 | 1
2 | N | 2017-01-01 00:02:00 | 1
3 | N | 2017-01-01 00:03:00 | 1
4 | Y | 2017-01-01 00:04:00 | 1
5 | N | 2017-01-01 00:05:00 | 2
6 | N | 2017-01-01 00:06:00 | 2
7 | N | 2017-01-01 00:07:00 | 2
8 | Y | 2017-01-01 00:08:00 | 2
9 | N | 2017-01-01 00:09:00 | 3
10 | N | 2017-01-01 00:10:00 | 3
11 | N | 2017-01-01 00:11:00 | 3
12 | N | 2017-01-01 00:12:00 | 3
13 | Y | 2017-01-01 00:13:00 | 3
I have tried many iterations of lead, lag and rank, but I still have no clue how to tell it to increase the rank only after a Y in bool_col.
Any thoughts?
Simply do a cumulative sum of the number of "Y"s before each value. In your case:
select t.*,
(1 + coalesce(sum(case when bool_col is true then 1 else 0 end)
     over (order by id rows between unbounded preceding and 1 preceding), 0)) as rnk
from t;
Note: This uses is true, assuming the column really is boolean. Otherwise, use something like = 'Y'.
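For completeness, a sketch of the same idea when bool_col is stored as 'Y'/'N' text rather than a real boolean (table and column names as in the question; the coalesce covers the first row, whose preceding frame is empty):
select t.*,
       1 + coalesce(
             sum(case when bool_col = 'Y' then 1 else 0 end)
               over (order by id rows between unbounded preceding and 1 preceding),
             0) as rnk
from t;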