Making rank() ignore null cases - sql

I'm trying to figure out a way to make the rank() window function "skip" in its counting some rows with null values in a specific column. Pretty much, what I want is to count how many paid transactions there are, for each client, before each transaction/row.
I tried using case when inside the rank() and I got something similar to the results I expect, but still not quite what I need.
+-------------------------------------------------------+
| What I need |
+-------------+------+----------+-----------------------+
| CLIENT | CODE | PAYMENT | PAID_PURCHASES_SO_FAR |
| A | 341 | 17/09/21 | 0 |
| A | 342 | 18/09/21 | 1 |
| A | 343 | (null) | 2 |
| A | 344 | 18/09/21 | 2 |
| A | 345 | 19/09/21 | 3 |
| A | 346 | 19/09/21 | 4 |
| A | 347 | (null) | 5 |
| A | 348 | 24/09/21 | 5 |
| B | 855 | (null) | 0 |
| B | 856 | 20/09/21 | 0 |
| B | 857 | (null) | 1 |
+-------------+------+----------+-----------------------+
-+------------------------------------------------------+
| What I got |
-+------------+------+----------+-----------------------+
| CLIENT | CODE | PAYMENT | PAID_PURCHASES_SO_FAR |
| A | 341 | 17/09/21 | 0 |
| A | 342 | 18/09/22 | 1 |
| A | 343 | (null) | (null) |
| A | 344 | 18/09/22 | 2 |
| A | 345 | 19/09/22 | 3 |
| A | 346 | 19/09/21 | 4 |
| A | 347 | (null) | (null) |
| A | 348 | 24/09/21 | 5 |
| B | 855 | (null) | (null) |
| B | 856 | 20/09/21 | 0 |
| B | 857 | (null) | (null) |
-+------------+------+----------+-----------------------+
In a single image: comparison
And here my code:
SELECT
CLIENT
, CODE
, PAYMENT
, CASE WHEN PAYMENT IS NOT NULL THEN DENSE_RANK() OVER(PARTITION BY CLIENT, (CASE WHEN PAYMENT IS NOT NULL THEN 1 ELSE 0 END) ORDER BY CODE) - 1 END NUMBER_OF_PURCHASES_SO_FAR
FROM FOO.BAR
Note: The CODE column may be used as time reference. E.g. code = 750 came before code = 751, and so on.
Any help would be appreciated.
Thanks in advance.

You can use conditional aggregation combined with a window frame, as in:
select *, coalesce(sum(case when payment is null then 0 else 1 end)
over(partition by client order by code
rows between unbounded preceding and 1 preceding), 0)
as ppsf
from t
order by client, code
Result:
client code payment ppsf
------- ----- --------- ----
A 341 17/09/21 0
A 342 18/09/21 1
A 343 null 2
A 344 18/09/21 2
A 345 19/09/21 3
A 346 19/09/21 4
A 347 null 5
A 348 24/09/21 5
B 855 null 0
B 856 20/09/21 0
B 857 null 1
See running example at db<>fiddle.

This is it:
SELECT
"CLIENT"
, "CODE"
, "PAYMENT"
, rank() over(partition by "CLIENT"
order by COALESCE("PAYMENT",'01/01/70') )
FROM Table1
http://sqlfiddle.com/#!15/f941a/14

Related

Append Row To Each Group in SQL

Let's say that I have database table:
| id | value | rank |
| -----------|-----------|-------------|
| 303 | D | 3 |
| 404 | A | 1 |
| 505 | B | 1 |
| 505 | D | 4 |
| 202 | B | 1 |
| 505 | A | 5 |
| 303 | N | 2 |
| 101 | A | 1 |
| 505 | A | 7 |
| 202 | A | 6 |
| 202 | N | 3 |
| 505 | N | 3 |
| 202 | A | 4 |
| 505 | A | 2 |
| 202 | N | 5 |
| 303 | A | 1 |
| 505 | N | 6 |
| 202 | A | 2 |
Following:
SELECT *
FROM table_name
GROUP BY id
ORDER BY rank;
I get:
| id | value | rank |
| -----------|-----------|-------------|
| 101 | A | 1 |
| 202 | B | 1 |
| 202 | A | 2 |
| 202 | N | 3 |
| 202 | A | 4 |
| 202 | N | 5 |
| 202 | A | 6 |
| 303 | A | 1 |
| 303 | N | 2 |
| 303 | D | 3 |
| 404 | A | 1 |
| 505 | B | 1 |
| 505 | A | 2 |
| 505 | N | 3 |
| 505 | D | 4 |
| 505 | A | 5 |
| 505 | N | 6 |
| 505 | A | 7 |
However, for each group, I'd like to append an additional row with the value column taken from the id column so that the resulting table looks like:
| id | value | rank |
| -----------|-----------|-------------|
| 101 | A | 1 |
| 101 | 101 | 2 |
| 202 | B | 1 |
| 202 | A | 2 |
| 202 | N | 3 |
| 202 | A | 4 |
| 202 | N | 5 |
| 202 | A | 6 |
| 202 | 202 | 7 |
| 303 | A | 1 |
| 303 | N | 2 |
| 303 | D | 3 |
| 303 | 303 | 4 |
| 404 | A | 1 |
| 404 | 404 | 2 |
| 505 | B | 1 |
| 505 | A | 2 |
| 505 | N | 3 |
| 505 | D | 4 |
| 505 | A | 5 |
| 505 | N | 6 |
| 505 | A | 7 |
| 505 | 505 | 8 |
What is ANSI SQL (or most database agnostic) way to accomplish this?
You don't want a group by in the initial set as you appear to want all the rows back:
select "id", "value", "rank"
from T
union all
select "id", cast("id" as varchar(10)), max("rank") + 1
from T
group by "id"
order by "id", "rank";
And you can do this with grouping sets for the fun of it:
select "id",
grouping("rank"),
case when grouping("rank") = 0 then min("value") else cast("id" as varchar(10)) end as "value",
case when grouping("rank") = 0 then "rank" else max("rank") over (partition by "id") + 1 end as "rank"
from T
group by grouping sets ("id", "rank"), ("id")
order by "id", "rank";
Here's how I would do it:
SELECT id,
value,
rank
FROM table_name
UNION
SELECT id,
id AS value,
max(rank) + 1 AS rank
FROM table_name
GROUP BY id
ORDER BY id, rank;
You seem to want:
select id, value, max(rank) as rank
from t
group by id, value
union all
select id, id, max(rank) + 1
from t
group by id;

How to fill forward time series data in Postgres

I am looking to join three tables together and fill forward null values on the resulting table.
Three tables:
Table 1 (raw.fb_historical_data) - this is the main table on which I would like to join the other two on to. Each row of this table is related to one or more rows in the other two tables through a combination of columns id, clk and timestamp (mkt_id and row_id in the other tables).
+---------------------+-----+-----+--------------+
| timestamp | clk | id | some_columns |
+---------------------+-----+-----+--------------+
| 2016-06-19 06:11:13 | 123 | 126 | a |
| 2016-06-19 06:16:13 | 124 | 127 | b |
| 2016-06-19 06:21:13 | 234 | 126 | c |
| 2016-06-19 06:41:13 | 456 | 127 | d |
| ... | ... | ... | ... |
+---------------------+-----+-----+--------------+
Table 2 (raw.fb_runner_changes) - this table essentially gives price changes for a wide range of different markets
+---------------------+--------+--------+-------+
| timestamp | row_id | mkt_id | price |
+---------------------+--------+--------+-------+
| 2016-06-19 06:11:13 | 123 | 126 | 1 |
| 2016-06-19 06:21:13 | 123 | 126 | 2 |
| 2016-06-19 06:41:13 | 123 | 126 | 3 |
| 2016-06-06 18:54:06 | 124 | 127 | 1 |
| 2016-06-06 18:56:06 | 124 | 127 | 2 |
| 2016-06-06 18:57:06 | 124 | 127 | 3 |
| ... | ... | ... | ... |
+---------------------+--------+--------+-------+
Table 3 (raw.fb_runners) - a table with extra information about market changes that I would like to join
+---------------------+--------+--------+---------------+
| timestamp | row_id | mkt_id | other_columns |
+---------------------+--------+--------+---------------+
| 2016-06-19 06:15:13 | 234 | 126 | ab |
| 2016-06-19 06:31:13 | 234 | 126 | cd |
| 2016-06-19 06:56:13 | 234 | 126 | ef |
| 2016-06-06 18:54:06 | 456 | 127 | gh |
| 2016-06-06 18:56:06 | 456 | 127 | jk |
| 2016-06-06 18:57:06 | 456 | 127 | lm |
| ... | ... | ... | ... |
+---------------------+--------+--------+---------------+
Essentially what I want to do is fill NULL information forward (ordered by timestamp) while grouping by market id.
So far, I have tried to join the tables together using
SELECT *
FROM raw.fb_historical_data AS h
LEFT JOIN raw.fb_runner_changes AS rc
ON rc.row_id = h.clk
AND rc.timestamp = h.timestamp
AND rc.mkt_id = h.id
LEFT JOIN raw.fb_runners AS r
ON r.row_id = h.clk
AND r.timestamp = h.timestamp
AND r.mkt_id = h.id
Which has worked as intended, though now there are nulls in the resulting dataset which i'd like to fill in with the last available value for that market.
With some of the other SQL dialects, fill forward could be done using the window function last_value in combination with the instruction ignore nulls.
Since this is not supported in PostgreSQL (check the note at the bottom of this page), we are using a 2 steps work-around.
select ts, val, val_seq, min(val) over (partition by val_seq) val_fill_fw
from (select ts, val, count(val) over(order by ts) as val_seq
from t
) t
-
+----+----------+---------+-------------+
| ts | val | val_seq | val_fill_fw |
+----+----------+---------+-------------+
| 1 | (null) | 0 | (null) |
| 2 | (null) | 0 | (null) |
| 3 | hello | 1 | hello |
| 4 | (null) | 1 | hello |
| 5 | (null) | 1 | hello |
| 6 | darkness | 2 | darkness |
| 7 | my | 3 | my |
| 8 | (null) | 3 | my |
| 9 | old | 4 | old |
| 10 | (null) | 4 | old |
| 11 | (null) | 4 | old |
| 12 | (null) | 4 | old |
| 13 | friend | 5 | friend |
| 14 | (null) | 5 | friend |
+----+----------+---------+-------------+
SQL Fiddle
This seems to correctly do 'forward fill' in postgres. However I am a postgres newbie so I would appreciate feedback if it's wrong.
DROP TABLE IF EXISTS example;
create temporary table example(id int, str text, val integer);
insert into example values
(1, 'a', null),
(1, null, 1),
(2, 'b', 2),
(2,null ,null );
select * from example
select id, (case
when str is null
then lag(str,1) over (order by id)
else str
end) as str,
(case
when val is null
then lag(val,1) over (order by id)
else val
end) as val
from example

Getting two columns one containing and one not containing a grouped value

My data looks like this -
+-----------+-----------+-----------+----------+
| FLIGHT_NO | FL_DATE | SERIAL_NO | PILOT_NO |
+-----------+-----------+-----------+----------+
| 501 | 15-OCT-19 | 456710 | 345 |
| 521 | 16-OCT-19 | 562911 | 345 |
| 534 | 17-OCT-19 | 877694 | 345 |
| 577 | 17-OCT-19 | 338157 | 345 |
| 501 | 14-OCT-19 | 921225 | 346 |
| 534 | 15-OCT-19 | 877694 | 346 |
| 534 | 14-OCT-19 | 338157 | 347 |
| 590 | 16-OCT-19 | 650012 | 347 |
| 531 | 14-OCT-19 | 562911 | 348 |
| 531 | 15-OCT-19 | 562911 | 348 |
| 501 | 16-OCT-19 | 220989 | 349 |
| 521 | 18-OCT-19 | 650012 | 349 |
| 590 | 14-OCT-19 | 562911 | 351 |
| 577 | 18-OCT-19 | 877694 | 351 |
| 590 | 18-OCT-19 | 456710 | 346 |
+-----------+-----------+-----------+----------+
My aim is to return the total number of flights flying and not flying on 18-oct-19.
I'm doing it with dual but that doesn't seem to be the correct/best method.
Can anyone help me do it the correct way?
SELECT
(SELECT COUNT(FLIGHT_NO) NO_FLY FROM schd_flight WHERE fl_date = '18-OCT-19') AS FLY,
(SELECT COUNT(FLIGHT_NO) NO_FLY FROM schd_flight WHERE fl_date <> '18-OCT-19') AS NO_FLY
FROM dual;
My output -
+-----+--------+
| fly | no_fly |
+-----+--------+
| 3 | 12 |
+-----+--------+
Simply use sum with case statement
Select
sum(case when fl_date = '18-OCT-19' then 1 end) fly,
sum(case when fl_date <> '18-OCT-19' then 1 end) no_fly
From schd_flight;
Cheers!!
I think the second query is not necessary, no_fly = total - fly.
So I came up with my solution, may improve the query time :
SELECT sub.FLY as FLY, (SELECT count(*) from schd_flight) - sub.FLY as NO_FLY
FROM (
SELECT COUNT(CASE when fl_date = '18-OCT-19' then 1 end) AS FLY
from schd_flight
) sub;
Not tested yet though.

How to select data order by sort value and sort NULL By name

I wrote following SQL query to select data from #tmp table variable.
SELECT #rowCount AS [row-count],
t.[row-no] AS [row-no],
t.[ServiceID] AS ServiceID,
t.ServiceName AS ServiceName,
t.[BranchServiceSortValue] AS SortValue,
(CASE WHEN t.OptIn = 1 THEN 'Yes' ELSE 'No' END) AS OptIn
FROM #tmp t
INNER JOIN dbo.Category
ON Category.CategoryId = t.FkCategoryId
INNER JOIN dbo.ServiceType
ON ServiceType.ServiceTypeId = t.FkServiceTypeId
WHERE t.[row-no] >= #startRow
AND t.[row-no] <= #endRow
ORDER BY t.BranchServiceSortValue,t.serviceName
According to the data in #tmp table,my above query return following output.
| row-count | row-no | ServiceID | ServiceName | SortValue | OptIn |
|-----------|--------|-----------|-------------|-----------|-------|
| 24 | 4 | 1088 | AAB | NULL | No |
| 24 | 5 | 1089 | AAC | NULL | No |
| 24 | 6 | 1090 | AAD | NULL | No |
| 24 | 1 | 1093 | GDGD | 0 | Yes |
| 24 | 7 | 1091 | EETETE | 1 | Yes |
| 24 | 8 | 1092 | CSCDF | 2 | Yes |
| 24 | 3 | 1086 | CXCX | 3 | Yes |
| 24 | 9 | 16 | ASA | 4 | Yes |
| 24 | 2 | 1087 | BFB | 5 | Yes |
| 24 | 10 | 7 | Mortgage | 6 | Yes |
| 24 | 11 | 17 | DDWW | 7 | Yes |
| 24 | 12 | 11 | IL | 8 | Yes |
| 24 | 13 | 5 | SAA | 9 | Yes |
| 24 | 14 | 9 | CD | 10 | Yes |
You can see according to my above query data rows are sorted by SortValue and when SortValue = NULL, those 3 rows sorted by its ServiceName,
But I need to displaySortValue = NULLrows at the bottom of the other rows.Its mean I need to display Null rows after the SortValue Not NULL data and SortValue = NULL should be display order by its ServiceName.
My Expected Output is:
| row-count | row-no | ServiceID | ServiceName | SortValue | OptIn |
|-----------|--------|-----------|-------------|-----------|-------|
| 14 | 1 | 1093 | GDGD | 0 | Yes |
| 14 | 7 | 1091 | EETETE | 1 | Yes |
| 14 | 8 | 1092 | CSCDF | 2 | Yes |
| 14 | 3 | 1086 | CXCX | 3 | Yes |
| 14 | 9 | 16 | ASA | 4 | Yes |
| 14 | 2 | 1087 | BFB | 5 | Yes |
| 14 | 10 | 7 | Mortgage | 6 | Yes |
| 14 | 11 | 17 | DDWW | 7 | Yes |
| 14 | 12 | 11 | IL | 8 | Yes |
| 14 | 13 | 5 | SAA | 9 | Yes |
| 14 | 14 | 9 | CD | 10 | Yes |
| 14 | 4 | 1088 | AAB | NULL | No |
| 14 | 5 | 1089 | AAC | NULL | No |
| 14 | 6 | 1090 | AAD | NULL | No |
How should I need to change my query to get above output? please help me
NULL has the lowest value, so you'll need to use a CASE to put NULL at the end, and then sort by SortValue:
ORDER BY CASE WHEN t.BranchServiceSortValue IS NULL THEN 1 ELSE 0 END,
t.BranchServiceSortValue,
t.serviceName;
Just add a key to the ORDER BY:
ORDER BY (CASE WHEN t.BranchServiceSortValue IS NOT NULL THEN 1 ELSE 2 END),
t.BranchServiceSortValue, t.serviceName
The SQL standard provides the options NULLS FIRST and NULLS LAST for ORDER BY clauses. SQL Server does not (yet) implement these.

equating an entry to an aggregated version of itself

I am trying to find if an entry's value is the max of the grouped value. Its purpose is to sit in a larger if logic.
Which I'd expect would look something like this:
SELECT
t.id as t_id,
sum(if(t.value = max(t.value), 1, 0)) AS is_max_value
FROM dataset.table AS t
GROUP BY t_id
The response is:
Error: Expression 't.value' is not present in the GROUP BY list
How should my code look to do this?
You first need to compile in a subquery the max value, then join again the value to the table.
Using the public data set available here is an example:
SELECT
t.word,
t.word_count,
t.corpus_date
FROM
[publicdata:samples.shakespeare] t
JOIN (
SELECT
corpus_date,
MAX(word_count) word_count,
FROM
[publicdata:samples.shakespeare]
GROUP BY
1 ) d
ON
d.corpus_date=t.corpus_date
AND t.word_count=d.word_count
LIMIT
25
Results:
+-----+--------+--------------+---------------+---+
| Row | t_word | t_word_count | t_corpus_date | |
+-----+--------+--------------+---------------+---+
| 1 | the | 762 | 1597 | |
| 2 | the | 894 | 1598 | |
| 3 | the | 841 | 1590 | |
| 4 | the | 680 | 1606 | |
| 5 | the | 942 | 1607 | |
| 6 | the | 779 | 1609 | |
| 7 | the | 995 | 1600 | |
| 8 | the | 937 | 1599 | |
| 9 | the | 738 | 1612 | |
| 10 | the | 612 | 1595 | |
| 11 | the | 848 | 1592 | |
| 12 | the | 753 | 1594 | |
| 13 | the | 740 | 1596 | |
| 14 | I | 828 | 1603 | |
| 15 | the | 525 | 1608 | |
| 16 | the | 363 | 0 | |
| 17 | I | 629 | 1593 | |
| 18 | I | 447 | 1611 | |
| 19 | the | 715 | 1602 | |
| 20 | the | 717 | 1610 | |
+-----+--------+--------------+---------------+---+
You can see that retains the word that have the maximum word_count in the partition defined by corpus_date
Use window function to "spread" the max value over all relevant records.
this way you can avoid the Join.
SELECT
*
FROM (
SELECT
corpus,
corpus_date,
word,
word_count,
MAX(word_count) OVER (PARTITION BY corpus) AS Max_Word_Count
FROM
[publicdata:samples.shakespeare] )
WHERE
word_count=Max_Word_Count
select
id,
value,
integer(value = max_value) as is_max_value
from (
select id, value, max(value) over(partition by id) as max_value
from dataset.table
)
Explanation:
Inner select - for each row/record calculates max of value among all rows with the same id
Outer select - for each row/record compares row's value with max value for respective group and then converts true or false into respectively 1 or 0 (as per expectation in question)