SQL iterating through rows

I have a table with a million records. This is the structure of the table with some example data points -
INPUT
patient claim thru_dt cd start
322 65 20200201 42 20181008
322 65 20200202 42
322 95 20200203 52
122 05 20200105 23
122 05 20200115 42 20190102
122 05 20200416 42
I need to write a query that would produce this output -
OUTPUT
patient claim thru_dt cd start
322 65 20200201 42 20181008
322 65 20200202 42 20181008
322 95 20200203 52 20181008
122 05 20200105 23
122 05 20200115 42 20190102
122 05 20200416 42
This is the query that does that -
SELECT t.*, s.*,
CASE
WHEN ISNULL(t.start, '') = '' AND (
s.cd = t.cd
OR
(
ROW_NUMBER() OVER(PARTITION BY t.patient ORDER BY t.thru_dt DESC) = 1
)
)
THEN s.start
ELSE t.start
END new_start
FROM table t
OUTER APPLY (
SELECT top (1) s.*
FROM table s
WHERE
s.patient = t.patient
AND ISNULL(s.start, '') <> ''
AND s.thru_dt >= DATEADD(DAY, -30, t.thru_dt)
ORDER BY s.thru_dt DESC
) s
ORDER BY t.patient DESC, t.thru_dt
This query produces the output I need but it doesn't work for this edge case -
INPUT
patient claim thru_dt cd start
322 65 20200201 42 20181008
322 65 20200202 45
322 95 20200203 52
122 05 20200105 23
122 05 20200115 42 20190102
122 05 20200416 42
The difference between the first input and the second input is the cd value of the second row. When I run the above query, the output I get is -
OUTPUT
patient claim thru_dt cd start
322 65 20200201 42 20181008
322 65 20200202 45 NULL
322 95 20200203 52 20181008
122 05 20200105 23
122 05 20200115 42 20190102
122 05 20200416 42
However, I don't want the third record to have a start value if the second record for patient 322 was given a NULL.
Here is a DBFiddle outlining my issue - https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=10842c7e66e148dbaad58ccb71cd6c11
UPDATE
This is the explanation for the first output data block -
The second claim of patient 322 was given 20181008 because the first and the second claims have the same cd value.
The third claim of patient 322 was also given 20181008, even though its cd value differs, because it is the last claim for the patient.
The first claim of patient 122 is still NULL because that claim's cd value is not equal to 42.
The third claim of patient 122 was NOT given 20190102, even though it has the same cd value, because its thru_dt is more than 30 days after the thru_dt of the prior claim.
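The rules above can be read as a row-by-row carry-forward, which may be easier to reason about outside SQL first. The following Python is only a sketch of one interpretation, not a tested replacement for the query: it assumes a blank start is copied from the immediately preceding claim of the same patient (using that claim's already resolved value), that the 30-day limit is measured between consecutive thru_dt values, and that the last thru_dt for patient 122 is 20200416 as in the output blocks. The names fill_start, make and max_gap_days are hypothetical.

```python
from datetime import datetime
from itertools import groupby


def parse(d):
    """Turn a yyyymmdd string into a date."""
    return datetime.strptime(d, "%Y%m%d").date()


def make(patient, claim, thru_dt, cd, start=None):
    return {"patient": patient, "claim": claim, "thru_dt": thru_dt, "cd": cd, "start": start}


def fill_start(rows, max_gap_days=30):
    """Carry start forward claim by claim within each patient (assumed rules)."""
    out = []
    ordered = sorted(rows, key=lambda r: (r["patient"], r["thru_dt"]))
    for _, grp in groupby(ordered, key=lambda r: r["patient"]):
        grp = list(grp)
        prev = None
        for i, row in enumerate(grp):
            new_start = row["start"]
            # Only blank rows are filled, and only from a previous row that has a value
            if new_start is None and prev is not None and prev["new_start"] is not None:
                gap = (parse(row["thru_dt"]) - parse(prev["thru_dt"])).days
                is_last = i == len(grp) - 1
                if gap <= max_gap_days and (row["cd"] == prev["cd"] or is_last):
                    new_start = prev["new_start"]
            prev = {**row, "new_start": new_start}
            out.append(prev)
    return out


first_input = [
    make("322", "65", "20200201", "42", "20181008"),
    make("322", "65", "20200202", "42"),
    make("322", "95", "20200203", "52"),
    make("122", "05", "20200105", "23"),
    make("122", "05", "20200115", "42", "20190102"),
    make("122", "05", "20200416", "42"),
]
# Edge case: only the cd of the second claim of patient 322 changes
edge_input = [dict(r) for r in first_input]
edge_input[1]["cd"] = "45"

for r in fill_start(edge_input):
    print(r["patient"], r["claim"], r["thru_dt"], r["cd"], r["new_start"])
```

Because each claim looks at the resolved value of the claim before it, a NULL on the second claim of patient 322 propagates to the third one, which is the behaviour asked for in the edge case.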

Related

BigQuery SQL: determine the number of daily transactions given a moving counter

I've been stuck for hours with writing a SQL query that would solve the following:
Given a history of a daily customer transaction counter, is it possible to specify exactly how many transactions were made each day?
Each datapoint represents sum of all transactions made in the last 30 days (ignore the missing dates)
The counter will decrement if the number of transactions made on the current day was smaller than the number of transactions that are no longer factored in, as they were made 31 days ago. It would increment otherwise.
The complete history of the counter is unavailable, so we don't know the numbers' evolution from the beginning, but only from certain point in time.
Please refer to the following table (for one offer_id):
transaction_date num_transactions
0 21/05/2022 25
1 22/05/2022 26
2 23/05/2022 25
3 24/05/2022 28
4 25/05/2022 30
5 26/05/2022 32
6 27/05/2022 33
7 28/05/2022 34
8 29/05/2022 33
9 30/05/2022 33
10 31/05/2022 34
11 01/06/2022 35
12 02/06/2022 35
13 03/06/2022 59
14 04/06/2022 73
15 07/06/2022 87
16 08/06/2022 98
17 09/06/2022 109
18 10/06/2022 120
19 11/06/2022 123
20 12/06/2022 122
21 13/06/2022 127
22 14/06/2022 142
23 15/06/2022 145
24 16/06/2022 148
25 17/06/2022 156
26 18/06/2022 162
27 19/06/2022 164
28 20/06/2022 167
29 21/06/2022 173
30 22/06/2022 185
31 23/06/2022 194
32 24/06/2022 206
33 25/06/2022 206
34 26/06/2022 208
35 28/06/2022 227
36 29/06/2022 237
37 30/06/2022 241
38 01/07/2022 248
39 02/07/2022 237
40 03/07/2022 230
41 04/07/2022 217
42 05/07/2022 208
43 06/07/2022 214
44 07/07/2022 216
45 08/07/2022 211
46 09/07/2022 203
47 10/07/2022 194
48 11/07/2022 192
49 12/07/2022 195
50 13/07/2022 193
51 14/07/2022 181
52 15/07/2022 174
53 16/07/2022 169
54 17/07/2022 162
55 18/07/2022 162
56 19/07/2022 164
57 20/07/2022 160
58 21/07/2022 163
59 22/07/2022 155
60 23/07/2022 144
61 24/07/2022 134
62 25/07/2022 139
63 26/07/2022 154
For each day (at least starting with 23/06) I'd like to be able to tell what were the numbers of transactions day-by-day in the preceding 30 days that sum up to the transactions counter on that day.
My current code in BigQuery SQL is below. It is obviously wrong - although the calculated counter evolution history does sum up to the right numbers when negative numbers are included, I'm interested in finding out only the actual transaction counts (thus only positive numbers and 0 are in question) for each last 30-days window.
When I add a simple condition that clamps the value to 0 whenever a decrement happens...:
WHEN IFNULL(transactions_diff_yesterday + transaction_reference, 0) < 0
THEN 0
... the sum for the last 30 days never matches the counter.
WITH outer_base AS(
WITH base AS(
SELECT
*,
LAG(num_transactions, 31) OVER(PARTITION BY offer_id ORDER BY offer_id, transaction_date) as transactions_31_days_ago,
IFNULL(LAG(num_transactions, 30) OVER(PARTITION BY offer_id ORDER BY offer_id, transaction_date), 0) as transactions_30_days_ago,
IFNULL(LAG(num_transactions, 1) OVER(PARTITION BY offer_id ORDER BY offer_id, transaction_date), 0) as transactions_yesterday
FROM
`my_table`
ORDER BY
offer_id,
transaction_date
)
SELECT
*,
IFNULL(num_transactions - transactions_yesterday, 0) AS transactions_diff_yesterday,
IFNULL(transactions_30_days_ago - transactions_31_days_ago, 0) AS transaction_reference
FROM
base
)
SELECT
*,
CASE
WHEN IFNULL(transactions_diff_yesterday + transaction_reference, 0) < 0
THEN 0
ELSE
IFNULL(transactions_diff_yesterday + transaction_reference, 0) END
AS real_transactions
FROM
outer_base;
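The relation between the counter and the daily counts described in the question can be sketched in a few lines of Python. This is an illustration under assumptions, not a solution to the BigQuery problem: it assumes one counter value per day with no missing dates, and a window of `window` days (the question speaks of transactions made "31 days ago", so the exact offset may differ by one). The function names are hypothetical. It shows that the daily counts can be rebuilt exactly once the first full window of daily counts is known, and that two different histories can produce the same observed counter when it is not.

```python
def rolling_counter(daily, window=30):
    """Counter per day: sum of the last `window` daily counts (fewer at the start)."""
    return [sum(daily[max(0, i - window + 1): i + 1]) for i in range(len(daily))]


def recover_daily(counter, first_window, window=30):
    """Rebuild daily counts from the counter, given the first `window` daily counts."""
    daily = list(first_window)
    for t in range(window, len(counter)):
        # counter[t] - counter[t-1] == daily[t] - daily[t-window]
        daily.append(counter[t] - counter[t - 1] + daily[t - window])
    return daily


# A small window keeps the numbers easy to follow
WINDOW = 3
history_a = [2, 2, 2, 2, 2, 2]
history_b = [3, 1, 2, 3, 1, 2]
counter_a = rolling_counter(history_a, WINDOW)
counter_b = rolling_counter(history_b, WINDOW)

# Once a full window has elapsed, both histories show the same counter
print(counter_a[WINDOW - 1:], counter_b[WINDOW - 1:])

# With the first window known, the reconstruction is exact
print(recover_daily(counter_a, history_a[:WINDOW], WINDOW))
print(recover_daily(counter_b, history_b[:WINDOW], WINDOW))
```

Since the counter alone does not pin down the first window, clamping negative differences to 0 changes the sum without adding the missing information, which is consistent with the 30-day totals no longer matching.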

How to create dynamic row in sql without inserting a value?

I have a requirement to add dynamic rows based on the results fetched by a SQL query. I've written a query that returns something like the following:
Value Name
1 Test 1
2 Test 2
. .
n n
The above SQL result will return a dynamic number of rows. (Number of rows not fixed)
So I want to add rows with values like Parent1, Parent2, and so on, based on the number of rows. Suppose my query returns a total of 300 rows: the first row should be Parent1 in both the Value and Name columns, followed by the results of my query up to the 150th row, then another dynamic row with Parent2 in both columns, and so on, as in the table below.
Value Name
Parent1 Parent 1
1 Test 1
2 Test 2
. .
Parent2 Parent2
151 Test 151
. .
n n
Please note: I cannot use DDL or DML commands to achieve this.
Suppose this is your original query
select
to_char(rownum) value, 'Test '||rownum name
from dual
connect by level <= 6
;
VALUE NAME
---------- ----------
1 Test 1
2 Test 2
3 Test 3
4 Test 4
5 Test 5
6 Test 6
and you want to introduce two header Parent lines.
You may use NTILE to split the original query into two parts, ordering on some column (here VALUE)
NTILE(2) OVER (ORDER BY VALUE) nt
Change the number in NTILE to increase the split.
The query below uses the original query as its base, calculates the NTILE for the split, and adds the Parent rows with UNION ALL.
Most importantly, it produces the correct order by sorting on the NTILE number (nt), the source (first the parent row, then the data) and the value.
with dt as ( /* your original query */
select
to_char(rownum) value, 'Test '||rownum name
from dual
connect by level <= 6
)
select VALUE, NAME,
NTILE(2) OVER (ORDER BY VALUE) nt, /* modify to change split */
1 src
from dt
union all
select
'Parent'||rownum value,
'Parent'||rownum name,
rownum nt, 0 src
from dual connect by level <= 2 /* modify to change split */
order by nt, src, value;
VALUE NAME NT SRC
---------------------------------------------- ---------------------------------------------- ---------- ----------
Parent1 Parent1 1 0
1 Test 1 1 1
2 Test 2 1 1
3 Test 3 1 1
Parent2 Parent2 2 0
4 Test 4 2 1
5 Test 5 2 1
6 Test 6 2 1
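The same split-and-sort idea can be sketched outside the database. The Python below is only an illustration with hypothetical names (ntile, with_parents); it mimics how NTILE hands out bucket numbers and then sorts on bucket, source and position. It sorts on the row position rather than on the text value, because VALUE in the query above is text (to_char), and text ordering would place '10' before '2' once there are more than nine rows.

```python
def ntile(n_rows, buckets):
    """1-based bucket number for each of n_rows ordered rows, in the manner of SQL NTILE.

    When the rows do not divide evenly, the first buckets get one extra row.
    """
    base, extra = divmod(n_rows, buckets)
    out = []
    for b in range(1, buckets + 1):
        size = base + (1 if b <= extra else 0)
        out.extend([b] * size)
    return out


def with_parents(rows, buckets):
    """rows: list of (value, name) pairs already in the desired order."""
    numbers = ntile(len(rows), buckets)
    # (bucket, source, position, value, name); source 1 marks a data row
    tagged = [(nt, 1, pos, value, name)
              for pos, ((value, name), nt) in enumerate(zip(rows, numbers))]
    # source 0 sorts each parent row ahead of the data rows of its bucket
    parents = [(b, 0, 0, f"Parent{b}", f"Parent{b}") for b in range(1, buckets + 1)]
    return [(value, name) for _, _, _, value, name in sorted(tagged + parents)]


data = [(str(i), f"Test {i}") for i in range(1, 7)]
for value, name in with_parents(data, 2):
    print(value, name)
```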
The query below generates a list of parent and non-parent rows using CONNECT BY. Change the 300 to the number of rows you want to generate, and change the 150 to set how many rows apart the parent rows are generated.
SELECT LEVEL,
CASE
WHEN MOD (LEVEL, 150) = 0 OR LEVEL = 1
THEN
'Parent' || TO_CHAR (TRUNC (LEVEL / 150) + 1)
ELSE
TO_CHAR (LEVEL)
END AS VALUE,
CASE
WHEN MOD (LEVEL, 150) = 0 OR LEVEL = 1
THEN
'Parent' || TO_CHAR (TRUNC (LEVEL / 150) + 1)
ELSE
'Test ' || TO_CHAR (LEVEL)
END AS NAME
FROM DUAL
CONNECT BY LEVEL <= 300;
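To see which rows the CASE expression turns into parents, the same rule can be written as a small Python function. This is only a sketch for checking the arithmetic; the function name label is hypothetical.

```python
def label(level, split):
    """Mirror of the CASE rule: a parent label at level 1 and at every multiple of split."""
    if level % split == 0 or level == 1:
        return f"Parent{level // split + 1}"
    return str(level)


for level in (1, 2, 149, 150, 151, 300):
    print(level, label(level, 150))
```

Note that with this rule the rows at levels 1, 150 and 300 are relabelled as parent rows rather than added as extra rows, so the values 1, 150 and 300 themselves do not appear in the result.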
A similar approach, more dynamic.
col value for 9999
col name for a20
define limit = &1
define split = &2
select level as lvl,
case
when mod (level, &&split) = 0 or level = 1
then
'parent' || to_char (trunc (level / &&split) + 1)
else
to_char (level)
end as name,
case
when mod (level, &&split) = 0 or level = 1
then
'parent' || to_char (trunc (level / &&split) + 1)
else
'test ' || to_char (level)
end as value
from dual
connect by level <= &&limit
/
Executed as a script, it takes two parameters: the total number of values and the split value.
SQL> @generate.sql 100 50
old 3: when mod (level, &&split) = 0 or level = 1
new 3: when mod (level, 50) = 0 or level = 1
old 5: 'parent' || to_char (trunc (level / &&split) + 1)
new 5: 'parent' || to_char (trunc (level / 50) + 1)
old 10: when mod (level, &&split) = 0 or level = 1
new 10: when mod (level, 50) = 0 or level = 1
old 12: 'parent' || to_char (trunc (level / &&split) + 1)
new 12: 'parent' || to_char (trunc (level / 50) + 1)
old 17: connect by level <= &&limit
new 17: connect by level <= 100
LVL NAME VALUE
---------- -------------------- ----------------------------------------------
1 parent1 parent1
2 2 test 2
3 3 test 3
4 4 test 4
5 5 test 5
6 6 test 6
7 7 test 7
8 8 test 8
9 9 test 9
10 10 test 10
11 11 test 11
LVL NAME VALUE
---------- -------------------- ----------------------------------------------
12 12 test 12
13 13 test 13
14 14 test 14
15 15 test 15
16 16 test 16
17 17 test 17
18 18 test 18
19 19 test 19
20 20 test 20
21 21 test 21
22 22 test 22
LVL NAME VALUE
---------- -------------------- ----------------------------------------------
23 23 test 23
24 24 test 24
25 25 test 25
26 26 test 26
27 27 test 27
28 28 test 28
29 29 test 29
30 30 test 30
31 31 test 31
32 32 test 32
33 33 test 33
LVL NAME VALUE
---------- -------------------- ----------------------------------------------
34 34 test 34
35 35 test 35
36 36 test 36
37 37 test 37
38 38 test 38
39 39 test 39
40 40 test 40
41 41 test 41
42 42 test 42
43 43 test 43
44 44 test 44
LVL NAME VALUE
---------- -------------------- ----------------------------------------------
45 45 test 45
46 46 test 46
47 47 test 47
48 48 test 48
49 49 test 49
50 parent2 parent2
51 51 test 51
52 52 test 52
53 53 test 53
54 54 test 54
55 55 test 55
LVL NAME VALUE
---------- -------------------- ----------------------------------------------
56 56 test 56
57 57 test 57
58 58 test 58
59 59 test 59
60 60 test 60
61 61 test 61
62 62 test 62
63 63 test 63
64 64 test 64
65 65 test 65
66 66 test 66
LVL NAME VALUE
---------- -------------------- ----------------------------------------------
67 67 test 67
68 68 test 68
69 69 test 69
70 70 test 70
71 71 test 71
72 72 test 72
73 73 test 73
74 74 test 74
75 75 test 75
76 76 test 76
77 77 test 77
LVL NAME VALUE
---------- -------------------- ----------------------------------------------
78 78 test 78
79 79 test 79
80 80 test 80
81 81 test 81
82 82 test 82
83 83 test 83
84 84 test 84
85 85 test 85
86 86 test 86
87 87 test 87
88 88 test 88
LVL NAME VALUE
---------- -------------------- ----------------------------------------------
89 89 test 89
90 90 test 90
91 91 test 91
92 92 test 92
93 93 test 93
94 94 test 94
95 95 test 95
96 96 test 96
97 97 test 97
98 98 test 98
99 99 test 99
LVL NAME VALUE
---------- -------------------- ----------------------------------------------
100 parent3 parent3
100 rows selected.

Rank() with Null first in Bigquery based on multiple columns

I have a data like as shown below
Subject_id T1 T2 T3 T4 T5
1234
1234 21 22 23 24 25
3456 34 31
3456 34 31 36 37 39
5678 65 64 62 61 67
5678 65 64 62 67
9876 12 13 14 15 16
4790 47 87 52 13 16
As you can see above, subject_ids 1234,3456 and 5678 are repeating.
I would like to remove those repeating subjects when they have null/empty/blank value in any of the columns like T1,T2,T3,T4,T5.
Now the problem is that in reality I have more than 250 columns, and I am not sure whether I can write 250 WHERE conditions checking for null values. So I was trying row_number() and rank(), but I am not sure which one is better. Below is what I tried:
SELECT *,ROW_NUMBER() OVER(PARTITION BY subject_id,T1,T2,T3,T4,T5) NULLS FIRST
from table A;
But it throws syntax error Syntax error: Unexpected keyword NULLS at [1:62]
I expect my output to be like below
Subject_id T1 T2 T3 T4 T5
1234 21 22 23 24 25
3456 34 31 36 37 39
5678 65 64 62 61 67
9876 12 13 14 15 16
4790 47 87 52 13 16
As you can see, the output doesn't contain rows which had at least 1 null/empty/blank value in T1,T2,T3,T4,T5 columns.
Can you help, please?
Below is for BigQuery Standard SQL
#standardSQL
SELECT *
FROM `project.dataset.table` t
WHERE NOT REGEXP_CONTAINS(FORMAT('%t', t), r'NULL')
Applied to the sample data from your question, the output is
Row Subject_id t1 t2 t3 t4 t5
1 1234 21 22 23 24 25
2 3456 34 31 36 37 39
3 5678 65 64 62 61 67
4 9876 12 13 14 15 16
5 4790 47 87 52 13 16
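The filter above works on the text rendering of the whole row, so it does not need the 250 column names. The same "drop any row with a missing value" rule can be sketched in Python for comparison. This is an illustration only: the helper name has_missing is hypothetical, and because the sample table in the question does not show which columns are empty in the incomplete rows, the positions of the None values below are assumptions. Unlike the text match, this version also treats blank strings as missing and cannot be fooled by a text value that happens to contain the word NULL.

```python
def has_missing(row):
    """True if any field is None or a blank string."""
    return any(v is None or (isinstance(v, str) and v.strip() == "") for v in row)


# (subject_id, T1..T5); the placement of None in incomplete rows is assumed
rows = [
    (1234, None, None, None, None, None),
    (1234, 21, 22, 23, 24, 25),
    (3456, 34, 31, None, None, None),
    (3456, 34, 31, 36, 37, 39),
    (5678, 65, 64, 62, 61, 67),
    (5678, 65, 64, 62, None, 67),
    (9876, 12, 13, 14, 15, 16),
    (4790, 47, 87, 52, 13, 16),
]

complete = [r for r in rows if not has_missing(r)]
for r in complete:
    print(*r)
```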
I think you want:
SELECT *,
ROW_NUMBER() OVER (PARTITION BY subject_id
ORDER BY (T1 IS NULL OR T2 IS NULL OR T3 IS NULL OR T4 IS NULL OR T5 IS NULL) DESC
)
FROM table A;
I might approach this problem differently, but this appears to be what you are trying to write.

SQL Queries - Join two tables

I have two tables
TABLE 1 - Called Artista (artist) with an ID, Name, first year, second year.
ID NAME year1 year2 COUNTRY
41 Filipe Nobrega 2001 2051 Portugal
42 Bernardo Morais 2010 2060 Portugal
43 Fernando Evora 2013 2070 Portugal
44 Florenzo Giovanni 2003 2047 Italia
45 Tiago Alves 1980 1990 Portugal
46 Rui Gonzales 1975 1995 Espanha
47 Jose Almeida 1800 1876 Portugal
48 Jhon Snow 1900 1940 Winterfell
49 test 2001 2020 Espanha
TABLE 2 - Called autoria (author), with the ID of a piece of art and the ID of an artist, also it has the type of art( painting, music, sculpture...)
ART ARTIST TYPE_OF_ART
121 41 Pintura
122 41 Musica
123 42 Pintura
124 42 Cinema
125 42 Literatura
126 43 Teatro
127 43 Literatura
128 43 Danca
129 43 Arte_digital
130 43 Pintura
131 44 Pintura
132 44 Cinema
133 44 Pintura
134 45 Cinema
135 45 Literatura
136 46 Cinema
137 46 Literatura
138 46 Literatura
139 47 Arte_digital
140 47 Pintura
141 47 Teatro
142 48 Cinema
The problem is: Get all the artists that made at most 2 different pieces of art.
The result should be:
FILIPE NOBREGA - 41 he has 2 pieces of art
TIAGO ALVES - 45 he has 2 pieces of art
JOHN SNOW - 48 he has 1 piece of art
AND
TEST - 49 he has 0
This is what I've got:
SELECT DISTINCT A.name, A.id
FROM artista A, autoria AUT
WHERE AUT.artist = A.id
GROUP BY(A.name, A.id)
HAVING (COUNT(*) <= 2);
And it returns all of the above except TEST.
This query performs an INNER JOIN, which drops every artist that has no matching row in autoria. You need an OUTER JOIN so that those artists stay in the result set. Also count a column from autoria instead of using COUNT(*): with an OUTER JOIN, COUNT(*) counts the NULL-extended row of an artist without works as 1, whereas COUNT(AUT.art) gives 0. Change your query to:
SELECT A.name, A.id
FROM artista A LEFT OUTER JOIN autoria AUT ON AUT.artist = A.id
GROUP BY(A.name, A.id)
HAVING (COUNT(AUT.art) <= 2);
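The effect of the join type can be checked with a small script. The sketch below uses Python's built-in sqlite3 module with the sample data from the question; SQLite is an assumption made for the illustration (the question does not name its database), and the counts are taken over the art column of autoria.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artista (id INTEGER, name TEXT);
CREATE TABLE autoria (art INTEGER, artist INTEGER, type_of_art TEXT);
""")
conn.executemany("INSERT INTO artista VALUES (?, ?)", [
    (41, "Filipe Nobrega"), (42, "Bernardo Morais"), (43, "Fernando Evora"),
    (44, "Florenzo Giovanni"), (45, "Tiago Alves"), (46, "Rui Gonzales"),
    (47, "Jose Almeida"), (48, "Jhon Snow"), (49, "test"),
])
conn.executemany("INSERT INTO autoria VALUES (?, ?, ?)", [
    (121, 41, "Pintura"), (122, 41, "Musica"), (123, 42, "Pintura"),
    (124, 42, "Cinema"), (125, 42, "Literatura"), (126, 43, "Teatro"),
    (127, 43, "Literatura"), (128, 43, "Danca"), (129, 43, "Arte_digital"),
    (130, 43, "Pintura"), (131, 44, "Pintura"), (132, 44, "Cinema"),
    (133, 44, "Pintura"), (134, 45, "Cinema"), (135, 45, "Literatura"),
    (136, 46, "Cinema"), (137, 46, "Literatura"), (138, 46, "Literatura"),
    (139, 47, "Arte_digital"), (140, 47, "Pintura"), (141, 47, "Teatro"),
    (142, 48, "Cinema"),
])

# Inner join: an artist without any row in autoria never reaches GROUP BY
inner_rows = conn.execute("""
    SELECT a.id, COUNT(*) FROM artista a
    JOIN autoria aut ON aut.artist = a.id
    GROUP BY a.id HAVING COUNT(*) <= 2 ORDER BY a.id
""").fetchall()

# Left join: every artist is kept; COUNT(aut.art) ignores the NULL-extended row
left_rows = conn.execute("""
    SELECT a.id, COUNT(aut.art) FROM artista a
    LEFT JOIN autoria aut ON aut.artist = a.id
    GROUP BY a.id HAVING COUNT(aut.art) <= 2 ORDER BY a.id
""").fetchall()

print(inner_rows)
print(left_rows)
```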

How to get the quantity of a group by recursively summing up the quantities of its undergroups, and so on?

GoupName Id UnderGRoupId Quantity
Computer 66 57 0
Keyboard 67 66 0
Monitor 68 66 0
Mouse 69 66 25
CPU 70 66 0
Stationary 71 57 0
Pencil 72 71 0
Ruler 73 71 0
Mechanical 74 67 30
Membrane 75 67 0
This is my view, where the Quantity of items falling directly under each group is displayed. I want it to recursively add the quantities of all groups that fall directly or indirectly under a main group, say Computer.
GoupName Id UnderGRoupId Quantity
Computer 66 57 55
Keyboard 67 66 30
Monitor 68 66 0
Mouse 69 66 25
CPU 70 66 0
Stationary 71 57 0
Pencil 72 71 0
Ruler 73 71 0
Mechanical 74 67 30
Membrane 75 67 0
This is how I want my function to return values.
In SQL Server the query will look like:
WITH temp (ID, DescendantId, Quantity)
AS
(
-- anchor: every group starts with its own quantity
SELECT ID, ID, Quantity
FROM MyTable
UNION ALL
-- recursive step: add each child of a group that was already reached
SELECT temp.ID, MyTable.ID, MyTable.Quantity
FROM MyTable
INNER JOIN temp ON MyTable.UnderGRoupId = temp.DescendantId
)
-- total per group: its own quantity plus that of all its descendants
SELECT ID, SUM(Quantity) AS Quantity
FROM temp
GROUP BY ID
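As a cross-check of the expected totals, the roll-up can also be written as a short recursive function in Python. This is a sketch under assumptions: the data is the sample from the question, the hierarchy contains no cycles, and the name rolled_up_quantities is hypothetical.

```python
from collections import defaultdict

# (GroupName, Id, UnderGroupId, Quantity) as listed in the question
groups = [
    ("Computer", 66, 57, 0), ("Keyboard", 67, 66, 0), ("Monitor", 68, 66, 0),
    ("Mouse", 69, 66, 25), ("CPU", 70, 66, 0), ("Stationary", 71, 57, 0),
    ("Pencil", 72, 71, 0), ("Ruler", 73, 71, 0), ("Mechanical", 74, 67, 30),
    ("Membrane", 75, 67, 0),
]


def rolled_up_quantities(rows):
    """Return {id: own quantity plus the quantity of every descendant}."""
    own = {}
    children = defaultdict(list)
    for _name, gid, parent, qty in rows:
        own[gid] = qty
        children[parent].append(gid)

    totals = {}

    def total(gid):
        # Each group is computed once and then reused for its ancestors
        if gid not in totals:
            totals[gid] = own[gid] + sum(total(c) for c in children[gid])
        return totals[gid]

    for gid in own:
        total(gid)
    return totals


totals = rolled_up_quantities(groups)
for name, gid, parent, _qty in groups:
    print(name, gid, parent, totals[gid])
```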