How to order rows in a table within partitions? - sql

I am using DB2 to take a table, split it into partitions and then order rows within each partition. The table I have is like:
ID DATE EVENT
-- ---- -----
01 1999-06-01 a
01 1999-06-01 b
01 2006-01-01 a
01 2011-12-31 c
02 1999-01-01 a
02 2003-01-01 a
02 2003-01-01 b
02 2009-11-12 b
where I want to order it to get the following...
ID DATE EVENT SEQUENCE
-- ---- ----- --------
01 1999-06-01 a 1
01 1999-06-01 b 1
01 2006-01-01 a 2
01 2011-12-31 c 3
02 1999-01-01 a 1
02 2003-01-01 a 2
02 2003-01-01 b 2
02 2009-11-12 b 3
so I am trying:
select a.*, row_number() over(partition by ID,order by DATE) from mytable a
which gives me:
ID DATE EVENT SEQUENCE
-- ---- ----- --------
01 1999-06-01 a 1
01 1999-06-01 b 2
01 2006-01-01 a 3
01 2011-12-31 c 4
02 1999-01-01 a 1
02 2003-01-01 a 2
02 2003-01-01 b 3
02 2009-11-12 b 4
As you can see, the SEQUENCE column is incremented on every row, even when a row has the same date as the previous row.
How do I ensure that rows with the same date share a sequence number, and that the number only increases when a row with a later date appears?
Thanks very much.

The row_number() function never returns the same number for two rows within the same window partition, so it cannot produce ties. You need to use the dense_rank() function.
By the way, your query has a syntax error (the comma between "partition by ID" and "order by DATE"), and it is not a good idea to use reserved words ('DATE' in this case) for column names.

You could use the DENSE_RANK function instead, which assigns the same rank to rows that have the same value in the ORDER BY column, as below:
select a.*, DENSE_RANK() OVER(PARTITION BY ID ORDER BY DATE) AS SEQUENCE from mytable a;
References:
Using OLAP specifications
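To see the difference between the two functions side by side, here is a minimal sketch that runs the question's sample data through SQLite from Python. SQLite (3.25 or later) is only a stand-in for DB2 here, and the column name event_date and the extra tie-breaker on event are assumptions made for the example, not part of the original table.

```python
import sqlite3

# SQLite stands in for DB2; event_date replaces the reserved word DATE.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id TEXT, event_date TEXT, event TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        ("01", "1999-06-01", "a"),
        ("01", "1999-06-01", "b"),
        ("01", "2006-01-01", "a"),
        ("01", "2011-12-31", "c"),
        ("02", "1999-01-01", "a"),
        ("02", "2003-01-01", "a"),
        ("02", "2003-01-01", "b"),
        ("02", "2009-11-12", "b"),
    ],
)

rows = conn.execute(
    """
    SELECT id, event_date, event,
           -- always unique within a partition (event only breaks ties)
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY event_date, event) AS row_num,
           -- equal dates share a number, and no numbers are skipped
           DENSE_RANK() OVER (PARTITION BY id ORDER BY event_date) AS sequence
    FROM mytable
    ORDER BY id, event_date, event
    """
).fetchall()

for row in rows:
    print(row)
```

The last column follows the 1, 1, 2, 3 pattern asked for in the question, while row_num keeps counting 1, 2, 3, 4.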

Related

SQL - view only one datetime difference

I have two tables. Let's call the first one A and the other B.
A is:
ID Doc_ID Date
-- ------ ----
1 1a 1-Jan-2020
1 1a 1-Feb-2020
1 1b 1-Mar-2020
2 1a 1-Jan-2020
B is:
ID Doc2_ID Date
-- ------- ----
1 2a 1-Mar-2020
1 2a 1-Apr-2020
2 2b 1-Feb-2020
2 2a 1-Mar-2020
Now, using SQL, I want to create a table that has all the values in Table A plus the difference between the date in Table A and the closest date in Table B. For example, 1-Jan-2020 should be subtracted from 1-Mar-2020 and, similarly, 1-Feb-2020 should be subtracted from 1-Mar-2020. Can you please help me with it?
I am using the query below in azure databricks:
%sql
SELECT a.ID, a.Doc_ID, DATEDIFF(b.DATE, a.DATE) as day FROM a
LEFT JOIN b
ON a.ID = b.ID
AND a.DATE < b.DATE
But this generates more than one row per row of Table A, i.e. it subtracts from every date in Table B that fulfils the join conditions. For example, it subtracts 1 Jan 2020 from both 1 Mar 2020 and 1 Apr 2020, whereas I want it to subtract only from the closest date in Table B, i.e. 1 Mar 2020.
The expected outcome should be:
ID Doc_ID day
-- ------ ---
1 1a 59
1 1a 30
1 1b 0
2 1a 30
The day column for the first two rows was obtained by subtracting the respective dates in Table A from 1-Mar-2020, i.e. the closest date in Table B for ID 1.
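One possible way to keep a single row per row of Table A is to aggregate the joined rows and take only the earliest matching date from B. The sketch below is an illustration rather than a tested Databricks query: it uses SQLite from Python, ISO-formatted dates, a column named dt instead of Date, and julianday() for the day difference. In Spark SQL the same shape would presumably use DATEDIFF(MIN(b.DATE), a.DATE). The join uses <= instead of < because the expected output contains a 0 for 1-Mar-2020. The differences are computed from the real 2020 calendar, so the figures it prints will not exactly match the approximate ones listed in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE a (id INTEGER, doc_id TEXT, dt TEXT);
    CREATE TABLE b (id INTEGER, doc2_id TEXT, dt TEXT);
    INSERT INTO a VALUES (1, '1a', '2020-01-01'), (1, '1a', '2020-02-01'),
                         (1, '1b', '2020-03-01'), (2, '1a', '2020-01-01');
    INSERT INTO b VALUES (1, '2a', '2020-03-01'), (1, '2a', '2020-04-01'),
                         (2, '2b', '2020-02-01'), (2, '2a', '2020-03-01');
    """
)

closest = conn.execute(
    """
    SELECT a.id, a.doc_id, a.dt,
           -- MIN(b.dt) is the closest date in B on or after a.dt
           CAST(julianday(MIN(b.dt)) - julianday(a.dt) AS INTEGER) AS day
    FROM a
    LEFT JOIN b
           ON a.id = b.id
          AND a.dt <= b.dt
    GROUP BY a.id, a.doc_id, a.dt   -- one output row per distinct row of A
    ORDER BY a.id, a.dt
    """
).fetchall()

for row in closest:
    print(row)
```

Note that the GROUP BY would collapse exact duplicate rows in Table A; if duplicates matter, a row identifier would have to be added to the grouping.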

Adding rows, running count, running sum to query results

I have a table with the following ddl.
CREATE TABLE "LEDGER"
("FY" NUMBER,
"FP" VARCHAR2(20 BYTE),
"FUND" VARCHAR2(20 BYTE),
"TYPE" VARCHAR2(2 BYTE),
"AMT" NUMBER
)
The table contains the following data.
REM INSERTING into LEDGER
SET DEFINE OFF;
Insert into LEDGER (FY,FP,FUND,TYPE,AMT) values (15,'03','A','03',1);
Insert into LEDGER (FY,FP,FUND,TYPE,AMT) values (15,'04','A','03',2);
Insert into LEDGER (FY,FP,FUND,TYPE,AMT) values (16,'04','A','03',3);
Insert into LEDGER (FY,FP,FUND,TYPE,AMT) values (12,'05','A','04',6);
Based on the partition of FY, FP, FUND and TYPE, I would like to write a query that keeps a running count from the first FP (although FP is a varchar, it represents a month number, i.e. 2 is February, 3 is March, etc.) up to a hard limit of 14. Looking closer at the data, you will notice that in FY 15 the max period is 04, so I must add another 10 periods to my report to get the full 14 periods. Here is the expected output.
Here is what I tried, but I'm simply stumbling on this altogether.
WITH fy_range AS
(
SELECT MIN (fy) AS min_fy
, MAX (fy) AS max_fy
FROM ledger
),all_fys AS
(
SELECT min_fy + LEVEL - 1 AS fy
FROM fy_range
CONNECT BY LEVEL <= max_fy + 1 - min_fy
)
,all_fps AS
(
SELECT TO_CHAR (LEVEL, 'FM00') AS fp
FROM dual
CONNECT BY LEVEL <= 14
)
SELECT
FUND
,G.TYPE
,G.FY
,G.FP
,LAST_VALUE(G.AMT ignore nulls) OVER (PARTITION BY G.FUND ORDER BY Y.FY, P.FP ) AS AMT
FROM all_fys y
CROSS JOIN all_fps p
LEFT OUTER JOIN LEDGER G PARTITION BY(FUND)
ON g.fy = y.fy
AND g.fp = p.fp;
but I end up with a bunch of nulls and some strange results.
This may not be the most efficient solution, but it is easy to understand and maintain. First (in the most deeply nested subquery) we find the min FP for each combination of FY, FUND and TYPE. Then we use a CONNECT BY query to fill all the FP for all FY, FUND, TYPE combinations (up to the hard upper limit of 14). Then we left-outer-join to the original data in the LEDGER table. So far we densified the data. In the final query (the join) we also add the column for the cumulative sum - that part is easy after we densified the data.
TYPE is an Oracle keyword, so it is probably best not to use it as a column name. It is also best not to use double-quoted table and column names (I had to use upper case everywhere because of that). I also made sure to convert from varchar2 to number and back to varchar2 - we shouldn't rely on implicit conversions.
select S.FY, to_char(S.FP, 'FM09') as FP, S.FUND, S.TYPE,
sum(L.AMT) over (partition by S.FY, S.FUND, S.TYPE order by S.FP) as CUMULATIVE_AMT
from (
select FY, MIN_FP + level - 1 as FP, FUND, TYPE
from (
select FY, min(to_number(FP)) as MIN_FP, FUND, TYPE
from LEDGER
group by FY, FUND, TYPE
)
connect by level <= 15 - MIN_FP
and prior FY = FY
and prior FUND = FUND
and prior TYPE = TYPE
and prior sys_guid() is not null
) S left outer join LEDGER L
on S.FY = L.FY and S.FP = to_number(L.FP) and S.FUND = L.FUND and S.TYPE = L.TYPE
;
Output:
FY FP FUND TYPE CUMULATIVE_AMT
--- --- ---- ---- --------------
12 05 A 04 6
12 06 A 04 6
12 07 A 04 6
12 08 A 04 6
12 09 A 04 6
12 10 A 04 6
12 11 A 04 6
12 12 A 04 6
12 13 A 04 6
12 14 A 04 6
15 03 A 03 1
15 04 A 03 3
15 05 A 03 3
15 06 A 03 3
15 07 A 03 3
15 08 A 03 3
15 09 A 03 3
15 10 A 03 3
15 11 A 03 3
15 12 A 03 3
15 13 A 03 3
15 14 A 03 3
16 04 A 03 3
16 05 A 03 3
16 06 A 03 3
16 07 A 03 3
16 08 A 03 3
16 09 A 03 3
16 10 A 03 3
16 11 A 03 3
16 12 A 03 3
16 13 A 03 3
16 14 A 03 3
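For readers without an Oracle instance, the same densify-then-accumulate idea can be sketched with a recursive CTE in place of CONNECT BY. This is an approximation under stated assumptions, not the answer's query: it runs on SQLite from Python, the CTE names starts and periods are made up for the example, and CAST and printf replace to_number and to_char.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ledger (fy INTEGER, fp TEXT, fund TEXT, type TEXT, amt INTEGER)"
)
conn.executemany(
    "INSERT INTO ledger VALUES (?, ?, ?, ?, ?)",
    [(15, "03", "A", "03", 1), (15, "04", "A", "03", 2),
     (16, "04", "A", "03", 3), (12, "05", "A", "04", 6)],
)

rows = conn.execute(
    """
    WITH RECURSIVE
    starts AS (              -- first period of each FY / FUND / TYPE group
        SELECT fy, fund, type, MIN(CAST(fp AS INTEGER)) AS min_fp
        FROM ledger
        GROUP BY fy, fund, type
    ),
    periods(fy, fund, type, fp) AS (   -- densify: every period up to 14
        SELECT fy, fund, type, min_fp FROM starts
        UNION ALL
        SELECT fy, fund, type, fp + 1 FROM periods WHERE fp < 14
    )
    SELECT p.fy, printf('%02d', p.fp) AS fp, p.fund, p.type,
           SUM(l.amt) OVER (PARTITION BY p.fy, p.fund, p.type
                            ORDER BY p.fp) AS cumulative_amt
    FROM periods p
    LEFT JOIN ledger l
           ON l.fy = p.fy AND CAST(l.fp AS INTEGER) = p.fp
          AND l.fund = p.fund AND l.type = p.type
    ORDER BY p.fy, p.fp
    """
).fetchall()

for row in rows:
    print(row)
```

The filled-in periods have no matching ledger row, so their AMT is NULL and the running SUM simply carries the last total forward.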

SQL find rows in groups where a column has a null and a non-null value

The Data
row ID YEAR PROD STA DATE
01 01 2011 APPLE NEW 2011-11-18 00:00:00.000
02 01 2011 APPLE NEW 2011-11-18 00:00:00.000
03 01 2013 APPLE OLD NULL
04 01 2013 APPLE OLD NULL
05 02 2013 APPLE OLD 2014-04-08 00:00:00.000
06 02 2013 APPLE OLD 2014-04-08 00:00:00.000
07 02 2013 APPLE OLD 2014-11-17 10:50:14.113
08 02 2013 APPLE OLD 2014-11-17 10:46:04.947
09 02 2013 MELON OLD 2014-11-17 11:01:19.657
10 02 2013 MELON OLD 2014-11-17 11:19:35.547
11 02 2013 MELON OLD NULL
12 02 2013 MELON OLD 2014-11-21 10:32:36.017
13 03 2006 APPLE NEW 2007-04-11 00:00:00.000
14 03 2006 APPLE NEW 2007-04-11 00:00:00.000
15 04 2004 APPLE OTH 2004-09-27 00:00:00.000
16 04 2004 APPLE OTH NULL
ROW is not a column in the table; it is only there to show which records I want.
The question
I need to find rows where a group consisting of (ID, YEAR, PROD, STA) has at least one NULL DATE and a non-NULL DATE.
Expected result
From the above dataset this would be rows 9 to 12 and 15 to 16
I'm sitting in front of SSMS and have no idea how to get this. I was thinking about GROUP BY and EXISTS, but I really have no idea.
You can use COUNT ... OVER:
SELECT ID, YEAR, PROD, STA, [DATE]
FROM (
SELECT ID, YEAR, PROD, STA, [DATE],
COUNT(IIF([DATE] IS NULL, 1, NULL)) OVER
(PARTITION BY ID, YEAR, PROD, STA) AS cnt_nulls,
COUNT(IIF([DATE] IS NOT NULL, 1, NULL)) OVER
(PARTITION BY ID, YEAR, PROD, STA) AS cnt_not_nulls
FROM mytable) AS t
WHERE t.cnt_nulls > 0 AND t.cnt_not_nulls > 0
The window version of COUNT is applied twice over ID, YEAR, PROD, STA partitions of data: it returns for every row the population of the current partition. The count is conditionally performed:
the first COUNT counts the number of NULL [Date] values within the partition
the second COUNT counts the number of NOT NULL [Date] values within the partition.
The outer query checks for partitions having a count of at least one for both of the two COUNT functions of the inner query.
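The conditional window counts can be tried out on the question's data with the sketch below. It runs on SQLite from Python rather than SQL Server, so CASE replaces IIF, COUNT(dt) is used as a shorthand for counting non-NULL values, and the column names row_no, yr and dt are assumptions made to avoid reserved words. The row_no column exists only so the result can be compared with the row numbers in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (row_no INTEGER, id TEXT, yr INTEGER, "
    "prod TEXT, sta TEXT, dt TEXT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "01", 2011, "APPLE", "NEW", "2011-11-18 00:00:00.000"),
        (2, "01", 2011, "APPLE", "NEW", "2011-11-18 00:00:00.000"),
        (3, "01", 2013, "APPLE", "OLD", None),
        (4, "01", 2013, "APPLE", "OLD", None),
        (5, "02", 2013, "APPLE", "OLD", "2014-04-08 00:00:00.000"),
        (6, "02", 2013, "APPLE", "OLD", "2014-04-08 00:00:00.000"),
        (7, "02", 2013, "APPLE", "OLD", "2014-11-17 10:50:14.113"),
        (8, "02", 2013, "APPLE", "OLD", "2014-11-17 10:46:04.947"),
        (9, "02", 2013, "MELON", "OLD", "2014-11-17 11:01:19.657"),
        (10, "02", 2013, "MELON", "OLD", "2014-11-17 11:19:35.547"),
        (11, "02", 2013, "MELON", "OLD", None),
        (12, "02", 2013, "MELON", "OLD", "2014-11-21 10:32:36.017"),
        (13, "03", 2006, "APPLE", "NEW", "2007-04-11 00:00:00.000"),
        (14, "03", 2006, "APPLE", "NEW", "2007-04-11 00:00:00.000"),
        (15, "04", 2004, "APPLE", "OTH", "2004-09-27 00:00:00.000"),
        (16, "04", 2004, "APPLE", "OTH", None),
    ],
)

matching = conn.execute(
    """
    SELECT row_no
    FROM (
        SELECT row_no,
               -- NULL dates in the partition
               COUNT(CASE WHEN dt IS NULL THEN 1 END)
                   OVER (PARTITION BY id, yr, prod, sta) AS cnt_nulls,
               -- COUNT(column) skips NULLs, so this counts non-NULL dates
               COUNT(dt)
                   OVER (PARTITION BY id, yr, prod, sta) AS cnt_not_nulls
        FROM mytable
    ) AS t
    WHERE cnt_nulls > 0 AND cnt_not_nulls > 0
    ORDER BY row_no
    """
).fetchall()

print([r[0] for r in matching])
```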

How do I add specific values from one different table to columns in a row based off of values in another table?

In SQL Server I have 2 tables that look like this:
TEST SCRIPT 'a collection of test scripts'
(PK)
ID Description Count
------------------------
A12 Proj/Num/Dev 12
B34 Gone/Tri/Tel 43
C56 Geff/Ben/Dan 03
SCRIPT HISTORY 'the history of the aforementioned scripts'
(FK) (PK)
ScriptID ID Machine Date Time Passes
----------------------------------
A12 01 DEV012 6/26/15 16:54 4
A12 02 DEV596 6/28/15 13:12 9
A12 03 COM199 3/12/14 14:22 10
B34 04 COM199 6/30/13 15:45 12
B34 05 DEV012 6/30/15 13:13 14
B34 06 DEV444 6/12/15 11:14 14
C56 07 COM321 6/29/14 02:19 12
C56 08 ANS042 6/24/14 20:10 18
C56 09 COM432 6/30/15 12:24 4
C56 10 DEV444 4/20/12 23:55 2
In a single query, how would I write a select statement that takes just one entry for each DISTINCT script in TEST SCRIPT and pairs it with the values in only the TOP 1 most recent run time in SCRIPT HISTORY?
For example, the solution to the example tables above would be:
OUTPUT
ScriptID ID Machine Date Time Passes
---------------------------------------------------
A12 02 DEV596 6/28/15 13:12 9
B34 05 DEV012 6/30/15 13:13 14
C56 09 COM432 6/30/15 12:24 4
The way you describe the problem translates almost directly into CROSS APPLY:
select h.*
from testscript ts cross apply
(select top 1 h.*
from history h
where h.scriptid = ts.id
order by h.date desc, h.time desc
) h;
Please try something like this:
select *
from SCRIPT SCR
left join (select MAX(SCRIPT_HISTORY.Date) as Date, SCRIPT_HISTORY.ScriptID
from SCRIPT_HISTORY
group by SCRIPT_HISTORY.ScriptID
) SH on SCR.ID = SH.ScriptID
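Another common way to pick the most recent history row per script is to number the rows inside each script with ROW_NUMBER() and keep only number 1. The sketch below swaps in that technique because SQLite, used here from Python as a stand-in for SQL Server, has neither CROSS APPLY nor TOP. The table names test_script and script_history, the column names run_date and run_time, and the ISO date format are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_script (id TEXT PRIMARY KEY, description TEXT)")
conn.execute(
    "CREATE TABLE script_history (script_id TEXT, id TEXT PRIMARY KEY, "
    "machine TEXT, run_date TEXT, run_time TEXT, passes INTEGER)"
)
conn.executemany(
    "INSERT INTO test_script VALUES (?, ?)",
    [("A12", "Proj/Num/Dev"), ("B34", "Gone/Tri/Tel"), ("C56", "Geff/Ben/Dan")],
)
conn.executemany(
    "INSERT INTO script_history VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("A12", "01", "DEV012", "2015-06-26", "16:54", 4),
        ("A12", "02", "DEV596", "2015-06-28", "13:12", 9),
        ("A12", "03", "COM199", "2014-03-12", "14:22", 10),
        ("B34", "04", "COM199", "2013-06-30", "15:45", 12),
        ("B34", "05", "DEV012", "2015-06-30", "13:13", 14),
        ("B34", "06", "DEV444", "2015-06-12", "11:14", 14),
        ("C56", "07", "COM321", "2014-06-29", "02:19", 12),
        ("C56", "08", "ANS042", "2014-06-24", "20:10", 18),
        ("C56", "09", "COM432", "2015-06-30", "12:24", 4),
        ("C56", "10", "DEV444", "2012-04-20", "23:55", 2),
    ],
)

latest = conn.execute(
    """
    SELECT script_id, id, machine, run_date, run_time, passes
    FROM (
        SELECT h.*,
               -- rn = 1 marks the most recent run of each script
               ROW_NUMBER() OVER (
                   PARTITION BY h.script_id
                   ORDER BY h.run_date DESC, h.run_time DESC
               ) AS rn
        FROM test_script ts
        JOIN script_history h ON h.script_id = ts.id
    )
    WHERE rn = 1
    ORDER BY script_id
    """
).fetchall()

for row in latest:
    print(row)
```

Unlike a join on MAX(Date) alone, this returns the whole history row and uses the time as a tie-breaker for runs on the same day.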

SQL: how to select IDs according to a condition?

I just started to program in SQL and I have a bit of a problem (n.b., I am working with a table that comes from a game). My table is something like this, where ID refers to a single person, H to a certain hour of playing and IF to a certain condition:
ID H IF
01 1 0
01 2 0
01 3 0
02 1 0
02 2 1
03 1 0
03 2 1
03 3 0
03 4 1
In this case player 01 played for three hours, player 02 for two hours and player 03 for four hours. In each of these hours they may or may not have performed an action. If they did, a 1 appears in the IF column.
Now, my question is: how can I query so that I get a table with only the IDs of the people who never performed the action? I do not want to exclude only the rows with IF = 1; I want to exclude all the rows with that ID. In this case the result should be:
01 1 0
01 2 0
01 3 0
Any help?
This should do it.
select *
from table
where Id not in (select Id from table where IF = 1)
SELECT ID FROM Table GROUP BY ID HAVING SUM(IF)=0
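The two queries above return different shapes: the NOT IN version keeps every row of the matching players, while the GROUP BY version returns one row per ID. A small sketch on the question's data shows both. It uses SQLite from Python, and the names plays and acted are hypothetical replacements for the reserved words TABLE and IF.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plays (id TEXT, h INTEGER, acted INTEGER)")
conn.executemany(
    "INSERT INTO plays VALUES (?, ?, ?)",
    [
        ("01", 1, 0), ("01", 2, 0), ("01", 3, 0),
        ("02", 1, 0), ("02", 2, 1),
        ("03", 1, 0), ("03", 2, 1), ("03", 3, 0), ("03", 4, 1),
    ],
)

# Every row of the players who never have acted = 1.
all_rows = conn.execute(
    """
    SELECT id, h, acted
    FROM plays
    WHERE id NOT IN (SELECT id FROM plays WHERE acted = 1)
    ORDER BY id, h
    """
).fetchall()

# Only the IDs of those players, one row each.
ids_only = conn.execute(
    "SELECT id FROM plays GROUP BY id HAVING SUM(acted) = 0"
).fetchall()

print(all_rows)
print(ids_only)
```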