SQL CONNECT BY clause - generate all data by dates

The data in my table is stored by effective date. Can you please help me with an Oracle SQL statement that replicates the 8/1 data onto 8/2 through 8/6, and repeats the 8/7 values thereafter?
DATE      VALUE1  VALUE2
8/1/2017  x       1
8/1/2017  x       2
8/7/2017  y       4
8/7/2017  x       3
Desired output:
DATE      VALUE1  VALUE2
8/1/2017  x       1
8/1/2017  x       2
8/2/2017  x       1
8/2/2017  x       2
... repeat to 8/6
8/7/2017  y       4
8/7/2017  x       3
8/8/2017  y       4
8/8/2017  x       3
... repeat to sysdate - 1

Here is one way to do this. It's not the most elegant or efficient, but it is the most elementary way I could think of (short of really inefficient things like correlated subqueries which can't be unwound easily to joins).
In the first subquery, aliased as a, I create all the needed dates. In the second subquery, b, I create the date ranges for which we will need to repeat specific rows (in the test data, I allow the number of rows which must be repeated to be variable, to make one of the subtleties of the problem more evident).
With these in hand, it's easy to get the result by joining these two subqueries and the original data. Alas, this approach requires reading the base table three times; hopefully you don't have too much data to process.
with
inputs ( dt, val1, val2 ) as (
select date '2017-08-14', 'x', 1 from dual union all
select date '2017-08-14', 'x', 2 from dual union all
select date '2017-08-17', 'y', 4 from dual union all
select date '2017-08-17', 'x', 3 from dual union all
select date '2017-08-19', 'a', 5 from dual
)
-- End of simulated inputs (for testing purposes only, not part of the solution).
-- Use your actual table and column names in the SQL query below.
select a.dt, i.val1, i.val2
from (
select min_dt + level - 1 as dt
from ( select min(dt) as min_dt from inputs )
connect by level <= sysdate - min_dt
) a
join
(
select dt, lead(dt, 1, sysdate) over (order by dt) as lead_dt
from (select distinct dt from inputs)
) b
on a.dt >= b.dt and a.dt < b.lead_dt
join
inputs i on i.dt = b.dt
order by dt, val1, val2
;
Output:
DT VAL1 VAL2
---------- ---- ----
2017-08-14 x 1
2017-08-14 x 2
2017-08-15 x 1
2017-08-15 x 2
2017-08-16 x 1
2017-08-16 x 2
2017-08-17 x 3
2017-08-17 y 4
2017-08-18 x 3
2017-08-18 y 4
2017-08-19 a 5
2017-08-20 a 5
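If you want to experiment with the date-spine and range-join idea outside Oracle, here is a sketch of the same approach in SQLite, driven from Python's sqlite3 (window functions need SQLite 3.25+). A recursive CTE stands in for CONNECT BY LEVEL, a fixed end date stands in for sysdate, and the table and column names (inputs, dt, val1, val2) follow the answer above.

```python
# Sketch: date spine (a) + per-date ranges (b) + join back to the data,
# mirroring the Oracle query above. Assumes SQLite >= 3.25 for lead().
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE inputs (dt TEXT, val1 TEXT, val2 INTEGER);
INSERT INTO inputs VALUES
  ('2017-08-14','x',1), ('2017-08-14','x',2),
  ('2017-08-17','y',4), ('2017-08-17','x',3),
  ('2017-08-19','a',5);
""")

rows = con.execute("""
WITH RECURSIVE
  -- a: one row per calendar day from min(dt) up to a fixed end date
  a(dt) AS (
    SELECT min(dt) FROM inputs
    UNION ALL
    SELECT date(dt, '+1 day') FROM a WHERE dt < '2017-08-20'
  ),
  -- b: for each distinct dt, the half-open range [dt, next dt)
  b AS (
    SELECT dt, lead(dt, 1, '9999-12-31') OVER (ORDER BY dt) AS lead_dt
    FROM (SELECT DISTINCT dt FROM inputs)
  )
SELECT a.dt, i.val1, i.val2
FROM a
JOIN b ON a.dt >= b.dt AND a.dt < b.lead_dt
JOIN inputs i ON i.dt = b.dt
ORDER BY a.dt, i.val1, i.val2
""").fetchall()

for r in rows:
    print(r)
```

Each of the seven generated days picks up every row of its covering effective date, so the two 8/14 rows repeat through 8/16, and so on.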

You want to make use of the LAST_VALUE analytic function, something like this:
select
fakedate,
CASE
WHEN flip=1 THEN
LAST_VALUE(yourvalue1rown1 IGNORE NULLS) OVER(ORDER BY fakedate)
ELSE
LAST_VALUE(yourvalue1rown2 IGNORE NULLS) OVER(ORDER BY fakedate)
END as lastvalue1,
CASE
WHEN flip=1 THEN
LAST_VALUE(yourvalue2rown1 IGNORE NULLS) OVER(ORDER BY fakedate)
ELSE
LAST_VALUE(yourvalue2rown2 IGNORE NULLS) OVER(ORDER BY fakedate)
END as lastvalue2
from (
select
fakedate, flip,
CASE WHEN rown = 1 THEN yourvalue1 END as yourvalue1rown1,
CASE WHEN rown = 2 THEN yourvalue1 END as yourvalue1rown2,
CASE WHEN rown = 1 THEN yourvalue2 END as yourvalue2rown1,
CASE WHEN rown = 2 THEN yourvalue2 END as yourvalue2rown2
from
(select (sysdate - 100) + trunc(rownum/2) fakedate, mod(rownum, 2)+1 as flip from dual connect by level <= 100) fakedates
left outer join
(select yt.*, row_number() over(partition by yourdate order by yourvalue1) as rown from yourtable yt) yourtable
on
fakedate = yourdate and flip = rown
)
You'll have to adjust the column names to match your table. You'll also have to adjust the 100 to reflect how many days back you need to go to get to the start of your date data.
Please note this is untested (SQLFiddle is having some Oracle issues for me at the moment), so if you get any syntax errors or other minor things you can't fix, comment and I'll address them.

Related

multiple top n aggregates query defined as a view (or function)?

I couldn't find a past question exactly like this problem. I have an orders table, containing a customer id, order date, and several numeric columns (how many of a particular item were ordered on that date). Removing some of the numerics, it looks like this:
customer_id date a b c d
0001 07/01/22 0 3 3 5
0001 07/12/22 12 0 50 0
0002 06/30/22 5 65 0 30
0002 07/20/22 1 0 19 2
0003 08/01/22 0 0 99 0
I need to sum each numeric column by customer_id, then return the top n customers for each column. Obviously that means a single customer may appear multiple times, once for each column. Assuming top 2, the desired output would look something like this:
column_ranked customer_id sum rank
'a' 001 12 1
'a' 002 6 2
'b' 002 65 1
'b' 001 3 2
'c' 003 99 1
'c' 001 53 2
'd' 002 30 1
'd' 001 5 2
(this assumes no date range filter)
My first thought was a CTE to collapse the table into its per-customer sums, then a union from the CTE, with a limit n clause, once for each summed column. That works if the date range is hard-coded into the CTE .... but I want to define this as a view, so it can be called by users something like this:
SELECT * from top_customers_view WHERE date_range BETWEEN ( date1 and date2 )
How can I pass the date restriction down to the CTE? Or am I taking the wrong approach entirely? If a view isn't possible, can it be done as a function? (without using a costly cursor, that is.)
Since the date ranges clearly produce a massive number of combinations, you cannot generate a view with them. You can write a query, however, as shown below:
with
p as (select cast ('2022-01-01' as date) as ds, cast ('2022-12-31' as date) as de),
a as (
select top 10 customer_id, 'a' as col, sum(a) as s
from t cross join p where date between ds and de
group by customer_id order by s desc
),
b as (
select top 10 customer_id, 'b' as col, sum(b) as s
from t cross join p where date between ds and de
group by customer_id order by s desc
),
c as (
select top 10 customer_id, 'c' as col, sum(c) as s
from t cross join p where date between ds and de
group by customer_id order by s desc
),
d as (
select top 10 customer_id, 'd' as col, sum(d) as s
from t cross join p where date between ds and de
group by customer_id order by s desc
)
select * from a
union all select * from b
union all select * from c
union all select * from d
order by customer_id, col, s desc
The date range is in the second line.
See db<>fiddle.
Alternatively, you could create a data warehousing solution, but it would require much more effort to make it work.
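As a runnable illustration of the "one ranked block per column, then UNION ALL" pattern, here is a sketch in SQLite via Python's sqlite3 (SQLite has no TOP, so rank() does the limiting). The table name orders is made up for the example, and the date range is passed as ordinary bind parameters - the same role a view can't play and a function parameter could.

```python
# Sketch: sum per customer once, then rank each column and union the
# ranked blocks. Column names a/b/c/d follow the question's sample.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders (customer_id TEXT, date TEXT, a INT, b INT, c INT, d INT);
INSERT INTO orders VALUES
  ('001','2022-07-01',0,3,3,5),
  ('001','2022-07-12',12,0,50,0),
  ('002','2022-06-30',5,65,0,30),
  ('002','2022-07-20',1,0,19,2),
  ('003','2022-08-01',0,0,99,0);
""")

sql = """
WITH sums AS (
  SELECT customer_id, sum(a) AS a, sum(b) AS b, sum(c) AS c, sum(d) AS d
  FROM orders
  WHERE date BETWEEN :ds AND :de      -- the date range is a bind parameter
  GROUP BY customer_id
),
ranked AS (
  SELECT 'a' AS col, customer_id, a AS s,
         rank() OVER (ORDER BY a DESC) AS rnk FROM sums
  UNION ALL
  SELECT 'b', customer_id, b, rank() OVER (ORDER BY b DESC) FROM sums
  UNION ALL
  SELECT 'c', customer_id, c, rank() OVER (ORDER BY c DESC) FROM sums
  UNION ALL
  SELECT 'd', customer_id, d, rank() OVER (ORDER BY d DESC) FROM sums
)
SELECT col, customer_id, s, rnk FROM ranked
WHERE rnk <= 2
ORDER BY col, rnk
"""
rows = con.execute(sql, {"ds": "2022-01-01", "de": "2022-12-31"}).fetchall()
for r in rows:
    print(r)
```

Note that with the full sample data customer 002's d-sum comes out as 32 (30 + 2), not the 30 shown in the question's desired output.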

How to find the last non null value of a column and recursively find the sum value of another column

Suppose I have a column A, and the currently fetched value of A is null. I need to go back to previous rows and find the non-null value of column A. Then I need to find the sum of another column B from the point the non-null value is seen up to the current point. After that I need to add the sum of B to A, which will be the new value of A.
For finding the column A non null value I have written the query as
nvl(last_value(nullif(A,0)) ignore nulls over (order by A),0)
But I need to do the calculation of B as mentioned above.
Can anyone please help me out ?
Sample data
A B date
null 20 14/06/2019
null 40 13/06/2019
10 50 12/06/2019
Here the value of A on 14/06/2019 should be replaced by the sum of B plus the value of A on 12/06/2019 (which is the first non-null value of A): 20 + 40 + 50 + 10 = 120.
If you have version 12c or higher:
with t( A,B, dte ) as
(
select null, 20, date'2019-06-14' from dual union all
select null, 40, date'2019-06-13' from dual union all
select 10 ,50, date'2019-06-12' from dual
)
select * from t
match_recognize(
order by dte desc
measures
nvl(
first(a),
y.a + sum(b)
) as a,
first(b) as b,
first(dte) as dte
after match skip to next row
pattern(x* y{0,1})
define x as a is null,
y as a is not null
);
A    B   DTE
---- --- ----------
120  20  2019-06-14
100  40  2019-06-13
10   50  2019-06-12
Use a conditional count to divide the data into separate groups, then use this group for the analytic calculation:
select a, b, dt, grp, sum(nvl(a, 0) + nvl(b, 0)) over (partition by grp order by dt) val
from (
select a, b, dt, count(case when a is not null then 1 end) over (order by dt) grp
from t order by dt desc)
order by dt desc
Sample result:
A   B   DT          GRP  VAL
--- --- ----------- ---- ----
    20  2019-06-14  4    120
    40  2019-06-13  4    100
10  50  2019-06-12  4    60
5   2   2019-06-11  3    7
6   1   2019-06-10  2    7
    3   2019-06-09  1    14
7   4   2019-06-08  1    11
demo
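The conditional-count trick above is easy to verify outside Oracle; here is a sketch of the same two-level query in SQLite via Python's sqlite3 (nvl becomes coalesce), using only the three sample rows from the question.

```python
# Sketch: count of non-null A values (ordered by date) assigns a group id;
# a running sum of A+B within each group rebuilds the value of A.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t (a INT, b INT, dt TEXT);
INSERT INTO t VALUES
  (NULL, 20, '2019-06-14'),
  (NULL, 40, '2019-06-13'),
  (10,   50, '2019-06-12');
""")

rows = con.execute("""
SELECT a, b, dt, grp,
       sum(coalesce(a,0) + coalesce(b,0))
         OVER (PARTITION BY grp ORDER BY dt) AS val
FROM (
  SELECT a, b, dt,
         count(CASE WHEN a IS NOT NULL THEN 1 END)
           OVER (ORDER BY dt) AS grp
  FROM t
)
ORDER BY dt DESC
""").fetchall()
for r in rows:
    print(r)
```

The 12th anchors the group, so the running sum climbs 60 → 100 → 120 as the dates advance, matching the Oracle output.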
I think what you want can be handled by using sum(<column>) over (...) together with the last_value(...) over (...) function, as below:
with t( A,B, "date" ) as
(
select null, 20, date'2019-06-14' from dual union all
select null, 40, date'2019-06-13' from dual union all
select 10 ,50, date'2019-06-12' from dual
)
select nvl(a,sum(b) over (order by 1)+
last_value(a) ignore nulls
over (order by 1 desc)
) as a,
b, "date"
from t;
A B date
--- -- ----------
120 20 14.06.2019
120 40 13.06.2019
10 50 12.06.2019
Demo

Counting blocks of continuous sequences in SQL

Let's suppose this situation:
CAR TIME
A 1300
A 1301
A 1302
A 1315
A 1316
A 1317
A 1319
A 1320
B 1321
B 1322
I'd like to generate another column, enumerating each trip made by each car.
We consider there's a new trip every time we get a discontinuity in TIME.
CAR TIME TRIP
A 1300 1
A 1301 1
A 1302 1
A 1315 2
A 1316 2
A 1317 2
A 1319 3
A 1320 3
B 1321 1
B 1322 1
Is there some SQL function to obtain this count?
Thanks in advance.
You seem to want a cumulative approach:
select t.*, dense_rank() over (partition by car order by grp1) as trp
from (select t.*, sum(case when grp > 1 then 1 else 0 end) over (partition by car order by time) as grp1
from (select t.*, coalesce((time - lag(time) over (partition by car order by time)), 1) as grp
from your_table t
) t
) t;
I would use row_number() . . . and - to define the groups. Then, dense_rank():
select t.*,
dense_rank() over (partition by car order by time - seqnum) as trip
from (select t.*, row_number() over (partition by car order by time) as seqnum
from t
) t;
I cannot readily think of any alternative that uses fewer than 2 window functions -- or that would likely be faster using joins and group bys.
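The time - row_number() trick is worth seeing end to end: within a consecutive run, both values step by 1, so their difference is constant, and dense_rank() over that difference numbers the trips. Here is a sketch in SQLite via Python's sqlite3 (window functions need SQLite 3.25+), with a made-up table name trips and the question's sample data.

```python
# Sketch of the gaps-and-islands trip numbering from the answer above.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE trips (car TEXT, time INT);
INSERT INTO trips VALUES
  ('A',1300),('A',1301),('A',1302),
  ('A',1315),('A',1316),('A',1317),
  ('A',1319),('A',1320),
  ('B',1321),('B',1322);
""")

rows = con.execute("""
SELECT car, time,
       dense_rank() OVER (PARTITION BY car ORDER BY time - seqnum) AS trip
FROM (
  SELECT car, time,
         row_number() OVER (PARTITION BY car ORDER BY time) AS seqnum
  FROM trips
)
ORDER BY car, time
""").fetchall()
for r in rows:
    print(r)
```

Car A's three runs get differences 1299, 1311 and 1312, which dense_rank() turns into trips 1, 2 and 3; car B restarts at trip 1.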
Here is how I'd solve this problem:
with grp as (
select row_number() over (partition by CAR order by TIME) rn, a.CAR, a.TIME
from test a
where not exists (select * from test b
where a.CAR=b.CAR
and to_date(b.TIME, 'YYYYmmDDHH24MI')+1/(24*60) = to_date(a.TIME, 'YYYYmmDDHH24MI'))
)
select t.CAR, t.TIME, (
select max(rn) from grp where t.CAR=grp.CAR and grp.TIME <= t.TIME
) as trip
from test t
The main idea is to select the start time of each trip (this is done in the CTE grp), then use the row number as the trip identifier.
Sample fiddle http://sqlfiddle.com/#!4/6a327/10
Another approach:
SELECT t.car, t.time, MIN(t3.time)
FROM test t, test t3
WHERE NOT EXISTS (SELECT 1
FROM test t2
WHERE t2.car = t.car
AND t2.time = t.time - 1)
AND t3.car = t.car
AND t3.time >= t.time
AND NOT EXISTS (SELECT 1
FROM test t4
WHERE t4.car = t3.car
AND t4.time = t3.time + 1)
GROUP BY t.car, t.time
ORDER BY 1, 2;
The first not-exists finds all the rows that don't have a row for the same car in the previous minute - that is to say, those rows who begin a period for a car.
The latter NOT EXISTS gets the set of rows that do not have a following row for the same car - i.e. rows that end a period. The MIN function finds the least of these (which are also filtered to be greater than or equal to the start of the period in question).
Combining some of the other ideas, including trips crossing an hour boundary but without converting to a date (in case that is significantly slowing things down), and allowing for repeated times in the same trip:
-- CTE for sample data
with your_table (car, time) as (
select 'A', 201808151259 from dual -- extra row to go across hour
union all select 'A', 201808151300 from dual
union all select 'A', 201808151301 from dual
union all select 'A', 201808151302 from dual
union all select 'A', 201808151315 from dual
union all select 'A', 201808151316 from dual
union all select 'A', 201808151317 from dual
union all select 'A', 201808151319 from dual
union all select 'A', 201808151319 from dual -- extra row for duplicate time
union all select 'A', 201808151320 from dual
union all select 'B', 201808151321 from dual
union all select 'B', 201808151322 from dual
)
-- actual query
select car,
time,
dense_rank() over (partition by car order by trip_start) as trip
from (
select car,
time,
max(case when lag_time = time
or lag_time = time - case when mod(time, 100) = 00 then 41 else 1 end
then null else time end
) over (partition by car order by time) as trip_start
from (
select car,
time,
lag(time) over (partition by car order by time) as lag_time
from your_table
)
)
order by car, time;
which gets
CAR TIME TRIP
--- ------------ ------------
A 201808151259 1
A 201808151300 1
A 201808151301 1
A 201808151302 1
A 201808151315 2
A 201808151316 2
A 201808151317 2
A 201808151319 3
A 201808151319 3
A 201808151320 3
B 201808151321 1
B 201808151322 1
The innermost query just gets the original data and the previous time value for each row using lag().
The next query out finds the trip start by treating duplicate and adjacent times - including over an hour boundary, via the nested case expression - as null, and then finding the highest value so far, which ignores the just-generated nulls by default. All contiguous runs of times end up with the same trip-start time:
select car,
time,
max(case when lag_time = time
or lag_time = time - case when mod(time, 100) = 00 then 41 else 1 end
then null else time end
) over (partition by car order by time) as trip_start
from (
select car,
time,
lag(time) over (partition by car order by time) as lag_time
from your_table
)
order by car, time;
CAR TIME TRIP_START
--- ------------ ------------
A 201808151259 201808151259
A 201808151300 201808151259
A 201808151301 201808151259
A 201808151302 201808151259
A 201808151315 201808151315
A 201808151316 201808151315
A 201808151317 201808151315
A 201808151319 201808151319
A 201808151319 201808151319
A 201808151320 201808151319
B 201808151321 201808151321
B 201808151322 201808151321
The outermost query then uses dense_rank() to give the trips consecutive numbering based on their trip-start times.

How to count most consecutive occurrences of a value in a Column in SQL Server

I have a table Attendance in my database.
Date | Present
------------------------
20/11/2013 | Y
21/11/2013 | Y
22/11/2013 | N
23/11/2013 | Y
24/11/2013 | Y
25/11/2013 | Y
26/11/2013 | Y
27/11/2013 | N
28/11/2013 | Y
I want to count the most consecutive occurrence of a value Y or N.
For example in the above table Y occurs 2, 4 & 1 times. So I want 4 as my result.
How to achieve this in SQL Server?
Any help will be appreciated.
Try this:-
Subtracting the row number from the date leaves a constant value within each consecutive run:
Select max(Sequence)
from
(
select present ,count(*) as Sequence,
min(date) as MinDt, max(date) as MaxDt
from (
select t.Present,t.Date,
dateadd(day,
-(row_number() over (partition by present order by date))
,date
) as grp
from Table1 t
) t
group by present, grp
)a
where Present ='Y'
SQL FIDDLE
You can do this with a recursive CTE:
;WITH cte AS (SELECT Date,Present,ROW_NUMBER() OVER(ORDER BY Date) RN
FROM Table1)
,cte2 AS (SELECT Date,Present,RN,ct = 1
FROM cte
WHERE RN = 1
UNION ALL
SELECT a.Date,a.Present,a.RN,ct = CASE WHEN a.Present = b.Present THEN ct + 1 ELSE 1 END
FROM cte a
JOIN cte2 b
ON a.RN = b.RN+1)
SELECT TOP 1 *
FROM cte2
ORDER BY CT DESC
Demo: SQL Fiddle
Note, the dates in the demo were altered due to the format in which you posted them in your question.
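For reference, the date-minus-row-number technique from the first answer can be checked in SQLite via Python's sqlite3: DATEADD(day, -n, date) becomes date(d, '-n days'). The table name attendance and the ISO date strings are assumptions for the sketch.

```python
# Sketch: rows in the same consecutive run of 'Y' share the same
# date - row_number() value; count per group, then take the max.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE attendance (d TEXT, present TEXT);
INSERT INTO attendance VALUES
  ('2013-11-20','Y'),('2013-11-21','Y'),('2013-11-22','N'),
  ('2013-11-23','Y'),('2013-11-24','Y'),('2013-11-25','Y'),
  ('2013-11-26','Y'),('2013-11-27','N'),('2013-11-28','Y');
""")

(longest,) = con.execute("""
SELECT max(cnt)
FROM (
  SELECT present, grp, count(*) AS cnt
  FROM (
    SELECT present,
           date(d, '-' || row_number() OVER
                (PARTITION BY present ORDER BY d) || ' days') AS grp
    FROM attendance
  )
  GROUP BY present, grp
)
WHERE present = 'Y'
""").fetchone()

print(longest)
```

The Y runs have lengths 2, 4 and 1, so the query returns 4, as in the question.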

SQL: changing table (ID, [Datetime], [INT]) to (ID, Start_DT, End_DT, [INT])

I have some old data in this format:
ID DT NUM
1 6-1-2012 2
1 6-2-2012 2
1 6-3-2012 4
1 6-4-2012 4
1 6-5-2012 8
1 6-6-2012 8
1 6-7-2012 8
1 6-8-2012 16
1 6-9-2012 2
1 6-10-2012 2
And I need it to look like this:
ID START_DT END_DT NUM
1 6-1-2012 6-2-2012 2
1 6-3-2012 6-4-2012 4
1 6-5-2012 6-7-2012 8
1 6-8-2012 6-8-2012 16
1 6-9-2012 6-10-2012 2
This is the best example of the data that I could quickly come up with. I would love to clarify if I accidentally included some misunderstanding(s) in it.
The Rules:
ID: this does change, and it will be grouped on eventually; to make things easy it stays the same in my example
DT: I get one original datetime; in the real data the time part does vary
START_DT, END_DT: I need to get these columns out of the original DT
NUM: this is just an integer that changes and can reoccur per ID
EDIT: this is very awkward... (there MUST be a better answer)... I haven't tested this yet with a lot of conditions, but it looks okay from the start... and I had to manually find and replace all the field names (be kind):
select * from (
select *,row_number() over (partition by id, [z.dt] order by id, [y.dt]) as rownum
from (
select y.id,
y.dt as [y.dt],
z.dt as [z.dt],
y.num
from #temp as y
outer apply (select top 1 id, dt, num
from #temp as x
where x.id = y.id and
x.dt > y.dt and
x.num <> y.num
order by x.dt asc) as z ) as x ) as k
where rownum=1
order by [y.dt]
select id,min(dt) as start_date, max(dt) as end_date, num
from whatevertablename_helps_if_you_supply_these_when_asking_for_code
group by 1,4
It's also possible to do it as a subquery to get the min and a subquery to get the max, but don't think you need to do that here.
My answer is Postgres...I think you'll need to change the group by statement to be id,num instead in t-sql.
Adding:
How do you know that it is
1 6-1-2012 6-2-2012 2
1 6-9-2012 6-10-2012 2
and not
1 6-1-2012 6-10-2012 2
1 6-2-2012 6-9-2012 2
You need more business rules to determine that
select id, [y.dt] as start_dt, [z.dt] as end_dt, num from (
select *,row_number() over (partition by id, [z.dt] order by id, [y.dt]) as rownum
from (
select y.id,
y.dt as [y.dt],
z.dt as [z.dt],
y.num
from #temp as y
outer apply (select top 1 id, dt, num
from #temp as x
where x.id = y.id and
x.dt > y.dt and
x.num <> y.num
order by x.dt asc) as z ) as x ) as k
where rownum=1
order by id, [y.dt]
and that gives us... (with different data)
id start_dt end_dt num
6 2011-10-01 00:00:00.000 2012-01-18 00:00:00.000 896
6 2012-01-18 00:00:00.000 2012-02-01 00:00:00.000 864
6 2012-02-01 00:00:00.000 NULL 896
i posted that up at the top about an hour ago maybe...? and said it was awkward (and sloppy)... i was wondering if anyone has a better answer because mine sucks. but i don't understand why people keep posting that they need better business rules and need to know how to handle certain situations. this code does exactly what i want except end_dt is the datetime of the new num and not the last occurrence of the current num... but I can work with that. It is better than nothing. (sorry, frustrated).
Business rule: the data is already there. it should show the logical span. I need the start_dt and end_dt for num... When NUM = Y, the Start date is when NUM changes from X to Y and the End Date is when Y changes to Z. I can't give you more than I have myself with all of this... These rules were enough for me...??
ok, same data:
id start_dt end_dt num
1 6-1-2012 6-3-2012 2
1 6-3-2012 6-5-2012 4
1 6-5-2012 6-8-2012 8
1 6-8-2012 6-9-2012 16
1 6-9-2012 NULL 2
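For what it's worth, the classic gaps-and-islands pattern gives exactly the desired START_DT/END_DT output without OUTER APPLY: take two ROW_NUMBER()s, one per ID and one per ID/NUM; their difference is constant within each run of NUM, so grouping on it and taking MIN/MAX of the date collapses each run to a range. A sketch in SQLite via Python's sqlite3 (the table name old_data is made up), using the question's sample data:

```python
# Sketch: difference of two row numbers identifies each run of NUM,
# and MIN/MAX of the date per run yields the start/end range.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE old_data (id INT, dt TEXT, num INT);
INSERT INTO old_data VALUES
  (1,'2012-06-01',2),(1,'2012-06-02',2),
  (1,'2012-06-03',4),(1,'2012-06-04',4),
  (1,'2012-06-05',8),(1,'2012-06-06',8),(1,'2012-06-07',8),
  (1,'2012-06-08',16),
  (1,'2012-06-09',2),(1,'2012-06-10',2);
""")

rows = con.execute("""
SELECT id, min(dt) AS start_dt, max(dt) AS end_dt, num
FROM (
  SELECT id, dt, num,
         row_number() OVER (PARTITION BY id ORDER BY dt)
       - row_number() OVER (PARTITION BY id, num ORDER BY dt) AS grp
  FROM old_data
)
GROUP BY id, num, grp
ORDER BY start_dt
""").fetchall()
for r in rows:
    print(r)
```

This keeps the two runs of NUM = 2 apart (6-1 to 6-2 versus 6-9 to 6-10), which is exactly the ambiguity the commenters were asking about, and END_DT is the last occurrence of the current NUM rather than the first of the next.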