SQL - delimiting + matching - sql

I have a table with the following columns:
reportid, reportname, startdate, consolidated
Reports which are consolidated do not have a start date.
What I need to do is to find the earliest start date within the subreports and set it as the start date
For example
report  reportname  startdate  consolidated
1       ABC         2019/1/1   1
2       DEF                    3,4
3       GHI         2019/4/1   3
4       JKF         2019/5/1   4
A report may be consolidated from any number of reports (e.g. report 10 may consist of 11, 12 and 13, while report 20 may consist of only 21 and 22).
Output required
report  reportname  startdate  consolidated
1       ABC         2019/1/1   1
2       DEF         2019/4/1   3,4
3       GHI         2019/4/1   3
4       JKF         2019/5/1   4
I can only think of pulling out each number and looping through the entire list, comparing each date as I go. However, the list is very long, which makes that approach impractical.
Thanks in advance!
Unfortunately, I do not have the authority to adjust the database where these tables are concerned.

You need to fix your data model. Storing multiple values in a string is wrong. Storing numbers in a string is wrong.
Sometimes, we are stuck with other people's really, really bad decisions. You can do what you want, but it is more complicated than necessary:
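select t.report, t.reportname,
       coalesce(t.startdate, m.min_startdate) as startdate, -- keep an existing start date
       t.consolidated
from t outer apply
     (select min(t2.startdate) as min_startdate -- earliest start date among the subreports
      from string_split(t.consolidated, ',') s join
           t t2
           on t2.report = s.value -- s.value is a string, converted implicitly
     ) m;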
Here is a db<>fiddle.
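To check the expected output outside the database, the same logic can be sketched in plain Python. This is only an illustration under assumptions: the `reports` dictionary is a hypothetical in-memory copy of the sample rows from the question, and `imputed_startdate` is a made-up helper name, not part of any library.

```python
from datetime import date

# Hypothetical in-memory copy of the sample table from the question.
reports = {
    1: {"reportname": "ABC", "startdate": date(2019, 1, 1), "consolidated": "1"},
    2: {"reportname": "DEF", "startdate": None, "consolidated": "3,4"},
    3: {"reportname": "GHI", "startdate": date(2019, 4, 1), "consolidated": "3"},
    4: {"reportname": "JKF", "startdate": date(2019, 5, 1), "consolidated": "4"},
}


def imputed_startdate(report_id, table):
    """Return the report's own start date, or the earliest one among its subreports."""
    row = table[report_id]
    if row["startdate"] is not None:
        return row["startdate"]
    # Split the delimited string into integer report ids.
    ids = [int(part) for part in row["consolidated"].split(",") if part.strip()]
    # Collect the start dates of the subreports that exist and have a date.
    dates = [
        table[i]["startdate"]
        for i in ids
        if i in table and table[i]["startdate"] is not None
    ]
    return min(dates) if dates else None


for rid in sorted(reports):
    print(rid, reports[rid]["reportname"], imputed_startdate(rid, reports))
```

Each report is looked up once per subreport id, so the work grows with the total number of ids in the consolidated column rather than with the square of the table size.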

Related

MS Access Join Queries without common field

I have an MS Access database that is used for monitoring my investment portfolio.
I have a series of individual queries which monitor individual stock performance at 1 Week, 1 Month, 3 Month, 6 Month intervals etc. and return the top 5 values. What I now want to do is merge these into a table in a format similar to the following. The tickers shown for each period aren't necessarily the same, or in the same order.
Ticker1 - Gain1Week | Ticker2 - Gain1Month | Ticker3 - Gain3Month | Ticker4 - Gain6Month
Top Reading for wk Top Reading for Mth etc.
2nd Top Reading wk
etc.
Here is a sample of the individual query for Ticker2 - Gain 1Month
SELECT TOP 5 qryStockOverview01Month.StockID, Round(([CurrentPrice]-[1Month])/[1Month]*100,2) AS Gain01Month
FROM qryStockOverview01Month LEFT JOIN qryStockOverviewNow ON qryStockOverview01Month.StockID = qryStockOverviewNow.StockID
ORDER BY Round(([CurrentPrice]-[1Month])/[1Month]*100,2) DESC;
What I am unsure of is how to join the queries together given there is no common field. I did try to put an autonumber field (1 to 5) in each query, thinking I could use that as the common field, but I couldn't get that to work.

How can I return rows which match on some columns and fulfil a DateTime comparison between two other columns using SQL?

I have a table which contains rows for jobs, example below, where 01/01/1980 is used rather than null in the ClosedDate column for jobs which are not finished:
JobNumber JobCategory CustomerID CreatedDate ClosedDate
1 Small 1 01/01/2016 03/01/2016
2 Small 2 03/01/2016 07/01/2016
3 Large 2 06/01/2016 07/01/2016
4 Medium 1 08/01/2016 10/01/2016
5 Small 3 10/01/2016 01/01/1980
6 Medium 3 15/01/2016 01/01/1980
7 Large 2 16/01/2016 17/01/2016
8 Large 2 19/01/2016 20/01/2016
9 Small 1 19/01/2016 01/01/1980
10 Medium 2 19/01/2016 01/01/1980
I need to return a list of any jobs where the same customer has had a job of the same category created within 3 days of the previous job being closed.
So, I would want to return:
7 Large 2 16/01/2016 17/01/2016
8 Large 2 19/01/2016 20/01/2016
because Customer 2 had a Large job closed on 17/01/2016 and another Large job opened on 19/01/2016, which is within 3 days.
In order to do this, I assume I need to compare each record in the table with each subsequent record, looking for a match on JobCategory and comparing CreatedDate with ClosedDate between rows.
Can anyone advise my best option for this using SQL? I'm using SQL Server 2012.
The first thing that you should do is get rid of "magic dates" in your system. If the job hasn't been closed yet then the ClosedDate is not known. SQL has a value for exactly that - NULL. That prevents anyone in the future from having to know the magic date of 1/1/1980 or from that having to be hard-coded throughout your system.
Next, you don't have to compare each row with each one after it. Define what you're looking for and find matches that meet those qualifications. You didn't specify which SQL product you're using (you should tag your question with Oracle or MySQL or SQL Server), so the below query is written for SQL Server. Your version might have different date functions.
SELECT
    J1.JobNumber,
    J1.JobCategory,
    J1.CustomerID,
    J1.CreatedDate,
    J1.ClosedDate,
    J2.JobNumber,
    J2.CreatedDate,
    J2.ClosedDate
FROM
    Jobs J1
INNER JOIN Jobs J2 ON
    J2.CustomerID = J1.CustomerID AND
    J2.JobCategory = J1.JobCategory AND
    DATEDIFF(DAY, J1.ClosedDate, J2.CreatedDate) BETWEEN 0 AND 3 AND
    J2.JobNumber <> J1.JobNumber
This will return the jobs in a single row instead of two rows. If that's a problem then the query could be altered slightly to do so. This can also be done a little more easily with windowed functions, but again, since you didn't specify your SQL vendor I didn't want to use those.
Since you're using SQL Server, you should be able to use windowed functions like so:
;WITH CTE_JobsWithDates AS -- Probably a poor name for the CTE
(
    SELECT
        JobNumber,
        JobCategory,
        CustomerID,
        CreatedDate,
        ClosedDate,
        LEAD(CreatedDate, 1) OVER (PARTITION BY JobCategory, CustomerID ORDER BY CreatedDate) AS NextCreatedDate,
        LAG(ClosedDate, 1) OVER (PARTITION BY JobCategory, CustomerID ORDER BY CreatedDate) AS PreviousClosedDate
    FROM
        Jobs
)
SELECT
    JobNumber,
    JobCategory,
    CustomerID,
    CreatedDate,
    ClosedDate
FROM
    CTE_JobsWithDates
WHERE
    DATEDIFF(DAY, ClosedDate, NextCreatedDate) BETWEEN 0 AND 3 OR
    DATEDIFF(DAY, PreviousClosedDate, CreatedDate) BETWEEN 0 AND 3
That was off the cuff, so please test and let me know if anything isn't quite right.
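For anyone who wants to sanity-check the expected rows without a database, here is a rough Python sketch of the same previous/next comparison. It is an illustration only: the `Job` tuple, the `NOT_CLOSED` constant and the `jobs_reopened_soon` function are hypothetical names, and the rows are the sample data from the question (dates read as day/month/year).

```python
from datetime import date
from itertools import groupby
from typing import NamedTuple


class Job(NamedTuple):
    number: int
    category: str
    customer: int
    created: date
    closed: date


NOT_CLOSED = date(1980, 1, 1)  # the "magic date" the question uses for open jobs

JOBS = [
    Job(1, "Small", 1, date(2016, 1, 1), date(2016, 1, 3)),
    Job(2, "Small", 2, date(2016, 1, 3), date(2016, 1, 7)),
    Job(3, "Large", 2, date(2016, 1, 6), date(2016, 1, 7)),
    Job(4, "Medium", 1, date(2016, 1, 8), date(2016, 1, 10)),
    Job(5, "Small", 3, date(2016, 1, 10), NOT_CLOSED),
    Job(6, "Medium", 3, date(2016, 1, 15), NOT_CLOSED),
    Job(7, "Large", 2, date(2016, 1, 16), date(2016, 1, 17)),
    Job(8, "Large", 2, date(2016, 1, 19), date(2016, 1, 20)),
    Job(9, "Small", 1, date(2016, 1, 19), NOT_CLOSED),
    Job(10, "Medium", 2, date(2016, 1, 19), NOT_CLOSED),
]


def jobs_reopened_soon(jobs, max_days=3):
    """Job numbers where the next job for the same customer and category
    was created 0..max_days days after the previous one was closed."""
    ordered = sorted(jobs, key=lambda j: (j.customer, j.category, j.created))
    hits = set()
    for _, group in groupby(ordered, key=lambda j: (j.customer, j.category)):
        group = list(group)
        # Pair each job with the next one in its partition, like LEAD/LAG.
        for prev, nxt in zip(group, group[1:]):
            if prev.closed == NOT_CLOSED:
                continue  # an open job has no closing date to compare against
            gap = (nxt.created - prev.closed).days
            if 0 <= gap <= max_days:
                hits.update((prev.number, nxt.number))
    return sorted(hits)


print(jobs_reopened_soon(JOBS))
```

On the sample data this reports jobs 7 and 8, which matches the rows the question asks for.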
Try:
SELECT a.*
FROM
    Job AS a
JOIN
    Job AS b ON
    a.CustomerID = b.CustomerID AND a.JobCategory = b.JobCategory
WHERE
    a.JobNumber != b.JobNumber
    AND (
        b.CreatedDate - a.ClosedDate BETWEEN 0 AND 3
        OR
        a.CreatedDate - b.ClosedDate BETWEEN 0 AND 3)

count occurrences for each week using db2

I am looking for some general advice rather than a solution. My problem is that I have a list of dates per person where due to administrative procedures, a person may have multiple records stored for this one instance, yet the date recorded is when the data was entered in as this person is passed through the paper trail. I understand this is quite difficult to explain so I'll give an example:
Person Date Audit
------ ---- -----
1 2000-01-01 A
1 2000-01-01 B
1 2000-01-02 C
1 2003-04-01 A
1 2003-04-03 A
where I want to know how many valid records a person has by removing annoying audits that have recorded the date as the day the data was entered, rather than the date the person first arrives in the dataset. So for the above person I am only interested in:
Person Date Audit
------ ---- -----
1 2000-01-01 A
1 2003-04-01 A
What makes this problem difficult is that I do not have the luxury of an audit column (the audit column here is just to show how the data is collected). I merely have dates. So one way I could crudely count real events (and remove repeat audit data) is to look at individual weeks within a person's history and, if one or more records exist for a given week, add 1 to my counter. This way, even though there are multiple records spread over a few days, I count the succession of dates as one record.
So does anyone know of any db2 functions that could help me solve this problem?
If you can live with standard weeks it's pretty simple:
select
    person, year(dt), week(dt), min(dt), min(audit)
from
    blah
group by
    person, year(dt), week(dt)
If you need seven-day ranges starting with the first date you'd need to generate your own week numbers, a calendar of sorts, e.g. like so:
with minmax(mindt, maxdt) as ( -- date range of the "calendar"
    select min(dt), max(dt)
    from blah
),
cal(dt, i) as ( -- fill the range with every date, count days
    select mindt, 0
    from minmax
    union all
    select dt + 1 day, i + 1
    from cal
    where dt < (select maxdt from minmax) and i < 100000
)
select
    person, year(blah.dt), wk, min(blah.dt), min(audit)
from
    (select dt, int(i/7)+1 as wk from cal) t -- generate week numbers
    inner join
    blah
    on t.dt = blah.dt
group by person, year(blah.dt), wk
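The seven-day bucketing can also be sketched in Python to see what the grouping produces on the sample dates. This is a simplified illustration, not the DB2 query itself: `first_date_per_window` is a hypothetical helper, windows are counted from the earliest date in the whole data set (as the calendar CTE does), and the extra grouping by year is left out.

```python
from collections import defaultdict
from datetime import date


def first_date_per_window(records, window_days=7):
    """records: iterable of (person, date) pairs.
    Returns {person: [earliest date in each occupied window]}."""
    records = list(records)
    if not records:
        return {}
    start = min(d for _, d in records)  # day 0 of the "calendar"
    firsts = {}
    for person, d in records:
        # Window number: whole windows elapsed since the first date.
        key = (person, (d - start).days // window_days)
        if key not in firsts or d < firsts[key]:
            firsts[key] = d
    result = defaultdict(list)
    for (person, _), d in sorted(firsts.items()):
        result[person].append(d)
    return dict(result)


sample = [
    (1, date(2000, 1, 1)),
    (1, date(2000, 1, 1)),
    (1, date(2000, 1, 2)),
    (1, date(2003, 4, 1)),
    (1, date(2003, 4, 3)),
]
events = first_date_per_window(sample)
print(events)  # the number of events per person is len(events[person])
```

Note that fixed windows can still split one burst of audit rows in two when it happens to straddle a window boundary, so the count is approximate.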

Exponential decay in SQL for different dates page views

I have different dates with the number of products viewed on a webpage over a 30-day time frame. I am trying to create an exponential decay model in SQL. I am using exponential decay because I want to weight the latest events more heavily than older ones. I am not sure how to write this in SQL without getting an error. I have never built this type of model before, so I also want to make sure I am doing it correctly.
=================================
Data looks like this
product views date
a 1 2014-05-15
a 2 2014-05-01
b 2 2014-05-10
c 4 2014-05-02
c 1 2014-05-12
d 3 2014-05-11
================================
Code:
create table decay model as
select product,views,date
case when......
from table abc
group by product;
not sure what to write to do the model
I want to penalize products that were viewed that were older vs products that were viewed more recently
Thank you for your help
You can do it like this:
Choose the partition in which you want to apply exponential decay, then order the rows within each partition by date, descending.
Use the function ROW_NUMBER() over that ordering to number the rows within each partition, so that the most recent row gets number 1.
Calculate power(your_variable_in_[0,1], rownum-1) and multiply your result by it.
Code might look like this (might work in Oracle SQL or db2):
SELECT <your_partitioning>, date, <whatever>*power(<your_variable>,rownum-1)
FROM (SELECT a.*
           , ROW_NUMBER() OVER (PARTITION BY <your_partitioning> ORDER BY a.date DESC) AS rownum
      FROM YOUR_TABLE a)
ORDER BY <your_partitioning>, date DESC
EDIT: I read again over your problem and think I understood now what you asked for, so here is a solution which might work (decay factor is 0.9 here):
SELECT product, sum(adjusted_views) -- (i)
FROM (SELECT product, views*power(0.9, rownum-1) AS adjusted_views, date, rownum -- (ii)
      FROM (SELECT product, views, date -- (iii)
                 , ROW_NUMBER() OVER (PARTITION BY product ORDER BY a.date DESC) AS rownum
            FROM YOUR_TABLE a)
      ORDER BY product, date DESC)
GROUP BY product
The inner select statement (iii) creates a temporary table that might look like this
product views date rownum
--------------------------------------------------
a 1 2014-05-15 1
a 2 2014-05-14 2
a 2 2014-05-13 3
b 2 2014-05-10 1
b 3 2014-05-09 2
b 2 2014-05-08 3
b 1 2014-05-07 4
The next query (ii) then uses the rownumber to construct an exponentially decaying factor 0.9^(rownum-1) and applies it to views. The result is
product adjusted_views date rownum
--------------------------------------------------
a 1 * 0.9^0 2014-05-15 1
a 2 * 0.9^1 2014-05-14 2
a 2 * 0.9^2 2014-05-13 3
b 2 * 0.9^0 2014-05-10 1
b 3 * 0.9^1 2014-05-09 2
b 2 * 0.9^2 2014-05-08 3
b 1 * 0.9^3 2014-05-07 4
In a last step (the outer query) the adjusted views are summed up, as this seems to be the quantity you are interested in.
Note, however, that for the weighting to be consistent the dates should be evenly spaced, e.g., always one day apart (not one day here and a month there, because such gaps would be weighted in a similar fashion although they shouldn't be).
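To make the arithmetic concrete, here is a small Python sketch of the rank-based weighting described above, run on the illustrative rows for products a and b. The function name `decayed_views` is hypothetical and the decay factor 0.9 is the same assumed value as in the query.

```python
from collections import defaultdict
from datetime import date


def decayed_views(rows, factor=0.9):
    """rows: iterable of (product, views, day).
    Weights the newest row of each product by factor**0, the next by factor**1, ..."""
    by_product = defaultdict(list)
    for product, views, day in rows:
        by_product[product].append((day, views))
    totals = {}
    for product, entries in by_product.items():
        entries.sort(reverse=True)  # newest first, like ORDER BY date DESC
        totals[product] = sum(
            views * factor ** rank  # rank plays the role of rownum - 1
            for rank, (_, views) in enumerate(entries)
        )
    return totals


rows = [
    ("a", 1, date(2014, 5, 15)),
    ("a", 2, date(2014, 5, 14)),
    ("a", 2, date(2014, 5, 13)),
    ("b", 2, date(2014, 5, 10)),
    ("b", 3, date(2014, 5, 9)),
    ("b", 2, date(2014, 5, 8)),
    ("b", 1, date(2014, 5, 7)),
]
totals = decayed_views(rows)
print(totals)
```

For product a this sums 1 + 2*0.9 + 2*0.81, roughly 4.42, and for product b 2 + 3*0.9 + 2*0.81 + 1*0.729, roughly 7.049.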

Oracle Database Temporal Query Implementation - Collapse Date Ranges

This is the result of one of my queries:
SURGERY_D
---------
01-APR-05
02-APR-05
03-APR-05
04-APR-05
05-APR-05
06-APR-05
07-APR-05
11-APR-05
12-APR-05
13-APR-05
14-APR-05
15-APR-05
16-APR-05
19-APR-05
20-APR-05
21-APR-05
22-APR-05
23-APR-05
24-APR-05
26-APR-05
27-APR-05
28-APR-05
29-APR-05
30-APR-05
I want to collapse the date ranges which are continuous into intervals. For example,
[01-APR-05, 07-APR-05], [11-APR-05, 16-APR-05] and so on.
In terms of temporal databases, I want to 'collapse' the dates. Any idea how to do that on Oracle? I am using version 11. I searched for it and read a book but couldn't find/understand how to do it. It might be simple, but everyone has their own flaws and Oracle is mine. Also, I am new to SO so my apologies if I have violated any rules. Thank You!
You can take advantage of the ROW_NUMBER analytical function to generate a unique, sequential number for each of the records (we'll assign that number to the dates in ascending order).
Then, you group the dates by difference between the date and the generated number - the consecutive dates will have the same difference:
Date       Number  Difference
01-APR-05  1       1   -- MIN(date_val) in group with diff. = 1
02-APR-05  2       1
03-APR-05  3       1
04-APR-05  4       1
05-APR-05  5       1
06-APR-05  6       1
07-APR-05  7       1   -- MAX(date_val) in group with diff. = 1
11-APR-05  8       3   -- MIN(date_val) in group with diff. = 3
12-APR-05  9       3
13-APR-05  10      3
14-APR-05  11      3
15-APR-05  12      3
16-APR-05  13      3   -- MAX(date_val) in group with diff. = 3
Finally, you select the minimal and maximal date in each of the groups to get the beginning and ending of each range.
Here's the query:
SELECT
    MIN(date_val) start_date,
    MAX(date_val) end_date
FROM (
    SELECT
        date_val,
        row_number() OVER (ORDER BY date_val) AS rn
    FROM date_tab
)
GROUP BY date_val - rn
ORDER BY 1
;
Output:
START_DATE END_DATE
------------ ----------
01-04-2005 07-04-2005
11-04-2005 16-04-2005
19-04-2005 24-04-2005
26-04-2005 30-04-2005
You can check how that works on SQLFiddle: Dates ranges example
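The "date minus row number" trick is not specific to Oracle, and a short Python sketch may help to see why it works. The function name `collapse_dates` is hypothetical. One assumption differs from the query: duplicates are removed first, because repeated dates would break the constant difference within a run.

```python
from datetime import date, timedelta
from itertools import groupby


def collapse_dates(dates):
    """Collapse dates into (start, end) ranges of consecutive days."""
    ordered = sorted(set(dates))  # assumption: duplicate dates are dropped
    ranges = []
    # Within a run of consecutive days, date - row_number is constant,
    # so it can serve as the grouping key.
    numbered = enumerate(ordered, start=1)
    for _, run in groupby(numbered, key=lambda p: p[1] - timedelta(days=p[0])):
        days = [d for _, d in run]
        ranges.append((days[0], days[-1]))
    return ranges


april_days = list(range(1, 8)) + list(range(11, 17)) + list(range(19, 25)) + list(range(26, 31))
surgery_dates = [date(2005, 4, day) for day in april_days]
ranges = collapse_dates(surgery_dates)
for start, end in ranges:
    print(start, end)
```

Because the input is sorted, each distinct key appears as one contiguous block, which is why `groupby` can stand in for the SQL GROUP BY here.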