IBM i5OS SQL: Forecasting periodic events

I have a situation where I have about 4,000 tasks that all have different periodic rules for occurrences.
They are preventive maintenance tasks. The table I get them from only provides me with the start date and the frequency of occurrence in units of weeks.
Example:
Task (A) is scheduled to occur every two weeks, starting on week 1 of 2015.
Task (B) is scheduled to occur every 6 weeks, starting on week 2 of 2011.
And so on...
What I need to do is produce a resultset that contains a record for each occurrence since the start point, for each task.
It's like generating a sequence.
Example:
Task | Year | Week
------|-------|-------
A | 2015 | 1
A | 2015 | 3
A | 2015 | 5
A | 2015 | 7
[...]
B | 2011 | 2
B | 2011 | 8
And so on...
You probably think "hey, that is simple, just put it in a loop and you're good."
Not so fast!
The trick is that I need this to be within one SQL query.
I know I probably should be doing this in a stored procedure or a function, but I can't, for now. I could also do it in some VBA code, since it will go into an Excel spreadsheet. But Excel has become an unstable product lately and I do not want to risk my code failing after an update from Microsoft. So I try as much as possible to stay within the limits of IBM i5OS SQL queries.
I know the answer could be that it is impossible. But I believe in this community.
Thanks in advance,
EDIT :
I have found this post, which shows how to list dates within a range:
IBM DB2: Generate list of dates between two dates
I tried to generate a list of dates based on a periodicity and it worked.
I am still struggling with generating multiple sequences based on multiple periodicities.
Here's the code I have so far:
SELECT d.min + num.n DAYS AS DATES
FROM
    (VALUES(DATE('2017-01-01'), DATE('2017-03-01'))) AS d(min, max)
JOIN
    (
        -- Creates a table of numbers based on periodicity
        SELECT
            n1.n + n10.n + n100.n AS n
        FROM
            (VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS n1(n)
        CROSS JOIN
            (VALUES(0),(10),(20),(30),(40),(50),(60),(70),(80),(90)) AS n10(n)
        CROSS JOIN
            (VALUES(0),(100),(200),(300),(400),(500),(600),(700),(800),(900)) AS n100(n)
        -- Replace the 2nd argument of MOD by the desired frequency
        WHERE MOD(n1.n + n10.n + n100.n, 6) = 0
        ORDER BY n1.n + n10.n + n100.n
    ) num
    ON d.min + num.n DAYS <= d.max
ORDER BY num.n
In other words, I need the dates in table d to be dynamic, as well as the periodicity (6) in num's WHERE clause.
Should I be using a WITH statement? If so, can someone please guide me, because I am not very familiar with this kind of statement.
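From what I have read so far, a recursive WITH would look roughly like the sketch below. The table TASKS(ASSET, TASK, START_DATE, FREQ_WEEKS) is made up just to keep the example simple; I still need to adapt it to my real tables:
-- Sketch only: TASKS is a hypothetical, simplified source table
WITH OCC (ASSET, TASK, FREQ_WEEKS, OCC_DATE) AS
(
    -- Anchor member: the first occurrence of each task
    SELECT ASSET, TASK, FREQ_WEEKS, START_DATE
    FROM TASKS
    UNION ALL
    -- Recursive member: add the frequency (in weeks) until today
    SELECT ASSET, TASK, FREQ_WEEKS, OCC_DATE + (FREQ_WEEKS * 7) DAYS
    FROM OCC
    WHERE OCC_DATE + (FREQ_WEEKS * 7) DAYS <= CURRENT_DATE
)
SELECT ASSET, TASK, YEAR(OCC_DATE) AS OCC_YEAR, WEEK_ISO(OCC_DATE) AS OCC_WEEK
FROM OCC
ORDER BY ASSET, TASK, OCC_DATE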
EDIT#2:
Here is the table structure I'm working with:
TABLE NAME: SGTRCDP (Programmed Tasks):
Asset         | Task      | Start Year | Start Week | Freq. (weeks)
--------------|-----------|------------|------------|--------------
TMPC531       | VER0560   | 2011       | 10         | 26
BAT0404       | IPNET030  | 2011       | 2          | 4
B-EXTINCT-151 | 001H-0011 | 2014       | 15         | 17
[...]         | [...]     | [...]      | [...]      | [...]
4,000 more rows like these; the unique key is the combination of the `Asset` and `Task` fields.
What I would like to have is this:
Asset | Task | Year | Week
--------------|------------|----------|----------
TMPC531 | VER0560 | 2011 | 10
TMPC531 | VER0560 | 2011 | 36
TMPC531 | VER0560 | 2012 | 10
TMPC531 | VER0560 | 2012 | 36
TMPC531 | VER0560 | 2013 | 10
TMPC531 | VER0560 | 2013 | 36
TMPC531 | VER0560 | 2014 | 10
TMPC531 | VER0560 | 2014 | 36
TMPC531 | VER0560 | 2015 | 10
TMPC531 | VER0560 | 2015 | 36
TMPC531 | VER0560 | 2016 | 10
TMPC531 | VER0560 | 2016 | 36
TMPC531 | VER0560 | 2017 | 10
TMPC531 | VER0560 | 2017 | 36
BAT0404 | IPNET030 | 2011 | 2
BAT0404 | IPNET030 | 2011 | 6
BAT0404 | IPNET030 | 2011 | 10
BAT0404 | IPNET030 | 2011 | 14
BAT0404 | IPNET030 | 2011 | 18
BAT0404 | IPNET030 | 2011 | 22
BAT0404 | IPNET030 | 2011 | 26
BAT0404 | IPNET030 | 2011 | 30
BAT0404 | IPNET030 | 2011 | 34
BAT0404 | IPNET030 | 2011 | 38
[...] | [...] | [...] | [...]
BAT0404 | IPNET030 | 2017 | 34
BAT0404 | IPNET030 | 2017 | 38
B-EXTINCT-151 | 001H-0011 | 2014 | 15
B-EXTINCT-151 | 001H-0011 | 2014 | 32
B-EXTINCT-151 | 001H-0011 | 2014 | 49
B-EXTINCT-151 | 001H-0011 | 2015 | 14
B-EXTINCT-151 | 001H-0011 | 2015 | 31
[...] | [...] | [...] | [...]
B-EXTINCT-151 | 001H-0011 | 2017 | 8
B-EXTINCT-151 | 001H-0011 | 2017 | 24
I was able to make it using a CTE, but it generates so many records that whenever I want to filter or order the data, it takes forever. Same goes for downloading the whole resultset.
And I wouldn't risk creating a temporary table and eating up the disk space.
Another caveat of a CTE is that it cannot be referenced as a subquery.
And guess what: my plan was to use it as a subquery in the FROM clause of a SELECT, joining it with the actual work orders table and matching on Asset-Task-Year-Week to see whether the programmed tasks were executed as planned or not (see the sketch after the CTE below).
Anyway, here is the CTE I used to get it:
WITH PPM (EQ, TASK, FREQ, OCCYR, OCCWK, OCCDAT, NXTDAT) AS
(
SELECT
TRCD.DLACCD EQ,
TRCD.DLJ1CD TASK,
INT(SUBSTR(TRCD.DLL1TX,9,3)) FREQ,
AOAGNB OCCYR,
AOAQNB OCCWK,
CASE
WHEN aoaddt/1000000 >= 1 THEN
DATE('20'||substr(aoaddt,2,2)||'-'||substr(aoaddt,4,2)||'-'||substr(aoaddt,6,2))
ELSE
DATE('19'||substr(aoaddt,1,2)||'-'||substr(aoaddt,3,2)||'-'||substr(aoaddt,5,2))
END OCCDAT,
(CASE
WHEN aoaddt/1000000 >= 1 THEN
DATE('20'||substr(aoaddt,2,2)||'-'||substr(aoaddt,4,2)||'-'||substr(aoaddt,6,2))
ELSE
DATE('19'||substr(aoaddt,1,2)||'-'||substr(aoaddt,3,2)||'-'||substr(aoaddt,5,2))
END + (INT(SUBSTR(TRCD.DLL1TX,9,3)) * 7) DAYS) NXTDAT
FROM
(SELECT * FROM SGTRCDP WHERE DLIMST<>'H' AND TRIM(DLK5Cd)='S') TRCD
JOIN
(
SELECT
AOAGNB,
AOAQNB,
min(AOADDT) aoaddt
FROM SGCALDP
GROUP BY AOAGNB, AOAQNB
) CLND
ON AOAGNB=SUBSTR(TRCD.DLL1TX,1,4) AND AOAQNB=INT(SUBSTR(TRCD.DLL1TX,12,2))
WHERE DLACCD='CON0539' AND DLJ1CD='CON0539-04'
UNION ALL
SELECT
PPMNXT.EQ,
PPMNXT.TASK,
PPMNXT.FREQ,
AOAGNB OCCYR,
AOAQNB OCCWK,
CASE
WHEN aoaddt/1000000 >= 1 THEN
DATE('20'||substr(aoaddt,2,2)||'-'||substr(aoaddt,4,2)||'-'||substr(aoaddt,6,2))
ELSE
DATE('19'||substr(aoaddt,1,2)||'-'||substr(aoaddt,3,2)||'-'||substr(aoaddt,5,2))
END OCCDAT,
(CASE
WHEN aoaddt/1000000 >= 1 THEN
DATE('20'||substr(aoaddt,2,2)||'-'||substr(aoaddt,4,2)||'-'||substr(aoaddt,6,2))
ELSE
DATE('19'||substr(aoaddt,1,2)||'-'||substr(aoaddt,3,2)||'-'||substr(aoaddt,5,2))
END + (PPMNXT.FREQ * 7) DAYS) NXTDAT
FROM
PPM PPMNXT
JOIN
(
SELECT
AOAGNB,
AOAQNB,
min(AOADDT) aoaddt
FROM SGCALDP
GROUP BY AOAGNB, AOAQNB
) CLND
ON AOAGNB=YEAR(PPMNXT.NXTDAT) AND AOAQNB=WEEK_ISO(PPMNXT.NXTDAT)
WHERE
YEAR(CASE
WHEN aoaddt/1000000 >= 1 THEN
DATE('20'||substr(aoaddt,2,2)||'-'||substr(aoaddt,4,2)||'-'||substr(aoaddt,6,2))
ELSE
DATE('19'||substr(aoaddt,1,2)||'-'||substr(aoaddt,3,2)||'-'||substr(aoaddt,5,2))
END + (PPMNXT.FREQ * 7) DAYS) <= YEAR(CURRENT_DATE)
)
SELECT EQ, TASK, OCCYR, OCCWK, OCCDAT FROM PPM
That was the best I could do.
You will notice that I restricted the root to a specific Asset and Task:
WHERE DLACCD='CON0539' AND DLJ1CD='CON0539-04'
Normally I would not filter the data, in order to retrieve all the scheduled weeks for every task. I had to filter on one root key to keep the query from eventually eating up resources and crashing our AS/400.
Again, I am not an expert in CTEs; there might be a better solution.
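For context, here is roughly the matching I have in mind: the final SELECT of the statement above would join PPM against the work orders table. The work order file name WORKORD and its column names below are placeholders, not the real file:
SELECT
    PPM.EQ, PPM.TASK, PPM.OCCYR, PPM.OCCWK,
    -- 'MISSED' when no matching work order exists for that asset/task/week
    CASE WHEN WO.WONBR IS NULL THEN 'MISSED' ELSE 'DONE' END AS STATUS
FROM PPM
LEFT JOIN WORKORD WO
    ON  WO.WOASSET = PPM.EQ
    AND WO.WOTASK  = PPM.TASK
    AND WO.WOYEAR  = PPM.OCCYR
    AND WO.WOWEEK  = PPM.OCCWK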
Thanks

Related

Filter on date relative to today but the dates are in separate fields

I have a table where the date parts are in separate fields, and I am struggling to put a filter on it (pulling all the data is so much that the query basically times out).
How can I write an SQL query to pull the data for only the past 7 days?
| eventinfo | entity | year | month | day |
|------------|-------------------------|------|-------|-----|
| source=abc | abc=030,abd=203219,.... | 2022 | 08 | 07 |
| source=abc | abc=030,abd=203219,.... | 2022 | 08 | 05 |
| source=abc | abc=030,abd=203219,.... | 2022 | 07 | 33 |
Many thanks in advance.
You can use concatenation on your columns, convert them to a date and then apply the filter.
-- Oracle database
select *
from event
where to_date( year||'-'||month||'-'||day,'YYYY-MM-DD') >= trunc(sysdate) - 7;

How to query to get dates from the previous fiscal week of this year and the corresponding fiscal week from last year in a dynamic table in GCP?

So say I have a table that refreshes weekly. There's a column for fiscal week (FW). Say we are currently on fiscal week 32, and the way this query should go is that we always need the week prior so FW31 in this example. However, I not only need FW31 of this year, but FW31 of last year too. What's a way to create a dynamic query that would do that, if possible?
Example table below:
YEAR | FW | Dates | Info_1
... | ... | ... | ...
2019 | 30 | 09-02-2020 | blah
2019 | 30 | 09-03-2020 | blah
2019 | 30 | 09-04-2020 | blah
... | ... | ... | ...
2019 | 31 | 09-10-2020 |
... | ... | ... | ...
2020 | 30 | 09-06-2020 |
... | ... | ... | ...
2020 | 31 | 09-14-2020 | blah
2020 | 31 | 09-15-2020 | blah
2020 | 31 | 09-16-2020 | blah
... | ... | ... | ...
So to my understanding, it wouldn't be possible to do it by date, since the fiscal week of this year might not correspond to the exact same dates as the same fiscal week of last year. So I'm banking on utilizing the 'FW' column in order to pull it. However, again, this is something that I would like the query to handle automatically each week, going from 31 to 32 and so on. This is within Google Cloud Platform, so I'd love to save it as a view.
Given that I don't know what you actually mean by "fiscal week", a possible solution could be the following:
SELECT *
FROM `example_db`.`example_table`
WHERE
`FW` = weekofyear(date(now())-7) AND
`YEAR` IN (year(now()), year(now())-1);
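If the table actually lives in BigQuery (the question mentions GCP), a rough Standard SQL equivalent would be the following; the table path is a placeholder, and this assumes FW lines up with the calendar week number:
SELECT *
FROM `example_project.example_dataset.example_table`
WHERE
  -- week number of "one week ago"
  FW = EXTRACT(WEEK FROM DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY))
  -- same fiscal week for this year and last year
  AND YEAR IN (EXTRACT(YEAR FROM CURRENT_DATE()),
               EXTRACT(YEAR FROM CURRENT_DATE()) - 1);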

How can I get a record to be counted in multiple columns of a Crosstab Query?

Background information:
My company requires employees to maintain at least one certification (cert) on a position. There are a total of 17 different certifications that an employee can get.
An employee can hold multiple certs. But on any one day they can only "sit" one of the positions that they are certified in. Most employees primarily sit the highest level position that they hold a cert in, but can sit a lower level position if there are manning shortages in that position and if they hold that particular cert (some employees come to us holding the higher level certs but none of the lower ones because they let them expire).
Multiple employees can hold the same cert.
Around 90% of employees are on contract, meaning they have a set termination date. Contracts can be extended but for the sake of this Access database, and the report to be generated, we're presuming that the termination date is set in stone.
My boss (and boss' boss) are wanting to put together a manning projection report so that they don't get caught off guard should we start running low on employees certified in any one position.
Example of what they want:
Lets say you have three employees:
Employee1 has certs in position1, position2, and position3 but he primarily sits as position3 and his contract expires June 2020.
Employee2 has certs in position1 and position2 but primarily sits as position2 and her contract expires in February 2022.
Employee3 is new and arrived August 2019 and is in training to get position1, maximum allowed training time for initial cert is 3 months, so presumably he should have his position1 cert by December 2019 and his contract expires August 2025.
Lets say my boss wants to project out 12 months with the starting month being November 2019 (he'll only be able to select a starting month-year that is equal to or later than the current month-year). The charts below, which are generated in subreports, should be what gets generated off of the above employee information.
All Certifications Chart
+-----------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| Cert | Nov 19 | Dec 19 | Jan 20 | Feb 20 | Mar 20 | Apr 20 | May 20 | Jun 20 | Jul 20 | Aug 20 | Sep 20 | Oct 20 |
+-----------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| Position1 | 2 | 3 | 3 | 3 | 3 | 3 | 3 | 2 | 2 | 2 | 2 | 2 |
| Position2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | 1 |
| Position3 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
+-----------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
Primary Certifications Chart
+-----------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| Cert | Nov 19 | Dec 19 | Jan 20 | Feb 20 | Mar 20 | Apr 20 | May 20 | Jun 20 | Jul 20 | Aug 20 | Sep 20 | Oct 20 |
+-----------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| Position1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Position2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Position3 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
+-----------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
Now I already have a solution in place but it's extremely inefficient and involves a query for each cell (2 Charts X 12 Months X 17 positions = 408 Queries when a report is generated). I'm hoping to do something more efficient with a crosstab query.
The tables are set up as such (only listing relevant fields):
Emp_table
ID (autoNum)
contractStarted (Date)
contractEnd (Date)
Cert_individual
ID (autoNum)
certID (num, many->one relationship to cert_table.ID)
EmpID (num, many->one relationship to Emp_table.ID)
date_cert_received (date)
primary (yes/no)
cert_table
ID (autoNum)
cert_name (short text)
Obviously I'd need to do a couple of INNER JOINs to get everything together. I tried using the format from this website for my crosstab query, but it would only add an individual cert to the count for the month-year in which the employee received it, and not for every month that the employee will hold the cert.
So my question is:
Is there a way in SQL or VBA to get a cert counted across multiple columns (month-years) based off of when the employee received the cert and when their contract is scheduled to terminate?
As far as I know, the main problem in getting the crosstab query is that it can only generate columns with data that you already have.
A solution for you to get the monthly columns would be to have a side table with the 12 dates and then use the Cartesian product to generate the monthly data for each of your records in your certification table. This "date" table can be updated and maintained to match the months that you require in your report with a query.
For example, if you have a table named TempDates:
And a table with Employees containing the following data:
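As a rough illustration (the column names are only a guess, based on the crosstab further down), TempDates would simply carry one row per reporting month:
Yr    | Mo
------|-----
2019  | 11
2019  | 12
2020  | 1
[...] | [...]
2020  | 10
and Employees would carry at least Cert, Start and Expire for each certification held.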
You can generate the Cartesian product with a query that I named QryCertsDates:
SELECT Employees.*, TempDates.* FROM TempDates, Employees;
This lets you attach all the wanted dates to each original record from the Employees table, in order to obtain data similar to the below:
Now you can generate your crosstab query, pivoting on the month and year and filtering the dates with WHERE criteria such as:
TRANSFORM Count(QryCertsDates.Cert) AS CountOfCert
SELECT QryCertsDates.Cert
FROM QryCertsDates
WHERE (((CDate([Yr] & "-" & [Mo])) Between CDate([Start]) And CDate([Expire])))
GROUP BY QryCertsDates.Cert
PIVOT CDate([Yr] & "-" & Format([Mo],"00"));
You will ultimately end up with something like this:
You can do the same thing to get your second table/report as well. I don't know your database structure, so you will most likely need to do some adaptation. The other possible way that you can achieve a similar result would be to fill in a table using VBA.
However, this might be the easier solution to implement. Good luck!

Old Excel Data - Merge With SQL Transaction Log

So I have two years of total data, from 2013 and 2014, from Excel sheets. They're summed up and put onto my server. For 2015 we have a log with dates and each individual transaction. This makes it convenient to just add it all up for 2015. I want to merge the two: sum up the 2015 transactions and then add that as its own column in the master sheet.
What I have now:
Company | 2013 | 2014 |
----------------------
Apple | 300 | 200 |
Toyota | 250 | 250 |
2015:
Date | Company | Units
-------------------------
1/1/15 | Apple | 30
2/28/15 | Toyota | 14
3/14/15 | Toyota | 22
Ideal Look:
Company | 2013 | 2014 | 2015
-----------------------------
Apple | 300 | 200 | 300
Toyota | 250 | 250 | 400
Without knowing your DBMS, I'm taking a shot in the dark, but I think you can handle this all directly within SQL:
with data_2015 as (
select company, sum (units) as units
from table_2015_data
group by company
)
select
h.Company, h.Units_2013, h.Units_2014, d.units as Units_2015
from
history h
left join data_2015 d on
h.Company = d.Company
From there, just bring it into Excel using MSQuery.
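One small caveat: a company with no 2015 rows will get a NULL in the new column from the left join. If you'd rather show a zero, a coalesce in the select list above takes care of it (a minor variation, not required):
  h.Company, h.Units_2013, h.Units_2014, coalesce (d.units, 0) as Units_2015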

Get latest child record without given order

Simplified, I've got the following situation. There are two tables. One migration has multiple checks through checks.migration_id. The column checks.old describes the type of check. Now I want to get, for each migration, the check with the biggest time where old is true (query 1) and false (query 2).
There are about 30,000 migrations and each has around 1,000 checks where old=true and 1,000 checks where old=false. The checks table will grow quite large. The order of the checks is not given and could be totally mixed up.
I want to get the latest check for a maximum of 150 migrations at once.
SQL Fiddle: http://sqlfiddle.com/#!15/282ce/15
I'm using PostgreSQL 9.3 and Rails 3.2 (shouldn't matter)
What's the most efficient way to get the latest subrecord where old = true?
Table Migrations:
| ID |
|----|
| 1 |
| 2 |
Table Checks:
| ID | MIGRATION_ID | OLD | OK | TIME |
|----|--------------|-----|----|----------------------------------|
| 1 | 1 | 1 | 1 | September, 22 2014 12:00:01+0000 |
| 2 | 1 | 0 | 1 | September, 22 2014 12:00:02+0000 |
| 3 | 2 | 1 | 1 | September, 22 2014 12:00:01+0000 |
| 4 | 2 | 0 | 1 | September, 22 2014 12:00:02+0000 |
| 5 | 1 | 1 | 1 | September, 22 2014 12:00:03+0000 |
| 6 | 1 | 0 | 1 | September, 22 2014 12:00:04+0000 |
| 7 | 2 | 1 | 1 | September, 22 2014 12:00:03+0000 |
| 8 | 2 | 0 | 1 | September, 22 2014 12:00:04+0000 |
Query 1 should return the following result:
| Migration.id | Check_ID | OLD | OK | TIME |
|--------------|----------|-----|----|----------------------------------|
| 1 | 5 | 1 | 1 | September, 22 2014 12:00:03+0000 |
| 2 | 7 | 1 | 1 | September, 22 2014 12:00:03+0000 |
Query 2 should return the following result:
| Migration.id | Check_ID | OLD | OK | TIME |
|--------------|----------|-----|----|----------------------------------|
| 1 | 6 | 0 | 1 | September, 22 2014 12:00:04+0000 |
| 2 | 8 | 0 | 1 | September, 22 2014 12:00:04+0000 |
I tried to solve it with a max in a subquery, but then I lose the information in checks.ok and checks.time.
SELECT eq.id, (SELECT max(checks.id) FROM checks WHERE checks.migration_id = eq.id and checks.old = 't') AS latest FROM migrations eq;
SELECT eq.id, (SELECT max(checks.id) FROM checks WHERE checks.migration_id = eq.id and checks.old = 'f') AS latest FROM migrations eq;
(I know that I get max(id) instead of max(time).)
In Rails I tried to fetch the latest record for each Migration, which resulted in the N+1 problem. I'm not able to include all Checks because there are way too many of them.
A simple solution with the Postgres specific DISTINCT ON:
Query 1 ("for each migration the check with the biggest time where old is true"):
SELECT DISTINCT ON (migration_id)
migration_id, id AS check_id, old, ok, time
FROM checks
WHERE old
ORDER BY migration_id, time DESC;
Invert the WHERE condition for Query 2:
...
WHERE NOT old
...
Details:
Select first row in each GROUP BY group?
But if you want better read performance with big tables, use JOIN LATERAL (Postgres 9.3+, standard SQL), building on a multicolumn index like:
CREATE INDEX checks_special_idx ON checks(old, migration_id, time DESC);
Query 1:
SELECT m.id AS migration_id
, c.id AS check_id, c.old, c.ok, c.time
FROM migrations m
-- FROM (SELECT id FROM migrations LIMIT 150) m
JOIN LATERAL (
SELECT id, old, ok, time
FROM checks
WHERE migration_id = m.id
AND old
ORDER BY time DESC
LIMIT 1
) c ON TRUE;
Switch the condition on old again for query 2.
For an unspecified "maximum of 150 migrations", use the commented alternative line.
Details:
Optimize GROUP BY query to retrieve latest record per user
SQL Fiddle.
Aside: don't use "time" as an identifier. It's a reserved word in standard SQL and a basic type name in Postgres.