How can I get a record to be counted in multiple columns of a Crosstab Query?

Background information:
My company requires employees to maintain at least one certification (cert) on a position. There are a total of 17 different certifications that an employee can get.
An employee can hold multiple certs. But on any one day they can only "sit" one of the positions that they are certified in. Most employees primarily sit the highest level position that they hold a cert in, but can sit a lower level position if there are manning shortages in that position and if they hold that particular cert (some employees come to us holding the higher level certs but none of the lower ones because they let them expire).
Multiple employees can hold the same cert.
Around 90% of employees are on contract, meaning they have a set termination date. Contracts can be extended but for the sake of this Access database, and the report to be generated, we're presuming that the termination date is set in stone.
My boss (and boss' boss) are wanting to put together a manning projection report so that they don't get caught off guard should we start running low on employees certified in any one position.
Example of what they want:
Let's say you have three employees:
Employee1 has certs in position1, position2, and position3 but he primarily sits as position3 and his contract expires June 2020.
Employee2 has certs in position1 and position2 but primarily sits as position2 and her contract expires in February 2022.
Employee3 is new and arrived August 2019 and is in training to get position1, maximum allowed training time for initial cert is 3 months, so presumably he should have his position1 cert by December 2019 and his contract expires August 2025.
Let's say my boss wants to project out 12 months with the starting month being November 2019 (he'll only be able to select a starting month-year that is equal to or later than the current month-year). The charts below, which are generated in subreports, should be what gets generated off of the above employee information.
All Certifications Chart
+-----------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| Cert | Nov 19 | Dec 19 | Jan 20 | Feb 20 | Mar 20 | Apr 20 | May 20 | Jun 20 | Jul 20 | Aug 20 | Sep 20 | Oct 20 |
+-----------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| Position1 | 2 | 3 | 3 | 3 | 3 | 3 | 3 | 2 | 2 | 2 | 2 | 2 |
| Position2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | 1 |
| Position3 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
+-----------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
Primary Certifications Chart
+-----------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| Cert | Nov 19 | Dec 19 | Jan 20 | Feb 20 | Mar 20 | Apr 20 | May 20 | Jun 20 | Jul 20 | Aug 20 | Sep 20 | Oct 20 |
+-----------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| Position1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Position2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Position3 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
+-----------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
Now I already have a solution in place but it's extremely inefficient and involves a query for each cell (2 Charts X 12 Months X 17 positions = 408 Queries when a report is generated). I'm hoping to do something more efficient with a crosstab query.
The tables are set up as such (only listing relevant fields):
Emp_table
ID (autoNum)
contractStarted (Date)
contractEnd (Date)
Cert_individual
ID (autoNum)
certID (num, many->one relationship to cert_table.ID)
EmpID (num, many->one relationship to Emp_table.ID)
date_cert_received (date)
primary (yes/no)
cert_table
ID (autoNum)
cert_name (short text)
Obviously I'd need to do a couple of INNER JOINS in order to get everything together and I tried using the format from this website for my crosstab query but it would only add an individual cert to a count on the month-year that the employee received it and not to every month that the employee will hold the cert.
So my question is:
Is there a way in SQL or VBA to get a cert counted across multiple columns (month-years) based off of when the employee received the cert and when their contract is scheduled to terminate?

As far as I know, the main problem in getting the crosstab query is that it can only generate columns with data that you already have.
A solution for you to get the monthly columns would be to have a side table with the 12 dates and then use the Cartesian product to generate the monthly data for each of your records in your certification table. This "date" table can be updated and maintained to match the months that you require in your report with a query.
For example, if you have a table named TempDates :
And a table with Employees with the following data :
You can generate the cartesian product with a query that I named QryCertsDates :
SELECT Employees.*, TempDates.* FROM TempDates, Employees;
Which lets you attach all the wanted dates with your original date from the table Employees in order to obtain data similar to below :
Now you can generate your crosstab query pivoting on the month and year and filtering the dates with the WHERE criteria such as :
TRANSFORM Count(QryCertsDates.Cert) AS CountOfCert
SELECT QryCertsDates.Cert
FROM QryCertsDates
WHERE (((CDate([Yr] & "-" & [Mo])) Between CDate([Start]) And CDate([Expire])))
GROUP BY QryCertsDates.Cert
PIVOT CDate([Yr] & "-" & Format([Mo],"00"));
You will end up ultimately with something like this :
You can do the same thing to get your second table/report as well. I don't know your database structure, so you will most likely need to do some adaptation. The other possible way that you can achieve a similar result would be to fill in a table using VBA.
However, this might be the easier solution to implement. Good luck!
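The cross-join idea above can be tried outside Access. The following is a minimal sketch in Python using the standard-library sqlite3 module, not the Access query itself. SQLite has no TRANSFORM ... PIVOT, so the pivot step is done in Python. The table names (months, certs), the column names, and the sample dates are assumptions made up for illustration; they are not the asker's schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- months plays the role of the TempDates side table (one row per report column)
    CREATE TABLE months (month_start TEXT);
    -- certs: one row per cert held, with the period during which it counts
    CREATE TABLE certs (emp TEXT, cert TEXT, start TEXT, expire TEXT);
    INSERT INTO months VALUES ('2019-11-01'), ('2019-12-01'), ('2020-06-01');
    INSERT INTO certs VALUES
        ('Employee1', 'Position1', '2018-01-01', '2020-05-31'),
        ('Employee2', 'Position1', '2018-01-01', '2022-02-28'),
        ('Employee3', 'Position1', '2019-12-01', '2025-08-31');
""")

# Cartesian product of months and certs, keeping only the months in which
# the cert is held, then one count per (cert, month).
rows = con.execute("""
    SELECT c.cert, m.month_start, COUNT(*) AS n
    FROM months AS m CROSS JOIN certs AS c
    WHERE m.month_start BETWEEN c.start AND c.expire
    GROUP BY c.cert, m.month_start
    ORDER BY c.cert, m.month_start
""").fetchall()

# Pivot in Python: {cert: {month: count}}
grid = {}
for cert, month, n in rows:
    grid.setdefault(cert, {})[month] = n
print(grid)
```

Because every cert row is paired with every month before filtering, a single cert contributes to each month in which it is held, which is the behaviour the plain crosstab was missing.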

Related

Finding how many days left per user per year

I have a table that tracks leave days for each user:
ID | Start | End | IDUser
1 | 02-02-2020 | 03-02-2020 | 2
2 | 01-02-2020 | 21-02-2020 | 2
IDUser connects to the Users Table, that has IDUser and Username columns
I have a view that shows the previously mentioned columns plus a column named UsedDays that counts how many leave days were used:
DATEDIFF(DAY, dbo.leavedays.start, dbo.leavedays.[end]) + 1
This is what I have now:
Start | End | IDUser | UsedDays
02-02-2020 | 03-02-2020 | 2 | 1
01-02-2020 | 21-02-2020 | 1 | 20
Each user has a total available number of days per year so I would like to have a column that subtracts from those total possible days of each user, and show how many are left.
Example:
John (IDUser = 2) has 30 days available this year and he already used 1, so there are 29 left
Start | End | IDUser | TotalDaysYear | UsedDays | LeftDays
02-02-2020 | 03-02-2020 | 2 | 30 | 1 | 29
01-02-2020 | 21-02-2020 | 1 | 20 | 20 | 0
I believe I have to create a table for TotalDaysYear, probably with:
ID | Year | TotalDaysYear | IDUser
1 | 2020 | 30 | 2
2 | 2020 | 20 | 1
IDUser connects to the Users Table, that has IDUser and Username columns
But I'm having trouble finding the logic for the relationship and how to find the result that I want, since it also depends on the year (available days may change per year, per user).
Assuming you are using SQL Server, this should work:
SELECT
    ld.start,
    ld.[end],
    ld.IDUser,
    ldy.TotalDaysYear,
    SUM(DATEDIFF(DAY, ld.start, ld.[end])+1) OVER (PARTITION BY ld.IDUser, YEAR(ld.start) ORDER BY ld.start) as UsedDays,
    ldy.TotalDaysYear - SUM(DATEDIFF(DAY, ld.start, ld.[end])+1) OVER (PARTITION BY ld.IDUser, YEAR(ld.start) ORDER BY ld.start) as LeftDays
FROM leavedays ld
LEFT JOIN leavedaysperyear ldy
    ON YEAR(ld.start) = ldy.Year AND ld.IDUser = ldy.IDUser
The basic idea is to keep a running total of used days per user, per year, and then subtract it from the total available days for that user during that same year.
Here's a SQLFiddle
NB. The example provided doesn't handle leave periods across years
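For readers without a SQL Server instance, the same running-total idea can be sketched in Python with the standard-library sqlite3 module. This is an illustration under assumptions, not the answer's exact query: it needs SQLite 3.25 or later for window functions, it uses julianday() differences in place of DATEDIFF, and the column names start_day and end_day as well as the sample rows are invented for the sketch.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE leavedays (id INTEGER, start_day TEXT, end_day TEXT, iduser INTEGER);
    CREATE TABLE leavedaysperyear (year INTEGER, totaldaysyear INTEGER, iduser INTEGER);
    INSERT INTO leavedays VALUES
        (1, '2020-02-02', '2020-02-03', 2),
        (2, '2020-03-01', '2020-03-05', 2);
    INSERT INTO leavedaysperyear VALUES (2020, 30, 2);
""")

# Running total of used days per user and year, ordered by start date.
# "+ 1" counts both the first and the last day of each leave period.
rows = con.execute("""
    SELECT ld.start_day, ld.end_day, ld.iduser, ldy.totaldaysyear,
           CAST(SUM(julianday(ld.end_day) - julianday(ld.start_day) + 1) OVER (
                    PARTITION BY ld.iduser, strftime('%Y', ld.start_day)
                    ORDER BY ld.start_day) AS INTEGER) AS used_days,
           ldy.totaldaysyear
             - CAST(SUM(julianday(ld.end_day) - julianday(ld.start_day) + 1) OVER (
                    PARTITION BY ld.iduser, strftime('%Y', ld.start_day)
                    ORDER BY ld.start_day) AS INTEGER) AS left_days
    FROM leavedays AS ld
    LEFT JOIN leavedaysperyear AS ldy
           ON CAST(strftime('%Y', ld.start_day) AS INTEGER) = ldy.year
          AND ld.iduser = ldy.iduser
    ORDER BY ld.start_day
""").fetchall()

for row in rows:
    print(row)
```

As in the original answer, a leave period that spans two calendar years is attributed entirely to the year of its start date.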

IBM i5OS SQL: Forecasting periodic events

I have a situation where I have about 4000 tasks that all have different periodic rules for occurrences.
They are preventive maintenance tasks. The table I get them from only provides me the start date and frequency of occurrence in units of weeks.
Example:
Task (A) is scheduled to occur every two weeks, starting on week 1 of 2015.
Task (B) is scheduled to occur every 6 weeks, starting on week 2 of 2011.
And so on...
What I need to do is produce a resultset that contains a record for each occurrence since the start point, for each task.
It's like generating a sequence.
Example:
Task | Year | Week
------|-------|-------
A | 2015 | 1
A | 2015 | 3
A | 2015 | 5
A | 2015 | 7
[...]
B | 2011 | 2
B | 2011 | 8
And so on...
You probably think "hey, that is simple, just put it in a loop and you're good."
Not so fast!
The trick is that I need this to be within one SQL query.
I know I probably should be doing it in a stored procedure or a function. But I can't, for now. I could also do it in some VBA code since it will go in an Excel spreadsheet. But Excel has become an unstable product lately and I do not want to risk my code failing after an update from Microsoft. So I try as much as possible to stay within the limits of IBM i5OS SQL queries.
I know the answer could be that it is impossible. But I believe in this community.
Thanks in advance,
EDIT :
I have found this post where it shows how to list dates within a range.
IBM DB2: Generate list of dates between two dates
I tried to generate a list of dates based on periodicity and it worked.
I am still struggling on the generation of multiple sequences based on multiple periodicity.
Here's the code I have so far:
SELECT d.min + num.n DAYS AS DATES
FROM
(VALUES(DATE('2017-01-01'), DATE('2017-03-01'))) AS d(min, max)
JOIN
(
-- Creates a table of numbers based on periodicity
SELECT
n1.n + n10.n + n100.n AS n
FROM
(VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS n1(n)
CROSS JOIN
(VALUES(0),(10),(20),(30),(40),(50),(60),(70),(80),(90)) AS n10(n)
CROSS JOIN
(VALUES(0),(100),(200),(300),(400),(500),(600),(700),(800),(900)) AS n100(n)
-- I just need to replace the 2nd argument by the desired frequency
WHERE MOD(n1.n+n10.n+n100.n, 6 )=0
ORDER BY n1.n + n10.n + n100.n
) num
ON
d.min + num.n DAYS<= d.max
ORDER BY num.n
In other words, I need the dates in table d to be dynamic as well as the periodicity (6) in num's table WHERE clause.
Should I be using a WITH statement? If so, can someone please guide me because I am not very used to this kind of statement.
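One possible way to make both the start date and the periodicity dynamic is to join the task table directly against the numbers table and multiply each number by the task's own frequency. The sketch below shows that idea in Python with the standard-library sqlite3 module, so it uses SQLite date modifiers rather than DB2 for i syntax. The table layout (tasks, numbers), the horizon date, and the use of ISO week numbering to turn year/week into a date are all assumptions for illustration.

```python
import sqlite3
from datetime import date

# (asset, task, start_year, start_week, freq_weeks), taken from the sample rows
tasks = [
    ("TMPC531", "VER0560", 2011, 10, 26),
    ("BAT0404", "IPNET030", 2011, 2, 4),
]
horizon = "2012-12-31"  # hypothetical cut-off date for generated occurrences

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tasks (asset TEXT, task TEXT, start_date TEXT, freq INTEGER)")
con.execute("CREATE TABLE numbers (n INTEGER)")
# Assumption: a start year/week means the Monday of that ISO week.
con.executemany(
    "INSERT INTO tasks VALUES (?, ?, ?, ?)",
    [(a, t, date.fromisocalendar(y, w, 1).isoformat(), f) for a, t, y, w, f in tasks],
)
con.executemany("INSERT INTO numbers VALUES (?)", [(n,) for n in range(1000)])

# Occurrence k of a task falls k * freq weeks after its start date.
rows = con.execute("""
    SELECT t.asset, t.task,
           date(t.start_date, '+' || (num.n * t.freq * 7) || ' days') AS occ_date
    FROM tasks AS t
    JOIN numbers AS num
      ON date(t.start_date, '+' || (num.n * t.freq * 7) || ' days') <= ?
    ORDER BY t.asset, t.task, num.n
""", (horizon,)).fetchall()

# Convert each date back to (ISO year, ISO week) for display.
occurrences = [
    (asset, task) + tuple(date.fromisoformat(d).isocalendar()[:2])
    for asset, task, d in rows
]
for occ in occurrences:
    print(occ)
```

No MOD filter is needed here because the multiplication already skips to the next occurrence of each task; the numbers table only has to be long enough to reach the horizon for the most frequent task.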
EDIT#2:
Here is the table structure I'm working with:
TABLE NAME: SGTRCDP (Programmed Tasks):
| | Start | Start | Freq.
Asset | Task | Year | Week | (week)
--------------|------------|----------|----------|----------
TMPC531 | VER0560 | 2011 | 10 | 26
BAT0404 | IPNET030 | 2011 | 2 | 4
B-EXTINCT-151 | 001H-0011 | 2014 | 15 | 17
[...] | [...] | [...] | [...] | [...]
4000 more like these, the unique key being combination of `Asset` and `Task` fields.
What I would like to have is this:
Asset | Task | Year | Week
--------------|------------|----------|----------
TMPC531 | VER0560 | 2011 | 10
TMPC531 | VER0560 | 2011 | 36
TMPC531 | VER0560 | 2012 | 10
TMPC531 | VER0560 | 2012 | 36
TMPC531 | VER0560 | 2013 | 10
TMPC531 | VER0560 | 2013 | 36
TMPC531 | VER0560 | 2014 | 10
TMPC531 | VER0560 | 2014 | 36
TMPC531 | VER0560 | 2015 | 10
TMPC531 | VER0560 | 2015 | 36
TMPC531 | VER0560 | 2016 | 10
TMPC531 | VER0560 | 2016 | 36
TMPC531 | VER0560 | 2017 | 10
TMPC531 | VER0560 | 2017 | 36
BAT0404 | IPNET030 | 2011 | 2
BAT0404 | IPNET030 | 2011 | 6
BAT0404 | IPNET030 | 2011 | 10
BAT0404 | IPNET030 | 2011 | 14
BAT0404 | IPNET030 | 2011 | 18
BAT0404 | IPNET030 | 2011 | 22
BAT0404 | IPNET030 | 2011 | 26
BAT0404 | IPNET030 | 2011 | 30
BAT0404 | IPNET030 | 2011 | 34
BAT0404 | IPNET030 | 2011 | 38
[...] | [...] | [...] | [...]
BAT0404 | IPNET030 | 2017 | 34
BAT0404 | IPNET030 | 2017 | 38
B-EXTINCT-151 | 001H-0011 | 2014 | 15
B-EXTINCT-151 | 001H-0011 | 2014 | 32
B-EXTINCT-151 | 001H-0011 | 2014 | 49
B-EXTINCT-151 | 001H-0011 | 2015 | 14
B-EXTINCT-151 | 001H-0011 | 2015 | 31
[...] | [...] | [...] | [...]
B-EXTINCT-151 | 001H-0011 | 2017 | 8
B-EXTINCT-151 | 001H-0011 | 2017 | 24
I was able to make it using CTE, but it generates so many records that whenever I want to filter or order data, it takes forever. Same goes for downloading the whole resultset.
And I wouldn't risk creating a temporary table and bust the disk space.
Another caveat of the CTE is that it cannot be referenced as a subquery.
And guess what, my plan was to use it as a subquery in FROM clause of a SELECT joining it with the actual work orders table and do Asset-Task-Year-Week matching to see if the programmed tasks were executed as planned or not.
Anyway, here is the CTE I used to get it:
WITH PPM (EQ, TASK, FREQ, OCCYR, OCCWK, OCCDAT, NXTDAT) AS
(
SELECT
TRCD.DLACCD EQ,
TRCD.DLJ1CD TASK,
INT(SUBSTR(TRCD.DLL1TX,9,3)) FREQ,
AOAGNB OCCYR,
AOAQNB OCCWK,
CASE
WHEN aoaddt/1000000 >= 1 THEN
DATE('20'||substr(aoaddt,2,2)||'-'||substr(aoaddt,4,2)||'-'||substr(aoaddt,6,2))
ELSE
DATE('19'||substr(aoaddt,1,2)||'-'||substr(aoaddt,3,2)||'-'||substr(aoaddt,5,2))
END OCCDAT,
(CASE
WHEN aoaddt/1000000 >= 1 THEN
DATE('20'||substr(aoaddt,2,2)||'-'||substr(aoaddt,4,2)||'-'||substr(aoaddt,6,2))
ELSE
DATE('19'||substr(aoaddt,1,2)||'-'||substr(aoaddt,3,2)||'-'||substr(aoaddt,5,2))
END + (INT(SUBSTR(TRCD.DLL1TX,9,3)) * 7) DAYS) NXTDAT
FROM
(SELECT * FROM SGTRCDP WHERE DLIMST<>'H' AND TRIM(DLK5Cd)='S') TRCD
JOIN
(
SELECT
AOAGNB,
AOAQNB,
min(AOADDT) aoaddt
FROM SGCALDP
GROUP BY AOAGNB, AOAQNB
) CLND
ON AOAGNB=SUBSTR(TRCD.DLL1TX,1,4) AND AOAQNB=INT(SUBSTR(TRCD.DLL1TX,12,2))
WHERE DLACCD='CON0539' AND DLJ1CD='CON0539-04'
UNION ALL
SELECT
PPMNXT.EQ,
PPMNXT.TASK,
PPMNXT.FREQ,
AOAGNB OCCYR,
AOAQNB OCCWK,
CASE
WHEN aoaddt/1000000 >= 1 THEN
DATE('20'||substr(aoaddt,2,2)||'-'||substr(aoaddt,4,2)||'-'||substr(aoaddt,6,2))
ELSE
DATE('19'||substr(aoaddt,1,2)||'-'||substr(aoaddt,3,2)||'-'||substr(aoaddt,5,2))
END OCCDAT,
(CASE
WHEN aoaddt/1000000 >= 1 THEN
DATE('20'||substr(aoaddt,2,2)||'-'||substr(aoaddt,4,2)||'-'||substr(aoaddt,6,2))
ELSE
DATE('19'||substr(aoaddt,1,2)||'-'||substr(aoaddt,3,2)||'-'||substr(aoaddt,5,2))
END + (PPMNXT.FREQ * 7) DAYS) NXTDAT
FROM
PPM
PPMNXT
JOIN
(
SELECT
AOAGNB,
AOAQNB,
min(AOADDT) aoaddt
FROM SGCALDP
GROUP BY AOAGNB, AOAQNB
) CLND
ON AOAGNB=YEAR(PPMNXT.NXTDAT) AND AOAQNB=WEEK_ISO(PPMNXT.NXTDAT)
WHERE
YEAR(CASE
WHEN aoaddt/1000000 >= 1 THEN
DATE('20'||substr(aoaddt,2,2)||'-'||substr(aoaddt,4,2)||'-'||substr(aoaddt,6,2))
ELSE
DATE('19'||substr(aoaddt,1,2)||'-'||substr(aoaddt,3,2)||'-'||substr(aoaddt,5,2))
END + (PPMNXT.FREQ * 7) DAYS) <= YEAR(CURRENT_DATE)
)
SELECT EQ, TASK, OCCYR, OCCWK, OCCDAT FROM PPM
That was the best I could do.
You will notice that I set a root to a specific Asset and Task:
WHERE DLACCD='CON0539' AND DLJ1CD='CON0539-04'
Normally I would not filter data, in order to retrieve all the scheduled weeks for each task. I had to filter on one root key to keep the query from eventually eating up resources and making our AS/400 crash.
Again, I am not an expert in CTEs, there might be a better solution.
Thanks

SQL-design issue - Ordering worker for work teams on a weekly basis

I'm making a web solution using ASP.net MVC6 and Azure SQL-db.
My goal is to make an order system for ordering work teams on a weekly basis and it must be possible to display the work order 6 weeks ahead from today's date. Each work order is connected to a project. A manager should be able to choose a project and then start ordering different kinds of workers (disiplin), assign his need for manpower for each disiplin for 6 weeks ahead. A disiplin can be carpenter, painter, bricklayer etc.
Each project can have any number of disiplin assigned so it's not possible to hard code this into the table structure. You can't hardcode the weeks either, as week 2 in 2016 is different from week 2 in 2017.
A workorder can look like this:
Project A
Disiplin | Week 1 | Week 2 | Week 3 | Week 4 | Week 5 | Week 6
Carpenter | 4 | 3 | 0 | 0 | 3 | 0
Painter | 0 | 0 | 2 | 3 | 3 | 3
Next project can look like this:
Project B
Disiplin | Week 44 | Week 45 | Week 46 | Week 47 | Week 48 | Week 49
Carpenter | 4 | 3 | 0 | 0 | 3 | 0
Painter | 0 | 0 | 2 | 3 | 3 | 3
Bricklayer| 1 | 2 | 1 | 5 | 3 | 0
Carpentry | 4 | 3 | 0 | 0 | 3 | 0
As you see, the week numbers and the number of disiplin may vary from project to project. I can't seem to wrap my head around how to design the SQL tables to efficiently store these values.
Can anyone review this issue and point me in the right direction? Thanks.
EDIT:
The problem is really not to store data but how to query for them. You never know for how many weeks each disiplin has registered data and you don't know how many disiplins registered on each project. In addition for week 2 you may have registered the manpower-needs for Carpenters but not for Painters. I could make a query for each disipline, but I would preferably have one query to get the complete grid.
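To make the "one query for the complete grid" idea concrete, here is a hedged sketch in Python with the standard-library sqlite3 module rather than Azure SQL. It assumes one row per project, discipline, year and week in a hypothetical manpower table, and a small wanted_weeks table that the application fills with the six weeks to display (three here, for brevity). All names and sample values are illustrative.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- One row per registered need; missing combinations simply have no row.
    CREATE TABLE manpower (
        project TEXT, discipline TEXT, year INTEGER, week INTEGER, headcount INTEGER,
        PRIMARY KEY (project, discipline, year, week)
    );
    INSERT INTO manpower VALUES
        ('A', 'Carpenter', 2016, 1, 4),
        ('A', 'Carpenter', 2016, 2, 3),
        ('A', 'Painter',   2016, 3, 2);
""")

# The weeks to show, computed by the application from today's date.
weeks = [(2016, 1), (2016, 2), (2016, 3)]
con.execute("CREATE TEMP TABLE wanted_weeks (year INTEGER, week INTEGER)")
con.executemany("INSERT INTO wanted_weeks VALUES (?, ?)", weeks)

# Every discipline of the project crossed with every wanted week;
# the LEFT JOIN plus COALESCE fills unregistered cells with 0.
rows = con.execute("""
    SELECT d.discipline, w.year, w.week, COALESCE(m.headcount, 0) AS headcount
    FROM (SELECT DISTINCT discipline FROM manpower WHERE project = :p) AS d
    CROSS JOIN wanted_weeks AS w
    LEFT JOIN manpower AS m
           ON m.project = :p AND m.discipline = d.discipline
          AND m.year = w.year AND m.week = w.week
    ORDER BY d.discipline, w.year, w.week
""", {"p": "A"}).fetchall()

# Fold the long result into one list of headcounts per discipline.
grid = {}
for discipline, year, week, headcount in rows:
    grid.setdefault(discipline, []).append(headcount)
print(grid)
```

Storing year and week together avoids the week-2-of-2016 versus week-2-of-2017 ambiguity, and the number of disciplines per project is no longer fixed by the table structure.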

Old Excel Data - Merge With SQL Transaction Log

So I have two years of total data from 2013 and 2014 from excel sheets. They're summed up and put onto my server. For 2015 we have a log with dates and each individual transaction. This makes it convenient to just add it all up for 2015. I want to merge the two: sum up the 2015s and then add that as its own column for the master sheet.
What I have now:
Company | 2013 | 2014 |
----------------------
Apple | 300 | 200 |
Toyota | 250 | 250 |
2015:
Date | Company | Units
-------------------------
1/1/15 | Apple | 30
2/28/15 | Toyota | 14
3/14/15 | Toyota | 22
Ideal Look:
Company | 2013 | 2014 | 2015
-----------------------------
Apple | 300 | 200 | 300
Toyota | 250 | 250 | 400
Without knowing your DBMS, I'm taking a shot in the dark, but I think you can handle this all directly within SQL:
with data_2015 as (
select company, sum (units) as units
from table_2015_data
group by company
)
select
h.Company, h.Units_2013, h.Units_2014, d.units as Units_2015
from
history h
left join data_2015 d on
h.Company = d.Company
From there, just bring it into Excel using MSQuery.
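The query above can be checked against the sample rows from the question. The sketch below runs it in Python with the standard-library sqlite3 module; the table names history and table_2015_data follow the answer, while the column name txn_date and the ISO date format are assumptions of this sketch.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE history (Company TEXT, Units_2013 INTEGER, Units_2014 INTEGER);
    CREATE TABLE table_2015_data (txn_date TEXT, Company TEXT, Units INTEGER);
    INSERT INTO history VALUES ('Apple', 300, 200), ('Toyota', 250, 250);
    INSERT INTO table_2015_data VALUES
        ('2015-01-01', 'Apple', 30),
        ('2015-02-28', 'Toyota', 14),
        ('2015-03-14', 'Toyota', 22);
""")

# Sum the 2015 transactions per company, then attach the total as a new column.
rows = con.execute("""
    WITH data_2015 AS (
        SELECT Company, SUM(Units) AS Units
        FROM table_2015_data
        GROUP BY Company
    )
    SELECT h.Company, h.Units_2013, h.Units_2014, d.Units AS Units_2015
    FROM history AS h
    LEFT JOIN data_2015 AS d ON h.Company = d.Company
    ORDER BY h.Company
""").fetchall()

for row in rows:
    print(row)
```

With the LEFT JOIN, a company that has no 2015 transactions still appears, with a NULL (None in Python) in the 2015 column.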

Get latest child record without given order

Simplified, I got the following situation. I've got two tables. One migration has multiple checks through checks.migration_id. The Column checks.old describes a type of check. Now I want to get for each migration the check with the biggest time where old is true (query1) and false (query2).
There are about 30.000 migrations and each has around 1000 checks where old=true and 1000 checks where old=false. The table checks will grow quite extreme. The order of the checks is not given and could be totally mixed up.
I want to get the latest check for a maximum of 150 migrations at once.
SQL Fiddle: http://sqlfiddle.com/#!15/282ce/15
I'm using PostgreSQL 9.3 and Rails 3.2 (shouldn't matter)
What's the most efficient way to get the latest subrecord where old = true?
Table Migrations:
| ID |
|----|
| 1 |
| 2 |
Table Checks:
| ID | MIGRATION_ID | OLD | OK | TIME |
|----|--------------|-----|----|----------------------------------|
| 1 | 1 | 1 | 1 | September, 22 2014 12:00:01+0000 |
| 2 | 1 | 0 | 1 | September, 22 2014 12:00:02+0000 |
| 3 | 2 | 1 | 1 | September, 22 2014 12:00:01+0000 |
| 4 | 2 | 0 | 1 | September, 22 2014 12:00:02+0000 |
| 5 | 1 | 1 | 1 | September, 22 2014 12:00:03+0000 |
| 6 | 1 | 0 | 1 | September, 22 2014 12:00:04+0000 |
| 7 | 2 | 1 | 1 | September, 22 2014 12:00:03+0000 |
| 8 | 2 | 0 | 1 | September, 22 2014 12:00:04+0000 |
Query 1 should return the following result:
| Migration.id | Check_ID | OLD | OK | TIME |
|--------------|----------|-----|----|----------------------------------|
| 1 | 5 | 1 | 1 | September, 22 2014 12:00:03+0000 |
| 2 | 7 | 1 | 1 | September, 22 2014 12:00:03+0000 |
Query 2 should return the following result:
| Migration.id | Check_ID | OLD | OK | TIME |
|--------------|----------|-----|----|----------------------------------|
| 1 | 6 | 0 | 1 | September, 22 2014 12:00:04+0000 |
| 2 | 8 | 0 | 1 | September, 22 2014 12:00:04+0000 |
I tried to solve it with a max in a subquery, but then I lose the information about checks.ok and check.time.
SELECT eq.id, (SELECT max(checks.id) FROM checks WHERE checks.migration_id = eq.id and checks.old = 't') AS latest FROM migrations eq;
SELECT eq.id, (SELECT max(checks.id) FROM checks WHERE checks.migration_id = eq.id and checks.old = 'f') AS latest FROM migrations eq;
(I know that I get max(id) instead of max(time).)
In Rails I tried to fetch the latest record for each migration, which resulted in the 1+n problem. I'm not able to include all checks because there are way too many of them.
A simple solution with the Postgres specific DISTINCT ON:
Query 1 ("for each migration the check with the biggest time where old is true"):
SELECT DISTINCT ON (migration_id)
migration_id, id AS check_id, old, ok, time
FROM checks
WHERE old
ORDER BY migration_id, time DESC;
Invert the WHERE condition for Query 2:
...
WHERE NOT old
...
Details:
Select first row in each GROUP BY group?
But if you want better read performance with big tables, use JOIN LATERAL (Postgres 9.3+, standard SQL), building on a multicolumn index like:
CREATE INDEX checks_special_idx ON checks(old, migration_id, time DESC);
Query 1:
SELECT m.id AS migration_id
, c.id AS check_id, c.old, c.ok, c.time
FROM migrations m
-- FROM (SELECT id FROM migrations LIMIT 150) m
JOIN LATERAL (
SELECT id, old, ok, time
FROM checks
WHERE migration_id = m.id
AND old
ORDER BY time DESC
LIMIT 1
) c ON TRUE;
Switch the condition on old again for query 2.
For an unspecified "maximum of 150 migrations", use the commented alternative line.
Details:
Optimize GROUP BY query to retrieve latest record per user
SQL Fiddle.
Aside: don't use "time" as identifier. It's a reserved word in standard SQL and a basic type name in Postgres.
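DISTINCT ON and LATERAL are not available in every database. As a portable illustration of the same "latest row per group" result, the sketch below uses ROW_NUMBER() instead, run in Python with the standard-library sqlite3 module (SQLite 3.25 or later). This is a swapped-in technique, not the answer's query, and the column name checked_at is an assumption that replaces time in line with the aside above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE checks (
        id INTEGER, migration_id INTEGER, old INTEGER, ok INTEGER, checked_at TEXT
    );
    INSERT INTO checks VALUES
        (1, 1, 1, 1, '2014-09-22 12:00:01'),
        (2, 1, 0, 1, '2014-09-22 12:00:02'),
        (3, 2, 1, 1, '2014-09-22 12:00:01'),
        (4, 2, 0, 1, '2014-09-22 12:00:02'),
        (5, 1, 1, 1, '2014-09-22 12:00:03'),
        (6, 1, 0, 1, '2014-09-22 12:00:04'),
        (7, 2, 1, 1, '2014-09-22 12:00:03'),
        (8, 2, 0, 1, '2014-09-22 12:00:04');
""")


def latest_checks(old):
    """Return the newest check per migration for the given value of old."""
    return con.execute("""
        SELECT migration_id, id AS check_id, old, ok, checked_at
        FROM (
            -- Number the checks of each migration from newest to oldest.
            SELECT *, ROW_NUMBER() OVER (
                       PARTITION BY migration_id ORDER BY checked_at DESC) AS rn
            FROM checks
            WHERE old = ?
        )
        WHERE rn = 1
        ORDER BY migration_id
    """, (int(old),)).fetchall()


query1 = latest_checks(True)   # old = true
query2 = latest_checks(False)  # old = false
print(query1)
print(query2)
```

On the sample data this returns checks 5 and 7 for old = true and checks 6 and 8 for old = false, matching the expected results in the question. On large tables the window-function form generally has to rank every matching row, so the index-backed LATERAL approach from the answer may still be the better choice in Postgres.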