Old Excel Data - Merge With SQL Transaction Log - sql

I have yearly totals for 2013 and 2014 that came from Excel sheets; they are already summed up and stored on my server. For 2015 we have a log with the date of each individual transaction, so the 2015 total can be computed by summing the log. I want to merge the two: sum the 2015 transactions per company and add the result as its own column in the master sheet.
What I have now:
Company | 2013 | 2014 |
----------------------
Apple | 300 | 200 |
Toyota | 250 | 250 |
2015:
Date | Company | Units
-------------------------
1/1/15 | Apple | 30
2/28/15 | Toyota | 14
3/14/15 | Toyota | 22
Ideal Look:
Company | 2013 | 2014 | 2015
-----------------------------
Apple | 300 | 200 | 300
Toyota | 250 | 250 | 400

Without knowing your DBMS, I'm taking a shot in the dark, but I think you can handle this all directly within SQL:
with data_2015 as (
  select company, sum(units) as units
  from table_2015_data
  group by company
)
select
  h.Company, h.Units_2013, h.Units_2014, d.units as Units_2015
from
  history h
  left join data_2015 d on
    h.Company = d.Company
From there, just bring it into Excel using MSQuery.
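To try the aggregate-then-join idea without a server, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (history, table_2015_data, Units_2013, Units_2014) are the same guesses used in the query above, not the asker's real schema, and the rows are the sample rows from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed schema: names are guesses, as in the answer's query.
    CREATE TABLE history (Company TEXT, Units_2013 INTEGER, Units_2014 INTEGER);
    CREATE TABLE table_2015_data (Date TEXT, Company TEXT, Units INTEGER);
    INSERT INTO history VALUES ('Apple', 300, 200), ('Toyota', 250, 250);
    INSERT INTO table_2015_data VALUES
        ('2015-01-01', 'Apple', 30),
        ('2015-02-28', 'Toyota', 14),
        ('2015-03-14', 'Toyota', 22);
""")

# Sum the 2015 log per company, then attach it to the yearly totals.
rows = conn.execute("""
    WITH data_2015 AS (
        SELECT Company, SUM(Units) AS Units
        FROM table_2015_data
        GROUP BY Company
    )
    SELECT h.Company, h.Units_2013, h.Units_2014, d.Units AS Units_2015
    FROM history h
    LEFT JOIN data_2015 d ON h.Company = d.Company
    ORDER BY h.Company
""").fetchall()

for row in rows:
    print(row)
```

The LEFT JOIN keeps companies that have no 2015 transactions yet; their Units_2015 comes back as NULL (None in Python) rather than dropping the row.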

Related

Take sum of a concatenated column SQL

I want to use post and pre revenue of an interaction to calculate net revenue. Sometimes there are multiple customers in an interaction. The data is like:
InteractionID | Customer ID | Pre | Post
--------------+-------------+--------+--------
1 | ab12 | 10 | 30
2 | cd12 | 40 | 15
3 | de12;gh12 | 15;30 | 20;10
Expected output is to take sum in pre and post call to calculate net
InteractionID | Customer ID | Pre | Post | Net
--------------+---------------+--------+-------+------
1 | ab12 | 10 | 30 | 20
2 | cd12 | 40 | 15 | -25
3 | de12;gh12 | 45 | 30 | -15
How do I get the net revenue column?
The proper solution is to normalize your relational design by adding a separate table for customers and their respective pre and post.
While stuck with the current design, this would do it:
SELECT *, post - pre AS net
FROM (
SELECT interaction_id, customer_id
,(SELECT sum(x::numeric) FROM string_to_table(pre, ';') x) AS pre
,(SELECT sum(x::numeric) FROM string_to_table(post, ';') x) AS post
FROM tbl
) sub;
string_to_table() requires at least Postgres 14.
You did not declare your Postgres version, so I assume the current version Postgres 14.
For older versions replace with regexp_split_to_table() or unnest(string_to_array()).
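The split-then-sum logic can also be checked outside the database. The following is only a sketch in plain Python of what the subqueries compute for each row; sum_delimited is a hypothetical helper name, and the rows are the three sample rows from the question.

```python
from decimal import Decimal

def sum_delimited(value: str, sep: str = ";") -> Decimal:
    """Sum the numbers packed into one delimited string, e.g. '15;30' -> 45."""
    return sum((Decimal(part) for part in value.split(sep)), Decimal(0))

# (interaction_id, customer_id, pre, post) as stored, with ';'-joined values.
rows = [
    (1, "ab12", "10", "30"),
    (2, "cd12", "40", "15"),
    (3, "de12;gh12", "15;30", "20;10"),
]

result = []
for interaction_id, customer_id, pre, post in rows:
    pre_total = sum_delimited(pre)
    post_total = sum_delimited(post)
    # Net revenue is post minus pre, matching "post - pre AS net" in the query.
    result.append((interaction_id, customer_id, pre_total, post_total, post_total - pre_total))

for row in result:
    print(row)
```

Decimal is used here to mirror the numeric cast in the SQL; plain int or float would work for the sample data as well.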

How can I get a record to be counted in multiple columns of a Crosstab Query?

Background information:
My company requires employees to maintain at least one certification (cert) on a position. There are a total of 17 different certifications that an employee can get.
An employee can hold multiple certs. But on any one day they can only "sit" one of the positions that they are certified in. Most employees primarily sit the highest level position that they hold a cert in, but can sit a lower level position if there are manning shortages in that position and if they hold that particular cert (some employees come to us holding the higher level certs but none of the lower ones because they let them expire).
Multiple employees can hold the same cert.
Around 90% of employees are on contract, meaning they have a set termination date. Contracts can be extended but for the sake of this Access database, and the report to be generated, we're presuming that the termination date is set in stone.
My boss (and my boss's boss) want to put together a manning projection report so that they don't get caught off guard should we start running low on employees certified in any one position.
Example of what they want:
Let's say you have three employees:
Employee1 has certs in position1, position2, and position3 but he primarily sits as position3 and his contract expires June 2020.
Employee2 has certs in position1 and position2 but primarily sits as position2 and her contract expires in February 2022.
Employee3 is new and arrived August 2019 and is in training to get position1, maximum allowed training time for initial cert is 3 months, so presumably he should have his position1 cert by December 2019 and his contract expires August 2025.
Let's say my boss wants to project out 12 months with the starting month being November 2019 (he'll only be able to select a starting month-year that is equal to or later than the current month-year). The charts below, which are generated in subreports, should be what gets generated off of the above employee information.
All Certifications Chart
+-----------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| Cert | Nov 19 | Dec 19 | Jan 20 | Feb 20 | Mar 20 | Apr 20 | May 20 | Jun 20 | Jul 20 | Aug 20 | Sep 20 | Oct 20 |
+-----------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| Position1 | 2 | 3 | 3 | 3 | 3 | 3 | 3 | 2 | 2 | 2 | 2 | 2 |
| Position2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | 1 |
| Position3 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
+-----------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
Primary Certifications Chart
+-----------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| Cert | Nov 19 | Dec 19 | Jan 20 | Feb 20 | Mar 20 | Apr 20 | May 20 | Jun 20 | Jul 20 | Aug 20 | Sep 20 | Oct 20 |
+-----------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| Position1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Position2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Position3 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
+-----------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
Now I already have a solution in place but it's extremely inefficient and involves a query for each cell (2 Charts X 12 Months X 17 positions = 408 Queries when a report is generated). I'm hoping to do something more efficient with a crosstab query.
The tables are set up as such (only listing relevant fields):
Emp_table
ID (autoNum)
contractStarted (Date)
contractEnd (Date)
Cert_individual
ID (autoNum)
certID (num, many->one relationship to cert_table.ID)
EmpID (num, many->one relationship to Emp_table.ID)
date_cert_received (date)
primary (yes/no)
cert_table
ID (autoNum)
cert_name (short text)
Obviously I'd need to do a couple of INNER JOINS in order to get everything together and I tried using the format from this website for my crosstab query but it would only add an individual cert to a count on the month-year that the employee received it and not to every month that the employee will hold the cert.
So my question is:
Is there a way in SQL or VBA to get a cert counted across multiple columns (month-years) based off of when the employee received the cert and when their contract is scheduled to terminate?
As far as I know, the main problem in getting the crosstab query is that it can only generate columns with data that you already have.
A solution for you to get the monthly columns would be to have a side table with the 12 dates and then use the Cartesian product to generate the monthly data for each of your records in your certification table. This "date" table can be updated and maintained to match the months that you require in your report with a query.
For example, if you have a table named TempDates :
And a table with Employees with the following data :
You can generate the cartesian product with a query that I named QryCertsDates :
SELECT Employees.*, TempDates.* FROM TempDates, Employees;
Which lets you attach all the wanted dates with your original date from the table Employees in order to obtain data similar to below :
Now you can generate your crosstab query pivoting on the month and year and filtering the dates with the WHERE criteria such as :
TRANSFORM Count(QryCertsDates.Cert) AS CountOfCert
SELECT QryCertsDates.Cert
FROM QryCertsDates
WHERE (((CDate([Yr] & "-" & [Mo])) Between CDate([Start]) And CDate([Expire])))
GROUP BY QryCertsDates.Cert
PIVOT CDate([Yr] & "-" & Format([Mo],"00"));
You will end up ultimately with something like this :
You can do the same thing to get your second table/report as well. I don't know your database structure, so you will most likely need to do some adaptation. The other possible way that you can achieve a similar result would be to fill in a table using VBA.
However, this might be the easier solution to implement. Good luck!
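For anyone who wants to check the Cartesian-product idea before building it in Access, here is a rough sketch using Python's sqlite3 module. It is not Access SQL: the pivot is done in Python because SQLite has no TRANSFORM. The Employees layout (one row per cert held, with StartMonth and ExpireMonth as 'YYYY-MM' text) is an assumption, the start months before November 2019 are invented, and a cert is treated as no longer held in the month the contract ends, which is what the charts in the question show.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (Emp TEXT, Cert TEXT, StartMonth TEXT, ExpireMonth TEXT);
    CREATE TABLE TempDates (Month TEXT);
    -- Start months before 2019-11 are invented; the question does not give them.
    INSERT INTO Employees VALUES
        ('Employee1', 'Position1', '2019-01', '2020-06'),
        ('Employee1', 'Position2', '2019-01', '2020-06'),
        ('Employee1', 'Position3', '2019-01', '2020-06'),
        ('Employee2', 'Position1', '2019-01', '2022-02'),
        ('Employee2', 'Position2', '2019-01', '2022-02'),
        ('Employee3', 'Position1', '2019-12', '2025-08');
""")

# Twelve report months, November 2019 through October 2020.
months = [f"{2019 + (10 + i) // 12}-{(10 + i) % 12 + 1:02d}" for i in range(12)]
conn.executemany("INSERT INTO TempDates VALUES (?)", [(m,) for m in months])

# Cartesian product of certs and months, kept only where the cert is held.
query = """
    SELECT e.Cert, d.Month, COUNT(*) AS Held
    FROM Employees AS e, TempDates AS d
    WHERE d.Month >= e.StartMonth AND d.Month < e.ExpireMonth
    GROUP BY e.Cert, d.Month
"""

# Pivot in Python: one row per cert, one column per month, zero by default.
certs = [c for (c,) in conn.execute("SELECT DISTINCT Cert FROM Employees ORDER BY Cert")]
counts = {cert: dict.fromkeys(months, 0) for cert in certs}
for cert, month, held in conn.execute(query):
    counts[cert][month] = held

for cert in certs:
    print(cert, [counts[cert][m] for m in months])
```

With these rows the counts line up with the "All Certifications Chart" in the question. Adding a filter on the primary flag to the same query would give the second chart.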

IBM i5OS SQL: Forecasting periodic events

I have a situation where I have about 4000 tasks that all have different periodic rules for occurrences.
They are preventive maintenance tasks. The table I get them from only provides me the start date and the frequency of occurrence in units of weeks.
Example:
Task (A) is scheduled to occur every two weeks, starting on week 1 of 2015.
Task (B) is scheduled to occur every 6 weeks, starting on week 2 of 2011.
And so on...
What I need to do is produce a result set that contains a record for each occurrence since the start point, for each task.
It's like generating a sequence.
Example:
Task | Year | Week
------|-------|-------
A | 2015 | 1
A | 2015 | 3
A | 2015 | 5
A | 2015 | 7
[...]
B | 2011 | 2
B | 2011 | 8
And so on...
You probably think "hey, that is simple, just put it in a loop and you're good."
Not so fast!
The trick is that I need this to be within one SQL query.
I know I probably should be doing it in a stored procedure or a function, but I can't for now. I could also do it in some VBA code, since it will go in an Excel spreadsheet. But Excel has become an unstable product lately, and I do not want to risk my code failing after an update from Microsoft. So I try as much as possible to stay within the limits of IBM i5OS SQL queries.
I know the answer could be that it is impossible. But I believe in this community.
Thanks in advance,
EDIT :
I have found this post where it shows how to list dates within a range.
IBM DB2: Generate list of dates between two dates
I tried to generate a list of dates based on periodicity and it worked.
I am still struggling on the generation of multiple sequences based on multiple periodicity.
Here's the code I have so far:
SELECT d.min + num.n DAYS AS DATES
FROM
(VALUES(DATE('2017-01-01'), DATE('2017-03-01'))) AS d(min, max)
JOIN
(
-- Creates a table of numbers based on periodicity
SELECT
n1.n + n10.n + n100.n AS n
FROM
(VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS n1(n)
CROSS JOIN
(VALUES(0),(10),(20),(30),(40),(50),(60),(70),(80),(90)) AS n10(n)
CROSS JOIN
(VALUES(0),(100),(200),(300),(400),(500),(600),(700),(800),(900)) AS n100(n)
-- I just need to replace the 2nd argument by the desired frequency
WHERE MOD(n1.n+n10.n+n100.n, 6 )=0
ORDER BY n1.n + n10.n + n100.n
) num
ON
d.min + num.n DAYS<= d.max
ORDER BY num.n
In other words, I need the dates in table d to be dynamic as well as the periodicity (6) in num's table WHERE clause.
Should I be using a WITH statement? If so, can someone please guide me because I am not very used to this kind of statement.
EDIT#2:
Here is the table structure I'm working with:
TABLE NAME: SGTRCDP (Programmed Tasks):
| | Start | Start | Freq.
Asset | Task | Year | Week | (week)
--------------|------------|----------|----------|----------
TMPC531 | VER0560 | 2011 | 10 | 26
BAT0404 | IPNET030 | 2011 | 2 | 4
B-EXTINCT-151 | 001H-0011 | 2014 | 15 | 17
[...] | [...] | [...] | [...] | [...]
4000 more like these, the unique key being combination of `Asset` and `Task` fields.
What I would like to have is this:
Asset | Task | Year | Week
--------------|------------|----------|----------
TMPC531 | VER0560 | 2011 | 10
TMPC531 | VER0560 | 2011 | 36
TMPC531 | VER0560 | 2012 | 10
TMPC531 | VER0560 | 2012 | 36
TMPC531 | VER0560 | 2013 | 10
TMPC531 | VER0560 | 2013 | 36
TMPC531 | VER0560 | 2014 | 10
TMPC531 | VER0560 | 2014 | 36
TMPC531 | VER0560 | 2015 | 10
TMPC531 | VER0560 | 2015 | 36
TMPC531 | VER0560 | 2016 | 10
TMPC531 | VER0560 | 2016 | 36
TMPC531 | VER0560 | 2017 | 10
TMPC531 | VER0560 | 2017 | 36
BAT0404 | IPNET030 | 2011 | 2
BAT0404 | IPNET030 | 2011 | 6
BAT0404 | IPNET030 | 2011 | 10
BAT0404 | IPNET030 | 2011 | 14
BAT0404 | IPNET030 | 2011 | 18
BAT0404 | IPNET030 | 2011 | 22
BAT0404 | IPNET030 | 2011 | 26
BAT0404 | IPNET030 | 2011 | 30
BAT0404 | IPNET030 | 2011 | 34
BAT0404 | IPNET030 | 2011 | 38
[...] | [...] | [...] | [...]
BAT0404 | IPNET030 | 2017 | 34
BAT0404 | IPNET030 | 2017 | 38
B-EXTINCT-151 | 001H-0011 | 2014 | 15
B-EXTINCT-151 | 001H-0011 | 2014 | 32
B-EXTINCT-151 | 001H-0011 | 2014 | 49
B-EXTINCT-151 | 001H-0011 | 2015 | 14
B-EXTINCT-151 | 001H-0011 | 2015 | 31
[...] | [...] | [...] | [...]
B-EXTINCT-151 | 001H-0011 | 2017 | 8
B-EXTINCT-151 | 001H-0011 | 2017 | 24
I was able to do it with a CTE, but it generates so many records that filtering or ordering the data takes forever. The same goes for downloading the whole result set.
I also would not risk creating a temporary table and exhausting the disk space.
Another caveat of the CTE is that it cannot be referenced as a subquery.
And guess what: my plan was to use it as a subquery in the FROM clause of a SELECT, joining it with the actual work orders table and matching on Asset-Task-Year-Week to see whether the programmed tasks were executed as planned.
Anyway, here is the CTE I used to get it:
WITH PPM (EQ, TASK, FREQ, OCCYR, OCCWK, OCCDAT, NXTDAT) AS
(
SELECT
TRCD.DLACCD EQ,
TRCD.DLJ1CD TASK,
INT(SUBSTR(TRCD.DLL1TX,9,3)) FREQ,
AOAGNB OCCYR,
AOAQNB OCCWK,
CASE
WHEN aoaddt/1000000 >= 1 THEN
DATE('20'||substr(aoaddt,2,2)||'-'||substr(aoaddt,4,2)||'-'||substr(aoaddt,6,2))
ELSE
DATE('19'||substr(aoaddt,1,2)||'-'||substr(aoaddt,3,2)||'-'||substr(aoaddt,5,2))
END OCCDAT,
(CASE
WHEN aoaddt/1000000 >= 1 THEN
DATE('20'||substr(aoaddt,2,2)||'-'||substr(aoaddt,4,2)||'-'||substr(aoaddt,6,2))
ELSE
DATE('19'||substr(aoaddt,1,2)||'-'||substr(aoaddt,3,2)||'-'||substr(aoaddt,5,2))
END + (INT(SUBSTR(TRCD.DLL1TX,9,3)) * 7) DAYS) NXTDAT
FROM
(SELECT * FROM SGTRCDP WHERE DLIMST<>'H' AND TRIM(DLK5Cd)='S') TRCD
JOIN
(
SELECT
AOAGNB,
AOAQNB,
min(AOADDT) aoaddt
FROM SGCALDP
GROUP BY AOAGNB, AOAQNB
) CLND
ON AOAGNB=SUBSTR(TRCD.DLL1TX,1,4) AND AOAQNB=INT(SUBSTR(TRCD.DLL1TX,12,2))
WHERE DLACCD='CON0539' AND DLJ1CD='CON0539-04'
UNION ALL
SELECT
PPMNXT.EQ,
PPMNXT.TASK,
PPMNXT.FREQ,
AOAGNB OCCYR,
AOAQNB OCCWK,
CASE
WHEN aoaddt/1000000 >= 1 THEN
DATE('20'||substr(aoaddt,2,2)||'-'||substr(aoaddt,4,2)||'-'||substr(aoaddt,6,2))
ELSE
DATE('19'||substr(aoaddt,1,2)||'-'||substr(aoaddt,3,2)||'-'||substr(aoaddt,5,2))
END OCCDAT,
(CASE
WHEN aoaddt/1000000 >= 1 THEN
DATE('20'||substr(aoaddt,2,2)||'-'||substr(aoaddt,4,2)||'-'||substr(aoaddt,6,2))
ELSE
DATE('19'||substr(aoaddt,1,2)||'-'||substr(aoaddt,3,2)||'-'||substr(aoaddt,5,2))
END + (PPMNXT.FREQ * 7) DAYS) NXTDAT
FROM
PPM
PPMNXT
JOIN
(
SELECT
AOAGNB,
AOAQNB,
min(AOADDT) aoaddt
FROM SGCALDP
GROUP BY AOAGNB, AOAQNB
) CLND
ON AOAGNB=YEAR(PPMNXT.NXTDAT) AND AOAQNB=WEEK_ISO(PPMNXT.NXTDAT)
WHERE
YEAR(CASE
WHEN aoaddt/1000000 >= 1 THEN
DATE('20'||substr(aoaddt,2,2)||'-'||substr(aoaddt,4,2)||'-'||substr(aoaddt,6,2))
ELSE
DATE('19'||substr(aoaddt,1,2)||'-'||substr(aoaddt,3,2)||'-'||substr(aoaddt,5,2))
END + (PPMNXT.FREQ * 7) DAYS) <= YEAR(CURRENT_DATE)
)
SELECT EQ, TASK, OCCYR, OCCWK, OCCDAT FROM PPM
That was the best I could do.
You will notice that I set a root to a specific Asset and Task:
WHERE DLACCD='CON0539' AND DLJ1CD='CON0539-04'
Normally I would not filter the data, so that I could retrieve all the scheduled weeks for every task. I had to filter on one root key to keep the query from eating up resources and eventually crashing our AS/400.
Again, I am not an expert in CTEs, there might be a better solution.
Thanks
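One non-recursive direction, extending the numbers-table idea from the first edit to many tasks at once, is to join every task to the numbers and multiply n by that task's own frequency. The sketch below only illustrates that idea with Python's sqlite3 module; it is not DB2 for i syntax, the tasks/nums tables and the occurrences function are hypothetical, and it assumes ISO week numbering with each task starting on the Monday of its start week.

```python
import sqlite3
from datetime import date

def occurrences(tasks, horizon):
    """tasks: (asset, task, start_year, start_week, freq_weeks) tuples.
    horizon: last allowed date as an ISO string, e.g. '2017-12-31'.
    Returns (asset, task, iso_year, iso_week) for every scheduled occurrence."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tasks (asset TEXT, task TEXT, start_date TEXT, freq_weeks INTEGER)")
    conn.execute("CREATE TABLE nums (n INTEGER PRIMARY KEY)")
    # 0..999 occurrences per task; raise the limit if a task can repeat more often.
    conn.executemany("INSERT INTO nums VALUES (?)", [(n,) for n in range(1000)])
    conn.executemany(
        "INSERT INTO tasks VALUES (?, ?, ?, ?)",
        [(a, t, date.fromisocalendar(y, w, 1).isoformat(), f) for a, t, y, w, f in tasks],
    )
    # Each task meets every n; the n-th occurrence is start + n * freq weeks.
    rows = conn.execute(
        """
        SELECT t.asset, t.task,
               date(t.start_date, '+' || (n.n * t.freq_weeks * 7) || ' days') AS occ
        FROM tasks t
        JOIN nums n
          ON date(t.start_date, '+' || (n.n * t.freq_weeks * 7) || ' days') <= ?
        ORDER BY t.asset, t.task, n.n
        """,
        (horizon,),
    ).fetchall()
    conn.close()
    # Convert each occurrence date back to an ISO (year, week) pair.
    return [(a, t) + tuple(date.fromisoformat(occ).isocalendar()[:2]) for a, t, occ in rows]

for row in occurrences([("A", "T1", 2015, 1, 2), ("B", "T2", 2011, 2, 6)], "2015-02-15")[:4]:
    print(row)
```

Because the per-task frequency sits in the join condition, neither the dates nor the periodicity are hard-coded, and nothing is recursive. Whether this performs acceptably for 4000 tasks on the AS/400 would still need to be measured.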

SQL Group by Client Location

Sample of Data I am trying to manipulate
Order | OrderDate | ClientName| ClientAddress | City | State| Zip |
-------|-----------|-----------|---------------|--------|------|-------|
C0101 | 1/5/2015 | Client ABC| 101 Park Drive| Boston | MA | 02134 |
C0102 | 2/6/2015 | Client ABC| 101 Park Drive| Boston | MA | 02134 |
C0103 | 1/7/2015 | Client ABC| 354 Foo Pkwy | Dallas | TX | 75001 |
C0104 | 3/7/2015 | Client ABC| 354 Foo Pkwy | Dallas | TX | 75001 |
C0105 | 5/7/2015 | Client XYZ| 1 Binary Road | Austin | TX | 73301 |
C0106 | 1/8/2015 | Client XYZ| 1 Binary Road | Austin | TX | 73301 |
C0107 | 7/9/2015 | Client XYZ| 51 Testing Rd | Austin | TX | 73301 |
I have a database setup in MS-SQL Server with all client orders for the past two year period. Some clients only have one location, others have multiple locations. I would like to write a script that will show me the number of orders a customer placed by location over the total number of weeks there was at least one order.
Based on the results of this script, I would like to be able to deduce every customer location's summary of unique orders (placed at various times). For example:
Client ABC has placed 45 orders over 35 total weeks at location A
Client ABC has placed 35 orders over 15 total weeks at location B
Client ABC has placed 15 orders over 15 total weeks at location C
I would like see this information for each unique location for each client. I am not sure how to aggregate the data in such a way. Here is where I am at with my script:
SELECT t1.ClientName, (SELECT DISTINCT t2.ClientAddress), COUNT(DISTINCT t2.Orders) AS TotalOrders,
DATEPART(week, t1.OrderDate) AS Week
FROM database t1
INNER JOIN database t2 on t1.Orders = t2.Orders
GROUP BY DATEPART(week, t1.OrderDate), t1.ClientAddress, t2.ClientAddress
HAVING COUNT(DISTINCT t2.SalesOrder) > 1
ORDER BY TotalOrders DESC
The results show the unique orders by location for each week, but I'm not sure how to count the number of weeks in the way that I need; I have tried writing subqueries but keep running into issues. I realize this script shows the number of orders by location for each individual week; what I would like is the total number of weeks within the time frame in which there was at least one order.
The results are structured as follows:
| ClientName| ClientAddress | TotalOrders | Week |
|-----------|---------------|--------------|------|
|Client ABC |101 Park Drive | 30 | 21 |
|Client ABC |101 Park Drive | 29 | 13 |
|Client ABC |101 Park Drive | 28 | 10 |
|Client XYZ |1 Binary Road | 27 | 19 |
|Client XYZ |1 Binary Road | 25 | 7 |
|Client XYZ |51 Testing Rd | 22 | 9 |
Any and all help would be greatly appreciated; thank you in advance.
Isn't this what you want?
SELECT t1.ClientName, t1.ClientAddress, COUNT(DISTINCT t1.Orders) AS TotalOrders,
COUNT(DISTINCT DATEPART(week, t1.OrderDate)) AS Weeks
FROM database t1
GROUP BY t1.ClientName, t1.ClientAddress
HAVING COUNT(DISTINCT t1.SalesOrder) > 1
ORDER BY TotalOrders DESC
I don't really follow why you're doing a self-join. It seems unnecessary to me, so the query above reads from t1 only; the change that matters is counting the distinct weeks instead of grouping by week.
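To see the group-by-location counts on real rows, here is a small sketch using Python's sqlite3 module. It is SQLite rather than SQL Server, so strftime stands in for DATEPART. The orders table and the OrderNo column are assumed names, the first seven rows come from the sample in the question, and C0108 is a made-up order added so that one location has two orders in the same week. Since week numbers restart every year, the sketch counts distinct year-week pairs, which matters over a two-year period.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (OrderNo TEXT, OrderDate TEXT, ClientName TEXT, ClientAddress TEXT);
    INSERT INTO orders VALUES
        ('C0101', '2015-01-05', 'Client ABC', '101 Park Drive'),
        ('C0102', '2015-02-06', 'Client ABC', '101 Park Drive'),
        ('C0103', '2015-01-07', 'Client ABC', '354 Foo Pkwy'),
        ('C0104', '2015-03-07', 'Client ABC', '354 Foo Pkwy'),
        ('C0105', '2015-05-07', 'Client XYZ', '1 Binary Road'),
        ('C0106', '2015-01-08', 'Client XYZ', '1 Binary Road'),
        ('C0107', '2015-07-09', 'Client XYZ', '51 Testing Rd'),
        -- Hypothetical extra order in the same week as C0101.
        ('C0108', '2015-01-06', 'Client ABC', '101 Park Drive');
""")

# One row per client location: distinct orders and distinct year-week pairs.
summary = conn.execute("""
    SELECT ClientName, ClientAddress,
           COUNT(DISTINCT OrderNo) AS TotalOrders,
           COUNT(DISTINCT strftime('%Y-%W', OrderDate)) AS Weeks
    FROM orders
    GROUP BY ClientName, ClientAddress
    ORDER BY TotalOrders DESC, ClientName, ClientAddress
""").fetchall()

for name, address, total_orders, weeks in summary:
    print(f"{name} has placed {total_orders} orders over {weeks} total weeks at {address}")
```

In SQL Server the equivalent would be to count distinct combinations of DATEPART(year, ...) and DATEPART(week, ...), for example by concatenating them.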

Convert rows to columns for a Report

Using instructions found here I've tried to create a crosstab query to show historical data from three previous years and I would like to output it in a report.
I've got a few complications that are making this difficult and I'm having trouble getting the data to show correctly.
The query it is based on is structured like this:
EmpID | ReviewYearID | YearName | ReviewDate | SelfRating | ManagerRating | NotSelfRating |
1 | 5 | 2013 | 01/09/2013 | 3.5 | 3.5 | 3.5 |
1 | 6 | 2014 | 01/09/2014 | 2.5 | 2.5 | 2.5 |
1 | 7 | 2015 | 01/09/2015 | 4.5 | 4.5 | 4.5 |
2 | 6 | 2014 | 01/09/2014 | 2.0 | 2.0 | 2.0 |
2 | 7 | 2015 | 01/09/2015 | 2.0 | 2.0 | 2.0 |
3 | 7 | 2015 | 01/09/2015 | 5.0 | 5.0 | 5.0 |
[Edit]: Here is the SQL for the base query. It is combining data from two tables:
SELECT tblEmployeeYear.EmployeeID AS EmpID, tblReviewYear.ID AS ReviewYearID, tblReviewYear.YearName, tblReviewYear.ReviewDate, tblEmployeeYear.SelfRating, tblEmployeeYear.ManagerRating, tblEmployeeYear.NotSelfRating
FROM tblReviewYear INNER JOIN tblEmployeeYear ON tblReviewYear.ID = tblEmployeeYear.ReviewYearID;
[/Edit]
I would like a crosstab query that transposes the columns/rows to show historical data for up to 3 previous years (based on review date) for a specific employee. The end result would look something like this for Employee ID 1:
Year | 2015 | 2014 | 2013 |
SelfRating | 4.5 | 2.5 | 3.5 |
ManagerRating | 4.5 | 2.5 | 3.5 |
NotSelfRating | 4.5 | 2.5 | 3.5 |
Other employees would have fewer columns since they don't have data for previous years.
I'm having issues with filtering it down to a specific employee and sorting the years by their review date (the name isn't always a reliable way to sort them).
In the end I'm looking to use this as the data for a report.
If there is a different way than a crosstab query to accomplish this I would be okay with that as well.
Thanks!
You need a column for all the rating types, not an individual column for each type. If you can't redesign the table, I would suggest creating a new one for your purposes. The below uses a union to add in that type column referred to above. You create a column and hardcode the value (SelfRating, ManagerRating, etc):
SELECT * INTO EmployeeRatings
FROM (SELECT tblEmployeeYear.EmployeeId AS EmpId, ReviewYearId, "SelfRating" AS Category, SelfRating AS Score
FROM tblEmployeeYear
WHERE SelfRating Is Not Null
UNION ALL
SELECT tblEmployeeYear.EmployeeId, ReviewYearId, "ManagerRating", ManagerRating
FROM tblEmployeeYear
WHERE ManagerRating Is Not Null
UNION ALL
SELECT tblEmployeeYear.EmployeeId, ReviewYearId, "NotSelfRating", NotSelfRating
FROM tblEmployeeYear
WHERE NotSelfRating Is Not Null)
Then use the newly created table in place of tblEmployeeYear. Note that I use Year([ReviewDate]) which will return only the year. Also, since it looks like it may be possible to have more than one of each review type per year, I averaged the Score for the year.
TRANSFORM Avg(Score)
SELECT EmpId, Category
FROM (SELECT EmpId, Category, ReviewDate, Score
FROM tblReviewYear
INNER JOIN EmployeeRatings
ON tblReviewYear.ID = EmployeeRatings.ReviewYearID) AS Reviews
GROUP BY EmpId, Category
PIVOT Year([ReviewDate]);
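The unpivot-then-pivot steps can be tried outside Access with the sketch below, which uses Python's sqlite3 module. SQLite has no TRANSFORM, so the final pivot is a Python dictionary; the UNION ALL part is the same idea as above, written as a subquery instead of a SELECT INTO. The rows are the sample data from the question, the review dates are assumed to be 1 September of each year, and pivot_for is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblReviewYear (ID INTEGER, YearName TEXT, ReviewDate TEXT);
    CREATE TABLE tblEmployeeYear (EmployeeID INTEGER, ReviewYearID INTEGER,
                                  SelfRating REAL, ManagerRating REAL, NotSelfRating REAL);
    INSERT INTO tblReviewYear VALUES
        (5, '2013', '2013-09-01'), (6, '2014', '2014-09-01'), (7, '2015', '2015-09-01');
    INSERT INTO tblEmployeeYear VALUES
        (1, 5, 3.5, 3.5, 3.5), (1, 6, 2.5, 2.5, 2.5), (1, 7, 4.5, 4.5, 4.5),
        (2, 6, 2.0, 2.0, 2.0), (2, 7, 2.0, 2.0, 2.0), (3, 7, 5.0, 5.0, 5.0);
""")

# Unpivot the three rating columns into (Category, Score) rows, then average per year.
QUERY = """
    SELECT r.Category, CAST(strftime('%Y', y.ReviewDate) AS INTEGER) AS Yr, AVG(r.Score)
    FROM (
        SELECT EmployeeID AS EmpId, ReviewYearID, 'SelfRating' AS Category, SelfRating AS Score
        FROM tblEmployeeYear WHERE SelfRating IS NOT NULL
        UNION ALL
        SELECT EmployeeID, ReviewYearID, 'ManagerRating', ManagerRating
        FROM tblEmployeeYear WHERE ManagerRating IS NOT NULL
        UNION ALL
        SELECT EmployeeID, ReviewYearID, 'NotSelfRating', NotSelfRating
        FROM tblEmployeeYear WHERE NotSelfRating IS NOT NULL
    ) AS r
    JOIN tblReviewYear AS y ON y.ID = r.ReviewYearID
    WHERE r.EmpId = ?
    GROUP BY r.Category, Yr
"""

def pivot_for(emp_id):
    """Return {category: {year: average score}} for one employee."""
    table = {}
    for category, year, score in conn.execute(QUERY, (emp_id,)):
        table.setdefault(category, {})[year] = score
    return table

for category, by_year in pivot_for(1).items():
    print(category, dict(sorted(by_year.items(), reverse=True)))
```

Filtering on the employee inside the query is what keeps the year columns specific to that person, so an employee with a single review year ends up with a single column.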