Structure of a relational database for comparing multiple dates - sql

We have a Microsoft Access database at work to track an ongoing list of customers. Each customer has to sign a contract with several departments - 13 (!) departments in total - and for each customer we want to track when each contract was sent and received. The structure looks something like this:
Table 1
-------------------------------------------------------------------------------------------------------------------
CUSTOMER_ID | ... | DEP_A_SENT | DEP_A_RECEIVED | DEP_B_SENT | DEP_B_RECEIVED | DEP_C_SENT | DEP_C_RECEIVED | ... |
-------------------------------------------------------------------------------------------------------------------
1 | ... | 2015-05-01 | 2015-05-03 | 2015-05-04 | 2015-05-09 | 2015-05-01 | 2015-05-05 | ... |
2 | ... | 2015-05-01 | 2015-05-05 | 2015-05-01 | 2015-05-03 | 2015-05-13 | --- | ... |
...
I want to be able to calculate the timespan between DEP_X_SENT and DEP_X_RECEIVED per customer and department (such as "department A: 2 days, department B: 5 days..." for customer ID 1).
More importantly, I want to compare all the DEP_X_RECEIVED dates with each other for one customer: determining the first (MIN) and the last (MAX) date a contract was received, to find out how many days it takes until all of a customer's contracts are received (such as "the contracts were received within 6 days" for customer ID 1, because the first was received on May 3rd and the last on May 9th). Furthermore, I want to calculate the average timespan this took across all customers. If a contract has not been received yet, there is no value in that field.
In MySQL I could work with functions such as GREATEST and LEAST to compare values across different columns, but in Access I would have to rely on VBA for that, which I consider bad practice here. How can I normalize and restructure my table so that I can achieve my goals with rather simple MAX, MIN and AVG operations? Many thanks!

Simply fold your existing table into this structure:
create table TABLE_1 (
    CUSTOMER_ID   int,
    DEPARTMENT_ID int,  -- foreign key reference to DEPARTMENT table
    SENT          date,
    RECEIVED      date
);
Now you can perform the required analysis simply, and retrieve the original layout as either a Pivot report or LEFT OUTER JOIN from the DEPARTMENT table to the new TABLE_1.
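To show the payoff, here is a runnable sketch of the normalized design using SQLite through Python's sqlite3 module. Table and column names follow the answer (DEPARTMENT_ID is kept as a letter here for readability instead of a foreign-key integer), and the sample rows come from the question. The per-customer span between the first and last received contract reduces to plain MIN/MAX:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE TABLE_1 (
        CUSTOMER_ID   INTEGER,
        DEPARTMENT_ID TEXT,
        SENT          DATE,
        RECEIVED      DATE   -- NULL while the contract is still outstanding
    )
""")
con.executemany("INSERT INTO TABLE_1 VALUES (?, ?, ?, ?)", [
    (1, "A", "2015-05-01", "2015-05-03"),
    (1, "B", "2015-05-04", "2015-05-09"),
    (1, "C", "2015-05-01", "2015-05-05"),
    (2, "A", "2015-05-01", "2015-05-05"),
    (2, "B", "2015-05-01", "2015-05-03"),
    (2, "C", "2015-05-13", None),
])

# Days between first and last received contract, per customer --
# no GREATEST/LEAST gymnastics across 26 date columns needed.
rows = con.execute("""
    SELECT CUSTOMER_ID,
           julianday(MAX(RECEIVED)) - julianday(MIN(RECEIVED)) AS span_days
    FROM TABLE_1
    WHERE RECEIVED IS NOT NULL
    GROUP BY CUSTOMER_ID
    ORDER BY CUSTOMER_ID
""").fetchall()
print(rows)   # [(1, 6.0), (2, 2.0)] -- customer 1: May 3rd to May 9th
```

An overall average (AVG over the per-customer spans) is then just a wrapper query around this GROUP BY.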

Related

Auto generate columns in Microsoft Access table

How can we auto-generate columns/fields in a Microsoft Access table?
Scenario:
I have a table with the personal details of my employees (EmployDetails).
I want to record their everyday attendance in another table.
Rather than using a separate record for each day, I want to use a single record per employee.
E.g. I want to create a table with fields like below:
EmployID, 01Jan2020, 02Jan2020, 03Jan2020,.........25May2020 and so on.......
That means I would have to generate a new column automatically every day...
Can anybody help me?
Generally you would define columns manually (whether through a UI or SQL).
With the information given, I think the proper solution is to have two tables.
You have your "EmployDetails" table holding their general info (name, contact information etc.), with the employee ID as the key (unique; it can be auto-generated or manual, it just needs to be unique).
You would have a second table with a foreign key to the employee ID in "EmployDetails", a column called Date, and another called Details (or whatever you are trying to capture in your date-column idea).
Then you simply add a row for each day, and do a join query between the tables to look up all the "days" for an employee. This is called normalisation, and it is how relational databases (such as Access) are designed to be used.
Employee Table:
EmpID | NAME | CONTACT
----------------------
1 | Jim | 222-2222
2 | Jan | 555-5555
Detail table:
DetailID | EmpID (foreign key) | Date | Hours_worked | Notes
-------------------------------------------------------------
10231 | 1 | 01Jan2020| 5 | Lazy Jim took off early
10233 | 2 | 02Jan2020| 8 | Jan is a hard worker
10240 | 1 | 02Jan2020| 7.5 | Finally he stays a full day
To find what Jim worked you do a join:
SELECT Employee.EmpID, Employee.Name, Details.Date, Details.Hours_worked, Details.Notes
FROM Employee
JOIN Details ON Employee.EmpID=Details.EmpID;
Of course this will give you a normalised result (which is generally what's wanted so you can iterate over it):
EmpID | NAME | Date | Hours_worked | Notes
-----------------------------------------------
1 | Jim | 01Jan2020 | 5 | ......
1 | Jim | 02Jan2020 | 7.5 | .......
If you want the results denormalised you'll have to look into pivot tables.
See more on creating foreign keys
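As a quick sanity check of the two-table design, here is the same join run in SQLite through Python's sqlite3 (names follow the tables above; Notes is omitted from the select list for brevity):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Employee (EmpID INTEGER PRIMARY KEY, Name TEXT, Contact TEXT);
    CREATE TABLE Details (
        DetailID     INTEGER PRIMARY KEY,
        EmpID        INTEGER REFERENCES Employee(EmpID),
        Date         TEXT,
        Hours_worked REAL,
        Notes        TEXT
    );
    INSERT INTO Employee VALUES (1, 'Jim', '222-2222'), (2, 'Jan', '555-5555');
    INSERT INTO Details VALUES
        (10231, 1, '2020-01-01', 5,   'Left early'),
        (10233, 2, '2020-01-02', 8,   'Full day'),
        (10240, 1, '2020-01-02', 7.5, 'Full day');
""")

# One row per employee per day -- no new columns ever needed.
rows = con.execute("""
    SELECT e.EmpID, e.Name, d.Date, d.Hours_worked
    FROM Employee e
    JOIN Details d ON e.EmpID = d.EmpID
    WHERE e.Name = 'Jim'
    ORDER BY d.Date
""").fetchall()
print(rows)
```

Adding a new day of attendance is an INSERT into Details, never an ALTER TABLE.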

Select PERIOD_BEGIN and PERIOD_END dates from historical data containing only timestamp in Oracle SQL

I've run into a bit of a problem. Background: I work as a business controller in a financial institution that offers wealth management services and it falls to me to do internal reporting on euros coming and going. As this is one of the KPIs used to evaluate managers' performance, I need to be able to report these numbers per manager. This bit is straightforward as each customer has a manager assigned to it. Now here's the fun thing - some questionable DW design choices were made in the past and the table containing the manager/customer relationship lacks all the relevant temporal information such as 'valid from' or 'valid until'. Basically it just stores the current state. Occasionally customers and portfolios get reassigned to other managers and this causes all the transaction done during the old manager's reign to show up as belonging to the new manager.
E.g. manager Joe is managing a customer called Blammo Ltd between January and March, and the customer subscribes funds with $10 million. Joe leaves the company and the customer gets assigned to manager Helen. During April the customer withdraws 5 million. When I compile my reports at the end of April, Joe's KPI reads ±0 and Helen's shows +5 million, while in truth it should show that Joe made 10 million and Helen lost 5.
We do have an audit table that contains all the rows from the table containing the manager/customer relationships and each row has a timestamp when it was created. What I hope to achieve is to build a view that uses these timestamps to build a table that has a VALID_FROM and VALID_UNTIL dates so I can easily assign transactions to specific managers by joining the transaction between the VALID dates.
So basically what I have is...
CUSTOMERID MANAGERID TIMESTAMP
------------ ----------- ------------
1 A 01-01-2018
1 B 28-02-2018
1 A 31-05-2018
1 C 31-08-2018
And what I need is...
CUSTOMERID MANAGERID VALID_FROM VALID_UNTIL
------------ ----------- ------------ -------------
1 A 01-01-2018 28-02-2018
1 B 28-02-2018 31-05-2018
1 A 31-05-2018 31-08-2018
1 C 31-08-2018
What I've tried is
SELECT
    CUSTOMERID,
    MANAGERID,
    MIN(TIMESTAMP) AS VALID_FROM,
    MAX(TIMESTAMP) AS VALID_UNTIL
FROM CUSMAN.CUS_MAN_AUDIT
GROUP BY
    CUSTOMERID,
    MANAGERID
and this would work in a case where customers are never reassigned back to a previous manager. However, due to maternity leaves etc., customers do get assigned back and forth between managers, so the solution above won't produce the correct result - joining a transaction made by customer '1' on '30-04-2018' to the customer/manager relationship data would produce two results, both managers A and B. Below is the table the query above would produce.
CUSTOMERID MANAGERID VALID_FROM VALID_UNTIL
------------ ----------- -------------- -------------
1 A 01-01-2018 31-08-2018
1 B 28-02-2018 31-05-2018
1 C 31-08-2018
It feels like there's a simple way to do this but I'm stumped. Any ideas?
EDIT
Bloody 'ell, I forgot to mention that the table CUS_MAN_AUDIT also contains plenty of other columns, such as customer name, legal form etc and now Caius's answer returns a result set shown below (CUSTOMERNAME included for sake of clarity, not in actual result set)
+------------+-----------+------------+-------------+--------------+
| CUSTOMERID | MANAGERID | VALID_FROM | VALID_UNTIL | CUSTOMERNAME |
+------------+-----------+------------+-------------+--------------+
| 1 | A | 01-01-2018 | 02-01-2018 | Blam-O Litnd |
| 1 | A | 02-01-2018 | 15-01-2018 | Blamo Litd |
| 1 | A | 15-01-2018 | 28-02-2018 | Blammo Ltd |
+------------+-----------+------------+-------------+--------------+
while it should return (or at least what I'd like it to) is
+------------+-----------+------------+-------------+
| CUSTOMERID | MANAGERID | VALID_FROM | VALID_UNTIL |
+------------+-----------+------------+-------------+
| 1 | A | 01-01-2018 | 28-02-2018 |
+------------+-----------+------------+-------------+
And I can't remember how I formatted my tables in the original post, sorry...
You can do it with a window function that gets the LEAD (next) value of the date, per customer, ordered by the timestamp:
SELECT
    CUSTOMERID,
    MANAGERID,
    TIMESTAMP AS VALID_FROM,
    LEAD(TIMESTAMP) OVER (PARTITION BY CUSTOMERID ORDER BY TIMESTAMP) AS VALID_UNTIL
FROM CUSMAN.CUS_MAN_AUDIT
If it aids your understanding it's functionally similar to this:
SELECT
    cur.CUSTOMERID,
    cur.MANAGERID,
    cur.TIMESTAMP AS VALID_FROM,
    MIN(nxt.TIMESTAMP) AS VALID_UNTIL
FROM
    CUSMAN.CUS_MAN_AUDIT cur
    LEFT OUTER JOIN
    CUSMAN.CUS_MAN_AUDIT nxt
    ON  cur.CUSTOMERID = nxt.CUSTOMERID
    AND cur.TIMESTAMP < nxt.TIMESTAMP
GROUP BY
    cur.CUSTOMERID,
    cur.MANAGERID,
    cur.TIMESTAMP
It joins the table back to itself on the same customer, associating each cur record with every record that has a later date (nxt), and then takes the MIN of those later dates.
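For reference, here is the LEAD() approach run end to end in SQLite (window functions need SQLite 3.25+) through Python's sqlite3. The rows are the sample from the question; the timestamp column is named TS here and the dates are stored as ISO strings, so the names differ slightly from the Oracle table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cus_man_audit (CUSTOMERID INT, MANAGERID TEXT, TS TEXT)")
con.executemany("INSERT INTO cus_man_audit VALUES (?, ?, ?)", [
    (1, "A", "2018-01-01"),
    (1, "B", "2018-02-28"),
    (1, "A", "2018-05-31"),
    (1, "C", "2018-08-31"),
])

# Each row's VALID_UNTIL is simply the next row's timestamp for the same
# customer; the last row gets NULL, i.e. "still valid".
rows = con.execute("""
    SELECT CUSTOMERID, MANAGERID,
           TS AS VALID_FROM,
           LEAD(TS) OVER (PARTITION BY CUSTOMERID ORDER BY TS) AS VALID_UNTIL
    FROM cus_man_audit
    ORDER BY TS
""").fetchall()
print(rows)
```

Note that manager A correctly gets two separate validity intervals, which the MIN/MAX GROUP BY approach collapsed into one.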

Structuring Month-Based Data in SQL

I'm curious about the best way to structure data in a SQL database when I need to keep track of certain fields and how they differ month to month.
For example, suppose I had a users table in which I was trying to store 3 different values: name, email, and how many times they've logged in each month. Would it be best practice to create a new column for each month and store that month's login count under it? Or would it be better to create a new row/table for each month?
My instinct says creating new columns is the best way to reduce redundancy, but I can see it getting a little unwieldy as the number of columns grows over time. (If I went the column route, I was also thinking it would warrant a total column that keeps track of all months at a time.)
Thanks!
In my opinion, the best approach is to store each login for each user.
Use a query to summarize the data the way you need it when you query it.
You should only be thinking about other structures if summarizing the detail doesn't meet performance requirements -- which for a monthly report don't seem so onerous.
Whatever you do, storing counts in separate columns is not the right thing to do: every month, you would need to add another column to the table.
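The "store each login, summarize on demand" approach can be sketched like this in SQLite through Python's sqlite3 (table and column names are illustrative, not from the question):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users  (user_id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE logins (user_id INTEGER REFERENCES users(user_id),
                         logged_in_at TEXT);        -- one row per login
    INSERT INTO users VALUES (1, 'user1', 'user1@example.com');
    INSERT INTO logins VALUES
        (1, '2018-01-05 09:00'), (1, '2018-01-20 17:30'), (1, '2018-02-02 08:15');
""")

# Monthly counts fall out of a GROUP BY -- no schema change each month,
# and yearly or daily rollups are just a different strftime() format.
rows = con.execute("""
    SELECT user_id, strftime('%Y-%m', logged_in_at) AS month, COUNT(*) AS n
    FROM logins
    GROUP BY user_id, month
    ORDER BY month
""").fetchall()
print(rows)   # [(1, '2018-01', 2), (1, '2018-02', 1)]
```

Keeping the raw events also means new questions ("logins per weekday?") need only a new query, never a new table layout.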
I'm not an expert but in my opinion, it is best to store data in a separate table (in your case). That way you can manipulate the data easily and you don't have to modify the table design in the future.
PK: UserID & Date or New Column (Ex: RowNo with auto increment)
+--------+------------+-----------+
| UserID | Date | NoOfTimes |
+--------+------------+-----------+
| 01 | 2018.01.01 | 1 |
| 01 | 2018.01.02 | 3 |
| 01 | 2018.01.03 | 5 |
| .. | | |
| 02 | 2018.01.01 | 2 |
| 02 | 2018.01.02 | 6 |
+--------+------------+-----------+
Or
PK: UserID, Year & Month or New Column (Ex: RowNo with auto increment)
+--------+------+-------+-----------+
| UserID | Year | Month | NoOfTimes |
+--------+------+-------+-----------+
| 01 | 2018 | Jan | 10 |
| 01 | 2018 | Feb | 13 |
+--------+------+-------+-----------+
Before you create the table, please take a look at the database normalization. Especially 1st (1NF), 2nd (2NF) and 3rd (3NF) normalization forms.
https://www.tutorialspoint.com/dbms/database_normalization.htm
https://www.lifewire.com/database-normalization-basics-1019735
https://www.essentialsql.com/get-ready-to-learn-sql-database-normalization-explained-in-simple-english/
https://www.studytonight.com/dbms/database-normalization.php
https://medium.com/omarelgabrys-blog/database-normalization-part-7-ef7225150c7f
Either approach is valid, depending on query patterns and join requirements.
One row for each month
For a user, the row containing login count for the month will be inserted when data is available for the month. There will be 1 row per month per user. This design will make it easier to do joins by month column. However, multiple rows will need to be accessed to get data for a user for the year.
-- column list
name
email
month
login_count
-- example entries
'user1', 'user1@email.com','jan',100
'user2', 'user2@email.com','jan',65
'user1', 'user1@email.com','feb',90
'user2', 'user2@email.com','feb',75
One row for all months
You do not need to dynamically add columns, since the number of months is known in advance. The table can be created up front to accommodate all months, with every month_login_count column defaulting to 0. The row is then updated as each month's login count is populated. There will be 1 row per user. This design is not the best for doing joins by month, but only one row needs to be accessed to get a user's data for the whole year.
-- column list
name
email
jan_login_count
feb_login_count
mar_login_count
apr_login_count
may_login_count
jun_login_count
jul_login_count
aug_login_count
sep_login_count
oct_login_count
nov_login_count
dec_login_count
-- example entries
'user1','user1@email.com',100,90,0,0,0,0,0,0,0,0,0,0
'user2','user2@email.com',65,75,0,0,0,0,0,0,0,0,0,0

SQL payments matrix

I want to combine two tables into one:
The first table: Payments
id | 2010_01 | 2010_02 | 2010_03
1 | 3.000 | 500 | 0
2 | 1.000 | 800 | 0
3 | 200 | 2.000 | 300
4 | 700 | 1.000 | 100
The second table is ID and some date (different for every ID)
id | date |
1 | 2010-02-28 |
2 | 2010-03-01 |
3 | 2010-01-31 |
4 | 2011-02-11 |
What I'm trying to achieve is to create a table which contains all payments made before the date in the ID table, i.e. something like this:
id | date | T_00 | T_01 | T_02
1 | 2010-02-28 | 500 | 3.000 |
2 | 2010-03-01 | 0 | 800 | 1.000
3 | 2010-01-31 | 200 | |
4 | 2010-02-11 | 1.000 | 700 |
Where T_00 means a payment in the same month as the 'date' value, T_01 a payment in the previous month, and so on.
Is there a way to do this?
EDIT:
I'm trying to achieve this in MS Access.
The problem is that I cannot connect the name of the first table's column with the date in the second (the easiest way would be to treat it as a variable).
I added T_00 to T_24 columns in the second (ID) table and was trying to UPDATE those fields
set T_00 =
iif(year(date)&"_"&month(date)=2010_10,
but I realized that would be too much code for Access to handle if I wanted to do this for every payment period and every T_xx column.
Even if I wrote the code for T_00, I would have to repeat it for the next 23 periods.
Your Payments table is de-normalized. Those date columns are repeating groups, meaning you've violated First Normal Form (1NF). It's especially difficult because your field names are actually data. As you've found, repeating groups are a complete pain in the ass when you want to relate the table to something else. This is why 1NF is so important, but knowing that doesn't solve your problem.
You can normalize your data by creating a view that UNIONs your Payments table.
Like so:
CREATE VIEW NormalizedPayments (id, Year, Month, Amount) AS
SELECT id,
       2010 AS Year,
       1 AS Month,
       [2010_01] AS Amount
FROM Payments
UNION ALL
SELECT id,
       2010 AS Year,
       2 AS Month,
       [2010_02] AS Amount
FROM Payments
UNION ALL
SELECT id,
       2010 AS Year,
       3 AS Month,
       [2010_03] AS Amount
FROM Payments
And so on if you have more. This is how the Payments table should have been designed in the first place.
(Note the square brackets: Access needs them around column names that start with a digit.) It may be easier to use a date field with the value '2010-01-01' instead of separate Year and Month fields; it depends on your data. You may also want to add WHERE Amount IS NOT NULL to each query in the UNION, or use Nz([2010_01], 0.000) AS Amount. Again, it depends on your data and other queries.
It's hard for me to understand how you're joining from here, particularly how the id fields relate because I don't see how they do with the small amount of data provided, so I'll provide some general ideas for what to do next.
Next you can join your second table with this normalized Payments table using a method similar to this or a method similar to this. To actually produce the result you want, include a calculated field in this view with the difference in months. Then, create an actual Pivot Table to format your results (like this or like this) which is the proper way to display data like your tables do.

Best way to join the two tables *including* duplicates from one table

Accounts (table)
+----+----------+----------+-------+
| id | account# | supplier | RepID |
+----+----------+----------+-------+
| 1 | 123xyz | Boston | 2 |
| 2 | 245xyz | Chicago | 2 |
| 3 | 425xyz | Chicago | 3 |
+----+----------+----------+-------+
PayOut (table)
+----+----------+----------+-------------+--------+
| id | account# | supplier | datecreated | Amount |
+----+----------+----------+-------------+--------+
| 5 | 245xyz | Chicago | 01-15-2009 | 25 |
| 6 | 123xyz | Boston | 10-15-2011 | 50 |
| 7 | 123xyz | Boston | 10-15-2011 | -50 |
| 8 | 123xyz | Boston | 10-15-2011 | 50 |
| 9 | 425xyz | Chicago | 10-15-2011 | 100 |
+----+----------+----------+-------------+--------+
I have an Accounts table and I have a PayOut table. The PayOut table comes from abroad, so we have no control over it. This leaves us with one problem we haven't been able to solve: we can't join the two tables on the record ID field. We therefore join on Account# and Supplier (the 2nd and 3rd columns), which (possibly) creates a many-to-many relationship. But we filter our records to active ones, and we apply a second filter on the PayOut table for when the payout was created. Payouts are created month to month. There are two problems with this in my view:
The query takes quite a bit of time to complete (it could be inefficient).
Certain duplicates are removed that should not be removed. An example is records 6 and 8 in the PayOut table. What happened there is: we got a customer, then the customer cancelled, then we got him back - hence +50, -50 and +50. All three values are valid and must show in the report for audit purposes, but currently only one +50 is shown; the other is lost. A couple of other problems also crop up in the report once in a while.
The query below uses GROUP BY to remove duplicates. I would like an improved query that performs better and ensures that no record in the PayOut table is dropped, as long as it falls in the month of the report.
Here is our current query
/* Supplied to stored procedure */
-----------------------------------
@RepID  -- the person for whom the payout is calculated
@Month  -- month of payment date
@Year   -- year of payment date
-----------------------------------
select distinct
    A.col1,
    A.col2,
    ...
    A.col10,
    B.col1,
    B.col2,
    B.Amount /* this is the important column, a portion of which goes to the rep */
from records A
JOIN payout B
    on A.Supplier = B.Supplier AND A.Account# = B.Account#
where datepart(mm, B.datecreated) = @Month   /* parameter to stored procedure */
  and datepart(yyyy, B.datecreated) = @Year
  and A.[rep ID] = @RepID                    /* parameter to SP */
group by
    col1, col2, col3, ... col10
order by customerName
Is this query optimal? Can I improve it using CROSS APPLY or WHERE EXISTS, making it faster as well as fixing the duplicate problem?
Note that this query is used to get the payout of a rep; hence every record has a RepID field identifying who it is assigned to. Ideally I would like to use a WHERE EXISTS query.
It's difficult to understand exactly what you want because in one place you say you 'want' the duplicates, but then you say you are using the GROUP BY to remove duplicates. So the first thought would be "Why not just get rid of the GROUP BY?". But I have to believe you are smart enough to have thought of that yourself, so I assume it's there for a reason.
I think someone here could help you pretty easily if you could post the actual query, but since you say you can't I will just try to give you some direction in solving the problem...
Instead of trying to do everything in one statement, use temporary tables or views to split it up. It may be easier for you to think about how to get rid of the duplicates you don't want and keep the ones you do first and put those into a temporary table, and then join the tables together and work with that.
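One concrete illustration of keeping the legitimate repeats (a sketch in SQLite through Python's sqlite3, with simplified column names, not the actual schema): as long as the PayOut table's own id stays in the select list and there is no DISTINCT or GROUP BY, the +50 / -50 / +50 sequence survives the many-to-many join; any summing can then be done explicitly afterwards.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Accounts (id INT, account TEXT, supplier TEXT, RepID INT);
    CREATE TABLE PayOut  (id INT, account TEXT, supplier TEXT,
                          datecreated TEXT, Amount REAL);
    INSERT INTO Accounts VALUES (1, '123xyz', 'Boston', 2);
    INSERT INTO PayOut VALUES
        (6, '123xyz', 'Boston', '2011-10-15',  50),
        (7, '123xyz', 'Boston', '2011-10-15', -50),
        (8, '123xyz', 'Boston', '2011-10-15',  50);
""")

# No DISTINCT, no GROUP BY: PayOut.id keeps each payout row unique,
# so the repeated +50 is preserved for the audit report.
rows = con.execute("""
    SELECT p.id, a.RepID, p.Amount
    FROM Accounts a
    JOIN PayOut p ON a.account = p.account AND a.supplier = p.supplier
    WHERE strftime('%m', p.datecreated) = '10'
      AND strftime('%Y', p.datecreated) = '2011'
      AND a.RepID = 2
    ORDER BY p.id
""").fetchall()
print(rows)   # all three rows survive, including the repeated +50
```

A rep-level total would then be a separate SUM over this detail set, rather than a GROUP BY that silently merges identical payout rows.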