Best way to join the two tables *including* duplicates from one table - sql

Accounts (table)
+----+----------+----------+-------+
| id | account# | supplier | RepID |
+----+----------+----------+-------+
| 1 | 123xyz | Boston | 2 |
| 2 | 245xyz | Chicago | 2 |
| 3 | 425xyz | Chicago | 3 |
+----+----------+----------+-------+
PayOut (table)
+----+----------+----------+-------------+--------+
| id | account# | supplier | datecreated | Amount |
+----+----------+----------+-------------+--------+
| 5 | 245xyz | Chicago | 01-15-2009 | 25 |
| 6 | 123xyz | Boston | 10-15-2011 | 50 |
| 7 | 123xyz | Boston | 10-15-2011 | -50 |
| 8 | 123xyz | Boston | 10-15-2011 | 50 |
| 9 | 425xyz | Chicago | 10-15-2011 | 100 |
+----+----------+----------+-------------+--------+
I have accounts table and I have payout table. Payout table comes from abroad so we do not have any control over it. This leaves us with a problem that we can't join the two tables based on record ID field, that is one problem which we can't solved. We therefore join based on Account#, SupplierID (2nd and 3rd column). This creates a problem that it creates (possibly) many to many relationship. But we filter our records if they are active and we use a second filter on payout table when the payout was created. Payout are created months to month. There are two problems with this in my view
The query takes quite a bit of time to complete (could be inefficient)
There are certain duplicates that are removed which should not be removed. Example is record 6 and 8 in payout table. What happened here is, we got a customer, then the customer cancelled then he got him back. In this case +50, -50 and +50. Again all values are valid and must show in the report for audit purposes. Currently only one +50 is shown, the other is lost. There are a couple of other problems within the report that comes once in a while.
Here is the query. It uses groups by to remove duplicates. I would like to have an advance query which outperforms and which does takes into account that no record in PayOut table is duplicated as long as they come up in the month of the report.
Here is our current query
/* Supplied to Store Procedure */
-----------------------------------
#RepID // the person for whome payout is calculated
#Month // of payment date
#year // year of payment date
-----------------------------------
select distinct
A.col1,
A.col2,
...
A.col10,
B.col2,
B.Col2,
B.Amount /* this is the important column, portion of which goes to Rep */
from records A
JOIN payout B
on A.Supplier = B.Supplier AND A.Account# = B.Account#
where datepart(mm, B.datecreated) = #Month /* parameter to stored procedure */
and datepart(yyyy, B.datecreated) = #Year
and A.[rep ID] = #RepID /* parameter to SP */
group by
col1,col2,col3,....col10
order by customerName
Is this query optimum? Can I improve it using CROSS APPLY or WHERE EXISTs that will make it faster as well as remove the duplicate problem?
Note that this query is used to get payout of a rep. Hence every record has repid field who it is assigned to. Ideally I would like to use Select WHERE Exist query.

It's difficult to understand exactly what you want because in one place you say you 'want' the duplicates but then you say that you are using the group by to remove duplicates. So the first thought would be "Why not just get rid of the group by?". But I have to believe you are smart enough to have thought of that yourself, so I assume it's got to be there for a reason.
I think someone here could help you pretty easily if you could post the actual query, but since you say you can't I will just try to give you some direction in solving the problem...
Instead of trying to do everything in one statement, use temporary tables or views to split it up. It may be easier for you to think about how to get rid of the duplicates you don't want and keep the ones you do first and put those into a temporary table, and then join the tables together and work with that.

Related

How can I trigger an update to a value in a table when criteria is met on a different table?

Aware there is an almost identical question here, but that covers the SQL query required, rather than the mechanism of event triggering.
Lets say I have two tables. One table contains performance data for each staff member each week. The other table is a table that holds the staff members information. What I want is to update a value in the table to a Y or N based on whether that staff member left at the week date.
staffTable
+----------+----------------+------------+
| staff_id | staff_name | leave_date |
+----------+----------------+------------+
| 1 | Joseph Blogges | 2020-01-24 |
| 2 | Joe Bloggs | 9999-12-31 |
| 3 | Joey Blogz | 9999-12-31 |
+----------+----------------+------------+
targetTable
+------------+----------+--------+-----------+
| week_start | staff_id | target | left_flag |
+------------+----------+--------+-----------+
| 2020-01-13 | 1 | 10 | N |
| 2020-01-20 | 1 | 10 | N |
| 2020-01-27 | 1 | 8 | Y |
+------------+----------+--------+-----------+
What I am trying to do is have the left_flag automatically change from 'N' to 'Y' when the week_start value is greater than leave_date of the staff member (in the other table).
I have tried successfully putting this into a view, which works, but the problem is that existing applications, views and queries will need to all reference a new view instead of a table and I want to be able to query the data table as my front-end has issues interacting in live with a view instead of a table.
I have also successfully used a UDF to return the leave_date and then create computed column that will check if this UDF variable is greater than the start_date column and this worked fine until I realised that the UDF is the most resource consuming query on the entire server and is completely disproportionate.
Is there a way that I can trigger an update to the staffTable when a criteria is met in another table, or is there a totally better and different way of doing this? If it can't be done easily, I'll try to switch to a view and work around it in the front-end.
I'm going to describe the process rather than writing the code.
What you are describing can be accomplished using triggers on staffTable. When a new row is inserted or updated the trigger would change any rows in targetTable. This would be an after insert/update trigger.
The heart of the trigger would be:
update tt
set left_flag = 'Y'
from targettable tt join
inserted i
on tt.staff_id = i.staff_id
where i.leave_date < tt.week_start and
tt.left_flag <> 'Y';

Structure of a relational database for comparing multiple dates

We have a Microsoft Access Database at work to track an ongoing list of customers. Each customer has to sign a contract with several departments - totally 13 (!) departments - for which we want to keep track about the current progress for each customer when a contract is sent and received. This structure looks similar to something like this:
Table 1
-------------------------------------------------------------------------------------------------------------------
CUSTOMER_ID | ... | DEP_A_SENT | DEP_A_RECEIVED | DEP_B_SENT | DEP_B_RECEIVED | DEP_C_SENT | DEP_C_RECEIVED | ... |
-------------------------------------------------------------------------------------------------------------------
1 | ... | 2015-05-01 | 2015-05-03 | 2015-05-04 | 2015-05-09 | 2015-05-01 | 2015-05-05 | ... |
2 | ... | 2015-05-01 | 2015-05-05 | 2015-05-01 | 2015-05-03 | 2015-05-13 | --- | ... |
...
I want to be able to calculate the timespan between DEP_X_SENT with DEP_X_RECEIVED for customer and department (such as "department A: 2 days, department B: 5 days..." for customer ID 1)
More importantly, I want to compare all the DEP_X_RECEIVED dates with each other for one customer: Determining the first (MIN) and the last (MAX) date a contract has been received to finding how many days it takes for each customer until all contracts are received. (such as "the contracts were received within 6 days" for customer ID 1, because the first was received on May 3rd. and the last on May 9th). Furthermore, I want to calculate the average timespan this took for all customers. If the contract is not received yet, the is no value in that field.
In MySQL I can work with functions such GREATEST and LEAST to compare values between different columns, but in Access I have to rely for now on VBA and I think it is considered bad practice. How can I normalize and restructure my table for archieving my goals with rather simple MAX, MIN and AVGoperations? Many thanks!
Simply fold your existing table into this structure:
create table TABLE_1 (
CUSTOMER_ID int,
DEPARTMENT_ID int, -- foreign key reference to DEPARTMENT table
SENT date,
RECEIVED date
);
Now you can perform the required analysis simply, and retrieve the original layout as either a Pivot report or LEFT OUTER JOIN from the DEPARTMENT table to the new TABLE_1.

SQL payments matrix

I want to combine two tables into one:
The first table: Payments
id | 2010_01 | 2010_02 | 2010_03
1 | 3.000 | 500 | 0
2 | 1.000 | 800 | 0
3 | 200 | 2.000 | 300
4 | 700 | 1.000 | 100
The second table is ID and some date (different for every ID)
id | date |
1 | 2010-02-28 |
2 | 2010-03-01 |
3 | 2010-01-31 |
4 | 2011-02-11 |
What I'm trying to achieve is to create table which contains all payments before the date in ID table to create something like this:
id | date | T_00 | T_01 | T_02
1 | 2010-02-28 | 500 | 3.000 |
2 | 2010-03-01 | 0 | 800 | 1.000
3 | 2010-01-31 | 200 | |
4 | 2010-02-11 | 1.000 | 700 |
Where T_00 means payment in the same month as 'date' value, T_01 payment in previous month and so on.
Is there a way to do this?
EDIT:
I'm trying to achieve this in MS Access.
The problem is that I cannot connect name of the first table's column with the date in the second (the easiest way would be to treat it as variable)
I added T_00 to T_24 columns in the second (ID) table and was trying to UPDATE those fields
set T_00 =
iif(year(date)&"_"&month(date)=2010_10,
but I realized that that would be to much code for access to handle if I wanted to do this for every payment period and every T_xx column.
Even if I would write the code for T_00 I would have to repeat it for next 23 periods.
Your Payments table is de-normalized. Those date columns are repeating groups, meaning you've violated First Normal Form (1NF). It's especially difficult because your field names are actually data. As you've found, repeating groups are a complete pain in the ass when you want to relate the table to something else. This is why 1NF is so important, but knowing that doesn't solve your problem.
You can normalize your data by creating a view that UNIONs your Payments table.
Like so:
CREATE VIEW NormalizedPayments (id, Year, Month, Amount) AS
SELECT id,
2010 AS Year,
1 AS Month,
2010_01 AS Amount
FROM Payments
UNION ALL
SELECT id,
2010 AS Year,
2 AS Month,
2010_02 AS Amount
FROM Payments
UNION ALL
SELECT id,
2010 AS Year,
3 AS Month,
2010_03 AS Amount
FROM Payments
And so on if you have more. This is how the Payments table should have been designed in the first place.
It may be easier to use a date field with the value '2010-01-01' instead of a Year and Month field. It depends on your data. You may also want to add WHERE Amount IS NOT NULL to each query in the UNION, or you might want to use Nz(2010_01,0.000) AS Amount. Again, it depends on your data and other queries.
It's hard for me to understand how you're joining from here, particularly how the id fields relate because I don't see how they do with the small amount of data provided, so I'll provide some general ideas for what to do next.
Next you can join your second table with this normalized Payments table using a method similar to this or a method similar to this. To actually produce the result you want, include a calculated field in this view with the difference in months. Then, create an actual Pivot Table to format your results (like this or like this) which is the proper way to display data like your tables do.

Adding another column based on different criteria (SQL-server)

I do quite a bit of data analysis and use SQL on a daily basis but my queries are rather simple, usually pulling a lot of data which I thereafter manipulate in excel, where I'm a lot more experienced.
This time though I'm trying to generate some Live Charts which have as input a single SQL query. I will now have to create complex tables without the aid of the excel tools I'm so familiar with.
The problem is the following:
We have telesales agents that book appointments by answering to inbound calls and making outbound cals. These will generate leads that might potentially result in a sale. The relevant tables and fields for this problem are these:
Contact Table
Agent
Sales Table
Price
OutboundCallDate
I want to know for each telesales agent their respective Total Sales amount in one column, and their outbound sales value in another.
The end result should look something like this:
+-------+------------+---------------+
| Agent | TotalSales | OutboundSales |
+-------+------------+---------------+
| Tom | 30145 | 0 |
| Sally | 16449 | 1000 |
| John | 10500 | 300 |
| Joe | 50710 | 0 |
+-------+------------+---------------+
With the below SQL I get the following result:
SELECT contact.agent, SUM(sales.price)
FROM contact, sales
WHERE contact.id = sales.id
GROUP BY contact.agent
+-------+------------+
| Agent | TotalSales |
+-------+------------+
| Tom | 30145 |
| Sally | 16449 |
| John | 10500 |
| Joe | 50710 |
+-------+------------+
I want to add the third column to this query result, in which the price is summed only for records where the OutboundCallDate field contains data. Something a bit like (where sales.OutboundCallDate is Not Null)
I hope this is clear enough. Let me know if that's not the case.
Use CASE
SELECT c.Agent,
SUM(s.price) AS TotalSales,
SUM(CASE
WHEN s.OutboundCallDate IS NOT NULL THEN s.price
ELSE 0
END) AS OutboundSales
FROM contact c, sales s
WHERE c.id = s.id
GROUP BY c.agent
I think the code would look
SELECT contact.agent, SUM(sales.price)
FROM contact, sales
WHERE contact.id = sales.id AND SUM(WHERE sales.OutboundCallDate)
GROUP BY contact.agent
notI'm assuming your Sales table contains something like Units and Price. If it's just a sales amount, then replace the calculation with the sales amount field name.
The key thing here is that the value summed should only be the sales amount if the OutboundCallDate exists. If the OutboundCallDate is not NULL, then we're using a value of 0 for that row.
select Agent.Agent, TotalSales = sum (sales.Price*Units)
, OutboundSales = sum (
case when Outboundcalldate is not null then price*Units
else 0
end)
From Sales inner join Agent on Sales.Agent = Agent.Agent
Group by Agent.Agent

SQL find all orders but delete cancellations related to upgrade orders

Really difficult problem at the moment, as this is something that I am trying to do in SQL but I know should be done with another language such as PHP.
Essentially what I have is a database which has order numbers, and product codes related to policies.
For some orders there are multiple phases such as a cancellation, and an upgrade (for example when an upgrade is made the original policy is cancelled and then an upgrade is made for the difference)
What I want to do, is run a query that will exclude all cancellations related to an upgrade order (dictated as UP in product code) but keep all cancellations related to a change of detail (dictated as CD in product code).
So in short;
The query will need to find all orders including cancellations, but then delete any cancellation which has the same order number as a UP.
I know that this essentially could be done with a 'foreach' which finds all UP product codes, and the related order number, and then deletes all cancellations associated with that order number - but I don't know how to achieve this in SQL without creating multiple tables on a temporary basis and deleting them after.
Table Structure is as such:
| Order Number | Product Code | Transaction Value |
| 1 | RRCN | -30 |
| 1 | RRUP | 12 |
| 2 | SMFP | 30 |
| 3 | SMCN | -12 |
| 3 | SMCD | 12 |
| 4 | HUCN | -30 |
So I would need the query to show all the table, but remove the RRCN related to the RRUP. So query should look for RRUP, see it is order 1, then find the RRCN related to order 1 and remove this from the results - please note, results NOT the original table.
Any ideas? Much appreciated! (I am running SQL Server 2008)
This query should do the job:
SELECT * FROM Q19427600
WHERE RIGHT(ProductCode,2) <> 'CN'
OR (RIGHT(ProductCode,2) = 'CN' AND
OrderNumber NOT IN (SELECT OrderNumber FROM Q19427600 WHERE RIGHT(ProductCode,2) = 'UP'));
Returns results:
OrderNumber ProductCode TransactionValue
1 RRUP 12
2 SMFP 30
3 SMCN -12
3 SMCD 12
4 HUCN -30