Combining Data from Two Tables into One Row - sql

I've read around quite a bit for a solution to my problem but I can't seem to get it to work. It seems like a simple problem but I'm not getting the result set I want.
I'm working on a report that needs to pull from two tables and essentially create one row of data for each employee. The file needs to be uploaded to a healthcare vendor.
Here is an example of the data
Table1: EmployeeCheckDeduction
Employee ID Deduction Amount Check Date
1234 50.00 6/30/2015
1234 50.00 7/15/2015
4567 100.00 6/30/2015
4567 100.00 7/15/2015
9876 75.00 6/30/2015
9876 75.00 7/15/2015
Table2: EmployerContribution
Employee ID Contribution Amount Check Date
1234 25.00 6/30/2015
1234 30.00 7/15/2015
4567 50.00 6/30/2015
4567 60.00 7/15/2015
Part of the problem is that not every record in Table1 will have a corresponding match in Table 2. If they are maxed out on contributions, they won't receive one on that pay. What I want is a result set that looks like this:
Employee ID Deduction Amount Contribution Amount Check Date
1234 50.00 25.00 6/30/2015
1234 50.00 30.00 7/15/2015
4567 100.00 50.00 6/30/2015
4567 100.00 60.00 7/15/2015
9876 75.00 0.00 6/30/2015
9876 75.00 0.00 7/15/2015
No matter how I try and join, it's just duplicating data. I've tried using subqueries or distinct records and no matter what I try, it's not giving me what I want. I can't figure out what I'm doing wrong.
Edit. See links below for dataset results.
http://s000.tinyupload.com/index.php?file_id=01551050904538574848
http://s000.tinyupload.com/index.php?file_id=63978789937644749322
http://s000.tinyupload.com/index.php?file_id=28700836121558977952
I think part of the problem is that in the Employee Check Deduction table there is a specific deduction code that I'm pulling out. In the employer deduction table that code also exists. However, whenever I try and add the join on those 2 fields in addition to employee ID and check date, it doesn't return results from the employees who have a deduction amount in the employee check deduction table when they don't have a corresponding record in the Employer Contribution Table. I hope that helps.

select a.employee_id, a.deduction_amount, b.contribution_amount,
a.check_date
from employeecheckdeduction a join employercontribution b
on a.employee_id = b.employee_id
and a.check_date = b.check_date
Try this by renaming the columns if necessary. You should join on employed and checkdate.

SELECT a.EMPLID as EmployeeID, a.AL_AMOUNT as EmployeeContribution, ISNULL(b.AL_AMOUNT, 0) AS EmployerContribution, a.CHECK_DT
FROM PS_AL_CHK_DED as a
LEFT JOIN PS_AL_CHK_MEMO as b ON a.EMPLID = b.EMPLID AND a.CHECK_DT = b.CHECK_DT
WHERE a.AL_DEDCD = 'H'
ORDER BY a.CHECK_DT

The problem here is that you are dealing with TWO different things:
a) You want a FULL list of all employees who have ever had a contribution AND/OR a deduction
b) You then want to display it in one row
While there are many ways to go about this, the first thing would be two separate out those two and handle them.
Also it is not clear if you want to GROUP BY the employee ID or if you want to see a full history. If you are to look at history, the best thing would be using a FULL JOIN such as:
select coalesce(d.employee_id,c.employee_id) employee_id
, d.deduction_amount
, c.contribution_amount
, coalesce(d.deducation_date,c.contribution_date) check_date
from employee_deduction d
full join employee_contribution c on c.employee_id = d.employee_id
where d.DED_CD = 'H' or c.MEMO_CD = 'H'
What a FULL JOIN does is include ALL records from the first table and ALL records from the joining table regardless if there are nulls in either table.
That is why you have to use a coalesce statement in the select clause to make sure you don't wind up with a null value in the result set.
Hope this works.

Related

Duplicate rows because 1 column has multiple distinct values

I'm running a SELECT query to get data across multiple tables in the same server instance. However I've just noticed that the rows pulled on some data get duplicated because the main table I'm pulling from has a few different values in one of the columns. Here's the query:
SELECT DISTINCT BIF030.C_ACCOUNT AS ACCOUNTNUMBER,
BIF003.C_ACCOUNTTYPE AS ACCOUNTTYPECODE,
CON013.C_DESCRIPTION AS ACCOUNTTYPE,
BIF003.C_DIVISION AS ZONE_DIVISONCODE,
CON028.C_DESCRIPTION AS ZONE_DIVISION,
BIF030.C_METER as METERNUMBER,
BIF005.C_METERCUSTOM1 AS REGISTERNUMBER,
CONVERT(DECIMAL(20,2), BIF030.N_CONSUMP) AS CONSUMPTION,
CON007.C_DESCRIPTION AS UNITS,
BIF030.T_READDATE AS READINGDATE,
MONTH(BIF030.T_READDATE) AS READINGMONTH,
DAY(BIF030.T_READDATE) AS READINGDAY,
YEAR(BIF030.T_READDATE) AS READINGYEAR,
BIF030.I_DAYS AS READINGDAYSCOUNT
FROM ADVANCED.BIF030
LEFT JOIN ADVANCED.CON007 ON CON007.C_UNITS=BIF030.C_UNITS
LEFT JOIN ADVANCED.BIF005 ON BIF005.C_METER=BIF030.C_METER
LEFT JOIN ADVANCED.BIF003 ON BIF003.C_ACCOUNT=BIF030.C_ACCOUNT
LEFT JOIN ADVANCED.CON013 ON CON013.C_ACCOUNTTYPE=BIF003.C_ACCOUNTTYPE
LEFT JOIN ADVANCED.CON028 ON CON028.C_DIVISION=BIF003.C_DIVISION
WHERE T_READDATE > '01-01-2014'
ORDER BY ACCOUNTNUMBER, READINGDATE ASC
I know SELECT DISTINCT is frowned upon, but I get even more rows without it. Here's a sample of what the data looks like when pulled:
ACCOUNTNUMBER
ACCOUNTTYPECODE
ACCOUNTTYPE
ZONE_DIVISIONCODE
ZONE_DIVISION
METERNUMBER
REGISTERNUMBER
CONSUMPTION
UNITS
READINGDATE
READINGMONTH
READINGDAY
READINGYEAR
READINGDAYSCOUNT
1234567
SP
ACCOUNT TYPE 1
00
00-NO ZONE
123456789
987654321
3.00
Thousands of Gallons
2014-01-16 00:00:00.00
1
16
2014
30
1234567
MF
ACCOUNT TYPE 2
02
02-GRAVITY
123456789
987654321
3.00
Thousands of Gallons
2014-01-16 00:00:00.00
1
16
2014
30
1234567
SR
ACCOUNT TYPE 3
02
02-GRAVITY
123456789
987654321
3.00
Thousands of Gallons
2014-01-16 00:00:00.00
1
16
2014
30
I also know the column that is messing this up is the "AccountTypeCode" because other accounts that don't have multiple codes associated with the "AccountNumber" only show 1 set of rows. So this one specifically (and probably others) is tripling the amount of rows pulled when it should only pull one for each "ReadingDate".
Also if anyone knows a good way to optimize the query I'd be happy to learn. I know just enough SQL to be dangerous, but not enough to figure this out. Thanks.
Ok. So good news and I want to add this in case it helps anyone else in the future. I found out that since the ACCOUNTTYPECODE and ZONE_DIVISIONCODE were coming from the table BIF003 I needed to add more in the WHERE statement. This is what fixed it for me:
AND BIF030.C_CUSTOMER = BIF003.C_CUSTOMER
Because the C_CUSTOMER column was different (it's a column in the BIF003 and BIF030 tables) which lead to the separate ACCOUNTTYPECODE results I need to check it in the WHERE statement.
Thanks everyone for kick starting my brain on this one.

Select columns value based on logic applied on the rows

It is a self-join I need, but I'm having difficulty with this problem and I hope someone can help me.
I have a table with MAT_CODE, MATERIAL and VENDOR and I am trying to generate a new column with NEW_MATCODE as per the below scenario.
Sample Data :
NEW_MATCODE MAT_CODE MATERIAL VENDOR WIN_VENDOR
X-043223065 GP002134 GP002134
3065 X-043223065 USD005 P10011
3065 3065 X-043223065 EUR003 P10011
4567 4567 X-023065 UD00005 UD00005
4567 X-023065 DF00388 UD00005
4321 X-04065 P24005 P24005
4321 4321 X-04065 D41111 P24005
4321 X-04065 D46732 P24005
X-0432065 US7800 D0230005
X-0432065 EUR234 D067805
123 123 X-04322 P0008 P0008
123 1234 X-04322 EU0323 P0008
123 1262 X-04322 EUR0032 P0008
2345 2345 X-04322 DFGH322 P12008
123456 123456 X-04322 EUR00323 P12008
1113 1113 X-04322 EUR0032 P12008
Logic for 1,2 and 3 sets of data:
Pick up the MATERIAL AND WIN_VENDOR combination and get the unique MAT_CODE and apply it across all MATERIAL- WIN_VENDOR combinations as the NEW_MATCODE
Logic for 4th set :
If no combination for MAT_CODE exists then leave it as-is
Logic for 5th set:
When different MAT_CODE exists for the same MATERIAL and WIN_VENDOR combination, apply NEW_MATCODE as the MAT_CODE from MATERIAL - VENDOR where VENDOR = WIN_VENDOR
Logic for 6th set:
When different MAT_CODE exists for the same MATERIAL and WIN_VENDOR combination, and VENDOR <> WIN_VENDOR leave MAT_CODE as-is.
Hope it is clear. Any help would be appreciated.
Thanks.
I think the following query will get you most of the way to what you are looking for:
SELECT mat_code, material, vendor, win_vendor,
CASE
WHEN COUNT(DISTINCT mat_code) OVER (PARTITION BY material, win_vendor) = 0 THEN mat_code
WHEN COUNT(DISTINCT mat_code) OVER (PARTITION BY material, win_vendor) = 1 THEN MAX(mat_code) OVER (PARTITION BY material, win_vendor)
ELSE NVL((SELECT sub.mat_code FROM material_info sub WHERE sub.material = mi.material AND sub.vendor = sub.win_vendor), mi.mat_code)
END AS NEW_MAT
FROM material_info mi;
The case statement is making use of the analytical functions to handle cases 1-4. The else branch is attempting to grab Case 5 and if it isn't found defaulting to Case 6.

Using select distinct with sum

I am not sure if what I want can be done but basically, I have a homework where I am supposed to come up with sql statements to filter through a database of a made up TCM company in any way. I wanted to filter branch to see which branch earns the most revenue.
but I am stuck here because I don't know how to filter it.
select distinct b.branch_ID, t.fees
from treatment t,branch b, appointment a, appointment_treatment at
where t.treatment_ID = at.treatment_ID
and at.appt_ID = a.appt_ID
and b.branch_ID = a.branch_ID
The result is like so.
branch revenue
BR001 30.00
BR001 45.00
BR001 75.00
BR002 28.00
BR002 40.00
BR002 60.00
BR003 28.00
BR003 60.00
BR004 28.00
However, what I want is:
branch (sum of) revenue for each branch
BR001 30.00
BR002 45.00
BR003 75.00
BR004 28.00
BR005 40.00
but I can't think of a way to make this work!
If I understood you problem correctly, you can use GROUP BY clause combined with SUM function to achieve what you want, like this:
select
b.branch_ID, sum(t.fees)
from
treatment t,branch b, appointment a, appointment_treatment at
where
t.treatment_ID = at.treatment_ID
and at.appt_ID = a.appt_ID
and b.branch_ID = a.branch_ID
group by
b.branch_ID

Joining a table to two one-to-many relationship tables in SQL Server

Happy Friday folks,
I'm trying to write an SSRS report displaying data from three (actually about 12, but only three relevant) tables that have akward relationships and the SQL query behind the data is proving difficult.
There are three entities involved - a Purchase Order, a Sales Order, and a Delivery. The problem is the a Purchase Order can have many sales orders, and also many deliveries which are NOT linked to the sales orders...that would be too easy.
Both the Sales Order and Delivery tables can be linked to the Purchase Order table by foreign keys and an intermediate table each.
I need to basically list Purchase Orders, a list of sales orders and a list of deliveries next to them, with NULLs for any fields that aren't valid so that'll give the required output in SSRS/when read by a human, ie, for a purchase order with 2 sales orders and 4 delivery dates;
PO SO Delivery
1234 ABC 05/10
1234 DEF 09/10
1234 NULL 10/12
1234 NULL 14/12
The above (when grouped by PO) will tell the users there are two sales orders and four (unlinked) delivery dates.
Likewise if there are more SOs than deliveries, we need NULLs in the Delivery column;
PO SO Delivery
1234 ABC 03/08
1234 DEF NULL
1234 GHI NULL
1234 JKL NULL
Above would be the case with 4 SOs and one delivery date.
Using Left Outer joins alone gives too much duplication - in this case 8 rows, as it gives 4 delivery dates for each match on the sales order;
PO SO Delivery
1234 ABC 05/10
1234 ABC 09/10
1234 ABC 10/12
1234 ABC 14/12
1234 DEF 05/10
1234 DEF 09/10
1234 DEF 10/12
1234 DEF 14/12
It's fine that the PO column is duplicated as SSRS can visually group that - but the SO/Delivery fields can't be allowed to duplicate as this can't be got rid of in the report - if I group the column in SSRS by SO then it still spits out 4 delivery dates for each one.
The only situation our query works nice is when there is just one SO per PO. In that case the single PO and SO numbers are duplicated together for x deliveries and can both be neatly grouped in SSRS. Unfortunately this is a rare occurence in the data.
I've thought of trying to use some sort of windowing function or CROSS APPLY but both fall down as they will repeat for every PO number listed and end up spitting out too much data.
At the point of thinking this just isn't set-based enough to be doable in SQL, I know the data is horrible..
Any help much appreciated.
EDIT - basical sqlfiddle link to the table schemas. Omitted many columns which aren't relevant. http://sqlfiddle.com/#!2/5ba16
Example data...
Purchase Order
PO_Number Style
1001 Black work boots
1002 Green hat
1006 Red Scarf
Sales Order
Sales_order_number PO_number Qty Retailer
A100-21 1001 15 Walmart
A100-22 1001 29 Walmart
A200-31 1006 1000 Asda
Delivery
Delivery_ID Delivery_Date PO_number
1543285 10/05/2014 1001
1543286 12/05/2014 1001
1543287 17/05/2014 1001
1543288 21/05/2014 1002
If you assign row numbers to the elements in salesorders and deliveries, you can link on that.
Something like this
declare #salesorders table (po int, so varchar(10))
declare #deliveries table (po int, delivery date)
declare #purchaseorders table (po int)
insert #purchaseorders values (123),(456)
insert #salesorders values (123,'a'),(123,'b'),(456,'c')
insert #deliveries values (123,'2014-1-1'),(456,'2014-2-1'),(456,'2014-2-1')
select *
from
(
select numbers.number, p.po, so.so, d.delivery from #purchaseorders p
cross join (Select number from master..spt_values where type='p') numbers
left join (select *,ROW_NUMBER() over (partition by po order by so) sor from #salesorders ) so
on p.po = so.po and numbers.number = so.sor
left join (select * , ROW_NUMBER() over (partition by po order by delivery) dor from #deliveries) d
on p.po = d.po and numbers.number = d.dor
) v
where so is not null or delivery is not null
order by po,number

access sql changing where parameters

I have a problem with a DB I'm working on, I suspect is actually a very simple problem, still I can't solve it.
For sake of brevity I will use a very simplified case.
Let's say you have a table were have been stored data about expenses made by different persons, something like this:
IDPerson Expense data
1 2.00 1/1/14
1 1.00 1/2/14
1 1.00 1/3/14
2 1.00 1/1/14
2 1.00 1/2/14
3 1.00 1/3/14
What i need is a query that gives me, for each IDPerson, the totale of expenses made before or on 1/2/14 AND the total of the expenses unregarding the date, so
IDPerson TotalExpense ExpenseBefore1/2/14
1 4.00 2.00
2 2.00 2.00
3 1.00 0.00
It seems to me that to get this you shuld have different where clauses for each column(hence the title)
What I tried:
I tried to use two queries, one without where clause and one with
Where data<=1/2/14
then I used UNION to get the final result, technically it worked, but since this it is not really handy expecially because the real project is much more complicated than this and the final table has much more fields; is there any other way to obtain this?
P.S. You may have noticed that English is not my mothertongue, sorry for any error
SELECT y.IDPerson,y.ExpensesTotal,isnull(x.ExpensesBefore_1_2_2014,0) as ExpensesBefore_1_2_2014 from
(SELECT IDPerson ,sum(expenses) as ExpensesTotal from tableA GROUP BY IDPerson) as y full join ( SELECT tableA.IDPerson , sum(expenses) as ExpensesBefore_1_2_2014 from tableA
where data < '01.02.2014'
GROUP BY tableA.IDPerson ) as x on x.IDPerson = y.IDPerson
This query should work, also it should be preaty quick if the IDPerson is Indexed or PK.
I've updated the query, try this one.
SQL Fiddle : http://sqlfiddle.com/#!3/fdaf0/1