How to return a single value built from values stored in multiple records? - sql

I am an application developer unfortunately put in the position of needing to write (/update) the SQL statement in order to return data for the application. My experience with SQL is limited, so would appreciate any help.
We have a Oracle Database 11g (11.2.0.4.0)
Example Tables
I've created the following example which replicates our set-up. It consists of:
A main table which contains records of trips around different cities. (MAIN_TRIP_TABLE)
Various additional tables which contain additional properties linked to these trips via INNER JOINs. (ADDITIONAL_TABLE)
A separate table showing the steps taken along the journey (ie. interim locations visited). A value of STEP_NUM = 1 is always the final destination, and thus there is always at least 1 record in this table per trip in the main table. If there were any interim stops made of the journey they are listed in this table as separate records with STEP_NUM iterating upwards. (JOURNEY_STEPS_TABLE)
MAIN_TRIP_TABLE
RECORD_ID | PROP_1 | PROP_2 | FINAL_DEST | ...
-------------------------------------------------
10001 | A | 1 | London | ...
10002 | A | 0 | Reading | ...
10003 | B | 1 | Leeds | ...
10004 | B | 0 | York | ...
ADDITIONAL_TABLE
RECORD_ID | PROP_3 | ...
------------------------
10001 | X | ...
10002 | Y | ...
10003 | Y | ...
10004 | X | ...
JOURNEY_STEPS_TABLE
RECORD_ID | STEP_NUM | LOCATION | ...
--------------------------------------
10001 | 1 | London | ...
10002 | 1 | Reading | ...
10002 | 2 | Bath | ...
10003 | 1 | Leeds | ...
10003 | 2 | York | ...
10003 | 3 | Bristol | ...
10004 | 1 | York | ...
10004 | 2 | Cardiff | ...
10004 | 3 | Oxford | ...
10004 | 4 | London | ...
Issue
I want to retrieve something that looks like:
SELECT
MAIN_TRIP_TABLE.RECORD_ID
, MAIN_TRIP_TABLE.PROP_1
, MAIN_TRIP_TABLE.PROP_2
, ADDITIONAL_TABLE.PROP_3
, <Concatenation/Array of JOURNEY_STEPS_TABLE> as "InterimStops"
FROM MAIN_TRIP_TABLE
INNER JOIN ADDITIONAL_TABLE ON MAIN_TRIP_TABLE.RECORD_ID = ADDITIONAL_TABLE.RECORD_ID
LEFT OUTER JOIN JOURNEY_STEPS_TABLE ON MAIN_TRIP_TABLE.RECORD_ID = JOURNEY_STEPS_TABLE.RECORD_ID
Where the "InterimStops" value above is some sort of concatenation of any and all values in found in the JOURNEY_STEPS_TABLE, for that particular RECORD_ID, in order of increasing STEP_NUM, with some sort of deliminator. (eg for '10001' I would want just "London", and for '10004' I would want "York,Cardiff,Oxford,London").
If I get something like this, I can then separate these out to an JSON array, within the application I'm developing.
Note: The actual SQL SELECT query is already significantly more complex with other fields and tables, so changing the query away from 1 SELECT query (ie. instead using multiple queries), is something I'd like to avoid unless absolutely necessary.
Things I've tried
After some Googling, I started to build a SQL statement using LISTAGG, and to begin with it looked promising:
SELECT
MAIN_TRIP_TABLE.RECORD_ID
, LISTAGG(JOURNEY_STEPS_TABLE.LOCATION, ',') WITHIN GROUP (ORDER BY JOURNEY_STEPS_TABLE.STEP_NUMBER) "InterimStops"
FROM MAIN_TRIP_TABLE
LEFT OUTER JOIN JOURNEY_STEPS_TABLE ON MAIN_TRIP_TABLE.RECORD_ID = JOURNEY_STEPS_TABLE.RECORD_ID
GROUP BY MAIN_TRIP_TABLE.RECORD_ID
This returned exactly the sort of value I was looking for, but this failed as soon as I tried to bring back in the other values from both the main table and additional tables (eg: MAIN_TRIP_TABLE.PROP_1, MAIN_TRIP_TABLE.PROP_2, ADDITIONAL_TABLE.PROP_3). This gave me a "ORA-00979: not a GROUP BY expression" error.
I then tried to get this data via a subquery but struggled to get anything working.
Any help, insight, or pointing in the right direct, would be very much appreciated.
Many Thanks

It's easier to do this with a subquery so you don't have to group the data on the joined set of columns (as you allready tried):
SELECT MAIN_TRIP_TABLE.RECORD_ID
, (SELECT LISTAGG(JOURNEY_STEPS_TABLE.LOCATION, ',') WITHIN GROUP (ORDER BY JOURNEY_STEPS_TABLE.STEP_NUMBER)
FROM JOURNEY_STEPS_TABLE
WHERE JOURNEY_STEPS_TABLE.RECORD_ID = MAIN_TRIP_TABLE.RECORD_ID) "InterimStops"
FROM MAIN_TRIP_TABLE
The other possibility is to LEFT JOIN the grouped data:
SELECT MAIN_TRIP_TABLE.RECORD_ID
, JOURNEY_STEPS_TABLE."InterimStops"
FROM MAIN_TRIP_TABLE
LEFT JOIN (SELECT RECORD_ID
, LISTAGG(LOCATION, ',') WITHIN GROUP (ORDER BY STEP_NUMBER) "InterimStops"
FROM JOURNEY_STEPS_TABLE
GROUP BY RECORD_ID) JOURNEY_STEPS_TABLE
ON JOURNEY_STEPS_TABLE.RECORD_ID = MAIN_TRIP_TABLE.RECORD_ID

Related

Columns to Rows Two Tables in Cross Apply

Using SQL Server I have two tables, below sample Table #T1 in DB has well over a million rows, Table #T2 has 100 rows. Both tables are in Column format and I need to Pivot to rows and join both.
Can I get it all in one query with Cross Apply and remove the cte?
This is my code, I have correct output but is this the most efficient way to do this considering number of rows?
with cte_sizes
as
(
select SizeRange,Size,ColumnPosition
from #T2
cross apply (
values(Sz1,1),(Sz2,2),(Sz3,3),(Sz4,4)
) X (Size,ColumnPosition)
)
select a.ProductID,a.SizeRange,c.Size,isnull(x.Qty,0) as Qty
from #T1 a
cross apply (
values(a.Sale1,1),(a.Sale2,2),(a.Sale3,3),(a.Sale4,4)
) X (Qty,ColumnPosition)
inner join cte_sizes c
on c.SizeRange = a.SizeRange
and c.ColumnPosition = x.ColumnPosition
I have also code and considered this but is this the CROSS APPLY a better method?
with cte_sizes
as
(
select 1 as SizePos
union all
select SizePos + 1 as SizePos
from cte_sizes
where SizePos < 4
)
select a.ProductID
,a.SizeRange
,(case when b.SizePos = 1 then c.Sz1
when b.SizePos = 2 then c.Sz2
when b.SizePos = 3 then c.Sz3
when b.SizePos = 4 then c.Sz4 end
) as Size
,isnull((case when b.SizePos = 1 then a.Sale1
when b.SizePos = 2 then a.Sale2
when b.SizePos = 3 then a.Sale3
when b.SizePos = 4 then a.Sale4 end
),0) as Qty
from #T1 a
inner join #T2 c on c.SizeRange = a.SizeRange
cross join cte_sizes b
This is wild guessing, but my magic crystall ball told me, that you might be looking for something like this:
For this we do not need your table #TS at all.
WITH Unpivoted2 AS
(
SELECT t2.SizeRange,A.* FROM #t2 t2
CROSS APPLY(VALUES(1,t2.Sz1)
,(2,t2.Sz2)
,(3,t2.Sz3)
,(4,t2.Sz4)) A(SizePos,Size)
)
SELECT t1.ProductID
,Unpivoted2.SizeRange
,Unpivoted2.Size
,Unpivoted1.Qty
FROM #t1 t1
CROSS APPLY(VALUES(1,t1.Sale1)
,(2,t1.Sale2)
,(3,t1.Sale3)
,(4,t1.Sale4)) Unpivoted1(SizePos,Qty)
LEFT JOIN Unpivoted2 ON Unpivoted1.SizePos=Unpivoted2.SizePos AND t1.SizeRange=Unpivoted2.SizeRange
ORDER BY t1.ProductID,Unpivoted2.SizeRange;
The result:
+-----------+-----------+------+------+
| ProductID | SizeRange | Size | Qty |
+-----------+-----------+------+------+
| 123 | S-XL | S | 1 |
+-----------+-----------+------+------+
| 123 | S-XL | M | 12 |
+-----------+-----------+------+------+
| 123 | S-XL | L | 13 |
+-----------+-----------+------+------+
| 123 | S-XL | XL | 14 |
+-----------+-----------+------+------+
| 456 | 8-14 | 8 | 2 |
+-----------+-----------+------+------+
| 456 | 8-14 | 10 | 22 |
+-----------+-----------+------+------+
| 456 | 8-14 | 12 | NULL |
+-----------+-----------+------+------+
| 456 | 8-14 | 14 | 24 |
+-----------+-----------+------+------+
| 789 | S-L | S | 3 |
+-----------+-----------+------+------+
| 789 | S-L | M | NULL |
+-----------+-----------+------+------+
| 789 | S-L | L | 33 |
+-----------+-----------+------+------+
| 789 | S-L | XL | NULL |
+-----------+-----------+------+------+
The idea in short:
The cte will return your #T2 in an unpivoted structure. Each name-numbered column (something you should avoid) is return as a single row with an index indicating the position.
The SELECT will do the same with #T1 and join the cte against this set.
UPDATE: After a lot of comments...
If I get this (and the changes to the initial question) correctly, the approach above works perfectly well, but you want to know, what was best in performance.
The first answer to "What is the fastest approach?" is Race your horses by Eric Lippert.
Good to know 1: A CTE is nothing more then syntactic sugar. It will allow to type a sub-query once and use it like a table, but it has no effect to the way how the engine will work this down.
Good to know 2: It is a huge difference whether you use APPLY or JOIN. The first will call the sub-source once per row, using the current row's values. The second will have to create two sets first and will then join them by some condition. There is no general "what is better"...
For your issue: As there is one very big set and one very small set, all depends on when you reduce the big set usig any kind of filter. The earlier the better.
And most important: It is - in any case - a sign of bad structures - when you find name numbering (something like phone1, phone2, phoneX). The most expensive work will be to transform your 4 name-numbered columns to some dedicated rows. This should be stored in normalized format...
If you still need help, I'd ask you to start a new question.

Multiple Self-Join based on GROUP BY results

I'm attempting to collect details about backup activity from a ProgreSQL DB table on a backup appliance (Avamar). The table has several columns including: client_name, dataset, plugin_name, type, completed_ts, status_code, bytes_modified and more. Simplified example:
| session_id | client_name | dataset | plugin_name | type | completed_ts | status_code | bytes_modified |
|------------|-------------|---------|---------------------|------------------|----------------------|-------------|----------------|
| 1 | server01 | Windows | Windows File System | Scheduled Backup | 2017-12-05T01:00:00Z | 30900 | 11111111 |
| 2 | server01 | Windows | Windows File System | Scheduled Backup | 2017-12-04T01:00:00Z | 30000 | 22222222 |
| 3 | server01 | Windows | Windows File System | Scheduled Backup | 2017-12-03T01:00:00Z | 30000 | 22222222 |
| 4 | server01 | Windows | Windows File System | Scheduled Backup | 2017-12-02T01:00:00Z | 30000 | 22222222 |
| 5 | server01 | Windows | Windows VSS | Scheduled Backup | 2017-12-01T01:00:00Z | 30000 | 33333333 |
| 6 | server02 | Windows | Windows File System | Scheduled Backup | 2017-12-05T02:00:00Z | 30000 | 44444444 |
| 7 | server02 | Windows | Windows File System | Scheduled Backup | 2017-12-04T02:00:00Z | 30900 | 55555555 |
| 8 | server03 | Windows | Windows File System | On-Demand Backup | 2017-12-05T03:00:00Z | 30000 | 66666666 |
| 9 | server04 | Windows | Windows File System | Validate | 2017-12-05T03:00:00Z | 30000 | 66666666 |
Each client_name (server) can have multiple datasets, and each dataset can have multiple plugin_names. So I have a created a SQL statement that does a GROUP BY of these three columns to get a list of "job" activity over time.
(http://sqlfiddle.com/#!15/f15556/1)
select
client_name,
dataset,
plugin_name
from v_activities_2
where
type like '%Backup%'
group by
client_name, dataset, plugin_name
Each of these Jobs can be successful or fail based on a status_code column. Using self-join with subqueries I'm able to get results of the Last Good backup along with it's completed_ts (completed time) and bytes_modified and more:
(http://sqlfiddle.com/#!15/f15556/16)
select
a2.client_name,
a2.dataset,
a2.plugin_name,
a2.LastGood,
a3.status_code,
a3.bytes_modified as LastGood_bytes
from v_activities_2 a3
join (
select
client_name,
dataset,
plugin_name,
max(completed_ts) as LastGood
from v_activities_2 a2
where
type like '%Backup%'
and status_code in (30000,30005) -- Successful (Good) Status codes
group by
client_name, dataset, plugin_name
) as a2
on a3.client_name = a2.client_name and
a3.dataset = a2.dataset and
a3.plugin_name = a2.plugin_name and
a3.completed_ts = a2.LastGood
I can do the same thing separately to get the Last Attempt details by removing WHERE's status_code line: http://sqlfiddle.com/#!15/f15556/3. Note that most times LastGood and LastAttempted are the same row but sometimes they are not, depending if the last backup was successful.
What I'm having problems with is merging these two statements together (if possible). So I will get this result:
| client_name | dataset | plugin_name | lastgood | lastgood_bytes | lastattempt | lastattempt_bytes |
|-------------|---------|---------------------|----------------------|-----------------|----------------------|-------------------|
| server01 | Windows | Windows File System | 2017-12-04T01:00:00Z | 22222222 | 2017-12-05T01:00:00Z | 11111111 |
| server01 | Windows | Windows VSS | 2017-12-01T01:00:00Z | 33333333 | 2017-12-01T01:00:00Z | 33333333 |
| server02 | Windows | Windows File System | 2017-12-05T02:00:00Z | 44444444 | 2017-12-05T02:00:00Z | 44444444 |
| server03 | Windows | Windows File System | 2017-12-05T03:00:00Z | 66666666 | 2017-12-05T03:00:00Z | 66666666 |
I attempted just adding another RIGHT JOIN to the end (http://sqlfiddle.com/#!15/f15556/4) and getting NULL rows. After doing some reading I see that the first two JOINs run first creating a temporary table before the 2nd join occurs, but at that point the data I need is lost so I get NULL rows.
Using PostgreSQL 8 via groovy scripting. I also only have read-only access to the DB.
You apparently have two intermediate inner join output tables and you want to get columns from each about some things identified by a common key. So inner join them on the key.
select
g.client_name,
g.dataset,
g.plugin_name,
LastGood,
g.status_code,
LastGood_bytes
LastAttempt,
l.status_code,
LastAttempt_bytes
from
( -- cut & pasted Last Good http://sqlfiddle.com/#!15/f15556/16
select
a2.client_name,
a2.dataset,
a2.plugin_name,
a2.LastGood,
a3.status_code,
a3.bytes_modified as LastGood_bytes
from v_activities_2 a3
join (
select
client_name,
dataset,
plugin_name,
max(completed_ts) as LastGood
from v_activities_2 a2
where
type like '%Backup%'
and status_code in (30000,30005) -- Successful (Good) Status codes
group by
client_name, dataset, plugin_name
) as a2
on a3.client_name = a2.client_name and
a3.dataset = a2.dataset and
a3.plugin_name = a2.plugin_name and
a3.completed_ts = a2.LastGood
) as g
join
( -- cut & pasted Last Attempt http://sqlfiddle.com/#!15/f15556/3
select
a1.client_name,
a1.dataset,
a1.plugin_name,
a1.LastAttempt,
a3.status_code,
a3.bytes_modified as LastAttempt_bytes
from v_activities_2 a3
join (
select
client_name,
dataset,
plugin_name,
max(completed_ts) as LastAttempt
from v_activities_2 a2
where
type like '%Backup%'
group by
client_name, dataset, plugin_name
) as a1
on a3.client_name = a1.client_name and
a3.dataset = a1.dataset and
a3.plugin_name = a1.plugin_name and
a3.completed_ts = a1.LastAttempt
) as l
on l.client_name = g.client_name and
l.dataset = g.dataset and
l.plugin_name = g.plugin_name
order by client_name, dataset, plugin_name
This uses one of the applicable approaches in Strange duplicate behavior from GROUP_CONCAT of two LEFT JOINs of GROUP_BYs. However the correspondence of chunks of code might not be so clear. Its intermediate are left vs your inner & group_concat is your max. (But it has more approaches because of particulars of group_concat & its query.)
A correct symmetrical INNER JOIN approach: LEFT JOIN q1 & q2--1:many--then GROUP BY & GROUP_CONCAT (which is what your first query did); then separately similarly LEFT JOIN q1 & q3--1:many--then GROUP BY & GROUP_CONCAT; then INNER JOIN the two results ON user_id--1:1.
A correct cumulative LEFT JOIN approach: JOIN q1 & q2--1:many--then GROUP BY & GROUP_CONCAT; then left join that & q3--1:many--then GROUP BY & GROUP_CONCAT.
Whether this actually serves your purpose in general depends on your actual specification and constraints. Even if the two joins you link are what you want you need to explain exactly what you mean by "merge". You don't say what you want if the joins have different sets of values for the grouped columns. Force yourself to use the English language to say what rows go in the result based on what rows are in the input.
PS 1 You have undocumented/undeclared/unenforced constraints. Please declare when possible. Otherwise enforce by triggers. Document in question text if not in code. Constraints are fundamental to multiple subrow value instances in join & to group by.
PS 2 Learn the syntax/semantics for select. Learn what left/right outer join ons return--whatinner join on does plus unmatched left/right table rows extended by nulls.
PS 3 Is there any rule of thumb to construct SQL query from a human-readable description?
Here is an alternate way that also works but harder to follow and likely more particular to my dataset: http://sqlfiddle.com/#!15/f15556/114
select
Actvty.client_name,
Actvty.dataset,
Actvty.plugin_name,
ActvtyGood.LastGood,
ActvtyGood.status_code as LastGood_status,
ActvtyGood.bytes_modified as LastGood_bytes,
ActvtyOnly.LastAttempt,
Actvty.status_code as LastAttempt_status,
Actvty.bytes_modified as LastAttempt_bytes
from v_activities_2 Actvty
-- 1. Get last attempt of each job (which may or may not match last good)
join (
select
client_name,
dataset,
plugin_name,
max(completed_ts) as LastAttempt
from v_activities_2
where
type like '%Backup%'
group by
client_name, dataset, plugin_name
) as ActvtyOnly
on Actvty.client_name = ActvtyOnly.client_name and
Actvty.dataset = ActvtyOnly.dataset and
Actvty.plugin_name = ActvtyOnly.plugin_name and
Actvty.completed_ts = ActvtyOnly.LastAttempt
-- 4. join the list of good runs with the table of last attempts, there would never be a job that has a last good without also a last attempt.
join (
-- 3. join last good runs with the full table to get the additional details of each
select
ActvtyGoodSub.client_name,
ActvtyGoodSub.dataset,
ActvtyGoodSub.plugin_name,
ActvtyGoodSub.LastGood,
ActvtyAll.status_code,
ActvtyAll.bytes_modified
from v_activities_2 ActvtyAll
-- 2. Get last Good run of each job
join (
select
client_name,
dataset,
plugin_name,
max(completed_ts) as LastGood
from v_activities_2
where
type like '%Backup%'
and status_code in (30000,30005) -- Successful (Good) Status codes
group by
client_name, dataset, plugin_name
) as ActvtyGoodSub
on ActvtyAll.client_name = ActvtyGoodSub.client_name and
ActvtyAll.dataset = ActvtyGoodSub.dataset and
ActvtyAll.plugin_name = ActvtyGoodSub.plugin_name and
ActvtyAll.completed_ts = ActvtyGoodSub.LastGood
) as ActvtyGood
on Actvty.client_name = ActvtyGood.client_name and
Actvty.dataset = ActvtyGood.dataset and
Actvty.plugin_name = ActvtyGood.plugin_name

Select rows with same value in one column but different value in another column

I've been trying to build this query but am new to SQL so I'd really appreciate some help.
In the below table example, I have a Customer Code, a linked Customer Code (which is used to link a child customer to a parent customer), a salesperson, and other irrelevant columns. The goal is to have one Salesperson for each parent customer and it's children. So in the example, CustCode #100 is the parent of itself, #200, #500, and #800. All of these accounts have the same Salesperson (JASON) which is perfect. But for CustCode #300, it is the parent of itself, #400, and #600. However, there isn't one salesperson assigned - its both JIM and SUZY. I want to build a query that shows all accounts for this example. Basically, accounts where the Salesperson field isn't the same value for all of it's child customers.
I tried a Where clause for Salesperson <> Salesperson but its not showing up right.
+-----------+-----------------+------------+----------------------+
| CustCode | Linked CustCode | Salesperson| additional columns...|
+-----------+-----------------+------------+----------------------+
| 100 | 100 | JASON | ... |
| 200 | 100 | JASON | ... |
| 300 | 300 | JIM | ... |
| 400 | 300 | JIM | ... |
| 500 | 100 | JASON | ... |
| 600 | 300 | SUZY | ... |
| 700 | NULL | JIM | ... |
| 800 | 100 | JASON | ... |
+-----------+-----------------+------------+----------------------+
Thanks so much for your help!
You can do self join on the table.
select distinct r2.* from
table r1
join table r2
on
r1.linkedcustcode = r2.linkedcustcode and r1.salesperson <> r2.salesperson
This solution uses a recursive CTE first to build the hierarchy and find the leading code for each row, even if a linked code points to a row which is pointing to an upper row itself.
The final query shows the count of different Salespersons:
DECLARE #tbl TABLE(CustCode INT,[Linked CustCode] INT,Salesperson VARCHAR(100));
INSERT INTO #tbl VALUES
(100,100,'JASON')
,(200,100,'JASON')
,(300,300,'JIM')
,(400,300,'JIM')
,(500,100,'JASON')
,(600,300,'SUZY')
,(700,NULL,'JIM')
,(800,100,'JASON');
--The query
WITH CleanUp AS
(
SELECT CustCode
,CASE WHEN [Linked CustCode]=CustCode THEN NULL ELSE [Linked CustCode] END AS [Linked CustCode]
,Salesperson
FROM #tbl
)
,recCTE AS
(
SELECT CustCode AS LeadingCode,CustCode,[Linked CustCode],Salesperson
FROM CleanUp
WHERE [Linked CustCode] IS NULL
UNION ALL
SELECT recCTE.LeadingCode,t.CustCode,t.[Linked CustCode],t.Salesperson
FROM recCTE
INNER JOIN CleanUp AS t ON t.[Linked CustCode]=recCTE.CustCode
)
SELECT LeadingCode,COUNT(DISTINCT Salesperson) AS CountSalesperson
FROM recCTE
GROUP BY LeadingCode
The result
LeadingCode CountSalesperson
100 1
300 2
700 1

prevent from double/triple SUMing when JOINing

i am joining two tables: accn_demographics and accn_payments. The relationship between the two tables is one to many between accn_demographics.accn_id and accn_payments.accn_id
My question is when I am summing the PAID_AMT and COPAY_AMT, I am getting double/triple/quadrouple the number that I should be getting.
Is there an obvious problem with my join condition?
select sum(p.paid_amt) as SumPaidAmount
, sum(p.copay_amt) as SumCoPay
, p.pmt_date
, d.load_Date
, p.ACCN_ID
from accn_payments p
join
(
select distinct load_date, accn_id
from accn_demographics
) d
on p.ACCN_ID=d.ACCN_ID
where p.POSTED='Y'
and p.pmt_date between '20120701' and '20120731'
group by p.pmt_date, d.load_Date,p.ACCN_ID
order by 3 desc
thanks so much for your guidance.
You need to do the summation in a subquery:
select sum(p.SumPaidAmount) as SumPaidAmount, sum(p.SumCoPay) as SumCoPay,
p.pmt_date, d.load_Date, p.ACCN_ID
from (select accn_id, p.pmt_date, sum(paid_amt) as SumPaidAmt,
sum(copay_amt) as SumCoPay
from accn_payments p
where p.POSTED='Y' and
p.pmt_date between '20120701' and '20120731'
group by accn_id, pmt_date
) p join
(select distinct load_date, accn_id from accn_demographics) d
on p.ACCN_ID=d.ACCN_ID
group by p.pmt_date, d.load_Date,p.ACCN_ID
order by 3 desc
Question: do you really intend for pmt_date to be in the final results? It looks like you want to remove it from both the outer SELECT and the subquery.
The only thing I can see if that (select distinct load_date, accn_id from accn_demographics) might return several matches. Look at your data and run a separate query
select distinct load_date, accn_id from accn_demographics WHERE accn_id=SomeID
where SomeID is one of the result accounts that is returning double/triple values. That should pinpoint your problem.
Yes, but it's not so obvious for beginners. What happens is that for every accn_payments record, you're matching on ONLY the accn_id, which means if there are multiple records in accn_demographics for that particular accn_id, then you will get duplicate accn_payment records due to the join. Is there another limiting field on accn_demographics to join back to the payments?
Ultimately, think of it this way:
accn_payments (p):
accn_id | paid_amt | copay_amt | ...
----------------------------------------------------
1 | 100.00 | 20.00 | ...
accn_demographics (d):
accn_id | load_date | ...
------------------------------------
1 | 2012/01/01 | ...
1 | 2012/03/05 | ...
1 | 2012/06/23 | ...
After joining, your results will look like this:
p.accn_id | p.paid_amt | p.copay_amt | p... | d.accn_id | d.load_date | d...
----------------------------------------------------------------------------
1 | 100.00 | 20.00 | .... | 1 | 2012/01/01 | ....
1 | 100.00 | 20.00 | .... | 1 | 2012/03/05 | ....
1 | 100.00 | 20.00 | .... | 1 | 2012/06/21 | ....
As you can see, the same row from accn_payments gets replicated for every matching accn_demographics record, since you specified only the accn_id column to be the join criteria. It can't limit the results any further, so it the DB engine says "Hey, look, this p record matches for all these d records, this must be what he was asking for!" Obviously not what was intended, as when you sum on the p.paid_amt and p.copay_amt, it performs a sum for ALL ROWS (even though they are duplicated).
Ultimately, see if you can limit the join criteria for accn_demographics even further (by some date, perhaps), that way you limit the number of duplicate payment records during the join.

join on three tables? Error in phpMyAdmin

I'm trying to use a join on three tables query I found in another post (post #5 here). When I try to use this in the SQL tab of one of my tables in phpMyAdmin, it gives me an error:
#1066 - Not unique table/alias: 'm'
The exact query I'm trying to use is:
select r.*,m.SkuAbbr, v.VoucherNbr from arrc_RedeemActivity r, arrc_Merchant m, arrc_Voucher v
LEFT OUTER JOIN arrc_Merchant m ON (r.MerchantID = m.MerchantID)
LEFT OUTER JOIN arrc_Voucher v ON (r.VoucherID = v.VoucherID)
I'm not entirely certain it will do what I need it to do or that I'm using the right kind of join (my grasp of SQL is pretty limited at this point), but I was hoping to at least see what it produced.
(What I'm trying to do, if anyone cares to assist, is get all columns from arrc_RedeemActivity, plus SkuAbbr from arrc_Merchant where the merchant IDs match in those two tables, plus VoucherNbr from arrc_Voucher where VoucherIDs match in those two tables.)
Edited to add table samples
Table arrc_RedeemActivity
RedeemID | VoucherID | MerchantID | RedeemAmt
----------------------------------------------
1 | 2 | 3 | 25
2 | 6 | 5 | 50
Table arrc_Merchant
MerchantID | SkuAbbr
---------------------
3 | abc
5 | def
Table arrc_Voucher
VoucherID | VoucherNbr
-----------------------
2 | 12345
6 | 23456
So ideally, what I'd like to get back would be:
RedeemID | VoucherID | MerchantID | RedeemAmt | SkuAbbr | VoucherNbr
-----------------------------------------------------------------------
1 | 2 | 3 | 25 | abc | 12345
2 | 2 | 5 | 50 | def | 23456
The problem was you had duplicate table references - which would work, except for that this included table aliasing.
If you want to only see rows where there are supporting records in both tables, use:
SELECT r.*,
m.SkuAbbr,
v.VoucherNbr
FROM arrc_RedeemActivity r
JOIN arrc_Merchant m ON m.merchantid = r.merchantid
JOIN arrc_Voucher v ON v.voucherid = r.voucherid
This will show NULL for the m and v references that don't have a match based on the JOIN criteria:
SELECT r.*,
m.SkuAbbr,
v.VoucherNbr
FROM arrc_RedeemActivity r
LEFT JOIN arrc_Merchant m ON m.merchantid = r.merchantid
LEFT JOIN arrc_Voucher v ON v.voucherid = r.voucherid