Duplicate rows because 1 column has multiple distinct values

Duplicate rows because 1 column has multiple distinct values - sql

I'm running a SELECT query to get data across multiple tables in the same server instance. However I've just noticed that the rows pulled on some data get duplicated because the main table I'm pulling from has a few different values in one of the columns. Here's the query:
SELECT DISTINCT BIF030.C_ACCOUNT AS ACCOUNTNUMBER,
BIF003.C_ACCOUNTTYPE AS ACCOUNTTYPECODE,
CON013.C_DESCRIPTION AS ACCOUNTTYPE,
BIF003.C_DIVISION AS ZONE_DIVISONCODE,
CON028.C_DESCRIPTION AS ZONE_DIVISION,
BIF030.C_METER as METERNUMBER,
BIF005.C_METERCUSTOM1 AS REGISTERNUMBER,
CONVERT(DECIMAL(20,2), BIF030.N_CONSUMP) AS CONSUMPTION,
CON007.C_DESCRIPTION AS UNITS,
BIF030.T_READDATE AS READINGDATE,
MONTH(BIF030.T_READDATE) AS READINGMONTH,
DAY(BIF030.T_READDATE) AS READINGDAY,
YEAR(BIF030.T_READDATE) AS READINGYEAR,
BIF030.I_DAYS AS READINGDAYSCOUNT
FROM ADVANCED.BIF030
LEFT JOIN ADVANCED.CON007 ON CON007.C_UNITS=BIF030.C_UNITS
LEFT JOIN ADVANCED.BIF005 ON BIF005.C_METER=BIF030.C_METER
LEFT JOIN ADVANCED.BIF003 ON BIF003.C_ACCOUNT=BIF030.C_ACCOUNT
LEFT JOIN ADVANCED.CON013 ON CON013.C_ACCOUNTTYPE=BIF003.C_ACCOUNTTYPE
LEFT JOIN ADVANCED.CON028 ON CON028.C_DIVISION=BIF003.C_DIVISION
WHERE T_READDATE > '01-01-2014'
ORDER BY ACCOUNTNUMBER, READINGDATE ASC
I know SELECT DISTINCT is frowned upon, but I get even more rows without it. Here's a sample of what the data looks like when pulled:
ACCOUNTNUMBER
ACCOUNTTYPECODE
ACCOUNTTYPE
ZONE_DIVISIONCODE
ZONE_DIVISION
METERNUMBER
REGISTERNUMBER
CONSUMPTION
UNITS
READINGDATE
READINGMONTH
READINGDAY
READINGYEAR
READINGDAYSCOUNT
1234567
SP
ACCOUNT TYPE 1
00
00-NO ZONE
123456789
987654321
3.00
Thousands of Gallons
2014-01-16 00:00:00.00
1
16
2014
30
1234567
MF
ACCOUNT TYPE 2
02
02-GRAVITY
123456789
987654321
3.00
Thousands of Gallons
2014-01-16 00:00:00.00
1
16
2014
30
1234567
SR
ACCOUNT TYPE 3
02
02-GRAVITY
123456789
987654321
3.00
Thousands of Gallons
2014-01-16 00:00:00.00
1
16
2014
30
I also know the column that is messing this up is the "AccountTypeCode" because other accounts that don't have multiple codes associated with the "AccountNumber" only show 1 set of rows. So this one specifically (and probably others) is tripling the amount of rows pulled when it should only pull one for each "ReadingDate".
Also if anyone knows a good way to optimize the query I'd be happy to learn. I know just enough SQL to be dangerous, but not enough to figure this out. Thanks.

Ok. So good news and I want to add this in case it helps anyone else in the future. I found out that since the ACCOUNTTYPECODE and ZONE_DIVISIONCODE were coming from the table BIF003 I needed to add more in the WHERE statement. This is what fixed it for me:
AND BIF030.C_CUSTOMER = BIF003.C_CUSTOMER
Because the C_CUSTOMER column was different (it's a column in the BIF003 and BIF030 tables) which lead to the separate ACCOUNTTYPECODE results I need to check it in the WHERE statement.
Thanks everyone for kick starting my brain on this one.

Related

SELECT from 50 columns

I have a table that has many columns around 50 columns that have datetime data that represent steps user takes when he/she do a procedure
SELECT UserID, Intro_Req_DateTime, Intro_Onset_DateTime, Intro_Comp_DateTime, Info_Req_DateTime, Info_Onset_DateTime, Info_Comp_DateTime,
Start_Req_DateTime, Start_Onset_DateTime, Start_Comp_DateTime,
Check_Req_DateTime, Check_Onset_DateTime, Check_Comp_DateTime,
Validate_Req_DateTime, Validate_Onset_DateTime, Validate_Comp_DateTime,
....
FROM MyTable
I want to find the Step the user did after certain datetime
example I want to find user ABC what the first step he did after 2 May 2019 17:25:36
I cannot use case to check this will take ages to code
is there an easier way to do that?
P.S. Thanks for everyone suggested redesigning the database.. not all databases can be redesigned, this database is for one of the big systems we have and it is been used for more than 20 years. redesigning is out of the equation.

You can use CROSS APPLY to unpivot the values. The syntax for UNPIVOT is rather cumbersome.
The actual query text should be rather manageable. No need for complicated CASE statements. Yes, you will have to explicitly list all 50 column names in the query text, you can't avoid that, but it will be only once.
SELECT TOP(1)
A.StepName
,A.dt
FROM
MyTable
CROSS APPLY
(
VALUES
('Intro_Req', Intro_Req_DateTime)
,('Intro_Onset', Intro_Onset_DateTime)
,('Intro_Comp', Intro_Comp_DateTime)
.........
) AS A (StepName, dt)
WHERE
MyTable.UserID = 'ABC'
AND A.dt > '2019-05-02T17:25:36'
ORDER BY dt DESC;
See also How to unpivot columns using CROSS APPLY in SQL Server 2012

The best way is to design your table with your action type and datetime that action was done. Then you can use a simple where clause to find what you want. The table should be like the table below:
ID ActionType ActionDatetime
----------- ----------- -------------------
1492 1 2019-05-13 10:10:10
1494 2 2019-05-13 11:10:10
1496 3 2019-05-13 12:10:10
1498 4 2019-05-13 13:10:10
1500 5 2019-05-13 14:10:10
But in your current solution, you should use UNPIVOT to get what you want. You can find more information in this LINK.

SQL with Bob and John owing each other money

I have got the following 3 fields in a file: person_ows person_is_owed amount
Example content:
Bob John 100
John Bob 110
What does a SQL look like that produces:
Bob John 100 110
John Bob 110 100
Sorry if this is a trivial question, but I am just trying to learn SQL and I find it really like HELL!

So, what you need is to be able to JOIN two rows. In this case you'll probably want an OUTER JOIN assuming that there isn't always a match of each owing the other. Now you just need to come up with your JOIN criteria, which in this case is going to be based on the names (person_owes and person_is_owed):
SELECT
T1.person_owes,
T1.person_is_owed,
T1.amount AS owes_amount,
COALESCE(T2.amount, 0) AS is_owed_amount
FROM
My_Table T1
LEFT OUTER JOIN My_Table T2 ON T2.person_is_owed = T1.person_owes
The COALESCE is just to make sure that when there is no match that you get a value of 0 instead of NULL.
Also, this assumes that there is only going to be one of each combination of person_owes and person_is_owed. If you might have two rows showing that John owes Bill two different amounts of money then you would have to adjust the SQL above and it would be a bit more complex.
If you plan to use SQL much then you should invest the time in reading one (or preferably more) beginning books on the subject.

Assuming that the combination of (person_ows, person_is_owed) is unique
select person_ows,
person_is_owed,
amount,
(select t2.amount
from the_table t2
where (t2.person_ows, t2.person_is_owed) = (t1.person_is_owed, t1.person_ows))
from the_table t1

Is it possible to match the "next" unmatched record in a SQL query where there is no strictly unique common field between tables?

Using Access 2010 and its version of SQL, I am trying to find a way to relate two tables in a query where I do not have strict, unique values in each table, using concatenated fields that are mostly unique, then matching each unmatched next record (measured by a date field or the record id) in each table.
My business receives checks that we do not cash ourselves, but rather forward to a client for processing. I am trying to build a query that will match the checks that we forward to the client with a static report that we receive from the client indicating when checks were cashed. I have no control over what the client reports back to us.
When we receive a check, we record the name of the payor, the date that we received the check, the client's account number, the amount of the check, and some other details in a table called "Checks". We add a matching field which comes as close as we can get to a unique identifier to match against the client reports (more on that in a minute).
Checks:
ID Name Acct Amt Our_Date Match
__ ____ ____ ____ _____ ______
1 Dave 1001 10.51 2/14/14 1001*10.51
2 Joe 1002 12.14 2/28/14 1002*12.14
3 Sam 1003 50.00 3/01/14 1003*50.00
4 Sam 1003 50.00 4/01/14 1003*50.00
5 Sam 1003 50.00 5/01/14 1003*50.00
The client does not report back to us the date that WE received the check, the check number, or anything else useful for making unique matches. They report the name, account number, amount, and the date of deposit. The client's report comes weekly. We take that weekly report and append the records to make a second table out of it.
Return:
ID Name Acct Amt Their_Date Unique1
__ ____ ____ ____ _____ ______
355 Dave 1001 10.51 3/25/14 1001*10.51
378 Joe 1002 12.14 4/04/14 1002*12.14
433 Sam 1003 50.00 3/08/14 1003*50.00
599 Sam 1003 50.00 5/11/14 1003*50.00
Instead of giving us back the date we received the check, we get back the date that they processed it. There is no way to make a rule to compare the two dates, because the deposit dates vary wildly. So the closest thing I can get for a unique identifier is a concatenated field of the account number and the amount.
I am trying to match the records on these two tables so that I know when the checks we forward get deposited. If I do a simple join using the two concatenated fields, it works most of the time, but we run into a problem with payors like Sam, above, who is making regular monthly payments of the same amount. In a simple join, if one of Sam's payments appears in the Return table, it matches to all of the records in the Checks table.
To limit that behavior and match the first Sam entry on the Return table to the first Sam entry on the Checks table, I wrote the following query:
SELECT return.*, checks.*
FROM return, checks
WHERE (( ( checks.id ) = (SELECT TOP 1 id
FROM checks
WHERE match = return.unique1
ORDER BY [our_date]) ));
This works when there is only one of Sam's records in the Return table. The problem comes when the second entry for Sam hits the Return table (Return.ID 599) as the client's weekly reports are added to the table. When that happens, the query appropriately (for my purposes) only lists that two of Sam's checks have been processed, but uses the "Top 1 ID" record to supply the row's details from the Return table:
Checks_Return_query:
Checks.ID Name Acct Amt Our_Date Their_Date Return.ID
__ ____ ____ ____ _____ ______ ________
1 Dave 1001 10.51 2/14/14 3/25/14 355
2 Joe 1002 12.14 2/28/14 4/04/14 378
3 Sam 1003 50.00 3/01/14 3/08/14 433
4 Sam 1003 50.00 4/01/14 3/08/14 433
In other words, the query repeats the Return table info for record Return.ID 433 instead of matching Return.ID 599, which is I guess what I should expect from the TOP 1 operator.
So I am trying to figure out how I can get the query to take the two concatenated fields in Checks and Return, compare them to find matching sets, then select the next unmatched record in Checks (with "next" being measured either by the ID or Our_Date) with the next unmatched record in Return (again, with "next" being measured either by the ID or Their_Date).
I spent many hours in a dark room turning the query into various joins, and back again, looking at functions like WHERE NOT IN, WHERE NOT EXISTS, FIRST() NEXT() MIN() MAX(). I am afraid I am way over my head.
I am beginning to think that I may have a structural problem, and may need to write the "matched" records in this query to another table of completed transactions, so that I can differentiate between "matched" and "unmatched" records better. But that still wouldn't help me if two of Sam's transactions are on the same weekly report I get from my client.
Are there any suggestions as to query functions I should look into for further research, or confirmation that I am barking up the wrong tree?
Thanks in advance.

I'd say that you really need another table of completed transactions, it could be temporary table.
Regarding your fears "... if two of Sam's transactions are on the same weekly report ", you can use cursor in order to write records "one-by-one" instead of set based transaction.

Finding the next occurrence of a value in a table

Sorry in advance if this has already been covered.
I am working on a database which isnt particularly well structured but it is owned by a third party and cannot be changed.
I need some assistance with t-sql in find the next occurrence of a value within the table and return records based on the result. Let me first explain the data. I have simplified this to make it easier to understand.
Polref Effective Date Transaction Type Suffix Value
ABCD1 01/06/2010 New Bus 1 175.00
ABCD1 01/06/2011 Ren 2 200.00
ABCD1 19/08/2011 Adjust 3 50.00
ABCD1 23/04/2012 Adjust 4 50.00
ABCD1 01/06/2012 Ren 5 275.00
So if I ran my query for 2011, the code would need to return in this example rows with suffix 2,3 and 4. So what I have been trying to do is find the first suffix of a New Bus or Ren for the specified year and then finding the next suffix for a New Bus or Ren for the same polref and then using those two suffix values to limit my recordset. It aint working!!
I cant use MAX() as transactions for 2013 have already been added to the system to I would get more records than I actually need.
There result I should be expecting for this example data would be:
ABCD1 300.00
Any help would be greatly appreciated.
To answer another question, If I select 2011 as my year to run the report, there should only be one New Bus or Ren transaction for 2011 so if its a New Bus transaction, the next main transaction will be a Ren, if its a Ren then the next main transaction will be a Ren. Again in my example below, if I run for 2011, it should find the Ren from 01/06/2011 so I want to return that Ren and the two Adjust records.
Sorry, I've not used this forum before so apologies if I was a little vague.
The table I am using has many polrefs so I need this code to calculate totals for all polrefs that fall within the date range. Some polrefs may only have one row, a New Bus, some will have many rows depending on how many adjustments have been made throughout the year of the policy

Partial answer:
This query:
declare #t table (PolRef char(5) not null, EffectiveDate date not null,TransactionType varchar(10) not null,Suffix int not null,Value decimal(10,2) not null)
insert into #t (Polref,EffectiveDate,TransactionType,Suffix,Value) values
('ABCD1','20100601','New Bus',1,175.00),
('ABCD1','20110601','Ren',2,200.00),
('ABCD1','20110819','Adjust',3,50.00),
('ABCD1','20120423','Adjust',4,50.00),
('ABCD1','20120601','Ren',5,275.00)
;With StartTransactions as (
select PolRef,Suffix,ROW_NUMBER() OVER (PARTITION BY PolRef ORDER BY Suffix) rn
from #t where TransactionType in ('New Bus','Ren')
), Periods as (
select st1.PolRef,st1.Suffix as StartSuffix,st2.Suffix as EndSuffix
from
StartTransactions st1
left join
StartTransactions st2
on
st1.PolRef = st2.PolRef and
st1.rn = st2.rn - 1
)
select
p.PolRef,t2.EffectiveDate,SUM(t.Value) as Total
from
Periods p
inner join
#t t
on
p.PolRef = t.PolRef and
p.StartSuffix <= t.Suffix and
(p.EndSuffix > t.Suffix or
p.EndSuffix is null)
inner join
#t t2
on
p.PolRef = t2.PolRef and
t2.Suffix = p.StartSuffix
group by
p.PolRef,t2.EffectiveDate
Groups each set of transactions based on each successive Ren or New Bus transaction:
PolRef EffectiveDate Total
------ ------------- ---------------------------------------
ABCD1 2010-06-01 175.00
ABCD1 2011-06-01 300.00
ABCD1 2012-06-01 275.00
From that, it should be trivial to e.g. select out only the ones you're interested in from a particular year. But your question is still vague on some specifics, so I'm not taking it any further at this point.

SQL SUM with Repeating Sub Entries - Best Practice?

I hit this issue regularly but here is an example....
I have a Order and Delivery Tables. Each order can have one to many Deliveries.
I need to report totals based on the Order Table but also show deliveries line by line.
I can write the SQL and associated Access Report for this with ease ....
SELECT xxx
FROM
Order
LEFT OUTER JOIN
Delivery on Delivery.OrderNO = Order.OrderNo
until I get to the summing element. I obviously only want to sum each Order once, not the 1-many times there are deliveries for that order.
e.g. The SQL might return the following based on 2 Orders (ignore the banalness of the report, this is very much simplified)
Region OrderNo Value Delivery Date
North 1 £100 12-04-2012
North 1 £100 14-04-2012
North 2 £73 01-05-2012
North 2 £73 03-05-2012
North 2 £73 07-05-2012
South 3 £50 23-04-2012
I would want to report:
Total Sales North - £173
Delivery 12-04-2012
Delivery 14-04-2012
Delivery 01-05-2012
Delivery 03-05-2012
Delivery 07-05-2012
Total Sales South - £50
Delivery 23-04-2012
The bit I'm referring to is the calculation of the £173 and £50 which the first of which obviously shouldn't be £419!
In the past I've used things like MAX (for a given Order) but that seems like a fudge.
Surely there must be a regular answer to this seemingly common problem but I can't find one.
I don't necessarily need the code - just a helpful point in the right direction.
Many thanks,
Chris.

A roll up operator may not look pretty. However, it would do the regular aggregates that you see now, and it show the subtotals of the order. This is what you're looking for.
SELECT xxx
FROM
Order
LEFT OUTER JOIN
Delivery on Delivery.OrderNO = Order.OrderNo
GROUP BY xxx
WITH ROLLUP;
I'm not exactly sure how the rest of your query is set up, but it would look something like this:
Region OrderNo Value Delivery Date
North 1 £100 12-04-2012
North 1 £100 14-04-2012
North 2 £73 01-05-2012
North 2 £73 03-05-2012
North 2 £73 07-05-2012
NULL NULL f419 NULL

I believe what you want is called a windowing function for your aggregate operation. It looks like the following:
SELECT xxx, SUM(Value) OVER (PARTITION BY Order.Region) as OrderTotal
FROM
Order
LEFT OUTER JOIN
Delivery on Delivery.OrderNO = Order.OrderNo
Here's the MSDN article. The PARTITION BY tells the SUM to be done separately for each distinct Order.Region.
Edit: I just noticed that I missed what you said about orders being counted multiple times. One thing you could do is SUM() the values before joining, as a CTE (guessing at your schema a bit):
WITH RegionOrders AS (
SELECT Region, OrderNo, SUM(Value) OVER (PARTITION BY Region) AS RegionTotal
FROM Order
)
SELECT Region, OrderNo, Value, DeliveryDate, RegionTotal
FROM RegionOrders RO
INNER JOIN Delivery D on D.OrderNo = RO.OrderNo

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas