Delete or ignore rows where several fields are duplicated - sql

My data comes from electricity suppliers. Some charges are fixed and some are usage charges. I have a problem with the fixed charges. When the electricity meter changes during the billing period, two fixed-charge lines are created with different meter IDs and contract numbers. All other fields are the same, and I want to keep only one of these lines, because it is a monthly fixed charge.
I would be grateful for any help.
Thank you very much,
Following https://www.designcise.com/web/tutorial/how-to-remove-all-duplicate-rows-except-one-in-sql#:~:text=How%20to%20Remove%20All%20Duplicate%20Rows%20Except%20One,Duplicates%20and%20Keep%20Row%20With%20Highest%20ID%20 I created a view without these two fields to get unique rows, then created a second view that adds those two fields back with values smaller than the real ones, so that they can be compared.
My values for the second view: ('A10000000' AS MeterUniqueNo, '100000' AS MeterContractID)
Original examples: ('K18D01652', 646802)
DELETE main_table
FROM main_table
INNER JOIN view2 ON view2.MeterUniqueNo < main_table.MeterUniqueNo
AND view2.EnergyChargesRecord_InvoiceNumber = main_table.EnergyChargesRecord_InvoiceNumber
AND view2.EnergyChargesRecord_MPANNumber = main_table.EnergyChargesRecord_MPANNumber
It is not working, because the values are different.
I also looked at "T-SQL: Deleting all duplicate rows but keeping one", but I cannot use that method, because I have to check both the MPAN number and the invoice number, not just one value.

Since you didn't provide any sample data, I'm guessing at the actual format of your data. I created a minimal example of how to use the ROW_NUMBER function to order duplicates and select only the most recent one. Again, I'm guessing at the sample data, but the data common to the duplicate rows is the MPAN_number column. This is only an example; provide sample data for better, more application-specific answers.
--Create test table.
CREATE TABLE charges (
meter_id int
, contract_number int
, charge_amt decimal(19,2)
, invoice_number int
, MPAN_number nvarchar(100)
, charge_date date
);
--Insert test data.
INSERT INTO charges (
meter_id, contract_number, charge_amt, invoice_number, MPAN_number
, charge_date)
VALUES
(123, 998, 25.54, 3216549, '123AM234ASF', '1/2/2022')
, (456, 12399, 25.54, 3216668, '123AM234ASF', '1/15/2022')
, (987, 887, 25.54, 3589765, 'K18D01652', '1/5/2022')
, (654, 123488, 25.54, 3548892, 'K18D01652', '1/28/2022')
;
--For debugging, show all test data.
SELECT * FROM charges;
--Use a CTE to add a row_num column.
--This row_num column will sequence "duplicate" charge lines by charge_date with the most recent charge as row_num = 1.
--The common data in the example data is the MPAN_number.
--This is only an example, for more specific help, you need to create
--a minimal reproducible example just like this.
WITH prelim as (
SELECT *, ROW_NUMBER() OVER(PARTITION BY MPAN_number ORDER BY charge_date DESC) as row_num
FROM charges
)
SELECT *
FROM prelim
WHERE row_num = 1
;
--Here's an example of how to delete all "duplicate" charges that are not the most recent charge.
DELETE c
FROM charges as c
INNER JOIN (
SELECT *, ROW_NUMBER() OVER(PARTITION BY MPAN_number ORDER BY charge_date DESC) as row_num
FROM charges
) as oldDups
ON oldDups.MPAN_number = c.MPAN_number
AND oldDups.meter_id = c.meter_id
AND oldDups.contract_number = c.contract_number
AND oldDups.row_num <> 1
;
--For debugging, show the test data after deletions.
SELECT * FROM charges;
Showing all test data:
meter_id contract_number charge_amt invoice_number MPAN_number charge_date
-------- --------------- ---------- -------------- ----------- -----------
123      998             25.54      3216549        123AM234ASF 2022-01-02
456      12399           25.54      3216668        123AM234ASF 2022-01-15
987      887             25.54      3589765        K18D01652   2022-01-05
654      123488          25.54      3548892        K18D01652   2022-01-28
Showing the most recent charges using SELECT:
meter_id contract_number charge_amt invoice_number MPAN_number charge_date row_num
-------- --------------- ---------- -------------- ----------- ----------- -------
456      12399           25.54      3216668        123AM234ASF 2022-01-15  1
654      123488          25.54      3548892        K18D01652   2022-01-28  1
Showing the test data remaining after a DELETE operation:
meter_id contract_number charge_amt invoice_number MPAN_number charge_date
-------- --------------- ---------- -------------- ----------- -----------
456      12399           25.54      3216668        123AM234ASF 2022-01-15
654      123488          25.54      3548892        K18D01652   2022-01-28
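The question asks to treat lines as duplicates when everything except the meter ID and contract number matches (in particular the invoice number and MPAN number). A minimal sketch of that variant, run in SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or newer): the table fixed_charges, its columns, and the data are hypothetical, and the partition columns are an assumption you would replace with every field that must match.

```python
import sqlite3

# Hypothetical schema and data; the question's real table and column names are not known.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fixed_charges (
    meter_id        TEXT,
    contract_number TEXT,
    invoice_number  INTEGER,
    mpan_number     TEXT,
    charge_amt      REAL
);
INSERT INTO fixed_charges VALUES
    ('K18D01652', '646802', 3589765, 'MPAN-1', 25.54),
    ('A10000000', '100000', 3589765, 'MPAN-1', 25.54),  -- same charge, new meter
    ('K18D01652', '646802', 3589766, 'MPAN-1', 25.54);  -- a different invoice
""")

# Number the rows inside each group of lines that agree on every field except
# meter_id and contract_number, then delete all but the first row of each group.
conn.execute("""
DELETE FROM fixed_charges
WHERE rowid IN (
    SELECT rid
    FROM (
        SELECT rowid AS rid,
               ROW_NUMBER() OVER (
                   PARTITION BY invoice_number, mpan_number, charge_amt
                   ORDER BY rowid
               ) AS rn
        FROM fixed_charges
    )
    WHERE rn > 1
)
""")

remaining = conn.execute(
    "SELECT meter_id, invoice_number FROM fixed_charges ORDER BY invoice_number"
).fetchall()
print(remaining)
```

To ignore the duplicates instead of deleting them, keep only the inner ROW_NUMBER query and select the rows with rn = 1.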


Accounting: calculate debit and credit in SQL (SSMS)

I have an accounting calculation problem that I want to solve with a SQL query (in SSMS).
I have two groups of documents related to one person (creditor and debtor).
Creditor documents cover debtor documents.
Consider the following example. How can the result be achieved?
USE [master]
GO
DROP TABLE IF EXISTS #credit/*creditor=0*/,#debit/*Debtor=1*/
SELECT *
INTO #debit
FROM (values
(88,'2/14',1,5,1),(88,'2/15',2,5,1)
)A (personID,DocDate,DocID,Fee,IsDebit)
SELECT *
INTO #credit
FROM (values
(88,'2/16',3,3,0),(88,'2/17',4,7,0)
)A (personID,DocDate,DocID,Fee,ISDeb)
SELECT * FROM #credit
SELECT * FROM #debit
--result:
;WITH res AS
(
SELECT 88 AS personID ,1 deb_DocID ,5 deb_Fee , 3 Cre_DocID ,3 Cre_Fee, 0 remain_Cre_Fee
UNION
SELECT 88 AS personID ,1 deb_DocID ,5 deb_Fee , 4 Cre_DocID ,7 Cre_Fee, 5 remain_Cre_Fee
UNION
SELECT 88 AS personID ,2 deb_DocID ,5 deb_Fee , 4 Cre_DocID ,7 Cre_Fee, 0 remain_Cre_Fee
)
SELECT *
FROM res
Sample data
I used an ISO date format to avoid any confusion.
The docdate and isdebit columns are not used in the solution:
I ignored docdate, assuming that the values are incremental and that a credit fee may be deposited before any debit fee.
The isdebit flag seems redundant if you are going to store debit and credit transactions in separate tables anyway.
Updated sample data:
create table debit
(
personid int,
docdate date,
docid int,
fee int,
isdebit bit
);
insert into debit (personid, docdate, docid, fee, isdebit) values
(88, '2021-02-14', 1, 5, 1),
(88, '2021-02-15', 2, 5, 1);
create table credit
(
personid int,
docdate date,
docid int,
fee int,
isdebit bit
);
insert into credit (personid, docdate, docid, fee, isdebit) values
(88, '2021-02-16', 3, 3, 0),
(88, '2021-02-17', 4, 7, 0);
Solution
There are a couple of steps here:
Construct a rolling sum for the debit fees. Done with a first common table expression (cte_debit).
Construct a rolling sum for the credit fees. Done with a second common table expression (cte_credit).
Take all debit info (select * from cte_debit)
Find the first credit info that applies to the current debit info. Done with a first cross apply (cc1). This contains the docid of the first document that applies to the debit document.
Find the last credit info that applies to the current debit info. Done with a second cross apply (cc2). This contains the docid of the last document that applies to the debit document.
Find all credit info that applies to the current debit info by selecting all documents between the first and last applicable document (join cte_credit cc on cc.docid >= cc1.docid and cc.docid <= cc2.docid).
Combine the rolling sum numbers to calculate the remaining credit fees (cc.credit_sum - cd.debit_sum). Use a case expression to replace negative values with 0.
Full solution:
with cte_debit as
(
select d.personid,
d.docid,
d.fee,
sum(d.fee) over(partition by d.personid order by d.docid rows between unbounded preceding and current row) as debit_sum
from debit d
),
cte_credit as
(
select c.personid,
c.docid,
c.fee,
sum(c.fee) over(partition by c.personid order by c.docid rows between unbounded preceding and current row) as credit_sum
from credit c
)
select cd.personid,
cd.docid as deb_docid,
cd.fee as deb_fee,
cc.docid as cre_docid,
cc.fee as cre_fee,
case
when cc.credit_sum - cd.debit_sum >= 0
then cc.credit_sum - cd.debit_sum
else 0
end as cre_fee_remaining
from cte_debit cd
cross apply ( select top 1 cc1.docid, cc1.credit_sum
from cte_credit cc1
where cc1.personid = cd.personid
and cc1.credit_sum > cd.debit_sum - cd.fee -- first credit not used up by earlier debits
order by cc1.credit_sum asc ) cc1
cross apply ( select top 1 cc2.docid, cc2.credit_sum
from cte_credit cc2
where cc2.personid = cd.personid
and cc2.credit_sum >= cd.debit_sum -- first credit that fully covers this debit
order by cc2.credit_sum asc ) cc2
join cte_credit cc
on cc.personid = cd.personid
and cc.docid >= cc1.docid
and cc.docid <= cc2.docid
order by cd.personid,
cd.docid,
cc.docid;
Result
personid deb_docid deb_fee cre_docid cre_fee cre_fee_remaining
-------- --------- ------- --------- ------- -----------------
88 1 5 3 3 0
88 1 5 4 7 5
88 2 5 4 7 0
Fiddle to see things in action. This also contains the intermediate CTE results and some commented helper columns that can be uncommented to help to further understand the solution.
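The same rolling sums can also be paired without CROSS APPLY: each debit occupies the running-total range (debit_sum - fee, debit_sum] and each credit occupies (credit_sum - fee, credit_sum], so a credit applies to a debit exactly when the two ranges overlap. This interval-overlap join is a swapped-in technique, not the answer's own query; the sketch below runs it in SQLite through Python's sqlite3 module (SQLite has no CROSS APPLY) on the updated sample data, with the isdebit column dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE debit  (personid INTEGER, docdate TEXT, docid INTEGER, fee INTEGER);
CREATE TABLE credit (personid INTEGER, docdate TEXT, docid INTEGER, fee INTEGER);
INSERT INTO debit  VALUES (88, '2021-02-14', 1, 5), (88, '2021-02-15', 2, 5);
INSERT INTO credit VALUES (88, '2021-02-16', 3, 3), (88, '2021-02-17', 4, 7);
""")

rows = conn.execute("""
WITH cte_debit AS (
    SELECT personid, docid, fee,
           SUM(fee) OVER (PARTITION BY personid ORDER BY docid) AS debit_sum
    FROM debit
),
cte_credit AS (
    SELECT personid, docid, fee,
           SUM(fee) OVER (PARTITION BY personid ORDER BY docid) AS credit_sum
    FROM credit
)
SELECT cd.personid, cd.docid AS deb_docid, cd.fee AS deb_fee,
       cc.docid AS cre_docid, cc.fee AS cre_fee,
       CASE WHEN cc.credit_sum - cd.debit_sum >= 0
            THEN cc.credit_sum - cd.debit_sum ELSE 0 END AS cre_fee_remaining
FROM cte_debit cd
JOIN cte_credit cc
  ON cc.personid = cd.personid
 -- keep the pair when the debit range and the credit range overlap
 AND cc.credit_sum > cd.debit_sum - cd.fee
 AND cc.credit_sum - cc.fee < cd.debit_sum
ORDER BY cd.personid, cd.docid, cc.docid
""").fetchall()

for r in rows:
    print(r)
```

For the sample data this yields the same three rows as the expected result in the question.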

Is there SQL logic to reduce a type 2 table along a dimension?

I have a slowly changing type 2 price change table which I need to reduce the size of to improve performance. Often rows are written to the table even if no price change occurred (when some other dimensional field changed) and the result is that for any product the table could be 3-10x the size it needs to be if it were including only changes in price.
I'd like to compress the table so that it contains only the first effective date and the last expiration date for each price, until that price changes, in a way that can also:
Deal with an unknown number of rows with the same price
Deal with products going back to an old price
For example, if I have this raw data:
Product | Price Effective Date | Price Expiration Date | Price
123456  | 6/22/18              | 9/19/18               | 120
123456  | 9/20/18              | 11/8/18               | 120
123456  | 11/9/18              | 11/29/18              | 120
123456  | 11/30/18             | 12/6/18               | 120
123456  | 12/7/18              | 12/19/18              | 85
123456  | 12/20/18             | 1/1/19                | 85
123456  | 1/2/19               | 2/19/19               | 85
123456  | 2/20/19              | 2/20/19               | 120
123456  | 2/21/19              | 3/19/19               | 85
123456  | 3/20/19              | 5/22/19               | 85
123456  | 5/23/19              | 10/10/19              | 85
123456  | 10/11/19             | 6/19/19               | 80
123456  | 6/20/20              | 12/31/99              | 80
I need to transform it into this:
Product | Price Effective Date | Price Expiration Date | Price
123456  | 6/22/18              | 12/6/18               | 120
123456  | 12/7/18              | 2/19/19               | 85
123456  | 2/20/19              | 2/20/19               | 120
123456  | 2/21/19              | 10/10/19              | 85
123456  | 10/11/19             | 12/31/99              | 80
You can first find the intervals where the price does not change, and then group on those intervals:
with to_r as (select row_number() over (partition by product order by effective) r, t.* from data_table t),
to_group as (select t.*, (select count(*) from to_r t1 join to_r t0 on t0.product = t1.product and t0.r = t1.r - 1 where t1.product = t.product and t1.r <= t.r and t1.price != t0.price) c from to_r t)
select t.product, min(t.effective), max(t.expiration), max(t.price) from to_group t group by t.product, t.c order by t.product, min(t.r);
Output:
Product | Price Effective Date | Price Expiration Date | Price
123456  | 6/22/18              | 12/6/18               | 120
123456  | 12/7/18              | 2/19/19               | 85
123456  | 2/20/19              | 2/20/19               | 120
123456  | 2/21/19              | 10/10/19              | 85
123456  | 10/11/19             | 12/31/99              | 80
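Another common way to find these intervals is the row-number-difference trick for gaps-and-islands problems: within one product, consecutive rows with the same price keep a constant difference between their overall row number and their per-price row number. The sketch below runs it in SQLite through Python's sqlite3 module (window functions need SQLite 3.25+). The table name prices, the ISO dates, and reading 12/31/99 as 9999-12-31 are assumptions; the 6/19/19 expiration is kept as given, since this method relies only on row order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (product INTEGER, effective TEXT, expiration TEXT, price INTEGER)")
conn.executemany("INSERT INTO prices VALUES (123456, ?, ?, ?)", [
    ("2018-06-22", "2018-09-19", 120), ("2018-09-20", "2018-11-08", 120),
    ("2018-11-09", "2018-11-29", 120), ("2018-11-30", "2018-12-06", 120),
    ("2018-12-07", "2018-12-19", 85),  ("2018-12-20", "2019-01-01", 85),
    ("2019-01-02", "2019-02-19", 85),  ("2019-02-20", "2019-02-20", 120),
    ("2019-02-21", "2019-03-19", 85),  ("2019-03-20", "2019-05-22", 85),
    ("2019-05-23", "2019-10-10", 85),  ("2019-10-11", "2019-06-19", 80),
    ("2020-06-20", "9999-12-31", 80),
])

islands = conn.execute("""
SELECT price, MIN(effective) AS start_date, MAX(expiration) AS end_date
FROM (
    SELECT product, price, effective, expiration,
           -- constant within a run of equal prices, different between runs
           ROW_NUMBER() OVER (PARTITION BY product ORDER BY effective)
         - ROW_NUMBER() OVER (PARTITION BY product, price ORDER BY effective) AS grp
    FROM prices
)
GROUP BY product, price, grp
ORDER BY product, start_date
""").fetchall()

for row in islands:
    print(row)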
This is a type of gaps-and-islands problem. I would recommend reconstructing the data, saving it in a temporary table, and then reloading the existing table.
The code to reconstruct the data is:
select product, price, min(effective_date) as effective_date, max(expiration_date) as expiration_date
from (select t.*,
sum(case when prev_expiration_date = effective_date - interval '1 day' then 0 else 1 end) over (partition by product order by effective_date) as grp
from (select t.*,
lag(expiration_date) over (partition by product, price order by effective_date) as prev_expiration_date
from t
) t
) t
group by product, price, grp;
Note that the logic for date arithmetic varies depending on the database.
Save this result into a temporary table (temp_t or whatever you like), using select into, create table as, or whatever your database supports.
Then empty the current table and reload it:
truncate table t;
insert into t (product, price, effective_date, expiration_date)
select product, price, effective_date, expiration_date
from temp_t;
Notes:
Validate the data before using truncate table!
If there are triggers or columns with default values, you might want to be careful.
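To illustrate the date arithmetic and the save-and-reload steps on one concrete database, here is a sketch of the same LAG-based approach in SQLite through Python's sqlite3 module. SQLite writes "one day earlier" as date(x, '-1 day') and has no TRUNCATE TABLE, so DELETE FROM is used instead. The table t, the ISO dates, and the restriction to the first eleven rows of the question's data are assumptions; the last two rows are left out because their 6/19/19 expiration is not contiguous with the following row, so this method would split that run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (product INTEGER, effective_date TEXT, expiration_date TEXT, price INTEGER)")
conn.executemany("INSERT INTO t VALUES (123456, ?, ?, ?)", [
    ("2018-06-22", "2018-09-19", 120), ("2018-09-20", "2018-11-08", 120),
    ("2018-11-09", "2018-11-29", 120), ("2018-11-30", "2018-12-06", 120),
    ("2018-12-07", "2018-12-19", 85),  ("2018-12-20", "2019-01-01", 85),
    ("2019-01-02", "2019-02-19", 85),  ("2019-02-20", "2019-02-20", 120),
    ("2019-02-21", "2019-03-19", 85),  ("2019-03-20", "2019-05-22", 85),
    ("2019-05-23", "2019-10-10", 85),
])

conn.executescript("""
-- 1. Rebuild the compressed rows into a temporary table.
CREATE TEMP TABLE temp_t AS
SELECT product, price,
       MIN(effective_date) AS effective_date,
       MAX(expiration_date) AS expiration_date
FROM (
    SELECT l.*,
           -- a row starts a new group unless it begins the day after the
           -- previous row with the same price ends
           SUM(CASE WHEN prev_expiration_date = date(effective_date, '-1 day')
                    THEN 0 ELSE 1 END)
               OVER (PARTITION BY product ORDER BY effective_date) AS grp
    FROM (
        SELECT t.*,
               LAG(expiration_date) OVER (PARTITION BY product, price
                                          ORDER BY effective_date) AS prev_expiration_date
        FROM t
    ) AS l
) AS g
GROUP BY product, price, grp;

-- 2. Empty the original table (SQLite has no TRUNCATE TABLE) and reload it.
DELETE FROM t;
INSERT INTO t (product, price, effective_date, expiration_date)
SELECT product, price, effective_date, expiration_date FROM temp_t;
""")

compressed = conn.execute(
    "SELECT effective_date, expiration_date, price FROM t ORDER BY effective_date"
).fetchall()
for row in compressed:
    print(row)
```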
It sounds like you are asking for a temporal schema, where for a given date you can know the price of an asset?
This is done with two tables: price_current and price_history.
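price_current:
price_id item_id price rec_created
-------- ------- ----- ------------
1        1       100   '2015-04-18'

price_history:
price_id item_id from         to           price
-------- ------- ------------ ------------ -----
1        1       '2001-01-01' '2004-05-01' 114
1        1       '2004-05-01' '2015-04-18' 102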
i.e. for any item, you can ascertain the date it was set without polluting your "current" table. For this to work effectively you will need an UPDATE trigger on price_current: when you update a record, you insert the previous details into the history table, together with the period it was valid for.
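CREATE OR ALTER TRIGGER trg_price_current_update
ON price_current
AFTER UPDATE
AS
BEGIN
SET NOCOUNT ON;
-- "deleted" holds each updated row as it was before the update
INSERT INTO price_history (price_id, item_id, [from], [to], price)
SELECT d.price_id, d.item_id, d.rec_created, GETDATE(), d.price
FROM deleted AS d;
END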
Now you have a distinction between current and historical data, without your current table (presumably the busier table) getting out of hand because of maintaining historical state. I hope I understood the question.
To ignore 'dummy' updates, just alter the trigger to skip updates that don't change the price (if that's not handled by the DBMS anyway). To be honest, this could easily be done on the application side, but to manage it via the trigger:
CREATE OR ALTER TRIGGER trg_price_current_update
ON price_current
AFTER UPDATE
AS
BEGIN
SET NOCOUNT ON;
INSERT INTO price_history (price_id, item_id, [from], [to], price)
SELECT d.price_id, d.item_id, d.rec_created, GETDATE(), d.price
FROM deleted AS d
INNER JOIN inserted AS i ON i.price_id = d.price_id
WHERE i.price <> d.price;
END
i.e. deleted contains each row as it was before the update and inserted contains it as it is after the update; we insert the previous row into the history table, provided its price is different from the new row's price.
(Edited to include the new trigger. I also changed the date held in rec_created: it must be the date the row was created, not the first time that product had a price assigned to it; that was a mistake. For the dates I was too lazy to write out the full DD-MM-YYYY hh:mm:ss.zzz values, but that precision would generally be useful between queries.)
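A runnable sketch of the "skip dummy updates" idea, using SQLite through Python's sqlite3 module instead of SQL Server. SQLite exposes the before and after values as OLD and NEW and lets a trigger carry a WHEN condition, so unchanged prices never reach the history table. The column names valid_from/valid_to (instead of the reserved words from/to) and resetting rec_created inside the trigger are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE price_current (
    price_id    INTEGER PRIMARY KEY,
    item_id     INTEGER,
    price       INTEGER,
    rec_created TEXT
);
CREATE TABLE price_history (
    price_id   INTEGER,
    item_id    INTEGER,
    valid_from TEXT,
    valid_to   TEXT,
    price      INTEGER
);
-- Fires only when the price column is updated AND its value actually changes.
CREATE TRIGGER trg_price_current_update
AFTER UPDATE OF price ON price_current
WHEN OLD.price <> NEW.price
BEGIN
    INSERT INTO price_history (price_id, item_id, valid_from, valid_to, price)
    VALUES (OLD.price_id, OLD.item_id, OLD.rec_created, date('now'), OLD.price);
    -- the new price starts its own validity period today
    UPDATE price_current SET rec_created = date('now') WHERE price_id = NEW.price_id;
END;
INSERT INTO price_current VALUES (1, 1, 102, '2004-05-01');
""")

conn.execute("UPDATE price_current SET price = 102 WHERE price_id = 1")  # dummy update
conn.execute("UPDATE price_current SET price = 100 WHERE price_id = 1")  # real change

history = conn.execute("SELECT price_id, valid_from, price FROM price_history").fetchall()
print(history)  # [(1, '2004-05-01', 102)]
```

The inner UPDATE of rec_created does not re-fire the trigger, because the trigger is declared for updates of the price column only.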
What you are asking for is a versioning system. Many RDBMS platforms implement support for this out of the box (it's a SQL standard), which may be suitable, depending on your requirements.
You have not tagged a specific platform, so it's not possible to be specific to your situation. I use system versioning regularly in SQL Server, where you would implement it like this.
Assuming the schema History exists:
alter table dbo.MyTable add
ValidFrom datetime2 generated always as row start hidden constraint DF_MyTableSysStart default sysutcdatetime(),
ValidTo datetime2 generated always as row end hidden constraint DF_MyTableSysEnd default convert(datetime2, '9999-12-31 23:59:59.9999999'),
period for system_time (ValidFrom, ValidTo);
alter table dbo.MyTable set (system_versioning = on (history_table = History.MyTable));
-- the final "on History" places the index on a filegroup named History (assumed to exist)
create clustered index ix_MyTable on History.MyTable (ValidTo, ValidFrom) with (data_compression = page, drop_existing = on) on History;
A number of syntax extensions exist to aid querying the temporal data for example to find historical data at a point in time.
Alternatively, to use a single table but handle the duplication, you could create an INSTEAD OF trigger.
The idea here is that the trigger intercepts the data before it is inserted, so you can check whether the value differs from the last value and then discard or insert it as appropriate.
Something along the lines of:
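WITH keeps AS
(
SELECT p.product_id, p.effective, p.expires, p.price,
-- 1 if the next contiguous row for this product has a different price
CASE WHEN EXISTS (SELECT 1 FROM prices p1 WHERE p1.product_id = p.product_id
AND p1.effective = DATEADD(DAY, 1, p.expires) AND p1.price <> p.price) THEN 1 ELSE 0 END AS has_after,
-- 1 if the previous contiguous row for this product has a different price
CASE WHEN EXISTS (SELECT 1 FROM prices p1 WHERE p1.product_id = p.product_id
AND p1.expires = DATEADD(DAY, -1, p.effective) AND p1.price <> p.price) THEN 1 ELSE 0 END AS has_before
FROM prices p
)
SELECT product_id, effective, expires, price
FROM keeps
WHERE has_after = 1
OR has_before = 1
UNION
-- the first price entry has no predecessor, so add it explicitly
SELECT p.product_id, p.effective, p.expires, p.price
FROM prices p
WHERE p.effective = (SELECT MIN(p1.effective) FROM prices p1 WHERE p1.product_id = p.product_id);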
What it's doing:
Find all the entries that have an adjacent entry (one whose effective date is this entry's expiry date + 1 day, or whose expiry date is this entry's effective date - 1 day) with a different price. This gives us all the actual price changes. But we miss the first price entry, so we simply include that in the results.
e.g.:
product_id effective expires  price has_before has_after
---------- --------- -------- ----- ---------- ---------
123456     6/22/18   9/19/18  120   0          0
123456     9/20/18   11/8/18  120   0          0
123456     11/9/18   11/29/18 120   0          0
123456     11/30/18  12/6/18  120   0          1
123456     12/7/18   12/19/18 85    1          0
123456     12/20/18  1/1/19   85    0          0
123456     1/2/19    2/19/19  85    0          1
123456     2/20/19   2/20/19  120   1          1
123456     2/21/19   3/19/19  85    1          0
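If rows are compared by order rather than by exact date adjacency, the same change rows can be flagged with LAG and LEAD, which also covers the first row without a separate UNION. This is a swapped-in technique, sketched in SQLite through Python's sqlite3 module on the first nine rows of the question's data (the rows shown in the table above); the ISO dates are an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (product_id INTEGER, effective TEXT, expires TEXT, price INTEGER)")
conn.executemany("INSERT INTO prices VALUES (123456, ?, ?, ?)", [
    ("2018-06-22", "2018-09-19", 120), ("2018-09-20", "2018-11-08", 120),
    ("2018-11-09", "2018-11-29", 120), ("2018-11-30", "2018-12-06", 120),
    ("2018-12-07", "2018-12-19", 85),  ("2018-12-20", "2019-01-01", 85),
    ("2019-01-02", "2019-02-19", 85),  ("2019-02-20", "2019-02-20", 120),
    ("2019-02-21", "2019-03-19", 85),
])

changes = conn.execute("""
SELECT product_id, effective, expires, price
FROM (
    SELECT p.*,
           LAG(price)  OVER (PARTITION BY product_id ORDER BY effective) AS prev_price,
           LEAD(price) OVER (PARTITION BY product_id ORDER BY effective) AS next_price
    FROM prices p
)
-- the first row of a product, or a row next to a different price
WHERE prev_price IS NULL OR prev_price <> price OR next_price <> price
ORDER BY product_id, effective
""").fetchall()

for row in changes:
    print(row)
```

The rows returned are the first row plus every row with has_before = 1 or has_after = 1 in the table above.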

Query help on sales transaction table

I have a table that contains transaction level sales data. I am trying to satisfy a reporting request as efficiently as possible which I don't think I am succeeding at right now. Here is some test data:
DROP TABLE IF EXISTS TMP_SALES_DATA;
CREATE TABLE TMP_SALES_DATA ([DATE] DATE, [ITEM] INT, [STORE] CHAR(6), [TRANS] INT, [SALES] DECIMAL(8,2));
INSERT INTO TMP_SALES_DATA
VALUES
('9-29-2020',101,'Store1',123,1.00),
('9-29-2020',102,'Store1',123,2.00),
('9-29-2020',103,'Store1',123,3.00),
('9-29-2020',101,'Store1',124,1.00),
('9-29-2020',101,'Store1',125,1.00),
('9-29-2020',103,'Store1',125,3.00),
('9-29-2020',102,'Store1',126,2.00),
('9-29-2020',101,'Store2',88,1.00),
('9-29-2020',102,'Store2',88,2.00),
('9-29-2020',103,'Store2',88,3.00),
('9-29-2020',101,'Store2',89,1.00),
('9-29-2020',101,'Store2',90,1.00),
('9-29-2020',102,'Store2',91,2.00),
('9-29-2020',103,'Store2',91,3.00),
('9-29-2020',101,'Store3',77,1.00);
And I need to represent both individual item sales as well as total transaction sales for every transaction in which the specified items were present. Examples:
-- Item sales
SELECT [ITEM], SUM([SALES]) AS [SALES]
FROM TMP_SALES_DATA
WHERE [ITEM] IN (101,103) AND [STORE] IN ('Store1','Store2' ,'Store3') AND [DATE] = '9-29-2020'
GROUP BY [ITEM]
Returns this:
ITEM SALES
101 7.00
103 12.00
And I can get the total transaction sales in which a single item was present this way:
-- Total transaction sales in which ITEM 101 exists
SELECT SUM(S1.[SALES]) AS [TTL_TRANS_SALES]
FROM TMP_SALES_DATA S1
WHERE EXISTS (SELECT 1 FROM TMP_SALES_DATA S2 WHERE S2.[DATE]=S1.[DATE] AND S2.[STORE]=S1.[STORE] AND S2.[TRANS]=S1.[TRANS] AND S2.[ITEM]=101 AND S2.[STORE] IN ('Store1','Store2','Store3') AND S2.[DATE] = '9-29-2020')
-- Total transaction sales in which ITEM 103 exists
SELECT SUM(S1.[SALES]) AS [TTL_TRANS_SALES]
FROM TMP_SALES_DATA S1
WHERE EXISTS (SELECT 1 FROM TMP_SALES_DATA S2 WHERE S2.[DATE]=S1.[DATE] AND S2.[STORE]=S1.[STORE] AND S2.[TRANS]=S1.[TRANS] AND S2.[ITEM]=103 AND S2.[STORE] IN ('Store1','Store2','Store3') AND S2.[DATE] = '9-29-2020')
But I am failing to find a clean, efficient, and dynamic way to return it all in one query. The end user will be able to specify the items/stores/dates for this report. The end result I would like to see is this:
ITEM SALES TTL_TRANS_SALES
101 7.00 20.00
103 12.00 21.00
If I understand correctly, you can use window functions to summarize by transaction and then aggregate:
select item, sum(sales) as sales, sum(trans_sale) as ttl_trans_sales
from (select ts.*, sum(sales) over (partition by [date], store, trans) as trans_sale
from tmp_sales_data ts
) ts
where item in (101, 103)
group by item;
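Store and date filters can go in the subquery, but the item filter belongs in the outer query; otherwise the other items in each transaction would be dropped from the transaction total.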
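A runnable sketch of the window-function answer with user-supplied items, stores, and date, using SQLite through Python's sqlite3 module and bound parameters for the dynamic lists. The column name sale_date (instead of DATE), the ISO date, and the helper function report are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tmp_sales_data (sale_date TEXT, item INTEGER, store TEXT, trans INTEGER, sales REAL)"
)
conn.executemany("INSERT INTO tmp_sales_data VALUES ('2020-09-29', ?, ?, ?, ?)", [
    (101, "Store1", 123, 1.00), (102, "Store1", 123, 2.00), (103, "Store1", 123, 3.00),
    (101, "Store1", 124, 1.00), (101, "Store1", 125, 1.00), (103, "Store1", 125, 3.00),
    (102, "Store1", 126, 2.00), (101, "Store2", 88, 1.00), (102, "Store2", 88, 2.00),
    (103, "Store2", 88, 3.00), (101, "Store2", 89, 1.00), (101, "Store2", 90, 1.00),
    (102, "Store2", 91, 2.00), (103, "Store2", 91, 3.00), (101, "Store3", 77, 1.00),
])


def report(items, stores, day):
    """Item sales plus the total sales of every transaction containing the item."""
    item_marks = ",".join("?" * len(items))
    store_marks = ",".join("?" * len(stores))
    sql = f"""
    SELECT item, SUM(sales) AS sales, SUM(trans_sale) AS ttl_trans_sales
    FROM (
        -- the per-transaction total is computed before the item filter
        SELECT ts.*,
               SUM(sales) OVER (PARTITION BY sale_date, store, trans) AS trans_sale
        FROM tmp_sales_data ts
        WHERE store IN ({store_marks}) AND sale_date = ?
    )
    WHERE item IN ({item_marks})
    GROUP BY item
    ORDER BY item
    """
    return conn.execute(sql, [*stores, day, *items]).fetchall()


result = report([101, 103], ["Store1", "Store2", "Store3"], "2020-09-29")
print(result)
```

One caveat of this design: if the same item appeared on two lines of one transaction, that transaction's total would be counted twice for the item.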

Last record per transaction

I am trying to select the last record per sales order.
My query in SQL Server Management Studio is simple:
SELECT *
FROM DOCSTATUS
The problem is that this table has tens of thousands of records, as it tracks all SO steps.
ID SO SL Status Reason Attach Name Name Systemdate
22 951581 3 Processed Customer NULL NULL BW 2016-12-05 13:33:27.857
23 951581 3 Submitted Customer NULL NULL BW 2016-17-05 13:33:27.997
24 947318 1 Hold Customer NULL NULL bw 2016-12-05 13:54:27.173
25 947318 1 Invoices Submit Customer NULL NULL bw 2016-13-05 13:54:27.300
26 947318 1 Ship Customer NULL NULL bw 2016-14-05 13:54:27.440
I would like to see the most recent record per SO:
ID SO SL Status Reason Attach Name Name Systemdate
23 951581 4 Submitted Customer NULL NULL BW 2016-17-05 13:33:27.997
26 947318 1 Ship Customer NULL NULL bw 2016-14-05 13:54:27.440
Well I'm not sure how that table has two Name columns, but one easy way to do this is with ROW_NUMBER():
;WITH cte AS
(
SELECT *,
rn = ROW_NUMBER() OVER (PARTITION BY SO ORDER BY Systemdate DESC)
FROM dbo.DOCSTATUS
)
SELECT ID, SO, SL, Status, Reason, ..., Systemdate
FROM cte WHERE rn = 1;
Also please always reference the schema, even if today everything is under dbo.
I think you can keep it this simple:
SELECT *
FROM DOCSTATUS
WHERE ID IN (SELECT MAX(ID)
FROM DOCSTATUS
GROUP BY SO)
You want only the maximum ID from each SO.
An efficient method with the right index is a correlated subquery:
select t.*
from t
where t.systemdate = (select max(t2.systemdate) from t t2 where t2.so = t.so);
The index should be on (so, systemdate).
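A quick way to compare the ROW_NUMBER and MAX(ID) answers is to run both in SQLite through Python's sqlite3 module. The sketch assumes the question's Systemdate values are in yyyy-dd-mm order and stores them as ISO yyyy-mm-dd text so they sort chronologically; only a few of the columns are kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docstatus (id INTEGER, so INTEGER, sl INTEGER, status TEXT, systemdate TEXT)")
conn.executemany("INSERT INTO docstatus VALUES (?, ?, ?, ?, ?)", [
    (22, 951581, 3, "Processed", "2016-05-12 13:33:27.857"),
    (23, 951581, 3, "Submitted", "2016-05-17 13:33:27.997"),
    (24, 947318, 1, "Hold", "2016-05-12 13:54:27.173"),
    (25, 947318, 1, "Invoices Submit", "2016-05-13 13:54:27.300"),
    (26, 947318, 1, "Ship", "2016-05-14 13:54:27.440"),
])

# ROW_NUMBER: newest row per SO by Systemdate.
latest = conn.execute("""
SELECT id, so, status
FROM (
    SELECT d.*,
           ROW_NUMBER() OVER (PARTITION BY so ORDER BY systemdate DESC) AS rn
    FROM docstatus d
)
WHERE rn = 1
ORDER BY id
""").fetchall()

# MAX(ID): newest row per SO by ID.
by_max_id = conn.execute("""
SELECT id, so, status
FROM docstatus
WHERE id IN (SELECT MAX(id) FROM docstatus GROUP BY so)
ORDER BY id
""").fetchall()

print(latest)
print(by_max_id)
```

The two agree here only because higher IDs also have later Systemdate values; if IDs can be assigned out of time order, only the Systemdate-based versions are reliable.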

Sum all rows where column is different in SQL?

I have a simple table.
The relevant fields are Return Value and Return Number.
So this table shows me all items that were returned, which return number each item belongs to, and the value of all items in that return.
So an example table can look something like this:
Line # | Item Number | Quantity Returned | Return Value | Return Number | Cust Order #
1      | 789         | 1                 | $40          | 123           | 456
1      | 780         | 1                 | $40          | 123           | 456
1      | 780         | 1                 | $20          | 124           | 456
I just want to sum the return values across distinct return numbers. For example, there are two rows with return number 123 and one row with return number 124, so it should take the value of 123 once and add it to 124, giving me $60.
I've tried
SUM((rh.Total_Value-rh.Freight_Charges)) OVER (PARTITION BY rh.Customer_Purchase_Order_Number) as Total_Returned_Value
SUM((rh.Total_Value-rh.Freight_Charges)) OVER (PARTITION BY rh.Return_Number) as Total_Returned_Value
SUM((rh.Total_Value-rh.Freight_Charges)) OVER (PARTITION BY rh.Return_Number Order by rh.Customer_Purchase_Order_Number) as Total_Returned_Value
SUM((rh.Total_Value-rh.Freight_Charges)) OVER (PARTITION BY rh.Customer_Purchase_Order_Number Order by rh.Return_Number) as Total_Returned_Value
None of these seem to work, and I feel that I don't have a great grasp of ORDER BY and PARTITION BY.
This is my full code:
select rh.Return_Number,
rd.Odet_Line_Number, rd.Item_Number, rd.Color_Code, rd.Quantity_Returned,
(rh.Total_Value-rh.Freight_Charges)as Returned_Value, rh.Remarks,
SUM((rh.Total_Value-rh.Freight_Charges)) OVER (PARTITION BY /*rh.Return_Number Order by*/ rh.Customer_Purchase_Order_Number) as Total_Returned_Value
from
[JMNYC-AMTDB].[AMTPLUS].[dbo].Returns_Header rh (nolock)
LEFT JOIN
[JMNYC-AMTDB].[AMTPLUS].[dbo].Returns_Detail rd (nolock) on rd.Return_Number = Rh.Return_number
WHERE rh.Customer_Purchase_Order_Number = #Shopify
You probably have multiple detail rows per header, which duplicates the header data. If you want to sum by unique return number, do the calculation on the header first in a CTE and join the result to the detail, like this:
with rh as
( select -- assuming the rh.Return_Number is unique
rh.Return_Number,
rh.Customer_Purchase_Order_Number, -- needed by the outer WHERE clause
(rh.Total_Value-rh.Freight_Charges)as Returned_Value,
rh.Remarks,
SUM((rh.Total_Value-rh.Freight_Charges))
OVER (PARTITION BY rh.Customer_Purchase_Order_Number) as Total_Returned_Value
-- don't know if this is the PARTITION you want, maybe none
from
[JMNYC-AMTDB].[AMTPLUS].[dbo].Returns_Header rh (nolock)
)
select rh.Return_Number,
rd.Odet_Line_Number, rd.Item_Number, rd.Color_Code, rd.Quantity_Returned,
rh.Returned_Value, rh.Remarks,
rh.Total_Returned_Value
from
rh
LEFT JOIN
[JMNYC-AMTDB].[AMTPLUS].[dbo].Returns_Detail rd (nolock) on rd.Return_Number = Rh.Return_number
WHERE rh.Customer_Purchase_Order_Number = #Shopify
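To see why the header-first CTE fixes the total, here is a small sketch in SQLite through Python's sqlite3 module that runs both the join-then-sum query and the sum-then-join query on the question's example. The simplified table and column names (customer_po for Customer_Purchase_Order_Number, and so on) and the zero freight charges are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE returns_header (return_number INTEGER, customer_po INTEGER,
                             total_value INTEGER, freight_charges INTEGER);
CREATE TABLE returns_detail (return_number INTEGER, item_number INTEGER);
INSERT INTO returns_header VALUES (123, 456, 40, 0), (124, 456, 20, 0);
INSERT INTO returns_detail VALUES (123, 789), (123, 780), (124, 780);
""")

# Joining first repeats return 123's header value once per detail row: 40 + 40 + 20.
naive = conn.execute("""
SELECT DISTINCT SUM(rh.total_value - rh.freight_charges)
                    OVER (PARTITION BY rh.customer_po)
FROM returns_header rh
LEFT JOIN returns_detail rd ON rd.return_number = rh.return_number
WHERE rh.customer_po = ?
""", (456,)).fetchall()

# Summing over the headers first counts each return exactly once: 40 + 20.
fixed = conn.execute("""
WITH rh AS (
    SELECT return_number, customer_po,
           SUM(total_value - freight_charges)
               OVER (PARTITION BY customer_po) AS total_returned_value
    FROM returns_header
)
SELECT rh.return_number, rd.item_number, rh.total_returned_value
FROM rh
LEFT JOIN returns_detail rd ON rd.return_number = rh.return_number
WHERE rh.customer_po = ?
ORDER BY rh.return_number, rd.item_number
""", (456,)).fetchall()

print(naive)  # [(100,)]
print(fixed)
```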