I have a table that contains transaction-level sales data. I am trying to satisfy a reporting request as efficiently as possible, and I don't think I am succeeding right now. Here is some test data:
DROP TABLE IF EXISTS TMP_SALES_DATA;
CREATE TABLE TMP_SALES_DATA ([DATE] DATE, [ITEM] INT, [STORE] CHAR(6), [TRANS] INT, [SALES] DECIMAL(8,2));
INSERT INTO TMP_SALES_DATA
VALUES
('9-29-2020',101,'Store1',123,1.00),
('9-29-2020',102,'Store1',123,2.00),
('9-29-2020',103,'Store1',123,3.00),
('9-29-2020',101,'Store1',124,1.00),
('9-29-2020',101,'Store1',125,1.00),
('9-29-2020',103,'Store1',125,3.00),
('9-29-2020',102,'Store1',126,2.00),
('9-29-2020',101,'Store2',88,1.00),
('9-29-2020',102,'Store2',88,2.00),
('9-29-2020',103,'Store2',88,3.00),
('9-29-2020',101,'Store2',89,1.00),
('9-29-2020',101,'Store2',90,1.00),
('9-29-2020',102,'Store2',91,2.00),
('9-29-2020',103,'Store2',91,3.00),
('9-29-2020',101,'Store3',77,1.00);
I need to report both the individual item sales and the total transaction sales for every transaction in which the specified items were present. Examples:
-- Item sales
SELECT [ITEM], SUM([SALES]) AS [SALES]
FROM TMP_SALES_DATA
WHERE [ITEM] IN (101,103) AND [STORE] IN ('Store1','Store2' ,'Store3') AND [DATE] = '9-29-2020'
GROUP BY [ITEM]
Returns this:
ITEM SALES
101 7.00
103 12.00
And I can get the total transaction sales in which a single item was present this way:
-- Total transaction sales in which ITEM 101 exists
SELECT SUM(S1.[SALES]) AS [TTL_TRANS_SALES]
FROM TMP_SALES_DATA S1
WHERE EXISTS (SELECT 1 FROM TMP_SALES_DATA S2 WHERE S2.[DATE]=S1.[DATE] AND S2.[STORE]=S1.[STORE] AND S2.[TRANS]=S1.[TRANS] AND S2.[ITEM]=101 AND S2.[STORE] IN ('Store1','Store2','Store3') AND S2.[DATE] = '9-29-2020')
-- Total transaction sales in which ITEM 103 exists
SELECT SUM(S1.[SALES]) AS [TTL_TRANS_SALES]
FROM TMP_SALES_DATA S1
WHERE EXISTS (SELECT 1 FROM TMP_SALES_DATA S2 WHERE S2.[DATE]=S1.[DATE] AND S2.[STORE]=S1.[STORE] AND S2.[TRANS]=S1.[TRANS] AND S2.[ITEM]=103 AND S2.[STORE] IN ('Store1','Store2','Store3') AND S2.[DATE] = '9-29-2020')
But I am failing to find a clean, efficient, and dynamic way to return it all in one query. The end user will be able to specify the items/stores/dates for this report. The end result I would like to see is this:
ITEM SALES TTL_TRANS_SALES
101 7.00 20.00
103 12.00 21.00
If I understand correctly, you can use window functions to summarize by transaction and then aggregate:
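select item, sum(sales) as sales, sum(trans_sale) as ttl_trans_sales
-- partition by date, store and trans: the same columns that identify a transaction in the EXISTS queries
from (select ts.*, sum(sales) over (partition by [date], store, trans) as trans_sale
from tmp_sales_data ts
) ts
group by item;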
Here is a db<>fiddle.
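You can add the store and date filters in the subquery. The item filter belongs in the outer query, before the GROUP BY: filtering items inside the subquery would remove the other items' rows before the transaction totals are computed.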
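To make the placement of the filters concrete, here is a minimal sketch that runs the window-function query against the question's test data. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, so the dialect differs slightly; the column name sale_date and the function name item_and_transaction_sales are assumptions made for this example.

```python
import sqlite3

# Test data from the question: (item, store, trans, sales), all on 2020-09-29.
SALES = [
    (101, "Store1", 123, 1.0), (102, "Store1", 123, 2.0), (103, "Store1", 123, 3.0),
    (101, "Store1", 124, 1.0), (101, "Store1", 125, 1.0), (103, "Store1", 125, 3.0),
    (102, "Store1", 126, 2.0), (101, "Store2", 88, 1.0), (102, "Store2", 88, 2.0),
    (103, "Store2", 88, 3.0), (101, "Store2", 89, 1.0), (101, "Store2", 90, 1.0),
    (102, "Store2", 91, 2.0), (103, "Store2", 91, 3.0), (101, "Store3", 77, 1.0),
]

QUERY = """
SELECT item, SUM(sales) AS sales, SUM(trans_sale) AS ttl_trans_sales
FROM (
    SELECT ts.*,
           SUM(sales) OVER (PARTITION BY sale_date, store, trans) AS trans_sale
    FROM tmp_sales_data ts
    WHERE store IN ('Store1', 'Store2', 'Store3')  -- store and date filters: inside
      AND sale_date = '2020-09-29'
) ts
WHERE item IN (101, 103)                           -- item filter: outside
GROUP BY item
ORDER BY item
"""


def item_and_transaction_sales():
    """Return (item, sales, ttl_trans_sales) rows for items 101 and 103."""
    conn = sqlite3.connect(":memory:")
    # sale_date is an assumed column name standing in for [DATE].
    conn.execute(
        "CREATE TABLE tmp_sales_data "
        "(sale_date TEXT, item INTEGER, store TEXT, trans INTEGER, sales REAL)"
    )
    conn.executemany(
        "INSERT INTO tmp_sales_data VALUES ('2020-09-29', ?, ?, ?, ?)", SALES
    )
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in item_and_transaction_sales():
        print(row)
```

Each item appears at most once per transaction in this data, so summing trans_sale per item does not double count a transaction.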
My data comes from electricity suppliers. Some charges are fixed and some are usage charges. I have a problem with the fixed charges: when the electricity meter is changed during the billing period, two fixed charge lines are created with different meter IDs and contract numbers. All other fields are the same, and I want to keep only one of these lines, because it is a monthly fixed charge.
If you help me I will be happy,
Thank you very much,
https://www.designcise.com/web/tutorial/how-to-remove-all-duplicate-rows-except-one-in-sql#:~:text=How%20to%20Remove%20All%20Duplicate%20Rows%20Except%20One,Duplicates%20and%20Keep%20Row%20With%20Highest%20ID%20 I created a view without these two fields to get the unique rows, then created a second view that adds the two fields back with values smaller than any real value, for comparison.
My values for the second view: ('A10000000' AS MeterUniqueNo, '100000' AS MeterContractID)
Original example values: (K18D01652, 646802)
delete main_table
from main_table
inner join view2 on view2.MeterUniqueNo < main_table.MeterUniqueNo
and view2.EnergyChargesRecord_InvoiceNumber = main_table.EnergyChargesRecord_InvoiceNumber
and view2.EnergyChargesRecord_MPANNumber = main_table.EnergyChargesRecord_MPANNumber
It is not working, because values are different.
T-SQL: Deleting all duplicate rows but keeping one
I cannot use this method, because I have to check both the MPAN number and the invoice number, not just one value.
Since you didn't provide any sample data, I'm guessing at the actual format of your data. I created a minimal example of how to use the ROW_NUMBER function to order duplicates and select only the most recent one. Again, I'm guessing at the sample data, but the data shared between the duplicate rows is the MPAN_number column. This is only an example; provide sample data for better answers that are more specific to your application.
--Create test table.
CREATE TABLE charges (
meter_id int
, contract_number int
, charge_amt decimal(19,2)
, invoice_number int
, MPAN_number nvarchar(100)
, charge_date date
);
--Insert test data.
INSERT INTO charges (
meter_id, contract_number, charge_amt, invoice_number, MPAN_number
, charge_date)
VALUES
(123, 998, 25.54, 3216549, '123AM234ASF', '1/2/2022')
, (456, 12399, 25.54, 3216668, '123AM234ASF', '1/15/2022')
, (987, 887, 25.54, 3589765, 'K18D01652', '1/5/2022')
, (654, 123488, 25.54, 3548892, 'K18D01652', '1/28/2022')
;
--For debugging, show all test data.
SELECT * FROM charges;
--Use a CTE to add a row_num column.
--This row_num column will sequence "duplicate" charge lines by charge_date with the most recent charge as row_num = 1.
--The common data in the example data is the MPAN_number.
--This is only an example. For more specific help, you need to create
--a minimal reproducible example just like this.
WITH prelim as (
SELECT *, ROW_NUMBER() OVER(PARTITION BY MPAN_number ORDER BY charge_date DESC) as row_num
FROM charges
)
SELECT *
FROM prelim
WHERE row_num = 1
;
--Here's an example of how to delete all "duplicate" charges that are not the most recent charge.
DELETE c
FROM charges as c
INNER JOIN (
SELECT *, ROW_NUMBER() OVER(PARTITION BY MPAN_number ORDER BY charge_date DESC) as row_num
FROM charges
) as oldDups
ON oldDups.MPAN_number = c.MPAN_number
AND oldDups.meter_id = c.meter_id
AND oldDups.contract_number = c.contract_number
AND oldDups.row_num <> 1
;
--For debugging, show the test data after deletions.
SELECT * FROM charges;
Showing all test data:
meter_id contract_number charge_amt invoice_number MPAN_number charge_date
-------- --------------- ---------- -------------- ----------- -----------
123      998             25.54      3216549        123AM234ASF 2022-01-02
456      12399           25.54      3216668        123AM234ASF 2022-01-15
987      887             25.54      3589765        K18D01652   2022-01-05
654      123488          25.54      3548892        K18D01652   2022-01-28
Showing the most recent charges using SELECT:
meter_id contract_number charge_amt invoice_number MPAN_number charge_date row_num
-------- --------------- ---------- -------------- ----------- ----------- -------
456      12399           25.54      3216668        123AM234ASF 2022-01-15  1
654      123488          25.54      3548892        K18D01652   2022-01-28  1
Showing the test data remaining after a DELETE operation:
meter_id contract_number charge_amt invoice_number MPAN_number charge_date
-------- --------------- ---------- -------------- ----------- -----------
456      12399           25.54      3216668        123AM234ASF 2022-01-15
654      123488          25.54      3548892        K18D01652   2022-01-28
fiddle
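The question mentions that duplicates have to match on both the MPAN number and the invoice number. A minimal sketch of that variant follows, assuming the two fixed-charge lines really do share both values; the row values, the table layout, and the function name keep_one_charge_per_invoice are all invented for illustration. It runs on Python's built-in sqlite3 module rather than SQL Server, so the DELETE is written against rowid; in T-SQL the same idea is usually expressed by deleting from a CTE that carries the row_num column.

```python
import sqlite3


def keep_one_charge_per_invoice():
    """Delete all but one row per (mpan_number, invoice_number) pair."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE charges (meter_id TEXT, contract_number INTEGER, "
        "charge_amt REAL, invoice_number INTEGER, mpan_number TEXT)"
    )
    # Hypothetical rows: the first two share invoice and MPAN (meter was swapped).
    conn.executemany(
        "INSERT INTO charges VALUES (?, ?, ?, ?, ?)",
        [
            ("K18D01652", 646802, 25.54, 1001, "MPAN-A"),
            ("K18D01999", 646999, 25.54, 1001, "MPAN-A"),
            ("K18D01652", 646802, 25.54, 1002, "MPAN-A"),
            ("K20E00001", 700001, 25.54, 1003, "MPAN-B"),
        ],
    )
    # Number the rows inside each (MPAN, invoice) group and delete every row
    # after the first. ORDER BY meter_id decides which row survives.
    conn.execute(
        """
        DELETE FROM charges
        WHERE rowid IN (
            SELECT rid
            FROM (
                SELECT rowid AS rid,
                       ROW_NUMBER() OVER (
                           PARTITION BY mpan_number, invoice_number
                           ORDER BY meter_id
                       ) AS row_num
                FROM charges
            )
            WHERE row_num > 1
        )
        """
    )
    remaining = conn.execute(
        "SELECT invoice_number, meter_id FROM charges ORDER BY invoice_number"
    ).fetchall()
    conn.close()
    return remaining


if __name__ == "__main__":
    print(keep_one_charge_per_invoice())
```

Partitioning by two columns is the only change from the single-column example; the ORDER BY inside the window can be swapped for whatever rule should pick the surviving row.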
I have an accounting calculation problem. I want to write it with SQL Query (in ssms).
I have two groups of documents related to one person (creditor and debtor)
Creditor documents cover debtor documents.
Consider the following example: (How can the result be achieved?)
USE [master]
GO
DROP TABLE IF EXISTS #credit/*creditor=0*/,#debit/*Debtor=1*/
SELECT *
INTO #debit
FROM (values
(88,'2/14',1,5,1),(88,'2/15',2,5,1)
)A (personID,DocDate,DocID,Fee,IsDebit)
SELECT *
INTO #credit
FROM (values
(88,'2/16',3,3,0),(88,'2/17',4,7,0)
)A (personID,DocDate,DocID,Fee,ISDeb)
SELECT * FROM #credit
SELECT * FROM #debit
--result:
;WITH res AS
(
SELECT 88 AS personID ,1 deb_DocID ,5 deb_Fee , 3 Cre_DocID ,3 Cre_Fee, 0 remain_Cre_Fee
UNION
SELECT 88 AS personID ,1 deb_DocID ,5 deb_Fee , 4 Cre_DocID ,7 Cre_Fee, 5 remain_Cre_Fee
UNION
SELECT 88 AS personID ,2 deb_DocID ,5 deb_Fee , 4 Cre_DocID ,7 Cre_Fee, 0 remain_Cre_Fee
)
SELECT *
FROM res
Sample data
Using an ISO date format to avoid any confusion.
The docdate and isdebit columns will not be used in the solution...
I ignored the docdate under the assumptions that the values are incremental and that it is allowed to deposit a credit fee before any debit fee.
The isdebit flag seems redundant if you are going to store debit and credit transactions in separate tables anyway.
Updated sample data:
create table debit
(
personid int,
docdate date,
docid int,
fee int,
isdebit bit
);
insert into debit (personid, docdate, docid, fee, isdebit) values
(88, '2021-02-14', 1, 5, 1),
(88, '2021-02-15', 2, 5, 1);
create table credit
(
personid int,
docdate date,
docid int,
fee int,
isdebit bit
);
insert into credit (personid, docdate, docid, fee, isdebit) values
(88, '2021-02-16', 3, 3, 0),
(88, '2021-02-17', 4, 7, 0);
Solution
Couple steps here:
Construct a rolling sum for the debit fees. Done with a first common table expression (cte_debit).
Construct a rolling sum for the credit fees. Done with a second common table expression (cte_credit).
Take all debit info (select * from cte_debit)
Find the first credit info that applies to the current debit info. Done with a first cross apply (cc1). This contains the docid of the first document that applies to the debit document.
Find the last credit info that applies to the current debit info. Done with a second cross apply (cc2). This contains the docid of the last document that applies to the debit document.
Find all credit info that applies to the current debit info by selecting all documents between the first and last applicable document (join cte_credit cc on cc.docid >= cc1.docid and cc.docid <= cc2.docid).
Combine the rolling sum numbers to calculate the remaining credit fees (cc.credit_sum - cd.debit_sum). Use a case expression to filter out negative values.
Full solution:
with cte_debit as
(
select d.personid,
d.docid,
d.fee,
sum(d.fee) over(order by d.docid rows between unbounded preceding and current row) as debit_sum
from debit d
),
cte_credit as
(
select c.personid,
c.docid,
c.fee,
sum(c.fee) over(order by c.docid rows between unbounded preceding and current row) as credit_sum
from credit c
)
select cd.personid,
cd.docid as deb_docid,
cd.fee as deb_fee,
cc.docid as cre_docid,
cc.fee as cre_fee,
case
when cc.credit_sum - cd.debit_sum >= 0
then cc.credit_sum - cd.debit_sum
else 0
end as cre_fee_remaining
from cte_debit cd
cross apply ( select top 1 cc1.docid, cc1.credit_sum
from cte_credit cc1
where cc1.personid = cd.personid
and cc1.credit_sum <= cd.debit_sum
order by cc1.credit_sum desc ) cc1
cross apply ( select top 1 cc2.docid, cc2.credit_sum
from cte_credit cc2
where cc2.personid = cd.personid
and cc2.credit_sum >= cd.debit_sum
-- ascending: the first credit whose running total reaches this debit's running total
order by cc2.credit_sum asc ) cc2
join cte_credit cc
on cc.personid = cd.personid
and cc.docid >= cc1.docid
and cc.docid <= cc2.docid
order by cd.personid,
cd.docid,
cc.docid;
Result
personid deb_docid deb_fee cre_docid cre_fee cre_fee_remaining
-------- --------- ------- --------- ------- -----------------
88 1 5 3 3 0
88 1 5 4 7 5
88 2 5 4 7 0
Fiddle to see things in action. This also contains the intermediate CTE results and some commented helper columns that can be uncommented to help to further understand the solution.
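The running-total idea can also be sketched procedurally. The sketch below is not the query above: it is an alternative formulation, written in Python for illustration, that treats every document as an interval on the running total and pairs a debit with a credit when their intervals overlap. The function name allocate and the (docid, fee) tuple layout are assumptions for this example, and it handles one person at a time.

```python
from itertools import accumulate


def allocate(debits, credits):
    """Pair debit and credit documents whose running-total intervals overlap.

    debits and credits are lists of (docid, fee) for a single person,
    in the order in which they should be applied.
    Returns (deb_docid, deb_fee, cre_docid, cre_fee, cre_fee_remaining) rows.
    """
    debit_sums = list(accumulate(fee for _, fee in debits))
    credit_sums = list(accumulate(fee for _, fee in credits))

    rows = []
    for (deb_id, deb_fee), deb_sum in zip(debits, debit_sums):
        deb_start = deb_sum - deb_fee  # running total before this debit
        for (cre_id, cre_fee), cre_sum in zip(credits, credit_sums):
            cre_start = cre_sum - cre_fee  # running total before this credit
            # The intervals (deb_start, deb_sum] and (cre_start, cre_sum] overlap
            # exactly when part of this credit pays part of this debit.
            if cre_start < deb_sum and cre_sum > deb_start:
                remaining = max(cre_sum - deb_sum, 0)
                rows.append((deb_id, deb_fee, cre_id, cre_fee, remaining))
    return rows


if __name__ == "__main__":
    for row in allocate([(1, 5), (2, 5)], [(3, 3), (4, 7)]):
        print(row)
```

With the sample data this produces the same three rows as the result table: the remaining amount is the credit running total minus the debit running total, floored at zero.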
I'm developing a new analyst feature for an internal tool my company will (hopefully if I do well) use.
For simplicity's sake, let's say we have
CREATE TABLE Products (
ProductID varchar,
Description varchar,
....
);
and
CREATE TABLE Orders (
ProductID varchar,
Bought date,
Returned date,
....
);
The tables would look something like this:
Products
ProductID Description
--------- ---------------------
SPO00     Sports product 1
SPO01     Sports product 2
SPO02     Sports product 3
ELE00     Electronics product 1
ELE02     Electronics product 2
Orders
ProductID Bought     Returned
--------- ---------- ----------
ELE00     2021-01-05 2021-01-07
SPO00     2021-01-01 NULL
SPO00     2021-01-05 2021-01-08
SPO00     2021-01-08 NULL
SPO01     2021-01-10 NULL
SPO01     2021-01-15 NULL
SPO02     2021-01-18 2021-01-20
I'd like to make a request to our DB and retrieve the description of specific products, and the percentage of bought products that are eventually returned.
I'd also like to add specific parameters to the query, for example selecting only orders since the beginning of the year, or only the products from a specific department.
So, it would look something like this:
Description      ratio returned
---------------- --------------
Sports product 1 0.33
Sports product 2 0.00
Sports product 3 1.0
So, the products table might have product lines of electronics and sports and ProductID would be ELE00-ELE05 and SPO00-SPO03, respectively.
The table above takes all products whose ProductID has the SPO prefix and shows each product's returned-to-bought ratio.
I've only been able to get the specific products, but the returned ratio is the same for each row. I think that's because it isn't calculating the ratio for each distinct product: it's doing one overall ratio calculation and displaying that for each product.
Here is the query I've tried.
SELECT DISTINCT Product.Description, (CAST((SELECT DISTINCT COUNT(*) FROM Orders WHERE(ProductID like 'SPO%' AND Returned > '2021-01-01') AS FLOAT)) / (CAST((SELECT DISTINCT COUNT(*) FROM Orders WHERE (ProductID like 'SPO%' AND Bought > '2021-01-01') AS FLOAT)) AS returnedRatio
FROM Product INNER JOIN
Orders ON Orders.ProductID = Product.ProductID
I'm thinking I might need to do a nested query to get the ratios for each product and then get the description?
All help would be greatly appreciated because I've never done very complex queries so I'm still learning.
Does this work for you?
I used a case expression inside the count() function to count the number of returned products.
The * 1.0 turns the integer division into a decimal division without explicitly casting.
Sample data
CREATE TABLE Products (
ProductID nvarchar(5),
Description nvarchar(50)
);
insert into Products (ProductId, Description) values
('SPO00', 'Sports product 1'),
('SPO01', 'Sports product 2'),
('SPO02', 'Sports product 3'),
('ELE00', 'Electronics product 1'),
('ELE02', 'Electronics product 2');
CREATE TABLE Orders (
ProductID nvarchar(5),
Bought date,
Returned date
);
insert into Orders (ProductID, Bought, Returned) values
('ELE00', '2021-01-05', '2021-01-07'),
('SPO00', '2021-01-01', NULL),
('SPO00', '2021-01-05', '2021-01-08'),
('SPO00', '2021-01-08', NULL),
('SPO01', '2021-01-10', NULL),
('SPO01', '2021-01-15', NULL),
('SPO02', '2021-01-18', '2021-01-20');
Solution
select p.Description,
count(case when o.Returned is not null then 1 end) as ReturnCount,
count(1) TotalCount,
count(case when o.Returned is not null then 1 end) * 1.0 / count(1) as ReturnRatio
from Products p
join Orders o
on o.ProductID = p.ProductID
where p.ProductID like 'SPO%'
and o.Bought >= '2021-01-01'
group by p.Description;
Result
Description ReturnCount TotalCount ReturnRatio
---------------- ----------- ---------- --------------
Sports product 1 1 3 0.333333333333
Sports product 2 0 2 0
Sports product 3 1 1 1
Fiddle to see things in action.
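Since the question also asks about passing parameters, here is a minimal sketch of the same conditional-count query with the product prefix and the start date supplied as bound parameters. It runs on Python's built-in sqlite3 module as a stand-in for the real database; the function name return_ratios and its prefix and since arguments are assumptions for this example.

```python
import sqlite3

SETUP = """
CREATE TABLE Products (ProductID TEXT, Description TEXT);
INSERT INTO Products VALUES
    ('SPO00', 'Sports product 1'), ('SPO01', 'Sports product 2'),
    ('SPO02', 'Sports product 3'), ('ELE00', 'Electronics product 1'),
    ('ELE02', 'Electronics product 2');
CREATE TABLE Orders (ProductID TEXT, Bought TEXT, Returned TEXT);
INSERT INTO Orders VALUES
    ('ELE00', '2021-01-05', '2021-01-07'), ('SPO00', '2021-01-01', NULL),
    ('SPO00', '2021-01-05', '2021-01-08'), ('SPO00', '2021-01-08', NULL),
    ('SPO01', '2021-01-10', NULL), ('SPO01', '2021-01-15', NULL),
    ('SPO02', '2021-01-18', '2021-01-20');
"""

QUERY = """
SELECT p.Description,
       COUNT(CASE WHEN o.Returned IS NOT NULL THEN 1 END) * 1.0 / COUNT(*)
           AS ReturnRatio
FROM Products p
JOIN Orders o ON o.ProductID = p.ProductID
WHERE p.ProductID LIKE ?      -- department prefix, e.g. 'SPO%'
  AND o.Bought >= ?           -- ISO dates compare correctly as text
GROUP BY p.Description
ORDER BY p.Description
"""


def return_ratios(prefix="SPO", since="2021-01-01"):
    """Return (description, return_ratio) per product for one department."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(QUERY, (prefix + "%", since)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for description, ratio in return_ratios():
        print(description, round(ratio, 2))
```

Because this is an inner join, products without any matching order (ELE02 here) do not appear in the result at all.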
product saletype qty
-----------------------------
product1 regular 10
product1 sale 1
product1 feature 2
I have a sales table as seen above, and products can be sold 1 of 3 different ways (regular price, sale price, or feature price).
All sales, regardless of type, accumulate into regular, but sale and feature sales also accumulate into their own saletype. So in the example above, I've sold 10 products in total (7 regular, 1 sale, 2 feature).
I want to return the regular quantity minus the other two quantities as efficiently as possible, along with the other saletypes unchanged. Here is how I am currently doing it:
create table query_test
(product varchar(20), saletype varchar(20), qty int);
insert into query_test values
('product1','regular',10),
('product1','sale',1),
('product1','feature',2)
select
qt.product,
qt.saletype,
CASE WHEN qt.saletype = 'regular' THEN sum(qt.qty)-sum(lj.qty) ELSE sum(qt.qty) END as [qty]
from
query_test qt
left join
(
select product, sum(qty) as [qty]
from query_test
where saletype in ('sale','feature')
group by product
) lj on lj.product=qt.product
group by
qt.product, qt.saletype;
...which yields what I am after:
product saletype qty
-----------------------------
product1 feature 2
product1 regular 7
product1 sale 1
But I feel like there has to be a better way than essentially querying the same information twice.
You can use the window function sum and some arithmetic to do this.
select product,
saletype,
case when saletype='regular' then 2*qty-sum(qty) over(partition by product)
else qty end as qty
from query_test
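This assumes there is at most one row per product for saletype 'regular'. The arithmetic works because the windowed sum includes the regular row itself: regular - (total - regular) = 2*regular - total.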
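As a quick check of that arithmetic, here is a minimal sketch that runs the window-function query on the question's three rows. It uses Python's built-in sqlite3 module instead of the original database, and the function name net_quantities is an assumption for this example.

```python
import sqlite3

QUERY = """
SELECT product,
       saletype,
       CASE WHEN saletype = 'regular'
            -- total includes the regular row, so 2*regular - total
            -- equals regular minus all other saletypes
            THEN 2 * qty - SUM(qty) OVER (PARTITION BY product)
            ELSE qty
       END AS qty
FROM query_test
ORDER BY product, saletype
"""


def net_quantities():
    """Return (product, saletype, qty) with 'regular' net of the other types."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE query_test (product TEXT, saletype TEXT, qty INTEGER)")
    conn.executemany(
        "INSERT INTO query_test VALUES (?, ?, ?)",
        [("product1", "regular", 10), ("product1", "sale", 1), ("product1", "feature", 2)],
    )
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in net_quantities():
        print(row)
```

For product1 the windowed total is 13, so the regular row becomes 2 * 10 - 13 = 7, and the table is read only once.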
Apologies for the confusing question title, but I'm not exactly sure how to describe the issue at hand.
I have two tables in Oracle 9i:
Pricing
-------
SKU
ApplicableTime
CostPerUnit
Inventory
---------
SKU
LastUpdatedTime
NumberOfUnits
Pricing contains incremental updates to the costs of each particular SKU item, at a specific Unix time. For example, if I have records:
SKU ApplicableTime CostPerUnit
------------------------------------
12345 1000 1.00
12345 1500 1.50
, then item 12345 is $1.00 per unit for any time between 1000 and 1500, and $1.50 for any time after 1500.
Inventory contains SKU, last updated time, and number of units.
What I'm trying to do is construct a query such that for each row in Inventory, I join the two tables based on SKU, I find the largest value for Pricing.ApplicableTime that is NOT greater than Inventory.LastUpdatedTime, get the CostPerUnit of that particular record from Pricing, and calculate TotalCost = CostPerUnit * NumberOfUnits:
SKU TotalCost
-----------------------------------------------------------------------------------
12345 (CostPerUnit at most recent ApplicableTime <= LastUpdatedTime)*NumberOfUnits
12346 <same>
... ...
How would I do this?
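-- Rank the applicable prices per SKU and keep the most recent one
SELECT *
FROM
(select p.SKU,
p.ApplicableTime,
p.CostPerUnit*i.NumberOfUnits as cost,
row_number() over (partition by p.SKU order by p.ApplicableTime desc) as rnk
from Pricing p
join
-- >= so that a price applying exactly at LastUpdatedTime is included
Inventory i on (p.sku = i.sku and i.LastUpdatedTime >= p.ApplicableTime)
)
where rnk=1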
-- Alternative without analytic functions: find the latest applicable time per SKU, then join back to Pricing
select i1.SKU, i1.NumberOfUnits * p1.CostPerUnit as TotalCost
from Inventory i1
join (
select p.SKU, max(p.ApplicableTime) as ApplicableTime, max(i.LastUpdatedTime) as LastUpdatedTime
from Pricing p
join Inventory i on p.sku = i.sku
where p.ApplicableTime <= i.LastUpdatedTime
group by p.SKU
) t on i1.sku = t.sku and i1.LastUpdatedTime = t.LastUpdatedTime
join Pricing p1 on p1.sku = t.sku and p1.ApplicableTime = t.ApplicableTime
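To see the "latest price not after the inventory update" rule in action, here is a minimal sketch of the ranking approach on a few rows. It runs on Python's built-in sqlite3 module, not Oracle 9i, and assumes one Inventory row per SKU; the Inventory rows, the second SKU's price, and the function name total_costs are invented for this example.

```python
import sqlite3

SETUP = """
CREATE TABLE Pricing (SKU INTEGER, ApplicableTime INTEGER, CostPerUnit REAL);
CREATE TABLE Inventory (SKU INTEGER, LastUpdatedTime INTEGER, NumberOfUnits INTEGER);
-- Pricing rows for 12345 are from the question; everything else is hypothetical.
INSERT INTO Pricing VALUES (12345, 1000, 1.00), (12345, 1500, 1.50), (12346, 900, 2.00);
INSERT INTO Inventory VALUES (12345, 1200, 10), (12346, 900, 3);
"""

QUERY = """
SELECT SKU, TotalCost
FROM (
    SELECT i.SKU AS SKU,
           p.CostPerUnit * i.NumberOfUnits AS TotalCost,
           ROW_NUMBER() OVER (
               PARTITION BY i.SKU
               ORDER BY p.ApplicableTime DESC
           ) AS rnk
    FROM Inventory i
    JOIN Pricing p
      ON p.SKU = i.SKU
     AND p.ApplicableTime <= i.LastUpdatedTime   -- "not greater than"
) ranked
WHERE rnk = 1
ORDER BY SKU
"""


def total_costs():
    """Return (SKU, TotalCost) using the most recent applicable price."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in total_costs():
        print(row)
```

SKU 12345 was last updated at time 1200, so the price from time 1000 applies (10 units at 1.00). SKU 12346 was updated exactly at its price's applicable time, which the <= comparison includes.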