SQL Query to remove duplicated data and take single column sum

SQL Query to remove duplicated data and take single column sum - sql

I have the following table resulted from
SELECT m.MedName as [Medicine],m.MedSellPrice as [RetailPrice],m.MedType as [Type],
m.SoldQuantity as [Sold],m.Quantity as [Available],b.BillAmount as [Total Bill],b.BillDate
FROM BillMedicine AS bm LEFT JOIN
Medicine AS m
ON bm.MedicineID=m.id LEFT JOIN
Bill AS b
ON bm.BilIID = b. ID
but now I want to remove the repeated rows except the Sum of 'TotalBill'.

Use GROUP BY:
SELECT
m.MedName AS [Medicine],
m.MedSellPrice AS [RetailPrice],
m.MedType AS [Type],
m.SoldQuantity AS [Sold],
m.Quantity AS [Available],
SUM(b.BillAmount) AS [Total Bill]
FROM BillMedicine AS bm
LEFT JOIN Medicine AS m
ON bm.MedicineID = m.id
LEFT JOIN Bill AS b
ON bm.BilIID = b.ID
GROUP BY
m.MedName,
m.MedSellPrice,
m.MedType,
m.SoldQuantity,
m.Quantity;
Note that for the billing date, the two "duplicate" records you have highlighted have different dates. It is not clear which date, if any, you want to report here. I have omitted this column.

GROUP BY Is Best Option for DUPLICATE DATE Removed & SUM.
Select Column1,column2....., SUM(Total) as Total From Tablename Group BY column1,column2

You seem to want most (or all) columns from m and then the sum from another table. One method is a lateral join or correlated subquery:
SELECT m.*, -- or whatever columns you want,
(SELECT SUM(b.BillAmount)
FROM BillMedicine bm JOIN
Bill b
ON bm.BilIID = b.ID
WHERE bm.MedicineID = m.id
) as [Total Bill]
FROM Medicine m ;
I suggest this approach for several reasons.
This is often more efficient than an outer aggregation.
You have LEFT JOINs but they do not look correct. I suspect you want to start with the Medicine table.
You are including a date/time in the results, but clearly that is not appropriate when combining multiple rows.

Related

sql multiple left joins with sum

I have 3 tables as below. What I need to do is create a sumamry after left joining the 1st table to the 2nd and the 2nd to the 3rd.
The code I'm using ends up resulting in a cartesian join. My query to create the 1st table (person) is complicated and resource intensive while the volume of data is table 2(shopping list) is massive so having a nested query is not ideal. Below is the code I'm using right now and the expected output (image 1) & what I get (image 2)
select
a.ID,
a.Name,
sum(b.cost) total_cost,
sum(c.discount_amount) total_discount
from
person a,
left join shopping_list b on a.id=b.id
left join discount c on b.item = c.item
group by
a.ID,
a.Name
I've looked at the below links but I was hoping there's a solution that may work better give the size of my dataset
https://dba.stackexchange.com/questions/217220/how-i-use-multiple-sum-with-multiple-left-joins
Multiple Left Join with sum
Thanks in advance for your help

You have multiple rows for the discounts, so presummarize those:
select p.id, p.name, coalesce(sl.cost, 0) as cost,
coalesce(d.discount_amount, 0) as discount_amount
from person p left join
shopping_list sl
on sl.id = p.id left join
(select d.item, sum(discount_amount) as discount_amount
from discount
group by d.item
) d
on sl.item = d.item
group by p.id, p.name;
The problem with your query is that the multiple rows of discount end up multiplying the rows of shopping_list -- resulting in the inaccurate totals.
Notice that in this query, the table aliases are abbreviations for the table names. This is a best practice that makes it much, much easier to follow the logic of a query.

LEFT JOIN help in sql

I have to make a list of customer who do not have any invoice but have paid an invoice … maybe twice.
But with my code (stated below) it contains everything from the left join. However I only need the lines highlighted with green.
How should I make a table with only the 2 highlights?
Select paymentsfrombank.invoicenumber,paymentsfrombank.customer,paymentsfrombank.value
FROM paymentsfrombank
LEFT OUTER JOIN debtors
ON debtors.value = paymentsfrombank.value

You only want to select columns from paymentsfrombank. So why do you even join?
select invoice_number, customer, value from paymentsfrombank
except
select invoice_number, customer, value from debtors;
(This requires exact matches as in your example, i.e. same amount for the invoice/customer).

There are two issues in your SQL. First, you need to join on Invoice number, not on value, as joining on value is pointless. Second, you need to only pick those payments where there are no corresponding debts, i.e. when you left-join, the table on the right has "null" in the joining column. The SQL would be something like this:
SELECT paymentsfrombank.invoicenumber,paymentsfrombank.customer,paymentsfrombank.value
FROM paymentsfrombank
LEFT OUTER JOIN debtors
ON debtors.InvoiceNumber = paymentsfrombank.InvoiceNumber
WHERE debtors.InvoiceNumber is NULL

in mysql we usually have this way to flip the relation and extract the rows that dosen't have relation.
Select paymentsfrombank.invoicenumber,paymentsfrombank.customer,paymentsfrombank.value
FROM paymentsfrombank
LEFT OUTER JOIN debtors
ON debtors.value = paymentsfrombank.value where debtors.value is null

You can use NOT EXISTS :
SELECT p.*
FROM paymentsfrombank p
WHERE NOT EXISTS (SELECT 1 FROM debtors d WHERE d.invoice_number = p.invoice_number);
However, the LEFT OUTER JOIN would also work if you add filtered with WHERE Clause to filtered out only missing customers that haven't any invoice information :
SELECT p.invoicenumber, p.customer, p.value
FROM paymentsfrombank P LEFT OUTER JOIN
debtors d
ON d.InvoiceNumber = p.InvoiceNumber
WHERE d.InvoiceNumber IS NULL;
Note : I have used table alias (p & d) that makes query to easier read & write.

SQL Get aggregate as 0 for non existing row using inner joins

I am using SQL Server to query these three tables that look like (there are some extra columns but not that relevant):
Customers -> Id, Name
Addresses -> Id, Street, StreetNo, CustomerId
Sales -> AddressId, Week, Total
And I would like to get the total sales per week and customer (showing at the same time the address details). I have come up with this query
SELECT a.Name, b.Street, b.StreetNo, c.Week, SUM (c.Total) as Total
FROM Customers a
INNER JOIN Addresses b ON a.Id = b.CustomerId
INNER JOIN Sales c ON b.Id = c.AddressId
GROUP BY a.Name, c.Week, b.Street, b.StreetNo
and even if my SQL skill are close to none it looks like it's doing its job. But now I would like to be able to show 0 whenever the one customer don't have sales for a particular week (weeks are just integers). And I wonder if somehow I should get distinct values of the weeks in the Sales table, and then loop through them (not sure how)
Any help?
Thanks

Use CROSS JOIN to generate the rows for all customers and weeks. Then use LEFT JOIN to bring in the data that is available:
SELECT c.Name, a.Street, a.StreetNo, w.Week,
COALESCE(SUM(s.Total), 0) as Total
FROM Customers c CROSS JOIN
(SELECT DISTINCT s.Week FROM sales s) w LEFT JOIN
Addresses a
ON c.CustomerId = a.CustomerId LEFT JOIN
Sales s
ON s.week = w.week AND s.AddressId = a.AddressId
GROUP BY c.Name, a.Street, a.StreetNo, w.Week;
Using table aliases is good, but the aliases should be abbreviations for the table names. So, a for Addresses not Customers.

You should generate a week numbers, rather than using DISTINCT. This is better in terms of performance and reliability. Then use a LEFT JOIN on the Sales table instead of an INNER JOIN:
SELECT a.Name
,b.Street
,b.StreetNo
,weeks.[Week]
,COALESCE(SUM(c.Total),0) as Total
FROM Customers a
INNER JOIN Addresses b ON a.Id = b.CustomerId
CROSS JOIN (
-- Generate a sequence of 52 integers (13 x 4)
SELECT ROW_NUMBER() OVER (ORDER BY a.x) AS [Week]
FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) a(x)
CROSS JOIN (SELECT x FROM (VALUES(1),(1),(1),(1)) b(x)) b
) weeks
LEFT JOIN Sales c ON b.Id = c.AddressId AND c.[Week] = weeek.[Week]
GROUP BY a.Name
,b.Street
,b.StreetNo
,weeks.[Week]

Please try the following...
SELECT Name,
Street,
StreetNo,
Week,
SUM( CASE
WHEN Total IS NULL THEN
0
ELSE
Total
END ) AS Total
FROM Customers a
JOIN Addresses b ON a.Id = b.CustomerId
RIGHT JOIN Sales c ON b.Id = c.AddressId
GROUP BY a.Name,
c.Week,
b.Street,
b.StreetNo;
I have modified your statement in three places. The first is I changed your join to Sales to a RIGHT JOIN. This will join as it would with an INNER JOIN, but it will also keep the records from the table on the right side of the JOIN that do not have a matching record or group of records on the left, placing NULL values in the resulting dataset's fields that would have come from the left of the JOIN. A LEFT JOIN works in the same way, but with any extra records in the table on the left being retained.
I have removed the word INNER from your surviving INNER JOIN. Where JOIN is not preceded by a join type, an INNER JOIN is performed. Both JOIN and INNER JOIN are considered correct, but the prevailing protocol seems to be to leave the INNER out, where the RDBMS allows it to be left out (which SQL-Server does). Which you go with is still entirely up to you - I have left it out here for illustrative purposes.
The third change is that I have added a CASE statement that tests to see if the Total field contains a NULL value, which it will if there were no sales for that Customer for that Week. If it does then SUM() would return a NULL, so the CASE statement returns a 0 instead. If Total does not contain a NULL value, then the SUM() of all values of Total for that grouping is performed.
Please note that I am assuming that Total will not have any NULL values other than from the RIGHT JOIN. Please advise me if this assumption is incorrect.
Please also note that I have assumed that either there will be no missing Weeks for a Customer in the Sales table or that you are not interested in listing them if there are. Again, please advise me if this assumption is incorrect.
If you have any questions or comments, then please feel free to post a Comment accordingly.

How can I join 3 tables and calculate the correct sum of fields from 2 tables, without duplicate rows?

I have tables A, B, C. Table A is linked to B, and table A is linked to C. I want to join the 3 tables and find the sum of B.cost and the sum of C.clicks. However, it is not giving me the expected value, and when I select everything without the group by, it is showing duplicate rows. I am expecting the row values from B to roll up into a single sum, and the row values from C to roll up into a single sum.
My query looks like
select A.*, sum(B.cost), sum(C.clicks) from A
join B
left join C
group by A.id
having sum(cost) > 10
I tried to group by B.a_id and C.another_field_in_a also, but that didn't work.
Here is a DB fiddle with all of the data and the full query:
http://sqlfiddle.com/#!9/768745/13
Notice how the sum fields are greater than the sum of the individual tables? I'm expecting the sums to be equal, containing only the rows of the table B and C once. I also tried adding distinct but that didn't help.
I'm using Postgres. (The fiddle is set to MySQL though.) Ultimately I will want to use a having clause to select the rows according to their sums. This query will be for millions of rows.

If I understand the logic correctly, the problem is the Cartesian product caused by the two joins. Your query is a bit hard to follow, but I think the intent is better handled with correlated subqueries:
select k.*,
(select sum(cost)
from ad_group_keyword_network n
where n.event_date >= '2015-12-27' and
n.ad_group_keyword_id = 1210802 and
k.id = n.ad_group_keyword_id
) as cost,
(select sum(clicks)
from keyword_click c
where (c.date is null or c.date >= '2015-12-27') and
k.keyword_id = c.keyword_id
) as clicks
from ad_group_keyword k
where k.status = 2 ;
Here is the corresponding SQL Fiddle.
EDIT:
The subselect should be faster than the group by on the unaggregated data. However, you need the right indexes: ad_group_keyword_network(ad_group_keyword_id, ad_group_keyword_id, event_date, cost) and keyword_click(keyword_id, date, clicks).

I found this (MySQL joining tables group by sum issue) and created a query like this
select *
from A
join (select B.a_id, sum(B.cost) as cost
from B
group by B.a_id) B on A.id = B.a_id
left join (select C.keyword_id, sum(C.clicks) as clicks
from C
group by C.keyword_id) C on A.keyword_id = C.keyword_id
group by A.id
having sum(cost) > 10
I don't know if it's efficient though. I don't know if it's more or less efficient than Gordon's. I ran both queries and this one seemed faster, 27s vs. 2m35s. Here is a fiddle: http://sqlfiddle.com/#!15/c61c74/10

Simply split the aggregate of the second table into a subquery as follows:
http://sqlfiddle.com/#!9/768745/27
select ad_group_keyword.*, SumCost, sum(keyword_click.clicks)
from ad_group_keyword
left join keyword_click on ad_group_keyword.keyword_id = keyword_click.keyword_id
left join (select ad_group_keyword.id, sum(cost) SumCost
from ad_group_keyword join ad_group_keyword_network on ad_group_keyword.id = ad_group_keyword_network.ad_group_keyword_id
where event_date >= '2015-12-27'
group by ad_group_keyword.id
having sum(cost) > 20
) Cost on Cost.id=ad_group_keyword.id
where
(keyword_click.date is null or keyword_click.date >= '2015-12-27')
and status = 2
group by ad_group_keyword.id

Sum Distinct Rows Only In Sql Server

I have four tables,in which First has one to many relation with rest of three tables named as (Second,Third,Fourth) respectively.I want to sum only Distinct Rows returned by select query.Here is my query, which i try so far.
select count(distinct First.Order_id) as [No.Of Orders],sum( First.Amount) as [Amount] from First
inner join Second on First.Order_id=Second.Order_id
inner join Third on Third.Order_id=Second.Order_id
inner join Fourth on Fourth.Order_id=Third.Order_id
The outcome of this query is :
No.Of Orders Amount
7 69
But this Amount should be 49,because the sum of First column Amount is 49,but due to inner join and one to many relationship,it calculate sum of also duplicate rows.How to avoid this.Kindly guide me

I think the problem is cartesian products in the joins (for a given id). You can solve this using row_number():
select count(t1234.Order_id) as [No.Of Orders], sum(t1234.Amount) as [Amount]
from (select First.*,
row_number() over (partition by First.Order_id order by First.Order_id) as seqnum
from First inner join
Second
on First.Order_id=Second.Order_id inner join
Third
on Third.Order_id=Second.Order_id inner join
Fourth
on Fourth.Order_id=Third.Order_id
) t1234
where seqnum = 1;
By the way, you could also express this using conditions in the where clause, because you appear to be using the joins only for filtering:
select count(First.Order_id) as [No.Of Orders], sum(First.Amount) as [Amount]
from First
where exists (select 1 from second where First.Order_id=Second.Order_id) and
exists (select 1 from third where First.Order_id=third.Order_id) and
exists (select 1 from fourth where First.Order_id=fourth.Order_id);

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas