I have a situation where I am bridging a 3rd party table between 2 other tables:
client (one-to-many) -> containers
containers -> containers_invoices (many-to-many) -> invoices
What I want is to get the SUM of the paid invoices for each client. The client is related to the containers, thus I have to connect the containers to the invoices and the clients to the containers to make the bridge. I used the following query to do so:
$sql = "
SELECT
SUM(invoices.invoice_eur) AS invoice_eur,
SUM(invoices.invoice_usd) AS invoice_usd,
invoices.status_id
FROM containers_invoices
LEFT JOIN invoices
ON containers_invoices.invoice_id = invoices.invoice_id
LEFT JOIN containers
ON containers_invoices.container_id = containers.container_id
WHERE containers.client_id = ".$client_id." AND invoices.status_id = ".$invoice_status."
GROUP BY containers.client_id
";
$x = $this->fetch_query($sql);
if (isset($x[0]->invoice_eur)) $eur = $x[0]->invoice_eur . ' EUR';
if (isset($x[0]->invoice_usd)) $usd = $x[0]->invoice_usd . ' USD';
if (isset($x[0]->invoice_eur) && isset($x[0]->invoice_usd)) $spacer = ' | ';
return $eur . $spacer . $usd;
Here is an example of what the invoices look like:
invoice 1 -> cont A, cont B -> 100 USD
invoice 2 -> cont A, cont B -> 7000 USD
invoice 3 -> cont A, cont B -> 75 USD
invoice 4 -> cont A, cont B -> 7000 USD
invoice 5 -> cont C -> 1000 USD
invoice 6 -> cont D -> 1000 USD
The issue is that when one invoice is made for 2 or more containers, its amount is added once per container. In the case of invoice 2 the query sees it as 14000 USD because there are 2 containers. Introducing DISTINCT before invoices.invoice_usd solves the doubling problem, but this approach is too aggressive: DISTINCT then looks at invoice 2 and invoice 4 (both 7000 USD), sees them as duplicates as well, and skips invoice 4. The same happens for invoices 5 and 6 (both 1000 USD).
Is there a possible solution to this? Thanks in advance!
Let's create the tables that are in the question here. For the sake of brevity, we'll use these abbreviations:
clid -> client_id
cid -> container_id
inv -> invoice_id
eur -> invoice_eur
usd -> invoice_usd
Let's create sample rows now:
containers
clid1, cid1
clid1, cid2
clid2, cid3
container_invoices
cid1, inv1
cid1, inv2
cid1, inv3
cid2, inv1
cid2, inv2
cid3, inv3
invoices
inv1, eur1, usd1
inv2, eur2, usd2
inv3, eur3, usd3
If we're to take the JOIN of these three tables on appropriate columns, a cross table that would be generated would look like this:
containers_container_invoices_invoices
clid1, cid1, inv1, eur1, usd1
clid1, cid1, inv2, eur2, usd2
clid1, cid1, inv3, eur3, usd3
clid1, cid2, inv1, eur1, usd1
clid1, cid2, inv2, eur2, usd2
clid2, cid3, inv3, eur3, usd3
As you can see in the table above, row 1 and row 4 carry the same invoice (inv1) for clid1, and so do row 2 and row 5 (inv2); they differ only in cid. If we take the sum grouping only by clid, inv1 and inv2 are counted twice for clid1. To pick each invoice only once we can use DISTINCT. Here we want the tuple (clid, inv) to be distinct. Since eur and usd are columns of invoices and therefore determined by inv, it is safe to require (clid, inv, eur, usd) to be distinct. So, the query would look like this:
select sum(invoice_eur) as invoice_eur, sum(invoice_usd) as invoice_usd
from (
select distinct containers.client_id, invoices.invoice_id, invoices.invoice_eur, invoices.invoice_usd
from container_invoices
left join invoices
on container_invoices.invoice_id = invoices.invoice_id
left join containers
on container_invoices.container_id = containers.container_id
where containers.client_id = ".$client_id." AND invoices.status_id = ".$invoice_status."
) client_invoices;
Notice that you don't need to do a GROUP BY here since you already have a WHERE clause on client_id. See this DB Fiddle for more clarity.
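To check the three behaviours side by side, here is a minimal sketch using Python's built-in sqlite3 module and the six invoices from the question. It assumes (this is not stated in the question) that containers A to D all belong to client 1 and that every invoice has status 1; the schema is reduced to the columns needed. It also passes the client and status as bound parameters instead of concatenating them into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE containers (container_id TEXT, client_id INTEGER);
CREATE TABLE invoices (invoice_id INTEGER, invoice_usd INTEGER, status_id INTEGER);
CREATE TABLE containers_invoices (container_id TEXT, invoice_id INTEGER);
-- Assumption: all four containers belong to client 1.
INSERT INTO containers VALUES ('A', 1), ('B', 1), ('C', 1), ('D', 1);
INSERT INTO invoices VALUES (1, 100, 1), (2, 7000, 1), (3, 75, 1),
                            (4, 7000, 1), (5, 1000, 1), (6, 1000, 1);
-- Invoices 1-4 cover containers A and B; invoice 5 covers C; invoice 6 covers D.
INSERT INTO containers_invoices VALUES
  ('A', 1), ('B', 1), ('A', 2), ('B', 2), ('A', 3), ('B', 3),
  ('A', 4), ('B', 4), ('C', 5), ('D', 6);
""")

# Shared FROM/WHERE part; the two ? placeholders are bound parameters.
joined = """
FROM containers_invoices ci
JOIN invoices i ON ci.invoice_id = i.invoice_id
JOIN containers c ON ci.container_id = c.container_id
WHERE c.client_id = ? AND i.status_id = ?
"""
params = (1, 1)

# 1) Plain SUM over the join: invoices 1-4 are counted once per container.
naive = conn.execute("SELECT SUM(i.invoice_usd) " + joined, params).fetchone()[0]

# 2) SUM(DISTINCT amount): collapses different invoices with equal amounts.
sum_distinct = conn.execute(
    "SELECT SUM(DISTINCT i.invoice_usd) " + joined, params
).fetchone()[0]

# 3) De-duplicate on (client_id, invoice_id) first, then sum.
deduplicated = conn.execute(
    "SELECT SUM(invoice_usd) FROM ("
    "SELECT DISTINCT c.client_id, i.invoice_id, i.invoice_usd " + joined + ")",
    params,
).fetchone()[0]

print(naive, sum_distinct, deduplicated)  # 30350 8175 16175
```

Only the third query returns the real total of the six invoices (100 + 7000 + 75 + 7000 + 1000 + 1000 = 16175).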
This can be achieved with DISTINCT.
How DISTINCT works:
DISTINCT looks for distinct values in the specified column.
Rows that repeat a value already seen in that column do not show up in the result. If multiple columns are specified in the SELECT statement, the combination of values across those columns has to be distinct for the row to count as distinct.
To your Example:
You select for invoice_eur, invoice_usd and status_id.
If you were to just add DISTINCT to your select statement, all rows with distinct combinations of these three columns will show up.
E.G. If you had one row with
invoice_eur: 100
invoice_usd: 100
status_id : paid
and a second row with
invoice_eur: 100
invoice_usd: 100
status_id : pending
both rows will show up. Even though invoice_eur and invoice_usd are identical for both rows, the combination of all three values is distinct.
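The two example rows can be tried out directly. The following is a small sketch using Python's sqlite3 module with a throwaway table holding only those two rows; storing status_id as text ('paid', 'pending') is an assumption made for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (invoice_eur INTEGER, invoice_usd INTEGER, status_id TEXT)"
)
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?)",
    [(100, 100, "paid"), (100, 100, "pending")],
)

# DISTINCT over all three columns: the combinations differ in status_id,
# so both rows are kept.
three_cols = conn.execute(
    "SELECT DISTINCT invoice_eur, invoice_usd, status_id FROM invoices"
).fetchall()

# DISTINCT over the two amount columns only: the combinations are equal,
# so the rows collapse into one.
two_cols = conn.execute(
    "SELECT DISTINCT invoice_eur, invoice_usd FROM invoices"
).fetchall()

print(len(three_cols), two_cols)  # 2 [(100, 100)]
```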
Solution for your problem:
For your Invoice 5 and your Invoice 6 the values for all specified columns are identical. Therefore they are not seen as distinct.
You can fix this by adding another column that makes Invoices 5 and 6 distinct. I would recommend invoice_id.
Invoices that come up more than once (because they are in multiple containers) are still filtered out, because the combination (invoice_eur, invoice_usd, status_id, invoice_id) is not distinct for these cases.
However, you should not put DISTINCT in front of your SUM.
That way the SUM would be calculated first and DISTINCT would be applied to the result of SUM. SUM only gives one result (row), so this is pointless.
One way around this is a nested SQL-statement.
I recreated your database and the Query that works for me is this one:
SELECT
SUM(invoice_eur) AS invoice_eur
FROM (
SELECT DISTINCT
invoices.invoice_id,
containers.client_id,
invoices.status_id,
invoices.invoice_eur AS invoice_eur
FROM dbo.containers_invoices
LEFT JOIN dbo.invoices
ON containers_invoices.invoice_id = invoices.invoice_id
LEFT JOIN dbo.containers
ON containers_invoices.container_id = containers.container_id) invoices_paid
GROUP BY client_id;
You would only have to add the rest of your select arguments and the WHERE clause.
I have clients who are placing deposits. Some of them are placing deposits of over 9000 USD, and I wanted to check what deposits they make after the date when they placed a 9000 USD deposit. Unfortunately, my join shows duplicates in column B when the condition based on column D is true. I would like to see each entry in column B only once, for the closest date in column D. I created the join below, but it is not working as expected:
SELECT a."ACCOUNT_ID", a."PROCESSED_DATE", a."AMOUNT_USD", b."PROCESSED_DATE" as date_transfer_over_9000
from deposits a
inner join (SELECT "ACCOUNT_ID", "PROCESSED_DATE"
FROM deposits
where "AMOUNT_USD" >= 9000) b ON
a."ACCOUNT_ID" = b."ACCOUNT_ID"
and a."PROCESSED_DATE" > b."PROCESSED_DATE"
It is duplicating entries in column B when the condition based on column D is true:
I would like to have a result like that:
Is this possible with an EXISTS condition or something else in Redshift?
I am having some troubles with a count function. The problem is given by a left join that I am not sure I am doing correctly.
Variables are:
Customer_name (buyer)
Product_code (what the customer buys)
Store (where the customer buys)
The datasets are:
Customer_df (list of customers and product codes of their purchases)
Store1_df (list of product codes per week, for Store 1)
Store2_df (list of product codes per day, for Store 2)
Final output desired:
I would like to have a table with:
col1: Customer_name;
col2: Count of items purchased in store 1;
col3: Count of items purchased in store 2;
Filters: date range
My query looks like this:
SELECT
DISTINCT
C.customer_name,
C.product_code,
COUNT(S1.product_code) AS s1_sales,
COUNT(S2.product_code) AS s2_sales
FROM customer_df C
LEFT JOIN store1_df S1 USING(product_code)
LEFT JOIN store2_df S2 USING(product_code)
GROUP BY
customer_name, product_code
HAVING
S1_sales > 0
OR S2_sales > 0
The output I expect is something like this:
Customer_name  Product_code  Store1_weekly_sales  Store2_weekly_sales
Luigi          120012        4                    8
James          100022        6                    10
But instead, I get:
Customer_name  Product_code  Store1_weekly_sales  Store2_weekly_sales
Luigi          120012        290                  60
James          100022        290                  60
It works when instead of COUNT(product_code) I do COUNT(DISTINCT product_code), but I would like to avoid that because I want to be able to aggregate over different timespans (e.g. if I do a count distinct and take into account more than 1 week of data, I will not get the right numbers).
My hypotheses are:
I am joining the tables in the wrong way
There is a problem when joining two datasets with different time aggregations
What am I doing wrong?
The reason, as Philipxy indicated, is a common one. You are getting a Cartesian result from your data, which bloats your numbers. To simplify, let's consider just a single customer purchasing one item from two stores. The first store has 3 purchases, the second store has 5 purchases. Your total count is 3 * 5 = 15, because each entry in the first store is joined to every entry for the same customer in the second. So the 1st purchase is joined to second-store purchases 1-5, then the 2nd purchase is joined to second-store purchases 1-5, and so on: that is the bloat. If instead each store is pre-queried with its aggregates per customer, there will be AT MOST one record per customer per store (and per product, as per your desired outcome):
select
c.customer_name,
AllCustProducts.Product_Code,
coalesce( PQStore1.SalesEntries, 0 ) Store1SalesEntries,
coalesce( PQStore2.SalesEntries, 0 ) Store2SalesEntries
from
customer_df c
-- now, we need all possible UNIQUE instances of
-- a given customer and product to prevent duplicates
-- for subsequent queries of sales per customer and store
JOIN
( select distinct customerid, product_code
from store1_df
union
select distinct customerid, product_code
from store2_df ) AllCustProducts
on c.customerid = AllCustProducts.customerid
-- NOW, we can join to a pre-query of sales at store 1
-- by customer id and product code. You may also want to
-- get sum( SalesDollars ) if available, just add respectively
-- to each sub-query below.
LEFT JOIN
( select
s1.customerid,
s1.product_code,
count(*) as SalesEntries
from
store1_df s1
group by
s1.customerid,
s1.product_code ) PQStore1
on AllCustProducts.customerid = PQStore1.customerid
AND AllCustProducts.product_code = PQStore1.product_code
-- now, same pre-aggregation to store 2
LEFT JOIN
( select
s2.customerid,
s2.product_code,
count(*) as SalesEntries
from
store2_df s2
group by
s2.customerid,
s2.product_code ) PQStore2
on AllCustProducts.customerid = PQStore2.customerid
AND AllCustProducts.product_code = PQStore2.product_code
No GROUP BY or HAVING is needed in the outer query, since each pre-aggregate yields at most 1 record per unique combination. As for filtering by date ranges, I would just add a WHERE clause within each of the AllCustProducts, PQStore1, and PQStore2 subqueries.
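The 3 * 5 example from the explanation can be reproduced with a small sketch in Python's sqlite3 module. It assumes, as the query above does, that both store tables carry a customerid column; the single customer and product are made up for illustration, and the query is a simplified version of the one above (it skips the AllCustProducts step because there is only one customer and one product).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_df (customerid INTEGER, customer_name TEXT);
CREATE TABLE store1_df (customerid INTEGER, product_code INTEGER);
CREATE TABLE store2_df (customerid INTEGER, product_code INTEGER);
INSERT INTO customer_df VALUES (1, 'Luigi');
""")
# One customer, one product: 3 purchases in store 1 and 5 in store 2.
conn.executemany("INSERT INTO store1_df VALUES (?, ?)", [(1, 120012)] * 3)
conn.executemany("INSERT INTO store2_df VALUES (?, ?)", [(1, 120012)] * 5)

# Joining both detail tables directly pairs every store-1 row with every
# store-2 row, so both counts come out as 3 * 5.
fan_out = conn.execute("""
SELECT COUNT(s1.product_code), COUNT(s2.product_code)
FROM customer_df c
LEFT JOIN store1_df s1 ON s1.customerid = c.customerid
LEFT JOIN store2_df s2 ON s2.customerid = c.customerid
                      AND s2.product_code = s1.product_code
""").fetchone()

# Pre-aggregating each store leaves at most one row per customer and
# product, so the joins no longer multiply anything.
pre_aggregated = conn.execute("""
SELECT c.customer_name, p1.SalesEntries, p2.SalesEntries
FROM customer_df c
LEFT JOIN (SELECT customerid, product_code, COUNT(*) AS SalesEntries
           FROM store1_df GROUP BY customerid, product_code) p1
       ON p1.customerid = c.customerid
LEFT JOIN (SELECT customerid, product_code, COUNT(*) AS SalesEntries
           FROM store2_df GROUP BY customerid, product_code) p2
       ON p2.customerid = c.customerid
      AND p2.product_code = p1.product_code
""").fetchall()

print(fan_out, pre_aggregated)  # (15, 15) [('Luigi', 3, 5)]
```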
I was wondering if it is possible to get 1 sql statement for my stocklevels of my different articles instead of doing that for all parts individually. This, to reduce the amount of communication with the server and to be more efficient.
The starting point is the next statement:
SELECT SUM(STOCKIN.QUANTITY)- SUM(STOCKOUT.QUANTITY)
FROM STOCKIN
INNER JOIN STOCKOUT ON STOCKIN.FK_LOT=STOCKOUT.FK_LOT
WHERE FK_LOT = 123456789
For article 123456789, this gives the difference between the 2 tables (StockIN and StockOUT). This is the stock level.
SELECT SUM(STOCKIN.QUANTITY)- SUM(STOCKOUT.QUANTITY)
FROM STOCKIN
INNER JOIN STOCKOUT ON STOCKIN.FK_LOT=STOCKOUT.FK_LOT
WHERE FK_LOT IN (1234567,4567,654321,2345)
This one gives the difference between the tables (stockIN and StockOUT) of a couple of articles combined. The result will be 1 number.
What I am looking for is the amount of stock for each article in 1 SQL statement:
1234567 = A
4567 = B
654321 = C
2345 = D
Is that possible or do I have to execute the first SQL a lot of times for all the different articles?
EDIT: (I do not know whether I should do it like this on this forum or whether I may use the reply button... I know the moderation on this forum is strict.)
I have added GROUP BY and that works. But...
Other strange things happen:
I understand that the SQL below is not logical, but it is a reduction of my initial SQL. It just gives a strange result, and therefore my big SQL goes wrong.
Even when reducing the SQL to:
SELECT
SUM(R_STOCKIN.QUANTITY)
From R_STOCKIN INNER Join R_STOCKOUT ON R_STOCKIN.FK_LOT=R_STOCKOUT.FK_LOT
WHERE R_STOCKIN.FK_LOT =1350
gives a different result than:
SELECT sum(QUANTITY)
FROM [Speeltuin].[dbo].[R_STOCKIN] WHERE FK_LOT = 1350
It is a bigger number, but it does not add the QUANTITY of the stock-out table... I cannot find out what it is doing.
Sum of stock in: 144
Sum of stock out: 122
Result of combined query: 864..
Anybody an idea?
It probably has to do with the fact that STOCKOUT also has a key FK_STOCKIN.
STOCKOUT has 6 rows and STOCKIN has 2 rows. The join combines them into 12 rows.
But how do I overcome this? Does anybody have an idea?
Does it need to be done without the JOIN statement? If so, how?
Simply GROUP BY:
SELECT STOCKIN.FK_LOT, SUM(STOCKIN.QUANTITY) - SUM(STOCKOUT.QUANTITY)
FROM STOCKIN
INNER JOIN STOCKOUT ON STOCKIN.FK_LOT = STOCKOUT.FK_LOT
WHERE STOCKIN.FK_LOT IN (1234567, 4567, 654321, 2345)
GROUP BY STOCKIN.FK_LOT
Edit: Since the join multiplies rows when a lot has several rows in each table, do a UNION ALL instead, using negative QUANTITY values for STOCKOUT, and GROUP BY the result:
select FK_LOT, SUM(QUANTITY)
from
(
select FK_LOT, QUANTITY from STOCKIN
UNION ALL
select FK_LOT, -QUANTITY from STOCKOUT
) dt
group by FK_LOT
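The numbers from the question (144 in, 122 out, 864 from the joined query) can be reproduced with a small sketch in Python's sqlite3 module. The individual quantities below are hypothetical; they are only chosen so that lot 1350 has 2 stock-in rows summing to 144 and 6 stock-out rows summing to 122.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STOCKIN (FK_LOT INTEGER, QUANTITY INTEGER);
CREATE TABLE STOCKOUT (FK_LOT INTEGER, QUANTITY INTEGER);
""")
# Hypothetical rows: 2 stock-in rows (total 144), 6 stock-out rows (total 122).
conn.executemany("INSERT INTO STOCKIN VALUES (?, ?)", [(1350, 100), (1350, 44)])
conn.executemany(
    "INSERT INTO STOCKOUT VALUES (?, ?)",
    [(1350, q) for q in (20, 20, 20, 20, 20, 22)],
)

# The join pairs each of the 2 stock-in rows with each of the 6 stock-out
# rows, so every stock-in quantity is summed 6 times.
joined_rows, joined_sum = conn.execute("""
SELECT COUNT(*), SUM(STOCKIN.QUANTITY)
FROM STOCKIN
INNER JOIN STOCKOUT ON STOCKIN.FK_LOT = STOCKOUT.FK_LOT
WHERE STOCKIN.FK_LOT = 1350
""").fetchone()

# UNION ALL stacks the rows instead of pairing them; stock-out counts
# negatively, so the grouped sum is the stock level per lot.
stock_levels = conn.execute("""
SELECT FK_LOT, SUM(QUANTITY)
FROM (
    SELECT FK_LOT, QUANTITY FROM STOCKIN
    UNION ALL
    SELECT FK_LOT, -QUANTITY FROM STOCKOUT
) dt
GROUP BY FK_LOT
""").fetchall()

print(joined_rows, joined_sum, stock_levels)  # 12 864 [(1350, 22)]
```

The joined sum is 144 * 6 = 864, which matches the unexpected result in the question, while the UNION ALL version gives 144 - 122 = 22.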
I have two tables that I want to join together:
contracts:
id  customer_id_1  customer_id_2  customer_id_3  date
1   MAIN1          TRAN1          TRAN2          20201101
2   MAIN2                                        20201001
3   MAIN3          TRAN5                         20200901
4   MAIN4          TRAN7          TRAN8          20200801
customers:
id  customer_id  info  date
1   MAIN1        blah  20200930
2   TRAN2        blah  20200929
3   TRAN5        blah  20200831
4   TRAN7        blah  20200801
In my contracts table, each row represents a contract with a customer, who may have 1 or more different IDs they are referred to by in the customers table. In the customers table, I have info on customers (can be zero or multiple records on different dates for each customer). I want to perform a join from contracts onto customers such that I get the most recent info available on a customer at the time a contract is recorded, ignoring any potential customer info that may be available after the contract date. I am also not interested in contracts which have no info on the customers. The main problem here is that in customers, each customer record can reference any 1 of the 3 IDs that may exist.
I currently have the following query, which performs the task as intended, but the problem is that it is extremely slow when run on data in the 50-100k rows range. If I remove the OR conditions in the INNER JOIN and just join on the first ID, the query completes in seconds as opposed to ~ half an hour.
SELECT
DISTINCT ON (ctr.id)
ctr.id,
ctr.customer_id_1,
ctr.date AS contract_date,
cst.info,
cst.date AS info_date
FROM
contracts ctr
INNER JOIN customers cst ON (
cst.customer_id = ctr.customer_id_1
OR cst.customer_id = ctr.customer_id_2
OR cst.customer_id = ctr.customer_id_3
)
AND ctr.date >= cst.date
ORDER BY
ctr.id,
cst.date DESC
Result:
id  customer_id_1  contract_date  info  info_date
1   MAIN1          20201101       blah  20200930
3   MAIN3          20200901       blah  20200831
4   MAIN4          20200801       blah  20200801
It seems like OR statements in JOINs aren't very common (I've barely found any examples online) and I presume this is because there must be a better way of doing this. So my question is, how can this be optimised?
OR often is a performance killer in SQL predicates.
One alternative unpivots before joining:
select distinct on (ctr.id)
ctr.id,
ctr.customer_id_1,
ctr.date as contract_date,
cst.info,
cst.date as info_date
from contracts ctr
cross join lateral (values
(ctr.customer_id_1), (ctr.customer_id_2), (ctr.customer_id_3)
) as ctx(customer_id)
inner join customers cst on cst.customer_id = ctx.customer_id and ctr.date >= cst.date
order by ctr.id, cst.date desc
The use of this technique shows that you could vastly improve your data model: the relation between contracts and customers should be stored in a separate table, with each customer/contract tuple on a separate row. Essentially, what the query does is virtually build that derived table in the lateral join.
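As an illustration of the suggested data model, here is a minimal sketch in Python's sqlite3 module. The junction table name contract_customers is hypothetical, and because SQLite has neither LATERAL nor DISTINCT ON, the "first row per contract" step is done in Python on rows ordered the same way as in the query above; in PostgreSQL it would stay a DISTINCT ON query joined through the junction table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contracts (id INTEGER PRIMARY KEY, date TEXT);
-- Hypothetical junction table: one row per contract/customer-ID pair.
CREATE TABLE contract_customers (contract_id INTEGER, customer_id TEXT);
CREATE TABLE customers (id INTEGER PRIMARY KEY, customer_id TEXT, info TEXT, date TEXT);

INSERT INTO contracts VALUES
  (1, '20201101'), (2, '20201001'), (3, '20200901'), (4, '20200801');
INSERT INTO contract_customers VALUES
  (1, 'MAIN1'), (1, 'TRAN1'), (1, 'TRAN2'),
  (2, 'MAIN2'),
  (3, 'MAIN3'), (3, 'TRAN5'),
  (4, 'MAIN4'), (4, 'TRAN7'), (4, 'TRAN8');
INSERT INTO customers VALUES
  (1, 'MAIN1', 'blah', '20200930'), (2, 'TRAN2', 'blah', '20200929'),
  (3, 'TRAN5', 'blah', '20200831'), (4, 'TRAN7', 'blah', '20200801');
""")

# Plain equality joins through the junction table, no OR in the join
# condition. Dates are YYYYMMDD strings, so string comparison orders them.
rows = conn.execute("""
SELECT ctr.id, ctr.date, cst.info, cst.date
FROM contracts ctr
JOIN contract_customers cc ON cc.contract_id = ctr.id
JOIN customers cst ON cst.customer_id = cc.customer_id
                  AND ctr.date >= cst.date
ORDER BY ctr.id, cst.date DESC
""").fetchall()

# Emulate DISTINCT ON (ctr.id): keep the first (most recent) row per contract.
latest = {}
for contract_id, contract_date, info, info_date in rows:
    latest.setdefault(contract_id, (contract_date, info, info_date))

print(latest)
```

With the sample data this keeps contracts 1, 3 and 4 with info dates 20200930, 20200831 and 20200801, the same rows as in the expected result; contract 2 drops out because it has no customer info.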
I'm trying to create a view using a double join between tables.
I'm working on some travel software, managing holiday bookings. The different items a person pays for can be in different currencies.
I've a table of bookings, and a table of currencies.
There are many different items a person can pay for, all stored in different tables. I've created a view showing the total owed per payment Item type.
e.g. owed for Transfers:
BookingID CurrencyID TotalTransfersPrice
1 1 340.00
2 1 120.00
2 2 100.00
e.g. owed for Extras:
BookingID CurrencyID TotalExtrasPrice
1 1 200.00
1 2 440.00
2 1 310.00
All is good so far.
What I'd like to do is to create a master view that brings this all together:
BookingID CurrencyID TotalExtrasPrice TotalTransfersPrice
1 1 200.00 340.00
1 2 440.00 NULL
2 1 310.00 120.00
2 2 NULL 100.00
I can't figure out how to make the above. I've been experimenting with double joins, as I'm guessing I need to do joins both for the BookingID and the CurrencyID?
Any ideas?
Thanks!
Phil.
For SQL Server:
This query allows each {BookingId, CurrencyId} pair to have more than one row in the Transfers and Extras tables. Since you stated "I've created a view showing the total owed per payment Item type", I'm accumulating them by BookingID and CurrencyID:
SELECT ISNULL(transfers.BookingId, extras.BookingId) AS BookingId,
ISNULL(transfers.CurrencyId, extras.CurrencyId) AS CurrencyId,
SUM(extras.TotalExtrasPrice) AS TotalExtrasPrice,
SUM(transfers.TotalTransfersPrice) AS TotalTransfersPrice
FROM transfers
FULL OUTER JOIN extras ON transfers.BookingId = extras.BookingId AND transfers.CurrencyId = extras.CurrencyId
GROUP BY ISNULL(transfers.BookingId, extras.BookingId), ISNULL(transfers.CurrencyId, extras.CurrencyId)
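You should use a full outer join to combine the two tables, Transfers and Extras. Note that MySQL does not support FULL OUTER JOIN, so this applies to platforms that do (for example SQL Server or PostgreSQL). COALESCE keeps the key columns populated for rows that exist only in Extras:
SELECT COALESCE(t.BookingId, e.BookingId) AS BookingId,
COALESCE(t.CurrencyId, e.CurrencyId) AS CurrencyId,
e.TotalExtrasPrice, t.TotalTransfersPrice
FROM transfers AS t FULL OUTER JOIN extras AS e
ON t.BookingId = e.BookingId AND t.CurrencyId = e.CurrencyId;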
Use joins:
select t.BookingId,t.CurrencyId,e.TotalExtrasPrice,t.TotalTransfersPrice
from transfers as t
join extras as e
on t.BookingId = e.BookingId and t.CurrencyId = e.CurrencyId
If you want to cover the case where a combination of BookingID and CurrencyID only exist in either Transfers or Extras and you still want to include them in the result (rather than finding the intersect) this query will do that:
SELECT IDs.BookingId, IDs.CurrencyID, e.TotalExtrasPrice,t.TotalTransfersPrice
FROM (
SELECT BookingId,CurrencyId FROM transfers
UNION
SELECT BookingId,CurrencyId FROM extras
) IDs
LEFT JOIN transfers t ON IDs.BookingId=t.BookingId AND IDs.CurrencyID=t.CurrencyID
LEFT JOIN extras e ON IDs.BookingId=e.BookingId AND IDs.CurrencyID=e.CurrencyID
This query will produce a result identical to your example.
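To verify this against the sample data from the question, here is a small sketch in Python's sqlite3 module. The UNION plus two LEFT JOINs shape is useful here because it also runs on engines without FULL OUTER JOIN; the ORDER BY is added only to make the output order deterministic, and the table definitions are assumptions reduced to the three columns shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transfers (BookingId INTEGER, CurrencyId INTEGER, TotalTransfersPrice REAL);
CREATE TABLE extras (BookingId INTEGER, CurrencyId INTEGER, TotalExtrasPrice REAL);
INSERT INTO transfers VALUES (1, 1, 340.0), (2, 1, 120.0), (2, 2, 100.0);
INSERT INTO extras VALUES (1, 1, 200.0), (1, 2, 440.0), (2, 1, 310.0);
""")

# Build the full set of (BookingId, CurrencyId) keys first, then attach
# each totals table with a LEFT JOIN so missing sides come back as NULL.
master = conn.execute("""
SELECT ids.BookingId, ids.CurrencyId, e.TotalExtrasPrice, t.TotalTransfersPrice
FROM (
    SELECT BookingId, CurrencyId FROM transfers
    UNION
    SELECT BookingId, CurrencyId FROM extras
) ids
LEFT JOIN transfers t ON ids.BookingId = t.BookingId AND ids.CurrencyId = t.CurrencyId
LEFT JOIN extras e ON ids.BookingId = e.BookingId AND ids.CurrencyId = e.CurrencyId
ORDER BY ids.BookingId, ids.CurrencyId
""").fetchall()

for row in master:
    print(row)
# (1, 1, 200.0, 340.0)
# (1, 2, 440.0, None)
# (2, 1, 310.0, 120.0)
# (2, 2, None, 100.0)
```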
This works. All you need is a simple full outer join.
SELECT "BookingID", "CurrencyID",
ext."TotalExtrasPrice", trans."TotalTransfersPrice"
FROM Transfers trans FULL OUTER JOIN Extras ext
USING ("BookingID", "CurrencyID");
SQLFiddle demo using Oracle.