How to show record only once when condition is true? - sql

I have clients who place deposits. Some of them place deposits of 9000 USD or more, and I want to see which deposits they make after the date of that 9000 USD deposit. Unfortunately, my join shows duplicates in column B when the condition based on column D is true. I would like each entry in column B to appear only once, paired with the closest date in column D. I wrote the following join, but it does not work as expected:
SELECT a."ACCOUNT_ID", a."PROCESSED_DATE", a."AMOUNT_USD", b."PROCESSED_DATE" as date_transfer_over_9000
from deposits a
inner join (SELECT "ACCOUNT_ID", "PROCESSED_DATE"
FROM deposits
where "AMOUNT_USD" >= 9000) b ON
a."ACCOUNT_ID" = b."ACCOUNT_ID"
and a."PROCESSED_DATE" > b."PROCESSED_DATE"
It is duplicating entries in column B when the condition based on column D is true:
I would like to have a result like that:
Is this possible with EXISTS or some other function in Redshift?


SUM for 3rd party table in many to many relationship

I have a situation where I am bridging a 3rd party table between 2 other tables:
client (one-to-many) -> containers
containers -> containers_invoices (many-to-many) -> invoices
What I want is to get the SUM of the paid invoices for each client. The client is related to the containers, thus I have to connect the containers to the invoices and the clients to the containers to make the bridge. I used the following query to do so:
$sql = "
SELECT
SUM(invoices.invoice_eur) AS invoice_eur,
SUM(invoices.invoice_usd) AS invoice_usd,
invoices.status_id
FROM containers_invoices
LEFT JOIN invoices
ON containers_invoices.invoice_id = invoices.invoice_id
LEFT JOIN containers
ON containers_invoices.container_id = containers.container_id
WHERE containers.client_id = ".$client_id." AND invoices.status_id = ".$invoice_status."
GROUP BY containers.client_id
";
$x = $this->fetch_query($sql);
if (isset($x[0]->invoice_eur)) $eur = $x[0]->invoice_eur . ' EUR';
if (isset($x[0]->invoice_usd)) $usd = $x[0]->invoice_usd . ' USD';
if (isset($x[0]->invoice_eur) && isset($x[0]->invoice_usd)) $spacer = ' | ';
return $eur . $spacer . $usd;
Here is an example of what the invoices look like:
invoice 1 -> cont A, cont B -> 100 USD
invoice 2 -> cont A, cont B -> 7000 USD
invoice 3 -> cont A, cont B -> 75 USD
invoice 4 -> cont A, cont B -> 7000 USD
invoice 5 -> cont C -> 1000 USD
invoice 6 -> cont D -> 1000 USD
The issue is that when one invoice is made for 2 or more containers, the sum is calculated for each individually. In the case of invoice 2 the query sees it as 14000 USD because there are 2 containers. The solution is to introduce DISTINCT before invoices.invoice_usd and that solves the doubling problem, but then this approach is too aggressive because then DISTINCT looks at invoice 2 and invoice 4 (7000 USD) and sees them as double as well, and thus it skips invoice 4. The same happens for invoice 5 and 6 (1000 USD).
Is there a possible solution to this? Thanks in advance!
Let's create the tables that are in the question here. For the sake of brevity, we'll use these abbreviations:
clid -> client_id
cid -> container_id
inv -> invoice_id
eur -> invoice_eur
usd -> invoice_usd
Let's create sample rows now:
containers
clid1, cid1
clid1, cid2
clid2, cid3
container_invoices
cid1, inv1
cid1, inv2
cid1, inv3
cid2, inv1
cid2, inv2
cid3, inv3
invoices
inv1, eur1, usd1
inv2, eur2, usd2
inv3, eur3, usd3
If we're to take the JOIN of these three tables on appropriate columns, a cross table that would be generated would look like this:
containers_container_invoices_invoices
clid1, cid1, inv1, eur1, usd1
clid1, cid1, inv2, eur2, usd2
clid1, cid1, inv3, eur3, usd3
clid1, cid2, inv1, eur1, usd1
clid1, cid2, inv2, eur2, usd2
clid2, cid3, inv3, eur3, usd3
As you can see in the table above, rows 1 and 4 carry the same (clid, inv, eur, usd) values, and so do rows 2 and 5; they differ only in cid. If we take the sum grouping only by clid, the inv1 and inv2 entries are counted twice for clid1. To pick each invoice only once per client, we can use DISTINCT. Here we want the tuple (clid, inv) to be distinct. Since eur and usd are columns of invoices and are therefore determined by inv, it is safe to require (clid, inv, eur, usd) to be distinct. So, the query would look like this:
select sum(invoice_eur) as invoice_eur, sum(invoice_usd) as invoice_usd
from (
select distinct containers.client_id, invoices.invoice_id, invoices.invoice_eur, invoices.invoice_usd
from container_invoices
left join invoices
on container_invoices.invoice_id = invoices.invoice_id
left join containers
on container_invoices.container_id = containers.container_id
where containers.client_id = ".$client_id." AND invoices.status_id = ".$invoice_status."
) client_invoices;
Notice that you don't need to do a GROUP BY here since you already have a WHERE clause on client_id. See this DB Fiddle for more clarity.
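To make the difference between the three variants concrete, here is a minimal sketch that rebuilds the invoice example from the question in an in-memory SQLite database. The schema, the single client (client_id 1, status_id 1) and the assumption that containers C and D belong to that same client are made up for illustration; only the invoice amounts come from the question.

```python
import sqlite3

# Hypothetical data: six invoices for one client (client_id 1, status_id 1).
# Invoices 1-4 each cover containers A and B; invoices 5 and 6 cover C and D.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE containers (container_id TEXT, client_id INTEGER);
CREATE TABLE invoices (invoice_id INTEGER, invoice_usd INTEGER, status_id INTEGER);
CREATE TABLE containers_invoices (container_id TEXT, invoice_id INTEGER);
INSERT INTO containers VALUES ('A', 1), ('B', 1), ('C', 1), ('D', 1);
INSERT INTO invoices VALUES (1, 100, 1), (2, 7000, 1), (3, 75, 1),
                            (4, 7000, 1), (5, 1000, 1), (6, 1000, 1);
INSERT INTO containers_invoices VALUES
    ('A', 1), ('B', 1), ('A', 2), ('B', 2), ('A', 3), ('B', 3),
    ('A', 4), ('B', 4), ('C', 5), ('D', 6);
""")

# Shared FROM/WHERE part; the two values are bound as parameters.
JOINED = """
    FROM containers_invoices ci
    JOIN invoices i ON ci.invoice_id = i.invoice_id
    JOIN containers c ON ci.container_id = c.container_id
    WHERE c.client_id = ? AND i.status_id = ?
"""
params = (1, 1)

# Plain SUM: every invoice is counted once per container it covers.
naive = conn.execute("SELECT SUM(i.invoice_usd)" + JOINED, params).fetchone()[0]

# SUM(DISTINCT amount): also collapses different invoices with equal amounts.
by_amount = conn.execute(
    "SELECT SUM(DISTINCT i.invoice_usd)" + JOINED, params
).fetchone()[0]

# DISTINCT on (client_id, invoice_id, amount) first, then SUM.
per_invoice = conn.execute(
    "SELECT SUM(invoice_usd) FROM ("
    "SELECT DISTINCT c.client_id, i.invoice_id, i.invoice_usd"
    + JOINED + ") AS client_invoices",
    params,
).fetchone()[0]

print(naive, by_amount, per_invoice)
```

With this data the plain SUM gives 30350 (invoices 1 to 4 counted twice), SUM(DISTINCT ...) gives 8175 (invoices 4 and 6 dropped), and the DISTINCT subquery gives 16175, which is the sum of the six invoices.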
This can be achieved with DISTINCT.
How DISTINCT works:
DISTINCT looks for distinct values in the specified column.
Rows whose value in that column duplicates another row do not show up in the result. If multiple columns are specified in the SELECT statement, the combination of values across those columns has to be distinct for the row to count as distinct.
To your Example:
You select for invoice_eur, invoice_usd and status_id.
If you were to just add DISTINCT to your SELECT statement, all rows with distinct combinations of these three columns would show up.
E.g., if you had one row with
invoice_eur: 100
invoice_usd: 100
status_id : paid
and a second row with
invoice_eur: 100
invoice_usd: 100
status_id : pending
both rows will show up. Even though invoice_eur and invoice_usd are identical for both rows, the combination of all three values is distinct.
Solution for your problem:
For your Invoice 5 and your Invoice 6, the values of all specified columns are identical. Therefore they are not seen as distinct.
You can fix this by specifying another column that makes Invoice 5 and 6 distinct. I would recommend invoice_id.
This makes sure that invoices that come up more than once (because they are in multiple containers) are still filtered out, because the combination (invoice_eur, invoice_usd, status_id, invoice_id) is not distinct in those cases.
However, you should not put DISTINCT in front of your SUM.
That way the SUM would be calculated first and DISTINCT would be applied to the result of the SUM. SUM only gives one result (row), so this is pointless.
One way around this is a nested SQL-statement.
I recreated your database and the Query that works for me is this one:
SELECT
SUM(invoice_eur) AS invoice_eur
FROM (
SELECT DISTINCT
invoices.invoice_id,
containers.client_id,
invoices.status_id,
invoices.invoice_eur AS invoice_eur
FROM dbo.containers_invoices
LEFT JOIN dbo.invoices
ON containers_invoices.invoice_id = invoices.invoice_id
LEFT JOIN dbo.containers
ON containers_invoices.container_id = containers.container_id) invoices_paid
GROUP BY client_id;
You would only have to add the rest of your select arguments and the WHERE clause.

COUNT with multiple LEFT joins [duplicate]

I am having some trouble with a COUNT function. The problem seems to come from a left join that I am not sure I am doing correctly.
Variables are:
Customer_name (buyer)
Product_code (what the customer buys)
Store (where the customer buys)
The datasets are:
Customer_df (list of customers and product codes of their purchases)
Store1_df (list of product codes per week, for Store 1)
Store2_df (list of product codes per day, for Store 2)
Final output desired:
I would like to have a table with:
col1: Customer_name;
col2: Count of items purchased in store 1;
col3: Count of items purchased in store 2;
Filters: date range
My query looks like this:
SELECT
DISTINCT
C.customer_name,
C.product_code,
COUNT(S1.product_code) AS s1_sales,
COUNT(S2.product_code) AS s2_sales
FROM customer_df C
LEFT JOIN store1_df S1 USING(product_code)
LEFT JOIN store2_df S2 USING(product_code)
GROUP BY
customer_name, product_code
HAVING
S1_sales > 0
OR S2_sales > 0
The output I expect is something like this:
Customer_name  Product_code  Store1_weekly_sales  Store2_weekly_sales
Luigi          120012        4                    8
James          100022        6                    10
But instead, I get:
Customer_name  Product_code  Store1_weekly_sales  Store2_weekly_sales
Luigi          120012        290                  60
James          100022        290                  60
It works when I use COUNT(DISTINCT product_code) instead of COUNT(product_code), but I would like to avoid that because I want to be able to aggregate over different timespans (e.g. if I do a count distinct and take more than 1 week of data into account, I will not get the right numbers).
My hypotheses are:
I am joining the tables in the wrong way
There is a problem when joining two datasets with different time aggregations
What am I doing wrong?
The reason, as Philipxy indicated, is a common one. You are getting a Cartesian result from your data, which bloats your numbers. To simplify, let's consider a single customer purchasing one item from two stores. The first store has 3 purchases, the second store has 5 purchases. Your total count is 3 * 5 = 15, because each entry in the first store is joined to every entry for the same customer in the second. The 1st purchase is joined to second-store purchases 1-5, then the 2nd purchase is joined to second-store purchases 1-5, and so on, and you can see the bloat. If instead each store is pre-queried to aggregate per customer, there will be AT MOST one record per customer per store (and per product, as per your desired outcome).
select
c.customer_name,
AllCustProducts.Product_Code,
coalesce( PQStore1.SalesEntries, 0 ) Store1SalesEntries,
coalesce( PQStore2.SalesEntries, 0 ) Store2SalesEntries
from
customer_df c
-- now, we need all possible UNIQUE instances of
-- a given customer and product to prevent duplicates
-- for subsequent queries of sales per customer and store
JOIN
( select distinct customerid, product_code
from store1_df
union
select distinct customerid, product_code
from store2_df ) AllCustProducts
on c.customerid = AllCustProducts.customerid
-- NOW, we can join to a pre-query of sales at store 1
-- by customer id and product code. You may also want to
-- get sum( SalesDollars ) if available, just add respectively
-- to each sub-query below.
LEFT JOIN
( select
s1.customerid,
s1.product_code,
count(*) as SalesEntries
from
store1_df s1
group by
s1.customerid,
s1.product_code ) PQStore1
on AllCustProducts.customerid = PQStore1.customerid
AND AllCustProducts.product_code = PQStore1.product_code
-- now, same pre-aggregation to store 2
LEFT JOIN
( select
s2.customerid,
s2.product_code,
count(*) as SalesEntries
from
store2_df s2
group by
s2.customerid,
s2.product_code ) PQStore2
on AllCustProducts.customerid = PQStore2.customerid
AND AllCustProducts.product_code = PQStore2.product_code
No need for a GROUP BY or HAVING, since all entries in their respective pre-aggregates will result in a maximum of 1 record per unique combination. As for filtering by date range, I would just add a WHERE clause within each of the AllCustProducts, PQStore1, and PQStore2 subqueries.
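The 3 * 5 bloat and the effect of pre-aggregating can be checked with a small sketch in an in-memory SQLite database. The rows are invented, and the customerid column on the store tables is the same assumption the query above makes; it is not shown in the question.

```python
import sqlite3

# Hypothetical data: one customer, one product,
# 3 purchases in store 1 and 5 purchases in store 2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_df (customerid INTEGER, customer_name TEXT);
CREATE TABLE store1_df (customerid INTEGER, product_code INTEGER);
CREATE TABLE store2_df (customerid INTEGER, product_code INTEGER);
INSERT INTO customer_df VALUES (1, 'Luigi');
INSERT INTO store1_df VALUES (1, 120012), (1, 120012), (1, 120012);
INSERT INTO store2_df VALUES (1, 120012), (1, 120012), (1, 120012),
                             (1, 120012), (1, 120012);
""")

# Joining both raw tables pairs each store-1 row with each store-2 row.
inflated = conn.execute("""
    SELECT COUNT(s1.product_code), COUNT(s2.product_code)
    FROM customer_df c
    LEFT JOIN store1_df s1 ON s1.customerid = c.customerid
    LEFT JOIN store2_df s2 ON s2.customerid = c.customerid
                          AND s2.product_code = s1.product_code
    GROUP BY c.customer_name
""").fetchone()

# Pre-aggregating each store yields at most one row per customer and product,
# so the joins can no longer multiply rows.
pre_aggregated = conn.execute("""
    SELECT c.customer_name, p.product_code,
           COALESCE(q1.n, 0), COALESCE(q2.n, 0)
    FROM customer_df c
    JOIN (SELECT customerid, product_code FROM store1_df
          UNION
          SELECT customerid, product_code FROM store2_df) p
      ON p.customerid = c.customerid
    LEFT JOIN (SELECT customerid, product_code, COUNT(*) AS n
               FROM store1_df GROUP BY customerid, product_code) q1
      ON q1.customerid = p.customerid AND q1.product_code = p.product_code
    LEFT JOIN (SELECT customerid, product_code, COUNT(*) AS n
               FROM store2_df GROUP BY customerid, product_code) q2
      ON q2.customerid = p.customerid AND q2.product_code = p.product_code
""").fetchall()

print(inflated, pre_aggregated)
```

Here the direct double join reports 15 for both stores, while the pre-aggregated version reports 3 and 5.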

Pull column value from second to last row

I'm stuck trying to figure out a game plan for this in SQL. Below is my sample data. I'm trying to create another column called "Starting Balance", which would be the amount in "Ending Balance" for the previous LINE. Once I have that, I would like to display only the rows where REASON = Count and drop the rest.
I can't even fathom what approach to take and any advice would be appreciated.
Sample Data:
ITEM ID  ITEM    LAST UPDATED  REASON      ENDING BALANCE  LINE
123      Pencil  9/1/2020      Correction  400             1
123      Pencil  9/2/2020      Correction  450             2
123      Pencil  9/3/2020      Count       500             3
Expected Output:
ITEM ID  ITEM    LAST UPDATED  REASON  Starting Balance  ENDING BALANCE
123      Pencil  9/3/2020      Count   450               500
If "previous LINE" means the row whose LAST_UPDATED immediately precedes the current row's (your_table is a placeholder for the real table name):
select * from (
select * , lag(ENDING_BALANCE,1,0) over (partition by ITEM_ID order by LAST_UPDATED) as Starting_Balance
from your_table
) t where t.REASON = 'Count'
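As a quick check of the LAG approach, here is a sketch that loads the sample rows into an in-memory SQLite database (window functions need SQLite 3.25 or newer). The table name item_history, the underscore column names and the ISO date strings are assumptions for illustration. This variant orders by LINE, since the question defines "previous" in terms of LINE. The filter on REASON sits in the outer query so that LAG still sees the Correction rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item_history (
    ITEM_ID INTEGER, ITEM TEXT, LAST_UPDATED TEXT,
    REASON TEXT, ENDING_BALANCE INTEGER, LINE INTEGER
);
INSERT INTO item_history VALUES
    (123, 'Pencil', '2020-09-01', 'Correction', 400, 1),
    (123, 'Pencil', '2020-09-02', 'Correction', 450, 2),
    (123, 'Pencil', '2020-09-03', 'Count',      500, 3);
""")

rows = conn.execute("""
    SELECT ITEM_ID, ITEM, LAST_UPDATED, REASON,
           STARTING_BALANCE, ENDING_BALANCE
    FROM (
        -- LAG reads ENDING_BALANCE from the previous LINE of the same item.
        SELECT *,
               LAG(ENDING_BALANCE) OVER (
                   PARTITION BY ITEM_ID ORDER BY LINE
               ) AS STARTING_BALANCE
        FROM item_history
    ) t
    -- Filter only after the window function has been evaluated.
    WHERE REASON = 'Count'
""").fetchall()

print(rows)
```

For the sample data this returns the single Count row with a starting balance of 450 and an ending balance of 500.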
I can't bugtest this, but maybe something like:
SELECT
a.ITEM_ID,
a.ITEM,
a.LAST_UPDATED,
'Count' AS REASON,
b.ENDING_BALANCE AS Starting_Balance,
a.ENDING_BALANCE AS ENDING_BALANCE
FROM your_table a
LEFT JOIN your_table b
ON a.ITEM_ID = b.ITEM_ID AND a.LINE = b.LINE + 1
WHERE a.REASON = 'Count'
Note that we're joining two copies of the same table together here and labeling them a and b (your_table is a placeholder for the real table name).
An expression such as b.LINE + 1 is allowed in an ON clause as long as the conditions are combined with AND, but you could also give the joined table a LINE + 1 column and join on that.

SQL : Currency Conversion from given currency to destination currency

I have two tables: Table1 with amount, amountCurrency and destination currency, and Table2 with the conversion rates (conversion rates are expressed in terms of USD).
What I want to do:
Convert the amount from amountCurrency to the destination currency and store it in the last column of Table1.
Example: the amount in row one of Table1 is in INR and I want to convert it to CAD. Mathematically, I get the conversion rate for 1 INR on the given conversion_date from Table2 and multiply it by the amount. Something like:
select Rate from Table2 where conversion_date = '2014-06-30' and Currency = 'INR'
The above query gives me 0.0160752000, which converts INR to USD, i.e.
100 * 0.0160752000 = 1.60752 USD
Since we want CAD, we get the conversion rate for 1 CAD on the given conversion_date, 1 CAD = 0.9399380000 USD. Now we convert 1.60752 USD to CAD by dividing by the CAD rate, i.e. 1.60752 / 0.9399380000 = 1.71024 CAD.
My Table1 has around 10000 rows and Table2 has conversion rates for all the currencies for all dates till now. What is the best way to iterate over the rows of Table1, do the conversion and store the result in the CalculateAmountInDestinationCurrency column?
I was thinking to have a loop like,
While (Select Count(*) From Table1) > 0
begin
// step1 : Get top row
//step2 : Get conversion rate for **Amountcurrency** using select query to table2
//step3 : Multiply with amount (Here we have USD value for amount)
//step4: Get conversion rate for **DestinationCurrency**
//step5: Divide USD Amount with result from step 4
//Update
end
Any help is appreciated. Is this a good way to do this? Is there a better way?
All you need is one query with an instance of your base table and two instances of your conversion table, one joined on base.amountCurrency = rate.currency and the other joined on base.destinationCurrency = rate.currency. If you have multiple rates for each currency over time, you can also add conversion_date criteria to pick the rate for the matching date, the most recent one, or whichever you need.
something like:
select
b.*,
a.rate as rate_amount,
d.rate as rate_destination,
amount * a.rate / d.rate as amtInDestCurrency
from
base b inner join
rate a on
b.amountcurrency = a.currency and
b.conversion_date = a.conversion_date inner join
rate d on
b.destinationCurrency = d.currency and
b.conversion_date = d.conversion_date
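Since the question asks for the result to be written back, here is a sketch of the same arithmetic as one set-based UPDATE rather than a loop, run against an in-memory SQLite database. It swaps the two joins for two correlated subqueries, which do the same lookups. The table names base and rate follow the query above; the column amtInDestCurrency and the single sample row are assumptions for illustration, with the rates taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE base (
    id INTEGER PRIMARY KEY, amount REAL, amountCurrency TEXT,
    destinationCurrency TEXT, conversion_date TEXT,
    amtInDestCurrency REAL
);
CREATE TABLE rate (currency TEXT, conversion_date TEXT, rate REAL);
INSERT INTO base (amount, amountCurrency, destinationCurrency, conversion_date)
VALUES (100, 'INR', 'CAD', '2014-06-30');
-- Rates are USD per one unit of the currency, as in the question.
INSERT INTO rate VALUES ('INR', '2014-06-30', 0.0160752),
                        ('CAD', '2014-06-30', 0.939938);
""")

# One statement updates every row: amount * source rate / destination rate.
conn.execute("""
    UPDATE base
    SET amtInDestCurrency = amount
        * (SELECT a.rate FROM rate a
           WHERE a.currency = base.amountCurrency
             AND a.conversion_date = base.conversion_date)
        / (SELECT d.rate FROM rate d
           WHERE d.currency = base.destinationCurrency
             AND d.conversion_date = base.conversion_date)
""")

converted = conn.execute("SELECT amtInDestCurrency FROM base").fetchone()[0]
print(round(converted, 5))
```

For the sample row this stores roughly 1.71024, matching the hand calculation in the question. Rows with no matching rate end up NULL, so it may be worth checking for those first.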

SQL Server: Subtraction between 2 queries

I have 2 queries. The first returns a table with the amount for each car, such as
Amount_Table Cars
800 Car A
900 Car B
2100 Car C
The second table shows the discounts for Car A, B and C respectively.
Discount_table
40
10
80
I wish to have a final query that displays the Amount - Discount values.
The amount table comes from one query and the discount table from another, hence I wish to do
(amount-query) - (discount-query)
I did
Select ( (amount-query) - (discount-query))
but that threw the error
Only one expression can be specified in the select list when the subquery is not introduced with EXISTS.
Please help!
Try something like this:
Select AmountTable.Amount-isnull(DiscountTable.Discount, 0)
from AmountTable left join DiscountTable
on AmountTable.Car = DiscountTable.Car
You cannot "subtract" queries. You have to do a join between tables (or subqueries), and make expressions using columns' names.
You need to join:
SELECT *
,cars_table.amount - discounts_table.discount
FROM cars_table
INNER JOIN discounts_table
ON cars_table.some_key = discounts_table.some_key
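Putting the two answers together, here is a sketch with the question's numbers in an in-memory SQLite database. The two derived tables stand in for the amount query and the discount query. It assumes the discount query can also return the car it applies to, since a join needs a shared key and the question does not show one. COALESCE plays the role of ISNULL here and is also available in SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AmountTable (Car TEXT, Amount INTEGER);
CREATE TABLE DiscountTable (Car TEXT, Discount INTEGER);
INSERT INTO AmountTable VALUES ('Car A', 800), ('Car B', 900), ('Car C', 2100);
INSERT INTO DiscountTable VALUES ('Car A', 40), ('Car B', 10), ('Car C', 80);
""")

rows = conn.execute("""
    SELECT a.Car, a.Amount - COALESCE(d.Discount, 0) AS net_amount
    FROM (SELECT Car, Amount FROM AmountTable) a        -- the amount query
    LEFT JOIN (SELECT Car, Discount FROM DiscountTable) d  -- the discount query
      ON a.Car = d.Car
    ORDER BY a.Car
""").fetchall()

print(rows)
```

This yields 760, 890 and 2020 for Car A, B and C. The LEFT JOIN keeps cars that have no discount row and treats their discount as 0.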