Conditional Join in BigQuery - SQL

I am a beginner with BigQuery and SQL in general. I have a query that looks like this:
SELECT
  base.*,
  IF(REGEXP_CONTAINS(rate_name, 'usd'), price * ft.usd,
     IF(REGEXP_CONTAINS(rate_name, 'gbp'), price * ft.gbp, price)) AS converted_price
FROM base_table base
JOIN finance_table ft
  ON base.date = ft.date
In short, I have a table with some data (base), and depending on the currency the price is in, I want to convert it using the rate stored in another table. The table with the rates (finance_table) only has data for 2021, but base_table also has data for earlier dates.
What I want is to use this query as-is when the date exists in finance_table, and otherwise use the rates from 2021-01-01 (the first date in finance_table).
I tried joining on this:
ON
IF( ft.date IS NOT NULL, base.date = ft.date, ft.date = '2021-01-01')
However, this doesn't give me any results when I query for a random date from 2020. I am sure that the condition is wrong, so any ideas?
P.S. Another thing that would suffice is using fixed numbers, e.g. if the date doesn't exist, multiply the price by 0.85 or 1.15, but this would probably make things more complicated.
EDIT:
Tables look like this:
BASE:
DATE | PRODUCT_NAME | PRICE | RATE_NAME
2020-01-01| APPLE | 0.5 | usd
2021-01-01| ORANGE | 0.4 | gbp
FINANCE_TABLE:
DATE | USD | GBP
2021-01-01| 0.844 | 1.443
2021-01-02| 0.846 | 1.423
The final result should look like this when I query for date = '2021-01-01':
DATE | PRODUCT_NAME| PRICE | RATE_NAME | CONVERTED_PRICE
2021-01-01 | ORANGE | 0.4 | gbp | 0.5772
The problem lies in the case where I query for dates that don't exist in the finance_table.

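You can use two joins. Your IF(...) condition can't work, because the ON clause is evaluated for each candidate pair of rows: ft.date is never NULL for an actual finance_table row, so the condition reduces to base.date = ft.date. Instead, LEFT JOIN on the matching date, always join a second copy of finance_table on the fallback date, and let COALESCE take the first non-NULL rate. A direct translation of your query is:
SELECT base.*,
       (CASE WHEN base.rate_name = 'usd'
             THEN base.price * COALESCE(ft.usd, ft1.usd)
             WHEN base.rate_name = 'gbp'
             THEN base.price * COALESCE(ft.gbp, ft1.gbp)
             ELSE base.price
        END) AS converted_price
FROM base_table base LEFT JOIN
     finance_table ft
     ON base.date = ft.date JOIN
     finance_table ft1
     ON ft1.date = DATE '2021-01-01';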
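To see the two-join fallback idea work end to end, here is a small sketch that runs it against the question's sample rows. It uses Python's built-in sqlite3 module as a stand-in for BigQuery (so there is no DATE literal and no REGEXP_CONTAINS), and, as my own variation rather than part of the answer, it finds the fallback date with a (SELECT MIN(date) ...) subquery instead of hard-coding 2021-01-01.

```python
import sqlite3

# Sample data copied from the question's BASE and FINANCE_TABLE listings.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE base_table (date TEXT, product_name TEXT, price REAL, rate_name TEXT);
    CREATE TABLE finance_table (date TEXT, usd REAL, gbp REAL);
    INSERT INTO base_table VALUES ('2020-01-01', 'APPLE', 0.5, 'usd');
    INSERT INTO base_table VALUES ('2021-01-01', 'ORANGE', 0.4, 'gbp');
    INSERT INTO finance_table VALUES ('2021-01-01', 0.844, 1.443);
    INSERT INTO finance_table VALUES ('2021-01-02', 0.846, 1.423);
""")

# ft  : rates for the same date; the LEFT JOIN leaves NULLs when the date is missing.
# ft1 : fallback rates, always taken from the earliest date in finance_table.
# COALESCE returns its first non-NULL argument, so ft wins whenever it exists.
QUERY = """
    SELECT b.date, b.product_name, b.price, b.rate_name,
           CASE WHEN b.rate_name = 'usd' THEN b.price * COALESCE(ft.usd, ft1.usd)
                WHEN b.rate_name = 'gbp' THEN b.price * COALESCE(ft.gbp, ft1.gbp)
                ELSE b.price
           END AS converted_price
    FROM base_table b
    LEFT JOIN finance_table ft ON b.date = ft.date
    JOIN finance_table ft1 ON ft1.date = (SELECT MIN(date) FROM finance_table)
    ORDER BY b.date
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The 2020 APPLE row has no rate of its own, so it picks up the 2021-01-01 usd rate (0.5 × 0.844 = 0.422), while the 2021 ORANGE row uses its own date (0.4 × 1.443 = 0.5772). For the fixed-number variant from the P.S., putting a constant such as 0.85 in place of ft1.usd inside COALESCE should behave the same way, and the second join could then be dropped.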

Related

SQL Distinct Pair Groupings

I am interested in manipulating my data like so:
My Source Data:
From | To | Rate
----------------
EUR | AUD | 1.5895
EUR | BGN | 1.9558
EUR | GBP | 0.7347
EUR | USD | 1.1151
GBP | AUD | 2.1633
GBP | BGN | 2.6618
GBP | EUR | 1.3610
GBP | USD | 1.5176
USD | AUD | 1.4254
USD | BGN | 1.7539
USD | EUR | 0.8967
USD | GBP | 0.6589
With regard to "distinct pairs", I consider the following to be "duplicates":
EUR | USD matches USD | EUR
EUR | GBP matches GBP | EUR
GBP | USD matches USD | GBP
I want my source data to be filtered so that it removes one record from each of the above "duplicates", leaving my final table 3 records shorter than the original. I do not care which record of each "duplicate" pair is kept or removed, as long as only one is selected.
I have tried many variations of Joins, Exists, Except, Distinct, Group By, logical comparisons (< >) and I feel like I am so close with any given approach... but it just does not seem to click.
My favorite effort has involved inner joining on EXCEPT:
SELECT a.[FROM], a.[TO], a.[Rate]
FROM Table a
INNER JOIN
(
SELECT DISTINCT [From], [To]
FROM Table
EXCEPT
(
SELECT [TO] as [From], [From] as [To]
FROM Table
)
) b
ON a.[From] = b.[From] AND a.[To] = b.[To]
But alas, it removes all of the matched pairs.
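I can suggest something very easy: if it doesn't matter which one of them you keep, you can pick only the one whose rate is greater than 1 (or, conversely, the one whose rate is smaller). In each pair one rate should be greater than 1 and the other smaller, which makes sense since they are approximately reciprocals, so:
SELECT * FROM [Table] WHERE Rate > 1
Be aware that this also drops any unpaired row whose rate is below 1; in the sample data every unpaired rate happens to be above 1.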
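One way to remove the duplicates that doesn't depend on the rates:
SELECT s.*
FROM source s
WHERE s.[From] < s.[To]
UNION ALL
SELECT s.*
FROM source s
WHERE s.[From] > s.[To] AND
      NOT EXISTS (SELECT 1 FROM source s2 WHERE s.[From] = s2.[To] AND s.[To] = s2.[From]);
The first branch keeps every row whose currencies are already in alphabetical order; the second keeps a reversed row only when its mirror row does not exist.
Note: From and To are reserved words, so they are escaped here with SQL Server brackets; use your database's own identifier quoting if it differs.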
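Here is a small sketch of that UNION ALL / NOT EXISTS approach run on the question's twelve rows, with the second branch restricted to rows where From > To so that each reversed row is checked against its mirror. It uses Python's built-in sqlite3 module as a stand-in database, where the reserved column names are double-quoted instead of bracketed.

```python
import sqlite3

# Source rows from the question.
RATES = [
    ("EUR", "AUD", 1.5895), ("EUR", "BGN", 1.9558), ("EUR", "GBP", 0.7347),
    ("EUR", "USD", 1.1151), ("GBP", "AUD", 2.1633), ("GBP", "BGN", 2.6618),
    ("GBP", "EUR", 1.3610), ("GBP", "USD", 1.5176), ("USD", "AUD", 1.4254),
    ("USD", "BGN", 1.7539), ("USD", "EUR", 0.8967), ("USD", "GBP", 0.6589),
]

conn = sqlite3.connect(":memory:")
# "From" and "To" are reserved words, so they are double-quoted for SQLite.
conn.execute('CREATE TABLE source ("From" TEXT, "To" TEXT, Rate REAL)')
conn.executemany("INSERT INTO source VALUES (?, ?, ?)", RATES)

QUERY = '''
    -- rows already in alphabetical order (From < To) are always kept
    SELECT s."From", s."To", s.Rate FROM source s WHERE s."From" < s."To"
    UNION ALL
    -- a reversed row (From > To) is kept only if its mirror row is absent
    SELECT s."From", s."To", s.Rate FROM source s
    WHERE s."From" > s."To"
      AND NOT EXISTS (SELECT 1 FROM source s2
                      WHERE s2."From" = s."To" AND s2."To" = s."From")
'''

kept = conn.execute(QUERY).fetchall()
print(len(kept), "of", len(RATES), "rows kept")
```

With this data, GBP|EUR, USD|EUR and USD|GBP are the rows removed, which leaves 9 rows, three fewer than the original as the question asks.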
Just to make it complete, here is a DISTINCT ON solution (PostgreSQL only):
SELECT DISTINCT ON (LEAST("from", "to"), GREATEST("from", "to")) *
FROM source AS s1
ORDER BY LEAST("from", "to"), GREATEST("from", "to");

Asymmetric columns - Picking latest given value

Version Used: Microsoft SQL Server Management Studio, SQL Server 2008
I'm facing a frustrating issue that is caused by asymmetric columns. Basically, I want to calculate the effects of a discount on given spot prices. Both are set up as indexes in the same table pricevalues. Spot prices are given 5 days a week, while discounts are only stated on the day they were updated. So, for example:
pricevalues(priceindex, price, pricedate)
PRICEINDEX PRICE PRICEDATE
-------------------- ------------------ ------------------
DISCOUNT_INDEX_ID | 15.5 | 2013-02-26
DISCOUNT_INDEX_ID | 10.5 | 2013-04-05
DISCOUNT_INDEX_ID | 16.0 | 2013-07-10
SPOT_INDEX_ID | 356.5 | 2013-07-22
SPOT_INDEX_ID | 355.0 | 2013-07-23
SPOT_INDEX_ID | 354.6 | 2013-07-24
SPOT_INDEX_ID | 357.0 | 2013-07-25
SPOT_INDEX_ID | 358.5 | 2013-07-26
How would I best go about calculating the difference between the PRICEs for SPOT_INDEX_ID and DISCOUNT_INDEX_ID on all dates on which SPOT_INDEX_ID is given, if the latest given discount PRICE (relative to the PRICEDATE of the spot price) is to be used?
For example, the discount on a spot on 2013-07-22 is 16.0 (2013-07-10), while the discount on a spot on 2013-05-15 is 10.5 (2013-04-05) and the discount on a spot on 2013-03-03 is 15.5 (2013-02-26).
I only know how to do it when the PRICEDATEs match for both DISCOUNT_INDEX_ID and SPOT_INDEX_ID, so:
SELECT
(pv1.price - pv2.price) AS 'Total Price',
pv1.price AS 'Spot Price',
pv2.price AS 'Discount'
FROM
pricevalues pv1, pricevalues pv2
WHERE
pv1.priceindex = 'SPOT_INDEX_ID' AND
pv1.pricedate = pv2.pricedate AND
pv2.priceindex = 'DISCOUNT_INDEX_ID'
This is of course not possible with these huge gaps in the discount index, so when the dates do not match, how do I instead get the value of the latest given discount?
EDIT: I would like the output to look like the following:
PRICEDATE | SPOT_INDEX | DISCOUNT_INDEX | SPOT_PRICE | DISCOUNT_PRICE | TOTAL_PRICE
2013-07-26 | SPOT_INDEX_ID | DISCOUNT_INDEX_ID | 358.5 | 16.0 | 342.5
You can store the latest discount price in a variable and then use it in the main query. Note that this applies that single discount to every spot row, so it is only correct for spot dates on or after the latest discount date (which holds for the sample data). Here is a sample:
DECLARE @Discount_Price money
SELECT TOP 1 @Discount_Price = price
FROM pricevalues
WHERE priceindex = 'DISCOUNT_INDEX_ID'
ORDER BY pricedate DESC
SELECT
(pv1.price - @Discount_Price) AS PriceDiff,
pv1.price AS 'Spot Price',
@Discount_Price AS 'Discount'
FROM
pricevalues pv1
WHERE
pv1.priceindex = 'SPOT_INDEX_ID'
If you only want to return one record this will work:
DECLARE @date DATE = GETDATE()
SELECT TOP 1 a.PRICEDATE, b.PRICE 'Spot', a.PRICE 'Discount', b.Price - a.Price 'Total'
FROM #Table1 a
JOIN (SELECT *, ROW_NUMBER() OVER (PARTITION BY PRICEINDEX ORDER BY PRICEDATE DESC) 'RowRank'
      FROM #Table1
      WHERE PRICEINDEX = 'SPOT_INDEX_ID'
      AND PRICEDATE <= @date
     ) b
  ON b.RowRank = 1
WHERE a.PRICEINDEX = 'DISCOUNT_INDEX_ID'
AND a.PRICEDATE <= @date
ORDER BY a.PRICEDATE DESC
Change GETDATE() to whatever your inquiry date is, and #Table1 (a temp table holding the sample rows) to pricevalues.
I wasn't sure what you wanted for PRICEDATE; perhaps that should be b.PRICEDATE, or maybe just @date?
I solved the problem with a combination of FETCHES in SQL and processing in Excel where I also took care of the gaps in the discount index.
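None of the SQL above returns a row for every spot date, which is what the question asks for. One common way to do that is a correlated subquery that, for each spot row, picks the discount with the latest pricedate on or before it. The sketch below is that technique, not any poster's code, run through Python's built-in sqlite3 module with the question's sample rows; LIMIT 1 is SQLite syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pricevalues (priceindex TEXT, price REAL, pricedate TEXT)")
conn.executemany("INSERT INTO pricevalues VALUES (?, ?, ?)", [
    ("DISCOUNT_INDEX_ID", 15.5, "2013-02-26"),
    ("DISCOUNT_INDEX_ID", 10.5, "2013-04-05"),
    ("DISCOUNT_INDEX_ID", 16.0, "2013-07-10"),
    ("SPOT_INDEX_ID", 356.5, "2013-07-22"),
    ("SPOT_INDEX_ID", 355.0, "2013-07-23"),
    ("SPOT_INDEX_ID", 354.6, "2013-07-24"),
    ("SPOT_INDEX_ID", 357.0, "2013-07-25"),
    ("SPOT_INDEX_ID", 358.5, "2013-07-26"),
])

# Latest discount on or before a given day (None if there is none yet).
DISCOUNT_AT = """
    SELECT price FROM pricevalues
    WHERE priceindex = 'DISCOUNT_INDEX_ID' AND pricedate <= ?
    ORDER BY pricedate DESC LIMIT 1
"""

def discount_at(day):
    row = conn.execute(DISCOUNT_AT, (day,)).fetchone()
    return row[0] if row else None

# Same lookup as a correlated subquery, evaluated once per spot row.
QUERY = """
    SELECT pricedate, spot_price, discount_price,
           spot_price - discount_price AS total_price
    FROM (
        SELECT s.pricedate,
               s.price AS spot_price,
               (SELECT d.price FROM pricevalues d
                WHERE d.priceindex = 'DISCOUNT_INDEX_ID'
                  AND d.pricedate <= s.pricedate
                ORDER BY d.pricedate DESC LIMIT 1) AS discount_price
        FROM pricevalues s
        WHERE s.priceindex = 'SPOT_INDEX_ID'
    )
    ORDER BY pricedate
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The helper reproduces the question's own examples: a spot on 2013-05-15 maps to 10.5 and one on 2013-03-03 to 15.5. On SQL Server 2008 the same per-row lookup could likely be written with OUTER APPLY and SELECT TOP 1 ... ORDER BY pricedate DESC, but that translation is an untested assumption here.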

Calculations over Multiple Rows SQL Server

If I have data in the format:
Account | Period | Values
Revenue | 2013-01-01 | 5432
Revenue | 2013-02-01 | 6471
Revenue | 2013-03-01 | 7231
Costs | 2013-01-01 | 4321
Costs | 2013-02-01 | 5672
Costs | 2013-03-01 | 4562
And I want to get results like:
Account | Period | Values
Margin | 2013-01-01 | 1111
Margin | 2013-02-01 | 799
Margin | 2013-03-01 | 2669
M% | 2013-01-01 | .20
M% | 2013-02-01 | .12
M% | 2013-03-01 | .37
Where Margin = Revenue - Costs and M% is (Revenue - Costs)/Revenue for each period.
I can see various ways of achieving this, but all are quite ugly, and I wanted to know if there is an elegant, general approach for these sorts of multi-row calculations.
Thanks
Edit
Some of these calculations can get really complicated, like:
Free Cash Flow = Margin - Opex - Capex + Change in Working Capital + Interest Paid
So I am hoping for a general method that doesn't require lots of joins back to itself.
Thanks
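OK, then just use MAX over a CASE expression to pivot Revenue and Costs into one row per period, like this:
WITH RevAndCost (revenue, costs, period) AS
(
SELECT MAX(CASE WHEN account = 'Revenue' THEN [Values] ELSE NULL END),
       MAX(CASE WHEN account = 'Costs' THEN [Values] ELSE NULL END),
       period
FROM data
GROUP BY period
)
SELECT period,
       Margin = revenue - costs,
       "M%" = (revenue - costs) * 1.0 / NULLIF(revenue, 0)
FROM RevAndCost
Each further account (Opex, Capex, ...) is just one more MAX(CASE ...) column, so no extra self-joins are needed. The * 1.0 avoids integer division when Values is an integer column.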
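Use a full self-join with a UNION. Filter each side to one account in a derived table first; filtering in the WHERE clause after the FULL JOIN would discard the unmatched rows and turn it back into an inner join:
SELECT 'Margin' AS Account,
       COALESCE(r.Period, c.Period) AS Period,
       r.[Values] - c.[Values] AS [Values]
FROM (SELECT * FROM myTable WHERE Account = 'Revenue') r
FULL JOIN (SELECT * FROM myTable WHERE Account = 'Costs') c
  ON c.Period = r.Period
UNION
SELECT 'M%' AS Account,
       COALESCE(r.Period, c.Period) AS Period,
       (r.[Values] - c.[Values]) * 1.0 / NULLIF(r.[Values], 0) AS [Values]
FROM (SELECT * FROM myTable WHERE Account = 'Revenue') r
FULL JOIN (SELECT * FROM myTable WHERE Account = 'Costs') c
  ON c.Period = r.Period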
Here I use a Common Table Expression to do a full outer join between two instances of your data table, one filtered to Revenue and one to Costs, to pull both into one row per period, then select from that CTE:
WITH RevAndCost (revenue, costs, period) AS
(
SELECT ISNULL(rev.[Values], 0),
       ISNULL(cost.[Values], 0),
       ISNULL(rev.period, cost.period)
FROM (SELECT * FROM data WHERE account = 'Revenue') rev
FULL OUTER JOIN (SELECT * FROM data WHERE account = 'Costs') cost
  ON rev.period = cost.period
)
SELECT period,
       Margin = revenue - costs,
       "M%" = (revenue - costs) * 1.0 / NULLIF(revenue, 0)
FROM RevAndCost
I'd do it like this:
SELECT r.Period, r.[Values] AS revenue, c.[Values] AS cost,
       r.[Values] - c.[Values] AS margin,
       (r.[Values] - c.[Values]) * 1.0 / NULLIF(r.[Values], 0) AS mPct
FROM
(SELECT Period, [Values] FROM t WHERE
Account = 'Revenue') r INNER JOIN
(SELECT Period, [Values] FROM t WHERE
Account = 'Costs') c ON
r.Period = c.Period
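To check the conditional-aggregation idea against the requested output shape (one row per account and period), here is a sketch run through Python's built-in sqlite3 module on the question's data. It pivots once with MAX(CASE ...) and then stacks the derived measures back into rows with UNION ALL; the CTE name pivoted and the output column value are my own illustrative choices.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# VALUES is a keyword in SQLite too, so the column name is double-quoted.
conn.execute('CREATE TABLE data (account TEXT, period TEXT, "Values" INTEGER)')
conn.executemany("INSERT INTO data VALUES (?, ?, ?)", [
    ("Revenue", "2013-01-01", 5432), ("Revenue", "2013-02-01", 6471),
    ("Revenue", "2013-03-01", 7231), ("Costs", "2013-01-01", 4321),
    ("Costs", "2013-02-01", 5672), ("Costs", "2013-03-01", 4562),
])

# pivoted: one row per period, one column per account, via MAX(CASE ...).
# Each measure is then computed from those columns and the results are
# stacked back into (account, period, value) rows with UNION ALL.
# ORDER BY 1 DESC, 2 sorts by account (Margin before M%), then period.
QUERY = """
    WITH pivoted AS (
        SELECT period,
               MAX(CASE WHEN account = 'Revenue' THEN "Values" END) AS revenue,
               MAX(CASE WHEN account = 'Costs'   THEN "Values" END) AS costs
        FROM data
        GROUP BY period
    )
    SELECT 'Margin' AS account, period, revenue - costs AS value FROM pivoted
    UNION ALL
    SELECT 'M%', period, (revenue - costs) * 1.0 / NULLIF(revenue, 0) FROM pivoted
    ORDER BY 1 DESC, 2
"""

results = conn.execute(QUERY).fetchall()
for account, period, value in results:
    print(account, period, round(value, 2))
```

With this data the February ratio comes out as 799 / 6471 ≈ 0.123, i.e. .12 when rounded to two places. Under this sketch, a measure such as Free Cash Flow would be one more UNION ALL branch over extra MAX(CASE ...) columns (Opex, Capex, ...) added to pivoted.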

SQL to find the date when the price last changed

Input:
Date Price
12/27 5
12/21 5
12/20 4
12/19 4
12/15 5
Required Output:
The earliest date since which the price has been equal to the current price.
For example, the price has been 5 since 12/21.
The answer cannot be 12/15, because we are interested in the earliest date from which the price has stayed equal to the current price without changing in value (on 12/20, the price changed to 4).
This should be about right. You didn't provide table structures or names, so...
DECLARE @CurrentPrice MONEY
SELECT TOP 1 @CurrentPrice = Price FROM [Table] ORDER BY Date DESC
SELECT MIN(Date) FROM [Table] WHERE Price = @CurrentPrice AND Date > (
SELECT MAX(Date) FROM [Table] WHERE Price <> @CurrentPrice
)
In one query:
SELECT MIN(Date)
FROM [Table]
WHERE Date >
( SELECT MAX(Date)
  FROM [Table]
  WHERE Price <>
  ( SELECT TOP 1 Price
    FROM [Table]
    ORDER BY Date DESC
  )
)
This question kind of makes no sense, so I'm not 100% sure what you are after.
Create four columns: old_price, new_price, old_date, new_date.
If old_price = new_price, simply print old_date.
What database server are you using? If it were Oracle, I would use its window functions. Anyway, here is a quick version that works in MySQL:
Here is the sample data:
+------------+------------+---------------+
| date | product_id | price_on_date |
+------------+------------+---------------+
| 2011-01-01 | 1 | 5 |
| 2011-01-03 | 1 | 4 |
| 2011-01-05 | 1 | 6 |
+------------+------------+---------------+
Here is the query (it only works if you have one product; otherwise you will have to add an "and product_id = ..." condition to the WHERE clause):
SELECT p.date as last_price_change_date
FROM test.prices p
left join test.prices p2 on p.product_id = p2.product_id and p.date < p2.date
where p.price_on_date - p2.price_on_date <> 0
order by p.date desc
limit 1
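In this case, it will return "2011-01-03". Note that this is the last date before the most recent change (the price changed to 6 on 2011-01-05), so the date on which the current price took effect is the next date after the one returned.
Not a perfect solution, but I believe it works. I have not tested it on a larger dataset, though.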
Make sure to create indexes on date and product_id, as it will otherwise bring your database server to its knees and beg for mercy.
Bernardo.
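The "in one query" answer above can be checked against the question's data with a short sketch. It keeps that answer's idea (take the earliest date after the last date whose price differs from the current one), uses SQLite syntax via Python's built-in sqlite3 module (LIMIT 1 instead of TOP 1), and adds COALESCE so that a price that never changed returns the first date instead of NULL. It assumes the dates are zero-padded MM/DD strings within one year, because they are compared as text.

```python
import sqlite3

def price_unchanged_since(rows):
    """Earliest date from which the price has equalled the latest price.

    rows: list of (date, price) tuples; dates must sort correctly as strings.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE prices (date TEXT, price REAL)")
    conn.executemany("INSERT INTO prices VALUES (?, ?)", rows)
    query = """
        SELECT MIN(date) FROM prices
        WHERE date > COALESCE(
            -- last date on which the price differed from the current price
            (SELECT MAX(date) FROM prices
             WHERE price <> (SELECT price FROM prices ORDER BY date DESC LIMIT 1)),
            '')  -- no such date: the price never changed, so every row qualifies
    """
    (result,) = conn.execute(query).fetchone()
    conn.close()
    return result

sample = [("12/27", 5), ("12/21", 5), ("12/20", 4), ("12/19", 4), ("12/15", 5)]
print(price_unchanged_since(sample))
```

For the question's sample this yields 12/21, the expected answer, rather than 12/15.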

How can I represent a single row from result set as multiple rows?

Given, for example, a currency rates table with these columns (I've used 3 here, but in my situation there are about 30):
date | eur | usd | gbp
2010-01-28 | X | Y | Z
How do I convert it to this one (using row with the latest date):
currency | rate
eur | X
usd | Y
gbp | Z
I've come up with a query like this:
SELECT 'eur' AS currency, eur AS rate FROM rates WHERE date = (SELECT MAX(date) FROM rates)
UNION
SELECT 'usd' AS currency, usd AS rate FROM rates WHERE date = (SELECT MAX(date) FROM rates)
UNION
...
It's huge and ugly. Are there other solutions?
Sometimes the easiest solution (if you want nice-looking queries) is to re-engineer the schema. It may well be that the best solution is to change your table to be:
date | currency | rate
-----------+----------+-----
2010-01-28 | eur | X
2010-01-28 | usd | Y
2010-01-28 | gbp | Z
with suitable indexes on date and currency for performance. That's the way it should be in 3NF since the rates depend on each other, violating the 3NF rule:
Every column must depend on the key, the whole key and nothing but the key, so help me Codd.
(I love that little ditty). Another alternative is to provide a view which does the same thing, then you query the view. It's no less work for the DBMS but your query at least looks prettier (the create view still looks ugly though).
Or you could just accept the fact that some queries look ugly, document it well, and move on :-)
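A somewhat more compact SQL-only option is to filter to the latest row once and CROSS JOIN it with a small list of currency names, using CASE to pick the matching column; a query like this could also serve as the body of the view suggested above. The sketch below runs it through Python's built-in sqlite3 module with made-up numbers standing in for X, Y and Z. If you happen to be on SQL Server, its UNPIVOT operator is another option.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rates (date TEXT, eur REAL, usd REAL, gbp REAL)")
# Hypothetical rates; only the newest row should be unpivoted.
conn.executemany("INSERT INTO rates VALUES (?, ?, ?, ?)", [
    ("2010-01-27", 1.0, 1.5, 0.9),
    ("2010-01-28", 1.0, 1.4, 0.8),
])

# latest: the single newest row. currencies: one row per column to unpivot.
# The CROSS JOIN yields one output row per currency, and CASE picks the
# matching column from the latest row. With ~30 currencies, each column
# still appears once in the VALUES list and once in the CASE.
QUERY = """
    WITH latest AS (SELECT * FROM rates ORDER BY date DESC LIMIT 1),
         currencies(currency) AS (VALUES ('eur'), ('usd'), ('gbp'))
    SELECT c.currency,
           CASE c.currency
               WHEN 'eur' THEN l.eur
               WHEN 'usd' THEN l.usd
               WHEN 'gbp' THEN l.gbp
           END AS rate
    FROM latest l CROSS JOIN currencies c
"""

rows = conn.execute(QUERY).fetchall()
for currency, rate in rows:
    print(currency, "|", rate)
```

The MAX(date) filter is written only once here, instead of once per UNION branch as in the question.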
Do you have to do it in SQL?
This is quite trivial using a programming language.
In PHP (this uses the old mysql_* API, which was removed in PHP 7.0; the same idea works with mysqli or PDO):
$q = mysql_query("
  SELECT eur, usd, gbp
  FROM rates
  ORDER BY date DESC
  LIMIT 1
");
$table = array();
if ($val = mysql_fetch_array($q, MYSQL_ASSOC))
{
  // map each currency column of the latest row to its rate
  $table = array(
    'eur' => $val['eur'],
    'usd' => $val['usd'],
    'gbp' => $val['gbp'],
  );
}
echo "currency | rate\n";
echo "-----------------\n";
foreach ($table as $cur => $rate)
  echo $cur." | ".$rate."\n";