I have spent all morning on this and just can't get it right... I'd really appreciate the help of someone more knowledgeable than myself to get this working.
I have a table with some data in that looks like this:
MonthYear  WeekBeg.    Week  Value
Dec-10     27/12/2010  1     66.66
Jan-11     3/01/2011   2     50
Jan-11     10/01/2011  3     17.5
Jan-11     17/01/2011  4     20
Jan-11     24/01/2011  5     0
Jan-11     31/01/2011  6     50
Feb-11     7/02/2011   7     0
Feb-11     14/02/2011  8     74
Feb-11     21/02/2011  9     100
I'm sorry the table above doesn't look better... I need to calculate the difference between the values from week to week - so the results column in this case would be:
16.66
32.5
2.5
20
50
50
74
26
I've looked at lots of code on the net - (e.g. from this site) but can't seem to make it work. I added in the ABS function to make sure the differences were absolute values and got this working but the numbers themselves just aren't right.
I haven't posted what I ended up with as it just got into a bigger and bigger mess, but what I started with was the link above. Again, I'd be really grateful for any insight anyone is able to offer.
Many thanks
ADDED:
Thanks so much for the fast reply. Got this working easily - added a few bits:
SELECT T1.MonthYear AS [From], T2.MonthYear AS [To], T1.Week AS Week, T1.WeekBeg AS WeekBeg, ABS(T1.Value - T2.Value) AS Difference FROM Test AS T1 LEFT JOIN Test AS T2 ON T2.Week = T1.Week + 1
Only thing is the resulting difference values need to be in the second of the two rows whereas here they are in the first of the two. Is there any easy way of modifying this?
Many thanks again.
ADDED:
Would definitely be worth using the second option if possible as can't always guarantee weeks won't be missed out. I am probably missing something, but when I run the second option from Thomas, I get the message:
'The specified field [T1].[Datavalue] could refer to more than one table listed in the FROM clause of your SQL statement'.
I thought this might be to do with the field in the table being VALUE not DataValue, but when I change it, I get 'Type Mismatch in Expression' instead.
Many thanks.
Presuming the Week column is perfectly sequential:
Select T1.MonthYear As T1Year
, T1.WeekBeg As T1WeekBeg
, T2.MonthYear As T2Year
, T2.WeekBeg As T2WeekBeg
, [T2].[Value]-[T1].[Value] AS Expr1
From TableWithData AS T1
Left Join TableWithData AS T2
On T1.Week = T2.Week + 1;
Note that this will not compile in the QBE designer. You will have to view and modify it purely through SQL View (or in code).
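To check the self-join idea against the sample data, here is a minimal sketch using Python's built-in sqlite3 module rather than Access (the table name data and the aliases cur and prev are my own, not from the question). The later week is the row being reported, so each difference lands on the second row of the pair, which is what the follow-up asks for.

```python
import sqlite3

# Week numbers and values copied from the sample table in the question.
rows = [
    (1, 66.66), (2, 50), (3, 17.5), (4, 20), (5, 0),
    (6, 50), (7, 0), (8, 74), (9, 100),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (week INTEGER PRIMARY KEY, value REAL)")
conn.executemany("INSERT INTO data VALUES (?, ?)", rows)

# cur is the later week and prev the week before it, so the difference
# is reported against the second row of each pair. The inner join drops
# week 1, which has no previous week.
query = """
    SELECT cur.week, ABS(cur.value - prev.value) AS difference
    FROM data AS cur
    JOIN data AS prev ON cur.week = prev.week + 1
    ORDER BY cur.week
"""
differences = [(week, round(diff, 2)) for week, diff in conn.execute(query)]
print(differences)
```

The printed differences match the expected column in the question. This only holds while Week has no gaps.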
If for some reason you could not depend on the Week number being sequential, then it gets trickier as you need to use a derived table. Again, this solution will only work in SQL View or in code:
Select T1.MonthYear, T1.WeekBeg
, T2.MonthYear, T2.WeekBeg
, [T2].[Value]-[T1].[Value] AS Diff
From (TableWithData AS T1
Inner Join (
Select T1.WeekBeg As T1WeekBeg
, Min(T2.WeekBeg) As T2WeekBeg
From TableWithData As T1
Left Join TableWithData AS T2
On T2.WeekBeg > T1.WeekBeg
Group By T1.WeekBeg
) As Query1
On T1.WeekBeg = Query1.T1WeekBeg)
Inner Join TableWithData AS T2
On Query1.T2WeekBeg = T2.WeekBeg;
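A rough way to see what the derived table does when a week is missing, again sketched with Python's sqlite3 instead of Access. The rows are hypothetical (the week beginning 2011-01-10 is deliberately left out), and the dates are stored as ISO strings so that plain text comparison orders them chronologically.

```python
import sqlite3

# Hypothetical data with a gap: the week beginning 2011-01-10 is missing.
rows = [
    ("2010-12-27", 66.66),
    ("2011-01-03", 50),
    ("2011-01-17", 20),
    ("2011-01-24", 0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (week_beg TEXT PRIMARY KEY, value REAL)")
conn.executemany("INSERT INTO data VALUES (?, ?)", rows)

# The derived table pairs every week with the earliest later week,
# whatever the gap. The last week has no successor (NULL), so the
# final inner join drops it.
query = """
    SELECT t1.week_beg, t2.week_beg, t2.value - t1.value AS diff
    FROM data AS t1
    JOIN (
        SELECT a.week_beg AS cur_beg, MIN(b.week_beg) AS next_beg
        FROM data AS a
        LEFT JOIN data AS b ON b.week_beg > a.week_beg
        GROUP BY a.week_beg
    ) AS pairs ON t1.week_beg = pairs.cur_beg
    JOIN data AS t2 ON pairs.next_beg = t2.week_beg
    ORDER BY t1.week_beg
"""
result = [(cur, nxt, round(diff, 2)) for cur, nxt, diff in conn.execute(query)]
for line in result:
    print(line)
```

Note that 2011-01-03 is paired with 2011-01-17, which a join on Week + 1 would have missed.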
A version based off of the sample query from your base link. (It uses ORDER BY on the Week field and TOP 1 to isolate a scalar value.)
SELECT t1.Value - (SELECT TOP 1 t2.Value FROM myTable AS t2
WHERE t2.Week < t1.Week
ORDER BY t2.Week DESC) AS t2Val
FROM myTable t1
WHERE (SELECT TOP 1 t3.Value FROM myTable AS t3
WHERE t3.Week < t1.Week) Is Not Null
ORDER BY t1.Week;
Should be close to working but the aliasing is very error prone. I suggest that if the week numbers are indeed sequential that you go with Thomas' answer.
Say we have a dataset of 500 000 flights from Los Angeles to 80 cities in Europe and back, and from Saint Petersburg to the same 80 cities in Europe and back. We want to find 4 flights such that:
from LA to city X, from city X back to LA, from St P to city X and from city X back to St P
all 4 flights have to be in a time window of 4 days
we are looking for the cheapest combined price of 4 flights
city X can be any of 80 cities, we want to find such cheapest combination for all of them and get the list of these 80 combinations
The data is stored in BigQuery and I've created an SQL query, but it has 3 joins and I assume that under the hood it can have complexity of O(n^4), because the query didn't finish in 30 minutes and I had to abort it.
Here's the schema for the table:
See the query below:
select * from (
select in_led.`from` as city,
in_led.price + out_led.price + in_lax.price + out_lax.price as total_price,
out_led.carrier as out_led_carrier,
out_led.departure as out_led_departure,
in_led.departure as in_led_date,
in_led.carrier as in_led_carrier,
out_lax.carrier as out_lax_carrier,
out_lax.departure as out_lax_departure,
in_lax.departure as in_lax_date,
in_lax.carrier as in_lax_carrier,
row_number() over(partition by in_led.`from` order by in_led.price + out_led.price + in_lax.price + out_lax.price) as rn
from skyscanner.quotes as in_led
join skyscanner.quotes as out_led on out_led.`to` = in_led.`from`
join skyscanner.quotes as out_lax on out_lax.`to` = in_led.`from`
join skyscanner.quotes as in_lax on in_lax.`from` = in_led.`from`
where in_led.`to` = "LED"
and out_led.`from` = "LED"
and in_lax.`to` in ("LAX", "LAXA")
and out_lax.`from` in ("LAX", "LAXA")
and DATE_DIFF(DATE(in_led.departure), DATE(out_led.departure), DAY) < 4
and DATE_DIFF(DATE(in_led.departure), DATE(out_led.departure), DAY) > 0
and DATE_DIFF(DATE(in_lax.departure), DATE(out_lax.departure), DAY) < 4
and DATE_DIFF(DATE(in_lax.departure), DATE(out_lax.departure), DAY) > 0
order by total_price
)
where rn=1
Additional details:
all flights' departure dates fall in a 120 days window
Questions:
Is there a way to optimize this query for better performance?
How to properly classify this problem? The brute force solution is way too slow, but I'm failing to see what type of problem this is. Certainly doesn't look like something for graphs, kinda feels like sorting the table a couple of times by different fields with a stable sort might help, but still seems sub-optimal.
Below is for BigQuery Standard SQL
"The brute force solution is way too slow, but I'm failing to see what type of problem this is."
"so I would like to see solutions other than brute force if anyone here has ideas"
#standardSQL
WITH temp AS (
SELECT DISTINCT *, UNIX_DATE(DATE(departure)) AS dep FROM `skyscanner.quotes`
), round_trips AS (
SELECT t1.from, t1.to, t2.to AS back, t1.price, t1.departure, t1.dep first_day, t1.carrier, t2.departure AS departure2, t2.dep AS last_day, t2.price AS price2, t2.carrier AS carrier2,
FROM temp t1
JOIN temp t2
ON t1.to = t2.from
AND t1.from = t2.to
AND t2.dep BETWEEN t1.dep + 1 AND t1.dep + 3
WHERE t1.from IN ('LAX', 'LED')
)
SELECT cityX, total_price,
( SELECT COUNT(1)
FROM UNNEST(GENERATE_ARRAY(t1.first_day, t1.last_day)) day
JOIN UNNEST(GENERATE_ARRAY(t2.first_day, t2.last_day)) day
USING(day)
) overlap_days_in_cityX,
(SELECT AS STRUCT departure, price, carrier, departure2, price2, carrier2
FROM UNNEST([t1])) AS LAX_CityX_LAX,
(SELECT AS STRUCT departure, price, carrier, departure2, price2, carrier2
FROM UNNEST([t2])) AS LED_CityX_LED
FROM (
SELECT AS VALUE ARRAY_AGG(t ORDER BY total_price LIMIT 1)[OFFSET(0)]
FROM (
SELECT t1.to cityX, t1.price + t1.price2 + t2.price + t2.price2 AS total_price, t1, t2
FROM round_trips t1
JOIN round_trips t2
ON t1.to = t2.to
AND t1.from < t2.from
AND t1.departure2 > t2.departure
AND t1.departure < t2.departure2
) t
GROUP BY cityX
)
ORDER BY overlap_days_in_cityX DESC, total_price
with output (just top 10 out of total 60 rows)
Brief explanation:
temp CTE: dedups the data and introduces the dep field (number of days since epoch) to eliminate costly TIMESTAMP functions
round_trips CTE: identifies all round-trip candidates whose return flight departs 1 to 3 days after the outbound flight
identify those LAX and LED round trips which overlap
for each cityX take the cheapest combination
the final output does an extra calculation of overlapping days in cityX and cleans up the output a little so it has info about all the flights involved
Note: in your data the duration field is all zeros, so it is not involved. If you had it, it would be easy to add to the logic.
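The steps above can be sketched outside BigQuery as well. Below is a small plain-Python version of the same pipeline (dedup, build round trips, keep overlapping LAX/LED pairs, take the cheapest per city). The Quote tuple, the sample quotes and the round_trips helper are all made up for illustration, and overlap is checked on whole days rather than on timestamps.

```python
from collections import namedtuple

# `day` plays the role of the dep field: days since some epoch.
Quote = namedtuple("Quote", "origin dest day price")

# Hypothetical quotes, not taken from the real dataset.
quotes = [
    Quote("LAX", "PAR", 10, 400), Quote("PAR", "LAX", 12, 420),
    Quote("LAX", "PAR", 10, 400),  # exact duplicate, removed by set()
    Quote("LED", "PAR", 11, 150), Quote("PAR", "LED", 13, 160),
    Quote("LED", "PAR", 11, 120),  # cheaper outbound on the same day
    Quote("LAX", "ROM", 20, 500), Quote("ROM", "LAX", 22, 510),
    Quote("LED", "ROM", 30, 100), Quote("ROM", "LED", 31, 100),
]

def round_trips(quotes, home):
    """Pair each outbound flight from `home` with a return 1 to 3 days later."""
    outbound = [q for q in quotes if q.origin == home]
    inbound = [q for q in quotes if q.dest == home]
    return [
        (out, back)
        for out in outbound
        for back in inbound
        if back.origin == out.dest and 1 <= back.day - out.day <= 3
    ]

unique = set(quotes)                 # step 1: dedup
lax = round_trips(unique, "LAX")     # step 2: round-trip candidates
led = round_trips(unique, "LED")

cheapest = {}
for lax_out, lax_back in lax:
    for led_out, led_back in led:
        if lax_out.dest != led_out.dest:
            continue
        # step 3: both parties must be in the city at the same time
        if not (lax_back.day > led_out.day and lax_out.day < led_back.day):
            continue
        total = lax_out.price + lax_back.price + led_out.price + led_back.price
        city = lax_out.dest
        # step 4: keep the cheapest combination per city
        if city not in cheapest or total < cheapest[city]:
            cheapest[city] = total

print(cheapest)
```

ROM drops out because the two hypothetical round trips there never overlap. The point is the same as in the query: joining two small sets of round trips is far cheaper than a four-way join over raw quotes.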
"the query didn't finish in 30 minutes and I had to abort it."
"Is there a way to optimize this query for better performance?"
My "generic recommendation" is to always learn the data, profile it, and clean it before actual coding! In your example, the data you shared has 469352 rows, full of duplicates. After you remove the duplicates, you get ONLY 14867 rows. I then ran your original query against that cleaned data and it took ONLY 97 sec to get the result. Obviously, this does not mean we cannot optimize the code itself, but at least it addresses your issue with "query didn't finish in 30 minutes and I had to abort it".
DBMS: SQL Server 2012
I have a table with the following columns
What I need to be able to do is figure out the NextCarDepartureTime. So for example, Car2 would depart at 9/30/2014 1:12 AM, Car3 would depart at 10/1/2014 12:10 AM. The ultimate goal is to figure out the difference in hours between CarDepartedDateTime and NextCarDepartedTime. So the end result would look like
Any help would be greatly appreciated! Thanks
You can use the LEAD analytic function introduced in SQL Server 2012. This function accesses a subsequent row of your result set. So your query would now look like this:
;WITH myCTE
AS ( SELECT T1.Year ,
T1.Month ,
T1.CarNumberID ,
T1.CarDepartedDateTime ,
T1.CarDepartedNumberID ,
LEAD(T1.CarDepartedDateTime) OVER ( ORDER BY T1.id ) AS [NextCarDepartureTime] ,
LEAD(T1.CarNumberID) OVER ( ORDER BY T1.id ) AS [NextCarNumberID]
FROM Test T1
)
SELECT * ,
DATEDIFF(HOUR, CarDepartedDateTime, NextCarDepartureTime) AS TurnAroundTime
FROM myCTE;
I have written a blog post about analytic functions on my blog here: krishnrajrana.wordpress.com
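For anyone unsure what LEAD and DATEDIFF(HOUR, ...) give you here, a plain-Python sketch of the same logic follows. The three rows and the hour_boundaries helper are hypothetical. One thing worth knowing is that DATEDIFF counts the hour boundaries crossed, not the full hours elapsed, so the result can be one higher than you might expect.

```python
from datetime import datetime

# Hypothetical rows (car, departure time), already sorted by id.
departures = [
    ("Car1", datetime(2014, 9, 29, 22, 30)),
    ("Car2", datetime(2014, 9, 30, 1, 12)),
    ("Car3", datetime(2014, 10, 1, 0, 10)),
]

def floor_to_hour(moment):
    """Drop minutes and smaller units."""
    return moment.replace(minute=0, second=0, microsecond=0)

def hour_boundaries(start, end):
    """Count hour boundaries crossed, like DATEDIFF(HOUR, start, end)."""
    delta = floor_to_hour(end) - floor_to_hour(start)
    return int(delta.total_seconds() // 3600)

# LEAD pairs each row with the one after it; the last row gets None (NULL).
next_times = [moment for _, moment in departures[1:]] + [None]

result = [
    (car, moment, nxt, hour_boundaries(moment, nxt) if nxt else None)
    for (car, moment), nxt in zip(departures, next_times)
]
for row in result:
    print(row)
```

Car1 to Car2 is 2 hours 42 minutes of elapsed time but comes out as 3, because three hour boundaries are crossed. If you need elapsed hours, taking DATEDIFF(MINUTE, ...) and dividing by 60 is one option.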
This would be the basic syntax you would need:
SELECT
T1.Year, T1.Month, T1.CarNumberID, T1.CarDepartedDateTime, T1.NextCarNumberID,
T2.CarDepartedDateTime as [NextCarDepartureTime],
CONVERT(varchar, (T2.CarDepartedDateTime - T1.CarDepartedDateTime), 108) as [Turn Around Time]
FROM
DatabaseName..TableName T1
Left outer join DatabaseName..TableName T2 on T2.CarNumberID = T1.NextCarNumberID
Depending on other details in the table (Primary Keys, etc.) you may want to add details to the join filters and/or a WHERE clause.
There are various options available for how you want to format/display the [Turn Around Time] field using different values for the last parameter of the CONVERT function. See https://msdn.microsoft.com/en-us/library/ms187928.aspx for a list of different options.
I have a table with the below format:
ID  curr     Date                   Bid      Ask
1   AUD/NZD  20090501 00:00:00.833  1.2866   1.28733
2   AUD/NZD  20090501 01:01:01.582  1.28667  1.2874
3   AUD/NZD  20090501 02:01:01.582  1.28667  1.28747
Now I need to select the change of Bid and Ask column and store into a different table...The result should be like the following
Bid Change  Ask Change
0.0000700   0.0000700
0.0000000   0.0000700
select
t1.id,
t1.curr,
t1.date,
t1.bid,
t1.bid - t2.bid [Bid Change],
t1.ask,
t1.ask - t2.ask [Ask Change]
from tbltest t1
left join tbltest t2 on t1.ID = t2.ID + 1
order by date
This query returns everything correct except the format of Bid Change and Ask Change...like the following
BID Change            ASK CHANGE
7.00000000000145E-05  7.00000000000145E-05
Am really clueless on what to do with this situation...any little help will work.
Thanks in advance!
It doesn't seem necessary to me to store them in a different table. You can just calculate them on the fly using APPLY; that way any changes to the underlying data will not cause your change data to go stale:
SELECT T.*,
BidChange = t.Bid - prev.Bid,
AskChange = t.Ask - prev.Ask
FROM T
OUTER APPLY
( SELECT TOP 1 T2.Bid, T2.Ask
FROM T AS T2
WHERE T2.Curr = T.Curr
AND T2.Date < T.Date
ORDER BY T2.Date DESC
) AS prev;
If this is something you will need regularly then you may want to consider a view, rather than storing it in a table.
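On the formatting part of the question: output like 7.00000000000145E-05 suggests that Bid and Ask are stored as binary floating point (this is an assumption, since the column types are not shown). The subtraction is then only approximately 0.00007. A short Python illustration of the same effect, using the first two Bid values from the question:

```python
from decimal import Decimal

# Binary floating point: the result is close to, but not exactly, 0.00007.
float_change = 1.28667 - 1.2866

# Exact decimal arithmetic gives the value the question expects.
decimal_change = Decimal("1.28667") - Decimal("1.2866")

print(float_change)
print(decimal_change)
```

In T-SQL the equivalent would be to store the columns as DECIMAL, or to cast the difference, for example CAST(t.Bid - prev.Bid AS DECIMAL(18, 7)); the precision and scale there are just a guess at what you need.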
I know that this type of question is answered many times here, but I can't use any answer to solve my problem, so please help. Here is my problem.
table 1
ID1 CustID Owe
Table 2
ID2 CustID Paid
I need a simple thing: in one SQL query I need sum(TotalOwe - TotalPaid) as Result where 1.custID=2.custID=@custID (this is not an example of my query, don't correct it, it is just an explanation). Or even simpler: the customer with ID = 112 has a TotalOwe of xxx and has already paid TotalPaid, so he now owes TotalOwe - TotalPaid.
This looks really simple, and I am even a little embarrassed to ask, but I really don't have any more time for experimenting. I was close at one point, but the values of TotalOwe and TotalPaid were doubled. I don't know why, but that is another thing.
SELECT COALESCE(TotalOwed,0) - COALESCE(TotalPaid,0)
FROM ( SELECT CustID,
SUM(Owe) TotalOwed
FROM table1
GROUP BY CustID) T1
FULL JOIN ( SELECT CustID,
SUM(Paid) TotalPaid
FROM table2
GROUP BY CustID) T2
ON T1.CustID = T2.CustID
WHERE COALESCE(T1.CustID,T2.CustID) = 112
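The doubling mentioned in the question is most likely what happens when the two tables are joined before summing: every Owe row gets paired with every Paid row. Here is a small sketch with Python's sqlite3 and made-up rows for customer 112, showing both the doubled result and the fix of aggregating each table on its own first. I used scalar subqueries instead of FULL JOIN only because older SQLite versions lack FULL JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ID1 INTEGER PRIMARY KEY, CustID INTEGER, Owe REAL);
    CREATE TABLE table2 (ID2 INTEGER PRIMARY KEY, CustID INTEGER, Paid REAL);
    -- Hypothetical rows: customer 112 owes 100 + 50 and has paid 30 + 20.
    INSERT INTO table1 (CustID, Owe) VALUES (112, 100), (112, 50);
    INSERT INTO table2 (CustID, Paid) VALUES (112, 30), (112, 20);
""")

# Joining first gives 2 x 2 = 4 rows, so both sums come out doubled.
joined = conn.execute("""
    SELECT SUM(t1.Owe), SUM(t2.Paid)
    FROM table1 AS t1
    JOIN table2 AS t2 ON t1.CustID = t2.CustID
    WHERE t1.CustID = 112
""").fetchone()

# Summing each table separately avoids the row multiplication.
balance = conn.execute("""
    SELECT (SELECT COALESCE(SUM(Owe), 0) FROM table1 WHERE CustID = :cust)
         - (SELECT COALESCE(SUM(Paid), 0) FROM table2 WHERE CustID = :cust)
""", {"cust": 112}).fetchone()[0]

print(joined)   # sums inflated by the join
print(balance)  # what the customer still owes
```

With two rows on each side the sums double; with three rows on one side they would triple, which is why the amount of inflation can look random.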
Edit: Realized there's tons of these questions. Trying a subquery and looking through those.
Edit2: Just needed a subquery. Works now. For the sake of anyone else looking at this, I added
where t2.startdate = (select max(startdate) from table2 as sub
where sub.item = t1.item)
and t1.effectivedate = (select max(startdate) from table2 as sub2)
I'm currently writing a query to pull two rates from two separate tables, and return the difference between the two rates. I'm having trouble with getting the proper rates. I need to only get the rates for the most recent listing for each item. My data in the tables looks like this:
todate      item    rate
2014-01-15  pencil  -0.07
2014-01-17  pencil  -0.03
2014-02-22  pencil  -0.05
2014-01-15  pen     -0.013
2014-01-17  pen     -0.02
2014-02-22  pen     -0.032
I want it to return this (assuming both tables are exactly the same):
Item    Rate1   Rate2   Difference  Date
Pencil  -0.05   -0.05   0           2014-02-22
Pen     -0.032  -0.032  0           2014-02-22
Both tables are more or less the same thing, just with different rates. My problem is I end up getting multiple dates regardless of how I change the query.
I have this right now:
use db
select t1.item, t2.Rate as t2Rate, t1.Rate as t1Rate,(abs(t2.Rate) - abs(t1.Rate))
as Dffrnce, t2.startDate
from table2 as t2 join table1 as t1
on t2.item = t1.item
where t2.StartDate = t1.EffectiveDate
group by t1.item, t2.StartDate, t2.Rate, t1.Rate
having t2.StartDate = max(t2.StartDate)
order by t1.item
I'm guessing my problem is stemming from me not checking each item for their max date specifically. But I'm not entirely sure how to do that. I tried using distinct but that returned the same result. Am I missing something obvious? I only want to grab the rates from the most recent date. I've tried joining on the item and max date, a having statement having max(t2.StartDate) = t1.effectivedate but nothing seems to be working.
Just saw your edit that says you got it working. Nice.
You might consider a different way to identify the rows you want to work with.
Which flavor of SQL are you using? Not everything works everywhere.
DECLARE @Rate1 AS TABLE (id INT, rateDate DATE, rate INT)
DECLARE @Rate2 AS TABLE (id INT, rateDate DATE, rate INT)
INSERT INTO @Rate1 (id, rateDate, rate) VALUES (1, '2000-01-01', 1),(1, '2001-01-01', 3),(2, '2000-01-01', 4)
INSERT INTO @Rate2 (id, rateDate, rate) VALUES (1, '2001-01-01', 2),(2, '2002-01-01', 3)
;WITH r1 AS (SELECT *, ROW_NUMBER() OVER(PARTITION BY id ORDER BY rateDate DESC) rn FROM @Rate1)
, r2 AS (SELECT *, ROW_NUMBER() OVER(PARTITION BY id ORDER BY rateDate DESC) rn FROM @Rate2)
SELECT *
FROM r1
INNER JOIN
r2 ON r1.id=r2.id
WHERE r1.rn=1
AND r2.rn=1
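If it helps to see what the rn = 1 filter is doing, here is the same "latest row per id, then join" logic in plain Python, using the sample rows from the INSERT statements above. The latest_by_id helper is my own name, and the difference is taken as rate1 minus rate2, which is an arbitrary choice.

```python
from datetime import date

# Same sample rows as in the T-SQL above: (id, rateDate, rate).
rate1 = [(1, date(2000, 1, 1), 1), (1, date(2001, 1, 1), 3), (2, date(2000, 1, 1), 4)]
rate2 = [(1, date(2001, 1, 1), 2), (2, date(2002, 1, 1), 3)]

def latest_by_id(rows):
    """Keep only the most recent (rateDate, rate) per id, like rn = 1."""
    latest = {}
    for row_id, rate_date, rate in rows:
        if row_id not in latest or rate_date > latest[row_id][0]:
            latest[row_id] = (rate_date, rate)
    return latest

latest1 = latest_by_id(rate1)
latest2 = latest_by_id(rate2)

# Inner join on id: only ids present in both tables survive.
differences = {
    row_id: latest1[row_id][1] - latest2[row_id][1]
    for row_id in latest1.keys() & latest2.keys()
}
print(differences)
```

Each table picks its own latest date per id, so the two dates do not have to match, unlike a join on StartDate = EffectiveDate.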