Date convolution through a timestamped transition table: how to (google-bigquery)

We need to convolve every day from a point in the past up to now against a list of timestamped, boolean device transitions. The final output should be a table that has a date:device_id entry for every day the device is online (and no entry for days when it is not).
Here is an example transition table for a single device:
To generate the convolution calendar:
calendar AS (
SELECT day
FROM UNNEST (GENERATE_DATE_ARRAY('2011-05-15', CURRENT_DATE())) AS day
),
Then, to generate a table that pairs each calendar day with every transition that happened on or before that day, so the rows can subsequently be ranked and the most recent one chosen (CROSS JOIN here, yuck!):
joined_with_cal AS (
SELECT
cal.day as online_date,
otr.when_changed,
otr.device_id,
otr.is_online,
otr.rank_by_date
FROM
calendar AS cal
CROSS JOIN
ordered_transitions otr
WHERE
cal.day >= DATE(otr.when_changed)
),
Then, the code that attempts to rank and choose the most recent record in each partition by timestamp (when_changed or rank_by_date; neither seems to work):
SELECT
online_date,
when_changed,
device_id,
is_online,
rank_by_date,
FROM (
SELECT
online_date,
when_changed,
device_id,
is_online,
rank_by_date,
RANK() OVER (PARTITION BY device_id ORDER BY rank_by_date ASC) as final_rank
FROM
joined_with_cal
)
WHERE
final_rank = 1 AND
-- online_date < '2017-08-01' AND
device_id = 419609
ORDER BY
online_date,
when_changed,
device_id
However, this doesn't work and is obviously ugly.
Can someone suggest a correct, elegant solution?
Thanks in advance!
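For reference, here is one way the ranking idea above could work, shown as a minimal sketch run through Python's built-in sqlite3 module rather than BigQuery. The key change is to partition by both device and calendar day and order by when_changed descending, so each day keeps only its most recent transition. The sample rows, the table name transitions and the function online_days are hypothetical, and the sketch assumes a day counts as online when the last transition on or before that day left the device online (its end-of-day state).

```python
import sqlite3

# Hypothetical sample transitions: (device_id, when_changed, is_online).
TRANSITIONS = [
    (1, "2017-07-01 08:00:00", 1),
    (1, "2017-07-03 12:00:00", 0),
    (1, "2017-07-05 09:30:00", 1),
]

ONLINE_DAYS_SQL = """
WITH RECURSIVE calendar(day) AS (
    -- stand-in for GENERATE_DATE_ARRAY, which SQLite does not have
    SELECT :start
    UNION ALL
    SELECT date(day, '+1 day') FROM calendar WHERE day < :stop
),
joined AS (
    -- pair each day with every transition that happened on or before it
    SELECT c.day, t.device_id, t.is_online,
           ROW_NUMBER() OVER (
               PARTITION BY t.device_id, c.day  -- one winner per device per day
               ORDER BY t.when_changed DESC     -- the most recent transition wins
           ) AS rn
    FROM calendar AS c
    JOIN transitions AS t ON date(t.when_changed) <= c.day
)
SELECT day, device_id
FROM joined
WHERE rn = 1 AND is_online = 1
ORDER BY device_id, day
"""


def online_days(rows, start, stop):
    """Return (day, device_id) for each day whose end-of-day state is online."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE transitions (device_id INTEGER, when_changed TEXT, is_online INTEGER)"
    )
    con.executemany("INSERT INTO transitions VALUES (?, ?, ?)", rows)
    result = con.execute(ONLINE_DAYS_SQL, {"start": start, "stop": stop}).fetchall()
    con.close()
    return result
```

In BigQuery the equivalent change would be PARTITION BY device_id, online_date with ORDER BY when_changed DESC over joined_with_cal, then keeping rows where the rank is 1 and is_online is true.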

@Mikhail: thanks for looking at it, and sorry my explanation was not clearer.
After a discussion with a colleague, I ended up using a self-join which seems to work:
trans_as_range_not_first AS (
SELECT
t1.device_id,
t1.rank_by_when,
t2.when_changed as online_start,
t1.when_changed as online_stop,
t1.account_id,
t1.account_name,
t1.server_type
FROM
ordered_trans AS t1 -- lower in rank index, later in time
LEFT JOIN
ordered_trans AS t2 -- greater in rank index, earlier in time
ON
t1.device_id = t2.device_id AND
t1.rank_by_when+1 = t2.rank_by_when -- current and next row
WHERE
t1.is_online = 0 AND t2.is_online = 1
GROUP BY
device_id,
rank_by_when,
online_start,
online_stop,
account_id,
account_name,
server_type
),
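The same range idea can be sketched with LEAD instead of a self-join on the rank, again using SQLite through Python's sqlite3 module as a stand-in for BigQuery. Each online transition becomes a range that ends the day before the next transition, and a recursive CTE expands each range into one row per day. The names and sample data are hypothetical; as an assumption, a device with no later transition is treated as online up to the :stop date, which covers a device that is still online (a case that pairing on/off rows cannot produce on its own).

```python
import sqlite3

# Hypothetical sample transitions: (device_id, when_changed, is_online).
TRANSITIONS = [
    (1, "2017-07-01 08:00:00", 1),
    (1, "2017-07-03 12:00:00", 0),
    (1, "2017-07-05 09:30:00", 1),
]

RANGES_SQL = """
WITH RECURSIVE ranges AS (
    -- each transition covers its own day up to the day before the next transition
    SELECT device_id,
           is_online,
           date(when_changed) AS first_day,
           COALESCE(
               date(LEAD(when_changed) OVER (PARTITION BY device_id ORDER BY when_changed),
                    '-1 day'),
               :stop  -- assumption: no later transition means online up to :stop
           ) AS last_day
    FROM transitions
),
days(device_id, day, last_day) AS (
    -- expand every non-empty online range into one row per day
    SELECT device_id, first_day, last_day
    FROM ranges
    WHERE is_online = 1 AND first_day <= last_day
    UNION ALL
    SELECT device_id, date(day, '+1 day'), last_day
    FROM days
    WHERE day < last_day
)
SELECT day, device_id FROM days ORDER BY device_id, day
"""


def online_days_from_ranges(rows, stop):
    """Return (day, device_id) rows built from online ranges."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE transitions (device_id INTEGER, when_changed TEXT, is_online INTEGER)"
    )
    con.executemany("INSERT INTO transitions VALUES (?, ?, ?)", rows)
    result = con.execute(RANGES_SQL, {"stop": stop}).fetchall()
    con.close()
    return result
```

In BigQuery, UNNEST(GENERATE_DATE_ARRAY(first_day, last_day)) would take the place of the recursive expansion.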

Related

SQL: How to create supplemental time-series records "out of thin air" from existing records

Suppose I have a table CUSTEVENTS listing customers active in certain months. I now want to consider a customer active in a month if it was active in that month or in either of the two prior months.
Simple example, the data might start as:
| MONTH_ENDING | CUSTNUM
| 2022-10-31 | 72378
| 2022-11-30 | 72378
It should be transformed into the following, given the expanded definition of active (new rows in bold):
| MONTH_ENDING | CUSTNUM
| 2022-10-31 | 72378
| 2022-11-30 | 72378
| **2022-12-31** | **72378**
| **2023-01-31** | **72378**
I'm trying to arrive at the simplest / most elegant way to get there. I could certainly explode out the data using a time-series reference table listing all the pairs of MONTH_ENDING and "additional" MONTH_ENDING values that "count". Or perhaps I could UNION three subqueries that take MONTH_ENDING, add_months(MONTH_ENDING, 1) and add_months(MONTH_ENDING, 2). But maybe there's something even more concise that involves neither multiple unioned queries nor an instrumental time-mapping table.
I happen to be using Teradata but I'm not sure I care about platform-specificity; if there's a Teradata-only approach that works, I'll gladly take it.
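The "explode" idea can be written without a UNION and without a permanent mapping table by cross joining a three-row list of month offsets. This is a minimal sketch in SQLite via Python's sqlite3 module, not Teradata: the month-end arithmetic (step to the 1st of the next month, add n months, step back a day) relies on SQLite's date modifiers, so another platform would need its own date functions there. The table custevents mirrors CUSTEVENTS from the question; the function name is hypothetical.

```python
import sqlite3

EXPAND_SQL = """
WITH offsets(n) AS (VALUES (0), (1), (2))  -- the month itself plus two grace months
SELECT DISTINCT
       -- month end n months later: go to the 1st of the following month,
       -- add n months, then step back one day
       date(e.month_ending, '+1 day', '+' || o.n || ' months', '-1 day') AS active_month,
       e.custnum AS custnum
FROM custevents AS e
CROSS JOIN offsets AS o
ORDER BY custnum, active_month
"""


def expand_active_months(rows):
    """rows: (month_ending, custnum) pairs; returns (active_month, custnum) pairs."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE custevents (month_ending TEXT, custnum INTEGER)")
    con.executemany("INSERT INTO custevents VALUES (?, ?)", rows)
    result = con.execute(EXPAND_SQL).fetchall()
    con.close()
    return result
```

DISTINCT removes the overlap when neighbouring input months extend into the same later month; with the sample data, 2022-12-31 is produced by both input rows.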
The general approach is to first calculate the "Last" event time for a given customer, which is handled by something like
LAG(EVENT_DT) OVER (PARTITION BY CUSTNUM ORDER BY EVENT_DT)
The next concept is islands. An island begins when an event happens more than {your window} after the prior one; conversely, an island ends when the next event is more than {your window} away (or there is no next event).
You can actually find some great online articles about this classic problem: Gaps and Islands problem.
If you understand CTEs, you can probably follow it through this example code I wrote. The first CTE is there simply to let you easily add a condition (instead of 1=1) for the events you care about.
WITH CTE_CONDITION AS (
SELECT
EVENT_DT AS dtm,
CUSTNUM
FROM
My_First_Table
WHERE
1 = 1
AND EVENT_DT is not null
),
CTE_LAGGED AS (
SELECT
dtm,
CUSTNUM,
LAG(dtm) OVER (
PARTITION BY CUSTNUM
ORDER BY
dtm
) AS previous_datetime,
LEAD(dtm) OVER (
PARTITION BY CUSTNUM
ORDER BY
dtm
) AS next_datetime,
ROW_NUMBER() OVER (
PARTITION BY CUSTNUM
ORDER BY
CTE_CONDITION.dtm
) AS island_location
FROM
CTE_CONDITION
),
CTE_ISLAND_START AS (
SELECT
ROW_NUMBER() OVER (
PARTITION BY CUSTNUM
ORDER BY
dtm
) AS island_number,
CUSTNUM,
dtm AS island_start_datetime,
island_location AS island_start_location
FROM
CTE_LAGGED
WHERE
(
DATEDIFF(MONTH, previous_datetime, dtm) > 2
OR CTE_LAGGED.previous_datetime IS NULL
)
),
CTE_ISLAND_END AS (
SELECT
ROW_NUMBER() OVER (
PARTITION BY CUSTNUM
ORDER BY
dtm
) AS island_number,
CUSTNUM,
dtm AS island_end_datetime,
island_location AS island_end_location
FROM
CTE_LAGGED
WHERE
DATEDIFF(MONTH, dtm, next_datetime) > 2
OR CTE_LAGGED.next_datetime IS NULL
)
SELECT
CTE_ISLAND_START.CUSTNUM,
CTE_ISLAND_START.island_start_datetime,
CTE_ISLAND_END.island_end_datetime,
DATEDIFF(
MONTH, CTE_ISLAND_START.island_start_datetime,
CTE_ISLAND_END.island_end_datetime
) AS ISLAND_DURATION_MONTH,
(
SELECT
COUNT(*)
FROM
CTE_LAGGED
WHERE
CTE_LAGGED.dtm BETWEEN CTE_ISLAND_START.island_start_datetime
AND CTE_ISLAND_END.island_end_datetime
AND CTE_LAGGED.CUSTNUM = CTE_ISLAND_START.CUSTNUM
) AS island_row_count
FROM
CTE_ISLAND_START
INNER JOIN CTE_ISLAND_END ON CTE_ISLAND_END.island_number = CTE_ISLAND_START.island_number
AND CTE_ISLAND_START.CUSTNUM = CTE_ISLAND_END.CUSTNUM
I wrote this into a Rasgo template using Snowflake syntax, but only minor adjustments should be needed to get this to work in Teradata.
Once you have this result, it tells you the periods of activity, including the 2-month window. You can then use a calendar table with one row per month and flag the customer as "active" or not depending on whether that month falls into one of these active ranges.
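A common, more compact variant of the same gaps-and-islands technique flags each island start with LAG and numbers the islands with a running SUM of those flags, so no join between a start CTE and an end CTE is needed. Here is a minimal sketch in SQLite via Python's sqlite3 module; Snowflake's DATEDIFF(MONTH, ...) is replaced by a year*12+month counter. The sample rows and the :max_gap parameter (set to 2 to match the > 2 test above) are assumptions.

```python
import sqlite3

ISLANDS_SQL = """
WITH months AS (
    -- a plain month counter, so month gaps become simple subtraction
    SELECT custnum,
           month_ending AS dtm,
           CAST(strftime('%Y', month_ending) AS INTEGER) * 12
             + CAST(strftime('%m', month_ending) AS INTEGER) AS month_no
    FROM custevents
),
flagged AS (
    -- 1 where a new island starts: the first event, or a gap above :max_gap months
    SELECT custnum, dtm,
           CASE WHEN month_no - LAG(month_no) OVER w <= :max_gap THEN 0 ELSE 1 END AS is_start
    FROM months
    WINDOW w AS (PARTITION BY custnum ORDER BY dtm)
),
numbered AS (
    -- running count of island starts = island number
    SELECT custnum, dtm,
           SUM(is_start) OVER (PARTITION BY custnum ORDER BY dtm
                               ROWS UNBOUNDED PRECEDING) AS island
    FROM flagged
)
SELECT custnum, MIN(dtm) AS island_start, MAX(dtm) AS island_end, COUNT(*) AS island_rows
FROM numbered
GROUP BY custnum, island
ORDER BY custnum, island_start
"""


def islands(rows, max_gap=2):
    """rows: (month_ending, custnum) pairs; returns one row per island."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE custevents (month_ending TEXT, custnum INTEGER)")
    con.executemany("INSERT INTO custevents VALUES (?, ?)", rows)
    result = con.execute(ISLANDS_SQL, {"max_gap": max_gap}).fetchall()
    con.close()
    return result
```

Each output row gives the first and last MONTH_ENDING of one island, which is the range a month-by-month calendar lookup would test against.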

Sum over a given time period

The following code gives the total duration that each light has been switched on.
CREATE TABLE switch_times (
id SERIAL PRIMARY KEY,
is1 BOOLEAN,
id_dec INTEGER,
label TEXT,
ts TIMESTAMP WITH TIME ZONE default current_timestamp
);
CREATE VIEW makecount AS
SELECT *, row_number() OVER (PARTITION BY id_dec ORDER BY id) AS count
FROM switch_times;
select c1.label, SUM(c2.ts-c1.ts) AS sum
from
(makecount AS c1
inner join
makecount AS c2 ON c2.count = c1.count + 1)
where c2.is1=FALSE AND c1.id_dec = c2.id_dec AND c2.is1 != c1.is1
GROUP BY c1.label;
Link to working demo https://dbfiddle.uk/ZR8pLEBk
Any suggestions on how to alter the code so that it gives the sum over a specific time period, say the 25th, during which all three lights were switched on for 12 hours? Problem 1: the current code gives the total sum, as follows. Problem 2: all durations that have not ended are disregarded, because there is no switch-off time.
| label | sum
| 0x29 MH3 | 1 day 03:00:00
| 0x2B MH1 | 1 day 01:00:00
| 0x2C MH2 | 1 day 02:00:00
The expected result covers only a given date, i.e.:
| label | sum
| 0x29 MH3 | 12:00:00
| 0x2B MH1 | 12:00:00
| 0x2C MH2 | 12:00:00
Assuming the following (which should be defined in the question):
Postgres 15.
The table is big, many rows per label, performance matters, we can add indexes.
All columns are actually NOT NULL, you just forgot to declare columns as such.
Every "light" has a distinct id_dec and a distinct label. Having both in switch_times is redundant. (Normalization!)
A light is "switched on" if the most recent earlier entry has is1 IS TRUE. Else it's considered "off".
The order of rows is established by ts, not by id as used in your query (typically incorrect).
Consecutive entries do not have to change the state.
No duplicate entries for (id_dec, ts). (There is a unique index enforcing that.)
There is no minimum or maximum time interval between entries.
"The 25th" is supposed to mean tstzrange '[2022-11-25 0:0+02, 2022-11-26 0:0+02)' (Note the time zone offsets.)
You want results for all labels that were switched on at all during the given time interval.
There is a table "labels" with one distinct entry per relevant light. If you don't have one, create it.
Indexes
Have at least these indexes to make everything fast:
CREATE INDEX ON switch_times (id_dec, ts DESC);
CREATE INDEX ON switch_times (ts);
Optional step to create table labels
CREATE TABLE labels AS
WITH RECURSIVE cte AS (
(
SELECT id_dec, label
FROM switch_times
ORDER BY 1
LIMIT 1
)
UNION ALL
(
SELECT s.id_dec, s.label
FROM cte c
JOIN switch_times s ON s.id_dec > c.id_dec
ORDER BY 1
LIMIT 1
)
)
TABLE cte;
ALTER TABLE labels
ADD PRIMARY KEY (id_dec)
, ALTER COLUMN label SET NOT NULL
, ADD CONSTRAINT label_uni UNIQUE (label)
;
Why this way? See:
Optimize GROUP BY query to retrieve latest row per user
Main query
WITH bounds(lo, hi) AS (
SELECT timestamptz '2022-11-25 0:0+02' -- enter time interval here *once*
, timestamptz '2022-11-26 0:0+02'
)
, snapshot AS (
SELECT id_dec, label, is1, ts
FROM switch_times s, bounds b
WHERE s.ts >= b.lo
AND s.ts < b.hi
UNION ALL -- must be separate
SELECT s.*
FROM labels l
JOIN LATERAL ( -- latest earlier entry
SELECT s.id_dec, s.label, s.is1, b.lo AS ts -- cut off at lower bound
FROM switch_times s, bounds b
WHERE s.id_dec = l.id_dec
AND s.ts < b.lo
ORDER BY s.ts DESC
LIMIT 1
) s ON s.is1 -- ... if it's "on"
)
SELECT label, sum(z - a) AS duration
FROM (
SELECT label
, lag(is1, 1, false) OVER w AS last_is1
, lag(ts) OVER w AS a
, ts AS z
FROM snapshot
WINDOW w AS (PARTITION BY label ORDER BY ts ROWS UNBOUNDED PRECEDING)
) sub
WHERE last_is1
GROUP BY 1;
fiddle
CTE bounds is an optional convenience feature to enter lower and upper bound of your time interval once.
CTE snapshot collects all rows of interest, which consist of:
1. all rows inside the time interval (1st leg of the UNION ALL query)
2. the latest earlier row, if it was "on" (2nd leg of the UNION ALL query)
We need to gather 2. separately to cover the corner case where the light was switched on earlier and there is no entry within the given time interval. And we can replace its timestamp with the lower bound right away.
The final query gets the previous (is1, ts) for every row in a subquery, defaulting to "off" if there was no previous row.
Finally, sum up the intervals in the outer SELECT. Only sum intervals during which the light was switched on at the start (no matter the final state).
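The snapshot idea can be tried out in SQLite via Python's sqlite3 module. This sketch is a variant rather than a port of the query above: it uses LEAD with :hi as the default, so an interval that is still open at the upper bound is also cut off at hi (Problem 2 in the question), and it only carries the earlier state into the interval when no entry sits exactly at lo. SQLite has no timestamptz, so it assumes all timestamps are ISO text in a single time zone; the sample rows and id_dec values are hypothetical.

```python
import sqlite3

# Hypothetical entries: (id_dec, label, is1, ts), all in one time zone.
ROWS = [
    (43, "0x2B MH1", 1, "2022-11-24 20:00:00"),
    (43, "0x2B MH1", 0, "2022-11-25 06:00:00"),
    (43, "0x2B MH1", 1, "2022-11-25 18:00:00"),  # still on at the upper bound
    (44, "0x2C MH2", 1, "2022-11-25 08:00:00"),
    (44, "0x2C MH2", 0, "2022-11-25 20:00:00"),
    (41, "0x29 MH3", 1, "2022-11-24 10:00:00"),
    (41, "0x29 MH3", 0, "2022-11-24 22:00:00"),  # already off before the interval
]

ON_TIME_SQL = """
WITH snapshot AS (
    -- every entry inside [lo, hi)
    SELECT id_dec, label, is1, ts
    FROM switch_times
    WHERE ts >= :lo AND ts < :hi
    UNION ALL
    -- the latest earlier entry, moved to lo (the state carried into the interval)
    SELECT s.id_dec, s.label, s.is1, :lo
    FROM switch_times AS s
    WHERE s.ts = (SELECT MAX(p.ts) FROM switch_times AS p
                  WHERE p.id_dec = s.id_dec AND p.ts < :lo)
      AND NOT EXISTS (SELECT 1 FROM switch_times AS q
                      WHERE q.id_dec = s.id_dec AND q.ts = :lo)
),
spans AS (
    -- each row lasts until the next row, or until hi if it is the last one
    SELECT label, is1, ts,
           LEAD(ts, 1, :hi) OVER (PARTITION BY id_dec ORDER BY ts) AS next_ts
    FROM snapshot
)
SELECT label, ROUND(SUM(julianday(next_ts) - julianday(ts)) * 24, 6) AS hours_on
FROM spans
WHERE is1 = 1
GROUP BY label
ORDER BY label
"""


def on_time(rows, lo, hi):
    """Return (label, hours switched on within [lo, hi))."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE switch_times (id_dec INTEGER, label TEXT, is1 INTEGER, ts TEXT)")
    con.executemany("INSERT INTO switch_times VALUES (?, ?, ?, ?)", rows)
    result = con.execute(ON_TIME_SQL, {"lo": lo, "hi": hi}).fetchall()
    con.close()
    return result
```

With these rows, both lights that were on during the 25th report 12 hours, and 0x29 MH3, which was already off before the interval, does not appear at all.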
Related:
Jump SQL gap over specific condition & proper lead() usage
My assumption
the actual "on" time is the time difference from a row where is1 is true to the next row (ordered by ts) where is1 is false.
The query below calculates the total on time between two dates:
select
id_dec ,
label,
sum(to_timestamp(nexttime)-ts) as time_def
from
(
select
id_dec,
"label",
ts,
is1,
case
when is1 = true then lead(extract(epoch from ts))over(partition by id_dec
order by
id_dec ,
ts asc)
else 0
end nexttime
from
switch_times
where
ts between '2022-11-24' and '2022-11-28'
) as a
where
nexttime <> 0
group by
id_dec,
label

Get apps with the highest review count since a dynamic series of days

I have two tables, apps and reviews (simplified for the sake of discussion):
apps table
id int
reviews table
id int
review_date date
app_id int (foreign key that points to apps)
2 questions:
1. How can I write a query / function to answer the following question?:
Given a series of dates from the earliest reviews.review_date to the latest reviews.review_date (incrementing by a day), for each date, D, which apps had the most reviews if the app's earliest review was on or later than D?
I think I know how to write a query if given an explicit date:
SELECT
apps.id,
count(reviews.*)
FROM
reviews
INNER JOIN apps ON apps.id = reviews.app_id
group by
1
having
min(reviews.review_date) >= '2020-01-01'
order by 2 desc
limit 10;
But I don't know how to query this dynamically given the desired date series and compile all this information in a single view.
2. What's the best way to model this data?
It would be nice to have the # of reviews at the time for each date as well as the app_id. As of now I'm thinking something that might look like:
... 2020-01-01_app_id | 2020-01-01_review_count | 2020-01-02_app_id | 2020-01-02_review_count ...
But I'm wondering if there's a better way to do this. Stitching the data together also seems like a challenge.
I think this is what you are looking for:
Postgres 13 or newer
WITH cte AS ( -- MATERIALIZED
SELECT app_id, min(review_date) AS earliest_review, count(*)::int AS total_ct
FROM reviews
GROUP BY 1
)
SELECT *
FROM (
SELECT generate_series(min(review_date)
, max(review_date)
, '1 day')::date
FROM reviews
) d(review_window_start)
LEFT JOIN LATERAL (
SELECT total_ct, array_agg(app_id) AS apps
FROM (
SELECT app_id, total_ct
FROM cte c
WHERE c.earliest_review >= d.review_window_start
ORDER BY total_ct DESC
FETCH FIRST 1 ROWS WITH TIES -- new & hot
) sub
GROUP BY 1
) a ON true;
WITH TIES makes it a bit cheaper. It was added in Postgres 13. See:
Get top row(s) with highest value, with ties
Postgres 12 or older
WITH cte AS ( -- MATERIALIZED
SELECT app_id, min(review_date) AS earliest_review, count(*)::int AS total_ct
FROM reviews
GROUP BY 1
)
SELECT *
FROM (
SELECT generate_series(min(review_date)
, max(review_date)
, '1 day')::date
FROM reviews
) d(review_window_start)
LEFT JOIN LATERAL (
SELECT total_ct, array_agg(app_id) AS apps
FROM (
SELECT total_ct, app_id
, rank() OVER (ORDER BY total_ct DESC) AS rnk
FROM cte c
WHERE c.earliest_review >= d.review_window_start
) sub
WHERE rnk = 1
GROUP BY 1
) a ON true;
db<>fiddle here
Same as above, but without WITH TIES.
We don't need to involve the table apps at all. The table reviews has all information we need.
The CTE cte computes earliest review & current total count per app. The CTE avoids repeated computation. Should help quite a bit.
It is always materialized before Postgres 12. In Postgres 12 or later, a side-effect-free CTE that is referenced only once in the query (as here) can be folded into the main query, so add the keyword MATERIALIZED to force materialization. See:
How to force evaluation of subquery before joining / pushing down to foreign server
The optimized generate_series() call produces the series of days from earliest to latest review. See:
Generating time series between two dates in PostgreSQL
Join a count query on generate_series() and retrieve Null values as '0'
Finally, the LEFT JOIN LATERAL you already discovered. But since multiple apps can tie for the most reviews, retrieve all winners, which can be 0 - n apps. The query aggregates all daily winners into an array, so we get a single result row per review_window_start. Alternatively, define tiebreaker(s) to get at most one winner. See:
What is the difference between LATERAL JOIN and a subquery in PostgreSQL?
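For experimenting without Postgres, the Postgres 12-or-older variant above can be translated to SQLite via Python's sqlite3 module. SQLite has no LATERAL or generate_series(), so this sketch builds the day series with a recursive CTE and ranks per day with RANK() over a plain join; unlike the LEFT JOIN LATERAL version, days with no qualifying app are simply missing from the output. The sample rows and the function name are hypothetical.

```python
import sqlite3

TOP_APPS_SQL = """
WITH RECURSIVE per_app AS (
    -- earliest review and total review count per app
    SELECT app_id, MIN(review_date) AS earliest_review, COUNT(*) AS total_ct
    FROM reviews
    GROUP BY app_id
),
days(d, last_day) AS (
    -- one row per day from the earliest to the latest review
    SELECT MIN(review_date), MAX(review_date) FROM reviews
    UNION ALL
    SELECT date(d, '+1 day'), last_day FROM days WHERE d < last_day
),
ranked AS (
    SELECT days.d AS review_window_start, p.app_id, p.total_ct,
           RANK() OVER (PARTITION BY days.d ORDER BY p.total_ct DESC) AS rnk
    FROM days
    JOIN per_app AS p ON p.earliest_review >= days.d
)
SELECT review_window_start, total_ct, GROUP_CONCAT(app_id) AS apps
FROM ranked
WHERE rnk = 1  -- keep every app tied for the top count
GROUP BY review_window_start, total_ct
ORDER BY review_window_start
"""


def top_apps_per_day(rows):
    """rows: (review_date, app_id) pairs; returns (day, top count, sorted app ids)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE reviews (id INTEGER PRIMARY KEY, review_date TEXT, app_id INTEGER)")
    con.executemany("INSERT INTO reviews (review_date, app_id) VALUES (?, ?)", rows)
    result = [
        (day, total, sorted(int(a) for a in apps.split(",")))
        for day, total, apps in con.execute(TOP_APPS_SQL)
    ]
    con.close()
    return result


SAMPLE = [
    ("2020-01-01", 1), ("2020-01-02", 1), ("2020-01-03", 1),
    ("2020-01-02", 2), ("2020-01-03", 2),
    ("2020-01-03", 3),
    ("2020-01-03", 4),
]
```

On 2020-01-03 apps 3 and 4 tie with one review each, so both are returned, mirroring array_agg() in the answer.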
If you are looking for hints, then here are a few:
Are you aware of generate_series() and how to use it to compose a table of dates given a start and end date? If not, then there are plenty of examples on this site.
To answer this question for any given date, you need to have only two measures for each app, and only one of these is used to compare an app against other apps. Your query in part 1 shows that you know what these two measures are.
Hints 1 and 2 should be enough to get this done. The only thing I can add is for you not to worry about making the database do "too much work." That is what it is there to do. If it does not do it quickly enough, then you can think about optimizations, but before you get to that step, concentrate on getting the answer that you want.
Please comment if you need further clarification on this.
The missing piece for me was lateral join.
I can accomplish just about what I want using the following:
select
review_windows.review_window_start,
id,
review_total,
earliest_review
from
(
select
date_trunc('day', review_windows.review_windows) :: date as review_window_start
from
generate_series(
(
SELECT
min(reviews.review_date)
FROM
reviews
),
(
SELECT
max(reviews.review_date)
FROM
reviews
),
'1 year'
) review_windows
order by
1 desc
) review_windows
left join lateral (
SELECT
apps.id,
count(reviews.*) as review_total,
min(reviews.review_date) as earliest_review
FROM
reviews
INNER JOIN apps ON apps.id = reviews.app_id
where
reviews.review_date >= review_windows.review_window_start
group by
1
having
min(reviews.review_date) >= review_windows.review_window_start
order by
2 desc,
3 desc
limit
2
) apps_most_reviews on true;

Percentage difference between numbers in two columns

My SQL experience is fairly minimal so please go easy on me here. I have a table tblForEx and I'm trying to create a query that looks at one particular column LastSalesRateChangeDate and also ForExRate.
Basically what I want to do is for the query to check that LastSalesRateChangeDate and then pull the ForExRate that is on the same line (obviously in the ForExRate column), then I need to check to see if there is a +/- 5% change since the last time the LastSalesRateChangeDate changed. I hope this makes sense, I tried to explain it as clearly as possible.
I believe I would need to create a 'subquery' to look at the LastSalesRateChangeDate and pull the ForEx rate from that date, but I just don't know how to go about this.
I should add this is being done in Access (SQL)
Sample data, here is what the table looks like:
| BaseCur | ForCur | ForExRate | LastSalesRateChangeDate
| USD | BRL | 1.718 | 12/9/2008
| USD | BRL | 1.65 | 11/8/2008
So I would need a query to look at the LastSalesRateChangeDate column, check to see if the date has changed, if so take the ForExRate value and then give a percentage difference of that ForExRate value since the last record.
So the final result would likely look like
"BaseCur" "ForCur" "Percentage Change since Last Sales Rate Change"
USD BRL X%
Gordon's answer pointed in the right direction:
SELECT t2.*, (SELECT top 1 t.ForExRate
FROM tblForEx t
where t.BaseCur=t2.BaseCur AND t.ForCur=t2.ForCur and t.LastSalesRateChangeDate<t2.LastSalesRateChangeDate
order by t.LastSalesRateChangeDate DESC, t.ForExRate DESC
) AS PreviousRate, [ForExRate]/[PreviousRate]-1 AS ChangeRatio
FROM tblForEx AS t2;
Access gives errors where the TOP 1 in the subquery causes "ties". We broke the ties and therefore removed the error by adding an extra item to the ORDER BY clause. To get the ratio to display as a percentage, switch to the design view and change the properties of that column accordingly.
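The same correlated-subquery idea, plus the +/- 5% check the question asks for, can be sketched in SQLite through Python's sqlite3 module. Instead of TOP 1 it joins to the row whose date equals the latest earlier date (MAX), which assumes one row per currency pair and date. It also assumes dates are stored as ISO text (the m/d/yyyy strings in the sample would not sort correctly as text), and the USD/EUR rows are made up for illustration.

```python
import sqlite3

# USD/BRL rows come from the question (dates rewritten as ISO); USD/EUR is hypothetical.
ROWS = [
    ("USD", "BRL", 1.718, "2008-12-09"),
    ("USD", "BRL", 1.65, "2008-11-08"),
    ("USD", "EUR", 0.75, "2008-12-09"),
    ("USD", "EUR", 0.70, "2008-11-08"),
]

RATE_CHANGE_SQL = """
SELECT t.BaseCur, t.ForCur, t.LastSalesRateChangeDate, t.ForExRate,
       prev.ForExRate AS PreviousRate,
       ROUND((t.ForExRate / prev.ForExRate - 1) * 100, 2) AS PctChange,
       ABS(t.ForExRate / prev.ForExRate - 1) >= 0.05 AS MovedFivePct
FROM tblForEx AS t
JOIN tblForEx AS prev
  ON prev.BaseCur = t.BaseCur
 AND prev.ForCur = t.ForCur
 -- the previous row is the one with the latest earlier change date
 AND prev.LastSalesRateChangeDate = (
       SELECT MAX(p.LastSalesRateChangeDate)
       FROM tblForEx AS p
       WHERE p.BaseCur = t.BaseCur
         AND p.ForCur = t.ForCur
         AND p.LastSalesRateChangeDate < t.LastSalesRateChangeDate)
ORDER BY t.BaseCur, t.ForCur, t.LastSalesRateChangeDate
"""


def rate_changes(rows):
    """Return one row per rate change with its percentage change and 5% flag."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE tblForEx (BaseCur TEXT, ForCur TEXT, ForExRate REAL, "
        "LastSalesRateChangeDate TEXT)"
    )
    con.executemany("INSERT INTO tblForEx VALUES (?, ?, ?, ?)", rows)
    result = con.execute(RATE_CHANGE_SQL).fetchall()
    con.close()
    return result
```

The oldest row of each pair has no predecessor and drops out of the inner join; a LEFT JOIN would keep it with a NULL change instead.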
If I understand correctly, you want the previous value. In MS Access, you can use a correlated subquery:
select t.*,
(select top 1 t2.LastSalesRateChangeDate
from tblForEx as t2
where t2.BaseCur = t.BaseCur and t2.ForCur = t.ForCur and
t2.LastSalesRateChangeDate < t.LastSalesRateChangeDate
order by t2.LastSalesRateChangeDate desc
) as prev_LastSalesRateChangeDate
from tblForEx as t;
Now, with this as a subquery, you can get the previous exchange rate using a join:
select t.*, ( (t.ForExRate / tprev.ForExRate) - 1) as change_ratio
from (select t.*,
(select top 1 t2.LastSalesRateChangeDate
from tblForEx as t2
where t2.BaseCur = t.BaseCur and t2.ForCur = t.ForCur and
t2.LastSalesRateChangeDate < t.LastSalesRateChangeDate
order by t2.LastSalesRateChangeDate desc
) as prev_LastSalesRateChangeDate
from tblForEx as t
) as t inner join
tblForEx as tprev
on tprev.BaseCur = t.BaseCur and tprev.ForCur = t.ForCur and
tprev.LastSalesRateChangeDate = t.prev_LastSalesRateChangeDate;
As I understand it, you can use the LEAD function to get the rate from the previous change date in a new column, with the query below:
WITH CTE AS (
SELECT *, LEAD(ForExRate, 1) OVER(PARTITION BY BaseCur, ForCur ORDER BY LastChangeDate DESC) LastValue
FROM #TT
)
SELECT BaseCur, ForCur, ForExRate, LastChangeDate , CAST( ((ForExRate - ISNULL(LastValue, 0))/LastValue)*100 AS float)
FROM CTE
The problem here is:
for the last (oldest) row in each BaseCur/ForCur group, the new column calculated with LEAD will be NULL, because there is no following row.
If there is only a single row for a particular BaseCur and ForCur, you will also get NULL in that column.
Resolution:
If you are sure that there will be at least two rows for each BaseCur and ForCur, then you can use WHERE clause to remove NULL values in final result.
WITH CTE AS (
SELECT *, LEAD(ForExRate, 1) OVER(PARTITION BY BaseCur, ForCur ORDER BY LastChangeDate DESC) LastValue
FROM #TT
)
SELECT BaseCur, ForCur, ForExRate, LastChangeDate , CAST( ((ForExRate - ISNULL(LastValue, 0))/LastValue)*100 AS float) Percentage
FROM CTE
WHERE LastValue IS NOT NULL
SELECT basetbl.BaseCur, basetbl.ForCur, basetbl.NewDate, basetbl.OldDate, num2.ForExRate/num1.ForExRate*100 AS PercentChange FROM
(((SELECT t.BaseCur, t.ForCur, MAX(t.LastSalesRateChangeDate) AS NewDate, summary.Last_Date AS OldDate
FROM (tblForEx AS t
LEFT JOIN (SELECT TOP 2 BaseCur, ForCur, MAX(LastSalesRateChangeDate) AS Last_Date FROM tblForEx AS t1
WHERE LastSalesRateChangeDate <>
(SELECT MAX(LastSalesRateChangeDate) FROM tblForEx t2 WHERE t2.BaseCur = t1.BaseCur AND t2.ForCur = t1.ForCur)
GROUP BY BaseCur, ForCur) AS summary
ON summary.ForCur = t.ForCur AND summary.BaseCur = t.BaseCur)
GROUP BY t.BaseCur, t.ForCur, summary.Last_Date) basetbl
LEFT JOIN tblForEx num1 ON num1.BaseCur=basetbl.BaseCur AND num1.ForCur = basetbl.ForCur AND num1.LastSalesRateChangeDate = basetbl.OldDate))
LEFT JOIN tblForEx num2 ON num2.BaseCur=basetbl.BaseCur AND num2.ForCur = basetbl.ForCur AND num2.LastSalesRateChangeDate = basetbl.NewDate;
This uses a series of subqueries. First, you are selecting the most recent date for the BaseCur and ForCur. Then, you are joining onto that the previous date. I do that by using another subquery to select the top two dates, and exclude the one that is equal to the previously established most recent date. This is the "summary" subquery.
Then, you get the BaseCur, ForCur, NewDate, and OldDate in the "basetbl" subquery. After that, it is two simple joins of the original table back onto those dates to get the rate that was applicable then.
Finally, you are selecting your BaseCur, ForCur, and whatever formula you want to use to calculate the rate change. I used a simple ratio in that one, but it is easy to change. You can remove the dates in the first line if you want, they are there solely as a reference point.
It doesn't look pretty, but complicated Access SQL queries never do.

Datediff between two tables

I have those two tables
1-Add to queue table
TransID , ADD date
10 , 10/10/2012
11 , 14/10/2012
11 , 18/11/2012
11 , 25/12/2012
12 , 1/1/2013
2-Removed from queue table
TransID , Removed Date
10 , 15/1/2013
11 , 12/12/2012
11 , 13/1/2013
11 , 20/1/2013
TransID is the key between the two tables, and I can't modify those tables. What I want is to query the amount of time each transaction spent in the queue.
It's easy when there is one item in each table, but when an item gets queued more than once, how do I calculate that?
Assuming the order TransIDs are entered into the Add table is the same order they are removed, you can use the following:
WITH OrderedAdds AS
( SELECT TransID,
AddDate,
[RowNumber] = ROW_NUMBER() OVER(PARTITION BY TransID ORDER BY AddDate)
FROM AddTable
), OrderedRemoves AS
( SELECT TransID,
RemovedDate,
[RowNumber] = ROW_NUMBER() OVER(PARTITION BY TransID ORDER BY RemovedDate)
FROM RemoveTable
)
SELECT OrderedAdds.TransID,
OrderedAdds.AddDate,
OrderedRemoves.RemovedDate,
[DaysInQueue] = DATEDIFF(DAY, OrderedAdds.AddDate, ISNULL(OrderedRemoves.RemovedDate, CURRENT_TIMESTAMP))
FROM OrderedAdds
LEFT JOIN OrderedRemoves
ON OrderedAdds.TransID = OrderedRemoves.TransID
AND OrderedAdds.RowNumber = OrderedRemoves.RowNumber;
The key part is that each record gets a row number based on the transaction ID and the date it was entered. You can then join on both RowNumber and TransID, which stops any cross joining.
Example on SQL Fiddle
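The same ROW_NUMBER pairing can be run in SQLite via Python's sqlite3 module, using the sample data from the question with the dd/mm/yyyy dates rewritten as ISO dates (an assumption, since SQLite's julianday() expects ISO). It keeps the answer's assumption that the n-th add for a TransID matches its n-th remove, but returns NULL instead of counting up to CURRENT_TIMESTAMP for items still in the queue, which keeps the output deterministic.

```python
import sqlite3

# Sample data from the question, dates converted from dd/mm/yyyy to ISO.
ADDS = [(10, "2012-10-10"), (11, "2012-10-14"), (11, "2012-11-18"),
        (11, "2012-12-25"), (12, "2013-01-01")]
REMOVES = [(10, "2013-01-15"), (11, "2012-12-12"), (11, "2013-01-13"), (11, "2013-01-20")]

QUEUE_SQL = """
WITH ordered_adds AS (
    SELECT TransID, AddDate,
           ROW_NUMBER() OVER (PARTITION BY TransID ORDER BY AddDate) AS rn
    FROM AddTable
),
ordered_removes AS (
    SELECT TransID, RemovedDate,
           ROW_NUMBER() OVER (PARTITION BY TransID ORDER BY RemovedDate) AS rn
    FROM RemoveTable
)
SELECT a.TransID, a.AddDate, r.RemovedDate,
       -- whole days between an add and its matching remove; NULL while still queued
       CAST(julianday(r.RemovedDate) - julianday(a.AddDate) AS INTEGER) AS DaysInQueue
FROM ordered_adds AS a
LEFT JOIN ordered_removes AS r
  ON r.TransID = a.TransID AND r.rn = a.rn
ORDER BY a.TransID, a.AddDate
"""


def time_in_queue(adds, removes):
    """Pair the n-th add with the n-th remove per TransID and count the days."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE AddTable (TransID INTEGER, AddDate TEXT)")
    con.execute("CREATE TABLE RemoveTable (TransID INTEGER, RemovedDate TEXT)")
    con.executemany("INSERT INTO AddTable VALUES (?, ?)", adds)
    con.executemany("INSERT INTO RemoveTable VALUES (?, ?)", removes)
    result = con.execute(QUEUE_SQL).fetchall()
    con.close()
    return result
```

Summing DaysInQueue per TransID then gives the total time in the queue, e.g. 59 + 56 + 26 = 141 days for TransID 11.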
DISCLAIMER: There are probably problems with this, but I hope it sends you in one possible direction. Expect to run into problems.
You can try in the following direction (which might work in some way depending on your system, version, etc) :
SELECT transId, (SUM(remove_date_sum) - SUM(add_date_sum)) / (60*60*24) AS days_in_queue
FROM
(
SELECT transId, SUM(UNIX_TIMESTAMP(add_date)) AS add_date_sum, 0 AS remove_date_sum
FROM add_to_queue
GROUP BY transId
UNION ALL
SELECT transId, 0 AS add_date_sum, SUM(UNIX_TIMESTAMP(remove_date)) AS remove_date_sum
FROM remove_from_queue
GROUP BY transId
) AS sums
GROUP BY transId;
A bit of explanation: as far as I know, you cannot sum dates, but you can convert them to some sort of timestamp. Check whether UNIX_TIMESTAMP works for you, or figure out something else. Then you can sum in each table, build a union that conveniently leaves the other column as zero, and subtract in the outer query. This only gives the right total when every add has a matching remove.
As for the division at the end of the first SELECT: UNIX_TIMESTAMP returns seconds, so you divide by 60*60*24 to get days, or by whatever gives the unit you want.
All that said, I would probably solve this using a stored procedure or some client script. SQL is not a weapon for every battle. Making two separate queries can be much simpler.
Answer 2: after your comments. (As a side note, some of your dates, such as 15/1/2013 and 13/1/2013, are not valid dates if read as mm/dd/yyyy.)
select transId, sum(numberOfDays) totalQueueTime
from (
select a.transId,
datediff(day,a.addDate,isnull(r.removeDate,a.addDate)) numberOfDays
from AddTable a left join RemoveTable r on a.transId = r.transId
) X
group by transId
Answer 1: before your comments
Assuming that no new record is added unless the previous one has been removed. Also note that the following query returns numberOfDays as zero for records that have not been removed:
select a.transId, a.addDate, r.removeDate,
datediff(day,a.addDate,isnull(r.removeDate,a.addDate)) numberOfDays
from AddTable a left join RemoveTable r on a.transId = r.transId
order by a.transId, a.addDate, r.removeDate