Cross joins in results - sql

Mapping:A single address id can have different tracking ids. Each tracking id and each address id will have distinct lat and long pairs. Each tracking id can have multiple route ids although most of the time it will be a single route id to tracking id mapping.
Update: The tracking ids that I am selecting from T1_2 may or may not exist in the other tables. Also, there are no duplicates for each of the temp tables I am using for the final select statement(based on the key value).
I am having a problem with the results of the following query.The query is supposed to produce metrics for distance deviations of delivery points from addresses. Its performing some cross joins on the columns and hence the data is more than it should be. I know this is related to granularity and is a basic mistake but its hard for me to find where I went wrong. If someone can give me some pointers,please do. A subset of the results have been attached as a link and I have also highlighted a sample tracking id which should have come only once(with only route id).The results should include address ids repeating as many times as there are distinct tracking_ids which in turn should be in sync with no_pkg column. The query is also attached for reference.
Results subset
CREATE OR REPLACE FUNCTION f_stop_distance (Float, Float, Float, Float) /* This calculates distance in meters between two sets of lat and long */
RETURNS FLOAT
IMMUTABLE
AS $$
SELECT
2 * 6373000 * ASIN( SQRT( ( SIN( RADIANS(($3 - $1) / 2) ) ) ^ 2 + COS(RADIANS($1)) * COS(RADIANS($3)) * (SIN(RADIANS(($4 - $2) / 2))) ^ 2))
$$ LANGUAGE sql
;
CREATE TEMPORARY TABLE T1 AS /* This is to get top 1000 address ids which are unique identifiers for addresses in terms of orders frequency which is decided by number of distinct ordering order ids */
SELECT destination_address_id
,COUNT(DISTINCT ordering_order_id)a
,COUNT(DISTINCT tracking_id) no_pkg
FROM lmaa_pm.perfectmile_onroad_events_na
where shipment_status = 'DELIVERED'
AND delivery_station_code = 'DCH1'
AND event_day BETWEEN '2018-12-01' AND '2018-12-31'
AND tracking_id IS NOT NULL
GROUP BY destination_address_id,delivery_station_code
ORDER BY a DESC
LIMIT 1000
;
CREATE TEMPORARY TABLE T1_2 AS /* This is to get tracking ids corresponding to those top 1000 address ids */
SELECT DISTINCT destination_address_id
,tracking_id
FROM lmaa_pm.perfectmile_onroad_events_na
WHERE destination_address_id IN (SELECT destination_address_id FROM T1)
AND event_day BETWEEN '2018-12-01' AND '2018-12-31'
AND shipment_status = 'DELIVERED'
AND delivery_station_code = 'DCH1'
AND tracking_id IS NOT NULL
GROUP BY 1,2
;
CREATE TEMPORARY TABLE T2 AS /* This is to get lat long pairs for addresses and delivery point respectively */
SELECT DISTINCT gdd.lat1
,gdd.long1
,gdd.external_address_id destination_address_id
,gdd.tracking_id
,gdd.actual_lat
,gdd.actual_long
,ROW_NUMBER() OVER(PARTITION BY tracking_id ORDER BY deliverydate DESC) rn /* This is to avoid duplicates since this table contains duplicates */
FROM gtech.geocoding_data_daily_na gdd
WHERE gdd.shipment_status_id in (51,'DELIVERED')
AND tracking_id IN(SELECT tracking_id FROM T1_2)
AND confidence1 = 'high'
AND gdd.station_code='DCH1'
AND deliverydate BETWEEN '2018-12-01' AND '2018-12-31'
AND actual_lat IS NOT NULL
AND actual_long IS NOT NULL
;
CREATE TEMPORARY TABLE T2_2 AS
SELECT *
FROM T2
WHERE rn = 1
;
CREATE TEMPORARY TABLE T3 AS
SELECT T2_2.lat1
,T2_2.long1
,T2_2.actual_lat
,T2_2.actual_long
,T2_2.tracking_id
,T2_2.destination_address_id
,CASE /* This function is for identifying distance deviations in the order of 0 - 10 metres, 10-20 metres and so on */
WHEN f_stop_distance(lat1,long1,actual_lat,actual_long) <=10 THEN '0_to_10'
WHEN f_stop_distance(lat1,long1,actual_lat,actual_long) >10
and f_stop_distance(lat1,long1,actual_lat,actual_long) <=20 THEN '10_to_20'
WHEN f_stop_distance(lat1,long1,actual_lat,actual_long)>20
and f_stop_distance(lat1,long1,actual_lat,actual_long) <=50 THEN '20_to_50'
WHEN f_stop_distance(lat1,long1,actual_lat,actual_long) >50 THEN 'gt_50'
END AS Dev_from_address
FROM T2_2
ORDER BY T2_2.tracking_id
;
CREATE TEMPORARY TABLE T4 AS /* Doing some percentage calculations based on the new buckets created in the previous temp table namely percentage calculations out of total */
SELECT SUM(CASE WHEN Dev_from_address = '0_to_10' THEN 1 ELSE 0 END)a
,SUM(CASE WHEN Dev_from_address = '10_to_20' THEN 1 ELSE 0 END)b
,SUM(CASE WHEN Dev_from_address = '20_to_50' THEN 1 ELSE 0 END)c
,SUM(CASE WHEN Dev_from_address = 'gt_50' THEN 1 ELSE 0 END)d
,tracking_id
,(a/(a+b+c+d)::DECIMAL(10,2) * 100) AS e
,(b/(a+b+c+d)::DECIMAL(10,2) * 100) AS f
,(c/(a+b+c+d)::DECIMAL(10,2) * 100) AS g
,(d/(a+b+c+d)::DECIMAL(10,2) * 100) AS h
FROM T3
GROUP BY tracking_id
;
CREATE TEMPORARY TABLE T5 AS /* adding info for route id to the existing data */
SELECT DISTINCT route_id
,tracking_id
,ROW_NUMBER() OVER (PARTITION BY tracking_id ORDER BY DATE DESC) rnnn /* to avoid duplicates */
FROM omw.route_actuals_na
WHERE tracking_id IN (SELECT tracking_id FROM T1_2)
AND stop_type = 'Dropoff'
AND scan_status = 'DELIVERED'
;
CREATE TEMPORARY TABLE T5_final AS
SELECT *
FROM T5
WHERE rnnn = 1
;
/* final select */
SELECT DISTINCT T1_2.destination_address_id
,T3.lat1
,T3.long1
,T3.actual_lat
,T3.actual_long
,T3.Dev_from_address
,T1_2.tracking_id
,T1.no_pkg
,T4.e
,T4.f
,T4.g
,T4.h
,T5_final.route_id
FROM T3
JOIN T4 ON T4.tracking_id = T3.tracking_id
JOIN T1 ON T1.destination_address_id = T3.destination_address_id
JOIN T1_2 ON T1_2.destination_address_id = T3.destination_address_id
JOIN T5_final ON T5_final.tracking_id = T3.tracking_id
ORDER BY T1_2.destination_address_id

strictly - no full cross joins there - however you may have a many to many join.
To track this down try taking a look at each of your joins to see whether you have >1 key value
select tracking_id,count(*) from t4 group by 1 having count(*) > 1;
select destination_address_id,count(*) from t1 group by 1 having count(*) > 1;
select tracking_id ,count(*) from t5_final group by 1 having count(*) > 1;
where you have values returned, that could be your cause. this may help you identify where you have a many to many join.

Related

ORA-00904 "invalid identifier" but identifier exists in query

I'm working in a fault-reporting Oracle database, trying to get fault information out of it.
The main table I'm querying is Incident, which includes incident information. Each record in Incident may have any number of records in the WorkOrder table (or none) and each record in WorkOrder may have any number of records in the WorkLog table (or none).
What I am trying to do at this point is, for each record in Incident, find the WorkLog with the minimum value in the field MXRONSITE, and, for that worklog, return the MXRONSITE time and the REPORTDATE from the work order. I accomplished this using a MIN subquery, but it turned out that several worklogs could have the same MXRONSITE time, so I was pulling back more records than I wanted. I tried to create a subsubquery for it, but it now says I have an invalid identifier (ORA-00904) for WOL1.WONUM in the WHERE line, even though that identifier is in use elsewhere.
Any help is appreciated. Note that there is other stuff in the query, but the rest of the query works in isolation, and this but doesn't work in the full query or on its own.
SELECT
WL1.MXRONSITE as "Date_First_Onsite",
WOL1.REPORTDATE as "Date_First_Onsite_Notified"
FROM Maximo.Incident
LEFT JOIN (Maximo.WorkOrder WOL1
LEFT JOIN Maximo.Worklog WL1
ON WL1.RECORDKEY = WOL1.WONUM)
ON WOL1.ORIGRECORDID = Incident.TICKETID
AND WOL1.ORIGRECORDCLASS = 'INCIDENT'
WHERE (WL1.WORKLOGID IN
(SELECT MIN(WL3.WORKLOGID)
FROM (SELECT MIN(WL3.MXRONSITE), WL3.WORKLOGID
FROM Maximo.Worklog WL3 WHERE WOL1.WONUM = WL3.RECORDKEY))
or WL1.WORKLOGID is null)
To clarify, what I want is:
For each fault in Incident,
the earliest MXRONSITE from the Worklog table (if such a value exists),
For that worklog, information from the associated record from the WorkOrder table.
This is complicated by Incident records having multiple work orders, and work orders having multiple work logs, which may have the same MXRONSITE time.
After some trials, I have found an (almost) working solution:
WITH WLONSITE as (
SELECT
MIN(WLW.MXRONSITE) as "ONSITE",
WLWOW.ORIGRECORDID as "TICKETID",
WLWOW.WONUM as "WONUM"
FROM
MAXIMO.WORKLOG WLW
INNER JOIN
MAXIMO.WORKORDER WLWOW
ON
WLW.RECORDKEY = WLWOW.WONUM
WHERE
WLWOW.ORIGRECORDCLASS = 'INCIDENT'
GROUP BY
WLWOW.ORIGRECORDID, WLWOW.WONUM
)
select
incident.ticketid,
wlonsite.onsite,
wlonsite.wonum
from
maximo.incident
LEFT JOIN WLONSITE
ON WLONSITE.TICKETID = Incident.TICKETID
WHERE
(WLONSITE.ONSITE is null or WLONSITE.ONSITE = (SELECT MIN(WLONSITE.ONSITE) FROM WLONSITE WHERE WLONSITE.TICKETID = Incident.TICKETID AND ROWNUM=1))
AND Incident.AFFECTEDDATE >= TO_DATE ('01/12/2015', 'DD/MM/YYYY')
This however is significantly slower, and also still not quite right, as it turns out a single Incident can have multiple Work Orders with the same ONSITE time (aaargh!).
As requested, here is a sample input, and what I want to get from it (apologies for the formatting). Note that while TICKETID and WONUM are primary keys, they are strings rather than integers. WORKLOGID is an integer.
Incident table:
TICKETID / Description / FieldX
1 / WORD1 / S
2 / WORD2 / P
3 / WORDX /
4 / / Q
Work order table:
WONUM / ORIGRECORDID / REPORTDATE
11 / 1 / 2015-01-01
12 / 2 / 2015-01-01
13 / 2 / 2015-02-04
14 / 3 / 2015-04-05
Worklog table:
WORKLOGID / RECORDKEY / MXRONSITE
101 / 11 / 2015-01-05
102 / 12 / 2015-01-04
103 / 12 /
104 / 12 / 2015-02-05
105 / 13 /
Output:
TICKETID / WONUM / WORKLOGID
1 / 11 / 101
2 / 12 / 102
3 / /
4 / /
(Worklog 101 linked to TICKETID 1, has non-null MXRONSITE, and is from work order 11)
(Worklogs 102-105 linked to TICKETID 2, of which 102 has lowest MXRONSITE, and is work order 12)
(No work logs associated with faults 103 or 104, so work order and worklog fields are null)
Post Christmas attack!
I have found a solution which works:
The method I found was to use multiple WITH queries, as follows:
WLMINL AS (
SELECT
RECORDKEY, MXRONSITE, MIN(WORKLOGID) AS "WORKLOG"
FROM MAXIMO.WORKLOG
WHERE WORKLOG.CLASS = 'WORKORDER'
GROUP BY RECORDKEY, MXRONSITE
),
WLMIND AS (
SELECT
RECORDKEY, MIN(MXRONSITE) AS "MXRONSITE"
FROM MAXIMO.WORKLOG
WHERE WORKLOG.CLASS = 'WORKORDER'
GROUP BY RECORDKEY
),
WLMIN AS (
SELECT
WLMIND.RECORDKEY AS "WONUM", WLMIND.MXRONSITE AS "ONSITE", WLMINL.WORKLOG AS "WORKLOGID"
FROM
WLMIND
INNER JOIN
WLMINL
ON
WLMIND.RECORDKEY = WLMINL.RECORDKEY AND WLMIND.MXRONSITE = WLMINL.MXRONSITE
)
Thus for each work order finding the first date, then for each work order and date finding the lowest worklogid, then joining the two tables. This is then repeated at a higher level to find the data by incident.
However this method does not work in a reasonable time, so while it may be suitable for smaller databases it's no good for the behemoths I'm working with.
I would do this with row_number function:
SQLFiddle
select ticketid, case when worklogid is not null then reportdate end d1, mxronsite d2
from (
select i.ticketid, wo.reportdate, wl.mxronsite, wo.wonum, wl.worklogid,
row_number() over (partition by i.ticketid
order by wl.mxronsite, wo.reportdate) rn
from incident i
left join workorder wo on wo.origrecordid = i.ticketid
and wo.origrecordclass = 'INCIDENT'
left join worklog wl on wl.recordkey = wo.wonum )
where rn = 1 order by ticketid
When you nest subqueries, you cannot access columns that belong two or more levels higher; in your statement, WL1 is not accessible in the innermost subquery. (There is also a group-by clause missing, btw)
This might work (not exactly sure what output you expect, but try it):
SELECT
WL1.MXRONSITE as "Date_First_Onsite",
WOL1.REPORTDATE as "Date_First_Onsite_Notified"
FROM Maximo.Incident
LEFT JOIN (
Maximo.WorkOrder WOL1
LEFT JOIN Maximo.Worklog WL1
ON WL1.RECORDKEY = WOL1.WONUM
) ON WOL1.ORIGRECORDID = Incident.TICKETID
AND WOL1.ORIGRECORDCLASS = 'INCIDENT'
WHERE WL1.WORKLOGID =
( SELECT MIN(WL3.WORKLOGID)
FROM Maximo.WorkOrder WOL3
LEFT JOIN Maximo.Worklog WL3
ON WL3.RECORDKEY = WOL3.WONUM
WHERE WOL3.ORIGRECORDID = WOL1.ORIGRECORDID
AND WL3.MXRONSITE IS NOT NULL
)
OR WL1.WORKLOGID IS NULL AND NOT EXISTS
( SELECT MIN(WL4.WORKLOGID)
FROM Maximo.WorkOrder WOL4
LEFT JOIN Maximo.Worklog WL4
ON WL4.RECORDKEY = WOL4.WONUM
WHERE WOL4.ORIGRECORDID = WOL1.ORIGRECORDID
AND WL4.MXRONSITE IS NOT NULL )
I may not have the details right on what you're trying to do... if you have some sample input and desired output, that would be a big help.
That said, I think an analytic function would help a lot, not only in getting the output but in organizing the code. Here is an example of how the max analytic function in a subquery could be used.
Again, the details on the join may be off -- if you can furnish some sample input and output, I'll bet someone can get to where you're trying to go:
with wo as (
select
wonum, origrecordclass, origrecordid, reportdate,
max (reportdate) over (partition by origrecordid) as max_date
from Maximo.workorder
where origrecordclass = 'INCIDENT'
),
logs as (
select
worklogid, mxronsite, recordkey,
max (mxronsite) over (partition by recordkey) as max_mx
from Maximo.worklog
)
select
i.ticketid,
l.mxronsite as "Date_First_Onsite",
wo.reportdate as "Date_First_Onsite_Notified"
from
Maximo.incident i
left join wo on
wo.origrecordid = i.ticketid and
wo.reportdate = wo.max_date
left join logs l on
wo.wonum = l.recordkey and
l.mxronsite = l.max_mx
-- edit --
Based on your sample input and desired output, this appears to give the desired result. It does do somewhat of an explosion in the subquery, but hopefully the efficiency of the analytic functions will dampen that. They are typically much faster, compared to using group by:
with wo_logs as (
select
wo.wonum, wo.origrecordclass, wo.origrecordid, wo.reportdate,
l.worklogid, l.mxronsite, l.recordkey,
max (reportdate) over (partition by origrecordid) as max_date,
min (mxronsite) over (partition by recordkey) as min_mx
from
Maximo.workorder wo
left join Maximo.worklog l on wo.wonum = l.recordkey
where wo.origrecordclass = 'INCIDENT'
)
select
i.ticketid, wl.wonum, wl.worklogid,
wl.mxronsite as "Date_First_Onsite",
wl.reportdate as "Date_First_Onsite_Notified"
from
Maximo.incident i
left join wo_logs wl on
i.ticketid = wl.origrecordid and
wl.mxronsite = wl.min_mx
order by 1

Find duplicates in MS SQL table

I know that this question has been asked several times but I still cannot figure out why my query is returning values which are not duplicates. I want my query to return only the records which have identical value in the column Credit. The query executes without any errors but values which are not duplicated are also being returned. This is my query:
Select
_bvGLTransactionsFull.AccountDesc,
_bvGLAccountsFinancial.Description,
_bvGLTransactionsFull.TxDate,
_bvGLTransactionsFull.Description,
_bvGLTransactionsFull.Credit,
_bvGLTransactionsFull.Reference,
_bvGLTransactionsFull.UserName
From
_bvGLAccountsFinancial Inner Join
_bvGLTransactionsFull On _bvGLAccountsFinancial.AccountLink =
_bvGLTransactionsFull.AccountLink
Where
_bvGLTransactionsFull.Credit
IN
(SELECT Credit AS NumOccurrences
FROM _bvGLTransactionsFull
GROUP BY Credit
HAVING (COUNT(Credit) > 1 ) )
Group By
_bvGLTransactionsFull.AccountDesc, _bvGLAccountsFinancial.Description,
_bvGLTransactionsFull.TxDate, _bvGLTransactionsFull.Description,
_bvGLTransactionsFull.Credit, _bvGLTransactionsFull.Reference,
_bvGLTransactionsFull.UserName, _bvGLAccountsFinancial.Master_Sub_Account,
IsNumeric(_bvGLTransactionsFull.Reference), _bvGLTransactionsFull.TrCode
Having
_bvGLTransactionsFull.TxDate > 01 / 11 / 2014 And
_bvGLTransactionsFull.Reference Like '5_____' And
_bvGLTransactionsFull.Credit > 0.01 And
_bvGLAccountsFinancial.Master_Sub_Account = '90210'
That's because you're matching on the credit field back to your table, which contains duplicates. You need to isolate the rows that are duplicated with ROW_NUMBER:
;WITH CTE AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY CREDIT ORDER BY (SELECT NULL)) AS RN
FROM _bvGLTransactionsFull)
Select
CTE.AccountDesc,
_bvGLAccountsFinancial.Description,
CTE.TxDate,
CTE.Description,
CTE.Credit,
CTE.Reference,
CTE.UserName
From
_bvGLAccountsFinancial Inner Join
CTE On _bvGLAccountsFinancial.AccountLink = CTE.AccountLink
WHERE CTE.RN > 1
Group By
CTE.AccountDesc, _bvGLAccountsFinancial.Description,
CTE.TxDate, CTE.Description,
CTE.Credit, CTE.Reference,
CTE.UserName, _bvGLAccountsFinancial.Master_Sub_Account,
IsNumeric(CTE.Reference), CTE.TrCode
Having
CTE.TxDate > 01 / 11 / 2014 And
CTE.Reference Like '5_____' And
CTE.Credit > 0.01 And
_bvGLAccountsFinancial.Master_Sub_Account = '90210'
Just as a side note, I would consider using aliases to shorten your queries and make them more readable. Prefixing the table name before each column in a join is very difficult to read.
I trust your code in terms of extracting all data per your criteria. With this, let me have a different approach and see your script "as-is". So then, lets keep first all the records in a temp.
Select
_bvGLTransactionsFull.AccountDesc,
_bvGLAccountsFinancial.Description,
_bvGLTransactionsFull.TxDate,
_bvGLTransactionsFull.Description,
_bvGLTransactionsFull.Credit,
_bvGLTransactionsFull.Reference,
_bvGLTransactionsFull.UserName
-- temp table
INTO #tmpTable
From
_bvGLAccountsFinancial Inner Join
_bvGLTransactionsFull On _bvGLAccountsFinancial.AccountLink =
_bvGLTransactionsFull.AccountLink
Where
_bvGLTransactionsFull.Credit
IN
(SELECT Credit AS NumOccurrences
FROM _bvGLTransactionsFull
GROUP BY Credit
HAVING (COUNT(Credit) > 1 ) )
Group By
_bvGLTransactionsFull.AccountDesc, _bvGLAccountsFinancial.Description,
_bvGLTransactionsFull.TxDate, _bvGLTransactionsFull.Description,
_bvGLTransactionsFull.Credit, _bvGLTransactionsFull.Reference,
_bvGLTransactionsFull.UserName, _bvGLAccountsFinancial.Master_Sub_Account,
IsNumeric(_bvGLTransactionsFull.Reference), _bvGLTransactionsFull.TrCode
Having
_bvGLTransactionsFull.TxDate > 01 / 11 / 2014 And
_bvGLTransactionsFull.Reference Like '5_____' And
_bvGLTransactionsFull.Credit > 0.01 And
_bvGLAccountsFinancial.Master_Sub_Account = '90210'
Then remove the "single occurrence" data by creating a row index and remove all those 1 time indexes.
SELECT * FROM (
SELECT
ROW_NUMBER() OVER (PARTITION BY Credit ORDER BY Credit) AS rowIdx
, *
FROM #tmpTable) AS innerTmp
WHERE
rowIdx != 1
You can change your preference through PARTITION BY <column name>.
Should you have any concerns, please raise it first as these are so far how I understood your case.
EDIT : To include those credits that has duplicates.
SELECT
tmp1.*
FROM #tmpTable tmp1
RIGHT JOIN (
SELECT
Credit
FROM (
SELECT
ROW_NUMBER() OVER (PARTITION BY Credit ORDER BY Credit) AS rowIdx
, *
FROM #tmpTable) AS innerTmp
WHERE
rowIdx != 1
) AS tmp2
ON tmp1.Credit = tmp2.Credit

My query is returning duplicates

I have written an SQL query to filter for a number of conditions, and have used distinct to find only unique records.
Specifically, I need only for the AccountID field to be unique, there are multiple AddressClientIDs for each AccountID.
The query works but is however producing some duplicates.
Further caveats are:
There are multiple trans for each AccountID
There can be trans record both Y and N for an AccountID
I only want to return AccountIDs which have transaction for statuses other than what's specified, hence why I used not in, as I do not want the 2 statuses.
I would like to find only unique values for the AccountID column.
If anyone could help refine the query below, it would be much appreciated.
SELECT AFS_Account.AddressClientID
,afs_transunit.AccountID
,SUM(afs_transunit.Units)
FROM AFS_TransUnit
,AFS_Account
WHERE afs_transunit.AccountID IN (
-- Gets accounts which only have non post statuses
SELECT DISTINCT accountid
FROM afs_trans
WHERE accountid NOT IN (
SELECT accountid
FROM afs_trans
WHERE STATUS IN (
'POSTPEND'
,'POSTWAIT'
)
)
-- This gets the unique accountIDs which only have transactions with Y status,
-- and removes any which have both Y and N.
AND AccountID IN (
SELECT DISTINCT accountid
FROM afs_trans
WHERE IsAllocated = 'Y'
AND accountid NOT IN (
SELECT DISTINCT AccountID
FROM afs_trans
WHERE IsAllocated = 'N'
)
)
)
AND AFS_TransUnit.AccountID = AFS_Account.AccountID
GROUP BY afs_transunit.AccountID
,AFS_Account.AddressClientID
HAVING SUM(afs_transunit.Units) > 100
Thanks.
Since you confirmed that you have one-to-many relationship across two tables on AccountID column, you could use Max value of your AccountID to get distinct values:
SELECT afa.AddressClientID
,MAX(aft.AccountID)
,SUM(aft.Units)
FROM AFS_TransUnit aft
INNER JOIN AFS_Account afa ON aft.AccountID = afa.AccountID
GROUP BY afa.AddressClientID
HAVING SUM(aft.Units) > 100
AND MAX(aft.AccountID) IN (
-- Gets accounts which only have non post statuses
-- This gets the unique accountIDs which only have transactions with Y status,
-- and removes any which have both Y and N.
SELECT DISTINCT accountid
FROM afs_trans a
WHERE [STATUS] NOT IN ('POSTPEND','POSTWAIT')
AND a.accountid IN (
SELECT t.accountid
FROM (
SELECT accountid
,max(isallocated) AS maxvalue
,min(isallocated) AS minvalue
FROM afs_trans
GROUP BY accountid
) t
WHERE t.maxvalue = 'Y'
AND t.minvalue = 'Y'
)
)
SELECT AFS_Account.AddressClientID
,afs_transunit.AccountID
,SUM(afs_transunit.Units)
FROM AFS_TransUnit
INNER JOIN AFS_Account ON AFS_TransUnit.AccountID = AFS_Account.AccountID
INNER JOIN afs_trans ON afs_trans.acccountid = afs_transunit.accountid
WHERE afs_trans.STATUS NOT IN ('POSTPEND','POSTWAIT')
-- AND afs_trans.isallocated = 'Y'
GROUP BY afs_transunit.AccountID
,AFS_Account.AddressClientID
HAVING SUM(afs_transunit.Units) > 100
and max(afs_trans.isallocated) = 'Y'
and min(afs_trans.isallocated) = 'Y'
Modified your query with ANSI SQL join syntax. As you are joining the tables, you just need to specify the conditions without using the sub-queries you have.

Find Segment with Longest Stay Per Booking

We have a number of bookings and one of the requirements is that we display the Final Destination for a booking based on its segments. Our business has defined the Final Destination as that in which we have the longest stay. And Origin being the first departure point.
Please note this is not the segments with the Longest Travel time i.e. Datediff(minute, DepartDate, ArrivalDate) This is requesting the one with the Longest gap between segments.
This is a simplified version of the tables:
Create Table Segments
(
BookingID int,
SegNum int,
DepartureCity varchar(100),
DepartDate datetime,
ArrivalCity varchar(100),
ArrivalDate datetime
);
Create Table Bookings
(
BookingID int identity(1,1),
Locator varchar(10)
);
Insert into Segments values (1,2,'BRU','2010-03-06 10:40','FIH','2010-03-06 20:20:00')
Insert into Segments values (1,4,'FIH','2010-03-13 21:50:00','BRU', '2010-03-14 07:25:00')
Insert into Segments values (2,2,'BOD','2010-02-10 06:50:00','AMS','2010-02-10 08:50:00')
Insert into Segments values (2,3,'AMS','2010-02-10 10:40:00','EBB','2010-02-10 20:40:00')
Insert into Segments values (2,4,'EBB','2010-02-28 22:55:00','AMS','2010-03-01 05:35:00')
Insert into Segments values (2,5,'AMS','2010-03-01 10:25:00','BOD','2010-03-01 12:15:00')
insert into Segments values (3,2,'BRU','2010-03-09 12:10:00','IAD','2010-03-09 14:46:00')
Insert into Segments Values (3,3,'IAD','2010-03-13 17:57:00','BRU','2010-03-14 07:15:00')
insert into segments values (4,2,'BRU','2010-07-27','ADD','2010-07-28')
insert into segments values (4,4,'ADD','2010-07-28','LUN','2010-07-28')
insert into segments values (4,5,'LUN','2010-08-23','ADD','2010-08-23')
insert into segments values (4,6,'ADD','2010-08-23','BRU','2010-08-24')
Insert into Bookings values('5MVL7J')
Insert into Bookings values ('Y2IMXQ')
insert into bookings values ('YCBL5C')
Insert into bookings values ('X7THJ6')
I have created a SQL Fiddle with real data here:
SQL Fiddle Example
I have tried to do the following, however this doesn't appear to be correct.
SELECT Locator, fd.*
FROM Bookings ob
OUTER APPLY
(
SELECT Top 1 DepartureCity, ArrivalCity
from
(
SELECT DISTINCT
seg.segnum ,
seg.DepartureCity ,
seg.DepartDate ,
seg.ArrivalCity ,
seg.ArrivalDate,
(SELECT
DISTINCT
DATEDIFF(MINUTE , seg.ArrivalDate , s2.DepartDate)
FROM Segments s2
WHERE s2.BookingID = seg.BookingID AND s2.segnum = seg.segnum + 1) 'LengthOfStay'
FROM Bookings b(NOLOCK)
INNER JOIN Segments seg (NOLOCK) ON seg.bookingid = b.bookingid
WHERE b.Locator = ob.locator
) a
Order by a.lengthofstay desc
)
FD
The results I expect are:
Locator Origin Destination
5MVL7J BRU FIH
Y2IMXQ BOD EBB
YCBL5C BRU IAD
X7THJ6 BRU LUN
I get the feeling that a CTE would be the best approach, however my attempts do this so far failed miserably. Any help would be greatly appreciated.
I have managed to get the following query working but it only works for one at a time due to the top one, but I'm not sure how to tweak it:
WITH CTE AS
(
SELECT distinct s.DepartureCity, s.DepartDate, s.ArrivalCity, s.ArrivalDate, b.Locator , ROW_NUMBER() OVER (PARTITION BY b.Locator ORDER BY SegNum ASC) RN
FROM Segments s
JOIN bookings b ON s.bookingid = b.BookingID
)
SELECT C.Locator, c.DepartureCity, a.ArrivalCity
FROM
(
SELECT TOP 1 C.Locator, c.ArrivalCity, c1.DepartureCity, DATEDIFF(MINUTE,c.ArrivalDate, c1.DepartDate) 'ddiff'
FROM CTE c
JOIN cte c1 ON c1.Locator = C.Locator AND c1.rn = c.rn + 1
ORDER BY ddiff DESC
) a
JOIN CTE c ON C.Locator = a.Locator
WHERE c.rn = 1
You can try something like this:
;WITH CTE_Start AS
(
--Ordering of segments to eliminate gaps
SELECT *, ROW_NUMBER() OVER (PARTITION BY BookingID ORDER BY SegNum) RN
FROM dbo.Segments
)
, RCTE_Stay AS
(
--recursive CTE to calculate stay between segments
SELECT *, 0 AS Stay FROM CTE_Start s WHERE RN = 1
UNION ALL
SELECT sNext.*, DATEDIFF(Mi, s.ArrivalDate, sNext.DepartDate)
FROM CTE_Start sNext
INNER JOIN RCTE_Stay s ON s.RN + 1 = sNext.RN AND s.BookingID = sNext.BookingID
)
, CTE_Final AS
(
--Search for max(stay) for each bookingID
SELECT *, ROW_NUMBER() OVER (PARTITION BY BookingID ORDER BY Stay DESC) AS RN_Stay
FROM RCTE_Stay
)
--join Start and Final on RN=1 to find origin and departure
SELECT b.Locator, s.DepartureCity AS Origin, f.DepartureCity AS Destination
FROM CTE_Final f
INNER JOIN CTE_Start s ON f.BookingID = s.BookingID
INNER JOIN dbo.Bookings b ON b.BookingID = f.BookingID
WHERE s.RN = 1 AND f.RN_Stay = 1
SQLFiddle DEMO
You can use the OUTER APPLY + TOP operators to find the next values SegNum. After finding the gap between segments are used MIN/MAX aggregate functions with OVER clause as conditions in the CASE expression
;WITH cte AS
(
SELECT seg.BookingID,
CASE WHEN MIN(seg.segNum) OVER(PARTITION BY seg.BookingID) = seg.segNum
THEN seg.DepartureCity END AS Origin,
CASE WHEN MAX(DATEDIFF(MINUTE, seg.ArrivalDate, o.DepartDate)) OVER(PARTITION BY seg.BookingID)
= DATEDIFF(MINUTE, seg.ArrivalDate, o.DepartDate)
THEN o.DepartureCity END AS Destination
FROM Segments seg (NOLOCK)
OUTER APPLY (
SELECT TOP 1 seg2.DepartDate, seg2.DepartureCity
FROM Segments seg2
WHERE seg.BookingID = seg2.BookingID
AND seg.SegNum < seg2.SegNum
ORDER BY seg2.SegNum ASC
) o
)
SELECT b.Locator, MAX(c.Origin) AS Origin, MAX(c.Destination) AS Destination
FROM cte c JOIN Bookings b ON c.BookingID = b.BookingID
GROUP BY b.Locator
See demo on SQLFiddle
The statement below:
;WITH DataSource AS
(
SELECT ROW_NUMBER() OVER(PARTITION BY BookingID ORDER BY DATEDIFF(SS,DepartDate,ArrivalDate) DESC) AS Row
,Segments.BookingID
,Segments.SegNum
,Segments.DepartureCity
,Segments.DepartDate
,Segments.ArrivalCity
,Segments.ArrivalDate
,DATEDIFF(SS,DepartDate,ArrivalDate) AS DiffInSeconds
FROM Segments
)
SELECT *
FROM DataSource DS
INNER JOIN Bookings B
ON DS.[BookingID] = B.[BookingID]
Will give the following output:
So, adding the following clause to the above statement:
WHERE Row = 1
will give you what you need.
Few important things:
As you can see from the screenshot below, there are two records with same difference in second. If you want to show both of them (or all of them if there are), instead ROW_NUMBER function use RANK function.
The return type of DATEDIFF is INT. So, there is limitation for seconds max deference value. It is as follows:
If the return value is out of range for int (-2,147,483,648 to
+2,147,483,647), an error is returned. For millisecond, the maximum difference between startdate and enddate is 24 days, 20 hours, 31
minutes and 23.647 seconds. For second, the maximum difference is 68
years.

Combine 2 or more rows having characters and others having currency

I have an output like this by a query:
ID customer_name_Now customer_name_Before MOVEMENT
123451 Rustle Bock ltd N £2,121
123451 N Rustle Bock ltd -£25,666,899
123452 Little Garage Ltd N £6,987
123453 N The Big Shop £15,850
Story is that, I have 2 months of data. In both months the customer may or may not have movements depending if it was our customer last month or if its a customer now. Many cases it is customer in both months, hence I get 2 rows like the above.
The ideal output should be :
ID customer_name_Now customer_name_Before MOVEMENT
123451 Rustle Bock ltd Rustle Bock ltd -£25,664,778
123452 Little Garage Ltd N £6,987
123453 N The Big Shop £15,850
So movements should sum to give me the actual movement in months and the names of customer should be in both columns given the customer has relation in both months.
#DMK the query I used to get the initial output is:
select /*+ NO_REWRITE */
customer_id,
customer_name_now,
customer_name_before,
movement
from
(select /*+ NO_REWRITE */
main.customer_id,
main.customer_name_now,
main.customer_name_before,
main.limits_before,
main.limits_now,
sum(main.limits_now-main.limits_before) as movement
from
(select /*+ NO_REWRITE */
customer_id,
(customer_name_before) as customer_name_before,
(customer_name_now) as customer_name_now,
sum(limits_current) as limits_now,
sum(limits_previous) as limits_before
from
(select /*+ NO_REWRITE */
sub.customer_id,
sub.customer_name_now,
sub.customer_name_before,
sub.limits_current,
sub.limits_previous
from
(select /*+ NO_REWRITE */
T2.customer_ID,
(T2.customer_name) customer_name_now,
'N' customer_name_before,
sum(T26.AGREED_LIMIT) limits_current,
0 limits_previous
from
DWH_customer_HISTORY T2,
DWH_TIME_DIM T25,
DWH_FACILITY_MONTHLY T2
where
---some internal filters are applied here, i habe ot shown coz of security reasons----
and
T25.MONTH_END = '2012-11-30' and
group by
T2.customer_ID,
T2.customer_name,
) sub
union all
select /*+ NO_REWRITE */
sub.customer_id,
sub.customer_name_now,
sub.customer_name_before,
sub.limits_current,
sub.limits_previous
from
(select /*+ NO_REWRITE */
T2.customer_ID,
'N' as customer_name_now,
(T2.customer_name)customer_name_before,
0 limits_current,
sum(T2.AGREED_LIMIT) limits_previous,
from
DWH_customer_HISTORY T2,
DWH_TIME_DIM T25,
DWH_FACILITY_MONTHLY T2
where
---some internal filters are applied here, i habe ot shown coz of security reasons----
and
T25.MONTH_END = '2012-10-31'
group by
T2.customer_ID,
T2.customer_name,) sub
) un
group by
customer_id,
customer_name_now,
customer_name_before,) main
group by
main.customer_id,
main.customer_name_now,
main.customer_name_before,
main.limits_before,
main.limits_now)
I've assumed your using SqlServer, though the query below will work in MySql also.
Select c1.ID, c1.customer_name_Now, c2.customer_name_Before, Total
from Customers c1
left Join Customers c2
on c2.ID = c1.ID
left join
(select ID as ID2, sum(MOVEMENT) as Total, count(*) as Cnt
from Customers
group by ID) t1
on ID2 = c1.ID
where (c1.customer_name_Now <> 'N' and c2.customer_name_Before <> 'N')
or CNT = 1
Have a look at the following demo if your unsure
SqlFiddle
After looking at the Query you just added the above should still work. You either need to
Substitute my table Customers with your query
Or move your results from your query to a temporary table and substitute my table Customers with the temporary table
I'd go for the second. Saves re-running the same query over again.
In SQLServer2005+ use OVER clause
SELECT DISTINCT
ID,MAX(customer_name_Now) OVER (PARTITION BY ID) AS customer_name_Now,
MAX(customer_name_Before) OVER (PARTITION BY ID) AS customer_name_Before,
SUM(MOVEMENT) OVER (PARTITION BY ID) AS MOVEMENT
FROM your_table
OR
SELECT ID, MAX(customer_name_Now) AS customer_name_Now,
MAX(customer_name_Before) AS customer_name_Before,
SUM(MOVEMENT) AS MOVEMENT
FROM your_table
GROUP BY ID
Demo on SQLFiddle