HAVING Clause Issues in SQL Server 2008

I am experiencing issues with my HAVING clause.
Explanation: Each orderno has at least one rxnum tied to it, but each rxnum can have multiple script items (scriptitemcnt). My problem is that I am trying to use a HAVING clause to pull ONLY orderno's whose SUM of items is less than or equal to 8. The query executes, but it still pulls orders whose item sum is greater than 8. Here is my code:
SELECT
oh.orderno,
od.rxnum,
SUM(od.scriptitemcnt) as scriptitemcnt,
od.ndctopick,
od.drugdesc,
od.unitno,
od.status,
od.datetimefilled,
od.packingunit,
od.datetimepacked,
oh.totesideinorder
FROM
mck_hvs.oldorderdetails od with( nolock ),
mck_hvs.oldorderheader oh with( nolock )
WHERE
oh.orderno = od.orderno
and od.status != 5
and ( #dateFrom is NULL or od.datetimepacked >= cast( #dateFrom as datetime ) )
and ( #dateTo is NULL or od.datetimepacked < cast( #dateTo as datetime ) + 1 )
and oh.totesideinorder = 'N'
and od.packingunit NOT IN (695, 696, 697, 698)
GROUP BY
oh.orderno,
od.rxnum,
od.scriptitemcnt,
od.ndctopick,
od.drugdesc,
od.rxnum,
od.unitno,
od.status,
od.datetimefilled,
od.packingunit,
od.datetimepacked,
oh.totesideinorder
HAVING
SUM(od.scriptitemcnt) <= '8'
ORDER BY
oh.orderno asc,
od.rxnum asc

I see two possible problems. First, you're comparing the sum against the string '8' instead of the number 8. Depending on how SQL Server resolves that comparison, values like 10 could end up being treated as less than 8 (as they would in a string comparison), so compare against the numeric literal 8.
Now that's the simple issue. From your description it sounds like you want orders that have a total count not greater than 8. If that's right, then the bigger issue follows...
When you use GROUP BY, aggregates (like SUM()) are computed within each group. Because you're selecting, and therefore grouping by, rxnum, the sum is per rxnum, not per order. If you want to return a row for each rxnum but filter on an aggregate over the entire orderno, it's a little trickier than what you have here.
You can use SUM() OVER() instead of SUM() as a starting point. But a window function can't be referenced directly in a WHERE or HAVING clause in SQL Server, so you end up needing a subquery. Something like:
select *
from (select oh.orderno, od.rxnum
           , sum(od.scriptitemcnt) over(partition by oh.orderno) as order_scriptitemcnt
           -- other columns
      from mck_hvs.oldorderheader oh
      join mck_hvs.oldorderdetails od on od.orderno = oh.orderno
      -- plus the other filters from your query
     ) x  -- SQL Server requires an alias on the derived table
where x.order_scriptitemcnt <= 8
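If the window function feels indirect, a minimal sketch of an equivalent approach, assuming the same tables and intent as the original query, is to compute the per-order total in a derived table and join back to it. Note that any rows that should not count toward the total (status, packingunit, etc.) would need the same filters repeated inside the aggregate:
SELECT oh.orderno, od.rxnum, od.scriptitemcnt, od.ndctopick, od.drugdesc
FROM mck_hvs.oldorderheader oh
JOIN mck_hvs.oldorderdetails od ON od.orderno = oh.orderno
JOIN (SELECT orderno, SUM(scriptitemcnt) AS order_total
      FROM mck_hvs.oldorderdetails
      GROUP BY orderno) tot ON tot.orderno = oh.orderno
WHERE tot.order_total <= 8  -- filter on the per-order total, not the per-rxnum sum
ORDER BY oh.orderno, od.rxnum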

Related

Slow Aggregates using as-of date

I have a query that's intended as the base dataset for an AR Aging report in a BI tool. The report has to be able to show AR as of a given date across a several-month range. I have the logic working, but I'm seeing pretty slow performance. Code below:
WITH
DAT AS (
SELECT
MY_DATE AS_OF_DATE
FROM
NS_REPORTS."PUBLIC".NETSUITE_DATE_TABLE
WHERE
CAST(CAST(MY_DATE AS TIMESTAMP) AS DATE) BETWEEN '2020-01-01' AND CAST(CAST(CURRENT_DATE() AS TIMESTAMP) AS DATE)
), INV AS
(
WITH BASE AS
(
SELECT
BAS1.TRANSACTION_ID
, DAT.AS_OF_DATE
, SUM(BAS1.AMOUNT) ORIG_AMOUNT_BASE
FROM
"PUBLIC".BILL_TRANS_LINES_BASE BAS1
CROSS JOIN DAT
WHERE
BAS1.TRANSACTION_TYPE = 'Invoice'
AND BAS1.TRANSACTION_DATE <= DAT.AS_OF_DATE
--AND BAS1.TRANSACTION_ID = 6114380
GROUP BY
BAS1.TRANSACTION_ID
, DAT.AS_OF_DATE
)
, TAX AS
(
SELECT
TRL1.TRANSACTION_ID
, SUM(TRL1.AMOUNT_TAXED * - 1) ORIG_AMOUNT_TAX
FROM
CONNECTORS.NETSUITE.TRANSACTION_LINES TRL1
WHERE
TRL1.AMOUNT_TAXED IS NOT NULL
AND TRL1.TRANSACTION_ID IN (SELECT TRANSACTION_ID FROM BASE)
GROUP BY
TRL1.TRANSACTION_ID
)
SELECT
BASE.TRANSACTION_ID
, BASE.AS_OF_DATE
, BASE.ORIG_AMOUNT_BASE
, COALESCE(TAX.ORIG_AMOUNT_TAX, 0) ORIG_AMOUNT_TAX
FROM
BASE
LEFT JOIN TAX ON TAX.TRANSACTION_ID = BASE.TRANSACTION_ID
)
SELECT
AR.*
, CASE
WHEN AR.DAYS_OUTSTANDING < 0
THEN 'Current'
WHEN AR.DAYS_OUTSTANDING BETWEEN 0 AND 30
THEN '0 - 30'
WHEN AR.DAYS_OUTSTANDING BETWEEN 31 AND 60
THEN '31 - 60'
WHEN AR.DAYS_OUTSTANDING BETWEEN 61 AND 90
THEN '61 - 90'
WHEN AR.DAYS_OUTSTANDING > 90
THEN '91+'
ELSE NULL
END DO_BUCKET
FROM
(
SELECT
AR1.*
, TRA1.TRANSACTION_TYPE
, DATEDIFF('day', AR1.AS_OF_DATE, CAST(CAST(TRA1.DUE_DATE AS TIMESTAMP) AS DATE)) DAYS_OUTSTANDING
, AR1.ORIG_AMOUNT_BASE + AR1.ORIG_AMOUNT_TAX + AR1.PMT_AMOUNT AMOUNT_OUTSTANDING
FROM
(
SELECT
INV.TRANSACTION_ID
, INV.AS_OF_DATE
, INV.ORIG_AMOUNT_BASE
, INV.ORIG_AMOUNT_TAX
, COALESCE(PMT.PMT_AMOUNT, 0) PMT_AMOUNT
FROM
INV
LEFT JOIN (
SELECT
TLK.ORIGINAL_TRANSACTION_ID
, DAT.AS_OF_DATE
, SUM(TLK.AMOUNT_LINKED * - 1) PMT_AMOUNT
FROM
CONNECTORS.NETSUITE."TRANSACTION_LINKS" AS TLK
CROSS JOIN DAT
WHERE
TLK.LINK_TYPE = 'Payment'
AND CAST(CAST(TLK.ORIGINAL_DATE_POSTED AS TIMESTAMP) AS DATE) <= DAT.AS_OF_DATE
GROUP BY
TLK.ORIGINAL_TRANSACTION_ID
, DAT.AS_OF_DATE
) PMT ON PMT.ORIGINAL_TRANSACTION_ID = INV.TRANSACTION_ID
AND PMT.AS_OF_DATE = INV.AS_OF_DATE
) AR1
JOIN CONNECTORS.NETSUITE."TRANSACTIONS" TRA1 ON TRA1.TRANSACTION_ID = AR1.TRANSACTION_ID
)
AR
WHERE
1 = 1
--AND CAST(AMOUNT_OUTSTANDING AS NUMERIC(15, 2)) > 0
AND AS_OF_DATE >= '2020-04-22'
As you can see, I'm using a date table for the as-of date logic. I think this is the best way to do it, but I welcome any suggestions for better practice.
If I run the query with a single as-of date, it takes 1 min 6 sec, and the two main aggregates, on TRANSACTION_LINKS and BILL_TRANS_LINES_BASE, each take about 25% of processing time; I'm not sure why. If I run it with the filter shown, >= '2020-04-22', it takes 3 min 33 sec and the aggregates each take about 10% of processing time; their share is lower because the ResultWorker takes 63% of processing time writing out the much larger result set.
I'm new to Snowflake but not to SQL. My understanding is that Snowflake does not allow manual creation of indexes, but again, I'm happy to be wrong. Please let me know if you have any ideas for improving the performance of this query.
Thanks in advance.
EDIT 1:
Screenshot of most expensive node in query profile
Without seeing the full explain plan and having some sample data to play with, it is difficult to give any definitive answers, but here are a few thoughts, for what they are worth...
The first are more about readability and may not help performance much:
Don't embed CTEs within each other; just define them in the order that they are needed. There is no need to define BASE and TAX within INV.
Use CTEs as much as possible. Your main SELECT statement has 2 other SELECT statements embedded within it. It would be much more readable if these were defined using CTEs.
Specific performance issues:
Keep data volumes as low as possible for as long as possible. Your CROSS JOINs obviously create Cartesian products that massively increase the volume of data, so apply them as late in your SQL as possible rather than right at the start as you have done.
While it may make your SQL less readable, use as few SQL statements as possible. For example, you should be able to create your INV CTE with a single SELECT statement rather than the 3 statements/CTEs that you are using (a rough sketch of this restructuring follows below).
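As a rough illustration of those first points (sibling CTEs instead of nesting, and INV built from a single SELECT), keeping the original table names and logic and leaving the final aging/bucketing SELECT out for brevity, the shape could be something like:
WITH DAT AS (
    SELECT MY_DATE AS_OF_DATE
    FROM NS_REPORTS."PUBLIC".NETSUITE_DATE_TABLE
    WHERE CAST(CAST(MY_DATE AS TIMESTAMP) AS DATE) BETWEEN '2020-01-01' AND CURRENT_DATE()
), BASE AS (      -- defined at the top level, not nested inside INV
    SELECT BAS1.TRANSACTION_ID, DAT.AS_OF_DATE, SUM(BAS1.AMOUNT) ORIG_AMOUNT_BASE
    FROM "PUBLIC".BILL_TRANS_LINES_BASE BAS1
    CROSS JOIN DAT
    WHERE BAS1.TRANSACTION_TYPE = 'Invoice'
      AND BAS1.TRANSACTION_DATE <= DAT.AS_OF_DATE
    GROUP BY BAS1.TRANSACTION_ID, DAT.AS_OF_DATE
), TAX AS (
    SELECT TRL1.TRANSACTION_ID, SUM(TRL1.AMOUNT_TAXED * -1) ORIG_AMOUNT_TAX
    FROM CONNECTORS.NETSUITE.TRANSACTION_LINES TRL1
    WHERE TRL1.AMOUNT_TAXED IS NOT NULL
      AND TRL1.TRANSACTION_ID IN (SELECT TRANSACTION_ID FROM BASE)
    GROUP BY TRL1.TRANSACTION_ID
), INV AS (       -- now a single SELECT rather than nested CTEs
    SELECT BASE.TRANSACTION_ID, BASE.AS_OF_DATE, BASE.ORIG_AMOUNT_BASE,
           COALESCE(TAX.ORIG_AMOUNT_TAX, 0) ORIG_AMOUNT_TAX
    FROM BASE
    LEFT JOIN TAX ON TAX.TRANSACTION_ID = BASE.TRANSACTION_ID
)
SELECT *          -- the payment join and aging buckets from the original query continue from here
FROM INV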

Using Window Functions With Comparison Operator

I have the following query within a cte.
SELECT [item_id]
FROM [AWS_Stage]
WHERE [yr] IN('2020')
GROUP BY [item_id]
HAVING SUM(ISNULL([frcst_qty], 0)) >= 0
In the past I just needed all item_id's in 2020 greater than 0.
I now need to have all item id's greater than 0 by customer groupings. A line of code like the following makes sense:
SUM([frcst_qty]) OVER (PARTITION BY [item_id], [keycust4]) >= 0
I can't use a window function in a HAVING clause, and I can't have a comparison operator in the SELECT statement.
Any advice on how to make this work?
Could you please try using a subquery like below? This calculates the sum using a window function and then filters on the result. I removed the year condition.
SELECT
[item_id]
FROM
(
SELECT
[item_id],
SUM([frcst_qty]) OVER (PARTITION BY [item_id], [keycust4]) sum_qty
FROM
[AWS_Stage] ) subq
WHERE
sum_qty >= 0
GROUP BY
[item_id]
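If the 2020 filter from the original query is still needed, it can simply go back inside the subquery (and ISNULL can be reinstated to match the original handling of NULL quantities); a sketch along those lines, assuming the same columns:
SELECT [item_id]
FROM (
    SELECT [item_id],
           SUM(ISNULL([frcst_qty], 0)) OVER (PARTITION BY [item_id], [keycust4]) AS sum_qty
    FROM [AWS_Stage]
    WHERE [yr] = '2020'
) subq
WHERE sum_qty >= 0
GROUP BY [item_id]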
select [keycust4], [item_id]
from [aws_stage]
where [yr] = '2020'
group by [keycust4], [item_id]
having SUM(ISNULL([frcst_qty], 0)) >= 0
Try this. I'm not sure, though, whether I understood your question correctly.

How to use DISTINCT in one column when SELECTING from many columns

I'm trying to select columns from two different views but I only want to use the DISTINCT statement on one specific column. I thought using the GROUP BY statement would work but it's throwing an error.
SELECT DISTINCT
[Act].[ClientId]
, [Ref].[Agency]
, [Act].[FundCode]
, [Act].[VService]
, [Act].[Service]
, [Act].[Attended]
, [Act].[StartDate]
FROM [dbo].[FS_v_CrossReference_ALL] AS [Ref]
INNER JOIN [dbo].[FS_v_Activities] AS [Act] ON [Ref].[VendorId] = [Act].[VendorId]
WHERE [Act].[StartDate] BETWEEN '1/1/2015' AND '12/31/2015'
GROUP BY [Act].[ClientId]
I want to use the DISTINCT statement on [Act].[ClientId]. Is there any way to do this?
Presumably, you want row_number():
SELECT ar.*
FROM (SELECT Act.*, [Ref].Agency,
ROW_NUMBER() OVER (PARTITION BY Act.ClientId ORDER BY ACT.StartDate DESC) as seqnum
FROM [dbo].[FS_v_CrossReference_ALL] [Ref] JOIN
[dbo].[FS_v_Activities] Act
ON [Ref].[VendorId] = [Act].[VendorId]
WHERE [Act].[StartDate] >= '2015-01-01' AND
[Act].[StartDate] < '2016-01-01'
) ar
WHERE seqnum = 1;
Particularly note the changes to the date comparisons:
The dates are in standard format (YYYY-MM-DD or YYYYMMDD).
BETWEEN is replaced by two inequalities. This makes the code robust if the date is really a date/time with a time component.
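To make the second point concrete, here is a tiny illustration (the timestamp is a hypothetical row value) of how BETWEEN drops late-in-the-day rows when the column is a datetime:
DECLARE @StartDate datetime = '2015-12-31T14:30:00';  -- hypothetical value with a time component

SELECT CASE WHEN @StartDate BETWEEN '20150101' AND '20151231' THEN 1 ELSE 0 END AS with_between,      -- 0: the end date means midnight, so 14:30 falls outside
       CASE WHEN @StartDate >= '20150101' AND @StartDate < '20160101' THEN 1 ELSE 0 END AS half_open; -- 1: the half-open range still includes it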

Create weighted average in SQL using dates

I have a SQL query that lists details about a certain item. Everything works as should except for the last column. I want the weight of transaction column to report back a difference in days.
So, for example, the 4th row in the txdate column is 05/21/2014 and the 3rd row is 05/12/2014. The weight of transaction column in the 4th row should say 9.
I read about the LAG and LEAD functions, but I'm not sure how to implement those with dates (if it's even possible). If it isn't possible, is there another way to accomplish this?
Select t.txNumber,
t.item,
t.txCode,
t.txdate,
(t.onhandlocold + t.stockQty) as 'Ending Quantity',
tmax.maxtnumber 'Latest Transaction Code',
tmax.maxdate 'Latest Transaction Date',
tmin.mindate 'First Transaction Date',
(t.txdate - tmin.mindate) 'weight of transaction'
From tbliminvtxhistory t
Left outer join
(Select t.item, max(t.txnumber) as maxtnumber, max(t.txdate) as maxdate
From tbliminvtxHistory t
Where t.txCode != 'PAWAY'
Group By Item) tmax
on t.item = tmax.item
Left Outer Join
(Select t.item, min(t.txdate) as mindate
From tbliminvtxHistory t
WHere t.txCode != 'PAWAY'
and t.txdate > DateAdd(Year, -1, GetDate())
Group By Item) tmin
on t.item = tmin.item
where t.item = 'LR50M'
and t.txCode != 'PAWAY'
and t.txdate > DateAdd(Year, -1, GetDate())
Check out the DATEDIFF function, which will return the difference between two dates.
I think this is what you are looking for:
DATEDIFF(dd,tmin.mindate,t.txdate)
UPDATE:
Now that I understand your question a little better, here is an update. As mentioned in a comment on the post above, the LAG function is only supported in SQL Server 2012 and up. An alternative is to use ROW_NUMBER and store the results in a temp table. Then you can left join back to the same table on the next ROW_NUMBER in the results, and use DATEDIFF to compare the dates. This does the same thing as the LAG function.
Example:
-- Number the rows by transaction date and keep them in a temp table
SELECT ROW_NUMBER() OVER (ORDER BY txdate) AS RowNumber, *
INTO #Rows
FROM tbliminvtxhistory

-- Join each row to the previous row (RowNumber - 1) and diff the dates
SELECT DATEDIFF(dd, r2.txdate, r.txdate) AS days_since_prev, *
FROM #Rows r
LEFT JOIN #Rows r2 ON r.RowNumber = r2.RowNumber + 1

DROP TABLE #Rows
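To mirror the original query more closely, the numbering would want the same filters, plus a PARTITION BY on item if more than one item is ever involved; a sketch under those assumptions:
SELECT ROW_NUMBER() OVER (PARTITION BY item ORDER BY txdate) AS RowNumber, t.*
INTO #Rows
FROM tbliminvtxhistory t
WHERE t.txCode != 'PAWAY'
  AND t.txdate > DATEADD(YEAR, -1, GETDATE())

SELECT DATEDIFF(dd, prev.txdate, cur.txdate) AS [weight of transaction], cur.*
FROM #Rows cur
LEFT JOIN #Rows prev ON prev.item = cur.item
                    AND prev.RowNumber = cur.RowNumber - 1

DROP TABLE #Rows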
I think you are looking for this expression:
Select . . . ,
datediff(day, lag(txdate) over (order by txNumber), txdate)
This assumes that the rows are ordered by the first column, which seems reasonable given your explanation and the sample data.
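A minimal sketch of how that expression might slot into the question's query (SQL Server 2012+, keeping the table and filters from the question and omitting the other columns):
SELECT t.txNumber,
       t.item,
       t.txdate,
       DATEDIFF(day, LAG(t.txdate) OVER (ORDER BY t.txNumber), t.txdate) AS [weight of transaction]
FROM tbliminvtxhistory t
WHERE t.item = 'LR50M'
  AND t.txCode != 'PAWAY'
  AND t.txdate > DATEADD(YEAR, -1, GETDATE())
ORDER BY t.txNumber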
EDIT:
Without lag() you can use outer apply. For simplicity, let me assume that your query is defined as a CTE:
with cte as (<your query here>)
select . . . ,
       datediff(day, prev.txdate, cte.txdate)
from cte outer apply
     (select top 1 cte2.*
      from cte cte2
      where cte2.txNumber < cte.txNumber
      order by cte2.txNumber desc
     ) prev

How can I include in schedules today's departures after midnight using GTFS?

I've just started with GTFS and quickly ran into a big problem with my SQL query:
SELECT *, ( some columns AS shortcuts )
FROM stop_times
LEFT JOIN trips ON stop_times.trip_id = trips.trip_id
WHERE trips.max_sequence != stop_times.stop_sequence
AND stop_id IN( $incodes )
AND trips.service_id IN ( $service_ids )
AND ( departure_time >= $time )
AND ( trips.end_time >= $time )
AND ( trips.start_time <= $time_plus_3hrs )
GROUP BY t,l,sm
ORDER BY t ASC, l DESC
LIMIT 14
This should show departures from some stop in next 3 hours.
It works, but as midnight approaches (e.g. 23:50) it catches only today's departures. After midnight it catches only the new day's departures, and departures from the previous service day are missing, because they have a departure_time like '24:05' (which doesn't match against a $time of 00:05).
Is it possible to use something lighter than a UNION of the same query for the next day?
If a UNION is used, how can I ORDER the departures for trimming by LIMIT?
trips.start_time and trips.end_time are auxiliary columns I added to speed up the query; they hold the arrival_time at sequence 1 and the departure_time at the maximum sequence of each trip.
Using UNION to link together a query for each day is going to be your best bet, unless perhaps you want to issue two completely separate queries and then merge the results together in your application. The contortions required to do all this with a single SELECT statement (assuming it's even possible) would not be worth the effort.
Part of the complexity here is that the set of active service IDs can vary between consecutive days, so a distinct set must be used for each one. (For a suggestion of how to build this set in SQL using a subquery and table join, see my answer to "How do I use calendar exceptions to generate accurate schedules using GTFS?".)
More complexity arises from the fact that the results for each day must be treated differently: for the result set to be ordered correctly, we need to subtract twenty-four hours from all of (and only) yesterday's times.
Try a query like this, following the "pseudo-SQL" in your question and assuming you are using MySQL/MariaDB:
SELECT *, SUBTIME(departure_time, '24:00:00') AS t, ...
FROM stop_times
LEFT JOIN trips ON stop_times.trip_id = trips.trip_id
WHERE trips.max_sequence != stop_times.stop_sequence
AND stop_id IN ( $incodes )
AND trips.service_id IN ( $yesterdays_service_ids )
AND ( departure_time >= ADDTIME($time, '24:00:00') )
AND ( trips.end_time >= ADDTIME($time, '24:00:00') )
AND ( trips.start_time <= ADDTIME($time_plus_3hrs, '24:00:00') )
UNION
SELECT *, departure_time AS t, ...
FROM stop_times
LEFT JOIN trips ON stop_times.trip_id = trips.trip_id
WHERE trips.max_sequence != stop_times.stop_sequence
AND stop_id IN ( $incodes )
AND trips.service_id IN ( $todays_service_ids )
AND ( departure_time >= $time )
AND ( trips.end_time >= $time )
AND ( trips.start_time <= $time_plus_3hrs )
GROUP BY t, l, sm
ORDER BY t ASC, l DESC
LIMIT 14
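One caution on the ordering question: a trailing GROUP BY written after a UNION is generally bound to the last SELECT rather than the combined result (MySQL applies only a trailing ORDER BY / LIMIT to the whole UNION). Wrapping the UNION in a derived table makes the grouping, ordering, and LIMIT unambiguously apply to both days' rows; a sketch of that shape, with the per-day queries abbreviated:
SELECT *
FROM (
    SELECT ..., SUBTIME(departure_time, '24:00:00') AS t   -- yesterday's service IDs, shifted back 24h
    FROM stop_times ...
    UNION ALL                                              -- UNION ALL skips an unneeded de-duplication pass
    SELECT ..., departure_time AS t                        -- today's service IDs
    FROM stop_times ...
) AS departures
GROUP BY t, l, sm
ORDER BY t ASC, l DESC
LIMIT 14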