Optimize slow running count query

Optimize slow running count query - sql

I am trying to summarize banner ad views from a table that is pretty good size(18,243,847 rows). I need a count for views in the past two years. I tried adding an index to the date and tried different variations of the query below. Most runs are about 25 seconds, however it seems when passed in a web service, the target page is timing out. I know the issue is with count, but have not been able to reduce that portion to lower than 11 seconds. Seems like not a lot, but why the issue with my web service? Anyway, first things first, is this query doing the best I can do?
SELECT ba.adID, ba.name, ba.description, ba.startDate, ba.endDate, isNull(v.viewCount,0) AS viewCount, isNull(c.clickCount,0) AS clickCount
FROM bannerAds ba
LEFT OUTER JOIN (SELECT adID, count(viewID) AS viewCount
FROM bannerAdsViews
WHERE viewDateTime IS NOT NULL AND viewDateTime >= DateAdd(yy, -2, GetDate())
GROUP BY adID) v ON ba.adID = v.adID
LEFT OUTER JOIN (SELECT adID, count(viewID) AS clickCount
FROM bannerAdsViews
WHERE clickDateTime IS NOT NULL AND viewDateTime >= DateAdd(yy, -2, GetDate())
GROUP BY adID) c ON ba.adID = c.adID
WHERE viewCount > 0
ORDER BY name ASC
FOR XML RAW ('Banner'), ROOT ('Banners');

This query can be difficult to get really good performance on. You are summarizing a lot of data.
However, two subqueries are not needed. If I make the assumption the viewID and viewDateTime are both NULL on the same records, then I think this version is equivalent:
SELECT ba.adID, ba.name, ba.description, ba.startDate, ba.endDate,
COALESCE(vc.viewCount, 0) as viewCount,
COALESCE(vc.clickCount, 0) as clickCount
FROM bannerAds ba JOIN
(SELECT adID, count(viewDateTime) as viewCount,
count(clickDateTime) as clickCount
FROM bannerAdsViews
WHERE viewDateTime >= DateAdd(year, -2, GetDate())
GROUP BY adID
) vc
ON ba.adID = v.adID
WHERE viewCount > 0
ORDER BY name ASC
FOR XML RAW ('Banner'), ROOT ('Banners');
The INNER JOIN can replace the LEFT JOIN, because the WHERE clause is removing NULL values anyway.

Related

Two Table Join Into One Result

I have two tables where I am attempting to join the results into one. I am trying to get the INV_QPC which is the case pack size shown in the results (SEIITN and SKU) are the same product numbers.
The code below gives two results, but the goal is to get the bottom result into the main output, where I was hoping the join would be the lookup to show the case pack size in relation to SKU.
INV_QPC = case pack size
SKU = SKU/Product Number
SEIITN = SKU/Product Number
Thanks for looking.
SELECT
ORDER_QTY, SKU, INVOICE_NUMBER, CUSTOMER_NUMBER, ROUTE,
ALLOCATED_QTY, SHORTED_QTY, PRODUCTION_DATE,
DATEPART(wk, PRODUCTION_DATE) AS FISCAL_WEEK,
YEAR(PRODUCTION_DATE) AS FISCAL_YEAR,
CONCAT(SKU, CUSTOMER_NUMBER) AS SKU_STORE_WEEK
FROM
[database].[dbo].[ORDERS]
WHERE
[PRODUCTION_DATE] >= DATEADD(day, -3, GETDATE())
AND [PRODUCTION_DATE] <= GETDATE()
SELECT INV_QPC
FROM [database].[dbo].[PRODUCT_MASTER]
JOIN [database].[dbo].[ORDERS] ON ORDERS.SKU = PRODUCT_MASTER.SEIITN;

It looks like you are on the right track, but your second SQL statement is only returning the INV_QPC column, so it is not being joined to the first query. Here is an updated SQL statement that should give you the result you are looking for:
SELECT
ORD.ORDER_QTY, ORD.SKU, ORD.INVOICE_NUMBER, ORD.CUSTOMER_NUMBER, ORD.ROUTE,
ORD.ALLOCATED_QTY, ORD.SHORTED_QTY, ORD.PRODUCTION_DATE,
DATEPART(wk, ORD.PRODUCTION_DATE) AS FISCAL_WEEK,
YEAR(ORD.PRODUCTION_DATE) AS FISCAL_YEAR,
CONCAT(ORD.SKU, ORD.CUSTOMER_NUMBER) AS SKU_STORE_WEEK,
PROD.INV_QPC
FROM
[database].[dbo].[ORDERS] ORD
JOIN [database].[dbo].[PRODUCT_MASTER] PROD ON ORD.SKU = PROD.SEIITN
WHERE
ORD.PRODUCTION_DATE >= DATEADD(day, -3, GETDATE())
AND ORD.PRODUCTION_DATE <= GETDATE()
In this query, I have added the INV_QPC column to the SELECT statement, and also included the join condition in the JOIN clause. Additionally, I have given aliases to the tables in the FROM and JOIN clauses to make the query easier to read. Finally, I have updated the WHERE clause to reference the ORD alias instead of the table name directly.

SQL Order by with multiple scenarios

I have a complex SQL query and I want to manipulate resultant data based on certain conditions.
Here is a look at my data structure. It's a combination of 3 tables which allows storing of different activities and its registration period.
Here is a basic query (part of the more complex query):
select
act.ID, act.Name, arp.RegistrationPeriodId,
rp.StartDateTime, rp.EndDateTime
from
Activity act
join
ActivityRegistrationPeriod arp on arp.ActivityId = act.ID
join
RegistrationPeriod rp on rp.Id = arp.RegistrationPeriodId
What I want to achieve
I have to order this data in below conditions, priority vise.
Show programs that are now registering first, (so I guess it would be today is in between StartDateTime and EndDateTime)
Closing soon, (end date is nearest to today)
Opening soon (start data is nearest to today)
What I tried so far
I tried creating a temp table and storing data for each of these conditions (although that's not working properly) but I'm thinking of ordering it if possible and that's why I bring this issue here so a experienced SO can guide me.
Any help would be really appreciated. Thanks!
Updated query:
select
act.ID, act.Name, arp.RegistrationPeriodId,
rp.StartDateTime, rp.EndDateTime
from
Activity act
join
ActivityRegistrationPeriod arp on arp.ActivityId = act.ID
join
RegistrationPeriod rp on rp.Id = arp.RegistrationPeriodId
--where
-- act.AccountId = 3106
order by
(case when GETDATE() between StartDateTime and endDateTime then 1 else 2 end),
(case when rp.EndDateTime is null then 2 else 1 end),
rp.EndDateTime asc,
(case when rp.StartDateTime is null then 2 else 1 end),
rp.StartDateTime desc

You can use a case expression:
order by (case when current_date between StartDateTime and endDateTime then 1 else 2 end),
endDateTime asc,
StartDateTime desc

How to filter Users that meet CASE criteria without nesting WHERE in SQL?

Right now I have a query that lets me know which users didn't make a purchase 12 months prior to becoming members. These users have MEM_PRE_12=0 and I want to filter off those users more natively using SQL partitions rather than always putting rudimentary WHERE criteria.
Here is the SQL I use to find the users I want/don't want.
SELECT SUM(CASE WHEN DATE <= DATEADD(month, -12, U.INSERTED_AT) THEN 1 ELSE 0 END) AS MEM_PRE_12, I.CLIENTID, I.INSTALLATIONID
FROM <<<My_Joined_Tables>>>
GROUP BY I.CLIENTID, I.INSTALLATIONID
HAVING MEM_PRE_12 != 0
ORDER BY MEM_PRE_12
After this I'm going to have to go back and say where I.CLIENTID in the above nested query and select the actual information I want from users who made purchases greater than their insertion date.
How can I do this without so much nesting of all these joined tables?

If you want the detailed rows for customers who made a purchase in the last 12 months, you can use window functions:
with q as (
<whatever your query logic is>
)
select q.*
from (select q.*,
SUM(CASE WHEN DATE <= DATEADD(month, -12, U.INSERTED_AT) THEN 1 ELSE 0 END) over (partition by CLIENTID, INSTALLATIONID) as AS MEM_PRE_12
from q
) q
where mem_pre_12 > 0;

Group By column throwing off query

I have a query that checks a database to see if a customer has visited multiple times a day. If they have it counts the number of visits, and then tells me what times they visited. The problem is it throws "Tickets.lcustomerid" into the group by clause, causing me to miss 5 records (Customers without barcodes). How can I change the below query to remove "tickets.lcustomerid" from the group by clause... If I remove it I get an error telling me "Tickets.lCustomerID" is not a valid select because it's not part of an aggregate or groupby clause.
The Query that works:
SELECT Customers.sBarcode, CAST(FLOOR(CAST(Tickets.dtCreated AS FLOAT)) AS DATETIME) AS dtCreatedDate, COUNT(Customers.sBarcode) AS [Number of Scans],
MAX(Customers.sLastName) AS LastName
FROM Tickets INNER JOIN
Customers ON Tickets.lCustomerID = Customers.lCustomerID
WHERE (Tickets.dtCreated BETWEEN #startdate AND #enddate) AND (Tickets.dblTotal <= 0)
GROUP BY Customers.sBarcode, CAST(FLOOR(CAST(Tickets.dtCreated AS FLOAT)) AS DATETIME)
HAVING (COUNT(*) > 1)
ORDER BY dtCreatedDate
The Output is:
sBarcode dtcreated Date Number of Scans slastname
1234 1/4/2013 12:00:00 AM 2 Jimbo
1/5/2013 12:00:00 AM 3 Jimbo2
1578 1/6/2013 12:00:00 AM 3 Jimbo3
My current Query with the subquery
SELECT customers.sbarcode,
Max(customers.slastname) AS LastName,
Cast(Floor(Cast(tickets.dtcreated AS FLOAT)) AS DATETIME) AS
dtCreatedDate,
Count(customers.sbarcode) AS
[Number of Scans],
Stuff ((SELECT ', '
+ RIGHT(CONVERT(VARCHAR, dtcreated, 100), 7) AS [text()]
FROM tickets AS sub
WHERE ( lcustomerid = tickets.lcustomerid )
AND ( dtcreated BETWEEN Cast(Floor(Cast(tickets.dtcreated
AS
FLOAT)) AS
DATETIME
)
AND
Cast(Floor(Cast(tickets.dtcreated
AS FLOAT
)) AS
DATETIME
)
+ '23:59:59' )
AND ( dbltotal <= '0' )
FOR xml path('')), 1, 1, '') AS [Times Scanned]
FROM tickets
INNER JOIN customers
ON tickets.lcustomerid = customers.lcustomerid
WHERE ( tickets.dtcreated BETWEEN #startdate AND #enddate )
AND ( tickets.dbltotal <= 0 )
GROUP BY customers.sbarcode,
Cast(Floor(Cast(tickets.dtcreated AS FLOAT)) AS DATETIME),
tickets.lcustomerid
HAVING ( Count(*) > 1 )
ORDER BY dtcreateddate
The Current output (notice the record without a barcode is missing) is:
sBarcode dtcreated Date Number of Scans slastname Times Scanned
1234 1/4/2013 12:00:00 AM 2 Jimbo 12:00PM, 1:00PM
1578 1/6/2013 12:00:00 AM 3 Jimbo3 03:05PM, 1:34PM

UPDATE: Based on our "chat" it seems that customerid is not the unique field but barcode is, even though customer id is the primary key.
Therefore, in order to not GROUP BY customer id in the subquery you need to join to a second customers table in there in order to actually join on barcode.
Try this:
SELECT customers.sbarcode,
Max(customers.slastname) AS LastName,
Cast(Floor(Cast(tickets.dtcreated AS FLOAT)) AS DATETIME) AS
dtCreatedDate,
Count(customers.sbarcode) AS
[Number of Scans],
Stuff ((SELECT ', '
+ RIGHT(CONVERT(VARCHAR, dtcreated, 100), 7) AS [text()]
FROM tickets AS subticket
inner join
customers as subcustomers
on
subcustomers.lcustomerid = subticket.lcustomerid
WHERE ( subcustomers.sbarcode = customers.sbarcode )
AND ( subticket.dtcreated BETWEEN Cast(Floor(Cast(tickets.dtcreated
AS
FLOAT)) AS
DATETIME
)
AND
Cast(Floor(Cast(tickets.dtcreated
AS FLOAT
)) AS
DATETIME
)
+ '23:59:59' )
AND ( dbltotal <= '0' )
FOR xml path('')), 1, 1, '') AS [Times Scanned]
FROM tickets
INNER JOIN customers
ON tickets.lcustomerid = customers.lcustomerid
WHERE ( tickets.dtcreated BETWEEN #startdate AND #enddate )
AND ( tickets.dbltotal <= 0 )
GROUP BY customers.sbarcode,
Cast(Floor(Cast(tickets.dtcreated AS FLOAT)) AS DATETIME)
HAVING ( Count(*) > 1 )
ORDER BY dtcreateddate

I can't directly solve your problem because I don't understand your data model or what you are trying to accomplish with this query. However, I can give you some advice on how to solve the problem yourself.
First do you understand exactly what you are trying to accomplish and how the tables fit together? If so move on to the next step, if not, get this knowledge first, you cannot do complex queries without this understanding.
Next break up what you are trying to accomplish in little steps and make sure you have each covered before moving to the rest. So in your case you seem to be missing some customers. Start with a new query (I'm pretty sure this one has more than one problem). So start with the join and the where clauses.
I suspect you may need to start with customers and left join to tickets (which would move the where conditions to the left joins as they are on tickets). This will get you all the customers whether they have tickets or not. If that isn't what you want, then work with the jon and the where clasues (and use select * while you are trying to figure things out) until you are returning the exact set of customer records you need. The reason why you use select * at this stage is to see what in the data may be causeing the problem you are having. That may tell you how to fix.
Usually I start with a the join and then add in the where clasues one at a time until I know I am getting the right inital set of records. If you have multiple joins, do them one at time to know when you suddenly start have more or less records than you would expect.
Then go into the more complex parts. Add each in one at a time and check the results. If you suddenly go from 10 records to 5 or 15, then you have probably hit a problem. When you work one step at a time and run into a problem, you know exactly what caused the problem making it much easier to find and fix.
Group BY is important to understand thoroughly. You must have every non-aggregated field in the group by or it will not work. Think of this as law like the law of gravity. It is not something you can change. However it can be worked around through the use of derived tables or CTEs. Please read up on those a bit if you don't know what they are, they are very useful techniques when you get into complex stuff and you shoud understand them thoroughly. I suspect you will need to use the derived table approach here to group on only the things you need and then join that derived table to the rest of teh query to get the ontehr fields. I'll show a simple example:
select
t1.table1id
, t1.field1
, t1.field2
, a.field3
, a.MostRecentDate
From table1 t1
JOIN
(select t1.table1id, t2.field3, max (datefield) as MostRecentDate
from table1 t1
JOin Table2 t2 on t1.table1id = t2.table1id
Where t2.field4 = 'test'
group by t1.table1id,t2.field3) a
ON a.table1id = t1.table1id
Hope this approach helps you solve this problem.

Multiple joins to single table in SQL Server query throwing off counts

I am writing a stored procedure in SQL Server Management Studio 2005 to return a list of states and policy counts for two different time periods, month to date and year to date. I have created a couple of views to gather the required data and a stored procedure for use in a Reporting Services report.
Below is my stored procedure:
SELECT DISTINCT
S.[State],
COUNT(HP_MTD.PolicyID) AS PolicyCount_MTD,
COUNT(HP_YTD.PolicyID) AS PolicyCount_YTD
FROM tblStates S
LEFT OUTER JOIN vwHospitalPolicies HP_MTD ON S.[State] = HP.[State]
AND HP.CreatedDate BETWEEN DATEADD(MONTH, -1, GETDATE()) AND GETDATE()
LEFT OUTER JOIN vwHospitalPolicies HP_YTD ON S.[State] = HP.[State]
AND HP.CreatedDate BETWEEN DATEADD(YEAR, -1, GETDATE()) AND GETDATE()
GROUP BY S.[State]
ORDER BY S.[State] ASC
The problem I am running into is my counts are bloating when a second LEFT OUTER JOIN is added, even the COUNT() that isn't referencing the second join. I need a left join since not all states will have policies for the given period, but they should still appear on the report.
Any suggestions would be greatly appreciated!

It sounds like you need:
COUNT(DISTINCT HP_MTD.PolicyID) AS PolicyCount_MTD,
COUNT(DISTINCT HP_YTD.PolicyID) AS PolicyCount_YTD
instead of:
COUNT(HP_MTD.PolicyID) AS PolicyCount_MTD,
COUNT(HP_YTD.PolicyID) AS PolicyCount_YTD
Your original query is including the number of matching rows in the second join. Adding a DISTINCT clause inside the COUNT limits it to unique occurrences of the PolicyID.

I prefer CTEs for this sort of work - they're basically a sort of inline view.
Your actual problem is that some policies are being counted twice - once for the month-to-date, and once for the year-to-date - in both count columns. So, if you have source tables like this:
year-to-date
state policy
==================
1 1
1 2
1 3
2 4
month-to-date
state policy
==================
1 1
1 2
The result table for the JOINs (before COUNT is assesed) looks like this:
temp
state monthPolicy yearPolicy
================================
1 1 1
1 1 2
1 1 3
1 2 1
1 2 2
1 2 3
2 - 4
Clearly not what you want. Often, the solution is to use a CTE or other table reference to present summed-up records for the final join. Something like this:
WITH PoliciesMonthToDate (state, count) as (
SELECT state, COUNT(*)
FROM vwHospitalPolicies
WHERE createdDate >= DATEADD(MONTH, -1, GETDATE())
AND createdDate < DATEADD(DAY, 1, GETDATE())
GROUP BY state),
PoliciesYearToDate (state, count) as (
SELECT state, COUNT(*)
FROM vwHospitalPolicies
WHERE createdDate >= DATEADD(YEAR, -1, GETDATE())
AND createdDate < DATEADD(DAY, 1, GETDATE())
GROUP BY state)
SELECT a.state, COALESCE(b.count, 0) as policy_count_mtd,
COALESCE(c.count, 0) as policy_count_ytd
FROM tblStates a
LEFT JOIN PoliciesMonthToDate b
ON b.state = a.state
LEFT JOIN PoliciesYearToDate c
ON c.state = a.state
ORDER BY a.state
(minor nitpick - don't use prefixes for tables and views. It's noise, and if for some reason you switch between them, it would require a code change. That, or the name would then be misleading)

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas