Conditional Distinct - sql

Sorry for the kinda longish introduction, but I want to make clear what I'm trying to do. If someone is able to come up with a more appropriate title, please feel free to edit.
I wrote an SNMP Collector that queries every switch in our data center once per hour and checks which ports are online, and stores the results in a MS SQL 2k12 DB. The motivation was that often admins don't report a discontinued server or some other device and we are running out of switch ports.
The DB schema looks like this (simplified screenshot):
The Interfaces table is a child table to the Crawls (Crawl = Run of the SNMP collector) table as the number of interfaces is not constant for every switch but changes between Crawls, as line cards are inserted or removed.
Now I want to write a query that returns every Interface on every Switch that ALWAYS had an ifOperStatus value of 2 and NEVER had an ifOperStatus of 1.
I wrote a query that has three nested sub-queries, is ugly to read and slow as hell. There sure has to be an easier way.
My approach was to filter the ports that NEVER changed by using
HAVING (COUNT(DISTINCT dbo.Interfaces.ifOperStatus) = 1)
and than inner-joining against a list of ports that had an ifOperStatus of 2 during the last crawl. Ugly, as I said.
So, a sample output from the DB would look like this:
And I'm looking for a query that returns rows 5-7 because ifOperStatus never changed, but DOES NOT return rows 3-4 because ifOperStatus flapped.

How about
HAVING (MIN(dbo.Interfaces.ifOperStatus) = 2 AND MAX(dbo.Interfaces.ifOperStatus) = 2)
MIN and MAX don't require SQL Server to maintain a set of all values seen so far, just the highest/lowest. This may also avoid the need to join "against a list of ports that had an ifOperStatus of 2 during the last crawl".

select
s.Hostname,
s.sysDescr,
i.ifOperStatus,
i.ifAllias,
i.ifIndex,
i.ifDescr
from
interfaces i
join crawl c on c.id = i.crawlId
join switches s on s.id = c.switchId
where
i.ifOperStatus = 2
and not exists
(
select 'x'
from
interfaces ii
join crawl cc on cc.id = ii.crawlId
join switches ss on ss.id = cc.switchId
where
s.id = ss.id
and ii.ifOperStatus = 1
)

Related

How to select data with a complex condition?

Using Microsoft Access, I normally use condition (mostly where) to obtain the data I want to display.
So far, it went well. However now I have a complex filtering and I'm not sure of the best way to do it. I will explain how I do it with many queries, and I'd like to know if there is something simpler, since I feel like it's doing too much for what I accomplish.
I have Building and Energy tables. Between them, I have a link table since a Building has a list of possible energies.
My goal is to display ALL energy not already associated with the building.
I first have a simple query to display all the IDs of energy that are in the link table where building is the one of interest.
Once I do that, I have another query using this one, which display an energy if it is an energy absent from previous list.
This takes 2 queries and I feel like I could have a better way to do this. I'm fairly new to MS Access, so any suggestion is welcome.
Here is the first request to obtain the list of energies:
SELECT
Batiments.ID, Energies.ID, Energies.Type
FROM
Energies
INNER JOIN
(Batiments
INNER JOIN
Batiment_Energie ON Batiments.ID = Batiment_Energie.Batiment_ID) ON Energies.ID = Batiment_Energie.Energie_ID
WHERE
(((Batiments.ID) = " & cbxBatiments.Column(0) & "));"
You can query the non-associated energy types with
SELECT
ID, Type
FROM
Energies
WHERE
ID NOT IN (SELECT Energie_ID
FROM Batiment_Energie
WHERE Batiment_ID = 123)
where 123 is to be replaced by the Id comming from cbxBatiments.Column(0).
You can use not exists:
select e.*
from energie as e
where not exists (select 1
from Batiment_Energie as be
where be.energie_id = e.id and be.batiment_id = <your id>
);

Access query speed differs

I have a local access database and in it a query which takes values from a form to populate a drop down menu. The weird (to me) thing is that with most options this query is quick (blink of an eye), but with a few options it's very slow (>10 seconds).
What the query is does is a follows: It populates a dropdown menu to record animals seen at a specific sighting, but only those animals which have not been recorded at that specific sighting yet (to avoid duplicate entries).
SELECT DISTINCT tblAnimals.AnimalID, tblAnimals.Nickname, tblAnimals.Species
FROM tblSightings INNER JOIN (tblAnimals INNER JOIN tblAnimalsatSighting ON tblAnimals.AnimalID = tblAnimalsatSighting.AnimalID) ON tblSightings.SightingID = tblAnimalsatSighting.SightingID
WHERE (((tblAnimals.Species)=[form]![Species]) AND ((tblAnimals.CurrentGroup)=[form]![AnimalGroup2]) AND ((tblAnimals.[Dead?])=False) AND ((Exists (select tblAnimalsatSighting.AnimalID FROM tblAnimalsatSighting WHERE tblAnimals.AnimalID = tblAnimalsatSighting.AnimalID AND tblAnimalsatSighting.SightingID = [form]![SightingID]))=False));
It performs well for all groups of 2 of the 4 possible species, for 1 species it performs well for 4 of the 5 groups, but not for the last group, and for the last species it performs very slowly for both groups. Anybody an idea what can be the cause of this kind of behavior? Is it problems with the query? Or duplicate entries in the tables which can cause this? I don't think it's duplicates in the tables, I've checked that, and there are some, but they appear both for groups where there are problems and where there aren't. Could I re-write the query so it performs faster?
As noted in our comments above, you confirmed that the extra joins were not really need and were in fact going to limit the results to animal that had already had a sighting. Those joins would also likely contribute to a slowdown.
I know that Access probably added most of the parentheses automatically but I've removed them and converted the subquery to a not exists form that's a lot more readable.
SELECT tblAnimals.AnimalID, tblAnimals.Nickname, tblAnimals.Species
FROM tblAnimals
WHERE
tblAnimals.Species = [form]![Species]
AND tblAnimals.CurrentGroup = [form]![AnimalGroup2]
AND tblAnimals.[Dead?] = False
AND NOT EXISTS (
SELECT tblAnimalsatSighting.AnimalID
FROM tblAnimalsatSighting
WHERE
tblAnimals.AnimalID = tblAnimalsatSighting.AnimalID
AND tblAnimalsatSighting.SightingID = [form]![SightingID]
);

SQL query seems to work for 'AND T1.email_address_ IN (subquery)', but returns 0 rows for 'AND T1.email_address_ NOT IN (subquery)'

Good morning. I'm working in Responsys Interact, which is an Oracle-based email campaign management type SAAS product. I'm creating a query to basically filter a target list for an email campaign designed to target a specific sub-set of our master email contact list. Here's the query I created a few weeks ago that appears to work:
/*
Table Symbolic Name
CONTACTS_LIST $A$
Engaged $B$
TRANSACTIONS_RAW $C$
TRANSACTION_LINES_RAW $D$
-- A Responsys Filter (Engaged) will return only an RIID_, nothing else, according to John # Responsys....so,....let's join on that to contact list...
*/
SELECT
DISTINCT $A$.EMAIL_ADDRESS_,
$A$.RIID_,
$A$.FIRST_NAME,
$A$.LAST_NAME,
$A$.EMAIL_PERMISSION_STATUS_
FROM
$A$
JOIN $B$ ON $B$.RIID_ = $A$.RIID_
LEFT JOIN $C$ ON $C$.EMAIL_ADDRESS_ = $A$.EMAIL_ADDRESS_
LEFT JOIN $D$ ON $D$.TRANSACTION_ID = $C$.TRANSACTION_ID
WHERE
$A$.EMAIL_DOMAIN_ NOT IN ('none.com', 'noemail.com', 'mailinator.com', 'nomail.com') AND
/* don't include hp customers */
$A$.HP_PLAN_START_DATE IS NULL AND
$A$.EMAIL_ADDRESS_ NOT IN
(
SELECT
$C$.EMAIL_ADDRESS_
FROM
$C$
JOIN $D$ ON $D$.TRANSACTION_ID = $C$.TRANSACTION_ID
WHERE
/* Get only purchase transactions for certain item_id's/SKU's */
($D$.ITEM_FAMILY_ID IN (3,4,5,8,14,15) OR $D$.ITEM_ID IN (704,769,1893,2808,3013) ) AND
/* .... within last 60 days (i.e. 2 months) */
$A$.TRANDATE > ADD_MONTHS(CURRENT_TIMESTAMP, -2)
)
;
This seems to work, in that if I run the query without the sub-query, we get 720K rows; and if I add back the 'AND NOT IN...' subquery, we get about 700K rows, which appears correct based on what my user knows about her data. What I'm (supposedly) doing with the NOT IN subquery is filtering out any email addresses where the customer has purchased certain items from us in the last 60 days.
So, now I need to add in another constraint. We still don't want customers who made certain purchases in the last 60 days as above, but now also we want to exclude customers who have purchased another particular item, but now within the last 12 months. So, I thought I would add another subquery, as shown below. Now, this has introduced several problems:
Performance - the query, which took a couple minutes to run before, now takes quite a few more minutes to run - in fact it seems to time out....
So, I wondered if there's an issue having two subqueries, but before I went to think about alternatives to this, I decided to test my new subquery by temporarily deleting the first subquery, so that I had just one subquery similar to above, but with the new item = 11 and within the last 12 months logic. And so with this, the query finally returned after a few minutes now, but with zero rows.
Trying to figure out why, I tried simply changing the AND NOT IN (subquery) to AND IN (subquery), and that worked, in that it returned a few thousand rows, as expected.
So why would the same SQL when using AND IN (subquery) "work", but the exact same SQL simply changed to AND NOT IN (subquery) return zero rows, instead of what I would expect which would be my 700 something thousdand plus rows, less the couple thousand encapsulated by the subquery result?
Also, what is the best i.e. most performant way to accomplish what I'm trying to do, which is filter by some purchases made within one date range, AND by some other purchases made within a different date range?
Here's the modified version:
SELECT
DISTINCT $A$.EMAIL_ADDRESS_,
$A$.RIID_,
$A$.FIRST_NAME,
$A$.LAST_NAME,
$A$.EMAIL_PERMISSION_STATUS_
FROM
$A$
JOIN $B$ ON $B$.RIID_ = $A$.RIID_
LEFT JOIN $C$ ON $C$.EMAIL_ADDRESS_ = $A$.EMAIL_ADDRESS_
LEFT JOIN $D$ ON $D$.TRANSACTION_ID = $C$.TRANSACTION_ID
WHERE
$A$.EMAIL_DOMAIN_ NOT IN ('none.com', 'noemail.com', 'mailinator.com', 'nomail.com') AND
/* don't include hp customers */
$A$.HP_PLAN_START_DATE IS NULL AND
$A$.EMAIL_ADDRESS_ NOT IN
(
SELECT
$C$.EMAIL_ADDRESS_
FROM
$C$
JOIN $D$ ON $D$.TRANSACTION_ID = $C$.TRANSACTION_ID
WHERE
/* Get only purchase transactions for certain item_id's/SKU's */
($D$.ITEM_FAMILY_ID IN (3,4,5,8,14,15) OR $D$.ITEM_ID IN (704,769,1893,2808,3013) ) AND
/* .... within last 60 days (i.e. 2 months) */
$C$.TRANDATE > ADD_MONTHS(CURRENT_TIMESTAMP, -2)
)
AND
$A$.EMAIL_ADDRESS_ NOT IN
(
/* get purchase transactions for another type of item within last year */
SELECT
$C$.EMAIL_ADDRESS_
FROM
$C$
JOIN $D$ ON $D$.TRANSACTION_ID = $C$.TRANSACTION_ID
WHERE
$D$.ITEM_FAMILY_ID = 11 AND $C$.TRANDATE > ADD_MONTHS(CURRENT_TIMESTAMP, -12)
)
;
Thanks for any ideas/insights. I may be missing or mis-remembering some basic SQL concept here - if so please help me out! Also, Responsys Interact runs on top of Oracle - it's an Oracle product - but I don't know off hand what version/flavor. Thanks!
Looks like my problem with the new subquery was due to poor performance due to lack of indexes. Thanks to Alex Poole's comments, I looked in Responsys and there is a facility to get an 'explain' type analysis, and it was throwing warnings, and suggesting I build some indexes. Found the way to do that on the data sources, went back to the explain, and it said, "The query should run without placing an unnecessary burden on the system". And while it still ran for quite a few minutes, it did finally come back with close to the expected number of rows.
Now, I'm on to tackle the other half of the issue, which is to now incorporate this second sub-query in addition to the first, original subquery....
Ok, upon further testing/analysis and refining my stackoverflow search critieria, the answer to the main part of my question dealing with the IN vs. NOT IN can be found here: SQL "select where not in subquery" returns no results
My performance was helped by using Responsys's explain-like feature and adding some indexes, but when I did that, I also happened to add in a little extra SQL in my sub-query's WHERE clause.... when I removed that, even after indexes built, I was back to zero rows returned. That's because as it turned out at least one of the transactions rows for the item family id I was interested in for this additional sub-query had a null value for email address. And as further explained in the link above, when using NOT IN, as soon as you have a null value involved, SQL can't definitively say it's NOT IN, since you can't really compare to null, so as soon as you have a null, the sub-query's going to evaluate 'false', thus zero rows. When using IN, even though there are nulls present, if you get one positive match, well, that's a match, so the sub-query returns 'true', so that's why you'll get rows with IN, but not with NOT IN. I hadn't realized that some of our transaction data may have null email addresses - now I know, so I just added a where not null to the where clause for the email address, and now all's good.

Multiple Joins + Lots of Data Optimization

I am working on a massive join at work and have very limited resources in terms of being able to add indexes and such as well as what I can do in the query itself due to the environment (i.e. I can only select data, no variables or table creations allowed). I have read somewhere that a subquery will automatically index the result, is this true? Also for my major join tables (3) each has ~140K rows. I have to join 2 extra tables to ensure filtering is correct. I have the query listed below which I currently have criteria on the JOIN clause. Another question is if I move my criteria to a where clause either in or out of the subquery will it benefit?
SELECT *
FROM (SELECT NULL AS A1,
DFS_ROHEADER.TECHID,
DFS_ROHEADER.RONUMBER,
DFS_ROHEADER.CUSTOMERNUMBER,
DFS_CUSTOMER.BNAME,
DFS_ROHEADER.UNITNUMBER,
DFS_ROHEADER.MILEAGE,
DFS_ROHEADER.OPENEDDATE,
DFS_ROHEADER.CLOSEDDATE,
DFS_ROHEADER.STATUS,
DFS_ROHEADER.PONUMBER,
DFS_TECH.REGION,
DFS_TECH.RSM,
DFS_ROPART.PARTID,
CONVERT(NVARCHAR(max), DFS_RODETAIL.STORY) AS STORY
FROM DFS_ROHEADER
LEFT JOIN DFS_CUSTOMER
ON DFS_ROHEADER.CUSTOMERNUMBER = DFS_CUSTOMER.CUST_NO
LEFT JOIN DFS_TECH
ON DFS_ROHEADER.TECHID = DFS_TECH.TECHID
INNER JOIN DFS_RODETAIL
ON DFS_ROHEADER.RONUMBER = DFS_RODETAIL.RONUMBER
INNER JOIN DFS_ROPART
ON DFS_RODETAIL.RONUMBER = DFS_ROPART.RONUMBER
AND DFS_RODETAIL.LINENUMBER = DFS_ROPART.LINENUMBER
AND DFS_ROHEADER.RONUMBER LIKE '%$FF_RONumber%'
AND DFS_ROHEADER.UNITNUMBER LIKE '%$FF_UnitNumber%'
AND DFS_ROHEADER.PONUMBER LIKE '%$FF_PONumber%'
AND ( DFS_CUSTOMER.BNAME LIKE '%$FF_Customer%'
OR DFS_CUSTOMER.BNAME IS NULL )
AND DFS_ROHEADER.TECHID LIKE '%$FF_TechID%'
AND DFS_ROHEADER.CLOSEDDATE BETWEEN
FF_ClosedBegin AND FF_ClosedEnd
AND ( DFS_TECH.REGION LIKE '%$FilterRegion%'
OR DFS_TECH.REGION IS NULL )
AND ( DFS_TECH.RSM LIKE '%$FF_RSM%'
OR DFS_TECH.RSM IS NULL )
AND DFS_RODETAIL.STORY LIKE '%$FF_Story%'
AND DFS_ROPART.PARTID LIKE '%$FF_PartID%'
WHERE DFS_ROHEADER.DELETED_BY < 0
AND DFS_RODETAIL.DELETED_BY < 0
AND DFS_ROPART.DELETED_BY < 0) T
ORDER BY T.RONUMBER
This query works; however, it can take forever to run, and can timeout. I have other queries that also run in the environment and I will take whatever you can give me in terms of suggestions and apply it to those. I am using SQLServer 2000, Thanks for the help.
EDIT:
Execution Plan:
https://dl.dropboxusercontent.com/u/99733863/ExecutionPlan.sqlplan
UPDATE:
I have come to the conclusion the environment I'm working in is the cause of the problem. My query works as intended and is not slow at all (1 sec. for 18,000 rows). As stated in the comments I have to fill grids with limited flexibility and I believe that these grids fill by first filling a temporary grid with the SQL statement and then copying row by row into the desired grid. There is a good chance that this is the cause of my issues. Thanks for the help.
I have come to the conclusion the environment I'm working in is the cause of the problem. My query works as intended and is not slow at all (1 sec. for 18,000 rows). As stated in the comments I have to fill grids with limited flexibility and I believe that these grids fill by first filling a temporary grid with the SQL statement and then copying row by row into the desired grid. There is a good chance that this is the cause of my issues. Thanks for the help everyone.
My 2 cents here.. In general LIKE is not very well optimized. In your case you also seem to be using LIKE with '%value%'. In that case the query optimizer has to scan the entire index. At a minimum I would see if there is a way to avoid using this.

SQL add up rows in a column

I'm running SQL queries in Orion Report Writer for Solarwinds Netflow Traffic Analyzer and am trying to add up data usage for specific conversations coming from the same general sources. In this case it is netflix. I've made some progress with my query.
SELECT TOP 10000 FlowCorrelation_Source_FlowCorrelation.FullHostname AS Full_Hostname_A,
SUM(NetflowConversationSummary.TotalBytes) AS SUM_of_Bytes_Transferred,
SUM(NetflowConversationSummary.TotalBytes) AS Total_Bytes
FROM
((NetflowConversationSummary LEFT OUTER JOIN FlowCorrelation FlowCorrelation_Source_FlowCorrelation ON (NetflowConversationSummary.SourceIPSort = FlowCorrelation_Source_FlowCorrelation.IPAddressSort)) LEFT OUTER JOIN FlowCorrelation FlowCorrelation_Dest_FlowCorrelation ON (NetflowConversationSummary.DestIPSort = FlowCorrelation_Dest_FlowCorrelation.IPAddressSort)) INNER JOIN Nodes ON (NetflowConversationSummary.NodeID = Nodes.NodeID)
WHERE
( DateTime BETWEEN 41539 AND 41570 )
AND
(
(FlowCorrelation_Source_FlowCorrelation.FullHostname LIKE 'ipv4_1.lagg0%')
)
GROUP BY FlowCorrelation_Source_FlowCorrelation.FullHostname, FlowCorrelation_Dest_FlowCorrelation.FullHostname, Nodes.Caption, Nodes.NodeID, FlowCorrelation_Source_FlowCorrelation.IPAddress
So I've got an output that filters everything but netflix sessions (Full_Hostname_A) and their total usage for each session (Sum_Of_Bytes_Transferred)
I want to add up Sum_Of_Bytes_Transferred to get a total usage for all netflix sessions
listed, which will output to Total_Bytes. I created the column Total_Bytes, but don't know how to output a total to it.
For some asked clarification, here is the output from the above query:
I want the Total_Bytes Column to be all added up into one number.
I have no familiarity with the reporting tool you are using.
From reading your post I'm thinking you want the the first 2 columns of data that you've got, plus at a later point in the report, a single figure being the sum of the total_bytes column you're already producing.
Your reporting tool probably has some means of totalling a column, but you may need to get the support people for the reporting tool to tell you how to do that.
Aside from this, if you can find a way of calling a separate query in a latter section of the report, or if you embed a new report inside your existing report, after the detail section, and use that to run a separate query then you should be able to get the data you want with this:
SELECT Sum(Total_Bytes) as [Total Total Bytes]
FROM ( yourExistingQuery ) x
yourExistingQuery means the query you've already got, in full (doesnt have to be put on one line), the paretheses are required, and so is the "x". (The latter provides a syntax-required name for the virtual table which your query defines).
Hope this helps.