How can I write a Splunk query to take a search from one index and add a field's value from another index? I've been reading explanations that involve joins, subsearches, and coalesce, and none seem to do what I want -- even though the example is extremely simple. I am not sure what I am not understanding yet.
main-index has src field which is an IP address and a field I will restrict my results on. I will look over a short amount of time, e.g.
index="main-index" sourcetype="main-index-source" main-index-field="wildcard-restriction*" earliest=-1h | stats count by src
other-index has src_ip field which is an IP address, and has the hostname. It's DHCP leases, so I need to check a longer time frame, and return only the most recent result for a given IP address. I want to get back the hostname from src_nt_host, e.g.
index="other-index" sourcetype="other-index-sourcetype" earliest=-14d
I would like to end up with the following values:
IP address, other-index.src_nt_host, main-index.count
main-index has the smallest amount of records, if that helps for performance reasons.
If I understand you correctly, you need to look at two different time ranges in two different indices,
In that case, it is most likely to be true that a join will be needed
Here's one way it can be done:
index=ndx1 sourcetype=srctp1 field1="someval" src="*" earliest=-1h
| stats count by src
| join src
[| search index=ndx2 sourcetype=srctp2 field2="otherval" src_ip=* src_nt_host=* earliest=-14d
| stats count by src_ip src_nt_host
| fields - count
| rename src_i as src ]
You may need to flip the order of the searches, depending on how many results they each return, and how long they take to run.
You may also be able to achieve what you're looking for in another manner without the use of a join, but we'd need to have some sample data to possibly give a better result
for a metering project I use a simple SQL table in the following format
ID
Timestamp: dat_Time
Metervalue: int_Counts
Meterpoint: fk_MetPoint
While this works nicely in general I have not found an efficient solution for one specific problem: There is one Meterpoint which is a submeter of another Meterpoint. I'd be interested in the Delta of those two Meterpoints to get the remaining consumption. As the registration of counts is done by one device I get datapoints for the various Meterpoints at the same Timestamp.
I think I found a solution applying a subquery which appears to be not very efficient.
SELECT
A.dat_Time,
(A.int_Counts- (SELECT B.int_Counts FROM tbl_Metering AS B WHERE B.fk_MetPoint=2 AND B.dat_Time=A.dat_Time)) AS Delta
FROM tbl_Metering AS A
WHERE fk_MetPoint=1
How could I improve this query?
Thanks in advance
You can try using a window function instead:
SELECT m.dat_Time,
(m.int_counts - m.int_counts_2) as delta
FROM (SELECT m.*,
MAX(CASE WHEN fk.MetPoint = 2 THEN int_counts END) OVER (PARTITION BY dat_time) as int_counts_2
FROM tbl_Metering m
) m
WHERE fk_MetPoint = 1
From a query point of view, you should as a minimum change to a set-based approach instead of an inline sub-query for each row, using a group by as a minimum but it is a good candidate for a windowing query, just as suggested by the "Great" Gordon Linoff
However if this is a metering project, then we are going to expect a high volume of records, if not now, certainly over time.
I would recommend you look into altering the input such that delta is stored as it's own first class column, this moves much of the performance hit to the write process which presumably will only ever occur once for each record, where as your select will be executed many times.
This can be performed using an INSTEAD OF trigger or you could write it into the business logic, in a recent IoT project we computed or stored these additional properties with each inserted reading to greatly simplify many types of aggregate and analysis queries:
Id of the Previous sequential reading
Timestamp of the Previous sequential reading
Value Delta
Time Delta
Number of readings between this and the previous reading
The last one sounds close to your scenario, we were deliberately batching multiple sequential readings into a single record.
You could also process the received data into a separate table that includes this level of aggregation information, so as not to pollute the raw feed and to allow you to re-process it on demand.
You could redirect your analysis queries to this second table, which is now effectively a data warehouse of sorts.
My situation: My application is slow. As slow as it gets... mostly because I have the feeling my Server paging for my dataTables / grids are wrongly implemented.
Let's start:
I have a SQL Server 2008 database, one table with all the information, 10 columns in it, at the moment 19K rows
My application is based on a JavaScript and ASP.Net backend code.
My SQL query is:
WITH Ordered AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY Created DESC) AS 'RowNumber'
FROM Meetings
WHERE State IN ('Appointed', 'Accepted')
AND [xxx] LIKE '%1%'
AND [yyy] LIKE '%2%'
)
SELECT *
FROM Ordered
WHERE RowNumber BETWEEN 1 AND 41;
So at the moment this query runs around 27 to 32 seconds, which means over 30 seconds I got a timeout... on 19k rows in 1 year... which means in 1 month latest every query will run against dead...
As far as I am understand the order for this query is the problem: No index done here.
Because the query first sorts, then selects all with a manual row number, then selects only 40... (of course on page 2 of my grid it gets Rows 41 to 81...)
I COULD do an Index on my "Created desc" and the query would be much much faster, BUT every column is sortable for my grid which means "Created desc" could be every other column of my table and of course desc and asc order!
So, how to improve this?
//Edit:
Sorry to forget that:
The inner query (Inner Select) runs 6 seconds, while the total query runs 31 seconds...
Which means the "WITH ORDERES AS" is the problem here!
First things first: you have a performance problem, approach it with a proper methodology and measure appropriately. The inner query (Inner Select) runs 6 seconds, while the total query runs 31 seconds... Which means ... is amateurism. Read How to analyse SQL Server performance for correct ways to measure performance. And before we continue, if you start from 6 seconds you have already lost the game.
Now, on to the question.
WHERE State in('Appointed','Accepted') AND [xxx] LIKE '%1%' AND [yyy] LIKE '%2%'
This expression is basically non-indexable. Even if you add an index on State it will not help because of the low cardinality (few values with many rows each). And like '% ... %' is unindexable because it searches for values in the middle of the text.
You could try to replace like '% ... %' with a full-text search like CONTAINS ... which will be faster, provider you search for specific enough terms. But it does require you to deploy and configure properly the full-text indexes.
As for the paging, I do not favor much the ROWNUMBER approach. Even when a sort column exists, it involves a scan and count to skip the number of rows and gets slower and slower as you go to higher pages. I much more recommend the key based approach:
SELECT TOP (page size) ...
WHERE keys > <last row>
ORDER BY...
but this approach is more difficult to implement as it requires keeping track of keys rather than the page number.
But expect no miracles. You are asking a relational OLTP system to do the work of an ElasticSearch/Solr. It will never work as you expect. Use a tool appropriate for the job (a Search engine). Also read Dynamic Search Conditions in T‑SQL for a more thorough discussion, but again, expect no miracles.
Let me first start by stating that in the last two weeks I have received ENORMOUS help from just about all of you (ok ok not all... but I think perhaps two dozen people commented, and almost all of these comments were helpful). This is really amazing and I think it shows that the stackoverflow team really did something GREAT altogether. So thanks to all!
Now as some of you know, I am working at a campus right now and I have to use a windows machine. (I am the only one who has to use windows here... :( )
Now I manage to setup (ok, IT department did that for me) and populate a Postgres database (this I did on my own), with about 400 mb of data. Which perhaps is not so much for most of you heavy Ppostgre users, but I was more used to sqlite database for personal use which rarely exceeded 2mb ever.
Anyway, sorry for being so chatty - now the queries from that database work
nicely. I use ruby to do queries actually.
The entries in the Postgres database are interconnected, in as far as they are like
"pointers" - they have one field that points to another field.
Example:
entry 3667 points to entry 35785 which points to entry 15566. So it is quite simple.
The main entry is 1, so the end of all these queries is 1. So, from any other number, we can reach 1 in the end as the last result.
I am using ruby to make as many individual queries to the database until the last result returned is 1. This can take up to 10 individual queries. I do this by logging into psql with my password and data, and then performing the SQL query via -c. This probably is not ideal, it takes a little time to do these logins and queries, and ideally I would have to login only once, perform ALL queries in Postgres, then exit with a result (all these entries as result).
Now here comes my question:
- Is there a way to make conditional queries all inside of Postgres?
I know how to do it in a shell script and in ruby but I do not know if this is available in postgresql at all.
I would need to make the query, in literal english, like so:
"Please give me all the entries that point to the parent entry, until the last found entry is eventually 1, then return all of these entries."
I already "solved" it by using ruby to make several queries until 1 is eventually returned, but this strikes me as fairly inelegant and possibly not effective.
Any information is very much appreciated - thanks!
Edit (argh, I fail at pasting...):
Example dataset, the table would be like this:
id | parent
----+---------------+
1 | 1 |
2 | 131567 |
6 | 335928 |
7 | 6 |
9 | 1 |
10 | 135621 |
11 | 9 |
I hope that works, I tried to narrow it down solely on example.
For instance, id 11 points to id 9, and id 9 points to id 1.
It would be great if one could use SQL to return:
11 -> 9 -> 1
Unless you give some example table definitions, what you're asking for vaguely reminds of a tree structure which could be manipulated with recursive queries: http://www.postgresql.org/docs/8.4/static/queries-with.html .
BACKGROUND:
I've been trying to streamline the work involved in running a report in my program. Lately, I've had to supply a listing of job numbers an instrument has been used on with the listing of items for cost/benefit analysis. Mostly to see how often an instrument is used since it was last serviced/calibrated and the last time anyone did use it. I was looking to integrate this into the query that helps generate the report - but I keep hitting a brick wall of sorts with the number of uses - since I want that aggregate to be based on the date the instrument was last calibrated (a field based in the same query). I can get it to give me the number of uses in the system total - but it will not accept the limitation that I want it to be only counting the times used since the last time it was calibrated
PROBLEM:
Attempts to put an aggregate function in my report for the number of uses since the item's calibration are met either with undesired results, or the dreaded 'aggregate missing' error (don't remember the exact warning).
-- Edited to add 8/12/2011 # 16:09 --
An additional problem with the use of the Max aggregate has been found for instruments that have never been used being excluded by this query.
DETAILS:
Here is the query that does work so far:
SELECT
dbo_tblPOGaugeDetail.intGagePOID,
dbo_tblPOGaugeDetail.strGageDetailID,
dbo_Gage_Master.Description,
dbo_Gage_Master.Manufacturer,
dbo_Gage_Master.Model_No,
dbo_Gage_Master.Gage_SN,
dbo_Gage_Master.Unit_of_Meas,
dbo_Gage_Master.User_Defined,
dbo_Gage_Master.Calibration_Frequency,
dbo_Gage_Master.Calibration_Frequency_UOM,
dbo_tblPOGaugeDetail.bolGageLeavePriceBlank,
dbo_tblPOGaugeDetail.intGageCost,
dbo_Gage_Master.Last_Calibration_Date,
dbo_Gage_Master.Next_Due_Date,
dbo_tblPOGaugeDetail.bolGageEvaluate,
dbo_tblPOGaugeDetail.bolGageExpedite,
dbo_tblPOGaugeDetail.bolGageAccredited,
dbo_tblPOGaugeDetail.bolGageCalibrate,
dbo_tblPOGaugeDetail.bolGageRepair,
dbo_tblPOGaugeDetail.bolGageReturned,
dbo_tblPOGaugeDetail.bolGageBER,
dbo_tblPOGaugeDetail.intTurnaroundDaysOut,
qryRCEquipmentLastUse.MaxOfdatDateEntered
FROM (dbo_tblPOGaugeDetail
INNER JOIN dbo_Gage_Master ON dbo_tblPOGaugeDetail.strGageDetailID = dbo_Gage_Master.Gage_ID)
INNER JOIN qryRCEquipmentLastUse ON dbo_Gage_Master.Gage_ID = qryRCEquipmentLastUse.Gage_ID
ORDER BY dbo_tblPOGaugeDetail.strGageDetailID;
But I can't seem to aggregate a count of Uses (making a Count(strCustomerJobNum)) from the tblGageActivity with the following fields:
strGageID
strCustomerJobNum
datDateEntered
datTimeEntered
I tried to add a field to the formerly listed query to do a Count(strCustomerJobNum) where datDateEntered matched the Last_Calibration_Date from the calling query - but I got the 'missing aggregate' error. If I leave this condition out - it will run - but will list every instrument ever sent out only if it's had a usage count of at least one (not what I want at all, sadly).
I also want to make sure that if I should get a zero uses count - I will get a zero back instead of my expected records minus the null results.
I hope someone out there can tell me where I am going wrong with this - I want to save the time I am currently spending running an activity report in another program whenever I want to generate this report. Thanks in advance, and let me know if you need me to post more information.
-- Edited to add 08/15/2011 # 14:41 --
I managed to solve the Max() aggregate problem by creating a 'pure' first-step query to get a listing of all instrument with most modern date as qryRCEquipmentUsed.
qryRCEquipmentLastUse:
SELECT dbo.tblGageActivity.strGageID, Max(dbo.tblGageActivity.datDateEntered) AS datLastDateUsed
FROM dbo.tblGageActivity
GROUP BY dbo.tblGageActivity.strGageID;
Then I created a 'pure' listing of all instruments that have no usage at all as a query named qryRCEquipmentNeverUsed.
qryRCEquipmentNeverUsed:
SELECT dbo_Gage_Master.Gage_ID, NULL AS datLastDateUsed
FROM dbo_Gage_Master LEFT JOIN dbo_tblGageActivity ON dbo_Gage_Master.Gage_ID = dbo_tblGageActivity.strGageID
WHERE (((dbo_tblGageActivity.strGageID) Is Null));
NOTE: The NULL was inserted so that the third combining UNION query will not fail due to a mismatch in the number of fields being retrieved from the tables.
At last, I created a UNION query named qryCombinedUseEquipment to combine the two into a list:
qryCombinedUseEquipment:
SELECT *
FROM qryRCEquipmentLastUse
UNION SELECT *
FROM qryRCEquipmentNeverUsed;
Using this last union query to feed the Last Used date to the parent query works in datasheet view, but when the parent query is called in the report - I get a blank report; so a nudge in the right direction would still be wonderfully appreciated.
APPENDIX
Same script as above, but with shorter table aliases (in case someone finds that clearer):
SELECT
gd.intGagePOID,
gd.strGageDetailID,
gm.Description,
gm.Manufacturer,
gm.Model_No,
gm.Gage_SN,
gm.Unit_of_Meas,
gm.User_Defined,
gm.Calibration_Frequency,
gm.Calibration_Frequency_UOM,
gd.bolGageLeavePriceBlank,
gd.intGageCost,
gm.Last_Calibration_Date,
gm.Next_Due_Date,
gd.bolGageEvaluate,
gd.bolGageExpedite,
gd.bolGageAccredited,
gd.bolGageCalibrate,
gd.bolGageRepair,
gd.bolGageReturned,
gd.bolGageBER,
gd.intTurnaroundDaysOut,
lu.MaxOfdatDateEntered
FROM (dbo_tblPOGaugeDetail gd
INNER JOIN dbo_Gage_Master gm ON gd.strGageDetailID = gm.Gage_ID)
INNER JOIN qryRCEquipmentLastUse lu ON gm.Gage_ID = lu.Gage_ID
ORDER BY gd.strGageDetailID;
Piece by piece...
First -- I suspect you're trying to answer too many questions at once (as evidenced by 23 fields in your SELECT), which will make aggregation near-impossible. Start by narrowing down the scope of the query -- What question is this query attempting to answer? (You can always make more queries to answer other questions... :-)
1) How many uses since last calibration?
2) How many uses since last ...use? (not sure what you mean by that -- maybe last sign-out, or last rental, etc.?)
Tip -- learn to use table aliases. Large queries are difficult to read; worse because of repeated table names.
1) Ex.: dbo_tbl_POGaugeDetail.intGagePOID becomes d.intGagePOID
Here's a sample that might get you started:
SELECT
d.strCustomerJobNum,
Max(d.last_calibration_date) -- not sure what you named that field
Count(d.strCustomerJobNum)
FROM
dbo_tblPOGaugeDetail d
GROUP BY
d.strCustomerJobNum
Does this work:
SELECT dbo_tblPOGaugeDetail.intGagePOID, dbo_tblPOGaugeDetail.strGageDetailID,
OuterGageMaster.Description, OuterGageMaster.Manufacturer, OuterGageMaster.Model_No,
OuterGageMaster.Gage_SN, OuterGageMaster.Unit_of_Meas, OuterGageMaster.User_Defined,
OuterGageMaster.Calibration_Frequency, OuterGageMaster.Calibration_Frequency_UOM,
dbo_tblPOGaugeDetail.bolGageLeavePriceBlank, dbo_tblPOGaugeDetail.intGageCost,
OuterGageMaster.Last_Calibration_Date, OuterGageMasterNext_Due_Date,
dbo_tblPOGaugeDetail.bolGageEvaluate, dbo_tblPOGaugeDetail.bolGageExpedite,
dbo_tblPOGaugeDetail.bolGageAccredited, dbo_tblPOGaugeDetail.bolGageCalibrate,
dbo_tblPOGaugeDetail.bolGageRepair, dbo_tblPOGaugeDetail.bolGageReturned,
dbo_tblPOGaugeDetail.bolGageBER, dbo_tblPOGaugeDetail.intTurnaroundDaysOut,
qryRCEquipmentLastUse.MaxOfdatDateEntered,
(Select Count(strCustomerJobNum)
FROM tblGageActivity WHERE
OuterGageMaster.Last_Calibration_Date=tblGageActivity.datDateEntered) As JobCount
FROM
(dbo_tblPOGaugeDetail INNER JOIN dbo_Gage_Master OuterGageMaster ON
dbo_tblPOGaugeDetail.strGageDetailID = OuterGageMaster.Gage_ID) INNER JOIN
qryRCEquipmentLastUse ON OuterGageMaster.Gage_ID = qryRCEquipmentLastUse.Gage_ID
ORDER BY
dbo_tblPOGaugeDetail.strGageDetailID;
or is that what you tried?
Summary Problem:
Attempts to put an aggregate function in my report for the number of uses since the item's calibration are met either with undesired results, or the dreaded 'aggregate missing' error.
Solution:
I decided to leave the query driving the report alone - instead choosing to employ the use of DLookup and DCount as appropriate to retrieve the last used date from a query that provides the last used date of all the instruments, and the number of uses an instrument has had since it's last calibration, using the aforementioned domain aggregates respectively.
Using the query described in the problem description, I am able to retrieve the last used date for all instruments. I used a =DLookup statement as the source for a text box on the report's subreport dealing with various items as such:
=IIf((DLookUp("[qryRCCombinedUseEquipment]![datLastDateUsed]","[qryRCCombinedUseEquipment]","[qryRCCombinedUseEquipment]![strGageID]=[strGageDetailID]")) Is Null Or ([bolGageReturned]=True),"",DLookUp("[qryRCCombinedUseEquipment]![datLastDateUsed]","[qryRCCombinedUseEquipment]","[qryRCCombinedUseEquipment]![strGageID]=[strGageDetailID]"))
This allows items that have never been used to return a NULL result, which will display as a blank text box.
The number of uses, however, would not feed off a query using =DCount (I tried, it would take over ten minutes to retrieve results, if it ever did). However, using the underlying activity table, I used the following statement:
=IIf([bolGageReturned],"","Used " & DCount("[dbo_tblGageActivity]![strGageID]","[dbo_tblGageActivity]","[dbo_tblGageActivity]![strGageID] = [strGageDetailID] And [dbo_tblGageActivity]![datDateEntered] Between [txtLastCalibrationDate] And date()") & " times since last calibration")
It would retrieve a number of times used since the instrument was last calibrated, but no uses that are before that or after today (some jobs are post dated, strangely). Of course, this is SLOW (about thirty seconds for a large document with thirty or forty instruments).
Does anyone else have a better solution for this, or will I have to take the performance hit? If no one has any better ideas, I will accept this as the answer after five days (8/21/2011) .