Surrounding Events in KQL or Matching on Multiple Conditions - azure-log-analytics

Coming from an ELK background, Kibana had some nice functionality where you could view the surrounding events of any record you wished (https://www.elastic.co/guide/en/kibana/current/discover-document-context.html), i.e. view the 5 preceding and 5 following events.
Does something like this exist in the Kusto Query Language?
Edit: I should also mention the requirement for this, as I realise it might exist but in a different form.
I'm looking to find several events that need to have all occurred during a specific time period, e.g. the previous 5 minutes.
Example: if EventIDs 1, 2 and 3 show, I'm not interested. However, if 1, 2, 3 and 4 show (within X minutes of each other), then I would like my query to pick this up.
Any hints or tips are appreciated.

It seems that Time Window Join is what I needed - https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/join-timewindow
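For anyone landing here later, a minimal sketch of the time-window-join pattern from that page, assuming a hypothetical table named Events with Timestamp and EventID columns (adjust to your schema). It pairs each EventID 4 with any EventID 1 seen in the preceding 5 minutes; EventIDs 2 and 3 can be chained on with further joins of the same shape:
let lookupWindow = 5min;
let lookupBin = lookupWindow / 2.0;
Events
| where EventID == 4
// generate one time key per half-window bucket covering the lookback window
| extend TimeKey = range(bin(Timestamp - lookupWindow, lookupBin),
                         bin(Timestamp, lookupBin),
                         lookupBin)
| mv-expand TimeKey to typeof(datetime)
| join kind=inner (
    Events
    | where EventID == 1
    | extend TimeKey = bin(Timestamp, lookupBin)
) on TimeKey
// keep only pairs where event 1 really fell within the 5 minutes before event 4
| where (Timestamp - Timestamp1) between (0min .. lookupWindow)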

Related

Reducing database load from consecutive queries

I have an application which calls the database multiple times to achieve one simple goal.
A little information about this application: in short, it scrapes data from a webpage and stores specific information from that page in a database. The important fields for this query are: player name, position, kill points and class.
- Player name: has every potential to change, or remain the same, from day to day
- Position: there can be multiple players sitting at one specific position
- Kill points: have the potential to increase or remain the same every day
- Class: there are only 2 possibilities a class can be; Ex: A can change to B or remain A (same in reverse), but cannot be C, D, E, F
The player name can change on any particular day, and position can also change depending on the kill-point increase since the last update, which brings us back to the goal: search the database day by day, from the current date as far back as 2021-02-22, starting at the most recent entry for a player name and backtracking to the previous day to check whether that player name is still the same or has changed.
What is used as the main reference for the change is the kill points. As the days go on, this number will either stay exactly the same or increase; it can never decrease.
So now onto the implementation of this application.
The first query that runs finds the most recent entry for the player name:
SELECT TOP(1) * FROM [changes] WHERE [CharacterName]=#charname AND [Territory]=#territory AND [Archived]=0 ORDER BY [Recorded] DESC
It then continues to check the previous day's entries with the following query:
SELECT TOP(1) * FROM [changes] WHERE [Territory]=#territory AND [CharacterName]=#charname AND [Recorded]=#searchdate AND ([Class] LIKE '%{Class}%' OR [Class] LIKE '%{GetOpposite(Class)}%') AND [Archived]=0
If no results are found, it then proceeds to find an alternative name with the following query:
SELECT TOP(5) * FROM [changes] WHERE [Kills] <= #kills AND [Recorded]='{Data.Recorded.AddDays(-1):yyyy-MM-dd}' AND [Territory]=#territory AND [Mode]=#mode AND ([Class] LIKE #original OR [Class] LIKE #opposite) AND [Archived]=0 ORDER BY [Kills] DESC
The aim of the query above is to get the top 5 entries that are the closest possible matches, and then cross-reference each of them against the day ahead:
SELECT COUNT(*) FROM [changes] WHERE [CharacterName]=#CharacterName AND [Territory]=#Territory AND [Recorded]=#SearchedDate AND [Archived]=0
When checking the day ahead: if the character name is not found there, that name is considered to be the old player name for this specific character. Otherwise, if all 5 of the results are found to be present in the day-ahead searches, the name is considered to be new to the table.
From the date this application started running up to today's date, that works out to over 400 individual queries against the database to achieve one goal.
It is also worth noting that this table grows by 14,400-14,500 rows each and every day.
The overall question for this specific case: is it possible to consolidate all of these queries into fewer calls to the database, reducing queries and improving performance?
What you can do to improve performance will be based on what parts of the application stack you can manipulate. Things to try:
Store Less Data - Database content retrieval speed is largely a function of how well the database is ordered/normalized and how much data has to be searched for each query. Keeping a cache of previously scraped pages and only storing data when something has changed between the current scrape and the last one would guarantee fewer redundant requests to the db.
Separate specific classes of data - Splitting the data into dedicated tables would let you query a specific table for a specific character, etc., effectively removing one WHERE clause.
Reduce time between queries - Fewer concurrent incoming requests means less resource contention and faster response times for the requests already in flight.
Use another data structure - The only reason you're using TOP() is that you need the data ordered in some specific way (most recent, etc.). If you kept an in-memory data structure that stays ordered and is still easy to query, you could offload some SQL requests to that structure instead of the db.
The suggestions above are not exhaustive; what you can do to improve performance is largely a function of which parts of the application stack you are able to modify.
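As one concrete illustration of cutting round trips, here is a hedged sketch (T-SQL, reusing the question's table and column names, with @-style parameters standing in for the #-style placeholders) that folds the candidate search and the day-ahead check into a single query via NOT EXISTS, instead of one COUNT(*) query per candidate:
SELECT TOP(5) c.*
FROM [changes] AS c
WHERE c.[Kills] <= @kills
  AND c.[Recorded] = @searchdate
  AND c.[Territory] = @territory
  AND c.[Mode] = @mode
  AND (c.[Class] LIKE @original OR c.[Class] LIKE @opposite)
  AND c.[Archived] = 0
  -- a candidate only counts as the "old name" if it does NOT reappear the day after
  AND NOT EXISTS (
      SELECT 1
      FROM [changes] AS n
      WHERE n.[CharacterName] = c.[CharacterName]
        AND n.[Territory] = c.[Territory]
        AND n.[Recorded] = DATEADD(day, 1, c.[Recorded])
        AND n.[Archived] = 0
  )
ORDER BY c.[Kills] DESC;
Run once per backtracked day, this replaces two queries per candidate with one; pulling the whole date range in a single ordered query and walking it in application code would reduce the count further.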

SQL Method for Cascading Workload Based on Rank and Available Hours

Recently I created an automated production scheduling tool in Excel that assigns a rank to items being produced in the same process, and then uses that rank in combination with the workload to create a schedule.
It functions exactly the way it is intended to, but due to the large amount of data, and because it is Excel, performance is very slow, which is why I am looking to move the calculations over to SQL.
The general logic is like this:
- Always produce everything from the first day before the second day
- Always produce items from an earlier rank before items from a later rank
You can see how this plays out in the image below: the line has 21.5 hours available today, so items are produced on day 1 until the total reaches 21.5 hours, and the remainder is then carried over to day 2, and so on.
I was able to do this in Excel using lengthy position-based formulas, but I am trying to think of a way to get the same result in SQL without having to rely on looking at the row above.
I am not sure how to express something like 'subtract from the available time the production time of higher-priority items produced on the same day'.
I apologize if the question is unclear, but any advice would be appreciated.
[Image: Production Hours Cascading by Priority and Day]
[Image: Example of a Position-Based Formula]
Thanks to shawnt00, who put me in the right direction. Ultimately I had to modify the case statements a bit to go off the cumulative total instead, but I was able to get the desired results using a SUM() OVER (PARTITION BY ... ORDER BY ...) statement.
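For future readers, a minimal sketch of that cumulative-total approach, with hypothetical table and column names and a flat 21.5 available hours per day (the real per-day capacity could come from a calendar table, and a PARTITION BY production line would generalize this):
WITH ranked AS (
    SELECT ItemID,
           PriorityRank,
           Hours,
           -- running total of production hours in priority order
           SUM(Hours) OVER (ORDER BY PriorityRank
                            ROWS UNBOUNDED PRECEDING) AS CumulativeHours
    FROM ProductionItems
)
SELECT ItemID,
       PriorityRank,
       Hours,
       CumulativeHours,
       -- each 21.5-hour block of the running total spills into the next day;
       -- an item that straddles a boundary lands on the day its total completes
       CEILING(CumulativeHours / 21.5) AS ScheduledDay
FROM ranked
ORDER BY PriorityRank;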

SQL Adding unmatched Join results to output

Maybe I am approaching the entire problem wrong - or inefficiently.
Essentially, I am trying to combine two views of data, one of them a log table, based upon 2 criteria:
RoomName field match
Timestamp matches
vw_FusionRVDB_Schedule (RoomName, StartTime, EndTime, Subject, etc)
This contains the schedule of events for all indexed rooms - times in UTC.
vw_FusionRVDB_DisplayUsageHistory (RoomName, OnTime, OffTime, etc)
This is a log of activity that has been pared down to just show when the room display has been turned on and off - times in UTC.
I want to match display on/off activities with the events scheduled in the room at the time the logged activities occurred.
The query is really long, and includes a lot of derived fields. Hopefully just focusing on the join section will make it more clear.
SELECT <foo>
FROM dbo.vw_FusionRVDB_Schedule
INNER JOIN dbo.vw_FusionRVDB_DisplayUsageHistory
    ON dbo.vw_FusionRVDB_Schedule.RoomName = dbo.vw_FusionRVDB_DisplayUsageHistory.RoomName
    AND dbo.vw_FusionRVDB_Schedule.EndTime >= dbo.vw_FusionRVDB_DisplayUsageHistory.OnTime
    AND dbo.vw_FusionRVDB_Schedule.StartTime <= dbo.vw_FusionRVDB_DisplayUsageHistory.OffTime
This query is working great. By design, some events are listed more than once. This happens when there are multiple on/off display cycles that occur within the window of the same event. Similarly, if a room display is turned on before or during one event and stays on through a following event, data from that single log entry is used on both the first and second event record. So this query is doing exactly what is needed in this aspect.
However, I also want to add back into the output, scheduled events (from the vw_FusionRVDB_Schedule view) that have no corresponding logged activities in the vw_FusionRVDB_DisplayUsageHistory.
I have tried various forms of UNION with another query on the vw_FusionRVDB_Schedule view, using null values in the fields otherwise taken or derived from the vw_FusionRVDB_DisplayUsageHistory view. But it adds all scheduled events back in - not just the ones with no match from the initial join.
I can provide more details if needed. Thank you in advance.
HepC answered in the comments. I was letting the results confuse me. A left join did the trick.
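For completeness, a hedged sketch of that fix (the aliases and the selected columns are illustrative): a LEFT JOIN keeps every scheduled event and leaves the usage columns NULL where no display activity overlapped it.
SELECT s.RoomName, s.StartTime, s.EndTime, s.Subject,
       h.OnTime, h.OffTime
FROM dbo.vw_FusionRVDB_Schedule AS s
LEFT JOIN dbo.vw_FusionRVDB_DisplayUsageHistory AS h
    ON  s.RoomName  = h.RoomName
    AND s.EndTime   >= h.OnTime
    AND s.StartTime <= h.OffTime;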

Monitoring Updates over Time Frames and/or SQL Query with Regex counting dates within string

This is probably a fork-in-the-road question. I have a journal-style setup where each update is date-stamped and appended to a single memo field within a record.
Example:
Proj #1 (ID): Notes (memo field): 10/12/2012 - visited site. 10/11/2012 - updated information. 10/11/2012 - call client. 10/10/2012 - Input information.
Proj #2 (ID): Notes (memo field): 10/10/12 - visited site. 10/10/2012 - call client. 10/9/2012 - Input information. 10/1/2012 - Started project. etc etc...
I need to count how many updates were made over a specific time frame. I know I can create a hidden field and add 1 every time there is an update, which is useful for an OVERALL update count... but how can I keep track of the number of updates over the last 5 days? As in the example above, you may update it twice in one day, and I may not care about updates made 2 weeks ago.
I think I need SQL that counts the number of "dates" since 10/10/12, or since 10/2/12, etc.
I have done SQL along the lines of: SELECT memo FROM Projects WHERE memo LIKE '%10/10/12%' OR memo LIKE '%10/9/2012%' etc.
and then (Len(memoStringCombined) - Len(Replace(memoStringCombined, searchword, ''))) / Len(searchword), which works fine for counting a single date... but if I have to count multiple dates over 30 days, it gets quite cumbersome to keep rewriting each search word. Is there a regex or object that can loop through this for me?
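One hedged way to avoid rewriting each search word (SQL Server syntax; the table and columns come from the question, the date list is illustrative): apply the same occurrence formula once per date via a derived table of date strings, and sum the results per record.
SELECT p.ID,
       SUM((LEN(p.memo) - LEN(REPLACE(p.memo, d.DateText, '')))
           / LEN(d.DateText)) AS UpdateCount
FROM Projects AS p
-- one row per date string in the window of interest; mixed formats such as
-- '10/10/12' vs '10/10/2012' each need their own entry
CROSS JOIN (VALUES ('10/10/2012'), ('10/9/2012'), ('10/8/2012'),
                   ('10/7/2012'), ('10/6/2012')) AS d(DateText)
GROUP BY p.ID;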
Otherwise any other suggestions for counting updates between time frames would be greatly appreciated.
BTW - I can't really justify creating a new table dedicated to tracking updates, because there will be hundreds of updates across close to 10,000 records, which means the update-tracking table would be more monstrous than the data... or am I wrong about that idea too?

SQL select certain number of rows

Hello, I need a SQL query that gets me rows 'start' to 'finish'.
For example:
A website with many items, where page 1 selects only items 1-10, page 2 has items 11-20, and so on.
I know how to do this with Microsoft SQL Server and MySQL but I need an implementation that is platform independent. :/
I have an auto-incrementing ID column, but deleting rows in between will mess up the result when I select via
WHERE ID > number AND ID < othernumber
of course.
Is this possible without fetching the whole database to a ResultSet?
I think your safest bet would be to use the BETWEEN operator. I believe it works across Oracle/MySQL/MSSQL.
WHERE ID BETWEEN number AND othernumber
Concerning your comment "I was just thinking of the case when the first 100 IDs are gone - I'll have to check further until there is something to fetch": you might want to consider never actually deleting rows from your database, and instead adding a flag like "active" to your tables, so you can avoid the very situation you're now trying to work around. The alternative is where you are now: having to find the max and min rows that match a filter.
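If you can rely on reasonably recent database versions, a hedged sketch of the portable pagination forms (hypothetical items table), neither of which depends on gap-free IDs:
-- ISO SQL:2008 form (SQL Server 2012+, Oracle 12c+, PostgreSQL, DB2):
SELECT id, name
FROM items
ORDER BY id
OFFSET 10 ROWS FETCH NEXT 10 ROWS ONLY;  -- page 2: rows 11-20

-- MySQL / SQLite / PostgreSQL form:
SELECT id, name
FROM items
ORDER BY id
LIMIT 10 OFFSET 10;                      -- same page 2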