I'm testing different queries in Azure Stream Analytics, but I'm having a hard time figuring out what happens with different combinations of window type (tumbling, hopping, sliding and session) and TIMESTAMP BY.
I'd expect EnqueueTime (when the data was put into an event queue or blob store) to be used by default (if no TIMESTAMP BY is specified), System.Time with TIMESTAMP BY System.Time, and DataTime when specifying some time property inside the data stream.
Based on my tests, though, it doesn't seem to work that way.
Can anyone explain it? I have a hard time understanding the documentation.
EnqueuedTime is indeed the default time (EventEnqueuedUtcTime for Event Hub, or IoTHub.EnqueuedTime for IoT Hub).
When you select System.Timestamp in combination with a time window, it returns the end of the window.
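To make the "end of the window" behaviour concrete, here is a rough Python sketch for the tumbling case only. It is not Stream Analytics code: the function name is hypothetical, and it assumes windows are aligned to the Unix epoch and cover the interval (start, end], so an event exactly on a boundary belongs to the window that ends there.

```python
from datetime import datetime, timedelta, timezone

# Assumption: windows are aligned to the Unix epoch.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def tumbling_window_end(event_time: datetime, size: timedelta) -> datetime:
    """Return the end of the tumbling window that contains event_time.

    Assumes each window covers (start, end], so an event exactly on a
    boundary is counted in the window ending at that boundary.
    """
    elapsed = event_time - EPOCH
    # Ceiling division on timedeltas: whole windows needed to reach the event.
    windows = -(-elapsed // size)
    return EPOCH + windows * size


event = datetime(2016, 6, 27, 15, 23, 7, tzinfo=timezone.utc)
print(tumbling_window_end(event, timedelta(seconds=10)))
```

Under these assumptions, every event in the same 10-second window is reported with the same timestamp, namely the window's end, regardless of which time property the events themselves were stamped by.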
What do you observe in your tests? Let me know how we can help.
Thanks,
JS (Stream Analytics team)
I have a few questions about Tableau and how dynamic it is:
Do changes to the data in a relational DB require a manual refresh, or are there refresh events or something similar?
Can visualizations change at runtime? For example, if we have some filters and select some of them, are the visualizations updated?
A use case would be an API that accepts parameters from users and passes them to Tableau for querying.
Tableau queries your data source directly. Changes in your source are visible whenever Tableau launches a query.
There is no "in memory" database to be refreshed, but there are high performance extracts that can function as a buffer between your data source & Tableau, sometimes approaching in memory performance.
A "new" query to get fresh data is sent when:
Something changes, for example the user filters a view (see question 2 :-) )
The workbook or dashboard is opened
The refresh button is hit
Another periodic refresh is launched
A programmatic event triggers a refresh
It is possible to launch a reload of the extract or a new query based on a trigger. You could create a command-line script on the server that is triggered by your source system to reload (using TABCMD, the Tableau command-line interface, or TabPy, the Tableau Python integration, among other things). Or you could use the API.
Yes, the visualizations will be updated each time you select a filter.
This behavior is very customizable: you have a lot of control over which filter refreshes which viz, even in the front end only.
EDITED for clarity
I'm adding some testing to my current project which uses Azure blob storage to store telemetry data coming from a stream analytics job. I want to do testing of the routines that get the telemetry data, so I created a separate container for test data. I downloaded a sample set of data, modified the data to serve my needs and re-uploaded (using Azure storage explorer) everything back into the new container.
The tests were immediately failing and I quickly found out that this is because the LastModified date of the files changed into the date/time of upload. This is fine, but the sequence of the upload was also different. My code uses the modified date of the file to find out which one is the most recent, which would now return a different file based on the new dates.
I found that you cannot modify this property, although you can change another property to have it update. So I know the solution: I could write a quick script which gets the sequence of files from my production instance and then touches every file in the test instance in the same sequence.
But... I was wondering whether this is the best option. I also read that it's 'best practice' to store a custom datetime in a separate property, but I don't think I can do that straight from Stream Analytics (which is writing the blobs). I also considered using an Azure Function to do this (new blob => update property), but then I'm adding complexity and something that might fail for whatever reason.
So I'm looking for the best way to solve this problem. Anyone?
Update: this one probably deserves a bit more explanation. Apart from sorting on the LastModified date, I also use it to filter blobs. The blobs themselves are CSV files containing ASA output data, i.e. telemetry records. Each record has a timestamp, but that information is IN the file. When retrieving data, I don't want to have to dive into each file to find out the timestamps of its records. So I use a prefilter to select the blobs within a certain timespan, and then only download and open those files to read the records inside.
This works perfectly as long as you do not touch any of the blobs, but obviously it stops working as soon as any of the blobs gets modified for whatever reason. So I'm now convinced that I need a different, better way to solve this issue; but how?
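It seems to me that you have two separate things: the data that you want to store in blob storage, and metadata about the blob such as the timestamp. I would create a separate (Azure) database for the metadata or, even simpler, just add metadata to the (block) blob:
// Sketch, assuming blockBlob is a CloudBlockBlob; dateTime stands in for the start and end of the period the blob covers.
blockBlob.Metadata.Add("from", dateTime.ToString("o")); // "o" = round-trip (ISO 8601) format
blockBlob.Metadata.Add("to", dateTime.ToString("o"));
blockBlob.Metadata.Add("order", "1");
blockBlob.SetMetadata(); // metadata is only stored once it is sent to the service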
For sorting I would just add a simple order property.
The comment by @Vignesh deserves the credit here, but in order to get this one marked as answered I'll provide the answer myself.
With ASA, you can set the output to be structured by date/time. That means in this case, data is written to the blob store with a directory structure such as:
2016 / 06 / 27 / 15 / 23 (= 27-06-2016 15:23)
2016 / 06 / 28 / 11 / 02 (= 28-06-2016 11:02)
The ASA output allows you to specify how granular you want the structure to be; in my case I chose to store it by day (so not including a time path). The ASA runtime will then ensure that data from a certain point in time is stored in a blob that resides in the correct path.
I subsequently changed my logic so that it no longer uses the timestamp of the individual blob files, but simply reads the files from the folders that fall within the time range I'm interested in. That ensures we only get data that was produced within that time range. And if there's more than one file in a folder, I need to load all of them, since they belong to the same time range anyway. As long as minutes are enough granularity for you, this works very well, even though it might feel a bit strange to use a folder structure for such a thing.
Having a separate 'index' for blobs which tracks their datetime would work too, of course, but it adds complexity which I don't really need in this case.
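The folder-based lookup described above can be sketched as follows. This is only an illustration in Python, not code from my project: the function name is hypothetical, and it assumes the output was configured with a per-day yyyy/MM/dd path. Each returned string would be used as the prefix when listing blobs.

```python
from datetime import date, timedelta
from typing import List


def day_prefixes(start: date, end: date) -> List[str]:
    """List the yyyy/MM/dd folder prefixes for every day from start to end, inclusive."""
    days = (end - start).days
    # An end date before the start date yields an empty list.
    return [(start + timedelta(days=i)).strftime("%Y/%m/%d") for i in range(days + 1)]


for prefix in day_prefixes(date(2016, 6, 27), date(2016, 6, 28)):
    print(prefix)
```

Because the prefixes are derived from the requested time range only, touching or re-uploading a blob does not change which blobs are selected.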
I'm using BigQuery to store logs. I recently realized that the BigQuery web UI seems to convert 16-digit numbers into dates (like "12/3/2017, 10:00:13 AM") even though the actual column type is "string".
Is there any way to stop the web UI from doing this?
The data looks fine once I export it to CSV. It's only a web UI issue that I'm seeing.
After some investigation, it turned out to be the Streak extension which was doing this automatic conversion into timestamp format. One of its features is the following:
Automatically Convert Timestamps to Human Readable Format
AppEngine and BigQuery store timestamps as epoch time which is great for computers and computation, but not so good for a human to read. The extension automatically finds, parses and replaces epoch timestamps with a human friendly date/time rendering.
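As an illustration of why a 16-digit value can look like a timestamp, here is a small Python sketch that reads such a string as microseconds since the Unix epoch. The sample value is made up for the example, and the extension's actual detection rules are not known to me; this only shows the arithmetic.

```python
from datetime import datetime, timezone


def micros_to_utc(value: str) -> datetime:
    """Interpret a digit string as microseconds since the Unix epoch (UTC)."""
    seconds, micros = divmod(int(value), 1_000_000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(microsecond=micros)


# Hypothetical 16-digit value, not taken from the real log data.
print(micros_to_utc("1512295213000000").isoformat())
```

Any 16-digit string in a similar numeric range parses to a plausible recent date this way, which would explain why an ID stored as a string gets rendered as a date/time.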
I have an SQL query (shown below) that I need to run on a regular basis:
db.Execute("UPDATE property_info SET IsActive=false WHERE ExpiryDate < @0", CurrentDate);
This query is intended to check ALL properties and see whether they are past their expiration date. If they are, it automatically sets the property to inactive. Because "CurrentDate" is a rolling window, I want to re-run this query automatically, probably every day.
Is this something i should be using a stored procedure for?
Any suggestions on the best way to achieve this without any user interaction?
One simple way to achieve this would be to add the line of code to _PageStart.cshtml in the root of your project. This will make it execute every time any page on the site is executed. That is probably massively overkill for something that, by the looks of it, only needs to be checked once a day or so. To alleviate this you could employ a simple DateTime stamp in the Application collection to make sure it only runs a maximum of once every day or so (or tune the interval as appropriate for your needs). This is in no way a solution for fully scheduled code execution, but it may well serve your purposes (and your budget).
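The "run at most once per interval" idea can be sketched like this. It is written in Python purely for illustration; the class and its names are hypothetical, the `last_run` attribute stands in for the DateTime stamp kept in the Application collection, and `action` stands in for the UPDATE query.

```python
from datetime import datetime, timedelta


class DailyTask:
    """Runs an action at most once per interval, driven by incoming page requests."""

    def __init__(self, action, interval=timedelta(days=1)):
        self.action = action
        self.interval = interval
        self.last_run = None  # stands in for the stamp kept in application state

    def run_if_due(self, now: datetime) -> bool:
        # Skip the work if the action already ran within the interval.
        if self.last_run is not None and now - self.last_run < self.interval:
            return False
        self.last_run = now
        self.action()
        return True


task = DailyTask(lambda: print("expiring old properties"))
task.run_if_due(datetime(2024, 1, 1, 8, 0))   # runs the action
task.run_if_due(datetime(2024, 1, 1, 13, 0))  # skipped, less than a day later
```

Note that this sketch ignores concurrent requests and that the stamp is lost when the application restarts, in which case the action simply runs again on the next request.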
OK, first let me state that I have never used this control and this is also my first attempt at using a web service.
My dilemma is as follows. I need to query a database to get back a certain column and use that for my autocomplete. Obviously I don't want the query to run every time a user types another word in the textbox, so my best guess is to run the query once and then use that dataset, array, list or whatever to filter for the autocomplete extender.
I am kind of lost. Any suggestions?
Why not keep track of the query executed by the user in a session variable, then use that to filter any further results?
The trick to preventing the database from overloading, I think, is simply to limit how frequently the auto updater is allowed to update; something like once every 2 seconds seems reasonable to me.
What I would do is this: Store the current list returned by the query for word A server side and tie that to a session variable. This should be basically the entire list I would think. Then, for each new word typed, so long as the original word A exists, you can filter the session info and spit the filtered results out without having to query again. So basically, only query again when word A changes.
I'm using "session" in a PHP sense, you may be using a different language with different terminology, but the concept should be the same.
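A minimal sketch of that idea, in Python for illustration only. The class, its attributes and the `query_db` callable are hypothetical stand-ins; one instance would be stored per user session. It assumes the query returns the complete list of matches for the prefix (not a "top N" subset), otherwise filtering the cached list could miss results.

```python
class AutocompleteCache:
    """Per-session cache: query once for the first prefix, then filter in memory."""

    def __init__(self, query_db):
        self.query_db = query_db  # hypothetical callable: prefix -> list of matches
        self.base_prefix = None
        self.base_results = []

    def suggest(self, text):
        typed = text.lower()
        # Only hit the database when the typed text no longer extends the cached prefix.
        if self.base_prefix is None or not typed.startswith(self.base_prefix):
            self.base_prefix = typed
            self.base_results = self.query_db(typed)
        return [r for r in self.base_results if r.lower().startswith(typed)]


states = ["Alabama", "Alaska", "Arizona", "Arkansas"]
cache = AutocompleteCache(lambda p: [s for s in states if s.lower().startswith(p)])
print(cache.suggest("Al"))   # queries the (fake) database
print(cache.suggest("Ala"))  # served from the cached list
```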
This depends on how transactional your data store is. If you are looking up, say, US states (a collection that realistically will not change during the life of the application), then I would cache either a System.Collections.Generic.List<> or, if you prefer, a DataTable.
You could easily make the cached data dependent on an XML file or database, so that your extender always queries the data object cast from the cache, and the cache object is only updated when the data source changes.
RAM is cheap and SQL is harder to scale than IIS so cache everything in memory:
your entire data source, if it is not too large to load in reasonable time,
precalculated data,
autocomplete webservice responses.
Depending on the autocomplete behavior and performance you want, you may want to precalculate data and create redundant structures optimized for reading. Make use of structures like SortedList (when you need something like 'select top x ... where z like @query+'%'), Hashtable, ...
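The "top x where z like prefix%" lookup on a sorted structure can be sketched as a binary search. This is Python rather than .NET, using a plain sorted list in place of SortedList; the function name is hypothetical, and it assumes the words are already sorted and normalized to one case.

```python
from bisect import bisect_left


def top_matches(sorted_words, prefix, limit=10):
    """Return up to `limit` words starting with `prefix` from a sorted list."""
    # All words sharing a prefix are contiguous in sorted order,
    # starting at the first position where the prefix could be inserted.
    start = bisect_left(sorted_words, prefix)
    matches = []
    for word in sorted_words[start:start + limit]:
        if not word.startswith(prefix):
            break
        matches.append(word)
    return matches


words = sorted(["alabama", "alaska", "arizona", "arkansas", "california"])
print(top_matches(words, "al"))
print(top_matches(words, "ar", limit=1))
```

The search costs O(log n) to find the first candidate plus the number of results returned, so it stays cheap even when the cached list is large.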
While caching everything is certainly a good idea, your question about which data structure to use is an issue that wasn't fully answered here.
The best data structure for an autocomplete extender is a Trie.
You can find a good .NET article and code here.
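For readers who just want the shape of the structure, here is a minimal trie sketch in Python. It is not the code from the linked article; the class and method names are my own, and it returns completions in alphabetical order up to a limit.

```python
class Trie:
    """Minimal prefix tree; each node maps a character to a child node."""

    def __init__(self):
        self.children = {}
        self.is_word = False

    def insert(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_word = True

    def complete(self, prefix, limit=10):
        """Return up to `limit` stored words that start with `prefix`."""
        node = self
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        results = []
        stack = [(node, prefix)]
        while stack and len(results) < limit:
            current, text = stack.pop()
            if current.is_word:
                results.append(text)
            # Push children in reverse order so the smallest is popped first.
            for ch in sorted(current.children, reverse=True):
                stack.append((current.children[ch], text + ch))
        return results


trie = Trie()
for w in ["car", "card", "care", "cat", "dog"]:
    trie.insert(w)
print(trie.complete("car"))
```

Finding the node for a prefix takes one step per typed character, independent of how many words are stored, which is what makes the trie a good fit for autocomplete.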