Azure Input Event Arrival Early - azure-stream-analytics

I am stuck on an "input event arrival earlier" problem in Azure Stream Analytics.
My Stream Analytics input comes from Azure Event Hub and the output goes to Cosmos DB.
The problem is that I am also collecting data offline, so when the user reconnects to the internet I send that data to Azure. At first this produced a "late input event" error, so I increased the accepted late-event tolerance to 3 days.
Now it is giving me the error below instead:
Input event arrival time is earlier than input event application timestamp by more than 5 minutes
Does anyone have an idea how to fix this?

This means that the timestamp of a data point (the value in the column used by the TIMESTAMP BY clause) is greater than the wall-clock time of the processing system by more than the allowed threshold (5 minutes).
One possible cause is time zones: note that ASA works in UTC, so the timestamp must either be in UTC or carry an explicit offset per ISO 8601.
Another possible cause is that the sender's clock is significantly skewed, but the threshold is quite large, so this is less likely than the above.
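A minimal sketch of the sender side, assuming the azure-eventhub Python SDK; the connection string, hub name, and field names are placeholders. The point is that the event payload carries an explicit UTC (or offset-qualified ISO 8601) timestamp, which is the column the ASA query would reference in its TIMESTAMP BY clause:

```python
import json
from datetime import datetime, timezone
from zoneinfo import ZoneInfo
from azure.eventhub import EventHubProducerClient, EventData

# A reading captured offline with a device-local wall-clock time.
recorded_local = datetime(2023, 5, 1, 14, 30, tzinfo=ZoneInfo("Australia/Sydney"))

payload = {
    "deviceId": "sensor-01",
    # Convert to UTC (or keep the offset) before sending; a naive local time
    # here is what typically causes the early/late arrival errors in ASA.
    "eventTime": recorded_local.astimezone(timezone.utc).isoformat(),
    "value": 21.5,
}

producer = EventHubProducerClient.from_connection_string(
    "<event-hub-connection-string>", eventhub_name="<hub-name>")
with producer:
    batch = producer.create_batch()
    batch.add(EventData(json.dumps(payload)))
    producer.send_batch(batch)
```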

Related

Time Travel error when updating Access Control on BigQuery dataset

We use Python to programmatically grant authorized view / routine access for a large number of views across various datasets.
However, since this week we have been receiving the following error:
Dataset time travel window can only be modified once in 1 hours. The previous change happened 0 hours ago
This is blocking our current deployment process.
So far we have not been able to find a workaround for this error. Note that we do not touch the time travel configuration at all as part of our process.
This seems to be an issue with the BigQuery API.
Google has said that they will be rolling back the breaking change to restore functionality within the day.
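For context, here is a minimal sketch of the kind of programmatic grant described, assuming the google-cloud-bigquery Python client; the project, dataset, and view names are placeholders. Note that update_dataset is told to patch only the access_entries field, so other dataset properties such as the time travel window are not part of the request:

```python
from google.cloud import bigquery

client = bigquery.Client()
dataset = client.get_dataset("my-project.source_dataset")  # hypothetical dataset

# Append an authorized-view entry to the existing access list.
entries = list(dataset.access_entries)
entries.append(
    bigquery.AccessEntry(
        role=None,
        entity_type="view",
        entity_id={
            "projectId": "my-project",
            "datasetId": "reporting_dataset",
            "tableId": "authorized_view",
        },
    )
)
dataset.access_entries = entries

# Only the access_entries field is sent in the update request.
client.update_dataset(dataset, ["access_entries"])
```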

Who is responsible for modifying timestamp fields in the app: backend or db?

For example: we have 2 fields in a certain entity: created_at and updated_at. We can update those fields manually on the backend after create or update operations, or create a trigger on the DB side that will fill/update these fields for us automatically.
There are some cases to consider:
Usually the backend returns the JSON of the object after a create or update. In this case it would be nice for those timestamp fields to already be populated in the response; however, if a trigger does the modification for us, the backend would have to run another SELECT just to read the updated timestamps back and return them nicely to the client.
Sometimes backend engineers can forget to update these fields manually, leaving null values behind.
I'm not a DBA specialist myself, but what do you think of the cost of triggers, especially at high RPS? Should I not worry about the performance impact of triggers for such simple updates in high-load systems?
Who is responsible for modifying timestamp fields in the app: backend or db?
It depends, it can be either.
There is no one "right" answer for which times to use or exclude. Depending on your system, which actors perform time-based actions (users, devices, servers, triggers), any (or all) of the list below might make sense to incorporate.
Depending on your system, you might have one (or more) of the following:
time A – when a user performs an action
this is most likely local device time (whatever the phone or computer thinks is current time)
but: anything is possible, a client could get a time from who-knows-where and report that to you
could be when a user did something (tap a button) and not when the message was sent to the backend
could be 10-20 seconds (or more) after a user did something (tap a button), and gets assigned by the device when it sends out batched data
time B – when the backend gets involved
this is server time, and could be when the server receives the data, or after the server has received and processed the data, and is about to hand it off to the next player (database, another server, etc)
note: this is probably different from "time A" due to transit time between user and backend
also, there's no guarantee that different servers in the mix all agree on time; they can and should, but this should not be relied upon as truth
time C – when a value is stored in the database
this is different from server time (B)
a server might receive inbound data at B, then do some processing which takes time, then finally submits an insert to the database (which then assigns time C)
Another highly relevant consideration in capturing time is the accuracy (or rather, the likely inaccuracy) of client-reported time. For example, a mobile device can claim to have sent a message at time X, when in fact the clock is just set incorrectly and actual time is minutes, hours - even days - away from reported time X (in the future or in the past). I've seen this kind of thing occur where data arrives in a system, claiming to be from months ago, but we can prove from other telemetry that it did in fact arrive recently (today or yesterday). Never trust device-reported times. This applies to a device – mobile, tablet, laptop, desktop – all of them often have internal clocks that are not accurate.
Remote servers and your database are probably closer to real, though they can be wrong in various ways. However, even if wrong, when the database auto-assigns datetimes to two different rows, you can trust that one of them really did arrive after or before the other – the time might be inaccurate relative to actual time, but they're accurate relative to each other.
All of this becomes further complicated if you intend to piece together history by using timestamps from multiple origins (A, B and C). It's tempting to do, and sometimes it works out fine, but it can easily be nonsense data. For example, it might seem safe to piece together history using a user time A, then a server time B, and database time C. Surely they're all in order – A happened first, then B, then C; so clearly all of the times should be ascending in value. But these are often out of order. So if you need to piece together history for something important, it's a good idea to look for secondary confirmations of order of events, and don't rely on timestamps.
Also on the subject of timestamps: store everything in UTC – database values, server times, and client/device times where possible. Timezones are the worst.
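As a minimal sketch of how the three times above can be captured separately (SQLAlchemy is assumed here; the model and column names are illustrative), with everything stored as UTC:

```python
from datetime import datetime, timezone
from sqlalchemy import Column, DateTime, Integer, func
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Event(Base):
    __tablename__ = "events"
    id = Column(Integer, primary_key=True)
    # time A: whatever the client claims; store it, but never trust it
    client_reported_at = Column(DateTime(timezone=True))
    # time B: set by the backend in UTC when it receives the request
    server_received_at = Column(
        DateTime(timezone=True),
        default=lambda: datetime.now(timezone.utc),
    )
    # time C: assigned by the database itself on insert, so application code
    # cannot forget it (a trigger is the fully database-side alternative)
    created_at = Column(DateTime(timezone=True), server_default=func.now())
    updated_at = Column(
        DateTime(timezone=True),
        server_default=func.now(),
        onupdate=func.now(),  # applied by the ORM on UPDATE statements
    )
```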

Sending Notification to different time zones

I have a server in the USA and clients in different parts of the world: Australia, South America, the USA, Canada, and Europe.
I need to send a notification one hour before each event takes place.
In SQL Server I have a table with the events, stored in UTC (2015-12-27 20:00:00.0000000), and another table with the time zone that belongs to each event ("Australia/Sydney").
How could I calculate in a query when to send the notifications, or would I have to do it with a server-side language?
Could anyone help me with a possible solution?
Thanks
You've asked very broadly, so I can only answer with generalities. If you need a more specific answer, please edit your question to be more specific.
A few things to keep in mind:
Time zone conversions are best done in the application layer. Most server-side application platforms have time zone conversion functions, either natively or via libraries, or both.
If you must convert at the database layer (such as when using SSRS or SSAS, or complex stored procs, etc.) and you are using SQL Server, then there are two approaches to consider:
SQL Server 2016 CTP 3.1 adds native support for time zone conversions via the AT TIME ZONE clause. However, it works with Windows time zone identifiers, such as "AUS Eastern Standard Time", rather than IANA/Olson identifiers, such as the "Australia/Sydney" you specified.
You might use third-party support for time zones, such as my SQL Server Time Zone Support project, which does indeed support IANA/Olson time zone identifiers. There are other similar projects out there as well.
Regardless of whether you convert at the DB layer or at the application layer, the time zone of your server should be considered irrelevant. Always get the current time in UTC rather than local time. Always convert between UTC and a specific time zone. Never rely on the server's local time zone setting to be anything in particular. On many servers, the time zone is intentionally set to UTC, but you should not depend on that.
Nothing in your question indicates how you plan on doing scheduling or notifications, but that is actually the harder part. Specifically, scheduling events into the future should not be based on UTC, but rather on the event's specific time zone. More about this here.
You might consider finding a library for your application layer that will handle most of this for you, such as Quartz (Java) or Quartz.Net (.NET). There are probably similar solutions for other platforms.
You should read the large quantity of material already available on this subject here on Stack Overflow, including the timezone tag wiki and Daylight saving time and time zone best practices.
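As a rough sketch of the application-layer approach (Python's standard-library zoneinfo is assumed; the function and variable names are illustrative), convert the event's wall-clock time in its own IANA zone to UTC at scheduling time and subtract an hour:

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

def notification_time_utc(event_local: datetime, iana_zone: str) -> datetime:
    """event_local is the naive wall-clock time in the event's own time zone."""
    event_utc = event_local.replace(tzinfo=ZoneInfo(iana_zone)).astimezone(timezone.utc)
    return event_utc - timedelta(hours=1)

# Example: an event at 20:00 Sydney time
send_at = notification_time_utc(datetime(2015, 12, 27, 20, 0), "Australia/Sydney")
# A scheduler loop would compare send_at against datetime.now(timezone.utc).
```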

Handling daylight savings across multiple time zones - MS SQL Server

I've been tasked with handling import jobs into SQL Server based on various time zones. Files arrive on a Windows Server from multiple regions, for example Brazil, Singapore, Australia, various parts of the U.S., and Europe.
Each file will be imported into SQL tables by multiple stored procedures. Each stored procedure needs to be executed at a scheduled time according to the time zone related to the origin of the file.
Working from a set time is proving tricky because each region adjusts for daylight saving at different times of the year. Say, for example, the UK moves its clocks forward for daylight saving; Brazil may not move their time forward for another 3 weeks (don't quote me on that, I've used those times only for example purposes).
My question is: how can I schedule jobs to run on the same server based on multiple time zones?
I can see this may be possible if I were to create a time zone lookup table in SQL which shows the relationship between each time zone at each stage of the year, but this seems quite cumbersome and would also take a considerable amount of time to populate.
Windows scheduler seems to use the date/time settings of the local server and although it does adjust for daylight saving, this will only be appropriate for one region. Has anyone had to handle this in SQL Server before? Or can anyone recommend a scheduling tool external to SQL Server that can initiate tasks based on different time zones?
Any help or advice would be greatly appreciated.
You won't be able to transparently and easily configure a single instance of SQL Server to run several sets of tasks in different time zones, by definition (the instance is single, so all sets of tasks will be in the same time zone).
You are, however, able to write your own script in any language you like (for example, a CLR .NET extension for MSSQL or just plain Transact-SQL), configured to do the following:
Iterate over the list of regions you want the task to run for.
Convert the region's time to server time and schedule the action to be executed (via sp_add_jobschedule, for example).
Repeat the next period.
This task should of course be run at around UTC+12 (close to where each new date begins), so that it executes before that date starts in any of your regions.
Implementing it this way would be pretty clear and reliable regardless of daylight saving changes, time zone rule updates, and so on. Just make sure to keep the configuration of your partners' time zones up to date.
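A minimal sketch of that per-region conversion step, assuming Python with the standard-library zoneinfo module; the region list is hypothetical, and handing the resulting UTC times to your scheduler (SQL Server Agent, Windows Task Scheduler, etc.) is left to whatever mechanism you actually use:

```python
from datetime import datetime, time, timezone
from zoneinfo import ZoneInfo

# Hypothetical configuration: region -> (IANA zone, local run time)
REGIONS = {
    "Brazil":    ("America/Sao_Paulo", time(6, 0)),
    "Singapore": ("Asia/Singapore",    time(6, 0)),
    "UK":        ("Europe/London",     time(6, 0)),
}

def run_times_utc(run_date):
    """Convert each region's local run time on run_date to UTC, DST-aware."""
    result = {}
    for region, (zone, local_time) in REGIONS.items():
        local_dt = datetime.combine(run_date, local_time, tzinfo=ZoneInfo(zone))
        result[region] = local_dt.astimezone(timezone.utc)
    return result

# Run once per day, then schedule each import job at its computed UTC time.
```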

Google BigQuery: Slow streaming inserts performance

We are using BigQuery as event logging platform.
The problem we faced was very slow insertAll post requests (https://cloud.google.com/bigquery/docs/reference/v2/tabledata/insertAll).
It does not matter where they are fired - from server or client side.
The minimum is 900 ms and the average is 1500 ms, of which nearly 1000 ms is connection time.
Even if there is 1 request per second (so no throttling here).
We use Google Analytics measurement protocol and timings from the same machines are 50-150ms.
The solution described in BigQuery streaming 'insertAll' performance with PHP suggested using queues, but that seems to be overkill because we send no more than 10 requests per second.
The question is whether 1500 ms is normal for streaming inserts and, if not, how to make them faster.
Additional information:
If we send malformed JSON, response arrives in 50-100ms.
Since streaming has a limited payload size (see the Quota policy), it's easier to talk about times, as the payload is limited in the same way for both of us, but I will mention other side effects too.
We measure between 1200-2500 ms for each streaming request, and this was consistent over the last month as you can see in the chart.
We have seen several side effects, though:
the request randomly fails with type 'Backend error'
the request randomly fails with type 'Connection error'
the request randomly fails with type 'timeout' (watch out here, as only some rows are failing and not the whole payload)
some other error messages are non-descriptive, and they are so vague that they don't help you; just retry.
we see hundreds of such failures each day, so they are pretty much constant, and not related to Cloud health.
For all of these we opened cases with paid Google Enterprise Support, but unfortunately they didn't resolve them. It seems the recommended option is exponential backoff with retry; even support told us to do so, which personally doesn't make me happy.
Also, the failure rate fits within the 99.9% uptime we have in the SLA, so there is no ground for objection.
There's something to keep in mind in regards to the SLA: it's a very strictly defined structure, the details are here. The 99.9% is uptime, which does not translate directly into a failure rate. What this means is that if BQ has a 30-minute downtime one month, and you do 10,000 inserts within that period but no inserts at other times of the month, the numbers will be skewed. This is why we suggest an exponential backoff algorithm. The SLA is explicitly based on uptime and not error rate, but logically the two correlate closely if you do streaming inserts throughout the month at different times with a backoff-retry setup. Technically, you should experience on average about 1 in 1000 failed inserts if you do inserts throughout the month and have set up a proper retry mechanism.
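As an illustration of the backoff-with-retry approach, here is a hedged sketch using the current google-cloud-bigquery Python client (insert_rows_json); only the rows reported as failed are retried, since only part of a payload may fail:

```python
import random
import time
from google.cloud import bigquery

client = bigquery.Client()

def insert_with_backoff(table_id, rows, max_attempts=5):
    """Stream rows into table_id, retrying failed rows with exponential backoff."""
    attempt = 0
    while rows and attempt < max_attempts:
        errors = client.insert_rows_json(table_id, rows)  # empty list on success
        if not errors:
            return []
        failed = {e["index"] for e in errors}
        rows = [row for i, row in enumerate(rows) if i in failed]
        attempt += 1
        time.sleep(2 ** attempt + random.random())  # exponential backoff plus jitter
    return rows  # rows that still failed; log them for a later retry
```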
You can check out this chart about your project health:
https://console.developers.google.com/project/YOUR-APP-ID/apiui/apiview/bigquery?tabId=usage&duration=P1D
It happens that my response is on the other linked question; I proposed queues because they made our exponential backoff with retry very easy, and working with queues is very easy. We use Beanstalkd.
In my experience, any request to BigQuery takes long. We tried using it as a database for performance data but are eventually moving off of it due to slow response times. As far as I can see, BQ is built for handling big requests within a 1-10 second response time. These are the requests BQ categorizes as interactive. BQ doesn't get faster by doing less. We stream quite a few records to BQ, but we always make sure to batch them up (per table) and run all requests asynchronously (or, if you have to, in another thread).
PS: I can confirm what Pentium10 says about failures in BQ. Make sure you retry whatever fails, and if it fails again, log it to a file so you can retry it later.
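A small sketch of the "batch per table" idea from the answer above, again assuming the google-cloud-bigquery client; the buffer size and names are illustrative:

```python
from collections import defaultdict
from google.cloud import bigquery

client = bigquery.Client()
_buffers = defaultdict(list)  # table_id -> pending rows

def enqueue(table_id, row, flush_at=500):
    """Collect rows in memory and flush them per table in one streaming call."""
    _buffers[table_id].append(row)
    if len(_buffers[table_id]) >= flush_at:
        flush(table_id)

def flush(table_id):
    rows = _buffers.pop(table_id, [])
    if not rows:
        return
    errors = client.insert_rows_json(table_id, rows)
    if errors:
        # Re-queue only the failed rows so a backoff/retry routine (or a log
        # file for later replay, as suggested above) can handle them.
        failed = {e["index"] for e in errors}
        _buffers[table_id].extend(row for i, row in enumerate(rows) if i in failed)
```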