Azure Stream Analytics job SU scaling - azure-stream-analytics

I'm running into an issue I can't fix:
I created an Azure Stream Analytics job that sometimes runs into this error:
Resource usage is over the capacity for one or more of the query steps.
Event processing may be delayed or stop making progress. This may be a
result of large window in your query, large events in your input, large out
of order tolerance window, or a combination of the above. Please try to
partition your query, or break down your query to more steps, and add
Streaming Unit resources from the Scale tab to avoid such condition.
So I decided to scale up the SUs. I stopped the job and opened the Scale pane, but the input box stays greyed out. I can't change the SU value, and there is no error message.
What can I do?
Many thanks!

In order to change the SU scale, you need the admin/owner role on the Azure Stream Analytics job.
Sorry for the inconvenience; we are changing the user experience to make this more explicit.
Let me know if it works for you after you get the admin role.
Thanks.
JS

Related

When does a Stream Analytics job without a window get triggered?

I'm unclear on when exactly a Stream Analytics job is invoked if I don't specify any windowing function. Does it run periodically, or is it invoked whenever a new event arrives? Unfortunately, I haven't found any documentation that specifically mentions this yet.
Background: I have multiple devices sending updates to my IoT Hub and I only want to do a new calculation whenever a certain device sends an update.
Without a windowing function, your Azure Stream Analytics job will run every time an input message is received. However, it might batch the output depending on the egress rate. This also depends on the output type, as described on this page.
One side note: depending on how complex your query is, you need to take Streaming Unit usage into account. Even if you're not using windowed functions, check this page if you want to use UDFs or reference data, as they might affect your memory usage.
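To illustrate the per-event behaviour for the scenario in the question, here is a minimal windowless query sketch. The input/output aliases ([iothub-input], [device-output]) and the DeviceId/Temperature fields are placeholders, not names from the question:
-- Runs for every event that arrives on the input; only events from one device reach the output.
SELECT
    DeviceId,
    EventEnqueuedUtcTime,
    Temperature
INTO
    [device-output]
FROM
    [iothub-input]
WHERE
    DeviceId = 'my-device-01'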

SSAS Tabular cube reload (seems to need a user to trigger the load of the data from disk)

We are seeing some odd behaviour on our SSAS instances. We process our cubes as part of an overnight job on different environments; on our prod environment we process the cube on a separate server and then sync it out to a set of user-facing servers. We are, however, seeing this behaviour even on environments where we process and query on a single instance.
The first user that hits any environment with fresh data seems to trigger a reload of the cube data from disk. Given we have 2 cubes that run to some 20 GB, this takes a while. During this we see low CPU utilisation, but we can see the memory footprint of the SSAS instance spooling up. This is very visible if the instance has just been started, as it seems to start with a couple of hundred MB initially and then spools up to 22 GB, at which point it becomes responsive for end users. During the spool-up, DAX Studio/Excel/SSMS all seem to hang as far as the end user is concerned. Profiler isn't showing anything useful other than very slow responses to metadata discover requests.
Is there a setting somewhere that can change this? Or do I have to run some DAX against the cube to "prewarm" it?
Is this something I've missed in the past because all my models were pretty small (sub-1 GB)?
This is SQL Server 2016 SP2 running tabular models at compatibility level 1200.
Many thanks
Steve
I see that you are suffering from an acute OLAP cube cold. :)
You need to warm it up (as you've guessed, you need to issue a command against it after (re)starting the service).
What you want to do is issue a discover command; a query like this one should be enough:
SELECT * FROM $System.DBSCHEMA_CATALOGS
If you want the full story, and a detailed explanation on how to automate this warming, you can find my post here: https://fundatament.com/2018/11/07/moments-before-disaster-ssas-tabular-is-not-responding-after-a-server-restart/
Hope it helps.
Have fun. :)

Removing BigQuery Public Dataset

Does anyone know of any way to remove the public datasets from a BigQuery project?
Though the risk is very low, I don't want my users to be able to run queries against them and rack up costs.
Thanks
It's an old question, but for those who just want to unpin "bigquery-public-data" to tidy up the resources list: click the name in the side panel, then on the far right of the info pane there is an "Unpin project" button. Click that.
The whole point of public datasets is that everyone has access to them so they can test BigQuery. Even if a feature request were to add an option to hide the listing in the BigQuery web UI panel, users would still have access and could query the public datasets.
It is more practical to use custom quotas.
So you would create a project with a number of users that share a quota you consider sufficient for their activities. When the established quota is reached, BigQuery stops and users receive an error message when trying to run queries.
Another useful tool is creating budget alerts at a level you can set taking into account the previous month's spend. The alert will notify you when the project's bill has reached the amount you set and can save you from bad surprises.
In addition, enabling Audit Logs in your project will give a comprehensive overview of BigQuery operations. Check this example of an Audit Logs query that gives details on the queries performed. Of course, you will only find out about the use of a public dataset after it happens, but it will point out which user ran the query, and you can reinforce the administrative policy of not querying public datasets. To get information on the query performed, including the dataset interrogated, use this field when querying the Audit Logs:
'protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobConfiguration.query.query'
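For illustration, assuming the audit logs are exported to BigQuery (the project, dataset and table names below are placeholders for your own export), a query along these lines can show who queried the public datasets and what they ran:
-- Placeholder names: my-project and auditlog_dataset stand in for your own audit log export.
SELECT
  timestamp,
  protopayload_auditlog.authenticationInfo.principalEmail AS user_email,
  protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobConfiguration.query.query AS query_text
FROM
  `my-project.auditlog_dataset.cloudaudit_googleapis_com_data_access_*`
WHERE
  protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobConfiguration.query.query LIKE '%bigquery-public-data%'
ORDER BY
  timestamp DESC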
As a last resort, you can create a designated project for your users to query the public datasets and, to make sure it does not create additional costs, remove the billing account. By doing so, though, you can only query 1 TB of data per month, the BigQuery always-free usage tier.
Also keep in mind these best practices for limiting query costs.
If you close the current tab, the public dataset will disappear from the Google BigQuery page.

Data not showing up intermittently on the OpenTSDB UI

We are running some high-volume tests by pushing metrics to OpenTSDB (2.3.0) with Bigtable, and a curious problem surfaces from time to time. For some metrics, an hour of data stops showing up on the web UI when we run a query. The span of "missing" data is very clear-cut and borders on the hour (UTC). After a while, when rerunning the same query, the data shows up. There does not seem to be any pattern that we can deduce here, other than the hour span. Any pointers on what to look for to debug this?
How long do you have to wait before the data shows up? Is it always the most recent hour that is missing?
Have you tried using OpenTSDB CLI when this is happening and issuing a scan to see if the data is available that way?
http://opentsdb.net/docs/build/html/user_guide/cli/scan.html
You could also check via an HBase shell scan to see if you can get the raw data that way (here's information on how it's stored in HBase):
http://opentsdb.net/docs/build/html/user_guide/backends/hbase.html
If you can verify the data is there then it seems likely to be a web UI problem. If not, the next likely culprit is something getting backed up in the write pipeline.
I am not aware of any particular issue in the Google Cloud Bigtable backend layer that would cause this behavior, but I believe some folks have encountered issues with OpenTSDB compactions during periods of high load that result in degraded performance.
It's worth checking in the Google Cloud Console to see if there's any outliers in the latency, CPU or throughput graphs that correlate with the times during which you experience the issue.

Automating scale-up of Streaming Units - Stream Analytics job

We would like to automate scaling up the Streaming Units for a certain Stream Analytics job if the 'SU utilization' is high. Is it possible to achieve this using PowerShell? Thanks.
Firstly, as Pete M said, we can call the REST API to create or update a transformation within a job.
Besides that, the Azure Stream Analytics cmdlet New-AzureRmStreamAnalyticsTransformation can be used to update a transformation within a job.
It depends on what you mean by "automate". You can update a transformation via the API from a scheduled job, including the Streaming Unit allocation. I'm not sure if you can do this via the PowerShell object model, but you can always make a REST call:
https://learn.microsoft.com/en-us/rest/api/streamanalytics/stream-analytics-transformation
If you mean you want to use PowerShell to create and configure a job that automatically scales on its own, unfortunately that isn't possible today, regardless of how you create the job. ASA doesn't support elastic scaling. You have to do it "manually", either by hand or through some manner of scheduled WebJob or similar.
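As a rough sketch of that scheduled approach, using the New-AzureRmStreamAnalyticsTransformation cmdlet mentioned above (the resource group, job, transformation name, query and SU value below are all placeholders, and the job may need to be stopped before the SU change takes effect):
# Transformation definition; streamingUnits is the value being scaled up.
$definition = @"
{
  "name": "MyTransformation",
  "type": "Microsoft.StreamAnalytics/streamingjobs/transformations",
  "properties": {
    "streamingUnits": 12,
    "query": "SELECT * INTO [my-output] FROM [my-input]"
  }
}
"@
Set-Content -Path "$env:TEMP\transformation.json" -Value $definition

# -Force overwrites the existing transformation with the new SU count.
New-AzureRmStreamAnalyticsTransformation `
    -ResourceGroupName "my-resource-group" `
    -JobName "my-streaming-job" `
    -Name "MyTransformation" `
    -File "$env:TEMP\transformation.json" `
    -Force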
It is three years later now, but I think you can use App Insights to automatically create an alert rule based on percent utilization. Is it an absolute MUST that you use PowerShell? If so, there is an Azure Automation script on GitHub:
https://github.com/Azure/azure-stream-analytics/blob/master/Autoscale/StepScaleUp.ps1