Autoscaling an Azure Function in a VNet - azure-function-app

I have an Azure Function in an S1 App Service plan.
The function and its storage account are in a VNet, meaning the table storage is not publicly accessible.
I can manually scale the function via the App Service plan's scale-out section, and the function instances increase as expected.
If I try to use the autoscale option in the same section, the scaling saves, but the scaling that is defined never occurs and there is no logging to suggest the scaling evaluation is taking place.
Should autoscaling work in this vnet scenario?

I ended up using a premium plan to get the behaviour I wanted (scaling out as a queue grows in size).
As well as upgrading to a premium plan I had to:
Upgrade my ARM template for "Microsoft.Web/sites/config" to use apiVersion "2019-08-01"
In this template, set functionsRuntimeScaleMonitoringEnabled to true (as outlined in https://learn.microsoft.com/en-us/azure/azure-functions/functions-networking-options)
Upgrade my storage extension package ("Microsoft.Azure.WebJobs.Extensions.Storage") to version "3.0.10"
Now the app will scale as the queues that the app is monitoring grow.
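For reference, a rough sketch of what that "Microsoft.Web/sites/config" resource can look like in the template (the functionAppName parameter is just a placeholder for illustration):

    {
      "type": "Microsoft.Web/sites/config",
      "apiVersion": "2019-08-01",
      "name": "[concat(parameters('functionAppName'), '/web')]",
      "dependsOn": [
        "[resourceId('Microsoft.Web/sites', parameters('functionAppName'))]"
      ],
      "properties": {
        "functionsRuntimeScaleMonitoringEnabled": true
      }
    }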

Related

Best practice with azure container instances

I was asked to build a cloud architecture plan in Azure using serverless, yet with the ability to transition to Kubernetes later. So I was thinking of using Azure Container Instances, since Functions is not really a container deployment model. My limited understanding is that ACI seems to lack a lot, e.g. health checks, scaling, auto-heal, etc.
My question: what is the best practice for ACI, and is there a workaround to support these capabilities? Looking at the Microsoft website it looks promising, but it is hard to dig out the exact recommended design.
Combine ACI with the ACI Logic Apps connector, Azure queues and Azure Functions to build robust infrastructure that can elastically scale out containers on demand. With Azure Container Instances, you can run complex tasks that are capable of responding to events.
As for Azure Container Instances itself, its main benefit is that it is the fastest and simplest option compared with a VM, AKS, a Web App and so on, but you do not have much control over it. Its main aim is to test whether your image runs as you expect.
Azure Logic Apps or Azure Functions just help you create and delete the container instances at the time you want, or get the state or a message from a container instance, and not much more. So if you want to use Azure Container Instances together with other services such as Logic Apps, you need to know what each one can help you with.
If you have other questions about this issue, please let me know and I will try my best to help you.

.Net Core Hosted Services in a Load Balanced Environment

We are developing a Web API using .Net Core. To perform background tasks we have used Hosted Services.
The system is hosted in an AWS Elastic Beanstalk environment with a load balancer, so based on the load, Beanstalk creates/removes instances of the system.
Our problem is,
Since the background services also run inside the API, when the load balancer increases the number of instances, the number of background services also increases, and there is a possibility of executing the same task multiple times. Ideally there should be only one instance of the background services.
One way to tackle this is to stop executing background services in a load-balanced environment and have a dedicated, non-load-balanced, single-instance environment for background services only.
That is a bit of an ugly solution. So,
1) Is there a better solution for this?
2) Is there a way to identify the primary instance while in a load-balanced environment? If so, I can conditionally register the hosted services.
Any help is really appreciated.
Thanks
I am facing the same scenario and am thinking of implementing a custom service architecture that runs normally on all of the instances but takes advantage of a pub/sub broker and a distributed memory service, so those small services can contact each other and coordinate what's to be done. It's complicated to develop, yes, but a very robust solution IMO.
You'll "have to" use a distributed "lock" system. You'll have to use, for example, a distributed memory cache who put a lock when someone (a node of your cluster) is working on background. If another node is trying to do the same job, he'll be locked by the first lock if the work isn't done yet.
What i mean, if all your nodes doesn't have a "sync handler" you can't handle this kind of situation. It could be SQL app lock, distributed memory cache or other things ..
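A minimal C# sketch of the SQL application-lock idea, using SQL Server's sp_getapplock (via the Microsoft.Data.SqlClient package; the resource name is just an agreed-upon string):

    using System.Data;
    using System.Threading.Tasks;
    using Microsoft.Data.SqlClient;

    public static class JobLock
    {
        // Returns true if this node acquired the lock. With @LockOwner = 'Session' the lock
        // is held for as long as the connection stays open, so keep the connection open
        // while the background work runs, then close it to release the lock.
        public static async Task<bool> TryAcquireAsync(SqlConnection openConnection, string resource)
        {
            using var cmd = new SqlCommand("sp_getapplock", openConnection)
            {
                CommandType = CommandType.StoredProcedure
            };
            cmd.Parameters.AddWithValue("@Resource", resource);
            cmd.Parameters.AddWithValue("@LockMode", "Exclusive");
            cmd.Parameters.AddWithValue("@LockOwner", "Session");
            cmd.Parameters.AddWithValue("@LockTimeout", 0); // fail fast instead of waiting
            var returnValue = new SqlParameter("@ReturnValue", SqlDbType.Int)
            {
                Direction = ParameterDirection.ReturnValue
            };
            cmd.Parameters.Add(returnValue);
            await cmd.ExecuteNonQueryAsync();
            return (int)returnValue.Value >= 0; // >= 0: lock granted, negative: not granted
        }
    }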
There is something called a Mutex, but even that won't control this in a multi-instance environment. However, there are ways to control it to some level (maybe even 100%). One way would be to keep a tracker in the database: e.g. if the job has to run daily, before starting your job in the background service you would query the database for an entry for today; if there is none, you insert an entry and start your job.
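Along the same lines, a sketch of that database-tracker idea for a daily job. JobLog here is a hypothetical table with a unique constraint on (JobName, RunDate), which is what makes the check-then-insert safe when several instances race at the same moment:

    using System;
    using System.Threading.Tasks;
    using Microsoft.Data.SqlClient;

    public static class DailyJobTracker
    {
        // Returns true if this instance claimed today's run. The unique constraint on
        // (JobName, RunDate) guarantees that only one INSERT per day can succeed.
        public static async Task<bool> TryClaimTodayAsync(string connectionString, string jobName)
        {
            using var conn = new SqlConnection(connectionString);
            await conn.OpenAsync();
            using var cmd = new SqlCommand(
                "INSERT INTO JobLog (JobName, RunDate) VALUES (@job, @today)", conn);
            cmd.Parameters.AddWithValue("@job", jobName);
            cmd.Parameters.AddWithValue("@today", DateTime.UtcNow.Date);
            try
            {
                await cmd.ExecuteNonQueryAsync();
                return true;  // our insert won: run the job
            }
            catch (SqlException ex) when (ex.Number is 2627 or 2601) // unique key/index violation
            {
                return false; // another instance already claimed today's run
            }
        }
    }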

Can VMs on Google Compute detect when they've been migrated?

Is it possible to notify an application running on a Google Compute VM when the VM migrates to different hardware?
I'm a developer for an application (HMMER) that makes heavy use of vector instructions (SSE/AVX/AVX-512). The version I'm working on probes its hardware at startup to determine which vector instructions are available and picks the best set.
We've been looking at running our program on Google Compute and other cloud engines, and one concern is that, if a VM migrates from one physical machine to another while running our program, the new machine might support different instructions, causing our program to either crash or execute more slowly than it could.
Is there a way to notify applications running on a Google Compute VM when the VM migrates? The only relevant information I've found is that you can set a VM to perform a shutdown/reboot sequence when it migrates, which would kill any currently-executing programs but would at least let the user know that they needed to restart the program.
We ensure that your VM instances never live migrate between physical machines in a way that would cause your programs to crash the way you describe.
However, for your use case you probably want to specify a minimum CPU platform version. You can use this to ensure that e.g. your instance has the new Skylake AVX instructions available. See the documentation on Specifying the Minimum CPU Platform for further details.
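For example, with the gcloud CLI (instance name and zone here are just placeholders):

    gcloud compute instances create hmmer-worker \
        --zone=us-central1-b \
        --min-cpu-platform="Intel Skylake"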
As per the Live Migration docs:
Live migration does not change any attributes or properties of the VM
itself. The live migration process just transfers a running VM from
one host machine to another. All VM properties and attributes remain
unchanged, including things like internal and external IP addresses,
instance metadata, block storage data and volumes, OS and application
state, network settings, network connections, and so on.
Google does provide a few controls for setting the instance availability policies, which also let you control aspects of live migration. Here they also mention what you can look for to determine when live migration has taken place.
Live migrate
By default, standard instances are set to live migrate, where Google
Compute Engine automatically migrates your instance away from an
infrastructure maintenance event, and your instance remains running
during the migration. Your instance might experience a short period of
decreased performance, although generally most instances should not
notice any difference. This is ideal for instances that require
constant uptime, and can tolerate a short period of decreased
performance.
When Google Compute Engine migrates your instance, it reports a system
event that is published to the list of zone operations. You can review
this event by performing a gcloud compute operations list --zones ZONE
request or by viewing the list of operations in the Google Cloud
Platform Console, or through an API request. The event will appear
with the following text:
compute.instances.migrateOnHostMaintenance
In addition, you can detect directly on the VM when a maintenance event is about to happen.
Getting Live Migration Notices
The metadata server provides information about an instance's
scheduling options and settings, through the scheduling/
directory and the maintenance-event attribute. You can use these
attributes to learn about a virtual machine instance's scheduling
options, and use this metadata to notify you when a maintenance event
is about to happen through the maintenance-event attribute. By
default, all virtual machine instances are set to live migrate so the
metadata server will receive maintenance event notices before a VM
instance is live migrated. If you opted to have your VM instance
terminated during maintenance, then Compute Engine will automatically
terminate and optionally restart your VM instance if the
automaticRestart attribute is set. To learn more about maintenance
events and instance behavior during the events, read about scheduling
options and settings.
You can learn when a maintenance event will happen by querying the
maintenance-event attribute periodically. The value of this
attribute will change 60 seconds before a maintenance event starts,
giving your application code a way to trigger any tasks you want to
perform prior to a maintenance event, such as backing up data or
updating logs. Compute Engine also offers a sample Python script
to demonstrate how to check for maintenance event notices.
You can use the maintenance-event attribute with the waiting for
updates feature to notify your scripts and applications when a
maintenance event is about to start and end. This lets you automate
any actions that you might want to run before or after the event. The
linked documentation includes a Python sample showing how you might
implement these two features together.
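That sample is Python, but the same long poll works from any language. A minimal C# sketch of watching the maintenance-event attribute (its value is NONE while idle, and MIGRATE_ON_HOST_MAINTENANCE or TERMINATE_ON_HOST_MAINTENANCE shortly before an event; etag tracking is omitted for brevity):

    using System;
    using System.Net.Http;
    using System.Threading;
    using System.Threading.Tasks;

    public static class MaintenanceWatcher
    {
        private const string Url =
            "http://metadata.google.internal/computeMetadata/v1/instance/maintenance-event?wait_for_change=true";

        // Long-polls the metadata server; the request only returns when the attribute's
        // value changes, e.g. from NONE to MIGRATE_ON_HOST_MAINTENANCE (~60 s before the event).
        public static async Task WatchAsync(Func<string, Task> onEvent, CancellationToken ct)
        {
            using var client = new HttpClient { Timeout = Timeout.InfiniteTimeSpan };
            client.DefaultRequestHeaders.Add("Metadata-Flavor", "Google"); // required header
            while (!ct.IsCancellationRequested)
            {
                string value = await client.GetStringAsync(Url, ct);
                if (value != "NONE")
                {
                    // e.g. checkpoint work now, and re-probe CPU features once the event is over
                    await onEvent(value);
                }
            }
        }
    }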
You can also choose to terminate and optionally restart your instance.
Terminate and (optionally) restart
If you do not want your instance to live migrate, you can choose to
terminate and optionally restart your instance. With this option,
Google Compute Engine will signal your instance to shut down, wait for
a short period of time for your instance to shut down cleanly,
terminate the instance, and restart it away from the maintenance
event. This option is ideal for instances that demand constant,
maximum performance, and your overall application is built to handle
instance failures or reboots.
Look at the Setting availability policies section for more details on how to configure this.
If you use an instance with a GPU or a preemptible instance, be aware that live migration is not supported:
Live migration and GPUs
Instances with GPUs attached cannot be live migrated. They must be set
to terminate and optionally restart. Compute Engine offers a 60 minute
notice before a VM instance with a GPU attached is terminated. To
learn more about these maintenance event notices, read Getting live
migration notices.
To learn more about handling host maintenance with GPUs, read
Handling host maintenance on the GPUs documentation.
Live migration for preemptible instances
You cannot configure a preemptible instance to live migrate. The
maintenance behavior for preemptible instances is always set to
TERMINATE by default, and you cannot change this option. It is also
not possible to set the automatic restart option for preemptible
instances.
As Ramesh mentioned, you can specify a minimum CPU platform to ensure you are only migrated to hardware that supports at least the CPU platform you specified.
In summary, when you specify a minimum CPU platform:
Compute Engine always uses the minimum CPU platform where available.
If the minimum CPU platform is not available or the minimum CPU platform is older than the zone default, and a newer CPU platform is available for the same price, Compute Engine uses the newer platform.
If the minimum CPU platform is not available in the specified zone and there are no newer platforms available without extra cost, the server returns a 400 error indicating that the CPU is unavailable.

What does each Spinnaker deployment strategy mean?

I would like to know what each strategy means and how they work behind the scenes (i.e., Highlander, Red/Black, Rolling Push). It would be very useful to have this information on the official website.
Thanks
There is useful information out there that can help you with your question, I'll do my best to summarize it below.
Type and Strategies of Deployments Introduction
"There are a variety of techniques to deploy new applications to
production, so choosing the right strategy is an important decision,
weighing the options in terms of the impact of change on the system,
and on the end users."
Recreate: (also known as Highlander) Version A is terminated then version B is rolled out.
Ramped (also known as Rolling-Update or Incremental): Version B is slowly rolled out, replacing version A.
Blue/Green (also known as Red/Black): Version B is released alongside version A, then the traffic is switched to version B.
Canary: Version B is released to a subset of users, then proceeds to a full rollout.
A/B Testing: Version B is released to a subset of users under specific conditions.
Shadow: Version B receives real-world traffic alongside version A and doesn't impact the response.
A summary table of these deployment types can be found in ref link 1 below.
Ref link 1: https://thenewstack.io/deployment-strategies/
Spinnaker Deployment Strategies
Spinnaker treats cloud-native deployment strategies as first class constructs, handling the underlying orchestration such as verifying health checks, disabling old server groups and enabling new server groups.
Spinnaker supported deployment strategies (in active development):
Highlander
Red/Black (a.k.a. Blue/Green)
Rolling Red/Black
Canary
Each of these is described in more detail below:
Highlander: This deployment strategy is aptly named after the film Highlander because of the famous line, "there can be only one." With this strategy, there is a load balancer fronting a single cluster. Highlander destroys the previous cluster after the deployment is completed. This is the simplest strategy, and it works well when rollback speed is unimportant or infrastructure costs need to be kept down.
Red/Black: This deployment strategy is also referred to as Blue/Green. The Red/Black strategy uses a load balancer and two target clusters / server groups (known as red/black or blue/green). The load balancer routes traffic to the active (enabled) cluster / server group. Then, a new deployment replaces the servers (with the K8s provider, Replica Sets & Pods) in the disabled cluster / server group. When the newly enabled cluster / server group is ready, the load balancer routes traffic to this cluster and the previous cluster becomes disabled. The currently disabled cluster / server group (the previously enabled one) is kept around by Spinnaker in case a rollback is needed for the next X deployments (a configurable parameter).
Rolling Red/Black: a slower Red/Black with more possible verification points. The process is the same as Red/Black, but the difference is in how traffic switches over. Blue is the enabled cluster. Blue instances are gradually replaced by new instances in the green cluster until all enabled instances are running the newest version. The rollout may occur in 20% increments, so the split can be 80/20, 60/40, 40/60, 20/80, then 100%. Both the blue and green clusters receive traffic until the rollout is complete.
Canary: a canary deployment is a process in which a change is partially deployed, then tested against baseline metrics before continuing. This process reduces the risk that a change will cause problems when it has been completely rolled out, by limiting your blast radius to a small percentage of your user base. The baseline metrics are set when configuring the canary and may be error counts or latency. Higher-than-baseline error counts or latency spikes kill the canary, and thus stop the pipeline.
Ref link 2: https://www.spinnaker.io/concepts/#deployment-strategies
Ref link 3: https://blog.armory.io/advanced-deployment-strategies-with-armory-spinnaker/
Ref link 4: https://www.weave.works/blog/kubernetes-deployment-strategies
As I understand it:
Highlander: when the new Auto Scaling group (ASG) is up and healthy, all old ASGs are destroyed automatically.
Red/Black: A new ASG is launched, some manual (or more complicated than in Highlander) verification steps are done, and only after those steps are completed is the old ASG manually deleted. Netflix blog post here: http://techblog.netflix.com/2013/08/deploying-netflix-api.html
Rolling push: "Old instances get gracefully deleted and replaced by new instances one or two at a time until all the instances in the ASG have been replaced." Netflix blog post here: http://techblog.netflix.com/2012/06/asgard-web-based-cloud-management-and.html
At my company we only use Highlander and Red/Black on a regular basis.

On NServiceBus Profiles

I've been trying to find ways to improve our NServiceBus code performance. I searched and stumbled on these profiles that you can set upon running/installing the NServiceBus host.
Currently we're running the NServiceBus host as-is, and I read that by default we are using the "Lite" version of the available profiles. I've also learnt from this link:
http://docs.particular.net/nservicebus/hosting/nservicebus-host/profiles
that there are also Integration and Production profiles. The documentation does not say much - has anyone tried the Production profile and noticed an improvement in NServiceBus performance? Specifically affecting the speed in consuming messages from the queues?
One major difference between the NSB profiles is how they handle storage of subscriptions.
The lite, integration and production profiles configure how reliable NSB is. For example, the lite profile uses in-memory subscription storage for all pub/sub registrations. This is a concern because, in order to register a subscriber in the lite profile, the publisher has to already be running (so the publisher can store the subscriber list in memory). What this means is that if the publisher crashes for any reason (or is taken offline), all the subscription information is lost (until each subscriber is restarted).
So, the lite profile is good if you are running on a developer machine and want to quickly test how your services interact. However, it is just not suitable to other environments.
The integration profile stores subscription information on a local queue. This can be good for simple environments (like QA etc.). However, in a highly distributed environment holding the subscription information in a database is best, hence the production profile.
So, to answer your question, I don't think that by changing profiles you will see a performance gain. If anything, changing from the lite profile to one of the other profiles is likely to decrease performance (because you incur the cost of accessing queue or database storage).
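For reference, the profile is selected by passing its full name on the host command line when running or installing it, for example:

    NServiceBus.Host.exe NServiceBus.Production
    NServiceBus.Host.exe /install NServiceBus.Production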
Unless you have already tuned the logging yourself, we've seen large improvements just from reduced logging. The performance of reading off the queues is the same all around. Since the queues are local, you won't gain much from the transport. I would take a look at tuning your handlers and the underlying infrastructure. You may want to check out tuning MSMQ and look at the disk you are using, etc. Another spot would be to look at how distributed transactions are behaving, assuming you are using a remote database that requires them.
Another option to increase throughput is to increase the number of threads consuming the queue. This will require a license. If a license is not an option, you can have multiple instances of a single-threaded endpoint running. This requires you to shard your work based on message type or something else.
Continuing up the scale, you can then get into using the Distributor to load balance work. Again this will require a license, but you'll be able to add more nodes as necessary. All of the opportunities above also apply to this topology.