Cannot create AKS in Azure with the command "az aks create" - azure-container-service

az aks create -n MyServices -g MyKubernetes --generate-ssh-keys
is not working. Error message:
az aks create -n Adestis-Services -g Adestis.Kubernetes --generate-ssh-keys
A Cloud Shell credential problem occurred. When you report the issue with the error below, please mention the hostname '679e170bedd7'
Could not retrieve token from local cache.
Steps to reproduce:
az login
az account set --subscription MySubscriptionID
az group create --name "MyKubernetes" --location "westus"
az aks create -n MyServices -g MyKubernetes --generate-ssh-keys

AKS has been rolled out to the new East US region. Can you please try deploying an AKS cluster in the East US region?

Adding to @sauryadas_'s answer, AKS is currently available in only 5 regions: East US, Central US, West Europe, Canada East, and Canada Central.
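For example, a minimal sketch of the original steps pointed at one of the supported regions (same placeholder names as in the question):
Create the resource group in East US: az group create --name "MyKubernetes" --location "eastus"
Create the cluster in that group: az aks create -n MyServices -g MyKubernetes --generate-ssh-keys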

To repro your issue via the Bash client, I attempted to create a new 1-node AKS cluster in West US 2, but I encountered a "Provisioning failed" exception due to operational threshold limits - likely a capacity issue - which could be the root cause of your issue if the provisioning service is down.
As a control, I created an AKS cluster in UK West and it provisioned successfully with no errors. Can you check whether you are able to reproduce your issue in UK West? Here is a sample command I used.
Create Resource Group on UKwest: az group create --name myResprpUK --location ukwest
Create Single node, AKS Cluster: az aks create --resource-group myResprpUK --name myAKSclusterUK --agent-count 1 --generate-ssh-keys
Hope this helps.

Use bash instead of Powershell
Create Resource Group in ukwest
Add required ResourceProviders (Microsoft.Compute and Microsoft.Network) manually (e.g. via the Azure Portal or the CLI; a CLI sketch follows below)
Create AKS in this ResourceGroup
The commands used are the ones provided in the question and in the answer from @Femi-Sulu. The key points are:
- Use bash
- Use region ukwest
- Add ResourceProviders manually
Please read the comments in the answer from @Femi-Sulu!
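If you prefer to register the resource providers with the CLI instead of the Portal, a minimal sketch (standard az provider commands, run against the already selected subscription):
az provider register --namespace Microsoft.Compute
az provider register --namespace Microsoft.Network
az provider show --namespace Microsoft.Compute --query registrationState
az provider show --namespace Microsoft.Network --query registrationState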

There are 2 ways to create your cluster
Approach 1:
Via bash / powershell / cmd
a) Make sure you are logged in to az account [use az login]
b) Select your subscription by typing az account set --subscription <subscription-name>
c) az aks create --resource-group <your-RG-name> --name <your-cluster-name> --node-count 3 --generate-ssh-keys
Approach 2: Via portal.azure.com.
a) Search for Azure Kubernetes service on the top search bar
b) create your AKS cluster with all the options
c) A bonus here is that you can also bring your cluster into your organization's custom VNet (a CLI sketch of that follows below).
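If you want the same custom VNet placement from the CLI instead of the portal, here is a hedged sketch; the subnet resource ID is a placeholder you would look up first (for example with az network vnet subnet show), and the Azure CNI network plugin is assumed:
az aks create --resource-group <your-RG-name> --name <your-cluster-name> --node-count 3 --network-plugin azure --vnet-subnet-id <subnet-resource-id> --generate-ssh-keys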

Related

How can I use a SystemAssigned identity when pulling an image from Azure Container Registry into Azure Container Instances?

I want to create a container (or container group) in Azure Container Instances, pulling the image(s) from Azure Container Registry - but using a SystemAssigned identity. With that I want to avoid using ACR login credentials, a service principal, or a UserAssigned identity.
When I run this script (Azure CLI in PowerShell) ...
$LOC = "westeurope"
$RG = "myresourcegroup"
$ACRNAME = "myacr"
az configure --defaults location=$LOC group=$RG
$acr = az acr show -n $ACRNAME -o json | ConvertFrom-Json -Depth 10
az container create --name app1 --image $($acr.loginServer+"/app1") `
--assign-identity --role acrpull --scope $acr.id `
--debug
... ACI does not seem to recognize that it should be already authorized for ACR and shows this prompt:
Image registry username:
Azure CLI version: 2.14.0
Does this make sense? Is the ACI managed identity supported for ACR?
In your code, you create an Azure container with a managed identity that is created at ACI creation time and use it to authenticate to ACR. I am afraid you cannot do that, because there is a limitation:
You can't use a managed identity to pull an image from Azure Container
Registry when creating a container group. The identity is only
available within a running container.
From Jan 2022 on, managed identity is supported on Azure Container Instances to access Azure Container Registry: https://learn.microsoft.com/en-us/azure/container-instances/using-azure-container-registry-mi
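As a rough sketch of what that newer approach can look like (the --acr-identity parameter and the exact flow are assumptions on my side based on that document and newer CLI versions; a user-assigned identity with the AcrPull role is shown, and all names are placeholders):
# assumption: a newer Azure CLI that supports --acr-identity on az container create
az identity create --resource-group myresourcegroup --name aci-pull-identity
az role assignment create --assignee <identity-principal-id> --role acrpull --scope <acr-resource-id>
az container create --name app1 --resource-group myresourcegroup --image <acr-login-server>/app1 --assign-identity <identity-resource-id> --acr-identity <identity-resource-id>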
@minus_one's solution does not work in my case (a runbook to create the container registry). It needs more privileges than stated here...
https://github.com/Azure/azure-powershell/issues/3215
This solution does not use a managed identity, and it is important to note that we will need at least the Owner role at the resource group level.
The main idea is to use a service principal to get access using the acrpull role. See the following PowerShell script:
# Look up the resource group and the container registry so the service principal can be scoped to the registry
$resourceGroup = (az group show --name $resourceGroupName | ConvertFrom-Json)
$containerRegistry = (az acr show --name $containerRegistryName | ConvertFrom-Json)
# Create a service principal that only has the acrpull role on this registry
$servicePrincipal = (az ad sp create-for-rbac `
    --name "${containerRegistryName}.azurecr.io" `
    --scopes $containerRegistry.id `
    --role acrpull `
    | ConvertFrom-Json)
# Create the container instance, authenticating to ACR with the service principal credentials
az container create `
    --name $containerInstanceName `
    --resource-group $resourceGroupName `
    --image $containerImage `
    --command-line "tail -f /dev/null" `
    --registry-login-server "${containerRegistryName}.azurecr.io" `
    --registry-username $servicePrincipal.appId `
    --registry-password $servicePrincipal.password
Please note that we have created a service principal, so we also need to remove it when it is no longer needed:
az ad sp delete --id $servicePrincipal.appId
There is documentation on how to do that:
Deploy to Azure Container Instances from Azure Container Registry
Update:
I think the --registry-login-server "${containerRegistryName}.azurecr.io" option was missing.

azdata erroring out due to config file error

All,
The admin set up a 3-node AKS cluster today. I got the kube config file updated by running the az command
az aks get-credentials --name AKSBDCClus --resource-group AAAA-Dev-RG --subscription AAAA-Subscription
I was able to run all the kubectl commands fine, but when I tried setting up the SQL Server 2019 BDC by running azdata bdc create, it gave me an error: Failed to complete kube config setup.
Since it was something to do with azdata and kubectl, I checked the azdata logs and this is what I see in azdata.log:
Loading default kube config from C:\Users\rgn\.kube\config
Invalid kube-config file. Expected all values in kube-config/contexts list to have 'name' key
Thinking the config file could have gotten corrupted, I tried running az aks get-credentials --name AKSBDCClus --resource-group AAAA-Dev-RG --subscription AAAA-Subscription again.
This time I got a whole lot of errors:
The client 'rgn@mycompany.com' with object id 'XXXXX-28c3-YYYY-ZZZZ-AQAQAQd'
does not have authorization to perform action 'Microsoft.ContainerService/managedClusters/listClusterUserCredential/action'
over scope '/subscriptions/Subscription-ID/resourceGroups/
ResourceGroup-Dev-RG/providers/Microsoft.ContainerService/managedClusters/AKSCluster' or the scope is invalid. If access was recently granted, please refresh your credentials.
I logged out and logged back into Azure and retried, but got the same errors as above. I was even able to stop the VM scale set before I logged off for the day. Everything works fine, but I'm unable to run the azdata script.
Can someone point me in the right direction?
Thanks,
rgn
Turns out that the config file was bad. I deleted the file and ran "az aks get-credentials" (after getting the necessary permissions to run it) and it worked. The size of the old config was 19 KB, but the new one is 10 KB.
I guess I might have messed it up while testing "az aks get-credentials".
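For anyone hitting the same thing, a small sketch of the recovery after deleting (or renaming) the old C:\Users\rgn\.kube\config; --overwrite-existing and the kubectl check are standard options, and the cluster and subscription names are the ones from the question:
az aks get-credentials --name AKSBDCClus --resource-group AAAA-Dev-RG --subscription AAAA-Subscription --overwrite-existing
kubectl config get-contexts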

How to create a Service Networking connection from a GCP project to the "default" network

I wanted to use the gcloud CLI to create a SQL instance that is accessible on the default network. So I tried this:
gcloud beta sql instances create instance1 \
--network projects/peak-freedom-xxxxx/global/networks/default
And I get the error
ERROR: (gcloud.beta.sql.instances.create) [INTERNAL_ERROR] Failed to create subnetwork.
Please create Service Networking connection with service 'servicenetworking.googleapis.com'
from consumer project '56xxxxxxxxx' network 'default' again.
When you go to the console to create it and check Private IP, you can see this:
And there's an "Allocate and connect" button. So I'm guessing that's what I need to do. But I can't figure out how to do that with the gcloud CLI.
Can anyone help?
EDIT 1:
I've tried setting the --network to https://www.googleapis.com/compute/alpha/projects/testing-project-xxx/global/networks/default
Which resulted in
ERROR: (gcloud.beta.sql.instances.create) [INTERNAL_ERROR] Failed to
create subnetwork. Set Service Networking service account as
servicenetworking.serviceAgent role on consumer project
Then I tried recreating a completely new project and enabling the Service Networking API like so:
gcloud --project testing-project-xxx \
services enable \
servicenetworking.googleapis.com
And then creating the DB resulted in the same error. So I tried to manually add the servicenetworking.serviceAgent role and ran:
gcloud projects add-iam-policy-binding testing-project-xxx \
--member=serviceAccount:service-PROJECTNUMBER@service-networking.iam.gserviceaccount.com \
--role=roles/servicenetworking.serviceAgent
This succeeded with
Updated IAM policy for project [testing-project-xxx].
bindings:
- members:
- user:email@gmail.com
role: roles/owner
- members:
- serviceAccount:service-PROJECTNUMBER@service-networking.iam.gserviceaccount.com
role: roles/servicenetworking.serviceAgent
etag: XxXxXX37XX0=
version: 1
But creating the DB failed with the same error. For reference, this is the command line I'm using to create the DB:
gcloud --project testing-project-xxx \
beta sql instances create instanceName \
--network=https://www.googleapis.com/compute/alpha/projects/testing-project-xxx/global/networks/default \
--database-version POSTGRES_11 \
--zone europe-north1-a \
--tier db-g1-small
The network name of the form "projects/peak-freedom-xxxxx/global/networks/default" is for creating SQL instances under a shared VPC network. If you want to create an instance in a normal VPC network, you should use:
gcloud --project=[PROJECT_ID] beta sql instances create [INSTANCE_ID] \
--network=[VPC_NETWORK_NAME] \
--no-assign-ip
where [VPC_NETWORK_NAME] is of the form https://www.googleapis.com/compute/alpha/projects/[PROJECT_ID]/global/networks/[VPC_NETWORK_NAME]
For more information, check here.
Note: you need to configure private service access for this, and it is a one-time action only. Follow the instructions here to do so.
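For completeness, a hedged sketch of that one-time private services access setup with gcloud (the reserved range name and prefix length are arbitrary examples):
gcloud compute addresses create google-managed-services-default --global --purpose=VPC_PEERING --prefix-length=16 --network=default --project=testing-project-xxx
gcloud services vpc-peerings connect --service=servicenetworking.googleapis.com --ranges=google-managed-services-default --network=default --project=testing-project-xxx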

GKE clusterrolebinding for cluster-admin fails with permission error

I've just created a new cluster using Google Container Engine running Kubernetes 1.7.5, with the new RBAC permissions enabled. I've run into a problem allocating permissions for some of my services, which led me to the following:
The docs for using container engine with RBAC state that the user must be granted the ability to create authorization roles by running the following command:
kubectl create clusterrolebinding cluster-admin-binding --clusterrole=cluster-admin [--user=<user-name>]
However, this fails due to lack of permissions (which I would assume are the very same permissions which we are attempting to grant by running the above command).
Error from server (Forbidden):
User "<user-name>" cannot create clusterrolebindings.rbac.authorization.k8s.io at the cluster scope.:
"Required \"container.clusterRoleBindings.create\" permission."
(post clusterrolebindings.rbac.authorization.k8s.io)
Any help would be much appreciated as this is blocking me from creating the permissions needed by my cluster services.
Janos's answer will work for GKE clusters that have been created with a password, but I'd recommend avoiding using that password wherever possible (or creating your GKE clusters without a password).
Using IAM: To create that ClusterRoleBinding, the caller must have the container.clusterRoleBindings.create permission. Only the OWNER and Kubernetes Engine Admin IAM Roles contain that permission (because it allows modification of access control on your GKE clusters).
So, to allow person@company.com to run that command, they must be granted one of those roles. E.g.:
gcloud projects add-iam-policy-binding $PROJECT \
--member=user:person#company.com \
--role=roles/container.admin
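Once the role is granted, a small follow-up sketch (standard gcloud/kubectl commands; cluster name and zone are placeholders):
gcloud container clusters get-credentials <clustername> --zone <zone>
kubectl create clusterrolebinding cluster-admin-binding --clusterrole=cluster-admin --user=person@company.com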
If your kubeconfig was created automatically by gcloud, then your user is not the all-powerful admin user which you are trying to create a binding for.
Use gcloud container clusters describe <clustername> --zone <zone> on the cluster and look for the password field.
Thereafter execute kubectl --username=admin --password=FROMABOVE create clusterrolebinding ...
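Spelled out, that would look something like this (the password value comes from the describe output above, and the bound user is a placeholder):
kubectl --username=admin --password=<password-from-describe> create clusterrolebinding cluster-admin-binding --clusterrole=cluster-admin --user=<user-name>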

Google cloud dataproc failing to create new cluster with initialization scripts

I am using the below command to create a Dataproc cluster:
gcloud dataproc clusters create informetis-dev \
  --initialization-actions "gs://dataproc-initialization-actions/jupyter/jupyter.sh,gs://dataproc-initialization-actions/cloud-sql-proxy/cloud-sql-proxy.sh,gs://dataproc-initialization-actions/hue/hue.sh,gs://dataproc-initialization-actions/ipython-notebook/ipython.sh,gs://dataproc-initialization-actions/tez/tez.sh,gs://dataproc-initialization-actions/oozie/oozie.sh,gs://dataproc-initialization-actions/zeppelin/zeppelin.sh,gs://dataproc-initialization-actions/user-environment/user-environment.sh,gs://dataproc-initialization-actions/list-consistency-cache/shared-list-consistency-cache.sh,gs://dataproc-initialization-actions/kafka/kafka.sh,gs://dataproc-initialization-actions/ganglia/ganglia.sh,gs://dataproc-initialization-actions/flink/flink.sh" \
  --image-version 1.1 --master-boot-disk-size 100GB --master-machine-type n1-standard-1 \
  --metadata "hive-metastore-instance=g-test-1022:asia-east1:db_instance" \
  --num-preemptible-workers 2 --num-workers 2 --preemptible-worker-boot-disk-size 1TB \
  --properties hive:hive.metastore.warehouse.dir=gs://informetis-dev/hive-warehouse \
  --worker-machine-type n1-standard-2 --zone asia-east1-b --bucket info-dev
But Dataproc failed to create the cluster, with the following errors in the failure file:
cat
+ mysql -u hive -phive-password -e '' ERROR 2003 (HY000): Can't connect to MySQL server on 'localhost' (111)
+ mysql -e 'CREATE USER '\''hive'\'' IDENTIFIED BY '\''hive-password'\'';' ERROR 2003 (HY000): Can't connect to MySQL
server on 'localhost' (111)
Does anyone have any idea what is behind this failure?
It looks like you're missing the --scopes sql-admin flag as described in the initialization action's documentation, which will prevent the CloudSQL proxy from being able to authorize its tunnel into your CloudSQL instance.
Additionally, aside from just the scopes, you need to make sure the default Compute Engine service account has the right project-level permissions in whichever project holds your CloudSQL instance. Normally the default service account is a project editor in the GCE project, so that should be sufficient when combined with the sql-admin scopes to access a CloudSQL instance in the same project, but if you're accessing a CloudSQL instance in a separate project, you'll also have to add that service account as a project editor in the project which owns the CloudSQL instance.
You can find the email address of your default compute service account under the IAM page for the project deploying Dataproc clusters, with the name "Compute Engine default service account"; it should look something like <number>@project.gserviceaccount.com.
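To add the missing flag, a trimmed sketch of the create command showing just the cloud-sql-proxy action and the --scopes sql-admin flag (the remaining flags from the question stay as they were):
gcloud dataproc clusters create informetis-dev \
  --scopes sql-admin \
  --initialization-actions "gs://dataproc-initialization-actions/cloud-sql-proxy/cloud-sql-proxy.sh" \
  --zone asia-east1-b --bucket info-dev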
I am assuming that you already created the Cloud SQL instance with something like this, correct?
gcloud sql instances create g-test-1022 \
--tier db-n1-standard-1 \
--activation-policy=ALWAYS
If so, then it looks like the error is in how the argument for the metadata is formatted. You have this:
--metadata "hive-metastore-instance=g-test-1022:asia-east1:db_instance”
Unfortuinately, the zone looks to be incomplete (asia-east1 instead of asia-east1-b).
Additionally, when running that many initialization actions, you'll want to provide a fairly generous initialization action timeout so the cluster does not assume something has failed while your actions take a while to install. You can do that by specifying:
--initialization-action-timeout 30m
That will allow the cluster to give the initialization actions 30 minutes to bootstrap.
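Put together with the metadata fix above, the relevant flags would look roughly like this (zone completed to asia-east1-b as suggested in this answer; everything else as in the original command):
gcloud dataproc clusters create informetis-dev \
  --metadata "hive-metastore-instance=g-test-1022:asia-east1-b:db_instance" \
  --initialization-action-timeout 30m \
  --zone asia-east1-b --bucket info-dev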
Around the time you reported this, an issue was detected with the Cloud SQL proxy initialization action. It is most probable that that issue affected you.
Nowadays, it shouldn't be an issue.