Azure Log Analytics - Copy Performance Counters between Workspaces

I set up a Log Analytics workspace for one of my servers and I manually added several performance counters.
Is there a fast way to copy the performance counters to future workspaces, or do I have to manually add them each time?

It isn't possible to copy performance counters between Log Analytics workspaces; however, you can automate the process using PowerShell. Here is an SO thread where one of the community members has provided a sample solution for adding performance counters using PowerShell.
Another article for your reference: Create workspace and configure data sources.
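If you prefer to script it outside PowerShell, below is a minimal sketch of the same idea in Python, assuming the azure-mgmt-loganalytics management SDK and the WindowsPerformanceCounter data source kind. The subscription, resource group, workspace, and counter values are placeholders; treat the exact SDK call shapes as an assumption and check them against the current SDK docs.

```python
# pip install azure-identity azure-mgmt-loganalytics
from azure.identity import DefaultAzureCredential
from azure.mgmt.loganalytics import LogAnalyticsManagementClient

SUBSCRIPTION_ID = "<subscription-id>"   # placeholder
RESOURCE_GROUP = "<resource-group>"     # placeholder
WORKSPACE = "<workspace-name>"          # placeholder

# Counters to replicate into each new workspace (illustrative values).
counters = [
    ("Processor", "*", "% Processor Time", 60),
    ("Memory", "*", "Available MBytes", 60),
    ("LogicalDisk", "*", "Disk Reads/sec", 60),
]

client = LogAnalyticsManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

for i, (object_name, instance, counter, interval) in enumerate(counters):
    client.data_sources.create_or_update(
        RESOURCE_GROUP,
        WORKSPACE,
        f"perf-counter-{i}",  # data source names must be unique within the workspace
        {
            "kind": "WindowsPerformanceCounter",
            "properties": {
                "objectName": object_name,
                "instanceName": instance,
                "counterName": counter,
                "intervalSeconds": interval,
            },
        },
    )
    print(f"Added {object_name}({instance}) - {counter}")
```

Running the same script against each new workspace (or looping over a list of workspaces) gives you the "copy" behaviour the question asks about.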

Related

Index creation time is high on Azure Managed Instance

I am working with Azure Managed Instances for hosting a data warehouse. For large table loads, the indexes are dropped and rebuilt instead of inserting with the indexes in place. The indexes are re-created using a stored procedure that builds them from a list kept in an admin table. When moving from our on-prem solution to the managed instance, we have seen a considerable decrease in performance when building the indexes. The process takes roughly twice as long when running in Azure as when running on-prem.
The specs for the server are higher in the Azure Managed Instance: more cores and more memory. We have looked at IO time and tried increasing file sizes to increase IO, but it has had minimal impact.
Why would it take longer to build indexes on the same data, using the same code, in an Azure Managed Instance than it does on an on-prem SQL Server?
Is there a setting or configuration in Azure that could be changed to improve performance?
Could you please check the transaction log file for the database? Monitor log space use with sys.dm_db_log_space_usage. This DMV returns information about the amount of log space currently used and indicates when the transaction log needs truncation. Please see the reference link: sys.dm_db_log_space_usage (Transact-SQL) - SQL Server | Microsoft Docs.
As creating an index can easily reach the throughput limit for either the data or the log files, you might need to increase individual file sizes. See Resource limits - Azure SQL Managed Instance | Microsoft Docs.
You can also use this script, managed-instance/MI-GP-storage-perf.sql at master · dimitri-furman/managed-instance · GitHub, to determine the IOPS/throughput seen against each database file.
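To make the monitoring concrete, here is a rough Python sketch that checks the log-space DMV mentioned above and, in the same spirit as the per-file IOPS script, reads sys.dm_io_virtual_file_stats for per-file IO and stall numbers (it is not that script itself). It assumes pyodbc and an installed ODBC driver; the connection string is a placeholder. Sample the file stats before and after the index build and diff the numbers to see where the time goes.

```python
# pip install pyodbc  (assumes "ODBC Driver 18 for SQL Server" is installed)
import pyodbc

CONN_STR = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<your-managed-instance>.database.windows.net;"  # placeholder
    "DATABASE=<your-database>;"                             # placeholder
    "UID=<user>;PWD=<password>;Encrypt=yes;"
)

with pyodbc.connect(CONN_STR) as conn:
    cur = conn.cursor()

    # How full is the transaction log right now?
    cur.execute("""
        SELECT total_log_size_in_bytes / 1048576.0 AS log_size_mb,
               used_log_space_in_bytes  / 1048576.0 AS used_mb,
               used_log_space_in_percent
        FROM sys.dm_db_log_space_usage;
    """)
    print("log space:", cur.fetchone())

    # Cumulative IO and IO stall per database file in the current database.
    cur.execute("""
        SELECT mf.name AS file_name,
               mf.type_desc,
               vfs.num_of_reads, vfs.num_of_writes,
               vfs.io_stall_read_ms, vfs.io_stall_write_ms
        FROM sys.dm_io_virtual_file_stats(DB_ID(), NULL) AS vfs
        JOIN sys.database_files AS mf
          ON vfs.file_id = mf.file_id
        ORDER BY vfs.io_stall_write_ms DESC;
    """)
    for row in cur.fetchall():
        print(row)
```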

How to clone BigQuery datasets

We are evaluating BigQuery and Snowflake for our new cloud warehouse. Does BigQuery have a built-in cloning feature? This would enable our developers to create multiple development environments quickly, and we could also restore to a point in time. Snowflake has zero-copy cloning to minimize the storage footprint. For managing DEV/QA environments in BigQuery, do we need to manually copy the datasets from prod? Please share some insights.
You can use a pre-GA feature, the BigQuery Data Transfer Service, to create copies of datasets; you can also schedule and configure the jobs to run periodically so that the target dataset stays in sync with the source dataset. Restoring to a point in time is available via FOR SYSTEM_TIME AS OF in the FROM clause.
I don't think there is an exact equivalent of Snowflake's clone in BigQuery. What would this mean?
- You will be charged for the additional storage, and for data transfer if it's cross-region (pricing equivalent to Compute Engine network egress between regions).
- Cloning is not instantaneous; for large tables (> 1 TB) you might still have to wait a while before you see that a new copy has been created.
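To illustrate the two features mentioned above, here is a short Python sketch, assuming the google-cloud-bigquery and google-cloud-bigquery-datatransfer client libraries. The project, dataset, and table names are placeholders, and the exact transfer-config fields are worth double-checking against the current docs.

```python
# pip install google-cloud-bigquery google-cloud-bigquery-datatransfer
from google.cloud import bigquery, bigquery_datatransfer

# 1) Scheduled dataset copy via the BigQuery Data Transfer Service, to keep a
#    DEV dataset in sync with prod (project/dataset names are placeholders).
transfer_client = bigquery_datatransfer.DataTransferServiceClient()
transfer_config = bigquery_datatransfer.TransferConfig(
    destination_dataset_id="dev_dataset",
    display_name="prod-to-dev dataset copy",
    data_source_id="cross_region_copy",
    params={
        "source_project_id": "my-prod-project",
        "source_dataset_id": "prod_dataset",
    },
    schedule="every 24 hours",
)
transfer_config = transfer_client.create_transfer_config(
    parent=transfer_client.common_project_path("my-dev-project"),
    transfer_config=transfer_config,
)
print("Created transfer config:", transfer_config.name)

# 2) Point-in-time restore of a single table using time travel
#    (FOR SYSTEM_TIME AS OF in the FROM clause).
bq = bigquery.Client()
bq.query(
    """
    CREATE OR REPLACE TABLE `my-dev-project.dev_dataset.orders` AS
    SELECT *
    FROM `my-prod-project.prod_dataset.orders`
      FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
    """
).result()
```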

How to handle index files in a distributed Lucene cluster?

We are using Lucene in our application, and the index files are saved on the disk of the same server where the application runs.
The index files are almost 2 GB at the moment, and they may be updated from time to time; for example, when new data is inserted into the database, we may have to rebuild that part of the index and add it.
So far so good, since there is only one application server. Now we have to add another to make a cluster, so I wonder how to handle the index files.
BTW, our application should be platform independent, since our clients use different operating systems like Linux, and some of them even use cloud platforms with different storage like Amazon EFS or Azure Storage.
It seems I have two options:
1. Every server holds a copy of the index files, and they are kept synchronized with each other.
But the synchronization mechanism would depend on the OS, which we try to avoid. And I am not sure whether it would cause conflicts if two servers update the index files with different documents at the same time.
2. Make the index files shared.
Like option 1, the file-sharing mechanism is platform dependent. Maybe saving them to the database is an alternative, but what about performance? I have thought about using memcached to store them, but I have not found any examples.
How do you handle this kind of problem?
Possibly you should look into the Compass project. Compass allows storing a Lucene index in a database and in distributed in-memory data grids like GigaSpaces, Coherence, and Terracotta. Unfortunately this project is outdated and its last version was released in 2009, but you can try to adapt it for your purpose.
Another option is to look at HdfsDirectory, which supports storing an index in the HDFS file system. I see only 5 classes in the package org.apache.solr.store.hdfs, so it should be relatively easy to adapt them to store an index in in-memory caches like memcached or Redis.
Also, I found a project on GitHub for RedisDirectory, but it is at an initial stage and its last commit was in 2012. I can recommend it only for reference.
Hope this helps you find the right solution.

Updating Redis Click-To-Deploy Configuration on Compute Engine

I've deployed a single micro-instance redis on compute engine using the (very convenient) click-to-deploy feature.
I would now like to update this configuration to have a couple of instances, so that I can benchmark how this increases performance.
Is it possible to modify the config while it's running?
The other option would be to add a whole new Redis deployment, bleed traffic onto that over time and eventually shut down the old one. Not only does this sound like a pain in the butt, but I also can't see any way in the web UI to click-to-deploy multiple clusters.
I've got my learner's license with all this, so I would also appreciate any general 'good-to-knows'.
I'm on the Google Cloud team working on this feature and wanted to chime in. Sorry no one replied to this for so long.
We are working on some of the features you describe that would surely make the service more useful and powerful. Stay tuned on that.
I admit that there really is not a good solution for modifying an existing deployment to date, unless you launch a new cluster and migrate your data over / redirect reads and writes to the new cluster. This is a limitation we are working to fix.
As a workaround for creating two deployments using Click to Deploy with Redis, you could create a separate project.
Also, if you wanted to migrate to your own template using the Deployment Manager API https://cloud.google.com/deployment-manager/overview, keep in mind Deployment Manager does not have this limitation, and you can create multiple deployments from the same template in the same project.
Chris

Spark - Automated Deployment & Performance Testing

We are developing an application which uses Spark & Hive to do static and ad-hoc reporting. These static reports take a number of parameters and then run over a data set. We would like to make it easier to test the performance of these reports on a cluster.
Assume we have a test cluster running with a sufficient sample data set which developers can share. To speed up development time, what is the best way to deploy a Spark application to a Spark cluster (in standalone mode) from an IDE?
I'm thinking we would create an SBT task which would run the spark-submit script. Is there a better way?
Eventually this will feed into some automated performance testing which we plan to run as a twice-daily Jenkins job. If it's an SBT deploy task, it is easy to call from Jenkins. Is there a better way to do this?
I've found a project on GitHub; maybe you can get some inspiration from it.
Maybe just add a for loop for submitting jobs and increase the loop count to find the performance limit; not sure if I'm right or not.
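Whether it is wrapped in an SBT task or called from Jenkins, the underlying call is still spark-submit, so one way to follow the loop idea above is a small driver script. Here is a rough Python sketch; the master URL, jar path, main class, and report arguments are placeholders for your environment.

```python
# Submit the same Spark job N times against a standalone master and record
# wall-clock time per run. All paths and names below are placeholders.
import subprocess
import time

SPARK_SUBMIT = "spark-submit"
MASTER = "spark://test-cluster:7077"                 # standalone master (placeholder)
APP_JAR = "target/scala-2.12/reports-assembly.jar"   # placeholder
MAIN_CLASS = "com.example.reports.StaticReport"      # placeholder
REPORT_ARGS = ["2015-01-01", "2015-01-31"]           # report parameters (placeholder)

RUNS = 5
timings = []

for i in range(RUNS):
    start = time.time()
    result = subprocess.run(
        [SPARK_SUBMIT,
         "--master", MASTER,
         "--deploy-mode", "client",
         "--class", MAIN_CLASS,
         APP_JAR, *REPORT_ARGS],
        check=False,
    )
    elapsed = time.time() - start
    timings.append(elapsed)
    print(f"run {i + 1}: exit={result.returncode}, {elapsed:.1f}s")

print(f"average over {RUNS} runs: {sum(timings) / RUNS:.1f}s")
```

A Jenkins job (or an SBT task that shells out to the same command) can run this twice a day and archive the timings.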