Recommendations for multiple migration runs? - azure-devops-migration-tools

Could anyone provide any best practices about multiple migration runs? Moving from TFS 2017.3.1 to Azure DevOps Service. Dealing with a fair number of work items (32k). Of course, TSTU throttling is making the run take a long time, so I was thinking of pushing what I could up front, then a second pass to pick up the new work items since the first big push. So...enabling UpdateSourceReflectedId would set the ReflectedWorkItemId on the source items that have already been migrated. But what happens if someone changes a work item that has already been pushed? Would the history delta get picked up? How is that typically resolved...I was thinking maybe a Querybit like: ReflectedWorkItemId <> '' and ChangedDate > (last run time), but is that necessary? Those already exist on target...would ReplayRevisions pick up only the missing changes? TIA...

I usually do the following for large runs:
Open work items edited in the last 90 days
Closed work items edited in the last 90 days
Then widen the window out in chunks (e.g. 180 days, then a year, and so on) - see the illustrative query fragments below
The important thing to note is that links are created only when both ends of the link exist.
After a long run you can then re-run an "edited in the last month" pass to bring any changes across.
Changes to avoid in the Source while the migration is running:
changing the work item type
moving work items between team projects
We handle these, but loosely.
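Purely for illustration, the date windows above might translate into WIQL fragments along these lines (to be combined with whatever else is in your QueryBit; the field reference names are the standard ones, and the state values depend on your process template, so treat them as assumptions):

[System.State] <> 'Closed' AND [System.ChangedDate] >= @Today - 90
[System.State] = 'Closed' AND [System.ChangedDate] >= @Today - 90
[System.ChangedDate] < @Today - 90 AND [System.ChangedDate] >= @Today - 180

A final catch-up pass along the lines of [System.ChangedDate] >= @Today - 30 then picks up anything edited during the long runs.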

Related

How to force a cache refresh in MS Access

I am working on migrating a MS Access Database over to a newer SQL platform.
But, with all of the users who are currently using it, we're migrating slowly/carefully.
The first step is that we are re-writing the VBA code into C#, which is then deployed in a .dll along with the database.
Now, the VBA code calls into the C# to do the business logic, then the VBA continues to do the displays/UI, while Access still hosts the database.
The problem comes in that I have a report that is being run after the business logic from the C# in one place, and apparently MS Access has a cache, which clears every 5 seconds. So, the transaction that occurs in the C# code writes to the database, but the VBA code is still using the cache. This is causing errors, as the records added to the database (which the VBA report is trying to report on) don't exist in the cache yet...
I'm guessing that the C# .dll must be getting treated as a "second connection" to the MS Access database, which is what typically seems to cause this error according to my searches (Access thinks that one process is writing and the other is reading).
Since the cache is cleared out every 5 seconds, we can just put the process to sleep, and wake it up after 5 seconds, and then run the report, but that's pretty terrible for an end user.
And, making things difficult, the cache seems like it only gets used in the deployed version (so, when running from source / in debug mode, the error never happens).
Doing some searches, there seems to be plenty of people who have said "just refresh the cache." But, the question is: within VBA, how do you refresh the cache?
Any advice would be welcome.
Thanks
I've been fighting the same issue for years as I write a lot of tools around an old Powerbuilder application that has an Access MDB back end.
The cache does exist and it is VERY real. When data is inserted on a different connection than it is queried on, the cache can be directly observed and measured. It was also documented by Microsoft before they blackholed a bunch of their old articles...
Microsoft Jet has a read-cache that is updated every PageTimeout milliseconds (default is 5000ms = 5 seconds). It also has a lazy-write mechanism that operates on a separate thread to main processing and thus writes changes to disk asynchronously. These two mechanisms help boost performance, but in certain situations that require high concurrency, they may create problems.
I've found a couple of workarounds that are not ideal, but make do until I find something better or can re-write the app with a better back-end database.
The seemingly best answer I've found (that may actually work for you since you say you need VBA) is to use JRO.RefreshCache. I've been trying to figure out how to implement this using C# or VB.net without any luck. Below is a link to a code example where you execute the RefreshCache method on your 2nd connection that needs to pull the data. I have not tested this myself.
https://documentation.help/MSJRO/jrmthrefreshcachex.htm
A workaround I've found that will deliver the query results within 500ms to 1000ms of insert time (instead of anywhere between 500 and 5000 ms - or more):
Use System.Data.ODBC instead of OleDB, with connection string: Driver={Microsoft Access Driver (*.mdb, *.accdb)};Dbq=;
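If it helps to see that workaround in code, here is a minimal C# sketch of the ODBC approach; the database path, table name and query are placeholders I've assumed for illustration, not taken from the posts above.

using System;
using System.Data.Odbc;

class FreshReadDemo
{
    static void Main()
    {
        // Placeholder path and table - substitute your own.
        var connStr = @"Driver={Microsoft Access Driver (*.mdb, *.accdb)};Dbq=C:\Data\MyDatabase.accdb;";

        using (var conn = new OdbcConnection(connStr))
        using (var cmd = new OdbcCommand("SELECT COUNT(*) FROM Orders", conn))
        {
            conn.Open();
            // Going through the Access ODBC driver instead of the OLE DB provider is what,
            // per the timings described above, narrows the staleness window to roughly 0.5-1 s.
            var count = Convert.ToInt32(cmd.ExecuteScalar());
            Console.WriteLine($"Orders visible to this connection: {count}");
        }
    }
}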
If someone knows how to use the JRO.RefreshCache method with OLEDB and C# or VB.net, I'd be forever grateful. I believe the issue is it's looking for an ADO connection to be passed in, not an OLEDB connection.
I am not aware of ANYTHING suggesting that some 5-second cache exists. Where did this idea come from?
Furthermore, if you have 5 users, you are not going to be able to update their caches, are you?
In other words, refreshing some cache for one user is still not going to solve anything in a multi-user setup, is it?
The simple matter is that if you load up a form with 100 records, and other users are ALSO working on those 100 rows, then no user will see the others' changes until you tell Access to re-load the form.
You can do this with a Me.Refresh in the form, and it will then show changes made by other users (or even by your C# code!).
However, that is not really the solution here.
How does nearly EVERY system deal with this issue?
Answer:
You don't; you "design" the software to take the user's workflow into account.
So, instead of loading up a form with 100 rows of data (which you should not do, unless a SUPER DUPER reason exists for doing so),
you provide a UI in which the user FIRST searches for whatever it is they want to work on.
In other words, say you just booked a customer on a tour. Now they call the office back and want to change some details of that tour, but a different member of the tour staff picks up the phone. So now a second user opens the tour?
You solve that issue by NOT loading all the tours into that form in the first place.
You provide a search screen, so they can search for the customer, find the customer, maybe type in an invoice number or whatever.
You display the results in a pick list, and then launch the form on the ONE record (and perhaps detail records from child tables).
So there is no more concept of a cache in Access than there is in C#.
However, if you load up a DataTable in C# and then display that data?
Well, what about the other users on that system? They will not see changes to that data ANY MORE than the current Access form does.
So, if you want to update some data in C#? Then fine, but you need/want to do two things:
First, before you call any C# code that may update the current form record, you need to FORCE a save of that current record BEFORE you call any code, be it VBA code or C# code, that is going to update the record the user is working on.
You can save the current record in Access in MANY different ways, but the typical approach is:
' single record save - current record
If Me.Dirty Then Me.Dirty = False
' VBA or C# code goes here.
' optional: refresh the current form to reflect changes
Me.Refresh
So, in most cases, it is the "design" of your software that will solve this issue.
For example, in the tour example, or in fact ANY system, the user can't work, can't update, and can't do their job UNLESS they first find/search and have a means to bring up that form + record data in the first place.
So, ANY typical good design will:
Ask the user for that name, invoice number or whatever.
Display the results of the search, and THEN allow the user to pick the record/data to work on. When they are done, they close that form and are RIGHT BACK to the search form to do battle with the next customer or task or phone call or whatever.
So, a search form might work like this:
The user types in, say, "smi", and a pick list of matching customers is displayed.
The user can further type in, say, part of the first name to narrow the list down.
Or maybe they type in an invoice number, customer number, booking number or whatever.
You display the results, and then they can select the row or "thing" to work on.
They click on the row, and jump to the ONE record.
The user does whatever they have to do with that customer. When done, they close the ONE thing, the ONE main record.
This not only saves the data (so others in the office can now use that booking data), but it also puts them RIGHT back at the search screen, ready to do battle with the next customer.
So not only does this give a VERY bandwidth-friendly design (we only pull the one main record into that form), but it is also better for workflow.
The Access form's cache thus becomes a non-issue, since we are only dealing with the one record.
And as I pointed out, if the system is multi-user, then you are NOT going to be able to update and deal with multiple users' cached data anyway, are you?
Think of ANY system you have EVER used from a software point of view.
When you use Google, does it download the WHOLE internet and then have you use Ctrl-F to search megs and megs of data in the browser?
Nope!
You search first, get a list of results, and THEN pick one!
And while that list is displayed, maybe others on the internet are updating and adding new data - but if all of that were cached in your browser, it would not work!
The same goes for a desktop accounting system. You don't load up all accounts and THEN have the user Ctrl-F through all the data. You search for the customer or invoice number and PICK ONE to work on.
It does not make sense to load up a form with 1000 customers and then Ctrl-F to find one customer. The same goes for a bank machine: it does not download ALL customers and THEN let you search; it asks you FIRST for what you need. So, be it browser based, desktop based, or JUST ABOUT ANY software you use?
You pretty much eliminate the cache issue, since you are not pre-loading boatloads of data, but asking and letting the user search for the data they need.
So, in regards to the Access form data and cache?
If you are on a form and call VBA code, or C# code, or whatever?
If that code updates the current form, you have NO MORE OR LESS of an issue when calling VBA code than when calling C# code! If that code updates the current form while the record is dirty (has pending edits), then you get that message about the current form's record having been updated by another user!
So, your cache issue does NOT exist any MORE or LESS as an issue in typical Access software.
As a general rule, if you are on a form with pending edits and, say, want to pop up another form to edit related data?
You have to ensure that the pending edits are SAVED before you launch a form that can edit the same data, or run code that can/may edit that data.
As a result, ZERO cache issues should exist, and they exist no more and no less when calling SQL or VBA update code from a form than when calling some C# code from that form.
So, write out the pending update for that form.
Then run your VBA, SQL, or C# code.
And then do a Me.Refresh to display any changes made by those external routines.
There is no documentation, nor ANY article I can find, that suggests some kind of 5-second cache or update - it is an urban myth. And your software challenge here, in regards to using C#, VBA, or even SQL Server stored procedures?
They are all the same issue, and I dare say that Access is often used as a front end to SQL Server, and ALL OF the SAME issues exist when using SQL Server with MS Access.

Unable to migrate Work Item with greater than 25000 revisions

I am using the Azure DevOps Migration Tools and it hits an exception when I try to migrate a work item with 35,977 revisions.
Here is my configuration:
Here is the SOAP error I am getting:
Wow, that's a lot of revisions. I'm not sure I have ever seen a work item with that many revisions. I assume that you have some sort of tool that auto updates the work items, which is what is causing this.
Since it takes about 200 ms per revision to save work items, I would expect it to take roughly 7,200 seconds (about 2 hours) to migrate just one work item with that many!
Since the Azure DevOps Migration Tools use the old SOAP API (Object Model), this one is out of our hands. There may be some way to page the revisions, but I am unaware of one. If you do find a way to load only partial work items I'd be interested... although, thinking about it, I believe there is a wi.LoadPartial() method... never used that.
To move forward, you could add AND [System.Rev] < 25000 to your query so that the work items with that many revisions are not loaded.
This would allow you to continue with the Closed items that are supported (see the illustrative query below).
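Purely for illustration (the poster's actual source query isn't shown above), the resulting WIQL might end up looking something like:

SELECT [System.Id] FROM WorkItems WHERE [System.TeamProject] = @project AND [System.Rev] < 25000 ORDER BY [System.ChangedDate] DESC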
We have added a way to migrate only some of the revisions when there are a lot of them, but we never envisaged in our wildest defensive coding strategies that the count would go above 1,000 revisions. I can imagine memory and other issues cropping up.
Added to: https://github.com/nkdAgility/azure-devops-migration-tools/issues/1096

Work Item Query Policy to check workitems match on merge

With our TFS 2015 source control we require developers to check-in changes against work items.
However, we've had a couple of instances where a developer has checked in against one work item within our development branch, but then when merging to our QA branch they've checked in the merged changes to a different work item. An example of this is where a bug has been created underneath a PBI, the changes in dev have been checked in against a task under the bug, but then merged to QA against the PBI itself. This causes us issues with traceability.
I've seen that it's possible to add a check-in policy of "Work Item Query Policy". I'm just wondering if there is a way to write a query that will determine if the work item of a check-in after a merge matches the work item of the source changesets? I'm not necessarily after the exact query (though it would be lovely if someone could provide one :) ), really I'm just wondering whether it's possible or not to have a query to do this - i.e. is the information available to queries in TFS?
You can't do this with the existing policies, you'd need to build a custom policy.
So, technically this is possible. You can access the VersionControlServer object through the PendingChanges object:
this.PendingCheckin.PendingChanges.Workspace.VersionControlServer
You can use that to query the history of the branch in question and grab the work items associated to the check-ins in that branch.
You can then compare those against the work items associated with the current check-in:
this.PendingCheckin.WorkItems
You could probably even provide the option to auto-correct by adding the correct work items to the check-in upon validation.
One of my policies provides an example of using the VersionControlServer from a policy.
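To make that a little more concrete, here is a rough, untested C# sketch of what the Evaluate() override of such a policy might look like. The branch path "$/Project/Dev" and the 30-changeset window are assumptions for illustration, and member names can differ slightly between TFS object-model versions, so treat this as a starting point rather than a working policy.

// Inside a class deriving from Microsoft.TeamFoundation.VersionControl.Client.PolicyBase;
// the other required overrides (Description, Type, TypeDescription, Edit) are omitted.
// Requires references to Microsoft.TeamFoundation.VersionControl.Client,
// plus using System.Linq; and using System.Collections.Generic;.
public override PolicyFailure[] Evaluate()
{
    var vcs = this.PendingCheckin.PendingChanges.Workspace.VersionControlServer;

    // Work item IDs the developer has associated with this check-in.
    var checkinIds = new HashSet<int>(
        this.PendingCheckin.WorkItems.CheckedWorkItems.Select(w => w.WorkItem.Id));

    // Work item IDs associated with recent changesets in the source (dev) branch.
    var sourceIds = new HashSet<int>();
    var history = vcs.QueryHistory(
        "$/Project/Dev",                 // assumed source branch path
        VersionSpec.Latest, 0, RecursionType.Full,
        null, null, VersionSpec.Latest,
        30,                              // arbitrary: look at the last 30 changesets
        false, false);
    foreach (Changeset cs in history.Cast<Changeset>())
    {
        foreach (var wi in cs.WorkItems)   // associated work items (lazily loaded)
            sourceIds.Add(wi.Id);
    }

    if (!checkinIds.Overlaps(sourceIds))
    {
        return new[]
        {
            new PolicyFailure(
                "The work item(s) on this check-in do not match any work item " +
                "associated with recent changesets in the source branch.", this)
        };
    }
    return new PolicyFailure[0];
}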

Some SSAS attribute hierarchies take a long time to resolve

Background
I have developed a SSAS cube that works well for most of my organization's purposes. The primary method of users interacting with this cube is via Excel Pivot Tables.
The Issue
Some of the Pivot Tables created by users have attributes for which their attribute hierarchies take a long time to resolve when the user first clicks on the drop-down box over the field name in the Pivot Table. For instance, the first time a user clicks the drop-down for a field called "Location - County", it takes ~45 seconds for the pop-up box with the list of ~40 counties to show up.
Side Note 1: If I had to guess, it actually seems like SSAS is resolving all field hierarchies in the PT at the same time as the first field that was clicked on, because right after this initial resolve the user can click on any of the fields in the PT and they resolve instantly. Said another way, the first field clicked on always takes ~45 seconds to resolve.
Side Note 2: The next time the user clicks on any of the field drop downs, it resolves almost instantly which I am assuming is because of caching.
The question
Why does it take SSAS so long to resolve some attribute hierarchy lists? It seems to me like this should always be instantaneous?! Doesn't SSAS build all attribute hierarchy lists ahead of time (i.e. during cube processing)?
Many thanks for any light you can shed on this issue for me.
Regards, Jon
1/20/15 Update: Adding Trace Files per Request: Zipped Trace Files. I included all EventClasses just to be sure, but if you need me to run again with only the EventClasses requested below I could.
"Trace of DAR Cubes Project - Test (from service restart).trc" - I restarted AS and immediately refreshed my PT, and recorded the traced events in this file.
"Trace of DAR Cubes Project - Test (after one refresh).trc" - After refreshing the Excel PT as described above, I closed Excel, reopened the same PT, and refreshed again. I expected a much faster refresh, but was surprised with almost the same ~35 second wait. If I keep Excel open between refreshes, it only takes ~2 seconds. This makes me wonder if Excel is caching the results somehow? Which would be weird b/c I thought all the logic and caching took place on the server side.

Is it possible to re-generate Lucene index in background?

Sometimes there is need to re-generate a lucene index, e.g. when something changes in the Compass mapping or in the way boosts are applied, or if something went corrupt for whatever reason.
In my case, generating the index takes about 5 to 6 hours, and clearing the index beforehand means the data is incomplete for that interval, i.e. a search during this time returns incomplete results.
Is there any standard way to have lucene generate the index in the background? E.g. write index to a temporary directory and (when indexing is finished without exceptions etc) replace the existing index with the new one?
Of course, one could implement this "manually", but does one have to? Sounds like a common use case to me.
Best regards + Thanks for your opinion,
Peter :)
I had a similar experience; there were certain parameters to the Analyzer which would get changed from time to time, and obviously when that happened the entire index needed to be rebuilt. (I won't go into the details; suffice it to say I had the same requirement!)
I did what you suggested in your question. There were three directories, "old", "current" and "new". Queries from the live site went against "current" always. The index recreation process was:
Recursive delete on the "old" and "new" directories
Create the new index into the "new" directory (in my case takes about 6 hrs)
Rename "current" to "old"; and "new" to "current"
Recursive delete the "old" directory
An analysis of what happens when the process crashes - if it crashes in the 1st step, the next time it will just carry on. If it crashes in the 2nd step then the "new" directory will get deleted next run. The 3rd step is very fast - renaming a directory is fast and atomic. Crashing in the 4th step doesn't matter, it'll just get cleaned up next run.
The careful observer will note that in step 3, the system could crash between renaming the current directory away and moving the new directory in. This is unlikely to happen as directory rename is so fast. The system has been in production for a few years and this has never happened (yet?).
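For what it's worth, here is a minimal sketch of that rotation in C# (the original was a Java/Compass setup, but the file-system logic is identical); BuildIndexInto is a placeholder for the actual multi-hour index build.

using System.IO;

static class IndexRotation
{
    // Rotation described above: queries always read from "current"; the new index
    // is built into "new" and only swapped in at the end with two fast renames.
    public static void Rebuild(string root)
    {
        string oldDir = Path.Combine(root, "old");
        string newDir = Path.Combine(root, "new");
        string currentDir = Path.Combine(root, "current");

        // 1. Recursively delete any leftovers from a previous (possibly crashed) run.
        if (Directory.Exists(oldDir)) Directory.Delete(oldDir, recursive: true);
        if (Directory.Exists(newDir)) Directory.Delete(newDir, recursive: true);

        // 2. Build the new index into "new" (hours; "current" keeps serving queries).
        Directory.CreateDirectory(newDir);
        BuildIndexInto(newDir);

        // 3. Swap: rename "current" -> "old", then "new" -> "current".
        Directory.Move(currentDir, oldDir);
        Directory.Move(newDir, currentDir);

        // 4. Clean up the previous index.
        Directory.Delete(oldDir, recursive: true);
    }

    // Placeholder for the real (long-running) indexing job.
    static void BuildIndexInto(string dir) { /* create the Lucene index in dir */ }
}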
I think the usual way to do this is to use Solr's replication functionality. In your case, though, the master and slave would be on the same machine, just pointed at different directories.
We have a similar problem. Our data is indexed in Lucene, but the original source is a DB and a content repo.
So if an index goes out of sync (or a data type changes, etc.), we simply iterate over all existing entries in the index and re-generate the data so that each document gets updated. It is not really a complex thing to do.
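As a rough sketch of that approach - shown here in C# with Lucene.NET rather than the Java/Compass stack, and assuming each document carries a unique "id" field; LoadFromSource is a placeholder for re-reading the DB/content repo:

using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Lucene.Net.Util;

static class IndexRefresher
{
    // Walk every live document in the existing index and overwrite it in place
    // with a freshly generated version from the primary data source.
    public static void RefreshAll(string indexPath)
    {
        using (var dir = FSDirectory.Open(indexPath))
        using (var writer = new IndexWriter(dir,
            new IndexWriterConfig(LuceneVersion.LUCENE_48, new StandardAnalyzer(LuceneVersion.LUCENE_48))))
        using (var reader = DirectoryReader.Open(dir))
        {
            var liveDocs = MultiFields.GetLiveDocs(reader);   // null when nothing has been deleted
            for (int i = 0; i < reader.MaxDoc; i++)
            {
                if (liveDocs != null && !liveDocs.Get(i)) continue;   // skip deleted docs

                string id = reader.Document(i).Get("id");             // assumed unique key field
                Document fresh = LoadFromSource(id);                  // re-generate from DB / content repo
                writer.UpdateDocument(new Term("id", id), fresh);     // delete-then-add by key
            }
            writer.Commit();
        }
    }

    // Placeholder: rebuild one document from the original data source.
    static Document LoadFromSource(string id)
    {
        throw new System.NotImplementedException();
    }
}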