TFS 2015.4 Unable to start or detach a collection - tfs-2015

I'm having an issue with a particular team project collection on my TFS2015.4. This collection caused issues when I wanted to upgrade TFS sometime ago as well. I was able to detach it in TFS2013.3 and then upgraded. Now I want to upgrade to TFS2017 and I don't know how to resolve the issue with this collection.
TF400868: Job definition not found for JobId d891ac97-ddf1-42df-8242-3cd4bd607790
Here is the current status:
Won't detach
Won't start
Status -> ApplyPatch -> won't execute
Stays Offline
One project inside and stays at 'Deleting' state
If I try to start, I get this error:
TF400783: The host 'MyDAS' cannot be started. The host is in the process of being serviced. The servicing may have failed and needs to be restarted and completed before the host can be started.
I did a pre-production upgrade to TFS2017 and there was a validation error with this collection's state that prevented me from finishing the upgrade.
The detailed log for ApplyPatch Rerun has just one failure point:
[12:29:58.457] +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[12:29:58.457] Executing step: Populate commit changes
[12:29:58.457] Executing step: 'Populate commit changes' Git.M83PopulateCommitChanges (1017 of 1201)
[12:29:58.477] [Error] TF400868: Job definition not found for JobId d891ac97-ddf1-42df-8242-3cd4bd607790.
[12:29:58.480] Microsoft.TeamFoundation.Framework.Server.JobDefinitionNotFoundException: TF400868: Job definition not found for JobId d891ac97-ddf1-42df-8242-3cd4bd607790.
[12:29:58.480] at Microsoft.TeamFoundation.Framework.Server.TeamFoundationJobService.ResolveJobPriorityClasses(IVssRequestContext requestContext, IEnumerable`1 jobReferences, ITFLogger logger)
[12:29:58.480] at Microsoft.TeamFoundation.Framework.Server.TeamFoundationJobService.QueueJobsRaw(IVssRequestContext requestContext, IEnumerable`1 jobReferences, JobPriorityLevel priorityLevel, Int32 maxDelaySeconds, ITFLogger logger, Boolean queueAsDormant)
[12:29:58.480] at Microsoft.TeamFoundation.Server.Deploy.TFCollection.GitStepPerformer.M83PopulateCommitChanges(IVssRequestContext requestContext, ServicingContext servicingContext)
[12:29:58.480] at Microsoft.TeamFoundation.Framework.Server.TeamFoundationStepPerformerBase.PerformHostStep(String servicingOperation, ServicingOperationTarget target, IServicingStep servicingStep, String stepData, ServicingContext servicingContext)
[12:29:58.480] at Microsoft.TeamFoundation.Framework.Server.TeamFoundationStepPerformerBase.PerformStep(String servicingOperation, ServicingOperationTarget target, String stepType, String stepData, ServicingContext servicingContext)
[12:29:58.480] at Microsoft.TeamFoundation.Framework.Server.ServicingStepDriver.PerformServicingStep(ServicingStep step, ServicingContext servicingContext, ServicingStepGroup group, ServicingOperation servicingOperation, Int32 stepNumber, Int32 totalSteps)
[12:29:58.480] Step failed: Populate commit changes. Execution time: 23 milliseconds.
[12:29:58.480] [StepDuration] 0.0236576
[12:29:58.480] [GroupDuration] 0.2517195
[12:29:58.480] [OperationDuration] 0.2517302
[12:29:58.587] Clearing dictionary, removing all items.
======================================================================================================
Step execution times in descending order
======================================================================================================
Updates all rows in tbl_GitCommit and sets the Status to ... (GitToDev14M83Collection, ToDev14M83Collection) - 227 milliseconds
Populate commit changes (GitToDev14M83Collection, ToDev14M83Collection) - 23 milliseconds
Write service level to stamp (StartInstallUpdates, StartInstallUpdates) - 20 milliseconds
Configure framework servicing tokens (VsspToDev14M71Collection, VsspToDev14M71Collection) - 20 milliseconds
Setup integration environment (TestManagementToDev12M65FinalConfiguration, ToDev12M65FinalConfiguration) - 3 milliseconds
Setup Git environment (GitToDev14M74Collection, ToDev14M74Collection) - 1 millisecond
Setup Git environment (GitToDev14M83Collection, ToDev14M83Collection) - 1 millisecond
Set the collection partition id tokens in servicing context (GitToDev14M83Collection, ToDev14M83Collection) - 1 millisecond
======================================================================================================
Execution times by group in descending order
======================================================================================================
GitToDev14M83Collection (ToDev14M83Collection) - 250 milliseconds
StartInstallUpdates (StartInstallUpdates) - 20 milliseconds
VsspToDev14M71Collection (VsspToDev14M71Collection) - 20 milliseconds
TestManagementToDev12M65FinalConfiguration (ToDev12M65FinalConfiguration) - 3 milliseconds
GitToDev14M74Collection (ToDev14M74Collection) - 1 millisecond
I was pondering the fact that this is a rouge job that needs to be deleted from the databse manually but I mmight be wrong. Any pointer will be greatfully +1-ed.

The main reason for TFS being unable to upgrade team project collections usually is that their data was already either corrupted, incomplete, or stuck between schema versions.
You will need to contact customer support for assistance troubleshooting these data issues and getting your data back to a workable state.
Since there is only one project inside. If you don't care the source control history and have the back up of project. A crude (not recommend) way should be deleting the special project collection, delete the database from SQL Server and create a new collection, restore the project inside.
Also take a look at this similar question with the same error as you: Starting a team project collection after detaching and attaching again

Ultimately, there was no way to recover the collection. So I used the admin command to permanently remove the collection. In my case, n one came back to me asking where the collection go so it's solved for me. I'm running the latest TFS 2017.2 and the rest of the dev are happy.

Related

How to make the SSIS package status to failure when propagate was set to false for a Sequence container

I have an SSIS package with for each loop > sequence container. The sequence container is trying to read file from For each loop and process its data. The requirement was to not fail the entire package when any exception happened in processing a file but to continue processing the next file until all the files were processed from the for each loop. For this, I have set the Propagate variable for the sequence container to False. I have also added email step on On Error event of Sequence container. The package is running as expected and able to process all files even when any exception happened with any file. But I would like the status of my SSIS package to be failed finally since one of the files got failed. How can I achieve that ?
Did you try this options?
(SSIS version in russian on the left side but it's sequence container)
View -> Properties window -> Then click on your sequence container and it will show you ther properties of sequence container.
If i were you first of all i would try property "FailPackageOnFailture" - it should cover your question if i get it right.
P.S. Also you can see the whole properties of your project when you click on a free place in your project
UPDATED (after comments and more clear understanding task):
The idea is - set this param Maximum ErrorCount for SQ as max as you want - in this case it wont stop the package because 1 of the files was failed in SQ and next file will process, but it should stop package after SQ will finish his work because you don't change MaximumErrorCount for package.
Important - a value of zero sets the error count threshold to infinity and package or task never get's Failure

Local Search phase needs to start from an initialized solution - But how?

I am writing a modified version of the Task Assignment example with my own domain model.
In my model each Task can have a NextTask and a PreviousTask and an Assignee. All 3 are configured as PlanningVariables:
...
/** PreviousTask is a calculated task that the Resource will complete before this one. */
#PlanningVariable(valueRangeProviderRefs = { "tasksRange" }, graphType = PlanningVariableGraphType.CHAINED)
public Task PreviousTask;
#InverseRelationShadowVariable(sourceVariableName = "PreviousTask")
public Task NextTask;
#AnchorShadowVariable(sourceVariableName = "PreviousTask")
public Resource Assignee;
I have been stuck on the Local Search Phase step for some time now as it appears to require an initialized state of my planning variables (Task.PreviousTask in this case).
Error log:
2020-07-16 15:00:15.341 INFO 4616 --- [pool-1-thread-1] o.o.core.impl.solver.DefaultSolver : Solving started: time spent (65), best score (-27init/[0]hard/[0/0/0/0]soft), environment mode (REPRODUCIBLE), random (JDK with seed 0).
2020-07-16 15:00:15.356 INFO 4616 --- [pool-1-thread-1] .c.i.c.DefaultConstructionHeuristicPhase : Construction Heuristic phase (0) ended: time spent (81), best score (-27init/[0]hard/[0/0/0/0]soft), score calculation speed (90/sec), step total (0).
2020-07-16 15:00:15.376 ERROR 4616 --- [pool-1-thread-1] o.o.c.impl.solver.DefaultSolverManager : Solving failed for problemId (5e433f57-8c75-4756-8a9c-4c4ca4a83d6d).
java.lang.IllegalStateException: Local Search phase (1) needs to start from an initialized solution, but the planning variable (Task.PreviousTask) is uninitialized for the entity (com.redhat.optaplannersbs.domain.Task#697ea710).
Maybe there is no Construction Heuristic configured before this phase to initialize the solution.
I've been pouring over the documentation and trying to figure out what I've missed or broken between it and the source example but I cannot work it out. I have no Construction Heuristic configured (Same as the example), and I could not see where in the example does it ever set the previousTaskOrEmployee variable before solving.
I could supply a random PreviousTask in the initial solution model, but surely at least 1 of the tasks would have no previous task?
May I ask if you have <localSearch/> in your solverConfig.xml? If so, the <constructionHeuristic/> has to be there as well.
If none of these phases (CH nor LS) is configured, both Construction Heuristic and Local Search is added with default parameters. But once the appears in the solverConfig.xml, it's considered an override of the defaults, and a user is supposed to take care of initializing the solution (most often by providing the Construction Heuristic configuration).

ServerXmlHttpRequest hanging sometimes when doing a POST

I have a job that periodically does some work involving ServerXmlHttpRquest to perform an HTTP POST. The job runs every 60 seconds.
And normally it runs without issue. But there's about a 1 in 50,000 chance (every two or three months) that it will hang:
IXMLHttpRequest http = new ServerXmlHttpRequest();
http.open("POST", deleteUrl, false, "", "");
http.send(stuffToDelete); <---hang
When it hangs, not even the Task Scheduler (with the option enabled to kill the job if it takes longer than 3 minutes to run) can end the task. I have to connect to the remote customer's network, get on the server, and use Task Manager to kill the process.
And then its good for another month or three.
Eventually i started using Task Manager to create a process dump,
so i could analyze where the hang is. After five crash dumps (over the last 11 months or so) i get a consistent picture:
ntdll.dll!_NtWaitForMultipleObjects#20()
KERNELBASE.dll!_WaitForMultipleObjectsEx#20()
user32.dll!MsgWaitForMultipleObjectsEx()
user32.dll!_MsgWaitForMultipleObjects#20()
urlmon.dll!CTransaction::CompleteOperation(int fNested) Line 2496
urlmon.dll!CTransaction::StartEx(IUri * pIUri, IInternetProtocolSink * pOInetProtSink, IInternetBindInfo * pOInetBindInfo, unsigned long grfOptions, unsigned long dwReserved) Line 4453 C++
urlmon.dll!CTransaction::Start(const wchar_t * pwzURL, IInternetProtocolSink * pOInetProtSink, IInternetBindInfo * pOInetBindInfo, unsigned long grfOptions, unsigned long dwReserved) Line 4515 C++
msxml3.dll!URLMONRequest::send()
msxml3.dll!XMLHttp::send()
Contoso.exe!FrobImporter.TFrobImporter.DeleteFrobs Line 971
Contoso.exe!FrobImporter.TFrobImporter.ImportCore Line 1583
Contoso.exe!FrobImporter.TFrobImporter.RunImport Line 1070
Contoso.exe!CommandLineProcessor.TCommandLineProcessor.HandleFrobImport Line 433
Contoso.exe!CommandLineProcessor.TCommandLineProcessor.CoreExecute Line 71
Contoso.exe!CommandLineProcessor.TCommandLineProcessor.Execute Line 84
Contoso.exe!Contoso.Contoso Line 167
kernel32.dll!#BaseThreadInitThunk#12()
ntdll.dll!__RtlUserThreadStart()
ntdll.dll!__RtlUserThreadStart#8()
So i do a ServerXmlHttpRequest.send, and it never returns. It will sit there for days (causing the system to miss financial transactions, until come Sunday night i get a call that it's broken).
It is of no help unless someone knows how to debug code, but the registers in the stalled thread at the time of the dump are:
EAX 00000030
EBX 00000000
ECX 00000000
EDX 00000000
ESI 002CAC08
EDI 00000001
EIP 732A08A7
ESP 0018F684
EBP 0018F6C8
EFL 00000000
Windows Server 2012 R2
Microsoft IIS/8.5
Default timeouts of ServerXmlHttpRequest
You can use serverXmlHttpRequest.setTimeouts(...) to configure the four classes of timeouts:
resolveTimeout: The value is applied to mapping host names (such as "www.microsoft.com") to IP addresses; the default value is infinite, meaning no timeout.
connectTimeout: A long integer. The value is applied to establishing a communication socket with the target server, with a default timeout value of 60 seconds.
sendTimeout: The value applies to sending an individual packet of request data (if any) on the communication socket to the target server. A large request sent to a server will normally be broken up into multiple packets; the send timeout applies to sending each packet individually. The default value is 30 seconds.
receiveTimeout: The value applies to receiving a packet of response data from the target server. Large responses will be broken up into multiple packets; the receive timeout applies to fetching each packet of data off the socket. The default value is 30 seconds.
The KB305053 (a server that decides to keep the connection open will cause serverXmlHttpRequest to wait for the connection to close) seems like it plausibly could be the issue. But the 30 second default timeout would have taken care of that.
Possible workaround - Add myself to a Job
The Windows Task Scheduler is unable to terminate the task; even though the option is enabled to do do.
I will look into using the Windows Job API to add my self process to a job, and use SetInformationJobObject to set a time limit on my process:
CreateJobObject
AssignProcessToJobObject
SetInformationJobObject
to limit my process to three minutes of execution time:
PerProcessUserTimeLimit
If LimitFlags specifies
JOB_OBJECT_LIMIT_PROCESS_TIME, this member is the per-process
user-mode execution time limit, in 100-nanosecond ticks. Otherwise,
this member is ignored.
The system periodically checks to determine
whether each process associated with the job has accumulated more
user-mode time than the set limit. If it has, the process is
terminated.
If the job is nested, the effective limit is the most
restrictive limit in the job chain.
Although since Task Scheduler uses Job objects to also limit a task's time, i'm not hopeful that the Job Object can limit a job either.
Edit: Job objects cannot limit a process by process time - only user time. And with a process idle waiting for an object, it will not accumulate any user time - certainly not three minutes worth.
Bonus Reading
How can a ServerXMLHTTP GET request hang? (GET, not POST)
KB305053: ServerXMLHTTP Stops Responding When You Send a POST Request (which says the timeout should expire; where mine does not)
MS Forums: oHttp.Send - Hangs (HEAD, not POST)
MS Forums: ASP to test SOAP WebService using MSXML2.ServerXMLHTTP Send hangs
CC to MS Support Forums
Consider switching to a newer, supported API.
msxml6.dll using MSXML2.ServerXMLHTTP.6.0
winhttpcom.dll using WinHttp.WinHttpRequest.5.1.
The msxml3.dll library is no longer supported and is only kept around for compatibility reasons. Plus, there were a number of security and stability improvements included with msxml4.dll (and newer) that you are missing out on.

I am getting 'Local Search phase started with an uninitialized Solution' when I run on a larger dataset

I am developing a solver using Optaplanner 6.1.0, similar to the Vehicle Routing Problem. When I run my solver on 700 installers and 200 bookings, it will successfully solve the planning problem. But, when I used against a larger dataset (700 installers and 1220 bookings), I get
Caused by: java.lang.IllegalStateException: Local Search phase started with an uninitialized Solution. First initialize the Solution. For example, run a Construction Heuristic phase first.
but right before the exception,
16:10:40,378 INFO [DefaultConstructionHeuristicPhase] [http-listener-1(4)] Construction Heuristic phase (0) ended: step total (194), time spent (30693), best score (-1hard/-688803soft).
I am using <constructionHeuristicType>FIRST_FIT_DECREASING</constructionHeuristicType>
in my config.
Am I using it wrong?
Maybe the value range for a planning variable is empty. Especially with value range provider from entity, this is more likely. Feel free to file a jira that the error message should improve in such a case.
Diagnostic todo: Comment out the local solver phase, run the solver (so it only does the construction heuristic) and then iterate through the planning entities and print out the value for each planning value. Check if there are any nulls in there.
The fact that you have 194 steps, instead 200 steps in your CH indicates this. (If those other 6 planning entities are immovable, this won't trigger this exception (more info), so that's not the problem.)

SSIS package fails and then runs successfully 15 minutes later

I have an SSIS package that is scheduled to run every weekday morning at 8:15. It copies data to and from Active Directory and SQL. About two weeks ago, it started failing, with no changes having been made to the server (beyond MS updates).
The funny thing is that if I then immediately run the package again, it succeeds. Here is the error text from when it fails:
Date 7/14/2011 8:15:00 AM
Log Job History (Reference: Active Directory)
Step ID 1
Server MMCI-GD1SQL2
Job Name Reference: Active Directory
Step Name Run Package
Duration 00:00:32
Sql Severity 0
Sql Message ID 0
Operator Emailed
Operator Net sent
Operator Paged
Retries Attempted 0
Message
Executed as user: MMCI\service-sql. Microsoft (R) SQL Server Execute Package Utility Version 10.0.1600.22 for 32-bit Copyright (C) Microsoft Corp 1984-2005. All rights reserved.
Started: 8:15:00 AM Error: 2011-07-14 08:15:31.88
Code: 0xC0047062
Source: Synchronize Permissions Active Directory Permissions [133]
Description: System.DirectoryServices.AccountManagement.PrincipalOperationException: There is no such object on the server. ---> System.DirectoryServices.DirectoryServicesCOMException (0x80072030): There is no such object on the server.
at System.DirectoryServices.DirectoryEntry.Bind(Boolean throwIfFail)
at System.DirectoryServices.DirectoryEntry.Bind()
at System.DirectoryServices.DirectoryEntry.RefreshCache()
at System.DirectoryServices.AccountManagement.ADStoreCtx.LoadDirectoryEntryAttributes(DirectoryEntry de)
--- End of inner exception stack trace ---
at Microsoft.SqlServer.Dts.Pipeline.ScriptComponentHost.HandleUserException(Exception e)
at Microsoft.SqlServer.Dts.Pipeline.ScriptComponentHost.PrimeOutput(Int32 outputs, Int32[] outputIDs, PipelineBuffer[] buffers)
at Microsoft.SqlServer.Dts.Pipeline.ManagedComponentHost.HostPrimeOutput(IDTSManagedComponentWrapper100 wrapper, Int32 outputs, Int32[] outputIDs, IDTSBuffer100[] buffers, IntPtr ppBufferWirePacket) End Error Error: 2011-07-14 08:15:31.90
Code: 0xC0047038
Source: Synchronize Permissions SSIS.Pipeline
Description: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED. The PrimeOutput method on component "Active Directory Permissions" (133) returned error code 0x80131501. The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing. There may be error messages posted before this with more information about the failure. End Error DTExec: The package execution returned DTSER_FAILURE (1). Started: 8:15:00 AM Finished: 8:15:31 AM Elapsed: 31.343 seconds. The package execution failed. The step failed.
Any thoughts?
Has some new Group Policy been applied that changed the permissions for the account your automated run uses, but which doesn't apply to your user id? I'm assuming when you say "I then ... run the package", you mean your logged-in user id.
Based on the error message that you had provided, the issue seems to be that the task within your package is trying to query an object in Active Directory that might no longer exist.
System.DirectoryServices.AccountManagement.PrincipalOperationException:
There is no such object on the server. --->
System.DirectoryServices.DirectoryServicesCOMException (0x80072030):
There is no such object on the server.
I could be wrong on the below part. I am just speculating what your package might be doing based on the description provided.
Since your package synchronizes data between SQL Server and Active Directory, I assume that the task named Synchronize Permissions Active Directory Permissions selects some form of data stored in SQL Server and updates the content in Active Directory or vice versa. If my assumption is correct, this task is probably Script Task or Script Component. I believe that the code inside this component is failing to select an object (group/user) in Active Directory.
I would check whether a group/user was deleted in Active Directory on the days prior to when the package failed to run.
Hope this helps.