TFS2010 database size - msbuild

We've been using TFS since around 2009 when we installed TFS2008. We upgraded to TFS2010 at some point and we've been using it for source control, work item management, builds etc.
Our TFSVersionControl.mdf file is 287,120,000 KB (273GB). We ran some queries and found that our tbl_BuildInformationField table is massive: it has 1,358,430,452 rows, which take up 150,988,624 KB (143GB). We have multiple active products across multiple active builds, with more than one solution per build, and the solutions aren't free of warning messages.
My questions:
Is it possible to stop MSBuild from spamming the tbl_BuildInformationField table so much? I.e. only write errors and general build information and not all the warnings for every project?
Is there a way to purge or clean up old data from this table?
Is 273GB for 4 years of TFS use an average size?
Is 143GB for tbl_BuildInformationField a "normal" size?

The table holds the values and output of the build process. Take note that the build retention policy doesn't actually delete the build object; like everything else in TFS, the object is marked deleted and only public visibility and the drop location are cleared.
If you have retained the same build definitions for a very long time (when a build definition is deleted, the related objects get removed as well), I would suggest you query for build information including deleted builds using the TFS API; the same API will also allow you to remove them for good (a rough sketch follows below the link). Deleting the build definitions themselves will probably not work and will fail with a timeout error.
You can consult the following:
http://blogs.msdn.com/b/adamroot/archive/2009/06/12/working-with-deleted-build-data-in-team-foundation-server-2010-beta-1.aspx
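For reference, a rough sketch of what that API-based cleanup might look like, assuming the TFS 2010 client Object Model (the Microsoft.TeamFoundation.Client and Microsoft.TeamFoundation.Build.Client assemblies). The collection URL and team project name are placeholders, and the Destroy call is the one described in the linked post, so verify it against your TFS version on a test collection before running it for real:

using System;
using Microsoft.TeamFoundation.Build.Client;
using Microsoft.TeamFoundation.Client;

class PurgeDeletedBuilds
{
    static void Main()
    {
        // Hypothetical collection URL and team project name.
        var collection = TfsTeamProjectCollectionFactory.GetTeamProjectCollection(
            new Uri("http://tfsserver:8080/tfs/DefaultCollection"));
        var buildServer = collection.GetService<IBuildServer>();

        // Query only builds that are already marked as deleted for one team project.
        IBuildDetailSpec spec = buildServer.CreateBuildDetailSpec("MyTeamProject");
        spec.QueryDeletedOption = QueryDeletedOption.OnlyDeleted;
        spec.InformationTypes = null; // skip loading the (huge) build information nodes

        IBuildQueryResult result = buildServer.QueryBuilds(spec);

        // Destroy removes the underlying rows (including the build information data),
        // unlike Delete, which only marks the build as deleted.
        buildServer.Destroy(result.Builds);
    }
}

Run it per team project, and expect it to take a while against a table of that size.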

How to recover deleted build definition

We are using the "new, scriptable build system" of TFS 2015 and all of a sudden one of our build definitions is gone. It seems that it deleted itself.
Is there a way to undelete/recover a build definition?
Where are the build definitions (those JSON files) actually stored?
If you are on-prem and you just chose Delete from the GUI, run this in the TFS database for your specific ID:
update Build.tbl_Definition
set Deleted = 0 --was 1
where DefinitionId = <your build ID goes here>
You will lose your build history without some extra work I did not take the time to dig into, but that might also be do-able.
NOTE: You will also lose credentials stored as build parameters, which you will need to recreate. It might even be best to recreate a new build definition using this one as a template to avoid other unknowns.
TFS build definitions are not version-controlled items, so it's not possible to restore deleted build definitions for now.
There has been a feature request on UserVoice: "provide a way to version-control build definitions". According to the response from the PM, this is still in progress:
In the new build system coming with TFS 2015 you can see the full
history of the changes to your build definition. The feature that is
currently missing is the ability to undo or rollback to a previous
revision.
We expect to get the rollback deployed to our service in the next few
months.
Chris Patterson
Program Manager
Moreover, the build definitions are stored in the TFS database.

TFS 2015 The item is locked in workspace (null);(null)

We recently upgraded from TFS 2010 to TFS 2015. Everything appears to be fine post-upgrade, but we are getting the error "The item is locked in workspace (null);(null)." on some source control files. It looks like we have some orphaned locks that need to be tracked down and cleaned up, but the tbl_Lock table is not in the database, so the following select query won't work:
select * FROM tbl_Lock l
LEFT JOIN tbl_PendingChange pc
ON l.PendingChangeId = pc.PendingChangeId
WHERE pc.PendingChangeId IS NULL
Does anyone know how to detect and remove these locks in TFS 2015?
I also installed the TFS power tools, and neither Visual Studio 2015 nor the power tools are picking up the locks.
Updated:
BTW, when I run the SELECT query to find out where PendingChangeId is NULL, I get back no rows. I think the trick is the LEFT JOIN. PendingChangeId would be NULL when tbl_Lock also had no record for the PendingChangeId on tbl_PendingChange (and thus the lock was orphaned). So I'd still need to know where the PendingChangeId should normally be joined to in TFS 2015, to identify which files have a lock that is bad. (Or where a workspace no longer exists, which may be another possible source for the issue.)
And I also still need to know how to clean up those bad locks. I'd prefer to do this using the tools, either via the GUI or the command line, but could also do this programmatically either using the API or the TFS Object Model files for TFS 2015.
I really would rather only touch the database directly as a last resort. And I would also rather use tf vc destroy on the item only as a last resort, since that would wipe out all history on the files.
Update 2
Aha! I think I found a way to identify the files, and it looks like my thinking for what happened may be correct. Unfortunately, I had to probe the database using a READ UNCOMMITTED query to find the information. I couldn't get at this information programmatically or using the tools. (They all showed or acted like the file is not checked out.) The query that I used on TFS 2015 was:
select pc.* from tbl_PendingChange pc
left join tbl_Workspace ws on pc.WorkspaceId = ws.WorkspaceId
where ws.WorkspaceId is null
This returned the three files that have the (null);(null) lock on our database, because the WorkspaceId listed on tbl_PendingChange does not exist anymore on tbl_Workspace.
How did this happen? Our CI server uses temporary TFS workspaces. I think what happened after the upgrade is that our CI server went to check out the file and apply an update to it. (For example, to increment version numbers as part of the build process.) It checked out the file, but failed to apply the update. (Our tools like working with Server workspaces, but it may have ended up with a Local workspace and thus the file was still checked in Local, but checked out on the Server. Thus the change to the file couldn't be applied.) The code that we are using performs a workspace.Delete operation when the process completes, so the workspace was deleted - even though the workspace still had the file checked out! So this created an orphan record on tbl_PendingChange that isn't linked to any Workspace, and thus the file is still locked with pending changes. But the GUI and tools aren't seeing it as such, because they're not realizing the pending change's workspace is non-existent.
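As an aside, here is a hedged sketch of a safer teardown for a temporary CI workspace, undoing any leftover pending changes before the Delete call so no orphaned pending-change rows are left behind. The calls used (GetPendingChanges, Undo, Delete) are standard TFS Object Model methods, but the helper below is purely illustrative and does not fix rows that are already orphaned:

using System.Linq;
using Microsoft.TeamFoundation.VersionControl.Client;

static class WorkspaceCleanup
{
    // "workspace" stands in for however the CI code created it, e.g. via
    // VersionControlServer.CreateWorkspace(...).
    public static void DeleteSafely(Workspace workspace)
    {
        // Undo anything still checked out in this workspace before deleting it,
        // so no pending changes are left pointing at a workspace that no longer exists.
        PendingChange[] pending = workspace.GetPendingChanges();
        if (pending.Length > 0)
        {
            workspace.Undo(pending
                .Select(pc => new ItemSpec(pc.ServerItem, RecursionType.None))
                .ToArray());
        }

        workspace.Delete();
    }
}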
So this brings me back around to how do I fix this? If someone knows of a way to get at these orphaned pending changes, I'd appreciate it. I tried using:
TfsTeamProjectCollection tfsTeamProjectCollection = TfsTeamProjectCollectionFactory.GetTeamProjectCollection(new Uri(szProjectUri));
VersionControlServer versionControlServer = tfsTeamProjectCollection.GetService<VersionControlServer>();
string[] items = new[] { ... server item path ... };
PendingSet[] queryPendingSets = versionControlServer.QueryPendingSets(items, RecursionType.None, null, null);
PendingSet[] getPendingSets = versionControlServer.GetPendingSets(items, RecursionType.None);
but these aren't finding the orphans.
Update 3
I finally installed Team Foundation Sidekicks 2015 and gave it a try - the Status Sidekick specifically, and then the other tools. It's finding pending changes, but not the orphaned ones.
You can use Team Foundation Sidekicks to search for and undo locks with the following steps:
Install the tool and launch it.
Select TFS server to connect.
Select "Tools\Status Sidekick".
Set the "Search criteria" for the information you want.
Click "Search" button.
Select the locked file and click "Unlock lock" button.
You can use the command below to undo the pending changes:
tf undo "file_path" /workspace:workspace_name
Or you can just use the command below to delete the old workspace:
tf workspace /delete /server:your_tfs_server workspace;username
From the Visual Studio 2015 GUI:
File -> Source Control -> Advanced -> Workspaces...
In the dialog that comes up, check "Show remote workspaces"; the locked workspace will appear in the window. Then select it and click "Remove".
For details, please check this blog; for more ways to resolve this, you can refer to the similar question: What do you do if the file in TFS is locked by someone else?
Update:
According to the SQL query, it's looking for PendingChangeId IS NULL. You can query tbl_PendingChange under the collection database in a similar way. However, this is not a recommended approach, since operating directly on the TFS database is discouraged.
The following command has cleared up the pending changesets that were orphaned:
tf vc destroy <itemspec> /startcleanup
After running this command, the file was able to be added back to TFS, and the file could be checked in and out and edited as normal. Running the query:
select pc.* from tbl_PendingChange pc
left join tbl_Workspace ws on pc.WorkspaceId = ws.WorkspaceId
where ws.WorkspaceId is null
also showed that the pending change record related to this file was gone.
Microsoft's documentation on this command can be found at https://msdn.microsoft.com/en-us/library/bb386005.aspx. Before using this command, you should review the documentation carefully and be sure to understand the consequences of using it.
Because this command permanently removes files and potentially all history from TFS - and does so recursively - you need to take precautions and be absolutely certain that you are targeting the command correctly. So before using this command, I would recommend taking the following additional precautions:
Stop all user and external accesses to TFS and any other software that may be running from the machine.
Make sure to run a full backup of TFS and any other databases located on the machine.
If you can, take a snapshot in time of the server.
That way if something goes horribly wrong, you will have one or more points to fall back on.

SSDT How to ignore dependency projects

We have 3 databases: UntouchDB, Work1DB and Work2DB.
UntouchDB is very large, with over 100 tables and views of every type.
We're planning to use SSDT with Git to manage database versions; in fact, it's easy to use that way. We created two SSDT projects and imported the Work1DB and Work2DB databases into them. Both of them reference UntouchDB.
The problem is that UntouchDB is forbidden to touch, which means we cannot create an SSDT project for it. As a result, the two projects fail to build.
I tried extracting a DACPAC of UntouchDB, which comes to 8.5 MB. When we added it as a DACPAC reference to the Work1DB and Work2DB projects, they took far too long to analyze and build; the build never seems to succeed and drops every time.
How can we ignore the dependency on UntouchDB from the remaining databases?
I really need your help. I searched a lot on Google, but every approach requires creating a reference as a DACPAC or as a database. The only case that nearly worked was setting the file property "Extended T-SQL Verification" to FALSE, but then a "Computed Column cannot be resolved" error appears.
My questions are:
How can we ignore the dependency on UntouchDB from the remaining databases?
How can we fix the "Computed Column cannot be resolved" error mentioned above?

TFS Builds, Project Files: Orphaned references to files not being pushed are causing endless build errors

We are using TFS 2010 (Visual Studio) for our deployments and have client code projects (.csproj files) and database projects (.dbproj files). We understand that when our developers add files to our application, there is a corresponding reference to these files in the project file. If I push a changeset from Dev to QA that includes the project file, and the project file contains a reference to a file that's been added that is not in the changeset, I will receive a build error.
Once we started pushing just changesets (as opposed to performing full builds) this quickly became our number one bottleneck in doing TFS builds. I would deploy the database project and there would be 20 errors. The only way I could correct them was to navigate down the entire solution explorer tree and exclude each orphaned reference individually. This has proved far too time consuming and on the advice of our lead programmer we have returned to doing full builds of QA and UAT.
We are in the early stages of this product, and therefore we will be adding many files for some time. We need a better solution for this problem. Neither the manual exclusions nor asking developers not to check in code until it is ready for QA will suffice for us. Has anybody out there had any experience with this problem, and if so, how did you deal with it? Thanks!
Jon
Pushing changesets to QA selectively is known as cherry-picking and causes the sorts of issues that you are experiencing. This is not the recommended practice; instead, set up the QA build so that a successful build is part of check-in. That way, if part of a fix is missed (as it may be when a fix spans multiple changesets), the build will fail and the check-in cannot be performed.
Second, have the developers do the second check-in to QA, or merge the dev changesets to QA, and have the team lead coordinate changes to project files by turning on "notify changes made by others" or setting a policy for the dev team. Full builds should always be done, as partial builds do not always pick up the complete dependency graph.

Do you put your database static data into source control? How?

I'm using SQL-Server 2008 with Visual Studio Database Edition.
With this setup, keeping your schema in sync is very easy. Basically, there's a 'compare schema' tool that allows me to sync the schema of two databases and/or a database schema with a source-controlled creation script folder.
However, the situation is less clear when it comes to data, which can be of three different kinds:
Static data referenced in the code. Typical example: my users can change their settings, and their configuration is stored on the server. However, there's a system-wide default value for each setting that is used in case the user didn't override it. The table containing those default settings grows as more options are added to the program. This means that when a new feature/option is checked in, the system-wide default setting is usually created in the database as well.
Static data. E.g. a product list populating a dropdown list. The program doesn't rely on the existence of a specific product in the list to work. This can be, for example, a list of unicode-encoded products that should be deployed in production when the new "unicode version" of the program is deployed.
Other data, i.e. everything else (logs, user accounts, user data, etc.).
It seems obvious to me that my third item shouldn't be source-controlled (of course, it should be backed up on a regular basis).
But regarding the static data, I'm wondering what to do.
Should I append the insert scripts to the creation scripts, or maybe use separate scripts?
How do I (as a developer) warn the people doing the deployment that they should execute an insert statement?
Should I differentiate my two kinds of data? (The first one is usually created by a dev, while the second one is usually created by a non-dev.)
How do you manage your DB static data ?
I have explained the technique I used in my blog Version Control and Your Database. I use database metadata (in this case SQL Server extended properties) to store the deployed application version. I only have scripts that upgrade from version to version. At startup the application reads the deployed version from the database metadata (lack of metadata is interpreted as version 0, i.e. nothing is yet deployed). For each version there is an application function that upgrades to the next version. Usually this function runs an internal resource T-SQL script that does the upgrade, but it can be something else, like deploying a CLR assembly in the database.
There is no script to deploy the 'current' database schema. New installations iterate through all the intermediate versions, from version 1 to the current version.
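As an illustration only (not the author's actual code), here is a minimal sketch of this idea, assuming the deployed version is kept in a database-level extended property named SchemaVersion and that each upgrade step is a single-batch script named upgrade_N.sql; both names are invented for the example:

using System;
using System.Data.SqlClient;
using System.IO;

class SchemaUpgrader
{
    const int CurrentVersion = 5; // the version this build of the application expects

    static void Main()
    {
        // Hypothetical connection string.
        using (var conn = new SqlConnection("Server=.;Database=MyAppDb;Integrated Security=true"))
        {
            conn.Open();

            // Read the deployed version; missing metadata is treated as version 0.
            int deployed = 0;
            using (var cmd = new SqlCommand(
                "SELECT value FROM sys.extended_properties WHERE class = 0 AND name = 'SchemaVersion'",
                conn))
            {
                object result = cmd.ExecuteScalar();
                if (result != null) deployed = Convert.ToInt32(result);
            }

            // Apply each intermediate upgrade script in order.
            // Note: a plain SqlCommand cannot run scripts containing GO batch separators.
            for (int v = deployed + 1; v <= CurrentVersion; v++)
            {
                string script = File.ReadAllText($"upgrade_{v}.sql");
                using (var cmd = new SqlCommand(script, conn))
                {
                    cmd.ExecuteNonQuery();
                }
                // Each upgrade_{v}.sql ends with the sp_updateextendedproperty (or
                // sp_addextendedproperty) call that bumps SchemaVersion to v.
            }
        }
    }
}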
There are several advantages I enjoy with this technique:
It is easy for me to test a new version. I have a backup of the previous version, I apply the upgrade script, then I can revert to the previous version, change the script, and try again, until I'm happy with the result.
My application can be deployed on top of any previous version. Various clients have various deployed version. When they upgrade, my application supports upgrade from any previous version.
There is no difference between a fresh install and an upgrade, it runs the same code, so I have fewer code paths to maintain and test.
There is no difference between DML and DDL changes (your original question). They are all treated the same way, as scripts run to change from one version to the next. When I need to make a change like you describe (change a default), I actually increase the schema version even if no other DDL change occurs. So at version 5.1 the default was 'foo', in 5.2 the default is 'bar', and that is the only difference between the two versions; the 'upgrade' step is simply an UPDATE statement (followed of course by the version metadata change, i.e. sp_updateextendedproperty).
All changes are in source control, part of the application sources (T-SQL scripts mostly).
I can easily get to any previous schema version, e.g. to repro a customer complaint, simply by running the upgrade sequence and stopping at the version I'm interested in.
This approach saved my skin a number of times and I'm a true believer now. There is only one disadvantage: there is no obvious place to look in source to find 'what is the current form of procedure foo?'. Because the latest version of foo might have been upgraded 2 or 3 versions ago and it wasn't changed since, I need to look at the upgrade script for that version. I usually resort to just looking into the database and see what's in there, rather than searching through the upgrade scripts.
One final note: this is actually not my invention. This is modeled exactly after how SQL Server itself upgrades the database metadata (mssqlsystemresource).
If you are changing the static data (adding a new item to the table that is used to generate a drop-down list) then the insert should be in source control and deployed with the rest of the code. This is especially true if the insert is needed for the rest of the code to work. Otherwise, this step may be forgotten when the code is deployed and not so nice things happen.
If static data comes from another source (such as an import of the current airport codes in the US), then you may simply need to run an already documented import process. The import process itself should be in source control (we do this with all our SSIS packages), but the data need not be.
Here at Red Gate we recently added a feature to SQL Data Compare allowing static data to be stored as DML (one .sql file for each table) alongside the schema DDL that is currently supported by SQL Compare.
The idea is that when you want to push changes to your target server, you do a comparison using the scripts as the source data source, which generates the necessary DML synchronization script to update the target. This means you don't have to assume that the target is being recreated from scratch each time. In time we hope to support static data in our upcoming SQL Source Control tool.
David Atkinson, Product Manager, Red Gate Software
I have come across this when developing CMS systems.
I went with appending the static data (the stuff referenced in the code) to the database creation scripts, then a separate script to add in any 'initialisation data' (like countries, initial product population etc).
For the first two steps, you could consider using an intermediate format (e.g. XML) for the data, then using a home-grown tool, or something like CodeSmith, to generate the SQL, and possibly source files as well, if (for example) you have lookup tables which relate to enumerations used in the code - this helps enforce consistency (a rough sketch of such a tool follows below).
This has another benefit that if the schema changes, in many cases you don't have to regenerate all your INSERT statements - you just change the tool.
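To make the idea concrete, here is a hedged sketch of such a home-grown generator: one XML file per lookup table, from which both the INSERT script and a matching enum are produced so the database values and the code cannot drift apart. The file, table, and column names are all illustrative:

using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml.Linq;

class LookupGenerator
{
    static void Main()
    {
        // Example input (ProductType.xml):
        // <lookup table="ProductType">
        //   <item id="0" code="EMPTY" />
        //   <item id="1" code="SIMPLE" />
        // </lookup>
        XDocument doc = XDocument.Load("ProductType.xml");
        string table = doc.Root.Attribute("table").Value;
        var items = doc.Root.Elements("item")
            .Select(e => new { Id = (int)e.Attribute("id"), Code = (string)e.Attribute("code") })
            .ToList();

        // Generate the INSERT script for the static data.
        var sql = new StringBuilder();
        foreach (var item in items)
            sql.AppendLine($"INSERT INTO {table} (Id, Code) VALUES ({item.Id}, '{item.Code}');");
        File.WriteAllText($"{table}.sql", sql.ToString());

        // Generate the matching enum so code and data stay consistent.
        var cs = new StringBuilder();
        cs.AppendLine($"public enum {table}");
        cs.AppendLine("{");
        cs.AppendLine(string.Join("," + Environment.NewLine,
            items.Select(i => $"    {i.Code} = {i.Id}")));
        cs.AppendLine("}");
        File.WriteAllText($"{table}.cs", cs.ToString());
    }
}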
I really like your distinction of the three types of data.
I agree for the third.
In our application, we try to avoid putting the first kind in the database, because it would be duplicated (as it has to be in the code, the database copy is a duplicate). A secondary benefit is that we need no join or query to get access to that value from the code, so this speeds things up.
If there is additional information that we would like to have in the database, for example if it can be changed per customer site, we separate the two. Other tables can still reference that data (either by index, e.g. 0, 1, 2, 3, or by code, e.g. EMPTY, SIMPLE, DOUBLE, ALL).
For the second kind, the scripts should be in source control. We separate them from the structure (I think they typically get replaced as time goes on, while the structure keeps accumulating deltas).
How do I (as a developer) warn the people doing the deployment that they should execute an insert statement ?
We have a complete procedure for that, and a readme coming with each release, with scripts and so on...
First off, I have never used Visual Studio Database Edition. You are blessed (or cursed) with whatever tools this utility gives you. Hopefully that includes a lot of flexibility.
I don't know that I'd make that big a difference between your type 1 and type 2 static data. Both are sets of data that are defined once and then never updated, barring subsequent releases and updates, right? In which case the main difference is in how or why the data is as it is, and not so much in how it is stored or initialized. (Unless the data is environment-specific, as in "A" for development, "B" for Production. This would be "type 4" data, and I shall cheerfully ignore it in this post, because I've solved it using SQLCMD variables and they give me a headache.)
First, I would make a script to create all the tables in the database--preferably only one script, otherwise you can have a LOT of scripts lying about (and find-and-replace when renaming columns becomes very awkward). Then, I would make a script to populate the static data in these tables. This script could be appended to the end of the table script, or made its own script, or even made one script per table, a good idea if you have hundreds or thousands of rows to load. (Some folks make a csv file and then issue a BULK INSERT on it, but I'd avoid that as it just gives you two files and a complex process [configuring drive mappings on deployment] to manage.)
The key thing to remember is that data (as stored in databases) can and will change over time. Rarely (if ever!) will you have the luxury of deleting your Production database and replacing it with a fresh, shiny, new one devoid of all that crufty data from the past umpteen years. Databases are all about changes over time, and that's where scripts come into their own. You start with the scripts to create the database, and then over time you add scripts that modify the database as changes come along -- and this applies to your static data (of any type) as well.
(Ultimately, my methodology is analogous to accounting: you have accounts, and as changes come in you adjust the accounts with journal entries. If you find you made a mistake, you never go back and modify your entries, you just make a subsequent entries to reverse and fix them. It's only an analogy, but the logic is sound.)
The solution I use is to have create and change scripts in source control, coupled with version information stored in the database.
Then, I have an install wizard that can detect whether it needs to create or update the db - the update process is managed by picking appropriate scripts based on the stored version information in the database.
See this thread's answer. Static data from your first two points should be in source control, IMHO.
Edit:
All-in-one or a separate script? It does not really matter, as long as you (the dev team) agree with your deployment team. I prefer separate files, but I can still always create an all-in-one.sql from those in the proper order [Logins, Roles, Users; Tables; Views; Stored Procedures; UDFs; Static Data; (Audit Tables, Audit Triggers)].
How do you make sure they execute it? Well, make it another step in your application/database deployment documentation. If you roll out an application which really needs specific (new) static data in the database, then you might want to perform a DB version check in your application: update the DB_VERSION to your new release number as part of that script, and have your application check it on start-up and report an error if the new DB version is required.
Dev and non-dev static data: I have never actually seen this case. More often there is real static data, which you might call "dev", such as major configuration, ISO static data, etc. The other type is default lookup data, which is there for users to start with, but they might add more. The mechanism to INSERT these data might be different, because you need to ensure you do not destroy (power-)user-created data (a sketch of one such mechanism follows below).
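As a hedged illustration of that last point, here is a sketch that seeds default lookup rows without touching rows users may have added or edited; the table, columns, and values are invented for the example:

using System.Data.SqlClient;

class SeedDefaults
{
    static void Main()
    {
        var defaults = new[] { (1, "Cash"), (2, "Credit card"), (3, "Wire transfer") };

        // Hypothetical connection string and lookup table.
        using (var conn = new SqlConnection("Server=.;Database=MyAppDb;Integrated Security=true"))
        {
            conn.Open();
            foreach (var (id, name) in defaults)
            {
                // Insert only if the key is not present; never UPDATE or DELETE existing rows,
                // so user-created or user-modified entries survive the deployment.
                using (var cmd = new SqlCommand(
                    "IF NOT EXISTS (SELECT 1 FROM dbo.PaymentType WHERE Id = @id) " +
                    "INSERT INTO dbo.PaymentType (Id, Name) VALUES (@id, @name)", conn))
                {
                    cmd.Parameters.AddWithValue("@id", id);
                    cmd.Parameters.AddWithValue("@name", name);
                    cmd.ExecuteNonQuery();
                }
            }
        }
    }
}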