How do I find how much space TFS is using - tfs-sdk

I'm trying to find out how much space TFS is using. Is there a simple "check free space" command on TFS?
Also, is there a way to poll TFS for the amount of hard drive space left, and to see when large changes or large numbers of files have been added, and by whom, for a given week or day?

"I'm trying to find out how much space TFS is using. Is there a simple check free space command on TFS?"
Projects are not partitioned in the database in a way that lets you easily figure this out. Of course, if you just want to see how much space all collections are using, you can look at your database size.
Here is a good article to read that gives you a rough estimate of the space used for files, work items and so on.
"Also is there a way to poll TFS for the amount of hard drive space left and see when large changes or large amount of files have been added and by whom for a given week or day?"
TFS doesn't keep track of how much free space is left on your drive. You can, however, check this in code by doing something like:
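using System.IO;
// Pass the drive letter or root path, e.g. "C" or "C:\".
var drive = new DriveInfo("DRIVE_LETTER");
// Free space in bytes available to the current user (takes quotas into account).
long freeSpace = drive.AvailableFreeSpace;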
As for your final question (seeing when large changes or large numbers of files have been added, and by whom), this article demonstrates how to do what you are describing using the TFS API.

Related

MonetDB in the cloud, scalability, Amazon S3

I have recently discovered MonetDB and I am evaluating it for an internal project, so my questions probably come from a real newbie's point of view. Maybe someone could point me to a site or document where I can find more info (I haven't found much by googling).
Regarding scalability, please correct me if I am wrong, but what I understand is that if I need to scale, I would launch more server instances and discover them from the control node. Is that right?
Is there any limit on the number of servers?
The other point is about storage: is it possible to use Amazon S3 to back MonetDB read-only instances?
Update: we would need to store a massive amount of Call Detail Records from different sources, on a read-only basis. We would aggregate/reduce that data for the day-to-day operation, accessing the bigger tables only when the full detail is required.
We would store the historical data as well to perform longer-term analysis. My concern is mostly about memory; disk storage wouldn't be the issue, I think. If the hot dataset involved in a report/analysis eats up the whole memory space (fast response times are needed, and I'm not sure how memory swapping would affect them), I would like to know whether I can scale somehow instead of re-engineering the report/analysis process (maybe I am biased by the horizontal scaling thing :-) ).
Thanks!
You can easily find the advantages of MonetDB online, so let me highlight some disadvantages:
1. In MonetDB, deleting rows does not free up the space.
Solution: copy the data into another table, drop the existing table, and rename the new table.
2. Joins are a little slower.
3. You cannot use a table name as a dynamic variable.
E.g. if you have table names stored in one main table, you can't write a query like "for each (select tablename from mytable) select data from tablename" in SQL.
You can't create functions that take a table name as a variable argument.
But it is still damn fast and can store large amounts of data.
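The copy, drop and rename workaround from point 1 could be scripted roughly as below. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in so that it runs anywhere, and the exact MonetDB statements may differ. The helper rebuild_table, the table cdr and its columns are made up for the illustration.

```python
import sqlite3

def rebuild_table(conn, table, keep_where):
    """Copy the rows worth keeping into a new table, drop the old table,
    then rename the copy back to the original name.

    table and keep_where are interpolated into the SQL text, so they must
    come from trusted code, never from user input.
    """
    tmp = table + "_rebuild"  # hypothetical temporary table name
    cur = conn.cursor()
    cur.execute(f"CREATE TABLE {tmp} AS SELECT * FROM {table} WHERE {keep_where}")
    cur.execute(f"DROP TABLE {table}")
    cur.execute(f"ALTER TABLE {tmp} RENAME TO {table}")
    conn.commit()

# Demo on an in-memory database (sqlite3 stand-in, not MonetDB).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cdr (id INTEGER, year INTEGER)")
conn.executemany("INSERT INTO cdr VALUES (?, ?)",
                 [(1, 2010), (2, 2011), (3, 2012)])
rebuild_table(conn, "cdr", "year >= 2011")
remaining = conn.execute("SELECT id FROM cdr ORDER BY id").fetchall()
```

Note that a table created from a SELECT typically loses the constraints and indexes of the original, so those would have to be recreated afterwards.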

Is there a reverse-incremental backup solution with built-in redundancy (e.g. par2)?

I'm setting up a home server primarily for backup use. I have about 90GB of personal data that must be backed up in the most reliable manner, while still preserving disk space. I want to have full file history so I can go back to any file at any particular date.
Full weekly backups are not an option because of the size of the data. Instead, I'm looking along the lines of an incremental backup solution. However, I'm aware that a single corruption in a set of incremental backups makes the entire series (beyond a point) unrecoverable. Thus simple incremental backups are not an option.
I've researched a number of solutions to the problem. First, I would use reverse-incremental backups so that the latest version of the files would have the least chance of loss (older files are not as important). Second, I want to protect both the increments and backup with some sort of redundancy. Par2 parity data seems perfect for the job. In short, I'm looking for a backup solution with the following requirements:
Reverse incremental (to save on disk space and prioritize the most recent backup)
File history (kind of a broader category including reverse incremental)
Par2 parity data on increments and backup data
Preserve metadata
Efficient with bandwidth (bandwidth saving; no copying the entire directory over for each increment). Most incremental backup solutions should work this way.
This would (I believe) ensure file integrity and relatively small backup sizes. I've looked at a number of backup solutions already but they have a number of problems:
Bacula - Simple normal incremental backups
bup - incremental and implements par2 but isn't reverse incremental and doesn't preserve metadata
duplicity - incremental, compressed, and encrypted but isn't reverse incremental
dar - incremental and par2 is easy to add, but isn't reverse incremental and no file history?
rdiff-backup - almost perfect for what I need but it doesn't have par2 support
So far I think that rdiff-backup seems like the best compromise but it doesn't support par2. I think I can add par2 support to backup increments easily enough since they aren't modified each backup but what about the rest of the files? I could generate par2 files recursively for all files in the backup but this would be slow and inefficient, and I'd have to worry about corruption during a backup and old par2 files. In particular, I couldn't tell the difference between a changed file and a corrupt file, and I don't know how to check for such errors or how they would affect the backup history. Does anyone know of any better solution? Is there a better approach to the issue?
Thanks for reading through my difficulties and for any input you can give me. Any help would be greatly appreciated.
http://www.timedicer.co.uk/index
It uses rdiff-backup as the engine. I've been looking at it, but it requires me to set up a "server" using Linux or a virtual machine.
Personally, I use WinRAR to make pseudo-incremental backups (it actually makes a full backup of recent files) run daily by a scheduled task. It is similarly a "push" backup.
It's not a true incremental (or reverse-incremental) backup, but it saves different versions of files based on when they were last updated. I mean, it saves the version for today, yesterday and the previous days, even if the file is identical. You can use the archive bit to save space, but I don't bother anymore as all I back up are small spreadsheets and documents.
RAR has its own parity data, the recovery record, whose size you can set in absolute terms or as a percentage. I use 1% (one percent).
It can preserve metadata, I personally skip the high resolution times.
It can be efficient since it compresses the files.
Then all I have to do is send the file to my backup. I have it copied to a different drive and to another computer in the network. No need for a true server, just a share. You can't do this for too many computers though as Windows workstations have a 10 connection limit.
So for my purposes, which may fit yours, a daily job backs up the files that have been updated in the last 7 days. Then I have another scheduled job, run once a month (every 30 days), that backs up the files that have been updated in the last 90 days.
But I use Windows, so if you're actually setting up a Linux server, you might check out the Time Dicer.
Since nobody was able to answer my question, I'll write a few possible solutions I found while researching the topic. In short, I believe the best solution is rdiff-backup to a ZFS filesystem. Here's why:
ZFS checksums all blocks stored and can easily detect errors.
If you have ZFS set to mirror your data, it can recover the errors by copying from the good copy.
This takes up less space than full backups, even though the data is copied twice.
The odds of an error in both the original and the mirror are tiny.
Personally I am not using this solution as ZFS is a little tricky to get working on Linux. Btrfs looks promising but hasn't been proven stable from years of use. Instead, I'm going with a cheaper option of simply checking hard drive SMART data. Hard drives should do some error checking/correcting themselves and by monitoring this data I can see if this process is working properly. It's not as good as additional filesystem parity but better than nothing.
A few more notes that might be interesting to people looking into reliable backup development:
par2 seems to be dated and buggy software. zfec seems like a much faster modern alternative. Discussion in bup occurred a while ago: https://groups.google.com/group/bup-list/browse_thread/thread/a61748557087ca07
It's safer to calculate parity data before even writing to disk, i.e. don't write to disk, read the data back, and only then calculate parity data. Do it from RAM, and check against the original for additional reliability. This might only be possible with zfec, since par2 is too slow.
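To make the "compute it from RAM, then check what landed on disk" idea concrete, here is a minimal sketch. It swaps parity for a plain SHA-256 checksum from Python's standard hashlib, which can only detect a bad write and cannot repair it; real parity would need something like zfec or par2 in its place. The function name write_verified is made up for the example.

```python
import hashlib
import os

def write_verified(path, data):
    """Write data to path and verify the on-disk copy against a digest
    that was computed from the in-memory bytes before any disk I/O."""
    expected = hashlib.sha256(data).hexdigest()  # taken from RAM first
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the bytes to the device
    with open(path, "rb") as f:
        actual = hashlib.sha256(f.read()).hexdigest()
    if actual != expected:
        raise IOError("checksum mismatch after writing %s" % path)
    return expected
```

One caveat: the read-back may be served from the operating system's cache rather than from the platters, so this check is weaker than it looks unless the cache is bypassed or the file is re-verified later.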

When should we store images in database?

I have a productList table with 4 columns, and now I have to store an image for each row. I have two options for this:
Store the image in the database.
Save the images in a folder and store only the path in the table.
So my question is: which one is better in this situation, and why?
Microsoft Research published quite an extensive paper on the subject, called To Blob Or Not To Blob.
Their synopsis is:
Application designers often face the question of whether to store large objects in a filesystem or in a database. Often this decision is made for application design simplicity. Sometimes, performance measurements are also used. This paper looks at the question of fragmentation – one of the operational issues that can affect the performance and/or manageability of the system as deployed long term. As expected from the common wisdom, objects smaller than 256K are best stored in a database while objects larger than 1M are best stored in the filesystem. Between 256K and 1M, the read:write ratio and rate of object overwrite or replacement are important factors. We used the notion of “storage age” or number of object overwrites as way of normalizing wall clock time. Storage age allows our results or similar such results to be applied across a number of read:write ratios and object replacement rates.
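As a rough illustration of the rule of thumb in that synopsis, the thresholds can be written down as a tiny helper. This is only a sketch: suggest_storage is a made-up name, and treating "256K" and "1M" as multiples of 1024 bytes is an assumption.

```python
KIB = 1024
SMALL_LIMIT = 256 * KIB    # "256K" from the synopsis (assumed to be KiB)
LARGE_LIMIT = 1024 * KIB   # "1M" from the synopsis (assumed to be MiB)

def suggest_storage(size_bytes):
    """Return a storage suggestion for an object of the given size."""
    if size_bytes < SMALL_LIMIT:
        return "database"
    if size_bytes > LARGE_LIMIT:
        return "filesystem"
    # Between the two limits the paper says it depends on the read:write
    # ratio and on how often objects are overwritten or replaced.
    return "depends"
```

For typical product thumbnails of a few tens of kilobytes this would point at the database; for multi-megabyte photos it would point at the filesystem.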
It depends -
You can store images in the DB if you know that they won't increase in size very often. This has an advantage when you are deploying your systems or migrating to new servers: you don't have to worry about copying the images separately.
If the number of rows increases very frequently on that system and the images get bulkier, then it's better to store them on the file system and keep a path in the database for later retrieval. The trade-off is that it will keep you on your toes when migrating servers, because you have to take care of copying the images from the file path separately.
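The "file on disk, path in the table" option could look roughly like this. It is a sketch using Python's built-in sqlite3 module; the productList table name comes from the question, while the id and image_path columns and both helper functions are assumptions made for the example.

```python
import os
import sqlite3

def save_product_image(conn, image_dir, product_id, filename, image_bytes):
    """Write the image into image_dir and record only its path in the table."""
    os.makedirs(image_dir, exist_ok=True)
    path = os.path.join(image_dir, filename)
    with open(path, "wb") as f:
        f.write(image_bytes)
    # image_path and id are hypothetical column names.
    conn.execute("UPDATE productList SET image_path = ? WHERE id = ?",
                 (path, product_id))
    conn.commit()
    return path

def load_product_image(conn, product_id):
    """Look up the stored path and read the image bytes back from disk."""
    row = conn.execute("SELECT image_path FROM productList WHERE id = ?",
                       (product_id,)).fetchone()
    if row is None or row[0] is None:
        return None
    with open(row[0], "rb") as f:
        return f.read()
```

Storing a path relative to a configurable base folder, instead of the full path as done here, would make the server migration mentioned above less painful.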

Backing up my database is taking too long

On a Windows Mobile unit, the software I'm working on relies on an SDF file as its database.
The platform that the software is targeted towards is "less than optimal" and hard resets every once in a while. In the far distant past we lost data. Now we close the database and copy the SDF file to the SD card. If the unit gets hard reset, we restore the app (also on the SD card) and the database.
I'm not concerned about the restore (just yet). The problem we have now is that doing a "backup" takes a crazy amount of time because the SDF is 7+ megs and writing to the SD card is slow slow slow.
My boss suggested we create hashes of "chunks" of the file and then write to the destination file only when a compare of the hashes is !=.
So here's the question.
How would you test whether a file has changed if you can only have one copy of the file and thus can't compare it with its original?
I'm just shooting for a bit of brain storming.
Just store your hashes of your chunks somewhere. You don't need the "backup" copy to compare to if you know what your hashes are. Obviously this creates a chicken and egg problem for at least one hash, but copying a single "chunk" is a much smaller problem.
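A minimal sketch of that idea in Python, just to show the moving parts (the real thing would live in the device's own language). The function name incremental_copy, the 64 KB chunk size and the choice of SHA-256 are all assumptions; the caller is expected to persist the returned hash list somewhere between backups.

```python
import hashlib
import os

CHUNK_SIZE = 64 * 1024  # assumed chunk size; tune for the SD card

def incremental_copy(src, dst, old_hashes, chunk_size=CHUNK_SIZE):
    """Copy src to dst, writing only the chunks whose hash differs from
    the hash stored at the last backup.

    Returns (new_hashes, chunks_written).
    """
    if not os.path.exists(dst):
        old_hashes = []      # no backup yet, so every chunk must be written
        mode = "w+b"
    else:
        mode = "r+b"         # update the existing backup in place
    new_hashes = []
    written = 0
    with open(src, "rb") as fsrc, open(dst, mode) as fdst:
        index = 0
        while True:
            chunk = fsrc.read(chunk_size)
            if not chunk:
                break
            digest = hashlib.sha256(chunk).hexdigest()
            new_hashes.append(digest)
            if index >= len(old_hashes) or old_hashes[index] != digest:
                fdst.seek(index * chunk_size)
                fdst.write(chunk)
                written += 1
            index += 1
        # Drop any leftover tail if the source file shrank.
        fdst.truncate(fsrc.tell())
    return new_hashes, written
```

This still reads and hashes the whole source file on every backup; what it saves is the slow writes to the card.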
Your proposed approach will still have performance problems though, as hashing a large file isn't going to be a pretty operation on a slow CPU powered by a battery.
I assume you don't have the granular control to keep track of the parts of the file you modify, and then update just those sections when you need to do a backup?

Create test cube based on existing cube data (but much larger)

Is it possible to create a large cube based on existing cube data?
We'd like to test the performance of certain tools in combination with SSAS and currently do not have any cubes large enough.
E.g. we have a year's worth of data and want to expand it to 10 years' worth.
Mostly I have created my own scripts for growing test data.
I have used Adventure Works as a base for names, addresses, etc. I have also used Red Gate's data generator (I was working at a place that had the full Red Gate product suite; you can download an evaluation copy to test it out).
Might be worth writing your own scripts. Then you can tweak the generation scripts to generate additional versions for testing.
To increase the size of your data you need to write custom scripts to copy it. There is no automatic way to "grow data" in SQL.
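As a rough idea of what such a copy script does, here is a small Python sketch that turns one year of fact rows into several years by shifting the dates. The row layout (a date plus an amount) and the names shift_years and grow_rows are assumptions for the example; a real script would read from and write to the tables the cube is built on, and the cube would need to be reprocessed afterwards.

```python
from datetime import date

def shift_years(d, years):
    """Return d moved forward by the given number of years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Feb 29 has no counterpart in a non-leap year; fall back to Feb 28.
        return d.replace(year=d.year + years, day=28)

def grow_rows(rows, total_years):
    """rows is a list of (date, amount) tuples covering one year.

    Returns the original rows plus total_years - 1 date-shifted copies.
    """
    grown = list(rows)
    for offset in range(1, total_years):
        grown.extend((shift_years(d, offset), amount) for d, amount in rows)
    return grown
```

Exact copies make every year look identical, so for more realistic performance tests it may be worth perturbing the measures a little in each copy.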