Zope Data.fs too large to pack anymore

My site's database has grown so large that it can't even be packed anymore. When I try to pack it, it says there is insufficient space.
I've tried deleting some files on the server, but to no avail: the database itself takes up the whole disk.
How do I proceed from here? The website is basically stuck, since it can't add more data while the disk is full.
Additional details
The server is on Ubuntu 11.10

Copy the Data.fs file to another machine with more disk space and pack it there. Then copy the smaller file back to the server, bring the site down, and move the packed version into place.
Depending on how much downtime you are willing to tolerate, you could remove the large unpacked Data.fs file first, then copy the replacement over.
If you are using blob storage with your site, you'll have to include that when copying your ZODB across.
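If you pack on the copy rather than through the ZMI, a minimal sketch using the plain ZODB API might look like this. It assumes a stock FileStorage, and the path is a placeholder for wherever you copied the file; note that packing writes a packed copy (Data.fs.pack) alongside the original before swapping it in, so the second machine needs that much headroom too.
```python
# Minimal sketch: pack a copied Data.fs with the ZODB API on a machine that
# has enough free space. The path below is a hypothetical location.
from ZODB.FileStorage import FileStorage
from ZODB import DB

storage = FileStorage('/srv/spare-disk/Data.fs')  # wherever you copied it
db = DB(storage)
db.pack()          # pack away all non-current revisions up to "now"
db.close()
```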

After a few weeks, I returned to this problem and have finally fixed it.
It is similar to @MartijnPieters' idea, but I approached the problem differently.
My Zope instance was on /dev/sda6, and that filesystem was full. I simply increased its size from 27G to 60G and THEN packed my Data.fs file.
I used GParted on my machine, but only because /dev/sda6 is a plain Linux partition. If you're running LVM, you might need to use lvextend and resize2fs instead.
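One thing worth checking before packing again (this is my own assumption about headroom, not something from the answers above): the pack writes a packed copy next to Data.fs, so the resized filesystem should have at least roughly that much space free. A quick check in Python:
```python
# Quick check of free space on the filesystem holding Data.fs before packing;
# the mount point below is a hypothetical placeholder.
import os

st = os.statvfs('/srv/zope')              # mount point of the instance disk
free_gb = st.f_bavail * st.f_frsize / 1e9
print(f'{free_gb:.1f} GB available')
```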

Related

passing large data to Titanium.Utils.md5HexDigest

I am trying to calculate the MD5 hash of large files (about 60 MB or more). The device, a Nexus 7 with 1 GB of RAM and 16 GB of storage, is not able to allocate anything more than about 30 MB, and the code fails with a java.lang.OutOfMemoryError.
I can't find any way to feed data piecemeal to Titanium.Utils.md5HexDigest(); it needs the whole data at once.
Is there any way to work around this problem?
I have searched the Marketplace for modules that would help me do this, but I haven't found any.
You mentioned the hash is used to determine whether or not to download the file again, so it comes from a server somewhere.
Instead of recalculating the MD5, store it in the app when you download the file in the first place, and then just compare the stored MD5 hash with the one on the server. This saves you a lot of trouble, avoids recalculating altogether, and speeds up the app tremendously.
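For reference, the piecemeal hashing the question asks about looks like this in Python (not Titanium; the question already notes that Titanium.Utils.md5HexDigest() only accepts the whole payload, so treat this as an illustration of the chunked approach rather than a drop-in fix):
```python
# Illustration of incremental (chunked) MD5 hashing, so the whole file never
# has to sit in memory at once. Python's hashlib stands in for the Titanium
# API here purely to show the idea.
import hashlib

def md5_of_file(path, chunk_size=1 << 20):      # 1 MB chunks
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()
```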

Extracting Data from a VERY old unix machine

Firstly, apologies if this question reads like a wall of text; I can't think of a better way to format it.
I have a machine with valuable data on it (circa 1995). The machine runs Unix (SCO OpenServer 6) with some sort of database stored on it.
The data is normally accessed via a software package whose license has expired and whose developers are no longer trading.
The software package connects to the machine via telnet to retrieve and modify data (the telnet connection no longer functions because the license changed).
I can access the machine over the network via an ODBC driver (SeaODBC.dll), which is how I was planning to extract the data, but so far I have retrieved 300,000 rows in just over 24 hours. I estimate there are around 50,000,000 rows in total, so at the current speed it would take 6 months!
I need either a quicker way to extract the data from the machine via ODBC, or a way to export the entire DB locally on the machine to an external drive, a network drive, or some other external destination.
I've poked around the Unix side, and the only large files I can find are in a massive matrix of single-character folders (e.g. A\G\data.dat, A\H\Data.dat, etc.).
Does anyone know how to find out which DB systems are installed on the machine? Hopefully it's a standard one and I'll be able to find a way to export everything into a nicely formatted file.
Edit
Digging around the file system, I have found a folder under root > L which contains lots of single-letter folders; each single-letter folder contains more single-letter folders.
There are also files named after the table I need (e.g. "ooi.r") which have the following format:
<Id>
[]
l for ooi_lno, lc for ooi_lcno, s for ooi_invno, id for ooi_indate
require l="AB"
require ls="SO"
require id=25/04/1998
{<id>} is s
sort increasing Id
I don't recognize those kinds of filenames, A\G\data.dat and so on (filenames with backslashes in them?), and it's likely a proprietary format, so I wouldn't expect much from that avenue. You can try running the file command on them to see whether they are in any recognized format, just to check.
I would suggest improving the speed of data extraction over ODBC by virtualizing the system. A modern computer will have faster memory, faster disks, and a faster CPU, and may be able to extract the data a lot more quickly. You will have to take a disk image of the old system in order to virtualize it, but hopefully a single sequential pass reading everything off its disk won't be too slow.
I don't know what the architecture of this system is, but I guess it is x86, which means it might not be too hard to virtualize (depending on how well SCO OpenServer 6 agrees with the virtualization). You will have to use a hypervisor that supports full virtualization (not paravirtualization).
I finally solved the problem. Running the query with another tool (not through MS Access or MS Excel) was massively faster; I ended up using DaFT (Database Fishing Tool) to SELECT INTO a text file, and it processed all 50 million rows in a few hours.
It seems the DLL driver I was using doesn't work well with any MS products.
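If anyone hits the same wall and would rather script the bulk export than use a GUI tool, a rough sketch with pyodbc and batched fetches might look like the following. The DSN and table name are hypothetical placeholders, and I can't say how well the SeaODBC driver in particular behaves with it:
```python
# Rough sketch: stream a large table out over ODBC in batches and write it
# straight to CSV, avoiding row-by-row round trips. DSN and table name are
# placeholders for illustration only.
import csv
import pyodbc

conn = pyodbc.connect('DSN=old_sco_box')
cursor = conn.cursor()
cursor.execute('SELECT * FROM ooi')

with open('ooi_export.csv', 'w', newline='') as out:
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cursor.description])  # header row
    while True:
        batch = cursor.fetchmany(10000)
        if not batch:
            break
        writer.writerows(batch)

conn.close()
```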

Cocoa apis reporting incorrect values for free space, what should I use?

Does anyone know what APIs Apple is using for its Get Info panel to determine free space in Lion? All of the code I have tried returns a different Available Space value than Apple is reporting; even Quick Look doesn't display the same space that Get Info shows. This seems to happen after I delete a bunch of files and then try to read the available space.
When I use NSFileManager -> NSFileSystemFreeSize I get 42918273024 bytes.
When I use NSURL -> NSURLVolumeAvailableCapacityKey I get 42918273024 bytes.
When I use statfs -> buffer.f_bsize * buffer.f_bfree I get 43180417024 bytes.
statfs gets similar results to Quick Look, but how do I match Get Info?
You are probably seeing the result of local Time Machine snapshot backups. The following quotes are from the Apple Support article OS X Lion: About Time Machine's "local snapshots" on portable Macs:
Time Machine in OS X Lion includes a new feature called "local snapshots" that keeps copies of files you create, modify or delete on your internal disk. Local snapshots complement regular Time Machine backups (that are stored on your external disk or Time Capsule), giving you a "safety net" for times when you might be away from your external backup disk or Time Capsule and accidentally delete a file.
The article finishes by saying:
Note: You may notice a difference in available space statistics between Disk Utility, Finder, and Get Info inspectors. This is expected and can be safely ignored. The Finder displays the available space on the disk without accounting for the local snapshots, because local snapshots will surrender their disk space if needed.
It looks like all the programmatic methods of measuring available disk space that you have tried give the true free space value on the disk, not the space that can be made available by removing local Time Machine backups. I doubt command line tools like df have been made aware of local Time Machine backups either.
This is a bit of a workaround, not a real API, but the good old Unix command df -H will get you the same information as the Get Info panel; you just need to select the line for your disk and parse the output.
The df program has many other options that you might want to explore. In this particular case the -H switch tells it to print the numbers in human-readable format using base-10 sizes.
Take a look here on how to run command lines from within an app and get the output inside your program: Execute a terminal command from a Cocoa app
I believe the underpinnings of both df and the Get Info panel are very likely the same thing.
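For what it's worth, here is a sketch of the parsing step in Python rather than Objective-C; in the app you would launch df via NSTask (as in the linked question) and parse its stdout the same way. The column index is an assumption based on df's usual output and may differ between platforms:
```python
# Illustration of parsing `df -H` output to get the "Avail" figure for a
# volume, as suggested above. Column layout can vary, and very long device
# names can wrap the line, so treat the index below as an assumption.
import subprocess

def available_on(volume='/'):
    out = subprocess.check_output(['df', '-H', volume], text=True)
    data_line = out.strip().splitlines()[1]   # first line is the header
    return data_line.split()[3]               # the "Avail" column, e.g. "43G"

print(available_on('/'))
```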

How to maintain lucene indexes in azure cloud-app

I just started playing with the Azure Library for Lucene.NET (http://code.msdn.microsoft.com/AzureDirectory). Until now I was using my own custom code for writing Lucene indexes to an Azure blob: I would copy the blob to the local storage of the Azure web/worker role and read/write documents to the index there, with a custom locking mechanism to make sure reads and writes to the blob don't clash. I am hoping the Azure Library will take care of these issues for me.
However, while trying out the test app, I tweaked the code to use the compound-file option, and that created a new file every time I wrote to the index. Now, my question is: if I have to maintain the index, i.e. keep a snapshot of the index file and use it if the main index gets corrupted, how do I go about it? Should I keep a backup of all the .cfs files that are created, or is handling only the latest one fine? Are there API calls to clean up the blob so that only the latest file is kept after each write to the index?
Thanks
Kapil
After I answered this, we ended up changing our search infrastructure and used Windows Azure Drive. We had a worker role which would mount a VHD backed by blob storage and host the Lucene.NET index on it. The code checked that the VHD was mounted and that the index directory existed. If the worker role fell over, the VHD would automatically dismount after 60 seconds and a second worker role could pick it up.
We have since changed our infrastructure again and moved to Amazon with a Solr instance for search, but the VHD option worked well during development. It could have worked well in test and production too, but requirements meant we needed to move to EC2.
I am using AzureDirectory for full-text indexing on Azure, and I am getting some odd results too... but hopefully this answer will be of some use to you.
Firstly, the compound-file option: from what I am reading and figuring out, the compound file is a single large file with all the index data inside. The alternative to this is having lots of smaller files (configured using the SetMaxMergeDocs(int) function of IndexWriter) written to storage. The problem with this is that once you get to lots of files (I foolishly set this to about 5000) it takes an age to download the indexes (on the Azure server it takes about a minute; on my dev box... well, it's been running for 20 minutes now and still isn't finished...).
As for backing up indexes, I have not come up against this yet, but given we have about 5 million records currently, and that will grow, I am wondering about it too. If you are using a single compound file, maybe downloading the files to a worker role, zipping them, and uploading them with today's date would work... If you have a smaller set of documents, you might get away with re-indexing the data if something goes wrong... but again, it depends on the number.
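A rough sketch of that zip-and-date idea, in Python rather than .NET just to show the shape of it (the directory paths are placeholders, and the upload to blob storage would be a separate step):
```python
# Snapshot a local copy of the Lucene index directory into a dated zip
# archive, which can then be uploaded to blob storage as a backup.
import datetime
import shutil

def backup_index(index_dir='/local/lucene-index', backup_dir='/local/backups'):
    stamp = datetime.date.today().isoformat()
    # make_archive appends ".zip" and returns the full path of the archive
    return shutil.make_archive(f'{backup_dir}/index-{stamp}', 'zip', index_dir)

print(backup_index())
```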

Automatic background sync while offline

I'm working on some documents on a laptop which is sometimes offline (it runs Windows XP).
I'd like the documents to be backed up automatically to a remote location, with the backup running in the background.
I want to edit the documents and forget about backing up, and once online have everything backed up to a remote location, or even better, to an SVN server or something else that supports versioning.
I want something which is:
1. free
2. doesn't overload the network too much, but only sends the diff
3. works 100%
thanks in advance
Dropbox does everything you're asking: http://www.getdropbox.com/
Plus it's fully cross platform, Windows, Mac, Linux.
Free up to 2GB.
I personally like IDrive Online. It's Windows/Mac (no Linux), 2 GB free, and here's the important bit: it stores the last 30 revisions of files. The history for a file isn't counted against your space either; only the most recent version counts. None of the other free online backup solutions I've seen handle versioning as well. It supports continuous backup too. Oh, and it handles being offline beautifully; even losing the connection in the middle of a backup doesn't bother it.
Dropbox also keeps previous revisions, including of deleted files.