Can I split a gzipped Content-Encoding body across multiple chunks? Are there any known implementations that fail with this?
This looks to be common practice on the internet, so I doubt there would be a problem.
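To illustrate the practice I mean, here is a minimal Java sketch of the reading side (the URL is a placeholder). Chunking happens at the transfer-coding layer, below the content-coding, so the HTTP layer reassembles the chunks before the bytes ever reach the gzip decoder; the chunk boundaries are invisible to it:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.zip.GZIPInputStream;

    public class GzipOverChunked {
        public static void main(String[] args) throws Exception {
            // Placeholder endpoint; assume it answers with
            // Content-Encoding: gzip and Transfer-Encoding: chunked.
            URL url = new URL("https://example.com/resource");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestProperty("Accept-Encoding", "gzip");

            // The chunked transfer coding is undone by the HTTP layer,
            // so the gzip stream below sees one continuous compressed body.
            try (BufferedReader in = new BufferedReader(new InputStreamReader(
                    new GZIPInputStream(conn.getInputStream()), "UTF-8"))) {
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println(line);
                }
            }
        }
    }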
I need to upload multiple files to S3 from a Java application. The catch is that we need all the files written atomically, i.e. all or nothing.
I am unable to find any solution for that.
Any suggestions are welcome.
Thanks!
S3 is an eventually consistent store, so you'll need some commit mechanism such as a _commit marker. The Parquet format and others do this for you. The format options depend on your readers; for example, there is no Redshift bulk loader for Parquet, so Avro is a better format for that use case.
What common formats are supported by all systems that need to work with these files?
To date, the only elegant solution I could find was reading the data into a DataFrame (using the Spark libraries) and writing it out from there.
I also implemented a check for a commit file (say, _commit) for locking/sync purposes, which is essentially what the Spark APIs do as well.
Hope that helps. If anyone has another solution, please do share. :)
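To make the commit-marker idea concrete, here is a rough sketch using the AWS SDK for Java (v1); the bucket and prefix are placeholders. The data files are uploaded first and an empty _commit object is written last, so readers that only act once _commit exists see either the whole batch or nothing. Note this only gives all-or-nothing visibility to cooperating readers; S3 itself has no multi-object transaction.

    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3ClientBuilder;

    import java.io.File;
    import java.util.List;

    public class AtomicBatchUpload {
        private static final String BUCKET = "my-bucket";          // placeholder
        private static final String PREFIX = "batches/batch-001/"; // placeholder

        public static void upload(List<File> files) {
            AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

            // 1. Upload every data file under a batch-specific prefix.
            for (File f : files) {
                s3.putObject(BUCKET, PREFIX + f.getName(), f);
            }

            // 2. Write the marker last; readers must ignore the prefix
            //    until this object exists.
            s3.putObject(BUCKET, PREFIX + "_commit", "");
        }

        public static boolean isCommitted(AmazonS3 s3) {
            return s3.doesObjectExist(BUCKET, PREFIX + "_commit");
        }
    }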
Does anyone know the best way to split up large files in VB.NET? These files can be in excess of 10 GB. I have found ways of doing it by googling all day! Most of the solutions I have found almost work, but what I really want to know is: what is the most efficient way to do this?
Many thanks
I don't know whether you have already stumbled on the following topics, for example. If so, did they not fulfil your wishes, or would you like other alternatives?
How to split a large file into smaller files using VB.NET 2003?
Split large file into smaller files by number of lines in C#?
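Both threads boil down to the same technique: copy through a fixed-size buffer instead of loading the whole file into memory. A rough sketch of that approach, shown in Java here for illustration (the equivalent stream classes exist in VB.NET under System.IO), with an arbitrary 256 MB part size:

    import java.io.BufferedInputStream;
    import java.io.BufferedOutputStream;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;

    public class FileSplitter {
        // Arbitrary choices for illustration: 256 MB parts, 1 MB copy buffer.
        private static final long PART_SIZE = 256L * 1024 * 1024;

        public static void split(String path) throws IOException {
            byte[] buffer = new byte[1024 * 1024];
            try (BufferedInputStream in = new BufferedInputStream(new FileInputStream(path))) {
                int part = 0;
                int read = in.read(buffer);
                while (read != -1) {
                    // Each pass writes one part file until the size limit or EOF.
                    try (BufferedOutputStream out = new BufferedOutputStream(
                            new FileOutputStream(path + ".part" + part++))) {
                        long written = 0;
                        while (read != -1 && written < PART_SIZE) {
                            out.write(buffer, 0, read);
                            written += read;
                            read = in.read(buffer);
                        }
                    }
                }
            }
        }
    }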
Are there any alternative implementations of Lucene's (FS)Directory, notably ones related to replication? What I am looking to do (though I'm looking for something existing before implementing my own :) is a Directory that writes to multiple identical directories at the same time. The idea behind it is that I can't deploy a DFS or SAN, so I'm thinking of a sort of "manual" replication to another node with the minimum possible delay. Thoughts?
Many thanks!
Usually people use Solr for this. If you can't use Solr's replication functionality, you can do what people did before Solr: rsync your directories.
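If you take the rsync route, here is a minimal sketch of how it could be wired up from Java (the paths and host are placeholders; this assumes rsync is installed and that you sync right after IndexWriter.commit() so the replica gets a consistent snapshot):

    import java.io.IOException;

    public class IndexReplicator {
        // Placeholders: local index path and the replica target.
        private static final String LOCAL_INDEX = "/var/lucene/index/";
        private static final String REMOTE_TARGET = "searchreplica:/var/lucene/index/";

        /** Call right after IndexWriter.commit() so the replica sees a consistent snapshot. */
        public static void replicate() throws IOException, InterruptedException {
            Process p = new ProcessBuilder(
                    "rsync", "-a", "--delete", LOCAL_INDEX, REMOTE_TARGET)
                    .inheritIO()
                    .start();
            if (p.waitFor() != 0) {
                throw new IOException("rsync exited with " + p.exitValue());
            }
        }
    }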
I have some binary data (blobs) from a database, and I need to know what compression method was used on them to decompress them.
How do I determine what method of compression has been used?
Actually it is easier than that. Assuming one of the standard methods was used, there are probably some magic bytes at the beginning. I suggest taking the hex values of the first 3-4 bytes and asking Google.
It makes no sense to develop your own compression, so unless the case was special (or the programmer foolish), one of the well-known compression methods was used. You could also take the libraries of the most popular ones and just try which one decompresses the data.
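A quick sketch of that check in Java, covering a few of the most common signatures (gzip, zip, bzip2, and the usual zlib headers); anything not matched here can still be looked up by its first bytes:

    import java.io.FileInputStream;
    import java.io.IOException;

    public class MagicBytes {
        public static String guessCompression(String path) throws IOException {
            byte[] head = new byte[4];
            int n;
            try (FileInputStream in = new FileInputStream(path)) {
                n = in.read(head);
            }
            if (n < 2) {
                return "too short to tell";
            }
            int b0 = head[0] & 0xFF, b1 = head[1] & 0xFF, b2 = head[2] & 0xFF;
            if (b0 == 0x1F && b1 == 0x8B)                               return "gzip";
            if (b0 == 0x50 && b1 == 0x4B)                               return "zip (PK)";
            if (b0 == 0x42 && b1 == 0x5A && b2 == 0x68)                 return "bzip2";
            if (b0 == 0x78 && (b1 == 0x01 || b1 == 0x9C || b1 == 0xDA)) return "zlib (likely)";
            return String.format("unknown, first bytes: %02X %02X %02X %02X",
                    b0, b1, b2, head[3] & 0xFF);
        }
    }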
The only way to do this, in general, would be to store which compression method was used when you store the BLOB.
Starting from the blob in the database you can do the following:
Store it in a file. For my use case I used DBeaver to export multiple blobs to separate files.
Find out more about the magic numbers in the file by running
file -i filename
In my case the files were reported as application/zlib; charset=binary.
I have a webpage where I have to allow users to customize their header and footer.
That is, I should store each user's header and footer HTML and add it to the webpage dynamically. I have two options: storing it in a database or storing it in files. Please suggest which approach is better.
A solution based on files gets messier with time. With a database, it is easier to scale.
With a database, you can add bookkeeping fields (like last-modified, tags, or something else depending on your needs). Backups are arguably easier as well.
With files, you have to worry about directory structure (having too many files in a single directory is not good), permissions, etc.
If you are worried about efficiency, stop worrying :). MySQL queries are pretty fast, especially with the caching mechanisms/modules in Apache.
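If you go with the database, here is a minimal sketch of the lookup over JDBC (the user_layout table and its columns are made up for illustration):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public class LayoutDao {
        // Hypothetical table:
        //   CREATE TABLE user_layout (
        //       user_id       INT PRIMARY KEY,
        //       header_html   MEDIUMTEXT,
        //       footer_html   MEDIUMTEXT,
        //       last_modified TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        //   );
        private final Connection conn;

        public LayoutDao(Connection conn) {
            this.conn = conn;
        }

        public String[] loadHeaderAndFooter(int userId) throws SQLException {
            String sql = "SELECT header_html, footer_html FROM user_layout WHERE user_id = ?";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setInt(1, userId);
                try (ResultSet rs = ps.executeQuery()) {
                    if (rs.next()) {
                        return new String[] { rs.getString(1), rs.getString(2) };
                    }
                    return new String[] { "", "" }; // fall back to empty header/footer
                }
            }
        }
    }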
There is no universally better approach (there are pros and cons in general), but in this specific case I would store these snippets as files, because you definitely have less complexity (you don't need to query a database and fetch results) and you don't rely on a database connection just to include the header and footer.
If you're using .NET, it has something called portals which does the same thing. There are also master pages that you may want to read about. But all of these are in .NET. Even if you're not doing this in .NET, it would be time-consuming to handle all this on your own, as you need to take care of cross-site scripting and a few other issues.
Check the platform you're working on for features that make this possible. (Let me know the platform you're using so I may help with that.) Also, if the changes are just cosmetic, you may store just CSS settings instead of complete HTML.
Finally, it would be better to use SQL if the number of changes to store grows beyond a hundred or so, as the complexity will bog you down. But if you have few users and don't expect to scale up, then by all means go for the file system.
Here are a couple of links for understanding portals and web parts in .Net:
http://msdn.microsoft.com/en-us/magazine/cc300767.aspx
http://msdn.microsoft.com/en-us/magazine/cc163965.aspx