What should I store in my blob field? - sql

I try to use blobs as little as possible. When possible I replace blobs with a link to a file. I can only think of a few times that I needed a blob. I used to put zipped, pickled Python objects in there, but these days I use Mongo or CouchDB for that. One thing I still use it for is to store WKB (GIS) objects. This made me wonder: what do other people put in their blob fields?
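For context, here is a minimal sketch of what storing WKB in a blob column can look like, assuming Shapely for the geometry handling and SQLite purely as a stand-in database:

import sqlite3
from shapely.geometry import Point
from shapely import wkb

con = sqlite3.connect("gis.db")
con.execute("CREATE TABLE IF NOT EXISTS features (id INTEGER PRIMARY KEY, geom BLOB)")

# Serialize a geometry to WKB bytes and store it in the BLOB column
point = Point(12.5, 41.9)
con.execute("INSERT INTO features (geom) VALUES (?)", (point.wkb,))
con.commit()

# Read it back and deserialize
row = con.execute("SELECT geom FROM features LIMIT 1").fetchone()
geometry = wkb.loads(row[0])
print(geometry)  # POINT (12.5 41.9)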

Whatever binary data needs to be stored - typically images & documents (Word, PDF).

They have disadvantages, so I have tried to avoid them, especially now that FILESTREAM exists in SQL Server 2008.

A BLOB accepts any data; everything stored in a BLOB is kept in whole disk pages, in disk areas separate from the normal row data. A BLOB can normally hold any data your program generates: images, graphics, video, audio, or documents of various types.

Related

Load file from Cloud Storage to BigQuery to single string column

We are designing a new ingestion framework (Cloud Storage -> BigQuery) using Cloud Functions. However, we receive some files (JSON, CSV) that are corrupted and cannot be inserted as-is (bad field names, missing columns, etc.), not even as external tables. Therefore, we would like to ingest every row into one cell as a JSON string and deal with the issues when we cleanse the data in BigQuery.
Is there a way to do that natively and efficiently, with as little processing as possible (so the Cloud Function wouldn't time out)? I wrote a function that processes the files and wraps the lines one by one, but for bigger files that won't be an option. We would prefer to stay with Cloud Functions to keep this as lightweight as possible.
My option in that case is to ingest the CSV with a dummy separator, for instance # or |. I know those characters will never appear in the data, and that's why I chose them.
That way, schema auto-detect finds only one column and creates a table with a single string column.
If you can pick a character like that, it's the easiest solution, though without any guarantee of course (the files are corrupted, so it's hard to know in advance which characters will be unused).
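As an illustration, a hedged sketch of such a load using the google-cloud-bigquery Python client; the project, dataset, table, and bucket path below are hypothetical placeholders:

from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical destination table and source file
table_id = "my_project.staging.raw_lines"
uri = "gs://my-bucket/incoming/broken_file.csv"

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    field_delimiter="|",          # a character we expect never to appear in the data
    quote_character="",           # don't treat quotes specially
    schema=[bigquery.SchemaField("raw_line", "STRING")],
    autodetect=False,
)

load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
load_job.result()  # wait for completion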

Genexus 15, save PDFs, GIFs, JPGs, WORD documents

I need to save PDFs, GIFs, TIFs, JPGs, etc. How can I do this in Genexus 15, compiling in C#?
After this I have to show the saved documents in a form.
Thank you..
PS: I'm new to using Genexus...
This question seems too broad to answer... Will those files be stored in the database or the file system? Or perhaps in an external storage like Amazon's S3?
Will the application store different file types in the same database column, or will there be a field storing images, another one for PDFs, etc.?
Anyway, here are some documents that may be of some help:
Blob data type for storing any file in the database (or see BlobFile data type if using GeneXus Tero, in pre-beta at this moment...)
File data type for storing files in the file system
Image data type for storing image files in the database (there is also Audio and Video which work exactly the same way)
External Storage for Multimedia explains how to store multimedia files in an external service.
Hope this helps...

Creating a test-data container in Azure blob storage

I'm adding some testing to my current project which uses Azure blob storage to store telemetry data coming from a stream analytics job. I want to do testing of the routines that get the telemetry data, so I created a separate container for test data. I downloaded a sample set of data, modified the data to serve my needs and re-uploaded (using Azure storage explorer) everything back into the new container.
The tests were immediately failing and I quickly found out that this is because the LastModified date of the files changed into the date/time of upload. This is fine, but the sequence of the upload was also different. My code uses the modified date of the file to find out which one is the most recent, which would now return a different file based on the new dates.
I found that you cannot modify this property, although you can change another property to have it update. So I know the solution: I could write a quick script which gets the sequence of files from my production instance and then touches every file in the test instance in the same sequence.
But... I was wondering whether this is the best option. I also read it's 'best practice' to store a custom datetime in a separate property, but I don't think I can do that straight from Stream Analytics (which is writing the blobs). I also considered using an Azure Function to do this (new blob => update property), but then I'm adding complexity and something that might fail for whatever reason.
So I'm looking for the best way to solve this problem. Anyone?
Update: this one probably deserves a tiny bit more explanation. Apart from using the LastModified date to sort on, I also use it to filter blobs. The blobs themselves are CSV files containing ASA output data, so telemetry records. Each record has a timestamp, but that information is inside the file. When retrieving data, I don't want to have to dive into each file to find out what the timestamps of those records are. So I use a prefilter to filter out the blobs within a certain timespan, and then only download/open those files to get to the records inside.
This works perfectly as long as you do not touch any of the blobs, but obviously it stops working as soon as any of the blobs gets modified for whatever reason. So I'm now convinced that I need a different/better way to solve this issue; but how?
It seems to me that you have two separate things: the data that you want to store in blob storage, and metadata about the blob such as the timestamp. I would create a separate (Azure) database for the metadata or, even simpler, just add metadata to the (block) blob:
blockBlob.Metadata.Add("from", dateTime.ToString());
blockBlob.Metadata.Add("to", dateTime.ToString());
blockBlob.Metadata.Add("order", "1");
For sorting I would just add a simple order property.
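For completeness, a hedged Python sketch of the same idea using the azure-storage-blob v12 SDK, setting the metadata and then reading it back when listing; the connection string, container, and blob names are hypothetical:

from azure.storage.blob import BlobServiceClient

# Hypothetical connection string and container name
service = BlobServiceClient.from_connection_string("<connection-string>")
container = service.get_container_client("telemetry-test")

# Write custom metadata on a blob
blob = container.get_blob_client("sample.csv")
blob.set_blob_metadata({"from": "2016-06-27T15:23:00Z", "order": "1"})

# List blobs with their metadata and sort on the custom property
blobs = container.list_blobs(include=["metadata"])
ordered = sorted(blobs, key=lambda b: (b.metadata or {}).get("from", ""))
for b in ordered:
    print(b.name, b.metadata)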
The comment by @Vignesh deserves the credit here, but in order to get this one a marked answer I'll provide it myself.
With ASA, you can set the output to be structured by date/time. That means in this case, data is written to the blob store with a directory structure such as:
2016 / 06 / 27 / 15 / 23 (= 27-06-2016 15:23)
2016 / 06 / 28 / 11 / 02 (= 28-06-2016 11:02)
The ASA output lets you specify how granular you want the structure to be; in my case I chose to store it by day (so not including a time path). The ASA runtime will then ensure that data from a certain point in time is stored within a blob that resides in the correct path.
I subsequently changed my logic to no longer use the datetime stamp of the individual blob files, but to simply read the files from the folders that fall within the time range I'm interested in. That ensures we only get data that was produced within that range. And if there's more than one file in a folder, I need to load them all, since they were all in the same time range anyway. As long as minutes are enough granularity for you, this works excellently, even though it might feel a bit strange to use a folder structure for such a thing.
Having a separate 'index' for blobs which tracks their datetime would work too of course, but it adds complexity which in this case I don't really need.
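To make the folder-prefix approach concrete, here is a hedged Python sketch (azure-storage-blob v12) that lists only the blobs under the date folders in a given range; the connection string and container name are hypothetical, and the path format assumes the {yyyy}/{MM}/{dd} structure described above:

from datetime import date, timedelta
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<connection-string>")
container = service.get_container_client("telemetry")

def blobs_for_range(start, end):
    # Yield blobs stored under the yyyy/MM/dd folders within [start, end]
    day = start
    while day <= end:
        prefix = day.strftime("%Y/%m/%d/")
        for blob in container.list_blobs(name_starts_with=prefix):
            yield blob
        day += timedelta(days=1)

for blob in blobs_for_range(date(2016, 6, 27), date(2016, 6, 28)):
    print(blob.name)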

what data type to use for storing different files in database

I'm trying to make a Postgres database table accept different files. The files I want to support have different MIME types: PDF, Word, plain text, and PowerPoint. The problem is that I don't know which data type to choose. The documentation for pgAdmin (the tool I'm using) is very (let's say) unsatisfactory. Thanks
While you can store the file contents in the database, consider storing the file path instead and using the file system to store the file.
In the IT world "you can do anything with anything", but that doesn't mean you should.
In this case, you're trying to use a database as a file system, which it can do, but databases are not as efficient or practical as file systems for storing file contents (typically "large" data). It will:
make your backups longer and larger
slow your insert queries down (more I/O)
make your log files larger (slower and fill more often)
make accessing the files slower (query vs simple disk I/O)
require you to go via the database to access the files (hassle, can't use browser etc)
etc
You can use the bytea type in PostgreSQL.
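If you do decide to store the contents in the database, here is a hedged sketch of inserting a file into a bytea column with psycopg2; the connection details, table, and file name are hypothetical:

import psycopg2

conn = psycopg2.connect("dbname=mydb user=myuser")  # hypothetical connection details
cur = conn.cursor()

cur.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        id        serial PRIMARY KEY,
        filename  text NOT NULL,
        mime_type text NOT NULL,
        content   bytea NOT NULL
    )
""")

with open("report.pdf", "rb") as f:
    data = f.read()

# psycopg2.Binary wraps raw bytes for the bytea column
cur.execute(
    "INSERT INTO documents (filename, mime_type, content) VALUES (%s, %s, %s)",
    ("report.pdf", "application/pdf", psycopg2.Binary(data)),
)
conn.commit()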

How does CouchDB handle data?

How is authentication handled in CouchDB? Say I create Admin users and Readers, and assign them roles. Say also that I assign them to an individual database. At the file system level, is there a way for someone who does not authenticate to look at the data that is stored in the database? Is the data stored as plain text in a file? How is this handled in CouchDB?
Through the database interface, roles are just as strong as they are in any other database. As long as they can't get hold of the files, it's absolutely as secure as your permissions and passwords. However, if they do, there's absolutely no compression or encryption built into CouchDB. Encrypt the data in your code (or your abstraction layer if you use one) if file system access control is a concern - of course anyone who gets hold of your DB filesystem could probably find your code's decryption keys, as well.
It's not a plain-text file; it's a binary file that combines the data and indices, but you could copy it to a local CouchDB install and view it that way, or just open it in a good text editor. The data chunks are stored in plain text (JSON, actually) and aren't hard to read, though binary attachments remain binary.
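If file-system access is a concern, a hedged sketch of encrypting the payload in application code before writing it to CouchDB, using the cryptography package's Fernet and plain HTTP via requests; the server URL, credentials, database, and document ID are hypothetical, and key management is left to you:

import json
import requests
from cryptography.fernet import Fernet

# Key management is up to you; anyone who can read the key can decrypt the data
key = Fernet.generate_key()
fernet = Fernet(key)

record = {"reading": 42.7, "unit": "C"}
token = fernet.encrypt(json.dumps(record).encode("utf-8"))

# Store only the ciphertext in the document body (hypothetical local CouchDB and db name)
doc = {"payload": token.decode("ascii")}
requests.put("http://admin:password@localhost:5984/telemetry/doc-001", json=doc)

# Later: fetch and decrypt
stored = requests.get("http://admin:password@localhost:5984/telemetry/doc-001").json()
plain = json.loads(fernet.decrypt(stored["payload"].encode("ascii")))
print(plain)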