Watching this video about how to design Tinder, at 06:50 a point is made about files vs. BLOBs.
I wonder what the difference is between a large binary file and a BLOB (binary large object).
Do they differ by
The method of access
The method of backup
Sharding?
What is the origin of the difference? They sound quite similar to me.
When the video refers to BLOB (binary large object), it doesn't refer to any old collection of binary information (file). According to this wikipedia article, it specifically means
collection of binary data stored as a single entity in a database management system.
The distinction lies in the "database management system". A BLOB is managed by the DBMS, although it might still be stored in a file system. Other files, however, are stored directly in the file system, and often only a URI is stored in the database.
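To make the contrast concrete, here is a purely illustrative sketch (the type and field names are made up): with a BLOB the bytes themselves live inside the record the DBMS manages, while with a plain file the record holds only a URI and the file system owns the bytes.

```swift
import Foundation

// Illustrative only: the two shapes a "photo record" can take.
struct PhotoRowWithBlob {
    let id: Int
    let imageBytes: Data   // the BLOB: binary data stored as a single entity in the DBMS
}

struct PhotoRowWithPath {
    let id: Int
    let imageURI: URL      // just a pointer; the bytes live in the file system
}
```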
Related
We want to migrate bulk files (e.g. VSAM) from the mainframe to Azure at the beginning of the project. How can that be achieved?
Is there any utility, or do we need to write our own scripts?
I suspect there are some utilities out there, but most or all of them are likely priced products. Since VSAM datasets are not defined using a language construct like DDL, you will likely have to do most of the heavy lifting yourself, either by writing your own programs or custom scripts. You didn’t mention the operating system, but I assume you’re working on z/OS.
Here are some things to consider:
The structure of the VSAM dataset is basically record oriented. There are three basic types you’ll run into that host application data:
Key Sequenced Datasets (KSDS)
Entry Sequenced Datasets (ESDS)
Relative Record datasets (RRDS)
Familiarize yourself with the means of defining the datasets, as it will give you some insight into the dataset specifics. DFSMS Access Method Services Commands will show the utilities used to create them and to get information like the key length and offset of the key. DEFINE CLUSTER is the command to create the dataset. You mentioned you are moving the data to Azure, but this will help you understand the characteristics of the data you are moving.
Since there is no DDL for VSAM datasets you will generally find the structure in the programs that manipulate them like COBOL Copybooks, HLASM DSECTs and similar constructs. This is the long pole in the tent for you.
Consider the semantics of accessing the data. VSAM as an access method does have some ability to control read/write access on a macro level using a DEFINE CLUSTER option called SHAREOPTIONS. The SHAREOPTIONS instruct the operating system how to handle the VSAM buffers in terms of reading and writing so that multiple processes can access the same data. It's primitive compared to shared file systems like NFS. VSAM allows the application to control access (or serialization) using ENQ / DEQ functions. These enable applications to express intent about a VSAM file and coordinate their own activities.
You might find that converting a VSAM file to a relational form like Db2 is better for you. Again, you’ll have to create the DDL to describe the tables, data formats and the like.
Another consideration is data conversion. You’ll find there is character data that is most likely in EBCDIC and needs to be converted to a new code page. Numeric data can be in Packed Decimal, Binary, or even text and will need to be converted.
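As a rough illustration of the numeric-conversion point, here is a minimal sketch (not any particular product's API) that decodes a packed decimal (COMP-3) field copied out of a VSAM record; EBCDIC character fields would similarly need a code-page translation table.

```swift
// Minimal sketch: decode a packed decimal (COMP-3) field.
// Each byte holds two BCD digits; the low nibble of the last byte is the sign.
func decodePackedDecimal(_ bytes: [UInt8]) -> Int? {
    guard let last = bytes.last else { return nil }
    var value = 0
    for (index, byte) in bytes.enumerated() {
        let high = Int(byte >> 4)
        let low = Int(byte & 0x0F)
        guard high <= 9 else { return nil }
        if index == bytes.count - 1 {
            value = value * 10 + high          // low nibble of the last byte is the sign, not a digit
        } else {
            guard low <= 9 else { return nil }
            value = value * 100 + high * 10 + low
        }
    }
    return (last & 0x0F) == 0x0D ? -value : value   // 0xD = negative; 0xC/0xF = positive
}

// Example: bytes 0x12 0x34 0x5C decode to +12345
```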
The short answer is there isn’t an “Easy Button” to do what you want. The data itself is only one of the questions that needs to be answered; there is also serialization and access to the data, code page conversion, and, if you are moving some data but not all of it, whether you will need to map some of the converted data back to data on the mainframe.
Consider exploring IBM CDC Classic replication. You can achieve this with a few clicks.
I have not done this for Azure, so I am not sure about support.
I am currently using ArangoDB to store all the data I'm using for my application, including images. Now I want to migrate to S3 to store the image files and transfer the files I currently have in my ArangoDB.
I am aware that the images are stored in the file system, but I am not sure how to actually transfer them to S3.
Thank you for your help
The location of the data files is implementation-specific, as it can be changed at install and startup. On Linux, the default directory is /var/lib/arangodb3.
But in my experience, backing up the raw storage files is not a good idea. I have found it very difficult to restore or access data with this method. Instead, I recommend using one of these two "official" methods:
Hot backups (enterprise-edition only)
JSON export (using arangoexport/arangoimport)
Snapshot-style "Hot backups" are really great - truly the preferred method. They have everything you would need (speed, reliability, portability, etc.), with only a few case-dependent limitations. The real downside is that it's only available in the enterprise editions (including Oasis).
JSON export is the "thrifty" backup option - I would forget about arangorestore (it does horrible things to your _id/_key values, and takes forever to do so). The good news about JSON export is that it's EXTREMELY portable. Almost ANY code-base (and even most good DB's) can work with it, so you're never locked into a single product or workflow, or even a specific version of ArangoDB (making up/down-grades much easier).
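If you go the JSON-export route, one way to get the result into S3 is simply to shell out to ArangoDB's export tool and the AWS CLI. A rough sketch, assuming arangoexport and the aws CLI are installed and configured, and using placeholder collection, directory and bucket names:

```swift
import Foundation

// Sketch: export the "images" collection to JSON, then copy it to S3.
func exportAndUpload() throws {
    let export = Process()
    export.executableURL = URL(fileURLWithPath: "/usr/bin/env")
    export.arguments = ["arangoexport", "--type", "json",
                        "--collection", "images",
                        "--output-directory", "/tmp/arango-export"]
    try export.run()
    export.waitUntilExit()

    let upload = Process()
    upload.executableURL = URL(fileURLWithPath: "/usr/bin/env")
    upload.arguments = ["aws", "s3", "cp", "/tmp/arango-export",
                        "s3://my-image-bucket/", "--recursive"]
    try upload.run()
    upload.waitUntilExit()
}
```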
This question has been asked many times, and I have read many users saying that it is not advisable to store images in a DB, in particular within Core Data. But they all seem to omit the reason why. Even Apple's documentation states this, everybody points in that direction, and every discussion ends like this: "well, you can, but storing the path is better".
Apart from opinions, I would like to have a concrete example of why it is not a good solution.
Let me explain better: I have a strong background in building web applications. A concrete example from my point of view would be: do not store images in a DB, but rather the path to them, because you can have them served by the web server, which can apply all of its caching features.
But in a desktop environment, and especially in an iOS application, what are the downsides of storing them in Core Data using SQLite, provided that:
There's a separate entity holding the images; it is not an attribute of the main entity
There also seems to be a limit of 100 KB for images. Why? What happens with 110, 120, ... 200 KB images, etc.?
thanks
There's nothing special about what Core Data normally does here. It's just using an SQLite database. You can put large blobs of data into it, but it just doesn't scale all that well. You can read more about it here: Internal Versus External BLOBs in SQLite.
That said, Core Data has support for external blobs, which in Core Data terminology is called storing in an external record file (iOS 5.0 and later). Again, there's nothing magic about it; it's just storing the large pieces of data in the file system separately from the SQLite db itself. The benefit is that Core Data updates all this for you.
When you're in Xcode, there'll be a checkbox called Allows External Storage that you can check for Binary Data properties.
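If you build your model in code rather than in the Xcode editor, the same setting is exposed on NSAttributeDescription. A minimal sketch (entity and attribute names are just examples):

```swift
import CoreData

// Sketch: a Binary Data attribute with external storage allowed,
// equivalent to ticking "Allows External Storage" in Xcode.
let imageAttribute = NSAttributeDescription()
imageAttribute.name = "imageData"
imageAttribute.attributeType = .binaryDataAttributeType
imageAttribute.allowsExternalBinaryDataStorage = true

let imageEntity = NSEntityDescription()
imageEntity.name = "Image"
imageEntity.properties = [imageAttribute]

let model = NSManagedObjectModel()
model.entities = [imageEntity]
```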
The filesystem, and the APIs surrounding it, are (just like a web server) optimized to serve files of any size and to apply caching where appropriate.
CoreData is optimized for handling an object graph with tiny pieces of data, like integers and short strings.
Also, there are a number of other issues that tend to creep up on you, like periodically vacuuming the SQLite database CoreData uses, or it won't be able to shrink, just grow.
Leonardo,
With Lion/iOS 5, Core Data started handling file system storage of large BLOBs for you.
The choice is really determined by how many images you are going to have open. If you have many, then you should keep them in the DB. Why? Because you only have a modest number of file descriptors, one of which is used for each open image stored in the file system.
That said, there is still a reason to manage the files yourself. If your BLOBs are really big, say 2+ MB, you will want to map them into memory and not just read them in. (When the memory warnings come, this lets the OS automatically purge them from your resident memory. This is a very good thing.) Even so, you still have the limited number of file descriptors problem.
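To illustrate the mapping point (the path below is a placeholder): Foundation's Data can map a file instead of reading it, so the OS can drop and re-fault the pages under memory pressure.

```swift
import Foundation

// Sketch: map a large image file rather than reading it all into memory.
let url = URL(fileURLWithPath: "/path/to/large-image.png")
if let blob = try? Data(contentsOf: url, options: .mappedIfSafe) {
    print("Mapped \(blob.count) bytes; pages are faulted in only as they are touched")
}
```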
Andrew
I'm trying to see what the best way to store large amounts of text (more than 255 characters) in Cocoa would be. Being a big fan of Core Data, I would assume there's an effective way to do so. However, I feel like 'string' is the wrong data type for this type of thing. Does anyone have any info on this? I don't see an option for BLOB in Core Data
Well, you can't very well compress the text or store it as a binary that must be translated, otherwise you give up SQLite's querying speed (because all text-stored-as-binary-encoded-data records must be read into memory, translated/decompressed, then searched). Otherwise, you'd have to mirror (and maintain) the text-only representation in your Core Data store alongside the more full-featured stuff.
How about a hybrid solution? Core Data stores all but the actual text; the text itself is archived on the file system, one file per entry in Core Data, with each file named for its unique identifier in the Core Data store. This way a search could do two things (in the background, of course): search the Core Data store for things like titles, dates, etc.; and search the files (maybe even with Spotlight) for content. If there's a file search match, its file name is used to find the matching record in Core Data for display in your app's UI.
This lets you leverage your app-specific internal search criteria and Spotlight's programmatic asynchronous search. It's a little more work, granted, but if you're talking about a LOT of text, I can't think of a better way.
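A rough sketch of that hybrid idea, assuming a "Note" entity with "title" and "fileName" string attributes (all names are illustrative):

```swift
import CoreData

// Sketch: Core Data keeps the metadata; the text itself lives in one file per record.
func saveNote(title: String, body: String, in context: NSManagedObjectContext) throws {
    let note = NSEntityDescription.insertNewObject(forEntityName: "Note", into: context)
    let fileName = UUID().uuidString + ".txt"

    // Write the searchable text to disk, one file per Core Data entry.
    let directory = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask)[0]
    try body.write(to: directory.appendingPathComponent(fileName), atomically: true, encoding: .utf8)

    // Store only the metadata plus the file name used to find the text again.
    note.setValue(title, forKey: "title")
    note.setValue(fileName, forKey: "fileName")
    try context.save()
}
```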
The BLOB data type is called "Binary data" in Core Data. As middaparka has pointed out, the Core Data Programming Guide offers some guidance on how to deal with binary data in Core Data. Depending on your requirements, an alternative to using BLOBs would be to just store references to files on disk.
I'd recommend a read of Apple's Core Data Programming Guide (specifically the "Core Data Performance" section). This specifically mentions BLOBs (see the "Large Data Objects (BLOBs)" section) and gives some, albeit vague, guidelines.
What is the best practice for storing large photos/text files in SQL Server? Barring the need for scalability, assume we are just working with one server.
I feel that storing a file path in SQL, as opposed to a BLOB, is better. Is this true? If we had to scale the software, should we still follow this method?
It depends on the size of the files.
There is a good Microsoft white paper on the subject, here.
objects smaller than 256K are best stored in a database while objects larger than 1M are best stored in the filesystem. Between 256K and 1M, the read:write ratio and rate of object overwrite or replacement are important factors
Of course, their conclusions are specific to SQL Server (2005 and 2008 R2).
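Expressed as a toy rule of thumb (thresholds taken from the quote above; the middle band genuinely depends on your read:write ratio and overwrite rate):

```swift
// Toy illustration of the white paper's guidance.
func preferDatabaseStorage(forObjectOfSize size: Int) -> Bool? {
    switch size {
    case ..<(256 * 1024):   return true    // under 256 KB: database
    case (1024 * 1024)...:  return false   // over 1 MB: filesystem
    default:                return nil     // 256 KB to 1 MB: it depends
    }
}
```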
It's a bad idea, unless you have some very specific reason to store files in the database. Already discussed here: Storing Images in DB - Yea or Nay?
If you still insist, read the best practices for doing so :) here: Best Practices for uploading files to database
It's mostly a question of using the right tool for the job. A lot of time and effort has been put into optimizing a relational database for the purpose of storing relational data. A lot of time and effort has been put into optimizing file systems for the purpose of storing files.
The former can be used to perform part of the job of the latter, but unless there's a really good reason not to use the latter, it's the tool better suited to the job. In nearly every case I've come across, storing the file path (and other relevant information about the file you may want) in the DB and the actual file on the FS is the approach better suited to the tools available.