Storing PDFs and Images: nvarchar(max) vs. varbinary(max) - sql-server-2012

We are designing a database that needs to store several versions of a file (PDF/image/reduced image) in a table. The powers that be have opted against using FILESTREAM for whatever reason, so this is not up for debate.
I can't find anything online that indicates the appropriate data type for storing PDF and image data. Either that, or I'm just searching for it badly.
I'm not trying to start a debate, so I'm not looking for opinionated responses. Rather, I'm trying to find out whether one or the other was actually designed for what I'm trying to do. If either will work, that's all I need to know.

Given your binary choice of nvarchar vs. varbinary, there's no choice: it's varbinary. nvarchar is for storing Unicode character data. varbinary stores a bit-perfect copy of the data you put in there. PDFs and images are binary file types, so varbinary it is.
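To make the "bit-perfect copy" point concrete, here is a small sketch. SQL Server isn't available in this snippet, so it makes two loud assumptions: Python's built-in sqlite3 BLOB column stands in for varbinary(max), and a UTF-16 decode/encode stands in for pushing bytes through an nvarchar column. The table and column names are made up, and real client libraries may raise an error instead of silently substituting characters.

```python
import sqlite3

# The 8-byte PNG signature plus a few arbitrary bytes. The total length is odd,
# so the final byte cannot form a complete UTF-16 code unit.
payload = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A, 0x00, 0xFF, 0xD8])


def via_text(data: bytes) -> bytes:
    """Simulate storing raw bytes as UTF-16 text (roughly what nvarchar holds)."""
    text = data.decode("utf-16-le", errors="replace")  # bad input becomes U+FFFD
    return text.encode("utf-16-le")


def via_blob(data: bytes) -> bytes:
    """Store the bytes in a BLOB column (stand-in for varbinary) and read them back."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, content BLOB)")
    conn.execute("INSERT INTO files (content) VALUES (?)", (data,))
    (stored,) = conn.execute("SELECT content FROM files").fetchone()
    conn.close()
    return stored


print(via_blob(payload) == payload)  # True
print(via_text(payload) == payload)  # False
```

The binary column returns exactly what went in; the text path does not, because the bytes were never valid text to begin with.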
As for the BLOB suggestion, no. SQL Server 2012 has no BLOB data type. Perhaps you meant the TEXT/NTEXT/IMAGE data types; those are deprecated, so don't build anything new using them.
Finally, you said you can't use FILESTREAM, but what about FileTable? I'm not sure whether you're looking for plain storage or need the data to be searchable; in the latter case, FileTable's pretty slick.

Related

What is the difference between the sql blob and image types

This question was actually posed to me this morning by one of my compadres, and it completely threw me: I didn't even realise SQL Server had an 'image' data type, so I've always simply gone down the route of packing files, images, etc. into BLOB fields.
After a (very) quick look around MSDN, the most I could find was that the image data type works like a BLOB or varbinary(max), up to a maximum size of ~2GB, but past that I can't find much more about it.
I'm wondering if maybe the image datatype is a way of providing indexable/searchable metadata on images within SQL...?
Does anyone have anything to offer, by way of links or explanations regarding this?
Any help would be gratefully received, so many thanks in advance of any replies!
UPDATE / Possible answer
After a little more digging I've perhaps achieved some headway with this...
It may well be that my original thoughts on what the "image" type relates to were somewhat misguided. Because of the name "image" (which I found quite misleading at first), I assumed it related to an image in the MIME-type sense, but it appears to imply more of a "disk image" or "binary image" idea.
In this way it seems that the IMAGE type was introduced to SQL Server before varbinary could be declared as (max), possibly as a way of storing files in the database the way that is now taken for granted (by me at least) with VARBINARY(MAX)...
For Reference (both reasonably old but seem to fit the bill):
http://channel9.msdn.com/Forums/Coffeehouse/138883-Storing-Retrieving-images-from-SQL-Server-2005
http://www.basenow.com/help/Data_types_in_Microsoft_SQL_Server.asp
If anyone can offer any constructive crit on this possible answer that would be really useful in trying to understand this...
Cheers All!
It is vendor-dependent, but generally:
A blob/image data type is a column type that stores binary data in the database separately from the rest of the columns. So every time you ask for image/blob data, the database looks up the location, reads the data and sends it back to you.
Some vendors offer a TEXT data type, which is the same thing with the difference that it accepts textual data, so you can put full-text indexes on it.
As you're referencing Microsoft SQL Server, one important thing to keep in mind when choosing between the IMAGE and VARBINARY data types is that Microsoft is deprecating IMAGE, so not having used it is a very good thing in your case.
http://msdn.microsoft.com/en-us/library/ms143729.aspx

should xml or sqlite3 be used?

I just started iOS development and am currently building an application that simply reads data from a server and displays it on the screen. What I'm not sure of is whether to use XML or sqlite3 to store the data. Which method should be preferred, and why? Thanks in advance.
It is important to remember they are two different things, suited to different tasks. Choose the one that fits the problem. (In this case I would likely use XML or "just plain text", because it sounds like a simple download cache. Either the raw response could be kept, or the data could first be transformed into objects and then automatically serialized into XML or whatnot. In any case, keep it simple.)
XML is (at its very core) a markup format. XML documents have a (hopefully well-defined) structure. There is a large set of tooling that supports manipulation and querying within a hierarchical "document" model. I use XML a good bit as a serialization format, and also for local caching where appropriate (e.g. when there are no non-hierarchical relationships). XML is often loaded entirely into memory (e.g. as a DOM) for manipulation.
SQLite is a relational database designed around tables and the relationships between sets of tables. Running (complex) queries is where a relational database really shines. SQLite is also very fast and can process large data sets that can't all fit in memory. Columns in SQLite can also contain text (read: XML), so the two approaches are not mutually exclusive.
Happy coding.
It probably all depends on how the data is processed after it has been stored. If the data must be sorted, filtered with specific selections, etc., then SQLite is the better solution.
A second, less important concern is how much data will be stored: if it's just one "table" with 10 rows, SQLite is probably overkill.
If you want to read data from the server and display it on screen, and don't need to save it locally, then use XML.
If you want to store it locally and don't want to fetch it from the server, then use XML files or an SQLite database in your project.
If you want to fetch from the server and also store it locally, then first use XML to fetch the data and then use SQLite to store it locally.
Also see pst's answer for the difference between them.
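The "fetch as XML, store in SQLite, query with sorting" workflow from these answers can be sketched as follows. This is a language-neutral illustration in Python rather than iOS code (on iOS you would use Core Data or the sqlite3 C API instead); the XML response, its element names and the items table are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical server response; element and attribute names are made up.
RESPONSE = """<items>
  <item id="3"><title>Gamma</title><price>7.50</price></item>
  <item id="1"><title>Alpha</title><price>12.00</price></item>
  <item id="2"><title>Beta</title><price>3.25</price></item>
</items>"""


def cache_items(conn: sqlite3.Connection, xml_text: str) -> int:
    """Parse the XML response and upsert each <item> into a local SQLite table."""
    root = ET.fromstring(xml_text)
    rows = [
        (int(item.get("id")), item.findtext("title"), float(item.findtext("price")))
        for item in root.iter("item")
    ]
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, title TEXT, price REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO items VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def items_by_price(conn: sqlite3.Connection) -> list:
    """Sorting is a one-line query once the data lives in SQLite."""
    return [title for (title,) in conn.execute("SELECT title FROM items ORDER BY price")]


conn = sqlite3.connect(":memory:")
cache_items(conn, RESPONSE)
print(items_by_price(conn))  # ['Beta', 'Gamma', 'Alpha']
```

XML is only the transport format here; once the rows are in SQLite, sorting and filtering are delegated to the database instead of being done on an in-memory document.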

Storing large amounts of text in Core Data

I'm trying to see what the best way to store large amounts of text (more than 255 characters) in Cocoa would be. Being a big fan of Core Data, I would assume there's an effective way to do so. However, I feel like 'string' is the wrong data type for this type of thing. Does anyone have any info on this? I don't see an option for BLOB in Core Data.
Well, you can't very well compress the text or store it as binary that must be translated, otherwise you give up SQLite's querying speed (because all records stored as binary-encoded text must be read into memory, translated/decompressed, then searched). Otherwise, you'd have to mirror (and maintain) a text-only representation in your Core Data store alongside the more full-featured data.
How about a hybrid solution? Core Data stores everything but the actual text; the text itself is archived on the file system as one file per Core Data entry, each file named for its record's unique identifier in the Core Data store. This way a search could do two things (in the background, of course): search the Core Data store for things like titles, dates, etc., and search the files (maybe even with Spotlight) for content. If there's a file match, its file name is used to find the matching record in Core Data for display in your app's UI.
This lets you leverage your app-specific internal search criteria and Spotlight's programmatic asynchronous search. It's a little more work, granted, but if you're talking about a LOT of text, I can't think of a better way.
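A rough sketch of the hybrid layout described above, with heavy substitutions: an in-memory sqlite3 table stands in for the Core Data store, and a plain scan of the files stands in for Spotlight. The HybridStore class, the notes table and the ".txt" naming scheme are all hypothetical; only the idea (metadata in the store, one text file per record named by its unique ID) comes from the answer.

```python
import sqlite3
import tempfile
import uuid
from pathlib import Path


class HybridStore:
    """Hypothetical sketch: metadata in a database, bulk text in one file per record."""

    def __init__(self, directory: Path):
        self.dir = directory
        self.db = sqlite3.connect(":memory:")  # stand-in for the Core Data store
        self.db.execute("CREATE TABLE notes (uid TEXT PRIMARY KEY, title TEXT)")

    def add(self, title: str, body: str) -> str:
        uid = str(uuid.uuid4())
        # The file name is the record's unique identifier.
        (self.dir / f"{uid}.txt").write_text(body, encoding="utf-8")
        self.db.execute("INSERT INTO notes VALUES (?, ?)", (uid, title))
        return uid

    def search_titles(self, term: str) -> list:
        """Metadata search runs against the database."""
        cur = self.db.execute(
            "SELECT title FROM notes WHERE title LIKE ? ORDER BY title", (f"%{term}%",)
        )
        return [title for (title,) in cur]

    def search_bodies(self, term: str) -> list:
        """Content search scans the files (Spotlight stand-in), then maps each
        matching file name back to its database record."""
        hits = []
        for path in self.dir.glob("*.txt"):
            if term in path.read_text(encoding="utf-8"):
                row = self.db.execute(
                    "SELECT title FROM notes WHERE uid = ?", (path.stem,)
                ).fetchone()
                if row:
                    hits.append(row[0])
        return sorted(hits)


with tempfile.TemporaryDirectory() as tmp:
    store = HybridStore(Path(tmp))
    store.add("Meeting notes", "Discussed the quarterly budget.")
    store.add("Recipe", "Mix flour and water.")
    title_hits = store.search_titles("Meet")   # ['Meeting notes']
    body_hits = store.search_bodies("flour")   # ['Recipe']
```

Keeping the bulk text out of the database keeps the store small and fast for metadata queries, at the cost of having to keep the files and records in sync yourself.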
The BLOB data type is called "Binary data" in Core Data. As middaparka has pointed out, the Core Data Programming Guide offers some guidance on how to deal with binary data in Core Data. Depending on your requirements, an alternative to using BLOBs would be to just store references to files on disk.
I'd recommend a read of Apple's Core Data Programming Guide (specifically the "Core Data Performance" section). This specifically mentions BLOBs (see the "Large Data Objects (BLOBs)" section) and gives some, albeit vague, guidelines.

SQL Images -> byte arrays

So I am importing some images stored in SQL image columns, and I need to convert them to byte arrays, because I store my images as varbinary(max) and recreate them from there. I would LOVE it if there were a program to do this, or a really easy way, since I don't have a ton of time.
Any ideas out there?
The image data type in SQL Server is essentially a varbinary field, and it is being discontinued in a future version.
I would bet that a tool like bcp handles the "conversion" automatically. I use quotes because it's a type conversion, not a format conversion.
Have you looked into writing a quick script in PowerShell? It has access to the full .NET framework, so should be somewhat simple if you're using those technologies.
Of course it's not simple if you have to learn PowerShell in order to write the script, but learning's always good :)

What's the canonical way to store arbitrary (possibly marked up) text in SQL?

What do wikis/stackoverflow/etc. do when it comes to storing text? Is the text broken at newlines? Is it broken into fixed-length chunks? How do you best store arbitrarily long chunks of text?
nvarchar(max) ftw. because over complicating simple things is bad, mmkay?
I guess if you need to offer the ability to store large chunks of text and you don't mind not being able to look into their content much when querying, you can use CLOBs.
This all depends on the RDBMS that you are using as well as the types of text that you are going to store. If the text is formatted into sizable chunks of data that mean something in and of themselves, like, say header/body, then you might want to break the data up into columns of these types. It may take multiple tables to use this method depending on the content that you are dealing with.
I don't know how other RDBMSs handle it, but I know that it's not a good idea to have more than one open-ended column (text or varchar(max)) in each table. So make sure that only one column has unlimited length.
Regarding PostgreSQL - use type TEXT or BYTEA. If you need to read random chunks you may consider large objects.
If you need to preserve things like format strings, quotes, and other "cruft" in the text, as code likely would have, then the special characters need to be completely escaped first; otherwise, on submission to the database, they might end up causing an invalid command to be issued.
Most scripting languages have tools to do this built-in natively.
I guess it depends on where you want to store the text and whether you need things like transactions.
Databases like SQL Server have a type that can store long text fields. In SQL Server 2005 this would primarily be nvarchar(max) for long Unicode text strings. By using a database you benefit from transactions and easy backup/restore, assuming you are also using the database for other things, as StackOverflow.com does.
The alternative is to store text in files on disk. This may be fairly simple to implement and can work in environments where a database is not available or overkill.
Regarding the format of the text stored in a database or file, it is probably very close to the input. If it's HTML, you would just push it through a function that escapes it correctly.
Something to remember is that you probably want to use Unicode (for example UTF-8) from creation to storage and back again. This allows you to support additional languages. Any encoding mismatch along the way will corrupt your text. Historically, people may have defaulted to ASCII on the assumption that they were saving disk space.
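The encoding point is easy to demonstrate. This short Python sketch (the sample string is arbitrary) shows a lossless UTF-8 round trip, ASCII failing outright on non-ASCII characters, and the silent corruption ("mojibake") you get when UTF-8 bytes are read back with the wrong codec.

```python
text = "naïve café, 日本語"

# UTF-8 can represent every Unicode character, so the round trip is lossless.
utf8_bytes = text.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")

# ASCII cannot represent "ï", "é" or the Japanese characters at all.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Reading UTF-8 bytes back with the wrong codec does not fail; it silently
# produces garbled text, which is the worse outcome.
mojibake = utf8_bytes.decode("latin-1")
print(mojibake[:6])  # naÃ¯ve
```

The wrong-codec case is the dangerous one: nothing raises an error, and the corrupted text gets stored and served as if it were correct.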
For SQL Server:
Use varchar(max) to store it. The upper limit is 2 GB.
Don't try to escape the text yourself. Pass it through a parameterizing mechanism that handles the escaping properly for you. In .NET you'd add a parameter to a SqlCommand, or just use LINQ to SQL (which manages the SqlCommand for you).
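The same "let the driver handle it" idea, sketched with Python's sqlite3 `?` placeholders as an analogue of adding a parameter to a SqlCommand (this is not the .NET API). The posts table is hypothetical; the point is that quotes, newlines and even SQL-looking text are stored verbatim with no manual escaping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")

# Text full of characters that would break naive string concatenation.
body = "It's a \"quoted\" line;\nDROP TABLE posts; -- still just text"

# The ? placeholder sends the value separately from the SQL statement,
# so the driver never interprets it as SQL and no escaping is needed.
conn.execute("INSERT INTO posts (body) VALUES (?)", (body,))
conn.commit()

(stored,) = conn.execute("SELECT body FROM posts").fetchone()
print(stored == body)  # True
```

Because the value never becomes part of the SQL text, there is nothing to escape and nothing for an attacker to inject.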
I suspect StackOverflow stores text in Markdown format in an arbitrarily sized 'text' column, maybe as UTF-8 (though it might be UTF-16 or something else; I'm guessing it's SQL Server, which I don't know much about).
As a general rule, you want to store data in your database in the 'rawest' form possible. That is, do all your decoding, and possibly cleaning, but nothing else (for example, if it's Markdown, don't convert it to HTML; leave it in its original 'raw' format).
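A minimal sketch of the "store raw, transform on output" rule, with some assumptions: the posts table and render function are hypothetical, and the renderer only HTML-escapes the stored text. A real application would run a Markdown library at this step, which is outside this snippet.

```python
import html
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body_markdown TEXT)")

# Store the user's input exactly as typed: no HTML conversion, no escaping.
raw = "Use `a < b` and **bold** <script>alert(1)</script>"
conn.execute("INSERT INTO posts (body_markdown) VALUES (?)", (raw,))
conn.commit()


def render(post_id: int) -> str:
    """Transform only when displaying. Placeholder renderer: it just escapes
    HTML; a real app would convert the Markdown here instead."""
    (md,) = conn.execute(
        "SELECT body_markdown FROM posts WHERE id = ?", (post_id,)
    ).fetchone()
    return "<pre>" + html.escape(md) + "</pre>"


(stored_raw,) = conn.execute("SELECT body_markdown FROM posts").fetchone()
page = render(1)
```

Because the database keeps the original text, you can change the rendering (a new Markdown engine, different escaping rules) later without migrating any stored data.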