Is there any datatype that can store more than 2 GB of data in SQL Server?

I have a requirement to store more than 2 gigabytes of data in a column. Is there any way I can do that? The data I store needs to be in the database itself, not on the file system, which is what happens when using the FILESTREAM approach.

No, there isn't. NVARCHAR(MAX) is the datatype that can be used to store up to 2 GB of data in a column, but you cannot store more than 2 GB in it; that is the upper limit of the datatype.
On a side note, what makes you want to store such large data in a single column? It may cause a lot of performance overhead, and it might not be a worthwhile approach. I am sure you can find alternatives.
One possible alternative is to split the data and store it across multiple rows.
Otherwise, as commented by Mladen Prajdic, you can use FILESTREAM to store more than 2 GB of data.
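For illustration, a minimal sketch of the split-across-rows idea (table and column names are made up): each logical document is stored as ordered chunks, each safely below the 2 GB VARBINARY(MAX) limit, and the client reassembles them in order.

-- Hypothetical chunked storage: one logical document spread over many rows.
CREATE TABLE dbo.DocumentChunks (
    DocumentId INT            NOT NULL,
    ChunkNo    INT            NOT NULL,  -- ordering of the pieces
    ChunkData  VARBINARY(MAX) NOT NULL,  -- each piece kept well under 2 GB
    CONSTRAINT PK_DocumentChunks PRIMARY KEY (DocumentId, ChunkNo)
);

-- Read the chunks back in order; concatenation happens in the client application.
SELECT ChunkNo, ChunkData
FROM dbo.DocumentChunks
WHERE DocumentId = 1
ORDER BY ChunkNo;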

Vertica Large Objects

I am migrating a table from Oracle to Vertica that contains a LOB column. The maximum actual size of the LOB column amounts to 800MB. How can this data be accommodated in Vertica? Is it appropriate to use a Flex Table?
In Vertica's documentation, it says that data loaded in a Flex table is stored in a column called raw, which is a LONG VARBINARY data type. By default, it has a maximum size of 32MB, which, according to the documentation, can be changed (i.e. increased) using the parameter FlexTablesRawSize.
I'm thinking this is the approach for storing large objects in Vertica: we just need to update the FlexTablesRawSize parameter to handle 800MB of data. I'd like to ask whether this is the optimal way or if there's a better one. Or will this conflict with Vertica's row size limitation, which only allows up to 32MB of data per row?
Thank you in advance.
If you use Vertica for what it's built for, running a Big Data analytical database, you would, like in any analytical database, try to avoid large objects in your tables. BLOBs and CLOBs are usually used to store unstructured data: large documents, image files, audio files, video files. You can't filter by such a column, you can't run functions on it or sum it, and you can't group by it.
A safe and performant design stores the file name in a Vertica table column, keeps the file itself elsewhere (maybe even in Hadoop), and lets the front end (usually a BI tool, and all BI tools support that) retrieve the file and bring it to a report screen ...
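A minimal sketch of that design, with made-up table and column names, where the table holds only metadata and a pointer to the externally stored file:

CREATE TABLE documents (
    doc_id     INT NOT NULL,
    doc_name   VARCHAR(255) NOT NULL,
    file_path  VARCHAR(1024) NOT NULL,  -- e.g. an HDFS or shared-storage location
    loaded_at  TIMESTAMP
);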
Good luck ...
Marco

Simple database needed to store JSON using C#

I'm new to databases. I've been saving a financials table from a website in JSON format on a daily basis, accumulating new files in my directory every day. I simply parse the contents into a C# collection for use in my program and compare data via Linq.
Obviously I'm looking for a more efficient solution especially as my file collection will grow over time.
An example of a row of the table is:
{"strike":"5500","type":"Call","open":"-","high":"9.19B","low":"8.17A","last":"9.03B","change":"+.33","settle":"8.93","volume":"0","openInterest":"1,231"}
I'd prefer to keep a 'compact file' per stock that I can access individually as opposed to a large database with many stocks.
What would be an 'advisable' solution to use? I know that's a bit of an open ended question but some suggestions would be great.
I don't mind slower writes into the DB, but fast reads would be beneficial.
What would be the best way to store the data? Strings or numerical values?
I found this link to help with the conversion: How to Save JSON data to SQL server database in C#?
Thank you.
For faster reads from a DB, I would suggest denormalizing the data.
Read "Normalization vs Denormalization"
Judging from your JSON file, it doesn't seem like you have any table joins, so keeping that structure should be fine.
As for the comparison between varchar (string) and int (numeric): ints are faster than varchars, largely because they take up much less space. Integer types use a fixed 1 to 8 bytes (tinyint through bigint), whereas a varchar uses the actual character data plus 2 bytes of overhead.
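As a sketch only, here is one way the sample row could map onto a typed table; the column names and types are assumptions based on the JSON keys, and fields like high/low/last keep their bid/ask suffixes ("B"/"A") as text:

CREATE TABLE OptionQuotes (
    QuoteDate    DATE          NOT NULL,  -- which daily file the row came from
    Strike       DECIMAL(10,2) NOT NULL,  -- "5500"
    QuoteType    VARCHAR(4)    NOT NULL,  -- "Call" or "Put"
    OpenPrice    DECIMAL(10,2) NULL,      -- "-" in the feed becomes NULL
    High         VARCHAR(10)   NULL,      -- "9.19B" keeps its suffix
    Low          VARCHAR(10)   NULL,
    LastPrice    VARCHAR(10)   NULL,
    Change       DECIMAL(10,2) NULL,      -- "+.33"
    Settle       DECIMAL(10,2) NULL,
    Volume       INT           NULL,
    OpenInterest INT           NULL       -- "1,231" needs the comma stripped
);

Parsing the numeric fields into numeric types at load time keeps reads and comparisons fast, which matches the stated preference for slower writes and faster reads.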

How to get the memory consumed by SQL variables, columns etc.?

I know basic SQL/RDBMS, but not the kind of detailed information a highly experienced DBA would know.
I need to know how much memory is consumed by a variable such as int, bigint, or datetime. I also need to know how much memory is consumed by a varchar(50) column in two cases -
1] The column is filled with strings of length 50
2] The column contains only NULLs
The purpose behind this is to make estimates for ETL/data transfer.
I also want to know how to store a SQL Server result in a cache on disk and then retrieve the data from that cache chunk by chunk (due to memory-related concerns), but I'll make that another question.
In addition to the documentation linked in the comment, note that varchar storage depends on what data is actually entered.
From http://technet.microsoft.com/en-us/library/ms176089(v=sql.100).aspx:
The storage size is the actual length of data entered + 2 bytes.
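A quick way to check this yourself is a throwaway temp table (names here are illustrative, not prescribed): DATALENGTH returns the number of bytes a given value actually uses, so a fully populated varchar(50) and a NULL one can be compared directly. For the fixed-length types the size never depends on the value: int is always 4 bytes, bigint 8, and datetime 8.

-- Throwaway table for measuring storage of a varchar(50) column.
CREATE TABLE #SizeTest (val VARCHAR(50) NULL);

INSERT INTO #SizeTest (val) VALUES (REPLICATE('x', 50));  -- 50 characters
INSERT INTO #SizeTest (val) VALUES (NULL);                -- all-NULL case

-- DATALENGTH reports 50 bytes for the full string and NULL for the NULL value;
-- the row itself adds a couple of bytes of variable-length overhead per column.
SELECT val, DATALENGTH(val) AS bytes_used FROM #SizeTest;

DROP TABLE #SizeTest;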

Does varchar result in a performance hit due to data fragmentation?

How are varchar columns handled internally by a database engine?
For a column defined as char(100), the DBMS allocates 100 contiguous bytes on the disk. However, for a column defined as varchar(100), that presumably isn't the case, since the whole point of varchar is to not allocate any more space than required to store the actual data value stored in the column. So, when a user updates a database row containing an empty varchar(100) column to a value consisting of 80 characters for instance, where does the space for that 80 characters get allocated from?
It seems that varchar columns must result in a fair amount of fragmentation of the actual database rows, at least in scenarios where column values are initially inserted as blank or NULL, and then updated later with actual values. Does this fragmentation result in degraded performance on database queries, as opposed to using char type values, where the space for the columns stored in the rows is allocated contiguously? Obviously using varchar results in less disk space than using char, but is there a performance hit when optimizing for query performance, especially for columns whose values are frequently updated after the initial insert?
You make a lot of assumptions in your question that aren't necessarily true.
The type of a column in any DBMS tells you nothing at all about how that data is stored unless the documentation clearly spells it out. If it isn't stated, you don't know how the data is stored, and the DBMS is free to change the storage mechanism from release to release.
In fact, some databases store CHAR fields internally as VARCHAR, while others decide how to store a column based on its declared size. Some databases store VARCHARs with the other columns, some with BLOB data, and some implement other storage. Some databases always rewrite the entire row when a column is updated, others don't. Some pad VARCHARs to allow for limited future updating without relocating the storage.
The DBMS is responsible for figuring out how to store the data and return it to you in a speedy and consistent fashion. It always amazes me how many people try to outthink the database, generally before detecting any performance problem.
The data structures used inside a database engine are far more complex than you are giving them credit for! Yes, there are issues of fragmentation, and updating a varchar with a large value can cause a performance hit; however, it's difficult to explain or understand the implications of those issues without a fuller understanding of the data structures involved.
For MS SQL Server you might want to start by understanding pages, the fundamental unit of storage (see http://msdn.microsoft.com/en-us/library/ms190969.aspx).
In terms of the performance implications of fixed vs. variable-length storage types, there are a number of points to consider:
Using variable-length columns can improve performance, as it allows more rows to fit on a single page, meaning fewer reads.
Using variable-length columns requires special offset values, and maintaining these values carries a slight overhead; however, this extra overhead is generally negligible.
Another potential cost is that of increasing the size of a column value when the page containing that row is nearly full.
As you can see, the situation is rather complex. Generally speaking, however, you can trust the database engine to be pretty good at dealing with variable-length data types, and they should be the types of choice when there may be significant variance in the length of data held in a column.
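If you want to see whether fragmentation is actually happening on a SQL Server table before worrying about it, a sketch like the following (the table name is a placeholder) queries the index physical-stats DMV:

-- Fragmentation and page fullness for a hypothetical table dbo.MyTable.
SELECT index_id,
       avg_fragmentation_in_percent,   -- logical fragmentation of the index
       avg_page_space_used_in_percent, -- how full the pages are on average
       page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.MyTable'), NULL, NULL, 'SAMPLED');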
At this point I'm also going to recommend the excellent book "Microsoft SQL Server 2008 Internals" for some more insight into how complex things like this really get!
The answer will depend on the specific DBMS. For Oracle, it is certainly possible to end up with fragmentation in the form of "chained rows", and that incurs a performance penalty. However, you can mitigate against that by pre-allocating some empty space in the table blocks to allow for some expansion due to updates. However, CHAR columns will typically make the table much bigger, which has its own impact on performance. CHAR also has other issues such as blank-padded comparisons which mean that, in Oracle, use of the CHAR datatype is almost never a good idea.
Your question is too general because different database engines will have different behavior. If you really need to know this, I suggest that you set up a benchmark to write a large number of records and time it. You would want enough records to take at least an hour to write.
As you suggested, it would be interesting to see what happens if you insert all the records with an empty string ("") and then update them to have 100 characters that are reasonably random, not just 100 Xs.
If you try this with SQLITE and see no significant difference, then I think it unlikely that the larger database servers, with all the analysis and tuning that goes on, would be worse than SQLITE.
This is going to be completely database specific.
I do know that in Oracle, the database will reserve a certain percentage of each block for future updates (the PCTFREE parameter). For example, if PCTFREE is set to 25%, then a block will only be used for new data until it is 75% full. By doing that, room is left for rows to grow. If a row grows so much that the 25% reserved space is completely used up, then you do end up with chained rows and a performance penalty. If you find that a table has a large number of chained rows, you can tune the PCTFREE for that table. If you have a table which will never have any updates at all, a PCTFREE of zero would make sense.
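For illustration, setting PCTFREE looks like this in Oracle; the table name and percentages are placeholders:

-- Reserve 25% of each block for future row growth.
CREATE TABLE orders_history (
    order_id NUMBER PRIMARY KEY,
    notes    VARCHAR2(4000)
) PCTFREE 25;

-- Adjust it later; the new value applies to blocks used from now on.
ALTER TABLE orders_history PCTFREE 10;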
In SQL Server, varchar (except varchar(MAX)) is generally stored together with the rest of the row's data (on the same page if the row's data is < 8KB, and on the same extent if it is < 64KB). Only the large data types such as TEXT, NTEXT, IMAGE, VARCHAR(MAX), NVARCHAR(MAX), XML and VARBINARY(MAX) are stored separately.
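Related to that, SQL Server can also be told to push the MAX types off the data page explicitly, leaving only a 16-byte pointer in the row; a sketch with a placeholder table name:

-- Store varchar(MAX)/nvarchar(MAX)/varbinary(MAX)/xml values out of row.
EXEC sp_tableoption 'dbo.MyTable', 'large value types out of row', 1;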

Storing time-temperature data in DB

I'm storing time-temperature data in a database, which is really just CSV data. The first column is time in seconds, starting at zero, with the following (one or more) column(s) being temperature:
0,197.5,202.4
1,196.0,201.5
2,194.0,206.5
3,192.0,208.1 ....etc
Each plot represents about 2000 seconds. Currently I'm compressing the data before storing it in an output_profile longtext field:
CREATE TABLE `outputprofiles` (
`id` int(11) NOT NULL auto_increment,
`output_profile` longtext NOT NULL,
PRIMARY KEY (`id`)
);
This helps quite a bit... I can compress a plot that is 10K of plain text down to about 2.5K. There is no searching or indexing needed on this data, since it's just referenced in another table.
My question: Is there any other way to store this data I'm not thinking about which is more efficient in terms of storage space?
Is there any reason to think that storage space will be a limiting constraint on your application? I'd want to be fairly sure of that before prioritizing it over ease of access and usage, and for those purposes it sounds like what you have is satisfactory.
I don't quite understand what you mean by "compressing the plot". Does that mean you are compressing 2000 measurements at once, or compressing each line?
Anyway, space is cheap. I would do it the traditional way, i.e. two columns, one row for each measurement.
If for some reason that doesn't work and you want to save 2000 measurements as one record, then you can do it rather better:
1. Create a CSV file with your measurements.
2. Zip it (gzip -9 gives you the maximal compression).
3. Save it as a BLOB (or LONGBLOB, depending on the DB you are using), NOT as a longtext.
Then just save it in the DB.
This will give you maximal compression.
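A minimal sketch of that layout in MySQL, with made-up names, next to the traditional one-row-per-measurement alternative mentioned above:

-- Compressed CSV stored as binary data rather than text.
CREATE TABLE output_profiles_gz (
  id         INT NOT NULL AUTO_INCREMENT,
  profile_gz LONGBLOB NOT NULL,  -- gzip-compressed CSV of ~2000 measurements
  PRIMARY KEY (id)
);

-- The traditional alternative: one row per time/temperature reading.
CREATE TABLE measurements (
  profile_id  INT      NOT NULL,
  t_seconds   INT      NOT NULL,
  channel     SMALLINT NOT NULL,  -- which temperature column in the CSV
  temperature FLOAT    NOT NULL,
  PRIMARY KEY (profile_id, t_seconds, channel)
);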
PostgreSQL has a big storage space overhead, since every tuple (the representation of a row in a table) is 28 bytes excluding the data (PostgreSQL 8.3). There are 2-, 4- and 8-byte integers, and a timestamp is 8 bytes. Floats are 8 bytes, I think. So storing 1,000,000,000 rows in PostgreSQL will require several GiB more storage than MySQL (depending on which storage engine you use in MySQL). But PostgreSQL is also great at handling huge data compared to MySQL; try running some DDL queries against a huge MySQL table and you'll see what I mean. This simple data you are storing should probably be easy to partition heavily, so maybe a simple MySQL setup can handle the job nicely. But, as I always say, if you're not really, really sure you need a specific MySQL feature, you should go for PostgreSQL.
I'm limiting this post to only MySQL and PostgreSQL since this question is tagged with only those two databases.
Edit: Sorry, I didn't see that you actually store the CSV in the DB.
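If you would rather measure the footprint than estimate it, PostgreSQL's built-in size functions make that straightforward; a sketch, assuming a hypothetical measurements table:

-- Total on-disk size of the table (including indexes and TOAST) and the
-- average number of bytes each row ends up costing.
SELECT pg_size_pretty(pg_total_relation_size('measurements'))       AS total_size,
       pg_total_relation_size('measurements') / NULLIF(count(*), 0) AS avg_bytes_per_row
FROM measurements;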