SQL_VARIANT or separate tables per data type for the value in an EAV? - entity-attribute-value

We use the sql_variant data type for the value column in our EAV model, but I think it would be better to use a separate table for each value data type, because sql_variant can hold up to 8000 bytes and I'm worried about using it for large values.
What's your opinion? Which of them is better?
Thanks

I realize this question is quite old but I thought I'd answer it anyway.
SQL_VARIANT does NOT use 8000 bytes for every row. It uses the same number of bytes as the underlying base data type, plus up to 16 bytes of metadata for the entry. It works fine for most EAVs and has the added advantage of preserving the original data type. It's also pretty fast, especially on inserts, because you don't have to convert to VARCHAR.
The real problem isn't whether or not to use SQL_VARIANT in an EAV... the real problem may be the reason why you're using an EAV to begin with. ;-)
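A quick way to see this for yourself (a sketch only; the table and column names here are hypothetical): SQL_VARIANT_PROPERTY exposes the stored base type, and DATALENGTH shows the bytes actually used by each value.

```sql
-- Hypothetical EAV table using sql_variant for the value column
CREATE TABLE AttributeValue (
    EntityId    INT         NOT NULL,
    AttributeId INT         NOT NULL,
    Value       SQL_VARIANT NULL
);

INSERT INTO AttributeValue VALUES (1, 1, CAST(42 AS INT));
INSERT INTO AttributeValue VALUES (1, 2, CAST('hello' AS VARCHAR(10)));

-- The original base type survives, and storage tracks the base type,
-- not a fixed 8000 bytes per row
SELECT SQL_VARIANT_PROPERTY(Value, 'BaseType') AS BaseType,
       DATALENGTH(Value)                       AS Bytes
FROM   AttributeValue;
```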

Related

PostgreSQL how to create table for big numbers like 1.33E+09 -1.8E+09?

I am new to SQL and I need to create a table to accommodate a bunch of data in this format:
1.33E+09 -1.8E+09 58 -1.9E+09 2.35E+10 2.49E+10 2.49E+10 3.35E+08 etc.
How should I deal with it? I am not sure about populating the table with this, or whether I need to convert it in order to work with it... Any suggestions?
Is that a BIGINT?
The correct data type for data like this is double precision or float.
It is obvious from the data that high precision is not necessary, so numeric would not be appropriate (it takes more storage and makes computations much slower).
float is only a good choice if you really need as few significant digits as your examples suggest and storage space is at a premium.
Your data are already in a format that PostgreSQL can accept for these data types, so there is no need for conversion.
The answer depends on whether you know the maximum range of data values your application might create.
A PostgreSQL numeric column will certainly hold any of the values you list above. Per the docs: "up to 131072 digits before the decimal point; up to 16383 digits after the decimal point".
See the data type reference here.
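To illustrate the first answer's point (a sketch; the table and column names are made up), PostgreSQL accepts scientific notation directly for a double precision column, so the data can be loaded as written:

```sql
-- Hypothetical table; double precision accepts the values exactly as given
CREATE TABLE readings (
    value double precision
);

INSERT INTO readings (value)
VALUES (1.33E+09), (-1.8E+09), (58), (-1.9E+09), (2.49E+10), (3.35E+08);

SELECT value FROM readings;
```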

How are you supposed to choose K when creating a VARCHAR(K) column? Or are you?

This is something I've never understood. Let's say I want a column that stores an email address. I think,
"OK, email addresses are usually no more than 15 characters, but I'll
say 50 max characters just to play it safe."
and make that column VARCHAR(50). Of course, this then means that I have to create extra code, possibly both client- and server-side validation of entries for that column.
That brings up the question of why not just use NVARCHAR all the time, except in those rare circumstances where the logic of my application dictates a fixed or maximum length. From what I understand, if I create a VARCHAR(50) and none of the entries are more than 25 characters, that does not mean that 50% of the space is wasted, as the database knows how to optimize the storage.
Again, that brings up the question of why not just use NVARCHAR.
nvarchar itself has nothing to do with "unlimited length of string", since it is just the Unicode version of varchar. At present there are no reasons to use varchar (except for some backward-compatibility issues), and nvarchar should be preferred.
So I'm supposing you're asking why not use nvarchar(max) everywhere, which is almost unlimited (2 GB of storage), instead of specifying nvarchar(n) for concrete columns.
There are many reasons to use nvarchar(n) instead of nvarchar(max).
For example, if your column should be included in an index, it can't be nvarchar(max).
Also, nvarchar(max) data is internally stored differently from nvarchar(n), and sometimes that can affect performance.
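The indexing point can be sketched like this (hypothetical table and column names): an nvarchar(n) column can serve as an index key, while an nvarchar(max) column cannot.

```sql
CREATE TABLE Users (
    Email NVARCHAR(100),   -- bounded length: can be an index key
    Bio   NVARCHAR(MAX)    -- max type: cannot be an index key
);

CREATE INDEX IX_Users_Email ON Users (Email);   -- works
-- CREATE INDEX IX_Users_Bio ON Users (Bio);    -- fails: max types are
--                                              -- invalid as index key columns
-- (an nvarchar(max) column can still appear in an index as an INCLUDEd column)
```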

At what point does it become more efficient to use a text field than an nvarchar field in SQL Server?

How long does an nvarchar field need to be before it is better to use a text field in SQL Server? What are the general indications for using one or the other for textual content that may or may not be queried?
From what I understand, the TEXT datatype should never be used in SQL 2005+. You should start using VARCHAR(MAX) instead.
See this question about VARCHAR(MAX) vs. TEXT.
UPDATE (per comment):
This blog does a good job at explaining the advantages. Taken from it:
But the pain from using the type text comes in when trying to query against it. For example grouping by a text type is not possible.
Another downside to using text types is increased disk IO due to the fact each record now points to a blob (or file).
So basically, VARCHAR(MAX) keeps the data with the record, and gives you the ability to treat it like other VARCHAR types, like using GROUP BY and string functions (LEN, CHARINDEX, etc.).
For TEXT, you almost always have to convert it to VARCHAR to use functions against it.
But back to the root of your question regarding efficiency, I don't think it's ever more efficient to use TEXT vs. VARCHAR(MAX). Looking at this MSDN article (search for "data types"), TEXT is deprecated, and should be replaced with VARCHAR(MAX).
First of all, don't use text at all. MSDN says:
ntext, text, and image data types will be removed in a future version of Microsoft SQL Server. Avoid using these data types in new development work, and plan to modify applications that currently use them. Use nvarchar(max), varchar(max), and varbinary(max) instead.
varchar(max) is what you might need.
If you compare varchar(n) vs varchar(max), these are technically two different data types (stored differently):
A varchar(n) value is always stored inside the row. That means it cannot be greater than the max row size, and a row cannot be greater than the page size, which is 8K.
varchar(max) is stored outside the row. The row holds a pointer to a separate BLOB page. However, under certain conditions varchar(max) can store data as a regular in-row value; obviously it must then at least fit within the row size.
So if your row is potentially greater than 8K, you have to use varchar(max). If not, varchar(n) will likely be preferable, as it is faster to retrieve in-row data than data from an outside page.
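That in-row behavior can also be controlled explicitly: the documented sp_tableoption setting 'large value types out of row' forces varchar(max)/nvarchar(max)/varbinary(max) values off-row regardless of size (a sketch; the table name is hypothetical):

```sql
-- Force all (max)-type values in this table to be stored off-row,
-- leaving only a pointer in the row itself
EXEC sp_tableoption 'dbo.Documents', 'large value types out of row', 1;

-- Setting it back to 0 lets small values be stored in-row again
EXEC sp_tableoption 'dbo.Documents', 'large value types out of row', 0;
```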
MSDN says:
Use varchar(max) when the sizes of the
column data entries vary considerably,
and the size might exceed 8,000 bytes.
The main advantage of VARCHAR over TEXT is that you can run string manipulations and string functions on it. With VARCHAR(MAX), you now basically have an awesomely large (practically unrestricted) variable that you can manipulate however you want.
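As a sketch of that difference (hypothetical table and column names), string functions work directly on a varchar(max) column, while a legacy text column first needs a conversion:

```sql
-- Works directly on a varchar(max) column:
SELECT LEN(Body) AS BodyLength,
       CHARINDEX('error', Body) AS ErrorPos
FROM   Logs;       -- hypothetical table with Body varchar(max)

-- With a legacy text column you would have to convert first:
SELECT CHARINDEX('error', CONVERT(VARCHAR(MAX), LegacyBody)) AS ErrorPos
FROM   OldLogs;    -- hypothetical table with LegacyBody text
```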

SQL When to use Which Data Type

Hi, I was wondering when I should use the different data types. For the columns in my table, how can I decide which to use: nvarchar, nchar, varchar, varbinary, etc.?
Examples:
What would I use for a ... column:
Phone number,
Address,
First Name, Last Name,
Email,
ID number,
etc.
Thanks for any help!
As a general rule, I would not define anything as a "number" field if I wasn't going to be doing arithmetic on it, even if the data itself was numeric.
Your "phone" field is one example. I'd define that as a varchar.
Varchar, Integer, and Bit cover 99% of my day to day uses.
The question really depends on your requirements. I know that's not a particularly satisfactory answer, but it's true.
The n...char data types are for Unicode data, so if you're going to need Unicode character sets in your data, you should use those types rather than their "non-n" analogs. The nchar and char types are fixed length, while the nvarchar and varchar types have a variable length, which will affect the size of the column on disk and in memory. Generally, I would say to use the type that fits your needs while using the least disk space.
This page has links to the Microsoft descriptions of these datatypes for SQL Server 2005, many of which give pointers for when to use which type. You might be particularly interested in this page regarding char and varchar types.
A data type beginning with n means it can store Unicode characters, e.g. NVARCHAR.
Selection of integers is also quite fun.
http://www.databasejournal.com/features/mssql/article.phpr/2212141/Choosing-SQL-Server-2000-Data-Types.htm
The most common data type I use is varchar.
The N* data types (NVARCHAR, NCHAR, NTEXT) are for Unicode strings. They take up two times the space their "normal" pendants (VARCHAR, CHAR, TEXT) need, but they can store Unicode without conversion and possible loss of fidelity.
The TEXT data types can store nearly unlimited amounts of data, but they do not perform as well as the CHAR data types because they are stored outside of the record.
The VARCHAR data types are of variable length. They will not be padded with spaces at the end, but their CHAR pendants will (a CHAR(20) is always twenty characters long, even if it contains only 5 letters; the remaining 15 will be spaces).
The binary data types are for binary data, whatever you care to store into them (images are a primary example).
Other people have given good general answers, but I'd add one important point: when using VARCHAR()s (which I would recommend for those kinds of fields), be sure to use a length that's big enough for any reasonable value. For example, I typically declare VARCHAR(100) for a name, e-mail address, domain name, city name, etc., and VARCHAR(200) for a URL or street address.
This is more than you'll routinely need. In fact, 30 characters is enough for almost all of these values (except full name, but a good database should always store first and last name separately), but it's better than having to change data types some day down the road. There's very little cost in specifying a higher-than-necessary length for a VARCHAR, but note that VARCHAR(MAX) and TEXT do entail significant overhead, so use them only when necessary.
Here's a post which points out a case where a longer-than-necessary VARCHAR can hurt performance: Importance of varchar length in MySQL table. Goes to show that everything has a cost, though in general I'd still favor long VARCHARs.
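Putting the advice from these answers into a sketch (the table and column names are made up; the lengths follow the suggestions above, and phone/ID numbers are stored as text since no arithmetic is done on them):

```sql
CREATE TABLE Person (
    PersonId  INT IDENTITY PRIMARY KEY,
    FirstName NVARCHAR(100),  -- Unicode-safe for names; first/last kept separate
    LastName  NVARCHAR(100),
    Email     VARCHAR(100),
    Phone     VARCHAR(30),    -- "number" in name only: no arithmetic, so not numeric
    IdNumber  VARCHAR(30),    -- same reasoning as phone
    Address   NVARCHAR(200),
    Url       VARCHAR(200)
);
```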

What problems can an NVARCHAR(3000) cause

I have an already large table, and my clients are asking me to extend the length of its notes field. The notes field is currently an NVARCHAR(1000) and I am being asked to expand it to 3000. The long-term solution is to move notes out of the table and create a notes table that uses an NVARCHAR(max) field, joined in only when necessary. My question is about the short term. Knowing that this field will be moved out in the future, what problems could I have if I just increase the field to NVARCHAR(3000) for now?
text and ntext are deprecated in favor of varchar(max) and nvarchar(max). So nvarchar(3000) should be fine.
You probably already know this, but just make sure that the increased length doesn't push your total record length over 8000. I'm pretty sure that still applies in 2005/2008.
You should be fine with nvarchar(3000) for the interim solution. You can go up to a maximum of nvarchar(4000). And as posted earlier by km.srd.myopenid.com, make sure that the entire length of your row doesn't exceed 8000 (remember that nvarchar is 2x the size of a regular varchar - which is why you can only have nvarchar(4000), but you can have varchar(8000)).
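The interim change itself is a one-liner (table and column names here are hypothetical):

```sql
-- Widen the notes column in place; existing data is preserved
ALTER TABLE ClientNotes
ALTER COLUMN Notes NVARCHAR(3000) NULL;
```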
I would suggest changing the column to NTEXT. You will have virtually no limit on the amount of data and the data is not stored with the rest of the row data. This helps keep you from hitting the maximum row size limit.
The only drawback is that you can only perform "LIKE" searches on that column and you cannot index it. However, if it's a notes field, my guess is that you are not doing any searching on it at all.
You may also experience more slowness, as your data pages may get split up to accommodate the larger field. Doing this lets you create a structure that allows a record of more than 8060 bytes, but be aware that if you try to add a data record that actually contains more than that, you will have a problem.