Let's say we have a table or an array of length N.
What happens if the length of the table is odd?
With an even-length table, the middle would be determined by doing N/2.
I would assume that this still holds true for an odd-length table.
Does it do N/2 and then take the integer part of the result and use that as "the middle", or does it round up/down? Or does it do something else entirely?
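For illustration, here is a minimal Python sketch, assuming 0-based indexing and the common mid = (lo + hi) // 2 convention from binary search; integer division simply truncates:

def middle_index(n):
    # Integer (floor) division: for odd n this is the exact middle index,
    # for even n it picks the upper of the two central elements (0-based).
    return n // 2

print(middle_index(5))   # 2 -> the exact middle of [a, b, c, d, e]
print(middle_index(4))   # 2 -> the upper of the two central elements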
Should I define a column type using the actual length, or round it up to a power of 2?
In the first case, I have a table column that stores no more than 7 characters.
Should I use NVARCHAR(8)? There may be an implicit conversion inside SQL
Server that allocates space for 8 and truncates automatically (I heard that somewhere).
If not, should it be NCHAR(7) or NCHAR(8), assuming the length is fixed at 7?
Is there any performance difference between these two cases?
You should use the actual length of the string. Now, if you know that the value will always be exactly 7 characters, then use CHAR(7) rather than VARCHAR(7).
The reason you see powers-of-2 is for columns that have an indeterminate length -- a name or description that may not be fixed. In most databases, you need to put in some maximum length for the varchar(). For historical reasons, powers-of-2 get used for such things, because of the binary nature of the underlying CPUs.
Although I almost always use powers-of-2 in these situations, I can think of almost no real performance difference. There is one: in some databases the actual length of a varchar(255) is stored using 1 byte whereas a varchar(256) uses 2 bytes. That is a pretty minor difference -- even when multiplied over millions of rows.
I would like to know how I can convert a number of rows into a size in MB or KB.
Is there a way to do that, or a formula?
The reason I'm doing this is that I would like to know, given this set of data (not everything in the tablespace), how much space is used by this set of data.
Thanks,
keith
If you want an estimate, you could multiply the row count by user_tables.avg_row_len for that table.
If you want the real size of the table on disk, it is available in user_segments.bytes. Note that the smallest unit Oracle will use is a block, so even for an empty table you will see a value bigger than zero in that column. That is the actual size of the space reserved in the tablespace for that table.
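As a rough sketch of that estimate in Python (the numbers below are made up; plug in your own row count and the avg_row_len you looked up):

num_rows = 1_500_000        # hypothetical row count for the data set
avg_row_len = 120           # bytes per row, from user_tables.avg_row_len

estimated_bytes = num_rows * avg_row_len
print(f"~{estimated_bytes / 1024:,.0f} KB")
print(f"~{estimated_bytes / (1024 * 1024):,.1f} MB")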
I need to generate unique numbers, and I can think of the consecutive way: for example, I can have a counter starting from 0, and every time a unique number is needed I return the counter and increase it by 1. This works until I have so many unique numbers that they go beyond the range of the data type (say, int). Also, some of the generated numbers get freed: for example, the counter is at 10 but 4 and 5 are not used any more, so they can be re-used. How do I make use of the reusable numbers without keeping all of them in a data structure?
Thanks!
Are you able to substitute numbers you've already handed out? If so then as soon as any number is returned, substitute the most recently handed out one with it and decrement the allocation counter. If it's the most recent one that's been returned then skip the substitution.
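As a rough Python sketch of that swap-with-last idea (the class and method names are made up), assuming the caller is able to relabel a number it has already been handed:

class RenumberingAllocator:
    # Hands out 0, 1, 2, ... and keeps the live numbers packed in [0, count).
    def __init__(self):
        self.count = 0

    def allocate(self):
        n = self.count
        self.count += 1
        return n

    def release(self, n):
        # Shrink the range; if n was not the most recently handed-out number,
        # tell the caller to relabel the current highest number as n.
        self.count -= 1
        highest = self.count
        return None if n == highest else (highest, n)   # (old label, new label)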
Otherwise, I guess the best you're going to be able to do is keep a sorted array of ranges.
To allocate a new number:
If the array is empty, create a new range and return the only number in it.
Otherwise, get the first range in the array and increase its length by 1. Return that number. Check whether that makes the first two ranges join up. If so then merge them into a single range.
To return a number:
Find the range it falls within (e.g., by binary search; see NSOrderedSet if your deployment plans allow it). If the returned number is at either end of the range then just shrink the range. Otherwise, split the range into two, with the returned number as the hole.
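A minimal Python sketch of that range-based scheme (the class and method names are made up; allocated numbers are stored as sorted, inclusive, non-adjacent (start, end) pairs):

import bisect

class RangeAllocator:
    def __init__(self):
        self.ranges = []      # e.g. [(0, 3), (6, 9)] means 0-3 and 6-9 are in use

    def allocate(self):
        if not self.ranges:
            self.ranges.append((0, 0))
            return 0
        start, end = self.ranges[0]
        n = end + 1                          # grow the first range by one
        self.ranges[0] = (start, n)
        # If that makes the first two ranges touch, merge them.
        if len(self.ranges) > 1 and self.ranges[1][0] == n + 1:
            self.ranges[0] = (start, self.ranges[1][1])
            del self.ranges[1]
        return n

    def release(self, n):
        # Binary search for the range containing n (n must have been allocated).
        i = bisect.bisect_right(self.ranges, (n, float("inf"))) - 1
        start, end = self.ranges[i]
        if start == end:                     # the range collapses entirely
            del self.ranges[i]
        elif n == start:                     # shrink from the left
            self.ranges[i] = (start + 1, end)
        elif n == end:                       # shrink from the right
            self.ranges[i] = (start, end - 1)
        else:                                # split, leaving n as the hole
            self.ranges[i] = (start, n - 1)
            self.ranges.insert(i + 1, (n + 1, end))

Finding the range is O(log n) via binary search; the list insert/delete is linear, which a balanced tree (or something like NSOrderedSet) would avoid.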
Let's say I have a column in my table defined as follows:
"MyColumn" smallint NULL
Storing a value like 0, 1 or something else should need 2 bytes (1). But how much space is needed if I set "MyColumn" to NULL? Will it need 0 bytes?
Are there additional bytes needed for administrative purposes or similar for every column/row?
(1) http://www.postgresql.org/docs/9.0/interactive/datatype-numeric.html
Laramie is right about the bitmap and links to the right place in the manual. Yet, this is almost, but not quite correct:
So for any given row with one or more nulls, the size added to it
would be that of the bitmap (N bits for an N-column table, rounded up).
One has to factor in data alignment. The HeapTupleHeader (per row) is 23 bytes long, and actual column data always starts at a multiple of MAXALIGN (typically 8 bytes). That leaves one byte of padding that can be utilized by the null bitmap. In effect, NULL storage is absolutely free for tables of up to 8 columns.
After that, another MAXALIGN (typically 8) bytes are allocated for the next MAXALIGN * 8 (typically 64) columns. Etc. Always for the total number of user columns (all or nothing), but only if there is at least one actual NULL value in the row.
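As a back-of-the-envelope sketch of that arithmetic in Python (the function name is made up and this ignores the finer details of the header layout; 23 bytes and MAXALIGN = 8 are the typical values mentioned above):

def tuple_header_bytes(n_columns, has_nulls, header=23, maxalign=8):
    # The null bitmap exists only if the row contains at least one NULL;
    # it needs one bit per user column, rounded up to whole bytes.
    size = header + ((n_columns + 7) // 8 if has_nulls else 0)
    # Column data starts at the next MAXALIGN boundary.
    return ((size + maxalign - 1) // maxalign) * maxalign

print(tuple_header_bytes(8, True))    # 24 -> the bitmap fits in the padding byte
print(tuple_header_bytes(9, True))    # 32 -> another 8 bytes, enough for up to 72 columns
print(tuple_header_bytes(9, False))   # 24 -> no NULLs, no bitmap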
I ran extensive tests to verify all of that. More details:
Does not using NULL in PostgreSQL still use a NULL bitmap in the header?
Null columns are not stored. The row has a bitmap at the start, with one bit per column, that indicates which ones are null or non-null. The bitmap could be omitted if all columns are non-null in a row. So for any given row with one or more nulls, the size added to it would be that of the bitmap (N bits for an N-column table, rounded up).
More in-depth discussion from the docs here.
It should need 1 byte (0x00); however, it's the structure of the table that makes up most of the space. Adding this one value might change something (like adding a row), which needs more space than the sum of the data in it.
Edit: Laramie seems to know more about null than me :)
I have a table representing standards of alloys. The standard is partly based on the chemical composition of the alloys. The composition is presented in percentages. The percentage is determined by a chemical composition test. Sample data.
But sometimes, the lab cannot measure below a certain percentage. So they indicate that the element is present, but the percentage is less than they can measure.
I was confused about how to accurately store such a number in an SQL database. I thought of storing the number with a negative sign. No element can have a negative composition, of course, but I can interpret this as "less than the specified value". Another option is to add an extra column for each element!! The latter option I really don't like.
Any other ideas? It's a small issue if you think about it, but I think a crowd is always wiser. Somebody might have a neater solution.
Question updated:
Thanks for all the replies.
The test results come from different labs, so there is no common lower bound.
When the percentage of Titanium is less than 0.0004, for example, the number is still important; only the formula will differ slightly in this case.
Hence the value cannot be stored as NULL, and I don't know the lower bound for all values.
Tricky one.
Another possibility I thought of is to store it as a string. Any other ideas?
What you're talking about is a sentinel value. It's a common technique. Strings in most languages after all use 0 as a sentinel end-of-string value. You can do that. You just need to find a number that makes sense and isn't used for anything else. Many string functions will return -1 to indicate what you're looking for isn't there.
0 might work, because if the element isn't there, there shouldn't even be a record; but you also face the problem that it might be mistaken for actually meaning 0. -1 is another option, and it obviously doesn't have that same problem.
Another column to indicate if the amount is measurable or not is also a viable option. The case for this one becomes stronger if you need to store different categories of trace elements (e.g. <1%, <0.1%, <0.01%, etc.). Storing the negative of those numbers seems a bit hacky to me.
You could just store it as NULL, meaning that the value exists but is undefined.
Any arithmetic operation with a NULL yields a NULL.
Division by NULL is safe.
NULLs are ignored by the aggregate functions, so queries like this:
SELECT SUM(metal_percent), COUNT(metal_percent)
FROM alloys
GROUP BY metal
will give you the sum and the count of the actual, defined values, not taking the unfilled values into account.
I would use a threshold value which is at least one significant digit smaller than your smallest expected value. This way you can logically say that any value less than, say, 0.01 can be presented to your application as a "trace" amount. This remains easy to understand and gives you flexibility in determining where your threshold should lie.
Since the constraints on the values are well defined (you cannot have a negative composition), I would go for the "negative value to indicate less-than" approach. As long as this use of sentinel values is sufficiently documented, it should be reasonably easy to implement and maintain.
An alternative but similar method would be to add 100 to the values, assuming that you can't get more than 100%. So <0.001 becomes 100.001.
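For illustration, a tiny Python sketch of how the negative-sign convention could be decoded on the application side (the function name is made up):

def decode_percentage(stored):
    # Negative-sign convention: -0.0004 means "present, but below the
    # 0.0004 detection limit"; non-negative values are exact measurements.
    # (The add-100 variant would instead check for stored > 100 and subtract 100.)
    if stored < 0:
        return {"measured": False, "detection_limit": -stored}
    return {"measured": True, "value": stored}

print(decode_percentage(0.12))       # exact measurement: 0.12%
print(decode_percentage(-0.0004))    # trace: less than 0.0004%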
I would have a table modeling the certificate, in a one-to-many relation with another table storing the values for the elements. That elements table would contain the value in one column and a "less than" flag in a separate column.
Draft:
create table CERTIFICATES
(
    PK_ID integer,
    NAME varchar(128)
);

create table ELEMENTS
(
    ELEMENT_ID varchar(2),      -- chemical symbol, e.g. 'Ti'
    CERTIFICATE_ID integer,     -- references CERTIFICATES.PK_ID
    CONCENTRATION number,
    MEASURABLE integer          -- flag: 0 = only a trace ("less than"), 1 = measured
);
Depending on the database engine you're using, the types of the columns may vary.
Why not add another column to store whether or not it's a trace amount?
This will also allow you to save the amount that the trace is less than.
Since there is no common lowest threshold value and NULL is not acceptable, the cleanest solution now is to have a marker column which indicates whether there is a quantifiable amount or a trace amount present. A value of "Trace" would indicate to anybody reading the raw data that only a trace amount was present. A value of "Quantity" would indicate that you should check an amount column to find the actual quantity present.
I would have to warn against storing numerical values as strings. It will inevitably add additional pain, since you now lose the assertions a strong type definition gives you. When your application consumes the values in that column, it has to read the string to determine whether it's a sentinel value, a numeric value or simply some other string it can't interpret. Trying to handle data conversion errors at this point in your application is something I'm sure you don't want to be doing.
Another field seems like the way to go; call it 'MinMeasurablePercent'.