I'm preparing for the SQL Server exam (70-431). I have the Sybex book "SQL Server 2005 - Implementation and Maintenance". I'm a little confused about estimating the size of a table.
In the 2nd chapter it is explained how to do this:
Compute the row size from the formula: Row_Size = Fixed_Data_Size + Variable_Data_Size + Null_Bitmap + Row_Header.
Fixed_Data_Size is the sum of the sizes of all fixed-length columns (a simple sum).
Variable_Data_Size = 2 + (num_variable_columns × 2) + max_varchar_size, where num_variable_columns is the number of variable-length columns and max_varchar_size is the maximum size of the varchar columns.
Null_Bitmap = 2 + ((number_of_columns + 7) ÷ 8) (rounded down).
Row_Header always equals 4.
Compute rows per page from the formula: Rows_Per_Page = 8096 ÷ (Row_Size + 2) (rounded down).
Estimate the number of rows in the table. Let's say the table will have 1,000 rows.
Compute the number of pages needed: No_Of_Pages = 1,000 ÷ Rows_Per_Page (rounded up).
Compute the total size: Total_Size = No_Of_Pages × 8,192, where 8,192 is the size of one page in bytes.
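To make the steps concrete, here is a minimal T-SQL sketch of those formulas (the variable names are mine, and the sample values are taken from the Receivables question below: four fixed-length columns totalling 24 bytes, no variable-length columns, 5,000 rows):

declare @fixedDataSize int = 24, @variableDataSize int = 0
declare @numColumns int = 4, @numRows int = 5000
declare @nullBitmap int = 2 + (@numColumns + 7) / 8                           -- integer division = rounded down
declare @rowSize int = @fixedDataSize + @variableDataSize + @nullBitmap + 4   -- + 4-byte row header
declare @rowsPerPage int = 8096 / (@rowSize + 2)                              -- rounded down
declare @pages int = (@numRows + @rowsPerPage - 1) / @rowsPerPage             -- rounded up
select @pages * 8192 as estimated_bytes                                       -- 21 pages × 8192 = 172,032 bytes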
So everything is perfectly clear to me. I worked through one example and checked against the answers in the book that my calculations were correct. But there is one question which confuses me.
The question is: we have a table with the following schema:
Name Datatype
-------------------
ID Int
VendorID Int
BalanceDue Money
DateDue Datetime
It is expected that this table will hold about 5,000 rows. The question (literally): "How much space will the Receivables table take?"
So my answer is simple:
null_bitmap = 2 + ((4+7) / 8) = 3.375 = 3 (rounded down)
fixed_datasize = 4 + 4 + 8 + 8 = 24
variable_datasize = 0
row_header = 4 (always)
row_size = 3 + 24 + 0 + 4 = 31
But in the answer they omit Row_Header and don't add the 4. Is this a mistake in the book, or is Row_Header added only in some cases (which are not mentioned in the book)? I thought that maybe Row_Header is added only if there are variable-length fields in the table, but there is another exercise with no variable-length fields in which Row_Header is added. I would appreciate it if someone could explain this to me. Thanks.
Inside the Storage Engine: Anatomy of a record says all records have a record header:
The record structure is as follows:
record header
4 bytes long
two bytes of record metadata (record type)
two bytes pointing forward in the record to the NULL bitmap
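So the book's answer does look like a mistake. For what it's worth, applying the book's own formulas to the Receivables table with the 4-byte header kept in: row_size = 24 + 0 + 3 + 4 = 31, rows_per_page = 8096 ÷ (31 + 2) = 245 (rounded down), pages = 5,000 ÷ 245 = 21 (rounded up), total = 21 × 8,192 = 172,032 bytes.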
I have one mainframe data file with values like the one below:
000000720000{
I need to parse the data and load it into a Hive table like this:
72000
The field above is an income column, and the "{" sign denotes a positive amount.
The datatype used while creating the table is income decimal(11,2).
In the layout.cob copybook it is declared as INCOME PIC S9(11)V99.
Could someone help?
The number you want is 7200000, which would be 72000.00.
The conversion you are looking for is:
Positive numbers
{ = 0
A = 1
B = 2
C = 3
D = 4
E = 5
F = 6
G = 7
H = 8
I = 9
Negative numbers (this makes the whole value negative)
} = 0
J = 1
K = 2
L = 3
M = 4
N = 5
O = 6
P = 7
Q = 8
R = 9
Let me explain why.
Based on your question, the issue you are having is that packed-decimal data has been unpacked (UNPK) into character data. The PIC S9(11)V99 field actually takes up 7 bytes of packed storage.
In a hex dump of the packed field you'd see three lines: the top is the character representation (blank for packed data, because the hex values do not map to displayable characters) and the two lines below are the hexadecimal values, most significant nibble on top and least significant below.
Note that in the rightmost byte the sign is stored as C, which is positive; to represent a negative value you would see a D.
When it is converted to character data, the rightmost byte becomes C0. The C nibble is a consequence of the unpacking preserving the sign. Be aware that this display is on z/OS, which is EBCDIC; if the file has been transferred and converted to another code page you will see the correct character, but the hex values will be different.
The tables above list all the combinations you are likely to see for positive and negative numbers.
To make your life easy: if you see one of the characters from the first set, replace it with the corresponding digit. If you see something from the second set, replace it likewise and make the whole value negative.
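Since the rest of this thread is T-SQL-flavored, here is a minimal T-SQL sketch of that substitution (the variable names and the fixed 13-character field width are my assumptions; the same character mapping works in Hive or any other language):

declare @raw varchar(13) = '000000720000{'
declare @last char(1) = right(@raw, 1)
-- the letters } J..R mark the value as negative
declare @sign int = case when @last in ('}','J','K','L','M','N','O','P','Q','R') then -1 else 1 end
-- translate the overpunched last character back to a plain digit
declare @lastDigit char(1) =
    case @last
        when '{' then '0' when 'A' then '1' when 'B' then '2' when 'C' then '3' when 'D' then '4'
        when 'E' then '5' when 'F' then '6' when 'G' then '7' when 'H' then '8' when 'I' then '9'
        when '}' then '0' when 'J' then '1' when 'K' then '2' when 'L' then '3' when 'M' then '4'
        when 'N' then '5' when 'O' then '6' when 'P' then '7' when 'Q' then '8' when 'R' then '9'
        else @last   -- already a plain digit
    end
-- reassemble the digits and apply the two implied decimal places (PIC S9(11)V99)
declare @income decimal(11,2) =
    @sign * cast(left(@raw, len(@raw) - 1) + @lastDigit as decimal(13,0)) / 100
select @income   -- 72000.00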
The code below is used to convert a date into an integer and vice versa. I want to know why these specific hexadecimal codes and this number series are used to get the date back from the int. If there is an article about this code sample, it would also help me understand it.
I have tried online hex-to-decimal conversion for these codes and found that they are 256^1, 256^2, ..., but even so I was not able to find the exact reason.
declare @dDate date = '2017-10-12'
declare @iDate int = 0
-- pack the date into one int: year in the high bits, month in the middle byte, day in the low byte
select @iDate = ( (datepart(year,@dDate)*65536 | datepart(month,@dDate)*256 | datepart(dd,@dDate)))
select (@iDate&0xfff0000)/65536 --year
select (@iDate&0xff00)/256 --Month
select (@iDate&0xff) --Date
"&" is the bitwise AND operator; "|" is bitwise OR. See here and here. Also see here for an explanation of using bitwise AND/OR to store multiple number values in a single number column.
This part:
@iDate&0xfff0000
will "mask", or eliminate/replace with zeros, the portion of @iDate that didn't come from the year × 65536 (256^2) term. Then you divide by 65536, which simply reverses the original multiplication of the year by 65536.
If the concept of bitwise AND is foreign, I'll give an example that DOESN'T WORK in decimal. Bitwise AND converts the whole thing to binary and then masks things (like IP subnetting, if you're familiar with that).
Anyway, consider a decimal number 20171012. If such a thing as a decimal-wise AND existed, it could look like 20171012&11110000. The "1" places are "keepers" and the "0" places are "throw-aways". If you stack them vertically, the result is to keep the values with a "1" beneath them and replace the values with a "0" beneath them with a "0".
number 20171012
dec-wise AND 11110000
result 20170000
now the result isn't 2017, so you'd have to divide by 10000 to get 2017.
For 20171012&1100 you have to use implied leading zeros:
number 20171012
dec-wise AND 00001100
result 1000
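Back in binary, here is the real masking for 2017-10-12 as a quick runnable sketch:

-- 2017-10-12 packed into one int: (year << 16) | (month << 8) | day
declare @packed int = 2017 * 65536 | 10 * 256 | 12   -- 132188684
select (@packed & 0xfff0000) / 65536 as yr,          -- keep the year bits, undo *65536 -> 2017
       (@packed & 0xff00)    / 256   as mo,          -- keep the month byte, undo *256  -> 10
       (@packed & 0xff)              as dy           -- keep the day byte               -> 12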
I probably would have converted to int by adding the year*10000 and month * 100 and day. Reverting back I would use a combination of integer division and MOD. But I think the bitwise AND is perhaps a bit more elegant (particularly for getting the month).
Based on your comment, I will include how I have converted dates to int and reverted back:
declare @dDate date = '2017-10-12'
declare @iDate int
set @iDate = year(@dDate) * 10000 + month(@dDate) * 100 + day(@dDate)
select @iDate
select 'year', @iDate/10000 -- basic integer division provides the year
select 'month', (@iDate % 10000)/100 -- combine modulo and integer division to get the month
select 'day', @iDate % 100 -- basic modulo arithmetic provides the day
returns:
20171012
year 2017
month 10
day 12
This is bit manipulation.
Bit Shifting
Decimal 3 = Binary 11
If we left-shift (<<) 3 by 4 bits it becomes 48, which is binary 110000 <- 4 zero bits added due to the left shift.
But since we don't have bit-shifting operators in T-SQL, we can do the math instead:
Left-shifting number x by n bits = x × 2^n
Therefore, multiplying a number by 256 is actually left-shifting that number by 8 bits (2^8 = 256).
Later on, when you do a bitwise OR between the two numbers, it effectively "concatenates" the bits.
For example, to concatenate the two binary numbers (3) 11 and (2) 10, the resulting number should be 1110 = 14.
So first we left-shift 3 by 2 bits = 3 × 2^2 = 12, and then we bitwise-OR this number with the next number:
12 = 1100
2 = 0010
OR
---------------
14 = 1110
Your example is saving the whole date in a single integer variable, which is an efficient way of storing a date.
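A two-line sketch of the shift-by-multiplication trick in T-SQL:

select 3 * power(2, 4)        -- 48 = binary 110000, i.e. 3 << 4
select (3 * power(2, 2)) | 2  -- 12 | 2 = 14 = binary 1110, i.e. 11 and 10 concatenated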
I need to store 5 values in a single SQL Server column, each in the range 1-90. The values cannot be repeated. I thought of using the 2, 4, 8, 16, 32, 64, ... system, but as you can guess the numbers get really big, and using decimal I risk calculation errors. Is there a convenient way to:
store the 5 values in a single column, so as to avoid having 90 bit columns in the table (see my previous post here);
quickly query the database, for example to return all records with numbers X and Y;
another option was a string(90) containing flags like 000001000011000, but that way I have to use substrings to query, and I fear it will slow down on a table with 25,000 records or more.
First request: You say most of the values are bits, but if not all of them are, then you can't use bitwise operators and can't save them in a single field.
In that case you need an additional table.
Row_id | fieldName | fieldValue
1 | name1 | value1
1 | name2 | value2
.
.
.
1 | name90 | value90
Second request: Saving the 5 values is very easy and fast with the additional table. Just create an index on row_id on both tables.
Third request: Here you say again that you can save them as bits, but you propose using strings instead; that is a bad idea.
You are right that a number isn't big enough to hold 90 bits; that is because a number can only hold 32 or 64 bits, depending on the type.
In that case you need two fields (64 bits each) or three fields (32 bits each) to store all 90 possible flags.
Again, easy to do and really fast.
EDIT
To use multiple fields you have to split the flags into groups.
Imagine there are 16 bits split into two groups of 8 (each holding values 0..255):
01234567 89ABCDEF
01010101 11111111
Create FieldUp and FieldDown.
SAVE
FieldUp covers positions 01234567:
FieldUp = 1 + 4 + 16 + 64
FieldDown covers positions 89ABCDEF:
FieldDown = 1 + 2 + 4 + 8 + 16 + 32 + 64 + 128
Then selecting rows that have both FieldUp flags 4 and 32 and FieldDown flag 8 set would be:
SELECT *
FROM MyTable
WHERE (FieldUp & (4 + 32)) = (4 + 32)
  AND (FieldDown & 8) = 8
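A runnable sketch of the whole idea for the original 1-90 problem (table and column names are placeholders I made up):

-- Three ints hold the 90 flags: 30 numbers per field keeps every bit below the sign bit.
create table NumberSets (
    RowId  int identity primary key,
    Flags1 int not null default 0,   -- numbers  1-30 -> bits 0-29
    Flags2 int not null default 0,   -- numbers 31-60 -> bits 0-29
    Flags3 int not null default 0    -- numbers 61-90 -> bits 0-29
)

-- store the set {3, 17, 42, 58, 90}
insert NumberSets (Flags1, Flags2, Flags3)
values (power(2, 3 - 1) | power(2, 17 - 1),     -- numbers 3 and 17
        power(2, 42 - 31) | power(2, 58 - 31),  -- numbers 42 and 58
        power(2, 90 - 61))                      -- number 90

-- return all records containing both X = 17 and Y = 42
select *
from NumberSets
where (Flags1 & power(2, 17 - 1)) <> 0
  and (Flags2 & power(2, 42 - 31)) <> 0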
I resolved this by saving the numbers comma-separated; in my code I then split this field into an array and process the data. The numbers are not meant for math operations, just used as a string.
I'm looking for a way to generate some (6 by default) equations where all subsums are unique.
For example,
a+b+c=50
d+e+f=50
g+h+i=50
a, d and g have to be distinct.
a+b and d+e have to be distinct.
e+f and h+i have to be distinct.
a+c and d+f have to be distinct.
But a+b and e+f can be the same, so I only care about the subsums of aligned parameters.
I could only find ways to check whether some sequence is subsum-distinct; I found nothing on how to generate such a sequence.
You didn't state whether you need it to be a random sequence, so let's suppose that it is not required.
One simple approach is this:
1 + 2 + 47 = 50
3 + 4 + 43 = 50
5 + 6 + 39 = 50
7 + 8 + 35 = 50
9 + 10 + 31 = 50
11 + 12 + 27 = 50
The first two numbers are the two smallest numbers not yet used; the third number is the target sum minus those two.
a and b are always increasing, c is always decreasing
a + b is always increasing, b + c and a + c are always decreasing
You can generate it this way in a loop.
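For instance, a sketch of that loop in T-SQL, staying with the language used elsewhere in this thread (the target sum 50 and the count 6 are from the example above):

-- deterministic construction: a = 2k-1, b = 2k, c = 50 - a - b, for k = 1..6
declare @k int = 1
while @k <= 6
begin
    select 2*@k - 1 as a, 2*@k as b, 50 - (2*@k - 1) - 2*@k as c
    set @k += 1
end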
EDIT after comment that it has to be a random sequence:
Possibly you could create several sets (some sort of hashset/hashmap would be most appropriate):
set of first summands
set of sums of first and second summands
set of sums of second and third summands
set of sums of first and third summands
set of previously generated triples
You would generate random triples this way:
1. If the total number of demanded triples has not yet been reached, generate a random triple; otherwise finish.
2. Check whether the triple was previously generated; if it is new, proceed with step 3, otherwise go back to step 1.
3. Check the first four sets. If none of the subsums are contained in those sets, add the triple (and its subsums) to the sets and proceed with step 1.
However, I am not sure that this approach guarantees you will get results (especially for small target sums).
So I would add a counter: if too many consecutive attempts are unsuccessful, I would switch to a brute-force approach (which should not be a problem if the target sums are small, and on the other hand is very unlikely to be needed if the target sum is large).
Overall, performance should be good.
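A rough T-SQL sketch of that randomized approach, with table variables standing in for the hash sets (all names are mine, and as noted above it may fail to terminate for small target sums without the brute-force fallback):

declare @target int = 50, @wanted int = 6, @attempts int = 0
declare @triples table (a int, b int, c int)
declare @firsts table (v int primary key)   -- set of first summands
declare @ab table (v int primary key)       -- set of a+b sums
declare @bc table (v int primary key)       -- set of b+c sums
declare @ac table (v int primary key)       -- set of a+c sums

while (select count(*) from @triples) < @wanted and @attempts < 10000
begin
    set @attempts += 1
    -- random triple with a, b, c >= 1 and a + b + c = @target
    declare @a int = 1 + abs(checksum(newid())) % (@target - 2)
    declare @b int = 1 + abs(checksum(newid())) % (@target - @a - 1)
    declare @c int = @target - @a - @b
    -- distinct first summands already guarantee distinct triples,
    -- so the four subsum checks also cover the duplicate-triple check
    if exists (select 1 from @firsts where v = @a) continue
    if exists (select 1 from @ab where v = @a + @b) continue
    if exists (select 1 from @bc where v = @b + @c) continue
    if exists (select 1 from @ac where v = @a + @c) continue
    insert @triples values (@a, @b, @c)
    insert @firsts values (@a)
    insert @ab values (@a + @b)
    insert @bc values (@b + @c)
    insert @ac values (@a + @c)
end
select * from @triples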
I am trying to calculate how much space (MB) this would occupy. The database table has 7 bit columns, 2 tinyints and 1 GUID (uniqueidentifier).
I am trying to calculate the amount of space that 16,000 rows would occupy.
My line of thought was that the 7 bit columns consume 1 byte, the 2 tinyints consume 2 bytes and the GUID consumes 16 bytes, a total of 19 bytes for one row in the table. That would mean 304,000 bytes for 16,000 rows, or ~0.3 MB. Is that correct? Is there a metadata byte as well?
There are several estimators out there which take away the donkey work.
You also have to take into account the null bitmap (which will be 3 bytes in this case), the per-row slot-array entry on each page, the row header, row versioning, pointers, and all the stuff described here:
Inside the Storage Engine: Anatomy of a record
Edit:
Your 19 bytes of actual data
has 11 bytes overhead
total 30 bytes per row
around 269 rows per page (8096 / 30)
requires 60 pages (16000 / 269)
around 490k space (60 x 8192)
a few KB for the index structure of the primary key
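If you want to reproduce that arithmetic, a quick sketch using the numbers from this answer:

declare @rowSize int = 19 + 11                                  -- data + overhead
declare @rowsPerPage int = 8096 / @rowSize                      -- 269 (rounded down)
declare @pages int = (16000 + @rowsPerPage - 1) / @rowsPerPage  -- 60 (rounded up)
select @pages * 8192 as bytes                                   -- 491,520 bytes, ~490k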