SQL*Loader - Actual length exceeds maximum

I tried loading data into a table using SQL*Loader.
The log shows the actual length of the string as 101 whereas 100 is the maximum (and rejects the record). But when I checked, I found the length is 99.
The data type of the string column in the table is VARCHAR2(100).
I didn't specify anything about length in the control file.
What would be the exact problem?

Your data value only has 99 characters, but it seems some are multibyte characters - from a comment at least one is the symbol ½.
There are two related ways to see this behaviour, depending on how your table is defined and what is in your control file.
You're probably seeing the effect of character length semantics. Your column is defined as 100 bytes; you're trying to insert 99 characters, but as some characters require multiple bytes for storage, the total number of bytes required for your string is 101 - too many for the column definition.
You can see that effect here:
create table t42 (str varchar2(10 byte));
Then if I have a data file where the second row has a multibyte character:
This is 10
This is 9½
and a simple control file:
LOAD DATA
CHARACTERSET UTF8
TRUNCATE INTO TABLE T42
FIELDS TERMINATED BY ','
TRAILING NULLCOLS
(
STR
)
Then trying to load that gets:
Record 2: Rejected - Error on table T42, column STR.
ORA-12899: value too large for column "MYSCHEMA"."T42"."STR" (actual: 11, maximum: 10)
Total logical records read: 2
Total logical records rejected: 1
If I recreate my table with character semantics:
drop table t42 purge;
create table t42 (str varchar2(10 char));
then loading with the same data and control file now gets no errors, and:
Total logical records read: 2
Total logical records rejected: 0
However, even when the table is defined with character semantics, you could still see this. If I remove the CHARACTERSET UTF8 line, then my environment's default (via NLS_LANG, which happens to set my client character set to WE8ISO8859P1) leads to a character set mismatch, and I again see:
Record 2: Rejected - Error on table T42, column STR.
ORA-12899: value too large for column "MYSCHEMA"."T42"."STR" (actual: 11, maximum: 10)
(Without that control file line, and with byte semantics for the column, the error reports actual length as 13 not 11).
So you need the table to be defined to hold the maximum number of characters you expect, and you need the control file to specify the character set if your NLS_LANG is defaulting it to something that doesn't match the database character set.
You can see the default semantics a new table will get by querying, for the database default and your current session default:
select value from nls_database_parameters where parameter = 'NLS_LENGTH_SEMANTICS';
select value from nls_session_parameters where parameter = 'NLS_LENGTH_SEMANTICS';
For an existing table you can check which was used by looking at the user_tab_columns.char_used column, which will be B for byte semantics and C for character semantics.
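For example, for the t42 table used above:
select column_name, char_used
from user_tab_columns
where table_name = 'T42';
-- CHAR_USED: B = byte semantics, C = character semantics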

Related

I get this error: ORA-01438: value larger than specified precision allowed for this column

I have a column in my database table of type NUMBER(5,3). I need to be able to insert or update data in this column. I am currently using a form field that lets users input whatever number they want, and that field is used when inserting or updating this column. When testing, I enter any number and get this error: ORA-01438: value larger than specified precision allowed for this column
I am aware that the data type NUMBER(5,3) means 5 is the precision (total number of digits) and 3 is the scale (number of digits to the right of the decimal point). For example: 52.904
Is there a function in Oracle to format any number into a number of this type: NUMBER(5,3)?
Again, I would like the user to input any number in the field and be able to process that number as NUMBER(5,3) to insert or update my table.
You could use something like this:
select cast (512.33333333 as number(5,2)) from dual;
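Note that CAST rounds to fit the target scale, but it cannot shrink the integer part; a value with more digits to the left of the decimal point than the question's NUMBER(5,3) allows (two) will still raise ORA-01438. For example:
select cast(12.3456789 as number(5,3)) from dual;   -- returns 12.346
select cast(512.3456789 as number(5,3)) from dual;  -- raises ORA-01438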

Numeric Data Type - Storage

According to the Microsoft documentation, a value of type numeric(10,2) - that is, precision 10 - should take 9 bytes of storage.
But when I'm doing this:
DECLARE @var as numeric(10,0) = 2147483649
SELECT @var, DATALENGTH(@var)
DATALENGTH(@var) is returning 5 bytes instead of 9. Can someone explain to me why?
The documentation specifies:
Maximum storage sizes vary, based on the precision.
The storage is not constant for a given precision. The actual storage depends on the value.
As a note, this has nothing to do with the value being an integer. The following also returns 5:
declare @var numeric(11, 1) = 214483649.8
In actual fact, SQL Server seems to use the amount of storage needed for the value, not for the maximum value of the type. You can readily see this by changing the "10" to "20" and noting that the data length does not change.
EDIT:
You can see the dependence on the value if you run:
declare @a numeric(20, 1) = '123.1';
declare @b numeric(20, 1) = '1234567890123456789.0';
select datalength(@a), datalength(@b);
The two lengths are not the same.
The other answer, by @GordonLinoff, is wrong, or at least misleading.
Numeric is not stored with a variable number of bytes, but with a fixed size for a specific precision.
Trying this on SQL Server 2017 gave the same results you got.
The documentation you linked to originally, for numeric, is correct about how many bytes it takes to store a numeric of varying precisions.
This storage requirement is based only on the precision of the numeric column. In other words, that's how many bytes of storage are used. It is not a maximum that depends on the value in that row.
All rows use the same number of bytes for that column.
The key to this variation is that the documentation for DATALENGTH says this function
Returns the number of bytes used to represent any expression.
It appears that DATALENGTH does not mean 'represent' as in 'represent on disk', but rather 'represent in memory'.
The other documentation regarding numeric is talking about the on-disk storage of numeric.
This is probably because DATALENGTH is intended primarily for var* types or the other BLOB types.
So although a numeric(20,1) requires 13 bytes of storage, depending on the value, SQL Server can represent it in a smaller number of bytes when in memory, which is when DATALENGTH evaluates it.
As I pointed out in my other comment, although numeric has different sizes, it is a fixed-size data type, because for a specific column in a specific table, every value takes up the same amount of storage.
Roughly, a SQL Server row has 4 parts:
4 byte header
Fixed size data
Offsets into variable size data
Variable size data
Numerics and other fixed-size types are stored in part 2; var* types are stored in part 4, with their lengths in part 3.
This script displays the metadata for a table with some fixed & variable columns.
declare @a numeric(20, 1) = '123.1';
declare @b numeric(20, 1) = '1234567890123456789.0';
select datalength(@a) union select datalength(@b);
create table #numeric(num1 numeric(20,1), text1 varchar(10), char2 char(6));
insert into #numeric(num1, text1, char2) values ('123.1', 'hello', 'first'), ('1234567890123456789.0', 'there', '2nd');
select datalength(num1) from #numeric;
select
t.name as table_name,
c.name as column_name,
pc.partition_column_id,
pc.max_inrow_length,
pc.max_length,
pc.precision,
pc.scale,
pc.collation_name,
pc.leaf_offset
from tempdb.sys.tables as t
join tempdb.sys.partitions as p
on(t.object_id=p.object_id)
join tempdb.sys.system_internals_partition_columns as pc
on(pc.partition_id=p.partition_id)
join tempdb.sys.columns as c
on((c.object_id=p.object_id)and(c.column_id=pc.partition_column_id))
where (t.object_id=object_id('tempdb..#numeric'));
drop table #numeric;
Notice the leaf_offset column. This indicates the starting position of the value in the raw binary data.
The first column starts immediately after the 4 byte header.
The second fixed column starts 13 bytes later, as per the SQL documentation.
The varchar column has an offset of -1, indicating it is a variable-length column whose position in the byte array isn't fixed.
In this case it could be fixed, since there's only one var column, but an ALTER TABLE statement could add another column and shift things.
If you want to research further, the best source is a book called SQL Server Internals, by Kalen Delaney. She was part of the team that wrote SQL Server.

Compare strings with trailing spaces in Firebird SQL?

I have an existing database with a table with a string[16] key field.
There are rows whose key ends with a space: "16 ".
I need to allow the user to change "16 " to e.g. "16", but also do a unique key check (i.e. that the table does not already have a record with key="16").
I run the following query:
select * from plu__ where store=100 and plu_num = '16'
It returns the row with key="16 "!
How do I check for unique key so that keys with trailing spaces are not included?
EDIT: The DDL and the char_length
CREATE TABLE PLU__
(
PLU_NUM Varchar(16),
CAPTION Varchar(50),
...
string[16] - there is no such datatype in Firebird. There are CHAR(16) and VARCHAR(16) (and BLOB SUBTYPE TEXT, but that is improbable here). So you are omitting some crucial points about your system: you are not working with Firebird directly, but through some undisclosed intermediate layer, and no one knows how opaque or transparent it is.
I suspect you or your system chose the CHAR datatype instead of VARCHAR, where all data is right-padded with spaces to the maximum length. Or maybe the COLLATION of the column/table/database is such that trailing spaces do not matter.
Additionally, you may simply be wrong. You claim that the row being selected contains the trailing blank, but I do not see it. For example, add CHAR_LENGTH(plu_num) to the columns in your SELECT and see what is there.
Additionally, if plu_num is a number - should it not be an integer or int64 rather than text?
The bottom of your screenshot shows "(NONE)". I suspect that is the "connection charset". This is allowed for backward compatibility with programs made 20 years ago, but it is quite dangerous today. You have to consult your system documentation on how to set the connection charset to UTF-8 or Windows-1250 or something meaningful.
"How do I check for unique key so that keys with trailing spaces are not included?" - you do not. You simply cannot do it reliably, because different transactions and different programs make simultaneous connections. You would check, decide you are clear, but right before you insert your row, some other computer could insert it too. That gap between your two commands - checking and inserting - cannot be closed this way; anyone else can act in between. This is called a race condition.
You have to ask the server to do the checks.
For example, you have to introduce a UNIQUE CONSTRAINT on the pair of columns (store, plu_num). That way the server will refuse to store two rows with the same values in those columns visible in the same transaction.
Additionally, is it even normal to have values with spaces? Convert the field to an integer datatype and be safe.
Or, if you want to keep it textual and non-numeric, you still can:
Introduce a CHECK CONSTRAINT that trim(plu_num) is not distinct from plu_num (or, if plu_num is declared as a NOT NULL column to the server, then trim(plu_num) = plu_num). That way the server will refuse to store any value with spaces before or after the text.
In case the datatype or the collation of the column makes no difference when comparing texts with and without trailing spaces (and in case you cannot change that datatype or collation), you may try adding tokens, like ('+' || trim(plu_num) || '+') = ('+' || plu_num || '+')
Or, instead of that CHECK CONSTRAINT, you can proactively remove those spaces: set a new BEFORE UPDATE OR INSERT trigger on the table that does NEW.plu_num = TRIM(NEW.plu_num). A sketch of both options follows.
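A minimal sketch of both options, using the table and column from the question and assuming plu_num is NOT NULL; the constraint and trigger names here are made up, and the SET TERM lines are for running this through isql:
-- Option A: reject any value with leading or trailing spaces
alter table plu__ add constraint chk_plu_num_trimmed
check (trim(plu_num) = plu_num);
-- Option B: silently trim the value before it is stored
set term ^ ;
create trigger plu__trim_bi for plu__
active before insert or update position 0
as
begin
  new.plu_num = trim(new.plu_num);
end^
set term ; ^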
Documentation:
https://www.firebirdsql.org/refdocs/langrefupd20-distinct.html
http://www.firebirdtest.com/file/documentation/reference_manuals/fblangref25-en/html/fblangref25-ddl-tbl.html#fblangref25-ddl-tbl-constraints
http://www.firebirdtest.com/file/documentation/reference_manuals/fblangref25-en/html/fblangref25-ddl-tbl.html#fblangref25-ddl-tbl-altradd
http://www.firebirdtest.com/file/documentation/reference_manuals/fblangref25-en/html/fblangref25-ddl-trgr.html
http://www.firebirdtest.com/file/documentation/reference_manuals/fblangref25-en/html/fblangref25-datatypes-chartypes.html
Also, via http://www.translate.ru a bit more verbose:
http://firebirdsql.su/doku.php?id=constraint
http://firebirdsql.su/doku.php?id=alter_table
You may also check http://www.firebirdfaq.org/cat3/
Additionally, if you add the constraints to an existing table with invalid data entered before you introduced those checks, you might trap yourself in a "non-restorable backup" situation. You would have to check for that and sanitize your old data so it abides by the newly introduced constraints.
Option #4 is explained in detail below. But this just seems to be bad database design! One should not "let people edit the number to remove trailing blanks"; one should design the database so that there are no numbers with trailing blanks and no way to insert them into the database.
CREATE TABLE "_NEW_TABLE" (
ID INTEGER NOT NULL,
TXT VARCHAR(10)
);
Select id, txt, '_'||txt||'_', char_length(txt) from "_NEW_TABLE"
ID  TXT  CONCATENATION  CHAR_LENGTH
1   1    _1_            1
2   2    _2_            1
4   1    _1 _           2
5   2    _2 _           2
7    1   _ 1_           2
8    2   _ 2_           2
Select id, txt, '_'||txt||'_', char_length(txt) from "_NEW_TABLE"
where txt = '2'
ID  TXT  CONCATENATION  CHAR_LENGTH
2   2    _2_            1
5   2    _2 _           2
Select id, txt, '_'||txt||'_', char_length(txt) from "_NEW_TABLE"
where txt || '+' = '2+' -- WARNING - this PROHIBITS index use on txt column, if there is any
ID  TXT  CONCATENATION  CHAR_LENGTH
2   2    _2_            1
Select id, txt, '_'||txt||'_', char_length(txt) from "_NEW_TABLE"
where txt = '2' and char_length(txt) = char_length('2')
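Given the sample rows above, this last query should return only the exact match: trailing-space rows are excluded because their CHAR_LENGTH is 2, not 1.
ID  TXT  CONCATENATION  CHAR_LENGTH
2   2    _2_            1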

SQL Server size difference for a column

I have a table in SQL Server, say "Temp", which has columns Addr1, Addr2, Addr3 and Addr4, plus some additional columns.
Addr1, Addr2, Addr3 and Addr4 are of nvarchar type. When I check the size of these columns in Object Explorer, it shows all of them as nvarchar(100).
But when I check them using Alt + F1, it shows the details in the Results pane with the length as 200.
Why is there a difference?
If I enter more than 100 characters, I get truncation errors, so it seems it takes only 100 characters.
Can you please let me know what the Length value specifies?
Because the size listed in Object Explorer is the number of characters, and the size listed in the result of your query to sp_help is the number of bytes.
VARCHAR values in SQL Server use 1 byte per character, whereas NVARCHAR values use 2 bytes per character. Both also need a 2-byte overhead - see below. So because you are looking at NVARCHAR columns, they need 200 (well, actually 202) bytes to store 100 characters, where a VARCHAR would only require 100 (really 102).
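You can see the character/byte distinction directly by comparing LEN (characters) with DATALENGTH (bytes) - a minimal sketch:
declare @v nvarchar(100) = N'hello';
select len(@v) as char_count,        -- 5 characters
       datalength(@v) as byte_count; -- 10 bytes (2 per character)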
References:
MSDN: char and varchar
The storage size is the actual length of the data entered + 2 bytes.
MSDN: nchar and nvarchar:
The storage size, in bytes, is two times the actual length of data entered + 2 bytes.
(emphasis mine)
MSDN: sp_help:
Reports information about a database object (any object listed in the sys.sysobjects compatibility view), a user-defined data type, or a data type.
/-------------------------------------------------------------------------\
| Column name | Data type | Description                                   |
|-------------+-----------+-----------------------------------------------|
| Length      | smallint  | Physical length of the data type (in bytes).  |
\-------------------------------------------------------------------------/

What does the specified number mean in a VARCHAR() clause?

Just to clarify: does specifying something like VARCHAR(45) mean it can take up to a maximum of 45 characters? I remember hearing from someone a few years ago that the number in the parentheses doesn't refer to the number of characters; the person then tried to explain something quite complicated to me, which I didn't understand and have already forgotten.
And what is the difference between CHAR and VARCHAR? I did search around a bit and see that CHAR gives you the maximum size of the column, and that it is better to use it if your data has a fixed size, and VARCHAR if your data size varies.
But if it gives you the maximum size of the column for all the data in that column, isn't it better to use it when your data size varies? Especially if you don't know how big your data is going to be. VARCHAR requires you to specify the size (CHAR doesn't really, right?); isn't that more troublesome?
You also have to specify the size with CHAR. With CHAR, column values are padded with spaces to fill the size you specified, whereas with VARCHAR, only the actual value you specified is stored.
For example:
CREATE TABLE test (
char_value CHAR(10),
varchar_value VARCHAR(10)
);
INSERT INTO test VALUES ('a', 'b');
SELECT * FROM test;
The above will select "a         " (padded with spaces to 10 characters) for char_value and "b" for varchar_value.
If all your values are about the same size, CHAR is possibly a better choice because it will often require less storage space than VARCHAR. This is because VARCHAR stores both the length of the value and the value itself, whereas CHAR can just store the (fixed-size) value.
The MySQL documentation gives a good explanation of the storage requirements of the various data types.
In particular, for a string of length L, a CHAR(M) datatype will take up (M x c) bytes (where c is the number of bytes required to store a character; this depends on the character set in use).
A VARCHAR(M) will take up (L + 1) or (L + 2) bytes, depending on whether M is <= 255 or > 255. For example, with a single-byte character set, CHAR(10) always takes 10 bytes, while a VARCHAR(10) holding 'abc' takes 3 + 1 = 4 bytes.
So, it really depends on how long you expect your strings to be and what the variation in length will be.
NB: The documentation doesn't discuss the impact of character sets on the storage requirements of a VARCHAR type. I've tried to quote it accurately, but my guess is that you would need to multiply the string length by the character byte-width as well to get the storage requirement.
The complicated stuff you don't remember is that the 45 refers to bytes, not characters. It's not the same if you are using a multibyte character encoding. In Oracle you can specify bytes or chars explicitly:
varchar2(45 BYTE)
or
varchar2(45 CHAR)
See Difference between BYTE and CHAR in column datatypes
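A minimal sketch of that difference (the table and column names are made up, and it assumes an AL32UTF8 database, where ½ needs two bytes):
create table t (b varchar2(5 byte), c varchar2(5 char));
insert into t (c) values ('1234½');  -- OK: 5 characters (6 bytes)
insert into t (b) values ('1234½');  -- fails with ORA-12899: 6 bytes > 5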
CHAR and VARCHAR actually become irrelevant if you have even one variable-length field in your table, such as a VARCHAR or TEXT column: MySQL will then automatically change all CHAR columns to VARCHAR.
Fixed-length records can give you extra performance, but then you can't use any variable-length field types. The reason is that it is quicker and easier for MySQL to find the next record.
For example, if you do a SELECT * FROM table LIMIT 10, MySQL has to scan the table file for the tenth record. This means finding the end of each record until it finds the end of the 10th one. But if your table has fixed-length records, MySQL just needs to know the record size and can skip 10 x #bytes.
If you know a column will contain a small, fixed number of characters, use CHAR; otherwise use VARCHAR. A CHAR column is padded to the maximum length.
VARCHAR has a small overhead (4-8 bytes depending on the RDBMS), but only uses the overhead plus the actual number of characters stored.
For values you know will have a constant length, for example phone numbers or zip codes, it is optimal to use CHAR for sure.