Text truncated when using Perl DBI INSERT into SQL Server

The problem is that DBI's INSERT leaves long strings truncated when inserting into MS SQL Server. Here is my code:
my $insert = $dbh->prepare("INSERT INTO my_table (field_1, field_2) values (?, ?)");
$insert->execute($value_1, $value_2);
where field_2 has the data type varchar(100) and $value_2 is a text string of 90 characters containing spaces but no other special characters.
After the statement executes with no error raised, I checked the database: the inserted $value_2 is truncated at the 80th character, in the middle of a regular English word (i.e. not at a special character).
I've tried altering the data type of field_2 to varchar(150) and to text. I've also used $dbh->quote($value_2) in place of $value_2. Neither helped.
Why is this happening? What should I do? Thanks!

If you are using FreeTDS, it is probably a bug identified on the FreeTDS mailing list. See the thread "freetds silently truncating text/varchar/etc fields to 80 characters": http://lists.ibiblio.org/pipermail/freetds/2011q2/026943.html, http://lists.ibiblio.org/pipermail/freetds/2011q2/026925.html, and http://lists.ibiblio.org/pipermail/freetds/2011q2/026944.html
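If that bug is in play, the usual remedies are upgrading FreeTDS and making sure the client speaks a modern TDS protocol version. The freetds.conf fragment below is only a sketch of the relevant settings (the server entry name and host are placeholders; the exact fix is discussed in the threads above):

# freetds.conf - hypothetical server entry
[myserver]
    host = sqlserver.example.com
    port = 1433
    # older protocol versions impose tight limits on character data;
    # 7.x versions are needed for modern SQL Server
    tds version = 7.2
    # upper bound, in bytes, on text/varchar data transferred
    text size = 64512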

I'd say try various strings to see if they all behave the same way. It's probably an encoding issue, as davorg suggests. The default collation in MySQL is Swedish (latin1_swedish_ci), as I recall, so in MySQL you would probably want to change your collation to utf8_general_ci.
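In the MySQL case, a minimal sketch of that collation change (the table name is borrowed from the question) would be:

-- converts both the character set and the collation of every text column
ALTER TABLE my_table CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;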

Related

How to store emoji characters in DB2 database?

I have an application that stores tweets in a DB2 database and needs to retrieve them later. I'm having trouble displaying text strings with emojis inside (some emojis lose their formatting).
I've been reading various answers on the internet, but most are for MySQL (switch from utf8 to utf8mb4); I've found nothing for DB2...
Is there any way to do something like the following in DB2 databases?
https://mathiasbynens.be/notes/mysql-utf8mb4
Thanks very much.
You can use Unicode constants like this in a Db2 Unicode database
$ db2 "values U&'\+01F600'"
1
----
😀
1 record(s) selected.
https://www.ibm.com/support/knowledgecenter/SSEPGG_11.5.0/com.ibm.db2.luw.sql.ref.doc/doc/r0000731.html
U& followed by a sequence of characters that starts and ends with a string delimiter and that is optionally followed by the UESCAPE clause. This form of a character string constant is also called a Unicode string constant.
A character can be expressed by either its typographical character (glyph) or its Unicode code point. The code point of a Unicode character ranges from X'000000' to X'10FFFF'.
To express a Unicode character through its code point, use the Unicode escape character followed by 4 hexadecimal digits, or the Unicode escape character followed by a plus sign (+) and 6 hexadecimal digits. The default Unicode escape character is the reverse solidus (\)
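As a quick sketch of the two escape forms the documentation describes, the 4-digit form (\0041 is 'A') and the 6-digit form (\+01F600 is the emoji) can be mixed in a single constant:

$ db2 "values U&'\0041 and \+01F600'"

which should return the string 'A and 😀'.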
Or you can use the UTF-8 hex values directly if you prefer:
db2 "values x'F09F9880'"
1
----
😀
1 record(s) selected.
Could you clarify what the issue is? With a UTF-8 database there are no issues with the linked example:
$ db2 "create table emoji(string_with_emjoi varchar(32))"
DB20000I The SQL command completed successfully.
$ db2 "insert into emoji values 'foo𝌆bar'"
DB20000I The SQL command completed successfully.
$ db2 "select string_with_emjoi, hex(string_with_emjoi) string_with_emjoi_hex from emoji"
STRING_WITH_EMJOI STRING_WITH_EMJOI_HEX
-------------------------------- ----------------------------------------------------------------
foo𝌆bar 666F6FF09D8C86626172
The code point for the character is stored as 4 bytes in UTF-8 (0xF09D8C86). If you have an issue displaying it after retrieval, you need to dig a bit deeper and see what value the database actually returns; the problem might very well be in the application itself.
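To rule the database out, one quick check (the database name MYDB is a placeholder) is to confirm it was created as a Unicode database:

$ db2 get db cfg for MYDB | grep -i "code set"
 Database code set                                       = UTF-8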

Handling Chinese characters in SQL Server 2016

Our ETL team is sending us some data with Chinese descriptions. When we load that data into our SQL Server database, those descriptions come up blank.
We tried changing the column type to nvarchar, but that doesn't help.
Can you please help?
Thanks
You must use the N prefix when dealing with NVARCHAR.
INSERT INTO table (column) VALUES (N'chinese characters')
Prefix Unicode character string constants with the letter N to signal UCS-2 or UTF-16 input, depending on whether an SC collation is used or not. Without the N prefix, the string is converted to the default code page of the database, which may not recognize certain characters. Starting with SQL Server 2019 preview, when a UTF-8 enabled collation is used, the default code page is capable of storing the UNICODE UTF-8 character set.
Source: https://learn.microsoft.com/en-us/sql/t-sql/data-types/nchar-and-nvarchar-transact-sql?view=sql-server-2017
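A minimal repro of the difference (the temp table is hypothetical, and the exact garbled output depends on your database's default code page):

CREATE TABLE #demo (descr NVARCHAR(50));
INSERT INTO #demo VALUES ('中文'), (N'中文');
-- the first row typically comes back as '??': the literal was converted
-- through the default code page before it ever reached the NVARCHAR column;
-- the second row survives intact
SELECT descr FROM #demo;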

INSERT Statement in SQL Server Strips Characters, but using nchar(xxx) works - why?

I have to store some strange characters in my SQL Server DB which are used by an Epson Receipt Printer code page.
Using an INSERT statement, all are stored correctly except one - [SCI] (nchar(154)). I realise that this is a control character that isn't representable in a string, but the character is replaced by a '?' in the stored DB string, suggesting that it is being parsed (unsuccessfully) somewhere.
The collation of the database is LATIN1_GENERAL_CI_AS so it should be able to cope with it.
So, for example, if I run this INSERT:
INSERT INTO Table(col1) VALUES ('abc[SCI]123')
Where [SCI] is the character, a resulting SELECT query will return 'abc?123'.
However, if I use NCHAR(154), by directly inserting or by using a REPLACE command such as:
UPDATE Table SET col1 = REPLACE(col1, '?', NCHAR(154))
The character is stored correctly.
My question is: why? And how can I store it directly from an INSERT statement? The latter is preferable, as I am writing from an existing application that produces the INSERT statements, and I don't really want to have to change it.
Thank you in advance for any information that may be useful.
When you write a literal string in SQL, it is created as a VARCHAR unless you prefix it with N. This means that any Unicode characters without a mapping in the database's default code page are replaced with '?'. Instead, write your INSERT statement like this:
INSERT INTO Table(col1) VALUES (N'abc[SCI]123')
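You can watch that conversion happen directly. A small sketch (it assumes a default code page, such as 1252, with no mapping for U+009A):

SELECT UNICODE(NCHAR(154));                      -- 154: the character survives as Unicode
SELECT UNICODE(CAST(NCHAR(154) AS VARCHAR(1)));  -- 63, i.e. '?': lost in the code-page conversion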

Losing special characters on insert

I am using Oracle 11g and trying to insert a string containing special UTF-8 characters, e.g. '(ε- c'. The NLS character sets for the database are...
NLS_NCHAR_CHARACTERSET AL16UTF16
NLS_CHARACTERSET WE8ISO8859P1
When I copy and paste the above string into an NVARCHAR field, it works fine.
If I execute the statement below, I get an upside-down question mark in the field:
insert into title_debug values ('(ε- c');
where the title_debug table consists of a single NVARCHAR2(100) field called title.
I have attempted to assign this string to an NVARCHAR2(100) variable and then insert that. I have also attempted all the different CAST / CONVERT etc. functions I can find, and nothing is working.
Any assistance would be greatly appreciated.
UPDATE
I have executed
select dump(title, 1016), dump(title1, 1016)
into v_title, v_title1
from dual
where title is the string passed in as a VARCHAR and title1 is the string passed in as an NVARCHAR.
Unsurprisingly, the encodings come through as WE8ISO8859P1 and AL16UTF16, but in both cases the ε comes through as hex 'BF', which is the upside-down question mark: the literal is converted to the database character set when the statement is parsed, before it ever reaches the NVARCHAR2 column.
My only remaining thought is to pass the value through as a RAW and then do something with it. However, I have not yet been able to figure out how to convert the string into an acceptable format with XQuery (OSB).
Continued thanks for assistance.
Our DBA found the solution to this issue. The answer lay in settings on the JDBC connection on the bus that tell it to convert UTF-8 literals to NCHAR.
On the connection pool page, add the following lines to the Properties box:
oracle.jdbc.convertNcharLiterals=true
oracle.jdbc.defaultNchar=true
This allows you to insert into NVARCHAR2 fields while preserving the UTF-8 characters.
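With those properties in place, an N-prefixed literal reaches the NVARCHAR2 column without being squeezed through the database character set. A sketch against the table from the question:

-- the N prefix keeps the literal in the national character set
insert into title_debug values (n'(ε- c');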
Cheers
I did a test with '(ε- c' and did not encounter any problems.
If you can, I advise you to change your character sets to:
NLS_CHARACTERSET AL32UTF8
NLS_NCHAR_CHARACTERSET AL16UTF16
Oracle's recommendation for all new deployments is the Unicode character set AL32UTF8, because of its flexible globalization support and because it is a universal character set.
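To see what your database currently uses, a quick check against the standard data dictionary view:

select parameter, value
  from nls_database_parameters
 where parameter in ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');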
Regards,
First verify the data is being stored correctly, then use the correct NLS_LANG settings. See my answer to this question: "when insert persian character in oracle db i see the question mark"

Why are my accented characters breaking in SQL Server 2005?

When I update my database with this command:
UPDATE myTable SET Name = 'Hermann Dönnhoff' WHERE ID = 123;
SQL Server actually puts 'Hermann Do¨nnhoff' in the field instead. Instead of faithfully inserting the o-umlaut (char(246)), I'm getting two characters (char(111) + char(168)).
This happens for all characters that have accent marks, not just umlauts.
Has anybody seen this?
Thank you.
You need to use the nchar, nvarchar, or ntext datatypes for Unicode data.
The issue is that your code page does not directly support those characters.
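As a sketch, assuming the Name column is (or is altered to) nvarchar, the N prefix keeps the umlaut intact:

UPDATE myTable SET Name = N'Hermann Dönnhoff' WHERE ID = 123;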
Read up on collations for more information:
http://msdn.microsoft.com/en-us/library/aa214408%28SQL.80%29.aspx
http://msdn.microsoft.com/en-us/library/aa174903%28SQL.80%29.aspx