We have some encoding issues and I need to check whether a BOM is already present in a PostgreSQL text column. I used
select convert(varbinary, columnXY) from tableXY where id = 1;
successfully on MS SQL, but I can't find an equivalent conversion for PostgreSQL. I found this documentation and tried decode(columnXY, 'hex'), but that does not work.
You can examine the binary representation of the TEXT column by converting it to BYTEA (edit: not by a direct cast; better to use convert_to(text, 'UTF-8') instead) and searching for the BOM sequence in it as a series of bytes.
As an SQL expression:
position('\xefbbbf'::bytea IN convert_to(your_text_column,'UTF-8'))=1
0 as the result of position(...) would mean the BOM is not in the string.
1 means it's at the beginning of the string.
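Putting it together as a runnable check (your_table and your_text_column are placeholder names):
select id,
       position('\xefbbbf'::bytea in convert_to(your_text_column, 'UTF8')) = 1 as starts_with_bom
from your_table;
If a BOM is found, substr(your_text_column, 2) should strip it, since in a UTF-8 database the three BOM bytes decode to the single character U+FEFF.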
I'm trying to convert the following code from Oracle to Snowflake:
order by nlssort(name, 'NLS_SORT=BINARY')
I know NLSSORT is not a function in Snowflake, but is there anything I can use as an alternative?
It should already be pretty similar to Snowflake's default sorting; you just need to check the database character set in Oracle (select * from nls_database_parameters where parameter='NLS_CHARACTERSET') and see whether its binary order differs from ASCII/UTF-8.
Oracle's documentation:
If the value is BINARY, then comparison is based directly on byte values in the binary encoding of the character values being compared.
Snowflake's documentation:
All data is sorted according to the numeric byte value of each character in the ASCII table. UTF-8 encoding is supported.
So I think you should be able to just do:
order by name
It's kind of odd that somebody would write that Oracle code in the first place, since BINARY is the default sort order (collation). But if your Oracle database uses a multilingual collation (which is not common) for other queries, I don't think you're going to be able to easily emulate that in Snowflake.
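As a minimal side-by-side sketch (the employees table and name column are made-up examples):
-- Oracle
select name from employees order by nlssort(name, 'NLS_SORT=BINARY');
-- Snowflake: the default sort already compares byte values
select name from employees order by name;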
I have a 64-bit integer field in my Postgres database, populated with 64-bit integer numbers. (Non) coincidentally, those numbers are actually 8-character strings in ASCII format, little endian. For example, the number 5208208757389214273 is the numeric representation of the string "ABCDEFGH": it is 0x4847464544434241 in hex, where 0x41 is A, 0x42 is B, 0x43 is C, and so forth.
I would like to convert those numbers purely for display purposes - i.e. find a way to leave them as numbers in the database, but be able to see them as strings when querying. Is there any way to do it in SQL? If not in SQL, is there anything I can do on the server side (install extensions, stored procedures, anything at all) which would allow this? This problem is trivially solvable with any script or programming language, but I do not know how to solve it with SQL.
P.S. And just one more time for some of the trigger-happy duplicate-hammer-wielding bunch: this is not a question of translating a number like 5208208757389214273 to the string "5208208757389214273" (we have a lot of answers on how to do that, but this is not what I am looking for).
Use to_hex() to get a hexadecimal representation of the number. Then use decode() to turn it into a bytea. (Unfortunately I did not find any direct way from bigint to bytea.) Cast that to text and reverse() it, because of the endianness.
reverse(decode(to_hex(5208208757389214273), 'hex')::text)
ABCDEFGH
The bytea_output must be set to 'escape' for this to work properly -- use SET bytea_output = 'escape';.
(Tested on versions 9.4 and 9.6.)
An alternative way to achieve the same result without using SET is the following:
select reverse(encode(decode(to_hex(5208208757389214273),'hex'),'escape'))
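One caveat: to_hex() omits leading zeros, so if the most significant byte can be below 0x10 the hex string has an odd length and decode(..., 'hex') fails; padding to 16 digits avoids that. Wrapped up as a helper (a sketch; the function name ascii8_to_text is made up):
create or replace function ascii8_to_text(n bigint) returns text
    language sql immutable
as $$
    -- pad to 16 hex digits, decode to bytes, render as text, undo the little-endian order
    select reverse(encode(decode(lpad(to_hex(n), 16, '0'), 'hex'), 'escape'))
$$;
-- usage against a hypothetical table t with a bigint column val:
-- select val, ascii8_to_text(val) from t;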
I'm working with a SQL Server database to store very long Unicode strings. The field is of type ntext, which in theory should be limited to 2^30 - 1 bytes, i.e. about half a billion Unicode characters.
From MSDN documentation:
ntext
Variable-length Unicode data with a maximum string length of 2^30 - 1 (1,073,741,823) bytes. Storage size, in bytes, is two times the string length that is entered. The ISO synonym for ntext is national text.
I made this test:
Generate a 50,000-character string.
Run an UPDATE SQL statement:
UPDATE [table]
SET Response='... 50,000 character string...'
WHERE ID='593BCBC0-EC1E-4850-93B0-3A9A9EB83123'
Check the result: what was actually stored in the field at the end.
The result was that the field [Response] contained only 43,679 characters. All the characters at the end of the string were thrown away.
Why does this happen? How can I fix this?
If this is really the capacity limit of this data type (ntext), which other data type can store a longer Unicode string?
Based on what I've seen, you may only be able to copy 43,679 characters. It is storing all the characters; they're in the db (verify with SELECT DATALENGTH(Response)/2 FROM [table] WHERE ..., since LEN does not accept ntext directly), and SSMS has a problem showing more than that when you go to look at the full data.
The NTEXT data type is deprecated; you should use NVARCHAR(MAX) instead.
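If you can change the schema, a minimal migration sketch (table and column names follow the question; test on a copy first, and adjust nullability to match your column):
ALTER TABLE [table] ALTER COLUMN Response NVARCHAR(MAX) NULL;
-- optionally rewrite existing rows so old LOB values are converted in place
UPDATE [table] SET Response = Response;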
I see two possible explanations:
The ODBC driver you use to connect to the database truncates the parameter value when it is too long (try using SSMS instead).
You write that you generate your input string; I suspect you generate a CHAR(0), which is the null character.
If the second is your case, make sure you cannot generate the \0 character. (See the small example after this list.)
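To see why an embedded null matters, a small self-contained example:
DECLARE @s NVARCHAR(100) = N'abc' + NCHAR(0) + N'def';
SELECT LEN(@s) AS len_chars, DATALENGTH(@s) AS byte_count; -- 7 and 14: fully stored
-- but some clients and drivers treat the embedded CHAR(0) as a string terminator and only show 'abc'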
EDIT:
I don't know how you check the length, but keep in mind that LEN does not count trailing whitespace:
SELECT LEN('aa     ') AS length -- 2
      ,DATALENGTH('aa     ') AS datalength -- 7
The last possibility I see is that you do something like:
SELECT 'aa    aaaa'
-- shows up in SSMS as `aa aaaa`: so when you count, you lose all the multiple whitespaces
Check whether the query below returns 100,000 (DATALENGTH counts bytes, two per character for ntext):
SELECT DATALENGTH(ntext_column)
To get all of the bytes out of SSMS, right-click the grid result and save the result to a file.
Can confirm. The actual limit is 43,679. I had a problem with a subscription service for a week. All the data looked good, but it still gave us an error that one of the fields had invalid values, even though it got correct values in. It turned out that the parameters were stored in NTEXT and it maxed out at 43,679 characters. And because we cannot change the database design, we had to make two different subscriptions for the same thing and put half of the entities in the other one.
In the question Update multiple rows in same query using PostgreSQL, Roman Peckar gave an answer similar to this; I have modified it for the purposes of my question:
update test as t set
column_a = c.column_a,
column_b = c.column_b
from (values
('123', bytea1),
('345', bytea2)
) as c(column_a, column_b)
where c.column_a = t.column_a;
In my case the table test has a column of type bytea, say column_b. However, this does not work, as c.column_b is of type text, and thus an error is produced saying there is no conversion from text to bytea, hinting to use a cast. Well, using a cast does not help either, as another error occurs about encoding, referring to a LATIN encoding. I apologise for the imprecise reporting of the errors, but I do not presently have access to the machine on which this work was carried out.
It seems that the default type of c.column_b is text. Can the type of a column be dictated in the 'as' clause, say 'as c(column_a, column_b type bytea)', or in some other way? If not, I assume I must resort to using some binary string function, which seems a bit inelegant to say the least.
Because the text type is for text. It needs properly encoded text in your client encoding which can be saved with no data loss in your server encoding (so, for example, in latin1 no “ or €, as characters like these cannot be saved using that encoding).
So if you need to save text which can contain characters outside of latin1 (like anything typed into a web form), you'd need to change the database encoding to UTF-8. Or, as a last resort, use encode(data, 'base64').
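That said, the type of a VALUES column can be pinned by casting the values in it (here the '\x...' byte literals are made-up examples):
update test as t set
    column_a = c.column_a,
    column_b = c.column_b
from (values
    ('123', '\x6162'::bytea),
    ('345', '\x6364'::bytea)
) as c(column_a, column_b)
where c.column_a = t.column_a;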
Does anyone know a good way to count characters in a text (nvarchar) column in SQL Server?
The values there can be text, symbols and/or numbers.
So far I have used sum(datalength(column))/2, but this only works for text. (It's a method based on DATALENGTH, and the result can vary from one type to another.)
You can find the number of characters using the system function LEN, e.g.:
SELECT LEN(Column) FROM TABLE
Use
SELECT LEN(yourfield) FROM table;
Use the LEN function:
Returns the number of characters of the specified string expression, excluding trailing blanks.
Doesn't SELECT LEN(column_name) work?
text doesn't work with the LEN function:
ntext, text, and image data types will be removed in a future version of Microsoft SQL Server. Avoid using these data types in new development work, and plan to modify applications that currently use them. Use nvarchar(max), varchar(max), and varbinary(max) instead. For more information, see Using Large-Value Data Types.
Source
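If you are stuck with a legacy text or ntext column, one workaround is to cast before counting (a sketch; legacy_text_col and mytable are placeholders):
SELECT LEN(CAST(legacy_text_col AS nvarchar(max))) FROM mytable;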
I had a similar problem recently, and here's what I did:
SELECT
    columnname AS 'Original_Value',
    LEN(LTRIM(columnname)) AS 'Orig_Val_Char_Count',
    N'[' + columnname + N']' AS 'UnicodeStr_Value',
    LEN(N'[' + columnname + N']') - 2 AS 'True_Char_Count'
FROM mytable
The first two columns show the original value and count its characters (minus leading/trailing spaces).
I needed to compare that with the true count of characters, which is why I used the second LEN function. It wraps the column value in brackets, forces the string to Unicode, and then counts the characters.
By using the brackets, you ensure that any leading or trailing spaces are also counted as characters; of course, you don't want to count the brackets themselves, so you subtract 2 at the end.
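A self-contained comparison of the approaches above (the sample value is made up):
DECLARE @v NVARCHAR(50) = N'  abc  '; -- two leading and two trailing spaces, 7 characters
SELECT LEN(@v) AS len_only,                        -- 5: trailing spaces ignored
       DATALENGTH(@v) / 2 AS datalength_half,      -- 7: two bytes per character
       LEN(N'[' + @v + N']') - 2 AS bracket_trick; -- 7: counts every character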