How to convert string with German characters to Blob in Firebird?

I want to convert a string into a blob with the f_strblob(CSTRING) function of FreeAdhocUDF, but I cannot find a way to get special characters like ß or ä to show up in the blob.
The result of f_strblob('Gemäß') is Gem..
I tried changing the character set of my variables to UTF8, but that does not help.
Is there some escaping option that I have missed?

You don't need that function; the FreeAdhocUDF documentation also marks it as obsolete for that reason.
In a lot of situations, Firebird will automatically convert string literals to blobs (e.g. in statements where a string literal is assigned to a blob column), and otherwise you can explicitly cast using cast('your string' as blob sub_type text).
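As a minimal sketch of the parameterized route, assuming the Python fdb driver and a hypothetical table T with a blob sub_type text column BLOB_COL (connection details and names are placeholders):

import fdb

# charset='UTF8' makes the driver send string parameters as UTF-8,
# so characters like ß and ä survive the round trip
con = fdb.connect(dsn='localhost:/data/test.fdb', user='sysdba',
                  password='masterkey', charset='UTF8')
cur = con.cursor()
# Firebird converts the string parameter to the blob column automatically;
# no f_strblob() call is needed
cur.execute("insert into T (BLOB_COL) values (?)", ('Gemäß',))
con.commit()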

Related

Azure Data Factory: Reading doubles with a comma as decimal separator instead of a dot

I'm currently trying to read from CSV files (separated with semicolon ';') containing decimal numbers formatted with a comma (,) as the decimal separator instead of a dot (.).
That is, the number 12356.12 is stored as 12356,12.
In the source's projection, what would be the correct format to read the value correctly?
The format should be in Java DecimalFormat.
If your CSV file's columnDelimiter is a comma (','), your first concern is how to keep your number data from being split into different columns. Since your number data is stored as 12356,12, my suggestions are as follows:
1. Change the columnDelimiter to | or another special character.
2. Set the escape character.
In addition, 12356,12 can't be identified as Decimal format in ADF automatically, and there is no mechanism to turn , into .. So I think you need to transfer the data as a string temporarily, then convert it into Decimal in your destination with Java code.
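For illustration, the kind of conversion meant here, shown as a Python sketch rather than Java (file name and column layout are made up):

import csv
from decimal import Decimal

# Read the semicolon-delimited file and normalize the decimal separator
with open('input.csv', newline='', encoding='utf-8') as f:
    for row in csv.reader(f, delimiter=';'):
        raw = row[0]                            # e.g. '12356,12'
        value = Decimal(raw.replace(',', '.'))  # -> Decimal('12356.12')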
True answer is in the comments: in the copy job the culture can be defined, which influences the decimal separator. Go to "Mapping" > "Type conversion settings" > "Culture" and choose en-us, de-de or whatever works for you. Be aware that this will also influence other types, like dates.

Is there an EBCDIC_STR function on IBMi

I'm running into issues with character encoding, and I found the functions EBCDIC_STR and ASCII_STR in Db2 for z/OS. Is there a similar function for Db2 for IBM i?
Starting with v7.2, there is a similar function in DB2 for i: CHAR. It is not an exact replacement, though. While EBCDIC_STR returns a string in the system EBCDIC CCSID and provides a UTF-16 encoding for unknown characters, CHAR takes a string and converts it to a provided CCSID. CHAR has no defined behavior for characters that cannot be converted to the new CCSID.
I believe you will have to use a CAST specification in your SQL statement, specifying in it the desired CCSID, rather than using a built-in function.
This documentation page gives the syntax of a CAST specification, but it does not have a precisely relevant example. The DB2 for z/OS CAST page gives an example that should work the same on IBM i:
CAST(MYDATA AS CHAR(10) CCSID 367)
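A sketch of using that CAST specification from Python via the ibm_db driver (the connection string, table and column names are placeholders):

import ibm_db

# Placeholder connection string; adjust for your IBM i system
conn = ibm_db.connect('DATABASE=MYDB;HOSTNAME=myibmi;PORT=446;'
                      'UID=user;PWD=secret', '', '')
# CCSID 367 is 7-bit ASCII; the CAST converts MYDATA on the way out
stmt = ibm_db.exec_immediate(
    conn, "SELECT CAST(MYDATA AS CHAR(10) CCSID 367) AS MYDATA FROM MYTABLE")
row = ibm_db.fetch_assoc(stmt)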

how to render a dicom file's header unreadable

Kind of a strange question, but I'm doing some testing to handle errors when a DICOM file's tags can't be read.
Unfortunately I don't have a damaged DICOM file available.
Specifically, can anyone advise how to apply some sort of incorrectly encoded text tag or some invalid numeric data tag onto the file, such that it can't be read by python's pydicom package?
You could have a look at the dcmodify tool from the DCMTK. It can be used to insert, modify and delete attributes. I doubt that it is possible to specify invalid attribute values through the command line, but you could surely modify the source code to accomplish that (one exception: you can definitely write attribute values that exceed the maximum length allowed by the Value Representation).
My approach would be to create a buffer of characters, write binary data to it, and then pass it to the method that writes the value to the attribute (see the sketch after the list below).
Examples:
write UTF-8 byte sequences that do not form a valid Unicode character
write ASCII characters which are not covered by the character set specified by (0008,0005) - not sure whether pydicom would run into problems, but it would be wrong from the DICOM perspective
write non-numeric characters to attributes with Value Representation "Decimal String" or "Integer String".
formats other than YYYYMMDD for VR "Date"
formats other than HHMMSS.FFFFFF for VR "Time"
characters other than '0'-'9' and '.' for VR "Unique Identifier"
[edit]: DCMTK, dcmodify: http://dicom.offis.de/dcmtk.php.en
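In the same spirit as the buffer idea above, a minimal Python sketch that corrupts raw bytes of a known-good file so pydicom can no longer parse the header (file names are placeholders; pydicom's bundled CT_small.dcm sample is used as the starting point):

import pydicom
from pydicom.data import get_testdata_file

# Start from a valid sample file shipped with pydicom
src = get_testdata_file('CT_small.dcm')
raw = bytearray(open(src, 'rb').read())

# Bytes 0-127 are the preamble, followed by the 'DICM' magic;
# garbling the bytes just after that corrupts the file meta information
for i in range(140, 160):
    raw[i] = 0xFF

with open('broken.dcm', 'wb') as f:
    f.write(raw)

try:
    pydicom.dcmread('broken.dcm')
except Exception as exc:  # e.g. InvalidDicomError or a parse error
    print('unreadable:', exc)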

Inserting string as regular string in mongodb

The pymongo documentation says that BSON strings are UTF-8 encoded, so PyMongo must ensure that any strings it stores contain only valid UTF-8 data. Unicode strings (<type 'unicode'>) are encoded UTF-8 first. The reason our example string is represented in the Python shell as u'Mike' instead of 'Mike' is that PyMongo decodes each BSON string to a Python unicode string, not a regular str.
So I understand that to get rid of the Unicode literal 'u', I will have to call json.dumps() on the document returned by the query.
The documentation also says that regular strings (<type 'str'>) are validated and stored unaltered, so I assumed that the query would also return them as regular strings and not Unicode strings.
I created a dictionary with regular string values and inserted it into the DB, but when I retrieve it, I get the strings back as Unicode. Any idea how I can avoid that? The purpose is to avoid calling json.dumps() on the query result: I need to fetch a large number of documents from the DB, and json.dumps() is taking quite some time. The strings that I am storing contain only ASCII data, so I don't need Unicode strings.
The assumption that a regular string is returned back as a regular string was not correct. It is stored unaltered and not encoded to UTF-8, because it is already valid UTF-8; but when the query result is decoded, every string is converted back to Unicode.
Source:
Automatic string to unicode object conversion
How can I get pymongo to always return str and not unicode?
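A minimal Python 2-era sketch of the behavior described above, plus a workaround that skips json.dumps() for pure-ASCII data (database, collection and field names are made up):

from pymongo import MongoClient

coll = MongoClient().testdb.docs
coll.insert_one({'name': 'Mike'})   # value goes in as <type 'str'>

doc = coll.find_one({'name': 'Mike'})
print(type(doc['name']))            # <type 'unicode'> comes back out

# For pure-ASCII data, encode each value back to str explicitly
doc = {k: v.encode('ascii') if isinstance(v, unicode) else v
       for k, v in doc.items()}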

Convert text with HTML character encoding to database characterset

Our application receives data from various sources. Some of these contain HTML character references instead of regular characters, so instead of the string "â" we receive the string "&acirc;".
How can we convert "&acirc;" to a character in the database character set using SQL/PLSQL?
Our database is Oracle 10gR2.
UNESCAPE_REFERENCE and ESCAPE_REFERENCE are, I believe, what you're looking for:
UTL_I18N.UNESCAPE_REFERENCE('hello &lt; &aring;')
This returns 'hello <'||chr(229).
http://docs.oracle.com/cd/B28359_01/appdev.111/b28419/u_i18n.htm#i998992
You can use the CHR() function to convert an ASCII character number to its character representation.
SELECT chr(226)
FROM dual;
CHR(226)
--------
â
For more information see: http://www.techonthenet.com/oracle/functions/chr.php
Hope it helps...
One solution:
replace(your_text, '&acirc;', chr(226))
but you'd have to nest many replace functions, one for each entity you need to replace. This might be very slow if you have to replace many.
You could write your own function, searching for the ampersand and replacing each entity when found.
Have you searched the Oracle Supplied Packages manual? I know they have a function that does the opposite for a few entities.
To convert a column in Oracle which contains HTML entities to plain text, you could use:
trim(regexp_replace(UTL_I18N.unescape_reference(column_name), '<[^>]+>'))
It will unescape HTML character references as stated above, but will also remove HTML tags and strip leading and trailing spaces.
I hope it will help someone.
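As an aside, if the unescaping can happen outside the database instead, Python's standard library performs the same conversion; a minimal sketch:

import html

# Same example as the UTL_I18N answer above
print(html.unescape('hello &lt; &aring;'))  # -> 'hello < å'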