Storing and returning emojis - sql

What's the simplest way to write and then read emoji symbols in an Oracle table?
Currently I have this situation:
The iOS client passes percent-encoded emojis: One%20more%20time%20%F0%9F%98%81%F0%9F%98%94%F0%9F%98%8C%F0%9F%98%92. For example, %F0%9F%98%81 means 😁.
The column type is nvarchar2(2000), so when I view the saved text in Oracle SQL Developer it looks like: One more time ????????.

This seems more a client problem than a database problem. Certain iOS programs can interpret that string and show an image in its place.
SQL Developer does not do that.
As long as the data stored in the database is the same as the data retrieved from the database, you have no problem.
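To check the round trip outside the database, a minimal Python sketch (assuming the client sends UTF-8 percent-encoding, as the %F0%9F%98%81 example suggests) decodes the string from the question and counts its size in different encodings. The variable names are illustrative only.

```python
from urllib.parse import unquote

# The percent-encoded payload quoted in the question.
raw = "One%20more%20time%20%F0%9F%98%81%F0%9F%98%94%F0%9F%98%8C%F0%9F%98%92"

# unquote() decodes the percent escapes and interprets the bytes as UTF-8 by default.
text = unquote(raw)

# Each emoji here needs 4 bytes in UTF-8 and 2 code units in UTF-16.
utf8_bytes = text.encode("utf-8")
utf16_units = len(text.encode("utf-16-le")) // 2

# A character set without these characters can only substitute a placeholder.
lossy = text.encode("ascii", errors="replace").decode("ascii")

print(text)
print(len(text), len(utf8_bytes), utf16_units)
print(lossy)
```

If the bytes read back from the database decode to the same string, the stored data is intact and the question marks come from the viewing tool or the client character set.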

In the end, we went with Base64 encoding/decoding of the text. It is suitable for small texts.
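A rough sketch of that Base64 approach, assuming the text is encoded as UTF-8 before Base64 is applied (the answer does not say which encoding was used). The function names are hypothetical.

```python
import base64


def encode_for_storage(text: str) -> str:
    """Turn arbitrary Unicode text into plain ASCII that any column can hold."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_from_storage(stored: str) -> str:
    """Reverse encode_for_storage()."""
    return base64.b64decode(stored).decode("utf-8")


message = "One more time \U0001F601\U0001F614\U0001F60C\U0001F612"
stored = encode_for_storage(message)
print(stored)
print(decode_from_storage(stored) == message)
```

Base64 output is about a third larger than the input bytes, and the stored value can no longer be searched as text in SQL, which is probably why it only suits small texts.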

In MySQL the character set needs to be utf8mb4 to be able to save emojis (the older 3-byte utf8 character set cannot hold them); I assume Oracle would need the same ch

Related

HANA: Unknown Characters in Database column of datatype BLOB

I need help resolving characters of unknown type from a database field into a readable format, because I need to overwrite this value at database level with another valid value (in the exact format the application stores it in) to automate system copy activities.
I have a proprietary application that also allows users to configure it via the frontend. This configuration data gets stored in a table, and the values of a configuration property are stored in a column of type "BLOB". For the value in question, I provide a valid URL in the application frontend (like http://myserver:8080). However, what gets stored in the database is not readable (some square characters). I tried all sorts of HANA conversion functions (HEX, binary), both on their own and cascaded (e.g. first to binary, then to varchar), to make it readable. I also tried it the other way around, converting the value that I want to insert into the correct format (conversion to BLOB via hex or binary), but this does not work either. I copied the value to the clipboard and compared it to all sorts of character set tables (although I am not sure if this can work at all).
My conversion tries look somewhat like this:
SELECT TO_ALPHANUM('') FROM DUMMY;
where the quotes would contain the characters in question. I can't even print them here.
How can one approach this and maybe find out the character set that is used by this application? I would be grateful for some more ideas.
What you have in your BLOB column is a series of bytes. As you mentioned, these bytes have been written by an application that uses an unknown character set.
In order to interpret those bytes correctly, you need to know the character set, as this is literally the mapping of bytes to characters or character identifiers (e.g. code points in Unicode).
Now, HANA doesn't come with a whole lot of options to work on LOB data in the first place and for C(haracter)LOB data most manipulations implicitly perform a conversion to a string data type.
So, what I would recommend is to write a custom application that reads the BLOB bytes and performs the conversion itself. Once the data is successfully converted into a string, you can store it in a new NCLOB field that keeps it in UTF-8 encoding.
You will have to know the character set in the first place, though. No way around that.
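As an illustration of what such a custom application could do, here is a minimal Python sketch that dumps the bytes as hex and tries a list of candidate encodings. The sample bytes are made up (the URL from the question encoded as UTF-16LE, which is only a guess and not the application's known encoding), and `try_decodings` is a hypothetical helper.

```python
def try_decodings(raw: bytes, candidates):
    """Decode raw with each candidate encoding; None marks a failed attempt."""
    results = {}
    for name in candidates:
        try:
            results[name] = raw.decode(name)
        except (UnicodeDecodeError, LookupError):
            results[name] = None
    return results


# Stand-in for the bytes fetched from the BLOB column (assumption for the demo).
sample = "http://myserver:8080".encode("utf-16-le")

print(sample.hex(" "))  # a hex dump often hints at the encoding, e.g. 00 after every ASCII byte
decoded = try_decodings(sample, ["utf-8", "utf-16-le", "utf-16-be", "cp1252"])
for name, value in decoded.items():
    print(name, repr(value))
```

A decode that succeeds is not proof of the right character set; only the candidate that yields the expected URL is a plausible match.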
I assume you are on Oracle. You can convert BLOB to CLOB as described here.
http://www.dba-oracle.com/t_convert_blob_to_clob_script.htm
In case of your example try this query:
select UTL_RAW.CAST_TO_VARCHAR2(DBMS_LOB.SUBSTR(<your_blob_value>)) from dual;
Obviously this only works for values below 32767 characters.

sql server data length [duplicate]

What is the best way to store a large amount of text in a table in SQL server?
Is varchar(max) reliable?
In SQL 2005 and higher, VARCHAR(MAX) is indeed the preferred method. The TEXT type is still available, but primarily for backward compatibility with SQL 2000 and lower.
I like using VARCHAR(MAX) (or actually NVARCHAR) because it works like a standard VARCHAR field. Since its introduction, I use it rather than TEXT fields whenever possible.
Varchar(max) is available only in SQL 2005 or later. This will store up to 2GB and can be treated as a regular varchar. Before SQL 2005, use the "text" type.
According to the text found here, varbinary(max) is the way to go. You'll be able to store approximately 2GB of data.
Split the text into chunks that your database can actually handle. And, put the split up text in another table. Use the id from the text_chunk table as text_chunk_id in your original table. You might want another column in your table to keep text that fits within your largest text data type.
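CREATE TABLE text_chunk (
    id INT,              -- referenced as text_chunk_id from the original table
    chunk_sequence INT,  -- position of the chunk within the full text
    text VARCHAR(8000)   -- stand-in for your largest text type (BIGTEXT in the original sketch)
);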
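The splitting itself would happen in application code. A minimal Python sketch, assuming fixed-size chunks counted in characters and a 1-based chunk_sequence (both are assumptions, as are the function names):

```python
def split_into_chunks(text: str, chunk_size: int):
    """Return (chunk_sequence, chunk_text) pairs, numbered from 1."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    starts = range(0, len(text), chunk_size)
    return [(seq, text[i:i + chunk_size]) for seq, i in enumerate(starts, start=1)]


def reassemble(chunks):
    """Join chunks back together in chunk_sequence order."""
    return "".join(part for _, part in sorted(chunks))


chunks = split_into_chunks("abcdefghij", 4)
print(chunks)
print(reassemble(chunks))
```

Each pair would become one row in text_chunk, all sharing the same id.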
In a BLOB
BLOBs are very large variable binary or character data, typically documents (.txt, .doc) and pictures (.jpeg, .gif, .bmp), which can be stored in a database. In SQL Server, BLOBs can be of the text, ntext, or image data type. Here you can use the text type:
text
Variable-length non-Unicode data, stored in the code page of the server, with a maximum length of 2^31 - 1 (2,147,483,647) characters.
Depending on your situation, a design alternative to consider is saving the texts as .txt files on the server and storing the file paths in your database.
Use nvarchar(max) to store the whole chat conversation thread in a single record. Each individual text message (or block) is identified in the content text by inserting markers.
Example:
{{UserId: Date and time}}<Chat Text>.
At display time the UI should be intelligent enough to understand these markers and display them correctly. This way one record should suffice for a single conversation, as long as the size limit is not reached.
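A rough Python sketch of how a UI might split such a record back into messages. It assumes the marker is literally `{{user: timestamp}}`, that the user id contains no colon, and that chat text never contains `{{`; none of this is specified in the answer, and the names are hypothetical.

```python
import re

# {{UserId: Date and time}} ; the user id stops at the first colon.
MARKER = re.compile(r"\{\{([^:{}]+):\s*([^{}]*)\}\}")


def parse_thread(content: str):
    """Return (user_id, timestamp, chat_text) tuples in order of appearance."""
    matches = list(MARKER.finditer(content))
    messages = []
    for idx, m in enumerate(matches):
        # The chat text runs from the end of this marker to the start of the next one.
        end = matches[idx + 1].start() if idx + 1 < len(matches) else len(content)
        messages.append((m.group(1).strip(), m.group(2).strip(), content[m.end():end]))
    return messages


thread = "{{alice: 2024-01-05 10:15}}Hi there{{bob: 2024-01-05 10:16}}Hello!"
for user, when, chat in parse_thread(thread):
    print(user, when, chat)
```

In practice the chat text would need escaping so that a user typing `{{` cannot break the parsing.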

SQL Parse NVARCHAR Field

I am loading data from Excel files into a database on SQL Server 2008. There is one column of the nvarchar data type. This field contains data such as
Text text text text text text text text text text.
(ABC-2010-4091, ABC-2011-0586, ABC-2011-0587, ABC-2011-0604)
Text text text text text text text text text text.
(ABC-2011-0562, ABC-2011-0570, ABC-2011-0575, ABC-2011-0588)
So it's text with many sentences of this kind.
For each row I need to get the ABC-####-#### values; more precisely, I only need the last part. So e.g. for ABC-2010-4091 I need to obtain 4091. I will need to join this number to another table. I guess it would be enough to get the last parts of the ABC-####-#### format; then I should be able to handle the rest.
So for the example given above, the result should be 4091, 0586, 0587, 0604, 0562, 0570, 0575, 0588 in the row instead of the whole nvarchar field value.
Is this possible somehow? The text in the nvarchar field differs, but the format (ABC-####-####) I want to work with is always the same. Only the number of characters in the last part may vary, so it's not always 4 digits but could be 5 or more.
What is the best approach to get this data? Should I parse it in SSIS or on the SQL Server side with a SQL query? And how?
I am aware this is a tough task. I appreciate any help or advice on how to deal with it. I have not tried anything yet as I do not know where to start. I have read articles about SQL parsing, but I want to ask for the best approach to this task.
Stackoverflow is about programming.
Sit down and start programming.
Ok, seriously. That is string parsing and the last part in brackets with multiple fields means no bulk import, it is not a standard CSV file.
Either you use SSIS in SQL Server and program the parsing there or.... you write a program for that.
String manipulation in SQL is the worst part of the language and I would avoid it.
So, yes, sit down and program a routine. Probably the fastest way.
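Such a routine can be quite short. A minimal Python sketch, assuming the codes always look like ABC, a four-digit year, and then a run of digits of any length (the function and pattern names are illustrative):

```python
import re

# Capture only the trailing number of each ABC-####-#### code.
CODE = re.compile(r"\bABC-\d{4}-(\d+)\b")


def extract_suffixes(text: str):
    """Return the last part of every ABC code, as strings so leading zeros survive."""
    return CODE.findall(text)


cell = (
    "Text text text text text text text text text text.\n"
    "(ABC-2010-4091, ABC-2011-0586, ABC-2011-0587, ABC-2011-0604)\n"
    "Text text text text text text text text text text.\n"
    "(ABC-2011-0562, ABC-2011-0570, ABC-2011-0575, ABC-2011-0588)"
)
print(", ".join(extract_suffixes(cell)))
```

Keeping the results as strings matters if the table being joined stores values such as 0586 with the leading zero.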
If I understand correctly, "ABC-####-####" will be the value coming through in the column and the numeric part is variable in length.
If that is the case, maybe this will work.
Use a "Derived Column" transformation.
Let's say the column holding "ABC-####-####" is called Column1:
SUBSTRING(Column1, FINDSTRING(Column1, "-", 2) + 1, LEN(Column1) - FINDSTRING(Column1, "-", 2))
If I am not mistaken, that should give you the last # values in a new column no matter how long that value is.
HTH
I have worked this problem out with the following guides:
Split Multi Value Column into Multiple Records &
Remove Multiple Spaces with Only One Space

What data type to use for variable length data (for performance)?

What data type should I use for data that can be very short, eg. html link (think twitter), or very long eg. html blog post (think wordpress).
I am thinking that if I use varchar(4000), it may be too short for an HTML-formatted blog entry. But if I use text, will it take up more space and be less efficient?
[update]
I am still considering using MySQL (if PHP 5.3/Zend Framework) or MSSQL (if ASP.NET MVC 2)
MySQL also has a Text data type for storing an arbitrarily large amount of text. You can find more here: The BLOB and TEXT Types
If you are using Microsoft SQL Server 2008 you can use varchar(max).
Edit:
Text is also available but isn't searchable without text indexing.

What MySQL datatype & attributes should be used to store large amounts of html formatted data?

I'm setting up a database using PHPMyAdmin and many fields will be large chunks of HTML.
What MySQL datatype & attributes should be used for fields that store large amounts of HTML data?
TEXT, MEDIUMTEXT, or LONGTEXT
I would recommend against storing large chunks of HTML (or other text data) in a database. I find that it's often far more useful to store files as files, and put a filename in the database instead.
On the other hand, if you're doing everything through phpMyAdmin, you may not have that option available to you.
You really really should start with the documentation, then if you have questions based on the data types you find there, try to ask for some clarification. But it really helps to understand what the datatypes are before asking the question: Documentation here:
http://dev.mysql.com/doc/refman/5.4/en/data-types.html
That said, take a closer look at text and blob. Text will store a large body of textual information (probably a good choice) where blob is designed for binary data. This does make a difference based on the query functions and what data types they operate on.
I think you can store HTML in a simple TEXT field. If your HTML is more than 64KB, you can use MEDIUMTEXT instead.
See also Storage Requirements for String Types for more details about the maximum length of a stored value.
Also remember that characters in Unicode can require more than 1 byte to store.
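Because the limits are counted in bytes, it can help to measure the encoded size before picking a type. A minimal Python sketch, assuming the column stores UTF-8 (utf8mb4 in MySQL) and using the documented maximum byte lengths of the MySQL TEXT family; the helper name is hypothetical.

```python
# Maximum stored length in bytes for each MySQL TEXT type.
TEXT_LIMITS = [
    ("TINYTEXT", 255),
    ("TEXT", 65_535),
    ("MEDIUMTEXT", 16_777_215),
    ("LONGTEXT", 4_294_967_295),
]


def smallest_text_type(html: str, encoding: str = "utf-8") -> str:
    """Pick the smallest TEXT type whose byte limit fits the encoded HTML."""
    size = len(html.encode(encoding))
    for name, limit in TEXT_LIMITS:
        if size <= limit:
            return name
    raise ValueError("value does not fit in any TEXT type")


print(smallest_text_type("<p>hi</p>"))
print(smallest_text_type("a" * 65_536))
print(smallest_text_type("\u00e9" * 40_000))  # 40,000 characters but 80,000 bytes
```

The last call shows why a character count alone is misleading: 40,000 two-byte characters already exceed what a plain TEXT column can hold.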