I want to insert some data into Hive.
Due to some constraints I have to use INSERT; I cannot use LOAD DATA LOCAL ...
INSERT INTO <database>.<table> VALUES (a1,b1,c1,d1),(a2,b2,c2,d2) ...
The question is:
Some fields are very long strings, like article content. I have used Base64 to handle special characters, and now I am looking for the maximum string length that Hive SQL supports.
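For reference, a minimal sketch of the kind of insert I mean, against a hypothetical articles table (the table and column names are placeholders):

-- Hypothetical table; `content` holds the Base64-encoded article body.
CREATE TABLE IF NOT EXISTS articles (id STRING, content STRING);
-- Insert a Base64 payload and check how much was actually stored.
INSERT INTO TABLE articles VALUES ('a1', 'SGVsbG8sIHdvcmxkIQ==');
SELECT id, length(content) FROM articles;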
I have 100 billion records in a BigQuery table. There is an ID field created using a hash function, 64 characters long, like this: 1f5ec82dff18c01aac6f4c07feaedf5b4ad43fe8815a5da732d5fe445a788f59. In my BQ table the data type of ID is STRING, which at this scale seems extremely wasteful.
What should the data type of that column be, and how do I convert to it in BQ SQL?
I suggest you change the data type to BYTES.
BYTES also represents variable-length data but, as the name suggests,
it operates on raw bytes rather than Unicode characters. For that
reason, the STRING and BYTES types cannot be used interchangeably.
DECLARE variable1 BYTES(10);
SET variable1 = CAST('HeuTnDaw=q' AS BYTES);
SELECT variable1;
You can see more in the documentation.
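Since your IDs look like hex-encoded hashes, here is a sketch of the conversion itself, assuming a hypothetical table my_table with the 64-character hex column id; FROM_HEX packs each pair of hex digits into one byte, so the value shrinks from 64 characters to 32 bytes:

-- Assumes `id` contains only hex digits; FROM_HEX errors otherwise.
SELECT
  id,
  FROM_HEX(id) AS id_bytes,
  BYTE_LENGTH(FROM_HEX(id)) AS stored_bytes  -- 32 instead of 64
FROM my_table;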
I want to store a text of 70,000 characters, but a BigSQL Hadoop external table restricts the maximum field length to 32,762. I do not want to trim the text or split it into multiple columns. Is there any other data type that lets me load the full data?
You can use CLOB.
String types in DB2
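A minimal sketch of what that could look like (table and column names are hypothetical; DB2 supports CLOB sizes up to 2 GB):

-- CLOB holds character data well past the 32,762-byte limit.
CREATE TABLE documents (
  id   INT NOT NULL,
  body CLOB(100K)   -- comfortably fits a 70,000-character text
);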
As part of testing, I'm using the Cassandra Python driver to try to select and then delete all rows in one of the tables generated by the Cassandra stress tool (Standard1, in Keyspace1). Standard1 consists of several blob columns.
My approach is to extract the (primary) key of the rows and then run a loop that deletes the rows based on it.
The problem I'm facing is that the Cassandra driver seems to convert the blobs (hex bytes) to strings, so when I try to pass one to the delete statement it fails with "cannot parse 'XXXXXX' as hex bytes".
The data in the table in CQLSH looks like "0x303038333830343432", whereas the select below extracts the keys as, e.g., '069672027'.
Is there any way to prevent the hex bytes from being converted into strings? Is there any other approach I should be using?
Thanks!
from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement

# `session` is an open cassandra Session created elsewhere.
query = SimpleStatement('SELECT key FROM "Standard1" LIMIT 10',
                        consistency_level=ConsistencyLevel.LOCAL_QUORUM)
rows = session.execute(query)
for row in rows:
    query = SimpleStatement('DELETE FROM "Standard1" WHERE key = %s',
                            consistency_level=ConsistencyLevel.LOCAL_QUORUM)
    session.execute(query, (row.key,))
When using simple (unprepared) statements, you need to create a buffer from the string in order for the encoder to recognize it as a blob type.
http://datastax.github.io/python-driver/getting_started.html#type-conversions
Try this:
session.execute(query, (buffer(row.key),))
Alternatively, binding a prepared statement would do that implicitly.
A little late to the party, but you can also convert it directly by encoding it in hex and prepending 0x. For example:
'0x{}'.format(row.key.encode('hex'))
I'm working with a SQL Server database to store very long Unicode strings. The field is of type ntext, which theoretically should be limited to 2^30 Unicode characters.
From the MSDN documentation:
ntext
Variable-length Unicode data with a maximum string length of 2^30 - 1 (1,073,741,823) bytes. Storage size, in bytes, is two times the string length that is entered. The ISO synonym for ntext is national text.
I made this test:
Generate a 50,000-character string.
Run an UPDATE SQL statement:
UPDATE [table]
SET Response='... 50,000 character string...'
WHERE ID='593BCBC0-EC1E-4850-93B0-3A9A9EB83123'
Check the result: what is actually stored in the field at the end.
The result was that the field [Response] contains only 43,679 characters. All the characters at the end of the string were thrown away.
Why does this happen? How can I fix it?
If this really is the capacity limit of this data type (ntext), which other data type can store a longer Unicode string?
Based on what I've seen, you may just only be able to copy 43,679 characters. The field is storing all the characters; they're in the DB (check this with SELECT LEN(Response) FROM [table] WHERE ... to verify), and SSMS just has a problem copying more than that when you go to look at the full data.
The NTEXT data type is deprecated; you should use NVARCHAR(MAX).
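A sketch of that migration, reusing the table and column names from the question (ALTER COLUMN converts the existing data in place; try it on a copy first):

-- Convert the deprecated NTEXT column to NVARCHAR(MAX).
ALTER TABLE [table] ALTER COLUMN Response NVARCHAR(MAX);
-- Verify that all 50,000 characters survive the round trip.
SELECT LEN(Response) AS chars, DATALENGTH(Response) AS bytes
FROM [table]
WHERE ID = '593BCBC0-EC1E-4850-93B0-3A9A9EB83123';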
I see two possible explanations:
The ODBC driver you use to connect to the database truncates the parameter value when it is too long (try using SSMS instead).
You write that you generate your input string. I suspect you generate CHAR(0), which is the null character.
If the second case applies to you, make sure you cannot generate the \0 character.
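A quick check for that hypothesis, reusing the question's table and column (a sketch; the CAST is needed because CHARINDEX does not work on NTEXT directly, and the binary collation makes the search for the null character reliable):

-- Returns the position of the first embedded null character, or 0 if none.
SELECT CHARINDEX(NCHAR(0) COLLATE Latin1_General_BIN,
                 CAST(Response AS NVARCHAR(MAX)) COLLATE Latin1_General_BIN) AS first_null
FROM [table]
WHERE ID = '593BCBC0-EC1E-4850-93B0-3A9A9EB83123';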
EDIT:
I don't know how you check the length, but keep in mind that LEN does not count trailing spaces (the literals below end with five spaces):
SELECT LEN('aa     ') AS length            -- 2
      ,DATALENGTH('aa     ') AS datalength -- 7
The last possible explanation I see is that you do something like:
SELECT 'aa     aaaa'
-- shown in SSMS as `aa aaaa`: when you count the displayed text you lose all the repeated whitespace
Check whether the query below returns 100k (DATALENGTH counts all bytes, and 50,000 Unicode characters take 100,000 bytes):
SELECT DATALENGTH(ntext_column)
To get the full value out of SSMS, right-click on the grid result and click 'Save Results As...' to save it to a file.
Can confirm. The actual limit is 43,679. We had a problem with a subscription service for a week. All the data looked good, but it still gave us an error that one of the fields had invalid values, even though it got correct values in. It turned out that the parameters were stored in NTEXT and maxed out at 43,679 characters. And because we cannot change the database design, we had to create two different subscriptions for the same thing and put half of the entities in the other one.
I have 2 columns containing text: one will be at most 150 characters long and the other at most 700 characters long.
My question is: should I use varchar for both, or should I use text for the 700-character column? Why?
Thanks,
The varchar data type in MySQL < 5.0.3 cannot hold data longer than 255 characters, while in MySQL >= 5.0.3 it has a maximum of 65,535 characters.
So it depends on the platform you're targeting and your deployment requirements. If you want to be sure it will work on MySQL versions older than 5.0.3, go with a text type field for your longer column.
http://dev.mysql.com/doc/refman/5.0/en/char.html
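On any modern MySQL, both columns fit comfortably in VARCHAR; a minimal sketch (table and column names are hypothetical):

-- Both lengths are well under the VARCHAR limit on MySQL >= 5.0.3.
CREATE TABLE profiles (
  id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  title   VARCHAR(150) NOT NULL,  -- the 150-character column
  summary VARCHAR(700) NOT NULL   -- the 700-character column
);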
An important consideration is that with varchar the database stores the data directly in the table, and with text the database stores a pointer to a separate tablespace in which the data is stored. So, unless you run into the limit of a row length (64K in MySQL, 4-32K in DB2, 8K in SQL Server 2000) I would normally use varchar.
Not sure about MySQL specifically, but in MS SQL you should definitely use a VARCHAR for anything under 8,000 characters long if you want to be able to run any sort of comparison on the value in the field. For example, this would be possible with a VARCHAR:
select your_column from your_table
where your_column like '%dogs%'
but not with a TEXT field.
More information regarding the TEXT field in MySQL 5.4 can be found here, and more information about the VARCHAR field can be found here.
I'd pick the option based on the type of data being entered. For example, if it's a (potentially) super long username and a biography, I'd do varchar and text. Sure, you're limiting the bio to 700 chars, but what if you add HTML formatting down the road, keeping the 700 char limit but allowing HTML tags for formatting?
Twitter may use text for their tweets, since it could be quicker to add metadata to the posts (e.g. URL hrefs and #user PKs) to cache the additional data instead of calculating it on every render.