Postgres' text column doesn't like my zlib compressed data

Is there a better data type to use for storing a zlib-compressed string in PostgreSQL?

Use bytea: "The bytea data type allows storage of binary strings."

Use bytea. zlib-compressed data is not text.
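Here is why the text column fails, as a minimal Python sketch. It assumes a UTF8-encoded database, where a text value must be valid UTF-8 and cannot contain a NUL byte, while zlib output is arbitrary binary. The helpers `fits_in_text_column` and `to_bytea_hex_literal` are hypothetical names for illustration. The second one renders bytes in bytea's hex format: a backslash, an `x`, then two hex digits per byte.

```python
import zlib


def fits_in_text_column(data: bytes) -> bool:
    """Rough check, assuming a UTF8 server encoding, of whether bytes could be stored as text."""
    if b"\x00" in data:
        # PostgreSQL character types cannot store the NUL character.
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def to_bytea_hex_literal(data: bytes) -> str:
    """Render bytes in PostgreSQL's bytea hex format (backslash, 'x', hex digits)."""
    return "\\x" + data.hex()


original = b"some long, repetitive payload " * 10
compressed = zlib.compress(original)

print(fits_in_text_column(original))         # True: plain ASCII text
print(fits_in_text_column(compressed))       # False: zlib output is arbitrary binary
print(to_bytea_hex_literal(compressed)[:6])  # \x789c, the default zlib header
assert zlib.decompress(compressed) == original
```

With a real driver you would normally pass the bytes as a query parameter (psycopg2, for example, adapts Python `bytes` to bytea) rather than building a literal by hand.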

Related

Unknown postgis gps data format

A third-party program stores tracking data in the database, but I don't understand the format. I know that PostGIS is used there and that this column should contain GPS location(s) and maybe additional data.
Example (DB dump as CSV):
"Location","DateTime"
"010100000023E37C4023E33C40417F41EF407F4740","2020-05-24 15:33:53+00"
How can I decode the Location column data?
This is the Well-Known Binary (WKB) format, hex-encoded.
See the PostGIS functions for WKB: ST_AsBinary, ST_GeomFromWKB.
WKT functions: ST_AsText, ST_GeomFromText.
The example in WKT format: POINT(28.887256651392033 46.99416914651966).
For .NET, you can use Geo or NetTopologySuite.IO.TinyWKB.
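If you want to see what those bytes mean, here is a minimal Python sketch that decodes this particular value by hand with the standard `struct` module. It assumes plain OGC WKB for a 2D point. PostGIS EWKB with an embedded SRID (type flag 0x20000000) or with Z/M coordinates is deliberately rejected. `decode_wkb_point` is a hypothetical helper, not a PostGIS or library API.

```python
import struct
from typing import Tuple

WKB_POINT = 1  # OGC geometry type code for a 2D Point


def decode_wkb_point(hex_wkb: str) -> Tuple[float, float]:
    """Decode a hex-encoded, plain (non-EWKB) 2D WKB Point into (x, y)."""
    raw = bytes.fromhex(hex_wkb)
    # Byte 0 is the byte order: 1 = little-endian (NDR), 0 = big-endian (XDR).
    if raw[0] == 1:
        order = "<"
    elif raw[0] == 0:
        order = ">"
    else:
        raise ValueError(f"unknown byte-order flag {raw[0]}")
    # Bytes 1-4: unsigned 32-bit geometry type.
    (geom_type,) = struct.unpack(order + "I", raw[1:5])
    if geom_type != WKB_POINT:
        raise ValueError(f"expected a plain Point (type 1), got type {geom_type:#x}")
    # Bytes 5-20: two IEEE 754 doubles, X then Y.
    x, y = struct.unpack(order + "dd", raw[5:21])
    return x, y


x, y = decode_wkb_point("010100000023E37C4023E33C40417F41EF407F4740")
print(f"POINT({x} {y})")
```

The decoded coordinates should match the WKT point given above. Inside the database, ST_AsText or ST_X/ST_Y is the simpler route.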

What is the limit of BINARY data types in Hive 1.2?

I did not find much about the BINARY data type in the Apache docs: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types
I created a table with a BINARY column using:
create table table1(col1 binary);
After fetching the metadata via JDBC, I found:
columnSize:2147483647
Is there any official document for this?
From the Binary DataType Proposal:
How is 'binary' represented internally in Hive
Binary type in Hive will map to 'binary' data type in thrift.
Primitive java object for 'binary' type is ByteArrayRef
PrimitiveWritableObject for 'binary' type is BytesWritable
Since ByteArrayRef holds a reference to a Java byte array, the limit is bounded by the maximum Java array size, so the answer should be Integer.MAX_VALUE - 5 bytes.
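As a quick sanity check on those numbers, here is a minimal Python sketch. The columnSize reported over JDBC is exactly Java's Integer.MAX_VALUE (2^31 - 1). The `- 5` margin is taken from the answer above and is not independently verified here.

```python
# Integer.MAX_VALUE in Java is 2**31 - 1.
JAVA_INT_MAX = 2**31 - 1

# columnSize reported by Hive's JDBC metadata for the BINARY column (from the question).
reported_column_size = 2147483647
assert reported_column_size == JAVA_INT_MAX

# Practical limit suggested in the answer: a Java byte array of Integer.MAX_VALUE - 5 elements.
practical_limit_bytes = JAVA_INT_MAX - 5
print(practical_limit_bytes)              # 2147483642
print(practical_limit_bytes / 2**30 < 2)  # True: just under 2 GiB
```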

Parquet Binary Data type

I have a question regarding the Binary data type. I am trying to write a Parquet schema for my MR job so that it creates the Parquet file itself, instead of having Hive or Impala create one. I see some references to a Binary type, which I do not see in Parquet.
Is binary an alias for BYTE_ARRAY?
Also, is UTF-8 the default encoding for Binary data types?
Raw bytes are stored in Parquet either as a fixed-length byte array (FIXED_LEN_BYTE_ARRAY) or as a variable-length byte array (BYTE_ARRAY, also called binary). Fixed is used when you have values with a constant size, like a SHA1 hash value. Most of the time, the variable-length version is used.
Strings are encoded as variable-length binary with the UTF8 type annotation, which indicates how to interpret the raw bytes back into a String. UTF8 is the only string encoding supported by the format, but not every binary field uses UTF8, because not all binary fields store string data.
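To make the fixed-versus-variable distinction concrete, here is a minimal Python sketch. `choose_binary_physical_type` is a hypothetical, illustrative heuristic, not part of any Parquet library. It returns FIXED_LEN_BYTE_ARRAY only when every value has the same length, as SHA-1 digests (always 20 bytes) do. Strings become raw bytes via UTF-8, which is what the UTF8 annotation tells a reader to undo.

```python
import hashlib
from typing import Iterable


def choose_binary_physical_type(values: Iterable[bytes]) -> str:
    """Illustrative heuristic (hypothetical, not a Parquet API) for raw byte values."""
    lengths = {len(v) for v in values}
    if len(lengths) == 1:
        # Every value has the same size, e.g. SHA-1 digests.
        return f"FIXED_LEN_BYTE_ARRAY({lengths.pop()})"
    # Otherwise fall back to variable-length binary.
    return "BYTE_ARRAY"


names = ["Ann", "Bartholomew", "Zoë"]
# Strings stored as binary: the UTF8 annotation says to decode these bytes as UTF-8.
encoded_names = [n.encode("utf-8") for n in names]
digests = [hashlib.sha1(n.encode("utf-8")).digest() for n in names]

print(choose_binary_physical_type(encoded_names))  # BYTE_ARRAY
print(choose_binary_physical_type(digests))        # FIXED_LEN_BYTE_ARRAY(20)
assert [b.decode("utf-8") for b in encoded_names] == names
```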
There is no data type called BYTE_ARRAY in parquet-column.
I looked at PrimitiveType in the latest package but could not find it.
I could not write a byte[] as binary either.

Convert sql binary (16) to utf-8 byte in .net

I have values stored in a SQL database with the data type binary(16); they come into the .NET application (using Entity Framework) as type System.Data.Linq.Binary. I'd like to convert this binary representation of my data to a byte[] without losing any data, preferably using UTF-8 encoding. Is this not built into the .NET Framework? Must I convert it to some intermediary data type first before I can get my byte array?
Binary has nothing to do with UTF-8.
binary(16) means 16 bytes of binary data; no text encoding is involved. Use the Binary.ToArray method to get a byte array.
https://msdn.microsoft.com/en-us/library/system.data.linq.binary.toarray
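To illustrate why UTF-8 should stay out of the picture, here is a small Python analogy (not .NET code). The 16 sample bytes are made up to stand in for a binary(16) value. Copying the bytes is lossless, while decoding them as UTF-8 and re-encoding replaces invalid sequences with U+FFFD and corrupts the data.

```python
# 16 arbitrary bytes standing in for a binary(16) value (hypothetical sample data).
raw = bytes(range(0xF0, 0x100))
assert len(raw) == 16

# Copying the bytes directly (conceptually what Binary.ToArray gives you) is lossless.
copied = bytes(bytearray(raw))
print(copied == raw)  # True

# Round-tripping through UTF-8 text is not: invalid sequences become U+FFFD.
as_text = raw.decode("utf-8", errors="replace")
print(as_text.encode("utf-8") == raw)  # False
```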

How much difference does BLOB or TEXT make in comparison with VARCHAR()?

If I don't know the length of a text entry (e.g. a blog post, description or other long text), what's the best way to store it in MySQL?
TEXT would be the most appropriate type for text of unknown size. From MySQL 5.0.3 on, VARCHAR can be declared with a length of up to 65,535 (255 characters in previous versions), but its effective maximum is also limited by the 65,535-byte maximum row size, which is shared among all columns, and by the character set in use. So if you can safely assume your text will fit, VARCHAR will be a better choice.
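Because that limit is counted in bytes, the number of characters a VARCHAR can hold depends on the character set. This minimal Python sketch computes a rough upper bound for a single-column table. It deliberately ignores the 1- or 2-byte length prefix and any other columns, so real limits are slightly lower. `varchar_char_upper_bound` is a hypothetical helper.

```python
MAX_ROW_BYTES = 65_535  # MySQL's maximum row size, shared by all columns


def varchar_char_upper_bound(max_bytes_per_char: int) -> int:
    """Rough upper bound on VARCHAR length in characters (ignores length prefix and other columns)."""
    return MAX_ROW_BYTES // max_bytes_per_char


for charset, width in [("latin1", 1), ("utf8mb3", 3), ("utf8mb4", 4)]:
    print(charset, varchar_char_upper_bound(width))
# latin1 65535
# utf8mb3 21845
# utf8mb4 16383
```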
BLOB is for binary data, so unless you expect your text to be in binary format it is the least suitable column type.
For more information, refer to the MySQL documentation on string column types.
Use TEXT if you want it treated as a character string, with a character set.
Use BLOB if you want it treated as a binary string, without a character set.
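One practical consequence is sketched here as a Python analogy rather than MySQL code. BLOB values are compared and sorted by their numeric byte values, while TEXT values follow the collation of their character set. The case-insensitive collation below is an assumption, approximated with `str.casefold`; real MySQL collations have more rules.

```python
words = ["b", "B", "a"]

# Binary string (BLOB-like): compared byte by byte, so uppercase sorts before lowercase.
binary_order = sorted(words, key=lambda w: w.encode("utf-8"))

# Character string (TEXT-like) under an assumed case-insensitive collation,
# approximated with str.casefold; ties keep their original order.
text_order = sorted(words, key=str.casefold)

print(binary_order)  # ['B', 'a', 'b']
print(text_order)    # ['a', 'b', 'B']
```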
I recommend using TEXT.