Why can't a Hive map key (i16) be cast upward when searching by key?

There is a Hive table that is defined by Thrift:
struct A {
1: optional map<i16, string> myMap;
}
I have tried a few queries to search through this table.
// this one gets an error:
select myMap[10] from A where myMap[10] is not null
Line 1:74 MAP key type does not match index expression type '10'
And the following two return the right results.
// this one works well
select myMap[10S] from A where myMap[10S] is not null
// this one works well
select myMap[10Y] from A where myMap[10Y] is not null
I know 10Y means tinyint and 10S means smallint, and i16 is smallint.
But why does Hive cast the smallint key down to tinyint instead of up to int?
I think this may cause information loss or numeric overflow.
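(I assume an explicit cast to the key type would also work, though I have not verified it:)
// a sketch: cast the int literal down to the map's key type (smallint) explicitly
select myMap[cast(10 as smallint)] from A where myMap[cast(10 as smallint)] is not null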


Selecting and comparing columns which contain constants

Follow-up to this question.
Say that on postgres, I have a table TABLE1 containing the columns id (integer), name (string):
create table table1 (id int primary key,name varchar(100));
insert into table1(id,name) values(5,'a');
insert into table1(id,name) values(6,'b');
insert into table1(id,name) values(7,'c');
insert into table1(id,name) values(55,'a');
And attempt to run the following queries:
with base (x) as (select 5 as x from table1)
select table1.name from base, table1 where table1.id = base.x;
with base (x) as (select 'a' as x from table1)
select table1.name from base, table1 where table1.name = base.x;
On sqlfiddle, the former yields a result, whilst the latter fails with the message:
ERROR: failed to find conversion function from unknown to text
On Postgres 13.3, which I have installed locally, however, neither errs. (Nor do similar queries on Oracle and SQLite.)
My first question is: does this error stem from an issue within sqlfiddle, or did it exist in earlier versions of Postgres?
And second, does this count as a bug? Generally, are constant columns (or values) in SQL assumed to be typeless, with any operations on them undefined unless there happens to be an implicit or explicit cast in place?
Per my understanding, using constant columns for joining is generally inadvisable as it thwarts indexing, but it seems a little odd in any programming language to have difficulties telling one constant format apart from another.
The cause is that a string literal in PostgreSQL is of type unknown, not text, because all data types have a string representation. Normally, the actual type is determined by context.
A number literal, however, has data type integer, bigint or numeric, based on its size and the presence of a fractional part:
SELECT pg_typeof('a');
pg_typeof
═══════════
unknown
(1 row)
SELECT pg_typeof(5);
pg_typeof
═══════════
integer
(1 row)
Now the subquery select 'a' as x from table1 has no context to determine a better type than unknown for the result column, which makes the comparison in the join condition fail.
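An explicit cast in the subquery supplies that context; a minimal sketch using the tables from the question:
with base (x) as (select 'a'::text as x from table1)
select table1.name from base, table1 where table1.name = base.x;
With 'a'::text the result column has a concrete type, so the join comparison is text = text and succeeds.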
Since this is strange and undesirable behavior, PostgreSQL v10 demoted unknown from a regular data type (you could even create columns of that type!) to a “pseudo-type”. It also coerces unknown to text in SELECT lists, which is why you cannot reproduce the error on v10 or later.

Postgres Jsonb datatype

I am using PostgreSQL to create a table based on JSON input given to my Java code, and I need validations on the JSON keys passed to the database, just like in Oracle. The problem here is that the whole JSON document lives in a single jsonb column, let's say data. Consider that I get JSON in the below format -
{
"CountActual": 1234,
"CountActualCharacters": "thisreallyworks!",
"Date": "09-11-2001"
}
The intended datatypes for the above JSON are number(10), varchar(50), and date.
Now, to add validations, I'm using check constraints.
Query 1 -
ALTER TABLE public."Detail"
ADD CONSTRAINT "CountActual"
CHECK ((data ->> 'CountActual')::bigint >=0 AND length(data ->> 'CountActual') <= 10);
--Working fine.
But for Query 2-
ALTER TABLE public."Detail"
ADD CONSTRAINT "CountActualCharacters"
CHECK ((data ->> 'CountActualCharacters')::varchar >=0 AND length(data ->> 'CountActualCharacters') <= 50);
I'm getting the below error -
[ERROR: operator does not exist: character varying >= integer
HINT: No operator matches the given name and argument type(s).
You might need to add explicit type casts.]
I also tried another way, like -
ALTER TABLE public."Detail"
ADD CONSTRAINT CountActualCharacters CHECK (length(data ->> 'CountActualCharacters'::VARCHAR)<=50)
The above constraint is created successfully, but I don't think this is the right way, as my validation does not work when inserting data -
Insert into public."Detail" values ('{
"CountActual": 1234,
"CountActualCharacters": 789,
"Date": "11-11-2009"
}');
And the insert succeeds when passing 789 for CountActualCharacters instead of a varchar like "the78isgood!".
So please, can anyone suggest the proper PostgreSQL constraint for varchar, just like the one for number that I wrote in Query 1?
And if possible, one for the date type with DD-MM-YYYY format as well.
I just started with PostgreSQL, so forgive me if I sound silly, but I'm really stuck here.
You can use jsonb_typeof(data -> 'CountActualCharacters') = 'string'.
Note the single arrow: ->> would convert anything to a string.
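Combined with your length check, the whole constraint might look like this (a sketch reusing the table and key names from your question):
ALTER TABLE public."Detail"
ADD CONSTRAINT "CountActualCharacters"
CHECK (jsonb_typeof(data -> 'CountActualCharacters') = 'string'
       AND length(data ->> 'CountActualCharacters') <= 50);
Now inserting 789 for CountActualCharacters violates the constraint, while "the78isgood!" passes.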
You can read more about JSON functions in PostgreSQL here:
https://www.postgresql.org/docs/current/static/functions-json.html

How can I query a table based on bits in a column?

I created this table:
CREATE TABLE [dbo].[Subject] (
[SubjectId] INT NOT NULL,
[Name] NVARCHAR (50) NOT NULL,
[Type] BINARY(10) NULL,
CONSTRAINT [PK_Subject] PRIMARY KEY CLUSTERED ([SubjectId] ASC)
);
My thinking is that the Type column would contain a binary string like this:
0000101010
1000001010
0000001011
1000000011
What I would then like to do is, for example, query this table for rows with a particular bit set, but I am not sure how to do this. For example, how could I query the data for rows where the fourth bit (from the right) is set to 1 and the second bit (from the right) is set to 1? That would return three rows from the data above.
I would not recommend bit fiddling unless you really know what you are doing.
Normally, tinyint flags are quite sufficient for most purposes. However, if you have a bunch of binary flags and are very concerned about space, then declare each of them independently:
Flag1 bit not null,
Flag2 bit not null,
. . .
This gives a name to each "bit" and lets the database manage the bit fiddling.
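For example, a sketch of that design with hypothetical flag names:
CREATE TABLE [dbo].[SubjectFlags] (
    [SubjectId] INT NOT NULL,
    [Flag1] BIT NOT NULL,
    [Flag2] BIT NOT NULL,
    [Flag3] BIT NOT NULL,
    [Flag4] BIT NOT NULL
);
-- rows where the second and fourth flags are both set
SELECT [SubjectId] FROM [dbo].[SubjectFlags] WHERE [Flag2] = 1 AND [Flag4] = 1;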
In any case, the answer to your specific question is the bit-wise operators, which are documented here.
You can query your data using bitwise operators in SQL.
See
https://msdn.microsoft.com/en-us/library/ms176122.aspx
Something like
SELECT * FROM [dbo].[Subject] WHERE [Type] & 0x08 > 0
You may have to experiment with the constant you AND against (0x08 here). But this is saying: get me all records where the fourth bit from the right is a 1. (The 0x prefix denotes hexadecimal; 0x08 is binary 1000, i.e. only the fourth bit set.)
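To match your example (second and fourth bits from the right both set), AND against a mask containing both bits and compare the result back to the mask. A sketch, assuming [Type] is stored in an integer column (e.g. INT) rather than BINARY(10):
-- 0x0A is binary 1010: bits 2 and 4 from the right
SELECT * FROM [dbo].[Subject] WHERE [Type] & 0x0A = 0x0A;
Against the sample data above, this would return the three matching rows.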

Define the datatype of a column that is really big in SQL Server

I have data greater than this number. If I attempt to add several of them, like:
1,22826520941614E+24 + 1,357898350941614E+34 + 1,228367878888764E+26
I get NULL as the result. How should I define the datatype for that kind of field?
I am using float, but it does not work.
If you're getting NULL back, it's not the data type. It's because you have a null value in one of the rows of data. NULL + anything is NULL.
Change your Sum() to include a WHERE YourNumericColumn IS NOT NULL, or use COALESCE().
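For example (YourTable is a placeholder name):
-- skip NULLs explicitly
SELECT SUM(YourNumericColumn) FROM YourTable WHERE YourNumericColumn IS NOT NULL;
-- or substitute zero for NULL
SELECT SUM(COALESCE(YourNumericColumn, 0)) FROM YourTable;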
A float is sufficiently large to contain data of that range. It can store binary floating-point values from -1.79E+308 to 1.79E+308. I suspect an error elsewhere in your statement.
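A quick sketch showing that float comfortably holds values of that magnitude (note the dot decimal separator in T-SQL):
DECLARE @x FLOAT = 1.357898350941614E+34;
SELECT @x + 1.22826520941614E+24; -- no overflow, well within float's range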

TSQL Arithmetic overflow using BIGINT

Can someone clarify for me why I get an error when I try to set the variable @a in the example below?
DECLARE @a BIGINT
SET @a = 7*11*13*17*19*23*29*31
/*
ERROR:
Msg 8115, Level 16, State 2, Line 1
Arithmetic overflow error converting expression to data type int.
*/
What I could figure out till now is that, internally, SQL Server starts doing the math by evaluating the multiplication, placing the temporary result into an INT, and then casting it to a BIGINT.
However, if I add a 1.0 * to my list of numbers, there is no error, hence I believe that this time SQL Server uses a wider type (float, or perhaps numeric) as the temporary result, then casts it to BIGINT.
DECLARE @b BIGINT
SET @b = 1.0 * 7*11*13*17*19*23*29*31
/*
NO ERROR
*/
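(To check what type SQL Server actually infers for the 1.0 literal, SQL_VARIANT_PROPERTY can report it:)
SELECT SQL_VARIANT_PROPERTY(1.0, 'BaseType');  -- numeric
SELECT SQL_VARIANT_PROPERTY(1.0, 'Precision'); -- 2
SELECT SQL_VARIANT_PROPERTY(1.0, 'Scale');     -- 1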
Frankly, I don't see anything wrong with the code... it's so simple...
[ I am using SQL 2008 ]
[EDIT]
Thanks Nathan for the link.
That's good information I didn't know about, but I still don't understand why I get the error and why I have to use "tricks" to get a simple script like this working.
Is it something that I should know how to deal with as a programmer?
Or is this a bug? If so, I will consider this question closed.
When you're doing calculations like this, the individual numbers are stored in a type just large enough to hold that number, i.e. numeric(1,0). Check this out:
Caution
When you use the +, -, *, /, or % arithmetic operators to perform implicit or explicit conversion of int, smallint, tinyint, or bigint constant values to the float, real, decimal or numeric data types, the rules that SQL Server applies when it calculates the data type and precision of the expression results differ depending on whether the query is autoparameterized or not.
Therefore, similar expressions in queries can sometimes produce different results. When a query is not autoparameterized, the constant value is first converted to numeric, whose precision is just large enough to hold the value of the constant, before converting to the specified data type. For example, the constant value 1 is converted to numeric (1, 0), and the constant value 250 is converted to numeric (3, 0).
When a query is autoparameterized, the constant value is always converted to numeric (10, 0) before converting to the final data type. When the / operator is involved, not only can the result type's precision differ among similar queries, but the result value can differ also. For example, the result value of an autoparameterized query that includes the expression SELECT CAST (1.0 / 7 AS float) will differ from the result value of the same query that is not autoparameterized, because the results of the autoparameterized query will be truncated to fit into the numeric (10, 0) data type. For more information about parameterized queries, see Simple Parameterization.
http://msdn.microsoft.com/en-us/library/ms187745.aspx
Edit
This isn't a bug in SQL Server. From that same page, it states:
The int data type is the primary integer data type in SQL Server.
and
SQL Server does not automatically promote other integer data types (tinyint, smallint, and int) to bigint.
This is defined behavior. As a programmer, if you have reason to believe that your data will overflow the data type, you need to take precautions to avoid that situation. In this case, simply converting one of those numbers to a BIGINT will solve the problem.
DECLARE @a BIGINT
SET @a = 7*11*13*17*19*23*29*CONVERT(BIGINT, 31)
In the first example, SQL Server multiplies a list of INTs together, discovers the result is too big for an INT, and raises the error. In the second example, it notices there is a decimal (numeric) literal, so it converts the INTs to numeric first and then does the multiplication.
Similarly, you can do this:
DECLARE @a BIGINT,
        @b BIGINT
SET @b = 1
SET @a = @b*7*11*13*17*19*23*29*31
This works fine because it notices there's a BIGINT, so it converts all the INTs to BIGINTs and then does the multiplication.
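For reference, the full product is 6,685,349,671, which exceeds the INT maximum of 2,147,483,647, so at least one factor has to be widened before the final multiplication. A leading CAST works just as well as the CONVERT shown above:
DECLARE @a BIGINT
SET @a = CAST(7 AS BIGINT) * 11 * 13 * 17 * 19 * 23 * 29 * 31
SELECT @a -- 6685349671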