Hive query gives wrong results for IS NOT NULL with many OR conditions - hive

I need to exclude all rows that have a null in any of a few specified columns of a Hive managed table.
When I use "col is not null" or "not isdbnull(col)" with one or two columns it works fine. But I need to check many columns, and when I add more OR conditions to the query, it ignores the null condition and returns all rows.
I tried to understand the cause and came to this conclusion: the select result is right only if all the columns are null at the same time. If any one of the isdbnull(col) conditions fails, the row is included, even though it still has nulls in other columns named in the OR conditions.
Any clue is much appreciated.

You mentioned you used "or" instead of "and" in your query. With A meaning "col1 is null" and B meaning "col2 is null", you wrote "(not A) or (not B)", which is equivalent to "not (A and B)". That excludes a row only when both columns are null. What you want is "not (A or B)", which is the same as "(not A) and (not B)", and that is how I wrote the query below. See De Morgan's laws for a further explanation.
If you want to select all rows that have non nulls then do this:
select col1, col2, col3 from table
where col1 is not null and col2 is not null and col3 is not null;
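The difference between the OR and AND forms can be checked on a tiny table. This is only a sketch: it runs the two predicates in SQLite through Python's sqlite3 module rather than in Hive, and the table `t` and its three rows are made up for illustration. The NULL logic shown is standard SQL, so the same result is expected in Hive.

```python
import sqlite3

# Hypothetical three-column table; SQLite stands in for Hive here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 TEXT, col2 TEXT, col3 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("a", "b", "c"),     # no nulls
        ("a", None, "c"),    # one null
        (None, None, None),  # all nulls
    ],
)

# OR form: a row is dropped only when every listed column is null.
or_rows = conn.execute(
    "SELECT col1, col2, col3 FROM t "
    "WHERE col1 IS NOT NULL OR col2 IS NOT NULL OR col3 IS NOT NULL"
).fetchall()

# AND form: a row is dropped as soon as any listed column is null.
and_rows = conn.execute(
    "SELECT col1, col2, col3 FROM t "
    "WHERE col1 IS NOT NULL AND col2 IS NOT NULL AND col3 IS NOT NULL"
).fetchall()

print(len(or_rows))   # the row with a single null is still included
print(and_rows)       # only the fully populated row remains
```

The OR query keeps the row with one null, which matches the behaviour described in the question.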
Additionally, if you consider an empty string to be a null value, then you can do:
Select col1 .... where col1 != '';
I have seen people also do:
Select col1 .... where length(col1) > 0;
How does Hive understand nulls? An empty string is interpreted as empty by Hive, not as NULL. An empty string could have a different meaning to an application than a NULL, so they are interpreted differently.
When you load data, missing values are by default represented by the special value NULL. To import data with NULL fields, check the documentation of the SerDe used by the table. The default text format uses LazySimpleSerDe, which interprets the string \N as NULL when importing. This means you should use \N to represent nulls when loading data into Hive.
You can modify this ("serialization.null.format"="") when creating a table to let Hive know that some other value represents null. In the case here you can see it was set to "" for nulls.
Good luck!

Related

Need to replace all the null values with '0' in Vertica

I have a Vertica table, "CUSTOMER", which contains around 10 columns. Each column contains a few null values, so I have to write one query that replaces all the null values with '0'.
Is it possible to do this in Vertica? Can anyone please help me with that?
You use coalesce():
select coalesce(col1, 0) as col1, . . .
from t;
You can incorporate similar logic into an update as well.
In a SELECT, as @GordonLinoff says, you use COALESCE(), or the slightly faster NVL(), IFNULL() or ISNULL() functions. These three are synonyms of each other and take exactly two arguments, while COALESCE() is more flexible, at a cost: it takes a variable-length argument list and returns the first non-null value in it.
For updating, strive to update only the rows you need to update, and go, for each column:
UPDATE t SET col1=0 WHERE col1 IS NULL;
UPDATE t SET col2=0 WHERE col2 IS NULL;
Well, in an extreme case, you might end up updating the same row as often as its number of columns, then you have won nothing - but it's worth planning to minimise how often you update.
Or, you might consider:
UPDATE t SET
col1 = NVL(col1,0)
, col2 = NVL(col2,0)
, col3 = NVL(col3,0)
[...]
WHERE col1 IS NULL
OR col2 IS NULL
OR col3 IS NULL
[...]
;
Because Vertica is columnar, and because each UPDATE in Vertica is a DELETE and an INSERT anyway, it makes no difference whether you update just one column or all columns.
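The single-statement variant can be tried out as follows. This is a sketch under assumptions: it uses SQLite via Python's sqlite3 module instead of Vertica, so COALESCE() stands in for NVL() (which SQLite does not have), and the table `t` with its sample rows is invented. It says nothing about Vertica's DELETE plus INSERT cost; it only shows that the WHERE clause restricts the update to rows that actually contain a null.

```python
import sqlite3

# Hypothetical table with nulls scattered across two columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, col1 INTEGER, col2 INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, None, 5), (2, 7, None), (3, 1, 2)],
)

# One pass over the table: fix every column, but only touch rows
# where at least one of the columns is null.
cur = conn.execute(
    "UPDATE t SET col1 = COALESCE(col1, 0), col2 = COALESCE(col2, 0) "
    "WHERE col1 IS NULL OR col2 IS NULL"
)
touched = cur.rowcount  # number of rows the UPDATE modified

rows = conn.execute("SELECT id, col1, col2 FROM t ORDER BY id").fetchall()
print(touched, rows)
```

Row 3 has no nulls, so it is not rewritten at all.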

SELECT * FROM Employees WHERE NULL IS NULL; SELECT * FROM Employees WHERE NULL = NULL;

I have recently started learning oracle and sql.
While learning I encountered a couple of queries which my friend was asked in an interview.
SELECT *
FROM Employees
WHERE NULL IS NULL;
this query yields all the rows in the Employees table.
As far as I have understood, Oracle searches data in columns, so is NULL treated as a column name here?
Am I correct when I say that Oracle searches for data in columns?
How come Oracle gives all the rows in this query?
In a WHERE clause, isn't it required that the left-hand side of a condition be a COLUMN NAME?
Shouldn't it throw an error?
SELECT *
FROM Employees
WHERE NULL = NULL;
gives NO ROWS SELECTED.
Well, I understand that I cannot test for a NULL value using any operators except IS NULL and IS NOT NULL.
But why does it yield a result and not an error?
Could somebody explain this to me?
Does Oracle treat NULL as a column as well as empty cells?
A where clause consists of conditional expressions. There is no requirement that a conditional expression consist of a column name on either side. In fact, although usually one or both sides are columns, it is not uncommon to have expressions that include:
subqueries
parameters
constants
scalar functions
One common instance is:
WHERE 1 = 1 AND . . .
This is a sign of automatically generated code. It is easier for some programmers to knit together conditions just by including AND <condition> but the clause needs an anchor. Hence, 1 = 1.
The way the WHERE clause works conceptually is that the clause is evaluated for each row produced by the FROM. If the clause evaluates to TRUE, then the row is kept in the result set (or for further processing). If it is FALSE or NULL, then the row is filtered out.
So, NULL IS NULL evaluates to TRUE, so all rows are kept. NULL = NULL evaluates to NULL, so no rows are kept.
NULL IS NULL is always true; NULL = NULL is never true (it evaluates to NULL, that is, unknown, which a WHERE clause treats like false). Also, you aren't testing against any columns in either query, so you are only ever going to get everything or nothing.
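Both expressions can be evaluated directly to see the three-valued logic at work. The sketch below uses SQLite through Python's sqlite3 module rather than Oracle, with a made-up two-row Employees table. SQLite reports TRUE as 1 and NULL as Python's None.

```python
import sqlite3

# Hypothetical Employees table with two rows; SQLite stands in for Oracle.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (name TEXT)")
conn.executemany("INSERT INTO Employees VALUES (?)", [("Ann",), ("Bob",)])

# The bare expressions: IS NULL gives a truth value, = gives NULL.
is_null = conn.execute("SELECT NULL IS NULL").fetchone()[0]
eq_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

# The same expressions used as WHERE clauses.
kept_is = conn.execute(
    "SELECT COUNT(*) FROM Employees WHERE NULL IS NULL"
).fetchone()[0]
kept_eq = conn.execute(
    "SELECT COUNT(*) FROM Employees WHERE NULL = NULL"
).fetchone()[0]

print(is_null, eq_null, kept_is, kept_eq)
```

The WHERE clause keeps a row only when the condition is TRUE, so a NULL result filters out every row without raising an error.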

is there any difference between the queries

select field from table where field = 'value'
select field from table where field in ('value')
The reason I'm asking is that the second version allow me to use the same syntax for null values, while in the first version I need to change the condition to 'where field is null'...
When you compare a field to NULL with field_name = NULL, you are comparing a value of a known data type (say, varchar) to something whose value, and also whose data type, is unknown. The comparison implies a check of both data type and value, so the two cannot be compared: even if the field is actually NULL, the result is never true (it is unknown, which a WHERE clause treats like false). With IS NULL, however, you only test whether the value itself is missing, without the implied data type comparison, so it evaluates to true or false depending on the actual value of the field.
See reference here regarding the issue of NULL in computer science and here in relation to the similarity to your question.
Now, for the IN clause (i.e. IN (NULL)), I don't know which RDBMS you are using, because when I tried it with MS SQL and MySQL it returned nothing.
See MS SQL example and MySQL example.
There is no difference in your example. The second, slightly longer, query is not usually used for a single value; it is usually seen with multiple values, such as
select field from table where field in ('value1', 'value2')
Yes, there is a difference between these queries. In the first statement you can put only one value in the where clause, "where field = 'value'", but in the second statement you can put many values in it using the IN clause, "where field in (value1,value2..)".
Examples:
1) select field from table where field ='value1';
2) select field from table where field in ('value1', 'value2')
To also match null values:
SELECT field
FROM tbl_name
WHERE
(field IN ('value1', 'value2', 'value3') OR field IS NULL)
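A short check shows why NULL inside an IN list does not match null rows, and why the explicit OR field IS NULL is needed. This is a sketch using SQLite through Python's sqlite3 module; the three sample rows are invented, and other engines may differ in details, though the answers above report the same outcome for MS SQL and MySQL.

```python
import sqlite3

# Hypothetical single-column table containing one NULL row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_name (field TEXT)")
conn.executemany(
    "INSERT INTO tbl_name VALUES (?)",
    [("value1",), (None,), ("other",)],
)

# NULL in the IN list is compared with '=', which never yields TRUE.
in_only = conn.execute(
    "SELECT field FROM tbl_name WHERE field IN ('value1', NULL)"
).fetchall()

# Adding an explicit IS NULL test picks up the null row as well.
with_null = conn.execute(
    "SELECT field FROM tbl_name "
    "WHERE (field IN ('value1') OR field IS NULL)"
).fetchall()

print(in_only)
print(with_null)
```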

PL/SQL Oracle condition equals

I think I'm encountering a fairly simple problem in PL/SQL on an Oracle Database(10g) and I'm hoping one of you guys can help me out.
I'm trying to explain this as clear as possible, but it's hard for me.
When I try to compare varchar2 values of 2 different tables to check if I need to create a new record or I can re-use the ID of the existing one, the DB (or I) compares these values in a wrong way. All is fine when both the field contain a value, this results in 'a' = 'a' which it understands. But when both fields are NULL (or '' which Oracle will turn into NULL) it can not compare the fields.
I found a 'solution' to this problem but I'm certain there is a better way.
rowTable1 ROWTABLE1%ROWTYPE;
iReUsableID INT;
SELECT * INTO rowTable1
FROM TABLE1
WHERE TABLE1ID = 'someID';
SELECT TABLE2ID INTO iReUsableID
FROM TABLE2
WHERE NVL(SOMEFIELDNAME,' ') = NVL(rowTable1.SOMEFIELDNAME,' ');
So NVL changes the null value to ' ' after which it will compare in the right way.
Thanks in advance,
Dennis
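You can use the LNNVL function (http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions078.htm) and reverse the condition. Note that LNNVL returns TRUE when the condition is FALSE or UNKNOWN, so this also matches rows where only one of the two values is NULL: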
SELECT TABLE2ID INTO iReUsableID
FROM TABLE2
WHERE LNNVL(SOMEFIELDNAME != rowTable1.SOMEFIELDNAME);
Your method is fine, unless one of the values could be a space. The "standard" way of doing the comparison is to explicitly compare to NULL:
WHERE col1 = col2 OR (col1 IS NULL AND col2 IS NULL)
In Oracle, comparisons on strings are encumbered by the fact that Oracle treats the empty string as NULL. This is a peculiarity of Oracle and not a problem in other databases.
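The explicit NULL comparison can be sketched as a lookup that treats two NULLs as a match. The example runs in SQLite through Python's sqlite3 module, not in Oracle PL/SQL, so it does not reproduce Oracle's treatment of '' as NULL. The helper `find_reusable_id` and the sample rows are hypothetical; the table and column names are taken from the question.

```python
import sqlite3

# Hypothetical contents for TABLE2; SQLite stands in for Oracle.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TABLE2 (TABLE2ID INTEGER, SOMEFIELDNAME TEXT)")
conn.executemany(
    "INSERT INTO TABLE2 VALUES (?, ?)",
    [(1, "a"), (2, None)],
)


def find_reusable_id(conn, wanted):
    """Return the TABLE2ID whose SOMEFIELDNAME equals `wanted`,
    treating NULL on both sides as a match; None if nothing matches."""
    row = conn.execute(
        "SELECT TABLE2ID FROM TABLE2 "
        "WHERE SOMEFIELDNAME = ? "
        "   OR (SOMEFIELDNAME IS NULL AND ? IS NULL)",
        (wanted, wanted),
    ).fetchone()
    return row[0] if row else None


print(find_reusable_id(conn, "a"))   # ordinary equality
print(find_reusable_id(conn, None))  # NULL matched against NULL
print(find_reusable_id(conn, "z"))   # no match
```

Unlike the NVL(..., ' ') approach, this does not confuse a stored space with NULL.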
In Oracle (or any RDBMS, I believe), one NULL is not equal to another NULL. Therefore, you need to use the workaround that you have stated if you want to force two NULL values to be considered the same. Note that in Oracle you cannot default NULL values to '' (empty) instead of ' ' (space), because Oracle treats '' as NULL and the comparison would still fail.
From Wikipedia (originally the ISO spec, but I couldn't access it):
Since Null is not a member of any data domain, it is not considered a "value", but rather a marker (or placeholder) indicating the absence of value. Because of this, comparisons with Null can never result in either True or False, but always in a third logical result, Unknown.
As mentioned by Jan Spurny, you can use LNNVL for comparison. However, it would be wrong to say that a comparison is actually being made when both values being compared are NULL.
This is indeed a simple and usable way to compare nulls.
You cannot compare NULLs directly, since NULL is not equal to NULL.
You must provide your own logic for how you would like them compared, which is what you've done with NVL().
Keep in mind that you are treating NULLs as a space, so ' ' in one table would be equal to NULL in another table in your case.
There are some other ways (e.g. LNNVL), but I don't think they are in any way "better".

how to filter in sql script to not include any column null

Imagine there are 50 columns. I don't want any row that includes a null value. Is there any trick for this?
SQL Server 2005
Sorry, not really. All 50 columns have to be checked in one form or another.
Column1 IS NOT NULL AND ... AND Column50 IS NOT NULL
Of course, under these conditions, why not disallow NULLs in the first place by having NOT NULL in the table definition?
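Since all 50 checks have to be written out one way or another, the predicate can at least be generated instead of typed. The sketch below is one way to do that, assuming the column names are known and trusted (they are interpolated into the SQL text). The helper `not_null_predicate` is hypothetical, and the example runs against SQLite through Python's sqlite3 module rather than SQL Server, where identifier quoting may need adjusting.

```python
import sqlite3


def not_null_predicate(columns):
    """Build 'c1 IS NOT NULL AND c2 IS NOT NULL AND ...' for the given
    column names. Names are assumed trusted; they are not escaped."""
    return " AND ".join(f'"{c}" IS NOT NULL' for c in columns)


# Hypothetical 50-column table named as in the answer above.
columns = [f"Column{i}" for i in range(1, 51)]
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable ("
    + ", ".join(f'"{c}" INTEGER' for c in columns)
    + ")"
)

placeholders = ", ".join("?" * len(columns))
insert_sql = f"INSERT INTO MyTable VALUES ({placeholders})"
conn.execute(insert_sql, [1] * 50)           # fully populated row
conn.execute(insert_sql, [1] * 49 + [None])  # one null in Column50

predicate = not_null_predicate(columns)
count = conn.execute(
    f"SELECT COUNT(*) FROM MyTable WHERE {predicate}"
).fetchone()[0]
print(count)
```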
If it's SQL Server 2005+ you can do something like:
SELECT fields
FROM MyTable
WHERE stuff
EXCEPT -- This excludes the below results
SELECT fields
FROM MyTable
WHERE (Col1 + Col2 + Col3....) IS NULL
Adding a null to a value results in a null, so the sum of all your columns will be NULL if any one of them is NULL.
This may need to change based on your data types, but adding NULL to either a char/varchar or a number will result in another NULL.
If you are looking at the values not being null, you can do this in the select statement.
SELECT ISNULL(firstname,''), ISNULL(lastname,'') FROM TABLE WHERE SOMETHING=1
This will replace nulls with string blanks. If you want another value use: ISNULL(firstname,'empty') for example. You can use anything where the word empty is.
I prefer this query
select *
from table
where column1>''
and column2>''
and (column3>'' or column3<'')
This allows SQL Server to use an index seek if the proper indexes exist. You would have to use the syntax shown for column3 for any numeric columns that could hold negative values.