Negation of "<=>" Hive operator does not work properly?

Here is my test:
CREATE TABLE test (
c1 string,
c2 string
);
INSERT INTO test VALUES
(NULL, NULL),
('A', NULL),
('A', 'B'),
('A', 'A')
;
SELECT *, (c1 <=> c2) as is_equal, (NOT (c1 <=> c2)) as is_not_equal
FROM test;
And the results:
test.c1,test.c2,is_equal,is_not_equal
NULL,NULL,true,NULL
A,NULL,false,NULL
A,B,false,true
A,A,true,false
I would expect:
test.c1,test.c2,is_equal,is_not_equal
NULL,NULL,true,false
A,NULL,false,true
A,B,false,true
A,A,true,false
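For comparison, the expected table is what a correctly negated null-safe equality produces. The sketch below uses Python's built-in `sqlite3` module, with SQLite's `IS` operator standing in for Hive's `<=>` (an assumption for illustration only; it does not run on Hive and says nothing about how Hive 1.2.1 implements `<=>`). SQLite reports booleans as 1/0.

```python
import sqlite3

# In-memory SQLite database; SQLite's IS is its null-safe equality operator,
# used here as a stand-in for Hive's <=>.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (c1 TEXT, c2 TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [(None, None), ("A", None), ("A", "B"), ("A", "A")],
)

rows = conn.execute(
    "SELECT c1, c2, (c1 IS c2) AS is_equal, (NOT (c1 IS c2)) AS is_not_equal "
    "FROM test ORDER BY rowid"
).fetchall()

for row in rows:
    print(row)  # NOT of a null-safe comparison is always 0 or 1, never NULL
```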
Is this a bug in Hive?
EDIT
This query works as expected:
SELECT *,
(c1 is null and c2 is null) or (c1 is not null and c2 is not null and c1 = c2) as is_equal,
(NOT ((c1 is null and c2 is null) or (c1 is not null and c2 is not null and c1 = c2))) as is_not_equal
FROM test;
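One way to confirm that this workaround is null-safe is to evaluate it next to a null-safe operator on the same four rows. A minimal sketch using Python's `sqlite3` module, assuming SQLite's three-valued AND/OR/NOT logic matches standard SQL (which Hive also follows); it does not exercise Hive itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (c1 TEXT, c2 TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [(None, None), ("A", None), ("A", "B"), ("A", "A")],
)

# The workaround expression from the EDIT, wrapped in parentheses so NOT applies to all of it.
workaround = (
    "((c1 IS NULL AND c2 IS NULL) OR "
    "(c1 IS NOT NULL AND c2 IS NOT NULL AND c1 = c2))"
)
rows = conn.execute(
    f"SELECT {workaround}, NOT {workaround}, (c1 IS c2) FROM test ORDER BY rowid"
).fetchall()

# Each row: (workaround is_equal, workaround is_not_equal, SQLite's null-safe IS)
for is_equal, is_not_equal, null_safe in rows:
    print(is_equal, is_not_equal, null_safe)
```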
EDIT2
We use Hive 1.2.1 (from HDP 2.6.2)

Related

Difference between delete statements

DELETE a
FROM TableA a
JOIN TableB b ON a.Field1 = b.Field1 AND a.Field2 = b.Field2;
vs.
DELETE
FROM TableA
WHERE Field1 IN (
SELECT Field1
FROM TableB
) AND Field2 IN (
SELECT Field2
FROM TableB
);
The logical conditions of the two statements are different.
The first statement will delete any row in TableA if both its Field1 and Field2 correspond to the equivalent columns of a row in TableB.
The second statement will delete any row in TableA if the value of Field1 exists in Field1 of TableB, and the value of Field2 exists in Field2 of TableB - but that doesn't have to be in the same row.
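The difference between the two conditions is the difference between matching a pair and matching each value separately. A plain-Python sketch of that logic, using hypothetical rows that mirror the A/B sample data in this answer (no database involved):

```python
# Hypothetical rows mirroring the A/B sample data.
table_a = [(1, "a"), (2, "a"), (3, "a"), (1, "b"), (2, "b"), (3, "b")]
table_b = [(1, "a"), (2, "b"), (3, "c")]

# JOIN condition: the (Field1, Field2) pair must occur together in one B row.
b_pairs = set(table_b)
joined = [row for row in table_a if row in b_pairs]

# Two separate IN lists: each value only has to occur somewhere in its B column.
b_field1 = {f1 for f1, _ in table_b}
b_field2 = {f2 for _, f2 in table_b}
independent = [row for row in table_a if row[0] in b_field1 and row[1] in b_field2]

print(joined)       # rows the first DELETE would remove
print(independent)  # rows the second DELETE would remove
```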
It's easy to see the difference if you change the delete to select.
Here's an example. First, create and populate sample tables (Please save us this step in your future questions):
CREATE TABLE A
(
AInt int,
AChar char(1)
);
CREATE TABLE B
(
BInt int,
BChar char(1)
);
INSERT INTO A (AInt, AChar) VALUES
(1, 'a'), (2, 'a'), (3, 'a'),
(1, 'b'), (2, 'b'), (3, 'b');
INSERT INTO B (BInt, BChar) VALUES
(1, 'a'),
(2, 'b'),
(3, 'c');
The statements (translated to select statements):
SELECT A.*
FROM A
JOIN B
ON AInt = BInt AND AChar = BChar;
SELECT *
FROM A
WHERE AInt IN (
SELECT BInt
FROM B
) AND AChar IN (
SELECT BChar
FROM B
);
Results:
First statement (JOIN):
AInt AChar
1 a
2 b
Second statement (IN ... AND IN ...):
AInt AChar
1 a
2 a
3 a
1 b
2 b
3 b
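The same difference shows up when the DELETEs are actually run. SQLite does not accept the `DELETE a FROM ... JOIN` form, so this sketch (Python's `sqlite3` module) swaps in a correlated `EXISTS` as a portable equivalent of the pairwise join condition; the helper name `remaining_after` is hypothetical.

```python
import sqlite3

def remaining_after(delete_sql):
    """Build fresh copies of A and B, run one DELETE, return what is left in A."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE A (AInt INTEGER, AChar TEXT);
        CREATE TABLE B (BInt INTEGER, BChar TEXT);
        INSERT INTO A VALUES (1,'a'),(2,'a'),(3,'a'),(1,'b'),(2,'b'),(3,'b');
        INSERT INTO B VALUES (1,'a'),(2,'b'),(3,'c');
        """
    )
    conn.execute(delete_sql)
    return conn.execute("SELECT AInt, AChar FROM A ORDER BY AChar, AInt").fetchall()

# Portable stand-in for DELETE a FROM A a JOIN B b ON ... (pairwise match).
pairwise = remaining_after(
    "DELETE FROM A WHERE EXISTS "
    "(SELECT 1 FROM B WHERE B.BInt = A.AInt AND B.BChar = A.AChar)"
)

# The two independent IN conditions.
independent = remaining_after(
    "DELETE FROM A WHERE AInt IN (SELECT BInt FROM B) "
    "AND AChar IN (SELECT BChar FROM B)"
)

print(pairwise)     # four rows survive
print(independent)  # every row was deleted
```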
And you can see a live demo on DB<>Fiddle

Is a Case statement what I need?

I am new to SQL and trying to merge two columns into a new table based upon a simple logic. I have been attempting a variety of CASE/WHEN statements but am hitting a wall.
Table: program / Columns C1, C2
The logic must be:
Use a case expression:
select c1, c2,
case when c1 = 'Yes' and c2 = 'Yes' then 'Both'
when c1 = 'Yes' then 'Regional'
when c2 = 'Yes' then 'Local'
else 'No'
end as result
from program
(The value for the first fulfilled condition will be returned.)
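To see the "first fulfilled condition wins" rule in action, here is a sketch that runs the same CASE expression in SQLite via Python's `sqlite3` module. The sample rows are hypothetical, since the question's actual data is not shown. Note that a NULL in both columns falls through to ELSE, because `NULL = 'Yes'` is not true.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE program (c1 TEXT, c2 TEXT)")
# Hypothetical sample rows; the question's actual data is not shown.
conn.executemany(
    "INSERT INTO program VALUES (?, ?)",
    [("Yes", "Yes"), ("Yes", "No"), ("No", "Yes"), ("No", "No"), (None, None)],
)

rows = conn.execute(
    """
    SELECT c1, c2,
           CASE WHEN c1 = 'Yes' AND c2 = 'Yes' THEN 'Both'  -- must be checked first
                WHEN c1 = 'Yes' THEN 'Regional'
                WHEN c2 = 'Yes' THEN 'Local'
                ELSE 'No'
           END AS result
    FROM program
    ORDER BY rowid
    """
).fetchall()
results = [r[2] for r in rows]
print(results)
```

If the 'Both' branch were moved below 'Regional', a Yes/Yes row would be labeled 'Regional', which is why the order of the WHEN branches matters.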
Is this what you are looking for?
-- create test data in table variable @T
declare @T table
(
C1 nvarchar(3) null,
C2 nvarchar(3) null
)
insert into @T values (null, null)
insert into @T values (null, 'YES')
insert into @T values ('YES', null)
insert into @T values ('YES', 'YES')
insert into @T values ('NO', 'NO')
insert into @T values ('NO', 'YES')
insert into @T values ('YES', 'NO')
insert into @T values ('YES', 'YES')
-- get the wanted value in the RESULT column based on C1 and C2
select
C1, C2,
case
when ISNULL(C1, 'NO') = 'NO' and ISNULL(C2, 'NO') = 'NO' then 'NO'
when ISNULL(C1, 'NO') = 'NO' and C2 = 'YES' then 'LOCAL'
when C1 = 'YES' and ISNULL(C2, 'NO') = 'NO' then 'REGIONAL'
when C1 = 'YES' and C2 = 'YES' then 'BOTH'
end as RESULT
from @T
This will work in SQL Server.
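The same logic can be checked outside SQL Server. This sketch uses Python's `sqlite3` module and swaps T-SQL's `ISNULL(x, 'NO')` for the standard `COALESCE(x, 'NO')`, since SQLite has no two-argument `ISNULL` function. One caveat worth hedging: SQLite compares strings case-sensitively by default, whereas SQL Server usually uses a case-insensitive collation, so lowercase 'yes' would behave differently here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (C1 TEXT, C2 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(None, None), (None, "YES"), ("YES", None), ("YES", "YES"),
     ("NO", "NO"), ("NO", "YES"), ("YES", "NO"), ("YES", "YES")],
)

# COALESCE replaces T-SQL's ISNULL(x, 'NO'): NULL is treated as 'NO'.
rows = conn.execute(
    """
    SELECT C1, C2,
           CASE
             WHEN COALESCE(C1, 'NO') = 'NO' AND COALESCE(C2, 'NO') = 'NO' THEN 'NO'
             WHEN COALESCE(C1, 'NO') = 'NO' AND C2 = 'YES' THEN 'LOCAL'
             WHEN C1 = 'YES' AND COALESCE(C2, 'NO') = 'NO' THEN 'REGIONAL'
             WHEN C1 = 'YES' AND C2 = 'YES' THEN 'BOTH'
           END AS RESULT
    FROM t
    ORDER BY rowid
    """
).fetchall()

for row in rows:
    print(row)
```

Because this CASE has no ELSE, any value other than 'YES', 'NO' or NULL produces a NULL result.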

Update rows in table 'A' with value 'Y', for matching records from a different table 'B'

TESTDTA is the test database.
F41 = first table
F42 = second table
Data from F41 TABLE
FLAG STORE NAME NUMBER
S 1 A A1
S 2 B B2
S 3 C C3
Data from F42 TABLE
STORE NAME NUMBER
1 A A1
2 B B2
3 C C3
4 D D4
I need to update the column "FLAG" in table "F41" to the value "P" if there is a matching record in table "F42".
I tried the SQL below, but it has a syntax error:
UPDATE TESTDATA.F41,TESTDATA.F42 SET F41.FLAG='P'
WHERE F41.NAME=F42.NAME AND F41.NUMBER=F42.NUMBER
Can anyone help me to write this SQL?
Thanks in advance for your help
Perhaps a little cleaner (T-SQL style UPDATE ... FROM, updating through the alias):
UPDATE A SET FLAG='P'
FROM F41 A
JOIN F42 B on A.NAME=B.NAME AND A.NUMBER=B.NUMBER
If your DBMS is Oracle, and by "matching records" you mean that all column values in F41 match F42, then this might answer your problem:
DDL:
create table F41
(
FLAG varchar2(10)
,STORE_x number
,NAME_x varchar2(10)
,NUMBER_x varchar2(10)
);
create table F42
(
STORE_x number
,NAME_x varchar2(10)
,NUMBER_x varchar2(10)
);
Note: I added _x to the columns STORE, NAME, and NUMBER because those are reserved words in some DBMSs.
DML:
insert into F41 (FLAG, STORE_x, NAME_x, NUMBER_x) values ('S', 1, 'A', 'A1');
insert into F41 (FLAG, STORE_x, NAME_x, NUMBER_x) values ('S', 2, 'B', 'B2');
insert into F41 (FLAG, STORE_x, NAME_x, NUMBER_x) values ('S', 3, 'C', 'C3');
insert into F42 (STORE_x, NAME_x, NUMBER_x) values (1, 'A', 'A1');
insert into F42 (STORE_x, NAME_x, NUMBER_x) values (2, 'B', 'B2');
insert into F42 (STORE_x, NAME_x, NUMBER_x) values (3, 'C', 'C3');
insert into F42 (STORE_x, NAME_x, NUMBER_x) values (4, 'D', 'D4');
Update Statement Using Exists:
update F41 t1
set flag = 'P'
WHERE EXISTS (SELECT 1
FROM F42 t2
WHERE t2.STORE_X = t1.STORE_X
AND t2.NAME_X = t1.NAME_X
AND t2.NUMBER_X = t1.NUMBER_X);
Hope this Helps!
You can do this with nothing more fancy than a WHERE clause:
UPDATE TESTDATA.F41
SET FLAG = 'P'
WHERE EXISTS (SELECT 1
FROM TESTDATA.F42
WHERE F41.NAME = F42.NAME AND F41.NUMBER = F42.NUMBER
);
This is standard SQL and should work in any database.
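As a quick check that the EXISTS form runs outside any one vendor's dialect, here is a sketch with Python's `sqlite3` module. Assumptions: the `TESTDATA.` schema prefix is dropped (SQLite has no such schema here), and one extra F41 row (store 5, 'E', 'E5') is hypothetical, added only to show that a row without a match keeps its original flag.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE F41 (FLAG TEXT, STORE INTEGER, NAME TEXT, NUMBER TEXT);
    CREATE TABLE F42 (STORE INTEGER, NAME TEXT, NUMBER TEXT);
    -- store 5 is a hypothetical row with no match in F42
    INSERT INTO F41 VALUES ('S',1,'A','A1'),('S',2,'B','B2'),('S',3,'C','C3'),
                           ('S',5,'E','E5');
    INSERT INTO F42 VALUES (1,'A','A1'),(2,'B','B2'),(3,'C','C3'),(4,'D','D4');
    """
)

# Same statement as above, minus the schema prefix.
conn.execute(
    """
    UPDATE F41
    SET FLAG = 'P'
    WHERE EXISTS (SELECT 1
                  FROM F42
                  WHERE F41.NAME = F42.NAME AND F41.NUMBER = F42.NUMBER)
    """
)
flags = conn.execute("SELECT STORE, FLAG FROM F41 ORDER BY STORE").fetchall()
print(flags)
```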
In dialects that support UPDATE ... FROM, such as SQL Server:
UPDATE TESTDATA.F41 SET F41.FLAG='P'
FROM TESTDATA.F42
WHERE F41.NAME = F42.NAME AND F41.NUMBER = F42.NUMBER

Select values between two columns range

I have a table like this:
i1 i2
----------
1 a
1 b
1 c
1 d
2 x
3 y
4 a
4 b
4 c
I want to select the rows between 1 c and 4 a. The result should be:
1 c
1 d
2 x
3 y
4 a
How can I do this?
I would do this as:
select t.*
from t
where (i1 > 1 or (i1 = 1 and i2 >= 'c')) and
(i1 < 4 or (i1 = 4 and i2 <= 'a'));
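The WHERE clause spells out a lexicographic (row-value) comparison: (i1, i2) >= (1, 'c') and (i1, i2) <= (4, 'a'). A sketch that runs the query in SQLite through Python's `sqlite3` module and cross-checks it against Python's own tuple ordering, which is also lexicographic; the table name `t` follows the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (i1 INTEGER, i2 TEXT)")
data = [(1, "a"), (1, "b"), (1, "c"), (1, "d"), (2, "x"),
        (3, "y"), (4, "a"), (4, "b"), (4, "c")]
conn.executemany("INSERT INTO t VALUES (?, ?)", data)

# (i1, i2) >= (1, 'c') AND (i1, i2) <= (4, 'a'), spelled out column by column.
sql_rows = conn.execute(
    """
    SELECT i1, i2 FROM t
    WHERE (i1 > 1 OR (i1 = 1 AND i2 >= 'c'))
      AND (i1 < 4 OR (i1 = 4 AND i2 <= 'a'))
    ORDER BY i1, i2
    """
).fetchall()

# Python tuples compare lexicographically, which is the same ordering.
py_rows = sorted(r for r in data if (1, "c") <= r <= (4, "a"))
print(sql_rows)
```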
If your database supports row numbering (the ROW_NUMBER() window function), one option is to create a CTE of your table with row numbers in the order you specified (ascending by i1 first, then by i2).
Then, use two subqueries to identify the row numbers for 1c and 4a. These row numbers constitute the range which you want to select.
;WITH cte AS (
SELECT ROW_NUMBER() OVER (ORDER BY i1, i2) AS RowNumber, i1, i2
FROM yourTable
)
SELECT *
FROM cte t
WHERE t.RowNumber >= (SELECT RowNumber FROM cte WHERE i1=1 AND i2='c') AND
t.RowNumber <= (SELECT RowNumber FROM cte WHERE i1=4 AND i2='a')
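The CTE approach can be tried in SQLite too, assuming SQLite 3.25 or later (the first version with window functions) is what Python's `sqlite3` module links against. The leading `;` before `WITH` is a T-SQL habit and is dropped here; the table name `yourTable` follows the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (i1 INTEGER, i2 TEXT)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?)",
    [(1, "a"), (1, "b"), (1, "c"), (1, "d"), (2, "x"),
     (3, "y"), (4, "a"), (4, "b"), (4, "c")],
)

rows = conn.execute(
    """
    WITH cte AS (
      -- number rows by i1, then i2, so (1,'c') is 3 and (4,'a') is 7
      SELECT ROW_NUMBER() OVER (ORDER BY i1, i2) AS RowNumber, i1, i2
      FROM yourTable
    )
    SELECT i1, i2
    FROM cte t
    WHERE t.RowNumber >= (SELECT RowNumber FROM cte WHERE i1 = 1 AND i2 = 'c')
      AND t.RowNumber <= (SELECT RowNumber FROM cte WHERE i1 = 4 AND i2 = 'a')
    ORDER BY t.RowNumber
    """
).fetchall()
print(rows)
```

If either endpoint row does not exist, its subquery yields NULL, the comparison is never true, and the query returns no rows at all.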
A not-so-beautiful way to do it, but it works:
create procedure GetRangeBetween (@i11 int, @i12 char, @i21 int, @i22 char)
AS
BEGIN
if object_id ('tempdb..#Test') is not null drop table #Test
create table #Test (i1 int, i2 nvarchar(10), [Rank] int)
insert into #Test(i1, i2)
values
(1, 'a'), (1, 'b'), (1, 'c'),
(1, 'd'), (2, 'x'), (3, 'y'),
(4, 'a'), (4, 'b'), (4, 'c')
-- rank by i1 and then i2, so rows sharing the same i1 get a deterministic order
update t
set [Rank] = src.[srcRank]
from #Test t
join (select *, row_number() over (order by i1, i2) [srcRank] from #Test) src
on t.i1 = src.i1 and t.i2 = src.i2
declare @Rank1 int = (select [Rank] from #Test where i1 = @i11 and i2 = @i12)
declare @Rank2 int = (select [Rank] from #Test where i1 = @i21 and i2 = @i22)
select i1, i2 from #Test
where (i1 between @i11 and @i21) and ([Rank] between @Rank1 and @Rank2)
END
Then you just execute it with:
execute GetRangeBetween 1, 'c', 4, 'a'
Certainly not the optimal solution, but this query should work:
select i1, i2
from tbl
where (i1 > 1 and i1 < 4)
or (i1 = 1 and i2 >='c')
or (i1 = 4 and i2 <='a')
Please note that (1, c) and (4, a) are included in the results. Change the comparison operators if you don't want the boundaries included.

SQL to get the first occurrence of a value in each column within grouped elements

I am building a report using SQL Server and Reporting Services. I have a dataset that looks something like the following, where all columns are of type VARCHAR:
Line  Code  Col1  Col2  Col3  Col4
============================================
1     xxx   1.1
1     xxx         2.3
1     xxx               8.7
1     xxx                     3.4
2     yyy   5.3
2     yyy         !err
2     yyy               6.5
2     yyy                     9.1
I have a report that should have an output like this:
Line  Code  Col1  Col2  Col3  Col4
============================================
1     xxx   1.1   2.3   8.7   3.4
2     yyy   5.3   !err  6.5   9.1
So I basically need to perform a grouping on the "Line" column including the first non-empty values from each column within the group.
If the columns had been of a numeric type, I could have used SUM to get the desired outcome, but since they are VARCHAR I cannot use SUM. I also cannot convert the VARCHAR to a numeric value, because then a non-numeric value (such as "!err" in my example) would not be displayed.
What query can I use to get the desired outcome?
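Since there can only be one non-empty value per column for a given line/code, MAX will work for you: it accepts VARCHAR as well as numeric columns, ignores NULLs, and an empty string normally sorts below any non-empty value: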
select
line,
code,
max(col1) col1,
max(col2) col2,
max(col3) col3,
max(col4) col4
from mytable
group by line, code
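A sketch of the MAX/GROUP BY pivot in SQLite via Python's `sqlite3` module, using the question's sample data. Assumption: blank cells are stored as empty strings; with NULLs instead, MAX would simply ignore them and the result would be the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (line TEXT, code TEXT, col1 TEXT, col2 TEXT, col3 TEXT, col4 TEXT)"
)
# Blank cells stored as empty strings (an assumption about the source data).
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("1", "xxx", "1.1", "", "", ""),
        ("1", "xxx", "", "2.3", "", ""),
        ("1", "xxx", "", "", "8.7", ""),
        ("1", "xxx", "", "", "", "3.4"),
        ("2", "yyy", "5.3", "", "", ""),
        ("2", "yyy", "", "!err", "", ""),
        ("2", "yyy", "", "", "6.5", ""),
        ("2", "yyy", "", "", "", "9.1"),
    ],
)

# MAX over text: '' sorts below any non-empty string, so the real value wins.
rows = conn.execute(
    """
    SELECT line, code,
           MAX(col1) AS col1, MAX(col2) AS col2,
           MAX(col3) AS col3, MAX(col4) AS col4
    FROM mytable
    GROUP BY line, code
    ORDER BY line
    """
).fetchall()

for row in rows:
    print(row)
```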
This works for me:
declare @t table (line int, code varchar(4), c1 varchar(5), c2 varchar(5), c3 varchar(5), c4 varchar(5))
insert into @t (line, code, c1, c2, c3, c4) values (1, 'xxx', '1.5', '', '', '')
insert into @t (line, code, c1, c2, c3, c4) values (1, 'xxx', '', 'err!', '', '')
insert into @t (line, code, c1, c2, c3, c4) values (1, 'xxx', '', '', '2.3', '')
insert into @t (line, code, c1, c2, c3, c4) values (1, 'xxx', '', '', '', '3.5')
insert into @t (line, code, c1, c2, c3, c4) values (2, 'yyy', '1.2', '', '', '')
insert into @t (line, code, c1, c2, c3, c4) values (2, 'yyy', '', '0.8', '', '')
insert into @t (line, code, c1, c2, c3, c4) values (2, 'yyy', '', '', 'err!', '')
insert into @t (line, code, c1, c2, c3, c4) values (2, 'yyy', '', '', '', '4.6')
/* IF only one value in each column for a given code
   (otherwise the scalar subqueries return more than one row and fail) */
SELECT M1.Line, M1.Code
, (SELECT c1 as cc FROM @t WHERE code = M1.code AND c1 <> '') as c1
, (SELECT c2 as cc FROM @t WHERE code = M1.code AND c2 <> '') as c2
, (SELECT c3 as cc FROM @t WHERE code = M1.code AND c3 <> '') as c3
, (SELECT c4 as cc FROM @t WHERE code = M1.code AND c4 <> '') as c4
FROM @t M1
GROUP BY M1.Line, M1.Code