Hive query with WHERE clause not working (ORC table)

The table being queried:
describe formatted hfe_reference.dim_number;
The details:
col_name data_type comment
# col_name data_type comment
number int Format: <integer> | Example: 0 | Description: Ascending number series
# Detailed Table Information NULL NULL
Database: hfe_reference NULL
Owner: admin NULL
CreateTime: Tue Mar 20 04:19:39 UTC 2018 NULL
LastAccessTime: UNKNOWN NULL
Protect Mode: None NULL
Retention: 0 NULL
Location: xxxxxxxxx NULL
Table Type: MANAGED_TABLE NULL
Table Parameters: NULL NULL
SORTBUCKETCOLSPREFIX TRUE
auto.purge true
comment Holds a number series that can be used to join onto for calculations
immutable true
numFiles 5
orc.bloom.filter.columns number
orc.bloom.filter.fpp 0.05
orc.compress SNAPPY
orc.compress.size 262144
orc.create.index true
orc.row.index.stride 500000
orc.stripe.size 67108864
tag_created_by xxxxxxx
tag_created_date 3/03/2018
tag_release_environment xxxxxxx
tag_release_modified_by xxxxxxx
tag_release_timestamp 18:46.5
totalSize 884634582
transient_lastDdlTime 1522819180
NULL NULL
# Storage Information NULL NULL
SerDe Library: org.apache.hadoop.hive.ql.io.orc.OrcSerde NULL
InputFormat: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat NULL
OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat NULL
Compressed: No NULL
Num Buckets: 1 NULL
Bucket Columns: [number] NULL
Sort Columns: [Order(col:number, order:1)] NULL
Storage Desc Params: NULL NULL
serialization.format 1
This works:
select number from hfe_reference.dim_number limit 10;
This doesn't: the query runs, but returns an empty result set:
select number from hfe_reference.dim_number where number < 5;
Other WHERE clauses have been tried as well, e.g. where number = 2, where number between 0 and 5
Why is the WHERE clause causing an empty result set?

Related

Any uses of allowing literal NULL with operators?

Some databases support using literal NULL as an operand while others do not. As an example:
SELECT 1 + NULL
Snowflake: null
BigQuery: error
MySQL: null
Postgres: null
SQLServer: null
I'm trying to determine how I should handle this in an application, and was wondering if there are ever any (valid) use cases for when it might be useful to have a literal null in an expression? This could also include testing.

Writing the expression 1 + NULL by itself is fairly meaningless, as we would expect it to always evaluate to NULL (except, apparently, on BigQuery, where it errors out). However, 1 + NULL could arise as the result of some other calculation. Consider the following data and query:
id | val
1 | NULL
2 | 5
2 | 10
3 | NULL
3 | 7
and the query:
SELECT id, 1 + SUM(val) AS total
FROM yourTable
GROUP BY id;
Here for id = 1 the aggregate total would evaluate to 1 + NULL, which would be NULL on most databases. One way around this would be to use COALESCE():
SELECT id, 1 + COALESCE(SUM(val), 0) AS total
FROM yourTable
GROUP BY id;
Now for id groups having only NULL values, we would replace that NULL sum by zero.
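To see the difference between the two queries without a database server at hand, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in here (an assumption on my part, the answer did not name an engine), but like the databases listed above it returns NULL for 1 + NULL, which Python shows as None.

```python
import sqlite3

# In-memory SQLite database used as a stand-in engine (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (id INTEGER, val INTEGER)")
conn.executemany(
    "INSERT INTO yourTable (id, val) VALUES (?, ?)",
    [(1, None), (2, 5), (2, 10), (3, None), (3, 7)],
)

# SUM over a group containing only NULLs is NULL, so 1 + SUM(val) is NULL too.
plain = conn.execute(
    "SELECT id, 1 + SUM(val) AS total FROM yourTable GROUP BY id ORDER BY id"
).fetchall()

# COALESCE replaces the NULL sum with 0 before the addition.
guarded = conn.execute(
    "SELECT id, 1 + COALESCE(SUM(val), 0) AS total "
    "FROM yourTable GROUP BY id ORDER BY id"
).fetchall()

print(plain)    # [(1, None), (2, 16), (3, 8)]
print(guarded)  # [(1, 1), (2, 16), (3, 8)]
```

Note that id = 3 is unaffected in both cases, because SUM skips the NULL row as long as the group has at least one non-NULL value.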

PostgreSQL csv import not working for only integer

I have the following problem using PostgreSQL 14 on Windows 10 with the latest updates.
I need to insert values into the following table.
CREATE TABLE StateList (
ID int GENERATED ALWAYS AS IDENTITY,
State_Number int NOT NULL,
ElectionGroup_ID INT NOT NULL,
Election_Number int NOT NULL,
UNIQUE (State_Number, ElectionGroup_ID, Election_Number),
PRIMARY KEY (ID)
);
I want to do the following command:
COPY StateList(Election_Number, State_Number, ElectionGroup_ID )
FROM '...\csvFileStateLists19.csv'
WITH (
FORMAT CSV,
DELIMITER ','
);
the "csvFileStateLists19" being
"19","9","4"
"19","5","238"
"19","5","21"
"19","15","1"
"19","5","10"
It worked fine for another table that used strings and integer.
But here I always get:
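ERROR: FEHLER: ungültige Eingabesyntax für Typ integer: »19«
(in English: invalid input syntax for type integer: "19")
CONTEXT: COPY statelist, Zeile 1, Spalte election_number: »19«
(in English: COPY statelist, line 1, column election_number: "19")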
SQL state: 22P02
That is usually a sign that the value is an empty string or not a number at all. But it isn't! It's 19, so why doesn't it work?
I generated the file in Java,
it is UTF-8 encoded,
and the database is "German_Germany.1252"
show client_encoding; => UNICODE
show server_encoding; => UTF8
SELECT pg_encoding_to_char(encoding) FROM pg_database WHERE datname = 'database1'; => UTF8
select pg_encoding_to_char(encoding), datcollate, datctype from pg_database where datname = 'database1';
Returns
"UTF8" "German_Germany.1252" "German_Germany.1252"
Thank you for your help!

With your input, I get the same error message, just in English rather than German. I tried it in Vertica, Stonebraker's successor to PostgreSQL, whose CSV parser works in much the same way:
COPY statelist FROM LOCAL 'st.csv' DELIMITER ',' EXCEPTIONS 'st.log';
-- error messages in "st.log"
-- COPY: Input record 1 has been rejected (Invalid integer format '"19"' for column 1 (State_Number)).
-- COPY: Input record 2 has been rejected (Invalid integer format '"19"' for column 1 (State_Number)).
-- COPY: Input record 3 has been rejected (Invalid integer format '"19"' for column 1 (State_Number)).
-- COPY: Input record 4 has been rejected (Invalid integer format '"19"' for column 1 (State_Number)).
-- COPY: Input record 5 has been rejected (Invalid integer format '"19"' for column 1 (State_Number)).
That's no wonder, really. "9" is a string literal, not an INTEGER literal. It's a VARCHAR(1) consisting of the digit character "9", not an INTEGER.
Try adding the ENCLOSED BY '"' clause. It worked for me:
COPY statelist FROM LOCAL 'st.csv' DELIMITER ',' ENCLOSED BY '"' EXCEPTIONS 'st.log';
-- out Rows Loaded
-- out -------------
-- out 5
SELECT * FROM statelist;
-- out State_Number | ElectionGroup_ID | Election_Number
-- out --------------+------------------+-----------------
-- out 19 | 5 | 10
-- out 19 | 5 | 21
-- out 19 | 5 | 238
-- out 19 | 9 | 4
-- out 19 | 15 | 1

Not an answer, just proof that double-quoted numeric values in a CSV are not the problem:
cat csv_test.csv
"19","9"
"19","5"
"19","5"
"19","15"
"19","5"
test(5432)=# \d csv_test
Table "public.csv_test"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
col1 | integer | | |
col2 | integer | | |
select * from csv_test;
col1 | col2
------+------
(0 rows)
\copy csv_test from 'csv_test.csv' with csv;
COPY 5
select * from csv_test;
col1 | col2
------+------
19 | 9
19 | 5
19 | 5
19 | 15
19 | 5
So now maybe we can get on with answers that solve the issue.
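As a rough illustration of the disagreement between the two replies above: a CSV-aware parser treats the double quotes as field delimiters and strips them, while a plain split on commas keeps them and then fails on the integer conversion. The sketch below uses Python's csv module as an analogy only; it is not PostgreSQL's or Vertica's parser, and naive_parse is a hypothetical helper.

```python
import csv
import io

# The five records from the question's file.
raw = (
    '"19","9","4"\n'
    '"19","5","238"\n'
    '"19","5","21"\n'
    '"19","15","1"\n'
    '"19","5","10"\n'
)

# CSV-aware parsing: quotes are removed, so every field converts cleanly.
rows = [[int(field) for field in record] for record in csv.reader(io.StringIO(raw))]
print(rows[0])  # [19, 9, 4]


def naive_parse(line):
    """Hypothetical quote-unaware parser: split on commas only."""
    return [int(field) for field in line.split(",")]


# Without quote handling the field is the 4-character string '"19"'.
try:
    naive_parse('"19","9","4"')
    naive_failed = False
except ValueError:
    naive_failed = True
print(naive_failed)  # True
```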

Datetime data type in a CASE expression impacts all other values on SQL Server 2016

I'm attempting to insert a few rows into a table in SQL Server 2016 using the SQL query below. A temp table is used to store the data being inserted, then a loop is used to insert multiple rows into the table.
--Declare temporary table and insert data into it
DECLARE @fruitTransactionData TABLE
(
    Category VARCHAR (30),
    Species VARCHAR (30),
    ArrivalDate DATETIME
)
INSERT INTO @fruitTransactionData ([Category], [Species], [ArrivalDate])
VALUES ('Fruit', 'Apple - Fuji', '2017-06-30')
--Go into loop for each FieldName (there will be 3 rows inserted)
DECLARE @IDColumn INT
SELECT @IDColumn = MIN(ID) FROM FieldNames
WHILE @IDColumn IS NOT NULL
BEGIN
    --Insert data into Transactions
    INSERT INTO [dbo].[Transactions] ([FieldName], [Result])
    SELECT
        (SELECT Name FROM FieldNames WHERE ID = @IDColumn),
        CASE
            WHEN @IDColumn = 1 THEN 1 --Result insert for FieldName 'Category' where ID=1 refers to 'Fruits'
            WHEN @IDColumn = 2 THEN 99 --Result insert for FieldName 'Species' where ID=99 refers to 'Apple - Fuji'
            WHEN @IDColumn = 3 THEN [data].[ArrivalDate] --Result insert for FieldName 'Date'
            ELSE NULL
        END
    FROM
        @fruitTransactionData [data]
    --Once a row has been inserted for one FieldName, then move to the next one
    SELECT @IDColumn = MIN(ID)
    FROM FieldNames
    WHERE ID > @IDColumn
END
The rows are inserted, but every Result value shows up as a date, even the ones that weren't meant to be dates.
+-----+------------+---------------------+
| ID | FieldName | Result |
+-----+------------+---------------------+
| 106 | Category | Jan 2 1900 12:00AM |
| 107 | Species | Apr 10 1900 12:00AM |
| 108 | Date | Jun 30 2017 12:00AM |
+-----+------------+---------------------+
If I comment out the row insert of the date, the columns display correctly.
+-----+------------+--------+
| ID | FieldName | Result |
+-----+------------+--------+
| 109 | Category | 1 |
| 110 | Species | 99 |
+-----+------------+--------+
It seems that including the date branch converts all the result values to datetime (e.g. Jan 2 1900 12:00AM is the datetime conversion of the number 1).
The result I'm trying to get as opposed to the above results is this:
+-----+------------+---------------------+
| ID | FieldName | Result |
+-----+------------+---------------------+
| 106 | Category | 1 |
| 107 | Species | 99 |
| 108 | Date | Jun 30 2017 12:00AM |
+-----+------------+---------------------+
Just for clarification, the Transactions table schema is as follows:
[ID] INT IDENTITY(1, 1) CONSTRAINT [PK_Transaction_ID] PRIMARY KEY,
[FieldName] VARCHAR(MAX) NULL,
[Result] VARCHAR(MAX) NULL

SQL Server has to pick a single return type for the whole CASE expression. It does this based on its internal precedence order for data types and the following CASE return type rule:
the highest precedence type from the set of types in
result_expressions and the optional else_result_expression.
Since int has a lower precedence than datetime, SQL Server chooses datetime as the return type, and the int results 1 and 99 are implicitly converted to datetime.
Explicitly converting every branch of the CASE expression to varchar solves the issue:
CASE WHEN @IDColumn = 1 THEN '1'
     WHEN @IDColumn = 2 THEN '99'
     WHEN @IDColumn = 3 THEN FORMAT([data].[ArrivalDate], 'MMM d yyyy h:mmtt')
     ELSE NULL
END
In case you are interested, SQL Server uses the following precedence order for data types:
user-defined data types (highest)
sql_variant
xml
datetimeoffset
datetime2
datetime
smalldatetime
date
time
float
real
decimal
money
smallmoney
bigint
int
smallint
tinyint
bit
ntext
text
image
timestamp
uniqueidentifier
nvarchar (including nvarchar(max) )
nchar
varchar (including varchar(max) )
char
varbinary (including varbinary(max) )
binary (lowest)
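To make the rule concrete, here is a small Python model of it. This is only a sketch of the precedence lookup using the list above, not SQL Server's actual implementation; case_result_type and int_as_datetime are hypothetical helper names. The second helper relies on the fact that SQL Server converts an int to datetime by counting days from 1900-01-01, which is where the dates in the question come from.

```python
from datetime import datetime, timedelta

# Highest precedence first, copied from the list above.
PRECEDENCE = [
    "user-defined", "sql_variant", "xml", "datetimeoffset", "datetime2",
    "datetime", "smalldatetime", "date", "time", "float", "real", "decimal",
    "money", "smallmoney", "bigint", "int", "smallint", "tinyint", "bit",
    "ntext", "text", "image", "timestamp", "uniqueidentifier", "nvarchar",
    "nchar", "varchar", "char", "varbinary", "binary",
]


def case_result_type(branch_types):
    """Model of the CASE rule: the branch type with the highest precedence wins."""
    return min(branch_types, key=PRECEDENCE.index)


def int_as_datetime(days):
    """Model of int -> datetime conversion: days counted from 1900-01-01."""
    return datetime(1900, 1, 1) + timedelta(days=days)


# The question's CASE has two int branches and one datetime branch.
print(case_result_type(["int", "int", "datetime"]))         # datetime
# After the fix every branch is a string, so the result stays varchar.
print(case_result_type(["varchar", "varchar", "varchar"]))  # varchar

print(int_as_datetime(1).date().isoformat())   # 1900-01-02
print(int_as_datetime(99).date().isoformat())  # 1900-04-10
```

The last two lines match the "Jan 2 1900" and "Apr 10 1900" values that showed up in the Result column.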

The values are being converted because all branches of a CASE expression must return the same data type. I think you want to convert the date to a string (and the numbers too).
SELECT
(SELECT Name FROM FieldNames WHERE ID=@IDColumn)
,CASE WHEN @IDColumn=1 THEN '1' --Result insert for FieldName 'Category' where ID=1 refers to 'Category Fruits'
WHEN @IDColumn=2 THEN '99' --Result insert for FieldName 'Species' where ID=99 refers to 'Apple - Fuji'
WHEN @IDColumn=3 THEN convert(varchar(MAX), [data].[ArrivalDate], 23) --Result insert for FieldName 'Date'
ELSE null
END
FROM @fruitTransactionData [data]

This is too long for a comment.
What you are trying to do just doesn't make sense. The columns you are inserting into are defined by:
INSERT INTO [dbo].[Transactions]([FieldName], [Result])
---------------------------------^ -----------^
The INSERT is inserting rows with two values, one for "FieldName" and the other for "Result".
So the SELECT portion should return two columns, no more, no fewer. Your SELECT appears to have four. Admittedly, the first two are syntactically incorrect CASE expressions, so the count might be off.
It is totally unclear to me what you want to do, so I can't make a more positive suggestion.

PostgreSQL IS NULL and length

I am trying to get all records from a table where a specific column is NULL. But I am not getting any records. On the other hand, there are many records where the length(field) is indeed 0.
select count(*) from article a where length(for_interest) =0;
count
-------
9
(1 row)
select count(*) from article a where for_interest is NULL ;
count
-------
0
(1 row)
Is there something about NULLs that I didn't get right? More info:
select count(*) from article AS a where for_interest is NOT NULL ;
count
-------
15
(1 row)
select count(*) from article ;
count
-------
15
(1 row)
PostgreSQL version is 9.3.2.
Adding sample data, table description etc (new sample table created with just 2 records for this)
test=# \d sample_article
Table "public.sample_article"
Column | Type | Modifiers
--------------+------------------------+-----------
id | integer |
title | character varying(150) |
for_interest | character varying(512) |
test=# select * from sample_article where length(for_interest)=0;
id | title | for_interest
----+-----------------------------------------------------------------+--------------
8 | What is the Logic Behind the Most Popular Interview Questions? |
(1 row)
test=# select * from sample_article where for_interest IS NOT NULL;
id | title | for_interest
----+-----------------------------------------------------------------+--------------
7 | Auto Expo 2014: Bike Gallery | Wheels
8 | What is the Logic Behind the Most Popular Interview Questions? |
(2 rows)
test=# select * from sample_article where for_interest IS NULL;
id | title | for_interest
----+-------+--------------
(0 rows)

Character types can hold the empty string '', which is not a NULL value.
The length of an empty string is 0. The length of a NULL value is NULL.
Most functions return NULL for NULL input.
SELECT length(''); --> 0
SELECT length(NULL); --> NULL
SELECT NULL IS NULL; --> TRUE
SELECT '' IS NULL; --> FALSE
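The same distinction can be reproduced on the two sample rows from the question. The sketch below uses Python's sqlite3 module as a stand-in for PostgreSQL (an assumption; both treat '' and NULL as different values), and the count helper is hypothetical. If empty strings should be treated like missing values, NULLIF(for_interest, '') IS NULL is one way to match both cases.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample_article (id INTEGER, title TEXT, for_interest TEXT)"
)
conn.executemany(
    "INSERT INTO sample_article VALUES (?, ?, ?)",
    [
        (7, "Auto Expo 2014: Bike Gallery", "Wheels"),
        (8, "What is the Logic Behind the Most Popular Interview Questions?", ""),
    ],
)


def count(where):
    """Hypothetical helper: count rows matching a fixed WHERE condition."""
    sql = f"SELECT count(*) FROM sample_article WHERE {where}"
    return conn.execute(sql).fetchone()[0]


empty_count = count("length(for_interest) = 0")           # empty strings
null_count = count("for_interest IS NULL")                # real NULLs
either_count = count("NULLIF(for_interest, '') IS NULL")  # empty or NULL

print(empty_count, null_count, either_count)  # 1 0 1
```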

I'm trying to make an INSERT INTO with a SELECT and values

I'm trying to do an INSERT INTO with a SELECT and literal values, but it doesn't work.
TABLE SOURCE:
CREATE TABLE "MICV_PRE"."TS$SEQUENCES"
(
"ID_NODE" NUMBER DEFAULT 1 NOT NULL ENABLE,
"ID_TASK" NUMBER DEFAULT 1 NOT NULL ENABLE,
"ID_DOCUMENT" NUMBER DEFAULT 1 NOT NULL ENABLE,
"ID_WORD" NUMBER DEFAULT 1 NOT NULL ENABLE,
"ID_TEAM" NUMBER DEFAULT '1' NOT NULL ENABLE)
TABLE TO MODIFY:
CREATE TABLE TS$SEQUENCES_NEW(
"ID_CODE" VARCHAR(255 CHAR) NOT NULL ENABLE,
"CODE_SUBSEQUENCE" VARCHAR2(255 CHAR) NOT NULL ENABLE,
"VALUE" NUMBER(10,0) NOT NULL ENABLE
);
table source:
id_task | id_node | id_word
10 | 20 | 30
table to modify:
id_code | code_subsequence | value
"id_task" | "empty" | 10
"id_node" | "empty" | 20
"id_word" | "empty" | 30

So, the SQL you tried is this:
SQL> INSERT INTO TS$SEQUENCES_NEW
2 SELECT TS$SEQUENCES.ID_TASK AS "VALUE", 'ID_TASK' AS "ID_CODE", 'VACIO' AS "CODE_SUBSEQUENCE"
3 FROM TS$SEQUENCES
4 /
*
ERROR at line 2:
ORA-01722: invalid number
SQL>
This fails because the datatypes in the projection of the query don't match the order of the columns in the table. So either change the SELECT statement or define the order in the INSERT clause:
SQL> INSERT INTO TS$SEQUENCES_NEW ("VALUE", "ID_CODE","CODE_SUBSEQUENCE" )
2 SELECT TS$SEQUENCES.ID_TASK AS "VALUE", 'ID_TASK' AS "ID_CODE", 'VACIO' AS "CODE_SUBSEQUENCE"
3 FROM TS$SEQUENCES
4 /
1 row created.
SQL>

Try this:
INSERT INTO TS$SEQUENCES_NEW (VALUE, ID_CODE, CODE_SUBSEQUENCE)
SELECT TS$SEQUENCES.ID_TASK AS "VALUE", 'ID_TASK' AS "ID_CODE", 'VACIO' AS "CODE_SUBSEQUENCE"
FROM TS$SEQUENCES;
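Both answers insert a single row, while the desired output in the question has one row per source column. One possible way to get there, sketched here under assumptions, is to keep the explicit column list and combine one SELECT per column with UNION ALL. The example runs on SQLite through Python's sqlite3 module rather than Oracle, uses simplified column types, and takes the 'empty' placeholder from the question's sample output.

```python
import sqlite3

# In-memory SQLite database standing in for Oracle (assumption); the tables
# are reduced to the columns that appear in the question's sample data.
conn = sqlite3.connect(":memory:")
conn.executescript('''
CREATE TABLE "TS$SEQUENCES" (
    ID_NODE INTEGER NOT NULL,
    ID_TASK INTEGER NOT NULL,
    ID_WORD INTEGER NOT NULL
);
CREATE TABLE "TS$SEQUENCES_NEW" (
    ID_CODE TEXT NOT NULL,
    CODE_SUBSEQUENCE TEXT NOT NULL,
    "VALUE" INTEGER NOT NULL
);
INSERT INTO "TS$SEQUENCES" (ID_TASK, ID_NODE, ID_WORD) VALUES (10, 20, 30);

-- Explicit column list, so the SELECT order no longer has to match the
-- table definition; one SELECT per source column, glued with UNION ALL.
INSERT INTO "TS$SEQUENCES_NEW" (ID_CODE, CODE_SUBSEQUENCE, "VALUE")
SELECT 'id_task', 'empty', ID_TASK FROM "TS$SEQUENCES"
UNION ALL
SELECT 'id_node', 'empty', ID_NODE FROM "TS$SEQUENCES"
UNION ALL
SELECT 'id_word', 'empty', ID_WORD FROM "TS$SEQUENCES";
''')

rows = conn.execute(
    'SELECT ID_CODE, CODE_SUBSEQUENCE, "VALUE" '
    'FROM "TS$SEQUENCES_NEW" ORDER BY "VALUE"'
).fetchall()
print(rows)
# [('id_task', 'empty', 10), ('id_node', 'empty', 20), ('id_word', 'empty', 30)]
```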
