Decimal input gets rounded when creating an external table in Hive from CSV

I am creating a table in Hive from a CSV (comma-separated) file that I have in HDFS. It has three columns: two strings and one decimal (with at most 18 digits after the decimal point and one before). Here is what I do:
CREATE EXTERNAL TABLE IF NOT EXISTS my_table(
col1 STRING, col2 STRING, col_decimal DECIMAL(19,18))
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/hdfs_path/folder/';
The values in col_decimal are rounded to 0 when they are below 1, and come out as 1 when the value has no fractional part (all my numbers are between 0 and 1).
Any idea what is wrong? I have also tried DECIMAL(38,37).
Thank you.

Related

SQL Server Not Recognizing Leading Spaces in Select Query

I get a TXT file from one of our source systems each night. It's basically a terminal report with headers, footers, titles, column headers, sub-totals, etc. I'm trying to scrape discrete data elements from the file using SQL Server. The file is being FTP'd to a Windows file share. The source system is AIX and the file's encoding is UTF-8, with an EOL marker of LF. I'm using SSIS to import the raw text report into a single-column table, with each report row being one record in my table. The column I'm storing the rows in is a VARCHAR(240) and I'm using SQL Server 2016.
For the report rows that I want to use, the one thing they have in common is that they all start with two spaces. Here's an example record from a text report I've loaded to SQL:
COLUMN_NAME
  AD DEPT      0     0     0   0     0     0    0     0    0   0.0   0      0  0.00  0.00      0      0      0   0.0
When I try to select the record using:
SELECT *
FROM TABLE_NAME
WHERE COLUMN_NAME LIKE ' %';
No rows are returned in my result set. However, REPLACE seems to recognize that the row starts with two spaces.
So this:
SELECT REPLACE([COLUMN_NAME],' ','$')
FROM TABLE_NAME
Returns this:
COLUMN_NAME
$$AD$DEPT$$$$$$0$$$$$0$$$$$0$$$0$$$$$0$$$$$0$$$$0$$$$$0$$$$0$$$0.0$$$0$$$$$$0$$0.00$$0.00$$$$$$0$$$$$$0$$$$$$0$$$0.0
Can someone help me understand why REPLACE sees that there are two leading spaces in the row but the plain SELECT does not?

If you know that the column starts with two spaces, you can use single-character wildcards in your LIKE expression.
For example:
CREATE TABLE testString (
sampleImport varchar(20)
)
GO
INSERT testString
VALUES ('  AD DEPT 0 0 0'), ('BD DEPT 0 0 0')
GO
SELECT *
FROM testString
WHERE sampleImport LIKE '[ ][ ]%'
GO
SELECT *
FROM testString
WHERE sampleImport NOT LIKE '[ ][ ]%'
GO
The [] wildcard matches a single character: any one of the characters enclosed in the brackets. Placing a single space within the brackets therefore matches exactly one space.
I also noted that your criteria had only a single space before the % character. Although I cannot see it documented as such, I suspect that your version is failing as it is not seeing a leading space as a valid character for the wildcard (although by definition it should). When using LIKE ' %' it works with my test data.

Hive array specifying multiple delimiter in collection

I have a dataset that contains two arrays, each separated by a different delimiter.
Example: 14-20-50-60 is the first array, separated by -
12#2#333#4 is the second array, separated by #
When creating the table, how do we specify the delimiters in
COLLECTION ITEMS TERMINATED BY '' ?
input
14-20-50-60,12#2#333#4
create table test(first array<string>, second array<string>)
row format delimited
fields terminated by ','
collection items terminated by '-' (How to specify two delimiters in the collection)

You cannot use multiple delimiters for the collection items. You can achieve what you are trying to do as below though. I have used the SPLIT function to create the array using different delimiters.
Data
14-20-50-60,12#2#333#4
SQL - CREATE TABLE
create external table test1(first string, second string)
row format delimited
fields terminated by ','
LOCATION '/user/cloudera/ramesh/test1';
SQL - SELECT
WITH v_test_array AS
(SELECT split(first, "-") AS first_array,
split(second, "#") AS second_array
FROM test1)
SELECT first_array[0], second_array[0]
FROM v_test_array;
OUTPUT
14 12
Hope this helps.
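For readers who want to check the splitting logic outside Hive, here is a minimal Python sketch of the same idea: read each field as a plain string, then split each one on its own delimiter. The function name parse_row is hypothetical. Python's str.split treats the delimiter literally, whereas Hive's split() takes a regular expression, so this only illustrates the approach.

```python
def parse_row(line):
    """Split one CSV line into two arrays that use different delimiters."""
    # Stage 1: split on the field delimiter, as the table definition does.
    first, second = line.split(",")
    # Stage 2: split each field on its own collection delimiter.
    return first.split("-"), second.split("#")


first_array, second_array = parse_row("14-20-50-60,12#2#333#4")
print(first_array[0], second_array[0])  # prints: 14 12
```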

invalid input syntax for integer with postgres

I have a table:
id | detail
1 | ddsffdfdf ;df, deef,"dgfgf",/dfdf/
When I ran insert into details values(1,'ddsffdfdf ;df, deef'); the row was inserted properly.
When I copied that inserted value from the database to a file, the file contained: 1 ddsffdfdf ;df, deef
Then I loaded the whole CSV file into the PostgreSQL database, with values in the format: 1 ddsffdfdf ;df, deef
I got ERROR: invalid input syntax for integer: "1 ddsffdfdf ;df, deef". How can I solve this problem?

CSVs need a delimiter that Postgres will recognize to break the text into respective fields. Your delimiter is a space, which is insufficient. Your CSV file should look more like:
1,"ddsffdfdf ;df, deef"
And your SQL should look like:
COPY details FROM 'filename' WITH CSV;
The WITH CSV is important because it tells Postgres to use a comma as the delimiter and to parse your values based on that. Because your second field contains a comma, you want to enclose its value in quotes so that its comma is not mistaken for a delimiter.
To look at a good example of a properly formatted CSV file, you can output your current table:
COPY details TO '/your/filename.csv' WITH CSV;
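To see why the quotes matter, the following Python sketch (an illustration only, not part of Postgres) uses the standard csv module to write and re-read the row from the question. With its default settings the writer quotes any field that contains the delimiter, which is the same convention the quoted CSV line above relies on.

```python
import csv
import io

row = [1, "ddsffdfdf ;df, deef"]

# Write the row as CSV; the second field contains a comma, so it gets quoted.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
csv_line = buffer.getvalue().strip()
print(csv_line)

# Reading the line back yields exactly two fields, not three.
parsed = next(csv.reader(io.StringIO(csv_line)))
print(parsed)
```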

SQL loader to load data into specific column of a table

Recently started working on SQL Loader, enjoying the way it works.
We are stuck on a problem: we have to load all the columns from a CSV file (say, 10 columns in Excel), but the destination table contains around 15 fields.
FILLER works when you want to skip columns in the source file, but I am unsure what to do here.
Using a staging table helps, but is there any alternative?
Any help is really appreciated.
thanks.

You have to specify the columns in the control file.
Recommended reading: SQL*Loader Control File Reference
10 The remainder of the control file contains the field list, which provides information about column formats in the table being loaded. See Chapter 6 for information about that section of the control file.
Excerpt from Chapter 6:
Example 6-1 Field List Section of Sample Control File
1 (hiredate SYSDATE,
2 deptno POSITION(1:2) INTEGER EXTERNAL(2)
NULLIF deptno=BLANKS,
3 job POSITION(7:14) CHAR TERMINATED BY WHITESPACE
NULLIF job=BLANKS "UPPER(:job)",
mgr POSITION(28:31) INTEGER EXTERNAL
TERMINATED BY WHITESPACE, NULLIF mgr=BLANKS,
ename POSITION(34:41) CHAR
TERMINATED BY WHITESPACE "UPPER(:ename)",
empno POSITION(45) INTEGER EXTERNAL
TERMINATED BY WHITESPACE,
sal POSITION(51) CHAR TERMINATED BY WHITESPACE
"TO_NUMBER(:sal,'$99,999.99')",
4 comm INTEGER EXTERNAL ENCLOSED BY '(' AND '%'
":comm * 100"
)
In this sample control file, the numbers that appear to the left would not appear in a real control file. They are keyed in this sample to the explanatory notes in the following list:
1 SYSDATE sets the column to the current system date. See Setting a Column to the Current Date.
2 POSITION specifies the position of a data field. See Specifying the Position of a Data Field.
INTEGER EXTERNAL is the datatype for the field. See Specifying the Datatype of a Data Field and Numeric EXTERNAL.
The NULLIF clause is one of the clauses that can be used to specify field conditions. See Using the WHEN, NULLIF, and DEFAULTIF Clauses.
In this sample, the field is being compared to blanks, using the BLANKS parameter. See Comparing Fields to BLANKS.
3 The TERMINATED BY WHITESPACE clause is one of the delimiters it is possible to specify for a field. See TERMINATED Fields.
4 The ENCLOSED BY clause is another possible field delimiter. See Enclosed Fields.

Import Data Into Hive Containing Whitespace

I am importing data from a csv file into Hive. My table contains both strings and ints. However, in my input file, the ints have whitespace around them, so it kind of looks like this:
some string, 2 ,another string , 7 , yet another string
Unfortunately I cannot control the formatting of the program providing the file.
When I import the data using (e.g.):
CREATE TABLE MYTABLE(string1 STRING, alpha INT, string2 STRING, beta INT, string3 STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
Then all my integers get set to NULL. I am assuming this is because the extra whitespace makes the parsing fail. Is there a way around this?

You can perform a multi-stage import. In the first stage, save all of your data as STRING; in the second stage, use trim() to remove the whitespace and then save the data as INT. You could also look into using Pig to read the data from your source files as raw text and then write it to Hive with the correct data types.
Edit
You can also do this in one pass if you can point to your source file as an external table.
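-- Read every field as STRING first; the casts happen in the INSERT below.
-- Note: Hive expects LOCATION to be the directory that holds the file.
CREATE EXTERNAL TABLE myTable(
string1 STRING, alpha STRING, string2 STRING, beta STRING, string3 STRING
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION '\\server\path\file.csv';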
INSERT INTO myOtherTable
SELECT string1,
CAST(TRIM(alpha) AS INT),
string2,
CAST(TRIM(beta) AS INT),
string3
FROM myTable;
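The trim-then-cast step can be sketched in Python to show why the padded values fail and the trimmed ones succeed. The helper strict_int is hypothetical: it stands in for a parser that rejects surrounding whitespace and returns None (like a NULL), which is what the question assumes Hive is doing. Python's own int() tolerates such whitespace, so the sketch validates the text explicitly.

```python
import re

# Accept only an optional sign followed by ASCII digits, with no whitespace.
INT_PATTERN = re.compile(r"[+-]?[0-9]+")


def strict_int(text):
    """Return the integer value of text, or None if it is not a bare integer."""
    return int(text) if INT_PATTERN.fullmatch(text) else None


line = "some string, 2 ,another string , 7 , yet another string"
fields = line.split(",")

# Parsing the raw fields fails because of the surrounding spaces.
raw = [strict_int(fields[1]), strict_int(fields[3])]
# Trimming first, as CAST(TRIM(...) AS INT) does, makes the parse succeed.
trimmed = [strict_int(fields[1].strip()), strict_int(fields[3].strip())]

print(raw)      # [None, None]
print(trimmed)  # [2, 7]
```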
