I am just playing around with Athena, and I tried following this link
https://awsfeed.com/whats-new/big-data/use-ml-predictions-over-amazon-dynamodb-data-with-amazon-athena-ml
Create an Athena table with geospatial data of neighborhood boundaries
I followed the code from the sample, plus looking at the picture in the article.
However, this is where I ran into issues, and I had to change the code based on the error messages Athena was giving me. The current error is: mismatched input 'STORED'. Expecting: <EOF
FROM WEBSITE -
CREATE EXTERNAL TABLE <table name
"objectid" int,
"nh_code" int,
"nh_name" string,
"shapearea" double,
"shapelen" double,
"bb_west" double,
"bb_south" double,
"bb_east" double,
"bb_north" double,
"shape" string,
"cog_longitude" double,
"cog_latitude" double)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
I kept getting errors around ROW FORMAT and have tweaked it below
WITH (ROW = DELIMITED
,FIELDS = '\t'
,LINES = '\n'
)
STORED INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
The error messages started at ROW, which is why I edited it as above. Now the error relates to STORED, so perhaps the changes I made were necessary; I am not sure. I am not very good with Athena, so I was just following the guide and hoping it would work. Any suggestions on what I am doing wrong?
Thanks.
You have a syntax error in your SQL; the first line should be:
CREATE EXTERNAL TABLE table_name (
There is a stray < in your example, table names can't have spaces, and there should be a ( to start the list of columns.
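Putting those fixes together, the DDL from the article should look roughly like this. This is only a sketch: the table name and the S3 LOCATION are placeholders, and the ROW FORMAT / STORED AS clauses are the original ones from the article, which are valid once the column list is opened with a parenthesis.
CREATE EXTERNAL TABLE neighborhoods (
  objectid int,
  nh_code int,
  nh_name string,
  shapearea double,
  shapelen double,
  bb_west double,
  bb_south double,
  bb_east double,
  bb_north double,
  shape string,
  cog_longitude double,
  cog_latitude double
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://your-bucket/path/';  -- placeholder bucket/prefix; Athena requires a LOCATION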
I'm trying to load some data from the staging to the relational environment, and something is happening that I can't figure out.
I'm trying to run the following query:
SELECT
CAST(SPLIT_PART(some_field,'_',2) AS BIGINT) cmt_par
FROM
public.some_table;
The some_field is a column that has data with two numbers joined by an underscore like this:
some_field -> 38972691802309_48937927428392
And I'm trying to get the second part.
That said, here is the error I'm getting:
[Amazon](500310) Invalid operation: Invalid digit, Value '1', Pos 0,
Type: Long
Details:
-----------------------------------------------
error: Invalid digit, Value '1', Pos 0, Type: Long
code: 1207
context:
query: 1097254
location: :0
process: query0_99 [pid=0]
-----------------------------------------------;
Execution time: 2.61s
Statement 1 of 1 finished
1 statement failed.
It's literally saying some numbers are not valid digits. I've already tried to look at the exact data that is throwing the error, and it appears to be a normal field, just as I was expecting. It happens even if I throw out NULL fields.
I thought it would be an encoding error, but I've not found any references to solve that.
Does anyone have any idea?
Thanks everybody.
I just ran into this problem and did some digging. Seems like the error Value '1' is the misleading part, and the problem is actually that these fields are just not valid as numeric.
In my case they were empty strings. I found the solution to my problem in this blogpost, which is essentially to find any fields that aren't numeric, and fill them with null before casting.
-- Treat anything that is not purely digits as NULL, then cast the rest
select cast(colname as integer)
from (
  select case when colname ~ '^[0-9]+$' then colname
              else null
         end as colname
  from tablename
) cleaned;  -- the derived table needs an alias in Redshift
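Applied to the query from the question, the same guard might look like this (just a sketch):
SELECT CAST(CASE WHEN SPLIT_PART(some_field, '_', 2) ~ '^[0-9]+$'
                 THEN SPLIT_PART(some_field, '_', 2)
                 ELSE NULL
            END AS BIGINT) AS cmt_par
FROM public.some_table;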
Bottom line: this Redshift error is completely confusing and really needs to be fixed.
When you are using a Glue job to upsert data from any data source to Redshift:
Glue may rearrange the data before copying, which can cause this issue. This happened to me even after using ApplyMapping.
In my case, the data types were not the issue at all; in the source they were typecast to exactly match the fields in Redshift.
Glue was rearranging the columns into alphabetical order of their names before copying the data into the Redshift table (which obviously throws an error, because my first column is an integer ID key, unlike the other string columns).
To fix the issue, I used a SQL query transform within Glue to run a SELECT with the columns in the correct order for the target table.
It's odd that Glue did this even after ApplyMapping, but the workaround helped.
For example: the source table has fields ID|EMAIL|NAME with values 1|abcd#gmail.com|abcd, and the target table also has fields ID|EMAIL|NAME. But when Glue upserts the data, it rearranges the columns by name before writing, so it tries to write abcd#gmail.com|1|abcd into ID|EMAIL|NAME. This throws an error because ID expects an int value and EMAIL expects a string. I used a SQL query transform with the query "SELECT ID, EMAIL, NAME FROM data" to put the columns back in the right order before writing the data.
Hmmm. I would start by investigating the problem. Are there any non-digit characters?
SELECT some_field
FROM public.some_table
WHERE SPLIT_PART(some_field, '_', 2) ~ '[^0-9]';
Is the value too long for a bigint?
SELECT some_field
FROM public.some_table
WHERE LEN(SPLIT_PART(some_field, '_', 2)) > 18;
A bigint tops out at 9223372036854775807 (19 digits), so if you need more digits of precision, consider a decimal rather than a bigint.
If you get an error message like "Invalid digit, Value 'O', Pos 0, Type: Integer", try running your COPY command without the header row: use the IGNOREHEADER parameter to skip the first line of the data file.
So the COPY command will look like below:
COPY orders FROM 's3://sourcedatainorig/order.txt' credentials 'aws_access_key_id=<your access key id>;aws_secret_access_key=<your secret key>' delimiter '\t' IGNOREHEADER 1;
For my Redshift SQL, I had to wrap my columns with Cast(col As Datatype) to make this error go away.
For example, casting my columns to Char with a specific length worked:
Cast(COLUMN1 As Char(xx)) = Cast(COLUMN2 As Char(xxx))
I have the following problem in Hive: I have a table stored as a textfile with all of its fields being of STRING type. I want to convert this table into an ORC table, but some of the STRING fields must be cast to DECIMAL with 3 decimal places. The problem is that the decimal separator is not already there in the initial string field, so I am looking for a way to tell Hive to put the decimal point 3 positions before the end of the string :-).
So my HiveSql commands look like this:
CREATE TABLE my_orc_table (entry1 STRING, entry2 DECIMAL(10,3)) STORED AS ORC;
INSERT INTO TABLE my_orc_table SELECT * FROM my_text_table;
So the problem is that if I have 00050000 in entry2 of my text table, I want to obtain 50.0 in my ORC table. At the moment I get 50000 (I suppose Hive puts the decimal point at the end of my string, which is quite logical, but not what I am looking for).
I tried to google a bit but I did not really find the solution.
Thank you :-) !
What about..
select cast(entry2 AS DECIMAL)/1000.0
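Building on that, the full insert from the question might look like this (just a sketch using the table and column names from the question):
INSERT INTO TABLE my_orc_table
SELECT entry1,
       -- cast the zero-padded string to a number, then shift the decimal point 3 places
       CAST(CAST(entry2 AS DECIMAL) / 1000.0 AS DECIMAL(10,3))
FROM my_text_table;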
I am loading a csv file to create a new table with a column containing a decimal value of 1.449043781.
Here's my code
CREATE TABLE table (
v1 float
);
Postgres spits out an error saying "invalid input syntax for type numeric", even though the value is a float. I have tried changing the data type declaration to decimal(15,13), to no avail. What am I missing here?
Thank you for your input.
Can't reproduce - copies without errors on 9.6:
t=# CREATE TABLE t (
v1 float
);
CREATE TABLE
t=# copy t from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 1.449043781
>> \.
COPY 1
t=# select v1,pg_typeof(v1) from t;
     v1      |    pg_typeof
-------------+------------------
 1.449043781 | double precision
(1 row)
Also, from your error it looks like you created the table with numeric, not float. They are not the same type (though both would accept 1.449043781).
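If you want to double-check what type the column actually ended up with, one option is to query the catalog (a sketch, using the table and column names from the session above):
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 't' AND column_name = 'v1';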
I am working with a LONG column in Oracle SQL Developer, and this column contains carriage returns that need to be removed. The error I'm getting after using
REPLACE(col_name, CHR(13), '')
is:
ORA-00932: inconsistent datatypes: expected CHAR got LONG
00932. 00000 - "inconsistent datatypes: expected %s got %s"
Is there a workaround for this ?
Answers or suggestions will be much appreciated!
You will not be able to do much of anything with LONG columns; you should convert them to CLOB. You just found one of the reasons why.
https://docs.oracle.com/cd/B28359_01/appdev.111/b28393/adlob_long_lob.htm
There is a to_lob() function to convert LONG to CLOB, but that can only be used in the select portion of an insert statement (that is, it can only be used to convert a LONG column to a CLOB column). After the conversion, you should have no problems using text functions on the resulting CLOB. You may also want to look at CLOB-specific functions in the DBMS_LOB package:
http://docs.oracle.com/cd/E11882_01/appdev.112/e40758/d_lob.htm#ARPLS600
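A minimal sketch of the conversion, assuming a hypothetical source table src_t with an id column and the LONG column long_col (adapt the names to your schema):
-- New table with a CLOB column to receive the converted data
CREATE TABLE src_t_clob (id NUMBER, clob_col CLOB);

-- TO_LOB is allowed in the select list of an INSERT ... SELECT
INSERT INTO src_t_clob (id, clob_col)
SELECT id, TO_LOB(long_col) FROM src_t;

-- Once the data is in a CLOB, REPLACE works as expected
UPDATE src_t_clob
SET clob_col = REPLACE(clob_col, CHR(13), '');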
I am new to Hadoop. I am trying to create an EXTERNAL table in Hive.
The following is the query I am using:
CREATE EXTERNAL TABLE stocks (
exchange STRING,
symbol STRING,
ymd STRING,
price_open FLOAT,
price_high FLOAT,
price_low FLOAT,
price_close FLOAT,
volume INT,
price_adj_close FLOAT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 'hdfs:///data/stocks'
I am getting an error:
ParseException cannot recognize input near 'exchange' 'STRING' ',' in column specification
What am I missing? I tried reading the command - I don't think I am missing anything.
exchange is a reserved keyword in Hive, so you can't use it as a column name. If you want to keep the name, just wrap it in backticks: `exchange`.
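For example, the DDL from the question should work once the column name is quoted (a sketch reusing the question's columns and location):
CREATE EXTERNAL TABLE stocks (
  `exchange` STRING,
  symbol STRING,
  ymd STRING,
  price_open FLOAT,
  price_high FLOAT,
  price_low FLOAT,
  price_close FLOAT,
  volume INT,
  price_adj_close FLOAT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 'hdfs:///data/stocks';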
exchange is a reserved keyword in Hive, so alternatively use a different column name in its place, for example:
CREATE TABLE stocks (
  exchange1 STRING,
  stock_symbol STRING,
  stock_date STRING,
  stock_price_open DOUBLE,
  stock_price_high DOUBLE,
  stock_price_low DOUBLE,
  stock_price_close DOUBLE,
  stock_volume DOUBLE,
  stock_price_adj_close DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';