I tried to connect Presto to S3 using FileHiveMetaStore with the configurations below, but when I try to create a table with the statement mentioned it fails with the error message shown. Could anyone let me know if the configurations are wrong?
I can see that this is possible, as someone has already described connecting this way.
Reference thread: Setup Standalone Hive Metastore Service For Presto and AWS S3
Error message: com.amazonaws.services.s3.model.AmazonS3Exception: The specified bucket does not exist (Service: Amazon S3; Status Code: 404; Error Code: NoSuchBucket; Request ID: 33F01AA7477B12FC)
connector.name=hive-hadoop2
hive.metastore=file
hive.metastore.catalog.dir=s3://ap-south-1.amazonaws.com/prestos3test/
hive.s3.aws-access-key=yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
hive.s3.aws-secret-key=zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz
hive.s3.endpoint=http://prestos3test.s3-ap-south-1.amazonaws.com
hive.s3.ssl.enabled=false
hive.metastore.uri=thrift://localhost:9083
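The NoSuchBucket error can also be checked outside of Presto. Below is a rough boto3 sketch (assuming the same access/secret keys above are available locally) that verifies the bucket name used for the table location and reports the region S3 has it in:

import boto3
from botocore.exceptions import ClientError

# Bucket name and region taken from the configuration above.
bucket = "prestos3test"
s3 = boto3.client("s3", region_name="ap-south-1")

try:
    s3.head_bucket(Bucket=bucket)
    # LocationConstraint is None for us-east-1, otherwise the region name.
    region = s3.get_bucket_location(Bucket=bucket)["LocationConstraint"]
    print(f"Bucket '{bucket}' exists in region {region or 'us-east-1'}")
except ClientError as err:
    # A 404 here corresponds to the NoSuchBucket error Presto reports.
    print(f"Cannot reach bucket '{bucket}': {err.response['Error']['Code']}")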
External Table Creation
CREATE TABLE PropData (
prop0 integer,
prop1 integer,
prop2 varchar,
prop3 varchar ,
prop4 varchar
)
WITH (
format = 'ORC',
external_location = 's3://prestos3test'
)
Thanks
Santosh
I got help from other quarters and thought it would be helpful to others, so I am documenting the necessary configuration below. Note that hive.s3.endpoint is no longer set and hive.metastore.catalog.dir now points directly at the bucket.
connector.name=hive-hadoop2
hive.metastore=file
hive.metastore.catalog.dir=s3://prestos3test/
hive.s3.aws-access-key=yyyyyyyyyyyyyyyyyy
hive.s3.aws-secret-key=zzzzzzzzzzzzzzzzzzzzzz
hive.s3.ssl.enabled=false
hive.metastore.uri=thrift://localhost:9083
Thanks
Santosh
Related
I have been trying to copy data to a table in my Coginity Pro, but I get the error message below.
I have copied my IAM role ARN from Redshift and pasted it into the relevant place, but I still could not load the sample data into the tables already created in Coginity Pro.
Below is the error message:
Status: ERROR
copy users from 's3://awssampledbuswest2/tickit/allusers_pipe.txt'
credentials 'aws_iam_role='
delimiter '|' region 'us-west-2'
36ms 2022-11-28T02:23:51.059Z
(SQLSTATE: 08006, SQLCODE: 0): An I/O error occurred while sending to the backend.
@udemeribe, please check the STL_LOAD_ERRORS table (order the rows by starttime).
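For example, something along these lines pulls the most recent rows from STL_LOAD_ERRORS (a rough sketch using psycopg2; the host, database, user and password are placeholders for your own cluster details):

import psycopg2

# Placeholder connection details - replace with your own cluster endpoint and credentials.
conn = psycopg2.connect(
    host="my-cluster.xxxxxxxx.us-west-2.redshift.amazonaws.com",
    port=5439,
    dbname="dev",
    user="awsuser",
    password="********",
)

with conn.cursor() as cur:
    # Most recent load errors first; err_reason explains why a row was rejected.
    cur.execute("""
        SELECT starttime, filename, line_number, colname, err_reason
        FROM stl_load_errors
        ORDER BY starttime DESC
        LIMIT 10;
    """)
    for row in cur.fetchall():
        print(row)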
I'm trying to set up a sample cluster with AsterixDB on my M1 Mac. I have my environment up and running and I am able to successfully run SQL queries with the following code:
drop dataverse csv if exists;
create dataverse csv;
use csv;
create type csv_type as {
lat: int32,
long: int32
};
create dataset csv_set (csv_type)
primary key lat;
However, when I try to load the dataset with a CSV file it seems to brick my sample cluster and throws the error: Error Code: 1 "HYR0010: Node asterix_nc2 does not exist". The code which causes this is below.
use csv;
load dataset csv_set using localfs
(("path"="127.0.0.1:///Users/nicholassantini/Downloads/test.csv"),
("format"="delimited-text"));
Thus far I have tried both Java 18 (the newest release) and 17.0.3, as well as a variety of ports for the queries. I'm not sure what else to try. Some logs that I think are relevant say that it is failing to connect to the node; I'm not sure if that's an issue with the port or the node itself. Here is a snippet of those logs:
[log screenshot omitted]
Also in case it matters, my CSV is a simple 2 column 2 row file with all single-digit integer values.
I appreciate any and all help.
After consulting the developer help email thread, I was able to find that the issue stems from the release of AsterixDB that I was using (0.9.7.1). Upgrading to the newest release (0.9.8) fixed the issue.
The link can be found here:
https://ci-builds.apache.org/job/AsterixDB/job/asterixdb-snapshot-integration/lastSuccessfulBuild/artifact/asterixdb/asterix-server/target/asterix-server-0.9.8-SNAPSHOT-binary-assembly.zip
I am trying to purge a partition of a Glue catalog table and then recreate the partition using the getSink option (similar to a truncate/load of a partition in a database).
For purging the partition, I am using the glueContext.purge_s3_path option with retention period = 0. The partition is getting purged successfully.
self._s3_path = "s3://server1/main/transform/Account/local_segment/source_system=SAP/"
# Delete everything under the partition path immediately (no retention window).
self._glue_context.purge_s3_path(
    self._s3_path,
    {"retentionPeriod": 0, "excludeStorageClasses": ()}
)
Here the catalog database = Account, table = local_segment, and partition key = source_system.
However, when I try to recreate the partition right after the purge step, I get "An error occurred while calling o180.pyWriteDynamicFrame. No such file or directory" from getSink writeFrame.
If I remove the purge step, then getSink works fine and is able to create the partition and write the files.
I even tried "MSCK REPAIR TABLE" between the purge and getSink, but no luck.
Shouldn't getSink create the partition if it does not exist, i.e. it was purged in the previous step?
target = self._glue_context.getSink(
connection_type="s3",
path=self._s3_path_prefix,
enableUpdateCatalog=True,
updateBehavior="UPDATE_IN_DATABASE",
partitionKeys=["source_system"]
)
target.setFormat("glueparquet")
target.setCatalogInfo(
catalogDatabase=f"{self._target_database}",
catalogTableName=f"{self._target_table_name}"
)
target.writeFrame(self._dyn_frame)
Where -
self._s3_path_prefix = s3://server1/main/transform/Account/local_segment/
self._target_database = Account
self._target_table_name = local_segment
Error message:
An error occurred while calling o180.pyWriteDynamicFrame. No such file or directory 's3://server1/main/transform/Account/local_segment/source_system=SAP/run-1620405230597-part-block-0-0-r-00000-snappy.parquet'
Try checking whether you have permission for this object on S3. I got the same error, and once I configured the object to be public (just for a test), it worked. So maybe it's a new object and your process might not have access.
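A rough boto3 sketch along those lines (the bucket and key are copied from the error message above; it assumes the job's credentials are configured locally) that reports whether the object is visible at all:

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

# Bucket and key taken from the "No such file or directory" error above.
bucket = "server1"
key = ("main/transform/Account/local_segment/source_system=SAP/"
       "run-1620405230597-part-block-0-0-r-00000-snappy.parquet")

try:
    # HEAD the object: a 403 means the credentials cannot read it,
    # a 404 means it really is gone (e.g. removed by the purge step).
    meta = s3.head_object(Bucket=bucket, Key=key)
    print("Object is readable, size:", meta["ContentLength"])
except ClientError as err:
    print("Cannot access object:", err.response["Error"]["Code"])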
I am trying to UNLOAD a Redshift table to an S3 bucket, but I am getting errors that I can't resolve.
When using 's3://mybucket/' as the destination (which is the documented way to specify it), I get an error saying S3ServiceException:The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint..
After some research I tried changing the destination to include the full bucket URL, without success.
All these destinations:
's3://mybucket.s3.amazonaws.com/',
's3://mybucket.s3.amazonaws.com/myprefix',
's3://mybucket.s3.eu-west-2.amazonaws.com/',
's3://mybucket.s3.eu-west-2.amazonaws.com/myprefix'
return this error S3ServiceException:The authorization header is malformed; the region 'eu-west-2' is wrong; expecting 'us-east-1', which is also the error returned when I use a bucket name that doesn't exist.
My Redshift cluster and my s3 buckets all exist in the same region, eu-west-2.
What am I doing wrong?
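One way to confirm which region S3 actually reports for the bucket is a rough boto3 check like the one below (assuming credentials are configured locally; "mybucket" is the placeholder bucket name used above):

import boto3

s3 = boto3.client("s3")

# get_bucket_location returns None for us-east-1, otherwise the region name.
location = s3.get_bucket_location(Bucket="mybucket")["LocationConstraint"]
print("S3 reports the bucket region as:", location or "us-east-1")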
[appendix]
Full command:
UNLOAD ('select * from mytable')
to 's3://mybucket.s3.amazonaws.com/'
iam_role 'arn:aws:iam::0123456789:role/aws-service-role/redshift.amazonaws.com/AWSServiceRoleForRedshift'
Full errors:
ERROR: S3ServiceException:The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.,Status 301,Error PermanentRedirect,Rid 6ADF2C929FD2BE08,ExtRid vjcTnD02Na/rRtLvWsk5r6p0H0xncMJf6KBK
DETAIL:
-----------------------------------------------
error: S3ServiceException:The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.,Status 301,Error PermanentRedirect,Rid 6ADF2C929FD2BE08,ExtRid vjcTnD02Na/rRtLvWsk5r6p0H0xncMJf6KBK
code: 8001
context: Listing bucket=mybucket prefix=
query: 0
location: s3_unloader.cpp:226
process: padbmaster [pid=30717]
-----------------------------------------------
ERROR: S3ServiceException:The authorization header is malformed; the region 'eu-west-2' is wrong; expecting 'us-east-1',Status 400,Error AuthorizationHeaderMalformed,Rid 559E4184FA02B03F,ExtRid H9oRcFwzStw43ynA+rinTOmynhWfQJlRz0QIcXcm5K7fOmJSRcOcHuVlUlhGebJK5iH2L
DETAIL:
-----------------------------------------------
error: S3ServiceException:The authorization header is malformed; the region 'eu-west-2' is wrong; expecting 'us-east-1',Status 400,Error AuthorizationHeaderMalformed,Rid 559E4184FA02B03F,ExtRid H9oRcFwzStw43ynA+rinTOmynhWfQJlRz0QIcXcm5K7fOmJSRcOcHuVlUlhGebJK5iH2L
code: 8001
context: Listing bucket=mybucket.s3.amazonaws.com prefix=
query: 0
location: s3_unloader.cpp:226
process: padbmaster [pid=30717]
-----------------------------------------------
[Screenshots omitted: bucket region and cluster region.]
I have followed this link for installing Shark on CDH5. I have installed it, but as the guide also mentions:
This -skipRddReload is only needed when you have some table with a hive/hbase mapping, because of some issues in PassthroughOutputFormat in the hive hbase handler.
The error message is something like:
"Property value must not be null"
or
"java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.io.HivePassThroughOutputFormat"
I have created an external table in Hive to access an HBase table, and when I tried Shark with -skipRddReload, Shark starts, but when I try to access that external table within Shark I get the error
java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.io.HivePassThroughOutputFormat
Is there any solution to get rid of this?
EDIT
HBase is mapped to Hive with:
CREATE EXTERNAL TABLE abc (key string,LPID STRING,Value int,ts1 STRING,ts2 STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES
("hbase.columns.mapping" = ":key,cf1:LPID,cf1:Value,cf1:ts1,cf1:ts2")
TBLPROPERTIES("hbase.table.name" = "abc");
This table abc is what I want to access in Shark. Any solution?