I'm trying to connect to Hadoop via PolyBase in SQL Server 2016.
My code is:
CREATE EXTERNAL DATA SOURCE MyHadoopCluster WITH (
TYPE = HADOOP,
LOCATION ='hdfs://192.168.114.20:8020',
CREDENTIAL = HadoopUser1
);
CREATE EXTERNAL FILE FORMAT TextFileFormat WITH (
FORMAT_TYPE = DELIMITEDTEXT,
FORMAT_OPTIONS (FIELD_TERMINATOR ='\001',
USE_TYPE_DEFAULT = TRUE)
);
CREATE EXTERNAL TABLE [dbo].[test_hadoop] (
[Market_Name] int NOT NULL,
[Claim_GID] int NOT NULL,
[Completion_Flag] int NULL,
[Diag_CDE] float NOT NULL,
[Patient_GID] int NOT NULL,
[Record_ID] int NOT NULL,
[SRVC_FROM_DTE] int NOT NULL
)
WITH (LOCATION='/applications/gidr/processing/lnd/sha/clm/cf/claim_diagnosis',
DATA_SOURCE = MyHadoopCluster,
FILE_FORMAT = TextFileFormat
);
And I got this error:
EXTERNAL TABLE access failed due to internal error: 'Java exception
raised on call to HdfsBridge_GetDirectoryFiles: Error [Permission
denied: user=pdw_user, access=READ_EXECUTE,
inode="/applications/gidr/processing/lnd/sha/clm/cf/claim_diagnosis":root:supergroup:drwxrwxr--
at
org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:281)
at
org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:262)
at
org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:175)
at
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:152)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6590)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6572)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPathAccess(FSNamesystem.java:6497)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getListingInt(FSNamesystem.java:5034)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getListing(FSNamesystem.java:4995)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getListing(NameNodeRpcServer.java:882)
at
org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getListing(AuthorizationProviderProxyClientProtocol.java:335)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getListing(ClientNamenodeProtocolServerSideTranslatorPB.java:615)
at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073) at
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086) at
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082) at
java.security.AccessController.doPrivileged(Native Method) at
javax.security.auth.Subject.doAs(Subject.java:415) at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080) ]
occurred while accessing external file.'
The problem is that in the newest version of PolyBase there is no config file in which you can specify the default Hadoop login and password. So even when I create a database scoped credential, PolyBase still uses the default pdw_user. I even tried creating pdw_user on the Hadoop side, but I still get this error. Any ideas?
If you have a Kerberos-secured Hadoop cluster, make sure you alter the XML files as described at https://learn.microsoft.com/en-us/sql/relational-databases/polybase/polybase-configuration
If it is not a Kerberos-secured Hadoop cluster, make sure that the default user pdw_user has read access to HDFS and execute permissions on Hive.
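In the Kerberos case, the database scoped credential referenced by the external data source is what carries the Kerberos user and password. A minimal sketch (the master key password, user name, and secret below are placeholders, not values from the question):

CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<StrongPassword123!>';
CREATE DATABASE SCOPED CREDENTIAL HadoopUser1
WITH IDENTITY = 'hadoop_user', SECRET = '<kerberos_password>';
CREATE EXTERNAL DATA SOURCE MyHadoopCluster WITH (
TYPE = HADOOP,
LOCATION = 'hdfs://192.168.114.20:8020',
CREDENTIAL = HadoopUser1
);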
I am evaluating Ignite and trying to load CSV data into Apache Ignite. I have created a table in Ignite:
jdbc:ignite:thin://127.0.0.1/> create table if not exists SAMPLE_DATA_PK(SID varchar(30),id_status varchar(50), active varchar, count_opening int,count_updated int,ID_caller varchar(50),opened_time varchar(50),created_at varchar(50),type_contact varchar, location varchar,support_incharge varchar,pk varchar(10) primary key);
I tried to load data into this table with this command:
copy from '/home/kkn/data/sample_data_pk.csv' into SAMPLE_DATA_PK(SID,ID_status,active,count_opening,count_updated,ID_caller,opened_time,created_at,type_contact,location,support_incharge,pk) format csv;
But the data load is failing with this error:
Error: Server error: class org.apache.ignite.internal.processors.query.IgniteSQLException: Value conversion failed [column=COUNT_OPENING, from=java.lang.String, to=java.lang.Integer] (state=50000,code=1)
java.sql.SQLException: Server error: class org.apache.ignite.internal.processors.query.IgniteSQLException: Value conversion failed [column=COUNT_OPENING, from=java.lang.String, to=java.lang.Integer]
at org.apache.ignite.internal.jdbc.thin.JdbcThinConnection.sendRequest(JdbcThinConnection.java:1009)
at org.apache.ignite.internal.jdbc.thin.JdbcThinStatement.sendFile(JdbcThinStatement.java:336)
at org.apache.ignite.internal.jdbc.thin.JdbcThinStatement.execute0(JdbcThinStatement.java:243)
at org.apache.ignite.internal.jdbc.thin.JdbcThinStatement.execute(JdbcThinStatement.java:560)
at sqlline.Commands.executeSingleQuery(Commands.java:1054)
at sqlline.Commands.execute(Commands.java:1003)
at sqlline.Commands.sql(Commands.java:967)
at sqlline.SqlLine.dispatch(SqlLine.java:734)
at sqlline.SqlLine.begin(SqlLine.java:541)
at sqlline.SqlLine.start(SqlLine.java:267)
at sqlline.SqlLine.main(SqlLine.java:206)
Below is the sample data I am trying to load:
SID|ID_status|active|count_opening|count_updated|ID_caller|opened_time|created_at|type_contact|location|support_incharge|pk
|---|---------|------|-------------|-------------|---------|-----------|----------|------------|--------|----------------|--|
INC0000045|New|true|1000|0|Caller2403|29-02-2016 01:16|29-02-2016 01:23|Phone|Location143||1
INC0000045|Resolved|true|0|3|Caller2403|29-02-2016 01:16|29-02-2016 01:23|Phone|Location143||2
INC0000045|Closed|false|0|1|Caller2403|29-02-2016 01:16|29-02-2016 01:23|Phone|Location143||3
INC0000047|Active|true|0|1|Caller2403|29-02-2016 04:40|29-02-2016 04:57|Phone|Location165||4
INC0000047|Active|true|0|2|Caller2403|29-02-2016 04:40|29-02-2016 04:57|Phone|Location165||5
INC0000047|Active|true|0|489|Caller2403|29-02-2016 04:40|29-02-2016 04:57|Phone|Location165||6
INC0000047|Active|true|0|5|Caller2403|29-02-2016 04:40|29-02-2016 04:57|Phone|Location165||7
INC0000047|AwaitingUserInfo|true|0|6|Caller2403|29-02-2016 04:40|29-02-2016 04:57|Phone|Location165||8
INC0000047|Closed|false|0|8|Caller2403|29-02-2016 04:40|29-02-2016 04:57|Phone|Location165||9
INC0000057|New|true|0|0|Caller4416|29-02-2016 06:10||Phone|Location204||10
I need help understanding how to figure out what the issue is and how to resolve it.
You have to upload the CSV without the header line, which contains the column names. The error is thrown when Ignite tries to convert the string value "count_opening" to an Integer.
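For example, after stripping the first line from the file, the same COPY command should go through. A sketch, assuming the header-less file is saved as /home/kkn/data/sample_data_pk_noheader.csv (a placeholder name):

copy from '/home/kkn/data/sample_data_pk_noheader.csv' into SAMPLE_DATA_PK(SID,ID_status,active,count_opening,count_updated,ID_caller,opened_time,created_at,type_contact,location,support_incharge,pk) format csv;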
I am trying to create an external table from SQL Server 2019 to Sybase.
I am already able to create a linked server to Sybase using the same driver and login information.
I am able to execute this code with no error:
CREATE EXTERNAL DATA SOURCE external_data_source_name
WITH (
LOCATION = 'odbc://jjjjj.nnnn.iiii.com:xxxxx',
CONNECTION_OPTIONS = 'DRIVER={SQL Anywhere 17};
ServerNode = jjjjj.nnnn.iiii.com:xxxxx;
Database = report;
Port = xxxxx',
CREDENTIAL = [PolyFriend] );
But when I try to create a table using the data source:
CREATE EXTERNAL TABLE v_data(
event_id int
) WITH (
LOCATION='report.dbo.v_data',
DATA_SOURCE=external_data_source_name
);
I get this error:
105082;Generic ODBC error: [SAP][ODBC Driver][SQL Anywhere]Database
server not found.
You need to specify the Host, ServerName, and DatabaseName properties (for SQL Anywhere) in the CONNECTION_OPTIONS:
CREATE EXTERNAL DATA SOURCE external_data_source_name
WITH (
LOCATION = 'odbc://jjjjj.nnnn.iiii.com:xxxxx',
CONNECTION_OPTIONS = 'DRIVER={SQL Anywhere 17};
Host=jjjjj.nnnn.iiii.com:xxxxx;
ServerName=xyzsqlanywhereservername;
DatabaseName=report;',
CREDENTIAL = [PolyFriend] );
Host == machinename:port, the machine where SQL Anywhere resides; the port is most likely the default 2638, on which the SQL Anywhere service listens for connections.
ServerName == the name of the SQL Anywhere server/service which hosts the database (connect to the SQL Anywhere db and execute select @@servername).
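With the data source corrected, the external table definition from the question should then work unchanged:

CREATE EXTERNAL TABLE v_data(
event_id int
) WITH (
LOCATION='report.dbo.v_data',
DATA_SOURCE=external_data_source_name
);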
I have a Kafka topic set up and am attempting to create an external table in Hive to query the Kafka stream.
However, when querying the external table I get the error message:
Error: java.io.IOException: org.apache.kafka.common.config.ConfigException: Missing required configuration "group.id" which has no default value. (state=,code=0)
I tried putting group.id in server.properties when starting the Kafka server.
I also tried putting group.id in the external table properties:
CREATE EXTERNAL TABLE kafka_table2
(`timestamp` timestamp , `page` string, `newPage` boolean,
added int, deleted bigint, delta double)
STORED BY 'org.apache.hadoop.hive.kafka.KafkaStorageHandler'
TBLPROPERTIES
("kafka.topic" = "connect-test", "kafka.bootstrap.servers"="mykafka:9092","kafka.group.id"="1")
INFO : Completed compiling command(queryId=hive_20190426082255_729f8adb-bb23-4317-8f3f-2f9049b62bd7); Time taken: 0.6 seconds
INFO : Executing command(queryId=hive_20190426082255_729f8adb-bb23-4317-8f3f-2f9049b62bd7): select * from kafka_table2
INFO : Completed executing command(queryId=hive_20190426082255_729f8adb-bb23-4317-8f3f-2f9049b62bd7); Time taken: 0.018 seconds
INFO : OK
Error: java.io.IOException: org.apache.kafka.common.config.ConfigException: Missing required configuration "group.id" which has no default value. (state=,code=0)
You should put "kafka.consumer.group.id"="1" and not "kafka.group.id"="1" in TBLPROPERTIES.
See: https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.1.0/integrating-hive/content/hive_set_consumer_producer.html
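Applied to the DDL from the question, only the property name changes (a sketch, otherwise identical to the original statement):

CREATE EXTERNAL TABLE kafka_table2
(`timestamp` timestamp , `page` string, `newPage` boolean,
added int, deleted bigint, delta double)
STORED BY 'org.apache.hadoop.hive.kafka.KafkaStorageHandler'
TBLPROPERTIES
("kafka.topic" = "connect-test", "kafka.bootstrap.servers"="mykafka:9092", "kafka.consumer.group.id"="1");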
I tried to connect Presto to S3 using FileHiveMetaStore with the configurations below, but when I try to create a table with the statement mentioned, it fails with the error message below. Could anyone let me know if the configurations mentioned are wrong?
I could see that it is possible, as someone has already mentioned being able to connect.
Reference thread: Setup Standalone Hive Metastore Service For Presto and AWS S3
Error message: com.amazonaws.services.s3.model.AmazonS3Exception: The specified bucket does not exist (Service: Amazon S3; Status Code: 404; Error Code: NoSuchBucket; Request ID: 33F01AA7477B12FC)
connector.name=hive-hadoop2
hive.metastore=file
hive.metastore.catalog.dir=s3://ap-south-1.amazonaws.com/prestos3test/
hive.s3.aws-access-key=yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
hive.s3.aws-secret-key=zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz
hive.s3.endpoint=http://prestos3test.s3-ap-south-1.amazonaws.com
hive.s3.ssl.enabled=false
hive.metastore.uri=thrift://localhost:9083
External table creation:
CREATE TABLE PropData (
prop0 integer,
prop1 integer,
prop2 varchar,
prop3 varchar ,
prop4 varchar
)
WITH (
format = 'ORC',
external_location = 's3://prestos3test'
)
I got help from other quarters and thought it would be helpful to others, so I am documenting the necessary config below.
connector.name=hive-hadoop2
hive.metastore=file
hive.metastore.catalog.dir=s3://prestos3test/
hive.s3.aws-access-key=yyyyyyyyyyyyyyyyyy
hive.s3.aws-secret-key=zzzzzzzzzzzzzzzzzzzzzz
hive.s3.ssl.enabled=false
hive.metastore.uri=thrift://localhost:9083
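With hive.metastore.catalog.dir pointing at the bucket itself rather than at the regional endpoint, the table creation from the question should then succeed against the same bucket (repeated here for completeness):

CREATE TABLE PropData (
prop0 integer,
prop1 integer,
prop2 varchar,
prop3 varchar,
prop4 varchar
)
WITH (
format = 'ORC',
external_location = 's3://prestos3test'
);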
I need to execute the same db-changelog with Ant and then with Spring. I expect that Ant will run the changelog, and when Spring runs it will not do anything and just stop normally. Ant runs the db-changelog successfully, but then Spring runs and throws an exception. Part of the stack trace:
Reason: liquibase.exception.JDBCException: Error executing SQL CREATE TABLE action (action_id int8 NOT NULL, action_name VARCHAR(255), version_no int8, reason_required BOOLEAN, comment_required BOOLEAN, step_id int8, CONSTRAINT action_pkey PRIMARY KEY (action_id)):
Caused By: Error executing SQL CREATE TABLE action (action_id int8 NOT NULL, action_name VARCHAR(255), version_no int8, reason_required BOOLEAN, comment_required BOOLEAN, step_id int8, CONSTRAINT action_pkey PRIMARY KEY (action_id)):
Caused By: ERROR: relation "action" already exists; nested exception is org.springframework.beans.factory.BeanCreationException....
Any help will be much appreciated.
It does sound like it is trying to run the changelog again. Each changeSet in the changeLog is identified by a combination of the id, author, and the changelog path/filename. If you run "select * from databasechangelog" you can see the values used.
Your problem may be that you are referencing the changelog file differently from Ant and from Spring, therefore generating different filename values. Usually you will want to include the changelog on the classpath so that no matter where and how you run it, it has the same path (like "com/example/db.changelog.xml").
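For example, to see exactly what each run recorded (a sketch; these are the standard Liquibase columns):

select id, author, filename, dateexecuted from databasechangelog order by dateexecuted;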
I ran into this same problem and was able to fix it by altering the FILENAME column of DATABASECHANGELOG to reference the Spring resource path. In my case, I was using a ServletContextResource under the WEB-INF directory:
update DATABASECHANGELOG set FILENAME = 'WEB-INF/path/to/changelog.xml' where FILENAME = 'changelog.xml'