Airbnb Superset Datasource Configuration for SparkSQL

I am using Spark 1.6.2 (from the DataStax Enterprise Edition, DSE 5.0.4) and Python 2.7. When I run

from impala.dbapi import connect
conn = connect(host='172.31.12.201', port=7077, user='xxxx', password='xxxx1111', database='test_database', auth_mechanism='PLAIN')

it just hangs and never returns. The Spark master runs at 172.31.12.201 on port 7077.
My configuration in Superset is as below:

SQLAlchemy URI => impala://172.31.12.201:7077/test_database
Extra => {
  "metadata_params": {},
  "engine_params": {"connect_args": {"user": "xxxx", "password": "xxxx1111"}}
}

I had to start the DSE Spark SQL Thrift Server as below:

dse -u <username> -p <password> spark-sql-thriftserver start

This starts the HiveServer2-compatible thrift server in the DSE cluster on port 10000.
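With the thrift server listening on port 10000, the client should target that port rather than the Spark master port 7077. A minimal sketch of the corrected connection (same placeholder host and credentials as above):

from impala.dbapi import connect

# Connect to the Spark SQL Thrift Server (HiveServer2-compatible),
# not to the Spark master on 7077.
conn = connect(host='172.31.12.201', port=10000,
               user='xxxx', password='xxxx1111',
               database='test_database', auth_mechanism='PLAIN')
cur = conn.cursor()
cur.execute('SHOW TABLES')
print(cur.fetchall())

The Superset SQLAlchemy URI should point at the same port: impala://172.31.12.201:10000/test_database.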

Related

Connection problem with ClickHouse and RabbitMQ

I am a newbie to ClickHouse and RabbitMQ. I am trying to stream data from RabbitMQ into ClickHouse with the script below, but it doesn't work.
CREATE TABLE Station (
    Station varchar(2000)
) ENGINE = RabbitMQ SETTINGS
    rabbitmq_host_port = '<IP>:5672',
    rabbitmq_exchange_name = 'Clickhouse',
    rabbitmq_exchange_type = 'direct',
    rabbitmq_routing_key_list = 'Station',
    rabbitmq_format = 'CSV',
    rabbitmq_num_consumers = 1;
It fails with the following error:

SQL Error [115]: ClickHouse exception, code: 115, host: <IP>, port: 8123; Code: 115, e.displayText() = DB::Exception: Unknown setting rabbitmq_username: for storage RabbitMQ (version 21.4.3.21 (official build))

Any suggestions for setting rabbitmq_username?
The RabbitMQ credentials should be defined in the ClickHouse server config rather than as table-level settings:
Open an existing custom config file or create a new one, e.g. rabbitmq.xml:

sudo nano /etc/clickhouse-server/config.d/rabbitmq.xml

Add this configuration and save it:
<yandex>
    <rabbitmq>
        <username>your_rabbitmq_username</username>
        <password>your_rabbitmq_password</password>
    </rabbitmq>
</yandex>
Then restart the service:

sudo service clickhouse-server restart
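After the restart, a quick way to confirm the credentials are picked up is to publish a test row to the exchange and let ClickHouse consume it. A minimal sketch, assuming the pika package and the same host, exchange, and routing key as in the CREATE TABLE above (in practice the RabbitMQ table is usually drained into an ordinary MergeTree table via a materialized view):

import pika

# Authenticate with the same credentials configured in rabbitmq.xml.
credentials = pika.PlainCredentials('your_rabbitmq_username', 'your_rabbitmq_password')
connection = pika.BlockingConnection(
    pika.ConnectionParameters(host='<IP>', port=5672, credentials=credentials))
channel = connection.channel()

# One CSV row matching the single-column Station table.
channel.basic_publish(exchange='Clickhouse', routing_key='Station', body='"Station A"')
connection.close()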

Connect App Engine to Google Cloud SQL fails

I'm following this guide.
I'm filling in the config like this:
val datasourceConfig = HikariConfig().apply {
    jdbcUrl = "jdbc:mysql:///$DB_NAME"
    username = DB_USER
    password = DB_PASS
    mapOf(
        "cloudSqlInstance" to CLOUD_SQL_CONNECTION_NAME,
        "socketFactory" to "com.google.cloud.sql.mysql.SocketFactory",
        "ipTypes" to "PUBLIC,PRIVATE",
    ).forEach {
        addDataSourceProperty(it.key, it.value)
    }
}
Output of gcloud sql instances describe project-name:
backendType: SECOND_GEN
connectionName: project-name:europe-west1:project-name-db
databaseVersion: MYSQL_5_7
failoverReplica:
  available: true
gceZone: europe-west1-d
instanceType: CLOUD_SQL_INSTANCE
ipAddresses:
- ipAddress: *.*.*.*
  type: PRIMARY
kind: sql#instance
name: project-name-db
project: project-name
region: europe-west1
from which I'm filling my env variables:
DB_NAME=project-name-db
CLOUD_SQL_CONNECTION_NAME=project-name:europe-west1:project-name-db
On the deployed app, the line val dataSource = HikariDataSource(datasourceConfig) crashes with the following exception:
com.zaxxer.hikari.pool.HikariPool$PoolInitializationException: Failed to initialize pool: Cannot connect to MySQL server on localhost:3,306.
Make sure that there is a MySQL server running on the machine/port you are trying to connect to and that the machine this software is running on is able to connect to this host/port (i.e. not firewalled). Also make sure that the server has not been started with the --skip-networking flag.
Update: I've tried adding google between the second and third slashes ("jdbc:mysql://google/$DB_NAME"), according to this answer; now I get:
Cannot connect to MySQL server on google:3,306.
I was missing the following dependency:
implementation("com.google.cloud.sql:mysql-socket-factory-connector-j-8:1.2.2")
More info here.
Also, DB_NAME is not the name from the gcloud sql instances output, but the name of a database that has to be created under Console -> Project -> SQL -> Databases.
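To rule out the instance and database names independently of the JVM stack, the same connection can be sanity-checked from Python with Google's cloud-sql-python-connector. This is a side check, not part of the original setup; the credentials and database name below are placeholders:

from google.cloud.sql.connector import Connector

# The first argument is the project:region:instance string reported by
# `gcloud sql instances describe`; db is the database created in the console.
connector = Connector()
conn = connector.connect(
    "project-name:europe-west1:project-name-db",
    "pymysql",
    user="db-user",
    password="db-pass",
    db="my-database",
)
with conn.cursor() as cur:
    cur.execute("SELECT 1")
    print(cur.fetchone())
connector.close()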

Could not open client transport when connecting with the Airflow HiveOperator

Error: Could not open client transport with JDBC Uri: jdbc:hive2://XXXX:10000/default;auth=none: Failed to open new session: java.lang.IllegalArgumentException: Cannot modify airflow.ctx.task_id at runtime. It is not in list of params that are allowed to be modified at runtime (state=08S01,code=0)
Beeline version 2.3.6 by Apache Hive
I have tried setting the hiveconfs parameter to None but still get the issue.
The configuration I have used:
The Airflow connection is configured with a Hive ZooKeeper-based JDBC connection string, and 'Hive Client Wrapper' is used as the connection type.
I am passing some extra parameters to the connection:

extras: {"hive_cli_params": "", "use_beeline": "true", "auth": "none"}
Sample DAG code:

import airflow
from airflow import DAG
from airflow.operators.hive_operator import HiveOperator
from airflow.utils.dates import days_ago

dag_conf = DAG(
    dag_id="airflow_hiveoperator",
    schedule_interval=None,
    start_date=days_ago(1),
)

# Use a variable name that does not shadow the HiveOperator class.
hive_task = HiveOperator(
    hql='hql/query1.hql',
    task_id='airflow_hive',
    schema='default',
    hiveconf_jinja_translate=False,
    dag=dag_conf,
    conn_id='hive_cli_default',
    hiveconfs=None,
)

if __name__ == "__main__":
    dag_conf.cli()
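The error itself comes from HiveServer2 rejecting the airflow.ctx.* variables that the Hive CLI hook passes along as hive confs. A common fix, assuming you can change the Hive server configuration, is to whitelist those variables in hive-site.xml on the HiveServer2 side and restart it:

<property>
    <name>hive.security.authorization.sqlstd.confwhitelist.append</name>
    <value>airflow\.ctx\..*</value>
</property>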

Connecting mariaDB on Digital Ocean using R Studio

I installed MariaDB on a DigitalOcean droplet and set up a user name and password. Now I am trying to connect to it through RStudio, but it is not working.
The code I used:
library(odbc)
library(RMariaDB)

con <- dbConnect(odbc(),
                 Driver = "MariaDB ODBC 3.1 Driver",
                 Server = "206.189.---.", # droplet IP
                 Database = "test",
                 Username = "root",
                 Password = "****",
                 Trusted_Connection = "True")
The error I got:

Error: nanodbc/nanodbc.cpp:983: HY000: [ma-3.1.7]Can't connect to MySQL server on '206.189.---.--' (10061)

I installed the connection driver on Windows before running this code in RStudio. What am I missing here?

Airflow Adaptive Server connection failed

I want to connect Airflow to Microsoft SQL Server. I configured the connection under the 'Connections' tab in the 'Admin' menu, as described in the following link:
http://airflow.apache.org/howto/manage-connections.html
But when I run the DAG task that talks to SQL Server, it immediately fails with the following error:
[2019-03-28 16:16:07,439] {models.py:1788} ERROR - (18456, "Login failed for user 'XXXX'.DB-Lib error message 20018, severity 14:\nGeneral SQL Server error: Check messages from the SQL Server\nDB-Lib error message 20002, severity 9:\nAdaptive Server connection failed (***.***.***.28:1433)\n")
The Microsoft SQL Server part of my DAG is the following:
sql_command = """
select * from [sys].[tables]
"""

t3 = MsSqlOperator(
    task_id='run_test_proc',
    mssql_conn_id='FIConnection',
    sql=sql_command,
    dag=dag,
)
I verified the IP address, port, and similar configuration details by connecting through the pymssql library from my local computer. The test code is the following:
import pandas as pd
import pymssql

# pymssql takes the port as a separate argument.
with pymssql.connect(server="***.***.***.28", port=1433,
                     user="XXXX",
                     password="XXXXXX") as conn:
    df = pd.read_sql("SELECT * FROM [sys].[tables]", conn)
    print(df)
Could you please share if you have experienced this issue?
By the way, I am running this in VirtualBox on Ubuntu 16.04 LTS.
I had the same problem because freetds-dev was missing on Linux:

apt-get install freetds-dev
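Once freetds-dev is installed (and pymssql reinstalled against it), the Airflow connection itself can be exercised outside a DAG run. A minimal sketch, assuming Airflow 1.x's hook module path and the same FIConnection connection id as above:

from airflow.hooks.mssql_hook import MsSqlHook

# Uses the connection configured under Admin -> Connections.
hook = MsSqlHook(mssql_conn_id='FIConnection')

# get_records comes from Airflow's DbApiHook base class.
print(hook.get_records('SELECT name FROM [sys].[tables]'))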