Airbnb Superset Datasource Configuration for SparkSQL

I am using Spark 1.6.2 (from the DataStax Enterprise Edition, DSE 5.0.4) and Python 2.7. When I run

from impala.dbapi import connect
conn = connect(host='172.31.12.201', port=7077, user='xxxx', password='xxxx1111', database='test_database', auth_mechanism='PLAIN')

it just hangs and never returns. The Spark master runs at 172.31.12.201 on port 7077.
My configuration in Superset is as below:

SQLAlchemy URI => impala://172.31.12.201:7077/test_database
Extra => {
  "metadata_params": {},
  "engine_params": {"connect_args": {"user": "xxxx", "password": "xxxx1111"}}
}

I had to start the DSE Spark SQL Thrift Server as below:

dse -u <username> -p <password> spark-sql-thriftserver start

This starts the HiveServer2-compatible thrift server in the DSE cluster on port 10000.
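With the thrift server listening on port 10000, the client should target that port rather than the Spark master port 7077. A minimal sketch of the corrected connection (same placeholder host and credentials as above):

from impala.dbapi import connect

# Connect to the Spark SQL Thrift Server (HiveServer2-compatible),
# not to the Spark master on 7077.
conn = connect(host='172.31.12.201', port=10000,
               user='xxxx', password='xxxx1111',
               database='test_database', auth_mechanism='PLAIN')
cur = conn.cursor()
cur.execute('SHOW TABLES')
print(cur.fetchall())

The Superset SQLAlchemy URI should point at the same port: impala://172.31.12.201:10000/test_database.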

Related

Connection problem with ClickHouse and RabbitMQ

I am a newbie to ClickHouse and RabbitMQ. I am trying to stream data from RabbitMQ into ClickHouse with the script below, but it doesn't work.
CREATE TABLE Station (
    Station varchar(2000)
) ENGINE = RabbitMQ SETTINGS
    rabbitmq_host_port = '<IP>:5672',
    rabbitmq_exchange_name = 'Clickhouse',
    rabbitmq_exchange_type = 'direct',
    rabbitmq_routing_key_list = 'Station',
    rabbitmq_format = 'CSV',
    rabbitmq_num_consumers = 1;
It fails with the following error:

SQL Error [115]: ClickHouse exception, code: 115, host: <IP>, port: 8123; Code: 115, e.displayText() = DB::Exception: Unknown setting rabbitmq_username: for storage RabbitMQ (version 21.4.3.21 (official build))

Any suggestions for setting rabbitmq_username?
The RabbitMQ credentials should be defined in the ClickHouse server config rather than as table-level settings:
Open an existing custom config file or create a new one, e.g. rabbitmq.xml:

sudo nano /etc/clickhouse-server/config.d/rabbitmq.xml

Add this configuration and save it:
<yandex>
    <rabbitmq>
        <username>your_rabbitmq_username</username>
        <password>your_rabbitmq_password</password>
    </rabbitmq>
</yandex>
Then restart the service:

sudo service clickhouse-server restart
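After the restart, a quick way to confirm the credentials are picked up is to publish a test row to the exchange and let ClickHouse consume it. A minimal sketch, assuming the pika package and the same host, exchange, and routing key as in the CREATE TABLE above (in practice the RabbitMQ table is usually drained into an ordinary MergeTree table via a materialized view):

import pika

# Authenticate with the same credentials configured in rabbitmq.xml.
credentials = pika.PlainCredentials('your_rabbitmq_username', 'your_rabbitmq_password')
connection = pika.BlockingConnection(
    pika.ConnectionParameters(host='<IP>', port=5672, credentials=credentials))
channel = connection.channel()

# One CSV row matching the single-column Station table.
channel.basic_publish(exchange='Clickhouse', routing_key='Station', body='"Station A"')
connection.close()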

Connect App Engine to Google Cloud SQL fails

I'm following this guide.
I'm filling in the config like this:
val datasourceConfig = HikariConfig().apply {
    jdbcUrl = "jdbc:mysql:///$DB_NAME"
    username = DB_USER
    password = DB_PASS
    mapOf(
        "cloudSqlInstance" to CLOUD_SQL_CONNECTION_NAME,
        "socketFactory" to "com.google.cloud.sql.mysql.SocketFactory",
        "ipTypes" to "PUBLIC,PRIVATE",
    ).forEach {
        addDataSourceProperty(it.key, it.value)
    }
}
Output of gcloud sql instances describe project-name:
backendType: SECOND_GEN
connectionName: project-name:europe-west1:project-name-db
databaseVersion: MYSQL_5_7
failoverReplica:
  available: true
gceZone: europe-west1-d
instanceType: CLOUD_SQL_INSTANCE
ipAddresses:
- ipAddress: *.*.*.*
  type: PRIMARY
kind: sql#instance
name: project-name-db
project: project-name
region: europe-west1
from which I'm filling my env variables:
DB_NAME=project-name-db
CLOUD_SQL_CONNECTION_NAME=project-name:europe-west1:project-name-db
On the deployed app, the line val dataSource = HikariDataSource(datasourceConfig) crashes with the following exception:
com.zaxxer.hikari.pool.HikariPool$PoolInitializationException: Failed to initialize pool: Cannot connect to MySQL server on localhost:3,306.
Make sure that there is a MySQL server running on the machine/port you are trying to connect to and that the machine this software is running on is able to connect to this host/port (i.e. not firewalled). Also make sure that the server has not been started with the --skip-networking flag.
Update: I've tried adding google between the second and third slashes ("jdbc:mysql://google/$DB_NAME"), according to this answer; now I get:
Cannot connect to MySQL server on google:3,306.
I was missing the following dependency:
implementation("com.google.cloud.sql:mysql-socket-factory-connector-j-8:1.2.2")
More info here.
Also, DB_NAME is not the name from the gcloud sql instances output, but the name of a database that has to be created under Console -> Project -> SQL -> Databases.
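To rule out the instance and database names independently of the JVM stack, the same connection can be sanity-checked from Python with Google's cloud-sql-python-connector. This is a side check, not part of the original setup; the credentials and database name below are placeholders:

from google.cloud.sql.connector import Connector

# The first argument is the project:region:instance string reported by
# `gcloud sql instances describe`; db is the database created in the console.
connector = Connector()
conn = connector.connect(
    "project-name:europe-west1:project-name-db",
    "pymysql",
    user="db-user",
    password="db-pass",
    db="my-database",
)
with conn.cursor() as cur:
    cur.execute("SELECT 1")
    print(cur.fetchone())
connector.close()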

Could not open client transport when connecting with the Airflow HiveOperator

Error: Could not open client transport with JDBC Uri: jdbc:hive2://XXXX:10000/default;auth=none: Failed to open new session: java.lang.IllegalArgumentException: Cannot modify airflow.ctx.task_id at runtime. It is not in list of params that are allowed to be modified at runtime (state=08S01,code=0)
Beeline version 2.3.6 by Apache Hive
I have tried setting the hiveconfs parameter to None but still get the issue.
The configuration I have used:
The Airflow connection is configured with a Hive ZooKeeper-based JDBC connection string, and 'Hive Client Wrapper' is used as the connection type.
I am passing some extra parameters to the connection:

extras: {"hive_cli_params": "", "use_beeline": "true", "auth": "none"}
Sample DAG code:

import airflow
from airflow import DAG
from airflow.operators.hive_operator import HiveOperator
from airflow.utils.dates import days_ago

dag_conf = DAG(
    dag_id="airflow_hiveoperator",
    schedule_interval=None,
    start_date=days_ago(1),
)

# Use a variable name that does not shadow the HiveOperator class.
hive_task = HiveOperator(
    hql='hql/query1.hql',
    task_id='airflow_hive',
    schema='default',
    hiveconf_jinja_translate=False,
    dag=dag_conf,
    conn_id='hive_cli_default',
    hiveconfs=None,
)

if __name__ == "__main__":
    dag_conf.cli()
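The error itself comes from HiveServer2 rejecting the airflow.ctx.* variables that the Hive CLI hook passes along as hive confs. A common fix, assuming you can change the Hive server configuration, is to whitelist those variables in hive-site.xml on the HiveServer2 side and restart it:

<property>
    <name>hive.security.authorization.sqlstd.confwhitelist.append</name>
    <value>airflow\.ctx\..*</value>
</property>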

Connecting mariaDB on Digital Ocean using R Studio

I installed MariaDB on a DigitalOcean droplet and set up a user name and password. Now I am trying to connect to it through RStudio, but it is not working.
The code I used:
library(odbc)
library(RMariaDB)

con <- dbConnect(odbc(),
                 Driver = "MariaDB ODBC 3.1 Driver",
                 Server = "206.189.---.", # droplet IP
                 Database = "test",
                 Username = "root",
                 Password = "****",
                 Trusted_Connection = "True")
The error I got:

Error: nanodbc/nanodbc.cpp:983: HY000: [ma-3.1.7]Can't connect to MySQL server on '206.189.---.--' (10061)

I installed the connection driver on Windows before running this code in RStudio. What am I missing here?

Airflow Adaptive Server connection failed

I want to connect Airflow to Microsoft SQL Server. I configured the connection under the 'Connections' tab in the 'Admin' menu, as described in the following link:
http://airflow.apache.org/howto/manage-connections.html
But when I run the DAG task that talks to SQL Server, it immediately fails with the following error:
[2019-03-28 16:16:07,439] {models.py:1788} ERROR - (18456, "Login failed for user 'XXXX'.DB-Lib error message 20018, severity 14:\nGeneral SQL Server error: Check messages from the SQL Server\nDB-Lib error message 20002, severity 9:\nAdaptive Server connection failed (***.***.***.28:1433)\n")
The Microsoft SQL Server part of my DAG is the following:
sql_command = """
select * from [sys].[tables]
"""

t3 = MsSqlOperator(
    task_id='run_test_proc',
    mssql_conn_id='FIConnection',
    sql=sql_command,
    dag=dag,
)
I verified the IP address, port, and similar configuration details by connecting through the pymssql library from my local computer. The test code is the following:
import pandas as pd
import pymssql

# pymssql takes the port as a separate argument.
with pymssql.connect(server="***.***.***.28", port=1433,
                     user="XXXX",
                     password="XXXXXX") as conn:
    df = pd.read_sql("SELECT * FROM [sys].[tables]", conn)
    print(df)
Could you please share if you have experienced this issue?
By the way, I am running this in VirtualBox on Ubuntu 16.04 LTS.
I had the same problem because freetds-dev was missing on Linux:

apt-get install freetds-dev
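Once freetds-dev is installed (and pymssql reinstalled against it), the Airflow connection itself can be exercised outside a DAG run. A minimal sketch, assuming Airflow 1.x's hook module path and the same FIConnection connection id as above:

from airflow.hooks.mssql_hook import MsSqlHook

# Uses the connection configured under Admin -> Connections.
hook = MsSqlHook(mssql_conn_id='FIConnection')

# get_records comes from Airflow's DbApiHook base class.
print(hook.get_records('SELECT name FROM [sys].[tables]'))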