copy_to_sql command throws an error when I pass a valid schema name - pandas

We are trying to copy a DataFrame to a specific Teradata database, but the script is not accepting the schema_name parameter. Copying to the default user database (the one used in the logon/create_context call) works fine, but when I try to override that default by specifying a database name in copy_to_sql, it fails.
from teradataml import *
from teradataml.dataframe.copy_to import copy_to_sql
create_context(host = "ipaddrr", username='uname', password = "pwd")
df = DataFrame.from_query("select top 10 * from dbc.tables;")
copy_to_sql(df=df, table_name='Tab', schema_name='DB_Name', if_exists='replace')
Error: TeradataMlException: [Teradata][teradataml](TDML_2007) Invalid value(s) 'DB_Name' passed to argument 'schema_name', should be: A valid database/schema name..

Do you have a database / user named DB_Name? If not, try creating the database first and then running your copy script:
CREATE DATABASE DB_NAME FROM <parent_DB> AS PERMANENT = 1000000000;
I don't think the utilities / packages will typically create a database for you on the fly, since it can be a more involved operation (locks, space allocation, etc.) than creating a table.
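As a quick sanity check (a sketch, not part of the original answer), you can confirm from the same teradataml session that the target database actually exists before retrying the copy; dbc.DatabasesV is the standard Teradata data dictionary view, and df is the DataFrame from the question above:

from teradataml import DataFrame
from teradataml.dataframe.copy_to import copy_to_sql

# List databases matching the intended target; an empty result means DB_Name
# does not exist yet and needs to be created first.
dbs = DataFrame.from_query(
    "select DatabaseName from dbc.DatabasesV where DatabaseName = 'DB_Name';"
)
print(dbs)

# Once the database exists, the original call should go through.
copy_to_sql(df=df, table_name='Tab', schema_name='DB_Name', if_exists='replace')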

Related

How to use dbWriteTable on a connection with no default Database

I've seen many posts on SO and the DBI GitHub regarding trouble using DBI::dbWriteTable (e.g. [1], [2]). These mostly have to do with the use of non-default schemas or the like.
That's not my case.
I have a server running SQL Server 2014. This server contains multiple databases.
I'm developing a program which interacts with many of these databases at the same time. I therefore defined my connection using DBI::dbConnect() without a Database= argument.
I've so far only had to do SELECTs on the databases, and this connection works just fine with dbGetQuery(). I just need to name my tables including the database names: DatabaseFoo.dbo.TableBar, which is more than fine since it makes things transparent and intentional. It also stops me from being lazy and making some calls omitting the Database name on whichever DB I named in the connection.
I now need to add data to a table, and I can't get it to work. A call to
DBI::dbWriteTable(conn, "DatabaseFoo.dbo.TableBar", myData, append = TRUE)
works, but creates a table named DatabaseFoo.dbo.TableBar in the master Database, which isn't what I meant (I didn't even know there was a master Database).
The DBI::dbWriteTable man page states the name should be
A character string specifying the unquoted DBMS table name, or the result of a call to dbQuoteIdentifier().
So I tried dbQuoteIdentifier() (and a few other variations):
DBI::dbWriteTable(conn,
                  DBI::dbQuoteIdentifier(conn,
                                         "DatabaseFoo.dbo.TableBar"),
                  myData)
# no error, same problem as above

DBI::dbWriteTable(conn,
                  DBI::dbQuoteIdentifier(conn,
                                         DBI::SQL("DatabaseFoo.dbo.TableBar")),
                  myData)
# Error: Can't unquote DatabaseFoo.dbo.TableBar

DBI::dbWriteTable(conn,
                  DBI::SQL("DatabaseFoo.dbo.TableBar"),
                  myData)
# Error: Can't unquote DatabaseFoo.dbo.TableBar

DBI::dbWriteTable(conn,
                  DBI::dbQuoteIdentifier(conn,
                                         DBI::Id(catalog = "DatabaseFoo",
                                                 schema = "dbo",
                                                 table = "TableBar")),
                  myData)
# Error: Can't unquote "DatabaseFoo"."dbo"."TableBar"

DBI::dbWriteTable(conn,
                  DBI::Id(catalog = "DatabaseFoo",
                          schema = "dbo",
                          table = "TableBar"),
                  myData)
# Error: Can't unquote "DatabaseFoo"."dbo"."TableBar"
In the DBI::Id() attempts, I also tried using cluster instead of catalog. No effect, identical error.
However, if I change my dbConnect() call to add a Database="DatabaseFoo" argument, I can simply use dbWriteTable(conn, "TableBar", myData) and it works.
So the question becomes, am I doing something wrong? Is this related to the problems in the other questions?
This is a shortcoming in the DBI package. The dev version, DBI >= 1.0.0.9002, no longer suffers from this problem and will hit CRAN as DBI 1.1.0 soonish.

Using Jinja template variables with BigQueryOperator in Airflow

I'm attempting to use the BigQueryOperator in Airflow with a variable populating the sql= attribute. The problem I'm running into is that the file extension is dropped when using Jinja variables. I've set up my code as follows:
dag = DAG(
    dag_id='data_ingest_dag',
    template_searchpath=['/home/airflow/gcs/dags/sql/'],
    default_args=DEFAULT_DAG_ARGS
)

bigquery_transform = BigQueryOperator(
    task_id='bq-transform',
    write_disposition='WRITE_TRUNCATE',
    sql="{{dag_run.conf['sql_script']}}",
    destination_dataset_table='{{dag_run.conf["destination_dataset_table"]}}',
    dag=dag
)
The passed variable contains the name of a SQL file stored in the separate SQL directory. If I pass the value as a static string, sql="example_file.sql", everything works fine. However, when I pass example_file.sql using the Jinja template variable, it automatically drops the file extension and I receive this error:
BigQuery job failed.
Final error was: {u'reason': u'invalidQuery', u'message': u'Syntax error: Unexpected identifier "example_file" at [1:1]', u'location': u'query'}
Additionally, I've tried hardcoding ".sql" onto the end of the variable, anticipating that the extension would be dropped. However, this causes the entire variable reference to be interpreted as a string.
How do you use variables to populate BigQueryOperator attributes?
Reading the BigQueryOperator docstring, it seems you can provide the SQL statement in two ways:
1. As a string that can contain templating macros.
2. As a reference to a file that can contain templating macros (the file contents are templated, not the file name).
You cannot template the file name, only the SQL statement. In fact, your error message shows that BigQuery did not recognize the identifier "example_file". If you inspect the BigQuery job history for the project that ran the query, you will see that the query string was literally "example_file.sql", which is not a valid SQL statement, hence the error.
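For illustration only (not from the original answer), here is a minimal sketch of the two supported forms, reusing the dag and dag_run.conf values from the question; the import path assumes Airflow 1.x contrib, and project.dataset.source is a placeholder table:

from airflow.contrib.operators.bigquery_operator import BigQueryOperator

# 1. sql as an inline string; the string itself may contain Jinja macros.
bq_inline = BigQueryOperator(
    task_id='bq-transform-inline',
    write_disposition='WRITE_TRUNCATE',
    sql="SELECT * FROM `project.dataset.source` WHERE ds = '{{ ds }}'",
    destination_dataset_table='{{ dag_run.conf["destination_dataset_table"] }}',
    dag=dag,
)

# 2. sql as a literal file name resolved via template_searchpath; the file's
#    contents are rendered with Jinja, but the name itself cannot be a template.
bq_from_file = BigQueryOperator(
    task_id='bq-transform-from-file',
    write_disposition='WRITE_TRUNCATE',
    sql='example_file.sql',
    destination_dataset_table='{{ dag_run.conf["destination_dataset_table"] }}',
    dag=dag,
)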

How to list all databases and tables in AWS Glue Catalog?

I created a Development Endpoint in the AWS Glue console and now I have access to SparkContext and SQLContext in gluepyspark console.
How can I access the catalog and list all databases and tables? The usual sqlContext.sql("show tables").show() does not work.
What might help is the CatalogConnection Class but I have no idea in which package it is. I tried importing from awsglue.context and no success.
I spent several hours trying to find some info about the CatalogConnection class but haven't found anything (even in the aws-glue-libs repository: https://github.com/awslabs/aws-glue-libs).
In my case I needed the table names in the Glue job script console.
In the end I used the boto3 library and retrieved the database and table names with the Glue client:
import boto3

client = boto3.client('glue', region_name='us-east-1')

responseGetDatabases = client.get_databases()
databaseList = responseGetDatabases['DatabaseList']
for databaseDict in databaseList:
    databaseName = databaseDict['Name']
    print('\ndatabaseName: ' + databaseName)

    responseGetTables = client.get_tables(DatabaseName=databaseName)
    tableList = responseGetTables['TableList']
    for tableDict in tableList:
        tableName = tableDict['Name']
        print('\n-- tableName: ' + tableName)
The important thing is to set up the region properly.
Reference:
get_databases - http://boto3.readthedocs.io/en/latest/reference/services/glue.html#Glue.Client.get_databases
get_tables - http://boto3.readthedocs.io/en/latest/reference/services/glue.html#Glue.Client.get_tables
Glue returns one page per response. If you have more than 100 tables, make sure you use NextToken to retrieve all of them.
import boto3

glue_client = boto3.client('glue')

def get_glue_tables(database=None):
    next_token = ""
    while True:
        response = glue_client.get_tables(
            DatabaseName=database,
            NextToken=next_token
        )
        for table in response.get('TableList'):
            print(table.get('Name'))
        # Keep requesting pages until Glue stops returning a NextToken.
        next_token = response.get('NextToken')
        if next_token is None:
            break
The boto3 api also supports pagination, so you could use the following instead:
import boto3

glue = boto3.client('glue')
paginator = glue.get_paginator('get_tables')
page_iterator = paginator.paginate(
    DatabaseName='database_name'
)
for page in page_iterator:
    print(page['TableList'])
That way you don't have to mess with while loops or the next token.
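To cover the original question end to end (every database and every table in the catalog), the two paginators can be combined. A minimal sketch, assuming default credentials and region:

import boto3

glue = boto3.client('glue')
db_paginator = glue.get_paginator('get_databases')
table_paginator = glue.get_paginator('get_tables')

# Walk every database in the catalog, then every table in each database.
for db_page in db_paginator.paginate():
    for database in db_page['DatabaseList']:
        db_name = database['Name']
        print('Database: ' + db_name)
        for table_page in table_paginator.paginate(DatabaseName=db_name):
            for table in table_page['TableList']:
                print('  Table: ' + table['Name'])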

How to use insert_job

I want to run a BigQuery SQL query using the insert method.
I ran the following code:
JobConfigurationQuery = Google::Apis::BigqueryV2::JobConfigurationQuery
bq = Google::Apis::BigqueryV2::BigqueryService.new
scopes = [Google::Apis::BigqueryV2::AUTH_BIGQUERY]
bq.authorization = Google::Auth.get_application_default(scopes)
bq.authorization.fetch_access_token!
query_config = {query: "select colA from [dataset.table]"}
qr = JobConfigurationQuery.new(configuration:{query: query_config})
bq.insert_job(projectId, qr)
and I got an error as below:
Caught error invalid: Job configuration must contain exactly one job-specific configuration object (e.g., query, load, extract, spreadsheetExtract), but there were 0:
Please let me know how to use the insert_job method.
I'm not sure which client library you're using, but insert_job probably takes a JobConfiguration. You should create one of those and set its query field to the JobConfigurationQuery you've created.
This is necessary because you can insert various jobs (load, copy, extract) with different types of configurations through this one API method, and they all take a single configuration type with a subfield that specifies which kind of job to insert and its details.
More info from BigQuery's documentation:
jobs.insert documentation
job resource: note the "configuration" field and its "query" subfield
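To illustrate the same point outside of Ruby, here is a rough Python sketch (using google-api-python-client and application-default credentials, neither of which appears in the original thread) showing that the job-specific query settings must sit inside a top-level "configuration" object, which is exactly what the error message is complaining about:

import google.auth
from googleapiclient import discovery

# Application-default credentials; google.auth.default also returns the project.
credentials, project_id = google.auth.default(
    scopes=['https://www.googleapis.com/auth/bigquery']
)
bq = discovery.build('bigquery', 'v2', credentials=credentials)

# The "query" config is nested under "configuration"; without that wrapper the
# API reports zero job-specific configuration objects.
job_body = {
    'configuration': {
        'query': {
            'query': 'select colA from [dataset.table]'
        }
    }
}
bq.jobs().insert(projectId=project_id, body=job_body).execute()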

Unable to upload images to SQL Server via pyodbc

I'm trying to upload an image to SQL Server from a Linux (Raspbian) environment using Python. So far I was able to connect to SQL Server and create a table, and I'm using pyodbc.
#!/usr/bin/env python
import pyodbc
from PIL import Image

dsn = 'nicedcn'
user = 'myid'
password = 'mypass'
database = 'myDB'

con_string = 'DSN=%s;UID=%s;PWD=%s;DATABASE=%s;' % (dsn, user, password, database)
cnxn = pyodbc.connect(con_string)
cursor = cnxn.cursor()

string = "CREATE TABLE Database2([image name] varchar(20), [image] image)"
cursor.execute(string)
cnxn.commit()
This part completed without any error. That means I have successfully created a table, right? Or is there any issue?
I then try to upload the image this way:
image12 = Image.open('new1.jpg')
cursor.execute("insert into Database1([image name], [image]) values (?,?)",
               'new1', image12)
cnxn.commit()
I get the error on this part; it is a pyodbc.ProgrammingError:
('Invalid Parameter type. param-index=1 param-type=instance', 'HY105')
Please tell me another way, or the proper way, to upload an image to a database via pyodbc.
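The thread excerpt ends here without an answer, but the error itself points at the cause: pyodbc does not know how to bind a PIL Image instance as a parameter. A common workaround (a sketch, not from the original thread) is to pass the raw image bytes instead, reusing the cursor, cnxn, and table name from the code above:

# Read the file as raw bytes; bytes map to varbinary/image parameters that
# pyodbc understands, unlike a PIL Image object.
with open('new1.jpg', 'rb') as f:
    image_bytes = f.read()

cursor.execute(
    "insert into Database1([image name], [image]) values (?,?)",
    'new1', image_bytes
)
cnxn.commit()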