Querying multiple postgres tables in python - sql

I'm trying to query multiple SQL tables and store each one as a pandas DataFrame.
cur = conn.cursor()
cur.execute("select relname from pg_class where relkind='r' and relname !~ '^(pg_|sql_)';")
tables_df = cur.fetchall()
##table_name_list = tables_df.table_name
select_template = ' SELECT * FROM {table_name}'
frames_dict = {}
for tname in tables_df:
    query = select_template.format(table_name=tname)
    frames_dict[tname] = pd.read_sql(query, conn)
But I'm getting an error like:
DatabaseError: Execution failed on sql ' SELECT * FROM ('customer',)': syntax error at or near "'yesbank'"
LINE 1: SELECT * FROM ('customer',)
customer is the name of a table in my database, which I get from the line
tables_df = cur.fetchall()

Per your error, looks like you have a typo in the word format:
AttributeError: 'str' object has no attribute 'formate'
Try
query = select_template.format(table_name = tname)
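The error text in the question also points at a second issue that the answer above doesn't cover: cur.fetchall() returns a list of one-element tuples, so formatting tname directly puts ('customer',) into the SQL. A minimal sketch of one way to handle that, using a hard-coded rows list as a stand-in for the fetchall() result (quote_ident and build_queries are hypothetical helper names, not part of psycopg2 or pandas):

```python
def quote_ident(name):
    """Wrap a table name in double quotes, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_queries(rows):
    """Map each table name to its SELECT statement.

    `rows` has the shape returned by cur.fetchall(): a list of 1-tuples.
    """
    queries = {}
    for (tname,) in rows:  # unpack the single column from each row
        queries[tname] = "SELECT * FROM {}".format(quote_ident(tname))
    return queries


# Stand-in for cur.fetchall(); real code would use the cursor result.
rows = [("customer",), ("orders",)]
queries = build_queries(rows)
print(queries["customer"])
```

Each value in queries could then be passed to pd.read_sql(query, conn) as in the question's loop.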

Related

UndefinedTable: Connection doesn't exist

I have always used psycopg2 to connect my script to my Postgres database, but this is the first time I get this error.
import psycopg2
import pandas as pd
conn = psycopg2.connect(
    host='dabname',
    database='db',
    user='myuser',
    password='mypassword')
cursor = conn.cursor()
cursor.execute("select relname from pg_class where relkind='r' and relname !~ '^(pg_|sql_)';")
print(cursor.fetchall())
and this returns:
[('tb_orcamento_mes',), ('tb_notificacao',), ('tb_grupo_premissa',), ('tb_premissa',), ('tb_etapas',), ('tb_orcamento_anual',)]
But, when I try to fetch all records from 'tb_premissa', I get an error:
cursor = conn.cursor()
cursor.execute("SELECT * FROM tb_premissa")
quantitativos_realizado = cursor.fetchall()
results in this error:
UndefinedTable: ERRO: relação "tb_premissa" LINE 1: SELECT *
FROM tb_premissa
Does anyone have an idea?
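The post doesn't record a resolution. One possible cause, offered here only as an assumption, is that tb_premissa lives in a schema that is not on the connection's search_path: pg_class lists relations from every schema, while an unqualified name in a SELECT is resolved through search_path. A sketch of looking up the schema and building a qualified query (the qualified_select helper and the "orcamento" schema name are hypothetical):

```python
# Catalog query that also returns the schema of each matching table.
# It would be run as cursor.execute(FIND_SCHEMA_SQL, ("tb_premissa",)).
FIND_SCHEMA_SQL = (
    "SELECT n.nspname, c.relname "
    "FROM pg_class c "
    "JOIN pg_namespace n ON n.oid = c.relnamespace "
    "WHERE c.relkind = 'r' AND c.relname = %s"
)


def qualified_select(schema, table):
    """Build SELECT * with a schema-qualified, double-quoted table name."""
    def quote(name):
        return '"' + name.replace('"', '""') + '"'
    return "SELECT * FROM {}.{}".format(quote(schema), quote(table))


# Hypothetical row, shaped like one result of the catalog query.
schema, table = ("orcamento", "tb_premissa")
query = qualified_select(schema, table)
print(query)
```

Quoting the identifiers also preserves the exact case stored in the catalog, which matters if the table was created with a mixed-case name.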

SQL UPDATE statement for a subset of columns: syntax problem

I am trying to update cell contents of a subset of columns in an SQLite Table in Python.
for i in range(len(list)):
    if list[i] in other_list:
        update_statement = "UPDATE Table \
            SET Column_1 = Column_1 || ', ({}, {})', \
            Item count = Item count + {}, \
            Tot. = {} \
            WHERE Column_4 = '{}';".format(part_1, part_2, \
                df.iloc[i].at['Count'], \
                2*len(other_list)+2, 'Text')
        engine.execute(update_statement)
However, I am getting the following syntax error:
OperationalError: (sqlite3.OperationalError) near "count": syntax error
The error arises during the first iteration.
The statement in question is the following:
UPDATE Table
SET Column_1 = Column_1 || ', (part_1, part_2)', Item count = count + 2, Tot. = 6
WHERE Column_4 = 'Text';
Any help will be very appreciated!
Not an SQLite expert, but is Count a keyword in SQLite? If so, maybe alias the Count field. Try:
UPDATE Table SET Column_1 = Column_1 || ', additional_text', theCount = theCount + 2 AS 'Count', Total = 6
WHERE Column_4 = 'Text';
As pointed out by @forpas, the column name Item count was the problem because it contains a space. Renaming the database column to Item_count fixed the syntax error.
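If renaming the column is not an option, SQLite also accepts identifiers wrapped in double quotes, which covers names containing a space or a dot. A minimal sketch with the standard-library sqlite3 module, using an in-memory database and a hypothetical table named parts in place of the question's table; the values are bound with ? placeholders instead of str.format:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; "Item count" and "Tot." need double quotes.
conn.execute(
    'CREATE TABLE parts ('
    'Column_1 TEXT, "Item count" INTEGER, "Tot." INTEGER, Column_4 TEXT)'
)
conn.execute("INSERT INTO parts VALUES (?, ?, ?, ?)", ("start", 1, 0, "Text"))


def apply_update(conn, part_1, part_2, count, total, key):
    """Append a pair to Column_1 and update both counters for one key."""
    conn.execute(
        'UPDATE parts '
        'SET Column_1 = Column_1 || ?, '
        '"Item count" = "Item count" + ?, '
        '"Tot." = ? '
        'WHERE Column_4 = ?',
        (", ({}, {})".format(part_1, part_2), count, total, key),
    )


apply_update(conn, "a", "b", 2, 6, "Text")
row = conn.execute(
    'SELECT Column_1, "Item count", "Tot." FROM parts'
).fetchone()
print(row)
```

Binding the values also avoids having to quote string literals by hand inside the statement.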

Is it possible to invoke BigQuery procedures in python client?

Scripting/procedures for BigQuery just came out in beta - is it possible to invoke procedures using the BigQuery python client?
I tried:
query = """CALL `myproject.dataset.procedure`()...."""
job = client.query(query, location="US",)
print(job.results())
print(job.ddl_operation_performed)
print(job._properties)
but that didn't give me the result set from the procedure. Is it possible to get the results?
Thank you!
Edited - stored procedure I am calling
CREATE OR REPLACE PROCEDURE `Project.Dataset.Table`(IN country STRING, IN accessDate DATE, IN accessId, OUT saleExists INT64)
BEGIN
IF EXISTS (SELECT 1 FROM dataset.table where purchaseCountry = country and purchaseDate=accessDate and customerId = accessId)
THEN
SET saleExists = (SELECT 1);
ELSE
INSERT Dataset.MissingSalesTable (purchaseCountry, purchaseDate, customerId) VALUES (country, accessDate, accessId);
SET saleExists = (SELECT 0);
END IF;
END;
If you follow the CALL command with a SELECT statement, you can get the return value of the function as a result set. For example, I created the following stored procedure:
BEGIN
-- Build an array of the top 100 names from the year 2017.
DECLARE
top_names ARRAY<STRING>;
SET
top_names = (
SELECT
ARRAY_AGG(name
ORDER BY
number DESC
LIMIT
100)
FROM
`bigquery-public-data.usa_names.usa_1910_current`
WHERE
year = 2017 );
-- Which names appear as words in Shakespeare's plays?
SET
top_shakespeare_names = (
SELECT
ARRAY_AGG(name)
FROM
UNNEST(top_names) AS name
WHERE
name IN (
SELECT
word
FROM
`bigquery-public-data.samples.shakespeare` ));
END
Running the following query will return the procedure's output as the top-level result set.
DECLARE top_shakespeare_names ARRAY<STRING> DEFAULT NULL;
CALL `my-project.test_dataset.top_names`(top_shakespeare_names);
SELECT top_shakespeare_names;
In Python:
from google.cloud import bigquery
client = bigquery.Client()
query_string = """
DECLARE top_shakespeare_names ARRAY<STRING> DEFAULT NULL;
CALL `my-project.test_dataset.top_names`(top_shakespeare_names);
SELECT top_shakespeare_names;
"""
query_job = client.query(query_string)
rows = list(query_job.result())
print(rows)
Related: If you have SELECT statements within a stored procedure, you can walk the job to fetch the results, even if the SELECT statement isn't the last statement in the procedure.
# TODO(developer): Import the client library.
# from google.cloud import bigquery
# TODO(developer): Construct a BigQuery client object.
# client = bigquery.Client()
# Run a SQL script.
sql_script = """
-- Declare a variable to hold names as an array.
DECLARE top_names ARRAY<STRING>;
-- Build an array of the top 100 names from the year 2017.
SET top_names = (
SELECT ARRAY_AGG(name ORDER BY number DESC LIMIT 100)
FROM `bigquery-public-data.usa_names.usa_1910_2013`
WHERE year = 2000
);
-- Which names appear as words in Shakespeare's plays?
SELECT
name AS shakespeare_name
FROM UNNEST(top_names) AS name
WHERE name IN (
SELECT word
FROM `bigquery-public-data.samples.shakespeare`
);
"""
parent_job = client.query(sql_script)
# Wait for the whole script to finish.
rows_iterable = parent_job.result()
print("Script created {} child jobs.".format(parent_job.num_child_jobs))
# Fetch result rows for the final sub-job in the script.
rows = list(rows_iterable)
print("{} of the top 100 names from year 2000 also appear in Shakespeare's works.".format(len(rows)))
# Fetch jobs created by the SQL script.
child_jobs_iterable = client.list_jobs(parent_job=parent_job)
for child_job in child_jobs_iterable:
    child_rows = list(child_job.result())
    print("Child job with ID {} produced {} rows.".format(child_job.job_id, len(child_rows)))
It works if you have SELECT inside your procedure, given the procedure being:
create or replace procedure dataset.proc_output() BEGIN
SELECT t FROM UNNEST(['1','2','3']) t;
END;
Code:
from google.cloud import bigquery
client = bigquery.Client()
query = """CALL dataset.proc_output()"""
job = client.query(query, location="US")
for result in job.result():
    print(result)
will output:
Row((u'1',), {u't': 0})
Row((u'2',), {u't': 0})
Row((u'3',), {u't': 0})
However, if there are multiple SELECT inside a procedure, only the last result set can be fetched this way.
Update
See below example:
CREATE OR REPLACE PROCEDURE zyun.exists(IN country STRING, IN accessDate DATE, OUT saleExists INT64)
BEGIN
SET saleExists = (WITH data AS (SELECT "US" purchaseCountry, DATE "2019-1-1" purchaseDate)
SELECT Count(*) FROM data where purchaseCountry = country and purchaseDate=accessDate);
IF saleExists = 0 THEN
INSERT Dataset.MissingSalesTable (purchaseCountry, purchaseDate, customerId) VALUES (country, accessDate, accessId);
END IF;
END;
BEGIN
DECLARE saleExists INT64;
CALL zyun.exists("US", DATE "2019-2-1", saleExists);
SELECT saleExists;
END
BTW, your example is much better served with a single MERGE statement instead of a script.

Passing query parameters in Pandas

I'm trying to write a simple pandas script that runs a query against SQL Server with a WHERE clause. However, the query doesn't return any rows, possibly because the parameter is not being passed. I thought we could pass key-value pairs as below. Can you please point out what I am doing wrong here?
Posting just the query and relevant pieces. All the libraries have been imported as needed.
curr_sales_month = '2015-08-01'
sql_query = """SELECT sale_month,region,oem,nameplate,Model,Segment,Sales FROM [MONTHLY_SALES] WHERE Sale_Month = %(salesmonth)s"""
print ("Executed SQL Extract", sql_query)
df = pd.read_sql_query(sql_query,conn,params={"salesmonth":curr_sales_month})
The program returned with:
Closed Connection - Fetched 0 rows for Report
Process finished with exit code 0
Further to my comment, here is an example that uses pyodbc to communicate with SQL Server and demonstrates passing a variable.
import pandas as pd
import pyodbc
pd.set_option('display.max_columns',50)
pd.set_option('display.width',5000)
conn_str = r"DRIVER={0};SERVER={1};DATABASE={2};UID={3};PWD={4}".format("SQL Server",'.','master','user','pwd')
cnxn = pyodbc.connect(conn_str)
sql_statement = "SELECT * FROM sys.databases WHERE database_id = ?"
df = pd.read_sql_query(sql = sql_statement, con = cnxn, params = [2])
cnxn.close()
print(df.iloc[:,0:2].head())
which produces:
name database_id
0 tempdb 2
And if you wish to pass multiple parameters:
sql_statement = "SELECT * FROM sys.databases WHERE database_id > ? and database_id < ?"
df = pd.read_sql_query(sql = sql_statement, con = cnxn, params = [2,5])
cnxn.close()
print(df.iloc[:,0:2].head())
which produces:
name database_id
0 model 3
1 msdb 4
My preferred way is to use dynamic inline SQL statements:
create_date = '2015-01-01'
name = 'mod'
sql_statement_template = r"""SELECT * FROM sys.databases WHERE database_id > {0} AND database_id < {1} AND create_date > '{2}' AND name LIKE '{3}%'"""
sql_statement = sql_statement_template.format('2','5',create_date,name)
print(sql_statement)
yields
SELECT * FROM sys.databases WHERE database_id > 2 AND database_id < 5 AND create_date > '2015-01-01' AND name LIKE 'mod%'
A further benefit of printing the statement is that you can copy and paste the SQL command into Management Studio (or equivalent) and test your SQL syntax easily.
The result should be:
name database_id
0 model 3
So this example demonstrates handling date, string and int datatypes, including a LIKE with the % wildcard.
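The same filters can also be expressed with bound parameters, which avoids quoting problems and SQL injection when the values come from user input. A sketch using the standard-library sqlite3 module as a stand-in for SQL Server (the databases table and its rows are made up for illustration; sqlite3 uses the same ? placeholder style shown with pyodbc earlier in this answer):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up table standing in for sys.databases.
conn.execute(
    "CREATE TABLE databases (name TEXT, database_id INTEGER, create_date TEXT)"
)
conn.executemany(
    "INSERT INTO databases VALUES (?, ?, ?)",
    [
        ("tempdb", 2, "2016-03-01"),
        ("model", 3, "2015-06-01"),
        ("msdb", 4, "2014-01-01"),
    ],
)

sql_statement = (
    "SELECT name, database_id FROM databases "
    "WHERE database_id > ? AND database_id < ? "
    "AND create_date > ? AND name LIKE ?"
)
# The % wildcard is part of the bound value, not of the SQL text.
params = (2, 5, "2015-01-01", "mod" + "%")
rows = conn.execute(sql_statement, params).fetchall()
print(rows)
conn.close()
```

With pandas, the same statement and params would be passed as pd.read_sql_query(sql_statement, cnxn, params=params).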

SQL Update syntax check

I'm writing a simple update statement in Teradata and I'm having trouble getting it to work. I'm getting a syntax error saying: Syntax error: expected something between the word 'First_name' and the 'FROM' keyword. This refers to line 7. I have no idea what I'm missing.
Here's the code with some redacted object names:
UPDATE DATA.CONTACTS tgt
SET
tgt.LAST_NAME = TABLES.PART.LAST_NAME
,tgt.BPP_USER_ID = TABLES.PART.User_Id
,tgt.Email_Address = TABLES.PART.Email
,tgt.Last_name = TABLES.PART.Last_name
,tgt.First_name = TABLES.PART.First_name
FROM
(SELECT
C_C
, USER_ID
, Email
,Last_name
,First_name
FROM TABLES.PART) --ppage
WHERE EMAIL_ADDRESS IN (
SELECT Email
FROM DATA.CONTACTS
);
select * from mmbi_tables_data.crm_mmbi_contacts
I've tried removing the ppage alias, but I still get the same error regardless of what I do.
This doesn't look so simple. I think the following will work in Teradata:
UPDATE tgt
FROM data.contacts tgt, tables.part ppage
SET LAST_NAME = ppage.LAST_NAME,
    BPP_USER_ID = ppage.User_Id,
    Email_Address = ppage.Email,
    First_name = ppage.First_name
WHERE tgt.Email_Address = ppage.Email;