Inserting data into a BigQuery table through Python - sql

I am trying to insert data into an existing BigQuery Table. But I'm struggling. I am sorry but I am new to BigQuery so I am surely missing something. I am using the BigQuery API and I want to append/insert the data through Python.
from google.cloud import bigquery
from google.api_core.exceptions import NotFound
import logging
import os

def get_query_text(file_path):
    with open(file_path, 'r') as file:
        query_text = file.read()
    return query_text

def invoke_transformation_queries(sql_file_name, pipeline_settings):
    logging.basicConfig(filename='error.log', level=logging.ERROR)
    client = bigquery.Client(project=pipeline_settings['project'])
    print("check")
    # debug_string = f"{pipeline_settings['repo_path']}sql/{sql_file_name}"
    # sql = get_query_text(
    #     f"{pipeline_settings['repo_path']}sql/{sql_file_name}")
    file_path = os.path.join('sql', sql_file_name)
    sql = get_query_text(file_path)

def run_transformation_query(query_file, pipeline_settings):
    result = invoke_transformation_queries(query_file, pipeline_settings)
    error_value = result.error_result

run_transformation_query("A_DAILY_ORDER_NORMALIZE.sql", pipeline_settings)
My SQL File is as follows:
DECLARE project_name STRING DEFAULT 'xxx-dl-cat-training';
DECLARE dataset_name STRING DEFAULT 'xxxxxxx_bootcamp_dataset_source';
DECLARE table_name_source STRING DEFAULT 'xxxxxxx-bootcamp-table-source';
DECLARE table_name_target STRING DEFAULT 'xxxxxxx-bootcamp-table-target';
DECLARE todays_date STRING;
SET todays_date = FORMAT_DATE("%Y%m%d", CURRENT_DATE());
WITH ORDERS AS (
SELECT
date_order,
area,
customer_name,
SPLIT(order_details, ',') AS items_list,
total_transaction
FROM
`${project_name}.${dataset_name}.${table_name_source}`
), TRANSFORMED_ORDERS AS (
SELECT
date_order,
area,
customer_name,
TRIM(IFNULL(
REGEXP_REPLACE(
item,
r'-\s*\d+(\.\d+)?$',
''
),
item
)) AS item_name,
CAST(NULLIF(TRIM(REGEXP_EXTRACT(item, r'-\s*\d+(\.\d+)?')), '') AS FLOAT64) AS item_price,
total_transaction
FROM ORDERS, UNNEST(items_list) as item
WHERE CAST(NULLIF(TRIM(REGEXP_EXTRACT(item, r'-\s*\d+(\.\d+)?')), '') AS FLOAT64) IS NOT NULL
)
CREATE OR REPLACE TABLE `${project_name}.${dataset_name}.${table_name_target}`
SELECT *
FROM TRANSFORMED_ORDERS;
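As a side note, the parsing that TRANSFORMED_ORDERS performs — strip a trailing "- <number>" to get the item name, pull the number out as the price — can be sketched in plain Python (the sample strings are made up, and Python's re module stands in for BigQuery's REGEXP functions):

```python
import re

def parse_item(item: str):
    # Price is a trailing "- <number>" suffix; the name is whatever precedes it.
    price_match = re.search(r'-\s*\d+(\.\d+)?$', item)
    name = re.sub(r'-\s*\d+(\.\d+)?$', '', item).strip()
    price = float(price_match.group(0).lstrip('-').strip()) if price_match else None
    return name, price

print(parse_item('Banana - 1.25'))   # ('Banana', 1.25)
print(parse_item('Mystery item'))    # ('Mystery item', None)
```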
Once my subquery ends, I get this error:
Expected "(" or "," or keyword SELECT but got keyword CREATE at [36:1]
I am not sure where I am messing up. Any help will be appreciated. I have run the transformation in the BigQuery UI and I am happy with it; it all works OK there.
I shall be grateful if someone can help.
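For what it's worth, the error message points at statement structure: in BigQuery a WITH clause must be followed by a SELECT, so a CREATE statement can't come after the CTE list. A sketch of one way to restructure it, keeping the placeholder names (note also that DECLAREd variables can't be used as table identifiers, so the ${...} parts would still need to be templated into the SQL text before it runs):

```sql
CREATE OR REPLACE TABLE `${project_name}.${dataset_name}.${table_name_target}` AS
WITH ORDERS AS (
  -- ... unchanged from the file above ...
), TRANSFORMED_ORDERS AS (
  -- ... unchanged from the file above ...
)
SELECT *
FROM TRANSFORMED_ORDERS;
```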

Related

Spark SQL column doesn't exist

I am using Spark in databricks for this SQL command.
In the input_data table, I have a string for the st column. Here I want to do some calculations of the string length.
However, after I assign the length_s alias to the first column, I cannot reference it in the following columns; the SQL engine gives a Column 'length_s1' does not exist error message. How can I fix this?
CREATE OR REPLACE VIEW content_data (SELECT LENGTH(st) AS length_s, LENGTH(st)-LENGTH(REGEXP_REPLACE(seq,'[AB]','')) AS AB_c,
length_s - LENGTH(REGEXP_REPLACE(seq, '[CD]', '') AS CD_c, CD_c+AB_c AS sum_c
FROM input_data)
You can't reference a column alias elsewhere in the same SELECT list, so repeat the expressions instead:
CREATE OR REPLACE VIEW content_data AS
SELECT
    LENGTH(st) AS length_s,
    LENGTH(st) - LENGTH(REGEXP_REPLACE(seq, '[AB]', '')) AS AB_c,
    LENGTH(st) - LENGTH(REGEXP_REPLACE(seq, '[CD]', '')) AS CD_c,
    (LENGTH(st) - LENGTH(REGEXP_REPLACE(seq, '[CD]', '')))
      + (LENGTH(st) - LENGTH(REGEXP_REPLACE(seq, '[AB]', ''))) AS sum_c
FROM input_data
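If repeating the expressions feels error-prone, a sketch of an alternative (same table and column names as above) is to compute the base columns in an inner query and derive sum_c one level up:

```sql
CREATE OR REPLACE VIEW content_data AS
SELECT
    length_s,
    AB_c,
    CD_c,
    AB_c + CD_c AS sum_c
FROM (
    SELECT
        LENGTH(st) AS length_s,
        LENGTH(st) - LENGTH(REGEXP_REPLACE(seq, '[AB]', '')) AS AB_c,
        LENGTH(st) - LENGTH(REGEXP_REPLACE(seq, '[CD]', '')) AS CD_c
    FROM input_data
) t
```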

Variable Storage - JDBC request in Jmeter

I have the following problem:
When adding a JDBC Request, query type "Select Statement", I add a variable name, but it doesn't save successfully.
Could anyone tell the reason?
Code below and print below.
Script to select:
USE ${DATABASE};
DECLARE @ID_SOLICITACAO_RSP AS int
DECLARE @NM_ARQUIVO_RET AS varchar(50)
SET @ID_SOLICITACAO_RSP =
(
    SELECT VLR_SEQUENCIA
    FROM TBJD_SEQUENCIA
    WHERE CD_NEGOCIO = 'JDCTC'
      AND CD_OBJETO = 'IDSOLIC'
)
SET @NM_ARQUIVO_RET =
(
    SELECT substring(cat.nm_arqv, len(cat.nm_arqv) - 30, 31) + '_RET.XML'
    FROM TBJDCTCPRO_SOLIC_ARQV_TRANS SAT
    JOIN TBJDCTCCIP_ARQV_TRANS CAT ON (SAT.ID_ARQV_TRANS = CAT.ID_ARQV_TRANS)
    JOIN TBJDCTCPRO_SOLIC SOL ON (SOL.ID_SOLICITACAO = SAT.ID_SOLICITACAO)
    WHERE SAT.ID_SOLICITACAO = @ID_SOLICITACAO_RSP
      AND CAT.TP_ARQV IN ('ACTC101', 'ACTC201', 'ACTC301', 'ACTC401', 'ACTC501', 'ACTC601', 'ACTC701', 'ACTC801', 'ACTC851')
)
PRINT @NM_ARQUIVO_RET;
Can you help me?
We cannot, because JMeter's "Select Statement" query type calls Statement.executeQuery() under the hood at the JDBC level, and your multi-statement script doesn't produce a ResultSet.
So you need to transform your query to something like:
SELECT substring(cat.nm_arqv, len(cat.nm_arqv) - 30, 31) + '_RET.XML'
FROM TBJDCTCPRO_SOLIC_ARQV_TRANS SAT
JOIN TBJDCTCCIP_ARQV_TRANS CAT ON (SAT.ID_ARQV_TRANS = CAT.ID_ARQV_TRANS)
JOIN TBJDCTCPRO_SOLIC SOL ON (SOL.ID_SOLICITACAO = SAT.ID_SOLICITACAO)
WHERE SAT.ID_SOLICITACAO = (
SELECT VLR_SEQUENCIA
FROM TBJD_SEQUENCIA
WHERE CD_NEGOCIO = 'JDCTC'
AND CD_OBJETO = 'IDSOLIC'
)
AND CAT.TP_ARQV IN ('ACTC101', 'ACTC201', 'ACTC301', 'ACTC401', 'ACTC501', 'ACTC601', 'ACTC701', 'ACTC801', 'ACTC851')
so that it issues a single SELECT statement that returns a result.
More information: The Real Secret to Building a Database Test Plan With JMeter

INSERT values into table using cursor.execute

I'm writing some code that will pull data from an API and insert the records into a table for me.
I'm unsure how to go about formatting my insert statement. I want to insert values where there is no existing match in the table (based on date), and I don't want to insert values where the column opponents = my school's team.
import datetime
import os

import requests
import cx_Oracle
from pytz import timezone

currentYear = 2020
con = Some_datawarehouse
cursor = con.cursor()

json_obj = requests.get('https://api.collegefootballdata.com/games?year='
                        + str(currentYear) + '&seasonType=regular&team=myteam').json()

for item in json_obj:
    EVENTDATE = datetime.datetime.strptime(item['start_date'], '%Y-%m-%dT%H:%M:%S.%fZ').date()
    EVENTTIME = str(datetime.datetime.strptime(item['start_date'], '%Y-%m-%dT%H:%M:%S.%fZ')
                    .replace(tzinfo=timezone('EST')).time())
    FINAL_SCORE = item.get("home_points", None)
    OPPONENT = item.get("away_team", None)
    OPPONENT_FINAL_SCORE = item.get("away_points", None)
    cursor.execute('''INSERT INTO mytable(EVENTDATE,EVENTTIME,FINAL_SCORE,OPPONENT,OPPONENT_FINAL_SCORE)
                      VALUES (:1,:2,:3,:4,:5)
                      WHERE OPPONENT <> 'my team'
                      AND EVENTDATE NOT EXISTS (SELECT EVENTDATE FROM mytable)''',
                   [EVENTDATE, EVENTTIME, FINAL_SCORE, OPPONENT, OPPONENT_FINAL_SCORE])

con.commit()
con.close()
This may be more of an Oracle SQL question than a Python one, but I'm not sure whether cursor.execute can accept MERGE statements. I also recognize that the WHERE clause will not work here; it's just an idea of what I'm trying to accomplish.
Change the SQL query to a single INSERT ... SELECT (Oracle has no standalone VALUES table constructor, so select the bind values FROM dual):
INSERT INTO mytable (EVENTDATE, EVENTTIME, FINAL_SCORE, OPPONENT, OPPONENT_FINAL_SCORE)
SELECT :1, :2, :3, :4, :5
FROM dual
WHERE :4 <> 'my team'
AND :1 NOT IN (SELECT EVENTDATE FROM mytable)
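The INSERT ... SELECT ... WHERE pattern does express "insert only when the conditions hold". A self-contained sketch using sqlite3 (chosen purely so the example runs anywhere; the table and values are made up, and Oracle would select the binds FROM dual instead):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE games (eventdate TEXT, opponent TEXT, score INTEGER)")

def insert_game(eventdate, opponent, score):
    # Insert only when the opponent isn't our own team
    # and no row for that date exists yet.
    cur.execute(
        """INSERT INTO games (eventdate, opponent, score)
           SELECT ?, ?, ?
           WHERE ? <> 'my team'
             AND NOT EXISTS (SELECT 1 FROM games WHERE eventdate = ?)""",
        (eventdate, opponent, score, opponent, eventdate),
    )

insert_game("2020-09-05", "Rivals", 31)
insert_game("2020-09-05", "Rivals", 31)   # same date again: skipped
insert_game("2020-09-12", "my team", 7)   # own team: skipped
con.commit()
print(cur.execute("SELECT COUNT(*) FROM games").fetchone()[0])  # 1
```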

Right way to implement pandas.read_sql with ClickHouse

I'm trying to use the pandas.read_sql function with ClickHouse.
I created a clickhouse table and filled it:
create table regions
(
date DateTime Default now(),
region String
)
engine = MergeTree()
PARTITION BY toYYYYMM(date)
ORDER BY tuple()
SETTINGS index_granularity = 8192;
insert into regions (region) values ('Asia'), ('Europe')
Then python code:
import pandas as pd
from sqlalchemy import create_engine
uri = 'clickhouse://default:@localhost/default'
engine = create_engine(uri)
query = 'select * from regions'
pd.read_sql(query, engine)
As a result I expected to get a DataFrame with columns date and region, but all I get is an empty DataFrame (note that the first data row has been consumed as the column names):
Empty DataFrame
Columns: [2021-01-08 09:24:33, Asia]
Index: []
UPD: it turned out that specifying clickhouse+native in the URI solves the problem.
Can it be solved without +native?
There is an ancient issue, https://github.com/xzkostyan/clickhouse-sqlalchemy/issues/10. There is also a hint there suggesting you add FORMAT TabSeparatedWithNamesAndTypes at the end of the query, so the initial query would look like this:
select *
from regions
FORMAT TabSeparatedWithNamesAndTypes
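If you go that route, the rewrite is mechanical; a tiny helper (plain Python, the function name is mine) shows the shape:

```python
def add_format_hint(query: str) -> str:
    # Append the FORMAT clause suggested in the linked issue to a plain SELECT.
    return query.rstrip().rstrip(';') + '\nFORMAT TabSeparatedWithNamesAndTypes'

print(add_format_hint('select * from regions'))
```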

MySQL IN Operator

http://pastebin.ca/1946913
When I write "IN (1,2,4,5,6,7,8,9,10)" inside the procedure, I get the correct result, but when I put the id variable in the "IN", the results are incorrect. I made a function in MySQL but it's still not working. What can I do?
Strings (broadly, variable values) don't interpolate in statements. vKatID IN (id) checks whether vKatID is equal to any of the values listed, which is only one: the value of id. You can create dynamic queries using PREPARE and EXECUTE to interpolate values:
SET @query = CONCAT('SELECT COUNT(*) AS toplam
FROM videolar
WHERE vTarih = CURDATE() AND vKatID IN (', id, ') AND vDurum = 1');
PREPARE bugun FROM @query;
EXECUTE bugun;
You could use FIND_IN_SET( ) rather than IN, for example:
SELECT COUNT(*) AS toplam
FROM videolar
WHERE vTarih = CURDATE()
AND FIND_IN_SET( vKatID, id ) > 0
AND vDurum = 1
Sets have limitations - they can't have more than 64 members for example.
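To make FIND_IN_SET's semantics concrete: it returns the 1-based position of the needle within a comma-separated string, or 0 when absent. A quick Python model of that behaviour (illustration only, not MySQL code):

```python
def find_in_set(needle: str, haystack: str) -> int:
    # Model of MySQL FIND_IN_SET: 1-based position in a
    # comma-separated list, 0 when the needle is absent.
    items = haystack.split(',')
    return items.index(needle) + 1 if needle in items else 0

print(find_in_set('4', '1,2,4,5'))  # 3
print(find_in_set('9', '1,2,4,5'))  # 0
```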
Your id variable is a string (VARCHAR), not an array (a tuple in SQL); i.e., in Java terms you are doing this:
String id = "1,2,3,4,5,6,7";
when you want:
int[] ids = {1,2,3,4,5,6,7}
So in your code you want something like:
set id = (1,2,3,4,5,6,7,8,9,10)
I can't help you with the exact syntax for declaring id that way, as I don't know it. To keep the code easily updated, I would suggest creating a table with just the ids, then changing your stored procedure to:
SELECT COUNT(*) AS toplam
FROM videolar
WHERE vTarih = CURDATE() AND vKatID IN (SELECT DISTINCT id FROM idtable) AND vDurum = 1;
Hope this helps.