I'm writing some code that will pull data from an API and insert the records into a table for me.
I'm unsure how to go about formatting my insert statement. I want to insert rows only when there is no existing match in the table (based on the date), and I don't want to insert rows where the OPPONENT column equals my school's team.
import datetime
import requests
import cx_Oracle
import os
from pytz import timezone
currentYear = 2020
con = Some_datawarehouse
cursor = con.cursor()
json_obj = requests.get('https://api.collegefootballdata.com/games?year='+str(currentYear)+'&seasonType=regular&team=myteam')\
    .json()
for item in json_obj:
    # start_date is UTC (trailing Z); convert to US Eastern before splitting into date and time
    start = datetime.datetime.strptime(item['start_date'], '%Y-%m-%dT%H:%M:%S.%fZ')\
        .replace(tzinfo=datetime.timezone.utc).astimezone(timezone('US/Eastern'))
    EVENTDATE = start.date()
    EVENTTIME = str(start.time())
    FINAL_SCORE = item.get("home_points", None)
    OPPONENT = item.get("away_team", None)
    OPPONENT_FINAL_SCORE = item.get("away_points", None)
    cursor.execute('''INSERT INTO mytable(EVENTDATE,EVENTTIME,FINAL_SCORE,OPPONENT,OPPONENT_FINAL_SCORE) VALUES (:1,:2,:3,:4,:5)
        WHERE OPPONENT <> 'my team'
        AND EVENTDATE NOT EXISTS (SELECT EVENTDATE FROM mytable);''',
        [EVENTDATE,EVENTTIME,FINAL_SCORE,OPPONENT,OPPONENT_FINAL_SCORE])
con.commit()
con.close()
This may be more of an Oracle SQL question than a Python one, but I'm not sure whether cursor.execute can accept MERGE statements. I also recognize that the WHERE clause will not work here; it is just meant to show what I'm trying to accomplish.
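Change the SQL query to an INSERT ... SELECT, which, unlike INSERT ... VALUES, accepts a WHERE clause. Oracle versions before 23ai do not accept a VALUES list as a table source, so select the bind values from DUAL instead, and drop the trailing semicolon, which cx_Oracle rejects with ORA-00911. (cursor.execute runs MERGE statements as well, if you prefer that route.)
INSERT INTO mytable(EVENTDATE,EVENTTIME,FINAL_SCORE,OPPONENT,OPPONENT_FINAL_SCORE)
SELECT * FROM (SELECT :1 AS EVENTDATE, :2 AS EVENTTIME, :3 AS FINAL_SCORE, :4 AS OPPONENT, :5 AS OPPONENT_FINAL_SCORE FROM dual) vt
WHERE vt.OPPONENT <> 'my team'
AND NOT EXISTS (SELECT 1 FROM mytable t WHERE t.EVENTDATE = vt.EVENTDATE)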
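To check the filtering logic without an Oracle connection, here is a minimal sketch of the conditional-insert pattern (INSERT ... SELECT with a WHERE NOT EXISTS guard) using Python's built-in sqlite3 as a stand-in. SQLite allows a SELECT with a WHERE clause but no FROM, so no DUAL table is needed; the named parameters (:eventdate and so on), the helper name insert_new_games, and the sample games are illustrative assumptions, not part of the original code.

```python
import sqlite3

# Insert one game only if its date is new and the opponent is not our own team.
INSERT_IF_NEW = """
    INSERT INTO mytable (EVENTDATE, EVENTTIME, FINAL_SCORE, OPPONENT, OPPONENT_FINAL_SCORE)
    SELECT :eventdate, :eventtime, :final_score, :opponent, :opponent_final_score
    WHERE :opponent <> :my_team
      AND NOT EXISTS (SELECT 1 FROM mytable WHERE EVENTDATE = :eventdate)
"""

def insert_new_games(conn, games, my_team="my team"):
    """Hypothetical helper: insert each game unless its date exists or the opponent is my_team."""
    for game in games:
        # One statement per game, so later games see rows inserted earlier in the loop
        conn.execute(INSERT_IF_NEW, {**game, "my_team": my_team})
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (EVENTDATE TEXT, EVENTTIME TEXT, FINAL_SCORE INTEGER,"
    " OPPONENT TEXT, OPPONENT_FINAL_SCORE INTEGER)"
)
games = [  # hypothetical sample rows
    {"eventdate": "2020-09-12", "eventtime": "12:00:00", "final_score": 31,
     "opponent": "Team A", "opponent_final_score": 14},
    {"eventdate": "2020-09-12", "eventtime": "15:30:00", "final_score": 20,
     "opponent": "Team B", "opponent_final_score": 17},  # same date: skipped
    {"eventdate": "2020-09-19", "eventtime": "19:00:00", "final_score": 0,
     "opponent": "my team", "opponent_final_score": 0},  # own team: skipped
    {"eventdate": "2020-09-26", "eventtime": "12:00:00", "final_score": 24,
     "opponent": "Team C", "opponent_final_score": 21},
]
insert_new_games(conn, games)
insert_new_games(conn, games)  # re-running adds nothing
print(conn.execute("SELECT EVENTDATE, OPPONENT FROM mytable ORDER BY EVENTDATE").fetchall())
# [('2020-09-12', 'Team A'), ('2020-09-26', 'Team C')]
```

Using NOT EXISTS rather than NOT IN keeps the check correct even if some stored EVENTDATE values are NULL.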
I am trying to insert data into an existing BigQuery table, but I'm struggling. I'm new to BigQuery, so I am surely missing something. I am using the BigQuery API and want to append/insert the data through Python.
from google.cloud import bigquery
from google.api_core.exceptions import GoogleAPICallError
import logging
import os
logging.basicConfig(filename='error.log', level=logging.ERROR)
def get_query_text(file_path):
    with open(file_path, 'r') as file:
        return file.read()
def invoke_transformation_queries(sql_file_name, pipeline_settings):
    client = bigquery.Client(project=pipeline_settings['project'])
    file_path = os.path.join('sql', sql_file_name)
    sql = get_query_text(file_path)
    # Submit the script and wait for it; a failure is logged and left on job.error_result
    job = client.query(sql)
    try:
        job.result()
    except GoogleAPICallError:
        logging.exception("Query %s failed", sql_file_name)
    return job
def run_transformation_query(query_file, pipeline_settings):
    result = invoke_transformation_queries(query_file, pipeline_settings)
    error_value = result.error_result  # None when the job succeeded
    return error_value
# Call only after the functions are defined; pipeline_settings is defined elsewhere
run_transformation_query("A_DAILY_ORDER_NORMALIZE.sql", pipeline_settings)
My SQL file is as follows:
DECLARE project_name STRING DEFAULT 'xxx-dl-cat-training';
DECLARE dataset_name STRING DEFAULT 'xxxxxxx_bootcamp_dataset_source';
DECLARE table_name_source STRING DEFAULT 'xxxxxxx-bootcamp-table-source';
DECLARE table_name_target STRING DEFAULT 'xxxxxxx-bootcamp-table-target';
DECLARE todays_date STRING;
SET todays_date = FORMAT_DATE("%Y%m%d", CURRENT_DATE());
WITH ORDERS AS (
SELECT
date_order,
area,
customer_name,
SPLIT(order_details, ',') AS items_list,
total_transaction
FROM
`${project_name}.${dataset_name}.${table_name_source}`
), TRANSFORMED_ORDERS AS (
SELECT
date_order,
area,
customer_name,
TRIM(IFNULL(
REGEXP_REPLACE(
item,
r'-\s*\d+(\.\d+)?$',
''
),
item
)) AS item_name,
CAST(NULLIF(TRIM(REGEXP_EXTRACT(item, r'-\s*\d+(\.\d+)?')), '') AS FLOAT64) AS item_price,
total_transaction
FROM ORDERS, UNNEST(items_list) as item
WHERE CAST(NULLIF(TRIM(REGEXP_EXTRACT(item, r'-\s*\d+(\.\d+)?')), '') AS FLOAT64) IS NOT NULL
)
CREATE OR REPLACE TABLE `${project_name}.${dataset_name}.${table_name_target}`
SELECT *
FROM TRANSFORMED_ORDERS;
Once my subquery ends, I get this error:
Expected "(" or "," or keyword SELECT but got keyword CREATE at [36:1]
I am not sure where I am messing up. I have run the transformation in the BigQuery UI and I am happy with it; it all works OK.
I would be grateful if someone could help.
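In GoogleSQL a WITH clause belongs to the query that follows AS, so CREATE OR REPLACE TABLE ... AS has to come before WITH; placing CREATE after the CTEs is what the parser rejects at line 36. Separately, BigQuery does not expand `${project_name}`-style placeholders, and variables declared with DECLARE cannot be used as table names without EXECUTE IMMEDIATE. A minimal sketch, assuming the placeholders are meant to be filled in from Python: `string.Template` happens to use the same `${name}` syntax. The constant TRANSFORM_SQL, the function render_transform_sql, and the argument values are hypothetical; the DECLARE lines are dropped because nothing in the statement would use them.

```python
from string import Template

# Hypothetical template built from the question's SQL: CREATE ... AS comes first.
# "$$" escapes the regex end anchor so Template emits a literal "$".
TRANSFORM_SQL = Template(r"""
CREATE OR REPLACE TABLE `${project_name}.${dataset_name}.${table_name_target}` AS
WITH ORDERS AS (
  SELECT date_order, area, customer_name,
         SPLIT(order_details, ',') AS items_list,
         total_transaction
  FROM `${project_name}.${dataset_name}.${table_name_source}`
), TRANSFORMED_ORDERS AS (
  SELECT date_order, area, customer_name,
         TRIM(IFNULL(REGEXP_REPLACE(item, r'-\s*\d+(\.\d+)?$$', ''), item)) AS item_name,
         CAST(NULLIF(TRIM(REGEXP_EXTRACT(item, r'-\s*\d+(\.\d+)?')), '') AS FLOAT64) AS item_price,
         total_transaction
  FROM ORDERS, UNNEST(items_list) AS item
  WHERE CAST(NULLIF(TRIM(REGEXP_EXTRACT(item, r'-\s*\d+(\.\d+)?')), '') AS FLOAT64) IS NOT NULL
)
SELECT * FROM TRANSFORMED_ORDERS
""")

def render_transform_sql(project_name, dataset_name, table_name_source, table_name_target):
    """Substitute the ${...} placeholders in Python before the SQL is sent to BigQuery."""
    return TRANSFORM_SQL.substitute(
        project_name=project_name,
        dataset_name=dataset_name,
        table_name_source=table_name_source,
        table_name_target=table_name_target,
    )

sql = render_transform_sql("my-project", "my_dataset", "orders_source", "orders_target")
print(sql)
```

The rendered string could then be submitted as a single statement, for example with client.query(sql).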
I'm trying to use the pandas.read_sql function.
I created a ClickHouse table and filled it:
create table regions
(
date DateTime Default now(),
region String
)
engine = MergeTree()
PARTITION BY toYYYYMM(date)
ORDER BY tuple()
SETTINGS index_granularity = 8192;
insert into regions (region) values ('Asia'), ('Europe')
Then the Python code:
import pandas as pd
from sqlalchemy import create_engine
uri = 'clickhouse://default:@localhost/default'
engine = create_engine(uri)
query = 'select * from regions'
pd.read_sql(query, engine)
I expected to get a DataFrame with the columns date and region, but all I get is an empty DataFrame:
Empty DataFrame
Columns: [2021-01-08 09:24:33, Asia]
Index: []
UPD: It turned out that using clickhouse+native in the URI solves the problem.
Can it be solved without +native?
There is an old issue, https://github.com/xzkostyan/clickhouse-sqlalchemy/issues/10, which includes a hint: add FORMAT TabSeparatedWithNamesAndTypes at the end of the query. The initial query then looks like this:
select *
from regions
FORMAT TabSeparatedWithNamesAndTypes
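As background for that hint: in TabSeparatedWithNamesAndTypes output, the first line holds the column names and the second the ClickHouse types, with data after that. Giving the client explicit header lines is consistent with the symptom above, where the first data row ended up as the column names. The sketch below is a hypothetical, standard-library-only illustration (add_format_clause and parse_tsv_with_names_and_types are not part of any library); it does not decode TSV backslash escapes, and the sample response text is made up from the values in the question.

```python
import re

def add_format_clause(query, fmt="TabSeparatedWithNamesAndTypes"):
    """Append a ClickHouse FORMAT clause unless the query already ends with one."""
    stripped = query.rstrip().rstrip(";").rstrip()
    if re.search(r"\bFORMAT\s+\w+$", stripped, re.IGNORECASE):
        return stripped
    return f"{stripped}\nFORMAT {fmt}"

def parse_tsv_with_names_and_types(text):
    """Split TabSeparatedWithNamesAndTypes output into (names, types, rows).

    Line 1 holds the column names, line 2 the ClickHouse types, the rest is data.
    Backslash escape sequences inside values are not decoded in this sketch.
    """
    lines = text.rstrip("\n").split("\n")
    names = lines[0].split("\t")
    types = lines[1].split("\t")
    rows = [dict(zip(names, line.split("\t"))) for line in lines[2:]]
    return names, types, rows

print(add_format_clause("select * from regions"))
# Made-up response text shaped like the rows inserted in the question
sample = "date\tregion\nDateTime\tString\n2021-01-08 09:24:33\tAsia\n2021-01-08 09:24:33\tEurope\n"
names, types, rows = parse_tsv_with_names_and_types(sample)
print(names, types)
print(rows)
```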
Quick one, I hope. I am struggling with an Access query.
I need to copy values from Table A into Table B only if they don't already exist in Table B, using the MTR# field to determine whether a record exists.
The query also needs to set tB.ImgRefNum to one more than the previous record's value for each row it inserts.
I need to copy
tA.MTR# to tB.MTR#
tA.MTRF1 to tB.Item
tA.MTRF2 to tB.PONum
tA.MTRF3 to tB.DateRecv (but this needs converting from YYYYMMDD text to a date)
Table A
TRX Number (number)
MTR# (number)
MTRF1 (text)
MTRF2 (text)
MTRF3 (text) (a date, stored as YYYYMMDD text)
Table B
ImgRefNum (number)
MTR# (number)
Item (text)
W (number)
L (number)
Vendor (text)
PONum (number)
DateRecv (date)
Can anyone give me a hand?
You can use the following SQL query (I don't know exactly which part you were struggling with, so I can't give a more specific explanation):
INSERT INTO tB (tB.ImgRefNum, tB.[MTR#], tB.Item, tB.PONum, tB.DateRecv)
SELECT (SELECT Max(tB.ImgRefNum)+1 FROM tB) AS NewRef, tA.[MTR#], tA.MTRF1, tA.MTRF2, DateSerial(CInt(Mid(tA.MTRF3, 1, 4)), CInt(Mid(tA.MTRF3, 5, 2)), CInt(Mid(tA.MTRF3, 7, 2)))
FROM tA
WHERE (SELECT Count(s.[MTR#]) FROM tB AS s WHERE s.[MTR#] = tA.[MTR#]) = 0
Obviously, MTRF3 must always contain a valid date string formatted exactly as YYYYMMDD; otherwise you will run into errors.
Simply use one of the familiar NOT EXISTS, LEFT JOIN ... IS NULL, or NOT IN patterns, with some wrangling for the date and the maximum ImgRefNum. The query below uses the NOT EXISTS approach and assumes month-first dates, MM/DD/YYYY (US locale):
INSERT INTO tB ([ImgRefNum], [MTR#], [Item], [PONum], [DateRecv])
SELECT (SELECT Max(sub.[ImgRefNum]) FROM tB sub) + 1,
tA.[MTR#], tA.[MTRF1], tA.[MTRF2],
CDate(Mid(tA.[MTRF3], 5, 2) & "/" & Mid(tA.[MTRF3], 7, 2) & "/" &
LEFT(tA.[MTRF3], 4))
FROM tA
WHERE NOT EXISTS
(SELECT 1 FROM tB sub WHERE sub.[MTR#] = tA.[MTR#])
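Both answers compute Max(ImgRefNum) + 1 with a single scalar subquery. In standard SQL that subquery sees tB as it was before the INSERT started, so if one run copies several rows they would share the same number. The sketch below uses Python's sqlite3 as a stand-in to show one way to number each copied row separately (ROW_NUMBER() added to the current maximum), together with the NOT EXISTS check and the YYYYMMDD conversion. It is an illustration only: Access SQL has no window functions, so this exact statement will not run in Access; it needs SQLite 3.25 or newer, and the sample rows are hypothetical.

```python
import sqlite3

# Copy tA rows whose MTR# is missing from tB, numbering them max+1, max+2, ...
COPY_MISSING = """
    INSERT INTO tB (ImgRefNum, "MTR#", Item, PONum, DateRecv)
    SELECT (SELECT COALESCE(MAX(ImgRefNum), 0) FROM tB)
               + ROW_NUMBER() OVER (ORDER BY tA."MTR#"),
           tA."MTR#",
           tA.MTRF1,
           CAST(tA.MTRF2 AS INTEGER),
           -- YYYYMMDD text to ISO YYYY-MM-DD text, which SQLite date functions accept
           substr(tA.MTRF3, 1, 4) || '-' || substr(tA.MTRF3, 5, 2) || '-' || substr(tA.MTRF3, 7, 2)
    FROM tA
    WHERE NOT EXISTS (SELECT 1 FROM tB WHERE tB."MTR#" = tA."MTR#")
"""

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE tA ("TRX Number" INTEGER, "MTR#" INTEGER, MTRF1 TEXT, MTRF2 TEXT, MTRF3 TEXT)')
conn.execute('CREATE TABLE tB (ImgRefNum INTEGER, "MTR#" INTEGER, Item TEXT, W INTEGER,'
             ' L INTEGER, Vendor TEXT, PONum INTEGER, DateRecv TEXT)')
# Hypothetical sample data: MTR# 100 already exists in tB, 101 and 102 are new
conn.execute("""INSERT INTO tB (ImgRefNum, "MTR#", Item) VALUES (7, 100, 'existing')""")
conn.executemany("INSERT INTO tA VALUES (?, ?, ?, ?, ?)", [
    (1, 100, "dup", "11", "20200101"),
    (2, 101, "bolt", "12", "20200215"),
    (3, 102, "nut", "13", "20200301"),
])
conn.execute(COPY_MISSING)
conn.execute(COPY_MISSING)  # second run finds nothing new to copy
print(conn.execute('SELECT ImgRefNum, "MTR#", Item, PONum, DateRecv FROM tB ORDER BY ImgRefNum').fetchall())
# [(7, 100, 'existing', None, None), (8, 101, 'bolt', 12, '2020-02-15'), (9, 102, 'nut', 13, '2020-03-01')]
```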
I would like to write forecast data into SQL Server using R and RODBC. Each forecast covers the next six hours, and I would like to save only the newest generation of each forecast. Illustrated here:
set.seed(1)
# First forecast at 00:00:00
df.0 <- data.frame(Dates = seq.POSIXt(from = as.POSIXct("2015-10-29 00:00:00"),
to = as.POSIXct("2015-10-29 5:00:00"), by = "hour"),
Value = runif(6, min = 0, max = 6))
# Second forecast at 01:00:00
df.1 <- data.frame(Dates = seq.POSIXt(from = as.POSIXct("2015-10-29 01:00:00"),
to = as.POSIXct("2015-10-29 6:00:00"), by = "hour"),
Value = runif(6, min = 0, max = 6))
Now, at 00:00:00 I would save my first forecast into my database dbdata:
require(RODBC)
sqlSave(channel = dbdata, data = df.0, tablename = "forecasts",
append = TRUE, rownames = FALSE, fast = FALSE, verbose = TRUE)
# Query: INSERT INTO "forecast" ( "Dates", "Values") VALUES ( '2015-10-29 00:00:00', '1.59')
# Query: INSERT INTO "forecast" ( "Dates", "Values") VALUES ( '2015-10-29 00:00:00', '2.23')
# etc. for all 6 forecasts
Now, at 01:00:00 I get a new forecast. I want to save/update this forecast, so I replace all the values from 01:00:00 to 05:00:00 and then add the newest forecast, for 06:00:00, as well.
The update works well, so I can overwrite the existing rows, but sqlUpdate can't insert the last forecast, for 06:00:00.
sqlUpdate(channel = dbdata, dat = df.1, tablename = "forecasts",
fast = FALSE, index = c("Dates"), verbose = TRUE)
# Query: UPDATE "forecast" SET "Value" = 5.668 WHERE "Dates" = '2015-10-29 00:01:00'
# etc. until
# Error in sqlUpdate(channel = prognoser, dat = df.1[, ],
# table = "forecast", :
# [RODBC] ERROR: Could not SQLExecDirect
# 'UPDATE " "forecast" SET "Value" = 1.059 WHERE "Dates" = '2015-10-29 06:00:00'
So this can probably be solved in a lot of ways, but what are the good ways to do it?
I think there must be better ways than reading the table to find out how far the stored forecast extends, then splitting the new data into an update part and a save part and writing each of them.
It is T-SQL on Microsoft SQL Server. The tables are in the same database, but that is pure coincidence. This means the problem in "RODBC: merge tables from different databases (channel)" shouldn't be an issue here, and perhaps I can get away with a T-SQL MERGE INTO. Next time, though, I probably won't be able to.
You can try a conditional insert followed by an update. The conditional insert only adds a row if its date does not exist yet, and the update then always succeeds (you do some unnecessary updates for values that were just inserted).
Something like the following for the conditional insert:
INSERT INTO "forecast" ("Dates", "Values") SELECT '2015-10-29 00:00:00', '2.23' WHERE NOT EXISTS (SELECT 1 FROM "forecast" WHERE "Dates" = '2015-10-29 00:00:00')
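A minimal sketch of the insert-then-update idea, with Python's sqlite3 standing in for SQL Server. Note that a plain INSERT ... VALUES cannot take a WHERE clause; the guarded form is INSERT ... SELECT ... WHERE NOT EXISTS. The table name forecast, the column names Dates and Value, the helper save_forecast, and the forecast numbers are assumptions for illustration. In R, the same two statements could be sent row by row with RODBC's sqlQuery().

```python
import sqlite3

# Hypothetical table/column names; SQLite stands in for SQL Server here.
INSERT_IF_MISSING = (
    "INSERT INTO forecast (Dates, Value) SELECT ?, ? "
    "WHERE NOT EXISTS (SELECT 1 FROM forecast WHERE Dates = ?)"
)
UPDATE_VALUE = "UPDATE forecast SET Value = ? WHERE Dates = ?"

def save_forecast(conn, rows):
    """Store (date, value) pairs so the newest forecast wins for every hour."""
    for date, value in rows:
        conn.execute(INSERT_IF_MISSING, (date, value, date))  # no-op if the hour exists
        conn.execute(UPDATE_VALUE, (value, date))              # overwrite with newest value
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE forecast (Dates TEXT PRIMARY KEY, Value REAL)")
# Made-up values: the 00:00 forecast covers 00-05, the 01:00 forecast covers 01-06
first = [(f"2015-10-29 {h:02d}:00:00", float(h)) for h in range(0, 6)]
second = [(f"2015-10-29 {h:02d}:00:00", 10.0 + h) for h in range(1, 7)]
save_forecast(conn, first)
save_forecast(conn, second)
rows = conn.execute("SELECT Dates, Value FROM forecast ORDER BY Dates").fetchall()
print(len(rows), rows[0], rows[-1])
# 7 ('2015-10-29 00:00:00', 0.0) ('2015-10-29 06:00:00', 16.0)
```

The cost of this design is one extra statement per row; the benefit is that it never needs to read the table first to decide which rows are new.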
Background:
I want to run this merge inside a procedure on a schedule. I have to insert new data into the SQL database table and, if the data already exists, update the quantities.
Problem:
I am trying to do a merge from an Oracle database into a SQL database and am getting an error (see the title of this question). I have tried running the merge with the same SQL script that was used to create the view, and it returned the same error.
Question:
Is the problem something in my code (see below)?
MERGE INTO "receipt_details"@invware d
USING (
SELECT *
FROM raf_po_receiving_details_v
WHERE last_update_date >= '1-AUG-2013' ) s
ON ( "po_header_id" = s.po_header_id
and "po_line_id" = s.po_line_id )
WHEN MATCHED THEN UPDATE
SET "purchased_qty" = s.purchased_qty,
"qty_received" = s.qty_received,
"balance_remaining" = s.balance_remaining,
"qty_billed" = s.qty_billed
WHEN NOT MATCHED THEN INSERT ( "warehouse_code", "po_number", "po_header_id",
"vendor_name", "line_num",
"item_code", "purchased_qty", "qty_received",
"rcv_by", "balance_remaining", "qty_billed",
"closed_code", "rec_date", "need_by_date", "po_line_id" )
VALUES (s.warehouse_code, s.po_number, s.po_header_id, s.vendor_name,
s.line_num, s.item_code, s.purchased_qty, s.qty_received, s.rcv_by,
s.balance_remaining, s.qty_billed, s.closed_code, s.rec_date, s.need_by_date, s.po_line_id);
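For comparison, here is the same update-or-insert logic keyed on (po_header_id, po_line_id), sketched with SQLite's INSERT ... ON CONFLICT DO UPDATE in Python's sqlite3 as a stand-in for Oracle MERGE. It does not reproduce the database link, so it will not diagnose the error itself; it only shows the intended semantics. The reduced column list, the helper merge_receipts, and the sample rows are hypothetical, and upsert syntax needs SQLite 3.24 or newer.

```python
import sqlite3

# Hypothetical, reduced version of receipt_details; the key must be unique for ON CONFLICT.
UPSERT = """
    INSERT INTO receipt_details
        (po_header_id, po_line_id, item_code, purchased_qty, qty_received)
    VALUES (:po_header_id, :po_line_id, :item_code, :purchased_qty, :qty_received)
    ON CONFLICT (po_header_id, po_line_id) DO UPDATE SET
        purchased_qty = excluded.purchased_qty,
        qty_received = excluded.qty_received
"""

def merge_receipts(conn, source_rows):
    """Update quantities for existing (po_header_id, po_line_id) keys, insert new keys."""
    conn.executemany(UPSERT, source_rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE receipt_details (po_header_id INTEGER, po_line_id INTEGER,"
    " item_code TEXT, purchased_qty REAL, qty_received REAL,"
    " PRIMARY KEY (po_header_id, po_line_id))"
)
merge_receipts(conn, [
    {"po_header_id": 1, "po_line_id": 1, "item_code": "A", "purchased_qty": 10, "qty_received": 0},
])
merge_receipts(conn, [
    # matched key: quantities updated
    {"po_header_id": 1, "po_line_id": 1, "item_code": "A", "purchased_qty": 10, "qty_received": 4},
    # unmatched key: row inserted
    {"po_header_id": 1, "po_line_id": 2, "item_code": "B", "purchased_qty": 5, "qty_received": 5},
])
print(conn.execute(
    "SELECT po_line_id, qty_received FROM receipt_details ORDER BY po_line_id"
).fetchall())
# [(1, 4.0), (2, 5.0)]
```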