BigQuery LEFT OUT JOIN error for a temp function? - google-bigquery

I have a table I have create in BigQuery.
In that table I have some placeholder values in fields.
I then thought I could simply update the placeholder values, so I wrote a small temp function
CREATE TEMP FUNCTION getBoroughFromCoords(longitude FLOAT64, latitude FLOAT64 )
RETURNS STRING
AS ((SELECT CAST(UPPER(tz_loc.borough)as STRING) FROM `bigquery-public-data.new_york_taxi_trips.taxi_zone_geom` tz_loc WHERE (ST_DWithin(tz_loc.zone_geom, ST_GeogPoint(longitude, latitude),0)) )
);
and tested it like this sql select getBoroughFromCoords(-73.95908, 40.705246) and it returned "BROOKLYN" - which was great.
But when I try to update a value (and I am specifially targetting a single row here for testing) using:
UPDATE `project-id.datasetid.collated_data`
SET NEIGHBORHOOD = (select getBoroughFromCoords(LONG, LAT))
WHERE collision_date = "2019-10-27" AND LAT = 40.705246 AND LONG = -73.95908
I get the error "LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join."
What I don't understand is why? I mean, this function is standalone, correct? it just takes in 2 params and returns a string, so why does it suddenly want a join?

You encounter the LEFT OUTER JOIN error since you are not passing values to LONG and LAT on your subquery. To fix this you should SET values to these variables at the start of your query.
DECLARE long_var, lat_var FLOAT64;
SET long_var = -73.95908;
SET lat_var = 40.705246;
To test this out, I just created a table with dummy values just enough to satisfy your WHERE conditions.
Query used:
DECLARE long_var, lat_var FLOAT64;
SET long_var = -73.95908;
SET lat_var = 40.705246;
CREATE TEMP FUNCTION getBoroughFromCoords(longitude FLOAT64, latitude FLOAT64 )
RETURNS STRING
AS ((SELECT CAST(UPPER(tz_loc.borough)as STRING) FROM `bigquery-public-data.new_york_taxi_trips.taxi_zone_geom` tz_loc
WHERE (ST_DWithin(tz_loc.zone_geom, ST_GeogPoint(longitude, latitude),0)))
);
UPDATE `project-id.dataset_id.collated_data`
SET NEIGHBORHOOD = (select getBoroughFromCoords(long_var, lat_var))
WHERE collision_date = "2019-10-27" AND LAT = lat_var AND LONG = long_var
Output:
Query project-id.dataset_id.collated_data table:

Related

One-Time Use Randomized Number

Tables(columns) that are important here (there are more in my actual DB but I simplified):
Patients(PatientID, PatientNumber, DOB, Weight, RandomCode)
RandomNumbers(RandomNumberID, Available)
Sites(SiteID, SiteNumber)
Patients does not contain any data and is to be populated via a 3rd party GUI. RandomNumbers contains 50 entries. When a patient's data is first added to Patients, a random number 1-50 is selected, which is tied to RandomNumberID. The "available" column in RandomNumbers is treated as a boolean (T meaning available, F meaning unavailable). After the RandomNumberID is tied to a patient, that RandomNumberID cannot be used again (Available is switched to "F"). If an already used RandomNumber is assigned to a patient upon data entry, another randomly selected RandomNumberID is selected until one is found where Available is T.
Here is the storedproc that I've written so far:
GO
CREATE PROCEDURE uspPatientEntry
#intPatientID AS INTEGER OUTPUT
,#intPatientNumber AS INTEGER
,#intSiteID AS INTEGER
,#intRandomCode AS INTEGER
AS
SET NOCOUNT ON
SET XACT_ABORT ON
BEGIN TRANSACTION
SELECT #intPatientID = MAX (intPatientID) + 1
FROM TPatients TP (TABLOCKX)
SELECT #intPatientID = COALESCE(#intPatientID, 1)
SELECT #intPatientNumber = CONCAT(#intSiteID, #intPatientID)
INSERT INTO TPatients (intPatientID, intPatientNumber, intSiteID, intRandomCodeID)
VALUES (#intPatientID, #intPatientNumber, #intSiteID, #intRandomCode)
COMMIT TRANSACTION
GO
And here is another function I wrote to test it out:
DECLARE #intPatientID AS INTEGER = 0;
DECLARE #intSiteNumber AS INTEGER = 0;
DECLARE #intPatientNumber AS INTEGER = CONCAT(#intSiteNumber, #intPatientID);
DECLARE #intRandomCode AS INTEGER = FLOOR(RAND()*50)+1; -- This just picks a random integer 1-40. <- this is the heart of my question
EXECUTE uspPatientEntry #intPatientID OUTPUT, #intPatientNumber, 2, #intRandomCode
Would something like this work? Ignore syntax, just the general idea:
DECLARE #intRandomCode AS INTEGER;
SELECT #intRandomCode = FLOOR(RAND()*50+1;
FROM TRandomCodes TRC
WHERE TRC.blnAvailable = "T"
UPDATE TRandomCodes TRC
SET TRC.blnAvailable = "F"
WHERE #intRandomCode = TRC.intRandomNumberID;
Thank you

DB2 SQL function returning multiple values when I am expecting only one

I am trying to get the location of the last time an item was moved via sql function with the code below. Pretty basic, I'm just trying to grab the max date and time. If I run the sql as a regular select and hard code an item number in ATPRIM I get only one location. But if I create this function and then try to run it and then pass the function an item number I get every occurrence in the history file instead of just the MAX which would be the most recent. Also I have tried a Select Distinct and that did not do anything for me.
ATOGST = Item Location
ATPRIM = Item
ATDATE = Date
ATTIME = Time
CREATE FUNCTION ERPLXU/F#QAT1(AATPRIM VARCHAR(10))
RETURNS CHAR(50)
LANGUAGE SQL
NOT DETERMINISTIC
BEGIN DECLARE F#QAT1 CHAR(50) ;
SET F#QAT1 = ' ' ;
SELECT ATOGST
INTO F#QAT1 FROM ERPLXF/QAT as t1
WHERE ATPRIM = AATPRIM
AND ATDATE = (SELECT MAX(ATDATE) FROM ERPLXF/QAT AS T2
WHERE T2.ATPRIM = AATPRIM)
AND ATTIME = (SELECT MAX(ATTIME) FROM ERPLXF/QAT AS T3
WHERE T3.ATPRIM = AATPRIM
AND T3.ATDATE = T1.ATDATE) ;
RETURN F#QAT1 ;
END
EDIT:
So what I am trying to do is get that location and I got it to work on my iSeries in strsql but the problem is we use a web application called Web Object Wizard (WoW) which lets us use sql to make reports that are more user friendly. Below is what I was trying to get to work but the subquery in the select does not work in WoW so that is where I was trying create a function which we know works in other applications.
SELECT distinct t1.atprim, atdesc, dbtabl, dbdtin, dblife, dblpdp,
dbcost, dbbas, dbresv, dbyrdp, dbcurr,
(select atogst
from erplxf.qat as t2
where t1.atprim = t2.atprim and atdate = (select max(atdate) from
erplxf.qat as t3 where t2.atprim = t3.atprim) and attime = (select
max(attime) from erplxf.qat as t4 where t1.atprim = t4.atprim and
t1.atdate = t4.atdate)
) as #113_ToLoc
FROM erplxf.qat as t1 join erplxf.qdb on atassn = dbassn
where dbrcid = 'DB'
and dbcurr != 0
So instead of that subquery at the end of the select it would just be
, erplxu.f#qat1(atprim) as #113_ToLoc
Try this:
CREATE FUNCTION ERPLXU/F#QAT1(AATPRIM VARCHAR(10))
RETURNS CHAR(50)
LANGUAGE SQL
RETURN
SELECT ATOGST
FROM ERPLXF/QAT
WHERE ATPRIM = AATPRIM
ORDER BY ATDATE DESC, ATTIME DESC
FETCH FIRST 1 ROW ONLY;

Is it possible to invoke BigQuery procedures in python client?

Scripting/procedures for BigQuery just came out in beta - is it possible to invoke procedures using the BigQuery python client?
I tried:
query = """CALL `myproject.dataset.procedure`()...."""
job = client.query(query, location="US",)
print(job.results())
print(job.ddl_operation_performed)
print(job._properties) but that didn't give me the result set from the procedure. Is it possible to get the results?
Thank you!
Edited - stored procedure I am calling
CREATE OR REPLACE PROCEDURE `Project.Dataset.Table`(IN country STRING, IN accessDate DATE, IN accessId, OUT saleExists INT64)
BEGIN
IF EXISTS (SELECT 1 FROM dataset.table where purchaseCountry = country and purchaseDate=accessDate and customerId = accessId)
THEN
SET saleExists = (SELECT 1);
ELSE
INSERT Dataset.MissingSalesTable (purchaseCountry, purchaseDate, customerId) VALUES (country, accessDate, accessId);
SET saleExists = (SELECT 0);
END IF;
END;
If you follow the CALL command with a SELECT statement, you can get the return value of the function as a result set. For example, I created the following stored procedure:
BEGIN
-- Build an array of the top 100 names from the year 2017.
DECLARE
top_names ARRAY<STRING>;
SET
top_names = (
SELECT
ARRAY_AGG(name
ORDER BY
number DESC
LIMIT
100)
FROM
`bigquery-public-data.usa_names.usa_1910_current`
WHERE
year = 2017 );
-- Which names appear as words in Shakespeare's plays?
SET
top_shakespeare_names = (
SELECT
ARRAY_AGG(name)
FROM
UNNEST(top_names) AS name
WHERE
name IN (
SELECT
word
FROM
`bigquery-public-data.samples.shakespeare` ));
END
Running the following query will return the procedure's return as the top-level results set.
DECLARE top_shakespeare_names ARRAY<STRING> DEFAULT NULL;
CALL `my-project.test_dataset.top_names`(top_shakespeare_names);
SELECT top_shakespeare_names;
In Python:
from google.cloud import bigquery
client = bigquery.Client()
query_string = """
DECLARE top_shakespeare_names ARRAY<STRING> DEFAULT NULL;
CALL `swast-scratch.test_dataset.top_names`(top_shakespeare_names);
SELECT top_shakespeare_names;
"""
query_job = client.query(query_string)
rows = list(query_job.result())
print(rows)
Related: If you have SELECT statements within a stored procedure, you can walk the job to fetch the results, even if the SELECT statement isn't the last statement in the procedure.
# TODO(developer): Import the client library.
# from google.cloud import bigquery
# TODO(developer): Construct a BigQuery client object.
# client = bigquery.Client()
# Run a SQL script.
sql_script = """
-- Declare a variable to hold names as an array.
DECLARE top_names ARRAY<STRING>;
-- Build an array of the top 100 names from the year 2017.
SET top_names = (
SELECT ARRAY_AGG(name ORDER BY number DESC LIMIT 100)
FROM `bigquery-public-data.usa_names.usa_1910_2013`
WHERE year = 2000
);
-- Which names appear as words in Shakespeare's plays?
SELECT
name AS shakespeare_name
FROM UNNEST(top_names) AS name
WHERE name IN (
SELECT word
FROM `bigquery-public-data.samples.shakespeare`
);
"""
parent_job = client.query(sql_script)
# Wait for the whole script to finish.
rows_iterable = parent_job.result()
print("Script created {} child jobs.".format(parent_job.num_child_jobs))
# Fetch result rows for the final sub-job in the script.
rows = list(rows_iterable)
print("{} of the top 100 names from year 2000 also appear in Shakespeare's works.".format(len(rows)))
# Fetch jobs created by the SQL script.
child_jobs_iterable = client.list_jobs(parent_job=parent_job)
for child_job in child_jobs_iterable:
child_rows = list(child_job.result())
print("Child job with ID {} produced {} rows.".format(child_job.job_id, len(child_rows)))
It works if you have SELECT inside your procedure, given the procedure being:
create or replace procedure dataset.proc_output() BEGIN
SELECT t FROM UNNEST(['1','2','3']) t;
END;
Code:
from google.cloud import bigquery
client = bigquery.Client()
query = """CALL dataset.proc_output()"""
job = client.query(query, location="US")
for result in job.result():
print result
will output:
Row((u'1',), {u't': 0})
Row((u'2',), {u't': 0})
Row((u'3',), {u't': 0})
However, if there are multiple SELECT inside a procedure, only the last result set can be fetched this way.
Update
See below example:
CREATE OR REPLACE PROCEDURE zyun.exists(IN country STRING, IN accessDate DATE, OUT saleExists INT64)
BEGIN
SET saleExists = (WITH data AS (SELECT "US" purchaseCountry, DATE "2019-1-1" purchaseDate)
SELECT Count(*) FROM data where purchaseCountry = country and purchaseDate=accessDate);
IF saleExists = 0 THEN
INSERT Dataset.MissingSalesTable (purchaseCountry, purchaseDate, customerId) VALUES (country, accessDate, accessId);
END IF;
END;
BEGIN
DECLARE saleExists INT64;
CALL zyun.exists("US", DATE "2019-2-1", saleExists);
SELECT saleExists;
END
BTW, your example is much better served with a single MERGE statement instead of a script.

Reusing calculated column in update in same code-block

I am trying to find the (best) way to update column of table based on calculated another column.
First of all, I am using function to calculate value for 'BAS_CBR_EB_RWA' column and then I am using 'BAS_CBR_EB_RWA' value as an input for 'BAS_CBR_EB_TOTAL_CAPITAL' column calculation, as shown in below-mentioned code.
Could you please share how to reuse it to do next calculation. replacing 'BAS_CBR_EB_RWA' with right-hand calculation isn't desirable because we have too many calculation of this type and it'll confuse other users.
Thanks in advance for help.
IF INSTTABLE = 16 THEN
UPDATE LAO_DATA
SET BAS_CBR_EB_RWA = BAS2_RWA_CALC(BAS_CAPITAL_CALC_CD,
CBR_CUR_BOOK_BAL,
BAS_CAP_FACTOR_K,
V_BASEL_MIN,
V_BAS_RWA_RATE),
BAS_CBR_EB_TOTAL_CAPITAL = ROUND(BAS2_MGRL_CAPITAL(V_DATE,
BAS_CBR_EB_RWA,
0),
2),
WHERE (AS_OF_DATE = V_DATE);
--COMMIT;
END IF;
In Oracle, you can update a subquery. I'm not 100% sure if it works for UDFs, but you can try:
UPDATE (SELECT LD.*,
BAS2_RWA_CALC(BAS_CAPITAL_CALC_CD, CBR_CUR_BOOK_BAL, BAS_CAP_FACTOR_K, V_BASEL_MIN, V_BAS_RWA_RATE) as new_BAS_CBR_EB_RWA
FROM LAO_DATA LD
)
SET BAS_CBR_EB_RWA = new_BAS_CBR_EB_RWA,
BAS_CBR_EB_TOTAL_CAPITAL = ROUND(BAS2_MGRL_CAPITAL(V_DATE, nw_BAS_CBR_EB_RWA, 0), 2),
WHERE AS_OF_DATE = V_DATE;
A MERGE statement can be used. You may also replace ROWID with the primary key or a Unique key of the table.
Put all your first level of calculations with function calls inside the USING() block and the second level of calculation for RHS in the SET expression
MERGE INTO lao_data t
USING (
SELECT ROWID AS rid,bas2_rwa_calc(bas_capital_calc_cd,
cbr_cur_book_bal,
bas_cap_factor_k,
v_basel_min,v_bas_rwa_rate
) AS new_BAS_CBR_EB_RWA
FROM lao_data
WHERE as_of_date = V_DATE
)
s ON ( s.rid = t.rowid )
WHEN MATCHED THEN UPDATE
SET t.bas_cbr_eb_rwa = s.new_BAS_CBR_EB_RWA
t.bas_cbr_eb_total_capital
= round(bas2_mgrl_capital(v_date,s.nw_BAS_CBR_EB_RWA,0), 2) );

Error creating function in DB2 with params

I have a problem with a function in db2
The function finds a record, and returns a number according to whether the first and second recorded by a user
The query within the function is this
SELECT
CASE
WHEN NUM IN (1,2) THEN 5
ELSE 2.58
END AS VAL
FROM (
select ROW_NUMBER() OVER() AS NUM ,s.POLLIFE
from LQD943DTA.CAQRTRML8 c
INNER JOIN LSMODXTA.SCSRET s ON c.MCCNTR = s.POLLIFE
WHERE s.NOEMP = ( SELECT NOEMP FROM LSMODDTA.LOLLM04 WHERE POLLIFE = '0010111003')
) AS T WHERE POLLIFE = '0010111003'
And works perfect
I create the function with this code
CREATE FUNCTION LIBWEB.BNOWPAPOL(POL CHAR)
RETURNS DECIMAL(7,2)
LANGUAGE SQL
NOT DETERMINISTIC
READS SQL DATA
RETURN (
SELECT
CASE
WHEN NUM IN (1,2) THEN 5
ELSE 2.58
END AS VAL
FROM (
select ROW_NUMBER() OVER() AS NUM ,s.POLLIFE
from LQD943DTA.CAQRTRML8 c
INNER JOIN LSMODXTA.SCSRET s ON c.MCCNTR = s.POLLIFE
WHERE s.NOEMP = ( SELECT NOEMP FROM LSMODDTA.LOLLM04 WHERE POLLIFE = POL)
) AS T WHERE POLLIFE = POL
)
The command runs executed properly
WARNING: 17:55:40 [CREATE - 0 row(s), 0.439 secs] Command processed.
No rows were affected
When I want execute the query a get a error
SELECT LIBWEB.BNOWPAPOL('0010111003') FROM DATAS.DUMMY -- dummy has only one row
I get
[Error Code: -204, SQL State: 42704] [SQL0204] BNOWPAPOL in LIBWEB
type *N not found.
I detect, when I remove the parameter the function works fine!
With this code
CREATE FUNCTION LIBWEB.BNOWPAPOL()
RETURNS DECIMAL(7,2)
LANGUAGE SQL
NOT DETERMINISTIC
READS SQL DATA
RETURN (
SELECT
CASE
WHEN NUM IN (1,2) THEN 5
ELSE 2.58
END AS VAL
FROM (
select ROW_NUMBER() OVER() AS NUM ,s.POLLIFE
from LQD943DTA.CAQRTRML8 c
INNER JOIN LSMODXTA.SCSRET s ON c.MCCNTR = s.POLLIFE
WHERE s.NOEMP = ( SELECT NOEMP FROM LSMODDTA.LOLLM04 WHERE POLLIFE = '0010111003')
) AS T WHERE POLLIFE = '0010111003'
)
Why??
This statement:
SELECT LIBWEB.BNOWPAPOL('0010111003') FROM DATAS.DUMMY
causes this error:
[Error Code: -204, SQL State: 42704] [SQL0204] BNOWPAPOL in LIBWEB
type *N not found.
The parm value passed into the BNOWPAPOL() function is supplied as a quoted string with no definition (no CAST). The SELECT statement assumes that it's a VARCHAR value since different length strings might be given at any time and passes it to the server as a VARCHAR.
The original function definition says:
CREATE FUNCTION LIBWEB.BNOWPAPOL(POL CHAR)
The function signature is generated for a single-byte CHAR. (Function definitions can be overloaded to handle different inputs, and signatures are used to differentiate between function versions.)
Since a VARCHAR was passed from the client and only a CHAR function version was found by the server, the returned error fits. Changing the function definition or CASTing to a matching type can solve this kind of problem. (Note that a CHAR(1) parm could only correctly handle a single-character input if a value is CAST.)