Within BigQuery I want to declare a list of 5-digit zipcodes and then refer to the list throughout different elements of the rest of the code.
I've tried to do something like:
DECLARE monday ARRAY<int>
('98198', '98003', '98023', '98498',
'98499', '98402', '98403', '98405',
'98406', '98407', '98409', '98421',
'98422', '98465', '98466', '98467',
'98070', '98418', '98103', '98107',
'98117', '98110');
SELECT
CASE WHEN SUBSTR(o.shipping_address.zip, 0, 5) IN --ARRAY NAMED MONDAY-- THEN 'Monday' END
But can't really seem to get it to work correctly. As it stands now my code works with declaring the same list over and over but I know there's gotta be a way to declare once and then re-use the declaration wherever.
Thanks!
I think you want unnest():
case when SUBSTR(o.shipping_address.zip, 1, 5) in (select mon from unnest(monday) mon)
Note that substr() starts with 1.
Of course, if you want your code to work, you should use proper typing:
declare monday ARRAY<string>;
set monday = array['98198', '98003', '98023', '98498',
'98499', '98402', '98403', '98405',
'98406', '98407', '98409', '98421',
'98422', '98465', '98466', '98467',
'98070', '98418', '98103', '98107',
'98117', '98110'];
select *
from unnest(monday) mon;
Before providing a query, I have some observations about your code. First, you declared an ARRAY<int> but passed a tuple of STRINGs. Second, I assumed you want an ARRAY of STRINGs, because you use the built-in SUBSTR() function, which takes a string. Lastly, my understanding is that you will have a static array of zip codes for Monday, which is then used to check whether each zip code in your table belongs to Monday.
Having said that, I was able to write a query using a JavaScript UDF in BigQuery that does what you are after. Below is the query in Standard SQL.
DECLARE monday ARRAY<STRING> DEFAULT ['98198', '98003', '98023', '98498',
'98499', '98402', '98403', '98405',
'98406', '98407', '98409', '98421',
'98422', '98465', '98466', '98467',
'98070', '98418', '98103', '98107',
'98117', '98110'];
CREATE TEMP FUNCTION check( monday_code ARRAY<STRING> ,x STRING)
RETURNS BOOL
LANGUAGE js AS """
return monday_code.includes(x);
""";
#Sample data within WITH
WITH post_code AS (
SELECT '123456229' as zip_code UNION ALL
SELECT '123456789' as zip_code UNION ALL
SELECT '98198' as zip_code UNION ALL
SELECT '98003' as zip_code UNION ALL
SELECT '98023' as zip_code
)
SELECT substr(zip_code,1,5) as zip_code,
CASE WHEN check(monday,SUBSTR(zip_code, 1,5)) THEN 'MONDAY'
ELSE 'Not identified'
END AS DAY
from post_code
And the output,
Row zip_code DAY
1 12345 Not identified
2 12345 Not identified
3 98198 MONDAY
4 98003 MONDAY
5 98023 MONDAY
Notice that I used DECLARE with a DEFAULT array. The JavaScript UDF receives the monday array and the current row's value, and checks whether the value is in the array using the built-in JS includes() method, which returns TRUE if the zip code is in the array and FALSE otherwise. Finally, in the last SELECT statement, CASE WHEN returns 'MONDAY' when the check is TRUE, meaning the array contains the passed value.
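For anyone who wants to sanity-check the membership logic outside BigQuery, here is a minimal Python sketch of the same idea: define the list once and reuse it through a helper. The names MONDAY_ZIPS and delivery_day are hypothetical, chosen only for illustration; this mirrors the query's logic and is not BigQuery code.

```python
# Hypothetical names; this mirrors the query's logic, it is not BigQuery code.
MONDAY_ZIPS = frozenset([
    "98198", "98003", "98023", "98498",
    "98499", "98402", "98403", "98405",
    "98406", "98407", "98409", "98421",
    "98422", "98465", "98466", "98467",
    "98070", "98418", "98103", "98107",
    "98117", "98110",
])


def delivery_day(zip_code: str) -> str:
    """Return 'MONDAY' if the first five characters are in the Monday list."""
    return "MONDAY" if zip_code[:5] in MONDAY_ZIPS else "Not identified"


# Same sample data as the WITH clause in the query above.
sample = ["123456229", "123456789", "98198", "98003", "98023"]
for z in sample:
    print(z[:5], delivery_day(z))
```

A set is used here because only membership matters; the order of the zip codes is irrelevant.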
Related
In one of the database tables, I have a nvarchar type field that contains a series of special strings combined with some special characters. For example:
'HGHGSD_JHJSD_HGSDHGJD_GFSDGFSHDGF_GFSD'
or
'SJDGh-SUDYSUI-jhsdhsj-YTsagh-ytetyyuwte-sagd'
or
'hwerweyri~sdjhfkjhsdkjfhds~jsdfhjsdhf~mdnfsd,mfn'
Based on a formula, a substring is always returned after the special character. But this substring may come after the first, second, or third occurrence of the special character (-, _, or ~). I used the CHARINDEX and SUBSTRING functions in SQL Server, but only the first part of the string, up to the selected character, is ever returned. For example:
select SUBSTRING ('hwerweyri~sdjhfkjhsdkjfhds~jsdfhjsdhf~mdnfsd,mfn', 0, CHARINDEX('~', 'hwerweyri~sdjhfkjhsdkjfhds~jsdfhjsdhf~mdnfsd,mfn', 0))
returned value: hwerweyri
If there is a solution for this purpose or you have a piece of code that can work in solving this problem, please advise.
It is important to mention that the position of the special character must be passed to the function as an argument, for example the second, third, or tenth occurrence. The position has to be supplied dynamically rather than hard-coded in the function.
For Example:
'HGHGSD_JHJSD_HGSDHGJD_GFSDGFSHDGF_GFSD' ==> 3rd substring ==> 'GFSDGFSHDGF'
'HGHGSD_JHJSD_HGSDHGJD_GFSDGFSHDGF_GFSD' ==> second substring ==> 'HGSDHGJD'
'HGHGSD_JHJSD_HGSDHGJD_GFSDGFSHDGF_GFSD' ==> 1st substring ==> 'JHJSD'
And The formula will be sent to the function through a programmed form and the generated numbers will be numbers between 1 and 15. These numbers are actually the production efficiency of a product whose form is designed in C# programming language. These numbers sent to the function are variable and each time these numbers may be sent to the function and applied to the desired character string. The output should look something like the one above. I don't know if I managed to get my point across or if I managed to make my request correctly or not.
Try the following function:
CREATE FUNCTION [dbo].[SplitWithCte]
(
    @String NVARCHAR(4000),
    @Delimiter NCHAR(1),
    @PlaceOfDelimiter int
)
RETURNS Table
AS
RETURN
(
    WITH SplitedStrings(Ends,Endsp)
    AS (
        SELECT 0 AS Ends, CHARINDEX(@Delimiter,@String) AS Endsp
        UNION ALL
        SELECT Endsp+1, CHARINDEX(@Delimiter,@String,Endsp+1)
        FROM SplitedStrings
        WHERE Endsp > 0
    )
    SELECT f.DataStr
    FROM (
        SELECT 'RowId' = ROW_NUMBER() OVER (ORDER BY (SELECT 1)),
               'DataStr' = SUBSTRING(@String,Ends,COALESCE(NULLIF(Endsp,0),LEN(@String)+1)-Ends)
        FROM SplitedStrings
    ) f WHERE f.RowId = @PlaceOfDelimiter + 1
)
How to use:
select * from [dbo].[SplitWithCte](N'HGHGSD_JHJSD_HGSDHGJD_GFSDGFSHDGF_GFSD', N'_', 3)
or
select DataStr from [dbo].[SplitWithCte](N'HGHGSD_JHJSD_HGSDHGJD_GFSDGFSHDGF_GFSD', N'_', 3)
Result: GFSDGFSHDGF
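To cross-check the expected results independently of SQL Server, here is a small Python sketch of the same "piece after the n-th delimiter" rule. The function name nth_substring is hypothetical, and it assumes a non-empty delimiter; it returns None where the T-SQL function would return no rows.

```python
from typing import Optional


def nth_substring(text: str, delimiter: str, n: int) -> Optional[str]:
    """Return the piece that follows the n-th occurrence of delimiter.

    n = 0 gives the piece before the first delimiter, matching the
    RowId = n + 1 rule in the T-SQL function. Returns None when the
    string has fewer than n delimiters or n is negative.
    """
    if n < 0:
        return None
    parts = text.split(delimiter)
    return parts[n] if n < len(parts) else None


s = "HGHGSD_JHJSD_HGSDHGJD_GFSDGFSHDGF_GFSD"
print(nth_substring(s, "_", 3))  # GFSDGFSHDGF
print(nth_substring(s, "_", 2))  # HGSDHGJD
print(nth_substring(s, "_", 1))  # JHJSD
```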
While writing SQL queries on BigQuery UI, sometimes I do a lot of JOIN over multiple tables using many WHERE clauses with the same date condition for each table.
Whenever I need to see the results for a different date, I have to replace it at multiple locations in the query. I wonder if there is a simple way to use a variable in the BQ SQL Editor and pass it just once (top/bottom)?
This is true for all the complicated queries as we have to search throughout the query for variables to change.
Parameterized queries are not available in the Console, but you can use Scripting instead.
Depending on your needs, you can use DECLARE and/or SET. The documentation states:
DECLARE: Declares a variable of the specified type. If the DEFAULT clause is specified, the variable is initialised with the
value of the expression; if no DEFAULT clause is present, the variable
is initialised with the value NULL
The syntax is as follows:
#Declaring the variable's type and initialising the variable using DEFAULT
DECLARE variable STRING DEFAULT 'San Francisco';
SELECT * FROM `bigquery-public-data.san_francisco_bikeshare.bikeshare_regions`
#using the variable inside the where clause
WHERE name = variable
SET: Sets a variable to have the value of the provided expression, or sets multiple variables at the same time based on the
result of multiple expressions.
And the syntax, as below:
#Declaring the variable using Declare and its type
DECLARE variable STRING;
#Initialising the variable
SET variable = 'San Francisco';
SELECT * FROM `bigquery-public-data.san_francisco_bikeshare.bikeshare_regions`
#using the variable inside the where clause
WHERE name = variable
In both examples above, I have queried against a public dataset bigquery-public-data.san_francisco_bikeshare.bikeshare_regions. Also, both outputs are the same,
Row region_id name
1 3 San Francisco
In addition to the above examples, and more specific to your case, you can declare a DATE variable as follows:
DECLARE data1 DATE DEFAULT DATE(2019,02,15);
WITH sample AS(
SELECT DATE(2019,02,15) AS date_s, "Some other field!" AS string UNION ALL
SELECT DATE(2019,02,16) AS date_s, "Some other field!" AS string UNION ALL
SELECT DATE(2019,02,17) AS date_s, "Some other field!" AS string UNION ALL
SELECT DATE(2019,02,18) AS date_s, "Some other field!" AS string
)
SELECT * FROM sample
WHERE date_s = data1
And the output,
Row date_s string1
1 2019-02-15 Some other field!
I am using SQL Server Management Studio 2012. I work with medical records and need to de-identify reports. The reports are structured in a table with columns Report_Date, Report_Subject, Report_Text, etc... The string I need to update is in report_text and there are ~700,000 records.
So if I have:
"patient had an EKG on 04/09/2012"
I need to replace that with:
"patient had an EKG on [DEIDENTIFIED]"
I tried
UPDATE table
SET Report_Text = REPLACE(Report_Text, '____/___/____', '[DEIDENTIFIED]')
because I need to replace anything in there that looks like a date, and it runs but doesn't actually replace anything, because apparently I can't use the _ wildcard in this command.
Any recommendations on this? Thanks in advance!
You can use PATINDEX to find the position of a date, then use SUBSTRING and REPLACE to replace it.
Since there may be multiple dates in the text, you have to run a WHILE loop to replace all of them.
The SQL below works for all dates of the form MM/DD/YYYY:
WHILE EXISTS( SELECT 1 FROM dbo.MyTable WHERE PATINDEX('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]%',Report_Text) > 0 )
BEGIN
UPDATE t
SET Report_Text = REPLACE(Report_Text, DateToBeReplaced, '[DEIDENTIFIED]')
FROM ( SELECT * ,
SUBSTRING(Report_Text,PATINDEX('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]%',Report_Text), 10) AS DateToBeReplaced
FROM dbo.MyTable AS a
WHERE PATINDEX('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]%',Report_Text) > 0
) AS t
END
I have tested the above SQL on a dummy table with a few rows. I don't know how it will scale for your data, but I recommend giving it a try.
To keep it simple, assume that a number represents an identifying element in the string so look for the position of the first number in the string and the position of the last number in the string. Not sure if this will apply to your entire set of records but here is the code ...
I created two test strings ... the one you supplied and one with the date at the beginning of the string.
DECLARE @tstString varchar(100)
SET @tstString = 'patient had an EKG on 04/09/2012'
-- Second test string; it overwrites the first, so comment it out to test the first one
SET @tstString = '04/09/2012 EKG for patient'
SELECT @tstString
    -- Calculate 1st occurrence of a number
    ,PATINDEX('%[0-9]%',@tstString)
    -- Calculate last occurrence of a number
    ,LEN(@tstString) - PATINDEX('%[0-9]%',REVERSE(@tstString))
    ,CASE
        -- No numbers in the string: return the string unchanged
        WHEN PATINDEX('%[0-9]%',@tstString) = 0 THEN @tstString
        -- String starts with a number: find the last number and drop everything up to it
        WHEN PATINDEX('%[0-9]%',@tstString) = 1 THEN
            CONCAT('[DEIDENTIFIED]',SUBSTRING(@tstString, LEN(@tstString)-PATINDEX('%[0-9]%',REVERSE(@tstString))+2,LEN(@tstString)))
        -- Otherwise keep the string up to the first number
        ELSE CONCAT(SUBSTRING(@tstString,1,PATINDEX('%[0-9]%',@tstString)-1),'[DEIDENTIFIED]')
    END AS 'newString'
As you can see, this is messy in SQL.
I would rather achieve this with a parser service and move the data with SSIS and call the service.
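If the replacement were done outside the database, a regular expression would keep it short. The following is only a sketch, assuming dates always look like NN/NN/NNNN (no validation that the month or day is real); DATE_PATTERN and deidentify_dates are hypothetical names. The lookarounds that refuse to match inside a longer digit run are an extra assumption of mine and are not part of the PATINDEX approach above.

```python
import re

# Two digits, slash, two digits, slash, four digits, not embedded in a
# longer run of digits. This is a shape check, not real date validation.
DATE_PATTERN = re.compile(r"(?<!\d)\d{2}/\d{2}/\d{4}(?!\d)")


def deidentify_dates(text: str, placeholder: str = "[DEIDENTIFIED]") -> str:
    """Replace every MM/DD/YYYY-shaped token in text with the placeholder."""
    return DATE_PATTERN.sub(placeholder, text)


print(deidentify_dates("patient had an EKG on 04/09/2012"))
print(deidentify_dates("04/09/2012 EKG for patient"))
```

Unlike the first-number/last-number approach, this handles several dates in one string and leaves other numbers in the text alone.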
I am trying to call/convert a numeric variable into string inside a user-defined function. I was thinking about using to_char, but it didn't pass.
My function is like this:
create or replace function ntile_loop(x numeric)
returns setof numeric as
$$
select
max("billed") as _____(to_char($1,'99')||"%"???) from
(select "billed", "id","cm",ntile(100)
over (partition by "id","cm" order by "billed")
as "percentile" from "table_all") where "percentile"=$1
group by "id","cm","percentile";
$$
language sql;
My purpose is to define a new variable "x%" as its name, with x varying as the function input. In context, x is numeric and will be called again later in the function as a numeric (this part of code wasn't included in the sample above).
What I want to return:
I simply want to return a block of code so that every time I change the percentile number, I don't have to run this block of code again and again. I'd like to calculate 5, 10, 20, 30, ....90th percentile and display all of them in the same table for each id+cm group.
That's why I was thinking about macro or function, but didn't find any solutions I like.
Thank you for your answers. Yes, I will definitely read basics while I am learning. Today's my second day to use SQL, but have to generate some results immediately.
Converting numeric to text is the least of your problems.
My purpose is to define a new variable "x%" as its name, with x
varying as the function input.
First of all: there are no variables in an SQL function. SQL functions are just wrappers for valid SQL statements. Input and output parameters can be named, but names are static, not dynamic.
You may be thinking of a PL/pgSQL function, where you have procedural elements including variables. Parameter names are still static, though. There are no dynamic variable names in plpgsql. You can execute dynamic SQL with EXECUTE but that's something different entirely.
While it is possible to declare a static variable with a name like "123%" it is really exceptionally uncommon to do so. Maybe for deliberately obfuscating code? Other than that: Don't. Use proper, simple, legal, lower case variable names without the need to double-quote and without the potential to do something unexpected after a typo.
Since the window function ntile() returns integer and you run an equality check on the result, the input parameter should be integer, not numeric.
To assign a variable in plpgsql you can use the assignment operator := for a single variable or SELECT INTO for any number of variables. Either way, you want the query to return a single row or you have to loop.
If you want the maximum billed from the chosen percentile, you don't GROUP BY x, y. That might return multiple rows and does not do what you seem to want. Use plain max(billed) without GROUP BY to get a single row.
You don't need to double quote perfectly legal column names.
A valid function might look like this. It's not exactly what you were trying to do, which cannot be done. But it may get you closer to what you actually need.
CREATE OR REPLACE FUNCTION ntile_loop(x integer)
RETURNS SETOF numeric as
$func$
DECLARE
myvar text;
BEGIN
SELECT INTO myvar max(billed)
FROM (
SELECT billed, id, cm
,ntile(100) OVER (PARTITION BY id, cm ORDER BY billed) AS tile
FROM table_all
) sub
WHERE sub.tile = $1;
-- do something with myvar, depending on the value of $1 ...
END
$func$ LANGUAGE plpgsql;
Long story short, you need to study the basics before you try to create sophisticated functions.
Plain SQL
After Q update:
I'd like to calculate 5, 10, 20, 30, ....90th percentile and display
all of them in the same table for each id+cm group.
This simple query should do it all:
SELECT id, cm, tile, max(billed) AS max_billed
FROM (
SELECT billed, id, cm
,ntile(100) OVER (PARTITION BY id, cm ORDER BY billed) AS tile
FROM table_all
) sub
WHERE (tile%10 = 0 OR tile = 5)
AND tile <= 90
GROUP BY 1,2,3
ORDER BY 1,2,3;
% .. modulo operator
GROUP BY 1,2,3 .. positional references to the first three output columns
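To make it clearer what the query computes, here is a rough Python sketch of the same steps: sort each (id, cm) group by billed, assign tiles the way ntile(100) does (earlier buckets take the extra rows when the count does not divide evenly), and keep the maximum per wanted tile. The function names and the in-memory row format are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Tiles kept by: WHERE (tile % 10 = 0 OR tile = 5) AND tile <= 90
WANTED = {5, 10, 20, 30, 40, 50, 60, 70, 80, 90}


def ntile(n_rows: int, buckets: int) -> List[int]:
    """1-based bucket number for each of n_rows ordered rows."""
    base, extra = divmod(n_rows, buckets)
    tiles: List[int] = []
    for bucket in range(1, buckets + 1):
        # The first `extra` buckets get one row more than the rest.
        size = base + (1 if bucket <= extra else 0)
        tiles.extend([bucket] * size)
    return tiles


def max_billed_per_tile(
    rows: Iterable[Tuple[int, str, float]]
) -> Dict[Tuple[int, str, int], float]:
    """rows are (id, cm, billed); returns {(id, cm, tile): max billed}."""
    groups = defaultdict(list)
    for id_, cm, billed in rows:
        groups[(id_, cm)].append(billed)

    result: Dict[Tuple[int, str, int], float] = {}
    for (id_, cm), values in groups.items():
        values.sort()  # ORDER BY billed within the partition
        for billed, tile in zip(values, ntile(len(values), 100)):
            if tile in WANTED:
                key = (id_, cm, tile)
                result[key] = max(result.get(key, billed), billed)
    return result


rows = [(1, "a", float(b)) for b in range(1, 201)]
for key, value in sorted(max_billed_per_tile(rows).items()):
    print(key, value)
```

Groups with fewer than 100 rows leave the higher tiles empty, so some of the wanted percentiles may simply be missing for small groups, both here and in the SQL query.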
It looks like you're looking for return query execute, returning the result from a dynamic SQL statement:
http://www.postgresql.org/docs/current/static/plpgsql-control-structures.html
http://www.postgresql.org/docs/current/static/plpgsql-statements.html
I have this code snippet I'm playing with (forgive the generic names):
create function GetList
(@d1 varchar(3), @d2 varchar(3), @d3 varchar(3))
returns table
as
return
with List
as
(
    select x.pattern
    from (values (@d1), (@d2), (@d3)) as x(pattern)
)
select * from list
This is eventually going to be a user-supplied list which they will use to query something else out, but playing around with this made me curious. If I were to run
select * from GetList('1111111','222','333')
I will get the same results as if I had entered only 3 characters for each. Since I limited the varchar parameter to 3 characters, are the others completely ignored? Is there any potential nastiness that can happen if a varchar parameter is 'overflowed' like this (other than the loss of data at the end of the string, of course)?
The other characters are totally ignored, since you limited the parameter to a length of 3.
The only issue that you could have is if you actually wanted to return the characters that are over the length of 3.
For example, you pass in 1234567 and you actually want the whole value, you will only get 123.
If you are limiting the input parameter to 3, then there would be no reason to try and pass in a longer value. If there is a chance that you will pass in longer values, then you should increase the length of the parameter.