Evaluating a variable using the IN() Function - sql

I'm trying to resolve a datastep variable in the in() function. I have a dataset that looks like the following:
|Run|Sample Level|Samples Tested|
| 1 | 1 | 1-5 |
| 1 | 2 | 1-5 |
...etc
| 1 | 5 | 1-5 |
---------------------------------
| 2 | 1 | 1-4 |
| 2 | 2 | 1-4 |
The samples tested vary by run. Normally the only sample levels in the dataset are the ones in the range provided by "Samples Tested". However occasionally this is not the case, and it can get messy. For example the last one I worked on looked like this:
|Run|Sample Level|Samples Tested|
| 1 | 1 |2-9, 12-35, 37-40|
In this case I'd want to drop all rows with sample levels that were not included in Samples Tested, which I did by manually adding the code:
Data Want;
set Have;
if sample_level not in (2:9, 12:35, 37:40) then delete;
run;
But what I want to do is have this done automatically by looking at the samples tested column. It's easy enough to turn a "-" into a ":", but where I'm stuck is getting the IN() function to recognize or resolve a variable. I would like code that looks like this: if sample_level not in(Samples_Tested) then delete; where samples_tested has been transformed to be something that the IN() function can handle. I'm also not opposed to using proc sql; if anyone has a solution that they think will work. I know you can do things like
Proc sql; Create table want as select * from HAVE where Sample_Level in (Select Samples_Tested from Have); Quit;
But the problem is that the samples tested varies by run and there could be 16 different runs. Hopefully I've explained the challenge clearly enough. Thanks for taking the time to read this and thanks in advance for your help!

Assuming the value of SAMPLES_TESTED is constant within each value of RUN, you could use it to generate the selection criteria. For example, you could use a data _null_ step to write a WHERE statement to a file and then %include that code into another data step.
filename code temp;
data _null_;
file code;
if eof then put ';';
set have end=eof;
by run;
if first.run;
if _n_=1 then put 'where ' @ ;
else put ' or ' @ ;
samples_tested=translate(samples_tested,':','-');
put '(' run= 'and sample_level in (' samples_tested '))';
run;
data want;
set have;
%include code;
run;
Note: IN is an operator and not a function.
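For readers who want to check the selection logic outside SAS, here is a minimal Python sketch of the same idea: expand each row's range text into a set of integers and keep the row only when its sample level is in that set. The function names and the sample rows are hypothetical, and the sketch assumes the spec only ever contains comma-separated numbers or low-high pairs.

```python
def parse_ranges(spec):
    """Expand a spec such as '2-9, 12-35, 37-40' into a set of integers."""
    allowed = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        low, _, high = part.partition("-")
        # A lone number such as '7' has no '-' and is treated as the range 7-7.
        start = int(low)
        end = int(high) if high else start
        allowed.update(range(start, end + 1))
    return allowed


def keep_tested(rows):
    """Keep rows whose sample_level is inside that row's samples_tested spec."""
    return [
        row for row in rows
        if row["sample_level"] in parse_ranges(row["samples_tested"])
    ]


# Hypothetical rows modelled on the question's layout.
rows = [
    {"run": 1, "sample_level": 1, "samples_tested": "2-9, 12-35, 37-40"},
    {"run": 1, "sample_level": 2, "samples_tested": "2-9, 12-35, 37-40"},
    {"run": 1, "sample_level": 36, "samples_tested": "2-9, 12-35, 37-40"},
    {"run": 2, "sample_level": 4, "samples_tested": "1-4"},
]
kept = keep_tested(rows)
```

Because the spec is read per row, this works even when the tested ranges differ between runs, which is the case the generated WHERE statement handles with one clause per run.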

Good to see SAS code ;-)
That would work with one range:
select * from HAVE where level in (tested);
For multiple ranges I would use SUBSTRING_INDEX in MySQL, or a combination of SUBSTRING and INDEX, to find the next condition.
select * from HAVE where level in (tested1) or level in (tested2) or level in (tested3);
Here you replace tested1, for example, with substr(tested,1, index(tested,','))
I used the following to generate sample:
create table have
(run int,
level int,
tested varchar(20));
INSERT INTO have (run, level, tested)
VALUES (1, 1, "3-5");
INSERT INTO have (run, level, tested)
VALUES (1, 3, "3-5, 12:35");
INSERT INTO have (run, level, tested)
VALUES (1, 20, "3-5, 12-35");

Related

How to create a BQ SQL UDF that iterates over a string?

TL;DR:
Is there a way to do string manipulation in BQ only with SQL UDF?
Eg:
____________________________________________________
id | payload
----------------------------------------------------
1 | key1=val1&key2=val2&key3=val3=&key4=val4
----------------------------------------------------
2 | key5=val5&key6=val6=
select removeExtraEqualToFromPayload(payload) from table
should give
____________________________________________________
payload
----------------------------------------------------
key1=val1&key2=val2&key3=val3&key4=val4
----------------------------------------------------
key5=val5&key6=val6
Long version:
My goal is to iterate over a string that is part of one of the columns
This is our table structure
____________________________________________________
id | payload
----------------------------------------------------
1 | key1=val1&key2=val2&key3=val3=&key4=val4
----------------------------------------------------
2 | key5=val5&key6=val6=
As you see, key3 in first row has an = after val3 and key6 in second row has an = after val6 which is not desired for us
So the goal is to iterate over the string and remove these extra =
I went through https://cloud.google.com/bigquery/docs/reference/standard-sql/user-defined-functions, which explains how to use custom functions in BQ. As of now, a SQL UDF only supports a SQL query, whereas with a JS UDF we can write custom logic with loops and so on.
Since JS UDFs are very slow, they have been ruled out and we have to rely on SQL UDFs only.
I thought of using BQ Scripting (https://cloud.google.com/bigquery/docs/reference/standard-sql/scripting) in combination with a SQL UDF, but that doesn't seem to work. It looks like a script has to be something altogether different.
I also explored stored procedures in BQ for the same purpose, but that is not working either. I'm not sure if I am doing it right.
I've created a procedure like this:
CREATE PROCEDURE test.AddDelta(INOUT x INT64, delta INT64)
BEGIN
SET x = x + delta;
END;
I'm not able to use the above procedure like this:
with ta as (select 1 id union all select 2 id)
select id from ta;
call test.AddDelta(id, 1);
select id;
I'm wondering if there is a way to parse strings like this without using Javascript UDF
Disclaimer: my regex-fu is not good, so definitely have a look at the RE2 syntax.
You should be able to do it with REGEXP_REPLACE
SELECT
payload,
REGEXP_REPLACE(payload,r'=(&)|=$','\\1') AS payload_clean
FROM
`myproject.mydataset.mytable`
Example output:
payload | payload_clean
----------------------------------------------------
key1=val1&key2=val2&key3=val3=&key4=val4= | key1=val1&key2=val2&key3=val3&key4=val4
Executable example:
WITH
payload_table AS (
SELECT "key1=val1&key2=val2&key3=val3=&key4=val4" AS payload UNION ALL
SELECT "key5=val5&key6=val6=" AS payload UNION ALL
SELECT "key1=val1&key2=val2&key3=val3=&key4=val4=" AS payload UNION ALL
SELECT "key3=val3=abc&key4=val4" AS payload
)
SELECT
payload,
REGEXP_REPLACE(payload,r'(=val\pN)=(\pL*&)|=(&)|=$','\\1\\2\\3') AS payload_clean
FROM
payload_table
Of course (=val\pN)=(\pL*&) in the pattern won't necessarily work for you since you probably have different patterns. If there are no patterns to match then I'm not sure how you will remove the extra '=' from your strings automatically.
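To try the simpler pattern from the first query outside BigQuery, here is a small Python sketch using the standard re module. It is an illustration of the same regular expression, not BigQuery code, and the function name clean_payload is made up for the example.

```python
import re

# '=' directly before '&' (the '&' is captured), or '=' at the end of the string.
STRAY_EQUALS = re.compile(r"=(&)|=$")


def clean_payload(payload):
    """Remove a stray '=' that sits right before '&' or at the end."""
    # Put back the captured '&' when there is one; otherwise replace with nothing.
    return STRAY_EQUALS.sub(lambda m: m.group(1) or "", payload)


cleaned = [
    clean_payload("key1=val1&key2=val2&key3=val3=&key4=val4"),
    clean_payload("key5=val5&key6=val6="),
    clean_payload("key3=val3=abc&key4=val4"),
]
```

The third input comes back unchanged, because its extra '=' is followed by more text rather than by '&' or the end of the string. That is the case the longer pattern tries to cover.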

Do Loop in SAS SQL for Creating Column

I am new to SAS and SQL. I have a task to create several similar columns that differ only by a numeric suffix.
For example: | DATE | NAME | A1 | A2 | A3 | B |
So I code in SAS like this
PROC SQL;
CREATE TABLE TEST AS
SELECT DATE, NAME,
DO i = 1 to 3
0 AS A&i.,
END
1 as B
FROM SOURCE;
QUIT;
When I run, I got this error
Syntax error, expecting one of the following: !, !!, &, (, *, **, +, ',', -, '.', /, <, <=, <>, =, >, >=, AND, EQ,
EQT, GE, GET, GT, GTT, LE, LET, LT, LTT, NE, NET, OR, ^=, |, ||, ~=.
I appreciate any kind of help. Thank you.
I think you should use macro code to generate the column names from the loop counter.
For example, in your case:
%macro create_table(); %macro d; %mend d;
PROC SQL;
CREATE TABLE TEST AS
SELECT DATE, NAME,
%DO i = 1 %to 3;
0 AS A&i.,
%END;
1 as B
FROM SOURCE;
QUIT;
%mend create_table;
%create_table();
Output:
+-------+------+----+----+----+---+
| date | name | A1 | A2 | A3 | B |
+-------+------+----+----+----+---+
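The macro loop only generates text, so it can help to see the statement it expands to. The Python sketch below builds the equivalent SELECT string; build_select is a hypothetical helper written for this illustration and is not part of SAS.

```python
def build_select(count):
    """Build the SELECT text that a 1-to-count loop over 'A<i>' columns produces."""
    columns = ["DATE", "NAME"]
    # One '0 AS A<i>' entry per loop iteration, like the %DO loop in the macro.
    columns += [f"0 AS A{i}" for i in range(1, count + 1)]
    columns.append("1 AS B")
    return "SELECT " + ", ".join(columns) + " FROM SOURCE"


statement = build_select(3)
print(statement)
```

In SAS itself, setting OPTIONS MPRINT before calling the macro writes the generated statements to the log, which shows the same expansion.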
In addition, there is another way to complete the task: use a data step instead of proc sql:
data test(drop=i);
set source;
array a{3};
do i=1 to 3;
a{i} = 0;
end;
b=1;
run;
Data step array statement can be used to create variables (columns in SQL terminology) and initialize them. The retain statement will initialize a variable (once) to a value that is maintained from output row to row. This is different from b=1; which will perform the value assignment as each row is processed.
data want;
set source;
array a(3) (3*0); /* initialization syntax, 3*0 means 3 zero (0) values */
retain b 1;
run;
Variables created by the array statement are named with the array name suffixed by a sequential number starting from 1. The name list syntax can be used in the array statement to create variables with a different base name and sequence range, such as
array a(3) x7-x9 (3*0);

SAS : PROC SQL : How to read a character format (dd/mm/yyyy) as date format without creating new column?

I have a character column which has dates (dd/mm/yyyy) in character format.
When applying a filter (where clause), I need these character values to be treated as dates, without changing the existing column and without creating a new column.
How can I make this happen?
Any help would be deeply appreciated.
Thank you.
In proc sql, you can come close with like:
select (case when datecol like '__/__/____'
then . . .
else . . .
end)
This is only an approximation. _ is a wildcard that matches any character, not just numbers. On the other hand, this is standard SQL, so it will work in any database.
The SAS INPUT function with a ? informat modifier converts a string (the source value) to a result and does not report an error if the source value does not conform to the informat; the result is then a missing value.
INPUT can be used in a WHERE statement or clause. It can also serve as an operand of the BETWEEN operator.
* some of these free form values are not valid date representations;
data have;
length freeform_date_string $10;
do x = 0 to 1e4-1;
freeform_date_string =
substr(put(x,z4.),1,2) || '/' ||
substr(put(x,z4.),3,2) || '/' ||
'2018'
;
output;
end;
run;
* where statement;
data want;
set have;
where input(freeform_date_string,? ddmmyy10.);
run;
* where clause;
proc sql;
create table want2 as
select * from have
where
input(freeform_date_string,? ddmmyy10.) is not null
;
* where clause with input used with between operator operands;
proc sql;
create table want3 as
select * from have
where
input(freeform_date_string,? ddmmyy10.)
between
'15-JAN-2018'D
and
'15-MAR-2018'D
;
quit;
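The same "convert quietly, then filter" idea can be sketched in Python for comparison. This is only an illustration of the logic, not SAS behaviour: parse_ddmmyyyy and between are hypothetical helper names, and the rows are made-up sample data.

```python
from datetime import date, datetime


def parse_ddmmyyyy(text):
    """Return a date for a valid dd/mm/yyyy string, or None when it is not a real date."""
    try:
        return datetime.strptime(text.strip(), "%d/%m/%Y").date()
    except ValueError:
        # Plays the role of the missing value that a failed conversion yields.
        return None


def between(rows, start, end):
    """Keep rows whose datecolumn parses to a date within [start, end]."""
    kept = []
    for row in rows:
        parsed = parse_ddmmyyyy(row["datecolumn"])
        if parsed is not None and start <= parsed <= end:
            kept.append(row)
    return kept


# Hypothetical sample rows; the last one is not a valid date.
rows = [
    {"id": 1, "datecolumn": "20/10/2018"},
    {"id": 2, "datecolumn": "30/10/2018"},
    {"id": 3, "datecolumn": "01/11/2018"},
    {"id": 4, "datecolumn": "01/99/2018"},
]
selected = between(rows, date(2018, 10, 20), date(2018, 10, 30))
```

The stored strings are never modified and no extra column is kept; the conversion happens only inside the filter.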
It is not a great idea to store dates as character values. It can lead to many data accuracy issues, and you may not notice them for a long time: if someone enters an invalid character date, nothing will flag it. It is always better to keep a date as a date value rather than as a character value.
Filtering dates with LIKE quickly becomes complex. You can try the code below, which uses the INPUT function in the where clause:
data have;
input id datecolumn $10.;
datalines;
1 20/10/2018
1 25/10/2018
2 30/10/2018
2 01/11/2018
;
proc sql;
create table want as
select * from have
where input(datecolumn, ddmmyy10.) between '20Oct2018'd and '30Oct2018'd ;
The same filter written with LIKE looks like this:
proc sql;
create table want as
select * from have
/*include all dates which start with 2 */
where datecolumn like '2%' and datecolumn like '%10/2018'
or datecolumn = '30/10/2018';
Edit1:
It looks like you have a data quality issue; a sample dataset is shown below. Try this. Once again, storing dates as character values is not a good approach and can lead to many issues in the future.
data have;
input id datecolumn $10.;
datalines;
1 20/10/2018
1 25/10/2018
2 30/10/2018
2 01/11/2018
3 01/99/2018
;
proc sql;
create table want(drop=newdate) as
select *, case when input(datecolumn, ddmmyy10.) ne .
then input(datecolumn, ddmmyy10.)
else . end as newdate from have
where calculated newdate between '20Oct2018'd and '30Oct2018'd
;
Or you can put the case expression directly in the where clause, without creating and dropping a new column, as shown below.
proc sql;
create table want as
select * from have
where
case when input(datecolumn, ddmmyy10.) ne .
then input(datecolumn, ddmmyy10.) between '20Oct2018'd and '30Oct2018'd
end;

Oracle function doesn't recognize the input from a column of a table

We have a pre-built function that I call in one of my Oracle PL/SQL blocks. When I pass a table column as the input, the function returns 0, but when I pass a hard-coded value to test it, I get the expected result.
Is there an issue with how I pass the input to the function? I developed this PL/SQL block based on very similar working code that I saw in a forum.
DECLARE
EXTRACT_DT_IN DATE := TO_DATE('07-JUL-17','DD-MON-RR');
BEGIN
insert into OPS.temp_table
Select POT.Order_no,
OPS.function_Order_Total(mvq.quoted_order_no)
FROM ORDER POT, mv_quoted mvq where POT.Order_no = mvq.quoted_order_no;
END;
The output for the above is:
12345 0
23456 0
etc
But when I pass a hard-coded value to test the function, as below, I get the correct value:
Select POT.Order_no,
OPS.function_Order_Total('12345')
FROM ORDER POT, mv_quoted mvq where POT.Order_no='12345'
The output is:
12345 14563.23
23456 12564.76
etc
Your help is much appreciated. Thanks in advance.

How to read and modify the data in oracle database (instead of using Replace function)

I started learning and working with Oracle SQL a few months ago, and I have a question for which I could not find a similar problem on Stack Overflow.
In Oracle SQL, I am trying to find a way to read the data from a column and modify it (add to or remove from the value). What I have so far uses REPLACE, but I do not want to nest multiple REPLACE calls to make it work. In case my question is unclear, I have listed what I have so far below, including the query with multiple REPLACE calls.
COMMOD_CODE (Given) | MODEL(Desired_result)
|
X2-10GB-LR | X2-10GB-LR (same)
15454-OSC-CSM | 15454-OSC
15454-PP64LC | 15454-PP_64-LC
CAT3550 | WS-C3550-48-SMI
CAT3560G-48 | WS-C3560G-48PS-S
CAT3550 | WS-C3550-48-SMI
DWDM-GBIC-30 | DWDM-GBIC-30.33
Select
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(commod.COMMODITY_CODE,
'15454-OSC-CSM', '15454-OSC'),
'15454-PP64LC','15454-PP_64-LC'),
'CAT3550','WS-C3550-48-SMI'),
'CAT3560G-48','WS-C3560G-48PS-S'),
'CAT3550','WS-C3550-48-SMI'),
'DWDM-GBIC-30','DWDM-GBIC-30.33')
MODEL,
NVL(commod.COMMODITY_CODE, ' ') as COMMOD_CODE
FROM tablename.table commod
This gives the right answer. However, I think I used a lot of REPLACE calls to get there. So my question is whether there is an easier way to do this than calling REPLACE multiple times, which makes the script look awful.
Can someone please give me some guidance?
Thanks in advance.
Use DECODE or CASE for this, I think. Or, better yet, maybe a mapping table.
You can use the DECODE function in this case:
with
test_data as (
select '15454-OSC-CSM' as COMMODITY_CODE from dual
union all select '15454-PP64LC' from dual
union all select 'CAT3550' from dual
union all select 'CAT3560G-48' from dual
union all select 'CAT3550' from dual
union all select 'DWDM-GBIC-30' from dual
)
select
decode(COMMODITY_CODE,
'15454-OSC-CSM', '15454-OSC',
'15454-PP64LC', '15454-PP_64-LC',
'CAT3550', 'WS-C3550-48-SMI',
'CAT3560G-48', 'WS-C3560G-48PS-S',
'CAT3550', 'WS-C3550-48-SMI',
'DWDM-GBIC-30', 'DWDM-GBIC-30.33')
from test_Data
;
Result:
COL
------------------
15454-OSC
15454-PP_64-LC
WS-C3550-48-SMI
WS-C3560G-48PS-S
WS-C3550-48-SMI
DWDM-GBIC-30.33
What the DECODE function does: it checks its first argument. If it is equal to the second argument, it returns the third argument; otherwise, if it is equal to the 4th argument, it returns the 5th argument, and so on. If nothing matches, it returns the optional final default argument, or NULL when no default is given.
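The mapping-table suggestion can be pictured as a plain key-to-value lookup. The Python sketch below shows that idea with the codes from the question; MODEL_BY_CODE and to_model are hypothetical names, and the choice to pass unmatched codes through unchanged is an assumption based on the question's first row (X2-10GB-LR stays the same).

```python
# Lookup from commodity code to model, taken from the question's examples.
MODEL_BY_CODE = {
    "15454-OSC-CSM": "15454-OSC",
    "15454-PP64LC": "15454-PP_64-LC",
    "CAT3550": "WS-C3550-48-SMI",
    "CAT3560G-48": "WS-C3560G-48PS-S",
    "DWDM-GBIC-30": "DWDM-GBIC-30.33",
}


def to_model(code):
    """Return the mapped model, or the code itself when no mapping exists."""
    return MODEL_BY_CODE.get(code, code)


models = [to_model(c) for c in ["X2-10GB-LR", "CAT3550", "DWDM-GBIC-30"]]
```

In the database, the same effect would come from joining to a two-column mapping table, so that new codes become new rows instead of new branches in the query.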