How do I grab rows surrounding a flagged value? - sql

I'm starting with a table like this:
code new_code_flag
abc123 0
xyz456 0
wer098 1
jio234 0
bcx190 0
eiw157 0
nzi123 0
epj676 0
ere654 0
yru493 1
ale674 0
I want to grab the 2 records before and 2 records after each value where "new_code_flag"=1. I want my output to look like this:
code new_code_flag
abc123 0
xyz456 0
wer098 1
jio234 0
bcx190 0
epj676 0
ere654 0
yru493 1
ale674 0
Any help on how to do this in SQL or SAS?

SQL tables represent unordered sets, so you need a column that specifies the ordering. Assuming you have one (put it in place of the ? below), you can do something like:
with t as (
select t.*, row_number() over (order by ?) as seqnum
from tbl t
)
select t.*
from t
where exists (select 1
from t t2
where t2.new_code_flag = 1 and
t.seqnum between t2.seqnum - 2 and t2.seqnum + 2
);
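To try the query above on the sample data, here is a minimal sketch using Python's built-in sqlite3 module. The `id` column is an assumption: it stands in for the ordering column the query needs (the `?` above), and SQLite 3.25 or later is assumed for window-function support.

```python
import sqlite3

# Sample data from the question. The position in the list becomes the
# ordering column `id` (an assumption; the original table shows none).
rows = [("abc123", 0), ("xyz456", 0), ("wer098", 1), ("jio234", 0),
        ("bcx190", 0), ("eiw157", 0), ("nzi123", 0), ("epj676", 0),
        ("ere654", 0), ("yru493", 1), ("ale674", 0)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table tbl (id integer primary key, code text, new_code_flag integer)"
)
conn.executemany(
    "insert into tbl (id, code, new_code_flag) values (?, ?, ?)",
    [(i, code, flag) for i, (code, flag) in enumerate(rows, start=1)],
)

# Same shape as the query above, with `id` filling in for the `?`.
query = """
with t as (
    select tbl.*, row_number() over (order by id) as seqnum
    from tbl
)
select t.code, t.new_code_flag
from t
where exists (select 1
              from t t2
              where t2.new_code_flag = 1
                and t.seqnum between t2.seqnum - 2 and t2.seqnum + 2)
order by t.seqnum
"""
selected = conn.execute(query).fetchall()
for code, flag in selected:
    print(code, flag)
```

On this data the rows eiw157 and nzi123 are dropped, because neither is within two rows of a flagged row.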

You could create two lag and two lead copies of the flag variable and then test if any of the 5 variables are 1 (true).
data have;
input code $ flag ;
cards;
abc123 0
xyz456 0
wer098 1
jio234 0
bcx190 0
eiw157 0
nzi123 0
epj676 0
ere654 0
yru493 1
ale674 0
;
data want ;
set have ;
set have(keep=flag rename=(flag=lead1_flag) firstobs=2) have(drop=_all_ obs=1);
set have(keep=flag rename=(flag=lead2_flag) firstobs=3) have(drop=_all_ obs=2);
lag1_flag=lag1(flag);
lag2_flag=lag2(flag);
if lag1_flag or lag2_flag or flag or lead1_flag or lead2_flag ;
run;
Results
Obs  code    flag  lead1_flag  lead2_flag  lag1_flag  lag2_flag
  1  abc123     0           0           1          .          .
  2  xyz456     0           1           0          0          .
  3  wer098     1           0           0          0          0
  4  jio234     0           0           0          1          0
  5  bcx190     0           0           0          0          1
  6  epj676     0           0           1          0          0
  7  ere654     0           1           0          0          0
  8  yru493     1           0           .          0          0
  9  ale674     0           .           .          1          0
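For readers without SAS, the lag/lead idea can be sketched in plain Python. This is only an illustration of the logic, not a translation of the data step; the helper name `shifted` is made up for this example.

```python
# Codes and flags in row order, taken from the sample data.
codes = ["abc123", "xyz456", "wer098", "jio234", "bcx190", "eiw157",
         "nzi123", "epj676", "ere654", "yru493", "ale674"]
flags = [0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0]


def shifted(values, offset):
    """Return values shifted by `offset` rows; None marks a missing neighbour.

    A positive offset looks ahead (lead), a negative one looks back (lag).
    """
    n = len(values)
    return [values[i + offset] if 0 <= i + offset < n else None
            for i in range(n)]


lag2, lag1 = shifted(flags, -2), shifted(flags, -1)
lead1, lead2 = shifted(flags, 1), shifted(flags, 2)

# Keep a row when its own flag or any of the four neighbouring flags is 1.
# None (missing) counts as false, as a missing value does in the SAS test.
want = [code
        for code, *window in zip(codes, lag2, lag1, flags, lead1, lead2)
        if any(window)]
print(want)
```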

data want(drop=_: i);
merge have have(keep=flag firstobs=3 rename=(flag=_flag));
if flag or _flag then i=1;
if 0<i<=3 then do;
output;
i+1;
end;
else delete;
run;

Related

unpivot with counts in hive columns transpose

I have a table whose data needs to be unpivoted to get aggregated counts.
Source table:
primary_id sys_1 sys_2 sys3_ sy5 sys100
newa889 0 1 0 1 0
den7899 1 1 1 1 0
geo8988 1 1 1 1 0
atla8766 0 1 0 1 1
chic7898 0 1 0 0 1
Desired output:
sys_name count(primary_key) flag_0_or_1
sys_1 129999 0
sys_1 544545 1
sys_2 23333 0
sys_2 23322323 1
sys3_ 332233 0
sys3_ 323232 1
sy5 32332 0
sy5 32323 1
I am looking to transpose the data and get the counts of 0's and 1's for each sys_ column.

SQL: Is there a way I can find whether a value is within a specific index range of another value?

I have two columns filled with mostly 0's and a few 1's. I want to check whether, IF a 1 occurs in the first column, a 1 in the second column occurs within a range of 5 rows of that index. So for example, let's say a 1 occurs in column 1 row 83, then I would like to return TRUE if one or more 1's occur in column 2 row 83-88, and FALSE if this is not the case. Examples of this are listed in the code block. I would want to count the number of TRUE and FALSE occurrences.
TRUE:
0 0
0 0
0 0
1 1
0 0
0 0
0 0
0 0
0 0
0 0
TRUE:
0 0
0 0
0 0
1 0
0 0
0 0
0 1
0 1
0 0
0 0
FALSE:
0 0
0 0
0 1
1 0
0 0
0 0
0 0
0 0
0 0
0 1
I have no idea where to begin, so I do not have any code to start with :(
Kind regards,
Kai
Assuming you have an ordering column, you can use window functions:
select (case when count(*) = 0 then 'false' else 'true' end)
from (select t.*,
max(col2) over (order by <ordering column>
rows between current row and 4 following
) as max_col2_5
from t
) t
where col1 = 1 and max_col2_5 = 1;
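Since the question asks for the number of TRUE and FALSE occurrences, a variation of the same window-function idea can count both. The following is a minimal sketch using Python's sqlite3 module (SQLite 3.25 or later assumed); the `id` ordering column and the concatenation of the three examples into one table are assumptions made for the demo. Note that `4 following` covers the current row plus the next four rows; if the range in the question (rows 83 to 88) is meant inclusively, `5 following` would be needed instead.

```python
import sqlite3

# The three 10-row examples from the question, as (col1, col2) pairs.
true_1 = [(0, 0)] * 3 + [(1, 1)] + [(0, 0)] * 6
true_2 = [(0, 0)] * 3 + [(1, 0), (0, 0), (0, 0), (0, 1), (0, 1), (0, 0), (0, 0)]
false_1 = [(0, 0), (0, 0), (0, 1), (1, 0)] + [(0, 0)] * 5 + [(0, 1)]

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer primary key, col1 integer, col2 integer)")
conn.executemany(
    "insert into t (id, col1, col2) values (?, ?, ?)",
    [(i, c1, c2)
     for i, (c1, c2) in enumerate(true_1 + true_2 + false_1, start=1)],
)

# For every row with col1 = 1, check whether col2 has a 1 in the
# current row or the 4 rows after it, then count hits and misses.
query = """
select sum(case when max_col2_5 = 1 then 1 else 0 end) as n_true,
       sum(case when max_col2_5 = 1 then 0 else 1 end) as n_false
from (select t.*,
             max(col2) over (order by id
                             rows between current row and 4 following
                            ) as max_col2_5
      from t
     ) t
where col1 = 1
"""
n_true, n_false = conn.execute(query).fetchone()
print("true:", n_true, "false:", n_false)
```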

Add condition in SQL query based on table value

I am using Oracle as my database. I want to add a condition to my SQL query based on table data. If CT_GENERAL is 1 in the table, then I want to add another condition to my SQL query (CST_GENERAL = USER ARGUMENT).
select * from ch_caseinfo where
case when ct_general = 1
then cst_general = %3
end
%3 = Funding
//TABLE STRUCTURE
//CH_CASEINFO
VOLUMEID | CT_ADVERSE | CT_GENERAL | CT_HA | CT_MI | CST_GENERAL | CST_MI
149634 0 0 0 0
161077 0 0 0 0
161147 0 1 0 1 Funding Composition/ingredients
161268 0 1 0 0 Funding
161306 0 1 0 0 Manufacturing
240131 0 1 1 0 Funding
239364 0 0 0 0
239364 0 0 0 0
147434 0 0 0 0
147466 0 0 0 0
158990 0 1 0 1 Funding Administration
98863 1 1 1 1 Funding Disposal
159757 1 1 1 1 Funding Disposal
98863
191039 1 1 0 0 Other
97007 0 0 0 0
ORA-00905: missing keyword
00905. 00000 - "missing keyword"
You need to form your where clause to evaluate an expression that is true when you don't want to include the filter (CT_GENERAL is 0). Considering the example below, if ct_general = 0 then cst_general will always equal cst_general (unless null -- if that is a possibility, you need to accommodate nulls).
SELECT *
FROM ch_caseinfo
WHERE CASE WHEN ct_general = 0 THEN cst_general ELSE USERARGUMENT END = cst_general
AND OTHERCRITERIA = CRITERIA
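The null caveat matters for the sample data: the rows with CT_GENERAL = 0 have an empty CST_GENERAL, and Oracle treats an empty string as NULL. One way to accommodate that is to write the filter as a plain boolean expression instead of a CASE comparison. The sketch below uses Python's sqlite3 module with a cut-down copy of the table (only three of the columns, a handful of rows) and assumes CT_GENERAL itself is never NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table ch_caseinfo (volumeid integer, ct_general integer, cst_general text)"
)
# A few rows from the question; None stands for the empty (NULL) CST_GENERAL.
conn.executemany(
    "insert into ch_caseinfo values (?, ?, ?)",
    [(149634, 0, None), (161147, 1, "Funding"), (161268, 1, "Funding"),
     (161306, 1, "Manufacturing"), (191039, 1, "Other")],
)

user_argument = "Funding"

# CASE form from the answer: rows with a NULL cst_general drop out,
# because NULL = NULL does not evaluate to true.
case_form = [r[0] for r in conn.execute(
    """select volumeid from ch_caseinfo
       where case when ct_general = 0 then cst_general else ? end = cst_general
       order by volumeid""",
    (user_argument,),
)]

# Boolean form: apply the cst_general filter only when ct_general is 1.
null_safe = [r[0] for r in conn.execute(
    """select volumeid from ch_caseinfo
       where ct_general <> 1 or cst_general = ?
       order by volumeid""",
    (user_argument,),
)]

print("case form:", case_form)
print("null safe:", null_safe)
```

The boolean form keeps row 149634, which the CASE form loses because its CST_GENERAL is NULL.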

Combining Same Oracle SQL Scripts in One?

I have four identical scripts in which only one value varies, and I would like to combine them into one script with four outputs. The reason for this is that BI Publisher will not render multiple x-axis dates across multiple scripts, so I am trying to make it render as one script. The following is the script shared by all four:
select to_char("DATA_POINT_DAILY_AVG"."DATE_OF_AVG", 'DD-MON-YY') as "DATE_OF_AVG",
"DATA_POINT_DAILY_AVG"."VALUE" as "DAILY_AVG_VALUE"
from "TEST"."COMPONENT" "COMPONENT",
"TEST"."COMPONENT_DATA_POINT" "COMPONENT_DATA_POINT",
"TEST"."DATA_POINT_DAILY_AVG" "DATA_POINT_DAILY_AVG"
where "COMPONENT"."SITE_ID" = ('123abc')
and "COMPONENT_DATA_POINT"."COMPONENT_ID"="COMPONENT"."ID"
and "COMPONENT_DATA_POINT"."NAME"='TEST_1'
and "DATA_POINT_DAILY_AVG"."COMPONENT_DATA_POINT_ID" = "COMPONENT_DATA_POINT"."ID"
and "DATA_POINT_DAILY_AVG"."SITE_ID" = "COMPONENT"."SITE_ID"
and "DATA_POINT_DAILY_AVG"."DATE_OF_AVG" between ('01-FEB-17') and ('28-FEB-17')
order by "DATA_POINT_DAILY_AVG"."DATE_OF_AVG" desc;
the only line that varies between the four scripts is:
and "COMPONENT_DATA_POINT"."NAME"='TEST_1'
which would be as follows for the four scripts:
and "COMPONENT_DATA_POINT"."NAME"='TEST_1'
and "COMPONENT_DATA_POINT"."NAME"='TEST_2'
and "COMPONENT_DATA_POINT"."NAME"='TEST_3'
and "COMPONENT_DATA_POINT"."NAME"='TEST_4'
Everything else is identical...expected output would be:
DATE_OF_AVG DAILY_AVG_VALUE_1 DAILY_AVG_VALUE_2 DAILY_AVG_VALUE_3 DAILY_AVG_VALUE_4
----------- ----------------- ----------------- ----------------- -----------------
06-FEB-17 0 0 0 0
05-FEB-17 0 0 0 0
04-FEB-17 0 0 0 0
03-FEB-17 0 0 0 0
02-FEB-17 0 0 0 0
01-FEB-17 0 0 0 0
One date column, with four different values based on the various "TEST_x" values.
I hope this makes sense, and any help would be greatly appreciated. Thanks!
Try this query:
select "COMPONENT_DATA_POINT"."NAME",
to_char("DATA_POINT_DAILY_AVG"."DATE_OF_AVG", 'DD-MON-YY') as "DATE_OF_AVG",
"DATA_POINT_DAILY_AVG"."VALUE" as "DAILY_AVG_VALUE"
from "TEST"."COMPONENT" "COMPONENT",
"TEST"."COMPONENT_DATA_POINT" "COMPONENT_DATA_POINT",
"TEST"."DATA_POINT_DAILY_AVG" "DATA_POINT_DAILY_AVG"
where "COMPONENT"."SITE_ID" = ('123abc')
and "COMPONENT_DATA_POINT"."COMPONENT_ID"="COMPONENT"."ID"
and "COMPONENT_DATA_POINT"."NAME" IN ('TEST_1','TEST_2','TEST_3','TEST_4')
and "DATA_POINT_DAILY_AVG"."COMPONENT_DATA_POINT_ID" = "COMPONENT_DATA_POINT"."ID"
and "DATA_POINT_DAILY_AVG"."SITE_ID" = "COMPONENT"."SITE_ID"
and "DATA_POINT_DAILY_AVG"."DATE_OF_AVG" between ('01-FEB-17') and ('28-FEB-17')
order by "COMPONENT_DATA_POINT"."NAME",
"DATA_POINT_DAILY_AVG"."DATE_OF_AVG" desc;
it will produce a result like this:
NAME   DATE_OF_AVG DAILY_AVG_VALUE
------ ----------- ---------------
TEST_1 06-FEB-17   0
TEST_1 05-FEB-17   0
....
....
TEST_2 06-FEB-17   0
TEST_2 05-FEB-17   0
....
....
TEST_3 06-FEB-17   0
TEST_3 05-FEB-17   0
....
....
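The query above returns one row per name and date. To get the layout the question asks for (one date column and four value columns), a common technique is conditional aggregation: group by date and pick each name's value with a CASE expression. The sketch below is not part of the original answer. It uses Python's sqlite3 module, a hypothetical single table `joined` standing in for the result of the three-table join, and made-up values chosen only to make the pivot visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# `joined` is a stand-in for the joined COMPONENT / COMPONENT_DATA_POINT /
# DATA_POINT_DAILY_AVG rows; the values below are invented for the demo.
conn.execute("create table joined (name text, date_of_avg text, value real)")
conn.executemany(
    "insert into joined values (?, ?, ?)",
    [("TEST_1", "2017-02-01", 1.0), ("TEST_2", "2017-02-01", 2.0),
     ("TEST_3", "2017-02-01", 3.0), ("TEST_4", "2017-02-01", 4.0),
     ("TEST_1", "2017-02-02", 5.0), ("TEST_2", "2017-02-02", 6.0),
     ("TEST_3", "2017-02-02", 7.0), ("TEST_4", "2017-02-02", 8.0)],
)

# One output row per date; each CASE keeps only the value for one name,
# and MAX collapses the group down to that single value.
query = """
select date_of_avg,
       max(case when name = 'TEST_1' then value end) as daily_avg_value_1,
       max(case when name = 'TEST_2' then value end) as daily_avg_value_2,
       max(case when name = 'TEST_3' then value end) as daily_avg_value_3,
       max(case when name = 'TEST_4' then value end) as daily_avg_value_4
from joined
where name in ('TEST_1', 'TEST_2', 'TEST_3', 'TEST_4')
group by date_of_avg
order by date_of_avg desc
"""
pivoted = conn.execute(query).fetchall()
for row in pivoted:
    print(row)
```

The same SELECT list and GROUP BY should carry over to the Oracle query, with the quoted column names in place of `name`, `date_of_avg` and `value`.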

Calculating ratio value within a line which contain binary numbers "0" & "1"

I have a data file which contains more than 2000 lines and 45001 columns.
The first column is actually a "string" which explains the data type.
Starting from column #2, up to column #45001, the data is represented as
"1"
or
"0"
For example, the pattern of data in a line is
(0 0 0 1 1 0 1 1 1 0 1 1 1 1 0 0 0 1 0 0 1 1 1 0 0)
The total number of data points is 25. Within this data line, there are 5 sub-groups which are made up of only the number "1" e.g. (11 111 1111 1 111). The "0"s in between the sub-groups are treated as "delimiters". The total of all "1"s is 13.
I would like to calculate the ratio of
(total of all "1"s / total of number of sub-groups made only by "1"s)
That is
(13/5).
I tried with this code for calculating the total of all "1"s ;
awk -F '0' '{print NF}' < inputfile.in
This gives value 13.
But I don't know how to go further from here to calculate the ratio that I want.
I don't know how to find the number of sub-groups within each line because the occurrences of "1"s and "0"s are random.
I would appreciate any help with this problem.
Thanks in advance.
It is not clear to me from the description what the format of the input file is. Assume the input looks like:
$ cat file
0 0 0 1 1 0 1 1 1 0 1 1 1 1 0 0 0 1 0 0 1 1 1 0 0
To count up the number of ones and the number of groups of ones and take their ratio:
$ awk '{f=0;s1=0;s2=0;for (i=2;i<=NF;i++){s1+=$i;if ($i && !f)s2++;f=$i}; print s1/s2}' file
2.6
Update: Handling all zeros
Suppose one of the lines in the file has all zeros:
$ cat file
0 0 0 1 1 0 1 1 1 0 1 1 1 1 0 0 0 1 0 0 1 1 1 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
For the second line, both sums are zero, which would lead to a divide-by-zero error. We can avoid that by adding an if statement which will print the ratio if one exists or 0/0 if it doesn't:
if (s2>0)print s1/s2; else print s1"/"s2
The complete code is now:
$ awk '{f=0;s1=0;s2=0;for (i=2;i<=NF;i++){s1+=$i;if ($i && !f)s2++;f=$i}; if (s2>0)print s1/s2; else print s1"/"s2}' file
2.6
0/0
How it works
The code uses three variables. f is a flag which is true (1) if we are currently in a group of ones and is false (0) otherwise. s1 is the number of ones on the line. s2 is the number of groups of ones on the line.
f=0;s1=0;s2=0
At the beginning of each line, we initialize the variables.
for (i=2;i<=NF;i++){s1+=$i;if ($i && !f)s2++;f=$i}
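We loop over each field on the line starting with field 2, since field 1 is the label column described in the question. If the field contains a 1, we increment counter s1. If the field is 1 and is the start of a new group, we increment s2. Finally, f records whether the current field is a 1, so the next field knows whether it continues a group.
if (s2>0)print s1/s2; else print s1"/"s2
If we encountered at least one group of ones, we print the ratio s1/s2. Otherwise, we print 0/0.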
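The same flag-and-two-counters logic can be sketched in Python for readers who want to step through it. This is an illustration only; the function name `ones_per_group` is made up, and the first field of each line is assumed to be the label column from the question.

```python
def ones_per_group(line):
    """Return (number of ones) / (number of groups of ones) for one line.

    The first whitespace-separated field is treated as a label and skipped.
    Returns None when the line contains no ones, to avoid dividing by zero.
    """
    in_group = False   # plays the role of f
    total_ones = 0     # plays the role of s1
    groups = 0         # plays the role of s2
    for field in line.split()[1:]:
        bit = int(field)
        total_ones += bit
        if bit and not in_group:
            groups += 1          # a 1 right after a 0 starts a new group
        in_group = bool(bit)
    return total_ones / groups if groups else None


sample = "data 0 0 0 1 1 0 1 1 1 0 1 1 1 1 0 0 0 1 0 0 1 1 1 0 0"
all_zeros = "data " + " ".join(["0"] * 25)
print(ones_per_group(sample))
print(ones_per_group(all_zeros))
```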
Here is an awk that does what you need:
cat file
data 0 0 0 1 1 0 1 1 1 0 1 1 1 1 0 0 0 1 0 0 1 1 1 0 0
data 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
data 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
data 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
BMR_10#O24-BMR_6#O13-H13 1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1
data 0 0 0 0 0 0 1 1 1 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 1
awk '{$1="";$0="0 "$0" 0";t=split($0,b,"1")-1;gsub(/ +/,"");n=split($0,a,"[^1]+")-2;print (n?t/n:0)}' file
2.6
0
25
11
5.5
3