PLSQL Double Split using delimiter and then transpose somehow? - sql

So this is where I realize the difference between theory and practice. While I can picture in theory how it should look, I can't for the life of me figure out how to actually do it. I have tens of thousands of observations that look like this:
+--------+-------------------------------+
| ID     | CALLS                         |
+--------+-------------------------------+
| 162743 | BAD DVR-3|NO PIC-1            |
| 64747  | NO PIC-1|BOX HIT-4|PPV DROP-1 |
+--------+-------------------------------+
And the end results should be something like this:
+--------+---------+--------+---------+----------+
| ID     | BAD DVR | NO PIC | BOX HIT | PPV DROP |
+--------+---------+--------+---------+----------+
| 162743 | 3       | 1      | 0       | 0        |
| 64747  | 0       | 1      | 4       | 1        |
+--------+---------+--------+---------+----------+
I'm using PL/SQL pass-through in SAS, so if I need to transpose I can always use PROC TRANSPOSE. But getting to that point is honestly beyond me. I know I will probably have to create a function like the one in this question: T-SQL: Opposite to string concatenation - how to split string into multiple records
Any ideas?

Do you have any reference material that describes all the possible values in the pipe-delimited CALLS column? Or do you already know the particular values you need to keep, and can ignore the others?
If so, you can just process the entire thing in a DATA step; here is an example:
data have;
   /* list input; the & modifier lets CALLS contain single embedded blanks */
   input ID CALLS & $50.;
   datalines;
162743 BAD DVR-3|NO PIC-1
64747 NO PIC-1|BOX HIT-4|PPV DROP-1
;
run;
data want;
   set have; /* point to your Oracle source here */
   length field $50;
   idx = 1;        /* position of the next token to read */
   BAD_DVR = 0;    /* reset the counters for every observation */
   NO_PIC = 0;
   BOX_HIT = 0;
   PPV_DROP = 0;
   /* read up to 5 pipe-delimited tokens; idx = 0 means "no more tokens" */
   do i=1 to 5 while(idx ne 0);
      field = scan(calls,idx,'|');
      if field = ' ' then idx=0;
      else do;
         /* =: means "begins with"; SUBSTR returns the digits after the hyphen */
         if field =: 'BAD DVR' then BAD_DVR = input(substr(field,9),8.);
         else if field =: 'NO PIC' then NO_PIC = input(substr(field,8),8.);
         else if field =: 'BOX HIT' then BOX_HIT = input(substr(field,9),8.);
         else if field =: 'PPV DROP' then PPV_DROP = input(substr(field,10),8.);
         idx + 1;
      end;
   end;
   output;
   keep ID BAD_DVR NO_PIC BOX_HIT PPV_DROP;
run;
The SCAN function steps through the CALLS column token by token; the "=:" operator means "begins with", and the SUBSTR function with only two arguments returns the characters following the hyphen, which the INPUT function then reads as a number.
Of course, I'm making a few assumptions about your source data, but you get the idea.
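For anyone who wants to check the parsing rules outside SAS, here is the same token-by-token logic as a small Python sketch. Like the data step, it assumes every token has the form NAME-COUNT and that the four category names are known in advance; the names parse_calls and CATEGORIES are illustrative, not part of the original answer.

```python
# Known categories, in output-column order (assumption: the list is fixed).
CATEGORIES = ["BAD DVR", "NO PIC", "BOX HIT", "PPV DROP"]


def parse_calls(calls: str) -> dict:
    """Turn 'BAD DVR-3|NO PIC-1' into one count per known category (0 if absent)."""
    counts = {name: 0 for name in CATEGORIES}
    for token in calls.split("|"):
        token = token.strip()
        if not token:
            continue  # tolerate empty tokens, e.g. a trailing '|'
        # Split on the last hyphen: name before it, count after it.
        name, _, number = token.rpartition("-")
        if name in counts:  # unknown categories are ignored, as in the data step
            counts[name] = int(number)
    return counts


if __name__ == "__main__":
    for row_id, calls in [(162743, "BAD DVR-3|NO PIC-1"),
                          (64747, "NO PIC-1|BOX HIT-4|PPV DROP-1")]:
        print(row_id, parse_calls(calls))
```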

I can think of at least two ways to achieve this:
1. Read all the data from SQL into SAS, then use a DATA step to manipulate it, i.e.
convert the data that is in two columns:
+--------+-------------------------------+
| ID     | CALLS                         |
+--------+-------------------------------+
| 162743 | BAD DVR-3|NO PIC-1            |
| 64747  | NO PIC-1|BOX HIT-4|PPV DROP-1 |
+--------+-------------------------------+
into a long format like this (the result of the DATA step manipulation):
ID CALLS COUNT
162743 BAD_DVR 3
162743 NO_PIC 1
64747 NO_PIC 1
64747 BOX_HIT 4
64747 PPV_DROP 1
From there it would be a simple matter of passing the above dataset to PROC TRANSPOSE
to get a table like this:
+--------+---------+--------+---------+----------+
| ID     | BAD DVR | NO PIC | BOX HIT | PPV DROP |
+--------+---------+--------+---------+----------+
| 162743 | 3       | 1      | 0       | 0        |
| 64747  | 0       | 1      | 4       | 1        |
+--------+---------+--------+---------+----------+
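The two-step idea (split into one row per ID and category, then pivot) can be sketched in Python to make the intermediate long format concrete. This is an illustration only: to_long and pivot are hypothetical helpers, and unlike PROC TRANSPOSE, which leaves missing values for absent ID/category combinations, this sketch fills them with 0 directly.

```python
from collections import defaultdict


def to_long(rows):
    """Split each (id, calls) pair into (id, category, count) rows."""
    long_rows = []
    for row_id, calls in rows:
        for token in calls.split("|"):
            name, _, number = token.strip().rpartition("-")
            if name:  # skip empty or malformed tokens
                long_rows.append((row_id, name, int(number)))
    return long_rows


def pivot(long_rows):
    """Turn the long rows into one list of counts per ID, 0 where absent."""
    categories = []              # column order = order of first appearance
    table = defaultdict(dict)    # id -> {category: count}
    for row_id, name, count in long_rows:
        if name not in categories:
            categories.append(name)
        table[row_id][name] = count
    wide = {rid: [cells.get(c, 0) for c in categories] for rid, cells in table.items()}
    return categories, wide


if __name__ == "__main__":
    data = [(162743, "BAD DVR-3|NO PIC-1"), (64747, "NO PIC-1|BOX HIT-4|PPV DROP-1")]
    print(pivot(to_long(data)))
```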
2. If you want to do everything in pass-through SQL, that should also be easy IF the number of categories (BAD DVR, NO PIC, BOX HIT, etc.) is small.
In Oracle (the subexpression argument of REGEXP_SUBSTR needs 11g or later), the code would look something like this:
SELECT
    ID
   ,CASE WHEN INSTR(CALLS, 'BAD DVR-') > 0
         THEN TO_NUMBER(REGEXP_SUBSTR(CALLS, 'BAD DVR-([0-9]+)', 1, 1, NULL, 1))
         ELSE 0 END AS BAD_DVR_COUNT
   ,CASE WHEN INSTR(CALLS, 'NO PIC-') > 0
         THEN TO_NUMBER(REGEXP_SUBSTR(CALLS, 'NO PIC-([0-9]+)', 1, 1, NULL, 1))
         ELSE 0 END AS NO_PIC_COUNT
   ,<and so on for BOX HIT and PPV DROP>
FROM YOUR_TABLE
You just need to look up the string manipulation functions available in your database to make everything work.
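To try the one-CASE-per-category pattern without an Oracle connection, here is a sketch that runs it against an in-memory SQLite database through Python's sqlite3 module. It relies on a SQLite-specific behavior: CAST(text AS INTEGER) keeps only the leading integer prefix, so '3|NO PIC-1' becomes 3. Oracle's TO_NUMBER does not ignore trailing characters, which is why the Oracle version needs REGEXP_SUBSTR or a SUBSTR with an explicit end position. The table name your_table is a placeholder.

```python
import sqlite3

# Throwaway in-memory copy of the sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER, calls TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [(162743, "BAD DVR-3|NO PIC-1"), (64747, "NO PIC-1|BOX HIT-4|PPV DROP-1")],
)

# instr() finds the label, substr() starts right after the '-' (the offset is
# the length of the search string), and CAST keeps only the leading digits.
QUERY = """
SELECT id,
       CASE WHEN instr(calls, 'BAD DVR-') > 0
            THEN CAST(substr(calls, instr(calls, 'BAD DVR-') + 8) AS INTEGER)
            ELSE 0 END AS bad_dvr_count,
       CASE WHEN instr(calls, 'NO PIC-') > 0
            THEN CAST(substr(calls, instr(calls, 'NO PIC-') + 7) AS INTEGER)
            ELSE 0 END AS no_pic_count,
       CASE WHEN instr(calls, 'BOX HIT-') > 0
            THEN CAST(substr(calls, instr(calls, 'BOX HIT-') + 8) AS INTEGER)
            ELSE 0 END AS box_hit_count,
       CASE WHEN instr(calls, 'PPV DROP-') > 0
            THEN CAST(substr(calls, instr(calls, 'PPV DROP-') + 9) AS INTEGER)
            ELSE 0 END AS ppv_drop_count
  FROM your_table
"""

# id -> (bad_dvr, no_pic, box_hit, ppv_drop)
counts = {row[0]: row[1:] for row in conn.execute(QUERY)}
print(counts)
```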

Related

PostgreSQL Compare value from row to value in next row (different column)

I have a table of encounters called user_dates that is ordered by 'user' and 'start' like below. I want to create a column indicating whether an encounter was followed up by another encounter within 30 days. So basically I want to go row by row checking if "encounter_stop" is within 30 days of "encounter_start" in the following row (as long as the following row is the same user).
user | encounter_start | encounter_stop
A | 4-16-1989 | 4-20-1989
A | 4-24-1989 | 5-1-1989
A | 6-14-1993 | 6-27-1993
A | 12-24-1999 | 1-2-2000
A | 1-19-2000 | 1-24-2000
B | 2-2-2000 | 2-7-2000
B | 5-27-2001 | 6-4-2001
I want a table like this:
user | encounter_start | encounter_stop | subsequent_encounter_within_30_days
A | 4-16-1989 | 4-20-1989 | 1
A | 4-24-1989 | 5-1-1989 | 0
A | 6-14-1993 | 6-27-1993 | 0
A | 12-24-1999 | 1-2-2000 | 1
A | 1-19-2000 | 1-24-2000 | 0
B | 2-2-2000 | 2-7-2000 | 1
B | 5-27-2001 | 6-4-2001 | 0
You can select ..., exists (<select ... criteria>), which returns a boolean (always true or false); if you really want 1 or 0, just cast the result to integer: true => 1 and false => 0.
select ts1.user_id
     , ts1.encounter_start
     , ts1.encounter_stop
       -- 1 if the same user has an encounter starting within 30 days after this one stops
     , (exists ( select null
                   from test_set ts2
                  where ts1.user_id = ts2.user_id
                    and ts2.encounter_start
                        between ts1.encounter_stop
                            and (ts1.encounter_stop + interval '30 days')::date
               )::integer
       ) subsequent_encounter_within_30_days
  from test_set ts1
 order by user_id, encounter_start;
Difference: the query above disagrees with your expected result for this row:
B | 2-2-2000 | 2-7-2000 | 1
subsequent_encounter_within_30_days (the last column) should be 0. This entry starts and ends in Feb 2000, while the other B entry starts in May 2001. Please explain how these are within 30 days (unless it is just a simple typo).
Caution: do not use user as a column name. It is a reserved word in both Postgres and the SQL standard. You can sometimes get away with it, or double-quote it, but if you double-quote it you MUST always do so. The big problem is that it has a predefined meaning (run select user;), and if you forget to double-quote it, it does not necessarily produce an error or exception; it is much worse: wrong results.
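The same EXISTS idea can be tried locally with Python's sqlite3 module. This is a sketch under a few assumptions: the column is named user_id (per the caution above), dates are stored as ISO 'YYYY-MM-DD' text so that BETWEEN compares them correctly, and SQLite's date(..., '+30 days') stands in for Postgres interval arithmetic. SQLite's EXISTS already yields 0 or 1, so no cast is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_set (user_id TEXT, encounter_start TEXT, encounter_stop TEXT)")
conn.executemany("INSERT INTO test_set VALUES (?, ?, ?)", [
    ("A", "1989-04-16", "1989-04-20"),
    ("A", "1989-04-24", "1989-05-01"),
    ("A", "1993-06-14", "1993-06-27"),
    ("A", "1999-12-24", "2000-01-02"),
    ("A", "2000-01-19", "2000-01-24"),
    ("B", "2000-02-02", "2000-02-07"),
    ("B", "2001-05-27", "2001-06-04"),
])

# For each row, look for any encounter by the same user that starts between
# this row's stop date and 30 days later.
QUERY = """
SELECT ts1.user_id, ts1.encounter_start, ts1.encounter_stop,
       EXISTS (SELECT NULL
                 FROM test_set ts2
                WHERE ts2.user_id = ts1.user_id
                  AND ts2.encounter_start
                      BETWEEN ts1.encounter_stop
                          AND date(ts1.encounter_stop, '+30 days')
              ) AS subsequent_encounter_within_30_days
  FROM test_set ts1
 ORDER BY ts1.user_id, ts1.encounter_start
"""

rows = conn.execute(QUERY).fetchall()
flags = [row[3] for row in rows]
for row in rows:
    print(row)
```

As with the Postgres version, the second B row gets 0, not the 1 shown in the question.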

SQL subtracting a value from 2 separate rows, in 2 separate columns

I am having trouble figuring this out... I am trying to write an SQL query to subtract 2 values from this table:
RateTable:
RecordID | Policy | Benefit | CBBR | IBBR
---------+--------+---------+-------+-------
1 | 12345 | A | $1.34 | $5.64
2 | 12345 | B | $4.56 | $0.56
3 | 12345 | C | $5.67 | $3.32
4 | 54321 | A | $2.57 | $6.24
5 | 34512 | A | $1.76 | $3.32
6 | 34512 | A | $4.56 | $1.34
I need to create a query that will return the result from the value in CBBR where Policy = 12345 and benefit = A then subtract the value in IBBR where Policy = 12345 and benefit = B ($1.34 - 0.56)
Any ideas?
I am not sure whether you want the difference only for benefit "B", or whether the pattern continues: whenever (Policy = 12345 and Benefit = 'A'), take the difference with the next row.
Assuming you want the difference calculated against the next row in every case:
select *,
       case when Benefit = 'A' and Policy = 12345 then CBBR - new_IBBR
            else CBBR
       end as diff
  from (select *,
               lead(IBBR, 1) over (order by RecordID asc) as new_IBBR
          from RateTable) x
Ankit Jindal has already given a correct answer. However, I'd like to note that there's no need for a subquery here, and subqueries can slow down performance in some cases. In this particular case, a JOIN is enough (the columns in the WHERE clause must be qualified with an alias, because both copies of the table have Policy and Benefit columns):
SELECT
    rta.CBBR - rtb.IBBR AS Result
FROM RateTable AS rta
JOIN RateTable AS rtb ON rtb.Policy = rta.Policy AND rtb.Benefit = 'B'
WHERE rta.Policy = 12345
  AND rta.Benefit = 'A'
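The self-join can be checked quickly against the sample rows with Python's sqlite3 module. The sketch assumes the dollar amounts are stored as plain numbers (the $ signs in the question are treated as display formatting) and uses REAL for brevity; in a real database a DECIMAL type would avoid floating-point rounding, which is why the result is compared approximately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE RateTable (RecordID INTEGER, Policy INTEGER, Benefit TEXT,"
             " CBBR REAL, IBBR REAL)")
conn.executemany("INSERT INTO RateTable VALUES (?, ?, ?, ?, ?)", [
    (1, 12345, "A", 1.34, 5.64),
    (2, 12345, "B", 4.56, 0.56),
    (3, 12345, "C", 5.67, 3.32),
    (4, 54321, "A", 2.57, 6.24),
    (5, 34512, "A", 1.76, 3.32),
    (6, 34512, "A", 4.56, 1.34),
])

# rta is the benefit-A row, rtb the benefit-B row of the same policy; every
# column in WHERE is qualified, otherwise Policy/Benefit would be ambiguous.
QUERY = """
SELECT rta.CBBR - rtb.IBBR AS Result
  FROM RateTable AS rta
  JOIN RateTable AS rtb
    ON rtb.Policy = rta.Policy
   AND rtb.Benefit = 'B'
 WHERE rta.Policy = 12345
   AND rta.Benefit = 'A'
"""

rows = conn.execute(QUERY).fetchall()
result = rows[0][0]
print(round(result, 2))
```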
Based on the details in the question, you need the difference between CBBR and IBBR under particular conditions:
SELECT CBBR -
       (SELECT IBBR            -- this subquery must return exactly one row
          FROM RateTable
         WHERE Policy = 12345
           AND Benefit = 'B') AS Diff
  FROM RateTable
 WHERE Policy = 12345
   AND Benefit = 'A';
If you need a more general query, you will probably need an aggregate such as SUM, or a different approach.
Your question needs more explanation, but based on my understanding I think you should use a CASE expression:
SELECT
    CASE WHEN (Policy = 12345 AND Benefit = 'A') THEN CBBR
         WHEN (Policy = 12345 AND Benefit = 'B') THEN CBBR - IBBR
    END AS Value
FROM Yourtable

Flag observations that fulfill two conditions together

I am very new to SAS as well as SQL and would appreciate help.
I have a data set containing student id, term and audit_type. Each term has 2 audit_types and the student can be present at either of them or be present at both.
I need to create a flag for each of these 3 scenarios for each student id each term: 1) if the student is present at only audit_type_1, 2) if s/he is present at only audit_type_2 and 3) if s/he is present at audit_type_1 and audit_type_2 both during that term. Not sure how to post my data but here it is
Sample data
| Id | Term | Audit_type |
|---- |------------- |-----------: |
| 1 | Fall 2016 | 1 |
| 1 | Fall 2016 | 2 |
| 2 | Winter 2017 | 1 |
| 3 | Winter 2017 | 2 |
| 4 | Spring 2017 | 1 |
| 4 | Spring 2017 | 2 |
I was able to create a flag for the first 2 scenarios using case when as seen below:
proc sql;
create table test as
select id, term, audit_type,
case
when audit_type in ('audit_type_1') then 1
when audit_type in ('audit_type_2 ') then 2
end as audit_type_flag
from have;
I can't figure out how to flag the third scenario. All help will be highly appreciated. Thanks in advance for your help and support. So, I want something like below:
| Id | Term | Audit_type | Flag |
|---- |------------- |-----------: |------ |
| 1 | Fall 2016 | 1 | 3 |
| 1 | Fall 2016 | 2 | 3 |
| 2 | Winter 2017 | 1 | 1 |
| 3 | Winter 2017 | 2 | 2 |
| 4 | Spring 2017 | 1 | 3 |
| 4 | Spring 2017 | 2 | 3 |
I am reading between the lines here and assuming the logic is: Audit_Type = 1 gives Flag = 1, Audit_Type = 2 gives Flag = 2, and both give Flag = 3, so I simply add the flags together. This may not be what you are after (you may need flags 4, 5, 6 and 7 as well); it is an assumption based on a small amount of data, without knowing the exact use case. I will provide a solution, and if it is correct please let me know and I will add comments explaining the syntax. I don't want to spend time explaining the code if it isn't exactly what you are looking for.
Regards,
Scott
UPDATE
I have added the comments to the code as well as links to pages which may help you better understand what I am talking about.
/* SETUP SOME DUMMY DATA */
DATA HAVE;
LENGTH ID 3. TERM $11. AUDIT_TYPE 3.;
INFILE DATALINES DSD DELIMITER = "," missover;
INPUT ID TERM AUDIT_TYPE;
DATALINES;
1,Fall 2016,1
1,Fall 2016,2
1,Summer 2016,1
1,Summer 2016,2
2,Winter 2017,1
3,Winter 2017,2
4,Spring 2017,1
4,Spring 2017,2
;
RUN;
/* PERFORM A SORT SO THAT WE CAN MAKE USE OF BY STATEMENT PROCESSING IN THE
SUBSEQUENT DATA STEP */
PROC SORT DATA = HAVE;
BY ID TERM;
RUN;
DATA WANT;
/* DOW LOOP */
/* THIS LOOP EXECUTES FOR EACH ROW UNTIL THE LAST.TERM IS ENCOUNTERED. */
/* COMMENTING OUT THE SECOND DOW LOOP WILL SHOW YOU THAT THIS LOOP IS
BASICALLY SUMMARISING THE RESULT OF EACH ID, TERM GROUP AFTER THE CONDITIONAL
LOGIC IS APPLIED TO THE FLAG VARIABLE.*/
DO UNTIL (LAST.TERM);
/* EACH TIME THE DO LOOP EXECUTES A NEW ROW IS READ INTO THE PDV (PROGRAM DATA VECTOR).*/
SET HAVE;
/* THE BY STATEMENT IS IN PLACE TO FACILITATE BY STATEMENT PROCESSING. */
BY ID TERM;
/* INITIALISE THE FLAG VARIABLE EACH TIME A NEW TERM IS ENCOUNTERED */
IF FIRST.TERM THEN FLAG = 0;
/* USING THE SYNTAX FLAG + 1 REPLICATES USING THE RETAIN STATEMENT WITH A SUM FUNCTION.
IT IS REFERRED TO AS THE SUM STATEMENT. IF YOU ARE INTERESTED IN LEARNING MORE ABOUT
THIS THEN SEE THE LINK TO THE DOCUMENTATION BELOW.*/
IF AUDIT_TYPE = 1 THEN FLAG + 1;
ELSE IF AUDIT_TYPE = 2 THEN FLAG + 2;
END;
/* ONCE THE PREVIOUS DOW LOOP EXITS (BECAUSE THE LAST.TERM HAS BEEN REACHED) THE SECOND
DOW LOOP EXECUTES*/
DO UNTIL (LAST.TERM);
/* AS PER DOW LOOP 1, EACH LOOP RESULTS IN A SINGLE OBSERVATION BEING READ */
SET HAVE;
/* THE BY STATEMENT IS IN PLACE TO FACILITATE BY STATEMENT PROCESSING. */
BY ID TERM;
/* THE EXPLICIT OUTPUT STATEMENT EXECUTES TO OUTPUT THE VALUES CONTAINED WITHIN THE
PDV */
OUTPUT;
END;
RUN;
Further reading:
The DOW-loop: a Smarter Approach to your Existing Code
The Sum Statement
The Power of the BY Statement
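For readers new to the DOW loop, the two-pass pattern can be sketched in Python: the first pass summarises each (id, term) group, the second re-reads every row and attaches the group's flag. This is an illustration only; flag_terms is a hypothetical name, and unlike the sum statement in the data step, the set used here counts a repeated audit type only once.

```python
from collections import defaultdict


def flag_terms(rows):
    """rows: list of (id, term, audit_type) tuples.

    Returns the rows with a flag appended: 1 = only audit type 1,
    2 = only audit type 2, 3 = both (the flags are added together).
    """
    # First pass, like the first DOW loop: collect the audit types per group.
    types_per_group = defaultdict(set)
    for id_, term, audit_type in rows:
        if audit_type in (1, 2):
            types_per_group[(id_, term)].add(audit_type)

    # Second pass, like the second DOW loop: re-read every row and output it
    # together with the flag of its group.
    return [(id_, term, audit_type, sum(types_per_group[(id_, term)]))
            for id_, term, audit_type in rows]


if __name__ == "__main__":
    sample = [(1, "Fall 2016", 1), (1, "Fall 2016", 2), (2, "Winter 2017", 1),
              (3, "Winter 2017", 2), (4, "Spring 2017", 1), (4, "Spring 2017", 2)]
    for row in flag_terms(sample):
        print(row)
```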
Just add an ELSE branch to the CASE expression and it will be solved:
select id, term, audit_type,
case
when audit_type in ('audit_type_1') then 1
when audit_type in ('audit_type_2 ') then 2
else 3
end as audit_type_flag
from have;
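I think you need aggregation:
proc sql;
   create table test as
   select id, term, audit_type,
          (case when min(audit_type) <> max(audit_type) then 3
                when min(audit_type) = 1 then 1
                when min(audit_type) = 2 then 2
           end) as audit_type_flag
   from have
   group by id, term;
quit;
Because audit_type is selected without being grouped, PROC SQL remerges the group-level flag back onto every row (and writes a note about remerging to the log).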
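Most SQL engines other than PROC SQL do not remerge summary statistics automatically, so a portable version computes the flag per (id, term) group in a derived table and joins it back to the detail rows. The sketch below does that in SQLite via Python's sqlite3 module, mapping the three cases to the flags 1, 2 and 3 that the question asks for; the table name have comes from the question, everything else is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE have (id INTEGER, term TEXT, audit_type INTEGER)")
conn.executemany("INSERT INTO have VALUES (?, ?, ?)", [
    (1, "Fall 2016", 1), (1, "Fall 2016", 2), (2, "Winter 2017", 1),
    (3, "Winter 2017", 2), (4, "Spring 2017", 1), (4, "Spring 2017", 2),
])

# g holds one flag per (id, term); joining it back gives every detail row its flag.
QUERY = """
SELECT h.id, h.term, h.audit_type, g.audit_type_flag
  FROM have AS h
  JOIN (SELECT id, term,
               CASE WHEN MIN(audit_type) <> MAX(audit_type) THEN 3
                    WHEN MIN(audit_type) = 1 THEN 1
                    WHEN MIN(audit_type) = 2 THEN 2
               END AS audit_type_flag
          FROM have
         GROUP BY id, term) AS g
    ON g.id = h.id AND g.term = h.term
 ORDER BY h.id, h.term, h.audit_type
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```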

Dynamically Modify Internal Table Values by Data Type

This is a similar question to the one I posted last week.
I have an internal table based off of a dictionary structure with a format similar to the following:
+---------+--------+---------+--------+---------+--------+-----+
| column1 | delim1 | column3 | delim2 | column5 | delim3 | ... |
+---------+--------+---------+--------+---------+--------+-----+
| value1  | '|'    | value 1 | '|'    | value 1 | '|'    | ... |
| value2  | '|'    | value 2 | '|'    | value 2 | '|'    | ... |
| value3  | '|'    | value 3 | '|'    | value 3 | '|'    | ... |
+---------+--------+---------+--------+---------+--------+-----+
The delim* columns are all of type delim, and the types of the non-delimiter columns are irrelevant (assuming none of them are also of type delim).
The data in this table is obtained in a single statement:
SELECT * FROM <table_name> INTO CORRESPONDING FIELDS OF TABLE <internal_table_name>.
Thus, I have a completely full table except for the delimiter values, which are determined by user input on the selection screen (that is, we cannot rely on them always being a comma or any other common delimiter).
I'd like to find a way to dynamically set all of the values of type delim to some input for every row.
Obviously I could just hardcode the delimiter names and loop over the table setting all of them, but that's not dynamic. Unfortunately I can't bank on a simple API.
What I've tried (this doesn't work, and it's such a bad technique that I felt dirty just writing it):
DATA lt_fields TYPE TABLE OF rollname.
SELECT fieldname FROM dd03l
INTO TABLE lt_fields
WHERE tabname = '<table_name>'
AND as4local = 'A'
AND rollname = 'DELIM'.
LOOP AT lt_output ASSIGNING FIELD-SYMBOL(<fs>).
LOOP AT lt_fields ASSIGNING FIELD-SYMBOL(<fs2>).
<fs>-<fs2> = '|'.
ENDLOOP.
ENDLOOP.
Once again, I'm not set in my ways and would switch to another approach altogether if I believe it's better.
Although I still believe you're barking up the wrong tree with the entire approach, you have been pointed in the right direction both here and in the previous question:
ASSIGN COMPONENT <fs2> OF STRUCTURE <fs> TO FIELD-SYMBOL(<fs3>). " might just as well continue with the write-only naming conventions
IF sy-subrc = 0.
<fs3> = '|'.
ENDIF.
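For readers less familiar with ABAP, the idea behind ASSIGN COMPONENT (look up a field by its name at runtime, and only write to it if the lookup succeeded) can be expressed as a Python analogy. This is not ABAP: rows are modelled as dicts, fill_delimiters is a hypothetical name, and the list of delimiter field names stands in for the DD03L lookup from the question.

```python
def fill_delimiters(rows, delimiter_fields, delimiter):
    """Set every delimiter column of every row to the user's delimiter.

    rows: list of dicts, one per table row, keyed by column name.
    delimiter_fields: names of the columns whose data element is DELIM.
    """
    for row in rows:                      # LOOP AT lt_output ASSIGNING <fs>
        for field in delimiter_fields:    # LOOP AT lt_fields ASSIGNING <fs2>
            if field in row:              # ASSIGN COMPONENT ... / IF sy-subrc = 0
                row[field] = delimiter    # <fs3> = '|'
    return rows


if __name__ == "__main__":
    table = [{"column1": "value1", "delim1": "", "column3": "value 1", "delim2": ""}]
    print(fill_delimiters(table, ["delim1", "delim2"], "|"))
```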

How to get numbers arranged right to left in sql server SELECT statements

When running SELECT statements that include number columns (prices, for example), the numbers are always left-aligned, which reduces readability. Therefore I'm looking for a way to right-align the output of number columns.
I already tried to use something like
SELECT ... SPACE(15-LEN(A.Nummer))+A.Nummer ...
FROM Artikel AS A ...
which gives close results, but depending on font not really. An alternative would be to replace 'SPACE()' with 'REPLICATE('_',...)', but I don't really like the underscores in output.
Besides that, this formula fails for numbers with more than 15 digits, so I looked for a way to find the maximum length of the entries to make it safer, like
SELECT ... SPACE(MAX(A.Nummer)-LEN(A.Nummer))+A.Nummer ...
FROM Artikel AS A ...
but this does not work due to the aggregate character of the MAX-function.
So, what's the best way to achieve the right-justified order for the number-columns?
Thanks,
Rainer
To get your problem with the list box solved, have a look at this link: http://www.lebans.com/List_Combo.htm
I strongly believe that this type of adjustment should be made in the UI layer and not mixed in with data retrieval.
But to answer your original question, I have created a SQL Fiddle:
MS SQL Server 2008 Schema Setup:
CREATE TABLE dbo.some_numbers(n INT);
Create some example data:
INSERT INTO dbo.some_numbers
SELECT CHECKSUM(NEWID())
FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))X(x);
The following query is using the OVER() clause to specify that the MAX() is to be applied over all rows. The > and < that the result is wrapped in is just for illustration purposes and not required for the solution.
Query 1:
SELECT '>'+
SPACE(MAX(LEN(CAST(n AS VARCHAR(MAX))))OVER()-LEN(CAST(n AS VARCHAR(MAX))))+
CAST(n AS VARCHAR(MAX))+
'<'
FROM dbo.some_numbers SN;
Results:
| COLUMN_0 |
|---------------|
| >-1486993739< |
| > 1620287540< |
| >-1451542215< |
| >-1257364471< |
| > -819471559< |
| >-1364318127< |
| >-1190313739< |
| > 1682890896< |
| >-1050938840< |
| > 484064148< |
This query does a straight cast to show the difference:
Query 2:
SELECT '>'+CAST(n AS VARCHAR(MAX))+'<'
FROM dbo.some_numbers SN;
Results:
| COLUMN_0 |
|---------------|
| >-1486993739< |
| >1620287540< |
| >-1451542215< |
| >-1257364471< |
| >-819471559< |
| >-1364318127< |
| >-1190313739< |
| >1682890896< |
| >-1050938840< |
| >484064148< |
With this query you still need to change the display font to a monospaced font like Courier New. Otherwise, as you have noticed, the result is still misaligned.
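Since the recommendation is to do this kind of alignment in the UI layer, here is a minimal Python sketch of the same "pad to the longest value" logic done on the client side; right_align is a hypothetical helper name. As with the SQL version, the output only lines up when shown in a monospaced font.

```python
def right_align(values):
    """Right-align values by left-padding them to the width of the longest one."""
    texts = [str(v) for v in values]
    # Width of the longest value, like MAX(LEN(...)) OVER () in Query 1.
    width = max((len(t) for t in texts), default=0)
    # str.rjust pads on the left, like SPACE(width - LEN(...)) + value.
    return [t.rjust(width) for t in texts]


if __name__ == "__main__":
    for line in right_align([-1486993739, 1620287540, -819471559, 484064148]):
        print(f">{line}<")
```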