SAS: how to properly use intck() in proc sql - sql

I have the following code in SAS:
proc sql;
create table play2
as select a.anndats,a.amaskcd,count(b.amaskcd) as experience
from test1 as a, test1 as b
where a.amaskcd = b.amaskcd and intck('day', b.anndats, a.anndats)>0
group by a.amaskcd, a.ANNDATS;
quit;
The dataset test1 has 32 distinct observations, while play2 returns only 22. All I want is, for each observation, to count the number of earlier appearances of the same amaskcd. What is the best way to do this? Thanks.

The reason this returns 22 observations (which might not even be 22 of the 32 distinct ones) is that this is a comma join, which in this case behaves as an inner join. For any given row a, if no row b has the same amaskcd and an earlier anndats, then that a is not returned.
What you want here is a left join, which returns every row from a at least once.
create table play2
as select ...
from test1 a
left join test1 b
on a.amaskcd=b.amaskcd
and intck(...)>0 /* keep this in the ON clause: a WHERE filter would drop the unmatched rows of a again */
group by ...
;
I would actually write this differently, as I'm not sure the above will do exactly what you want.
create table play2
as select a.anndats, a.amaskcd,
(select count(1) from test1 b
where b.amaskcd=a.amaskcd
and b.anndats<a.anndats /* earlier rows only; intck('day') is pointless, dates are stored as integer days */
) as experience
from test1 a
;
If your test1 isn't already grouped by amaskcd and anndats, you may need to rework this some. This kind of subquery is easier to write and more accurately reflects what you're trying to do, I suspect.
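As a rough illustration of the difference, here is a minimal sketch using Python's sqlite3 module rather than SAS, with made-up sample rows and plain integers standing in for date values. It takes "history" to mean rows with a strictly earlier anndats, as the original intck(...) > 0 condition does. The comma join loses every row that has no earlier match, while the correlated subquery keeps it with a count of 0.

```python
import sqlite3

# Hypothetical sample data: (amaskcd, anndats), with anndats as an integer day count.
rows = [(1, 100), (1, 105), (1, 110), (2, 200), (2, 207), (3, 300)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test1 (amaskcd INTEGER, anndats INTEGER)")
con.executemany("INSERT INTO test1 VALUES (?, ?)", rows)

# Comma (inner) join: a row with no earlier partner finds no match and disappears.
inner = con.execute("""
    SELECT a.amaskcd, a.anndats, COUNT(b.amaskcd) AS experience
    FROM test1 AS a, test1 AS b
    WHERE a.amaskcd = b.amaskcd AND b.anndats < a.anndats
    GROUP BY a.amaskcd, a.anndats
    ORDER BY a.amaskcd, a.anndats
""").fetchall()

# Correlated subquery: every row of a is kept, with 0 when there is no history.
sub = con.execute("""
    SELECT a.amaskcd, a.anndats,
           (SELECT COUNT(*) FROM test1 AS b
             WHERE b.amaskcd = a.amaskcd AND b.anndats < a.anndats) AS experience
    FROM test1 AS a
    ORDER BY a.amaskcd, a.anndats
""").fetchall()

print(len(inner), "rows from the comma join,", len(sub), "rows from the subquery")
```

The first observation of each amaskcd is exactly what goes missing from the joined result, which is consistent with 32 rows shrinking to 22.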

If both anndats variables are dates (not datetimes), you can simply compare them directly. Date variables in SAS are integers in which one unit represents one day, so you do not need the intck function to get the difference in days; just use subtraction.
The second thing I noticed is that your code looks for a result > 0. The intck function returns a negative value when the second date is earlier than the first.
I am still not sure I understand what you're looking to produce with the query. It joins the dataset to itself using the amaskcd field as the key. You're then filtering on anndats, selecting only records where b's anndats is earlier than a's, that is, b.anndats < a.anndats.
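To make the "dates are just integers" point concrete, here is a small Python sketch (not SAS) that mimics how a SAS date value is formed: a count of days since January 1, 1960. The helper name sas_date_value is made up for this illustration. Subtracting two such values gives the signed number of days between them, which is what the intck('day', ...) call in the question was used for.

```python
from datetime import date

SAS_EPOCH = date(1960, 1, 1)  # SAS date value 0


def sas_date_value(d):
    """Hypothetical helper: number of days from the SAS epoch to d."""
    return (d - SAS_EPOCH).days


a_anndats = sas_date_value(date(2020, 3, 1))
b_anndats = sas_date_value(date(2020, 2, 1))

# Plain subtraction gives the day difference, with the sign showing the order.
print(a_anndats - b_anndats)  # positive: b is earlier than a
print(b_anndats - a_anndats)  # negative: arguments swapped
```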

Related

Write SQL from SAS

I have this code in SAS, I'm trying to write SQL equivalent. I have no experience in SAS.
data Fulls Fulls_Dupes;
set Fulls;
by name, coeff, week;
if rid = 0 and ^last.week then output Fulls_Dupes;
else output Fulls;
run;
I tried the following, but didn't produce the same output:
Select * from Fulls where rid = 0 groupby name,coeff,week
Is my SQL query correct?
SQL does not have a concept of observation order. So there is no direct equivalent of the LAST. concept. If you have some variable that is monotonically increasing within the groups defined by distinct values of name, coeff, and week then you could select the observation that has the maximum value of that variable to find the observation that is the LAST.
So for example, if you also had a variable named DAY that uniquely identified and ordered the observations in the same way as they exist in the FULLS dataset now, then you could use the test DAY=MAX(DAY) to find the last observation. In PROC SQL you can use that test directly because SAS will automatically remerge the aggregate value back onto all of the detailed observations. In other SQL implementations you might need to add an extra query to get the max.
create table new_FULLES as
select * from FULLS
group by name, coeff, week
having day=max(day) or rid ne 0
;
SQL also does not have any concept of writing two datasets at once. But for this example since the two generated datasets are distinct and include all of the original observations you could generate the second from the first using EXCEPT.
So if you could build the new FULLS you could get FULLS_DUPES from the new FULLS and the old FULLS.
create table FULLS_DUPES as
select * from FULLS
except
select * from new_FULLES
;
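For an SQL engine that does not remerge aggregates the way PROC SQL does, the same idea can be sketched with an explicit subquery for the maximum. The example below uses Python's sqlite3 module; the table contents and the DAY ordering column are invented for illustration, as in the answer above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE fulls (name TEXT, coeff INTEGER, week INTEGER, day INTEGER, rid INTEGER)"
)
con.executemany("INSERT INTO fulls VALUES (?, ?, ?, ?, ?)", [
    ("a", 1, 1, 1, 0),  # rid = 0 and not last in its group: a dupe
    ("a", 1, 1, 2, 5),  # rid <> 0: kept
    ("a", 1, 1, 3, 0),  # last in its group: kept
    ("b", 2, 1, 1, 0),  # only row of its group, hence last: kept
])

# Rows to keep: rid <> 0, or the row holding the group's maximum day.
keep_sql = """
    SELECT f.* FROM fulls AS f
    WHERE f.rid <> 0
       OR f.day = (SELECT MAX(g.day) FROM fulls AS g
                    WHERE g.name = f.name AND g.coeff = f.coeff AND g.week = f.week)
"""

new_fulls = con.execute(keep_sql + " ORDER BY name, day").fetchall()

# Everything in the original table that is not kept.
fulls_dupes = con.execute("SELECT * FROM fulls EXCEPT " + keep_sql).fetchall()

print(new_fulls)
print(fulls_dupes)
```

Note that EXCEPT is a set operation, so exact duplicate rows would collapse into one; that is harmless here only because DAY is assumed to make each row unique.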

Values in one Column different but second column the same SQL

I have an initial query, written below, and need to find values in the quote_id column that are different while the corresponding values in the benefit_plan_cd column are the same. The output should look like the example below. I know the prospect_nbr for this issue, which is why I am able to add it to my initial query to get the expected results, but I need to be able to find other cases going forward.
select prospect_nbr, qb.quote_id, quote_type, effective_date,
benefit_plan_cd, package_item_cd
from qo_benefit_data qb
inner join
qo_quote qq on qb.quote_id = qq.quote_id
where quote_type = 'R'
and effective_date >= to_date('06/01/2022','mm/dd/yyyy')
and package_item_cd = 'MED'
The output should look something like this, excluding the other columns.
quote_id benefit_plan_cd
514 1234
513 1234
Let's do this in two steps.
First take your existing query and add the following at the end of your select list:
select ... /* the columns you have already */
, count(distinct qb.quote_id) over (partition by benefit_plan_cd) as ct
That is the only change - don't change anything else. You may want to run this first, to see what it produces. (Looking at a few rows should suffice, you don't need to look at all the rows.)
Then use this as a subquery, to filter on this count being > 1:
select ... /* only the ORIGINAL columns, without the one we added */
from (
/* write the query from above here, as a SUBquery */
)
where ct > 1
;
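The two-step idea can be tried out on a toy table. The sketch below uses Python's sqlite3 module purely so that it is runnable, and it swaps in a different technique: SQLite does not accept DISTINCT inside a window function, so the per-plan count is computed with GROUP BY and HAVING in a subquery and joined back. The sample rows are invented, and only the two relevant columns of qo_benefit_data are modeled.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE qo_benefit_data (quote_id INTEGER, benefit_plan_cd TEXT)")
con.executemany("INSERT INTO qo_benefit_data VALUES (?, ?)", [
    (513, "1234"),
    (514, "1234"),  # same plan as 513 under a different quote: wanted
    (515, "9999"),
    (515, "9999"),  # same plan twice under one quote: not wanted
])

shared = con.execute("""
    SELECT DISTINCT qb.quote_id, qb.benefit_plan_cd
    FROM qo_benefit_data AS qb
    JOIN (SELECT benefit_plan_cd
            FROM qo_benefit_data
           GROUP BY benefit_plan_cd
          HAVING COUNT(DISTINCT quote_id) > 1) AS multi
      ON multi.benefit_plan_cd = qb.benefit_plan_cd
    ORDER BY qb.quote_id DESC
""").fetchall()

print(shared)
```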

How to select multiple values with the same ids and put them in one row, while maintaining the id to value connection?

I have a processknowledgeentry table that has the following data:
pke_id prc_id knw_id
1 1 2
2 1 4
3 2 4
The column knw_id references another table called knowledge, which also has its own id column. I want to be able to select all knw_id values with the same prc_id, and have them retain its nature as an id (so that it remains referenceable to the knowledge table).
Desired result:
prc_id knw_ids
1 [2, 4]
My code is shown below. (It also selects a Process Name from another table called process by inner joining the prc_ids. That part works correctly at least.)
SELECT * FROM (
SELECT
p.prc_name,
(SELECT knw_id
FROM processknowledgeentry
GROUP BY knw_id
HAVING COUNT(*) > 1)
FROM processknowledgeentry pke
INNER JOIN process p
ON pke.prc_id=p.prc_id
WHERE pke.prc_id = %s) as temp
I get the error: "CardinalityViolation: more than one row returned by a subquery used as an expression", and I understand why the error exists, so I want to know how to work around it. I'm also not sure if my logic is correct.
Would appreciate any assistance, thank you!
It seems you need the STRING_AGG() function (the counterpart of GROUP_CONCAT() in some other DBMSs), which takes a string-typed expression as its first argument, along with a HAVING clause that keeps only the prc_id values that occur more than once, such as
SELECT p.prc_id, STRING_AGG(knw_id::TEXT,',') AS knw_ids
FROM processknowledgeentry pke
JOIN process p
ON pke.prc_id = p.prc_id
-- WHERE pke.prc_id = %s
GROUP BY p.prc_id
HAVING COUNT(pke.prc_id) > 1
Indeed, in this case a WHERE clause isn't needed.
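If the caller is Python (the %s placeholder in the question suggests so, but that is an assumption), the aggregated string can be split back into integer ids so that each one can still be used to look up the knowledge table. The sketch below runs against SQLite for convenience, so it uses GROUP_CONCAT() where PostgreSQL would use STRING_AGG(); the data is the three-row example from the question.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE processknowledgeentry (pke_id INTEGER, prc_id INTEGER, knw_id INTEGER)"
)
con.executemany(
    "INSERT INTO processknowledgeentry VALUES (?, ?, ?)",
    [(1, 1, 2), (2, 1, 4), (3, 2, 4)],
)

rows = con.execute("""
    SELECT prc_id, GROUP_CONCAT(knw_id) AS knw_ids
    FROM processknowledgeentry
    GROUP BY prc_id
    HAVING COUNT(*) > 1
""").fetchall()

# Turn "2,4" back into [2, 4]; sorting makes the order deterministic.
knw_ids_by_prc = {
    prc_id: sorted(int(k) for k in ids.split(",")) for prc_id, ids in rows
}

print(knw_ids_by_prc)
```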

Multiple Between Dates from table column

There is yearly data in the source. I need to exclude from it the data listed in another table, whose row count is not fixed.
Source data:
Dates to be excluded:
There can be 2 rows or 5 rows of data to exclude, so the query needs to be dynamic; the two tables can be joined on the DISPLAY_NAME column.
I am trying to do this with a query and don't want to use a stored procedure. Is there a way, or is a stored procedure the only choice?
Maybe I could add a CASE WHEN returning 1 / 0 for each exclusion row and keep a record only if all of those columns are 1, but I don't know how many CASE WHEN expressions I would need, since the number of rows in the exclude table is not fixed.
Are you looking for not exists?
select s.*
from source s
where not exists (select 1
from excluded e
where e.display_name = s.display_name and
s.start_datetime >= e.start_date and
s.end_datetime < e.end_date
);
Note: Your question does not explain how the end_date should be handled. This assumes that the data on that date should be included in the result set. You can tweak the logic to exclude data from that date as well.
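Since the original tables were not shown, here is a small self-contained sketch of the NOT EXISTS approach using Python's sqlite3 module. The column names follow the query above; the rows, and the choice to store timestamps as ISO-8601 text (which compares correctly as strings), are assumptions made for the example. The number of exclusion rows per DISPLAY_NAME can vary freely without changing the query.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE source (display_name TEXT, start_datetime TEXT, end_datetime TEXT);
    CREATE TABLE excluded (display_name TEXT, start_date TEXT, end_date TEXT);

    INSERT INTO source VALUES
        ('A', '2021-01-05 08:00', '2021-01-05 09:00'),
        ('A', '2021-02-10 08:00', '2021-02-10 09:00'),
        ('A', '2021-03-15 08:00', '2021-03-15 09:00'),
        ('B', '2021-01-05 08:00', '2021-01-05 09:00');

    -- Two exclusion windows for A, none for B.
    INSERT INTO excluded VALUES
        ('A', '2021-01-01', '2021-01-31'),
        ('A', '2021-03-01', '2021-03-31');
""")

kept = con.execute("""
    SELECT s.display_name, s.start_datetime
    FROM source AS s
    WHERE NOT EXISTS (SELECT 1
                      FROM excluded AS e
                      WHERE e.display_name = s.display_name
                        AND s.start_datetime >= e.start_date
                        AND s.end_datetime < e.end_date)
    ORDER BY s.display_name, s.start_datetime
""").fetchall()

print(kept)
```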

How to use aggregate function to filter a dataset in ssrs 2008

I have a matrix in ssrs2008 like below:
GroupName Zone CompletedVolume
Cancer 1 7
Tunnel 1 10
Surgery 1 64
The CompletedVolume value comes from an expression <<expr>>, which is equal to: [Max(CVolume)]
This matrix is filled by a stored procedure that I am not supposed to change if possible. What I need is to hide the data whose CompletedVolume is <= 50. I tried going to the tablix properties and adding a filter like [Max(Q9Volume)] >= 50, but when I run the report it says that aggregate functions cannot be used in dataset filters or data region filters. How can I fix this as easily as possible?
Note that adding a WHERE clause to the SQL query would not solve this issue, since many other tables use the same SP and they need the data where CompletedVolume <= 50. Any help would be appreciated.
EDIT: I am trying to get the max(Q9Volume) value in the SP, but something is happening that I have never seen before. The query is like:
Select r.* from (select * from results1 union select * from results2) r
left outer join procedures p on r.pid = p.id
The interesting thing is that when I run the query I see some columns that are not included in the results1/results2 tables or in the procedures table. For example, there is no column like Q9Volume in the tables (results1, results2 and procedures), yet when I run the query I see it in the output! How is that possible?
You can set the row's Hidden property to True when [Max(CVolume)] is less than or equal to 50.
Select the row and go to Row Visibility
Select Show or Hide based on an expression option and use this expression:
=IIF(
Max(Fields!Q9Volume.Value)<=50,
True,False
)
It will show something like this:
Note that the maximum values for Cancer and Tunnel are 7 and 10 respectively, so
they will be hidden if you apply the above expression.
Let me know if this helps.