Write SQL from SAS - sql

I have this code in SAS, I'm trying to write SQL equivalent. I have no experience in SAS.
data Fulls Fulls_Dupes;
set Fulls;
by name, coeff, week;
if rid = 0 and ^last.week then output Fulls_Dupes;
else output Fulls;
run;
I tried the following, but didn't produce the same output:
Select * from Fulls where rid = 0 groupby name,coeff,week
is my sql query correct ?

SQL does not have a concept of observation order. So there is no direct equivalent of the LAST. concept. If you have some variable that is monotonically increasing within the groups defined by distinct values of name, coeff, and week then you could select the observation that has the maximum value of that variable to find the observation that is the LAST.
So for example if you also had a variable named DAY that uniquely identified and ordered the observations in the same way as they exist in the FULLES dataset now then you could use the test DAY=MAX(DAY) to find the last observation. In PROC SQL you can use that test directly because SAS will automatically remerge the aggregate value back onto all of the detailed observations. In other SQL implementations you might need to add an extra query to get the max.
create table new_FULLES as
select * from FULLES
group by name, coeff, week
having day=max(day) or rid ne 0
;
SQL also does not have any concept of writing two datasets at once. But for this example since the two generated datasets are distinct and include all of the original observations you could generate the second from the first using EXCEPT.
So if you could build the new FULLS you could get FULLS_DUPES from the new FULLS and the old FULLS.
create table FULLS_DUPES as
select * from FULLES
except
select * from new_FULLES
;

Related

Make a summary of a Calculated variable with Group By Proc SQL on SAS

I want to make a summary of the MT_REMBOURSE_RN variable calculated in the Proc SQL statement and output that sum for each Group declared in the Group by statement into the MT_REMBOURSE_RN_Tot.
This the code made :
proc sql undo_policy=none;
create table tab_actif3 (compress=Yes) as
select a.*,
(-sum(a.vb_cc_inclus)) as VM,
intrr(1,calculated VM, CF_1,CF_2
) as X_Spread,
(sum(1, calculated X_Spread))**(-Nb_An_Int) as SPREAD_DF,
INT_NETS_RECUS_PAYES* calculated SPREAD_DF as INT_NETS_RECUS_PAYES_RN,
MT_REMBOURSE* calculated SPREAD_DF as MT_REMBOURSE_RN,
sum( calculated MT_REMBOURSE_RN) as MT_REMBOURSE_RN_Tot
from tab_actif2 as a
Left join tab_actif_CF2 (rename=(NoSetScenario=NoSetScenario1 Institution=Institution1
Classe_Actif_s=Classe_Actif_s1 Canton=Canton1 Code_Isin_S=Code_Isin_S1)) as B
On A.NoSetScenario = B.NoSetScenario1 and
A.institution = B.Institution1 and
A.Classe_Actif_s = B.Classe_Actif_s1 and
A.Canton = B.Canton1 and
A.Code_Isin_S = B.Code_Isin_S1
Group by NoSetScenario,
institution,
Classe_Actif_s ,
Canton,
Code_Isin_S,
Nom_Choc;
quit;
But I have got that error :
ERROR: Summary functions nested in this way are not supported.
Is it possible to achieve that goal in the same Proc SQL without making another proc SQL statement calling tab_actif3 ?
Problem understood!!
This is a specificity of SAS SQL programming. You can not make a sum() linked to a Group by statement in a proc SQL after a regular sum function which makes an addition of variables for each observation in the table. This creates processing confusion at runtime.
For instance in my code:
(-sum(a.vb_cc_inclus)) as VM related to the Group By, which is done before the normal sum statement (sum(1, calculated X_Spread))**(-Nb_An_Int) doesn't bring any problem when the code is running without the second sum sum( calculated MT_REMBOURSE_RN) related to the Group by as well.
Thus, the second Group by related sum, used after the normal sum, ought to be made in another create table.

How can I convert this SAS datastep to oracle sql for a conditional counter column?

I apologize for the generalness of the question. I'm trying to create a column that groups rows together based on the time between the current and previous observation. The code below is code that I' wrote that works correctly in SAS. However because of the way that a data step runs vs how oracle sql runs I can't figure out how to do this in oracle sql. Any help would be greatly appreciated!
DATA GROUP;
SET LAG1;
BY CUSTOMER_KEY;
IF (TIME_BTW>5 OR TIME_BTW=.) THEN JOURNEY=0;
JOURNEY+1;
IF FIRST.CUSTOMER_KEY THEN GROUP=0;
IF JOURNEY=1 THEN GROUP+1;
RUN;
It looks like you are defining groups based on time_btw. You seem to want an analytic function. I think the code is like this:
select t.*,
sum(case when time_btw > 5 then 1 else 0 end) over (partition by customer_key order by ??) as grp
from t;
Note that in SQL (unlike SAS), tables represent unordered sets. This means that you need a column that specifies the ordering.

SQL to powerBI expression?

How to write this expression in PowerBI
select distinct([date]),Temperature from Device47A8F where Temperature>25
Totally new to PowerBI. Is there any tool that can change the query from sql to PowerBI expression?
I have tried so many type of different type of expressions but getting error, Most of the time I am getting this:
The expression refers to multiple columns. Multiple columns cannot be converted to a scalar value.
Need help, Thanks.
After I posted my answer, wondered if your expected result is get only one date by temperature, In other words, without repeated dates in your result set.
A side note: select distinct([date]),Temperature from Device47A8F where Temperature>25 returns repeated dates since DISTINCT keyword evaluate distinct columns values specified in the SELECT statement, it doesn't return distinct values in a specific column even if you surround it with parenthesis.
Now what brings us here. What I can see in your error is that you are trying to use a table-valued (produces a table with multiple columns) expression in a measure which only accepts scalar-valued (calculate only one value).
Supposing you have a table like this:
Running your SQL query you will get the highlighted in yellow rows:
You can see 01/09/2016 date is repeated. If you want to create a measure you have to define what calculation you want to show for temperature. i.e, average, max or min etc.
In the below expression is being calculated the maximum temperature greater than 25 per date:
MaxTempGreaterThan25 =
CALCULATE ( MAX ( Device47A8F[Temperature] ), Device47A8F[Temperature] > 25 )
In this case the measure MaxTempGreaterThan25 is calculated per date.
If you don't want to produce a measure but a table. In the Power BI Tool bar select Modeling tab and click New Table icon.
Use this expression:
MyTemperatureTable =
FILTER ( Device47A8F, Device47A8F[Temperature] > 25 )
It should produce a new table named MyTemperatureTable like this:
I recommend you learn some basics about DAX, it is pretty different from SQL / T-SQL and there are things you can't do depending on your model and data.
Let me know if this helps.
You probably don't need to write any code if your objective is to show the result in a Power BI visual e.g. a table. Power BI naturally aggregates data if the datatype is numeric (e.g. Temperature).
I would just add a Table visual on a Report page and add the Date and Temperature columns to it. Then in Visualizations / Fields / Values I would click the little down-arrow on the Temperature field and set the Aggregation e.g. Maximum. Then in Visualizations / Fields / Filters I would click the little down-arrow on the Temperature field and set the Filter e.g. is greater than: 25
Hard-coded solutions are unlikely to survive the next question from your users e.g. "but what if I want to see Temperature > 24? Or 20? Or 30?"

SAS: how to properly use intck() in proc sql

I have the following codes in SAS:
proc sql;
create table play2
as select a.anndats,a.amaskcd,count(b.amaskcd) as experience
from test1 as a, test1 as b
where a.amaskcd = b.amaskcd and intck('day', b.anndats, a.anndats)>0
group by a.amaskcd, a.ANNDATS;
quit;
The data test1 has 32 distinct obs, while this play2 only returns 22 obs. All I want to do is for each obs, count the number of appearance for the same amaskcd in history. What is the best way to solve this? Thanks.
The reason this would return 22 observations - which might not actually be 22 distinct from the 32 - is that this is a comma join, which in this case ends up being basically an inner join. For any given row a if there are no rows b which have a later anndats with the same amaskcd, then that a will not be returned.
What you want to do here is a left join, which returns all rows from a once.
create table play2
as select ...
from test1 a
left join test1 b
on a.amaskcd=b.amaskcd
where intck(...)>0
group by ...
;
I would actually write this differently, as I'm not sure the above will do exactly what you want.
create table play2
as select a.anndats, a.amaskcd,
(select count(1) from test1 b
where b.amaskcd=a.amaskcd
and b.anndats>a.anndats /* intck('day') is pointless, dates are stored as integer days */
) as experience
from test1 a
;
If your test1 isn't already grouped by amaskcd and anndats, you may need to rework this some. This kind of subquery is easier to write and more accurately reflects what you're trying to do, I suspect.
If both the anndats variables in each dataset are date type (not date time) then you can simple do an equals. Date variables in SAS are simply integers where 1 represents one day. You would not need to use the intck function to tell the days differnce, just use subtraction.
The second thing I noticed is your code looks for > 0 days returned. The intck function can return a negative value if the second value is less than the first.
I am still not sure I understand what your looking to produce in the query. It's joining two datasets using the amaskcd field as the key. Your then filtering based on anndats, only selecting records where b anndats value is less than a anndats or b.anndats < a.anndats.

SQL - How to insert a subquery into a query in order to retrieve unique values

I am writing reports using Report Builder 3, and I need some help with an sql query to get unique values.
Using the following sample data:
I need to be able to get one single value for feeBudRec returned for each feeRef. The value of each feeBudRec is always the same for each individual feeRef (eg for every data row for feeRef LR01 will have a feeBudRec of 1177).
The reason why I need to get a single feeBudRec value for each feeRef is that I need to be able to total the feeBudRec value for each feeRef in a feePin (eg for feePin LEE, I need to total the feeBudRec values for LR01 and PS01, which should be 1177 + 1957 to get a total of 3134; but if I don't have unique values for feeBudRec, it will add the values for each row, which would bring back a total of 11756 for the 8 LEE rows).
My experience with writing SQL queries is very limited, but from searching the internet, it looks like I'll need to put in a subquery into my SQL query in order to get a single unique feeBudRec figure for each feeRef, and that a subquery that gets a minimum feeBudRec value for each feeRef should work for me.
Based on examples I've found, I think the following subquery should work:
SELECT a.feeRef, a.feeBudRec
FROM (
SELECT uvw_EarnerInfo.feeRef, Min(uvw_EarnerInfo.feeBudRec) as AvailableTime
FROM uvw_EarnerInfo
GROUP BY
uvw_EarnerInfo.feeRef
) as x INNER JOIN uvw_EarnerInfo as a ON a.feeRef = x.feeRef AND a.feeBudRec = x.AvailableTime;
The problem is that I have no idea how to insert that subquery into the query I'm using to produce the report (as follows):
SELECT
uvw_EarnerInfo.feeRef
,uvw_EarnerInfo.PersonName
,uvw_EarnerInfo.PersonSurname
,uvw_EarnerInfo.feePin
,uvw_RB_TimeLedger.TimeDate
,uvw_RB_TimeLedger.matRef
,uvw_RB_TimeLedger.TimeTypeCode
,uvw_RB_TimeLedger.TimeCharge
,uvw_RB_TimeLedger.TimeElapsed
,uvw_WoffTimeByTime.WoffMins
,uvw_WoffTimeByTime.WoffCharge
,uvw_EarnerInfo.feeBudRec
,uvw_EarnerInfo.personOccupation
FROM
uvw_RB_TimeLedger
LEFT OUTER JOIN uvw_WoffTimeByTime
ON uvw_RB_TimeLedger.TimeId = uvw_WoffTimeByTime.TimeId
RIGHT OUTER JOIN uvw_EarnerInfo
ON uvw_EarnerInfo.feeRef = uvw_RB_TimeLedger.feeRef
WHERE
uvw_RB_TimeLedger.TimeDate >= #TimeDate
AND uvw_RB_TimeLedger.TimeDate <= #TimeDate2
If that subquery will get the correct results, can anyone please help me with inserting it into my report query. Otherwise, can anyone let me know what I will need to do to get a unique feeBudRec value for each feeRef?
Depends on the exact schema, but assuming the uvw_EarnerInfo lists the Pin, Ref, and Rec without duplicates, try adding an extra column (after personOccupation) on the end of your query such as :
feeBudRecSum = (Select SUM(FeeBudRec) From uvw_EarnerInfo x
where x.feePin = uvw_EarnerInfo.feePin
Group By x.FeePin)
Note that you would not Sum these values in your report. This column should have the total you are looking for.
The key to Report Builder is to get your query correct from the offset and let the wizard then structure your report for you. It takes all the hard work out of structuring your report manually.
I haven't used Report Builder for a while now but in the query builder of the report displaying the graphical representation of your query you should be able to drag and drop columns in and out of the query set. Dragging a column upwards and out of the box (showing your columns) would have the effect of causing your report to break on this column.
If you restructure this way you will probably have to run the report generator again to regenerate the report and restructure it.
Once you are happy with the structure you can then begin to add the summary columns.