SAS code Proc rank with groups option to SQL conversion

I am new to SAS and struggling to convert the below piece of code to SQL.
proc rank data = input (where = (limit_assort = 1)) out = assort_rank groups=3;
var assort;
ranks assort_rank;
run;
I tried the following in SQL, but the result does not match the SAS output:
Percentile_Cont(0.3) Within Group (Order by assortment) As assortment_rank
In the SAS output, I can see a column "Rank for variable assort" with the values 0, 1 and 2, based on the groups=3 option, but I am not sure how to reproduce that in SQL.
Thanks in advance!
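For reference, the SAS documentation gives the GROUPS= calculation as FLOOR(rank*k/(n+1)), where rank is the value's rank, k is the number of groups and n is the number of nonmissing observations. PERCENTILE_CONT returns a single cut-off value rather than a group number per row, which may be why the outputs differ. Below is a minimal Python sketch of that formula, assuming tied values receive the mean of their ranks; the function name sas_group_ranks and the sample values are hypothetical.

```python
import math

def sas_group_ranks(values, groups):
    """Group numbers 0..groups-1 via FLOOR(rank * groups / (n + 1)).

    Assumption: ties receive the mean of their ranks; None marks a missing value.
    """
    present = sorted(v for v in values if v is not None)
    n = len(present)
    mean_rank = {}
    i = 0
    while i < n:
        j = i
        while j + 1 < n and present[j + 1] == present[i]:
            j += 1
        # Positions i..j (0-based) hold equal values; average their 1-based ranks.
        mean_rank[present[i]] = (i + 1 + j + 1) / 2
        i = j + 1
    return [
        None if v is None else math.floor(mean_rank[v] * groups / (n + 1))
        for v in values
    ]

print(sas_group_ranks([10, 20, 30, 40, 50, 60], 3))  # [0, 0, 1, 1, 2, 2]
```

The same arithmetic could be expressed in SQL with a ranking window function and a row count, but the exact syntax depends on the database.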

Related

Pass list of dates to SQL WHERE statement in PySpark

I am in the process of converting some SAS code to PySpark. We previously used a macro variable for the where statement in this code. In adapting it to PySpark, I'm trying to pass a list of dates to the where statement, but I keep getting errors. I want the SQL code to pull all data from those 3 months. Any pointers?
month_list = ['202107', '202108', '202109']
sql_query = """ (SELECT *
FROM Table_Blah
WHERE (to_char(DateVariable,'yyyymm') IN '{}')
) as table1""".format(month_list)
Pass the list as a tuple to have the right sql syntax:
month_list = ['202107', '202108', '202109']
sql_query = """ (SELECT *
FROM Table_Blah
WHERE (to_char(DateVariable,'yyyymm') IN {})
) as table1""".format(tuple(month_list))
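You also don't need the quotes around the placeholder in the IN clause, because the formatted tuple already supplies the parentheses and the quoted values.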
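One caveat with formatting a tuple: a one-element tuple renders as ('202107',) with a trailing comma, which is not valid SQL. A minimal sketch that builds the IN list explicitly avoids this; the helper name sql_in_list is hypothetical, and the values are assumed to be trusted strings rather than user input.

```python
def sql_in_list(values):
    """Render values as a SQL IN list such as ('a', 'b')."""
    if not values:
        raise ValueError("an IN list needs at least one value")
    # Double any embedded single quotes so the literal stays well-formed.
    quoted = ", ".join("'{}'".format(str(v).replace("'", "''")) for v in values)
    return "({})".format(quoted)

month_list = ['202107', '202108', '202109']
sql_query = """ (SELECT *
FROM Table_Blah
WHERE (to_char(DateVariable,'yyyymm') IN {})
) as table1""".format(sql_in_list(month_list))
print(sql_query)
```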

In Pig, how do I filter on a date, the way we write where date = '' in SQL?

I am new to Pig scripting but good with SQL. I wanted the pig equivalent for this SQL line :
SELECT * FROM Orders WHERE Date='2008-11-11'.
Basically, I want to load data for one id or one date. How do I do that?
I used FILTER in Pig and it worked, giving the desired results:
ivr_src = LOAD '/raw/prod/...;
info = FOREACH ivr_src GENERATE timeEpochMillisUTC AS time, cSId AS id;
Filter_table = FILTER info BY id == '700000';
sorted_filter_table = ORDER Filter_table BY $1;
STORE sorted_filter_table INTO 'sorted_filter_table1' USING PigStorage('\t', '-schema');

Writing SAS dates to SQL Server database

How do I write SAS dates to a Microsoft SQL Server 2016 Date data type column?
I have a SAS dataset with a SAS date, DataEndDay, and I want to write it into a database. The following is in use (the buffer is just to speed up the testing and failing):
libname valu oledb provider=sqloledb schema="dbo" INSERTBUFF=100
properties=("User ID"="&username." Password="&pw."
"data source" = &database.
"initial catalog"=&catalog.);
proc sql noprint;
insert into valu.Data_upload_from_me
( <some_columns...>,
<more-columns...>
,DataEndDay
)
select
<some_columns_source...>,
<more-columns_source...>
,DataEndDay
from work.SAS_data_to_publish
;quit;
Of course because SAS dates are numbers, direct writing is going to fail. What works is if I hard-code this as:
select
<some_columns_source...>,
<more-columns_source...>
,'2018-12-12'
from work.SAS_data_to_publish
;quit;
But if I convert the SAS date to a string in a SAS data step:
data SAS_data_to_publish ;
set SAS_data_to_publish ;
dataEndday0 = put(DataEndDay, yymmddd10.);
DataEndDay1 = quote(dataEndday0, "'") ;
run;
and try to write either of these, I get a conversion error:
ERROR: ICommand::Execute failed. : Conversion failed when converting date and/or time from character string.
When I select the string it looks pretty ok:
proc sql; select DataEndDay1 from SAS_data_to_publish; quit;
'2018-12-12'
Previously I have managed to write datetimes with a similar trick, which works:
proc format;
picture sjm
. = .
other='%Y-%0m-%0d %0H:%0M:%0S:000' (datatype=datetime)
;run;
data to_be_written;
set save.raw_data_to_be_written;
DataEndDay0 = put(dhms(DataEndDay,0,0,0), sjm. -L);
run;
Has anyone run into similar issues? How could I write the dates?
I could ask them to change the column to dateTime, maybe...
Thank you in advance.
Edit:
I managed to develop a workaround, which works but is ugly and, frankly, I don't like it. It so happens that my date is the same for all rows, so I can assign it to a macro variable and then use it when writing to the database.
data _NULL_;
set SAS_data_to_publish;
call symput('foobar', quote( put (DataEndDay , yymmddd10. -L), "'") ) ;
run;
....
select
<some_columns_source...>,
<more-columns_source...>
,&foobar.
from work.SAS_data_to_publish
;quit;
Of course this would fail immediately should DataEndDay vary, but maybe it demonstrates that something is off in PROC SQL's select clause...
Edit Edit Pasted the question to SAS forums
I finally managed to crack the issue. The problem was the missing values: because I am passing the values to the database as strings, the parser interpreted missing values as literal dots instead of empty strings. The following works:
data upload;
set upload;
CreatedReportdate2 = PUT(CreatedReportdate , yymmddn8.);
run;
libname uplad_db odbc noprompt =
"DRIVER=SQL Server; server=&server.; Uid=&user.;Pwd=&pw.; DATABASE=&db.;"
INSERTBUFF=32767;
proc sql;
insert into uplad_db.upload_table
(.... )
select
case when CreatedReportdate2 ='.' then '' else CreatedReportdate2 end,
...
from upload;
quit;
SAS does not really properly support the SQL server DATE data type. I imagine this is due to the fact that it's newer, but for whatever reason you have to pass the data as strings.
For missing values, it's important to have a blank string, not a . character. The easiest workaround here is to set:
options missing=' ';
That will allow you to insert data properly. You can then return it to . if you wish. In a production application that might be used by others, I'd consider storing aside the option value temporarily then resetting to that, in order to do no harm.
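To make the conversion concrete: a SAS date is a count of days since 1 January 1960, so the string to send is that epoch plus the day count, and a missing value should become a blank rather than a dot. Below is a minimal Python sketch of that mapping, using None to stand in for a SAS missing value; the function name and the use of None are assumptions for illustration.

```python
from datetime import date, timedelta

SAS_EPOCH = date(1960, 1, 1)  # SAS date value 0

def sas_date_to_sql_string(sas_date):
    """Convert a SAS date number to 'YYYY-MM-DD'; missing becomes ''."""
    if sas_date is None:
        # A blank string, not '.', so the database does not try to parse a dot.
        return ""
    return (SAS_EPOCH + timedelta(days=int(sas_date))).isoformat()

print(sas_date_to_sql_string(21530))  # 2018-12-12
print(repr(sas_date_to_sql_string(None)))  # ''
```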
Normally I just use PROC APPEND to insert observations into a remote database.
proc append base=valu.Data_upload_from_me force
data=work.SAS_data_to_publish
;
run;
Make sure the date variables in your SAS dataset use the same data type as the corresponding columns in your target database table. So if your MS SQL database uses TIMESTAMP fields for date values then make sure your SAS dataset uses DATETIME values.
If you want to use constants then make sure to use SAS syntax in your SAS code and MS SQL syntax in any pass through code.
data test;
date = '01JAN2017'd ;
datetime = '01JAN2017:00:00'dt ;
run;
proc sql ;
connect to oledb .... ;
execute ( ... date = '2017-01-01' .... datetime='2017-01-01 00:00' ...)
by oledb;
quit;

Sql equivalent in SAS

I have code such as the one below in SQL (there are many more "and" and "not in" conditions, but I just wanted to list a few). I am new to SAS and know a bit of PROC SQL; I am learning and exploring every day.
Select * from table
Where date='20180112'
and type='apple' and location='dc' and not
(columnName) in ('a','b') And lat='ten'
I am not able to understand the SAS equivalent of the above SQL, shown below. Can someone please explain the "if" part and the "then do" in this SAS code?
Data sample;
Set sourcetble;
If date='20180112' and type='apple'
And location='dc' then do;
Blah1='rain';
Blah2='something else';
If columnName in('a', 'b') and lat='ten' Then do;
This just subsets the data based on the values and variables in the WHERE statement.
Data sample;
set table;
WHERE date='20180112' and type='apple' and location='dc'
and columnName not in ('a', 'b') and lat='ten';
<other optional code>;
run;
Unlike an SQL query, a SAS data step always creates a new dataset. If you don't need a new dataset, you can use "data _null_;". Alternatively, there are SAS procedures that simply display a dataset, as an SQL "select" would.
The "set" in SAS is equivalent to the "from" in SQL: it specifies the base dataset(s) from which you build the new dataset.
By default, SAS data step keeps all variables of the "set" datasets. It is equivalent to "select *" in SQL. If you need only some variables, you can use "keep" and "drop" statements in SAS.
The "where" clause and "and"/"or" operators work similarly in SAS and SQL, but with slightly different syntax.
The if … then in the DATA step has no counterpart in the SQL shown in the question. A conditional assignment in SQL is done using a case expression.
So a DATA step statement such as
data want;
set have;
…
if date="20180112" and type="apple" and location="dc" then do;
Blah1="rain";
Blah2="something else";
end;
would correspond to this SQL:
Proc SQL;
create table want as
select …
, case when date="20180112" and type="apple" and location="dc"
then "rain"
else ""
end as Blah1
, case when date="20180112" and type="apple" and location="dc"
then "something else"
else ""
end as Blah2
from
have
…
;
When an algorithm needs to assign several variables at once whenever some criterion (if logic) is met:
The DATA step has do; … end; syntax, which can hold several assignment statements.
An SQL select statement can assign only one variable per logic evaluation (case expression), so the logic has to be repeated for each variable being assigned.
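The contrast can be sketched outside SAS as well. In the Python illustration below, one function mirrors the DATA step style (a single test guarding a block of assignments) and the other mirrors the SQL style (the test repeated once per output column). The function names and the sample rows are hypothetical.

```python
def matches(row):
    return (row["date"] == "20180112" and row["type"] == "apple"
            and row["location"] == "dc")

def data_step_style(row):
    # One test guards a block with several assignments (if ... then do; ... end;).
    out = dict(row, Blah1="", Blah2="")
    if matches(row):
        out["Blah1"] = "rain"
        out["Blah2"] = "something else"
    return out

def sql_case_style(row):
    # Each output column carries its own copy of the test (one CASE per column).
    return dict(
        row,
        Blah1="rain" if matches(row) else "",
        Blah2="something else" if matches(row) else "",
    )

have = [
    {"date": "20180112", "type": "apple", "location": "dc"},
    {"date": "20180113", "type": "pear", "location": "ny"},
]
# Both styles produce the same rows; only the shape of the logic differs.
print([data_step_style(r) for r in have] == [sql_case_style(r) for r in have])
```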

How to convert Sql inner query to Linq for sum of column

How can I convert this SQL query to LINQ?
select sum(OutstandingAmt) from IvfReceiptDetails where IvfReceiptId IN (select IvfReceiptId from IvfReceipts where PatientId = 'SI-49650')
I think it is easier to translate SQL using query comprehension syntax instead of lambda syntax.
General rules:
Translate inner queries into separate query variables
Translate SQL phrases in LINQ phrase order
Use table aliases as range variables, or if none, create range variables from table names
Translate IN to Contains
Translate SQL functions such as DISTINCT or SUM into function calls on the entire query.
Here is the code:
var IvfReceiptIds = from IvfReceipt in IvfReceipts
where IvfReceipt.PatientId == "SI-49650"
select IvfReceipt.IvfReceiptId;
var OutstandingAmtSum = (from IvfReceiptDetail in IvfReceiptDetails
where IvfReceiptIds.Contains(IvfReceiptDetail.IvfReceiptId)
select IvfReceiptDetail.OutstandingAmt).Sum();
Try this. First get all IvfReceiptId values into an array based on the inner query, then check them with Contains in the where condition. Change the name of _context if yours is different.
// Collect the receipt IDs for the patient first (the inner query).
var arrIvfReceiptId = _context.IvfReceipts.Where(p => p.PatientId == "SI-49650").Select(p => p.IvfReceiptId).ToArray();
// Then sum the outstanding amounts of the matching detail rows.
var sum = (from ird in _context.IvfReceiptDetails.Where(p => arrIvfReceiptId.Contains(p.IvfReceiptId))
select ird.OutstandingAmt).Sum();
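The two-step shape of both answers (collect the matching IDs, then sum over the rows whose ID is in that set) can be sketched language-neutrally. The Python version below uses plain dictionaries for rows; the function name and the sample data are made up for illustration.

```python
def outstanding_total(receipts, receipt_details, patient_id):
    # Inner query: the receipt IDs that belong to the patient.
    receipt_ids = {r["IvfReceiptId"] for r in receipts if r["PatientId"] == patient_id}
    # Outer query: sum the amounts on detail rows whose receipt ID is in that set.
    return sum(d["OutstandingAmt"] for d in receipt_details
               if d["IvfReceiptId"] in receipt_ids)

# Hypothetical sample rows.
receipts = [
    {"IvfReceiptId": 1, "PatientId": "SI-49650"},
    {"IvfReceiptId": 2, "PatientId": "SI-00001"},
    {"IvfReceiptId": 3, "PatientId": "SI-49650"},
]
receipt_details = [
    {"IvfReceiptId": 1, "OutstandingAmt": 100},
    {"IvfReceiptId": 2, "OutstandingAmt": 999},
    {"IvfReceiptId": 3, "OutstandingAmt": 50},
]
print(outstanding_total(receipts, receipt_details, "SI-49650"))  # 150
```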