How to use limit observations in SAS when union two tables - sql

I'm using sas and I want to limit number of output rows for each table after order the data source, can anyone tell me how to achieve that in SAS? I know in mysql I can just use limit to do the work, but in SAS if I use (obs=10) or (outobs =10), it just limit the number of data input. Here is my proc sql
select distinct sales as a from lucas
group by province
outer union
select distinct sales as b from lucas
group by province
order by a desc, b asc;

Normally you would just use OBS= option when reading the data.
data top10;
set have (obs=10);
by size descending;
run;
If you don't already have a dataset sorted in that order and you want to avoid writing the full dataset out you could use a VIEW to do the generation and/or ordering for you.
proc sql ;
create view derived_sales as
select id,sum(sales) as total_sales
from have
group by id
order by calculated total_sales desc
;
quit;
data top10_sales;
set derived_sales(obs=10);
run;

Proc SQL does not implement current modern clauses such a LIMIT, OFFSET, FETCH, nor does it have partitioning functions you may be familiar with.
That said, you can not row limit the output of a sorted sub select or view, however, you can limit the output to a table using the OUTOBS option.
This sample creates two tables, each corresponding to a sub-select limiting 10 rows of a sorted result set. The option is reset prior to unioning them.
proc sql;
reset outobs=10;
create table have_ss1 as
select distinct msrp as msrp_1
from sashelp.cars
group by model
;
create table have_ss2 as
select distinct msrp as msrp_2
from sashelp.cars
group by model
;
reset outobs=&sysmaxlong;
create table want as
select * from have_ss1
outer union
select * from have_ss2
;
The SAS log window will show informative warnings, such as:
WARNING: A GROUP BY clause has been transformed into an ORDER BY clause because neither the
SELECT clause nor the optional HAVING clause of the associated table-expression
referenced a summary function.
WARNING: The query as specified involves ordering by an item that doesn't appear in its SELECT
clause. Since you are ordering the output of a SELECT DISTINCT it may appear that some
duplicates have not been eliminated.
WARNING: Statement terminated early due to OUTOBS=10 option.

I would do it like that as this limits the dataset/table created in the proc sql and not the input from the lucas dataset/table :
proc sql outobs=10;
select distinct sales as a from lucas
group by province
outer union
select distinct sales as b from lucas
group by province
order by a desc, b asc;
quit;
this will only limit the output and not the input!

Related

Oracle SQL - creating "unstored" procedure

I have this SQL query
SELECT ACCBAL_DATE, ACCBAL_AMOUNT
FROM ACCOUNT_BALANCES t
WHERE ACC_KEY = '964570223'
AND ACCBAL_KEY = '16'
ORDER BY ACCBAL_DATE DESC
FETCH FIRST 1 ROWS ONLY;
It returns one row but I need to use this query for many ACC_KEYS (about 600).
So first way to do that is to run this query about 600x with different ACC_KEY parameter.
The second one is creating a procedure I think.
Procedure which will use variable acc_key and move it to WHERE statement.
Issue is that I can't create procedure stored on server because of permissions.
Is there some way to solve it without storing procedure on server?
EDIT: I know the IN clause but that is not what I need. I need something which will run the query about 600x, each execution with another ACC_KEY in WHERE clause and the output should be 600 rows.
when I used them in clause IN, then it will still return only one row. I want to return only one row because without limitations it returns about 100 rows, so I want only the first row which has needed data. For each ACC_KEY it should return only one row
You can still do that with an IN() clause listing all 600 key values:
select acc_key,
max(accbal_date) as accbal_date,
max(accbal_amount) keep (dense_rank last order by accbal_date) as accbal_amount
from account_balances t
where acc_key in ('964570223', '964570224', ...) -- up to 1000 allowed
and accbal_key = '16'
group by acc_key
order by acc_key;
This is using aggregate functions and grouping by the key, so you will get one row per key, with the data for the most recent date.
Read more about keep/last.
It would still be better to use a collection or a table - maybe an external table loaded from your Excel sheet, saved as a CSV; not least because you can only supply 1000 entries to a single IN() clause - or any expression list - but also for performance and readability/maintenance reasons.
You can store the keys in a table or use a derived table in the query. I would recommend something more like this:
WITH keys as (
SELECT '964570223' as ACC_KEY FROM DUAL UNION ALL
. . .
)
SELECT k.ACC_KEY, MAX(ab.ACCBAL_DATE) as ACCBAL_DATE,
MAX(ab.ACCBAL_AMOUNT) KEEP (DENSE_RANK FIRST ORDER BY ab.ACCBAL_DATE DESC) as ACCBAL_AMOUNT
FROM keys k LEFT JOIN
ACCOUNT_BALANCES ab
ON ab.ACC_KEY = k.ACC_KEY AND
ab.ACCBAL_KEY = '16'
GROUP BY k.ACC_KEY;
Of course the CTE keys could be replaced with a table that has the accounts of interest.
Note that this replaces your logic with aggregation logic. You just want the most recent date and balance, which Oracle supports using the KEEP keyword.
Step-1 : CREATE TABLE WITH 1 COLUMN ACC_KEY STORES ALL LIST OF ACC_KEY.
Step-2 : Code Run.
SELECT T.ACCBAL_DATE, T.ACCBAL_AMOUNT
FROM ACCOUNT_BALANCES t
WHERE EXISTS(SELECT A.ACC_KEY FROM <TABLENAME> A WHERE A.ACC_KEY=T.ACC_KEY)
AND T.ACCBAL_KEY = '16'
ORDER BY T.ACCBAL_DATE DESC
FETCH FIRST 1 ROWS ONLY;

how to find maximum of sum of number using if else in procedure in sap hana sql

I want to list out the product which has highest sales amount on date wise.
note: highest sales amount in the sense max(sum(sales_amnt)...
by using if or case In the procedure in sap hana SQL....
I did this by using with the clause :
/--------------------------CORRECT ONE ----------------------------------------------/
WITH ranked AS
(
SELECT Dense_RAnk() OVER (ORDER BY SUM("SALES_AMNT"), "SALES_DATE", "PROD_NAME") as rank,
SUM("SALES_AMNT") AS Amount, "PROD_NAME",count(*), "SALES_DATE" FROM "KABIL"."DATE"
GROUP BY "SALES_DATE", "PROD_NAME"
)
SELECT "SALES_DATE", "PROD_NAME",Amount
FROM ranked
WHERE rank IN ( select MAX(rank) from ranked group by "SALES_DATE")
ORDER BY "SALES_DATE" DESC;
this is my table
You can not use IF along with SELECT statement. Note that, you can achieve most of boolean logics with CASE statement syntax
In select, you are applying it over a column and your logic will be executed as many as times the count of result set rows. Hence , righting an imperative logic is not well appreciated. Still, if you want to do the same, create a calculation view and use intermediate calculated columns to achieve what you are expecting .
try this... i got an answer ...
select "SALES_DATE","PROD_NAME",sum("SALES_AMNT")
from "KABIL"."DATE"
group by "SALES_DATE","PROD_NAME"
having (SUM("SALES_AMNT"),"SALES_DATE") IN (select
MAX(SUM_SALES),"SALES_DATE"
from (select SUM("SALES_AMNT")
as
SUM_SALES,"SALES_DATE","PROD_NAME"
from "KABIL"."DATE"
group by "SALES_DATE","PROD_NAME"
)
group by "SALES_DATE");

SAS SQL where and having

select count(*) from tableA having product="abc";
select count(*) from tableA where product="abc";
why the outpur are different from the above statements as both are same?
Is it possible?
WHERE filters the records that go into the calculations. HAVING filters the result rows that are returned.
If you run your first query then SAS will warn you that it is remerging the results with the original data since you are referencing a non summary statistic variable in your HAVING clause. Note that if no original records meet your HAVING clause then you get no observations in your result set. But if ANY records meet your query then you get a separate observation for each observation that meets your HAVING clause, but count is for all observations since none were filtered.
Try this query.
proc sql ;
select 'HAVING',count(*) from sashelp.class having name like 'A%'
union all
select 'WHERE',count(*) from sashelp.class where name like 'A%'
;
quit;
Then change A% to Z% and run it again.

Using distinct on a column and doing order by on another column gives an error

I have a table:
abc_test with columns n_num, k_str.
This query doesnt work:
select distinct(n_num) from abc_test order by(k_str)
But this one works:
select n_num from abc_test order by(k_str)
How do DISTINCT and ORDER BY keywords work internally that output of both the queries is changed?
As far as i understood from your question .
distinct :- means select a distinct(all selected values should be unique).
order By :- simply means to order the selected rows as per your requirement .
The problem in your first query is
For example :
I have a table
ID name
01 a
02 b
03 c
04 d
04 a
now the query select distinct(ID) from table order by (name) is confused which record it should take for ID - 04 (since two values are there,d and a in Name column). So the problem for the DB engine is here when you say
order by (name).........
You might think about using group by instead:
select n_num
from abc_test
group by n_num
order by min(k_str)
The first query is impossible.
Lets explain this by example. we have this test:
n_num k_str
2 a
2 c
1 b
select distinct (n_num) from abc_test is
2
1
Select n_num from abc_test order by k_str is
2
1
2
What do you want to return
select distinct (n_num) from abc_test order by k_str?
it should return only 1 and 2, but how to order them?
How do extended sort key columns
The logical order of operations in SQL for your first query, is (simplified):
FROM abc_test
SELECT n_num, k_str i.e. add a so called extended sort key column
ORDER BY k_str DESC
SELECT n_num i.e. remove the extended sort key column again from the result.
Thanks to the SQL standard extended sort key column feature, it is possible to order by something that is not in the SELECT clause, because it is being temporarily added to it behind the scenes prior to ordering, and then removed again after ordering.
So, why doesn't this work with DISTINCT?
If we add the DISTINCT operation, it would need to be added between SELECT and ORDER BY:
FROM abc_test
SELECT n_num, k_str i.e. add a so called extended sort key column
DISTINCT
ORDER BY k_str DESC
SELECT n_num i.e. remove the extended sort key column again from the result.
But now, with the extended sort key column k_str, the semantics of the DISTINCT operation has been changed, so the result will no longer be the same. This is not what we want, so both the SQL standard, and all reasonable databases forbid this usage.
Workarounds
PostgreSQL has the DISTINCT ON syntax, which can be used here for precisely this job:
SELECT DISTINCT ON (k_str) n_num
FROM abc_test
ORDER BY k_str DESC
It can be emulated with standard syntax as follows, if you're not using PostgreSQL
SELECT n_num
FROM (
SELECT n_num, MIN(k_str) AS k_str
FROM abc_test
GROUP BY n_num
) t
ORDER BY k_str
Or, just simply (in this case)
SELECT n_num, MIN(k_str) AS k_str
FROM abc_test
GROUP BY n_num
ORDER BY k_str
I have blogged about SQL DISTINCT and ORDER BY more in detail here.
You are selecting the collection distinct(n_num) from the resultset from your query. So there is no actual relation with the column k_str anymore. A n_num can be from two rows each having a different value for k_str. So you can't order the collection distinct(n_num) by k_str.
According to SQL Standards, a SELECT clause may refer either to as clauses ("aliases") in the top level SELECT clause or columns of the resultset by ordinal position, and therefore nether of your queries would be compliant.
It seems Oracle, in common with other SQL implemetations, allows you to refer to columns that existed (logically) immediately prior to being projected away in the SELECT clause. I'm not sure whether such flexibility is such a good thing: IMO it is good practice to expose the sort order to the calling application by including the column/expressions etc in the SELECT clause.
As ever, you need to apply dsicpline to get meaningful results. For your first query, the definition of order is potentially entirely arbitrary.You should be grateful for the error ;)
This approach is available in SQL server 2000, you can select distinct values from a table and order by different column which is not included in Distinct.
But in SQL 2012 this will through you an error
"ORDER BY items must appear in the select list if SELECT DISTINCT is specified."
So, still if you want to use the same feature as of SQL 2000 you can use the column number for ordering(its not recommended in best practice).
select distinct(n_num) from abc_test order by 1
This will order the first column after fetching the result. If you want the ordering should be done based on different column other than distinct then you have to add that column also in select statement and use column number to order by.
select distinct(n_num), k_str from abc_test order by 2
When I got same error, I got it resolved by changing it as
SELECT n_num
FROM(
SELECT DISTINCT(n_num) AS n_num, k_str
FROM abc_test
) as tbl
ORDER BY tbl.k_str
My query doesn't match yours exactly, but it's pretty close.
select distinct a.character_01 , (select top 1 b.sort_order from LookupData b where a.character_01 = b.character_01 )
from LookupData a
where
Dataset_Name = 'Sample' and status = 200
order by 2, 1
did you try this?
SELECT DISTINCT n_num as iResult
FROM abc_test
ORDER BY iResult
you can do
select distinct top 10000 (n_num) --assuming you won't have more than 10,000 rows
from abc_test order by(k_str)

Are the results deterministic, if I partition SQL SELECT query without ORDER BY?

I have SQL SELECT query which returns a lot of rows, and I have to split it into several partitions. Ie, set max results to 10000 and iterate the rows calling the query select time with increasing first result (0, 10000, 20000). All the queries are done in same transaction, and data that my queries are fetching is not changing during the process (other data in those tables can change, though).
Is it ok to use just plain select:
select a from b where...
Or do I have to use order by with the select:
select a from b where ... order by c
In order to be sure that I will get all the rows? In other word, is it guaranteed that query without order by will always return the rows in the same order?
Adding order by to the query drops performance of the query dramatically.
I'm using Oracle, if that matters.
EDIT: Unfortunately I cannot take advantage of scrollable cursor.
Order is definitely not guaranteed without an order by clause, but whether or not your results will be deterministic (aside from the order) would depend on the where clause. For example, if you have a unique ID column and your where clause included a different filter range each time you access it, then you would have non-ordered deterministic results, i.e.:
select a from b where ID between 1 and 100
select a from b where ID between 101 and 200
select a from b where ID between 201 and 300
would all return distinct result sets, but order would not be any way guaranteed.
No, without order by it is not guaranteed that query will ALWAYS return the rows in the same order.
No guarantees unless you have an order by on the outermost query.
Bad SQL Server example, but same rules apply. Not guaranteed order even with inner query
SELECT
*
FROM
(
SELECT
*
FROM
Mytable
ORDER BY SomeCol
) foo
Use Limit
So you would do:
SELECT * FROM table ORDER BY id LIMIT 0,100
SELECT * FROM table ORDER BY id LIMIT 101,100
SELECT * FROM table ORDER BY id LIMIT 201,100
The LIMIT would be from which position you want to start and the second variable would be how many results you want to see.
Its a good pagnation trick.