select count(*) from tableA having product="abc";
select count(*) from tableA where product="abc";
Why are the outputs of the two statements above different when they look the same?
Is it possible?
WHERE filters the records that go into the calculations. HAVING filters the result rows that are returned.
If you run your first query then SAS will warn you that it is remerging the results with the original data since you are referencing a non summary statistic variable in your HAVING clause. Note that if no original records meet your HAVING clause then you get no observations in your result set. But if ANY records meet your query then you get a separate observation for each observation that meets your HAVING clause, but count is for all observations since none were filtered.
Try this query.
proc sql ;
select 'HAVING',count(*) from sashelp.class having name like 'A%'
union all
select 'WHERE',count(*) from sashelp.class where name like 'A%'
;
quit;
Then change A% to Z% and run it again.
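As a rough illustration outside SAS, the sketch below uses Python's standard sqlite3 module with a made-up tableA to show the general rule from the answer above: WHERE removes rows before COUNT(*) is calculated, while HAVING filters the result rows afterwards. The sample rows and the COUNT(*) > 2 threshold are assumptions for the example. SQLite does not do SAS's remerging, so only the WHERE/HAVING ordering is reproduced here.

```python
import sqlite3

# In-memory database with hypothetical sample data (not from the question).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (product TEXT)")
conn.executemany(
    "INSERT INTO tableA VALUES (?)",
    [("abc",), ("abc",), ("xyz",), ("xyz",), ("xyz",)],
)

# WHERE: rows are filtered first, then counted.
where_count = conn.execute(
    "SELECT COUNT(*) FROM tableA WHERE product = 'abc'"
).fetchone()[0]

# HAVING: all rows are counted per group, then groups are filtered.
having_rows = conn.execute(
    "SELECT product, COUNT(*) FROM tableA "
    "GROUP BY product HAVING COUNT(*) > 2"
).fetchall()

print(where_count)
print(having_rows)
```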
I have this SQL query
SELECT ACCBAL_DATE, ACCBAL_AMOUNT
FROM ACCOUNT_BALANCES t
WHERE ACC_KEY = '964570223'
AND ACCBAL_KEY = '16'
ORDER BY ACCBAL_DATE DESC
FETCH FIRST 1 ROWS ONLY;
It returns one row but I need to use this query for many ACC_KEYS (about 600).
So the first option is to run this query about 600 times, each time with a different ACC_KEY parameter.
The second option, I think, is to create a procedure
that takes acc_key as a variable and puts it in the WHERE clause.
The issue is that I can't create a stored procedure on the server because of permissions.
Is there some way to solve this without storing a procedure on the server?
EDIT: I know the IN clause but that is not what I need. I need something which will run the query about 600x, each execution with another ACC_KEY in WHERE clause and the output should be 600 rows.
When I put the keys in an IN clause, the query still returns only one row. I limit it to one row because without the limit it returns about 100 rows, and I only want the first row, which has the data I need. For each ACC_KEY it should return exactly one row.
You can still do that with an IN() clause listing all 600 key values:
select acc_key,
max(accbal_date) as accbal_date,
max(accbal_amount) keep (dense_rank last order by accbal_date) as accbal_amount
from account_balances t
where acc_key in ('964570223', '964570224', ...) -- up to 1000 allowed
and accbal_key = '16'
group by acc_key
order by acc_key;
This is using aggregate functions and grouping by the key, so you will get one row per key, with the data for the most recent date.
Read more about keep/last.
It would still be better to use a collection or a table - maybe an external table loaded from your Excel sheet, saved as a CSV; not least because you can only supply 1000 entries to a single IN() clause - or any expression list - but also for performance and readability/maintenance reasons.
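To make the "one row per key, most recent date" idea concrete, here is a small sketch using Python's standard sqlite3 module. SQLite has no KEEP (DENSE_RANK ...) syntax, so a join against the per-key MAX(accbal_date) is swapped in; it is not the Oracle query above. The sample rows are invented for the example, and the sketch assumes each key has a single row on its latest date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account_balances ("
    "acc_key TEXT, accbal_key TEXT, accbal_date TEXT, accbal_amount REAL)"
)
# Hypothetical data: two keys, several dates each, plus one other accbal_key.
conn.executemany(
    "INSERT INTO account_balances VALUES (?, ?, ?, ?)",
    [
        ("964570223", "16", "2020-01-01", 100.0),
        ("964570223", "16", "2020-02-01", 150.0),
        ("964570224", "16", "2020-01-15", 20.0),
        ("964570224", "16", "2020-03-01", 25.0),
        ("964570224", "17", "2020-04-01", 999.0),
    ],
)

keys = ["964570223", "964570224"]
placeholders = ",".join("?" for _ in keys)  # one bind parameter per key

sql = f"""
SELECT ab.acc_key, ab.accbal_date, ab.accbal_amount
FROM account_balances ab
JOIN (SELECT acc_key, MAX(accbal_date) AS max_date
      FROM account_balances
      WHERE accbal_key = '16' AND acc_key IN ({placeholders})
      GROUP BY acc_key) latest
  ON latest.acc_key = ab.acc_key AND latest.max_date = ab.accbal_date
WHERE ab.accbal_key = '16'
ORDER BY ab.acc_key
"""
latest_rows = conn.execute(sql, keys).fetchall()
for row in latest_rows:
    print(row)
```

One query returns one row per key, so there is no need to run the statement 600 times.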
You can store the keys in a table or use a derived table in the query. I would recommend something more like this:
WITH keys as (
SELECT '964570223' as ACC_KEY FROM DUAL UNION ALL
. . .
)
SELECT k.ACC_KEY, MAX(ab.ACCBAL_DATE) as ACCBAL_DATE,
MAX(ab.ACCBAL_AMOUNT) KEEP (DENSE_RANK FIRST ORDER BY ab.ACCBAL_DATE DESC) as ACCBAL_AMOUNT
FROM keys k LEFT JOIN
ACCOUNT_BALANCES ab
ON ab.ACC_KEY = k.ACC_KEY AND
ab.ACCBAL_KEY = '16'
GROUP BY k.ACC_KEY;
Of course the CTE keys could be replaced with a table that has the accounts of interest.
Note that this replaces your logic with aggregation logic. You just want the most recent date and balance, which Oracle supports using the KEEP keyword.
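One effect of driving the query from the list of keys with a LEFT JOIN is that a key with no matching balance rows still appears in the output, with NULLs. A minimal sketch of that behaviour with Python's sqlite3 module follows; the table name wanted_keys, the key '000000000', and the sample rows are all hypothetical, and only the date column is aggregated because SQLite lacks Oracle's KEEP.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wanted_keys (acc_key TEXT)")
conn.execute(
    "CREATE TABLE account_balances ("
    "acc_key TEXT, accbal_key TEXT, accbal_date TEXT, accbal_amount REAL)"
)
# '000000000' is a made-up key that has no balance rows at all.
conn.executemany(
    "INSERT INTO wanted_keys VALUES (?)", [("964570223",), ("000000000",)]
)
conn.executemany(
    "INSERT INTO account_balances VALUES (?, ?, ?, ?)",
    [
        ("964570223", "16", "2020-01-01", 100.0),
        ("964570223", "16", "2020-02-01", 150.0),
    ],
)

rows = conn.execute(
    """
    SELECT k.acc_key, MAX(ab.accbal_date) AS accbal_date
    FROM wanted_keys k
    LEFT JOIN account_balances ab
      ON ab.acc_key = k.acc_key AND ab.accbal_key = '16'
    GROUP BY k.acc_key
    ORDER BY k.acc_key
    """
).fetchall()
print(rows)  # the key without balances is kept, with None for the date
```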
Step 1: Create a table with a single column, ACC_KEY, that stores the full list of account keys.
Step 2: Run the following code.
SELECT T.ACCBAL_DATE, T.ACCBAL_AMOUNT
FROM ACCOUNT_BALANCES t
WHERE EXISTS(SELECT A.ACC_KEY FROM <TABLENAME> A WHERE A.ACC_KEY=T.ACC_KEY)
AND T.ACCBAL_KEY = '16'
ORDER BY T.ACCBAL_DATE DESC
FETCH FIRST 1 ROWS ONLY;
I'm using SAS and I want to limit the number of output rows for each table after ordering the data source. Can anyone tell me how to achieve that in SAS? I know in MySQL I can just use LIMIT, but in SAS, if I use (obs=10) or (outobs=10), it just limits the number of input rows. Here is my proc sql:
select distinct sales as a from lucas
group by province
outer union
select distinct sales as b from lucas
group by province
order by a desc, b asc;
Normally you would just use OBS= option when reading the data.
data top10;
set have (obs=10);
by descending size;
run;
If you don't already have a dataset sorted in that order and you want to avoid writing the full dataset out you could use a VIEW to do the generation and/or ordering for you.
proc sql ;
create view derived_sales as
select id,sum(sales) as total_sales
from have
group by id
order by calculated total_sales desc
;
quit;
data top10_sales;
set derived_sales(obs=10);
run;
Proc SQL does not implement modern clauses such as LIMIT, OFFSET, or FETCH, nor does it have the partitioning functions you may be familiar with.
That said, you cannot row-limit the output of a sorted sub-select or view; however, you can limit the output to a table using the OUTOBS option.
This sample creates two tables, each corresponding to a sub-select limiting 10 rows of a sorted result set. The option is reset prior to unioning them.
proc sql;
reset outobs=10;
create table have_ss1 as
select distinct msrp as msrp_1
from sashelp.cars
group by model
;
create table have_ss2 as
select distinct msrp as msrp_2
from sashelp.cars
group by model
;
reset outobs=&sysmaxlong;
create table want as
select * from have_ss1
outer union
select * from have_ss2
;
quit;
The SAS log window will show informative warnings, such as:
WARNING: A GROUP BY clause has been transformed into an ORDER BY clause because neither the
SELECT clause nor the optional HAVING clause of the associated table-expression
referenced a summary function.
WARNING: The query as specified involves ordering by an item that doesn't appear in its SELECT
clause. Since you are ordering the output of a SELECT DISTINCT it may appear that some
duplicates have not been eliminated.
WARNING: Statement terminated early due to OUTOBS=10 option.
I would do it like this, since it limits the dataset/table created by proc sql and not the input from the lucas dataset/table:
proc sql outobs=10;
select distinct sales as a from lucas
group by province
outer union
select distinct sales as b from lucas
group by province
order by a desc, b asc;
quit;
This will only limit the output and not the input!
I'm running a pretty straightforward query using the database/sql and lib/pq (postgres) packages and I want to toss the results of some of the fields into a slice, but I need to know how big to make the slice.
The only solution I can find is to do another query that is just SELECT COUNT(*) FROM tableName;.
Is there a way to both get the result of the query AND the count of returned rows in one query?
Conceptually, the problem is that the database cursor may not be enumerated to the end so the database does not really know how many records you will get before you actually read all of them. The only way to count (in general case) is to go through all the records in the resultset.
But practically, you can enforce it to do so by using subqueries like
select *, (select count(*) from table) from table
and just ignore the second column for every record other than the first. But this is very crude and I do not recommend doing so.
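Since the count is only known once every row has been read, the usual pattern is to let the collection grow while iterating and take its length at the end, rather than sizing it up front. The question is about Go, where append on a slice plays this role; the sketch below shows the same pattern with Python's sqlite3 module, using a hypothetical items table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, label TEXT)")
conn.executemany(
    "INSERT INTO items (label) VALUES (?)", [("a",), ("b",), ("c",)]
)

labels = []  # grows as rows arrive; no size needed in advance
for (label,) in conn.execute("SELECT label FROM items ORDER BY id"):
    labels.append(label)

row_count = len(labels)  # known only after the cursor is exhausted
print(labels, row_count)
```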
Not sure if this is what you are asking for, but in SQL Server (T-SQL) you can read @@ROWCOUNT to get the row count of the previous statement that was executed.
SELECT mytable.mycol FROM mytable WHERE mytable.foo = 'bar'
SELECT @@ROWCOUNT
If you want the row count included in your result set, you can use the OVER clause (MSDN):
SELECT mytable.mycol, count(*) OVER(PARTITION BY mytable.foo) AS 'Count' FROM mytable WHERE mytable.foo = 'bar'
You could also perhaps just separate the two SQL statements with a ; . This would return a result set for each of the statements executed.
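The window-function approach can be tried with Python's sqlite3 module, assuming the bundled SQLite is version 3.25 or later (the first release with window functions). This sketch uses COUNT(*) OVER (), without a partition, so every returned row carries the total number of rows that passed the WHERE clause. The table contents are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (mycol TEXT, foo TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [("x", "bar"), ("y", "bar"), ("z", "baz")],
)

# Requires SQLite >= 3.25 (assumption about the runtime environment).
rows = conn.execute(
    "SELECT mycol, COUNT(*) OVER () AS total "
    "FROM mytable WHERE foo = 'bar' ORDER BY mycol"
).fetchall()
print(rows)  # each row repeats the count of matching rows
```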
You would use count(*)
SELECT count(distinct last)
FROM (XYZTable)
WHERE date(FROM_UNIXTIME(time)) >= '2013-10-28' AND
id = 90 ;
The following statement works in my database:
select column_a, count(*) from my_schema.my_table group by 1;
but this one doesn't:
select column_a, count(*) from my_schema.my_table;
I get the error:
ERROR: column "my_table.column_a" must appear in the GROUP BY clause
or be used in an aggregate function
Helpful note: This thread: What does SQL clause "GROUP BY 1" mean? discusses the meaning of "group by 1".
Update:
The reason why I am confused is because I have often seen count(*) as follows:
select count(*) from my_schema.my_table
where there is no group by statement. Is COUNT always required to be followed by group by? Is the group by statement implicit in this case?
This error makes perfect sense. COUNT is an "aggregate" function. So you need to tell it which field to aggregate by, which is done with the GROUP BY clause.
The one which probably makes most sense in your case would be:
SELECT column_a, COUNT(*) FROM my_schema.my_table GROUP BY column_a;
If you select only COUNT(*), you are asking for the total number of rows instead of aggregating by another condition. Your question whether GROUP BY is implicit in that case could be answered with "sort of": not specifying anything is a bit like asking to "group by nothing", which means you will get one huge aggregate, which is the whole table.
As an example, executing:
SELECT COUNT(*) FROM table;
will show you the number of rows in that table, whereas:
SELECT col_a, COUNT(*) FROM table GROUP BY col_a;
will show you the number of rows per value of col_a. Something like:
col_a | COUNT(*)
---------+----------------
value1 | 100
value2 | 10
value3 | 123
You should also take into account that the * means to count every row, including rows with NULLs! If you only want to count rows where a specific expression is not NULL, you should use COUNT(expression). See the docs about aggregate functions for more details on this topic.
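The three behaviours described in this answer (a whole-table count, a per-value count with GROUP BY, and COUNT(expression) skipping NULLs) can be checked with a short sketch using Python's sqlite3 module. The table my_table and its rows are invented for the example; column_b exists only to show the NULL handling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (column_a TEXT, column_b TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [("value1", "x"), ("value1", None), ("value2", "y")],
)

# No GROUP BY: one aggregate over the whole table.
total = conn.execute("SELECT COUNT(*) FROM my_table").fetchone()[0]

# COUNT(column) ignores rows where that column is NULL.
non_null_b = conn.execute("SELECT COUNT(column_b) FROM my_table").fetchone()[0]

# GROUP BY: one count per distinct value of column_a.
per_value = conn.execute(
    "SELECT column_a, COUNT(*) FROM my_table "
    "GROUP BY column_a ORDER BY column_a"
).fetchall()

print(total, non_null_b, per_value)
```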
If you don't use the GROUP BY clause at all, the database cannot tell how to combine the single overall count with the many individual values of column_a, so it rejects the query. By adding GROUP BY 1 you categorize the rows by column_a, so each distinct value is returned once together with its own count.
When you have a function like count, sum etc. you need to group the other columns. This would be equivalent to your query:
select column_a, count(*) from my_schema.my_table group by column_a;
When you use count(*) with no other column, you are counting all rows from SELECT * from the table. When you use count(*) alongside another column, you are counting the number of rows for each different value of that other column. So in this case you need to group the results, in order to show each value and its count only once.
group by 1 in this case refers to column_a which has the column position 1 in your query.
This is why it works on your server. However, it is not good practice in SQL.
You should mention the column name, because the column order in the SELECT list may change, which makes this code hard to maintain.
The best solution is:
select column_a, count(*) from my_schema.my_table group by column_a;
I am writing some queries with self-joins in SQL Server. When I have only one column in the SELECT clause, the query returns a certain number of rows. When I add another column, from the second instance of the table, to the SELECT clause, the results increase by 1000 rows!
How is this possible?
Thanks.
EDIT:
I have a subquery in the FROM clause, which is also a self-join on the same table.
How is this possible?
The only thing I can think of is that you have SELECT DISTINCT and the additional column makes some results distinct that weren't distinct before.
For example, I would expect the second query to return many more rows:
SELECT DISTINCT First_name From Table
vs
SELECT DISTINCT First_name, Last_name From Table
But if we had the actual SQL, something else might come to mind.
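The DISTINCT effect is easy to reproduce. The sketch below uses Python's sqlite3 module with a hypothetical people table: adding Last_name to the SELECT DISTINCT list increases the row count because rows that shared a first name are no longer duplicates of each other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (first_name TEXT, last_name TEXT)")
# Made-up rows; ("Ann", "Lee") appears twice on purpose.
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ann", "Lee"), ("Ann", "Kim"), ("Bob", "Lee"), ("Ann", "Lee")],
)

one_col = conn.execute("SELECT DISTINCT first_name FROM people").fetchall()
two_col = conn.execute(
    "SELECT DISTINCT first_name, last_name FROM people"
).fetchall()

print(len(one_col), len(two_col))
```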