SAS: difference between sum and +?

data whatever;
infile '';
input price cost;
<statement>;
run;
In <statement>, what's the difference between using total = sum(total,cost) and total = total + cost?

The difference can be seen below:
If you want to calculate a cumulative total, you should use the sum statement.
total = sum(total,cost); /* this is the SUM function */
total+cost; /* this is a sum statement; the form is variable+expression */
Here :
"total" specifies the name of the accumulator variable, which contains a numeric value.
1) The variable (total in this case) is automatically set to 0 before SAS reads the first observation. The variable's value is retained from one iteration to the next, as if it had appeared in a RETAIN statement.
2) To initialize a sum variable to a value other than 0, include it in a RETAIN statement with an initial value.
and
"cost" is the expression.
1) The expression is evaluated and the result added to the accumulator variable.
2) SAS treats an expression that produces a missing value as zero.
A sum statement differs from the SUM function in that the sum statement retains the value it calculated on the previous iteration.
However, the sum statement is equivalent to using the SUM function together with a RETAIN statement, as shown here:
retain total 0;
total=sum(total,cost);

You'd probably have trouble with either of those if you actually included it after the input statement.
The information that ProgramFOX posted is correct, but if you're asking about the difference between these three statements, there's a little more to it:
total = sum(total,cost);
total + cost;
The second of these implies a retain total; statement and will also treat missing values as zero. You run into the missing-value problem when you're using this type of expression:
total = total + cost;
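The behaviour described above can be modelled outside SAS. The following is only a Python sketch, not SAS code: None stands in for a SAS missing value, and data_step, sas_sum and sas_plus are hypothetical helper names. It assumes the usual DATA step rule that an assigned variable is reset to missing at the top of each iteration unless it is retained.

```python
def sas_sum(*args):
    """Model of the SUM function: skips missing (None) arguments."""
    present = [a for a in args if a is not None]
    return sum(present) if present else None


def sas_plus(a, b):
    """Model of the + operator: the result is missing if either operand is."""
    return None if a is None or b is None else a + b


def data_step(costs, combine, retain):
    """Run one 'total = combine(total, cost)' per observation.

    retain=True models 'retain total 0;'; otherwise total is reset
    to missing at the start of every iteration.
    """
    total = 0 if retain else None
    results = []
    for cost in costs:
        if not retain:
            total = None
        total = combine(total, cost)
        results.append(total)
    return results


costs = [10, None, 5]  # the second observation has a missing cost

# total + cost;  (sum statement = SUM function + RETAIN)
sum_statement = data_step(costs, sas_sum, retain=True)
# total = sum(total, cost);  without RETAIN
sum_function_only = data_step(costs, sas_sum, retain=False)
# total = total + cost;  without RETAIN
plus_only = data_step(costs, sas_plus, retain=False)
# retain total 0; total = total + cost;
plus_retained = data_step(costs, sas_plus, retain=True)
```

In this model only the sum statement yields a running total that survives the missing cost; the retained + version stays missing from the first missing cost onward.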

pyspark count with condition with selectExpr

I have a DataFrame with a column "age" and I want to count how many rows have age = 60, for example. I know how to solve this using select or df.count(), but I want to use selectExpr.
I tried
customerDfwithAge.selectExpr("count(when(col(age) = 60))")
but it returns
Undefined function: 'col'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.;
If I remove col, it returns
Invalid arguments for function when; line 1 pos 6
What is wrong?
If you want to use selectExpr you need to provide a valid SQL expression.
when() and col() are pyspark.sql.functions, not SQL expressions.
In your case, you should try:
customerDfwithAge.selectExpr("sum(case when age = 60 then 1 else 0 end)")
Bear in mind that I am using sum, not count. count counts every non-null value, and both 0 and 1 are non-null, so it would simply return the total number of rows of your dataframe.
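To see the sum-versus-count difference without a Spark session, here is a small sketch that runs the same kind of SQL expression through Python's built-in sqlite3 module instead of selectExpr. The customers table and its rows are made up for illustration; the third expression (a CASE without ELSE) is an extra variant, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (age INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?)",
    [(60,), (45,), (60,), (None,), (30,)],
)

sum_flag, count_with_else, count_without_else = conn.execute(
    """
    SELECT
        -- adds 1 for each matching row, 0 otherwise
        SUM(CASE WHEN age = 60 THEN 1 ELSE 0 END),
        -- counts every row, because 0 and 1 are both non-null
        COUNT(CASE WHEN age = 60 THEN 1 ELSE 0 END),
        -- without ELSE the CASE yields NULL for non-matches, which COUNT skips
        COUNT(CASE WHEN age = 60 THEN 1 END)
    FROM customers
    """
).fetchone()
```

So count only works as a conditional count when the non-matching branch is NULL rather than 0.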

how to calculate number with date in sql oracle cmd

So I tried this:
select sr.member,
sr.code_book,
bk.title_book,
sr.return_date,
(kg.borrowed_time - sr.borrow_date) as target back
from sirkulasi sr,
book bk,
kategori kg
where sr.code_book = bk.code_book
and sr.return_date = '';
but it says: inconsistent datatypes: expected NUMBER got DATE
because the length of the loan is a number and the date of the loan is a date.
The question is this:
Show the circulation records that have not been returned and when each one is targeted to be returned.
(A record that has not been returned is a row in the circulation table whose return date is still empty; the return target is calculated from borrowed_time and borrowed_date according to the category.)
Never use commas in the FROM clause. Always use proper, explicit, standard, readable JOIN syntax.
Second, the only part of the query that could generate the error is the -. I'm pretty sure you want:
select sr.member, sr.code_book, bk.title_book, sr.return_date,
(sr.borrow_date + kg.borrowed_time ) as target_back
from sirkulasi sr join
book bk
on sr.code_book = bk.code_book
where sr.return_date is null;
Notes:
The target return date is (presumably) the borrow date plus the length of the loan. + is allowed between a date and a number when the first operand is a date and the second is a number that represents a number of days.
return_date certainly should be a date. Non-returns should be NULL values not strings. A string is not even appropriate for a date comparison. And = '' never evaluates to "true" in Oracle because Oracle (mistakenly) treats an empty string as NULL.
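The table kategori has no join condition in your query, so it is cross-joined with the other tables. Since kg.borrowed_time is needed, join it explicitly with an ON condition on the column that links a book to its category (that column is not shown in the question).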
JOIN. JOIN. JOIN.
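The date-plus-days rule in the notes has a direct analogue in Python's datetime module, which makes the original error easy to reproduce. This is only an illustration of the arithmetic, not Oracle code; target_return_date and the sample values are hypothetical.

```python
from datetime import date, timedelta


def target_return_date(borrow_date: date, borrowed_time_days: int) -> date:
    """Date + number of days gives a date, like borrow_date + borrowed_time."""
    return borrow_date + timedelta(days=borrowed_time_days)


due = target_return_date(date(2024, 1, 10), 14)

# Number - date has no meaning, which is what the original query attempted
# with (kg.borrowed_time - sr.borrow_date).
try:
    14 - date(2024, 1, 10)
    number_minus_date_ok = True
except TypeError:
    number_minus_date_ok = False
```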

Average Row [SQL]

Actually I'm a bit confused about what I should write in the subject.
The point is this: I want to average Speed01, Speed02, Speed03 and Speed04:
SELECT
Table01.Test_No,
Table01.Speed01,
Table01.Speed02,
Table01.Speed03,
Table01.Speed04,
I want to create a new column that contains this average -->>
AVG(Table01.Speed01, Table01.Speed02, Table01.Speed03,Table01.Speed04) as "Average"
I have tried this, but it did not work.
From
Table01
The Speed columns may or may not contain data. Sometimes Speed02 has no number while the others do; sometimes Speed04 is also missing while the others exist; sometimes only one column (for example, only Speed01) has data. It depends on the sensor's ability to catch the speed of the test material.
It would be a big help if you could find a solution. I'm a newbie here.
THANK YOU ^^
AVG is a SQL aggregate function: it averages one expression across rows, so it is not applicable across the columns of a single row. So simply do the math. Average is sum divided by count:
(SPEED01 + SPEED02 + SPEED03 +SPEED04)/4
To deal with missing values, use NULLIF or COALESCE:
(COALESCE(SPEED01, 0) + COALESCE(SPEED02, 0) + COALESCE(SPEED03, 0) + COALESCE(SPEED04, 0))
That leaves the denominator. You need to add 1 for every non null. For example:
(COALESCE(SPEED01/SPEED01,0) + COALESCE(SPEED02/SPEED02,0) + ...)
You can also use CASE, depending on the supported SQL dialect, to avoid the possible divide by 0:
CASE WHEN SPEED01 IS NULL THEN 0 ELSE 1 END
Or you can normalize the data, extract all SPEEDs into a 1:M relation and use the AVG aggregate, avoiding all these issues. Not to mention the possibility of adding a 5th measurement, then a 6th, and so on.
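Putting the numerator and the CASE-based denominator together, a row-wise average that skips NULLs could look like the sketch below. It runs against an in-memory SQLite database via Python's sqlite3 module, since the question does not name a DBMS. The sample rows, the * 1.0 to force non-integer division, and the NULLIF guard for rows where every speed is NULL are assumptions added for this illustration.

```python
import sqlite3

SPEEDS = ["Speed01", "Speed02", "Speed03", "Speed04"]

# Sum of the speeds, with NULL treated as 0.
numerator = " + ".join(f"COALESCE({c}, 0)" for c in SPEEDS)
# Number of non-NULL speeds: add 1 for every column that has a value.
denominator = " + ".join(
    f"CASE WHEN {c} IS NULL THEN 0 ELSE 1 END" for c in SPEEDS
)

query = f"""
    SELECT Test_No,
           ({numerator}) * 1.0 / NULLIF({denominator}, 0) AS "Average"
    FROM Table01
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Table01 "
    "(Test_No INTEGER, Speed01 REAL, Speed02 REAL, Speed03 REAL, Speed04 REAL)"
)
conn.executemany(
    "INSERT INTO Table01 VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, 20, 30, 40),          # all four sensors reported
        (2, 10, None, 30, None),      # two readings missing
        (3, 8, None, None, None),     # only Speed01 present
        (4, None, None, None, None),  # nothing recorded
    ],
)

averages = dict(conn.execute(query).fetchall())
```

Unlike dividing by a fixed 4, this averages only the readings that exist, and a row with no readings gets a NULL average instead of 0.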
Just add the columns and divide them by 4. To deal with the "missing" values use coalesce to treat NULL values as zero:
SELECT Test_No,
(coalesce(Speed01,0) + coalesce(Speed02,0) + coalesce(Speed03,0) + coalesce(Speed04,0)) / 4 as "Average"
FROM Table01;
You didn't mention your DBMS (Postgres, Oracle, ...), but the above is ANSI (standard) SQL and should run on nearly every DBMS.
As I understand your question, I assume that Table01.Speed01, Table01.Speed03 and Table01.Speed04 are nullable and of type int, whereas Table01.Speed02 is nullable and of type nvarchar:
SELECT
Table01.Test_No,
(
ISNULL(Table01.Speed01, 0) +
CASE ISNUMERIC(Table01.Speed02) WHEN 0 THEN 0 ELSE CAST(Table01.Speed02 AS int) END +
ISNULL(Table01.Speed03, 0) +
ISNULL(Table01.Speed04, 0)
)/4 AS AVG
FROM Table01

Arithmetic operation on array aggregation in postgres

I have 2 questions regarding array_agg in postgres.
1) I have a column which is the result of array_agg. I need to divide each element of the array by a constant value. Is it possible? I checked http://www.postgresql.org/docs/9.1/static/functions-array.html, but could not find any reference to arithmetic operations on array_agg.
Edit:
An example of the desired operation.
select array_agg(value)/2 from some_table
Here, I create an array of the column value from the table some_table and I have to divide each element by 2.
2) Is it possible to use coalesce with array_agg? In my scenario, there may be cases where the array_agg of a column results in a NULL array. Can we use coalesce for array_agg?
Example:
select coalesce(array_agg(value1), 0)
Dividing is probably simpler than you thought:
SELECT array_agg(value/2)
FROM ...
However, what value/2 does exactly depends on the data type. If value is an integer, fractional digits are truncated. To preserve fractional digits use value/2.0 instead. The fractional digit forces the calculation to be done with numeric values.
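COALESCE around array_agg rarely makes a difference. If there are rows, you get an array, possibly with NULL elements, and the array itself is not NULL. Only when no rows are aggregated does it matter: with GROUP BY you get no row at all, and without GROUP BY array_agg returns NULL, which COALESCE can replace only with another array such as '{}', not with 0.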
To replace individual NULL values with 0:
SELECT array_agg(coalesce(value/2.0, 0))
FROM ...

Query for dropping value from field

I have a query that looks at duty and vat information and does calculation based on the returned value.
The column that tells me the duty rate is formatted in the table as, for example, either 3.7% or 8%. In both cases I need to remove the % from my return value; otherwise my SUM clause fails.
I have sorted the problem for the 3.7% example with the following:
CASE WHEN CustomsTariff.CommodityCode.StandardDuty = 'Free' THEN '0.0' ELSE SUBSTRING(CustomsTariff.CommodityCode.StandardDuty, 1, 3) END AS DutyRate,
This drops the % for any value that has a decimal place, but I need to extend the CASE so that if the StandardDuty value has no decimal places, the % character is dropped as well, without messing up the first statement that takes the first 3 characters.
Thanks.
Did you try REPLACE() on the % character?
CASE WHEN CustomsTariff.CommodityCode.StandardDuty = 'Free'
THEN '0.0' ELSE REPLACE(CustomsTariff.CommodityCode.StandardDuty, N'%', N'')
END AS DutyRate,
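A quick way to check the REPLACE approach against both formats is the sketch below, which uses Python's sqlite3 module rather than the original database. The single-column CommodityCode table and its three rows are made up for illustration, and the CAST to REAL is an assumption added here so that the values can be summed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CommodityCode (StandardDuty TEXT)")
conn.executemany(
    "INSERT INTO CommodityCode VALUES (?)",
    [("3.7%",), ("8%",), ("Free",)],
)

rows = conn.execute(
    """
    SELECT StandardDuty,
           CAST(
               CASE WHEN StandardDuty = 'Free'
                    THEN '0.0'
                    -- strip the % wherever it is, with or without decimals
                    ELSE REPLACE(StandardDuty, '%', '')
               END AS REAL
           ) AS DutyRate
    FROM CommodityCode
    """
).fetchall()

duty_rates = dict(rows)            # maps the original text to its numeric rate
total_duty = sum(duty_rates.values())
```

Because REPLACE does not depend on the position of the %, the same expression handles 3.7%, 8% and longer values such as 12.5% without a SUBSTRING length.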