SQL group by steps - sql

I'm using SQL in SAS.
I'm doing a SQL query with a GROUP BY clause on a continuous variable (made discrete), and I'd like it to aggregate into coarser groups. I'm not sure this is clear, so here is an example.
Here is my query:
SELECT CEIL(travel_time) AS time_in_mn, MEAN(foo) AS mean_foo
FROM my_table
GROUP BY CEIL(travel_time)
This will give me the mean value of foo for each different value of travel_time. Thanks to the CEIL() function, it will group by minutes and not seconds (travel_time can take values such as 14.7 (minutes)). But I'd like to be able to group by steps of 5 minutes, for instance, so that I have something like this:
time_in_mn mean_foo
5 4.5
10 3.1
15 17.6
20 12
(Of course, mean(foo) should be computed over the whole interval, so for time_in_mn = 5, mean_foo should be the mean of foo for all rows where travel_time is between 0 and 5.)
How can I achieve that?
(Sorry if the answer can be found easily; the only search term I could think of is "group by step", which gives me a lot of "step by step" tutorials about SQL...)

A common idiom for "ceiling to steps" (or rounding, or flooring, for that matter) is to divide by the step size, apply CEIL (or ROUND, or FLOOR, of course), and then multiply by the step again. This way, if we take, for example, 12.4:
Divide: 12.4 / 5 = 2.48
Ceil: 2.48 becomes 3
Multiply: 3 * 5 = 15
And in SQL form:
SELECT 5 * CEIL(travel_time / 5.0) AS time_in_mn,
MEAN(foo) AS mean_foo
FROM my_table
GROUP BY 5 * CEIL(travel_time / 5.0)
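If you would rather label each group by its lower bound (0, 5, 10, ...) instead of its upper bound, the same idiom works with FLOOR; a sketch, assuming the same table and columns as above:
SELECT 5 * FLOOR(travel_time / 5.0) AS time_in_mn,
       MEAN(foo) AS mean_foo
FROM my_table
GROUP BY 5 * FLOOR(travel_time / 5.0)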

Related

In SQL there are aggregate functions like AVG, SUM and COUNT. Why isn't there one for multiplication? "MUL" or something.
Does it exist in Oracle, MSSQL or MySQL? If not, is there a workaround that would give this behaviour?
By MUL do you mean progressive multiplication of values?
Even with 100 rows of modest values (say around 10), your MUL(column) is going to overflow any data type! With such a high probability of misuse and such limited scope for use, it does not need to be in the SQL standard. As others have shown, there are mathematical ways of working it out, just as there are many, many ways to do tricky calculations in SQL using only standard (and commonly available) methods.
Sample data:
Column
1
2
4
8
COUNT : 4 items (1 for each non-null)
SUM   : 1 + 2 + 4 + 8 = 15
AVG   : 3.75 (SUM/COUNT)
MUL   : 1 x 2 x 4 x 8 = 64 (?)
For completeness, here are the Oracle, MSSQL and MySQL core implementations*:
Oracle : EXP(SUM(LN(column))) or POWER(N,SUM(LOG(column, N)))
MSSQL : EXP(SUM(LOG(column))) or POWER(N,SUM(LOG(column)/LOG(N)))
MySQL : EXP(SUM(LOG(column))) or POW(N,SUM(LOG(N,column)))
Take care when using EXP/LOG in SQL Server; watch the return type: http://msdn.microsoft.com/en-us/library/ms187592.aspx
The POWER form allows for larger numbers (by using bases larger than Euler's number), and in cases where the result grows too large to convert back using POWER, you can return just the logarithmic value and calculate the actual number outside of the SQL query.
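As a rough sketch of that last point (SQL Server syntax, placeholder table/column names): return only the summed logarithm and exponentiate it outside the database when the product itself would overflow.
SELECT SUM(LOG(yourColumn)) AS log_of_product
FROM yourTable
WHERE yourColumn > 0;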
* LOG(0) and LOG of negative numbers are undefined. The code below shows only how to handle this in SQL Server; equivalents can be built for the other SQL flavours using the same concept.
create table MUL(data int)
insert MUL select 1 yourColumn union all
           select 2 union all
           select 4 union all
           select 8 union all
           select -2 union all
           select 0

select CASE WHEN MIN(abs(data)) = 0 then 0 ELSE
         EXP(SUM(Log(abs(nullif(data,0)))))                       -- the base mathematics
         * round(0.5-count(nullif(sign(sign(data)+0.5),1))%2,0)   -- pairs up negatives
       END
from MUL
Ingredients:
Taking the abs() of data: if the minimum is 0, multiplying by anything else is futile, the result is 0.
When data is 0, NULLIF converts it to null. abs() and log() then both return null, so the row is excluded from SUM().
If data is not 0, abs() allows us to multiply a negative number using the LOG method; we keep track of the negativity elsewhere.
Working out the final sign:
sign(data) returns 1 for > 0, 0 for 0 and -1 for < 0.
We add another 0.5 and take sign() again, so 0 and 1 are now both classified as 1, and only -1 remains -1.
Again use NULLIF to remove the 1's from COUNT(), since we only need to count the negatives.
% 2 against the COUNT() of negative numbers returns either
--> 1 if there is an odd number of negative numbers
--> 0 if there is an even number of negative numbers
More mathematical tricks: we subtract that 1 or 0 from 0.5, so the above becomes
--> (0.5 - 1 = -0.5 => rounds to -1) if there is an odd number of negative numbers
--> (0.5 - 0 = 0.5 => rounds to 1) if there is an even number of negative numbers
We multiply this final 1/-1 against the SUM-of-logs product for the real result.
No, but you can use Mathematics :)
If yourColumn is always bigger than zero:
select EXP(SUM(LOG(yourColumn))) As ColumnProduct from yourTable
I see an Oracle answer is still missing, so here it is:
with yourTable as
( select 1 yourColumn from dual union all
  select 2 from dual union all
  select 4 from dual union all
  select 8 from dual
)
select EXP(SUM(LN(yourColumn))) As ColumnProduct from yourTable;

COLUMNPRODUCT
-------------
           64

1 row selected.
Regards,
Rob.
With PostgreSQL, you can create your own aggregate functions; see http://www.postgresql.org/docs/8.2/interactive/sql-createaggregate.html
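For illustration, a minimal sketch of what such a product aggregate could look like in PostgreSQL (the names are placeholders, not a tested production definition):
CREATE FUNCTION mul_state(numeric, numeric) RETURNS numeric
    AS 'SELECT $1 * $2'
    LANGUAGE SQL IMMUTABLE;

CREATE AGGREGATE mul(numeric) (
    SFUNC = mul_state,   -- state transition: multiply the running product by the next value
    STYPE = numeric,     -- type of the running product
    INITCOND = '1'       -- start the product at 1
);

-- usage: SELECT mul(yourColumn) FROM yourTable;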
To create an aggregate function on MySQL, you'll need to build an .so (linux) or .dll (windows) file. An example is shown here: http://www.codeproject.com/KB/database/mygroupconcat.aspx
I'm not sure about MSSQL and Oracle, but I bet they have options to create custom aggregates as well.
You'll break any datatype fairly quickly as numbers mount up.
Using LOG/EXP is tricky because of values <= 0, which will fail when fed to LOG. I wrote a solution in this question that deals with this.
Using CTE in MS SQL:
CREATE TABLE Foo(Id int, Val int)
INSERT INTO Foo VALUES(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)
;WITH cte AS
(
    SELECT Id, Val AS Multiply, row_number() over (order by Id) as rn
    FROM Foo
    WHERE Id = 1
    UNION ALL
    SELECT ff.Id, cte.Multiply * ff.Val as Multiply, ff.rn
    FROM (SELECT f.Id, f.Val, row_number() over (order by f.Id) as rn
          FROM Foo f) ff
    INNER JOIN cte
        ON ff.rn - 1 = cte.rn
)
SELECT * FROM cte
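If only the overall product is wanted rather than every running value, the final SELECT could be replaced with something like this sketch (rn and Multiply come from the CTE above):
SELECT TOP (1) Multiply
FROM cte
ORDER BY rn DESC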
Not sure about Oracle or SQL Server, but in MySQL you can just use * like you normally would.
mysql> select count(id), count(id)*10 from tablename;
+-----------+--------------+
| count(id) | count(id)*10 |
+-----------+--------------+
|       961 |         9610 |
+-----------+--------------+
1 row in set (0.00 sec)

Calculated query with parameters in HANA/CrystalReports

I'm having trouble explaining exactly what I need, so I'll describe the scenario.
Scenario:
Product A has a maximum production of 125 kg at a time.
The operator received a production order for 1027.5 kg of product A.
The operator has to calculate how many rounds he'll have to
manufacture and adjust the component quantities for each round.
We want to create a report where these calculations are already done. What we believe would be the first step, based on the values in this scenario, is to return something like this:
ROUND QUANTITY(KG)
1 125
2 125
3 125
4 125
5 125
6 125
7 125
8 125
9 27.5
After that, the recalculation of the components could be done with simple operations.
The problem is that we couldn't think of a way to get this result, nor of a different way of building the report.
All we could do is get the integer part of the division
SELECT FLOOR(1027.5/125) AS "TEST" FROM DUMMY
and the remainder
SELECT MOD(1027.5,125) AS "TEST" FROM DUMMY
We are using:
SAP HANA SQL
Crystal Reports
SAP B1
Any help would be appreciated
Thanks in advance!
There are several ways to achieve what you described.
One way is to translate the requirement into a function that takes the two input parameter values and returns the table of production rounds.
This can look like this:
create or replace function production_rounds(
        IN max_production_volume_per_round decimal (10, 2)
      , IN production_order_volume decimal (10, 2)
    )
    returns table (
        production_round integer
      , production_volume decimal (10, 2))
as
begin
    declare rounds_to_produce integer;
    declare remainder_production_volume decimal (10, 2);

    rounds_to_produce := floor(:production_order_volume / :max_production_volume_per_round);
    remainder_production_volume := mod(:production_order_volume, :max_production_volume_per_round);

    return
        select /* generate rows for all "full" rounds */
              s.element_number as production_round
            , :max_production_volume_per_round as production_volume
        from
            series_generate_integer(1, 1, :rounds_to_produce + 1) s
        UNION ALL
        select /* generate a single row for the final round with the remainder */
              :rounds_to_produce + 1 as production_round
            , :remainder_production_volume as production_volume
        from
            dummy
        where
            :remainder_production_volume > 0.0;
end;
You can use this function just like any table - but with parameters:
select * from production_rounds (125 , 1027.5) ;
PRODUCTION_ROUND PRODUCTION_VOLUME
1 125
2 125
3 125
4 125
5 125
6 125
7 125
8 125
9 27.5
The bit that probably needs explanation is SERIES_GENERATE_INTEGER. This is a HANA-specific built-in function that returns a number of records from a "series". A series here is a sequence of periods within a min and max limit, with a certain step size between two adjacent periods.
More on how this works can be found in the HANA reference documentation, but for now, suffice it to say that this is the fastest way to generate a result set with X rows.
This series-generator is used to create all "full" production rounds.
The second part of the UNION ALL then creates just a single row by selecting from the built-in table DUMMY (DUAL in Oracle), which is guaranteed to contain exactly one record.
Finally, this second part needs to be "disabled" if there actually is no remainder, which is done by the WHERE clause.
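As a small stand-alone illustration of the series generator (assuming the same (increment, min, max-exclusive) call pattern as in the function above), the following would return element_number values 1 through 8 for the example order, since FLOOR(1027.5 / 125) = 8:
SELECT s.element_number
FROM SERIES_GENERATE_INTEGER(1, 1, 9) s;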

Finding sequence starting from a particular number in Big query

How can we achieve the same functionality as the 'SEQUENCE' provided in Netezza?
Please find below a link demonstrating the functionality I would like to achieve in BigQuery:
https://www.ibm.com/support/knowledgecenter/en/SSULQD_7.2.1/com.ibm.nz.dbu.doc/r_dbuser_create_sequence.html
I have reviewed RANK(), but it doesn't fully solve my problem. Any leads would be appreciated.
In BigQuery Standard SQL you can find two functions that can help you here:
GENERATE_ARRAY(start_expression, end_expression[, step_expression])
and
GENERATE_DATE_ARRAY(start_date, end_date[, INTERVAL INT64_expr date_part])
For example, below code
#standardSQL
SELECT sequence
FROM UNNEST(GENERATE_ARRAY(1, 10, 1)) AS sequence
produces this result:
sequence
1
2
3
4
5
6
7
8
9
10
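GENERATE_DATE_ARRAY works the same way when the sequence needs to be made of dates rather than integers; for example (the dates below are just placeholders):
#standardSQL
SELECT d
FROM UNNEST(GENERATE_DATE_ARRAY('2018-01-01', '2018-01-07', INTERVAL 1 DAY)) AS d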

More efficient query to count group where values in range

I am working with data that essentially looks like this.
table:processed_data
sensor_id, reading, time_stamp
1,0.1,1234567890
1,0.3,1234567891
1,0.9,1234567892
1,0.32,1234567893
...
What I want to do is write a query that makes a single pass through the data and counts how many readings fall into each category. A simple example:
categories (0-0.5, 0.5-0.7, 0.7-1) (I am actually planning on breaking them into 10 categories with 0.1 increments, though).
This is essentially what I want, even though it isn't valid sql:
select count(reading between 0 and 0.5), count(reading between 0.5 and 0.7), count(reading between 0.7 and 1) from processed_data;
The only way I can think of doing it, though, is an operation that scans the data once per category, rather than a single pass.
select count(*) as low from processed_data where reading between 0 and 0.5
union
select count(*) as med from processed_data where reading between 0.5 and 0.7
union
select count(*) as high from processed_data where reading between 0.7 and 1;
I might just resort to doing the processing in PHP and scanning the data once, but I would prefer to have SQL do it, if it can be smart enough.
You can derive the category from the value, and use that for grouping:
SELECT CAST(reading * 10 AS INTEGER),
COUNT(*)
FROM processed_data
GROUP BY CAST(reading * 10 AS INTEGER);
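If the unequal categories from the question are needed as-is, a single pass can also be done with conditional aggregation; a sketch, assuming MySQL (where a boolean comparison evaluates to 0 or 1):
SELECT SUM(reading >= 0   AND reading < 0.5) AS low,
       SUM(reading >= 0.5 AND reading < 0.7) AS med,
       SUM(reading >= 0.7 AND reading <= 1)  AS high
FROM processed_data;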

SQL server generate number from 1 to nth

I have a table like:
ID  CASH  INTERVAL
1   60    5
2   10    3
3   20    4
I want to add 2 columns derived from the current ones. Column MULT lists the numbers from 1 to INTERVAL, separated by commas; for VAL, I subtract CASH from 100, divide it by INTERVAL, and list that value INTERVAL times, comma-separated, inside column VAL:
ID  CASH  INTERVAL  MULT       VAL
1   60    5         1,2,3,4,5  8,8,8,8,8
2   10    3         1,2,3      30,30,30
3   20    4         1,2,3,4    20,20,20,20
I know it may not look like an informative question, but does anyone at least know how to list them in a single column, comma-separated, using STUFF or similar?
Given how you phrase the question and the sample data you provide, I would be tempted to use a very bespoke approach for this:
with params as (
      select '1,2,3,4,5,6,7,8,9' as numbers,
             'x,x,x,x,x,x,x,x,x' as vals
     )
select l.*,
       left(numbers, interval * 2 - 1) as mult,
       replace(left(vals, interval * 2 - 1), 'x', (100 - cash) / interval) as val
from params cross join
     [like] l;
Of course, you might need to extend the strings in the CTE, if they are not long enough (and this might affect the arithmetic).
The advantage to this approach is speed. It should be pretty fast.
Note: you can also use replicate() rather than the vals.
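A sketch of that replicate() variant (same placeholder table [like] as above): REPLICATE builds the comma-separated value list directly, and STUFF strips the leading comma.
select l.*,
       left('1,2,3,4,5,6,7,8,9', interval * 2 - 1) as mult,
       stuff(replicate(',' + cast((100 - cash) / interval as varchar(10)), interval),
             1, 1, '') as val
from [like] l;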