I would like to find the average of multiple columns instead of rows.
At present I transpose the table, but that hurts performance: the table is very big, and transposing 30 columns multiplies the number of rows by 29.
column1  measure1  measure2  measure3  avg
abc      100       200       300       200
def      50        60        70        60
I am not going to use all 30 columns for the average at once; which columns are included depends on my parameters in the front end.
I would like to know if there are any other solutions to achieve the desired result other than transposing.
In Redshift, I am currently doing a union of the table 29 times to transpose the columns to rows.
Your advice would be highly appreciated.
Thanks,
mc
Try something like this (Oracle query):
WITH input_data AS (
    SELECT 100 AS measure1, 200 AS measure2 FROM DUAL
    UNION ALL
    SELECT 1000 AS measure1, 2000 AS measure2 FROM DUAL
)
SELECT (a.measure1 + a.measure2) / 2 AS measure_avg
FROM input_data a
Output:
MEASURE_AVG
150
1500
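Since the question mentions Redshift and only a subset of the 30 columns at a time, here is a minimal sketch of the same idea without any transpose, assuming a hypothetical table your_table with nullable measure columns (the table and column names are illustrative, not from the original post). The front end would generate the column list, so the table is scanned once instead of being unioned 29 times:

-- Hypothetical Redshift sketch: row-wise average over a front-end-chosen
-- subset of columns (here measure1..measure3), ignoring NULLs.
SELECT
    column1,
    (COALESCE(measure1, 0) + COALESCE(measure2, 0) + COALESCE(measure3, 0))::float
    / NULLIF(
          (CASE WHEN measure1 IS NOT NULL THEN 1 ELSE 0 END)
        + (CASE WHEN measure2 IS NOT NULL THEN 1 ELSE 0 END)
        + (CASE WHEN measure3 IS NOT NULL THEN 1 ELSE 0 END), 0) AS avg_measures
FROM your_table;

The NULLIF guards against dividing by zero when every chosen measure is NULL in a row.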
I'm struggling to figure out how to write a query (using SQL in BigQuery) that calculates the following in an elegant way.
I need to calculate a rolling n-day aggregation (let's assume a rolling 3-day sum of units) for each date, but only taking into account rows where the price is less than a certain value (let's assume 50).
So, based on the table below:
date   price  units
01-21  30     200
01-22  100    500
01-23  20     200
01-24  20     100
01-25  80     100
01-26  40     250
I'd need my query to return:
date   units
01-21  200
01-22  200
01-23  400
01-24  300
01-25  300
01-26  350
I'm struggling to figure out how to combine window calculations with these additional conditions.
Thanks in advance!
Consider the approach below:
select date,
  sum(if(price < 50, units, 0)) over win as units
from your_table
window win as (order by unix_date(date) range between 2 preceding and current row)
If applied to sample data as in your question:

with your_table as (
  select date '2022-01-21' as date, 30 as price, 200 as units union all
  select date '2022-01-22', 100, 500 union all
  select date '2022-01-23', 20, 200 union all
  select date '2022-01-24', 20, 100 union all
  select date '2022-01-25', 80, 100 union all
  select date '2022-01-26', 40, 250
)
the output is:

date        units
2022-01-21  200
2022-01-22  200
2022-01-23  400
2022-01-24  300
2022-01-25  300
2022-01-26  350
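A note on the frame definition: because the window uses range between 2 preceding and current row over unix_date(date), it covers the last three calendar days even when some dates are missing from the table; a rows frame would instead look at the previous two physical rows. If you prefer standard SQL over BigQuery's if(), the same logic can be written inline (a sketch with the same result):

select date,
  sum(case when price < 50 then units else 0 end)
    over (order by unix_date(date) range between 2 preceding and current row) as units
from your_table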
Evening all, hoping for some pointers with an SQL Server query if possible.
I have two tables in a database, example as follows:
PostedTran
PostedTranID AccountID PeriodID Value TransactionDate
1 100 120 100 2019-01-01
2 100 120 200 2020-01-01
3 100 130 300 2021-01-01
4 101 120 400 2020-01-01
5 101 130 500 2021-01-01
PeriodValue
PeriodValueID AccountID PeriodID ActualValue
10 100 120 500
11 101 120 600
I have a mismatch in the two tables, and I'm failing miserably in my attempts. From the PostedTran table, I'm trying to select all transaction lines dated before 2021-01-01, then sum the Value for each AccountID from the results. I then need to add that value to the existing ActualValue in the PeriodValue table.
So, in the above example, the ActualValue on PeriodValueID 10 will update to 800, and 11 to 1000. The PeriodID in this example is constant and will always be 120.
Thanks in advance for any help.
Since the RDBMS is not mentioned, the pseudo-SQL looks like:
with DataSum as
(
    select AccountID, PeriodID, sum(Value) as TotalValue
    from PostedTran
    where TransactionDate < '1/1/2021'
    group by AccountID, PeriodID
)
update pv set ActualValue = ActualValue + ds.TotalValue
from PeriodValue pv inner join DataSum ds
    on pv.AccountID = ds.AccountID and pv.PeriodID = ds.PeriodID
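One caveat (not in the original answer): in SQL Server the meaning of a literal like '1/1/2021' depends on the session's DATEFORMAT and language settings, so a date such as '12/1/2021' could parse as January 12 or December 1. An unambiguous format like '20210101', as used in the next answer, is safer.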
The following should do what you ask. I haven't included PeriodId in the correlation, as you did not specify it in your description; you can simply include it if it's required.
update pv set pv.ActualValue=pv.ActualValue + t.Value
from PeriodValue pv
cross apply (
select Sum(value) value
from PostedTran pt
where pt.AccountId=pv.AccountId and pt.TransactionDate <'20210101'
)t
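One edge case worth guarding against (an addition, not part of the answer above): if an account has no PostedTran rows before the cut-off date, the inner SUM returns NULL and the update would overwrite ActualValue with NULL. A COALESCE avoids that:

update pv set pv.ActualValue = pv.ActualValue + coalesce(t.Value, 0)
from PeriodValue pv
cross apply (
    select Sum(value) value
    from PostedTran pt
    where pt.AccountId = pv.AccountId and pt.TransactionDate < '20210101'
) t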
I am hoping to find the sum of Amount from two tables, each with columns ID and Amount, grouping by ID.
My first attempt was to UNION the two tables first and then sum and group by, but I was hoping to learn a better way.
Inputs:
Table 1
ID Amount
123 100
123 100
145 500
167 600
Table 2
ID Amount
123 100
123 100
145 500
199 600
Output
ID Amount
123 400
145 1000
167 600
199 600
You can do:
select id, sum(amount) as amount
from (
select id, amount from table_1
union all
select id, amount from table_2
) x
group by id
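Two notes on this (additions, not part of the answer above): UNION ALL rather than UNION is essential here, because UNION would remove the duplicate (123, 100) rows before summing and return 100 instead of 400 for id 123. And if the source tables are large, pre-aggregating each side before combining reduces the rows the outer GROUP BY has to process (a sketch with the same result):

select id, sum(amount) as amount
from (
    select id, sum(amount) as amount from table_1 group by id
    union all
    select id, sum(amount) as amount from table_2 group by id
) x
group by id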
Disclaimer: The problem shown here is much more general than I first expected. The example below is taken from a solution to another question, but I have since used this pattern to solve many more problems, mostly related to time series (have a look at the "Linked" section in the right bar).
So I am trying to explain the problem more generally first:
I am using PostgreSQL, but I am sure this problem exists in other DBMSs that support window functions (MS SQL Server, Oracle, ...) as well.
Window functions can be used to group certain values together by a common attribute or value. For example, you can group rows by a date and then calculate the maximum value, an average, or a row count within every single date.
This is achieved by defining a PARTITION. Grouping by dates works with PARTITION BY date_column. If you then want an operation that needs a special order within your groups (calculating row numbers or summing a column), you use PARTITION BY date_column ORDER BY an_attribute_column.
Now think about a finer resolution of time series. What if you do not have dates but timestamps? Then you cannot group by the time column anymore. Nevertheless, it might be important to analyse the data in the order it was added (maybe the timestamp is the creation time of your data set). Then you notice that some consecutive rows have the same value, and you want to group your data by this common value, even though the rows have different timestamps.
The problem is that you cannot simply do a PARTITION BY value_column: PARTITION BY collects all rows with the same value into one partition, regardless of their position in the timestamp order. It is as if the table were ordered by the value column first, so it is no longer ordered by the timestamp, and you get results you do not expect.
More generally speaking: the problem is to ensure a special ordering even if the ordered column is not part of the created partition.
Example:
db<>fiddle
I have the following table:
ts val
100000 50
130100 30050
160100 60050
190200 100
220200 30100
250200 30100
300000 300
500000 100
550000 1000
600000 1000
650000 2000
700000 2000
720000 2000
750000 300
I had to group all consecutive equal values of the column val while preserving the order given by ts. To achieve this, I wanted to add a column with a unique ID per val group.
Expected result:
ts val group
100000 50 1
130100 30050 2
160100 60050 3
190200 100 4
220200 30100 5 \ same group
250200 30100 5 /
300000 300 6
500000 100 7
550000 1000 8 \ same group
600000 1000 8 /
650000 2000 9 \
700000 2000 9 | same group
720000 2000 9 /
750000 300 10
My first try was the rank window function, which would normally do this job:
SELECT
*,
rank() OVER (PARTITION BY val ORDER BY ts)
FROM
test
But in this case this doesn't work, because the PARTITION BY clause collects all rows with the same val into one partition and then applies the ORDER BY within each partition. The effective order is val, ts instead of the expected order by ts, and the ranks restart inside every val partition, so the result was of course not the expected one.
ts val rank
100000 50 1
190200 100 1
500000 100 2
300000 300 1
750000 300 2
550000 1000 1
600000 1000 2
650000 2000 1
700000 2000 2
720000 2000 3
130100 30050 1
220200 30100 1
250200 30100 2
160100 60050 1
The question is: How to get the group ids with respect to the order by ts?
Edit: I added my own solution below, but I feel very uncomfortable with it. It seems far too complicated. I was wondering if there's a better way to achieve this result.
I came up with this solution by myself (hoping someone else will come up with a better one):
demo:db<>fiddle
1. Order by ts.
2. Fetch the previous val value with the lag window function (https://www.postgresql.org/docs/current/static/tutorial-window.html).
3. Check whether the previous and the current value are the same: emit 0 if they are, 1 otherwise.
4. Sum up these flags with an ordered SUM. This generates the group IDs I am looking for: they group the val column but keep the ordering of the ts column.
The query:
SELECT
    *,
    SUM(is_diff) OVER (ORDER BY ts)
FROM (
    SELECT
        *,
        CASE WHEN val = lag(val) OVER (ORDER BY ts) THEN 0 ELSE 1 END AS is_diff
    FROM test
) s
The result:
ts val is_diff sum
100000 50 1 1
130100 30050 1 2
160100 60050 1 3
190200 100 1 4
220200 30100 1 5 \ group
250200 30100 0 5 /
300000 300 1 6
500000 100 1 7
550000 1000 1 8 \ group
600000 1000 0 8 /
650000 2000 1 9 \
700000 2000 0 9 | group
720000 2000 0 9 /
750000 300 1 10
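One small hardening of this solution (an addition, not in the original): if val can be NULL, the equality check starts a new group on every NULL row, because NULL = NULL does not evaluate to true. PostgreSQL's IS NOT DISTINCT FROM treats two NULLs as equal, so consecutive NULLs stay in one group:

SELECT
    *,
    SUM(is_diff) OVER (ORDER BY ts)
FROM (
    SELECT
        *,
        CASE WHEN val IS NOT DISTINCT FROM lag(val) OVER (ORDER BY ts)
             THEN 0 ELSE 1 END AS is_diff
    FROM test
) s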
I want to collapse duplicate rows into one, adding up their Qty.
I have a table filled with data:
Item Qty MinQty MaxQty
ABC 10 20 50
XYZ 12 30 40
ABC 15 20 50
I want the result like,
Item Qty MinQty MaxQty
ABC 25 20 50
XYZ 12 30 40
Kindly help me to write the query for the same...
SELECT Item, SUM(Qty), MIN(MinQty), MAX(MaxQty)
FROM tablename
GROUP BY Item;
The answer above is right, but you would also want to give the derived columns names:
SELECT Item, SUM(Qty) as Qty, MIN(MinQty) as MinQty, MAX(MaxQty) as MaxQty
FROM tablename
GROUP BY Item;