Sum of differences of rows - SQL

I have example values in column like this:
values
-------
89
65
56
78
74
73
45
23
5
654
643
543
345
255
233
109
43
23
2
The values rise, then drop back to 0 and start rising again.
I need to compute the difference between consecutive rows in a new column, and also the sum of these differences (a cumulative sum) over all values. The values 56 and 5 are new differences counted from zero.
The sum is 819.
Example from the bottom: (23-2)+(43-23)+(109-43)+..+(654-643)+(5)+(23-5)+..

Okay, here is my try. However, you need to add an Identity field (which I called "AddSequence") that starts with 1 for the first value ("2") and increments by one for every other value.
SELECT SUM(C.Diff) FROM
(
SELECT CASE WHEN (A.[Value] - (SELECT [Value] FROM [TestValue] AS B WHERE B.[AddSequence]= A.[AddSequence]-1)) > 0
THEN (A.[Value] - (SELECT [Value] FROM [TestValue] AS D WHERE D.[AddSequence]= A.[AddSequence]-1))
ELSE 0
END AS Diff
FROM [TestValue] AS A
) AS C
My first solution neglected the fact that we have to start over whenever the difference is negative.

I think you are looking for something like:
SELECT SUM(a - b) AS sum_of_differences
FROM ...

I think you want this for the differences, I've tested it in sqlite
SELECT CASE WHEN (v.value - val) < 0 THEN 0 ELSE (v.value - val) END AS differences
FROM v,
(SELECT rowid, value AS val FROM v WHERE rowid > 1) as next_val
WHERE v.rowid = next_val.rowid - 1
As for the sum:
SELECT SUM(differences) FROM
(
SELECT CASE WHEN (v.value - val) < 0 THEN 0 ELSE (v.value - val) END AS differences
FROM v,
(SELECT rowid, value AS val FROM v WHERE rowid > 1) AS next_val
WHERE v.rowid = next_val.rowid - 1
)

EDITED - BASED OFF OF YOUR QUESTION EDIT (T-SQL)
I don't know how you can do this without adding an Id.
If you add an Id, this gives the exact output you had posted before your edit. There's probably a better way, but this is quick and dirty, good for a one-time shot. It uses a self join. Differences was the name of your new column originally.
UPDATE A
SET differences = CASE WHEN A.[values] > B.[Values] THEN A.[values] - B.[Values]
ELSE A.[values] END
FROM SO_TTABLE A
JOIN SO_TTABLE B ON A.ID = (B.ID - 1)
OUTPUT
Select [Values], differences FROM SO_TTABLE
[values] differences
------------------------
89 24
65 9
56 56
78 4
74 1
73 28
45 22
23 18
5 5
654 11
643 100
543 198
345 90
255 22
233 124
109 66
43 20
23 21
2 0
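The reset rule described in the question can be checked with a small script. This is only a sketch under assumptions: it uses Python's built-in sqlite3 module rather than T-SQL, a hypothetical table `readings` with a hypothetical sequence column `seq` (numbered from the bottom of the question's list), and it treats an unchanged value as a difference of 0, which the question does not specify.

```python
import sqlite3

# Values in arrival order, i.e. read from the bottom of the question's list.
values = [2, 23, 43, 109, 233, 255, 345, 543, 643, 654,
          5, 23, 45, 73, 74, 78, 56, 65, 89]

conn = sqlite3.connect(":memory:")
# "readings" and "seq" are hypothetical names chosen for this sketch.
conn.execute("CREATE TABLE readings (seq INTEGER PRIMARY KEY, value INTEGER)")
conn.executemany("INSERT INTO readings (seq, value) VALUES (?, ?)",
                 enumerate(values, start=1))

diff_sql = """
SELECT cur.seq,
       cur.value,
       CASE
           WHEN prev.value IS NULL THEN 0                    -- first row: nothing to compare
           WHEN cur.value >= prev.value THEN cur.value - prev.value
           ELSE cur.value                                    -- counter restarted from zero
       END AS diff
FROM readings AS cur
LEFT JOIN readings AS prev ON prev.seq = cur.seq - 1
ORDER BY cur.seq
"""
rows = conn.execute(diff_sql).fetchall()
total = sum(diff for _, _, diff in rows)
print(total)  # 819
```

The LEFT JOIN keeps the first row, which has no predecessor, so it can be given a difference of 0 as in the output table above.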

How to: For each unique id, for each unique version, grab the best score and organize it into a table

Just wanted to preface this by saying that while I do have a basic understanding, I am still fairly new to using BigQuery tables and SQL statements in general.
I am trying to make a new view out of a query that grabs all of the best test scores for each version by each employee:
select emp_id,version,max(score) as score from `project.dataset.table` where type = 'assessment_test' group by version,emp_id order by emp_id
I'd like to take the results of that query and build a new table of employee IDs, with one column per version holding that employee's best score for the version. I know I can manually make a table for each version with "where version = a", "where version = b", etc. and then join all of the tables at the end, but that doesn't seem like the most elegant solution, and there are about 20 different versions in total.
Is there a way to programmatically create a column for each unique version, or at the very least use my initial query as a subquery and just reference it, something like this:
with a as (
select id,version,max(score) as score
from `project.dataset.table`
where type = 'assessment_test' and version is not null and score is not null and id is not null
group by version,id
order by id),
version_a as (select score from a where version = 'version_a'),
version_b as (select score from a where version = 'version_b'),
version_c as (select score from a where version = 'version_c')
select
a.id as id,
version_a.score as version_a,
version_b.score as version_b,
version_c.score as version_c
from
a,
version_a,
version_b,
version_c
Example Data:
id  version  score
------------------
1   a        88
1   b        93
1   c        92
2   a        89
2   b        99
2   c        78
3   a        95
3   b        83
3   c        89
4   a        90
4   b        90
4   c        86
5   a        82
5   b        78
5   c        98
1   a        79
1   b        97
1   c        77
2   a        100
2   b        96
2   c        85
3   a        83
3   b        87
3   c        96
4   a        84
4   b        80
4   c        77
5   a        95
5   b        77
Expected Output:
id  a score  b score  c score
-----------------------------
1   88       97       92
2   100      99       85
3   95       87       96
4   90       90       86
5   95       78       98
Thanks in advance and feel free to ask any clarifying questions
Use the approach below:
select * from your_table
pivot (max(score) score for version in ('a', 'b', 'c'))
If applied to the sample data in your question, the output has one row per id with the best score for each version.
If the versions are not known in advance, use the following:
execute immediate (select '''
select * from your_table
pivot (max(score) score for version in (''' || string_agg(distinct "'" || version || "'") || "))"
from your_table
)
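For readers without BigQuery at hand, the same idea (discover the versions first, then generate one column per version) can be sketched locally. The assumptions are significant: this uses Python's sqlite3 module, SQLite has no PIVOT operator, so conditional aggregation with MAX(CASE ...) is swapped in, and the table name `scores` and the helper `quote_ident` are hypothetical. Only the rows for ids 1 and 2 from the sample data are loaded.

```python
import sqlite3

# Sample rows for ids 1 and 2, taken from the question's example data.
rows = [
    (1, "a", 88), (1, "b", 93), (1, "c", 92),
    (2, "a", 89), (2, "b", 99), (2, "c", 78),
    (1, "a", 79), (1, "b", 97), (1, "c", 77),
    (2, "a", 100), (2, "b", 96), (2, "c", 85),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (id INTEGER, version TEXT, score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?, ?)", rows)


def quote_ident(name):
    """Quote a string for use as an SQL identifier (hypothetical helper)."""
    return '"' + name.replace('"', '""') + '"'


# Step 1: discover the versions instead of hard-coding them.
versions = [v for (v,) in conn.execute(
    "SELECT DISTINCT version FROM scores ORDER BY version")]

# Step 2: one MAX(CASE ...) column per version. The version values are bound
# as parameters; only the quoted column aliases are interpolated.
columns = ", ".join(
    f"MAX(CASE WHEN version = ? THEN score END) AS {quote_ident('score_' + v)}"
    for v in versions)
pivot_sql = f"SELECT id, {columns} FROM scores GROUP BY id ORDER BY id"

pivoted = conn.execute(pivot_sql, versions).fetchall()
print(pivoted)  # [(1, 88, 97, 92), (2, 100, 99, 85)]
```

Building the statement in two steps mirrors what the EXECUTE IMMEDIATE version does inside BigQuery: the list of versions comes from the data, and the final query is generated from it.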

Postgres calculate average using distinct IDs, values also distinct

I have a Postgres query that is supposed to calculate an average based on a set of values. This set of values should be based on DISTINCT IDs.
The query is the following:
#{context.answers_base}
SELECT
stores.name as store_name,
answers_base.question_name as question_name,
answers_base.question_id as question_id,
(sum(answers_base.answer_value) / NULLIF(count(answers_base.answer_id),0)) as score, # <--- this line is calculating wrong
sum(answers_base.answer_value) as score_sum,
count(answers_base.answer_id) as question_answer_count,
count(DISTINCT answers_base.answer_id) as answer_count
FROM answers_base
INNER JOIN stores ON stores.id = answers_base.store_id
WHERE answers_base.answer_value IS NOT NULL AND answers_base.question_type_id = :question_type_id
AND answers_base.scale = TRUE
#{context.filter_answers}
GROUP BY stores.name, answers_base.question_name, answers_base.question_id, answers_base.sort_order
ORDER BY stores.name, answers_base.sort_order
The thing is, that on the indicated line (sum(answers_base.answer_value) / NULLIF(count(answers_base.answer_id),0)) some values are counted more than once.
Part of the solution is making it DISTINCT based on ID, like so:
(sum(answers_base.answer_value) / NULLIF(count(DISTINCT answers_base.answer_id),0))
This results in an average that is divided by the right number, but the sum being divided is still wrong.
Doing the following (making sum() DISTINCT) does not work, because the values are not unique. Each value is one of 0 / 25 / 50 / 75 / 100, so different IDs may hold the same value.
(sum(DISTINCT answers_base.answer_value) / NULLIF(count(DISTINCT answers_base.answer_id),0))
How would I go about making this work?
Here are simplified versions of the table structures.
Table Answer
ID  answer_date
----------------
1   Feb 01, 2022
2   Mar 02, 2022
3   Mar 13, 2022
4   Mar 21, 2022
Table AnswerRow
ID  answer_id  answer_value
---------------------------
1   1          25
2   1          50
3   1          50
4   2          75
5   2          100
6   2          0
7   3          25
8   4          25
9   4          100
10  4          50
Answer 1's answer_rows:
25 + 50 + 50 -> average = 125 / 3
Answer 2's answer_rows:
75 + 100 + 0 -> average = 175 / 3
Answer 3's answer_rows:
25 -> average = 25 / 1
Answer 4's answer_rows:
25 + 100 + 50 -> average = 175 / 3
For some reason, we get duplicate answer_rows in the calculation.
Example of the problem; for answer_id=1 we have the following answer_rows in the calculation, giving us a different average:
ID  answer_id  answer_value
---------------------------
1   1          25
2   1          50
3   1          50
3   1          50
3   1          50
3   1          50
Result: 25 + 50 + 50 + 50 + 50 + 50 -> 275 / 6
Desired result: 25 + 50 + 50 -> 125 / 3
Making answer_row_id distinct (see beginning of post) makes it possible for me to get:
25 + 50 + 50 + **50 + 50 + 50** -> 275 / **3**
But not
25 + 50 + 50 -> 125 / 3
What I would like to achieve is a calculation that selects each answer_row exactly once, based on its ID, and uses those answer_rows for both x and y in the average x / y.
answers_base is the following (simplified):
WITH answers_base as (
SELECT
answers.id as answer_id,
answers.store_id as store_id,
answer_rows.id as answer_row_id,
question_options.answer_value as answer_value
FROM answers
INNER JOIN answer_rows ON answers.id = answer_rows.answer_id
INNER JOIN stores ON stores.id = answers.store_id
WHERE answers.status = 0
)
I think this would be best solved with a window function. Something along the lines of
SELECT
ROW_NUMBER() OVER (PARTITION BY answer_rows.id ORDER BY answer_rows.created_at DESC) AS duplicate_answers
...
WHERE
answer_rows.duplicate_answers = 1
This would filter out multiple rows with the same id, and only keep one entry. (I chose the "first by created_at", but you could change this to whatever logic suits you best.)
A benefit to this approach is that it makes the rationale behind the logic clear, contained and re-usable.
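A minimal sketch of that idea, under assumptions: it runs in SQLite (3.25 or later, for window functions) through Python's sqlite3 module instead of Postgres, `answers_base` is modelled as a plain table holding the duplicated rows for answer_id = 1 from the question, and the column `copy_number` is a hypothetical name. Because the sample data has no created_at column, the window is ordered by the row id itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE answers_base (
    answer_id INTEGER, answer_row_id INTEGER, answer_value INTEGER)""")
# Rows for answer_id = 1 as they reach the calculation: row 3 appears four times.
conn.executemany("INSERT INTO answers_base VALUES (?, ?, ?)", [
    (1, 1, 25), (1, 2, 50), (1, 3, 50), (1, 3, 50), (1, 3, 50), (1, 3, 50),
])

dedup_sql = """
WITH numbered AS (
    SELECT answer_id, answer_row_id, answer_value,
           -- number the copies of each answer_row; copy 1 is the one we keep
           ROW_NUMBER() OVER (PARTITION BY answer_row_id
                              ORDER BY answer_row_id) AS copy_number
    FROM answers_base
)
SELECT answer_id,
       SUM(answer_value)                  AS score_sum,
       COUNT(*)                           AS row_count,
       SUM(answer_value) * 1.0 / COUNT(*) AS score
FROM numbered
WHERE copy_number = 1          -- filter in an outer query, not next to the window
GROUP BY answer_id
"""
result = conn.execute(dedup_sql).fetchall()
print(result)  # one row for answer 1: sum 125 over 3 rows
```

The window function is computed in a CTE and filtered one level up, since a window alias cannot be referenced in the WHERE clause of the same SELECT. Both the sum and the count then see each answer_row once.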

SQL: Subtracting certain rows with restrictions from a data table into a new table

I have a data table in PostgreSQL with these columns and some rows like this:
st  epochnum  satnum  l1  l2  c1  p1  p2
-----------------------------------------
1   1         1       10  11  12  13  14
1   1         2       15  16  17  18  19
1   2         1       20  21  22  23  24
1   2         2       25  26  27  28  29
20  1         1       30  41  52  63  74
20  1         2       75  76  87  88  null
20  2         1       ...
I want to get pairs of rows that have the same values for epochnum and satnum but different values in "st". I also have a list that specifies which "st" pairs should be subtracted. It's just another table that looks like this:
st1  st2
--------
1    20
The rows in the first table that share the same epochnum and satnum have to be subtracted in l1, l2, c1, p1 and p2 according to this table, and then inserted into a new table like this:
epochnum  st1  st2  satnum  dl1  dl2  dc1  dp1  dp2
---------------------------------------------------
1         1    20   1       20   30   40   50   60
1         1    20   2       65   65   75   75   null
...
The actual data has more than 400000 rows that share epochnums and satnums like this. I have tried Java in NetBeans, using loops to run a query for each row and build the new table.
But I think this is inefficient and takes unnecessarily long because of the number of queries Java has to issue.
I wonder if this can be done with just a few queries, or by creating extra tables. I haven't come up with a good solution yet.
Are you looking for joins like this?
select t1.epochnum, p.st1, p.st2, t1.satnum,
(t2.l1 - t1.l1) as dl1,
(t2.l2 - t1.l2) as dl2,
(t2.c1 - t1.c1) as dc1,
(t2.p1 - t1.p1) as dp1,
(t2.p2 - t1.p2) as dp2
from t t1 join
t t2
on t1.epochnum = t2.epochnum and
t1.satnum = t2.satnum join
pairs p
on t1.st = p.st1 and t2.st = p.st2
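To show that a single set-based statement can replace the per-row loop, here is a sketch that loads the epoch-1 sample rows and fills the result table with one INSERT ... SELECT. It assumes SQLite via Python's sqlite3 module instead of PostgreSQL, and the target table name `diffs` is hypothetical; `t` and `pairs` follow the answer's naming. With these sample rows, satellite 2 comes out as 60, 60, 70, 70 rather than the 65, 65, 75, 75 listed in the question, which looks like an arithmetic slip there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (st INTEGER, epochnum INTEGER, satnum INTEGER,
                l1 INTEGER, l2 INTEGER, c1 INTEGER, p1 INTEGER, p2 INTEGER);
CREATE TABLE pairs (st1 INTEGER, st2 INTEGER);
-- "diffs" is a hypothetical name for the new table.
CREATE TABLE diffs (epochnum INTEGER, st1 INTEGER, st2 INTEGER, satnum INTEGER,
                    dl1 INTEGER, dl2 INTEGER, dc1 INTEGER, dp1 INTEGER, dp2 INTEGER);
""")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?, ?, ?, ?)", [
    (1, 1, 1, 10, 11, 12, 13, 14),
    (1, 1, 2, 15, 16, 17, 18, 19),
    (20, 1, 1, 30, 41, 52, 63, 74),
    (20, 1, 2, 75, 76, 87, 88, None),
])
conn.execute("INSERT INTO pairs VALUES (1, 20)")

# One statement: for every listed station pair, match rows on epoch and
# satellite, then subtract the st1 observation from the st2 observation.
conn.execute("""
INSERT INTO diffs
SELECT a.epochnum, p.st1, p.st2, a.satnum,
       b.l1 - a.l1, b.l2 - a.l2, b.c1 - a.c1, b.p1 - a.p1, b.p2 - a.p2
FROM pairs AS p
JOIN t AS a ON a.st = p.st1
JOIN t AS b ON b.st = p.st2
           AND b.epochnum = a.epochnum
           AND b.satnum = a.satnum
""")
result = conn.execute(
    "SELECT * FROM diffs ORDER BY epochnum, satnum").fetchall()
for row in result:
    print(row)
```

A NULL in either operand propagates to the difference, so dp2 for satellite 2 stays NULL, as in the expected table.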

Find the largest value from column using SQL?

I am using SQL and have columns with values like:
A B
X1 2 4 6 8 10
X2 2 33 44 56 78 98 675 891 11111
X3 2 4 672 234 2343 56331
X4 51 123 232 12 12333
I want a query that returns the col A and col B values of the row whose col B contains the most values, i.e. the output should be
X2 2 33 44 56 78 98 675 891 11111
Query I use:
select max(B) from table
Results in
51 123 232 12 12333
Assuming that both columns are strings, and that column B uses single space for separators and no leading/trailing spaces, you can use this approach:
SELECT A, B
FROM MyTable
ORDER BY LENGTH(B)-LENGTH(REPLACE(B, ' ', '')) DESC
FETCH FIRST 1 ROW ONLY
The heart of this solution is the expression LENGTH(B)-LENGTH(REPLACE(B, ' ', '')), which counts the number of spaces in the string B.
Note: FETCH FIRST N ROWS ONLY is Oracle-12c syntax. For earlier versions use ROWNUM approach described in this answer.
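The space-counting expression can be tried out on the question's data. This sketch assumes SQLite through Python's sqlite3 module rather than Oracle, so LIMIT 1 stands in for FETCH FIRST 1 ROW ONLY; the column alias `value_count` is a hypothetical name, and the single-space assumption from above still applies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (a TEXT, b TEXT)")
conn.executemany("INSERT INTO my_table VALUES (?, ?)", [
    ("X1", "2 4 6 8 10"),
    ("X2", "2 33 44 56 78 98 675 891 11111"),
    ("X3", "2 4 672 234 2343 56331"),
    ("X4", "51 123 232 12 12333"),
])

# Removing the spaces shortens the string by exactly the number of spaces;
# the number of values is one more than the number of separators.
top = conn.execute("""
SELECT a, b, LENGTH(b) - LENGTH(REPLACE(b, ' ', '')) + 1 AS value_count
FROM my_table
ORDER BY value_count DESC
LIMIT 1
""").fetchone()
print(top)  # ('X2', '2 33 44 56 78 98 675 891 11111', 9)
```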
In case there is more than one separating space, or more than one row meets the criteria, do this: count the number of spaces (or groups of spaces) in each row using regexp_count(). Use rank() to find the rows with the most (groups of) spaces. Take only the rows ranked 1:
select *
from (select t.*, rank() over (order by regexp_count(b, ' +') desc) rnk from t)
where rnk = 1

Combining Rows and Taking Averages

I have a table that looks like
#Sector max1 avg1 max2 avg2 numb
C 133 14 45 3 27
N 174 9 77 3 18
M 63 3 28 1 16
I would like to combine rows N and M into one row called X, taking the max of max1 and max2 and the average of avg1, avg2, and numb in their respective columns, to return
#Sector max1 avg1 max2 avg2 numb
C 133 14 45 3 27
X 174 6 77 2 17
Try this way:
select sector, max1,avg1,max2,avg2,numb
from tab
where sector not in ('M','N')
union all
select 'X' as sector, max(max1),avg(avg1),max(max2),avg(avg2),avg(numb)
from tab
where sector in ('M','N')
something like:
select
case when sector in ('N','M') then 'X' else sector end sect,
max(max1) max1,
avg(avg1) avg1,
max(max2) max2,
avg(avg2) avg2,
avg(numb) numb
from tabname
group by
case when sector in ('N','M') then 'X' else sector end
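The CASE-based grouping can be verified against the sample table. This is a sketch assuming SQLite via Python's sqlite3 module; the table name `tab` is borrowed from the first answer, and grouping by the alias `sect` relies on SQLite accepting column aliases in GROUP BY, which not every database does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE tab (sector TEXT, max1 INTEGER, avg1 INTEGER,
                                  max2 INTEGER, avg2 INTEGER, numb INTEGER)""")
conn.executemany("INSERT INTO tab VALUES (?, ?, ?, ?, ?, ?)", [
    ("C", 133, 14, 45, 3, 27),
    ("N", 174, 9, 77, 3, 18),
    ("M", 63, 3, 28, 1, 16),
])

# Relabel N and M as X, then aggregate per label; C stays a group of one row.
merged = conn.execute("""
SELECT CASE WHEN sector IN ('N', 'M') THEN 'X' ELSE sector END AS sect,
       MAX(max1), AVG(avg1), MAX(max2), AVG(avg2), AVG(numb)
FROM tab
GROUP BY sect
ORDER BY sect
""").fetchall()
for row in merged:
    print(row)
```

AVG returns floating-point values here, so the X row reads 174, 6.0, 77, 2.0, 17.0, matching the expected output in the question.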