SSRS - column calculations - SQL

In SQL Server Reporting Services within Visual Studio, I created a report which has detail lines and a total line. I try to subtract the value in the total line from the value in the detail line, and I get a result of zero, which is incorrect. See the example below:
            Col A   Col B
Detail      4.7     4.7 - 4.0
lines       3.7     3.7 - 4.0
            3.5     3.5 - 4.0
Total/AVG   4.0
In column B, I take the figure from the detail line in column A and subtract the Total line from it, but I get zero instead of 0.7, etc.

You need to include the scope for calculating the average within your detail row. If you are doing this at the group level, aggregate over the table's group:
=Fields!MyField.Value - AVG(Fields!MyField.Value, "table1_Group1")
If it is at the dataset level, you can do the same with the dataset:
=Fields!MyField.Value - AVG(Fields!MyField.Value, "MyDataset")
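If you would rather compute the difference in the dataset query itself instead of in the report expression, a window function gives the same per-row result. This is only a sketch; MyTable is a placeholder and MyField matches the expression above:
SELECT MyField,
       MyField - AVG(MyField) OVER () AS DiffFromAvg
FROM MyTable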


How to compare a value against a column containing CSV values in Postgres?

I have a table called device_info that looks like the below (only a sample provided):
device_ip      cpu        memory
100.33.1.0     10.0       29.33
110.35.58.2    3.0, 2.0   20.47
220.17.58.3    4.0, 3.0   23.17
30.13.18.8     -1         26.47
70.65.18.10    -1         20.47
10.25.98.11    5.0, 7.0   19.88
12.15.38.10    7.0        22.45
Now I need to compare a number, say 3, against the cpu column values and get the rows where the cpu is greater than that. Since the cpu column values are stored as CSV, I am not sure how to do the comparison.
I found that Postgres has a string_to_array function which converts CSV to an array, so I tried the query below, but it didn't work out:
select device_ip, cpu, memory
from device_info
where 3 > any(string_to_array(cpu, ',')::float[]);
What am I doing wrong?
Expected output:
device_ip      cpu        memory
100.33.1.0     10.0       29.33
220.17.58.3    4.0, 3.0   23.17
10.25.98.11    5.0, 7.0   19.88
12.15.38.10    7.0        22.45
The statement as-is is saying "3 is greater than my array value". What I think you want is "3 is less than my array value".
Switch > to <.
select device_ip, cpu
from device_info
where 3 < any(string_to_array(cpu, ',')::float[]);
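As a quick sanity check of the corrected predicate, you can run it against the sample rows inline (a sketch only; the real table is device_info as described above):
WITH device_info(device_ip, cpu, memory) AS (
    VALUES ('100.33.1.0',  '10.0',     29.33),
           ('110.35.58.2', '3.0, 2.0', 20.47),
           ('220.17.58.3', '4.0, 3.0', 23.17),
           ('30.13.18.8',  '-1',       26.47),
           ('70.65.18.10', '-1',       20.47),
           ('10.25.98.11', '5.0, 7.0', 19.88),
           ('12.15.38.10', '7.0',      22.45)
)
SELECT device_ip, cpu, memory
FROM device_info
WHERE 3 < ANY (string_to_array(cpu, ',')::float[]);
This returns the four rows listed in the expected output.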

Pandas count rows before/after current row

I need to calculate some measures on a window of my dataframe, with the value of interest in the centre of the window. To be clearer, an example: if I have a dataset of 10 rows and a window size of 2, then when I am on the 5th row I need to compute, for example, the mean of the values in the 3rd, 4th, 5th, 6th and 7th rows. When I am on the first row I will not have any previous rows, so I need to use only the following ones (in the example, the mean of the 1st, 2nd and 3rd rows); if there are some rows but not enough, I need to use all the rows that are present (so, for example, if I am on the 2nd row I will use the 1st, 2nd, 3rd and 4th). How can I do that? As the title of my question suggests, my first idea was to count the number of rows preceding and following the current one, but I don't know how to do that. I am not forced to use this method, so if you have any suggestions for a better approach feel free to share it.
What you want is a rolling mean with min_periods=1, center=True:
df = pd.DataFrame({'col': range(10)})
N = 2 # numbers of rows before/after to include
df['rolling_mean'] = df['col'].rolling(2*N+1, min_periods=1, center=True).mean()
output:
col rolling_mean
0 0 1.0
1 1 1.5
2 2 2.0
3 3 3.0
4 4 4.0
5 5 5.0
6 6 6.0
7 7 7.0
8 8 7.5
9 9 8.0
I assume that you have the target_row and window_size numbers as an input. You are trying to do an operation on a window_size of rows around the target_row in a dataframe df, and I gather from your question that you already know that you can't just grab +/- the window size, because it might exceed the size of the dataframe. Instead, just quickly define the resulting start and end rows based on the dataframe size, and then pull out the window you want:
start_row = max(target_row - window_size, 0)
end_row = min(target_row + window_size, len(df)-1)
window = df.iloc[start_row:end_row+1,:]
Then you can perform whatever operation you want on the window such as taking an average with window.mean().
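Putting the two pieces together, here is a minimal sketch (the column name 'col' and the helper name window_mean are assumptions) that applies the manual window to every row and reproduces the rolling result above:
import pandas as pd

df = pd.DataFrame({'col': range(10)})
N = 2  # rows before/after to include

def window_mean(df, target_row, window_size):
    # Clamp the window to the dataframe bounds instead of failing at the edges
    start_row = max(target_row - window_size, 0)
    end_row = min(target_row + window_size, len(df) - 1)
    return df.iloc[start_row:end_row + 1]['col'].mean()

df['manual_mean'] = [window_mean(df, i, N) for i in range(len(df))]
# df['manual_mean'] matches df['col'].rolling(2*N+1, min_periods=1, center=True).mean()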

SQL Compound Growth Calculation based on previous rows (varying rate)

Given a column for 'Growth Factors' and a starting value I need to compute future values. For example, if a starting value of 1 is provided then the computed 'Value' column would be as shown below. Thus, Value(t2) = Value(t1) x Growth_Factor(t2). Base condition is Value(t1) = Starting_Value x Growth_Factor(t1). Example shown below.
How do I compute this in SQL (or Presto) where the computed value is dependent on previous computed values?
Growth Factor    Value    Time
1.2              1.2      1
1.1              1.32     2
1.5              1.98     3
1.7              3.366    4
You could sum the logarithms and invert when finished. This will work other than some possibility of small floating point error. But you're also going to introduce error once you multiply more than a few numbers with doubling decimal places at every iteration.
exp(
    sum(ln(growth)) over (order by time)
)
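For completeness, a sketch of how that expression sits in a full query (growth_rates is an assumed table name; the column names growth and time and the starting value of 1 follow the example):
select
    time,
    growth,
    1 * exp(sum(ln(growth)) over (order by time)) as value
from growth_rates
order by time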

Finding significant values from a series

I have a series with an index, and the counts can range from 0 to 1000.
I can select all the entries where the value is greater than 3.
But after looking at the data, I decided to select all the entries where the value is more than 10, because some values are significantly higher than others!
s[s > 3].dropna()
-PB-[variable][variable] 8.0
-[variable] 15.0
-[variable][variable] 6.0
A-[variable][variable] 5.0
B 5.0
B-[variable][variable] 5.0
Book 4.0
Bus 8.0
Date 5.0
Dear 1609.0
MR 4.0
Man[variable] 4.0
Number[variable] 5.0
PM[variable] 4.0
Pickup 12.0
Pump[variable] 5.0
RJ 9.0
RJ-[variable]-PB-[variable][variable] 6.0
Time[variable] 6.0
[variable] 103.0
[variable][variable] 15.0
I have refined my query to something like this...
s[s > 10].dropna()
-[variable] 15.0
Dear 1609.0
Pickup 12.0
[variable] 103.0
[variable][variable] 15.0
Is there any function in pandas to return the significant entries? I can sort in descending order and select the first 5 or 10, but there is no guarantee that those entries will be very high compared to the average; in that case I would prefer to select all the entries.
In other words, I decided on the threshold of 10 in this case after looking at the data. Is there any method to select that value programmatically?
Selecting a threshold value with the quantile method might be a better solution, but it is still not the exact answer.
You can use the .head function to select the top rows (5 by default) and .sort_values to sort first; if you want the top 10, pass 10 to head. Since s is a Series (not a DataFrame), simply call:
s[s > 10].sort_values(ascending=False).head(10)
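If you want the threshold itself to come from the data, the quantile idea can be applied directly to the series. A sketch (the 0.9 quantile and the two-standard-deviation rule are both assumptions to tune; the small series here just stands in for yours):
import pandas as pd

# Stand-in for the count series from the question
s = pd.Series({'Dear': 1609.0, '[variable]': 103.0, 'Pickup': 12.0, 'Bus': 8.0, 'Book': 4.0})

# Quantile-based cut-off: keep roughly the top 10% of values
threshold = s.quantile(0.9)
significant = s[s > threshold].sort_values(ascending=False)

# Alternative: keep values more than 2 standard deviations above the mean
outliers = s[s > s.mean() + 2 * s.std()]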

Assign titles to a column according to percentiles in SQL

I'm trying to translate a Python problem into SQL code.
I would like to assign titles according to the grade in a new column.
For example:
A for the top 0.9% of the column
B for next 15% of the column
C for next 25% of the column
D for next 30% of the column
E for next 13% of the column
F for rest of the column
There is this column:
Grades
2.3
3
2
3.3
3.5
3.6
3.2
2.1
2.3
3.7
3.3
3.1
4.4
4.3
1.4
4.5
3.5
I don't know how SQLite can handle this, since it doesn't have a quantile function like languages such as R have.
Something I tried, but it's not even close, is this:
SELECT x
FROM MyTable
ORDER BY x
LIMIT 1
OFFSET (SELECT COUNT(*) FROM MyTable) / 2
to get at half of the column.
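For what it's worth, SQLite 3.25+ does ship window functions, so one possible sketch is to rank each grade with PERCENT_RANK() and bucket it by the cumulative percentages listed above (MyTable and the exact cut-off values are assumptions based on the question):
SELECT Grades,
       CASE
           WHEN pr >= 0.991 THEN 'A'  -- top 0.9%
           WHEN pr >= 0.841 THEN 'B'  -- next 15%
           WHEN pr >= 0.591 THEN 'C'  -- next 25%
           WHEN pr >= 0.291 THEN 'D'  -- next 30%
           WHEN pr >= 0.161 THEN 'E'  -- next 13%
           ELSE 'F'                   -- rest
       END AS Title
FROM (SELECT Grades,
             PERCENT_RANK() OVER (ORDER BY Grades) AS pr
      FROM MyTable);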