Wrong calculation of Power Query sum - 101 instead of 100

My data in the table is:
2.8202148
1.810577904
4.399182566
78.56037454
4.62585733
3.905997503
3.877795355
A normal sum gives the result as 99.9999999954482, but in the pivot table (Power Query) it gives 101, somehow.
Any suggestions?
Thanks,

If I round those numbers to their nearest integer and then sum them, I get 101. You've probably set up something to use integers instead of floating point numbers. Change it to floating point and you should be fine.
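For illustration, rounding each of the listed values to the nearest whole number gives
3 + 2 + 4 + 79 + 5 + 4 + 4 = 101
which matches the pivot table result, while the unrounded sum is roughly 100.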

Related

SQL round off issue

Consider these values which are of type MONEY (sample values and these can change)
select 4796.529 + 1585.0414 + 350.9863 + 223.3549 + 127.6314 + 479.6529 + 158.5041
for some reason I need to round each value to a scale of 3 like this
select round(4796.529, 3) + round(1585.0414, 3) + round(350.9863, 3) + round(223.3549, 3) + round(127.6314, 3) + round(479.6529, 3) + round(158.5041, 3)
But when I take the sum, they show a very minor variation: the first line of code returns 7721.7000 and the second one 7721.6990. This variation is not acceptable. What is the best way to solve this?
As Whencesoever said, your problem is a mathematical one, not a programming error.
12.5 + 11.6 = 24.1
ROUND(12.5) + ROUND(11.6) = 25
ROUND(12.5 + 11.6) = 24
I'd talk with the business and figure out where they want the rounding applied.
Also, as a side note, MONEY is a terrible datatype. If you can, you may want to consider switching to a DECIMAL. See Should you choose the MONEY or DECIMAL(x,y) datatypes in SQL Server?
When you round numbers before you sum them you will get a different result than if you round numbers after you have summed them. Simple as that. There is no way to solve this.
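If the business decides the total should match the unrounded figures, one option is to round once, after the sum, rather than per term. A sketch using the sample values from the question:
-- Rounding the total once returns 7721.700, matching the unrounded sum
select round(4796.529 + 1585.0414 + 350.9863 + 223.3549 + 127.6314 + 479.6529 + 158.5041, 3)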

I'm doing something wrong with calculating the median in Hive

My Hive table currently looks like this:
Numbers
0
0
-0.12745098
-0.218905473
0.026011561
0.235294118
-0.028
-0.052356021
0.052753355
0.008032129
0.012768817
0.115384615
0.040816327
The type is DOUBLE_TYPE. I would like to calculate the median. I would expect the answer to be 0.008032129, since this is the 7th observation when I order my numbers.
When I run this code (as suggested here How to calculate median in Hive):
select percentile_approx(Numbers, 0.5) AS Numbers
from tryout1
The answer I get is : 0.0040160642570281121. This is unexpected, and not even one of the numbers in my list! Does anyone know why Hive gives me this number, and what I should fix to make it work? If you know an entirely different way to calculate the median, I am also very interested!
Indeed, the percentile_approx function in Hive is not performing well here.
Kudos to Liza for getting an approximate answer.
From my trials:
select percentile_approx(numbers , 0.5 , 10 ) as A_mdn from tryout1 ;
-0.007249852187499999
From Liza:
select (percentile(cast((numbers*1000000) as BIGINT), 0.5))/1000000 as A_mdn from tryout1;
0.008032
You can use the percentile function to compute the median. Try casting the complete column to int or BIGINT and see if you come close to the answer. Try this:
select percentile(cast(g_rek_brutowinst as BIGINT), 0.5) AS g_rek_brutowinst from tryout1
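If casting straight to BIGINT loses too much precision for values this small, a hedged variant of the same idea is to scale before the cast and divide back afterwards, as in Liza's query above, applied to your column:
-- Assumption: g_rek_brutowinst holds the small fractional values shown above
select percentile(cast(g_rek_brutowinst * 1000000 as BIGINT), 0.5) / 1000000 as g_rek_brutowinst_median
from tryout1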

SQL Long Decimal Value Comparison

So I have two identical values that are the results of SUM functions, with the exact same length (no rounding is being done). (Update: the data type is float.)
Value_1 = 29.9539194336501
Value_2 = 29.9539194336501
The issue I'm having is when I do an IF statement for Value_1 = Value_2, it comes up as FALSE.
Value_1:
SELECT SUM([INVN_DOL]) / SUM([AVG_DLY_SLS_LST_35_DYS]) AS DSO
FROM TABLE A
Value_2:
SELECT SUM ([Total_Inventory_Val]) / SUM ([Daily_Independent_Demand])
FROM TABLE B
Any idea why they may not be exactly equal and what I can do to get a TRUE value since they do match?
Thanks in advance
The issue you are having here is that you are using calculated values held in a float, which by design is slightly imprecise at higher levels of precision, which is why you are getting your mismatch.
Use data types like decimal with a defined precision and scale to hold your values and calculation results and you should get consistent results.
You can make use of ROUND to limit the decimal places.
Or try ABS, for example checking whether ABS(Value_1 - Value_2) is below a small tolerance, and see if that works out.
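For example, here is a small sketch of both workarounds (assuming SQL Server / T-SQL; @Value_1 and @Value_2 are hypothetical stand-ins for the two SUM results from the question):
DECLARE @Value_1 float = 29.9539194336501;
DECLARE @Value_2 float = 29.9539194336501;
-- 1) Round both sides to a fixed number of decimal places before comparing
IF ROUND(@Value_1, 10) = ROUND(@Value_2, 10) PRINT 'Equal after rounding';
-- 2) Compare within a small tolerance instead of exact equality
IF ABS(@Value_1 - @Value_2) < 1e-9 PRINT 'Equal within tolerance';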

SQL - View column that calculates the percentage from other columns

I have a query from Access where I calculated the percentage score from three separate numbers. Ex:
AFPercentageMajor: [AFNumberOfMajors]/([AFTotalMajor]-[AFMajorNA])
which could have values of 20/(23-2) = 95%
I have imported this table into my SQL database and tried to write an expression in the view (changed the names of the columns a bit):
AF_Major / (AF_Major_Totals - AF_Major_NA)
I tried adding *100 to the end of the statement, but it only works if the calculation is at 100%. If it is anything less than that, it puts it as 0.
I have a feeling it just doesn't like the combination of the three separate column names. But like I said, I'm still learning, so I could be going at this completely wrong!
SQL Server does integer division when both operands are integers. You need to change one of the values to a floating point representation. The following will work:
cast([AFNumberOfMajors] as float)/([AFTotalMajor]-[AFMajorNA])
You can multiply this by 100 to get the percentage value.
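A sketch of how the view column might look with the cast and the *100 applied, using the renamed columns from the question (the table name here is hypothetical):
SELECT CAST(AF_Major AS float) / (AF_Major_Totals - AF_Major_NA) * 100 AS AFPercentageMajor
FROM AF_Scores -- hypothetical table name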

How to average values based on location proximity

I have an SQL table with geo-tagged values (longitude, latitude, value). The table accumulates quickly and has thousands of entries. Therefore, querying the table for values in some area returns a very large data set.
I would like to know a way to average values that are in close location proximity to one another; here is an illustration:
Table:
Long lat value
10.123001 53.567001 10
10.123002 53.567002 12
10.123003 53.567003 18
10.124003 53.568003 13
Let's say my current location is 10.123004, 53.567004. If I query for the values nearby I will get the four rows with values 10, 12, 18, and 13. This works if the data set is relatively small. If the data is large I would like to query SQL for the rounded location (10.123, 53.567) and need SQL to return something like:
Long lat value
10.123 53.567 13.33 (this is the average of 10, 12, and 18)
10.124 53.568 13
Is this possible? How can we average a large data set based on locations?
Is an SQL database the right choice in the first place?
GROUP BY rounded columns, and the AVG aggregate function should work fine for this:
SELECT ROUND(Long, 3) Long,
ROUND(Lat, 3) Lat,
AVG(value)
FROM Table
GROUP BY ROUND(Long, 3), ROUND(Lat, 3)
Add a WHERE clause to filter as needed.
Here's some rough pseudocode that might be a start. You need to provide the proper precision arguments for the round function in the dialect of SQL you are using for your project; the 3 I provide as the second argument to round is the number of decimals of precision to which the number is rounded, as indicated by your original post.
Select round(lat,3), round(long,3), avg(value)
From table
Group by round(lat,3), round(long,3)
The problem with the rounding approach is the boundary conditions -- what happens when points are close to the boundary.
However, for the neighborhood of a given point it is better to use something like:
select *
from table
where long between #MyLong - #DeltaLong and #MyLong + #DeltaLong and
lat between #MyLat - #DeltaLat and #MyLat + #DeltaLat
For this, you need to define #DeltaLong and #DeltaLat.
Rounding works fine for summarization, if that is your problem.
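For example, a hypothetical sketch of the proximity query with concrete deltas, using the coordinates from the question (0.001 degrees of latitude is roughly 100 m; adjust the deltas to the radius you need):
SELECT AVG(value) AS avg_value
FROM Table
WHERE Long BETWEEN 10.123004 - 0.001 AND 10.123004 + 0.001
AND Lat BETWEEN 53.567004 - 0.001 AND 53.567004 + 0.001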