How to calculate the variance of a field in Hive?

I want to get the variance of a column in a source table.
What function does this in Hive, similar to the "avg" UDF?
select udf_function(column) from sourcetable;
Many thanks.

There is the variance(col) function, which returns the variance of a numeric column in the group. This page has some examples, like:
select variance(sal) from Tri100;
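To make the result easier to interpret, here is a minimal Python sketch of the two common definitions. As I understand the Hive UDF documentation, variance(col) (an alias of var_pop(col)) returns the population variance, while var_samp(col) returns the unbiased sample variance, so `select variance(sal), var_samp(sal) from Tri100;` would give both. The sample values below are made up.

```python
import math
import statistics

# Made-up values standing in for a numeric column such as "sal".
values = [1.0, 2.0, 3.0, 4.0]

n = len(values)
mean = sum(values) / n
sum_sq_dev = sum((x - mean) ** 2 for x in values)

# Population variance: divide by n (what Hive's variance() / var_pop() compute).
pop_var = sum_sq_dev / n

# Sample (unbiased) variance: divide by n - 1 (what Hive's var_samp() computes).
samp_var = sum_sq_dev / (n - 1)

# Cross-check the hand-written formulas against the standard library.
assert math.isclose(pop_var, statistics.pvariance(values))
assert math.isclose(samp_var, statistics.variance(values))

print(pop_var, samp_var)
```

The two only differ noticeably for small groups; for large groups n and n - 1 are nearly the same.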

Related

Is there a way to change a data type of a temporary column?

I have a query in which I divide two columns and multiply the result by 100 in a new column (DeathPercentage). But the data type of this new column is automatically set to bigint. Is there a way I could change it to decimal?
SELECT location, total_deaths, population.population, (total_deaths/population.population)*100 as DeathPercentage FROM covid_deaths
LEFT OUTER JOIN population
ON covid_deaths.location = population.country
WHERE covid_deaths.continent !='null'
Postgres has two forms of cast. In this case you can use either:
(cast(total_deaths as numeric)/population.population)*100
OR
(total_deaths::numeric/population.population)*100
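The first uses standard SQL syntax; the second is a PostgreSQL extension.
cast((total_deaths/population.population)*100 as numeric)
Note that this version casts only after the integer division has already truncated the quotient, so the fractional part is lost; cast one of the operands instead, as above.
I don't know if your language has this, but Java has something called casting. That might work for your language too!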
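Why the position of the cast matters can be shown with a small Python sketch. It assumes made-up figures, uses a helper (`int_div`, hypothetical) that truncates toward zero the way PostgreSQL divides two integers, and uses `Decimal` as a stand-in for PostgreSQL's numeric type.

```python
from decimal import Decimal

# Made-up figures; in the question these come from covid_deaths and population.
total_deaths = 1234
population = 100_000


def int_div(a: int, b: int) -> int:
    """Integer division truncating toward zero, like PostgreSQL's bigint / bigint."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


# (total_deaths/population)*100 with two integer operands: the quotient is truncated first.
bigint_pct = int_div(total_deaths, population) * 100

# (cast(total_deaths as numeric)/population)*100: casting an operand keeps the fraction.
numeric_pct = Decimal(total_deaths) / Decimal(population) * 100

# cast((total_deaths/population)*100 as numeric): the truncation has already happened.
late_cast_pct = Decimal(int_div(total_deaths, population) * 100)

print(bigint_pct, numeric_pct, late_cast_pct)
```

Only the version that converts an operand before dividing recovers the fractional percentage.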

SQL group by middle part of string

I have a string column whose values usually look approximately like this:
https://mapy.cz/zakladni?x=16.3360208&y=49.6718038&z=8&source=firm&id=13123554
https://mapy.cz/turisticka?x=15.9380354&y=50.1990211&z=11&source=base&id=2197
https://mapy.cz/turisticka?x=12.8611357&y=49.8051338&z=16&source=base&id=1703157
I would like to group the data by source, which is part of the string: the four letters after "source=" (in the first case above: firm), and then simply count them. Is there a way to do this directly in SQL? I am using Hadoop.
The data is a set of strings like the ones above. My expected result is a summary table with two columns: 1) each type of source (there are about 20 possible values and their lengths differ, so I cannot use a simple substring; ideally I am looking for a solution that says: for the grouping, use the four letters that come after "source="), and 2) the count of their occurrences across all the strings.
There is just one source type in each string.
You can use regexp_extract():
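-- index 0 returns the whole match; substr(..., 8) skips the 7 characters of "source="
select substr(regexp_extract(url, 'source=[^&]+', 0), 8) as source_type, count(*) as cnt
from mytable  -- placeholder table name
group by substr(regexp_extract(url, 'source=[^&]+', 0), 8);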
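The same extract-then-group logic can be sketched in Python using the three sample URLs from the question. The pattern captures everything between "source=" and the next "&", so it handles sources of any length; `extract_source` is a hypothetical helper name.

```python
import re
from collections import Counter

urls = [
    "https://mapy.cz/zakladni?x=16.3360208&y=49.6718038&z=8&source=firm&id=13123554",
    "https://mapy.cz/turisticka?x=15.9380354&y=50.1990211&z=11&source=base&id=2197",
    "https://mapy.cz/turisticka?x=12.8611357&y=49.8051338&z=16&source=base&id=1703157",
]

# "source=" followed by everything up to the next "&" (or the end of the string).
SOURCE_RE = re.compile(r"source=([^&]+)")


def extract_source(url):
    """Return the value of the source parameter, or None if it is absent."""
    m = SOURCE_RE.search(url)
    return m.group(1) if m else None


# Rough equivalent of: SELECT source_type, count(*) ... GROUP BY source_type
counts = Counter(extract_source(u) for u in urls)
print(counts)
```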
In MSSQL you can use CHARINDEX to get the position of the substring and extract the value:
;with cte as (
SELECT SUBSTRING('https://mapy.cz/zakladni?x=16.3360208&y=49.6718038&z=8&source=firm&id=13123554',
charindex('&source=','https://mapy.cz/zakladni?x=16.3360208&y=49.6718038&z=8&source=firm&id=13123554')
+8,4) AS ExtractString )
select ExtractString,count(ExtractString) as count from cte group by ExtractString;
HiveQL has an equivalent of CHARINDEX: the LOCATE function.
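The position-based answer takes a fixed four characters after "&source=", which only works for four-letter sources. Here is a minimal Python sketch contrasting that with reading up to the next "&". Both helper names are hypothetical, and note that Python's `str.find` is 0-based and returns -1 when nothing is found, whereas CHARINDEX is 1-based.

```python
URL = "https://mapy.cz/zakladni?x=16.3360208&y=49.6718038&z=8&source=firm&id=13123554"
MARKER = "&source="  # 8 characters: the "+8" in the T-SQL answer


def extract_fixed(url, width=4):
    """Mimic SUBSTRING(url, CHARINDEX('&source=', url) + 8, 4)."""
    start = url.find(MARKER)
    if start < 0:
        return None
    start += len(MARKER)
    return url[start:start + width]


def extract_until_amp(url):
    """Take everything after '&source=' up to the next '&' or the end of the string."""
    start = url.find(MARKER)
    if start < 0:
        return None
    start += len(MARKER)
    end = url.find("&", start)
    return url[start:] if end < 0 else url[start:end]


print(extract_fixed(URL), extract_until_amp(URL))
```

With a made-up longer value such as "source=search", the fixed-width version returns "sear" while the second one returns "search".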

Extracting value from curly bracket SQL

I am trying to figure out how to extract a value from curly brackets in a column in Presto SQL.
The field looks like this:
rates
{"B":750}
{"B":1600}
{"B":900}
I want to extract the number values only in each bracket.
Also, if I want to divide that by 10 and then by 20, would that be easy to add to the query?
The rates column is of type map(varchar, bigint).
Since the rates column is of type map(varchar, bigint), you can use Presto's map functions and operators on it. Examples:
SELECT rates['B'] FROM ... -- value under key "B"
SELECT map_values(rates) FROM ... -- all values in a map
See more in the Presto documentation.
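For the follow-up about dividing by 10 and then by 20: rates['B'] is a bigint, and in Presto dividing a bigint by an integer is integer division, so casting first (for example `cast(rates['B'] AS double) / 10 / 20`) keeps the fraction. The Python sketch below models the map column as a list of dicts built from the sample rows; it is an illustration, not Presto itself.

```python
# Sample rows standing in for the map(varchar, bigint) rates column.
rates_rows = [{"B": 750}, {"B": 1600}, {"B": 900}]

# rates['B']        -> look up key "B" in each map
values_b = [r["B"] for r in rates_rows]

# map_values(rates) -> all values of each map
all_values = [list(r.values()) for r in rates_rows]

# rates['B'] / 10 / 20 with integer operands truncates at each step
# (Python's // floors, which matches truncation for these non-negative values).
truncated = [v // 10 // 20 for v in values_b]

# cast(rates['B'] AS double) / 10 / 20 keeps the fractional part.
scaled = [float(v) / 10 / 20 for v in values_b]

print(values_b, all_values, truncated, scaled)
```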
Use something like this, where the regexp_extract function pulls the number out of your string and the cast function converts it from a string to a number, which you can then divide by 10 and so on.
select cast(regexp_extract(rates, '\d+') as double) / 10
from my_table
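This answer presupposes that rates holds text such as '{"B":750}'; since the column is actually a map, the subscript approach in the previous answer is more direct. Under that text assumption, the extraction and the two divisions look like this in a Python sketch (`first_number` is a hypothetical helper):

```python
import re

# Assumes text values like these, which is what the regexp_extract answer presupposes.
raw = ['{"B":750}', '{"B":1600}', '{"B":900}']


def first_number(s):
    """Return the first run of digits as a float (same pattern as the query), or None."""
    m = re.search(r"\d+", s)
    return float(m.group()) if m else None


# cast(regexp_extract(rates, '\d+') as double) / 10 / 20
result = [first_number(s) / 10 / 20 for s in raw]
print(result)
```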

How to get maximum size used by a string in Hive?

I want to know the maximum length of the values in a particular string column.
I tried the approach mentioned in "how to get the max size used by a field in table", but that did not work in Hive.
In that example they use len; in Hive, use length instead:
select max(length(mycolumn)) from mytable;
This works fine in HiveQL.
You can also check the length of a column in the WHERE clause:
select max(length(column_name)), min(length(column_name)) from table_name where length(column_name) < 15;
Here, the maximum and minimum column lengths are computed, and the WHERE clause restricts the rows to those whose value is shorter than 15 characters.
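Both queries can be sketched in Python over a made-up column, with None playing the role of SQL NULL. In SQL, length(NULL) is NULL and max/min ignore NULLs, so the sketch skips None before aggregating.

```python
# Made-up column values; None stands in for SQL NULL.
column = ["abc", "hello world", None, "hi", "a much longer string value"]

# length(column_name) for every non-NULL row.
lengths = [len(s) for s in column if s is not None]

# select max(length(column_name)), min(length(column_name)) from table_name
max_len, min_len = max(lengths), min(lengths)

# ... where length(column_name) < 15
short = [n for n in lengths if n < 15]
max_short, min_short = max(short), min(short)

print(max_len, min_len, max_short, min_short)
```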

how to return the value of a scalar function in db2

I have a DB2 function that returns an integer. As far as I know, the only way to see this function working is to use it to return a column in a query, like the example below.
Is there a way to display the return value of a function for a given parameter without building a more complex query?
Example
I have a function
myfoo(index integer) returns integer ...
And I am using it in a more complex query like
select myIndex, myfoo(myIndex), myValue from MyTable...
If I try to get the following
select from myfoo(3)
it will not work.
Is there any db2 function to print out the return value of that function without error?
SELECT myfoo(3) FROM SYSIBM.SYSDUMMY1
SYSIBM.SYSDUMMY1 is a special "dummy" table that contains a single row, the equivalent of Oracle's DUAL.
If the compatibility vector is set, you can even use Oracle's DUAL table. http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.apdv.porting.doc/doc/r0052874.html
You can also use the VALUES statement. For example:
values myfoo(3)
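Both tricks can be tried without a DB2 instance in a rough sketch that uses Python's built-in sqlite3 module as a stand-in. The body of myfoo is hypothetical (it just multiplies by 10), and the one-row table only mimics SYSIBM.SYSDUMMY1, whose single column IBMREQD holds 'Y'. Unlike DB2, SQLite would also accept a bare `SELECT myfoo(3)` with no FROM clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical stand-in for the DB2 scalar function myfoo(index integer) returns integer.
conn.create_function("myfoo", 1, lambda index: index * 10)

# A one-row table mimicking SYSIBM.SYSDUMMY1.
conn.execute("CREATE TABLE sysdummy1 (ibmreqd CHAR(1))")
conn.execute("INSERT INTO sysdummy1 VALUES ('Y')")

# SELECT myfoo(3) FROM SYSIBM.SYSDUMMY1
via_dummy = conn.execute("SELECT myfoo(3) FROM sysdummy1").fetchone()[0]

# VALUES myfoo(3)
via_values = conn.execute("VALUES (myfoo(3))").fetchone()[0]

conn.close()
print(via_dummy, via_values)
```

Because the dummy table has exactly one row, the SELECT returns the function's value exactly once, which is the whole point of SYSDUMMY1 and DUAL.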