NormDist & Square Root calculation in SQL - sql

Please check the below screenshot for data classification.
Just wondering is there any easy way to calculate NORMDIST, SQRT & PRODUCT using SQL commands just as like as I did below to calculate the AVG & VAR.
Please note the below code I was using for another purpose and doesn't suit the screenshot values.
SELECT state, AVG([v2t] * 1.0) FROM portability WHERE state = 'Queensland' GROUP BY state
SELECT state, STDEV([v2t] * 1.0) FROM portability WHERE state = 'Queensland' GROUP BY state

Assuming Oracle:
In Oracle/PLSQL, the sqrt function:
sqrt( n )
For Example
sqrt(9) would return 3
sqrt(37) would return 6.08276253029822
sqrt(5.617) would return 2.37002109695251
For NormDist, try CUM_DIST. The aggregate functions are documented in
http://docs.oracle.com/cd/E11882_01/server.112/e10592/functions003.htm#SQLRF20035
I do not believe Oracle has a function for product.

Related

Query N closest database entries using active record query interface

In my Rails project I have a table called "items", where each item has a latitude and longitude property.
When I retrieve the items for my client, I fetch the current user's longitude and latitude. Using these two sets of coordinates, I want to use a haversine function and fetch 100 results with the smallest haversine distance.
The SQL should look something like this:
SELECT
*,
haversine_distance(user_lat, user_long, item_lat, item_long) as distance
FROM
db.items
WHERE
item_lat < {user_lat} +1
AND item_lat > {user_lat}+1
AND ..
AND ..
ORDER BY distance DESC
LIMIT 100
but this would necessitate using a custom function in the query, which I'm not sure how to do.
Also, regarding lines 7 and 8 of the query, I'm attempting to a where clause to quickly filter the set of results within some reasonable distance so I don't have to apply the haversine to the entire table.
Right now, I have the function defined in a singleton called DistanceService, but I assume I cannot use Ruby functions in the query.
When I tried to add the function haversine_distance to the controller and call it like:
#items = Item.select("items.*, haversine_distance(geo_long, geo_lat,
#{#current_user.geo_long}, #{#current_user.geo_lat}) AS distance")
.where(status: 'available', ....)
I was met with
FUNCTION ****_api_development.haversine_distance does not exist. This leads me to think the function should be somehow defined in SQL first via some migration, but I'm not very familiar with SQL and don't know how I would do that with a migration.
I'm also wondering if there's a solution that would support pagination, as it will hopefully eventually become a requirement for my project.
Thanks.
In Rails you define database functions through migrations:
class AddHaversineDistanceFunction < ActiveRecord::Migration
def up
execute <<~SQL
DELIMITER $$
DROP FUNCTION IF EXISTS haversine_distance$$
CREATE FUNCTION haversine_distance(
lat1 FLOAT, lon1 FLOAT,
lat2 FLOAT, lon2 FLOAT
) RETURNS FLOAT
NO SQL DETERMINISTIC
COMMENT 'Returns the distance in km
between two known points of latitude and longitude'
BEGIN
RETURN DEGREES(ACOS(
COS(RADIANS(lat1)) *
COS(RADIANS(lat2)) *
COS(RADIANS(lon2) - RADIANS(lon1)) +
SIN(RADIANS(lat1)) * SIN(RADIANS(lat2))
)) * 111.045;
END$$
DELIMITER;
SQL
end
def down
execute "DROP FUNCTION IF EXISTS haversine_distance"
end
end
The actual function here is adapted from Plum Island Media.
But you might want to check out the geocoder gem which provides this functionality and uses a better formula.

How to get From & To Ip Address from CIDR BigQuery

BigQuery provides updated geoip2 public dataset here [bigquery-publicdata -> geolite2 -> ipv4_city_blocks] which contains network column with IPv4 CIDR values.
How do I convert the CIDR values in the network column via BigQuery SQL (and not via a utility outside BigQuery) into start & end ip-address values so that I can find if an IP address is within a range or no? Would be helpful if you can provide the query to obtain the range ips for a CIDR value in the table.
Below is for BigQuery Standard SQL
#standardSQL
CREATE TEMP FUNCTION cidrToRange(CIDR STRING)
RETURNS STRUCT<start_IP STRING, end_IP STRING>
LANGUAGE js AS """
var beg = CIDR.substr(CIDR,CIDR.indexOf('/'));
var end = beg;
var off = (1<<(32-parseInt(CIDR.substr(CIDR.indexOf('/')+1))))-1;
var sub = beg.split('.').map(function(a){return parseInt(a)});
var buf = new ArrayBuffer(4);
var i32 = new Uint32Array(buf);
i32[0] = (sub[0]<<24) + (sub[1]<<16) + (sub[2]<<8) + (sub[3]) + off;
var end = Array.apply([],new Uint8Array(buf)).reverse().join('.');
return {start_IP: beg, end_IP: end};
""";
SELECT network, IP_range.*
FROM `bigquery-public-data.geolite2.ipv4_city_blocks`,
UNNEST([cidrToRange(network)]) IP_range
It took about 60 sec to process all 3,037,858 rows with result like below
This query will do the job:
# replace with your source of IP addresses
# here I'm using the same Wikipedia set from the previous article
WITH source_of_ip_addresses AS (
SELECT REGEXP_REPLACE(contributor_ip, 'xxx', '0') ip, COUNT(*) c
FROM `publicdata.samples.wikipedia`
WHERE contributor_ip IS NOT null
GROUP BY 1
)
SELECT city_name, SUM(c) c, ST_GeogPoint(AVG(longitude), AVG(latitude)) point
FROM (
SELECT ip, city_name, c, latitude, longitude, geoname_id
FROM (
SELECT *, NET.SAFE_IP_FROM_STRING(ip) & NET.IP_NET_MASK(4, mask) network_bin
FROM source_of_ip_addresses, UNNEST(GENERATE_ARRAY(9,32)) mask
WHERE BYTE_LENGTH(NET.SAFE_IP_FROM_STRING(ip)) = 4
)
JOIN `fh-bigquery.geocode.201806_geolite2_city_ipv4_locs`
USING (network_bin, mask)
)
WHERE city_name IS NOT null
GROUP BY city_name, geoname_id
ORDER BY c DESC
LIMIT 5000`
Find more details on:
https://towardsdatascience.com/geolocation-with-bigquery-de-identify-76-million-ip-addresses-in-20-seconds-e9e652480bd2
The first thing you need to check is, if that function already exists, so please refer to the BigQuery Functions and Operators documentation.
If not, you need to use Standard SQL User-Defined Functions (UDF), which lets you create a function using another SQL expression or another programming language, such as JavaScript.
Keep in mind when using UDF JavaScript function, BigQuery initializes a JavaScript environment with the function's contents on every shard of execution. There is no optimization to avoid loading the environment, so it can slow down the query.
Regarding to GeoIP2 City and Country CSV Databases site, there is a utility to convert 'network' column to start/end IPs or start/end integers. Refer to Github site for details.
January 2023 solution
Just wanted to respond to Felipe's comment here. I'm not sure why he is suggesting an alternate solution using Snowflake, as his existing solution works just fine. The only difference is that you need to create the dataset yourself.
I managed to solve this by going through the exact same steps listed in Felipe's very helpful original blog article:
Sign-up to MaxMind and download the Geolite2 databases (link)
Download the two CSV files GeoLite2-City-Blocks-IPv4.csv and GeoLite2-City-Locations-en.csv, upload them to a GCP bucket, and create tables from them. I lazily used the BQ automated schema feature and it worked just fine :)
Simply create a geolite2_locs table using a query similar to the one below (just keep or drop your columns as required for your use-case)
CREATE OR REPLACE TALBLE `dataset.geolite2_locs` OPTIONS() AS (
SELECT
ip_ref.network,
NET.IP_FROM_STRING(REGEXP_EXTRACT(ip_ref.network, r'(.*)/' )) network_bin,
CAST(REGEXP_EXTRACT(ip_ref.network, r'/(.*)' ) AS INT64) mask,
ip_ref.geoname_id,
city_ref.continent_name as continent_name,
city_ref.country_name as country_name,
city_ref.city_name as city_name,
city_ref.subdivision_1_name as subdivision_1_name,
city_ref.subdivision_2_name as subdivision_2_name,
ip_ref.latitude as latitude,
ip_ref.longitude as longitude,
FROM `geolite2`.`geolite2-ipv4` ip_ref LEFT JOIN `geolite2`.`geolite2-city-en` city_ref USING (geoname_id)
);
Adapt the query in Felipe's guide or just replace the fh-bigquery.geocode.201806_geolite2_city_ipv4_locs with your new table in his answer above.
Should take you at max 1 hour to get this going. Hope it helps.

how to get a value from the results of another column calculation in same table

I tried to retrieve a value from the calculation of several columns,
in this case try to apply the formula "(a + (a / 25 * b)) - c" to be processed using sql language which I will use in codeigniter.
I also tried using "derived table" like SELECT .... FROM (SELECT... FROM...) AS dt
but I had difficulty when applying it to my case in codeigniter
private function _get_datatables_query(){
$intvl = '2';
$tgl_stok = '2019-09-30';
$this->db->SELECT('p.hso, p.no_part, p.nama_part, jml, sp.oh, sum(p.qty)+(sum(p.qty)/25*$intvl)-sp.oh as suggest');
$this->db->FROM('penjualan p');
$this->db->JOIN('stok_part sp', 'sp.no_part = p.no_part', 'left');
$this->db->WHERE("sp.tgl = '$tgl_stok' AND p.tgl BETWEEN DATE_SUB('$tgl_stok', INTERVAL $intvl DAY) AND '$tgl_stok'");
$this->db->GROUP_BY('p.no_part');
//...other code...
}
I want a column with the alias suggest in the code to produce a calculated value of several other columns
I know writing code that I created is not in accordance with the rules of writing SQL, I tried a number of ways but it did not work. I am very grateful for your help
Instead of var you should use parameter in this way you can easly pass the value you need and avoid sqlinjection eg:
$sql = "SELECT p.hso, p.no_part, p.nama_part, jml, sp.oh, sum(p.qty)+(sum(p.qty)/25*?)-sp.oh as suggest
FROM enjualan p
LEFT JOIN stok_part sp ON sp.no_part = p.no_part
WHERE sp.tgl = ? AND p.tgl BETWEEN DATE_SUB(?, INTERVAL ? DAY) AND ?
GROUP BY p.no_part";
$this->db->query($sql, array($intval,$tgl_stok, $tgl_stok, $intvl, tgl_stok ));
}
I was solved this case.
the problem is in my query:
$this->db->SELECT('p.hso, p.no_part, p.nama_part, jml, sp.oh, sum(p.qty)+(sum(p.qty)/25*$intvl)-sp.oh as suggest');
and replace it with
$this->db->SELECT("p.hso, p.no_part, p.nama_part, sum(p.qty) as jml, sp.oh, sum(p.qty)+(sum(p.qty)/25*'$intvl')-sp.oh as s_po");
my mistake was writing quotation marks in the query

Wrong number of arguments SQL MSACCESS

I am getting an error indicating the wrong number of arguments when I run the following query:
SELECT
population_postcodes.*,
target_postcodes.*,
SQR( EXP(population_postcodes.longitude- target_postcodes.longitude, 2) + EXP(population_postcodes.latitude-target_postcodes.latitude, 2) ) as distance
FROM population_postcodes INNER JOIN target_postcodes on Population_postcodes.Population_postcode = Target_postcodes.Target_postcode;
Could anyone please suggest how I can fix this?
I have also tried the following code:
SELECT Population_postcodes.*, Target_postcodes.*
FROM population_postcodes
INNER JOIN target_postcodes
ON Population_postcodes.Population_postcode = Target_postcodes.Target_postcode
SQR( (population_postcodes.longitude- target_postcodes.longitude)^2 + (population_postcodes.latitude-target_postcodes.latitude)^2 ) as distance;
And this code:
SELECT Population_postcodes.*, Target_postcodes.*, SQR( (population_postcodes.longitude- target_postcodes.longitude)^2 + (population_postcodes.latitude-target_postcodes.latitude)^2 ) as distance
FROM population_postcodes
INNER JOIN target_postcodes
ON Population_postcodes.Population_postcode = Target_postcodes.Target_postcode;
Exp needs one parameter, you give two.
Old: EXP(population_postcodes.longitude- target_postcodes.longitude, 2)
New: (population_postcodes.longitude- target_postcodes.longitude)*(population_postcodes.longitude- target_postcodes.longitude)
Try replacing...
EXP(<expression>, 2)
...to...
<expression>^2
In Access, the EXP function returns e (the base of natural logarithms) raised to a power. To raise an expression to a power, use the ^ operator.
In your case, be careful to put brackets around the expression, for example...
(population_postcodes.longitude- target_postcodes.longitude)^2
...to force the power to be applied last. By default, the ^ operator is evaluated before the - operator.

Surrounding Areas SQL

I'm trying to get a query to work which returns a list of locId's from the database when fed a long and a lat.
Here's the sql:
eg: lat = "-37.8333300" : lon = "145.000000" : radius = (5 * 0.621371192) ^ 2
SELECT locId,longitude,latitude FROM tbliplocations WHERE (69.1*([longitude]- "&lon&") * cos("&lat&"/57.3))^2 + (69.1*([latitude]- "&lat&"))^2 < "&radius
Here's the error I receive:
The data types float and int are incompatible in the '^' operator.
I'm unsure of a workaround, can anyone point me in the right direction?
Answer:
Using SQL Server 2008 R2
SELECT city FROM tbliplocationsnew WHERE POWER((69.1*([longitude]- "&lon&") * cos("&lat&"/57.3)),2) + POWER((69.1*([latitude]- "&lat&")),2) < "&radius
Not sure what database you use, but I think that "^2" in SQL does not mean "squared" like in maths. You should use a math "power" function, like POWER(number,2) in SQL Server (since you use VB maybe you use SQL Server ?)
You need to have two of the same data type it's saying. SQL thinks "5" is an int. So, you should be able to trick it into treating it as a float, by putting "5.0" instead.