how to show results of postcodes within a radius of a point - sql

Hi, back with another problem. I have a table with several columns, two of which are latitude and longitude, and another holds the crime type. I need to work out how many crimes were committed within a given number of metres of a certain point.
Specifically, I need the number of crimes that took place within 250 m, 500 m and 1 km of the point E: 307998 m, N: 188746 m.
Any help would be appreciated, or even just a push in the right direction.
Thanks

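What an interesting question. The following may help.
You can use Pythagoras's theorem to calculate the distance between a point ([100,100] in this case) and each incident, then count the rows where that distance is below a threshold and the type matches. This assumes planar coordinates in metres (such as eastings and northings), not raw latitude and longitude in degrees.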
# select * from test;
┌─────┬─────┬──────┐
│ x │ y │ type │
├─────┼─────┼──────┤
│ 100 │ 100 │ 1 │
│ 104 │ 100 │ 1 │
│ 110 │ 100 │ 1 │
│ 110 │ 102 │ 1 │
│ 50 │ 102 │ 2 │
│ 50 │ 150 │ 2 │
│ 50 │ 152 │ 3 │
│ 150 │ 152 │ 1 │
│ 40 │ 152 │ 1 │
│ 150 │ 150 │ 2 │
└─────┴─────┴──────┘
(10 rows)
select count(*) from test where sqrt((x-100)*(x-100)+(y-100)*(y-100))<30 and type = 1;
┌───────┐
│ count │
├───────┤
│ 4 │
└───────┘
(1 row)
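To get all three radii in one go, the same check can be repeated per radius. The following is only a sketch in Python using the standard-library sqlite3 module with the sample rows from above; the function name `counts_within` and its parameters are made up for illustration, and it compares squared distances so that no sqrt() is needed.

```python
import sqlite3

# Sample rows copied from the `test` table above: (x, y, type).
ROWS = [
    (100, 100, 1), (104, 100, 1), (110, 100, 1), (110, 102, 1), (50, 102, 2),
    (50, 150, 2), (50, 152, 3), (150, 152, 1), (40, 152, 1), (150, 150, 2),
]


def counts_within(rows, cx, cy, radii, crime_type=None):
    """Return {radius: row count} for rows closer than each radius to (cx, cy).

    Hypothetical helper: coordinates are assumed planar and in the same
    unit as the radii.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (x REAL, y REAL, type INTEGER)")
    conn.executemany("INSERT INTO test VALUES (?, ?, ?)", rows)
    # Compare squared distances: d^2 < r^2 is equivalent to d < r.
    base_sql = ("SELECT count(*) FROM test "
                "WHERE (x - ?) * (x - ?) + (y - ?) * (y - ?) < ?")
    counts = {}
    for r in radii:
        sql = base_sql
        params = [cx, cx, cy, cy, r * r]
        if crime_type is not None:
            # Optional filter on the crime type column.
            sql += " AND type = ?"
            params.append(crime_type)
        counts[r] = conn.execute(sql, params).fetchone()[0]
    conn.close()
    return counts


print(counts_within(ROWS, 100, 100, [5, 30, 80]))
print(counts_within(ROWS, 100, 100, [30], crime_type=1))
```

The second call repeats the query from the answer (type 1 within 30 of [100,100]) and gives the same count of 4. For the point in the question, the call would look like `counts_within(rows, 307998, 188746, [250, 500, 1000])`, provided the table stores eastings and northings in metres.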


Clickhouse: Mapping BETWEEN filtering from an array

I understand if I want to filter a column between two numbers I can use BETWEEN:
SELECT a
FROM table
WHERE a BETWEEN 1 AND 5
Is there a way of mapping the filtering to an array of values, for instance, if the array was [1, 10, ... , N]:
SELECT a
FROM table
WHERE (a BETWEEN 1 AND 1+4) OR (a BETWEEN 10 AND 10+4) OR ... OR (a BETWEEN N AND N+4)
Try this query:
WITH
[1, 10, 75] AS starts_from,
4 AS step,
arrayMap(x -> (x, x + step), starts_from) AS intervals
SELECT number
FROM numbers(100)
WHERE arrayFirstIndex(x -> number >= x.1 AND number <= x.2, intervals) != 0
/*
┌─number─┐
│ 1 │
│ 2 │
│ 3 │
│ 4 │
│ 5 │
│ 10 │
│ 11 │
│ 12 │
│ 13 │
│ 14 │
│ 75 │
│ 76 │
│ 77 │
│ 78 │
│ 79 │
└────────┘
*/
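For readers who do not know the ClickHouse array functions, the logic of the query above is: build one (start, start + step) interval per array element and keep a row if it falls into at least one of them. A rough Python sketch of that logic follows; `in_any_interval` is a made-up helper name, not a ClickHouse function.

```python
def in_any_interval(value, starts, step):
    """True if value lies in [s, s + step] for at least one s in starts."""
    return any(s <= value <= s + step for s in starts)


starts_from = [1, 10, 75]
step = 4

# Same role as `FROM numbers(100)`: the integers 0..99.
matches = [n for n in range(100) if in_any_interval(n, starts_from, step)]
print(matches)
```

This keeps the same fifteen numbers as the query result above (1 to 5, 10 to 14 and 75 to 79).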

Display COUNT(*) for every week instead of every day

Let us say that I have a table with user_id of type Int32 and login_time of type DateTime in UTC. user_id is not unique, so SELECT user_id, login_time FROM some_table; gives the following result:
┌─user_id─┬──login_time─┐
│ 1 │ 2021-03-01 │
│ 1 │ 2021-03-01 │
│ 1 │ 2021-03-02 │
│ 2 │ 2021-03-02 │
│ 2 │ 2021-03-03 │
└─────────┴─────────────┘
If I run SELECT COUNT(*) as count, toDate(login_time) as l FROM some_table GROUP BY l I get the following result:
┌─count───┬──login_time─┐
│ 2 │ 2021-03-01 │
│ 2 │ 2021-03-02 │
│ 1 │ 2021-03-03 │
└─────────┴─────────────┘
I would like to reformat the result to show COUNT on a weekly level, instead of every day, as I currently do.
My result for the above example could look something like this:
┌──count──┬──year─┬──month──┬─week ordinal┐
│ 5 │ 2021 │ 03 │ 1 │
│ 0 │ 2021 │ 03 │ 2 │
│ 0 │ 2021 │ 03 │ 3 │
│ 0 │ 2021 │ 03 │ 4 │
└─────────┴───────┴─────────┴─────────────┘
I have gone through the documentation, found some interesting functions, but did not manage to make them solve my problem.
I have never worked with ClickHouse before and am not very experienced with SQL, which is why I am asking here for help.
Try this query:
select count() count, toYear(start_of_month) year, toMonth(start_of_month) month,
toWeek(start_of_week) - toWeek(start_of_month) + 1 AS "week ordinal"
from (
select *, toStartOfMonth(login_time) start_of_month,
toStartOfWeek(login_time) start_of_week
from (
/* emulate test dataset */
select data.1 user_id, toDate(data.2) login_time
from (
select arrayJoin([
(1, '2021-02-27'),
(1, '2021-02-28'),
(1, '2021-03-01'),
(1, '2021-03-01'),
(1, '2021-03-02'),
(2, '2021-03-02'),
(2, '2021-03-03'),
(2, '2021-03-08'),
(2, '2021-03-16'),
(2, '2021-04-01')]) data)
)
)
group by start_of_month, start_of_week
order by start_of_month, start_of_week
/*
┌─count─┬─year─┬─month─┬─week ordinal─┐
│ 1 │ 2021 │ 2 │ 4 │
│ 1 │ 2021 │ 2 │ 5 │
│ 5 │ 2021 │ 3 │ 1 │
│ 1 │ 2021 │ 3 │ 2 │
│ 1 │ 2021 │ 3 │ 3 │
│ 1 │ 2021 │ 4 │ 1 │
└───────┴──────┴───────┴──────────────┘
*/
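The "week ordinal" arithmetic in the query (week number of the login minus week number of the first day of its month, plus one) can be checked outside ClickHouse. This is only a sketch in Python under one assumption: that strftime's %U numbering (weeks start on Sunday) is an acceptable stand-in for the default toWeek mode. The function names are made up for illustration.

```python
from collections import Counter
from datetime import date


def week_of_year(d):
    """Week number of the year, with Sunday as the first day of the week."""
    return int(d.strftime("%U"))


def weekly_counts(login_dates):
    """Return sorted (count, year, month, week ordinal) tuples."""
    counts = Counter()
    for d in login_dates:
        # Week of this date relative to the week containing the 1st of its month.
        ordinal = week_of_year(d) - week_of_year(d.replace(day=1)) + 1
        counts[(d.year, d.month, ordinal)] += 1
    return [(n, y, m, w) for (y, m, w), n in sorted(counts.items())]


# Same dates as the emulated test dataset in the query above.
logins = [
    date(2021, 2, 27), date(2021, 2, 28), date(2021, 3, 1), date(2021, 3, 1),
    date(2021, 3, 2), date(2021, 3, 2), date(2021, 3, 3), date(2021, 3, 8),
    date(2021, 3, 16), date(2021, 4, 1),
]

for row in weekly_counts(logins):
    print(row)
```

Like the query, this only lists weeks that contain at least one login; weeks with a count of 0 would have to be generated separately.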

Julia: how to compute a particular operation on certain columns of a Dataframe

I have the following Dataframe
using DataFrames, Statistics
df = DataFrame(name=["John", "Sally", "Kirk"],
age=[23., 42., 59.],
children=[3,5,2], height = [180, 150, 170])
print(df)
3×4 DataFrame
│ Row │ name │ age │ children │ height │
│ │ String │ Float64 │ Int64 │ Int64 │
├─────┼────────┼─────────┼──────────┼────────┤
│ 1 │ John │ 23.0 │ 3 │ 180 │
│ 2 │ Sally │ 42.0 │ 5 │ 150 │
│ 3 │ Kirk │ 59.0 │ 2 │ 170 │
I can compute the mean of a column as follows:
println(mean(df[:4]))
166.66666666666666
Now I want to get the mean of all the numeric columns and tried this code:
x = [2,3,4]
for i in x
print(mean(df[:x[i]]))
end
But got the following error message:
MethodError: no method matching getindex(::Symbol, ::Int64)
Stacktrace:
[1] top-level scope at ./In[64]:3
How can I solve the problem?
You are trying to access the DataFrame's column using an integer index specifying the column's position. You should just use the integer value i without any : before it; writing :i would create the symbol :i, and you do not have a column named i.
x = [2,3,4]
for i in x
println(mean(df[i])) # no need for `x[i]`
end
You can also index a DataFrame using a Symbol denoting the column's name.
x = [:age, :children, :height];
for c in x
println(mean(df[c]))
end
You get the following error in your attempt because you are trying to access the ith index of the symbol :x, which is an undefined operation.
MethodError: no method matching getindex(::Symbol, ::Int64)
Note that :4 is just 4.
julia> :4
4
julia> typeof(:4)
Int64
Here is a one-liner that actually selects all Number columns:
julia> mean.(eachcol(df[findall(x-> x<:Number, eltypes(df))]))
3-element Array{Float64,1}:
41.333333333333336
3.3333333333333335
166.66666666666666
For many scenarios describe is actually more convenient:
julia> describe(df)
4×8 DataFrame
│ Row │ variable │ mean │ min │ median │ max │ nunique │ nmissing │ eltype │
│ │ Symbol │ Union… │ Any │ Union… │ Any │ Union… │ Nothing │ DataType │
├─────┼──────────┼─────────┼──────┼────────┼───────┼─────────┼──────────┼──────────┤
│ 1 │ name │ │ John │ │ Sally │ 3 │ │ String │
│ 2 │ age │ 41.3333 │ 23.0 │ 42.0 │ 59.0 │ │ │ Float64 │
│ 3 │ children │ 3.33333 │ 2 │ 3.0 │ 5 │ │ │ Int64 │
│ 4 │ height │ 166.667 │ 150 │ 170.0 │ 180 │ │ │ Int64 │
In the question println(mean(df[4])) works as well (instead of println(mean(df[:4]))).
Hence we can write
x = [2,3,4]
for i in x
println(mean(df[i]))
end
which works
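For comparison only, here is the "mean of every numeric column" idea written as a plain Python sketch, with a dict of lists standing in for the DataFrame. It is not DataFrames.jl code, and `numeric_means` is a made-up name; the point is just the filter-by-element-type step that the one-liner above performs.

```python
from numbers import Number
from statistics import mean

# Same data as the DataFrame in the question, as column name -> values.
table = {
    "name": ["John", "Sally", "Kirk"],
    "age": [23.0, 42.0, 59.0],
    "children": [3, 5, 2],
    "height": [180, 150, 170],
}


def numeric_means(columns):
    """Return {column name: mean} for columns whose values are all numbers."""
    return {
        name: mean(values)
        for name, values in columns.items()
        if all(isinstance(v, Number) for v in values)
    }


means = numeric_means(table)
for name, value in means.items():
    print(name, value)
```

The name column is skipped because its values are strings, which mirrors the x <: Number test on the column element types.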

operator does not exist: integer = integer[] plpgsql error

I get the error "operator does not exist: integer = integer[]" when I try to run this query:
select staff
from affiliations
where orgUnit = any (select unnest(*) from get_ou(661));
The function get_ou(661) returns an array of integers. I was wondering why I can't use = any to get the staff from any of the org units in the array.
Thank you for your help!
The ANY predicate, when used with a subselect, compares the value against every value returned by the subselect.
postgres=# SELECT * FROM foo_table;
┌────┬───┐
│ id │ x │
╞════╪═══╡
│ 1 │ 9 │
│ 2 │ 4 │
│ 3 │ 1 │
│ 4 │ 3 │
│ 5 │ 7 │
│ 6 │ 5 │
│ 7 │ 3 │
│ 8 │ 8 │
│ 9 │ 3 │
│ 10 │ 8 │
└────┴───┘
(10 rows)
CREATE OR REPLACE FUNCTION public.foo(VARIADIC integer[])
RETURNS integer[]
LANGUAGE sql
AS $function$ SELECT $1 $function$
Strangely, your example is broken, but it fails with a syntax error rather than the error you report. When I fix it, it works:
postgres=# SELECT * FROM foo_table
WHERE x = ANY(SELECT unnest(v) FROM foo(3,8) g(v));
┌────┬───┐
│ id │ x │
╞════╪═══╡
│ 4 │ 3 │
│ 7 │ 3 │
│ 8 │ 8 │
│ 9 │ 3 │
│ 10 │ 8 │
└────┴───┘
(5 rows)
You should change the syntax and move from a subselect to an array expression (this is the preferred solution for this purpose):
postgres=# SELECT * FROM foo_table WHERE x = ANY(foo(3,8));
┌────┬───┐
│ id │ x │
╞════╪═══╡
│ 4 │ 3 │
│ 7 │ 3 │
│ 8 │ 8 │
│ 9 │ 3 │
│ 10 │ 8 │
└────┴───┘
(5 rows)
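To make the semantics concrete: x = ANY(array) is a membership test of a single value against the elements of the array, not a comparison of the value with the array as a whole. Here is a minimal Python sketch of that idea using the foo_table rows from above; `foo` mirrors the example function, and `select_where_x_any` is a made-up helper.

```python
# Rows of foo_table from the answer above: (id, x).
foo_table = [
    (1, 9), (2, 4), (3, 1), (4, 3), (5, 7),
    (6, 5), (7, 3), (8, 8), (9, 3), (10, 8),
]


def foo(*values):
    """Stand-in for the VARIADIC SQL function: returns its arguments as a list."""
    return list(values)


def select_where_x_any(rows, candidates):
    """Keep rows whose x equals at least one element of candidates."""
    wanted = set(candidates)
    return [(row_id, x) for row_id, x in rows if x in wanted]


result = select_where_x_any(foo_table, foo(3, 8))
print(result)
```

The rows kept are the same five as in the two queries above (ids 4, 7, 8, 9 and 10).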

Are there any visualization tools to realize the hierarchical style of the SQL query data as an output result?

I guess there should be some visualization tools (for MS SQL Server) that present hierarchical SQL query data as an output result.
I have a hierarchical chain of 7 tables, and I very often have to query the first and second levels of it in order to check the bottom of the chain as well as some intermediate tables.
Any clue, guys?
Thank you in advance!
P.S. It would be cool if MS SQL Management Studio could accept some plugins in its next generation... :)
Brad Schulz has a pretty amazing proc (usp_DrawTree) here:
http://bradsruminations.blogspot.com/2010/04/t-sql-tuesday-005-reporting.html
Here is one of his example outputs:
/*
                             ┌───────────┐
                             │   Anne    │
                           ┌─┤ Dodsworth │ Sales Representative
                           │ │  Ext452   │
                           │ └───────────┘
              ┌──────────┐ │
              │  Steven  │ │
            ┌─┤ Buchanan ├─┤ Sales Manager
            │ │ Ext3453  │ │
            │ └──────────┘ │
            │              │ ┌────────┐
            │              │ │ Robert │
            │              ├─┤  King  │ Sales Representative
            │              │ │ Ext465 │
            │              │ └────────┘
            │              │ ┌─────────┐
            │              │ │ Michael │
            │              └─┤ Suyama  │ Sales Representative
            │                │ Ext428  │
            │                └─────────┘
            │ ┌──────────┐
            │ │  Laura   │
            ├─┤ Callahan │ Inside Sales Coordinator
            │ │ Ext2344  │
            │ └──────────┘
┌─────────┐ │
│ Andrew  │ │
│ Fuller  ├─┤ Vice President, Sales
│ Ext3457 │ │
└─────────┘ │
            │ ┌─────────┐
            │ │  Nancy  │
            ├─┤ Davolio │ Sales Representative
            │ │ Ext5467 │
            │ └─────────┘
            │ ┌───────────┐
            │ │   Janet   │
            ├─┤ Leverling │ Sales Representative
            │ │  Ext3355  │
            │ └───────────┘
            │ ┌──────────┐
            │ │ Margaret │
            └─┤ Peacock  │ Sales Representative
              │ Ext5176  │
              └──────────┘
*/
For Oracle, anyway (I got here via the SQL tag), you can use LPAD with the associated LEVEL to format the output (similar to an expanded folder view: deeper levels get more indentation):
SELECT LEVEL,
LPAD(' ', 2 * LEVEL - 1) || first_name || ' ' ||
last_name AS employee
FROM employee
START WITH employee_id = 1
CONNECT BY PRIOR employee_id = manager_id;
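The indentation trick is not specific to Oracle: walk the hierarchy depth first and prefix each name with 2 * level - 1 spaces. A small Python sketch of that idea follows. The names are taken from the example tree above, while the employee_id and manager_id values and the function name `indented_tree` are invented for illustration.

```python
# (employee_id, manager_id, name); the ids are hypothetical.
EMPLOYEES = [
    (1, None, "Andrew Fuller"),
    (2, 1, "Steven Buchanan"),
    (3, 2, "Anne Dodsworth"),
    (4, 2, "Robert King"),
    (5, 2, "Michael Suyama"),
    (6, 1, "Laura Callahan"),
    (7, 1, "Nancy Davolio"),
    (8, 1, "Janet Leverling"),
    (9, 1, "Margaret Peacock"),
]


def indented_tree(employees, root_id):
    """Return (level, indented name) pairs in depth-first order."""
    names = {}
    children = {}
    for emp_id, manager_id, name in employees:
        names[emp_id] = name
        children.setdefault(manager_id, []).append(emp_id)

    lines = []

    def walk(emp_id, level):
        # Same padding as LPAD(' ', 2 * LEVEL - 1) in the query above.
        lines.append((level, " " * (2 * level - 1) + names[emp_id]))
        for child_id in children.get(emp_id, []):
            walk(child_id, level + 1)

    walk(root_id, 1)
    return lines


tree = indented_tree(EMPLOYEES, 1)
for level, text in tree:
    print(level, text)
```

Here root_id plays the role of START WITH employee_id = 1, and the children lookup plays the role of CONNECT BY PRIOR employee_id = manager_id.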