I understand that if I want to filter a column between two numbers, I can use BETWEEN:
SELECT a
FROM table
WHERE a BETWEEN 1 AND 5
Is there a way to map the filter over an array of values? For instance, if the array were [1, 10, ... , N]:
SELECT a
FROM table
WHERE (a BETWEEN 1 AND 1+4) OR (a BETWEEN 10 AND 10+4) OR ... OR (a BETWEEN N AND N+4)
Try this query:
WITH
[1, 10, 75] AS starts_from,
4 AS step,
arrayMap(x -> (x, x + step), starts_from) AS intervals
SELECT number
FROM numbers(100)
WHERE arrayFirstIndex(x -> number >= x.1 AND number <= x.2, intervals) != 0
/*
┌─number─┐
│ 1 │
│ 2 │
│ 3 │
│ 4 │
│ 5 │
│ 10 │
│ 11 │
│ 12 │
│ 13 │
│ 14 │
│ 75 │
│ 76 │
│ 77 │
│ 78 │
│ 79 │
└────────┘
*/
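A value only needs to fall inside one of the intervals to be kept, so the per-interval conditions are effectively OR-ed together. Here is a minimal plain-Python sketch of the same logic outside ClickHouse. It assumes closed intervals, as BETWEEN uses, and the helper names build_intervals and in_any_interval are hypothetical:

```python
def build_intervals(starts, step):
    """Turn each start into a (low, high) pair, like arrayMap(x -> (x, x + step), ...)."""
    return [(s, s + step) for s in starts]


def in_any_interval(value, intervals):
    """True if low <= value <= high for at least one interval (BETWEEN is inclusive)."""
    return any(low <= value <= high for low, high in intervals)


intervals = build_intervals([1, 10, 75], 4)
# Equivalent of SELECT number FROM numbers(100) WHERE <value is in any interval>
matches = [n for n in range(100) if in_any_interval(n, intervals)]
print(matches)  # [1, 2, 3, 4, 5, 10, 11, 12, 13, 14, 75, 76, 77, 78, 79]
```

Within ClickHouse itself, arrayExists(x -> number BETWEEN x.1 AND x.2, intervals) should express the same any-match check a little more directly than arrayFirstIndex(...) != 0.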
How can I make sure that with this join I'll only receive the sum of results and not the product?
I have a project entity with two one-to-many relations, disposals and supplies, and I query each of them separately. For disposals I use the following query:
SELECT *
FROM projects
JOIN disposals disposal on projects.project_id = disposal.disposal_project_refer
WHERE (projects.project_name = 'Höngg')
I get the following result:
project_id,project_name,disposal_id,depository_refer,material_refer,disposal_date,disposal_measurement,disposal_project_refer
1,Test,1,1,1,2020-08-12 15:24:49.913248,123,1
1,Test,2,1,2,2020-08-12 15:24:49.913248,123,1
1,Test,7,2,1,2020-08-12 15:24:49.913248,123,1
1,Test,10,3,4,2020-08-12 15:24:49.913248,123,1
The equivalent query for supplies returns the same number of rows.
type Project struct {
ProjectID uint `gorm:"primary_key" json:"ProjectID"`
ProjectName string `json:"ProjectName"`
Disposals []Disposal `gorm:"ForeignKey:disposal_project_refer"`
Supplies []Supply `gorm:"ForeignKey:supply_project_refer"`
}
If I query both tables, I would like to receive the sum of both individual queries (8 rows). Currently I am receiving 16 results (4 supply results multiplied by 4 disposal results).
The combined query:
SELECT *
FROM projects
JOIN disposals disposal ON projects.project_id = disposal.disposal_project_refer
JOIN supplies supply ON projects.project_id = supply.supply_project_refer
WHERE (projects.project_name = 'Höngg');
I have tried to achieve this with UNION queries, but I was not successful. What else should I try?
This is your case, simplified:
with a(x,y) as (values(1,1)), b(x,z) as (values(1,11),(1,22)), c(x,t) as (values(1,111),(1,222))
select * from a join b on (a.x=b.x) join c on (b.x=c.x);
┌───┬───┬───┬────┬───┬─────┐
│ x │ y │ x │ z │ x │ t │
├───┼───┼───┼────┼───┼─────┤
│ 1 │ 1 │ 1 │ 11 │ 1 │ 111 │
│ 1 │ 1 │ 1 │ 11 │ 1 │ 222 │
│ 1 │ 1 │ 1 │ 22 │ 1 │ 111 │
│ 1 │ 1 │ 1 │ 22 │ 1 │ 222 │
└───┴───┴───┴────┴───┴─────┘
It produces a Cartesian product because the join value is the same in all tables. You need an additional condition to pair up the rows. For example, you can number the rows and join on that number (tested for several cases):
with a(x,y) as (values(1,1)), b(x,z) as (values(1,11),(1,22)), c(x,t) as (values(1,111),(1,222))
select *
from a
cross join lateral (
select *
from (select row_number() over() as rn, * from b where b.x=a.x) as b
full join (select row_number() over() as rn, * from c where c.x=a.x) as c on (b.rn=c.rn)
) as bc;
┌───┬───┬────┬───┬────┬────┬───┬─────┐
│ x │ y │ rn │ x │ z │ rn │ x │ t │
├───┼───┼────┼───┼────┼────┼───┼─────┤
│ 1 │ 1 │ 1 │ 1 │ 11 │ 1 │ 1 │ 111 │
│ 1 │ 1 │ 2 │ 1 │ 22 │ 2 │ 1 │ 222 │
└───┴───┴────┴───┴────┴────┴───┴─────┘
with a(x,y) as (values(1,1)), b(x,z) as (values(1,11),(1,22),(1,33)), c(x,t) as (values(1,111),(1,222))
select *
from a
cross join lateral (
select *
from (select row_number() over() as rn, * from b where b.x=a.x) as b
full join (select row_number() over() as rn, * from c where c.x=a.x) as c on (b.rn=c.rn)
) as bc;
┌───┬───┬────┬───┬─────┬──────┬──────┬──────┐
│ x │ y │ rn │ x │ z │ rn │ x │ t │
├───┼───┼────┼───┼─────┼──────┼──────┼──────┤
│ 1 │ 1 │ 1 │ 1 │ 11 │ 1 │ 1 │ 111 │
│ 1 │ 1 │ 2 │ 1 │ 22 │ 2 │ 1 │ 222 │
│ 1 │ 1 │ 3 │ 1 │ 33 │ ░░░░ │ ░░░░ │ ░░░░ │
└───┴───┴────┴───┴─────┴──────┴──────┴──────┘
with a(x,y) as (values(1,1)), b(x,z) as (values(1,11),(1,22)), c(x,t) as (values(1,111),(1,222),(1,333))
select *
from a
cross join lateral (
select *
from (select row_number() over() as rn, * from b where b.x=a.x) as b
full join (select row_number() over() as rn, * from c where c.x=a.x) as c on (b.rn=c.rn)
) as bc;
┌───┬───┬──────┬──────┬──────┬────┬───┬─────┐
│ x │ y │ rn │ x │ z │ rn │ x │ t │
├───┼───┼──────┼──────┼──────┼────┼───┼─────┤
│ 1 │ 1 │ 1 │ 1 │ 11 │ 1 │ 1 │ 111 │
│ 1 │ 1 │ 2 │ 1 │ 22 │ 2 │ 1 │ 222 │
│ 1 │ 1 │ ░░░░ │ ░░░░ │ ░░░░ │ 3 │ 1 │ 333 │
└───┴───┴──────┴──────┴──────┴────┴───┴─────┘
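The following plain-Python sketch uses the same sample values to show why the plain join multiplies rows and the row-number pairing does not. It is only an illustration. List order stands in for row_number() over(), which in SQL has no guaranteed order without an ORDER BY. The naive_join and paired_join helpers are hypothetical:

```python
from itertools import product, zip_longest

# Stand-ins for the CTEs above: one parent row (a) and two child tables
# (b, c) that all share the join key x = 1.
a = [{"x": 1, "y": 1}]
b = [{"x": 1, "z": 11}, {"x": 1, "z": 22}, {"x": 1, "z": 33}]
c = [{"x": 1, "t": 111}, {"x": 1, "t": 222}]


def naive_join(a, b, c):
    """Join b and c to a on x only: every b row meets every c row (Cartesian product)."""
    return [
        (ra, rb, rc)
        for ra in a
        for rb, rc in product(
            [r for r in b if r["x"] == ra["x"]],
            [r for r in c if r["x"] == ra["x"]],
        )
    ]


def paired_join(a, b, c):
    """Pair the i-th b row with the i-th c row for each a row, padding with None,
    like FULL JOIN ... ON b.rn = c.rn inside the LATERAL subquery."""
    rows = []
    for ra in a:
        bs = [r for r in b if r["x"] == ra["x"]]
        cs = [r for r in c if r["x"] == ra["x"]]
        rows.extend((ra, rb, rc) for rb, rc in zip_longest(bs, cs))
    return rows


print(len(naive_join(a, b, c)))   # 3 * 2 = 6 rows
print(len(paired_join(a, b, c)))  # max(3, 2) = 3 rows
```

The pairing yields max(n, m) rows per parent instead of n × m, and missing partners show up as None, just as the NULL (░░░░) cells do in the output above.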
Note that there is no obvious relation between disposals and supplies (b and c in my example), so the way their rows are paired up is arbitrary. In my opinion, a better solution for this task is to aggregate the data from those tables, for example using JSON:
with a(x,y) as (values(1,1)), b(x,z) as (values(1,11),(1,22),(1,33)), c(x,t) as (values(1,111),(1,222))
select
*,
(select json_agg(to_json(b.*)) from b where a.x=b.x) as b,
(select json_agg(to_json(c.*)) from c where a.x=c.x) as c
from a;
┌───┬───┬──────────────────────────────────────────────────┬────────────────────────────────────┐
│ x │ y │ b │ c │
├───┼───┼──────────────────────────────────────────────────┼────────────────────────────────────┤
│ 1 │ 1 │ [{"x":1,"z":11}, {"x":1,"z":22}, {"x":1,"z":33}] │ [{"x":1,"t":111}, {"x":1,"t":222}] │
└───┴───┴──────────────────────────────────────────────────┴────────────────────────────────────┘
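The aggregation approach returns one row per parent and collects each child table into its own list, so no rows are multiplied. Below is a minimal plain-Python sketch of that shape with the same sample values. The aggregate_children helper is hypothetical and stands in for the correlated json_agg subqueries:

```python
import json

# Stand-ins for the CTEs above.
a = [{"x": 1, "y": 1}]
b = [{"x": 1, "z": 11}, {"x": 1, "z": 22}, {"x": 1, "z": 33}]
c = [{"x": 1, "t": 111}, {"x": 1, "t": 222}]


def aggregate_children(parents, key, **children):
    """Return one row per parent, with each child table collected into a list,
    mirroring (SELECT json_agg(...) FROM child WHERE parent.key = child.key)."""
    out = []
    for p in parents:
        row = dict(p)
        for name, rows in children.items():
            row[name] = [r for r in rows if r[key] == p[key]]
        out.append(row)
    return out


result = aggregate_children(a, "x", b=b, c=c)
print(json.dumps(result))
```

There is one difference. In PostgreSQL, json_agg over zero matching rows returns NULL, whereas this sketch returns an empty list.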
I have the following DataFrame:
using DataFrames, Statistics
df = DataFrame(name=["John", "Sally", "Kirk"],
age=[23., 42., 59.],
children=[3,5,2], height = [180, 150, 170])
print(df)
3×4 DataFrame
│ Row │ name │ age │ children │ height │
│ │ String │ Float64 │ Int64 │ Int64 │
├─────┼────────┼─────────┼──────────┼────────┤
│ 1 │ John │ 23.0 │ 3 │ 180 │
│ 2 │ Sally │ 42.0 │ 5 │ 150 │
│ 3 │ Kirk │ 59.0 │ 2 │ 170 │
I can compute the mean of a column as follows:
println(mean(df[:4]))
166.66666666666666
Now I want to get the mean of all the numeric columns and tried this code:
x = [2,3,4]
for i in x
print(mean(df[:x[i]]))
end
But got the following error message:
MethodError: no method matching getindex(::Symbol, ::Int64)
Stacktrace:
[1] top-level scope at ./In[64]:3
How can I solve the problem?
You are trying to access the DataFrame's columns by their integer positions. Use the integer value directly, without a leading :, because the : turns it into a Symbol (for example :i) naming a column that does not exist. Also, since i already takes the values 2, 3 and 4 from x, indexing x[i] again is wrong (x[4] is out of bounds).
x = [2,3,4]
for i in x
println(mean(df[i])) # no need for `x[i]`
end
You can also index a DataFrame using a Symbol denoting the column's name.
x = [:age, :children, :height];
for c in x
println(mean(df[c]))
end
You get the following error because :x[i] is parsed as indexing the Symbol :x at position i, and getindex is not defined for a Symbol:
MethodError: no method matching getindex(::Symbol, ::Int64)
Note that :4 is just 4.
julia> :4
4
julia> typeof(:4)
Int64
Here is a one-liner that actually selects all Number columns:
julia> mean.(eachcol(df[findall(x-> x<:Number, eltypes(df))]))
3-element Array{Float64,1}:
41.333333333333336
3.3333333333333335
166.66666666666666
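For comparison, the same idea can be sketched in plain Python: keep the columns whose elements are all numbers, then average each one. This is an illustrative translation, not DataFrames.jl code. The dict-of-lists table and the numeric_means helper are hypothetical:

```python
from numbers import Number
from statistics import mean

# Column-oriented stand-in for the Julia DataFrame above.
df = {
    "name": ["John", "Sally", "Kirk"],
    "age": [23.0, 42.0, 59.0],
    "children": [3, 5, 2],
    "height": [180, 150, 170],
}


def numeric_means(table):
    """Mean of every column whose values are all numbers (like filtering on eltype <: Number)."""
    return {
        name: mean(values)
        for name, values in table.items()
        if all(isinstance(v, Number) for v in values)
    }


means = numeric_means(df)
print(means)
```

Because bool is a subclass of int in Python, a boolean column would also count as numeric here. The same holds in Julia, where Bool <: Number.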
For many scenarios describe is actually more convenient:
julia> describe(df)
4×8 DataFrame
│ Row │ variable │ mean │ min │ median │ max │ nunique │ nmissing │ eltype │
│ │ Symbol │ Union… │ Any │ Union… │ Any │ Union… │ Nothing │ DataType │
├─────┼──────────┼─────────┼──────┼────────┼───────┼─────────┼──────────┼──────────┤
│ 1 │ name │ │ John │ │ Sally │ 3 │ │ String │
│ 2 │ age │ 41.3333 │ 23.0 │ 42.0 │ 59.0 │ │ │ Float64 │
│ 3 │ children │ 3.33333 │ 2 │ 3.0 │ 5 │ │ │ Int64 │
│ 4 │ height │ 166.667 │ 150 │ 170.0 │ 180 │ │ │ Int64 │
In the question, println(mean(df[4])) works just as well as println(mean(df[:4])), since :4 is just 4.
Hence we can write
x = [2,3,4]
for i in x
println(mean(df[i]))
end
which works
Hi, back with another problem. I have a table with several columns: two of them are latitude and longitude, and another is the crime type. I need to work out how many crimes were committed within x meters of a certain point.
Specifically, I need the number of crimes that took place within 250 m, 500 m and 1 km of the point E:307998m, N:188746m.
Any help, or even just a push in the right direction, would be appreciated.
Thanks
What an interesting question. The following may help.
You can use Pythagoras's theorem to calculate the distance between a point ([100,100] in this case) and each incident, then count the incidents where that distance is below a threshold and the type matches.
select * from test;
┌─────┬─────┬──────┐
│ x │ y │ type │
├─────┼─────┼──────┤
│ 100 │ 100 │ 1 │
│ 104 │ 100 │ 1 │
│ 110 │ 100 │ 1 │
│ 110 │ 102 │ 1 │
│ 50 │ 102 │ 2 │
│ 50 │ 150 │ 2 │
│ 50 │ 152 │ 3 │
│ 150 │ 152 │ 1 │
│ 40 │ 152 │ 1 │
│ 150 │ 150 │ 2 │
└─────┴─────┴──────┘
(10 rows)
select count(*) from test where sqrt((x-100)*(x-100)+(y-100)*(y-100))<30 and type = 1;
┌───────┐
│ count │
├───────┤
│ 4 │
└───────┘
(1 row)
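Here is a plain-Python sketch of the same Pythagorean approach applied to the question's point and radii. It assumes the coordinates are eastings and northings in meters, as the E/N values in the question suggest, because straight-line distance is valid on that grid. It would not be accurate for raw latitude and longitude. The incident rows and the count_within helper are hypothetical:

```python
from math import hypot

# Centre point (E, N) from the question, in meters.
CENTRE = (307998, 188746)

# Hypothetical sample incidents as (easting_m, northing_m, crime_type).
incidents = [
    (308000, 188746, "burglary"),  # 2 m away
    (308198, 188746, "burglary"),  # 200 m away
    (307998, 189146, "theft"),     # 400 m away
    (308478, 189386, "burglary"),  # 800 m away (480-640-800 triangle)
    (310000, 188746, "theft"),     # 2002 m away
]


def count_within(incidents, centre, radius_m, crime_type=None):
    """Count incidents at most radius_m from centre, optionally of one crime type."""
    cx, cy = centre
    return sum(
        1
        for x, y, kind in incidents
        if hypot(x - cx, y - cy) <= radius_m
        and (crime_type is None or kind == crime_type)
    )


for r in (250, 500, 1000):
    print(r, count_within(incidents, CENTRE, r))  # 250 2, then 500 3, then 1000 4
```

In SQL, the same test can compare the squared distance against 250², 500² and 1000², which avoids calling sqrt for every row.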
julia> x = [rand(k) for k in 1:10];
julia> d = DataFrame(x=Vector{Float64}[]);
julia> for k in 1:10
push!(d, [x[k]])
end
julia> d
10×1 DataFrames.DataFrame
│ Row │ x │
├─────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ 1 │ [0.912215] │
│ 2 │ [0.0865126, 0.260076] │
│ 3 │ [0.61766, 0.969529, 0.177093] │
│ 4 │ [0.927896, 0.521724, 0.669713, 0.148345] │
│ 5 │ [0.779086, 0.715808, 0.943805, 0.197353, 0.716311] │
│ 6 │ [0.0932849, 0.660737, 0.547138, 0.00146499, 0.0726306, 0.84183] │
│ 7 │ [0.246593, 0.131446, 0.378437, 0.584403, 0.777732, 0.670934, 0.618792] │
│ 8 │ [0.00339141, 0.704945, 0.0235316, 0.0806565, 0.332005, 0.304394, 0.157108, 0.12613] │
│ 9 │ [0.401086, 0.802521, 0.661974, 0.369114, 0.331184, 0.341598, 0.138835, 0.673759, 0.599687] │
│ 10 │ [0.615559, 0.445397, 0.104951, 0.182031, 0.844579, 0.613385, 0.887714, 0.139976, 0.991951, 0.2642] │
julia> @save "test.jld" d
ERROR: DimensionMismatch("mismatch in dimension 1 (expected 1 got 2)")
How do I save this DataFrame (preferably using JLD)?