operator does not exist: integer = integer[] plpgsql error - sql

I get the error "operator does not exist: integer = integer[]" when I try to run this query:
select staff
from affiliations
where orgUnit = any (select unnest(*) from get_ou(661));
The function get_ou(661) returns an array of integers. I was wondering why I can't use = any to obtain the staff from any of the orgunits in the array.
Thank you for your help!

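The ANY predicate used with a subselect compares the value against each value returned by the subselect. If the subselect returns a single column of type integer[], each comparison becomes integer = integer[], which is the error you see.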
postgres=# SELECT * FROM foo_table;
┌────┬───┐
│ id │ x │
╞════╪═══╡
│ 1 │ 9 │
│ 2 │ 4 │
│ 3 │ 1 │
│ 4 │ 3 │
│ 5 │ 7 │
│ 6 │ 5 │
│ 7 │ 3 │
│ 8 │ 8 │
│ 9 │ 3 │
│ 10 │ 8 │
└────┴───┘
(10 rows)
CREATE OR REPLACE FUNCTION public.foo(VARIADIC integer[])
RETURNS integer[]
LANGUAGE sql
AS $function$ SELECT $1 $function$
Strangely, your example is broken, but it fails with a syntax error rather than the error you report. When I fix it, it works:
postgres=# SELECT * FROM foo_table
WHERE x = ANY(SELECT unnest(v) FROM foo(3,8) g(v));
┌────┬───┐
│ id │ x │
╞════╪═══╡
│ 4 │ 3 │
│ 7 │ 3 │
│ 8 │ 8 │
│ 9 │ 3 │
│ 10 │ 8 │
└────┴───┘
(5 rows)
You should change the syntax and move from a subselect to an array expression (this is the preferred solution for this purpose):
postgres=# SELECT * FROM foo_table WHERE x = ANY(foo(3,8));
┌────┬───┐
│ id │ x │
╞════╪═══╡
│ 4 │ 3 │
│ 7 │ 3 │
│ 8 │ 8 │
│ 9 │ 3 │
│ 10 │ 8 │
└────┴───┘
(5 rows)

Related

Clickhouse: Mapping BETWEEN filtering from an array

I understand if I want to filter a column between two numbers I can use BETWEEN:
SELECT a
FROM table
WHERE a BETWEEN 1 AND 5
Is there a way of mapping the filtering to an array of values, for instance, if the array was [1, 10, ... , N]:
SELECT a
FROM table
WHERE (a BETWEEN 1 AND 1+4) AND (a BETWEEN 10 AND 10+4) AND ... AND (a BETWEEN N AND N+4)
Try this query:
WITH
[1, 10, 75] AS starts_from,
4 AS step,
arrayMap(x -> (x, x + step), starts_from) AS intervals
SELECT number
FROM numbers(100)
WHERE arrayFirstIndex(x -> number >= x.1 AND number <= x.2, intervals) != 0
/*
┌─number─┐
│ 1 │
│ 2 │
│ 3 │
│ 4 │
│ 5 │
│ 10 │
│ 11 │
│ 12 │
│ 13 │
│ 14 │
│ 75 │
│ 76 │
│ 77 │
│ 78 │
│ 79 │
└────────┘
*/
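For readers less familiar with ClickHouse lambdas, the same logic can be sketched in plain Python. This is only an illustration of what the query computes, not ClickHouse code. The names starts_from, step and intervals mirror the WITH clause above, and in_any_interval is a hypothetical helper introduced for the example.

```python
# Plain-Python mirror of the ClickHouse query above (illustration only).
starts_from = [1, 10, 75]
step = 4

# Same as arrayMap(x -> (x, x + step), starts_from): one (low, high) pair per start.
intervals = [(x, x + step) for x in starts_from]


def in_any_interval(number, intervals):
    """Hypothetical helper: True if number lies in at least one closed interval."""
    return any(low <= number <= high for low, high in intervals)


# Same as SELECT number FROM numbers(100) WHERE <number is in some interval>.
selected = [n for n in range(100) if in_any_interval(n, intervals)]
print(selected)
```

Note that the ranges are combined with OR: a value only needs to fall into one interval, which is what arrayFirstIndex(...) != 0 expresses in the query.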

SQL query returns product of results instead of sum

How can I make sure that with this join I'll only receive the sum of results and not the product?
I have a project entity, which contains two one-to-many relations: disposals and supplies.
If I query the disposals alone with the following query:
SELECT *
FROM projects
JOIN disposals disposal on projects.project_id = disposal.disposal_project_refer
WHERE (projects.project_name = 'Höngg')
I get the following result:
project_id,project_name,disposal_id,depository_refer,material_refer,disposal_date,disposal_measurement,disposal_project_refer
1,Test,1,1,1,2020-08-12 15:24:49.913248,123,1
1,Test,2,1,2,2020-08-12 15:24:49.913248,123,1
1,Test,7,2,1,2020-08-12 15:24:49.913248,123,1
1,Test,10,3,4,2020-08-12 15:24:49.913248,123,1
The same query for supplies returns the same number of results.
type Project struct {
ProjectID uint `gorm:"primary_key" json:"ProjectID"`
ProjectName string `json:"ProjectName"`
Disposals []Disposal `gorm:"ForeignKey:disposal_project_refer"`
Supplies []Supply `gorm:"ForeignKey:supply_project_refer"`
}
If I query both tables I would like to receive the sum of both single queries. Currently I am receiving 16 results (4 supply results multiplied by 4 disposal results).
The combined query:
SELECT *
FROM projects
JOIN disposals disposal ON projects.project_id = disposal.disposal_project_refer
JOIN supplies supply ON projects.project_id = supply.supply_project_refer
WHERE (projects.project_name = 'Höngg');
I have tried achieving my goal with union queries, but I was not successful. What else should I try?
Here is your case (simplified):
# with a(x,y) as (values(1,1)), b(x,z) as (values(1,11),(1,22)), c(x,t) as (values(1,111),(1,222))
select * from a join b on (a.x=b.x) join c on (b.x=c.x);
┌───┬───┬───┬────┬───┬─────┐
│ x │ y │ x │ z │ x │ t │
├───┼───┼───┼────┼───┼─────┤
│ 1 │ 1 │ 1 │ 11 │ 1 │ 111 │
│ 1 │ 1 │ 1 │ 11 │ 1 │ 222 │
│ 1 │ 1 │ 1 │ 22 │ 1 │ 111 │
│ 1 │ 1 │ 1 │ 22 │ 1 │ 222 │
└───┴───┴───┴────┴───┴─────┘
This produces a Cartesian product because the join value is the same in all tables. You need some additional condition for joining your data. For example (tests for various cases):
# with a(x,y) as (values(1,1)), b(x,z) as (values(1,11),(1,22)), c(x,t) as (values(1,111),(1,222))
select *
from a
cross join lateral (
select *
from (select row_number() over() as rn, * from b where b.x=a.x) as b
full join (select row_number() over() as rn, * from c where c.x=a.x) as c on (b.rn=c.rn)
) as bc;
┌───┬───┬────┬───┬────┬────┬───┬─────┐
│ x │ y │ rn │ x │ z │ rn │ x │ t │
├───┼───┼────┼───┼────┼────┼───┼─────┤
│ 1 │ 1 │ 1 │ 1 │ 11 │ 1 │ 1 │ 111 │
│ 1 │ 1 │ 2 │ 1 │ 22 │ 2 │ 1 │ 222 │
└───┴───┴────┴───┴────┴────┴───┴─────┘
# with a(x,y) as (values(1,1)), b(x,z) as (values(1,11),(1,22),(1,33)), c(x,t) as (values(1,111),(1,222))
select *
from a
cross join lateral (
select *
from (select row_number() over() as rn, * from b where b.x=a.x) as b
full join (select row_number() over() as rn, * from c where c.x=a.x) as c on (b.rn=c.rn)
) as bc;
┌───┬───┬────┬───┬─────┬──────┬──────┬──────┐
│ x │ y │ rn │ x │ z │ rn │ x │ t │
├───┼───┼────┼───┼─────┼──────┼──────┼──────┤
│ 1 │ 1 │ 1 │ 1 │ 11 │ 1 │ 1 │ 111 │
│ 1 │ 1 │ 2 │ 1 │ 22 │ 2 │ 1 │ 222 │
│ 1 │ 1 │ 3 │ 1 │ 33 │ ░░░░ │ ░░░░ │ ░░░░ │
└───┴───┴────┴───┴─────┴──────┴──────┴──────┘
# with a(x,y) as (values(1,1)), b(x,z) as (values(1,11),(1,22)), c(x,t) as (values(1,111),(1,222),(1,333))
select *
from a
cross join lateral (
select *
from (select row_number() over() as rn, * from b where b.x=a.x) as b
full join (select row_number() over() as rn, * from c where c.x=a.x) as c on (b.rn=c.rn)
) as bc;
┌───┬───┬──────┬──────┬──────┬────┬───┬─────┐
│ x │ y │ rn │ x │ z │ rn │ x │ t │
├───┼───┼──────┼──────┼──────┼────┼───┼─────┤
│ 1 │ 1 │ 1 │ 1 │ 11 │ 1 │ 1 │ 111 │
│ 1 │ 1 │ 2 │ 1 │ 22 │ 2 │ 1 │ 222 │
│ 1 │ 1 │ ░░░░ │ ░░░░ │ ░░░░ │ 3 │ 1 │ 333 │
└───┴───┴──────┴──────┴──────┴────┴───┴─────┘
Note that there is no obvious relation between disposals and supplies (b and c in my example), so the order of both could be random. In my opinion, a better solution for this task is to aggregate the data from those tables, using JSON for example:
with a(x,y) as (values(1,1)), b(x,z) as (values(1,11),(1,22),(1,33)), c(x,t) as (values(1,111),(1,222))
select
*,
(select json_agg(to_json(b.*)) from b where a.x=b.x) as b,
(select json_agg(to_json(c.*)) from c where a.x=c.x) as c
from a;
┌───┬───┬──────────────────────────────────────────────────┬────────────────────────────────────┐
│ x │ y │ b │ c │
├───┼───┼──────────────────────────────────────────────────┼────────────────────────────────────┤
│ 1 │ 1 │ [{"x":1,"z":11}, {"x":1,"z":22}, {"x":1,"z":33}] │ [{"x":1,"t":111}, {"x":1,"t":222}] │
└───┴───┴──────────────────────────────────────────────────┴────────────────────────────────────┘
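The row counts can also be checked outside PostgreSQL. The sketch below is an illustration only: it assumes Python's built-in sqlite3 module and reuses the a, b and c values from the JSON example above, not the asker's real schema. It shows that chaining both joins multiplies the child rows, while querying each child table on its own keeps the counts additive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER, y INTEGER);
    CREATE TABLE b (x INTEGER, z INTEGER);
    CREATE TABLE c (x INTEGER, t INTEGER);
    INSERT INTO a VALUES (1, 1);
    INSERT INTO b VALUES (1, 11), (1, 22), (1, 33);
    INSERT INTO c VALUES (1, 111), (1, 222);
""")

# Chaining both joins pairs every b row with every c row: 3 * 2 = 6 rows.
joined = conn.execute(
    "SELECT * FROM a JOIN b ON a.x = b.x JOIN c ON b.x = c.x"
).fetchall()

# Querying each child table on its own keeps the counts additive: 3 + 2 = 5 rows.
b_rows = conn.execute("SELECT b.* FROM a JOIN b ON a.x = b.x").fetchall()
c_rows = conn.execute("SELECT c.* FROM a JOIN c ON a.x = c.x").fetchall()

print(len(joined), len(b_rows) + len(c_rows))
conn.close()
```

Running one query per relation (or aggregating each relation separately, as in the JSON query) is what avoids the multiplication.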

Julia: how to compute a particular operation on certain columns of a Dataframe

I have the following Dataframe
using DataFrames, Statistics
df = DataFrame(name=["John", "Sally", "Kirk"],
age=[23., 42., 59.],
children=[3,5,2], height = [180, 150, 170])
print(df)
3×4 DataFrame
│ Row │ name │ age │ children │ height │
│ │ String │ Float64 │ Int64 │ Int64 │
├─────┼────────┼─────────┼──────────┼────────┤
│ 1 │ John │ 23.0 │ 3 │ 180 │
│ 2 │ Sally │ 42.0 │ 5 │ 150 │
│ 3 │ Kirk │ 59.0 │ 2 │ 170 │
I can compute the mean of a column as follows:
println(mean(df[:4]))
166.66666666666666
Now I want to get the mean of all the numeric columns and tried this code:
x = [2,3,4]
for i in x
print(mean(df[:x[i]]))
end
But got the following error message:
MethodError: no method matching getindex(::Symbol, ::Int64)
Stacktrace:
[1] top-level scope at ./In[64]:3
How can I solve the problem?
You are trying to access the DataFrame's column using an integer index specifying the column's position. You should just use the integer value i, without any : before it. Writing :i would create the symbol :i, and you do not have a column named i.
x = [2,3,4]
for i in x
println(mean(df[i])) # no need for `x[i]`
end
You can also index a DataFrame using a Symbol denoting the column's name.
x = [:age, :children, :height];
for c in x
println(mean(df[c]))
end
You get the following error in your attempt because you are trying to access the ith index of the symbol :x, which is an undefined operation.
MethodError: no method matching getindex(::Symbol, ::Int64)
Note that :4 is just 4.
julia> :4
4
julia> typeof(:4)
Int64
Here is a one-liner that actually selects all Number columns:
julia> mean.(eachcol(df[findall(x-> x<:Number, eltypes(df))]))
3-element Array{Float64,1}:
41.333333333333336
3.3333333333333335
166.66666666666666
For many scenarios describe is actually more convenient:
julia> describe(df)
4×8 DataFrame
│ Row │ variable │ mean │ min │ median │ max │ nunique │ nmissing │ eltype │
│ │ Symbol │ Union… │ Any │ Union… │ Any │ Union… │ Nothing │ DataType │
├─────┼──────────┼─────────┼──────┼────────┼───────┼─────────┼──────────┼──────────┤
│ 1 │ name │ │ John │ │ Sally │ 3 │ │ String │
│ 2 │ age │ 41.3333 │ 23.0 │ 42.0 │ 59.0 │ │ │ Float64 │
│ 3 │ children │ 3.33333 │ 2 │ 3.0 │ 5 │ │ │ Int64 │
│ 4 │ height │ 166.667 │ 150 │ 170.0 │ 180 │ │ │ Int64 │
In the question println(mean(df[4])) works as well (instead of println(mean(df[:4]))).
Hence we can write
x = [2,3,4]
for i in x
println(mean(df[i]))
end
which works

how to show results of postcodes within a radius of a point

Hi, back with another problem lol. I have a table with several columns, two of which are latitude and longitude and another is crime type. What I need to do is work out how many crimes were committed within x meters of a certain point.
What I need is the number of crimes that took place within 250m, 500m and 1km of this point: E:307998m, N:188746m.
help would be appreciated or even just a push in the right direction
thanks
What an interesting question. The following may help.
You can use Pythagoras's theorem to calculate the distance between a point ([100,100] in this case) and each incident, then count the incidents where this distance is less than a threshold and the type is the one you want.
# select * from test;
┌─────┬─────┬──────┐
│ x │ y │ type │
├─────┼─────┼──────┤
│ 100 │ 100 │ 1 │
│ 104 │ 100 │ 1 │
│ 110 │ 100 │ 1 │
│ 110 │ 102 │ 1 │
│ 50 │ 102 │ 2 │
│ 50 │ 150 │ 2 │
│ 50 │ 152 │ 3 │
│ 150 │ 152 │ 1 │
│ 40 │ 152 │ 1 │
│ 150 │ 150 │ 2 │
└─────┴─────┴──────┘
(10 rows)
select count(*) from test where sqrt((x-100)*(x-100)+(y-100)*(y-100))<30 and type = 1;
┌───────┐
│ count │
├───────┤
│ 4 │
└───────┘
(1 row)
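The same calculation can be sketched in Python to get one count per radius. This is a rough illustration, assuming the coordinates are projected eastings and northings in metres (as the E/N point in the question suggests) rather than raw latitude and longitude in degrees, where Pythagoras would not give metres. The helper count_within is hypothetical, and the rows are the ones from the test table above.

```python
import math

# Rows copied from the test table above: (x, y, type).
incidents = [
    (100, 100, 1), (104, 100, 1), (110, 100, 1), (110, 102, 1),
    (50, 102, 2), (50, 150, 2), (50, 152, 3), (150, 152, 1),
    (40, 152, 1), (150, 150, 2),
]


def count_within(rows, cx, cy, radius, crime_type=None):
    """Hypothetical helper: count rows closer than radius to (cx, cy)."""
    return sum(
        1
        for x, y, kind in rows
        if math.hypot(x - cx, y - cy) < radius
        and (crime_type is None or kind == crime_type)
    )


# Same check as the SQL query: type 1 incidents within 30 units of (100, 100).
print(count_within(incidents, 100, 100, 30, crime_type=1))

# One count per radius; for the real data the centre would be E 307998, N 188746
# and the radii 250, 500 and 1000 metres.
for radius in (10, 50, 100):
    print(radius, count_within(incidents, 100, 100, radius))
```

In SQL the equivalent is to run the count query once per radius, changing only the threshold.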

Saving a DataFrame with different length array entries

julia> x = [rand(k) for k in 1:10];
julia> d = DataFrame(x=Vector{Float64}[]);
julia> for k in 1:10
push!(d, [x[k]])
end
julia> d
10×1 DataFrames.DataFrame
│ Row │ x │
├─────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ 1 │ [0.912215] │
│ 2 │ [0.0865126, 0.260076] │
│ 3 │ [0.61766, 0.969529, 0.177093] │
│ 4 │ [0.927896, 0.521724, 0.669713, 0.148345] │
│ 5 │ [0.779086, 0.715808, 0.943805, 0.197353, 0.716311] │
│ 6 │ [0.0932849, 0.660737, 0.547138, 0.00146499, 0.0726306, 0.84183] │
│ 7 │ [0.246593, 0.131446, 0.378437, 0.584403, 0.777732, 0.670934, 0.618792] │
│ 8 │ [0.00339141, 0.704945, 0.0235316, 0.0806565, 0.332005, 0.304394, 0.157108, 0.12613] │
│ 9 │ [0.401086, 0.802521, 0.661974, 0.369114, 0.331184, 0.341598, 0.138835, 0.673759, 0.599687] │
│ 10 │ [0.615559, 0.445397, 0.104951, 0.182031, 0.844579, 0.613385, 0.887714, 0.139976, 0.991951, 0.2642] │
julia> @save "test.jld" d
ERROR: DimensionMismatch("mismatch in dimension 1 (expected 1 got 2)")
How do I save this DataFrame (preferably using JLD)?