How do I use stored geometries in the Spatial functions? - sql

I've been trying to evaluate the use of OrientDB for our spatial data.
I'm using the following version:
OrientDB: orientdb-community-2.2.0-20160217.214325-39
OrientDB-Spatial: JAR built from develop branch of the github Repo OS:
Win7 64Bit
Right now what I was to do, is if I have polygons stored in the db, and the input is a location (latitude & longitude) then I need to get the polygon which contains that location.
I created a Class to store the State polygons like this:
CREATE class state
CREATE PROPERTY state.name STRING
CREATE PROPERTY state.shape EMBEDDED OPolygon
I Inserted a State with the following command:
INSERT INTO state SET name = 'Center', shape = ST_GeomFromText('POLYGON((77.16796875 26.068502530912397,75.7177734375 21.076171072527064,81.650390625 19.012137871930328,82.9248046875 25.196864372861896,77.16796875 26.068502530912397))')
I've tried several ways of getting the state which contains the given latlong, but all of them give error.
Even something as simple as:
SELECT from state WHERE ST_Contains(shape, ST_GeomFromText('POINT(77.420654296875 23.23929558106523)'))
Gives the following error:
com.orientechnologies.orient.core.sql.OCommandSQLParsingException:
Error on parsing command at position #0: Error parsing query: SELECT
from state WHERE ST_Contains(shape,
ST_GeomFromText('POINT(77.420654296875 23.23929558106523)'))
Encountered "" at line 1, column 25. Was expecting one of: Storage
URL="plocal:E:/DevTools/OrientDB2.2_new/databases/spatial" Storage
URL="plocal:E:/DevTools/OrientDB2.2_new/databases/spatial" -->
com.orientechnologies.orient.core.sql.OCommandSQLParsingException:
Encountered "" at line 1, column 25. Was expecting one of: Storage
URL="plocal:E:/DevTools/OrientDB2.2_new/databases/spatial"
I can run all the spatial functions when I enter the geometries directly in the spatial function, such as:
Select ST_Contains(ST_geomFromText('POLYGON((77.16796875 26.068502530912397,75.7177734375 21.076171072527064,81.650390625 19.012137871930328,82.9248046875 25.196864372861896,77.16796875 26.068502530912397))'), ST_GeomFromText('POINT(77.420654296875 23.23929558106523)'))
I just can't figure out how to get these function to run on shapes which are stored as properties on records.
How are the geometries which are stored, to be used in these spatial functions? Is there some other syntax for doing so?

try this
SELECT from state WHERE ST_Contains(shape, ST_GeomFromText('POINT(77.420654296875 23.23929558106523)')) = true
The syntax with where function() is not supported yet

Related

AsterixDB error when joining geometry: Cannot invoke \"org.apache.hyracks.control.nc.io.FileHandle.close()\" because \"fHandle\" is null"

I am attempting to use AsterixDB (which uses SQL++) to join two sets together via an sql query. In one dataset, I have a series of points in the form of latitude and longitude. The other dataset is geometries for zip codes. I am trying to append the relevant zip code to the first dataset based on whether the point exists in the zip code or not.
The query is below as well as the schema for each dataset
use csv;
select sett.lat, sett.long, zip.g
from csv_set as sett
left join csv_zipset as zip
on st_contains(zip.g, st_make_point(sett.lat, sett.long));
create type csv_type as {
id:uuid,
...
lat: double,
long: double
};
create type csv_ziptype as {
id: uuid,
g:geometry
};
This is the error I am facing:
ERROR: Code: 1 "java.lang.NullPointerException: Cannot invoke \"org.apache.hyracks.control.nc.io.FileHandle.close()\" because \"fHandle\" is null"
I have tried adding null checks for both the point and geometry with no luck.
I have also validated that st_make_point is working properly, and st_contains works when I pass it a fixed geometry which leads me to believe that this is an issue with the geometry.
Any help is much appreciated
After exhausting more options I came to the realization that there were multiple types in my dataset: polygon, multipolygon, linestring, and geometryCollection. It seems that AsterixDB doesn't yet have the ability to compute st_contains with geometry collections. Once I removed those entries from the dataset the query was able to complete successfully.

Passing DataFrame from notebook to another with pyspark

i'am trying to call a DataFrame that i created in notebook1 to use it in my notebook2 in Databricks Community addition with pyspark and i tried this code dbutils.notebook.run("notebook1", 60, {"dfnumber2"})
but it shows this error.
py4j.Py4JException: Method _run([class java.lang.String, class java.lang.Integer, class java.util.HashSet, null, class java.lang.String]) does not exist
any help please?
The actual problem is that you pass last parameter ({"dfnumber2"}) incorrectly - with this syntax it's a set, not the map type. You need to use syntax: {"table_name": "dfnumber2"} to represent it as a dict/map.
But if you look into documentation of dbutils.notebook.run, you will see following phrase:
To implement notebook workflows, use the dbutils.notebook.* methods. Unlike %run, the dbutils.notebook.run() method starts a new job to run the notebook.
But jobs aren't supported on the Community Edition, so it won't work anyway.
Create a global temp view and pass the table name as argument to your next notebook.
Drnumber2.createOrReplaceGlobalTempView("dfnumber2")
dbutils.notebook.run("notebook1", 60, {table_name:"dfnumber2"})
In your notebook1 you can do
table_name= dbutils.widgets.get("table_name")
Dfnumber2 = spark.sql("select * from global_temp."+table_name)

Understanding the "Not found: Dataset ### was not found in location US" error

I know this topic has come up many times but still here I am. Data processing location seems consistent (dataset, US; query: US) and I am using backticks & long format in the FROM clause
Below are two sequences of code. The first one works perfectly:
SELECT station_id
FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`
Whereas the following returns an error message:
SELECT bikeshare_stations.station_id
FROM `bigquery-public-data.austin_bikeshare`
Not found: Dataset glassy-droplet-347618:bigquery-public-data was not found in location US
My question, thus, is why do the first lines of text work while the second doesn't?
You need to understand the different parts of the backticks:
bigquery-public-data is the name of the project;
austin_bikeshare is the name of the schema (aka dataset in BQ); and
bikeshare_stations is the name of the table/view.
Therefore, the shorter format you are looking for is: austin_bikeshare.bikeshare_stations (instead of bigquery-public-data.austin_bikeshare).
Using bigquery-public-data.austin_bikeshare means that you have a schema called bigquery-public-data that contains a table called austin_bikeshare , when this is not true.

Extract JSON content in Metabase SQL query

Using: Django==2.2.24, Python=3.6, PostgreSQL is underlying DB
Working with Django ORM, I can easily make all sort of queries, but I started using Metabase, and my SQL might be a bit rusty.
The problem:
I am trying to get a count of the items in a list, under a key in a dictionary, stored as a JSONField:
from django.db import models
from jsonfield import JSONField
class MyTable(models.Model):
data_field = JSONField(blank=True, default=dict)
Example of the dictionary stored in data_field:
{..., "my_list": [{}, {}, ...], ...}
Under "my_list" key, the value stored is a list, which contains a number of other dictionaries.
In Metabase, I am trying to get a count for the number of dictionaries in the list, but even more basic things, none of which work.
Some stuff I tried:
Attempt:
SELECT COUNT(elem->'my_list') as my_list_count
FROM my_table, json_object_keys(data_field:json) AS elem
Error:
ERROR: syntax error at or near ":" Position: 226
Attempt:
SELECT ARRAY_LENGTH(elem->'my_list') as my_list_count
FROM my_table, JSON_OBJECT_KEYS(data_field:json) AS elem
Error:
ERROR: syntax error at or near ":" Position: 233
Attempt:
SELECT JSON_ARRAY_LENGTH(data_field->'my_list'::json)
FROM my_table
Error:
ERROR: invalid input syntax for type json Detail: Token "my_list" is invalid. Position: 162 Where: JSON data, line 1: my_list
Attempt:
SELECT ARRAY_LENGTH(JSON_QUERY_ARRAY(data_field, '$.my_list'))
FROM my_table
Error:
ERROR: function json_query_array(text, unknown) does not exist Hint: No function matches the given name and argument types. You might need to add explicit type casts. Position: 140
Basically, I think the issue is that I am using the wrong signatures (most of the time) in the methods I am trying to use.
I used this query to make sure I can at least get the keys from the dictionary:
SELECT JSON_OBJECT_KEYS(data_field::json)
FROM my_table
I was not able to use JSON_OBJECT_KEYS() without adding the ::json cast, I was getting this error:
ERROR: function json_object_keys(text) does not exist Hint: No function matches the given name and argument types. You might need to add explicit type casts. Position: 127
But with the json cast, I am getting all the keys as intended.
Thank you for taking a look!
EDIT:
I also found this interesting article with different solution but none of the solutions worked.
Also seen this SO post which did not help.
Ok, after some more digging around, I found this article, which had the correct format/syntax.
This code is what I used to fetch the list from the JSON object successfully:
select data_field::json->'my_list' as the_list
from my_table
Then, I used json_array_length() to get the number of elements:
select json_array_length(data_field::json->'my_list') as number_of_elements
from my_table
All done! :)
EDIT:
I just found the reason to this whole shenanigan.
In the code (which goes years back) we used this package:
jsonfield==1.0.3
And used this way:
from jsonfield import JSONField
The issue is that in the background, Postgres saves the data as a string, so it needs to be cast into a JSON.
Later Django introduced its own JSONField, which stores data as you would expect, without a need to cast:
from django.contrib.postgres.fields import JSONField

Error: Not found: Dataset my-project-name:domain_public was not found in location US

I need to make a query for a dataset provided by a public project. I created my own project and added their dataset to my project. There is a table named: domain_public. When I make query to this table I get this error:
Query Failed
Error: Not found: Dataset my-project-name:domain_public was not found in location US
Job ID: my-project-name:US.bquijob_xxxx
I am from non-US country. What is the issue and how to fix it please?
EDIT 1:
I change the processing location to asia-northeast1 (I am based in Singapore) but the same error:
Error: Not found: Dataset censys-my-projectname:domain_public was not found in location asia-northeast1
Here is a view of my project and the public project censys-io:
Please advise.
EDIT 2:
The query I used to type is based on censys tutorial is:
#standardsql
SELECT domain, alexa_rank
FROM domain_public.current
WHERE p443.https.tls.cipher_suite = 'some_cipher_suite_goes_here';
When I changed the FROM clause to:
FROM `censys-io.domain_public.current`
And the last line to:
WHERE p443.https.tls.cipher_suite.name = 'some_cipher_suite_goes_here';
It worked. Shall I understand that I should always include the projectname.dataset.table (if I'm using the correct terms) and point the typo the Censys? Or is this special case to this project for some reason?
BigQuery can't find your data
How to fix it
Make sure your FROM location contains 3 parts
A project (e.g. bigquery-public-data)
A database (e.g. hacker_news)
A table (e.g. stories)
Like so
`bigquery-public-data.hacker_news.stories`
*note the backticks
Examples
Wrong
SELECT *
FROM `stories`
Wrong
SELECT *
FROM `hacker_news.stories`
Correct
SELECT *
FROM `bigquery-public-data.hacker_news.stories`
In Web UI - click Show Options button and than select your location for "Processing Location"!
Specify the location in which the query will execute. Queries that run in a specific location may only reference data in that location. For data in US/EU, you may choose Unspecified to run the query in the location where the data resides. For data in other locations, you must specify the query location explicitly.
Update
As it stated above - Queries that run in a specific location may only reference data in that location
Assuming that censys-io.domain_public dataset has its data in US - you need to specify US for Processing Location
The problem turned out to be due to wrong table name in the FROM clause.
The right FROM clause should be:
FROM `censys-io.domain_public.current`
While I was typing:
FROM domain_public.current
So the project name is required in the FROM and `` are required because of - in the project name.
Make sure your FROM location contains 3 parts as #stevec mentioned
A project (e.g. bigquery-public-data)
A database (e.g. hacker_news)
A table (e.g. stories)
But in my case, I was using the LegacySql within the Google script editor, so in that case you need to state that to false, for example:
var projectId = 'xxxxxxx';
var request = {
query: 'select * from project.database.table',
useLegacySql: false
};
var queryResults = BigQuery.Jobs.query(request, projectId);
check exact case [upper or lower] and spelling of table or view name.
copy it from table definition and your problem will be solved.
i was using FPL009_Year_Categorization instead of FPL009_Year_categorization
using c as C and getting the error "not found in location asia-south1"
I copied with exact case and problem is resolved.
On your Big Query console, go to the Data Explorer on the left pane, click the small three dots, then select query option from the list. This step confirms you choose the correct project and dataset. Then you can edit the query on the query pane on the right.
may be dataset name changed in create dataset option. it should be US or default location
enter image description here