AsterixDB error when joining geometry: Cannot invoke "org.apache.hyracks.control.nc.io.FileHandle.close()" because "fHandle" is null

I am attempting to use AsterixDB (which uses SQL++) to join two datasets with a SQL++ query. One dataset holds a series of points as latitude and longitude; the other holds geometries for zip codes. I am trying to append the matching zip code to the first dataset based on whether the point falls within the zip code's geometry.
The query is below, along with the schema for each dataset:
use csv;
select sett.lat, sett.long, zip.g
from csv_set as sett
left join csv_zipset as zip
on st_contains(zip.g, st_make_point(sett.lat, sett.long));
create type csv_type as {
id:uuid,
...
lat: double,
long: double
};
create type csv_ziptype as {
id: uuid,
g:geometry
};
This is the error I am facing:
ERROR: Code: 1 "java.lang.NullPointerException: Cannot invoke "org.apache.hyracks.control.nc.io.FileHandle.close()" because "fHandle" is null"
I have tried adding null checks for both the point and geometry with no luck.
I have also validated that st_make_point is working properly, and st_contains works when I pass it a fixed geometry which leads me to believe that this is an issue with the geometry.
Any help is much appreciated

After exhausting more options, I realized that there were multiple geometry types in my dataset: polygon, multipolygon, linestring, and geometrycollection. It seems that AsterixDB cannot yet compute st_contains with geometry collections. Once I removed those entries from the dataset, the query completed successfully.
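If deleting the offending rows isn't an option, another approach is to exclude geometry collections in the join condition itself. This is a sketch, assuming your AsterixDB build exposes an `st_geometry_type` function (check the geospatial function list for your version; the type string it returns may differ):

```sql
use csv;
select sett.lat, sett.long, zip.g
from csv_set as sett
left join csv_zipset as zip
on st_geometry_type(zip.g) != "GeometryCollection"
   and st_contains(zip.g, st_make_point(sett.lat, sett.long));
```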

Related

Understanding the "Not found: Dataset ### was not found in location US" error

I know this topic has come up many times, but still here I am. The data processing location seems consistent (dataset: US; query: US) and I am using backticks and the long format in the FROM clause.
Below are two sequences of code. The first one works perfectly:
SELECT station_id
FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`
Whereas the following returns an error message:
SELECT bikeshare_stations.station_id
FROM `bigquery-public-data.austin_bikeshare`
Not found: Dataset glassy-droplet-347618:bigquery-public-data was not found in location US
My question, thus, is: why does the first query work while the second doesn't?
You need to understand the different parts of the backticked identifier:
bigquery-public-data is the name of the project;
austin_bikeshare is the name of the schema (aka dataset in BQ); and
bikeshare_stations is the name of the table/view.
Therefore, the shorter format you are looking for is: austin_bikeshare.bikeshare_stations (instead of bigquery-public-data.austin_bikeshare).
Using bigquery-public-data.austin_bikeshare means that you have a schema called bigquery-public-data that contains a table called austin_bikeshare, which is not the case.

Create subgraph query in Gremlin around single node with outgoing and incoming edges

I have a large JanusGraph database and I'd like to create a subgraph centered around one node type, including incoming and outgoing nodes of specific types.
In Cypher, the query would look like this:
MATCH (a:Journal)<-[:PublishedIn]-(b:Paper{paperTitle:'My Paper Title'})<-[:AuthorOf]-(c:Author)
RETURN a,b,c
This is what I tried in Gremlin:
sg = g.V().outE('PublishedIn').subgraph('j_p_a').has('Paper','paperTitle', 'My Paper Title')
.inE('AuthorOf').subgraph('j_p_a')
.cap('j_p_a').next()
But I get a syntax error. 'AuthorOf' and 'PublishedIn' are not the only edge types ending at 'Paper' nodes.
Can someone show me how to correctly execute this query in Gremlin?
As written in your query, the outE step yields edges, and the has step will check properties on those edges; the query processor will then expect an inV, not another inE. Without your data model it is hard to know exactly what you need, but looking at the Cypher, I think this is what you want.
sg = g.V().outE('PublishedIn').
       subgraph('j_p_a').
       inV().
       has('Paper','paperTitle', 'My Paper Title').
       inE('AuthorOf').
       subgraph('j_p_a').
       cap('j_p_a').
       next()
Edited to add:
As I do not have your data I used my air-routes graph. I modeled this query on yours and used some select steps to limit the data size processed. This seems to work in my testing. Hopefully you can see the changes I made and try those in your query.
sg = g.V().outE('route').as('a').
inV().
has('code','AUS').as('b').
select('a').
subgraph('sg').
select('b').
inE('contains').
subgraph('sg').
cap('sg').
next()

Error: Not found: Dataset my-project-name:domain_public was not found in location US

I need to query a dataset provided by a public project. I created my own project and added their dataset to it. There is a table named domain_public. When I query this table I get this error:
Query Failed
Error: Not found: Dataset my-project-name:domain_public was not found in location US
Job ID: my-project-name:US.bquijob_xxxx
I am from non-US country. What is the issue and how to fix it please?
EDIT 1:
I changed the processing location to asia-northeast1 (I am based in Singapore) but got the same error:
Error: Not found: Dataset censys-my-projectname:domain_public was not found in location asia-northeast1
Here is a view of my project and the public project censys-io:
Please advise.
EDIT 2:
The query I originally typed, based on the Censys tutorial, is:
#standardsql
SELECT domain, alexa_rank
FROM domain_public.current
WHERE p443.https.tls.cipher_suite = 'some_cipher_suite_goes_here';
When I changed the FROM clause to:
FROM `censys-io.domain_public.current`
And the last line to:
WHERE p443.https.tls.cipher_suite.name = 'some_cipher_suite_goes_here';
It worked. Should I understand that I should always include projectname.dataset.table (if I'm using the correct terms) and report the typo to Censys? Or is this a special case for this project for some reason?
BigQuery can't find your data
How to fix it
Make sure your FROM location contains 3 parts
A project (e.g. bigquery-public-data)
A database (e.g. hacker_news)
A table (e.g. stories)
Like so
`bigquery-public-data.hacker_news.stories`
*note the backticks
Examples
Wrong
SELECT *
FROM `stories`
Wrong
SELECT *
FROM `hacker_news.stories`
Correct
SELECT *
FROM `bigquery-public-data.hacker_news.stories`
In the Web UI, click the Show Options button and then select your location under "Processing Location"!
Specify the location in which the query will execute. Queries that run in a specific location may only reference data in that location. For data in US/EU, you may choose Unspecified to run the query in the location where the data resides. For data in other locations, you must specify the query location explicitly.
Update
As stated above, queries that run in a specific location may only reference data in that location.
Assuming that the censys-io.domain_public dataset has its data in the US, you need to specify US as the Processing Location.
The problem turned out to be due to wrong table name in the FROM clause.
The right FROM clause should be:
FROM `censys-io.domain_public.current`
While I was typing:
FROM domain_public.current
So the project name is required in the FROM clause, and backticks are required because of the hyphen in the project name.
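The quoting rule can be made explicit with a trivial helper. This is purely illustrative (the function is mine, not part of any BigQuery client API):

```python
def fully_qualified(project: str, dataset: str, table: str) -> str:
    """Build a backtick-quoted BigQuery table reference.

    Backticks are needed because project IDs may contain hyphens,
    which would otherwise be parsed as a minus sign.
    """
    return f"`{project}.{dataset}.{table}`"

print(fully_qualified("censys-io", "domain_public", "current"))
# `censys-io.domain_public.current`
```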
Make sure your FROM location contains 3 parts, as @stevec mentioned:
A project (e.g. bigquery-public-data)
A database (e.g. hacker_news)
A table (e.g. stories)
But in my case, I was using Legacy SQL within the Google Apps Script editor, so you need to set useLegacySql to false, for example:
var projectId = 'xxxxxxx';
var request = {
  query: 'select * from project.database.table',
  useLegacySql: false
};
var queryResults = BigQuery.Jobs.query(request, projectId);
Check the exact case (upper or lower) and spelling of the table or view name.
Copy it from the table definition and your problem will be solved.
I was using FPL009_Year_Categorization instead of FPL009_Year_categorization (C instead of c) and getting the error "not found in location asia-south1".
I copied it with the exact case and the problem was resolved.
On your BigQuery console, go to the Data Explorer in the left pane, click the small three dots, then select the query option from the list. This step confirms you chose the correct project and dataset. Then you can edit the query in the query pane on the right.
Maybe the dataset location was changed in the Create Dataset options; it should be US or the default location.

How do I use stored geometries in the Spatial functions?

I've been trying to evaluate the use of OrientDB for our spatial data.
I'm using the following version:
OrientDB: orientdb-community-2.2.0-20160217.214325-39
OrientDB-Spatial: JAR built from the develop branch of the GitHub repo
OS: Win7 64Bit
What I want to do is: given polygons stored in the db and an input location (latitude & longitude), get the polygon which contains that location.
I created a Class to store the State polygons like this:
CREATE class state
CREATE PROPERTY state.name STRING
CREATE PROPERTY state.shape EMBEDDED OPolygon
I Inserted a State with the following command:
INSERT INTO state SET name = 'Center', shape = ST_GeomFromText('POLYGON((77.16796875 26.068502530912397,75.7177734375 21.076171072527064,81.650390625 19.012137871930328,82.9248046875 25.196864372861896,77.16796875 26.068502530912397))')
I've tried several ways of getting the state which contains the given latlong, but all of them give error.
Even something as simple as:
SELECT from state WHERE ST_Contains(shape, ST_GeomFromText('POINT(77.420654296875 23.23929558106523)'))
Gives the following error:
com.orientechnologies.orient.core.sql.OCommandSQLParsingException:
Error on parsing command at position #0: Error parsing query: SELECT
from state WHERE ST_Contains(shape,
ST_GeomFromText('POINT(77.420654296875 23.23929558106523)'))
Encountered "" at line 1, column 25.
Storage URL="plocal:E:/DevTools/OrientDB2.2_new/databases/spatial"
I can run all the spatial functions when I enter the geometries directly in the spatial function, such as:
Select ST_Contains(ST_geomFromText('POLYGON((77.16796875 26.068502530912397,75.7177734375 21.076171072527064,81.650390625 19.012137871930328,82.9248046875 25.196864372861896,77.16796875 26.068502530912397))'), ST_GeomFromText('POINT(77.420654296875 23.23929558106523)'))
I just can't figure out how to get these functions to run on shapes which are stored as properties on records.
How are stored geometries meant to be used in these spatial functions? Is there some other syntax for doing so?
Try this:
SELECT from state WHERE ST_Contains(shape, ST_GeomFromText('POINT(77.420654296875 23.23929558106523)')) = true
The WHERE function() syntax without an explicit comparison is not supported yet.

How to access NOAA data through GrADS?

I'm trying to get some DAP data from NOAA, but can't figure out how to pass variables to it. I've looked and looked and haven't found how to just poke around at it with my browser. The data is located at http://nomads.ncep.noaa.gov:9090/dods/ruc/ruc20110725/ruc_f17.info (which may become outdated some time after this post sits around.)
I want to access the ugrd10m variable with the variables time, latitude, and longitude. Any ideas what url is needed to do this?
According to their documentation, it sounds like you want to point your browser at a URL like:
http://nomads.ncep.noaa.gov:9090/dods/ruc/ruc20110914/ruc_f17.ascii?ugrd10m[0:1][0:1][0:1]
That will return a table of the ugrd10m values for the first two time/lat/lon points:
ugrd10m, [2][2][2]
[0][0], 9.999E20, 9.999E20
[0][1], 9.999E20, 9.999E20
[1][0], 9.999E20, 9.999E20
[1][1], 9.999E20, 9.999E20
time, [2]
734395.0, 734395.0416666666
lat, [2]
16.281, 16.46570909091
lon, [2]
-139.856603, -139.66417731424
The number of time/lat/lon points is given under the long description of ugrd10m at the dataset info address:
http://nomads.ncep.noaa.gov:9090/dods/ruc/ruc20110914/ruc_f17.info
time: Array of 64 bit Reals [time = 0..18]
means that there are 19 different time values, at indexes 0 to 18. In this case, the complete dataset can be retrieved by setting all the ranges to the max values:
http://nomads.ncep.noaa.gov:9090/dods/ruc/ruc20110914/ruc_f17.ascii?ugrd10m[0:18][0:226][0:427]
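The pattern generalizes: the URL is the dataset base, an `.ascii` suffix, the variable name, and one `[start:stop]` index range per dimension (time, lat, lon). A small helper can assemble such URLs; the function name is mine, not part of any NOAA or OPeNDAP API:

```python
def dods_ascii_url(base, variable, ranges):
    """Build an OPeNDAP ASCII subset URL, e.g.
    http://.../ruc_f17.ascii?ugrd10m[0:18][0:226][0:427]

    base     -- dataset URL without extension
    variable -- variable name, e.g. 'ugrd10m'
    ranges   -- one (start, stop) index pair per dimension
    """
    constraint = "".join(f"[{lo}:{hi}]" for lo, hi in ranges)
    return f"{base}.ascii?{variable}{constraint}"

url = dods_ascii_url(
    "http://nomads.ncep.noaa.gov:9090/dods/ruc/ruc20110914/ruc_f17",
    "ugrd10m", [(0, 18), (0, 226), (0, 427)])
print(url)
```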
According to this reference, you can access data with this URL:
http://nomads.ncep.noaa.gov:9090/dods/ruc/ruc20110914/ruc_00z.asc?ugrd10m&time=19&time=227&time=428
However, I can't confirm the data's validity.