How to access NOAA data through GrADS?

I'm trying to get some DAP data from NOAA, but I can't figure out how to pass variables to it. I've looked and looked and haven't found a way to just poke around at it with my browser. The data is located at http://nomads.ncep.noaa.gov:9090/dods/ruc/ruc20110725/ruc_f17.info (which may become outdated some time after this post sits around).
I want to access the ugrd10m variable along the time, latitude, and longitude dimensions. Any ideas what URL is needed to do this?

According to their documentation, it sounds like you want to point your browser at a URL like:
http://nomads.ncep.noaa.gov:9090/dods/ruc/ruc20110914/ruc_f17.ascii?ugrd10m[0:1][0:1][0:1]
That will return a table of the ugrd10m values for the first two time/lat/lon points (the 9.999E20 entries are the dataset's missing-data fill value):
ugrd10m, [2][2][2]
[0][0], 9.999E20, 9.999E20
[0][1], 9.999E20, 9.999E20
[1][0], 9.999E20, 9.999E20
[1][1], 9.999E20, 9.999E20
time, [2]
734395.0, 734395.0416666666
lat, [2]
16.281, 16.46570909091
lon, [2]
-139.856603, -139.66417731424
The number of time/lat/lon points is given under the long description of ugrd10m at the dataset info address:
http://nomads.ncep.noaa.gov:9090/dods/ruc/ruc20110914/ruc_f17.info
time: Array of 64 bit Reals [time = 0..18]
means that there are 19 different time values, at indexes 0 to 18. In this case, the complete dataset can be retrieved by setting all the ranges to the max values:
http://nomads.ncep.noaa.gov:9090/dods/ruc/ruc20110914/ruc_f17.ascii?ugrd10m[0:18][0:226][0:427]
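If you want to pull the same subset from a script rather than the browser, the .ascii constraint URL can be fetched with any HTTP client. A minimal Python sketch (the dated RUC run in the URL expires after a few days, so substitute a current date from the /dods/ruc listing):
import urllib.request

# Note: dated RUC runs age off the server quickly; replace ruc20110914 with a
# current date from http://nomads.ncep.noaa.gov:9090/dods/ruc/ before running.
url = ("http://nomads.ncep.noaa.gov:9090/dods/ruc/ruc20110914/ruc_f17.ascii"
       "?ugrd10m[0:1][0:1][0:1]")

with urllib.request.urlopen(url) as resp:
    # Prints the same table shown above: ugrd10m values plus the time/lat/lon axes.
    print(resp.read().decode("ascii", errors="replace"))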

According to this reference, you can access data with this URL:
http://nomads.ncep.noaa.gov:9090/dods/ruc/ruc20110914/ruc_00z.asc?ugrd10m&time=19&time=227&time=428
However, I can't confirm the data's validity.

Related

AsterixDB error when joining geometry: Cannot invoke "org.apache.hyracks.control.nc.io.FileHandle.close()" because "fHandle" is null

I am attempting to use AsterixDB (which uses SQL++) to join two datasets together via a SQL++ query. In one dataset, I have a series of points in the form of latitude and longitude. The other dataset contains geometries for zip codes. I am trying to append the relevant zip code to the first dataset based on whether the point falls inside the zip code's geometry or not.
The query is below, as well as the schema for each dataset:
use csv;
select sett.lat, sett.long, zip.g
from csv_set as sett
left join csv_zipset as zip
on st_contains(zip.g, st_make_point(sett.lat, sett.long));
create type csv_type as {
    id: uuid,
    ...
    lat: double,
    long: double
};

create type csv_ziptype as {
    id: uuid,
    g: geometry
};
This is the error I am facing:
ERROR: Code: 1 "java.lang.NullPointerException: Cannot invoke "org.apache.hyracks.control.nc.io.FileHandle.close()" because "fHandle" is null"
I have tried adding null checks for both the point and geometry with no luck.
I have also validated that st_make_point is working properly, and st_contains works when I pass it a fixed geometry which leads me to believe that this is an issue with the geometry.
Any help is much appreciated
After exhausting more options, I came to the realization that there were multiple geometry types in my dataset: polygon, multipolygon, linestring, and geometryCollection. It seems that AsterixDB doesn't yet have the ability to compute st_contains with geometry collections. Once I removed those entries from the dataset, the query completed successfully.

RStudio Error: Unused argument (by = ...) when fitting a GAM model and smoothing separately for a factor

I am still a beginner in R. For a project I am trying to fit a GAM model on a simple dataset with a time variable and a year. I am doing it in R and I keep getting an error message that claims an argument is unused, even though I specify it in the code.
It concerns a dataset which includes a categorical variable, "Year", with only two levels: 2020 and 2022. I want to investigate whether there is a peak in the hourly rate of visitors ("H1") in a nature reserve. For each observation period the average time was taken, which is the predictor variable used here ("T"). I want to use a GAM model for this, and have the smoothing applied differently for the two years.
The following is the line of code that I tried to use
`gam1 <- gam(H1~Year+s(T,by=Year),data = d)`
When I try to run this code, I get the following error message
`Error in s(T, by = Year) : unused argument (by = Year)`
I also tried simply getting rid of the "by" argument
`gam1 <- gam(H1~Year+s(T,Year),data = d)`
This allows me to run the code, but when trying to display the output using summary(gam1), I get:
Error in `[<-`(`*tmp*`, snames, 2, value = round(nldf, 1)) : subscript out of bounds
Since I feel like both errors are probably related to the same thing that I'm doing wrong, I decided to combine the question.
Did you load the {mgcv} package or the {gam} package? The latter doesn't have factor by smooths, so the first error message is what I would expect if you did library("gam") and then tried to fit the model you showed.
To fit the model you showed, you should restart R and try in a clean session:
library("mgcv")
# load your data
# fit model
gam1 <- gam(H1 ~ Year + s(T, by = Year), data = d)
It could well be that you have both {gam} and {mgcv} loaded, in which case whichever you loaded last will be earlier on the function search path. As both packages have functions gam() and s(), R might just be finding the wrong versions (masking), so you might also try
gam1 <- mgcv::gam(H1 ~ Year + mgcv::s(T, by = Year), data = d)
But you would be better off only loading {mgcv} if you want factor by smooths.
@Gavin Simpson
I did have both loaded, and I tried just using mgcv as you suggested. However, then I get the following error.
Error in names(dat) <- object$term :
'names' attribute [1] must be the same length as the vector [0]
I am assuming this is simply because it's not actually trying to use the gam() function, but rather attempting to name something gam1. So I would assume I actually need the 'gam' package before I could do this.
The second line of code also doesn't work. I get the following error
Error in model.frame.default(formula = H1 ~ Year + mgcv::s(T, by = Year), :
invalid type (list) for variable 'mgcv::s(T, by = Year)'
This happens no matter the order in which I load the two packages. And if I don't load 'gam', I get the error described above.

Understanding the "Not found: Dataset ### was not found in location US" error

I know this topic has come up many times, but still here I am. The data processing location seems consistent (dataset: US; query: US) and I am using backticks and the long format in the FROM clause.
Below are two queries. The first one works perfectly:
SELECT station_id
FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`
Whereas the following returns an error message:
SELECT bikeshare_stations.station_id
FROM `bigquery-public-data.austin_bikeshare`
Not found: Dataset glassy-droplet-347618:bigquery-public-data was not found in location US
My question, then, is why does the first query work while the second doesn't?
You need to understand the different parts of the backticked identifier:
bigquery-public-data is the name of the project;
austin_bikeshare is the name of the schema (aka dataset in BQ); and
bikeshare_stations is the name of the table/view.
Therefore, the shorter format you are looking for is: austin_bikeshare.bikeshare_stations (instead of bigquery-public-data.austin_bikeshare).
Using bigquery-public-data.austin_bikeshare means that you have a schema called bigquery-public-data that contains a table called austin_bikeshare, which is not true.
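If you are querying from code rather than the console, the same fully qualified project.dataset.table name applies. A minimal sketch with the Python BigQuery client, assuming google-cloud-bigquery is installed and credentials are configured (the LIMIT is only there to keep the output small):
from google.cloud import bigquery

# The client picks up the project and credentials from the environment
# (e.g. GOOGLE_APPLICATION_CREDENTIALS or gcloud application-default login).
client = bigquery.Client()

sql = """
    SELECT station_id
    FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`
    LIMIT 10
"""

# Run the query and print each station_id from the result rows.
for row in client.query(sql).result():
    print(row.station_id)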

Are Heart Points available in the REST API?

Are Heart Points available in the REST API for reading? If so, how do we get to them? I'm not seeing it in the documentation. Thanks.
Eric
You should use the Users.dataSources.datasets API endpoint. You can grab the heart points merged from all data sources by querying the dataSourceId "derived:com.google.heart_minutes:com.google.android.gms:merge_heart_minutes". It returns a JSON object with an array called "points". You'll find each heart point in that list, and if you drill down further for each heart point you'll get the derived source.
The endpoint takes the form:
https://www.googleapis.com/fitness/v1/users/me/dataSources/dataSourceId/datasets/datasetId
Replace the following in the URL above:
dataSourceId: derived:com.google.heart_minutes:com.google.android.gms:merge_heart_minutes
datasetId: The ID is formatted like: "startTime-endTime" where startTime and endTime are 64 bit integers.
Expanding on WiteCastle's answer, this datasource will provide you with the heart points.
"derived:com.google.heart_minutes:com.google.android.gms:merge_heart_minutes"
You will need to specify a timeframe, denoted by the datasetId parameter, which is a start time and an end time in epoch time with nanosecond precision, e.g.:
1607904000000000000-1608057778000000000
The JSON response includes an array of points, essentially one for each time the sensor detected the user's activity. The 'heart points' are accessible within each point's "fpVal". An example of a point is below:
{
  "startTimeNanos": "1607970900000000000",
  "endTimeNanos": "1607970960000000000",
  "dataTypeName": "com.google.heart_minutes",
  "originDataSourceId": "derived:com.google.heart_rate.bpm:com.google.android.gms:merge_heart_rate_bpm",
  "value": [
    {
      "fpVal": 2,   <--- 2 heart points recorded during this activity
      "mapVal": []
    }
  ],
  "modifiedTimeMillis": "1607976569329"
},
To get the heart points for today, specify the timeframe (00:00-23:59 in epoch nanoseconds), then loop through each point adding up all the "fpVal" values.
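A minimal sketch of that loop in Python, assuming you already have a valid OAuth 2.0 access token with the Fit read scope; the token value and the exact start/end nanosecond timestamps below are placeholders:
import requests

# Placeholders: supply your own access token and the day's start/end in epoch nanoseconds.
ACCESS_TOKEN = "ya29...."
start_ns = 1607904000000000000
end_ns = 1607990399000000000

data_source = ("derived:com.google.heart_minutes:"
               "com.google.android.gms:merge_heart_minutes")
url = ("https://www.googleapis.com/fitness/v1/users/me/dataSources/"
       f"{data_source}/datasets/{start_ns}-{end_ns}")

resp = requests.get(url, headers={"Authorization": f"Bearer {ACCESS_TOKEN}"})
resp.raise_for_status()
data = resp.json()

# The dataset nests the samples under "point"; fall back to "points" just in case.
points = data.get("point") or data.get("points") or []

# Each point carries its heart points in value[...]["fpVal"]; sum them for the day.
total = sum(v.get("fpVal", 0) for p in points for v in p.get("value", []))
print("Heart Points today:", total)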

Apache Solr - MoreLikeThis score

I have a small index with ~1000 documents with only two fields:
- id (string)
- content (text_general)
I noticed that when I do an MLT search by id for similar content, the original document (whose id is the searched id) has a score of 5.241327.
There is a 1:1 duplicate document, and for the duplicated content it returns score = 1.5258181. Why? Why is it not 5.241327 when it is a 100% duplicate?
Another question: is there any way to get similar documents by content by passing some text in the query?
Example:
/mlt/?q=content:Some encoded long text&mlt.fl=content
I am trying to check whether similar content has already been uploaded, and the check must be performed when new content is uploaded.
It might be worth trying some different parameters. I also use MLT on only one field; I use the following parameters:
'mlt.boost': 'true',
'mlt.fl': 'my_field_name',
'mlt.maxqt': 1000,
'mlt.mindf': '0',
'mlt.mintf': '0',
'qt': 'mlt',
'rows': '10'
See http://wiki.apache.org/solr/MoreLikeThis for an explanation of the parameters. I think that with a small index mindf might be important, and I see the default mintf (minimum term frequency) is 2, so I assume an ID is only one term and is probably being ignored!
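For reference, a minimal sketch of sending those parameters to the MLT request handler from Python. The host, the core name "mycore", and the example document id are placeholders, and the /mlt handler must be registered in solrconfig.xml:
import requests

params = {
    "q": "id:57375",        # placeholder: the document to find neighbours for
    "mlt.fl": "content",
    "mlt.boost": "true",
    "mlt.maxqt": 1000,
    "mlt.mindf": 0,
    "mlt.mintf": 0,
    "fl": "id,score",
    "rows": 10,
    "wt": "json",
}

# Core name and host are placeholders for your own Solr setup.
resp = requests.get("http://localhost:8983/solr/mycore/mlt", params=params)
resp.raise_for_status()

# Print each similar document with its MLT score.
for doc in resp.json()["response"]["docs"]:
    print(doc.get("id"), doc.get("score"))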
First, how does Solr More-Like-This work?
A regular Solr query is conducted (e.g. "?q=content:Some encoded long text&.....").
For each document returned by the above query, More-Like-This conducts a "more like this" query...
So the first result set, "response", is just like any Solr query result set.
The More-Like-This results appear below it and start with something like this (JSON format):
"moreLikeThis":{
"57375":{"numFound":18155,"start":0,"docs":["
For an explanation of the More Like This algorithm, please read:
http://blog.brattland.no/node/18
and http://cephas.net/blog/2008/03/30/how-morelikethis-works-in-lucene/
If you haven't solved the problem yet, please let me know and I will guide you through it.
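To illustrate that response shape, here is a hedged sketch that runs a normal query with the MLT search component enabled and walks the "moreLikeThis" section. The host, core name, and example id are placeholders, and the mlt component must be available on your /select handler:
import requests

params = {
    "q": "id:57375",        # placeholder id; its neighbours appear under moreLikeThis
    "mlt": "true",
    "mlt.fl": "content",
    "mlt.count": 5,
    "fl": "id,score",
    "wt": "json",
}

resp = requests.get("http://localhost:8983/solr/mycore/select", params=params)
resp.raise_for_status()
body = resp.json()

# body["response"]["docs"] holds the documents matched by q;
# body["moreLikeThis"] maps each matched doc id to its similar documents.
for doc_id, similar in body.get("moreLikeThis", {}).items():
    print(doc_id, "->", [(d.get("id"), d.get("score")) for d in similar["docs"]])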