I have several GRIB2 and point-based datasets.
The point-based datasets show values in hPa, while the GRIB2-based datasets show values in Pa. For usability it's bad if the MapServer returns different units for the same parameters. GDAL has an option called GRIB_NORMALIZE_UNITS which converts Kelvin to Celsius automatically. Is there a way to do the same for Pa to hPa? Or is there a way to tell the MapServer to divide all values by 100 before shipping them via GetFeatureInfo and in the legend?
This is only possible by changing the GDAL source code. However, there is a Python API (eccodes) with which you can change the values (divide by 100 to go from Pa to hPa). The unit recorded in the GRIB will then be wrong, so the files should not be handed over to anyone else, as the data no longer matches its metadata.
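The eccodes route might look like the sketch below (a minimal sketch: the file paths are placeholders, and separating the pure conversion from the GRIB I/O is my choice; note that Pa to hPa means dividing by 100):

```python
def pa_to_hpa(values):
    """Convert a sequence of pressure values from Pa to hPa."""
    return [v / 100.0 for v in values]

def rescale_grib(in_path, out_path):
    """Copy a GRIB file, rescaling every message's values from Pa to hPa.

    Requires the eccodes Python bindings; the unit metadata in the output
    will no longer match the data, as noted above.
    """
    import eccodes  # imported lazily so the pure conversion works without it
    with open(in_path, "rb") as fin, open(out_path, "wb") as fout:
        while True:
            gid = eccodes.codes_grib_new_from_file(fin)
            if gid is None:  # end of file
                break
            values = eccodes.codes_get_values(gid)
            eccodes.codes_set_values(gid, pa_to_hpa(values))
            eccodes.codes_write(gid, fout)
            eccodes.codes_release(gid)
```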
My data frame has 3.8 million rows and 20 or so features, many of which are categorical. After paring down the number of features, I can "dummy up" one critical column with 20 or so categories, and my Colab session with (allegedly) a TPU running won't crash.
But there's another column with about 53,000 unique values. Trying to "dummy up" this feature crashes my session. I can't ditch this column.
I've looked up target encoding, but the data set is very imbalanced and I'm concerned about target leakage. Is there a way around this?
EDIT: My target variable is a simple binary one.
Without knowing more details of the problem/feature, there's no obvious way to do this. This is the part of data science/machine learning that is an art, not a science. A couple of ideas:
One hot encode everything, then use a dimensionality reduction algorithm to remove some of the columns (PCA, SVD, etc).
Only one hot encode some values (say limit it to 10 or 100 categories, rather than 53,000), then for the rest, use an "other" category.
If it's possible to construct an embedding for these variables (not always possible), you can explore this.
Group/bin the values in the column by some underlying feature. E.g. if the feature is something like days_since_X, bin it by 100 or something. Or if it's names of animals, group by type instead (mammal, reptile, etc.).
Trying to get familiar with the datatype 'geometry', I want to import a GPX file into a table and show it on an OSM map. I'm using MariaDB/phpMyAdmin because that's what my hoster provides. I'm using the 'geometry' type because I like the ST_ functions (instead of putting the lat-lon in two columns and developing/copying the needed algorithms).
After googling and YouTubing for some time now, I'm at the point where I'm wondering if I'm doing things wrong or encountering bugs. Because I don't know what to expect, I hope someone can get me on the right track.
I started on a local PC with XAMPP 7.3.4 (phpMyAdmin 4.8.5/MariaDB 10.1.38) installed. I started with a column of datatype POINT, was surprised that phpMyAdmin has the option to show the contents of a record on a map, and was disappointed that I only saw blue water. When editing a record, phpMyAdmin showed a map and the data to be presented, which made clear that the SRID was '0'. I couldn't get the SRID to '4326' until some text somewhere hinted that I should use a column with datatype GEOMETRY. But only a world map showed up.
After a lot of trying, I decided to use the hosted environment (phpMyAdmin 4.9.5/MariaDB 10.3.22). To my surprise the point was visible on the map, only on a different part of the world. Looking at the lat-lon I saw that they were interchanged. Putting them in lon-lat order, the point appeared at the place where I expected it.
Because the hoster provides higher versions, I installed a newer XAMPP 7.4.6 (phpMyAdmin 5.0.2/MariaDB 10.4.1). It was a big surprise that my point wasn't showing up, just the world map again. So is it some configuration of the OSM map on the local machine that needs attention? The lat-lon still have to be interchanged.
Mapping is ok, lat-lon interchanged
Mapping wrong, lat-lon ok
Mapping of a walk in Paris. First is the mapping of a GPX in Prune, second is an import of the tracked points into MariaDB. Exactly the same mapping, I just had to interchange the points' lat-lon. So nothing wrong with the used SRID and/or coordinates, I think; phpMyAdmin just takes the lat as y and lon as x, instead of the expected lat as x and lon as y, which puts the walk somewhere in the sea off Somalia:
Mapping of walk in Paris presented in Prune
Mapping of same gps-points in phpmyadmin, lat-lon interchanged
Apart from the presentation of the data, I have difficulties using the insert option of phpMyAdmin. I only get data into the table in one pass when using SQL. The insert option generates SQL which gives errors. I have to edit that SQL; there are ' and \ characters I have to remove. Comparing the versions used, I noticed differences in the number and placement of the ' and \ characters to be removed.
I looked at the phpMyAdmin issues; nothing relevant seems open. I can find closed ones that point to the sort of issues I'm experiencing. A lot of documentation on geo is of course about PostgreSQL, some about MySQL, but less about MariaDB and phpMyAdmin; it's hard to find good directions.
So, my three biggest questions: First, is it intended to store lon-lat instead of lat-lon (or do I have to use another SRID)? Second, what do I have to configure to get the map working locally like it does with my hoster (if that's what's causing the problem)? Third, can people use the insert option of phpMyAdmin without editing the generated SQL?
Thanks in advance.
If you are going to use great-circle distances, be sure to have a version of MySQL/MariaDB that includes ST_Distance_Sphere. (I think that limits you to MySQL 8.0.)
If you are going to have code to "find the nearest", you will find that challenging. Here's my discussion of techniques for such. http://mysql.rjweb.org/doc.php/find_nearest_in_mysql (Also included is a Stored Function to do great-circle calculations.) That discusses using SPATIAL and other techniques.
Part of your issue is the mapping from y and x to whatever projection of the world your map uses. Spherical long-lat coordinates describe a "sphere", not a "projection".
phpMyAdmin is just a UI tool. It may not be smart enough to deal with some of the SPATIAL issues. I suggest you switch to the MySQL command-line tool and/or your application code. BTW, what language will you be writing in?
Backticks are used around table and column names. Quotes (' or ") are used around strings. In some contexts, backslashes (\) may need to be doubled or even quadrupled.
POINT is a 25-byte binary format. It is best to construct the value dynamically rather than spelling it out, as can be done with decimal numbers for INT and FLOAT.
And, yes, longitude comes first in POINT() and other SPATIAL thingies.
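The lon-lat ordering and dynamic construction can be sketched as follows (the table and column names are hypothetical, and in real code you should use parameterized queries rather than string interpolation):

```python
def point_wkt(lat, lon):
    """Build a WKT POINT for MySQL/MariaDB: longitude (x) first, latitude (y) second."""
    return f"POINT({lon} {lat})"

def insert_sql(table, lat, lon, srid=4326):
    """Generate an INSERT using ST_GeomFromText with an explicit SRID.

    Hypothetical schema: a table with a single geometry column `pt`.
    """
    return (f"INSERT INTO {table} (pt) "
            f"VALUES (ST_GeomFromText('{point_wkt(lat, lon)}', {srid}))")
```

For example, a point in Paris (lat 48.8566, lon 2.3522) is written as `POINT(2.3522 48.8566)`, not the other way around.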
The main question concerns extracting the contact pressure from an .odb file.
The issue is described in the three facts below.
Imagine that we have a simple 3D contact model in Abaqus/CAE:
1. If we make a plot of CPRESS on a deformed shape in the Visualization module, we'll get one value of CPRESS for each node. We get the same (one value per node) if we request XY data field output for all frames. This all seems to be OK, because as far as I know Abaqus/CAE averages surface output (CPRESS) to make it possible to request it as nodal output.
2. If we use the "Probe values" tool to examine the CPRESS value at a node, we'll get four values for one node. This still seems OK, because, I suppose, it shows the values before averaging.
3. If we request the CPRESS value from the command window using this script:
odb.steps['step_name'].frames[frame_number].fieldOutputs['CPRESS'].getSubset(region='node_path').values
the length of this vector of CPRESS values at a single node may be from 1 to 6 depending on the chosen node. And the number of CPRESS values obtained using this method has no connection with the number obtained using method 2.
So the trick is that I can't understand how the vector of CPRESS values at a node is formed.
I found very little information about this topic in the Abaqus Manual.
Hope somebody can help!
Probe Values extracts the CPRESS values for the whole element. It shows the face number and its node IDs together with their corresponding values.
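One way to reason about the flat list that `.values` returns (my assumption, not documented Abaqus behavior) is to group the entries by their node label: each contact face touching a node contributes one entry, which would explain why the count varies from node to node. Outside Abaqus, the grouping can be sketched with stand-in objects; inside Abaqus you would pass the real `getSubset(...).values` list, whose entries also carry `nodeLabel` and `data`:

```python
from collections import namedtuple

# Stand-in for an Abaqus FieldValue; only the attributes we use here.
FieldValue = namedtuple("FieldValue", ["nodeLabel", "data"])

def group_by_node(field_values):
    """Collect all CPRESS contributions reported for each node label."""
    grouped = {}
    for fv in field_values:
        grouped.setdefault(fv.nodeLabel, []).append(fv.data)
    return grouped
```

Comparing `len(grouped[node])` against what Probe Values shows for the same node would confirm or refute whether the extra entries are per-face contributions.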
I have a dataframe with columns accounting for different characteristics of stars and rows accounting for measurements of different stars, something like this:
property    A    A_error    B    B_error    C    C_error    ...
star1
star2
star3
...
In some measurements the error for a specific property is -1.00, which means the measurement was faulty.
In such a case I want to discard the measurement.
One way to do so is by eliminating the entire row (along with other properties whose error was not -1.00).
I think it's possible to fill in the faulty measurement with a value generated from the distribution of all the other measurements, meaning: given the other properties, which are fine, this property should have this value in order to reduce the error of the entire dataset.
Is there a proper name for the idea I'm referring to?
How would you apply such an algorithm?
I'm a student on a solo project, so I would really appreciate answers that also elaborate on the theory (:
edit
After further reading, I think what I was referring to is called regression imputation.
So I guess my question is: how can I implement multidimensional linear regression on a dataframe in the most efficient way?
thanks!
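As a starting point, regression imputation with a single predictor can be sketched in plain Python (a deliberately minimal sketch; for real multidimensional data, something like scikit-learn's IterativeImputer is the usual tool, and the choice of predictor column here is up to you):

```python
def impute_by_regression(xs, ys):
    """Replace None entries in ys with a least-squares linear fit on xs.

    Fits y = intercept + slope * x on the rows where y is observed,
    then predicts y for the rows where it is missing.
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y if y is not None else intercept + slope * x
            for x, y in zip(xs, ys)]
```

Applied to your table, `xs` would be one reliable property and `ys` the property whose error flag is -1.00 (with those entries set to None first). One theoretical caveat worth reading up on: plain regression imputation understates the variance of the imputed column, which is why multiple imputation exists.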
Many of the variables in the data I use on a daily basis have blank fields, some of which have meaning (e.g. a blank response for a variable dealing with the ratio of satisfactory accounts to total accounts means the individual does not have any accounts, whereas a response of 0 means the individual has no satisfactory accounts).
Currently, these records do not get included into logistic regression analyses as they have missing values for one or more fields. Is there a way to include these records into a logistic regression model?
I am aware that I can assign these blank fields a value that is not in the range of the data (e.g., going back to the ratio variable above, we could use 9999 or -1, as these values are not in the range of a ratio variable, 0 to 1). I am just curious to know if there is a more appropriate way of going about this. Any help is greatly appreciated! Thanks!
You can impute values for the missing fields, subject to logical restrictions on your experimental design and the fact that it will weaken the power of your analysis somewhat relative to the same experiment with no missing values.
SAS offers a few ways to do this. The simplest is to use PROC MI and PROC MIANALYZE, but even those are certainly not a simple matter of plugging a few numbers in. See this page for more information. Ultimately this is probably a better question for Cross-Validated at least until you have figured out the experimental design issues.
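Since some of your blanks are genuinely informative ("no accounts" rather than "unknown"), a lighter-weight alternative to an out-of-range sentinel is a missing-indicator encoding: fill the blank with a neutral value and add an explicit 0/1 "was missing" column, so the model can distinguish a blank from a genuine 0 ratio. This is a generic technique, not the PROC MI approach; sketched here in Python with a hypothetical fill value of 0.0:

```python
def add_missing_indicator(values, fill=0.0):
    """Encode a column with blanks (None) as a filled column plus a 0/1 flag.

    The model then sees both the (imputed) value and an explicit
    'was missing' indicator, instead of a magic sentinel like 9999.
    """
    filled = [fill if v is None else v for v in values]
    indicator = [1 if v is None else 0 for v in values]
    return filled, indicator
```

In the ratio example, the indicator column carries the "has no accounts" information that the blank encoded, while the filled column stays inside the variable's natural 0-to-1 range.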