GeoTIFF software generation - gdal

I have some data about the height of a set of points (for example, from the Google Elevation API). The task is to save this data in GeoTIFF format and then use it in osgEarth (which reads it via GDAL). How can this be done? It does not matter in what language.
A quick search on the Internet only gave me the answer to the reverse question (How do I open GeoTIFF images with GDAL in Python?).
I would be very grateful for any help.

I would do this with GDAL from Python. (You could also use rasterio, which is a nice wrapper around GDAL for raster file handling.)
You should put your data in a numpy array; let us call it some_nparray.
Then create the tif dataset with gtiffDriver.Create(). Here you can provide the name of your file, the dimensions of your image in number of columns and rows, the number of bands (here 1), and the datatype. Here I used float32; however byte, int16, etc. could also work, depending on your data (you can check it with height_data_array.dtype).
Next you should set the geotransform, which is the information about the corner coordinates and pixel resolution, and you should set the projection you are using. This is done with dataset.SetGeoTransform and dataset.SetProjection. How these are created is not in the scope of this question, I believe (a minimal example follows the code below). If you do not need them, I guess you can even skip that part.
Finally, write your array to the file with WriteArray and close the file.
Your code should look something like this. Here I use the convention that variables prefixed with some_ should be provided by you.
from osgeo import gdal

height_data_array = some_nparray

gtiffDriver = gdal.GetDriverByName('GTiff')
dataset = gtiffDriver.Create('result.tif',
                             height_data_array.shape[1],  # number of columns
                             height_data_array.shape[0],  # number of rows
                             1,                           # number of bands
                             gdal.GDT_Float32)            # data type
dataset.SetGeoTransform(some_geotrans)
dataset.SetProjection(some_projection)
dataset.GetRasterBand(1).WriteArray(height_data_array)
dataset = None  # closing the dataset flushes and writes the file
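For completeness, here is a minimal sketch of how some_geotrans and some_projection could be built, assuming the points are in WGS84 latitude/longitude (as returned by the Google Elevation API); the corner coordinates and pixel size below are made-up values:

from osgeo import osr

# Hypothetical values: top-left corner at (lon, lat) = (-120.0, 50.0),
# pixel size of 0.001 degrees. A GDAL geotransform is
# (top-left x, pixel width, row rotation, top-left y, column rotation, -pixel height).
some_geotrans = (-120.0, 0.001, 0.0, 50.0, 0.0, -0.001)

srs = osr.SpatialReference()
srs.ImportFromEPSG(4326)             # EPSG:4326 = WGS84 lat/lon
some_projection = srs.ExportToWkt()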

Related

How to serialize data in example-in-example format for tensorflow-ranking?

I'm building a ranking model with tensorflow-ranking. I'm trying to serialize a data set in the TFRecord format and read it back at training time.
The tutorial doesn't show how to do this. There is some documentation here on an example-in-example data format, but it's hard for me to understand: I'm not sure what the serialized_context or serialized_examples fields are or how they fit into examples, and I'm not sure what the Serialize() function in the code block is.
Concretely, how can I write and read data in example-in-example format?
The context is a map from feature name to tf.train.Feature. The examples list is a list of maps from feature name to tf.train.Feature. Once you have these, the following code will create an "example-in-example":
import tensorflow as tf

context = {...}                 # map: feature name -> tf.train.Feature
examples = [{...}, {...}, ...]  # list of maps: feature name -> tf.train.Feature

# Serialize the context features as a single tf.train.Example.
serialized_context = tf.train.Example(
    features=tf.train.Features(feature=context)).SerializeToString()

# Serialize each per-item example and collect the resulting bytes.
serialized_examples = tf.train.BytesList()
for example in examples:
    tf_example = tf.train.Example(features=tf.train.Features(feature=example))
    serialized_examples.value.append(tf_example.SerializeToString())

# Wrap both parts in the parent "example-in-example".
example_in_example = tf.train.Example(features=tf.train.Features(feature={
    'serialized_context': tf.train.Feature(
        bytes_list=tf.train.BytesList(value=[serialized_context])),
    'serialized_examples': tf.train.Feature(bytes_list=serialized_examples),
}))
To read the examples back, you may call
import tensorflow_ranking as tfr

parsed = tfr.data.parse_from_example_in_example(
    example_pb,
    context_feature_spec=context_feature_spec,
    example_feature_spec=example_feature_spec)
where context_feature_spec and example_feature_spec are maps from feature name to tf.io.FixedLenFeature or tf.io.VarLenFeature.
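For example, the specs might look something like this (the feature names, shapes, and types here are purely illustrative):

context_feature_spec = {
    'query_tokens': tf.io.VarLenFeature(tf.string),
}
example_feature_spec = {
    'document_tokens': tf.io.VarLenFeature(tf.string),
    'relevance': tf.io.FixedLenFeature([1], tf.int64, default_value=[0]),
}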
First of all, I recommend reading this article to ensure that you know how to create a tf.Example as well as a tf.SequenceExample (which, by the way, is the other data format supported by TF-Ranking):
Tensorflow Records? What they are and how to use them
In the second part of this article, you will see that a tf.SequenceExample has two components: 1) context and 2) sequence (or examples). This is the same idea that Example-in-Example is trying to implement. Basically, the context is the set of features that are independent of the items that you want to rank (a search query in the case of search, or user features in the case of a recommendation system), and the sequence part is a list of items (aka examples). This could be a list of documents (in search) or movies (in recommendation).
Once you are comfortable with tf.Example, Example-in-Example will be easier to understand. Take a look at this piece of code for how to create an EIE instance:
https://www.gitmemory.com/issue/tensorflow/ranking/95/518480361
1) Bundle context features together in a tf.Example object and serialize it.
2) Bundle sequence (example) features (each of which could contain a list of values) in another tf.Example object and serialize this one too.
3) Wrap these inside a parent tf.Example.
4) (If you're writing to tfrecords) serialize the parent tf.Example object and write it to your tfrecord file, as sketched below.
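As a minimal sketch of step 4, assuming example_in_example was built as in the code above (output.tfrecord is a made-up file name):

with tf.io.TFRecordWriter('output.tfrecord') as writer:
    writer.write(example_in_example.SerializeToString())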

Reading Fortran binary file in Python

I'm having trouble reading an unformatted F77 binary file in Python.
I've tried the scipy.io.FortranFile method and the numpy.fromfile method, both to no avail. I have also read the file in IDL, which works, so I have a benchmark for what the data should look like. I'm hoping that someone can point out a silly mistake on my part -- there's nothing better than having an idiot moment and then washing your hands of it...
The data, bcube1, has dimensions 101x101x101x3 and is of type real*8; there are 3090903 entries in total. They are written using the following statement (not my code, copied from the source).
open (unit=21, file=bendnm, status='new'
. ,form='unformatted')
write (21) bcube1
close (unit=21)
I can successfully read it in IDL using the following (also not my code, copied from colleague):
bcube=dblarr(101,101,101,3)
openr,lun,'bcube.0000000',/get_lun,/f77_unformatted,/swap_if_little_endian
readu,lun,bcube
free_lun,lun
The returned data (bcube) is double precision, with dimensions 101x101x101x3, so the header information for the file is aware of its dimensions (not flattened).
Now I try to get the same effect using Python, but no luck. I've tried the following methods.
In [30]: f = scipy.io.FortranFile('bcube.0000000', header_dtype='uint32')
In [31]: b = f.read_record(dtype='float64')
which returns the error Size obtained (3092150529) is not a multiple of the dtypes given (8). Changing the dtype changes the size obtained but it remains indivisible by 8.
Alternately, using fromfile results in no errors but returns one more value than is in the array (a footer perhaps?), and the individual array values are wildly wrong (they should all be of order unity).
In [38]: f = np.fromfile('bcube.0000000')
In [39]: f.shape
Out[39]: (3090904,)
In [42]: f
Out[42]: array([ -3.09179121e-030, 4.97284231e-020, -1.06514594e+299, ...,
8.97359707e-029, 6.79921640e-316, -1.79102266e-037])
I've tried using byteswap to see if this makes the floating point values more reasonable but it does not.
It seems to me that the np.fromfile method is very close to working but there must be something wrong with the way it's reading the header information. Can anyone suggest how I can figure out what should be in the header file that allows IDL to know about the array dimensions and datatype? Is there a way to pass header information to fromfile so that it knows how to treat the leading entry?
I played around with it a bit, and I think I have an idea.
How Fortran stores unformatted data is not standardized, so you have to play around with it, but you need three pieces of information:
1) The format of the data. You suggest that it is 64-bit reals, or 'f8' in Python.
2) The type of the header. That is an unsigned integer, but you need the length in bytes. If unsure, try 4. The header usually stores the length of the record in bytes and is repeated at the end. Then again, it is not standardized, so no guarantees.
3) The endianness, little or big. Technically this applies to both header and values, but I assume they're the same. Python defaults to little endian, so if that were the correct setting for your data, I think you would have already solved it.
When you open the file with scipy.io.FortranFile, you need to give the data type of the header. So if the data is stored big-endian and you have a 4-byte unsigned integer header, you need this:
from scipy.io import FortranFile
ff = FortranFile('data.dat', 'r', '>u4')
When you read the data, you need the data type of the values. Again, assuming big-endian, you want the type >f8:
vals = ff.read_reals('>f8')
Look here for a description of the syntax of the data type.
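Putting this together for your file, a minimal sketch (the big-endian guess is consistent with the /swap_if_little_endian flag in your IDL call, but you should verify it for your data):

from scipy.io import FortranFile

ff = FortranFile('bcube.0000000', 'r', '>u4')  # 4-byte big-endian record markers
vals = ff.read_reals('>f8')                    # all 3090903 doubles in the record
# Fortran stores arrays column major, so reshape with order='F'
bcube = vals.reshape((101, 101, 101, 3), order='F')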
If you have control over the program that writes the data, I strongly suggest you write them into data streams, which can be more easily read by Python.
Fortran has record demarcations which are poorly documented, even in binary files.
So every write to an unformatted file:
integer*4 Test1
real*4 Matrix(3,3)
open(78,form='unformatted')
write(78) Test1
write(78) Matrix
close(78)
Each record written to an unformatted file should ultimately be padded with np.int32 values. (I've seen references that these tell you the record length, but I haven't verified it personally.)
The above could be read in Python via numpy as:
import numpy as np

input_file = open(file_location, 'rb')
datum = np.dtype([('P1', np.int32), ('Test1', np.int32), ('P2', np.int32),
                  ('P3', np.int32), ('MatrixT', (np.float32, (3, 3))), ('P4', np.int32)])
data = np.fromfile(input_file, datum)
Which should fully populate the data array with the individual data sets of the format above. Do note that numpy expects data to be packed in C format (row major), while Fortran-format data is column major. For square matrix shapes like the one above, this means getting the data out of the matrix requires a transpose before use. For non-square matrices, you will need to reshape and transpose:
Matrix = np.transpose(data[0]['MatrixT'])
Transposing your 4-D data structure is going to need to be done carefully. You might look into SciPy for automated ways to do so; the SciPy package seems to have Fortran related utilities which I have not fully explored.
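As a hypothetical illustration of the non-square case: if the Fortran code had declared real*4 Matrix(2,3) and written it in a single record, one way to recover it would be:

import numpy as np

# Record layout: 4-byte marker, 6 floats in column-major order, 4-byte marker.
datum = np.dtype([('P1', np.int32),
                  ('MatrixT', (np.float32, (3, 2))),  # Fortran (2,3) lands here transposed
                  ('P2', np.int32)])
data = np.fromfile(open(file_location, 'rb'), datum)
Matrix = data[0]['MatrixT'].T                         # back to shape (2, 3)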

How can I change paper size when using Knit PDF in RStudio throughout a document?

I am looking for ways to change the paper size throughout a pdf document. I know that I can specify classoption: a3paper for the entire document in the yaml header. I also know that I can change margins with the geometry package (\newgeometry{...} and \restoregeometry) throughout a document. Unfortunately, there is no option to change paper size with the geometry package throughout a document, though.
I would like to do something like this but with paper size instead.
Is it even possible?
I am asking because I have some wide tables in my document where letters and numbers overlap when a(4|5|6)paper is specified. Other tables are narrow and I would like to have them bigger.
My table output is not from kable or any other easily modifiable package output, e.g. xtable. So what I am saying is that I can't modify the dimensions of my table in my code.
Any help is much appreciated. Thank you.
The geometry package knows about a3paper, so the following works for me
---
output: pdf_document
geometry: a3paper
---
test
producing a PDF with page size "841.89 x 1190.55 pts" (A4 would be "595.276 x 841.89 pts"). For readability you should use at least two columns for the text, though.
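If you also need to adjust margins, the geometry field accepts a comma-separated list of geometry package options, for example (the margin value here is just an illustration):

---
output: pdf_document
geometry: "a3paper, margin=2cm"
---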

Accessing a .fits file and plotting its columns

I'm trying to access a .fits file and plot two columns (out of many!).
I used pyfits to access the file, and
plt.plotfile('3XMM_DR5.fits', delimiter=' ', cols=(0, 1), names=('x-axis','y-axis'))
but that's not working. Are there any alternatives? And is there any way to open the file using Python, in order to access the data table?
According to the docs from matplotlib for plotfile:
Note: plotfile is intended as a convenience for quickly plotting data from flat files; it is not intended as an alternative interface to general plotting with pyplot or matplotlib.
This isn't very clear. I think by "flat files" it just means CSV data or something; this function isn't used very much in my experience, and it certainly doesn't know anything about FITS files, which are seldom used outside astronomy. You mentioned in your post that you did something with PyFITS, but that isn't demonstrated anywhere in your question.
PyFITS, incidentally, has been deprecated for several years now, and its functionality is integrated into Astropy.
You can open a table from a FITS file with astropy.table.Table.read:
from astropy.table import Table
table = Table.read('3XMM_DR5.fits')
then access the columns with square bracket notation like:
plt.plot(table['whatever the x axis column is named'], table['y axis column name'])
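If you don't know the column names in advance, you can list them first (colnames is a standard attribute of an astropy Table):

print(table.colnames)  # names of all columns in the table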

Extracting Data from an Area file

I am trying to extract information at a specific location (lat, lon) from different satellite images. These images were given to me in the AREA format, and I cooked up a simple Jython script to extract temperature values.
While the script works, here is a small snippet from it that prints out the data value at a point.
from edu.wisc.ssec.mcidas import AreaFile as af
url="adde://localhost/imagedata?&PORT=8113&COMPRESS=gzip&USER=idv&PROJ=0& VERSION=1&DEBUG=false&TRACE=0&GROUP=FL&DESCRIPTOR=8712C574&BAND=2&LATLON=29.7276 -85.0274 E&PLACE=ULEFT&SIZE=1 1&UNIT=TEMP&MAG=1 1&SPAC=4&NAV=X&AUX=YES&DOC=X&DAY=2012002 2012002&TIME=&POS=0&TRACK=0"
a=af(url);
value=a.getData();
print value
array([[I, [array([I, [array('i', [2826, 2833, 2841, 2853])])])
So what does this mean?
Please excuse me if the question seems trivial, while I am comfortable with python I am really new to dealing with scientific data.
Note
Here is a link to the entire script.
After asking around, I found out that the AreaFile object returns data in multiples of four, so the very first value is what I am looking for.
Grabbing it is as simple as:
value[0][0][0]