How to generate a fits file from the beginning - header

This post (Converting ASCII Table to FITS image) explains how to generate a FITS file from an ASCII file. However, I would also like to know how to define the header and the data in a FITS file.
For example, when I open a spectral FITS file (downloaded from a telescope) with astropy, I can access the data and the header separately,
i.e.
In [1]:hdu = fits.open('observation.fits', memmap=True)
In [2]:header = hdu[0].header
In [3]:header
Out [3]:
SIMPLE = T / conforms to FITS standard
BITPIX = 8
NAXIS = 1
NAXIS1 = 47356
EXTEND = T
DATE = 'date' / file creation date (YYYY-MM-DDThh:mm:ss UT)
ORIGIN = 'XXX ' / European Southern Observatory
TELESCOP= 'XXX' / ESO Telescope Name
INSTRUME= 'Instrument' / Instrument used.
OBJECT = 'ABC ' / Original target.
RA = 30.4993 / xx:xx:xx.x RA (J2000) pointing
DEC = -20.0009 / xx:xx:xx.x DEC (J2000) pointing
CTYPE1 = 'WAVE ' / wavelength axis in nm
CRPIX1 = 0. / Reference pixel in z
CRVAL1 = 298.903594970703 / central wavelength
CDELT1 = 0.0199999995529652 / nm per pixel
CUNIT1 = 'nm ' / spectral unit
..
bla bla
..
END
In [4]:data = hdu[0].data
In [5]:data
Out [5]:array([ 1000, 1001, 1002, ...,
5.18091546e-13, 4.99434453e-13, 4.91280864e-13])
Let's assume I have data like the following:
WAVE FLUX
1000 2.02e-12
1001 3.03e-12
1002 4.04e-12
..
bla bla
..
So, I'd like to generate a spectral fits file with my own data (with its own header).
Mini question: Now let's assume I generate the spectral FITS file correctly, but then realise that I forgot to take the logarithm of the WAVE values on the X axis (1000, 1001, 1002, ...). How can I do that without touching the FLUX values on the Y axis (2.02e-12, 3.03e-12, 4.04e-12)?

FITS files are organized as one or more HDUs (Header Data Units), each consisting, as the name suggests, of one data object (generally a single array for an observation, though sometimes something else, like a table) and the header of metadata that goes with that data.
To create a file from scratch, especially an image, the simplest way is to directly create an ImageHDU object:
>>> import numpy as np
>>> from astropy.io import fits
>>> hdu = fits.ImageHDU()
Just as with an HDU read from an existing file, this HDU has a (mostly empty) header, and an empty data attribute that you can then assign to:
>>> hdu.data = np.array(<some numpy array>)
>>> hdu.header['TELESCOP'] = 'Gemini'
When you're satisfied you can write the HDU out to a file with:
>>> hdu.writeto('filename.fits')
(Note: A lot of the documentation you'll see demonstrates a more complex process of creating an HDUList object, appending the HDU to the HDU list, and then writing the full HDU list. This is only necessary if you're creating a multi-extension FITS file. For a single HDU, you can use hdu.writeto directly and the framework will handle the other structural details.)
In general you don't need to manipulate the headers that describe the format of the data itself--that is automatic and should not be touched by hand (FITS has the unfortunate misfeature of mixing information about data structure with actual metadata). You can see more examples on how to manipulate FITS data here: http://docs.astropy.org/en/stable/generated/examples/index.html#astropy-io
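As a rough sketch of the steps above applied to the WAVE/FLUX sample from the question: the flux goes into the data array, and the wavelength axis is described with linear WCS keywords in the header. This assumes the wavelengths are evenly spaced, uses fits.PrimaryHDU (rather than ImageHDU) so that the data ends up in hdu[0] as in the question, and writes to a temporary path; the file name and the OBJECT value are placeholders. No unit keyword is written because the question does not state the wavelength unit.

```python
import os
import tempfile

import numpy as np
from astropy.io import fits

# Sample values from the question; a real spectrum would have many more points.
wave = np.array([1000.0, 1001.0, 1002.0])
flux = np.array([2.02e-12, 3.03e-12, 4.04e-12])

# PrimaryHDU keeps the data in hdu[0], matching how the question reads the file.
hdu = fits.PrimaryHDU(data=flux)

# Describe the (assumed evenly spaced) wavelength axis with linear WCS keywords.
hdu.header['CTYPE1'] = ('WAVE', 'wavelength axis')
hdu.header['CRPIX1'] = (1.0, 'reference pixel (1-based)')
hdu.header['CRVAL1'] = (float(wave[0]), 'wavelength at reference pixel')
hdu.header['CDELT1'] = (float(wave[1] - wave[0]), 'wavelength step per pixel')
hdu.header['OBJECT'] = ('ABC', 'target name (placeholder)')

# Hypothetical output location inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'my_spectrum.fits')
hdu.writeto(path, overwrite=True)

# Read the file back and rebuild the wavelength grid from the header alone.
with fits.open(path) as hdul:
    header = hdul[0].header
    flux_back = hdul[0].data.copy()
    pixels = np.arange(1, header['NAXIS1'] + 1)
    wave_back = header['CRVAL1'] + (pixels - header['CRPIX1']) * header['CDELT1']

print(wave_back)
print(flux_back)
```

Structural keywords such as NAXIS1 and BITPIX are filled in by astropy from the array, which is why the sketch only sets the descriptive ones.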
Your other question pertains to manipulating the WCS (World Coordinate System) of the image, and in particular for spectral data this can be non-trivial. I would ask a separate question about that with more details about what you hope to accomplish.

Issues trying to open a two-dimensional array leaf contained in a ROOT Tree in PyROOT

I’m stuck with a problem using Pyroot. I’m not able to read a leaf on a tree which is a two dimensional array of float values. You can see the related Tree in the following:
root [1] TTree *tr = (TTree*)g->Get("tevent_2nd_integral")
root [2] tr->Print()
*Tree :tevent_2nd_integral: Event packet tree 2nd GTUs integral *
*Entries : 57344 : Total = 548967602 bytes File Size = 412690067 *
: : Tree compression factor = 1.33 *
*Br 7 :photon_count_data : photon_count_data[1][1][48][48]/F *
*Entries : 57344 : Total Size= 530758073 bytes File Size = 411860735 *
*Baskets : 19121 : Basket Size= 32000 bytes Compression= 1.29 *
…
The array in question is photon_count_data[1][1][48][48]. I have several ROOT files, and I tried both building a chain and merging them with hadd, e.g. hadd file.root `ls /path/*.root`.
I tried several approaches, shown below, and each one hit a different problem: in one case the numpy array that should contain the 48x48 values for each event was not created at all; in others nothing was written, or the values were strange (including negative ones, which is not possible).
My code is the following:
import os
import numpy as np
import ROOT
from ROOT import TChain
# open the ROOT file produced by merging all files with hadd
rootFile = path + "merge.root"
f = ROOT.TFile(rootFile, 'read')
tree = f.Get('tevent_2nd_integral')
# alternatively, build a chain from the individual files
PDMchain = TChain("tevent_2nd_integral")
for filename in sorted(os.listdir(path)):
    if filename.endswith('.root') and ("CPU_RUN_MAIN" in filename):
        PDMchain.Add(filename)
pdm_counts = []
# First method: a Python class holding the leaves
leaves = tree.GetListOfLeaves()
# dynamically define a Python class containing the ROOT leaf objects
class PyListOfLeaves(dict):
    pass
# create an instance
pyl = PyListOfLeaves()
for i in range(0, leaves.GetEntries()):
    leaf = leaves.At(i)
    name = leaf.GetName()
    # dynamically add an attribute to the instance
    pyl.__setattr__(name, leaf)
for iev in range(0, nEntries_pixel):
    tree.GetEntry(iev)
    pdm_counts.append(pyl.photon_count_data.GetValue())
# Second method: Draw
count = tree.Draw("photon_count_data", "", "")
pdm_counts.append(np.array(np.frombuffer(tree.GetV1(), dtype=np.float64, count=count)))
# Third method: ROOT buffer
for event in PDMchain:
    pdm_data_for_this_event = event.photon_count_data
    pdm_data_for_this_event.SetSize(2304)  # ROOT buffer
    pdm_counts.append(np.array(pdm_data_for_this_event, copy=True))
With the Python class method, the array pdm_counts is filled with just the first element contained in photon_count_data.
With the Draw method, I get a segmentation violation or a strange kernel issue.
With the ROOT buffer method, I get back a list containing all 2304 (48x48) values, but they are completely different from those in photon_count_data, i.e. negative values or nonsensical orders of magnitude.
Could you tell me where I'm going wrong, or whether there is a more elegant and quicker way to do this?
Thanks in advance.
I actually found the solution, and I would like to share it in case anyone needs it!
In fact, the third method described above,
for event in PDMchain:
    pdm_data_for_this_event = event.photon_count_data
    pdm_data_for_this_event.SetSize(2304)  # ROOT buffer
    pdm_counts.append(np.array(pdm_data_for_this_event, copy=True))
works, but unfortunately I was using Spyder to visualize the data, and for some reason it returned strange values that are not right! So... don't use Spyder!
Moreover another method works fine:
from root_pandas import read_root
data = read_root('merge.root', 'tevent_2nd_integral', columns=['cpu_packet_time', 'photon_count_data'])
Cheers!
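Once each event comes back as a flat list of 2304 values, it can be viewed as a 48x48 frame again. The snippet below is only a sketch of that reshaping step: it uses a synthetic float32 buffer in place of real ROOT data (so ROOT is not needed to run it), and it assumes the usual row-major layout of a C array declared as [48][48].

```python
import numpy as np

N_ROWS, N_COLS = 48, 48

# Synthetic stand-in for one event's flat buffer of 2304 float32 values.
flat = np.arange(N_ROWS * N_COLS, dtype=np.float32)

# C arrays are row-major, so the last index varies fastest.
frame = flat.reshape(N_ROWS, N_COLS)

# Stack several per-event buffers into one (n_events, 48, 48) array.
pdm_counts = [flat, flat + 1.0]
cube = np.stack([event.reshape(N_ROWS, N_COLS) for event in pdm_counts])

print(frame.shape, cube.shape)
```

With this layout, element [row][col] of the original array sits at position row * 48 + col of the flat buffer.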

Trying to load an hdf5 table with dataframe.to_hdf before I die of old age

This sounds like it should be REALLY easy to answer with Google but I'm finding it impossible to answer the majority of my nontrivial pandas/pytables questions this way. All I'm trying to do is to load about 3 billion records from about 6000 different CSV files into a single table in a single HDF5 file. It's a simple table, 26 fields, mixture of strings, floats and ints. I'm loading the CSVs with df = pandas.read_csv() and appending them to my hdf5 file with df.to_hdf(). I really don't want to use df.to_hdf(data_columns = True) because it looks like that will take about 20 days versus about 4 days for df.to_hdf(data_columns = False). But apparently when you use df.to_hdf(data_columns = False) you end up with some pile of junk that you can't even recover the table structure from (or so it appears to my uneducated eye). Only the columns that were identified in the min_itemsize list (the 4 string columns) are identifiable in the hdf5 table, the rest are being dumped by data type into values_block_0 through values_block_4:
table = h5file.get_node('/tbl_main/table')
print(table.colnames)
['index', 'values_block_0', 'values_block_1', 'values_block_2', 'values_block_3', 'values_block_4', 'str_col1', 'str_col2', 'str_col3', 'str_col4']
And any query like df = pd.DataFrame.from_records(table.read_where(condition)) fails with error "Exception: Data must be 1-dimensional"
So my questions are: (1) Do I really have to use data_columns = True which takes 5x as long? I was expecting to do a fast load and then index just a few columns after loading the table. (2) What exactly is this pile of garbage I get using data_columns = False? Is it good for anything if I need my table back with query-able columns? Is it good for anything at all?
This is how you can create an HDF5 file from CSV data using pytables. You could also use a similar process to create the HDF5 file with h5py.
1. Use a loop to read the CSV files with np.genfromtxt into a np array.
2. After reading the first CSV file, write the data with the .create_table() method, referencing the np array created in Step 1.
3. For additional CSV files, write the data with the .append() method, referencing the np array created in Step 1.
4. End of loop.
Updated on 6/2/2019 to read a date field (mm/dd/YYYY) and convert it to a datetime object. Note the changes to the genfromtxt() arguments! The data used is added below the updated code.
import numpy as np
import tables as tb
from datetime import datetime
csv_list = ['SO_56387241_1.csv', 'SO_56387241_2.csv']
my_dtype = np.dtype([('a', int), ('b', 'S20'), ('c', float), ('d', float), ('e', 'S20')])
with tb.open_file('SO_56387241.h5', mode='w') as h5f:
    for PATH_csv in csv_list:
        # names=True takes the field names from the CSV header row
        csv_data = np.genfromtxt(PATH_csv, names=True, dtype=my_dtype, delimiter=',', encoding=None)
        # modify the date in the fifth field (mm/dd/YYYY)
        for row in csv_data:
            datetime_object = datetime.strptime(row['my_date'].decode('UTF-8'), '%m/%d/%Y')
            row['my_date'] = datetime_object
        if '/CSV_Data' in h5f:
            # the table already exists: append the rows from this file
            dset = h5f.root.CSV_Data
            dset.append(csv_data)
        else:
            # first file: create the table from the array
            dset = h5f.create_table('/', 'CSV_Data', obj=csv_data)
        dset.flush()
# the with block closes the file, so no explicit h5f.close() is needed
Data for testing:
SO_56387241_1.csv:
my_int,my_str,my_float,my_exp,my_date
0,zero,0.0,0.00E+00,01/01/1980
1,one,1.0,1.00E+00,02/01/1981
2,two,2.0,2.00E+00,03/01/1982
3,three,3.0,3.00E+00,04/01/1983
4,four,4.0,4.00E+00,05/01/1984
5,five,5.0,5.00E+00,06/01/1985
6,six,6.0,6.00E+00,07/01/1986
7,seven,7.0,7.00E+00,08/01/1987
8,eight,8.0,8.00E+00,09/01/1988
9,nine,9.0,9.00E+00,10/01/1989
SO_56387241_2.csv:
my_int,my_str,my_float,my_exp,my_date
10,ten,10.0,1.00E+01,01/01/1990
11,eleven,11.0,1.10E+01,02/01/1991
12,twelve,12.0,1.20E+01,03/01/1992
13,thirteen,13.0,1.30E+01,04/01/1993
14,fourteen,14.0,1.40E+01,04/01/1994
15,fifteen,15.0,1.50E+01,06/01/1995
16,sixteen,16.0,1.60E+01,07/01/1996
17,seventeen,17.0,1.70E+01,08/01/1997
18,eighteen,18.0,1.80E+01,09/01/1998
19,nineteen,19.0,1.90E+01,10/01/1999
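On the data_columns part of the question, one possible middle ground is sketched below, staying with pandas rather than raw PyTables. As far as I know, data_columns also accepts a list of column names, so only those columns are stored individually and become queryable, while the rest stay grouped in the values_block_N columns and are put back together by pandas on read. The file path, the key, and the choice of my_int as the queryable column are made up for the example, and only the numeric columns of the test data are used to keep it short.

```python
import os
import tempfile

import pandas as pd

# Two small chunks standing in for the CSV files (numeric columns only).
chunks = [
    pd.DataFrame({'my_int': [0, 1, 2, 3, 4],
                  'my_float': [0.0, 1.0, 2.0, 3.0, 4.0],
                  'my_exp': [0.0, 1.0, 2.0, 3.0, 4.0]}),
    pd.DataFrame({'my_int': [5, 6, 7, 8, 9],
                  'my_float': [5.0, 6.0, 7.0, 8.0, 9.0],
                  'my_exp': [5.0, 6.0, 7.0, 8.0, 9.0]}),
]

# Hypothetical output file inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'demo.h5')

for chunk in chunks:
    # Only 'my_int' is stored as its own queryable column.
    chunk.to_hdf(path, key='tbl_main', format='table', append=True,
                 data_columns=['my_int'])

# Query on the data column; pandas rebuilds the remaining columns from the blocks.
result = pd.read_hdf(path, key='tbl_main', where='my_int >= 7')
print(result)
```

Reading through pd.read_hdf (or HDFStore.select) instead of table.read_where keeps pandas in charge of unpacking the blocks, which is the step that DataFrame.from_records cannot do on its own.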

Importing a TermDocumentMatrix into R

I am working on a qualitative analysis project using the tm package in R. I have built a corpus and created a term document matrix; long story short, I need to edit the term document matrix and conflate some of its rows. To do this, I exported it from R using
write.csv()
I then imported the CSV file back into R, but I am struggling to figure out how to get R to read it as a TermDocumentMatrix or DocumentTermMatrix.
I tried following the example code below, to no avail.
It seems to keep reading my matrix as if it were a corpus, with each cell as a single document.
# change this file location to suit your machine
file_loc <- "C:\\Documents and Settings\\Administrator\\Desktop\\Book1.csv"
# change TRUE to FALSE if you have no column headings in the CSV
x <- read.csv(file_loc, header = TRUE)
require(tm)
corp <- Corpus(DataframeSource(x))
dtm <- DocumentTermMatrix(corp)
Is there any way to import a CSV matrix so that it is read as a TermDocumentMatrix or DocumentTermMatrix, without R treating each cell as a document?
You're not reading documents, so skip the Corpus() step. This should work directly:
myDTM <- as.DocumentTermMatrix(x, weighting = weightTf)
For next time, consider saving the TDM object as .RData as this will not require conversion, and is also much more efficient.
If you want to keep the format of any data, I would recommend using the save() function.
You can save any R objects into a .RData file. And when you want to retrieve the data, you can use the load() function.

Hiding data from a text file in an image file using DWT steganography

The code below hides the text "helloworld" in the two specified DWT coefficients using steganography. I have been trying to adapt the code to hide data contained in a .txt file. I have been working on this for a while but can't seem to get anything to work correctly. Can anyone help, please?
clear all;
close all;
dataToHide = 'helloworld';
wavename = 'haar';
data = zeros(1, length(dataToHide));
for i = 1 : length(dataToHide)
    d = dataToHide(i) + 0;
    data(i) = d;
end
im = imread('cameraman.tif');
%imshow(im);
[cA1, cH1, cV1, cD1] = dwt2(im, wavename);
A1 = upcoef2('a', cA1, wavename, 1);
H1 = upcoef2('h', cH1, wavename, 1);
V1 = upcoef2('v', cV1, wavename, 1);
D1 = upcoef2('d', cD1, wavename, 1);
subplot(2,2,1); image(wcodemat(A1,192));
title('A1');
subplot(2,2,2); image(wcodemat(H1,192));
title('H1');
M = max(data);
normilize = data/M;
n = length(data);
cH1(1,1) = -1*(n/10);
cH1(1,2) = -1*(M/10);
[~, y] = size(cH1);
for i = 1 : ceil(n/2)
    cV1(i,y) = normilize(i);
end
for i = ceil(n/2)+1 : n
    cD1(i,y) = normilize(i);
end
Update
I can now read text from the file. However, I have come across another problem. When I read from the file, I want to convert the text to binary (name = dec2bin(dataToHide)), but the above code won't hide binary data for me. I am very new to MATLAB and to steganography/watermarking. I have been doing a lot of research on LSB embedding in the discrete wavelet transform. However, the code above, which I took from the web, manipulates subband coefficients, and from what I can read in the code it does not do so by LSB replacement (i.e. replacing the LSB of the cover image with the MSB of the secret data file). Can anyone recommend some code for me to look at that works by LSB embedding in wavelets?
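To make the payload side of this concrete, here is a small sketch in Python (not MATLAB) of the two representations discussed: the character codes scaled by their maximum, as the posted code does with data/M, and an 8-bit string per character, similar to what dec2bin would give. It covers only preparing and recovering the payload from a text file, not the embedding into wavelet coefficients, and the file name secret.txt is invented for the example.

```python
import os
import tempfile

# Hypothetical text file holding the message to hide.
path = os.path.join(tempfile.mkdtemp(), 'secret.txt')
with open(path, 'w', encoding='ascii') as handle:
    handle.write('helloworld')

with open(path, encoding='ascii') as handle:
    data_to_hide = handle.read()

# Character codes, the same values the MATLAB loop stores in `data`.
codes = [ord(ch) for ch in data_to_hide]

# Scaling used by the posted code: divide by the maximum code.
peak = max(codes)
normalized = [code / peak for code in codes]
recovered = ''.join(chr(round(value * peak)) for value in normalized)

# Bit representation: 8 bits per character, most significant bit first.
bits = ''.join(format(code, '08b') for code in codes)
decoded = ''.join(chr(int(bits[i:i + 8], 2)) for i in range(0, len(bits), 8))

print(recovered, decoded, len(bits))
```

The scaled values need the length and the maximum to be stored alongside them (the posted code puts n/10 and M/10 into cH1), whereas the bit string needs only its length to be decoded.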

Importing timeseries datasets to MATLAB (all values are displayed as NaN)

I am stuck at the data import step while trying to run an economic model in MATLAB. For most of my code I'm using a freeware toolbox called IRIS.
I have a quarterly dataset with 14 variables and 160 data points. Essentially, the dataset is a 15x161 matrix, including the dates (col1) and variable names (B1:O1).
The command used for uploading data on IRIS is
d = dbload('filename.csv')
but this isn't working: MATLAB creates a 1x1 array called d with a field under it for each variable, but all cells display NaN (not a number).
Why is this happening?
I checked the tutorials on the IRIS toolbox website and tried running and loading a sample dataset from there using this command, but it leads to the same problem. Everywhere I checked- including MATLAB help, this seems to be the correct command to use when using IRIS, but somehow it isn't working.
I also tried loading the data directly using MATLAB functions rather than IRIS. The command I'm using is:
d = dataset('XLSFile','filename.xls','ReadVarNames', true)
This works, and I can see all the variable names, but MATLAB can't read the dates. I tried xlsread and importdata as well, but they don't read the variable names. Is there any way for me to load the entire Excel sheet with the variable names and dates?
It would be best if I could get the IRIS command to work, since the rest of my code would be compatible with that.
The dataset looks somewhat like this:
HO_GDP HO_CPI HO_CPI HO_RS HO_ER HO_POIL....
4/1/1970 82.33 85.01 55.00 99.87 08.77
7/1/1970 54.22 8.98 25.22 95.11 91.77
10/1/1970 85.41 85.00 85.22 95.34 55.00
1/1/1971 85.99 899 8.89 85.1
You can use the TEXTSCAN function to read the CSV file in MATLAB:
%# some options
numCols = 15; %# number of columns
opts = {'Delimiter',',', 'MultipleDelimsAsOne',true, 'CollectOutput',true};
%# open file for reading
fid = fopen('filename.csv','rt');
%# read header line
headers = textscan(fid, repmat('%s',1,numCols), 1, opts{:});
%# read rest of data rows
%# 1st column as string, the other 14 as floating point
data = textscan(fid, ['%s' repmat('%f',1,numCols-1)], opts{:});
%# close file
fclose(fid);
%# collect data
headers = headers{1};
data = [datenum(data{1},'mm/dd/yyyy') data{2}];
The result for the above sample you posted (assuming values are comma-separated):
>> headers
headers =
'HO_GDP' 'HO_CPI' 'HO_CPI' 'HO_RS' 'HO_ER' 'HO_POIL'
>> data
data =
7.1962e+05 82.33 85.01 55 99.87 8.77
7.1971e+05 54.22 8.98 25.22 95.11 91.77
7.198e+05 85.41 85 85.22 95.34 55
7.1989e+05 85.99 899 8.89 85.1 0
Note how in the last line of the code we convert the date column to serial date number, so that we can store the entire data in one numeric matrix. You can always go back to string representation of dates using DATESTR function:
>> datestr(data(:,1))
ans =
01-Apr-1970
01-Jul-1970
01-Oct-1970
01-Jan-1971
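The serial date numbers can also be cross-checked outside MATLAB. The sketch below is in Python and relies on the commonly quoted relationship that a MATLAB datenum for a whole day equals Python's proleptic Gregorian ordinal plus 366; the helper names to_datenum and from_datenum are made up for the example.

```python
from datetime import date, datetime

# Assumed offset between MATLAB serial date numbers and Python ordinals.
MATLAB_OFFSET = 366


def to_datenum(text, fmt='%m/%d/%Y'):
    """Convert a date string to a MATLAB-style serial date number."""
    return datetime.strptime(text, fmt).toordinal() + MATLAB_OFFSET


def from_datenum(serial):
    """Convert a whole-day serial date number back to a date."""
    return date.fromordinal(int(serial) - MATLAB_OFFSET)


dates = ['4/1/1970', '7/1/1970', '10/1/1970', '1/1/1971']
serials = [to_datenum(d) for d in dates]

print(serials)
print([from_datenum(s).strftime('%d-%b-%Y') for s in serials])
```

The four serial numbers come out as 719619, 719710, 719802 and 719894, which round to the 7.1962e+05 ... 7.1989e+05 values shown in the first column of data above.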