Flattening time-series IoT data with pandas

I would really appreciate some help transforming a flat Excel table into a single-column series.
The Excel data holds daily IoT sensor readings (365 days/year as rows) collected hourly (24 columns of observations by hour), so the current presentation is hourly readings as columns and dates as rows. I'm new to Stack Overflow, so I cannot directly embed images yet.
Before (image 1): hourly readings in 24 columns, one row per date.
After (image 2): a single column of readings indexed by date and hour.
I have successfully imported the .xls file with pd.read_excel: the index column is set to a datetime type, and the file is imported with skiprows/skipfooter.
Problem #1: how to flatten/transpose the multi-column dataframe into a single series, one value per hour per date.
Problem #2: how to create a MultiIndex that combines the date of the observation with the hour of the observation.
The following images show where the data is and where it needs to go.
I apologize in advance for any lapses in posting protocol. As I mentioned, I'm new and therefore limited in what I can post to make it easier for you to assist.

You can use df.stack() for this; see the pandas documentation for DataFrame.stack for usage details. Calling df.stack() will also automatically create a MultiIndex combining the date and the hour of each observation.
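A minimal sketch with made-up data (two dates, three hour columns; a real file would have 365 rows and 24 columns):

    import pandas as pd

    # Dates down the rows, hour-of-day across the columns.
    df = pd.DataFrame([[1.2, 3.4, 5.6], [7.8, 9.0, 1.1]],
                      index=pd.to_datetime(["2020-01-01", "2020-01-02"]),
                      columns=[0, 1, 2])
    df.index.name, df.columns.name = "date", "hour"

    # One value per (date, hour); the result's index is a MultiIndex.
    s = df.stack()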


In ArcGIS or QGIS is it possible to add timestamps to geotiffs from a CSV file containing the timestamps?

I have about 400 geotiffs consisting of satellite data obtained on different dates. I would like to make an animated time series from these data using GIS (either ArcGIS or QGIS), Python, or R. The issue I am having with the GIS approach is that the geotiffs do not have a 'date' field, so I cannot do this directly. However, I have a CSV file containing the geotiff file names in one column and the date for each geotiff in another. So my question is: is it possible to populate the geotiffs with the time information contained in the CSV file? Furthermore, each of my geotiff filenames contains its date (in the format YYYY-mm-dd), e.g. 'XYZ_2000-01-01.tif'.
So far I have tried the Time Manager plugin in QGIS, with which I attempted to extract the date for each geotiff from the filename (see screenshot below). This was unsuccessful (error: 'Could not find a suitable time format or date'). I managed to create an animated time series for the data in R, but I couldn't add a basemap, so the geographical context was lost.
You are specifying your date start wrongly, and I think you probably want to add the files as rasters, not layers. Your date starts a long way past position 0 (50-something characters into the filename), so I would rename all your files to put the date at the start and save counting all the way to the end.
My setup looks like this (in QGIS):
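A minimal sketch of that renaming idea in Python, assuming the files sit in a folder named geotiffs (the folder name and glob pattern are illustrative):

    import re
    from pathlib import Path

    # Move the embedded YYYY-mm-dd date to the front of each filename,
    # e.g. 'XYZ_2000-01-01.tif' -> '2000-01-01_XYZ_2000-01-01.tif',
    # so the plugin's start offset is the same for every file.
    for f in Path("geotiffs").glob("*.tif"):
        m = re.search(r"\d{4}-\d{2}-\d{2}", f.name)
        if m:
            f.rename(f.with_name(f"{m.group(0)}_{f.name}"))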

Manipulating time series with different start dates

I'm a novice Python programmer and I have an issue that I was hoping you could help me with.
I have two time series in pandas, but they start at different dates. Let's say one starts in 1989 and the other in 2002. Now I want to compare the cumulative growth of the two by indexing both series to 2002 (the first period where I have data for both) and calculating the ratio.
What is the best way to go about it? Ideally, the script should check the earliest date for which both series have data and index both to 100 from that point onward.
Thank you in advance!
A practical solution may be to split the dataframe into two single-column dataframes, one per time series, and add a 'monthyear' column to each that lists only the month and year (e.g. 05-2015). Then you can merge the two dataframes on that column, keeping only the rows whose months overlap: pd.merge(df1, df2, on='monthyear', how='inner').
You can split the pandas dataframe by creating a new dataframe and loading in only one column (or row, depending on what your dataframe looks like): df1 = pd.DataFrame(original_dataframe[0]) and df2 = pd.DataFrame(original_dataframe[1]).
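A minimal sketch of that approach with made-up data (column and variable names are illustrative), including the index-to-100 step from the question:

    import numpy as np
    import pandas as pd

    df1 = pd.DataFrame({"date": pd.date_range("1989-01-01", periods=400, freq="MS"),
                        "a": np.linspace(50, 150, 400)})
    df2 = pd.DataFrame({"date": pd.date_range("2002-01-01", periods=244, freq="MS"),
                        "b": np.linspace(80, 200, 244)})
    for df in (df1, df2):
        df["monthyear"] = df["date"].dt.strftime("%m-%Y")

    # Keep only the overlapping months.
    merged = pd.merge(df1, df2, on="monthyear", how="inner")

    # Index both series to 100 at the first overlapping month and compare.
    merged["a_idx"] = merged["a"] / merged["a"].iloc[0] * 100
    merged["b_idx"] = merged["b"] / merged["b"].iloc[0] * 100
    merged["ratio"] = merged["a_idx"] / merged["b_idx"]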

Pandas Dataframe Functionality

I would like to create dataframes using Excel spreadsheets as the source data. I need to transform the data series from the format used to store the data in the spreadsheets to the dataframe end product.
I would like to know if users have experience in using various python methods to accomplish the following:
-data series transform: I have a series with one data value per month, but I would like to expand the table of values to one value per day using an index (or perhaps a column of date values). So if table1 has a month-based index and table2 has a daily index, how can I convert table1's values to table2's daily index?
-dataframe sculpting: the data I am working with is not uniform in length; some datasets are longer than others. How can I find the shortest series length among the columns of a multi-column dataframe?
Essentially, I would like to take individual tables from workbooks and combine them into a single dataframe that presents everything against a single index. My workbook tables may have data point frequencies of daily, weekly, or monthly, and I would like to build a dataframe that uses the daily index as a basis while filling in an element for each day in the weekly and monthly series.
I am looking at the Pandas library, but perhaps there are other libraries that I have overlooked with additional functionality.
Thanks for helping!
For your first question, try something like:

    df1 = df1.resample('1D').first()
    df2 = df2.merge(df1, left_index=True, right_index=True)

That will upsample your monthly or weekly dataframe to a daily index and merge it with your daily dataframe on their shared DatetimeIndex. Take a look at the interpolate method to fill in the missing values. To get the name of the shortest column, try this out:

    df.count().idxmin()

Hope that helps!
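Putting the pieces together, a minimal sketch with made-up data (frame and column names are illustrative):

    import numpy as np
    import pandas as pd

    daily = pd.DataFrame({"x": np.arange(90.0)},
                         index=pd.date_range("2020-01-01", periods=90, freq="D"))
    monthly = pd.DataFrame({"y": [10.0, 20.0, 30.0]},
                           index=pd.date_range("2020-01-01", periods=3, freq="MS"))

    # Upsample the monthly frame to daily, interpolate the gaps,
    # and align it with the daily frame on the DatetimeIndex.
    monthly_daily = monthly.resample("1D").first().interpolate()
    combined = daily.merge(monthly_daily, left_index=True, right_index=True)

    # Column with the fewest non-NA values, i.e. the shortest series.
    shortest = combined.count().idxmin()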

pandas: averaging corresponding elements from multiple files

I want to average several .csv files that have the same dimensions and save the result as a new file, something like:

    df_new = average(df1, df2, df3, ...)

Is there a ready-made function that does this?
Supposing all your dataframe columns are numeric and you want the element-wise mean across multiple dataframes, you can do something like this:

    import numpy as np
    import pandas as pd

    df_new = pd.DataFrame(columns=df1.columns,
                          data=np.mean(np.array([df1.values, df2.values, df3.values]), axis=0))
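Extending that to files on disk, a minimal sketch that reads every CSV matching a hypothetical pattern, averages them element-wise, and writes the result:

    import glob

    import numpy as np
    import pandas as pd

    # All input files are assumed to share the same shape and columns.
    dfs = [pd.read_csv(path) for path in sorted(glob.glob("data/*.csv"))]
    df_new = pd.DataFrame(np.mean([df.values for df in dfs], axis=0),
                          columns=dfs[0].columns)
    df_new.to_csv("average.csv", index=False)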