Printing a large dataframe across pages - pandas

I need to print a large table across multiple pages. The table contains both header rows and a "header" column. This image shows what I would like to achieve:
https://github.com/EricG-Personal/table_print/blob/master/table.png
I do not want the contents of any cell to be clipped, split between pages, or auto-scaled to be smaller. Each page should have the appropriate header rows and each page should have the appropriate header column (the ID column).
The only aspect not depicted is that some of the cells would contain image data.
Can I achieve this with pandas?
What possible solutions do I have when attempting to print a large dataframe?

Pandas has no such capability; it wasn't designed for that in the first place.
I'd suggest exporting your DataFrame to an Excel sheet and printing it from MS Excel. To the best of my knowledge, it has everything you need.
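A minimal sketch of that export, assuming the openpyxl engine is installed and that the ID column is named "ID" (both are assumptions, not details from the question). The function name export_for_printing is made up for illustration. It writes the DataFrame to an .xlsx file and marks the first row and first column as print titles, which Excel repeats on every printed page. Images in cells and page-break tuning are not covered here.

```python
import pandas as pd


def export_for_printing(df, path, id_column="ID", sheet_name="table"):
    """Write df to an .xlsx file whose header row and ID column repeat when printed.

    Hypothetical helper: the name, the "ID" default and the sheet name are
    assumptions for illustration only.
    """
    # Put the ID column first so it can act as the repeated "header" column.
    ordered = df[[id_column] + [c for c in df.columns if c != id_column]]
    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        ordered.to_excel(writer, sheet_name=sheet_name, index=False)
        ws = writer.sheets[sheet_name]
        ws.print_title_rows = "1:1"  # repeat the header row on each page
        ws.print_title_cols = "A:A"  # repeat the ID column on each page
```

Calling export_for_printing(df, "table.xlsx") and then printing the workbook from Excel should give each page its own copy of the header row and the ID column. Whether cells are scaled or split is still controlled by Excel's page setup.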

Related

read_excel only read cells formatted as table

This is how I am currently importing data from an Excel file in which all rows that contain information are formatted as a table. Row number 13 is the header.
df = pd.read_excel('path_to_file', skiprows=12, usecols="C:T", na_values='N/A')
My question is: given that I have skipped the rows (skiprows) and columns (usecols) without information, does pandas only read down to the last cell that contains a value (currently 65.500, but increasing every day), or does it always read a fixed amount (e.g. 1M)?
Is there any way to improve performance by reading only the necessary rows (cells with values)?
Thank you!

Create table schema and load data in bigquery table using source google drive

I am creating a table using Google Drive as the source and Google Sheets as the format.
I selected "Drive" as the value for "Create table from". For the file format, I selected Google Sheet.
I also selected Auto Detect Schema and input parameters.
It creates the table, but the first row of the sheet is also loaded as data instead of as the table's field names.
Kindly tell me what I need to do to get the first row of the sheet used as the table's column names rather than as data.
It would have been helpful if you could include a screenshot of the top few rows of the file you're trying to upload at least to see the data types you have in there. BigQuery, at least as of when this response was composed, cannot differentiate between column names and data rows if both have similar datatypes while schema auto detection is used. For instance, if your data looks like this:
headerA, headerB
row1a, row1b
row2a, row2b
row3a, row3b
BigQuery would not be able to detect the column names (at least automatically using the UI options alone) since all the headers and row data are Strings. The "Header rows to skip" option would not help with this.
Schema auto detection should be able to detect and differentiate column names from data rows when you have different data types for different columns though.
You have an option to skip header row in Advanced options. Simply put 1 as the number of rows to skip (your first row is where your header is). It will skip the first row and use it as the values for your header.

Machine Learning - Feature Generation using dataframes with different sizes

I have multiple CSV files, each with 18 columns of sensor data from a production cycle, ordered by time. Each CSV file represents one product (a smartphone) whose production was either successful (1) or unsuccessful (0). I converted each CSV to a dataframe and collected them in a dictionary. The CSV files have different numbers of rows.
My question is whether I have to compress each of them into a single row, with the result of either 1 or 0 at the end, in order to compare different machine learning algorithms (such as multiple logistic regression). For my algorithm, each input is a dataframe and the output is a label. Concatenating all the rows side by side into one single row could create feature vectors of different lengths.
For example: I have 7 CSV files converted into 7 dataframes, and I put them together in one dataframe with 7 rows (a single row for every dataframe).
If I have to compress one DataFrame into a single row, could you tell me how to do so?
Or is it possible to tell the algorithm that it has to consider every row of a whole dataframe (30000 rows)?
Thank you very much!

Comparing two datasets in SSRS

I'm looking to compare two datasets with each other. In an ideal world, I'd like to have it to show a green item if the data matches between the two. I have created two different GDocs files to get the code out there, to prevent SO from dinging me on formatting.
The first dataset comes from our program itself; it pulls everything from our application and displays the information based on company code. The second dataset comes from an external source that requires validation. The main fields I am matching are "NPI Number (Type 1)" from DS1 and "NPI" from DS2. If there is a match, I want the row highlighted in green on both sides.
Dataset 1
Dataset 2
You may need to use the Lookup function in an expression that sets the background color of a text box or of a table row.
Sample Expression: =iif(Len(Lookup(Fields!NPI.Value, Fields!NPI.Value, Fields!ProviderName.Value, "DS1"))>0,"Green","Red")
I have created a sample here. Download entire content and run it.

What is the best way to store and access static table data?

A real beginner here,
I am looking to have a table of static data with about 300 cells in it. (There will be 12 distinct tables in all)
The user would input two values, the first would indicate the row, and the second would point to the cell within that row, and I want my app to be able to read back the column heading for that row.
What is the best way to have this data stored in my app? Currently the data is in a spreadsheet.
The data looks like:
Index 0,Index 1,Index 2,Index 3 ,Index 4,Index 5,Index 6,Index 7,Index 8,Index 9
10,156,326,614,1261,1890,3639,5800,10253,20914
20,107,224,422,867,1299,2501,3986,7047,14374 ...etc.
Where the number at index zero is the name of the row (entered by user) and the numbers after that are the values also entered by the user.
I want the code to take the two numbers (row and value) and then return a string based on the column heading (shown here as index 0 - 9)
The last tricky bit: if the user enters a value that falls between the values given, I want it to use the next highest value from the data. E.g. if in row "10" the user inputs 700, I want the code to return the index heading for 1261.
Does that make sense?
Possibilities are endless...
In code as a static 2D array
XML
JSON
Tab Delimited Text File
Comma Delimited Text File
PList
etc.
All depends on your needs and wants.
On the CONs for each:
Static 2D array may consume some memory every time the app runs...
A file will involve some disk IO or processing requirements to read the values out of the file stored in the Bundle.
On the PROs for each:
Data from the static array would be FAST...
Updating data in a file could be done on-the-fly over the web.
You could write a simple routine to dump your spreadsheet into any of the above listed options, so I don't think that's a real serious consideration. It's mostly about what works best for you in terms of size of data and updatability/maintainability.
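Whichever storage option you pick, the lookup described in the question stays the same once the data is in memory. Here is a minimal sketch of it in Python, using the "static 2D array" option and the two sample rows from the question. The names HEADINGS, TABLE and heading_for are made up for illustration, and the sketch assumes every row is sorted in ascending order, as the sample rows are.

```python
from bisect import bisect_left

# Column headings for the value columns (the question's "Index 1" .. "Index 9").
HEADINGS = [f"Index {i}" for i in range(1, 10)]

# Row name (the question's "Index 0" column) -> values in ascending order.
TABLE = {
    10: [156, 326, 614, 1261, 1890, 3639, 5800, 10253, 20914],
    20: [107, 224, 422, 867, 1299, 2501, 3986, 7047, 14374],
}


def heading_for(row_name, value):
    """Return the heading of the first column whose value is >= the given value."""
    values = TABLE[row_name]
    # bisect_left finds the leftmost position where value could be inserted
    # while keeping the row sorted, i.e. the first entry that is >= value.
    pos = bisect_left(values, value)
    if pos == len(values):
        raise ValueError(f"{value} is larger than every entry in row {row_name}")
    return HEADINGS[pos]


print(heading_for(10, 700))   # Index 4 (the column holding 1261)
print(heading_for(10, 1261))  # Index 4 (exact match)
```

The same search works if the rows are loaded from a file instead. What to do when the value is larger than the last entry is not stated in the question, so the sketch simply raises an error.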