Borderless table: some values are not separated into columns - dataframe

I am using Camelot to extract borderless tables from a PDF file, with the parameters below:
import camelot
budget_tables = camelot.read_pdf(budget_file, pages='all', flavor='stream', edge_tol=80, strip_text='\n')
The issue is that in some tables (there are over 300 tables in this file), values that are too large end up grouped together in the same cell. The output looks like the sample below: in some rows each value sits in its own column, while in others several values are separated only by a space and share one cell.
I was thinking I would have to write a function that goes through the dataframe, checks each cell for the delimiter (' '), splits it, and fills the empty cells around it with the pieces. I still need help with that, because it is not consistent whether the empty cells are to the left or to the right. But if there is an option in the Camelot call that would reduce this kind of output, that is where I would prefer to start.
Sorry for the bad table formatting below. Any tips on showing this table better would be appreciated. I can't upload images from my workstation.
   | 0 | 1 | 2 | 3 | 4 | 5 | 6
30 | Sales of non-financial assets | 173,853 | 192,108 | 176,957 | 226,843 | 188,370 | 74,022
31 | Payments for non-financial asset | -1,274,120 | -866,331 | -1,372,111 -1,100,557 -1,359,568 ...
32 | Net cash flows from investments | -1,100,267 | -674,223 -1,195,154 | | -873,714 -1,171,198 -1,229,102 |
33 | in non-financial assets
34 | Cash flows from investments in
35 | financial assets for policy
36 | purposes
37 | Receipts
38 | Repayment of loans | 30,044 | 29,409 | 1,185 | 3,235 | 6,136 | 9,036
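The post-processing idea described above (split merged cells and refill the row) could look roughly like the sketch below. It rests on assumptions that are not confirmed by the question: the first column holds the row label, every value is a comma-grouped number, and a row is only repaired when its tokens number exactly as many as the value columns. The names `respread` and `respread_row` are made up for illustration.

```python
import re

import pandas as pd

# Matches values such as 173,853 or -1,274,120 (optionally with decimals).
NUMERIC = re.compile(r"-?[\d,]+(\.\d+)?")


def respread_row(row, label_cols=1):
    """Re-distribute space-separated numbers across the value columns of one row."""
    values = row.iloc[label_cols:]
    tokens = []
    for cell in values:
        tokens.extend(str(cell).split())
    # Only touch the row when every value column can receive exactly one number.
    if len(tokens) == len(values) and all(NUMERIC.fullmatch(t) for t in tokens):
        row = row.copy()
        row.iloc[label_cols:] = tokens
    return row


def respread(df, label_cols=1):
    """Apply respread_row to every row; rows that do not fit are left unchanged."""
    return df.apply(respread_row, axis=1, label_cols=label_cols)
```

Usage would be along the lines of `fixed = respread(budget_tables[0].df)`. Rows that are label-only, or whose token count does not match the number of value columns, pass through untouched, so they can still be inspected by hand.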

Related

Matrix in SQL/VB.net

I have a DataGrid that stores the number of pencils produced on each day of the month. It looks like this:
Pencil | day 1 | day 2 | day 3 | ... | day 31
Red    | 0     | 0     | 13    | 0   | 0
blue   | 5     | 1     | 0     | 8   | 0
yellow | 0     | 9     | 5     | 0   | 0
I need to save this data to a SQL table, but I'm not sure of the most efficient way to design the table.
I was thinking about creating a table with the fields:
pencilmodel
date
quantity
and then, in VB.net, looping over the DataGrid and saving each cell to the table one by one. I don't think this is the best way, though: I will have about 30 rows and a month has at most 31 days, so that is 30*31 = 930 inserts.
I'm using VB.net and SQL Server.
I would create the table that way (as you suggested):
ID | pencilmodel | ProducedDate | Quantity
1  | blue        | dd-mm-yyyy   | 7
2  | red         | dd-mm-yyyy   | 4
3  | yellow      | dd-mm-yyyy   | 6
Also, don't loop and insert each row into the database separately; it's not efficient. Add the rows to a DataSet first and then save them with DataAdapter.Update, or bind a DataSet to the DataGridView:
How to: Bind Data to the Windows Forms DataGridView Control
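To make the long-table design concrete, here is a small Python sketch of the same idea: unpivot the grid into (pencilmodel, date, quantity) rows and send them as one batch. It is only an illustration under assumptions: an in-memory SQLite database stands in for SQL Server, the table name `pencil_production` and the year/month are invented, and only the first three days of the sample grid are used.

```python
import sqlite3
from datetime import date

# First three days of the sample grid from the question.
grid = {"red": [0, 0, 13], "blue": [5, 1, 0], "yellow": [0, 9, 5]}
year, month = 2016, 12  # assumed; the question does not give a month

# One row per (pencil, day) cell instead of one column per day.
rows = [
    (pencil, date(year, month, day).isoformat(), quantity)
    for pencil, quantities in grid.items()
    for day, quantity in enumerate(quantities, start=1)
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pencil_production ("
    "id INTEGER PRIMARY KEY, pencilmodel TEXT, produced_date TEXT, quantity INTEGER)"
)
# A single batched call replaces the cell-by-cell insert loop.
conn.executemany(
    "INSERT INTO pencil_production (pencilmodel, produced_date, quantity) VALUES (?, ?, ?)",
    rows,
)
conn.commit()

red_total = conn.execute(
    "SELECT SUM(quantity) FROM pencil_production WHERE pencilmodel = 'red'"
).fetchone()[0]
```

With this layout, monthly totals or per-day reports become plain GROUP BY queries, and a new day never requires a schema change.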
I don't know if this is relevant, but why don't you create fields based on the date? Say the date on your PC is
12/14/2016
You can write a program that adds a field for you every day: when the day changes, it adds a column, like this.
__________________________________
|12/14/2016|12/15/2016|12/15/2016|
That way you don't need to loop over the DGV; you just run your INSERT command.
You just need some modifications and validation here, like
if Date_Has_Been_Changed then
Create Table Add Columns
End If

Transpose a large dataset

I have a lot of data for each User ID that needs to be organized by column rather than by row as it is currently. I have tried standard transposition methods but cannot figure this out. Any ideas would be greatly appreciated.
Current data set:
UserId  Item  Value(mL)
1       AAA   12
1       AAB   21
1       AAC   31
2       AAA   15
2       AAB   21
2       AAC   34
2       AAD   16
Desired outcome:
UserID  AAA  AAB  AAC  AAD
1       12   21   31
2       15   21   34   16
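With a formula (assuming the data is in columns A:C, the user IDs are listed down from F2, and the item names run across from G1):
=SUMIFS($C:$C,$A:$A,$F2,$B:$B,G$1)
Copy over and down.
As @skkakkar stated, this can also be done with a Pivot Table.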
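For readers who have the data outside Excel, the same long-to-wide reshape can be sketched with pandas. This is an illustration, not part of the answer above; the column name `Value` is shortened from the question's "Value(mL)", and missing combinations are filled with 0 to mirror what SUMIFS returns.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame(
    {
        "UserId": [1, 1, 1, 2, 2, 2, 2],
        "Item": ["AAA", "AAB", "AAC", "AAA", "AAB", "AAC", "AAD"],
        "Value": [12, 21, 31, 15, 21, 34, 16],
    }
)

# One row per user, one column per item; duplicates would be summed.
wide = df.pivot_table(
    index="UserId", columns="Item", values="Value", aggfunc="sum", fill_value=0
)
```

`wide.reset_index()` turns `UserId` back into an ordinary column if the result needs to be written out as a flat table.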
There is an Excel paste option called "Transpose" that will allow you to accomplish this. Select your data and copy it. Then go to the target cell, open the paste options, and press "T" or click the Transpose button.
EDIT:
There are other ways of solving this, as Scott has shown in his answer. If you are performing this on a large data set, my solution will be the fastest by far, but his solution is also very sleek. Note that Transpose on its own will not collapse the duplicate headers into one column each; you will need to do a bit more work to get the exact layout the poster wanted.

Table Total Column based on cell values - SQL Report Builder 3.0

I have a table built off a dataset containing timesheet data with possible multiple entries per day (day_date) for a given person. The table is grouped on day_date. The field for hours is effort_hr (see dataset and report layout below).
The table generates a single row with one column for each day (as expected).
For each day I want only one value (the total hours for the person), so the expression is Sum(Fields!effort_hr.Value). This properly adds up all the hours for each day.
Now I add a total column at the end of the row to see ALL the hours for the whole timesheet. The expression in the total column cell is Sum(Fields!effort_hr.Value) which is exactly the same as the daily ones. Again, this is adding up all hours for the timesheet.
So this is working great.
I now need a new row that only shows a max of 8 hours per day. So if the person works less, it shows less, but if the person works more, show a max of 8.
In this case, the daily column expression is:
IIF(Sum(Fields!effort_hr.Value)>8.0,8.0,Sum(Fields!effort_hr.Value))
And again, it displays perfectly for each day.
The total for this row is where I run into trouble. I have tried many approaches, but I cannot get the total for the columns in this row. The report keeps showing #Error in the cell, even though the report saves fine and the expression itself reports no error.
The problem seems to come from the fact that there are 2 entries for a given day. In other words, for 5 days, the person has 6 entries. When I try it for a person with only 5 entries, there is no problem.
I have tried:
Sum(IIF(Sum(Fields!effort_hr.Value)>8.0,8.0,Sum(Fields!effort_hr.Value)))
RunningValue(IIF(Sum(Fields!effort_hr.Value)>8.0,8.0,Sum(Fields!effort_hr.Value)),Sum,Nothing)
I either get an #Error, or I get the wrong total. Is there any way to just get a total for the cell values in the table? The daily numbers are correct, just give me the total at the end (like Excel).
I could do this in the SQL, but that would mess up other parts of this report.
DataSet:
res_name | day_date | effort_hr
J. Doe | Apr 6, 2015 | 2
J. Doe | Apr 6, 2015 | 9
J. Doe | Apr 7, 2015 | 8
J. Doe | Apr 8, 2015 | 7
J. Doe | Apr 9, 2015 | 10
J. Doe | Apr 10, 2015 | 9
REPORT TABLE Layout:
| Apr 6 | Apr 7 | Apr 8 | Apr 9 | Apr 10 | Totals
Total | 11 | 8 | 7 | 10 | 9 | 45
Reg | 8 | 8 | 7 | 8 | 8 | 39
OT | 3 | 0 | 0 | 2 | 1 | 6
Problem:
Row 1 Column Total works great and gives 45 hours ;
Row 2 Column Total either gives #Error, 41, or some other wrong number - just need it to total the actual values of each cell in the row ;
same problem for Row 3 total
Thanks in advance for your time!
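The expected totals can be checked outside the report with a few lines of Python. This is only an arithmetic sketch on the sample dataset above, not SSRS syntax. It also suggests where a wrong total of 41 can come from: capping each dataset entry at 8 instead of capping each day's sum.

```python
from collections import defaultdict

# Sample dataset from the question: (day_date, effort_hr).
entries = [
    ("Apr 6", 2), ("Apr 6", 9), ("Apr 7", 8),
    ("Apr 8", 7), ("Apr 9", 10), ("Apr 10", 9),
]

# Sum the entries per day first.
daily = defaultdict(int)
for day, hours in entries:
    daily[day] += hours

total = sum(daily.values())                        # Row 1 total
regular = sum(min(h, 8) for h in daily.values())   # Row 2: cap each DAY at 8
overtime = total - regular                         # Row 3

# Capping each ENTRY at 8 before summing gives a different (wrong) number.
capped_per_entry = sum(min(hours, 8) for _, hours in entries)

print(total, regular, overtime, capped_per_entry)  # 45 39 6 41
```

The difference comes entirely from Apr 6, where the two entries (2 and 9) must be added together before the cap is applied.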
Posting another answer as the previous one has become so long.
I referred to this MSDN link, and used the selected answer. Apparently we need to use custom code to achieve this (if you are not willing to change your dataset and have the calculated values in there).
Right click on report --> report properties --> Go to tab 'Code' --> Paste this
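Public nettotal As Double
' Adds each displayed daily value to the running total and returns it unchanged.
Public Function Getvalue(ByVal subtotal As Double) As Double
    nettotal = nettotal + subtotal
    Return subtotal
End Function
' Returns the accumulated total for the Totals cell.
Public Function Totalvalue() As Double
    Return nettotal
End Function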
In the row group expression of the second row, put
= code.Getvalue(IIF(Sum(Fields!Efforts.Value)>8.0,8.0,Sum(Fields!Efforts.Value)))
In the Total cell expression (for the second row), put
=code.Totalvalue()
Save and run; you should see the following result.
I used your input data and tried to create the report in the given format. I used the following expression for the Row 2 total:
=Sum(IIF(Fields!Efforts.Value>8.0,8.0,Fields!Efforts.Value),"DataSet1",Recursive)
This shows the sum as 39 for the second row. Try it and let me know if it works for you. If it doesn't, I will list the exact steps I used to create the Matrix and groups.
Note: Don't forget to put your dataset name in the second argument of Sum. Recursive, as the name suggests, applies Sum recursively for the group.
Update: I followed these steps.
1. Add a Matrix to the report.
2. Under the Column group section of the Matrix, select any column name from the dataset. (Otherwise it won't show any columns in the next step.)
3. Right click Column --> Add Group --> (Under column group) Add Parent Group. Select Day as Group By --> OK. It will create a new row. Put the expression Sum(Efforts) in the first row, and your expression =IIF(Sum(Fields!Efforts.Value)>8.0,8.0,Sum(Fields!Efforts.Value)) in the second row.
4. Right click on the column group section in the group pane --> Select Add Total --> After. It will add a new column at the end of the Matrix. Put the expression Sum(Efforts) in the first row and the expression =Sum(IIF(Fields!Efforts.Value>8.0,8.0,Fields!Efforts.Value),"DataSet1",Recursive) in the second row.
Save and run; you should see the following in the report.
Remember to change the names of the columns and dataset as per your code.
This is an idea of how to do such grouping; obviously you'd need to make changes for the headers, the 3rd row, etc.
HTH.

Excel VBA SubTotals

Excel's built-in functions are effective most of the time. However, some of them feel only half implemented, which dictates how they can be used. The SUBTOTAL function is one of them.
My data is presented in the following format:
Value Count
100 20
102 3
105 4
102 5
And I want to build a table in this format:
Value Count
100 20
101 0
102 8
103 0
104 0
105 4
I've read this on SO, but my situation is a bit different. A pivot table can give the subtotals only for the values that appear in the original data, and I don't want a loop that inserts the missing values into the original data (if there is going to be a loop over the original data anyway, that loop could build the table itself, which I would prefer to avoid altogether).
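The target table can be described independently of Excel: sum the counts per value, then list every integer between the smallest and largest value, using 0 where nothing was recorded. The Python sketch below only illustrates that logic on the sample data; it is not a VBA or worksheet-function solution, and it assumes the values are integers.

```python
from collections import Counter

# Sample data from the question: (Value, Count).
data = [(100, 20), (102, 3), (105, 4), (102, 5)]

# Subtotal per distinct value.
totals = Counter()
for value, count in data:
    totals[value] += count

# Every integer in the range gets a row; a Counter returns 0 for missing keys.
table = [(value, totals[value]) for value in range(min(totals), max(totals) + 1)]

for value, count in table:
    print(value, count)
```

In worksheet terms, this corresponds to generating the full list of values first and then looking up a conditional sum for each one, instead of inserting rows into the original data.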

Highlighting Values in a Crystal Reports Crosstab based on sibling values

I have a crosstab with rows indicating different classes and people's names across the top.
| | Required | Person 1 | Person 2 | Person 3 |
| Class 1 | 8 6 | 1 6 | 3 6 | 4 6 |
| Class 2 | 6 2 | 3 2 | 2 2 | 1 2 |
Each field contains 2 values. The first value is the number of hours spent in the class; the second is the number of hours required for certification.
The Required field is my grand total summary.
In the cross tab expert the fields are defined as follows.
Rows:
Command.descr -> a field containing the class names
Columns:
Command.fullname -> a field containing students full names
Summarized Fields:
Sum of Command.evlength -> summation of all time spent in a given course
Max of #required -> this formula returns the number of required hours based on the course name
I am trying to highlight the field Sum of Command.evlength if it is greater than or equal to the value of Max of #required.
My solution was to perform background formatting. Right-Click on the Sum of Command.evlength field, select Format Field. Click the borders tab, check Background, and enter a formula.
The formula I was using is:
if CurrentFieldValue >= {#required} then color(152, 251, 152) else crNoColor
This is not the correct formula. My crosstab has been placed in the footer, which causes {#required} to contain the last value in the grid which in the above example is 2.
From my research I thought I would have to use GridRowColumnValue(row or column name) to access the value of {#required} in the crosstab, but I could not come up with the correct string to represent it.
Does anyone have a way for me to correctly perform this comparison?
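The intended rule, and the effect of comparing against the last required value instead of each row's own, can be illustrated on the sample grid with a short Python sketch. This is not Crystal Reports syntax; the dictionaries and the `highlighted` helper are made up for the illustration.

```python
# Sample grid from the question: required hours per class, hours spent per person.
required = {"Class 1": 6, "Class 2": 2}
hours = {
    "Class 1": {"Person 1": 1, "Person 2": 3, "Person 3": 4},
    "Class 2": {"Person 1": 3, "Person 2": 2, "Person 3": 1},
}


def highlighted(hours, required_for):
    """Return the (class, person) cells whose hours meet the given requirement."""
    return {
        (cls, person)
        for cls, people in hours.items()
        for person, spent in people.items()
        if spent >= required_for(cls)
    }


# Intended rule: compare each cell with the requirement of its own row.
intended = highlighted(hours, lambda cls: required[cls])

# Observed behaviour: every cell is compared with the last value in the grid (2).
last_required = list(required.values())[-1]
observed = highlighted(hours, lambda cls: last_required)
```

With the sample data, the intended rule highlights only two cells in Class 2, while the last-value comparison also highlights two cells in Class 1 that fall short of their 6 required hours.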
Frustratingly I don't think you can use the highlighting expert to compare to a dynamic value. You could swap the columns round then add the following formulas:
To the max_of_required background colour:
whileprintingrecords;
global numbervar required_hrs := currentfieldvalue;
crNoColor;
To the sum_of_command.evlength background colour:
whileprintingrecords;
global numbervar required_hrs;
if currentfieldvalue >= required_hrs then
crRed
else
crNoColor;
I think there are a few other ways, but I'm not as confident with those, so start here.