Apache Poi Word Table, information about the Alt Text

How can I get the Alt Text of a table in Word, e.g. its Title or Description?
All the information I found covered reading or modifying the content, text, width, style, etc.
My goal is to identify certain tables in a template so I can modify them.

I am going to make some assumptions here. First that you are talking about the docx format, and second that by "Alt Text" you mean a caption.
A caption is just a paragraph that either precedes, or follows a table. It will have a style of Caption, a run with some text like Table, and probably includes a simple field SEQ Table. That would be the default Table caption, but it could be just a run with a name like Department Heads. The key is the style name. Word uses standard style names to calculate other things as well such as TOC.
Note: in Word, you cannot modify a caption by selecting a table, and clicking a menu option. It isn't really linked in any meaningful way. You have to modify the paragraph.
So to find a caption, you need to look in the Document elements list XWPFDocument.getBodyElements(), and find each paragraph with a style of Caption. Once you have found the one you want, then you can either look at the element immediately above or below to find the table. Your search will be easier if you can know that captions are all above or all below the tables.
So to retrieve the table following a specific named caption I would try something like this:
// assumes doc is an XWPFDocument field of the enclosing class
public XWPFTable findTable(String name) {
    // true only while the previous body element was the matching caption
    boolean foundCaption = false;
    XWPFParagraph p;
    for (IBodyElement elem : doc.getBodyElements()) {
        switch (elem.getElementType()) {
            case PARAGRAPH:
                p = (XWPFParagraph) elem;
                // compare strings with equals(); == only compares object identity
                foundCaption = "Caption".equals(p.getStyle()) && name.equals(p.getText());
                break;
            case TABLE:
                if (foundCaption) {
                    return (XWPFTable) elem;
                }
                break;
            default:
                // content controls and anything else break the caption/table adjacency
                foundCaption = false;
                break;
        }
    }
    return null;
}
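One detail worth checking in a method like this is the string comparison: in Java, == on strings compares object identity, so a style name read from a document should be compared with equals. A minimal, library-free sketch of the difference (the class and method names are hypothetical and not part of POI):

```java
final class StringCompareDemo {
    // Null-safe content comparison: returns false for a null style instead of throwing.
    static boolean isCaption(String style) {
        return "Caption".equals(style);
    }

    public static void main(String[] args) {
        // Built at run time, so it is a different object from the literal "Caption".
        String style = new StringBuilder("Cap").append("tion").toString();
        System.out.println(style == "Caption"); // identity check: false
        System.out.println(isCaption(style));   // content check: true
        System.out.println(isCaption(null));    // paragraph without a style: false
    }
}
```

Putting the literal on the left also covers paragraphs that have no style set, where the style value would be null.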

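In Word you can set the table's Alt Text title to something unique (it is stored as w:tblCaption in the table properties), and then get the table as XML:
// getCTTbl() returns the underlying XmlBeans object; toString() gives its pretty-printed XML
String tableXML = mytable.getCTTbl().toString();
To extract the table caption:
String[] xml = tableXML.split(System.lineSeparator());
String caption = null;
for (String x : xml)
{
    if (x.contains("w:tblCaption"))
    {
        // take the attribute value and strip the closing tag and the quotes
        caption = x.split("w:val=")[1].replace("/>", "");
        caption = caption.replace("\"", "").trim();
    }
}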
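Splitting the XML by lines depends on how the text happens to be pretty-printed. As a sketch of an alternative, the XML string can be parsed with the JDK's DOM parser and the w:val attribute read directly; the same lookup works for w:tblDescription. The class and method names here are hypothetical, the sample XML is hand-written for illustration rather than produced by Word, and the approach assumes the XML text declares the w namespace.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class TableAltText {
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the w:val of the first element with the given local name, or null if absent.
    static String readVal(String tableXml, String localName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(tableXml)));
            NodeList nodes = doc.getElementsByTagNameNS(W_NS, localName);
            if (nodes.getLength() == 0) {
                return null;
            }
            return ((Element) nodes.item(0)).getAttributeNS(W_NS, "val");
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse table XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<w:tbl xmlns:w=\"" + W_NS + "\"><w:tblPr>"
                + "<w:tblCaption w:val=\"Department Heads\"/>"
                + "<w:tblDescription w:val=\"Staff list\"/>"
                + "</w:tblPr></w:tbl>";
        System.out.println(readVal(xml, "tblCaption"));     // Department Heads
        System.out.println(readVal(xml, "tblDescription")); // Staff list
    }
}
```

With POI, the input string would be the table XML obtained from the table object as in the answer above; note that a nested table's caption would also match, since the lookup searches all descendants.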

Related

How do I read data in one cell and write data into another cell using Google Sheets?

So let's all assume that column B is filled with multiple, short statements. These statements may be used more than once, not at all, or just once throughout the column. I want to be able to read what's in each cell of column B and assign a category to it in column F using the Google Sheets script editor. I'll include some pseudo-code of how I would do something like this normally.
for (var i = 0; i < statements.length; i++) {
if (statements[i] == 'Description One') {
category[i] = 'Category One';
}
else if (statements[i] == 'Description Two') {
category[i] = 'Category Two';
}
// and so on for all known categories....
}
How do I go about accessing a cell for a read and accessing a different cell for a write?
Thanks in advance for the help!

OK, so after a little more thought on the subject, I've arrived at a solution. It's super simple, albeit tedious:
function assignCategory(description) {
    if (description == 'Description One') {
        return 'Category One';
    }
    // and so on for all known categories
}
Used as a custom function in a cell of column F (for example =assignCategory(B2)), it reads the description and returns the matching category.
Hopefully someone will see this and be helped anyway. If you think of a more efficient and easier-to-maintain way of doing this, by all means do chime in.

Assuming a sheet such as this one, which has a header and six different columns (where B is the description, and F the category); you could use a dictionary to translate your values as follows:
// (description -> category) dictionary
var translations = {
    "cooking": "Cooking",
    "sports": "Sport",
    "leisure": "Leisure",
    "music": "Music",
    "others": "Other"
};
function assignCategories() {
    var dataRange = SpreadsheetApp.getActiveSheet().getDataRange();
    for (var i=2; i<=dataRange.getNumRows(); i++) {
        var description = dataRange.getCell(i, 2).getValue();
        var category = translations[description];
        dataRange.getCell(i, 6).setValue(category);
    }
}
In case you need additional rules (e.g. descriptions that contain cricket must be classified as sport), you could get the desired results by implementing your own custom function and using string functions (such as indexOf) or regular expressions.
Using indexOf
// (description -> category) dictionary
var translations = {
    "cooking": "Cooking",
    "sports": "Sport",
    "leisure": "Leisure",
    "music": "Music",
    "others": "Other"
};
function assignCategories() {
    var dataRange = SpreadsheetApp.getActiveSheet().getDataRange();
    // start at row 2 to skip the header row
    for (var i=2; i<=dataRange.getNumRows(); i++) {
        var description = dataRange.getCell(i, 2).getValue();
        var category = assignCategory(description);
        // only write when a category was found
        if (category) dataRange.getCell(i, 6).setValue(category);
    }
}
function assignCategory(description) {
    description = description.toLowerCase();
    var keys = Object.keys(translations);
    // return the category of the first key that appears anywhere in the description
    for (var i=0; i<keys.length; i++) {
        var currentKey = keys[i];
        if (description.indexOf(currentKey) > -1)
            return translations[currentKey];
    }
}
This version is a bit more sophisticated. It will make the 'description' of each row lowercase in order to better compare with your dictionary, and also uses indexOf for checking whether the 'translation key' appears in the description, rather than checking for an exact match.
You should be aware, however, that this method will be considerably slower and that the script may time out (see GAS Quotas). You could implement a way to 'resume' the script so that you can re-run it and continue where it left off, in case this hinders your operations.

How do I get a list of fields in a generic sObject?

I'm trying to build a query builder, where the sObject result can contain an indeterminate number of fields. I'm using the result to build a dynamic table, but I can't figure out a way to read the sObject for a list of fields that were in the query.
I know how to get a list of ALL fields using the getDescribe information, but the query might not contain all of those fields.
Is there a way to do this?

Presumably you're building the query up as a string, since it's dynamic, so couldn't you just loop through the fields in the describe information, and then use .contains() on the query string to see if it was requested? Not crazy elegant, but seems like the simplest solution here.
Taking this further, maybe you have the list of fields selected in a list of strings or similar, and you could just use that list?

Not sure if this is exactly what you were after, but something like this?
public list<sObject> Querylist {get; set;}
Define the search string:
string QueryString = 'select field1__c, field2__c from Object where';
Add as many of these as you need to build the search, one for each field the user can search on:
if(searchParameter.field1__c != null && searchParameter.field1__c != '')
{
    // escape the user input so it cannot break out of the quoted string
    QueryString += ' field1__c like \'' + String.escapeSingleQuotes(searchParameter.field1__c) + '%\' and ';
}
if(searchParameter.field2__c != null && searchParameter.field2__c != '')
{
    QueryString += ' field2__c like \'' + String.escapeSingleQuotes(searchParameter.field2__c) + '%\' and ';
}
Remove the trailing "and " (this assumes at least one condition was added):
QueryString = QueryString.substring(0, (QueryString.length()-4));
QueryString += ' limit 200';
Run the query and add the results to the list:
Querylist = new list<sObject>();
for(sObject obj : Database.query(QueryString))
{
    Querylist.add(obj);
}

To get the list of fields in an sObject, you could use a method such as:
public Set<String> getFields(sObject sobj) {
    Set<String> fieldSet = new Set<String>();
    for (String field : sobj.getSobjectType().getDescribe().fields.getMap().keySet()) {
        try {
            // get() throws if the field was not part of the query
            sobj.get(field);
            fieldSet.add(field);
        } catch (Exception e) {
            // field was not queried, so leave it out of the result
        }
    }
    return fieldSet;
}
You should bulkify this approach for your context, but it works. Just pass in an sObject and it'll give you back a set of the field names.

I suggest using a list of fields for creating both the query and the table. You can put the list of fields in the result so that it's accessible to anyone using it. Then you can construct the table by using result.getFields() and retrieve the data by using result.getRows().
for (sObject obj : result.getRows()) {
for (String fieldName : result.getFields()) {
table.addCell(obj.get(fieldName));
}
}
If you're trying to work with a query that's out of your control, you would have to parse the query to get the list of fields. But I wouldn't suggest trying that. It complicates code in ways that are hard to follow.

Export SQL query data to Excel

I have a query that returns a very large data set. I cannot copy and paste it into Excel which I usually do. I have been doing some research on how to export directly to an Excel sheet. I am running SQL SERVER 2008 on a server running Microsoft Server 2003. I am trying to use the Microsoft.Jet.OLEDB.4.0 data provider and Excel 2007. I've pieced together a small piece of code that looks like this from what I've seen in examples.
INSERT INTO OPENDATASOURCE('Microsoft.Jet.OLEDB.4.0',
'Data Source=C:\Working\Book1.xlsx;Extended Properties=EXCEL 12.0;HDR=YES')
SELECT productid, price FROM dbo.product
However this is not working, I am getting an error message saying
"Incorrect syntax near the keyword 'SELECT'".
Does anyone have any ideas about how to do this or possibly a better approach?

I don't know if this is what you're looking for, but you can export the results to Excel like this:
In the results pane, click the top-left cell to highlight all the records, and then right-click the top-left cell and click "Save Results As". One of the export options is CSV.
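You might give this a shot too:
-- The third argument selects the target columns in the existing workbook;
-- [Sheet1$] is an assumed sheet name whose header row must match the column names.
INSERT INTO OPENROWSET
('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;Database=c:\Test.xls;',
'SELECT productid, price FROM [Sheet1$]')
SELECT productid, price FROM dbo.product
Lastly, you can look into using SSIS (which replaced DTS) for data exports. Here is a link to a tutorial: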
http://www.accelebrate.com/sql_training/ssis_2008_tutorial.htm
== Update #1 ==
To save the result as CSV file with column headers, one can follow the steps shown below:
Go to Tools->Options
Query Results->SQL Server->Results to Grid
Check “Include column headers when copying or saving results”
Click OK.
Note that the new settings won’t affect any existing Query tabs — you’ll need to open new ones and/or restart SSMS.

If you just need to export to Excel, you can use the export data wizard.
Right-click the database, then Tasks->Export data.

I had a similar problem but with a twist: the solutions listed above worked when the result set came from one query, but in my situation I had multiple individual SELECT queries whose results I needed to export to Excel. Below is just an example to illustrate (in this particular case I could have used a name IN clause instead):
select a,b from Table_A where name = 'x'
select a,b from Table_A where name = 'y'
select a,b from Table_A where name = 'z'
The wizard was letting me export the result from one query to excel but not all results from different queries in this case.
When I researched, I found that we could disable the results to grid and enable results to Text. So, press Ctrl + T, then execute all the statements. This should show the results as a text file in the output window. You can manipulate the text into a tab delimited format for you to import into Excel.
You could also press Ctrl + Shift + F to export the results to a file - it exports as a .rpt file that can be opened using a text editor and manipulated for excel import.
Hope this helps any others having a similar issue.

For anyone coming here looking for how to do this in C#, I have tried the following method and had success in dotnet core 2.0.3 and entity framework core 2.0.3
First create your model class.
public class User
{
public string Name { get; set; }
public int Address { get; set; }
public int ZIP { get; set; }
public string Gender { get; set; }
}
Then install EPPlus Nuget package. (I used version 4.0.5, probably will work for other versions as well.)
Install-Package EPPlus -Version 4.0.5
Then create the ExcelExportHelper class, which will contain the logic to convert a dataset to Excel rows. This class does not depend on your model class or dataset.
public class ExcelExportHelper
{
public static string ExcelContentType
{
get
{ return "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"; }
}
public static DataTable ListToDataTable<T>(List<T> data)
{
PropertyDescriptorCollection properties = TypeDescriptor.GetProperties(typeof(T));
DataTable dataTable = new DataTable();
for (int i = 0; i < properties.Count; i++)
{
PropertyDescriptor property = properties[i];
dataTable.Columns.Add(property.Name, Nullable.GetUnderlyingType(property.PropertyType) ?? property.PropertyType);
}
object[] values = new object[properties.Count];
foreach (T item in data)
{
for (int i = 0; i < values.Length; i++)
{
values[i] = properties[i].GetValue(item);
}
dataTable.Rows.Add(values);
}
return dataTable;
}
public static byte[] ExportExcel(DataTable dataTable, string heading = "", bool showSrNo = false, params string[] columnsToTake)
{
byte[] result = null;
using (ExcelPackage package = new ExcelPackage())
{
ExcelWorksheet workSheet = package.Workbook.Worksheets.Add(String.Format("{0} Data", heading));
int startRowFrom = String.IsNullOrEmpty(heading) ? 1 : 3;
if (showSrNo)
{
DataColumn dataColumn = dataTable.Columns.Add("#", typeof(int));
dataColumn.SetOrdinal(0);
int index = 1;
foreach (DataRow item in dataTable.Rows)
{
item[0] = index;
index++;
}
}
// add the content into the Excel file
workSheet.Cells["A" + startRowFrom].LoadFromDataTable(dataTable, true);
// autofit width of cells with small content
int columnIndex = 1;
foreach (DataColumn column in dataTable.Columns)
{
int maxLength;
ExcelRange columnCells = workSheet.Cells[workSheet.Dimension.Start.Row, columnIndex, workSheet.Dimension.End.Row, columnIndex];
try
{
maxLength = columnCells.Max(cell => cell.Value.ToString().Count());
}
catch (Exception) // a null cell.Value makes ToString() throw; fall back to a null-safe length
{
maxLength = columnCells.Max(cell => (cell.Value +"").ToString().Length);
}
//workSheet.Column(columnIndex).AutoFit();
if (maxLength < 150)
{
//workSheet.Column(columnIndex).AutoFit();
}
columnIndex++;
}
// format header - bold, white text on a brown background
using (ExcelRange r = workSheet.Cells[startRowFrom, 1, startRowFrom, dataTable.Columns.Count])
{
r.Style.Font.Color.SetColor(System.Drawing.Color.White);
r.Style.Font.Bold = true;
r.Style.Fill.PatternType = OfficeOpenXml.Style.ExcelFillStyle.Solid;
r.Style.Fill.BackgroundColor.SetColor(System.Drawing.Color.Brown);
}
// format cells - add borders
using (ExcelRange r = workSheet.Cells[startRowFrom + 1, 1, startRowFrom + dataTable.Rows.Count, dataTable.Columns.Count])
{
r.Style.Border.Top.Style = ExcelBorderStyle.Thin;
r.Style.Border.Bottom.Style = ExcelBorderStyle.Thin;
r.Style.Border.Left.Style = ExcelBorderStyle.Thin;
r.Style.Border.Right.Style = ExcelBorderStyle.Thin;
r.Style.Border.Top.Color.SetColor(System.Drawing.Color.Black);
r.Style.Border.Bottom.Color.SetColor(System.Drawing.Color.Black);
r.Style.Border.Left.Color.SetColor(System.Drawing.Color.Black);
r.Style.Border.Right.Color.SetColor(System.Drawing.Color.Black);
}
// removed ignored columns
for (int i = dataTable.Columns.Count - 1; i >= 0; i--)
{
if (i == 0 && showSrNo)
{
continue;
}
if (!columnsToTake.Contains(dataTable.Columns[i].ColumnName))
{
workSheet.DeleteColumn(i + 1);
}
}
if (!String.IsNullOrEmpty(heading))
{
workSheet.Cells["A1"].Value = heading;
// workSheet.Cells["A1"].Style.Font.Size = 20;
workSheet.InsertColumn(1, 1);
workSheet.InsertRow(1, 1);
workSheet.Column(1).Width = 10;
}
result = package.GetAsByteArray();
}
return result;
}
public static byte[] ExportExcel<T>(List<T> data, string Heading = "", bool showSlno = false, params string[] ColumnsToTake)
{
return ExportExcel(ListToDataTable<T>(data), Heading, showSlno, ColumnsToTake);
}
}
Now add this method where you want to generate the Excel file, probably as an action in the controller. You can pass parameters to your stored procedure as well. Note that the return type of the method is FileContentResult. Whatever query you execute, the important thing is that you have the results in a List.
[HttpPost]
public async Task<FileContentResult> Create([Bind("Id,StartDate,EndDate")] GetReport getReport)
{
DateTime startDate = getReport.StartDate;
DateTime endDate = getReport.EndDate;
// call the stored procedure and store dataset in a List.
List<User> users = _context.Reports.FromSql("exec dbo.SP_GetEmpReport @start={0}, @end={1}", startDate, endDate).ToList();
// set custom column names
string[] columns = { "Name", "Address", "ZIP", "Gender"};
byte[] filecontent = ExcelExportHelper.ExportExcel(users, "Users", true, columns);
// set file name.
return File(filecontent, ExcelExportHelper.ExcelContentType, "Report.xlsx");
}

I see that you’re trying to export SQL data to Excel to avoid copy-pasting your very large data set into Excel.
You might be interested in learning how to export SQL data to Excel and update the export automatically (with any SQL database: MySQL, Microsoft SQL Server, PostgreSQL).
To export data from SQL to Excel, you need to follow 2 steps:
Step 1: Connect Excel to your SQL database (Microsoft SQL Server, MySQL, PostgreSQL...)
Step 2: Import your SQL data into Excel
The result will be a list of the tables in your SQL database that you can query into Excel.

Step 1: Connect Excel to an external data source: your SQL database
Install An ODBC
Install A Driver
Avoid A Common Error
Create a DSN
Step 2: Import your SQL data into Excel
Click Where You Want Your Pivot Table
Click Insert
Click Pivot Table
Click Use an external data source, then Choose Connection
Click on the System DSN tab
Select the DSN created in ODBC Manager
Fill the requested username and password
Avoid a Common Error
Access The Microsoft Query Dialog Box
Click on the arrow to see the list of tables in your database
Select the table you want to query data from your SQL database into Excel
Click on Return Data when you’re done with your selection
To update the export automatically, there are 2 additional steps:
Create a Pivot Table with an external SQL data source
Automate Your SQL Data Update In Excel With The GETPIVOTDATA Function
I’ve created a step-by-step tutorial about this whole process, from connecting Excel to SQL, up to having the whole thing automatically updated. You might find the detailed explanations and screenshots useful.

Issue with itextsharp

I have a PDF document that has several hundred fields. All of the field names have periods in them, such as "page1.line1.something".
I want to remove these periods and replace them with either an underscore or (better) nothing at all.
There appears to be a bug in the iTextSharp libraries where the RenameField method does not work if the field name has a period in it, so the following does not work (it always returns false):
Dim formfields As AcroFields = stamper.AcroFields
Dim renametest As Boolean
renametest = formfields.RenameField("page1.line1.something", "page1_line1_something")
If the field does not have a period in it, it works fine.
Has anyone come across this and is there a workaround?

Is this an AcroForm form or a LiveCycle Designer (xfa) form?
If it's XFA (which is likely given the field names), iText can't help you. It can only get/set field values when working with XFA.
Okay, an AcroForm. Rather than go the route used in your source, I suggest you directly manipulate the existing field dictionaries and the acroForm field list.
I'm a Java native when it comes to iText, so you'll have to do some translation, but here goes:
A) Delete the AcroForm's field array. Leave the calculation order alone if present (/CO). I think.
PdfDictionary acroDict = reader.getCatalog().getAsDict(PdfName.ACROFORM);
acroDict.remove(PdfName.FIELDS);
B) Attach all the 'top level' fields to a new FIELDS array.
PdfArray newFldArray = new PdfArray();
acroDict.put(PdfName.FIELDS, newFldArray);
// you could wipe this between pages to speed things up a bit
Set<PdfIndirectReference> radioFieldsAdded = new HashSet<PdfIndirectReference>();
int numPages = reader.getNumberOfPages();
for (int curPg = 1; curPg <= numPages; ++curPg) {
    PdfDictionary curPageDict = reader.getPageN(curPg);
    PdfArray annotArray = curPageDict.getAsArray(PdfName.ANNOTS);
    if (annotArray == null)
        continue;
    for (int annotIdx = 0; annotIdx < annotArray.size(); ++annotIdx) {
        PdfIndirectReference fieldReference = annotArray.getAsIndirectObject(annotIdx);
        PdfDictionary field = (PdfDictionary) PdfReader.getPdfObject(fieldReference);
        // if it's a radio button (/Ff may be absent, so check for null first)
        PdfNumber flags = field.getAsNumber(PdfName.FF);
        if (flags != null && (PdfFormField.FF_RADIO & flags.intValue()) != 0) {
            fieldReference = field.getAsIndirectObject(PdfName.PARENT);
            field = field.getAsDict(PdfName.PARENT); // looks up indirect reference for you.
            // only add each radio field once.
            if (radioFieldsAdded.contains(fieldReference)) {
                continue;
            } else {
                radioFieldsAdded.add(fieldReference);
            }
        }
        field.remove(PdfName.PARENT);
        // you'll need to assemble the original field name manually and replace the bits
        // you don't like. Parent.T + '.' + child.T + '.' + ...
        String newFieldName = SomeFunction(field);
        field.put(PdfName.T, new PdfString( newFieldName ) );
        // add the reference, not the dictionary
        newFldArray.add(fieldReference);
    }
}
C) Clean up
reader.removeUnusedObjects();
Disadvantage:
More Work.
Advantages:
Maintains all field types, attributes, appearances, and doesn't change the file as a whole all that much. Less CPU & memory.
Your existing code ignores field script, all the field flags (read only, hidden, required, multiline text, etc), lists/combos, radio buttons, and quite a few other odds and ends.
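The SomeFunction placeholder in step B has to rebuild the fully qualified name from each level's partial name (/T) and then get rid of the periods. Here is a minimal sketch of just that string handling, without iText. The flattenName helper and its input (the partial names ordered from the top-level parent down to the field) are assumptions made for illustration; collecting those names by walking the /Parent chain is left out.

```java
import java.util.Arrays;
import java.util.List;

final class FieldNames {
    // Joins the partial names with the separator; periods inside a partial name are replaced too.
    static String flattenName(List<String> partialNames, String separator) {
        StringBuilder sb = new StringBuilder();
        for (String part : partialNames) {
            if (part == null || part.isEmpty()) {
                continue; // skip levels that carry no /T entry
            }
            if (sb.length() > 0) {
                sb.append(separator);
            }
            sb.append(part.replace(".", separator));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList("page1", "line1", "something");
        System.out.println(flattenName(parts, "_")); // page1_line1_something
        System.out.println(flattenName(parts, ""));  // page1line1something
    }
}
```

Passing an empty separator gives the "nothing at all" variant from the question; in that case it may be worth checking that the flattened names stay unique across the form.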

If you use periods in your field name, only the last part can be renamed, e.g. in page1.line1.something only "something" can be renamed. This is because "page1" and "line1" are treated by Adobe as parents of the "something" field.
I needed to delete this hierarchy and replace it with a flattened structure.
I did this by:
creating a pdfdictionary object for each field
reading the annotations I needed for each field into an array
deleting the field hierarchy in my (pdfstamper) document
creating a new set of fields from my array data
I have created some sample code for this if you want to see how I did it.

Insert text into flex 3 textarea

I have a textArea and a list. When a user double clicks a list item, the label of the selected item should be inserted into the textarea. When a text is selected in the textArea, it should be replaced, otherwise the text just needs to be inserted into the existing text at the caret point.
I've managed to get the text and everything, I just can't manage to insert it at the caret point. Does anyone know how to do this?

It's actually not JavaScript but Adobe Flex 3. Thanks for the help though, it did push me in the right direction. This is the way it's done in Flex 3:
var caretStart:int = textArea.selectionBeginIndex;
var caretEnd:int = textArea.selectionEndIndex;
textArea.text = textArea.text.substring(0,caretStart)
+ newText
+ textArea.text.substr(caretEnd);

The accepted answer works great if you do not have existing HTML formatting. In my case, I inserted a new button into the editor that the user could click to put in a key word. I kept losing all HTML formatting until I dug around in the actual class and sided with a TextRange object:
public function keyWord_Click(event:Event) : void
{
var caretStart:int = txtEditor.textArea.selectionBeginIndex;
var caretEnd:int = txtEditor.textArea.selectionEndIndex;
var newText : String = "[[[KEYWORD]]]";
var tf:TextRange = new TextRange(txtEditor,true,caretStart,caretEnd);
tf.text = newText;
}
The nice thing about this approach is, you can also apply conditional formatting to that TextRange object as needed.

You can use txtarea.selectionStart and txtarea.selectionEnd to get the position of the selected text.
After that, you remove the selected text and insert the new text in its place.
I don't know much about JavaScript, but I wrote this for you.
You can search on Google with these keywords:
"Javascript Selected Text TextArea"
"Javascript add text at position"
Sample code:
function insertAtCursor(myField, myValue) {
    // IE support
    if (document.selection) {
        myField.focus();
        var sel = document.selection.createRange();
        sel.text = myValue;
    }
    // MOZILLA/NETSCAPE support
    else if (myField.selectionStart || myField.selectionStart == '0') {
        var startPos = myField.selectionStart;
        var endPos = myField.selectionEnd;
        // replace the selection (or insert at the caret when nothing is selected)
        myField.value = myField.value.substring(0, startPos)
            + myValue
            + myField.value.substring(endPos, myField.value.length);
        // put the caret right after the inserted text
        var caretPos = startPos + myValue.length;
        myField.setSelectionRange(caretPos, caretPos);
    } else {
        myField.value += myValue;
    }
}

We Keep Coding
