Docx4j 6.1.2 - Convert Flat Open XML to Xlsx - docx4j

I can create a Flat OPC XML file from an Xlsx using the following code:
SpreadsheetMLPackage spreadsheetMLPackage = SpreadsheetMLPackage.load(new File("test.xlsx"));
FlatOpcXmlCreator flatOpcXmlCreator = new FlatOpcXmlCreator(spreadsheetMLPackage);
String flatOpcXml = org.docx4j.XmlUtils.marshaltoString(flatOpcXmlCreator.get(), false, true, org.docx4j.jaxb.Context.jcXmlPackage);
Files.write(Path.of("testFlatOpc.xml"), flatOpcXml.getBytes(), StandardOpenOption.CREATE, StandardOpenOption.WRITE);
But if I now try to read the generated Flat OPC XML in order to convert it back to an Xlsx using the following code:
FlatOpcXmlImporter flatOpcXmlImporter = new FlatOpcXmlImporter(new FileInputStream("testFlatOpc.xml"));
OpcPackage opcPackage = flatOpcXmlImporter.get();
the flatOpcXmlImporter.get() call throws the following exception:
org.docx4j.openpackaging.exceptions.Docx4JException: Failed to add parts from relationships
at org.docx4j.convert.in.FlatOpcXmlImporter.addPartsFromRelationships(FlatOpcXmlImporter.java:297)
at org.docx4j.convert.in.FlatOpcXmlImporter.get(FlatOpcXmlImporter.java:221)
at at.apa.psp.TestExcel.main(TestExcel.java:38)
Caused by: org.docx4j.openpackaging.exceptions.Docx4JException: Failed to getPart
at org.docx4j.convert.in.FlatOpcXmlImporter.getRawPart(FlatOpcXmlImporter.java:659)
at org.docx4j.convert.in.FlatOpcXmlImporter.getRawPart(FlatOpcXmlImporter.java:426)
at org.docx4j.convert.in.FlatOpcXmlImporter.getPart(FlatOpcXmlImporter.java:365)
at org.docx4j.convert.in.FlatOpcXmlImporter.addPartsFromRelationships(FlatOpcXmlImporter.java:295)
... 2 more
Caused by: javax.xml.bind.JAXBException: Preprocessing exception
- with linked exception:
[javax.xml.bind.UnmarshalException: unexpected element (uri:"http://schemas.openxmlformats.org/spreadsheetml/2006/main", local:"workbook"). Expected elements are <{http://schemas.openxmlformats.org/markup-compatibility/2006}AlternateContent>, ...
at org.docx4j.openpackaging.parts.JaxbXmlPartXPathAware.unmarshal(JaxbXmlPartXPathAware.java:707)
at org.docx4j.convert.in.FlatOpcXmlImporter.getRawPart(FlatOpcXmlImporter.java:515)
... 5 more
Caused by: javax.xml.bind.UnmarshalException: unexpected element (uri:"http://schemas.openxmlformats.org/spreadsheetml/2006/main", local:"workbook"). Expected elements are ...
at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallingContext.handleEvent(UnmarshallingContext.java:662)
at com.sun.xml.bind.v2.runtime.unmarshaller.Loader.reportError(Loader.java:258)
at com.sun.xml.bind.v2.runtime.unmarshaller.Loader.reportError(Loader.java:253)
at com.sun.xml.bind.v2.runtime.unmarshaller.Loader.reportUnexpectedChildElement(Loader.java:120)
at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallingContext$DefaultRootLoader.childElement(UnmarshallingContext.java:1063)
at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallingContext._startElement(UnmarshallingContext.java:498)
at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallingContext.startElement(UnmarshallingContext.java:480)
at com.sun.xml.bind.v2.runtime.unmarshaller.InterningXmlVisitor.startElement(InterningXmlVisitor.java:75)
at com.sun.xml.bind.v2.runtime.unmarshaller.SAXConnector.startElement(SAXConnector.java:150)
Should it be possible to convert Flat OPC XML to an Excel file using docx4j 6.1.2?
Why does the FlatOpcXmlCreator write namespaces that the FlatOpcXmlImporter cannot read?
If it is not possible with docx4j, are there any alternatives for creating an Excel file from a Flat OPC XML?

This is now fixed by https://github.com/plutext/docx4j/commit/f5f8b2c9caa9a3d8d339b74e7e878d19c56ad526
This will be in the next 8.1.x release, or you could patch 6.1.2 with that fix yourself.

Related

Nifi CompressContent - Got this exception "IOException thrown from CompressContent: java.io.IOException: Input is not in the .gz format"

I am new to NiFi and am trying to do a POC on the flow below.
I get XML messages from a Kafka topic. I need to consume each XML message, extract a few attributes plus a payload that is stored GZIP-compressed in an XML element, decompress that payload (which is itself XML), and then load it into a MySQL DB. I got stuck at the step below.
(1)ConsumeKafka → (2)EvaluateXPath (flowfile-attribute = I set few XML elements as flowfile-attributes which is useful downstream) → (3)EvaluateXPath (flowfile-content = get gzip data using XPATH expression = string(//ABC/data) ) → (4)UpdateAttribute (mime.type = application/gzip) → (5) CompressContent (Compression Format = use mime.type attribute and mode = decompress)
My CompressContent is failing with the below Exception.
org.apache.nifi.processor.exception.ProcessException: IOException thrown from CompressContent[id=be4b9583-016e-1000-7cce-b9d822334c4c]: java.io.IOException: java.io.IOException: Input is not in the .gz format
It could be because the flowfile-content coming out of (3)EvaluateXPath is a String. Do I need to convert the String to bytes before feeding it to CompressContent? If yes, how can I do that in the same (3)EvaluateXPath, using some kind of function such as toBytes()?
Thanks in advance for your help!!!
I found the solution to this issue. The data is Base64 encoded, so the Gzip step cannot decompress it directly. I added a "Base64EncodeContent" processor (set to decode) before "CompressContent" (Gzip decompress), and that solved the issue.
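The order of operations in that fix (Base64-decode first, then gunzip) can be illustrated outside NiFi with a minimal Java sketch. The class and method names are hypothetical and the payload is made up for the demonstration; it only shows why the Base64 text itself is rejected as "not in the .gz format".

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class Base64GzipDemo {

    // Builds a sample payload: gzip the text, then Base64-encode the compressed bytes.
    static String gzipThenBase64(String text) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
                gzip.write(text.getBytes(StandardCharsets.UTF_8));
            }
            return Base64.getEncoder().encodeToString(buffer.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // A gzip stream always starts with the magic bytes 0x1f 0x8b.
    static boolean hasGzipHeader(byte[] data) {
        return data.length >= 2 && (data[0] & 0xff) == 0x1f && (data[1] & 0xff) == 0x8b;
    }

    // Reverses gzipThenBase64: Base64-decode first, then gunzip (requires Java 9+ for readAllBytes).
    // Note: the basic decoder rejects embedded line breaks; Base64.getMimeDecoder() tolerates them.
    static String base64ThenGunzip(String base64) {
        byte[] compressed = Base64.getDecoder().decode(base64);
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String payload = gzipThenBase64("<inner>hello</inner>");
        // The Base64 text does not carry the gzip header, the decoded bytes do.
        System.out.println(hasGzipHeader(payload.getBytes(StandardCharsets.US_ASCII)));
        System.out.println(hasGzipHeader(Base64.getDecoder().decode(payload)));
        System.out.println(base64ThenGunzip(payload));
    }
}
```

Checking the first two bytes like this is a quick way to tell whether a payload is raw gzip or still wrapped in another encoding.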

XSSFWorkbook Creation failure

I am creating an .xlsx file using the POI jars poi-3.15.jar, poi-ooxml-3.15.jar, poi-ooxml-schemas-3.15.jar, ooxml-schemas-1.3.jar, xml-apis-2.0.2.jar, xbean-2.2.0, xmlschema-1.4.7.jar and commons-collections4-.4.0.jar. But I am still getting the following error: Exception in thread "main" java.lang.NoClassDefFoundError: org.apache.commons.collections4.ListValuedMap
Can you please help me create a simple .xlsx file using the XSSFWorkbook class?
ListValuedMap is contained in the commons-collections4 jar. The class was added in version 4.1, so a 4.0 jar does not contain it. You should also use XMLBeans 2.3.0 or later.

SSIS XML Source Error - Input string was not in a correct format

I have an attribute tlost with the definition below in the XSD file. I have tried both use="required" and use="optional".
<xs:attributeGroup name="defense">
<xs:attribute name="tlost" use="required" type="xs:decimal"/>
</xs:attributeGroup>
In the XML document I am trying to import I will get a value like the following:
<defense ast="0" category="special_team" tlost="0" int="0"/>
I am executing an SSIS package that takes the tlost value and inserts it into a sql database table. The column in the database table has a datatype of DECIMAL(28,10) and allows nulls.
When I execute the package, the previous values work perfectly and the data is inserted. However, when I get a value where tlost="" in the XML file, the package fails and the record is not inserted.
In the data flow path editor, the data type for tlost is DT_DECIMAL. When I check the Advanced Editor for the XML Source, the Input and Output properties have a data type for tlost as decimal [DT_DECIMAL].
I can't figure out why this is failing. I tried to create a derived column and cast it as a (DT_DECIMAL, 10) data type. That didn't work. I tried to check for a null value and replace it with 0 if null; that didn't work either. So I ignored the column altogether, and in the Derived Column task I replaced the tlost column value with (DT_DECIMAL, 10) 0 to just insert a 0 value and ignore whatever is in the XML file. The job still failed with the following error message:
Error: 0xC020F444 at Load Play Summary Tables, XML Source [1031]: The error "Input string was not in a correct format." occurred while processing "XML Source.Outputs[defense].Columns[tlost]".
Error: 0xC02090FB at Load Play Summary Tables, XML Source [1031]: The "XML Source" failed because error code 0x80131537 occurred, and the error row disposition on "XML Source.Outputs[defense].Columns[tlost]" at "XML Source.Outputs[defense]" specifies failure on error. An error occurred on the specified object of the specified component.
Error: 0xC02092AF at Load Play Summary Tables, XML Source [1031]: The XML Source was unable to process the XML data. Pipeline component has returned HRESULT error code 0xC02090FB from a method call.
Error: 0xC0047038 at Load Play Summary Tables, SSIS.Pipeline: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED. The PrimeOutput method on XML Source returned error code 0xC02092AF. The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing. There may be error messages posted before this with more information about the failure.
Please help. I have exhausted everything I can think of to fix this issue. I am processing hundreds of files, and I can't keep fixing bad data files every time this issue occurs.
Can you please try these:
1 - Change the data type to string in the XSD, and take care of the data type conversion before loading into the tables.
2 - If possible, generate the XSD from your XML, then verify the data type and use it accordingly.
The rest of the XSD can be changed accordingly.
Below is a screen grab of what I tried. Hope it helps.
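The idea behind option 1 (read the attribute as a string, then convert it leniently) can be sketched as follows. This is an illustration of the logic in Java, not SSIS code; in a package the same check would live in a Derived Column or a script component, and the class and method names here are hypothetical.

```java
import java.math.BigDecimal;
import java.util.Optional;

public class LenientDecimal {

    // Treats a missing or blank attribute value (e.g. tlost="") as "no value"
    // instead of failing the way a strict decimal parse does.
    static Optional<BigDecimal> parse(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(new BigDecimal(raw.trim()));
    }

    public static void main(String[] args) {
        System.out.println(parse("0").isPresent());
        System.out.println(parse("").isPresent());
        // A blank value can be mapped to NULL or to a default such as 0.
        System.out.println(parse("").orElse(BigDecimal.ZERO));
    }
}
```

An empty Optional maps naturally to a NULL in the nullable DECIMAL(28,10) column.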

HSQLDB throws Assert failed exception and file I/O error on db.script.new file during Checkpoint

Our application is a Java-based desktop application that downloads binary data from a source, parses it, and adds it to an HSQLDB database. When downloading from the sources individually, the application works perfectly. But when downloading from multiple sources simultaneously, with each source in its own thread, I get the following error:
java.sql.SQLException: Assert failed: java.lang.ArrayIndexOutOfBoundsException: 23 in statement [CHECKPOINT]
at org.hsqldb.jdbc.Util.throwError(Unknown Source)
at org.hsqldb.jdbc.jdbcPreparedStatement.execute(Unknown Source)
or sometimes,
java.sql.SQLException: Assert failed: java.lang.ArrayIndexOutOfBoundsException: 1016 in statement [CHECKPOINT]
followed by
java.sql.SQLException: File input/output error: C:\ProgramData\test\data\database\db.script.new in statement [CHECKPOINT]
at org.hsqldb.jdbc.Util.throwError(Unknown Source)
at org.hsqldb.jdbc.jdbcPreparedStatement.execute(Unknown Source)
Java: 1.8;
HSQL version: 1.8.10
We are not in a position to migrate HSQLDB to the latest version, for various reasons.
HSQL Properties:
hsqldb.script_format=0
runtime.gc_interval=0
sql.enforce_strict_size=false
hsqldb.cache_size_scale=8
readonly=false
hsqldb.nio_data_file=true
hsqldb.cache_scale=14
version=1.8.0
hsqldb.default_table_type=memory
hsqldb.cache_file_scale=1
hsqldb.log_size=200
modified=yes
hsqldb.cache_version=1.7.0
hsqldb.original_version=1.8.0
hsqldb.compatible_version=1.8.0
Any help or hint will be appreciated.
This is a 7-year-old version, which is not ideal for multi-threaded usage.
The simple solution is to perform the database updates with a single thread. You can retrofit your multi-threaded application with a synchronized block over a singleton object around the code that performs the database update.
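A minimal sketch of that retrofit, assuming the download threads all funnel their writes through one method. The class, the lock name, and the counter standing in for the real JDBC calls are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class SerializedDbWrites {

    // One shared lock object: every thread must hold it while touching the database.
    private static final Object DB_LOCK = new Object();

    // Stand-in for the real database state (hypothetical); deliberately not thread-safe.
    private static int rowsWritten = 0;

    // All database work goes through here, so only one thread updates at a time.
    static void storeParsedData(int rows) {
        synchronized (DB_LOCK) {
            // Real code would run its INSERT/UPDATE and CHECKPOINT statements here.
            rowsWritten += rows;
        }
    }

    // Simulates one download thread per source; downloading and parsing stay parallel.
    static int runDownloads(int sources, int batchesPerSource) {
        synchronized (DB_LOCK) {
            rowsWritten = 0;
        }
        List<Thread> threads = new ArrayList<>();
        for (int s = 0; s < sources; s++) {
            Thread t = new Thread(() -> {
                for (int b = 0; b < batchesPerSource; b++) {
                    storeParsedData(1);
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        synchronized (DB_LOCK) {
            return rowsWritten;
        }
    }

    public static void main(String[] args) {
        System.out.println(runDownloads(4, 1000));
    }
}
```

Only the database section is serialized, so the threads still overlap on network and parsing work.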

SSIS export to CSV file failing

I am trying to export the contents of a SQL Server 2005 table to a CSV file using SSIS. In the Data Flow Task I have an OLE DB Source for the table and a Flat File Destination for the file.
When copying the data I started getting a failure on one of the columns on a certain row, and after some investigation I found the problem was with commas in the data below.
Data Issue (nvarchar255)
errors code l075 showing,,,re test.
OLE DB Source for Comment col
Derived Column
Given that this was the issue, I created a Derived Column object between the source and destination objects and tried filtering out the commas using the expression REPLACE(Comment,","," "), but the same column is still failing with the errors below.
Destination Component
Exception
[Inspection Failures Destination [206]] Error: Data conversion failed.
The data conversion for column "Comment" returned status value 4 and
status text "Text was truncated or one or more characters had no
match in the target code page.".
[Inspection Failures Destination [206]] Error: Cannot copy
or convert flat file data for column "Comment".
[SSIS.Pipeline] Error: SSIS Error Code DTS_E_PROCESSINPUTFAILED.
The ProcessInput method on component "Inspection Failures
Destination" (206) failed with error code 0xC02020A0 while
processing input "Flat File Destination Input" (207). The
identified component returned an error from the ProcessInput
method. The error is specific to the component, but the error
is fatal and will cause the Data Flow task to stop running.
There may be error messages posted before this with more
information about the failure.
[Inspecton Failures Source [128]] Error: The attempt to
add a row to the Data Flow task buffer failed with error
code 0xC0047020.
[SSIS.Pipeline] Error: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED.
The PrimeOutput method on component "Inspecton Failures Source"
(128) returned error code 0xC02020C4. The component returned
a failure code when the pipeline engine called PrimeOutput().
The meaning of the failure code is defined by the component,
but the error is fatal and the pipeline stopped executing.
There may be error messages posted before this with more
information about the failure.
OK, the problem actually appears to be a hidden illegal character in the text.
In the image below, the top line shows a square before the re test string. The comment column in the database is an nvarchar, which apparently uses a different character set, so I cannot just use CHAR(13) + CHAR(10) to replace the carriage return.
The fix involved converting the field from an nvarchar to a varchar and then performing a replace on the converted ? character, resulting in the corrected second line in the image.
SELECT ID,
REPLACE(REPLACE(CAST(Comment AS varchar(255)),'?',' '),',',' ') Comment
FROM tblInspectionFailures WHERE (ID = 216899)
The conversion requirement is detailed here
This does not sound like an ideal solution to me, but it does work. Does anyone have any other options?
Without replacing the Comment column, can you create another column, map the new derived column to the destination column, and see whether that works?
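The cast-and-replace fix described above boils down to replacing every character that has no match in the target code page, plus the commas, with a space. A rough Java sketch of that logic follows. It is an illustration only, not SSIS or T-SQL, and it assumes ISO-8859-1 as a stand-in for the flat file's code page; the class and method names are hypothetical.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class CommentSanitizer {

    // Replaces commas and any character the target charset cannot encode with a space.
    // Assumption: ISO-8859-1 stands in for the destination file's code page.
    static String sanitize(String comment) {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder();
        StringBuilder out = new StringBuilder(comment.length());
        for (int i = 0; i < comment.length(); i++) {
            char c = comment.charAt(i);
            boolean keep = c != ',' && encoder.canEncode(c);
            out.append(keep ? c : ' ');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // \u25A1 (a square glyph) is not representable in ISO-8859-1.
        System.out.println(sanitize("a,b\u25A1c"));
    }
}
```

Control characters such as carriage returns are encodable in ISO-8859-1, so this sketch leaves them in place; they would need a separate check.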