Convert RTF (Rich Text Format) code into plain text in Excel

I'm exporting a database query as Excel and I am getting rows with RTF formatting.
How can I convert these fields into plain text? I've found answers that are pretty old, so I was wondering if anyone knows a way.

The .NET Framework RichTextBox class can perform the conversion. Fortunately, this class has the ComVisibleAttribute set, so it can be used from VBA without much difficulty.
I had to create a .tlb (type library) file to reference. In the
%SYSTEMROOT%\Microsoft.NET\Framework\currentver\
directory, run the command
regasm /codebase system.windows.forms.dll
to create the system.windows.forms.tlb file. I already had this .tlb file on my system, but I had to recreate it with this command before I could successfully create a .NET System.Windows.Forms RichTextBox object in VBA.
With the new .tlb file created, link it to your project via Tools->References in the VBA IDE.
I wrote this test code in Access to demonstrate the solution.
' Loads the RTF into a RichTextBox and returns its plain text.
' Requires the System_Windows_Forms reference created above.
Function RTFExtractPlainText(ByVal rtf As String) As String
    Dim miracle As System_Windows_Forms.RichTextBox
    Set miracle = New System_Windows_Forms.RichTextBox
    With miracle
        .RTF = rtf
        RTFExtractPlainText = .Text
    End With
End Function

Sub TestRTFExtractPlainText()
    Dim rtfSample As String
    rtfSample = "{\rtf1\ansi\deflang1033\ftnbj\uc1 {\fonttbl{\f0 \froman \fcharset0 Times New Roman;}{\f1 \fswiss \fcharset0 Segoe UI;}} {\colortbl ;\red255\green255\blue255 ;} {\stylesheet{\fs22\cf0\cb1 Normal;}{\cs1\cf0\cb1 Default Paragraph Font;}} \paperw12240\paperh15840\margl1440\margr1440\margt1440\margb1440\headery720\footery720\deftab720\formshade\aendnotes\aftnnrlc\pgbrdrhead\pgbrdrfoot \sectd\pgwsxn12240\pghsxn15840\marglsxn1440\margrsxn1440\margtsxn1440\margbsxn1440\headery720\footery720\sbkpage\pgnstarts1\pgncont\pgndec \plain\plain\f1\fs22\lang1033\f1 hello question stem\plain\f1\fs22\par}"
    MsgBox RTFExtractPlainText(rtfSample)
End Sub
With the result
hello question stem
I'd assume re-creating the .tlb file in the \Framework64\ directory would be needed on 64-bit Windows with 64-bit Office. I am running 64-bit Win10 with 32-bit Office 2013, so I had to have a 32-bit .tlb file.

Another alternative is the Microsoft Rich Textbox Control (I couldn't test it on 64-bit Office).
Sub rtfToText()
With CreateObject("RICHTEXT.RichtextCtrl") ' or add reference to Microsoft Rich Textbox Control for early binding and With New RichTextLib.RichTextBox
.SelStart = 0 ' needs to be selected
.TextRTF = Join(Application.Transpose(Cells.CurrentRegion.Columns(1)))
[C1] = .Text ' set the destination cell here
' or if you want them in separate cells:
a = Split(.Text, vbNewLine)
Range("C3").Resize(UBound(a) + 1) = Application.Transpose(a)
End With
End Sub

I'm revisiting this question to provide two JavaScript solutions, rather than a .NET one.
Approach 1
const parseRTF = require("rtf-parser");
let rtf = `{\\rtf1\\ansi\\deff0\\nouicompat{\\fonttbl{\\f0\\fnil\\fcharset0 Calibri;}{\\f1\\fnil\\fcharset204 Calibri;}{\\f2\\fnil Calibri;}} {\\colortbl ;\\red0\\green0\\blue0;} {\\*\\generator Riched20 10.0.19041}\\viewkind4\\uc1 \\pard\\cf1\\f0\\fs18\\lang1033 WEB1616 \\f1\\lang1071\\'ef\\'eb\\'e0\\'f2\\'e5\\'ed\\'ee \\'f1\\'ee \\'ea\\'e0\\'f0\\'f2\\'e8\\'f7\\'ea\\'e0\\par \\'ca\\'f0\\'e8\\'f1\\'f2\\'e8\\'ed\\'e0 \\'c3\\'ee\\'eb\\'e0\\'e1\\'ee\\'f1\\'ea\\'e0 077640615\\par \\'c2\\'e0\\'f0\\'f8\\'e0\\'e2\\'f1\\'ea\\'e0 6\\'e0\\par 1000 \\'d1\\'ea\\'ee\\'ef\\'bc\\'e5\\f2\\lang1033\\par } `;
function convertRTFtoPlainText(rtf) {
return new Promise((resolve, reject) => {
parseRTF.string(rtf, (err, doc) => {
if (err) {
return reject(err); // stop here so doc (undefined on error) is never dereferenced
}
let string = "";
doc.content.forEach((item) => {
if (item.content) {
item.content.forEach((span) => {
string += span.value;
});
} else {
string += item.value;
}
});
resolve(string.trim());
});
});
}
(async () => {
let value = await convertRTFtoPlainText(rtf);
console.log(value);
})();
Approach 2
const jsdom = require("jsdom");
const { JSDOM } = jsdom;
function stringToArrayBuffer(string) {
if (string == null) return;
let buffer = new ArrayBuffer(string.length);
let bufferView = new Uint8Array(buffer);
for (let i = 0; i < string.length; i++) {
bufferView[i] = string.charCodeAt(i);
}
return buffer;
}
// callback = function to run after the DOM has rendered, defined when calling runRtfjs
function runRtfjs(rtf, callback, errorCallback) {
const virtualConsole = new jsdom.VirtualConsole();
virtualConsole.sendTo(console);
let dom = new JSDOM(
`
<script src="./node_modules/rtf.js/dist/RTFJS.bundle.js"></script>
<script>
RTFJS.loggingEnabled(false);
try {
const doc = new RTFJS.Document(rtfFile);
const meta = doc.metadata();
doc
.render()
.then(function(htmlElements) {
const div = document.createElement("div");
div.append(...htmlElements);
// window.done(meta, div.innerHTML);
// window.done(meta, div.innerText);
window.done(meta, div.textContent); // pass the data to the callback
}).catch(error => window.onerror(error))
} catch (error){
window.onerror(error)
}
</script>
`,
{
resources: "usable",
runScripts: "dangerously",
url: "file://" + __dirname + "/",
virtualConsole,
beforeParse(window) {
window.rtfFile = stringToArrayBuffer(rtf);
window.done = function (meta, html) {
callback(meta, html); // call the callback
};
window.onerror = function (error) {
errorCallback(error);
};
},
}
);
}
let rtf = `{\\rtf1\\ansi\\deff0\\nouicompat{\\fonttbl{\\f0\\fnil\\fcharset0 Calibri;}{\\f1\\fnil\\fcharset204 Calibri;}{\\f2\\fnil Calibri;}} {\\colortbl ;\\red0\\green0\\blue0;} {\\*\\generator Riched20 10.0.19041}\\viewkind4\\uc1 \\pard\\cf1\\f0\\fs18\\lang1033 WEB1616 \\f1\\lang1071\\'ef\\'eb\\'e0\\'f2\\'e5\\'ed\\'ee \\'f1\\'ee \\'ea\\'e0\\'f0\\'f2\\'e8\\'f7\\'ea\\'e0\\par \\'ca\\'f0\\'e8\\'f1\\'f2\\'e8\\'ed\\'e0 \\'c3\\'ee\\'eb\\'e0\\'e1\\'ee\\'f1\\'ea\\'e0 077640615\\par \\'c2\\'e0\\'f0\\'f8\\'e0\\'e2\\'f1\\'ea\\'e0 6\\'e0\\par 1000 \\'d1\\'ea\\'ee\\'ef\\'bc\\'e5\\f2\\lang1033\\par } `;
runRtfjs(
rtf,
(meta, html) => {
console.log(html);
},
(error) => console.error(error)
);
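Both approaches above depend on npm packages. For very simple RTF, a dependency-free fallback can be sketched as below. This is only a rough illustration, not a full RTF parser: the function name rtfToPlainText and the list of skipped destinations are my own assumptions, \'hh bytes are decoded as Latin-1 (so the Cyrillic sample above would not come out correctly), and \u Unicode escapes are not handled.

```javascript
// Destinations whose contents are not document text
// (a partial list, assumed to be enough for simple files).
const SKIPPED_DESTINATIONS = new Set(["fonttbl", "colortbl", "stylesheet", "info", "pict"]);

function rtfToPlainText(rtf) {
  const controlWord = /([a-zA-Z]+)(-?\d+)? ?/y; // sticky: must match exactly at lastIndex
  const stack = []; // saved skip flag for each open group
  let skip = false; // true while inside an ignored destination
  let out = "";
  let i = 0;
  while (i < rtf.length) {
    const ch = rtf[i];
    if (ch === "{") {
      stack.push(skip);
      i += 1;
    } else if (ch === "}") {
      skip = stack.length > 0 ? stack.pop() : false;
      i += 1;
    } else if (ch === "\\") {
      const next = rtf[i + 1];
      if (next === "'") {
        // \'hh is one byte in hex; decoding it as Latin-1 is an assumption.
        if (!skip) out += String.fromCharCode(parseInt(rtf.slice(i + 2, i + 4), 16));
        i += 4;
      } else if (next === "*") {
        skip = true; // {\*\name ...} marks an ignorable destination
        i += 2;
      } else {
        controlWord.lastIndex = i + 1;
        const m = controlWord.exec(rtf);
        if (m === null) {
          // Control symbol: keep escaped \, { and }, drop anything else.
          if (!skip && "\\{}".includes(next)) out += next;
          i += 2;
        } else {
          const word = m[1];
          if (SKIPPED_DESTINATIONS.has(word)) skip = true;
          else if (!skip && (word === "par" || word === "line")) out += "\n";
          else if (!skip && word === "tab") out += "\t";
          i += 1 + m[0].length; // backslash + word + optional number and space
        }
      }
    } else {
      // Raw line breaks in the RTF source carry no meaning.
      if (!skip && ch !== "\r" && ch !== "\n") out += ch;
      i += 1;
    }
  }
  return out.trim();
}

const sample = String.raw`{\rtf1\ansi{\fonttbl{\f0 Arial;}}\f0 hello question stem\par}`;
console.log(rtfToPlainText(sample)); // prints "hello question stem"
```

For anything beyond trivial documents (other code pages, Unicode escapes, tables, embedded objects) a real parser such as the libraries above is the safer choice.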

Related

Problems with formatting data coming from the html table to the xlsx file in filesaver.js - VueJs

I'm using the Filesaver.js library to export an .xlsx file from a table I own. But I am getting some errors. See below:
[Screenshots not preserved: the original table created using VueJs with Quasar Framework and the generated .xlsx file, shown for two cases.]
My code:
exportTable() {
let wb = XLSX.utils.table_to_book(document.querySelector('.q-table'), {
sheet: "Sheet JS",
})
let wbout = XLSX.write(wb, {
bookType: 'xlsx',
bookSST: true,
type: 'binary'
})
function s2ab(s) {
let buf = new ArrayBuffer(s.length)
let view = new Uint8Array(buf)
for (let i = 0; i < s.length; i++) view[i] = s.charCodeAt(i) & 0xFF
return buf
}
saveAs(new Blob([s2ab(wbout)], {
type: "text/plain;charset=utf-8"
}), 'test.xlsx')
}
Why is the character formatting in the table not preserved when exporting the spreadsheet?
Is my spreadsheet data not treated as a string?
I'm brazilian. Sorry for bad English (=

After a few days of looking for answers, I managed to solve my problem.
The fix is to pass {raw: true} in the options to table_to_book. With this option the library no longer parses or reformats the cell values and keeps them as the raw strings that come from the HTML. Interestingly, I didn't find this in the documentation.
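To illustrate why skipping the parsing step matters, here is a small sketch of how automatic type guessing can change cell text. This is not the library's actual logic: guessCell and rawCell are hypothetical helpers, and the { t, v } shape is only meant to resemble a typed cell.

```javascript
// Hypothetical stand-in for automatic type detection: numeric-looking
// text becomes a number, everything else stays a string.
function guessCell(text) {
  const n = Number(text);
  if (text.trim() !== "" && !Number.isNaN(n)) {
    return { t: "n", v: n };
  }
  return { t: "s", v: text };
}

// Hypothetical stand-in for the raw behaviour: always keep the original text.
function rawCell(text) {
  return { t: "s", v: text };
}

console.log(guessCell("00123")); // { t: 'n', v: 123 }  (leading zeros are lost)
console.log(rawCell("00123"));   // { t: 's', v: '00123' }
```

Identifiers such as codes or document numbers are the typical victims, since they look numeric but are meant to be read as text.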
// import something here
import Vue from 'vue'
import XLSX from 'xlsx'
import {
saveAs
} from 'file-saver'
// Global Function
const exportExcel = (table) => {
let wb = XLSX.utils.table_to_book(table, {
sheet: "Sheet JS",
raw: true // Here
})
let wbout = XLSX.write(wb, {
bookType: 'xlsx',
bookSST: true,
type: 'binary'
})
function s2ab(s) {
let buf = new ArrayBuffer(s.length)
let view = new Uint8Array(buf)
for (let i = 0; i != s.length; i++) view[i] = s.charCodeAt(i) & 0xFF
return buf
}
saveAs(new Blob([s2ab(wbout)], {
type: "text/plain;charset=utf-8"
}), 'spreadsheet.xlsx')
}
Vue.prototype.$exportExcel = exportExcel;
// "async" is optional;
// more info on params: https://quasar.dev/quasar-cli/cli-documentation/boot-files#Anatomy-of-a-boot-file
//export default exportExcel
This link helped me

HCL Domino AppDevPack - Problem with writing Rich Text

I'm using the example code from the documentation for Domino AppDev Pack 1.0.4. The only difference is that I read a text file (body.txt) into a buffer; the file contains only a long stretch of plain text (40 KB).
When it runs, the document is created in the database and the rest of the code does not return an error.
However, the rich text field is not added to the document.
Here is the response returned:
response: {"fields":[{"fieldName":"Body","unid":"8EA69129BEECA6DEC1258554002F5DCD","error":{"name":"ProtonError","code":65577,"id":"RICH_TEXT_STREAM_CORRUPT"}}]}
My goal is to write very long text (more than 64 KB) to a rich text field. The example uses a text file for the buffer, but later it could be something like const buffer = Buffer.from('very long text ...').
Is this the right way, or does it have to be done differently?
I'm using a Windows system with IBM Domino (r) Server (64 Bit), Release 10.0.1FP4 and AppDevPack 1.0.4.
Thank you in advance for your help
Here's the code:
const write = async (database) => {
let writable;
let result;
try {
// Create a document with subject write-example-1 to hold rich text
const unid = await database.createDocument({
document: {
Form: 'RichDiscussion',
Title: 'write-example-1',
},
});
writable = await database.bulkCreateRichTextStream({});
result = await new Promise((resolve, reject) => {
// Set up event handlers.
// Reject the Promise if there is a connection-level error.
writable.on('error', (e) => {
reject(e);
});
// Return the response from writing when resolving the Promise.
writable.on('response', (response) => {
console.log("response: " + JSON.stringify(response));
resolve(response);
});
// Indicates which document and item name to use.
writable.field({ unid, fieldName: 'Body' });
let offset = 0;
// Assume for purposes of this example that we buffer the entire file.
const buffer = fs.readFileSync('/driver/body.txt');
// When writing large amounts of data, it is necessary to
// wait for the client-side to complete the previous write
// before writing more data.
const writeData = () => {
let draining = true;
while (offset < buffer.length && draining) {
const remainingBytes = buffer.length - offset;
let chunkSize = 16 * 1024;
if (remainingBytes < chunkSize) {
chunkSize = remainingBytes;
}
draining = writable.write(buffer.slice(offset, offset + chunkSize));
offset += chunkSize;
}
if (offset < buffer.length) {
// Buffer is not draining. Whenever the drain event is emitted
// call this function again to write more data.
writable.once('drain', writeData);
}
};
writeData();
writable = undefined;
});
} catch (e) {
console.log(`Unexpected exception ${e.message}`);
} finally {
if (writable) {
writable.end();
}
}
return result;
};
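The slicing part of writeData can be looked at in isolation. Below is a minimal sketch that only shows how a buffer is cut into 16 KB pieces; the generator name chunks is hypothetical, and the sketch deliberately leaves out the stream, the drain handling, and anything specific to the AppDev Pack.

```javascript
// Yields consecutive slices of at most chunkSize bytes.
// subarray returns views on the same memory, so nothing is copied.
function* chunks(buffer, chunkSize = 16 * 1024) {
  for (let offset = 0; offset < buffer.length; offset += chunkSize) {
    yield buffer.subarray(offset, Math.min(offset + chunkSize, buffer.length));
  }
}

// Tiny demonstration with a chunk size of 3 bytes.
const parts = Array.from(chunks(Buffer.from("abcdefg"), 3), (part) => part.toString());
console.log(parts); // [ 'abc', 'def', 'g' ]
```

Chunking alone says nothing about whether the bytes are valid for the target field; it only controls how much is handed to write() at a time.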

As of AppDev Pack 1.0.4, the rich text stream only accepts data in valid rich text CD format, in the LMBCS character set, so a buffer of plain text is not valid input. We are currently working on a library to help you write valid rich text data to the stream.
I'd love to hear more about your use cases, and we're excited you're already poking around the feature! If you can join the OpenNTF Slack channel, I usually hang out there.

Win 8 Apps : saving and retrieving data in roamingfolder

I'm trying to store a small amount of user data in the roamingFolder of Windows Storage in an app using JavaScript. I'm following sample code from the Dev Center, but without success. My code snippet is as follows (or see the SkyDrive link for the full project: https://skydrive.live.com/redir?resid=F4CAEFCD620982EB!105&authkey=!AE-ziM-BLJuYj7A ):
filesReadCounter: function() {
roamingFolder.getFileAsync(filename)
.then(function (filename) {
return Windows.Storage.FileIO.readTextAsync(filename);
}).done(function (data) {
var dataToRead = JSON.parse(data);
var dataNumber = dataToRead.count;
var message = "Your Saved Conversions";
//for (var i = 0; i < dataNumber; i++) {
message += dataToRead.result;
document.getElementById("savedOutput1").innerText = message;
//}
//counter = parseInt(text);
//document.getElementById("savedOutput2").innerText = dataToRead.counter;
}, function () {
// getFileAsync or readTextAsync failed.
//document.getElementById("savedOutput2").innerText = "Counter: <not found>";
});
},
filesDisplayOutput: function () {
this.filesReadCounter();
}
I'm calling the filesDisplayOutput function inside the ready method of the navigator template's item.js file to retrieve the last session's data, but it always shows blank. I want to let a user save up to 5 items.

I had some trouble running your code as is, but that's tangential to the question. Bottom line: you're not actually reading the file. Note that in this line there's no then or done to execute when the promise is fulfilled:
return Windows.Storage.FileIO.readTextAsync(filename);
I hacked this into your example solution and it's working, with the typical caveat that this is not production code :)
filesReadCounter: function () {
roamingFolder.getFileAsync(filename).then(
function (filename) {
Windows.Storage.FileIO.readTextAsync(filename).done(
function (data) {
var dataToRead = JSON.parse(data);
var dataNumber = dataToRead.count;
var message = "Your Saved Conversions";
//for (var i = 0; i < dataNumber; i++) {
message += dataToRead.result;
document.getElementById("savedOutput1").innerText = message;
//}
//counter = parseInt(text);
//document.getElementById("savedOutput2").innerText = dataToRead.counter;
}, function () {
// readTextAsync failed.
//document.getElementById("savedOutput2").innerText = "Counter: <not found>";
});
},
function () {
// getFileAsync failed
})
},

File input and Dart

I'm trying out Dart, but I can't figure out how to send an image from the user to the server. I have my input tag and I can reach it in the Dart code, but I can't seem to read from it. I'm trying something like:
InputElement ie = document.query('#myinputelement');
ie.on.change.add((event){
InputElement iee = document.query('#myinputelement');
FileList mfl = iee.files;
File myFile = mfl.item(0);
FileReader fr = new FileReader();
fr.readAsBinaryString(myFile);
String result = fr.result; //this is always empty
});
With the html containing:
<input type="file" id="myinputelement">
I really hope you can help me, I'm kind of stuck. I might just be missing how to do the onload for the FileReader, or maybe I'm doing it totally wrong.

The FileReader API is asynchronous so you need to use event handlers.
var input = window.document.querySelector('#upload');
Element log = query("#log");
input.addEventListener("change", (e) {
FileList files = input.files;
Expect.isTrue(files.length > 0);
File file = files.item(0);
FileReader reader = new FileReader();
reader.onLoad = (fileEvent) {
print("file read");
log.innerHTML = "file content is ${reader.result}";
};
reader.onerror = (evt) => print("error ${reader.error.code}");
reader.readAsText(file);
});
You also need to allow file access in your browser, which in Chrome can be done by starting it with the flag --allow-file-access-from-files

This is how to read a file using dart:html.
document.querySelector('#myinputelement').onChange.listen((changeEvent) {
List fileInput = document.querySelector('#myinputelement').files;
if (fileInput.length > 1) {
// More than one file got selected somehow, could be a browser bug.
// Unless the "multiple" attribute is set on the input element, of course
}
else if (fileInput.isEmpty) {
// This could happen if the browser allows emptying an upload field
}
FileReader reader = new FileReader();
reader.onLoad.listen((fileEvent) {
String fileContent = reader.result;
// Code doing stuff with fileContent goes here!
});
reader.onError.listen((itWentWrongEvent) {
// Handle the error
});
reader.readAsText(fileInput[0]);
});

It's not necessary (any more) to use dart:dom FileReader instead of the one from dart:html.
Your code should work if you add an event listener to the file reader, like this:
FileReader fr = new FileReader();
fr.on.load.add((fe) => doSomethingToString(fe.target.result));
fr.readAsBinaryString(myFile);

My attempt
void fileSelected(Event event) async {
final files = (event.target as FileUploadInputElement).files;
if (files.isNotEmpty) {
final reader = new FileReader();
// ignore: unawaited_futures
reader.onError.first.then((evt) => print('error ${reader.error.code}'));
final resultReceived = reader.onLoad.first;
reader.readAsArrayBuffer(files.first);
await resultReceived;
imageReference.fileSelected(reader.result as List<int>);
}
}

Thanks to the help from this post, I got it to work. I still utilized my event handler in the input tag and made sure that I DID NOT import both dart:io and dart:html, only dart:html is needed.
This is what my final AppComponent looked like.
import 'dart:html';
import 'package:angular/angular.dart';
@Component(
selector: 'my-app',
styleUrls: ['app_component.css'],
templateUrl: 'app_component.html',
directives: [coreDirectives],
)
class AppComponent {
// Stores contents of file upon load
String contents;
AppComponent();
void fileUpload(event) {
// Get tag and the file
InputElement input = window.document.getElementById("fileUpload");
File file = input.files[0];
// File reader and event handler for end of loading
FileReader reader = FileReader();
reader.readAsText(file);
reader.onLoad.listen((fileEvent) {
contents = reader.result;
});
}
}
This is what my template looks like:
<h1>File upload test</h1>
<input type="file" (change)="fileUpload($event)" id="fileUpload">
<div *ngIf="contents != null">
<p>Hi! These are the contents of your file:</p>
<p>{{contents}}</p>
</div>

HTML5 Drag n Drop File Upload

I'm running a website, where I'd like to upload files with Drag 'n Drop, using the HTML5 File API and FileReader. I have successfully managed to create a new FileReader, but I don't know how to upload the file. My code (JavaScript) is the following:
holder = document.getElementById('uploader');
holder.ondragover = function () {
$("#uploader").addClass('dragover');
return false;
};
holder.ondragend = function () {
$("#uploader").removeClass('dragover');
return false;
};
holder.ondrop = function (e) {
$("#uploader").removeClass('dragover');
e.preventDefault();
var file = e.dataTransfer.files[0],
reader = new FileReader();
reader.onload = function (event) {
// I should upload the file now...
};
reader.readAsDataURL(file);
return false;
};
I also have a form (id : upload-form) and an input file field (id : upload-input).
Do you have any ideas?
P.S. I use jQuery, that's why there is $("#uploader") and others.

Rather than code this from scratch, why not use something like html5uploader, which works via drag n drop (uses FileReader etc.): http://code.google.com/p/html5uploader/
EDIT: apparently we respondents are supposed to tend to our answers forever more, for fear for down-votes. The Google Code link is now dead (four years later), so here's a jQuery plugin that is very similar: http://www.igloolab.com/jquery-html5-uploader/

You'll want to extract the base64-encoded file contents and send them to the server via Ajax.
JavaScript
var extractBase64Data;
extractBase64Data = function(dataUrl) {
return dataUrl.substring(dataUrl.indexOf(',') + 1);
};
// Inside the ondrop event
Array.prototype.forEach.call(event.dataTransfer.files, function(file) {
var reader;
if (!file.type.match(options.matchType)) {
return;
}
reader = new FileReader();
reader.onload = function(event) {
var contentsBase64;
if (event.target.readyState === FileReader.DONE) {
contentsBase64 = extractBase64Data(event.target.result);
return $.post(someURL, {
contentsBase64: contentsBase64
});
}
};
reader.readAsDataURL(file);
});
CoffeeScript
extractBase64Data = (dataUrl) ->
dataUrl.substring(dataUrl.indexOf(',') + 1)
# Inside the ondrop event
Array::forEach.call event.dataTransfer.files, (file) ->
return unless file.type.match(options.matchType)
reader = new FileReader()
reader.onload = (event) ->
if event.target.readyState == FileReader.DONE
contentsBase64 = extractBase64Data(event.target.result)
$.post someURL,
contentsBase64: contentsBase64
reader.readAsDataURL(file)
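On the receiving side, the posted base64 string has to be turned back into bytes. Here is a minimal Node.js sketch of that step, assuming the server runs on Node and receives the contentsBase64 field from the code above; decodeUpload is a hypothetical helper, and the data URL is built by hand only to keep the example self-contained.

```javascript
// Same helper as in the client code: drop the "data:...;base64," prefix.
function extractBase64Data(dataUrl) {
  return dataUrl.substring(dataUrl.indexOf(",") + 1);
}

// Hypothetical server-side step: decode the posted field into raw bytes.
function decodeUpload(contentsBase64) {
  return Buffer.from(contentsBase64, "base64");
}

// Stand-in for what readAsDataURL would produce for a small text file.
const dataUrl = "data:text/plain;base64," + Buffer.from("hello").toString("base64");
const bytes = decodeUpload(extractBase64Data(dataUrl));
console.log(bytes.toString("utf8")); // prints "hello"
```

Base64 inflates the payload by roughly a third, so for large files sending the File object itself (for example through FormData) may be preferable.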
