I am putting together some code that will merge PDFs based on the file name prefix. The code below grabs the file name, but instead of merging it overwrites the output. I believe my problem is the FileStream placement, but if I move it out of its current location, I can't get the file name. Any suggestions? Thanks.
static void CreateMergedPDFs()
{
string srcDir = "C:/PDFin/";
string resultPDF = "C:/PDFout/";
{
var files = System.IO.Directory.GetFiles(srcDir);
string prevFileName = null;
int i = 1;
foreach (string file in files)
{
string filename = Left(Path.GetFileName(file), 8);
using (FileStream stream = new FileStream(resultPDF + filename + ".pdf", FileMode.Create))
{
if (prevFileName == null || filename == prevFileName)
{
Document pdfDoc = new Document(PageSize.A4);
PdfCopy pdf = new PdfCopy(pdfDoc, stream);
pdfDoc.Open();
{
pdf.AddDocument(new PdfReader(file));
i++;
}
if (pdfDoc != null)
pdfDoc.Close();
Console.WriteLine("Merges done!");
}
}
}
}
}
}
}
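The behavior you describe is consistent with your code: the FileStream is created with FileMode.Create inside the loop, so every iteration truncates the output file and writes a single source document into it.
Try this: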
static void CreateMergedPDFs()
{
    string srcDir = "C:/PDFin/";
    string resultPDF = "C:/PDFout/merged.pdf";
    // Open the output stream and the PdfCopy once, outside the loop
    using (FileStream stream = new FileStream(resultPDF, FileMode.Create))
    {
        Document pdfDoc = new Document(PageSize.A4);
        PdfCopy pdf = new PdfCopy(pdfDoc, stream);
        pdfDoc.Open();
        var files = System.IO.Directory.GetFiles(srcDir);
        foreach (string file in files)
        {
            // Append all pages of the current source file to the merged document
            pdf.AddDocument(new PdfReader(file));
        }
        pdfDoc.Close();
    }
    Console.WriteLine("Merges done!");
}
That makes more sense, doesn't it?
If you want to group files based on their prefix, you should read the answer to the question "Group files in a directory based on their prefix".
In the answer to this question, it is assumed that the prefix and the rest of the filename are separated by a - character. For instance 1-abc.pdf and 1-xyz.pdf have the prefix 1 whereas 2-abc.pdf and 2-xyz.pdf have the prefix 2. In your case, it's not clear how you'd determine the prefix, but it's easy to get a list of all the files, sort them and make groups of files based on whatever algorithm you want to determine the prefix.
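As a rough illustration of the "sort them and make groups" step, here is a minimal sketch in Java. It assumes the prefix is simply the first 8 characters of the file name, which mirrors the `Left(Path.GetFileName(file), 8)` call in the question; the class name `PrefixGrouper`, its methods, and the sample file names are hypothetical. Each resulting group would then be merged into its own output file, with the stream opened once per group rather than once per source file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class PrefixGrouper {
    // Assumption: the prefix is the first prefixLength characters of the name.
    // Names shorter than that are used as their own prefix.
    static String prefixOf(String fileName, int prefixLength) {
        return fileName.length() <= prefixLength
                ? fileName
                : fileName.substring(0, prefixLength);
    }

    // Groups file names by prefix. The TreeMap keeps prefixes sorted,
    // and each group is sorted so the merge order is predictable.
    static Map<String, List<String>> groupByPrefix(List<String> fileNames, int prefixLength) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String name : fileNames) {
            groups.computeIfAbsent(prefixOf(name, prefixLength), k -> new ArrayList<>())
                  .add(name);
        }
        for (List<String> members : groups.values()) {
            Collections.sort(members);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Made-up file names for demonstration only
        List<String> names = Arrays.asList(
                "INV00001-a.pdf", "INV00002-a.pdf", "INV00001-b.pdf");
        groupByPrefix(names, 8).forEach((prefix, members) ->
                System.out.println(prefix + ".pdf <- " + members));
    }
}
```

If the prefix is delimited by a separator such as `-` instead, only `prefixOf` needs to change.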
[HttpPost("FilePost")]
public async Task<IActionResult> FilePost(List<IFormFile> files)
{
long size = files.Sum(f => f.Length);
var filePath = Directory.GetCurrentDirectory() + "/files";
if (!System.IO.Directory.Exists(filePath))
{
Directory.CreateDirectory(filePath);
}
foreach (var item in files)
{
if (item.Length > 0)
{
using (var stream = new FileStream(filePath,FileMode.CreateNew))
{
await item.CopyToAsync(stream);
}
}
}
return Ok(new { count = files.Count, size, filePath });
}
FormFile.FileName comes through as directory + file name.
The uploaded file's name includes path information. How do I deal with that?
I just need the name of the file.
Use Path.GetFileName() to get the name of the file, and use Path.Combine() to combine the save path you want with the file name. Try code like the following:
var filesPath = Directory.GetCurrentDirectory() + "/files";
if (!System.IO.Directory.Exists(filesPath))
{
Directory.CreateDirectory(filesPath);
}
foreach (var item in files)
{
if (item.Length > 0)
{
var fileName = Path.GetFileName(item.FileName);
var filePath = Path.Combine(filesPath, fileName);
using (var stream = new FileStream(filePath, FileMode.CreateNew))
{
await item.CopyToAsync(stream);
}
}
}
It seems you want to get the file name from the file path.
You can get it in two ways:
using System.IO;
Path.GetFileName(filePath);
or with an extension method:
public static string GetFilename(this IFormFile file)
{
return ContentDispositionHeaderValue.Parse(
file.ContentDisposition).FileName.ToString().Trim('"');
}
Please let me know if you need any help
I faced the same issue with different browsers: IE sends FileName with the full path, while Chrome sends only the file name. I used Path.GetFileName() to overcome the issue.
The other fix is on the front-end side.
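Because the path in the submitted name comes from the client, its separator may not match the server's. The sketch below (in Java, for illustration only) strips everything up to the last `/` or `\`, whichever comes later, so it does not depend on the server's operating system. The class name `UploadNames` and the sample inputs are hypothetical.

```java
final class UploadNames {
    // Returns the part of the client-supplied name after the last
    // forward slash or backslash. A name without separators is
    // returned unchanged.
    static String baseName(String clientFileName) {
        int cut = Math.max(clientFileName.lastIndexOf('/'),
                           clientFileName.lastIndexOf('\\'));
        return clientFileName.substring(cut + 1);
    }

    public static void main(String[] args) {
        // Full Windows path, as some browsers send it
        System.out.println(baseName("C:\\Users\\me\\Documents\\report.pdf"));
        // Bare file name, as other browsers send it
        System.out.println(baseName("report.pdf"));
    }
}
```

Both calls in `main` yield `report.pdf`. Any further validation of the name (length, allowed characters) is left out of this sketch.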
I am using MVC4. My requirement is:
I have to convert the file into byte array and save to database varbinary column.
For this I have written the code below:
public byte[] Doc { get; set; }
Document.Doc = GetFilesBytes(PostedFile);
public static byte[] GetFilesBytes(HttpPostedFileBase file)
{
MemoryStream target = new MemoryStream();
file.InputStream.CopyTo(target);
return target.ToArray();
}
I am downloading the file by using the following code:
public ActionResult Download(int id)
{
List<Document> Documents = new List<Document>();
using (SchedulingServiceInstanceManager facade = new SchedulingServiceInstanceManager("SchedulingServiceWsHttpEndPoint"))
{
Document Document = new Document();
Document.DMLType = Constant.DMLTYPE_SELECT;
Documents = facade.GetDocuments(Document);
}
var file = Documents.FirstOrDefault(f => f.DocumentID == id);
return File(file.Doc.ToArray(), "application/octet-stream", file.Name);
}
When I download a PDF file, the viewer shows the message "There was an error opening this document. The file is damaged and could not be repaired."
Is there anything else I need to do?
I also tried the following code, but with no luck:
return File(file.Doc.ToArray(), "application/pdf", file.Name);
Please help me to solve the issue.
Thanks in advance.
Please try the code below in your controller:
FileStream stream = System.IO.File.OpenRead(@"c:\path\to\your\file\here.txt");
byte[] fileBytes= new byte[stream.Length];
stream.Read(fileBytes, 0, fileBytes.Length);
stream.Close();
//Begins the process of writing the byte array back to a file
using (Stream file = System.IO.File.OpenWrite(@"c:\path\to\your\file\here.txt"))
{
file.Write(fileBytes, 0, fileBytes.Length);
}
This may help you.
I'm using PDFBox (1.8.8) to add attachments to a PDF. My problem is that when one of the attachments is a .pdf file and I save the PDDocument to an OutputStream, the final PDF document does not include the attachments. If I save the PDDocument to a file instead of an OutputStream, everything works fine, and if the attachments do not include any PDF, saving to either a file or an OutputStream works fine.
I would like to know if there is any way to add embedded PDF files and save the PDDocument to an OutputStream while keeping the attached files in the generated PDF.
The code I'm using is:
private void insertAttachments(OutputStream out, ArrayList<Attachment> attachmentsResources) {
final PDDocument doc;
Boolean hasPdfAttach = false;
try {
doc = PDDocument.load(new ByteArrayInputStream(((ByteArrayOutputStream) out).toByteArray()));
// final PDFTextStripper pdfStripper = new PDFTextStripper();
// final String text = pdfStripper.getText(doc);
final PDEmbeddedFilesNameTreeNode efTree = new PDEmbeddedFilesNameTreeNode();
final Map embeddedFileMap = new HashMap();
PDEmbeddedFile embeddedFile;
File file = null;
for (Attachment attach : attachmentsResources) {
// first create the file specification, which holds the embedded file
final PDComplexFileSpecification fileSpecification = new PDComplexFileSpecification();
fileSpecification.setFile(attach.getFilename());
file = AttachmentUtils.getAttachmentFile(attach);
final InputStream is = new FileInputStream(file.getAbsolutePath());
embeddedFile = new PDEmbeddedFile(doc, is);
// set some of the attributes of the embedded file
if ("application/pdf".equals(attach.getMimetype())) {
hasPdfAttach = true;
}
embeddedFile.setSubtype(attach.getMimetype());
embeddedFile.setSize((int) (long) attach.getFilesize());
fileSpecification.setEmbeddedFile(embeddedFile);
// now add the entry to the embedded file tree and set in the document.
embeddedFileMap.put(attach.getFilename(), fileSpecification);
// final String text2 = pdfStripper.getText(doc);
}
// final String text3 = pdfStripper.getText(doc);
efTree.setNames(embeddedFileMap);
// ((COSDictionary) efTree.getCOSObject()).removeItem(COSName.LIMITS); (this not work for me)
// attachments are stored as part of the "names" dictionary in the document catalog
final PDDocumentNameDictionary names = new PDDocumentNameDictionary(doc.getDocumentCatalog());
names.setEmbeddedFiles(efTree);
doc.getDocumentCatalog().setNames(names);
// final ByteArrayOutputStream pdfboxToDocumentStream = new ByteArrayOutputStream();
final String tmpfile = "temporary.pdf";
if (hasPdfAttach) {
final File f = new File(tmpfile);
doc.save(f);
doc.close();
//i have try with parser but without success too
// PDFParser parser = new PDFParser(new FileInputStream(tmpfile));
// parser.parse();
// PDDocument doc2 = parser.getPDDocument();
final PDDocument doc2 = PDDocument.loadNonSeq(f, new RandomAccessFile(new File(getHomeTMP()
+ "tempppp.pdf"), "r"));
doc2.save(out);
doc2.close();
} else {
doc.save(out);
doc.close();
}
//that does not work too
// final InputStream in = new FileInputStream(tmpfile);
// IOUtils.copy(in, out);
// out = new FileOutputStream(tmpFile);
// doc.save (out);
} catch (IOException e1) {
e1.printStackTrace();
} catch (Exception e2) {
e2.printStackTrace();
}
}
Best regards
Solution:
private void insertAttachments(OutputStream out, ArrayList<Attachment> attachmentsResources) {
final PDDocument doc;
try {
doc = PDDocument.load(new ByteArrayInputStream(((ByteArrayOutputStream) out).toByteArray()));
((ByteArrayOutputStream) out).reset();
final PDEmbeddedFilesNameTreeNode efTree = new PDEmbeddedFilesNameTreeNode();
final Map embeddedFileMap = new HashMap();
PDEmbeddedFile embeddedFile;
File file = null;
for (Attachment attach : attachmentsResources) {
// first create the file specification, which holds the embedded file
final PDComplexFileSpecification fileSpecification = new PDComplexFileSpecification();
fileSpecification.setFile(attach.getFilename());
file = AttachmentUtils.getAttachmentFile(attach);
final InputStream is = new FileInputStream(file.getAbsolutePath());
embeddedFile = new PDEmbeddedFile(doc, is);
// set some of the attributes of the embedded file
embeddedFile.setSubtype(attach.getMimetype());
embeddedFile.setSize((int) (long) attach.getFilesize());
fileSpecification.setEmbeddedFile(embeddedFile);
// now add the entry to the embedded file tree and set in the document.
embeddedFileMap.put(attach.getFilename(), fileSpecification);
}
efTree.setNames(embeddedFileMap);
((COSDictionary) efTree.getCOSObject()).removeItem(COSName.LIMITS);
// attachments are stored as part of the "names" dictionary in the document catalog
final PDDocumentNameDictionary names = new PDDocumentNameDictionary(doc.getDocumentCatalog());
names.setEmbeddedFiles(efTree);
doc.getDocumentCatalog().setNames(names);
((COSDictionary) efTree.getCOSObject()).removeItem(COSName.LIMITS);
doc.save(out);
doc.close();
} catch (IOException e1) {
e1.printStackTrace();
} catch (Exception e2) {
e2.printStackTrace();
}
}
You store the new PDF after the original PDF in out:
Look at all the uses of out in your method:
private void insertAttachments(OutputStream out, ArrayList<Attachment> attachmentsResources) {
...
doc = PDDocument.load(new ByteArrayInputStream(((ByteArrayOutputStream) out).toByteArray()));
...
doc2.save(out);
...
doc.save(out);
So you get as input a ByteArrayOutputStream and take its current content as input (i.e. the ByteArrayOutputStream is not empty but already contains a PDF) and after some processing you append the modified PDF to the ByteArrayOutputStream. Depending on the PDF viewer you present this to, you will be shown either the original or the manipulated PDF or a (very correct) error message that the file is garbage.
If you want the ByteArrayOutputStream to contain only the manipulated PDF, simply add
((ByteArrayOutputStream) out).reset();
or (if you are not sure about the state of the stream)
out = new ByteArrayOutputStream();
right after
doc = PDDocument.load(new ByteArrayInputStream(((ByteArrayOutputStream) out).toByteArray()));
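The append-versus-replace behavior can be reproduced without PDFBox. The following is a minimal sketch in which plain ASCII strings stand in for PDF content; `ResetDemo`, `rewriteInPlace`, and the sample strings are hypothetical names chosen for illustration, and the "processing" step is just a string concatenation.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class ResetDemo {
    // Stand-in for "load the document from out, modify it, save it back to out".
    static void rewriteInPlace(ByteArrayOutputStream out, boolean resetFirst) {
        byte[] original = out.toByteArray();   // snapshot of the current content
        if (resetFirst) {
            out.reset();                        // discard what the stream already holds
        }
        String modified = new String(original, StandardCharsets.US_ASCII) + "+ATTACHMENTS";
        byte[] bytes = modified.getBytes(StandardCharsets.US_ASCII);
        out.write(bytes, 0, bytes.length);      // appended after whatever is still in out
    }

    // Fills a stream with "ORIGINAL", rewrites it, and returns the final content.
    static String demo(boolean resetFirst) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] original = "ORIGINAL".getBytes(StandardCharsets.US_ASCII);
        out.write(original, 0, original.length);
        rewriteInPlace(out, resetFirst);
        return new String(out.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(demo(false)); // ORIGINALORIGINAL+ATTACHMENTS
        System.out.println(demo(true));  // ORIGINAL+ATTACHMENTS
    }
}
```

One caveat on the second variant: assigning a new ByteArrayOutputStream to the parameter `out` inside the method only changes the local variable. The caller still holds the original stream, so `reset()` is the option that affects what the caller sees.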
PS: According to the comments the OP tried the above proposed changes to his code without success.
I cannot run the code as presented in the question because it is not self-contained. Thus, I reduced it to the essentials to get a self-contained test:
@Test
public void test() throws IOException, COSVisitorException
{
ByteArrayOutputStream baos = new ByteArrayOutputStream();
try (
InputStream sourceStream = getClass().getResourceAsStream("test.pdf");
InputStream attachStream = getClass().getResourceAsStream("artificial text.pdf"))
{
final PDDocument document = PDDocument.load(sourceStream);
final PDEmbeddedFile embeddedFile = new PDEmbeddedFile(document, attachStream);
embeddedFile.setSubtype("application/pdf");
embeddedFile.setSize(10993);
final PDComplexFileSpecification fileSpecification = new PDComplexFileSpecification();
fileSpecification.setFile("artificial text.pdf");
fileSpecification.setEmbeddedFile(embeddedFile);
final Map<String, PDComplexFileSpecification> embeddedFileMap = new HashMap<String, PDComplexFileSpecification>();
embeddedFileMap.put("artificial text.pdf", fileSpecification);
final PDEmbeddedFilesNameTreeNode efTree = new PDEmbeddedFilesNameTreeNode();
efTree.setNames(embeddedFileMap);
final PDDocumentNameDictionary names = new PDDocumentNameDictionary(document.getDocumentCatalog());
names.setEmbeddedFiles(efTree);
document.getDocumentCatalog().setNames(names);
document.save(baos);
document.close();
}
Files.write(Paths.get("attachment.pdf"), baos.toByteArray());
}
As you can see, PDFBox uses only streams here.
The resulting PDF contains the attachment; thus, PDFBox has no problem storing a PDF into which it has embedded a PDF file attachment.
The problem, therefore, most likely has nothing to do with this workflow as such.
Is there any way to read a file a chunk at a time (instead of reading the entire file) using the Tika API?
The following is my code. As you can see, I am reading the entire file at once. I would like to read a chunk at a time and create a text file from the content.
InputStream stream = new FileInputStream(file);
Parser p = new AutoDetectParser();
Metadata meta =new Metadata();
WriteOutContentHandler handler = new WriteOutContentHandler(-1);
ParseContext context = new ParseContext();
....
p.parse(stream,handler,meta, context);
...
String content = handler.toString();
There is (now) an Apache Tika example which shows how you can capture the plain text output and return it in chunks based on the maximum allowed size of a chunk. You can find it in ContentHandlerExample; the method is parseToPlainTextChunks.
Based on that, if you wanted to output to a file instead, on a per-chunk basis, you'd tweak it to be something like:
final int MAXIMUM_TEXT_CHUNK_SIZE = 100 * 1024 * 1024;
final File outputDir = new File("/tmp/");
private class ChunkHandler extends ContentHandlerDecorator {
    private int size = 0;          // characters written to the current file
    private int fileNumber = -1;
    private OutputStreamWriter out = null;
    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        try {
            // Start a new file when none is open or this write would exceed the limit
            if (out == null || size + length > MAXIMUM_TEXT_CHUNK_SIZE) {
                if (out != null) out.close();
                fileNumber++;
                size = 0;
                File f = new File(outputDir, "output-" + fileNumber + ".txt");
                out = new OutputStreamWriter(new FileOutputStream(f), "UTF-8");
            }
            out.write(ch, start, length);
            size += length;
        } catch (IOException e) {
            // ContentHandler methods may only throw SAXException
            throw new SAXException(e);
        }
    }
    public void close() throws IOException {
        if (out != null) out.close();
    }
}
public void parse(File file) throws IOException, SAXException, TikaException {
    try (InputStream stream = new FileInputStream(file)) {
        Parser p = new AutoDetectParser();
        Metadata meta = new Metadata();
        ChunkHandler handler = new ChunkHandler();
        ParseContext context = new ParseContext();
        try {
            p.parse(stream, handler, meta, context);
        } finally {
            // Close the last chunk file even if parsing fails
            handler.close();
        }
    }
}
That will give you plain text files in the given directory, each no larger than the maximum size (counted in characters). All HTML tags will be ignored; you'll only get the plain textual content.
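The rollover rule (start a new file whenever the next write would push the current one past the limit) can be tried out without Tika. Below is a minimal standalone sketch using only the JDK; `ChunkedTextWriter`, its `demo` helper, and the tiny limit of 4 characters are hypothetical and chosen only to make the behavior easy to see.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class ChunkedTextWriter implements Closeable {
    private final Path outputDir;
    private final int maxCharsPerFile;
    private int size = 0;         // characters written to the current file
    private int fileNumber = -1;  // index of the current file, -1 before the first write
    private Writer out = null;

    ChunkedTextWriter(Path outputDir, int maxCharsPerFile) {
        this.outputDir = outputDir;
        this.maxCharsPerFile = maxCharsPerFile;
    }

    // Same signature shape as ContentHandler.characters(char[], int, int)
    void write(char[] ch, int start, int length) throws IOException {
        if (out == null || size + length > maxCharsPerFile) {
            if (out != null) out.close();
            fileNumber++;
            size = 0;
            out = Files.newBufferedWriter(
                    outputDir.resolve("output-" + fileNumber + ".txt"),
                    StandardCharsets.UTF_8);
        }
        out.write(ch, start, length);
        size += length;
    }

    int filesWritten() {
        return fileNumber + 1;
    }

    @Override
    public void close() throws IOException {
        if (out != null) out.close();
    }

    // Writes the pieces into a fresh temp directory and returns each file's content.
    static List<String> demo(int maxCharsPerFile, String... pieces) {
        try {
            Path dir = Files.createTempDirectory("chunks");
            int count;
            try (ChunkedTextWriter w = new ChunkedTextWriter(dir, maxCharsPerFile)) {
                for (String piece : pieces) {
                    w.write(piece.toCharArray(), 0, piece.length());
                }
                count = w.filesWritten();
            }
            List<String> contents = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                byte[] bytes = Files.readAllBytes(dir.resolve("output-" + i + ".txt"));
                contents.add(new String(bytes, StandardCharsets.UTF_8));
            }
            return contents;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(4, "ab", "cd", "ef")); // [abcd, ef]
    }
}
```

Note that a single write longer than the limit still lands in one file; splitting inside a write would need extra logic that this sketch leaves out.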
How can I append pages from one PDF file to another without creating a new PDF, using iTextSharp? I have metadata attached to one PDF, so I just want to add the pages of the other PDF so that the first PDF's metadata remains as it is.
Regards
Himvj
Assuming you have 2 pdf files: file1.pdf and file2.pdf that you want to concatenate and save the resulting pdf to file1.pdf (by replacing its contents) you could try the following:
using (var output = new MemoryStream())
{
var document = new Document();
var writer = new PdfCopy(document, output);
document.Open();
foreach (var file in new[] { "file1.pdf", "file2.pdf" })
{
var reader = new PdfReader(file);
int n = reader.NumberOfPages;
PdfImportedPage page;
for (int p = 1; p <= n; p++)
{
page = writer.GetImportedPage(reader, p);
writer.AddPage(page);
}
}
document.Close();
File.WriteAllBytes("file1.pdf", output.ToArray());
}
You can try this; it adds the whole document, including metadata:
public static void MergeFiles(string destinationFile, string[] sourceFiles)
{
try
{
//1: Create the MemoryStream for the destination document.
using (MemoryStream ms = new MemoryStream())
{
//2: Create the PdfCopyFields object.
PdfCopyFields copy = new PdfCopyFields(ms);
// - Set the security and other settings for the destination file.
//copy.Writer.SetEncryption(PdfWriter.STRENGTH128BITS, null, "1234", PdfWriter.AllowPrinting | PdfWriter.AllowCopy | PdfWriter.AllowFillIn);
copy.Writer.ViewerPreferences = PdfWriter.PageModeUseOutlines;
// - Create an arraylist to hold bookmarks for later use.
ArrayList outlines = new ArrayList();
int pageOffset = 0;
int f = 0;
//3: Import the documents specified in args[1], args[2], etc...
while (f < sourceFiles.Length)
{
// Grab the file from args[] and open it with PdfReader.
string file = sourceFiles[f];
PdfReader reader = new PdfReader(file);
// Import the pages from the current file.
copy.AddDocument(reader);
// Create an ArrayList of bookmarks in the file being imported.
// ArrayList bookmarkLst = SimpleBookmark.GetBookmark(reader);
// Shift the pages to accommodate any pages that were imported before the current document.
// SimpleBookmark.ShiftPageNumbers(bookmarkLst, pageOffset, null);
// Fill the outlines ArrayList with each bookmark as a HashTable.
// foreach (Hashtable ht in bookmarkLst)
// {
// outlines.Add(ht);
// }
// Set the page offset to the last page imported.
//copy.Writer.SetPageSize(rec);
pageOffset += reader.NumberOfPages;
f++;
}
//4: Put the outlines from all documents under a new "Root" outline and
// set them for destination document
// copy.Writer.Outlines = GetBookmarks("Root", ((Hashtable)outlines[0])["Page"], outlines);
//5: Close the PdfCopyFields object.
copy.Close();
//6: Save the MemoryStream to a file.
MemoryStreamToFile(ms, destinationFile);
}
}
catch (System.Exception e)
{
System.Console.Error.WriteLine(e.Message);
System.Console.Error.WriteLine(e.StackTrace);
System.Console.ReadLine();
}
}
public static void MemoryStreamToFile(MemoryStream MS, string FileName)
{
using (FileStream fs = new FileStream(@FileName, FileMode.Create))
{
byte[] data = MS.ToArray();
fs.Write(data, 0, data.Length);
fs.Close();
}
}