I need to create a PDF and write into it from a list using C#, without any third-party component or iTextSharp. Just plain and simple C# code [closed] - pdf

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 10 years ago.
I need to create an Excel file, write a list of objects into it, and convert the Excel file into a PDF using C#, without any third-party component or iTextSharp. Just plain and simple C# code. No other components.

You don't want to do this. Here's why: PDF is a non-trivial file format. Yes, you can sit down with the spec and come up with an understanding and let's assume that you are as efficient and clever an engineer as me (I work on PDF tools for a living and worked on Acrobat). It will take you at least two to three months working full-time to have the infrastructure to create decently rich PDFs in a way that is reliable and maintainable. Let's say that you're not as good an engineer as me. It will likely take you the same three months to create tools to generate PDF and the tools will be hard to maintain, hard to extend for future needs, and likely will generate incorrect PDF that Acrobat and my tools are expected to accept and interpret in a meaningful way.
Maybe your project has dictated "NO THIRD PARTY ASSEMBLIES." If so, you might want to explain the cost of your time (and future time) and why it's not so bad to have a third party library. I can tell you right now that the tools I build aren't cheap, until you consider that the time it will take you to go from download to a generated PDF document will be somewhere between a couple hours and a couple days, depending on how complex your needs are and you will not need to know the PDF spec.
But if you choose to go this route, be sure to download the PDF spec from Adobe and read section 14.3.3 about the Document Information Dictionary. Make sure that you fill in the Producer entry in every document you generate, because whenever I find PDF files that are wildly out of spec, that's the first place I look to find out who was responsible, and I make a note and promise to curse loudly and question your parentage. (Fair's fair: if you use our tools and find that my code has generated a bad PDF, you're free to do the same, and I invite you to contact our support line so we can make it right.)
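To give a sense of what "by hand" involves, here is a minimal sketch of a one-page PDF with the Producer entry filled in. It is written in Python rather than C# purely for brevity; the file structure is the same in any language. The function name build_minimal_pdf and the producer string are made up for illustration, and this only handles one line of Latin-1 text in a built-in font, which is nowhere near the "decently rich" PDFs discussed above.

```python
def build_minimal_pdf(text: str, producer: str) -> bytes:
    """Hand-assemble a one-page PDF holding a single line of text."""
    def escape(s: str) -> str:
        # Backslashes and parentheses must be escaped in PDF literal strings.
        return s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")

    # Page content stream: select font F1 at 12pt, move to (72, 720), show text.
    content = f"BT /F1 12 Tf 72 720 Td ({escape(text)}) Tj ET".encode("latin-1")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        # Document Information Dictionary with the Producer entry.
        f"<< /Producer ({escape(producer)}) >>".encode("latin-1"),
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj" for the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    # Cross-reference table: one 20-byte entry per object, plus the free entry 0.
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R /Info 6 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)
```

Writing the result to disk is just open("out.pdf", "wb").write(build_minimal_pdf("Hello", "Example Producer 1.0")). Everything beyond this (text wrapping, embedded fonts, Unicode, images, compression) is where the months go.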

There is an open source library called PDFsharp.
As for the Excel file, just use an open source file format for it.

Related

Difference between VB6.0, VB2010, VB.NET [closed]

Closed 10 years ago.
I have studied VB 6.0 but have hardly any knowledge of .NET. Can someone please tell me the difference between the three versions namely VB6.0, VB 2010 express and VB.NET?
Now this is a somewhat wide question, but in short, VB.NET is the language and VB 2010 is a VB.NET version released with Visual Studio 2010 and .NET 4.
So the main comparison should really be between VB6 and VB.NET because that's where you'll find the big differences.
VB.NET includes a lot of functionality that has been around in other languages like C++ for ages, and is considered by some to be way too different from VB6 to even be called VB anymore. But let's set aside the arguing for a moment: what are those new shiny thingies? Well, among others you have this:
True object oriented inheritance
Overloading
Free Threading
Strict type checking
and a lot more. Then there are some changes that might be a bit harder to adjust to since they're too close to the old behavior, like zero-based arrays, returning values from functions using a return statement instead of using the function name, passing of parameters by value instead of by reference, new error handling (using try, catch, finally etc), usage of namespaces etc. The list goes on and on.
The sheer breadth of the .NET Framework which VB.NET makes use of makes it a more versatile platform (IMO). It also runs in the CLR (Common Language Runtime) which is more or less a virtual machine with a just-in-time compilation engine.
When it comes to compiling, VB6 compiled to native code while VB.NET compiles to CIL (Common Intermediate Language) which makes it a lot easier to reverse engineer, however you can obfuscate the code in order to make it less readable.
As you can see from what I just wrote it's quite a wide subject, but if you have a more precise question, feel free to ask, otherwise I hope you have a bit clearer image of the differences now. :)

Why do IDEs have Projects? [closed]

Closed 11 years ago.
Why do IDEs structure code as "Projects" or "Solutions"? And no, I'm not trying to troll, I really want to know what people use them for.
It always seems to me like "Project" is just a redundant alias for "executable", and I find the "Project" structure tends to get in my way when I want to share code across several executable processes. This is especially true in languages like Java, where there's already a rigorous packaging system for organizing code with, but it applies to pretty much every IDE I've seen. So why do they always adopt this structure? Is there some trick to using it?
It always seems to me like "Project" is just a redundant alias for "executable",
I actually tend to think of "project" more as a "compilation unit" or a "deployment item" - at least for most compiled languages. Projects typically map to a single executable or library (or other compilation unit in languages where that's supported).
As such, a "project" is a very valuable method of organization.
Not all IDEs use these names, but in general, they are a way to organize code.
This is needed in any code base of a certain size: some sort of hierarchy that helps logically separate code components from each other.
A solution can contain multiple projects. This concept is very useful when you have a tiered software architecture.
For example, a solution can have the following projects:
Data Access Layer Project
Business Layer Project
Presentation Layer Project
Every Project (tier) has a specific purpose in the same website.

When do you need a new class in Xcode [closed]

Closed 10 years ago.
I am just learning Objective-C and Xcode. I love the interface building, but as far as the coding goes, I'm a little lost.
Basically, I am trying to understand when I need a new class. I am reading a book that covers the "how-to" really well, but not the "why."
I am building an application and I have the interface pretty much complete.
In other words, I have a lot of NSObjects floating around but, unfortunately, these objects don't know how to communicate with each other or with the underlying program. Here's the hypothetical.
I have several text fields that will eventually communicate their input to tables within my interface and also external PDF templates.
My basic understanding of Obj-C and Xcode is that in order for the text field to communicate its contents to either the PDF file or the table, I will need to create a new class to specify the text field's contents as a variable, and to send that variable somewhere (PDF or table).
However, if I have a button that will ultimately be responsible for sending the textfield's data somewhere, I will also need to make a connection between the button and the text field, like this.
(button) --- fetches ---> (text field contents) ---sends to ---> (table)
So, up to this point, I would be including all of this in one class, right? With the text field input as a variable that I declare in my header file, the button's method/action which I include in the header file and implement in the .m file, and the table which would also be declared in both .h and .m files?
Am I on the right track? Also, this is just one connection from one text field. If I decided to do this with more text fields in the application, would I have to create a separate class for each? Or could I use the same class and distinguish them by IDs?
I am clearly a noob.
I think the piece of the puzzle that you're missing is design patterns. The docs that you've read provide the how - how to create classes, add methods, etc. That's like learning preparation techniques in a cooking class. How to chop, blend, dice, marinate, etc. Design patterns are the higher-level recipes that show you how to put it all together, using the techniques you've learned to assemble the ingredients into a finished meal.
To get started, have a look at the Design Patterns section in Apple's own Cocoa Fundamentals Guide.

How to easily link a document with others [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 4 years ago.
Let's say that, to make clear how my program works or just for the sake of documentation, I am writing a text document in Word, some other text editor, or online with Google Docs. Let's say that at some point in the document I open a new one and extend the idea there. Then later, I go back to the "master" document and add a hyperlink or just put in a reference ("continues in doc XXXX, page YYYY"). I wonder what the fastest approach to do this would be:
* Google Docs, adding hyperlinks easily? But how can I organize the docs effectively?
* some text editor with this functionality?
Thanks for your responses
P.S. Are there easier solutions than wikis, in terms of ease of installation and setup?
Wikis are definitely great for that. I have used two notetaking solutions that have what you want (I've used more, but these were the best for me).
TiddlyWiki
Great feature set since it's javascript/html with wiki functionality: intralinking to pages you create, embedding of images from any directory, hyperlinks, and printability
Also has an amazing amount of plugins at places like TiddlyTools and other repos like it
I combined it with TeamTasks which not only made it look better, but brought in some todo functionality; with TiddlyWiki Address Book (twab) for contact management, and several other plugins that worked great
emacs + orgmode
This is my top choice today for notetaking and all kinds of stuff
It is literally unbelievable. Todos, contacts, scheduling, linking, export to html, export to PDF via LaTeX, interfacing with gnuplot, TikZ, ditaa, R (like Matlab), etc.
Inter and intralinking are very easy. See THIS for specifically how orgmode deals with various linking.
Lastly, if orgmode sounds interesting, I would strongly recommend checking out the orgmode wiki, Worg, where tutorials, videos, and screencasts exist.
Your initial inquiry mentions simplicity, so perhaps one of the wikis is best. If you're unfamiliar with them, it's quite easy to just download a copy (just an html file), open it in a browser and play around. There's a lot of documentation out there to help you, and linking is often as easy as writing something like [[page::section]] or similar in your document.
On the other hand, if you're looking for advanced usage and especially exporting of documents, I have yet to find anything to beat orgmode for me! I highly recommend it and the mailing list is very, very active and supportive.
Every document type or system has its own standard for interlinking documents. HTML has hyperlinks, MS Word has document references, etc. I personally find wikis to be the easiest way to grow and expand interlinked documents. HTML hyperlinks are probably the most common standard way of linking documents, and wiki syntax makes it especially easy.

Extracting information from PDFs of research papers [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
I need a mechanism for extracting bibliographic metadata from PDF documents, to save people entering it by hand or cut-and-pasting it.
At the very least, the title and abstract. The list of authors and their affiliations would be good. Extracting out the references would be amazing.
Ideally this would be an open source solution.
The problem is that not all PDFs encode the text, and many which do fail to preserve the logical order of the text, so just doing pdf2text gives you line 1 of column 1, line 1 of column 2, line 2 of column 1 etc.
I know there are a lot of libraries. It's identifying the abstract, title, authors etc. in the document that I need to solve. This is never going to be possible every time, but 80% would save a lot of human effort.
I'm only allowed one link per posting so this is it:
pdfinfo Linux manual page
This might get the title and authors. Look at the bottom of the manual page, and there's a link to www.foolabs.com/xpdf where the open source for the program can be found, as well as binaries for various platforms.
To pull out bibliographic references, look at cb2bib:
cb2Bib is a free, open source, and multiplatform application for rapidly extracting unformatted, or unstandardized bibliographic references from email alerts, journal Web pages, and PDF files.
You might also want to check the discussion forums at www.zotero.org where this topic has been discussed.
We ran a contest to solve this problem at Dev8D in London, Feb 2010 and we got a nice little GPL tool created as a result. We've not yet integrated it into our systems but it's there in the world.
https://code.google.com/p/pdfssa4met/
Might be a tad simplistic, but Googling "bibtex + paper title" usually gets you a formatted BibTeX entry from the ACM, CiteSeer, or other such reference tracking sites. Of course, this is assuming the paper isn't from a non-computing journal :D
-- EDIT --
I have a feeling you won't find a custom solution for this, you might want to write to citation trackers such as citeseer, ACM and google scholar to get ideas for what they have done. There are tons of others and you might find their implementations are not closed source but not in a published form. There is tons of research material on the subject.
The research team I am part of has looked at such problems and we have come to the conclusion that hand written extraction algorithms or machine learning are the way to do it. Hand written algorithms are probably your best bet.
This is quite a hard problem due to the amount of variation possible. I suggest normalizing the PDFs to text (which you get from any of the dozens of programmatic PDF libraries). You then need to implement custom text scraping algorithms.
I would start backward from the end of the PDF and look what sort of citation keys exist -- e.g., [1], [author-year], (author-year) and then try to parse the sentence following. You will probably have to write code to normalize the text you get from a library (removing extra whitespace and such). I would only look for citation keys as the first word of a line, and only for 10 pages per document -- the first word must have key delimiters -- e.g., '[' or '('. If no keys can be found in 10 pages then ignore the PDF and flag it for human intervention.
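The scan described above could be sketched roughly as follows in Python. This assumes the PDF has already been converted to one text string per page by whatever library you use; the function name find_citations, the regular expression, and the 40-character cap on key length are illustrative choices of mine, not part of any library.

```python
import re

# A citation key must open the line: "[1]", "[Smith 2004]", "(Smith, 2004)".
CITATION_KEY = re.compile(r"^(\[[^\]]{1,40}\]|\([^)]{1,40}\))\s+(\S.*)$")

def find_citations(pages, max_pages=10):
    """Scan at most max_pages pages, starting from the last page,
    and return (key, following text) pairs in document order."""
    found = []
    for page in list(reversed(pages))[:max_pages]:
        page_hits = []
        for line in page.splitlines():
            line = " ".join(line.split())  # collapse runs of whitespace
            match = CITATION_KEY.match(line)
            if match:
                page_hits.append((match.group(1), match.group(2)))
        found = page_hits + found  # earlier pages go in front
    return found
```

An empty result means no keys were found in the last max_pages pages, which is the case to flag for human intervention. Parsing the text that follows each key into authors, title, and year is the hard part and is not attempted here.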
You might want a library that you can further programmatically consult for formatting meta-data within citations -- e.g., italics have a special meaning.
I think you might end up spending quite some time to get a working solution, and then a continual process of tuning and adding to the scraping algorithms/engine.
In this case I would recommend TET from PDFlib.
If you need to get a quick feel for what it can do, take a look at the TET Cookbook
This is not an open source solution, but it's currently the best option in my opinion. It's not platform-dependent and has a rich set of language bindings and commercial backing.
I would be happy if someone pointed me to an equivalent or better open source alternative.
To extract text you would use the TET_xxx() functions and to query metadata you can use the pcos_xxx() functions.
You can also use the command-line tool to generate an XML file containing all the information you need.
tet --tetml word file.pdf
There are examples on how to process TETML with XSLT in the TET Cookbook
What’s included in TETML?
TETML output is encoded in UTF-8 (on zSeries with USS or MVS: EBCDIC-UTF-8, see www.unicode.org/reports/tr16), and includes the following information:
general document information and metadata
text contents of each page (words or paragraph)
glyph information (font name, size, coordinates)
structure information, e.g. tables
information about placed images on the page
resource information, i.e. fonts, colorspaces, and images
error messages if an exception occurred during PDF processing
CERMINE - Content ExtRactor and MINEr
Described in the paper: TKACZYK, Dominika, et al. CERMINE: automatic extraction of structured metadata from scientific literature. International Journal on Document Analysis and Recognition (IJDAR), 2015, 18.4: 317-335.
Mainly written in Java and available as open source on GitHub.
Another Java library to try would be PDFBox. PDFs are really designed to be viewed and printed, so you definitely want a library to do some of the heavy lifting for you. Even so, you might have to do a little gluing of text pieces back together to get the data you want extracted. Good luck!
Just found pdftk... it's amazing, comes in a binary distribution for Win/Lin/Mac as well as source.
In fact, I solved my other problem (look at my profile, I asked then answered another pdf question .. can't link due to 1 link limitation).
It can do pdf metadata extraction, for example, this will return the line containing the title:
pdftk test.pdf dump_data | grep -A 1 "InfoKey: Title" | grep "InfoValue"
It can dump title, author, mod-date, and even bookmarks and page numbers (test pdf had bookmarks)... obviously a bit of work will be needed to properly grep the output, but I think this should fit your needs.
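If grep gets unwieldy, the same pairing can be done in a few lines of Python. This is only a sketch: it assumes each InfoKey line is followed by its InfoValue line, as the grep command above implies, and the SAMPLE text is a made-up stand-in for real dump_data output. The name parse_info is hypothetical.

```python
def parse_info(dump: str) -> dict:
    """Collect InfoKey/InfoValue pairs from pdftk dump_data text."""
    info = {}
    key = None
    for line in dump.splitlines():
        if line.startswith("InfoKey:"):
            key = line[len("InfoKey:"):].strip()
        elif line.startswith("InfoValue:") and key is not None:
            info[key] = line[len("InfoValue:"):].strip()
            key = None  # each key is paired with at most one value
    return info

# Made-up sample in the InfoKey/InfoValue layout; not output from a real file.
SAMPLE = """InfoKey: Title
InfoValue: Example Title
InfoKey: Author
InfoValue: A. Author
NumberOfPages: 12
"""

print(parse_info(SAMPLE).get("Title"))
```

Lines that are not part of a key/value pair (page counts, bookmarks and so on) are simply skipped.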
If your pdfs don't have metadata (ie, no "Abstract" metadata), you can cat the text using a different tool like pdf2text, and use some grep tricks like above. If your pdfs are not OCR'd, you have a much bigger problem, and ad-hoc querying of the pdf(s) will be painfully slow (best to OCR).
Regardless, I would recommend you build an index of your documents instead of having each query scan the file metadata/text.
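As a rough idea of what such an index could look like, here is a minimal in-memory inverted index in Python. It assumes you have already extracted the text (or metadata) of each document into a dict keyed by file name; build_index and search are hypothetical names, and a real setup would more likely use a proper search engine or database.

```python
import re
from collections import defaultdict

def build_index(documents):
    """Map each lowercase word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index

def search(index, query):
    """Return the documents that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

The index is built once, so each query becomes a few set lookups instead of a scan over every file.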
Take a look at iText. It is a Java library that will let you read PDFs. You will still face the problem of finding the right data, but the library will provide formatting and layout information that might be usable to infer purpose.
PyPDF might be of help. It provides an extensive API for reading and writing the content of a PDF file (un-encrypted), and it's written in Python, an easy language.
Have a look at this research paper - Accurate Information Extraction from Research Papers using Conditional Random Fields
You might want to use an open-source package like Stanford NER to get started on CRFs.
Or perhaps, you could try importing them (the research papers) to Mendeley. Apparently, it should extract the necessary information for you.
Hope this helps.
Here is what I do using linux and cb2bib.
Open up cb2bib and make sure that clipboard connection is ON, and that your reference database is loaded
Find your paper on google scholar
Click 'import to bibtex' underneath the paper
Select (highlight) everything on the next page (ie., the bibtex code)
It should now appear formatted in cb2bib
Optionally now press network search (the globe icon) to add additional info.
Press save in cb2bib to add the paper to your ref database.
Repeat this for all the papers. I think in the absence of a method that reliably extracts metadata from PDFs, this is the easiest solution I found.
I recommend gscholar in combination with pdftotext.
Although PDF provides metadata, it is seldom populated with correct content. Often "None" or "Adobe-Photoshop" or other dumb strings are in place of the title field, for example. That is why none of the above tools might derive correct information from PDFs, as the title might be anywhere in the document. Another example: many papers in conference proceedings also carry the title of the conference or the names of the editors, which confuses automatic extraction tools. The results are then dead wrong when you are interested in the real authors of the paper.
So I suggest a semi-automatic approach involving google scholar.
First, render the PDF to text so you can extract the author and title.
Second, copy and paste some of this info into a Google Scholar query. To automate this, I employ the cool Python script gscholar.py.
So in real life this is what I do:
me@box> pdftotext 10.1.1.90.711.pdf - | head
Computational Geometry 23 (2002) 183–194
www.elsevier.com/locate/comgeo
Voronoi diagrams on the sphere ✩
Hyeon-Suk Na a , Chung-Nim Lee a , Otfried Cheong b,∗
a Department of Mathematics, Pohang University of Science and Technology, South Korea
b Institute of Information and Computing Sciences, Utrecht University, P.O. Box 80.089, 3508 TB Utrecht, The Netherlands
Received 28 June 2001; received in revised form 6 September 2001; accepted 12 February 2002
Communicated by J.-R. Sack
me@box> gscholar.py "Voronoi diagrams on the sphere Hyeon-Suk"
@article{na2002voronoi,
title={Voronoi diagrams on the sphere},
author={Na, Hyeon-Suk and Lee, Chung-Nim and Cheong, Otfried},
journal={Computational Geometry},
volume={23},
number={2},
pages={183--194},
year={2002},
publisher={Elsevier}
}
EDIT: Be careful, you might encounter captchas. Another great script is bibfetch.
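Once you have a BibTeX entry like the one above, getting the fields into your own system is straightforward. A minimal Python sketch, assuming simple entries where each field sits on one line as name={value}; parse_bibtex_entry is a hypothetical helper and not part of gscholar.py or bibfetch. For anything more complex, use a real BibTeX parser.

```python
import re

def parse_bibtex_entry(entry: str) -> dict:
    """Parse one simple BibTeX entry whose fields are single-line name={value}."""
    header = re.search(r"@(\w+)\s*\{\s*([^,\s]+)\s*,", entry)
    if header is None:
        raise ValueError("no BibTeX entry header found")
    fields = {"type": header.group(1).lower(), "key": header.group(2)}
    # One field per line: name={value} with an optional trailing comma.
    pattern = r"(\w+)[ \t]*=[ \t]*\{(.*?)\}[ \t]*,?[ \t]*$"
    for name, value in re.findall(pattern, entry, re.MULTILINE):
        fields[name.lower()] = value
    return fields

ENTRY = """@article{na2002voronoi,
title={Voronoi diagrams on the sphere},
author={Na, Hyeon-Suk and Lee, Chung-Nim and Cheong, Otfried},
journal={Computational Geometry},
volume={23},
number={2},
pages={183--194},
year={2002},
publisher={Elsevier}
}"""

print(parse_bibtex_entry(ENTRY)["title"])
```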