Content related to a topic from a text file - lucene

Do we have any API that can identify content from a text file related to a particular topic?
For example I have a text file having 5000 lines of text in it.
I want to extract the text related to TOPIC ABC. Does Lucene or any other API do that? Any ideas?
I have used Lucene for identifying the documents that contain a particular WORD, but I would like to know if we have any API that extracts the content from a file related to a particular topic.

This is quite a broad question, but from the information you have supplied it is clear you have a couple of options.
Option 1: Use an API
You could use the Thomson Reuters Open Calais platform, which is the best I have ever come across for developers. However, I can imagine it would get expensive over time. They provide a demo on their site which is worth checking out.
Option 2: Extend Lucene's VSM
When I say extend Lucene, I don't mean you need to do it yourself. There are open-source projects readily available to be taken advantage of. For example, Lucene-LDA, which allows queries over Latent Dirichlet allocation (LDA). This particular project hasn't been updated in about 3/4 years, so you may want to fork it or build your own.
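For a rough feel of the LDA route (this is not Lucene-LDA itself, just a minimal sketch in Python with the gensim library; the file name, topic count, and query word are placeholder assumptions), you could treat each line of the file as a document, train a topic model, and keep the lines whose dominant topic contains your keyword:

from gensim import corpora, models
from gensim.utils import simple_preprocess

# Treat every non-empty line of the (hypothetical) 5000-line file as a document.
with open("input.txt", encoding="utf-8") as f:
    lines = [line.strip() for line in f if line.strip()]

texts = [simple_preprocess(line) for line in lines]        # tokenize and lowercase
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]
lda = models.LdaModel(corpus, id2word=dictionary, num_topics=20, passes=5)

query_word = "abc"                                         # stand-in for "TOPIC ABC"
related = []
for line, bow in zip(lines, corpus):
    if not bow:
        continue
    topic_id, _ = max(lda.get_document_topics(bow), key=lambda t: t[1])
    if query_word in [w for w, _ in lda.show_topic(topic_id, topn=15)]:
        related.append(line)

print("\n".join(related))

Whether line-level granularity is right depends on your data; paragraphs or sliding windows of lines usually give the model more to work with.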

Related

Modular MediaWiki

I wonder if it is possible to configure MediaWiki (or other wiki tools) as a modular predefined wiki. For instance, on a regular wiki page one can freely edit sections, text, everything.
I am looking for a solution that predefines a number of sections (or modules) that can be added to each wiki page. Then users are free to edit inside those sections within their predefined formats.
Hope someone can help, thanks.
As for MediaWiki, there is at least one extension that can work that way: Semantic Forms, usually used together with Semantic MediaWiki (though that is not necessary). With SF, you define one or more templates that receive the data entered in the form, and the form can be divided into sections.
A more lightweight solution might be using one of the many boilerplate extensions available.
Either way, with a wiki you can never force your users to follow a certain scheme. The whole philosophy, which makes wikis unique among collaborative tools, is that the users, not you, create not only the content but also the structure for the content!
The former Semantic Forms is now called Page Forms, and it is not dependent on SMW: https://www.mediawiki.org/wiki/Extension:Page_Forms. It can also make use of the Cargo extension: https://www.mediawiki.org/wiki/Extension:Cargo
I would disagree that wiki users cannot or should not be forced to follow a scheme for some types of information, though the default is that they do control the categories and namespaces and can create those at will as the data evolves. All this means though is that you manage such issues socially rather than with complex permissions structures, i.e. someone undoes your change and says "do it this way instead". So it's a different kind of forcing, but, still, someone has to make sure categories don't proliferate with bad names, capitalization, etc.
The typical use of forms data is when it must be used to satisfy some legal or professional requirement (say logging for what reason a change was made for Sarbanes-Oxley, or logging what precedents were consulted for logging legal time), or will be providing input strictly to some application (like maps). It would not be a good idea to rigorize literally every page of a wiki this way.

folder structure for project documentation

I have seen some questions raised about the folder structure of source code, but I have never seen a question about the folder structure of project documentation. I googled it and still do not see many articles talking about it.
Here is one http://www.projectperfect.com.au/downloads/Info/info_project_folder_structure.pdf
To quote some of its words:
"There are two broad approaches:
Organize by phase so that each top
directory is a phase. For example,
you might have directories for
Feasibility, Business Analysis,
Design etc. or whatever your phases
are called.
Organize by function so that the top
directory level are functions. For
example, Risks, Requirements, Scope,
Change Control, Development.
Most times a mix of both are used..."
So, any thoughts about it? I believe this is also an important issue!
IMHO, depending on your document management system, the choice of structure for your documents may not be an issue. When looking at the problems project-related documents are trying to solve, you typically come to the conclusion that documents are about communication.
Different documents attempt to communicate different things (or contexts); test plans discuss how testing should/has been executed, requirements specifications discuss how the business rules should be applied, architecture documents discuss the technical components and so forth. Each of these documents might have the need for its own unique structure. For example the structure you choose for your test plans may be vastly different from the structure you need for your architecture documents.
When keeping the communication issue and the document context in mind I generally come back to these 2 key aspects.
Searchability – What is the easiest way to find the document I am looking for?
Versioning – How do I know that the document I am looking for is the most recent one?
I feel searchability is the most important thing to remember because different people call the same document by different names. For example some people call Business Requirements documents Functional Specifications. Some people call Functional Specifications use case documents. As you cannot always govern the naming convention of documents I feel finding the right document to be far more important than the folder or place in which it is stored.
So to answer your question I would simply answer by saying it doesn’t really matter which structure you use, just that you should use some form of document management system (SharePoint, Documentum, Trim, etc). The benefits are simply too great to work without one :)

Extracting information from PDFs of research papers [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
I need a mechanism for extracting bibliographic metadata from PDF documents, to save people entering it by hand or cut-and-pasting it.
At the very least, the title and abstract. The list of authors and their affiliations would be good. Extracting out the references would be amazing.
Ideally this would be an open source solution.
The problem is that not all PDFs encode the text, and many that do fail to preserve the logical order of the text, so just doing pdf2text gives you line 1 of column 1, line 1 of column 2, line 2 of column 1, etc.
I know there are a lot of libraries. It's identifying the abstract, title, authors, etc. in the document that I need to solve. This is never going to be possible every time, but 80% would save a lot of human effort.
I'm only allowed one link per posting so this is it:
pdfinfo Linux manual page
This might get the title and authors. Look at the bottom of the manual page, and there's a link to www.foolabs.com/xpdf where the open source for the program can be found, as well as binaries for various platforms.
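If you prefer to drive pdfinfo from code rather than the shell, a rough Python wrapper (assuming the pdfinfo binary from xpdf/poppler is on the PATH; the file name is a placeholder) could look like this:

import subprocess

def pdf_metadata(path):
    # pdfinfo prints "Key: value" lines; turn them into a dict.
    out = subprocess.run(["pdfinfo", path], capture_output=True, text=True, check=True)
    meta = {}
    for line in out.stdout.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta

info = pdf_metadata("paper.pdf")
print(info.get("Title"), info.get("Author"))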
To pull out bibliographic references, look at cb2bib:
cb2Bib is a free, open source, and multiplatform application for rapidly extracting unformatted, or unstandardized bibliographic references from email alerts, journal Web pages, and PDF files.
You might also want to check the discussion forums at www.zotero.org where this topic has been discussed.
We ran a contest to solve this problem at Dev8D in London, Feb 2010 and we got a nice little GPL tool created as a result. We've not yet integrated it into our systems but it's there in the world.
https://code.google.com/p/pdfssa4met/
Might be a tad simplistic, but Googling "bibtex + paper title" usually gets you a formatted BibTeX entry from the ACM, CiteSeer, or other such reference tracking sites. Of course this is assuming the paper isn't from a non-computing journal :D
-- EDIT --
I have a feeling you won't find a ready-made solution for this; you might want to write to citation trackers such as CiteSeer, the ACM and Google Scholar to get ideas about what they have done. There are tons of others, and you might find their implementations are not closed source but also not in a published form. There is tons of research material on the subject.
The research team I am part of has looked at such problems and we have come to the conclusion that hand written extraction algorithms or machine learning are the way to do it. Hand written algorithms are probably your best bet.
This is quite a hard problem due to the amount of variation possible. I suggest normalizing the PDFs to text (which you can get from any of the dozens of programmatic PDF libraries). You then need to implement custom text-scraping algorithms.
I would start backward from the end of the PDF and look at what sort of citation keys exist -- e.g., [1], [author-year], (author-year) -- and then try to parse the sentence following. You will probably have to write code to normalize the text you get from a library (removing extra whitespace and such). I would only look for citation keys as the first word of a line, and only for 10 pages per document -- the first word must have key delimiters -- e.g., '[' or '('. If no keys can be found in 10 pages, then ignore the PDF and flag it for human intervention.
You might want a library that you can further programmatically consult for formatting metadata within citations -- e.g., italics have a special meaning.
I think you might end up spending quite some time getting a working solution, and then face a continual process of tuning and adding to the scraping algorithms/engine.
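As a rough illustration of that kind of hand-written scraping (a sketch under stated assumptions, not a tested solution: the input is assumed to be already-normalized text from a PDF-to-text tool, the file name is a placeholder, and the "last 10 pages" rule is approximated with a character window):

import re

# Match a citation key at the start of a line, e.g. [1], [Smith2001], (Smith 2001).
KEY_AT_LINE_START = re.compile(r"^\s*(\[[^\]]{1,30}\]|\([A-Z][^)]{1,30}\d{4}\))\s+(.*)")

def extract_references(text, tail_chars=20000):
    # Only scan the tail of the document, roughly the last few pages.
    refs = []
    for line in text[-tail_chars:].splitlines():
        line = re.sub(r"\s+", " ", line).strip()        # normalize whitespace
        m = KEY_AT_LINE_START.match(line)
        if m:
            refs.append((m.group(1), m.group(2)))
    return refs

with open("paper.txt", encoding="utf-8") as f:          # hypothetical pdftotext output
    for key, ref in extract_references(f.read()):
        print(key, ref)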
In this case I would recommend TET from PDFlib.
If you need to get a quick feel for what it can do, take a look at the TET Cookbook
This is not an open source solution, but it's currently the best option in my opinion. It's not platform-dependent and has a rich set of language bindings and commercial backing.
I would be happy if someone pointed me to an equivalent or better open source alternative.
To extract text you would use the TET_xxx() functions and to query metadata you can use the pcos_xxx() functions.
You can also use the command-line tool to generate an XML file containing all the information you need.
tet --tetml word file.pdf
There are examples on how to process TETML with XSLT in the TET Cookbook
What’s included in TETML?
TETML output is encoded in UTF-8 (on zSeries with USS or MVS: EBCDIC-UTF-8, see www.unicode.org/reports/tr16), and includes the following information:
general document information and metadata
text contents of each page (words or paragraphs)
glyph information (font name, size, coordinates)
structure information, e.g. tables
information about placed images on the page
resource information, i.e. fonts, colorspaces, and images
error messages if an exception occurred during PDF processing
CERMINE - Content ExtRactor and MINEr
Described in the paper: TKACZYK, Dominika, et al. CERMINE: automatic extraction of structured metadata from scientific literature. International Journal on Document Analysis and Recognition (IJDAR), 2015, 18.4: 317-335.
Mainly written in Java and available as open source on GitHub.
Another Java library to try would be PDFBox. PDFs are really designed to be viewed and printed, so you definitely want a library to do some of the heavy lifting for you. Even so, you might have to do a little gluing of text pieces back together to get the data you want extracted. Good luck!
Just found pdftk... it's amazing, comes in a binary distribution for Win/Lin/Mac as well as source.
In fact, I solved my other problem (look at my profile, I asked then answered another pdf question .. can't link due to 1 link limitation).
It can do PDF metadata extraction; for example, this will return the line containing the title:
pdftk test.pdf dump_data output - | grep -A 1 "InfoKey: Title" | grep "InfoValue"
It can dump title, author, mod-date, and even bookmarks and page numbers (test pdf had bookmarks)... obviously a bit of work will be needed to properly grep the output, but I think this should fit your needs.
If your PDFs don't have metadata (i.e., no "Abstract" metadata), you can cat the text using a different tool like pdf2text and use some grep tricks like above. If your PDFs are not OCR'd, you have a much bigger problem, and ad-hoc querying of the PDF(s) will be painfully slow (best to OCR).
Regardless, I would recommend you build an index of your documents instead of having each query scan the file metadata/text.
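A minimal sketch of that indexing idea (assuming pdftotext is installed; the folder name, index file, and search term are placeholders) is to extract each PDF's text once into a small JSON index and run keyword searches against that instead of the PDFs:

import json
import pathlib
import subprocess

def build_index(pdf_dir, index_path="index.json"):
    # Run pdftotext once per PDF and keep the text in a single JSON file.
    index = {}
    for pdf in pathlib.Path(pdf_dir).glob("*.pdf"):
        result = subprocess.run(["pdftotext", str(pdf), "-"],
                                capture_output=True, text=True)
        index[pdf.name] = result.stdout
    pathlib.Path(index_path).write_text(json.dumps(index), encoding="utf-8")

def search(term, index_path="index.json"):
    # Naive case-insensitive keyword search over the stored text.
    index = json.loads(pathlib.Path(index_path).read_text(encoding="utf-8"))
    return [name for name, text in index.items() if term.lower() in text.lower()]

build_index("papers/")
print(search("voronoi"))

For anything beyond a handful of documents you would want a real indexer (Lucene, Whoosh, SQLite FTS) instead of a JSON blob, but the extract-once idea is the same.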
Take a look at iText. It is a Java library that will let you read PDFs. You will still face the problem of finding the right data, but the library will provide formatting and layout information that might be usable to infer purpose.
PyPDF might be of help. It provides an extensive API for reading and writing the content of a (non-encrypted) PDF file, and it's written in Python, an easy language.
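For instance, with the current incarnation of the package (pypdf), reading the Info-dictionary metadata and the first page's text only takes a few lines; the file name is a placeholder, and how useful the output is depends entirely on how the PDF was produced:

from pypdf import PdfReader

reader = PdfReader("paper.pdf")
meta = reader.metadata                      # may be None or sparsely populated
if meta:
    print("Title:", meta.title)
    print("Author:", meta.author)

# Heuristically, the title, authors and abstract are often on the first page.
first_page_text = reader.pages[0].extract_text() or ""
print(first_page_text[:500])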
Have a look at this research paper - Accurate Information Extraction from Research Papers using Conditional Random Fields
You might want to use an open-source package like Stanford NER to get started on CRFs.
Or perhaps, you could try importing them (the research papers) to Mendeley. Apparently, it should extract the necessary information for you.
Hope this helps.
Here is what I do using linux and cb2bib.
Open up cb2bib and make sure that clipboard connection is ON, and that your reference database is loaded
Find your paper on google scholar
Click 'import to bibtex' underneath the paper
Select (highlight) everything on the next page (i.e., the BibTeX code)
It should now appear formatted in cb2bib
Optionally now press network search (the globe icon) to add additional info.
Press save in cb2bib to add the paper to your ref database.
Repeat this for all the papers. I think in the absence of a method that reliably extracts metadata from PDFs, this is the easiest solution I found.
I recommend gscholar in combination with pdftotext.
Although PDF provides metadata, it is seldom populated with correct content. Often "None" or "Adobe-Photoshop" or other dumb strings are in place of the title field, for example. That is why none of the above tools might derive correct information from PDFs, as the title might be anywhere in the document. Another example: many papers in conference proceedings might also carry the title of the conference or the names of the editors, which confuses automatic extraction tools. The results are then dead wrong when you are interested in the real authors of the paper.
So I suggest a semi-automatic approach involving google scholar.
First, render the PDF to text so that you can extract the author and title.
Second, copy-paste some of this info and query Google Scholar. To automate this, I employ the cool Python script gscholar.py.
So in real life this is what I do:
me#box> pdftotext 10.1.1.90.711.pdf - | head
Computational Geometry 23 (2002) 183–194
www.elsevier.com/locate/comgeo
Voronoi diagrams on the sphere ✩
Hyeon-Suk Na a , Chung-Nim Lee a , Otfried Cheong b,∗
a Department of Mathematics, Pohang University of Science and Technology, South Korea
b Institute of Information and Computing Sciences, Utrecht University, P.O. Box 80.089, 3508 TB Utrecht, The Netherlands
Received 28 June 2001; received in revised form 6 September 2001; accepted 12 February 2002
Communicated by J.-R. Sack
me#box> gscholar.py "Voronoi diagrams on the sphere Hyeon-Suk"
@article{na2002voronoi,
title={Voronoi diagrams on the sphere},
author={Na, Hyeon-Suk and Lee, Chung-Nim and Cheong, Otfried},
journal={Computational Geometry},
volume={23},
number={2},
pages={183--194},
year={2002},
publisher={Elsevier}
}
EDIT: Be careful, you might encounter captchas. Another great script is bibfetch.

Company insists on using a binary format for all our documentation [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 5 years ago.
I work at a company that, for some reason, insists that all our development documentation should be in MS Word format. Which, being a binary format, means we cannot:
Diff versions of a document against each other (so peer reviewing them is a pain - because of the domain we work in, peer reviews for all changes are essential)
Grep a folder-full of documents for keywords
What do you use to write documentation in and why?
Please also give me ammo to change this situation with...
I recently started using DocBook XML to author my documentation.
On the upside, it's a pure text format. You can break a large document into multiple files, and use nodes to bring them all together into a single book. Table of contents and index are automatically generated. Intra-document links (within arbitrary text, pointing to chapters or sections) are very easy. And with a push of a button, I can create a single-html-file version, a chunked-html version (one file per chapter), and a PDF version.
After some tweaking and customization, I'm very happy with the output. The documents look great!!
DocBook is used extensively by real publishers (most notably, O'Reilly), and it's been around for more than fifteen years, so it's reached a certain level of maturity.
On the other hand, all of the processing is done with XSLT, using an ad-hoc collection of tools. (My own docbook pipeline includes Python, Java, Xerces, Xalan, Apache FOP, and PDF-SAM. Plus the official XSLT stylesheet distribution, and my own XSLT customizations.)
DocBook is not a turnkey solution. You won't be able to get going quickly, without reading the manual. And if you don't know anything about XSLT, you'll have to learn.
On the other hand, there are only a dozen or two XML tags that you really need to know to write the documents. (The real expertise comes into play during doc generation from the XML sources.) If one person on your team was willing to be responsible for writing the doc build script, then everyone else on the team could just learn the DTD and do a decent job contributing.
Anyhow... DocBook definitely has some faults. It's not the easiest system for tech authorship. But it's the best open source tool I know of.
The "Subversion Book" is written in DocBook. Here's a page with links to the different book versions (single-html, chunked-html, and PDF):
http://svnbook.red-bean.com/
And here's a link to the DocBook XML sources for the first chapter, so that you can get an idea for how it works:
http://sourceforge.net/p/svnbook/source/HEAD/tree/branches/1.7/en/book/ch01-fundamental-concepts.xml
For ammo, there's the trusty old Pragmatic Programmer, chapter 14: The Power of Plain Text.
As Pragmatic Programmers, our base material isn't wood or iron, it's knowledge. We gather requirements as knowledge, and then express that knowledge in our designs, implementations, tests, and documents. And we believe the best format for storing knowledge persistently is plain text. With plain text, we give ourselves the ability to manipulate knowledge, both manually and programmatically, using virtually every tool at our disposal.
We use a wiki (specifically the one provided by Trac) for the two reasons you mentioned. Plus, if we really need to we can get the text version of the markup and manipulate it in a text-only environment, too (e.g. as part of svn comments during commit).
A format that can be easily reduced to text-only (non-binary) is definitely a must. Having the ability to upconvert it to a pretty format like a PDF is, for us, not terribly important.
Word has change tracking for documents (although it only works up until you accept the changes) and you can also grep them (the text isn't encrypted). So I'm not sure either of your arguments will hold up under scrutiny. I'd love to give you the ammo to change this but I've become jaded and cynical with age.
We use MS Word for our docs, which is a huge improvement over the earlier choice (Lotus WordPro - ugh!).
We use a wiki - specifically Confluence by Atlassian.
It's a commercial product, and it's great. One of the reasons we picked it over free/open wiki engines is that it has a full-blown WYSIWYG editor and various other features that make it more easily accessible to users who are familiar with Word.
We've also come up with a neat trick where we store images, designs, wireframes, etc. in Subversion, and then embed links in the wiki documents to those resources' URLs via the Apache/SVN web interface module; notes on how we do this are here if you're interested.
Like Dylan's organisation, we also use the excellent Confluence wiki. I wrote an article about why this is a better approach, called Wiki is my word-processor, which should give you some reasons to change the situation.
Benefits of using a wiki for internal documentation include the following.
Word-processor users get sucked into changing the layout and typography, however good your templates are, which wastes time and reduces consistency.
A wiki provides full-text search, which you are unlikely to have for your body of MS Word documents written by everyone.
A wiki provides a document version history; I have never heard of a team successfully keeping all revisions in Word documents and always being able to compare old versions, or using a version control system (with the possible exception of SharePoint, but that's a whole different failure scenario).
A wiki makes hyperlinks between documents easy; it is too hard to reliably link between documents in a collection of Word documents, so new documents end up duplicating older content into new monolithic documents which means they take more time to read and write.
Separate wiki pages can be edited by different people at the same time, and Confluence can merge changes when multiple people edit the same page at the same time; collaboration is harder with a Word document that only one person can edit at a time.
A wiki like Confluence automatically generates navigation pages based on wiki structure and tags; you need a librarian and lots of discipline to make it possible to browse a large collection of Word documents.
A wiki page usually loads and displays more quickly than a Word document.
A wiki page has more automatic meta-data; you need templates and discipline to make sure that Word documents always have Title, Author and Version set in the document properties and visible in the document on-screen and in print.
If you want more ammunition than this, then there is lots of wiki-promotion on The Atlassian Blog.
You could ask for documentation to be in OOXML (.docx, in the case of Word) format. Not as ideal as using ODT, in my opinion; however, it's still just a zip file with a bunch of XML files inside. :-)
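To see (and script) just how un-binary that is, a small sketch with nothing but the Python standard library can pull out the main document part and do a crude keyword check; the file name and search term are placeholders:

import zipfile

with zipfile.ZipFile("spec.docx") as docx:
    # The main body of a .docx lives in the word/document.xml part.
    xml = docx.read("word/document.xml").decode("utf-8")

print("word/document.xml is", len(xml), "characters of XML")
print("contains 'requirement':", "requirement" in xml.lower())

Grepping raw OOXML is noisy because of the markup, but it does mean diffing and searching are at least possible without Word.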
A textual format facilitates merging your documentation with generated items such as JavaDoc, API references or data dictionaries. It also scales much better than Word, which is hard to use for large documents. Finally, a format that allows includes lets multiple authors work on a document concurrently.
LaTeX and FrameMaker (the two systems I have used for this) both have vastly superior indexing and cross-referencing capabilities, and both have either a native textual format or a textual version of their native format that can be included (MIF in the case of FrameMaker). They are also both much more stable than Word.
I've built tools that read data dictionaries and generate documentation that can be included into a larger document with stable indexing and two-way cross-referencing. The functional specification for this product was done with LaTeX in this way and got me another gig with the company. I have also developed a similar process with FrameMaker.
Is the entire development team against this requirement, or is it a small group? If it's the entire team, just ignore the mandate and use a text-based format -- wouldn't be the first time employees ignored a silly rule. Works especially well if you've not made a big fuss about it in the past. If you have, management might look especially hard at your docs.
MS Word supports document change tracking and peer review.
The new MS Office format is fully XML based (to see this, rename an MS Word .docx file to .zip, then unpack it).
Maybe Office 2007 would fit both your company's requirements and your concerns?
You can at least compare Word documents; see the "Track changes" command in the "Extra" menu, or use software like DeltaView (found via a Google search; the first link is at lifehacker.com). Searching in Word documents should be possible with Google Desktop Search or other similar programs that index all files they are able to read.
Do they insist that you write it in Word or only that it's available in Word format? You could write in a text format and convert it to Word automatically.
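If it's only the deliverable format that matters, the conversion step can be a one-liner; here is a sketch assuming pandoc and its pypandoc wrapper are installed, with placeholder file names:

import pypandoc

# Author in Markdown (or reStructuredText, AsciiDoc, ...), publish as .docx.
pypandoc.convert_file("design-doc.md", "docx", outputfile="design-doc.docx")

That way the source of truth stays in plain text under version control, and the Word files become build artifacts.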
Don't you store documentation files in some kind of version control system (VCS), ideally together with the source code? I would recommend doing this (it makes it easy to get the documentation for old software releases).
And if you do store the docs in the VCS, you will notice that plain text or XML-based files are much better for this, because you can get diffs; also, changes between text files are usually stored more efficiently than changes between binary files.
Not to defend MS products here, but MS Word can diff documents.
If you use Beyond Compare as the diff tool for your source-control system (As we do, with Perforce), it will show you differences between revisions of your Word docs. Admittedly, it only shows the textual differences - formatting changes are not shown - but this is usually enough for you to see what changed.
This is just another reason to invest in Beyond Compare, as it is one of the most polished pieces of software I've ever used - and it's the best $30 (less if you buy several) I've spent on software.
There are many tools for Word document comparison. I currently use a Python script that puts a command-line interface on the built-in compare and merge functionality of Word.
http://nicolas.lehuen.com/index.php/post/2005/06/30/60-comparing-microsoft-word-documents-stored-in-a-subversion-repository
It should be easy to automate Word to extract all text from a Word document into a text file. So you could write a script creating text files from Word docs, and then grep, compare, version-control, and review those text files.
Of course this is not an ideal solution, since you lose your pretty formatting, but it should work.
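For modern .docx files, that script can be quite small; here is a rough sketch assuming the python-docx package and a placeholder folder name (legacy binary .doc files would need Word automation or another converter instead):

import pathlib
from docx import Document

# Dump the paragraph text of every .docx in a folder to a sibling .txt file,
# which can then be grepped, diffed, and version-controlled.
for path in pathlib.Path("docs/").glob("*.docx"):
    text = "\n".join(p.text for p in Document(str(path)).paragraphs)
    path.with_suffix(".txt").write_text(text, encoding="utf-8")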
I think there are programs that convert Word docs to plain text. Use one of them to convert the Word doc to plain text and then use diff, grep, etc.
Also have a look into recommended toolchain(s) for DocBook.

Tools to effectively manage the information?

How do you guys manage the information overflow?
What are the tools that you guys use?
One useful tool is an RSS feed reader.
Does anybody use any other tools or other ways to effectively manage the information?
Be an information snob.
If the blog doesn't absolutely rock your world, don't read it. It's so easy to get bogged down, even obsessed, with too much information. No matter what tools you have, you're still human and can only read so many words per day.
I use Evernote to keep notes and search through them.
I use Google Reader for the feeds, split up into multiple categories: 'A' with the more unique stuff, 'B' with the spam (Digg for example, easy to ignore because the important stuff shows up in 'A'), 'C' for my webcomics.
I always read the stuff in 'A'; when bored I read 'C', and 'B' when I have spare time. A lot of the time I'll mark 'B' as read just to get rid of it.
For work I'm stuck with Outlook, so I use the 'Tasks' function of Outlook a lot to get things sorted. Also a big believer of 'Inbox Zero' (http://www.43folders.com/izero).
I use a small number of tools and techniques, because it is easy to get distracted managing the information management tools, rather than managing the information.
Google Reader - The key for me was creating #work and #home labels, for the appropriate location.
TiddlyWiki - I keep track of all my notes for work projects in a TiddlyWiki file.
Delicious - I keep my bookmarks here. When I come across a link I want to read later (usually in my RSS reader), I tag it #readreview. When I read it, I delete it unless it is a useful reference, in which case I retag it appropriately.
Local bookmarks - I store bookmarks on the browser toolbar in folders so I can middle-click and open all in tabs. Obviously these would be limited in number :-). I also have a bookmarklets folder.
I don't have a PDA. I have a pad of graph paper on my desk that I use for writing temporary notes and diagrams (permanent notes go into the TiddlyWiki). A lot of "productivity blogs" like to promote various tools, and some of these caught on for people, but I find my system is pretty simple and easy for me to manage. This makes it useful.
Well, this is an obvious one, but iGoogle seems to do a great job for me.
Depends on what information you are looking to manage. Can you be more specific?
I use Google Reader to handle things I read, RememberTheMilk to remind me of what I have to do, and Gmail overall to quickly store and search data/correspondence.
Oh, and I use the hipster PDA too!
You should probably check out Lifehacker for more tools and Getting Things Done apps.
Like you say, most sites have an RSS feed today. Get an RSS reader that syncs between computers if you use more than one computer, so you don't have to mark a lot of posts as read. A good program is FeedDemon; it's free, syncs between computers, and there is even an online version for when you are on the road. FeedDemon also has tools to help you identify the feeds that you don't really read, and gives you a top 10 of feeds that you look at a lot. This can help you delete bad RSS feeds and also help you organize your feeds.
I also use Delicious to keep my bookmarks in sync, which is very handy if you bookmark a lot.
Other than that, I don't really use any more tools - just the common sense that there are only 24 hours in the day, so don't spend them reading information that you don't need - bookmark interesting blog posts from RSS, and read them later when you need to.
I've been using Delicious quite a bit over the past 2 years and it's been a great help.
If you're primarily interested in blogs, what I think we need is a way to prioritize the information that we, personally, are interested in. There used to be an RSS reader called wTicker (now defunct) that used Bayesian filtering to rate articles for you. Another product under development, Particls, would similarly watch what you read and highlight similar content.
What about other types of information, though? For example, the tasks that OneNote or EverNote, or more obscure tools like Zoot aim to facilitate?
It depends on the type of information you mean. The answers above cover most of the tools. But if you use MS Office, you should explore Office OneNote.
iGoogle: news, RSS, weather, new films, and e-mail widgets
ToDoList: everyday work items
A local MediaWiki for company knowledge
MS Excel on my smartphone for personal finances.
I still read news and blogs from RSS feeds. Feedly is the best tool for that right now.
When I find something interesting in Feedly, I add it to Pocket and read later. A Premium account allows me to highlight paragraphs I would like to save.
I also set up a recipe on IFTTT that monitors my likes on Twitter and adds links from the liked tweets to Pocket too.
As Substack grows, there is a new email newsletter boom. But my inbox is also a place where I do my work. So, I wrote an Apps Script file to receive newsletters once or twice a day and prevent them from distracting me from work. And then I published it as the Silent Inbox add-on that plugs into your Gmail.