Differential Open Chromatin Locations from DNase-seq data - sequence

I have DNase-seq data (two SAM files or two BED files) from ENCODE for two different cell lines (no control data for either cell line). I would like to identify enriched, significantly differentially open chromatin locations. In short, a comparison of accessible regions between cell-A and cell-B.
Is there any tool that would do that for me? Suggested/recommended command-line option settings for that tool would also help.
thanks

You could try the "intersectBed" program from BEDTools with the "-wao" flag, which reports how many bases overlap between each element of file A and the elements of file B (elements of A with no overlap are reported with an overlap of 0). You can combine the "-f" and "-r" flags with "-wao" to restrict the reported overlaps to "reciprocal overlap". The "-f" flag sets the minimum overlap as a fraction of each element in file A (for example 0.5 for 50%), and "-r" requires that the same fraction of the element in file B is covered as well. This way you can search for BED elements that cover mostly the same set of bases in the two files, and identify the elements that don't show high levels of reciprocal overlap to define regions that differ between the two cell lines.
Bedtools: http://code.google.com/p/bedtools/
The above site includes good documentation and examples, including an excellent downloadable PDF manual. I hope that helps!
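To make the reciprocal-overlap idea concrete, here is a minimal Python sketch of the same test on in-memory intervals. It is not BEDTools code: the function names and the 0.5 threshold are assumptions chosen for illustration, and the comparison is a simple quadratic scan that only suits small inputs. It also only measures overlap; it is not a statistical test of differential accessibility.

```python
from typing import List, Tuple

# (chrom, start, end) using BED-style half-open coordinates
Interval = Tuple[str, int, int]


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of bases shared by two intervals (0 if on different chromosomes)."""
    if a[0] != b[0]:
        return 0
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


def has_reciprocal_overlap(a: Interval, b: Interval, min_frac: float = 0.5) -> bool:
    """True if the overlap covers at least min_frac of BOTH intervals."""
    len_a, len_b = a[2] - a[1], b[2] - b[1]
    if len_a <= 0 or len_b <= 0:
        return False
    shared = overlap_bp(a, b)
    return shared / len_a >= min_frac and shared / len_b >= min_frac


def unmatched(regions_a: List[Interval], regions_b: List[Interval],
              min_frac: float = 0.5) -> List[Interval]:
    """Regions of A with no reciprocal partner in B (candidate A-specific sites)."""
    return [a for a in regions_a
            if not any(has_reciprocal_overlap(a, b, min_frac) for b in regions_b)]


cell_a = [("chr1", 100, 200), ("chr1", 500, 600)]
cell_b = [("chr1", 120, 210), ("chr1", 590, 900)]
print(unmatched(cell_a, cell_b))  # [('chr1', 500, 600)]
```

Swapping the two arguments gives the regions specific to the other cell line.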

Related

Can man pass an option to the roff formatter?

SYNOPSIS
From man(1):
-l
Format and display local manual files instead of
searching through the system's manual collection.
-t
Use groff -mandoc to format the manual page to stdout.
From groff_tmac(5):
papersize
This macro file is already loaded at start-up by troff so it
isn't necessary to call it explicitly. It provides an interface
to set the paper size on the command line with the option
-dpaper=size. Possible values for size are the same as
the predefined papersize values in the DESC file (only
lowercase; see groff_font(5) for more) except a7–d7.
An appended l (ell) character denotes landscape orientation.
Examples: a4, c3l, letterl.
Most output drivers need additional command-line switches -p
and -l to override the default paper length and orientation
as set in the driver-specific DESC file. For example, use the
following for PS output on A4 paper in landscape orientation:
sh# groff -Tps -dpaper=a4l -P-pa4 -P-l -ms foo.ms > foo.ps
THE PROBLEM
I would like to use these to format local and system man pages to print out, but want to switch the paper size from letter to A4. Unfortunately I couldn't find anything in man(1) about passing options to the underlying roff formatter.
Right now I can use
zcat `man -w man` | groff -tman -dpaper=a4 -P-pa4
to format man(1) on stdout, but that's kind of long and I'd rather have man build the pipeline for me if I can. In addition the above pipeline might need changing for more complicated man pages, and while I could use grog, even it doesn't detect things like accented characters (for groff's -k option), while man does (perhaps using locale settings).
The man command is typically intended only for searching for and displaying manual pages on a TTY device, not for producing typeset and paper printed output.
Depending on the host system and/or the programs of interest, a fully typeset, printable form of a manual page can sometimes be generated when a program (or the whole system) is compiled. This is more common for system documents and less common for manual pages, though.
Note that depending on which manual pages you are trying to print there may be additional steps required. Traditionally the following pipeline would be used to cover all the bases:
grap $MANFILE | pic | tbl | eqn /usr/pub/eqnchar | troff -tman -Tps | lpr -Pps
Your best solution for simplifying your command line would probably be to write a small script which encapsulates what you're doing. Note that man -w might find several different filenames, so you would probably want to print each separately (or maybe only print the first one).
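As a rough illustration of such a wrapper, the Python sketch below reuses the groff options from the question (-tman -dpaper=a4 -P-pa4) and asks man -w for the file locations. The function names are hypothetical, the page is assumed to be either plain text or gzip-compressed, and no preprocessors beyond tbl (the -t in -tman) are run, so complicated pages may need more work.

```python
import gzip
import subprocess
import sys
from typing import List


def groff_command(paper: str = "a4") -> List[str]:
    """groff invocation from the question, with the paper size as a parameter."""
    return ["groff", "-tman", f"-dpaper={paper}", f"-P-p{paper}"]


def page_paths(name: str) -> List[str]:
    """Ask man where the page lives; it may report more than one file."""
    result = subprocess.run(["man", "-w", name], check=True,
                            capture_output=True, text=True)
    return [line for line in result.stdout.splitlines() if line.strip()]


def read_page(path: str) -> bytes:
    """Read a manual page source, decompressing it if it ends in .gz."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as handle:
        return handle.read()


def format_page(path: str, paper: str = "a4") -> bytes:
    """Run the page through groff and return the formatted output."""
    result = subprocess.run(groff_command(paper), input=read_page(path),
                            check=True, capture_output=True)
    return result.stdout


def print_first_page(name: str, paper: str = "a4") -> None:
    """Format only the first file reported by man -w and write it to stdout."""
    first = page_paths(name)[0]
    sys.stdout.buffer.write(format_page(first, paper))
```

Calling print_first_page("man") and redirecting stdout to a file would then replace the zcat pipeline for simple pages.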

PDF File Merge Based on Filename

I have large batches of pdf files that must be merged.
Folder1 FileName Explanation: invoice12-105767-1510781492.pdf - 105767 is the component that will match with a pdf filename in Folder2.
"invoice12-" First section of the filename. This can sometimes be "invoice11-" or "invoice6-", so merging based on character length became challenging. The "invoicexx-" prefixes are based on where in the system the file came from.
"105767" Second part of the filename. This is the key component for matching and merging. This will be the filename in Folder2 it belongs with.
"-1510781492.pdf" Third part of the filename is a system generated unique ID, which can contain more or less characters.
Folder1:
invoice12-105767-1510781492.pdf
invoice12-105768-1510781484.pdf
invoice12-105769-1510781469.pdf
Folder2:
105767.pdf
105768.pdf
105769.pdf
OutputFolder:
I don't want to merge all the files in both folders into 1 huge file. I need them merged based on the Folder2 filename, and specifically in this order: the Folder2 file first, then the matching Folder1 file (for example 105767.pdf + invoice12-105767-1510781492.pdf).
The final output should be three pdf files merged in order as follows:
105767.pdf + invoice12-105767-1510781492.pdf to make 1 file named 105767.pdf
105768.pdf + invoice12-105768-1510781484.pdf to make 1 file named 105768.pdf
105769.pdf + invoice12-105769-1510781469.pdf to make 1 file named 105769.pdf
I would appreciate any assistance with a way to automate this process. I merge over 800 files per day. This small automation would shave hours off my day and save my wrist from carpal tunnel.
I primarily use Mac OS 10.13.1. I have looked around in Mac's "Automator" program and cannot figure out how to get it to do what I need. (I did figure out a great way to split files into single pages.)
I downloaded pdftk server (since that is Mac compatible) but cannot figure out whether this type of match and merge is possible with this program.
I have Adobe Acrobat DC Professional and it does not seem to have this match and merge function.
I am even open to other paid programs. I just need a fairly future-proof way of getting this mundane task done through automation on my Mac.
You can take a look at the APDFL library examples that are provided with sample code. These libraries are supported on Mac, but are not free.
https://dev.datalogics.com/adobe-pdf-library/sample-program-descriptions/c1samples/#mergedocuments
Here is a snippet of the code you would need to use:
APDFLDoc doc1 ( csInputFileName1.c_str(), true);
APDFLDoc doc2 ( csInputFileName2.c_str(), true);
// Insert doc2's pages into doc1.
// Here, we've stated PDLastPage, which adds the pages after the last page of the target (an append).
// If we specify PDBeforeFirstPage instead, doc2's pages will be inserted at the head of doc1.
PDDocInsertPages ( doc1.getPDDoc(),   // target document
                   PDLastPage,        // insert after this page of the target
                   doc2.getPDDoc(),   // source document
                   0,                 // first source page to copy
                   PDAllPages,        // number of source pages to copy
                   PDInsertAll,       // insert flags
                   NULL, NULL, NULL, NULL);
doc1.saveDoc ( csOutputFileName.c_str(), PDSaveFull | PDSaveLinearized);
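The matching step itself does not depend on which PDF library does the merge. Below is a minimal Python sketch of it, assuming the filename pattern described in the question ("invoice" + digits, the key, then a numeric ID). The function names are hypothetical. The merge command uses pdftk's "cat ... output" form, and the output folder must differ from the input folders because pdftk cannot write over one of its inputs.

```python
import re
from pathlib import Path
from typing import List, Optional, Tuple

# invoice<digits>-<key>-<unique id>.pdf, e.g. invoice12-105767-1510781492.pdf
INVOICE_RE = re.compile(r"^invoice\d+-(\d+)-\d+\.pdf$", re.IGNORECASE)


def invoice_key(filename: str) -> Optional[str]:
    """Return the middle number of an invoice filename, or None if it doesn't fit."""
    match = INVOICE_RE.match(filename)
    return match.group(1) if match else None


def plan_merges(folder1_names: List[str],
                folder2_names: List[str]) -> List[Tuple[str, str]]:
    """Pair each Folder2 file with its Folder1 invoice as (base, invoice)."""
    by_key = {}
    for name in folder1_names:
        key = invoice_key(name)
        if key is not None:
            by_key[key] = name
    pairs = []
    for name in sorted(folder2_names):
        key = Path(name).stem  # "105767.pdf" -> "105767"
        if name.lower().endswith(".pdf") and key in by_key:
            pairs.append((name, by_key[key]))
    return pairs


def pdftk_command(folder2: str, folder1: str, out_dir: str,
                  base: str, invoice: str) -> List[str]:
    """Build: pdftk <base> <invoice> cat output <out_dir>/<base>."""
    return ["pdftk", str(Path(folder2) / base), str(Path(folder1) / invoice),
            "cat", "output", str(Path(out_dir) / base)]


pairs = plan_merges(
    ["invoice12-105767-1510781492.pdf", "invoice6-105768-1510781484.pdf"],
    ["105768.pdf", "105767.pdf", "999999.pdf"],
)
for base, invoice in pairs:
    print(base, "+", invoice)
```

Each command list returned by pdftk_command could be passed to subprocess.run; Folder2 files without a matching invoice (999999.pdf above) are simply skipped.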

Moving the "cursor" back a line for stdout

I have a little command line tool (written in Objective C, runs under MacOS) that tracks changes to folders and applies rules to files. This tool also informs the user about the progress. It prints messages like:
"Found 3 files of type Z and applied rule"
"Found 6 files of type X and applied rules"
Currently, the tool outputs the feedback as an endless list, but this is not very handy. What I'm after is a way to print the line for each file type only once and then update the number in the terminal if the tool finds another file of that type, very similar to how "top" under Unix gives its feedback.
However, to do so, I'll need to move the cursor in the terminal backwards to the beginning of the line and also one or multiple lines backwards.
Is this possible, and does anybody know how to do it?
Thanks
Norbert
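One common way to do this is with the standard ANSI escape sequences: a carriage return moves to the start of the line, ESC [ n A moves the cursor up n lines, and ESC [ 2 K erases the current line. The sketch below is in Python purely for illustration (the tool in the question is Objective C, where the same bytes could be written with printf). The function names are hypothetical, and it assumes the terminal honours ANSI sequences and that status lines are only ever added, never removed.

```python
import sys
from typing import Dict

CSI = "\x1b["  # Control Sequence Introducer: ESC followed by "["


def cursor_up(lines: int) -> str:
    """Escape sequence that moves the cursor up by the given number of lines."""
    return f"{CSI}{lines}A" if lines > 0 else ""


def clear_line() -> str:
    """Return to column 0 and erase the whole current line."""
    return "\r" + CSI + "2K"


def render(counts: Dict[str, int], previous_lines: int) -> str:
    """Redraw one status line per file type over the previously printed block."""
    out = cursor_up(previous_lines)
    for kind, number in counts.items():
        out += clear_line() + f"Found {number} files of type {kind} and applied rule\n"
    return out


def show(counts: Dict[str, int], previous_lines: int) -> int:
    """Write the block to stdout; returns the line count for the next call."""
    sys.stdout.write(render(counts, previous_lines))
    sys.stdout.flush()
    return len(counts)


printed = show({"Z": 3}, 0)
printed = show({"Z": 3, "X": 6}, printed)
printed = show({"Z": 4, "X": 6}, printed)
```

Because every line ends with a newline, the cursor always rests just below the block, so moving up by the number of lines printed last time lands on the first status line.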

Creating image retention test in Builder view

I just downloaded PsychoPy this morning and have spent the day trying to figure out how to work with Builder view. I watched the YouTube video "Build your first PsychoPy experiment (Stroop task)" by Jon Peirce, in which he explains how to make a conditions file in Excel for use in his experiment. I want to make a very similar test in which images appear and subjects give a yes or no answer to each one (the correct answer is predefined). His conditions file had the columns 'word', 'colour' and 'corrANS'. Can I have an 'image' column instead of a 'word' column? I would like to list all my images in that column, the same way I would list words, and pair each with a correct answer of either 'yes' or 'no'. We tried this and added images to the conditions file, but the test has not run successfully, and we were hoping somebody could help us.
Thank you in advance.
P.S. we are not familiar with python, or code in general, so we were hoping to get this running using the builder view.
EDIT: Here is the error message we are receiving when running the program
#### Running: C:\Users\mr00004\Desktop\New folder\1_lastrun.py
4.8397 ERROR Couldn't find image file 'C:/Users/mr00004/Desktop/New folder/PPT Retention 1/ Slide102.JPG'; check path?
Traceback (most recent call last):
File "C:\Users\mr00004\Desktop\New folder\1_lastrun.py", line 174, in <module>
image.setImage(images)
File "C:\Program Files (x86)\PsychoPy2\lib\site-packages\psychopy-1.80.03-py2.7.egg\psychopy\visual\image.py", line 271, in setImage
maskParams=self.maskParams, forcePOW2=False)
File "C:\Program Files (x86)\PsychoPy2\lib\site-packages\psychopy-1.80.03-py2.7.egg\psychopy\visual\basevisual.py", line 652, in createTexture
% (tex, os.path.abspath(tex))#ensure we quit
OSError: Couldn't find image file 'C:/Users/mr00004/Desktop/New folder/PPT Retention 1/ Slide102.JPG'; check path? (tried: C:\Users\mr00004\Desktop\New folder\PPT Retention 1\ Slide102.JPG)
Yes, certainly, that is exactly how PsychoPy is designed to work. Simply place the image names in a column in your conditions file. You can then use the name of that column in the Builder Image component's "Image" field. The appropriate image file for a given trial will be selected.
It is difficult to help you further, though, as you haven't specified what went wrong. "we haven't had any success" doesn't give us much to go on.
Common problems:
(1) Make sure you use full filenames, including extensions (.jpg, .png, etc). Windows may hide these by default, but Python needs them.
(2) Have the images in the right place. If you just use a bare filename (e.g. image01.jpg), then PsychoPy will expect that the file is in the same directory as your Builder .psyexp file. If you want to tidy the images away, you could put them in a subfolder. If so, you need to specify a relative path along with the filename (e.g. images/image01.jpg).
(3) Avoid full paths (starting at the root level of your disk): they are prone to errors, and stop the experiment being portable to different locations or computers.
(4) Regardless of platform, use forward slashes (/) not backslashes (\) in your paths.
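The checklist above can be turned into a quick pre-flight check on the values in the image column. The sketch below is plain Python, not part of PsychoPy, and its function names are hypothetical. It assumes the entries are relative paths and that no folder or file name legitimately begins or ends with a space; the path in the error message from the question contains exactly such a stray space before "Slide102.JPG".

```python
import os
from typing import List, Tuple


def normalise_image_entry(entry: str) -> str:
    """Use forward slashes and strip stray spaces around each path component."""
    parts = entry.strip().replace("\\", "/").split("/")
    return "/".join(part.strip() for part in parts)


def check_images(entries: List[str], experiment_dir: str) -> List[Tuple[str, str]]:
    """Return (entry, problem) pairs for every suspicious conditions-file value."""
    problems = []
    for entry in entries:
        cleaned = normalise_image_entry(entry)
        if cleaned != entry:
            problems.append((entry, f"stray spaces or backslashes; try '{cleaned}'"))
        if not os.path.splitext(cleaned)[1]:
            problems.append((entry, "no file extension"))
        elif not os.path.isfile(os.path.join(experiment_dir, cleaned)):
            problems.append((entry, "file not found relative to the experiment folder"))
    return problems


print(normalise_image_entry("PPT Retention 1/ Slide102.JPG"))
# PPT Retention 1/Slide102.JPG
```

Running check_images on the image column, with experiment_dir set to the folder that holds the .psyexp file, lists the entries worth fixing before the experiment is started.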
Make a new folder on the H drive and fill in the image column in PsychoPy with, e.g., 'H:\psych\cat.jpg'. It works for me.

How to find file on NTFS volume given a volume offset

Using a hex editor to mount an NTFS volume, I've found an offset within the volume containing data I'm interested in. How can I figure out the full path/name of the file containing this volume offset?
Perhaps there are still some people searching for the solution. There is a tool for this problem: SleuthKit Tools.
Given a byte offset from the beginning of the partition, divide it by the block size of your NTFS partition (usually 4096) to get the block offset. Then:
ifind /dev/... -d block_offset => inode_number
ffind /dev/... inode_number => Location of file
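The division is integer division: the block containing a byte is the offset divided by the block size, rounded down. A small Python sketch of that arithmetic and of assembling the two commands follows. The helper names are hypothetical, the 4096-byte default is the "usual" value mentioned above rather than something read from the volume, and the device path in the example is a placeholder.

```python
from typing import List


def block_number(byte_offset: int, block_size: int = 4096) -> int:
    """Block (cluster) that contains the given byte offset within the partition."""
    if byte_offset < 0 or block_size <= 0:
        raise ValueError("offset must be >= 0 and block size must be > 0")
    return byte_offset // block_size


def ifind_command(device: str, byte_offset: int, block_size: int = 4096) -> List[str]:
    """Command that asks ifind which inode owns the block."""
    return ["ifind", "-d", str(block_number(byte_offset, block_size)), device]


def ffind_command(device: str, inode: str) -> List[str]:
    """Command that asks ffind for the file name belonging to an inode."""
    return ["ffind", device, inode]


# 0x12345678 bytes into the partition with 4096-byte blocks -> block 0x12345
print(block_number(0x12345678))                  # 74565
print(ifind_command("/dev/sdX1", 0x12345678))    # placeholder device path
```

The inode number printed by ifind is then passed to ffind as the second step.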
You need to read the MFT and parse the Data attributes for each file to find the one that includes the particular offset.
Note that you might need to look at every stream of each file, not only the default one, so you have to parse all the Data attributes.
Unfortunately, I couldn't find a quick link to the binary structure of the NTFS Data attribute. You're on your own for this one.