More deletions than asterisks in samtools pileup file

I am looking at a 6-bp deletion in a pileup file. I see the deletion reported at position n-1 as .-6ACGTAC around 13k times. However, when I check positions n to n+5, where the deleted bases actually are, I see asterisks reported only 12.8k times, so ~200 occurrences are missing.
In a simple example with a 2-bp deletion at locus A, this would look like:
locus position sequence quality
A n-1 .-2AC.-2AC.-2AC.-2AC.-2AC.-2AC XXXXXX
A n *** XXX
A n+1 *** XXX
How could there be fewer asterisks than deletion occurrences?
Many thanks!
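To check the two totals directly, a minimal Python sketch that tokenizes the read-bases column (column 5) of one pileup line may help. It assumes the standard pileup encoding: "^" plus one mapping-quality character marks a read start, "$" a read end, "+N"/"-N" followed by N bases an indel after this position, and "*" a base deleted in the read. The function name count_pileup_events is hypothetical, and it counts only "*" as a deleted-base placeholder.

```python
import re
from collections import Counter

# Decimal length that follows '+' or '-' in an indel token, e.g. the "6" in "-6ACGTAC".
INDEL_LENGTH = re.compile(r"\d+")


def count_pileup_events(bases):
    """Tokenize one pileup read-bases string (column 5).

    Returns (deletions, insertions, asterisks): two Counters keyed by the
    upper-cased indel sequence, and the number of '*' placeholders.
    """
    deletions, insertions = Counter(), Counter()
    asterisks = 0
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":
            # Read start: '^' is followed by one mapping-quality character, which may be any symbol.
            i += 2
        elif c in "+-":
            m = INDEL_LENGTH.match(bases, i + 1)
            if m is None:
                raise ValueError(f"malformed indel at offset {i}: {bases[i:i + 10]!r}")
            start = m.end()
            length = int(m.group())
            seq = bases[start:start + length].upper()
            target = deletions if c == "-" else insertions
            target[seq] += 1
            i = start + length  # skip past the indel sequence itself
        else:
            if c == "*":
                asterisks += 1  # a base deleted in this read, announced earlier by '-N...'
            i += 1
    return deletions, insertions, asterisks
```

Run it on column 5 of the n-1 line and of each line from n to n+5 to get the two totals being compared.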


Provide code to solve this problem

Problem
A person is said to be sleep deprived if he slept strictly less than 7 hours in a day. Chef was only able to sleep X hours yesterday. Determine if he is sleep deprived or not.
Input Format
• The first line contains a single integer T, the number of test cases. Then the test cases follow.
• The first and only line of each test case contains one integer X, the number of hours Chef slept.
Output Format
For each test case, output YES if Chef is sleep-deprived. Otherwise, output NO.
You may print each character of YES and NO in uppercase or lowercase (for example, YES, yes, and Yes are all considered identical).
The problem is expected to be solved in Python.
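A minimal Python sketch of one way to solve it: read all tokens from standard input, take T, and print YES for each X strictly below 7. The helper names is_sleep_deprived and solve are my own choice, not part of the problem statement.

```python
import sys


def is_sleep_deprived(hours: int) -> bool:
    """Sleep deprived means strictly less than 7 hours."""
    return hours < 7


def solve(text: str) -> str:
    """Turn the whole input (T, then T values of X) into the expected output lines."""
    tokens = text.split()
    if not tokens:
        return ""
    t = int(tokens[0])
    answers = ["YES" if is_sleep_deprived(int(x)) else "NO" for x in tokens[1:1 + t]]
    return "\n".join(answers)


if __name__ == "__main__":
    print(solve(sys.stdin.read()))
```

Reading everything at once and splitting on whitespace makes the solution indifferent to whether the values arrive one per line or on a single line.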

I would like to MOVE just one line in Vim

yy and p should copy and paste 1 line of text.
But I have to go back and delete the original line.
:2,5m10
should move lines 2 to 5 to below line 10. However, I need to enable :set number
to see which lines I am moving.
I would like to move just 1 line of text, sort of like yy+p, without using :2,3m10
to move just one line.
Is there something like mm+p?
That is, something that copies the current line into the buffer and deletes it, so that p pastes it wherever you want?
:3m . moves line 3 to just below the current line.
The line above does what I want. Can I set a key mapping so
that "mm" runs ":3m."? I find it easier to type. TIA
What you're describing is the default behaviour when using dd: it deletes a
line into the buffer, and p will then paste it.
So dd followed by p does what you want.
If you're new to vim, then it might seem a little strange that 'yanking' (with
y) and 'deleting' (with d) both copy to the buffer, given the 'cut', 'copy'
and 'paste' behaviours of most other editors.
You can read more about it with :help change.txt and in particular :help registers.
Also, since you say you need to enable :set number, I wonder if you've come
across :set relativenumber? This is very useful: in the example below, the
numbers would look this way if your cursor was on the line with
'demonstrate':
3 This is just
2 a small
1 example to
0 demonstrate
1 how relative
2 numbers can
3 be useful
Thus if you wanted to move the line 'a small' below the line with 'numbers
can', you could use the relative line numbers to know that 2k would put the
cursor on the line you want, where you'd hit dd. Then you'd have this
situation (the deleted line is now in the buffer):
1 This is just
0 example to
1 demonstrate
2 how relative
3 numbers can
4 be useful
Then you can do 3j to move to the 'numbers can' line, and hit p. So
relative numbers are a nice way to move quickly to lines you can see. Also,
just for completeness, you can use relative numbers in a similar way on the
command line, e.g. :-2m+3 (although I know this isn't what you're after). You can
even set both relativenumber and number at the same time, in which case
it's like the example above, only the absolute line number is
displayed on the current line instead of a zero.
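To double-check the relative-number arithmetic in that walkthrough, here is a small Python model (an illustration only, not Vim code): the buffer is a list of lines with a 0-based cursor index, and only the four commands used (2k, dd, 3j, p) are simulated. The function names are hypothetical, and the counts are assumed to stay within the buffer.

```python
# Hypothetical model of a Vim buffer: a list of lines plus a 0-based cursor index.
# Only the commands from the example are modelled; counts are assumed to stay in range.
LINES = ["This is just", "a small", "example to", "demonstrate",
         "how relative", "numbers can", "be useful"]


def relative_numbers(buf, cursor):
    """Numbers shown by :set relativenumber, i.e. each line's distance from the cursor."""
    return [abs(i - cursor) for i in range(len(buf))]


def move_line_with_dd_p(buf, cursor, up, down):
    """Simulate '{up}k', 'dd', '{down}j', 'p'; return the new buffer and cursor."""
    buf = list(buf)                         # work on a copy
    cursor -= up                            # {up}k moves the cursor up
    register = buf.pop(cursor)              # dd deletes the line into the unnamed register
    cursor = min(cursor, len(buf) - 1)      # cursor lands on the line below (or the new last line)
    cursor += down                          # {down}j moves the cursor down
    buf.insert(cursor + 1, register)        # p puts a linewise register below the cursor
    return buf, cursor + 1                  # cursor ends on the pasted line
```

With the cursor on 'demonstrate' (index 3), relative_numbers gives the 3 2 1 0 1 2 3 column shown above, and move_line_with_dd_p(LINES, 3, 2, 3) leaves 'a small' directly below 'numbers can'.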

What does "hyphenation vector" mean?

The Hyphen library seems to be a very popular, free way to add hyphenation to your app.
What does hyphenation vector mean?
I am running the example included with the library source code.
Example output:
hibernate // input word
030412000 // output hyphenation vector
hi=ber=nate // hyphen points
- hi=bernate
- hiber=nate
Odd numbers in the vector indicate hyphenation points. But what do all of those values mean?
László Németh describes the algorithm in OpenOffice's documentation in full detail.
The library uses the algorithm developed by Frank M. Liang ("Word Hy-phen-a-tion by Com-pu-ter"): letter patterns (digrams, trigrams, and longer) carry numerical values at the positions between their letters, marking a 'usual' place (an odd number) or an 'unusual' place (an even number) for a hyphen. The higher the number, the greater its weight: a word will almost never be broken at a position with a large even number, and almost always at one with a large odd number. At each position, the highest value among all matching patterns wins, so an odd final value permits a hyphen there. The patterns and their values are determined statistically from a corpus of pre-hyphenated words.
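A minimal Python sketch of that combining step, assuming Liang-style patterns written with digits between letters and "." marking word boundaries. The five patterns below are hypothetical, invented only so that the result matches the vector in the question; they are not taken from the Hyphen library's dictionaries, and the sketch ignores the minimum-fragment-length rules real hyphenators apply.

```python
import re

# Hypothetical patterns, chosen only so the output matches the question's example;
# real pattern dictionaries contain thousands of entries.
PATTERNS = ["hi3b", "1ber", "e4r", "r1n", "n2a"]


def parse_pattern(pattern):
    """Split e.g. 'hi3b' into letters 'hib' and gap values [0, 0, 3, 0].

    values[k] scores the gap just before letter k (k == len(letters) is the end).
    """
    letters = re.sub(r"\d", "", pattern)
    values = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            values[pos] = int(ch)
        else:
            pos += 1
    return letters, values


def liang_vector(word, patterns):
    """Score every gap of word by taking the maximum value over all matching patterns."""
    text = "." + word.lower() + "."      # '.' marks the word boundaries
    scores = [0] * (len(text) + 1)       # scores[k] = gap just before text[k]
    for pattern in patterns:
        letters, values = parse_pattern(pattern)
        start = text.find(letters)
        while start != -1:
            for k, v in enumerate(values):
                scores[start + k] = max(scores[start + k], v)
            start = text.find(letters, start + 1)
    # One digit per letter: the gap after word[i] is the gap before text[i + 2].
    return "".join(str(scores[i + 2]) for i in range(len(word)))
```

Here "hi3b" and "1ber" both score the gap between "hi" and "ber"; the maximum, 3, survives, which is the "highest value wins" rule in action.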
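Note that the numbers are for positions between two characters: the i-th digit scores the gap after the i-th letter. A better notation would have been to interleave them with the letters:
h 0 i 3 b 0 e 4 r 1 n 2 a 0 t 0 e (0)
(where the last 0 is redundant, since nothing follows the final e). The odd values, 3 after "hi" and 1 after "hiber", are the two hyphen points.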
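Going the other way, a short Python sketch that turns a word and its vector into the hyphen points printed by the example program. It assumes the digit at index i scores the gap after letter i, as described above, and ignores the final digit; the function names are hypothetical.

```python
def hyphen_points(word, vector):
    """Indices i such that a hyphen may go after word[i] (odd vector value)."""
    if len(vector) != len(word):
        raise ValueError("expected one digit per letter")
    # vector[i] scores the gap after word[i]; the last digit has no gap to score.
    return [i for i, d in enumerate(vector[:-1]) if int(d) % 2 == 1]


def hyphenate(word, vector, mark="="):
    """Insert mark at every permitted hyphen point."""
    points = set(hyphen_points(word, vector))
    return "".join(ch + (mark if i in points else "") for i, ch in enumerate(word))


def single_breaks(word, vector, mark="="):
    """One variant per hyphen point, like the '- hi=bernate' lines in the output."""
    return [word[:i + 1] + mark + word[i + 1:] for i in hyphen_points(word, vector)]
```

For "hibernate" and "030412000" this yields "hi=ber=nate", plus the two single-break variants "hi=bernate" and "hiber=nate".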

Read text file line by line but only specific columns

How do we read a specific file line by line while skipping some columns in it?
For example, I have a text file with data in 5 columns, but I need to read only two of them. They could be the first two or any other combination, so I need a solution that works for any pair of columns, such as the first and third only.
My code looks something like this:
open(1, file=data_file)
read (1,*) ! to skip first line, with metadata
lmax = 0
do while (.true.)
! read column 1 and 3 here, either write
! that to an array or just loop through each row
end do
99 continue
close (1)
Any explanation or example would help a lot.
High Performance Mark's answer gives the essentials of simple selective column reading: one still reads the column but transfers it to a then-ignored variable.
To extend that answer, then, consider that we want to read the second and fourth columns of a five-column line:
read(*,*) junk, x, junk, y
The first value is transferred into junk, then the second into x, then the third (replacing the one just acquired a moment ago) into junk and finally the fourth into y. The fifth is ignored because we've run out of input items and the transfer statement terminates (and the next read in a loop will go to the next record).
Of course, this is fine when we know it's those columns we want. Let's generalize to when we don't know in advance:
integer col1, col2 ! The columns we require, defined somehow (assume col1<col2)
<type>, dimension(nrows) :: x, y, junk(3) ! For the number of rows
integer i
do i=1,nrows
read(*,*) junk(:col1-1), x(i), junk(:col2-col1-1), y(i)
end do
Here, we transfer a number of values (which may be zero) up to just before the first column of interest, then the value of interest. After that, more to-be-ignored values (possibly zero), then the final value of interest. The rest of the row is skipped.
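The skip-count arithmetic in that read statement can be checked with a small Python sketch (an illustration in another language, not Fortran): junk(:col1-1) consumes col1-1 values and junk(:col2-col1-1) consumes col2-col1-1, so y always lands on column col2. It assumes whitespace-separated fields (Fortran list-directed input also accepts commas), 1-based column numbers with col1 < col2, and numeric data; the function names are hypothetical.

```python
def select_two_columns(line, col1, col2):
    """Return fields col1 and col2 (1-based, col1 < col2) the way the Fortran read list does."""
    if not 1 <= col1 < col2:
        raise ValueError("need 1 <= col1 < col2")
    values = line.split()                  # assumes whitespace-separated fields
    skip1 = col1 - 1                       # values swallowed by junk(:col1-1)
    skip2 = col2 - col1 - 1                # values swallowed by junk(:col2-col1-1)
    x = values[skip1]
    y = values[skip1 + 1 + skip2]          # skip1 + 1 + skip2 == col2 - 1
    return x, y                            # later fields are ignored, like the rest of the record


def read_columns(path, col1, col2):
    """Read two numeric columns from a file whose first line is metadata."""
    xs, ys = [], []
    with open(path) as f:
        next(f, None)                      # skip the metadata line, like the bare read(1,*)
        for line in f:
            if line.strip():               # ignore blank lines
                x, y = select_two_columns(line, col1, col2)
                xs.append(float(x))
                ys.append(float(y))
    return xs, ys
```

read_columns is the "read everything, keep two" variant; in Python, splitting the whole line and indexing is the natural equivalent of both Fortran approaches.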
This is still very basic and avoids many potential complications in requirements. To some extent, it's such a basic approach one may as well just consider:
do i=1,nrows
read(*,*) allofthem(:5)
x(i) = allofthem(col1)
y(i) = allofthem(col2)
end do
(where that variable is a row-by-row temporary) but variety and options are good.
This is very easy. You simply read 5 variables from each line and ignore the ones you have no further use for. Something like
do i = 1, 100
read(*,*) a(i), b, c(i), d, e
end do
This will overwrite the values in b, d, and e at every iteration.
Incidentally, your line
99 continue
is redundant; it's not used as the closing line for the do loop and you're not branching to it from anywhere else. If you are branching to it from unseen code you could just attach the label 99 to the next line and delete the continue statement. Generally, continue is redundant in modern Fortran; specifically it seems redundant in your code.

Find the optimum sequence of keyboard hits to produce the most repeated characters

You are provided with four possible operations that can be done on the editor (each operation requires one keyboard hit).
A
Ctrl+A
Ctrl+C
Ctrl+V
Now you can hit the keyboard N times and you need to find the maximum number of A's that can be printed. Also print the sequence of keyboard hits.
Googled for the answer:
http://podlipensky.com/post/2011/02/07/Sundays-puzzle.aspx
Off the top of my head...
A single A followed by iterations of Ctrl+A Ctrl+C Ctrl+V Ctrl+V, where each iteration doubles the size of the text (the first Ctrl+V replaces the selection with the copied text, the second appends another copy), starting at 1 character, then 2, then 4, then 8, etc.
So given N keystrokes, where N-1 is a multiple of 4, this produces 2^((N-1)/4) characters.
I suspect that this is not optimal, though. (I have not yet read the answer posted by #David.)
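Under one reading of the rules, a dynamic-programming sketch in Python can find both the maximum and a key sequence. It assumes the editor model used above, where Ctrl+V right after Ctrl+A replaces the selected text; the classic statement of this puzzle instead has every paste append, which paste_replaces_selection=False switches to. The function name and flag are my own, and the DP only considers sequences built from runs of A and copy phases of Ctrl+A, Ctrl+C, Ctrl+V....

```python
def max_as(n, paste_replaces_selection=True):
    """Most A's reachable with exactly n keystrokes, plus one key sequence that achieves it.

    A copy phase is Ctrl+A, Ctrl+C, then j >= 1 presses of Ctrl+V. If the first
    Ctrl+V replaces the still-selected text, j pastes leave j copies; if pastes
    always append, they leave j + 1 copies.
    """
    best = [0] * (n + 1)
    choice = [None] * (n + 1)  # None: last key was 'A'; int i: copy phase started after key i
    for k in range(1, n + 1):
        best[k] = best[k - 1] + 1                     # option 1: type another A
        for i in range(1, k - 2):                     # option 2: Ctrl+A, Ctrl+C after key i
            j = k - i - 2                             # the remaining keys are all Ctrl+V
            copies = j if paste_replaces_selection else j + 1
            if best[i] * copies > best[k]:
                best[k] = best[i] * copies
                choice[k] = i
    keys, k = [], n
    while k > 0:                                      # rebuild the sequence backwards
        i = choice[k]
        if i is None:
            keys.append("A")
            k -= 1
        else:
            keys.extend(["Ctrl+V"] * (k - i - 2) + ["Ctrl+C", "Ctrl+A"])
            k = i
    return best[n], keys[::-1]
```

For N = 9 it returns 12 (three A's, then Ctrl+A, Ctrl+C and four Ctrl+V), against 4 for the doubling strategy, which supports the suspicion that doubling is not optimal. The DP runs in O(N^2) time.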