Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
I installed a new version of OS X (10.7 initially, then updated to 10.7.5) and lost man fgets in Terminal: the page no longer exists (not only fgets, some others too). I'm using Xcode 4.6.3 and have updated all the documentation. In the documentation I find only FGETS(3), not fgets!
When I write this code:
#import <Foundation/Foundation.h>
int main(int argc, const char * argv[])
{
    FILE *wordFile = fopen ("/tmp/words.txt", "r");
    char word[100];
    while (fgets(word, 100, wordFile))
    {
        word[strlen(word) - 1] = '\0'; // strip off the trailing \n
        NSLog (@"%s is %lu characters long", word, strlen(word));
    }
    fclose (wordFile);
    return 0;
}
I got this output:
Joe-Bob "Handyman" Brown
Jacksonville "Sly" Murphy
Shinara Bain
George "Guitar" Book is 84 characters long
Why?
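My money is on @Martin R.
fgets() did not find a \n in the file "/tmp/words.txt", even though the file was opened in text mode ("r"). The editor used to create/edit the file is ending the lines with \r, so fgets() read the whole file as a single line; 84 is the combined length of the four names plus the three \r characters between them.
See @Michael Haren in "Do line endings differ between Windows and Linux?"
BTW: word[strlen(word) - 1] = '\0'; is potentially a problem, though not in this case, because strlen(word) may be 0, which would write before the start of the array. (@wildplasser)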
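To make the line-ending explanation concrete, here is a small sketch in Python (not the asker's Objective-C). The byte string is an assumed reconstruction of /tmp/words.txt with \r-only line endings; the real file was never posted. It mimics a reader that stops only at \n, as fgets() does, and compares that with a reader that accepts \r as a line ending.

```python
# Assumed contents of /tmp/words.txt: four names, each ended by \r (no \n).
data = (b'Joe-Bob "Handyman" Brown\r'
        b'Jacksonville "Sly" Murphy\r'
        b'Shinara Bain\r'
        b'George "Guitar" Book\r')

# fgets() stops only at \n, so the whole file comes back as one "line".
fgets_lines = data.split(b"\n")
first = fgets_lines[0]

# Length left after word[strlen(word) - 1] = '\0' drops the last byte.
stripped_len = len(first[:-1])
print(stripped_len)

# Treating \r, \n and \r\n all as line endings recovers the four names.
names = [line.decode("ascii") for line in data.splitlines()]
for name in names:
    print(f"{name} is {len(name)} characters long")
```

In C, a common way to strip the line ending without the strlen(word) == 0 hazard is word[strcspn(word, "\r\n")] = '\0';. That only addresses the stripping step; a file with \r-only endings would still be read by fgets() as one long line.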
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 1 year ago.
I have a long text file with the following format:
>foo_bar
TATGTTCTGCAACTGTATAATGGTATAAAAACATTGCAAAATGTAATGAAACTTGTTATTTTGTGAAATACATTCTATAAATATCACTATTTCATGAAAA
ATATTGAAAATCATTTATTTTCGACAAGTAGAACCATAGGTTCTGTAATTGTAAATAGTTCTGCAAACTTAACCTGTTTTGCAGAAGAATATGTTTTCAC
TAGTTAACTTGTAGAATGTTTAGGATTGTTAAAATTTTTAACAAAATAAGATTTTATAGAACATGATTTGCAAAATAACACATTTTGCAATATTTTTATA
CCATATATAGTTGCAGAACATATGGGGACTACGGGCAGCCGGTAAATATGTGGACTACATGGAACTTGTTCAGATACATCTGGAGCAAAGAGCCACCGCT
CTAAATTATCTCTTCTCATTTCCAGTATTATATCTCTCATGCTAAATTATCTCTACAAATCATGACCTCTCTTAGCAATCTCCCTGAGCATCTCCGTAGG
GAGCAGATATTCACCCGTCTTCCGATGAAAGACCTAATGGTCCTCGCATCTGCAAGTCATGTCTTGCGTTAATCTTTCTCTCTCTTTTTGTGGAATCCCA
TCTCTCCTCTTATCAACTAAACCAGATACAGTTTGCACCAACTTTCTTCACTCCCCTGTTACATGAGAAGGCCAGACTTAGGTAGCTTCTGAATCAGAAC
CCGGTCATTCCAAGCATGGGATTTCTTGTTGATCTCTTGTTTTTATGTAATAGTGATCATTTGATATCTGGTGTTGATGGGAATTCAGATGTATGGGACT
TTGTTTATTGTTGATGTGGAATTCTTATATTTTACTGTGTACTATAAAATTTTAGTGATACCTACTATCTATTGTATAAATTGATTAATTGATGTTCTTA
>bar_foo
TATGTTCTGCAACTGTATAATGGTATAAAAACATTGCAAAATGTAATGAAACTTGTTATTTTGTGAAATACATTCTATAAATATCACTATTTCATGAAAA
ATATTGAAAATCATTTATTTTCGACAAGTAGAACCATAGGTTCTGTAATTGTAAATAGTTCTGCAAACTTAACCTGTTTTGCAGAAGAATATGTTTTCAC
TAGTTAACTTGTAGAATGTTTAGGATTGTTAAAATTTTTAACAAAATAAGATTTTATAGAACATGATTTGCAAAATAACACATTTTGCAATATTTTTATA
CCATATATAGTTGCAGAACATATGGGGACTACGGTACTACGGTAAATATGTGGACTACATGGAACTTGTTCAGATACATCTGGAGCAAAGAGCCACCGCT
CTAAATTATCTCTTCTCATTTCCAGCTGCATATCTCTCATGCTAAATTATCTCTACAAATCATGACCTCTCTTAGCAATCTCCCTGAGCATCTCCGTAGG
GAGCAGATATTCACCCGTCTTCCGATGAAAGACCTAATGGTCCTCGCATCTGCAAGTCATGTCTTGCGTTAATCTTTCTCTCTCTTTTTGTGGAATCCCA
TCTCTCCTCTTATCAACTAAACCAGATACAGTTTGCACCAACTTTCTTCACTCCCCTGTTACATGAGAAGGCCAGACTTAGGTAGCTTCTGAATCAGAAC
CCGGTCATTCCAAGCATGGGATTTCTTGTTGATCTCTTGTTTTTATGTAATAGTGATCATTTGATATCTGGTGTTGATGGGAATTCAGATGTATGGGACT
TTGTTTATTGTTGATGTGGAATTCTTATATTTTACTGTGTACTATAAAATTTTAGTGATACCTACTATCTATTGTATAAATTGATTAATTGATGTTCTTA
I.e., there is a header line which begins with a ">", and then an arbitrary number of lines with no more than 100 letters in them. I would like to find the positions within the non-header lines that match either "GCAGC" or "GCTGC". Overlapping match sites would both get recorded individually.
An example output would be a three-column text file where the first column contains the header line for that block of text minus the ">", the second column contains the start position of a pattern match (i.e., the number of characters into the text block, excluding line-break characters), and the third column records which of the two patterns was matched. E.g.:
foo_bar 109 GCAGC
bar_foo 58289 GCTGC
Not sure how complex this task is, and in particular whether there is a memory-efficient way to perform this operation in a streaming fashion. awk or sed seem like two utilities which might work, but the required command is beyond my limited understanding of the programs.
A tiny tweak on yesterday's answer:
sub(/^>/,"") {
    hdr = $0
    next
}
{
    while ( match($0,/GC[AT]GC/) ) {
        print hdr, RSTART, substr($0,RSTART,RLENGTH)
        $0 = substr($0,1,RSTART-1) " " substr($0,RSTART+1)
    }
}
Please get the book Effective AWK Programming, 5th Edition, by Arnold Robbins to learn the basics of awk.
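For comparison, here is a minimal streaming sketch in Python. It is not a translation of the awk script above: it counts positions across the whole block rather than within each line, and it carries the last four characters of each line forward so that a site spanning a line break is still found. The function name find_motifs and the sample data are made up for illustration.

```python
import re
from typing import Iterable, Iterator, Tuple

# The lookahead makes each match zero-width, so overlapping sites are all found.
MOTIF = re.compile(r"(?=(GC[AT]GC))")
MOTIF_LEN = 5


def find_motifs(lines: Iterable[str]) -> Iterator[Tuple[str, int, str]]:
    """Yield (header, 1-based start position in the block, matched motif)."""
    header = ""
    tail = ""    # last MOTIF_LEN - 1 characters of the block read so far
    offset = 0   # number of block characters that come before `tail`
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # New block: remember the header and restart the position count.
            header, tail, offset = line[1:], "", 0
            continue
        buf = tail + line
        for m in MOTIF.finditer(buf):
            yield header, offset + m.start() + 1, m.group(1)
        # Keep only enough characters to complete a motif on the next line.
        tail = buf[-(MOTIF_LEN - 1):]
        offset += len(buf) - len(tail)


# Hypothetical sample: the second site in "foo" spans the line break.
sample = [">foo", "AAGCAGC", "TGCAAA", ">bar", "GCTG", "CAGC"]
hits = list(find_motifs(sample))
for hdr, pos, motif in hits:
    print(hdr, pos, motif, sep="\t")
```

Because only one line plus a four-character tail is held at a time, memory use stays constant regardless of file size. To run it on a real file, pass an open file object, for example find_motifs(open(path)), where path is whatever your input file is.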
Closed. This question needs debugging details. It is not currently accepting answers.
Closed 2 years ago.
I have a CSV file, shown below, with only one column (cust_code). The header is in quotation marks, and each row is quoted as well:
“CUST_CODE”
“CST001001”
“CST000235”
“CST010231”
“CST010235”
“CST010231”
“CST010235”
“CST010231”
“CST040015”
I tried to read this file in pandas and I'm getting this error:
'utf-8' codec can't decode byte 0x93 in position 0: invalid start byte
I also tried passing the encoding as ascii and utf-8, but nothing worked.
Try passing encoding='cp1252' instead. Make sure to swap out 'Documents\Book1.csv' with whatever your filepath to the file is below:
import pandas as pd
df = pd.read_csv(r'Documents\Book1.csv', encoding='cp1252')
df
“CUST_CODE”
0 “CST001001”
1 “CST000235”
2 “CST010231”
3 “CST010235”
4 “CST010231”
5 “CST010235”
6 “CST010231”
7 “CST040015”
Here is the Wikipedia article with more information about that encoding: https://en.wikipedia.org/wiki/Windows-1252 . A quote from the article:
"...common result was that all the quotes and apostrophes (produced by "smart quotes" in word-processing software) were replaced with question marks or boxes on non-Windows operating systems, making text difficult to read."
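The error can be reproduced without pandas. The sketch below uses only the standard library; the sample bytes are an assumption based on the error message (0x93 is the cp1252 opening smart quote, and the closing 0x94 and the \r\n line endings are guessed). It shows why utf-8 fails, why cp1252 works, and one way to drop the smart quotes afterwards.

```python
# Assumed raw bytes of the first two rows of the file.
raw = b"\x93CUST_CODE\x94\r\n\x93CST001001\x94\r\n"

# 0x93 cannot begin a UTF-8 sequence, so decoding as utf-8 fails.
try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc.reason
print("utf-8 failed:", utf8_error)

# In cp1252, 0x93 and 0x94 are the left and right double quotation marks.
text = raw.decode("cp1252")

# Strip the smart quotes (U+201C and U+201D) from both ends of each value.
codes = [line.strip("\u201c\u201d") for line in text.splitlines()]
print(codes)
```

With pandas, the same cleanup could be done after reading by stripping those two characters from the column name and values; pandas does not treat smart quotes as CSV quote characters, which is why they remain in the DataFrame shown above.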
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I have a very big text file (1 GB) and I see that there are a few places where the http url field contains a space.
For example, in the lines below there is a space in "brad pitt" and in "[30 wet=]". They should be changed to "bradpitt" and "[30wet=]", but the spaces can occur in any url or trim_url. I am currently finding these places with my program and then fixing them manually in vim. Is there a way to do it with awk/sed?
0.0 q:hello url:http://sapient.com/bapper/30/brad pitt/C345/surf trim_url:http://sapient.com/bapper/30/brad pitt/C345 rating:good
0.0 q:hello url:http://sick.com/bright/[30 wet=]/sound trim_url:http://sick.com/bright/[30 wet=]rating:good
What I tried to do was sed:
sed -i -e 's/*http*[:space:]*/*http*/g' test.txt
Using perl and a proper module to URI encode the URL:
perl -MURI::Escape -pe 's!(https?://)(.*)!$1 . uri_escape($2)!e' file
You can even replace the file in place with the -i switch (just like sed): perl -MURI::Escape -i -pe [...]
Output
0.0 q:hello url:http://sapient.com%2Fbapper%2F30%2Fbrad%20pitt%2FC345%2Fsurf%20trim_url%3Ahttp%3A%2F%2Fsapient.com%2Fbapper%2F30%2Fbrad%20pitt%2FC345%20rating%3Agood
0.0 q:hello url:http://sick.com%2Fbright%2F%5B30%20wet%3D%5D%2Fsound%20trim_url%3Ahttp%3A%2F%2Fsick.com%2Fbright%2F%5B30%20wet%3D%5Drating%3Agood
URI::Escape - Percent-encode and percent-decode unsafe characters
Note
As msanford said in the comments, spaces in a URL are meaningful. You can't simply cut them out without breaking the link and making the resource unreachable.
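The Perl one-liner encodes everything after http://, including the slashes and the fields that follow. As an alternative, here is a hedged Python sketch that touches only the spaces inside url: and trim_url: values. It assumes a value ends at the next whitespace followed by a key: token, or at the end of the line; that rule is a guess from the two sample lines, not something the question states. The function name fix_line is made up.

```python
import re

# A url/trim_url value runs lazily up to the next " key:" token or end of line.
FIELD = re.compile(r"\b(trim_url|url):(.*?)(?=\s+\w+:|$)")


def fix_line(line: str, replacement: str = "%20") -> str:
    """Replace spaces inside url and trim_url values with `replacement`."""
    def repl(m: "re.Match[str]") -> str:
        return m.group(1) + ":" + m.group(2).replace(" ", replacement)
    return FIELD.sub(repl, line)


line = ("0.0 q:hello url:http://sapient.com/bapper/30/brad pitt/C345/surf "
        "trim_url:http://sapient.com/bapper/30/brad pitt/C345 rating:good")

encoded = fix_line(line)        # percent-encode the spaces
removed = fix_line(line, "")    # or delete them, as the question asked
print(encoded)
print(removed)
```

For a 1 GB file, the function can be applied line by line while reading from one file and writing to another, so the whole file never has to be in memory. Percent-encoding keeps the URL pointing at the same resource, whereas deleting the space changes it.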
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 4 years ago.
Could somebody give me an example (e.g. input -> output) of what this block does? An explanation would also be appreciated.
From the official documentation (which, if your GNU Radio build is intact, you can also access from the documentation tab of your block properties in GRC):
Convert a stream of packed bytes or shorts to stream of unpacked bytes or shorts.
input: stream of unsigned char; output: stream of unsigned char
This is the inverse of gr::blocks::unpacked_to_packed_XX.
The bits in the bytes or shorts input stream are grouped into chunks of bits_per_chunk bits and each resulting chunk is written right-justified to the output stream of bytes or shorts. All 8 or 16 bits of each input byte or short are processed. The right thing is done if bits_per_chunk is not a power of two.
The combination of gr::blocks::packed_to_unpacked_XX_ followed by gr_chunks_to_symbols_Xf or gr_chunks_to_symbols_Xc handles the general case of mapping from a stream of bytes or shorts into arbitrary float or complex symbols.
So you get a byte in, consisting of 8 bits, and you produce bytes out, each one carrying bits_per_chunk bits of the input, right-justified. Example (let bits_per_chunk=1, MSB first):
in 0b11110000
out 0b00000001 0b00000001 0b00000001 0b00000001 0b00000000 0b00000000 0b00000000 0b00000000
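The same mapping can be sketched in plain Python. This is an illustration of the behaviour described in the documentation excerpt, not GNU Radio's implementation: it handles byte input in MSB-first order only, and it drops any leftover bits at the end that do not fill a whole chunk, which is an assumption of this sketch. The function name is made up.

```python
from typing import List


def packed_to_unpacked_msb(data: bytes, bits_per_chunk: int) -> List[int]:
    """Split packed bytes into right-justified chunks of bits_per_chunk bits."""
    if not 1 <= bits_per_chunk <= 8:
        raise ValueError("bits_per_chunk must be between 1 and 8")

    # Flatten the input into a list of single bits, most significant bit first.
    bits = []
    for byte in data:
        for shift in range(7, -1, -1):
            bits.append((byte >> shift) & 1)

    # Regroup the bit stream; each chunk lands in the low bits of one output value.
    out = []
    for i in range(0, len(bits) - bits_per_chunk + 1, bits_per_chunk):
        value = 0
        for bit in bits[i:i + bits_per_chunk]:
            value = (value << 1) | bit
        out.append(value)
    return out


one_bit = packed_to_unpacked_msb(b"\xf0", 1)
two_bits = packed_to_unpacked_msb(b"\xf0", 2)
print(one_bit)
print(two_bits)
```

With bits_per_chunk=1 this reproduces the in/out example above; with bits_per_chunk=2 the same input byte yields four output bytes, each holding a two-bit value.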
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
I would like to create an input mask which looks like this C-HG__.
But because C represents an optional character or space in masking (VB.NET), it won't let me.
Please assist.
Try using the escape element: \
MSDN has a fairly nice write-up. Here's an excerpt:
\
Escape. Escapes a mask character, turning it into a literal. "\\" is the escape sequence for a backslash.
Possible duplicate of this question and/or this question.