AIX: remove the last symbols (CRLF) from a file

There is a large file where the last characters are \r\n. I need to remove them. It seems to be equivalent to removing the last line(?).
UPD: no, it's not: the file has only one line, which ends with \r\n.
I know two ways, but neither works on AIX:
sed 's/\r\n$//' file # doesn't work: sed strips the newline from each line before matching, so \n$ can never match
head -c-2 # doesn't work: AIX head doesn't accept negative counts
Is there any solution for AIX? A lot of large files must be processed, so performance is important.

Usually, if I need to edit a file in place from a script, I use ed, for historical reasons. For example:
ed - /tmp/foo.txt <<EOF
g/^$/d
w
q
EOF
ed is more than a bit cantankerous. Note also that this does not remove just the empty lines at the bottom of the file, but all of the empty lines. With ed and some practice you can probably delete only the trailing empty lines: go to the bottom of the file, search up for a non-empty line, then move down a line and delete from that point to the end of the file (see the sketch below). ed command scripts act (pretty much) as you would expect.
Also, if they really do have \r\n, then those are not going to be considered empty lines but rather lines with a control-M (\r) in them. You may need to adjust your pattern if that is the case.
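Following that recipe, a sketch that deletes only the trailing empty lines (untested on AIX, and assuming the trailing lines are truly empty rather than \r-only):
ed - /tmp/foo.txt <<'EOF'
?.?+1,$d
w
q
EOF
?.? searches backwards for the last line containing at least one character, and +1,$d deletes everything after it. The quoted 'EOF' keeps the shell from expanding $d. If the file already ends with a non-empty line, the +1 address is simply out of range and w writes the file back unchanged. Per the note above, \r-only lines are not empty, so the pattern would need to exclude control-M as well.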

My answer https://stackoverflow.com/a/46083912/3220113 to the duplicate question should work here too. Another solution uses awk:
awk ' (NR>1) { print s }                          # print each buffered line one step behind
{ s=$0 }                                          # remember the current line
END { printf("%s", substr(s, 1, length(s)-1)) }   # last line: drop the trailing \r; printf adds no newline
' inputfile
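Since a lot of large files must be processed, rewriting every file just to drop two bytes is wasteful. If Perl is available on your AIX box (an assumption), truncating in place costs almost nothing per file. A sketch:
perl -e '
for my $f (@ARGV) {
    open my $fh, "+<", $f or do { warn "$f: $!\n"; next };
    my $size = -s $fh;
    next if $size < 2;
    seek $fh, $size - 2, 0;                       # inspect the last two bytes
    read $fh, my $tail, 2;
    truncate $fh, $size - 2 if $tail eq "\r\n";   # truncate only if the file really ends in CRLF
}' file1 file2 ...
The check on the last two bytes makes it safe to rerun over files that have already been converted.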

Related

Recursively search directory for occurrences of each string from one column of a .csv file

I have a CSV file--let's call it search.csv--with three columns. For each row, the first column contains a different string. As an example (punctuation of the strings is intentional):
Col 1,Col 2,Col 3
string1,valueA,stringAlpha
string 2,valueB,stringBeta
string'3,valueC,stringGamma
I also have a set of directories contained within one overarching parent directory, each of which have a subdirectory we'll call source, such that the path to source would look like this: ~/parentDirectory/directoryA/source
What I would like to do is search the source subdirectories for any occurrences--in any file--of each of the strings in Col 1 of search.csv. Some of these strings will need to be manually edited, while others can be categorically replaced. I run the following command:
awk -F "," '{print $1}' search.csv | xargs -I# grep -Frli # ~/parentDirectory/*/source/*
What I would want is a list of files that match the criteria described above.
My awk call gets a few hits, then fails with xargs: unterminated quote. There are single quotes in some of the strings in the first column, which I suspect are the problem. The larger issue, however, is that when I did a sanity check on the results I got (which seemed far too few to be right), there was a vast discrepancy. I ran the following:
ag -l "searchTerm" ~/parentDirectory
Where searchTerm is a substring of many (but not all) of the strings in the first column of search.csv. In contrast to my above awk-based approach which returned 11 files before throwing an error, ag found 154 files containing that particular substring.
Additionally, my current approach is too low-resolution even if it didn't error out, in that it doesn't distinguish which results belong to which strings, which would be key to selectively auto-replacing certain strings. Am I mistaken in thinking this should be doable entirely in awk? Any advice would be much appreciated.
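One way to sidestep the quoting problem is to drop xargs, whose quote processing chokes on the single quotes, and read the patterns in a shell loop, labelling each pattern's matches so the results stay attributable. A sketch, assuming Col 1 never contains embedded commas or newlines:
tail -n +2 search.csv | while IFS=, read -r pattern rest; do
    printf '== %s ==\n' "$pattern"                       # label the results for this string
    grep -Frl -- "$pattern" ~/parentDirectory/*/source/  # -F: fixed-string match, so quotes are harmless
done
tail -n +2 skips the header row, and -- protects against strings that begin with a dash.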

Find longest file in the project IntelliJ IDEA

Hello, I want to know a trick or shortcut by which one can find the longest file in a project, i.e. the file with the most lines of code. Is there any shortcut or plugin available?
I believe the OP was asking about the length of the file, not the length of a single line. You can try iterating like this:
(.*\n){100,}
(.*\n){1000,}
(.*\n){10000,}
Although this is kind of hacky, it still works.
You can search your whole project using the regex repetition pattern. Just right-click your project folder in the project structure view and choose "Find in path...". Be sure to check "Regex" in the search window that appears.
So you'll start out matching any line of any length in your project
^.*$
(If you're not familiar with regex: ^ and $ are used to denote the beginning and end of a line and . matches any character)
Then you gradually increase the number of matched repetitions
^.{1,}$
^.{10,}$
^.{100,}$
^.{1000,}$
(You use {start,end} to indicate the interval of repetitions. If you leave end blank, it will match start or more repetitions)
Using this you will soon be left with the longest line(s) in your project.
As I said it's kinda hacky but it's also quick and works if you don't have to automate the task.
Hope this helps you!
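If you do want to automate it, a shell sketch outside the IDE gives the same answer (assuming a POSIX-ish environment and that line count is the measure you want; adjust the -name pattern to your file types):
find . -name '*.java' -exec wc -l {} + | grep -v ' total$' | sort -n | tail -5
wc -l counts lines per file, the grep drops the per-batch total lines that wc emits, and sort -n | tail lists the largest files.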

awk: How can I use awk to determine if lines in one file of my choosing (lines 8-12, for example) are also present anywhere in another file

I have two files, baseline.txt and results.txt. I need to be able to find whether lines in baseline.txt are also in results.txt -- for example, whether lines 8-12 of baseline.txt appear anywhere in results.txt. I need to use awk. Thanks.
Assuming the files are sorted, it looks like comm is more of what you're looking for if you want lines that are present in both files:
comm -12 baseline.txt results.txt
The -12 argument suppresses lines that are unique to baseline.txt and results.txt, respectively, leaving you with only lines that are common to both files ("suppress lines unique to file 1, suppress lines unique to file 2").
If you are dead set on using awk, then perhaps this question can help you.
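For the specific lines 8-12 case, a self-contained awk sketch (assuming exact whole-line matches) could look like this:
awk 'NR==FNR { if (FNR >= 8 && FNR <= 12) want[$0]; next }
$0 in want' baseline.txt results.txt
The first block runs only while reading baseline.txt (NR==FNR) and records lines 8-12; the bare $0 in want pattern then prints every line of results.txt that was recorded. Unlike comm, this needs no sorted input.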

jEdit in hard word-wrap mode: insert comment character automatically?

Probably quite a niche question, but I believe in the power of a big community: Is it possible to set up jEdit in such a way that it automatically inserts a comment character (//, #, ... depending on the edit mode) at the beginning of a new line, if the line before the wrap was a comment?
Sample:
# This is a comment spanning multiple lines. If I continue to type here, it
# wraps around automatically, but I have to manually add a `#` to each line.
If I continue to type after the ., the third line should start with the # automatically. I searched the plugin repository but could not find anything related.
Background: jEdit has the concept of soft and hard wrap. While soft wrap only breaks lines visually at a character limit and does not insert line breaks into the file, hard wrap inserts \n into the file at the desired character count.
This is not exactly what you want: I use the macro Enter_with_Prefix.bsh to automatically insert the prefix (e.g., #, //) at the beginning of the new line.
Description copied from Enter_with_Prefix.bsh:
Enter_with_Prefix.bsh - a Beanshell macro for jEdit
that starts a new line continuing any recognized
sequence that started the previous. For example,
if the previous line begins with "1." the next will
be prefixed with "2.". It supports alpha lists (a., b., etc...),
bullet lists (+, =, *, etc..), comments, Javadocs,
Java import statements, e-mail replies (>, |, :),
and is easy to extend with new sequence types. Suggested
shortcut for this macro is S+ENTER (SHIFT+ENTER).

Is there a tool to clean the output of the script(1) tool?

script(1) is a tool for keeping a record of an interactive terminal session; by default it writes to the file transcript. My problem is that I use ksh93, which has readline features, and so the transcript is mucked up with all sorts of terminal escape sequences and it can be very difficult to reconstruct the command that was actually executed. Not to mention the stray ^M's and the like.
I'm looking for a tool that will read a transcript file written by script, remove all the junk, and reconstruct what the shell thought it was executing, so I have something that shows $PS1 and the commands actually executed. Failing that, I'm looking for suggestions on how to write such a tool, ideally using knowledge from the terminfo database, or failing that, just using ANSI escape sequences.
A cheat that looks in shell history, as long as it really really works, would also be acceptable.
Doesn't cat/more work by default for browsing the transcript? Do you intend to create a script out of the commands actually executed (which in my experience can be dangerous)?
Anyway, 3 years without an answer, so I will give it a shot with an incomplete solution. If you are only interested in the commands actually typed, remove the non-printable characters, then replace the stripped prompt (call it PS1') with something readable and unique, and grep for that unique string. Like this:
$ sed -i 's/[^[:print:]]//g' transcript
$ sed 's/]0;cartman@southpark: ~cartman@southpark:~/CARTMAN/g' transcript | grep CARTMAN
Explanation: After the first sed, PS1' can be taken from one of the first few lines of the transcript file, as is -- PS1' differs from PS1 because its escape sequences are gone -- and replaced with a unique readable string ("CARTMAN" here). Note that the dollar sign at the end of the prompt was left out intentionally.
In the few examples that I tried, this didn't solve everything but took care of most issues.
This is essentially the same question asked recently in Can I programmatically “burn in” ANSI control codes to a file using unix utils? -- removing all nonprinting characters will not fix
embedded escape sequences
backspace/overstriking for underlining
use of carriage-returns for overstriking
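For the overstriking cases, col is the classic tool. A sketch (assuming the embedded escapes are ordinary CSI sequences; anything fancier needs the heavier approaches from that question):
esc=$(printf '\033')
sed "s/${esc}\[[0-9;]*[A-Za-z]//g" transcript | col -b > transcript.clean
The sed strips CSI sequences such as color codes while the ESC bytes are still present, and col -b then resolves backspace and carriage-return overstriking by keeping only the last character written in each column.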