Is there any limitation in giving file name in Unix? - variables

We are using crontab to schedule jobs, and it was not picking up files for processing whose names contain [ or ] or ¿. Is there any limitation on file names, or do these characters mean something in UNIX? Are there any other characters like these that we shouldn't use in file names? Thanks in advance.

Following are general rules for both Linux, and Unix (including *BSD) like systems:
All file names are case sensitive, so vivek.txt, Vivek.txt and VIVEK.txt are three different files.
You can use upper and lowercase letters, numbers, "." (dot), and "_" (underscore) symbols.
You can use other special characters such as blank space, but they are hard to use and it is better to avoid them.
In short, filenames may contain any character except / (root directory), which is reserved as the separator between files and directories in a pathname. You cannot use the null character.
There is no need to use . (dot) in a filename, although a dot sometimes improves the readability of filenames.
And you can use dot based filename extension to identify file. For example:
.sh = Shell file
.tar.gz = Compressed archive
Most modern Linux and UNIX file systems limit a filename to 255 bytes (255 characters in a single-byte encoding). However, some older versions of UNIX limited filenames to 14 characters.
A filename must be unique inside its directory. For example, inside /home/vivek directory you cannot create a demo.txt file and demo.txt directory name. However, other directory may have files with the same names. For example, you can create demo.txt directory in /tmp.
Linux / UNIX: Reserved Characters And Words
Avoid using the following characters in file names:
/
>
<
|
:
&
Please note that Linux and UNIX allow white space, <, >, |, \, :, (, ), &, ;, as well as wildcards such as ? and *, in file names, provided they are quoted or escaped with the \ symbol.
It is best to avoid white space in your filenames; it will make scripting a lot easier.
I got the answer from this link. I am just pasting it here so that this info will be available even if that website goes down.

The only characters that are actually illegal in *nix filenames are / (reserved as the directory separator) and NUL (because it's the C string terminator). Everything else is fair game, although various utilities may fail on certain characters - typically characters that have special meaning to the shell. These will need quoting or escaping to be handled correctly.
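The rule above can be sketched in a few lines of Python. This is only an illustration: the function name is made up for this example, the 255-byte cap is the common limit mentioned earlier rather than a guarantee for every file system, and shlex.quote is used to show how a legal but shell-hostile name can be passed safely to a shell command.

```python
import shlex


def is_valid_posix_filename(name: str) -> bool:
    """Return True if `name` could be a single path component on a POSIX system."""
    if name in ("", ".", ".."):
        return False  # empty, or the reserved directory entries
    if "/" in name or "\0" in name:
        return False  # the only two forbidden characters
    # Assumed limit: most file systems cap one component at 255 bytes.
    return len(name.encode("utf-8")) <= 255


# Characters such as [ ] or spaces are legal, but need quoting in a shell.
name = "report [final].txt"
print(is_valid_posix_filename(name))  # True
print(shlex.quote(name))              # 'report [final].txt'
```

A cron job that builds a command line from such names would need this kind of quoting; otherwise the shell treats [ and ] as a glob pattern.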

Related

Base64 Encoded String for Filename

I can't think of an OS (Linux, Windows, Unix) where this would cause an issue, but maybe someone here can tell me if this approach is undesirable.
I would like to use a base64 encoded string as a filename. Something like gH9JZDP3+UEXeZz3+ng7Lw==. Is this likely to cause issues anywhere?
Edit: I will likely keep this to a max of 24 characters
Edit: It looks like I have a character that will cause issues. The function that generates my string is producing strings like: J2db3/pULejEdNiB+wZRow==
You will notice that this has a / which is going to cause issues.
According to this site the / is a valid base64 character so I will not be able to use a base64 encoded string for a filename.
No. You can not use a base64 encoded string for a filename. This is because the / character is valid for base64 strings which will cause issues with file systems.
https://base64.guru/learn/base64-characters
Alternatives:
You could use base64 and then replace unwanted characters but a better option would be to hex encode your original string using a function like bin2hex().
The official RFC 4648 states:
An alternative alphabet has been suggested that would use "~" as the 63rd character. Since the "~" character has special meaning in some file system environments, the encoding described in this section is recommended instead. The remaining unreserved URI character is ".", but some file system environments do not permit multiple "." in a filename, thus making the "." character unattractive as well.
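The encoding that RFC 4648 recommends in that section is the "URL and filename safe" alphabet, which uses - and _ in place of + and /. As a rough sketch in Python (the two function names are invented for this example), both that variant and the hex alternative look like this:

```python
import base64


def filename_safe_b64(data: bytes) -> str:
    """Encode with the RFC 4648 section 5 alphabet: '-' and '_' replace '+' and '/'."""
    return base64.urlsafe_b64encode(data).decode("ascii")


def filename_hex(data: bytes) -> str:
    """Hex-encode the bytes, the Python counterpart of PHP's bin2hex()."""
    return data.hex()


# The problematic string from the question, decoded back to its 16 raw bytes.
raw = base64.b64decode("J2db3/pULejEdNiB+wZRow==")
print(filename_safe_b64(raw))  # J2db3_pULejEdNiB-wZRow==
print(filename_hex(raw))       # 32 hex digits
```

The = padding is kept here; it is harmless in filenames on the systems discussed, but it can be stripped if the length of the input is fixed. Note that base64 names differ only by case in some positions, which can collide on case-insensitive file systems; hex avoids that at the cost of longer names.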
I also found this on the Server Fault Stack Exchange site:
There is no such thing as a "Unix" filesystem. Nor a "Windows" filesystem come to that. Do you mean NTFS, FAT16, FAT32, ext2, ext3, ext4, etc. Each have their own limitations on valid characters in names.
Also, your question title and question refer to two totally different concepts? Do you want to know about the subset of legal characters, or do you want to know what wildcard characters can be used in both systems?
http://en.wikipedia.org/wiki/Ext3 states "all bytes except NULL and '/'" are allowed in filenames.
http://msdn.microsoft.com/en-us/library/aa365247(VS.85).aspx describes the generic case for valid filenames "regardless of the filesystem". In particular, the following characters are reserved < > : " / \ | ? *
Windows also places restrictions on not using device names for files: CON, PRN, AUX, NUL, COM1, COM2, COM3, etc.
Most commands in Windows and Unix-based operating systems accept * as a wildcard, and both Windows and Unix shells use ? as the single-character wildcard.
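To make the Windows restrictions quoted above concrete, here is a small Python sketch of a portability check. The function and constant names are invented for this example; the COM1-COM9 and LPT1-LPT9 ranges are filled in from Microsoft's naming guidance, since the quoted text only says "etc.", and the check does not cover every rule (for instance, names ending in a space or a period).

```python
# Reserved characters listed in the Microsoft naming guidance quoted above.
WINDOWS_RESERVED_CHARS = set('<>:"/\\|?*')

# Device names; the COM and LPT ranges are an assumption based on that guidance.
WINDOWS_DEVICE_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def is_portable_filename(name: str) -> bool:
    """Return True if `name` avoids both the POSIX and the Windows restrictions."""
    if not name or "\0" in name:
        return False
    if any(ch in WINDOWS_RESERVED_CHARS or ord(ch) < 32 for ch in name):
        return False
    # Windows also rejects a device name followed by an extension, e.g. "NUL.txt".
    stem = name.split(".", 1)[0].upper()
    return stem not in WINDOWS_DEVICE_NAMES
```

With this check, the base64 string from the question fails because of its / character, while the same value with / replaced by _ passes.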
And this other one:
Base64 only contains A–Z, a–z, 0–9, +, / and =. So the list of characters not to be used is: all possible characters minus the ones mentioned above.
For special purposes . and _ are possible, too.
Which means that instead of the standard / base64 character, you should use _ or .; both on UNIX and Windows.
Many programming languages allow you to replace all / with _ or ., as it's only a single character and can be accomplished with a simple loop.
In Windows, you should be fine as long as you conform to the Windows naming conventions:
https://learn.microsoft.com/en-us/windows/win32/fileio/naming-a-file#naming-conventions.
As far as I know, apart from /, a base64-encoded string does not contain any of the reserved characters.
The thing that is probably going to be a problem is the length of the file name.

removing unconventional field separators (^@^@^@) in a text file [duplicate]

I have a text file containing unwanted null characters (ASCII NUL, \0). When I try to view it in vi I see ^@ symbols, interleaved in normal text. How can I:
Identify which lines in the file contain null characters? I have tried grepping for \0 and \x0, but this did not work.
Remove the null characters? Running strings on the file cleaned it up, but I'm just wondering if this is the best way?
I’d use tr:
tr < file-with-nulls -d '\000' > file-without-nulls
If you are wondering if input redirection in the middle of the command arguments works, it does. Most shells will recognize and deal with I/O redirection (<, >, …) anywhere in the command line, actually.
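For comparison, the same clean-up can be sketched in Python when a script is more convenient than a shell pipeline. The function names are made up for this example, and the file variant reads the whole file into memory, so it is only suitable for files of moderate size.

```python
def strip_nul_bytes(data: bytes) -> bytes:
    """Return `data` with every NUL (0x00) byte removed, like tr -d '\\000'."""
    return data.replace(b"\x00", b"")


def strip_nul_file(src: str, dst: str) -> None:
    """Copy `src` to `dst` without NUL bytes (reads the whole file into memory)."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fout.write(strip_nul_bytes(fin.read()))
```

Working in binary mode matters here: opening the file as text could fail or alter bytes before the NULs are removed.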
Use the following sed command for removing the null characters in a file.
sed -i 's/\x0//g' null.txt
This solution edits the file in place, which is important if the file is still being used. Passing -i'ext' creates a backup of the original file with the 'ext' suffix added.
A large number of unwanted NUL characters, say one every other byte, indicates that the file is encoded in UTF-16 and that you should use iconv to convert it to UTF-8.
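A quick way to see why UTF-16 text looks full of NULs, and what the conversion does, is the following Python sketch. The function name is invented for this example, and it only recognises UTF-16 by its byte-order mark; a file without a BOM would need its encoding stated explicitly, as with iconv.

```python
import codecs


def decode_possible_utf16(data: bytes) -> str:
    """Decode as UTF-16 when a byte-order mark is present, otherwise as UTF-8."""
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return data.decode("utf-16")  # the codec consumes the BOM
    return data.decode("utf-8")


sample = "hello\n".encode("utf-16")  # BOM plus two bytes per character
print(sample.count(b"\x00"))         # 6: one NUL for each ASCII character
print(repr(decode_possible_utf16(sample)))
```

Deleting the NULs from such a file happens to work for plain ASCII content, but it leaves the BOM behind and corrupts any non-ASCII characters, which is why converting the encoding is the safer fix.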
I discovered the following, which prints out which lines, if any, have null characters:
perl -ne '/\000/ and print;' file-with-nulls
Also, an octal dump can tell you if there are nulls:
od file-with-nulls | grep ' 000'
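If line numbers are more useful than the lines themselves, a minimal Python sketch along the same lines could look like this. The function name is hypothetical, and lines are split on \n only, as a binary read of the file does.

```python
def lines_with_nul(path: str) -> list[int]:
    """Return the 1-based numbers of the lines in `path` that contain a NUL byte."""
    hits = []
    with open(path, "rb") as fh:  # binary mode so NUL bytes are not altered
        for number, line in enumerate(fh, start=1):
            if b"\x00" in line:
                hits.append(number)
    return hits
```

An empty list means the file contains no NUL bytes at all.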
If the lines in the file end with \r\n\000 then what works is to delete the \n\000 then replace the \r with \n.
tr -d '\n\000' <infile | tr '\r' '\n' >outfile
Here is an example of how to remove NUL characters using ex (in-place):
ex -s +"%s/\%x00//g" -cwq nulls.txt
and for multiple files:
ex -s +'bufdo!%s/\%x00//g' -cxa *.txt
To process files recursively, you may use the globbing pattern **/*.txt (if it is supported by your shell).
This is useful for scripting, since sed's -i option is a non-standard extension that GNU and BSD sed implement differently.
See also: How to check if the file is a binary file and read all the files which are not?
I used:
recode UTF-16..UTF-8 <filename>
to get rid of zeroes in file.
I faced the same error with:
import codecs as cd
f=cd.open(filePath,'r','ISO-8859-1')
I solved the problem by changing the encoding to utf-16
f=cd.open(filePath,'r','utf-16')
Remove a trailing null character at the end of a PDF file using PHP. This is independent of the OS.
This script uses PHP to remove a trailing NULL value at the end of a binary file, solving a crashing issue that was triggered by the NULL value. You can edit this script to remove all NULL characters, but seeing it done once will help you understand how this works.
Backstory
We were receiving PDF's from a 3rd party that we needed to upload to our system using a PDF library. In the files being sent to us, there was a null value that was sometimes being appended to the PDF file. When our system processed these files, files that had the trailing NULL value caused the system to crash.
Originally we were using sed, but sed behaves differently on Macs and Linux machines. We needed a platform-independent method to remove the trailing null value. PHP was the best option. Also, it was a PHP application, so it made sense :)
This script performs the following operation:
Take the binary file, convert it to hex (binary data does not split cleanly on newlines or carriage returns), explode the string using CRLF (0d0a) as the delimiter, pop the last member of the array if its value is null (00), implode the array using CRLF as the glue, then process the file.
// In this case we are getting the file as a string from another application.
// We use this line to get a sample bad file.
$fd = file_get_contents($filename);
// Convert the string into hex (hex digits contain no whitespace, so no trim is needed)
$bin2hex = bin2hex($fd);
// Create an array using CRLF (0d0a) as the delimiter
$bin2hex_ex = explode('0d0a', $bin2hex);
// Look at the last element; if it is equal to 00 we pop it off
$end = end($bin2hex_ex);
if ($end === '00') {
    array_pop($bin2hex_ex);
}
// Implode the array using CRLF as the glue
$bin2hex = implode('0d0a', $bin2hex_ex);
// The new string no longer has the null character at the EOF
$fd = hex2bin($bin2hex);
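For comparison, here is a sketch of the same idea without the hex round trip, written in Python. It is not the script above: the function name is invented, the sample bytes are a made-up stand-in for a real PDF, and unlike the PHP version it keeps the final CRLF and removes any number of trailing NUL bytes. Working on the raw bytes also avoids matching 0d0a across a byte boundary in the hex string.

```python
def strip_trailing_nuls(data: bytes) -> bytes:
    """Return `data` without any NUL (0x00) bytes at its very end."""
    return data.rstrip(b"\x00")


pdf = b"%PDF-1.4\r\n%%EOF\r\n\x00"  # made-up stand-in for a real PDF
print(strip_trailing_nuls(pdf))     # b'%PDF-1.4\r\n%%EOF\r\n'
```

NUL bytes in the middle of the data are left alone, which matters for PDFs because their binary streams can legitimately contain them.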

What is the meaning of the file names flanked by the '#' sign and how can I remove them?

When I do the 'ls' command in the terminal on my Raspberry Pi 2, I see different types of names of files, some like "#example.cpp#", as well as others like "homework1.cpp~".
What do these two file types mean, and how can I get rid of them? Simply using the 'rm' command doesn't seem to be working for me. Thanks!
Some applications will create a copy of a file and use special characters when creating the filename for the copy. For instance some text editors will make a copy of a file you are starting to edit by using the same name and adding a tilde character (~) to the end of the file. That way you will have a backup of the file that you are about to edit.
Another reason would be if an application processes the file into a temporary file, which is then used for the next step.
I am not familiar with the Raspberry Pi, so I am not sure what is creating the filenames with the pound sign (#) at the front and back. A common source of names like #example.cpp# is Emacs, which names its auto-save files that way; a compiler is unlikely to produce them. I am fairly sure the files with the tilde character appended to the file name are backup files from a text editor (Emacs writes them by default, and vim can be configured to), containing a copy of the file as it was when it was last opened in the editor.
One thing that you could do is to look in those files to see what is there using a Linux command or a text editor. If you use a text editor I would copy the file to another folder as a back up and then look at it there.
Edit: Someone just posted and then deleted an answer that also mentioned how to remove these files.
As I read it, the rm command is used; however, for some kinds of special characters you need to put quotes around the name, and you may also need to escape certain special characters.
The command shell reads the command line you type in and makes changes to the text before passing it on to the command you type in. So if the filename has a space in it, say jj Johny then when you remove the file you have to specify rm "jj Johny" since spaces are used by the command processor to separate out arguments.
The other poster mentioned that you had to escape out the pound sign (#) using the back slash character in order to prevent it from being modified by the command processor.
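One way to sidestep the quoting problem altogether is to delete the files from a script that never passes the names through a shell. The following Python sketch assumes the leftovers follow the two patterns from the question (#name# and name~); the function names are invented for this example, and it only looks at files directly inside the given directory.

```python
from pathlib import Path


def find_editor_leftovers(directory: str) -> list[Path]:
    """List files named like #name# or name~ directly inside `directory`."""
    root = Path(directory)
    leftovers = []
    for entry in root.iterdir():
        name = entry.name
        is_autosave = len(name) > 2 and name.startswith("#") and name.endswith("#")
        is_backup = len(name) > 1 and name.endswith("~")
        if entry.is_file() and (is_autosave or is_backup):
            leftovers.append(entry)
    return sorted(leftovers)


def remove_editor_leftovers(directory: str) -> None:
    """Delete the files found above; no shell is involved, so no quoting is needed."""
    for path in find_editor_leftovers(directory):
        path.unlink()
```

It is worth printing the result of find_editor_leftovers first and checking the list before calling remove_editor_leftovers, since a #name# file may hold unsaved edits.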

Renaming file via UNC path

I need to have my VB.NET program rename a file over the network.
Microsoft says that My.Computer.FileSystem.RenameFile does not work if the file path starts with two backslashes ("\\"). So, what other way is there of doing this? I just need to rename a file in the domain, for instance:
rename("\\domain\1\exemple.txt", "\\domain\1\exemple2.txt")
The second parameter for rename should be just the file name eg:
My.Computer.FileSystem.RenameFile("C:\Test.txt", "SecondTest.txt")
So try changing your code to this:
My.Computer.FileSystem.RenameFile("\\domain\1\exemple.txt", "exemple2.txt")
Note that in VB.NET the backslash is not an escape character inside string literals, so the path can be written as-is. The @ prefix for verbatim strings is C# syntax and is not needed (or valid) here.

handling strings with \n in plain text e-mail

I have a column in my database that contains a string like this:
"Warning set for 7 days.\nCritical Notice - Last Time Machine backup was 118 days ago at 2012-11-16 20:40:52\nLast Time Machine Destination was FreeAgent GoFlex Drive\n\nDefined Destinations:\nDestination Name: FreeAgent GoFlex Drive\nBackup Path: Not Specified\nLatest Backup: 2012-11-17"
I am displaying this data in an e-mail to users. I have been able to format the field in my HTML e-mails perfectly by doing the following:
simple_format(@servicedata.service_exit_details.gsub('\n', '<br>'))
The above code replaces the "\n" with "<br>" tags and simple_format handles the rest.
My issue arises with how to format it properly in the plain text template. Initially I thought I could just output the column: since it contains "\n", I assumed the plain text would interpret it and all would be well. However, this simply outputs the string with "\n" intact, just as displayed above, rather than creating line breaks as desired.
In an attempt to find a way to parse the string so the line breaks are acknowledged, I have tried:
@servicedata.service_exit_details.gsub('\n', '"\r\n"')
@servicedata.service_exit_details.gsub('\n', '\r\n')
raw @servicedata.service_exit_details
markdown(@servicedata.service_exit_details, autolinks: false) # with all the necessary markdown setup
simple_format(@servicedata.service_exit_details.html_safe)
none of which worked.
Can anyone tell me what I'm doing wrong or how I can make this work?
What I want is for the plain text to acknowledge the line breaks and format the string as follows:
Warning set for 7 days.
Critical Notice - Last Time Machine backup was 118 days ago at 2012-11-16 20:40:52
Last Time Machine Destination was FreeAgent GoFlex Drive
Defined Destinations:
Destination Name: FreeAgent GoFlex Drive
Backup Path: Not Specified
Latest Backup: 2012-11-17
I see.
You need to differentiate a literal backslash followed by a letter n as a sequence of two characters, and a LF character (a.k.a. newline) that is usually represented as \n.
You also need to distinguish two different kinds of quoting you're using in Ruby: singles and doubles. Single quotes are literal: the only thing that is interpreted in single quotes specially is the sequence \', to escape a single quote, and the sequence \\, which produces a single backslash. Thus, '\n' is a two-character string of a backslash and a letter n.
Double quotes allow for all kinds of weird things in it: you can use interpolation with #{}, and you can insert special characters by escape sequences: so "\n" is a string containing the LF control character.
Now, in your database you seem to have the former (backslash and n), as hinted by two pieces of evidence: the fact that you're seeing literal backslash and n when you print it, and the fact that gsub finds a '\n'. What you need to do is replace the useless backslash-and-n with the actual line separator characters.
@servicedata.service_exit_details.gsub('\n', "\r\n")
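The same distinction exists in most languages. As an illustration outside Ruby, here is a Python sketch in which a raw string plays the role of Ruby's single-quoted '\n' and an ordinary string plays the role of the double-quoted "\r\n". The function name is made up for this example, and the sample text is a fragment of the string from the question.

```python
def expand_literal_newlines(text: str) -> str:
    """Replace the two-character sequence backslash + n with CRLF."""
    # r"\n" is a raw string: a backslash followed by the letter n.
    # "\r\n" is an ordinary string: carriage return plus line feed.
    return text.replace(r"\n", "\r\n")


stored = r"Backup Path: Not Specified\nLatest Backup: 2012-11-17"
print(len(r"\n"), len("\n"))  # 2 1
print(expand_literal_newlines(stored))
```

The printed result spans two lines, which is the behaviour the plain text template needs.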