Generate file name from SHA hash

I want to use the output of a SHA hash to generate a filename. Any recommended way to do that? I tried Base64 encoding, but for some input that results in the filename containing forward slashes. Obviously I would prefer a method whose output will never contain characters reserved by file systems. Converting each byte to a two-digit hex number would work, but something that produces shorter output would be preferable.

You could use Base32. Its alphabet contains only letters and digits (A-Z and 2-7 in the RFC 4648 variant), so the output never contains characters reserved by file systems, and it is more compact than hex (8 characters per 5 bytes of input, versus 10 for hex).
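For instance, a minimal sketch in Python (the input string and the choice of SHA-256 are placeholders):
import base64
import hashlib

# Hash the input, then Base32-encode the digest. The RFC 4648 Base32
# alphabet (A-Z, 2-7) is safe in filenames on common file systems.
digest = hashlib.sha256(b"some input").digest()
# Stripping the '=' padding keeps the name shorter; it stays unambiguous
# because every digest has the same fixed length.
filename = base64.b32encode(digest).decode("ascii").rstrip("=")
print(filename)  # 52 characters for a 32-byte digest, versus 64 in hex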

You can use this:
function getfilename(){
    // mt_rand() seeds itself automatically; mixing a random token into
    // uniqid() (which is based on the current time in microseconds)
    // makes collisions very unlikely.
    $token = mt_rand(1, mt_getrandmax());
    $uid = uniqid(md5($token), true);
    return sha1($uid);
}
//create file name
$filename = getfilename();
$filename = substr($filename, 0, 10);
The above code combines the current system time (via uniqid) with mt_rand and MD5 to create a unique file name every time it is run. The final filename is 10 characters long, and you can adjust the substr length to whatever number of characters you want; since sha1 returns hex digits, the result only ever contains 0-9 and a-f.


removing unconventional field separators (^#^#^#) in a text file [duplicate]

I have a text file containing unwanted null characters (ASCII NUL, \0). When I try to view it in vi I see ^# symbols interleaved with the normal text. How can I:
1. Identify which lines in the file contain null characters? I have tried grepping for \0 and \x0, but this did not work.
2. Remove the null characters? Running strings on the file cleaned it up, but I'm wondering whether this is the best way.
I’d use tr:
tr < file-with-nulls -d '\000' > file-without-nulls
In case you are wondering whether input redirection in the middle of the command arguments works: it does. Most shells recognize and handle I/O redirection (<, >, …) anywhere in the command line.
Use the following sed command to remove the null characters in a file:
sed -i 's/\x0//g' null.txt
This solution edits the file in place, which is important if the file is still being used. Passing -i'ext' creates a backup of the original file with the 'ext' suffix added. (Note that the \x0 escape and -i are GNU sed extensions.)
A large number of unwanted NUL characters, say one every other byte, indicates that the file is encoded in UTF-16 and that you should use iconv to convert it to UTF-8.
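For example (assuming the input really is UTF-16):
iconv -f UTF-16 -t UTF-8 file-with-nulls > file-without-nulls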
I discovered the following, which prints out which lines, if any, have null characters:
perl -ne '/\000/ and print;' file-with-nulls
Also, an octal dump can tell you if there are nulls:
od file-with-nulls | grep ' 000'
If the lines in the file end with \r\n\000, what works is to delete the \n\000 and then replace the \r with \n:
tr -d '\n\000' <infile | tr '\r' '\n' >outfile
Here is an example of how to remove NUL characters using ex (in-place):
ex -s +"%s/\%x00//g" -cwq nulls.txt
and for multiple files:
ex -s +'bufdo!%s/\%x00//g' -cxa *.txt
For recursion, you may use the globbing option **/*.txt (if it is supported by your shell).
This is useful for scripting, since sed's -i parameter is a non-standard BSD extension.
See also: How to check if the file is a binary file and read all the files which are not?
I used:
recode UTF-16..UTF-8 <filename>
to get rid of the zeroes in the file.
I faced the same problem with:
import codecs as cd
f = cd.open(filePath, 'r', 'ISO-8859-1')
I solved it by changing the encoding to utf-16:
f = cd.open(filePath, 'r', 'utf-16')
Remove a trailing null character at the end of a PDF file using PHP. This is independent of the OS.
This script uses PHP to remove a trailing NULL value at the end of a binary file, solving a crashing issue that was triggered by the NULL value. You can edit this script to remove all NULL characters, but seeing it done once will help you understand how this works.
Backstory
We were receiving PDFs from a 3rd party that we needed to upload to our system using a PDF library. In the files being sent to us, a null value was sometimes appended to the PDF file. When our system processed these files, the ones with the trailing NULL value caused the system to crash.
Originally we were using sed, but sed behaves differently on Mac and Linux machines. We needed a platform-independent method to strip the trailing null value, and PHP was the best option. Also, it was a PHP application, so it made sense :)
This script performs the following operations:
1. Take the binary file and convert it to hex (binary strings don't explode safely on newlines or carriage returns).
2. Explode the string using the hex CRLF sequence (0d0a) as the delimiter.
3. Pop the last member of the array if its value is the null byte (00).
4. Implode the array using 0d0a as the glue.
5. Convert the hex back to binary and process the file.
//In this case we are getting the file as a string from another application.
//We use this line to read a sample bad file.
$fd = file_get_contents($filename);
//Trim leading and trailing whitespace and convert the binary string into hex.
$bin2hex = trim(bin2hex($fd));
//Create an array using the hex CRLF sequence (0d0a) as the delimiter.
$bin2hex_ex = explode('0d0a', $bin2hex);
//Look at the last element; if it is equal to 00 (a NUL byte), pop it off.
$end = end($bin2hex_ex);
if ($end === '00') {
    array_pop($bin2hex_ex);
}
//Implode the array using the hex CRLF sequence as the glue.
$bin2hex = implode('0d0a', $bin2hex_ex);
//The new string no longer has the null character at the EOF.
$fd = hex2bin($bin2hex);

How to get text bytes used by a string in Hive?

I have some data in a Hive 1.2.1 table and need to get the raw bytes of a specific column. The column data is raw HTML in multiple languages. To get the length in characters, I can use a simple query like the one below:
select baseurl, LENGTH(content) from clss limit 30;
The query above is fine for character length, but for text in languages other than English the value is not what I need: a character in Arabic, for example, is stored as multi-byte Unicode, so the byte count differs from the character count. Some characters take two bytes and some take a single byte.
Is there any built-in function that returns the number of bytes of a text value instead of the number of characters?
The function character_length(string str) was added in Jira HIVE-15979, which lists 2.3.0 under "Fix versions". (Newer Hive versions also provide octet_length(string str), which returns the length in bytes and is what you actually want here.) If you cannot upgrade your Hive (and upgrading is quite risky), then try downloading the UDF source code and building it, then add the jar and create a temporary function, as sketched below.
Download code: GenericUDFCharacterLength.java
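A rough sketch of the build-it-yourself route (the jar path is a placeholder, and the fully-qualified class name is an assumption based on the linked source file):
-- Register the jar you built, expose the UDF under a temporary name, and use it.
ADD JAR /path/to/your-udf.jar;
CREATE TEMPORARY FUNCTION character_length AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFCharacterLength';
SELECT baseurl, character_length(content) FROM clss LIMIT 30;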

Null char returning from reading a file in Common Lisp

I’m reading files and storing them as a string using this function:
(defun file-to-str (path)
  (with-open-file (stream path) :external-format 'utf-8
    (let ((data (make-string (file-length stream))))
      (read-sequence data stream)
      data)))
If the file has only ASCII characters, I get the content of the file as expected; but if there are characters beyond 127, I get a null character (^#) at the end of the string for each such character. So, after $ echo "~a^?" > ~/teste I get
CL-USER> (file-to-string "~/teste")
"~a^?
"
but after echo "aaa§§§" > ~/teste, the REPL gives me
CL-USER> (file-to-string "~/teste")
"aaa§§§
^#^#^#"
and so forth. How can I fix this? I'm using SBCL 1.4.0 in a UTF-8 locale.
First of all, your keyword argument :external-format is misplaced and has no effect; it should be inside the parentheses with stream and path. However, this does not change the end result, as UTF-8 is the default encoding.
The actual problem is that UTF-8 encodes different characters with different numbers of bytes. ASCII characters all encode to single bytes, but other characters take 2-4 bytes. You are allocating, in your string, room for every byte of the input file, not for every character in it. The unused tail of the string is never overwritten; make-string initialized it (in SBCL) with null characters, which display as ^#.
The read-sequence function returns the index of the first element not changed by the function. You are currently discarding this information, but you should use it to trim your buffer once you know how many elements were actually used:
(defun file-to-str (path)
  (with-open-file (stream path :external-format :utf-8)
    ;; file-length counts octets, which can only overestimate the number
    ;; of decoded characters, so the buffer is always large enough.
    (let* ((data (make-string (file-length stream)))
           (used (read-sequence data stream)))
      ;; Return only the part of the buffer that was actually filled.
      (subseq data 0 used))))
This is safe, as the length of the file in bytes is always greater than or equal to the number of UTF-8 characters encoded in it. However, it is not terribly efficient: it allocates an unnecessarily large buffer, and it finally copies the whole output into a new string just to return it.
While this is fine for a learning experiment, for real-world use cases I recommend the Alexandria utility library that has a ready-made function for this:
* (ql:quickload "alexandria")
To load "alexandria":
Load 1 ASDF system:
alexandria
; Loading "alexandria"
* (alexandria:read-file-into-string "~/teste")
"aaa§§§
"
*

Binary file output for fixed length string

I am trying to write a binary file which also contains a string that I want to have a fixed length in VB.NET. I tried LSet and PadLeft; in the debugger the value returned is correct, but in the output file the first character before the string is the fixed length I specified. Why does the binary writer write the additional character?
I found out that if you don't want or need the length prefix, you can call Write with a Char array instead of a String. BinaryWriter.Write(String) prefixes the string with its length (as a 7-bit-encoded integer), which is the extra leading character you are seeing.
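A minimal sketch of that approach (the file name, the field width of 20, and the padding character are placeholders):
' Pad the string to the fixed width, then write the raw characters.
' Write(Char()) emits no length prefix, unlike Write(String); note the
' characters are still passed through the writer's encoding (UTF-8 by
' default), so non-ASCII characters can change the byte count.
Using writer As New IO.BinaryWriter(IO.File.Open("out.bin", IO.FileMode.Create))
    Dim fixed As String = "hello".PadRight(20, " "c)
    writer.Write(fixed.ToCharArray())
End Using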

How does %NNN$hhn work in a format string?

I am trying out a classic format string vulnerability. I want to know how exactly the following format string works:
"%NNN$hhn" where 'N' is any number.
E.g: printf("%144$hhn",....);
How does it work, and how do I use this to overwrite any address I want with an arbitrary value?
It's a POSIX extension (not found in C99) that simply allows you to select which argument from the argument list to use as the source of the data.
With regular printf, each % format specifier grabs the current argument from the list and advances the "pointer" to the next one. That means if you want to print a single value in two different ways, you need something like:
printf ("%c %d\n", chVal, chVal);
By using positional specifiers, you can do this as:
printf ("%1$c %1$d\n", chVal);
because both format strings will use the first argument as their source.
Another example from the Wikipedia page is:
printf ("%2$d %2$#x; %1$d %1$#x",16,17);
which will give you the output:
17 0x11; 16 0x10
It basically allows you to disconnect the order of the format specifiers from the provided values, letting you bounce around the argument list in any way you want, using the values over and over again, in any arbitrary order.
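The hhn part is separate from the positional syntax: %n stores the number of characters printed so far into the int pointed to by the corresponding argument, and the hh length modifier narrows that store to a single byte (a signed char). A minimal sketch:
#include <stdio.h>

int main(void) {
    signed char count = 0;
    /* Five characters ("hello") are printed before the %hhn conversion,
       so 5 is stored into count as a single byte. */
    printf("hello%hhn\n", &count);
    printf("count = %d\n", count); /* prints: count = 5 */
    return 0;
}
Combined with a positional specifier, %144$hhn stores that one-byte count through whatever pointer happens to be the 144th argument.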
Now, whether you can use the positional feature itself as a user attack vector, I'm doubtful, since it only adds a means for the programmer to change the source of the data, not where the data is sent.
It's no less secure than regular printf, and I can see no real vulnerability unless you have the power to change the format string somehow. But if you could do that, regular printf would be just as open to abuse.