Groovy withWriter method with explicitly given character encoding destroys specific chars

I read a file encoded in Windows-1250. I read each line into a list, do some append operations, and store the collection in a new file.
The problem: if I specify the encoding explicitly, the output file appears to be badly encoded. If I do not set any encoding, the output is OK.
enrichedFile.withWriter("windows-1250") { out ->
    tempFinalList.each() { line ->
        out.println line
    }
}
=> bad output
enrichedFile.withWriter { out ->
    tempFinalList.each() { line ->
        out.println line
    }
}
=> OK.
FYI: I use it for the Czech language, with letters such as ěščřžýáíé.

I do not see any problem with this approach. The following works for me:
def myFile = new File('./Archive/file.txt')
def tempFinalList = []
//Reading from the file with windows charset
myFile.withReader('windows-1250') { reader ->
    reader.eachLine {
        tempFinalList << it
    }
}
//Appending stuff
tempFinalList << 'a' << 'b'
//Creating a new file
def newFile = new File('./Archive/NewFile.txt')
//Writing to the new file with windows charset
newFile.withWriter('windows-1250') { out ->
    tempFinalList.each { out.writeLine it }
}
newFile.eachLine { println it }
Here, file.txt contains the Czech characters you mentioned.
Output from the last line:
ešcržýáíé
ešcržýáíé
ešcržýáíé
ešcržýáíé
ešcržýáíé
a
b
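To narrow down where the characters get lost, it can help to separate the encoding step from the file handling. Here is a minimal Java sketch (Groovy uses the same JVM charsets) that encodes the letters from the question as windows-1250 and decodes them once with the same charset and once with a different one. The class and method names are made up for illustration, and it assumes the JVM ships the windows-1250 charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetRoundTrip {
    // Encode text with one charset and decode the resulting bytes with another.
    static String roundTrip(String text, Charset writeCharset, Charset readCharset) {
        byte[] bytes = text.getBytes(writeCharset);
        return new String(bytes, readCharset);
    }

    public static void main(String[] args) {
        // Assumption: the windows-1250 charset is available in this JVM.
        Charset cp1250 = Charset.forName("windows-1250");
        // The letters from the question, written as escapes so the
        // encoding of this source file does not matter.
        String czech = "\u011B\u0161\u010D\u0159\u017E\u00FD\u00E1\u00ED\u00E9";

        // Same charset on both sides: the text survives unchanged.
        System.out.println(czech.equals(roundTrip(czech, cp1250, cp1250)));
        // Different charset when reading: some letters come back wrong.
        System.out.println(czech.equals(roundTrip(czech, cp1250, StandardCharsets.ISO_8859_1)));
    }
}
```

If the written file only looks wrong in an editor or on the console, it may be worth checking which encoding that viewer assumes before blaming withWriter.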

Related

Read part of binary file in Kotlin

I want to read the first 100 bytes from a binary file. I will then print them as hex. What is the most concise way to read them?
My current implementation is:
FileInputStream(File("/path/to/file")).use { stream ->
val buffer = ByteArray(100)
stream.read(buffer)
Log.d("EXAMPLE", buffer.joinToString(separator = " ") { "%02x".format(it) })
}

Test file structure in groovy(Spock)

How to test created and expected file tree in groovy(Spock)?
Right now I'm using Set where I specify paths which I expect to get and collecting actual paths in this way:
Set<String> getCreatedFilePaths(String root) {
    Set<String> createFilePaths = new HashSet<>()
    new File(root).eachFileRecurse {
        createFilePaths << it.absolutePath
    }
    return createFilePaths
}
But the readability of the test isn't very good.
Is it possible in Groovy to write the expected paths as a tree and then compare that tree with the actual one?
For example, expected:
region:
    usa:
        new_york.json
        california.json
    europe:
        spain.json
        italy.json
And actual will be converted to this kind of tree.
Not sure if you can do it with the built-in recursive methods. There certainly are powerful ones, but this is standard recursion code you can use:
import groovy.io.FileType
def path = new File("/Users/me/Downloads")
def printTree(File file, Integer level) {
    println " " * level + "${file.name}:"
    // List plain files only; directories are printed by the recursive call below
    file.eachFile(FileType.FILES) {
        println " " * (level + 1) + it.name
    }
    file.eachDir {
        printTree(it, level + 1)
    }
}
printTree(path, 1)
That prints the format you describe.
You can either build your own parser or use Groovy's built-in JSON parser:
package de.scrum_master.stackoverflow
import groovy.json.JsonParserType
import groovy.json.JsonSlurper
import spock.lang.Specification
class FileRecursionTest extends Specification {
    def jsonDirectoryTree = """{
        com : {
            na : {
                tests : [
                    MyBaseIT.groovy
                ]
            },
            twg : {
                sample : {
                    model : [
                        PrimeNumberCalculatorSpec.groovy
                    ]
                }
            }
        },
        de : {
            scrum_master : {
                stackoverflow : [
                    AllowedPasswordsTest.groovy,
                    CarTest.groovy,
                    FileRecursionTest.groovy,
                    {
                        foo : [
                            LoginIT.groovy,
                            LoginModule.groovy,
                            LoginPage.groovy,
                            LoginValidationPage.groovy,
                            User.groovy
                        ]
                    },
                    LuceneTest.groovy
                ],
                testing : [
                    GebTestHelper.groovy,
                    RestartBrowserIT.groovy,
                    SampleGebIT.groovy
                ]
            }
        }
    }"""
    def "Parse directory tree JSON representation"() {
        given:
        def jsonSlurper = new JsonSlurper(type: JsonParserType.LAX)
        def rootDirectory = jsonSlurper.parseText(jsonDirectoryTree)
        expect:
        rootDirectory.de.scrum_master.stackoverflow.contains("CarTest.groovy")
        rootDirectory.com.twg.sample.model.contains("PrimeNumberCalculatorSpec.groovy")
        when:
        def fileList = objectGraphToFileList("src/test/groovy", rootDirectory)
        fileList.each { println it }
        then:
        fileList.size() == 14
        fileList.contains("src/test/groovy/de/scrum_master/stackoverflow/CarTest.groovy")
        fileList.contains("src/test/groovy/com/twg/sample/model/PrimeNumberCalculatorSpec.groovy")
    }
    // Returns path strings (not File objects), so the element type is String
    List<String> objectGraphToFileList(String directoryPath, Object directoryContent) {
        List<String> files = []
        directoryContent.each {
            switch (it) {
                case String:
                    files << directoryPath + "/" + it
                    break
                case Map:
                    files += objectGraphToFileList(directoryPath, it)
                    break
                case Map.Entry:
                    files += objectGraphToFileList(directoryPath + "/" + (it as Map.Entry).key, (it as Map.Entry).value)
                    break
                default:
                    throw new IllegalArgumentException("unexpected directory content value $it")
            }
        }
        files
    }
}
Please note:
I used new JsonSlurper(type: JsonParserType.LAX) in order to avoid having to quote each single String in the JSON structure. If your file names contain spaces or other special characters, you will have to use something like "my file name", though.
In rootDirectory.de.scrum_master.stackoverflow.contains("CarTest.groovy") you can see how you can nicely interact with the parsed JSON object graph in .property syntax. You might like it or not, need it or not.
The recursive method objectGraphToFileList converts the parsed object graph to a list of file paths (if you prefer a set, change it, but File.eachFileRecurse(..) should not yield any duplicates, so the set is not needed).
If you do not like the parentheses etc. in the JSON, you can still build your own parser.
You might want to add another utility method to create a JSON string like the given one from a validated directory structure, so you have less work when writing similar tests.
I modified Bavo Bruylandt's answer to collect the file tree paths and to sort them, so that the order of the files does not matter.
def "check directory structure"() {
    expect:
    String created = getCreatedFilePaths(new File("/tmp/region"))
    String expected = new File("expected.txt").text
    created == expected
}
private String getCreatedFilePaths(File root) {
    List paths = new ArrayList()
    collectFileTree(root, 0, paths)
    return paths.join("\n")
}
// Renamed from printTree so that the recursive call below resolves
private void collectFileTree(File file, Integer level, List paths) {
    paths << ("\t" * level + "${file.name}:")
    file.listFiles().sort { it.name }.each {
        if (it.isFile()) {
            paths << ("\t" * (level + 1) + it.name)
        }
        if (it.isDirectory()) {
            collectFileTree(it, level + 1, paths)
        }
    }
}
The expected tree goes into the expected.txt file, indented with tabs (\t) and with the entries of each directory sorted by name (the code above sorts them), like this:
region:
	europe:
		italy.json
		spain.json
	usa:
		california.json
		new_york.json
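If you already collect the paths into a Set as in the question, another option is to turn that collection into the tree text directly. Below is a rough Java sketch of that idea; PathTree and its methods are hypothetical names, it assumes '/'-separated paths relative to the parent of the root directory, and it cannot tell an empty directory from a file.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Map;
import java.util.TreeMap;

class PathTree {
    // One tree node; the TreeMap keeps the children sorted by name.
    static final class Node {
        final Map<String, Node> children = new TreeMap<>();
    }

    // Render '/'-separated relative paths as a tab-indented tree.
    static String render(Collection<String> paths) {
        Node root = new Node();
        for (String path : paths) {
            Node current = root;
            for (String part : path.split("/")) {
                if (!part.isEmpty()) {
                    current = current.children.computeIfAbsent(part, k -> new Node());
                }
            }
        }
        StringBuilder out = new StringBuilder();
        append(root, 0, out);
        return out.toString();
    }

    private static void append(Node node, int level, StringBuilder out) {
        for (Map.Entry<String, Node> entry : node.children.entrySet()) {
            for (int i = 0; i < level; i++) {
                out.append('\t');
            }
            out.append(entry.getKey());
            // Entries that have children are treated as directories.
            if (!entry.getValue().children.isEmpty()) {
                out.append(':');
            }
            out.append('\n');
            append(entry.getValue(), level + 1, out);
        }
    }

    public static void main(String[] args) {
        System.out.print(render(Arrays.asList(
                "region/usa/new_york.json",
                "region/usa/california.json",
                "region/europe/spain.json",
                "region/europe/italy.json")));
    }
}
```

Because the children live in a TreeMap, the rendered text is sorted by name no matter in which order the file system returns the entries.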

vb.net stream reader reads from a .accdb and .xml file without an error [duplicate]

How can I test whether a file that I'm opening in C# using FileStream is a "text type" file? I would like my program to open any file that is text based, for example, .txt, .html, etc.
But not open such things as .doc or .pdf or .exe, etc.
In general: there is no way to tell.
A text file stored in UTF-16 will likely look like binary if you open it with an 8-bit encoding. Equally someone could save a text file as a .doc (it is a document).
While you could open the file and look at some of the content, all such heuristics will sometimes fail (e.g. Notepad tries to do this; with a careful selection of a few characters, Notepad will guess wrong and display completely different content).
If you have a specific scenario, rather than being able to open and process anything, you should be able to do much better.
I guess you could just check the first 1000 (an arbitrary number) characters and see whether there are unprintable characters, or whether they are all ASCII in a certain range. If the latter, assume that it is text?
Whatever you do is going to be a guess.
As others have pointed out there is no absolute way to be sure. However, to determine if a file is binary (which can be said to be easier than determining if it is text) some implementations check for consecutive NUL characters. Git apparently just checks the first 8000 chars for a NUL and if it finds one treats the file as binary. See here for more details.
Here is a similar C# solution I wrote that looks for a given number of required consecutive NUL. If IsBinary returns false then it is very likely your file is text based.
public bool IsBinary(string filePath, int requiredConsecutiveNul = 1)
{
    const int charsToCheck = 8000;
    const char nulChar = '\0';
    int nulCount = 0;
    using (var streamReader = new StreamReader(filePath))
    {
        for (var i = 0; i < charsToCheck; i++)
        {
            if (streamReader.EndOfStream)
                return false;
            if ((char) streamReader.Read() == nulChar)
            {
                nulCount++;
                if (nulCount >= requiredConsecutiveNul)
                    return true;
            }
            else
            {
                nulCount = 0;
            }
        }
    }
    return false;
}
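The version above goes through a StreamReader, so it looks at decoded characters. The same NUL heuristic can also be applied to raw bytes, which sidesteps the decoding step. A small Java sketch under that assumption follows; NulCheck and looksBinary are illustrative names, and the caller is expected to supply the leading bytes of the file (for example via Files.readAllBytes).

```java
import java.nio.charset.StandardCharsets;

class NulCheck {
    // Assumed limit, mirroring the 8000 figure mentioned above.
    static final int BYTES_TO_CHECK = 8000;

    // Returns true if a NUL byte occurs within the first BYTES_TO_CHECK bytes.
    static boolean looksBinary(byte[] data) {
        int limit = Math.min(data.length, BYTES_TO_CHECK);
        for (int i = 0; i < limit; i++) {
            if (data[i] == 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(looksBinary("plain text".getBytes(StandardCharsets.US_ASCII)));
        System.out.println(looksBinary(new byte[] {0x42, 0x00, 0x4D}));
        // UTF-16 encodes ASCII characters with a zero byte, so such text is flagged too.
        System.out.println(looksBinary("plain text".getBytes(StandardCharsets.UTF_16LE)));
    }
}
```

The last call shows the limitation mentioned earlier: text stored as UTF-16 looks binary to a byte-level check, so this remains a guess.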
To get the real type of a file, you must check its header, which does not change even if the extension is modified. You can get the header list here, and use something like this in your code:
using (var stream = new FileStream(fileName, FileMode.Open, FileAccess.Read))
{
    using (var reader = new BinaryReader(stream))
    {
        // read the first X bytes of the file
        // In this example I want to check if the file is a BMP,
        // whose header is 42 4D in hex (the two bytes 66 and 77 in decimal)
        string code = reader.ReadByte().ToString() + reader.ReadByte().ToString();
        if (code.Equals("6677"))
        {
            //it's a BMP file
        }
    }
}
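The same header check can be written against the raw bytes instead of their decimal string form, which extends more naturally to longer signatures. A minimal Java sketch, with BmpSignature and hasBmpSignature as made-up names and the caller assumed to pass the first bytes of the file:

```java
class BmpSignature {
    // A BMP file starts with the two bytes 0x42 0x4D ("BM").
    static boolean hasBmpSignature(byte[] head) {
        return head.length >= 2 && head[0] == 0x42 && head[1] == 0x4D;
    }

    public static void main(String[] args) {
        System.out.println(hasBmpSignature(new byte[] {0x42, 0x4D, 0x00}));
        System.out.println(hasBmpSignature(new byte[] {0x25, 0x50}));
    }
}
```

A matching signature only says that the file claims to be a BMP; it does not validate the rest of the content.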
The solution below works for me. It is a general solution that checks for all types of binary file.
/// <summary>
/// This method checks whether selected file is Binary file or not.
/// </summary>
public bool CheckForBinary()
{
    Stream objStream = new FileStream("your file path", FileMode.Open, FileAccess.Read);
    bool bFlag = true;
    // Iterate through stream & check ASCII value of each byte.
    for (int nPosition = 0; nPosition < objStream.Length; nPosition++)
    {
        int a = objStream.ReadByte();
        if (!(a >= 0 && a <= 127))
        {
            break; // Binary File
        }
        else if (objStream.Position == (objStream.Length))
        {
            bFlag = false; // Text File
        }
    }
    objStream.Dispose();
    return bFlag;
}
public bool IsTextFile(string FilePath)
{
    using (StreamReader reader = new StreamReader(FilePath))
    {
        int Character;
        while ((Character = reader.Read()) != -1)
        {
            // Treat the file as non-text if it contains one of these control characters
            if ((Character > 0 && Character < 8) || (Character > 13 && Character < 26))
            {
                return false;
            }
        }
    }
    return true;
}

Read line and append something at the end

I'm new to Kotlin. I'm trying to read a file line by line and append something to the end of each line.
My file before reading:
abcd;abcd;abcd;
bcda;bcda;bcda;
dacb;dacb;dacb;
My file after reading and appending:
abcd;abcd;abcd;smth1
bcda;bcda;bcda;smth2
dacb;dacb;dacb;smth3
I have code for reading the file line by line, but could you tell me how to append a string to each line?
val pathToFile = "abc.txt"
val scan = Scanner(File(pathToFile))
while (scan.hasNextLine()) {
    val line = scan.nextLine()
    var lista = ArrayList<String>()
    lista = line.split(";") as ArrayList<String>
    println(lista.get(0) + " and " + lista.get(1) + " and " + lista.get(2))
}
Januson has the right idea. Here is some Kotlin code to do the job:
inline fun File.mapLines(crossinline transform: (line: String) -> String) {
    val tempFile = createTempFile(prefix = "transform", suffix = ".txt")
    tempFile.printWriter().use { writer ->
        this.forEachLine { line -> writer.println(transform(line)) }
    }
    check(this.delete() && tempFile.renameTo(this)) { "failed to replace file" }
}
Example usage:
val pathToFile = "abc.txt"
var index = 0
File(pathToFile).mapLines { line -> "${line}smth${++index}" }
If you are using Java 1.7+ then you can use Files.move instead of delete/renameTo:
Files.move(tempFile.toPath(), this.toPath(), StandardCopyOption.REPLACE_EXISTING)
See also Write to file after match in Kotlin.
You can't read from and write to the same file unless you are using RandomAccessFile. Instead you should do the following:
1. Read a line from your input file.
2. Make the modification you want (append to the end of the line, print the line).
3. Write the modified line to the output file.
4. After reading and writing all the data, close both files.
5. Delete the input file and rename the output file to the input file's name.
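The steps above could look roughly like this in plain Java. This is only a sketch: LineAppender and appendToEachLine are hypothetical names, the file is assumed to be UTF-8 and small enough to read into memory at once, and Files.move stands in for the delete-and-rename step.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LineAppender {
    // Rewrites the file so that line i (1-based) ends with suffix + i.
    static void appendToEachLine(Path file, String suffix) throws IOException {
        // Read every line and build the modified version.
        List<String> input = Files.readAllLines(file, StandardCharsets.UTF_8);
        List<String> output = new ArrayList<>();
        for (int i = 0; i < input.size(); i++) {
            output.add(input.get(i) + suffix + (i + 1));
        }
        // Write the result to a temporary file next to the original.
        Path temp = Files.createTempFile(file.toAbsolutePath().getParent(), "lines", ".tmp");
        Files.write(temp, output, StandardCharsets.UTF_8);
        // Both files are closed at this point; replace the original.
        Files.move(temp, file, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("abc", ".txt");
        Files.write(file, Arrays.asList("abcd;abcd;abcd;", "bcda;bcda;bcda;"), StandardCharsets.UTF_8);
        appendToEachLine(file, "smth");
        Files.readAllLines(file, StandardCharsets.UTF_8).forEach(System.out::println);
        Files.delete(file);
    }
}
```

For large files, reading and writing one line at a time through a BufferedReader and BufferedWriter would keep memory use flat; the replace step stays the same.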

Groovy write to file (newline)

I created a small function that simply writes text to a file, but I am having issues making it write each piece of information to a new line. Can someone explain why it puts everything on the same line?
Here is my function:
public void writeToFile(def directory, def fileName, def extension, def infoList) {
    File file = new File("$directory/$fileName$extension")
    infoList.each {
        file << ("${it}\n")
    }
}
The simple code I'm testing it with is something like this:
def directory = 'C:/'
def folderName = 'testFolder'
def c
def txtFileInfo = []
String a = "Today is a new day"
String b = "Tomorrow is the future"
String d = "Yesterday is the past"
txtFileInfo << a
txtFileInfo << b
txtFileInfo << d
c = createFolder(directory, folderName) //this simply creates a folder to drop the txt file in
writeToFile(c, "garbage", ".txt", txtFileInfo)
The above creates a text file in that folder and the contents of the text file look like this:
Today is a new dayTomorrow is the futureYesterday is the past
As you can see, the text is all bunched together instead of separated on a new line per text. I assume it has something to do with how I am adding it into my list?
As @Steven points out, a better way would be:
public void writeToFile(def directory, def fileName, def extension, def infoList) {
    new File("$directory/$fileName$extension").withWriter { out ->
        infoList.each {
            out.println it
        }
    }
}
This handles the line separator for you and closes the writer as well.
It also avoids opening and closing the file for every line you write, which could be slow in your original version.
It looks to me like you're working on Windows, in which case the newline sequence is not simply \n but rather \r\n.
You can always get the correct new line character through System.getProperty("line.separator") for example.
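As a small illustration in Java (the Groovy call is the same JVM property), the sketch below terminates each entry with the platform separator. LineJoiner and joinLines are made-up names; System.lineSeparator() returns the same value as the line.separator property.

```java
import java.util.Arrays;
import java.util.List;

class LineJoiner {
    // Terminates every entry with the platform line separator.
    static String joinLines(List<String> lines) {
        StringBuilder text = new StringBuilder();
        for (String line : lines) {
            text.append(line).append(System.lineSeparator());
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.print(joinLines(Arrays.asList("Today is a new day", "Tomorrow is the future")));
    }
}
```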
I came across this question and was inspired by the other contributors. I needed to append some content to a file, one line at a time. Here is what I did.
class Doh {
    def ln = System.getProperty('line.separator')
    File file //assume it's initialized
    void append(String content) {
        file << "$content$ln"
    }
}
Pretty neat I think :)
It might be cleaner to use PrintWriter and its println method.
Just make sure you close the writer when you're done.
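One way to get the closing for free is a try-with-resources block. The Java sketch below assumes UTF-8 output; LineWriter and writeLines are illustrative names only.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

class LineWriter {
    static void writeLines(Path file, List<String> lines) throws IOException {
        // try-with-resources closes (and flushes) the writer even if an exception is thrown.
        try (PrintWriter out = new PrintWriter(Files.newBufferedWriter(file, StandardCharsets.UTF_8))) {
            for (String line : lines) {
                out.println(line);
            }
            // PrintWriter does not throw on write errors, so check for them explicitly.
            if (out.checkError()) {
                throw new IOException("failed to write " + file);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("garbage", ".txt");
        writeLines(file, Arrays.asList("Today is a new day", "Tomorrow is the future", "Yesterday is the past"));
        Files.readAllLines(file, StandardCharsets.UTF_8).forEach(System.out::println);
        Files.delete(file);
    }
}
```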
For me it is easier to write:
out.append it
instead of
out.println it
On my machine println only wrote the first element of the ArrayList; with append the whole list gets written to the file.
Thanks anyway for the quick-and-dirty solution.