Left padding a string in pig - apache-pig

I would like to left pad a string (chararray) field with zeros. Is there any way to do that? I need fixed-length (40) values.
Thanks in advance,
Clairvoyant

The number of zeros needs to be generated dynamically based on the length of the remaining string, so I don't think it's possible in native Pig.
It is very much possible in a UDF, though.
input.txt
11111
222222222
33
org.apache.hadoop.util.NativeCodeLoader
apachepig
PigScript:
REGISTER leftformat.jar;
A = LOAD 'input.txt' USING PigStorage() AS(f1:chararray);
B = FOREACH A GENERATE format.LEFTPAD(f1);
DUMP B;
Output:
(0000000000000000000000000000000000011111)
(0000000000000000000000000000000222222222)
(0000000000000000000000000000000000000033)
(0org.apache.hadoop.util.NativeCodeLoader)
(0000000000000000000000000000000apachepig)
UDF code: The Java class below is compiled and packaged as leftformat.jar
LEFTPAD.java
package format;

import java.io.IOException;
import org.apache.commons.lang.StringUtils;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class LEFTPAD extends EvalFunc<String> {
    @Override
    public String exec(Tuple arg) throws IOException {
        try {
            String input = (String) arg.get(0);
            // Pad on the left with '0' up to a fixed width of 40
            return StringUtils.leftPad(input, 40, "0");
        } catch (Exception e) {
            throw new IOException("Caught exception while processing the input row", e);
        }
    }
}
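Before wiring the UDF into Pig, the padding rule itself can be sanity-checked with a plain-Java sketch. The padLeft helper below is a stand-in written for illustration, not part of the UDF; it mirrors the relevant StringUtils.leftPad behavior (pad up to the target width, return inputs that are already wide enough unchanged):

```java
public class PadCheck {
    // Left-pad s with '0' up to width; inputs already >= width are returned
    // unchanged, mirroring StringUtils.leftPad's behavior.
    static String padLeft(String s, int width) {
        StringBuilder sb = new StringBuilder();
        for (int i = s.length(); i < width; i++) {
            sb.append('0');
        }
        return sb.append(s).toString();
    }

    public static void main(String[] args) {
        System.out.println(padLeft("11111", 40));     // 35 zeros followed by 11111
        System.out.println(padLeft("apachepig", 40)); // 31 zeros followed by apachepig
    }
}
```

Note that values longer than 40 characters pass through untruncated, which is why the NativeCodeLoader row in the sample output above keeps its full length.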
UPDATE:
1. Download the required jar files (apache-commons-lang.jar, piggybank.jar, pig-0.11.0.jar and hadoop-common-2.6.0-cdh5.4.5.jar) from the links below
http://www.java2s.com/Code/Jar/a/Downloadapachecommonslangjar.htm
http://www.java2s.com/Code/Jar/p/Downloadpiggybankjar.htm
http://www.java2s.com/Code/Jar/p/Downloadpig0110jar.htm
2. Add the jars to your classpath
>> export CLASSPATH=/tmp/pig-0.11.1.jar:/tmp/piggybank.jar:/tmp/apache-commons-lang.jar
3. Create a directory named format
>> mkdir format
4. Compile LEFTPAD.java, making sure all of the jars are on the classpath, otherwise compilation will fail
>> javac LEFTPAD.java
5. Move the class file into the format folder
>> mv LEFTPAD.class format
6. Create a jar file named leftformat.jar
>> jar -cf leftformat.jar format/
7. The jar file is created; register it in your Pig script
Example from command line:
$ mkdir format
$ javac LEFTPAD.java
$ mv LEFTPAD.class format/
$ jar -cf leftformat.jar format/
$ ls
LEFTPAD.java format input.txt leftformat.jar script.pig

How can I recursively read a directory in Deno?

I'm trying to recursively read a directory in Deno using Deno.readDir, but the example they provide only reads the given folder:
for await (const entry of Deno.readDir(Deno.cwd())) {
console.log(entry.name);
}
How can I make this recursive?
Deno's standard library includes a function called walk for this purpose. It's available in std/fs/walk.ts. Here's an example:
/Users/deno/so-74953935/main.ts:
import { walk } from "https://deno.land/std@0.170.0/fs/walk.ts";
for await (const walkEntry of walk(Deno.cwd())) {
  const type = walkEntry.isSymlink
    ? "symlink"
    : walkEntry.isFile
    ? "file"
    : "directory";
  console.log(type, walkEntry.path);
}
Running in the terminal:
% pwd
/Users/deno/so-74953935
% ls -AF
.vscode/ deno.jsonc deno.lock main.ts
% ls -AF .vscode
settings.json
% deno --version
deno 1.29.1 (release, x86_64-apple-darwin)
v8 10.9.194.5
typescript 4.9.4
% deno run --allow-read main.ts
directory /Users/deno/so-74953935
file /Users/deno/so-74953935/main.ts
file /Users/deno/so-74953935/deno.jsonc
file /Users/deno/so-74953935/deno.lock
directory /Users/deno/so-74953935/.vscode
file /Users/deno/so-74953935/.vscode/settings.json
Since that function returns an async generator, you can make your own generator function that wraps around Deno.readDir:
(Do note that the example provided will join the path and name, giving you strings such as /directory/name.txt)
import { join } from "https://deno.land/std/path/mod.ts";

export async function* recursiveReaddir(
  path: string
): AsyncGenerator<string, void> {
  for await (const dirEntry of Deno.readDir(path)) {
    if (dirEntry.isDirectory) {
      yield* recursiveReaddir(join(path, dirEntry.name));
    } else if (dirEntry.isFile) {
      yield join(path, dirEntry.name);
    }
  }
}

for await (const entry of recursiveReaddir(Deno.cwd())) {
  console.log(entry);
}
Or, you can use recursive_readdir, a third-party Deno library made for this purpose.

Javacpp: problem with linked library - symbol not found in flat namespace

I'm trying to use the javacpp library. I've prepared a C library with one function
char * hello(const char * name);
and built it with CMake (on a Mac, with clang++).
I've also prepared a config file for javacpp:
package arrival;
import org.bytedeco.javacpp.annotation.*;
import org.bytedeco.javacpp.tools.*;
@Properties(
    value = @Platform(
        includepath = {"/Users/valentina.baranova/external/kotlin-cloud/greeter/include/"},
        include = {"greeter.h"},
        library = "greeter-jni",
        link = {"greeter"},
        linkpath = {"/Users/valentina.baranova/external/kotlin-cloud/greeter/build/"}
    ),
    target = "arrival.greeter"
)
public class GreeterConfig implements InfoMapper {
    public void map(InfoMap infoMap) {
    }
}
javacpp produced libgreeter-jni.dylib, but when I try to call the hello function I get an error:
dlopen(/Users/valentina.baranova/external/kotlin-cloud/greeter-javacpp/target/classes/arrival/macosx-x86_64/libgreeter-jni.dylib, 0x0001): symbol not found in flat namespace '__Z5helloPKc'
What am I doing wrong? In debug I can see that Loader.load() loads both libraries.
UPD:
Loader.load() is called in the autogenerated greeter class.
The hello function is present in both libraries, but under different names:
nm libgreeter-jni.dylib | grep hello
0000000000001f70 T _Java_arrival_greeter_hello__Ljava_lang_String_2
0000000000001ce0 T _Java_arrival_greeter_hello__Lorg_bytedeco_javacpp_BytePointer_2
U __Z5helloPKc
nm libgreeter.dylib | grep hello
0000000000003f50 T _hello

Data not correctly read from hadoop using Filesystem API

I am trying to read a file from Hadoop using the FileSystem API. I am able to connect to Hadoop and read the file; however, the file read contains garbled characters.
Below is the code:
public class HdfsToInfaWriter {
    public static void main(String[] args) {
        String hdfsuri = args[0];
        String localuri = args[1];
        String hdusername = args[2];

        Configuration conf = new Configuration();
        conf.addResource(new Path("file:///etc/hadoop/conf/core-site.xml"));
        conf.addResource(new Path("file:///etc/hadoop/conf/hdfs-site.xml"));
        conf.set("hadoop.security.authentication", "kerberos");
        conf.set("fs.defaultFS", hdfsuri);
        conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
        conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());

        try {
            UserGroupInformation.setConfiguration(conf);
            UserGroupInformation.loginUserFromKeytab("**************",
                    "********************");
        } catch (IOException e) {
            e.printStackTrace();
        }

        System.setProperty("HADOOP_USER_NAME", hdusername);
        System.setProperty("hadoop.home.dir", "/");

        try {
            FileSystem fs = FileSystem.get(URI.create(hdfsuri), conf);
            Path hdfsreadpath = new Path(hdfsuri);
            CompressionCodecFactory factory = new CompressionCodecFactory(conf);
            System.out.println("the class for codec is " + factory.getCodec(hdfsreadpath));
            File src1 = new File(localuri);
            System.out.println("before copy");
            FileUtil.copy(fs, hdfsreadpath, src1, false, conf);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
When I use the hdfs command hdfs dfs -cat /bigdatahdfs/datamart/trial.txt, the data in the file is simple text.
But when I copy the file to the local system and run cat /home/trial1.txt, the output is as below:
▒▒▒1K▒;▒▒
=▒<▒▒▒&▒▒▒
NOTE: I have tried using the IOUtils API also; the output is the same.

transfer command line arguments from java to jython / optparse

I want to make a jar file from a Python package. I am using the jython-compile-maven-plugin with Maven. The tricky part seems to be the handling of arguments. The receiving Python package uses optparse, which works fine on the Python side, but I have difficulties providing the parameters via Java/Jython.
I got an error about missing arguments. Then I tried to provide the arguments to main(), but it doesn't expect any.
This is how I call into the jar:
java -jar target/metrics-0.2.0-jar-with-dependencies.jar -f sample.txt --format csv -q
Java started
5 Arguments: -f, sample.txt, --format, csv, -q,
Exception in thread "main" javax.script.ScriptException: TypeError: main() takes no arguments (1 given) in <script> at line number 1
Any ideas on how to provide the args properly?
here is my InitJython.java:
package org.testingsoftware.metrics;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;
import org.apache.commons.lang.StringUtils;
public class InitJython {
    public static void main(String[] args) throws ScriptException {
        System.out.println("Java started");
        System.out.print(args.length + " Arguments: ");
        for (String s : args) {
            System.out.print(s);
            System.out.print(", ");
        }
        System.out.println();
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("python");
        // c.exec("try:\n from metrics.metrics import main\n main()\nexcept SystemExit:\n pass");
        engine.eval("from metrics.metrics import main");
        engine.eval("main('" + StringUtils.join(args, " ") + "')");
        System.out.println("Java exiting");
    }

    public void run() throws ScriptException {
    }
}
This line,
engine.eval("main('" + StringUtils.join(args, " ") + "')");
evaluates to
engine.eval("main('-f sample.txt --format csv -q')");
This means that the main() Python function will receive one argument ("1 given", as it says in the error message).
In order to make it work, you could have a main() that looks something like this:
from optparse import OptionParser

def main(arg):
    # Split 'arg' to get the arguments that were given to InitJython
    # on the command line as a list of strings
    args = arg.split()
    parser = OptionParser()
    parser.add_option("-f", dest="filename",
                      help="read from FILENAME")
    ...
    ...
    (opts, args) = parser.parse_args(args)
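One caveat with this approach: joining with spaces and splitting again only reproduces the original argument list when no single argument itself contains a space. A quick plain-Java check of that round trip (the values here are illustrative only):

```java
public class ArgRoundTrip {
    public static void main(String[] args) {
        // Join the way InitJython does, then split the way the Python side does
        String[] original = {"-f", "sample.txt", "--format", "csv", "-q"};
        String joined = String.join(" ", original);
        String[] recovered = joined.split(" ");
        System.out.println(recovered.length);  // 5 - round trip preserved

        // An argument containing a space does NOT survive the round trip
        String[] broken = String.join(" ", "-f", "my file.txt").split(" ");
        System.out.println(broken.length);     // 3, not 2
    }
}
```

For arguments that may contain spaces, passing the Java array into the engine (e.g. via ScriptEngine.put) and iterating it on the Python side avoids the lossy join/split step.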

add timestamp in the log file on daily basis

I want to add the timestamp at the end of my log file name on a daily basis, so that each day's log file looks like
test.2013-01-10.log
My current log.properties file is below. Please help.
log4j.rootLogger=info,myapp
log4j.appender.myapp=org.apache.log4j.DailyRollingFileAppender
log4j.appender.myapp.ImmediateFlush=true
log4j.appender.myapp.DatePattern='.'yyyy-MM-dd-HH
log4j.appender.myapp.layout.ConversionPattern=%d{yyyy-MM-dd}%m%n
log4j.appender.myapp.file=${catalina.base}/logs/myapplog/test
log4j.appender.myapp.DatePattern='_'yyyy-MM-dd
#log4j.appender.myapp.MaxFileSize=999MB
#log4j.appender.myapp.MaxBackupIndex=20
log4j.appender.myapp.layout=org.apache.log4j.PatternLayout
log4j.appender.consoleAppender = org.apache.log4j.ConsoleAppender
log4j.appender.consoleAppender.layout = org.apache.log4j.PatternLayout
log4j.appender.consoleAppender.layout.ConversionPattern=%m%n
log4j.logger=info,stdout,myapp
log4j.logger.org.hibernate=warn
Try using DailyRollingFileAppender instead of RollingFileAppender in your configuration.
EDIT :
Try this. It will generate test.log, and when the date changes it will roll the old file over to test.log.yyyy-MM-dd and start a new test.log.
log4j.properties :
log4j.rootLogger=info,A1
log4j.appender.A1=org.apache.log4j.DailyRollingFileAppender
log4j.appender.A1.ImmediateFlush=true
log4j.appender.A1.layout.ConversionPattern=%d{yyyy-MM-dd}%m%n
log4j.appender.A1.File=c:/test.log
log4j.appender.A1.DatePattern='.'yyyy-MM-dd
#log4j.appender.A1.MaxFileSize=999MB
#log4j.appender.A1.MaxBackupIndex=20
log4j.appender.A1.layout=org.apache.log4j.PatternLayout
log4j.appender.consoleAppender = org.apache.log4j.ConsoleAppender
log4j.appender.consoleAppender.layout = org.apache.log4j.PatternLayout
log4j.appender.consoleAppender.layout.ConversionPattern=%m%n
log4j.logger=info,stdout,A1
log4j.logger.org.hibernate=warn
Code :
import org.apache.log4j.Logger;
public class test {
    public static void main(String[] args) {
        Logger barlogger = Logger.getLogger(test.class);
        barlogger.info("test");
    }
}
UPDATE
If you want to change your log to .txt, change this:
log4j.appender.A1.File=${catalina.base}/logs/test.log
to:
log4j.appender.A1.File=${catalina.base}/logs/test.txt