I use the command 'add file' and the file is subsequently loaded by a UDF.
But I cannot find the file added by Hive on HDFS (hdfs://namenode:8026/user/hdfs), and I need the path in my UDF method.
What is the path of the file, and how do I use it from the UDF?
A DFS path is not accessible from a UDF/UDTF; you need to provide a local path in the UDF/UDTF.
My approach:
Check whether the file is present locally at /tmp.
If it exists with a non-zero length, use it; otherwise pull the file from DFS (the /xyz path in the code below) to /tmp and proceed.
import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

public static boolean readFile() {
    try {
        File f = new File("/tmp/" + fileName);
        if (!f.exists() || f.length() == 0) {
            // Pull a fresh copy of the file from dfs:/xyz.
            String cmd = "hadoop fs -get /xyz/" + fileName + " /tmp/";
            System.err.println("Pulling mapping file from HDFS. Running: " + cmd);
            Process pr = Runtime.getRuntime().exec(cmd);
            try {
                // Wait for the copy to complete.
                pr.waitFor();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        // Open the local copy for reading; try-with-resources closes it even on error.
        try (BufferedReader br = new BufferedReader(new FileReader("/tmp/" + fileName))) {
            String line = br.readLine();
            while (line != null) {
                System.out.println(line);
                // Read the next line.
                line = br.readLine();
            }
        }
    } catch (FileNotFoundException e) {
        System.err.println("File not found - " + fileName);
        return false;
    } catch (IOException e) {
        System.err.println("Error while reading from the preset file");
        return false;
    }
    return true;
}
'add file' adds files to the distributed cache in Hive.
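Since the distributed cache is materialized in each task's local working directory, a UDF can usually open an added file by its bare name, without copying from HDFS by hand. A minimal sketch, assuming a file registered with 'add file mapping.txt' (the name is a placeholder):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Sketch: read a file shipped via 'add file mapping.txt'.
// Hive places it in the task's working directory, so the bare name usually resolves.
public static String readAddedFile() throws IOException {
    StringBuilder sb = new StringBuilder();
    try (BufferedReader br = new BufferedReader(new FileReader("mapping.txt"))) {
        String line;
        while ((line = br.readLine()) != null) {
            sb.append(line).append('\n');
        }
    }
    return sb.toString();
}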
I am developing a plugin using org.sonarsource.sonarqube:sonar-plugin-api:6.3 and am trying to access a file in my resources folder. Reading works fine in unit tests, but when the plugin is deployed as a JAR into SonarQube, it cannot locate the file.
For example, I have the file Something.txt in src/main/resources. Then I have the following code:
private static final String FILENAME = "Something.txt";
String template = FileUtils.readFile(FILENAME);
where FileUtils.readFile would look like
public String readFile(String filePath) {
    try {
        return readAsStream(filePath);
    } catch (IOException ioException) {
        LOGGER.error("Error reading file {}, {}", filePath, ioException.getMessage());
        return null;
    }
}
private String readAsStream(String filePath) throws IOException {
    try (InputStream inputStream = Thread.currentThread().getContextClassLoader().getResourceAsStream(filePath)) {
        if (inputStream == null) {
            throw new IOException(filePath + " is not found");
        } else {
            return IOUtils.toString(inputStream, StandardCharsets.UTF_8);
        }
    }
}
This question is similar to reading a resource file from within a JAR. I have also tried /Something.txt and Something.txt; neither works. If I put the file Something.txt in the classes folder of the SonarQube installation, the code works.
Try this:
File file = new File(getClass().getResource("/Something.txt").toURI()); // toURI() throws the checked URISyntaxException
BufferedReader reader = new BufferedReader(new FileReader(file));
String something = IOUtils.toString(reader);
Note that this only works while the resource is a real file on disk; when it is packaged inside a JAR, the URI is not a file URI and new File(...) will fail.
You should not use getContextClassLoader(). See "Short answer: never use the context class loader!"
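Following that advice, a minimal variant of readAsStream that uses the class's own classloader instead (call it with the absolute path "/Something.txt" so it resolves from the classpath root; this also works when the resource is packaged inside the plugin JAR):

import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import org.apache.commons.io.IOUtils;

// Sketch: resolve the resource through this class's own classloader.
// A leading slash makes the path absolute from the classpath root.
private String readAsStream(String filePath) throws IOException {
    try (InputStream inputStream = getClass().getResourceAsStream(filePath)) {
        if (inputStream == null) {
            throw new IOException(filePath + " is not found");
        }
        return IOUtils.toString(inputStream, StandardCharsets.UTF_8);
    }
}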
How can I read file contents from a VirtualFile? I am currently using this approach:
try {
    BufferedReader br = new BufferedReader(new InputStreamReader(virtualFile.getInputStream()));
    String currentLine;
    StringBuilder stringBuilder = new StringBuilder();
    while ((currentLine = br.readLine()) != null) {
        stringBuilder.append(currentLine);
        stringBuilder.append("\n");
    }
} catch (IOException e1) {
    e1.printStackTrace();
    return 0;
}
However, I am getting some garbled characters appended when I print the StringBuilder.
Some common ways of reading VirtualFile contents are:
file.contentsToByteArray()
LoadTextUtil.loadText(file)
FileDocumentManager.getInstance().getDocument(file).get*CharSequence()
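The garbled output usually comes from decoding the stream with the platform default charset. A minimal sketch of the first option, decoding with the file's own charset (assuming the IntelliJ Platform API):

import com.intellij.openapi.vfs.VirtualFile;
import java.io.IOException;

// Sketch: decode the VirtualFile's bytes with its detected charset instead of
// the platform default that a bare InputStreamReader would use.
public static String readContents(VirtualFile file) throws IOException {
    return new String(file.contentsToByteArray(), file.getCharset());
}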
You can use VfsUtil.loadText(virtualFile);
Also, to make sure that the file is up to date, you can call virtualFile.refresh(false, false);
Here you can find some more useful information.
Regarding file upload in Play Framework 2.2.3:
According to an update on this question, adding the following to application.conf should enable uploads of files up to 10 MB:
parsers.MultipartFormData.maxLength = 10240K
This does not work for me; I get a "413 Request Entity Too Large" error for any file greater than 1 MB.
I tried setting another property:
parsers.text.maxLength = 10M
The upload still fails with a 413.
I upload files using XHR with a FormData that can contain multiple files.
Upload controller code:
public Result uploadAttendeeFiles() {
    try {
        MultipartFormData body = request().body().asMultipartFormData();
        List<FilePart> uploadedFiles = body.getFiles();
        Map<String, String> returnMessages = new HashMap<String, String>();
        String fileName = "", fileExtension = "", fieldId = "";
        int fileCounter = 0;
        if (!CommonUtils.isEmpty(uploadedFiles)) {
            for (FilePart filePart : uploadedFiles) {
                try {
                    fileExtension = Files.getFileExtension(filePart.getFilename());
                    fieldId = body.asFormUrlEncoded().get("fieldId")[fileCounter];
                    fileName = body.asFormUrlEncoded().get("fileName")[fileCounter++];
                    // Close the stream once the upload service has consumed it.
                    try (InputStream in = new FileInputStream(filePart.getFile())) {
                        Object objectId = uploadService.loadFile(in, fileName, fileExtension);
                        if (objectId != null)
                            returnMessages.put(fieldId, objectId.toString());
                        else
                            returnMessages.put(fieldId, "failed-Failed to save file!");
                    }
                } catch (IOException e) {
                    returnMessages.put(fileName + "." + fileExtension, "failed-Error while uploading file!");
                    e.printStackTrace();
                }
            }
        } else {
            return ok("{\"errormessage\":\"No files selected!\"}");
        }
        return ok(Json.toJson(returnMessages));
    } catch (Exception e) {
        e.printStackTrace();
        return ok("{\"errormessage\":\"Error while uploading files!\"}");
    }
}
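Two things may be worth checking here, offered as a sketch rather than a verified fix: Play 2.2's Java API also allows raising the limit per action via the @BodyParser.Of annotation, and if a reverse proxy such as nginx fronts the application, its own limit (client_max_body_size) can return a 413 before Play ever sees the request.

import play.mvc.BodyParser;
import play.mvc.Result;

// Sketch, assuming Play 2.2's Java API: raise the multipart limit to 10 MB
// for this action only, instead of relying on application.conf.
@BodyParser.Of(value = BodyParser.MultipartFormData.class, maxLength = 10 * 1024 * 1024)
public Result uploadAttendeeFiles() {
    // ... same multipart handling as above ...
    return ok();
}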
I am working on a project in which I need to extract XML (sitemap) data from a .gz file using Apache Tika (I am new to Tika).
The file name is something like sitemap01.xml.gz.
I can extract data from a normal text file or HTML, but I don't know how to extract the XML from the .gz, and then the metadata and data from the XML.
I have searched Google for the past two days.
Do I need to use a delegate parser in Tika to extract data from the XML?
Please point me to some samples or articles.
Here is my attempt:
import java.io.IOException;
import java.io.InputStream;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

public void parseXml() throws IOException {
    Metadata metadata = new Metadata();
    ContentHandler handler = new BodyContentHandler();
    Parser parser = new AutoDetectParser();
    ParseContext context = new ParseContext();
    InputStream stream = this.getClass().getResourceAsStream("sitemap.xml.gz");
    try {
        parser.parse(stream, handler, metadata, context);
        for (int i = 0; i < metadata.names().length; i++) {
            String name = metadata.names()[i];
            System.out.println(name + " : " + metadata.get(name));
        }
        System.out.println(handler.toString());
    } catch (IOException | SAXException | TikaException e) {
        e.printStackTrace();
    } finally {
        if (stream != null) {
            stream.close();
        }
    }
}
The thing you're missing is setting a recursing parser on your ParseContext. You probably want something like:
Parser parser = new AutoDetectParser();
ParseContext context = new ParseContext();
context.set(Parser.class, parser);
parser.parse(....)
By setting a Parser on the ParseContext, you tell Tika to call that parser when it encounters embedded documents (such as the XML inside your GZip).
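Put together, a minimal self-contained sketch of that setup (using the sitemap01.xml.gz name from the question):

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;

// Sketch: the AutoDetectParser is registered on the ParseContext so Tika
// recurses into the XML document embedded in the GZip container.
public class SitemapGzParse {
    public static void main(String[] args) throws Exception {
        Parser parser = new AutoDetectParser();
        ParseContext context = new ParseContext();
        context.set(Parser.class, parser);
        Metadata metadata = new Metadata();
        BodyContentHandler handler = new BodyContentHandler(-1); // -1 disables the write limit
        try (InputStream stream = Files.newInputStream(Paths.get("sitemap01.xml.gz"))) {
            parser.parse(stream, handler, metadata, context);
        }
        System.out.println(handler.toString());
        for (String name : metadata.names()) {
            System.out.println(name + " : " + metadata.get(name));
        }
    }
}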
Here is how you can use the XML parser from Apache Tika for your case. Note that XMLParser expects plain XML, so the gzip container is decompressed first with GZIPInputStream:
BodyContentHandler handler = new BodyContentHandler(-1);
Metadata metadata = new Metadata();
File inFile = new File("sitemap.xml.gz");
System.out.println(inFile.isFile());
// Decompress the gzip container; XMLParser expects plain XML.
InputStream inputstream = new GZIPInputStream(new FileInputStream(inFile));
ParseContext pcontext = new ParseContext();
// XML parser
XMLParser xmlparser = new XMLParser();
xmlparser.parse(inputstream, handler, metadata, pcontext);
System.out.println(pcontext.toString());
// handler.toString() holds all content from the XML, with the tags removed.
System.out.println("Contents of the document: " + handler.toString());
System.out.println("Metadata of the document:");
String[] metadataNames = metadata.names();
for (String name : metadataNames) {
    System.out.println(name + ": " + metadata.get(name));
}
I have read the posts:
Viewing contents of a .jar file
and
How do I list the files inside a JAR file?
But I, sadly, couldn't find a good solution to actually read a JAR's contents file by file.
Furthermore, could someone give me a hint or point to a resource where my problem is discussed?
I could only think of a not-so-straightforward way to do this: somehow convert the list of a JAR's resources into a list of inner-JAR URLs, which I could then open using openConnection().
You use JarFile to open a JAR file. From it you can get a ZipEntry or JarEntry (they can be treated as the same thing) using getEntry(String name) or entries(). Once you have an entry, you can obtain an InputStream by calling JarFile.getInputStream(ZipEntry ze) and read the data from that stream.
Here is complete code that reads all the file contents inside the JAR file:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class ListJar {

    private static void process(InputStream input) throws IOException {
        // Print the entry's contents line by line.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(input))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        try (JarFile jarFile = new JarFile("/home/bathakarai/gold/click-0.15.jar")) {
            final Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                final JarEntry entry = entries.nextElement();
                // Crude filter: entries with a dot in the name are treated as files.
                if (entry.getName().contains(".")) {
                    System.out.println("File : " + entry.getName());
                    process(jarFile.getInputStream(entry));
                }
            }
        }
    }
}
Here is how I read it as a ZIP file:
try (ZipInputStream is = new ZipInputStream(new FileInputStream("file.jar"))) {
    ZipEntry ze;
    byte[] buf = new byte[4096];
    long len;
    while ((ze = is.getNextEntry()) != null) {
        System.out.println("----------- " + ze);
        len = ze.getSize(); // getSize() returns long and may be -1 when unknown
        // Dump len bytes to the file
        ...
    }
} catch (Exception e) {
    e.printStackTrace();
}
This is more efficient than the JarFile approach if you want to decompress the whole file.