How to get the avro schema from StructType - spark-avro

I have a DataFrame:
Dataset<Row> dataset = getSparkInstance().createDataFrame(newRDD, struct);
dataset.schema() returns a StructType.
But I want the actual Avro schema, so that I can store it in a sample.avsc file.
Basically, I want to convert a StructType into an Avro schema file (.avsc).
Any ideas?

The code below is the workaround that solved my problem.
Here I save the DataFrame as an .avro file and read the schema back from it.
df.write().mode(SaveMode.Overwrite).format("com.databricks.spark.avro").save("outputPath");
File outputDir = new File("outputPath");
String filename = "";
for (String file : outputDir.list()) {
    if (file.contains("SUCCESS")) {
        continue; // skip the _SUCCESS marker
    }
    filename = file;
    if (file.contains(".crc")) {
        // Derive the data file's name from its ".<name>.crc" checksum twin
        // (literal replace, not regex replaceAll, since "." is a regex metacharacter)
        filename = file.replace(".crc", "");
        if (filename.startsWith(".")) {
            filename = filename.substring(1);
        }
        // Wait until the corresponding part file has actually been written
        while (!new File("outputPath/" + filename).exists()) {
            System.out.println("outputPath/" + filename);
            Thread.sleep(100);
        }
    }
}
System.out.println(outputDir.getAbsolutePath() + "/" + filename);
DatumReader<GenericRecord> datumReader = new GenericDatumReader<>();
DataFileReader<GenericRecord> dataFileReader = new DataFileReader<>(new File("outputPath/" + filename), datumReader);
Schema schema = dataFileReader.getSchema();
System.out.println(schema.toString());
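If you are on Spark 2.4 or later, the built-in spark-avro module can do the conversion directly, without the round trip through disk. A minimal sketch, assuming the org.apache.spark.sql.avro.SchemaConverters.toAvroType API (the spark-avro artifact must be on the classpath; verify the signature against your Spark version):
import java.io.FileWriter;
import org.apache.avro.Schema;
import org.apache.spark.sql.avro.SchemaConverters;

// Convert the Catalyst StructType straight to an Avro schema.
// Arguments: Catalyst type, nullable, record name, namespace.
Schema avroSchema = SchemaConverters.toAvroType(dataset.schema(), false, "sample", "com.example");
// toString(true) pretty-prints the schema JSON for the .avsc file.
try (FileWriter writer = new FileWriter("sample.avsc")) {
    writer.write(avroSchema.toString(true));
}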

Document permissions Content Engine API

I'm trying to remove/add groups in the security of a document in FileNet using the CPE API. I am able to remove them without any issues. However, when I try to add back the groups that are missing, by inheriting them from the document class, the groups get added without their full permissions. For example, I remove the "author" group, and when I try to add the same group back, it does not have all of its permissions.
Remove groups:
AccessPermissionList apl = doc.get_Permissions();
Iterator iter = apl.iterator();
while (iter.hasNext()) {
    AccessPermission ap = (AccessPermission) iter.next();
    if (ap.get_GranteeName().contains("group name")) {
        iter.remove();
    }
}
doc.set_Permissions(apl);
doc.save(RefreshMode.NO_REFRESH);
Add groups:
DocumentClassDefinition docClassDef = Factory.DocumentClassDefinition.fetchInstance(os, classID, null);
AccessPermissionList docClassApl = docClassDef.get_Permissions();
for (Object obj : docClassApl) {
    AccessPermission ap = (AccessPermission) obj;
    if (!apl.contains(ap)) {
        apl.add(ap);
    }
}
doc.set_Permissions(apl);
doc.save(RefreshMode.NO_REFRESH);
RESOLVED:
I had to use the default instance security rather than the regular security, as the permissions in the two were different. So I just updated the following line of code:
AccessPermissionList docClassApl = docClassDef.get_DefaultInstancePermissions();
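Putting it together, the corrected add-groups flow looks like this (a sketch assembled from the code above):
DocumentClassDefinition docClassDef = Factory.DocumentClassDefinition.fetchInstance(os, classID, null);
// Use the class's default *instance* security, not the class definition's own security.
AccessPermissionList docClassApl = docClassDef.get_DefaultInstancePermissions();
for (Object obj : docClassApl) {
    AccessPermission ap = (AccessPermission) obj;
    if (!apl.contains(ap)) {
        apl.add(ap);
    }
}
doc.set_Permissions(apl);
doc.save(RefreshMode.NO_REFRESH);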
You need to set the AccessMask too, like below:
AccessPermission ap;
ap.set_AccessMask(new Integer(AccessLevel.FULL_CONTROL_DOCUMENT_AS_INT));
//AccessLevel.WRITE_DOCUMENT_AS_INT
//AccessLevel.MAJOR_VERSION_DOCUMENT_AS_INT
From version 5.2.0 onwards, AccessLevel is deprecated, but you can still give it a try. AccessRight is the replacement now; refer to its documentation.
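For illustration, a hedged sketch of building the mask from individual AccessRight bits instead of an AccessLevel preset (the constant names below are assumed from com.filenet.api.constants.AccessRight; check them against your CPE version):
// Assumed AccessRight constants: OR the individual rights into a single mask.
int mask = AccessRight.READ_AS_INT
        | AccessRight.WRITE_AS_INT
        | AccessRight.MAJOR_VERSION_AS_INT;
ap.set_AccessMask(new Integer(mask));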
Update
public static void setPermissions(Document doc) throws IOException {
    // In the cpetarget.properties file:
    // cpetarget.security=Administrator:FULL_CONTROL,p8admin:MODIFY_PROPERTIES
    InputStream input = new FileInputStream("cpetarget.properties");
    java.util.Properties prop = new java.util.Properties();
    prop.load(input);
    input.close();
    List<String> strList = new ArrayList<String>(Arrays.asList(prop.getProperty("cpetarget.security").split(",")));

    AccessPermissionList apl = doc.get_Permissions();
    List<AccessPermission> oldPermissionsList = new ArrayList<AccessPermission>();
    oldPermissionsList.addAll(apl);
    // Remove all your old permissions here
    apl.removeAll(oldPermissionsList);

    // Add all your new permissions here
    try {
        for (String str : strList) {
            String[] strArray = str.split(":");
            AccessPermission permission = Factory.AccessPermission.createInstance();
            permission.set_GranteeName(strArray[0]);
            permission.set_AccessType(AccessType.ALLOW);
            permission.set_InheritableDepth(new Integer(0));  // this object only
            //permission.set_InheritableDepth(new Integer(-1)); // this object and all children
            //permission.set_InheritableDepth(new Integer(1));  // this object and immediate children
            if (strArray[1].equalsIgnoreCase("FULL_CONTROL")) {
                permission.set_AccessMask(new Integer(AccessLevel.FULL_CONTROL_DOCUMENT_AS_INT));
                //permission.set_AccessMask(AccessRight.MAJOR_VERSION_AS_INT);
            }
            if (strArray[1].equalsIgnoreCase("READ_ONLY")) {
                permission.set_AccessMask(new Integer(AccessLevel.VIEW_AS_INT));
            }
            if (strArray[1].equalsIgnoreCase("MODIFY_PROPERTIES")) {
                permission.set_AccessMask(new Integer(AccessLevel.WRITE_DOCUMENT_AS_INT));
            }
            if (strArray[1].equalsIgnoreCase("MAJOR_VERSIONING")) {
                permission.set_AccessMask(new Integer(AccessLevel.MAJOR_VERSION_DOCUMENT_AS_INT));
            }
            AccessPermissionList permissions = doc.get_Permissions();
            permissions.add(permission);
            doc.set_Permissions(permissions);
            doc.save(RefreshMode.REFRESH);
            System.out.println("Done");
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}

Convert JSON to newline-delimited JSON

I need to convert my JSON to newline-delimited JSON in order to insert the data into BigQuery from C# (a .NET application).
Please suggest a workaround.
Input:
[
  {
    "DashboardCategoryId": 1,
    "BookingWindows": [
      {
        "DaysRange": "31-60 Days",
        "BookingNumber": 2
      },
      {
        "DaysRange": "Greater Than 1 year",
        "BookingNumber": 1
      }
    ]
  },
  {
    "DashboardCategoryId": 1,
    "BookingWindows": [
      {
        "DaysRange": "61-120 Days",
        "BookingNumber": 1
      },
      {
        "DaysRange": "8-14",
        "BookingNumber": 1
      }
    ]
  }
]
Required output:
{"DashboardCategoryId": 1,"BookingWindows": [{"DaysRange": "31-60 Days","BookingNumber":2},{"DaysRange": "Greater Than 1 year","BookingNumber": 1}]}
{"DashboardCategoryId": 1,"BookingWindows": [{"DaysRange": "61-120 Days","BookingNumber":1},{"DaysRange": "8-14","BookingNumber": 1}]}
If you have already loaded your JSON array into memory as, say, a List<JToken>, you can write it to newline delimited JSON by using the answer from Serialize as NDJSON using Json.NET.
However, since BigQuery newline delimited JSON files do tend to be... big, I'm going to suggest instead an entirely streaming solution:
public static class JsonExtensions
{
    public static void ToNewlineDelimitedJson(Stream readStream, Stream writeStream)
    {
        var encoding = new UTF8Encoding(false, true);
        // Let the caller dispose the underlying streams.
        using (var textReader = new StreamReader(readStream, encoding, true, 1024, true))
        using (var textWriter = new StreamWriter(writeStream, encoding, 1024, true))
        {
            ToNewlineDelimitedJson(textReader, textWriter);
        }
    }

    public static void ToNewlineDelimitedJson(TextReader textReader, TextWriter textWriter)
    {
        using (var jsonReader = new JsonTextReader(textReader) { CloseInput = false, DateParseHandling = DateParseHandling.None })
        {
            ToNewlineDelimitedJson(jsonReader, textWriter);
        }
    }

    enum State { BeforeArray, InArray, AfterArray };

    public static void ToNewlineDelimitedJson(JsonReader jsonReader, TextWriter textWriter)
    {
        var state = State.BeforeArray;
        do
        {
            if (jsonReader.TokenType == JsonToken.Comment || jsonReader.TokenType == JsonToken.None || jsonReader.TokenType == JsonToken.Undefined || jsonReader.TokenType == JsonToken.PropertyName)
            {
                // Do nothing
            }
            else if (state == State.BeforeArray && jsonReader.TokenType == JsonToken.StartArray)
            {
                state = State.InArray;
            }
            else if (state == State.InArray && jsonReader.TokenType == JsonToken.EndArray)
            {
                state = State.AfterArray;
            }
            else
            {
                // Formatting.None is the default; I set it here for clarity.
                using (var jsonWriter = new JsonTextWriter(textWriter) { Formatting = Formatting.None, CloseOutput = false })
                {
                    jsonWriter.WriteToken(jsonReader);
                }
                // http://specs.okfnlabs.org/ndjson/
                // Each JSON text MUST conform to the [RFC7159] standard and MUST be written to the stream followed by the newline character \n (0x0A).
                // The newline character MAY be preceded by a carriage return \r (0x0D). The JSON texts MUST NOT contain newlines or carriage returns.
                textWriter.Write("\n");
                // The root value wasn't an array after all, so end writing after one item.
                if (state == State.BeforeArray)
                    state = State.AfterArray;
            }
        }
        while (jsonReader.Read() && state != State.AfterArray);
    }
}
Then use it as follows:
using (var readStream = File.OpenRead(fromFileName))
using (var writeStream = File.Open(toFileName, FileMode.Create))
{
    JsonExtensions.ToNewlineDelimitedJson(readStream, writeStream);
}
This takes advantage of the method JsonWriter.WriteToken(JsonReader) to write and format directly from a JsonReader to a JsonWriter without ever loading the entire JSON token hierarchy into memory.
Working sample .Net fiddle.
Newtonsoft Json.NET can be used to format JSON.
I've found an example here:
private static string FormatJson(string json)
{
    dynamic parsedJson = JsonConvert.DeserializeObject(json);
    return JsonConvert.SerializeObject(parsedJson, Formatting.Indented);
}
Note, though, that Formatting.Indented spreads each value over multiple lines; for newline-delimited output, each record must be serialized with Formatting.None, as in the streaming solution above.

How to get data from a database and download that content as PDF in ASP.NET MVC

I have data of varbinary type in the database, in an "Artifact" table like:
ID CONTENT
1  Some data in varbinary type
Now I want to get the "Content" column data and download it as a PDF named "Report.PDF" into the user's Downloads folder.
How should I do this? I have tried:
public ActionResult DownloadFile(int fileId)
{
    ResLandEntities reslandEntities = new ResLandEntities();
    var content = reslandEntities.ARTIFACT.Where(m => m.ID == fileId).FirstOrDefault();
    byte[] contents = content.CONTENT;
    string text = Encoding.UTF8.GetString(contents);
    return File(Server.MapPath("~/download/Report.PDF"), "application/pdf", text);
}
But it's not working. Can anybody help me?
Try this:
public ActionResult DownloadFile(int fileId)
{
    ResLandEntities reslandEntities = new ResLandEntities();
    var content = reslandEntities.ARTIFACT.Where(m => m.ID == fileId).FirstOrDefault();
    byte[] contents = (byte[])content.CONTENT;
    // The third argument sets the file name the browser will use for the download.
    return File(contents, "application/pdf", "Report.PDF");
}

trouble creating a pig udf schema

I'm trying to parse XML and I'm having trouble with my UDF returning a tuple. I'm following the example from http://verboselogging.com/2010/03/31/writing-user-defined-functions-for-pig
Pig script:
titles = FOREACH programs GENERATE (px.pig.udf.PARSE_KEYWORDS(program))
AS (root_id:chararray, keyword:chararray);
Here is the output schema code (Scala):
override def outputSchema(input: Schema): Schema = {
  try {
    val s: Schema = new Schema
    s.add(new Schema.FieldSchema("root_id", DataType.CHARARRAY))
    s.add(new Schema.FieldSchema("keyword", DataType.CHARARRAY))
    s
  } catch {
    case e: Exception => null
  }
}
I'm getting this error:
pig script failed to validate: org.apache.pig.impl.logicalLayer.FrontendException:
ERROR 0: Given UDF returns an improper Schema.
Schema should only contain one field of a Tuple, Bag, or a single type.
Returns: {root_id: chararray,keyword: chararray}
Update with the final solution:
In Java:
public Schema outputSchema(Schema input) {
    try {
        Schema tupleSchema = new Schema();
        tupleSchema.add(input.getField(1));
        tupleSchema.add(input.getField(0));
        return new Schema(new Schema.FieldSchema(getSchemaName(this.getClass().getName().toLowerCase(), input), tupleSchema, DataType.TUPLE));
    } catch (Exception e) {
        return null;
    }
}
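Since the UDF now returns its fields wrapped in a single tuple, the call site typically needs FLATTEN to unpack them into top-level columns. A sketch, assuming the relation and UDF names from the original script:
titles = FOREACH programs GENERATE
    FLATTEN(px.pig.udf.PARSE_KEYWORDS(program))
    AS (root_id:chararray, keyword:chararray);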
You will need to wrap your s schema instance in another Schema object.
Try returning new Schema(new FieldSchema(..., s, DataType.TUPLE)), like in the template below:
Here is my answer in Java (fill out your variable names):
@Override
public Schema outputSchema(Schema input) {
    Schema tupleSchema = new Schema();
    try {
        tupleSchema.add(new FieldSchema("root_id", DataType.CHARARRAY));
        tupleSchema.add(new FieldSchema("keyword", DataType.CHARARRAY));
        return new Schema(new FieldSchema(getSchemaName(this.getClass().getName().toLowerCase(), input), tupleSchema, DataType.TUPLE));
    } catch (FrontendException e) {
        e.printStackTrace();
        return null;
    }
}
Would you try:
titles = FOREACH programs GENERATE (px.pig.udf.PARSE_KEYWORDS(program));
If that doesn't error, then try:
titles = FOREACH titles GENERATE
    $0 AS root_id
    ,$1 AS keyword
    ;
And tell me the error?

Kendo Upload- Change File Name and get it back

I use Kendo Upload in my MVC project. My users have to upload files, and I have to replace each file name with a unique one.
I change the file name in the controller:
public ActionResult Dosyayukle(IEnumerable<HttpPostedFileBase> files)
{
    if (files != null)
    {
        foreach (var file in files)
        {
            var fileName = Path.GetFileName(file.FileName);
            var ext = Path.GetExtension(fileName);
            var physicalPath = Path.Combine(Server.MapPath("~/UploadedFiles"), fileName.Replace(ext, "") + Guid.NewGuid() + ext);
            file.SaveAs(physicalPath);
        }
    }
    // Return an empty string to signify success
    return Content("");
}
I need to get the new file name and save it to the DB with the unique name.
In the JS complete-event handler (onComplate in my code) I can't find the file's new name.
this.onComplate = function (e) {
    var dosyaAdi = self.dosyaAdi();
    if (dosyaAdi.match(/rar$/)) {
        alert('rar');
    } else if (dosyaAdi.match(/zip$/)) {
        alert('zip');
    } else {
        alert(dosyaAdi);
    }
};
How can I pass the new file name to the onComplate event handler?
Or how can I do this another way?
You can use TempData to store a key/value object in which you save the original file name and its unique replacement; you can then read the same TempData back in the save call.
Hope this solves your problem.
I've found a solution for your question.
http://www.telerik.com/support/kb/aspnet-ajax/editor/giving-the-uploaded-files-unique-names.aspx