In VTD-XML, how do I add a new attribute to a tag with existing attributes?

I'm using VTD-XML to update XML files. As part of this, I'm trying to find a flexible way of maintaining attributes on an element. So if my original element is:
<MyElement name="myName" existattr="orig" />
I'd like to be able to update it to this:
<MyElement name="myName" existattr="new" newattr="newValue" />
I'm using a Map to manage the attribute/value pairs in my code and when I update the XML I'm doing something like the following:
private XMLModifier xm = new XMLModifier();
xm.bind(vn);
for (String key : attr.keySet()) {
    int i = vn.getAttrVal(key);
    if (i != -1) {
        xm.updateToken(i, attr.get(key));
    } else {
        xm.insertAttribute(key + "='" + attr.get(key) + "'");
    }
}
vn = xm.outputAndReparse();
This works for updating existing attributes; however, when the attribute doesn't already exist, it hits the insert (insertAttribute) and I get a ModifyException:
com.ximpleware.ModifyException: There can be only one insert per offset
at com.ximpleware.XMLModifier.insertBytesAt(XMLModifier.java:341)
at com.ximpleware.XMLModifier.insertAttribute(XMLModifier.java:1833)
My guess is that, as I'm not manipulating the offset directly, this might be expected. However, I can see no function to insert an attribute at a given position in the element (at the end).
My suspicion is that I will need to do it at the "offset" level using something like xm.insertBytesAt(int offset, byte[] content). As this is an area I have not needed to get into yet: is there a way to calculate the offset at which I can insert (just before the end of the tag)?
Of course I may be mis-using VTD in some way here - if there is a better way of achieving this then happy to be directed.
Thanks

That's an interesting limitation of the API that I hadn't encountered yet. It would be great if vtd-xml-author could elaborate on the technical details and explain why this limitation exists.
As a solution to your problem, a simple approach would be to accumulate your key-value pairs to be inserted as a String, and then to insert them in a single call after your for loop has terminated.
I've tested that this works as per your code:
private XMLModifier xm = new XMLModifier();
xm.bind(vn);
String insertedAttributes = "";
for (String key : attr.keySet()) {
    int i = vn.getAttrVal(key);
    if (i != -1) {
        xm.updateToken(i, attr.get(key));
    } else {
        // Store the key-values to be inserted as attributes
        insertedAttributes += " " + key + "='" + attr.get(key) + "'";
    }
}
if (!insertedAttributes.equals("")) {
    // Insert attributes only once
    xm.insertAttribute(insertedAttributes);
}
This will also work if you need to update the attributes of multiple elements: simply nest the above code in while (autoPilot.evalXPath() != -1) and be sure to reset insertedAttributes = ""; at the end of each iteration, as in the sketch below.
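A minimal sketch of that multi-element variant (assuming an AutoPilot bound to the same VTDNav; the XPath here is illustrative):

AutoPilot ap = new AutoPilot(vn);
ap.selectXPath("//MyElement"); // illustrative XPath
while (ap.evalXPath() != -1) {
    String insertedAttributes = "";
    for (String key : attr.keySet()) {
        int i = vn.getAttrVal(key);
        if (i != -1) {
            xm.updateToken(i, attr.get(key));
        } else {
            insertedAttributes += " " + key + "='" + attr.get(key) + "'";
        }
    }
    if (!insertedAttributes.equals("")) {
        // still only one insert per element offset
        xm.insertAttribute(insertedAttributes);
    }
}
vn = xm.outputAndReparse();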
Hope this helps.


DBArrayList to List<Map> Conversion after Query

Currently, I have a SQL query that returns information to me in a DBArrayList.
It returns data in this format: [{id=2kjhjlkerjlkdsf324523}]
For the next step, I need it to be in a List<Map> format without the id: [2kjhjlkerjlkdsf324523]
The data types being used are DBArrayList and List.
If it helps any, the next step is a function to collect the list and then to replace all single quotes if any [SQL-Injection prevention]. Using:
listMap = listMap.collect() { "'" + Util.removeSingleQuotes(it) + "'" }

public static String removeSingleQuotes(s) {
    return s ? s.replaceAll(/'/, '') : s // strip single quotes
}
I spent this morning working on it, and I found out that I needed to actually collect the DBArrayList like this:
listMap = dbArrayList.collect { it.getAt('id')}
If you're in a bind like I was and constrained to a specific schema, this might help, but @ou_ryperd has the correct answer!
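Combining that with the quote-stripping step from the question, the whole transformation might look like this (a sketch reusing the question's Util helper):

def listMap = dbArrayList.collect { it.getAt('id') }
listMap = listMap.collect { "'" + Util.removeSingleQuotes(it) + "'" }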
While using a DBArrayList is not wrong, Groovy's idiom is to use the db result as a collection. I would suggest you use it that way directly from the db:
Map myMap = [:]
dbhandle.eachRow("select fieldSomeID, fieldSomeVal from yourTable;") { row ->
    myMap[row.fieldSomeID] = row.fieldSomeVal.replaceAll(/'/, '')
}

Sales Order Confirmation Report - SalesConfirmDP

I am modifying the SalesConfirmDP class and trying to add the CustVendExternalItem.ExternalItemTxt field into a new field I have created.
I have tried a couple of things, but I do not think my syntax was correct; i.e. I declare the CustVendExternalItem table in the class declaration, but when I try to insert CustVendExternalItem.ExternalItemTxt into my new field, it does not populate. I guess there must be a method which I need to include?
If anyone has any suggestion it would be highly appreciated.
Thank you in advance.
private void setSalesConfirmDetailsTmp(NoYes _confirmTransOrTaxTrans)
{
    DocuRefSearch docuRefSearch;

    // Body
    salesConfirmTmp.JournalRecId = custConfirmJour.RecId;

    if (_confirmTransOrTaxTrans == NoYes::Yes)
    {
        if (printLineHeader)
        {
            salesConfirmTmp.LineHeader = custConfirmTrans.LineHeader;
        }
        else
        {
            salesConfirmTmp.LineHeader = '';
        }

        salesConfirmTmp.ItemId       = this.itemId();
        salesConfirmTmp.Name         = custConfirmTrans.Name;
        salesConfirmTmp.Qty          = custConfirmTrans.Qty;
        salesConfirmTmp.SalesUnitTxt = custConfirmTrans.salesUnitTxt();
        salesConfirmTmp.SalesPrice   = custConfirmTrans.SalesPrice;
        salesConfirmTmp.DlvDate      = custConfirmTrans.DlvDate;
        salesConfirmTmp.DiscPercent  = custConfirmTrans.DiscPercent;
        salesConfirmTmp.DiscAmount   = custConfirmTrans.DiscAmount;
        salesConfirmTmp.LineAmount   = custConfirmTrans.LineAmount;
        salesConfirmTmp.CurrencyCode = custConfirmJour.CurrencyCode;
        salesConfirmTmp.PrintCode    = custConfirmTrans.TaxWriteCode;

        if (pdsCWEnabled)
        {
            salesConfirmTmp.PdsCWUnitId = custConfirmTrans.pdsCWUnitId();
            salesConfirmTmp.PdsCWQty    = custConfirmTrans.PdsCWQty;
        }

        salesConfirmTmp.ExternalItemText = CustVendExternalItem.ExternalItemTxt; // <-- the new field that does not populate

        if ((custFormletterDocument.DocuOnConfirm == DocuOnFormular::Line)
            || (custFormletterDocument.DocuOnConfirm == DocuOnFormular::All))
        {
            docuRefSearch = DocuRefSearch::newTypeIdAndRestriction(custConfirmTrans,
                custFormletterDocument.DocuTypeConfirm, DocuRestriction::External);
            salesConfirmTmp.Notes = Docu::concatDocuRefNotes(docuRefSearch);
        }

        salesConfirmTmp.InventDimPrint = this.printDimHistory();
Well, AX cannot guess which record you need; there is a helper class, CustVendExternalItemDescription, to deal with it:
boolean found;
str externalItemId;
...
[found, externalItemId, salesConfirmTmp.ExternalItemText] = CustVendExternalItemDescription::findExternalItemDescription(
    ModuleCustVend::Cust,
    custConfirmTrans.ItemId,
    custConfirmTrans.inventDim(),
    custConfirmJour.OrderAccount,
    CustTable::find(custConfirmJour.OrderAccount).CustItemGroupId);
The findExternalItemDescription method returns more information than you need here, but you have to define variables to store it anyway.
Well, the steps to solve this problem are fairly easy, and I will try to give you a step-by-step approach.
1) Are you initialising CustVendExternalItem properly? Make a record of the same and initialise it as Jan has shown above, then debug your code and see if the value is being initialised in your DP class.
2) If your value is being initialised correctly but it is not showing up in the report design, there can be multiple issues, such as:
- Overlapping text boxes.
- Insufficient space for the given field.
- A report parameter/property not being set correctly, which causes your value not to show up on the report.
Check these one by one and you should arrive at a solution.

Reading Hadoop SequenceFiles with Hive

I have some mapred data from the Common Crawl that I have stored in a SequenceFile format. I have tried repeatedly to use this data "as is" with Hive so I can query and sample it at various stages. But I always get the following error in my job output:
LazySimpleSerDe: expects either BytesWritable or Text object!
I have even constructed a simpler (and smaller) dataset of [Text, LongWritable] records, but that fails as well. If I output the data to text format and then create a table on that, it works fine:
hive> create external table page_urls_1346823845675
> (pageurl string, xcount bigint)
> location 's3://mybucket/text-parse/1346823845675/';
OK
Time taken: 0.434 seconds
hive> select * from page_urls_1346823845675 limit 10;
OK
http://0-italy.com/tag/package-deals 643 NULL
http://011.hebiichigo.com/d63e83abff92df5f5913827798251276/d1ca3aaf52b41acd68ebb3bf69079bd1.html 9 NULL
http://01fishing.com/fly-fishing-knots/ 3437 NULL
http://01fishing.com/flyin-slab-creek/ 1005 NULL
...
I tried using a custom InputFormat:

// My custom input class--very simple
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.SequenceFileInputFormat;

public class UrlXCountDataInputFormat extends
        SequenceFileInputFormat<Text, LongWritable> { }
I then create the table with:
create external table page_urls_1346823845675_seq
(pageurl string, xcount bigint)
stored as inputformat 'my.package.io.UrlXCountDataInputFormat'
outputformat 'org.apache.hadoop.mapred.SequenceFileOutputFormat'
location 's3://mybucket/seq-parse/1346823845675/';
But I still get the same SerDe error.
I'm sure there's something really basic I'm missing here, but I can't seem to get it right. Additionally, I have to be able to parse the SequenceFiles in place (i.e. I can't convert my data to text). So I need to figure out the SequenceFile approach for future portions of my project.
Solution:
As @mark-grover pointed out below, the issue is that Hive ignores the key by default. With only one column (i.e. just the value), the SerDe was unable to map my second column.
The solution was to use a custom InputFormat that was a great deal more complex than what I had used originally. I tracked down an answer linking to a Git repository about using the keys instead of the values, and then I modified it to suit my needs: take the key and value from an internal SequenceFile.Reader and combine them into the final BytesWritable. I.e. something like this (from the custom Reader, as that's where all the hard work happens):
// I used generics so I can use this all with
// other output files with just a small amount
// of additional code ...
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.util.ReflectionUtils;

public abstract class HiveKeyValueSequenceFileReader<K, V> implements RecordReader<K, BytesWritable> {

    protected SequenceFile.Reader in; // the wrapped SequenceFile reader
    protected Configuration conf;
    protected long end;               // end of the current split
    protected boolean more = true;

    @SuppressWarnings("unchecked")
    public synchronized boolean next(K key, BytesWritable value) throws IOException {
        if (!more) return false;
        long pos = in.getPosition();
        V trueValue = (V) ReflectionUtils.newInstance(in.getValueClass(), conf);
        boolean remaining = in.next((Writable) key, (Writable) trueValue);
        if (remaining) combineKeyValue(key, trueValue, value);
        if (pos >= end && in.syncSeen()) {
            more = false;
        } else {
            more = remaining;
        }
        return more;
    }

    protected abstract void combineKeyValue(K key, V trueValue, BytesWritable newValue);
}
// from my final implementation
public class UrlXCountDataReader extends HiveKeyValueSequenceFileReader<Text, LongWritable> {

    @Override
    protected void combineKeyValue(Text key, LongWritable trueValue, BytesWritable newValue) {
        // Join the key and value with Hive's default field delimiter (\001)
        // TODO I think we need to use straight bytes--I'm not sure this works?
        StringBuilder builder = new StringBuilder();
        builder.append(key);
        builder.append('\001');
        builder.append(trueValue.get());
        newValue.set(new BytesWritable(builder.toString().getBytes()));
    }
}
With that, I get all my columns!
http://0-italy.com/tag/package-deals 643
http://011.hebiichigo.com/d63e83abff92df5f5913827798251276/d1ca3aaf52b41acd68ebb3bf69079bd1.html 9
http://01fishing.com/fly-fishing-knots/ 3437
http://01fishing.com/flyin-slab-creek/ 1005
http://01fishing.com/pflueger-1195x-automatic-fly-reels/ 1999
Not sure if this is impacting you, but Hive ignores keys when reading SequenceFiles. You may need to create a custom InputFormat (unless you can find one online :-))
Reference: http://mail-archives.apache.org/mod_mbox/hive-user/200910.mbox/%3C5573211B-634D-4BB0-9123-E389D90A786C#metaweb.com%3E

Proper Way to Retrieve More than 128 Documents with RavenDB

I know variants of this question have been asked before (even by me), but I still don't understand a thing or two about this...
It was my understanding that one could retrieve more documents than the 128 default setting by doing this:
session.Advanced.MaxNumberOfRequestsPerSession = int.MaxValue;
And I've learned that a WHERE clause should be an expression tree instead of a Func, so that it's treated as Queryable instead of Enumerable. So I thought this should work:
public static List<T> GetObjectList<T>(Expression<Func<T, bool>> whereClause)
{
    using (IDocumentSession session = GetRavenSession())
    {
        return session.Query<T>().Where(whereClause).ToList();
    }
}
However, that only returns 128 documents. Why?
Note, here is the code that calls the above method:
RavenDataAccessComponent.GetObjectList<Ccm>(x => x.TimeStamp > lastReadTime);
If I add Take(n), then I can get as many documents as I like. For example, this returns 200 documents:
return session.Query<T>().Where(whereClause).Take(200).ToList();
Based on all of this, it would seem that the appropriate way to retrieve thousands of documents is to set MaxNumberOfRequestsPerSession and use Take() in the query. Is that right? If not, how should it be done?
For my app, I need to retrieve thousands of documents (that have very little data in them). We keep these documents in memory and use them as the data source for charts.
EDIT:
I tried using int.MaxValue in my Take():
return session.Query<T>().Where(whereClause).Take(int.MaxValue).ToList();
And that returns 1024. Argh. How do I get more than 1024?
EDIT 2 - Sample document showing data:

{
    "Header_ID": 3525880,
    "Sub_ID": "120403261139",
    "TimeStamp": "2012-04-05T15:14:13.9870000",
    "Equipment_ID": "PBG11A-CCM",
    "AverageAbsorber1": "284.451",
    "AverageAbsorber2": "108.442",
    "AverageAbsorber3": "886.523",
    "AverageAbsorber4": "176.773"
}
It is worth noting that since version 2.5, RavenDB has an "unbounded results API" to allow streaming. The example from the docs shows how to use this:
var query = session.Query<User>("Users/ByActive").Where(x => x.Active);

using (var enumerator = session.Advanced.Stream(query))
{
    while (enumerator.MoveNext())
    {
        User activeUser = enumerator.Current.Document;
    }
}
There is support for standard RavenDB queries and Lucene queries, and there is also async support.
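The async variant has the same shape; a sketch (assuming an async document session whose Advanced operations expose StreamAsync, as described in the streaming docs):

using (var enumerator = await session.Advanced.StreamAsync(query))
{
    while (await enumerator.MoveNextAsync())
    {
        User activeUser = enumerator.Current.Document;
    }
}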
The documentation can be found here. Ayende's introductory blog article can be found here.
The Take(n) function will only give you up to 1024 by default. However, you can change this default in Raven.Server.exe.config:
<add key="Raven/MaxPageSize" value="5000"/>
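With the server-side page size raised, the Take(n) call from the question can then request up to that many documents in one page, e.g.:

return session.Query<T>().Where(whereClause).Take(5000).ToList();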
For more info, see: http://ravendb.net/docs/intro/safe-by-default
The Take(n) function will only give you up to 1024 by default. However, you can use it in combination with Skip(n) to page through all results:
var points = new List<T>();
var nextGroupOfPoints = new List<T>();
const int ElementTakeCount = 1024;
int i = 0;
int skipResults = 0;
RavenQueryStatistics stats; // tracks server-side skipped results

do
{
    nextGroupOfPoints = session.Query<T>()
        .Statistics(out stats)
        .Where(whereClause)
        .Skip(i * ElementTakeCount + skipResults)
        .Take(ElementTakeCount)
        .ToList();
    i++;
    skipResults += stats.SkippedResults;
    points = points.Concat(nextGroupOfPoints).ToList();
}
while (nextGroupOfPoints.Count == ElementTakeCount);

return points;
RavenDB Paging
The number of requests per session is a separate concept from the number of documents retrieved per call. Sessions are short-lived and are expected to have few calls issued over them.
If you are getting more than 10 of anything from the store (let alone the default 128) for human consumption, then something is wrong, or your problem requires different thinking than hauling a truckload of documents out of the data store.
RavenDB indexing is quite sophisticated. Good article about indexing here and facets here.
If you need to perform data aggregation, create a map/reduce index which results in aggregated data, e.g.:
Index:
// Map
from post in docs.Posts
select new { post.Author, Count = 1 }

// Reduce
from result in results
group result by result.Author into g
select new
{
    Author = g.Key,
    Count = g.Sum(x => x.Count)
}
Query:
session.Query<AuthorPostStats>("Posts/ByUser/Count")
    .Where(x => x.Author == author)
    .ToList();
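For completeness, the result type for that index might be declared like this (a sketch; only the two projected fields are assumed):

public class AuthorPostStats
{
    public string Author { get; set; }
    public int Count { get; set; }
}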
You can also use a predefined index with the Stream method. You may use a Where clause on indexed fields.
// query the whole index ...
var query = session.Query<User, MyUserIndex>();
// ... or filter on fields that the index maps
var query = session.Query<User, MyUserIndex>().Where(x => !x.IsDeleted);

using (var enumerator = session.Advanced.Stream<User>(query))
{
    while (enumerator.MoveNext())
    {
        var user = enumerator.Current.Document;
        // do something
    }
}
Example index:
public class MyUserIndex : AbstractIndexCreationTask<User>
{
    public MyUserIndex()
    {
        this.Map = users =>
            from u in users
            select new
            {
                u.IsDeleted,
                u.Username,
            };
    }
}
Documentation: What are indexes?
Session : Querying : How to stream query results?
Important note: the Stream method will NOT track objects. If you change objects obtained from this method, SaveChanges() will not be aware of any change.
Other note: you may get the following exception if you do not specify the index to use.
InvalidOperationException: StreamQuery does not support querying dynamic indexes. It is designed to be used with large data-sets and is unlikely to return all data-set after 15 sec of indexing, like Query() does.

How do I get a list of fields in a generic sObject?

I'm trying to build a query builder, where the sObject result can contain an indeterminate number of fields. I'm using the result to build a dynamic table, but I can't figure out a way to read the sObject for a list of fields that were in the query.
I know how to get a list of ALL fields using the getDescribe information, but the query might not contain all of those fields.
Is there a way to do this?
Presumably you're building the query up as a string, since it's dynamic, so couldn't you just loop through the fields in the describe information and then use .contains() on the query string to see if each one was requested? Not crazy elegant, but it seems like the simplest solution here.
Taking this further, maybe you have the list of fields selected in a list of strings or similar, and you could just use that list?
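A rough sketch of that .contains() idea (assuming the SOQL string is at hand; the object and field names here are illustrative):

String queryString = 'SELECT Name, Industry FROM Account WHERE CreatedDate = TODAY';
List<String> queriedFields = new List<String>();
for (String field : Account.sObjectType.getDescribe().fields.getMap().keySet()) {
    // field names from the describe map are lowercase, so compare case-insensitively
    if (queryString.containsIgnoreCase(field)) {
        queriedFields.add(field);
    }
}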
Not sure if this is exactly what you were after but something like this?
public List<sObject> Querylist { get; set; }

Define the search string:

String QueryString = 'select field1__c, field2__c from Object where';

Add as many of these as you need to build the search, if the user searches on these fields:

if (searchParameter.field1__c != null && searchParameter.field1__c != '')
{
    QueryString += ' field1__c like \'' + searchParameter.field1__c + '%\' and ';
}
if (searchParameter.field2__c != null && searchParameter.field2__c != '')
{
    QueryString += ' field2__c like \'' + searchParameter.field2__c + '%\' and ';
}

Remove the last "and":

QueryString = QueryString.substring(0, (QueryString.length() - 4));
QueryString += ' limit 200';

Add the query results to the list:

for (sObject s : Database.query(QueryString))
{
    Querylist.add(s);
}
To get the list of fields in an sObject, you could use a method such as:
public Set<String> getFields(sObject sobj) {
    Set<String> fieldSet = new Set<String>();
    for (String field : sobj.getSObjectType().getDescribe().fields.getMap().keySet()) {
        try {
            // get() throws for fields that were not retrieved by the query
            sobj.get(field);
            fieldSet.add(field);
        } catch (Exception e) {
            // field was not part of the query; skip it
        }
    }
    return fieldSet;
}
You should refactor to bulkify this approach for your context (see the sketch below), but it works. Just pass in an sObject and it'll give you back a set of the field names.
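Since that runs a describe per record, a bulkified variant might describe once and reuse the field map across records (a sketch; it assumes all records share the same sObject type):

public Map<Id, Set<String>> getFieldsBulk(List<sObject> sobjs) {
    Map<Id, Set<String>> result = new Map<Id, Set<String>>();
    if (sobjs.isEmpty()) return result;
    // one describe call for the whole batch
    Set<String> allFields = sobjs[0].getSObjectType().getDescribe().fields.getMap().keySet();
    for (sObject sobj : sobjs) {
        Set<String> fieldSet = new Set<String>();
        for (String field : allFields) {
            try {
                sobj.get(field); // throws for fields not retrieved by the query
                fieldSet.add(field);
            } catch (Exception e) {
                // not queried; skip
            }
        }
        result.put(sobj.Id, fieldSet);
    }
    return result;
}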
I suggest using a list of fields for creating both the query and the table. You can put the list of fields in the result so that it's accessible for anyone using it. Then you can construct the table by using result.getFields() and retrieve the data by using result.getRows().
for (sObject obj : result.getRows()) {
    for (String fieldName : result.getFields()) {
        table.addCell(obj.get(fieldName));
    }
}
If you're trying to work with a query that's out of your control, you would have to parse the query to get the list of fields, but I wouldn't suggest trying that; it complicates code in ways that are hard to follow.