Distributed Cache in Pig UDF

Here is my code implementing a UDF in Pig that uses the distributed cache.
public class Regex extends EvalFunc<Integer> {

    static HashMap<String, String> map = new HashMap<String, String>();

    @Override
    public List<String> getCacheFiles() {
        Path lookup_file = new Path(
                "hdfs://localhost.localdomain:8020/user/cloudera/top");
        List<String> list = new ArrayList<String>(1);
        // The file is exposed to each task under the symlink "id_lookup".
        list.add(lookup_file + "#id_lookup");
        return list;
    }

    // Loads the distributed-cache file into the map.
    public void VectorizeData() throws IOException {
        FileReader fr = new FileReader("./id_lookup");
        BufferedReader brd = new BufferedReader(fr);
        String line;
        while ((line = brd.readLine()) != null) {
            String str[] = line.split("#");
            map.put(str[0], str[1]);
        }
        fr.close();
    }

    @Override
    public Integer exec(Tuple input) throws IOException {
        return map.size();
    }
}
Given below is my distributed cache input file (hdfs://localhost.localdomain:8020/user/cloudera/top):
Impetigo|Streptococcus pyogenes#Impetigo
indeterminate leprosy|Uncharacteristic leprosy#indeterminate leprosy
The output I get is:
(0)
(0)
(0)
(0)
(0)
This means that my HashMap is empty.
How do I fill my HashMap using the distributed cache?

This was because VectorizeData() was never called anywhere in the UDF, so exec() always saw an empty map.
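A minimal sketch of the fix, keeping the rest of the class as above: load the lookup file lazily on the first call to exec(), so the map is filled on each task before it is read.

@Override
public Integer exec(Tuple input) throws IOException {
    // Populate the map from the cached file the first time exec() runs on this task.
    if (map.isEmpty()) {
        VectorizeData();
    }
    return map.size();
}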

Related

Trouble in testing with invalid field

public with sharing class SobjectByParams {
    public SObject createSObject(String sObjectName, Map<String, String> fields) {
        String invalidSObjectError = System.Label.invalid_Sobject_Name;
        String invalidFieldError = System.Label.Invalid_Sobject_Field;
        SObject newObject;
        try {
            newObject = (SObject) Type.forName(sObjectName).newInstance();
        } catch (NullPointerException ex) {
            throw new InvalidTypeNameException(invalidSObjectError);
        }
        for (String field : fields.keySet()) {
            try {
                newObject.put(field, fields.get(field));
            } catch (SObjectException ex) {
                throw new InvalidTypeNameException(invalidFieldError);
            }
        }
        insert newObject;
        return newObject;
    }

    public class InvalidTypeNameException extends Exception {
    }
}
@IsTest
public with sharing class SobjectByParamsTest {
    SobjectByParams sobjectByParams;
    private static final String TestName = 'TestName';
    private static final String BCity = 'Lviv';
    private static final String LastName = 'Kapo';
    private static final String Email = 'email';

    @IsTest
    static void createSObject() {
        SobjectByParams sobjectByParams = new SobjectByParams();
        Map<String, String> fields = new Map<String, String>();
        fields.put('BillingCity', BCity);
        Test.startTest();
        SObject result = sobjectByParams.createSObject(TestName, fields);
        Test.stopTest();
        System.assertEquals(BCity, result.get(BCity));
    }
}
The test fails with: SobjectByParams.InvalidTypeNameException: invalidSobjectNameError.
The test gives only 53.33% coverage, but I need at least 80%.
I don't know how to fix my problem.
Shouldn't TestName be an sObject name like Contact? Does this code work when you run it normally: does it insert anything, or does it throw? Check and fix that, and then you probably need a negative test as well.
Add a second test method:
@IsTest
static void testPassingBadData() {
    SobjectByParams sobjectByParams = new SobjectByParams();
    Map<String, String> fields = new Map<String, String>();
    fields.put('BillingCity', BCity);
    try {
        sobjectByParams.createSObject('NoSuchObjectInTheSystem', fields);
        System.assert(false, 'This should have failed and thrown an exception');
    } catch (Exception e) {
        // A bad sObject name hits the invalid_Sobject_Name label, not the field label.
        System.assert(e.getMessage().contains(Label.invalid_Sobject_Name));
    }
}

Write Status of test case with Hash Map and Selenium

I am using a HashMap to read the Excel data and use it in methods to perform if...else validations.
I am using a class for initializing the HashMap that reads the data. It goes as shown below:
public class SampleDataset {
    public static HashMap<String, ArrayList<String>> main() throws IOException {
        final String DatasetSheet = "src/test/resources/SampleDataSet.xlsx";
        final String DatasetTab = "TestCase";
        Object[][] ab = DataLoader.ReadMyExcelData(DatasetSheet, DatasetTab);
        int rowcount = DataLoader.myrowCount(DatasetSheet, DatasetTab);
        int colcount = DataLoader.mycolCount(DatasetSheet, DatasetTab);
        HashMap<String, ArrayList<String>> map = new HashMap<String, ArrayList<String>>();
        // i = 2 to avoid column names
        for (int i = 2; i < rowcount; i++) {
            ArrayList<String> mycolvalueslist = new ArrayList<String>();
            for (int j = 0; j < colcount; j++) {
                mycolvalueslist.add(ab[i][j].toString());
            }
            map.put(ab[i][0].toString(), mycolvalueslist);
        }
        return map;
    }
}
I am using this map in my test case file, which is as shown below:
@Test // Testcase
public void testThis() throws Exception {
    try {
        launchMainApplication();
        TestMain MainPage = new TestMain(tool, test, user, application);
        HashMap<String, ArrayList<String>> win = SampleDataset.main();
        SortedSet<String> keys = new TreeSet<>(win.keySet());
        for (String i : keys) {
            System.out.println("########### Test = " + win.get(i).get(0) + " ###########");
            MainPage.step01(win.get(i).get(1));
            MainPage.step02(win.get(i).get(2));
        }
        test.setResult("pass");
    } catch (AlreadyRunException e) {
    } catch (Exception e) {
        verificationErrors.append(e.getMessage());
        throw e;
    }
}

@Override
@After
public void tearDown() throws Exception {
    super.tearDown();
}
I want to write the status as PASS or FAIL for every test case run through the for loop above back to the same Excel file, by adding a new Status column for each test-case row.
My Excel sheet is as shown below.
Create a global list. After every test case, add the status result to the list.
After all test cases are finished, iterate through the list and update your Excel file, like this:
public static void main(String[] args) throws EncryptedDocumentException, IOException {
    // Step 1: load your excel file as a Workbook
    String excelFilePath = "D:\\Desktop\\testExcel.xlsx";
    Workbook workbook = WorkbookFactory.create(new FileInputStream(excelFilePath));
    // Step 2: modify your Workbook as you prefer
    Iterator<Sheet> sheetIterator = workbook.sheetIterator(); // Getting an iterator for all the sheets
    while (sheetIterator.hasNext()) {
        Iterator<Row> rowIterator = sheetIterator.next().rowIterator(); // Getting an iterator for all the rows (of current sheet)
        while (rowIterator.hasNext()) {
            Row row = rowIterator.next();
            // Put here your internal logic to understand if the row needs some changes!
            int cellsn = row.getLastCellNum(); // index of the first free cell in this row
            row.createCell(cellsn).setCellValue("String that you get from List = list.get(rownumber)");
        }
    }
    // Step 3: write the modified Workbook back to the file
    try (FileOutputStream out = new FileOutputStream(excelFilePath)) {
        workbook.write(out);
    }
    workbook.close();
}
You may need Apache POI.
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>5.0.0</version>
</dependency>

Create and query a binary cache in ignite

I am trying to use BinaryObjects to create the cache at runtime. For example, instead of writing a POJO class such as Employee and configuring it as the cache value type, I need to be able to dynamically configure the cache with the field names and field types for that particular cache.
Here is some sample code:
public class EmployeeQuery {

    public static void main(String[] args) throws Exception {
        Ignition.setClientMode(true);
        try (Ignite ignite = Ignition.start("examples/config/example-ignite.xml")) {
            if (!ExamplesUtils.hasServerNodes(ignite))
                return;
            CacheConfiguration<Integer, BinaryObject> cfg = getbinaryCache("emplCache", 1);
            ignite.destroyCache(cfg.getName());
            try (IgniteCache<Integer, BinaryObject> emplCache = ignite.getOrCreateCache(cfg)) {
                SqlFieldsQuery top5Qry = new SqlFieldsQuery("select * from Employee where salary > 500 limit 5", true);
                while (true) {
                    QueryCursor<List<?>> top5qryResult = emplCache.query(top5Qry);
                    System.out.println(">>> Employees ");
                    List<List<?>> all = top5qryResult.getAll();
                    for (List<?> list : all) {
                        System.out.println("Top 5 query result : " + list.get(0) + " , " + list.get(1) + " , " + list.get(2));
                    }
                    System.out.println("..... ");
                    Thread.sleep(5000);
                }
            } finally {
                ignite.destroyCache(cfg.getName());
            }
        }
    }

    private static QueryEntity createEmployeeQueryEntity() {
        QueryEntity employeeEntity = new QueryEntity();
        employeeEntity.setTableName("Employee");
        employeeEntity.setValueType(BinaryObject.class.getName());
        employeeEntity.setKeyType(Integer.class.getName());
        LinkedHashMap<String, String> fields = new LinkedHashMap<>();
        fields.put("id", Integer.class.getName());
        fields.put("firstName", String.class.getName());
        fields.put("lastName", String.class.getName());
        fields.put("salary", Float.class.getName());
        fields.put("gender", String.class.getName());
        employeeEntity.setFields(fields);
        employeeEntity.setIndexes(Arrays.asList(
                new QueryIndex("id"),
                new QueryIndex("firstName"),
                new QueryIndex("lastName"),
                new QueryIndex("salary"),
                new QueryIndex("gender")
        ));
        return employeeEntity;
    }

    public static CacheConfiguration<Integer, BinaryObject> getbinaryCache(String cacheName, int duration) {
        CacheConfiguration<Integer, BinaryObject> cfg = new CacheConfiguration<>(cacheName);
        cfg.setCacheMode(CacheMode.PARTITIONED);
        cfg.setName(cacheName);
        cfg.setStoreKeepBinary(true);
        cfg.setAtomicityMode(CacheAtomicityMode.ATOMIC);
        cfg.setIndexedTypes(Integer.class, BinaryObject.class);
        cfg.setExpiryPolicyFactory(FactoryBuilder.factoryOf(new CreatedExpiryPolicy(new Duration(SECONDS, duration))));
        cfg.setQueryEntities(Arrays.asList(createEmployeeQueryEntity()));
        return cfg;
    }
}
I am trying to configure the cache with the employee id (Integer) as the key and the whole employee record (BinaryObject) as the value. When I run the above class, I get the following exception:
Caused by: org.h2.jdbc.JdbcSQLException: Table "EMPLOYEE" not found; SQL statement:
select * from "emplCache".Employee where salary > 500 limit 5
What am I doing wrong here? Is there anything more needed besides this line:
employeeEntity.setTableName("Employee");
Next, I am trying to stream data into the cache. Is this the right way to do it?
public class CsvStreamer {

    public static void main(String[] args) throws IOException {
        Ignition.setClientMode(true);
        try (Ignite ignite = Ignition.start("examples/config/example-ignite.xml")) {
            if (!ExamplesUtils.hasServerNodes(ignite))
                return;
            CacheConfiguration<Integer, BinaryObject> cfg = EmployeeQuery.getbinaryCache("emplCache", 1);
            try (IgniteDataStreamer<Integer, BinaryObject> stmr = ignite.dataStreamer(cfg.getName())) {
                while (true) {
                    InputStream in = new FileInputStream(new File(args[0]));
                    try (LineNumberReader rdr = new LineNumberReader(new InputStreamReader(in))) {
                        int count = 0;
                        for (String line = rdr.readLine(); line != null; line = rdr.readLine()) {
                            String[] words = line.split(",");
                            BinaryObject emp = getBinaryObject(words);
                            stmr.addData(new Integer(words[0]), emp);
                            System.out.println("Sent data " + count++ + " , sal : " + words[6]);
                        }
                    }
                }
            }
        }
    }

    private static BinaryObject getBinaryObject(String[] rawData) {
        BinaryObjectBuilder builder = Ignition.ignite().binary().builder("Employee");
        builder.setField("id", new Integer(rawData[0]));
        builder.setField("firstName", rawData[1]);
        builder.setField("lastName", rawData[2]);
        builder.setField("salary", new Float(rawData[6]));
        builder.setField("gender", rawData[4]);
        BinaryObject binaryObj = builder.build();
        return binaryObj;
    }
}
Note: I am running this in cluster mode. I run both EmployeeQuery and CsvStreamer from one machine, and I have Ignite running in server mode on two other machines. Ideally I want to avoid using a POJO class in my application and make things as dynamic and generic as possible.
You are getting this exception because you didn't configure the SQL schema. In your case (you don't want to create a POJO class, etc.) I recommend using the SQL-like (DDL) syntax that was added to Apache Ignite in version 2.0. I am sure the following example will help you with the configuration: https://github.com/apache/ignite/blob/master/examples/src/main/java/org/apache/ignite/examples/datagrid/CacheQueryDdlExample.java
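A rough sketch of that DDL-based approach, modeled on the linked CacheQueryDdlExample: the table and index are created with plain SQL, so no Employee POJO is needed. The helper cache name, the WITH "CACHE_NAME=emplCache" option, and the sample insert are assumptions for illustration; check the exact options against your Ignite version.

import java.util.List;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.SqlFieldsQuery;
import org.apache.ignite.configuration.CacheConfiguration;

public class EmployeeDdlSketch {
    public static void main(String[] args) {
        Ignition.setClientMode(true);
        try (Ignite ignite = Ignition.start("examples/config/example-ignite.xml")) {
            // A helper cache in the PUBLIC schema, used only as an entry point for SQL.
            IgniteCache<?, ?> cache = ignite.getOrCreateCache(
                    new CacheConfiguration<>("ddlHelper").setSqlSchema("PUBLIC"));

            // Create the Employee table (backed by a cache named emplCache) and an index via SQL.
            cache.query(new SqlFieldsQuery(
                    "CREATE TABLE IF NOT EXISTS Employee (" +
                    "id INT PRIMARY KEY, firstName VARCHAR, lastName VARCHAR, " +
                    "salary FLOAT, gender VARCHAR) WITH \"CACHE_NAME=emplCache\"")).getAll();
            cache.query(new SqlFieldsQuery(
                    "CREATE INDEX IF NOT EXISTS emp_salary_idx ON Employee (salary)")).getAll();

            // Insert and query rows without any POJO class.
            cache.query(new SqlFieldsQuery(
                    "INSERT INTO Employee (id, firstName, lastName, salary, gender) " +
                    "VALUES (?, ?, ?, ?, ?)").setArgs(1, "John", "Doe", 1000f, "M")).getAll();

            List<List<?>> rows = cache.query(new SqlFieldsQuery(
                    "SELECT firstName, lastName, salary FROM Employee WHERE salary > 500")).getAll();
            for (List<?> row : rows)
                System.out.println(row);
        }
    }
}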

How to use IBM MobileFirst java adapter to update existing entity?

My JAX-RS resource method can receive JSON that is only part of a document.
My issue is that I have to update an existing object (entity), so I decided to create a JAX-RS ContainerRequestFilter. This filter has to fetch the existing object, replace its properties with the new ones, and put the result back into the stream, so that my resource method receives the complete entity.
First I need to get the data for the authenticated user, but 'securityContext.getAuthenticatedUser()' returns null in the filter (see the comments in the code below).
Is there any possibility to get the authenticated user data in a JAX-RS filter (on the IBM MobileFirst platform)?
Here is the code of my filter:
@Provider
//@ManagedBean
public class UpdateFilter implements ContainerRequestFilter {
    //ReaderInterceptor {

    //@Inject
    //ExistingObjectDao existingObjectDao;

    @Context
    AdapterSecurityContext securityContext;

    @Override
    @OAuthSecurity(scope = "protected") // doesn't work
    public void filter(ContainerRequestContext context) throws IOException {
        //context.getSecurityContext().getUserPrincipal() // is null
        AuthenticatedUser user = securityContext.getAuthenticatedUser(); // is null
        Map<String, String> authParams = (Map<String, String>) user.getAttributes().get("lotusCredentials");

        InputStream inputStream = context.getEntityStream();
        byte[] bytes = new byte[inputStream.available()];
        inputStream.read(bytes);
        String responseContent = new String(bytes);

        String id = context.getUriInfo().getPathParameters().getFirst("id");
        Object existingObject = null;
        try {
            existingObject = existingObjectDao.get(id, authParams);
        } catch (Exception e) {
            e.printStackTrace();
        }
        if (existingObject != null) {
            ObjectMapper objectMapper = new ObjectMapper();
            ObjectReader reader = objectMapper.readerForUpdating(existingObject);
            JsonNode r = reader.readTree(responseContent);
            responseContent = objectMapper.writer().writeValueAsString(r);
        }
        context.setEntityStream(new ByteArrayInputStream(responseContent.getBytes()));
    }
}

How To update google-cloud-dataflow running in app engine without clearing bigquery tables

I have a Google Cloud Dataflow process running on App Engine.
It listens to messages sent via Pub/Sub and streams them to BigQuery.
I updated my code and I am trying to rerun the app,
but I receive this error:
Exception in thread "main" java.lang.IllegalArgumentException: BigQuery table is not empty
Is there any way to update the Dataflow job without deleting the table?
My code might change quite often, and I do not want to delete the data already in the table.
Here is my code:
public class MyPipline {
    private static final Logger LOG = LoggerFactory.getLogger(MyPipline.class);
    private static String name;

    public static void main(String[] args) {
        List<TableFieldSchema> fields = new ArrayList<>();
        fields.add(new TableFieldSchema().setName("a").setType("string"));
        fields.add(new TableFieldSchema().setName("b").setType("string"));
        fields.add(new TableFieldSchema().setName("c").setType("string"));
        TableSchema tableSchema = new TableSchema().setFields(fields);

        DataflowPipelineOptions options = PipelineOptionsFactory.as(DataflowPipelineOptions.class);
        options.setRunner(BlockingDataflowPipelineRunner.class);
        options.setProject("my-data-analysis");
        options.setStagingLocation("gs://my-bucket/dataflow-jars");
        options.setStreaming(true);
        Pipeline pipeline = Pipeline.create(options);

        PCollection<String> input = pipeline
                .apply(PubsubIO.Read.subscription(
                        "projects/my-data-analysis/subscriptions/myDataflowSub"));

        input.apply(ParDo.of(new DoFn<String, Void>() {
            @Override
            public void processElement(DoFn<String, Void>.ProcessContext c) throws Exception {
                LOG.info("json" + c.element());
            }
        }));

        String fileName = UUID.randomUUID().toString().replaceAll("-", "");

        input.apply(ParDo.of(new DoFn<String, String>() {
            @Override
            public void processElement(DoFn<String, String>.ProcessContext c) throws Exception {
                JSONObject firstJSONObject = new JSONObject(c.element());
                firstJSONObject.put("a", firstJSONObject.get("a").toString() + "1000");
                c.output(firstJSONObject.toString());
            }
        }).named("update json")).apply(ParDo.of(new DoFn<String, TableRow>() {
            @Override
            public void processElement(DoFn<String, TableRow>.ProcessContext c) throws Exception {
                JSONObject json = new JSONObject(c.element());
                TableRow row = new TableRow().set("a", json.get("a")).set("b", json.get("b")).set("c", json.get("c"));
                c.output(row);
            }
        }).named("convert json to table row"))
                .apply(BigQueryIO.Write.to("my-data-analysis:mydataset.mytable").withSchema(tableSchema)
                );
        pipeline.run();
    }
}
You need to specify withWriteDisposition on your BigQueryIO.Write - see the documentation of the method and of its argument. Depending on your requirements, you need either WRITE_TRUNCATE or WRITE_APPEND.
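As a sketch, with the 1.x BigQueryIO.Write used in the question, the final apply would look something like this (WRITE_APPEND keeps the existing rows; the create disposition shown here is an assumption, not something the answer prescribes):

.apply(BigQueryIO.Write
        .to("my-data-analysis:mydataset.mytable")
        .withSchema(tableSchema)
        // Append to the existing table instead of requiring it to be empty (the default WRITE_EMPTY check).
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED));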