How to update google-cloud-dataflow running in App Engine without clearing BigQuery tables - google-bigquery

I have a google-cloud-dataflow process running on App Engine.
It listens to messages sent via Pub/Sub and streams them to BigQuery.
I updated my code and am trying to rerun the app, but I receive this error:
Exception in thread "main" java.lang.IllegalArgumentException: BigQuery table is not empty
Is there any way to update the Dataflow job without deleting the table?
My code might change quite often, and I do not want to delete the data in the table.
Here is my code:
public class MyPipline {
private static final Logger LOG = LoggerFactory.getLogger(MyPipline.class);
private static String name;
public static void main(String[] args) {
List<TableFieldSchema> fields = new ArrayList<>();
fields.add(new TableFieldSchema().setName("a").setType("string"));
fields.add(new TableFieldSchema().setName("b").setType("string"));
fields.add(new TableFieldSchema().setName("c").setType("string"));
TableSchema tableSchema = new TableSchema().setFields(fields);
DataflowPipelineOptions options = PipelineOptionsFactory.as(DataflowPipelineOptions.class);
options.setRunner(BlockingDataflowPipelineRunner.class);
options.setProject("my-data-analysis");
options.setStagingLocation("gs://my-bucket/dataflow-jars");
options.setStreaming(true);
Pipeline pipeline = Pipeline.create(options);
PCollection<String> input = pipeline
.apply(PubsubIO.Read.subscription(
"projects/my-data-analysis/subscriptions/myDataflowSub"));
input.apply(ParDo.of(new DoFn<String, Void>() {
@Override
public void processElement(DoFn<String, Void>.ProcessContext c) throws Exception {
LOG.info("json" + c.element());
}
}));
String fileName = UUID.randomUUID().toString().replaceAll("-", "");
input.apply(ParDo.of(new DoFn<String, String>() {
@Override
public void processElement(DoFn<String, String>.ProcessContext c) throws Exception {
JSONObject firstJSONObject = new JSONObject(c.element());
firstJSONObject.put("a", firstJSONObject.get("a").toString()+ "1000");
c.output(firstJSONObject.toString());
}
}).named("update json")).apply(ParDo.of(new DoFn<String, TableRow>() {
@Override
public void processElement(DoFn<String, TableRow>.ProcessContext c) throws Exception {
JSONObject json = new JSONObject(c.element());
TableRow row = new TableRow().set("a", json.get("a")).set("b", json.get("b")).set("c", json.get("c"));
c.output(row);
}
}).named("convert json to table row"))
.apply(BigQueryIO.Write.to("my-data-analysis:mydataset.mytable").withSchema(tableSchema)
);
pipeline.run();
}
}

You need to specify withWriteDisposition on your BigQueryIO.Write - see the documentation of the method and of its argument. Depending on your requirements, you need either WRITE_TRUNCATE or WRITE_APPEND.
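For the pipeline in the question, that is one extra call on the existing write step, roughly like this (a minimal sketch of just the write; "rows" stands for the PCollection<TableRow> produced by the "convert json to table row" step, and tableSchema and the table name are reused from the code above):
// Sketch: the write step with an explicit write disposition, so rerunning
// the job appends to the non-empty table instead of failing.
rows.apply(BigQueryIO.Write.to("my-data-analysis:mydataset.mytable")
    .withSchema(tableSchema)
    .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));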

Related

How to use IBM MobileFirst java adapter to update existing entity?

The JAX-RS resource method can receive JSON that is only part of the document.
My issue is that I have to update an existing object (entity), so I decided to create a JAX-RS ContainerRequestFilter. The filter has to fetch the existing object, replace its properties with the new ones, and put the result back into the stream, so that I get the complete entity in my resource method.
First I have to get the data for the authenticated user, but 'securityContext.getAuthenticatedUser()' returns only partially provided JSON data.
Is there any possibility to get the authenticated user's data in a JAX-RS filter (on the IBM MobileFirst platform)?
There is the code of my filter:
@Provider
//@ManagedBean
public class UpdateFilter implements ContainerRequestFilter {
//ReaderInterceptor {
//@Inject
//ExistingObjectDao existingObjectDao;
@Context
AdapterSecurityContext securityContext;
@Override
@OAuthSecurity(scope = "protected") //doesn't work
public void filter(ContainerRequestContext context) throws IOException {
//context.getSecurityContext().getUserPrincipal() // is null
AuthenticatedUser user = securityContext.getAuthenticatedUser(); //is null
Map<String, String> authParams = (Map<String, String>) user.getAttributes().get("lotusCredentials");
InputStream inputStream = context.getEntityStream();
byte[] bytes = new byte[inputStream.available()];
inputStream.read(bytes);
String responseContent = new String(bytes);
String id = context.getUriInfo().getPathParameters().getFirst("id");
Object existingObject = null;
try {
existingObject = existingObjectDao.get(id, authParams);
} catch (Exception e) {
e.printStackTrace();
}
if (existingObject != null) {
ObjectMapper objectMapper = new ObjectMapper();
ObjectReader reader = objectMapper.readerForUpdating(existingObject );
JsonNode r = reader.readTree(responseContent);
responseContent = objectMapper.writer().writeValueAsString(r);
}
context.setEntityStream(new ByteArrayInputStream(responseContent.getBytes()));
}
}

Writing to BigQuery from Cloud Dataflow: Unable to create a side-input view from input

I'm trying to write a Dataflow pipeline that reads a stream from Pub/Sub and writes it into BigQuery.
When trying to run it I get the error "Unable to create a side-input view from input" with the stack trace:
Exception in thread "main" java.lang.IllegalStateException: Unable to create a side-input view from input
at com.google.cloud.dataflow.sdk.transforms.View$AsIterable.validate(View.java:277)
at com.google.cloud.dataflow.sdk.transforms.View$AsIterable.validate(View.java:268)
at com.google.cloud.dataflow.sdk.Pipeline.applyInternal(Pipeline.java:366)
at com.google.cloud.dataflow.sdk.Pipeline.applyTransform(Pipeline.java:274)
at com.google.cloud.dataflow.sdk.values.PCollection.apply(PCollection.java:161)
at com.google.cloud.dataflow.sdk.io.Write$Bound.createWrite(Write.java:214)
at com.google.cloud.dataflow.sdk.io.Write$Bound.apply(Write.java:79)
at com.google.cloud.dataflow.sdk.io.Write$Bound.apply(Write.java:68)
at com.google.cloud.dataflow.sdk.runners.PipelineRunner.apply(PipelineRunner.java:74)
at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner.apply(DirectPipelineRunner.java:247)
at com.google.cloud.dataflow.sdk.Pipeline.applyInternal(Pipeline.java:367)
at com.google.cloud.dataflow.sdk.Pipeline.applyTransform(Pipeline.java:290)
at com.google.cloud.dataflow.sdk.values.PCollection.apply(PCollection.java:174)
at com.google.cloud.dataflow.sdk.io.BigQueryIO$Write$Bound.apply(BigQueryIO.java:1738)
at com.google.cloud.dataflow.sdk.io.BigQueryIO$Write$Bound.apply(BigQueryIO.java:1440)
at com.google.cloud.dataflow.sdk.runners.PipelineRunner.apply(PipelineRunner.java:74)
at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner.apply(DirectPipelineRunner.java:247)
at com.google.cloud.dataflow.sdk.Pipeline.applyInternal(Pipeline.java:367)
at com.google.cloud.dataflow.sdk.Pipeline.applyTransform(Pipeline.java:274)
at com.google.cloud.dataflow.sdk.values.PCollection.apply(PCollection.java:161)
at co.uk.bubblestudent.dataflow.StarterPipeline.main(StarterPipeline.java:116)
Caused by: java.lang.IllegalStateException: GroupByKey cannot be applied to non-bounded PCollection in the GlobalWindow without a trigger. Use a Window.into or Window.triggering transform prior to GroupByKey.
at com.google.cloud.dataflow.sdk.transforms.GroupByKey.applicableTo(GroupByKey.java:192)
at com.google.cloud.dataflow.sdk.transforms.View$AsIterable.validate(View.java:275)
... 20 more
My code is:
public class StarterPipeline {
public static final Duration ONE_DAY = Duration.standardDays(1);
public static final Duration ONE_HOUR = Duration.standardHours(1);
public static final Duration TEN_SECONDS = Duration.standardSeconds(10);
private static final Logger LOG = LoggerFactory.getLogger(StarterPipeline.class);
private static TableSchema schemaGen() {
List<TableFieldSchema> fields = new ArrayList<>();
fields.add(new TableFieldSchema().setName("facebookID").setType("STRING"));
fields.add(new TableFieldSchema().setName("propertyID").setType("STRING"));
fields.add(new TableFieldSchema().setName("time").setType("TIMESTAMP"));
TableSchema schema = new TableSchema().setFields(fields);
return schema;
}
public static void main(String[] args) {
LOG.info("Starting");
DataflowPipelineOptions options = PipelineOptionsFactory.as(DataflowPipelineOptions.class);
LOG.info("Pipeline made");
// For Cloud execution, set the Cloud Platform project, staging location,
// and specify DataflowPipelineRunner or BlockingDataflowPipelineRunner.
options.setProject(<project>);
options.setStagingLocation(<bucket>);
options.setTempLocation(<bucket>);
Pipeline p = Pipeline.create(options);
TableSchema schema = schemaGen();
LOG.info("Schema made");
try {
LOG.info(schema.toPrettyString());
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
PCollection<String> input = p.apply(PubsubIO.Read.named("ReadFromPubsub").subscription(<subscription>));
PCollection<TableRow> pardo = input.apply(ParDo.of(new FormatAsTableRowFn()));
LOG.info("Formatted Row");
pardo.apply(BigQueryIO.Write.named("Write into BigQuery").to(<table>)
.withSchema(schema)
.withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
.withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
LOG.info("about to run");
p.run();
}
static class FormatAsTableRowFn extends DoFn<String, TableRow> {
@Override
public void processElement(ProcessContext c) {
LOG.info("Formatting");
String json = c.element();
//HashMap<String,String> items = new Gson().fromJson(json, new TypeToken<HashMap<String, String>>(){}.getType());
// Make a BigQuery row from the JSON object:
TableRow row = new TableRow()
.set("facebookID","324234")
.set("properttyID", "23423")
.set("time", "12312313123");
/*
* TableRow row = new TableRow()
.set("facebookID", items.get("facbookID"))
.set("properttyID", items.get("propertyID"))
.set("time", items.get("time"));
*/
c.output(row);
}
}
}
Any suggestions on what this might be?
The default implementation of BigQueryIO only works over bounded PCollections, and PubsubIO.Read produces an unbounded PCollection.
There are two ways to fix this: you can bound the input by calling maxReadTime or maxNumElements on your PubsubIO transform, or you can use the streaming insert type of BigQueryIO by calling setStreaming(true) on your options.
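For the streaming route, that is one extra line on the pipeline options before the pipeline is created (a minimal sketch; the project, staging and temp settings stay as in the question):
// Sketch: enable streaming mode so BigQueryIO uses streaming inserts,
// which can consume the unbounded PCollection produced by PubsubIO.Read.
DataflowPipelineOptions options = PipelineOptionsFactory.as(DataflowPipelineOptions.class);
options.setStreaming(true);
Pipeline p = Pipeline.create(options);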

Spring jmsTemplate send unit testing doesn't work

My service method looks like below. I am trying to mock JmsTemplate so that it sends a message during unit testing, but it doesn't execute jmsTemplate.send(...); it goes directly to the next line. How can I get the jmsTemplate.send(...) part of my service class to execute in a unit test?
public int invokeCallbackListener(final MyObject payload, final MyTask task) throws Exception{
//create map of payload and taskId
int taskStatusCd = task.getTaskSatus().getCode();
final Map<String, Object> map = new HashMap<String, Object>();
map.put(PAYLOAD_KEY, payload);
map.put(TASK_ID_KEY, task.getTaskId());
//generate JMSCorrelationID
final String correlationId = UUID.randomUUID().toString();
String requestQueue = System.getProperty("REQUEST_QUEUE");
requestQueue = requestQueue!=null?requestQueue:ExportConstants.DEFAULT_REQUEST_QUEUE;
jmsTemplate.send(requestQueue, new MessageCreator() {
@Override
public Message createMessage(Session session) throws JMSException {
ObjectMessage message = session.createObjectMessage((Serializable) map); // fails here: message returns null
message.setJMSCorrelationID(correlationId);
message.setStringProperty(MESSAGE_TYPE_PROPERTY,payload.getMessageType().getMessageType());
return message;
}
});
l.info("Request Message sent with correlationID: " + correlationId);
taskStatusCd = waitForResponseStatus(task.getTaskId(), taskStatusCd, correlationId);
return taskStatusCd;
}
This is my test class code.
RemoteInvocationService remoteInvocationService;
JmsTemplate mockTemplate;
Session mockSession;
Queue mockQueue;
ObjectMessage mockMessage;
MessageCreator mockmessageCreator;
@Before
public void setUp() throws Exception {
remoteInvocationService = new RemoteInvocationService();
mockTemplate = mock(JmsTemplate.class);
mockSession = mock(Session.class);
mockQueue = mock(Queue.class);
mockMessage = mock(ObjectMessage.class);
mockmessageCreator = mock(MessageCreator.class);
when(mockSession.createObjectMessage()).thenReturn(mockMessage);
when(mockQueue.toString()).thenReturn("testQueue");
Mockito.doAnswer(new Answer<Message>() {
@Override
public Message answer(final InvocationOnMock invocation) throws JMSException {
final Object[] args = invocation.getArguments();
final String arg2 = (String)args[0];
final MessageCreator arg = (MessageCreator)args[1];
return arg.createMessage(mockSession);
}
}).when(mockTemplate).send(Mockito.any(MessageCreator.class));
mockTemplate.setDefaultDestination(mockQueue);
remoteInvocationService.setJmsTemplate(mockTemplate);
}
@Test
public void testMessage() throws Exception{
MyTask task = new MyTask();
task.setTaskSatus(Status.Pending);
remoteInvocationService.invokeCallbackListener(new MyObject(), task);
}
I also have the code below, which receives a message, but the status object I get is null.
Message receivedMsg = jmsTemplate.receiveSelected(responseQueue, messageSelector);
if (receivedMsg instanceof TextMessage) {
TextMessage status = (TextMessage) receivedMsg;
l.info(status.getText());
}
Below is the test code:
TextMessage mockTextMessage;
when(mockSession.createTextMessage()).thenReturn(mockTextMessage);
mockTextMessage.setText("5");
when(mockTemplate.receiveSelected(Mockito.any(String.class), Mockito.any(String.class))).thenReturn(mockTextMessage);
You are mocking the send method that accepts only one parameter (MessageCreator), but you are actually calling the one that accepts two (String, MessageCreator).
Add the String to your mock:
Mockito.doAnswer(new Answer<Message>() {
@Override
public Message answer(final InvocationOnMock invocation) throws JMSException {
final Object[] args = invocation.getArguments();
final MessageCreator arg = (MessageCreator)args[0];
return arg.createMessage(mockSession);
}
}).when(mockTemplate).send(Mockito.any(String.class), Mockito.any(MessageCreator.class));
There is another mistake when mocking the session. You are mocking the method without parameters:
when(mockSession.createObjectMessage()).thenReturn(mockMessage);
but you actually need to mock the one with the Serializable param:
when(mockSession.createObjectMessage(Mockito.any(Serializable.class))).thenReturn(mockMessage);
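With both mocks corrected you can also assert afterwards that the two-argument overload was the one invoked, for example (a short sketch using plain Mockito verification):
// Sketch: verify the service called send(String, MessageCreator) on the mock.
Mockito.verify(mockTemplate).send(Mockito.any(String.class), Mockito.any(MessageCreator.class));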

Distributed Cache in Pig UDF

Here is my code to implement a UDF using the Distributed Cache in Pig.
public class Regex extends EvalFunc<Integer> {
static HashMap<String, String> map = new HashMap<String, String>();
public List<String> getCacheFiles() {
Path lookup_file = new Path(
"hdfs://localhost.localdomain:8020/user/cloudera/top");
List<String> list = new ArrayList<String>(1);
list.add(lookup_file + "#id_lookup");
return list;
}
public void VectorizeData() throws IOException {
FileReader fr = new FileReader("./id_lookup");
BufferedReader brd = new BufferedReader(fr);
String line;
while ((line = brd.readLine()) != null) {
String str[] = line.split("#");
map.put(str[0], str[1]);
}
fr.close();
}
@Override
public Integer exec(Tuple input) throws IOException {
// TODO Auto-generated method stub
return map.size();
}
}
Given below is my distributed cache input file (hdfs://localhost.localdomain:8020/user/cloudera/top):
Impetigo|Streptococcus pyogenes#Impetigo
indeterminate leprosy|Uncharacteristic leprosy#indeterminate leprosy
The output I get is:
(0)
(0)
(0)
(0)
(0)
This means that my HashMap is empty.
How do I fill my HashMap using the Distributed Cache?
This was because VectorizeData() was never called when the UDF executed.
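One way to wire it in is to load the file lazily on the first call to exec(), for example (a minimal sketch that keeps the rest of the UDF above unchanged):
// Sketch: populate the static map from the cached ./id_lookup file before first use.
@Override
public Integer exec(Tuple input) throws IOException {
    if (map.isEmpty()) {
        VectorizeData();
    }
    return map.size();
}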

Implementation of simple Java IDE using Runtime Process and JTextArea

I am developing a simple Java IDE like NetBeans/Eclipse. My GUI includes two JTextArea components: one used as a text editor where the end user can type in his programs, and the other used as an output window.
I run the user's programs by invoking the Windows command prompt through the Java Runtime and Process classes. I also capture the IO streams of the process using the methods getInputStream(), getErrorStream(), and getOutputStream().
If the program contains only statements that print something to the screen, I am able to display the output on the output window (JTextArea). But if it includes statements that read input from the user, then it must be possible for the user to type the expected input value via the output window and have it sent to the process, just as in NetBeans/Eclipse.
I also checked the following link
java: work with stdin/stdout of process in same time
Using this code, I am able to display only the statements waiting for input and not simple output statements. Also, only a single line is displayed on the output window at a time.
It would be great if anybody could help me resolve this issue.
Thanks
Haleema
I've found the solution with a little modification to the earlier post java: work with stdin/stdout of process in same time:
class RunFile implements Runnable{
public Thread program = null;
public Process process = null;
private JTextArea console;
private String fn;
public RunFile(JTextArea cons,String filename){
console = cons;
fn=filename;
program = new Thread(this);
program.start();
}
@Override
public void run() {
try {
String commandj[] = new String[4];
commandj[0] = "cmd";
commandj[1]="/C";
commandj[2]="java";
commandj[3] = fn;
String envp[] = new String[1];
envp[0]="path=C:/Program Files (x86)/Java/jdk1.6.0/bin";
File dir = new File("Path to File");
Runtime rt = Runtime.getRuntime();
process = rt.exec(commandj,envp,dir);
ReadStdout read = new ReadStdout(process,console);
WriteStdin write = new WriteStdin(process, console);
int x=process.waitFor();
console.append("\nExit value: " + process.exitValue() + "\n");
}
catch (InterruptedException e) {}
catch (IOException e1) {}
}
}
class WriteStdin implements Runnable{
private Process process = null;
private JTextArea console = null;
public Thread write = null;
private String input = null;
private BufferedWriter writer = null;
public WriteStdin(Process p, JTextArea t){
process = p;
console = t;
writer = new BufferedWriter(new OutputStreamWriter(process.getOutputStream()));
write = new Thread(this);
write.start();
console.addKeyListener(new java.awt.event.KeyAdapter() {
@Override
public void keyTyped(java.awt.event.KeyEvent e){
//save the last lines for console to variable input
if(e.getKeyChar() == '\n'){
try {
int line = console.getLineCount() -2;
int start = console.getLineStartOffset(line);
int end = console.getLineEndOffset(line);
input = console.getText(start, end - start);
write.resume();
} catch (BadLocationException e1) {}
}
}
});
console.addCaretListener(new javax.swing.event.CaretListener() {
@Override
public void caretUpdate(CaretEvent e) {
console.setCaretPosition(console.getDocument().getLength());
throw new UnsupportedOperationException("Not supported yet.");
}
});
console.addFocusListener(new java.awt.event.FocusAdapter() {
@Override
public void focusGained(java.awt.event.FocusEvent e)
{
console.setCaretPosition(console.getDocument().getLength());
}
});
}
@Override
public void run(){
write.suspend();
while(true){
try {
//send variable input in stdin of process
writer.write(input);
writer.flush();
} catch (IOException e) {}
write.suspend();
}
}
}
class ReadStdout implements Runnable{
public Thread read = null;
private BufferedReader reader = null;
private Process process = null;
private JTextArea console = null;
public ReadStdout(Process p,JTextArea t){
process = p;
reader = new BufferedReader(new InputStreamReader(process.getInputStream()));
console = t;
read = new Thread(this);
read.start();
}
public void run() {
String line;
try {
while((line = reader.readLine())!=null)
console.append(line+"\n");
}catch (IOException e) {}
}
}