BigTable Scan withStopRow inclusive issue - bigtable

Maybe I am mistaken, but it seems that the "inclusive" boolean on the scan's stop row is not working. Below, I would expect the scan to include "row3" because I pass withStopRow("row3".getBytes(), true); however, the scan only reaches row2.
Output Received:
Row: row1
Row: row2
pom below...
<dependency>
  <groupId>com.google.cloud.bigtable</groupId>
  <artifactId>bigtable-hbase-1.x-hadoop</artifactId>
  <version>1.4.0</version>
</dependency>
Java Code:
Table table = conn.getTable(TABLE_NAME);
table.put(new Put("row1".getBytes()).addColumn("fam".getBytes(), "qual".getBytes(), "val".getBytes()));
table.put(new Put("row2".getBytes()).addColumn("fam".getBytes(), "qual".getBytes(), "val".getBytes()));
table.put(new Put("row3".getBytes()).addColumn("fam".getBytes(), "qual".getBytes(), "val".getBytes()));
for (Result r : table.getScanner(new Scan().withStartRow("row1".getBytes()).withStopRow("row3".getBytes(), true))) {
    System.out.println("Row: " + new String(r.getRow()));
}

Bug acknowledged, see the comment above. Issue created on GitHub:
https://github.com/GoogleCloudPlatform/cloud-bigtable-client/issues/1850
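Until that issue is fixed, one possible workaround (my sketch, not from the issue thread) is to emulate an inclusive stop row with an exclusive one: append a zero byte to the stop row, which yields the smallest row key strictly greater than it.

import java.util.Arrays;

// "row3" + 0x00 sorts immediately after "row3", so an exclusive stop
// at it behaves like an inclusive stop at "row3"
byte[] stop = "row3".getBytes();
byte[] stopExclusive = Arrays.copyOf(stop, stop.length + 1); // pads with (byte) 0

for (Result r : table.getScanner(new Scan()
        .withStartRow("row1".getBytes())
        .withStopRow(stopExclusive, false))) {
    System.out.println("Row: " + new String(r.getRow()));
}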

Related

Scope DataFrame transformations in Spark

I need to transform some DataFrame rows for which a specific flag is set and leave all other rows untouched.
df.withColumn("a", when($"flag".isNotNull, lit(1)).otherwise($"a"))
  .withColumn("b", when($"flag".isNotNull, $"b" + 1).otherwise($"b"))
  .withColumn("c", when($"flag".isNotNull, concat($"c", "++")).otherwise($"c"))
There might be more columns like that, and I am looking for a way to refactor this into something nicer.
I thought about:
df.filter($"flag".isNotNull)
  .withColumn("a", lit(1))
  .withColumn("b", $"b" + 1)
  .withColumn("c", concat($"c", "++"))
  .union(df.filter($"flag".isNull))
but it scans/recalculates df twice. Even if I cache it, the plan contains the lineage of each branch separately, and since I actually chain multiple similar transformations, the final plan explodes exponentially and crashes.
Would it be possible to implement something like:
df.withScope($"flag".isNotNull) { scoped =>
  scoped.withColumn("a", lit(1))
    .withColumn("b", $"b" + 1)
    .withColumn("c", concat($"c", "++"))
}
Using when expressions is ok. You can write something like this:
// assumes spark.implicits._ is in scope for the $"..." syntax
import org.apache.spark.sql.functions.{col, concat, lit, when}

val updates = Map(
  "a" -> lit(1),
  "b" -> $"b" + 1,
  "c" -> concat($"c", "++")
)

val df2 = updates.foldLeft(df) { case (acc, (c, v)) =>
  acc.withColumn(c, when($"flag".isNotNull, v).otherwise(col(c)))
}
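If you want the withScope shape from your question, the same fold packages up as a small helper. A minimal sketch; withScoped is a made-up name, not a Spark API:

import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{col, when}

// apply every update only where `cond` holds; other rows pass through untouched
def withScoped(df: DataFrame, cond: Column)(updates: Map[String, Column]): DataFrame =
  updates.foldLeft(df) { case (acc, (name, value)) =>
    acc.withColumn(name, when(cond, value).otherwise(col(name)))
  }

val df3 = withScoped(df, $"flag".isNotNull)(updates)

Because everything stays in one chain of withColumn calls over the same plan, df is scanned once and there is no union of branches.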

How to efficiently paste many variables into a sql query (R shiny)

I'm building a shiny app where the user can update a table in a database by editing a selected row in a DT table.
The problem is that the process becomes time-consuming when the DT table has many columns (let's say 25 for instance). So I was wondering if there is a nice, efficient way to link my "vals" variables in the query below with the dataframe columns?
The code below works, but since my DT table has more than 60 columns I really cannot stick to this solution... :(
selected_row <- donnees[input$dt_rows_selected, ]
query <- glue_sql('UPDATE myschema.mytable SET field1 = ({vals*}), field2 = ({vals2*}), field3 = ({vals3*}), field4 = ({vals4*}), field5 = ({vals5*}) WHERE id IN ({ID_field*});',
                  vals = selected_row$column1, vals2 = selected_row$column2, vals3 = selected_row$column3,
                  vals4 = selected_row$column4, vals5 = selected_row$column5, ID_field = selected_row$ID,
                  .con = pool)
DBI::dbExecute(pool2, query)
The purpose of this answer is two-fold:
Demonstrate the (a?) proper postgres-style upsert action. I present a pg_upsert function, and in that function I've included (prefixed with #'#) what the query looks like when finished. The query is formed dynamically, so does not need a priori knowledge of the fields other than the user-provided idfields= argument.
Demonstrate how to react to DT-edits using this function. This is one way and there are definitely other ways to formulate how to deal with the reactive DT. If you have a different style for keeping track of changes in the DT, then feel free to take pg_upsert and run with it!
Notes:
it does not update the database with each cell edit, the changes are "batched" until the user clicks the Upsert! button; it is feasible to change to "upsert on each cell", but that would be a relatively trivial query, no need for upserts
since you're using postgres, the target table must have one or more unique indices (see No unique or exclusion constraint matching the ON CONFLICT); I'll create the sample data and the index on said table; if you don't understand what this means and your data doesn't have a clear "id" field(s), then do what I did: add an id column (both locally and in the db) that sequences along your real rows (this won't work if your data is preexisting and has no id fields)
the id field(s) must not be editable, so the editable= part of DT disables changing that column; I included a query (found in https://stackoverflow.com/a/2213199/3358272) that will tell you these fields programmatically; if this returns nothing, then go back to the previous bullet and fix it
the pg_upsert function takes a few steps to ensure things are clean (i.e., checks for duplicate ids), but does not check for incorrect new-values (DT does some of this for you, by class I believe), I'll assume you are verifying what you need before sending for an upsert;
the return value from pg_upsert is logical, indicating that the upsert action updated as many rows as we expected; this might be overly aggressive, though I cannot think of an example when it would correctly return other than nrow(value); caveat emptor
I include an optional "dbout" table in the shiny layout solely to show the current state of the database data, updated every time pg_upsert is called (indirectly); if no changes have been made, it will still query to show the current state, and is therefore the best way to show the starting condition for your testing; again, it is optional. When you remove it (and you should) and nothing else uses the do_update() reactive, then change
do_update <- eventReactive(input$upbtn, ...)
output$dbout <- renderTable({ do_update(); ... })
to
observeEvent(input$upbtn, ...)
# output$dbout <- renderTable({ do_update(); ... })
(Otherwise, a reactive(.) block that is never used downstream will never fire, so your updates would not happen.)
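For concreteness, the observeEvent variant might look like this, a sketch that mirrors the do_update body from the full app below (same mydata, idfieldnums, and pg_upsert definitions):

observeEvent(input$upbtn, {
  if (isTRUE(nrow(mydata$changes) > 0)) {
    updateddata <- mydata$data[mydata$changes$row, c(mydata$changes$col, idfieldnums)]
    res <- pg_upsert(updateddata, "mydata", idfields = "id", con = pgcon)
    # clear the stored changes only if the upsert succeeded
    if (res) mydata$changes <- mydata$changes[0, ]
  }
})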
This app queries the database for all values (into curdata), this is likely already being done in your case. This app also finds (programmatically) the required indices. If you know ahead of time what these are, feel free to drop the query that feeds idfields and just assign it directly (case-sensitive).
When the app exits, the user-edited data is not stored in the local R console/environment, all changes are stored in the database. It's my assumption that this will be formalized into a shiny-server, RStudio Connect, or similar production server, in which case "console" has little meaning. If you really need the user-changed data to be available on the local R console while you are developing your app, then in addition to using mydata reactive values, after mydata$data is reassigned you can overwrite curdata <<- mydata$data (note the double < in <<-). I discourage this practice in production but it might be useful while in development.
Here is a setup for sample data. It doesn't matter if you have 6 (as here) or 60 columns, the premise remains. (After this, origdata is not used, it was a throw-away to prep for this answer.)
# pgcon <- DBI::dbConnect(...)
set.seed(42)
origdata <- iris[sample(nrow(iris), 6),]
origdata$id <- seq_len(nrow(origdata))
# setup for this answer
DBI::dbExecute(pgcon, "drop table if exists mydata")
DBI::dbWriteTable(pgcon, "mydata", origdata)
# postgres upserts require 'unique' index on 'id'
DBI::dbExecute(pgcon, "create unique index mydata_id_idx on mydata (id)")
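Optionally (my addition, not needed for the app), you can sanity-check that the seed rows and the unique index landed:

# confirm the six seed rows and the unique index on "id"
DBI::dbGetQuery(pgcon, "select * from mydata order by id")
DBI::dbGetQuery(pgcon, "select indexname, indexdef from pg_indexes where tablename = 'mydata'")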
Here is the UPSERT function itself, broken out to facilitate testing, console evaluation, and similar.
#' @param value 'data.frame', values to be updated, does not need to
#'   include all columns in the database
#' @param name 'character', the table name to receive the updated
#'   values
#' @param idfields 'character', one or more id fields that are present
#'   in both the 'value' and the database table, these cannot change
#' @param con database connection object, from [DBI::dbConnect()]
#' @param verbose 'logical', be verbose about operation, default true
#' @return logical, whether 'nrow(value)' rows were affected; if an
#'   error occurred, it is messaged to the console and a `FALSE` is
#'   returned
pg_upsert <- function(value, name, idfields, con = NULL, verbose = TRUE) {
  if (verbose) message(Sys.time(), " upsert ", name, " with ", nrow(value), " rows")
  if (any(duplicated(value[idfields]))) {
    message("'value' contains duplicates in the idfields, upsert will not work")
    return(FALSE)
  }
  # stage the new values in a uniquely named temp table, dropped on exit
  tmptable <- paste(c("uptemp_", name, "_", sample(1e6, size = 1)), collapse = "")
  on.exit({
    DBI::dbExecute(con, paste("drop table if exists", tmptable))
  }, add = TRUE)
  DBI::dbWriteTable(con, tmptable, value)
  cn <- colnames(value)
  quotednms <- DBI::dbQuoteIdentifier(con, cn)
  notid <- DBI::dbQuoteIdentifier(con, setdiff(cn, idfields))
  qry <- sprintf(
    "INSERT INTO %s ( %s )
     SELECT %s FROM %s
     ON CONFLICT ( %s ) DO
     UPDATE SET %s",
    name, paste(quotednms, collapse = " , "),
    paste(quotednms, collapse = " , "), tmptable,
    paste(DBI::dbQuoteIdentifier(con, idfields), collapse = " , "),
    paste(paste(notid, paste0("EXCLUDED.", notid), sep = "="), collapse = " , "))
  #'# INSERT INTO mydata ( "Sepal.Length" , "Petal.Length" , "id" )
  #'# SELECT "Sepal.Length" , "Petal.Length" , "id" FROM uptemp_mydata_<random>
  #'# ON CONFLICT ( "id" ) DO
  #'# UPDATE SET "Sepal.Length"=EXCLUDED."Sepal.Length" , "Petal.Length"=EXCLUDED."Petal.Length"
  # dbExecute returns the number of rows affected; this ensures we
  # return a logical "yes, all rows were updated" or "no, something
  # went wrong"
  res <- tryCatch(DBI::dbExecute(con, qry), error = function(e) e)
  if (inherits(res, "error")) {
    msg <- paste("error upserting data:", conditionMessage(res))
    message(Sys.time(), " ", msg)
    ret <- FALSE
    attr(ret, "error") <- conditionMessage(res)
  } else {
    ret <- (res == nrow(value))
    if (!ret) {
      msg <- paste("expecting", nrow(value), "rows updated, returned", res, "rows updated")
      message(Sys.time(), " ", msg)
      attr(ret, "error") <- msg
    }
  }
  ret
}
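Before wiring it into shiny, you can exercise pg_upsert from the console. A quick test against the sample setup above (the new value here is made up):

# update one value in the row with id 2; should return TRUE (one row affected)
testrow <- data.frame(Sepal.Length = 99.9, id = 2)
pg_upsert(testrow, "mydata", idfields = "id", con = pgcon)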
Here's the shiny app. When you source this, you can immediately press Upsert! to get the current state of the database table (again, only an option, not required for production), no updated values are needed to requery.
library(shiny)
library(DT)
pgcon <- DBI::dbConnect(...) # fix this incomplete expression
curdata <- DBI::dbGetQuery(pgcon, "select * from mydata order by id")
# if you don't know the idfield(s) offhand, then use this:
idfields <- DBI::dbGetQuery(pgcon, "
select
t.relname as table_name,
i.relname as index_name,
a.attname as column_name
from
pg_class t,
pg_class i,
pg_index ix,
pg_attribute a
where
t.oid = ix.indrelid
and i.oid = ix.indexrelid
and a.attrelid = t.oid
and a.attnum = ANY(ix.indkey)
and t.relkind = 'r'
and t.relname = 'mydata'
order by
t.relname,
i.relname;")
idfieldnums <- which(colnames(curdata) %in% idfields$column_name)
shinyApp(
  ui = fluidPage(
    DTOutput("tbl"),
    actionButton("upbtn", "UPSERT!"),
    tableOutput("dbout")
  ),
  server = function(input, output) {
    mydata <- reactiveValues(data = curdata, changes = NULL)
    output$tbl = renderDT(
      mydata$data, options = list(lengthChange = FALSE),
      # disable editing of the id column(s), found programmatically above
      editable = list(target = "cell", disable = list(columns = idfieldnums)))
    observeEvent(input$tbl_cell_edit, {
      mydata$data <- editData(mydata$data, input$tbl_cell_edit)
      mydata$changes <- rbind(
        if (!is.null(mydata$changes)) mydata$changes,
        input$tbl_cell_edit
      )
      # keep only the most recent change to the same cell
      dupes <- rev(duplicated(mydata$changes[rev(seq(nrow(mydata$changes))), c("row", "col")]))
      mydata$changes <- mydata$changes[!dupes, ]
      message(Sys.time(), " pending changes: ", nrow(mydata$changes))
    })
    do_update <- eventReactive(input$upbtn, {
      if (isTRUE(nrow(mydata$changes) > 0)) {
        # always include the 'id' field(s)
        # idcol <- which(colnames(mydata$data) == "id")
        updateddata <- mydata$data[mydata$changes$row, c(mydata$changes$col, idfieldnums)]
        res <- pg_upsert(updateddata, "mydata", idfields = "id", con = pgcon)
        # clear the stored changes only if the upsert was successful
        if (res) mydata$changes <- mydata$changes[0, ]
      }
      input$upbtn
    })
    output$dbout <- renderTable({
      do_update() # react when changes are attempted, i.e. the button is pressed
      message(Sys.time(), " query 'mydata'")
      DBI::dbGetQuery(pgcon, "select * from mydata order by id")
    })
  }
)
In action:
(Left) When we start, we see the original DT and no database output.
(Middle) Press the Upsert! button just to query the db and show the optional table.
(Right) Make updates, then press Upsert!, and the database is updated (and the lower table re-queried).

access the index of the item inside reduce function in dataweave 2.0

My DataWeave code looks like below:
Result: Data reduce (item,ls={}) -> ls ++ From: {dev: item.warehouse}
Is there a way to check the index of the item object? I need a conditional based on the index of the item.
Example:
Item = Data[0] do this ;
Result: Data reduce (item,ls={}) -> ls ++ From: {dev: item.warehouse}
Item = Data[1] do this ;
Result: Data reduce (item,ls={}) -> ls ++ To: {dev: item.warehouse}
Original code looks like below:
Result: (Data reduce ((item, ls = {}) -> ls ++
  From: {id: "111", (if (item.sign == "333") {status: "OPEN"} else if (item.sign == "444") {status: "HOLD"} else {status: item.sign})}
))
I need to add "From" whenever the index of the item is an odd number and add "To" whenever the index is even.
Since I don't have the conditional, I always get "From".
No, you can't access any indexes; here's the documentation of reduce: https://docs.mulesoft.com/mule-runtime/4.1/dw-core-functions-reduce
What you can do is count the items yourself by modifying the structure of your accumulator: ls = {counter: 0, data: {}}
Now you can use the counter, adding one per iteration, to keep track of things: {counter: ls.counter + 1, data: ls.data ++ To: {dev: item.warehouse}}
As you can understand, you would need to add a conditional to differentiate between the From and To.
If I have time later on I'll do it for you, or somebody else can beat me to it.
EDIT: here's the best I can do based upon your question, but you should get the idea:
%dw 2.0
output application/dw
var inputdata = [{warehouse: 100}, {warehouse: 56}, {warehouse: 1000}]
---
inputdata reduce (
  (e, acc = {c: 0, data: {From: {}, To: {}}}) ->
    {
      c: acc.c + 1,
      data: {
        From: if (isEven(acc.c)) (acc.data.From ++ {warehouse: e.warehouse}) else acc.data.From,
        To: if (isEven(acc.c)) acc.data.To else (acc.data.To ++ {warehouse: e.warehouse})
      }
    }
)
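A variant of the same idea (my sketch, same sample data): pair every element with its zero-based index via map first, so the conditional reads off a real index instead of a hand-rolled counter:

%dw 2.0
output application/dw
var inputdata = [{warehouse: 100}, {warehouse: 56}, {warehouse: 1000}]
---
(inputdata map ((e, i) -> {pos: i, warehouse: e.warehouse}))
  reduce ((x, acc = {From: {}, To: {}}) ->
    if (isEven(x.pos))
      {From: acc.From ++ {warehouse: x.warehouse}, To: acc.To}
    else
      {From: acc.From, To: acc.To ++ {warehouse: x.warehouse}}
  )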
Always provide appropriate sample inputs and outputs of your transformation if you want to get the most out of the DW SO community ;)

What is the most optimized way to get a set of rows which is present in the middle of the list in Java 8?

I have a list of items, and I want to process a set of items that sit in the middle of the list.
Ex: Assume a list of employees who have id, first name, last name and middle name as attributes.
I want to consider all rows between lastName "xxx" and "yyy" and process them further.
How can this be optimized in Java 8? Optimization is my first concern.
I tried Java 8 streams and parallel streams, but termination (break) is not allowed in forEach, and we cannot mutate outer local variables (like the start flag below) inside the lambda.
Below is the code which I need to optimize:
boolean start = false;
for (Employee employee : employees) {
    if (employee.getLastname().equals("yyy")) {
        break;
    }
    if (start) {
        // My code to process
    }
    if (employee.getLastname().equals("xxx")) {
        start = true;
    }
}
What is the best way to handle the above problem in Java8?
That is possible in Java 9 (I've simplified your example):
Stream.of(1, 2, 3, 4, 5, 6)
.dropWhile(x -> x != 2)
.takeWhile(x -> x != 6)
.skip(1)
.forEach(System.out::println);
This takes the values strictly between 2 and 6, that is, it will print 3, 4, 5.
Or for your example (note the negated predicates, mirroring the x != 2 pattern above):
employees.stream()
         .dropWhile(e -> !e.getLastname().equals("xxx"))
         .takeWhile(e -> !e.getLastname().equals("yyy"))
         .skip(1)
         .forEach(....)
There are back-ports for dropWhile and takeWhile, see here and here
EDIT
Or you can get the indexes of those delimiters first and then do a subList (but this assumes that xxx and yyy are unique in the list of employees):
int[] indexes = IntStream.range(0, employees.size())
        .filter(x -> employees.get(x).getLastname().equals("xxx") || employees.get(x).getLastname().equals("yyy"))
        .toArray();
employees.subList(indexes[0] + 1, indexes[1])
         .forEach(System.out::println);
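One caveat (my addition, not part of the original answer): if either marker is missing, indexes will not have two entries and the subList call will fail, so a guard is worthwhile. process(...) below is a hypothetical stand-in for your row handling:

// only slice when both "xxx" and "yyy" markers were found exactly once
if (indexes.length == 2) {
    employees.subList(indexes[0] + 1, indexes[1])
             .forEach(e -> process(e));
}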

Specify local Dynamic in Grid

I would like to update specific parts of a Grid dynamically in different ways. Consider the following toy example: I have two rows: one must be updated one-by-one (a, b, c), as these symbols depend on different triggers; the second row depends on one single trigger (show) that allows displaying/hiding some data.
Now I know that I can wrap the whole Grid structure into Dynamic, and even specify which symbols to track, thus this example does what I want:
Checkbox[Dynamic[show]]
test = {0, 0};
Dynamic[Grid[{{Dynamic@a, Dynamic@b, Dynamic@c},
   If[show, Prepend[test, "test:"], {}]}, Frame -> All],
 TrackedSymbols :> {show}]
Though for certain reasons I would like to have a locally specified Dynamic, that is only applied to the second row of the Grid.
For those who are wondering what ungodly situation this would be, just imagine the following: show is used in any of a, b or c, and these I do NOT want to update when show changes; their changes depend on other triggers. Why not simply remove show from the tracked symbols of the first row? Imagine I can't, as show is present in a function that is used in a, b or c, and this function I cannot access easily.
Of course wrapping the first argument of If into Dynamic won't help here, as the Grid itself or any of its cells won't become dynamic:
Grid[{
  {Dynamic@a, Dynamic@b, Dynamic@c},
  If[Dynamic@show, Prepend[test, "test:"], {}]
  }, Frame -> All]
Furthermore, wrapping a row into Dynamic makes the given row invalid, as it does not have head List anymore:
Grid[{
  {Dynamic@a, Dynamic@b, Dynamic@c},
  Dynamic@If[show, Prepend[test, "test:"], {}]
  }, Frame -> All]
Mapping Dynamic over the row does not work either because show is not updated dynamically:
Grid[{
  {Dynamic@a, Dynamic@b, Dynamic@c},
  Dynamic /@ If[show, Prepend[test, "test:"], {}]
  }, Frame -> All]
Also, wrapping Dynamic[If[...]] around each list member works, but now If is evaluated three times instead of just once.
Grid[{
  {Dynamic@a, Dynamic@b, Dynamic@c},
  Dynamic[If[show, #, ""]] & /@ Prepend[test, "test:"]
  }, Frame -> All]
I would like to know if there is any solution that overcomes this particular problem by locally applying a Dynamic wrapper on a row.
Here is a solution using Experimental`ValueFunction.
show = True;
test = {0, 0};
Checkbox[Dynamic[show]]
Now write your own little Dynamic update function on the side
Needs["Experimental`"];
row = {};
updateRow[x_, v_] := row = If[v, Prepend[test, "test:"], {}];
ValueFunction[show] = updateRow;
Now make the Grid; Dynamic can now be used on EACH row, not around the whole Grid, which is what you wanted:
Grid[{
  {Dynamic@a, Dynamic@b, Dynamic@c},
  {Dynamic@row}
  },
 Frame -> All
 ]
ps. I just read a post here by telefunkenvf14 that mentions this package and this function, which I did not know about; when I saw the function, I remembered this question and thought it should be possible to use it to solve this problem.
ps. I need to work more on placing the grid row correctly....
update(1)
I can't figure out how to splice the final row over the columns in the grid, which is strange: it has head List, yet it won't go across all the columns; it only goes in the first cell. I tried Sequence, SpanFromLeft, and such, but no luck. Maybe someone can figure this part out.
Here is my current trial:
Needs["Experimental`"];
row = {};
updateRow[x_, v_] := row = If[v, {"test:", 0, 0}, {}];
ValueFunction[show] = updateRow;
show = False;
Checkbox[Dynamic[show]]
f = Grid[{
   {Dynamic@a, Dynamic@b, Dynamic@c},
   List@Dynamic[row]
   },
  Frame -> All
  ]
It seems it should be doable. I do not see what is the problem now...
update(2)
As a temporary solution, I split the second row by force beforehand. This made it possible to do what I want. Not sure if this meets the OP specifications or not (my guess is that it does not), but here it is:
Needs["Experimental`"];
ra = 0;
rb = 0;
rc = 0;
updateRow[x_, v_] :=
row = If[v, ra = "test:"; rb = 0; rc = 0, ra = ""; rb = ""; rc = ""]
ValueFunction[show] = updateRow;
show = False;
Checkbox[Dynamic[show]]
f = Grid[{
   {Dynamic@a, Dynamic@b, Dynamic@c},
   {Dynamic@ra, Dynamic@rb, Dynamic@rc}
   },
  Frame -> All]
This is actually a comment on @Nasser's solution and suggested fix to avoid manual splitting of the second row, but because of space limitations in the comment area, I post it as an answer. I will be happy to delete it as soon as Nasser confirms that it works and incorporates it into his answer.
The clue to a solution is found in the Possible Issues section of Item in the documentation:
If Item is not the top-most item in the child of a function that supports Item, it will not work.
I use this to modify @Nasser's solution in the following way. First, I need to change the definition of row so that for both values of show the length of row is the same.
Needs["Experimental`"];
row = {"", "", ""};
updateRow[x_, v_] := row = If[v, Prepend[test, "test:"], {"", "", ""}];
Experimental`ValueFunction[show] = updateRow;
The second change needed is to wrap each element of Dynamic@row with Item:
Grid[{{Dynamic@a, Dynamic@b, Dynamic@c},
  {Item[Dynamic@row[[1]]], Item[Dynamic@row[[2]]],
   Item[Dynamic@row[[3]]]}}, Frame -> All]
Edit: Item wrapper is not really needed; it works just as well without it:
Grid[{{Dynamic@a, Dynamic@b, Dynamic@c},
  {Dynamic@row[[1]], Dynamic@row[[2]],
   Dynamic@row[[3]]}}, Frame -> All]