Bulk Delete in Spring Data REST - spring-data-rest

In spring Data REST - is it possible to perform a bulk delete (i.e. by list of ids) without any additional code efforts?
The documentation states that it's possible:
"Methods used for invocation
The following methods are used if present (descending order):
delete(T)
delete(ID)
delete(Iterable)"
However, in RepositoryEntityController I don't see how this can be possible - only a standard delete by id endpoint is exposed.
Am I missing anything or documentation is wrong?
Thanks in advance.

Related

RESTful Endpoint that returns single entity by two different unique parameters but not both at the same time

I want to understand what would be the best way to represent this in a RESTful way, taking in consideration that the codebase it's a very large - inherited - legacy project and I have to add a lot of new functionality on top of it.
The API Definition is built with OpenaAPI3.
Let's take in consideration the following example:
/v1/{customer}/types/{id}
But the Types collection also has a database constraint of Unique(customer, code) - customer and code being columns from the Types table.
What I need to implement now is a new endpoint that will retrieve a single entity, based on the customer path param and code path param, without having to use the ID path param.
It's a matter of reducing the number of calls, that's why I don't want to make use of the ID path param also.
One solution would be to use query params:
/v1/{customer}/types?code=123
But this will basicaly return a Singleton List so it's not that trivial and definetley not a best practice.
What would be your take on this? I know I should have the ID in the place I want that entity to be returned, but this some case I want to get resovled without having to do another call to get the ID of the entity so I can call the initial endpoint.

Python Apache Beam: BigQuery streaming deduplication by row_id

According to BigQuery docs, you can ensure data consistency providing an insertId (https://cloud.google.com/bigquery/streaming-data-into-bigquery#dataconsistency). If it's not provided, BQ will try to ensure consistency based on internals Ids and best-effort.
Using the BQ API you can do that with the row_ids param (https://google-cloud-python.readthedocs.io/en/latest/bigquery/generated/google.cloud.bigquery.client.Client.insert_rows_json.html#google.cloud.bigquery.client.Client.insert_rows_json) but I can't find the same for the Apache Beam Python SDK.
Looking into the SDK I have noticed that a 'unique_row_id' property exist, but I really don't know how to pass my param to WriteToBigQuery()
How can I write into BQ (streaming) providing a row Id for deduplication?
Update:
If you use WriteToBigQuery then it will automatically create and
insert a unique row id called insertId for you, which will be inserted to bigquery. It's handled for you, you don't need to worry about it. :)
WriteToBigQuery is a PTransform, and in it's expand method calls BigQueryWriteFn
BigQueryWriteFn is a DoFn, and in it's process method calls _flush_batch
_flush_batch is a method that then calls the BigQueryWrapper.insert_rows method
BigQueryWrspper.insert_rows creates a list of bigquery.TableDataInsertAllRequest.RowsValueListEntry objects which contain the insertId and the row data as a json object
The insertId is generated by calling the unique_row_id method which returns a value consisting of UUID4 concatenated with _ and with an auto-incremented number.
In the current 2.7.0 code, there is this happy comment; I've also verified it is true :)
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L1182
# Prepare rows for insertion. Of special note is the row ID that we add to
# each row in order to help BigQuery avoid inserting a row multiple times.
# BigQuery will do a best-effort if unique IDs are provided. This situation
# can happen during retries on failures.
* Don't use BigQuerySink
At least, not in it's current form as it doesn't support streaming. I guess that might change.
Original (non)answer
Great question, I also looked and couldn't find a certain answer.
Apache Beam doesn't appear to use that google.cloud.bigquery client sdk you've linked to, it has some internal generated api client, but it appears to be up-to-date.
I looked at the source:
The insertall method is there https://github.com/apache/beam/blob/18d2168ee71a1b1b04976717f0f955199bb00961/sdks/python/apache_beam/io/gcp/internal/clients/bigquery/bigquery_v2_client.py#L476
I also found the insertid mentioned
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/internal/clients/bigquery/bigquery_v2_messages.py#L1707
So if you can make an InsertAll call it will use a TableDataInsertAllRequest and pass a RowsValueListEntry
class TableDataInsertAllRequest(_messages.Message):
"""A TableDataInsertAllRequest object.
Messages:
RowsValueListEntry: A RowsValueListEntry object.
The RowsValueListEntry message is where the insertid is.
Here's the API docs for insert all
https://cloud.google.com/bigquery/docs/reference/rest/v2/tabledata/insertAll
I will look some more at this because I don't see the WriteToBigQuery() exposing this.
I suspect that the 'bigquery will remember this for at least one minute` is a pretty loose guarantee for de-duping. The docs suggest using datastore if you need transactions. Otherwise you might need to run SQL with window functions to de-dupe at runtime, or run some other de-duping jobs on bigquery.
Perhaps using batch_size parameter of WriteToBigQuery(), and running a combine (or at worst a GroupByKey) step in dataflow is a more stable way to de-dupe prior to writing.

How to get multiple data from gemfire cacheloader?

We are going to implement gemfire for our project. We are currently syncing gemfire cache with our DB2 database. So, we are facing issue while putting DB data into cache.
To put DB data into region. I have implement com.gemstone.gemfire.cache.CacheLoader and override load method of it. As written in java doc load method will return only one Object. But for our requirement we will have to return multiple VO from load method
public List<CmDvceInvtrGemfireBean> load(LoaderHelper<CmDvceInvtrGemfireBean, CmDvceInvtrGemfireBean> helper)
throws CacheLoaderException
While returining multiple VO in form of List<CmDvceInvtrGemfireBean> gemfire region consider it's as single value.
So, when i invoke,
System.out.println("return COUNT" + cmDvceInvtrRecord.query("SELECT COUNT(*) FROM /cmDvceInvtrRecord"));
It return count of one. But i can see total 7 number of data into it.
So, I want to implement the kind of mechanism that will put all the 7 values as a separate VO in Region
Is there any way to do this using Gemfire CacheLoader?
A CacheLoader was meant to load a value only for a single entry in the GemFire Region on a cache miss. As the Javadoc states...
..creates the value for the desired key..
While a key can map to a multi-valued (e.g. an array/Collection) value, the CacheLoader can only populate a single entry.
You will have to resort to other means of populating the cache with multiple "entries" in a single operation.
Out of curiosity, why do you need (requirement?) to load multiple entries (from the DB) at once? Are you trying to minimize the number of round trips to the DB?
Also, what logic are you using to decide what VO from the DB will be loaded based on the information (i.e. key) provided in the CacheLoader?
For instance, are you somehow trying to predictably select values from the DB based on the CacheLoader key that would subsequently minimize cache misses on future Region.get(key) calls?
Sorry, I don't have a better answer for you right now, but answers to some of these questions may help me give you some ideas for alternatives.
Cheers,
John

Getting specific Backbone.js models from a collection without getting all models first

I'm new to Backbone.js. I'm intrigued by the idea that you can just supply a URL to a collection and then proceed to create, update, delete, and get models from that collection and it handle all the interaction with the API.
In the small task management sample applications and numerous demo's I've seen of this on the web, it seems that the collection.fetch() is used to pull down all models from the server then do something with them. However, more often than not, in a real application, you don't want to pull down hundreds of thousands or even millions of records by issuing a GET statement to the API.
Using the baked-in connection.sync method, how can I specify parameters to GET specific record sets? For example, I may want to GET records with a date of 2/1/2014 or GET records that owned by a specific user id.
In this question, collection.find is used to do this, but does this still pull down all records to the client first then "finds" them or does the collection.sync method know to specify arguments when doing a GET to the server?
You do use fetch, but you provide options as seen in collection.fetch([options]).
So for example to obtain the one model where id is myIDvar:
collection.fetch(
{
data: { id: myIDvar },
success: function (model, response, options) {
// do a little dance;
}
};
My offhand recollections is that find, findWhere and where would invoke all models being downloaded and then the filtering taking place on the client. I believe with fetch the filtering takes places on the server side.
You can implement some kind of pagination on server side and update your collection with limited number of records. In this case all your data will be up to date with backend.
You can do it by overriding fetch method with you own implementaion, or specify params
For example:
collection.fetch({data: {page: 3})
You can also use find where method here
collection.findWhere(attributes)

How to invalid cache when using ListGrid and Datasource

As per what I found reading other sites :
SmartGWT uses data caching to optimize client-server connections and reduce network traffic. In your example, let's say you have the following in your database:
one word
two words
one sentence
When you type word, the fetch returns:
one word
two words
These values are cached in your client.
When you add one to word, because this is a more restraining search criteria, no need to server fetch, only client filter and the result is:
one word
Is there a way of avoiding this and make the search always against the server?
You can user the following properties of DataSource to turn caching OFF.
dataSource.setCacheAllData(false);
dataSource.setAutoCacheAllData(false);
If you want to turn the caching On, pass "true" to both function calls.
manually calling invalidateCache() on listgrid component should run the fetch method with actual criteria