Firebase Multiple Collection Update - objective-c

Is there a way to update two documents, each in a different
collection, in one request?
I know you can batch write using FIRWriteBatch - this seems to be limited to the same collection for any document updates. When trying to attach updates for documents in two different collections:
// Just for example
FIRWriteBatch *batch = FIRWriteBatch.new;
[batch updateData:@{@"posts" : @1} forDocument:[self.firebase.usersCollection documentWithPath:@"some_user_id"]];
[batch setData:@{@"test" : @"cool"} forDocument:[self.firebase.postsCollection documentWithPath:@"some_post_id"]];
[batch commitWithCompletion:^(NSError * _Nullable error) {
    NSLog(@"error: %@", error.localizedDescription);
}];
It never executes - the app crashes and I get this:
Terminating app due to uncaught exception 'FIRInvalidArgumentException', reason:
'Provided document reference is from a different Firestore instance.'
Apparently the batch does not like updates in more than one collection.
Does anyone know how you can update two documents, each in a different collection, without one failing and the other succeeding?
I want to avoid, for example, successfully setting posts = 1 for a document in usersCollection, while failing to write a new document in postsCollection.
I understand it's pretty unlikely that one will write while the other fails, but in the case that it does happen, I obviously don't want inconsistent data.
NOTE:
For anyone who cares - I don't know if it ever will fail, but as of now I am running the transaction without reading the document before updating data... 🤷‍♂️ Cheers to -1 API call!

You should use a transaction, which is documented adjacent to batch writes:
Using the Cloud Firestore client libraries, you can group multiple
operations into a single transaction. Transactions are useful when you
want to update a field's value based on its current value, or the
value of some other field. You could increment a counter by creating a
transaction that reads the current value of the counter, increments
it, and writes the new value to Cloud Firestore.
You are not limited to a single collection when performing a transaction. You are just obliged to read the document before you write it:
A transaction consists of any number of get() operations followed by
any number of write operations such as set(), update(), or delete().
In the case of a concurrent edit, Cloud Firestore runs the entire
transaction again. For example, if a transaction reads documents and
another client modifies any of those documents, Cloud Firestore
retries the transaction. This feature ensures that the transaction
runs on up-to-date and consistent data.
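A minimal Objective-C sketch of such a transaction, assuming self.firebase.firestore exposes your FIRFirestore instance and reusing the collection references from the question:
[self.firebase.firestore runTransactionWithBlock:^id _Nullable(FIRTransaction *transaction, NSError **errorPointer) {
    FIRDocumentReference *userRef = [self.firebase.usersCollection documentWithPath:@"some_user_id"];
    FIRDocumentReference *postRef = [self.firebase.postsCollection documentWithPath:@"some_post_id"];

    // A transaction must read before it writes.
    FIRDocumentSnapshot *userSnap = [transaction getDocument:userRef error:errorPointer];
    if (userSnap == nil) {
        return nil; // the read failed; the error is reported through errorPointer
    }

    // Both writes commit together or fail together, even though the documents
    // live in two different collections.
    NSNumber *currentPosts = userSnap[@"posts"] ?: @0;
    [transaction updateData:@{@"posts" : @(currentPosts.integerValue + 1)} forDocument:userRef];
    [transaction setData:@{@"test" : @"cool"} forDocument:postRef];
    return nil;
} completion:^(id _Nullable result, NSError * _Nullable error) {
    NSLog(@"error: %@", error.localizedDescription);
}];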

Related

Complete Cryptocurrencies Data

I am working with cryptocurrency blockchain data and I want the data from the beginning of each cryptocurrency's history. Is there any way to download complete block data as a PostgreSQL file?
https://blockchair.com/dumps does offer this, but they limit the download speed and the number of files you can download. I am also waiting for their reply. In the meantime, I am looking for other ways or websites to download the complete data of multiple cryptocurrencies in SQL format. I cannot download the .csv or .tsv files because they take up a lot of space on my laptop, so I want to use another format (preferably .sql).
This depends on which cryptocurrency you have. I can suggest how to fetch data from a Bitcoin-compatible cryptocurrency, e.g. Bitcoin, Emercoin, etc.
The cryptocurrency node (wallet) has a JSON-RPC API interface. Using this API, you can retrieve all of your data with the following commands from this command list:
Get the total block count with the command getblockcount.
Iterate the block number from 0 to the result of getblockcount. For each number, call getblockhash.
For the result of getblockhash, call getblock. This function provides the list of transactions enclosed in that block.
For each transaction (nested loop), call getrawtransaction. Hint: if you call getrawtransaction with the verbose argument set to 1, the node automatically decodes the transaction and returns it to you as JSON.
You can extract the vin and vout vectors from each transaction and upload all of the data into your SQL database.
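As a rough illustration, here is a minimal Python sketch of that loop over JSON-RPC. The URL and credentials are placeholders for your node's rpcuser/rpcpassword settings, and note that Bitcoin Core needs txindex=1 for getrawtransaction to work on arbitrary transactions:
import json
import requests

RPC_URL = "http://127.0.0.1:8332"      # placeholder: your node's RPC endpoint
RPC_AUTH = ("rpcuser", "rpcpassword")  # placeholder: your RPC credentials

def rpc(method, *params):
    payload = {"jsonrpc": "1.0", "id": "dump", "method": method, "params": list(params)}
    response = requests.post(RPC_URL, auth=RPC_AUTH, data=json.dumps(payload))
    response.raise_for_status()
    return response.json()["result"]

height = rpc("getblockcount")
for n in range(height + 1):
    block_hash = rpc("getblockhash", n)
    block = rpc("getblock", block_hash)         # includes the "tx" list for this block
    for txid in block["tx"]:
        tx = rpc("getrawtransaction", txid, 1)  # verbose = 1 returns decoded JSON
        # tx["vin"] and tx["vout"] hold the data you would insert into your SQL database
        print(n, txid, len(tx["vin"]), len(tx["vout"]))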

BQ Switching to TIMESTAMP Partitioned Table

I'm attempting to migrate IngestionTime (_PARTITIONTIME) to TIMESTAMP partitioned tables in BQ. In doing so, I also need to add several required columns. However, when I flip the switch and redirect my dataflow to the new TIMESTAMP partitioned table, it breaks. Things to note:
Approximately two million rows (likely one batch) are successfully inserted. The job continues to run but doesn't insert anything after that.
The job runs in batches.
My project is entirely in Java.
When I run it as streaming, it appears to work as intended. Unfortunately, it's not practical for my use case and batch is required.
I've been investigating the issue for a couple of days and tried to break down the transition into the smallest steps possible. It appears that the step responsible for the error is introducing REQUIRED variables (it works fine when the same variables are NULLABLE). To avoid any possible parsing errors, I've set default values for all of the REQUIRED variables.
At the moment, I get the following combination of errors and I'm not sure how to address any of them:
The first error repeats infrequently, but usually in groups:
Profiling Agent not found. Profiles will not be available from this
worker
Occurs a lot and in large groups:
Can't verify serialized elements of type BoundedSource have well defined equals method. This may produce incorrect results on some PipelineRunner
Appears to be one very large group of these:
Aborting Operations. java.lang.RuntimeException: Unable to read value from state
Towards the end, this error appears every 5 minutes, surrounded only by the mild parsing errors described below.
Processing stuck in step BigQueryIO.Write/BatchLoads/SinglePartitionWriteTables/ParMultiDo(WriteTables) for at least 20m00s without outputting or completing in state finish
Due to the sheer volume of data my project parses, there are several parsing errors such as Unexpected character. They're rare but shouldn't break data insertion. If they do, I have a bigger problem, as the data I collect changes frequently and I can adjust the parser only after I see the error, and therefore see the new data format. Additionally, this doesn't cause the ingestion-time table (or my other timestamp-partitioned tables) to break. That being said, here's an example of a parsing error:
Error: Unexpected character (',' (code 44)): was expecting double-quote to start field name
EDIT:
Some relevant sample code:
public PipelineResult streamData() {
    try {
        GenericSection generic = new GenericSection(options.getBQProject(), options.getBQDataset(), options.getBQTable());
        Pipeline pipeline = Pipeline.create(options);
        pipeline.apply("Read PubSub Events", PubsubIO.readMessagesWithAttributes().fromSubscription(options.getInputSubscription()))
                .apply(options.getWindowDuration() + " Windowing", generic.getWindowDuration(options.getWindowDuration()))
                .apply(generic.getPubsubToString())
                .apply(ParDo.of(new CrowdStrikeFunctions.RowBuilder()))
                .apply(new BigQueryBuilder().setBQDest(generic.getBQDest())
                        .setStreaming(options.getStreamingUpload())
                        .setTriggeringFrequency(options.getTriggeringFrequency())
                        .build());
        return pipeline.run();
    }
    catch (Exception e) {
        LOG.error(e.getMessage(), e);
        return null;
    }
}
Writing to BQ. I did try to set the partitioning field here directly, but it didn't seem to affect anything:
BigQueryIO.writeTableRows()
    .to(BQDest)
    .withMethod(Method.FILE_LOADS)
    .withNumFileShards(1000)
    .withTriggeringFrequency(this.triggeringFrequency)
    .withTimePartitioning(new TimePartitioning().setType("DAY"))
    .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
    .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER);
After a lot of digging, I found the error. I had parsing logic (a try/catch) that returned nothing (essentially a null row) in the event there was a parsing error. This would break BigQuery, as my schema had several REQUIRED fields.
Since my job ran in batches, even one null row would cause the entire batch job to fail and not insert anything. This also explains why streaming inserted just fine. I'm surprised that BigQuery didn't throw an error claiming that I was attempting to insert a null into a required field.
In reaching this conclusion, I also realized that setting the partition field in my code, as opposed to just in the schema, was necessary. It can be done using
.setField(partitionField)
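For reference, here is a rough sketch of both fixes; the buildRow helper and the partitionField name are placeholders for whatever your project uses:
// 1. Skip rows that fail to parse instead of emitting a null row.
@ProcessElement
public void processElement(ProcessContext c) {
    try {
        TableRow row = buildRow(c.element()); // your parsing logic
        if (row != null) {
            c.output(row);                    // only emit rows that parsed cleanly
        }
    } catch (Exception e) {
        LOG.warn("Dropping unparseable record", e);
    }
}

// 2. Set the partition field on the TimePartitioning, not just in the schema.
.withTimePartitioning(new TimePartitioning().setType("DAY").setField(partitionField))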

Ravendb memory leak on query

I'm having a hard time solving an issue with RavenDB.
At work we have a process that tries to identify potential duplicates in a specific collection in our database (let's call it the users collection).
That means I'm iterating through the collection, and for each document there is a query that tries to find similar entities. As you can imagine, it's quite a long-running task.
My problem is that when the task starts running, the memory consumption of RavenDB keeps growing and growing, and it seems to continue until it reaches the system's maximum memory.
But it doesn't really make sense, since I'm only querying: I'm using a single index and the default page size (128) when querying.
Has anybody met a similar problem? I really have no idea what is going on in RavenDB, but it seems like a memory leak.
RavenDB version: 3.0.179
When I need to do massive operations on large collections, I follow these steps to prevent memory usage problems (see the sketch after the list):
I use query streaming to extract all the ids of the documents I want to process (with a dedicated session).
I open a new session for each id, load the document, and then do what I need.
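A rough C# sketch of those two steps against the RavenDB 3.x client; the User type, the index name, and the duplicate check are placeholders:
List<string> ids = new List<string>();

// Step 1: stream only the document ids, using a dedicated session.
using (var streamSession = store.OpenSession())
{
    var query = streamSession.Query<User>("Users/ByEmail"); // your index here
    using (var enumerator = streamSession.Advanced.Stream(query))
    {
        while (enumerator.MoveNext())
            ids.Add(enumerator.Current.Key); // the document id, not the full document
    }
}

// Step 2: a short-lived session per document keeps the session cache from growing.
foreach (var id in ids)
{
    using (var session = store.OpenSession())
    {
        var user = session.Load<User>(id);
        // ... run the duplicate check for this single document ...
    }
}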
First, a recommendation: if you don't want duplicates, store them with a well-known ID. For example, suppose you don't want duplicate User objects. You'd store them with an ID that makes them unique:
var user = new User() { Email = "foo@bar.com" };
var id = "Users/" + user.Email; // A well-known ID
dbSession.Store(user, id);
Then, when you want to check for duplicates, just check against the well-known ID:
public string RegisterNewUser(string email)
{
    // Unlike .Query, the .Load call is ACID and never stale.
    var existingUser = dbSession.Load<User>("Users/" + email);
    if (existingUser != null)
    {
        return "Sorry, that email is already taken.";
    }

    // ... otherwise store the new user under the well-known ID ...
    return "Registered.";
}
If you follow this pattern, you won't have to worry about running complex queries nor worry about stale indexes.
If this scenario can't work for you for some reason, then we can help diagnose your memory issues. But to diagnose that, we'll need to see your code.

Validating rows before inserting into BigQuery from Dataflow

According to
How do we set maximum_bad_records when loading a Bigquery table from dataflow? there is currently no way to set the maxBadRecords configuration when loading data into BigQuery from Dataflow. The suggestion is to validate the rows in the Dataflow job before inserting them into BigQuery.
If I have the TableSchema and a TableRow, how do I go about making sure that the row can safely be inserted into the table?
There must be an easier way of doing this than iterating over the fields in the schema, looking at their type and looking at the class of the value in the row, right? That seems error-prone, and the method must be fool-proof since the whole pipeline fails if a single row cannot be loaded.
Update:
My use case is an ETL job that at first will run on JSON (one object per line) logs on Cloud Storage and write to BigQuery in batch, but later will read objects from PubSub and write to BigQuery continuously. The objects contain a lot of information that isn't necessary to have in BigQuery and also contains parts that aren't even possible to describe in a schema (basically free form JSON payloads). Things like timestamps also need to be formatted to work with BigQuery. There will be a few variants of this job running on different inputs and writing to different tables.
In theory it's not a very difficult process, it takes an object, extracts a few properties (50-100), formats some of them and outputs the object to BigQuery. I more or less just loop over a list of property names, extract the value from the source object, look at a config to see if the property should be formatted somehow, apply the formatting if necessary (this could be downcasing, dividing a millisecond timestamp by 1000, extracting the hostname from a URL, etc.), and write the value to a TableRow object.
My problem is that data is messy. With a couple of hundred million objects there are some that don't look as expected, it's rare, but with these volumes rare things still happen. Sometimes a property that should contain a string contains an integer, or vice-versa. Sometimes there's an array or an object where there should be a string.
Ideally I would like to take my TableRow, check it against the TableSchema, and ask "does this work?".
Since this isn't possible, what I do instead is look at the TableSchema object and try to validate/cast the values myself. If the TableSchema says a property is of type STRING I run value.toString() before adding it to the TableRow. If it's an INTEGER I check that it's an Integer, Long or BigInteger, and so on. The problem with this method is that I'm just guessing what will work in BigQuery. What Java data types will it accept for FLOAT? For TIMESTAMP? I think my validations/casts catch most problems, but there are always exceptions and edge cases.
In my experience, which is very limited, the whole work pipeline (job? workflow? not sure about the correct term) fails if a single row fails BigQuery's validations (just like a regular load does unless maxBadRecords is set to a sufficiently large number). It also fails with superficially helpful messages like 'BigQuery import job "dataflow_job_xxx" failed. Causes: (5db0b2cdab1557e0): BigQuery job "dataflow_job_xxx" in project "xxx" finished with error(s): errorResult: JSON map specified for non-record field, error: JSON map specified for non-record field, error: JSON map specified for non-record field, error: JSON map specified for non-record field, error: JSON map specified for non-record field, error: JSON map specified for non-record field'. Perhaps there is somewhere I can see a more detailed error message that could tell me which property it was and what the value was? Without that information it could just as well have said "bad data".
From what I can tell, at least when running in batch mode Dataflow will write the TableRow objects to the staging area in Cloud Storage and then start a load once everything is there. This means that there is nowhere for me to catch any errors, my code is no longer running when BigQuery is loaded. I haven't run any job in streaming mode yet, but I'm not sure how it would be different there, from my (admittedly limited) understanding the basic principle is the same, it's just the batch size that's smaller.
People use Dataflow and BigQuery, so it can't be impossible to make this work without always having to worry about the whole pipeline stopping because of a single bad input. How do people do it?
I'm assuming you deserialize the JSON from the file as a Map<String, Object>. Then you should be able to recursively type-check it with a TableSchema.
I'd recommend an iterative approach to developing your schema validation, with the following two steps.
Write a PTransform<Map<String, Object>, TableRow> that converts your JSON rows to TableRow objects. The TableSchema should also be a constructor argument to the function. You can start off making this function really strict -- for instance, require that the JSON input parses directly as an Integer when a BigQuery INTEGER schema was found -- and aggressively declare records in error. Basically, ensure that no invalid records are output by being super-strict in your handling.
Our code here does something somewhat similar -- given a file produced by BigQuery and written as JSON to GCS, we recursively walk the schema and do some type conversions. However, we do not need to validate, because BigQuery itself wrote the data.
Note that the TableSchema object is not Serializable. We've worked around this by converting the TableSchema in a DoFn or PTransform constructor to a JSON String and back. See the code in BigQueryIO.java that uses the jsonTableSchema variable.
Use the "dead-letter" strategy described in this blog post to handle bad records -- side output the offending Map<String, Object> rows from your PTransform and write them to a file. That way, you can inspect the rows that failed your validation later.
You might start with some small files and use the DirectPipelineRunner rather than the DataflowPipelineRunner. The direct runner runs the pipeline on your computer, rather than on Google Cloud Dataflow service, and it uses the BigQuery streaming writes. I believe when those writes fail you will get better error messages.
(We use the GCS->BigQuery Load Job pattern for Batch jobs because it's much more efficient and cost-effective, but BigQuery streaming writes in Streaming jobs because they are low-latency.)
Finally, in terms of logging information:
Definitely check Cloud Logging (by following the Worker Logs link on the logs panel).
You may get better information about why the load jobs triggered by your Batch Dataflows fail if you run the bq command-line utility: bq show -j PROJECT:dataflow_job_XXXXXXX.

what does this error mean in nhibernate

Out of the blue, I am getting this error when doing a number of updates using NHibernate.
Row was updated or deleted by another transaction (or unsaved-value mapping was incorrect): [MyDomainObject]
There is no additional information in the error. Is there a recommended way to help identify the root issue, or can someone give me a better explanation of what this error indicates or is a symptom of?
Some additional info
I looked at the object and all of the data looks fine; it has an ID, etc.
Note this is running in a single call stack from an ASP.NET MVC website, so I wouldn't expect any threading issues to worry about in terms of concurrency.
NHibernate has an object, let's call it theObject. theObject.Id has a value of 42. NHibernate notices that the object is dirty. The object's Id is different than the unsaved-value, which is zero (0) for integer primary keys. So NHibernate issues an update statement, but no rows are updated, which means that there is no row in the database for that type of object with an Id of 42. So the object has been deleted without NHibernate knowing about it. This could happen inside another transaction (e.g. you have threading issues) or if someone (or another application) deleted/altered the row using SQL directly against the database.
The other possibility is that your unsaved-value is wrong. e.g. You are using -1 to indicate an unsaved-entity, but your mapping has a unsaved-value of zero. This is unlikely as your application is generally working from the sounds of it. If the unsaved-value was wrong, you wouldn't have been able to save any entities to the database as NHibernate would have been issuing UPDATE statements when it should have been issuing INSERT.
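For reference, unsaved-value lives on the <id> element of the mapping; a minimal hbm.xml fragment (class and column names illustrative) looks like this:
<class name="MyDomainObject" table="MyDomainObject">
  <!-- unsaved-value tells NHibernate which Id marks a transient, never-saved instance -->
  <id name="Id" column="Id" unsaved-value="0">
    <generator class="native" />
  </id>
</class>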
It means that you have multiple transactions accessing the same data, thus producing concurrency issues. You should improve your data access handling; you are probably updating data from multiple threads. Consolidate the changed data into a queue first, which then handles all access to the db.
An old post, but hopefully my info will help someone. I was getting a similar error but only when persisting associations, after I had added in a new object. The error was of the form:
NHibernate.StaleObjectStateException: Row was updated or deleted by another transaction (or unsaved-value mapping was incorrect) [My.Entity#0]
Note the zero on the end, which is my identifier property. It should not have been trying to save with key zero, as I was using identity specification in SQL Server (generator class=native). I had not changed the unsaved-value in my xml, so I had no idea what the problem was; for some reason NHibernate was trying to do an update using a key value of 0, instead of a save (and getting the next key identity) for my new object.
In the end I found the cause was that I was initialising Version number to 1 for the new object in my constructor! Even though my identifier property was zero, for some reason NHibernate was also looking for a version property of zero as well, to identify it as an unsaved transient instance. The book "NHibernate in Action" does actually mention this on page 120, but for some reason my objects were fine when persisting with version number of 1 normally, and only failing if saving a new object through an association.
So make sure you do not set your Version value (leave as zero or null).
You say that your data is OK, but check whether, for example, you are mapping the ID as self-generated. I had the exact same problem, but I was sending an object with an ID different from 0.
Hope it helps!
My problem was this:
[Bind(Include="Name")] EventType eventType
Should have been:
[Bind(Include="EventTypeId,Name")] EventType eventType
Just as the other answers suggest, NHibernate was using zero as the id for my entity.
If you have a trigger on the table, it can be the reason. In this case, add inside it
SET ROWCOUNT 0;
SET NOCOUNT ON;
This error happened to me in the following way:
List < Device > allDevices = new List < Device > ();
//Add Devices to the list
allDevices.Add(aDevice);
//Add allDevices to database //Will work fine
// allDevices.Clear(); //Should be used here
//Later we add more devices
allDevices.Add(anotherDevice);
//Add allDevices to Database -> We get the error
//Solution to this
allDevices.Clear(); //Before adding new transaction with the oldData,