I am trying to download Sentinel-2 data from the Copernicus Open Access Hub with a Python script using the sentinelsat library. I know that the rate at which you can download data is limited: at most 2 concurrent downloads. However, I run into problems when I try to download a larger quantity of data (10-20 queries) one after the other in a for loop, i.e. not concurrently.
The command api.download(<product_id>) does not trigger the download; rather than the actual download happening, it just prints some output.
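For context, a minimal sketch of the kind of loop I am describing (the credentials, hub URL, output directory, and product IDs are placeholders, not my actual script):

# Minimal sketch, not the actual script: credentials, hub URL,
# output directory, and product IDs below are placeholders.
from sentinelsat import SentinelAPI

api = SentinelAPI("user", "password", "https://apihub.copernicus.eu/apihub")

product_ids = ["<product_id_1>", "<product_id_2>"]  # 10-20 IDs in practice

for pid in product_ids:
    try:
        # download() blocks until this single product is fetched,
        # so only one download is ever in flight at a time.
        api.download(pid, directory_path="downloads")
    except Exception as exc:  # e.g. products that are not immediately available raise here
        print(f"Could not download {pid}: {exc}")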
Ok - total noob on the topic here so this might not even be a good idea.
I have a Google Colab notebook I use for generating images from a textual prompt, where every image takes about 30 secs.
Since I’d like to generate several hundred, I’d like to have an environment set up on GCloud where I run the same process but with a batch of, say, 800 instead of the usual 5-10, and ideally start the process, close the connection to the machine, and come back the next day to find the results.
This might well be a duplicate - because I have no idea of what to search for.
So: is it a good or bad idea and how do I do this?
I am trying to generate 10,000 PDF reports into a Windows file share location using the SSRS data-driven subscription methodology. I found that it works when I run small batches, but it consistently fails when I give it 10,000 at a time. This behavior is unpredictable and I am not able to scale the solution. Example:
When I put in a 10,000 load, it generates 2,700 and fails on the rest, but when I run the failed records in another batch it gets me the PDFs. It sometimes fails with small batch sizes as well. No proper reason is logged.
Thanks
I am trying to visualize a training session that I ran on a remote server. I used scp to copy the file to my local iMac and tried to visualize the data by running TensorBoard. It serves the TensorBoard site, but I can't get the visualization: every chart has a single dot at zero. I get this warning in the terminal.
WARNING:tensorflow:Unable to get first event timestamp for run
470_313_0.0001_2500_200/train
WARNING:tensorflow:Unable to get first event timestamp for run
470_313_0.0001_2500_200/train
WARNING:tensorflow:Unable to get first event timestamp for run
470_313_0.0001_2500_200/val
WARNING:tensorflow:Unable to get first event timestamp for run
470_313_0.0001_2500_50/train
WARNING:tensorflow:Unable to get first event timestamp for run
470_313_0.0001_2500_50/val
Any idea what is going on?
I ran into the same problem (with a TensorFlow 1.4 trainer running in the cloud with Google Cloud ML Engine).
Explicitly closing tf.summary.FileWriters with close() solved it in my case.
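For reference, a minimal sketch of what that looks like with the TF 1.x summary writer (the log directory name here just mirrors one of the run names above):

import tensorflow as tf

# Hypothetical run directory, mirroring one of the run names above.
writer = tf.summary.FileWriter("logs/470_313_0.0001_2500_200/train")
# ... add summaries during training, e.g.:
#     writer.add_summary(summary_proto, global_step=step)
writer.close()  # flushes pending events and finalizes the event file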
I ran into a similar problem. There are two solutions to it:
Delete all past tfevents files from the directory and keep the latest one (temporary solution).
Create a new directory for building your logs (permanent solution).
For example, I first built logs in directory 2, which resulted in the same errors/warnings. Later I changed to directory 3 and built the logs there, and TensorBoard ran successfully.
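A minimal sketch of the second (permanent) option, assuming the logs live under a logs/ base directory with a timestamped subdirectory per run:

import os
import time

# Assumed layout: a fresh, timestamped run directory under logs/
run_dir = os.path.join("logs", time.strftime("%Y%m%d-%H%M%S"))
os.makedirs(run_dir, exist_ok=True)
# Point the summary writer at run_dir and start TensorBoard on the parent:
#     writer = tf.summary.FileWriter(run_dir)
#     tensorboard --logdir logs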
In my case, the problem was that the directory names for the runs were too long. After I manually renamed them, the problem was solved.
I am a very new user to Flume, please treat me as an absolute noob. I am having a minor issue configuring Flume for a particular use case and was hoping you could assist. Note that I am not using HDFS, which is why this question is different from others you may have seen on forums.
I have two Virtual Machines (VMs) connected to each other through an internal network in Oracle VirtualBox. My goal is to have one VM watch a particular directory that will only ever have one file in it. When the file is changed, I wish for Flume to send only the new lines/data. I want the other VM to receive this data and update/concatenate it to a single file in a particular directory on it.
So far, I have this process very close to working. Whenever changes are made in VM1, they are updated on VM2. However, the entire file on VM1 is sent to VM2 every time, not just the new lines. For example, if I wrote “Test1” and a while later wrote “Test2” underneath it in the file on VM1, the output on VM2 would be:
Test1
Test1
Test2
What I want to see is:
Test1
Test2
I am not sure how to implement this, and am asking only after thoroughly examining the Flume user guide documentation and the most relevant articles on stackoverflow/stackexchange. For your reference, below are the current configurations (they are working in the manner I mentioned above).
VM1 configuration
VM2 configuration
I realize another solution would be to keep the configuration on VM1 and overwrite the file on VM2 every time new contents are detected. However, I am also unsure how to implement this.
Any assistance you could provide is greatly appreciated!
Use the TailDir source provided in Flume. It periodically writes the last position read to a position file, and it is more reliable than the exec source: even if the agent crashes or stops for some reason, it will resume reading from the last position saved in the position file.
agent1.sources.src1.type = TAILDIR
agent1.sources.src1.channels = ch1
agent1.sources.src1.filegroups = f1
agent1.sources.src1.filegroups.f1 = /path/to/log/file
agent1.sources.src1.maxBackoffSleep = 10000
Set the maxBackoffSleep value as per your need; it is the maximum time the agent will wait before polling the log file for changes again after an attempt in which it found no changes.
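The position file defaults to a location under the agent user's home directory; if you want it somewhere explicit, the TAILDIR source also accepts a positionFile property (the path below is only an example):
agent1.sources.src1.positionFile = /var/log/flume/taildir_position.json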
Hi, I have the queries below about the SFTP component; if you guys could help me out, that would be a great help:
1) Can we get the size of the file picked up by the SFTP component? I need to restrict the transfer based on the size of the file.
2) Can I get the number of files and the file names picked up by the SFTP component?
3) Is this understanding correct: the SFTP component picks up all the files from the server, keeps them in memory, and processes them one by one until it finishes all files?
4) If the server has 5 files, can the SFTP component process all 5 files in parallel rather than one by one?
1-Mule does not populate the file-size field for SFTP as it does with FILE. There are JIRA tickets open on this matter, but MuleSoft has called it an enhancement and not given it priority. I disagree. Perhaps ping MuleSoft; if enough users do, maybe they will raise the priority and address it. They use the size internally; they simply do not expose it externally as is done with the FILE connector.
2-No, not really. It gives them back one at a time, not as a list.
3 & 4-It is only loading the entire file into memory if you tell it not to stream or do something else, like an onject-to-string transformer which forces a memory load. The files or files streams are passed back 1 by 1, but unless you restrict threading and make your flow synchronous, it will go to asych and multi-threaded and process multiple files in parallel. Flows default to asych, subflow are synchronous.
You can use the SFTP endpoint to retrieve files and then use a Java or script call to get each file's attributes and filter so that you only process the files you are actually interested in, such as ones larger than your minimum size. This would seem more in line with what you are looking for in point 1. There are other options, but this would be more straightforward than the others I can quickly think of.
I found one way to get the file size: if we provide transformer-refs="Object_to_Byte_Array" and then do #[payload.size()], we get the size of the file in bytes. Will this cause any issue?