Scrapy dynamic file name based on timestamp for the logfile - scrapy

I am using Scrapy and logging the results to a log file via the LOG_FILE setting in settings.py.
However, what I want is that each time I run the spider it should log to a different file name based on the timestamp.
Is it possible to achieve this by doing some setting in settings.py?

Try:
import time
LOG_FILE = str(int(time.time()))
settings.py is a Python script, so you can write any Python code in it.
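If you prefer a readable name with an extension, you could build it with datetime instead; the "scrapy_..." format below is just an example, not anything Scrapy requires:
import datetime
# e.g. "scrapy_2024-06-20_14-30-05.log" -- a new log file on every run
LOG_FILE = datetime.datetime.now().strftime("scrapy_%Y-%m-%d_%H-%M-%S.log")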

Is there a way to start reading where we stopped with CSV Data Set Config in JMeter?

I have created a test that reads users from a CSV Data Set Config in JMeter.
For example, when I run a test, JMeter reads the first 20 users in the CSV file.
Then, if I run the same test again, JMeter again reads the first 20 users in the CSV file.
But I want JMeter to read 20 users starting from the 21st user, and so on.
Is there a way to make this possible?
As per CSV Data Set Config documentation:
By default, the file is only opened once, and each thread will use a different line from the file. However the order in which lines are passed to threads depends on the order in which they execute, which may vary between iterations. Lines are read at the start of each test iteration. The file name and mode are resolved in the first iteration.
So there is no way to specify an "offset" for reading the file; the options are:
Use the __CSVRead() function, where you can call ${__CSVRead(/path/to/your/file.csv,next)} as many times as needed in order to "skip" the lines which have already been "used"
Use a setUp Thread Group and a JSR223 Sampler to remove the first 20 lines from the CSV file programmatically (see the sketch after this list)
Go for the Redis Data Set Config instead, which has a Recycle Data on Use option; if you set it to False, the "used" data will be removed
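As an illustration of the second option, the trimming step could look something like this; it is shown in Python for brevity (inside a JSR223 Sampler you would normally write the equivalent in Groovy), and users.csv and the count of 20 are assumptions taken from the question:
# sketch: drop the first 20 already-used lines from the CSV before the next run
USED_LINES = 20            # users consumed by the previous run
CSV_PATH = 'users.csv'     # hypothetical path to the CSV Data Set Config file
with open(CSV_PATH) as src:
    lines = src.readlines()
with open(CSV_PATH, 'w') as dst:
    dst.writelines(lines[USED_LINES:])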

Export pandas dataframe to CSV

I have code where, at the end, I export a dataframe in CSV format. However, each time I run my code it overwrites the previous CSV, whereas I would like to accumulate my CSV files.
Do you know of a method to do this?
dfind.to_csv(r'C:\Users\StageProject\Indicateurs\indStat.csv', index = True, header=True)
Thanks !
The question is really about how you want to name your files. The easiest way is just to attach a timestamp to each one:
import time
unix_time = round(time.time())
This should be unique under most real-world conditions because time doesn't go backwards and Python will give time.time() only in UTC. Then just save to the path:
rf'C:\Users\StageProject\Indicateurs\indStat_{unix_time}.csv'
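Putting it together with the to_csv call from your question (reusing your dfind dataframe), the save would then be:
dfind.to_csv(rf'C:\Users\StageProject\Indicateurs\indStat_{unix_time}.csv', index=True, header=True)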
If you want a serial count instead, like what your browser does when you save multiple versions of a file, you will need to iterate through the files in that folder, keep adding one to your suffix until you reach a file path that does not conflict, and then save there.
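A minimal sketch of that serial-count approach, again reusing the dfind name and the path from your question, could look like this:
import os
base = r'C:\Users\StageProject\Indicateurs\indStat'
n = 0
# keep incrementing the suffix until the file name is free
while os.path.exists(f'{base}_{n}.csv'):
    n += 1
dfind.to_csv(f'{base}_{n}.csv', index=True, header=True)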

How to access cluster_config dict within rule?

I'm working on writing a benchmarking report as part of a workflow, and one of the things I'd like to include is information about the amount of resources requested for each job.
Right now, I can manually require the cluster config file ('cluster.json') as a hardcoded input. Ideally, though, I would like to be able to access the per-rule cluster config information that is passed through the --cluster-config arg. In __init__.py, this is accessed as a dict called cluster_config.
Is there any way of importing or copying this dict directly into the rule?
From the documentation, it looks like you can now use a custom wrapper script to access the job properties (including the cluster config data) when submitting the script to the cluster. Here is an example from the documentation:
#!/usr/bin/env python3
import os
import sys
from snakemake.utils import read_job_properties

jobscript = sys.argv[1]
job_properties = read_job_properties(jobscript)

# do something useful with the threads
threads = job_properties["threads"]

# access a property defined in the cluster configuration file (Snakemake >=3.6.0)
cluster_time = job_properties["cluster"]["time"]

os.system("qsub -t {threads} {script}".format(threads=threads, script=jobscript))
During submission (last line of the previous example) you could either pass the arguments you want from the cluster.json to the script or dump the dict into a JSON file, pass the location of that file to the script during submission, and parse the json file inside your script. Here is an example of how I would change the submission script to do the latter (untested code):
#!/usr/bin/env python3
import os
import sys
import tempfile
import json
from snakemake.utils import read_job_properties

jobscript = sys.argv[1]
job_properties = read_job_properties(jobscript)
threads = job_properties["threads"]

# dump the full job properties (cluster config included) to a temporary JSON file
fd, job_json = tempfile.mkstemp(suffix='.json')
with os.fdopen(fd, 'w') as handle:
    json.dump(job_properties, handle)

os.system("qsub -t {threads} {script} -- {job_json}".format(threads=threads, script=jobscript, job_json=job_json))
job_json should now appear as the first argument to the job script. Make sure to delete the job_json at the end of the job.
From a comment on another answer, it appears that you are just looking to store the job_json somewhere along with the job's output. In that case, it might not be necessary to pass job_json to the job script at all. Just store it in a place of your choosing.
You can easily manage cluster resources per rule.
Indeed, each rule accepts a resources keyword, which you can use like this:
rule one:
    input: ...
    output: ...
    resources:
        gpu=1,
        time="HH:MM:SS"
    threads: 4
    shell: "..."
You can also fill in the resources from the YAML cluster configuration file given with the --cluster-config parameter, like this:
rule one:
    input: ...
    output: ...
    resources:
        time=cluster_config["one"]["time"]
    threads: 4
    shell: "..."
When you call snakemake, you then just have to reference the resources in the submission command, like this (example for a Slurm cluster):
snakemake --cluster "sbatch -c {threads} -t {resources.time} " --cluster-config cluster.yml
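For reference, a cluster.yml matching the rule above could look something like this (the time values are just placeholders; __default__ is the usual fallback section):
__default__:
    time: "01:00:00"
one:
    time: "04:00:00"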
Each rule will then be submitted to the cluster with its own specific resources.
For more information, you can check the documentation at this link: http://snakemake.readthedocs.io/en/stable/snakefiles/rules.html
Best regards

In Lua, how to print the console output into a file (piping) instead of using the standard output?

I'm working with Torch7 and the Lua programming language. I need a command that redirects the output of my console to a file, instead of printing it to my shell.
For example, in Linux, when you type:
$ ls > dir.txt
The system will print the output of the command "ls" to the file dir.txt, instead of printing it to the default output console.
I need a similar command for Lua. Does anyone know it?
[EDIT] A user suggested to me that this operation is called piping. So the question should be: "How do I do piping in Lua?"
[EDIT2] I would like to be able to use a # command like this:
$ torch 'my_program' # printed_output.txt
Have a look here -> http://www.lua.org/pil/21.1.html
io.write seems to be what you are looking for.
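A minimal sketch of that approach, using the printed_output.txt name from your second edit: io.output redirects the default output stream to a file, and io.write then writes there instead of to the console.
-- redirect the default output to a file, then write to it
io.output("printed_output.txt")
io.write("this line goes to the file, not to the shell\n")
io.close()  -- close the current default output file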
Lua has no built-in function to dump the console output to a file.
If your application logs its output - which is probably what you're trying to do - it will only be possible to do this by modifying the Lua C++ source code.
If your internal system has access to the output of the console, you could do something similar to this (and set it on a timer, so it runs every 25 ms or so):
dumpoutput = function()
    -- open (or create) the dump file for writing
    local file = io.open([path to file dump here], "w+")
    for i, line in ipairs([console output function]) do
        file:write("\n" .. line)
    end
    file:close()
end
Note that the console output function has to store the output of the console in a table.
To clear the console at the end, just do os.execute( "cls" ).

Exporting product csv file with a scheduler in OpenERP v6.1

Is there a way to export the products to a CSV file using a scheduler?
I tried the csv Python package discussed in this link (The best way to export openerp data to csv file using python), but it seems I am missing something.
I use export_data as stated and point my scheduler to that method, but when I run it nothing happens, so I don't know whether it runs or not; the scheduler just keeps running.
Thanks for any replies.
I'm not an OpenERP (or Odoo) expert. With my current knowledge of Odoo, I would get this started through ERPpeek, along the lines of:
erppeek [YOUR OPENERP DETAILS]
>>> l = model('product.product').browse([])
>>> import csv
>>> with open('test.csv', 'w') as fp:
...     a = csv.writer(fp, delimiter=',')
...     for prod in l:
...         a.writerow(prod.read().values())
I'd further build it out into a script and then put the script in cron.
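A rough sketch of what that script might look like (the connection details are placeholders you would need to replace with your own server, database, and credentials):
#!/usr/bin/env python
# sketch: export all products to a CSV file; run it from cron
import csv
import erppeek

client = erppeek.Client('http://localhost:8069', 'mydb', 'admin', 'admin')  # placeholders
products = client.model('product.product').browse([])

with open('test.csv', 'w') as fp:
    writer = csv.writer(fp, delimiter=',')
    for prod in products:
        writer.writerow(prod.read().values())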
There are probably better, more standard Odoo-ish ways though.