I recall getting log files that were neatly ordered, so that you could follow one request, then the next, and so on.
Now the log files are, as my 4-year-old says, "all scroggled up": they are no longer separate, distinct chunks of text. Log lines from two requests get interleaved.
For instance:
Started GET /foobar
...
Completed 200 OK in 2ms (Views: 0.4ms | ActiveRecord: 0.8ms)
Patient Load (wait, that's from another request that has nothing to do with foobar!)
[ blank space ]
Something else
This is maddening, because I can't tell what's happening within one single request.
This is running on Passenger.
I searched for an existing answer but couldn't find any good information. I'm also not sure whether the fix belongs in the server configuration or in the Rails code.
For more background, here is the commit that removed the old way of logging: https://github.com/rails/rails/commit/04ef93dae6d9cec616973c1110a33894ad4ba6ed
If you value production log readability over everything else, you can use the
PassengerMaxInstancesPerApp 1
configuration. It might cause some scaling issues. Alternatively, you could put something like this in application.rb:
# Give each server process its own log file, keyed by PID, so requests
# handled by different processes can never interleave in one file.
# Note: one file per PID, so old files accumulate as workers are recycled.
process_log_filename = Rails.root + "log/#{Rails.env}-#{Process.pid}.log"
log_file = File.open(process_log_filename, 'a')
Rails.logger = ActiveSupport::BufferedLogger.new(log_file)
Yep! They changed ActiveSupport::BufferedLogger so it no longer waits until the request has ended to flush the logs:
http://news.ycombinator.com/item?id=4483390
https://github.com/rails/rails/commit/04ef93dae6d9cec616973c1110a33894ad4ba6ed
But they also added ActiveSupport::TaggedLogging, which is very neat: you can stamp every log line with any kind of mark you want.
In your case it could be good to tag the logs with the request UUID, like this:
# config/application.rb
config.log_tags = [:uuid]
Then even if the lines are interleaved you can still tell which of them belong to the request you are tracking.
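For example, tagged output might look roughly like this (the UUIDs and the query line are illustrative):
[0f7f3c1c-5731-4f95-a331-55c55076fdef] Started GET "/foobar"
[0f7f3c1c-5731-4f95-a331-55c55076fdef] Completed 200 OK in 2ms
[9b2d1a44-61c0-47a9-8f6e-2d9f0c9a1b7e] Patient Load (0.3ms) SELECT "patients".* FROM "patients"
Grepping for a single UUID then reconstructs one request's story.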
You can do more neat things with this feature to help with your log analysis:
How to log user_name in Rails?
http://zogovic.com/post/21138929607/running-time-in-rails-logs
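As a quick illustration of that flexibility, config.log_tags also accepts lambdas that receive the request, so you can mix the UUID with tags of your own (the cookie name below is hypothetical):
# config/application.rb
# Symbols are called as methods on the ActionDispatch::Request (e.g. :uuid);
# lambdas receive the request itself.
config.log_tags = [
  :uuid,
  lambda { |request| request.cookie_jar[:user_name] }  # hypothetical cookie
]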
Well, for me the TaggedLogging solution is a no-go. I can live with some logs getting lost if the server crashes badly, but I want my logs to be perfectly ordered. So, following advice from the issue comments, I'm applying this to my app:
# lib/sequential_logs.rb
module ActiveSupport
  class BufferedLogger
    # With sync disabled (see the initializer below), writes are buffered;
    # flushing at the end of the request then emits each request's log
    # lines as one contiguous chunk.
    def flush
      @log_dest.flush
    end

    def respond_to?(method, include_private = false)
      super
    end
  end
end

# config/initializers/sequential_logs.rb
require 'sequential_logs.rb'
Rails.logger.instance_variable_get(:@logger).instance_variable_get(:@log_dest).sync = false
As far as I can tell this hasn't affected my app: it is still running, and my logs make sense again.
They should add some quasi-random request id and write it on every line belonging to a single request. That way you wouldn't get confused.
I haven't used it, but I believe Lumberjack's unit_of_work method may be what you're looking for. You call:
Lumberjack.unit_of_work do
  yield
end
All logging done either in that block or in the yielded block is tagged with a unique ID.
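Here is a minimal standalone sketch of the idea (the logger setup and messages are my own assumptions, not from your app):
require 'lumberjack'

logger = Lumberjack::Logger.new(STDOUT)

Lumberjack.unit_of_work do
  # Both entries carry the same auto-generated unit-of-work id, so they
  # can be grepped back together even if other output interleaves.
  logger.info("Started GET /foobar")
  logger.info("Completed 200 OK")
end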
Related
I have this piece of code in a Scrapy spider where only the first yield's callback executes, never the second. I have tried reordering them and it gives the same result: only the first yield's callback runs.
for j in range(totalOrderPages):  # the code gets in the loop
    productURI = feedUrl % (productId, j + 1)
    print "Got in the loop"  # this gets printed
    yield response.follow(productURI, self.parse_orders, meta={'pid': productId, 'categories': categories})
    yield response.follow(first_page, self.parse_product, meta={'pid': productId, 'categories': categories})
Is there anything in Python or Scrapy that prevents two consecutive yields?
Second question:
I'm trying to debug this using pdb.set_trace(), but when I try to execute yield from the debugging console, it gives a "yield outside function" error.
Does anyone know how we can debug yields?
Thank you.
Without knowing more details, like the redirection behaviour of the specific site or the contents of the variables (feedUrl, productURI, first_page, etc.), I would say that some requests are being dropped by the dupefilter (https://doc.scrapy.org/en/latest/topics/settings.html#dupefilter-class).
I'd recommend enabling the DEBUG logging level, setting DUPEFILTER_DEBUG=True, and checking the logs to see if that's the case.
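For example, a sketch of the relevant project settings:
# settings.py
LOG_LEVEL = 'DEBUG'        # show DEBUG messages, including dupefilter drops
DUPEFILTER_DEBUG = True    # log every request the dupefilter filters out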
You can force requests to bypass the Dupefilter by adding dont_filter=True when calling response.follow.
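Using the names from your snippet, that would look something like this:
yield response.follow(
    first_page,
    self.parse_product,
    meta={'pid': productId, 'categories': categories},
    dont_filter=True,  # schedule even if this URL was already seen
)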
If this doesn't solve your issue, please share your crawl logs so we can have more information to debug the issue. Happy scraping!
Using the Java SDK, I am creating a load job for just a single record with a fairly complicated schema. When monitoring the status of the load job, it takes a surprisingly long time (but perhaps this is due to working out the schema), and then says:
11:21:06.975 [main] INFO xxx.GoogleBigQuery - Job status (21694ms) create_scans_1384744805079_172221126: DONE
11:24:50.618 [main] ERROR xxx.GoogleBigQuery - Job create_scans_1384744805079_172221126 caused error (invalid) with message
Too many errors encountered. Limit is: 0.
11:24:50.810 [main] ERROR xxx.GoogleBigQuery - {
"message" : "Too many errors encountered. Limit is: 0.",
"reason" : "invalid"
}
BTW - how do I tell the job that it can have more than zero errors using Java?
This load job does not appear in the list of recent jobs in the console, and as far as I can see, none of the Java objects contains any more detail about the actual errors encountered. So how can I programmatically find out what is going wrong? All I can find is:
if (err != null) {
    log.error("Job {} caused error ({}) with message\n{}", jobID, err.getReason(), err.getMessage());
    try {
        log.error(err.toPrettyString());
    }
...
In general I am having a difficult time finding good documentation for some of these things, and am working it out by trial and error and short snippets of code found here and in older groups. If there is a better source of information than the getting-started guides, I would appreciate any pointers to it. The Javadoc does not really help, and I cannot find any complete examples of loading, querying, testing for errors, cataloguing errors and so on.
This job is submitted as a NEWLINE_DELIMITED_JSON record, supplied to the job via:
InputStream dummy = getClass().getResourceAsStream("/googlebigquery/xxx.record");
final InputStreamContent jsonIn = new InputStreamContent("application/octet-stream", dummy);
createTableJob = bigQuery.jobs().insert(projectId, loadJob, jsonIn).execute();
My authentication and so on seems to work correctly, since separate Java code to list the projects and the datasets in the project works fine. So I just need help working out what the actual error is: does it not like the schema (I have records nested within records, for instance), or does it think there is an error in the data I am submitting?
Thanks in advance for any help. The job number cited above is an actual failed load job if that helps any Google staffers who might read this.
It sounds like you have a couple of questions, so I'll try to address them all.
First, the way to get the status of the job that failed is to call jobs().get(jobId), which returns a job object whose errorResult holds the error that caused the job to fail (e.g. "too many errors"). The errors list contains all of the errors on the job, which should tell you which lines hit errors.
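A sketch of that lookup with the v2 Java model classes (reusing the bigQuery client and projectId from your question; jobId is the failed job's id):
import com.google.api.services.bigquery.model.ErrorProto;
import com.google.api.services.bigquery.model.Job;

Job failed = bigQuery.jobs().get(projectId, jobId).execute();

// The error that failed the job as a whole ("Too many errors...").
ErrorProto result = failed.getStatus().getErrorResult();
log.error("Job failed: {} ({})", result.getMessage(), result.getReason());

// The per-record errors that pushed it over the limit.
for (ErrorProto e : failed.getStatus().getErrors()) {
    log.error("  {} at {}: {}", e.getReason(), e.getLocation(), e.getMessage());
}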
Note that if you have the job id, it may be easier to use bq to look up the job: run bq show <job_id> to get the job error information. If you add --format=prettyjson, it will print out all of the information in the job.
Another hint: consider supplying your own job id when you create the job. Then even if there is an error starting the job (i.e. the insert() call fails, perhaps due to a network error), you can look up the job to see what actually happened.
To tell BigQuery that some errors are allowed during import, you can use the maxBadRecords setting in the load job. See https://developers.google.com/resources/api-libraries/documentation/bigquery/v2/java/latest/com/google/api/services/bigquery/model/JobConfigurationLoad.html#getMaxBadRecords().
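For example, a one-line sketch (assuming loadJob is the Job you pass to insert(), as in your question):
// Allow up to 10 bad records before the load job fails outright.
loadJob.getConfiguration().getLoad().setMaxBadRecords(10);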
I have two models: hotel and location. A location belongs to a hotel; a hotel has one location. I'm trying to create both in a single form. Bear in mind that I can't use dm-nested for nested forms due to a dependency clash.
I have code that looks like:
if (@hotel.save && @location.save)
  # process
else
  # back to form with errors
end
Unfortunately, @hotel.save can fail while @location.save completes (which confuses me, because I didn't think the second operand of && would run if the first one failed).
I'd like to wrap these in a transaction so I can roll back the location save. I can't seem to find a way to do it online. I'm using dm-rails, Rails 3 and a PostgreSQL database. Thanks.
The usual way to wrap database operations in DataMapper is to do something like this:
@hotel.transaction do
  @hotel.save
  @location.save
end
Notice that @hotel is quite arbitrary there; it could just as well be @location or even a model name like Hotel.
In my experience, this works best when you enable exceptions to be thrown. Then if @hotel.save fails, it will throw an exception, which will be caught by the transaction block, causing the transaction to be rolled back. The exception is, of course, reraised.
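A minimal sketch of that combination (the raise_on_save_failure setting and the rescue are my assumptions about the wiring):
# e.g. in an initializer: make failed saves raise instead of returning false
DataMapper::Model.raise_on_save_failure = true

begin
  @hotel.transaction do
    @hotel.save
    @location.save
  end
  # process
rescue DataMapper::SaveFailureError
  # both saves were rolled back together; back to the form with errors
end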
Often I need to raise a custom(ized) error, like when a resource cannot be found due to a mismatch in parameters.
I prefer to raise existing errors, or errors that inherit from an existing one. That way I don't introduce error classes where suitable ones were already defined (DRY). It also keeps wording and style consistent: by inheriting and simply changing a word or two, I can clarify the difference from the original error.
For example:
foo = Foo.new
foo.some_external_id = nil
foo.fetch_external_resource
# => InvalidOptions: Calling Foo#fetch_external_resource with nil is invalid
I am quite sure such errors are already defined. In fact, after reading through many lines of code, I found that my Mongoid driver has Mongoid::Errors::InvalidOptions: Calling Document#find with nil is invalid.
Is there a list of available Error classes in Ruby Core and Ruby on Rails? Is there a way to get such a list for your current project?
Is it smart at all to re-use and/or inherit from existing errors, or should I maintain my own custom set instead?
While this list may change, and it's still best to use Ryan's solution, here is the list for Rails 4 (limited to Rails classes, not other gems):
AbstractController::ActionNotFound
AbstractController::DoubleRenderError
AbstractController::Error
AbstractController::Helpers::ClassMethods::MissingHelperError
ActionController::ActionControllerError
ActionController::BadRequest
ActionController::InvalidAuthenticityToken
ActionController::MethodNotAllowed
ActionController::MissingFile
ActionController::NotImplemented
ActionController::ParameterMissing
ActionController::RedirectBackError
ActionController::RenderError
ActionController::RoutingError
ActionController::SessionOverflowError
ActionController::UnknownController
ActionController::UnknownFormat
ActionController::UnknownHttpMethod
ActionController::UnpermittedParameters
ActionController::UrlGenerationError
ActionDispatch::Cookies::CookieOverflow
ActionDispatch::IllegalStateError
ActionDispatch::Journey::Router::RoutingError
ActionDispatch::ParamsParser::ParseError
ActionDispatch::RemoteIp::IpSpoofAttackError
ActionDispatch::Session::SessionRestoreError
ActionView::Helpers::NumberHelper::InvalidNumberError
ActiveModel::ForbiddenAttributesError
ActiveModel::MissingAttributeError
ActiveRecord::ActiveRecordError
ActiveRecord::AdapterNotFound
ActiveRecord::AdapterNotSpecified
ActiveRecord::AssociationTypeMismatch
ActiveRecord::AttributeAssignmentError
ActiveRecord::ConfigurationError
ActiveRecord::ConnectionNotEstablished
ActiveRecord::ConnectionTimeoutError
ActiveRecord::DangerousAttributeError
ActiveRecord::DeleteRestrictionError
ActiveRecord::DuplicateMigrationNameError
ActiveRecord::DuplicateMigrationVersionError
ActiveRecord::EagerLoadPolymorphicError
ActiveRecord::HasAndBelongsToManyAssociationForeignKeyNeeded
ActiveRecord::HasManyThroughAssociationNotFoundError
ActiveRecord::HasManyThroughAssociationPointlessSourceTypeError
ActiveRecord::HasManyThroughAssociationPolymorphicSourceError
ActiveRecord::HasManyThroughAssociationPolymorphicThroughError
ActiveRecord::HasManyThroughCantAssociateNewRecords
ActiveRecord::HasManyThroughCantAssociateThroughHasOneOrManyReflection
ActiveRecord::HasManyThroughCantDissociateNewRecords
ActiveRecord::HasManyThroughNestedAssociationsAreReadonly
ActiveRecord::HasManyThroughSourceAssociationNotFoundError
ActiveRecord::HasOneThroughCantAssociateThroughCollection
ActiveRecord::IllegalMigrationNameError
ActiveRecord::ImmutableRelation
ActiveRecord::InvalidForeignKey
ActiveRecord::InverseOfAssociationNotFoundError
ActiveRecord::IrreversibleMigration
ActiveRecord::MultiparameterAssignmentErrors
ActiveRecord::NestedAttributes::TooManyRecords
ActiveRecord::PendingMigrationError
ActiveRecord::PreparedStatementInvalid
ActiveRecord::ReadOnlyAssociation
ActiveRecord::ReadOnlyRecord
ActiveRecord::RecordInvalid
ActiveRecord::RecordNotDestroyed
ActiveRecord::RecordNotFound
ActiveRecord::RecordNotSaved
ActiveRecord::RecordNotUnique
ActiveRecord::Rollback
ActiveRecord::SerializationTypeMismatch
ActiveRecord::StaleObjectError
ActiveRecord::StatementInvalid
ActiveRecord::SubclassNotFound
ActiveRecord::Tasks::DatabaseAlreadyExists
ActiveRecord::Tasks::DatabaseNotSupported
ActiveRecord::ThrowResult
ActiveRecord::TransactionIsolationError
ActiveRecord::Transactions::TransactionError
ActiveRecord::UnknownAttributeError
ActiveRecord::UnknownMigrationVersionError
ActiveRecord::UnknownPrimaryKey
ActiveRecord::WrappedDatabaseException
ActiveSupport::JSON::Encoding::CircularReferenceError
ActiveSupport::MessageVerifier::InvalidSignature
ActiveSupport::SafeBuffer::SafeConcatError
ActiveSupport::XMLConverter::DisallowedType
There's a mostly adequate solution here: http://www.ruby-forum.com/topic/158088
Since this question is unanswered, yet coming up at the top of Google search results, I decided to wrap Frederick Cheung's solution up in a rake task and post it here.
Drop the following in lib/tasks/exceptions.rake
namespace :exceptions do
  task :list => :environment do
    exceptions = []

    ObjectSpace.each_object(Class) do |k|
      exceptions << k if k.ancestors.include?(Exception)
    end

    puts exceptions.sort { |a, b| a.to_s <=> b.to_s }.join("\n")
  end
end
Run it with:
bundle exec rake exceptions:list
If you're still on Rails 2, or not using Bundler, leave off the bundle exec.
This list is probably adequate, but not exhaustive. For example, ActiveResource defines several exceptions, such as ActiveResource::ConnectionError and ActiveResource::TimeoutError, that do not appear when I run this task. Maybe someone else can enlighten me as to why.
A shorter option is available in Rails thanks to ActiveSupport:
puts Exception.descendants.sort_by(&:name)
You can also check out the source to see how they handle it.
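If you only care about one namespace, the same ActiveSupport extension works from any base class (a small illustrative variant; in development you may need eager loading so all subclasses are actually defined):
# Just the ActiveRecord errors, for example:
puts ActiveRecord::ActiveRecordError.descendants.sort_by(&:name)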
I have a CDash server configured to accept posts for automatic builds and tests. However, when any system attempts to post results to it, the error below is produced, and each result ends up posted four times (presumably the original posting attempt plus the three retries).
Can anyone give me a hint as to what sets this mysterious build ID? I found some code that seems to produce a similar error, but still have no lead on what might be happening.
Build::GetNumberOfErrors(): BuildId not set
Build::GetNumberOfWarnings(): BuildId not set
Submit failed, waiting 5 seconds...
Retry submission: Attempt 1 of 3
Server Response:
The buildid for CDash is computed from the site name, the build name and the build stamp of the submission. You should have a Build.xml file in a Testing/20110311-* directory in your build tree. Open it and see if any of those fields (near the top) is empty. If so, you need to set BUILDNAME and SITE with -D args when configuring with CMake, or set CTEST_BUILD_NAME and CTEST_SITE in your ctest -S script.
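For example, a sketch of a ctest -S script (names and values are illustrative):
# dashboard.cmake -- run with: ctest -S dashboard.cmake
set(CTEST_SITE       "my-build-machine")    # becomes the site name on CDash
set(CTEST_BUILD_NAME "Linux-gcc-Release")   # becomes the build name on CDash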
If that's not it, then this is a mystery. I've not seen this error occur before...
I'm having the same issue, though Site and Buildname are available in Test.xml and are visible on CDash (four times). I can see the jobs increment by refreshing between retries, so it seems that the submission succeeds but reports a timeout.
Update: This seems to have started when I added the -j(nprocs) switch to the ctest command. Changing CtestSubmitRetryDelay to 20 (was 5) allowed a server response through, which indicates the CDash version may not be able to handle the multi-proc option; I'll have to look into that for my issue. Perhaps setting CtestSubmitRetryDelay to a larger number will get you a server response too, as it did for me. Good luck!
Out of range value for column 'processorclockfrequency'