Rails: check whether data uploaded from CSV already exists in the database

I'm new to Rails and would like to know whether I can check if data already exists in my database while uploading new data from a CSV file. So far I haven't found anything about this online.
I use a Postgres database, and I don't know whether the check belongs in Rails or in Postgres. My table has columns such as id, personal_id and cost_center. While uploading, I want to check whether one or more of the new rows has the same personal_id together with the same cost_center as an existing row.
How can I do this?
UPDATE
I've tried the solution of @huan son. It works, but not the way I need it to. So I tried different things, and I think a SQL query is the best choice in my case.
DELETE FROM bookings WHERE id NOT IN (SELECT MIN(id) FROM bookings GROUP BY personal_id, wbs, date, hours, cost_center)
Booking.delete_all.where('id NOT IN (?)', Booking.select('MIN(id)').group(:personal_id, :wbs, :date, :hours, :cost_center).map(&:id))
My SQL query works the way I want, but I don't know the right "translation" into Rails: with the second snippet above, my whole bookings table gets deleted.
Solution
The solution for my problem is:
Booking.delete_all(['id NOT IN (?)', Booking.group(:personal_id, :wbs, :date, :hours, :cost_center).pluck('MIN(id)')])
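To see which rows the query above removes, here is a minimal plain-Ruby sketch of the same "keep the lowest id per group" rule. It runs without Rails; the BookingRow struct and the sample rows are made up for illustration and only stand in for the bookings table.

```ruby
# Hypothetical stand-in for one row of the bookings table.
BookingRow = Struct.new(:id, :personal_id, :wbs, :date, :hours, :cost_center)

# Returns the ids the DELETE would remove: every id that is not the smallest
# id within its (personal_id, wbs, date, hours, cost_center) group.
def duplicate_ids(rows)
  keep = rows
         .group_by { |r| [r.personal_id, r.wbs, r.date, r.hours, r.cost_center] }
         .values
         .map { |group| group.map(&:id).min }
  rows.map(&:id) - keep
end

rows = [
  BookingRow.new(1, 'P1', 'W1', '2015-01-05', 8, 'CC1'),
  BookingRow.new(2, 'P1', 'W1', '2015-01-05', 8, 'CC1'), # same values as id 1
  BookingRow.new(3, 'P2', 'W1', '2015-01-05', 8, 'CC1'),
  BookingRow.new(4, 'P1', 'W1', '2015-01-05', 8, 'CC1')  # same values as id 1
]

p duplicate_ids(rows) # ids 2 and 4 repeat the values of id 1
```

Running the rule on a copy of the data first is a cheap way to confirm which ids would be deleted before touching the real table.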

Those are unique scopes.
You need to define them first in a database migration, so that Postgres makes sure there are no duplicate values for [key, key1, key2...]:
add_index :table, [:personal_id, :cost_center_id], :unique => true
Then go into your Rails model and catch that uniqueness in the validations:
validates_uniqueness_of :personal_id, :scope => :cost_center_id
With that in place, Rails queries the database every time before creating an object and checks whether a record with the same pair of values already exists. If so, it adds an error to the model's errors, so the model can't be saved.
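The same pair check can also be done up front for a whole file. The following is only a sketch in plain Ruby, not part of the answer above: split_new_and_duplicate is a hypothetical helper, and existing_pairs stands in for the pairs already stored in the table (in a Rails 4+ app they could presumably be loaded with something like Booking.pluck(:personal_id, :cost_center), which is an assumption about the model).

```ruby
require 'csv'
require 'set'

# Splits CSV rows into new rows and rows whose (personal_id, cost_center)
# pair is already known. Pairs seen earlier in the same file also count
# as duplicates.
def split_new_and_duplicate(csv_text, existing_pairs)
  seen = Set.new(existing_pairs)
  fresh = []
  duplicates = []
  CSV.parse(csv_text, headers: true) do |row|
    key = [row['personal_id'], row['cost_center']]
    # Set#add? returns nil when the key is already present.
    if seen.add?(key)
      fresh << row.to_h
    else
      duplicates << row.to_h
    end
  end
  [fresh, duplicates]
end

csv_text = <<~DATA
  personal_id,cost_center
  P1,CC1
  P2,CC1
  P2,CC1
DATA

fresh, duplicates = split_new_and_duplicate(csv_text, [['P1', 'CC1']])
puts "new: #{fresh.size}, duplicates: #{duplicates.size}"
```

The unique index is still worth keeping, since a check like this cannot protect against two uploads running at the same time.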

You can use uniqueness validation, but in my experience when importing data from CSV, the problem is that if the item already exists, all the validation does is stop the record being saved. In most examples, you usually want to do something with the matched record. Therefore, I'd suggest you also look at find_or_initialize_by. This allows you to also update existing records from imported data.
So if you have a Thing model with a name and a price, for example, you may want to identify existing things by name and update their prices, and create new things where no matching name exists. The following code would do that:
name, price = some_method_that_gets_name_and_price_from_csv
thing = Thing.find_or_initialize_by name: name
thing.price = price
thing.save
Also have a look at find_or_create_by which can be more suitable in some situations. I'd also still keep the validation of uniqueness in the model. I just wouldn't use validation to handle how the data was imported.
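The difference between "reject the duplicate" and "update the match" can be sketched without a database. In this illustration a Hash keyed by name stands in for the things table, and import_prices is a hypothetical helper, not a Rails method; it only mirrors the find-or-initialize-then-save flow described above.

```ruby
# store: Hash of name => price, a stand-in for the things table.
# rows:  array of [name, price] pairs, as read from the CSV.
# Existing names get their price overwritten, unknown names are added.
def import_prices(store, rows)
  created = 0
  updated = 0
  rows.each do |name, price|
    if store.key?(name)
      updated += 1
    else
      created += 1
    end
    store[name] = price
  end
  { created: created, updated: updated }
end

store = { 'Widget' => 10 }
summary = import_prices(store, [['Widget', 12], ['Gadget', 5]])
p summary
p store
```

Returning counts of created and updated rows makes it easy to report back to the user what an import actually did.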

Related

Mass/bulk update in rails without using update_all with a single query?

I want to update multiple rows using a single Active Record query. I don't want to use update_all because it skips validations. Is there any way to do this in Rails Active Record?
A mass update without update_all is achievable using the activerecord-import gem.
Please refer to this gem for more information.
Methods with detail.
Example:
Let's say there is a table named "Services" with a "booked" column. We want to update its value using the gem, outside the loop.
services.each do |service|
  service.booked = false
  service.updated_at = DateTime.current if service.changed?
end
ProvidedService.import services.to_ary, on_duplicate_key_update: { columns: %i[booked updated_at] }
By default, activerecord-import does not update the "updated_at" column, so we have to set it explicitly.
If you want to update multiple records without instantiating the models, update_all method should do the trick:
Updates all records in the current relation with details given. This method constructs a single SQL UPDATE statement and sends it straight to the database. It does not instantiate the involved models and it does not trigger Active Record callbacks or validations. However, values passed to #update_all will still go through Active Record’s normal type casting and serialization.
E.g:
# Update all books with 'Rails' in their title
Book.where('title LIKE ?', '%Rails%').update_all(author: 'David')
Note that update_all applies the same SET clause to every matched row. If you pass it an array, the array is treated as a SQL fragment with bind values (e.g. ['author = ?', 'David']), not as a list of hashes for different records.
It sounds like you are looking for update, described in the documentation under "Updating multiple records". It lets you set different columns with different data for each record.
For example:
people = { 1 => { "first_name" => "David" }, 2 => { "first_name" => "Jeremy" } }
Person.update(people.keys, people.values)

Need a strategy for an efficient autocomplete in Rails across multiple attributes

I have a form for submitting an order, and I need an autocomplete field that searches across three attributes in the associated customer model: first name, last name, and customer_number (as opposed to customer.id). I know about the rails3-jquery-autocomplete gem found here http://github.com/crowdint/rails3-jquery-autocomplete, and got it working well, but a question has occurred to me -- is there a more efficient way to make the autocomplete work without having to query the db every time?
The other solution that occurred to me is to create a new indexed attribute in the customer model -- call it autocomplete_data. Whenever a new customer is added via the usual new customer form, an :after_create callback could populate the field. Would this speed up the performance? Or am I overthinking it?
UPDATE
I'm embarrassed to say that I just didn't search hard enough the first time around -- I think this actually answers my question:
Rails: Efficiently searching by both firstname and surname

Rails 3 Searching Multiple Models by created_at using sunspot

I'm trying to get a "What's new" section working in my Rails app that takes into account new records created in various tables that don't share any relationships. The one thing they have in common is a created_at field, which I'm going to use to determine whether they're "new", and I then want to sort the results by that common field. I tried doing this with Sunspot, but I couldn't figure out how to make use of the result set returned from the Sunspot search...
For instance in my Uploads and Article models I have:
searchable do
  time :created_at
end
and in my search action I'll do this:
@updates = Sunspot.search(Upload, Article) do
  with(:created_at).greater_than(1.hour.ago)
end
This does seem to return something: if I call @updates.total, it returns the number of records I was expecting to find. Beyond this, I'm not sure how to actually make use of the records. What I'd like to do is send @updates to a view, determine the model type of each record, and then print out the relevant information, i.e. names, descriptions, parent/child record information (for instance upload.user.username).
I might be going at this all wrong, perhaps there's a better option than sunspot for the simple search I'm attempting to perform?
Refer to the README for details on how to use the search results. The method you are looking for is "results", which gives you the first 30 results by default:
@updates.results # array of first 30 results

Rails 3 - defining :order using a combination of attributes, while treating the *latest* record differently

My Application has the following :order requirements:
Order by the importance (measured by say number of likes) AND
Order by created_at DESC
So I am currently using the following to define the :order
:order => 'model.likes_count DESC, created_at DESC'
However, in my views, I create new model entries using AJAX and therefore, I reload the partials where I display the database entries for this particular model and would like to use jQuery effects to (say) highlight the record that was just created.
Now due to the above :order definition, the record just created would not show up as the first one if an older record has greater number of 'likes'.
Is there a way to define an order which takes into account the "latest" record differently and then order the rest of the records as per the defined order? If possible what would be the clean way to achieve this.
Thanks!
The Rails API does not have any functionality to treat a newly created record differently.
Your options are:
Do not select the new record in the query, eg. with .where("id <> ?",idofnewrecord)
You can select the new record in a separate query if required
Or, select the records as normal, then find the new record in ruby, eg:
@newrecord = @records.find { |r| r.id == idofnewrecord }
or, to remove it from the array of records (reject returns the remaining records, not the removed one):
@records = @records.reject { |r| r.id == idofnewrecord }
If you don't know the id, you may be able to filter it out with some other attribute as well, eg. created_at.
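The second option can be sketched in plain Ruby as follows. The Entry struct, the sample data and the new_record_first helper are hypothetical stand-ins for the model and its records; the sort mirrors the 'likes_count DESC, created_at DESC' order from the question.

```ruby
# Hypothetical stand-in for a model instance.
Entry = Struct.new(:id, :likes_count, :created_at)

# Returns the records with the newly created one first, followed by the
# rest ordered by likes_count DESC, then created_at DESC.
def new_record_first(records, new_id)
  fresh, rest = records.partition { |r| r.id == new_id }
  fresh + rest.sort_by { |r| [-r.likes_count, -r.created_at.to_f] }
end

records = [
  Entry.new(1, 5, Time.at(100)),
  Entry.new(2, 9, Time.at(200)),
  Entry.new(3, 0, Time.at(300)) # just created, no likes yet
]

p new_record_first(records, 3).map(&:id)
```

Since the new record is always the first element of the result, the view can highlight that element without any extra lookup.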

Rails3: left join aggregate count - how to calculate?

In my application, Users register for Events, which belong to a Stream. The registrations are managed in the Registration model, which has a boolean field called 'attended'.
I'm trying to generate a leaderboard and need to know: the total number of registrations for each user, as well as a count for user registrations in each individual event stream.
I'm trying this (in User.rb):
# returns an array of users and their attendance count
def self.attendance_counts
  User.all(
    :select => "users.*, sum(attended) as attendance_count",
    :joins => 'left join `registrations` ON registrations.user_id = users.id',
    :group => 'registrations.user_id',
    :order => 'attendance_count DESC'
  )
end
The generated SQL works for just returning the total attended count for each user when I run it in the database, but all that gets returned is the User record in Rails.
I'm about to give up and hardcode a counter_cache for each stream (they are fairly fixed) into the User table, which gets manually updated whenever the attended attribute changes on a Registration model save.
Still, I'm really curious as to how to perform a query like this. It must come up all the time when calculating statistics and reports on records with relationships.
Your time and consideration is much appreciated. Thanks in advance.
Firstly, a couple of points on style and on the Rails functions that help you build DB queries.
1) You're better writing this as a scope rather than a method i.e.
scope :attendance_counts, select("users.*, sum(attended) as attendance_count").joins(:registrations).group('registrations.user_id').order('attendance_count DESC')
2) It's better not to call all/find/first on the query you've built up until you actually need it (i.e. in the controller or view). That way if you decide to implement action / fragment caching later on the DB query won't get called if the cached action / fragment is served to the user.
3) Rails has a series of functions to help with aggregating DB data. For example, if you only wanted each user's id and the sum of attended, you could use something like the following code:
Registration.group(:user_id).sum(:attended)
Other functions include count, average, minimum and maximum.
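A grouped sum like the one in point 3 returns a hash keyed by user_id. The plain-Ruby sketch below shows the shape of that result and how a leaderboard could be built from it; the [user_id, attended] pairs and the attendance_by_user helper are hypothetical stand-ins for the registrations table, with the boolean counted as 1 or 0.

```ruby
# registrations: array of [user_id, attended] pairs.
# Returns a Hash of user_id => number of attended registrations.
def attendance_by_user(registrations)
  registrations.each_with_object(Hash.new(0)) do |(user_id, attended), totals|
    totals[user_id] += attended ? 1 : 0
  end
end

rows = [[1, true], [1, true], [2, false], [2, true], [3, false]]

totals = attendance_by_user(rows)
# Highest attendance first, as [user_id, total] pairs.
leaderboard = totals.sort_by { |_user_id, total| -total }

p totals
p leaderboard
```

Users with no attended registrations still appear with a total of 0, which matches what a left join would give in the SQL version.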
Finally, in answer to your question: Rails creates an attribute that lets you access the value of any custom field in the select part of your query, e.g.
@users = User.attendance_counts
@users[0].attendance_count # The attendance count for the first user returned by the query