Encrypt/Decrypt using Carrierwave - ruby-on-rails-3

I want to make it so that my carrierwave uploader will encrypt files as it stores them and then decrypt them when they're being retrieved.
My first thought was to rewrite the CarrierWave::Uploader::Store#store! and CarrierWave::Uploader::Store#retrieve_from_store! methods to include my encryption and decryption code, but I'm not quite sure how to go about it.
I'm planning on using Blowfish encryption.
Store
def store!(new_file=nil)
  # seems like I should process new_file here
  cache!(new_file) if new_file && ((@cache_id != parent_cache_id) || @cache_id.nil?)
  if @file and @cache_id
    with_callbacks(:store, new_file) do
      new_file = storage.store!(@file)
      @file.delete if (delete_tmp_file_after_storage && ! move_to_store)
      delete_cache_id
      @file = new_file
      @cache_id = nil
    end
  end
end
Retrieve from store
def retrieve_from_store!(identifier)
  with_callbacks(:retrieve_from_store, identifier) do
    res = storage.retrieve!(identifier)
    @file = res # process res before I store it to @file?
  end
end
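For illustration, this is roughly the kind of helper I imagine calling from those overridden methods (an untested sketch using OpenSSL's Blowfish cipher; ENCRYPTION_KEY and the helper names are placeholders, not CarrierWave API):
require 'openssl'

# Untested sketch: in-place Blowfish encryption/decryption of a file.
# A real implementation would also need to manage an IV and key storage.
def blowfish_cipher(mode)
  cipher = OpenSSL::Cipher.new('bf-cbc')
  mode == :encrypt ? cipher.encrypt : cipher.decrypt
  cipher.key = ENCRYPTION_KEY # placeholder constant
  cipher
end

def encrypt_file(path)
  cipher = blowfish_cipher(:encrypt)
  File.binwrite(path, cipher.update(File.binread(path)) + cipher.final)
end

def decrypt_file(path)
  cipher = blowfish_cipher(:decrypt)
  File.binwrite(path, cipher.update(File.binread(path)) + cipher.final)
end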
Any tips would be greatly appreciated.

You could always do something like this, where the file is not actually stored in the public directory. This would prevent a web crawler, or someone guessing the file name, from downloading your file:
def store_dir
  "/path_to_rails_app/uploads/#{model.user_id}/#{model.id}"
end
From here, you could use a download method and an authorization tool to check whether the user has access to download. Any ideas on the vulnerabilities of this? Other than mass assignment or physical access to the machine, I don't think it would be a big deal. You will also want to set the cache_dir to something outside of the public directory as well.
def cache_dir
  "/path_to_rails_app/tmp/uploads/cache/#{model.user_id}/#{model.id}"
end
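The download action could then look roughly like this (a minimal sketch; the controller name, ownership check, and mounted column are my assumptions, not part of the original answer):
# Hypothetical controller: stream the private file after an ownership check.
class UploadsController < ApplicationController
  def download
    upload = Upload.find(params[:id])
    # placeholder authorization: only the owner may download
    raise ActiveRecord::RecordNotFound unless upload.user_id == current_user.id
    send_file upload.file.path, :disposition => 'attachment'
  end
end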


ETL to CSV files, split up and then pushed to S3 to be consumed by Redshift

Just getting started with Kiba, didn't find anything obvious, but I could be just channeling my inner child (who looks for their shoes by staring at the ceiling).
I want to dump a very large table to Amazon Redshift. It seems that the fastest way to do that is to write out a bunch of CSV files to an S3 bucket, then tell Redshift (via the COPY command) to pull them in. Magical scaling gremlins will do the rest.
So, I think I want Kiba to write a CSV file for every 10k rows of data, then push it to S3, then start writing to a new file. At the end, make a post-processing call to COPY.
So, can I "pipeline" the work or should this be a big, nested Destination class?
i.e.
source -> transform -> transform ... -> [ csv -> s3 ]{every 10000}; post-process
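In Kiba-script terms, I'm picturing something like this skeleton, with the bracketed part being the piece I'm unsure about (all class names here are placeholders):
# Hypothetical Kiba script skeleton for the pipeline above
source MySource, connection_params        # placeholder source
transform CleanupTransform                # placeholder transform(s)
# [ csv -> s3 ]{every 10000}  <- the part I'm asking about
destination SomeBatchingCsvS3Destination  # ???
post_process { run_redshift_copy }        # placeholder COPY trigger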
Kiba author here. Thanks for trying it out!
Currently, the best way to implement this is to create what I'd call a "buffering destination". (A version of that will likely end up in Kiba Common at some point).
(Please test thoroughly; I just authored this for you this morning and didn't run it at all, although I've used less generic versions in the past. Also keep in mind that this version uses an in-memory buffer for your 10k rows, so growing the number to something much larger will consume memory. A less memory-consuming version could also be created, though, which would write rows to a file as you receive them.)
class BufferingDestination
  def initialize(buffer_size:, on_flush:)
    @buffer = []
    @buffer_size = buffer_size
    @on_flush = on_flush
    @batch_index = 0
  end

  def write(row)
    @buffer << row
    flush if @buffer.size >= @buffer_size
  end

  def flush
    @on_flush.call(batch_index: @batch_index, rows: @buffer)
    @batch_index += 1
    @buffer.clear
  end

  def close
    flush
  end
end
You can then use it like this, for instance reusing the Kiba Common CSV destination (although you can write your own too):
require 'kiba-common/destinations/csv'

destination BufferingDestination,
  buffer_size: 10_000,
  on_flush: ->(batch_index:, rows:) {
    filename = File.join("output-#{sprintf("%08d", batch_index)}")
    csv = Kiba::Common::Destinations::CSV.new(
      filename: filename,
      csv_options: { ... },
      headers: %w(my fields here)
    )
    rows.each { |r| csv.write(r) }
    csv.close
  }
You could then trigger your COPY right in the on_flush block after generating the file (if you want the upload to start right away), or in a post_process block (but this would only start after all the CSVs are ready, which can be a feature if you prefer to ensure some form of transactional, global upload).
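For instance, a post_process upload could look roughly like this (an untested sketch; the bucket, region, and use of the aws-sdk-s3 gem are assumptions on my part, not part of the original answer):
require 'aws-sdk-s3' # assumption: using the official AWS SDK

post_process do
  s3 = Aws::S3::Resource.new(region: 'us-east-1') # placeholder region
  # push each generated batch file to a placeholder staging bucket
  Dir.glob('output-*').sort.each do |path|
    s3.bucket('my-redshift-staging-bucket').object(File.basename(path)).upload_file(path)
  end
  # then issue the Redshift COPY (see the manifest sketch further down)
end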
You could go fancy and start a thread queue to actually handle the upload in parallel if you really need this (but then be careful with zombie threads etc).
Another way is to have a "multiple steps" ETL process, with one script generating the CSVs and another one picking them up for upload, running concurrently (this is something I've explained in my talk at RubyKaigi 2018, for instance).
Let me know how things work for you!
I'm not sure of your exact question here. Your solution seems correct overall, but a few suggestions:
You may want to consider having even more than 10K records per CSV file, and gzipping them while sending to S3.
You may also want to look into creating a manifest containing the list of the multiple files, and then running the COPY command with the manifest file as input.
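To illustrate the manifest idea, here is a rough sketch (the bucket, table, IAM role, and the use of the pg gem to reach Redshift are all placeholders/assumptions):
require 'json'
require 'pg' # assumption: talking to Redshift over its PostgreSQL protocol

# Build a manifest listing every gzipped batch file already pushed to S3
# (placeholder bucket and key names).
files = Dir.glob('output-*.gz').sort
manifest = {
  entries: files.map { |f|
    { url: "s3://my-redshift-staging-bucket/#{File.basename(f)}", mandatory: true }
  }
}
File.write('manifest.json', JSON.pretty_generate(manifest))
# ...upload manifest.json to the same bucket, then run the COPY:

conn = PG.connect(host: 'my-cluster.redshift.amazonaws.com', port: 5439,
                  dbname: 'mydb', user: 'etl', password: ENV['REDSHIFT_PASSWORD'])
conn.exec(<<~SQL)
  COPY my_table
  FROM 's3://my-redshift-staging-bucket/manifest.json'
  IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-copy-role'
  CSV GZIP MANIFEST;
SQL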
Thibaut, I did something similar, except that I streamed it out to a Tempfile, I think...
require 'csv'
require 'tempfile'

module PacerPro
  # @param limit [Integer, 1_000] Number of rows per csv file
  # @param callback [Proc] Proc taking one argument [CSV/io], that can be used after
  #   each csv file is finished
  class CSVDestination
    def initialize(limit: 1_000, callback: ->(obj) { })
      @limit = limit
      @callback = callback
      @csv = nil
      @row_count = 0
    end

    # @param row [Hash] returned from transforms
    def write(row)
      csv << row.values
      @row_count += 1
      return if row_count < limit
      self.close
    end

    # Called by Kiba when the transform pipeline is finished
    def close
      csv.close
      callback.call(csv)
      tempfile.unlink
      @csv = nil
      @row_count = 0
    end

    private

    attr_reader :limit, :callback
    attr_reader :row_count, :tempfile

    def csv
      @csv ||= begin
        @tempfile = Tempfile.new('csv')
        CSV.open(@tempfile, 'w')
      end
    end
  end
end
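Hypothetical usage in a Kiba script (upload_to_s3 is a placeholder for whatever S3 client call you use, not part of the code above):
# Sketch: declare the destination; the callback receives the CSV handle
# from #close, while the backing tempfile still exists.
destination PacerPro::CSVDestination,
  limit: 10_000,
  callback: ->(csv) { upload_to_s3(csv.path) }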

Copy an image with CarrierWave

I have an ArtworkUploader and I want to create a duplicate of the artwork image in the same directory. Help me solve this.
My Uploader:
class ArtworkUploader < CarrierWave::Uploader::Base
  def store_dir
    if model
      "uploads/#{model.class.to_s.underscore}/#{model.id}/#{mounted_as}"
    end
  end

  def filename
    "artwork.png"
  end
end
I tried this in the console but it doesn't work. What am I missing here?
Console:
> u = User.find(5)
> u.artwork.create(name: "testing.png", file: u.artwork.path)
> NoMethodError: undefined method `create!' for /uploads/5/artwork/Artwork:ArtworkUploader
There are two ways I can think of to do that.
a) Via versioning:
Create a version of your file:
version :copy_file do
  process :create_a_copy
end
Now just define a create_a_copy method inside your uploader, which will simply return the same file.
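A minimal sketch of that method (my assumption: since the version machinery already operates on its own copy of the original, the process can effectively be a no-op):
# Hypothetical no-op process: the :copy_file version stores its own copy of
# the original, so leaving the file untouched gives you the duplicate.
def create_a_copy
end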
This way you get a copy of the file. I didn't fully understand your custom filename requirement, but the same way you defined the filename method for your uploader, you can do it for the version as well, something like this:
version :copy_file do
  process :create_a_copy

  def filename
    "testing.png"
  end
end
NOTE:
I'm not sure about the filename handling for the version file, since I did this a long time ago, but I believe defining a different filename method as above would work.
Advantages:
The file and its copy are associated with one uploader.
No extra database column is required (which approach (b) does require).
The above approach has some caveats too:
It is slightly more complex.
Single-delete issue: deleting the original upload will delete its copy as well.
b) Via a separate column:
The other way you can achieve this is by defining a separate column called artwork_copy and mounting the same uploader on it, with just a slight change in your uploader, like this:
def filename
  if self.mounted_as == :artwork
    "artwork.png"
  else
    "testing.png"
  end
end
And this is how you attach the file (given that your file is stored locally):
u = User.find(5)
u.artwork_copy = File.open(u.artwork.path) ## just cross-check the path
u.save
There is another way to do the same thing:
u = User.find(5)
u.artwork_copy.store!(File.open(u.artwork.path))
The advantages and disadvantages of approach (b) compared to the above should now be pretty obvious.
Hope this makes sense.

Locally calculate dropbox hash of files

The Dropbox REST API's metadata function has a parameter named "hash": https://www.dropbox.com/developers/reference/api#metadata
Can I calculate this hash locally, without calling any remote REST API function?
I need to know this value to reduce upload bandwidth.
https://www.dropbox.com/developers/reference/content-hash explains how Dropbox computes their file hashes. A Python implementation of this is below:
import hashlib
import math
import os

DROPBOX_HASH_CHUNK_SIZE = 4*1024*1024

def compute_dropbox_hash(filename):
    file_size = os.stat(filename).st_size
    with open(filename, 'rb') as f:
        block_hashes = b''
        while True:
            chunk = f.read(DROPBOX_HASH_CHUNK_SIZE)
            if not chunk:
                break
            block_hashes += hashlib.sha256(chunk).digest()
        return hashlib.sha256(block_hashes).hexdigest()
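For a Ruby project, a rough translation of the same algorithm might look like this (an untested sketch):
require 'digest'

DROPBOX_HASH_CHUNK_SIZE = 4 * 1024 * 1024

# Untested sketch: SHA-256 each 4 MB block, then SHA-256 the concatenation
# of the block digests, per Dropbox's content-hash scheme linked above.
def compute_dropbox_hash(filename)
  block_hashes = ''.b
  File.open(filename, 'rb') do |f|
    while (chunk = f.read(DROPBOX_HASH_CHUNK_SIZE))
      block_hashes << Digest::SHA256.digest(chunk)
    end
  end
  Digest::SHA256.hexdigest(block_hashes)
end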
The "hash" parameter on the metadata call isn't actually the hash of the file, but a hash of the metadata. Its purpose is to save you from having to re-download the metadata if it hasn't changed, by supplying it during the metadata request. It is not intended to be used as a file hash.
Unfortunately I don't see any way via the Dropbox API to get a hash of the file itself. I think your best bet for reducing your upload bandwidth would be to keep track of the hashes of your files locally and detect whether they have changed when deciding whether to upload them. Depending on your system, you also likely want to keep track of the "rev" (revision) value returned on the metadata request, so you can tell whether the version on Dropbox itself has changed.
This won't directly answer your question, but is meant more as a workaround: the Dropbox SDK includes a simple updown.py example that uses file size and modification time to check whether a file is up to date.
An abbreviated example taken from updown.py:
dbx = dropbox.Dropbox(api_token)
...
# returns a dictionary of name: FileMetaData
listing = list_folder(dbx, folder, subfolder)
# name is the name of the file
md = listing[name]
# fullname is the path of the local file
mtime = os.path.getmtime(fullname)
mtime_dt = datetime.datetime(*time.gmtime(mtime)[:6])
size = os.path.getsize(fullname)
if (isinstance(md, dropbox.files.FileMetadata) and
        mtime_dt == md.client_modified and size == md.size):
    print(name, 'is already synced [stats match]')
As far as I am aware, no, you can't.
The only way is using the Dropbox API, which is explained here.
The rclone Go program from https://rclone.org has exactly what you want:
rclone hashsum dropbox localfile
rclone hashsum dropbox localdir
It can't take more than one path argument but I suspect that's something you can work with...
t0|todd@tlaptop/p8 ~/tmp|295$ echo "Hello, World!" > dropbox-hash-demo/hello.txt
t0|todd@tlaptop/p8 ~/tmp|296$ rclone copy dropbox-hash-demo/hello.txt dropbox-ttf:demo
t0|todd@tlaptop/p8 ~/tmp|297$ rclone hashsum dropbox dropbox-hash-demo
aa4aeabf82d0f32ed81807b2ddbb48e6d3bf58c7598a835651895e5ecb282e77 hello.txt
t0|todd@tlaptop/p8 ~/tmp|298$ rclone hashsum dropbox dropbox-ttf:demo
aa4aeabf82d0f32ed81807b2ddbb48e6d3bf58c7598a835651895e5ecb282e77 hello.txt

Rails 3 Carrierwave-Fog-S3 error: Expected(200) <=> Actual(404 Not Found)

I'm using CarrierWave 0.5.3 and getting a 404 error on my call to Picture.save in the create method of my pictures controller. Per the instructions in lib/carrierwave/storage/s3.rb, I have the following in my initializer file (config/initializers/carrierwave_fog.rb):
CarrierWave.configure do |config|
  config.s3_access_key_id = "xxxxx"
  config.s3_secret_access_key = "xxxxx"
  config.s3_bucket = "mybucket" # already created in my S3 account
end
In photo_uploader.rb I have:
class PhotoUploader < CarrierWave::Uploader::Base
  include CarrierWave::RMagick

  storage :s3

  def store_dir
    "uploads" # already created in my s3 account
  end

  def cache_dir
    "uploads/cache" # already created in my s3 account
  end
end
The exact error:
Excon::Errors::NotFound in PicturesController#create
Expected(200) <=> Actual(404 Not Found)
request => {:expects=>200}
response => #<Excon::Response:0x00000104a72448 @body="", @headers={}, @status=404>
I found a slightly similar question here: Carrierwave and s3 with heroku error undefined method `fog_credentials='. But setting things up the way I have it now apparently worked in that case; unfortunately it didn't for me.
I've put a picture in my bucket and set the permissions to public and can access the picture via a browser. So things on the AWS S3 side seem to be working.
Not sure where to go next. Any ideas?
Well, I slept on this for a night came back the next day and all was good. Not sure why it suddenly started working.
Make sure your file names are sanitized and do not contain invalid characters like spaces or slashes.
To sanitize a string you can call the gsub method on it. The following method call will sanitize file names for upload to S3, Google Cloud Storage, etc. (note the explicit A-Za-z range; a bare A-z would also let through characters like ^ and _ that sit between Z and a in ASCII):
"Invalid\ file *& | | name.png".gsub(/[^0-9A-Za-z.\-]/, '_')

Paperclip processor can't find its file

Formerly: Running a model method on a paperclip attachment after create or update (paperclip callbacks don't seem to work)
Edit (later that day)
I figured out my problem. The processor apparently works with the file that is uploaded, but doesn't save any files until after processing. I changed my Zip::ZipFile call to open 'file' rather than 'attachment.path', since the attachment path doesn't actually hold anything yet. This fixed the first problem. Now I'm having other problems that I'll need to track down, but the answer below is mostly correct.
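In other words, roughly this change inside the processor's make method (a sketch of the fix described above):
# Open the tempfile Paperclip hands the processor (@file), not the final
# attachment path, which hasn't been written at this point.
Zip::ZipFile.open(file.path) do |zip_file|
  # ... same extraction code as in the processor below ...
end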
Edit (1/31/2011):
So I have taken the advice to create a processor for my attachment which will perform all the necessary actions. So far, it looks like it should work; the processor starts and does all the initialization stuff, apparently. However, when I get to the point where I want to access the zip file that was uploaded, I get an error saying that the file cannot be found. The code for my processor is below:
class Extractor < Processor
  attr_accessor :resolution, :whiny

  def initialize(file, options = {}, attachment = nil)
    super
    @file = file
    @whiny = options[:whiny].nil? ? true : options[:whiny]
    @basename = File.basename(@file.path, File.extname(@file.path))
    @attachment = attachment
    @instance = attachment.instance
  end

  def make
    # do your conversions here, you've got @file, @attachment and @basename to work with
    export_path = attachment.path.gsub('.zip', '_content')
    Zip::ZipFile.open(attachment.path) { |zip_file|
      zip_file.each { |image|
        image_path = File.join(export_path, image.name)
        FileUtils.mkdir_p(File.dirname(image_path))
        unless File.exist?(image_path)
          zip_file.extract(image, image_path)
          # ..stuff that it does..
        end
      }
    }
    # clean up source files, but leave the zip
    FileUtils.remove_dir(export_path)
    # you return a file handle which is the processed result
    dst = File.open result_file_path
  end
end
And here is the contents of the error that I get:
Zip::ZipError in GalleriesController#create
File /home/joshua/railscamp/moments_on_three/public/assets/archives/delrosario.zip not found
Rails.root: /home/joshua/railscamp/moments_on_three
Application Trace | Framework Trace | Full Trace
config/initializers/extractor.rb:16:in `make'
app/controllers/galleries_controller.rb:32:in `new'
app/controllers/galleries_controller.rb:32:in `create'
Request
Parameters:
{"utf8"=>"✓",
"authenticity_token"=>"0s4L4MrlqjDTMjzjgkUdvUxeHklZNOIShDhT6fgOICY=",
"gallery"=>{"name"=>"DelRosario",
"public"=>"0",
"owner_id"=>"1",
"shoot_date(1i)"=>"2011",
"shoot_date(2i)"=>"1",
"shoot_date(3i)"=>"31",
"category_id"=>"1",
"archive"=>#<ActionDispatch::Http::UploadedFile:0x00000004148d78 @original_filename="delrosario.zip",
@content_type="application/zip",
@headers="Content-Disposition: form-data; name=\"gallery[archive]\"; filename=\"delrosario.zip\"\r\nContent-Type: application/zip\r\n",
@tempfile=#<File:/tmp/RackMultipart20110131-9745-14u347v>>},
"commit"=>"Create Gallery"}
From what I can tell it's looking for the file in the right place, but the file doesn't seem to have been uploaded yet, so it can't be accessed. As far as I'm aware, Paperclip is smart enough to wait for the attachment to upload before it tries to process it. Can anyone spot what I'm doing wrong here?
Thanks a lot.
Old stuff:
I'm developing a photo gallery app using Rails 3 and Paperclip. The Admin is able to create a gallery and upload a zip file containing a bunch of images.
What I want to happen:
1. Enter gallery info and zip file to upload into the form.
2. Hit 'Create Gallery' button.
3. Form posts, gallery saves, and zip file gets uploaded.
4. After zip file is uploaded, run the method :extract_photos (btw, this code works).
4.a. At the end of this method, zip file is destroyed.
5. Admin is redirected to gallery page with all the photos in it (where gallery has_many photos).
I've tried to make this work several different ways.
Before, I created a controller method which would allow the Admin to click a link which ran the :extract_photos method. This worked on my computer, but for some reason the server had trouble routing this on the client's computer. So it's a no go. Plus I thought it was an ugly way of doing it.
Recently, I tried using callback methods. after_save didn't work because it apparently interrupts the form POST and the file doesn't get uploaded and the :extract_photos method can't find the file.
I checked out callback methods on the Paperclip github page, and it talks about the callbacks:
Before and after the Post Processing step, Paperclip calls back to the model with a few callbacks, allowing the model to change or cancel the processing step. The callbacks are "before_post_process" and "after_post_process" (which are called before and after the processing of each attachment), and the attachment-specific "before_<attachment>_post_process" and "after_<attachment>_post_process". The callbacks are intended to be as close to normal ActiveRecord callbacks as possible, so if you return false (specifically - returning nil is not the same) in a before filter, the post processing step will halt. Returning false in an after filter will not halt anything, but you can access the model and the attachment if necessary.
I've tried using before_post_process and after_post_process, but it can't find the file to run the process, so the file obviously isn't getting uploaded by the time those methods are called (which I think is strange). Additionally, when I try the attachment-specific before_<attachment>_post_process and after_<attachment>_post_process callbacks, I get a NoMethodError.
So how do I call a method on an attachment when it is created or updated, but after the file is uploaded and in the right place?
UPDATE
I tried the code below, moving my extraction method code into the make method of the processor. I've gotten farther than I did before when trying to write a processor, but it's still a no-go. The process throws an exception as soon as I try to open the uploaded file for processing, saying the file doesn't exist. The naming scheme is correct and everything, but still nothing is getting uploaded before the process is triggered. Does anyone have any idea why this is happening?
You can write your own processor to accomplish this.
In your model, when declaring the Paperclip attachment, add a custom processor:
has_attached_file :my_attachment, {
  :styles => { :original => { :processors => [:my_processor] } }
}.merge(PAPERCLIP_SETTINGS)
Then write your own processor and put it in config/initializers:
module Paperclip
  class MyProcessor < Processor
    attr_accessor :resolution, :whiny

    def initialize(file, options = {}, attachment = nil)
      super
      @file = file
      @whiny = options[:whiny].nil? ? true : options[:whiny]
      @basename = File.basename(@file.path, File.extname(@file.path))
      @attachment = attachment
    end

    def make
      # do your conversions here, you've got @file, @attachment and @basename to work with
      # you return a file handle which is the processed result
      dst = File.open result_file_path
    end
  end
end
I am using a custom processor to do things similar to what you are doing, with lots of processing and converting of the file in the middle, and it seems to work well.