Character encoding issue exporting rails data to CSV

I'm exporting data to a CSV file in Rails, and some of my fields show character encoding issues like this when I open the file in Excel:
didnâ€™t
I borrowed this code from an example and I'm assuming the encoding is off. Any idea what it should be?
send_data csv_data,
:type => 'text/csv; charset=iso-8859-1; header=present',
:disposition => "attachment; filename=#{filename}.csv"

When Excel opens the CSV file it just assumes an "iso-8859-1" character encoding. I guess it only sees the saved file, so it never gets the encoding information you send along in your HTTP response. That's why setting the charset to UTF-8 there doesn't work.
So in order to export your CSV file for Excel in Rails you could do this:
send_data Iconv.conv('iso-8859-1//IGNORE', 'utf-8', csv_data),
:type => 'text/csv; charset=iso-8859-1; header=present',
:disposition => "attachment; filename=#{filename}.csv"
This re-encodes your UTF-8 data string (that's the Rails default) to ISO-8859-1 and sends it. The response also declares that it is ISO-8859-1 encoded, which won't make a difference for Excel but is technically correct if you open it in a browser or similar.
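Iconv was removed from the standard library in Ruby 2.0, so on newer Rubies the same conversion is usually written with String#encode. A minimal sketch, assuming the CSV text is a valid UTF-8 string; `to_latin1` is a hypothetical helper name, not part of Rails:

```ruby
# Hypothetical helper (not a Rails API): convert UTF-8 CSV text to ISO-8859-1.
# Characters with no ISO-8859-1 equivalent are dropped, similar to //IGNORE.
def to_latin1(utf8_csv)
  utf8_csv.encode('ISO-8859-1', invalid: :replace, undef: :replace, replace: '')
end

sample = "caf\u00E9,didn\u2019t\n" # "café,didn’t" with a curly apostrophe
latin1 = to_latin1(sample)

puts latin1.encoding
p latin1.bytes
```

In the controller, `to_latin1(csv_data)` would take the place of the `Iconv.conv(...)` argument to send_data. Note that the curly apostrophe is silently dropped here, because ISO-8859-1 has no such character.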

This worked for me, with Chinese characters!
Excel CSV format (BOM + UTF-8):
def export_csv_excel
....
# Add a BOM so that Excel opens the CSV file as UTF-8
head = 'EF BB BF'.split(' ').map{|a|a.hex.chr}.join()
csv_str = CSV.generate(head) do |csv|
csv << [ , , , ...]
@invoices.each do |invoice|
csv << [ , , , ...]
end
end
send_data csv_str, filename: "Invoices-#{Time.now.strftime("%y%m%d%H%M%S")}.csv", type: "text/csv"
end
source(Chinese): http://blog.inheart.tw/2013/09/rubyraisl-csv-excel.html
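The same idea in a self-contained form: prefix the generated CSV with the UTF-8 byte order mark (U+FEFF, which encodes to the bytes EF BB BF). This is only a sketch; the rows are made-up sample data standing in for the invoice records, and the resulting string would be passed to send_data as in the answer above.

```ruby
require 'csv'

BOM = "\uFEFF" # UTF-8 byte order mark; encodes to the bytes EF BB BF

# Made-up rows standing in for the real invoice records.
rows = [['A001', '發票一'], ['A002', '發票二']]

csv_body = CSV.generate do |csv|
  csv << ['Number', 'Title']
  rows.each { |row| csv << row }
end

# The BOM must be the very first thing in the file.
csv_str = BOM + csv_body
```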

The answers above did not work for me on Mac Excel:
Using iso-8859-1 would require that I replace or remove unsupported characters, which is not a good enough solution for me, and using a BOM with UTF-8 worked under Windows but not under Mac Excel.
What worked for me is the WINDOWS-1252 encoding as suggested by https://stackoverflow.com/a/20194266/226255
def self.to_csv(options = {})
(CSV.generate(options) do |csv|
csv << self.headers
all.each do |e|
csv << e.values
end
end).encode('WINDOWS-1252', :undef => :replace, :replace => '')
end
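One reason Windows-1252 needs less replacing than ISO-8859-1: the curly apostrophe from the original question (U+2019) has a Windows-1252 byte, 0x92, but no ISO-8859-1 equivalent. A small check using only String#encode:

```ruby
text = "didn\u2019t" # "didn’t" with a curly apostrophe

# Windows-1252 has a slot for U+2019, so the conversion succeeds.
cp1252 = text.encode('Windows-1252')

# ISO-8859-1 has no such character, so a strict conversion raises.
latin1_error =
  begin
    text.encode('ISO-8859-1')
    nil
  rescue Encoding::UndefinedConversionError => e
    e.class
  end

p cp1252.bytes
p latin1_error
```

Characters outside Windows-1252 (Chinese text, for example) still have no mapping, which is why the answer keeps the `:undef => :replace` option.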

module DownloadService
def student_list
File.open("#{file_name}", "w+:UTF-16LE:UTF-8") do |f|
file = CSV.generate(:col_sep => "\t") do |c|
c << ['Canción ', 'años', 'etc']
end
# Byte order mark, written before the CSV data
f.write "\xEF\xBB\xBF"
f.write(file)
end
end
end

Related

scrapy handle hebrew (non-english) language

I am using scrapy to scrape a Hebrew website. However, even after encoding the scraped data as UTF-8, I am not able to get the Hebrew characters.
I get a weird string (× ×¨×¡×™ ×‘×¢×ž) in the CSV. However, if I print the same item, I see the correct string in the terminal.
Following is the website I am using.
http://www.moch.gov.il/rasham_hakablanim/Pages/pinkas_hakablanim.aspx
class Spider(BaseSpider):
name = "moch"
allowed_domains = ["www.moch.gov.il"]
start_urls = ["http://www.moch.gov.il/rasham_hakablanim/Pages/pinkas_hakablanim.aspx"]
def parse(self, response):
data = {'ctl00$ctl13$g_dbcc924d_5066_4fee_bc5c_6671d3e2c06d$ctl00$cboAnaf': unicode(140),
'SearchFreeText:': u'חפש',
'ctl00$ctl13$g_dbcc924d_5066_4fee_bc5c_6671d3e2c06d$ctl00$txtShemKablan': u'',
'ctl00$ctl13$g_dbcc924d_5066_4fee_bc5c_6671d3e2c06d$ctl00$txtMisparYeshut': u'',
'ctl00$ctl13$g_dbcc924d_5066_4fee_bc5c_6671d3e2c06d$ctl00$txtShemYeshuv': u'הקלד יישוב',
'ctl00$ctl13$g_dbcc924d_5066_4fee_bc5c_6671d3e2c06d$ctl00$txtMisparKablan': u'',
'ctl00$ctl13$g_dbcc924d_5066_4fee_bc5c_6671d3e2c06d$ctl00$btnSearch': u'חפש',
'ctl00$ScriptManager1': u'ctl00$ctl13$g_dbcc924d_5066_4fee_bc5c_6671d3e2c06d$ctl00$UpdatePanel1|ctl00$ctl13$g_dbcc924d_5066_4fee_bc5c_6671d3e2c06d$ctl00$btnSearch'}
yield FormRequest.from_response(response,
formdata=data,
callback = self.fetch_details,
dont_click = True)
def fetch_details(self, response):
# print response.body
hxs = HtmlXPathSelector(response)
item = MochItem()
names = hxs.select("//table[@id='ctl00_ctl13_g_dbcc924d_5066_4fee_bc5c_6671d3e2c06d_ctl00_gridRashamDetails']//tr/td[2]/font/text()").extract()
phones = hxs.select("//table[@id='ctl00_ctl13_g_dbcc924d_5066_4fee_bc5c_6671d3e2c06d_ctl00_gridRashamDetails']//tr/td[6]/font/text()").extract()
index = 0
for name in names:
item['name'] = name.encode('utf-8')
item['phone'] = phones[index].encode('utf-8')
index += 1
print item # This is printed correctly on the terminal.
yield item # If I create a CSV output file, I cannot see the proper Hebrew string.
The weird thing is, if I open the same CSV in Notepad++, I can see the correct output. So as a workaround, I opened the CSV in Notepad++, changed the encoding to UTF-8, and saved it. Now when I open the CSV in Excel again, it shows the correct Hebrew string.
Is there any way to specify the CSV encoding from within scrapy?

Adding currency unit in Prawn PDF? (syntax error)

I'm working on a PDF invoice using Prawn PDF. I am trying to use number_to_currency whilst passing the unit.
def line_item_rows
[["Description", "Qty", "Unit Price", "Price GBP"]] +
@invoice.line_items.map do |item|
[item.name, item.quantity, price(item.unit_price), price(item.full_price)]
end
end
@view.number_to_currency(num, :unit => "£")
The above results in an error:
syntax error, unexpected $end, expecting ')'
@view.number_to_currency(num, :unit => "£")
^):
If I use the HTML entity instead, it simply outputs the raw HTML:
@view.number_to_currency(num, :unit => "&pound;")
Total &pound;2,266.00
Is there a particular way of adding a £ when using Prawn PDF? The above attempts work fine when using html/erb but not when using Prawn PDF.

Ruby probably isn't treating your source file as utf-8:
# encoding: US-ASCII <-- it's defaulting to this
puts "£"
So when it compiles:
$ ruby foo.rb
foo.rb:2: invalid multibyte char (US-ASCII)
foo.rb:2: invalid multibyte char (US-ASCII)
foo.rb:2: syntax error, unexpected $end, expecting ')'
puts("£")
^
Add an encoding hint at the top of your file:
# encoding: utf-8
puts("£")
And it should run:
$ ruby foo.rb
£

Rails - How to test that ActionMailer sent a specific attachment?

In my ActionMailer::TestCase test, I'm expecting:
@expected.to = BuyadsproMailer.group_to(campaign.agency.users)
@expected.subject = "You submitted #{offer_log.total} worth of offers for #{offer_log.campaign.name} "
@expected.from = "BuyAds Pro <feedback@buyads.com>"
@expected.body = read_fixture('deliver_to_agency')
@expected.content_type = "multipart/mixed;\r\n boundary=\"something\""
@expected.attachments["#{offer_log.aws_key}.pdf"] = {
:mime_type => 'application/pdf',
:content => fake_pdf.body
}
and stub my mailer to get fake_pdf instead of a real PDF normally fetched from S3 so that I'm sure the bodies of the PDFs match.
However, I get this long error telling me that one email was expected but got a slightly different email:
<...Mime-Version: 1.0\r\nContent-Type: multipart/mixed\r\nContent-Transfer-Encoding: 7bit...> expected but was
<...Mime-Version: 1.0\r\nContent-Type: multipart/mixed;\r\n boundary=\"--==_mimepart_50f06fa9c06e1_118dd3fd552035ae03352b\";\r\n charset=UTF-8\r\nContent-Transfer-Encoding: 7bit...>
I'm not matching the charset or part-boundary of the generated email.
How do I define or stub this aspect of my expected emails?

Here's an example that I copied from my rspec test of a specific attachment; I hope it helps (mail can be created by calling your mailer method or by peeking at the deliveries array after calling .deliver):
mail.attachments.should have(1).attachment
attachment = mail.attachments[0]
attachment.should be_a_kind_of(Mail::Part)
attachment.content_type.should be_start_with('application/ics;')
attachment.filename.should == 'event.ics'

I had something similar where I wanted to check an attached csv's content. I needed something like this because it looks like \r got inserted for newlines:
expect(mail.attachments.first.body.encoded.gsub(/\r/, '')).to(
eq(
<<~CSV
"Foo","Bar"
"1","2"
CSV
)
)

Getting Origami-pdf to work with Amazon S3 files

I've implemented a local script that inserts digital signatures into local PDF files using Origami, but I don't quite know the best approach for doing this within a Rails server with files stored on Amazon S3.
I am guessing I would need to download the file from S3 to my server (or capture it before uploading to Amazon, which is what I am doing with Paperclip), insert the signature, and send it back to S3 again.
Here is the PDF.read method in pdf.rb file of origami solution:
class << self
#
# Reads and parses a PDF file from disk.
#
def read(filename, options = {})
filename = File.expand_path(filename) if filename.is_a?(::String)
PDF::LinearParser.new(options).parse(filename)
end
How could I adapt this so that it handles an in-memory binary file?
Do you have any suggestions?
You can find more about origami here
And my code below
require 'openssl'
begin
require 'origami'
rescue LoadError
ORIGAMIDIR = 'C:\RailsInstaller\Ruby1.9.3\lib\ruby\gems\1.9.1\gems\origami-1.2.4\lib'
$: << ORIGAMIDIR
require 'origami'
end
include Origami
INPUTFILE = "Sample.pdf"
@inputfile = String.new(INPUTFILE)
OUTPUTFILE = @inputfile.insert(INPUTFILE.rindex("."),"_signed")
CERTFILE = "certificate.pem"
RSAKEYFILE = "private_key.pem"
passphrase = "your passphrase"
key4pem=File.read RSAKEYFILE
key = OpenSSL::PKey::RSA.new key4pem, passphrase
cert = OpenSSL::X509::Certificate.new(File.read CERTFILE)
pdf = PDF.read(INPUTFILE)
page = pdf.get_page(1)
# Add signature annotation (so it becomes visible in the PDF document)
sigannot = Annotation::Widget::Signature.new
sigannot.Rect = Rectangle[:llx => 89.0, :lly => 386.0, :urx => 190.0, :ury => 353.0]
page.add_annot(sigannot)
# Sign the PDF with the specified keys
pdf.sign(cert, key,
:method => 'adbe.pkcs7.sha1',
:annotation => sigannot,
:location => "Portugal",
:contact => "myemail@email.tt",
:reason => "Proof of Concept"
)
# Save the resulting file
pdf.save(OUTPUTFILE)

PDF.read and PDF.save methods both accept either a file path or a Ruby IO object.
One method to create a PDF instance from a string (which, I suppose, is what you mean when you say "in-memory") is to use a StringIO object.
For example, the following session in the Origami shell will create a PDF instance, save it to a StringIO object and reload it using its own output string.
>>> PDF.new.save(strio = StringIO.new)
...
>>> strio.string
"%PDF-1.0\r\n1 0 obj\r\n<<\r\n\t/Pages 2 0 R ..."
>>> strio.reopen(strio.string, 'r')
#<StringIO:0xffbea6cc>
>>> pdf = PDF.read(strio)
...
>>> pdf.class
Origami::PDF

After a deeper analysis of the Origami code, I noticed that PDF.read accepts a binary file object, so instead of passing the local file path, we can pass the file instance itself.

As @MrWater wrote, Origami::PDF.read accepts a stream (more precisely, the Origami::PDF::LinearParser does, look at the source here).
Here's my simple solution:
require 'open-uri'
# pdf_url = 'http://someurl.com/somepdf.pdf'
pdf = Origami::PDF.read(URI.parse(pdf_url))
References
open-uri URI
Origami::PDF::LinearParser

Ignore header line when parsing CSV file

How can the header line of a CSV file be ignored when parsing CSV in Ruby on Rails? Any ideas?

If you're using ruby 1.8.X and FasterCSV, it has a 'headers' option:
csv = FasterCSV.parse(your_csv_file, {:headers => true}) #or false if you do want to read them
If you're using ruby 1.9.X, the default library is basically FasterCSV, so you can just do the following:
csv = CSV.parse(your_csv_file, headers: true)
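A quick illustration of what `headers: true` returns, using an inline string in place of `your_csv_file` (the sample data is made up):

```ruby
require 'csv'

data = "name,age\nAnn,30\nBob,25\n"

# With headers: true the first line is consumed as the header row,
# and each remaining line becomes a CSV::Row keyed by header name.
table = CSV.parse(data, headers: true)

table.each do |row|
  puts "#{row['name']} is #{row['age']}"
end

names = table.map { |row| row['name'] }
```

The header line never shows up as a data row, which is the "ignore" behaviour the question asks for.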

csv = CSV.read("file")
csv.shift # <-- kick out the first line
csv # <-- the results that you want

I have found a solution to the above question. Here is how I did it in Ruby 1.9.X:
csv_contents = CSV.parse(File.read(file))
csv_contents.slice!(0)
csv=""
csv_contents.each do |content|
csv<<CSV.generate_line(content)
end

An easier way I have found is this:
file = CSV.open('./tmp/sample_file.csv', :headers => true)
# <#CSV io_type:File io_path:"./tmp/sample_file.csv" encoding:UTF-8 lineno:0 col_sep:"," row_sep:"\n" quote_char:"\"" headers:true>
file.each do |row|
puts row
end

Here is the simplest approach that worked for me. You can read a CSV file and ignore its first line, which holds the header or field names, by using headers: true:
CSV.foreach(File.join(File.dirname(__FILE__), filepath), headers: true) do |row|
puts row.inspect
end
You can do whatever you want with row. Don't forget headers: true.

To skip the header without the headers option (since that has the side-effect of returning CSV::Row rather than Array) while still processing a line at a time:
File.open(path, 'r') do |io|
io.readline
csv = CSV.new(io, headers: false)
while row = csv.shift do
# process row
end
end
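The same approach run against an in-memory IO, to show that the rows stay plain Arrays (a sketch with made-up sample data; it assumes the header fits on one physical line, since a quoted header field containing a newline would defeat readline):

```ruby
require 'csv'
require 'stringio'

io = StringIO.new("name,age\nAnn,30\nBob,25\n")
io.readline                       # discard the header line
csv = CSV.new(io, headers: false) # parsing starts at the current IO position

rows = []
while (row = csv.shift)
  rows << row
end

p rows
```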
