I am making an application that imports a CSV file named user.csv, but it raises this error:
ArgumentError in CsvimportController#import
wrong number of arguments (1 for 0)
The code of the CsvimportController is:
require 'csv'

class CsvimportController < ApplicationController
  def import
    results = import('anas.csv') do
      read_attributes_from_file
    end
  end
end
I have also specified csv-mapper and fastercsv in my Gemfile.
Can anyone help? Any help would be appreciated. Thanks.
Take a look at Railscast 396 on how to import data from CSV and Excel files.
The smarter_csv project aims to provide better handling of CSV files, so it would be worth a look.
It's easy if you use the smarter_csv gem.
All you need to do is this:
require 'smarter_csv'

def import(filename)
  results = SmarterCSV.process(filename, options_hash)
end
You then specify the options in options_hash according to the smarter_csv documentation (e.g. chunk_size, key_mapping).
There are tons of useful options, including manipulation of the headers, custom-headers, ignoring columns, and type-conversion of values.
If your CSV file is large, you can also chunk the incoming data for parallel processing.
UPDATE BELOW.....
We have automated CSV data dumping into our backend, and it looks like there are some malformed items buried in the data: a job family title errantly has a \n between two words, which is wrecking our data. That's the problem.
I want to read the CSV in as wholetext, regexp_replace the title with the correction, then load the fixed wholetext into a new dataframe as if I had loaded a correct CSV to start with. Here's the madness of where I'm at right now: Lol.
# Import the functions I need (regexp_replace and col are both used below)
from pyspark.sql.functions import col, regexp_replace
# Looks like there is a job family title with an issue. There's a carriage return / line feed between two words messing up the csv
# This needs to be patched before we actually pull the data into the dataframes to begin work
data_requisitions_patch0 = spark.read.text('abfss://container#somethingcool.dfs.core.windows.net/Data/brokencsv.csv', wholetext=True)
data_requisitions_patch0.collect()
data_requisitions_new = data_requisitions_patch0
# print(data_requisitions_patch0)
# data_requisitions_patch0.printSchema()
# data_requisitions_patch0.show()
data_requisitions_patch1 = data_requisitions_patch0 \
.withColumn("value", regexp_replace(col('value'), 'Job - Starting\n', 'Job - Starting'))
data_requisitions_patch1.collect()
print('patch0')
data_requisitions_new.count()
print('patch1')
data_requisitions_patch1.count()
# print('Patch0 dataframe: ' + data_requisitions_patch0.count())
# print('Patch0 dataframe: ' + data_requisitions_patch1.count())
# data_requisitions_test0 = spark(data_requisitions_patch1, header=True)
# data_requisitions_test1 = spark.read.csv('abfss://container#somethingcool.dfs.core.windows.net/Data/brokencsv.csv', header=True)
# data_requisitions_test0.count()
# data_requisitions_test0.printSchema()
# data_requisitions_test1.count()
# data_requisitions_test1.printSchema()
It's obviously a mess right now. I'm trying to troubleshoot whether the regexp_replace is working, but not having much luck. Then it occurred to me that I have a single-row, single-column dataframe. Now I'm trying to figure out how to take the dataframe post-'patch' and turn it back into a normal CSV dataframe, as if everything had been OK to begin with.
I left in all my testing nonsense; the thought was that you might see where my head is. Unsure if that was helpful or not. Links have been faked, obviously.
First off: am I going in the right direction? No part of this is really working. I can't even get the counts to work: test1.count() does return, but test0.count() doesn't. I don't even really care about the counts; that's just me trying to figure out why it's not working.
Secondly: malformed CSV -> wholetext dataframe -> regexp fix the problem -> fixed dataframe with correct headers and rows, like normal.
How off am I?
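One hedged way to get from the patched wholetext back to a normal CSV dataframe (a sketch reusing the question's variable names and faked abfss paths; it collects to the driver, so it only suits modest file sizes) is to write the corrected text back out and let spark.read.csv re-parse it:
# Pull the single patched string back to the driver.
# (Fine for a modest file; this will not scale to huge CSVs.)
patched_text = data_requisitions_patch1.first()['value']
# Write the corrected text to a scratch path (must not already exist),
# then re-read it as CSV so headers and columns come back as normal.
scratch = 'abfss://container#somethingcool.dfs.core.windows.net/Data/fixedcsv'
spark.sparkContext.parallelize([patched_text], 1).saveAsTextFile(scratch)
fixed_df = spark.read.csv(scratch, header=True)
fixed_df.printSchema()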
=======
UPDATE
Made some great progress: I ended up splitting the wholetext dataframe on \n line feeds and exploding that into rows. That works great; the dataframe now has exactly as many rows as it's supposed to have. Now I'm working on figuring out how to re-map the columns so they get created correctly.
My thought is to take the header row and try to use it as a map? I don't know, still researching.
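A minimal sketch of that header-as-a-map idea (reusing data_requisitions_patch0 and the 'Job - Starting' pattern from the question; naive comma split, so it assumes no quoted commas in the data):
from pyspark.sql.functions import col, explode, regexp_replace, split

# Patch the stray line feed, then explode the wholetext into one row per line.
patched = data_requisitions_patch0.withColumn(
    'value', regexp_replace(col('value'), 'Job - Starting\n', 'Job - Starting'))
lines = patched.select(explode(split(col('value'), '\n')).alias('line'))

# Use the first line as the header and name the columns from it.
# (Naive comma split: this breaks if any field contains a quoted comma.)
header_line = lines.first()['line']
header = header_line.split(',')
rows = lines.filter((col('line') != header_line) & (col('line') != ''))
fixed = rows.select(*[split(col('line'), ',').getItem(i).alias(name)
                      for i, name in enumerate(header)])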
I wasn't approaching this right. I was handling it like a typical C# project: pull data from the DB and process it. But Spark doesn't really deal well with that. I ended up putting the processed data into the dataframe itself and ran my if-checks from the contained columns. It works fantastically, and it's a lot faster than extracting the data to do the checks.
For example, if my text data comes from a database, how can I get one line/doc (as a database record) using the same mechanism (subclassing Dataset so that the pipeline described here still works) as TextLineDataset?
Looking at the source code of TextLineDataset, I find that make_dataset_resource() seems to be an important method to implement. But I can't find the actual code that yields a line from a file, as the docstring of TextLineDataset describes: "A Dataset comprising lines from one or more text files."
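For what it's worth, in current TensorFlow you can usually avoid subclassing altogether: tf.data.Dataset.from_generator will stream records from a database cursor one at a time, much like TextLineDataset streams lines. A minimal sketch, assuming a hypothetical sqlite3 database with a made-up documents(text) table:
import sqlite3
import tensorflow as tf

def db_lines():
    # Yield one document (a string) per database record.
    conn = sqlite3.connect('corpus.db')  # hypothetical database file
    for (doc,) in conn.execute('SELECT text FROM documents'):
        yield doc
    conn.close()

dataset = tf.data.Dataset.from_generator(
    db_lines,
    output_signature=tf.TensorSpec(shape=(), dtype=tf.string))

# Behaves like TextLineDataset: one scalar string tensor per record,
# so the usual map/shuffle/batch pipeline still applies.
dataset = dataset.map(lambda line: tf.strings.lower(line)).batch(32)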
I need to store the data presented in the graphs on the Google Ngram website. For example, I want to store the occurrences of "it's" as a percentage from 1800-2008, as presented in the following link: https://books.google.com/ngrams/graph?content=it%27s&year_start=1800&year_end=2008&corpus=0&smoothing=3&share=&direct_url=t1%3B%2Cit%27s%3B%2Cc0.
The data I want is the data you're able to scroll over on the graph. How can I extract this for about 140 different terms (e.g. "it's", "they're", "she's", etc.)?
econpy wrote a nice little module in Python that you can use through a command-line interface.
For your "it's" example, you would need to type this command in a terminal / windows console:
python getngrams.py it's -startYear=1800 -endYear=2008 -corpus=eng_2009 -smoothing=3
This will automatically save the query result in a CSV file named after your query parameters.
econpy's package, in @HugoMailhot's answer, no longer works (2021) and seems unmaintained.
Here's an updated version, with some improvements for easier integration into Python code:
https://gitlab.com/cpbl/google-ngrams
You can call this from the command line (as in econpy's) to create a CSV file, e.g.
getngrams.py it's -startYear=1800 -endYear=2008 -corpus=eng_2009 -smoothing=3
or call it from Python to get (and plot) the data directly, e.g.:
from getngrams import ngrams
df = ngrams('bells and whistles -startYear=1900 -endYear=2018 -smoothing=2')
df.plot()
The xkcd functionality is still there too.
(Issues / bug-fix pull requests / etc. are welcome there.)
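To cover the ~140 terms from the question, one hedged approach (assuming, as the df.plot() example above suggests, that ngrams returns a year-indexed pandas DataFrame) is to loop over the terms and concatenate:
import pandas as pd
from getngrams import ngrams

terms = ["it's", "they're", "she's"]  # ...extend to all ~140 terms
frames = [ngrams(f"{t} -startYear=1800 -endYear=2008 -smoothing=3")
          for t in terms]
combined = pd.concat(frames, axis=1)  # one column per term
combined.to_csv('contractions_ngrams.csv')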
I have a CSV file which contains these columns: Timestamp, Author, Title and Content.
Now I would like to import this CSV into TYPO3, so that I can display a list of posts containing these attributes.
If the above is not possible, is there a way to write manual SQL queries, so that I can insert content into TYPO3 directly?
I have tried many extensions for importing CSV (wil_import, rs_import, external_import), but none of them work.
I have installed wil_import, but it does not show anything.
Do I need to make any changes anywhere else, like configuration or something?
You could use phpMyAdmin's CSV import functionality. It works reliably.
I had the same problem once and my day was saved thanks to Francois Suter's (Core team member) extensions: svconnector and svconnector_csv. I can really recommend them.
I would like the ActiveAdmin gem to allow downloading a collection only as a CSV file. By default it is set to offer XML, CSV and JSON downloads; I would like to allow CSV only.
You can pass an array with the formats that you want to allow downloading. For instance, to enable only CSV you can do the following:
index download_links: [:csv] do
  column :author
  column :title
end
I found the link at the bottom of that post. I don't see a date on it, so it might not work anymore, or there might be a better way than monkey-patching ActiveAdmin...
https://coderwall.com/p/qzlssg?i=1&p=1&q=author%3Adaviec85&t%5B%5D=daviec85