Prestashop - import CSV files of products in different languages: feature values not translated

I want to import CSV files of products in 2 different languages in Prestashop 1.6.
I have 2 CSV files, one for each language.
Everything is fine when I import the CSV file of the first language.
When I import the CSV file of the second language, the feature values are not understood by Prestashop as translations of the first language's feature values; they are added as new feature values instead.
They are added as new feature values because I use the Multiple Features module (http://addons.prestashop.com/en/search-filters-prestashop-modules/6356-multiple-features-assign-your-features-as-you-want.html).
Without this module, the second CSV import updates the feature value for both languages.
How can I make Prestashop understand that it is a translation, not a new value of the feature?
Thanks!

I found a solution by updating the database directly.
- I imported all my products for the main language using the CSV import in Prestashop.
- Feature values are stored in the ps_feature_value_lang table, which has 3 columns: id_feature_value | id_lang | value.
- In my case, French is ps_feature_value_lang.id_lang = 1 and English is ps_feature_value_lang.id_lang = 2.
- Before any change, the data in ps_feature_value_lang looks like this:
id_feature_value | id_lang | value
1 | 1 | my value in french
1 | 2 | my value in english
- I created a table (myTableOfFeatureValueIWantToImport) with 2 columns: feature_value_FR / feature_value_EN, and filled it with data.
- Because I don't know the IDs (id_feature_value) of my feature values (Prestashop created these IDs during the import of the first language's CSV file), I loop over the data of myTableOfFeatureValueIWantToImport and, each time ps_feature_value_lang.id_lang == 2 and ps_feature_value_lang.value == "value I want to translate", I update ps_feature_value_lang.value with the translated feature value.
$select = $connection->query("SELECT * FROM myTableOfFeatureValueIWantToImport GROUP BY feature_value_FR");
$select->setFetchMode(PDO::FETCH_OBJ);
while ($data = $select->fetch())
{
    $valFR = $data->feature_value_FR;
    $valEN = $data->feature_value_EN;
    $req = $connection->prepare('UPDATE ps_feature_value_lang
        SET ps_feature_value_lang.value = :valEN
        WHERE ps_feature_value_lang.id_lang = 2
        AND ps_feature_value_lang.value = :valFR
    ');
    $req->execute(array(
        'valEN' => $valEN,
        'valFR' => $valFR
    ));
}
done :D
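As a side note, the row-by-row loop above could be collapsed into a single UPDATE that joins the mapping table. A minimal sketch of that idea in Python (pymysql and the connection settings are my own placeholders, not part of the original setup; the table and column names are the ones described above):

import pymysql

# Placeholder credentials - adjust to your own database.
connection = pymysql.connect(host="localhost", user="root",
                             password="secret", database="prestashop")
with connection.cursor() as cursor:
    # One pass: rewrite every English (id_lang = 2) row whose value still holds
    # the French text with its translation from the mapping table.
    cursor.execute("""
        UPDATE ps_feature_value_lang fvl
        JOIN myTableOfFeatureValueIWantToImport t
          ON fvl.value = t.feature_value_FR
        SET fvl.value = t.feature_value_EN
        WHERE fvl.id_lang = 2
    """)
connection.commit()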

Related

How can I import data from Excel to Postgres - many-to-many relationship

I'm creating a web application and I encountered a problem with importing data into a table in a Postgres database.
I have an Excel sheet with id_b and id_cat (book IDs and category IDs). Books have several categories and categories can be assigned to many books. The Excel sheet looks like this:
(screenshot of the Excel data)
It has 30,000 records.
I don't know how to import it into the database (Postgres). The table for this data has two columns:
id_b and id_cat. I wanted to export this data to CSV so that each book is paired with one category identifier per row (e.g., the book with identifier 1 should appear 9 times because it has 9 categories assigned to it, and so on), but I can't do it easily. It should look like this:
(screenshot of the desired output)
Does anyone know any way to get data in this form?
Your Excel sheet format has a large number of columns, which also depends on the number of categories per book, and SQL isn't well adapted to that.
The simplest option would be to:
Export your excel data as CSV.
Use a python script to read it using the csv module and output COPY-friendly tab-delimited format.
Load this into the database (or INSERT directly from the Python script) - see the loading sketch after the code below.
Something like this...
import csv

with open('bookcat.csv') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        if row:
            id = row[0].strip()
            categories = row[1:]
            for cat in categories:
                cat = cat.strip()
                if cat:
                    print("%s\t%s" % (id, cat))
csv output version:
import csv

with open('bookcat.csv') as csvfile, open("out.csv", "w") as outfile:
    reader = csv.reader(csvfile)
    writer = csv.writer(outfile)
    for row in reader:
        if row:
            id = row[0].strip()
            categories = row[1:]
            for cat in categories:
                cat = cat.strip()
                if cat:
                    writer.writerow((id, cat))
If you need a specific CSV format, check the docs of the csv module.
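For the loading step mentioned above, a minimal sketch using psycopg2 and COPY (the connection settings and the book_category table name are assumptions; the target table must already exist with columns id_b and id_cat):

import psycopg2

# Placeholder connection settings - adjust to your environment.
conn = psycopg2.connect(dbname="mydb", user="me", password="secret", host="localhost")
with conn, conn.cursor() as cur, open("out.csv") as f:
    # Stream the two-column CSV produced above straight into the table.
    cur.copy_expert("COPY book_category (id_b, id_cat) FROM STDIN WITH (FORMAT csv)", f)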

How to enrich places with GeoNames IDs

I have a list of places which I would like to enrich with the IDs from GeoNames.
Since GeoNames is embedded into Wikidata by default, I chose to go directly via SPARQL using the Wikidata endpoint.
My workflow:
I imported the Excel file into OpenRefine and created a new project.
In OpenRefine I created my graph, then downloaded it as RDF/XML. Here is a snapshot:
<rdf:Description rdf:about="http://localhost:3333/0">
<rdfs:label>Aïre</rdfs:label>
<crm:P1_is_identified_by>5A1CE163-105F-4BAF 8BF9</crm:P1_is_identified_by>
</rdf:Description>
I then imported the RDF file into my local GraphDB and ran the federated query:
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT *
WHERE {
  ?place <http://purl.org/NET/cidoc-crm/core#P1_is_identified_by> ?value ;
         rdfs:label ?label_geo .
  SERVICE <https://query.wikidata.org/sparql> {
    ?value wdt:P31/wdt:P279* wd:Q515 ;
           rdfs:label ?label ;
           wdt:P1566 ?id_value .
  }
}
LIMIT 10
No results.
The output should be something like this:
|-----------------------|------------------|---------------|
| Object | Place | GeonamesID |
|-----------------------|------------------|---------------|
|5A1CE163-105F-4BAF 8BF9| Aïre |11048419 |
|-----------------------|------------------|---------------|
Suggestions?
Thanks a lot.
I solved the problem by going through the GeoNames API directly instead.
Here is my pipeline:
I created an Excel sheet with a list of place names.
I built a Python script that uses the values from the Excel sheet as query parameters and saves the output to a .txt file, e.g. Aïre,https://www.geonames.org/11048419
import pandas as pd
import requests
import json
import csv

url = 'http://api.geonames.org/searchJSON?'

# Change df parameters according to excel sheet specification.
df = pd.read_excel('grp.xlsx', sheet_name='Foglio14', usecols="A")

for item in df.place_name:
    df.place_name.head()
    # Change username params with geonames API username
    params = {'username': "XXXXXXXX",
              'name_equals': item,
              'maxRows': "1"}
    e = requests.get(url, params=params)
    pretty_json = json.loads(e.text)
    with open("data14.txt", "a") as myfile:
        writer = csv.writer(myfile)
        for item in pretty_json["geonames"]:
            # print("{}, https://www.geonames.org/{}".format(item["name"], item["geonameId"]))
            writer.writerow([item["name"], "https://www.geonames.org/{}".format(item["geonameId"])])  # Write row.
        myfile.close()
I copied the output from the .txt file into column B of the Excel sheet.
I then split the output values into two columns, e.g.
|---------------------|-----------------------------------|
| ColA | ColB |
|---------------------|-----------------------------------|
| Aïre | https://www.geonames.org/11048419 |
|---------------------|-----------------------------------|
Since there is not a 1:1 correspondence between the place names and the obtained results, I aligned the values:
In the Excel sheet I created a new empty column B.
In column B I wrote the formula =IF(ISNA(MATCH(A1;C:C;0));"";INDEX(C:C;MATCH(A1;C:C;0))) and filled it down to the end of the list.
Then I created a new empty column C.
In column C I wrote the formula =IFERROR(INDEX($E:$E;MATCH($B1;$D:$D;0));"") and filled it down to the end of the list.
Here is the final result (screenshot omitted).
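As an aside, the same alignment could be scripted with pandas instead of spreadsheet formulas. A minimal sketch, assuming hypothetical file names places.xlsx (original place names in a place_name column) and results.csv (the name/URL pairs written by the script above):

import pandas as pd

# Hypothetical file and column names.
places = pd.read_excel("places.xlsx")
results = pd.read_csv("results.csv", names=["name", "geonames_url"])

# A left join keeps every original place and leaves the URL empty when the
# GeoNames search returned no (or a differently spelled) match.
aligned = places.merge(results, left_on="place_name", right_on="name", how="left")
aligned.to_excel("places_with_geonames.xlsx", index=False)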

Reformat wide Excel table to more SQL-friendly structure

I have a very wide Excel sheet, from Column A - DIE (about 2500 columns wide), of survey data. Each column is a question, and each row is a response. I'm trying to upload the data to SQL and convert it to a more SQL-friendly format using the UNPIVOT function, but I can't even get it loaded into SQL because it exceeds the 1024-column limit.
Basically, I have an Excel sheet that looks like this:
But I want to convert it to look like this:
What options do I have to make this change, either in Excel (prior to upload) or SQL (while circumventing the 1024 column limit)?
I have had to do this quite a bit. My solution was to write a Python script that would un-crosstab a CSV file (typically exported from Excel), creating another CSV file. The Python code is here: https://pypi.python.org/pypi/un-xtab/ and the documentation is here: http://pythonhosted.org/un-xtab/. I've never run it on a file with 2500 columns, but I don't know why it wouldn't work.
R has a very specific function for this in one of its libraries. You can also connect to, read from, and write to a database with R. I would suggest downloading R and RStudio.
Here is a working script to get you started that does what you need:
Sample data:
df <- data.frame(id = c(1,2,3), question_1 = c(1,0,1), question_2 = c(2,0,2))
df
Input table:
id question_1 question_2
1 1 1 2
2 2 0 0
3 3 1 2
Code to transpose the data:
df2 <- gather(df, key = question, value = values, -id)
df2
Output:
id question values
1 1 question_1 1
2 2 question_1 0
3 3 question_1 1
4 1 question_2 2
5 2 question_2 0
6 3 question_2 2
Some helper functions for you to import and export the csv data:
# Install and load the necessary libraries
install.packages(c('tidyr','readr'))
library(tidyr)
library(readr)
# to read a csv file
df <- read_csv('[some directory][some filename].csv')
# To output the csv file
write.csv(df2, '[some directory]data.csv', row.names = FALSE)
Thanks for all the help. I ended up using Python due to limitations in both SQL (over 1024 columns wide) and Excel (well over 1 million rows in the output). I borrowed the concepts from rd_nielson's code, but that was a bit more complicated than I needed. In case it's helpful to anyone else, this is the code I used. It outputs a csv file with 3 columns and 14 million rows that I can upload to SQL.
import csv

with open('Responses.csv') as f:
    reader = csv.reader(f)
    headers = next(reader)  # capture current field headers
    newHeaders = ['ResponseID', 'Question', 'Response']  # establish new header names

    with open('PythonOut.csv', 'w') as outputfile:
        writer = csv.writer(outputfile, dialect='excel', lineterminator='\n')
        writer.writerow(newHeaders)  # write new headers to output
        QuestionHeaders = headers[1:len(headers)]  # Slice the question headers from original header list
        for row in reader:
            questionCount = 0  # start counter to loop through each question (column) for every response (row)
            while questionCount <= len(QuestionHeaders) - 1:
                newRow = [row[0], QuestionHeaders[questionCount], row[questionCount + 1]]
                writer.writerow(newRow)
                questionCount += 1
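For what it's worth, the same unpivot can also be done in a few lines with pandas (my own suggestion, not part of the original workflow), provided the file fits in memory:

import pandas as pd

# Assumes the first column of Responses.csv is the response ID and every
# remaining column is a question.
df = pd.read_csv("Responses.csv")
long_df = df.melt(id_vars=df.columns[0], var_name="Question", value_name="Response")
long_df.to_csv("PythonOut.csv", index=False)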

Reading sparse columns from a CSV

I get a CSV that I need to read into a SQL table. Right now it's manually uploaded with a web application, but I want to move this into SQL server. Rather than port my import script straight across into a script in SSIS, I wanted to check and see if there was a better way to do it.
The issue with this particular CSV is that the first few columns are known, and have appropriate headers. However, after that group, the rest of the columns are sparsely populated and might not even have headers.
Example:
Col1,Col2,Col3,,,,,,
value1,value2,value3,,value4
value1,value2,value3,value4,value5
value1,value2,value3,,value4,value5
value1,value2,value3,,,value4
What makes this tolerable is that everything after Col3 can get concatenated together. The script checks each row for these trailing columns and puts them together into a "misc" column. It has to do this in a bit of a blind method because there is no way of knowing ahead of time how many of these columns will be out there.
Is there a way to do this with SSIS tools, or should I just port my existing import script to an SSIS script task?
Another option outside of SSIS is using BULK INSERT with format files.
Format files allow you to describe the format of the incoming data.
For example..
9.0
4
1 SQLCHAR 0 100 "," 1 Header1 SQL_Latin1_General_CP1_CI_AS
2 SQLCHAR 0 100 "," 2 Header2 SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 0 100 "," 3 Header3 SQL_Latin1_General_CP1_CI_AS
4 SQLCHAR 0 100 "\r\n" 4 Misc SQL_Latin1_General_CP1_CI_AS
Bulk Insert>> http://msdn.microsoft.com/en-us/library/ms188365.aspx
Format Files >> http://msdn.microsoft.com/en-us/library/ms178129.aspx
Step 0. My test file with an additional line
Col1,Col2,Col3,,,,,,
value1,value2,value3,,value4
value1,value2,value3,value4,value5
value1,value2,value3,,value4,value5
value1,value2,value3,,,value4
ends,with,comma,,,value4,
Drag a DFT on the Control flow surface
Inside the DFT, on the data flow surface, drag a Flat file source
Let it map by itself to start with. Check "Column names in the first data row".
You will see Col1, Col2, Col3 which are your known fields.
You will also see Column 3 through Column 8. These are the columns
that need to be lumped into one Misc column.
Go to the Advanced section of the Flat File Connection Manager Editor.
Rename Column 3 to Misc. Set field size to 4000.
Note: For longer than that, you would need to use Text data type.
That will pose some challenge, so be ready for fun ;-)
Delete Columns 4 through 8.
Now add a script component.
Input Columns - select only Misc field. Usage Type: ReadWrite
Code:
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    string sMisc = Row.Misc;
    string sManipulated = string.Empty;
    string temp = string.Empty;

    string[] values = sMisc.Split(',');
    foreach (string value in values)
    {
        temp = value;
        if (temp.Trim().Equals(string.Empty))
        {
            temp = "NA";
        }
        sManipulated = string.Format("{0},{1}", sManipulated, temp);
    }

    Row.Misc = sManipulated.Substring(1);
}
-- Destination.
Nothing different from usual.
Hope I have understood your problem and the solution works for you.
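If you would rather pre-process the file before it reaches SSIS or BULK INSERT at all, the same concatenation can be done with a short script. A minimal sketch in Python (the file names are placeholders; it mirrors the NA substitution in the script component above):

import csv

# Placeholder file names; assumes the first three columns are the known
# Col1-Col3 and everything after them belongs in a single Misc column.
with open("input.csv", newline="") as src, open("clean.csv", "w", newline="") as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    next(reader)  # skip the original ragged header row
    writer.writerow(["Col1", "Col2", "Col3", "Misc"])
    for row in reader:
        known, extra = row[:3], row[3:]
        # Empty trailing cells become NA, matching the SSIS script component.
        misc = ",".join(cell.strip() or "NA" for cell in extra)
        writer.writerow(known + [misc])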

SQL query engine for text files on Linux?

We use grep, cut, sort, uniq, and join at the command line all the time to do data analysis. They work great, although there are shortcomings. For example, you have to give column numbers to each tool. We often have wide files (many columns) and a column header that gives column names. In fact, our files look a lot like SQL tables. I'm sure there is a driver (ODBC?) that will operate on delimited text files, and some query engine that will use that driver, so we could just use SQL queries on our text files. Since doing analysis is usually ad hoc, it would have to be minimal setup to query new files (just use the files I specify in this directory) rather than declaring particular tables in some config.
Practically speaking, what's the easiest? That is, the SQL engine and driver that is easiest to set up and use to apply against text files?
David Malcolm wrote a little tool named "squeal" (formerly "show"), which allows you to use SQL-like command-line syntax to parse text files of various formats, including CSV.
An example on squeal's home page:
$ squeal "count(*)", source from /var/log/messages* group by source order by "count(*)" desc
count(*)|source |
--------+--------------------+
1633 |kernel |
1324 |NetworkManager |
98 |ntpd |
70 |avahi-daemon |
63 |dhclient |
48 |setroubleshoot |
39 |dnsmasq |
29 |nm-system-settings |
27 |bluetoothd |
14 |/usr/sbin/gpm |
13 |acpid |
10 |init |
9 |pcscd |
9 |pulseaudio |
6 |gnome-keyring-ask |
6 |gnome-keyring-daemon|
6 |gnome-session |
6 |rsyslogd |
5 |rpc.statd |
4 |vpnc |
3 |gdm-session-worker |
2 |auditd |
2 |console-kit-daemon |
2 |libvirtd |
2 |rpcbind |
1 |nm-dispatcher.action|
1 |restorecond |
q - Run SQL directly on CSV or TSV files:
https://github.com/harelba/q
Riffing off someone else's suggestion, here is a Python script for sqlite3. A little verbose, but it works.
I don't like having to completely copy the file to drop the header line, but I don't know how else to convince sqlite3's .import to skip it. I could create INSERT statements, but that seems just as bad if not worse.
Sample invocation:
$ sql.py --file foo --sql "select count(*) from data"
The code:
#!/usr/bin/env python
"""Run a SQL statement on a text file"""

import os
import sys
import getopt
import tempfile
import re

class Usage(Exception):
    def __init__(self, msg):
        self.msg = msg

def runCmd(cmd):
    if os.system(cmd):
        print "Error running " + cmd
        sys.exit(1)
        # TODO(dan): Return actual exit code

def usage():
    print >>sys.stderr, "Usage: sql.py --file file --sql sql"

def main(argv=None):
    if argv is None:
        argv = sys.argv
    try:
        try:
            opts, args = getopt.getopt(argv[1:], "h",
                                       ["help", "file=", "sql="])
        except getopt.error, msg:
            raise Usage(msg)
    except Usage, err:
        print >>sys.stderr, err.msg
        print >>sys.stderr, "for help use --help"
        return 2

    filename = None
    sql = None
    for o, a in opts:
        if o in ("-h", "--help"):
            usage()
            return 0
        elif o in ("--file"):
            filename = a
        elif o in ("--sql"):
            sql = a
        else:
            print "Found unexpected option " + o

    if not filename:
        print >>sys.stderr, "Must give --file"
        sys.exit(1)
    if not sql:
        print >>sys.stderr, "Must give --sql"
        sys.exit(1)

    # Get the first line of the file to make a CREATE statement
    #
    # Copy the rest of the lines into a new file (datafile) so that
    # sqlite3 can import data without header. If sqlite3 could skip
    # the first line with .import, this copy would be unnecessary.
    foo = open(filename)
    datafile = tempfile.NamedTemporaryFile()
    first = True
    for line in foo.readlines():
        if first:
            headers = line.rstrip().split()
            first = False
        else:
            print >>datafile, line,
    datafile.flush()
    #print datafile.name
    #runCmd("cat %s" % datafile.name)

    # Create columns with NUMERIC affinity so that if they are numbers,
    # SQL queries will treat them as such.
    create_statement = "CREATE TABLE data (" + ",".join(
        map(lambda x: "`%s` NUMERIC" % x, headers)) + ");"

    cmdfile = tempfile.NamedTemporaryFile()
    #print cmdfile.name
    print >>cmdfile, create_statement
    print >>cmdfile, ".separator ' '"
    print >>cmdfile, ".import '" + datafile.name + "' data"
    print >>cmdfile, sql + ";"
    cmdfile.flush()
    #runCmd("cat %s" % cmdfile.name)
    runCmd("cat %s | sqlite3" % cmdfile.name)

if __name__ == "__main__":
    sys.exit(main())
Maybe write a script that creates an SQLite instance (possibly in memory), imports your data from a file/stdin (accepting your data's format), runs a query, then exits?
Depending on the amount of data, performance could be acceptable.
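A minimal sketch of that idea in Python, assuming a comma-delimited file named data.csv whose first row holds the column names:

import csv
import sqlite3

# Read the whole file; the first row supplies the column names.
with open("data.csv", newline="") as f:
    rows = list(csv.reader(f))
header, data = rows[0], rows[1:]

# Build an in-memory table, load the rows, and run an ad hoc query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (%s)" % ", ".join('"%s"' % col for col in header))
conn.executemany("INSERT INTO data VALUES (%s)" % ", ".join("?" * len(header)), data)
for row in conn.execute("SELECT count(*) FROM data"):
    print(row)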
MySQL has a CSV storage engine that might do what you need, if your files are CSV files.
Otherwise, you can use mysqlimport to import text files into MySQL. You could create a wrapper around mysqlimport, which figures out columns etc. and creates the necessary table.
You might also be able to use DBD::AnyData, a Perl module which lets you access text files like a database.
That said, it sounds a lot like you should really look at using a database. Is it really easier keeping table-oriented data in text files?
I have used Microsoft LogParser to query CSV files several times, and it serves the purpose. It was surprising to see such a useful tool from Microsoft, and free at that!