I have been tasked with uploading some data into Solr, where it will then be used for analysis.
I understand that Solr can index data in the xlsx file format.
In Exercise 2 for Solr, the following files were indexed in the order JSON, XML, and CSV:
bin/post -c films example/films/films.json
bin/post -c films example/films/films.xml
bin/post -c films example/films/films.csv -params "f.genre.split=true&f.directed_by.split=true&f.genre.separator=|&f.directed_by.separator=|"
The issue I have is that although I indexed my xlsx file, the query only shows one record, which means the file may have been indexed wrongly, i.e. it may require parameters such as those needed for a CSV file. Can anyone tell me how this indexing can be done without having to convert the xlsx file into a CSV file?
You can use Apache Tika to index these formats in Solr. It will parse the data and build the index.
Reference link:
https://lucidworks.com/2009/09/02/content-extraction-with-tika/
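For what it's worth, here is a minimal sketch of posting an xlsx file to Solr's ExtractingRequestHandler (Solr Cell), which wraps Tika. It assumes a local Solr instance, Python's requests library, and placeholder core, file, and document-id names; adjust them to your setup.

import requests

# A minimal sketch: post an xlsx file to Solr's /update/extract endpoint,
# which hands the file to Tika for parsing. The core name "films", the file
# name "films.xlsx", and the literal.id value are placeholders.
solr_extract_url = "http://localhost:8983/solr/films/update/extract"

with open("films.xlsx", "rb") as f:
    response = requests.post(
        solr_extract_url,
        params={
            "literal.id": "films-xlsx-1",  # a unique id for the resulting document
            "commit": "true",              # commit so the document is searchable right away
        },
        files={"file": ("films.xlsx", f)},
    )
response.raise_for_status()
print(response.text)

Keep in mind that Tika-based extraction typically turns each uploaded file into a single Solr document, so one record per spreadsheet is expected with this approach; row-level records generally require a row-oriented format such as CSV.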
I need to query the database and write the results to a CSV file.
I remember I did this with SQL Server; is it possible to do this with CrateDB?
CrateDB's crash shell supports several output formats, including CSV.
Example:
crash --format csv -c 'select * from sys.cluster' > sys_cluster.csv
See https://crate.io/docs/reference/crash for details.
Of course, yes. Where are you getting stuck?
You can find official examples in the crate/crate-sample-apps GitHub repository, where CrateDB is used. You can use it as a baseline for understanding CrateDB.
You can also easily convert the values from CrateDB to CSV using language-specific libraries; in Python, for example, you can use the csv module, as sketched below.
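This is a minimal sketch using the CrateDB Python client together with the standard csv module; the connection URL, query, and output file name are assumptions to adjust to your setup.

import csv

from crate import client  # CrateDB Python client (pip install crate)

# A minimal sketch: run a query against a local CrateDB node and write the
# result, with a header row, to out.csv.
connection = client.connect("http://localhost:4200")
cursor = connection.cursor()
cursor.execute("SELECT * FROM sys.cluster")

with open("out.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow([column[0] for column in cursor.description])  # column names as the header
    writer.writerows(cursor.fetchall())

cursor.close()
connection.close()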
select * from gpqueries:contracts.raw where fiscal_year =2015
ignore case
When generating data as JSON or CSV from Google BigQuery, the data I download from the storage bucket looks like the screenshot below.
Please guide me on why this happens. Also, how can I combine multiple files if more than one file is generated?
Update:
From the screenshot I can see that the first three bytes are 1F 8B 08 - that's the signature of a gzip-compressed file. Just uncompress it with gunzip.
http://www.filesignatures.net/index.php?page=search&search=1F8B08&mode=SIG
My guess: Did you pick "compression: gzip" when exporting?
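If the export produced several gzip-compressed shards, a small script can decompress and combine them. This is a minimal sketch assuming the downloaded files are named export-*.csv.gz and each carries its own header row; adjust the glob pattern and file names to whatever your export actually produced.

import csv
import glob
import gzip

# A minimal sketch: decompress every exported shard and combine them into a
# single CSV, keeping the header row from the first file only.
combined_rows = []
header = None
for path in sorted(glob.glob("export-*.csv.gz")):
    with gzip.open(path, "rt", newline="") as f:
        reader = csv.reader(f)
        file_header = next(reader)
        if header is None:
            header = file_header
        combined_rows.extend(reader)

if header is not None:
    with open("combined.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(combined_rows)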
I am new to using SQL, so please bear with me.
I need to import several hundred CSV files into PostgreSQL. My web searches have only turned up how to import many CSV files into one table. However, most of the CSV files have different column types (all have one-line headers). Is it possible to somehow run a loop and have each CSV imported into a table with the same name as the CSV file? Creating each table manually and specifying the columns is not an option. I know that COPY will not work, as the table needs to already be defined.
Perhaps this is not feasible in PostgreSQL? I would like to accomplish this in pgAdmin III or the psql console, but I am open to other ideas (using something like R to change the CSVs into a format more easily entered into PostgreSQL?).
I am using PostgreSQL on a Windows 7 computer. It was requested that I use PostgreSQL, thus the focus of the question.
The desired result is a database full of tables that I will then join with a spreadsheet that includes specific site data. Thanks!
Use pgfutter.
The general syntax looks like this:
pgfutter csv <file.csv>
In order to run this on all csv files in a directory from Windows Command Prompt, navigate to the desired directory and enter:
for %f in (*.csv) do pgfutter csv %f
Note that the directory containing the downloaded program must be added to your PATH environment variable.
EDIT:
Here is the command-line code for Linux users.
Run it as:
pgfutter *.csv
Or, if that doesn't work:
find -iname '*.csv' -exec pgfutter csv {} \;
In the terminal, use nano to create a script that loops over the CSV files in a directory and loads them into the Postgres database:
nano run_pgfutter.sh
The content of run_pgfutter.sh:
#!/bin/bash
for i in /mypath/*.csv
do
  ./pgfutter csv "${i}"
done
Then make the file executable:
chmod u+x run_pgfutter.sh
Any ideas on how I can extract data from a SQL database, put it into a CSV file in a specific format, and push it to an external URL?
Most SQL databases have some sort of export utility that can produce a CSV file. Google "export CSV" along with the name of your database product and you should find it. It is not part of the SQL standard, so every product does it differently.
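If you would rather script it than rely on a product-specific export tool, here is a minimal sketch in Python. It uses the standard library's sqlite3 driver purely as an example (swap in the driver for your database, e.g. psycopg2 for PostgreSQL); the database file, table, columns, and upload URL are all hypothetical.

import csv
import sqlite3

import requests

# A minimal sketch: query the database, write the rows to a CSV file in the
# required column order, then push the file to an external URL.
connection = sqlite3.connect("mydata.db")                  # placeholder database
cursor = connection.cursor()
cursor.execute("SELECT id, customer, total FROM orders")   # hypothetical table

with open("orders.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "customer", "total"])  # header row in the required format
    writer.writerows(cursor.fetchall())

connection.close()

with open("orders.csv", "rb") as f:
    response = requests.post("https://example.com/upload", files={"file": f})  # hypothetical endpoint
response.raise_for_status()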
I need to copy my Solr index into PostgreSQL for another project. Is there an easy way to dump the index to a plain text file (something like pg_dump), or to iterate through each primary key so I can download the documents one by one?
Solr supports a CSV response format, but note that only stored fields can be returned.
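As a rough sketch, you can request CSV output (wt=csv) from the select handler and write the response straight to a file; the core name, field list, and row count below are placeholders, and for a large index you would page through the results (e.g. with start/rows or cursorMark) rather than fetch everything in one request.

import requests

# A minimal sketch: ask Solr for CSV output and dump it to a file, ready to be
# loaded into PostgreSQL with COPY. Only stored fields can be exported this way.
params = {
    "q": "*:*",              # match every document
    "wt": "csv",             # CSV response writer
    "fl": "id,title,body",   # hypothetical stored fields to export
    "rows": 100000,          # placeholder; page through a larger index instead
}
response = requests.get("http://localhost:8983/solr/mycore/select", params=params)
response.raise_for_status()

with open("solr_dump.csv", "w", encoding="utf-8") as f:
    f.write(response.text)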