I have been tasked with uploading some data into Solr, where it will then be used for analysis.
I understand that Solr can index data in the xlsx file format.
In Exercise 2 for Solr, the following files were indexed in the order JSON, XML, and CSV:
bin/post -c films example/films/films.json
bin/post -c films example/films/films.xml
bin/post -c films example/films/films.csv -params "f.genre.split=true&f.directed_by.split=true&f.genre.separator=|&f.directed_by.separator=|"
The issue I have is that although I indexed my xlsx file, the query only shows one record, which means the file may have been indexed wrongly, i.e. it may require parameters such as those needed for a CSV file. Can anyone tell me how this indexing can be done without having to convert the xlsx file into a CSV file?
You can use Apache Tika to index these formats in Solr. It will parse the data and build the index.
Reference link:
https://lucidworks.com/2009/09/02/content-extraction-with-tika/
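For what it's worth, here is a minimal sketch of posting an xlsx file to Solr's ExtractingRequestHandler (Solr Cell), which wraps Tika. It assumes a local Solr instance, Python's requests library, and placeholder core, file, and document-id names; adjust them to your setup.

import requests

# A minimal sketch: post an xlsx file to Solr's /update/extract endpoint,
# which hands the file to Tika for parsing. The core name "films", the file
# name "films.xlsx", and the literal.id value are placeholders.
solr_extract_url = "http://localhost:8983/solr/films/update/extract"

with open("films.xlsx", "rb") as f:
    response = requests.post(
        solr_extract_url,
        params={
            "literal.id": "films-xlsx-1",  # a unique id for the resulting document
            "commit": "true",              # commit so the document is searchable right away
        },
        files={"file": ("films.xlsx", f)},
    )
response.raise_for_status()
print(response.text)

Keep in mind that Tika-based extraction typically turns each uploaded file into a single Solr document, so one record per spreadsheet is expected with this approach; row-level records generally require a row-oriented format such as CSV.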
I need to query the database and write the results to a CSV file.
I remember I did this with SQL Server; is it possible to do this with CrateDB?
CrateDB's crash shell supports several output formats, including CSV.
Example:
crash --format csv -c 'select * from sys.cluster' > sys_cluster.csv
See https://crate.io/docs/reference/crash for details.
Of course, yes. Where are you getting stuck?
You can find official examples in the crate/crate-sample-apps GitHub repository, where CrateDB is used. You can use it as a baseline for understanding CrateDB.
You can also easily convert the values from CrateDB to CSV using language-specific libraries; in Python, for example, you can use the csv module, as sketched below.
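This is a minimal sketch using the CrateDB Python client together with the standard csv module; the connection URL, query, and output file name are assumptions to adjust to your setup.

import csv

from crate import client  # CrateDB Python client (pip install crate)

# A minimal sketch: run a query against a local CrateDB node and write the
# result, with a header row, to out.csv.
connection = client.connect("http://localhost:4200")
cursor = connection.cursor()
cursor.execute("SELECT * FROM sys.cluster")

with open("out.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow([column[0] for column in cursor.description])  # column names as the header
    writer.writerows(cursor.fetchall())

cursor.close()
connection.close()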
select * from gpqueries:contracts.raw where fiscal_year =2015
ignore case
When generating data as JSON or CSV from Google BigQuery, the data I download from the storage bucket looks like the screenshot below.
Please guide me on why this happens. Also, how can I combine multiple files if more than one file is generated?
Update:
From the screenshot I can see that the first three bytes are 1F 8B 08 - that's the signature of a gzip-compressed file. Just uncompress it with gunzip.
http://www.filesignatures.net/index.php?page=search&search=1F8B08&mode=SIG
My guess: Did you pick "compression: gzip" when exporting?
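If the export produced several gzip-compressed shards, a small script can decompress and combine them. This is a minimal sketch assuming the downloaded files are named export-*.csv.gz and each carries its own header row; adjust the glob pattern and file names to whatever your export actually produced.

import csv
import glob
import gzip

# A minimal sketch: decompress every exported shard and combine them into a
# single CSV, keeping the header row from the first file only.
combined_rows = []
header = None
for path in sorted(glob.glob("export-*.csv.gz")):
    with gzip.open(path, "rt", newline="") as f:
        reader = csv.reader(f)
        file_header = next(reader)
        if header is None:
            header = file_header
        combined_rows.extend(reader)

if header is not None:
    with open("combined.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(combined_rows)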
I am new to using SQL, so please bear with me.
I need to import several hundred CSV files into PostgreSQL. My web searches have only turned up how to import many CSV files into one table. However, most of the CSV files have different column types (all have one-line headers). Is it possible to somehow run a loop and have each CSV imported into a table with the same name as the CSV file? Creating each table manually and specifying the columns is not an option. I know that COPY will not work, as the table needs to already be defined.
Perhaps this is not feasible in PostgreSQL? I would like to accomplish this in pgAdmin III or the psql console, but I am open to other ideas (using something like R to change the CSVs into a format more easily entered into PostgreSQL?).
I am using PostgreSQL on a Windows 7 computer. It was requested that I use PostgreSQL, thus the focus of the question.
The desired result is a database full of tables that I will then join with a spreadsheet that includes specific site data. Thanks!
Use pgfutter.
The general syntax looks like this:
pgfutter csv <file.csv>
In order to run this on all csv files in a directory from Windows Command Prompt, navigate to the desired directory and enter:
for %f in (*.csv) do pgfutter csv %f
Note that the directory containing the downloaded program must be added to your PATH environment variable.
EDIT:
Here is the command-line code for Linux users.
Run it as:
pgfutter *.csv
Or, if that doesn't work:
find -iname '*.csv' -exec pgfutter csv {} \;
In the terminal, use nano to create a script that loops over the CSV files in a directory and loads them into the Postgres database:
nano run_pgfutter.sh
The content of run_pgfutter.sh:
#!/bin/bash
for i in /mypath/*.csv
do
  ./pgfutter csv "${i}"
done
Then make the file executable:
chmod u+x run_pgfutter.sh
Any ideas on how I can extract data from a SQL database, put it into a CSV file in a specific format, and push it to an external URL?
Most SQL databases have some sort of export utility that can produce a CSV file. Google "export CSV" along with the name of your database product and you should find it. It is not part of the SQL standard, so every product does it differently.
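If you would rather script it than rely on a product-specific export tool, here is a minimal sketch in Python. It uses the standard library's sqlite3 driver purely as an example (swap in the driver for your database, e.g. psycopg2 for PostgreSQL); the database file, table, columns, and upload URL are all hypothetical.

import csv
import sqlite3

import requests

# A minimal sketch: query the database, write the rows to a CSV file in the
# required column order, then push the file to an external URL.
connection = sqlite3.connect("mydata.db")                  # placeholder database
cursor = connection.cursor()
cursor.execute("SELECT id, customer, total FROM orders")   # hypothetical table

with open("orders.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "customer", "total"])  # header row in the required format
    writer.writerows(cursor.fetchall())

connection.close()

with open("orders.csv", "rb") as f:
    response = requests.post("https://example.com/upload", files={"file": f})  # hypothetical endpoint
response.raise_for_status()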
I need to copy my Solr index into PostgreSQL for another project. Is there an easy way to dump the index to a plain text file (something like pg_dump), or to iterate through each primary key so I can download the documents one by one?
Solr supports a CSV response format, but note that only stored fields can be returned.
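As a rough sketch, you can request CSV output (wt=csv) from the select handler and write the response straight to a file; the core name, field list, and row count below are placeholders, and for a large index you would page through the results (e.g. with start/rows or cursorMark) rather than fetch everything in one request.

import requests

# A minimal sketch: ask Solr for CSV output and dump it to a file, ready to be
# loaded into PostgreSQL with COPY. Only stored fields can be exported this way.
params = {
    "q": "*:*",              # match every document
    "wt": "csv",             # CSV response writer
    "fl": "id,title,body",   # hypothetical stored fields to export
    "rows": 100000,          # placeholder; page through a larger index instead
}
response = requests.get("http://localhost:8983/solr/mycore/select", params=params)
response.raise_for_status()

with open("solr_dump.csv", "w", encoding="utf-8") as f:
    f.write(response.text)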