ogr2ogr shp import to postgis and add column - gdal

I'm trying to import a shapefile into a non existing Postgis table.
I'm using ogr2ogr with -sql option to select the fields from the shp that I would like to get imported however I would need a date field (that does not exists in my input shp) to be created in my output table.
Is there any option within ogr2ogr for that ?
I've tried :
ogr2ogr -f "PostgreSQL" -a_srs "EPSG:4326" PG:"host=<db_host> user=<db_user> password=<db_password> dbname=<db_name>" -nlt MULTIPOLYGON -nln <table_name> -sql "SELECT <shp_field1> AS <pg_field1>, <shp_field2> AS <pg_field2>, current_date AS date FROM <shapeFile>" -overwrite <shapeFile>.shp
However I get an error 'Unrecognized field name current_date' which obviously does not exists in my input shapefile.
Thanks for your help.

ok I get it to work by adding -dialect SQLite to by command line however it does not import the geometry anymore ...

Related

Ignore Last row in CSV file as part of BigQuery External table command

I have about 40 odd csv files, comma delimited in GCS however the last line of all the files has quotes and dot
”.
So these are not exactly conformed csv schema and has data quality issue which i have to get around
My aim is to create an external table referencing to the gcs files and then be able to select the data.
example:
create or replace dataset.tableName
options (
uris = ['gs://bucket_path/allCSVFILES_*.csv'],
format = 'CSV',
skip_leading_rows = 1,
ignore_unknown_values = true
)
the external table gets created without any error. however, when I select the data, I ran to error
"error message: CSV table references column position 16, but line starting at position:18628631 contains only 1 columns"
This is due to quotes and dot ”. at the end of file.
My question is: is there any way in BigQuery to consume to data without the LAST LINE. as part of options we have skip_leading_rows to skip header but any way to skip to last row?
Currently my best placed option is to clean the files, using sed/tail command.
I have checked the create or replace external table options list below and have tried using ignore_unknown_values but other than this option i don't see any other option which will work.
https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#create_external_table_statement
You can try below work around:
I tried with pandas and removed the last record from the csv file.
from google.cloud import bigquery
import pandas as pd
from google.cloud import storage
df=pd.read_csv('gs://samplecsv.csv')
client = bigquery.Client()
dataset_ref = client.dataset('dataset')
table_ref = dataset_ref.table('new_table')
df.drop(df.tail(1).index,inplace=True)
client.load_table_from_dataframe(df, table_ref).result()
For more information you can refer to this link which mentions the limitation for loading csv files to Bigquery.

Extract an attribute in GPKG

I am trying to extract rivers from OSM. I downloaded the waterway GPKG where I believe there are over 21 million entries (see link) with a file size of 19.9 GB.
I have tried using the split vector layer in QGIS, but it would crash.
Was thinking of using GDAL ogr2ogr, but having trouble generating the command line.
I first isolated the multiline string with the following command.
ogr2ogr -f gpkg water.gpkg waterway_EPSG4326.gpkg waterway_EPSG4326_line -nlt linestring
ogrinfo water.gpkg INFO: Open of water.gpkg' using driver GPKG' successful. 1: waterway_EPSG4326_line (Line String)
Tried the following command below, but it is not working.
ogr2ogr -f GPKG SELECT * FROM waterway_EPSG4326_line - where waterway="river" river.gpkg water.gpkg
Please let me know what is missing or if there is any easy way to perform the task. I tried opening the file in R sf package, but it would not load after a long time.
Thanks

Hive coalesce parse exception

I want to create a hive script that uses as database one of two given parameters, whichever is not null.
My hive-test.sql is this:
set db_name = coalesce(${hiveconf:dbOne}, ${hiveconf:dbTwo});
use ${hiveconf:db_name};
show tables;
and I run it with:
hive -hiveconf dbOne=my_database -f hive-test.sql
and I am getting:
FAILED: ParseException line 2:12 missing EOF at '(' near 'coalesce'
I should note that if I change the first line in script to:
set db_name = my_database;
it works.
I can't figure out what I did wrong. Your assistance is appreciated.
This feature is not available in Hive.
Do variable assignment in the shell, for example like here: setting-a-shell-variable-in-a-null-coalescing-fashion and pass it to the Hive.

Sqoop Export with Missing Data

I am trying to use Sqoop to export data from HDFS into Postgresql. However, I receive an error partially through the export that it can't parse the input. I manually went into the file I was exporting and saw that this row had two columns missing. I have tried a bunch of different arguments with the Sqoop command, but cannot get it to work. Here is what I was running thus far:
sqoop export --connect jdbc:postgresql://localhost:5432/XX -username
XX -password XX --table XX --input-fields-terminated-by
"\t" --input-lines-terminated-by "\n" --input-null-string '\n' --input-null
non-string '\n' -m 1 --export-dir /user/dan/output
I have also tried it without the "--input-null-string" and "--input-null-non-string" args and got the same result. My table has 6 columns and the file I am reading has tab separated values that are inserted into the table if all 6 are there. Any help would be appreciated.
I solved the problem by changing my reduce function so that if there were not the correct amount of fields to output a certain value and then I was able to use the --input-null-non-string with that value and it worked.

concatenate text files and import them into a SQLite DB

Let us say I have thousands of comma separated text files with 1050 columns each (no header). Is there a way to concatenate and import all the text files into one table, one database in SQLite (Ideally I'd use R and sqldf to communicate with SQlite).
I.e.,
Each file is called, table1.txt, table2.txt, table3.txt; all of different number of rows, but same column types, and different unique IDs in the IDs column (first column of each file).
table1.txt
id1,20.3,1.2,3.4
id10,2.1,5.2,9.3
id21,20.5,1.2,8.4
table2.txt
id2,20.3,1.2,3.4
id92,2.1,5.2,9.3
table3.txt
id3,1.3,2.2,5.4
id30,9.1,4.4,9.3
The real example is pretty much the same but with more columns and more rows. AS you can note the first column in each file corresponds to a unique ID.
Now I'd like my new table in supertable, in the DB, super.db to be (also uniquely indexed):
super.db - name of the DB
mysupertable - name of the table in the DB
myids,v1,v2,v3
id1,20.3,1.2,3.4
id10,2.1,5.2,9.3
id21,20.5,1.2,8.4
id2,20.3,1.2,3.4
id92,2.1,5.2,9.3
id3,1.3,2.2,5.4
id30,9.1,4.4,9.3
For reference, I am using SQLite3; and I am looking for a SQL command that I can run on the background without logging interactively into the sqlite3 interpreter, i.e., IMPORT bla INTO,...
I could try in unix:
cat *.txt > allmyfiles.txt
and then a .sql file,
CREATE TABLE test (myids varchar(255), v1 float, v2 float, v3 float);
.separator ,
.import output.csv test
But this command does not work since I am using R sqldf library, and dbGetQuery(db, sql) and I have no idea how to create such string in R without getting an error.
p.s. I asked a similar Q for appending tables from a DB but this time I need to append/import text files not tables from a DB.
If you are using sqlite database files anyway, you might want to consider working with RSQLite.
install.packages( "RSQLite" ) # will install package "DBI"
library( RSQLite )
db <- dbConnect( dbDriver("SQLite"), dbname = "super.db" )
You still can use the unix command within R which should be faster than any loop in R, using the system() command:
system( "cat *.txt > allmyfiles.txt" )
Provided that your allmyfiles.txt has a consistent format, you can import it as a data.frame into R
allMyFiles <- read.table( "allmyfiles.txt", header = FALSE, sep = "," )
and write it to your database, following #Martín Bel's advice, with something like
dbWriteTable( db, "mysupertable", allMyFiles, overwrite = TRUE, append = FALSE )
EDIT:
Or, if you don't want to route your data through R,you can again resort to using the system() command. This may get you started:
You have a file with the data you want to get into SQLite called allmyfiles.txt. Create a file called table.sql with this content (obviously the structure must match):
CREATE TABLE mysupertable (myids varchar(255), v1 float, v2 float, v3 float);
.separator ,
.import allmyfiles.txt mysupertable
and call it from R with
system( "sqlite3 super.db < table.sql" )
That should avoid routing the data through R but still do all the work from within R.
Take a look at termsql:
https://gitorious.org/termsql/pages/Home
cat *.txt | termsql -d ',' -t mysupertable -c 'myids,v1,v2,v3' -o mynew.db
This should do the job.