Reading a CSV file with SQL queries from the Linux shell - sql

I would like to read a CSV file from the shell as if it were an SQL database table.
Is this possible without having to import the CSV file content into an SQL environment?
Maybe there is some kind of Linux-based tool that can do it...
I know it sounds like a tricky question, but I'm trying to avoid installing an SQL server and the like; I have some limitations.
Any clue?

There is also csvsql (part of csvkit)!
It can not only run SQL on a given CSV (converting it into SQLite behind the scenes), but also convert it and insert it into one of many supported SQL databases!
Here is an example command (also in csvsql_CDs_join.sh):
csvsql --query 'SELECT CDTitle,Location,Artist FROM CDs JOIN Artists ON CDs.ArtistID=Artists.ArtistID JOIN Locations ON CDs.LocID = Locations.LocID' "$@"
showing how to join three tables (available in csv_inputs in csv_dbs_examples).
(formatting with csvlook also part of csvkit)
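For instance, the script could be invoked with the three sample files below as arguments (csvsql derives the table names CDs, Artists and Locations from the file basenames):
$ ./csvsql_CDs_join.sh csv_inputs/CDs.csv csv_inputs/Artists.csv csv_inputs/Locations.csv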
Inputs
$ csvlook csv_inputs/CDs.csv
| CDTitle  | ArtistID | LocID |
| -------- | -------- | ----- |
| CDTitle1 | A1       | L1    |
| CDTitle2 | A1       | L2    |
| CDTitle3 | A2       | L1    |
| CDTitle4 | A2       | L2    |
$ csvlook csv_inputs/Artists.csv
| ArtistID | Artist  |
| -------- | ------- |
| A1       | Artist1 |
| A2       | Artist2 |
$ csvlook csv_inputs/Locations.csv
| LocID | Location  |
| ----- | --------- |
| L1    | Location1 |
| L2    | Location2 |
csvsql
$ csvsql --query 'SELECT CDTitle,Location,Artist FROM CDs JOIN Artists ON CDs.ArtistID=Artists.ArtistID JOIN Locations ON CDs.LocID = Locations.LocID' "$@" | csvlook
Produces:
| CDTitle  | Location  | Artist  |
| -------- | --------- | ------- |
| CDTitle1 | Location1 | Artist1 |
| CDTitle2 | Location2 | Artist1 |
| CDTitle3 | Location1 | Artist2 |
| CDTitle4 | Location2 | Artist2 |

Take a look at https://github.com/harelba/q, a Python tool for treating text as a database. By default it uses spaces to delimit fields, but the -d , parameter will allow it to process CSV files.
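For instance, against the CDs.csv sample above (a sketch; -H tells q that the first line is a header row):
$ q -d , -H "SELECT CDTitle, LocID FROM ./csv_inputs/CDs.csv WHERE ArtistID = 'A1'"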
Alternatively you can import the CSV file into SQLite and then run SQL commands against it. This is scriptable, with a bit of effort.
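For example, an in-memory database can be built and queried in one shell step (a sketch using the same sample file; .import creates the table and takes the column names from the CSV header when the table does not exist yet):
$ sqlite3 :memory: <<'EOF'
.mode csv
.import csv_inputs/CDs.csv CDs
SELECT CDTitle, LocID FROM CDs WHERE ArtistID = 'A1';
EOF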

Related

Retrieve closest road when given (lat, long) using OSM in Postgres with Postgis using SQL query

Given a (lat, long) pair, I am trying to find the maximum speed using "max_speed" and the street type using "highway".
I have loaded my database (Postgres and PostGIS) as follows:
$ osm2pgsql -c -d gis --slim -C 50000 /var/lib/postgresql/data/germany-latest.osm.pbf
The closest related question I could find was How to query all shops around a certain longitude/latitude using osm-postgis?. I have taken that query and plugged in a (lat, long) I found in Google Maps for the city center of Munich (as that post was also about central Munich and I have the map of Germany). The result turns up empty.
gis=# SELECT name, shop FROM planet_osm_point WHERE ST_DWithin(way ,ST_SetSrid(ST_Point(48.137969, 11.573829), 900913), 100);
name | shop
------+------
(0 rows)
Also, when looking into planet_osm_nodes, which contains (lat, lon) pairs directly, I end up with no results:
gis=# SELECT * FROM planet_osm_nodes WHERE ((lat BETWEEN 470000000 AND 490000000) AND (lon BETWEEN 100000000 AND 120000000)) LIMIT 10;
id | lat | lon | tags
----+-----+-----+------
(0 rows)
I verified the data is in my database:
gis=# SELECT COUNT(*) FROM planet_osm_point;
count
---------
9924531
(1 row)
and
gis=# SELECT COUNT(*) FROM planet_osm_nodes;
count
-----------
288597897
(1 row)
So ideally my question would be:
Q: How can I find the "max_speed" and "highway" given a (lat, lon) pair?
Alternatively, my question is:
Q: How do I get the query from the other Stack Overflow post to work?
My best guess is that I need to transform my (lat, lon) in some way, or that I simply have the wrong data for whatever reason.
Edit: added sample data as requested:
gis=# SELECT * FROM planet_osm_point LIMIT 1;
(wide row trimmed to its non-empty columns)
  osm_id   | highway  |                        way
-----------+----------+----------------------------------------------------
 304070863 | crossing | 010100002031BF0D0048E17A94F19F2941CDCCCCDCC60D5741
(1 row)
and
gis=# SELECT * FROM planet_osm_nodes LIMIT 1;
id | lat | lon | tags
--------+-----------+----------+------
234100 | 666501948 | 80442755 |
(1 row)
Edit 2: There was a mention regarding "SRID", so I added example data from another table:
gis=# SELECT * FROM spatial_ref_sys LIMIT 1;
 srid | auth_name | auth_srid | srtext | proj4text
------+-----------+-----------+--------+-----------
 3819 | EPSG      | 3819      | GEOGCS["HD1909",DATUM["Hungarian_Datum_1909",SPHEROID["Bessel 1841",6377397.155,299.1528128,AUTHORITY["EPSG","7004"]],TOWGS84[595.48,121.69,515.35,4.115,-2.9383,0.853,-3.408],AUTHORITY["EPSG","1024"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","3819"]] | +proj=longlat +ellps=bessel +towgs84=595.48,121.69,515.35,4.115,-2.9383,0.853,-3.408 +no_defs
(1 row)
Geometry in PostGIS uses the opposite ordering to (lat, long): longitude comes first, then latitude.
Also, if you want to transform a point from one SRID to another, use ST_Transform(), not ST_SetSRID().
ST_Transform really transforms your data from one coordinate system to another:
select st_astext(st_transform(ST_SetSrid(ST_Point(11.573829,48.137969), 4326),900913))
ST_SetSRID just changes the SRID label on the object:
select st_astext(ST_SetSrid(ST_Point(11.573829,48.137969), 900913))
So you have to change your SQL this way:
SELECT name, shop
FROM planet_osm_point
WHERE ST_DWithin(way,st_transform(ST_SetSrid(ST_Point(11.573829,48.137969), 4326),900913), 100);
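Coming back to the original goal, here is a sketch for the nearest road (untested; it assumes the default osm2pgsql schema, where roads end up in planet_osm_line, and a maxspeed column only exists if your import style included it). The <-> operator is PostGIS's index-assisted nearest-neighbour distance:
SELECT name, highway
FROM planet_osm_line
WHERE highway IS NOT NULL
ORDER BY way <-> ST_Transform(ST_SetSRID(ST_Point(11.573829, 48.137969), 4326), 900913)
LIMIT 1;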

How do you convert a FASTA file into an SQL table?

I have a FASTA file temp_mart.txt like this:
ENSG00000100219|ENST00000005082
MTLLTFRDVAIEFSLEEWKCLDLAQQNLYRDVMLENYRNLFSVGLTVCKPGL
And I tried to load it into an SQL table using
load data local infile '~/Desktop/temp_mart.txt' into table mart
But instead of getting an output like this:
+-----------------+-----------------+-----------------+
| ENSG            | ENST            | 3UTR            |
+-----------------+-----------------+-----------------+
| >ENSG0000010021 | ENST00000005082 | MTLLTFRDVAIEFSL |
I get this:
+-----------------+------+------+
| ENSG            | ENST | 3UTR |
+-----------------+------+------+
| >ENSG0000010021 | NULL | NULL |
| MTLLTFRDVAIEFSL | NULL | NULL |
| EPWNVKRQEAADGHP | NULL | NULL |
| DKFTAMSSHFTQDLL | NULL | NULL |
Everything seems to go into the first column. What is the best way to load the file so it comes out as intended? Do I need to convert it into a CSV file first?
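One way (a sketch, assuming each record is exactly one header line followed by one sequence line, as in the sample): flatten the file to CSV with awk first, then load it with an explicit field terminator.
$ awk -F'|' 'NR % 2 == 1 {ensg=$1; enst=$2; next} {print ensg "," enst "," $0}' ~/Desktop/temp_mart.txt > ~/Desktop/temp_mart.csv
load data local infile '~/Desktop/temp_mart.csv' into table mart
fields terminated by ',' lines terminated by '\n';
If a sequence can span several lines, the sequence lines would have to be accumulated per record instead of taken one at a time.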

Query from secondary index on Aerospike

I'm considering Aerospike for one of our projects, so I created a 3-node cluster and loaded some data onto it.
Sample data
ns: imei
set: imei_data
+-------------------+-----------------------+-----------------------+----------------------------+--------------+--------------+
| imsi | fcheck | lcheck | msc | fcheck_epoch | lcheck_epoch |
+-------------------+-----------------------+-----------------------+----------------------------+--------------+--------------+
| "413010324064956" | "2017-03-01 14:30:26" | "2017-03-01 14:35:30" | "13d20b080011044917004100" | 1488358826 | 1488359130 |
| "413012628090023" | "2016-09-21 10:06:49" | "2017-09-16 13:54:40" | "13dc0b080011044917006100" | 1474432609 | 1505550280 |
| "413010130130320" | "2016-12-29 22:05:07" | "2017-10-09 16:17:10" | "13d20b080011044917003100" | 1483029307 | 1507546030 |
| "413011330114274" | "2016-09-06 01:48:06" | "2017-10-09 11:53:41" | "13d20b080011044917003100" | 1473106686 | 1507530221 |
| "413012629781993" | "2017-08-16 16:03:01" | "2017-09-13 18:10:48" | "13dc0b080011044917004100" | 1502879581 | 1505306448 |
Then I created a secondary index on lcheck_epoch using AQL since I want to query based on date.
create index idx_lcheck on imei.imei_data (lcheck_epoch) NUMERIC
+--------+----------------+-----------+-------------+-------+--------------+----------------+-----------+
| ns | bin | indextype | set | state | indexname | path | type |
+--------+----------------+-----------+-------------+-------+--------------+----------------+-----------+
| "imei" | "lcheck_epoch" | "NONE" | "imei_data" | "RW" | "idx_lcheck" | "lcheck_epoch" | "NUMERIC" |
+--------+----------------+-----------+-------------+-------+--------------+----------------+-----------+
When I execute
select imsi from imei.imei_data where idx_lcheck=1476165806
I'm getting
Error: (204) AEROSPIKE_ERR_INDEX
Please explain.
You're using the index name, not the bin name, in your query. Try this:
SELECT imsi FROM imei.imei_data WHERE lcheck_epoch=1476165806
Or
SELECT imsi FROM imei.imei_data WHERE lcheck_epoch BETWEEN 1490000000 AND 1510000000
Just a note: you can do much more complex queries using predicate filtering through several of the language clients (Java, C, C#, Go), for example via the PredExp class of the Java client (see its examples).
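A rough sketch of such a query with the Java client (untested; the PredExp calls follow the older predicate API, which newer client versions have replaced with expressions):
import com.aerospike.client.AerospikeClient;
import com.aerospike.client.query.Filter;
import com.aerospike.client.query.PredExp;
import com.aerospike.client.query.RecordSet;
import com.aerospike.client.query.Statement;

public class ImeiQuery {
    public static void main(String[] args) {
        // Connect to one node; the client discovers the rest of the cluster.
        AerospikeClient client = new AerospikeClient("127.0.0.1", 3000);
        try {
            Statement stmt = new Statement();
            stmt.setNamespace("imei");
            stmt.setSetName("imei_data");
            // Range query served by the secondary index on lcheck_epoch.
            stmt.setFilter(Filter.range("lcheck_epoch", 1490000000L, 1510000000L));
            // Extra predicate filter evaluated on top of the index results.
            stmt.setPredExp(
                PredExp.integerBin("fcheck_epoch"),
                PredExp.integerValue(1480000000L),
                PredExp.integerGreater());
            RecordSet rs = client.query(null, stmt);
            try {
                while (rs.next()) {
                    System.out.println(rs.getRecord().getString("imsi"));
                }
            } finally {
                rs.close();
            }
        } finally {
            client.close();
        }
    }
}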

How to cross-reference and combine values from many tables

I have three tables, tblTemplates, tblBLNALM and tblPrefs. They follow this structure:
tblPrefs:
---------------------------------------
| Pref | Derived-Template | Template  |
---------------------------------------
| GA   | BLNALM_F03       | AIN_F03   |
| HSSD | BLNALM_F01       | AIN_F01   |
--------------------------------------- etc...
tblBLNALM:
-------------------------------------------------------------
| Controller | Compound | Tagname    | BaseTemplate | Name  |
-------------------------------------------------------------
| 15CP42     | 15F00    | HSSD30001C | BLNALM       | IN_7  |
| 15CP12     | 15F06    | GA123456   | BLNALM       | IN_3  |
------------------------------------------------------------- etc...
tblTemplates:
------------------------------------
| Template | Maintenance Override  |
------------------------------------
| AIN_F01  | IN_7                  |
| AIN_F02  | IN_5                  |
| AIN_F03  | IN_7                  |
------------------------------------ etc...
What I need to do is check whether the leading characters of tblBLNALM.Tagname (the part before the digits start) exist in tblPrefs; if they do, use that to determine which template it is. Then, using this template and tblTemplates, work out which Maintenance Override it is.
The end result should look kind of like this:
------------------------------------------------------------------------
| Controller | Compound | Tagname  | Template | Maintenance Override   |
------------------------------------------------------------------------
| 15CP12     | 15F06    | GA123456 | AIN_F03  | IN_7                   |
------------------------------------------------------------------------ etc...
My gut instinct was to use a few EXISTS statements and maybe nest them, but this hasn't helped, so where do I go from here?
I'm using MS Access 2010.
You can use string operations within SQL joins.
How about checking whether the Tagname begins with your Pref?
In SQL that would be:
SELECT tblBLNALM.Controller,
       tblBLNALM.Compound,
       tblBLNALM.Tagname,
       tblTemplates.Template,
       tblTemplates.[Maintenance Override]
FROM (tblTemplates
      INNER JOIN tblPrefs ON tblTemplates.Template = tblPrefs.Template)
INNER JOIN tblBLNALM ON (tblPrefs.Pref = Left(tblBLNALM.Tagname, Len(tblPrefs.Pref)));
The output will be as you described:
+------------+----------+------------+----------+----------------------+
| Controller | Compound | Tagname | Template | Maintenance Override |
+------------+----------+------------+----------+----------------------+
| 15CP12 | 15F06 | GA123456 | AIN_F03 | IN_7 |
| 15CP42 | 15F00 | HSSD30001C | AIN_F01 | IN_7 |
+------------+----------+------------+----------+----------------------+
Join the three tables: join the Template fields of tblPrefs and tblTemplates; then you need to join tblBLNALM's Tagname to Pref, but those fields cannot be joined directly, so create a query that selects all columns from tblBLNALM plus a calculated column returning the leading letters of the Tagname field, and use that query in the join with tblPrefs instead of the table, as sketched below.
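A sketch of that approach (the query name and the prefix expression are illustrative; with the sample data, the prefix is 2 letters when the third character is a digit and 4 letters otherwise). First, save this as qryBLNALMPrefixed:
SELECT tblBLNALM.*,
       IIf(IsNumeric(Mid(Tagname, 3, 1)), Left(Tagname, 2), Left(Tagname, 4)) AS TagPrefix
FROM tblBLNALM;
Then join through it:
SELECT q.Controller, q.Compound, q.Tagname,
       tblTemplates.Template, tblTemplates.[Maintenance Override]
FROM (qryBLNALMPrefixed AS q
      INNER JOIN tblPrefs ON q.TagPrefix = tblPrefs.Pref)
INNER JOIN tblTemplates ON tblPrefs.Template = tblTemplates.Template;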

Cast and transform generic attribute values to a flat database view

For reasons out of scope of this question I have a relation that persists generic values of an entity and looks something like this:
| group_id::int | id::int | attr_id::text | data_type::text | value::text |
---------------------------------------------------------------------------
| G1            | 1       | A             | varchar         | lorem       |
| G1            | 1       | B             | integer         | 1001        |
| G2            | 2       | B             | integer         | 1002        |
data_type is guaranteed to be one of PostgreSQL's supported data types, so the table (or tables, rather; the example is simplified) somewhat represents a table definition.
I would like to present this, transformed, in a separate view per group, so that the values are cast to their actual data types. Is this even possible?
Database view for G1:
| id::int | A::varchar | B::int |
----------------------------------
| 1       | lorem      | 1001   |
Database view for G2:
| id::int | B::int |
--------------------
| 2       | 1002   |
I was thinking tablefunc.crosstab would be the way to go, but I didn't get very far and I'm just out of ideas. Any help or directions are very welcome.
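It is possible with crosstab as long as each view has a fixed column list. A sketch for the G1 view (the table name entity_values and the view name are made up; crosstab hands the value column back as text, so the casts happen in the outer SELECT):
CREATE EXTENSION IF NOT EXISTS tablefunc;

CREATE VIEW view_g1 AS
SELECT id,
       "A"::varchar AS "A",
       "B"::integer AS "B"
FROM crosstab(
    $$ SELECT id, attr_id, value
       FROM entity_values
       WHERE group_id = 'G1'
       ORDER BY 1, 2 $$,
    $$ VALUES ('A'), ('B') $$
) AS ct(id int, "A" text, "B" text);
Because the column list differs per group, one such view has to be generated per group_id, e.g. from a DO block that assembles the CREATE VIEW statements dynamically from the attr_id/data_type rows.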