isql: compare parts of column values in a WHERE clause

I have 2 CSV files. In one file I have phone numbers with prices, and in the second file I have phone numbers with the names of their owners.
First file: file1.csv
491732234332;30,99
491723427343;12,59
491732097232;33,31
Second file: file2.csv
01732/234332;Ben Jefferson
01723/427343;Jon Doe
01732/097232;Benjamin Franklin
My problem is that the phone number columns are formatted differently, and I cannot find a way to compare them.
Desired output is:
01732/234332;30,99;Ben Jefferson
01723/427343;12,59;Jon Doe
01732/097232;33,31;Benjamin Franklin
My SQL script is:
create temp table FILETB1
(phonenr char(30),
price char(30)
);
create temp table FILETB2
(phonenr char(40),
owner char(60)
);
load from "file1.csv"
insert into FILETB1;
load from "file2.csv"
insert into FILETB2;
unload to "output.csv"
select FILETB1.phonenr, FILETB1.price, FILETB2.owner
from FILETB1, FILETB2
where FILETB1.phonenr = FILETB2.phonenr
How do I need to modify my WHERE clause to be able to compare the two columns?
We are working on Linux with IBM INFORMIX-SQL Version 7.50.UC5, which does not make finding a working solution any easier, since many functions are not supported...
Any help is highly appreciated!

Using just the facilities of ISQL, you can use:
CREATE TEMP TABLE FILETB1
(
phonenr CHAR(30),
price CHAR(30)
);
CREATE TEMP TABLE FILETB2
(
phonenr CHAR(40),
owner CHAR(60)
);
LOAD FROM "file1.csv" DELIMITER ';' INSERT INTO FILETB1;
LOAD FROM "file2.csv" DELIMITER ';' INSERT INTO FILETB2;
UNLOAD TO "output.csv" DELIMITER ';'
SELECT FILETB2.phonenr, FILETB1.price, FILETB2.owner
FROM FILETB1, FILETB2
WHERE FILETB1.phonenr[3,6] = FILETB2.phonenr[2,5]
AND FILETB1.phonenr[7,12] = FILETB2.phonenr[7,12];
Testing with DB-Access, I got:
$ dbaccess stores so-35360310.sql
Database selected.
Temporary table created.
Temporary table created.
3 row(s) loaded.
3 row(s) loaded.
3 row(s) unloaded.
Database closed.
$ cat output.csv
01732/234332;30,99;Ben Jefferson;
01723/427343;12,59;Jon Doe;
01732/097232;33,31;Benjamin Franklin;
$
The key is using the built-in substring [start,end] operator. You compare the two parts of the phone numbers that are comparable. And you select the number from file2.csv (table FILETB2) because that's the format you wanted.
For the sample data, of course, you could simply use Unix command line tools to do the job, but knowing how to do it inside the DBMS is helpful too.
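For illustration, here is one way that command-line job might look; a sketch only, which recreates the sample files inline and assumes the number formats follow the samples (49 country code in file1.csv, leading 0 and a slash in file2.csv):

```shell
# Recreate the sample files from the question.
printf '491732234332;30,99\n491723427343;12,59\n491732097232;33,31\n' > file1.csv
printf '01732/234332;Ben Jefferson\n01723/427343;Jon Doe\n01732/097232;Benjamin Franklin\n' > file2.csv

# Rewrite each file2 number 0AAAA/NNNNNN as the file1-style key 49AAAANNNNNN,
# keeping the original formatting as an extra field, and sort both sides.
sed 's|^0\([0-9]*\)/\([0-9]*\)|49\1\2;0\1/\2|' file2.csv | sort > file2.keyed
sort file1.csv > file1.sorted

# Join on the normalized key, then reorder the fields to number;price;owner.
join -t ';' file1.sorted file2.keyed |
awk -F ';' 'BEGIN { OFS = ";" } { print $3, $2, $4 }' > output.csv
cat output.csv
```

The output is sorted by the normalized key rather than in the original file order, which may or may not matter for your use case.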
You could also use the SUBSTR(col, start, len) function:
UNLOAD TO "output2.csv" DELIMITER ';'
SELECT FILETB2.phonenr, FILETB1.price, FILETB2.owner
FROM FILETB1, FILETB2
WHERE SUBSTR(FILETB1.phonenr, 3, 3) = SUBSTR(FILETB2.phonenr, 2, 3)
AND SUBSTR(FILETB1.phonenr, 7, 6) = SUBSTR(FILETB2.phonenr, 7, 6);
This produces the same output from the sample data.
If ISQL does not recognize the DELIMITER ';' clause to the UNLOAD (or LOAD) pseudo-SQL statements, then you can set the environment variable DBDELIMITER=';' before running the script and remove those clauses from the SQL.
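For example, something along these lines (a sketch; dbaccess is assumed to be on PATH and the actual invocation is left commented out):

```shell
# Make LOAD/UNLOAD default to ';' so the DELIMITER clauses can be removed
# from the SQL script.
export DBDELIMITER=';'
# dbaccess stores so-35360310.sql
echo "DBDELIMITER is now '$DBDELIMITER'"
```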

The suggestion is: for file2.csv, if you use tr you get:
[infx1210@tardis ~]$ cat file2.csv | tr '/' ';' > file.2
[infx1210@tardis ~]$ cat file.2
01732;234332;Ben Jefferson
01723;427343;Jon Doe
01732;097232;Benjamin Franklin
[infx1210@tardis ~]$
For file1.csv, if you know that the prefix is always 6 characters long, you can use:
[infx1210@tardis ~]$ cut -c7- file1.csv > file.1
[infx1210@tardis ~]$ cat file.1
234332;30,99
427343;12,59
097232;33,31
[infx1210@tardis ~]$
As you can see, the 1st field of file.1 can be joined directly against the 2nd field of file.2.
Then you can execute:
CREATE TEMP TABLE filetb1(
phonenr CHAR(30),
price CHAR(30)
);
CREATE TEMP TABLE filetb2(
prefix CHAR(30),
phonenr CHAR(30),
owner CHAR(60)
);
LOAD FROM 'file.1' DELIMITER ';' INSERT INTO filetb1;
LOAD FROM 'file.2' DELIMITER ';' INSERT INTO filetb2;
UNLOAD TO 'output.csv' DELIMITER ';'
SELECT
TRIM(f2.prefix) || '/' || TRIM(f2.phonenr),
f1.price,
f2.owner
FROM
filetb1 f1, filetb2 f2
WHERE
f1.phonenr = f2.phonenr;
And you'll get the desired output:
[infx1210@tardis ~]$ cat output.csv
01732/234332;30,99;Ben Jefferson;
01723/427343;12,59;Jon Doe;
01732/097232;33,31;Benjamin Franklin;
[infx1210@tardis ~]$
If you're not sure that the prefix in file1.csv is always 6 characters long, leave it in place and use LIKE:
CREATE TEMP TABLE filetb1(
phonenr CHAR(30),
price CHAR(30)
);
CREATE TEMP TABLE filetb2(
prefix CHAR(30),
phonenr CHAR(30),
owner CHAR(60)
);
LOAD FROM 'file.1' DELIMITER ';' INSERT INTO filetb1;
LOAD FROM 'file.2' DELIMITER ';' INSERT INTO filetb2;
UNLOAD TO 'output.csv' DELIMITER ';'
SELECT
TRIM(f2.prefix) || '/' || TRIM(f2.phonenr),
f1.price,
f2.owner
FROM
filetb1 f1, filetb2 f2
WHERE
f1.phonenr LIKE TRIM(f2.phonenr)||'%';

Related

How can I create an external table using textfile with presto?

I have a CSV file in the hdfs directory /user/bzhang/filefortable:
123,1
And I use the following to create an external table with presto in hive:
create table hive.testschema.au1 (count bigint, matched bigint) with (format='TEXTFILE', external_location='hdfs://192.168.0.115:9000/user/bzhang/filefortable');
But when I run select * from au1, I got
presto:testschema> select * from au1;
count | matched
-------+---------
NULL | NULL
I changed the comma to a TAB as the delimiter, but it still returns NULL. But if I modify the csv as
123
with only 1 column, the select * from au1 gives me:
presto:testschema> select * from au1;
count | matched
-------+---------
123 | NULL
Am I wrong about the file format, or is something else the problem?
I suppose the field delimiter of the table is '\u0001' (the Hive TEXTFILE default). You can either change the ',' in the file to '\u0001', or change the table's field delimiter to ',', and check whether that solves your problem.
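If you'd rather convert the file than the table definition, something like this should work (a sketch; the file names are hypothetical):

```shell
# Recreate the one-row sample file from the question.
printf '123,1\n' > input.csv
# \001 (Ctrl-A, i.e. \u0001) is Hive's default TEXTFILE field delimiter.
tr ',' '\001' < input.csv > converted.txt
# Show the raw bytes to confirm the delimiter change.
od -An -c converted.txt
```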

Create table with a variable name

I need to create tables on a daily basis, named with the date in the format yyMMdd. I tried this:
dbadmin=> \set table_name 'select to_char(current_date, \'yyMMdd \')'
dbadmin=> :table_name;
to_char
---------
150515
(1 row)
and then tried to create table with table name from the set parameter :table_name, but got this
dbadmin=> create table :table_name(col1 varchar(1));
ERROR 4856: Syntax error at or near "select" at character 14
LINE 1: create table select to_char(current_date, 'yyMMdd ')(col1 va...
Is there a way to store a value in a variable and then use that variable as a table name, or to force the inner SELECT statement to execute first so it yields the name I require?
Please suggest!
Try this.
For whatever reason the stored variable comes back with some whitespace, which I had to remove; also a table name cannot start with a digit, so I added a prefix like tbl_.
In short, you just need to store the output of the query, so you have to do some extra work and execute it via vsql:
\set table_name `vsql -U dbadmin -w d -t -c "select concat('tbl_',replace(to_char(current_date, 'yyMMdd'),' ',''))"`
Create table:
create table :table_name(col1 varchar(1));
(dbadmin@:5433) [dbadmin] *> \d tbl_150515
Schema | public
Table | tbl_150515
Column | col1
Type | varchar(1)
Size | 1
Default |
Not Null | f
Primary Key | f
Foreign Key |
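The same name can also be built entirely in the shell and passed to vsql, which sidesteps the variable-expansion problem; a sketch (vsql is assumed to be on PATH, so the CREATE statement is only echoed here):

```shell
# Build tbl_<yyMMdd> outside the database, e.g. tbl_150515.
table_name="tbl_$(date +%y%m%d)"
echo "CREATE TABLE ${table_name} (col1 VARCHAR(1));"
# vsql -U dbadmin -w d -c "CREATE TABLE ${table_name} (col1 VARCHAR(1));"
```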

How do I upload a key=value format file into a Hive table?

I am new to data engineering, so this might be a basic question, appreciate your help here.
I have a file which is in the following format -
first_name=A1 last_name=B1 city=Austin state=TX Zip=78703
first_name=A2 last_name=B2 city=Seattle state=WA
Note: No zip code available for the second row.
I need to upload this into Hive, in the following format:
First_name Last_name City State Zip
A1 B1 Austin TX 78703
A2 B2 Seattle WA NULL
Thanks for your help!!
I figured out a way to do this in Hive. The idea is to first load the entire data set into an n*1 table (n is the number of rows), and then parse the key names in a second step using the str_to_map function.
Step 1: Load all data into a one-column table. Use a delimiter that you are sure does not occur in your data (\002 in this case):
DROP TABLE IF EXISTS kv_001;
CREATE EXTERNAL TABLE kv_001 (
col_import string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\002'
LOCATION 's3://location/directory/';
Step 2: Using the str_to_map function, extract the keys that are needed
DROP TABLE IF EXISTS required_table;
CREATE TABLE required_table
(first_name STRING
, last_name STRING
, city STRING
, state STRING
, zip INT);
INSERT OVERWRITE TABLE required_table
SELECT
params["first_name"] AS first_name
, params["last_name"] AS last_name
, params["city"] AS city
, params["state"] AS state
, params["zip"] AS zip
FROM
(SELECT str_to_map(col_import, '\001', '=') params FROM kv_001) A;
You can transform your file with a python3 script and then upload the result to a Hive table.
Try these steps.
An example script:
import sys

for line in sys.stdin:
    line = line.split()
    res = []
    for item in line:
        res.append(item.split("=")[1])
    if len(line) == 4:
        res.append("NULL")
    print(",".join(res))
This works as long as only the zip field can be missing.
To apply it, use something like:
cat file | python3 script.py > output.csv
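If fields other than zip may also be missing, a key-lookup variant is more robust, since it finds each value by its key name instead of by position; a sketch with awk (the file names are hypothetical, and the sample data is written inline):

```shell
# Recreate the sample key=value file from the question.
printf 'first_name=A1 last_name=B1 city=Austin state=TX Zip=78703\nfirst_name=A2 last_name=B2 city=Seattle state=WA\n' > kv.txt
awk '{
    split("", m)                      # reset the key/value map for each line
    for (i = 1; i <= NF; i++) {
        split($i, kv, "=")
        m[tolower(kv[1])] = kv[2]     # tolower() also catches "Zip="
    }
    n = split("first_name last_name city state zip", cols, " ")
    for (j = 1; j <= n; j++)
        printf "%s%s", (cols[j] in m ? m[cols[j]] : "NULL"), (j < n ? "," : "\n")
}' kv.txt > output.csv
cat output.csv
```

Any missing key becomes the literal string NULL in the output, matching the behavior of the python3 script above.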
Then upload this file to hdfs using
hadoop fs -copyFromLocal ./output.csv hdfs:///tmp/
And create the table in hive using
CREATE TABLE my_table
(first_name STRING, last_name STRING, city STRING, state STRING, zip STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
LOAD DATA INPATH '/tmp/output.csv'
OVERWRITE INTO TABLE my_table;

load multiple csv into one table by SQLLDR

I am using SQL*Loader to load multiple CSV files into one table.
The process I found is very easy, like:
LOAD
DATA
INFILE '/path/file1.csv'
INFILE '/path/file2.csv'
INFILE '/path/file3.csv'
INFILE '/path/file4.csv'
APPEND INTO TABLE TBL_DATA_FILE
EVALUATE CHECK_CONSTRAINTS
REENABLE DISABLED_CONSTRAINTS
EXCEPTIONS EXCEPTION_TABLE
FIELDS TERMINATED BY ","
OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
COL0,
COL1,
COL2,
COL3,
COL4
)
But I don't want to use INFILE multiple times, because if I have more than 1000 files I would have to mention INFILE 1000 times in the control file.
So my question is: is there any other way (a loop, a *.csv wildcard) to load multiple files without using multiple INFILE clauses?
Thanks,
Bithun
Solution 1: Concatenate the 1000 files into one big file, which is then loaded by SQL*Loader. On Unix, I'd use something like
cd path
cat file*.csv > all_files.csv
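One caveat: if each CSV starts with a header row, plain cat duplicates the headers in the combined file. Stripping them while concatenating is still a one-liner; a sketch with hypothetical two-line files:

```shell
# Two sample CSVs, each with a header row.
printf 'COL0,COL1\n1,a\n' > file1.csv
printf 'COL0,COL1\n2,b\n' > file2.csv
# -q suppresses the "==> file <==" banners; -n +2 skips line 1 of each file.
tail -q -n +2 file1.csv file2.csv > all_files.csv
cat all_files.csv
```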
Solution 2: Use external tables and load the data using a PL/SQL procedure:
CREATE PROCEDURE myload AS
BEGIN
FOR i IN 1 .. 1000 LOOP
EXECUTE IMMEDIATE 'ALTER TABLE xtable LOCATION ('''||to_char(i,'FM9999')||'.csv'')';
INSERT INTO mytable SELECT * FROM xtable;
END LOOP;
END;
You can use wildcards (? for a single character, * for any number of characters) like this:
infile 'file?.csv'
;)
Loop over the files from the shell:
#!/bin/bash
for csvFile in file*.csv
do
ln -s "$csvFile" tmpFile.csv
sqlldr control=file_pointing_at_tmpFile.ctl
rm tmpFile.csv
done
OPTIONS (skip=1)
LOAD DATA
INFILE /export/home/applmgr1/chalam/Upload/*.csv
REPLACE INTO TABLE XX_TEST_FTP_UP
FIELDS TERMINATED BY ','
TRAILING NULLCOLS
(FULL_NAME,EMPLOYEE_NUMBER)
Will this pick up all the CSV files and load their data?

Removing columns in SQL file

I have a big SQL file (~200 MB) with lots of INSERT statements:
insert into `films_genres`
(`id`,`film_id`,`genre_id`,`num`)
values
(1,1,1,1),
(2,1,17,2),
(3,2,1,1),
...
How could I remove or ignore columns id, num in the script?
The easiest way might be to do the full insert into a temporary holding table and then insert the desired columns into the real table from the holding table.
insert into `films_genres_temp`
(`id`,`film_id`,`genre_id`,`num`)
values
(1,1,1,1),
(2,1,17,2),
(3,2,1,1),
...
insert into `films_genres`
(`film_id`,`genre_id`)
select `film_id`,`genre_id`
from `films_genres_temp`
CREATE TABLE #MyTempTable (id int,film_id smallint, genre_id int, num int)
INSERT INTO #MyTempTable (id,film_id,genre_id,num)
[Data goes here]
insert into films_genres (film_id,genre_id) select film_id,genre_id from #MyTempTable
drop table #MyTempTable
This Perl one-liner should do it:
perl -p -i.bak -e 's/\([^,]+,/\(/g; s/,[^,]+\)/\)/g' sqlfile
It edits the file in place, but creates a backup copy with the extension .bak.
Or if you prefer Ruby (note the second gsub needs its pattern, unlike the truncated version often pasted around):
ruby -p -i.bak -e 'gsub(/\([^,]+,/, "("); gsub(/,[^,]+\)/, ")")' sqlfile
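A sed equivalent of the same edit, for systems without Perl or Ruby; a sketch that recreates a small sqlfile inline and, like the Perl version, keeps a .bak backup:

```shell
# A miniature version of the SQL file from the question.
printf '(`id`,`film_id`,`genre_id`,`num`)\n(1,1,1,1),\n(2,1,17,2),\n' > sqlfile
# Drop the first value after each "(" and the last value before each ")".
sed -i.bak 's/([^,)]*,/(/g; s/,[^,()]*)/)/g' sqlfile
cat sqlfile
```

As with the Perl one-liner, this assumes the values themselves contain no commas or parentheses; quoted string values would need a real SQL parser.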