Complex sub-string replace query SQL - sql

I have the following table containing path info:
I need to replace the DIRECTORY_NAME value in the PATH field with the NEW_DIR_NAME value recursively.
sample table:
PATH |DIRECTORY_NAME |NEW_DIR_NAME
...............................................................................................................
\folder1\folder2\2a | folder2\2a | folder2/2a
\folder1\folder2\2a\folder3 | folder3 | folder3
\folder1\folder2\2a\folder4 | folder4 | folder4
\folder1\folder2\2a\folder4\2a\2b | 2a\2b | 2a/2b
...............................................................................................................
The result would look like this:
* changes are in bold
NEW_PATH
...............................................................................................................
\folder1\ folder2/2a
\folder1\ folder2/2a\folder3
\folder1\ folder2/2a\folder4
\folder1\ folder2/2a\folder4\ 2a/2b
...............................................................................................................
database is Oracle.
Using select replace(PATH, DIRECTORY_NAME, NEW_DIR_NAME) yields the following (not the solution):
\folder1\ folder2/2a
\folder1\folder2\2a\ folder3
\folder1\folder2\2a\ folder4
\folder1\folder2\2a\folder4\ 2a/2b

Please tell me your field name isn't really STRING. Anyways, here's the code you need, based on the supplied field names.
SELECT REPLACE(STRING,REFERENCE,REPLACE_WITH)

Your problem is your data. Your table posits a one-to-one relationship between PATH and DIRECTORY_NAME, and hence with NEW_DIR_NAME. But according to your required output this is clearly not so. The DIRECTORY_NAME appears in multiple values of PATH.
So what you need to do is run the replace() statement for every combination where DIRECTORY_NAME != NEW_DIR_NAME
begin
  for lrec in ( select DIRECTORY_NAME, NEW_DIR_NAME
                from your_table
                where DIRECTORY_NAME != NEW_DIR_NAME )
  loop
    -- apply each rename to every row, since a directory name can occur in many paths
    update your_table
    set path = replace(PATH, lrec.DIRECTORY_NAME, lrec.NEW_DIR_NAME);
  end loop;
end;
/
This is not a particularly efficient approach but presumably this is a one-off exercise.
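To see why every rename has to be applied to every path, here is a minimal sketch of the same loop outside the database. It uses Python purely for illustration; apply_renames and the sample lists are hypothetical names, and the data is copied from the question.

```python
def apply_renames(paths, renames):
    """Apply every (old, new) directory rename to every path."""
    result = list(paths)
    for old, new in renames:
        if old == new:
            continue  # nothing to change, same filter as the WHERE clause
        result = [p.replace(old, new) for p in result]
    return result


# Sample data taken from the question (raw strings keep the backslashes literal).
paths = [
    r"\folder1\folder2\2a",
    r"\folder1\folder2\2a\folder3",
    r"\folder1\folder2\2a\folder4",
    r"\folder1\folder2\2a\folder4\2a\2b",
]
renames = [
    (r"folder2\2a", "folder2/2a"),
    ("folder3", "folder3"),
    ("folder4", "folder4"),
    (r"2a\2b", "2a/2b"),
]

for new_path in apply_renames(paths, renames):
    print(new_path)
```

Like the PL/SQL loop, this is plain substring replacement, so a DIRECTORY_NAME that happens to occur inside an unrelated folder name would be rewritten as well.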

Related

Create table name in Hive using variable substitution

I'd like to create a table name in Hive using variable substitution.
E.g.
SET market = "AUS";
create table ${hiveconf:market_cd}_active as ... ;
But it fails. Any idea how it can be achieved?
You should wrap the name in backticks (``) for that, like:
SET market=AUS;
CREATE TABLE `${hiveconf:market}_active` AS SELECT 1;
DESCRIBE `${hiveconf:market}_active`;
Example run script.sql from beeline:
$ beeline -u jdbc:hive2://localhost:10000/ -n hadoop -f script.sql
Connecting to jdbc:hive2://localhost:10000/
...
0: jdbc:hive2://localhost:10000/> SET market=AUS;
No rows affected (0.057 seconds)
0: jdbc:hive2://localhost:10000/> CREATE TABLE `${hiveconf:market}_active` AS SELECT 1;
...
INFO : Dag name: CREATE TABLE `AUS_active` AS SELECT 1(Stage-1)
...
INFO : OK
No rows affected (12.402 seconds)
0: jdbc:hive2://localhost:10000/> DESCRIBE `${hiveconf:market}_active`;
...
INFO : Executing command(queryId=hive_20190801194250_1a57e6ec-25e7-474d-b31d-24026f171089): DESCRIBE `AUS_active`
...
INFO : OK
+-----------+------------+----------+
| col_name | data_type | comment |
+-----------+------------+----------+
| _c0 | int | |
+-----------+------------+----------+
1 row selected (0.132 seconds)
0: jdbc:hive2://localhost:10000/> Closing: 0: jdbc:hive2://localhost:10000/
Markovitz's criticisms are correct, but do not produce a correct solution. In summary, you can use variable substitution for things like string comparisons, but NOT for things like naming variables and tables. If you know much about language compilers and parsers, you get a sense of why this would be true. You could construct such behavior in a language like Java, but SQL is just too crude.
Running that code produces an error, "cannot recognize input near '$' '{' 'hiveconf' in table name". (I am running Hortonworks, Hive 1.2.1000.2.5.3.0-37.)
I spent a couple hours Googling and experimenting with different combinations of punctuation, different tools ranging from command line, Ambari, and DB Visualizer, etc., and I never found any way to construct a table name or a field name with a variable value. I think you're stuck with using variables in places where you need a string literal, like comparisons, but you cannot use them in place of reserved words or existing data structures, if that makes sense. By example:
--works
drop table if exists user_rgksp0.foo;
-- Does NOT work:
set MY_FILE_NAME=user_rgksp0.foo;
--drop table if exists ${hiveconf:MY_FILE_NAME};
-- Works
set REPORT_YEAR=2018;
select count(1) as stationary_event_count, day, zip_code, route_id from aaetl_dms_pub.dms_stationary_events_pub
where part_year = '${hiveconf:REPORT_YEAR}'
-- Does NOT Work:
set MY_VAR_NAME='zip_code'
select count(1) as stationary_event_count, day, '${hiveconf:MY_VAR_NAME}', route_id from aaetl_dms_pub.dms_stationary_events_pub
where part_year = 2018
The quotes around the value should be removed
You're using the wrong variable name
SET market=AUS; create table ${hiveconf:market}_active as select 1;

creating external table from compressed (gz format) files without selecting all fields

I have gz files in a folder. I need only 3 columns from these files, but each line has over 100 of them. At the moment I create a view this way.
drop table MAK_CHARGE_RCR;
create external table MAK_CHARGE_RCR
(LINE string)
STORED as SEQUENCEFILE
LOCATION '/apps/hive/warehouse/mydb.db/file_rcr';
drop view VW_MAK_CHARGE_RCR;
create view VW_MAK_CHARGE_RCR as
Select LINE[57] as CREATE_DATE, LINE[64] as SUBS_KEY, LINE[63] as RC_TERM_NAME
from
(Select split(LINE, '\\|') as LINE
from MAK_CHARGE_RCR) a;
The view has the fields I need. Now I have to do the same, but without CTAS and I am not sure how to go about it. What can I do?
I was told the table must look like this
create external table MAK_CHARGE_RCR
(CREATE_DATE string, SUBS_KEY string, RC_TERM_NAME etc)
I could split the line like this
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\\|'
but I'll need to list every column. I have another group of files with over 1000 columns. All of them I'll need to list. This just seems a bit excessive, so I wondered if it is possible to do
create external table arstel.MAK_CHARGE_RCR
(split(LINE, '\\|')[57] string,
split(LINE, '\\|')[64] string
etc)
This doesn't work, obviously, but maybe there are workarounds?
RegexSerDe
For educational purposes
P.s.
I intend to create an enhanced version of the CSV SerDe that accepts an additional parameter with the positions of the requested columns.
Demo
bash
echo {a..c}{1..100} | xargs -n 100 | tr ' ' '|' | \
hdfs dfs -put - /user/hive/warehouse/mytable/data.txt
hive
create external table mytable
(
col58 string
,col64 string
,col65 string
)
row format serde 'org.apache.hadoop.hive.serde2.RegexSerDe'
with serdeproperties ("input.regex" = "^(?:([^|]*)\\|){58}(?:([^|]*)\\|){6}([^|]*)\\|.*$")
stored as textfile
location '/user/hive/warehouse/mytable'
;
select * from mytable
;
+---------------+---------------+---------------+
| mytable.col58 | mytable.col64 | mytable.col65 |
+---------------+---------------+---------------+
| a58 | a64 | a65 |
| b58 | b64 | b65 |
| c58 | c64 | c65 |
+---------------+---------------+---------------+
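If you want to check what the regex captures before creating the table, here is a small sketch in Python (used only for illustration; pick_columns is a hypothetical helper). It relies on the fact that a repeated capturing group keeps the text of its last repetition, which is also how the pattern above ends up with fields 58, 64 and 65. The pattern is the same as in "input.regex", just without the extra backslash escaping that the Hive string literal needs.

```python
import re

# Same pattern as "input.regex" above, written as a Python raw string.
PATTERN = re.compile(r"^(?:([^|]*)\|){58}(?:([^|]*)\|){6}([^|]*)\|.*$")


def pick_columns(line):
    """Return the three captured fields, or None if the line does not match."""
    match = PATTERN.match(line)
    return match.groups() if match else None


# One line shaped like the demo data: a1|a2|...|a100
line = "|".join(f"a{i}" for i in range(1, 101))
print(pick_columns(line))
```

A line with fewer than 66 pipe-separated fields does not match at all, so pick_columns returns None for it.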

SQL query to change file extension in a record containing a file-path?

Given an SQL table Table with a column path, how can I modify values like /dir/subdir/file.aaa => /dir/subdir/file.bbb e.g. modify just the file-extension without having to hard-code the specific file/path into my query?
Seems a perfect fit for regexp_replace :
with t as (select '/dir/subdir/file.aaa' as path from dual
union all select '/dir/subdir.aaa/file.aaa' from dual)
select regexp_replace(path, '[.][^.]*$', '.bbb') path
-- ^^^^ ^^^^^^^^^ ^^^^
-- replace last dot-whatever by the "right" extension
from t
where path like '%.aaa'
-- ^^^^^
-- only for path ending with the "wrong" extension
See http://sqlfiddle.com/#!4/d41d8/37017 for some tests
If the column only contains values that are structured like a file path, the following will work:
update the_table
set path = replace(path, '.aaa', '.bbb')
where path like '%.aaa';
Note that this will also update a value like /dir/subdir.aaa/file.aaa to /dir/subdir.bbb/file.bbb.
Another option is to use a regular expression:
update foo
set file_path = regexp_replace(file_path, '\.aaa$', '.bbb', 1, 0, 'i')
where lower(file_path) like '%.aaa';
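The difference between the plain replace and the anchored regular expression is easy to try outside the database. This is a rough sketch in Python, not Oracle code; both function names are made up for the example.

```python
import re


def replace_everywhere(path, old_ext, new_ext):
    """Plain substring replace: changes every occurrence of old_ext."""
    return path.replace(old_ext, new_ext)


def replace_last_extension(path, new_ext):
    """Replace only the final dot-something, like '[.][^.]*$' above."""
    # A function is used as the replacement so new_ext is taken literally.
    return re.sub(r"\.[^.]*$", lambda _match: new_ext, path)


path = "/dir/subdir.aaa/file.aaa"
print(replace_everywhere(path, ".aaa", ".bbb"))
print(replace_last_extension(path, ".bbb"))
```

Only the second version leaves the subdir.aaa directory alone, which is the reason to prefer the anchored pattern when directory names can contain the extension.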

Find exact file name from the file name with complete path stored in database

I need to find an exact file name by executing an SQL query on the table containing the file_name column. The file_name column stores the complete path of each file, like D:/Workspace/app.js
I can find app.js with this query:
SELECT *
FROM details
WHERE file_name LIKE '%app.js'
but the problem is if I write the query like
SELECT *
FROM details
WHERE file_name LIKE '%p.js'
it lists the app.js file as well. So could anyone guide me on how to get an exact match for a file name from the database if file names are stored with the complete path?
Thanks in advance.
How about this?
SELECT * FROM details WHERE file_name LIKE '%/app.js' OR file_name LIKE '%\app.js'
The "%" sign is a wildcard that stands for any run of characters. Because the pattern now requires a path separator directly before app.js, a name such as xxxxapp.js no longer matches.
Thanks to all of you ,
I got the result I wanted
sql = "SELECT * FROM details WHERE file_name RLIKE ?";
ps = conn.prepareStatement(sql);
ps.setString(1, "[[:<:]]"+fname+"[[:>:]]");
This matches the exact string that the fname variable contains.
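For comparison, the same idea (compare only the last path component instead of a suffix) can be sketched on the application side. This is an illustrative Python snippet, not the code from the thread; file_name and is_exact_match are hypothetical helpers, and it assumes paths use either / or \ as the separator.

```python
import re


def file_name(path):
    """Return the part after the last / or \\ separator."""
    return re.split(r"[\\/]", path)[-1]


def is_exact_match(path, wanted):
    """True only when the file name itself equals `wanted`."""
    return file_name(path) == wanted


stored = "D:/Workspace/app.js"
print(stored.endswith("p.js"))           # suffix test, what LIKE '%p.js' does
print(is_exact_match(stored, "p.js"))    # compares the whole file name
print(is_exact_match(stored, "app.js"))
```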

How to format Oracle SQL text-only select output

I am using Oracle SQL (in SQLDeveloper, so I don't have access to SQLPLUS commands such as COLUMN) to execute a query that looks something like this:
select assigner_staff_id as staff_id, active_flag, assign_date,
complete_date, mod_date
from work where assigner_staff_id = '2096';
The results it gives me look something like this:
STAFF_ID ACTIVE_FLAG ASSIGN_DATE COMPLETE_DATE MOD_DATE
---------------------- ----------- ------------------------- ------------------------- -------------------------
2096 F 25-SEP-08 27-SEP-08 27-SEP-08 02.27.30.642959000 PM
2096 F 25-SEP-08 25-SEP-08 25-SEP-08 01.41.02.517321000 AM
2 rows selected
This can very easily produce a very wide and unwieldy textual report when I'm trying to paste the results as a nicely formatted quick-n-dirty text block into an e-mail or problem report, etc. What's the best way to get rid of all the extra white space in the output columns when I'm using just plain-vanilla Oracle SQL? So far my web searches haven't turned up much, as all the results show me how to do it using formatting commands like COLUMN in SQLPLUS (which I don't have).
In your statement, you can specify the type of output you're looking for:
select /*csv*/ col1, col2 from table;
select /*Delimited*/ col1, col2 from table;
there are other formats available such as xml, html, text, loader, etc.
You can change the formatting of these particular options under tools > preferences > Database > Utilities > Export
Be sure to choose Run Script rather than Run Statement.
* this is for Oracle SQL Developer v3.2
What are you using to get the results? The output you pasted looks like it's coming from SQL*PLUS. It may be that whatever tool you are using to generate the results has some method of modifying the output.
By default Oracle outputs columns based upon the width of the title or the width of the column data, whichever is wider.
If you want to make columns smaller you will need to either rename them or convert them to text and use substr() to make the defaults smaller.
select substr(assigner_staff_id, 1, 8) as staff_id,
active_flag as Flag,
to_char(assign_date, 'DD/MM/YY'),
to_char(complete_date, 'DD/MM/YY'),
mod_date
from work where assigner_staff_id = '2096';
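If the rows end up in a script anyway, the "wider of the title and the data" rule is simple to apply yourself, using the actual data rather than the declared column size. A minimal sketch in Python, assuming the rows have already been fetched as tuples; format_rows is a hypothetical helper and the sample values come from the question.

```python
def format_rows(headers, rows):
    """Render rows as text, each column as wide as its longest value or title."""
    cells = [[str(value) for value in row] for row in rows]
    widths = [
        max([len(header)] + [len(row[i]) for row in cells])
        for i, header in enumerate(headers)
    ]

    def render(values):
        padded = (value.ljust(width) for value, width in zip(values, widths))
        return " ".join(padded).rstrip()

    lines = [render(headers), render(["-" * width for width in widths])]
    lines.extend(render(row) for row in cells)
    return "\n".join(lines)


headers = ["STAFF_ID", "FLAG", "ASSIGN_DATE"]
rows = [("2096", "F", "25-SEP-08"), ("2096", "F", "25-SEP-08")]
print(format_rows(headers, rows))
```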
What you can do with sql is limited by your tool. SQL Plus has commands to format the columns but they are not real easy to use.
One quick approach is to paste the output into excel and format it there or just attach the spreadsheet. Some tools will save the output directly as a spreadsheet.
Nice question. I really had to think about it.
One thing you could do is change your SQL so that it only returns the narrowest usable columns.
e.g. (I'm not very hot on oracle syntax, but something similar should work):
select substring( convert(varchar(4), assigner_staff_id), 1, 4 ) as id,
active_flag as act, -- use shorter column name
-- etc.
from work where assigner_staff_id = '2096';
Does that make sense?
If you were doing this on unix/linux, I would suggest running it from the command line and piping it through an awk script.
If I've misunderstood, then please update your question and I'll have another go :)
If you don't have a lot of rows returned, I'll often use Tom Kyte's print_table function.
SQL> set serveroutput on
SQL> execute print_table('select * from all_objects where rownum < 3');
OWNER : SYS
OBJECT_NAME : /1005bd30_LnkdConstant
SUBOBJECT_NAME :
OBJECT_ID : 27574
DATA_OBJECT_ID :
OBJECT_TYPE : JAVA CLASS
CREATED : 22-may-2008 11:41:13
LAST_DDL_TIME : 22-may-2008 11:41:13
TIMESTAMP : 2008-05-22:11:41:13
STATUS : VALID
TEMPORARY : N
GENERATED : N
SECONDARY : N
-----------------
OWNER : SYS
OBJECT_NAME : /10076b23_OraCustomDatumClosur
SUBOBJECT_NAME :
OBJECT_ID : 22390
DATA_OBJECT_ID :
OBJECT_TYPE : JAVA CLASS
CREATED : 22-may-2008 11:38:34
LAST_DDL_TIME : 22-may-2008 11:38:34
TIMESTAMP : 2008-05-22:11:38:34
STATUS : VALID
TEMPORARY : N
GENERATED : N
SECONDARY : N
-----------------
PL/SQL procedure successfully completed.
SQL>
If it's lots of rows, I'll just do the query in SQL Developer and save as xls; businessy types love Excel for some reason.
Why not just use the "cast" function?
select
(cast(assigner_staff_id as VARCHAR2(4))) AS STAFF_ID,
(cast(active_flag as VARCHAR2(1))) AS A,
(cast(assign_date as VARCHAR2(10))) AS ASSIGN_DATE,
(cast(COMPLETE_date as VARCHAR2(10))) AS COMPLETE_DATE,
(cast(mod_date as VARCHAR2(10))) AS MOD_DATE
from work where assigner_staff_id = '2096';