Use different line separator in awk - awk

I have a file as follows:
cat file
00:29:01|10.3.57.60|dbname1| SELECT
re.id,
re.event_type_cd,
re.event_ts,
re.source_type,
re.source_id,
re.properties
FROM
table1 re
WHERE
re.id > 621982999
AND re.id <= 884892348
ORDER BY
re.id
^
00:01:00|10.3.56.101|dbname2|BEGIN;declare "SQL_CUR00000000009CE140" cursor for SELECT id, cast(event_type_cd as character(4)) event_type_cd, CAST(event_ts AS DATE) event_ts, CAST(source_id AS character varying(100)) source_id, CAST(tx_id AS character varying(100)) tx_id, CAST(properties AS character varying(4000)) properties, CAST(source_type AS character(1)) source_type FROM table1 WHERE ID > 514725989 ORDER BY ID limit 500000;fetch 500000 in "SQL_CUR00000000009CE140"^
This is the output of SQL queries, with fields delimited by a pipe (|). To mark where each row ends, I put a ^ at the end of every row.
I want to get the output as:
1/00:29:01|10.3.57.60|parasol_ams| SELECT
re.id,
re.event_type_cd,
re.event_ts,
re.source_type,
re.source_id,
re.properties
FROM
table1 re
WHERE
re.id > 621982999
AND re.id <= 884892348
ORDER BY
re.id
2/00:01:00|10.3.56.101|parasol_sprint_tep|BEGIN;declare "SQL_CUR00000000009CE140" cursor for SELECT id, cast(event_type_cd as character(4)) event_type_cd, CAST(event_ts AS DATE) event_ts, CAST(source_id AS character varying(100)) source_id, CAST(tx_id AS character varying(100)) tx_id, CAST(properties AS character varying(4000)) properties, CAST(source_type AS character(1)) source_type FROM table1 WHERE ID > 514725989 ORDER BY ID limit 500000;fetch 500000 in "SQL_CUR00000000009CE140"
But when I run:
cat file | awk -F '|' -v RS="^" '{ print FNR "/" $0 }'
I get:
1/00:29:01|10.3.57.60|parasol_ams| SELECT
re.id,
re.event_type_cd,
re.event_ts,
re.source_type,
re.source_id,
re.properties
FROM
table1 re
WHERE
re.id > 621982999
AND re.id <= 884892348
ORDER BY
re.id
2/
00:01:00|10.3.56.101|parasol_sprint_tep|BEGIN;declare "SQL_CUR00000000009CE140" cursor for SELECT id, cast(event_type_cd as character(4)) event_type_cd, CAST(event_ts AS DATE) event_ts, CAST(source_id AS character varying(100)) source_id, CAST(tx_id AS character varying(100)) tx_id, CAST(properties AS character varying(4000)) properties, CAST(source_type AS character(1)) source_type FROM table1 WHERE ID > 514725989 ORDER BY ID limit 500000;fetch 500000 in "SQL_CUR00000000009CE140"
3/

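One option is to skip the lines that contain only ^, strip a trailing ^, and prefix every line that contains a | with a running counter (this assumes no continuation line of the SQL contains a |):
awk '/^\^$/{next} /\|/{sub(/\^$/,""); sub(/^/,++c"/")} 1' file
Another option is to keep RS='^', remove the newline left at the start or end of each record, and skip the empty record after the last ^:
awk -v RS='^' -F '|' '{gsub(/^\n|\n$/,"")} NF{printf "%s/TIME:%s HOST:%s DB:%s SQL:%s\n",FNR,$1,$2,$3,$4}' file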
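The stray 2/ and 3/ lines in the RS="^" attempt appear because the newline that follows each ^ becomes the start of the next record. As a language-neutral illustration, here is a minimal Python sketch of the same record-splitting logic. The function name number_records and the shortened sample string are made up for this example and do not come from the original answers.

```python
def number_records(text, sep="^"):
    """Split text on sep and prefix each non-empty record with its 1-based index."""
    numbered = []
    for raw in text.split(sep):
        # The newline after each separator ends up at the start of the next
        # record; strip it (and a trailing one) so the prefix stays on the
        # same line as the data.
        record = raw.strip("\n")
        if not record:
            # The text after the final separator is only a newline: skip it.
            continue
        numbered.append(f"{len(numbered) + 1}/{record}")
    return "\n".join(numbered)


# Shortened stand-in for the file in the question (hypothetical sample).
sample = (
    "00:29:01|10.3.57.60|dbname1| SELECT\nre.id\n^\n"
    "00:01:00|10.3.56.101|dbname2|BEGIN;^\n"
)
result = number_records(sample)
print(result)
```

Stripping the newline and skipping the empty final record are the same two fixes the awk version needs.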


Replace string with random text - Oracle SQL

I have a table table1 with 1 column - edi_value which is of type CLOB.
These are the entries:
seq edi_message
1 ISA*00* *00* *08*9254110060 *ZZ*123456789 *041216*0805*U*00501*000095071*0*P*>~
GS*AG*5137624388*123456789*20041216*0805*95071*X*005010~
ST*824*021390001*005010X186A1~
2 ISA*00* *00* *08*56789876678 *ZZ*123456789 *041216*0805*U*00501*000095071*0*P*>~
GS*AG*5137624388*123456789*20041216*0805*95071*X*005010~
ST*824*021390001*005010X186A1~
Note that the number of lines per message varies, from 3 to 500.
The result must satisfy the following conditions:
Leave the text before the first * on every line unchanged; for example, GS and ST must not change. Only the text after the first * should be randomized.
Replace digits [0-9] with random digits; for example, if 0 is replaced with 1, it must be replaced with 1 throughout.
Replace letters [A-Za-z] with random letters; for example, if A is replaced with W, it must be replaced with W throughout.
Leave special characters as they are.
Each character or digit must map to exactly one random character or digit.
Output can be:
seq edi_message
1 ISA*11* *11* *13*4030111101 *QQ*102030234 *101010*1313*U*11311*111143121*1*V*>~
GS*WE*3122000233*102030234*01101010*1313*43121*X*113111~
ST*300*101241111*113111X130A1~
2 ISA*11* *11* *13*30234320023 *QQ*102030234 *101010*1313*U*11311*111143121*1*V*>~
GS*WE*3122000233*102030234*01101010*1313*43121*X*113111~
ST*300*101241111*113111X130W1~
How can this be achieved in Oracle SQL?
You can use TRANSLATE with a helper function that generates random strings (though @LukStorms has a much neater SQL solution for that using LISTAGG), along with a method to tokenise the values and then re-concatenate them into lines (I use a pure SQL method here for demonstration):
create or replace function f(p_low integer, p_high integer)
return varchar as
r varchar(2000) := '';
x integer;
begin
for i in p_low..p_high loop
x := dbms_random.value(0,length(r)+1);
r := substr(r,1,x)||chr(i)||substr(r,x+1);
end loop;
return r;
end;
/
select * from table1;
| EDI_VALUE |
| :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| ISA*00* *00* *08*9254110060 *ZZ*123456789 *041216*0805*U*00501*000095071*0*P*>~<br> GS*AG*5137624388*123456789*20041216*0805*95071*X*005010~<br> ST*824*021390001*005010X186A1~ |
| ISA*00* *00* *08*56789876678 *ZZ*123456789 *041216*0805*U*00501*000095071*0*P*>~<br> GS*AG*5137624388*123456789*20041216*0805*95071*X*005010~<br> ST*824*021390001*005010X186A |
with t as (select f(48,57)||f(65,90) translate_chars from dual)
select (select new_value
from (select substr(sys_connect_by_path(r_line,'
'),2) new_value, connect_by_isleaf isleaf
from (select lvl
, substr(line,1,instr(line,'*')-1)||
translate(substr(line,instr(line,'*'))
,'0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'
,(select translate_chars from t)) r_line
from (select level lvl
, regexp_substr(edi_value,'^.*$',1,level,'m') line
from (select table1.edi_value from dual)
connect by level <= regexp_count(edi_value,'^.*$',1,'m')))
start with lvl=1 connect by lvl=(prior lvl)+1)
where isleaf=1)
from table1;
| (SELECTNEW_VALUEFROM(SELECTSUBSTR(SYS_CONNECT_BY_PATH(R_LINE,''),2)NEW_VALUE,CONNECT_BY_ISLEAFISLEAFFROM(SELECTLVL,SUBSTR(LINE,1,INSTR(LINE,'*')-1)||TRANSLATE(SUBSTR(LINE,INSTR(LINE,'*')),'0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ',(SELECTTRANSLATE_CHARSFR |
| :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| ISA*66* *66* *67*1935006626 *VV*098532471 *650902*6763*K*66360*666613640*6*P*>~<br> GS*GZ*3084295877*098532471*96650902*6763*13640*I*663606~<br> ST*795*690816660*663606I072G0~ |
| ISA*66* *66* *67*32471742247 *VV*098532471 *650902*6763*K*66360*666613640*6*P*>~<br> GS*GZ*3084295877*098532471*96650902*6763*13640*I*663606~<br> ST*795*690816660*663606I072G |
You can use CTEs with CONNECT BY to generate the strings of digits and letters.
Then use the ordered and the scrambled strings as the "from" and "to" arguments of TRANSLATE.
A CROSS APPLY can be used to split the message into parts with a regex.
Translate only the parts that start with a *.
Finally, use LISTAGG to glue the parts back together.
WITH
NUMS as
(
select
LISTAGG(n, '') WITHIN GROUP (ORDER BY n) as n_from,
LISTAGG(n, '') WITHIN GROUP (ORDER BY DBMS_RANDOM.VALUE) as n_to
from (select level-1 n from dual connect by level <= 10)
),
LETTERS as
(
select
LISTAGG(c, '') WITHIN GROUP (ORDER BY c) as c_from,
LISTAGG(c, '') WITHIN GROUP (ORDER BY DBMS_RANDOM.VALUE) as c_to
from (select chr(ascii('A')+level-1 ) c from dual connect by level <= 26)
)
SELECT ca.scrambled as scrambled_message
FROM table1 t
CROSS JOIN NUMS
CROSS JOIN LETTERS
CROSS APPLY
(
SELECT LISTAGG(CASE WHEN part like '*%' then translate(part, n_from||c_from, n_to||c_to) else part end, '') WITHIN GROUP (ORDER BY lvl) as scrambled
FROM
(
SELECT
level AS lvl,
REGEXP_SUBSTR(t.edi_message,'[*]\S+|[^*]+',1,level,'m') AS part
FROM dual
CONNECT BY level <= regexp_count(t.edi_message, '[*]\S+|[^*]+')+1
) parts
) ca;
Example output:
SCRAMBLED_MESSAGE
-----------------------------------------------------------------------------------------------------------
ISA*99* *99* *92*3525999959 *PP*950525023 *959595*9292*A*99299*999932909*9*J*>~
GS*WQ*2900555022*950525023*59959595*9292*32909*I*992999~
ST*255*959039999*992999I925V9~
ISA*99* *99* *92*25023205502 *PP*950525023 *959595*9292*A*99299*999932909*9*J*>~
GS*WQ*2900555022*950525023*59959595*9292*32909*I*992999~
ST*255*959039999*992999I925W9~
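The core idea in both answers is a one-to-one character mapping applied only after the first * on each line. The following is a minimal Python sketch of that idea, not a translation of either query. The names build_table and scramble and the shortened sample message are assumptions made for illustration, and the sketch also covers lowercase letters, which the SQL above does not.

```python
import random
import string


def build_table(rng):
    """Build a str.translate table that permutes digits and letters within their own class."""
    table = {}
    for alphabet in (string.digits, string.ascii_uppercase, string.ascii_lowercase):
        shuffled = list(alphabet)
        rng.shuffle(shuffled)
        # A shuffled copy of the alphabet gives a one-to-one mapping.
        table.update(str.maketrans(alphabet, "".join(shuffled)))
    return table


def scramble(message, table):
    """Translate everything after the first * on each line; keep the prefix as is."""
    lines = []
    for line in message.split("\n"):
        head, star, tail = line.partition("*")
        lines.append(head + star + tail.translate(table))
    return "\n".join(lines)


table = build_table(random.Random())
# Shortened, hypothetical sample in the style of the question's data.
message = "ISA*00* *08*9254110060 *ZZ*U*>~\nGS*AG*5137624388*X*005010~"
scrambled = scramble(message, table)
print(scrambled)
```

Because each alphabet is shuffled as a whole, every source character maps to exactly one target and no two sources share a target. A character can still map to itself by chance.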

Find way for gathering data and replace with values from another table

I am looking for an Oracle SQL query that finds IDs matching a specific pattern and replaces them with values from another table.
Scenario:
Table 1:
No column1
-----------------------------------------
12345 user:12345;group:56789;group:6785;...
Note: column1 may contain one or more such patterns.
Table2 :
Id name type
----------------------
12345 admin user
56789 testgroup group
The result should look like this:
No column1
-----------------------------------
12345 user:admin;group:testgroup
Logic:
First, split the concatenated string into individual rows using a CONNECT BY clause and a regex.
Then join the resulting rows (split_tab) with Table2 (tab2).
Finally, use the LISTAGG function to concatenate the values back into one column.
Query:
WITH tab1 AS
( SELECT '12345' NO
,'user:12345;group:56789;group:6785;' column1
FROM DUAL )
,tab2 AS
( SELECT 12345 id
,'admin' name
,'user' TYPE
FROM DUAL
UNION
SELECT 56789 id
,'testgroup' name
,'group' TYPE
FROM DUAL )
SELECT no
,listagg(category||':'||name,';') WITHIN GROUP (ORDER BY tab2.id) column1
FROM ( SELECT NO
,REGEXP_SUBSTR( column1, '(\d+)', 1, LEVEL ) id
,REGEXP_SUBSTR( column1, '([a-z]+)', 1, LEVEL ) CATEGORY
FROM tab1
CONNECT BY LEVEL <= regexp_count( column1, '\d+' ) ) split_tab
,tab2
WHERE split_tab.id = tab2.id
GROUP BY no
Output:
No Column1
12345 user:admin;group:testgroup
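The split, look up, and re-join steps described above can be sketched outside SQL as follows. This is an illustrative Python version under stated assumptions: the function name replace_ids, the dictionary keyed by (type, id), and the keep_unmatched flag are all invented for this example.

```python
def replace_ids(column1, lookup, keep_unmatched=False):
    """Replace the id in each 'type:id' token with the name found in lookup."""
    parts = []
    for token in column1.split(";"):
        if not token:
            continue  # a trailing ';' produces an empty token
        kind, _, raw_id = token.partition(":")
        name = lookup.get((kind, raw_id))
        if name is not None:
            parts.append(f"{kind}:{name}")
        elif keep_unmatched:
            # No substitute available: keep the original token unchanged.
            parts.append(token)
    return ";".join(parts)


# Hypothetical stand-in for Table2, keyed by (type, id).
lookup = {("user", "12345"): "admin", ("group", "56789"): "testgroup"}
row = "user:12345;group:56789;group:6785;"
print(replace_ids(row, lookup))
print(replace_ids(row, lookup, keep_unmatched=True))
```

Dropping unmatched tokens mirrors the inner join in the query above, which is why group:6785 disappears from its output. With keep_unmatched=True the token is kept as is, which is the behaviour of a left outer join.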
with t1 (no, col) as
(
-- start of test data
select 1, 'user:12345;group:56789;group:6785;' from dual union all
select 2, 'user:12345;group:56789;group:6785;' from dual
-- end of test data
)
-- the lookup table which has the substitute strings
-- nid : concatenation of name and id as in table t1 which requires the lookup
-- tname : required substitute for each nid
, t2 (id, name, type, nid, tname) as
(
select t.*, type || ':' || id, type || ':' || name from
(
select 12345 id, 'admin' name, 'user' type from dual union all
select 56789, 'testgroup', 'group' from dual
) t
)
--select * from t2;
-- cte table calculates the indexes for the substrings (eg, user:12345)
-- no : sequence no in t1
-- col : the input string in t1
-- si : starting index of each substring in the 'col' input string that needs attention later
-- ei : ending index of each substring in the 'col' input string
-- idx : the order of substring to put them together later
,cte (no, col, si, ei, idx) as
(
select no, col, 1, case when instr(col,';') = 0 then length(col)+1 else instr(col,';') end, 1 from t1 union all
select no, col, ei+1, case when instr(col,';', ei+1) = 0 then length(col)+1 else instr(col,';', ei+1) end, idx+1 from cte where ei + 1 <= length(col)
)
,coll(no, col, sstr, idx, newstr) as
(
select
a.no, a.col, a.sstr, a.idx,
-- when a substitute is not found in t2, use the same input substring (eg. group:6785)
case when t2.tname is null then a.sstr else t2.tname end
from
(select cte.*, substr(col, si, ei-si) as sstr from cte) a
-- we don't want to miss if there is no substitute available in t2 for a substring
left outer join
t2
on (a.sstr = t2.nid)
)
select no, col, listagg(newstr, ';') within group (order by no, col, idx) from coll
group by no, col;

hive error, getting an EOF error in subtract query

I am getting the error: missing EOF at '-' near HAB. The query looks correct for the most part; I am just not sure how to do subtraction in Hive.
SELECT
a.playerID AS ID,
a.yearID AS yearID,
(b.HAB - a.EG) AS HAB-EG
FROM
(SELECT
playerID,
yearID,
(E/G) AS EG
FROM fielding
WHERE (
yearID > 2005
AND yearID < 2009
AND G > 20
)
) AS a
JOIN
(SELECT
id,
year,
(hits/ab) AS HAB
FROM batting
WHERE(
year > 2005
AND year < 2009
AND ab > 40
)
) AS b ON a.playerID = b.id AND a.yearID = b.year;
Alias names must be quoted with backtick characters (`) when they contain any special character such as a space or a dash.
So use the following:
SELECT
a.playerID AS ID,
a.yearID AS yearID,
(b.HAB - a.EG) AS `HAB-EG`

find a line based on string pattern and move it X places

I want to move the first instance of the AND F.col_x IS NOT NULL pattern that follows the WHERE F.col_x = D.col_x pattern (in each of the blocks 23, 24, 25, etc.) in either of these ways:
--Look for the closing bracket just before GROUP BY and add the line there
--Alternatively, move that line one line below where it was taken from
Either way the results would be the same.
INPUT
*22
Select ((MYLILFUNC(F.col_x,-99999))) AS WIDTH,
COUNT(*) AS SIZE
FROM MYDB.BGSQLTB F where NOT EXISTS ( sel '1' from MYDB.col_x D
WHERE F.col_x = D.col_x
AND F.col_x IS NOT NULL
AND D.col_x IS NOT NULL )
GROUP BY F.col_x;
*23
Select ((MYLILFUNC(F.COL_y,-99999))) AS WIDTH,
COUNT(*) AS SIZE
FROM MYDB.BGSQLTB F where NOT EXISTS ( sel '1' from MYDB.DIM_DRG_CODE D
WHERE F.COL_y = D.COL_y
AND F.COL_y IS NOT NULL
AND D.COL_y IS NOT NULL )
GROUP BY F.COL_y;
*24
Select ((MYLILFUNC(F.COL_Z,-99999))) AS WIDTH,
COUNT(*) AS SIZE
FROM MYDB.BGSQLTB F where NOT EXISTS ( sel '1' from MYDB.COL_Z D
WHERE F.COL_Z = D.COL_Z
AND F.COL_Z IS NOT NULL
AND D.COL_Z IS NOT NULL )
GROUP BY F.COL_Z;
*25
Select ((MYLILFUNC(F.COL_XXX,-99999))) AS WIDTH,
COUNT(*) AS SIZE
FROM MYDB.BGSQLTB F where NOT EXISTS ( sel '1' from MYDB.COL_XX D
WHERE F.COL_XXX = D.COL_XXX
AND F.COL_XXX IS NOT NULL
AND D.COL_XXX IS NOT NULL )
GROUP BY F.COL_XXX;
OUTPUT
*22
Select ((MYLILFUNC(F.col_x,-99999))) AS WIDTH,
COUNT(*) AS SIZE
FROM MYDB.BGSQLTB F where NOT EXISTS ( sel '1' from MYDB.col_x D
WHERE F.col_x = D.col_x
AND D.col_x IS NOT NULL ) AND F.col_x IS NOT NULL
GROUP BY F.col_x;
*23
Select ((MYLILFUNC(F.COL_y,-99999))) AS WIDTH,
COUNT(*) AS SIZE
FROM MYDB.BGSQLTB F where NOT EXISTS ( sel '1' from MYDB.DIM_DRG_CODE D
WHERE F.COL_y = D.COL_y
AND D.COL_y IS NOT NULL ) AND F.COL_y IS NOT NULL
GROUP BY F.COL_y;
*24
Select ((MYLILFUNC(F.COL_Z,-99999))) AS WIDTH,
COUNT(*) AS SIZE
FROM MYDB.BGSQLTB F where NOT EXISTS ( sel '1' from MYDB.COL_Z D
WHERE F.COL_Z = D.COL_Z
AND D.COL_Z IS NOT NULL ) AND F.COL_Z IS NOT NULL
GROUP BY F.COL_Z;
*25
Select ((MYLILFUNC(F.COL_XXX,-99999))) AS WIDTH,
COUNT(*) AS SIZE
FROM MYDB.BGSQLTB F where NOT EXISTS ( sel '1' from MYDB.COL_XX D
WHERE F.COL_XXX = D.COL_XXX
AND D.COL_XXX IS NOT NULL ) AND F.COL_XXX IS NOT NULL
GROUP BY F.COL_XXX;
My search pattern in ed is a bit too wide and matches more lines than intended, and I am not sure how to implement the move because it is relative to each selected line.
You can do this in a couple of ways. With sed you can do
sed -e '/AND F\.[a-zA-Z_]* *IS *NOT *NULL/ { h; d }; /GROUP BY/ { H; x }'
Whenever the first regular expression matches, the { h; d } commands store the line in the hold space and start the next cycle without printing anything. Whenever the second regular expression matches, { H; x } appends the current line to the hold space, with a newline in between, and then swaps the hold space and the pattern space. sed then prints the pattern space automatically, so the held line comes out directly before the GROUP BY line. This can easily go wrong on other input (for example, a second matching line before GROUP BY overwrites the first one in the hold space), but it works fine for the sample you provided.
In awk it would be
awk 'tolower($0) ~ /and f.col_[a-z]* is not null/ {save = $0; next} /GROUP BY/ { print save } {print}'
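The hold-and-release technique used by both one-liners can be written out step by step. The following Python sketch is only an illustration of that technique. The function name move_before_group_by and the regular expression are assumptions modelled on the sed address above, and the sample block is a shortened copy of block 22.

```python
import re

# Same idea as the sed address: an "AND F.<column> IS NOT NULL" line.
HELD_LINE = re.compile(r"AND\s+F\.\w+\s+IS\s+NOT\s+NULL", re.IGNORECASE)


def move_before_group_by(lines):
    """Hold back the matching line and emit it just before the next GROUP BY line."""
    held = None
    result = []
    for line in lines:
        if HELD_LINE.search(line):
            held = line  # like sed's h; d: remember the line, print nothing
            continue
        if "GROUP BY" in line and held is not None:
            result.append(held)  # release the held line first
            held = None
        result.append(line)
    return result


block = [
    "WHERE F.col_x = D.col_x",
    "AND F.col_x IS NOT NULL",
    "AND D.col_x IS NOT NULL )",
    "GROUP BY F.col_x;",
]
moved = move_before_group_by(block)
print("\n".join(moved))
```

Like the sed and awk commands, this puts the held line on its own line before GROUP BY instead of appending it to the line that ends with the closing bracket. Unlike the sed version, it prints nothing extra when a GROUP BY line arrives with no line held.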

Postgres: Could not identify an ordering operator for type unknown

I have this query in a prepared statement:
SELECT * FROM ONLY service_services
UNION ALL
SELECT * FROM fleet.service_services
WHERE deleted=false
ORDER BY $1
LIMIT $2
I pass name ASC as $1 and 10 as $2.
For some reason I am getting this error:
could not identify an ordering operator for type unknown
If I hard code the name ASC instead of $1, like this:
SELECT * FROM ONLY service_services
UNION ALL
SELECT * FROM fleet.service_services
WHERE deleted=false
ORDER BY name ASC
LIMIT $1
it works fine.
What am I doing wrong?
For one column you can use CASE WHEN to parametrize it:
SELECT * FROM ONLY service_services
UNION ALL
SELECT * FROM fleet.service_services
WHERE deleted=false
ORDER BY
CASE WHEN $1 = 'name' THEN name
WHEN $1 = 'col_name' THEN col_name
ELSE ...
END
LIMIT $2;
or:
SELECT * FROM ONLY service_services
UNION ALL
SELECT * FROM fleet.service_services
WHERE deleted=false
ORDER BY
CASE $1
WHEN 'name' THEN name
WHEN 'col_name' THEN col_name
ELSE column_name -- default sorting
END
LIMIT $2;
When using CASE you may need to cast the columns to the same data type to avoid implicit conversion errors.
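EDIT: ORDER BY on the result of a UNION accepts only output column names, not expressions, so wrap the UNION in a subquery and sort the outer query: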
SELECT sub.*
FROM (
SELECT * FROM ONLY service_services
UNION ALL
SELECT * FROM fleet.service_services
WHERE deleted=false
) As sub
ORDER BY
CASE $1
WHEN 'name' THEN name
WHEN 'col_name' THEN col_name
ELSE column_name -- default sorting
END
LIMIT $2;
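You can't pass a column name (or a sort direction) as a bind parameter, because parameters are always treated as values. With $1 set to name ASC, the query tries to sort by a constant string of unknown type, hence the error. To sort by an arbitrary column you have to build the query dynamically.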
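If you do build the query dynamically, the column name and direction should be checked against a fixed whitelist before they are placed into the SQL text, since they cannot be bound as parameters. Here is a minimal client-side sketch in Python. The function name build_query and the whitelist contents are hypothetical, and the returned string still uses a bind parameter for LIMIT.

```python
ALLOWED_COLUMNS = {"name", "col_name"}  # hypothetical whitelist of sortable columns
ALLOWED_DIRECTIONS = {"ASC", "DESC"}


def build_query(sort_column, direction="ASC"):
    """Return the query text with a validated ORDER BY clause."""
    direction = direction.upper()
    if sort_column not in ALLOWED_COLUMNS:
        raise ValueError(f"unsupported sort column: {sort_column!r}")
    if direction not in ALLOWED_DIRECTIONS:
        raise ValueError(f"unsupported sort direction: {direction!r}")
    # Only whitelisted identifiers reach the SQL text; LIMIT stays a parameter.
    return (
        "SELECT sub.* FROM ("
        " SELECT * FROM ONLY service_services"
        " UNION ALL"
        " SELECT * FROM fleet.service_services WHERE deleted=false"
        ") AS sub"
        f" ORDER BY {sort_column} {direction}"
        " LIMIT $1"
    )


query = build_query("name", "desc")
print(query)
```

Rejecting anything outside the whitelist keeps user input out of the SQL text, so the interpolation cannot be used for injection.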