SQL insert with regexp

I have a table with data in the following format
col1 col2
a1 a2;a3;a4
b1 b2
c1 c2;c3
d1 null
...
I'm trying to split the strings, get unique combinations of col1/col2 and insert them into tableB. So the expected outcome should look like this:
a1 a2
a1 a3
a1 a4
b1 b2
c1 c2
c1 c3
...
I tried the following query:
INSERT INTO tableB (col1, col2)
SELECT col1, (regexp_substr(col2,'[^;]+', 1, LEVEL)) FROM tableA
CONNECT BY regexp_substr(col2, '[^;]+', 1, LEVEL) IS NOT NULL;
Not sure what's going wrong here but it keeps executing (it actually went on for more than an hour) and when I finally cancel the task, nothing's been inserted. The table is quite large (around 25000 rows) but I've done similar inserts with larger tables and they worked fine.
I also tried adding a where clause (although it seems redundant) with
WHERE col2 LIKE '%;%'
That didn't help either.
Any suggestions would be great.
Edit: I tried counting the max number of substrings in col2, to ballpark the number of rows to be inserted, and found the max to be 42 substrings. The whole table has 25814 rows, so worst case scenario, it's inserting 1084104 rows. If that has anything to do with it.

Don't use CONNECT BY to split strings into rows. Without a condition that ties each generated level back to its own source row, the hierarchy effectively cross-joins the rows, which is why the insert runs for hours instead of seconds.
Use a PL/SQL procedure that does a varchar2 -> collection split.
For an ad-hoc query, stick with xmltable as a simple way to split a string into rows (it is a bit slower than PL/SQL).
The following kind of query is expected to take 3-4 seconds per 1000 input rows.
select t.col1, c2.val
from (
select 'a1' col1, 'a2;a3;a4' col2 from dual union all
select 'b1', 'b2' from dual union all
select 'c1', 'c2;c3' from dual union all
select 'd1', null from dual
) t
, xmltable('$WTF' passing
xmlquery(('"'||replace(replace(t.col2,'"','""'),';','","')||'"')
returning sequence
) as wtf
columns val varchar2(4000) path '.'
)(+) c2
Fiddle: http://sqlfiddle.com/#!4/9eecb7d/5059
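For the PL/SQL route suggested above, a minimal sketch of a varchar2 -> collection split might look like this. Note that t_str_tab and split_str are made-up names for illustration, not part of the original answer, and the commented query at the end shows one way it could feed the insert from the question:
-- hypothetical helper type and pipelined split function
create type t_str_tab as table of varchar2(4000);
/
create or replace function split_str(p_str in varchar2,
                                     p_sep in varchar2 default ';')
  return t_str_tab pipelined
is
  l_pos  pls_integer := 1;
  l_next pls_integer;
begin
  if p_str is null then
    return;                                  -- nothing to emit for null input
  end if;
  loop
    l_next := instr(p_str, p_sep, l_pos);
    if l_next = 0 then
      pipe row (substr(p_str, l_pos));       -- last (or only) element
      exit;
    end if;
    pipe row (substr(p_str, l_pos, l_next - l_pos));
    l_pos := l_next + length(p_sep);
  end loop;
  return;
end split_str;
/
-- possible use for the insert from the question:
-- INSERT INTO tableB (col1, col2)
-- SELECT DISTINCT a.col1, s.column_value
-- FROM tableA a, TABLE(split_str(a.col2)) s;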

One thing you can try is to select all the distinct values.
INSERT INTO tableB (col1, col2)
SELECT distinct col1, (regexp_substr(col2,'[^;]+', 1, LEVEL))
FROM tableA
CONNECT BY regexp_substr(col2, '[^;]+', 1, LEVEL) IS NOT NULL;
commit;
You should also commit the transaction if you need the changes to be permanent.
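If you want to keep the CONNECT BY approach instead, the usual trick to stop the hierarchy from cross-joining rows (which is most likely why the original statement ran for over an hour) is to add PRIOR conditions so each level stays within its own source row. A sketch, assuming col1 uniquely identifies a row in tableA:
INSERT INTO tableB (col1, col2)
SELECT DISTINCT col1, regexp_substr(col2, '[^;]+', 1, LEVEL)
FROM tableA
CONNECT BY regexp_substr(col2, '[^;]+', 1, LEVEL) IS NOT NULL
       AND PRIOR col1 = col1              -- stay on the same source row
       AND PRIOR sys_guid() IS NOT NULL;  -- make each parent unique to avoid cycle errors
COMMIT;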

While I still don't really understand what was wrong with the initial query, I found a workaround. The column with the strings I'm trying to split is participantcountries, and p_id contains unique identifiers.
CREATE OR REPLACE PROCEDURE split_countries IS
  CURSOR get_rows IS
    SELECT p_id, participantcountries
    FROM staging_projects;
  row_rec get_rows%ROWTYPE;
  nb_countries number := 0;
  country VARCHAR2(4);
BEGIN
  OPEN get_rows;
  LOOP
    FETCH get_rows INTO row_rec;
    EXIT WHEN get_rows%NOTFOUND;
    SELECT regexp_count(row_rec.participantcountries, ';') INTO nb_countries
    FROM staging_projects
    WHERE p_id = row_rec.p_id;
    IF (row_rec.participantcountries IS NULL) THEN
      nb_countries := 0; -- if the field is null set counter to zero
    ELSE
      nb_countries := nb_countries + 1; -- nb of delimiters so +1 --> number of items
    END IF;
    WHILE (nb_countries > 0) LOOP -- loop and get single country at current counter position
      SELECT (regexp_substr(PARTICIPANTCOUNTRIES, '[^;]+', 1, nb_countries)) INTO country
      FROM (SELECT * FROM staging_projects WHERE p_id = row_rec.p_id)
      WHERE participantcountries IS NOT NULL;
      nb_countries := nb_countries - 1;
      INSERT INTO project_countries(proj_id, country_code) VALUES (row_rec.p_id, country);
    END LOOP;
  END LOOP;
  CLOSE get_rows;
END split_countries;
This works fine but it seems like an overly complicated solution for the problem at hand.
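For comparison, a less involved version of the same procedure is possible with cursor FOR loops, avoiding the extra SELECTs against staging_projects inside the loop. A rough sketch using the same tables and columns (untested, and it assumes participantcountries has no trailing delimiter):
CREATE OR REPLACE PROCEDURE split_countries IS
BEGIN
  FOR r IN (SELECT p_id, participantcountries
            FROM staging_projects
            WHERE participantcountries IS NOT NULL) LOOP
    -- number of delimiters + 1 = number of items in the list
    FOR i IN 1 .. regexp_count(r.participantcountries, ';') + 1 LOOP
      INSERT INTO project_countries (proj_id, country_code)
      VALUES (r.p_id, regexp_substr(r.participantcountries, '[^;]+', 1, i));
    END LOOP;
  END LOOP;
END split_countries;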

Related

Update numbers with a counter in SQL

Suppose I have a table containing increasing and irregularly incremented numbers (3, 4, 7, 11, 16,...) in the first column col1.
I want to update col1 such that the numbers will be 1000 plus the row number. So something like:
UPDATE tab1 SET col1 = 1000 + row_number()
I am using Oracle SQL and would appreciate any help! Many thanks in advance.
In Oracle, the simplest method might be to create a sequence and use that:
create sequence temp_seq_rownum;
update tab1
set col1 = 1000 + temp_seq_rownum.nextval;
drop sequence temp_seq_rownum;
At the time I am posting this, a different answer is already marked as "correct". That answer assigns values 1001, 1002 etc. with no regard to the pre-existing values in col1.
To me that makes no sense. The problem is more interesting if the OP actually meant what he wrote - the new values should follow the pre-existing values in col1, so that while the numbers 1001, 1002, 1003 etc. are not the same as the pre-existing values, they still preserve the pre-existing order.
In Oracle, row_number() is not allowed in a straight update statement. Perhaps the simplest way to achieve this task is with a merge statement, as demonstrated below.
For testing I created a table with two columns, with equal values before the update, simply so that we can verify that the new values in col1 preserve the pre-existing order after the update. I start with the test table, then the merge statement and a select statement to see what merge did.
create table t (col1, col2) as
select 3, 3 from dual union all
select 4, 4 from dual union all
select 16, 16 from dual union all
select 7, 7 from dual union all
select 1, 1 from dual
;
Table T created.
merge into t
using ( select rowid as rid, 1000 + row_number() over (order by col1) as val
from t) s
on (t.rowid = s.rid)
when matched then update set col1 = val;
5 rows merged.
select *
from t
order by col1
;
COL1 COL2
------- -------
1001 1
1002 3
1003 4
1004 7
1005 16

Save value in local variable HANA SQL Script

I'm trying to take value from a non-empty row and overwrite it in the subsequent rows until another non-empty row appears and then write that in the subsequent rows. Coming from ABAP Background, I'm not sure how to accomplish this in HANA SQL Script. Here's a picture to show what the data looks like.
Basically 'Doe, John' should be overwritten into all the empty rows until 'Doe, Jane' appears and then 'Doe, Jane' should be overwritten into empty rows until another name appears.
My idea is to store the non-empty row in a local variable, but I haven't had much success so far. Here's my code:
tempTab1 = SELECT
CASE WHEN EMPLOYEE <> ''
THEN lv_emp = EMPLOYEE
ELSE EMPLOYEE
END AS EMPLOYEE,
FROM :tempTab;
In general, rows in a dataset are unordered until you explicitly specify an ORDER BY clause. If you observe some order, it may be a side effect and can vary. So first of all you have to explicitly create a row-number column (assume its name is RECORD).
Then you should go this way:
Select only the rows with non-empty data in the column.
Use LEAD(RECORD) OVER (ORDER BY RECORD) to identify the next non-empty record number.
Join your source dataset to the dataset from step 2 with a BETWEEN-style condition on the RECORD field.
with a as (
select 1 as record, 'Val1' as field1 from dummy union
select 2 as record, '' as field1 from dummy union
select 3 as record, '' as field1 from dummy union
select 4 as record, 'Val2' as field1 from dummy union
select 5 as record, '' as field1 from dummy union
select 6 as record, '' from dummy union
select 7 as record, '' from dummy union
select 8 as record, 'Val3' as field1 from dummy
)
, fill_base as (
select field1, record, lead(record, 1, record) over(order by record asc) as next_record
from a
where field1 <> '' and field1 is not null
)
select
a.record
, case
when a.field1 = '' or a.field1 is null
then f.field1
else a.field1
end as field1
, a.field1 as field1_original
from a
left join fill_base as f
on a.record > f.record
and a.record < f.next_record
Performance in HANA may be bad in some cases, since it handles window functions poorly.
Here is another, more elegant solution with two nested window functions that does not force you to write multiple selects for each column: How to make LAG() ignore NULLS in SQL Server?
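The idea behind that linked approach, sketched here in generic windowed SQL against the sample CTE a from above (the exact HANA syntax may need small adjustments): a running count of non-empty rows assigns a group number to every row, and the non-empty value of each group is then carried forward within it.
select record,
       field1 as field1_original,
       -- the only non-empty value inside each group is the one to carry forward
       max(case when field1 <> '' then field1 end)
         over (partition by grp) as field1_filled
from (
  select record,
         field1,
         -- running count of non-empty rows: each non-empty row opens a new group
         count(case when field1 <> '' then 1 end)
           over (order by record) as grp
  from a
) t
order by record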
You can use window aggregate function LAST_VALUE to achieve the imputation of missing values.
Sample Data
CREATE TABLE sample (id integer, sort integer, value varchar(10));
INSERT INTO sample VALUES (4711, 1, 'Hello');
INSERT INTO sample VALUES (4712, 2, null);
INSERT INTO sample VALUES (4713, 3, null);
INSERT INTO sample VALUES (4714, 4, 'World');
INSERT INTO sample VALUES (4715, 5, null);
INSERT INTO sample VALUES (4716, 6, '!');
Generate a new column with imputed values
SELECT base.*, LAST_VALUE(fill.value ORDER BY fill.sort) AS value_imputed
FROM sample base
LEFT JOIN sample fill ON fill.sort <= base.sort AND fill.value IS NOT NULL
GROUP BY base.id, base.sort, base.value
ORDER BY base.id, base.sort
Result
ID    SORT  VALUE  VALUE_IMPUTED
----- ----- ------ -------------
4711  1     Hello  Hello
4712  2     null   Hello
4713  3     null   Hello
4714  4     World  World
4715  5     null   World
4716  6     !      !
Note that sort could be anything determining the order (e.g. a timestamp).

Make in (SQL) dynamic for incoming values

Is it possible for an IN statement to be dynamic, i.e. with a dynamic number of comma-separated values?
for example:
DATA=1
select * from dual
where
account_id in (*DATA);
DATA=2
select * from dual
where
account_id in (*DATA1,*DATA2);
FOR DATA=n
How will I make the IN statement dynamic/flexible (the commas) for an unknown quantity of values?
select * from dual
where
account_id in (*DATAn,*DATAn+1,etc);
A hierarchical query might help.
the acc CTE represents the sample data
lines #9 - 11 are what you might be looking for; the literal 'data' is concatenated with the LEVEL pseudocolumn's value returned by a hierarchical query
Here you go:
SQL> with acc (account_id) as
2 (select 'data1' from dual union all
3 select 'data2' from dual union all
4 select 'data3' from dual union all
5 select 'data4' from dual
6 )
7 select *
8 from acc
9 where account_id in (select 'data' || level
10 from dual
11 connect by level <= &n
12 );
Enter value for n: 1
ACCOU
-----
data1
SQL> /
Enter value for n: 3
ACCOU
-----
data1
data2
data3
SQL>
As far as I can see, you are using numbers in the WHERE clause, so substitution variables will be enough to solve your problem.
See the example below:
CREATE table t(col1 number);
insert into t values(1);
insert into t values(2);
insert into t values(3);
-- Substitution variable initialization
define data1=1;
define data2='1,2';
--
-- With data1
select * from t where col1 in (&data1);
Output:
COL1
----
   1
-- With data2
select * from t where col1 in (&data2);
Output:
COL1
----
   1
   2
Hope this will be helpful to you.
Cheers!!
The basic problem is not the LISTAGG function but a major misconception: just because the elements of a list are comma separated does not mean that any string containing commas is a list. Not so. Consider a table with the following rows.
Key
- Data1
- Data2
- Data1,Data2
And the query: SELECT * FROM table_name WHERE key = 'wanted_key'; Now, if all commas separated independent elements, what value of 'wanted_key' would be needed to return only the 3rd row above? Even with the IN predicate, 'Data1,Data2' is still just one value, not two. For two values it would have to be ('Data1','Data2').
The problem you're having with LISTAGG is not because of the comma but because it's not the appropriate function. LISTAGG takes values from multiple rows and combines them into a single comma-separated string, not a comma-separated list. Example:
with elements as
( select 'A' code, 'Data1' item from dual union all
select 'A', 'Data2' from dual union all
select 'A', 'Data3' from dual
)
select listagg( item, ',') within group (order by item)
from elements group by code;
(You might also want to try 'Data1,Data2' as a single element. Watch out.)
What you require is a query that breaks out each element separately. This can be done with
with element_list as
(select 'Data1,Data2,Data3' items from dual) -- get parameter string
, parsed as
(select regexp_substr(items,'[^,]+',1,level) item
from element_list connect by regexp_substr(items,'[^,]+',1,level) is not null -- parse string to elements
)
The CTE "parsed" can now be used as table/view in your query.
This will not perform as well as querying directly with a parameter, but performance degradation is the cost of dynamic/flexible queries.
Also as set up this will NOT handle parameters which contain commas within an individual element. That would require much more code as you would have to determine/design how to keep the comma in those elements.

Insert Multiple Rows SQL Teradata

I am creating a volatile table and trying to insert rows to the table. I can upload one row like below...
create volatile table Example
(
ProductID VARCHAR(15),
Price DECIMAL (15,2)
)
on commit preserve rows;
et;
INSERT INTO Example
Values
('Steve',4);
However, when I try to upload multiple I get the error:
"Syntax error: expected something between ')' and ','."
INSERT INTO Example
Values
('Steve',4),
('James',8);
As Gordon said, Teradata doesn't support VALUES with multiple rows (and the UNION ALL will fail because of the missing FROM).
You can utilize a Multi Statement Request (MSR) instead:
INSERT INTO Example Values('Steve',4)
;INSERT INTO Example Values('James',8)
;
If it's a BTEQ job the Inserts are submitted as one block after the final semicolon (when there's a new command starting on the same line it's part of the MSR). In SQL Assistant or Studio you must submit it using F9 instead of F5.
I don't think Teradata supports the multiple row values syntax. Just use select:
INSERT INTO Example(ProductId, Price)
WITH dual as (SELECT 1 as x)
SELECT 'Steve' as ProductId, 4 as Price FROM dual UNION ALL
SELECT 'James' as ProductId, 8 as Price FROM dual;
CTE syntax (working):
insert into target_table1 (col1, col2)
with cte as (select 1 col1)
select 'value1', 'value2' from cte
union all
select 'value1a', 'value2a' from cte
;
CTE Syntax not working in Teradata
(error: expected something between ")" and the "insert" keyword)
with cte as (select 1 col1)
insert into target_table1 (col1, col2)
select 'value1', 'value2' from cte
union all
select 'value1a', 'value2a' from cte
;
I found a solution for this via RECURSIVE. It goes like this:-
INSERT INTO table (col1, col2)
with recursive table (col1, col2) as
(select 'val1','val2' from table) -- 1
select 'val1','val2' from table -- 2
union all select 'val3','val4' from table
union all select 'val5','val6' from table;
The data on line 1 does not get inserted (but you need this line). Starting from line 2, the data you enter for val1, val2, etc. gets inserted into the respective columns. Use as many UNION ALLs as the number of rows you want to insert. Hope this helps :)
At least in our version of Teradata, we are not able to use an insert statement with a CTE. Instead, find a real table (preferably small in size) and do a top 1.
Insert Into OtherRealTable(x, y)
Select top 1
'x' as x,
'y' as y
FROM RealTable
create table dummy as (select '1' col1) with data;
INSERT INTO Student
(Name, Maths, Science, English)
SELECT 'Tilak', 90, 40, 60 from dummy union
SELECT 'Raj', 30, 20, 10 from dummy
;
Yes, you can try this:
INSERT INTO Student
SELECT Name, Maths, Science, English FROM JSON_Table
(ON (SELECT 1 id,cast('{"DataSet" : [
{"s":"m", "Name":"Tilak", "Maths":"90","Science":"40", "English":"60" },
{"s":"m", "Name":"Raj", "Maths":"30","Science":"20", "English":"10" }
]
}' AS json ) jsonCol)
USING rowexpr('$.DataSet[*]')
colexpr('[{"jsonpath":"$.s","type":"CHAR(1)"},{"jsonpath":"$.Name","type":"VARCHAR(30)"}, {"jsonpath":"$.Maths","type":"INTEGER"}, {"jsonpath":"$.Science","type":"INTEGER"}, {"jsonpath":"$.English","type":"INTEGER"}]')
) AS JT(id,State,Name, Maths, Science, English)

Returning rows that had no matches

I've read and read and read but I haven't found a solution to my problem.
I'm doing something like:
SELECT a
FROM t1
WHERE t1.b IN (<external list of values>)
There are other conditions of course but this is the gist of it.
My question is: is there a way to show which of the manually entered values didn't find a match? I've looked but I can't find a way, and I'm going in circles.
Create a temp table with the external list of values, then you can do:
select item
from tmptable t
where t.item not in ( select b from t1 )
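In Oracle that temp table could be a global temporary table, for example (names here are only illustrative):
-- one-time setup
create global temporary table tmptable (item varchar2(100))
on commit preserve rows;

-- load the external list of values for the current session
insert into tmptable values ('value1');
insert into tmptable values ('value2');

-- values from the list with no match in t1
select item
from tmptable t
where t.item not in (select b from t1 where b is not null);
The where b is not null guard matters because NOT IN returns no rows at all if the subquery produces a NULL.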
If the list is short enough, you can do something like:
with t as (
  select case when t1.b = 'FIRSTITEM' then 1 else 0 end firstfound,
         case when t1.b = '2NDITEM'   then 1 else 0 end secondfound,
         case when t1.b = '3RDITEM'   then 1 else 0 end thirdfound
         ...
  from t1
  where t1.b in ('LIST...')
)
select sum(firstfound), sum(secondfound), sum(thirdfound), ...
from t
But with proper rights, I would use Nicholas' answer.
To display which values in the list of values haven't found a match, as one of the approaches, you could create a nested table SQL(schema object) data type:
-- assuming that the values in the list
-- are of number datatype
create type T_NumList as table of number;
and use it as follows:
-- sample of data. generates numbers from 1 to 11
SQL> with t1(col) as(
2 select level
3 from dual
4 connect by level <= 11
5 )
6 select s.column_value as without_match
7 from table(t_NumList(1, 2, 15, 50, 23)) s -- here goes your list of values
8 left join t1 t
9 on (s.column_value = t.col)
10 where t.col is null
11 ;
Result:
WITHOUT_MATCH
-------------
15
50
23
SQLFiddle Demo
There is no easy way to convert an externally provided list into a table that can be used to do the comparison. One way is to use one of the (undocumented) system types to generate a table on the fly based on the values supplied:
with value_list (id) as (
select column_value
from table(sys.odcinumberlist (1, 2, 3)) -- this is the list of values
)
select l.id as missing_id
from value_list l
left join t1 on t1.id = l.id
where t1.id is null;
There are ways to get what you have described, but they have requirements which exceed the statement of the problem. From the minimal description provided, there's no way to have the SQL return the list of the manually-entered values that did not match.
For example, if it's possible to insert the manually-entered values into a separate table - let's call it matchtbl, with the column named b - then the following should do the job:
SELECT matchtbl.b
FROM matchtbl
WHERE matchtbl.b NOT IN (SELECT distinct b
FROM t1)
Of course, if the data is being processed by a programming language, it should be relatively easy to keep track of the set of values returned by the original query, by adding the b column to the output, and then perform the set difference.
Putting the list in an in clause makes this hard. If you can put the list in a table, then the following works:
with list as (
select val1 as value from dual union all
select val2 from dual union all
. . .
select valn from dual
)
select list.value, count(t1.b)
from list left outer join
t1
on t1.b = list.value
group by list.value;
Values with a count of 0 are the ones from the list that have no match in t1.