I would like to find the distinct values stored in the column CLOB_COLUMN (of type CLOB) of the table COPIA.
I have solved this problem procedurally, but I would prefer a simple SELECT such as the following: SELECT DISTINCT CLOB_COLUMN FROM COPIA, without hitting the error "ORA-00932: inconsistent datatypes: expected - got CLOB".
How can I achieve this?
Thank you in advance for your kind cooperation. This is the procedural approach I came up with:
-- Find the distinct values stored in the column CLOB_COLUMN (of type CLOB)
-- of the table COPIA.
-- Before the PL/SQL script runs, the CLOB values (including duplicates)
-- are held in the source table S1.
-- After the script finishes, the distinct values of CLOB_COLUMN
-- can be found in the target table S2.
BEGIN
EXECUTE IMMEDIATE 'TRUNCATE TABLE S1 DROP STORAGE';
EXECUTE IMMEDIATE 'DROP TABLE S1 CASCADE CONSTRAINTS PURGE';
EXCEPTION
WHEN OTHERS
THEN
BEGIN
NULL;
END;
END;
BEGIN
EXECUTE IMMEDIATE 'TRUNCATE TABLE S2 DROP STORAGE';
EXECUTE IMMEDIATE 'DROP TABLE S2 CASCADE CONSTRAINTS PURGE';
EXCEPTION
WHEN OTHERS
THEN
BEGIN
NULL;
END;
END;
CREATE GLOBAL TEMPORARY TABLE S1
ON COMMIT PRESERVE ROWS
AS
SELECT CLOB_COLUMN FROM COPIA;
CREATE GLOBAL TEMPORARY TABLE S2
ON COMMIT PRESERVE ROWS
AS
SELECT *
FROM S1
WHERE 3 = 9;
BEGIN
   DECLARE
      CONTEGGIO   NUMBER;
      CURSOR C1
      IS
         SELECT CLOB_COLUMN FROM S1;
      C1_REC      C1%ROWTYPE;
   BEGIN
      FOR C1_REC IN C1
      LOOP
         -- How many rows in S2 are equal to C1_REC.CLOB_COLUMN?
         SELECT COUNT (*)
           INTO CONTEGGIO
           FROM S2 BETA
          WHERE DBMS_LOB.COMPARE (BETA.CLOB_COLUMN, C1_REC.CLOB_COLUMN) = 0;
         -- If S2 has no row equal to C1_REC.CLOB_COLUMN,
         -- insert C1_REC.CLOB_COLUMN into S2
         IF CONTEGGIO = 0
         THEN
            BEGIN
               INSERT INTO S2
                    VALUES (C1_REC.CLOB_COLUMN);
               COMMIT;
            END;
         END IF;
      END LOOP;
   END;
END;
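If it is acceptable to truncate your field, this works. Note that in plain SQL the result is a VARCHAR2, which is limited to 4000 bytes unless MAX_STRING_SIZE is set to EXTENDED, so the 32767 below only succeeds when the actual values fit within that limit: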
select distinct dbms_lob.substr(FIELD_CLOB,32767) from Table1
You could compare the hashes of the CLOB to determine if they are different:
SELECT your_clob
FROM your_table
WHERE ROWID IN (SELECT MIN(ROWID)
FROM your_table
GROUP BY dbms_crypto.HASH(your_clob, dbms_crypto.HASH_SH1))
Edit:
The HASH function doesn't guarantee that there will be no collision. By design however, it is really unlikely that you will get any collision. Still, if the collision risk (<2^80?) is not acceptable, you could improve the query by comparing (with dbms_lob.compare) the subset of rows that have the same hashes.
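The refinement described above could be sketched as follows: the hash narrows the candidates, and dbms_lob.compare is only evaluated between rows that share a hash. This is an untested sketch, not a definitive implementation. It reuses the placeholder names your_table and your_clob from the query above, it passes the literal 3 (the documented value of dbms_crypto.HASH_SH1) on the assumption that package constants are not accessible from plain SQL, and it assumes your_clob is never NULL, which would need separate handling.

```sql
-- Sketch: keep one row per distinct CLOB value.
-- The hash narrows the candidates; dbms_lob.compare confirms real equality,
-- so a hash collision can no longer merge two different values.
WITH hashed AS (
  SELECT ROWID AS rid,
         your_clob,
         dbms_crypto.HASH(your_clob, 3) AS h   -- 3 = dbms_crypto.HASH_SH1
  FROM your_table
)
SELECT a.your_clob
FROM hashed a
WHERE NOT EXISTS (
  SELECT 1
  FROM hashed b
  WHERE b.h = a.h                                        -- same hash bucket
    AND dbms_lob.compare(b.your_clob, a.your_clob) = 0   -- truly identical
    AND b.rid < a.rid                                    -- an earlier copy exists
);
```

Each row is kept only if no identical row with a smaller ROWID exists, so exactly one representative per distinct value survives.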
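Add TO_CHAR after the DISTINCT keyword to convert the CLOB to a character string. This only works when every value fits in a VARCHAR2; longer CLOBs raise an error.
SELECT DISTINCT TO_CHAR(CLOB_FIELD) from table1; -- This will return distinct values in CLOB_FIELD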
Use this approach. In my table profile, the column content is an NCLOB. I added the WHERE clause to cut down the run time, which is high:
with
r as (select rownum i, content from profile where package = 'intl'),
s as (select distinct (select min(i) from r where dbms_lob.compare(r.content, t.content) = 0) min_i from profile t where t.package = 'intl')
select (select content from r where r.i = s.min_i) content from s
;
It is not about to win any prizes for efficiency but should work.
select distinct DBMS_LOB.substr(column_name, 3000) from table_name;
If truncating the clob to the size of a varchar2 won't work, and you're worried about hash collisions, you can:
Add a row number to every row;
Use DBMS_lob.compare in a not exists subquery. Exclude duplicates (this means: compare = 0) with a higher rownum.
For example:
create table t (
c1 clob
);
insert into t values ( 'xxx' );
insert into t values ( 'xxx' );
insert into t values ( 'yyy' );
commit;
with rws as (
select row_number () over ( order by rowid ) rn,
t.*
from t
)
select c1 from rws r1
where not exists (
select * from rws r2
where dbms_lob.compare ( r1.c1, r2.c1 ) = 0
and r1.rn > r2.rn
);
C1
xxx
yyy
To bypass the Oracle error, you have to do something like this:
SELECT CLOB_COLUMN FROM COPIA C1
WHERE C1.ID IN (SELECT DISTINCT C2.ID FROM COPIA C2 WHERE ....)
I know this is an old question but I believe I've figured out a better way to do what you are asking.
It is kind of a cheat, really. The idea is that you can't do a DISTINCT on a CLOB column, but you can do a DISTINCT on a LISTAGG of a CLOB column; you just need to play with the partition clause of the LISTAGG function to make sure it only returns one value.
With that in mind, here is my solution.
SELECT DISTINCT listagg(clob_column,'| ') within GROUP (ORDER BY unique_id) over (PARTITION BY unique_id) clob_column
FROM copia;
I am wondering how to convert comma-delimited values into rows in Redshift. I am afraid that my own solution isn't optimal. Please advise. I have a table in which one of the columns holds comma-separated values. For example:
I have:
user_id|user_name|user_action
-----------------------------
1 | Shone | start,stop,cancell...
I would like to see
user_id|user_name|parsed_action
-------------------------------
1 | Shone | start
1 | Shone | stop
1 | Shone | cancell
....
A slight improvement over the existing answer is to use a second "numbers" table that enumerates all of the possible list lengths and then use a cross join to make the query more compact.
Redshift does not have a straightforward method for creating a numbers table that I am aware of, but we can use a bit of a hack from https://www.periscope.io/blog/generate-series-in-redshift-and-mysql.html to create one using row numbers.
Specifically, if we assume the number of rows in cmd_logs is larger than the maximum number of commas in the user_action column, we can create a numbers table by counting rows. To start, let's assume there are at most 99 commas in the user_action column:
select
(row_number() over (order by true))::int as n
into numbers
from cmd_logs
limit 100;
If we want to get fancy, we can compute the number of commas from the cmd_logs table to create a more precise set of rows in numbers:
select
n::int
into numbers
from
(select
row_number() over (order by true) as n
from cmd_logs)
cross join
(select
max(regexp_count(user_action, '[,]')) as max_num
from cmd_logs)
where
n <= max_num + 1;
Once there is a numbers table, we can do:
select
user_id,
user_name,
split_part(user_action,',',n) as parsed_action
from
cmd_logs
cross join
numbers
where
split_part(user_action,',',n) is not null
and split_part(user_action,',',n) != '';
Another idea is to transform your CSV string into JSON first, followed by a JSON extract, along the following lines:
... '["' || replace( user_action, ',', '", "' ) || '"]' AS replaced
... JSON_EXTRACT_ARRAY_ELEMENT_TEXT(replaced, numbers.n - 1) AS parsed_action
Where "numbers" is the table from the first answer (its column n starts at 1, while the JSON array index is zero-based). The advantage of this approach is the ability to use built-in JSON functionality.
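Put together, the two fragments might look like the query below. This is a sketch under assumptions: it uses the cmd_logs table from the question and the numbers table (column n, starting at 1) from the first answer, and it assumes the actions contain no double quotes or backslashes, which would break the hand-built JSON.

```sql
-- Sketch: turn 'start,stop,cancel' into '["start","stop","cancel"]',
-- then pull out one array element per row of the numbers table.
select
  j.user_id,
  j.user_name,
  json_extract_array_element_text(j.replaced, numbers.n - 1) as parsed_action
from
  (select
     user_id,
     user_name,
     '["' || replace(user_action, ',', '","') || '"]' as replaced
   from cmd_logs) as j
cross join
  numbers
where
  -- n is 1-based, the JSON array index is 0-based
  numbers.n <= json_array_length(j.replaced);
```

The filter on json_array_length keeps the index in range, so no rows are produced for positions past the end of a shorter list.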
If you know that there are not many actions in your user_action column, you can use a recursive CTE with UNION ALL and so avoid the auxiliary numbers table.
It does require knowing the number of actions for each user, so either adjust the initial table or create a view or a temporary table for it.
Data preparation
Assuming you have something like this as a table:
create temporary table actions
(
user_id varchar,
user_name varchar,
user_action varchar
);
I'll insert some values in it:
insert into actions
values (1, 'Shone', 'start,stop,cancel'),
(2, 'Gregory', 'find,diagnose,taunt'),
(3, 'Robot', 'kill,destroy');
Here's an additional temporary table that stores the action count:
create temporary table actions_with_counts
(
id varchar,
name varchar,
num_actions integer,
actions varchar
);
insert into actions_with_counts (
select user_id,
user_name,
regexp_count(user_action, ',') + 1 as num_actions,
user_action
from actions
);
This would be our "input table" and it looks just as you expected
select * from actions_with_counts;
id | name    | num_actions | actions
------------------------------------------------
2  | Gregory | 3           | find,diagnose,taunt
3  | Robot   | 2           | kill,destroy
1  | Shone   | 3           | start,stop,cancel
Again, you could adjust the initial table instead and skip adding the counts as a separate table.
Sub-query to flatten the actions
Here's the unnesting query:
with recursive tmp (user_id, user_name, idx, user_action) as
(
select id,
name,
1 as idx,
split_part(actions, ',', 1) as user_action
from actions_with_counts
union all
select user_id,
user_name,
idx + 1 as idx,
split_part(actions, ',', idx + 1)
from actions_with_counts
join tmp on actions_with_counts.id = tmp.user_id
where idx < num_actions
)
select user_id, user_name, user_action as parsed_action
from tmp
order by user_id;
This will create a new row for each action, and the output would look like this:
user_id | user_name | parsed_action
-----------------------------------
1       | Shone     | start
1       | Shone     | stop
1       | Shone     | cancel
2       | Gregory   | find
2       | Gregory   | diagnose
2       | Gregory   | taunt
3       | Robot     | kill
3       | Robot     | destroy
Here are two ways to achieve this.
In my example, I'm assuming that I am accepting a comma separated list of values. My values look like schema.table.column.
The first involves using a recursive CTE.
drop table if exists #dep_tbl;
create table #dep_tbl as
select 'schema.foobar.insert_ts,schema.baz.load_ts' as dep
;
with recursive tmp (level, dep_split, to_split) as
(
select 1 as level
, split_part(dep, ',', 1) as dep_split
, regexp_count(dep, ',') as to_split
from #dep_tbl
union all
select tmp.level + 1 as level
, split_part(a.dep, ',', tmp.level + 1) as dep_split_u
, tmp.to_split
from #dep_tbl a
inner join tmp on tmp.dep_split is not null
and tmp.level <= tmp.to_split
)
select dep_split from tmp;
the above yields:
|dep_split|
|schema.foobar.insert_ts|
|schema.baz.load_ts|
The second involves a stored procedure.
CREATE OR REPLACE PROCEDURE so_test(dependencies_csv varchar(max))
LANGUAGE plpgsql
AS $$
DECLARE
dependencies_csv_vals varchar(max);
BEGIN
drop table if exists #dep_holder;
create table #dep_holder
(
avoid varchar(60000)
);
IF dependencies_csv is not null THEN
dependencies_csv_vals:='('||replace(quote_literal(regexp_replace(dependencies_csv,'\\s','')),',', '\'),(\'') ||')';
execute 'insert into #dep_holder values '||dependencies_csv_vals||';';
END IF;
END;
$$
;
call so_test('schema.foobar.insert_ts,schema.baz.load_ts');
select
*
from
#dep_holder;
the above yields:
|dep_split|
|schema.foobar.insert_ts|
|schema.baz.load_ts|
In conclusion
If you only care about one single column in your input (the X-delimited values), then I think the stored procedure is easier/faster.
However, if you have other columns you care about and want to keep alongside the comma-separated column once it has been transformed to rows, or if you want to keep the argument (the original list of delimited values), I think the recursive CTE is the way to go. In that case, you can just add those other columns to the columns selected in the recursive query.
You can get the expected result with the following query. I'm using "UNION ALL" to convert a column to row.
select user_id, user_name, split_part(user_action,',',1) as parsed_action from cmd_logs
union all
select user_id, user_name, split_part(user_action,',',2) as parsed_action from cmd_logs
union all
select user_id, user_name, split_part(user_action,',',3) as parsed_action from cmd_logs
Here's my equally-terrible answer.
I have a users table, and then an events table with a column that is just a comma-delimited string of users at said event. eg
event_id | user_ids
1 | 5,18,25,99,105
In this case, I used the LIKE and wildcard functions to build a new table that represents each event-user edge.
SELECT e.event_id, u.id as user_id
FROM events e
LEFT JOIN users u ON e.user_ids like '%' || u.id || '%'
It's not pretty, but I throw it in a WITH clause so that I don't have to run it more than once per query. I'll likely just build an ETL to create that table every night anyway.
Also, this only works if you have a second table that does have one row per unique possibility. If not, you could do LISTAGG to get a single cell with all your values, export that to a CSV and reupload that as a table to help.
Like I said: a terrible, no-good solution.
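One weakness of the LIKE join above is that id 5 also matches inside 15 or 105. A possible variant, sketched here with the same events and users tables and assuming the list contains no spaces around the commas, wraps both sides in delimiters so that only whole ids match:

```sql
-- Sketch: ',5,18,25,' LIKE '%,5,%' matches, but ',15,18,' LIKE '%,5,%' does not.
SELECT e.event_id, u.id AS user_id
FROM events e
LEFT JOIN users u
  ON ',' || e.user_ids || ',' LIKE '%,' || u.id::varchar || ',%';
```

It is just as slow as the original, since the join still cannot use any sort or distribution key.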
Late to the party, but I got something working (albeit very slowly):
with nums as (select n::int n
from
(select
row_number() over (order by true) as n
from table_with_enough_rows_to_cover_range)
cross join
(select
max(json_array_length(json_column)) as max_num
from table_with_json_column )
where
n <= max_num + 1)
select *, json_extract_array_element_text(json_column,nums.n-1) parsed_json
from nums, table_with_json_column
where json_extract_array_element_text(json_column,nums.n-1) != ''
and nums.n <= json_array_length(json_column)
Thanks to Bob Baxley's answer for the inspiration.
This is just an improvement on the answer above, https://stackoverflow.com/a/31998832/1265306
It generates the numbers table using the following SQL, taken from
https://discourse.looker.com/t/generating-a-numbers-table-in-mysql-and-redshift/482
SELECT
p0.n
+ p1.n*2
+ p2.n * POWER(2,2)
+ p3.n * POWER(2,3)
+ p4.n * POWER(2,4)
+ p5.n * POWER(2,5)
+ p6.n * POWER(2,6)
+ p7.n * POWER(2,7)
as number
INTO numbers
FROM
(SELECT 0 as n UNION SELECT 1) p0,
(SELECT 0 as n UNION SELECT 1) p1,
(SELECT 0 as n UNION SELECT 1) p2,
(SELECT 0 as n UNION SELECT 1) p3,
(SELECT 0 as n UNION SELECT 1) p4,
(SELECT 0 as n UNION SELECT 1) p5,
(SELECT 0 as n UNION SELECT 1) p6,
(SELECT 0 as n UNION SELECT 1) p7
ORDER BY 1
LIMIT 100
"ORDER BY" is there only in case you want to paste it without the INTO clause and see the results.
Create a stored procedure that parses the string dynamically and populates a temp table, then select from the temp table.
Here is the magic code:
CREATE OR REPLACE PROCEDURE public.sp_string_split( "string" character varying )
AS $$
DECLARE
cnt INTEGER := 1;
no_of_parts INTEGER := (select REGEXP_COUNT ( string , ',' ));
sql VARCHAR(MAX) := '';
item character varying := '';
BEGIN
-- Create table
sql := 'CREATE TEMPORARY TABLE IF NOT EXISTS split_table (part VARCHAR(255)) ';
RAISE NOTICE 'executing sql %', sql ;
EXECUTE sql;
<<simple_loop_exit_continue>>
LOOP
item = (select split_part("string",',',cnt));
RAISE NOTICE 'item %', item ;
sql := 'INSERT INTO split_table SELECT '''||item||''' ';
EXECUTE sql;
cnt = cnt + 1;
EXIT simple_loop_exit_continue WHEN (cnt >= no_of_parts + 2);
END LOOP;
END ;
$$ LANGUAGE plpgsql;
Usage example:-
call public.sp_string_split('john,smith,jones');
select *
from split_table
You can try copy command to copy your file into redshift tables
copy table_name from 's3://mybucket/myfolder/my.csv' CREDENTIALS 'aws_access_key_id=my_aws_acc_key;aws_secret_access_key=my_aws_sec_key' delimiter ','
You can use delimiter ',' option.
For more details of copy command options you can visit this page
http://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html
Below is my code, which fires when a new row is inserted into Trip and then updates the totalTripMade column in the Driver table.
CREATE OR REPLACE TRIGGER UPDATE_TOTAL_TRIPS_MADE
AFTER INSERT
ON TRIP
FOR EACH ROW
DECLARE
tripsDone NUMBER(6);
driverL# NUMBER(12);
BEGIN
--Find the L# of the Driver performing the INSERT into the Trip table
SELECT D.L# INTO driverL#
FROM DRIVER D
WHERE D.L# =: NEW.L#;
--Find the number of trips done by the driver (Error occured here)
SELECT COUNT(T#) INTO tripsDone
FROM TRIP T
WHERE NEW.L# =: driverL#;
--Then update the totaltripmade by the driver
UPDATE
DRIVER
SET
totalTripMade = tripsDone
WHERE
L# = driverL#;
END UPDATE_TOTAL_TRIPS_MADE;
/
However, there is a compilation error because TRIP cannot be queried until the trigger has completed.
So I tried changing the SELECT COUNT statement to this:
SELECT COUNT(*) INTO tripsDone
FROM TRIP T
WHERE driverL# =: NEW.L#;
But that did not work either. I am not sure how to work around this to get the total number of rows in the Trip table for the driverL# that fired the trigger.
For solving this problem, you should probably NOT use a trigger. Just write a query for counting the "trips". Example:
Tables
create table driver (
id number generated always as identity start with 1 primary key
, surname varchar2( 50 )
-- , ttm number -- total trips made: redundant!
) ;
create table trip (
id number generated always as identity start with 1000 primary key
, from_ varchar2( 50 )
, to_ varchar2( 50 )
, driver number references driver( id )
) ;
Test data: DRIVER
insert into driver( surname )
select
'driver_' || to_char( level )
from dual
connect by level <= 10 ;
select * from driver ;
ID SURNAME
1 driver_1
2 driver_2
3 driver_3
4 driver_4
5 driver_5
6 driver_6
7 driver_7
8 driver_8
9 driver_9
10 driver_10
Test data: TRIP
begin
for i in 1 .. 10
loop
insert into trip ( from_, to_, driver ) values (
'departure_' || to_char( i )
, 'arrival_' || to_char( i )
, mod( i, 3 ) + 1
) ;
end loop ;
commit ;
end ;
/
SQL> select * from trip ;
ID FROM_ TO_ DRIVER
1000 departure_1 arrival_1 2
1001 departure_2 arrival_2 3
1002 departure_3 arrival_3 1
1003 departure_4 arrival_4 2
1004 departure_5 arrival_5 3
1005 departure_6 arrival_6 1
1006 departure_7 arrival_7 2
1007 departure_8 arrival_8 3
1008 departure_9 arrival_9 1
1009 departure_10 arrival_10 2
Query: find the "trip count"
select D.id, D.surname, count(*) as trips_made
from driver D
join trip T on D.id = T.driver
group by D.id, D.surname ;
-- result
ID SURNAME TRIPS_MADE
1 driver_1 3
2 driver_2 4
3 driver_3 3
However, if you want to learn about triggers, you will find that a "row level trigger" that queries its own table will give you "mutating table" errors, while in a "table level" (statement-level) trigger you cannot reference :NEW and :OLD. What you need is a COMPOUND TRIGGER. You can find numerous discussions about this problem. Apart from consulting the Oracle PL/SQL documentation, it may be worth your while looking at the examples written by T Hall and S Feuerstein.
The dbfiddle here contains the code used for this answer, and a third example table called DRIVERV2 plus examples of the above mentioned trigger types (and typical error messages).
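For the tables in the question, a compound trigger might be sketched as below. This is an untested illustration, not the code from the fiddle: it assumes DRIVER has the columns L# and totalTripMade and TRIP has the column L#, as in the question's trigger. The row-level section only remembers which drivers were touched; the statement-level section, where TRIP is no longer mutating, does the counting.

```sql
CREATE OR REPLACE TRIGGER update_total_trips_made
FOR INSERT ON trip
COMPOUND TRIGGER
  -- Licence numbers of the drivers touched by the current INSERT statement
  TYPE t_licences IS TABLE OF NUMBER(12) INDEX BY PLS_INTEGER;
  g_licences t_licences;

  AFTER EACH ROW IS
  BEGIN
    -- Only record the driver here; querying TRIP at this point would
    -- raise the mutating table error
    g_licences(g_licences.COUNT + 1) := :NEW.L#;
  END AFTER EACH ROW;

  AFTER STATEMENT IS
  BEGIN
    -- The statement has finished, so TRIP can be queried safely
    FOR i IN 1 .. g_licences.COUNT LOOP
      UPDATE driver d
         SET d.totalTripMade = (SELECT COUNT(*) FROM trip t WHERE t.L# = d.L#)
       WHERE d.L# = g_licences(i);
    END LOOP;
  END AFTER STATEMENT;
END update_total_trips_made;
/
```

The redundancy concern raised above still applies: the stored total only stays correct as long as every change to TRIP goes through a trigger like this one, and deletes or updates of L# would need handling too.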
I want to write a PL/SQL function which can be used in a variety of queries, particularly in the subqueries of a WITH clause. The tricky part is that I want the function to both receive and return a one-column TABLE (or CURSOR) of information.
Details: imagine that this function just sorts a list of employee IDs according to some very complicated criteria. We'd have this:
My function is called SORT_EMPLOYEES
My function takes a 1-column table of employee IDs (emp_id) as input.
This input type is probably a TABLE type EMP_T_TYPE.
This return type is probably also a TABLE type EMP_T_TYPE.
So far so good, I hope.
Now, I think that I can use the output of the function with the TABLE operator pretty much anywhere; e.g.:
WITH
wanted_emps AS (...), -- find employees we want, as table of emp_id
ranked_emps AS (
SELECT rownum() as rank, emp_id
FROM TABLE(SORT_EMPLOYEES(...???...))
),
...
The problem is: how can I get the list of employees from 'wanted_emps' and make it the input of SORT_EMPLOYEES? What goes in the "...???..." above? Is this even possible?
Please note that I want this to be used from plain SQL, especially from subqueries of WITH clauses as shown above -- not from PL/SQL. Thanks!
Here is an example of how to COLLECT values into a collection which can then be passed to a FUNCTION that returns another collection:
Oracle Setup:
CREATE TYPE numbers_table AS TABLE OF NUMBER;
CREATE TABLE test_data ( grp, value ) AS
SELECT 1, 1 FROM DUAL UNION ALL
SELECT 1, 2 FROM DUAL UNION ALL
SELECT 1, 3 FROM DUAL UNION ALL
SELECT 2, 4 FROM DUAL UNION ALL
SELECT 2, 5 FROM DUAL UNION ALL
SELECT 3, 6 FROM DUAL;
Query:
WITH FUNCTION square( i_numbers IN numbers_table ) RETURN numbers_table
IS
p_numbers numbers_table := numbers_table();
p_count PLS_INTEGER;
BEGIN
IF i_numbers IS NULL THEN
p_count := 0;
ELSE
p_count := i_numbers.COUNT;
END IF;
p_numbers.EXTEND( p_count );
FOR i IN 1 .. p_count LOOP
p_numbers(i) := i_numbers(i) * i_numbers(i);
END LOOP;
RETURN p_numbers;
END;
collected_rows ( grp, grouped_values ) AS (
SELECT grp,
CAST(
COLLECT( value ORDER BY value )
AS numbers_table
)
FROM test_data
GROUP BY grp
)
SELECT c.grp,
t.COLUMN_VALUE AS squared_value
FROM collected_rows c
CROSS JOIN
TABLE( square( c.grouped_values ) ) t;
Output:
GRP | SQUARED_VALUE
--: | ------------:
1 | 1
1 | 4
1 | 9
2 | 16
2 | 25
3 | 36
db<>fiddle here
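Applied to the question's scenario, the same COLLECT pattern might look like the sketch below. Everything specific here is an assumption: EMP_T_TYPE is taken to be a TABLE OF NUMBER, SORT_EMPLOYEES is the asker's function with that type as input and output, and the employees table with its dept_id filter is a hypothetical stand-in for the real wanted_emps logic.

```sql
-- Sketch: collect the CTE's rows into one collection, pass it to the
-- function, and expand the returned collection back into rows.
WITH wanted_emps AS (
  SELECT emp_id
  FROM employees          -- hypothetical source table
  WHERE dept_id = 10      -- hypothetical filter
),
ranked_emps AS (
  SELECT ROWNUM AS rnk,
         s.COLUMN_VALUE AS emp_id
  FROM (SELECT CAST(COLLECT(emp_id) AS emp_t_type) AS ids
        FROM wanted_emps) w
  CROSS JOIN
       TABLE(sort_employees(w.ids)) s
)
SELECT rnk, emp_id
FROM ranked_emps;
```

ROWNUM reflects the order in which TABLE() returns the elements. That usually follows the collection order, but it is not guaranteed, so a function that returns an explicit rank alongside each id would be more robust.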
I have two tables, tab1 and tab2, each with two columns, acc_num and prod_code. I need to update prod_code in tab2 from tab1. Below is the sample data in both tables:
TAB1
acnum Prod
-------------------
1 A
2 B
2 C
3 X
3 X
Tab2
acnum Prod
-------------------
1 null
2 null
2 null
3 null
3 null
After the update, the second table should hold all the distinct codes for each account, concatenated. Below is the sample output.
Tab2
acnum Prod
-------------------
1 A
2 B|C
2 B|C
3 X
3 X
I am able to achieve this through PL/SQL, but it takes ages to complete (the actual tables have millions of records). Below is the code I am using.
DECLARE
l_acnum dbms_sql.varchar2a;
l_prod dbms_sql.varchar2a;
l_prod2 VARCHAR2(10):= NULL;
l_count NUMBER := 0;
CURSOR cr_acnum
IS
SELECT DISTINCT(acnum) FROM tab1;
CURSOR cr_prod(l_acnum_dum IN VARCHAR2)
IS
SELECT prod FROM tab1 WHERE acnum = l_acnum_dum;
BEGIN
OPEN cr_acnum;
FETCH cr_acnum bulk collect INTO l_acnum;
CLOSE cr_acnum;
FOR i IN l_acnum.first .. l_acnum.last
LOOP
OPEN cr_prod(l_acnum(i));
FETCH cr_prod bulk collect INTO l_prod;
CLOSE cr_prod;
FOR m IN l_prod.first .. l_prod.last
LOOP
IF m <> 1 THEN
IF l_prod(m) = l_prod(m-1) THEN
l_prod2 := l_prod(m);
ELSE
l_prod2 := l_prod2||'|'||l_prod(m);
END IF;
ELSE
l_prod2 := l_prod(m);
END IF;
END LOOP;
UPDATE tab2 SET prod = l_prod2 WHERE acnum = l_acnum(i);
END LOOP;
END;
This PL/SQL block takes ages to complete. Is there any way I can achieve the same with a query rather than PL/SQL, or with more efficient PL/SQL? I tried BULK COLLECT as well, but to no avail. The data is in an Oracle DB. Thanks a lot for your time.
This will concatenate those values as long as they don't exceed a certain total length. You may also want to do a subquery to dedupe them if there are any dupes.
Update: here is with LISTAGG
update tab2 set Prod = (
SELECT LISTAGG(t1.Prod, '|') WITHIN GROUP (ORDER BY t1.Prod)
FROM tab1 t1
where t1.acnum = tab2.acnum)
Thanks everyone for your input. I achieved this using your suggestions, with a two-step solution:
Step 1) Create a lookup table of products per account number.
create table lkup_tbl
as SELECT acnum, LISTAGG(Prod, '|') WITHIN GROUP (ORDER BY Prod) product
FROM (select distinct acnum, Prod from tab1) tab
GROUP BY acnum;
Step 2) Now update tab2 by joining to this lookup table.
update tab2 t1
set (t1.Prod) = (select product from lkup_tbl t2
where t2.acnum = t1.acnum
);
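If the intermediate lookup table is not needed for anything else, the two steps could probably be folded into a single MERGE. This is a sketch using the tab1 and tab2 names from the question, not something measured against the real data:

```sql
-- Sketch: build the deduplicated, '|'-joined product list per account
-- and apply it to tab2 in one statement.
MERGE INTO tab2 t
USING (
  SELECT acnum,
         LISTAGG(prod, '|') WITHIN GROUP (ORDER BY prod) AS product
  FROM (SELECT DISTINCT acnum, prod FROM tab1)
  GROUP BY acnum
) s
ON (t.acnum = s.acnum)
WHEN MATCHED THEN
  UPDATE SET t.prod = s.product;
```

One behavioural difference from the correlated UPDATE: rows in tab2 whose acnum has no match in tab1 are left untouched here, whereas the UPDATE sets their prod to NULL.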