concat two strings totalling >4000 characters - sql

I am trying to combine two columns from one table (A) into on column in a second table (B). The column in table B is set as CLOB to hold the large amount of data, however when I run the script I get a message that the result of my string concatenation is too long.
INSERT INTO berkshire.tim_rebuttal_prod(
,INVEST_OR_UWG
)
SELECT
,(INVEST_OR_UWG || INVEST_OR_UWG2)
FROM berkshire.stage_rebuttal

select rtrim(xmlagg(xmlelement(e,id,',').extract('//text()') order by id).GetClobVal(),',')
from (select level as id from dual connect by level < 1050)

Related

Redshift - Extract value matching a condition in Array

I have a Redshift table with the following column
How can I extract the value starting by cat_ from this column please (there is only one for each row and at different position in the array)?
I want to get those results:
cat_incident
cat_feature_missing
cat_duplicated_request
Thanks!
There is no easy way to extract multiple values from within one column in SQL (or at least not in the SQL used by Redshift).
You could write a User-Defined Function (UDF) that returns a string containing those values, separated by newlines. Whether this is acceptable depends on what you wish to do with the output (eg JOIN against it).
Another option is to pre-process the data before it is loaded into Redshift, to put this information in a separate one-to-many table, with each value in its own row. It would then be trivial to return this information.
You can do this using tally table (table with numbers). Check this link on information how to create this table: http://www.sqlservercentral.com/articles/T-SQL/62867/
Here is example how you would use it. In real life you should replace temporary #tally table with a permanent one.
--create sample table with data
create table #a (tags varchar(500));
insert into #a
select 'blah,cat_incident,mcr_close_ticket'
union
select 'blah-blah,cat_feature_missing,cat_duplicated_request';
--create tally table
create table #tally(n int);
insert into #tally
select 1
union select 2
union select 3
union select 4
union select 5
;
--get tags
select * from
(
select TRIM(SPLIT_PART(a.tags, ',', t.n)) AS single_tag
from #tally t
inner join #a a ON t.n <= REGEXP_COUNT(a.tags, ',') + 1 and n<1000
)
where single_tag like 'cat%'
;
Thanks!
In the end I managed to do it with the following query:
SELECT SUBSTRING(SUBSTRING(tags, charindex('cat_', tags), len(tags)), 0, charindex(',', SUBSTRING(tags, charindex('cat_', tags), len(tags)))) tags
FROM table

Hive - getting the column names count of a table

How can I get the hive column count names using HQL? I know we can use the describe.tablename to get the names of columns. How do we get the count?
create table mytable(i int,str string,dt date, ai array<int>,strct struct<k:int,j:int>);
select count(*)
from (select transform ('')
using 'hive -e "desc mytable"'
as col_name,data_type,comment
) t
;
5
Some additional playing around:
create table mytable (id int,first_name string,last_name string);
insert into mytable values (1,'Dudu',null);
select size(array(*)) from mytable limit 1;
This is not bulletproof since not all combinations of columns types can be combined into an array.
It also requires that the table will contain at least 1 row.
Here is a more complex but also stronger solution (types versa), but also requires that the table will contain at least 1 row
select size(str_to_map(val)) from (select transform (struct(*)) using 'sed -r "s/.(.*)./\1/' as val from mytable) t;

Average count of elements in a set in hive?

I have two columns id and segment. Segment is comma separated set of strings. I need to find average number of segments in all the table. One way to do it is by using two separate queries -
A - select count(*) from table_name;
B - select count(*) from table_name LATERAL VIEW explode(split(segment, ',') lTable AS singleSegment where segment != ""
avg = B/A
Answer would be 8/4 = 2 in the above case.
Is there a better way to achieve this ?
Try:
select sum(CASE segment
WHEN '' THEN 0
ELSE size(split(segment,','))
END
)*1.0/count(*) from table_name;
If your id field is unique, and you want to add a filter to the segment part, or protect against other malformed segment values like a,b, and a,,b, you could do:
SELECT SUM(seg_size)*1.0/count(*) FROM (
SELECT count(*) as seg_size from table_name
LATERAL VIEW explode(split(segment, ',')) lTable AS singleSegment
WHERE trim(singleSegment) != ""
GROUP BY id
) sizes
Then you can add other stuff into the where clause.
But this query takes two Hive jobs to run, compared to one for the simpler query, and requires the id field to be unique.

Oracle Compare data between two different table

I have two table one is having all field VARCHAR2 but other having different type for different data.
For Example :
Table One
==========================
Col 1 VARCHAR2 UNIQUE KEY
Col 2 VARCHAR2
Col 3 VARCHAR2
===========================
Table Two
==========================
Col One VARCHAR2 UNIQUE KEY
Col Two TIMESTAMP
Col Three NUMBER
==========================
we are having one mapping table. it denotes which column of Table One has to compare with which column of Table Two.
For Example
Mapping Table
==============================
Table One Table Two
==============================
Col 1 Col One
Col 2 Col Three
Col 3 Col Two
==============================
Now with the help of UNIQUE KEY of TABLE ONE we have to find same row in TABLE TWO and compare rows column by column and get changes in data.
Currently we are using java program for comparing data row by row and column by column and getting changes between data in rows with same UNIQUE KEY. it is working fine but taking too much time as we are having 100000 records in DB.
Now my question is : is there any way i can compare data at SQL level and get changes in data?
You can do it 'manually' with a query like this: It's a lot of work, but there are only three different types of checks you need to do, so it's not very complex:
select
*
from
Table1 t1
full outer join Table2 t2 on t2.ID = t1.ID
where
-- Check ID, either record does not exist in either table.
t1.ID is null or
t2.ID = null or
-- Not nullable field can be easily compared.
t1.NotNullableField1 <> t2.NotNUllableField1 or
-- Nullable field is slightly more work.
t1.NullableField1 <> t2.NullableField1 or
(t1.NullableField1 is null and t2.NullableField1 is not null) or
(t1.NullableField1 is not null and t2.NullableField1 is null)
Another solution is to use MINUS, which is a bit like UNION, only it returns a dataset minus the records in a second dataset:
select * from Table1 t1
MINUS
select * from Table2 t2
This works only one way (which might be fine for your purpose), but you can also combine it with UNION to make it bidirectional.
select
*
from
( select * from Table1
MINUS
select * from Table2)
UNION ALL
( select * from Table2
MINUS
select * from Table1)
The output of both solutions is a bit different.
In the FULL OUTER JOIN query, the IDs will be joined and the values of the matching rows will be displayed next to each other as a single row.
In the MINUS query, the result will be presented as a single dataset. If a record does not exist in either one table, it will be displayed. If a record (ID) exists in both tables, but other fields are different, you will get both rows. So it's a bit harder to compare them.
See: http://www.techonthenet.com/oracle/minus.php

Oracle Select numbers from an IN clause

I'm looking for the best way to select numbers directly from an in clause.
Basically like:
SELECT * FROM (2,6,1,8);
That doesn't work. I can do it this way:
SELECT Lv FROM ( SELECT Level LV
FROM DUAL
CONNECT BY Level < 20)
WHERE Lv IN (2,6,1,8);
But that seems to be a bit clunky. Is there a more elegant way?
You can do
select column_value from table(sys.dbms_debug_vc2coll(1,2,3,4,5));
but that actually returns a varchar2. You can create your own TYPE and use that
create type tab_num is table of number;
/
select column_value from table(tab_num(1,2,3,4,5));
It's also worth looking at the MODEL clause. It looks complicated, but it is very good at generating data
SELECT x from dual
MODEL DIMENSION BY (1 AS z) MEASURES (1 x)
RULES ITERATE (7) (x[ITERATION_NUMBER]=ITERATION_NUMBER+1)
It's more elegant if you materialize an auxiliary numbers table:
SELECT num FROM numbers WHERE num IN (2,6,1,8);
And this is also useful when combined with another table.
For instance, I've had a case where I needed to populate large configuration tables with changes from piecewise results:
Big SP or Excel sheet or report identifies missing cost centers in config gives a large set of results which need to be inserted with varying data in some groups.
Paste partial results into a individual comma separated lists:
INSERT INTO {stuff}
SELECT {stuff}, 130 as line_item
FROM numbers
WHERE numbers.num IN ({pasted a section of results})
INSERT INTO {stuff}
SELECT {stuff}, 135 as line_item
FROM numbers
WHERE numbers.num IN ({pasted another section of results})
If you don't explicitly need the IN clause, you could use UNION:
select 2 from dual
union
select 6 from dual
union
select 1 from dual
union
select 8 from dual
There is a more elegant variant to INSERT multiple rows into a table:
INSERT ALL
INTO table (col) VALUES ('a')
INTO table (col) VALUES ('b')
INTO table (col) VALUES ('c')
SELECT * FROM dual;
But I don't know a way to do that for a SELECT.