How to compare insert and select with STRING_SPLIT? - sql

I want to compare the column order of the insert part and the select part of lots of procedures.
For example, the procedure name is 'update_example', the insert part is 'stichtag, mandant, code, id',
and the select part is '#stichtag, s1.mandant as mandant, s2,id as id, s3.code as code'. I want to write all the information into a table: if I can find stichtag in #stichtag, then Y, otherwise N.
update_example | stichtag | #stichtag             | Y
update_example | mandant  | s1.mandant as mandant | Y
update_example | code     | s2,id as id           | N
update_example | id       | s3.code as code       | N
I wrote a statement like
select 'update_example',
value from STRING_SPLIT('stichtag, mandant, code, id', ','),
value from STRING_SPLIT('#stichtag, s1.mandant as mandant, s2,id as id, s3.code as code', ',')
but I cannot use FROM twice like that.
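One workaround, as a minimal sketch: on SQL Server 2022 (or Azure SQL Database), STRING_SPLIT accepts a third enable_ordinal argument, so you can split both lists separately and join them on position (on older versions STRING_SPLIT does not guarantee output order, so you would need a different splitter). Note that an entry like 's2,id' contains its own comma and will shift the ordinals, which is part of what the comparison should surface:

DECLARE @proc_name   sysname      = 'update_example';
DECLARE @insert_part varchar(max) = 'stichtag, mandant, code, id';
DECLARE @select_part varchar(max) = '#stichtag, s1.mandant as mandant, s2,id as id, s3.code as code';

SELECT @proc_name AS proc_name,
       i.value    AS insert_col,
       s.value    AS select_col,
       -- Y if the insert column name appears in the select item at the same position
       CASE WHEN CHARINDEX(TRIM(i.value), s.value) > 0 THEN 'Y' ELSE 'N' END AS ok
FROM STRING_SPLIT(@insert_part, ',', 1) AS i
JOIN STRING_SPLIT(@select_part, ',', 1) AS s
  ON i.ordinal = s.ordinal;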


Redshift comma separated string to column [duplicate]

I am wondering how to convert comma-delimited values into rows in Redshift. I am afraid that my own solution isn't optimal. Please advise. I have a table where one of the columns contains comma-separated values. For example:
I have:
user_id|user_name|user_action
-----------------------------
1 | Shone | start,stop,cancell...
I would like to see
user_id|user_name|parsed_action
-------------------------------
1 | Shone | start
1 | Shone | stop
1 | Shone | cancell
....
A slight improvement over the existing answer is to use a second "numbers" table that enumerates all of the possible list lengths and then use a cross join to make the query more compact.
Redshift does not have a straightforward method for creating a numbers table that I am aware of, but we can use a bit of a hack from https://www.periscope.io/blog/generate-series-in-redshift-and-mysql.html to create one using row numbers.
Specifically, if we assume the number of rows in cmd_logs is larger than the maximum number of commas in the user_action column, we can create a numbers table by counting rows. To start, let's assume there are at most 99 commas in the user_action column:
select
(row_number() over (order by true))::int as n
into numbers
from cmd_logs
limit 100;
If we want to get fancy, we can compute the number of commas from the cmd_logs table to create a more precise set of rows in numbers:
select
n::int
into numbers
from
(select
row_number() over (order by true) as n
from cmd_logs) seq
cross join
(select
max(regexp_count(user_action, '[,]')) as max_num
from cmd_logs) counts
where
n <= max_num + 1;
Once there is a numbers table, we can do:
select
user_id,
user_name,
split_part(user_action,',',n) as parsed_action
from
cmd_logs
cross join
numbers
where
split_part(user_action,',',n) is not null
and split_part(user_action,',',n) != '';
Another idea is to transform your CSV string into JSON first, followed by a JSON extract, along the following lines:
... '["' || replace( user_action, ',', '", "' ) || '"]' AS replaced
... JSON_EXTRACT_ARRAY_ELEMENT_TEXT(replaced, numbers.n - 1) AS parsed_action
Where "numbers" is the table from the first answer (its n starts at 1, while JSON array indexes are zero-based, hence the n - 1). The advantage of this approach is the ability to use built-in JSON functionality.
If you know that there are not many actions in your user_action column, you can use recursive sub-querying with union all and thereby avoid the auxiliary numbers table.
But it requires you to know the number of actions for each user, so either adjust the initial table or make a view or a temporary table for it.
Data preparation
Assuming you have something like this as a table:
create temporary table actions
(
user_id varchar,
user_name varchar,
user_action varchar
);
I'll insert some values in it:
insert into actions
values (1, 'Shone', 'start,stop,cancel'),
(2, 'Gregory', 'find,diagnose,taunt'),
(3, 'Robot', 'kill,destroy');
Here's an additional table with the temporary counts:
create temporary table actions_with_counts
(
id varchar,
name varchar,
num_actions integer,
actions varchar
);
insert into actions_with_counts (
select user_id,
user_name,
regexp_count(user_action, ',') + 1 as num_actions,
user_action
from actions
);
This would be our "input table", and it looks just as you expected:
select * from actions_with_counts;
id | name    | num_actions | actions
---|---------|-------------|--------------------
2  | Gregory | 3           | find,diagnose,taunt
3  | Robot   | 2           | kill,destroy
1  | Shone   | 3           | start,stop,cancel
Again, you can adjust the initial table and thereby skip adding the counts as a separate table.
Sub-query to flatten the actions
Here's the unnesting query:
with recursive tmp (user_id, user_name, idx, user_action) as
(
select id,
name,
1 as idx,
split_part(actions, ',', 1) as user_action
from actions_with_counts
union all
select user_id,
user_name,
idx + 1 as idx,
split_part(actions, ',', idx + 1)
from actions_with_counts
join tmp on actions_with_counts.id = tmp.user_id
where idx < num_actions
)
select user_id, user_name, user_action as parsed_action
from tmp
order by user_id;
This will create a new row for each action, and the output would look like this:
user_id | user_name | parsed_action
--------|-----------|--------------
1       | Shone     | start
1       | Shone     | stop
1       | Shone     | cancel
2       | Gregory   | find
2       | Gregory   | diagnose
2       | Gregory   | taunt
3       | Robot     | kill
3       | Robot     | destroy
Here are two ways to achieve this.
In my example, I'm assuming that I am accepting a comma separated list of values. My values look like schema.table.column.
The first involves using a recursive CTE.
drop table if exists #dep_tbl;
create table #dep_tbl as
select 'schema.foobar.insert_ts,schema.baz.load_ts' as dep
;
with recursive tmp (level, dep_split, to_split) as
(
select 1 as level
, split_part(dep, ',', 1) as dep_split
, regexp_count(dep, ',') as to_split
from #dep_tbl
union all
select tmp.level + 1 as level
, split_part(a.dep, ',', tmp.level + 1) as dep_split_u
, tmp.to_split
from #dep_tbl a
inner join tmp on tmp.dep_split is not null
and tmp.level <= tmp.to_split
)
select dep_split from tmp;
the above yields:
|dep_split|
|schema.foobar.insert_ts|
|schema.baz.load_ts|
The second involves a stored procedure.
CREATE OR REPLACE PROCEDURE so_test(dependencies_csv varchar(max))
LANGUAGE plpgsql
AS $$
DECLARE
dependencies_csv_vals varchar(max);
BEGIN
drop table if exists #dep_holder;
create table #dep_holder
(
avoid varchar(60000)
);
IF dependencies_csv is not null THEN
dependencies_csv_vals:='('||replace(quote_literal(regexp_replace(dependencies_csv,'\\s','')),',', '\'),(\'') ||')';
execute 'insert into #dep_holder values '||dependencies_csv_vals||';';
END IF;
END;
$$
;
call so_test('schema.foobar.insert_ts,schema.baz.load_ts');
select
*
from
#dep_holder;
the above yields:
|dep_split|
|schema.foobar.insert_ts|
|schema.baz.load_ts|
In conclusion
If you only care about one single column in your input (the X delimited values), then I think the stored procedure is easier/faster.
However, if you have other columns you care about and want to keep those columns along with your comma-separated value column now transformed to rows, OR if you want to know the argument (the original list of delimited values), then I think the recursive CTE is the way to go. In that case, you can just add those other columns to the columns selected in the recursive query.
You can get the expected result with the following query. I'm using UNION ALL to convert the columns to rows; this version handles up to three actions per row, so add one more UNION ALL branch for each additional action (and note that split_part returns an empty string when there is no value at that position, so you may want to filter those out).
select user_id, user_name, split_part(user_action,',',1) as parsed_action from cmd_logs
union all
select user_id, user_name, split_part(user_action,',',2) as parsed_action from cmd_logs
union all
select user_id, user_name, split_part(user_action,',',3) as parsed_action from cmd_logs
Here's my equally-terrible answer.
I have a users table, and then an events table with a column that is just a comma-delimited string of users at said event. eg
event_id | user_ids
1 | 5,18,25,99,105
In this case, I used the LIKE and wildcard functions to build a new table that represents each event-user edge.
SELECT e.event_id, u.id as user_id
FROM events e
-- delimiting both sides prevents user id 5 from matching inside 105
LEFT JOIN users u ON ',' || e.user_ids || ',' LIKE '%,' || u.id || ',%'
It's not pretty, but I throw it in a WITH clause so that I don't have to run it more than once per query. I'll likely just build an ETL to create that table every night anyway.
Also, this only works if you have a second table that does have one row per unique possibility. If not, you could do LISTAGG to get a single cell with all your values, export that to a CSV and reupload that as a table to help.
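That LISTAGG step might look something like this sketch (assuming Redshift's LISTAGG and a users table with an integer id):

-- one cell with every user id, ready to export as CSV and re-upload
select listagg(id::varchar, ',') as all_user_ids
from users;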
Like I said: a terrible, no-good solution.
Late to the party, but I got something working (albeit very slow):
with nums as (select n::int n
from
(select
row_number() over (order by true) as n
from table_with_enough_rows_to_cover_range) seq
cross join
(select
max(json_array_length(json_column)) as max_num
from table_with_json_column) counts
where
n <= max_num + 1)
select *, json_extract_array_element_text(json_column, nums.n - 1) as parsed_json
from nums, table_with_json_column
where json_extract_array_element_text(json_column, nums.n - 1) != ''
and nums.n <= json_array_length(json_column)
Thanks to answer by Bob Baxley for inspiration
Just an improvement over the answer above (https://stackoverflow.com/a/31998832/1265306):
generate the numbers table using the following SQL, taken from
https://discourse.looker.com/t/generating-a-numbers-table-in-mysql-and-redshift/482
SELECT
p0.n
+ p1.n*2
+ p2.n * POWER(2,2)
+ p3.n * POWER(2,3)
+ p4.n * POWER(2,4)
+ p5.n * POWER(2,5)
+ p6.n * POWER(2,6)
+ p7.n * POWER(2,7)
as number
INTO numbers
FROM
(SELECT 0 as n UNION SELECT 1) p0,
(SELECT 0 as n UNION SELECT 1) p1,
(SELECT 0 as n UNION SELECT 1) p2,
(SELECT 0 as n UNION SELECT 1) p3,
(SELECT 0 as n UNION SELECT 1) p4,
(SELECT 0 as n UNION SELECT 1) p5,
(SELECT 0 as n UNION SELECT 1) p6,
(SELECT 0 as n UNION SELECT 1) p7
ORDER BY 1
LIMIT 100
"ORDER BY" is there only in case you want paste it without the INTO clause and see the results
Create a stored procedure that parses the string dynamically, populates a temp table, and then select from the temp table.
Here is the magic code:
CREATE OR REPLACE PROCEDURE public.sp_string_split( "string" character varying )
AS $$
DECLARE
cnt INTEGER := 1;
no_of_parts INTEGER := (select REGEXP_COUNT ( string , ',' ));
sql VARCHAR(MAX) := '';
item character varying := '';
BEGIN
-- Create table
sql := 'CREATE TEMPORARY TABLE IF NOT EXISTS split_table (part VARCHAR(255)) ';
RAISE NOTICE 'executing sql %', sql ;
EXECUTE sql;
<<simple_loop_exit_continue>>
LOOP
item = (select split_part("string",',',cnt));
RAISE NOTICE 'item %', item ;
sql := 'INSERT INTO split_table SELECT '''||item||''' ';
EXECUTE sql;
cnt = cnt + 1;
EXIT simple_loop_exit_continue WHEN (cnt >= no_of_parts + 2);
END LOOP;
END ;
$$ LANGUAGE plpgsql;
Usage example:
call public.sp_string_split('john,smith,jones');
select *
from split_table;
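For the sample call, the loop runs three times and split_table ends up with three rows: john, smith, and jones.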
You can try the COPY command to copy your file into Redshift tables:
copy table_name from 's3://mybucket/myfolder/my.csv' CREDENTIALS 'aws_access_key_id=my_aws_acc_key;aws_secret_access_key=my_aws_sec_key' delimiter ','
You can use the delimiter ',' option.
For more details on COPY command options, see this page:
http://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html

Alternate approach to WITH CTE and large UNION query

I'd like to rework a script I've been given.
The way it currently works is via a WITH CTE using a large number of UNIONs.
Current setup
We're taking one record from a source table, inserting it into a destination table once with [Name] A then inserting it again with [Name] B. Essentially creating multiple rows in the destination, albeit with different [Name].
An example of one transaction would be to take this row from [Source]:
ID [123] Name [Red and Green]
The results of my current set up in the [Destination] is:
ID [123] Name [Red]
ID [123] Name [Green]
Current logic
Here's a simplified version of the current logic:
WITH CTE
AS
(SELECT ID,
'Red' AS [Name]
FROM [Source_Table]
WHERE [Name] = 'Red and Green'
UNION ALL
SELECT ID,
'Green' AS [Name]
FROM [Source_Table]
WHERE [Name] = 'Red and Green')
INSERT INTO [Destination_Table]
(ID,
[Name])
SELECT ID,
[Name]
FROM CTE;
The reason I'd like to rework this is that when we get a new [Name], we have to manually add another portion of code into our (ever-increasing) UNION to make sure it gets picked up.
What I've considered
What I was considering was setting up a WHILE LOOP (or CURSOR) running off a control table where we could store all of the [Name]s. However, I'm not sure this would be the best approach, and I'm not too familiar yet with LOOPS/CURSORS. I also wouldn't be sure how to stop the loop once all [Name]s had been processed.
Any help much appreciated.
You can use cross apply to duplicate the rows:
insert into [destination_table] (id, name)
select x.*
from source_table s
cross apply (values (s.id, 'Red'), (s.id, 'Green')) x(id, name)
where s.name = 'Red and Green';
Introduce a new table called Color_List which just contains one row for each possible color. Then do this:
with cte as
(
select
s.ID,
c.colorname
from
Source_Table s
inner join
Color_List c
on
CHARINDEX(c.colorname, s.[Name]) > 0
)
insert into Destination_Table
(
ID,
[Name]
)
select
ID,
colorname
from
cte
The benefit of this method is that you aren't hard-coding any color names in the query. All the color names (and presumably there can be many more than two) get maintained in the Color_List table.
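A hypothetical setup for that Color_List table (the table and column names are taken from the query above; the color values are illustrative):

create table Color_List (colorname varchar(50));

insert into Color_List (colorname)
values ('Red'), ('Green'), ('Blue');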
You could use string_split to split the values apart. First replace the ' and ' with a pipe '|'. Then do a string split on the vertical pipe.
drop table if exists #tTEST;
go
select * INTO #tTEST from (values
(1, '[123]', 'Name', '[Red and Green]')) V(ID, testCol, nameCol, stringCol);
select ID, testCol, nameCol,
case when left([value], 1)!='[' then concat('[',[value]) else
case when right([value], 1)!=']' then concat([value], ']') else [value] end end valCol
from #tTEST t
cross apply string_split(replace(t.stringCol, ' and ', '|'), '|');
Results
ID testCol nameCol valCol
1 [123] Name [Red]
1 [123] Name [Green]

Append values from 2 different columns in SQL

I have the following table
I need to get the following output as "SVGFRAMXPOSLSVG" from the 2 columns.
Is it possible to get these appended values from the 2 columns?
Please try this.
SELECT STUFF((
SELECT '' + DEPART_AIRPORT_CODE + ARRIVE_AIRPORT_CODE
FROM #tblName
FOR XML PATH('')
), 1, 0, '')
For example:
Declare #tbl Table(
id INT ,
DEPART_AIRPORT_CODE Varchar(50),
ARRIVE_AIRPORT_CODE Varchar(50),
value varchar(50)
)
INSERT INTO #tbl VALUES(1,'g1','g2',NULL)
INSERT INTO #tbl VALUES(2,'g2','g3',NULL)
INSERT INTO #tbl VALUES(3,'g3','g1',NULL)
SELECT STUFF((
SELECT '' + DEPART_AIRPORT_CODE + ARRIVE_AIRPORT_CODE
FROM #tbl
FOR XML PATH('')
), 1, 0, '')
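For these three sample rows this returns 'g1g2g2g3g3g1', though without an ORDER BY in the subquery, the row order (and therefore the concatenation order) is not guaranteed.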
Summary
Use Analytic functions and listagg to get the job done.
Detail
Create two lists of code_id and code values. Match the code_id values for the same airport codes (passengers depart from the same airport they just arrived at), using lag and lead to grab values from other rows. NULLs will exist for code_id at the start and end of the itinerary, so default the first NULL to 0 and the last NULL to the previous code_id plus 1. This produces a list of codes with a matching index. Merge the lists together and remove duplicates using a union. Finally, use listagg with no delimiter to aggregate the rows into a single string value.
with codes as
(
select
nvl(lag(t1.id) over (order by t1.id),0) as code_id,
t1.depart_airport_code as code
from table1 t1
union
select
nvl(lead(t1.id) over (order by t1.id)-1,lag(t1.id) over (order by t1.id)+1) as code_id,
t1.arrive_airport_code as code
from table1 t1
)
select
listagg(c.code,'') WITHIN GROUP (ORDER BY c.code_id) as result
from codes c;
Note: This solution relies on an integer id field being available; otherwise the analytic functions wouldn't have a column to sort by. If id doesn't exist, you would need to manufacture one based on another column, such as a timestamp or another identifier that ensures the rows are in the correct order.
Use row_number() over (order by myorderedidentifier) as id in a subquery or view to achieve this. Don't use rownum; it could give you unpredictable results, because without an ORDER BY clause there is no guarantee that the same query will return the same results each time.
Output
| RESULT |
|-----------------|
| SVGFRAMXPOSLSVG |
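As an illustration of the note above, a hedged sketch of manufacturing an id when the source has none (flights_raw and myorderedidentifier are placeholders for your actual table and ordering column):

-- Manufacture a sortable id; myorderedidentifier stands in for any
-- column (timestamp, sequence value) that puts the legs in travel order.
create or replace view table1 as
select row_number() over (order by myorderedidentifier) as id,
       depart_airport_code,
       arrive_airport_code
from flights_raw;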

How to have select distinct of all but last column , and last column as comma separated values

I have a table with the following structure. It has 8 columns, where the first 7 sometimes contain duplicates and the last one can have different values (not unique though).
How do I select distinct on the first 7 columns and then show the last column as comma-separated values?
So the last column should look like the following:
Select the distinct 7 values in a sub-query and then perform the XML STUFF to consolidate the values
Please note the "--Add your other fields here"
Example
Select A.*
,CollectionDate = Stuff((Select Distinct ',' +cast(CollectionDate as varchar(max))
From YourTable
Where Quantity=A.Quantity
and Protein =A.Protein
and Carb =A.Carb
-- Add your other fields here
For XML Path ('')),1,1,'')
From (Select Distinct
Quantity
,Protein
,Carb
-- Add your other fields here
From YourTable ) A
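As a quick way to test this, here's a hypothetical sample table using the columns from the answer (the data values are made up):

create table YourTable (
    Quantity       int,
    Protein        int,
    Carb           int,
    CollectionDate date
);

insert into YourTable values
    (1, 20, 30, '2020-01-01'),
    (1, 20, 30, '2020-01-02'),
    (2, 15, 40, '2020-01-03');

For these rows, the query returns two rows: (1, 20, 30, '2020-01-01,2020-01-02') and (2, 15, 40, '2020-01-03').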

Concatenate comma delimited column values and remove duplicates

I need to concatenate comma-delimited column values stored in an Oracle table. I will also need to remove duplicated values when I concatenate the column values. I'm new to Oracle and not sure where to start. Can someone help me achieve the following in Oracle 11g?
Table:
rec_id affiliations
1 P,QE,D
2 EE,ED-D
1 QE,PO-D, D
2 A,EE
Desired output:
rec_id affiliations
1 P,QE,D,PO-D
2 EE,ED-D,A,EE
The first part of this query parses the input into a separate row for each affiliation; the final select concatenates them into a single list for each rec_id.
with parsed as (
select distinct
rec_id
,ltrim(regexp_substr(','||affiliations,',([^,])+',1,i), ',') k
from t, (select rownum i from dual connect by level <= 100)
where regexp_substr(','||affiliations,',([^,])+',1,i) is not null)
select distinct
rec_id
,listagg(k, ',') within group (order by k) over (partition by rec_id) affiliations
from parsed
order by rec_id;
Adjust the number (e.g. 100) to the maximum number of items you'd expect to see in the input.
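A hypothetical setup matching the question's data (the query above reads from a table t):

create table t (rec_id number, affiliations varchar2(100));

insert into t values (1, 'P,QE,D');
insert into t values (2, 'EE,ED-D');
insert into t values (1, 'QE,PO-D, D');
insert into t values (2, 'A,EE');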
http://sqlfiddle.com/#!4/37b44/4