I have data in a CLOB column as below:
:1A:CAD22021828,17
:1B:RECEIVE GENERAL IND
11 BEGUM ST 3-15A2
VILL AP IND 313 416
:1C:/000061071257 CC
RECEIVER GENERAL FOR IND
C/O PNBB MAIN BRANCH
11 BEGUM ST 3-15A2
AA HYD APIND
My requirement is to load this into 3 separate columns in the target table as below:
1A - CAD22021828,17
1B - RECEIVE GENERAL IND
11 BEGUM ST 3-15A2
VILL AP IND 313 416
1C - /000061071257 CC
RECEIVER GENERAL FOR IND
C/O PNBB MAIN BRANCH
11 BEGUM ST 3-15A2
AA HYD APIND
Can someone suggest how I can do this?
This is Oracle 11.2.
I have tried the code below:
SELECT
REGEXP_SUBSTR(mc_clob,':1A:([[:alnum:]]+\S+)') AS code1A,
REGEXP_SUBSTR(mc_clob,':1B:([[:alnum:]]+\s+)') AS code1B,
REGEXP_SUBSTR(mc_clob,':1C:([[:alnum:]]+\s+)') AS code1c
FROM tableA;
Here is one way to do this using REGEXP_SUBSTR with capture groups:
SELECT
REGEXP_SUBSTR(mc_clob, ':1A:(.*):1B:', 1, 1, 'n', 1) AS code1A,
REGEXP_SUBSTR(mc_clob,':1B:(.*):1C:', 1, 1, 'n', 1) AS code1B,
REGEXP_SUBSTR(mc_clob,':1C:(.*)', 1, 1, 'n', 1) AS code1c
FROM tableA;
To understand how this works, take the first call to REGEXP_SUBSTR:
REGEXP_SUBSTR(mc_clob, ':1A:(.*):1B:', 1, 1, 'n', 1)
This says to match :1A:(.*):1B:, capturing all content between the :1A: and :1B: markers. The fifth parameter is n, which tells Oracle to let dot match across newlines. That is, (.*) will capture all content between the two markers, including across lines. The sixth parameter is 1, which means that the return value will be the first (and only) capture group. Similar logic applies to the second and third call to REGEXP_SUBSTR.
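The same capture-between-markers idea can be sketched outside the database. This is a minimal Python illustration, not the Oracle query itself: `re.DOTALL` plays the role of the 'n' match parameter, `group(1)` plays the role of the sixth argument, and the helper name `between` is made up for this example. The sample text is copied from the question.

```python
import re

# Sample text copied from the question; the variable name mirrors the column.
mc_clob = (
    ":1A:CAD22021828,17\n"
    ":1B:RECEIVE GENERAL IND\n"
    "11 BEGUM ST 3-15A2\n"
    "VILL AP IND 313 416\n"
    ":1C:/000061071257 CC\n"
    "RECEIVER GENERAL FOR IND\n"
    "C/O PNBB MAIN BRANCH\n"
    "11 BEGUM ST 3-15A2\n"
    "AA HYD APIND"
)


def between(text, start, end=None):
    """Return what follows `start`, up to `end` if one is given (hypothetical helper)."""
    pattern = re.escape(start) + "(.*)" + (re.escape(end) if end else "")
    # re.DOTALL lets "." match newlines, like the 'n' parameter in Oracle.
    match = re.search(pattern, text, flags=re.DOTALL)
    # group(1) is the first capture group, like subexpression 1 in Oracle.
    return match.group(1) if match else None


code1a = between(mc_clob, ":1A:", ":1B:")
code1b = between(mc_clob, ":1B:", ":1C:")
code1c = between(mc_clob, ":1C:")
```

Each captured value keeps the newline that precedes the next marker, so a final strip may be wanted before loading the target columns.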
Please see the SQLFiddle example:
http://sqlfiddle.com/#!4/abd6d/1
Here are a few example addresses:
MINNEAPOLIS MN 55450
MINNAPOLIS MN 55439-8136
BETHANY OK 73008
Hillsboro Oregon 97124
Not all of them are separated by spaces, but enough are that I think that is the approach I want to take.
Running Oracle 11g.
Update:
This was how it was accomplished:
select bill_address4,
       Substr(bill_address4, 1, Instr(bill_address4, ',') - 1) "CITY EXMP ONE",
       regexp_substr(bill_address4, '[^,]+', 1, 1) "CITY EXMP TWO",
       Trim(regexp_substr(bill_address4, '[^,]+', 1, 2)) "STATE/ZIP",
       TRIM(regexp_substr(Trim(regexp_substr(bill_address4, '[^,]+', 1, 2)), '[^ ]+', 1, 1)) "STATE",
       TRIM(TRIM(regexp_substr(Trim(regexp_substr(bill_address4, '[^,]+', 1, 2)), '[^ ]+', 1, 2)) || ' ' ||
            TRIM(regexp_substr(Trim(regexp_substr(bill_address4, '[^,]+', 1, 2)), '[^ ]+', 1, 3)) || ' ' ||
            TRIM(regexp_substr(Trim(regexp_substr(bill_address4, '[^,]+', 1, 2)), '[^ ]+', 1, 4))) "ZIP"
from so_header
I do not think this is easily feasible in SQL. You will need to rearrange the raw data and the table schema by adding more columns.
The approach I recommend is string manipulation in another programming language, for example C#.
See if such an approach helps.
The TEMP CTE finds the position of the first digit in the citystatezip column
zip: the substring that starts at zip_position
state: nested functions
  substr selects everything up to the zip_position (e.g. "SNOHOMISH WA ")
  trim removes trailing spaces
  regexp_substr extracts the last word from that substring (e.g. "WA")
city: the substring from the 1st character up to the position of the second space counting from the back of the string (see instr's parameters)
For sample data you posted (LA added), that would look as follows:
with temp as
  (select p.*,
          regexp_instr(citystatezip, '\d') zip_position
   from po_header p
  )
select t.po_number, t.customer, t.citystatezip,
       substr(t.citystatezip, t.zip_position) zip,
       regexp_substr(trim(substr(t.citystatezip, 1, t.zip_position - 1)), '\w+$') state,
       trim(substr(t.citystatezip, 1, instr(t.citystatezip, ' ', -1, 2))) city
from temp t;

 PO_NUMBER CUSTOME CITYSTATEZIP                   ZIP        STATE      CITY
---------- ------- ------------------------------ ---------- ---------- -----------
         1 John    SNOHOMISH WA 98290             98290      WA         SNOHOMISH
         2 Jen     MINNAPOLIS MN 55439-8136       55439-8136 MN         MINNAPOLIS
         3 Jillian BETHANY OK 73008               73008      OK         BETHANY
         4 Jordan  Hillsboro Oregon 97124         97124      Oregon     Hillsboro
         5 Scott   Los Angeles CA 12345           12345      CA         Los Angeles
Is it perfect? Certainly not, but the final solution depends on much more sample data. Generally speaking, the data model is just wrong: you should have split that information into separate columns in the first place.
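For readers who want to try the first-digit idea outside Oracle, here is a rough Python sketch of the same steps. It is not a translation of the query above: the function name `split_city_state_zip` is hypothetical, and it takes the city as everything before the last word that precedes the ZIP, which gives the same result on the sample rows.

```python
import re


def split_city_state_zip(value):
    """Split 'CITY STATE ZIP' using the position of the first digit."""
    match = re.search(r"\d", value)
    if match is None:
        return None  # no digit, so no ZIP to anchor on
    zip_position = match.start()
    zip_code = value[zip_position:]
    # Everything before the ZIP, without the trailing space.
    before_zip = value[:zip_position].strip()
    # Split on the last space: the last word is the state, the rest is the city.
    city, _, state = before_zip.rpartition(" ")
    return city.strip(), state, zip_code


rows = [
    "SNOHOMISH WA 98290",
    "MINNAPOLIS MN 55439-8136",
    "Hillsboro Oregon 97124",
    "Los Angeles CA 12345",
]
parsed = [split_city_state_zip(row) for row in rows]
```

As with the SQL version, a state written as two words, or a street number at the front of the value, would break this, so it only suits data shaped like the samples.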
My data will look like below
Journey Table
SERNR  TYPE
123    null
456    null
789    null
Segment Table
SERNR  Sgmnt  FROM-Station  TO-Station
123    01     A             B
123    02     B             C
123    03     C             B
123    04     B             A
456    01     A             B
456    02     B             C
456    03     C             D
456    04     D             A
789    04     A             B
I want to join these two data frames/tables and check each journey's FROM and TO stations to decide a journey type: a return journey gets some type A, a mirror return gets some type B, and a one-way journey gets some type C.
The type calculation is as follows:
For journey SERNR 123, the journey details are A->B, B->C, C->B, B->A. This is a mirror journey, because it's A-B-C then C-B-A.
For 789 it's A->B, so it's a normal journey.
For 456 it's A->B, B->C, C->D, D->A, in short A-B-C then C-D-A. This is a return but not a mirror.
I really don't know how to compare rows in a DataFrame based on SERNR to decide the type by checking the FROM and TO stations of the same SERNR.
I would really appreciate a pointer on how to go ahead and implement this.
You can collect the list of FROM-TO legs into an array column for each SERNR, then join the array elements to get a journey_path (A-B-B-C-...).
Once you have the journey path for each journey, you can use a when expression to determine the TYPE:
If the first FROM != the last TO, then it's normal
Otherwise: if the reverse of the journey_path == the journey_path, then it's a mirror; otherwise it's a return
Note that you need to use a Window to keep the segments in order when grouping and collecting the list of FROM-TO pairs.
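import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// Window per journey, ordered by segment and spanning the whole partition,
// so collect_list sees every segment of the journey in order.
val w = Window.partitionBy("SERNR").orderBy("Sgmnt").rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)

// Assumes the station columns of segment_df are named FROM and TO.
val result = segment_df.select(
    col("SERNR"),
    array_join(
      collect_list(concat_ws("-", col("FROM"), col("TO"))).over(w),
      "-"
    ).alias("journey_path")
  ).dropDuplicates(Seq("SERNR")).withColumn(
    "TYPE",
    when(
      // first station differs from last station: one-way journey
      substring(col("journey_path"), 0, 1) =!= substring(col("journey_path"), -1, 1),
      "normal"
    ).otherwise(
      when(
        // path reads the same backwards: mirror return
        reverse(col("journey_path")) === col("journey_path"),
        "mirror"
      ).otherwise("return")
    )
  ).drop("journey_path")

result.show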
//+-----+------+
//|SERNR| TYPE|
//+-----+------+
//| 789|normal|
//| 456|return|
//| 123|mirror|
//+-----+------+
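The classification rule itself can be checked without Spark. The following is a small plain-Python sketch of the same rule applied to the sample rows from the question; it is an illustration only, and the names `classify`, `by_journey` and `journey_types` are invented for this example.

```python
from collections import defaultdict

# (SERNR, segment, from_station, to_station), copied from the question.
segments = [
    (123, "01", "A", "B"), (123, "02", "B", "C"),
    (123, "03", "C", "B"), (123, "04", "B", "A"),
    (456, "01", "A", "B"), (456, "02", "B", "C"),
    (456, "03", "C", "D"), (456, "04", "D", "A"),
    (789, "04", "A", "B"),
]


def classify(legs):
    """legs: list of (from_station, to_station) pairs in segment order."""
    # Flatten the legs into one path, e.g. A, B, B, C, C, B, B, A.
    path = [station for leg in legs for station in leg]
    if path[0] != path[-1]:
        return "normal"  # does not end where it started
    # A mirror journey reads the same forwards and backwards.
    return "mirror" if path == path[::-1] else "return"


by_journey = defaultdict(list)
# Sorting the tuples orders them by SERNR, then by segment number.
for sernr, _segment, origin, destination in sorted(segments):
    by_journey[sernr].append((origin, destination))

journey_types = {sernr: classify(legs) for sernr, legs in by_journey.items()}
```

Comparing lists of station names instead of a joined string avoids surprises if a station code is ever longer than one character.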
Use collect_list on the from_station or to_station column, grouping by SERNR and ordering by segment.
I have a table with DDD and Phone fields. Some rows were registered correctly; in others the DDD is stored together with the phone number and I need to separate them.
My table:
Modified table:
I am just starting to learn HiveQL. How can I make this change?
Use regexp_extract(str, regex, group_number) to extract ddd and telefone. Demo:
with mytable as (--test data
select stack(3,'5566997000000','5521997000001','24997000011') as str
)
select regexp_extract(str,'^(?:55)?(\\d{2})(\\d+)',1) as ddd,
regexp_extract(str,'^(?:55)?(\\d{2})(\\d+)',2) as telefone
from mytable
Result:
ddd telefone
66 997000000
21 997000001
24 997000011
Regexp '^(?:55)?(\\d{2})(\\d+)' meaning:
^ - beginning of the string anchor
(?:55)? - non-capturing group with 55 country code zero or one time (optional)
(\\d{2}) - capturing group with two digits - ddd
(\\d+) - capturing group with 1+ digits - telefone
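To experiment with the pattern outside Hive, the same expression can be tried in Python. This is only a sketch: the regex is the one from the answer (with single backslashes, since a Python raw string needs no doubling), and the helper name `split_phone` is hypothetical.

```python
import re

# Same pattern as in the Hive query: optional 55, two-digit ddd, rest is telefone.
PHONE = re.compile(r"^(?:55)?(\d{2})(\d+)")


def split_phone(raw):
    """Return (ddd, telefone), or None when the value does not match."""
    match = PHONE.match(raw)
    return (match.group(1), match.group(2)) if match else None


samples = ["5566997000000", "5521997000001", "24997000011"]
pairs = [split_phone(sample) for sample in samples]
```

One thing the sketch makes easy to see: because the leading 55 is optional and matched greedily, a value stored without the country code whose ddd is itself 55 would lose those two digits, so such rows would need separate handling (for example a length check).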
I've imported data into SQL developer and I need to separate data from one column into a new column. For example I have:
Temp_Title
Congo (1995)
Nadja (1993)
I need to move the year out of the title into a new column named temp_year. I was told that I can use "Parse" but I'm not sure where to start. Any help is greatly appreciated.
You didn't specify which database you use; also, you mentioned "SQL Developer" (designed by Oracle) but tagged the question with the "plsqldeveloper" tag, so the actual query might depend on information you didn't share with us.
Anyway, to get you started, here's an example (based on Oracle) which uses two options:
the first one uses traditional SUBSTR + INSTR combination
another one uses regular expressions
with test (temp_title) as
  (select 'Congo (1995)' from dual union all
   select 'Nadja (1993)' from dual union all
   select 'Back to the future (1985)' from dual
  )
select
  substr(temp_title, 1, instr(temp_title, '(') - 1) title,
  substr(temp_title, instr(temp_title, '(') + 1, 4) year,
  --
  regexp_substr(temp_title, '(\w| )+') title2,
  rtrim(regexp_substr(temp_title, '\d+\)$'), ')') year2
from test;

TITLE                YEAR             TITLE2               YEAR2
-------------------- ---------------- -------------------- -----
Congo                1995             Congo                1995
Nadja                1993             Nadja                1993
Back to the future   1985             Back to the future   1985
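The regular-expression option can also be tried outside the database. Below is a minimal Python sketch of the same split, assuming every title ends with a four-digit year in parentheses; the pattern and the function name `split_title` are written for this illustration and are not the Oracle expressions above.

```python
import re

# Title text, optional spaces, then a four-digit year in parentheses at the end.
TITLE_YEAR = re.compile(r"^(.*?)\s*\((\d{4})\)\s*$")


def split_title(temp_title):
    """Return (title, year); year is None when no trailing '(yyyy)' is found."""
    match = TITLE_YEAR.match(temp_title)
    if match is None:
        return temp_title, None
    return match.group(1), match.group(2)


titles = ["Congo (1995)", "Nadja (1993)", "Back to the future (1985)"]
split_rows = [split_title(title) for title in titles]
```

Anchoring on the end of the string means a title that itself contains parentheses still yields the final year, which the SUBSTR + INSTR variant would not handle.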
I am using SQL Server 2008 R2
I have a table ITEM:
NO_ITEM    LABEL
121_54_7   aaaaaa
32_5       jjjjjj
6          88888
9987_54_4  oooooo
What I want:
NO_ITEM    LABEL
121        aaaaaa
32         jjjjjj
6          88888
9987       oooooo
Just select the first data by omitting the rest after _.
Sure, you could do something like this:
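-- Appending '_' guarantees CHARINDEX finds a delimiter even when NO_ITEM has none;
-- subtracting 1 keeps the underscore itself out of the result.
SELECT SUBSTRING(NO_ITEM, 1, CHARINDEX('_', NO_ITEM + '_') - 1) AS NO_ITEM,
       LABEL
FROM ITEM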