I need to split ADDRESS LINE into two columns - address number and street.
I have tried the following:
Select
REGEXP_EXTRACT_ALL(address_number, r"([0-9]+)"),
REGEXP_EXTRACT(address_street, r"([a-zA-Z]+)")
From table;
and
Select
substr(addressline1, 1, 4) as address_number,
substr(addressline1, 6, 30) as address_street,
From table;
However, neither of them is ideal because the address line does not have a strict structure.
It can be:
addressline1
9666 Northridge Ct.
P.O. Box 8070
369 Peabody Road
83 Mountain View Blvd
3279 W 46th St
I would like to cut it into two parts by splitting after the first space, but I have not found the right way to do it.
You can try the following code:
with data as
(select '9666 Northridge Ct.' as add1 Union all
select 'P.O. Box 8070' as add1 Union all
select '369 Peabody Road' as add1 Union all
select '83 Mountain View Blvd' as add1 Union all
select '3279 W 46th St' as add1 )
SELECT
add1,
REGEXP_EXTRACT_ALL(add1, r'\d+') AS numbers,
REGEXP_REPLACE(add1, r'\d+', '') AS non_numbers
FROM data
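The output has one row per address: numbers holds every run of digits found in add1, and non_numbers holds add1 with those digits removed.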
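For the "split after the first space" idea from the question, here is a minimal Python sketch of the logic (Python rather than BigQuery SQL, so treat it as an illustration only). It assumes that only a run of digits at the very start of the line, followed by whitespace, counts as the address number. The name split_address is made up for this example. A line such as 'P.O. Box 8070' has no leading number, so the number comes back as None and the whole line is kept as the street.

```python
import re

# Assumption: a house number is a run of digits at the very start of the line,
# followed by whitespace. Anything else is treated as "no number".
_LEADING_NUMBER = re.compile(r"^(\d+)\s+(.*)$")


def split_address(line):
    """Return (number, street); number is None when the line does not start with digits."""
    match = _LEADING_NUMBER.match(line)
    if match is None:
        return None, line
    return match.group(1), match.group(2)


for line in ["9666 Northridge Ct.", "P.O. Box 8070", "3279 W 46th St"]:
    print(split_address(line))
```

Because the pattern is anchored at the start, the 46 in '3279 W 46th St' stays in the street part. The same anchored pattern should carry over to REGEXP_EXTRACT with a single capture group, though I have not tested that here.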
I have a large Postgres table ('tbl') with 4 columns and data similar to this:
ID  address                            x,y
1   22 E 4th Ave, Cordele, GA, 11015   x1,y1
2   22 E 4th Ave, Cordele, GA 11015    x2,y2
3   408 E 5th Ave, Cordele, CA 11215   x2,y2
4   408 E 5th Ave, Cordele, CA, 11215  x2,y2
5   408 E 5th Ave, vic, VA, 11215      x2,y2
6   408 E 5th Ave, vic, VA, 11215      x3,y3
My question is: how do I find all rows that have a similar address but a different 'x,y' value? (Similar means identical once the comma between the state and the zip is ignored; that is the only part that should be ignored.)
In the above example, ids 1 and 2 should be returned because they have the same address (differing only in the comma) but different 'x,y' values.
Ids 3 and 4 should not be returned because their 'x,y' values are identical.
Ids 5 and 6 should not be returned because their address values are identical.
*I can count on the address format to always have a state and a zip.
It might be overkill, but can you just remove all commas and compare?
select array_agg(distinct address)
from t
group by replace(address, ',', '')
having min(x_y) <> max(x_y);
To specifically remove that comma, you could instead use:
select array_agg(distinct address)
from t
group by (case when address like '%, _____'
then left(address, -7) || right(address, 6)
else address
end)
having min(x_y) <> max(x_y);
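To make the grouping logic easier to check against the sample rows, here is a small Python sketch of the same idea (not Postgres, just an illustration). The names normalize and conflicting_ids are made up, and the zip is assumed to be exactly five digits at the end of the string. Unlike the queries above, the sketch also requires the raw address spellings to differ, which is what the question asks for with ids 5 and 6.

```python
import re
from collections import defaultdict


def normalize(address):
    # Drop only the comma that sits directly before a trailing 5-digit zip.
    return re.sub(r",\s*(\d{5})$", r" \1", address)


def conflicting_ids(rows):
    """rows: iterable of (id, address, xy). Returns the sorted ids of conflicting rows."""
    groups = defaultdict(list)
    for row in rows:
        groups[normalize(row[1])].append(row)
    result = []
    for members in groups.values():
        addresses = {address for _, address, _ in members}
        points = {xy for _, _, xy in members}
        # Keep the group only if both the raw spelling and the x,y value differ.
        if len(addresses) > 1 and len(points) > 1:
            result.extend(row_id for row_id, _, _ in members)
    return sorted(result)


rows = [
    (1, "22 E 4th Ave, Cordele, GA, 11015", "x1,y1"),
    (2, "22 E 4th Ave, Cordele, GA 11015", "x2,y2"),
    (3, "408 E 5th Ave, Cordele, CA 11215", "x2,y2"),
    (4, "408 E 5th Ave, Cordele, CA, 11215", "x2,y2"),
    (5, "408 E 5th Ave, vic, VA, 11215", "x2,y2"),
    (6, "408 E 5th Ave, vic, VA, 11215", "x3,y3"),
]
print(conflicting_ids(rows))  # [1, 2]
```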
I am not sure how many variations you have in your data, but I was able to get what you want on the sample data provided. I inserted the data in a table named locdat, you could change columns and table as per your need.
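SELECT
id, address, xy
FROM
(
SELECT
l.*,
-- rows sharing this address once all commas are ignored
COUNT(l.address)
OVER(PARTITION BY replace(l.address, ',', '')) AS addr_count,
-- rows sharing both that address and this x,y value
COUNT(l.xy)
OVER(PARTITION BY replace(l.address, ',', ''), l.xy) AS xy_count
FROM
locdat l
) t -- older Postgres versions require an alias on a subquery in FROM
WHERE
-- note: addr_count >= 1 is always true as written; use > 1 to skip addresses that occur only once
( addr_count >= 1
AND xy_count < 2 );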
I would like to write an SQL query that groups rows by the first character of a column and gives each group a name. Let's say the table is a list of students, and I want students whose first name starts with a particular letter to be put in a specific group. If their first name starts with A, B or C, they are put in a group named 'Junior'; if it starts with D, E or F, they are put in a group named 'Senior'. E.g.:
KATE
JANE
MARY
NICOLE
ROBIN
A-C = Junior
D-F = Senior
G-I = Teacher
You don't need a GROUP BY clause; what you are asking for is a simple DECODE or CASE expression.
Demo:
with data as
(
select 'ANGELINA' name from dual union all
select 'DAVID' from dual union all
select 'IAN' from dual union all
select 'NICOLE' from dual union all
select 'ROBIN' from dual
)
-- Your query starts here
select name,
case
when substr(name, 1, 1) in ('A','B','C')
then 'Junior'
when substr(name, 1, 1) in ('D','E','F')
then 'Senior'
when substr(name, 1, 1) in ('G','H','I')
then 'Teacher'
end as letter
from data;
NAME LETTER
-------- -------
ANGELINA Junior
DAVID Senior
IAN Teacher
NICOLE
ROBIN
The with data clause is only to build the sample data as you have not provided any. In your actual query, use your table name instead of data. Remove everything before the comment "-- Your query starts here".
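If it helps to see the CASE logic outside SQL, here is a minimal Python sketch of the same lookup. GROUPS and group_for are made-up names, and the letter ranges are the ones from the question. One difference to be aware of: the sketch upper-cases the first letter, whereas the CASE expression above compares case-sensitively.

```python
# Letter ranges taken from the question: A-C, D-F, G-I.
GROUPS = {"ABC": "Junior", "DEF": "Senior", "GHI": "Teacher"}


def group_for(name):
    """Return the group label for a name, or None when no range matches."""
    first = name[:1].upper()
    for letters, label in GROUPS.items():
        # The empty string is "in" every string, so guard against empty names.
        if first and first in letters:
            return label
    return None


for name in ["ANGELINA", "DAVID", "IAN", "NICOLE", "ROBIN"]:
    print(name, group_for(name))
```

Names outside A-I come back as None, which corresponds to the empty LETTER column in the output above.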
I have column X with values like:
619 19th St S, Oslo, AL 3522310, Spain
4538 S Harvard Ave, Roma, OK 74135, Germany
Golaa, CA , USA
Piri, SO, Italy
I would like to keep only the rows where the column contains no digit, so the outcome of the query should be:
Golaa, CA , USA
Piri, SO, Italy
I would use a regular expression:
SELECT *
FROM yourTable
WHERE NOT REGEXP_LIKE(x, '[0-9]');
You can also do this without regular expressions:
WHERE x = TRANSLATE(x, 'a0123456789', 'a')
You can use Regular Expressions for pattern matching in Oracle.
SELECT
*
FROM
yourTable
WHERE
NOT REGEXP_LIKE(x, '[0-9]+')
This will exclude any rows that have one or more numeric digits in column x.
with s as (
select '619 19th St S, Oslo, AL 3522310, Spain' str from dual union all
select '4538 S Harvard Ave, Roma, OK 74135, Germany' from dual union all
select 'Golaa, CA , USA' from dual union all
select 'Piri, SO, Italy' from dual)
select *
from s
where str = translate(str, 'z1234567890', 'z');
STR
------------------------------
Golaa, CA , USA
Piri, SO, Italy
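The TRANSLATE trick can look odd at first. Oracle treats an empty string as NULL, so TRANSLATE cannot be given an empty replacement list. The leading 'a' or 'z' is a dummy character mapped to itself, and the digits after it have no counterpart and are deleted. A row passes when deleting its digits changes nothing. Here is a rough Python sketch of both filters from this thread, with made-up function names:

```python
import re

DIGITS = "0123456789"


def has_no_digit_regex(text):
    # Mirrors NOT REGEXP_LIKE(x, '[0-9]'): true when no ASCII digit occurs.
    return re.search(r"[0-9]", text) is None


def has_no_digit_translate(text):
    # Mirrors x = TRANSLATE(...): deleting the digits leaves the text unchanged.
    return text == text.translate(str.maketrans("", "", DIGITS))


samples = [
    "619 19th St S, Oslo, AL 3522310, Spain",
    "4538 S Harvard Ave, Roma, OK 74135, Germany",
    "Golaa, CA , USA",
    "Piri, SO, Italy",
]
print([s for s in samples if has_no_digit_regex(s)])
print([s for s in samples if has_no_digit_translate(s)])
```

Both filters should keep the same rows, since each one only asks whether any of the characters 0-9 is present.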
I have some addresses not in proper format. I want to add a space if there is none. Example shown below
Input      Expected output
-----      ----------------
AVEX       AVE X or AVENUE X
AVE X      AVE X or AVENUE X
AVENUEX    AVENUE X or AVE X
AVENUE X   AVENUE X or AVE X
AVEOFCITY  AVE OF CITY or AVENUE OF CITY
I created the expression below, but it does not give the correct result in all cases; in particular, AVENUE breaks into AVE NUE:
SELECT REGEXP_REPLACE('AVENUEN','^(AVE(NUE)*?)(\w)','\1 \3') rep FROM dual;
This will get you a little closer. Just tweaked your regex a little to allow for an optional 'NUE' and handle 0 or more spaces after.
with tbl(id, str) as (
select 1, 'AVEX' from dual union all
select 2, 'AVE X' from dual union all
select 3, 'AVENUEX' from dual union all
select 4, 'AVENUE X' from dual union all
select 5, 'AVEOFCITY' from dual
)
SELECT
id,
REGEXP_REPLACE(str,'^(AVE(NUE)?) *?(\w)','\1 \3') rep
FROM tbl;
You may need another pass to handle the 'OFCITY' as who knows what could come after the AVENUE that you have to allow for.
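For anyone who wants to try the pattern outside Oracle, here is a rough Python equivalent. It is a sketch only: add_space is a made-up name, and the optional NUE is written as a non-capturing group, so the replacement uses \2 where the Oracle version uses \3.

```python
import re

# Optional "NUE", then any number of spaces, then the first word character.
_AVE = re.compile(r"^(AVE(?:NUE)?) *(\w)")


def add_space(text):
    """Insert exactly one space after a leading AVE or AVENUE."""
    return _AVE.sub(r"\1 \2", text)


for text in ["AVEX", "AVE X", "AVENUEX", "AVENUE X", "AVEOFCITY"]:
    print(text, "->", add_space(text))
```

As with the SQL version, AVEOFCITY only becomes AVE OFCITY; splitting OFCITY would need a separate rule.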
I have a column which stores multiple comma separated values. I need to split it in a way so that it gets split into as many rows as values in that column along with remaining values in that row.
eg:
John 111 2Jan
Sam 222,333 3Jan
Jame 444,555,666 2Jan
Jen 777 4Jan
Output:
John 111 2Jan
Sam 222 3Jan
Sam 333 3Jan
Jame 444 2Jan
Jame 555 2Jan
Jame 666 2Jan
Jen 777 4Jan
P.S : I have seen multiple questions similar to this, but could not find a way to split in such a way.
This solution is built on Vertica, but it works for every database that offers a function corresponding to SPLIT_PART().
Part of it corresponds to the un-pivoting technique that works with every ANSI compliant database platform that I explain here (just the un-pivoting part of the script):
Pivot sql convert rows to columns
So I would do it like here below. I'm assuming that the minimalistic date representation is part of the second column of a two-column input table. So I'm first splitting that short date literal away, in a first Common Table Expression (and, in a comment, I list that CTE's output), before splitting the comma separated list into tokens.
Here goes:
WITH
-- input
input(name,the_string) AS (
SELECT 'John', '111 2Jan'
UNION ALL SELECT 'Sam' , '222,333 3Jan'
UNION ALL SELECT 'Jame', '444,555,666 2Jan'
UNION ALL SELECT 'Jen' , '777 4Jan'
)
,
-- put the strange date literal into a separate column
the_list_and_the_date(name,list,datestub) AS (
SELECT
name
, SPLIT_PART(the_string,' ',1)
, SPLIT_PART(the_string,' ',2)
FROM input
)
-- debug
-- SELECT * FROM the_list_and_the_date;
-- name|list |datestub
-- John|111 |2Jan
-- Sam |222,333 |3Jan
-- Jame|444,555,666|2Jan
-- Jen |777 |4Jan
,
-- ten integers (too many for this example) to use as pivoting value and as "index"
ten_ints(idx) AS (
SELECT 1
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4
UNION ALL SELECT 5
UNION ALL SELECT 6
UNION ALL SELECT 7
UNION ALL SELECT 8
UNION ALL SELECT 9
UNION ALL SELECT 10
)
-- the final query - pivoting prepared input using a CROSS JOIN with ten_ints
-- and filter out where the SPLIT_PART() expression evaluates to the empty string
SELECT
name
, SPLIT_PART(list,',',idx) AS token
, datestub
FROM the_list_and_the_date
CROSS JOIN ten_ints
WHERE SPLIT_PART(list,',',idx) <> ''
;
name|token|datestub
John|111 |2Jan
Jame|444 |2Jan
Jame|555 |2Jan
Jame|666 |2Jan
Sam |222 |3Jan
Sam |333 |3Jan
Jen |777 |4Jan
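The CROSS JOIN with ten_ints is effectively a loop over token positions, with the WHERE clause discarding positions past the end of the list. A minimal Python sketch of that logic follows. The split_part helper is my own stand-in and is only assumed to behave like SPLIT_PART() for out-of-range positions (returning an empty string); explode and max_tokens are made-up names.

```python
def split_part(text, delimiter, index):
    """1-based field lookup; returns '' when index is out of range (assumed SPLIT_PART behaviour)."""
    parts = text.split(delimiter)
    return parts[index - 1] if 1 <= index <= len(parts) else ""


def explode(rows, max_tokens=10):
    """rows: iterable of (name, the_string), e.g. ('Sam', '222,333 3Jan')."""
    result = []
    for name, the_string in rows:
        token_list = split_part(the_string, " ", 1)
        datestub = split_part(the_string, " ", 2)
        for idx in range(1, max_tokens + 1):  # plays the role of ten_ints
            token = split_part(token_list, ",", idx)
            if token != "":  # same filter as the WHERE clause
                result.append((name, token, datestub))
    return result


rows = [
    ("John", "111 2Jan"),
    ("Sam", "222,333 3Jan"),
    ("Jame", "444,555,666 2Jan"),
    ("Jen", "777 4Jan"),
]
for row in explode(rows):
    print(row)
```

As with ten_ints, a list longer than max_tokens would be silently cut off, so the bound has to be at least the longest list you expect.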
Happy playing ...
Marco the Sane