SQL, Trying to split Finnish addresses - sql

I have an address column that holds street name + house number (+ optional divider) (+ optional apartment number) + postcode + city.
Five examples, in the form (Street), (Postal) (City):
"Testalley 3, 00200 Helsinki"
"Testalley 3 A 21, 00200 Helsinki"
"TestAlley 3 B, 00300 Helsinki"
"TestAlley 3, 00500 Helsinki AS"
"testAlley 3 F 22, 00500 Helsinki AS"
So the addresses vary quite a bit.
I'd like to split this big chunk of address into 3 separate columns.
SELECT
bigAddress,
SUBSTRING(bigAddress,LEN(LEFT(bigAddress,CHARINDEX(',', bigAddress)+2)),LEN(bigAddress) - LEN(LEFT(bigAddress,CHARINDEX(',', bigAddress))) - LEN(RIGHT(bigAddress,CHARINDEX(' ', (REVERSE(bigAddress)))))) AS Postcode
FROM TABLEXX
^^ This almost works for the postcode.
The only problem is that if the city has more than one part (unlike "HELSINKI"), part of the city comes along with the postcode, e.g. "00300 Ylistaro" when the city is "Ylistaro AS".
with cte as (
SELECT
ID,
bigAddress,
SUBSTRING(bigAddress,LEN(LEFT(bigAddress,CHARINDEX(',', bigAddress)+2)),LEN(bigAddress) - LEN(LEFT(bigAddress,CHARINDEX(',', bigAddress))) - LEN(RIGHT(bigAddress,CHARINDEX(' ', (REVERSE(bigAddress)))))) AS Postcode,
RIGHT(bigAddress,CHARINDEX(',', (REVERSE(bigAddress))) - 1) AS City
FROM TableXXX
)
select
bigAddress,
LEFT(Postcode,5) As PostcodeV2,
STUFF(City, 1, 7, '') AS CityV2
FROM cte
^^ This also worked quite well, but it failed when I tried to use it in Power BI DirectQuery. Power BI doesn't support it in DirectQuery mode, and import mode had some other problems.

What you are trying to do is risky: it is a well-known problem that there is no fully reliable way to separate street, postal code and city from a single free-form string. So please note that the following is just an idea to help you; in future, you should save the information in separate columns from the start.
The following solution only works under some assumptions. For example, there must always be a comma between the street and the rest, the postal code must contain only numeric characters, and the city must not contain any numeric characters. The idea is to first add four columns to your table:
ALTER TABLE yourtable ADD street varchar(200);
ALTER TABLE yourtable ADD postal varchar(200);
ALTER TABLE yourtable ADD city varchar(200);
ALTER TABLE yourtable ADD prov varchar(600);
The first three columns are the ones you will use from now on to store the information. The prov column is only used during the data "transformation" and is removed again afterwards.
As a first step, update the street column with everything before the comma and the prov column with the rest (with all spaces removed):
UPDATE yourtable SET street = SUBSTRING(bigAddress, 0, charindex(',', bigAddress, 0)),
prov = REPLACE(SUBSTRING(bigAddress,CHARINDEX(',',bigAddress) + 1, LEN(bigAddress)),' ','');
Then fill the city column with everything in the prov column from the first non-numeric character onwards. In other words, strip the postal code to get the city. Note that because the previous step removed all spaces from prov, a multi-word city such as "Helsinki AS" is stored as "HelsinkiAS":
UPDATE yourtable SET
city = RIGHT(prov,LEN(prov) - (PATINDEX('%[^0-9]%',prov) -1));
After this, remove the city from the prov column to get the postal code and save it in the postal column:
UPDATE yourtable SET postal = REPLACE(prov, city,'');
The three columns are now filled correctly (as I said, as long as the required conditions are met), so you can remove the prov column again:
ALTER TABLE yourtable DROP COLUMN prov;
I created an example which shows this is working correctly: db<>fiddle
In future, please don't do such things, but use separate columns.
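As a rough, read-only alternative to the steps above, the same split can be sketched in a single SELECT without adding any columns. This is only a sketch under the same assumptions (a comma after the street, a purely numeric postal code). The table variable @addresses and the aliases r and p are made up for illustration. Because the spaces are kept here, a multi-word city should stay intact.

```sql
-- Hypothetical sample data; replace @addresses with your own table.
DECLARE @addresses TABLE (bigAddress varchar(200));
INSERT INTO @addresses (bigAddress) VALUES
    ('Testalley 3, 00200 Helsinki'),
    ('testAlley 3 F 22, 00500 Helsinki AS');

SELECT
    a.bigAddress,
    -- Everything before the first comma; NULL when there is no comma.
    LEFT(a.bigAddress, NULLIF(CHARINDEX(',', a.bigAddress), 0) - 1) AS street,
    -- Leading digits of the remainder.
    LEFT(r.rest, p.pos - 1) AS postal,
    -- Whatever follows the digits, without the separating space.
    LTRIM(SUBSTRING(r.rest, p.pos, LEN(r.rest))) AS city
FROM @addresses AS a
-- rest: text after the first comma, leading spaces trimmed.
CROSS APPLY (
    SELECT LTRIM(SUBSTRING(a.bigAddress, CHARINDEX(',', a.bigAddress) + 1, LEN(a.bigAddress))) AS rest
) AS r
-- pos: position of the first non-digit; the appended space guarantees a match.
CROSS APPLY (
    SELECT PATINDEX('%[^0-9]%', r.rest + ' ') AS pos
) AS p;
```

For the two sample rows this should yield "00200" / "Helsinki" and "00500" / "Helsinki AS". Whether a query shaped like this is accepted by Power BI DirectQuery is not something this sketch verifies.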

Assuming the postal code always has a fixed length of 5 digits, you can use CHARINDEX, SUBSTRING, LEFT and RIGHT with a few constants to extract the data:
CREATE TABLE addresses (
address VARCHAR(50) NOT NULL
);
INSERT INTO addresses (address)
VALUES
('Testalley 3, 00200 Helsinki'),
('Testalley 3 A 21, 00200 Helsinki'),
('TestAlley 3 B, 00300 Helsinki'),
('TestAlley 3, 00500 Helsinki AS'),
('testAlley 3 F 22, 00500 Helsinki AS');
SELECT
LEFT(address, CHARINDEX(',', address) - 1) AS street,
SUBSTRING(address, CHARINDEX(',', address) + 2, 5) AS postcode,
RIGHT(address, LEN(address) - CHARINDEX(',', address) - 7) AS city
FROM addresses;
Results in:
street | postcode | city
Testalley 3 | 00200 | Helsinki
Testalley 3 A 21 | 00200 | Helsinki
TestAlley 3 B | 00300 | Helsinki
TestAlley 3 | 00500 | Helsinki AS
testAlley 3 F 22 | 00500 | Helsinki AS
You can play with the running demo at https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=a9e1469c753158d8e3cd0a4ab08f97ec
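Since the constants above depend on every row having a comma, a space and exactly five digits, it may be worth checking the data first. The following is a minimal sketch of such a check, not part of the original answer; the table variable @addresses and the third sample row are hypothetical.

```sql
-- Hypothetical sample data: the last row breaks the assumed format (no comma).
DECLARE @addresses TABLE (address varchar(50) NOT NULL);
INSERT INTO @addresses (address) VALUES
    ('Testalley 3, 00200 Helsinki'),
    ('TestAlley 3, 00500 Helsinki AS'),
    ('Testalley 3 00200 Helsinki');

-- List rows that do NOT look like "<street>, <5 digits> <city>".
SELECT address
FROM @addresses
WHERE address NOT LIKE '%_, [0-9][0-9][0-9][0-9][0-9] _%';
```

Any row returned by this query would be split incorrectly by the fixed offsets and needs separate handling.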

Related

Combining various rows in a table based on a condition

I need some help constructing a PostgreSQL query. I am trying to combine several rows of a Postgres table based on a condition.
Here's what the table looks like:
Roll_No | Role | Address Type | Address Value
0538 | Home | Address Line 1 | 123 Main Street
0538 | Home | Address Line 2 | London
0538 | Home | Address Line 3 | Rogers Street
0538 | Home | Address Line 4 | United Kingdom
0538 | Office | Address Line 1 | Adam Land
0538 | Office | Address Line 2 | Valley Forge PA 19482
0538 | Office | Address Line 3 | U.S.A
0738 | School | Address Line 1 | Rogers Street
0738 | School | Address Line 2 | London
0738 | School | Address Line 3 | Holland Lane
0738 | School | Address Line 4 | United Kingdom
I want to concatenate all address values of a specific role (e.g. Home, School, Office) into one column. Address Type can contain values from Address Line 1 to Address Line 8. Here, Home has Address Line 1 to 4 whereas Office has Address Line 1 to 3. The expected result:
Roll_No | Role | Address Type | Address Value
0538 | Home | Home Address | 123 Main Street, London, Rogers Street, United Kingdom
0538 | Office | Office Address | Adam Land, Valley Forge PA 19482, U.S.A
0738 | School | School Address | Rogers Street, London, Holland Lane, United Kingdom
Use the array_agg() function to collect the column values and array_to_string() to join them with commas. No ORDER BY is used inside the aggregate here, on the assumption that Address Line 4 never comes before Address Line 1 in the stored data; an extra ORDER BY can also degrade query performance on large data sets.
-- PostgreSQL
SELECT roll_no, role
, role || ' Address' address_type
, array_to_string(array_agg(address_value), ', ') address_value
FROM test
GROUP BY roll_no, role
ORDER BY roll_no, role;
Please check this url https://dbfiddle.uk/?rdbms=postgres_11&fiddle=e46cf351452a2715258b69afeea5c742
If an ORDER BY is needed inside array_agg(), use the query below:
-- after applying order by inside array_agg()
SELECT roll_no, role
, role || ' Address' address_type
, array_to_string(array_agg(address_value order by address_type), ', ') address_value
FROM test
GROUP BY roll_no, role
ORDER BY roll_no, role;
Please check the url https://dbfiddle.uk/?rdbms=postgres_11&fiddle=c196a9a2ca71bd886e4750b935d23040
If the same address is stored on multiple address lines of a particular role for a given roll_no, use the DISTINCT keyword inside array_agg():
-- after applying distinct inside array_agg()
SELECT roll_no, role
, role || ' Address' address_type
, array_to_string(array_agg(DISTINCT address_value), ', ') address_value
FROM test
GROUP BY roll_no, role
ORDER BY roll_no, role;
Please check this url https://dbfiddle.uk/?rdbms=postgres_11&fiddle=e4f0de7c37e003133e2539149db245f8
You may use ARRAY_AGG() inside ARRAY_TO_STRING(), but with caution: SQL tables are unordered sets, so explicitly specifying ORDER BY inside array_agg() is very important.
Code:
SELECT
Roll_No,
role,
role || ' address' AS address_type,
array_to_string(array_agg(addressvalue ORDER BY roll_no, role, addresstype), ', ') as address_value
FROM t
GROUP BY Roll_No, role
ORDER BY roll_no, role, address_value
See the db<>fiddle, and note how the results vary with and without ORDER BY inside array_agg().
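PostgreSQL also provides string_agg(), which joins the values directly and accepts an ORDER BY inside the call, so the array round trip can be avoided. The sketch below is an illustration rather than either answerer's query; the inline test data in the CTE is made up and deliberately inserted out of order.

```sql
-- PostgreSQL: inline sample rows, deliberately out of order.
WITH test (roll_no, role, address_type, address_value) AS (
    VALUES
        ('0538', 'Office', 'Address Line 2', 'Valley Forge PA 19482'),
        ('0538', 'Office', 'Address Line 1', 'Adam Land'),
        ('0538', 'Office', 'Address Line 3', 'U.S.A')
)
SELECT roll_no,
       role,
       role || ' Address' AS address_type,
       -- The ORDER BY here refers to the input column, not the output alias.
       string_agg(address_value, ', ' ORDER BY address_type) AS address_value
FROM test
GROUP BY roll_no, role
ORDER BY roll_no, role;
```

Sorting on the text "Address Line N" works for the lines 1 to 8 mentioned in the question; if two-digit line numbers ever appeared, the number would have to be extracted and sorted numerically instead.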

How to get value from data structure in column

I have data in a SQL Server table as follows:
Number | Value
1 | /F10749180509 1/TOYOTA TSUSHO ASIA PACIFIC PTE. L 1/TD. 2/600 NORTH BRIDGE ROAD HEX19 01, P 3/SG/ARKVIEWSQUARE SINGAPORE 188778
2 | /0019695051 1/PT ASURANSI ALLIANZ LIFE 1/INDONESIA 2/ALLIANZ TWR 16FL JL.HRRASUNA SAID 3/ID/JAKARTA
As you can see, I need to extract the country code from the Value field. The country code follows "3/" in the string. For example, from the first row I need "SG" and from the second row "ID", and so on. If I copy the first Value into Notepad, the data is actually separated by newlines, like this:
/F10749180509
1/TOYOTA TSUSHO ASIA PACIFIC PTE. L
1/TD.
2/600 NORTH BRIDGE ROAD HEX19 01, P
3/SG/ARKVIEWSQUARE SINGAPORE 188778
Please help me find a query to get the country code. Thank you.
We might be able to use PATINDEX here along with SUBSTRING. Assuming that the country code would always be exactly two uppercase letters, we can try:
SELECT val, SUBSTRING(val, PATINDEX('% [0-9]/[A-Z][A-Z]/%', val) + 3, 2) AS country_code
FROM yourTable
Demo
You can use CHARINDEX to get the data.
declare @table table(number int, val varchar(8000))
insert into @table
values
(1, '/F10749180509 1/TOYOTA TSUSHO ASIA PACIFIC PTE. L 1/TD. 2/600 NORTH BRIDGE ROAD HEX19 01, P 3/SG/ARKVIEWSQUARE SINGAPORE 188778')
select substring(val,charindex('3/',val,1)+2,2) from @table
SG
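The question mentions that the real values are separated by newlines rather than spaces. A hedged sketch of how the PATINDEX approach might be adapted to that case: replace carriage returns and line feeds with spaces first, then look for " 3/" followed by two letters. The table variable @t and the aliases are hypothetical, and CR/LF as the line break is an assumption.

```sql
-- Hypothetical sample row with CR/LF line breaks, as described in the question.
DECLARE @t TABLE (number int, val varchar(8000));
INSERT INTO @t (number, val) VALUES
    (1, '/F10749180509' + CHAR(13) + CHAR(10)
      + '1/TOYOTA TSUSHO ASIA PACIFIC PTE. L' + CHAR(13) + CHAR(10)
      + '1/TD.' + CHAR(13) + CHAR(10)
      + '2/600 NORTH BRIDGE ROAD HEX19 01, P' + CHAR(13) + CHAR(10)
      + '3/SG/ARKVIEWSQUARE SINGAPORE 188778');

SELECT
    t.number,
    -- NULLIF turns "no match" (0) into NULL, so the result is NULL instead of garbage.
    SUBSTRING(c.clean, NULLIF(PATINDEX('% 3/[A-Z][A-Z]/%', c.clean), 0) + 3, 2) AS country_code
FROM @t AS t
-- clean: the value with CR and LF each replaced by a space.
CROSS APPLY (
    SELECT REPLACE(REPLACE(t.val, CHAR(13), ' '), CHAR(10), ' ') AS clean
) AS c;
```

For the sample row this should return SG. Anchoring on the separator before "3/" also avoids matching a "3/" that happens to occur inside an earlier field.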

How to sort SQL query alphabetically but ignoring leading numbers?

I am unable to find the right query for my problem. I have a table in the database that I need to sort in a very specific manner: the column I am sorting on is an address that starts with a number, but I need to sort it ignoring that number.
Here is my data set:
id | address
1 | 23 Bridge road
2 | 14 Kennington street
3 | 7 Bridge road
4 | 12 Oxford street
5 | 9 Bridge road
I need to sort this like:
id | address
1 | 7 Bridge road
2 | 9 Bridge road
3 | 23 Bridge road
4 | 14 Kennington street
5 | 12 Oxford street
So far I got only this:
SELECT id, address
FROM propertySearch
ORDER BY address ASC;
Can anyone help me out on this?
If the format is always the same (a leading number, a space and then the address), you can do this:
SQL-Server:
SELECT * FROM YourTable t
ORDER BY SUBSTRING(t.address,CHARINDEX(' ',t.address,1),99)
MySQL :
SELECT * FROM YourTable t
ORDER BY SUBSTRING(t.address, LOCATE(' ', t.address) + 1)
If the format is not constant, you can use SQL Server's PATINDEX():
SELECT * FROM YourTable t
ORDER BY SUBSTRING(t.address,PATINDEX('%[A-z]%',t.address),99)
NOTE: This is bad DB design! Each value should be stored in its own column, e.g. STREET, CITY, APARTMENT_NUMBER etc., because otherwise you run into exactly this kind of problem.
If you use SQL Server, you can use a combination of PATINDEX and STUFF:
SELECT *, STUFF(T.address, 1, PATINDEX('%[A-z]%', T.address) - 1, '')
FROM #Table1 AS T
ORDER BY STUFF(T.address, 1, PATINDEX('%[A-z]%', T.address) - 1, '')
PATINDEX finds the index of the first letter in your string and STUFF removes everything from the beginning up to that index.
This is the output:
id | address | (No column name)
1 | 23 Bridge road | Bridge road
3 | 7 Bridge road | Bridge road
5 | 9 Bridge road | Bridge road
2 | 14 Kennington street | Kennington street
4 | 12 Oxford street | Oxford street
I also noticed that the IDs in your expected output differ from the input. If that was intended, you need to use ROW_NUMBER:
SELECT ROW_NUMBER() OVER(ORDER BY STUFF(T.address, 1, PATINDEX('%[A-z]%', T.address) - 1, ''), T.id) AS ID, T.address
FROM #Table1 AS T;
This query generates a new ID for each row.
Result:
id | address
1 | 23 Bridge road
2 | 7 Bridge road
3 | 9 Bridge road
4 | 14 Kennington street
5 | 12 Oxford street
Anyway, this is a rather hacky solution.
I'd suggest storing your address in separate columns, such as street name, postal code, house number, house letter (optional), town, etc. That would be a much better approach.
I think this kind of operation belongs in the business layer.
If you load all the data into .NET code, the sorting becomes easier, more readable and more maintainable.
Public Class Address
    Public Property Id As Integer
    Public Property AddressData As String
    'This property can be used for sorting
    Public ReadOnly Property SortedKey As String
        Get
            Dim rawData As IEnumerable(Of String) = Me.AddressData.Split(" "c).Skip(1)
            Return String.Join(" ", rawData)
        End Get
    End Property
End Class
Then use it with LINQ:
Dim loaded As List(Of Address) = yourLoadFunction()
Dim sorted = loaded.OrderBy(Function(item) item.SortedKey).ToList()
Since you've tagged vb.net, I guess you use MS SQL. If the street number and street name are always separated by a space, try ordering like this:
ORDER BY RIGHT([address], LEN([address]) - CHARINDEX(' ', [address], 1))
Declare @Table table (id int,address varchar(100))
Insert into @Table values
(1,'23 Bridge road'),
(2,'14 Kennington street'),
(3,'7 Bridge road'),
(4,'12 Oxford street'),
(5,'9 Bridge road')
Select * From @Table
Order By substring(address,patindex('%[a-z]%',address),200)
        ,cast(substring(address,1,charindex(' ',address)) as int)
Returns
id | address
3 | 7 Bridge road
5 | 9 Bridge road
1 | 23 Bridge road
2 | 14 Kennington street
4 | 12 Oxford street
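The CAST in the query above raises an error as soon as an address does not start with a number. A tentative variant, assuming SQL Server 2012 or later (where TRY_CAST is available), computes the position of the first character that is neither a digit nor a space and sorts on the two parts. The alias p and the fourth sample row are hypothetical.

```sql
DECLARE @Table TABLE (id int, address varchar(100));
INSERT INTO @Table (id, address) VALUES
    (1, '23 Bridge road'),
    (2, '14 Kennington street'),
    (3, '7 Bridge road'),
    (4, 'Bridge road');  -- hypothetical row with no house number

SELECT t.id, t.address
FROM @Table AS t
-- pos: first character that is not a digit or space; the appended 'x' guarantees a match.
CROSS APPLY (
    SELECT PATINDEX('%[^0-9 ]%', t.address + 'x') AS pos
) AS p
ORDER BY
    SUBSTRING(t.address, p.pos, 200),             -- street name first
    TRY_CAST(LEFT(t.address, p.pos - 1) AS int);  -- then house number, numerically
```

With this sample data the row without a number should sort before "7 Bridge road", because an empty prefix converts to 0.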

Trying to find duplication in records where address is different only in the one field and only by a certain number

I have a table of listings that has NAP fields and I wanted to find duplication within it - specifically where everything is the same except the house number (within 2 or 3 digits).
My table looks something like this:
Name | Housenumber | Streetname | Streettype | City | State | Zip
1 | 36 | Smith | St | Norwalk | CT | 6851
2 | 38 | Smith | St | Norwalk | CT | 6851
3 | 1 | Kennedy | Ave | Campbell | CA | 95008
4 | 4 | Kennedy | Ave | Campbell | CA | 95008
I was wondering how to set up a query to find records like these.
I've tried a few things but can't figure out how to do it - any help would be appreciated.
Thanks
Are you looking for something that shows how many of these rows you have, like this?
SELECT
StreetName,
City,
State,
Zip,
COUNT(*)
FROM YourTable
GROUP BY StreetName, City, State, Zip
HAVING COUNT(*) > 1
Or are you trying to find all of the rows that have the same street, city, state, and zip?
SELECT
A.HouseNumber,
A.StreetName,
A.City,
A.State,
A.Zip
FROM YourTable as A
INNER JOIN YourTable as B
ON A.StreetName = B.StreetName
AND A.City = B.City
AND A.State = B.State
AND A.Zip = B.Zip
AND A.HouseNumber <> B.HouseNumber
Here is one way to do it. You'll need a unique ID on the table to run this, as you wouldn't want a row to match itself when it's the only one there. This just returns all rows for which there is at least one near-duplicate.
Edit: Whoops, I just realized the comments say the street number is a varchar. In that case you could just cast it. The OP never said anything in the original post about house numbers being varchar or containing both letters and numbers. As for letters in the street number field, I worked for a third-party shipping provider for 2 years and never saw one, except for an apartment, which would be a different field. It's just as likely that someone made it varchar for some other reason (leading zeros), or for no reason. Of course there could be letters, but there is no way of knowing what's in the field without a response from the OP. To cast to int, the query is the same except that each instance becomes: Cast(mt.HouseNumber as int)
select *
from MyTable mt
where exists (select 1
from MyTable mt2
where mt.name = mt2.name
and mt.street = mt2.street
and mt.state = mt2.state
and mt.city = mt2.city
and mt2.HouseNumber between (mt.HouseNumber -3) and (mt.HouseNumber +3)
and mt.UID != mt2.UID
)
order by mt.state, mt.city, mt.street
;
I'm not sure how to do the -3/+3 comparison if there are letters involved, unless you know exactly where they are and can simply cut them out before casting.
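To list each near-duplicate pair once rather than every matching row, a self-join on a unique key is another option. The following is a minimal sketch in fairly portable SQL, assuming an integer HouseNumber and a unique id column; the listings table, its column types and the 3-number window are illustrative choices, not taken from the original post.

```sql
-- Hypothetical table mirroring the sample data in the question.
CREATE TABLE listings (
    id          INT PRIMARY KEY,
    HouseNumber INT,
    StreetName  VARCHAR(50),
    StreetType  VARCHAR(10),
    City        VARCHAR(50),
    State       VARCHAR(2),
    Zip         VARCHAR(10)
);

INSERT INTO listings VALUES
    (1, 36, 'Smith',   'St',  'Norwalk',  'CT', '6851'),
    (2, 38, 'Smith',   'St',  'Norwalk',  'CT', '6851'),
    (3, 1,  'Kennedy', 'Ave', 'Campbell', 'CA', '95008'),
    (4, 4,  'Kennedy', 'Ave', 'Campbell', 'CA', '95008');

-- Pair up rows that share everything except the house number,
-- where the numbers differ by 1 to 3. a.id < b.id reports each pair once.
SELECT a.id AS id_a, b.id AS id_b,
       a.HouseNumber AS house_a, b.HouseNumber AS house_b,
       a.StreetName, a.StreetType, a.City, a.State, a.Zip
FROM listings AS a
JOIN listings AS b
  ON  a.StreetName = b.StreetName
  AND a.StreetType = b.StreetType
  AND a.City  = b.City
  AND a.State = b.State
  AND a.Zip   = b.Zip
  AND a.id < b.id
WHERE ABS(a.HouseNumber - b.HouseNumber) BETWEEN 1 AND 3;
```

With the four sample rows this should return the pairs (1, 2) and (3, 4).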

SQL Server 2008 - separating Address field

I have an address column that contains the address, state and postcode. I would like to extract the address, suburb, state and postcode into separate columns. How can I do this, given that the length of the address is variable? There is a ^ separating the address from the "other" details. The state can be 2 or 3 characters long and the postcode is always 4 characters long.
Current PostalAddress, followed by the columns it should become:
PostalAddress | Address | Suburb | State | Postcode
28 Smith Avenue^MOOROOLBARK VIC 3138^ | 28 Smith Avenue | MOOROOLBARK | VIC | 3138
16 Farr Street^HEYFIELD VIC 3858^ | 16 Farr Street | HEYFIELD | VIC | 3858
17 Terry Road^LOWER PLENTY VIC 3093^ | 17 Terry Road | LOWER PLENTY | VIC | 3093
String parsing in SQL is messy and tends to be brittle. I usually think it's best to do these sorts of tasks outside of SQL altogether. That said, given the mini-spec above, it is possible to parse the data into the fields you want like so:
select
left(PostalAddress, charindex('^', PostalAddress) - 1) as street_address,
left(second_part, len(second_part) - charindex(' ', reverse(second_part))) as suburb,
right(second_part, charindex(' ', reverse(second_part)) - 1) as state,
reverse(substring(reverse(PostalAddress), 2, 4)) as postal_code
from (
select
PostalAddress,
rtrim(reverse(substring(reverse(PostalAddress), 6, len(PostalAddress) - charindex('^', PostalAddress) - 5))) as second_part
from Addresses
) as t1
Note that you'll need to substitute your table name for what I've called Addresses in the subquery above.
You can see this in action against your sample data here.
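The same mini-spec can also be approached from the left, which some may find easier to follow than the nested REVERSE calls. This is only a sketch under the stated rules (one ^ after the street, a trailing ^, a 4-character postcode, a state without spaces); the table variable @Addresses and the aliases r and p are hypothetical. CROSS APPLY is used, which should be available on SQL Server 2008.

```sql
-- Hypothetical sample data taken from the question.
DECLARE @Addresses TABLE (PostalAddress varchar(200));
INSERT INTO @Addresses (PostalAddress) VALUES
    ('28 Smith Avenue^MOOROOLBARK VIC 3138^'),
    ('17 Terry Road^LOWER PLENTY VIC 3093^');

SELECT
    LEFT(a.PostalAddress, CHARINDEX('^', a.PostalAddress) - 1) AS Address,
    -- Everything in the locality before its last space.
    LEFT(p.locality, LEN(p.locality) - CHARINDEX(' ', REVERSE(p.locality))) AS Suburb,
    -- The last word of the locality (2 or 3 characters).
    RIGHT(p.locality, CHARINDEX(' ', REVERSE(p.locality)) - 1) AS State,
    RIGHT(r.rest, 4) AS Postcode
FROM @Addresses AS a
-- rest: text after the first ^, with the trailing ^ removed.
CROSS APPLY (
    SELECT REPLACE(SUBSTRING(a.PostalAddress, CHARINDEX('^', a.PostalAddress) + 1, 200), '^', '') AS rest
) AS r
-- locality: rest without the 4-character postcode, e.g. "LOWER PLENTY VIC".
CROSS APPLY (
    SELECT RTRIM(LEFT(r.rest, LEN(r.rest) - 4)) AS locality
) AS p;
```

Rows that lack a ^ or are too short would make LEFT fail with an invalid length, so real data would need a filter or guard in front of this.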
In my case I just needed to extract a five-digit postcode from a string.
Below is my code:
Select SUBSTRING([Column or string],patindex('%[0-9][0-9][0-9][0-9][0-9]%',[Column or string]),5) AS 'Postcode'