Combining various rows in a table based on a condition - sql

I need some help constructing a PostgreSQL query. I am trying to combine several rows of a Postgres table based on a certain condition.
Here's what the table looks like
| Roll_No | Role | Address Type | Address Value |
|---|---|---|---|
| 0538 | Home | Address Line 1 | 123 Main Street |
| 0538 | Home | Address Line 2 | London |
| 0538 | Home | Address Line 3 | Rogers Street |
| 0538 | Home | Address Line 4 | United Kingdom |
| 0538 | Office | Address Line 1 | Adam Land |
| 0538 | Office | Address Line 2 | Valley Forge PA 19482 |
| 0538 | Office | Address Line 3 | U.S.A |
| 0738 | School | Address Line 1 | Rogers Street |
| 0738 | School | Address Line 2 | London |
| 0738 | School | Address Line 3 | Holland Lane |
| 0738 | School | Address Line 4 | United Kingdom |
I want to concatenate all address values of a specific role (e.g. home, school, office) into one column. Address type can contain values from Address Line 1 to Address Line 8. Here, Home has Address Line 1 to 4 whereas Office has Address Line 1 to 3. The result I am after:
| Roll_No | Role | Address Type | Address Value |
|---|---|---|---|
| 0538 | Home | Home Address | 123 Main Street, London, Rogers Street, United Kingdom |
| 0538 | Office | Office Address | Adam Land, Valley Forge PA 19482, U.S.A |
| 0738 | School | School Address | Rogers Street, London, Holland Lane, United Kingdom |

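Use the array_agg() function to combine the column values into one comma-separated string. The query below has no ORDER BY inside the aggregate, on the assumption that the rows arrive in address-line order (Address Line 4 never before Address Line 1). PostgreSQL does not guarantee that order, so treat it as an assumption about your data. An extra ORDER BY can slow the query down on large data sets.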
-- PostgreSQL
SELECT roll_no, role
, role || ' Address' address_type
, array_to_string(array_agg(address_value), ', ') address_value
FROM test
GROUP BY roll_no, role
ORDER BY roll_no, role;
Please check this url https://dbfiddle.uk/?rdbms=postgres_11&fiddle=e46cf351452a2715258b69afeea5c742
If the order must be guaranteed, put an ORDER BY inside array_agg() as in the query below:
-- after applying order by inside array_agg()
SELECT roll_no, role
, role || ' Address' address_type
, array_to_string(array_agg(address_value order by address_type), ', ') address_value
FROM test
GROUP BY roll_no, role
ORDER BY roll_no, role;
Please check the url https://dbfiddle.uk/?rdbms=postgres_11&fiddle=c196a9a2ca71bd886e4750b935d23040
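If the same address is stored on more than one address line for a given role and roll_no, use the DISTINCT keyword inside array_agg() to drop the duplicates. Note that with DISTINCT, PostgreSQL only accepts an ORDER BY on the aggregated expression itself, so the address-line order cannot be kept this way.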
-- after applying distinct inside array_agg()
SELECT roll_no, role
, role || ' Address' address_type
, array_to_string(array_agg(DISTINCT address_value), ', ') address_value
FROM test
GROUP BY roll_no, role
ORDER BY roll_no, role;
Please check this url https://dbfiddle.uk/?rdbms=postgres_11&fiddle=e4f0de7c37e003133e2539149db245f8

You can use ARRAY_AGG() inside ARRAY_TO_STRING(), but with caution: SQL tables are unordered sets, so the order in which rows reach the aggregate is not guaranteed. That is why an explicit ORDER BY inside array_agg() is important.
Code:
SELECT
Roll_No,
role,
role || ' address' AS address_type,
array_to_string(array_agg(addressvalue ORDER BY roll_no, role, addresstype), ', ') as address_value
FROM t
GROUP BY Roll_No, role
ORDER BY roll_no, role, address_value
See the db<>fiddle for how the results vary with and without ORDER BY inside array_agg().
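To make the ordering point concrete outside the database, here is a small Python sketch of the same group-and-join logic. It is only an illustration: the rows are copied from the question and deliberately shuffled, and the function name combine_addresses is made up for this example. Sorting each group plays the role of ORDER BY address_type inside the aggregate.

```python
from collections import defaultdict

# Sample rows from the question: (roll_no, role, address_type, address_value).
# They are shuffled on purpose to mimic a table with no guaranteed row order.
rows = [
    ("0538", "Home", "Address Line 3", "Rogers Street"),
    ("0538", "Home", "Address Line 1", "123 Main Street"),
    ("0538", "Office", "Address Line 2", "Valley Forge PA 19482"),
    ("0538", "Home", "Address Line 4", "United Kingdom"),
    ("0538", "Office", "Address Line 1", "Adam Land"),
    ("0538", "Home", "Address Line 2", "London"),
    ("0538", "Office", "Address Line 3", "U.S.A"),
]


def combine_addresses(rows):
    """Group by (roll_no, role) and join the values in address-line order."""
    groups = defaultdict(list)
    for roll_no, role, address_type, address_value in rows:
        groups[(roll_no, role)].append((address_type, address_value))

    result = []
    for (roll_no, role), lines in sorted(groups.items()):
        # Explicit sort: the counterpart of ORDER BY address_type in array_agg().
        # Plain string sorting is enough for "Address Line 1" .. "Address Line 8".
        lines.sort()
        joined = ", ".join(value for _, value in lines)
        result.append((roll_no, role, f"{role} Address", joined))
    return result


combined = combine_addresses(rows)
for row in combined:
    print(row)
```

Without the lines.sort() call the joined string would simply follow whatever order the rows happened to arrive in, which is the risk described above.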

Related

SQL, Trying to split Finnish addresses

I have an address column which holds street name + house number (+ possible divider) (+ possible apartment no.) + postcode + city.
5 different examples:
(Street), (Postal) (City)
"Testalley 3, 00200 Helsinki"
"Testalley 3 A 21, 00200 Helsinki"
"TestAlley 3 B, 00300 Helsinki"
"TestAlley 3, 00500 Helsinki AS"
"testAlley 3 F 22, 00500 Helsinki AS"
So the addresses vary quite a bit.
I hope to split this big chunk of address into 3 separate columns.
SELECT
bigAddress,
SUBSTRING(bigAddress,LEN(LEFT(bigAddress,CHARINDEX(',', bigAddress)+2)),LEN(bigAddress) - LEN(LEFT(bigAddress,CHARINDEX(',', bigAddress))) - LEN(RIGHT(bigAddress,CHARINDEX(' ', (REVERSE(bigAddress)))))) AS Postcode
FROM TABLEXX
^^ This almost works for the postcode.
The only problem is that if the city has more than one part (unlike "HELSINKI"), part of the city comes along with the postcode, e.g. "00300 Ylistaro" when the city is "Ylistaro AS".
with cte as (
SELECT
ID,
bigAddress,
SUBSTRING(bigAddress,LEN(LEFT(bigAddress,CHARINDEX(',', bigAddress)+2)),LEN(bigAddress) - LEN(LEFT(bigAddress,CHARINDEX(',', bigAddress))) - LEN(RIGHT(bigAddress,CHARINDEX(' ', (REVERSE(bigAddress)))))) AS Postcode,
RIGHT(bigAddress,CHARINDEX(',', (REVERSE(bigAddress))) - 1) AS City
FROM TableXXX
)
select
bigAddress,
LEFT(Postcode,5) As PostcodeV2,
STUFF(City, 1, 7, '') AS CityV2
FROM cte
^^
This also worked quite well, but it failed when I tried to use it in a Power BI DirectQuery. Power BI does not support it in DirectQuery mode, and import mode had some other problems.
What you are trying to do is very risky, since it is a well-known problem that there is no really proper and safe way to separate street, postal code and city from one combined string. So please note that the following is just an idea to help you; in future, you should save the information in separate columns directly.
Anyway, the following solution only works under some assumptions. For example, there must always be a comma between the street and the rest, the postal code must not contain any non-numeric characters, and the city must not contain any numeric characters. The idea is to first add four columns to your table:
ALTER TABLE yourtable ADD street varchar(200);
ALTER TABLE yourtable ADD postal varchar(200);
ALTER TABLE yourtable ADD city varchar(200);
ALTER TABLE yourtable ADD prov varchar(600);
The first three columns are the ones you will use in future to store the information. The prov column is only used during the data "transformation" and is removed again afterwards.
As a first step, update the street column with everything before the comma and the prov column with the rest:
UPDATE yourtable SET street = SUBSTRING(bigAddress, 0, charindex(',', bigAddress, 0)),
prov = REPLACE(SUBSTRING(bigAddress,CHARINDEX(',',bigAddress) + 1, LEN(bigAddress)),' ','');
Then fill the city column with the part of the prov string that begins at the first non-numeric character. In other words, strip the postal code off the front to get the city:
UPDATE yourtable SET
city = RIGHT(prov,LEN(prov) - (PATINDEX('%[^0-9]%',prov) -1));
After this, you will remove the city from the prov column to get the postal and save it in the postal column:
UPDATE yourtable SET postal = REPLACE(prov, city,'');
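Because the first UPDATE strips every space from prov, a two-part city such as "Helsinki AS" ends up stored as "HelsinkiAS". Apart from that, the three columns are now filled correctly (as I said, as long as the required conditions are met), so you can remove the prov column again: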
ALTER TABLE yourtable DROP COLUMN prov;
I created an example which shows this is working correctly: db<>fiddle
In future, please don't do such things, but use separate columns.
Assuming the postal codes always have a fixed length of 5 digits, you can use CHARINDEX, SUBSTRING, LEFT and RIGHT with some constants to get the data:
CREATE TABLE addresses (
address VARCHAR(50) NOT NULL
);
INSERT INTO addresses (address)
VALUES
('Testalley 3, 00200 Helsinki'),
('Testalley 3 A 21, 00200 Helsinki'),
('TestAlley 3 B, 00300 Helsinki'),
('TestAlley 3, 00500 Helsinki AS'),
('testAlley 3 F 22, 00500 Helsinki AS');
SELECT
LEFT(address, CHARINDEX(',', address) - 1) AS street,
SUBSTRING(address, CHARINDEX(',', address) + 2, 5) AS postcode,
RIGHT(address, LEN(address) - CHARINDEX(',', address) - 7) AS city
FROM addresses;
Results in:
| street | postcode | city |
|---|---|---|
| Testalley 3 | 00200 | Helsinki |
| Testalley 3 A 21 | 00200 | Helsinki |
| TestAlley 3 B | 00300 | Helsinki |
| TestAlley 3 | 00500 | Helsinki AS |
| testAlley 3 F 22 | 00500 | Helsinki AS |
You can play with the running demo at https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=a9e1469c753158d8e3cd0a4ab08f97ec
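If the split can be done before the data reaches SQL or Power BI, the same assumptions (one comma after the street, a 5-digit postal code, the rest is the city) can be written as a single regular expression. This is only a sketch in Python, not part of either answer above; the name split_address and the pattern are made up for illustration, and anything that does not fit the assumed layout is returned as None instead of being guessed at.

```python
import re

# Assumed layout: "<street>, <5-digit postal code> <city>".
ADDRESS_RE = re.compile(
    r"^\s*(?P<street>[^,]+?)\s*,\s*(?P<postal>\d{5})\s+(?P<city>.+?)\s*$"
)


def split_address(big_address):
    """Return (street, postal, city), or None when the layout does not match."""
    match = ADDRESS_RE.match(big_address)
    if match is None:
        return None
    return match.group("street"), match.group("postal"), match.group("city")


examples = [
    "Testalley 3, 00200 Helsinki",
    "Testalley 3 A 21, 00200 Helsinki",
    "TestAlley 3 B, 00300 Helsinki",
    "TestAlley 3, 00500 Helsinki AS",
    "testAlley 3 F 22, 00500 Helsinki AS",
]
for example in examples:
    print(split_address(example))
```

Unlike the prov approach, this keeps the space in a two-part city such as "Helsinki AS".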

How to get the differences between two rows **and** the name of the field where the difference is, in BigQuery?

I have a table in BigQuery like this:
| Name | Phone Number | Address |
|---|---|---|
| John | 123456778564 | 1 Penny Lane |
| John | 873452987424 | 1 Penny Lane |
| Mary | 845704562848 | 87 5th Avenue |
| Mary | 845704562848 | 54 Lincoln Rd. |
| Amy | 342847327234 | 4 Ocean Drive Avenue |
| Amy | 347907387469 | 98 Truman Rd. |
I want to get a table with the differences between two consecutive rows, together with the name of the field where the difference occurs.
I mean this:
| Name | Field | Before | After |
|---|---|---|---|
| John | Phone Number | 123456778564 | 873452987424 |
| Mary | Address | 87 5th Avenue | 54 Lincoln Rd. |
| Amy | Phone Number | 342847327234 | 347907387469 |
| Amy | Address | 4 Ocean Drive Avenue | 98 Truman Rd. |
How can I do this? I've looked at other posts but couldn't find anything that matches my need.
Thank you
Consider the BigQuery-ish solution below:
select Name, ['Phone Number', 'Address'][offset(offset)] Field,
prev_field as Before, field as After
from (
select timestamp, Name, offset, field,
lag(field) over (partition by Name, offset order by timestamp) as prev_field
from yourtable,
unnest([`Phone Number`, Address]) field with offset
)
where prev_field != field
Applied to the sample data in your question, this returns one row per changed field (it assumes the table also has a timestamp column to order by).
As you can see, no matter how many columns in your table you need to compare, it is still just one query, with no unions and such.
You just need to enumerate your columns in two places:
['Phone Number', 'Address'][offset(offset)] Field
and
unnest([`Phone Number`, Address]) field with offset
Note: you can refactor the above further by using scripting's execute immediate to compose such lists within the query on the fly (check my other answers; I frequently use this technique in them).
One method is just to use lag() and union all:
select name, 'phone', prev_phone as before, phone as after
from (select name, phone,
lag(phone) over (partition by name order by timestamp) as prev_phone
from t
) t
where prev_phone <> phone
union all
select name, 'address', prev_address as before, address as after
from (select name, address,
lag(address) over (partition by name order by timestamp) as prev_address
from t
) t
where prev_address <> address
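For anyone who wants to check the lag() logic on a small sample before running it in BigQuery, here is a rough Python equivalent. It is a sketch under assumptions: the rows are taken from the question and assumed to already be in chronological order (the sample table has no timestamp column), and field_changes is a name invented for this example. The dictionary of previously seen rows plays the role of lag() partitioned by Name.

```python
# Rows from the question, assumed to be in chronological order per person.
rows = [
    {"Name": "John", "Phone Number": "123456778564", "Address": "1 Penny Lane"},
    {"Name": "John", "Phone Number": "873452987424", "Address": "1 Penny Lane"},
    {"Name": "Mary", "Phone Number": "845704562848", "Address": "87 5th Avenue"},
    {"Name": "Mary", "Phone Number": "845704562848", "Address": "54 Lincoln Rd."},
    {"Name": "Amy", "Phone Number": "342847327234", "Address": "4 Ocean Drive Avenue"},
    {"Name": "Amy", "Phone Number": "347907387469", "Address": "98 Truman Rd."},
]

# The columns to compare; this is the list you would enumerate in the query.
FIELDS = ["Phone Number", "Address"]


def field_changes(rows, fields=FIELDS):
    """Return (name, field, before, after) for every field that changed."""
    changes = []
    previous = {}  # last row seen per name, i.e. what lag() would return
    for row in rows:
        prev = previous.get(row["Name"])
        if prev is not None:
            for field in fields:
                if prev[field] != row[field]:
                    changes.append((row["Name"], field, prev[field], row[field]))
        previous[row["Name"]] = row
    return changes


changes = field_changes(rows)
for change in changes:
    print(change)
```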

SQL Match City Name Inside Full Address?

How would you list the people from a database that are not from 'London'?
Say the database is:
Cust_id address
1 33 avenue, Liverpool
2 21 street 12345, London
3 469 connection ave, Manchester
I'd like to list the customers that are NOT from London. Here's what I've tried:
select Cust_id from customers where address <> 'London';
Now when I do that, it lists all the customers, regardless of location.
Help would be greatly appreciated.
Not ideal but might satisfy your requirements:
select Cust_id from customers
where address NOT LIKE '% London%';
(Note the added space: it assumes the city name is always preceded by a space. '%London%' would also match words that merely contain London.)
(It might be better if you had a normalised address, i.e. broken into street address, town, city, etc.)
Try this:
select Cust_id from customers where address not like '%London%';
or this:
select Cust_id from customers where not address like '%London%';
Both of these are OK.
For more details on LIKE see e.g. here: SQL LIKE
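To see the difference between the exact comparison and the pattern match on the sample rows, here is a quick check using Python's built-in sqlite3 module. SQLite is only a stand-in here (the question does not say which database is used), but <> and NOT LIKE with % behave the same way for this case.

```python
import sqlite3

# In-memory SQLite database standing in for the real one (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (Cust_id INTEGER PRIMARY KEY, address TEXT)")
conn.executemany(
    "INSERT INTO customers (Cust_id, address) VALUES (?, ?)",
    [
        (1, "33 avenue, Liverpool"),
        (2, "21 street 12345, London"),
        (3, "469 connection ave, Manchester"),
    ],
)

# <> compares the whole address with 'London', so every row is "different".
exact = [row[0] for row in conn.execute(
    "SELECT Cust_id FROM customers WHERE address <> 'London' ORDER BY Cust_id"
)]

# NOT LIKE with wildcards excludes rows that contain ' London' anywhere.
pattern = [row[0] for row in conn.execute(
    "SELECT Cust_id FROM customers WHERE address NOT LIKE '% London%' ORDER BY Cust_id"
)]

print(exact)
print(pattern)
conn.close()
```

The first list contains all three customers, which is the behaviour described in the question; the second leaves out the London customer.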

SQL Server 2008 - separating Address field

I have an address column that contains the address, suburb, state and postcode. I would like to extract the address, suburb, state and postcode into separate columns. How can I do this, given that the length of the address is variable? There is a ^ separating the address from the "other" details. The state can be 2 or 3 characters long and the postcode is always 4 characters long.
Current PostalAddress value, and the columns it should become:
| PostalAddress | Address | Suburb | State | Postcode |
|---|---|---|---|---|
| 28 Smith Avenue^MOOROOLBARK VIC 3138^ | 28 Smith Avenue | MOOROOLBARK | VIC | 3138 |
| 16 Farr Street^HEYFIELD VIC 3858^ | 16 Farr Street | HEYFIELD | VIC | 3858 |
| 17 Terry Road^LOWER PLENTY VIC 3093^ | 17 Terry Road | LOWER PLENTY | VIC | 3093 |
String parsing in SQL is messy and tends to be brittle. I usually think it's best to do this sort of task outside of SQL altogether. That said, given the mini-spec above, it is possible to parse the data into the fields you want like so:
select
left(PostalAddress, charindex('^', PostalAddress) - 1) as street_address,
left(second_part, len(second_part) - charindex(' ', reverse(second_part))) as suburb,
right(second_part, charindex(' ', reverse(second_part)) - 1) as state,
reverse(substring(reverse(PostalAddress), 2, 4)) as postal_code
from (
select
PostalAddress,
rtrim(reverse(substring(reverse(PostalAddress), 6, len(PostalAddress) - charindex('^', PostalAddress) - 5))) as second_part
from Addresses
) as t1
Note that you'll need to substitute your own table name for what I've called Addresses in the subquery above.
You can see this in action against your sample data here.
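For comparison, this is roughly what the same parsing looks like when done outside SQL, here in Python. It is a sketch under the question's mini-spec only: the first ^ ends the street address, and the remainder ends with a state and a postcode separated by single spaces. The function name split_postal_address is invented for this example.

```python
def split_postal_address(postal_address):
    """Split 'street^SUBURB STATE POSTCODE^' into its four parts.

    Raises ValueError when the part after the first '^' does not contain
    at least a suburb, a state and a postcode.
    """
    street, _, rest = postal_address.partition("^")
    rest = rest.rstrip("^").strip()
    # Split from the right so that multi-word suburbs stay in one piece.
    suburb, state, postcode = rest.rsplit(" ", 2)
    return street.strip(), suburb, state, postcode


samples = [
    "28 Smith Avenue^MOOROOLBARK VIC 3138^",
    "16 Farr Street^HEYFIELD VIC 3858^",
    "17 Terry Road^LOWER PLENTY VIC 3093^",
]
for sample in samples:
    print(split_postal_address(sample))
```

Splitting from the right is the same idea as the reverse() calls in the query: the postcode and state sit at fixed positions counted from the end, while the suburb can contain spaces.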
In my case I just needed to pull a five-digit postcode out of a string.
Below is my code:
Select SUBSTRING([Column or string],patindex('%[0-9][0-9][0-9][0-9][0-9]%',[Column or string]),5) AS 'Postcode'

How do I do a DISTINCT and ORDER BY in PostgreSQL?

PostgreSQL is about to make me punch small animals. I'm doing the following SQL statement for MySQL to get a listing of city/state/countries that are unique.
SELECT DISTINCT city
, state
, country
FROM events
WHERE (city > '')
AND (number_id = 123)
ORDER BY occured_at ASC
But doing that makes PostgreSQL throw this error:
PGError: ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list
But if I add occured_at to the SELECT, I no longer get the unique list back.
Results using MySQL and first query:
BEDFORD PARK IL US
ADDISON IL US
HOUSTON TX US
Results if I add occured_at to the SELECT:
BEDFORD PARK IL US 2009-11-02 19:10:00
BEDFORD PARK IL US 2009-11-02 21:40:00
ADDISON IL US 2009-11-02 22:37:00
ADDISON IL US 2009-11-03 00:22:00
ADDISON IL US 2009-11-03 01:35:00
HOUSTON TX US 2009-11-03 01:36:00
The first set of results is what I'm ultimately trying to get with PostgreSQL.
Well, how would you expect Postgres to determine which occured_at value to use in creating the sort order?
I don't know Postgres syntax particularly, but you could try:
SELECT DISTINCT city, state, country, MAX(occured_at)
FROM events
WHERE (city > '') AND (number_id = 123) ORDER BY MAX(occured_at) ASC
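or, since PostgreSQL rejects an aggregate next to non-aggregated columns unless there is a GROUP BY, the grouped form: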
SELECT city, state, country, MAX(occured_at)
FROM events
WHERE (city > '') AND (number_id = 123)
GROUP BY city, state, country ORDER BY MAX(occured_at) ASC
That's assuming you want the results ordered by the MOST RECENT occurrence. If you want the first occurrence, change MAX to MIN.
Incidentally, your title asks about GROUP BY, but your syntax specifies DISTINCT.
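Here is a small runnable check of the GROUP BY approach on the rows from the question, using Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption made only so the example is self-contained). It orders by MIN(occured_at), i.e. by first occurrence, and leaves the aggregate out of the select list so that only city, state and country come back. The number_id values are filled in as 123 for every row, which the question does not show.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events ("
    "city TEXT, state TEXT, country TEXT, number_id INTEGER, occured_at TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, 123, ?)",
    [
        ("BEDFORD PARK", "IL", "US", "2009-11-02 19:10:00"),
        ("BEDFORD PARK", "IL", "US", "2009-11-02 21:40:00"),
        ("ADDISON", "IL", "US", "2009-11-02 22:37:00"),
        ("ADDISON", "IL", "US", "2009-11-03 00:22:00"),
        ("ADDISON", "IL", "US", "2009-11-03 01:35:00"),
        ("HOUSTON", "TX", "US", "2009-11-03 01:36:00"),
    ],
)

# One row per city/state/country, ordered by its earliest occurrence.
query = """
SELECT city, state, country
FROM events
WHERE city > '' AND number_id = 123
GROUP BY city, state, country
ORDER BY MIN(occured_at) ASC
"""
locations = conn.execute(query).fetchall()
for location in locations:
    print(location)
conn.close()
```

The three rows come back in the same order as the MySQL result in the question. Swap MIN for MAX to order by the most recent occurrence instead.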