SQL group by similar address but different longitude and latitude

I have a large Postgres table ('tbl') with 4 columns and data similar to this:
|ID|address                          |x,y  |
|--|---------------------------------|-----|
|1 |22 E 4th Ave, Cordele, GA, 11015 |x1,y1|
|2 |22 E 4th Ave, Cordele, GA 11015  |x2,y2|
|3 |408 E 5th Ave, Cordele, CA 11215 |x2,y2|
|4 |408 E 5th Ave, Cordele, CA, 11215|x2,y2|
|5 |408 E 5th Ave, vic, VA, 11215    |x2,y2|
|6 |408 E 5th Ave, vic, VA, 11215    |x3,y3|
My question is: how do I find all the addresses that are similar (similar means identical once the comma between the state and the zip is ignored; that is the only part that should be ignored) but have different 'x,y' values?
In the above example, ids 1 and 2 should be returned because they have the same address (apart from the comma) but different 'x,y' values.
Ids 3 and 4 should not be returned because their 'x,y' values are identical.
Ids 5 and 6 should not be returned because their address values are identical.
*I can count on the address format always having a state and a zip.

It might be overkill, but can you just remove all commas and compare?
select array_agg(distinct address)
from t
group by replace(address, ',', '')
having min(x_y) <> max(x_y);
To specifically remove that comma, you could instead use:
select array_agg(distinct address)
from t
group by (case when address like '%, _____'
               then left(address, -7) || right(address, 6)
               else address
          end)
having min(x_y) <> max(x_y);
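For comparison, here is a minimal Python sketch of the same grouping idea, run against the sample rows from the question. The function names and the `\d{5}` zip pattern are my own assumptions, not part of the answer above. It also requires that a group contains more than one distinct raw address, which is how I read the rule that ids 5 and 6 should not be returned.

```python
import re
from collections import defaultdict

# Sample rows from the question: (id, address, xy)
rows = [
    (1, "22 E 4th Ave, Cordele, GA, 11015", "x1,y1"),
    (2, "22 E 4th Ave, Cordele, GA 11015", "x2,y2"),
    (3, "408 E 5th Ave, Cordele, CA 11215", "x2,y2"),
    (4, "408 E 5th Ave, Cordele, CA, 11215", "x2,y2"),
    (5, "408 E 5th Ave, vic, VA, 11215", "x2,y2"),
    (6, "408 E 5th Ave, vic, VA, 11215", "x3,y3"),
]

def normalize(address):
    # Drop only the comma that directly precedes a trailing 5-digit zip
    # (assumed zip format; adjust the pattern if zips look different).
    return re.sub(r",(\s*\d{5})$", r"\1", address)

def similar_with_different_xy(rows):
    # Group rows by the normalized address.
    groups = defaultdict(list)
    for row in rows:
        groups[normalize(row[1])].append(row)

    result = []
    for members in groups.values():
        addresses = {addr for _, addr, _ in members}
        coords = {xy for _, _, xy in members}
        # Keep groups whose raw addresses differ AND whose x,y values differ.
        if len(addresses) > 1 and len(coords) > 1:
            result.extend(row_id for row_id, _, _ in members)
    return sorted(result)

print(similar_with_different_xy(rows))
```

In SQL terms, the extra condition would roughly correspond to adding a check on the number of distinct addresses to the HAVING clause.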

I am not sure how many variations you have in your data, but I was able to get what you want on the sample data provided. I inserted the data into a table named locdat; you can change the column and table names as needed.
SELECT id, address, xy
FROM (
    SELECT l.*,
           COUNT(l.address) OVER (PARTITION BY replace(l.address, ',', '')) AS addr_count,
           COUNT(l.xy) OVER (PARTITION BY replace(l.address, ',', ''), l.xy) AS xy_count
    FROM locdat l
) AS counted  -- alias for the derived table (older PostgreSQL versions require one)
WHERE addr_count >= 1
  AND xy_count < 2;

Related

Split string into 2 columns (Bigquery)

I need to split ADDRESS LINE into two columns - address number and street.
I have tried the following:
Select
REGEXP_EXTRACT_ALL(address_number, r"([0-9]+)"),
REGEXP_EXTRACT(address_street, r"([a-zA-Z]+)")
From table;
and
Select
substr(addressline1, 1, 4) as address_number,
substr(addressline1, 6, 30) as address_street,
From table;
However, neither of them seems ideal because the address line does not have a strict structure.
It can be:
Addressline1
9666 Northridge Ct.
P.O. Box 8070
369 Peabody Road
83 Mountain View Blvd
3279 W 46th St
I would like to cut it into two parts by splitting it after the first space, but I did not find the right way.
You can try the following code:
with data as
(select '9666 Northridge Ct.' as add1 Union all
select 'P.O. Box 8070' as add1 Union all
select '369 Peabody Road' as add1 Union all
select '83 Mountain View Blvd' as add1 Union all
select '3279 W 46th St' as add1 )
SELECT
add1,
REGEXP_EXTRACT_ALL(add1, r'\d+') AS numbers,
REGEXP_REPLACE(add1, r'\d+', '') AS non_numbers
FROM data
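Note that replacing every `\d+` also removes digits inside the street part, so '3279 W 46th St' would become ' W th St'. As a rough illustration of the "split after the first space" idea, here is a small Python sketch that only treats a leading run of digits as the number. The function name `split_address` is hypothetical, and the sketch assumes that lines not starting with digits (such as 'P.O. Box 8070') should be left whole.

```python
import re

def split_address(line):
    """Return (number, street); number is None when the line does not start with digits."""
    match = re.match(r"^(\d+)\s+(.*)$", line)
    if match:
        return match.group(1), match.group(2)
    # No leading house number: keep the whole line as the street part.
    return None, line

samples = [
    "9666 Northridge Ct.",
    "P.O. Box 8070",
    "369 Peabody Road",
    "83 Mountain View Blvd",
    "3279 W 46th St",
]
for line in samples:
    print(split_address(line))
```

Anchoring the pattern at the start of the string is the key design choice here: digits later in the line are never touched.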

How can I count unique attribute values using two attributes and joining two tables?

I'm a beginner in SQL.
Simplified, I have two tables, districts and streetdistricts, which contain information about city districts and streets. Every district has a unique number dkey and every street has a unique street number stkey (as primary keys respectively).
Here's an example:
Table districts:
|dkey|name      |
|----|----------|
|1   |Inner City|
|2   |Outer City|
|3   |Outskirts |
Table streetdistricts:
|stkey|dkey|
|-----|----|
|113  |1   |
|126  |2   |
|148  |2   |
|148  |3   |
|152  |3   |
|154  |3   |
What I want to do now is to find out how many streets are there per district that are located only in one single district. So that means I do not have to just remove duplicates (like street with stkey 148 here), but instead to remove streets that are situated in more than one district completely so that I only see the districts and the number of streets per district that are just located in one district only.
For this example, this would be:
|name      |number_of_street_in_just_this_district|
|----------|--------------------------------------|
|Inner City|1                                     |
|Outer City|1                                     |
|Outskirts |2                                     |
I've tried many things, but I always get stuck, mostly because when I SELECT the name of the district, it is also needed in GROUP BY as SQL says, but when I add it, then either the whole number of streets (here: 6) or at least the number including the duplicates (here: 5) are displayed, but not the right answer of 3.
Or I'm not able to JOIN the tables correctly so to get the output I want. Here is my last try:
SELECT SUM(StreetDistricts.dkey) as d_number, StreetDistricts.stkey, COUNT(StreetDistricts.stkey) as numb
FROM StreetDistricts
INNER JOIN Districts ON Districts.dkey = StreetDistricts.dkey
GROUP BY StreetDistricts.stkey
HAVING COUNT(StreetDistricts.dkey) = 1
ORDER BY d_number DESC
This works to get me the correct sum of rows, but I was not able to combine/join it with the other table to receive name and number of unique streets.
First obtain the streets that are found in only one district (cte1). Then count just those streets per district. This should do it:
WITH cte1 AS (
SELECT stkey FROM StreetDistricts GROUP BY stkey HAVING COUNT(DISTINCT dkey) = 1
)
SELECT d.name, COUNT(*) AS n
FROM StreetDistricts AS s
JOIN Districts AS d
ON s.dkey = d.dkey
AND s.stkey IN (SELECT stkey FROM cte1)
GROUP BY d.dkey
;
Result:
+------------+---+
| name | n |
+------------+---+
| Inner City | 1 |
| Outer City | 1 |
| Outskirts | 2 |
+------------+---+
Note: I used the fact that dkey is the primary key of Districts to avoid having to GROUP BY d.name as well. This is guaranteed by functional dependence. If your database doesn't guarantee that with a constraint, just add d.name to the final GROUP BY terms.
The test case:
CREATE TABLE Districts (dkey int primary key, name varchar(30));
CREATE TABLE StreetDistricts (stkey int, dkey int);
INSERT INTO Districts VALUES
(1,'Inner City')
, (2,'Outer City')
, (3,'Outskirts')
;
INSERT INTO StreetDistricts VALUES
(113,1)
, (126,2)
, (148,2)
, (148,3)
, (152,3)
, (154,3)
;
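To make the two steps easier to follow, here is the same logic as a small Python sketch over the test-case rows: first count how many distinct districts each street appears in, then count per district only the streets with a count of 1. The function name is made up for illustration.

```python
from collections import Counter

# Test-case data from above
districts = {1: "Inner City", 2: "Outer City", 3: "Outskirts"}
street_districts = [(113, 1), (126, 2), (148, 2), (148, 3), (152, 3), (154, 3)]

def streets_only_in_one_district(districts, street_districts):
    # Distinct (stkey, dkey) pairs, mirroring COUNT(DISTINCT dkey) in cte1.
    pairs = set(street_districts)
    # Step 1: number of distinct districts per street.
    districts_per_street = Counter(stkey for stkey, _ in pairs)
    # Step 2: per district, count only streets found in exactly one district.
    counts = Counter(
        districts[dkey]
        for stkey, dkey in pairs
        if districts_per_street[stkey] == 1
    )
    return dict(counts)

print(streets_only_in_one_district(districts, street_districts))
```

Street 148 appears in districts 2 and 3, so it is dropped from both counts.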

How to split comma delimited data from one column into multiple rows

I'm trying to write a query that will have a column show a specific value depending on another comma delimited column. The codes are meant to denote Regular time/overtime/doubletime/ etc. and they come from the previously mentioned comma delimited column. In the original view, there are columns for each of the different hours accrued separately. For the purposes of this, we can say A = regular time, B = doubletime, C = overtime. However, we have many codes that can represent the same type of time.
What my original view looks like:
|Employee_FullName|EmpID|Code|Regular Time|Double Time|Overtime|
|-----------------|-----|----|------------|-----------|--------|
|John Doe         |123  |A,B |7           |2          |0       |
|Jane Doe         |234  |B   |4           |0          |1       |
What my query outputs:
|Employee_FullName|EmpID|Code|Hours|
|-----------------|-----|----|-----|
|John Doe         |123  |A, B|10   |
|John Doe         |123  |A, B|5    |
|Jane Doe         |234  |B   |5    |
What I want the output to look like:
|Employee_FullName|EmpID|Code|Hours|
|-----------------|-----|----|-----|
|John Doe         |123  |A   |10   |
|John Doe         |123  |B   |5    |
|Jane Doe         |234  |B   |5    |
It looks the way it does in the first table because currently it's only pulling from the regular time column. I've tried using a case switch to have it look for a specific code and then pull the number, but I get a variety of errors no matter how I write it. Here's what my query looks like:
SELECT [Employee_FullName],
SUBSTRING(col, 1, CHARINDEX(' ', col + ' ' ) -1)'Code',
hrsValue
FROM
(
SELECT [Employee_FullName], col, hrsValue
FROM myTable
CROSS APPLY
(
VALUES ([Code],[RegularHours])
) C (COL, hrsValue)
) SRC
Any advice on how to fix it or perspective on what to use is appreciated!
Edit: I cannot change the comma delimited data, it is provided that way. I think a case within a cross apply will solve it but I honestly don't know.
Edit 2: I will be using a unique EmployeeID to identify them. In this case yes A is regular time, B is double time, C is overtime. The complication is that there are a variety of different codes and multiple refer to each type of time. There is never a case where A would refer to regular time for one employee and double time for another, etc. I am on SQL Server 2017. Thank you all for your time!
If you are on SQL Server 2016 or better, you can use OPENJSON() to split up the code values instead of cumbersome string operations:
SELECT t.Employee_FullName,
Code = LTRIM(j.value),
Hours = MAX(CASE j.[key]
WHEN 0 THEN RegularTime
WHEN 1 THEN DoubleTime
WHEN 2 THEN Overtime END)
FROM dbo.MyTable AS t
CROSS APPLY OPENJSON('["' + REPLACE(t.Code,',','","') + '"]') AS j
GROUP BY t.Employee_FullName, LTRIM(j.value);
You can use the following code to split up the values.
Note how NULLIF nulls out the CHARINDEX result if it returns 0.
The second half of the second APPLY is conditional on that null.
SELECT
t.[Employee_FullName],
Code = TRIM(v2.Code),
v2.Hours
FROM myTable t
CROSS APPLY (VALUES( NULLIF(CHARINDEX(',', t.Code), 0) )) v1(comma)
CROSS APPLY (
SELECT Code = ISNULL(LEFT(t.Code, v1.comma - 1), t.Code), Hours = t.RegularTime
UNION ALL
SELECT SUBSTRING(t.Code, v1.comma + 1, LEN(t.Code)), t.DoubleTime
WHERE v1.comma IS NOT NULL
) v2;
You can use a CROSS APPLY based approach, as given below.
Thanks to @Chalieface for the insert script.
CREATE TABLE mytable (
"Employee_FullName" VARCHAR(8),
"Code" VARCHAR(3),
"RegularTime" INTEGER,
"DoubleTime" INTEGER,
"Overtime" INTEGER
);
INSERT INTO mytable
("Employee_FullName", "Code", "RegularTime", "DoubleTime", "Overtime")
VALUES
('John Doe', 'A,B', '10', '5', '0'),
('Jane Doe', 'B', '5', '0', '0');
SELECT
t.[Employee_FullName],
c.Code,
CASE WHEN c.code = 'A' THEN t.RegularTime
WHEN c.code = 'B' THEN t.DoubleTime
WHEN c.code = 'C' THEN t.Overtime
END AS Hours
FROM myTable t
CROSS APPLY (select value from string_split(t.code,',')
) c(code)
|Employee_FullName|Code|Hours|
|-----------------|----|-----|
|John Doe         |A   |10   |
|John Doe         |B   |5    |
|Jane Doe         |B   |0    |
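Since the question mentions that several codes can refer to the same type of time, a lookup from code to hours column may scale better than a long CASE. Here is a rough Python sketch of that idea using the insert-script rows; the `CODE_TO_COLUMN` mapping and the function name are assumptions for illustration (in SQL this would probably be a small mapping table joined to the split codes).

```python
# Assumed mapping from code to hours column; extend it with the real codes.
CODE_TO_COLUMN = {"A": "RegularTime", "B": "DoubleTime", "C": "Overtime"}

# Rows from the insert script above
rows = [
    {"Employee_FullName": "John Doe", "Code": "A,B",
     "RegularTime": 10, "DoubleTime": 5, "Overtime": 0},
    {"Employee_FullName": "Jane Doe", "Code": "B",
     "RegularTime": 5, "DoubleTime": 0, "Overtime": 0},
]

def explode_codes(rows):
    out = []
    for row in rows:
        # One output row per code in the comma-delimited column.
        for code in row["Code"].split(","):
            code = code.strip()
            column = CODE_TO_COLUMN.get(code)
            # Unknown codes yield None, like a CASE without ELSE.
            hours = row[column] if column else None
            out.append((row["Employee_FullName"], code, hours))
    return out

for line in explode_codes(rows):
    print(line)
```

Jane Doe's B row shows 0 because her DoubleTime value in the insert script is 0.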

SQL using between syntax to validate alpha numeric values

I have 2 tables, Table1 has address ranges, Table2 has exact addresses. All fields in both tables are char. I need to identify the addresses in Table2 that fall between the house number ranges in Table1. Please see below for example. I have a join on Street, City, State and HSE# between LOW and HIGH. The results bring back both the 1201 - 1214 and 101 - 126 TABLE1 records. Casting the values as integers doesn't work because some addresses contain alpha characters....101B as the house number for example. Can you help determine the best, most accurate way to accomplish this?
Table1
|LOW |HIGH|STREET|CITY  |STATE  |
|----|----|------|------|-------|
|101 |126 |A ST  |MYCITY|MYSTATE|
|1201|1214|A ST  |MYCITY|MYSTATE|
TABLE2
|HSE#|STREET|CITY  |STATE  |
|----|------|------|-------|
|1203|A ST  |MYCITY|MYSTATE|
SELECT *
FROM TABLE1 A,
TABLE2 B
WHERE B.STATE = A.STATE
AND B.CITY = A.CITY
AND B.STREET = A.STREET
AND B.HSE# BETWEEN A.LOW AND A.HIGH;
This brings back both the TABLE1 records in the example above. The expected result is that I only get the TABLE1 record with a LOW/HIGH range of 1201/1214, as this is truly the range the house number falls in.
Try this as is:
SELECT A.LOW, A.HIGH, B.HSE#
FROM
(
VALUES
('101B', '126')
, ('1201', '1214')
) A (LOW, HIGH)
JOIN (VALUES '1203') B(HSE#)
ON
INT(NULLIF(REGEXP_REPLACE(B.HSE#, '[^0-9]', ''), ''))
BETWEEN
INT(NULLIF(REGEXP_REPLACE(A.LOW, '[^0-9]', ''), ''))
AND
INT(NULLIF(REGEXP_REPLACE(A.HIGH, '[^0-9]', ''), ''));
The result is:
|LOW |HIGH|HSE#|
|----|----|----|
|1201|1214|1203|
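The reason the original BETWEEN matched both ranges is that char values are compared character by character, so '1203' sorts between '101' and '126'. Here is a short Python sketch of the difference, using the same "strip non-digits, then compare as integers" idea; the helper names are made up, and treating a value with no digits as "no match" is my assumption.

```python
import re

def house_number(value):
    # Keep only the digits, e.g. '101B' -> 101; None when there are no digits.
    digits = re.sub(r"[^0-9]", "", value)
    return int(digits) if digits else None

def in_range(hse, low, high):
    n, lo, hi = house_number(hse), house_number(low), house_number(high)
    if n is None or lo is None or hi is None:
        return False  # assumption: missing numbers never match
    return lo <= n <= hi

# String comparison: '1203' falls "between" '101' and '126'.
print("101" <= "1203" <= "126")
# Numeric comparison after stripping letters.
print(in_range("1203", "101B", "126"))
print(in_range("1203", "1201", "1214"))
```

One caveat of stripping all non-digits: a value with digits on both sides of a letter would have them concatenated, so this only fits house numbers with a plain suffix like 101B.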

How to select unique row records in SQL

I have a table with the columns number, name, address and contact. I want to display only one row per number. For example, given:
|number|name|address|contact                      |
|------|----|-------|-----------------------------|
|1     |joy |Elgin  |Obere Str. 57                |
|2     |saf |Berlin |Obere Str. 57                |
|3     |andy|Berlin |Avda. de la Constitución 2222|
|3     |rin |Berlin |Mataderos 2312               |
it should display like this:
|number|name|address|contact                      |
|------|----|-------|-----------------------------|
|1     |joy |Elgin  |Obere Str. 57                |
|2     |saf |Berlin |Obere Str. 57                |
|3     |andy|Berlin |Avda. de la Constitución 2222|
How can I do this?
Most databases support the ANSI-standard row_number() function. You can do this as:
select t.*
from (select t.*,
row_number() over (partition by number order by number) as seqnum
from table t
) t
where seqnum = 1;
Note: this chooses an arbitrary row from matching numbers, which seems entirely consistent with the phrasing of the question.
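To illustrate what "one row per number" means, here is a minimal Python sketch over the sample rows. Unlike the SQL above, which may pick any row within a number, this sketch keeps the first row in input order; that choice and the function name are my own assumptions.

```python
# Sample rows from the question: (number, name, address, contact)
rows = [
    (1, "joy", "Elgin", "Obere Str. 57"),
    (2, "saf", "Berlin", "Obere Str. 57"),
    (3, "andy", "Berlin", "Avda. de la Constitución 2222"),
    (3, "rin", "Berlin", "Mataderos 2312"),
]

def first_row_per_number(rows):
    seen = {}
    for row in rows:
        # setdefault stores a row only the first time its number is met.
        seen.setdefault(row[0], row)
    return list(seen.values())

for row in first_row_per_number(rows):
    print(row)
```

If a specific row should win in SQL, the ORDER BY inside the window function is the place to express that.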
select min(number),name,address, contact from tableName where status='p' group by number
select distinct number , name, address, contact
order by id;