How to get the duplicate row count in PostgreSQL

The data in the table looks like this:
st_no | st_name    | directions | others
1210  | 6th street | Northwest  |
1210  | 8th ST     | NW         | NULL
I need output like: (1210,1210) (6th street,8th ST) Northwest
Any help you can provide is greatly appreciated. And, in case it comes into play, I'm using a Postgresql DB.

Try it like this:
select string_agg(st_no, ',') as st_no,
       string_agg(st_name, ',') as st_name,
       string_agg(directions, ',') as directions,
       string_agg(others, ',') as others
from table1
group by st_no
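The title asks for a duplicate row count, which the answer above does not return. A minimal sketch of the same grouping with a count added, run through Python's built-in sqlite3 module: SQLite's group_concat stands in for PostgreSQL's string_agg here, and the column types are assumptions. In PostgreSQL, string_agg expects text arguments, so a numeric st_no would need a cast such as st_no::text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: st_no is numeric, the other columns are text.
conn.execute(
    "CREATE TABLE table1 (st_no INTEGER, st_name TEXT, directions TEXT, others TEXT)"
)
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [
        (1210, "6th street", "Northwest", None),
        (1210, "8th ST", "NW", None),
    ],
)

# group_concat is SQLite's stand-in for string_agg; HAVING keeps only
# the st_no values that occur more than once.
rows = conn.execute(
    """
    SELECT group_concat(st_no, ',')   AS st_nos,
           group_concat(st_name, ',') AS st_names,
           COUNT(*)                   AS row_count
    FROM table1
    GROUP BY st_no
    HAVING COUNT(*) > 1
    """
).fetchall()

for st_nos, st_names, row_count in rows:
    print(st_nos, st_names, row_count)
```

The order of the concatenated values is not guaranteed in either database unless it is requested explicitly.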

Related

How to get value from data structure in column

I have the following data in a SQL Server table:
Number | Value
1 | /F10749180509 1/TOYOTA TSUSHO ASIA PACIFIC PTE. L 1/TD. 2/600 NORTH BRIDGE ROAD HEX19 01, P 3/SG/ARKVIEWSQUARE SINGAPORE 188778
2 | /0019695051 1/PT ASURANSI ALLIANZ LIFE 1/INDONESIA 2/ALLIANZ TWR 16FL JL.HRRASUNA SAID 3/ID/JAKARTA
As you can see in the table, I need to find the country code in the Value field. The country code appears in the string after "3/". For example, in the first row I need "SG" after 3/, in the second row I need "ID" after 3/, and so on. If I copy the first row's value into Notepad, the data is actually separated by newlines, like this:
/F10749180509
1/TOYOTA TSUSHO ASIA PACIFIC PTE. L
1/TD.
2/600 NORTH BRIDGE ROAD HEX19 01, P
3/SG/ARKVIEWSQUARE SINGAPORE 188778
Please help me find a query to get the country code. Thank you.
We might be able to use PATINDEX here along with SUBSTRING. Assuming that the country code would always be exactly two uppercase letters, we can try:
SELECT val, SUBSTRING(val, PATINDEX('% [0-9]/[A-Z][A-Z]/%', val) + 3, 2) AS country_code
FROM yourTable
You can use CHARINDEX to get the data.
declare @table table(number int, val varchar(8000))
insert into @table
values
(1, '/F10749180509 1/TOYOTA TSUSHO ASIA PACIFIC PTE. L 1/TD. 2/600 NORTH BRIDGE ROAD HEX19 01, P 3/SG/ARKVIEWSQUARE SINGAPORE 188778')
select substring(val, charindex('3/', val, 1) + 2, 2) from @table
SG
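The two answers differ in one respect: a plain search for '3/' finds the first occurrence anywhere in the string, while the PATINDEX pattern also requires the tag to follow a space and be followed by two letters and a slash. A minimal Python sketch of the stricter rule, assuming the tag always follows whitespace (or starts the string) and the code is exactly two uppercase letters; the third sample value is hypothetical.

```python
import re

# Assumed layout: whitespace (or start of string), the tag "3/",
# two uppercase letters, then "/".
COUNTRY_RE = re.compile(r"(?:^|\s)3/([A-Z]{2})/")


def country_code(value):
    """Return the two-letter code after the '3/' tag, or None if absent."""
    match = COUNTRY_RE.search(value)
    return match.group(1) if match else None


samples = [
    "/F10749180509 1/TOYOTA TSUSHO ASIA PACIFIC PTE. L 1/TD. "
    "2/600 NORTH BRIDGE ROAD HEX19 01, P 3/SG/ARKVIEWSQUARE SINGAPORE 188778",
    "/0019695051 1/PT ASURANSI ALLIANZ LIFE 1/INDONESIA "
    "2/ALLIANZ TWR 16FL JL.HRRASUNA SAID 3/ID/JAKARTA",
    # Hypothetical value where "3/" also occurs inside the first field.
    "/123/45 1/ACME 3/US/NEW YORK",
]

codes = [country_code(value) for value in samples]
print(codes)
```

On the hypothetical third value, a search for the first '3/' would land inside '/123/45', which is why the surrounding context is part of the pattern.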

Select longest string for each user

I have a table like this :
Clients Cities
1 NY
1 NY | WDC | LA
1 NY | WDC
2 LA
So, I have duplicate clients with different cities (not in order, but with different length at each line). What I want is to display for each user the longest cities string. So, I should get something like this :
Clients Cities
1 NY | WDC | LA
2 LA
I am a beginner in SQL (I use Spark SQL, but it's mainly the same thing), so can you please tell me how to solve this problem?
Thanks !
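You can use max(). Note that max() on strings returns the alphabetically greatest value, not the longest one; it gives the expected result on the sample data only because each shorter list is a prefix of the longest one: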
select client, max(cities)
from t
group by client;
Then you should fix your data model, so you are not storing lists of cities in a string. That is not a good way to store the data in a relational database.
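If the shorter strings are not always prefixes of the longest one, comparing by length is safer than max() on the string itself. A minimal sketch using Python's built-in sqlite3 module, assuming the column names client and cities from the query above; client 3 is a hypothetical row added to show where the two approaches differ. The correlated subquery should carry over to other dialects, but that has not been checked against Spark SQL here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (client INTEGER, cities TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        (1, "NY"),
        (1, "NY | WDC | LA"),
        (1, "NY | WDC"),
        (2, "LA"),
        # Hypothetical client: the longest string is not the alphabetical max.
        (3, "WDC"),
        (3, "LA | NY"),
    ],
)

# max() compares strings alphabetically.
alphabetical = dict(
    conn.execute("SELECT client, MAX(cities) FROM t GROUP BY client")
)

# Keep the rows whose length equals the longest length for that client.
# Ties would return more than one row per client.
longest = dict(
    conn.execute(
        """
        SELECT client, cities
        FROM t
        WHERE LENGTH(cities) = (
            SELECT MAX(LENGTH(t2.cities)) FROM t AS t2 WHERE t2.client = t.client
        )
        """
    )
)

print(alphabetical)
print(longest)
```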
I think you could handle that query (in MySQL) with a SELECT DISTINCT statement,
since the table contains many duplicate values. I hope that makes it work!
For instance,
SELECT DISTINCT city_name FROM cities;
Continue from there; this is just a hint to lead you toward the answer you want.

Count number of rows that have a specific word in a varchar (in postgresql)

I have a table similar to the below:
id | name | direction |
--------------------------------------
1 Jhon Washington, DC
2 Diego Miami, Florida
3 Michael Orlando, Florida
4 Jenny Olympia, washington
5 Joe Austin, Texas
6 Barack Denver, Colorado
and I want to count how many people live in a specific state:
Washington 2
Florida 2
Texas 1
Colorado 1
How can I do this? (By the way, this is just a question from an academic point of view.)
Thanks in advance!
Postgres offers the function split_part(), which will break up a string by a delimiter. You want the second part (the part after the comma):
select split_part(direction, ', ', 2) as state, count(*)
from t
group by split_part(direction, ', ', 2);
Initially I would obtain the state from the direction field. Once you have that, it's quite simple:
SELECT state, count(*) as total FROM initial_table group by state
Which function you use to obtain the state depends on the DBMS.
A possible query (given a function like MySQL's substring_index) would be:
SELECT substring_index(direction,',',-1) as state, count(*) as total
FROM initial_table group by substring_index(direction,',',-1)
Edit: As suggested above, the query should return 1 for the state of Washington.
My way of writing such queries is in two steps: first prepare the fields you need, then do the grouping or other calculation. That way you follow the DRY principle and don't repeat yourself. I think a CTE is the best tool for this:
with cte as (
-- we don't need other fields, only state
select
split_part(direction, ', ', 2) as state
from table1
)
select state, count(*)
from cte
group by state
If you write queries that way, it's easy to change the grouping field in the future.
Hope that helps, and remember - readability counts! :)
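The sample data mixes "Washington, DC" with "Olympia, washington", so the result depends on whether you group by the part after the comma or count rows that contain a word. A minimal Python sketch of both readings, assuming the state is always the text after the last comma and that letter case should be ignored; the function names are made up for illustration.

```python
from collections import Counter

directions = [
    "Washington, DC",
    "Miami, Florida",
    "Orlando, Florida",
    "Olympia, washington",
    "Austin, Texas",
    "Denver, Colorado",
]


def state_of(direction):
    """Text after the last comma, trimmed and lower-cased."""
    return direction.rsplit(",", 1)[-1].strip().lower()


# Reading 1: group by the part after the comma (what split_part does),
# with case folded so 'washington' and 'Washington' land together.
by_state = Counter(state_of(d) for d in directions)


def rows_containing(word, values):
    """Reading 2: count rows that contain the word anywhere."""
    return sum(word.lower() in value.lower() for value in values)


print(by_state)
print(rows_containing("washington", directions))
```

Only the second reading reproduces the "Washington 2" line from the question, because "Washington, DC" names a city rather than the state.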

Compare two addresses which are not in standard format

I have to compare addresses from two tables and get the Id if the address matches.
Each table has three columns Houseno, street, state
The addresses are not in a standard format in either table. There are approximately 50,000 rows that I need to scan through.
Street types appear in many variants: Ave., Avenue, Ave ., Str, Street, ST., Lane, Ln., Place, PL, Cir, CIRCLE,
in any combination with dots, commas, spaces, or hyphens.
I was thinking of combining all three columns. What is the best way to do this in SQL or PL/SQL? For example:
table1
HNO STR State
----- ----- -----
12 6th Ave NY
10 3rd Aven SD
12-11 Fouth St NJ
11 sixth Lane NY
A23 Main Parkway NY
A-21 124 th Str. VA
table2
id HNO STR state
-- ----- ----- -----
1 12 6 Ave. NY
13 10 3 Avenue SD
15 1121 Fouth Street NJ
33 23 9th Lane NY
24 X23 Main Cir. NY
34 A1 124th Street VA
There is no simple way to achieve what you want. There is expensive software (search for "address standardization software") that can do this, but it is rarely 100% automatic.
What this type of software does is to take the data, use complex heuristics to try to figure out the "official" address and then return that (sometimes with the confidence that the result is correct, sometimes a list of results sorted by confidence).
For a small percentage of the data, the software will simply not work and you'll have to fix that yourself.
Oracle has a built in package UTL_Match which has an edit_distance function (based on the Levenshtein algorithm, this is a measure of how many changes you would need to make to make one string the same as another). More info about this Package / Function can be found here: http://docs.oracle.com/cd/E18283_01/appdev.112/e16760/u_match.htm
You would need to make some decisions around whether to compare each column or concatenate and then compare and what a reasonable threshold is. For example, you may want to do a manual check on any with an edit distance of less than 8 on the concatenated values.
Let me know if you want any help with the syntax, the edit_distance function just takes 2 varchar2 args (the strings you want to compare) and returns a number.
This is not a perfect solution in that if you set the threshold high you will have a lot of manual checking to do to discard some, and if you set it too low you will miss some matches, but it may be about the best if you want a relatively simple solution.
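To make the threshold idea concrete, here is a minimal Python sketch of the Levenshtein distance with a manual-review check on top. It is a stand-in written for illustration, not Oracle's UTL_MATCH implementation; the helper needs_review and the choice to ignore letter case are assumptions, and the threshold of 8 is the one mentioned above.

```python
def edit_distance(a, b):
    """Levenshtein distance, computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute or keep
                )
            )
        previous = current
    return previous[-1]


def needs_review(address_1, address_2, threshold=8):
    """Flag a pair for manual checking when the distance is below the threshold."""
    return edit_distance(address_1.lower(), address_2.lower()) < threshold


print(edit_distance("12 6th Ave NY", "12 6 Ave. NY"))
print(needs_review("12 6th Ave NY", "12 6 Ave. NY"))
```

Short addresses are all within a few edits of each other, so a fixed threshold flags many unrelated pairs; this is the manual-checking cost described above.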
The way we did this for one of our applications was to use a third-party address normalization API (e.g. Pitney Bowes), normalize each address (an address being a combination of street address, city, state, and ZIP), and create a T-SQL hash of the result. For the address to compare against, do the same thing and compare the two hashes; if they match, we have a match.
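The normalize-then-hash idea can be sketched in Python as follows. The abbreviation table and the ordinal rule are hypothetical stand-ins for what a real normalization service would do, and SHA-256 from hashlib is used in place of a T-SQL hash.

```python
import hashlib
import re

# Hypothetical abbreviation table; a real service covers far more cases.
SUFFIXES = {
    "avenue": "ave", "aven": "ave",
    "street": "st", "str": "st",
    "lane": "ln",
    "place": "pl",
    "circle": "cir",
}


def _clean(token):
    """Strip ordinal endings (6th -> 6) and map street types to one spelling."""
    ordinal = re.fullmatch(r"(\d+)(?:st|nd|rd|th)", token)
    if ordinal:
        return ordinal.group(1)
    return SUFFIXES.get(token, token)


def normalize(house_no, street, state):
    """Lower-case, replace punctuation with spaces, and clean each token."""
    text = f"{house_no} {street} {state}".lower()
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(_clean(token) for token in text.split())


def address_key(house_no, street, state):
    """Hash of the normalized address; equal keys mean a candidate match."""
    return hashlib.sha256(normalize(house_no, street, state).encode("utf-8")).hexdigest()


print(normalize("12", "6th Ave", "NY"))
print(address_key("12", "6th Ave", "NY") == address_key("12", "6 Ave.", "NY"))
```

A hash comparison only finds addresses that normalize to exactly the same text, so misspellings such as "Fouth" still need a fuzzy comparison or manual review.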
You can write a cursor that first groups the rows whose house number and city are equal.
Inside a loop, you can split each row into words with INSTR and SUBSTR, using CHR(32) (the space character) as the separator.
After that, you can compare the substrings, treating a number as equal to its ordinal (6 = 6th) and an abbreviation as equal to the full word (street = str).
Good luck!

Need help in access query

I have a question about an Access query.
Please advise whether this is possible.
I have linked an Excel file into Access; it has a number of columns. My question is as follows.
For example,
to retrieve the Description and Region columns from the Laptop and Desktop tables,
I use the query below:
SELECT Laptop.[Description], Laptop.[Region] From Laptop
union SELECT Desktop.[Description], Desktop.[Region] From Desktop
Sometimes a table may not contain the Region field. In that case I use " " as Laptop.[Region] or "" as Desktop.[Region].
My question is:
Is there any option like this
SELECT Laptop.[Description], If Laptop.[Region]=avairable
then Laptop.[Region] else "" as [Region] from Laptop;
or any way to skip the error?
Please help me with this. Thanks in advance.
To be clear,
suppose the Desktop table has Description and Region columns:
Description Region
Saran east
Sathish north
sathy west
and the Laptop table has Description and Cost columns:
Description Cost
asdf 23
dkasfjasd 34
flkasdf 55
Select Laptop.[Description], NZ(Laptop.[Region], "NA") as [Region]
from Laptop
UNION
SELECT Desktop.[Description], NZ(Desktop.[Region], "NA") as [Region]
FROM Desktop;
Will it return the result below?
I can't run it myself because I have an access issue.
Description Region
asdf
dkasfjasd
flkasdf
Saran east
Sathish north
sathy west
I'm assuming that '=avairable' in your pseudo-code means that a value exists, and that you just want to handle a null value.
Select Laptop.Description, NZ(Laptop.Region, "") as [Region] from Laptop;
The NZ() function will handle the null values and substitute whatever you want.
You could use a CASE expression for this, but MS Access does not support it; the alternative in Access is IIf(). Here is a generic example that you can easily adapt to your actual query.
IIf(expr, truepart, falsepart)
SELECT IIF(IsNull(Laptop.[Region])," ",Laptop.[Region]) as region
FROM Laptop ;
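NZ() and IIf() both deal with null values in a column that exists; they do not help when the column itself is missing, as in the Laptop table from the question. One way around that is to check each table's columns first and substitute an empty literal where Region is absent. The sketch below shows the idea with Python's built-in sqlite3 module rather than Access: COALESCE stands in for NZ, PRAGMA table_info is SQLite-specific, and region_expr is a made-up helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Desktop (Description TEXT, Region TEXT)")
conn.execute("CREATE TABLE Laptop (Description TEXT, Cost INTEGER)")
conn.executemany(
    "INSERT INTO Desktop VALUES (?, ?)",
    [("Saran", "east"), ("Sathish", "north"), ("sathy", "west")],
)
conn.executemany(
    "INSERT INTO Laptop VALUES (?, ?)",
    [("asdf", 23), ("dkasfjasd", 34), ("flkasdf", 55)],
)


def region_expr(table):
    """Use the Region column if the table has one, else an empty literal."""
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    if "Region" in columns:
        return f"COALESCE({table}.Region, '')"
    return "''"


query = (
    f"SELECT Description, {region_expr('Laptop')} AS Region FROM Laptop "
    f"UNION "
    f"SELECT Description, {region_expr('Desktop')} AS Region FROM Desktop"
)
rows = set(conn.execute(query))
print(query)
```

The table names are interpolated into the SQL text here only because they are fixed in the script; they should not come from user input.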