SQL: Combine null rows with non-null rows

Due to the way a particular table is written, I need to do something a little strange in SQL, and I can't find a simple way to do it.
Table:

Name   Place     Amount
Chris  Scotland
Chris            £1
Amy    England
Amy              £5

Output:

Name   Place     Amount
Chris  Scotland  £1
Amy    England   £5
What I am trying to do is shown above: the null rows are essentially ignored and 'grouped' up based on the Name.
I have this working using FOR XML, but it is incredibly slow. Is there a smarter way to do this?

This is where MAX would work
select
    Name,
    Place  = Max(Place),
    Amount = Max(Amount)
from YourTable
group by Name
Naturally, if you have more than one occurrence of a place for a given name, you may get unexpected results.
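If you want to check whether that caveat actually applies to your data, a quick sketch like the following (assuming the same YourTable layout as above) lists the names that have more than one distinct non-null Place, i.e. the rows where MAX() might pick an arbitrary winner:

select Name
from YourTable
where Place is not null
group by Name
having count(distinct Place) > 1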

Related

Merge SQL Rows in Subquery

I am trying to work with two tables in BigQuery. From table1 I want to find the accession ID of all records that are "World", and then for each of those accession numbers I want to create a column with every name in a separate row. Unfortunately, when I run this:
SELECT name
FROM `table2`
WHERE acc IN (SELECT acc
              FROM `table1`
              WHERE source = 'World')
Instead of getting something like this:
Acc1   Acc2   Acc3
Jeff   Jeff   Ted
Chris  Ted    Blake
Rob    Jack   Jack
I get something more like this:
row | name
----+-------
  1 | Jeff
  2 | Chris
  3 | Rob
  4 | Jack
  5 | Jeff
  6 | Jack
  7 | Ted
  8 | Blake
Ultimately, I am hoping to download the data and use Python (or something similar) to take each name and count the number of times it shows up with each other name at a given accession number, and furthermore to measure the degree to which each pairing is also found with third names in any given column, i.e. the degree to which they share a cohort. So I need to preserve the groupings that exist for each accession number, but I am struggling to find information on how one might do this.
Could anybody point me in the right direction for this, or otherwise, is the way I am going about it wise if that is my end goal?
Thanks!
This is not a direct answer to the question you asked. In general, it is easier to handle multiple rows rather than multiple columns.
So, I would recommend that you put each acc value in a separate row and then list the names as an array:
select t2.acc, array_agg(t2.name order by t2.name) as names
from `table2` t2
where t2.acc in (select t1.acc
                 from `table1` t1
                 where t1.source = 'World'
                )
group by t2.acc;
Otherwise, you are going to have a challenge just naming the columns in your result set.
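If the end goal is the co-occurrence counting described in the question, the name pairs can also be computed directly in SQL instead of post-processing in Python. A rough sketch, assuming only the table1/table2 layout shown in the question:

-- count how often two names share an accession number
select a.name as name_a,
       b.name as name_b,
       count(distinct a.acc) as shared_accessions
from `table2` a
join `table2` b
  on a.acc = b.acc
 and a.name < b.name   -- avoid self-pairs and double-counting (A,B)/(B,A)
where a.acc in (select acc from `table1` where source = 'World')
group by a.name, b.name
order by shared_accessions desc;

That gives one row per unordered pair of names, which may be closer to the cohort measure you are after.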

Select longest string for each user

I have a table like this:
Clients  Cities
1        NY
1        NY | WDC | LA
1        NY | WDC
2        LA
So, I have duplicate clients with different cities (not in order, and with a different length on each line). What I want is to display, for each client, the longest cities string. So, I should get something like this:
Clients  Cities
1        NY | WDC | LA
2        LA
I am a beginner in SQL (I use Spark SQL, but it's mainly the same thing), so can you please tell me how I can fix this problem?
Thanks!
You can use max():
select client, max(cities)
from t
group by client;
Then you should fix your data model, so you are not storing lists of cities in a string. That is not a good way to store the data in a relational database.
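If you specifically want the longest string rather than the lexicographically largest one (MAX() only happens to work on the sample data because each shorter value is a prefix of the longer one), ordering by length is safer. A sketch in Spark SQL, where clients_cities is a stand-in for your actual table name:

select Clients, Cities
from (
    select Clients,
           Cities,
           row_number() over (partition by Clients order by length(Cities) desc) as rn
    from clients_cities
) ranked
where rn = 1;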
I think you should handle that query (in MySQL) by using a SELECT DISTINCT statement, since the table contains many duplicate values. I hope it will make it work!
For instance:
SELECT DISTINCT city_name FROM cities;
And continue... this is my hint to lead you towards the desired answer.

Eliminate duplicate records/rows?

I'm trying to list results from a multi-table query with one row, 2 columns. I have the correct data that I need; I merely need to trim it down to 1 line of results. In other words, eliminate duplicate entries in the result. I'm using a value not shown here, school_id. Should I go with that as a distinct value? Can I do that without displaying the school_id?
select DISTINCT(school_name), Team_Name
from school, team
where team.team_name like '%B%'
  and school.school_id = team.school_id;
SCHOOL_NAME                                        TEAM_NAME
-------------------------------------------------- ----------
Lawrence Central High School                       Bears
Lawrence Central High School                       BEars
Lawrence Central High School                       BEARS
The problem, as I'm sure you know, is that "Bears" appears in 3 different cases here. The simple fix is to apply UPPER or LOWER to "Team_Name" so only 1 record is returned.
UPPER(Team_Name)
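Applied to the query above, that might look something like this (a sketch; it normalizes the displayed team name so the three case variants collapse into a single row):

select DISTINCT school_name, UPPER(Team_Name) as Team_Name
from school, team
where team.team_name like '%B%'
  and school.school_id = team.school_id;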

Count number of rows that have a specific word in a varchar (in postgresql)

I have a table similar to the below:
id | name    | direction
---+---------+----------------------
 1 | Jhon    | Washington, DC
 2 | Diego   | Miami, Florida
 3 | Michael | Orlando, Florida
 4 | Jenny   | Olympia, washington
 5 | Joe     | Austin, Texas
 6 | Barack  | Denver, Colorado
and I want to count how many people live in a specific state:
Washington  2
Florida     2
Texas       1
Colorado    1
How can I do this? (By the way, this is just a question from an academic point of view.)
Thanks in advance!
Postgres offers the function split_part(), which will break up a string by a delimiter. You want the second part (the part after the comma):
select split_part(direction, ', ', 2) as state, count(*)
from t
group by split_part(direction, ', ', 2);
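One thing to watch for with the sample data: the values mix cases ('Washington' vs 'washington'), and the second part of the first row is 'DC' rather than a state name, so the counts will not quite match the expected output. A case-insensitive variant is a small change (the DC row would still need separate handling):

select initcap(split_part(direction, ', ', 2)) as state, count(*)
from t
group by initcap(split_part(direction, ', ', 2));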
Initially, I would obtain the state from the direction field. Once you have that, it's quite simple:
SELECT state, count(*) as total FROM initial_table GROUP BY state;
To obtain the state, some functions are useful, depending on the DBMS and its language.
A possible pseudocode (given a function like substring_index in MySQL) for the query would be:
SELECT substring_index(direction, ',', -1) as state, count(*) as total
FROM initial_table GROUP BY substring_index(direction, ',', -1)
Edit: As suggested above, this query would return 1 for the Washington state (since 'Washington, DC' yields 'DC' rather than a state name).
My way of writing such queries is two-step: first, prepare the fields you need; second, do the grouping or other calculation. That way you follow the DRY principle and don't repeat yourself. I think a CTE is the best tool for this:
with cte as (
    -- we don't need other fields, only state
    select split_part(direction, ', ', 2) as state
    from table1
)
select state, count(*)
from cte
group by state;
If you write queries that way, it's easy to change the grouping field in the future.
Hope that helps, and remember - readability counts! :)

Combining almost identical rows into 1

I have a tricky problem that I wouldn't mind a bit of help on. I've made some progress using queries that I've found here and elsewhere, but am getting seriously stumped now.
I have a mailing list that has numerous near-duplications that I'm trying to combine into one meaningful row, taking data such as this:
Title  Forename  Surname  Address1       Postcode  Phone   Age    Income  Ownership  Gas
Mrs    D         Andrews  122 Somewhere  BH10      123456  66-70          Homeowner
Ms     Diane     Andrews  122 Somewhere  BH10      123456         £25-40             EDF
and making one row along the lines of:
Title  Forename  Surname  Address1       Postcode  Phone   Age    Income  Ownership  Gas
Mrs    Diane     Andrews  122 Somewhere  BH10      123456  66-70  £25-40  Homeowner  EDF
I have over 127 million records, most duplicated with a similar pattern, but with no clear logic, as was proven when I added an identity field. I also have over 90 columns to consider, so it's a bit of work!
There isn't a clear pattern to the data, so I'm thinking I may have a huge CASE statement to climb over.
Using the following code I can get a decent start on returning only the full name, but comparing the fields across rows against this pattern of data is where I'm stuck:
SELECT c1.*
FROM Mailing c1
JOIN Mailing c2
  ON c1.Telephone1 = c2.Telephone1
 AND c1.surname = c2.surname
WHERE len(c1.Forename) > len(c2.Forename)
  AND c2.over_18 <> ''
  AND c1.Telephone1 = '123456'
Has anyone got any pointers as to how I should progress please? I'm open to discussion and ideas...
I'm using SQL 2005 and apologies in advance if the tagging is all over the place!
Cheers,
Jon
Would it work by assuming that all persons with the same surname and phone number (Do all persons have a phone?) were the same person?
INSERT INTO newtable <fieldnames>
SELECT lastname, phone, max(field3), max(field4), ...
FROM oldtable
GROUP BY lastname, phone
But that would collapse John Smith and Jack Smith living together into one person.
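For the sample columns above, a concrete version of that idea might look like the following sketch (MailingMerged is a hypothetical target table, and Telephone1 is taken from the query in the question rather than the 'Phone' header in the sample rows). Note that where both rows have a value, MAX() simply keeps the lexicographically larger one, so here it would keep 'Ms' rather than 'Mrs':

-- collapse near-duplicates that share a surname and phone number;
-- MAX() prefers a populated value over NULL or '' in each column
INSERT INTO MailingMerged (Title, Forename, Surname, Address1, Postcode,
                           Telephone1, Age, Income, Ownership, Gas)
SELECT MAX(Title), MAX(Forename), Surname, MAX(Address1), MAX(Postcode),
       Telephone1, MAX(Age), MAX(Income), MAX(Ownership), MAX(Gas)
FROM Mailing
GROUP BY Surname, Telephone1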
Perhaps you should consider outsourcing it to a data-entry sweatshop somewhere, after you have preprocessed the data. :-)
And/or be prepared to take the flak for mistaken bundling.
Perhaps add something like: "To improve our green footprint, we have merged x listings at your address together. If you would like separate mailings, please contact us."