Count number of rows that have a specific word in a varchar (in postgresql) - sql

I have a table similar to the below:
id | name    | direction
---+---------+--------------------
1  | Jhon    | Washington, DC
2  | Diego   | Miami, Florida
3  | Michael | Orlando, Florida
4  | Jenny   | Olympia, washington
5  | Joe     | Austin, Texas
6  | Barack  | Denver, Colorado
and I want to count how many people live in a specific state:
Washington 2
Florida 2
Texas 1
Colorado 1
How can I do this? (By the way, this is just a question from an academic point of view.)
Thanks in advance!

Postgres offers the function split_part(), which will break up a string by a delimiter. You want the second part (the part after the comma):
select split_part(direction, ', ', 2) as state, count(*)
from t
group by split_part(direction, ', ', 2);
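To try the idea without a Postgres server, here is a minimal sketch using Python's built-in sqlite3 module. SQLite has no split_part(), so substr() and instr() stand in for it here (that substitution is my assumption, not part of the answer above); the table name t and the sample rows come from the question.

```python
import sqlite3

# In-memory database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INTEGER, name TEXT, direction TEXT);
INSERT INTO t VALUES
  (1, 'Jhon',    'Washington, DC'),
  (2, 'Diego',   'Miami, Florida'),
  (3, 'Michael', 'Orlando, Florida'),
  (4, 'Jenny',   'Olympia, washington'),
  (5, 'Joe',     'Austin, Texas'),
  (6, 'Barack',  'Denver, Colorado');
""")

# Everything after the first comma is treated as the state.
# substr()/instr() replace Postgres' split_part(direction, ', ', 2).
query = """
SELECT trim(substr(direction, instr(direction, ',') + 1)) AS state,
       count(*) AS total
FROM t
GROUP BY state
ORDER BY total DESC, state;
"""
state_counts = conn.execute(query).fetchall()
for state, total in state_counts:
    print(state, total)
```

Note that with this sample data "Washington, DC" is counted under DC and "Olympia, washington" under lowercase washington, so the part after the comma is not always the state you might expect.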

First I would extract the state from the direction field. Once you have that, it's quite simple:
SELECT state, count(*) AS total FROM initial_table GROUP BY state;
Which function extracts the state depends on the DBMS you are using.
With a function like MySQL's substring_index, for example, the query could look like this (trim removes the space left after the comma):
SELECT trim(substring_index(direction, ',', -1)) AS state, count(*) AS total
FROM initial_table GROUP BY trim(substring_index(direction, ',', -1));
Edit: As suggested above, the query should return 1 for the state of Washington.

My way of writing such queries is two-step: first, prepare the fields you need; second, do your grouping or other calculation. That way you follow the DRY principle and don't repeat yourself. I think a CTE is the best tool for this:
with cte as (
-- we don't need other fields, only state
select
split_part(direction, ', ', 2) as state
from table1
)
select state, count(*)
from cte
group by state
If you write queries that way, it's easy to change the grouping field in the future.
Hope that helps, and remember - readability counts! :)

Related

SQL Result to multiple array

MY SQL returns the following array...
id | staff | province
1  | Ben   | Ontario
2  | Ben   | Quebec
3  | John  | Manitoba
4  | John  | Saskatchewan
6  | Kitty | Alberta
7  | Kitty | Nova Scotia
I would like to have the record displayed like this...
staff | province
Ben   | Ontario, Quebec
John  | Quebec, Manitoba, Saskatchewan
Kitty | Alberta, Nova Scotia
What approach should I use for this?
It would be better to post the tables as well, for clearer context.
You can use aggregate functions and grouping for this: GROUP BY groups the rows by the staff column, and GROUP_CONCAT() then concatenates the province values into one string.
Here is a reference query. I'm unsure which table you are using or whether other factors apply, but you can adapt it as needed.
SELECT staff, GROUP_CONCAT(province SEPARATOR ', ') as province
FROM table_name
GROUP BY staff;
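For a quick local check of the grouping idea, here is a small sketch using Python's sqlite3 module. SQLite spells the separator as a second argument, group_concat(province, ', '), rather than MySQL's SEPARATOR keyword, and the table name staff_province is just a placeholder I made up.

```python
import sqlite3

# In-memory database with the rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staff_province (id INTEGER, staff TEXT, province TEXT);
INSERT INTO staff_province VALUES
  (1, 'Ben',   'Ontario'),
  (2, 'Ben',   'Quebec'),
  (3, 'John',  'Manitoba'),
  (4, 'John',  'Saskatchewan'),
  (6, 'Kitty', 'Alberta'),
  (7, 'Kitty', 'Nova Scotia');
""")

# One output row per staff member, provinces joined with ', '.
# SQLite does not guarantee the order of the joined values.
query = """
SELECT staff, group_concat(province, ', ') AS province
FROM staff_province
GROUP BY staff
ORDER BY staff;
"""
grouped = conn.execute(query).fetchall()
for staff, provinces in grouped:
    print(staff, "|", provinces)
```

In MySQL itself, GROUP_CONCAT also accepts an ORDER BY inside the call if the order of the provinces matters.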

How to get the differences between two rows **and** the name of the field where the difference is, in BigQuery?

I have a table in BigQuery like this:
Name | Phone Number | Address
John | 123456778564 | 1 Penny Lane
John | 873452987424 | 1 Penny Lane
Mary | 845704562848 | 87 5th Avenue
Mary | 845704562848 | 54 Lincoln Rd.
Amy  | 342847327234 | 4 Ocean Drive Avenue
Amy  | 347907387469 | 98 Truman Rd.
I want to get a table with the differences between two consecutive rows and the name of the field where the difference occurs.
I mean this:
Name | Field        | Before               | After
John | Phone Number | 123456778564         | 873452987424
Mary | Address      | 87 5th Avenue        | 54 Lincoln Rd.
Amy  | Phone Number | 342847327234         | 347907387469
Amy  | Address      | 4 Ocean Drive Avenue | 98 Truman Rd.
How can I do this? I've looked at other posts but couldn't find anything that matches my need.
Thank you
Consider below BigQuery'ish solution
select Name, ['Phone Number', 'Address'][offset(offset)] Field,
prev_field as Before, field as After
from (
select timestamp, Name, offset, field,
lag(field) over (partition by Name, offset order by timestamp) as prev_field
from yourtable,
unnest([`Phone Number`, Address]) field with offset
)
where prev_field != field
if applied to sample data in your question - output is
As you can see here - no matter how many columns in your table that you need to compare - it is still just one query - no unions and such.
You just need to enumerate your columns in two places
['Phone Number', 'Address'][offset(offset)] Field
and
unnest([`Phone Number`, Address]) field with offset
Note: you can further refactor above using scripting's execute immediate to compose such lists within the query on the fly (check my other answers - I frequently use such technique in them)
One method is just to use lag() and union all:
select name, 'phone', prev_phone as before, phone as after
from (select name, phone,
lag(phone) over (partition by name order by timestamp) as prev_phone
from t
) t
where prev_phone <> phone
union all
select name, 'address', prev_address as before, address as after
from (select name, address,
lag(address) over (partition by name order by timestamp) as prev_address
from t
) t
where prev_address <> address

Finding the difference between two nearly identical SQL rows

I am developing an application and my boss wants to track all changes that have been made to a record throughout its life.
For instance, if I have the following table:
ID | Name | City            | Item        | Version
1  | Mike | Miami           | Test box    | 1
1  | Mike | Fort Lauderdale | Test box    | 2
1  | Mike | Sarasota        | Testing box | 3
And I want to see that from version 1 to version 2 the city was changed to Fort Lauderdale, is there a query that will help me do that? I would really like to be able to do this without specifying all the column names individually, because the actual table has 25+ columns and they may change at any time, plus it would be nice if the query could be easily portable to different tables. Ideally my result would look like the following, but I'm willing to accept anything that would help. Thanks in advance!
ColumnName | Previous Value | New Value
City       | Miami          | Fort Lauderdale
Assuming that the columns are all strings (which is rather necessary for your output format), you can do this by unpivoting the data and using lag():
select c.*
from (select t.id, v.col,
lag(v.val) over (partition by t.id, v.col order by t.version) as prev_val,
v.val
from t cross apply
(values ('Name', name), ('City', city), ('Item', item)
) v(col, val)
) c
where prev_val <> val;
If you have columns that are not strings, then you'll need to convert them to strings in the values clause.
This also assumes that the values are not NULL. That can be handled, but does not seem necessary.
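The unpivot-then-lag() idea can be tried locally with Python's sqlite3 module. This is only a sketch under a few assumptions: SQLite has no cross apply, so UNION ALL does the unpivoting instead; window functions need SQLite 3.25 or newer; and the CTE names unpivoted and with_prev are made up for illustration.

```python
import sqlite3

# In-memory database with the three versions from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INTEGER, name TEXT, city TEXT, item TEXT, version INTEGER);
INSERT INTO t VALUES
  (1, 'Mike', 'Miami',           'Test box',    1),
  (1, 'Mike', 'Fort Lauderdale', 'Test box',    2),
  (1, 'Mike', 'Sarasota',        'Testing box', 3);
""")

query = """
WITH unpivoted AS (
    -- One row per (record, version, column) instead of one row per version.
    SELECT id, version, 'Name' AS col, name AS val FROM t
    UNION ALL
    SELECT id, version, 'City', city FROM t
    UNION ALL
    SELECT id, version, 'Item', item FROM t
),
with_prev AS (
    -- Partition by id AND col so each value is compared with the
    -- previous version of the same column only.
    SELECT id, version, col, val,
           LAG(val) OVER (PARTITION BY id, col ORDER BY version) AS prev_val
    FROM unpivoted
)
SELECT version, col, prev_val, val
FROM with_prev
WHERE prev_val <> val
ORDER BY version, col;
"""
changes = conn.execute(query).fetchall()
for version, col, before, after in changes:
    print(version, col, before, "->", after)
```

The column names still have to be listed once in the unpivot step; making that fully generic would need dynamic SQL built from the table's metadata, which this sketch does not attempt.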

SQL Combine null rows with non null

Due to the way a particular table is written I need to do something a little strange in SQL and I can't find a 'simple' way to do this
Table
Name  | Place    | Amount
Chris | Scotland | NULL
Chris | NULL     | £1
Amy   | England  | NULL
Amy   | NULL     | £5
Output
Chris | Scotland | £1
Amy   | England  | £5
What I am trying to do is above, so the null rows are essentially ignored and 'grouped' up based on the Name
I have this working using For XML however it is incredibly slow, is there a smarter way to do this?
This is where MAX would work
select
Name
,Place = Max(Place)
,Amount = Max(Amount)
from
YourTable
group by
Name
Naturally, if you have more than one occurrence of a place for a given name, you may get unexpected results.
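The reason MAX works here is that aggregate functions skip NULLs, so each group keeps its single non-NULL value. A small sketch with Python's sqlite3 module shows this on the sample rows; the table name your_table and storing Amount as text are assumptions for illustration.

```python
import sqlite3

# In-memory database; each name has its place and amount on separate rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE your_table (name TEXT, place TEXT, amount TEXT);
INSERT INTO your_table VALUES
  ('Chris', 'Scotland', NULL),
  ('Chris', NULL,       '£1'),
  ('Amy',   'England',  NULL),
  ('Amy',   NULL,       '£5');
""")

# MAX() ignores NULLs, so each group collapses to its one real value.
query = """
SELECT name, MAX(place) AS place, MAX(amount) AS amount
FROM your_table
GROUP BY name
ORDER BY name;
"""
merged = conn.execute(query).fetchall()
for row in merged:
    print(row)
```

If a name has two different non-NULL places, MAX simply returns the larger one by sort order, which is the unexpected result mentioned above.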

Populating column for Oracle Text search from 2 tables

I am investigating the benefits of Oracle Text search, and currently am looking at collecting search text data from multiple (related) tables and storing the data in the smaller table in a 1-to-many relationship.
Consider these 2 simple tables, house and inhabitants, and there are NEVER any uninhabited houses:
HOUSE
ID | Address           | Search_Text
1  | 44 Some Road      |
2  | 31 Letsby Avenue  |
3  | 18 Moon Crescent  |
INHABITANT
ID | House | Name          | Nickname
1  | 1     | Jane Doe      | Janey
2  | 1     | John Doe      | JD
3  | 2     | Jo Smythe     | Smithy
4  | 2     | Percy Plum    | PC
5  | 3     | Apollo Lander | Moony
I want to write SQL that updates the HOUSE.Search_Text column with text from INHABITANT. Because this is a 1-to-many relationship, the SQL needs to collate the data in INHABITANT for each matching row in HOUSE, combine it (comma separated), and update the Search_Text field.
Once done, the Oracle Text search index on HOUSE.Search_Text will return me HOUSEs that match the search criteria, and I can look up INHABITANTs accordingly.
Of course, this is a very simplified example, I want to pick up data from many columns and Full Text Search across fields in both tables.
With the help of a colleague we've got:
select id, ADDRESS||'; '||Names||'; '||Nicknames as Search_Text
from house left join(
SELECT distinct house_id,
LISTAGG(NAME, ', ') WITHIN GROUP (ORDER BY NAME) OVER (PARTITION BY house_id) as Names,
LISTAGG(NICKNAME, ', ') WITHIN GROUP (ORDER BY NICKNAME) OVER (PARTITION BY house_id) as Nicknames
FROM INHABITANT)
i on house.id = i.house_id;
which returns:
1 44 Some Road; Jane Doe, John Doe; JD, Janey
2 31 Letsby Avenue; Jo Smythe, Percy Plum; PC, Smithy
3 18 Moon Crescent; Apollo Lander; Moony
Some questions:
Is this an efficient query to return this data? I'm slightly concerned about the distinct.
Is this the right way to use Oracle Text search across multiple text fields?
How to update House.Search_Text with the results above? I think I need a correlated subquery, but can't quite work it out.
Would it be more efficient to create a new table containing House_ID and Search_Text only, rather than update House?