Update statement to set a column based on the maximum row of another table - sql

I have a Family table:
SELECT * FROM Family;
id | Surname | Oldest | Oldest_Age
---+----------+--------+-------
1 | Byre | NULL | NULL
2 | Summers | NULL | NULL
3 | White | NULL | NULL
4 | Anders | NULL | NULL
The Family.Oldest column is not yet populated. There is another table of Children:
SELECT * FROM Children;
id | Name | Age | Family_FK
---+----------+------+--------
1 | Jake | 8 | 1
2 | Martin | 7 | 2
3 | Sarah | 10 | 1
4 | Tracy | 12 | 3
where many children (or no children) can be associated with one family. I would like to populate the Oldest and Oldest_Age columns using an UPDATE ... SET ... statement that sets them to the name and age of the oldest child in each family. Finding the name of each oldest child is a problem that is solved quite well here: How can I SELECT rows with MAX(Column value), DISTINCT by another column in SQL?
However, I don't know how to use the result of this in an UPDATE statement to update the column of an associated table using the h2 database.

The following is ANSI-SQL syntax that solves this problem:
update family
    set oldest = (select name
                  from children c
                  where c.family_fk = family.id
                  order by age desc
                  fetch first 1 row only
                 );
In h2, I think you would use limit 1 instead of fetch first 1 row only.
EDIT:
For two columns -- alas -- the solution is two subqueries:
update family
    set oldest = (select name
                  from children c
                  where c.family_fk = family.id
                  order by age desc
                  limit 1
                 ),
        oldest_age = (select age
                      from children c
                      where c.family_fk = family.id
                      order by age desc
                      limit 1
                     );
Some databases (such as SQL Server, Postgres, and Oracle) support lateral joins that can help with this. row_number() can also help solve this problem. Unfortunately, H2 doesn't support this functionality.
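The two-subquery update can be tried end to end with a small sketch. It uses Python's built-in sqlite3 module, with SQLite standing in for H2 purely so the example is runnable (an assumption, not part of the original answer). The tables and rows are the ones from the question; the extra `c.id` sort key that breaks age ties is also an addition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Family (id INTEGER PRIMARY KEY, Surname TEXT,
                         Oldest TEXT, Oldest_Age INTEGER);
    CREATE TABLE Children (id INTEGER PRIMARY KEY, Name TEXT,
                           Age INTEGER, Family_FK INTEGER);
    INSERT INTO Family (id, Surname) VALUES
        (1, 'Byre'), (2, 'Summers'), (3, 'White'), (4, 'Anders');
    INSERT INTO Children VALUES
        (1, 'Jake', 8, 1), (2, 'Martin', 7, 2),
        (3, 'Sarah', 10, 1), (4, 'Tracy', 12, 3);
""")

# One correlated subquery per target column; each picks the oldest child
# of the family row currently being updated.
conn.execute("""
    UPDATE Family
    SET Oldest = (SELECT c.Name FROM Children c
                  WHERE c.Family_FK = Family.id
                  ORDER BY c.Age DESC, c.id LIMIT 1),
        Oldest_Age = (SELECT c.Age FROM Children c
                      WHERE c.Family_FK = Family.id
                      ORDER BY c.Age DESC, c.id LIMIT 1)
""")

rows = conn.execute(
    "SELECT id, Surname, Oldest, Oldest_Age FROM Family ORDER BY id"
).fetchall()
for row in rows:
    print(row)
```

A family without children (Anders here) keeps NULL in both columns, because the scalar subqueries return no row.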

Related

In a query (no editing of tables) how do I join data without any similarities?

I have a query that returns a table; here's an example.
Name |Age |Hair |Happy | Sad |
Jon | 15 | Black |NULL | NULL|
Kyle | 18 |Blonde |YES |NULL |
Brad | 17 | Blue |NULL |YES |
Name and age come from one table in a database, hair color comes from a second which is joined, and happy and sad come from a third table. My goal would be to make the first line of the chart like this:
Name |Age |Hair |Happy |Sad |
Jon | 15 |Black |Yes |Yes |
Basically I want to get rid of the rows under the first and get the non NULL data joined to the right. The problem is that there is no column where the Yes values are on the Jon row, so I have no idea how to get them there. Any suggestions?
PS. With the data I am using I can't just put a 'YES' in the 'Jon' row and call it a day, I would need to find the specific value from the lower rows and somehow get that value in the boxes that are NULL.
Do you just want COALESCE()?
COALESCE(Happy, 'Yes') as happy
COALESCE() replaces a NULL value with another value.
If you want to join on a NULL value, work with nested selects. The inner select produces an id for NULLs, and the outer select joins on that id:
select COALESCE(x.Happy, yn_table.description) as happy, ...
from
    (select
        t1.Happy,
        CASE WHEN t1.Happy is null THEN 1 END as happy_id
    from t1 ...) x
left join yn_table
    on x.happy_id = yn_table.id
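If you apply an ORDER BY to the query, you can then select the first row relative to this order by wrapping the ordered query in a subquery and filtering it with WHERE rownum = 1 (Oracle assigns rownum before sorting, so the filter has to sit outside the ordered query). If you don't apply an ORDER BY, the order is not guaranteed.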
After reading your new comment...
the sense is that in my real data the yes under the other names will be a number of a piece of equipment. I want the numbers of the equipment in one row instead of having like 8 rows with only 4 ' yes' values and the rest null.
... I come to the conclusion that this is an XY problem.
You are asking about a detail you think will solve your problem, instead of explaining the problem and asking how to solve it.
If you want to store several pieces of equipment per person, you need three tables.
You need a Person table, an Article table and a junction table relating articles to persons to equip them. Let's call this table Equipment.
Person
------
PersonId (Primary Key)
Name
optional attributes like age, hair color
Article
-------
ArticleId (Primary Key)
Description
optional attributes like weight, color etc.
Equipment
---------
PersonId (Primary Key, Foreign Key to table Person)
ArticleId (Primary Key, Foreign Key to table Article)
Quantity (optional, if each person can have only one of each article, we don't need this)
Let's say we have
Person: PersonId | Name
1 | Jon
2 | Kyle
3 | Brad
Article: ArticleId | Description
1 | Hat
2 | Bottle
3 | Bag
4 | Camera
5 | Shoes
Equipment: PersonId | ArticleId | Quantity
1 | 1 | 1
1 | 4 | 1
1 | 5 | 1
2 | 3 | 2
2 | 4 | 1
Now Jon has a hat, a camera and shoes. Kyle has 2 bags and one camera. Brad has nothing.
You can query the persons and their equipment like this
SELECT
p.PersonId, p.Name, a.ArticleId, a.Description AS Equipment, e.Quantity
FROM
Person p
LEFT JOIN Equipment e
ON p.PersonId = e.PersonId
LEFT JOIN Article a
ON e.ArticleId = a.ArticleId
ORDER BY p.Name, a.Description
The result will be
PersonId | Name | ArticleId | Equipment | Quantity
---------+------+-----------+-----------+---------
3 | Brad | NULL | NULL | NULL
1 | Jon | 4 | Camera | 1
1 | Jon | 1 | Hat | 1
1 | Jon | 5 | Shoes | 1
2 | Kyle | 3 | Bag | 2
2 | Kyle | 4 | Camera | 1
See example: http://sqlfiddle.com/#!4/7e05d/2/0
Since you tagged the question with the oracle tag, you could just use NVL(), which allows you to specify a value that would replace a NULL value in the column you select from.
Assuming that you want the 1st row because it contains the smallest age:
- wrap your query inside a CTE
- in another CTE get the 1st row of the query
- in another CTE get the max values of Happy and Sad of your query (for your sample data they both are 'YES')
- cross join the last 2 CTEs.
with
cte as (
<your query here>
),
firstrow as (
select name, age, hair from cte
order by age
fetch first row only
),
maxs as (
select max(happy) happy, max(sad) sad
from cte
)
select f.*, m.*
from firstrow f cross join maxs m
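As a rough check of the CTE approach, here is a sketch using Python's built-in sqlite3 module. The `result` table is a hypothetical stand-in for "<your query here>", filled with the sample rows from the question, and `LIMIT 1` replaces `fetch first row only` because SQLite does not accept that clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for the result of the asker's query.
    CREATE TABLE result (name TEXT, age INTEGER, hair TEXT, happy TEXT, sad TEXT);
    INSERT INTO result VALUES
        ('Jon',  15, 'Black',  NULL,  NULL),
        ('Kyle', 18, 'Blonde', 'YES', NULL),
        ('Brad', 17, 'Blue',   NULL,  'YES');
""")

merged = conn.execute("""
    WITH cte AS (SELECT * FROM result),
    firstrow AS (SELECT name, age, hair FROM cte ORDER BY age LIMIT 1),
    maxs AS (SELECT MAX(happy) AS happy, MAX(sad) AS sad FROM cte)
    SELECT f.name, f.age, f.hair, m.happy, m.sad
    FROM firstrow f CROSS JOIN maxs m
""").fetchall()
print(merged)
```

MAX() skips NULLs, which is what lets the single non-NULL value in each column surface on the first row.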
You can try this:
SELECT A.Name,
A.Age,
B.Hair,
C.Happy,
C.Sad
FROM A
INNER JOIN B
ON A.Name = B.Name
INNER JOIN C
ON A.Name = C.Name
(Assuming that Name is the key column in all three tables)

Counting the total number of rows with SELECT DISTINCT ON without using a subquery

I am performing some queries using PostgreSQL's SELECT DISTINCT ON syntax. I would like the query to return the total number of rows alongside every result row.
Assume I have a table my_table like the following:
CREATE TABLE my_table(
id int,
my_field text,
id_reference bigint
);
I then have a couple of values:
id | my_field | id_reference
----+----------+--------------
1 | a | 1
1 | b | 2
2 | a | 3
2 | c | 4
3 | x | 5
Basically my_table contains some versioned data. The id_reference is a reference to a global version of the database. Every change to the database will increase the global version number and changes will always add new rows to the tables (instead of updating/deleting values) and they will insert the new version number.
My goal is to perform a query that will only retrieve the latest values in the table, alongside the total number of rows.
For example, in the above case I would like to retrieve the following output:
| total | id | my_field | id_reference |
+-------+----+----------+--------------+
| 3 | 1 | b | 2 |
+-------+----+----------+--------------+
| 3 | 2 | c | 4 |
+-------+----+----------+--------------+
| 3 | 3 | x | 5 |
+-------+----+----------+--------------+
My attempt is the following:
select distinct on (id)
count(*) over () as total,
*
from my_table
order by id, id_reference desc
This returns almost the correct output, except that total is the number of rows in my_table instead of being the number of rows of the resulting query:
total | id | my_field | id_reference
-------+----+----------+--------------
5 | 1 | b | 2
5 | 2 | c | 4
5 | 3 | x | 5
(3 rows)
As you can see it has 5 instead of the expected 3.
I can fix this by using a subquery (here a CTE) and applying the count to its result:
with my_values as (
select distinct on (id)
*
from my_table
order by id, id_reference desc
)
select count(*) over (), * from my_values
Which produces my expected output.
My question: is there a way to avoid using this subquery and have something similar to count(*) over () return the result I want?
You are looking at my_table in three ways:
- to find the latest id_reference for each id
- to find my_field for the latest id_reference for each id
- to count the distinct number of ids in the table
I therefore prefer this solution:
select
c.id_count as total,
a.id,
a.my_field,
b.max_id_reference
from
my_table a
join
(
select
id,
max(id_reference) as max_id_reference
from
my_table
group by
id
) b
on
a.id = b.id and
a.id_reference = b.max_id_reference
join
(
select
count(distinct id) as id_count
from
my_table
) c
on true;
This is a bit longer (especially in the long, thin way I write SQL), but it makes clear what is happening. If you come back to it in a few months' time (somebody usually does), it will take less time to understand what is going on.
The "on true" at the end is a deliberate cartesian product because there can only ever be exactly one result from the subquery "c" and you do want a cartesian product with that.
There is nothing necessarily wrong with subqueries.
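For anyone who wants to see the join-based query produce the expected total of 3, here is a sketch with Python's built-in sqlite3 module and the sample rows from the question. SQLite is only a stand-in for PostgreSQL here, and `CROSS JOIN` is used in place of `join ... on true`; both are assumptions made for portability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (id INTEGER, my_field TEXT, id_reference INTEGER);
    INSERT INTO my_table VALUES
        (1, 'a', 1), (1, 'b', 2), (2, 'a', 3), (2, 'c', 4), (3, 'x', 5);
""")

latest = conn.execute("""
    SELECT c.id_count AS total, a.id, a.my_field, b.max_id_reference
    FROM my_table a
    JOIN (SELECT id, MAX(id_reference) AS max_id_reference
          FROM my_table
          GROUP BY id) b
      ON a.id = b.id AND a.id_reference = b.max_id_reference
    -- c always has exactly one row, so the cross join only adds a column
    CROSS JOIN (SELECT COUNT(DISTINCT id) AS id_count FROM my_table) c
    ORDER BY a.id
""").fetchall()
for row in latest:
    print(row)
```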

CTE to represent a logical table for the rows in a table which have the max value in one column

I have an "insert only" database, wherein records aren't physically updated, but rather logically updated by adding a new record, with a CRUD value, carrying a larger sequence. In this case, the "seq" (sequence) column is more in line with what you may consider a primary key, but the "id" is the logical identifier for the record.
This is the physical representation of the table:
seq | id | name | CRUD |
----|-----|--------|------|
1 | 10 | john | C |
2 | 10 | joe | U |
3 | 11 | kent | C |
4 | 12 | katie | C |
5 | 12 | sue | U |
6 | 13 | jill | C |
7 | 14 | bill | C |
This is the logical representation of the table, considering the "most recent" records:
seq | id | name | CRUD |
----|-----|--------|------|
2 | 10 | joe | U |
3 | 11 | kent | C |
5 | 12 | sue | U |
6 | 13 | jill | C |
7 | 14 | bill | C |
In order to, for instance, retrieve the most recent record for the person with id=12, I would currently do something like this:
SELECT
*
FROM
PEOPLE P
WHERE
P.ID = 12
AND
P.SEQ = (
SELECT
MAX(P1.SEQ)
FROM
PEOPLE P1
WHERE P1.ID = P.ID
)
...and I would receive this row:
seq | id | name | CRUD |
----|-----|--------|------|
5 | 12 | sue | U |
What I'd rather do is something like this:
WITH
NEW_P
AS
(
--CTE representing all of the most recent records
--i.e. for any given id, the most recent sequence
)
SELECT
*
FROM
NEW_P P2
WHERE
P2.ID = 12
The first SQL example using the subquery already works for us.
Question: How can I leverage a CTE to simplify our predicates when I need the "most recent" logical view of the table? In essence, I don't want to inline a subquery every single time I want to get at the most recent record. I'd rather define a CTE and use it in any subsequent predicate.
P.S. While I'm currently using DB2, I'm looking for a solution that is database agnostic.
This is a clear case for window (or OLAP) functions, which are supported by all modern SQL databases. For example:
WITH
ORD_P
AS
(
SELECT p.*, ROW_NUMBER() OVER ( PARTITION BY id ORDER BY seq DESC) rn
FROM people p
)
,
NEW_P
AS
(
SELECT * from ORD_P
WHERE rn = 1
)
SELECT
*
FROM
NEW_P P2
WHERE
P2.ID = 12
PS. Not tested. You may need to explicitly list all columns in the CTE clauses.
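Since the answer is marked as untested, here is a sketch that runs the same ROW_NUMBER() idea against the sample rows with Python's built-in sqlite3 module. SQLite stands in for DB2 (an assumption), it needs to be version 3.25 or newer for window functions, and the columns are listed explicitly in the second CTE so the helper column `rn` does not leak into the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (seq INTEGER PRIMARY KEY, id INTEGER, name TEXT, crud TEXT);
    INSERT INTO people VALUES
        (1, 10, 'john', 'C'), (2, 10, 'joe', 'U'), (3, 11, 'kent', 'C'),
        (4, 12, 'katie', 'C'), (5, 12, 'sue', 'U'), (6, 13, 'jill', 'C'),
        (7, 14, 'bill', 'C');
""")

latest_12 = conn.execute("""
    WITH ord_p AS (
        -- rn = 1 marks the row with the highest seq within each id
        SELECT p.*, ROW_NUMBER() OVER (PARTITION BY id ORDER BY seq DESC) AS rn
        FROM people p
    ),
    new_p AS (
        SELECT seq, id, name, crud FROM ord_p WHERE rn = 1
    )
    SELECT * FROM new_p WHERE id = 12
""").fetchall()
print(latest_12)
```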
I guess you already put it together. First find the max seq associated with each id, then use that to join back to the main table:
WITH newp AS (
SELECT id, MAX(seq) AS latestseq
FROM people
GROUP BY id
)
SELECT p.*
FROM people p
JOIN newp n ON (n.latestseq = p.seq)
ORDER BY p.id
What you originally had would work, or moving the CTE into the "from" clause. Maybe you want to use a timestamp field rather than a sequence number for the ordering?
Following up from @Glenn's answer, here is an updated query which meets my original goal and is on par with @mustaccio's answer, but I'm still not sure what the performance (and other) implications of this approach versus the other are.
WITH
LATEST_PERSON_SEQS AS
(
SELECT
ID,
MAX(SEQ) AS LATEST_SEQ
FROM
PEOPLE
GROUP BY
ID
)
,
LATEST_PERSON AS
(
SELECT
P.*
FROM
PEOPLE P
JOIN
LATEST_PERSON_SEQS L
ON
(
L.LATEST_SEQ = P.SEQ)
)
SELECT
*
FROM
LATEST_PERSON L2
WHERE
L2.ID = 12
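The MAX(seq)-per-id variant can be checked the same way. This sketch uses Python's built-in sqlite3 module (SQLite standing in for DB2, an assumption) and the sample rows from the question, and it relies on seq being unique across the whole table, which the question implies but does not state outright.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (seq INTEGER PRIMARY KEY, id INTEGER, name TEXT, crud TEXT);
    INSERT INTO people VALUES
        (1, 10, 'john', 'C'), (2, 10, 'joe', 'U'), (3, 11, 'kent', 'C'),
        (4, 12, 'katie', 'C'), (5, 12, 'sue', 'U'), (6, 13, 'jill', 'C'),
        (7, 14, 'bill', 'C');
""")

logical_table = conn.execute("""
    WITH latest_seqs AS (
        SELECT id, MAX(seq) AS latest_seq
        FROM people
        GROUP BY id
    ),
    latest_people AS (
        SELECT p.*
        FROM people p
        JOIN latest_seqs l ON l.latest_seq = p.seq
    )
    SELECT seq, id, name, crud FROM latest_people ORDER BY seq
""").fetchall()
for row in logical_table:
    print(row)
```

The output matches the logical representation given in the question; adding `WHERE id = 12` to the final SELECT narrows it to the single "sue" row.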

Matching algorithm in SQL

I have the following table in my database.
# select * FROM matches;
name | prop | rank
------+------+-------
carl | 1 | 4
carl | 2 | 3
carl | 3 | 9
alex | 1 | 8
alex | 2 | 5
alex | 3 | 6
alex | 3 | 8
alex | 2 | 11
anna | 3 | 8
anna | 3 | 13
anna | 2 | 14
(11 rows)
Each person is ranked at work by different properties/criteria called 'prop', and the performance is called 'rank'. The table contains multiple values of (name, prop), as the example shows. I want to get the best candidate according to some requirements. E.g. I need a candidate that has (prop=1 AND rank > 5) and (prop=3 AND rank >= 8). Then we must be able to sort the candidates by their rankings to get the best candidate.
EDIT: Each person must fulfill ALL requirements
How can I do this in SQL?
select x.name, max(x.rank)
from matches x
join (
select name from matches where prop = 1 AND rank > 5
intersect
select name from matches where prop = 3 AND rank >= 8
) y
on x.name = y.name
group by x.name
order by max(rank);
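To see what the INTERSECT query returns for the sample data, here is a sketch with Python's built-in sqlite3 module. SQLite is a stand-in for the asker's database (an assumption), and the `best_rank` alias is added only for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE matches (name TEXT, prop INTEGER, rank INTEGER);
    INSERT INTO matches VALUES
        ('carl', 1, 4), ('carl', 2, 3), ('carl', 3, 9),
        ('alex', 1, 8), ('alex', 2, 5), ('alex', 3, 6),
        ('alex', 3, 8), ('alex', 2, 11),
        ('anna', 3, 8), ('anna', 3, 13), ('anna', 2, 14);
""")

candidates = conn.execute("""
    SELECT x.name, MAX(x.rank) AS best_rank
    FROM matches x
    JOIN (
        -- names that satisfy every requirement
        SELECT name FROM matches WHERE prop = 1 AND rank > 5
        INTERSECT
        SELECT name FROM matches WHERE prop = 3 AND rank >= 8
    ) y ON x.name = y.name
    GROUP BY x.name
    ORDER BY best_rank
""").fetchall()
print(candidates)
```

Only alex satisfies both requirements; note that the reported maximum (11) comes from a prop=2 row, because the outer query looks at all of that person's ranks.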
Filtering the data to match your criteria here is quite simple (as shown by both Amir and sternze):
SELECT *
FROM matches
WHERE (prop=1 AND rank>5) OR (prop=3 AND rank>=8)
The problem is how to aggregate this data so as to have just one row per candidate.
I suggest you do something like this:
SELECT m.name,
       MAX(m.DeltaRank1) AS MaxDeltaRank1,
       MAX(m.DeltaRank3) AS MaxDeltaRank3
FROM (
    SELECT name,
           (CASE WHEN prop=1 THEN rank-5 END) AS DeltaRank1,  -- NULL for other props
           (CASE WHEN prop=3 THEN rank-8 END) AS DeltaRank3
    FROM matches
) m
GROUP BY m.name
HAVING MAX(m.DeltaRank1)>0 AND MAX(m.DeltaRank3)>=0
ORDER BY MAX(m.DeltaRank1)+MAX(m.DeltaRank3) DESC;
This will order the candidates by the sum of how much they exceeded the target rank in prop1 and prop3. You could use different logic to indicate which is best though.
In the case above, this should be the result:
name | MaxDeltaRank1 | MaxDeltaRank3
------+---------------+--------------
alex | 3 | 0
... because neither anna nor carl reach both the required ranks.
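One way to reproduce the result shown above is the following sketch, run with Python's built-in sqlite3 module on the sample data. It assumes the deltas are rank-5 for prop 1 (so that "> 0" means rank > 5) and rank-8 for prop 3 (so that ">= 0" means rank >= 8), and it leaves rows for other props as NULL so they cannot satisfy a requirement by accident; the short aliases `d1` and `d3` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE matches (name TEXT, prop INTEGER, rank INTEGER);
    INSERT INTO matches VALUES
        ('carl', 1, 4), ('carl', 2, 3), ('carl', 3, 9),
        ('alex', 1, 8), ('alex', 2, 5), ('alex', 3, 6),
        ('alex', 3, 8), ('alex', 2, 11),
        ('anna', 3, 8), ('anna', 3, 13), ('anna', 2, 14);
""")

qualified = conn.execute("""
    SELECT name, d1, d3
    FROM (
        SELECT name,
               MAX(CASE WHEN prop = 1 THEN rank - 5 END) AS d1,
               MAX(CASE WHEN prop = 3 THEN rank - 8 END) AS d3
        FROM matches
        GROUP BY name
    ) AS per_person
    WHERE d1 > 0 AND d3 >= 0   -- a NULL delta (prop missing) fails the test
    ORDER BY d1 + d3 DESC
""").fetchall()
print(qualified)
```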
A typical case of relational division. We assembled a whole arsenal of techniques under this related question:
How to filter SQL results in a has-many-through relation
Assuming you want the minimum rank of a person, I might solve your particular case with LEAST():
SELECT m1.name, LEAST(m1.rank, m2.rank, ...) AS best_rank
FROM matches m1
JOIN matches m2 USING (name)
...
WHERE m1.prop = 1 AND m1.rank > 5
AND m2.prop = 3 AND m2.rank >= 8
...
ORDER BY best_rank;
Also assuming name to be unique per individual person. You'd probably use some kind of foreign key to a pk column of a person table in reality.
And if you have such a person table like you should, the best rank would be stored in a column there ...
If I understand your question, then you just need to execute the following query:
SELECT * FROM matches where (prop = 1 AND rank > 5) OR (prop = 3 AND rank >= 8) ORDER BY rank
It gives you the candidates that either have prop=1 and rank > 5 or prop=3 and rank >= 8, sorted by their rankings.

SQL Query to select bottom 2 from each category

In MySQL, I want to select the bottom 2 items from each category:
Category Value
1 1.3
1 4.8
1 3.7
1 1.6
2 9.5
2 9.9
2 9.2
2 10.3
3 4
3 8
3 16
Giving me:
Category Value
1 1.3
1 1.6
2 9.5
2 9.2
3 4
3 8
Before I migrated from sqlite3 I had to first select a lowest from each category, then excluding anything that joined to that, I had to again select the lowest from each category. Then anything equal to that new lowest or less in a category won. This would also pick more than 2 in case of a tie, which was annoying... It also had a really long runtime.
My ultimate goal is to count the number of times an individual is in one of the lowest 2 of a category (there is also a name field) and this is the one part I don't know how to do.
Thanks
SELECT c1.category, c1.value
FROM catvals c1
LEFT OUTER JOIN catvals c2
ON (c1.category = c2.category AND c1.value > c2.value)
GROUP BY c1.category, c1.value
HAVING COUNT(*) < 2;
Tested on MySQL 5.1.41 with your test data. Output:
+----------+-------+
| category | value |
+----------+-------+
| 1 | 1.30 |
| 1 | 1.60 |
| 2 | 9.20 |
| 2 | 9.50 |
| 3 | 4.00 |
| 3 | 8.00 |
+----------+-------+
(The extra decimal places are because I declared the value column as NUMERIC(9,2).)
Like other solutions, this produces more than 2 rows per category if there are ties. There are ways to construct the join condition to resolve that, but we'd need to use a primary key or unique key in your table, and we'd also have to know how you intend ties to be resolved.
You could try this:
SELECT * FROM (
SELECT c.*,
(SELECT COUNT(*)
FROM user_category c2
WHERE c2.category = c.category
AND c2.value < c.value) cnt
FROM user_category c ) uc
WHERE cnt < 2
It should give you the desired results, but check if performance is ok.
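A quick way to check the correlated-count query against the sample data is this sketch with Python's built-in sqlite3 module. SQLite stands in for MySQL (an assumption), and the final ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_category (category INTEGER, value REAL);
    INSERT INTO user_category VALUES
        (1, 1.3), (1, 4.8), (1, 3.7), (1, 1.6),
        (2, 9.5), (2, 9.9), (2, 9.2), (2, 10.3),
        (3, 4), (3, 8), (3, 16);
""")

bottom_two = conn.execute("""
    SELECT category, value
    FROM (
        SELECT c.*,
               -- how many rows in the same category have a smaller value
               (SELECT COUNT(*)
                FROM user_category c2
                WHERE c2.category = c.category
                  AND c2.value < c.value) AS cnt
        FROM user_category c
    ) uc
    WHERE cnt < 2
    ORDER BY category, value
""").fetchall()
for row in bottom_two:
    print(row)
```

Because the count only looks at strictly smaller values, tied rows share the same count, so a tie around the second-lowest value returns more than two rows for that category.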
Here's a solution that handles duplicates properly. Table name is 'zzz' and columns are int and float
select
smallest.category category, min(smallest.value) value
from
zzz smallest
group by smallest.category
union
select
second_smallest.category category, min(second_smallest.value) value
from
zzz second_smallest
where
concat(second_smallest.category,'x',second_smallest.value)
not in ( -- recreate the results from the first half of the union
select concat(c.category,'x',min(c.value))
from zzz c
group by c.category
)
group by second_smallest.category
order by category
Caveats:
If there is only one value for a given category, then only that single entry is returned.
If there was a unique recordID for each row you wouldn't need all the concats to simulate a unique key.
Your mileage may vary,
--Mark
A union should work. I'm not sure of the performance compared to Peter's solution.
SELECT smallest.category, MIN(smallest.value)
FROM categories smallest
GROUP BY smallest.category
UNION
SELECT second_smallest.category, MIN(second_smallest.value)
FROM categories second_smallest
WHERE second_smallest.value > (SELECT MIN(smallest.value) FROM categories smallest WHERE smallest.category = second_smallest.category)
GROUP BY second_smallest.category
Here is a very generalized solution, that would work for selecting first n rows for each Category. This will work even if there are duplicates in value.
/* creating temporary variables */
mysql> set @cnt = 0;
mysql> set @trk = 0;
/* query */
mysql> select Category, Value
from (select *,
@cnt:=if(@trk = Category, @cnt+1, 0) cnt,
@trk:=Category
from user_categories
order by Category, Value ) c1
where c1.cnt < 2;
Here is the result.
+----------+-------+
| Category | Value |
+----------+-------+
| 1 | 1.3 |
| 1 | 1.6 |
| 2 | 9.2 |
| 2 | 9.5 |
| 3 | 4 |
| 3 | 8 |
+----------+-------+
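This is tested on MySQL 5.0.88.
Note that the initial value of the @trk variable must not equal the smallest value of the Category field; otherwise the first row of that category would start with a count of 1 instead of 0.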