SQL (sqlite) compare sums of rows grouped by another repeating row - sql

I have a table like:
|-----|-------|---------------|
| day | name  | trees_planted |
|-----|-------|---------------|
| 1   | alice | 3             |
| 2   | alice | 4             |
| 1   | bob   | 2             |
| 2   | bob   | 4             |
|-----|-------|---------------|
I'm using SELECT name, SUM(trees_planted) FROM year2016 GROUP BY name to get:
name | trees_planted
alice | 7
bob | 6
I also have another table for 2015, and I want to compare each person's total with the previous year. For example, if Alice planted more trees in 2016 than in 2015, I'd get a result like this:
name | tree_difference
alice | -2 (if she planted 5 trees the previous year: 5 - 7 = -2)
bob | 0 (planted the same number of trees last year)

You could use a sub-query to get the records from both 2016 and 2015, but negate the values from 2016. Then group and sum like you already did:
SELECT name,
SUM(trees_planted) AS tree_difference
FROM (SELECT name, trees_planted
FROM year2015
UNION ALL
SELECT name, -trees_planted
FROM year2016
) AS years
GROUP BY name
This will also work for cases where a number is only given in one of the two years.
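Here is a minimal sketch of the query above run against an in-memory SQLite database from Python. The 2015 figures and the extra name carol (who appears only in 2015) are made up for illustration; only the 2016 rows come from the question.

```python
import sqlite3

# In-memory database with hypothetical sample data.
# The 2015 rows and "carol" are assumptions made for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE year2015 (day INTEGER, name TEXT, trees_planted INTEGER);
    CREATE TABLE year2016 (day INTEGER, name TEXT, trees_planted INTEGER);
    INSERT INTO year2015 VALUES (1, 'alice', 5), (1, 'bob', 6), (1, 'carol', 1);
    INSERT INTO year2016 VALUES (1, 'alice', 3), (2, 'alice', 4),
                                (1, 'bob', 2),   (2, 'bob', 4);
""")

# 2015 values count positively, 2016 values negatively,
# so the sum per name is (2015 total) - (2016 total).
rows = conn.execute("""
    SELECT name, SUM(trees_planted) AS tree_difference
    FROM (SELECT name, trees_planted FROM year2015
          UNION ALL
          SELECT name, -trees_planted FROM year2016) AS years
    GROUP BY name
    ORDER BY name
""").fetchall()

for name, diff in rows:
    print(name, diff)
```

carol has no 2016 rows, so her difference is simply her 2015 total.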

Assuming you can join on the name field, you can do:
select a.name, a.tp, b.tp, a.tp - b.tp
from
(
(select name, SUM(trees_planted) tp from year2016 group by name) a
inner join
(select name, SUM(trees_planted) tp from year2015 group by name) b
using(name)
)
If you can't join on name (because the set of names differs between 2015 and 2016), it is easy to add the missing rows by using a couple of UNION clauses.
Here's a link with artificial data to SQLFIDDLE to try the query.
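One possible way to write the UNION variant, sketched here with Python's sqlite3 module. The sample rows (including carol and dave) are hypothetical, the join uses ON instead of USING, and the difference is computed as 2015 minus 2016 to match the sign in the question. Treat it as an illustration, not the exact query the answer had in mind.

```python
import sqlite3

# Hypothetical data: carol only planted in 2015, dave only in 2016.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE year2015 (day INTEGER, name TEXT, trees_planted INTEGER);
    CREATE TABLE year2016 (day INTEGER, name TEXT, trees_planted INTEGER);
    INSERT INTO year2015 VALUES (1, 'alice', 5), (1, 'carol', 1);
    INSERT INTO year2016 VALUES (1, 'alice', 3), (2, 'alice', 4), (1, 'dave', 2);
""")

rows = conn.execute("""
    WITH a AS (SELECT name, SUM(trees_planted) AS tp FROM year2016 GROUP BY name),
         b AS (SELECT name, SUM(trees_planted) AS tp FROM year2015 GROUP BY name)
    -- everyone with 2016 rows; a missing 2015 total counts as 0
    SELECT a.name AS name, COALESCE(b.tp, 0) - a.tp AS tree_difference
    FROM a LEFT JOIN b ON a.name = b.name
    UNION ALL
    -- everyone who appears only in 2015
    SELECT b.name, b.tp
    FROM b LEFT JOIN a ON a.name = b.name
    WHERE a.name IS NULL
    ORDER BY 1
""").fetchall()

for name, diff in rows:
    print(name, diff)
```

A plain inner join would return only alice here, because carol and dave each lack one of the two years.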

Related

In a query (no editing of tables) how do I join data without any similarities?

I have a query that returns a table; here is an example:
Name |Age |Hair |Happy | Sad |
Jon | 15 | Black |NULL | NULL|
Kyle | 18 |Blonde |YES |NULL |
Brad | 17 | Blue |NULL |YES |
Name and age come from one table in a database, hair color comes from a second table that is joined in, and happy and sad come from a third table. My goal is to make the first row of the result look like this:
Name |Age |Hair |Happy |Sad |
Jon | 15 |Black |Yes |Yes |
Basically I want to get rid of the rows under the first and get the non NULL data joined to the right. The problem is that there is no column where the Yes values are on the Jon row, so I have no idea how to get them there. Any suggestions?
PS. With the data I am using I can't just put a 'YES' in the 'Jon' row and call it a day, I would need to find the specific value from the lower rows and somehow get that value in the boxes that are NULL.
Do you just want COALESCE()?
COALESCE(Happy, 'Yes') as happy
COALESCE() replaces a NULL value with another value.
If you want to join on a NULL value, work with nested selects: the inner select produces an id for NULLs, and the outer select joins on it.
select COALESCE(x.Happy, yn_table.description) as happy, ...
from
(select
t1.Happy,
CASE WHEN t1.Happy is null THEN 1 END as happy_id
from t1 ...) x
left join yn_table
on x.happy_id = yn_table.id
If you apply an ORDER BY to the query, you can then select the first row relative to this order with WHERE rownum = 1. If you don't apply an ORDER BY, then the order is random.
After reading your new comment...
the sense is that in my real data the yes under the other names will be a number of a piece of equipment. I want the numbers of the equipment in one row instead of having like 8 rows with only 4 ' yes' values and the rest null.
... I come to the conclusion that this is an XY problem.
You are asking about a detail you think will solve your problem, instead of explaining the problem and asking how to solve it.
If you want to store several pieces of equipment per person, you need three tables.
You need a Person table, an Article table and a junction table relating articles to persons to equip them. Let's call this table Equipment.
Person
------
PersonId (Primary Key)
Name
optional attributes like age, hair color
Article
-------
ArticleId (Primary Key)
Description
optional attributes like weight, color etc.
Equipment
---------
PersonId (Primary Key, Foreign Key to table Person)
ArticleId (Primary Key, Foreign Key to table Article)
Quantity (optional, if each person can have only one of each article, we don't need this)
Let's say we have
Person: PersonId | Name
1 | Jon
2 | Kyle
3 | Brad
Article: ArticleId | Description
1 | Hat
2 | Bottle
3 | Bag
4 | Camera
5 | Shoes
Equipment: PersonId | ArticleId | Quantity
1 | 1 | 1
1 | 4 | 1
1 | 5 | 1
2 | 3 | 2
2 | 4 | 1
Now Jon has a hat, a camera and shoes. Kyle has 2 bags and one camera. Brad has nothing.
You can query the persons and their equipment like this
SELECT
p.PersonId, p.Name, a.ArticleId, a.Description AS Equipment, e.Quantity
FROM
Person p
LEFT JOIN Equipment e
ON p.PersonId = e.PersonId
LEFT JOIN Article a
ON e.ArticleId = a.ArticleId
ORDER BY p.Name, a.Description
The result will be
PersonId | Name | ArticleId | Equipment | Quantity
---------+------+-----------+-----------+---------
3 | Brad | NULL | NULL | NULL
1 | Jon | 4 | Camera | 1
1 | Jon | 1 | Hat | 1
1 | Jon | 5 | Shoes | 1
2 | Kyle | 3 | Bag | 2
2 | Kyle | 4 | Camera | 1
See example: http://sqlfiddle.com/#!4/7e05d/2/0
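If you want to try the three-table design without an Oracle instance, here is a rough sketch using Python's sqlite3 module with the sample rows from above. The last step, which collapses the result into one entry per person, is an addition of mine (done in Python, not in SQL) to show how the "one row per person" view from the question could be built on top of the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person  (PersonId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Article (ArticleId INTEGER PRIMARY KEY, Description TEXT);
    CREATE TABLE Equipment (
        PersonId  INTEGER REFERENCES Person(PersonId),
        ArticleId INTEGER REFERENCES Article(ArticleId),
        Quantity  INTEGER,
        PRIMARY KEY (PersonId, ArticleId)
    );
    INSERT INTO Person  VALUES (1, 'Jon'), (2, 'Kyle'), (3, 'Brad');
    INSERT INTO Article VALUES (1, 'Hat'), (2, 'Bottle'), (3, 'Bag'),
                               (4, 'Camera'), (5, 'Shoes');
    INSERT INTO Equipment VALUES (1, 1, 1), (1, 4, 1), (1, 5, 1),
                                 (2, 3, 2), (2, 4, 1);
""")

# Same query as above: LEFT JOINs keep persons without equipment (Brad).
rows = conn.execute("""
    SELECT p.PersonId, p.Name, a.ArticleId, a.Description AS Equipment, e.Quantity
    FROM Person p
    LEFT JOIN Equipment e ON p.PersonId = e.PersonId
    LEFT JOIN Article a   ON e.ArticleId = a.ArticleId
    ORDER BY p.Name, a.Description
""").fetchall()

# Collapse to one entry per person: name -> list of (description, quantity).
equipment_by_person = {}
for person_id, name, article_id, description, quantity in rows:
    items = equipment_by_person.setdefault(name, [])
    if description is not None:  # Brad has no equipment rows
        items.append((description, quantity))

print(equipment_by_person)
```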
Since you tagged the question with the oracle tag, you could just use NVL(), which allows you to specify a value that would replace a NULL value in the column you select from.
Assuming that you want the 1st row because it contains the smallest age:
- wrap your query inside a CTE
- in another CTE get the 1st row of the query
- in another CTE get the max values of Happy and Sad of your query (for your sample data they both are 'YES')
- cross join the last 2 CTEs.
with
cte as (
<your query here>
),
firstrow as (
select name, age, hair from cte
order by age
fetch first row only
),
maxs as (
select max(happy) happy, max(sad) sad
from cte
)
select f.*, m.*
from firstrow f cross join maxs m
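A quick way to check the idea is to run it on the sample rows from the question. This sketch uses SQLite through Python, so `fetch first row only` is replaced by `LIMIT 1`, and the table name query_result is a hypothetical stand-in for "your query here".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- query_result stands in for the output of the original query (assumed name)
    CREATE TABLE query_result (name TEXT, age INTEGER, hair TEXT, happy TEXT, sad TEXT);
    INSERT INTO query_result VALUES
        ('Jon',  15, 'Black',  NULL,  NULL),
        ('Kyle', 18, 'Blonde', 'YES', NULL),
        ('Brad', 17, 'Blue',   NULL,  'YES');
""")

row = conn.execute("""
    WITH cte AS (
        SELECT * FROM query_result
    ),
    firstrow AS (
        SELECT name, age, hair FROM cte
        ORDER BY age
        LIMIT 1                      -- SQLite spelling of FETCH FIRST ROW ONLY
    ),
    maxs AS (
        SELECT MAX(happy) AS happy, MAX(sad) AS sad   -- MAX ignores NULLs
        FROM cte
    )
    SELECT f.*, m.*
    FROM firstrow f CROSS JOIN maxs m
""").fetchone()

print(row)
```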
You can try this:
SELECT A.Name,
A.Age,
B.Hair,
C.Happy,
C.Sad
FROM A
INNER JOIN B
ON A.Name = B.Name
INNER JOIN C
ON A.Name = C.Name
(Assuming that Name is the key column in all three tables.)

Postgresql query using recursion and selfjoin

I have an assignment and I am having trouble with one question. Basically I have a table like the one below. Alex is a player, and the table shows the team he played with in every season. Note that a season starts in a specific year and ends in the following year. I need to use only SQL (no cursors) to produce the output illustrated in the second table, where Alex's career is shown in only two rows, as opposed to the first table, where it is shown in four rows.
I have tried hard to solve this question but cannot work out how to produce the output in the second table. I suspect that I have to use a CTE, since the Year_End of each row is equal to the Year_Start of the following row. I have also searched the net, but since this is a very specific question I cannot find any relevant solutions. I have posted my query so far, since I think I am on the right track, but now I'm stuck.
**TABLE records**
Id | Name | Team_Name | Year_Start | Year_End
---------------------------------------------------
100 | Alex | New Team | 2010 | 2011
101 | Alex | New Team | 2011 | 2012
102 | Alex | Best Eleven | 2012 | 2013
103 | Alex | Best Eleven | 2013 | 2014
**Required result from query**
Name | Team Name | Year_Start | Year_End
-------------------------------------------
Alex | New Team | 2010 | 2012
Alex | Best Eleven | 2012 | 2014
My query so far...
WITH RECURSIVE cte(id, name, team_name, year_start, year_end) AS
(
SELECT *
FROM history
WHERE name = 'Alex'
UNION ALL
SELECT history.id, history.name, history.team_name, history.year_start, history.year_end
FROM cte, history
WHERE cte.year_start = history.year_end
)
SELECT *
FROM cte;
Query that produced the requested result.
WITH RECURSIVE cte(id, name, team_name, year_start, year_end) AS
(
SELECT *
FROM history
WHERE name = 'Alex'
UNION ALL
SELECT history.id, history.name, history.team_name, history.year_start, history.year_end
FROM cte, history
WHERE cte.year_start = history.year_end
)
SELECT team_name, MIN(year_start), MAX(year_end)
FROM cte
GROUP BY team_name;
You can use the query below to get your desired result:
Select Distinct t.Name,t.Team_Name,b.YStart,b.YEnd
FROM t INNER JOIN(
Select Team_Name, Min(Year_Start) YStart,Max(Year_End) YEnd
FROM t
Group BY Team_Name ) b
ON t.Team_Name = b.Team_Name
select Name, Team_Name, min(Year_Start) startY, max(Year_End) endY
from t group by Name, Team_Name
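For anyone who wants to try the plain GROUP BY version, here is a small sketch using SQLite from Python with the rows from the question. The table name t follows the answers above, and the ORDER BY is added only so the output matches the order of the required result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (Id INTEGER, Name TEXT, Team_Name TEXT,
                    Year_Start INTEGER, Year_End INTEGER);
    INSERT INTO t VALUES
        (100, 'Alex', 'New Team',    2010, 2011),
        (101, 'Alex', 'New Team',    2011, 2012),
        (102, 'Alex', 'Best Eleven', 2012, 2013),
        (103, 'Alex', 'Best Eleven', 2013, 2014);
""")

# One row per (player, team): earliest start and latest end of all seasons.
rows = conn.execute("""
    SELECT Name, Team_Name, MIN(Year_Start) AS startY, MAX(Year_End) AS endY
    FROM t
    GROUP BY Name, Team_Name
    ORDER BY startY
""").fetchall()

for r in rows:
    print(r)
```

One caveat: if a player left a team and later returned to it, this grouping would merge both stints into a single row, so it only fits data like the sample where each team appears in one continuous stretch.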

CTE to represent a logical table for the rows in a table which have the max value in one column

I have an "insert only" database, wherein records aren't physically updated, but rather logically updated by adding a new record, with a CRUD value, carrying a larger sequence. In this case, the "seq" (sequence) column is more in line with what you may consider a primary key, but the "id" is the logical identifier for the record.
This is the physical representation of the table:
seq | id  | name   | CRUD |
----|-----|--------|------|
1 | 10 | john | C |
2 | 10 | joe | U |
3 | 11 | kent | C |
4 | 12 | katie | C |
5 | 12 | sue | U |
6 | 13 | jill | C |
7 | 14 | bill | C |
This is the logical representation of the table, considering the "most recent" records:
seq | id  | name   | CRUD |
----|-----|--------|------|
2 | 10 | joe | U |
3 | 11 | kent | C |
5 | 12 | sue | U |
6 | 13 | jill | C |
7 | 14 | bill | C |
In order to, for instance, retrieve the most recent record for the person with id=12, I would currently do something like this:
SELECT
*
FROM
PEOPLE P
WHERE
P.ID = 12
AND
P.SEQ = (
SELECT
MAX(P1.SEQ)
FROM
PEOPLE P1
WHERE P1.ID = 12
)
...and I would receive this row:
seq | id  | name   | CRUD |
----|-----|--------|------|
5 | 12 | sue | U |
What I'd rather do is something like this:
WITH
NEW_P
AS
(
--CTE representing all of the most recent records
--i.e. for any given id, the most recent sequence
)
SELECT
*
FROM
NEW_P P2
WHERE
P2.ID = 12
The first SQL example using the subquery already works for us.
Question: How can I leverage a CTE to simplify our predicates when needing to leverage the "most recent" logical view of the table? In essence, I don't want to inline a subquery every single time I want to get at the most recent record. I'd rather define a CTE and leverage that in any subsequent predicate.
P.S. While I'm currently using DB2, I'm looking for a solution that is database agnostic.
This is a clear case for window (or OLAP) functions, which are supported by all modern SQL databases. For example:
WITH
ORD_P
AS
(
SELECT p.*, ROW_NUMBER() OVER ( PARTITION BY id ORDER BY seq DESC) rn
FROM people p
)
,
NEW_P
AS
(
SELECT * from ORD_P
WHERE rn = 1
)
SELECT
*
FROM
NEW_P P2
WHERE
P2.ID = 12
PS. Not tested. You may need to explicitly list all columns in the CTE clauses.
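Since the query above is untested, here is a small sketch that runs the same idea on the sample rows, using SQLite through Python rather than DB2. It assumes SQLite 3.25 or newer (needed for window functions) and lists the columns explicitly in the second CTE so the helper column rn does not leak into the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (seq INTEGER PRIMARY KEY, id INTEGER, name TEXT, crud TEXT);
    INSERT INTO people VALUES
        (1, 10, 'john',  'C'), (2, 10, 'joe', 'U'),
        (3, 11, 'kent',  'C'),
        (4, 12, 'katie', 'C'), (5, 12, 'sue', 'U'),
        (6, 13, 'jill',  'C'),
        (7, 14, 'bill',  'C');
""")

rows = conn.execute("""
    WITH ord_p AS (
        -- number the versions of each id, newest (highest seq) first
        SELECT p.*, ROW_NUMBER() OVER (PARTITION BY id ORDER BY seq DESC) AS rn
        FROM people p
    ),
    new_p AS (
        -- keep only the newest version of each id
        SELECT seq, id, name, crud FROM ord_p WHERE rn = 1
    )
    SELECT * FROM new_p WHERE id = 12
""").fetchall()

print(rows)
```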
I guess you already put it together. First find the max seq associated with each id, then use that to join back to the main table:
WITH newp AS (
SELECT id, MAX(seq) AS latestseq
FROM people
GROUP BY id
)
SELECT p.*
FROM people p
JOIN newp n ON (n.latestseq = p.seq)
ORDER BY p.id
What you originally had would work, or moving the CTE into the "from" clause. Maybe you want to use a timestamp field rather than a sequence number for the ordering?
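To see what the MAX(seq) join returns, here is a sketch on the sample rows using SQLite from Python. The join on seq alone relies on seq being unique across the whole table, which the question implies but which I am assuming here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (seq INTEGER PRIMARY KEY, id INTEGER, name TEXT, crud TEXT);
    INSERT INTO people VALUES
        (1, 10, 'john',  'C'), (2, 10, 'joe', 'U'),
        (3, 11, 'kent',  'C'),
        (4, 12, 'katie', 'C'), (5, 12, 'sue', 'U'),
        (6, 13, 'jill',  'C'),
        (7, 14, 'bill',  'C');
""")

latest = conn.execute("""
    WITH newp AS (
        SELECT id, MAX(seq) AS latestseq   -- newest sequence per logical id
        FROM people
        GROUP BY id
    )
    SELECT p.*
    FROM people p
    JOIN newp n ON n.latestseq = p.seq     -- assumes seq is unique table-wide
    ORDER BY p.id
""").fetchall()

for row in latest:
    print(row)
```

The output is the logical view from the question: one row per id, each carrying its highest seq.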
Following up from @Glenn's answer, here is an updated query which meets my original goal and is on par with @mustaccio's answer, but I'm still not sure what the performance (and other) implications of this approach vs the other are.
WITH
LATEST_PERSON_SEQS AS
(
SELECT
ID,
MAX(SEQ) AS LATEST_SEQ
FROM
PERSON
GROUP BY
ID
)
,
LATEST_PERSON AS
(
SELECT
P.*
FROM
PERSON P
JOIN
LATEST_PERSON_SEQS L
ON
(
L.LATEST_SEQ = P.SEQ)
)
SELECT
*
FROM
LATEST_PERSON L2
WHERE
L2.ID = 12

Splitting a string column in BigQuery

Let's say I have a table in BigQuery containing 2 columns. The first column represents a name, and the second is a delimited list of values, of arbitrary length. Example:
Name | Scores
-----+-------
Bob |10;20;20
Sue |14;12;19;90
Joe |30;15
I want to transform into columns where the first is the name, and the second is a single score value, like so:
Name,Score
Bob,10
Bob,20
Bob,20
Sue,14
Sue,12
Sue,19
Sue,90
Joe,30
Joe,15
Can this be done in BigQuery alone?
Good news everyone! BigQuery can now SPLIT()!
Look at "find all two word phrases that appear in more than one row in a dataset".
There is no current way to split() a value in BigQuery to generate multiple rows from a string, but you could use a regular expression to look for the commas and find the first value. Then run a similar query to find the 2nd value, and so on. They can all be merged into only one query, using the pattern presented in the above example (UNION through commas).
Trying to rewrite Elad Ben Akoune's answer in Standard SQL, the query becomes:
WITH name_score AS (
SELECT Name, split(Scores,';') AS Score
FROM (
(SELECT * FROM (SELECT 'Bob' AS Name ,'10;20;20' AS Scores))
UNION ALL
(SELECT * FROM (SELECT 'Sue' AS Name ,'14;12;19;90' AS Scores))
UNION ALL
(SELECT * FROM (SELECT 'Joe' AS Name ,'30;15' AS Scores))
))
SELECT name, score
FROM name_score
CROSS JOIN UNNEST(name_score.score) AS score;
And this outputs:
+------+-------+
| name | score |
+------+-------+
| Bob | 10 |
| Bob | 20 |
| Bob | 20 |
| Sue | 14 |
| Sue | 12 |
| Sue | 19 |
| Sue | 90 |
| Joe | 30 |
| Joe | 15 |
+------+-------+
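If it helps to see what the split-then-unnest step does outside of BigQuery, here is the same transformation sketched in plain Python. It is only an illustration of the logic, not BigQuery code; as with SPLIT(), the scores stay strings unless you cast them.

```python
# Sample rows from the question: (name, delimited scores)
rows = [
    ("Bob", "10;20;20"),
    ("Sue", "14;12;19;90"),
    ("Joe", "30;15"),
]

# "SPLIT": turn each delimited string into a list.
# "CROSS JOIN UNNEST": emit one (name, score) row per list element.
flattened = [
    (name, score)
    for name, scores in rows
    for score in scores.split(";")
]

for name, score in flattened:
    print(f"{name},{score}")
```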
If someone is still looking for an answer:
select Name,split(Scores,';') as Score
from (
# replace the inner custom select with your source table
select *
from
(select 'Bob' as Name ,'10;20;20' as Scores),
(select 'Sue' as Name ,'14;12;19;90' as Scores),
(select 'Joe' as Name ,'30;15' as Scores)
);

MIN() Function in SQL

Need help with Min Function in SQL
I have a table as shown below.
+------------+-------+-------+
| Date_ | Name | Score |
+------------+-------+-------+
| 2012/07/05 | Jack | 1 |
| 2012/07/05 | Jones | 1 |
| 2012/07/06 | Jill | 2 |
| 2012/07/06 | James | 3 |
| 2012/07/07 | Hugo | 1 |
| 2012/07/07 | Jack | 1 |
| 2012/07/07 | Jim | 2 |
+------------+-------+-------+
I would like to get the output like below
+------------+------+-------+
| Date_ | Name | Score |
+------------+------+-------+
| 2012/07/05 | Jack | 1 |
| 2012/07/06 | Jill | 2 |
| 2012/07/07 | Hugo | 1 |
+------------+------+-------+
When I use the MIN() function with just the date and Score columns I get the lowest score for each date, which is what I want. I don't care which row is returned if there is a tie in the score for the same date. Trouble starts when I also want the name column in the output. I tried a few variations of SQL (e.g. MIN with a correlated subquery) but I had no luck getting the output shown above. Can anyone help please:)
Query is as follows
SELECT DISTINCT
A.USername, A.Date_, A.Score
FROM TestTable AS A
INNER JOIN (SELECT Date_,MIN(Score) AS MinScore
FROM TestTable
GROUP BY Date_) AS B
ON (A.Score = B.MinScore) AND (A.Date_ = B.Date_);
Use this solution:
SELECT a.date_, MIN(name) AS name, a.score
FROM tbl a
INNER JOIN
(
SELECT date_, MIN(score) AS minscore
FROM tbl
GROUP BY date_
) b ON a.date_ = b.date_ AND a.score = b.minscore
GROUP BY a.date_, a.score
This will get the minimum score per date in the INNER JOIN subselect, which we use to join to the main table. Once we join the subselect, we will only have dates with names having the minimum score (with ties being displayed).
Since we only want one name per date, we then group by date and score, selecting whichever name: MIN(name).
If we want to display the name column, we must use an aggregate function on name to facilitate the GROUP BY on date and score columns, or else it will not work (We could also use MAX() on that column as well).
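Here is a sketch that runs this solution on the sample data, using SQLite from Python. The table name tbl matches the query above; the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl (date_ TEXT, name TEXT, score INTEGER);
    INSERT INTO tbl VALUES
        ('2012/07/05', 'Jack',  1), ('2012/07/05', 'Jones', 1),
        ('2012/07/06', 'Jill',  2), ('2012/07/06', 'James', 3),
        ('2012/07/07', 'Hugo',  1), ('2012/07/07', 'Jack',  1),
        ('2012/07/07', 'Jim',   2);
""")

rows = conn.execute("""
    SELECT a.date_, MIN(a.name) AS name, a.score
    FROM tbl a
    INNER JOIN (
        SELECT date_, MIN(score) AS minscore   -- lowest score per date
        FROM tbl
        GROUP BY date_
    ) b ON a.date_ = b.date_ AND a.score = b.minscore
    GROUP BY a.date_, a.score                  -- collapse ties to one name
    ORDER BY a.date_
""").fetchall()

for r in rows:
    print(r)
```

Because the join keeps only rows with the minimum score, the name chosen by MIN(name) always belongs to a player who actually had that score.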
Please learn about the GROUP BY functionality of RDBMS.
SELECT Date_,Name,MIN(Score)
FROM T
GROUP BY Name
This makes the assumption that EACH NAME and EACH date appears only once, and this will only work for MySQL.
To make it work on other RDBMSs, you need to apply another aggregate function to the Date column, such as MAX, MIN, etc.
SELECT T.Name, T.Date_, MIN(T.Score) as Score FROM T
GROUP BY T.Date_
Edit: This answer is not correct, as pointed out by JNK in the comments.
SELECT Date_,MAX(Name),MIN(Score)
FROM T
GROUP BY Date_
Here I am using MAX(Name), which picks one name per date. Note that this is the alphabetically last name on that date, not necessarily the player who had the minimum score.
This will find the minimum score for each day (no duplicates), scored by any player. A name that starts with Z will be picked over a name that starts with A.
Edit: Fixed by removing group by name
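One thing to watch with the MAX(Name), MIN(Score) version: the two aggregates are computed independently, so the name and the score in a result row may come from different source rows. A small sketch with the sample data (SQLite via Python, table name T as in the answer) that lists the result rows which do not exist in the table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T (Date_ TEXT, Name TEXT, Score INTEGER);
    INSERT INTO T VALUES
        ('2012/07/05', 'Jack',  1), ('2012/07/05', 'Jones', 1),
        ('2012/07/06', 'Jill',  2), ('2012/07/06', 'James', 3),
        ('2012/07/07', 'Hugo',  1), ('2012/07/07', 'Jack',  1),
        ('2012/07/07', 'Jim',   2);
""")

rows = conn.execute("""
    SELECT Date_, MAX(Name), MIN(Score)
    FROM T
    GROUP BY Date_
    ORDER BY Date_
""").fetchall()

# Compare each result row with the rows that really exist in the table.
actual = set(conn.execute("SELECT Date_, Name, Score FROM T").fetchall())
mismatches = [r for r in rows if r not in actual]

print(rows)
print("rows not present in the table:", mismatches)
```

On 2012/07/07 the query pairs Jim with a score of 1, although Jim scored 2 that day, so this form is only safe when you do not need the name to match the score.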