SQL to Get Latest Field Value - sql

I'm trying to write an SQL query (SQL Server) that returns the latest value of a field from a history table.
The table structure is basically as below:
ISSUE TABLE:
issueid
10
20
30
CHANGEGROUP TABLE:
changegroupid | issueid | updated |
1 | 10 | 01/01/2020 |
2 | 10 | 02/01/2020 |
3 | 10 | 03/01/2020 |
4 | 20 | 05/01/2020 |
5 | 20 | 06/01/2020 |
6 | 20 | 07/01/2020 |
7 | 30 | 04/01/2020 |
8 | 30 | 05/01/2020 |
9 | 30 | 06/01/2020 |
CHANGEITEM TABLE:
changegroupid | field | newvalue |
1 | ONE | 1 |
1 | TWO | A |
1 | THREE | Z |
2 | ONE | J |
2 | ONE | K |
2 | ONE | L |
3 | THREE | K |
3 | ONE | 2 |
3 | ONE | 1 | <--
4 | ONE | 1A |
5 | ONE | 1B |
6 | ONE | 1C | <--
7 | ONE | 1D |
8 | ONE | 1E |
9 | ONE | 1F | <--
EXPECTED RESULT:
issueid | updated | newvalue
10 | 03/01/2020 | 1
20 | 07/01/2020 | 1C
30 | 06/01/2020 | 1F
So each change to an issue item creates 1 change group record with the date the change was made, which can then contain 1 or more change item records.
Each change item shows the field name that was changed and the new value.
I then need to link those tables together to get each issue, the latest value of the field name called 'ONE', and ideally the date of the latest change.
These tables are from Jira, for those familiar with that table structure.
I've been trying to get this to work for a while now. So far I've got this query:
SELECT IssueId, MIN(Created) AS updated FROM
(
    SELECT ISSUE.IssueId, UpdGrp.Created AS Created, UpdItm.NEWVALUE
    FROM ISSUE
    JOIN ChangeGroup UpdGrp ON (UpdGrp.IssueID = ISSUE.IssueId)
    JOIN CHANGEITEM UpdItm ON (UpdGrp.ID = UpdItm.groupid)
    WHERE UPPER(UpdItm.FIELD) = UPPER('ONE')
) AS dummy
GROUP BY IssueId
ORDER BY IssueId
This returns the first two columns I'm looking for, but I'm struggling to work out how to return the final column. When I add it to the first line I get an error saying "Column is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause."
I've done a search on here and can't find anything that exactly matches my requirements.

Use window functions:
SELECT i.*
FROM (SELECT i.IssueId, cg.Created AS Created, ci.NEWVALUE,
             ROW_NUMBER() OVER (PARTITION BY i.IssueId ORDER BY cg.Created DESC) AS seqnum
      FROM ISSUE i JOIN
           ChangeGroup cg
           ON cg.IssueID = i.IssueId JOIN
           CHANGEITEM ci
           ON cg.ID = ci.groupid
      WHERE UPPER(ci.FIELD) = UPPER('ONE')
     ) i
WHERE seqnum = 1
ORDER BY IssueId;
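To check the ROW_NUMBER() approach against the sample data from the question, here is a minimal sketch that runs the same kind of query in SQLite through Python's sqlite3 module (it needs an SQLite build with window functions, 3.25 or later). It is not SQL Server code. The column names follow the sample tables rather than the real Jira schema, the dates are rewritten as ISO strings, and itemid is a hypothetical tie-breaker column added so that two 'ONE' rows in the same change group sort deterministically.

```python
import sqlite3

def latest_field_values(field="ONE"):
    """Return (issueid, updated, newvalue) for the latest change of `field` per issue."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE issue (issueid INTEGER PRIMARY KEY);
        CREATE TABLE changegroup (changegroupid INTEGER PRIMARY KEY,
                                  issueid INTEGER, updated TEXT);
        -- itemid is an assumed tie-breaker; it is not in the question's tables
        CREATE TABLE changeitem (itemid INTEGER PRIMARY KEY AUTOINCREMENT,
                                 changegroupid INTEGER, field TEXT, newvalue TEXT);
        INSERT INTO issue VALUES (10), (20), (30);
        INSERT INTO changegroup VALUES
            (1, 10, '2020-01-01'), (2, 10, '2020-01-02'), (3, 10, '2020-01-03'),
            (4, 20, '2020-01-05'), (5, 20, '2020-01-06'), (6, 20, '2020-01-07'),
            (7, 30, '2020-01-04'), (8, 30, '2020-01-05'), (9, 30, '2020-01-06');
        INSERT INTO changeitem (changegroupid, field, newvalue) VALUES
            (1, 'ONE', '1'), (1, 'TWO', 'A'), (1, 'THREE', 'Z'),
            (2, 'ONE', 'J'), (2, 'ONE', 'K'), (2, 'ONE', 'L'),
            (3, 'THREE', 'K'), (3, 'ONE', '2'), (3, 'ONE', '1'),
            (4, 'ONE', '1A'), (5, 'ONE', '1B'), (6, 'ONE', '1C'),
            (7, 'ONE', '1D'), (8, 'ONE', '1E'), (9, 'ONE', '1F');
    """)
    rows = conn.execute("""
        SELECT issueid, updated, newvalue
        FROM (SELECT i.issueid, cg.updated, ci.newvalue,
                     -- number each issue's rows, newest change first
                     ROW_NUMBER() OVER (PARTITION BY i.issueid
                                        ORDER BY cg.updated DESC, ci.itemid DESC) AS seqnum
              FROM issue i
              JOIN changegroup cg ON cg.issueid = i.issueid
              JOIN changeitem ci ON ci.changegroupid = cg.changegroupid
              WHERE UPPER(ci.field) = UPPER(?)) latest
        WHERE seqnum = 1
        ORDER BY issueid
    """, (field,)).fetchall()
    conn.close()
    return rows
```

Without a tie-breaker in the ORDER BY, ROW_NUMBER() may pick either of the two 'ONE' rows in change group 3, so a real query would need some column that orders items within a group.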

Related

SQL - joining 3 tables and choosing newest logged entry per id

I have a rather complicated riddle to solve, and so far I've been unlucky.
I have 3 tables which I need to join to get the result.
Most importantly, I need the highest h_id per p_id. h_id is a unique entry in the log history, and I need the newest one for a given point (p_id -> num).
Apart from that, I need ext and name as well.
history
+----------------+---------+--------+
| h_id | p_id | str_id |
+----------------+---------+--------+
| 1 | 1 | 11 |
| 2 | 5 | 15 |
| 3 | 5 | 23 |
| 4 | 1 | 62 |
+----------------+---------+--------+
point
+----------------+---------+
| p_id | num |
+----------------+---------+
| 1 | 4564 |
| 5 | 3453 |
+----------------+---------+
street
+----------------+---------+-------------+
| str_id | ext | name |
+----------------+---------+-------------+
| 15 | | Mein st. 33 | - bad name
| 11 | | eck st. 42 | - bad name
| 62 | abc | Main st. 33 |
| 23 | efg | Back st. 42 |
+----------------+---------+-------------+
EXPECTED RESULT
+----------------+---------+-------------+-----+
| num | ext | name |h_id |
+----------------+---------+-------------+-----+
| 3453 | efg | Back st. 42 | 3 |
| 4564 | abc | Main st. 33 | 4 |
+----------------+---------+-------------+-----+
I'm using Oracle SQL. I tried the query below, but the result is not correct.
SELECT num, max(name), max(ext), MAX(h_id) maxm FROM history
INNER JOIN street on street.str_id = history.str_id
INNER JOIN point on point.p_id = history.p_id
GROUP BY point.num
In Oracle, you can use keep:
SELECT p.num,
       MAX(h.h_id) as maxm,
       MAX(s.name) KEEP (DENSE_RANK FIRST ORDER BY h.h_id DESC) as name,
       MAX(s.ext) KEEP (DENSE_RANK FIRST ORDER BY h.h_id DESC) as ext
FROM history h INNER JOIN
     street s
     ON s.str_id = h.str_id INNER JOIN
     point p
     ON p.p_id = h.p_id
GROUP BY p.num;
The keep syntax allows you to do "first()" and "last()" for aggregations.
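To make the semantics of KEEP (DENSE_RANK FIRST ORDER BY h_id DESC) concrete, here is a small Python emulation over the sample rows from the question. It is only an illustration of what the aggregate does per group, not Oracle code, and the function name and data layout are made up for this sketch.

```python
from collections import defaultdict

# Sample rows from the question; None stands in for the empty ext values.
history = [(1, 1, 11), (2, 5, 15), (3, 5, 23), (4, 1, 62)]   # (h_id, p_id, str_id)
point = {1: 4564, 5: 3453}                                    # p_id -> num
street = {                                                    # str_id -> (ext, name)
    15: (None, "Mein st. 33"),
    11: (None, "eck st. 42"),
    62: ("abc", "Main st. 33"),
    23: ("efg", "Back st. 42"),
}

def latest_street_per_point(history, point, street):
    """Emulate MAX(col) KEEP (DENSE_RANK FIRST ORDER BY h_id DESC) ... GROUP BY num."""
    groups = defaultdict(list)
    for h_id, p_id, str_id in history:          # the two inner joins
        ext, name = street[str_id]
        groups[point[p_id]].append((h_id, ext, name))

    result = []
    for num, rows in groups.items():
        top = max(h_id for h_id, _, _ in rows)              # MAX(h.h_id)
        kept = [r for r in rows if r[0] == top]             # rows ranked first by h_id DESC
        # MAX over the kept rows only; like SQL aggregates, ignore NULLs
        ext = max((r[1] for r in kept if r[1] is not None), default=None)
        name = max((r[2] for r in kept if r[2] is not None), default=None)
        result.append((num, ext, name, top))
    return sorted(result)
```

Because h_id is unique, the kept set holds exactly one row per group, so the outer MAX simply returns that row's value.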

Select all rows where rows in another joined table match condition

So I want to select all rows where a subset of rows in another table match the given values.
I have following tables:
Main Profile:
+----+--------+---------------+---------+
| id | name | subprofile_id | version |
+----+--------+---------------+---------+
| 1 | Main 1 | 4 | 1 |
| 2 | Main 1 | 5 | 2 |
| 3 | Main 2 | ... | 1 |
+----+--------+---------------+---------+
Sub Profile:
+---------------+----------+
| subprofile_id | block_id |
+---------------+----------+
| 4 | 6 |
| 4 | 7 |
| 5 | 8 |
| 5 | 9 |
+---------------+----------+
Block:
+----------+-------------+
| block_id | property_id |
+----------+-------------+
| 7 | 10 |
| 7 | 11 |
| 7 | 12 |
| 7 | 13 |
| 8 | 14 |
| 8 | 15 |
| 8 | 16 |
| 8 | 17 |
| ... | ... |
+----------+-------------+
Property:
+----+--------------------+--------------------------+
| id | name | value |
+----+--------------------+--------------------------+
| 10 | Description | XY |
| 11 | Responsible person | Mr. Smith |
| 12 | ... | ... |
| 13 | ... | ... |
| 14 | Description | XY |
| 15 | Responsible person | Mrs. Brown |
| 16 | ... | ... |
| 17 | ... | ... |
+----+--------------------+--------------------------+
The user can define multiple conditions on the property table. For example:
Description = 'XY'
Responsible person = 'Mr. Smith'
I need all 'Main Profiles' with the highest version that match ALL of the given properties; they may of course have additional properties that do not match.
It should be doable in JPA because I would translate it into QueryDSL to build type-safe, dynamic queries from the user's input.
I already searched through all questions regarding similar problems but couldn't apply the answers to my problem.
I've also tried writing a query, which worked quite well but retrieved all rows with at least one matching condition. In addition, I need all properties in my set, but it only fetched the matching ones (via a fetch join, which is missing from my code example).
from MainProfile as mainProfile
left join mainProfile.subProfile as subProfile
left join subProfile.blocks as block
left join block.properties as property
where mainProfile.version = (select max(mainProfile2.version) from MainProfile as mainProfile2 where mainProfile2.name = mainProfile.name)
and ((property.name = 'Description' and property.value = 'XY')
or (property.name = 'Responsible person' and property.value = 'Mr. Smith'))
Running my query, I got two rows:
Main 1 with version 2
Main 2 with version 1
I would have expected to get only one row because 'Responsible person' does not match in 'Main 2'.
EDIT 1:
So I found a solution which works but could be improved:
select distinct mainProfile
from MainProfile as mainProfile
left join mainProfile.subProfile as subProfile
left join subProfile.blocks as block
left join block.properties as property
where mainProfile.version = (select max(mainProfile2.version) from MainProfile mainProfile2 where mainProfile2.name = mainProfile.name)
and ((property.name = 'Description' and property.content = 'XY') or (property.name = 'Responsible person' and property.content = 'Mr. Smith'))
group by mainProfile.id
having count(distinct property) = 2
It actually retrieves the right 'Main Profiles'. The problem is that only the two matched properties are fetched, and I need all properties for further processing.
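One common way around the "only the matched properties are fetched" problem is to use the GROUP BY / HAVING count only to decide which profiles qualify, in a subquery, and then join the full property set in the outer query. The sketch below shows that idea in plain SQL, run in SQLite through Python. It is not JPQL or QueryDSL, the schema is flattened to two hypothetical tables (profile and property), and the rows are made up for illustration rather than copied from the tables above.

```python
import sqlite3

def profiles_matching_all():
    """Latest-version profiles that satisfy both conditions, with ALL their properties."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Hypothetical, flattened schema: the sub profile and block levels are skipped
        CREATE TABLE profile (id INTEGER PRIMARY KEY, name TEXT, version INTEGER);
        CREATE TABLE property (profile_id INTEGER, name TEXT, value TEXT);
        INSERT INTO profile VALUES (1, 'Main 1', 1), (2, 'Main 1', 2), (3, 'Main 2', 1);
        INSERT INTO property VALUES
            (1, 'Description', 'XY'), (1, 'Responsible person', 'Mr. Smith'),
            (2, 'Description', 'XY'), (2, 'Responsible person', 'Mr. Smith'),
            (2, 'Location', 'Room 5'),
            (3, 'Description', 'XY'), (3, 'Responsible person', 'Mrs. Brown');
    """)
    rows = conn.execute("""
        SELECT p.id, pr.name, pr.value
        FROM profile p
        JOIN property pr ON pr.profile_id = p.id          -- fetches every property
        WHERE p.version = (SELECT MAX(p2.version) FROM profile p2 WHERE p2.name = p.name)
          AND p.id IN (SELECT m.profile_id                -- decides which profiles qualify
                       FROM property m
                       WHERE (m.name = 'Description' AND m.value = 'XY')
                          OR (m.name = 'Responsible person' AND m.value = 'Mr. Smith')
                       GROUP BY m.profile_id
                       HAVING COUNT(DISTINCT m.name) = 2) -- 2 = number of conditions
        ORDER BY p.id, pr.name
    """).fetchall()
    conn.close()
    return rows
```

Here profile 1 matches both conditions but is not the highest version of 'Main 1', and profile 3 matches only one condition, so only profile 2 comes back, including its unmatched 'Location' property. Whether the same shape translates cleanly to a JPQL subquery with a fetch join depends on the provider and is not verified here.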

How do I do multiple selection based on a flowchart of criteria?

Table name: Copies
+----------+-------+----------+---------+--------------+-------------+
| group_id | my_id | previous | in_this | higher_value | most_recent |
+----------+-------+----------+---------+--------------+-------------+
| 900      | 1     | null     | Y       | 7            | May 16      |
| 900      | 2     | null     | Y       | 3            | Oct 16      |
| 900      | 3     | null     | N       | 9            | Oct 16      |
| 901      | 4     | 378      | Y       | 3            | Oct 16      |
| 901      | 5     | null     | N       | 2            | Oct 16      |
| 902      | 6     | null     | N       | 5            | May 16      |
| 902      | 7     | null     | N       | 9            | Oct 16      |
| 903      | 8     | null     | Y       | 3            | Oct 16      |
| 903      | 9     | null     | Y       | 3            | May 16      |
| 904      | 10    | null     | N       | 0            | May 16      |
| 904      | 11    | null     | N       | 0            | May 16      |
+----------+-------+----------+---------+--------------+-------------+
Output table
+----------+-------+----------+---------+--------------+-------------+
| group_id | my_id | previous | in_this | higher_value | most_recent |
+----------+-------+----------+---------+--------------+-------------+
| 900      | 1     | null     | Y       | 7            | May 16      |
| 902      | 7     | null     | N       | 9            | Oct 16      |
| 903      | 8     | null     | Y       | 3            | Oct 16      |
+----------+-------+----------+---------+--------------+-------------+
Hi all, I need help with a query that returns one record within a group based on the importance of the field. The importance is ranked as follows:
previous: if any record within the group_id is not null, then no record within that group_id is returned (because according to our rules, all records within a group should have the same previous value).
in_this: if one record is Y and another is N within a group_id, then we keep the Y. If all records are Y or all are N, then we move to the next attribute.
higher_value: if all records in the ‘in_this’ field are equal, then we need to select the record with the greater value from this field. If the records have an equal value, we move to the next attribute.
most_recent: if all records were of equal value in the ‘higher_value’ field, then we consider the newest record. If these are equal, then nothing is returned.
This is a simplified version of the table I am looking at, but I just would like to get the gist of how something like this would work. Basically, my table has multiple copies of records that have been grouped through some algorithm. I have been tasked with selecting which of these records within a group is the ‘good’ one, and we are basing this on these fields.
I’d like the output to actually show all fields, because I will likely attempt to refine the query to include other fields (there are over 40 to consider), but the most important is the group_id and my_id fields. It would be neat if we could also somehow flag why each record got picked, but that isn’t necessary.
It seems like something like this should be easy, but I have a hard time wrapping my head around how to pick from within a group_id. Thanks for your help.
You can use analytic functions for this. The trick is establishing the right variables for each condition:
select t.*
from (select t.*,
             max(in_this) over (partition by group_id) as max_in_this,
             min(higher_value) over (partition by group_id) as min_higher_value,
             max(higher_value) over (partition by group_id) as max_higher_value,
             row_number() over (partition by group_id, higher_value order by my_id) as seqnum_ghv,
             min(most_recent) over (partition by group_id) as min_most_recent,
             max(most_recent) over (partition by group_id) as max_most_recent,
             row_number() over (partition by group_id order by most_recent) as seqnum_mr
      from t
     ) t
where max_in_this is not null and
      ( (min_higher_value <> max_higher_value and seqnum_ghv = 1) or
        (min_higher_value = max_higher_value and min_most_recent <> max_most_recent and seqnum_mr = 1)
      );
The third condition as stated makes no sense, but you should get the idea for how to implement this.
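As one possible reading of the rules in the question, the cascade can be treated as a lexicographic ranking: order each group by in_this (Y first), then higher_value, then most_recent, drop any group that has a non-null previous, and return the top row only when it is not tied. The sketch below tries this in SQLite through Python against the sample table. It is a different formulation from the query above, not a corrected copy of it, and it assumes most_recent can be stored as a sortable 'YYYY-MM' string (May 16 as '2016-05', Oct 16 as '2016-10').

```python
import sqlite3

COPIES = [  # (group_id, my_id, previous, in_this, higher_value, most_recent)
    (900, 1, None, "Y", 7, "2016-05"), (900, 2, None, "Y", 3, "2016-10"),
    (900, 3, None, "N", 9, "2016-10"), (901, 4, 378, "Y", 3, "2016-10"),
    (901, 5, None, "N", 2, "2016-10"), (902, 6, None, "N", 5, "2016-05"),
    (902, 7, None, "N", 9, "2016-10"), (903, 8, None, "Y", 3, "2016-10"),
    (903, 9, None, "Y", 3, "2016-05"), (904, 10, None, "N", 0, "2016-05"),
    (904, 11, None, "N", 0, "2016-05"),
]

def pick_good_records(copies=COPIES):
    """Return (group_id, my_id) of the single best record per group, if there is one."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE copies (group_id INTEGER, my_id INTEGER, previous INTEGER,
                                         in_this TEXT, higher_value INTEGER, most_recent TEXT)""")
    conn.executemany("INSERT INTO copies VALUES (?, ?, ?, ?, ?, ?)", copies)
    rows = conn.execute("""
        WITH ranked AS (
            SELECT c.*,
                   -- COUNT(col) skips NULLs, so this is the number of non-null previous values
                   COUNT(previous) OVER (PARTITION BY group_id) AS n_previous,
                   -- 'Y' sorts after 'N', so DESC puts Y first; ties share a rank
                   RANK() OVER (PARTITION BY group_id
                                ORDER BY in_this DESC, higher_value DESC,
                                         most_recent DESC) AS rnk
            FROM copies c
        ),
        counted AS (
            SELECT r.*, COUNT(*) OVER (PARTITION BY group_id, rnk) AS n_tied
            FROM ranked r
        )
        SELECT group_id, my_id
        FROM counted
        WHERE n_previous = 0 AND rnk = 1 AND n_tied = 1
        ORDER BY group_id
    """).fetchall()
    conn.close()
    return rows
```

On the sample data this keeps records 1, 7 and 8, which matches the output table in the question. Group 901 is dropped because of its non-null previous, and group 904 because its two records tie on every attribute.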

Error in executing two groupbys in sparkSQL

I am new to Spark SQL and was experimenting with some queries.
This is the query I am trying to execute:
sqlContext.sql("SELECT id, category, AVG(mark) FROM data GROUP BY id, category")
I am not getting the proper output when I run the query.
Instead of the actual value of category, I am getting values such as 1, 2, 3.
I have been stuck on this weird error for a long time,
but when I run a simple SELECT statement or a single GROUP BY, it works perfectly:
sqlContext.sql("SELECT id, category FROM data")
sqlContext.sql("SELECT id, AVG(mark) FROM data GROUP BY id")
What is wrong? Does Spark SQL have a problem with grouping by multiple columns?
Right now I am running this more complex query:
sqlContext.sql("SELECT data.id, data.category, AVG(id_avg.met_avg) FROM (SELECT id, AVG(mark) AS met_avg FROM data GROUP BY id) AS id_avg, data GROUP BY data.category, data.id")
This works, but takes longer to execute.
Please help.
Sample data:
|id | category | marks
| 1 | a | 40
| 2 | b | 44
| 3 | a | 50
| 4 | b | 40
| 1 | a | 30
The output should be:
|id | category | avg
| 1 | a | 35
| 2 | b | 44
| 3 | a | 50
| 4 | b | 40
Please try this query:
SELECT
data.id
, data.category
, AVG(mark)
FROM data
GROUP BY
data.id
, data.category
Based on this sample data:
|id | category | marks
| 1 | a | 40
| 2 | b | 44
| 3 | a | 50
| 4 | b | 40
| 1 | a | 30
The output WILL be this:
|id | category | avg
| 1 | a | 35
| 2 | b | 44
| 3 | a | 50
| 4 | b | 40
And the following expected row cannot be produced using GROUP BY:
| 5 | a | 30
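As a quick sanity check of what a two-column GROUP BY should return for this sample data, here is a minimal sketch that runs the equivalent query in SQLite through Python. It only confirms the expected SQL semantics and says nothing about Spark's behaviour in any particular version. The column is named marks here, as in the sample data.

```python
import sqlite3

SAMPLE = [(1, "a", 40), (2, "b", 44), (3, "a", 50), (4, "b", 40), (1, "a", 30)]

def average_by_id_and_category(rows=SAMPLE):
    """Average marks per (id, category) pair."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (id INTEGER, category TEXT, marks INTEGER)")
    conn.executemany("INSERT INTO data VALUES (?, ?, ?)", rows)
    result = conn.execute("""
        SELECT id, category, AVG(marks)
        FROM data
        GROUP BY id, category      -- one output row per distinct (id, category)
        ORDER BY id, category
    """).fetchall()
    conn.close()
    return result
```

The two rows for id 1 collapse into a single row with the average of 40 and 30, and category comes back as the original string value.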
That is a bug in Spark SQL.
Try using the next version; it's fixed there.
I got the proper output by using spark-1.0.2.
It also worked with pure Scala code. Try either of them :)

SQL Server: how do I get data from a history table?

Can you please help me build an SQL query to retrieve data from a history table?
I'm a newbie with only one week of coding experience. I've been trying simple SELECT statements so far but have hit a stumbling block.
My football club's database has three tables. The first one links balls to players:
BallDetail
| BallID | PlayerID | TeamID |
|--------|----------|--------|
| 1      | 11       | 21     |
| 2      | 12       | 22     |
The second one lists things that happen to the balls:
BallEventHistory
| BallID | Event | EventDate  |
|--------|-------|------------|
| 1      | Pass  | 2012-01-01 |
| 1      | Shoot | 2012-02-01 |
| 1      | Miss  | 2012-03-01 |
| 2      | Pass  | 2012-01-01 |
| 2      | Shoot | 2012-02-01 |
And the third one is a history change table. After a ball changes hands, history is recorded:
HistoryChanges
| BallID | ColumnName | ValueOld | ValueNew |
|--------|------------|----------|----------|
| 2 | PlayerID | 11 | 12 |
| 2 | TeamID | 21 | 22 |
I'm trying to obtain a table that lists all the passes and shots Player 11 made with each ball before the ball went to another player. Like this:
| PlayerID | BallID | Event | Month |
|----------|--------|-------|-------|
| 11 | 1 | Pass | Jan |
| 11 | 1 | Shoot | Feb |
| 11 | 2 | Pass | Jan |
I started with this:
SELECT PlayerID, BallID, Event, DateName(month, EventDate)
FROM BallDetail bd INNER JOIN BallEventHistory beh ON bd.BallID = beh.BallID
WHERE PlayerID = 11 AND Event IN ('Pass', 'Shoot') ...
But how do I make sure that Ball 2 also gets included, even though it is with another player now?
SELECT PlayerID, BallID, Event, DATENAME(month, EventDate) AS Month, COUNT(*) AS cnt
FROM (
    SELECT COALESCE(
               (SELECT ValueNew
                FROM #HistoryChanges
                WHERE ChangeDate = (SELECT MAX(ChangeDate)
                                    FROM #HistoryChanges h2
                                    WHERE h2.BallID = h.BallID
                                      AND ColumnName = 'PlayerID'
                                      AND ChangeDate <= EventDate)
                  AND BallID = h.BallID
                  AND ColumnName = 'PlayerID'),
               (SELECT PlayerID FROM #BallDetail WHERE BallID = h.BallID)
           ) AS PlayerID,
           h.BallID, h.Event, EventDate
    FROM #BallEventHistory h
) a
GROUP BY PlayerID, BallID, Event, DATENAME(month, EventDate)
SELECT d.PlayerID, d.BallID, h.Event, DATENAME(mm, h.EventDate) AS Month
FROM BallDetail d JOIN BallEventHistory h ON d.BallID = h.BallID
WHERE h.Event IN ('Pass', 'Shoot')
  AND (d.PlayerID = 11
       OR EXISTS (SELECT 1
                  FROM dbo.HistoryChanges c
                  WHERE c.ValueOld = 11 AND c.ValueNew = d.PlayerID AND c.ColumnName = 'PlayerID' AND c.ChangeDate = h.EventDate))
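Both queries above rely on a ChangeDate column that the HistoryChanges table in the question does not show. Assuming such a column exists, one way to reconstruct who held a ball at the time of an event is to take ValueOld from the earliest PlayerID change after the event and fall back to the current owner in BallDetail when there is no later change. The sketch below tries this in SQLite through Python rather than SQL Server. The ChangeDate value of 2012-01-15 is made up for illustration, and the month name is derived in Python because SQLite has no DATENAME.

```python
import sqlite3

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

def events_by_owner(player_id):
    """Passes and shots made while `player_id` held the ball, per the change history."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE BallDetail (BallID INTEGER, PlayerID INTEGER, TeamID INTEGER);
        CREATE TABLE BallEventHistory (BallID INTEGER, Event TEXT, EventDate TEXT);
        -- ChangeDate is an assumed column; its value below is hypothetical
        CREATE TABLE HistoryChanges (BallID INTEGER, ColumnName TEXT,
                                     ValueOld INTEGER, ValueNew INTEGER, ChangeDate TEXT);
        INSERT INTO BallDetail VALUES (1, 11, 21), (2, 12, 22);
        INSERT INTO BallEventHistory VALUES
            (1, 'Pass', '2012-01-01'), (1, 'Shoot', '2012-02-01'), (1, 'Miss', '2012-03-01'),
            (2, 'Pass', '2012-01-01'), (2, 'Shoot', '2012-02-01');
        INSERT INTO HistoryChanges VALUES
            (2, 'PlayerID', 11, 12, '2012-01-15'), (2, 'TeamID', 21, 22, '2012-01-15');
    """)
    rows = conn.execute("""
        SELECT owner_id, BallID, Event, EventDate
        FROM (SELECT e.BallID, e.Event, e.EventDate,
                     COALESCE(
                         -- owner before the first change that happened after the event
                         (SELECT c.ValueOld
                          FROM HistoryChanges c
                          WHERE c.BallID = e.BallID AND c.ColumnName = 'PlayerID'
                            AND c.ChangeDate > e.EventDate
                          ORDER BY c.ChangeDate LIMIT 1),
                         -- no later change: the current owner held the ball
                         (SELECT d.PlayerID FROM BallDetail d WHERE d.BallID = e.BallID)
                     ) AS owner_id
              FROM BallEventHistory e) owned
        WHERE owner_id = ? AND Event IN ('Pass', 'Shoot')
        ORDER BY BallID, EventDate
    """, (player_id,)).fetchall()
    conn.close()
    # EventDate is 'YYYY-MM-DD'; characters 5-6 hold the month number
    return [(p, b, ev, MONTHS[int(d[5:7]) - 1]) for p, b, ev, d in rows]
```

With the assumed change date, events_by_owner(11) returns the three rows from the expected table in the question, and Ball 2's later shot is attributed to Player 12.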