SQL query to get the same set of results

This should be a simple one, but say I have a table with data like this:
| ID | Date | Value |
| 1 | 01/01/2013 | 40 |
| 2 | 03/01/2013 | 20 |
| 3 | 10/01/2013 | 30 |
| 4 | 14/02/2013 | 60 |
| 5 | 15/03/2013 | 10 |
| 6 | 27/03/2013 | 70 |
| 7 | 01/04/2013 | 60 |
| 8 | 01/06/2013 | 20 |
What I want is the sum of values per week of the year, showing ALL weeks, including those with no data (for use in an Excel graph).
My query only returns the weeks that actually have rows in the database.

A plain SELECT cannot return rows for weeks that don't exist in any table, so the missing weeks have to come from somewhere. To get the effect you want, create a table called WeeksInYear with a single Int column, WeekNumber, and populate it with all the week numbers. Then JOIN that table to this one.
The query would then look something like the following:
SELECT w.WeekNumber, COALESCE(SUM(m.Value), 0) AS TotalValue
FROM MyTable AS m
RIGHT OUTER JOIN WeeksInYear AS w
ON DATEPART(wk, m.Date) = w.WeekNumber
GROUP BY w.WeekNumber
The missing weeks have no matching rows in MyTable, so SUM returns NULL for them; COALESCE turns that NULL into 0.
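As a rough illustration of the calendar-table idea, here is a sketch that runs the same kind of join in SQLite through Python's sqlite3 module. It is not the SQL Server query above: the table and column names (my_table, weeks_in_year, day) are hypothetical, dates are stored as ISO strings, and SQLite's strftime('%W') numbers weeks 0 to 53 with Monday as the first day, which differs from DATEPART(wk, ...).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (id INTEGER, day TEXT, value INTEGER);
    CREATE TABLE weeks_in_year (week_number INTEGER);
""")
# Sample rows from the question, rewritten as ISO dates (an assumption).
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", [
    (1, "2013-01-01", 40), (2, "2013-01-03", 20), (3, "2013-01-10", 30),
    (4, "2013-02-14", 60), (5, "2013-03-15", 10), (6, "2013-03-27", 70),
    (7, "2013-04-01", 60), (8, "2013-06-01", 20),
])
# SQLite's %W yields week numbers 0..53, so the calendar table holds 54 rows.
conn.executemany("INSERT INTO weeks_in_year VALUES (?)",
                 [(w,) for w in range(54)])

rows = conn.execute("""
    SELECT w.week_number, COALESCE(SUM(m.value), 0) AS total
    FROM weeks_in_year AS w
    LEFT JOIN my_table AS m
      ON CAST(strftime('%W', m.day) AS INTEGER) = w.week_number
    GROUP BY w.week_number
    ORDER BY w.week_number
""").fetchall()
totals = dict(rows)  # week number -> summed value, 0 for empty weeks
```

Every week appears once in rows, and weeks without data carry a total of 0, which is the shape an Excel chart needs.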

Related

SQL to Get Latest Field Value

I'm trying to write an SQL query (SQL Server) that returns the latest value of a field from a history table.
The table structure is basically as below:
ISSUE TABLE:
issueid
10
20
30
CHANGEGROUP TABLE:
changegroupid | issueid | updated |
1 | 10 | 01/01/2020 |
2 | 10 | 02/01/2020 |
3 | 10 | 03/01/2020 |
4 | 20 | 05/01/2020 |
5 | 20 | 06/01/2020 |
6 | 20 | 07/01/2020 |
7 | 30 | 04/01/2020 |
8 | 30 | 05/01/2020 |
9 | 30 | 06/01/2020 |
CHANGEITEM TABLE:
changegroupid | field | newvalue |
1 | ONE | 1 |
1 | TWO | A |
1 | THREE | Z |
2 | ONE | J |
2 | ONE | K |
2 | ONE | L |
3 | THREE | K |
3 | ONE | 2 |
3 | ONE | 1 | <--
4 | ONE | 1A |
5 | ONE | 1B |
6 | ONE | 1C | <--
7 | ONE | 1D |
8 | ONE | 1E |
9 | ONE | 1F | <--
EXPECTED RESULT:
issueid | updated | newvalue
10 | 03/01/2020 | 1
20 | 07/01/2020 | 1C
30 | 06/01/2020 | 1F
So each change to an issue item creates 1 change group record with the date the change was made, which can then contain 1 or more change item records.
Each change item shows the field name that was changed and the new value.
I then need to link those tables together to get each issue, the latest value of the field name called 'ONE', and ideally the date of the latest change.
These tables are from Jira, for those familiar with that table structure.
I've been trying to get this to work for a while now, so far I've got this query:
SELECT issuenum, MIN(created) AS updated FROM
(
SELECT ISSUE.IssueId, UpdGrp.Created as Created, UpdItm.NEWVALUE
FROM ISSUE
JOIN ChangeGroup UpdGrp ON (UpdGrp.IssueID = CR.ID)
JOIN CHANGEITEM UpdItm ON (UpdGrp.ID = UpdItm.groupid)
WHERE UPPER(UpdItm.FIELD) = UPPER('ONE')
) AS dummy
GROUP BY issuenum
ORDER BY issuenum
This returns the first two columns I'm looking for, but I'm struggling to work out how to return the final column: when I include it in the first line I get the error "Column is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause."
I've searched on here and can't find anything that exactly matches my requirements.
Use window functions:
SELECT t.*
FROM (SELECT i.IssueId, cg.Created AS Created, ci.NEWVALUE,
ROW_NUMBER() OVER (PARTITION BY i.IssueId ORDER BY cg.Created DESC) AS seqnum
FROM ISSUE i JOIN
ChangeGroup cg
ON cg.IssueID = i.IssueId JOIN
CHANGEITEM ci
ON cg.ID = ci.groupid
WHERE UPPER(ci.FIELD) = UPPER('ONE')
) t
WHERE seqnum = 1
ORDER BY IssueId;
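To see the ROW_NUMBER() pattern in action, here is a small sketch run in SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later). It uses the simplified column names from the question's tables and a reduced, hypothetical data set with ISO dates and at most one 'ONE' change per change group, so there are no ties to break.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE changegroup (changegroupid INTEGER, issueid INTEGER, updated TEXT);
    CREATE TABLE changeitem (changegroupid INTEGER, field TEXT, newvalue TEXT);
    INSERT INTO changegroup VALUES
        (1, 10, '2020-01-01'), (2, 10, '2020-01-02'),
        (4, 20, '2020-01-05'), (6, 20, '2020-01-07');
    INSERT INTO changeitem VALUES
        (1, 'ONE', '1'),  (1, 'TWO', 'A'), (2, 'ONE', 'J'),
        (4, 'ONE', '1A'), (6, 'ONE', '1C'), (6, 'THREE', 'K');
""")

# Number the 'ONE' changes of each issue from newest to oldest,
# then keep only the newest one (seqnum = 1).
latest = conn.execute("""
    SELECT issueid, updated, newvalue
    FROM (
        SELECT cg.issueid, cg.updated, ci.newvalue,
               ROW_NUMBER() OVER (PARTITION BY cg.issueid
                                  ORDER BY cg.updated DESC) AS seqnum
        FROM changegroup AS cg
        JOIN changeitem AS ci ON ci.changegroupid = cg.changegroupid
        WHERE UPPER(ci.field) = 'ONE'
    ) AS ranked
    WHERE seqnum = 1
    ORDER BY issueid
""").fetchall()
```

If one change group can contain several rows for the same field, as group 3 does in the question's data, the ORDER BY inside OVER needs an extra tie-breaking column; otherwise the row that gets seqnum = 1 is arbitrary.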

SQL: Get an aggregate (SUM) of a calculation of two fields (DATEDIFF) that has conditional logic (CASE WHEN)

I have a dataset that includes a bunch of stay data (at a hotel). Each row contains a start date and an end date, but no duration field. I need to get a sum of the durations.
Sample Data:
| Stay ID | Client ID | Start Date | End Date |
| 1 | 38 | 01/01/2018 | 01/31/2019 |
| 2 | 16 | 01/03/2019 | 01/07/2019 |
| 3 | 27 | 01/10/2019 | 01/12/2019 |
| 4 | 27 | 05/15/2019 | NULL |
| 5 | 38 | 05/17/2019 | NULL |
There are some added complications:
I am using Crystal Reports and this is a SQL Expression, which obeys slightly different rules. Basically, it returns a single scalar value. Here is some more info: http://www.cogniza.com/wordpress/2005/11/07/crystal-reports-using-sql-expression-fields/
Sometimes, the end date field is blank (they haven't booked out yet). If blank, I would like to replace it with the current timestamp.
I only want to count nights that have occurred in the past year. If the start date of a given stay is more than a year ago, I need to adjust it.
I need to get a sum by Client ID
I'm not actually any good at SQL so all I have is guesswork.
The proper syntax for a Crystal Reports SQL Expression is something like this:
(
SELECT (CASE
WHEN StayDateStart < DATEADD(year,-1,CURRENT_TIMESTAMP) THEN DATEDIFF(day,DATEADD(year,-1,CURRENT_TIMESTAMP),ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
ELSE DATEDIFF(day,StayDateStart,ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
END)
)
And that's giving me the correct value for a single row, if I wanted to do this:
| Stay ID | Client ID | Start Date | End Date | Duration |
| 1 | 38 | 01/01/2018 | 01/31/2019 | 210 | // only days since June 4 2018 are counted
| 2 | 16 | 01/03/2019 | 01/07/2019 | 4 |
| 3 | 27 | 01/10/2019 | 01/12/2019 | 2 |
| 4 | 27 | 05/15/2019 | NULL | 21 |
| 5 | 38 | 05/17/2019 | NULL | 19 |
But I want to get the SUM of Duration per client, so I want this:
| Stay ID | Client ID | Start Date | End Date | Duration |
| 1 | 38 | 01/01/2018 | 01/31/2019 | 229 | // 210+19
| 2 | 16 | 01/03/2019 | 01/07/2019 | 4 |
| 3 | 27 | 01/10/2019 | 01/12/2019 | 23 | // 2+21
| 4 | 27 | 05/15/2019 | NULL | 23 |
| 5 | 38 | 05/17/2019 | NULL | 229 |
I've tried to just wrap a SUM() around my CASE but that doesn't work:
(
SELECT SUM(CASE
WHEN StayDateStart < DATEADD(year,-1,CURRENT_TIMESTAMP) THEN DATEDIFF(day,DATEADD(year,-1,CURRENT_TIMESTAMP),ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
ELSE DATEDIFF(day,StayDateStart,ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
END)
)
It gives me an error that the StayDateEnd is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. But I don't even know what that means, so I'm not sure how to troubleshoot, or where to go from here. And then the next step is to get the SUM by Client ID.
Any help would be greatly appreciated!
Although the explanation and data set are almost impossible to match, I think this is an approximation to what you want.
declare @your_data table (StayId int, ClientId int, StartDate date, EndDate date)
insert into @your_data values
(1,38,'2018-01-01','2019-01-31'),
(2,16,'2019-01-03','2019-01-07'),
(3,27,'2019-01-10','2019-01-12'),
(4,27,'2019-05-15',NULL),
(5,38,'2019-05-17',NULL)
;with data as (
select *,
datediff(day,
case
when datediff(day,StartDate,getdate())>365 then dateadd(year,-1,getdate())
else StartDate
end,
isnull(EndDate,getdate())
) days
from @your_data
)
select *,
sum(days) over (partition by ClientId) as client_days
from data
https://rextester.com/HCKOR53440
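The same clamp-then-sum idea can be sketched in SQLite through Python's sqlite3 module. This is an illustration under assumptions, not the Crystal Reports expression: the table name stays and its columns are hypothetical, the "current date" is pinned to 2019-06-05 so the numbers are reproducible, and julianday() stands in for DATEDIFF.

```python
import sqlite3

TODAY = "2019-06-05"  # assumed fixed "current date" so results are reproducible

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stays (stay_id INTEGER, client_id INTEGER,
                        start_date TEXT, end_date TEXT);
    INSERT INTO stays VALUES
        (1, 38, '2018-01-01', '2019-01-31'),
        (2, 16, '2019-01-03', '2019-01-07'),
        (3, 27, '2019-01-10', '2019-01-12'),
        (4, 27, '2019-05-15', NULL),
        (5, 38, '2019-05-17', NULL);
""")

# Step 1 (CTE): nights per stay, with an open end date replaced by today
# and the start date clamped to one year before today.
# Step 2: a window SUM repeats each client's total on every one of their rows.
per_stay = conn.execute("""
    WITH data AS (
        SELECT stay_id, client_id,
               CAST(julianday(COALESCE(end_date, :today))
                    - julianday(MAX(start_date, date(:today, '-1 year')))
                    AS INTEGER) AS days
        FROM stays
    )
    SELECT stay_id, client_id, days,
           SUM(days) OVER (PARTITION BY client_id) AS client_days
    FROM data
    ORDER BY stay_id
""", {"today": TODAY}).fetchall()
```

With the pinned date, stays 4 and 5 come out at 21 and 19 nights, matching the question's table, and each row carries the total for its client.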
You need a subquery that computes the sum grouped by client_id, and a join between your table and that subquery, e.g.:
select Stay_id, client_id, Start_date, End_date, t.sum_duration
from your_table
inner join (
select Client_id,
SUM(CASE
WHEN StayDateStart < DATEADD(year,-1,CURRENT_TIMESTAMP) THEN DATEDIFF(day,DATEADD(year,-1,CURRENT_TIMESTAMP),ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
ELSE DATEDIFF(day,StayDateStart,ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
END) sum_duration
from your_table
group by Client_id
) t on t.Client_id = your_table.client_id

Error in executing two groupbys in sparkSQL

I am new to Spark SQL and was experimenting with some queries.
This is the query I am trying to execute:
sqlContext.sql("SELECT id, category, AVG(mark) FROM data GROUP BY id, category")
I am not getting the proper output when I run the query.
Instead of the actual value of category I am getting values such as 1, 2, 3.
I have been stuck on this weird error for a long time.
But when I run a simple SELECT statement, or a single GROUP BY, it works perfectly:
sqlContext.sql("SELECT id, category FROM data")
sqlContext.sql("SELECT id, AVG(mark) FROM data GROUP BY id")
What is wrong? Does Spark SQL have a problem with grouping by multiple columns?
Right now I am running this more complex query:
sqlContext.sql("SELECT data.id, data.category, AVG(id_avg.met_avg) FROM (SELECT id, AVG(mark) AS met_avg FROM data GROUP BY id) AS id_avg, data GROUP BY data.category, data.id")
This works, but it takes longer to execute.
Please help.
Sample data:
|id | category | marks
| 1 | a | 40
| 2 | b | 44
| 3 | a | 50
| 4 | b | 40
| 1 | a | 30
The output should be:
|id | category | avg
| 1 | a | 35
| 2 | b | 44
| 3 | a | 50
| 4 | b | 40
Please try this query:
SELECT
data.id
, data.category
, AVG(mark)
FROM data
GROUP BY
data.id
, data.category
Based on this sample data:
|id | category | marks
| 1 | a | 40
| 2 | b | 44
| 3 | a | 50
| 4 | b | 40
| 1 | a | 30
The output WILL be this:
|id | category | avg
| 1 | a | 35
| 2 | b | 44
| 3 | a | 50
| 4 | b | 40
and the following expected row cannot be produced using GROUP BY:
| 5 | a | 30
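For comparison, here is the two-column GROUP BY from the answer above run against the sample data in SQLite through Python's sqlite3 module. It only shows what a correct engine returns for this query; it is not Spark, and it says nothing about the Spark version in question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data (id INTEGER, category TEXT, mark INTEGER);
    INSERT INTO data VALUES
        (1, 'a', 40), (2, 'b', 44), (3, 'a', 50), (4, 'b', 40), (1, 'a', 30);
""")

# One output row per distinct (id, category) pair, with the mean mark.
averages = conn.execute("""
    SELECT id, category, AVG(mark) AS avg_mark
    FROM data
    GROUP BY id, category
    ORDER BY id, category
""").fetchall()
```

The category column comes back with its original values, so seeing 1, 2, 3 in its place points at the engine rather than at the query.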
That is a bug in Spark SQL.
Try the next version; it's fixed there.
I got the proper output using spark-1.0.2.
It also worked with pure Scala code. Try either of them :)

Distinct lists on dates where an ID is present (i.e. intersects) on consecutive dates

I'm trying to make an MSSQL query that produces lists of apartment prices. The ultimate goal of the query is to calculate the percentage change in average prices of apartments. However, this final calculation (namely taking averages) is something I can fix in code provided that the list(s) of prices that are retrieved are correct.
What makes this tricky is that apartments are sold and new ones added all the time, so when comparing prices from week to week (I have weekly data), I only want to compare prices for apartments that have a recorded price in weeks (t-1, t), (t, t+1), (t+1,t+2) etc. In other words, some apartments that had a recorded price in time (t-1) might not be there at time t, and some apartments may have been added at time t (and thus weren't there at time t-1). I only want to select prices in week t-1 and t where some ApartmentID exists in both week t-1 and t to calculate the average change in week t.
Example data
-------------------------------------------------------------
| RegistrationID | Date | Price | ApartmentID |
-------------------------------------------------------------
| 1 | 2014-04-04 | 5 | 1 |
| 2 | 2014-04-04 | 6 | 2 |
| 3 | 2014-04-04 | 4 | 3 |
| 4 | 2014-04-11 | 5.2 | 1 |
| 5 | 2014-04-11 | 4 | 3 |
| 6 | 2014-04-11 | 7 | 4 |
| 7 | 2014-04-19 | 5.1 | 1 |
| 8 | 2014-04-19 | 4.1 | 3 |
| 9 | 2014-04-19 | 7.1 | 4 |
| 10 | 2014-04-26 | 4.1 | 3 |
| 11 | 2014-04-26 | 7.2 | 4 |
-------------------------------------------------------------
Solution thoughts
I think it makes sense to produce two different lists, one for odd-numbered weeks and one for even-numbered weeks. List 1 would then contain Date, Price and ApartmentID that are valid for the tuples (t-1,t), (t+1,t+2), (t+3,t+4) etc. while list 2 would contain the same for the tuples (t,t+1),(t+2,t+3),(t+4,t+5) etc. The reason I think two lists are needed is that for any given week t, there are two sets of apartments and corresponding prices that need to be produced - one that is "forward compatible" and one that is "backwards compatible".
If two such lists can be produced, then the rest is simply an exercise in taking averages over each distinct date.
I'm not really sure where to begin here. I played around a little with INTERSECT, but I'm pretty sure I need to nest queries to get this to work.
Result
Using the methodology described above would yield two lists.
List 1:
Notice how RegistrationID 2 and 6 disappear because their apartments don't exist on both dates 2014-04-04 and 2014-04-11. The same goes for RegistrationID 7, as this apartment doesn't exist on both 2014-04-19 and 2014-04-26.
-------------------------------------------------------------
| RegistrationID | Date | Price | ApartmentID |
-------------------------------------------------------------
| 1 | 2014-04-04 | 5 | 1 |
| 3 | 2014-04-04 | 4 | 3 |
| 4 | 2014-04-11 | 5.2 | 1 |
| 5 | 2014-04-11 | 4 | 3 |
| 8 | 2014-04-19 | 4.1 | 3 |
| 9 | 2014-04-19 | 7.1 | 4 |
| 10 | 2014-04-26 | 4.1 | 3 |
| 11 | 2014-04-26 | 7.2 | 4 |
-------------------------------------------------------------
List 2:
Here, nothing disappears because every apartment is present in the tuples within the scope of this list.
-------------------------------------------------------------
| RegistrationID | Date | Price | ApartmentID |
-------------------------------------------------------------
| 4 | 2014-04-11 | 5.2 | 1 |
| 5 | 2014-04-11 | 4 | 3 |
| 6 | 2014-04-11 | 7 | 4 |
| 7 | 2014-04-19 | 5.1 | 1 |
| 8 | 2014-04-19 | 4.1 | 3 |
| 9 | 2014-04-19 | 7.1 | 4 |
-------------------------------------------------------------
Here's a solution. First, I get all the records from the table (I named it "ApartmentPrice"), computing WeekOf (the start of that record's week; with this DATEADD/DATEDIFF idiom that is a Monday, because day 0 is Monday 1900-01-01), PreviousWeek (the start of the previous week), and NextWeek (the start of the following week). I store that in a table variable (you could also put it in a CTE or a temp table).
declare @tempTable table(RegistrationId int, PriceDate date, Price decimal(8,2), ApartmentId int, WeekOf date, PreviousWeek date, NextWeek date)
insert @tempTable
select ap.RegistrationId,
ap.PriceDate,
ap.Price,
ap.ApartmentId,
DATEADD(ww, DATEDIFF(ww,0,ap.PriceDate), 0) WeekOf,
DATEADD(ww, DATEDIFF(ww,0,dateadd(wk, -1, ap.PriceDate)), 0) PreviousWeek,
DATEADD(ww, DATEDIFF(ww,0,dateadd(wk, 1, ap.PriceDate)), 0) NextWeek
from ApartmentPrice ap
Then I join that table variable to itself where WeekOf equals either NextWeek or PreviousWeek. This gives the apartments that have a record in the adjoining week.
select distinct t.RegistrationId, t.PriceDate, t.Price, t.ApartmentId
from @tempTable t
join @tempTable t2 on t.ApartmentId = t2.ApartmentId and (t.WeekOf = t2.PreviousWeek or t.WeekOf = t2.NextWeek)
order by t.RegistrationId, t.ApartmentId, t.PriceDate
I'm using distinct because an apartment will appear more than once in the results if it has records in both adjoining weeks.
You can also find the average prices for each week like this (note that avg(distinct ...) removes the duplicates introduced by the join, but it also counts two different apartments only once if they share the same price in a week):
select t.WeekOf, avg(distinct t.Price)
from @tempTable t
join @tempTable t2 on t.ApartmentId = t2.ApartmentId and (t.WeekOf = t2.PreviousWeek or t.WeekOf = t2.NextWeek)
group by t.WeekOf
order by t.WeekOf
Here's a SQL Fiddle. I added a few more rows to the test data to show that it handles dates that cross the end of the year boundary.
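The adjoining-week self-join can be tried out with the question's sample rows in SQLite through Python's sqlite3 module. This is a sketch under assumptions: the table and column names are hypothetical, and instead of WeekOf/PreviousWeek/NextWeek dates it uses a single integer week number (whole weeks since Monday 1900-01-01) and joins rows whose week numbers differ by exactly one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE apartment_price
                (registration_id INTEGER, price_date TEXT,
                 price REAL, apartment_id INTEGER)""")
conn.executemany("INSERT INTO apartment_price VALUES (?, ?, ?, ?)", [
    (1, "2014-04-04", 5.0, 1), (2, "2014-04-04", 6.0, 2),
    (3, "2014-04-04", 4.0, 3), (4, "2014-04-11", 5.2, 1),
    (5, "2014-04-11", 4.0, 3), (6, "2014-04-11", 7.0, 4),
    (7, "2014-04-19", 5.1, 1), (8, "2014-04-19", 4.1, 3),
    (9, "2014-04-19", 7.1, 4), (10, "2014-04-26", 4.1, 3),
    (11, "2014-04-26", 7.2, 4),
])

# week_no counts whole weeks since Monday 1900-01-01, so consecutive
# calendar weeks get consecutive integers, also across a year boundary.
kept = [row[0] for row in conn.execute("""
    WITH weekly AS (
        SELECT *,
               CAST((julianday(price_date) - julianday('1900-01-01')) / 7
                    AS INTEGER) AS week_no
        FROM apartment_price
    )
    SELECT DISTINCT a.registration_id
    FROM weekly AS a
    JOIN weekly AS b
      ON a.apartment_id = b.apartment_id
     AND ABS(a.week_no - b.week_no) = 1
    ORDER BY a.registration_id
""")]
```

Only RegistrationID 2 drops out, because apartment 2 has a price in a single week and therefore no adjoining-week partner.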

SQL Combine two tables with two parameters

I searched the forum for an hour and didn't find anything similar.
I have this problem: I want to compare two columns, ID and DATE. If they are the same in both tables, I want to put the number from table 2 next to the row. If they are not the same, I want to fill in the yearly quota that applies on that date. I am working in Access.
table1
id|date|state_on_date
1|30.12.2013|23
1|31.12.2013|25
1|1.1.2014|35
1|2.1.2014|12
2|30.12.2013|34
2|31.12.2013|65
2|1.1.2014|43
table2
id|date|year_quantity
1|31.12.2013|100
1|31.12.2014|150
2|31.12.2013|200
2|31.12.2014|300
I want to get:
table 3
id|date|state_on_date|year_quantity
1|30.12.2013|23|100
1|31.12.2013|25|100
1|1.1.2014|35|150
1|2.1.2014|12|150
2|30.12.2013|34|200
2|31.12.2013|65|200
2|1.1.2014|43|300
I tried joins and read forums but didn't find a solution.
Are you looking for this?
SELECT id, date, state_on_date,
(
SELECT TOP 1 year_quantity
FROM table2
WHERE id = t.id
AND date >= t.date
ORDER BY date
) AS year_quantity
FROM table1 t
Output:
| ID | DATE | STATE_ON_DATE | YEAR_QUANTITY |
|----|------------|---------------|---------------|
| 1 | 2013-12-30 | 23 | 100 |
| 1 | 2013-12-31 | 25 | 100 |
| 1 | 2014-01-01 | 35 | 150 |
| 1 | 2014-01-02 | 12 | 150 |
| 2 | 2013-12-30 | 34 | 200 |
| 2 | 2013-12-31 | 65 | 200 |
| 2 | 2014-01-01 | 43 | 300 |
Here is a SQLFiddle demo. It's for SQL Server but should work just fine in MS Access.
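The correlated subquery can also be checked quickly in SQLite through Python's sqlite3 module. This sketch makes a few assumptions: SQLite has no TOP, so LIMIT 1 takes its place, the date column is renamed to the hypothetical day, and dates are stored as ISO strings so that >= and ORDER BY compare them chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, day TEXT, state_on_date INTEGER);
    CREATE TABLE table2 (id INTEGER, day TEXT, year_quantity INTEGER);
    INSERT INTO table1 VALUES
        (1, '2013-12-30', 23), (1, '2013-12-31', 25),
        (1, '2014-01-01', 35), (1, '2014-01-02', 12),
        (2, '2013-12-30', 34), (2, '2013-12-31', 65),
        (2, '2014-01-01', 43);
    INSERT INTO table2 VALUES
        (1, '2013-12-31', 100), (1, '2014-12-31', 150),
        (2, '2013-12-31', 200), (2, '2014-12-31', 300);
""")

# For each table1 row, pick the quota row with the same id whose date is
# the earliest one on or after the row's own date.
result = conn.execute("""
    SELECT t.id, t.day, t.state_on_date,
           (SELECT q.year_quantity
            FROM table2 AS q
            WHERE q.id = t.id AND q.day >= t.day
            ORDER BY q.day
            LIMIT 1) AS year_quantity
    FROM table1 AS t
    ORDER BY t.id, t.day
""").fetchall()
```

A row dated after the last quota date for its id would get NULL in year_quantity, since the subquery finds no match.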