How can I create a retrospective trend over time using SQL?

I am having trouble creating a retrospective trend over time for the following table using only SQL:
user_id | Date of Exam    | Exam Name | Result
--------+-----------------+-----------+-------
1       | 2013-01-01 6:00 | Geography | PASS
1       | 2013-01-02 6:00 | Math      | FAIL
1       | 2013-01-03 6:00 | Geography | FAIL
1       | 2013-01-04 6:00 | Biology   | FAIL
1       | 2013-01-04 7:00 | Biology   | PASS
1       | 2013-01-04 6:00 | Math      | FAIL
1       | 2013-01-04 7:00 | Math      | PASS
2       | 2013-01-04 7:00 | Math      | FAIL
I need the pass rate for each day in a given date range. For a given day X, I need the latest result available as of that day for each student: if a student has no result on day X, take the one from the previous day; if that is also empty, take the day before, and so on. If a student has multiple results on one day, only the latest one counts and the older ones are ignored. I then need the pass percentage per exam for each day. The resulting table should look like this:
Exam Name | 2013-01-01 | 2013-01-02 | 2013-01-03 | 2013-01-04
----------+------------+------------+------------+-----------
Geography | 100%       | 100%       | 0%         | 0%
Math      | NULL       | 0%         | 0%         | 50%
Biology   | NULL       | NULL       | NULL       | 100%
So far I have only managed to return a separate table for each day, but I think it should be possible to merge them into a single table. This is the query that gets the latest results as of a specific day:
select ExamName, COUNT(*) as TotalCount,
       sum(case when Result = 'PASS' then 1 else 0 end) PassCount
from (SELECT
          UserID,
          ExamName,
          Result,
          DateOfExam,
          ROW_NUMBER() OVER (Partition BY UserID, ExamName Order By DateOfExam DESC) AS RowNum
      From dbo.ExamResults
      where DateOfExam <= '2013-01-04 7:00'
     ) T1
where T1.RowNum = 1
group by ExamName
SQLFiddle with some DDL: http://sqlfiddle.com/#!6/6fde8/2/0
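One possible way to merge the per-day queries is to join the results against a list of days and rank within each day. The sketch below is not from the original post: it runs the idea against an in-memory SQLite database from Python so it can be executed as is, and the Days list, the AsOfDay column and the zero-padded timestamps are assumptions made for the illustration. The ROW_NUMBER logic is the same as in the query above, with the day added to the partition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ExamResults (UserID INTEGER, DateOfExam TEXT, ExamName TEXT, Result TEXT);
INSERT INTO ExamResults VALUES
  (1, '2013-01-01 06:00', 'Geography', 'PASS'),
  (1, '2013-01-02 06:00', 'Math',      'FAIL'),
  (1, '2013-01-03 06:00', 'Geography', 'FAIL'),
  (1, '2013-01-04 06:00', 'Biology',   'FAIL'),
  (1, '2013-01-04 07:00', 'Biology',   'PASS'),
  (1, '2013-01-04 06:00', 'Math',      'FAIL'),
  (1, '2013-01-04 07:00', 'Math',      'PASS'),
  (2, '2013-01-04 07:00', 'Math',      'FAIL');
""")

query = """
WITH Days(AsOfDay) AS (  -- hypothetical day list; a calendar table would also work
  VALUES ('2013-01-01'), ('2013-01-02'), ('2013-01-03'), ('2013-01-04')
),
Ranked AS (
  -- every result taken on or before each day, newest first per student and exam
  SELECT d.AsOfDay, r.ExamName, r.Result,
         ROW_NUMBER() OVER (PARTITION BY d.AsOfDay, r.UserID, r.ExamName
                            ORDER BY r.DateOfExam DESC) AS RowNum
  FROM Days d
  JOIN ExamResults r ON date(r.DateOfExam) <= d.AsOfDay
)
SELECT ExamName, AsOfDay,
       100.0 * SUM(CASE WHEN Result = 'PASS' THEN 1 ELSE 0 END) / COUNT(*) AS PassPct
FROM Ranked
WHERE RowNum = 1
GROUP BY ExamName, AsOfDay
"""

# Long format: one entry per (exam, day); missing keys correspond to NULL cells.
pass_pct = {(exam, day): pct for exam, day, pct in conn.execute(query)}
```

The result is in long format (one row per exam and day). Turning the days into columns still needs a pivot step, either in the database or in the client.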

Related

Logic to read multiple rows in a table where flag = 'Y'

Consider the following scenario. I have a Customer table, which includes RowStart and EndDate logic, thus writing a new row every time a field value is updated.
Relevant fields in this table are:
RowStartDate
RowEndDate
CustomerNumber
EmployeeFlag
I'd like to write a query that returns an employee's period of tenure (EmploymentStartDate and EmploymentEndDate), that is, the RowStartDate when EmployeeFlag first became 'Y' and the first subsequent RowStartDate where EmployeeFlag changed to 'N' (ordered, of course, by RowStartDate ascending). An additional complexity is that the flag may change between Y and N multiple times for a single person, as they may become staff, resign, and then be employed again at a later date.
Example table structure is:
| CustomerNo | StaffFlag | RowStartDate | RowEndDate |
| ---------- | --------- | ------------ | ---------- |
| 12 | N | 2019-01-01 | 2019-01-14 |
| 12 | N | 2019-01-14 | 2019-03-02 |
| 12 | Y | 2019-03-02 | 2019-10-12 |
| 01 | Y | 2020-03-13 | NULL |
| 12 | N | 2019-10-12 | 2020-01-01 |
| 12 | Y | 2020-01-01 | NULL |
Output could be something like
| CustomerNo | StaffStartDate | StaffEndDate |
| ---------- | -------------- | ------------ |
| 12 | 2019-03-02 | 2019-10-12 |
| 01 | 2020-03-13 | NULL |
| 12 | 2020-01-01 | NULL |
Any ideas on how I might be able to solve this would be really appreciated.
Make sure you order the rows by ID and by dates:
select *
from yourtable
order by CustomerNumber asc,
         EmployeeFlag desc,
         RowStartDate asc,
         RowEndDate asc
This gives you a list of all changes over time per employee.
Next, you want to map each pair of rows into a single row with two columns (the two dates become the overall start and end date). Others have done this using the lead() function. For details, see: Merging every two rows of data in a column in SQL Server
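As an alternative to pairing rows with lead(), the tenure periods can be treated as a gaps-and-islands problem. The sketch below is an illustration rather than the answerer's method: it uses the difference of two ROW_NUMBER values to group consecutive rows that share a flag, and it runs on an in-memory SQLite database from Python. The table name customer and the column island are assumptions; the data is the example from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (CustomerNo TEXT, StaffFlag TEXT, RowStartDate TEXT, RowEndDate TEXT);
INSERT INTO customer VALUES
  ('12', 'N', '2019-01-01', '2019-01-14'),
  ('12', 'N', '2019-01-14', '2019-03-02'),
  ('12', 'Y', '2019-03-02', '2019-10-12'),
  ('01', 'Y', '2020-03-13', NULL),
  ('12', 'N', '2019-10-12', '2020-01-01'),
  ('12', 'Y', '2020-01-01', NULL);
""")

query = """
WITH marked AS (
  -- consecutive rows with the same flag get the same difference of row numbers
  SELECT CustomerNo, StaffFlag, RowStartDate, RowEndDate,
         ROW_NUMBER() OVER (PARTITION BY CustomerNo ORDER BY RowStartDate)
       - ROW_NUMBER() OVER (PARTITION BY CustomerNo, StaffFlag ORDER BY RowStartDate) AS island
  FROM customer
)
SELECT CustomerNo,
       MIN(RowStartDate) AS StaffStartDate,
       -- an open-ended row (NULL end date) keeps the whole period open
       CASE WHEN COUNT(*) = COUNT(RowEndDate) THEN MAX(RowEndDate) END AS StaffEndDate
FROM marked
WHERE StaffFlag = 'Y'
GROUP BY CustomerNo, island
ORDER BY StaffStartDate
"""

tenures = conn.execute(query).fetchall()
for row in tenures:
    print(row)
```

Each 'Y' island becomes one output row, so an employee who is rehired appears once per period of employment.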

Can I put a condition on a window function in Redshift?

I have an events-based table in Redshift. I want to tie all events to the FIRST event in the series, provided that event was in the N-hours preceding this event.
If all I cared about was the very first row, I'd simply do:
SELECT
    event_time
    ,first_value(event_time)
        OVER (ORDER BY event_time rows unbounded preceding) as first_time
FROM
    my_table
But because I only want to tie this to the first event in the past N-hours, I want something like:
SELECT
    event_time
    ,first_value(event_time)
        OVER (ORDER BY event_time rows between [N-hours ago] and current row) as first_time
FROM
    my_table
A little background on my table: it holds user actions, so effectively a user logs on, performs 1-100 actions, and then leaves. Most users visit 1-10 times per day. Sessions rarely last over an hour, so I could set N=1.
If I just use PARTITION BY date_trunc('hour', event_time), sessions that span an hour boundary will be split in two.
Assume my_table looks like
id | user_id | event_time
----------------------------------
1 | 123 | 2015-01-01 01:00:00
2 | 123 | 2015-01-01 01:15:00
3 | 123 | 2015-01-01 02:05:00
4 | 123 | 2015-01-01 13:10:00
5 | 123 | 2015-01-01 13:20:00
6 | 123 | 2015-01-01 13:30:00
My goal is to get a result that looks like
id | parent_id | user_id | event_time
----------------------------------
1 | 1 | 123 | 2015-01-01 01:00:00
2 | 1 | 123 | 2015-01-01 01:15:00
3 | 1 | 123 | 2015-01-01 02:05:00
4 | 4 | 123 | 2015-01-01 13:10:00
5 | 4 | 123 | 2015-01-01 13:20:00
6 | 4 | 123 | 2015-01-01 13:30:00
As of now, the answer appears to be "no".
SQL Server lets you use RANGE instead of ROWS in the frame, which allows the frame to compare values against the current row's value.
https://www.simple-talk.com/sql/learn-sql-server/window-functions-in-sql-server-part-2-the-frame/
When I attempt this syntax in Redshift, I get the error "Range is not yet supported".
Someone update this when that "yet" changes!
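A workaround that avoids a RANGE frame is to mark session starts with LAG and number the sessions with a running sum. The sketch below is an illustration only and was not part of the answer above. It runs on SQLite from Python, so the date arithmetic (strftime('%s', ...)) would have to be swapped for Redshift's own functions, and the names with_prev, flagged, numbered, is_start and session_no are made up for the example. Note that it starts a new session when the gap to the previous event exceeds one hour, which is what the desired output implies (event 3 is more than an hour after event 1 but still belongs to it).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_table (id INTEGER, user_id INTEGER, event_time TEXT);
INSERT INTO my_table VALUES
  (1, 123, '2015-01-01 01:00:00'),
  (2, 123, '2015-01-01 01:15:00'),
  (3, 123, '2015-01-01 02:05:00'),
  (4, 123, '2015-01-01 13:10:00'),
  (5, 123, '2015-01-01 13:20:00'),
  (6, 123, '2015-01-01 13:30:00');
""")

query = """
WITH with_prev AS (
  SELECT id, user_id, event_time,
         LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time) AS prev_time
  FROM my_table
),
flagged AS (
  -- a session starts at the first event or after a gap of more than 3600 seconds
  SELECT *,
         CASE WHEN prev_time IS NULL
                OR CAST(strftime('%s', event_time) AS INTEGER)
                 - CAST(strftime('%s', prev_time) AS INTEGER) > 3600
              THEN 1 ELSE 0 END AS is_start
  FROM with_prev
),
numbered AS (
  -- running count of session starts gives each event its session number
  SELECT *,
         SUM(is_start) OVER (PARTITION BY user_id ORDER BY event_time
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS session_no
  FROM flagged
)
SELECT id,
       FIRST_VALUE(id) OVER (PARTITION BY user_id, session_no ORDER BY event_time
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS parent_id,
       user_id, event_time
FROM numbered
ORDER BY id
"""

parents = [(row[0], row[1]) for row in conn.execute(query)]
print(parents)
```

Only ROWS frames are used here, which is the frame type the question already shows working in Redshift.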

How to determine an Increase in Employee Salary from consecutive Contract Rows?

I have a problem with my query.
My table stores data like this:
ContractID | Staff_ID | EffectDate | End Date | Salary | active
-------------------------------------------------------------------------
1 | 1 | 2013-01-01 | 2013-12-30 | 100 | 0
2 | 1 | 2014-01-01 | 2014-12-30 | 150 | 0
3 | 1 | 2015-01-01 | 2015-12-30 | 200 | 1
4 | 2 | 2014-05-01 | 2015-04-30 | 500 | 0
5 | 2 | 2015-05-01 | 2016-04-30 | 700 | 1
I would like to write a query like below:
ContractID | Staff_ID | EffectDate | End Date | Salary | Increase
-------------------------------------------------------------------------
1 | 1 | 2013-01-01 | 2013-12-30 | 100 | 0
2 | 1 | 2014-01-01 | 2014-12-30 | 150 | 50
3 | 1 | 2015-01-01 | 2015-12-30 | 200 | 50
4 | 2 | 2014-05-01 | 2015-04-30 | 500 | 0
5 | 2 | 2015-05-01 | 2016-04-30 | 700 | 200
-------------------------------------------------------------------------
The Increase column is the current contract's salary minus the previous contract's salary.
I use SQL Server 2008 R2.
Unfortunately, SQL Server 2008 R2 does not have LAG, but you can simulate fetching the previous row (prev) from the current row (cur) by numbering the rows with ROW_NUMBER and self-joining each row to the previous-numbered row within the same Staff_ID partition:
With CTE AS
(
    SELECT [ContractID], [Staff_ID], [EffectDate], [End Date], [Salary], [active],
           ROW_NUMBER() OVER (Partition BY Staff_ID ORDER BY ContractID) AS Rnk
    FROM Table1
)
SELECT cur.[ContractID], cur.[Staff_ID], cur.[EffectDate], cur.[End Date],
       cur.[Salary], cur.Rnk,
       CASE WHEN (cur.Rnk = 1) THEN 0 -- i.e. baseline salary
            ELSE cur.Salary - prev.Salary END AS Increase
FROM CTE cur
LEFT OUTER JOIN CTE prev
    ON cur.[Staff_ID] = prev.Staff_ID and cur.Rnk - 1 = prev.Rnk;
(If ContractID were always perfectly incrementing, we would not need the ROW_NUMBER and could join on consecutive ContractIDs, but I did not want to make this assumption.)
SqlFiddle here
Edit
If you have SQL Server 2012 or later, the LEAD and LAG analytic functions make this kind of query much simpler:
SELECT [ContractID], [Staff_ID], [EffectDate], [End Date], [Salary],
Salary - LAG(Salary, 1, Salary) OVER (Partition BY Staff_ID ORDER BY ContractID) AS Incr
FROM Table1
Updated SqlFiddle
One trick here: the third argument of LAG is the default returned when there is no previous row. Passing Salary there means that, for each employee's first contract, the expression evaluates to Salary - Salary = 0.

How to pivot two date columns and fill in the gaps with multiple overlapping date periods

I have a table with employee absence entries. Each row contains the employee number, the first and last day of absence, and a lot more data such as absence type, approval status, etc.
absencecalendarline:
EMPLOYEENUMBER | FIRSTDAYOFABSENCE | LASTDAYOFABSENCE | ABSENCETYPE | APPROVED
---------------+-------------------+------------------+-------------+----------
1 | 2013-01-01 | 2013-01-04 | VACATION | TRUE
2 | 2013-01-01 | 2013-01-02 | VACATION | TRUE
3 | 2013-02-05 | 2013-02-08 | VACATION | TRUE
2 | 2013-02-06 | 2013-02-07 | VACATION | TRUE
I would like to create a view that lists the absence entries with one row per date, something like this:
desired result:
EMPLOYEENUMBER | ABSENCEDATE | ABSENCETYPE | APPROVED
---------------+-------------+-------------+----------
1 | 2013-01-01 | VACATION | TRUE
1 | 2013-01-02 | VACATION | TRUE
1 | 2013-01-03 | VACATION | TRUE
1 | 2013-01-04 | VACATION | TRUE
2 | 2013-01-01 | VACATION | TRUE
2 | 2013-01-02 | VACATION | TRUE
3 | 2013-02-05 | VACATION | TRUE
.. .. .. ..
3 | 2013-02-08 | VACATION | TRUE
2 | 2013-02-06 | VACATION | TRUE
2 | 2013-02-07 | VACATION | TRUE
I also have a date table, CALENDARDAY loaded with all dates in the calendar and related information like week numbers, months etc. to help me with the date population.
My attempt at this query has resulted in the following code:
SELECT unpvt.EMPLOYEENUMBER, unpvt.FIRSTORLAST, unpvt.ABSENCEDATE, unpvt.FIRSTABSENCE,
       unpvt.LASTABSENCE, unpvt.ABSENCETYPE, unpvt.APPROVED, cd.THEDATE, cd.WEEKNUMBER,
       (SELECT TOP 1 EMPLOYEENUMBER
        FROM dbo.ABSENCECALENDARLINE asq
        WHERE cd.THEDATE BETWEEN asq.FIRSTDAYOFABSENCE AND asq.LASTDAYOFABSENCE
        ORDER BY cd.THEDATE DESC) EMPLOYEENUMBER
FROM
    (SELECT EMPLOYEENUMBER, FIRSTDAYOFABSENCE, LASTDAYOFABSENCE,
            FIRSTDAYOFABSENCE AS FIRSTABSENCE, LASTDAYOFABSENCE AS LASTABSENCE,
            ABSENCETYPE, APPROVED
     FROM dbo.ABSENCECALENDARLINE acl) a
UNPIVOT
    (ABSENCEDATE FOR FIRSTORLAST IN
        (FIRSTDAYOFABSENCE, LASTDAYOFABSENCE)
    ) AS unpvt
RIGHT JOIN dbo.CALENDARDAY cd ON unpvt.ABSENCEDATE = cd.THEDATE
WHERE CAST(THEDATE AS datetime) BETWEEN '2013-01-01' AND '2013-12-31'
ORDER BY THEDATE
The problem with this is that the SELECT subquery requires a TOP 1, so when absences overlap, only one of the employees absent on a given date is returned. A COUNT on this column returns the number of people absent on that day.
Am I overcomplicating this? How can I easily achieve my desired result? Any help would be greatly appreciated.
Best regards,
Alexander
Unless I'm missing something, I think you're overcomplicating this:
SELECT a.EMPLOYEENUMBER, b.DATE, a.ABSENCETYPE, a.APPROVED
FROM Table1 a
JOIN Calendar b
    ON b.Date BETWEEN a.FIRSTDAYOFABSENCE AND a.LASTDAYOFABSENCE
Demo: SQL Fiddle
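To see the range join at work on the question's own table and column names, here is a small runnable sketch. It is only an illustration under assumptions: it uses an in-memory SQLite database from Python instead of SQL Server, the calendar table is filled for January and February 2013 only, and APPROVED is stored as text.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ABSENCECALENDARLINE (
  EMPLOYEENUMBER INTEGER, FIRSTDAYOFABSENCE TEXT, LASTDAYOFABSENCE TEXT,
  ABSENCETYPE TEXT, APPROVED TEXT);
INSERT INTO ABSENCECALENDARLINE VALUES
  (1, '2013-01-01', '2013-01-04', 'VACATION', 'TRUE'),
  (2, '2013-01-01', '2013-01-02', 'VACATION', 'TRUE'),
  (3, '2013-02-05', '2013-02-08', 'VACATION', 'TRUE'),
  (2, '2013-02-06', '2013-02-07', 'VACATION', 'TRUE');
CREATE TABLE CALENDARDAY (THEDATE TEXT);
""")

# Fill the calendar with 2013-01-01 .. 2013-02-28 (59 days) as ISO date strings.
first_day = date(2013, 1, 1)
conn.executemany(
    "INSERT INTO CALENDARDAY VALUES (?)",
    [(str(first_day + timedelta(days=i)),) for i in range(59)],
)

query = """
SELECT a.EMPLOYEENUMBER, cd.THEDATE AS ABSENCEDATE, a.ABSENCETYPE, a.APPROVED
FROM ABSENCECALENDARLINE a
JOIN CALENDARDAY cd
  ON cd.THEDATE BETWEEN a.FIRSTDAYOFABSENCE AND a.LASTDAYOFABSENCE
ORDER BY a.FIRSTDAYOFABSENCE, a.EMPLOYEENUMBER, cd.THEDATE
"""

absence_days = conn.execute(query).fetchall()
for row in absence_days:
    print(row)
```

Because every absence row is matched to each calendar day inside its range, overlapping absences of different employees each get their own rows, so no TOP 1 subquery is needed.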

Get the highest odds from the last update

I have these tables in a PostgreSQL database:
bookmakers
-----------------------
| id | name |
-----------------------
| 1 | Unibet |
-----------------------
| 2 | 888 |
-----------------------
odds
---------------------------------------------------------------------
| id | odds_type | odds_index | bookmaker_id | created_at |
---------------------------------------------------------------------
| 1 | 1 | 1.55 | 1 | 2012-06-02 10:30 |
---------------------------------------------------------------------
| 2 | 2 | 3.22 | 2 | 2012-06-02 10:30 |
---------------------------------------------------------------------
| 3 | X | 3.00 | 1 | 2012-06-02 10:30 |
---------------------------------------------------------------------
| 4 | 2 | 1.25 | 1 | 2012-05-27 09:30 |
---------------------------------------------------------------------
| 5 | 1 | 2.30 | 2 | 2012-05-27 09:30 |
---------------------------------------------------------------------
| 6 | X | 2.00 | 2 | 2012-05-27 09:30 |
---------------------------------------------------------------------
What I am trying to query is the following:
Give me the 1/X/2 odds from the latest update (created_at) from ALL bookmakers and from that last update, give me the highest odds for each odds_type ('1', '2', 'X').
On my website I display them as:
Best odds right now: 1 | X | 2
--------------------
2.30 | 3.00 | 3.22
I have to first get the latest, because the odds from the update from yesterday are no longer valid. Then from that last update, I have - in this case - 2 odds from 2 different bookmakers, so I need to get the best one for type '1','2','X'.
Pseudo SQL would be something like:
SELECT MAX(odds_index) WHERE odds_type = '1' ORDER BY created_at DESC, odds_index DESC
But that doesn't work, because I would always get the latest odds (and not the highest/best among those latest).
I hope I'm making sense.
Subqueries to the rescue!
select o1.odds_type, max(o1.odds_index)
from odds o1
inner join (select odds_type, max(created_at) as created_at
            from odds group by odds_type) o2
    on o1.odds_type = o2.odds_type
    and o1.created_at = o2.created_at
group by o1.odds_type
SQLFiddle: http://sqlfiddle.com/#!3/47df4/3
Your words "from the last update" contradict your example, so here are two methods.
To get the odds from the last update, first fetch the maximum created_at (the last update) and then filter on it:
declare @max_date datetime -- datetime rather than date, since created_at includes a time
select @max_date = max(created_at) from odds
select odds_type, odds_index
from odds
where created_at = @max_date
Or to match your example
select odds_type, odds_index
from odds
group by odds_type
having created_at = max(created_at)
Note: Different DBMS give different results depending on the select columns and whether there are more columns than in the group by clause.
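For the literal reading of the question (best odds per type within the single latest update), a scalar subquery combined with GROUP BY is one option. The sketch below is an illustration, not one of the posted answers, and it runs on an in-memory SQLite database from Python rather than PostgreSQL; the query itself uses only standard constructs. With the sample data it returns the odds from the 2012-06-02 update only, which differs from the figures in the question's example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE odds (id INTEGER, odds_type TEXT, odds_index REAL,
                   bookmaker_id INTEGER, created_at TEXT);
INSERT INTO odds VALUES
  (1, '1', 1.55, 1, '2012-06-02 10:30'),
  (2, '2', 3.22, 2, '2012-06-02 10:30'),
  (3, 'X', 3.00, 1, '2012-06-02 10:30'),
  (4, '2', 1.25, 1, '2012-05-27 09:30'),
  (5, '1', 2.30, 2, '2012-05-27 09:30'),
  (6, 'X', 2.00, 2, '2012-05-27 09:30');
""")

query = """
SELECT odds_type, MAX(odds_index) AS best_odds
FROM odds
WHERE created_at = (SELECT MAX(created_at) FROM odds)  -- rows of the latest update only
GROUP BY odds_type
"""

best_odds = dict(conn.execute(query).fetchall())
print(best_odds)
```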