Postgres GROUP BY looking at date ranges - sql

I have a table with the history of the "Code" value changes. Every month this table gets a new record with the new value of the "Code" for the specified month.
+----------+------------+------------+------+
| Employee | FromDate | ToDate | Code |
+----------+------------+------------+------+
| Employee | 01/07/2016 | 31/07/2016 | 4 |
| Employee | 01/06/2016 | 30/06/2016 | 2 |
| Employee | 01/05/2016 | 31/05/2016 | 2 |
| Employee | 01/04/2016 | 30/04/2016 | 3 |
| Employee | 01/03/2016 | 31/03/2016 | 3 |
| Employee | 01/02/2016 | 29/02/2016 | 4 |
| Employee | 01/01/2016 | 31/01/2016 | 4 |
+----------+------------+------------+------+
I need to group this data so that I get a new record every time "Code" changes, taking the min value of "FromDate" and the max value of "ToDate". Data must be ordered descending by "FromDate". With my query I got this result:
+----------+------------+------------+------+
| Employee | FromDate | ToDate | Code |
+----------+------------+------------+------+
| Employee | 01/05/2016 | 30/06/2016 | 2 |
| Employee | 01/03/2016 | 30/04/2016 | 3 |
| Employee | 01/01/2016 | 31/07/2016 | 4 |
+----------+------------+------------+------+
It works fine, but if the same "Code" has more than one date range (see code 4 in the first table) I get a single row per code. I would like to get the result below, with code 4 in 2 records, because its period is not continuous but is broken up by other codes (3 and 2):
+----------+------------+------------+------+
| Employee | FromDate | ToDate | Code |
+----------+------------+------------+------+
| Employee | 01/07/2016 | 31/07/2016 | 4 |
| Employee | 01/05/2016 | 30/06/2016 | 2 |
| Employee | 01/03/2016 | 30/04/2016 | 3 |
| Employee | 01/01/2016 | 29/02/2016 | 4 |
+----------+------------+------------+------+
I use this query:
SELECT
d."Employee",
MIN (d."FromDate") AS "FromDate",
MAX (d."ToDate") AS "ToDate",
d."Code"
FROM
(
SELECT
"Employees"."FromDate",
"Employees"."ToDate",
"Employees"."Code",
"Employees"."Employee"
FROM
schema_estelspa."Employees"
ORDER BY
"Employees"."FromDate" DESC
) d
GROUP BY
d."Code",
d."Employee"
ORDER BY
(MIN(d."FromDate")) DESC
Is there any trick to get the result I desired?
Date format is: dd/MM/yyyy

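You need to build the date range yourself and make the start date one of the GROUP BY columns, which requires a self join. I prepared the following SQL in Teradata; please adapt it for your database (COALESCE is used as an if-null expression; you can use NVL or a CASE expression instead). Note that a single self join only links a row to its immediate predecessor, so a run of three or more consecutive rows with the same code is not collapsed into one row.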
Query:
SELECT E.EMPLOYEE, E.CODE, COALESCE(ET1.FROMDATE, E.FROMDATE) AS FROM_DATE, MAX(E.TODATE) AS TO_D
FROM EMP_TEST E
LEFT OUTER JOIN EMP_TEST ET1
ON E.EMPLOYEE=ET1.EMPLOYEE
AND E.CODE=ET1.CODE
AND E.FromDate=ET1.ToDate+1
GROUP BY 1,2,3
ORDER BY FROM_DATE
Output:
Employee Code FROM_DATE TO_D
1 Employee 4 1/1/2016 2/29/2016
2 Employee 2 5/1/2016 6/30/2016
3 Employee 4 7/1/2016 7/31/2016
4 Employee 3 3/1/2016 4/30/2016

Standard recursive solution for connecting the dots:
In practice, half-open intervals (lower_limit <= X < upper_limit) are easier to work with.
Recursion starts with any segment that does not have a lower neighbor.
Adjacent segments are glued to the right side, building longer chains.
The final query suppresses the partial results.
Note: the code below does not deal with overlapping intervals.
-- Table
CREATE TABLE ecode
( employee varchar NOT NULL
, code INTEGER NOT NULL
, fromdate DATE NOT NULL
, uptodate DATE NOT NULL
);
SET datestyle = 'DMY' ;
-- Data
INSERT INTO ecode(employee, fromdate, uptodate, code) VALUES
('Employee','01/07/2016','31/07/2016', 4)
, ('Employee','01/06/2016','30/06/2016', 2)
, ('Employee','01/05/2016','31/05/2016', 2)
, ('Employee','01/04/2016','30/04/2016', 3)
, ('Employee','01/03/2016','31/03/2016', 3)
, ('Employee','01/02/2016','29/02/2016', 4)
, ('Employee','01/01/2016','31/01/2016', 4)
;
-- Convert to half-open interval
UPDATE ecode SET uptodate = uptodate + '1 day'::interval;
-- SELECT * FROM ecode;
WITH RECURSIVE zzz AS (
SELECT employee, code, fromdate, uptodate
FROM ecode e0
WHERE NOT EXISTS ( -- first one in series
SELECT * FROM ecode nx
WHERE nx.employee = e0.employee
AND nx.code = e0.code
AND nx.uptodate = e0.fromdate
)
UNION ALL -- append consecutive intervals
SELECT e1.employee, e1.code, zzz.fromdate, e1.uptodate
FROM ecode e1
JOIN zzz ON zzz.employee = e1.employee
AND zzz.code = e1.code
AND zzz.uptodate = e1.fromdate
)
SELECT * FROM zzz
-- suppress the partial results
WHERE NOT EXISTS (SELECT * FROM ecode nx
WHERE nx.employee = zzz.employee
AND nx.code = zzz.code
AND nx.fromdate = zzz.uptodate
)
ORDER BY employee, code, fromdate
;
Result:
DROP SCHEMA
CREATE SCHEMA
SET
CREATE TABLE
SET
INSERT 0 7
UPDATE 7
employee | code | fromdate | uptodate
----------+------+------------+------------
Employee | 2 | 2016-05-01 | 2016-07-01
Employee | 3 | 2016-03-01 | 2016-05-01
Employee | 4 | 2016-01-01 | 2016-03-01
Employee | 4 | 2016-07-01 | 2016-08-01
(4 rows)
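For comparison, the same grouping can be sketched without recursion using the common "gaps and islands" pattern: the difference between two ROW_NUMBER() sequences stays constant within one uninterrupted run of a code. The sketch below is only an illustration under assumptions: it runs the SQL through Python's sqlite3 module as a stand-in for Postgres (the window-function syntax is the same in both), stores dates as ISO strings, and treats consecutive rows as adjacent without checking that the dates actually touch.

```python
import sqlite3

# In-memory SQLite stands in for Postgres; dates are ISO strings so they sort correctly.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ecode (employee TEXT, code INTEGER, fromdate TEXT, todate TEXT)"
)
conn.executemany(
    "INSERT INTO ecode VALUES (?, ?, ?, ?)",
    [
        ("Employee", 4, "2016-07-01", "2016-07-31"),
        ("Employee", 2, "2016-06-01", "2016-06-30"),
        ("Employee", 2, "2016-05-01", "2016-05-31"),
        ("Employee", 3, "2016-04-01", "2016-04-30"),
        ("Employee", 3, "2016-03-01", "2016-03-31"),
        ("Employee", 4, "2016-02-01", "2016-02-29"),
        ("Employee", 4, "2016-01-01", "2016-01-31"),
    ],
)

ISLANDS_SQL = """
SELECT employee, MIN(fromdate) AS fromdate, MAX(todate) AS todate, code
FROM (
    SELECT e.*,
           -- position in the employee's timeline minus position within the code:
           -- the difference is the same for all rows of one uninterrupted run
           ROW_NUMBER() OVER (PARTITION BY employee ORDER BY fromdate)
         - ROW_NUMBER() OVER (PARTITION BY employee, code ORDER BY fromdate) AS grp
    FROM ecode AS e
) AS t
GROUP BY employee, code, grp
ORDER BY MIN(fromdate) DESC
"""

rows = conn.execute(ISLANDS_SQL).fetchall()
for row in rows:
    print(row)
```

On the sample data this yields four rows, with code 4 appearing twice (January to February and July), which is the shape the question asks for.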

Related

Find the first order of a supplier in a day using SQL

I am trying to write a query to return supplier ID (sup_id), order date and the order ID of the first order (based on earliest time).
+--------+--------+------------+--------+-----------------+
|orderid | sup_id | items | sales | order_ts |
+--------+--------+------------+--------+-----------------+
|1111132 | 3 | 1 | 27,0 | 24/04/17 13:00 |
|1111137 | 3 | 2 | 69,0 | 02/02/17 16:30 |
|1111147 | 1 | 1 | 87,0 | 25/04/17 08:25 |
|1111153 | 1 | 3 | 82,0 | 05/11/17 10:30 |
|1111155 | 2 | 1 | 29,0 | 03/07/17 02:30 |
|1111160 | 2 | 2 | 44,0 | 30/01/17 20:45 |
|....... | ... | ... | ... | ... ... |
+--------+--------+------------+--------+-----------------+
Output I am looking for:
+--------+--------+------------+
| sup_id | date | order_id |
+--------+--------+------------+
|....... | ... | ... |
+--------+--------+------------+
I tried using a subquery in the join clause as below but didn't know how to join it without having selected order_id.
SELECT sup_id, date(order_ts), order_id
FROM sales s
JOIN
(
SELECT sup_id, date(order_ts) as date, min(time(order_date))
FROM sales
GROUP BY merchant_id, date
) m
on ...
Kindly assist.
You can use not exists:
select *
from sales
where not exists (
-- find sales for the same supplier, earlier time, same day
select *
from sales as older
where older.sup_id = sales.sup_id
and older.order_ts < sales.order_ts
and older.order_ts >= cast(sales.order_ts as date)
)
The query below might not be the fastest in the world, but it should give you all information you need.
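select order_id, sup_id, items, sales, order_ts
from sales s
where order_ts = (
-- earliest order of the same supplier on the same day
select min(order_ts)
from sales m
where m.sup_id = s.sup_id
and cast(m.order_ts as date) = cast(s.order_ts as date)
)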
select sup_id, min(order_ts), min(order_id) from sales
where cast(order_ts as date) = '2022-03-15'
group by sup_id
This assumes orderid is an identity / auto-increment column, so the lowest id of the day is also the earliest order.
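Another way to pick the first order per supplier per day is to rank the orders inside each (supplier, day) partition and keep rank 1. The following is a minimal sketch, assuming a database with window functions; it uses Python's sqlite3 module only to make the SQL runnable, stores timestamps as ISO strings instead of the dd/mm/yy format shown in the question, and includes two made-up rows (marked in the comments) so that a day has more than one order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (orderid INTEGER, sup_id INTEGER, order_ts TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (1111132, 3, "2017-04-24 13:00"),
        (1111133, 3, "2017-04-24 09:15"),  # made-up row: same supplier and day, earlier
        (1111137, 3, "2017-02-02 16:30"),
        (1111147, 1, "2017-04-25 08:25"),
        (1111148, 1, "2017-04-25 18:40"),  # made-up row: same supplier and day, later
    ],
)

FIRST_ORDER_SQL = """
SELECT sup_id, order_date, orderid
FROM (
    SELECT sup_id,
           date(order_ts) AS order_date,
           orderid,
           -- number the orders of one supplier on one day, earliest first
           ROW_NUMBER() OVER (
               PARTITION BY sup_id, date(order_ts)
               ORDER BY order_ts, orderid
           ) AS rn
    FROM sales
) AS ranked
WHERE rn = 1
ORDER BY sup_id, order_date
"""

first_orders = conn.execute(FIRST_ORDER_SQL).fetchall()
for row in first_orders:
    print(row)
```

Adding orderid as a tie-breaker in the ORDER BY keeps the result deterministic when two orders share the same timestamp.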

Repeating ID based on

I have a very simple requirement but I'm struggling to find a way around this.
I have a very simple query:
SELECT
ServiceCode,
StartDate,
Available,
Nights,
BookingID
FROM #tmpAvailability A
LEFT JOIN vwRSBooking B
ON B.Depart = A.StartDate
AND B.ServiceCode = A.SupplierCode
AND B.StatusID IN (2640, 2621)
ORDER BY StartDate;
Made up of 2 tables
#tmpAvailability which consists of the following fields:
SupplierCode
StartDate
Available
vwRSBooking which consists of the following fields
BookingID
DepartDate
Code
Nights
StatusID
Departure and startdate can be joined to link the first day, and the servicecode and suppliercode can be joined to make sure that the availability is linked to the same supplier.
Which produces an output like this:
Code | Dates | Available | Nights | BookingID
TEST | 2018-01-04 | 1 | NULL | NULL
TEST | 2018-01-05 | 1 | NULL | NULL
TEST | 2018-01-06 | 0 | 4 | 123456
TEST | 2018-01-07 | 0 | NULL | NULL
TEST | 2018-01-08 | 0 | NULL | NULL
TEST | 2018-01-09 | 0 | NULL | NULL
TEST | 2018-01-10 | 1 | NULL | NULL
TEST | 2018-01-11 | 1 | NULL | NULL
TEST | 2018-01-12 | 1 | NULL | NULL
TEST | 2018-01-13 | 0 | NULL | 234567
TEST | 2018-01-14 | 0 | NULL | NULL
TEST | 2018-01-15 | 0 | NULL | NULL
What I need is that, when a booking is for 4 nights, the BookingID and the Nights are spread across those days, for example:
Code | Dates | Available | Nights | BookingID
TEST | 2018-01-04 | 1 | NULL | NULL
TEST | 2018-01-05 | 1 | NULL | NULL
TEST | 2018-01-06 | 0 | 4 | 123456
TEST | 2018-01-07 | 0 | 4 | 123456
TEST | 2018-01-08 | 0 | 4 | 123456
TEST | 2018-01-09 | 0 | 4 | 123456
TEST | 2018-01-10 | 1 | NULL | NULL
TEST | 2018-01-11 | 1 | NULL | NULL
TEST | 2018-01-12 | 1 | NULL | NULL
TEST | 2018-01-13 | 0 | 3 | 234567
TEST | 2018-01-14 | 0 | 3 | 234567
TEST | 2018-01-15 | 0 | 3 | 234567
TEST | 2018-01-16 | 1 | NULL | NULL
If anyone has any ideas on how to solve it would be most appreciated.
Andrew
You could replace your vwRSBooking with another view which uses a CTE to obtain all the dates the booking covers. Then use the view's coverdate for joining to the #tmpAvailability table:
CREATE VIEW vwRSBookingFull
AS
WITH cte ( bookingid, nights, depart, code, coverdate)
AS (SELECT bookingid,
nights,
depart,
code,
depart
FROM vwRSBooking
UNION ALL
SELECT c.bookingid,
c.nights,
c.depart,
c.code,
DATEADD(d, 1, c.coverdate)
FROM cte c
WHERE DATEDIFF(d, c.depart, c.coverdate) < (c.nights - 1))
SELECT c.bookingid,
c.nights,
c.depart,
c.code,
c.coverdate
FROM cte c
GO
You will need a calendar table with all the dates in the range your dates may fall into. For this example, I built one for January 2018. We can then join onto this table to create the additional rows.
Here is the sample code I used. You can see it at SQL Fiddle.
CREATE TABLE code (
code varchar(max),
dates date,
available int,
nights int,
bookingid int
)
INSERT INTO code VALUES
('TEST','2018-01-04','1',NULL,NULL),
('TEST','2018-01-05','1',NULL,NULL),
('TEST','2018-01-06','0',4,123456),
('TEST','2018-01-07','0',NULL,NULL),
('TEST','2018-01-08','0',NULL,NULL),
('TEST','2018-01-09','0',NULL,NULL),
('TEST','2018-01-10','1',NULL,NULL),
('TEST','2018-01-11','1',NULL,NULL),
('TEST','2018-01-12','1',NULL,NULL),
('TEST','2018-01-13','0',3,234567),
('TEST','2018-01-14','0',NULL,NULL),
('TEST','2018-01-15','0',NULL,NULL)
CREATE TABLE dates (
dates date
)
INSERT INTO dates VALUES
('2018-01-01'),('2018-01-02'),('2018-01-03'),('2018-01-04'),('2018-01-05'),('2018-01-06'),('2018-01-07'),('2018-01-08'),('2018-01-09'),('2018-01-10'),('2018-01-11'),('2018-01-12'),('2018-01-13'),('2018-01-14'),('2018-01-15'),('2018-01-16'),('2018-01-17'),('2018-01-18'),('2018-01-19'),('2018-01-20'),('2018-01-21'),('2018-01-22'),('2018-01-23'),('2018-01-24'),('2018-01-25'),('2018-01-26'),('2018-01-27'),('2018-01-28'),('2018-01-29'),('2018-01-30'),('2018-01-31')
Here is the query based on this dataset:
SELECT
code.code,
dates.dates,
code.available,
code.nights,
code.bookingid
FROM code
LEFT JOIN dates ON
dates.dates >= code.dates
AND dates.dates < DATEADD(DAY,nights,code.dates)
Edit: Here is an example using your initial query as a subquery to join your result set onto the dates table if you want a copy & paste. Still requires creating the dates table.
SELECT
ServiceCode,
StartDate,
Available,
Nights,
BookingID
FROM (
SELECT
ServiceCode,
StartDate,
Available,
Nights,
BookingID
FROM #tmpAvailability A
LEFT JOIN vwRSBooking B
ON B.Depart = A.StartDate
AND B.ServiceCode = A.SupplierCode
AND B.StatusID IN (2640, 2621)
) code
LEFT JOIN dates ON
dates.dates >= code.StartDate
AND dates.dates < DATEADD(DAY,nights,code.StartDate)
ORDER BY StartDate;
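Since the availability table already has one row per day, a third option is to join each day to the booking whose stay covers it, using a half-open range (depart <= day < depart + nights) instead of an equality on the first day. This is only a sketch under assumptions: the table names availability and booking are hypothetical stand-ins for #tmpAvailability and vwRSBooking, the status filter is left out, and SQLite's date() modifier replaces DATEADD so the SQL can be run from Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE availability (code TEXT, startdate TEXT, available INTEGER);
    CREATE TABLE booking (bookingid INTEGER, code TEXT, depart TEXT, nights INTEGER);
    INSERT INTO booking VALUES (123456, 'TEST', '2018-01-06', 4),
                               (234567, 'TEST', '2018-01-13', 3);
    """
)

# One availability row per day, 2018-01-04 .. 2018-01-16, as in the question
booked_days = {6, 7, 8, 9, 13, 14, 15}
conn.executemany(
    "INSERT INTO availability VALUES ('TEST', ?, ?)",
    [(f"2018-01-{d:02d}", 0 if d in booked_days else 1) for d in range(4, 17)],
)

SPREAD_SQL = """
SELECT a.code, a.startdate, a.available, b.nights, b.bookingid
FROM availability AS a
LEFT JOIN booking AS b
       ON b.code = a.code
      AND a.startdate >= b.depart
      -- half-open range: the day after the last night is not covered
      AND a.startdate <  date(b.depart, '+' || b.nights || ' days')
ORDER BY a.startdate
"""

spread = conn.execute(SPREAD_SQL).fetchall()
for row in spread:
    print(row)
```

Each booked day then carries the booking's nights and id, while free days keep NULLs. In SQL Server the upper bound would be written as DATEADD(DAY, b.Nights, b.Depart).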

Union in outer query

I'm attempting to combine multiple rows using a UNION but I need to pull in additional data as well. My thought was to use a UNION in the outer query but I can't seem to make it work. Or am I going about this all wrong?
The data I have is like this:
+------+------+-------+---------+---------+
| ID | Time | Total | Weekday | Weekend |
+------+------+-------+---------+---------+
| 1001 | AM | 5 | 5 | 0 |
| 1001 | AM | 2 | 0 | 2 |
| 1001 | AM | 4 | 1 | 3 |
| 1001 | AM | 5 | 3 | 2 |
| 1001 | PM | 5 | 3 | 2 |
| 1001 | PM | 5 | 5 | 0 |
| 1002 | PM | 4 | 2 | 2 |
| 1002 | PM | 3 | 3 | 0 |
| 1002 | PM | 1 | 0 | 1 |
+------+------+-------+---------+---------+
What I want to see is like this:
+------+---------+------+-------+
| ID | DayType | Time | Tasks |
+------+---------+------+-------+
| 1001 | Weekday | AM | 9 |
| 1001 | Weekend | AM | 7 |
| 1001 | Weekday | PM | 8 |
| 1001 | Weekend | PM | 2 |
| 1002 | Weekday | PM | 5 |
| 1002 | Weekend | PM | 3 |
+------+---------+------+-------+
The closest I've come so far is using UNION statement like the following:
SELECT * FROM
(
SELECT Weekday, 'Weekday' as 'DayType' FROM t1
UNION
SELECT Weekend, 'Weekend' as 'DayType' FROM t1
) AS X
Which results in something like the following:
+---------+---------+
| Weekday | DayType |
+---------+---------+
| 2 | Weekend |
| 0 | Weekday |
| 2 | Weekday |
| 0 | Weekend |
| 10 | Weekday |
+---------+---------+
I don't see any rhyme or reason as to what the numbers are under the 'Weekday' column, I suspect they're being grouped somehow. And of course there are several other columns missing, but since I can't put a large scope in the outer query with this as inner one, I can't figure out how to pull those in. Help is greatly appreciated.
It looks like you want to union all a pair of aggregation queries that use sum() and group by id, time, one for Weekday and one for Weekend:
select Id, DayType = 'Weekend', [time], Tasks=sum(Weekend)
from t
group by id, [time]
union all
select Id, DayType = 'Weekday', [time], Tasks=sum(Weekday)
from t
group by id, [time]
Try with this
select ID, 'Weekday' as DayType, Time, sum(Weekday)
from t1
group by ID, Time
union all
select ID, 'Weekend', Time, sum(Weekend)
from t1
group by ID, Time
order by 1, 3, 2
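To check the UNION ALL approach against the sample data from the question, here is a small runnable sketch. It uses Python's sqlite3 module purely as a convenient way to execute the SQL, which is an assumption and not the asker's database. It also hints at why the original attempt looked odd: plain UNION removes duplicate rows, so repeated values such as 0 or 2 collapse into one row each.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE t1 (id INTEGER, "time" TEXT, total INTEGER, weekday INTEGER, weekend INTEGER)'
)
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?, ?, ?)",
    [
        (1001, "AM", 5, 5, 0),
        (1001, "AM", 2, 0, 2),
        (1001, "AM", 4, 1, 3),
        (1001, "AM", 5, 3, 2),
        (1001, "PM", 5, 3, 2),
        (1001, "PM", 5, 5, 0),
        (1002, "PM", 4, 2, 2),
        (1002, "PM", 3, 3, 0),
        (1002, "PM", 1, 0, 1),
    ],
)

UNPIVOT_SQL = """
SELECT id, 'Weekday' AS daytype, "time", SUM(weekday) AS tasks
FROM t1
GROUP BY id, "time"
UNION ALL  -- keep every row; plain UNION would drop duplicates
SELECT id, 'Weekend', "time", SUM(weekend)
FROM t1
GROUP BY id, "time"
ORDER BY 1, 3, 2
"""

tasks = conn.execute(UNPIVOT_SQL).fetchall()
for row in tasks:
    print(row)
```

The six rows match the table the question asks for: 9 and 7 for 1001 AM, 8 and 2 for 1001 PM, 5 and 3 for 1002 PM.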
Not tested, but it should do the trick. It may require 2 proc sql steps for the calculation, one for summing and one for the case when statements. If you have extra lines, just use a max statement and group by ID, Time, type_day.
Proc sql; create table want as select ID, Time,
sum(weekday) as weekdayTask,
sum(weekend) as weekendTask,
case when calculated weekdaytask>0 then weekdaytask
when calculated weekendtask>0 then weekendtask else .
end as Task,
case when calculated weekdaytask>0 then "Weekday"
when calculated weekendtask>0 then "Weekend"
end as Day_Type
from have
group by ID, Time
;quit;
Proc sql; create table want2 as select ID, Time, Day_Type, Task
from want
;quit;

How to determine an Increase in Employee Salary from consecutive Contract Rows?

I have a problem with my query.
My table stores data like this:
ContractID | Staff_ID | EffectDate | End Date | Salary | active
-------------------------------------------------------------------------
1 | 1 | 2013-01-01 | 2013-12-30 | 100 | 0
2 | 1 | 2014-01-01 | 2014-12-30 | 150 | 0
3 | 1 | 2015-01-01 | 2015-12-30 | 200 | 1
4 | 2 | 2014-05-01 | 2015-04-30 | 500 | 0
5 | 2 | 2015-05-01 | 2016-04-30 | 700 | 1
I would like to write a query like below:
ContractID | Staff_ID | EffectDate | End Date | Salary | Increase
-------------------------------------------------------------------------
1 | 1 | 2013-01-01 | 2013-12-30 | 100 | 0
2 | 1 | 2014-01-01 | 2014-12-30 | 150 | 50
3 | 1 | 2015-01-01 | 2015-12-30 | 200 | 50
4 | 2 | 2014-05-01 | 2015-04-30 | 500 | 0
5 | 2 | 2015-05-01 | 2016-04-30 | 700 | 200
-------------------------------------------------------------------------
The Increase column is calculated as the current contract's salary minus the previous contract's salary.
I use SQL Server 2008 R2.
Unfortunately 2008 R2 doesn't have access to LAG, but you can simulate the effect of obtaining the previous row (prev) in the scope of a current row (cur) with a ranking and a self join to the previous ranked row in the same partition (by Staff_ID):
With CTE AS
(
SELECT [ContractID], [Staff_ID], [EffectDate], [End Date], [Salary],[active],
ROW_NUMBER() OVER (Partition BY Staff_ID ORDER BY ContractID) AS Rnk
FROM Table1
)
SELECT cur.[ContractID], cur.[Staff_ID], cur.[EffectDate], cur.[End Date],
cur.[Salary], cur.Rnk,
CASE WHEN (cur.Rnk = 1) THEN 0 -- i.e. baseline salary
ELSE cur.Salary - prev.Salary END AS Increase
FROM CTE cur
LEFT OUTER JOIN CTE prev
ON cur.[Staff_ID] = prev.Staff_ID and cur.Rnk - 1 = prev.Rnk;
(If ContractId is always perfectly incrementing, we wouldn't need the ROW_NUMBER and could join on incrementing ContractIds, I didn't want to make this assumption).
SqlFiddle here
Edit
If you have SQL Server 2012 or later, the LEAD and LAG analytic functions make this kind of query much simpler:
SELECT [ContractID], [Staff_ID], [EffectDate], [End Date], [Salary],
Salary - LAG(Salary, 1, Salary) OVER (Partition BY Staff_ID ORDER BY ContractID) AS Incr
FROM Table1
Updated SqlFiddle
One trick here is that we are calculating delta increments in salary, so for the first employee contract we need to return the current salary so that Salary - Salary = 0 for the first increase.
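The effect of the third LAG argument can be seen on the sample data from the question. The sketch below is an illustration only: it runs through Python's sqlite3 module, which is assumed here because SQLite accepts the same LAG(expr, offset, default) form, and the table name contract is a hypothetical stand-in for Table1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contract (contractid INTEGER, staff_id INTEGER, salary INTEGER)")
conn.executemany(
    "INSERT INTO contract VALUES (?, ?, ?)",
    [(1, 1, 100), (2, 1, 150), (3, 1, 200), (4, 2, 500), (5, 2, 700)],
)

LAG_SQL = """
SELECT contractid, staff_id, salary,
       -- the default (third argument) is the row's own salary, so the first
       -- contract of each staff member yields salary - salary = 0 instead of NULL
       salary - LAG(salary, 1, salary)
                OVER (PARTITION BY staff_id ORDER BY contractid) AS increase
FROM contract
ORDER BY contractid
"""

increases = conn.execute(LAG_SQL).fetchall()
for row in increases:
    print(row)
```

Without the default, LAG would return NULL for the first contract of each staff member and the subtraction would produce NULL instead of 0.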

Latest Records using Hive

Input Data
SNO | Name | Salary | HireDate
------------------------------------------
1 | A | 10 | 01-13-2014
2 | B | 20 | 11-15-2014
3 | C | 3 | 05-03-2015
4 | D | 4 | 07-03-2015
5 | E | 5 | 12-03-2015
6 | F | 60 | 25-03-2015
7 | G | 70 | 30-03-2015
Final Output Data
I want to get only current month data using hive query
SNO | Name | Salary | HireDate
----------------------------------------
3 | C | 3 | 05-03-2015
4 | D | 4 | 07-03-2015
5 | E | 5 | 12-03-2015
6 | F | 60 | 25-03-2015
7 | G | 70 | 30-03-2015
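Do this in a shell script:
curmon=$(date +%m-%Y)
cusdate="01-$curmon"
# Assumes HireDate is stored as a dd-MM-yyyy string; both sides are parsed so the comparison is chronological
"$HIVE_HOME"/bin/hive -e "select * from tablename where unix_timestamp(HireDate, 'dd-MM-yyyy') >= unix_timestamp('$cusdate', 'dd-MM-yyyy');"
curmon stores the current month and year.
cusdate stores the 1st day of this month.
The Hive query displays all rows dated on or after the 1st day of this month. (Change the table and column names as per your requirements.)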
Just use current_date and the date time functions in Hive. This is probably the easiest way:
select id.*
from inputdata id
where year(hiredate) = year(current_date()) and
month(hiredate) = month(current_date());
EDIT:
Having just tried this out, current_date() is not in at least one implementation of Hive 0.14, despite the documentation. So, you can try:
select id.*
from inputdata id
where year(hiredate) = year(from_unixtime(unix_timestamp())) and
month(hiredate) = month(from_unixtime(unix_timestamp()));