I'm looking for a SQL query (or, even better, a LINQ query) that removes the records of people who have cancelled their leave, i.e. all records that share the same NAME, START and END and whose DAYS_TAKEN values differ only in sign.
How to get from this
NAME |DAYS_TAKEN |START |END |UNIQUE_LEAVE_ID
--------|-----------|-----------|-----------|-----------
Alice | 2 | 1 June | 3 June | 1 --remove because cancelled
Alice | -2 | 1 June | 3 June | 2 --cancelled
Alice | 3 | 5 June | 8 June | 3 --keep
Bob | 10 | 4 June | 14 June | 4 --keep
Charles | 12 | 2 June | 14 June | 5 --remove because cancelled
Charles | -12 | 2 June | 14 June | 6 --cancelled
David | 5 | 3 June | 8 June | 7 --keep
To this?
NAME |DAYS_TAKEN |START |END |UNIQUE_LEAVE_ID
--------|-----------|-----------|-----------|-----------
Alice | 3 | 5 June | 8 June | 3 --keep
Bob | 10 | 4 June | 14 June | 4 --keep
David | 5 | 3 June | 8 June | 7 --keep
What I've tried
Query1 to find all the cancelled records (not sure if this is correct)
SELECT L1.UNIQUE_LEAVE_ID
FROM LEAVE L1
INNER JOIN LEAVE L2 ON L2.DAYS_TAKEN > 0 AND ABS(L1.DAYS_TAKEN) = L2.DAYS_TAKEN AND L1.NAME= L2.NAME AND L1.START = L2.START AND L1.END = L2.END
WHERE L1.DAYS_TAKEN < 0
Then I use Query1 twice in an inner select like so
SELECT L.* FROM LEAVE L WHERE
L.UNIQUE_LEAVE_ID NOT IN (Query1)
AND L.UNIQUE_LEAVE_ID NOT IN (Query1)
Is there a way to use the inner query only once?
(It's an Oracle database, being called from .NET/C#)
You can use a query like the following:
SELECT NAME, START, END
FROM LEAVE
GROUP BY NAME, START, END
HAVING SUM(DAYS_TAKEN) = 0
in order to get NAME, START, END groups that have been cancelled (assuming DAYS_TAKEN of the cancellation record negates the days of the initial record).
Output:
NAME |START |END
--------|-----------|----------
Alice | 1 June | 3 June
Charles | 2 June | 14 June
Using the above query as a derived table, you can get the records that do not belong to a 'cancelled' group:
SELECT L1.NAME, L1.DAYS_TAKEN, L1.START, L1.END, L1.UNIQUE_LEAVE_ID
FROM LEAVE L1
LEFT JOIN (
SELECT NAME, START, END
FROM LEAVE
GROUP BY NAME, START, END
HAVING SUM(DAYS_TAKEN) = 0
) L2 ON L1.NAME = L2.NAME AND L1.START = L2.START AND L1.END = L2.END
WHERE L2.NAME IS NULL
Output:
NAME |DAYS_TAKEN |START |END |UNIQUE_LEAVE_ID
--------|-----------|-----------|-----------|-----------
Alice | 3 | 5 June | 8 June | 3
Bob | 10 | 4 June | 14 June | 4
David | 5 | 3 June | 8 June | 7
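The grouping and anti-join above can be tried end to end with a small sketch. It assumes Python with an in-memory SQLite database as a stand-in for Oracle, dates stored as ISO strings with an assumed year of 2016, and the reserved-looking column names quoted; none of that comes from the original question.

```python
import sqlite3

# In-memory SQLite stands in for the Oracle LEAVE table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE leave (name TEXT, days_taken INTEGER, "start" TEXT, "end" TEXT, '
    "unique_leave_id INTEGER PRIMARY KEY)"
)
rows = [
    ("Alice", 2, "2016-06-01", "2016-06-03", 1),
    ("Alice", -2, "2016-06-01", "2016-06-03", 2),
    ("Alice", 3, "2016-06-05", "2016-06-08", 3),
    ("Bob", 10, "2016-06-04", "2016-06-14", 4),
    ("Charles", 12, "2016-06-02", "2016-06-14", 5),
    ("Charles", -12, "2016-06-02", "2016-06-14", 6),
    ("David", 5, "2016-06-03", "2016-06-08", 7),
]
conn.executemany("INSERT INTO leave VALUES (?, ?, ?, ?, ?)", rows)

# Groups whose days sum to zero are "cancelled"; the LEFT JOIN ... IS NULL
# keeps only rows that do not belong to such a group.
kept = conn.execute('''
    SELECT l1.unique_leave_id
    FROM leave l1
    LEFT JOIN (
        SELECT name, "start", "end"
        FROM leave
        GROUP BY name, "start", "end"
        HAVING SUM(days_taken) = 0
    ) l2 ON l1.name = l2.name AND l1."start" = l2."start" AND l1."end" = l2."end"
    WHERE l2.name IS NULL
    ORDER BY l1.unique_leave_id
''').fetchall()
kept_ids = [r[0] for r in kept]
print(kept_ids)
```

With the sample rows from the question this keeps leave ids 3, 4 and 7.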
You can use not exists:
select l.*
from leave l
where not exists (select 1
from leave l2
where l2.name = l.name and l2.start = l.start and
l2.end = l.end and l2.days_taken = - l.days_taken
);
This query can take advantage of an index on leave(name, start, end, days_taken).
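A sketch of how NOT EXISTS behaves, with the END columns compared to each other, again assuming Python with SQLite in place of Oracle. The third Alice row (a re-application after a cancellation) is hypothetical and only there to show that this approach and the SUM-based grouping can disagree on such data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE leave (name TEXT, days_taken INTEGER, "start" TEXT, "end" TEXT, '
    "unique_leave_id INTEGER PRIMARY KEY)"
)
conn.executemany(
    "INSERT INTO leave VALUES (?, ?, ?, ?, ?)",
    [
        ("Alice", 2, "2016-06-01", "2016-06-03", 1),
        ("Alice", -2, "2016-06-01", "2016-06-03", 2),
        ("Alice", 2, "2016-06-01", "2016-06-03", 3),  # hypothetical re-application
        ("Bob", 10, "2016-06-04", "2016-06-14", 4),
    ],
)
# The index suggested in the answer; the index name is made up for this sketch.
conn.execute('CREATE INDEX leave_lookup ON leave (name, "start", "end", days_taken)')

# Keep a row only if no row with the opposite sign exists for the same key.
not_exists_ids = [r[0] for r in conn.execute('''
    SELECT l.unique_leave_id FROM leave l
    WHERE NOT EXISTS (
        SELECT 1 FROM leave l2
        WHERE l2.name = l.name AND l2."start" = l."start"
          AND l2."end" = l."end" AND l2.days_taken = -l.days_taken)
    ORDER BY l.unique_leave_id''')]

# For comparison: keep rows whose (name, start, end) group does not sum to zero.
nonzero_sum_ids = [r[0] for r in conn.execute('''
    SELECT l.unique_leave_id FROM leave l
    JOIN (SELECT name, "start", "end" FROM leave
          GROUP BY name, "start", "end" HAVING SUM(days_taken) <> 0) g
      ON g.name = l.name AND g."start" = l."start" AND g."end" = l."end"
    ORDER BY l.unique_leave_id''')]

print(not_exists_ids, nonzero_sum_ids)
```

On this data NOT EXISTS drops every Alice row, because each +2 has a matching -2, while the SUM-based filter keeps all three. Which behaviour is wanted depends on how re-applications should be treated.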
Here is a variation with SUM() OVER:
SELECT x.*
FROM (SELECT l.*, SUM (days_taken) OVER (PARTITION BY name, "START", "END", ABS (days_taken) ORDER BY NULL) s
FROM leave l) x
WHERE s <> 0
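The windowed sum can be pictured as a per-partition total attached to every row. Below is a rough Python re-expression of that idea (not the Oracle query itself), assuming the sample rows from the question with an assumed year of 2016.

```python
from collections import defaultdict

# (name, days_taken, start, end, unique_leave_id)
rows = [
    ("Alice", 2, "2016-06-01", "2016-06-03", 1),
    ("Alice", -2, "2016-06-01", "2016-06-03", 2),
    ("Alice", 3, "2016-06-05", "2016-06-08", 3),
    ("Bob", 10, "2016-06-04", "2016-06-14", 4),
    ("Charles", 12, "2016-06-02", "2016-06-14", 5),
    ("Charles", -12, "2016-06-02", "2016-06-14", 6),
    ("David", 5, "2016-06-03", "2016-06-08", 7),
]

def partition_key(row):
    # Mirrors PARTITION BY name, "START", "END", ABS(days_taken).
    name, days, start, end, _ = row
    return (name, start, end, abs(days))

# Equivalent of SUM(days_taken) OVER (PARTITION BY ...).
totals = defaultdict(int)
for row in rows:
    totals[partition_key(row)] += row[1]

# Equivalent of the outer WHERE s <> 0.
remaining_ids = [row[4] for row in rows if totals[partition_key(row)] != 0]
print(remaining_ids)
```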
And if you have Oracle 12, this gives you the cancelled records:
SELECT l.*
FROM leave l,
LATERAL (SELECT days_taken
FROM leave l2
WHERE l2.name = l.name
AND l2."START" = l."START"
AND l2."END" = l."END"
AND l2.days_taken = -l.days_taken) x
and this gives you what should remain:
SELECT l.*
FROM leave l
OUTER APPLY (SELECT days_taken
FROM leave l2
WHERE l2.name = l.name
AND l2."START" = l."START"
AND l2."END" = l."END"
AND l2.days_taken = -l.days_taken) x
WHERE x.days_taken IS NULL
A note about the column names: using reserved words as identifiers in Oracle SQL is not recommended, but if you must, quote them with '"' as shown here.
I used Giorgos's answer to come up with this Linq solution. This solution also considers people who cancel / apply for their leave multiple times. See Alice and Edgar below.
Sample data
int id = 0;
List<Leave> allLeave = new List<Leave>()
{
new Leave() { UniqueLeaveID=id++, Name="Alice", Start=new DateTime(2016,6,1), End=new DateTime(2016,6,3), Taken=-2 },
new Leave() { UniqueLeaveID=id++,Name="Alice", Start=new DateTime(2016,6,1), End=new DateTime(2016,6,3), Taken=2 },
new Leave() { UniqueLeaveID=id++, Name="Alice", Start=new DateTime(2016,6,1), End=new DateTime(2016,6,3), Taken=2 },
new Leave() { UniqueLeaveID=id++,Name="Alice", Start=new DateTime(2016,6,3), End=new DateTime(2016,6,5), Taken=3 },
new Leave() { UniqueLeaveID=id++,Name="Bob", Start=new DateTime(2016,6,4), End=new DateTime(2016,6,14), Taken=10 },
new Leave() { UniqueLeaveID=id++,Name="Charles", Start=new DateTime(2016,6,2), End=new DateTime(2016,6,14), Taken=12 },
new Leave() { UniqueLeaveID=id++,Name="Charles", Start=new DateTime(2016,6,2), End=new DateTime(2016,6,14), Taken=-12 },
new Leave() { UniqueLeaveID=id++,Name="David", Start=new DateTime(2016,6,3), End=new DateTime(2016,6,8), Taken=5 },
new Leave() { UniqueLeaveID=id++,Name="Edgar", Start=new DateTime(2016,6,3), End=new DateTime(2016,6,8), Taken=5 },
new Leave() { UniqueLeaveID=id++,Name="Edgar", Start=new DateTime(2016,6,3), End=new DateTime(2016,6,8), Taken=5 },
new Leave() { UniqueLeaveID=id++,Name="Edgar", Start=new DateTime(2016,6,3), End=new DateTime(2016,6,8), Taken=5 },
new Leave() { UniqueLeaveID=id++,Name="Edgar", Start=new DateTime(2016,6,3), End=new DateTime(2016,6,8), Taken=5 }
};
Linq Query (watch out for Oracle version 11 vs 12)
var filteredLeave = allLeave
.GroupBy(a => new { a.Name, a.Start, a.End })
.Select(a => new { Group = a.OrderByDescending(b=>b.Taken), Count = a.Count() })
.Where(a => a.Count % 2 != 0)
.Select(a => a.Group.First());
"OrderByDescending" ensures only positive days taken are returned.
Oracle SQL
SELECT
*
FROM
(
SELECT
L1.NAME, L1.START, L1.END, MAX(TAKEN) AS TAKEN, COUNT(*) AS CNT
FROM LEAVE L1
GROUP BY L1.NAME, L1.START, L1.END
) L2
WHERE MOD(L2.CNT,2)<>0 -- replace MOD with % for Microsoft SQL
The condition "WHERE MOD(L2.CNT,2)<>0" (or in Linq "a.Count % 2 != 0") only returns people who applied once or an odd number of times (e.g. apply - cancel - apply). People who apply - cancel - apply - cancel are filtered out.
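The parity rule can be checked against the sample data with a short Python sketch of the same grouping (a re-expression for illustration, not the Linq or Oracle code; dates are written as ISO strings).

```python
from collections import defaultdict

# (name, start, end, taken), in the same order as the sample data above.
leave = [
    ("Alice", "2016-06-01", "2016-06-03", -2),
    ("Alice", "2016-06-01", "2016-06-03", 2),
    ("Alice", "2016-06-01", "2016-06-03", 2),
    ("Alice", "2016-06-03", "2016-06-05", 3),
    ("Bob", "2016-06-04", "2016-06-14", 10),
    ("Charles", "2016-06-02", "2016-06-14", 12),
    ("Charles", "2016-06-02", "2016-06-14", -12),
    ("David", "2016-06-03", "2016-06-08", 5),
    ("Edgar", "2016-06-03", "2016-06-08", 5),
    ("Edgar", "2016-06-03", "2016-06-08", 5),
    ("Edgar", "2016-06-03", "2016-06-08", 5),
    ("Edgar", "2016-06-03", "2016-06-08", 5),
]

# Group by (name, start, end), like GroupBy / GROUP BY.
groups = defaultdict(list)
for name, start, end, taken in leave:
    groups[(name, start, end)].append(taken)

# Keep groups with an odd row count and report the largest days value,
# like OrderByDescending(...).First() / MAX(TAKEN).
filtered = [
    (name, start, end, max(taken))
    for (name, start, end), taken in groups.items()
    if len(taken) % 2 != 0
]
print(filtered)
```

Note that parity only counts rows: Edgar's four rows are dropped even though none of them is negative, so this rule relies on applications and cancellations strictly alternating.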
Related
Hi, I am trying to write a stored procedure in PostgreSQL,
and I have to fill a table (vol_raleos) from 3 others. These are the tables:
super
zona | sitio | manejo
1 | 1 | 1
2 | 2 | 2
datos_vol_raleos
zona | sitio | manejo |vol_prodn
1 | 1 | 10 | 0
2 | 2 | 15 | 0
datos_manejos
manejoVR | manejoSuper
10 | 1
15 | 2
table to fill
vol_raleos
zona | sitio | manejo |vol_prodn
1 | 1 | 1 | 0
2 | 2 | 2 | 0
So, what I do is take the data that is in datos_vol_raleos, verify that it is in super, but first I must convert the manejoVR value according to the table datos_manejos
INSERT INTO vol_raleos
(zona, sitio, manejo, edad, densidad, vol_prod1, vol_prod2, ..., vol_prod36)
select zona, sitio, manejo, edad, densidad, vol_prod1, vol_prod2, ..., vol_prod36
from (
select volr.*, sup.zona, sup.sitio, sup.manejo, dm.manejo,
from datos_vol_raleos volr
left join super sup on (sup.zona = volr.zona and sup.sitio = volr.sitio and sup.manejo = volr.manejo) selrs
order by zona, sitio, manejo, edad, densidad
) sel_min_max;
So here I don't know how to get the manejoSuper value from datos_manejos so that I can compare it later.
You can insert from a select with a couple of joins. For example:
insert into vol_raleos (zona, sitio, manejo, vol_prodn)
select s.zona, s.sitio, s.manejo, d.vol_prodn
from datos_vol_raleos d
join datos_manejos m on m.manejoVR = d.manejo
join super s on (s.zona, s.sitio, s.manejo) = (d.zona, d.sitio, m.manejoSuper)
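To check the idea on the sample rows, here is a sketch that assumes Python with in-memory SQLite instead of PostgreSQL and the simplified four-column tables from the question. It translates manejo through datos_manejos first and only then matches against super, using plain AND conditions in the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE super (zona INTEGER, sitio INTEGER, manejo INTEGER);
CREATE TABLE datos_vol_raleos (zona INTEGER, sitio INTEGER, manejo INTEGER, vol_prodn INTEGER);
CREATE TABLE datos_manejos (manejoVR INTEGER, manejoSuper INTEGER);
CREATE TABLE vol_raleos (zona INTEGER, sitio INTEGER, manejo INTEGER, vol_prodn INTEGER);
INSERT INTO super VALUES (1, 1, 1), (2, 2, 2);
INSERT INTO datos_vol_raleos VALUES (1, 1, 10, 0), (2, 2, 15, 0);
INSERT INTO datos_manejos VALUES (10, 1), (15, 2);
""")

# Map manejoVR -> manejoSuper, then keep only rows that exist in super.
conn.execute("""
INSERT INTO vol_raleos (zona, sitio, manejo, vol_prodn)
SELECT s.zona, s.sitio, s.manejo, d.vol_prodn
FROM datos_vol_raleos d
JOIN datos_manejos m ON m.manejoVR = d.manejo
JOIN super s ON s.zona = d.zona AND s.sitio = d.sitio AND s.manejo = m.manejoSuper
""")

result = conn.execute("SELECT * FROM vol_raleos ORDER BY zona").fetchall()
print(result)
```

The result matches the "table to fill" shown in the question: the manejo values 10 and 15 arrive as 1 and 2.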
Assuming my data has the following structure :
Year | Location | New_client
2018 | Paris | true
2018 | Paris | true
2018 | Paris | false
2018 | London | true
2018 | Madrid | true
2018 | Madrid | false
2017 | Paris | true
I'm trying to calculate for each year and location the percentage of true value for New_client, so an example taking the records from the structure example would be
2018 | Paris | 66
2018 | London | 100
2018 | Madrid | 50
2017 | Paris | 100
My current script, adapted from https://stackoverflow.com/a/13484279/2802552, is shown below. The difference is that I need to group by 2 columns (Year and Location) instead of 1.
data = load...
grp = group inpt by Year; -- creates bags for each value in col1 (Year)
result = FOREACH grp {
total = COUNT(data);
t = FILTER data BY New_client == 'true'; --create a bag which contains only T values
GENERATE FLATTEN(group) AS Year, total AS TOTAL_ROWS_IN_INPUT_TABLE, 100*(double)COUNT(t)/(double)total AS PERCENTAGE_TRUE_IN_INPUT_TABLE;
};
The problem is that this uses only Year as the grouping key, while I need it to be Year AND Location.
Thanks for your help.
You need to group by both Year and Location, which will require two modifications. First, add Location to the group by statement. Second, change FLATTEN(group) AS Year to FLATTEN(group) AS (Year, Location) since group is now a tuple with two fields.
grp = group inpt by (Year, Location);
result = FOREACH grp {
total = COUNT(inpt);
t = FILTER inpt BY New_client == 'true';
GENERATE
FLATTEN(group) AS (Year, Location),
total AS TOTAL_ROWS_IN_INPUT_TABLE,
100*(double)COUNT(t)/(double)total AS PERCENTAGE_TRUE_IN_INPUT_TABLE;
};
I tested this code and it works for me:
A = LOAD ...
B = GROUP A BY (year, location);
C = FOREACH B {
TRUE_CNT = FILTER A BY (chararray)new_client == 'true';
GENERATE group.year, group.location, (int)((float)COUNT(TRUE_CNT) / COUNT(A) * 100);
};
DUMP C;
(2017,Paris,100)
(2018,Paris,66)
(2018,London,100)
(2018,Madrid,50)
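The same two-column grouping can be sanity-checked outside Pig. This is a plain Python sketch over the sample rows from the question, with the percentage truncated to an integer as in the DUMP output above.

```python
from collections import defaultdict

# (Year, Location, New_client)
records = [
    (2018, "Paris", True),
    (2018, "Paris", True),
    (2018, "Paris", False),
    (2018, "London", True),
    (2018, "Madrid", True),
    (2018, "Madrid", False),
    (2017, "Paris", True),
]

# For each (Year, Location) key keep [total rows, rows where New_client is true].
counts = defaultdict(lambda: [0, 0])
for year, location, new_client in records:
    entry = counts[(year, location)]
    entry[0] += 1
    entry[1] += int(new_client)

# Percentage of true values per group, truncated like the (int) cast in Pig.
percentages = {
    key: int(100 * true_rows / total) for key, (total, true_rows) in counts.items()
}
print(percentages)
```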
I have these models (simplified):
User(id: Int, name: String)
Restaurant(id: Int, ownerId: Int, name: String)
Employee(userId: Int, restaurantId: Int)
when I use this query :
for {
r <- Restaurants
e <- Employees
if r.ownerId === userId || (e.userId === userId && e.restaurantId === r.id)
} yield r
which is converted to :
select x2."id", x2."owner_id", x2."name" from "restaurants" x2, "employees" x3 where (x2."owner_id" = 2) or ((x3."user_id" = 2) and (x3."restaurant_id" = x2."id"))
So far no problems. But when I insert this data:
User(1, "Foo")
User(2, "Fuu")
Restaurant(1, 2, "Fuu")
Restaurant(2, 1, "Foo")
Restaurant(3, 1, "Bar")
Employee(2, 2)
Employee(2, 3)
then try to query, I get this result :
List(Restaurant(1, 2, "Fuu"), Restaurant(1, 2, "Fuu"), Restaurant(2, 1, "Foo"), Restaurant(3, 1, "Bar"))
I do not understand why Restaurant(1, 2, "Fuu") is present 2 times.
(I am using org.h2.Driver with url jdbc:h2:mem:play)
Am I missing something ?
Why you are getting 4 rows back
Cross joins are hard; what you are asking for with your SQL query is:
-- A Cartesian product of all of the rows in restaurants and employees
Employee.user_id | Employee.restaurant_id | Restaurant.name | Restaurant.owner_id
2 | 2 | Fuu | 2
2 | 3 | Fuu | 2
2 | 2 | Foo | 1
2 | 3 | Foo | 1
2 | 2 | Bar | 1
2 | 3 | Bar | 1
-- Keeping only the rows where the owner is 2 (filtering out owner != 2)
Employee.user_id | Employee.restaurant_id | Restaurant.name | Restaurant.owner_id
2 | 2 | Fuu | 2
2 | 3 | Fuu | 2
-- And combining that set with the set of those where the employee's user_id = 2
-- and the restaurant's ID is equal to the employee's restaurant ID
Employee.user_id | Employee.restaurant_id | Restaurant.name | Restaurant.owner_id
2 | 2 | Foo | 1
2 | 3 | Bar | 1
How to fix it
Make it an explicit left-join instead:
for {
(r, e) <- Restaurants leftJoin Employees on (_.id === _.restaurantId)
if r.ownerId === userId || e.userId === userId
} yield r
Alternately, use exists to make it even clearer:
for {
r <- Restaurants
if r.ownerId === userId ||
Employees.filter(e => e.userId === userId && e.restaurantId === r.id).exists
} yield r
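The duplicate row is easy to reproduce outside the database. Here is a small Python sketch (an illustration of the same logic, not Slick code) that builds the Cartesian product by hand and then applies the exists-style filter for comparison.

```python
from itertools import product

restaurants = [(1, 2, "Fuu"), (2, 1, "Foo"), (3, 1, "Bar")]  # (id, owner_id, name)
employees = [(2, 2), (2, 3)]                                  # (user_id, restaurant_id)
user_id = 2

# Implicit cross join: every restaurant is paired with every employee row,
# so a restaurant owned by user 2 passes the filter once per employee row.
cross_join_names = [
    r[2]
    for r, e in product(restaurants, employees)
    if r[1] == user_id or (e[0] == user_id and e[1] == r[0])
]

# EXISTS-style filter: each restaurant is tested once, so it appears at most once.
exists_names = [
    r[2]
    for r in restaurants
    if r[1] == user_id
    or any(e[0] == user_id and e[1] == r[0] for e in employees)
]

print(cross_join_names)
print(exists_names)
```

"Fuu" shows up twice in the first list because there are two employee rows to pair it with, and only once in the second.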
I'm using Entity Framework.
There is table A with columns id, someInt, someDateTime.
For example:
id | someInt | someDateTime
1 | 2 | 2014-03-11
2 | 2 | 2013-01-01
3 | 2 | 2013-01-02
4 | 1 | 2014-03-05
5 | 1 | 2014-03-06
Now I want to gather some statistics: for each someInt value I want to know how many rows are no more than, e.g., 24h old, 1 week old, 1 month old. From the values above I would get:
someInt | 24h | 1 week | 1 month | any time
1 | 0 | 2 | 2 | 2
2 | 1 | 1 | 1 | 3
Is it possible in SQL and if so is it possible in Entity Framework? How should I make queries like this?
Group records by someInt and then make sub-queries to get number of rows for each period of time you are interested in:
DateTime now = DateTime.Now;
DateTime yesterday = now.AddDays(-1);
DateTime weekAgo = now.AddDays(-7);
DateTime monthAgo = now.AddDays(-30); // just assume you need 30 days
DateTime yearAgo = now.AddYears(-1); // AddYears also copes with 29 February
var query = from a in db.A
group a by a.someInt into g
select new {
someInt = g.Key,
lastDay = g.Count(a => a.someDateTime >= yesterday),
lastWeek = g.Count(a => a.someDateTime >= weekAgo),
lastMonth = g.Count(a => a.someDateTime >= monthAgo),
lastYear = g.Count(a => a.someDateTime >= yearAgo),
anyTime = g.Count()
};
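The grouping with one conditional count per period can be checked against the expected table. This Python sketch uses the sample rows and a fixed "now" of 2014-03-11 12:00; that timestamp is an assumption chosen so the result lines up with the table in the question.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# (id, someInt, someDateTime)
rows = [
    (1, 2, datetime(2014, 3, 11)),
    (2, 2, datetime(2013, 1, 1)),
    (3, 2, datetime(2013, 1, 2)),
    (4, 1, datetime(2014, 3, 5)),
    (5, 1, datetime(2014, 3, 6)),
]

now = datetime(2014, 3, 11, 12)  # assumed reference time, not from the question
cutoffs = {
    "24h": now - timedelta(days=1),
    "1 week": now - timedelta(days=7),
    "1 month": now - timedelta(days=30),  # 30 days, as in the answer
}

# One counter per someInt value and period, like g.Count(predicate) per group.
stats = defaultdict(lambda: {"24h": 0, "1 week": 0, "1 month": 0, "any time": 0})
for _id, some_int, stamp in rows:
    bucket = stats[some_int]
    bucket["any time"] += 1
    for label, cutoff in cutoffs.items():
        if stamp >= cutoff:
            bucket[label] += 1

for some_int in sorted(stats):
    print(some_int, stats[some_int])
```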
I have a database with some tables, and I want to retrieve the last 8 tees created by the users that I follow.
These are my tables:
Table users:
- id
- name
- surname
- created
- modified
2 | Caesar | Surname1
3 | Albert | Surname2
4 | Paul | Surname3
5 | Nicol | Surname4
Table tee
- id
- name
- user_id
1 | first | 3
2 | second | 3
3 | third | 4
4 | fourth | 4
5 | fifth | 5
6 | sixth | 5
7 | seventh | 5
table user_follow
- id
- user_follower_id //the user who decides to follow someone
- user_followed_id //the user being followed
1 | 2 | 3
2 | 2 | 5
I expect to retrieve these tees with their creators, because my id is 2 (I'm Caesar, for example):
1 | first | 3
2 | second | 3
5 | fifth | 5
6 | sixth | 5
7 | seventh | 5
For example, if one user that I follow has created 4 tees, another 1 and another 2, I should be able to retrieve all of these tees when they are the most recently inserted ones, because they were all created by users that I follow.
But I retrieve only one tee from one user.
This is my query:
SELECT *, `tee`.`id` as id, `tee`.`created` as created, `users`.`id` as user_id, `users`.`created` as user_created
FROM (`tee`)
LEFT JOIN `users`
ON `users`.`id` = `tee`.`user_id`
LEFT JOIN `user_follow` ON `tee`.`user_id` = `user_follow`.`user_followed_id`
WHERE `tee`.`id` != '41' AND
`tee`.`id` != '11' AND
`tee`.`id` != '13' AND
`tee`.`id` != '20' AND
`tee`.`id` != '14' AND
`tee`.`id` != '35' AND
`tee`.`id` != '31' AND
`tee`.`id` != '36' AND
`user_follow`.`user_follower_id` = '2'
ORDER BY `tee`.`created` desc LIMIT 8
I have excluded 8 tee ids that I don't want to retrieve because they are "special".
Why doesn't this query work?
I think the problem is in the left join, or I have to do something else to retrieve these results.
Thanks
I don't see anything wrong with your query -- I have updated the syntax to use INNER JOINs and to use NOT IN though:
SELECT *,
`tee`.`id` as id, `tee`.`created` as created, `users`.`id` as user_id, `users`.`created` as user_created
FROM `tee`
JOIN `users` ON `users`.`id` = `tee`.`user_id`
JOIN `user_follow` ON `tee`.`user_id` = `user_follow`.`user_followed_id`
WHERE `tee`.`id` NOT IN (41,11,13,20,14,35,31,36)
AND `user_follow`.`user_follower_id` = '2'
ORDER BY `tee`.`created` desc
LIMIT 8
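A quick way to confirm that the joins return tees from every followed user is to load the sample rows into a scratch database. The sketch below assumes Python with in-memory SQLite rather than MySQL, and orders by tee.id as a stand-in for tee.created, since the sample data shows no created values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, surname TEXT);
CREATE TABLE tee (id INTEGER PRIMARY KEY, name TEXT, user_id INTEGER);
CREATE TABLE user_follow (id INTEGER PRIMARY KEY, user_follower_id INTEGER, user_followed_id INTEGER);
INSERT INTO users VALUES (2, 'Caesar', 'Surname1'), (3, 'Albert', 'Surname2'),
                         (4, 'Paul', 'Surname3'), (5, 'Nicol', 'Surname4');
INSERT INTO tee VALUES (1, 'first', 3), (2, 'second', 3), (3, 'third', 4), (4, 'fourth', 4),
                       (5, 'fifth', 5), (6, 'sixth', 5), (7, 'seventh', 5);
INSERT INTO user_follow VALUES (1, 2, 3), (2, 2, 5);
""")

# Tees whose creator is followed by user 2, minus the "special" ids, newest first.
feed = conn.execute("""
    SELECT tee.id, tee.name, tee.user_id
    FROM tee
    JOIN users ON users.id = tee.user_id
    JOIN user_follow ON user_follow.user_followed_id = tee.user_id
    WHERE tee.id NOT IN (41, 11, 13, 20, 14, 35, 31, 36)
      AND user_follow.user_follower_id = 2
    ORDER BY tee.id DESC
    LIMIT 8
""").fetchall()
print(feed)
```

On the sample rows this returns the five tees created by users 3 and 5, which suggests that if the real query returns a single tee, the cause is more likely in the data or the excluded ids than in the joins.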
You can use NOT IN ('41','11',...) instead of tee.id != '41' AND tee.id != '11'...
Use a UNION clause. First select your tees then select the tees from people you follow.