Can't quite get my head into this: I want all those rows in master table that do not have matching transaction rows in transaction table, for a given date range. Example:
master table m:
id Name
1 John
2 David
3 Simon
4 Jessica
5 Becky
transaction table t
id parent date action
1 1 2015-08-28 IN
2 1 2015-09-03 IN
3 2 2015-08-17 IN
4 2 2015-10-01 IN
5 4 2015-09-05 IN
I want all those entries in m that do not have any transactions in september: so I should get m.id 2,3,5
1 does not match: events in september
2 matches: events but none in september
3 matches: no events at all
4 does not match: events in september
5 matches: No events at all
I can get those with nothing in t, with left join, and those with dates but not in range, with join, but can't see how to get both conditions. I might just be having a duh day.
Often, when we try to jump directly to a final query, it can turn out to be much more complicated that is should -- if it works at all. Generally, it doesn't hurt to just perform a straight join on the tables in question and look at the results. If nothing else, you verify the join is correct:
with
Master( ID, Name )as(
select 1, 'John' from dual union all
select 2, 'David' from dual union all
select 3, 'Simon' from dual union all
select 4, 'Jessica' from dual union all
select 5, 'Becky' from dual
),
Trans( MasterID, EffDate, Action )as(
select 1, date '2015-08-28', 'IN' from dual union all
select 1, date '2015-09-03', 'IN' from dual union all
select 2, date '2015-08-17', 'IN' from dual union all
select 2, date '2015-10-01', 'IN' from dual union all
select 4, date '2015-09-05', 'IN' from dual
)
select *
from Master m
join Trans t
on t.MasterID = m.ID;
(Excuse me for renaming some of your fields.) I happen to have Oracle up at the moment, you can use whatever you have. Probably this code will work with any non-Oracle system by just removing the 'from dual' from the CTE code.
Now let's extend the join criteria, but let's do so to generate the data we don't want to see.
join Trans t
on t.MasterID = m.ID
and t.EffDate >= date '2015-09-01'
and t.EffDate < date '2015-10-01';
I've hard-coded the date values, but this is the best format to use to get "during the month of..." ranges. Every value is selected from the first tick of the first day of the month to the absolutely last tick before the first day of the next month. You'll want to store these values in variables or generate them on the fly, of course.
So now we see only the two transactions that occurred during September. The obvious next step is to perform an outer join so we get all the other Master records that don't match.
left join Trans t
Now we have all the records we want, plus the two that we don't want. As they are the only ones that match the date restrictions, we add filtering criteria to remove those. Here is the final query:
select m.*
from Master m
left join Trans t
on t.MasterID = m.ID
and t.EffDate >= date '2015-09-01'
and t.EffDate < date '2015-10-01'
where t.MasterID is null;
Simple really. Once you've done this a few times, you'll be able to jump to the finished query without the intervening steps. Still, it doesn't hurt to execute the intervening steps and look at the outputs along the way. Any flaws in the logic will be caught earlier when it can be fixed easier.
select all master, join with transaction grouped by parent (which results in 0..1 row per master entry).
if there is no transaction record, t.parent will be null which translates in no transaction for that master entry.
if there are transaction entries, you'll find the count in t.a, if some of them are in september, you'll find them in t.m9
if you want all master without any transaction, you'll filter where t.parent is null
if you want all master without a transaction in september, you'll filter where t.parent is null or t.m9=0
select m.id, m.name,
, t.a, t.m9
from master_table m
left join ( select a = count(*)
, m9 = count(case when datepart(Month, t.date) = 9 then 1 end)
, t.parent
from transaction_table t
group by t.parent
) t on t.parent = m.id
where t.parent is null or t.m9=0
Related
EDIT: added third requirement after playing with solution from Tim Biegeleisen
EDIT2: modified Robbie's DOB to be before his parent's marriage date
I am trying to create a query that will look at two tables and determine the difference in dates based on a percentage. I know, super confusing... Let me try and explain using the tables below:
Bob and Mary are married on 2010-01-01 and expect 4 kids (Parent table)
I want to know how many years it took until they met 50% of their expected kids (i.e. 2/4 kids). Using the Child table to see the DOB of their 4 kids, we know that Frankie is the second child which meets our 50% threshold so we use Frankie's DOB and subtract it from Frankie's parent's marriage date and end up with 3 years!
If the goal isn't reached then display no value e.g. Mick and Jo only had 1 child so far so they haven't yet reached their goal
Hoping this is doable using BigQuery standard SQL.
Parent table
id married_couple married_at expected_kids
--------------------------------------
1 Bob and Mary 2010-01-01 4
2 Mick and Jo 2010-01-01 4
Child table
id child_name parent_id date_of_birth
--------------------------------------
1 Eddie 1 2012-01-01
2 Frankie 1 2013-01-01
3 Robbie 1 2005-01-01
4 Duncan 1 2015-01-01
5 Rick 2 2014-01-01
Expected SQL result
parent_id half_goal_reached(years)
--------------------------------------
1 3
2
Below both soluthions for BigQuery Standard SQL
First one is more in classic sql way, the second one is more of BigQuery style (I think)
First Solution: with analytics function
#standardSQL
SELECT
parent_id,
IF(
MAX(pos) = MAX(CAST(expected_kids / 2 AS INT64)),
MAX(DATE_DIFF(date_of_birth, married_at, YEAR)),
NULL
) AS half_goal_reached
FROM (
SELECT c.parent_id, c.date_of_birth, expected_kids, married_at,
ROW_NUMBER() OVER(PARTITION BY c.parent_id ORDER BY c.date_of_birth) AS pos
FROM `child` AS c
JOIN `parent` AS p
ON c.parent_id = p.id
)
WHERE pos <= CAST(expected_kids / 2 AS INT64)
GROUP BY parent_id
Second Solution: with use of ARRAY
#standardSQL
SELECT
parent_id,
DATE_DIFF(dates[SAFE_ORDINAL(CAST(expected_kids / 2 AS INT64))], married_at, YEAR) AS half_goal_reached
FROM (
SELECT
parent_id,
ARRAY_AGG(date_of_birth ORDER BY date_of_birth) AS dates,
MAX(expected_kids) AS expected_kids,
MAX(married_at) AS married_at
FROM `child` AS c
JOIN `parent` AS p
ON c.parent_id = p.id
GROUP BY parent_id
)
Dummy Data
You can test / play with both solutions using below dummy data
#standardSQL
WITH `parent` AS (
SELECT 1 id, 'Bob and Mary' married_couple, DATE '2010-01-01' married_at, 4 expected_kids UNION ALL
SELECT 2, 'Mick and Jo', DATE '2010-01-01', 4
),
`child` AS (
SELECT 1 id, 'Eddie' child_name, 1 parent_id, DATE '2012-01-01' date_of_birth UNION ALL
SELECT 2, 'Frankie', 1, DATE '2013-01-01' UNION ALL
SELECT 3, 'Robbie', 1, DATE '2014-01-01' UNION ALL
SELECT 4, 'Duncan', 1, DATE '2015-01-01' UNION ALL
SELECT 5, 'Rick', 2, DATE '2014-01-01'
)
Try the following query, whose logic is too verbose to explain it well. I join the parent and child tables, bringing into line the parent id, number of years elapsed since marriage, running number of children, and expected number of children. With this information in hand, we can easily find the first row whose running number of children matches or exceeds half of the expected number.
SELECT parent_id, num_years AS half_goal_reached
FROM
(
SELECT parent_id, num_years, cnt, expected_kids,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY num_years) rn
FROM
(
SELECT
t2.parent_id,
YEAR(t2.date_of_birth) - YEAR(t1.married_at) AS num_years,
(SELECT COUNT(*) FROM child c
WHERE c.parent_id = t2.parent_id AND
c.date_of_birth <= t2.date_of_birth) AS cnt,
t1.expected_kids
FROM parent t1
INNER JOIN child t2
ON t1.id = t2.parent_id
) t
WHERE
cnt >= expected_kids / 2
) t
WHERE t.rn = 1;
Note that there may be issues with how I computed the yearly differences, or how I compute the threshhold for half the number of expected children. Also, if we were using a recent enterprise database we could have used an analytic function to get the running number of children instead of a correlated subquery, but I was unsure if Big Query would support that, so I used the latter.
I have three tables:
Employee_leave(EmployeeID,Time_Period,leave_type)
Employee(EID,Department,Designation)
leave_eligibility(Department,Designation, LeaveType, LeavesBalance).
I want to fetch the number of leaves availed by a particular employee in each LeaveTypes(Category) so I wrote following query Query1
SELECT LEAVE_TYPE, SUM(TIME_PERIOD)
FROM EMPLOYEE_LEAVE
WHERE EMPLOYEEID=78
GROUP BY LEAVE_TYPE
order by leave_type;
output for Query1
Leave_Type | SUM(Time_Period)
Casual 1
Paid 4
Sick 1
I want to fetch the number of leaves an employee is eligible for each leave_type(category). Following query Query2 gives the desire result.
Select UNIQUE Leavetype,LEAVEBALANCE
from LEAVE_ELIGIBILITY
INNER JOIN EMPLOYEE
ON LEAVE_ELIGIBILITY.DEPARTMENT= EMPLOYEE.DEPARTMENT
AND LEAVE_ELIGIBILITY.DESIGNATION= EMPLOYEE.DESIGNATION
WHERE EID=78
order by leavetype;
output for Query2
LeaveType | LeaveBalance
Casual 10
Paid 15
Privlage 6
Sick 20
Now I want to join these 2 queries Query1 and Query2 or create view which displays records from both queries. Also as you can see from output there are different no. of records from different queries. For a record which is not present in output of query1, it should display 0 in final output. Like in present case there is no record in output of query1 like privlage but it should display 0 in Sum(time_period) in Privlage of final output. I tried creating views of these 2 queries and then joining them, but I'm unable to run final query.
Code for View 1
create or replace view combo_table1 as
Select UNIQUE Leavetype,LEAVEBALANCE,EMPLOYEE.DEPARTMENT,EMPLOYEE.DESIGNATION, EID
from LEAVE_ELIGIBILITY
INNER JOIN EMPLOYEE
ON LEAVE_ELIGIBILITY.DEPARTMENT= EMPLOYEE.DEPARTMENT
AND LEAVE_ELIGIBILITY.DESIGNATION= EMPLOYEE.DESIGNATION
WHERE EID='78';
Code for View 2
create or replace view combo_table2 as
SELECT LEAVE_TYPE, SUM(TIME_PERIOD) AS Leave_Availed
FROM EMPLOYEE_LEAVE
WHERE EMPLOYEEID='78'
GROUP BY LEAVE_TYPE;
Code for joining 2 views
SELECT combo_table1.Leavetype, combo_table1.LEAVEBALANCE, combo_table2.leave_availed
FROM combo_table1 v1
INNER JOIN combo_table2 v2
ON v1.Leavetype = v2.LEAVE_TYPE;
But I'm getting "%s: invalid identifier" while executing the above query. Also I know I can't use union as it requires same column which here it is not.
I'm using Oracle 11g, so please answer accordingly.
Thanks in advance.
Desired final output
LeaveType | LeaveBalance | Sum(Time_period)
Casual 10 1
Paid 15 4
Privlage 6 0
Sick 20 1
To get the final desired output ...
"For a record which is not present in output of query1, it should display 0 in final output. "
... use an outer join to tie the taken leave records to the other tables. This will give zero time_duration for leave types which the employee has not taken.
select emp.Employee_ID
, le.leavetype
, le.leavebalance
, sum (el.Time_Duration) as total_Time_Duration
from employee emp
inner join leave_eligibility le
on le.department= emp.department
and le.designation= emp.designation
left outer join Employee_leave el
on el.EmployeeID = emp.Employee_ID
and el.leave_type = le.leavetype
group by emp.Employee_ID
, le.leavetype
, le.leavebalance
;
Your immediate problem:
I'm getting "%s: invalid identifier"
Your view has references to a column EID although none of your posted tables have a column of that name. Likewise there is confusion between Time_Duration and time_period.
More generally, you will find life considerably easier if you use the exact same name for common columns (i.e. consistently use either employee_id or employeeid, don't chop and change).
Try this examle:
with t as (
select 'Casual' as Leave_Type, 1 as Time_Period, 0 as LeaveBalance from dual
union all
select 'Paid', 4,0 from dual
union all
select 'Sick', 1,0 from dual),
t1 as (
select 'Casual' as Leave_Type, 0 as Time_Period, 10 as LeaveBalance from dual
union all
select 'Paid', 0, 15 from dual
union all
select 'Privlage', 0, 6 from dual
union all
select 'Sick', 0, 20 from dual)
select Leave_Type, sum(Time_Period), sum(LeaveBalance)
from(
select *
from t
UNION ALL
select * from t1
)
group by Leave_Type
Ok, edit:
create or replace view combo_table1 as
Select UNIQUE Leavetype, 0 AS Leave_Availed, LEAVEBALANCE
from LEAVE_ELIGIBILITY INNER JOIN EMPLOYEE ON LEAVE_ELIGIBILITY.DEPARTMENT= EMPLOYEE.DEPARTMENT AND LEAVE_ELIGIBILITY.DESIGNATION= EMPLOYEE.DESIGNATION
WHERE EID='78';
create or replace view combo_table2 as
SELECT LEAVE_TYPE as Leavetype, SUM(TIME_PERIOD) AS Leave_Availed, 0 as LEAVEBALANCE
FROM EMPLOYEE_LEAVE
WHERE EMPLOYEEID='78'
GROUP BY LEAVE_TYPE, LEAVEBALANCE;
SELECT Leavetype, sum(LEAVEBALANCE), sum(leave_availed)
FROM (
select *
from combo_table1
UNION ALL
select * from combo_table2
)
group by Leavetype;
l want to get the gap between dates range via SQL query lets see the situation:
l have table employees like : Every month the employee deserve payment
ID Name From_date To_date Paid_Amount`
1 ali 01/01/2002 31/01/2002 300
2 ali 01/02/2002 28/02/2002 300
3 ali 01/04/2002 30/04/2002 300
4 ali 01/05/2002 31/05/2002 300
5 ali 01/07/2002 31/07/2002 300
Now, we notice there are no payments in March and June
so, how by SQL query I can't get these months ??
Try this,
with mine(ID,Name,From_date,To_date,Paid_Amount) as
(
select 1,'ali','01/01/2002','31/01/2002',300 from dual union all
select 2,'ali','01/02/2002','28/02/2002',300 from dual union all
select 3,'ali','01/04/2002','30/04/2002',300 from dual union all
select 4,'ali','01/05/2002','31/05/2002',300 from dual union all
select 5,'ali','01/07/2002','31/07/2002',300 from dual
),
gtfirst (fromdt,todt) as (
select min(to_Date(from_Date,'dd/mm/yyyy')) fromdt,max(to_Date(to_Date,'dd/mm/yyyy')) todt from mine
),
dualtbl(first,last,fromdt,todt) as
(
select * from(select TRUNC(ADD_MONTHS(fromdt, rownum-1), 'MM') AS first,TRUNC(LAST_DAY(ADD_MONTHS(fromdt, rownum-1))) AS last,fromdt,todt from gtfirst connect by level <=12)
where first between fromdt and todt and last between fromdt and todt
)
select to_char(first,'month') no_payment_date from dualtbl where first not in (select to_Date(from_Date,'dd/mm/yyyy') from mine)
and first not in (select to_Date(to_date,'dd/mm/yyyy') from mine)
If you want to get the date difference between one payment date and the previous payment date and the ID field is sequential, then you may simply join back to the table and select the previous row.
SELECT X.From_date, Y.From_date, Y.From_date - X.From_date Difference
FROM Employees X
LEFT OUTER JOIN Employees Y ON Y.ID = X.ID - 1
If the ID field is not sequential, then you can use a similar method, but build a temporary table with a row index that you can use to join back to the previous payment.
I have a database that currently looks like this
Date | valid_entry | profile
1/6/2015 1 | 1
3/6/2015 2 | 1
3/6/2015 2 | 2
5/6/2015 4 | 4
I am trying to grab the dates but i need to make a query to display also for dates that does not exist in the list, such as 2/6/2015.
This is a sample of what i need it to be:
Date | valid_entry
1/6/2015 1
2/6/2015 0
3/6/2015 2
3/6/2015 2
4/6/2015 0
5/6/2015 4
My query:
select date, count(valid_entry)
from database
where profile = 1
group by 1;
This query will only display the dates that exist in there. Is there a way in query that I can populate the results with dates that does not exist in there?
You can generate a list of all dates that are between the start and end date from your source table using generate_series(). These dates can then be used in an outer join to sum the values for all dates.
with all_dates (date) as (
select dt::date
from generate_series( (select min(date) from some_table), (select max(date) from some_table), interval '1' day) as x(dt)
)
select ad.date, sum(coalesce(st.valid_entry,0))
from all_dates ad
left join some_table st on ad.date = st.date
group by ad.date, st.profile
order by ad.date;
some_table is your table with the sample data you have provided.
Based on your sample output, you also seem to want group by date and profile, otherwise there can't be two rows with 2015-06-03. You also don't seem to want where profile = 1 because that as well wouldn't generate two rows with 2015-06-03 as shown in your sample output.
SQLFiddle example: http://sqlfiddle.com/#!15/b0b2a/2
Unrelated, but: I hope that the column names are only made up. date is a horrible name for a column. For one because it is also a keyword, but more importantly it does not document what this date is for. A start date? An end date? A due date? A modification date?
You have to use a calendar table for this purpose. In this case you can create an in-line table with the tables required, then LEFT JOIN your table to it:
select "date", count(valid_entry)
from (
SELECT '2015-06-01' AS d UNION ALL '2015-06-02' UNION ALL '2015-06-03' UNION ALL
'2015-06-04' UNION ALL '2015-06-05' UNION ALL '2015-06-06') AS t
left join database AS db on t.d = db."date" and db.profile = 1
group by t.d;
Note: Predicate profile = 1 should be applied in the ON clause of the LEFT JOIN operation. If it is placed in the WHERE clause instead then LEFT JOIN essentially becomes an INNER JOIN.
Suppose I have a table with 2 columns (status and date) like the following:
status: U T U U L
date: 12 14 15 16 17
Can I (using only 1 SQL statement) count the number of distinct values in the status? That is:
count(U)=3
count(T)=1
count(L)=2
count(P)=0
Can I do this with 1 SQL query?
Note: I have static values in status. I can only have (U-T-L-P)
You need to use Group By:
SELECT Status, Count(Status)
FROM table
GROUP BY Status
This will not return P = 0 if P is not populated in the table. In your application logic you will need to check and if a certain status is not returned, it means there are no entries (i.e. 0).
SQL cannot query records that are not there.
This will return a row for every status and the count in the second column:
SELECT Status, COUNT(*) Cnt
FROM Tbl
GROUP BY Status
So it would return
Status Cnt
U 3
T 1
L 1
for your example (in no defined order). Use ORDER BY if you want to sort the results.
You can do this with a query which groups on your status column, e.g.
SELECT COUNT(*) as StatusCount, Status
FROM MyTable
GROUP BY Status
To get the zero for the status P, you have to do some devious stuff using a table that lists all the possible statuses.
SELECT COUNT(A.Status), B.Status
FROM AnonymousTable AS A RIGHT OUTER JOIN
(SELECT 'P' AS Status FROM Dual
UNION
SELECT 'U' AS Status FROM Dual
UNION
SELECT 'L' AS Status FROM Dual
UNION
SELECT 'T' AS Status FROM Dual
) AS B ON A.Status = B.Status
GROUP BY B.Status;
The 4-way UNION is one way of generating a list of values; your DBMS may provide more compact alternatives. I'm assuming that the table Dual contains just one row (as found in Oracle).
The COUNT(A.Status) counts the number of non-null values in A.Status. The RIGHT OUTER JOIN lists the row from B with Status = 'P' and joins it with a single NULL for the A.Status, which the COUNT(A.Status) therefore counts as zero. If you used COUNT(*), you'd get a 1 for the count.