How to use comparison operators in a WHERE clause with a SELECT statement? - hive

I am currently working on a query on HIVE and using SQL Workbench. I want to get data from 2 tables on a monthly basis. For referencing the dates I am using another table that has two columns: start_Date and end_date. start_Date contains month and starting date, i.e. 01/01/2018. Similarly end_date contains month and end date, i.e. 31/01/2018.
The query goes something like:
select *
from table1 a join
table2 b
on a.pkey = b.pkey
where effective_date >= (select start_Date
from date_table
where year(start_date) = year(current_date) and month(start_date) = month(current_date)
);
but obviously it is not working.
Could someone please give me a correct solution for this problem?
Let me know if there are any doubts.

I think that something like this can help you:
SELECT *
FROM table1 a
JOIN table2 b
ON a.pkey = b.pkey
JOIN (SELECT start_Date
FROM date_table
WHERE year(start_date) = year(current_date)
AND month(start_date) = month(current_date)
) dt
ON a.effective_date >= dt.start_date ;
I haven't tested the code since you haven't published sample data, but I hope it helps.
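For anyone who wants to try the idea without a Hive cluster, here is a minimal sketch of the same pattern (the scalar subquery moved into a joined derived table), run against an in-memory SQLite database from Python. The sample rows, the ISO date strings, the fixed "today" value, and the strftime-based month match are all assumptions made for illustration; Hive itself would use year()/month() as in the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (pkey INTEGER, effective_date TEXT);
CREATE TABLE table2 (pkey INTEGER, label TEXT);
CREATE TABLE date_table (start_date TEXT, end_date TEXT);
INSERT INTO table1 VALUES (1, '2018-01-15'), (2, '2017-12-20'), (3, '2018-01-01');
INSERT INTO table2 VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO date_table VALUES ('2017-12-01', '2017-12-31'), ('2018-01-01', '2018-01-31');
""")

# Hypothetical "current date", fixed so the result is reproducible.
today = "2018-01-20"

# The month lookup lives in a derived table (dt) that is joined,
# instead of a scalar subquery inside the WHERE clause.
rows = conn.execute("""
SELECT a.pkey, a.effective_date
FROM table1 a
JOIN table2 b ON a.pkey = b.pkey
JOIN (SELECT start_date
      FROM date_table
      WHERE strftime('%Y-%m', start_date) = strftime('%Y-%m', ?)) dt
  ON a.effective_date >= dt.start_date
ORDER BY a.pkey
""", (today,)).fetchall()

print(rows)
```

Only the rows whose effective_date is on or after the start of the current month survive the join.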

Related

Sampling issue with query in BigQuery (Standard SQL)

I have been running a query of the format below
SELECT b.date as Date,COUNT(DISTINCT user_id) AS NewUsers FROM (
SELECT user_id,MIN(date) as min_date
FROM tableA
WHERE date >= '2018-10-10'
AND filter1 = "XYZ"
GROUP BY 1) a
CROSS JOIN (
SELECT date FROM tableB
WHERE date >= '2018-10-19' AND date <= CURRENT_DATE()
GROUP BY 1) b
WHERE a.min_date >= DATE_SUB(b.date, INTERVAL 6 DAY) AND a.min_date <= b.date
GROUP BY 1
Let's say the above is result1
SELECT b.date as Date,COUNT(DISTINCT user_id) AS NewUsers FROM (
SELECT user_id,MIN(date) as min_date
FROM tableA
WHERE date >= '2018-07-10'
AND filter1 = "XYZ"
GROUP BY 1) a
CROSS JOIN (
SELECT date FROM tableB
WHERE date >= '2018-07-19' AND date <= CURRENT_DATE()
GROUP BY 1) b
WHERE a.min_date >= DATE_SUB(b.date, INTERVAL 6 DAY) AND a.min_date <= b.date
GROUP BY 1
The above is result2
Here 2018-07-19 is the launch date.
Since I have the data till 2018-10-19, I want to run the query from the later date to reduce the cost and the amount of data scanned by the query, but somehow I am getting incorrect data.
But if I run the same query from the launch date, I get the correct results.
I mean the NewUsers counts from result1 for the corresponding dates (date >= 2018-10-19) are higher than the NewUsers counts from result2.
Not sure what I am missing.
Any help would be greatly appreciated.
Thanks
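I think it is because of the use of MIN(date). You see a shift in counts because you restricted the dates: for users who were first seen on earlier dates, those early rows are no longer scanned, so MIN(date) returns their first date inside the restricted window. Those same "old" users are then counted as new for recent days, thus the confusion.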
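The effect is easy to reproduce on a toy data set. This is only a sketch under assumptions: it uses an in-memory SQLite database from Python instead of BigQuery, three made-up rows, and a single report date, but the first-seen logic is the same MIN(date) over a date-restricted scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tableA (user_id TEXT, date TEXT);
INSERT INTO tableA VALUES
  ('u1', '2018-07-20'),  -- u1 is first seen shortly after launch
  ('u1', '2018-10-18'),  -- ... and is active again in October
  ('u2', '2018-10-17');  -- u2 is genuinely new in October
""")

def new_users(window_start, report_date="2018-10-19"):
    """Count users whose first date within the scanned window
    falls in the 7 days ending at report_date."""
    sql = """
    SELECT COUNT(*) FROM (
        SELECT user_id, MIN(date) AS min_date
        FROM tableA
        WHERE date >= ?
        GROUP BY user_id
    ) a
    WHERE a.min_date >= date(?, '-6 days') AND a.min_date <= ?
    """
    return conn.execute(sql, (window_start, report_date, report_date)).fetchone()[0]

full = new_users("2018-07-10")     # scan from before launch
trimmed = new_users("2018-10-10")  # scan only recent data

print(full, trimmed)
```

With the full scan only u2 counts as new; with the trimmed scan u1's July row is invisible, so u1 looks new as well. To save cost without changing the numbers, the first-seen date has to be computed over the full history (or stored once), and only the reporting range can be trimmed.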

Select records that don't exist in a union between a table and a subset of that table

I have a table with appointments, past, present and future. I would like to be able to run a single query that would give me a list of appointments from a given date, with a status of "no show" that DO NOT have an appointment in the table with a date in the future.
So, what I have so far is (pseudocodey)
SELECT *
FROM (SELECT *
FROM Appointments
WHERE Appointments.Date >= Today's Date)
WHERE NOT EXISTS
(SELECT *
FROM Appointments
WHERE Appointments.PatID = SUBQUERYRESULTS.PatID)
The subquery would be
SELECT *
FROM Appointments
WHERE (Appointments.Status = "NoShow" AND (Appointment.Date is >= Start_date and <= End_date))
I'm not sure how to include the subquery to get it to work. I'm new to this, so please excuse the idiocy. Thank you.
You seem to want not exists as a where condition. Based on your description, this seems to be:
select a.*
from appointments a
where a.status = 'no show' and
a.date = @date and
not exists (select 1
from appointments a2
where a2.patid = a.patid and a2.date > current_date
);
If the date column has a time component, then the date comparison needs to take this into account.
Quoting the requirement from the question: 'appointments ... with a status of "no show" that DO NOT have an appointment in the table with a date in the future'
This seems to work (tested with Access 2010), and includes "Start_date" and "End_date" comparisons to limit the 'NoShow' appointments to a date range:
SELECT a1.*
FROM Appointments a1
WHERE a1.Status='NoShow'
AND a1.Date >= Start_date AND a1.Date <= End_date
AND NOT EXISTS
(
SELECT *
FROM Appointments a2
WHERE a2.PatID = a1.PatID
AND a2.Date > a1.Date
)
Here's another option (albeit untested) which uses a LEFT JOIN in place of the subquery:
SELECT t.*
FROM
Appointments t LEFT JOIN Appointments u
ON t.PatID = u.PatID AND t.Date < u.Date
WHERE
t.Status = "NoShow" AND
t.Date >= Start_date AND
t.Date <= End_date AND
u.PatID IS NULL
The line u.PatID IS NULL essentially performs the selection of those records with no future appointment.
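To check that the NOT EXISTS form and the LEFT JOIN ... IS NULL form select the same rows, here is a small sketch using an in-memory SQLite database from Python. The four sample appointments, the ISO date strings, and the :first_day/:last_day parameter names (standing in for Start_date and End_date) are assumptions for illustration, not part of the original Access setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Appointments (PatID INTEGER, Date TEXT, Status TEXT);
INSERT INTO Appointments VALUES
  (1, '2024-03-01', 'NoShow'),
  (1, '2024-04-01', 'Booked'),   -- patient 1 has a later appointment
  (2, '2024-03-02', 'NoShow'),   -- patient 2 has nothing later
  (3, '2024-03-03', 'Kept');
""")
params = {"first_day": "2024-03-01", "last_day": "2024-03-31"}

# Variant 1: correlated NOT EXISTS.
not_exists = conn.execute("""
SELECT a1.PatID
FROM Appointments a1
WHERE a1.Status = 'NoShow'
  AND a1.Date BETWEEN :first_day AND :last_day
  AND NOT EXISTS (SELECT 1 FROM Appointments a2
                  WHERE a2.PatID = a1.PatID AND a2.Date > a1.Date)
ORDER BY a1.PatID
""", params).fetchall()

# Variant 2: LEFT JOIN, keeping only rows with no later match.
left_join = conn.execute("""
SELECT t.PatID
FROM Appointments t
LEFT JOIN Appointments u ON t.PatID = u.PatID AND t.Date < u.Date
WHERE t.Status = 'NoShow'
  AND t.Date BETWEEN :first_day AND :last_day
  AND u.PatID IS NULL
ORDER BY t.PatID
""", params).fetchall()

print(not_exists, left_join)
```

Both variants return only patient 2, the no-show without a later appointment.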

adding a column which calculates days until next event

For a given Id, I have a series of START_DATE. Along with displaying other columns, I want to add a new column which finds the difference between the START_DATE for an Id(person), and his next START_DATE.
Basically, want to find the interval between his present START_DATE and his next START_DATE, along with displaying other columns.
For example, the data looks as follows
I tried doing this as follows :
SELECT t.Id,t.START_DATE,(select top 1 s.START_DATE from dbo.MyTable t INNER JOIN dbo.MyTable s ON(t.Id = s.Id and t.START_DATE > s.START_DATE) GROUP BY t.patient_Id,t.START_DATE)
I think this is what you need
SELECT t.Id,t.START_DATE,
-- DATEDIFF(day, start, end) returns end minus start, so the current date goes first
DATEDIFF(day, t.START_DATE,
(select min(s.START_DATE) from dbo.MyTable s
where t.Id = s.Id and t.START_DATE < s.START_DATE)) as DifferenceInDays
FROM dbo.MyTable t
This will give you the difference in days (you can change the unit if you want) between each date and the next one for the same Id. The last date for each Id has no next date, so its difference is NULL.
If your version of SQL Server supports it then you can do this with the LEAD windowed function like this:
SELECT
id,
start_date,
DATEDIFF(DAY, start_date, LEAD(start_date, 1) OVER (PARTITION BY id ORDER BY start_date)) AS days_until_next_start_date
FROM
dbo.MyTable
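Both approaches can be compared side by side. The sketch below is an assumption-laden stand-in, not SQL Server: it runs on an in-memory SQLite database from Python, uses julianday() subtraction where SQL Server would use DATEDIFF, and needs SQLite 3.25 or newer for window functions. The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MyTable (Id INTEGER, START_DATE TEXT);
INSERT INTO MyTable VALUES
  (1, '2020-01-01'), (1, '2020-01-11'), (1, '2020-02-10'),
  (2, '2020-03-01');
""")

# Correlated subquery: the next date is the smallest later date for the same Id.
subquery = conn.execute("""
SELECT t.Id, t.START_DATE,
       CAST(julianday((SELECT MIN(s.START_DATE) FROM MyTable s
                       WHERE s.Id = t.Id AND s.START_DATE > t.START_DATE))
            - julianday(t.START_DATE) AS INTEGER) AS days_until_next
FROM MyTable t
ORDER BY t.Id, t.START_DATE
""").fetchall()

# Window function: LEAD looks one row ahead within each Id.
lead = conn.execute("""
SELECT Id, START_DATE,
       CAST(julianday(LEAD(START_DATE) OVER (PARTITION BY Id ORDER BY START_DATE))
            - julianday(START_DATE) AS INTEGER) AS days_until_next
FROM MyTable
ORDER BY Id, START_DATE
""").fetchall()

print(subquery)
print(lead)
```

The two result sets match, and the last row for each Id gets None because there is no next START_DATE.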

get date range between dates

I have the following table tbl in the database and a dynamic joining date, e.g. 1-1-2012. I want to find out whether this date falls between (Fall and Spring), (Spring and Summer), or (Summer and Fall). I want an Oracle query to which I pass only the joining date and which returns the Semestertime and joiningDate.
Semestertime   joiningDate
Fall           10-13-2011
Spring         2-1-2012
Summer         6-11-2012
Fall           10-1-2015
If I understand your question correctly:
SELECT *
FROM your_table
WHERE joiningDate between to_date (your_lower_limit_date_here, 'mm-dd-yyyy')
AND to_date (your_upper_limit_date_here, 'mm-dd-yyyy');
What about something like this:
select 'BEFORE' term,
t."Semestertime", to_char(t."joiningDate", 'MM-DD-YYYY')
from (
select tbl.* from tbl where tbl."joiningDate" < to_date('1-1-2012','MM-DD-YYYY')
-- '1-1-2012' is your reference date
order by tbl."joiningDate" desc) t
-- ROWNUM is assigned before ORDER BY within one query level,
-- so it is filtered here, outside the sorted inline view
where rownum = 1
union all
select 'AFTER' term,
t."Semestertime", to_char(t."joiningDate", 'MM-DD-YYYY')
from (
select tbl.* from tbl where tbl."joiningDate" > to_date('1-1-2012','MM-DD-YYYY')
-- '1-1-2012' is your reference date
order by tbl."joiningDate" asc) t
where rownum = 1
This will return the "term" before and after a given date. You will probably have to adapt such query to your specific needs. But that might be a good starting point.
For example, given your business rules, you might consider using <= instead of <. You might also need the result displayed as columns instead of rows. But all of this shouldn't be too hard to change.
As an alternate solution using CTE and sub-queries:
with testdata as (select to_date('1-1-2012','MM-DD-YYYY') refdate from dual)
select v.what, tbl.* from tbl join
(
select 'BEFORE' what, max(t1."joiningDate") d
from tbl t1
where t1."joiningDate" < (select refdate from testdata)
union all
select 'AFTER' what, min(t1."joiningDate") d
from tbl t1
where t1."joiningDate" > (select refdate from testdata)
) v
on tbl."joiningDate" = v.d
See http://sqlfiddle.com/#!4/c7fa5/15 for a live demo comparing those solutions.
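If the fiddle is unavailable, the before/after lookup can also be sketched locally. This version is an illustration under assumptions: it uses an in-memory SQLite database from Python rather than Oracle, stores the dates as ISO strings, and uses ORDER BY ... LIMIT 1 where Oracle would filter ROWNUM outside a sorted inline view. The helper name surrounding_terms is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl (Semestertime TEXT, joiningDate TEXT);
INSERT INTO tbl VALUES
  ('Fall',   '2011-10-13'),
  ('Spring', '2012-02-01'),
  ('Summer', '2012-06-11'),
  ('Fall',   '2015-10-01');
""")

def surrounding_terms(ref_date):
    """Return the closest term before and the closest term after ref_date."""
    sql = """
    SELECT * FROM (SELECT 'BEFORE' AS term, Semestertime, joiningDate
                   FROM tbl WHERE joiningDate < :ref
                   ORDER BY joiningDate DESC LIMIT 1)
    UNION ALL
    SELECT * FROM (SELECT 'AFTER' AS term, Semestertime, joiningDate
                   FROM tbl WHERE joiningDate > :ref
                   ORDER BY joiningDate ASC LIMIT 1)
    ORDER BY 3
    """
    return conn.execute(sql, {"ref": ref_date}).fetchall()

result = surrounding_terms("2012-01-01")
print(result)
```

For the reference date 1-1-2012 this reports Fall (10-13-2011) as the term before and Spring (2-1-2012) as the term after, i.e. the date lies between Fall and Spring.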

How to count records for each day in a range (including days without records)

I'm trying to refine this question a little since I didn't really ask correctly last time. I am essentially doing this query:
Select count(orders)
From Orders_Table
Where Order_Open_Date<=To_Date('##/##/####','MM/DD/YYYY')
and Order_Close_Date>=To_Date('##/##/####','MM/DD/YYYY')
Here ##/##/#### is the same day in both conditions. In essence, this query finds the number of 'open' orders on any given day. The only problem is that I want to do this for each day of a year or more. I think that if I knew how to define ##/##/#### as a variable and then group the count by that variable, I could get this to work, but I'm not sure how to do that, or there may be another way as well. I am currently using Oracle SQL in SQL Developer. Thanks for any input.
You could use a "row generator" technique like this (edited for Hogan's comments):
Select RG.Day,
count(orders)
From Orders_Table,
(SELECT trunc(SYSDATE) - ROWNUM as Day
FROM (SELECT 1 dummy FROM dual)
CONNECT BY LEVEL <= 365
) RG
Where RG.Day <=To_Date('##/##/####','MM/DD/YYYY')
and RG.Day >=To_Date('##/##/####','MM/DD/YYYY')
and Order_Open_Date(+) <= RG.Day
and Order_Close_Date(+) >= RG.Day - 1
Group by RG.Day
Order by RG.Day
This should list each day of the previous year with the corresponding number of open orders.
Let's say you had a table datelist with a column adate:
aDate
1/1/2012
1/2/2012
1/3/2012
Now you join that to your table
Select *
From Orders_Table
join datelist on Order_Open_Date<=adate and Order_Close_Date>=adate
This gives you one row per order per day on which it was open. Now you group by the date and count:
Select aDate, count(*)
From Orders_Table
join datelist on Order_Open_Date<=adate and Order_Close_Date>=adate
group by adate
If you want to pass in parameters, just generate the dates with a recursive CTE:
with datelist as
(
select @startdate as adate
UNION ALL
select adate + 1
from datelist
where (adate + 1) <= @lastdate
)
Select aDate, count(*)
From Orders_Table
join datelist on Order_Open_Date<=adate and Order_Close_Date>=adate
group by adate
NOTE: I don't have an Oracle DB to test on so I might have some syntax wrong for this platform, but you get the idea.
NOTE2: If you want all dates listed with 0 for those that have nothing use this as your select statement:
Select aDate, count(Order_Open_Date)
From datelist
left join Orders_Table on Order_Open_Date<=adate and Order_Close_Date>=adate
group by adate
If you want only one day, you can query using TRUNC like this:
select count(orders)
From orders_table
where trunc(order_open_date) = to_date('14/05/2012','dd/mm/yyyy')
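The date-list approach from the earlier answer (generated dates, an outer join, then a count) can be tried end to end with the sketch below. It is an illustration under assumptions: an in-memory SQLite database from Python instead of Oracle, ISO date strings, a made-up order_id column, and two sample orders. The date list is the preserved side of the LEFT JOIN so that days without open orders still appear with 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders_Table (order_id INTEGER, Order_Open_Date TEXT, Order_Close_Date TEXT);
INSERT INTO Orders_Table VALUES
  (1, '2012-01-01', '2012-01-02'),
  (2, '2012-01-02', '2012-01-02');
""")

sql = """
WITH RECURSIVE datelist(adate) AS (
    SELECT :startdate
    UNION ALL
    SELECT date(adate, '+1 day') FROM datelist WHERE adate < :lastdate
)
SELECT d.adate, COUNT(o.order_id) AS open_orders
FROM datelist d
LEFT JOIN Orders_Table o
  ON o.Order_Open_Date <= d.adate AND o.Order_Close_Date >= d.adate
GROUP BY d.adate
ORDER BY d.adate
"""
counts = conn.execute(sql, {"startdate": "2012-01-01", "lastdate": "2012-01-04"}).fetchall()

for day, open_orders in counts:
    print(day, open_orders)
```

COUNT(o.order_id) counts only matched orders, so the unmatched days come out as 0 rather than 1.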