CASE-Statement in WHERE-Clause | SQL

Hi, I have the following table, 'Month', which holds the current month:
+---------------+
| current_Month |
+---------------+
| 12 |
+---------------+
And I have another table, 'Workers', with the workers:
+--------+--------------------------+
| Name | Month_joined_the_company |
+--------+--------------------------+
| Peter | 12 |
| Paul | 9 |
| Sarah | 5 |
| Donald | 12 |
+--------+--------------------------+
Based on my Month table, I now want to display all workers who joined the company up to the previous month. For example, if the current month is 10, I would like to have this output:
+--------+--------------------------+
| Name | Month_joined_the_company |
+--------+--------------------------+
| Paul | 9 |
| Sarah | 5 |
+--------+--------------------------+
But at the end of the year (current month 12), I would like to include all workers, even those whose month is equal to the current month:
+--------+--------------------------+
| Name | Month_joined_the_company |
+--------+--------------------------+
| Peter | 12 |
| Paul | 9 |
| Sarah | 5 |
| Donald | 12 |
+--------+--------------------------+
I now have this statement:
SELECT *
FROM workers
WHERE
CASE
WHEN (SELECT TOP (1) Current_Month FROM Month) = 12
THEN (Month_joined_the_company <= (SELECT TOP (1) Current_Month FROM Month))
ELSE (Month_joined_the_company < (SELECT TOP (1) Current_Month FROM Month))
END
But this does not work and I get an error. Can someone help me with how to use CASE in a WHERE clause?

Is this what you want?
select w.*
from workers w
inner join month m
on m.current_month = 12
or w.month_joined_the_company < m.current_month
This reads as: if current_month = 12, return all workers; otherwise return only those whose month_joined_the_company is strictly smaller than current_month.
NB: you should probably consider using date datatypes to store these values; otherwise, what happens when a new year begins?
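As a rough illustration of the OR-based filter above, here is a minimal sketch in Python using the standard sqlite3 module with an in-memory database. SQLite only stands in for the asker's database (TOP suggests SQL Server), and the helper name workers_for_month is made up for this example. The original statement fails in SQL Server because CASE returns a value, not a condition, so comparisons cannot sit in its THEN/ELSE branches.

```python
import sqlite3


def workers_for_month(current_month):
    """Return the names selected for a given current month (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE month (current_month INTEGER)")
    conn.execute(
        "CREATE TABLE workers (name TEXT, month_joined_the_company INTEGER)"
    )
    conn.execute("INSERT INTO month VALUES (?)", (current_month,))
    conn.executemany(
        "INSERT INTO workers VALUES (?, ?)",
        [("Peter", 12), ("Paul", 9), ("Sarah", 5), ("Donald", 12)],
    )
    # Plain boolean logic replaces the CASE: in month 12 every worker passes,
    # otherwise only workers who joined strictly before the current month.
    rows = conn.execute(
        """
        SELECT w.name
        FROM workers AS w
        CROSS JOIN month AS m
        WHERE m.current_month = 12
           OR w.month_joined_the_company < m.current_month
        ORDER BY w.name
        """
    ).fetchall()
    conn.close()
    return [name for (name,) in rows]


print(workers_for_month(10))
print(workers_for_month(12))
```

The Month table is assumed to hold exactly one row; with several rows the join would repeat workers.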

Related

SQL: Get an aggregate (SUM) of a calculation of two fields (DATEDIFF) that has conditional logic (CASE WHEN)

I have a dataset that includes a bunch of stay data (at a hotel). Each row contains a start date and an end date, but no duration field. I need to get a sum of the durations.
Sample Data:
| Stay ID | Client ID | Start Date | End Date |
| 1 | 38 | 01/01/2018 | 01/31/2019 |
| 2 | 16 | 01/03/2019 | 01/07/2019 |
| 3 | 27 | 01/10/2019 | 01/12/2019 |
| 4 | 27 | 05/15/2019 | NULL |
| 5 | 38 | 05/17/2019 | NULL |
There are some added complications:
I am using Crystal Reports and this is a SQL Expression, which obeys slightly different rules. Basically, it returns a single scalar value. Here is some more info: http://www.cogniza.com/wordpress/2005/11/07/crystal-reports-using-sql-expression-fields/
Sometimes, the end date field is blank (they haven't booked out yet). If blank, I would like to replace it with the current timestamp.
I only want to count nights that have occurred in the past year. If the start date of a given stay is more than a year ago, I need to adjust it.
I need to get a sum by Client ID
I'm not actually any good at SQL so all I have is guesswork.
The proper syntax for a Crystal Reports SQL Expression is something like this:
(
SELECT (CASE
WHEN StayDateStart < DATEADD(year,-1,CURRENT_TIMESTAMP) THEN DATEDIFF(day,DATEADD(year,-1,CURRENT_TIMESTAMP),ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
ELSE DATEDIFF(day,StayDateStart,ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
END)
)
And that's giving me the correct value for a single row, if I wanted to do this:
| Stay ID | Client ID | Start Date | End Date | Duration |
| 1 | 38 | 01/01/2018 | 01/31/2019 | 210 | // only days since June 4 2018 are counted
| 2 | 16 | 01/03/2019 | 01/07/2019 | 4 |
| 3 | 27 | 01/10/2019 | 01/12/2019 | 2 |
| 4 | 27 | 05/15/2019 | NULL | 21 |
| 5 | 38 | 05/17/2019 | NULL | 19 |
But I want to get the SUM of Duration per client, so I want this:
| Stay ID | Client ID | Start Date | End Date | Duration |
| 1 | 38 | 01/01/2018 | 01/31/2019 | 229 | // 210+19
| 2 | 16 | 01/03/2019 | 01/07/2019 | 4 |
| 3 | 27 | 01/10/2019 | 01/12/2019 | 23 | // 2+21
| 4 | 27 | 05/15/2019 | NULL | 23 |
| 5 | 38 | 05/17/2019 | NULL | 229 |
I've tried to just wrap a SUM() around my CASE but that doesn't work:
(
SELECT SUM(CASE
WHEN StayDateStart < DATEADD(year,-1,CURRENT_TIMESTAMP) THEN DATEDIFF(day,DATEADD(year,-1,CURRENT_TIMESTAMP),ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
ELSE DATEDIFF(day,StayDateStart,ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
END)
)
It gives me an error that the StayDateEnd is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. But I don't even know what that means, so I'm not sure how to troubleshoot, or where to go from here. And then the next step is to get the SUM by Client ID.
Any help would be greatly appreciated!
Although the explanation and data set are almost impossible to match, I think this is an approximation to what you want.
declare @your_data table (StayId int, ClientId int, StartDate date, EndDate date)
insert into @your_data values
(1,38,'2018-01-01','2019-01-31'),
(2,16,'2019-01-03','2019-01-07'),
(3,27,'2019-01-10','2019-01-12'),
(4,27,'2019-05-15',NULL),
(5,38,'2019-05-17',NULL)
;with data as (
select *,
-- days from the later of StartDate and one year ago, up to EndDate (or now if the stay is still open)
datediff(day,
case
when datediff(day,StartDate,getdate())>365 then dateadd(year,-1,getdate())
else StartDate
end,
isnull(EndDate,getdate())
) days
from @your_data
)
select *,
-- window sum repeats the per-client total on every row
sum(days) over (partition by ClientId) as total_days
from data
https://rextester.com/HCKOR53440
You need a subquery that computes the sum grouped by Client_id, and a join between your table and that subquery, e.g.:
select y.Stay_id, y.Client_id, y.StayDateStart, y.StayDateEnd, t.sum_duration
from your_table y
inner join (
select Client_id,
SUM(CASE
WHEN StayDateStart < DATEADD(year,-1,CURRENT_TIMESTAMP) THEN DATEDIFF(day,DATEADD(year,-1,CURRENT_TIMESTAMP),ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
ELSE DATEDIFF(day,StayDateStart,ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
END) sum_duration
from your_table
group by Client_id
) t on t.Client_id = y.Client_id
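The subquery-plus-join approach can be tried out with a small sketch in Python and the standard sqlite3 module. Several things here are assumptions for illustration only: SQLite replaces SQL Server (so julianday differences stand in for DATEDIFF and COALESCE for ISNULL), the table and column names are invented, and "today" is pinned to 2019-06-05 so the result is reproducible. The totals therefore follow from that pinned date and need not match the figures quoted in the question.

```python
import sqlite3

TODAY = "2019-06-05"  # assumed reference date, fixed for reproducibility

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stays (stay_id INTEGER, client_id INTEGER, "
    "start_date TEXT, end_date TEXT)"
)
conn.executemany(
    "INSERT INTO stays VALUES (?, ?, ?, ?)",
    [
        (1, 38, "2018-01-01", "2019-01-31"),
        (2, 16, "2019-01-03", "2019-01-07"),
        (3, 27, "2019-01-10", "2019-01-12"),
        (4, 27, "2019-05-15", None),
        (5, 38, "2019-05-17", None),
    ],
)

# Inner query: one total per client. The start is clamped to one year before
# TODAY (scalar MAX on ISO date strings), and an open stay ends at TODAY.
# Outer query: join the totals back so every stay row carries its client's sum.
query = """
SELECT s.stay_id, s.client_id, t.sum_duration
FROM stays AS s
JOIN (
    SELECT client_id,
           SUM(CAST(julianday(COALESCE(end_date, :today))
                  - julianday(MAX(start_date, date(:today, '-1 year')))
               AS INTEGER)) AS sum_duration
    FROM stays
    GROUP BY client_id
) AS t ON t.client_id = s.client_id
ORDER BY s.stay_id
"""
rows = conn.execute(query, {"today": TODAY}).fetchall()
for row in rows:
    print(row)
```

The error in the question ("not contained in either an aggregate function or the GROUP BY clause") goes away here because the aggregate lives in its own grouped subquery, and the outer query only selects plain columns.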

Adding new rows into query from nonexistent data in the database table

I have the following sample table:
+----------+------+-------+
| DATE | NAME | HOURS |
+----------+------+-------+
| 2018-5-3 | JOHN | 8 |
+----------+------+-------+
| 2018-5-9 | JOHN | 5 |
+----------+------+-------+
How can I write a query that adds the missing rows to the existing data? E.g., a sample query result:
+-----------+------+-------+
| DATE | NAME | HOURS |
+-----------+------+-------+
| 2018-5-1 | JOHN | 0 |
+-----------+------+-------+
| 2018-5-2 | JOHN | 0 |
+-----------+------+-------+
| 2018-5-3 | JOHN | 8 |
+-----------+------+-------+
| 2018-5-4 | JOHN | 0 |
+-----------+------+-------+
| 2018-5-5 | JOHN | 0 |
+-----------+------+-------+
| 2018-5-6 | JOHN | 0 |
+-----------+------+-------+
| 2018-5-7 | JOHN | 0 |
+-----------+------+-------+
| 2018-5-8 | JOHN | 0 |
+-----------+------+-------+
| 2018-5-9 | JOHN | 5 |
+-----------+------+-------+
| 2018-5-10 | JOHN | 0 |
+-----------+------+-------+
Note that I've put 0 in the HOURS column wherever JOHN has no hours on the given date (he only has hours on 2018-5-3 and 2018-5-9). I am currently trying to get this result. This is only the beginning of a big table I need to process, so I'll need to generate these fixed values per user. I tried using a left/right join with previously generated dates, but it didn't work.
Can you advise me on the best way to accomplish this? Thanks.
Use generate_series() and left join:
select g.dte, t.name, coalesce(t.hours, 0) as hours
from generate_series('2018-05-01'::date, '2018-05-10'::date, interval '1 day') g(dte) left join
t
on g.dte = t.date;
For multiple users, you need to generate all the rows for all the users and then left join:
select g.dte, n.name, coalesce(t.hours, 0) as hours
from generate_series('2018-05-01'::date, '2018-05-10'::date, interval '1 day'
) g(dte) cross join
(select distinct name from t) n left join
t
on g.dte = t.date and n.name = t.name;
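For readers without PostgreSQL at hand, the same idea (generate every date, cross join the names, then left join the real rows) can be sketched in Python with the standard sqlite3 module. This is an illustration under assumptions: SQLite has no built-in generate_series by default, so a recursive CTE is swapped in to build the dates, and the dates are stored as zero-padded ISO strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (date TEXT, name TEXT, hours INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("2018-05-03", "JOHN", 8), ("2018-05-09", "JOHN", 5)],
)

# g: one row per calendar day from 2018-05-01 to 2018-05-10 (recursive CTE).
# n: every distinct user, so each user gets a row for each day.
# The LEFT JOIN keeps days without data; COALESCE turns their NULL into 0.
query = """
WITH RECURSIVE g(dte) AS (
    SELECT '2018-05-01'
    UNION ALL
    SELECT date(dte, '+1 day') FROM g WHERE dte < '2018-05-10'
)
SELECT g.dte, n.name, COALESCE(t.hours, 0) AS hours
FROM g
CROSS JOIN (SELECT DISTINCT name FROM t) AS n
LEFT JOIN t ON t.date = g.dte AND t.name = n.name
ORDER BY n.name, g.dte
"""
filled = conn.execute(query).fetchall()
for row in filled:
    print(row)
```

Taking the name from the cross-joined list n rather than from t is what keeps the NAME column populated on the generated rows.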

Latest Records using Hive

Input Data
SNO | Name | Salary | HireDate
------------------------------------------
1 | A | 10 | 01-13-2014
2 | B | 20 | 11-15-2014
3 | C | 3 | 05-03-2015
4 | D | 4 | 07-03-2015
5 | E | 5 | 12-03-2015
6 | F | 60 | 25-03-2015
7 | G | 70 | 30-03-2015
Final Output Data
I want to get only the current month's data using a Hive query:
SNO | Name | Salary | HireDate
----------------------------------------
3 | C | 3 | 05-03-2015
4 | D | 4 | 07-03-2015
5 | E | 5 | 12-03-2015
6 | F | 60 | 25-03-2015
7 | G | 70 | 30-03-2015
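Do this in a shell script:
curmon=$(date +%m-%Y)
cusdate="01-$curmon"
$HIVE_HOME/bin/hive -e "select * from tablename where unix_timestamp(HireDate, 'dd-MM-yyyy') >= unix_timestamp('$cusdate', 'dd-MM-yyyy');"
curmon will store the current month and year.
cusdate will store the 1st day of this month.
The Hive query will display all rows dated on or after the 1st day of this month. This assumes HireDate is stored as a dd-MM-yyyy string; the date literal is quoted and both sides are parsed with unix_timestamp, because an unquoted 01-03-2015 is evaluated as arithmetic and dd-MM-yyyy strings do not sort chronologically. (Change tablename and column as per your requirements.)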
Just use current_date and the date time functions in Hive. This is probably the easiest way:
select id.*
from inputdata id
where year(hiredate) = year(current_date()) and
month(hiredate) = month(current_date());
EDIT:
Having just tried this out, current_date() is not in at least one implementation of Hive 0.14, despite the documentation. So, you can try:
select id.*
from inputdata id
where year(hiredate) = year(from_unixtime(unix_timestamp())) and
month(hiredate) = month(from_unixtime(unix_timestamp()));
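The year-and-month comparison can be checked locally with a small Python sketch using the standard sqlite3 module. This is not Hive: SQLite's strftime stands in for year() and month(), "today" is pinned to an assumed 2015-03-31 so the result is reproducible, and the sample dates are rewritten as ISO yyyy-MM-dd strings (an interpretation of the mixed formats in the question).

```python
import sqlite3

TODAY = "2015-03-31"  # assumed current date, fixed for reproducibility

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inputdata (sno INTEGER, name TEXT, salary INTEGER, hiredate TEXT)"
)
conn.executemany(
    "INSERT INTO inputdata VALUES (?, ?, ?, ?)",
    [
        (1, "A", 10, "2014-01-13"),
        (2, "B", 20, "2014-11-15"),
        (3, "C", 3, "2015-03-05"),
        (4, "D", 4, "2015-03-07"),
        (5, "E", 5, "2015-03-12"),
        (6, "F", 60, "2015-03-25"),
        (7, "G", 70, "2015-03-30"),
    ],
)

# Keep rows whose year-month part equals the year-month part of TODAY.
current_month_rows = conn.execute(
    """
    SELECT sno, name
    FROM inputdata
    WHERE strftime('%Y-%m', hiredate) = strftime('%Y-%m', :today)
    ORDER BY sno
    """,
    {"today": TODAY},
).fetchall()
print(current_month_rows)
```

Comparing year and month together matters: matching on the month alone would also pick up rows from the same month in other years.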

Select rows where one column is within a day of another column

I have two tables from a site similar to SO: one with posts, and one with up/down votes for each post. I would like to select all votes cast on the day that a post was modified.
My table layout is shown below:
Posts:
-----------------------------------------------
| post_id | post_author | modification_date |
-----------------------------------------------
| 0 | David | 2012-02-25 05:37:34 |
| 1 | David | 2012-02-20 10:13:24 |
| 2 | Matt | 2012-03-27 09:34:33 |
| 3 | Peter | 2012-04-11 19:56:17 |
| ... | ... | ... |
-----------------------------------------------
Votes (each vote is only counted at the end of the day for anonymity):
-------------------------------------------
| vote_id | post_id | vote_date |
-------------------------------------------
| 0 | 0 | 2012-01-13 00:00:00 |
| 1 | 0 | 2012-02-26 00:00:00 |
| 2 | 0 | 2012-02-26 00:00:00 |
| 3 | 0 | 2012-04-12 00:00:00 |
| 4 | 1 | 2012-02-21 00:00:00 |
| ... | ... | ... |
-------------------------------------------
What I want to achieve:
-----------------------------------
| post_id | post_author | vote_id |
-----------------------------------
| 0 | David | 1 |
| 0 | David | 2 |
| 1 | David | 4 |
| ... | ... | ... |
-----------------------------------
I have been able to write the following, but it selects all votes on the day before the post modification, not on the same day (so, in this example, an empty table):
SELECT Posts.post_id, Posts.post_author, Votes.vote_id
FROM Posts
LEFT JOIN Votes ON Posts.post_id = Votes.post_id
WHERE CAST(Posts.modification_date AS DATE) = Votes.vote_date;
How can I fix it so the WHERE clause takes the day before Votes.vote_date? Or, if not possible, is there another way?
Depending on which database you are using (SQL Server, Oracle, etc.), you can usually just subtract 1 from a date and it will subtract exactly one day. Since each vote is dated the day after it was cast, subtract the day from vote_date:
Where Cast(Posts.modification_date as Date) = Votes.vote_date - 1
or, if modification_date is already in date format, just:
Where Posts.modification_date = Votes.vote_date - 1
If you have a site similar to Stack Overflow, then perhaps you also use SQL Server:
SELECT p.post_id, p.post_author, v.vote_id
FROM Posts p LEFT JOIN
Votes v
ON p.post_id = v.post_id
WHERE DATEADD(day, 1, CAST(p.modification_date AS DATE)) = v.vote_date;
Different databases have different ways of adding one day. If this doesn't work, then your database has something similar.
I found another solution, which is to add a day to Posts.modification_date:
...
WHERE CAST(CEILING(CAST(p.modification_date AS FLOAT)) AS datetime) = v.vote_date
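To see the one-day shift produce the expected rows, here is a minimal sketch in Python with the standard sqlite3 module. SQLite is only a stand-in for the asker's database, so date(..., '+1 day') is used in place of the SQL Server expressions above; the sample rows are the ones shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (post_id INTEGER, post_author TEXT, modification_date TEXT)"
)
conn.execute("CREATE TABLE votes (vote_id INTEGER, post_id INTEGER, vote_date TEXT)")
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?)",
    [
        (0, "David", "2012-02-25 05:37:34"),
        (1, "David", "2012-02-20 10:13:24"),
        (2, "Matt", "2012-03-27 09:34:33"),
        (3, "Peter", "2012-04-11 19:56:17"),
    ],
)
conn.executemany(
    "INSERT INTO votes VALUES (?, ?, ?)",
    [
        (0, 0, "2012-01-13 00:00:00"),
        (1, 0, "2012-02-26 00:00:00"),
        (2, 0, "2012-02-26 00:00:00"),
        (3, 0, "2012-04-12 00:00:00"),
        (4, 1, "2012-02-21 00:00:00"),
    ],
)

# A vote cast on the modification day is recorded at midnight of the next day,
# so the modification date plus one day must equal the vote date.
matches = conn.execute(
    """
    SELECT p.post_id, p.post_author, v.vote_id
    FROM posts AS p
    JOIN votes AS v ON v.post_id = p.post_id
    WHERE date(p.modification_date, '+1 day') = date(v.vote_date)
    ORDER BY v.vote_id
    """
).fetchall()
print(matches)
```

An inner join is used here because the filter on the vote date discards unmatched posts anyway, which is also what the WHERE clause does to the LEFT JOIN in the question.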

SQL query to get the same set of results

This should be a simple one, but say I have a table with data like this:
| ID | Date | Value |
| 1 | 01/01/2013 | 40 |
| 2 | 03/01/2013 | 20 |
| 3 | 10/01/2013 | 30 |
| 4 | 14/02/2013 | 60 |
| 5 | 15/03/2013 | 10 |
| 6 | 27/03/2013 | 70 |
| 7 | 01/04/2013 | 60 |
| 8 | 01/06/2013 | 20 |
What I want is the sum of values per week of the year, showing ALL weeks (for use in an Excel graph).
What my query gives me is only the weeks that are actually in the database.
With SQL you cannot return rows that don't exist in some table. To get the effect you want you could create a table called WeeksInYear with only one field WeekNumber that is an Int. Populate the table with all the week numbers. Then JOIN that table to this one.
The query would then look something like the following:
SELECT w.WeekNumber, COALESCE(SUM(m.Value), 0) AS Total
FROM MyTable as m
RIGHT OUTER JOIN WeeksInYear AS w
ON DATEPART(wk, m.date) = w.WeekNumber
GROUP BY w.WeekNumber
The missing weeks have no rows in MyTable, so their SUM is NULL; COALESCE turns that NULL into 0.
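The weeks-table approach can be sketched end to end in Python with the standard sqlite3 module. Assumptions for this illustration: SQLite replaces SQL Server, so strftime('%W') stands in for DATEPART(wk) and numbers weeks 0 to 53 (Monday-based), which will not line up exactly with DATEPART's numbering; the table names are invented; and the sample dates are read as DD/MM/YYYY and stored as ISO strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, date TEXT, value INTEGER)")
conn.execute("CREATE TABLE weeks_in_year (week_number INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [
        (1, "2013-01-01", 40),
        (2, "2013-01-03", 20),
        (3, "2013-01-10", 30),
        (4, "2013-02-14", 60),
        (5, "2013-03-15", 10),
        (6, "2013-03-27", 70),
        (7, "2013-04-01", 60),
        (8, "2013-06-01", 20),
    ],
)
# Populate the helper table with every week number strftime('%W') can return.
conn.executemany(
    "INSERT INTO weeks_in_year VALUES (?)", [(w,) for w in range(54)]
)

# Start from the weeks table so every week appears, then attach matching rows.
# Weeks without data sum to NULL, which COALESCE reports as 0.
rows = conn.execute(
    """
    SELECT w.week_number, COALESCE(SUM(m.value), 0) AS total
    FROM weeks_in_year AS w
    LEFT JOIN my_table AS m
      ON CAST(strftime('%W', m.date) AS INTEGER) = w.week_number
    GROUP BY w.week_number
    ORDER BY w.week_number
    """
).fetchall()
totals = dict(rows)
print(rows[:5])
```

A LEFT JOIN from the weeks table is equivalent to the RIGHT OUTER JOIN above with the tables swapped; it is used here because older SQLite versions do not support RIGHT JOIN.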