HiveQL difference between rows, columns based on dates

I have a table (t_stocks) with data like this:
exchanged,stock_symbol,closing_date,closing_price
NSE,TCS,2009-08-09,2200.1
NSE,TCS,2009-08-10,2300.1
NSE,TCS,2009-08-11,12200.1
NSE,TCS,2009-08-12,22300.1
NSE,TCS,2009-09-09,2200.1
NSE,TCS,2009-09-10,2300.1
NSE,TCS,2009-09-11,12200.1
NSE,TCS,2009-09-12,22300.1
NSE,INFY,2009-08-09,2500.34
NSE,INFY,2009-08-10,1500.34
NSE,INFY,2009-08-09,7500.34
NSE,INFY,2009-08-10,14500.34
NSE,INFY,2009-09-09,2500.34
NSE,INFY,2009-09-10,1500.34
NSE,INFY,2009-09-09,7500.34
NSE,INFY,2009-09-10,14500.34
NSE,TCS,2010-08-09,2200.1
NSE,TCS,2010-08-10,2300.1
NSE,TCS,2010-08-11,12200.1
NSE,TCS,2010-08-12,22300.1
NSE,TCS,2010-09-09,2200.1
NSE,TCS,2010-09-10,2300.1
NSE,TCS,2010-09-11,12200.1
NSE,TCS,2010-09-12,22300.1
NSE,INFY,2010-08-09,2500.34
NSE,INFY,2010-08-10,1500.34
NSE,INFY,2010-08-09,7500.34
NSE,INFY,2010-08-10,14500.34
NSE,INFY,2010-09-09,2500.34
NSE,INFY,2010-09-10,1500.34
NSE,INFY,2010-09-09,7500.34
NSE,INFY,2010-09-10,14500.34
...
...
I need to write a query that generates a report with the following columns:
exchanged, stock_symbol, closing_date, closing_price, yesterday_closing, diff_yesterday_price
(diff_yesterday_price is the difference between yesterday's price and today's price), with output like this:
+----------------+-------------------+-------------------+--------------------+------------------------+-----------------------+--+
| exchanged | stock_symbol | closing_date | closing_price | yesterday_closing | diff_yesterday_price |
+----------------+-------------------+-------------------+--------------------+------------------------+-----------------------+--+
| NSE | INFY | 2009-08-09 | 2500.34 | NULL | NULL |
| NSE | INFY | 2009-08-09 | 7500.34 | 2500.34 | -5000 |
| NSE | INFY | 2009-08-10 | 14500.34 | 7500.34 | -7000 |
| NSE | INFY | 2009-08-10 | 1500.34 | 14500.34 | 13000 |
| NSE | INFY | 2009-09-09 | 7500.34 | 1500.34 | -6000 |
| NSE | INFY | 2009-09-09 | 2500.34 | 7500.34 | 5000 |
| NSE | INFY | 2009-09-10 | 14500.34 | 1500.34 | -13000 |
| NSE | INFY | 2009-09-10 | 1500.34 | 2500.34 | 1000 |
| NSE | INFY | 2010-08-09 | 7500.34 | 14500.34 | 7000 |
| NSE | INFY | 2010-08-09 | 2500.34 | 7500.34 | 5000 |
.....
.....
Can anyone give me some pointers on how to do this efficiently?
Thanks in advance,
Regards.

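You can use Hive's lag() window function to solve this. The Hive documentation on windowing and analytics functions covers it in more detail.
Here is a working demo written for PostgreSQL; the same query works in Hive as well. The demo reads from a table named stockExchange, so substitute your own table (t_stocks).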
select
exchanged,
stock_symbol,
closing_date,
closing_price,
yesterday_price,
(yesterday_price - closing_price) as diff_yesterday_price
from
(
select
*,
lag(closing_price) over (partition by stock_symbol order by closing_date) as yesterday_price
from stockExchange
) la
order by
stock_symbol,
closing_date
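To make the window logic concrete, here is a minimal Python sketch of what lag(closing_price) over (partition by stock_symbol order by closing_date) computes. It is an illustration only, not Hive code: the function name with_lag and the four INFY rows picked from the question are choices made for this example. Note that the sample data contains two rows for the same symbol and date, so ordering by closing_date alone does not decide which row counts as "previous"; Python's sort is stable and keeps ties in input order, whereas a SQL engine is free to order tied rows either way.

```python
from decimal import Decimal

def with_lag(rows):
    """Append the previous closing price per stock_symbol and the difference.

    rows: iterable of (exchanged, stock_symbol, closing_date, closing_price).
    """
    previous = {}  # last closing price seen for each stock_symbol
    result = []
    # Sort by the partition key, then by the ORDER BY column
    # (ISO-formatted dates sort correctly as text).
    for exch, symbol, day, price in sorted(rows, key=lambda r: (r[1], r[2])):
        prev = previous.get(symbol)  # None plays the role of NULL
        diff = None if prev is None else prev - price
        result.append((exch, symbol, day, price, prev, diff))
        previous[symbol] = price
    return result

# Four INFY rows taken from the question's sample data.
sample = [
    ("NSE", "INFY", "2009-08-09", Decimal("2500.34")),
    ("NSE", "INFY", "2009-08-10", Decimal("1500.34")),
    ("NSE", "INFY", "2009-08-09", Decimal("7500.34")),
    ("NSE", "INFY", "2009-08-10", Decimal("14500.34")),
]

for row in with_lag(sample):
    print(row)
```

If the order of rows sharing a date matters, adding a tie-breaking column to the ORDER BY of the window is the usual remedy.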

Oracle SQL get last year of dates excluding weekends

I'd expect this to work to get me a list of calendar dates over the past 12 months excluding weekends; but it just gives me the entire list of dates - which I suppose is fine - but want to know why the below is incorrect.
SELECT ADD_MONTHS(TRUNC(SYSDATE,'MM'),-12) - 1 + rownum AS CalendarDate
FROM all_objects
WHERE ADD_MONTHS(TRUNC(SYSDATE,'MM'),-12) - 1 + rownum <= sysdate
AND to_char(sysdate,'DY') NOT IN ('SAT','SUN')
Because you're doing this:
AND to_char(sysdate,'DY') NOT IN ('SAT','SUN')
And today isn't Saturday or Sunday. You need to look at the calculated CalendarDate value; but you can't do that in the same level of subquery. You could try to recalculate it:
AND to_char(ADD_MONTHS(TRUNC(SYSDATE,'MM'),-12) - 1 + rownum,'DY') NOT IN ('SAT','SUN')
but this will return no rows, at least when run at the moment. Oracle only increments rownum when a row passes the filter. The first candidate row gets rownum 1, which maps to March 1st 2020; that was a Sunday, so the row is rejected. The next candidate is again given rownum 1, is rejected for the same reason, and so on for every row.
You can use an inline view to avoid both issues:
SELECT CalendarDate
FROM (
SELECT ADD_MONTHS(TRUNC(SYSDATE,'MM'),-12) - 1 + rownum AS CalendarDate
FROM all_objects
WHERE ADD_MONTHS(TRUNC(SYSDATE,'MM'),-12) - 1 + rownum <= sysdate
)
WHERE to_char(CalendarDate,'DY','NLS_DATE_LANGUAGE=ENGLISH') NOT IN ('SAT','SUN')
CALENDARDATE
02-MAR-20
03-MAR-20
04-MAR-20
05-MAR-20
06-MAR-20
09-MAR-20
10-MAR-20
...
I've chucked in a language modifier to stop it behaving differently for users with sessions not set to English.
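The ROWNUM behaviour described above can be simulated in a few lines of Python. This is only a sketch of the rule "a row receives the next number only if it passes the filter", not Oracle itself; the helper names (rownum_filter, calendar_date) and the fixed "today" of 9 March 2021 (taken from the last date in the output shown) are assumptions made for the illustration.

```python
from datetime import date, timedelta

def rownum_filter(candidate_rows, predicate):
    """Mimic ROWNUM: number a row tentatively, keep the number only if it passes."""
    kept = []
    for _ in candidate_rows:
        rownum = len(kept) + 1  # tentative number for this candidate row
        if predicate(rownum):
            kept.append(rownum)
    return kept

start = date(2020, 3, 1)  # ADD_MONTHS(TRUNC(SYSDATE,'MM'),-12), a Sunday
today = date(2021, 3, 9)  # assumed value of sysdate

def calendar_date(rownum):
    # start - 1 + rownum, as in the query
    return start - timedelta(days=1) + timedelta(days=rownum)

def in_range_and_weekday(rownum):
    d = calendar_date(rownum)
    return d <= today and d.weekday() < 5  # Monday=0 ... Friday=4

def in_range(rownum):
    return calendar_date(rownum) <= today

# Weekend test at the same level as rownum: row 1 is a Sunday, so nothing passes.
same_level = rownum_filter(range(1000), in_range_and_weekday)

# Inline view: number the rows first, then drop weekends in the outer query.
inline_view = [calendar_date(n) for n in rownum_filter(range(1000), in_range)]
weekdays = [d for d in inline_view if d.weekday() < 5]

print(same_level)                 # empty list
print(weekdays[0], weekdays[-1])  # first and last weekday in the range
```

The same-level filter never gets past row 1, while the inline-view version numbers every date first and only then removes the weekends.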
Querying against all_objects isn't ideal though; it would be better to use a hierarchical query:
SELECT *
FROM (
SELECT ADD_MONTHS(TRUNC(SYSDATE,'MM'),-12) - 1 + level AS CalendarDate
FROM dual
CONNECT BY level <= TRUNC(SYSDATE) - ADD_MONTHS(TRUNC(SYSDATE,'MM'),-12) + 1
)
WHERE to_char(CalendarDate,'DY','NLS_DATE_LANGUAGE=ENGLISH') NOT IN ('SAT','SUN')
ORDER BY CalendarDate
or a recursive CTE, if you're on 11gR2 or later:
WITH rcte (CalendarDate) AS (
SELECT ADD_MONTHS(TRUNC(SYSDATE,'MM'),-12)
FROM dual
UNION ALL
SELECT rcte.CalendarDate + interval '1' day
FROM rcte
WHERE rcte.CalendarDate < TRUNC(SYSDATE)
)
SELECT CalendarDate
FROM rcte
WHERE to_char(CalendarDate,'DY','NLS_DATE_LANGUAGE=ENGLISH') NOT IN ('SAT','SUN')
ORDER BY CalendarDate
db<>fiddle (as 18c to avoid a couple of issues with the patch level in the 11g version it uses).
You are checking whether today is Saturday or Sunday with to_char(sysdate,'DY'). You need to check CalendarDate instead, which is not available at that level of the query. You can use a CTE to calculate the calendar and then remove the weekends with your condition, as below.
with cte (CalendarDate) as
(
SELECT ADD_MONTHS(TRUNC(SYSDATE,'MM'),-12) - 1 + rownum AS CalendarDate
FROM all_objects
WHERE ADD_MONTHS(TRUNC(SYSDATE,'MM'),-12) - 1 + rownum <= sysdate
)
select * from cte where
to_char(CalendarDate,'DY') not in ('SAT','SUN');
| CALENDARDATE |
| :----------- |
| 02-MAR-20 |
| 03-MAR-20 |
| 04-MAR-20 |
| 05-MAR-20 |
| 06-MAR-20 |
| 09-MAR-20 |
| 10-MAR-20 |
| 11-MAR-20 |
| 12-MAR-20 |
| 13-MAR-20 |
| 16-MAR-20 |
| 17-MAR-20 |
| 18-MAR-20 |
| 19-MAR-20 |
| 20-MAR-20 |
| 23-MAR-20 |
| 24-MAR-20 |
| 25-MAR-20 |
| 26-MAR-20 |
| 27-MAR-20 |
| 30-MAR-20 |
| 31-MAR-20 |
| 01-APR-20 |
| 02-APR-20 |
| 03-APR-20 |
| 06-APR-20 |
| 07-APR-20 |
| 08-APR-20 |
| 09-APR-20 |
| 10-APR-20 |
| 13-APR-20 |
| 14-APR-20 |
| 15-APR-20 |
| 16-APR-20 |
| 17-APR-20 |
| 20-APR-20 |
| 21-APR-20 |
| 22-APR-20 |
| 23-APR-20 |
| 24-APR-20 |
| 27-APR-20 |
| 28-APR-20 |
| 29-APR-20 |
| 30-APR-20 |
| 01-MAY-20 |
| 04-MAY-20 |
| 05-MAY-20 |
| 06-MAY-20 |
| 07-MAY-20 |
| 08-MAY-20 |
| 11-MAY-20 |
| 12-MAY-20 |
| 13-MAY-20 |
| 14-MAY-20 |
| 15-MAY-20 |
| 18-MAY-20 |
| 19-MAY-20 |
| 20-MAY-20 |
| 21-MAY-20 |
| 22-MAY-20 |
| 25-MAY-20 |
| 26-MAY-20 |
| 27-MAY-20 |
| 28-MAY-20 |
| 29-MAY-20 |
| 01-JUN-20 |
| 02-JUN-20 |
| 03-JUN-20 |
| 04-JUN-20 |
| 05-JUN-20 |
| 08-JUN-20 |
| 09-JUN-20 |
| 10-JUN-20 |
| 11-JUN-20 |
| 12-JUN-20 |
| 15-JUN-20 |
| 16-JUN-20 |
| 17-JUN-20 |
| 18-JUN-20 |
| 19-JUN-20 |
| 22-JUN-20 |
| 23-JUN-20 |
| 24-JUN-20 |
| 25-JUN-20 |
| 26-JUN-20 |
| 29-JUN-20 |
| 30-JUN-20 |
| 01-JUL-20 |
| 02-JUL-20 |
| 03-JUL-20 |
| 06-JUL-20 |
| 07-JUL-20 |
| 08-JUL-20 |
| 09-JUL-20 |
| 10-JUL-20 |
| 13-JUL-20 |
| 14-JUL-20 |
| 15-JUL-20 |
| 16-JUL-20 |
| 17-JUL-20 |
| 20-JUL-20 |
| 21-JUL-20 |
| 22-JUL-20 |
| 23-JUL-20 |
| 24-JUL-20 |
| 27-JUL-20 |
| 28-JUL-20 |
| 29-JUL-20 |
| 30-JUL-20 |
| 31-JUL-20 |
| 03-AUG-20 |
| 04-AUG-20 |
| 05-AUG-20 |
| 06-AUG-20 |
| 07-AUG-20 |
| 10-AUG-20 |
| 11-AUG-20 |
| 12-AUG-20 |
| 13-AUG-20 |
| 14-AUG-20 |
| 17-AUG-20 |
| 18-AUG-20 |
| 19-AUG-20 |
| 20-AUG-20 |
| 21-AUG-20 |
| 24-AUG-20 |
| 25-AUG-20 |
| 26-AUG-20 |
| 27-AUG-20 |
| 28-AUG-20 |
| 31-AUG-20 |
| 01-SEP-20 |
| 02-SEP-20 |
| 03-SEP-20 |
| 04-SEP-20 |
| 07-SEP-20 |
| 08-SEP-20 |
| 09-SEP-20 |
| 10-SEP-20 |
| 11-SEP-20 |
| 14-SEP-20 |
| 15-SEP-20 |
| 16-SEP-20 |
| 17-SEP-20 |
| 18-SEP-20 |
| 21-SEP-20 |
| 22-SEP-20 |
| 23-SEP-20 |
| 24-SEP-20 |
| 25-SEP-20 |
| 28-SEP-20 |
| 29-SEP-20 |
| 30-SEP-20 |
| 01-OCT-20 |
| 02-OCT-20 |
| 05-OCT-20 |
| 06-OCT-20 |
| 07-OCT-20 |
| 08-OCT-20 |
| 09-OCT-20 |
| 12-OCT-20 |
| 13-OCT-20 |
| 14-OCT-20 |
| 15-OCT-20 |
| 16-OCT-20 |
| 19-OCT-20 |
| 20-OCT-20 |
| 21-OCT-20 |
| 22-OCT-20 |
| 23-OCT-20 |
| 26-OCT-20 |
| 27-OCT-20 |
| 28-OCT-20 |
| 29-OCT-20 |
| 30-OCT-20 |
| 02-NOV-20 |
| 03-NOV-20 |
| 04-NOV-20 |
| 05-NOV-20 |
| 06-NOV-20 |
| 09-NOV-20 |
| 10-NOV-20 |
| 11-NOV-20 |
| 12-NOV-20 |
| 13-NOV-20 |
| 16-NOV-20 |
| 17-NOV-20 |
| 18-NOV-20 |
| 19-NOV-20 |
| 20-NOV-20 |
| 23-NOV-20 |
| 24-NOV-20 |
| 25-NOV-20 |
| 26-NOV-20 |
| 27-NOV-20 |
| 30-NOV-20 |
| 01-DEC-20 |
| 02-DEC-20 |
| 03-DEC-20 |
| 04-DEC-20 |
| 07-DEC-20 |
| 08-DEC-20 |
| 09-DEC-20 |
| 10-DEC-20 |
| 11-DEC-20 |
| 14-DEC-20 |
| 15-DEC-20 |
| 16-DEC-20 |
| 17-DEC-20 |
| 18-DEC-20 |
| 21-DEC-20 |
| 22-DEC-20 |
| 23-DEC-20 |
| 24-DEC-20 |
| 25-DEC-20 |
| 28-DEC-20 |
| 29-DEC-20 |
| 30-DEC-20 |
| 31-DEC-20 |
| 01-JAN-21 |
| 04-JAN-21 |
| 05-JAN-21 |
| 06-JAN-21 |
| 07-JAN-21 |
| 08-JAN-21 |
| 11-JAN-21 |
| 12-JAN-21 |
| 13-JAN-21 |
| 14-JAN-21 |
| 15-JAN-21 |
| 18-JAN-21 |
| 19-JAN-21 |
| 20-JAN-21 |
| 21-JAN-21 |
| 22-JAN-21 |
| 25-JAN-21 |
| 26-JAN-21 |
| 27-JAN-21 |
| 28-JAN-21 |
| 29-JAN-21 |
| 01-FEB-21 |
| 02-FEB-21 |
| 03-FEB-21 |
| 04-FEB-21 |
| 05-FEB-21 |
| 08-FEB-21 |
| 09-FEB-21 |
| 10-FEB-21 |
| 11-FEB-21 |
| 12-FEB-21 |
| 15-FEB-21 |
| 16-FEB-21 |
| 17-FEB-21 |
| 18-FEB-21 |
| 19-FEB-21 |
| 22-FEB-21 |
| 23-FEB-21 |
| 24-FEB-21 |
| 25-FEB-21 |
| 26-FEB-21 |
| 01-MAR-21 |
| 02-MAR-21 |
| 03-MAR-21 |
| 04-MAR-21 |
| 05-MAR-21 |
| 08-MAR-21 |
| 09-MAR-21 |

PostgreSQL: Count number of rows in table 1 for distinct rows in table 2

I am working with a very large dataset and have become confused; it looks like I'm just repeating the same thing.
I want to count the number of trips per user from two tables, trips and session.
psql=> SELECT * FROM trips limit 10;
trip_id | session_ids | daily_user_id | seconds_start | seconds_end
---------+-----------------+---------------+---------------+-------------
400543 | {172079} | 17118 | 1575550944 | 1575551181
400542 | {172078} | 17118 | 1575541533 | 1575542171
400540 | {172077} | 17118 | 1575539001 | 1575539340
400538 | {172076} | 17117 | 1575540499 | 1575541999
400534 | {172074,172075} | 17117 | 1575537161 | 1575539711
400530 | {172073} | 17116 | 1575447043 | 1575447682
400529 | {172071} | 17115 | 1575496394 | 1575497803
400527 | {172070} | 17113 | 1575495241 | 1575496034
400525 | {172068} | 17115 | 1575485658 | 1575489378
400524 | {172067} | 17113 | 1575488721 | 1575490491
(10 rows)
psql=> SELECT * FROM session limit 10;
session_id | user_id | key | start_time | daily_user_id
------------+---------+--------------------------+------------+---------------
172079 | 43 | hLB8S7aSfp4gAFp7TykwYQ==+| 1575550921 | 17118
| | | |
172078 | 43 | YATMrL/AQ7Nu5q2dQTMT1A==+| 1575541530 | 17118
| | | |
172077 | 43 | fOLX4tqvsyFOP3DCyBZf1A==+| 1575538997 | 17118
| | | |
172076 | 7 | 88hwGj4Mqa58juy0PG/R4A==+| 1575540515 | 17117
| | | |
172075 | 7 | 1O+8X49+YbtmoEa9BlY5OQ==+| 1575538384 | 17117
| | | |
172074 | 7 | XOR7hsFCNk+soM75ZhDJyA==+| 1575537405 | 17117
| | | |
172073 | 42 | rAQWwYgqg3UMTpsBYSpIpA==+| 1575447109 | 17116
| | | |
172072 | 276 | 0xOsxRRN3Sq20VsXWjlrzQ==+| 1575511120 | 17114
| | | |
172071 | 7 | P4beN3W/ZrD+TCpZGYh23g==+| 1575496642 | 17115
| | | |
172070 | 43 | OFi30Zv9e5gmLZS5Vb+I7Q==+| 1575495238 | 17113
| | | |
(10 rows)
Goal: get the distribution of trips per user
Attempt:
psql=> SELECT COUNT(distinct trip_id) as trips
, count(distinct user_id) as users
, extract(year from to_timestamp(seconds_start)) as year_date
, extract(month from to_timestamp(seconds_start)) as month_date
FROM trips
INNER JOIN session
ON session_id = ANY(session_ids)
GROUP BY year_date, month_date
ORDER BY year_date, month_date;
+-------+-------+-----------+------------+
| trips | users | year_date | month_date |
+-------+-------+-----------+------------+
| 371 | 44 | 2016 | 3 |
| 12207 | 185 | 2016 | 4 |
| 3859 | 88 | 2016 | 5 |
| 1547 | 28 | 2016 | 6 |
| 831 | 17 | 2016 | 7 |
| 427 | 4 | 2016 | 8 |
| 512 | 13 | 2016 | 9 |
| 431 | 11 | 2016 | 10 |
| 1011 | 26 | 2016 | 11 |
| 791 | 15 | 2016 | 12 |
| 217 | 8 | 2017 | 1 |
| 490 | 17 | 2017 | 2 |
| 851 | 18 | 2017 | 3 |
| 1890 | 66 | 2017 | 4 |
| 2143 | 43 | 2017 | 5 |
| . | | | |
| . | | | |
| . | | | |
+-------+-------+-----------+------------+
This result set counts the number of users and trips per month. What I actually want is the number of trips per user, like so:
+------+-------------+
| user | no_of_trips |
+------+-------------+
| 1 | 489 |
| 2 | 400 |
| 3 | 12 |
| 4 | 102 |
| . | |
| . | |
| . | |
+------+-------------+
How do I do this, please?
You seem to just want aggregation by user_id:
SELECT s.user_id, COUNT(distinct t.trip_id) as trips
FROM trips t INNER JOIN
session s
ON s.session_id = ANY(t.session_ids)
GROUP BY s.user_id ;
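COUNT(DISTINCT) is only unnecessary if a trip never joins to more than one session of the same user. In the sample data, trip 400534 has two sessions (172074 and 172075) that both belong to user 7, so COUNT(*) would count that trip twice. If that cannot happen in your data, the query simplifies to: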
SELECT s.user_id, COUNT(*) as trips
FROM trips t INNER JOIN
session s
ON s.session_id = ANY(t.session_ids)
GROUP BY s.user_id ;
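As a quick check of how the two counts can differ, here is a small Python sketch that replays the join on the sample rows from the question. It is not PostgreSQL, just the same logic by hand; the variable names are made up for the example, and sessions that do not appear in the 10-row sample (172068, 172067) simply find no match, as they would in an inner join against that sample.

```python
from collections import defaultdict

# session_id -> user_id, from the session sample in the question
session_user = {
    172079: 43, 172078: 43, 172077: 43, 172076: 7, 172075: 7,
    172074: 7, 172073: 42, 172072: 276, 172071: 7, 172070: 43,
}

# trip_id -> session_ids, from the trips sample in the question
trips = {
    400543: [172079], 400542: [172078], 400540: [172077],
    400538: [172076], 400534: [172074, 172075], 400530: [172073],
    400529: [172071], 400527: [172070], 400525: [172068],
    400524: [172067],
}

# INNER JOIN ... ON session_id = ANY(session_ids): one row per matching session
joined = [
    (session_user[sid], trip_id)
    for trip_id, session_ids in trips.items()
    for sid in session_ids
    if sid in session_user
]

count_star = defaultdict(int)     # COUNT(*) per user
distinct_trips = defaultdict(set)  # trips seen per user
for user_id, trip_id in joined:
    count_star[user_id] += 1
    distinct_trips[user_id].add(trip_id)
count_distinct = {u: len(t) for u, t in distinct_trips.items()}  # COUNT(DISTINCT trip_id)

print(dict(count_star))
print(count_distinct)
```

On this sample the two counts disagree for user 7 only, because trip 400534 joins to two of that user's sessions.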

SQL subcategory total is not properly placed at the end of every parent category

I am having trouble with a SQL query. The query result should look like this:
+------------+------------+-----+------+-------+
| District | Tehsil | yes | no | Total |
+------------+------------+-----+------+-------+
| ABBOTTABAD | ABBOTTABAD | 377 | 5927 | 6304 |
| ABBOTTABAD | HAVELIAN | 112 | 2276 | 2388 |
| ABBOTTABAD | Overall | 489 | 8203 | 8692 |
| CHARSADDA | CHARSADDA | 289 | 3762 | 4051 |
| CHARSADDA | SHABQADAR | 121 | 1376 | 1497 |
| CHARSADDA | TANGI | 94 | 1703 | 1797 |
| CHARSADDA | Overall | 504 | 6841 | 7345 |
+------------+------------+-----+------+-------+
The overall total should be shown at the end of every parent category, but at the moment the result looks like this:
+------------+------------+-----+------+-------+
| District | Tehsil | yes | no | Total |
+------------+------------+-----+------+-------+
| ABBOTTABAD | ABBOTTABAD | 377 | 5927 | 6304 |
| ABBOTTABAD | HAVELIAN | 112 | 2276 | 2388 |
| ABBOTTABAD | Overall | 489 | 8203 | 8692 |
| CHARSADDA | CHARSADDA | 289 | 3762 | 4051 |
| CHARSADDA | Overall | 504 | 6841 | 7345 |
| CHARSADDA | SHABQADAR | 121 | 1376 | 1497 |
| CHARSADDA | TANGI | 94 | 1703 | 1797 |
+------------+------------+-----+------+-------+
The second column comes out sorted alphabetically within each district, even though ORDER BY is applied only to the first column. This is my query:
select District as 'District', tName as 'tehsil',[1] as 'yes',[0] as 'no',ISNULL([1]+[0], 0) as "Total" from
(
select d.Name as 'District',
case when grouping (t.Name)=1 then 'Overall' else t.Name end as tName,
BoundaryWallAvailable,
count(*) as total from School s
INNER JOIN SchoolIndicator i ON (i.refSchoolID=s.SchoolID)
INNER JOIN Tehsil t ON (t.TehsilID=s.refTehsilID)
INNER JOIN district d ON (d.DistrictID=t.refDistrictID)
group by
GROUPING sets((d.Name, BoundaryWallAvailable), (d.Name,t.Name, BoundaryWallAvailable))
) B
PIVOT
(
max(total) for BoundaryWallAvailable in ([1],[0])
) as Pvt
order by District
P.S.: BoundaryWallAvailable is a single column; I am pivoting it into the yes and no columns.
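One way to think about the ordering that is wanted here: sort by district first, then by a flag that is true only for the 'Overall' row, then by tehsil name. The Python sketch below applies that sort key to the rows shown above; it is an illustration of the idea, not a tested fix for the query. In T-SQL the corresponding clause would presumably be ORDER BY District, CASE WHEN tName = 'Overall' THEN 1 ELSE 0 END, tName, which is an assumption and has not been run against this schema.

```python
# Rows as (District, Tehsil, yes, no), in the order the query currently returns them.
rows = [
    ("ABBOTTABAD", "ABBOTTABAD", 377, 5927),
    ("ABBOTTABAD", "HAVELIAN", 112, 2276),
    ("ABBOTTABAD", "Overall", 489, 8203),
    ("CHARSADDA", "CHARSADDA", 289, 3762),
    ("CHARSADDA", "Overall", 504, 6841),
    ("CHARSADDA", "SHABQADAR", 121, 1376),
    ("CHARSADDA", "TANGI", 94, 1703),
]

def sort_key(row):
    district, tehsil = row[0], row[1]
    # False sorts before True, so the 'Overall' row goes last within its district.
    return (district, tehsil == "Overall", tehsil)

ordered = sorted(rows, key=sort_key)
for district, tehsil, yes, no in ordered:
    print(district, tehsil, yes, no, yes + no)
```

The sketch relies on the literal label 'Overall'; a tehsil that really had that name would need a separate flag, for example one derived from GROUPING(t.Name).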

How to conditionally count rows from another table WITHOUT USING A CORRELATED SUBQUERY?

I have a dataset for which I have to conditionally count rows from table B that are between two dates in table A. I have to do this without the use of a correlated subquery in the SELECT clause, as this is not supported in Netezza - docs: https://www.ibm.com/support/knowledgecenter/en/SSULQD_7.0.3/com.ibm.nz.dbu.doc/c_dbuser_correlated_subqueries_ntz_sql.html.
Background on tables: Users can log in to a site (logins). When they log in, they can take actions, which are in (actions_taken). The desired output is a count of rows that are between the actions_taken action_date and lag_action_date.
Data and attempt found here: http://rextester.com/NLDH13254
Table: actions_taken (with added calculations - see RexTester.)
| user_id | action_type | action_date | lag_action_date | elapsed_days |
|---------|---------------|-------------|-----------------|--------------|
| 12345 | action_type_1 | 6/27/2017 | 3/3/2017 | 116 |
| 12345 | action_type_1 | 3/3/2017 | 2/28/2017 | 3 |
| 12345 | action_type_1 | 2/28/2017 | NULL | NULL |
| 12345 | action_type_2 | 3/6/2017 | 3/3/2017 | 3 |
| 12345 | action_type_2 | 3/3/2017 | 3/25/2016 | 343 |
| 12345 | action_type_2 | 3/25/2016 | NULL | NULL |
| 12345 | action_type_4 | 3/6/2017 | 3/3/2017 | 3 |
| 12345 | action_type_4 | 3/3/2017 | NULL | NULL |
| 99887 | action_type_1 | 4/1/2017 | 2/11/2017 | 49 |
| 99887 | action_type_1 | 2/11/2017 | 1/28/2017 | 14 |
| 99887 | action_type_1 | 1/28/2017 | NULL | NULL |
Table: logins
| user_id | login_date |
|---------|------------|
| 12345 | 6/27/2017 |
| 12345 | 6/26/2017 |
| 12345 | 3/7/2017 |
| 12345 | 3/6/2017 |
| 12345 | 3/3/2017 |
| 12345 | 3/2/2017 |
| 12345 | 3/1/2017 |
| 12345 | 2/28/2017 |
| 12345 | 2/27/2017 |
| 12345 | 2/25/2017 |
| 12345 | 3/25/2016 |
| 12345 | 3/23/2016 |
| 12345 | 3/20/2016 |
| 99887 | 6/27/2017 |
| 99887 | 6/26/2017 |
| 99887 | 6/24/2017 |
| 99887 | 4/2/2017 |
| 99887 | 4/1/2017 |
| 99887 | 3/30/2017 |
| 99887 | 3/8/2017 |
| 99887 | 3/6/2017 |
| 99887 | 3/3/2017 |
| 99887 | 3/2/2017 |
| 99887 | 2/28/2017 |
| 99887 | 2/11/2017 |
| 99887 | 1/28/2017 |
| 99887 | 1/26/2017 |
| 99887 | 5/28/2016 |
DESIRED OUTPUT: cnt_logins_between_action_dates field
| user_id | action_type | action_date | lag_action_date | elapsed_days | cnt_logins_between_action_dates |
|---------|---------------|-------------|-----------------|--------------|---------------------------------|
| 12345 | action_type_1 | 6/27/2017 | 3/3/2017 | 116 | 5 |
| 12345 | action_type_1 | 3/3/2017 | 2/28/2017 | 3 | 4 |
| 12345 | action_type_1 | 2/28/2017 | NULL | NULL | 1 |
| 12345 | action_type_2 | 3/6/2017 | 3/3/2017 | 3 | 2 |
| 12345 | action_type_2 | 3/3/2017 | 3/25/2016 | 343 | 7 |
| 12345 | action_type_2 | 3/25/2016 | NULL | NULL | 1 |
| 12345 | action_type_4 | 3/6/2017 | 3/3/2017 | 3 | 2 |
| 12345 | action_type_4 | 3/3/2017 | NULL | NULL | 1 |
| 99887 | action_type_1 | 4/1/2017 | 2/11/2017 | 49 | 8 |
| 99887 | action_type_1 | 2/11/2017 | 1/28/2017 | 14 | 2 |
| 99887 | action_type_1 | 1/28/2017 | NULL | NULL | 1 |
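You don't need a correlated sub-query. Get the previous date using lag, then join the logins table to count the logins between the two dates. The coalesce makes the first action of each group use its own date as the lower bound, so only logins on that day are counted.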
with prev_dates as (select at.*
,coalesce(lag(action_date) over(partition by user_id,action_type order by action_date)
,action_date) as lag_action_date
from actions_taken at
)
select at.user_id,at.action_type,at.action_date,at.lag_action_date
,at.action_date-at.lag_action_date as elapsed_days
,count(*) as cnt
from prev_dates at
join logins l on l.user_id=at.user_id and l.login_date<=at.action_date and l.login_date>=at.lag_action_date
group by at.user_id,at.action_type,at.action_date,at.lag_action_date
order by 1,2,3
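The logic of that query can be traced by hand with a short Python sketch. It is not Netezza SQL; it only replays the same steps (previous date per user and action type, falling back to the action date itself, then counting logins in the closed interval) on user 12345's rows from the question. The function name count_logins_between is made up for the example.

```python
from datetime import date

# (user_id, action_type, action_date) for user 12345, types 1 and 2
actions = [
    (12345, "action_type_1", date(2017, 6, 27)),
    (12345, "action_type_1", date(2017, 3, 3)),
    (12345, "action_type_1", date(2017, 2, 28)),
    (12345, "action_type_2", date(2017, 3, 6)),
    (12345, "action_type_2", date(2017, 3, 3)),
    (12345, "action_type_2", date(2016, 3, 25)),
]

# (user_id, login_date) for user 12345
logins = [(12345, d) for d in [
    date(2017, 6, 27), date(2017, 6, 26), date(2017, 3, 7), date(2017, 3, 6),
    date(2017, 3, 3), date(2017, 3, 2), date(2017, 3, 1), date(2017, 2, 28),
    date(2017, 2, 27), date(2017, 2, 25), date(2016, 3, 25), date(2016, 3, 23),
    date(2016, 3, 20),
]]

def count_logins_between(actions, logins):
    result = []
    previous = {}  # last action_date seen per (user_id, action_type)
    for user, kind, day in sorted(actions):
        # coalesce(lag(action_date) over (...), action_date)
        lag_day = previous.get((user, kind), day)
        cnt = sum(1 for u, d in logins if u == user and lag_day <= d <= day)
        if cnt:  # an inner join drops actions with no matching login
            result.append((user, kind, day, lag_day, (day - lag_day).days, cnt))
        previous[(user, kind)] = day
    return result

for row in count_logins_between(actions, logins):
    print(row)
```

Because of the coalesce, the first action in each group reports its own date as lag_action_date and 0 elapsed days, rather than the NULLs shown in the desired output.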

How to check dates condition from one table to another in SQL

How can we check and compare dates in one table against dates in another?
Table : inc
+--------+---------+-----------+-----------+-------------+
| inc_id | cust_id | item_id | serv_time | inc_date |
+--------+---------+-----------+-----------+-------------+
| 1 | john | HP | 40 | 17-Apr-2015 |
| 2 | John | HP | 60 | 10-Jan-2016 |
| 3 | Nick | Cisco | 120 | 11-Jan-2016 |
| 4 | samanta | EMC | 180 | 12-Jan-2016 |
| 5 | Kerlee | Oracle | 40 | 13-Jan-2016 |
| 6 | Amir | Microsoft | 300 | 14-Jan-2016 |
| 7 | John | HP | 120 | 15-Jan-2016 |
| 8 | samanta | EMC | 20 | 16-Jan-2016 |
| 9 | Kerlee | Oracle | 10 | 2-Feb-2017 |
+--------+---------+-----------+-----------+-------------+
Table: Contract:
+-----------+---------+----------+------------+
| item_id | con_id | Start | End |
+-----------+---------+----------+------------+
| Dell | DE2015 | 1/1/2015 | 12/31/2015 |
| HP | HP2015 | 1/1/2015 | 12/31/2015 |
| Cisco | CIS2016 | 1/1/2016 | 12/31/2016 |
| EMC | EMC2016 | 1/1/2016 | 12/31/2016 |
| HP | HP2016 | 1/1/2016 | 12/31/2016 |
| Oracle | OR2016 | 1/1/2016 | 12/31/2016 |
| Microsoft | MS2016 | 1/1/2016 | 12/31/2016 |
| Microsoft | MS2017 | 1/1/2017 | 12/31/2017 |
+-----------+---------+----------+------------+
Result:
+-------+---------+---------+--------------+
| Calls | Cust_id | Con_id | Tot_Ser_Time |
+-------+---------+---------+--------------+
| 2 | John | HP2016 | 180 |
| 2 | samanta | EMC2016 | 200 |
| 1 | Nick | CIS2016 | 120 |
| 1 | Amir | MS2016 | 300 |
| 1 | Kerlee | OR2016 | 40 |
+-------+---------+---------+--------------+
MY Query:
select count(inc_id) as Calls, inc.cust_id, contract.con_id,
sum(inc.serv_time) as tot_serv_time
from inc inner join contract on inc.item_id = contract.item_id
where inc.inc_date between '2016-01-01' and '2016-12-31'
group by inc.cust_id, contract.con_id
The result should come from the inc table, filtered to 1-Jan-2016 through 31-Dec-2016, with a
count of inc_id per item based on the item's contract start and end dates.
If I understand your problem correctly, this query will return the desired result:
select
count(*) as Calls,
inc.cust_id,
contract.con_id,
sum(inc.serv_time) as tot_serv_time
from
inc inner join contract
on inc.item_id = contract.item_id
and inc.inc_date between contract.start and contract.end
where
inc.inc_date between '2016-01-01' and '2016-12-31'
group by
inc.cust_id,
contract.con_id
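As a sanity check, the same join can be run against the sample data with Python's built-in sqlite3 module. This is a sketch under a few assumptions: dates are stored as ISO text (so BETWEEN compares them correctly as strings), and the contract columns are renamed start_date and end_date here because END is a reserved word in SQLite.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table inc (inc_id int, cust_id text, item_id text, serv_time int, inc_date text);
create table contract (item_id text, con_id text, start_date text, end_date text);
insert into inc values
 (1,'john','HP',40,'2015-04-17'), (2,'John','HP',60,'2016-01-10'),
 (3,'Nick','Cisco',120,'2016-01-11'), (4,'samanta','EMC',180,'2016-01-12'),
 (5,'Kerlee','Oracle',40,'2016-01-13'), (6,'Amir','Microsoft',300,'2016-01-14'),
 (7,'John','HP',120,'2016-01-15'), (8,'samanta','EMC',20,'2016-01-16'),
 (9,'Kerlee','Oracle',10,'2017-02-02');
insert into contract values
 ('Dell','DE2015','2015-01-01','2015-12-31'), ('HP','HP2015','2015-01-01','2015-12-31'),
 ('Cisco','CIS2016','2016-01-01','2016-12-31'), ('EMC','EMC2016','2016-01-01','2016-12-31'),
 ('HP','HP2016','2016-01-01','2016-12-31'), ('Oracle','OR2016','2016-01-01','2016-12-31'),
 ('Microsoft','MS2016','2016-01-01','2016-12-31'), ('Microsoft','MS2017','2017-01-01','2017-12-31');
""")

# Match each incident to the contract that covers its date, then aggregate.
query = """
select count(*) as calls, inc.cust_id, contract.con_id, sum(inc.serv_time) as tot_serv_time
from inc
join contract
  on inc.item_id = contract.item_id
 and inc.inc_date between contract.start_date and contract.end_date
where inc.inc_date between '2016-01-01' and '2016-12-31'
group by inc.cust_id, contract.con_id
order by inc.cust_id
"""
result = con.execute(query).fetchall()
for row in result:
    print(row)
```

Without the date condition in the join, each 2016 HP incident would also match the HP2015 contract and be counted under both contracts.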
The question is a little vague, so you might need to make some adjustments to this query.
select
Calls = count(*)
, Cust = i.Cust_id
, Contract = c.con_id
, Serv_Time = sum(Serv_Time)
from inc as i
inner join contract as c
on i.item_id = c.item_id
and i.inc_date >= c.[start]
and i.inc_date <= c.[end]
where c.[start]>='20160101'
group by i.Cust_id, c.con_id
order by i.Cust_Id, c.con_id
returns:
+-------+---------+----------+-----------+
| Calls | Cust | Contract | Serv_Time |
+-------+---------+----------+-----------+
| 1 | Amir | MS2016 | 300 |
| 2 | John | HP2016 | 180 |
| 1 | Kerlee | OR2016 | 40 |
| 1 | Nick | CIS2016 | 120 |
| 2 | samanta | EMC2016 | 200 |
+-------+---------+----------+-----------+
test setup: http://rextester.com/WSYDL43321
create table inc(
inc_id int
, cust_id varchar(16)
, item_id varchar(16)
, serv_time int
, inc_date date
);
insert into inc values
(1,'john','HP', 40 ,'17-Apr-2015')
,(2,'John','HP', 60 ,'10-Jan-2016')
,(3,'Nick','Cisco', 120 ,'11-Jan-2016')
,(4,'samanta','EMC', 180 ,'12-Jan-2016')
,(5,'Kerlee','Oracle', 40 ,'13-Jan-2016')
,(6,'Amir','Microsoft', 300 ,'14-Jan-2016')
,(7,'John','HP', 120 ,'15-Jan-2016')
,(8,'samanta','EMC', 20 ,'16-Jan-2016')
,(9,'Kerlee','Oracle', 10 ,'02-Feb-2017');
create table contract (
item_id varchar(16)
, con_id varchar(16)
, [Start] date
, [End] date
);
insert into contract values
('Dell','DE2015','20150101','20151231')
,('HP','HP2015','20150101','20151231')
,('Cisco','CIS2016','20160101','20161231')
,('EMC','EMC2016','20160101','20161231')
,('HP','HP2016','20160101','20161231')
,('Oracle','OR2016','20160101','20161231')
,('Microsoft','MS2016','20160101','20161231')
,('Microsoft','MS2017','20170101','20171231');