Count by week between dates - sql

I'm trying to show a count by week, but I am unsure how to find the weeks that don't show up between effdate and expdat. How do I show the week and count shown below? Thanks.

You could use a recursive query to enumerate the weeks, then join it with the table:
with cte as (
    select min(effweek) week, max(expweek) max_week from mytable
    union all
    select week + 1, max_week from cte where week < max_week
)
select c.week, count(t.id_num) cnt
from cte c
left join mytable t on c.week between t.effweek and t.expweek
group by c.week
order by c.week
(Simplified) demo on DB Fiddle:
week | cnt
---: | --:
12 | 2
13 | 1
14 | 1
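One caveat, assuming this runs on SQL Server: a recursive CTE stops after 100 iterations by default, so if the data spans more than 100 weeks the statement needs a MAXRECURSION hint. A minimal sketch of the same query with that limit lifted:
with cte as (
    select min(effweek) week, max(expweek) max_week from mytable
    union all
    select week + 1, max_week from cte where week < max_week
)
select c.week, count(t.id_num) cnt
from cte c
left join mytable t on c.week between t.effweek and t.expweek
group by c.week
order by c.week
option (maxrecursion 0)  -- 0 removes the 100-iteration cap; pick a finite bound if you prefer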

Related

SQL query group by using day startdatetime and end datetime

I have the following table Jobs:
|Id | StartDateTime       | EndDateTime
+---+---------------------+---------------------
|1  | 2020-10-20 23:00:00 | 2020-10-21 05:00:00
|2  | 2020-10-21 10:00:00 | 2020-10-21 11:00:00
Note job id 1 spans October 20 and 21.
I am using the following query
SELECT DAY(StartDateTime), COUNT(id)
FROM Job
GROUP BY DAY(StartDateTime)
To get the following output. But the problem I am facing is that day 21 does not include job id 1. Since the job spans two days, I want to include it in both day 20 and day 21.
Day | TotalJobs
----+----------
20 | 1
21 | 1
I am struggling to get the following expected output:
Day | TotalJobs
----+----------
20 | 1
21 | 2
One method is to generate the days that you want and then count overlaps:
with days as (
    select convert(date, min(j.startdatetime)) as startd,
           convert(date, max(j.enddatetime)) as endd
    from jobs j
    union all
    select dateadd(day, 1, startd), endd
    from days
    where startd < endd
)
select days.startd, count(j.id)
from days left join
     jobs j
     on j.startdatetime < dateadd(day, 1, startd) and
        j.enddatetime >= startd
group by days.startd;
Here is a db<>fiddle.
You can first group the jobs whose start and end fall on the same day, and then group separately by start day and by end day for the jobs that span different days:
SELECT a.date, SUM(counts) FROM (
    SELECT DAY(StartDateTime) AS date, COUNT(id) counts
    FROM Table1
    WHERE DAY(StartDateTime) = DAY(EndDateTime)
    GROUP BY StartDateTime
    UNION ALL
    SELECT DAY(EndDateTime), COUNT(id)
    FROM Table1
    WHERE DAY(StartDateTime) != DAY(EndDateTime)
    GROUP BY EndDateTime
    UNION ALL
    SELECT DAY(StartDateTime), COUNT(id)
    FROM Table1
    WHERE DAY(StartDateTime) != DAY(EndDateTime)
    GROUP BY StartDateTime
) a
GROUP BY a.date
Here is the SQL Fiddle link.
Also, replace Table1 with Jobs when running against your database.

Fill the data on the missing date range

I have a table with the existing data below:
Select Date, [Closing Balance] from StockClosing
Date | Closing Quantity
---------------------------
20200828 | 5
20200901 | 10
20200902 | 8
20200904 | 15
20200905 | 18
There are some missing dates in the table, for example 20200829 to 20200831 and 20200903.
The closing quantity for a missing date should follow the previous day's closing quantity.
I would like to select the result over the full range of dates (showing every day) with the closing quantity. Expected result:
Date | Closing Quantity
---------------------------
20200828 | 5
20200829 | 5
20200830 | 5
20200831 | 5
20200901 | 10
20200902 | 8
20200903 | 8
20200904 | 15
20200905 | 18
Besides using a cursor/for loop to insert the missing dates and data one by one, is there any SQL statement that can do it all at once?
You have the option to use a recursive CTE:
;with cte as (
    select max(date) date from YourTable
), cte1 as (
    select min(date) date from YourTable
    union all
    select dateadd(day, 1, cte1.date) date from cte1 where date < (select date from cte)
)
select c.date,
       isnull(y.[Closing Quantity],
              (select top 1 a.[Closing Quantity] from YourTable a where c.date > a.date order by a.date desc)
             ) as [Closing Quantity]
from cte1 c left join YourTable y on c.date = y.date
The easiest way to do this is to use LAST_VALUE along with the IGNORE NULLS option. Sadly, SQL Server does not support this. There is a workaround using analytic functions, but I would actually offer this simple option, which uses a correlated subquery to fill in the missing values:
WITH dates AS (
    SELECT '20200828' AS Date UNION ALL
    SELECT '20200829' UNION ALL
    SELECT '20200830' UNION ALL
    SELECT '20200831' UNION ALL
    SELECT '20200901' UNION ALL
    SELECT '20200902' UNION ALL
    SELECT '20200903' UNION ALL
    SELECT '20200904' UNION ALL
    SELECT '20200905'
)
SELECT
    d.Date,
    (SELECT TOP 1 t2.closing FROM StockClosing t2
     WHERE t2.Date <= d.Date AND t2.closing IS NOT NULL
     ORDER BY t2.Date DESC) AS closing
FROM dates d
LEFT JOIN StockClosing t1
    ON d.Date = t1.Date;
Demo
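For reference, the analytic-function workaround mentioned above could look something like the sketch below. It reuses the dates CTE and the simplified closing column name from the demo query (anything beyond those names is an assumption): a running COUNT of the non-NULL rows assigns every missing date to the group of the most recent date that has a value, and MAX then carries that value forward within the group.
WITH dates AS (
    SELECT '20200828' AS Date UNION ALL
    SELECT '20200829' UNION ALL
    SELECT '20200830' UNION ALL
    SELECT '20200831' UNION ALL
    SELECT '20200901' UNION ALL
    SELECT '20200902' UNION ALL
    SELECT '20200903' UNION ALL
    SELECT '20200904' UNION ALL
    SELECT '20200905'
), base AS (
    SELECT d.Date,
           t1.closing,
           -- running count of dates that actually have a row: gap dates share
           -- the group number of the last date with a value
           COUNT(t1.Date) OVER (ORDER BY d.Date) AS grp
    FROM dates d
    LEFT JOIN StockClosing t1 ON d.Date = t1.Date
)
SELECT Date,
       -- each group holds exactly one non-NULL closing value, the one to carry forward
       MAX(closing) OVER (PARTITION BY grp) AS closing
FROM base
ORDER BY Date;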

Get detail days between two date (mysql query)

I have data like this:
id | start_date | end_date
----------------------------
1 | 16-09-2019 | 22-12-2019
I want to get the following results:
id | month | year | days
------------------------
1 | 09 | 2019 | 15
1 | 10 | 2019 | 31
1 | 11 | 2019 | 30
1 | 12 | 2019 | 22
Is there a way to get that result?
This is what you want to do:
SELECT id,
       EXTRACT(MONTH FROM start_date) AS month,
       EXTRACT(YEAR FROM start_date) AS year,
       DATEDIFF(end_date, start_date) AS days
FROM tbl
You can use the MONTH(), YEAR() and DATEDIFF() functions:
SELECT id,
       MONTH(start_date) AS month,
       YEAR(start_date) AS year,
       DATEDIFF(end_date, start_date) AS days
FROM table-name
One way is to create a Calendar table and use that:
select month, year, count(*)
from Calendar
where db_date between '2019-09-16' and '2019-12-22'
group by month, year
Check the demo here.
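If you do not already have such a Calendar table, a minimal one covering this range could be built as sketched below (MySQL 8+ for the recursive CTE; the db_date, month and year columns match the query above, everything else is illustrative):
-- minimal calendar table for the requested range
create table Calendar (
    db_date date primary key,
    month   int not null,
    year    int not null
);

insert into Calendar (db_date, month, year)
select db_date, month(db_date), year(db_date)
from (
    -- recursive CTE generating one row per day between the two dates
    with recursive d as (
        select cast('2019-09-16' as date) as db_date
        union all
        select db_date + interval 1 day from d where db_date < '2019-12-22'
    )
    select db_date from d
) as gen;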
You can also use a recursive CTE to achieve the same result.
You can use a recursive CTE and aggregation:
with recursive cte as (
    select id, start_date, end_date
    from t
    union all
    select id, start_date + interval 1 day, end_date
    from cte
    where start_date < end_date
)
select id, year(start_date), month(start_date), count(*) as days
from cte
group by id, year(start_date), month(start_date);
Here is a db<>fiddle.

Cumulative distinct count with Spark SQL

Using Spark 1.6.2.
Here is the data:
day | visitorID
-------------
1 | A
1 | B
2 | A
2 | C
3 | A
4 | A
I want to count how many distinct visitors there are by day, cumulative with the days before (I don't know the exact term for that, sorry).
This should give:
day | visitors
--------------
1 | 2 (A+B)
2 | 3 (A+B+C)
3 | 3
4 | 3
I tried a self-join, but it is really too slow.
I am sure a window function is what I am looking for, but I didn't manage to find it :/
You should be able to do:
select day, max(visitors) as visitors
from (select day,
             count(distinct visitorId) over (order by day) as visitors
      from t
     ) d
group by day;
Actually, I think a better approach is to record a visitor only on the first day s/he appears:
select startday, sum(count(*)) over (order by startday) as visitors
from (select visitorId, min(day) as startday
      from t
      group by visitorId
     ) t
group by startday
order by startday;
In SQL, you could do this.
select t1.day, sum(max(t.cnt)) over (order by t1.day) as visitors
from tbl t1
left join (select minday, count(*) as cnt
           from (select visitorID, min(day) as minday
                 from tbl
                 group by visitorID
                ) t
           group by minday
          ) t
    on t1.day = t.minday
group by t1.day
Get the first day a visitorID appears using min.
Count the rows per such minday found above.
Left join this to your original table and get the cumulative sum.
Another approach would be:
select t1.day, sum(count(t.visitorid)) over (order by t1.day) as cnt
from tbl t1
left join (select visitorID, min(day) as minday
           from tbl
           group by visitorID
          ) t
    on t1.day = t.minday and t.visitorid = t1.visitorid
group by t1.day
Try this:
select
    day,
    count(*),
    (
        select count(*) from your_table b
        where a.day >= b.day
    ) cumulative
from your_table as a
group by a.day
order by 1

Querying for an ID that has the most number of reads

Suppose I have a table like the one below:
+----+-----------+
| ID | TIME |
+----+-----------+
| 1 | 12-MAR-15 |
| 2 | 23-APR-14 |
| 2 | 01-DEC-14 |
| 1 | 01-DEC-15 |
| 3 | 05-NOV-15 |
+----+-----------+
What I want to do is, for each year (the year is defined as DATE), list the ID that has the highest count in that year. So for example, ID 1 occurs the most in 2015, ID 2 occurs the most in 2014, etc.
What I have for a query is:
SELECT EXTRACT(year from time) "YEAR", COUNT(ID) "ID"
FROM table
GROUP BY EXTRACT(year from time)
ORDER BY COUNT(ID) DESC;
But this query just counts how many rows occur in each year. How do I fix it to get the ID with the highest count in each year?
Output:
+------+----+
| YEAR | ID |
+------+----+
| 2015 | 2 |
| 2012 | 2 |
+------+----+
Expected Output:
+------+----+
| YEAR | ID |
+------+----+
| 2015 | 1 |
| 2014 | 2 |
+------+----+
Starting with your sample query, the first change is simply to group by the ID as well as by the year.
SELECT EXTRACT(year from time) "YEAR" , id, COUNT(*) "TOTAL"
FROM table
GROUP BY EXTRACT(year from time), id
ORDER BY EXTRACT(year from time) DESC, COUNT(*) DESC
With that, you could find the rows you want by visual inspection (the first row for each year is the ID with the most rows).
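For the sample data in the question, that intermediate query would return something like:
YEAR | ID | TOTAL
-----+----+------
2015 |  1 |     2
2015 |  3 |     1
2014 |  2 |     2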
To have the query just return the rows with the highest totals, there are several different ways to do it. You need to consider what you want to do if there are ties - do you want to see all IDs tied for highest in a year, or just an arbitrary one?
Here is one approach - if there is a tie, this should return just the lowest of the tied IDs:
WITH groups AS (
    SELECT EXTRACT(year from time) "YEAR", id, COUNT(*) "TOTAL"
    FROM table
    GROUP BY EXTRACT(year from time), id
)
SELECT year, MIN(id) KEEP (DENSE_RANK FIRST ORDER BY total DESC)
FROM groups
GROUP BY year
ORDER BY year DESC
You need to count per id and then apply a RANK on that count:
SELECT *
FROM
(
    SELECT EXTRACT(year from time) "YEAR", ID, COUNT(*) AS cnt,
           -- partition by the year expression itself; the "YEAR" alias cannot be referenced here
           RANK() OVER (PARTITION BY EXTRACT(year from time) ORDER BY COUNT(*) DESC) AS rnk
    FROM table
    GROUP BY EXTRACT(year from time), ID
) dt
WHERE rnk = 1
If this returns multiple rows with the same high count per year and you want just one of them arbitrarily, you can switch to ROW_NUMBER.
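A minimal sketch of that ROW_NUMBER variant (same placeholder table name as above; add a tiebreaker to the ORDER BY if you want the pick to be deterministic rather than arbitrary):
SELECT *
FROM
(
    SELECT EXTRACT(year from time) "YEAR", ID, COUNT(*) AS cnt,
           -- ROW_NUMBER never ties, so exactly one row per year gets rn = 1
           ROW_NUMBER() OVER (PARTITION BY EXTRACT(year from time) ORDER BY COUNT(*) DESC) AS rn
    FROM table
    GROUP BY EXTRACT(year from time), ID
) dt
WHERE rn = 1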
This should do what you're after, I think:
with sample_data as (select 1 id, to_date('12/03/2015', 'dd/mm/yyyy') time from dual union all
                     select 2 id, to_date('23/04/2014', 'dd/mm/yyyy') time from dual union all
                     select 2 id, to_date('01/12/2014', 'dd/mm/yyyy') time from dual union all
                     select 1 id, to_date('01/12/2015', 'dd/mm/yyyy') time from dual union all
                     select 3 id, to_date('05/11/2015', 'dd/mm/yyyy') time from dual)
-- End of creating a subquery to mimic a table called "sample_data" containing your input data.
-- See SQL below:
select yr,
       id most_frequent_id,
       cnt_id_yr cnt_of_most_freq_id
from (select to_char(time, 'yyyy') yr,
             id,
             count(*) cnt_id_yr,
             dense_rank() over (partition by to_char(time, 'yyyy') order by count(*) desc) dr
      from sample_data
      group by to_char(time, 'yyyy'),
               id)
where dr = 1;
YR MOST_FREQUENT_ID CNT_OF_MOST_FREQ_ID
---- ---------------- -------------------
2014 2 2
2015 1 2