Sum where values of a column match without changing the number of rows - sql

I am trying to sum the values of a column over the rows where the values of another column match. Below is a sample of my data.
DT No_of_records LD_VOY_N LD_VSL_M
2017-05-06 04:00:00.000 7 0002W pqo emzmnwp
2017-05-06 20:00:00.000 6 0002W pqo emzmnwp
2017-05-02 04:00:00.000 1 0007E omq ynzmeoyn
2017-05-01 08:00:00.000 2 0016W rmhp sunhpnw
2017-05-01 12:00:00.000 1 0016W rmhp sunhpnw
2017-05-05 12:00:00.000 2 0019N omq wqmsy
2017-05-06 04:00:00.000 12 0019N omq wqmsy
Below is my desired output
DT No_of_records LD_VOY_N LD_VSL_M Total_no_of_records
2017-05-06 04:00:00.000 7 0002W pqo emzmnwp 13
2017-05-06 20:00:00.000 6 0002W pqo emzmnwp 13
2017-05-02 04:00:00.000 1 0007E omq ynzmeoyn 1
2017-05-01 08:00:00.000 2 0016W rmhp sunhpnw 3
2017-05-01 12:00:00.000 1 0016W rmhp sunhpnw 3
2017-05-05 12:00:00.000 2 0019N omq wqmsy 14
2017-05-06 04:00:00.000 12 0019N omq wqmsy 14
I am trying to find the Total_no_of_records column. Do you have any ideas?

You seem to want a window function partitioned by LD_VOY_N, which adds the group total to every row without collapsing the rows:
select t.*,
sum(No_of_records) over (partition by LD_VOY_N) as Total_no_of_records
from t;
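To check the window-function approach against the sample data, here is a minimal sketch that runs the same query in an in-memory SQLite database from Python. SQLite is only a stand-in for whatever engine you use (window functions need SQLite 3.25 or newer), and the table name t is taken from the answer above.

```python
import sqlite3

# In-memory SQLite database, used purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (DT TEXT, No_of_records INTEGER, LD_VOY_N TEXT, LD_VSL_M TEXT)"
)
rows = [
    ("2017-05-06 04:00:00.000", 7, "0002W", "pqo emzmnwp"),
    ("2017-05-06 20:00:00.000", 6, "0002W", "pqo emzmnwp"),
    ("2017-05-02 04:00:00.000", 1, "0007E", "omq ynzmeoyn"),
    ("2017-05-01 08:00:00.000", 2, "0016W", "rmhp sunhpnw"),
    ("2017-05-01 12:00:00.000", 1, "0016W", "rmhp sunhpnw"),
    ("2017-05-05 12:00:00.000", 2, "0019N", "omq wqmsy"),
    ("2017-05-06 04:00:00.000", 12, "0019N", "omq wqmsy"),
]
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)

# SUM(...) OVER (PARTITION BY ...) repeats the per-voyage total on every row,
# so the number of rows stays the same as in the input.
result = conn.execute(
    """
    SELECT DT, No_of_records, LD_VOY_N,
           SUM(No_of_records) OVER (PARTITION BY LD_VOY_N) AS Total_no_of_records
    FROM t
    ORDER BY LD_VOY_N, DT
    """
).fetchall()

for row in result:
    print(row)
```

The ORDER BY is only there to make the printed output stable; it is not needed for the totals themselves.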

select t.DT, t.No_of_records, t.LD_VOY_N, t.LD_VSL_M, s.Total_no_of_records from tablename t
join (select LD_VOY_N, sum(No_of_records) as Total_no_of_records from tablename group by LD_VOY_N) s
on s.LD_VOY_N = t.LD_VOY_N

Related

Following Start and End Date Columns

I have start and end date columns, and in some rows the start date equals the end date of the previous row, with no gap. Starting from the row whose End Date is null, I want to "zig-zag" upward through the rows for as long as each Start Date matches the previous row's End Date, and return the overall range.
I've tried CTEs, and ROW_NUMBER() OVER().
START_DTE END_DTE
2018-01-17 2018-01-19
2018-01-26 2018-02-22
2018-02-22 2018-08-24
2018-08-24 2018-09-24
2018-09-24 NULL
Expected:
START_DTE END_DTE
2018-01-26 2018-09-24
EDIT
Using a proposed solution with an added CTE to ensure dates don't have times with them.
WITH
CTE_TABLE_NAME AS
(
SELECT
ID_NUM,
CONVERT(DATE,START_DTE) START_DTE,
CONVERT(DATE,END_DTE) END_DTE
FROM
TABLE_NAME
WHERE ID_NUM = 123
)
select min(start_dte) as start_dte, max(end_dte) as end_dte, grp
from (select t.*,
sum(case when prev_end_dte = end_dte then 0 else 1 end) over (order by start_dte) as grp
from (select t.*,
lag(end_dte) over (order by start_dte) as prev_end_dte
from CTE_TABLE_NAME t
) t
) t
group by grp;
The query above provides these results:
start_dte end_dte grp
2014-08-24 2014-12-19 1
2014-08-31 2014-09-02 2
2014-09-02 2014-09-18 3
2014-09-18 2014-11-03 4
2014-11-18 2014-12-09 5
2014-12-09 2015-01-16 6
2015-01-30 2015-02-02 7
2015-02-02 2015-05-15 8
2015-05-15 2015-07-08 9
2015-07-08 2015-07-09 10
2015-07-09 2015-08-25 11
2015-08-31 2015-09-01 12
2015-10-06 2015-10-29 13
2015-11-10 2015-12-11 14
2015-12-11 2015-12-15 15
2015-12-15 2016-01-20 16
2016-01-29 2016-02-01 17
2016-02-01 2016-03-03 18
2016-03-30 2016-08-29 19
2016-08-30 2016-12-06 20
2017-01-27 2017-02-20 21
2017-02-20 2017-08-15 22
2017-08-15 2017-08-29 23
2017-08-29 2018-01-17 24
2018-01-17 2018-01-19 25
2018-01-26 2018-02-22 26
2018-02-22 2018-08-24 27
2018-08-24 2018-09-24 28
2018-09-24 NULL 29
I tried adding having count(*) > 1 as suggested, but it returned no results.
Expected example
START_DTE END_DTE
2017-01-27 2018-01-17
2018-01-26 2018-09-24
You can identify where groups of connected rows start by looking for where adjacent rows are not connected. A cumulative sum of these starts then gives you the groups.
select min(start_dte) as start_dte, max(end_dte) as end_dte
from (select t.*,
sum(case when prev_end_dte = start_dte then 0 else 1 end) over (order by start_dte) as grp
from (select t.*,
lag(end_dte) over (order by start_dte) as prev_end_dte
from t
) t
) t
group by grp;
If you want only multiply connected rows (as implied by your question), then add having count(*) > 1 to the outer query.
Here is a db<>fiddle.
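As a rough check of the cumulative-sum idea, the sketch below runs the query, including having count(*) > 1, on the five sample rows from the question. It uses Python's sqlite3 module with an in-memory database as a stand-in engine (an assumption on my part; window functions need SQLite 3.25 or newer), and the aliases s and g are just names chosen for the nested subqueries.

```python
import sqlite3

# In-memory SQLite database, used purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (start_dte TEXT, end_dte TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        ("2018-01-17", "2018-01-19"),
        ("2018-01-26", "2018-02-22"),
        ("2018-02-22", "2018-08-24"),
        ("2018-08-24", "2018-09-24"),
        ("2018-09-24", None),  # open-ended row: END_DTE is NULL
    ],
)

islands = conn.execute(
    """
    SELECT MIN(start_dte) AS start_dte, MAX(end_dte) AS end_dte
    FROM (
        SELECT s.*,
               -- a row starts a new group when it does not continue the previous row
               SUM(CASE WHEN prev_end_dte = start_dte THEN 0 ELSE 1 END)
                   OVER (ORDER BY start_dte) AS grp
        FROM (
            SELECT t.*, LAG(end_dte) OVER (ORDER BY start_dte) AS prev_end_dte
            FROM t
        ) AS s
    ) AS g
    GROUP BY grp
    HAVING COUNT(*) > 1  -- keep only groups made of several connected rows
    """
).fetchall()

print(islands)
```

Note that the comparison is prev_end_dte = start_dte. Comparing prev_end_dte with end_dte instead puts every row in its own group, which would explain getting one group per row and nothing at all once having count(*) > 1 is added. MAX ignores the NULL end date of the open-ended row, so the group ends at the latest non-null END_DTE.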

Count number of records within the same time period

I am trying to count the total number of records that were added at a specific time. Below is a sample of my data.
CNTR_N LOAD_VESSEL_M VOYAGE_OUT_N SNAP_DT
HGTU 4615032 opgqqun 039E 2017-04-25 20:00:00.000
TCNU 5590060 plq jpxxqyi 016E12 2017-04-25 20:00:00.000
PCIU 1189368 iunpj igspnw 310N 2017-04-25 20:00:00.000
CLHU 3193420 qpji oi 735S 2017-04-25 20:00:00.000
RFSU 2000199 unqy ihpj 003NN 2017-05-03 16:00:00.000
OOLU 1543519 mmaq ywclh 004E11 2017-05-03 16:00:00.000
TFTU 8600600 epn vpu 490 W037 2017-05-03 16:00:00.000
MSKU 5414708 syyhvmfyn 1708 2017-05-03 16:00:00.000
Below is my desired output. I am trying to get the No_of_records column.
SNAP_DT No_of_records
2017-04-25 20:00:00.000 4
2017-05-03 16:00:00.000 4
Do any of you have ideas on how to get the above output? Would really appreciate your help.
You can use a GROUP BY clause with the aggregate function COUNT.
Assuming your table name is table1, below is the query that will return your desired result.
SELECT snap_dt, Count(*) AS No_of_records
FROM table1
GROUP BY snap_dt;
Try this:
SELECT
SNAP_DT
,COUNT(*)
FROM data
GROUP BY SNAP_DT
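For a quick sanity check of the GROUP BY approach, here is a small sketch that loads the sample rows into an in-memory SQLite database from Python and counts them per SNAP_DT. The table name table1 follows the first answer; only the CNTR_N and SNAP_DT columns are loaded, since the other columns do not affect the count, and SQLite is just a stand-in engine.

```python
import sqlite3

# In-memory SQLite database, used purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (CNTR_N TEXT, SNAP_DT TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [
        ("HGTU 4615032", "2017-04-25 20:00:00.000"),
        ("TCNU 5590060", "2017-04-25 20:00:00.000"),
        ("PCIU 1189368", "2017-04-25 20:00:00.000"),
        ("CLHU 3193420", "2017-04-25 20:00:00.000"),
        ("RFSU 2000199", "2017-05-03 16:00:00.000"),
        ("OOLU 1543519", "2017-05-03 16:00:00.000"),
        ("TFTU 8600600", "2017-05-03 16:00:00.000"),
        ("MSKU 5414708", "2017-05-03 16:00:00.000"),
    ],
)

# One output row per distinct SNAP_DT, with the number of records in that snapshot.
counts = conn.execute(
    """
    SELECT SNAP_DT, COUNT(*) AS No_of_records
    FROM table1
    GROUP BY SNAP_DT
    ORDER BY SNAP_DT
    """
).fetchall()

for snap_dt, n in counts:
    print(snap_dt, n)
```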

SQL Find Datetime outside Datetime range

I have two tables, one called Production and the other called Schedule.
I am trying to find whether there is any production outside the schedule.
So far I am getting duplicated values, because a production row can be inside one schedule but outside the other.
I have had no luck with the SQL query below, and I was wondering if someone could point me in the right direction.
Thanks in advance.
SELECT TB1.*
FROM Production AS TB1
INNER JOIN Schedule AS TB2
ON TB1.ProduceDate < TB2.StartDate OR TB1.ProduceDate > tb2.EndDate
GROUP BY TB1.ID,TB1.ProduceDate
ORDER BY Tb1.ProduceDate
ID Produce Date
1 2017-02-03 09:00:00.000
2 2017-02-03 11:00:00.000
3 2017-02-03 13:00:00.000
4 2017-02-03 18:00:00.000
7 2017-02-03 19:00:00.000
5 2017-02-03 20:00:00.000
6 2017-02-03 23:00:00.000
Production Table Data
ID ProduceDate
1 2017-02-03 09:00:00.000
2 2017-02-03 11:00:00.000
3 2017-02-03 13:00:00.000
4 2017-02-03 18:00:00.000
5 2017-02-03 20:00:00.000
6 2017-02-03 23:00:00.000
7 2017-02-03 19:00:00.000
Schedule Table Data
ID StartDate EndDate
1 2017-02-03 10:00:00.000 2017-02-03 12:00:00.000
2 2017-02-03 15:00:00.000 2017-02-03 19:00:00.000
I think you just want not exists:
select p.*
from production p
where not exists (select 1
from schedule s
where p.producedate >= s.startdate and
p.producedate <= s.enddate
);
select Production.*
from Production
left join Schedule
on ProduceDate between StartDate and EndDate
where Schedule.id is null
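Both answers are anti-joins: a production row is kept only if no schedule window contains it. The sketch below runs the two variants on the sample data, using an in-memory SQLite database from Python as a stand-in engine (an assumption; the timestamps are stored as text, which compares correctly here because the format is fixed-width).

```python
import sqlite3

# In-memory SQLite database, used purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Production (ID INTEGER, ProduceDate TEXT);
    CREATE TABLE Schedule (ID INTEGER, StartDate TEXT, EndDate TEXT);
    INSERT INTO Production VALUES
        (1, '2017-02-03 09:00:00.000'),
        (2, '2017-02-03 11:00:00.000'),
        (3, '2017-02-03 13:00:00.000'),
        (4, '2017-02-03 18:00:00.000'),
        (5, '2017-02-03 20:00:00.000'),
        (6, '2017-02-03 23:00:00.000'),
        (7, '2017-02-03 19:00:00.000');
    INSERT INTO Schedule VALUES
        (1, '2017-02-03 10:00:00.000', '2017-02-03 12:00:00.000'),
        (2, '2017-02-03 15:00:00.000', '2017-02-03 19:00:00.000');
    """
)

# Variant 1: NOT EXISTS. Keep a row only if no schedule contains it.
not_exists_ids = [
    r[0]
    for r in conn.execute(
        """
        SELECT p.ID
        FROM Production p
        WHERE NOT EXISTS (SELECT 1
                          FROM Schedule s
                          WHERE p.ProduceDate >= s.StartDate
                            AND p.ProduceDate <= s.EndDate)
        ORDER BY p.ID
        """
    )
]

# Variant 2: LEFT JOIN and keep the rows that found no matching schedule.
left_join_ids = [
    r[0]
    for r in conn.execute(
        """
        SELECT p.ID
        FROM Production p
        LEFT JOIN Schedule s
               ON p.ProduceDate BETWEEN s.StartDate AND s.EndDate
        WHERE s.ID IS NULL
        ORDER BY p.ID
        """
    )
]

print(not_exists_ids, left_join_ids)
```

Each production row appears at most once, because a row is tested against the set of schedules as a whole rather than joined to every schedule it falls outside of. ID 7 (19:00) is not returned because both range checks are inclusive of the end date.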

SQL Server : compare rows, exclude from results when some values are the same

I have the following SQL Server query problem.
If there is a row whose Issue_Date equals the Maturity_Date of another row, and both rows have the same ID and Amount USD, then neither of these rows should be displayed.
Here is a simplified version of my table:
ID ISSUE_DATE MATURITY_DATE AMOUNT_USD
1 2010-01-01 00:00:00.000 2015-12-01 00:00:00.000 5000
1 2010-01-01 00:00:00.000 2001-09-19 00:00:00.000 700
2 2014-04-09 00:00:00.000 2019-04-09 00:00:00.000 400
1 2015-12-01 00:00:00.000 2016-12-31 00:00:00.000 5000
5 2015-02-24 00:00:00.000 2015-02-24 00:00:00.000 8000
4 2012-11-29 00:00:00.000 2015-11-29 00:00:00.000 10000
3 2015-01-21 00:00:00.000 2018-01-21 00:00:00.000 17500
2 2015-02-02 00:00:00.000 2015-12-05 00:00:00.000 12000
1 2015-01-12 00:00:00.000 2018-01-12 00:00:00.000 18000
2 2015-12-05 00:00:00.000 2016-01-10 00:00:00.000 12000
Result should be:
ID ISSUE_DATE MATURITY_DATE AMOUNT_USD
1 2010-01-01 00:00:00.000 2001-09-19 00:00:00.000 700
2 2014-04-09 00:00:00.000 2019-04-09 00:00:00.000 400
5 2015-02-24 00:00:00.000 2015-02-24 00:00:00.000 8000
4 2012-11-29 00:00:00.000 2015-11-29 00:00:00.000 10000
3 2015-01-21 00:00:00.000 2018-01-21 00:00:00.000 17500
1 2015-01-12 00:00:00.000 2018-01-12 00:00:00.000 18000
I tried a self join, but I do not get the right result.
Thanks in advance!
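Can you try something like this? NOT EXISTS is one way of doing it. The check has to run in both directions so that both rows of a pair are dropped, and a row must not be matched against itself:
select * from your_table t1
where not exists (select 'x' from your_table t2
where t1.id = t2.id and t1.amount_usd = t2.amount_usd
and (t1.issue_date = t2.maturity_date or t1.maturity_date = t2.issue_date)
and (t1.issue_date <> t2.issue_date or t1.maturity_date <> t2.maturity_date))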
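I'd think about making a subquery of all the rows that pair up with another row and then eliminating them from the table, like so:
select t1.ID
, t1.ISSUE_DATE
, t1.MATURITY_DATE
, t1.AMOUNT_USD
FROM
t1
LEFT JOIN
(select a.ID
, a.ISSUE_DATE
, a.MATURITY_DATE
, a.AMOUNT_USD
FROM
t1 a
INNER JOIN
t1 b
on a.ID = b.ID
and a.AMOUNT_USD = b.AMOUNT_USD
-- a pairs with b when one row's issue date is the other's maturity date
and (a.ISSUE_DATE = b.MATURITY_DATE or a.MATURITY_DATE = b.ISSUE_DATE)
-- do not pair a row with itself (issue date = maturity date on one row)
and (a.ISSUE_DATE <> b.ISSUE_DATE or a.MATURITY_DATE <> b.MATURITY_DATE)
) dupes
on
t1.ID = dupes.ID
and t1.ISSUE_DATE = dupes.ISSUE_DATE
and t1.MATURITY_DATE = dupes.MATURITY_DATE
and t1.AMOUNT_USD = dupes.AMOUNT_USD
WHERE dupes.ID IS NULL;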
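One way to test the exclusion rule against the sample table is the sketch below. It applies a NOT EXISTS check in both directions and refuses to match a row against itself, which matters for the ID 5 row whose issue and maturity dates are equal. It runs in an in-memory SQLite database from Python purely as a stand-in for SQL Server; the table name t1 and the date-only text values are assumptions made for the example.

```python
import sqlite3

# In-memory SQLite database, used purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t1 (ID INTEGER, ISSUE_DATE TEXT, MATURITY_DATE TEXT, AMOUNT_USD INTEGER)"
)
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?, ?)",
    [
        (1, "2010-01-01", "2015-12-01", 5000),
        (1, "2010-01-01", "2001-09-19", 700),
        (2, "2014-04-09", "2019-04-09", 400),
        (1, "2015-12-01", "2016-12-31", 5000),
        (5, "2015-02-24", "2015-02-24", 8000),
        (4, "2012-11-29", "2015-11-29", 10000),
        (3, "2015-01-21", "2018-01-21", 17500),
        (2, "2015-02-02", "2015-12-05", 12000),
        (1, "2015-01-12", "2018-01-12", 18000),
        (2, "2015-12-05", "2016-01-10", 12000),
    ],
)

kept = conn.execute(
    """
    SELECT a.ID, a.ISSUE_DATE, a.MATURITY_DATE, a.AMOUNT_USD
    FROM t1 a
    WHERE NOT EXISTS (
        SELECT 1
        FROM t1 b
        WHERE b.ID = a.ID
          AND b.AMOUNT_USD = a.AMOUNT_USD
          -- linked in either direction: issue date meets maturity date
          AND (b.MATURITY_DATE = a.ISSUE_DATE OR b.ISSUE_DATE = a.MATURITY_DATE)
          -- ignore the row itself (there is no key, so compare the dates)
          AND (b.ISSUE_DATE <> a.ISSUE_DATE OR b.MATURITY_DATE <> a.MATURITY_DATE)
    )
    ORDER BY a.ID, a.AMOUNT_USD
    """
).fetchall()

for row in kept:
    print(row)
```

Because the sample table has no unique key, the self-match guard compares the two date columns. With a real primary key, comparing keys would be the cleaner test.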

Group By,Order by DateTime

I have nearly 15000 data rows with the first column containing date in the format:
2012-05-10 09:00:00.000
I need this data to be sorted by year then month, then day, then hour so for example:
2012-05-10 09:00:00.000
2012-05-10 10:00:00.000
2012-05-10 11:00:00.000
2012-05-10 12:00:00.000
2012-05-11 09:00:00.000
2012-05-11 10:00:00.000
2012-05-11 11:00:00.000
2012-05-11 12:00:00.000
2012-06-01 02:00:00.000
2012-06-01 03:00:00.000
2012-06-01 04:00:00.000
2012-06-01 05:00:00.000
Current SQL Query to do this is below:
SELECT MIN(Datetime)
FROM jmusa_LOG1
GROUP BY DATEPART(M,jmusa_LOG1.DateTime),DATEPART(D,jmusa_LOG1.DateTime),DATEPART(HH,jmusa_LOG1.DateTime)
HAVING MIN(jmusa_LOG1.DateTime) NOT IN(SELECT DateTime FROM AverageRawData)
ORDER BY DATEPART(M,jmusa_LOG1.DateTime),DATEPART(D,jmusa_LOG1.DateTime),DATEPART(HH,jmusa_LOG1.DateTime)
You are describing a normal date sort, so you can just do:
select MyDate
from AverageRawData
order by MyDate
If you don't want duplicates, add DISTINCT like this:
select distinct MyDate
from AverageRawData
order by MyDate
If this does not meet your requirements, please provide sample data used to generate your output example.
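To illustrate that a plain ORDER BY already gives year, month, day, hour order, here is a small sketch using an in-memory SQLite database from Python. The column name MyDate follows the answer above, and the inserted values are a shuffled subset of the question's example. SQLite stores these timestamps as text, which still sorts chronologically because the format is fixed-width and most-significant-first; in SQL Server a datetime column sorts chronologically by type.

```python
import sqlite3

# In-memory SQLite database, used purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AverageRawData (MyDate TEXT)")
stamps = [
    "2012-06-01 02:00:00.000",
    "2012-05-11 09:00:00.000",
    "2012-05-10 10:00:00.000",
    "2012-05-10 09:00:00.000",
    "2012-05-10 10:00:00.000",  # deliberate duplicate, removed by DISTINCT
]
conn.executemany("INSERT INTO AverageRawData VALUES (?)", [(s,) for s in stamps])

# No DATEPART grouping is needed: ordering by the value itself is chronological.
ordered = [
    r[0]
    for r in conn.execute(
        "SELECT DISTINCT MyDate FROM AverageRawData ORDER BY MyDate"
    )
]

for stamp in ordered:
    print(stamp)
```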