Count number of records within the same time period - sql

I am trying to count the total number of records that were added at a specific time. Below is a sample of my data.
CNTR_N LOAD_VESSEL_M VOYAGE_OUT_N SNAP_DT
HGTU 4615032 opgqqun 039E 2017-04-25 20:00:00.000
TCNU 5590060 plq jpxxqyi 016E12 2017-04-25 20:00:00.000
PCIU 1189368 iunpj igspnw 310N 2017-04-25 20:00:00.000
CLHU 3193420 qpji oi 735S 2017-04-25 20:00:00.000
RFSU 2000199 unqy ihpj 003NN 2017-05-03 16:00:00.000
OOLU 1543519 mmaq ywclh 004E11 2017-05-03 16:00:00.000
TFTU 8600600 epn vpu 490 W037 2017-05-03 16:00:00.000
MSKU 5414708 syyhvmfyn 1708 2017-05-03 16:00:00.000
Below is my desired output. I am trying to get the No_of_records column.
SNAP_DT No_of_records
2017-04-25 20:00:00.000 4
2017-05-03 16:00:00.000 4
Do any of you have ideas on how to get the above output? Would really appreciate your help.

You can use a GROUP BY clause with the aggregate function COUNT.
Assuming your table name is table1, below is the query that will return your desired result.
SELECT snap_dt, Count(*)
FROM table1
GROUP BY snap_dt;
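To try the GROUP BY approach without a database server, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module. The in-memory table, its two-column layout, and the No_of_records alias are assumptions made for illustration; only the sample values come from the question.

```python
import sqlite3

# Hypothetical in-memory table standing in for the asker's data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (cntr_n TEXT, snap_dt TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [
        ("HGTU 4615032", "2017-04-25 20:00:00.000"),
        ("TCNU 5590060", "2017-04-25 20:00:00.000"),
        ("PCIU 1189368", "2017-04-25 20:00:00.000"),
        ("CLHU 3193420", "2017-04-25 20:00:00.000"),
        ("RFSU 2000199", "2017-05-03 16:00:00.000"),
        ("OOLU 1543519", "2017-05-03 16:00:00.000"),
        ("TFTU 8600600", "2017-05-03 16:00:00.000"),
        ("MSKU 5414708", "2017-05-03 16:00:00.000"),
    ],
)

# One output row per distinct snap_dt, carrying the number of rows in that group.
counts = conn.execute(
    "SELECT snap_dt, COUNT(*) AS No_of_records "
    "FROM table1 GROUP BY snap_dt ORDER BY snap_dt"
).fetchall()

for snap_dt, n in counts:
    print(snap_dt, n)
```

The alias on COUNT(*) is only there so the result column carries the name used in the desired output.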

Try this:
SELECT
SNAP_DT
,COUNT(*)
FROM data
GROUP BY SNAP_DT

Related

Get a time series at 8-hour intervals

I am generating a time series using the query below.
SELECT * from (
select * from generate_series(
date_trunc('hour', '2021-11-13 10:01:38'::timestamp),
'2021-12-13 10:01:38'::timestamp,
concat(480, ' minutes')::interval) as t(time_ent)) as t
where t."time_ent" between '2021-11-13 10:01:38'::timestamp and '2021-12-13 10:01:38'::timestamp
It gives me output like this:
2021-11-13 18:00:00.000
2021-11-14 02:00:00.000
2021-11-14 10:00:00.000
2021-11-14 18:00:00.000
2021-11-15 02:00:00.000
But I need output like this:
2021-11-13 16:00:00.000
2021-11-14 00:00:00.000
2021-11-14 08:00:00.000
2021-11-14 16:00:00.000
2021-11-15 00:00:00.000
Currently, the hours in the time series depend on the timestamp I pass in. In the example above I get hours like 02, 10, 18, but I want hours like 00, 08, 16. The hours should not depend on the time I pass in the query. I have tried many things without success.
Because the start of your generate_series is 10:00:00, the next step is 18:00:00.
You have to start your series from 00:00:00 (cast the start to date), e.g.:
SELECT
time_ent::timestamp without time zone
from (
select * from generate_series(
date_trunc('hour', '2021-11-13 10:01:38'::date),
'2021-12-13 10:01:38'::timestamp ,
concat(480, ' minutes')::interval) as t(time_ent)
) as t
where t."time_ent" between '2021-11-13 10:01:38'::timestamp and '2021-12-13 10:01:38'::timestamp
and the result will be:
2021-11-13 16:00:00.000
2021-11-14 00:00:00.000
2021-11-14 08:00:00.000
2021-11-14 16:00:00.000
2021-11-15 00:00:00.000
2021-11-15 08:00:00.000
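The idea behind the fix (anchor the grid at midnight, then filter to the requested range) can be sketched outside the database as well. This is only an illustration in Python; the function name aligned_series and its parameters are hypothetical, and the grid only stays on the same hours every day when the step divides 24 hours evenly.

```python
from datetime import datetime, time, timedelta


def aligned_series(start, end, step_hours=8):
    """Return timestamps on a midnight-anchored grid that fall within [start, end]."""
    step = timedelta(hours=step_hours)
    # Anchor the grid at 00:00 of the start day, not at the start time itself.
    current = datetime.combine(start.date(), time.min)
    result = []
    while current <= end:
        # Same role as the BETWEEN filter in the query: drop points before start.
        if current >= start:
            result.append(current)
        current += step
    return result


series = aligned_series(
    datetime(2021, 11, 13, 10, 1, 38),
    datetime(2021, 12, 13, 10, 1, 38),
)
for ts in series[:5]:
    print(ts)
```

Because the grid starts at midnight, the first value kept is 16:00 on the start day, regardless of the minutes and seconds in the start timestamp.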

Sum where values of a column matches without number of rows changing

I am trying to sum the values of a column where the values of another column match. Below is a sample of my data.
DT No_of_records LD_VOY_N LD_VSL_M
2017-05-06 04:00:00.000 7 0002W pqo emzmnwp
2017-05-06 20:00:00.000 6 0002W pqo emzmnwp
2017-05-02 04:00:00.000 1 0007E omq ynzmeoyn
2017-05-01 08:00:00.000 2 0016W rmhp sunhpnw
2017-05-01 12:00:00.000 1 0016W rmhp sunhpnw
2017-05-05 12:00:00.000 2 0019N omq wqmsy
2017-05-06 04:00:00.000 12 0019N omq wqmsy
Below is my desired output
DT No_of_records LD_VOY_N LD_VSL_M Total_no_of_records
2017-05-06 04:00:00.000 7 0002W pqo emzmnwp 13
2017-05-06 20:00:00.000 6 0002W pqo emzmnwp 13
2017-05-02 04:00:00.000 1 0007E omq ynzmeoyn 1
2017-05-01 08:00:00.000 2 0016W rmhp sunhpnw 3
2017-05-01 12:00:00.000 1 0016W rmhp sunhpnw 3
2017-05-05 12:00:00.000 2 0019N omq wqmsy 14
2017-05-06 04:00:00.000 12 0019N omq wqmsy 14
I am trying to find the Total_no_of_records column. Do you have any ideas?
You seem to want a window function by LD_VOY_N:
select t.*,
sum(No_of_records) over (partition by LD_VOY_N) as Total_no_of_records
from t;
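As a quick check of the window-function approach, the sketch below runs it against the sample rows using Python's sqlite3 module. It assumes a SQLite build of 3.25 or later (the first version with window functions); the table name t matches the answer, and the LD_VSL_M column is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (DT TEXT, No_of_records INTEGER, LD_VOY_N TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("2017-05-06 04:00:00.000", 7, "0002W"),
        ("2017-05-06 20:00:00.000", 6, "0002W"),
        ("2017-05-02 04:00:00.000", 1, "0007E"),
        ("2017-05-01 08:00:00.000", 2, "0016W"),
        ("2017-05-01 12:00:00.000", 1, "0016W"),
        ("2017-05-05 12:00:00.000", 2, "0019N"),
        ("2017-05-06 04:00:00.000", 12, "0019N"),
    ],
)

# SUM(...) OVER (PARTITION BY ...) adds the group total to every row
# without collapsing the rows the way GROUP BY would.
rows = conn.execute(
    "SELECT DT, No_of_records, LD_VOY_N, "
    "       SUM(No_of_records) OVER (PARTITION BY LD_VOY_N) AS Total_no_of_records "
    "FROM t ORDER BY LD_VOY_N, DT"
).fetchall()

for row in rows:
    print(row)
```

The row count stays at seven, and each row carries the total for its own LD_VOY_N.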
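Alternatively, without a window function, join each row to a per-LD_VOY_N sum:
select t.DT, t.No_of_records, t.LD_VOY_N, t.LD_VSL_M, s.Total_no_of_records
from tablename t
join (select LD_VOY_N, sum(No_of_records) as Total_no_of_records
from tablename
group by LD_VOY_N) s
on s.LD_VOY_N = t.LD_VOY_N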

SQL Find Datetime outside Datetime range

I have two tables, one called Production and the other called Schedule.
I am trying to find out whether any production falls outside the schedule.
So far I am getting duplicated values, because a production can be inside one schedule but outside the other.
I have had no luck with the SQL query below, and I was wondering if someone could point me in the right direction.
Thanks in advance.
SELECT TB1.*
FROM Production AS TB1
INNER JOIN Schedule AS TB2
ON TB1.ProduceDate < TB2.StartDate OR TB1.ProduceDate > tb2.EndDate
GROUP BY TB1.ID,TB1.ProduceDate
ORDER BY Tb1.ProduceDate
ID Produce Date
1 2017-02-03 09:00:00.000
2 2017-02-03 11:00:00.000
3 2017-02-03 13:00:00.000
4 2017-02-03 18:00:00.000
7 2017-02-03 19:00:00.000
5 2017-02-03 20:00:00.000
6 2017-02-03 23:00:00.000
Production Table Data
ID ProduceDate
1 2017-02-03 09:00:00.000
2 2017-02-03 11:00:00.000
3 2017-02-03 13:00:00.000
4 2017-02-03 18:00:00.000
5 2017-02-03 20:00:00.000
6 2017-02-03 23:00:00.000
7 2017-02-03 19:00:00.000
Schedule Table Data
ID StartDate EndDate
1 2017-02-03 10:00:00.000 2017-02-03 12:00:00.000
2 2017-02-03 15:00:00.000 2017-02-03 19:00:00.000
I think you just want not exists:
select p.*
from production p
where not exists (select 1
from schedule s
where p.producedate >= s.startdate and
p.producedate <= s.enddate
);
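Here is a small sketch that runs the NOT EXISTS query against the sample data with Python's sqlite3 module. Storing the timestamps as text is an assumption of this sketch; it works here because timestamps in this fixed format compare correctly as strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE production (id INTEGER, producedate TEXT)")
conn.execute("CREATE TABLE schedule (id INTEGER, startdate TEXT, enddate TEXT)")
conn.executemany(
    "INSERT INTO production VALUES (?, ?)",
    [
        (1, "2017-02-03 09:00:00.000"),
        (2, "2017-02-03 11:00:00.000"),
        (3, "2017-02-03 13:00:00.000"),
        (4, "2017-02-03 18:00:00.000"),
        (5, "2017-02-03 20:00:00.000"),
        (6, "2017-02-03 23:00:00.000"),
        (7, "2017-02-03 19:00:00.000"),
    ],
)
conn.executemany(
    "INSERT INTO schedule VALUES (?, ?, ?)",
    [
        (1, "2017-02-03 10:00:00.000", "2017-02-03 12:00:00.000"),
        (2, "2017-02-03 15:00:00.000", "2017-02-03 19:00:00.000"),
    ],
)

# A production row is kept only if NO schedule window contains it,
# so each row appears at most once however many schedules exist.
outside = conn.execute(
    """
    SELECT p.id, p.producedate
    FROM production p
    WHERE NOT EXISTS (
        SELECT 1
        FROM schedule s
        WHERE p.producedate >= s.startdate
          AND p.producedate <= s.enddate
    )
    ORDER BY p.producedate
    """
).fetchall()

for row in outside:
    print(row)
```

The original join repeats rows because a production only has to fall outside one schedule to match; NOT EXISTS requires it to fall outside all of them. Note that ID 7 (19:00) counts as inside the second schedule because the end bound is inclusive.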
select Production.*
from Production
left join Schedule
on ProduceDate between StartDate and EndDate
where Schedule.id is null

Determine which POS sale takes precedence. Following logic tree in Sybase SQL Anywhere 10

My small grocery store has partnered with a third party to take online orders. I made an Excel sheet that queries our product database for a bunch of product information, and I feed that data into another macro-enabled worksheet that does the heavy lifting and generates a .CSV that I upload each week.
What I'm trying to accomplish now is to pull in current and future sales information so we can advertise them. Sales are created in our POS by putting them into groupings called "worksheets." The complicating factor is that a product can exist on multiple worksheets at once, and there's a logic tree that is followed to determine which of the worksheets will affect the scan price at any given time.
The two attributes that determine precedence are "priority" and "date committed".
The three priorities, High, Medium, and Low are represented as 1, 2, and 3 respectively. Higher priority sheets will always override lower priority sheets for the duration of their life.
When worksheets have equal priorities, the one that is most recently committed takes precedence.
So, given data that looks like this:
item_id worksheet_name priority date_committed sale_start_date sale_end_date sale_price
011259904209 A 2 2016-06-22 09:21:09.041 2016-06-29 00:00:00.000 2016-07-20 11:00:00.000 2.0000
074682105322 B 2 2016-06-22 09:49:31.722 2016-07-20 00:00:00.000 2016-08-03 11:00:00.000 2.0000
074682105322 C 2 2016-06-22 08:57:04.641 2016-07-19 00:00:00.000 2016-08-03 16:00:00.000 2.0000
042563013660 A 2 2016-06-22 09:21:09.048 2016-06-29 00:00:00.000 2016-07-20 11:00:00.000 3.9900
042563013660 D 1 2016-06-25 14:03:33.499 2016-06-29 00:00:00.000 2016-07-05 23:59:59.000 2.9900
042563013660 E 2 2016-06-22 08:49:13.515 2016-06-28 00:00:00.000 2016-07-20 16:00:00.000 3.9900
073360772054 A 2 2016-06-22 09:21:09.114 2016-06-29 00:00:00.000 2016-07-20 11:00:00.000 3.9900
073360772054 B 2 2016-06-22 09:49:31.831 2016-07-20 00:00:00.000 2016-08-03 11:00:00.000 3.9900
073360772054 E 2 2016-06-22 08:49:13.520 2016-06-28 00:00:00.000 2016-07-20 16:00:00.000 3.9900
073360772054 C 2 2016-06-22 08:57:04.649 2016-07-19 00:00:00.000 2016-08-03 16:00:00.000 3.9900
012993221010 A 2 2016-06-22 09:21:09.110 2016-06-29 00:00:00.000 2016-07-20 11:00:00.000 3.3900
012993221010 B 2 2016-06-22 09:49:31.828 2016-07-20 00:00:00.000 2016-08-03 11:00:00.000 3.3900
012993221010 D 1 2016-06-25 14:03:33.502 2016-06-29 00:00:00.000 2016-07-05 23:59:59.000 2.9900
012993221010 E 2 2016-06-22 08:49:13.517 2016-06-28 00:00:00.000 2016-07-20 16:00:00.000 3.3900
012993221010 C 2 2016-06-22 08:57:04.646 2016-07-19 00:00:00.000 2016-08-03 16:00:00.000 3.3900
I want to get this:
Run on 6/27
item_id worksheet_name sale_start_date sale_end_date sale_price
011259904209 A 2016-06-29 00:00:00.000 2016-07-20 11:00:00.000 2.0000
074682105322 C 2016-07-19 00:00:00.000 2016-07-20 00:00:00.000 2.0000
042563013660 E 2016-06-28 00:00:00.000 2016-06-29 00:00:00.000 3.9900
073360772054 E 2016-06-28 00:00:00.000 2016-06-29 00:00:00.000 3.9900
012993221010 E 2016-06-28 00:00:00.000 2016-06-29 00:00:00.000 3.9900
Run on 6/29
item_id worksheet_name sale_start_date sale_end_date sale_price
011259904209 A 2016-06-29 00:00:00.000 2016-07-20 11:00:00.000 2.0000
074682105322 C 2016-07-19 00:00:00.000 2016-07-20 00:00:00.000 2.0000
042563013660 D 2016-06-29 00:00:00.000 2016-07-05 23:59:59.000 2.9900
073360772054 A 2016-06-29 00:00:00.000 2016-07-19 00:00:00.000 3.9900
012993221010 D 2016-06-29 00:00:00.000 2016-07-05 23:59:59.000 2.9900
Bonus points for combining overlapping sale periods to reflect the shopper's perception, but that's not necessary.
How can I get this result using SQL? Our sales run Wednesday through Tuesday, and I'm ideally generating my data file for the coming week on Wednesday afternoon or Thursday morning after we finish our weekly price changes.
We have tens of thousands of products in file.
Here's a graphical representation of the worksheet priorities per day
This is for SQL Anywhere 10, and running SELECT @@VERSION tells me 12.0.1.3967
I'm mostly familiar with SQL Server, but Sybase's SQL Anywhere is still very similar.
This is the "greatest N per group" problem. The typical solution (in anything other than MySQL) is to use ROW_NUMBER(), which is available on SQL Anywhere 10 and later as far as I can tell from Sybase books online.
SELECT a.item_id,
a.worksheet_name,
a.sale_start_date,
a.sale_end_date,
a.sale_price
FROM (
SELECT item_id,
worksheet_name,
sale_start_date,
sale_end_date,
sale_price,
ROW_NUMBER() OVER (PARTITION BY item_id ORDER BY priority, date_committed DESC) AS rn
FROM UnnamedSalesTable
WHERE sale_start_date <= CURRENT DATE
AND sale_end_date > CURRENT DATE) a
WHERE a.rn = 1
Obviously, you can replace CURRENT DATE with whatever date value you want to run.
If you happen to have two sales with an identical priority and date_committed, then you'll still only get one row. In other words, if there are duplicates you won't know. I suspect that's what you want. However, if you need to see duplicates, then instead of ROW_NUMBER() you'd want to use RANK() or DENSE_RANK() (either works in this case). That will allow "ties" to show up. Otherwise the query would be identical. If this duplication happens a lot, then you'll want to add a third column to the ORDER BY portion of the OVER clause as a tie-breaker.
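The ROW_NUMBER() pattern can be tried on a few of the sample rows with Python's sqlite3 module. This is a sketch under assumptions: it needs SQLite 3.25 or later for window functions, the table name sales and the helper winners are hypothetical, and the run date is passed as a parameter in place of CURRENT DATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (item_id TEXT, worksheet_name TEXT, priority INTEGER, "
    "date_committed TEXT, sale_start_date TEXT, sale_end_date TEXT, sale_price REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("011259904209", "A", 2, "2016-06-22 09:21:09.041",
         "2016-06-29 00:00:00.000", "2016-07-20 11:00:00.000", 2.00),
        ("042563013660", "A", 2, "2016-06-22 09:21:09.048",
         "2016-06-29 00:00:00.000", "2016-07-20 11:00:00.000", 3.99),
        ("042563013660", "D", 1, "2016-06-25 14:03:33.499",
         "2016-06-29 00:00:00.000", "2016-07-05 23:59:59.000", 2.99),
        ("042563013660", "E", 2, "2016-06-22 08:49:13.515",
         "2016-06-28 00:00:00.000", "2016-07-20 16:00:00.000", 3.99),
    ],
)

# Rank the worksheets active on the given date per item:
# lowest priority number first, then most recently committed.
QUERY = """
SELECT item_id, worksheet_name, sale_price
FROM (
    SELECT item_id, worksheet_name, sale_price,
           ROW_NUMBER() OVER (
               PARTITION BY item_id
               ORDER BY priority, date_committed DESC
           ) AS rn
    FROM sales
    WHERE sale_start_date <= :as_of
      AND sale_end_date > :as_of
) ranked
WHERE rn = 1
ORDER BY item_id
"""


def winners(as_of):
    """Return the winning worksheet per item for the given timestamp string."""
    return conn.execute(QUERY, {"as_of": as_of}).fetchall()


on_june_28 = winners("2016-06-28 00:00:00.000")
on_june_29 = winners("2016-06-29 00:00:00.000")
on_july_06 = winners("2016-07-06 00:00:00.000")
print(on_june_28, on_june_29, on_july_06, sep="\n")
```

On 6/29 the high-priority worksheet D wins for item 042563013660; once D has ended, A wins over E because it was committed later. As in the answer's query, only worksheets active on the given date are considered, so upcoming sales are not returned.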

Group By,Order by DateTime

I have nearly 15000 data rows with the first column containing date in the format:
2012-05-10 09:00:00.000
I need this data to be sorted by year then month, then day, then hour so for example:
2012-05-10 09:00:00.000
2012-05-10 10:00:00.000
2012-05-10 11:00:00.000
2012-05-10 12:00:00.000
2012-05-11 09:00:00.000
2012-05-11 10:00:00.000
2012-05-11 11:00:00.000
2012-05-11 12:00:00.000
2012-06-01 02:00:00.000
2012-06-01 03:00:00.000
2012-06-01 04:00:00.000
2012-06-01 05:00:00.000
Current SQL Query to do this is below:
SELECT MIN(jmusa_LOG1.DateTime)
FROM jmusa_LOG1
GROUP BY DATEPART(M,jmusa_LOG1.DateTime),DATEPART(D,jmusa_LOG1.DateTime),DATEPART(HH,jmusa_LOG1.DateTime)
HAVING MIN(jmusa_LOG1.DateTime) NOT IN(SELECT DateTime FROM AverageRawData)
ORDER BY DATEPART(M,jmusa_LOG1.DateTime),DATEPART(D,jmusa_LOG1.DateTime),DATEPART(HH,jmusa_LOG1.DateTime)
You are describing a normal date sort, so you can just do:
select MyDate
from AverageRawData
order by MyDate
If you don't want duplicates, add DISTINCT like this:
select distinct MyDate
from AverageRawData
order by MyDate
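A short sketch of the plain date sort, run through Python's sqlite3 module. The inserted rows are made up for illustration, and the dates are stored as text, which sorts chronologically only because of the fixed year-month-day format; a native datetime column, as in the question, sorts chronologically by itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AverageRawData (MyDate TEXT)")
# Deliberately unsorted, with one duplicate.
conn.executemany(
    "INSERT INTO AverageRawData VALUES (?)",
    [
        ("2012-06-01 02:00:00.000",),
        ("2012-05-10 10:00:00.000",),
        ("2012-05-11 09:00:00.000",),
        ("2012-05-10 09:00:00.000",),
        ("2012-05-10 10:00:00.000",),
    ],
)

# Ordering by the date column already sorts by year, month, day, then hour.
ordered = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT MyDate FROM AverageRawData ORDER BY MyDate"
    )
]
for value in ordered:
    print(value)
```

No DATEPART grouping is needed for the ordering itself; splitting the date into parts only matters when rows have to be bucketed, for example one row per hour.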
If this does not meet your requirements, please provide sample data used to generate your output example.