sql query to get today new records compared with yesterday - sql

i have this table:
COD (Integer) (PK)
ID (Varchar)
DATE (Date)
I just want to get the new ID's from today, compared with yesterday (the ID's from today that are not present yesterday)
This needs to be done with just one query, maximum efficiency because the table will have 4-5 millions records
As a java developer i am able to do this with 2 queries, but with just one is beyond my knowledge so any help would be so much appreciated
EDIT: date format is dd/mm/yyyy and every day each ID may come 0 or 1 times

Here is a solution that will go over the base data one time only. It selects the id and the date where the date is either yesterday or today (or both). Then it GROUPS BY id - each group will have either one or two rows. Then it filters by the condition that the MIN date in the group is "today". Those are the id's that exist today but did not exist yesterday.
DATE is an Oracle keyword, best not used as a column name. I changed that to DT. I also assume that your "dt" field is a pure date (as pure as it can be in Oracle, meaning: time of day, which is always present, is 00:00:00).
select id
from your_table
where dt in (trunc(sysdate), trunc(sysdate) - 1)
group by id
having min(dt) = trunc(sysdate)
;
Edit: Gordon makes a good point: perhaps you may have more than one such row per ID, in the same day? In that case the time-of-day may also be different from 00:00:00.
If so, the solution can be adapted:
select id
from your_table
where dt >= trunc(sysdate) - 1 and dt < trunc(sysdate) + 1
group by id
having min(dt) >= trunc(sysdate)
;
Either way: (1) the base table is read just once; (2) the column DT is not wrapped within any function, so if there is an index on that column, it can be used to access just the needed rows.

The typical method would use not exists:
select t.*
from t
where t.date >= trunc(sysdate) and t.date < trunc(sysdate + 1) and
not exists (select 1
from t t2
where t2.id = t.id and
t2.date >= trunc(sysdate - 1) and t2.date < trunc(sysdate)
);
This is a general solution. If you know that there is at most one record per day, there are better solutions, such as using lag().

Use MINUS. I suppose your date column has a time part, so you need to truncate it.
select id from mytable where trunc(date) = trunc(sysdate)
minus
select id from mytable where trunc(date) = trunc(sysdate) - 1;
I suggest the following function index. Without it, the query would have to full scan the table, which would probably be quite slow.
create idx on mytable( trunc(sysdate) , id );

Related

How to Group by Current day and Count Rows?

Hello I have table "os_txn.pay_link" and inside there are many columns.
What I want to do is that I want to count the row numbers by looking at "merchant_id" column for the current day.
So for example what I am looking for an output is that today one of "merchant_id" has
"8" rows. So I want to know the number of rows of the "merchant_id" column for current day.
I think I should use count(*) in view with select statement but couldnt succeed about syntax. So I am open your suggestions thank you.
If I understood you correctly, a simple option would be
select merchant_id, count(*)
from os_txn.pay_link
where date_column = trunc(sysdate)
group by merchant_id;
presuming that date_column contains date only (i.e. for today, 8th of October 2022, that's its value - no hours, minutes or seconds).
If date column contains time component, again - a simple option - would be
select merchant_id, count(*)
from os_txn.pay_link
where trunc(date_column) = trunc(sysdate)
group by merchant_id;
If there's an index on date_column, then such a code wouldn't use it (unless it is a function-based index) so you'd rather modify it to
where date_column >= trunc(sysdate)
and date_column < trunc(sysdate + 1)
If that's not it, do post sample data and desired result.

tdate issue I'm facing in SQL query

While fetching count from table by using following query
Select count(*)
from tab
where tdate = '17-05-19' ---> output 0
or
Select count(*)
from tab
where trunc(tdate) = '17-05-19' ---->output 0
If I use:
Select count(*)
from tab
where tdate >sysdate - 1 ---> it returns some count(yesterday+some of the today txn)
But here I want only yesterday txn whenever I fire this query.
But here I want only yesterday txn whenever I fire this query.
You may use this.
Select count (*) from tab where
tdate >= TRUNC(SYSDATE) - 1
AND tdate < TRUNC(SYSDATE)
The advantage of this over using TRUNC on the date column is that it will utilize an index if it exists over tdate
If you tried by using
Select count(*) from tab where trunc(tdate) = date'2019-05-17'
(or, you could use
Select count(*) from tab where to_char(tdate,'dd-mm-yy') = '17-05-19' by formatting through to_char function
or, you could use
Select count(*) from tab where trunc(tdate) = trunc(sysdate)-1 to get only the data for the day before
)
you'd get some results provided you have data for the date 17th May.
So, you need to provide a formatting for your literal as date'2019-05-17'(known as date literal) especially for Oracle DB, it might be used as '2019-05-17' without date part in MySQL as an example.
Btw, trunc function is used to extract the date portion, and remove the time part of a date type column value.
If your table is populated with huge data, therefore performance may matter, then you can even create functional index on trunc(tdate).
Demo

Efficient way of counting a large content from a cloumn or a two in a database using selected time period

I need to list number of column1 that have been added to the database over the selected time period (since the day the list is requested)-daily, weekly (last 7 days), monthly (last 30 days) and quarterly (last 3 months). for example below is the table I created to perform this task.
Column | Type | Modifiers
------------------+-----------------------------+-----------------------------------------------------
column1 character varying (256) not null default nextval
date timestamp without time zone not null default now()
coloumn2 charater varying(256) ..........
Now, I need the total count of entries in column1 with respect the selected time period.
Like,
Column 1 | Date | Coloumn2
------------------+-----------------------------+-----------------------------------------------------
abcdef 2013-05-12 23:03:22.995562 122345rehr566
njhkepr 2013-04-10 21:03:22.337654 45hgjtron
ffb3a36dce315a7 2013-06-14 07:34:59.477735 jkkionmlopp
abcdefgggg 2013-05-12 23:03:22.788888 22345rehr566
From above data, for daily selected time period it should be count= 2
I have tried doing this query
select count(column1) from table1 where date='2012-05-12 23:03:22';
and have got the exact one record matching the time stamp. But I really needed to do it in proper way I believe this is not an efficient way of retrieving the count. Anyone who could help me know the right and efficient way of writing such query would be great. I am new to the database world, and I am trying to be efficient in writing any query.
Thanks!
[EDIT]
Each query currently is taking 175854ms to get process. What could be the efficient way to lessen the time to have it processed accordingly. Any help would be really great. I am using Postgresql to do the same.
To be efficient, conditions should compare values of the sane type as the columns being compared. In this case, the column being compared - Date - has type timestamp, so we need to use a range of tinestamp values.
In keeping with this, you should use current_timestamp for the "now" value, and as confirmed by the documentation, subtracting an interval from a timestamp yields a timestamp, so...
For the last 1 day:
select count(*) from table1
where "Date" > current_timestamp - interval '1 day'
For the last 7 days:
select count(*) from table1
where "Date" > current_timestamp - interval '7 days'
For the last 30 days:
select count(*) from table1
where "Date" > current_timestamp - interval '30 days'
For the last 3 months:
select count(*) from table1
where "Date" > current_timestamp - interval '3 months'
Make sure you have an index on the Date column.
If you find that the index is not being used, try converting the condition to a between, eg:
where "Date" between current_timestamp - interval '3 months' and current_timestamp
Logically the same, but may help the optimizer to choose the index.
Note that column1 is irrelevant to the question; being unique there is no possibility of the row count being different from the number of different values of column1 found by any given criteria.
Also, the choice of "Date" for the column name is poor, because a) it is a reserved word, and b) it is not in fact a date.
If you want to count number of records between two dates:
select count(*)
from Table1
where "Date" >= '2013-05-12' and "Date" < '2013-05-13'
-- count for one day, upper bound not included
select count(*)
from Table1
where "Date" >= '2013-05-12' and "Date" < '2013-06-13'
-- count for one month, upper bound not included
select count(*)
from Table1
where
"Date" >= current_date and
"Date" < current_date + interval '1 day'
-- current date
What I understand from your wording is
select date_trunc('day', "date"), count(*)
from t
where "date" >= '2013-01-01'
group by 1
order by 1
Replace 'day' for 'week', 'month', 'quarter' as needed.
http://www.postgresql.org/docs/current/static/functions-datetime.html#FUNCTIONS-DATETIME-TRUNC
Create an index on the "date" column.
select count(distinct column1) from table1 where date > '2012-05-12 23:03:22';
I assume "number of column1" means "number of distinct values in column1.
Edit:
Regarding your second question (speed of the query): I would assume that an index on the date column should speed up the runtime. Depending on the data content, this could even be declared unique.
To throw another option into the mix...
Add a column of type "date" and index that -- named "datecol" for this example:
create index on tbl_datecol_idx on tbl (datecol);
analyze tbl;
Then your query can use an equality operator:
select count(*) from tbl where datecol = current_date - 1; --yesterday
Or if you can't add the date datatype column, you could create a functional index on the existing column:
create index tbl_date_fbi on tbl ( ("date"::DATE) );
analyze tbl;
select count(*) from tbl where "date"::DATE = current_date - 1;
Note1: you do not need to query "column1" directly as every row has that attribute filled due to the NOT NULL.
Note2: Creating a column named "date" is poor form, and even worse that it is of type TIMESTAMP.

Generate Dates starting from a date returned by a condition - SQL

A series of dates with a specified interval can be generated using a variable and a static date as per the linked question that I asked earlier. However when there's a where clause to produce a start date, the dates generation seems to stop and only shows the first interval date. I also checked other posts, those that I found e.g. 1, e.g. 2, e.g. 3 are shown with a static date or using CTE.. I am looking for a solution without storedprocedures/functions...
This works:
SELECT DATE(DATE_ADD('2012-01-12',
INTERVAL #i:=#i+30 DAY) ) AS dateO
FROM members, (SELECT #i:=0) r
where #i < DATEDIFF(now(), date '2012-01-12')
;
These don't:
SELECT DATE_ADD(date '2012-01-12',
INTERVAL #j:=#j+30 DAY) AS dateO, #j
FROM `members`, (SELECT #j:=0) s
where #j <= DATEDIFF(now(), date '2012-01-12')
and mmid = 100
;
SELECT DATE_ADD(stdate,
INTERVAL #k:=#k+30 DAY) AS dateO, #k
FROM `members`, (SELECT #k:=0) t
where #k <= DATEDIFF(now(), stdate)
and mmid = 100
;
SQLFIDDLE REFERENCE
Expected Results:
Be the same as the first query results given it starts generating dates with stDate of mmid=100.
Preferably in ANSI SQL so it can be supported in MYSQL, SQL Server/MS Access SQL as Oracle has trunc and rownum given per this query with 14 votes and PostGres has generatge_Series function. I would like to know if this is a bug or a limitation in MYSQL?
PS: I have asked a similar quetion before. It was based on static date values where as this one is based on a date value from a table column based on a condition.
The simplest way to insure cross-platform compatibility is to use a calendar table. In its simplest form
create table calendar (
cal_date date primary key
);
insert into calendar values
('2013-01-01'),
('2013-01-02'); -- etc.
There are many ways to generate dates for insertion.
Instead of using a WHERE clause to generate rows, you use a WHERE clause to select rows. To select October of this year, just
select cal_date
from calendar
where cal_date between '2013-10-01' and '2013-10-31';
It's reasonably compact--365,000 rows to cover a period of 1000 years. That ought to cover most business scenarios.
If you need cross-platform date arithmetic, you can add a tally column.
drop table calendar;
create table calendar (
cal_date date primary key,
tally integer not null unique check (tally > 0)
);
insert into calendar values ('2012-01-01', 1); -- etc.
To select all the dates of 30-day intervals, starting on 2012-01-12 and ending at the end of the calendar year, use
select cal_date
from calendar
where ((tally - (select tally
from calendar
where cal_date = '2012-01-12')) % 30 ) = 0;
cal_date
--
2012-01-12
2012-02-11
2012-03-12
2012-04-11
2012-05-11
2012-06-10
2012-07-10
2012-08-09
2012-09-08
2012-10-08
2012-11-07
2012-12-07
If your "mmid" column is guaranteed to have no gaps--an unspoken requirement for a calendar table--you can use the "mmid" column in place of my "tally" column.

Query aggregate faster than MAX

I have a fairly large table in which one of the columns is a date column. The query I execute is as follows.
select max(date) from tbl where date < to_date('10/01/2010','MM/DD/YYYY')
That is, I want to find the cell value closest to and less than a particular date value. This takes considerable time because of the max on the large table. Is there a faster way to do this? maybe using LAST_VALUE?
Put an index on the date column and the query should be plenty fast.
1) Add an index to the date column. Simply put, an index allows the database engine to store information about the data so it will speed up most queries where that column is one of the clauses. Info here http://docs.oracle.com/cd/B28359_01/server.111/b28310/indexes003.htm
2) Consider adding a second clause to the query. You have where date < to_date('10/01/2010','MM/DD/YYYY') now, why not change it to:
where date < to_date('10/01/2010','MM/DD/YYYY') and date > to_date('09/30/2010', 'MM/DD/YYYY')
since this will reduce the number of scanned rows.
Try
select date from (
select date from tbl where date < to_date('10/01/2010','MM/DD/YYYY') order by date desc
) where rownum = 1