Adding in missing dates from results in SQL

I have a database table that currently looks like this:
Date | valid_entry | profile
1/6/2015 | 1 | 1
3/6/2015 | 2 | 1
3/6/2015 | 2 | 2
5/6/2015 | 4 | 4
I am trying to grab the dates, but I need the query to also display dates that do not exist in the list, such as 2/6/2015.
This is a sample of what I need it to be:
Date | valid_entry
1/6/2015 | 1
2/6/2015 | 0
3/6/2015 | 2
3/6/2015 | 2
4/6/2015 | 0
5/6/2015 | 4
My query:
select date, count(valid_entry)
from database
where profile = 1
group by 1;
This query only displays the dates that exist in the table. Is there a way to populate the results with the dates that do not exist in it?

You can generate a list of all dates that are between the start and end date from your source table using generate_series(). These dates can then be used in an outer join to sum the values for all dates.
with all_dates (date) as (
    select dt::date
    from generate_series( (select min(date) from some_table), (select max(date) from some_table), interval '1' day) as x(dt)
)
select ad.date, sum(coalesce(st.valid_entry,0))
from all_dates ad
    left join some_table st on ad.date = st.date
group by ad.date, st.profile
order by ad.date;
some_table is your table with the sample data you have provided.
Based on your sample output, you also seem to want to group by date and profile; otherwise there can't be two rows with 2015-06-03. You also don't seem to want where profile = 1, because that wouldn't generate two rows with 2015-06-03 either, as shown in your sample output.
SQLFiddle example: http://sqlfiddle.com/#!15/b0b2a/2
Unrelated, but: I hope that the column names are only made up. date is a horrible name for a column. For one because it is also a keyword, but more importantly it does not document what this date is for. A start date? An end date? A due date? A modification date?
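The generate_series() approach above is PostgreSQL-specific. As a rough, runnable sketch of the same idea (build every date between the minimum and maximum, then outer join the data onto it), the following uses Python's sqlite3 module with a recursive CTE standing in for generate_series(). The column name entry_date and the ISO-formatted sample values are assumptions made for this illustration, and the sketch groups by date only, so the two 2015-06-03 rows are summed into one.

```python
import sqlite3

# In-memory database with the question's sample data.
# "entry_date" is a hypothetical column name; dates are stored as ISO text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE some_table (entry_date TEXT, valid_entry INTEGER, profile INTEGER)"
)
conn.executemany(
    "INSERT INTO some_table VALUES (?, ?, ?)",
    [
        ("2015-06-01", 1, 1),
        ("2015-06-03", 2, 1),
        ("2015-06-03", 2, 2),
        ("2015-06-05", 4, 4),
    ],
)

# The recursive CTE walks one day at a time from the earliest to the latest date.
query = """
WITH RECURSIVE all_dates(d) AS (
    SELECT MIN(entry_date) FROM some_table
    UNION ALL
    SELECT date(d, '+1 day')
    FROM all_dates
    WHERE d < (SELECT MAX(entry_date) FROM some_table)
)
SELECT ad.d, COALESCE(SUM(st.valid_entry), 0) AS total
FROM all_dates AS ad
LEFT JOIN some_table AS st ON st.entry_date = ad.d
GROUP BY ad.d
ORDER BY ad.d
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Dates with no matching row come back with a total of 0 because the outer join leaves valid_entry as NULL and COALESCE replaces the NULL sum.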

You have to use a calendar table for this purpose. In this case you can create an in-line table with the dates required, then LEFT JOIN your table to it:
select t.d, count(db.valid_entry)
from (
    SELECT DATE '2015-06-01' AS d UNION ALL SELECT DATE '2015-06-02' UNION ALL SELECT DATE '2015-06-03' UNION ALL
    SELECT DATE '2015-06-04' UNION ALL SELECT DATE '2015-06-05' UNION ALL SELECT DATE '2015-06-06') AS t
left join database AS db on t.d = db."date" and db.profile = 1
group by t.d;
Note: Predicate profile = 1 should be applied in the ON clause of the LEFT JOIN operation. If it is placed in the WHERE clause instead then LEFT JOIN essentially becomes an INNER JOIN.
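The difference between the two predicate placements can be seen in a small experiment. This is only a sketch using Python's sqlite3 module; the table names calendar and entries and the column entry_date are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: a three-day calendar and a few entries.
conn.executescript("""
CREATE TABLE calendar (d TEXT);
INSERT INTO calendar VALUES ('2015-06-01'), ('2015-06-02'), ('2015-06-03');
CREATE TABLE entries (entry_date TEXT, valid_entry INTEGER, profile INTEGER);
INSERT INTO entries VALUES
    ('2015-06-01', 1, 1),
    ('2015-06-03', 2, 1),
    ('2015-06-03', 2, 2);
""")

# Filter in the ON clause: calendar rows without a match are kept.
on_rows = conn.execute("""
SELECT c.d, COUNT(e.valid_entry)
FROM calendar AS c
LEFT JOIN entries AS e ON e.entry_date = c.d AND e.profile = 1
GROUP BY c.d
ORDER BY c.d
""").fetchall()

# Filter in the WHERE clause: unmatched rows have profile NULL and are dropped.
where_rows = conn.execute("""
SELECT c.d, COUNT(e.valid_entry)
FROM calendar AS c
LEFT JOIN entries AS e ON e.entry_date = c.d
WHERE e.profile = 1
GROUP BY c.d
ORDER BY c.d
""").fetchall()

print(on_rows)     # includes 2015-06-02 with a count of 0
print(where_rows)  # 2015-06-02 is missing
```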

Related

The nearest row in the other table

One table is a sample of users and their purchases.
Structure:
Email | NAME | TRAN_DATETIME (Varchar)
So we have customer email + FirstName&LastName + Date of transaction
The second table, which comes from a second system, contains all users, their sensitive data, and the time they were registered in our system.
Simplified Structure:
Email | InstertDate (varchar)
My task is to compute the difference in minutes between the rows inserted from sales (first table) and the rows with users and their sensitive data.
The issue is that the second table contains many rows per user, and I want to find the row in the 2nd table that is nearest in time, because sometimes the difference is a few minutes (a delay, or the opposite of a delay) and sometimes it can be a few days.
So for x email I have row in 1st table:
E_MAIL NAME TRAN_DATETIME
p****@****.eu xxx xxx 2021-10-04 00:03:09.0000000
But then I have 3 rows in the 2nd table, and the latest is the one I want to compute the difference against:
Email InstertDate
p****@****.eu 2021-05-20 19:12:07
p****@****.eu 2021-05-20 19:18:48
p****@****.eu 2021-10-03 18:32:30 <--
I wrote this query, but I have no idea how to match the nearest row in the 2nd table:
SELECT DISTINCT TOP (100)
a.[E_MAIL]
,a.[NAME]
,a.[TRAN_DATETIME]
,CASE WHEN b.EMAIL IS NOT NULL THEN 'YES' ELSE 'NO' END AS 'EXISTS'
,(ABS(CONVERT(INT, CONVERT(Datetime,LEFT(a.[TRAN_DATETIME],10),120))) - CONVERT(INT, CONVERT(Datetime,LEFT(b.[INSERTDATE],10),120))) as 'DateAccuracy'
FROM [crm].[SalesSampleTable] a
left join [crm].[SensitiveTable] b on a.[E_MAIL] = b.[EMAIL]

Totally untested: I'd need sample data to verify this, and since I don't know which RDBMS and version this is, the suspect areas are the date casts and the date math. Treat the following as "pseudo code".
We assign a row number ordered by the absolute difference in seconds between the dates; the rows with a row number of 1 win.
WITH CTE AS (
    SELECT A.*, B.*,
           row_number() over (PARTITION BY A.E_MAIL
                              -- ascending: the smallest absolute gap gets RN = 1
                              -- datetime2, because the sample TRAN_DATETIME has 7 fractional digits
                              ORDER BY abs(datediff(second, cast(A.TRAN_DATETIME as datetime2), cast(B.INSERTDATE as datetime2))) asc) RN
    FROM [crm].[SalesSampleTable] a
    LEFT JOIN [crm].[SensitiveTable] b
        on a.[E_MAIL] = b.[EMAIL])
SELECT * FROM CTE WHERE RN = 1
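Since the answer above is untested pseudo code, here is a small runnable sketch of the same row-numbering idea using Python's sqlite3 module (window functions need SQLite 3.25 or newer). The table names sales and registrations, the column names, and the example.com address are invented for this illustration; the timestamps are the ones from the question with the fractional seconds dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-ins for the sales table and the sensitive-data table.
conn.executescript("""
CREATE TABLE sales (email TEXT, tran_datetime TEXT);
CREATE TABLE registrations (email TEXT, insert_date TEXT);
INSERT INTO sales VALUES ('user@example.com', '2021-10-04 00:03:09');
INSERT INTO registrations VALUES
    ('user@example.com', '2021-05-20 19:12:07'),
    ('user@example.com', '2021-05-20 19:18:48'),
    ('user@example.com', '2021-10-03 18:32:30');
""")

query = """
WITH diffs AS (
    -- Absolute gap in seconds between each sale and each candidate row.
    SELECT s.email, s.tran_datetime, r.insert_date,
           ABS(CAST(strftime('%s', s.tran_datetime) AS INTEGER)
               - CAST(strftime('%s', r.insert_date) AS INTEGER)) AS diff_seconds
    FROM sales AS s
    LEFT JOIN registrations AS r ON r.email = s.email
),
ranked AS (
    -- Smallest gap first, so rn = 1 marks the nearest row per sale.
    SELECT *, ROW_NUMBER() OVER (
               PARTITION BY email, tran_datetime
               ORDER BY diff_seconds
           ) AS rn
    FROM diffs
)
SELECT email, tran_datetime, insert_date, diff_seconds / 60 AS diff_minutes
FROM ranked
WHERE rn = 1
"""
nearest = conn.execute(query).fetchall()
print(nearest)
```

Partitioning by both the email and the transaction timestamp is a design choice of this sketch: it keeps one nearest row per sale rather than one per customer. The integer division truncates the gap to whole minutes.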

SQL Server : getting sum of values in "calendar" table without joining

Is it possible to get the sum of value from the calendar_table for the main_table without a join like the one below?
select
date, sum(value)
from
main_table
inner join
calendar_table on start_date <= date and end_date >= date
group by
date
I am trying to avoid a join like this because main_table is a very large table with rows that have very large start and end dates, and it is absolutely killing my performance. And I've already indexed both tables.
Sample desired results:
+-----------+-------+
| date | total |
+-----------+-------+
| 7-24-2010 | 11 |
+-----------+-------+
Sample tables
calendar_table:
+-----------+-------+
| date | value |
+-----------+-------+
| 7-24-2010 | 5 |
| 7-25-2010 | 6 |
| ... | ... |
| 7-23-2020 | 2 |
| 7-24-2020 | 10 |
+-----------+-------+
main_table:
+------------+-----------+
| start_date | end_date |
+------------+-----------+
| 7-24-2010 | 7-25-2010 |
| 8-1-2011 | 8-5-2011 |
+------------+-----------+

You want the sum in the calendar table. So, I would recommend an "incremental" approach. This starts by unpivoting the data and putting the value as an increment and decrement in the results:
select c.date, c.value as inc
from main_table m join
     calendar_table c
     on m.start_date = c.date
union all
select dateadd(day, 1, c.date), - c.value as inc
from main_table m join
     calendar_table c
     on m.end_date = c.date;
The final step is to aggregate and do a cumulative sum:
select date, sum(inc) as value_on_date,
       sum(sum(inc)) over (order by date) as net_value
from ((select c.date, c.value as inc
       from main_table m join
            calendar_table c
            on m.start_date = c.date
      ) union all
      (select dateadd(day, 1, c.date), - c.value as inc
       from main_table m join
            calendar_table c
            on m.end_date = c.date
      )
     ) c
group by date
order by date;
This is processing two rows of data for each row in the master table. Assuming that your time spans are longer than two days typically for each master row, the resulting data processed should be much smaller. And smaller data implies a faster query.

Here's a cross-apply example to possibly work from.
select m.start_date
     , m.end_date
     , CalendarTable.ValueSum
from main_table m
CROSS APPLY (
    SELECT SUM(c.value) as ValueSum
    FROM calendar_table c
    WHERE c.date >= m.start_date and c.date <= m.end_date
) as CalendarTable

You could try something like this ... but be aware, it is still technically 'joined' to the main table. If you look at an execution plan, you will see that there is a join operation of some kind going on.
select
    m.start_date,
    m.end_date,
    (select sum(t.value) from calendar_table t where m.start_date <= t.date and m.end_date >= t.date) as total
from
    main_table m
The thing about that query is that the 'main_table' is not grouped as part of the results. You could possibly do that outside the select, but I don't know what you are trying to achieve. If you are grouping just to get the SUM, then perhaps maintaining the 'main_table' in the group is superfluous.

As already mentioned, you must perform a join of some sort in order to get data from more than one table in a query.
You did not provide details of the indexes, which are important for performance. I suggest the following indexes to optimize query performance.
For calendar_table, make sure you have a unique clustered index (or primary key) on date. Alternatively, a unique nonclustered index on date with the value column included.
A composite index on the main_table start_date and end_date columns may also be beneficial.
Even with optimal indexes, the query will still take some time against a 500M row table (e.g. a couple of minutes) with no additional filter criteria. If you need results in milliseconds, create an indexed view to materialize the join and aggregation results. Be aware the indexed view will add overhead for inserts/deletes on both tables as well as for updates to the value column in order to keep the index consistent with the underlying data.
Below is an indexed view DDL example.
CREATE VIEW dbo.vw_example
WITH SCHEMABINDING
AS
SELECT
date, sum(value) AS value, COUNT_BIG(*) AS countbig
from
dbo.main_table
inner join
dbo.calendar_table on start_date <= date and end_date >= date
group by
date;
GO
CREATE UNIQUE CLUSTERED INDEX cdx ON dbo.vw_example(date);
GO
Depending on your SQL Server edition, the optimizer may be able to use the indexed view automatically so your original query can use the view index without changes. Otherwise, query the view directly and specify a NOEXPAND hint:
SELECT date, value AS total
FROM dbo.vw_example WITH (NOEXPAND);
EDIT:
With the query improvement @GordonLinoff suggested, a non-clustered index on the main_table end_date column will help optimize that query.

Oracle SQL Remove Duplicates on 2 of 4 fields

I am using Oracle SQL to extract the data;
I have supply periods for IDs in 2 systems. I have this working with the below code:
select distinct b.ID_Code, b.supply_start_date, b.supply_end_date, b.system_id
from (
select ID_Code, max(supply_start_date) as max_dt
from tmp_mmt_sup
group by ID_Code) a
inner join tmp_mmt_sup b
on a.ID_Code=b.ID_Code and a.max_dt=b.SUPPLY_START_DATE;
However, I have several records that are on the 2 different systems, but have the same start date/end dates. I only want to keep one of them - not bothered which!
So instead of
ID_Code Start End System
123 01-04-2018 30-04-2018 ABC
123 01-04-2018 30-04-2018 DEF
I only have one of these records.
Many thanks
D

If you don't care which one is returned, then one of the aggregate functions (such as MIN or MAX) does the job. For example:
select b.id_code,
b.supply_start_date,
b.supply_end_date,
max(b.system_id) system_id --> added MAX here ...
from (select id_code,
max(supply_start_date) as max_dt
from tmp_mmt_sup
group by id_code
) a
inner join tmp_mmt_sup b
on a.id_code = b.id_code and a.max_dt = b.supply_start_date
group by b.id_code, --> ... and GROUP BY here
b.supply_start_date,
b.supply_end_date;

SQL query with grouping and MAX

I have a table that looks like the following but also has more columns that are not needed for this instance.
ID DATE Random
-- -------- ---------
1 4/12/2015 2
2 4/15/2015 2
3 3/12/2015 2
4 9/16/2015 3
5 1/12/2015 3
6 2/12/2015 3
ID is the primary key
Random is a foreign key, but I am not actually using the table it points to.
I am trying to design a query that groups the results by Random and Date and selects the MAX Date within the grouping, then gives me the associated ID.
If I run the following query
select top 100 ID, Random, MAX(Date) from DateBase group by Random, Date, ID
I get duplicate Randoms since ID is the primary key and will always be unique.
The results I need would look something like this:
ID DATE Random
-- -------- ---------
2 4/15/2015 2
4 9/16/2015 3
Another question: there could be times when many rows share the same date. What will MAX do in that case?

You can use NOT EXISTS() :
SELECT * FROM YourTable t
WHERE NOT EXISTS(SELECT 1 FROM YourTable s
WHERE s.random = t.random
AND s.date > t.date)
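This selects only those rows for which no row with a later date exists for the same random value.
It can also be done using IN() with a row-value comparison (supported in e.g. Oracle, PostgreSQL, and MySQL, but not in SQL Server):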
SELECT * FROM YourTable t
WHERE (t.random,t.date) in (SELECT s.random,max(s.date)
FROM YourTable s
GROUP BY s.random)
Or with a join:
SELECT t.* FROM YourTable t
INNER JOIN (SELECT s.random, max(s.date) as max_date
            FROM YourTable s
            GROUP BY s.random) tt
ON (t.date = tt.max_date and tt.random = t.random)

In SQL Server you could do something like the following,
select a.* from DateBase a inner join
(select Random,
MAX(dt) as dt from DateBase group by Random) as x
on a.dt =x.dt and a.random = x.random

This method will work in all versions of SQL as there are no vendor specifics (you'll need to format the dates using your vendor specific syntax)
You can do this in two stages:
The first step is to work out the max date for each random:
SELECT MAX(DateField) AS MaxDateField, Random
FROM Example
GROUP BY Random
Now you can join back onto your table to get the max ID for each combination:
SELECT MAX(e.ID) AS ID
,e.DateField AS DateField
,e.Random
FROM Example AS e
INNER JOIN (
SELECT MAX(DateField) AS MaxDateField, Random
FROM Example
GROUP BY Random
) data
ON data.MaxDateField = e.DateField
AND data.Random = e.Random
GROUP BY e.DateField, e.Random
To answer your second question:
If there are multiples of the same date, the MAX(e.ID) will simply choose the highest number. If you want the lowest, you can use MIN(e.ID) instead.
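The two-stage approach and the tie-breaking behaviour can be tried out with a short sketch using Python's sqlite3 module. The table name example and the column names are assumptions, the dates are rewritten in ISO format so that they compare correctly as text, and the row with id 7 is a hypothetical addition that shares the maximum date of id 4 in order to show the tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE example (id INTEGER PRIMARY KEY, date_field TEXT, random INTEGER)"
)
conn.executemany(
    "INSERT INTO example VALUES (?, ?, ?)",
    [
        (1, "2015-04-12", 2),
        (2, "2015-04-15", 2),
        (3, "2015-03-12", 2),
        (4, "2015-09-16", 3),
        (5, "2015-01-12", 3),
        (6, "2015-02-12", 3),
        (7, "2015-09-16", 3),  # hypothetical extra row: ties with id 4 on the max date
    ],
)

# Stage 1 (inner query): max date per random.
# Stage 2 (outer query): join back and keep the highest id among tied rows.
query = """
SELECT MAX(e.id) AS id, e.date_field, e.random
FROM example AS e
INNER JOIN (
    SELECT random, MAX(date_field) AS max_date
    FROM example
    GROUP BY random
) AS m ON m.random = e.random AND m.max_date = e.date_field
GROUP BY e.date_field, e.random
ORDER BY e.random
"""
latest = conn.execute(query).fetchall()
print(latest)
```

For random = 3 both id 4 and id 7 carry the maximum date, and MAX(e.id) returns 7; swapping in MIN(e.id) would return 4.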

Show data from table even if there is no data!! Oracle

I have a query which shows count of messages received based on dates.
For example:
1 | 1-May-2012
3 | 3-May-2012
4 | 6-May-2012
7 | 7-May-2012
9 | 9-May-2012
5 | 10-May-2012
1 | 12-May-2012
As you can see, on some dates no messages were received. I want the query to show all the dates, with 0 for the dates on which no messages were received, like this:
1 | 1-May-2012
0 | 2-May-2012
3 | 3-May-2012
0 | 4-May-2012
0 | 5-May-2012
4 | 6-May-2012
7 | 7-May-2012
0 | 8-May-2012
9 | 9-May-2012
5 | 10-May-2012
0 | 11-May-2012
1 | 12-May-2012
How can I achieve this when there are no rows in the table?

First, it sounds like your application would benefit from a calendar table. A calendar table is a list of dates and information about the dates.
Second, you can do this without using temporary tables. Here is the approach:
with constants as (select min(<thedate>) as firstdate from <table>),
     dates as (select firstdate + rn - 1 as thedate
               from (select rownum as rn, firstdate
                     from <table> cross join constants
                     where rownum < sysdate - firstdate + 1
                    ) seq
              )
select dates.thedate, count(t.date)
from dates left outer join
     <table> t
     on t.date = dates.thedate
group by dates.thedate
Here is the idea. The alias constants records the earliest date in your table. The alias dates then creates a sequence of dates. The inner subquery calculates a sequence of integers, using rownum, and then adds these to the first date. Note this assumes that you have on average at least one transaction per date. If not, you can use a bigger table.
The final part is the join that is used to bring back information about the dates. Note the use of count(t.date) instead of count(*). This counts the number of records in your table, which should be 0 for dates with no data.

You don't need a separate table for this, you can create what you need in the query. This works for May:
WITH month_may AS (
select to_date('2012-05-01', 'yyyy-mm-dd') + level - 1 AS the_date
from dual
connect by level <= 31
)
SELECT *
FROM month_may mm
LEFT JOIN mytable t ON t.some_date = mm.the_date
The date range will depend on how exactly you want to do this and what your range is.

You could achieve this with a left outer join IF you had another table to join to that contains all possible dates.
One option might be to generate the dates in a temp table and join that to your query.
Something like this might do the trick.
CREATE TABLE #TempA (Col1 DateTime)
DECLARE @start DATETIME = convert(datetime, convert(nvarchar(10), getdate(), 121))
SELECT @start
DECLARE @counter INT = 0
WHILE @counter < 50
BEGIN
    INSERT INTO #TempA (Col1) VALUES (@start)
    SET @start = DATEADD(DAY, 1, @start)
    SET @counter = @counter+1
END
That will create a TempTable to hold the dates... I've just generated 50 of them starting from today.
SELECT
a.Col1,
COUNT(b.MessageID)
FROM
#TempA a
LEFT OUTER JOIN YOUR_MESSAGE_TABLE b
ON a.Col1 = b.DateColumn
GROUP BY
a.Col1
Then you can left join your message counts to that.
