How do I count data from 2 different tables by date - sql

I have 2 tables with no relation between them. The tables have different numbers of columns, but a few columns share the same names while holding different data. I was able to create a function or view of only the data I wanted, but when I try to count the data by filtering on the date, I always get the wrong count in return. Let me explain by showing the 2 functions and what I am trying to do:
Function 1
ID - number from 1 to 8
data sent - YES or NO
Date - date value
Function 2
ID - number from 1 to 8
data sent - yes or no
date - date value
Upon running both separately, I get all the rows from the tables and everything looks good.
Then I try to add the following to each function:
select
count([data sent]), ID
from function1
Where (date between #date1 and #date2)
group by ID
The above statement works great and gives me the right result for each function.
Now I thought what if I want to add those 2 functions into one and get the count from both functions on 1 page.
So I created the following function:
Function 3
select
count(Function1.[data sent]) as Expr1,
Function1.id,
count(Function2.[data sent]) as Expr2,
Function1.date
from
Function1
LEFT OUTER JOIN
Function2 on Function1.id = Function2.id
Where
(Function1.date between #date1 and #date2)
group by
Function1.id
Upon running the above, I get the following table:
ID Expr1 Expr2
On both Expr1 and Expr2, I get results and I am not sure where they come from. I guess something is being multiplied by 100000 since one table holds almost 15000 rows and the other around 5000 rows.
What I would like to know first is whether it is possible at all to filter by date and count records from both tables at the same time. If anyone needs more information, please let me know and I will be glad to share and explain more.
Thank you

The LEFT OUTER JOIN is taking each row of the left table, finding ALL of the rows in the right table with the same id field, and creating that many rows in the result table. Since id isn't what we usually think of as an identity field (it looks more like a "deviceId" or something), you'll get lots of matches for each one. Repeat 15000 times and you get your combinatorial explosion.
Tip: To debug things like this, you can create sample tables with a tiny subset of the real data, say 10 rows from each, and run your query on them. You'll see the issue immediately.
It's possible to filter by date. It's hard to recommend an actual solution without better understanding your phrase "I want to add those 2 functions into one and get the count from both functions on 1 page".
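To make the row multiplication concrete, here is a minimal sketch using Python's built-in sqlite3 module. The tables f1 and f2 and their contents are made up for illustration; they only mimic the shape described in the question (a non-unique id plus a "data sent" column).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for Function1 and Function2
    CREATE TABLE f1 (id INTEGER, data_sent TEXT);
    CREATE TABLE f2 (id INTEGER, data_sent TEXT);
    -- id 1 appears three times on the left and twice on the right
    INSERT INTO f1 VALUES (1, 'YES'), (1, 'NO'), (1, 'YES');
    INSERT INTO f2 VALUES (1, 'yes'), (1, 'no');
""")

# Joining on the non-unique id pairs every left row with every right row
# that has the same id, so the join produces 3 * 2 = 6 rows before grouping.
fanout = conn.execute("""
    SELECT f1.id, COUNT(f1.data_sent) AS expr1, COUNT(f2.data_sent) AS expr2
    FROM f1
    LEFT OUTER JOIN f2 ON f1.id = f2.id
    GROUP BY f1.id
""").fetchall()

print(fanout)  # [(1, 6, 6)] instead of the expected counts 3 and 2
```

Both counts come back as 6, because each COUNT is applied to the joined rows rather than to the rows of its own table.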

Why can't you create a temporary table for each function then join them together?

Maybe subqueries can help you to achieve what you want:
SELECT
    ID = COALESCE(f1.ID, f2.ID),
    Date = COALESCE(f1.Date, f2.Date),
    f1.Expr1,
    f2.Expr2
FROM (
    SELECT
        ID,
        Date,
        Expr1 = COUNT([data sent])
    FROM Function1
    WHERE Date BETWEEN #date1 AND #date2
    GROUP BY
        ID,
        Date
) f1
FULL JOIN (
    SELECT
        ID,
        Date,
        Expr2 = COUNT([data sent])
    FROM Function2
    WHERE Date BETWEEN #date1 AND #date2
    GROUP BY
        ID,
        Date
) f2
    ON f1.ID = f2.ID AND f1.Date = f2.Date
This query also uses full (outer) join instead of left join, in case the right side of the join contains rows that have no match in the left side (and you want those rows).
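The idea of counting each source separately before combining can be tried out with a small sketch in Python's sqlite3. Note the assumptions: the table and column names (function1, function2, data_sent, sent_date) and the sample rows are hypothetical, the counts are grouped by id only, and because FULL JOIN is only available in newer SQLite versions, this sketch swaps in a UNION ALL of the two per-table counts followed by an outer GROUP BY. That is a different technique from the query above, but it gives the same kind of result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE function1 (id INTEGER, data_sent TEXT, sent_date TEXT);
    CREATE TABLE function2 (id INTEGER, data_sent TEXT, sent_date TEXT);
    INSERT INTO function1 VALUES
        (1, 'YES', '2020-01-01'), (1, 'NO', '2020-01-02'),
        (2, 'YES', '2020-01-01'), (1, 'YES', '2021-01-01');
    INSERT INTO function2 VALUES
        (1, 'yes', '2020-01-01'), (3, 'no', '2020-01-03');
""")

# Each branch filters and counts one table on its own, so no row is ever
# paired with rows from the other table. The outer query merges the branches.
combined = conn.execute("""
    SELECT id, SUM(expr1) AS expr1, SUM(expr2) AS expr2
    FROM (
        SELECT id, COUNT(data_sent) AS expr1, 0 AS expr2
        FROM function1
        WHERE sent_date BETWEEN :d1 AND :d2
        GROUP BY id
        UNION ALL
        SELECT id, 0 AS expr1, COUNT(data_sent) AS expr2
        FROM function2
        WHERE sent_date BETWEEN :d1 AND :d2
        GROUP BY id
    ) AS counts
    GROUP BY id
    ORDER BY id
""", {"d1": "2020-01-01", "d2": "2020-12-31"}).fetchall()

print(combined)  # [(1, 2, 1), (2, 1, 0), (3, 0, 1)]
```

Ids that exist on only one side (2 and 3 here) still appear, with a zero for the missing side, which is the behaviour the full join is there to provide.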

Related

Sum Column for Running Total where Overlapping Date

I have a table with about 3 million rows of Customer Sales by Date.
For each CustomerID row I need to get the sum of the Spend_Value
WHERE Order_Date BETWEEN Order_Date_m365 AND Order_Date
Order_Date_m365 = OrderDate minus 365 days.
I just tried a self join but of course, this gave the wrong results due to rows overlapping dates.
If there is a way with Window Functions this would be ideal but I tried and can't do the between dates in the function, unless I missed a way.
The only way I can think of now is to loop: process all rank 1 rows into a table, then rank 2 rows into a table, etc., but this will be really inefficient on 3 million rows.
Any ideas on how this is usually handled in SQL?
SELECT CustomerID,Order_Date_m365,Order_Date,Spend_Value
FROM dbo.CustomerSales
Window functions likely won't help you here, so you are going to need to reference the table again. I would suggest that you use an APPLY with a subquery to do this. Provided you have relevant indexes then this will likely be the more efficient approach:
SELECT CS.CustomerID,
       CS.Order_Date_m365,
       CS.Order_Date,
       CS.Spend_Value,
       o.output
FROM dbo.CustomerSales CS
     CROSS APPLY (SELECT SUM(Spend_Value) AS output
                  FROM dbo.CustomerSales ca
                  WHERE ca.CustomerID = CS.CustomerID
                    AND ca.Order_Date >= CS.Order_Date_m365 --Or should this be >?
                    AND ca.Order_Date <= CS.Order_Date) o;
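For anyone who wants to experiment without SQL Server, the same per-row lookup can be sketched in Python's sqlite3. SQLite has no CROSS APPLY, so this version uses a correlated scalar subquery instead, and it computes the 365-day lower bound with date() rather than reading an Order_Date_m365 column. The table name, index name and sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_sales (
        customer_id INTEGER, order_date TEXT, spend_value INTEGER);
    -- An index on (customer_id, order_date) lets each lookup seek a range
    CREATE INDEX ix_sales_customer_date
        ON customer_sales (customer_id, order_date);
    INSERT INTO customer_sales VALUES
        (1, '2020-01-01', 10), (1, '2020-06-01', 20),
        (1, '2021-03-01', 30), (2, '2020-02-01', 5);
""")

# For every row, sum the same customer's spend over the preceding 365 days,
# including the row's own order date.
rolling = conn.execute("""
    SELECT cs.customer_id, cs.order_date, cs.spend_value,
           (SELECT SUM(ca.spend_value)
            FROM customer_sales ca
            WHERE ca.customer_id = cs.customer_id
              AND ca.order_date >= date(cs.order_date, '-365 days')
              AND ca.order_date <= cs.order_date) AS rolling_spend
    FROM customer_sales cs
    ORDER BY cs.customer_id, cs.order_date
""").fetchall()

for row in rolling:
    print(row)
```

With this data the third row for customer 1 sums to 50 rather than 60, because the 2020-01-01 order has already dropped out of its 365-day window.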

SQL filtering activity after certain event

I am struggling with a SQL query.
My query looks something like this:
Select
Count(user-id),
sum(distinct(date),
Sum(characters-posted)
From (
Select
Date,
User-Id,
Session-Id,
Characters-posted,
Variant-id
From database-name
Where date between '2022-09-01' and '2022-09-30')
This works ok. But, there is another field in the table “mailing-list”, which is just 0 or 1. I want to only get activity for members from the date when they join the mailing list onwards, even if they then leave the list, so can’t just do “where mailing-list=1”.
How can I do this?
It's not obvious what works fine for you, since it is uncommon to sum dates, assuming the column is in a regular date format. Are you trying to get the number of active dates?
As for your bottom-line question, it seems that you might want to use a CTE or a subselect in a join.
your query...
from database_name dbn
inner join (select user_id, min(date) as date
            from database_name
            where mailing_list = 1
            group by user_id) start_date
    on start_date.user_id = dbn.user_id
    and start_date.date <= dbn.date
That way you're only getting activity starting from the first time your users join the mailing list.
But I still think you have an error in your final query, check it out.
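Here is a small runnable sketch of that join, using Python's sqlite3. The activity table, its column names and the rows are invented for the example, and it counts distinct active dates instead of summing them, on the assumption that this is what was meant.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE activity (
        user_id INTEGER, activity_date TEXT,
        characters_posted INTEGER, mailing_list INTEGER);
    INSERT INTO activity VALUES
        (1, '2022-09-01', 10, 0),  -- before user 1 joined the list
        (1, '2022-09-05', 20, 1),  -- user 1 joins
        (1, '2022-09-10', 30, 0),  -- user 1 has left, still counted
        (2, '2022-09-02', 40, 0),  -- user 2 never joined
        (3, '2022-09-03', 50, 1);
""")

# The subquery finds each user's first day on the mailing list; the join
# keeps only activity on or after that day, whatever the flag says later.
since_joining = conn.execute("""
    SELECT a.user_id,
           COUNT(DISTINCT a.activity_date) AS active_days,
           SUM(a.characters_posted) AS characters
    FROM activity a
    INNER JOIN (SELECT user_id, MIN(activity_date) AS first_joined
                FROM activity
                WHERE mailing_list = 1
                GROUP BY user_id) s
        ON s.user_id = a.user_id
       AND s.first_joined <= a.activity_date
    WHERE a.activity_date BETWEEN '2022-09-01' AND '2022-09-30'
    GROUP BY a.user_id
    ORDER BY a.user_id
""").fetchall()

print(since_joining)  # [(1, 2, 50), (3, 1, 50)]
```

User 2 disappears entirely because the inner join finds no join date for them, and user 1's activity from 2022-09-01 is excluded while the activity after leaving the list is kept.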

Executing a Aggregate function within a case without Group by

I am trying to assign a specific code to a client based on the number of gifts that they have given in the past 6 months using a CASE. I am unable to use WITH (screenshot) due to the limitations of the software that I am creating the query in. It only allows for select functions. I am unsure how to get a distinct count from another table (transaction data) and use that as parameters in the CASE I have currently built (based on my client information table). Does anyone know of any workarounds for this? I am unable to GROUP BY clientID at the end of my query because not all of my columns are aggregate, and I only need to GROUP BY clientID for this particular WHEN statement in the CASE. I have looked into the OVER() clause, but I am needing my date range that I am evaluating to be dynamic (counting transactions over the last six months), and the amount of rows that I would be including is variable, as the transaction count month to month varies. Also, the software that I am building this in does not recognize the PARTITIONED BY parameter of the over clause.
Any help would be great!
EDIT:
it is not letting me attach an image... -____- I have added the two sections of code that I am looking for assistance with!
WITH "6MonthGIftCount" (
"ConstituentID"
,"GiftCount"
)
AS (
SELECT COUNT(DISTINCT "GiftView"."GiftID" FROM "GiftView" WHERE MONTHS_BETWEEN("GiftView"."GiftDate", getdate()) <= 6 GROUP BY "GiftView"."ConstituentID")
SELECT...CASE
WHEN "6MonthGiftCount"."GiftCount" >= 4
THEN 'A010'
)
Perform your grouping/COUNT(1) in a subquery to obtain the total # of donations by ConstituentID, then JOIN this total into your main query that uses this new column to perform its CASE statement.
select
    hist.*,
    case when timesDonated > 5 then 'gracious donor'
         when timesDonated > 3 then 'repeated donor'
         when timesDonated >= 1 then 'donor'
         else null end as donorCode
from gifthistory hist
left join ( /* your grouping subquery here, pretending to be a new table */
    select
        personID,
        count(1) as timesDonated
    from gifthistory i
    WHERE abs(months_between(giftDate, sysdate)) <= 6
    group by personid ) grp on hist.personid = grp.personID
order by 1;
*Naturally, syntax changes will vary by DB; you didn't specify which it was based on, but you should be able to use this template with whichever you utilize. This works in both Oracle and SQL Server after tweaking the month calculation appropriately.
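As a rough, runnable illustration of the same template, here is a sketch in Python's sqlite3. The clients and gifts tables, the fixed "as of" date, and the codes 'A020' and 'A000' are assumptions made up for the example; only 'A010' for four or more gifts comes from the question. SQLite has no months_between, so the six-month window is expressed with date(..., '-6 months').

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clients (client_id INTEGER, name TEXT);
    CREATE TABLE gifts (gift_id INTEGER, client_id INTEGER, gift_date TEXT);
    INSERT INTO clients VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO gifts VALUES
        (10, 1, '2022-01-01'),  -- too old to count
        (11, 1, '2023-02-01'), (12, 1, '2023-03-01'),
        (13, 1, '2023-04-01'), (14, 1, '2023-05-01'),
        (15, 2, '2023-03-01');
""")

# The grouped subquery yields one row per client, so the main query needs
# no GROUP BY of its own; the CASE just reads the joined gift_count column.
coded = conn.execute("""
    SELECT c.client_id, c.name,
           CASE WHEN g.gift_count >= 4 THEN 'A010'
                WHEN g.gift_count >= 1 THEN 'A020'
                ELSE 'A000' END AS code
    FROM clients c
    LEFT JOIN (SELECT client_id, COUNT(DISTINCT gift_id) AS gift_count
               FROM gifts
               WHERE gift_date >= date(:as_of, '-6 months')
                 AND gift_date <= :as_of
               GROUP BY client_id) g
        ON g.client_id = c.client_id
    ORDER BY c.client_id
""", {"as_of": "2023-07-01"}).fetchall()

print(coded)  # [(1, 'Ann', 'A010'), (2, 'Bob', 'A020'), (3, 'Cy', 'A000')]
```

A client with no gifts in the window gets a NULL gift_count from the left join, so every WHEN comparison fails and the ELSE branch applies.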

How can I get the first instance of an event per day with multiple columns including a datetime and return those columns plus the full datetime value?

I need to generate a SQL script that will pull out Distinct entries using a number of columns, one of which is a datetime column. I am only interested in the first occurrence of the day per event and the query needs to span multiple days. The query will be run against a very large database and can potentially be returning hundreds of thousands of results if not millions. Therefore I need this script to be as efficient as possible as well. This will eventually be a script running in SSRS to pull access transactions.
I've tried using GROUP BY, DISTINCT, subqueries, FIRST, and such without success. All the examples I can find online don't have JOIN statements or calculated columns such as only gathering the date from a datetime field.
I've simplified the below script some to only pull one day and one door, but the prod will be multiple days and doors. This code returns the data I need, I don't care about the COUNT, but I also need to get the (DateAdd(minute,-(ServerLocaleOffset),ServerUTC)) field in my result set as well somehow. The problem is since it goes down to the second it makes all records DISTINCT.
DECLARE @Begin datetime2 = '4/10/2019',
        @End datetime2 = '4/11/2019',
        @Door varchar(max) = 'Front Entrance'
SELECT
    CONVERT(VARCHAR(10), (DateAdd(minute,-(ServerLocaleOffset),ServerUTC)),101) AS 'Date'
    ,AJ.PrimaryObjectIdentity
    ,AJ.SecondaryObjectIdentity
    ,AJ.MessageType
    ,AJ.PrimaryObjectName
    ,AJ.SecondaryObjectName
    ,AP.Text13
    ,COUNT(*) AS 'Count'
FROM Access.JournalLogView AJ
LEFT OUTER JOIN Access.Personnel as AP on AP.GUID = AJ.PrimaryObjectIdentity
WHERE (MessageType like 'CardAdmitted' OR MessageType like 'CardRejected')
    AND (DateAdd(minute,-(ServerLocaleOffset),ServerUTC)) BETWEEN @Begin AND @End
    AND (SecondaryObjectName IN (@Door))
GROUP BY CONVERT(VARCHAR(10), (DateAdd(minute,-(ServerLocaleOffset),ServerUTC)),101)
    ,PrimaryObjectIdentity
    ,SecondaryObjectIdentity
    ,MessageType
    ,PrimaryObjectName
    ,SecondaryObjectName
    ,Text13
ORDER BY AJ.PrimaryObjectName
I want to get the columns called out in the SELECT statement plus the datetime which includes the second. Again I also want the most efficient way of pulling this data as well. Thank you very much.
Assuming PrimaryObjectIdentity is the key that identifies the person in JournalLogView and ServerLocaleOffset is the datetime column in that table, I have written this:
DECLARE @Begin datetime2 = '4/10/2019',
        @End datetime2 = '4/11/2019',
        @Door varchar(max) = 'Front Entrance';
WITH cte
AS(
    SELECT
        ROW_NUMBER() OVER
            (PARTITION BY PrimaryObjectIdentity,CONVERT(VARCHAR(10), (DateAdd(minute,-(ServerLocaleOffset),ServerUTC)),101) ORDER BY ServerLocaleOffset) AS row_num,
        --whatever the columns you want here
        *
    FROM
        Access.JournalLogView)
SELECT
    DateAdd(minute,-(ServerLocaleOffset),ServerUTC) AS 'DateTime'
    ,AJ.PrimaryObjectIdentity
    ,AJ.SecondaryObjectIdentity
    ,AJ.MessageType
    ,AJ.PrimaryObjectName
    ,AJ.SecondaryObjectName
    ,AP.Text13
    --I guess count(*) won't be of use as we are selecting only the first row
    ,COUNT(*) AS 'Count'
FROM cte AJ
LEFT OUTER JOIN
    Access.Personnel as AP
    on
    AP.GUID = AJ.PrimaryObjectIdentity
WHERE
    AJ.row_num = 1
    AND (MessageType like 'CardAdmitted' OR MessageType like 'CardRejected')
    AND (DateAdd(minute,-(ServerLocaleOffset),ServerUTC)) BETWEEN @Begin AND @End
    AND (SecondaryObjectName IN (@Door))
GROUP BY (DateAdd(minute,-(ServerLocaleOffset),ServerUTC))
    ,PrimaryObjectIdentity
    ,SecondaryObjectIdentity
    ,MessageType
    ,PrimaryObjectName
    ,SecondaryObjectName
    ,Text13
ORDER BY AJ.PrimaryObjectName
In this query, I have used PARTITION BY to partition the whole table by user and date, and then assigned ROW_NUMBER() to each row, starting from the first entry of each user on that particular date. So any row with row_num = 1 is the first entry of that user on that date (which is the same condition I have used in the WHERE clause). Hope this helps :)
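The ROW_NUMBER() idea can be tried on a toy table with Python's sqlite3 (window functions need SQLite 3.25 or later). The journal table, its columns and the rows are hypothetical, and the timestamps are assumed to be stored as already-localised text. One deliberate difference in this sketch: the door and date filters sit inside the CTE, so the rows are numbered only among matching events.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE journal (person TEXT, door TEXT, event_time TEXT);
    INSERT INTO journal VALUES
        ('alice', 'Front Entrance', '2019-04-10 08:01:05'),
        ('alice', 'Front Entrance', '2019-04-10 12:30:00'),
        ('alice', 'Front Entrance', '2019-04-11 07:59:59'),
        ('bob',   'Front Entrance', '2019-04-10 09:15:00'),
        ('bob',   'Front Entrance', '2019-04-10 09:00:00'),
        ('bob',   'Back Door',      '2019-04-10 06:00:00');
""")

# Number each person's events within a calendar day, earliest first, then
# keep row 1. The full timestamp survives because nothing is grouped away.
first_events = conn.execute("""
    WITH ranked AS (
        SELECT person, door, event_time,
               ROW_NUMBER() OVER (
                   PARTITION BY person, date(event_time)
                   ORDER BY event_time) AS row_num
        FROM journal
        WHERE door = :door
          AND event_time >= :begin
          AND event_time < :end
    )
    SELECT person, door, event_time
    FROM ranked
    WHERE row_num = 1
    ORDER BY person, event_time
""", {"door": "Front Entrance",
      "begin": "2019-04-10", "end": "2019-04-12"}).fetchall()

for row in first_events:
    print(row)
```

If the filters were applied after numbering instead, a person whose earliest event of the day was at another door would have their row 1 filtered out and would vanish from the result for that day.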

How to have GROUP BY and COUNT include zero sums?

I have SQL like this (where $ytoday is 5 days ago):
$sql = 'SELECT Count(*), created_at FROM People WHERE created_at >= "'. $ytoday .'" GROUP BY DATE(created_at)';
I want this to return a value for every day, so it would return 5 results in this case (5 days ago until today).
But say Count(*) is 0 for yesterday, instead of returning a zero it doesn't return any data at all for that date.
How can I change that SQLite query so it also returns data that has a count of 0?
Without convoluted (in my opinion) queries, your output data-set won't include dates that don't exist in your input data-set. This means that you need a data-set with the 5 days to join on to.
The simple version would be to create a table with the 5 dates, and join on that. I typically create and keep (effectively caching) a calendar table with every date I could ever need. (Such as from 1900-01-01 to 2099-12-31.)
SELECT
    Calendar.calendar_date,
    COUNT(People.created_at)
FROM
    Calendar
LEFT JOIN
    People
        ON Calendar.calendar_date = DATE(People.created_at)
WHERE
    Calendar.calendar_date >= '2012-05-01'
GROUP BY
    Calendar.calendar_date
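Since the question is about SQLite, here is a minimal sketch of the left join against a list of dates, run through Python's sqlite3. Instead of a permanent calendar table it builds the five days on the fly with a recursive CTE, which is an assumption of this example rather than part of the query above; the people rows and the date range are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (created_at TEXT);
    INSERT INTO people VALUES
        ('2012-05-01 10:00:00'), ('2012-05-01 11:00:00'),
        ('2012-05-03 09:00:00'), ('2012-05-05 23:59:59');
""")

# The recursive CTE yields one row per day in the range. COUNT(p.created_at)
# skips the NULLs produced by the left join, so days without rows count as 0.
daily_counts = conn.execute("""
    WITH RECURSIVE calendar(calendar_date) AS (
        SELECT :first_day
        UNION ALL
        SELECT date(calendar_date, '+1 day')
        FROM calendar
        WHERE calendar_date < :last_day
    )
    SELECT c.calendar_date, COUNT(p.created_at) AS people_created
    FROM calendar c
    LEFT JOIN people p ON date(p.created_at) = c.calendar_date
    GROUP BY c.calendar_date
    ORDER BY c.calendar_date
""", {"first_day": "2012-05-01", "last_day": "2012-05-05"}).fetchall()

for row in daily_counts:
    print(row)
```

Counting a column from the right-hand table matters here: COUNT(*) would report 1 for the empty days, because the left join still emits one row for each calendar date.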
You'll need to left join against a list of dates. You can either create a table with the dates you need in it, or you can take the dynamic approach I outlined here:
generate days from date range