SQL multiple joins and sums on same table - sql

I'm learning SQL on the fly as I work on a project and would appreciate some help with the following. I'm also fairly new to stackoverflow so I apologize if my formatting is off:
I have a table with columns Date, Group, Person, Amount. For every day I have an entry for each person with an amount and the group they're in, so one row would look like:
Date Group Person Amount
8/7/2012 A Steve 10
I'm trying to write a statement that will return the sum of all groups for two different days.
I have:
Select t1.group,sum(t1.amount),sum(t2.amount)
From table t1, table t2
Where t1.group=t2.group AND t1.date=current_date-1 AND t2.date=current_date-2
Group by t1.group
I'm not getting any errors but the two sums are different from what I get if I just do
Select date,sum(amount) From table Group by date
and look at the days in question.

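Why are you joining the table to itself? The join pairs every row from one day with every row from the other day in the same group, so each amount is added once per matching row on the other side and both sums come out inflated.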
I think you want:
Select t.group,
sum(case when t.date = current_date - 1 then t.amount end),
sum(case when t.date = current_date - 2 then t.amount end)
From table t
Group by t.group
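To see the difference between the two queries, here is a minimal sketch using Python's built-in sqlite3 module. The table name entries, the column name grp (GROUP is a reserved word) and the sample rows are assumptions made up for illustration; the dates are passed in as parameters instead of using current_date.

```python
import sqlite3

# Hypothetical sample data: two people in group A on two consecutive days.
ROWS = [
    ("2012-08-06", "A", "Steve", 10),
    ("2012-08-06", "A", "Ann", 20),
    ("2012-08-05", "A", "Steve", 1),
    ("2012-08-05", "A", "Ann", 2),
]


def load(rows):
    """Create an in-memory table and fill it with the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entries (date TEXT, grp TEXT, person TEXT, amount INTEGER)"
    )
    conn.executemany("INSERT INTO entries VALUES (?, ?, ?, ?)", rows)
    return conn


def self_join_sums(conn, day1, day2):
    """The self-join from the question: rows are multiplied before summing."""
    query = """
        SELECT t1.grp, SUM(t1.amount), SUM(t2.amount)
        FROM entries t1, entries t2
        WHERE t1.grp = t2.grp AND t1.date = ? AND t2.date = ?
        GROUP BY t1.grp
    """
    return conn.execute(query, (day1, day2)).fetchall()


def conditional_sums(conn, day1, day2):
    """Conditional aggregation: each row is read once and summed at most once."""
    query = """
        SELECT grp,
               SUM(CASE WHEN date = ? THEN amount END),
               SUM(CASE WHEN date = ? THEN amount END)
        FROM entries
        GROUP BY grp
    """
    return conn.execute(query, (day1, day2)).fetchall()
```

With two rows per day in the group, the self-join produces four row pairs, so each day's total is doubled, while the CASE version returns the per-day totals.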

Related

PL-SQL query to calculate customers per period from start and stop dates

I have a PL-SQL table with a structure as shown in the example below:
I have customers (customer_number) with insurance cover start and stop dates (cover_start_date and cover_stop_date). I also have dates of accidents for those customers (accident_date). These customers may have more than one row in the table if they have had more than one accident. They may also have no accidents. And they may also have a blank entry for the cover stop date if their cover is ongoing. Sorry I did not design the data format, but I am stuck with it.
I am looking to calculate the number of accidents (num_accidents) and number of customers (num_customers) in a given time period (period_start), and from that the number of accidents-per-customer (which will be easy once I've got those two pieces of information).
Any ideas on how to design a PL-SQL function to do this in a simple way? Ideally with the time periods not being fixed to monthly (for example, weekly or fortnightly too)? Ideally I will end up with a table like this shown below:
Many thanks for any pointers...
You seem to need a list of dates. You can generate one in the query and then use correlated subqueries to calculate the columns you want:
select d.*,
(select count(distinct customer_number)
from t
where t.cover_start_date <= d.dte and
(t.cover_stop_date > d.dte + interval '1' month or t.cover_stop_date is null)
) as num_customers,
(select count(*)
from t
where t.accident_date >= d.dte and
t.accident_date < d.dte + interval '1' month
) as accidents,
(select count(distinct customer_number)
from t
where t.accident_date >= d.dte and
t.accident_date < d.dte + interval '1' month
) as num_customers_with_accident
from (select date '2020-01-01' as dte from dual union all
select date '2020-02-01' as dte from dual union all
. . .
) d;
If you want to do arithmetic on the columns, you can use this as a subquery or CTE.
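The same correlated-subquery idea can be tried out in SQLite through Python. This is only a sketch under assumptions: the table names cover and periods are invented, dates are stored as ISO strings, and each period carries an explicit exclusive period_end so that weekly or fortnightly periods work the same way as monthly ones.

```python
import sqlite3


def period_stats(cover_rows, periods):
    """Return (period_start, num_customers, num_accidents) per period.

    cover_rows: (customer_number, cover_start_date, cover_stop_date, accident_date)
    periods:    (period_start, period_end), period_end exclusive, ISO date strings
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cover (customer_number INTEGER, cover_start_date TEXT, "
        "cover_stop_date TEXT, accident_date TEXT)"
    )
    conn.execute("CREATE TABLE periods (period_start TEXT, period_end TEXT)")
    conn.executemany("INSERT INTO cover VALUES (?, ?, ?, ?)", cover_rows)
    conn.executemany("INSERT INTO periods VALUES (?, ?)", periods)
    query = """
        SELECT p.period_start,
               -- customers covered from the start of the period to its end
               (SELECT COUNT(DISTINCT c.customer_number)
                FROM cover c
                WHERE c.cover_start_date <= p.period_start
                  AND (c.cover_stop_date > p.period_end
                       OR c.cover_stop_date IS NULL)),
               -- accidents that fall inside the period
               (SELECT COUNT(*)
                FROM cover c
                WHERE c.accident_date >= p.period_start
                  AND c.accident_date < p.period_end)
        FROM periods p
        ORDER BY p.period_start
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows
```

Dividing the third column by the second in the calling code (or in an outer query) gives accidents per customer.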

Need to count unique transactions by month but ignore records that occur 3 days after 1st entry for that ID

I have a table with just two columns: User_ID and fail_date. Each time somebody's card is rejected they are logged in the table, their card is automatically tried again 3 days later, and if they fail again, another entry is added to the table. I am trying to write a query that counts unique failures by month so I only want to count the first entry, not the 3 day retries, if they exist. My data set looks like this
user_id fail_date
222 01/01
222 01/04
555 02/15
777 03/31
777 04/02
222 10/11
so my desired output would be something like this:
month unique_fails
jan 1
feb 1
march 1
april 0
oct 1
I'll be running this in Vertica, but I'm not so much looking for perfect syntax in replies. Just help around how to approach this problem as I can't really think of a way to make it work. Thanks!
You could use lag() to get the previous timestamp per user. If the current and the previous timestamp are less than or exactly three days apart, it's a follow up. Mark the row as such. Then you can filter to exclude the follow ups.
It might look something like:
SELECT month,
count(*) unique_fails
FROM (SELECT month(fail_date) month,
CASE
WHEN datediff(day,
lag(fail_date) OVER (PARTITION BY user_id
ORDER BY fail_date),
fail_date) <= 3 THEN
1
ELSE
0
END follow_up
FROM elbat) x
WHERE follow_up = 0
GROUP BY month;
I'm not sure about the exact syntax in Vertica, so it might need some adaptation. I also don't know whether fail_date is actually a date/time type or just a string. If it is a string, the date/time functions may not work on it, so either replace them or convert the string before passing it to them.
If the data spans several years, you might also want to include the year alongside the month to keep months from different years apart. In the inner SELECT, add a column year(fail_date) year, then add year to the column list and the GROUP BY of the outer SELECT.
You can add a flag about whether this is a "unique_fail" by doing:
select t.*,
(case when lag(fail_date) over (partition by user_id order by fail_date) >= fail_date - 3
then 0 else 1
end) as first_failure_flag
from t;
Then, you want to count this flag by month:
select to_char(fail_date, 'Mon'), -- should always include the year
sum(first_failure_flag)
from (select t.*,
(case when lag(fail_date) over (partition by user_id order by fail_date) >= fail_date - 3
then 0 else 1
end) as first_failure_flag
from t
) t
group by to_char(fail_date, 'Mon')
order by min(fail_date)
In a derived table, find the previous fail_date (prev_fail_date) for each user_id and fail_date with a correlated subquery.
Using the derived table dt, count a failure if it has no previous failure, or if the gap between the current fail_date and prev_fail_date is more than 3 days.
DATEDIFF() together with IF() identifies the rows that are not retries. The IS NULL check is needed because DATEDIFF() returns NULL for a user's first failure, which would otherwise not be counted.
To group the result by month, use the MONTH() function.
Because the data may span multiple years, also group by YEAR() so that the same month in different years stays separate.
Try the following (in MySQL); the same idea carries over to other RDBMSs:
SELECT YEAR(dt.fail_date) AS year_fail_date,
MONTH(dt.fail_date) AS month_fail_date,
COUNT( IF(dt.prev_fail_date IS NULL OR DATEDIFF(dt.fail_date, dt.prev_fail_date) > 3, dt.user_id, NULL) ) AS unique_fails
FROM (
SELECT
t1.user_id,
t1.fail_date,
(
SELECT t2.fail_date
FROM your_table AS t2
WHERE t2.user_id = t1.user_id
AND t2.fail_date < t1.fail_date
ORDER BY t2.fail_date DESC
LIMIT 1
) AS prev_fail_date
FROM your_table AS t1
) AS dt
GROUP BY
year_fail_date,
month_fail_date
ORDER BY
year_fail_date ASC,
month_fail_date ASC
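As a rough check of the correlated-subquery approach against the sample data from the question, here is a sketch in SQLite via Python. The table name fails and the year 2018 are assumptions (the question gives only month and day), and julianday() stands in for DATEDIFF(), which SQLite does not have.

```python
import sqlite3

# Sample rows from the question, with a hypothetical year added.
FAILS = [
    (222, "2018-01-01"),
    (222, "2018-01-04"),
    (555, "2018-02-15"),
    (777, "2018-03-31"),
    (777, "2018-04-02"),
    (222, "2018-10-11"),
]


def unique_fails_by_month(rows):
    """Count failures per month, skipping retries within 3 days of the previous one."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fails (user_id INTEGER, fail_date TEXT)")
    conn.executemany("INSERT INTO fails VALUES (?, ?)", rows)
    query = """
        SELECT strftime('%Y-%m', dt.fail_date) AS ym,
               SUM(CASE
                     WHEN dt.prev_fail_date IS NULL
                       OR julianday(dt.fail_date) - julianday(dt.prev_fail_date) > 3
                     THEN 1 ELSE 0
                   END) AS unique_fails
        FROM (SELECT t1.user_id,
                     t1.fail_date,
                     -- latest earlier failure of the same user, NULL if none
                     (SELECT MAX(t2.fail_date)
                      FROM fails t2
                      WHERE t2.user_id = t1.user_id
                        AND t2.fail_date < t1.fail_date) AS prev_fail_date
              FROM fails t1) dt
        GROUP BY ym
        ORDER BY ym
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result
```

Using SUM over a 0/1 flag rather than filtering retries out in a WHERE clause keeps months that contain only retries (April in the sample) in the output with a count of 0.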

Calculating business days in Teradata

I need help in business days calculation.
I've two tables
1) One table ACTUAL_TABLE containing order date and contact date with timestamp datatypes.
2) The second table BUSINESS_DATES has each of the calendar dates listed and has a flag to indicate weekend days.
Using these two tables, I need to make sure that business days, not calendar days (which is the current logic), are counted between these two fields.
My thought process was to first get a range of dates by comparing ORDER_DATE with TABLE_DATE field and then do a similar comparison of CONTACT_DATE to TABLE_DATE field. This would get me a range from the BUSINESS_DATES table which I can then use to calculate count of days, sum(Holiday_WKND_Flag) fields making the result look like:
Order# | Count(*) As DAYS | SUM(WEEKEND DATES)
100 | 25 | 8
However, this only works when I use a specific order number; I can't bring in all order numbers in a subquery.
My Query:
SELECT SUM(Holiday_WKND_Flag), COUNT(*) FROM
(
SELECT
* FROM
BUSINESS_DATES
WHERE BUSINESS.Business BETWEEN (SELECT ORDER_DATE FROM ACTUAL_TABLE
WHERE ORDER# = '100'
)
AND
(SELECT CONTACT_DATE FROM ACTUAL_TABLE
WHERE ORDER# = '100'
)
TEMP
Uploading the table structure for your reference.
SELECT ORDER#, SUM(Holiday_WKND_Flag), COUNT(*)
FROM business_dates bd
INNER JOIN actual_table at ON bd.table_date BETWEEN at.order_date AND at.contact_date
GROUP BY ORDER#
Instead of joining on a BETWEEN (which always results in a bad Product Join) followed by a COUNT, you should assign a business day number to each date (ideally this is calculated only once and added as a column to your calendar table). Then it's two Equi-Joins and no aggregation is needed:
WITH cte AS
(
SELECT
Cast(table_date AS DATE) AS table_date,
-- assign a consecutive number to each business day, i.e. not increased during weekends, etc.
Sum(CASE WHEN Holiday_WKND_Flag = 1 THEN 0 ELSE 1 end)
Over (ORDER BY table_date
ROWS Unbounded Preceding) AS business_day_nbr
FROM business_dates
)
SELECT ORDER#,
Cast(t.contact_date AS DATE) - Cast(t.order_date AS DATE) AS #_of_days,
b2.business_day_nbr - b1.business_day_nbr AS #_of_business_days
FROM actual_table AS t
JOIN cte AS b1
ON Cast(t.order_date AS DATE) = b1.table_date
JOIN cte AS b2
ON Cast(t.contact_date AS DATE) = b2.table_date
Btw, why are table_date and order_date timestamp instead of a date?
Porting from Oracle?
You can use this query. Hope it helps
select order#,
order_date,
contact_date,
(select count(1)
from business_dates
where table_date between a.order_date and a.contact_date
and holiday_wknd_flag = 0
) business_days
from actual_table a
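To check the counting logic for all orders at once, here is a small sketch in SQLite via Python that joins each order to the calendar rows between its two dates. It illustrates the logic only and says nothing about Teradata performance. The column name order_no (instead of ORDER#), the generated January 2020 calendar and the weekend-only flag are assumptions.

```python
import sqlite3
from datetime import date, timedelta


def business_days_per_order(orders):
    """Return (order_no, calendar_days, weekend_days, business_days) per order.

    orders: (order_no, order_timestamp, contact_timestamp) as ISO strings.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE business_dates (table_date TEXT, holiday_wknd_flag INTEGER)"
    )
    conn.execute(
        "CREATE TABLE actual_table (order_no INTEGER, order_date TEXT, contact_date TEXT)"
    )
    # Hypothetical calendar: January 2020, flag = 1 on Saturdays and Sundays.
    start = date(2020, 1, 1)
    calendar = []
    for i in range(31):
        d = start + timedelta(days=i)
        calendar.append((d.isoformat(), 1 if d.weekday() >= 5 else 0))
    conn.executemany("INSERT INTO business_dates VALUES (?, ?)", calendar)
    conn.executemany("INSERT INTO actual_table VALUES (?, ?, ?)", orders)
    query = """
        SELECT a.order_no,
               COUNT(*),
               SUM(b.holiday_wknd_flag),
               SUM(1 - b.holiday_wknd_flag)
        FROM actual_table a
        JOIN business_dates b
          ON b.table_date BETWEEN date(a.order_date) AND date(a.contact_date)
        GROUP BY a.order_no
        ORDER BY a.order_no
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result
```

The date() calls strip the time part of the timestamps so that the first and last day of the range are both included.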

Creating a TSQL query to make a report

I am in the process of creating a report from the data I have stored in my database; just a little stuck on the next piece of it.
Here is an SQLFiddle of my structure
The report is run every Friday. It gets all records from the table that are within the last 7 days (since it was last reported).
The piece I need to add to my query is only get me records where the SUM of awardValue exceeds $75 in the current year.
I have it pulling my records for the time frame (since last report) but need to include that other piece.
How can I accomplish this?
Assuming that you don't care about the Award status when you calculate sum -
Select Main.*
from main
INNER JOIN(
SELECt EMPLOYEE,SUM(AWARDVALUE) SUM
FROM MAIN
WHERE YEAR(AWARDDATE) = YEAR(GetDate())
GROUP BY EMPLOYEE
HAVING SUM(AWARDVALUE)>75) EMPLIMIT
ON Main.EMPLOYEE = EMPLIMIT.EMPLOYEE
Where awardStatus = '1' AND awardDate BETWEEN GetDate() - 7 AND GetDate()
Modified query to Pull in SUM with results and TaxIt Column
Select Main.*,EmployeeSum,
CASE WHEN EmployeeSum>75 THEN 'Yes' ELSE 'No' END AS TaxIt
from main
INNER JOIN(
SELECt EMPLOYEE,SUM(AWARDVALUE) EmployeeSum
FROM MAIN
WHERE YEAR(AWARDDATE) = YEAR(GetDate())
GROUP BY EMPLOYEE) EMPLIMIT
ON Main.EMPLOYEE = EMPLIMIT.EMPLOYEE
Where awardStatus = '1' AND awardDate BETWEEN GetDate() - 7 AND GetDate()
For this, add a HAVING clause after the GROUP BY; it filters the groups on the aggregated value:
SELECT A, B, C, SUM(D)
FROM TABLE
GROUP BY A, B, C
HAVING SUM(D) > 75
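The combination of the two conditions (awarded in the last 7 days, and the employee's total for the year above 75) can be sketched in SQLite via Python. The snake_case column names, the sample rows and the explicit today parameter (used instead of GetDate()) are assumptions for illustration, not the structure from the SQLFiddle.

```python
import sqlite3


def weekly_report(rows, today):
    """Awards from the last 7 days for employees whose yearly total exceeds 75.

    rows:  (employee, award_value, award_date, award_status), ISO date strings
    today: ISO date string standing in for the report date
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE main (employee TEXT, award_value INTEGER, "
        "award_date TEXT, award_status TEXT)"
    )
    conn.executemany("INSERT INTO main VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT m.employee, m.award_value, m.award_date
        FROM main m
        JOIN (SELECT employee
              FROM main
              WHERE strftime('%Y', award_date) = strftime('%Y', :today)
              GROUP BY employee
              HAVING SUM(award_value) > 75) over_limit
          ON over_limit.employee = m.employee
        WHERE m.award_status = '1'
          AND m.award_date BETWEEN date(:today, '-7 days') AND :today
        ORDER BY m.employee, m.award_date
    """
    result = conn.execute(query, {"today": today}).fetchall()
    conn.close()
    return result
```

The HAVING clause lives in the derived table so that the yearly total is computed over all of the employee's awards, not only those from the last 7 days.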

Efficient way to query separate days of data?

I want to query statistics using SQL from 3 different days (in a row). The display would be something like:
15 users created today, 10 yesterday, 12 two days ago
The SQL would be something like (for today):
SELECT Count(*) FROM Users WHERE created_date >= '2012-05-11'
And then I would do 2 more queries for yesterday and the day before.
So in total I'm doing 3 queries against the entire database. The format for created_date is 2012-05-11 05:24:11 (date & time).
Is there a more efficient SQL way to do this, say in one query?
For specifics, I'm using PHP and SQLite (so the PDO extension).
The result should be 3 different numbers (one for each day).
Any chance someone could show some performance numbers in comparison?
You can use GROUP BY:
SELECT Count(*), created_date FROM Users GROUP BY created_date
That will give you a list of dates with the number of records found on that date. You can add criteria for created_date using a normal WHERE clause.
Edit: based on your edit:
SELECT Count(*), created_date FROM Users WHERE created_date>='2012-05-09' GROUP BY date(created_date)
The best solution is to use GROUP BY DAY(created_date). Here is your query:
SELECT DATE(created_date), count(*)
FROM users
WHERE created_date > CURRENT_DATE - INTERVAL 3 DAY
GROUP BY DAY(created_date)
This would work I believe though I have no way to test it:
SELECT
(SELECT Count(*) FROM Users WHERE created_date >= '2012-05-11') as today,
(SELECT Count(*) FROM Users WHERE created_date >= '2012-05-10') as yesterday,
(SELECT Count(*) FROM Users WHERE created_date >= '2012-05-09') as day_before
;
Use GROUP BY like jeroen suggested, but if you're planning for other periods you can also set ranges like this:
SELECT SUM(IF(created_date BETWEEN '2012-05-01' AND NOW(), 1, 0)) AS `this_month`,
SUM(IF(created_date = '2012-05-09', 1, 0)) AS `2_days_ago`
FROM ...
As noted below, SQLite doesn't have IF function but there is CASE instead. So this way it should work:
SELECT SUM(CASE WHEN created_date BETWEEN '2012-05-01' AND datetime('now') THEN 1 ELSE 0 END) AS `this_month`,
SUM(CASE date(created_date) WHEN '2012-05-09' THEN 1 ELSE 0 END) AS `2_days_ago`
FROM ...
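Putting the CASE idea together for the original three-day question, here is a sketch using Python's sqlite3 module (the question uses PHP with PDO, so the calling code is a stand-in, not the asker's setup). It returns one count per day from a single pass over the rows; the helper names and the demo data are made up for illustration.

```python
import sqlite3
from datetime import date, timedelta


def make_demo_db():
    """Build a small in-memory Users table with hypothetical timestamps."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Users (id INTEGER PRIMARY KEY, created_date TEXT)")
    stamps = [
        "2012-05-11 05:24:11", "2012-05-11 18:00:00",
        "2012-05-10 12:00:00",
        "2012-05-09 01:00:00", "2012-05-09 02:00:00", "2012-05-09 23:59:59",
        "2012-05-08 10:00:00",
    ]
    conn.executemany(
        "INSERT INTO Users (created_date) VALUES (?)", [(s,) for s in stamps]
    )
    return conn


def counts_for_three_days(conn, today):
    """Return (today, yesterday, two_days_ago) user counts in one query."""
    d0 = date.fromisoformat(today)
    days = [(d0 - timedelta(days=i)).isoformat() for i in range(3)]
    query = """
        SELECT COUNT(CASE WHEN date(created_date) = ? THEN 1 END),
               COUNT(CASE WHEN date(created_date) = ? THEN 1 END),
               COUNT(CASE WHEN date(created_date) = ? THEN 1 END)
        FROM Users
        WHERE created_date >= ?
    """
    # The WHERE clause limits the scan to rows from the oldest of the three days on.
    return conn.execute(query, (*days, days[2])).fetchone()
```

COUNT ignores the NULL that CASE yields for non-matching rows, so each column counts only its own day and comes back as 0 rather than NULL when a day has no rows. Whether this is faster than three separate queries depends on the data and on an index on created_date, so it is worth measuring.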