I have a table with the following data columns
date item
Every row is a transaction record of what item was bought on which date. How do I write a SQL to find the item that has gained most sales as a percentage over two time ranges. For example, to get at sales by items on a daily basis for a week, I can do
select
date,
item,
count(*)
from table
date >= '2012-01-01'
date <= '2012-01-07'
group by 1,2 order by 1
If I change the date range for the next week, I can get the items that sold on that week.
But I want to find a SQL solution to get this done in one pull without having to manipulate the data in Excel.
I always get this backwards the first time, but something like the below might get you started.
SELECT d1.item, COUNT(d1.item) / COUNT(d2.item) AS slope
FROM data AS d1
JOIN data AS d2 ON d2.item = d1.item
WHERE d1.date > '2012-01-01'
AND d2.date > '2012-01-07'
GROUP BY d1.item
ORDER BY COUNT(d1.item) / COUNT(d2.item) DESC
Have you tried this
select item
, 100 * count(*) / ( select count(*)
from table
where date between '2012-01-01' and '2012-01-07'
) as avrg
from table
where date between '2012-01-01' and '2012-01-07'
group by item
order by item
Related
For many years I've been collecting data and I'm interested in knowing the historic counts of IDs that appeared in the last 30 days. The source looks like this
id
dates
1
2002-01-01
2
2002-01-01
3
2002-01-01
...
...
3
2023-01-10
If I wanted to know the historic count of ids that appeared in the last 30 days I would do something like this
with total_counter as (
select id, count(id) counts
from source
group by id
),
unique_obs as (
select id
from source
where dates >= DATEADD(Day ,-30, current_date)
group by id
)
select count(distinct(id))
from unique_obs
left join total_counter
on total_counter.id = unique_obs.id;
The problem is that this results would return a single result for today's count as provided by current_date.
I would like to see a table with such counts as if for example I had ran this analysis yesterday, and the day before and so on. So the expected result would be something like
counts
date
1235
2023-01-10
1234
2023-01-09
1265
2023-01-08
...
...
7383
2022-12-11
so for example, let's say that if the current_date was 2023-01-10, my query would've returned 1235.
If you need a distinct count of Ids from the 30 days up to and including each date the below should work
WITH CTE_DATES
AS
(
--Create a list of anchor dates
SELECT DISTINCT
dates
FROM source
)
SELECT COUNT(DISTINCT s.id) AS "counts"
,D.dates AS "date"
FROM CTE_DATES D
LEFT JOIN source S ON S.dates BETWEEN DATEADD(DAY,-29,D.dates) AND D.dates --30 DAYS INCLUSIVE
GROUP BY D.dates
ORDER BY D.dates DESC
;
If the distinct count didnt matter you could likely simplify with a rolling sum, only hitting the source table once:
SELECT S.dates AS "date"
,COUNT(1) AS "count_daily"
,SUM("count_daily") OVER(ORDER BY S.dates DESC ROWS BETWEEN CURRENT ROW AND 29 FOLLOWING) AS "count_rolling" --assumes there is at least one row for every day.
FROM source S
GROUP BY S.dates
ORDER BY S.dates DESC;
;
This wont work though if you have gaps in your list of dates as it'll just include the latest 30 days available. In which case the first example without distinct in the count will do the trick.
SELECT count(*) AS Counts
dates AS Date
FROM source
WHERE dates >= DATEADD(DAY, -30, CURRENT_DATE)
GROUP BY dates
ORDER BY dates DESC
I have a PL-SQL table with a structure as shown in the example below:
I have customers (customer_number) with insurance cover start and stop dates (cover_start_date and cover_stop_date). I also have dates of accidents for those customers (accident_date). These customers may have more than one row in the table if they have had more than one accident. They may also have no accidents. And they may also have a blank entry for the cover stop date if their cover is ongoing. Sorry I did not design the data format, but I am stuck with it.
I am looking to calculate the number of accidents (num_accidents) and number of customers (num_customers) in a given time period (period_start), and from that the number of accidents-per-customer (which will be easy once I've got those two pieces of information).
Any ideas on how to design a PL-SQL function to do this in a simple way? Ideally with the time periods not being fixed to monthly (for example, weekly or fortnightly too)? Ideally I will end up with a table like this shown below:
Many thanks for any pointers...
You seem to need a list of dates. You can generate one in the query and then use correlated subqueries to calculate the columns you want:
select d.*,
(select count(distinct customer_id)
from t
where t.cover_start_date <= d.dte and
(t.cover_end_date > d.date + interval '1' month or t.cover_end_date is null)
) as num_customers,
(select count(*)
from t
where t.accident_date >= d.dte and
t.accident_date < d.date + interval '1' month
) as accidents,
(select count(distinct customer_id)
from t
where t.accident_date >= d.dte and
t.accident_date < d.date + interval '1' month
) as num_customers_with_accident
from (select date '2020-01-01' as dte from dual union all
select date '2020-02-01' as dte from dual union all
. . .
) d;
If you want to do arithmetic on the columns, you can use this as a subquery or CTE.
I have a SQL code that looks like this:
select cast(avg(age) as decimal(16,2)) as 'avg' From
(select distinct acct.Account, cast(Avg(year(getdate())- year(client_birth_date)) as decimal(16,2)) as 'Age'
from WF_PM_ACCT_DB DET
inner join WF_PM_ACCT_DET_DB ACCT
ON det.Account = acct.Account
where (acct_closing_date is null or acct_closing_date > '2017-01-01')
and Acct_Open_Date < '2017-01-01'
group by acct.Account
) x
Then basically what this give me is a simple one cell answer of the average age of accounts in the year Acct_Open_Date < '2017-01-01' . I am an ameture so i change the date everytime and run the query again and again to get the remaining year. Is there an easy way to say lets have all the years as column headings and just one row with the average account age in that year.
Please note that the account closing date being null means accounts never got close and i have to change it to less than the analysis year in order to get a true picture of the average account age that existed at that time
Any help is appreciated. Thanks.
You can run this for multiple dates by including them in a single derived table:
with dates as (
select cast('2017-01-01' as date) as yyyy union all
select cast('2016-01-01' as date)
)
select yyyy, cast(avg(age) as decimal(16,2)) as avg_age
From (select dates.yyyy, acct.Account,
cast(Avg(year(getdate())- year(client_birth_date)) as decimal(16,2)) as Age
from dates cross join
WF_PM_ACCT_DB DET inner join WF_PM_ACCT_DET_DB
ACCT
on det.Account = acct.Account
where (acct_closing_date is null or acct_closing_date > dates.yyyy) and
Acct_Open_Date < dates.yyyy
group by acct.Account, dates.yyyy
) x
group by yyyy
order by yyyy;
I want to run a report from sales of any customer that has ordered in the last two years.
I can run a report of all invoices dated within two years then remove duplicates in excel, but I would rather do it directly within (Firebird) SQL
I can use a WHERE date < 1 Jan 2015 (2 years or thereabours), but how do I get it to only show the customer once? I thought if I used MAX(Date) therefore showing the most recent date in that two year period. Where am I going wrong? I believe I need to use a UNIQUE() function like UNIQUE(ORDERCUSTOMER) within the SELECT clause.
SELECT
FINANCIALSALESINVOICES.TRANSACTIONDATE,
FINANCIALSALESINVOICES.INVOICECUSTOMER,
FINANCIALSALESINVOICES.ORDERCUSTOMER,
FINANCIALSALESINVOICES.INVOICENUMBER,
FINANCIALSALESINVOICES.SOURCENUMBER,
MAX(FINANCIALSALESINVOICES.TRANSACTIONDATE)
FROM FINANCIALSALESINVOICES
WHERE (FINANCIALSALESINVOICES.TRANSACTIONDATE>={d '2015-01-01'})
ORDER BY FINANCIALSALESINVOICES.INVOICECUSTOMER, FINANCIALSALESINVOICES.TRANSACTIONDATE
I did having it showing the max date for each instance of invoice in the past two years, but now can't fine that file or replicate it.
One approach is to use a subquery in the WHERE clause which checks for the most recent invoice:
SELECT
t.TRANSACTIONDATE,
t.INVOICECUSTOMER,
t.ORDERCUSTOMER,
t.INVOICENUMBER,
t.SOURCENUMBER
FROM FINANCIALSALESINVOICES t
WHERE t.TRANSACTIONDATE >= date '2015-01-01' AND
t.TRANSACTIONDATE = (SELECT MAX(f.TRANSACTIONDATE)
FROM FINANCIALSALESINVOICES f
WHERE t.ORDERCUSTOMER = f.ORDERCUSTOMER AND
f.TRANSACTIONDATE >= date '2015-01-01')
ORDER BY t.INVOICECUSTOMER,
t.TRANSACTIONDATE
With Firebird 3 you can use row_number() to assign a unique value to each row within a group (partition), that value can then be filtered on:
select
a.TRANSACTIONDATE,
a.INVOICECUSTOMER,
a.ORDERCUSTOMER,
a.INVOICENUMBER,
a.SOURCENUMBER
from (
select
TRANSACTIONDATE,
INVOICECUSTOMER,
ORDERCUSTOMER,
INVOICENUMBER,
SOURCENUMBER,
row_number() over (partition by INVOICECUSTOMER, order by TRANSACTIONDATE desc) as rownr
from FINANCIALSALESINVOICES
where TRANSACTIONDATE >= date '2015-01-01'
) a
where a.rownr = 1
order by a.INVOICECUSTOMER, a.TRANSACTIONDATE
See also Window (Analytical) Functions in the Firebird 3 release notes.
I need help in business days calculation.
I've two tables
1) One table ACTUAL_TABLE containing order date and contact date with timestamp datatypes.
2) The second table BUSINESS_DATES has each of the calendar dates listed and has a flag to indicate weekend days.
using these two tables, I need to ensure business days and not calendar days (which is the current logic) is calculated between these two fields.
My thought process was to first get a range of dates by comparing ORDER_DATE with TABLE_DATE field and then do a similar comparison of CONTACT_DATE to TABLE_DATE field. This would get me a range from the BUSINESS_DATES table which I can then use to calculate count of days, sum(Holiday_WKND_Flag) fields making the result look like:
Order# | Count(*) As DAYS | SUM(WEEKEND DATES)
100 | 25 | 8
However this only works when I use a specific order number and cant' bring all order numbers in a sub query.
My Query:
SELECT SUM(Holiday_WKND_Flag), COUNT(*) FROM
(
SELECT
* FROM
BUSINESS_DATES
WHERE BUSINESS.Business BETWEEN (SELECT ORDER_DATE FROM ACTUAL_TABLE
WHERE ORDER# = '100'
)
AND
(SELECT CONTACT_DATE FROM ACTUAL_TABLE
WHERE ORDER# = '100'
)
TEMP
Uploading the table structure for your reference.
SELECT ORDER#, SUM(Holiday_WKND_Flag), COUNT(*)
FROM business_dates bd
INNER JOIN actual_table at ON bd.table_date BETWEEN at.order_date AND at.contact_date
GROUP BY ORDER#
Instead of joining on a BETWEEN (which always results in a bad Product Join) followed by a COUNT you better assign a bussines day number to each date (in best case this is calculated only once and added as a column to your calendar table). Then it's two Equi-Joins and no aggregation needed:
WITH cte AS
(
SELECT
Cast(table_date AS DATE) AS table_date,
-- assign a consecutive number to each busines day, i.e. not increased during weekends, etc.
Sum(CASE WHEN Holiday_WKND_Flag = 1 THEN 0 ELSE 1 end)
Over (ORDER BY table_date
ROWS Unbounded Preceding) AS business_day_nbr
FROM business_dates
)
SELECT ORDER#,
Cast(t.contact_date AS DATE) - Cast(t.order_date AS DATE) AS #_of_days
b2.business_day_nbr - b1.business_day_nbr AS #_of_business_days
FROM actual_table AS t
JOIN cte AS b1
ON Cast(t.order_date AS DATE) = b1.table_date
JOIN cte AS b2
ON Cast(t.contact_date AS DATE) = b2.table_date
Btw, why are table_date and order_date timestamp instead of a date?
Porting from Oracle?
You can use this query. Hope it helps
select order#,
order_date,
contact_date,
(select count(1)
from business_dates_table
where table_date between a.order_date and a.contact_date
and holiday_wknd_flag = 0
) business_days
from actual_table a