I have a customer table in which a new row is inserted when a customer signup occurs.
Problem
I want to know the total number of signups per day for a given date range.
For example, find the total number of signups each day from 2015-07-01 to 2015-07-10.
customer table
sample data [relevant columns shown]
customerid username created
1 mrbean 2015-06-01
2 tom 2015-07-01
3 jerry 2015-07-01
4 bond 2015-07-02
5 superman 2015-07-10
6 tintin 2015-08-01
7 batman 2015-08-01
8 joker 2015-08-01
Required Output
created signup
2015-07-01 2
2015-07-02 1
2015-07-03 0
2015-07-04 0
2015-07-05 0
2015-07-06 0
2015-07-07 0
2015-07-08 0
2015-07-09 0
2015-07-10 1
Query used
SELECT
DATE(created) AS created, COUNT(1) AS signup
FROM
customer
WHERE
DATE(created) BETWEEN '2015-07-01' AND '2015-07-10'
GROUP BY DATE(created)
ORDER BY DATE(created)
I am getting the following output:
created signup
2015-07-01 2
2015-07-02 1
2015-07-10 1
What modification should I make in the query to get the required output?
You're looking for a way to get all the days listed, even those days that aren't represented in your customer table. This is a notorious pain in the neck in SQL. That's because in its pure form SQL lacks the concept of a contiguous sequence of anything ... cardinal numbers, days, whatever.
So, you need to introduce a table containing a source of contiguous cardinal numbers, or dates, or something, and then LEFT JOIN your existing data to that table.
There are a few ways of doing that. One is to create yourself a calendar table with a row for every day in the present decade or century or whatever, then join to it. (That table won't be very big compared to the capability of a modern database.)
Let's say you have that table, and it has a column named date. Then you'd do this.
SELECT calendar.date AS created,
       COALESCE(a.customer_count, 0) AS customer_count
FROM calendar
LEFT JOIN (
    SELECT COUNT(*) AS customer_count,
           DATE(created) AS created
    FROM customer
    GROUP BY DATE(created)
) a ON calendar.date = a.created
WHERE calendar.date BETWEEN start AND finish
ORDER BY calendar.date
Notice a couple of things. First, the LEFT JOIN from the calendar table to your data set. If you use an ordinary JOIN, the missing data in your data set will suppress the rows from the calendar.
Second, the COALESCE in the top-level SELECT turns the missing (NULL) values from your data set into zeros.
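To make the calendar join concrete, here is a minimal sketch that runs the same shape of query against the sample data from the question. It uses Python's built-in sqlite3 module purely for illustration (the original question is not about SQLite), and the calendar table, its 2015 range and the signup alias are assumptions made for this example.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customerid INTEGER PRIMARY KEY, username TEXT, created TEXT);
    INSERT INTO customer VALUES
        (1, 'mrbean',   '2015-06-01'),
        (2, 'tom',      '2015-07-01'),
        (3, 'jerry',    '2015-07-01'),
        (4, 'bond',     '2015-07-02'),
        (5, 'superman', '2015-07-10'),
        (6, 'tintin',   '2015-08-01'),
        (7, 'batman',   '2015-08-01'),
        (8, 'joker',    '2015-08-01');
    -- Assumed calendar table: one row per day, in a column named "date"
    CREATE TABLE calendar (date TEXT PRIMARY KEY);
""")

# Fill the calendar with every day of 2015 (the range is an arbitrary choice).
first_day = date(2015, 1, 1)
conn.executemany(
    "INSERT INTO calendar (date) VALUES (?)",
    [((first_day + timedelta(days=i)).isoformat(),) for i in range(365)],
)

# The calendar drives the result; the LEFT JOIN keeps days with no signups.
signups = conn.execute("""
    SELECT calendar.date AS created,
           COALESCE(a.customer_count, 0) AS signup
    FROM calendar
    LEFT JOIN (
        SELECT DATE(created) AS created, COUNT(*) AS customer_count
        FROM customer
        GROUP BY DATE(created)
    ) a ON calendar.date = a.created
    WHERE calendar.date BETWEEN ? AND ?
    ORDER BY calendar.date
""", ("2015-07-01", "2015-07-10")).fetchall()

for created, signup in signups:
    print(created, signup)
```

This prints one line per day from 2015-07-01 to 2015-07-10, with 0 for the days that have no rows in customer.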
Now, you ask, where can I get that calendar table? I respectfully suggest you look that up, and ask another question if you can't figure it out.
I wrote a little essay on this, which you can find here: http://www.plumislandmedia.net/mysql/filling-missing-data-sequences-cardinal-integers/
Look here
Create a table with a calendar and join it in your query.
DECLARE @MinDate DATE = '2015-07-01',
        @MaxDate DATE = '2015-07-10';
-- One row per day in the range, with signup initialised to 0
CREATE TABLE tblTempDates
(created date, signup int)
INSERT INTO tblTempDates
SELECT TOP (DATEDIFF(DAY, @MinDate, @MaxDate) + 1)
Date = DATEADD(DAY, ROW_NUMBER() OVER(ORDER BY a.object_id) - 1, @MinDate), 0 AS Signup
FROM sys.all_objects a
CROSS JOIN sys.all_objects b;
-- Actual signup counts for the days that have any
CREATE TABLE tblTempQueryDates
(created date, signup int)
INSERT INTO tblTempQueryDates
SELECT
created AS created, COUNT(*) AS signup
FROM
customer
WHERE
created BETWEEN @MinDate AND @MaxDate
GROUP BY created
-- Copy the real counts over the zeros
UPDATE tblTempDates
SET tblTempDates.signup = tblTempQueryDates.signup
FROM tblTempDates INNER JOIN
tblTempQueryDates ON tblTempDates.created = tblTempQueryDates.created
SELECT * FROM tblTempDates
ORDER BY created
DROP TABLE tblTempDates
DROP TABLE tblTempQueryDates
Not pretty, but it gives you what you want.
Related
I have a sqlite3 database maintained on an AWS exchange that is regularly updated by a Python script. One of the things it tracks is when any team generates a new post for a given topic. The entries look something like this:
id   client           team      date        industry      city
895  acme industries  blueteam  2022-06-30  construction  springfield
I'm trying to create a table that shows me how many entries for construction occur each day. Right now, the entries with data populate, but they exclude dates with no entries. For example, if I search for just
SELECT date, count(id) as num_records
from mytable
WHERE industry = "construction"
group by date
order by date asc
I'll get results that look like this:
date        num_records
2022-04-01  3
2022-04-04  1
How can I make sqlite output like this:
date        num_records
2022-04-01  3
2022-04-02  0
2022-04-03  0
2022-04-04  1
I'm trying to generate some graphs from this data and need to be able to include all dates for the target timeframe.
EDIT/UPDATE:
The table does not already include every date; it only includes dates relevant to an entry. If no team posts work on a day, the date column will jump from day 1 (e.g. 2022-04-01) to day 3 (2022-04-03).
Assuming your "mytable" table contains all the dates you need, you can first select the distinct dates, then LEFT JOIN your own query to them, and map the resulting NULL values of the "num_records" field to 0 using the COALESCE function.
WITH cte AS (
    SELECT date,
           COUNT(id) AS num_records
    FROM mytable
    WHERE industry = 'construction'
    GROUP BY date
)
SELECT dates.date,
       COALESCE(cte.num_records, 0) AS num_records
FROM (SELECT DISTINCT date FROM mytable) dates
LEFT JOIN cte
       ON dates.date = cte.date
ORDER BY dates.date
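If the table does not contain every date, as the question's update says, the list of dates has to be generated rather than read from mytable. One way to do that in SQLite is a recursive CTE. The following is a minimal sketch using Python's sqlite3 module; the sample rows, the date range and the CTE column name day are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (
        id INTEGER PRIMARY KEY, client TEXT, team TEXT,
        date TEXT, industry TEXT, city TEXT
    );
    -- Invented sample rows: three construction posts on 04-01, one on 04-04
    INSERT INTO mytable (client, team, date, industry, city) VALUES
        ('acme industries', 'blueteam', '2022-04-01', 'construction', 'springfield'),
        ('acme industries', 'redteam',  '2022-04-01', 'construction', 'springfield'),
        ('acme industries', 'blueteam', '2022-04-01', 'construction', 'springfield'),
        ('acme industries', 'blueteam', '2022-04-02', 'retail',       'springfield'),
        ('acme industries', 'blueteam', '2022-04-04', 'construction', 'springfield');
""")

daily_counts = conn.execute("""
    WITH RECURSIVE dates(day) AS (
        SELECT :start
        UNION ALL
        SELECT date(day, '+1 day') FROM dates WHERE day < :end
    )
    SELECT dates.day AS date, COUNT(t.id) AS num_records
    FROM dates
    LEFT JOIN mytable t
           ON t.date = dates.day
          AND t.industry = 'construction'
    GROUP BY dates.day
    ORDER BY dates.day
""", {"start": "2022-04-01", "end": "2022-04-04"}).fetchall()

for day, num_records in daily_counts:
    print(day, num_records)
```

Note that the industry filter sits in the ON clause. Putting it in a WHERE clause would discard the generated days that have no matching rows, which defeats the purpose of the LEFT JOIN. COUNT(t.id) counts only non-NULL ids, so empty days come out as 0 without needing COALESCE.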
I am creating a customer activity by day table, which requires 9 CTEs.
For the first table, I want to cross join all unique customer IDs with the dates of a calendar table, so each unique ID will appear on multiple rows, one per day.
The problem is making sure the days are consecutive, regardless of the dates in the following CTEs.
This is a shortened example of what it would look like:
GUID DATE CONDITIONS
1 3/13/2015 [NULL]
1 3/14/2015 Y
1 3/15/2015 [NULL]
....
1 9/2/2020 Y
2 4/15/2015 Y
2 4/16/2015 [NULL]
2 4/17/2015 [NULL]
2 4/18/2015 Y
...
2 9/2/2020 [NULL]
And so on - so that each customer has consecutive dates with their GUID, beginning with the creation date of their account (e.g. 3/13/2015) and ending on the current date.
The create date is on Table 1 with the unique ID, and I'm joining it with a date table.
My problem is that I can't get the query to run with a minimum create date per unique ID. Because if I don't create a minimum start date, the query runs forever (it's trying to create every unique ID for every consecutive date, even before the customer account was created.)
This is the code I have now.
Can anyone tell me if I have made the min. create date right? It's still just timing out when I run the query.
with
cte_carrier_guid (carrier_guid, email, date, carrier_id) as
(
SELECT
guid as carrier_guid
,mc.email
,dt2.date as date
,mc.id as carrier_id
FROM ctms_db_public.msd_carrier mc
CROSS JOIN public.dim_calendar dt2
WHERE dt2.date <= CURRENT_DATE
AND mc.created_at >= dt2.date
GROUP BY guid, mc.id, dt2."date", mc.email
ORDER BY guid, dt2.date asc
)
Select top 10 * from cte_carrier_guid
Here:
dt2.date <= CURRENT_DATE AND mc.created_at >= dt2.date
Since you want dates between the creation date of the user and today, you probably want the inequality condition on the creation date the other way around. I find it easier to follow when we put the lower bound first:
dt2.date >= mc.created_at AND dt2.date <= CURRENT_DATE
Other things about the query:
You want an INNER JOIN in essence, so use that instead of CROSS JOIN ... WHERE; it is clearer
ORDER BY in a cte makes no sense to me
Do you really need GROUP BY? The columns in the SELECT clause are the same as in the GROUP BY, so all this does is remove potential duplicates (but why would there be duplicates?)
You could probably phrase the cte as:
SELECT ...
FROM ctms_db_public.msd_carrier mc
INNER JOIN public.dim_calendar dt2 ON dt2.date >= mc.created_at
WHERE dt2.date <= CURRENT_DATE
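As a small runnable illustration of the corrected join condition, the sketch below uses Python's sqlite3 module with simplified, hypothetical table names (carrier and dim_calendar stand in for the tables in the question) and a fixed "today" so the result is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for the carrier and calendar tables
    CREATE TABLE carrier (guid INTEGER PRIMARY KEY, created_at TEXT);
    INSERT INTO carrier VALUES (1, '2020-08-30'), (2, '2020-09-01');

    CREATE TABLE dim_calendar (date TEXT PRIMARY KEY);
    INSERT INTO dim_calendar VALUES
        ('2020-08-28'), ('2020-08-29'), ('2020-08-30'), ('2020-08-31'),
        ('2020-09-01'), ('2020-09-02'), ('2020-09-03');
""")

today = "2020-09-02"  # fixed here instead of CURRENT_DATE, for a stable result

# Each carrier gets one row per calendar day from its creation date to today.
carrier_days = conn.execute("""
    SELECT mc.guid, dt.date
    FROM carrier mc
    INNER JOIN dim_calendar dt ON dt.date >= mc.created_at
    WHERE dt.date <= ?
    ORDER BY mc.guid, dt.date
""", (today,)).fetchall()

for guid, day in carrier_days:
    print(guid, day)
```

Carrier 1 gets four rows (2020-08-30 through 2020-09-02) and carrier 2 gets two, so no rows are generated for days before an account existed.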
I have a dataset of parts, price per part, and month. I am accessing this data via a live connection to a SQL Server database. This database gets updated monthly with new prices for each part. What I would like to do is graph one year of price data for the ten parts whose prices changed the most over the last month (either as a percentage of last month's price or as a total change in dollars.)
Since my database connection is live, ideally Tableau would grab the new price data each month, updating the top ten parts whose prices changed for the new period. I don't want to manually have to change the months or use a stored procedure if possible.
part price date
110 167.66 2018-12-01 00:00:00.000
113 157.82 2018-12-01 00:00:00.000
121 99.16 2018-12-01 00:00:00.000
133 109.82 2018-12-01 00:00:00.000
137 178.66 2018-12-01 00:00:00.000
138 154.99 2018-12-01 00:00:00.000
143 67.32 2018-12-01 00:00:00.000
149 103.82 2018-12-01 00:00:00.000
113 167.34 2018-11-01 00:00:00.000
121 88.37 2018-11-01 00:00:00.000
133 264.02 2018-11-01 00:00:00.000
1. Create a calculated field called Recent_Price as if DateDiff('month', [date], Today()) <= 1 then [price] end. This returns the price for recent records and null for older records. You might need to tweak the condition based on details, or use an LOD calc to always get the last 2 values regardless of today’s date.
2. Create a calculated field called Price_Change as Max([Recent_Price]) - Min([Recent_Price]). Note you can’t tell from this whether the change was positive or negative, just its magnitude.
3. Make sure part is a discrete dimension. Drag it to the Filter shelf. Set the filter to show the Top N parts by Price_Change.
It’s not hard to extend this to include the sign of the price change, or to convert it to a percentage. Hint: you’ll probably need a pair of calcs like that in step 1 to select prices for specific months.
You haven't provided any sample data, but you could follow something like this,
;WITH top_parts AS (
-- select the top 10 parts based on some criteria
SELECT TOP 10 parts.[id], parts.[name] FROM parts
ORDER BY <most changed>
)
SELECT price.[date], p.[name], price.[price] FROM top_parts p
INNER JOIN part_price price ON p.[id] = price.[part_id]
ORDER BY price.[date]
Use a CTE to get your top parts.
Select from the CTE, join to the price table to get the prices for each part.
Order the prices or bucketize them into months.
Feed it to your graph.
It will be something like this for just one month. If you need the whole year you have to specify clearly what exactly you want to see:
;WITH cte as (
SELECT TOP 10 m0.Part
, Diff = ABS(m0.Price - m1.Price)
, DiffPrc = ABS(m0.Price - m1.Price) / m1.Price
FROM Parts as m0
INNER JOIN (SELECT MaxDate = MAX([Date]) FROM Parts) as md
ON md.MaxDate = m0.[Date]
INNER JOIN Parts as m1 ON m0.Part = m1.Part and DATEADD(MONTH,-1,md.MaxDate) = m1.[Date]
ORDER BY ABS(m0.Price - m1.Price) DESC
-- Top 10 by percentage:
-- ORDER BY ABS(m0.Price - m1.Price) / m1.Price DESC
)
SELECT * FROM Parts as p
INNER JOIN cte ON cte.Part = p.Part
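The same idea can be tried out on the sample rows from the question. This is a minimal sketch in Python's sqlite3 module, so the syntax differs from SQL Server (LIMIT instead of TOP, date() instead of DATEADD); the table name parts and the result column names are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parts (part INTEGER, price REAL, date TEXT);
    INSERT INTO parts VALUES
        (110, 167.66, '2018-12-01'), (113, 157.82, '2018-12-01'),
        (121,  99.16, '2018-12-01'), (133, 109.82, '2018-12-01'),
        (137, 178.66, '2018-12-01'), (138, 154.99, '2018-12-01'),
        (143,  67.32, '2018-12-01'), (149, 103.82, '2018-12-01'),
        (113, 167.34, '2018-11-01'), (121,  88.37, '2018-11-01'),
        (133, 264.02, '2018-11-01');
""")

# Compare each part's latest price with its price one month earlier,
# then keep the N parts with the largest absolute change.
query = """
    WITH latest AS (SELECT MAX(date) AS max_date FROM parts)
    SELECT m0.part,
           ROUND(ABS(m0.price - m1.price), 2)            AS diff,
           ROUND(ABS(m0.price - m1.price) / m1.price, 4) AS diff_pct
    FROM parts m0
    JOIN latest   ON m0.date = latest.max_date
    JOIN parts m1 ON m1.part = m0.part
                 AND m1.date = date(latest.max_date, '-1 month')
    ORDER BY ABS(m0.price - m1.price) DESC
    LIMIT ?
"""
top_parts = conn.execute(query, (10,)).fetchall()

for part, diff, diff_pct in top_parts:
    print(part, diff, diff_pct)
```

Only parts 113, 121 and 133 have a price in both months of the sample, so only those three come back, with part 133 first. Because the latest month is taken from MAX(date), nothing has to be edited when next month's prices arrive.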
-- Input from the caller; you decide in which format the last-month date is passed.
-- In other words, @InputLastMonth is a parameter of the procedure.
-- Suppose it is passed in yyyy-MM-dd format.
DECLARE @InputLastMonth date = '2018-12-31'
-- To get the last year of data, declare a local variable (not passed in)
DECLARE @From date = DATEADD(day, 1, DATEADD(month, -12, @InputLastMonth))
DECLARE @TopN int = 10 -- requirement
--SELECT @InputLastMonth, @From
SELECT TOP (@TopN) parts, ChangePrice
FROM
(
SELECT parts, ABS(MAX(price) - MIN(price)) AS ChangePrice
FROM dbo.Table1
WHERE dates >= @From AND dates <= @InputLastMonth
GROUP BY parts
) t4
ORDER BY ChangePrice DESC
By "changed the most", I understand the following. Suppose there is a part 'Part1' whose price was 100 in the first month and changed to 1000 in the last month.
On the other hand, Part2 changed several times during the same period, but its final change was only 12.
In other words, Part1 changed only twice but the difference was huge, while Part2 changed several times but the difference was small.
So Part1 will be preferred.
The second thing is that the change can be negative as well as positive.
Correct me if I have not understood your requirement.
So this is somewhat of a common question on here, but I haven't found an answer that really suits my specific needs. I have 2 tables. One has a list of ProjectClosedDates. The other is a calendar table that goes through about 2025, with one column flagging whether the row date is a weekend day and another flagging whether it is a holiday.
My end goal is to find out based on the ProjectClosedDate, what date is 5 business days post that date. My idea was that I was going to use the Calendar table and join it to itself so I could then insert a column into the calendar table that was 5 Business days away from the row-date. Then I was going to join the Project table to that table based on ProjectClosedDate = RowDate.
If I was just going to check the actual business-date table for one record, I could use this:
SELECT actual_date from
(
SELECT actual_date, ROW_NUMBER() OVER(ORDER BY actual_date) AS Row
FROM DateTable
WHERE is_holiday= 0 and actual_date > '2013-12-01'
ORDER BY actual_date
) X
WHERE row = 65
from here:
sql working days holidays
However, this is just one date and I need a column of dates based off of each row. Any thoughts of what the best way to do this would be? I'm using SQL-Server Management Studio.
Completely untested and not thought through:
If the concept of "business days" is common and important in your system, you could add a column "Business Day Sequence" to your table. The column would be a simple unique sequence, incremented by one for every business day and null for every day not counting as a business day.
The data would look something like this:
Date BDAY_SEQ
========== ========
2014-03-03 1
2014-03-04 2
2014-03-05 3
2014-03-06 4
2014-03-07 5
2014-03-08
2014-03-09
2014-03-10 6
Now it's a simple task to find the N:th business day from any date.
You simply do a self join with the calendar table, adding the offset in the join condition.
select a.actual_date
,b.actual_date as nth_business_day
from DateTable a
join DateTable b on(
b.bday_seq = a.bday_seq + 5
);
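Since the answer above is described as untested, here is a minimal sketch of the business-day sequence idea that can actually be run. It uses Python's sqlite3 module rather than SQL Server; the Project table, its sample rows and the empty holiday set are assumptions for illustration.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DateTable (
        actual_date TEXT PRIMARY KEY,
        is_weekend  INTEGER NOT NULL,
        is_holiday  INTEGER NOT NULL,
        bday_seq    INTEGER            -- NULL on non-business days
    );
    -- Hypothetical project table with two closed dates
    CREATE TABLE Project (ProjectID INTEGER PRIMARY KEY, ProjectClosedDate TEXT);
    INSERT INTO Project VALUES (1, '2014-03-04'), (2, '2014-03-07');
""")

holidays = set()          # assumed: no holidays in this sample window
seq = 0
start = date(2014, 3, 3)  # a Monday
for offset in range(21):
    day = start + timedelta(days=offset)
    is_weekend = day.weekday() >= 5
    is_holiday = day in holidays
    if is_weekend or is_holiday:
        bday_seq = None
    else:
        seq += 1
        bday_seq = seq
    conn.execute(
        "INSERT INTO DateTable VALUES (?, ?, ?, ?)",
        (day.isoformat(), int(is_weekend), int(is_holiday), bday_seq),
    )

# Look up each closed date, then jump 5 positions ahead in the sequence.
due_dates = conn.execute("""
    SELECT p.ProjectID, p.ProjectClosedDate, b.actual_date AS five_business_days_later
    FROM Project p
    JOIN DateTable a ON a.actual_date = p.ProjectClosedDate
    JOIN DateTable b ON b.bday_seq = a.bday_seq + 5
    ORDER BY p.ProjectID
""").fetchall()

for row in due_dates:
    print(row)
```

One consequence of leaving bday_seq NULL on non-business days is that a project closed on a weekend or holiday gets no row from this join. If that can happen, the lookup for row a would need to map such a date to a neighbouring business day first.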
If I have a table containing schedule information that implies particular dates, is there a SQL statement that can be written to convert that information into actual rows, using some sort of CROSS JOIN, perhaps?
Consider a payment schedule table with these columns:
StartDate - the date the schedule begins (1st payment is due on this date)
Term - the length in months of the schedule
Frequency - the number of months between recurrences
PaymentAmt - the payment amount :-)
SchedID StartDate Term Frequency PaymentAmt
-------------------------------------------------
1 05-Jan-2003 48 12 1000.00
2 20-Dec-2008 42 6 25.00
Is there a single SQL statement to allow me to go from the above to the following?
Running
SchedID Payment Due Expected
Num Date Total
--------------------------------------
1 1 05-Jan-2003 1000.00
1 2 05-Jan-2004 2000.00
1 3 05-Jan-2005 3000.00
1 4 05-Jan-2006 4000.00
1 5 05-Jan-2007 5000.00
2 1 20-Dec-2008 25.00
2 2 20-Jun-2009 50.00
2 3 20-Dec-2009 75.00
2 4 20-Jun-2010 100.00
2 5 20-Dec-2010 125.00
2 6 20-Jun-2011 150.00
2 7 20-Dec-2011 175.00
I'm using MS SQL Server 2005 (no hope for an upgrade soon) and I can already do this using a table variable and while loop, but it seemed like some sort of CROSS JOIN would apply but I don't know how that might work.
Your thoughts are appreciated.
EDIT: I'm actually using SQL Server 2005 though I initially said 2000. We aren't quite as backwards as I thought. Sorry.
I cannot test the code right now, so take it with a pinch of salt, but I think that something looking more or less like the following should answer the question:
with q(SchedId, PaymentNum, DueDate, RunningExpectedTotal) as
(select SchedId,
1 as PaymentNum,
StartDate as DueDate,
PaymentAmt as RunningExpectedTotal
from PaymentScheduleTable
union all
select q.SchedId,
1 + q.PaymentNum as PaymentNum,
DATEADD(month, s.Frequency, q.DueDate) as DueDate,
q.RunningExpectedTotal + s.PaymentAmt as RunningExpectedTotal
from q
inner join PaymentScheduleTable s
on s.SchedId = q.SchedId
where q.PaymentNum <= s.Term / s.Frequency)
select *
from q
order by SchedId, PaymentNum
Try using a table of integers (or better this: http://www.sql-server-helper.com/functions/integer-table.aspx) and a little date math, e.g. start + int * freq
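The "start + int * freq" idea can be sketched as follows, using Python's sqlite3 module for a runnable demonstration (SQL Server would use DATEADD instead of SQLite's date modifiers). The Integers table and its size are assumptions, and so is the boundary rule: this version counts a payment that falls exactly at the end of the term, which matches the question's sample output for one schedule but not the other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PaymentScheduleTable (
        SchedID INTEGER PRIMARY KEY, StartDate TEXT,
        Term INTEGER, Frequency INTEGER, PaymentAmt REAL
    );
    INSERT INTO PaymentScheduleTable VALUES
        (1, '2003-01-05', 48, 12, 1000.0),
        (2, '2008-12-20', 42,  6,   25.0);
    -- Assumed helper table of integers 0..99
    CREATE TABLE Integers (n INTEGER PRIMARY KEY);
""")
conn.executemany("INSERT INTO Integers (n) VALUES (?)", [(n,) for n in range(100)])

# Payment n (zero-based) is due n * Frequency months after StartDate.
# Assumption: a payment landing exactly on the end of the term is included;
# change <= to < in the join to exclude it.
schedule = conn.execute("""
    SELECT s.SchedID,
           i.n + 1 AS PaymentNum,
           date(s.StartDate, '+' || (i.n * s.Frequency) || ' months') AS DueDate,
           (i.n + 1) * s.PaymentAmt AS RunningExpectedTotal
    FROM PaymentScheduleTable s
    JOIN Integers i ON i.n * s.Frequency <= s.Term
    ORDER BY s.SchedID, PaymentNum
""").fetchall()

for row in schedule:
    print(row)
```

Because every payment in a schedule has the same amount, the running total is simply the payment number times PaymentAmt, so no window function or self join is needed.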
I've used table-valued functions to achieve a similar result. Basically the same as using a table variable I know, but I remember being really pleased with the design.
The usage ends up reading very well, in my opinion:
/* assumes @startdate and @enddate schedule limits */
SELECT
p.paymentid,
ps.paymentnum,
ps.duedate,
ps.ret
FROM
payment p
CROSS APPLY dbo.FUNC_get_payment_schedule(p.paymentid, @startdate, @enddate) ps
ORDER BY p.paymentid, ps.paymentnum
A typical solution is to use a Calendar table. You can expand it to fit your own needs, but it would look something like:
CREATE TABLE Calendar
(
calendar_date DATETIME NOT NULL,
is_holiday BIT NOT NULL DEFAULT(0),
CONSTRAINT PK_Calendar PRIMARY KEY CLUSTERED (calendar_date)
)
In addition to is_holiday, you can add other columns that are relevant for you. You can write a script to populate the table up through the next 10 or 100 or 1000 years and you should be all set. It makes queries like the one you're trying to write much simpler and can give you additional functionality.
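A population script along those lines could look like the sketch below. It is written in Python against an in-memory SQLite database only so that it is self-contained; the function name populate_calendar, the date range and the single sample holiday are assumptions for the example, and for SQL Server the same rows would be inserted through whatever client library you use.

```python
import sqlite3
from datetime import date, timedelta

def populate_calendar(conn, start, end, holidays=frozenset()):
    """Create the Calendar table and insert one row per day from start to end inclusive."""
    conn.execute("""
        CREATE TABLE IF NOT EXISTS Calendar (
            calendar_date TEXT PRIMARY KEY,
            is_holiday    INTEGER NOT NULL DEFAULT 0
        )
    """)
    n_days = (end - start).days + 1
    rows = []
    for offset in range(n_days):
        day = start + timedelta(days=offset)
        rows.append((day.isoformat(), int(day in holidays)))
    conn.executemany(
        "INSERT INTO Calendar (calendar_date, is_holiday) VALUES (?, ?)", rows
    )
    return n_days

conn = sqlite3.connect(":memory:")
inserted = populate_calendar(
    conn,
    date(2015, 1, 1),
    date(2024, 12, 31),
    holidays={date(2015, 12, 25)},  # sample holiday list, supply your own
)
print(inserted, "days inserted")
```

Ten years of days is only a few thousand rows, which is why the table stays cheap to store and join against even when it covers a much longer span.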