SQL Query Across Partitioned Database (by Day)

Over the past 3 months, we have gone from a Google Sheet with 5 tabs to a BigQuery DB connected to that sheet, with 5 tables that we write queries against. Today, we upgraded again to a full daily-partitioned database.
I am struggling to figure out how to write my queries across multiple days of data.
When I go to start the query it defaults to today.
SELECT order_number
FROM `project-123456.client_name.orders`
WHERE DATE(submitted_date) = "2022-02-10"
LIMIT 1000
I am trying to figure out the syntax for the month of January for example (and I know this isn't right)
WHERE DATE(submitted_date) = Jan 1 - Jan 31.
Any suggestions would be great, I am learning SQL at an alarming pace but in this case, I think I just don't know the right question to ask.

Ok I figured it out.
WHERE DATE(submitted_date) >= "2022-01-01" AND DATE(submitted_date) <= "2022-01-31"

Another option:
where date_trunc(date(submitted_date), month) = '2022-01-01'
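Both answers above can also be written as a half-open range (on or after the first day of the month, strictly before the first day of the next month), which avoids having to know the last day of the month and leaves the column unwrapped. A minimal sketch of that idea, with SQLite (via Python) standing in for BigQuery; the table contents and the helper name `orders_in_range` are made up for illustration:

```python
import sqlite3

# Hypothetical stand-in for the orders table; SQLite replaces BigQuery here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_number INTEGER, submitted_date TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        (1, "2021-12-31 23:59:59"),
        (2, "2022-01-01 00:00:00"),
        (3, "2022-01-31 18:30:00"),
        (4, "2022-02-01 00:00:00"),
    ],
)


def orders_in_range(start, end_exclusive):
    """Return order numbers with start <= submitted_date < end_exclusive."""
    rows = conn.execute(
        "SELECT order_number FROM orders "
        "WHERE submitted_date >= ? AND submitted_date < ? "
        "ORDER BY order_number",
        (start, end_exclusive),
    )
    return [r[0] for r in rows]


# All of January 2022: lower bound inclusive, upper bound exclusive.
january_orders = orders_in_range("2022-01-01", "2022-02-01")
print(january_orders)
```

With this sample data only orders 2 and 3 fall inside January; the late-evening order on Jan 31 is kept and the midnight order on Feb 1 is excluded.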

Related

Need column comprised of data from date two weeks ago for comparison

Let me start by saying that I am somewhat new to SQL/Snowflake and have been putting together queries for roughly 2 months. Some of my query language may not be ideal and I fully understand if there's a better, more efficient way to execute this query. Any and all input is appreciated. Also, this particular query is being developed in Snowflake.
My current query is pulling customer volumes by department and date based on a 45 day window with a 24 day lookback from current date and a 21 day look forward based on scheduled appointments. Each date is grouped based on where it falls within that 45 day window: current week (today through next 7 days), Week 1 (forward-looking days 8-14), and Week 2 (forward-looking days 15-21). I have been working to try and build out a comparison column that, for any date that lands within either the Week 1 or Week 2 group, will pull in prior period volumes from either 14 days prior (Week 1) or 21 days prior (Week 2) but am getting nowhere. Is there a best-practice for this type of column? Generic example of the current output is attached. Please note that the 'Prior Wk' column in the sample output was manually populated in an effort to illustrate the way this column should ideally work.
I have tried several different iterations of count(case...) similar to that listed below; however, the 'Prior Wk' column returns the count of encounters/scheduled encounters for the same day rather than those that occurred 14 or 21 days ago.
Count(Case When datediff(dd,SCHED_DTTM,getdate())
between -21 and -7 then 1 else null end
) as "Prior Wk"
I've tried to use an IFF statement as shown below, but no values return.
(IFF(ENCOUNTER_DATE > dateadd(dd,8,getdate()),
count(case when ENC_STATUS in ('Phone','InPerson') AND
datediff(dd,ENCOUNTER_Date,getdate()) between 7 and 14 then 1
else null end), '0')
) as "Prior Wk"
Also have attempted creating and using a temporary table (example included) but have not managed to successfully pull information from the temp table that didn't completely disrupt my encounter/scheduled counts. Please note for this approach I've only focused on the 14 day group and have not begun to look at the 21 day/Week 2 group. My attempt to use the temp table to resolve the problem centered around the following clause (temp table alias: "Date1"):
CASE when AHS.GL_Number = "DATEVISIT1"."GL_NUMBER" AND
datevisit1.lookback14 = dateadd(dd,14,PE.CONTACT_Date)
then "DATEVISIT1"."ENC_Count"
else null end
as "Prior Wk"
I am extremely appreciative of any insight on the current best practices around pulling prior period data into a column alongside current period data. Any misuse of terminology on my part is not deliberate.
I'm struggling to understand your requirement, but it sounds like you need window functions (https://docs.snowflake.com/en/sql-reference/functions-analytic.html), in this case most likely a SUM window function. The LAG window function (https://docs.snowflake.com/en/sql-reference/functions/lag.html) might also be of some help.
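One way to picture the prior-period column: a COUNT(CASE ...) inside a query grouped by date only ever sees the rows of that same date, so the prior value has to come from a different row. LAG does that by row offset; the sketch below swaps in a self-join on the date instead, which does not depend on every date being present. It runs on SQLite via Python rather than Snowflake, and the table `daily_volume` and its columns are hypothetical; in Snowflake the date arithmetic would presumably be written with DATEADD.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical pre-aggregated volumes: one row per department and date.
conn.execute(
    "CREATE TABLE daily_volume (dept TEXT, visit_date TEXT, encounters INTEGER)"
)
conn.executemany(
    "INSERT INTO daily_volume VALUES (?, ?, ?)",
    [
        ("Cardiology", "2022-03-01", 12),
        ("Cardiology", "2022-03-15", 20),
        ("Cardiology", "2022-03-16", 7),
    ],
)

# Join each row to the same department's row exactly 14 days earlier.
# LEFT JOIN keeps dates that have no prior-period row (prior_wk is NULL).
query = """
SELECT cur.dept, cur.visit_date, cur.encounters, prev.encounters AS prior_wk
FROM daily_volume AS cur
LEFT JOIN daily_volume AS prev
  ON prev.dept = cur.dept
 AND prev.visit_date = date(cur.visit_date, '-14 days')
ORDER BY cur.visit_date
"""
comparison = conn.execute(query).fetchall()
for row in comparison:
    print(row)
```

Here only 2022-03-15 has a row 14 days earlier, so it is the only date with a non-NULL prior value (12). A 21-day comparison for the Week 2 group would be a second join with a different offset.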

SQL Statement - want daily dates rolled up and displayed as Year

I have two years' worth of data that I'm summing up, for instance:
Date | Ingredient_cost_Amount| Cost_Share_amount |
I'm looking at two years' worth of data, for 2012 and 2013.
I want to roll up all the totals so I have only two rows, one for 2012 and one for 2013. How do I write a SQL statement that looks at the dates but displays only the 4-digit year instead of the 8-digit daily date? I suspect the sum piece will be taken care of by summing those columns with calculations, so I'm really looking for help with how to convert a daily date to a 4-digit year.
Help is greatly appreciated.
select DATEPART(year,[Date]) [Year]
, sum(Ingredient_cost_Amount) Total
from #table
group by DATEPART(year,[Date])
Define a range/grouping table.
Something similar to the following should work in most RDBMSs:
SELECT Grouping.id, SUM(Ingredient.ingredient_cost_amount) AS Ingredient_Cost_Amount,
SUM(Ingredient.cost_share_amount) AS Cost_Share_Amount
FROM (VALUES (2013, DATE('2013-01-01'), DATE('2014-01-01')),
(2012, DATE('2012-01-01'), DATE('2013-01-01'))) Grouping(id, gStart, gEnd)
JOIN Ingredient
ON Ingredient.date >= Grouping.gStart
AND Ingredient.date < Grouping.gEnd
GROUP BY Grouping.id
(DATE() and related conversion functions are heavily DB dependent. Some RDBMSs don't support using VALUES this way, although there are other ways to create the virtual grouping table)
See this blog post for why I used an exclusive upper bound for the range.
Using a range table this way will potentially allow the db to use indices to help with the aggregation. How much this helps depends on a bunch of other factors, like the specific RDBMS used.
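For anyone who wants to try the range-table idea, here is a small sketch on SQLite (via Python). It is an adaptation, not the answer's exact query: the range table is written as a CTE named `year_range` (a name chosen here), dates are stored as ISO-8601 text so plain comparison orders them correctly, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ingredient ("
    "date TEXT, ingredient_cost_amount REAL, cost_share_amount REAL)"
)
conn.executemany(
    "INSERT INTO ingredient VALUES (?, ?, ?)",
    [
        ("2012-03-15", 10.0, 2.0),
        ("2012-12-31", 5.0, 1.0),
        ("2013-01-01", 7.5, 0.5),
    ],
)

# Each range is [g_start, g_end): inclusive start, exclusive end.
query = """
WITH year_range(id, g_start, g_end) AS (
    VALUES (2012, '2012-01-01', '2013-01-01'),
           (2013, '2013-01-01', '2014-01-01')
)
SELECT year_range.id,
       SUM(ingredient.ingredient_cost_amount),
       SUM(ingredient.cost_share_amount)
FROM year_range
JOIN ingredient
  ON ingredient.date >= year_range.g_start
 AND ingredient.date <  year_range.g_end
GROUP BY year_range.id
ORDER BY year_range.id
"""
totals = conn.execute(query).fetchall()
print(totals)
```

The 2012-12-31 row lands in 2012 and the 2013-01-01 row in 2013, so the boundary day is counted exactly once.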

is it possible to find out how much of the db data is older than some N years in SQL Server?

I have two databases in SQL Server. I want to find out how much of the data is older than, say, 3 years.
I know the database creation date. Currently I have around 550 GB of data (both databases combined) spanning 7 years, and I want to know how much of that 550 GB is older than 3 years (or 5 years).
I was going through this link but couldn't get the expected data.
SQL SERVER – Query to find number Rows, Columns, ByteSize for each table in the current database – Find Biggest Table in Database
One solution that comes to mind is to find the total number of rows covering all 7 years (easy to get) and the total number of rows covering the first 5 years, starting from the creation date (I don't know how to get this number).
Then, if row_count_7_years accounts for 550 GB of data, how much does row_count_5_years account for? That would give me an approximate figure.
Please help.
For this purpose you need a datetime field, as marc mentioned. I suppose you don't have one.
With your suggested solution you can get the total row count of your table (for 7 years, I suppose), but you won't be able to get the rows for 5 years, because there is no date.
You can take the total number of records for 7 years and divide it by the number of years, and ONLY IF your database has filled at a steady average rate, you can query for the top (number of rows in one year) * 5 rows ordered by row_number(). The result is the set of rows you should delete. But I wouldn't recommend this solution.
I would recommend altering your tables to add a datetime column to each of them. Before that, you should back up all the data and copy the backup somewhere. After 3 years you will be able to do your cleanup.
As mentioned above, you should have a date column. However, if you don't, then depending on the relationships in your tables you might be able to estimate the number of rows by looking up relationships with some other table that does have a datetime column. Otherwise, if you have an old backup (unlikely, but still), you can restore it to identify the delta.
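The proportional estimate discussed in this thread is simple arithmetic once a row count for the older period is available (for example from a related table that does have a datetime column). A rough sketch, assuming rows are of roughly uniform size; the function name and the row counts are invented for illustration:

```python
def estimate_size_older_than(total_gb, total_rows, rows_older):
    """Scale the total size by the share of rows older than the cutoff.

    Assumes every row takes roughly the same space, which is only an
    approximation (indexes, LOB columns and schema changes skew it).
    """
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    return total_gb * rows_older / total_rows


# Hypothetical counts: 7,000,000 rows in total, 4,000,000 of them older
# than the cutoff, 550 GB overall.
estimated_gb = estimate_size_older_than(550.0, 7_000_000, 4_000_000)
print(round(estimated_gb, 1))
```

The result is only as good as the uniform-row-size assumption, so it is best treated as an order-of-magnitude figure.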

SQL to select records for a specific date given created time and modified time

CONTEXT
I've been asked by my management to "analyze" our issue tracking database - they use it to catalog our internal bugs, etc. My SQL and DB skills are primitive so I need some help.
THE DATA
I received a single table of 3 million records. It accounts for 250K bugs. Each revision of a bug is a row in the table. That's how 250K bugs ends up in 3 million records.
The data looks like this
BugID  Created      Modified     AssignedTo  Priority  Status
27     mar-31-2003  mar-31-2003  mel         2         Open
27     mar-31-2003  apr-01-2003  mel         1         Open
27     mar-31-2003  apr-10-2003  steve       1         Fixed
Thus, I have the complete history of every bug and can see how they have evolved every day.
WHAT I WANT TO ACCOMPLISH
I have a lot of things I've been asked to provide as reports. But the most basic question I have been asked to do is enable someone to look at the bugs as they existed at a specific date.
For example, if someone asked for all the bugs on Mar 1 2003, then bug 27 isn't in the result because it didn't exist on that day. Or if they asked for the bugs on April 7, they'd see bug 27 still marked as Open.
MY SPECIFIC QUESTION
Given the schema I outlined, what SQL query will provide a view of the records on a specific date?
TECHNICAL DETAILS
I am using Microsoft SQL Server 2008
WHAT I'VE TRIED SO FAR
As I said, my SQL skills are primitive. I was able to use WHERE clauses to filter out modifications made after the target date and bugs that didn't exist by the target date, but I wasn't able to find the single record that was current on that date.
WITH
  sequenced_data AS
(
  SELECT
    -- Number each bug's revisions, newest first, as of the target date
    ROW_NUMBER() OVER (PARTITION BY BugID ORDER BY Modified DESC) AS sequence_id,
    *
  FROM
    yourTable
  WHERE
    Modified <= @datetime_stamp
)
SELECT
  *
FROM
  sequenced_data
WHERE
  sequence_id = 1
This assumes you want to see the fixed bugs. If you want to filter out bugs that were fixed 'a long time ago' (say, 30 days), add this...
  AND (Status <> 'Fixed' OR Modified >= DATEADD(DAY, -30, @datetime_stamp))
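To experiment with the "state as of a date" idea without a SQL Server instance, the sketch below runs on SQLite via Python. Note that it swaps in a different formulation of the same latest-row-per-bug lookup: a join against MAX(Modified) per BugID instead of ROW_NUMBER(). The table name `bug_history`, the helper `bugs_as_of`, and the ISO-8601 date strings are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bug_history (BugID INTEGER, Created TEXT, Modified TEXT, "
    "AssignedTo TEXT, Priority INTEGER, Status TEXT)"
)
# The three revisions of bug 27 from the question, with ISO-8601 dates.
conn.executemany(
    "INSERT INTO bug_history VALUES (?, ?, ?, ?, ?, ?)",
    [
        (27, "2003-03-31", "2003-03-31", "mel", 2, "Open"),
        (27, "2003-03-31", "2003-04-01", "mel", 1, "Open"),
        (27, "2003-03-31", "2003-04-10", "steve", 1, "Fixed"),
    ],
)


def bugs_as_of(as_of_date):
    """Return each bug's most recent revision on or before as_of_date."""
    query = """
    SELECT h.BugID, h.Modified, h.AssignedTo, h.Priority, h.Status
    FROM bug_history AS h
    JOIN (
        SELECT BugID, MAX(Modified) AS last_modified
        FROM bug_history
        WHERE Modified <= ?
        GROUP BY BugID
    ) AS latest
      ON latest.BugID = h.BugID AND latest.last_modified = h.Modified
    ORDER BY h.BugID
    """
    return conn.execute(query, (as_of_date,)).fetchall()


print(bugs_as_of("2003-03-01"))  # bug 27 did not exist yet
print(bugs_as_of("2003-04-07"))  # bug 27 as it stood on April 7
```

On March 1 the result is empty, and on April 7 bug 27 appears once, assigned to mel with priority 1 and status Open. If two revisions of a bug share the same Modified value, this formulation returns both, whereas ROW_NUMBER() picks exactly one.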

Selecting records from the past three months

I have 2 tables from which I need to run a query to display the number of views a user had in the last 3 months from now.
So far I have come up with the following; all the field types are correct.
SELECT dbo_LU_USER.USERNAME
, Count(*) AS No_of_Sessions
FROM dbo_SDB_SESSION
INNER JOIN dbo_LU_USER
ON dbo_SDB_SESSION.FK_USERID = dbo_LU_USER.PK_USERID
WHERE (((DateDiff("m",[dbo_SDB_SESSION].[SESSIONSTART],Now()))=0
Or (DateDiff("m",[dbo_SDB_SESSION].[SESSIONSTART],Now()))=1
Or (DateDiff("m",[dbo_SDB_SESSION].[SESSIONSTART],Now()))=2))
GROUP BY dbo_LU_USER.USERNAME;
Basically, the code above displays all records within the past 3 months; however, it starts from the 1st day of the month and ends on the current date, whereas I need it to start 3 months prior to today's date.
Also, to let you know, this is the SQL View in MS Access 2007.
Thanks in advance
Depending on how "strictly" you define your 3-month rule, you could make things a lot easier, and probably more efficient, by trying this:
SELECT dbo_LU_USER.USERNAME, Count(*) AS No_of_Sessions
FROM dbo_SDB_SESSION
INNER JOIN dbo_LU_USER
ON dbo_SDB_SESSION.FK_USERID = dbo_LU_USER.PK_USERID
WHERE [dbo_SDB_SESSION].[SESSIONSTART] Between DateAdd("d",-90,Now()) And Now()
GROUP BY dbo_LU_USER.USERNAME;
(Please understand that my MS SQL is a bit rusty and I can't test this at the moment. The idea is to make the query scan all records whose date is between "TODAY - 90 days" and "TODAY".)
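The rolling 90-day window can be checked against sample data before running it in Access. The sketch below uses SQLite via Python, so the syntax differs from Access SQL; the simplified table names, the helper `sessions_last_90_days`, and the sample rows are all invented for illustration. The current time is passed in as a parameter so the result is repeatable.

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lu_user (pk_userid INTEGER, username TEXT)")
conn.execute("CREATE TABLE sdb_session (fk_userid INTEGER, sessionstart TEXT)")
conn.executemany("INSERT INTO lu_user VALUES (?, ?)", [(1, "alice"), (2, "bob")])
conn.executemany(
    "INSERT INTO sdb_session VALUES (?, ?)",
    [
        (1, "2022-05-20 09:00:00"),
        (1, "2022-03-10 14:00:00"),
        (1, "2022-01-15 08:00:00"),  # more than 90 days before the test date
        (2, "2022-02-28 23:00:00"),  # more than 90 days before the test date
    ],
)


def sessions_last_90_days(now):
    """Count sessions per user between (now - 90 days) and now, inclusive."""
    cutoff = now - timedelta(days=90)
    fmt = "%Y-%m-%d %H:%M:%S"
    query = """
    SELECT u.username, COUNT(*) AS no_of_sessions
    FROM sdb_session AS s
    JOIN lu_user AS u ON s.fk_userid = u.pk_userid
    WHERE s.sessionstart >= ? AND s.sessionstart <= ?
    GROUP BY u.username
    ORDER BY u.username
    """
    return conn.execute(
        query, (cutoff.strftime(fmt), now.strftime(fmt))
    ).fetchall()


counts = sessions_last_90_days(datetime(2022, 6, 1, 12, 0, 0))
print(counts)
```

With the reference time set to noon on 2022-06-01, the window starts at noon on 2022-03-03, so alice has 2 sessions in range and bob has none and does not appear. Filtering on the raw column this way, rather than wrapping it in DateDiff, is also what makes the window start at an exact point in time instead of the first day of a month.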