My question is specific to my problem, so I will explain the scenario first. I need to write a SQL query. The scenario is as follows:
Table columns for table1:
effective_date
expire_date
amount_value
state_id
amount_entry_type
Case 1, Input values:
input_date
For a single input_date, I get the sum of amount_value per state using the following SQL query:
Sample Query:
select state_id, sum(amount_value)
from table1
where state_id=3 and (input_date between effective_date and expire_date)
group by state_id;
My Question:
Now I have a date range, and I wish to achieve the above for every date within that range.
Input values 2:
input_start_date
input_end_date
Expected Output:
Find the sum of amount_value grouped by state where input_date is between effective_date and expire_date, for every input_date between input_start_date and input_end_date.
So the query would give the following sample result for the date range 2016-07-07 to 2016-07-08:
state amount_sum date
California 100 2016-07-07
Florida 200 2016-07-08
I'm using postgres as database and django for querying and processing the result.
Options:
1. Fetch all the data and process using python.
2. Loop over given date range and fire the query above as:
for input_date in date_range(input_start_date, input_end_date):
    # execute the query above for input_date
Both of these options might have performance issues, so I was wondering whether I could achieve this with a single SQL query.
You can indeed do this with a single query, using the generate_series() set-returning function to make the list of days. If you are sure that every date has corresponding rows for the state, you can use a regular JOIN; otherwise use a LEFT JOIN as below.
SELECT state_id, sum(amount_value), dt AS "date"
FROM generate_series(input_start_date, input_end_date, '1 day') dates(dt)
LEFT JOIN table1 ON state_id = 3 AND (dt BETWEEN effective_date AND expire_date)
GROUP BY state_id, dt;
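The same idea can be tried out locally without a Postgres server. The following is a minimal sketch, not the answer's exact query: it uses Python's sqlite3 module, and because SQLite's core has no generate_series(), a recursive CTE stands in for it. It also groups across all states with an inner JOIN (as in the expected output of the question) rather than filtering on state_id = 3. The sample rows and the helper name daily_sums are made up for illustration.

```python
import sqlite3


def daily_sums(conn, start, end):
    """Return (state_id, day, amount_sum) for each day in [start, end]."""
    sql = """
    WITH RECURSIVE dates(dt) AS (
        -- recursive CTE standing in for Postgres generate_series()
        SELECT :start
        UNION ALL
        SELECT date(dt, '+1 day') FROM dates WHERE dt < :end
    )
    SELECT t.state_id, d.dt, SUM(t.amount_value)
    FROM dates AS d
    JOIN table1 AS t
      ON d.dt BETWEEN t.effective_date AND t.expire_date
    GROUP BY t.state_id, d.dt
    ORDER BY d.dt, t.state_id
    """
    return conn.execute(sql, {"start": start, "end": end}).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (effective_date TEXT, expire_date TEXT, "
    "amount_value INTEGER, state_id INTEGER)"
)
# Hypothetical sample data; dates are stored as ISO strings.
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [
        ("2016-07-01", "2016-07-07", 100, 1),
        ("2016-07-05", "2016-07-08", 50, 1),
        ("2016-07-08", "2016-07-31", 200, 2),
    ],
)
rows = daily_sums(conn, "2016-07-07", "2016-07-08")
```

Because this sketch uses an inner JOIN, a day on which no row is in effect simply does not appear in the result; switching to a LEFT JOIN keeps such days with a NULL state_id and sum.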
I have a sqlite3 database maintained on an AWS exchange that is regularly updated by a Python script. One of the things it tracks is when any team generates a new post for a given topic. The entries look something like this:
id    client            team      date        industry      city
895   acme industries   blueteam  2022-06-30  construction  springfield
I'm trying to create a table that shows me how many entries for construction occur each day. Right now, the entries with data populate, but they exclude dates with no entries. For example, if I search for just
SELECT date, count(id) as num_records
from mytable
WHERE industry = "construction"
group by date
order by date asc
I'll get results that look like this:
date        num_records
2022-04-01  3
2022-04-04  1
How can I make sqlite output like this:
date        num_records
2022-04-01  3
2022-04-02  0
2022-04-03  0
2022-04-04  1
I'm trying to generate some graphs from this data and need to be able to include all dates for the target timeframe.
EDIT/UPDATE:
The table does not already include every date; it only includes dates relevant to an entry. If no team posts work on a day, the date column will jump from day 1 (e.g. 2022-04-01) to day 3 (2022-04-03).
Assuming that your "mytable" table contains all the dates you need, you can first select all of your dates, then LEFT JOIN your own query onto them, and map the resulting NULL values of the "num_records" field to 0 using the COALESCE function.
WITH cte AS (
SELECT date,
COUNT(id) AS num_records
FROM mytable
WHERE industry = 'construction'
GROUP BY date
)
-- DISTINCT keeps one output row per date
SELECT dates.date,
COALESCE(cte.num_records, 0) AS num_records
FROM (SELECT DISTINCT date FROM mytable) dates
LEFT JOIN cte
ON dates.date = cte.date
ORDER BY dates.date
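Since the update to the question says the table does not contain every date, the assumption above may not hold. One possible workaround is to generate the calendar instead of reading it from the table. The sketch below, run through Python's sqlite3 module, builds the days with a recursive CTE and LEFT JOINs the table onto them; the function name counts_per_day, the start/end parameters, and the sample rows are assumptions for illustration.

```python
import sqlite3


def counts_per_day(conn, industry, start, end):
    """Count rows per day in [start, end], including days with no rows."""
    sql = """
    WITH RECURSIVE calendar(day) AS (
        SELECT :start
        UNION ALL
        SELECT date(day, '+1 day') FROM calendar WHERE day < :end
    )
    SELECT c.day, COUNT(m.id) AS num_records  -- COUNT skips NULLs, giving 0
    FROM calendar AS c
    LEFT JOIN mytable AS m
      ON m.date = c.day AND m.industry = :industry
    GROUP BY c.day
    ORDER BY c.day
    """
    params = {"industry": industry, "start": start, "end": end}
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, date TEXT, industry TEXT)")
# Hypothetical rows: nothing for construction on 2022-04-02 or 2022-04-03.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, "2022-04-01", "construction"),
        (2, "2022-04-01", "construction"),
        (3, "2022-04-01", "construction"),
        (4, "2022-04-02", "retail"),
        (5, "2022-04-04", "construction"),
    ],
)
result = counts_per_day(conn, "construction", "2022-04-01", "2022-04-04")
```

The industry filter sits in the ON clause rather than in WHERE, so days without a matching row survive the LEFT JOIN and are counted as 0.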
Consider a time-series table that contains three fields time of type timestamptz, balance of type numeric, and is_spent_column of type text.
The following query generates a valid result for the last day of the given interval.
SELECT
MAX(DATE_TRUNC('DAY', (time))) as last_day,
SUM(balance) FILTER ( WHERE is_spent_column is NULL ) AS value_at_last_day
FROM tbl
2010-07-12 18681.800775017498741407984000
However, I need an equivalent query based on window functions that reports the running total of the balance column for every day, each total covering all days up to and including that day.
Here is what I've tried so far, but without any valid result:
SELECT
DATE_TRUNC('DAY', (time)) AS daily,
SUM(sum(balance) FILTER ( WHERE is_spent_column is NULL ) ) OVER ( ORDER BY DATE_TRUNC('DAY', (time)) ) AS total_value_per_day
FROM tbl
group by 1
order by 1 desc
2010-07-12 16050.496339044977568391974000
2010-07-11 13103.159119670350269890284000
2010-07-10 12594.525752964512456914454000
2010-07-09 12380.159588711091681327014000
2010-07-08 12178.119542536668113577014000
2010-07-07 11995.943973804127033140014000
EDIT:
Here is a sample dataset:
LINK REMOVED
The running total can be computed by applying the first query above on the entire dataset up to and including the desired day. For example, for day 2009-01-31, the result is 97.13522530000000000000, or for day 2009-01-15 when we filter time as time < '2009-01-16 00:00:00' it returns 24.446144000000000000.
What I need is an alternative query that computes the running total for each day in a single query.
EDIT 2:
Thank you all so very much for your participation and support.
The reason for differences in result sets of the queries was on the preceding ETL pipelines. Sorry for my ignorance!
Below I've provided a sample schema to test the queries.
https://www.db-fiddle.com/f/veUiRauLs23s3WUfXQu3WE/2
Now both queries given above and the query given in the answer below return the same result.
Consider calculating the running total with a window function after aggregating the data to day level. And since you aggregate with a single condition, the FILTER condition can be converted to a plain WHERE clause:
SELECT daily,
SUM(total_balance) OVER (ORDER BY daily) AS total_value_per_day
FROM (
SELECT
DATE_TRUNC('DAY', (time)) AS daily,
SUM(balance) AS total_balance
FROM tbl
WHERE is_spent_column IS NULL
GROUP BY 1
) AS daily_agg
ORDER BY daily
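To check that the windowed query agrees with the "filter up to the day" approach from the question, here is a small sketch using Python's sqlite3 module. It assumes an SQLite build with window-function support (3.25 or later), uses date(time) in place of DATE_TRUNC, and runs on a few made-up rows; the helper name total_up_to is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (time TEXT, balance REAL, is_spent_column TEXT)")
# Hypothetical sample rows; the spent row must not contribute to any total.
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [
        ("2009-01-15 10:00:00", 10.0, None),
        ("2009-01-15 18:30:00", 5.0, None),
        ("2009-01-16 09:00:00", 2.5, "spent"),
        ("2009-01-17 12:00:00", 4.0, None),
    ],
)

# Aggregate to day level first, then accumulate with a window function.
running = conn.execute(
    """
    SELECT daily, SUM(total_balance) OVER (ORDER BY daily) AS total_value_per_day
    FROM (
        SELECT date(time) AS daily, SUM(balance) AS total_balance
        FROM tbl
        WHERE is_spent_column IS NULL
        GROUP BY date(time)
    ) AS daily_agg
    ORDER BY daily
    """
).fetchall()


def total_up_to(day):
    """Reference value: sum of all unspent balances before the next midnight."""
    return conn.execute(
        "SELECT SUM(balance) FROM tbl "
        "WHERE is_spent_column IS NULL AND time < date(?, '+1 day')",
        (day,),
    ).fetchone()[0]
```

Each row of running should match total_up_to for the same day, which is the per-day check described in the question.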
I would like to get the output for the overlapping date records
> Data:
> Id Open_date Closed_Date
> 1 2016-01-01 2017-01-01
> 1 2016-12-31 2018-21-01
> 1 2016-01-01 2018-01-01
> 2 2017-01-01 2018-02-02
Here, you can see that the second and third records start earlier than the Closed_Date of the preceding record. I need to identify records of that type.
As your question is not very clear, I am assuming that you are looking for the minimum open date and the maximum close date.
If this is not the requirement, edit the question to provide more details.
select id, min(Open_date), max(Closed_Date)
from table
group by id
Looks like you want to normalize a Slowly Changing Dimension Type 2. Of course the best way to handle them would be using Temporal tables using either Teradata or ANSI syntax.
There's a nice syntax in Teradata to get your expected result based on the Period data type, and it's simple to cast your begin/end dates to a period:
SELECT id,
-- split the period back into separate dates
Begin(pd) AS Open_date,
End(pd) AS Closed_Date
FROM
(
SELECT NORMALIZE -- magic keyword :-)
id, PERIOD(Open_date, Closed_Date) AS pd
FROM tab
) AS dt
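For readers without access to Teradata, the effect of NORMALIZE can be illustrated outside the database. The following Python sketch is only a conceptual illustration of merging periods that overlap or meet per id; it is not how Teradata implements it, and the function name normalize_periods and the sample rows are made up.

```python
from datetime import date
from itertools import groupby
from operator import itemgetter


def normalize_periods(rows):
    """Merge (id, open_date, closed_date) rows whose periods overlap or meet."""
    merged = []
    # Sorting puts rows of the same id together, ordered by open date.
    for key, group in groupby(sorted(rows), key=itemgetter(0)):
        cur_start = cur_end = None
        for _, start, end in group:
            if cur_start is None:
                cur_start, cur_end = start, end
            elif start <= cur_end:
                # Overlapping or adjacent: extend the current period.
                cur_end = max(cur_end, end)
            else:
                # Gap found: emit the finished period and start a new one.
                merged.append((key, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((key, cur_start, cur_end))
    return merged


# Hypothetical sample data (not the rows from the question).
sample = [
    (1, date(2016, 1, 1), date(2017, 1, 1)),
    (1, date(2016, 12, 31), date(2018, 1, 1)),
    (1, date(2019, 1, 1), date(2019, 6, 1)),
    (2, date(2017, 1, 1), date(2018, 2, 2)),
]
periods = normalize_periods(sample)
```

Here the first two rows of id 1 collapse into one period, while the third stays separate because it starts after a gap.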
I want to select rows for a field MRD which is declared as date where it is prior for that date only.
So
(case when sum (transPoints) > 4 and MRD is that same date then 4
So if a row has a date of today, I want the case when to be triggered when the transaction points are bigger than 4 against all columns with the same date.
As you can imagine the date field will be different against many rows.
Based on what I can understand from your question, it seems that the GROUP BY clause may be what you're looking for. If your date column is in the correct format then you may have to use something like:
SELECT CAST(DateColumn as DATE)
FROM YourTable
GROUP BY CAST(DateColumn as DATE)
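If the goal is to cap the summed points at 4 per date, which is one possible reading of the question, the GROUP BY can be combined with the CASE expression. The sketch below uses Python's sqlite3 module with date() in place of CAST(... AS DATE); the table name trans and the sample rows are assumptions, while MRD and transPoints come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "trans" is a hypothetical table name; MRD carries a time part here.
conn.execute("CREATE TABLE trans (MRD TEXT, transPoints INTEGER)")
conn.executemany(
    "INSERT INTO trans VALUES (?, ?)",
    [
        ("2024-01-01 09:00:00", 3),
        ("2024-01-01 17:45:00", 2),
        ("2024-01-02 08:15:00", 1),
        ("2024-01-02 11:30:00", 2),
    ],
)

# Sum the points of all rows sharing the same calendar date, capped at 4.
capped = conn.execute(
    """
    SELECT date(MRD) AS day,
           CASE WHEN SUM(transPoints) > 4 THEN 4
                ELSE SUM(transPoints)
           END AS points
    FROM trans
    GROUP BY date(MRD)
    ORDER BY day
    """
).fetchall()
```

The first day sums to 5 and is capped at 4, while the second day sums to 3 and is returned unchanged.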
I'm trying to compare data from an Access 2010 database based on a date interval. For example, I have items from various purchase orders and I want to keep the history of each item's delivery to a warehouse. A purchase order might request a quantity of 10 of a material, which can be delivered partially over many deliveries, and I want to know how the delivered quantity varied within a date interval.
The date field is filled as follows: when an item's QtyPending field is updated, I copy the current row and deactivate it with a boolean field, then create a new entry with the update date and the new QtyPending value, so the active record is the current state of the item. So I have a table that holds information about these items like this:
PO POItem QtyPending Date Active
4500000123 10 10 01/09/2014 FALSE
4500000123 10 8 05/09/2014 TRUE
4500000122 30 5 03/09/2014 FALSE
4500000122 30 1 04/09/2014 TRUE
In this example, for the first item, QtyPending did not change from 01/09 to 04/09, meaning that the supplier made no delivery to me, but from 01/09 to 05/09 he delivered a quantity of 2 of the material. For the second item, from 03/09 to 04/09 the supplier delivered a quantity of 4 of the material. So, if I ran a report query from 02/09/2014 to 04/09/2014, the expected output would be:
PO POItem QtyDelivered
4500000123 10 0
4500000122 30 4
And a report from 31/08/2014 to 10/09/2014, would have this output
PO POItem QtyDelivered
4500000123 10 2
4500000122 30 4
I'm not coming up with a query to make this report. Can anyone help me?
There are many ways of solving this. The easiest one would be to simply make a query of all the necessary records between two dates, loop over them and insert into a temporary table the result. This temporary table can then be the source of your report. A lot of people will scream at you for not using a big query instead but getting the result that you want in the fastest and simplest way should be your priority.
Your problem with your schema is that you don't have the QtyDelivered stored for each record. If you would have it, it would be an easy thing to sum over it in order to get needed result. By not storing this value, you have transformed a simple and fast query into a much harder and slower one because you need to recalculate this value in some way or other and you must do this without forgetting the fact that it's possible to have more than two records.
To calculate this value, you can either use a sub-query to retrieve the value from the previous row or a Left Join to do the same. Once you have this value, you can subtract the two to get the needed difference, allowing for the possibility of a Null value if there is no previous row. Once you have these differences, you can sum over them with a Group By to get the final result. Notice that in order to perform these calculations, you need one or two more levels of subquery. The first query should be something like:
Select PO, POItem, QtyPending,
(Select Top 1 QtyPending from MyTable T2
where T1.PO = T2.PO and T2.Date < T1.Date And (T2.Date between #Date1 and #Date2)
Order by T2.Date Desc) as QtyPending2
from MyTable T1
Where T1.Date between #Date1 and #Date2) ...
With this as either another subquery or as a View, you can then compute the desired difference by comparing the values of QtyPending and QtyPending2, without forgetting that QtyPending2 may be Null. The remaining steps are easy to do.
Notice that the above example is for SQL-Server, you might have to change it a little for Access. In any case, you can find here many examples on how to compare two rows under Access. As noted earlier, you can also use a Left Join instead of a subquery to compare your rows.
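The previous-row approach described above can be sketched end to end as follows. This is an illustration in SQLite via Python's sqlite3 module, not Access SQL: it uses LIMIT 1 where Access would use TOP 1, stores dates as ISO strings, also matches on POItem when looking up the previous row, and the table name items and helper qty_delivered are assumptions. The rows are the four sample rows from the question.

```python
import sqlite3

DELIVERED_SQL = """
SELECT PO, POItem, SUM(COALESCE(PrevQty - QtyPending, 0)) AS QtyDelivered
FROM (
    SELECT t1.PO, t1.POItem, t1.QtyPending,
           -- QtyPending of the closest earlier row inside the interval
           (SELECT t2.QtyPending
              FROM items AS t2
             WHERE t2.PO = t1.PO AND t2.POItem = t1.POItem
               AND t2.Date < t1.Date
               AND t2.Date BETWEEN :start AND :end
             ORDER BY t2.Date DESC
             LIMIT 1) AS PrevQty
    FROM items AS t1
    WHERE t1.Date BETWEEN :start AND :end
)
GROUP BY PO, POItem
ORDER BY PO
"""


def qty_delivered(conn, start, end):
    """Sum the drops in QtyPending per item between start and end."""
    return conn.execute(DELIVERED_SQL, {"start": start, "end": end}).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (PO INTEGER, POItem INTEGER, QtyPending INTEGER, "
    "Date TEXT, Active INTEGER)"
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?, ?, ?)",
    [
        (4500000123, 10, 10, "2014-09-01", 0),
        (4500000123, 10, 8, "2014-09-05", 1),
        (4500000122, 30, 5, "2014-09-03", 0),
        (4500000122, 30, 1, "2014-09-04", 1),
    ],
)
narrow = qty_delivered(conn, "2014-09-02", "2014-09-04")
wide = qty_delivered(conn, "2014-08-31", "2014-09-10")
```

Note one difference from the expected output in the question: an item with no rows inside the interval (the first item in the narrow range) is absent from the result instead of being listed with 0. Listing it would need an extra outer join from the full list of items.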
I came up with this query, which solved the problem; it wasn't that simple:
SELECT
ItmDtIni.PO
,ItmDtIni.POItem AS [PO Item]
,ROUND(ItmDtIni.QtyPending - ItmDtEnd.QtyPending, 3) AS [Qty Delivered]
,ROUND((ItmDtIni.QtyPending - ItmDtEnd.QtyPending) * ItmDtEnd.Price, 2) AS [Value delivered(US$)]
//Filtering subqueries to bring only the items in the date interval to make a self join
FROM (((SELECT
PO
,POItem
,QtyPending
,MIN(Date) AS MinDate
FROM Item
WHERE Date BETWEEN FORMAT(begin_date, 'dd/mm/yyyy') AND FORMAT(end_date, 'dd/mm/yyyy')
GROUP BY
PO
,POItem
,QtyPending) AS ItmDtIni
//Self join filtering to bring only items in the date interval with the previously filtered table
INNER JOIN (SELECT
PO
,POItem
,QtyPending
,Price
,MAX(Date) AS MaxDate
FROM Item
WHERE Date BETWEEN FORMAT(begin_date, 'dd/mm/yyyy') AND FORMAT(end_date, 'dd/mm/yyyy')
GROUP BY
PO
,POItem
,QtyPending
,Price) AS ItmDtEnd
ON ItmDtIni.PO = ItmDtEnd.PO
AND ItmDtIni.POItem = ItmDtEnd.POItem)
INNER JOIN PO
ON ItmDtEnd.PO = PO.Numero)
WHERE
//Showing only items that had a variation in the date interval
ROUND(ItmDtIni.QtyPending - ItmDtEnd.QtyPending, 3) <> 0
//Anchoring min date in the interval for each item found by the first subquery
AND ItmDtIni.MinDate = (SELECT MIN(Item.Date)
FROM Item
WHERE
ItmDtIni.PO = Item.PO
AND ItmDtIni.POItem = Item.POItem
AND Date BETWEEN FORMAT(begin_date, 'dd/mm/yyyy') AND FORMAT(end_date, 'dd/mm/yyyy'))
//Anchoring max date in the interval for each item found by the second subquery
AND ItmDtEnd.MaxDate = (SELECT MAX(Item.Date)
FROM Item
WHERE
ItmDtEnd.PO = Item.PO
AND ItmDtEnd.POItem = Item.POItem
AND Date BETWEEN FORMAT(begin_date, 'dd/mm/yyyy') AND FORMAT(end_date, 'dd/mm/yyyy'))