Avoiding roundtrips in the database caused by looping - sql

I am using Postgres, and I recently noticed that my code makes too many roundtrips to the database.
What I am doing is basically getting data from a table on a daily basis because I have to look for changes on a daily basis, but the whole function that does this job is called once a month.
An example of my table
Amount
Id | Itemid | Amount | Date
1 | 2 | 50 | 20-5-20
Now this table can be updated to add items at any point in time and I have to see the total amount that is SUM(Amount) every day.
But here's the catch, I have to add interest to the amount of each day at the rate of 5%.
So I can't just once call the function, I have to look at its value every day.
For example, if I add an item of $50 on the 1st of May, then the interest on that day is 5/100*50.
I add another item worth $50 on the 5th of May, and now the interest on the 5th is 5/100*100.
But prior to the 5th, the interest was on only $50, so simply using SUM(Amount)*5/100 for the whole month is wrong.
Also, another issue is the fact that dates are stored as timestamps and I need to group it by date of the timestamp because if I group it on the basis of timestamp then it will create multiple rows for the same date which I want to avoid while taking the sum.
So if there are two entries on the same date but different hours ideally the query should sum it up as one single date.
Example
Amount Table
Date | Amount
2020-5-5 20:8:8 | 100
2020-5-5 7:8:8 | 100
Result should be
Amount Table
Date | Amount
2020-5-5 | 200
My current code.
for i in numberofdaysinthemonth:
    amount = amount + session.query(func.sum(Amount.Amount)).filter(Amount.date<current_date).scalar() * 5/100
I want a query that gets all these values according to dates, for example
date | Sum of amount till that date
20-5-20 | 50
20-6-20 | 100
Any ideas on how to avoid a loop that runs 30 times whenever the monthly function is called?

I am supposed to get all this data in a table daywise and aggregated as the sum of amount for each day
That is a simple "running total"
select "date",
sum(amount) over (order by "date") as amount_til_date
from the_table
order by "date";
If you need the amount per itemid
select "date",
sum(amount) over (partition by itemid order by "date") as amount_til_date
from the_table
order by "date";
If you also need to calculate the "compound interest rate" up to that day, you can do that as well:
select itemid,
"date",
sum(amount) over (partition by itemid order by "date") as amount_til_date,
sum(amount) over (partition by itemid order by "date") * power(1.05, count(*) over (partition by itemid order by "date")) as compound_interest
from the_table
order by "date";
To get that for a specific month, add a WHERE clause:
where "date" >= date '2020-06-01'
and "date" < date '2020-07-01'
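The question also asks for timestamps to be collapsed into calendar days and for 5% interest on each day's balance. Here is a minimal sketch of that idea, not a definitive implementation: it uses SQLite through Python's sqlite3 module as a stand-in for PostgreSQL, a hypothetical table amounts(itemid, amount, created_at), and assumes simple (non-compound) interest of 5% on the running balance. In PostgreSQL, date(created_at) would typically be written created_at::date.

```python
import sqlite3


def daily_running_totals(rows):
    """Return (day, running_total, interest) per calendar day using one query."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table layout, chosen only for this illustration.
    conn.execute(
        "CREATE TABLE amounts (itemid INTEGER, amount NUMERIC, created_at TEXT)"
    )
    conn.executemany("INSERT INTO amounts VALUES (?, ?, ?)", rows)
    query = """
        SELECT day,
               SUM(day_total) OVER (ORDER BY day) AS amount_til_date,
               SUM(day_total) OVER (ORDER BY day) * 0.05 AS interest
        FROM (
            -- First collapse all timestamps of one calendar day into one row.
            SELECT date(created_at) AS day, SUM(amount) AS day_total
            FROM amounts
            GROUP BY date(created_at)
        ) AS per_day
        ORDER BY day
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


rows = [
    (1, 50, "2020-05-01 09:00:00"),
    (2, 100, "2020-05-05 07:08:08"),
    (3, 100, "2020-05-05 20:08:08"),  # same day as the previous row
]
totals = daily_running_totals(rows)
for day, running_total, interest in totals:
    print(day, running_total, interest)
```

This returns one row per day that has at least one entry. If every day of the month is needed, including days without entries, the daily aggregate would have to be joined to a calendar, for example one built with generate_series in PostgreSQL.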

In general, to avoid round trips between the application and the database, application logic can be moved into the database as stored code (stored procedures and stored functions) written in a procedural language. This approach is sometimes called "thick database" in commercial databases like Oracle Database.
PostgreSQL's default procedural language is PL/pgSQL, but you can also use Java, Perl, Python, or JavaScript through extensions that you would need to install in PostgreSQL.

Related

SQL: Apply an aggregate result per day using window functions

Consider a time-series table that contains three fields time of type timestamptz, balance of type numeric, and is_spent_column of type text.
The following query generates a valid result for the last day of the given interval.
SELECT
MAX(DATE_TRUNC('DAY', (time))) as last_day,
SUM(balance) FILTER ( WHERE is_spent_column is NULL ) AS value_at_last_day
FROM tbl
2010-07-12 18681.800775017498741407984000
However, I am in need of an equivalent query based on window functions to report the total value of the column named balance for all the days up to and including the given date .
Here is what I've tried so far, but without any valid result:
SELECT
DATE_TRUNC('DAY', (time)) AS daily,
SUM(sum(balance) FILTER ( WHERE is_spent_column is NULL ) ) OVER ( ORDER BY DATE_TRUNC('DAY', (time)) ) AS total_value_per_day
FROM tbl
group by 1
order by 1 desc
2010-07-12 16050.496339044977568391974000
2010-07-11 13103.159119670350269890284000
2010-07-10 12594.525752964512456914454000
2010-07-09 12380.159588711091681327014000
2010-07-08 12178.119542536668113577014000
2010-07-07 11995.943973804127033140014000
EDIT:
Here is a sample dataset:
LINK REMOVED
The running total can be computed by applying the first query above on the entire dataset up to and including the desired day. For example, for day 2009-01-31, the result is 97.13522530000000000000, or for day 2009-01-15 when we filter time as time < '2009-01-16 00:00:00' it returns 24.446144000000000000.
What I need is an alternative query that computes the running total for each day in a single query.
EDIT 2:
Thank you all so very much for your participation and support.
The reason for differences in result sets of the queries was on the preceding ETL pipelines. Sorry for my ignorance!
Below I've provided a sample schema to test the queries.
https://www.db-fiddle.com/f/veUiRauLs23s3WUfXQu3WE/2
Now both queries given above and the query given in the answer below return the same result.
Consider calculating running total via window function after aggregating data to day level. And since you aggregate with a single condition, FILTER condition can be converted to basic WHERE:
SELECT daily,
SUM(total_balance) OVER (ORDER BY daily) AS total_value_per_day
FROM (
SELECT
DATE_TRUNC('DAY', (time)) AS daily,
SUM(balance) AS total_balance
FROM tbl
WHERE is_spent_column IS NULL
GROUP BY 1
) AS daily_agg
ORDER BY daily
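To check that the single query agrees with running the first query once per day, here is a small sketch. It assumes SQLite via Python's sqlite3 as a stand-in for PostgreSQL (so date("time") replaces DATE_TRUNC('DAY', time)) and uses made-up sample rows; the helper total_up_to is hypothetical and only exists for the comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE tbl ("time" TEXT, balance REAL, is_spent_column TEXT)"""
)
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [
        ("2009-01-09 10:00:00", 50.0, None),
        ("2009-01-09 18:30:00", 25.0, None),
        ("2009-01-10 08:00:00", 10.0, "spent"),  # excluded by the WHERE clause
        ("2009-01-11 12:00:00", 5.0, None),
    ],
)

# Aggregate to day level first, then take the running total over the days.
running = conn.execute(
    """
    SELECT daily, SUM(total_balance) OVER (ORDER BY daily) AS total_value_per_day
    FROM (
        SELECT date("time") AS daily, SUM(balance) AS total_balance
        FROM tbl
        WHERE is_spent_column IS NULL
        GROUP BY date("time")
    ) AS daily_agg
    ORDER BY daily
    """
).fetchall()


def total_up_to(day):
    """One query per day: the loop-style approach the window function replaces."""
    (value,) = conn.execute(
        """
        SELECT SUM(balance) FROM tbl
        WHERE is_spent_column IS NULL AND date("time") <= ?
        """,
        (day,),
    ).fetchone()
    return value


brute_force = [(day, total_up_to(day)) for day, _ in running]
print(running == brute_force)
```

Note that a day whose rows are all filtered out (2009-01-10 in this sample) does not appear in the result at all.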

How many customers upgraded from Product A to Product B?

I have a "daily changes" table that records when a customer "upgrades" or "downgrades" their membership level. In the table, let's say field 1 is customer ID, field 2 is membership type and field 3 is the date of change. Customers 123 and ABC each have two rows in the table. Values in field 1 (ID) are the same, but values in field 2 (TYPE) and 3 (DATE) are different. I'd like to write a SQL query to tell me how many customers "upgraded" from membership type 1 to membership type 2 and how many customers "downgraded" from membership type 2 to membership type 1 in any given time frame.
The table also shows other types of changes. To identify the records with changes in the membership type field, I've created the following code:
SELECT *
FROM member_detail_daily_changes_new
WHERE customer IN (
SELECT customer
FROM member_detail_daily_changes_new
GROUP BY customer
HAVING COUNT(distinct member_type_cd) > 1)
I'd like to see an end report which tells me:
For Fiscal 2018,
X,XXX customers moved from Member Type 1 to Member Type 2 and
X,XXX customers moved from Member Type 2 to Member type 1
Sounds like a good time to use the LEAD() analytic function to look ahead to a given customer's next member_Type, compare it to the current record, evaluate whether that's an upgrade or a downgrade, and then sum the results.
DEMO
WITH CTE AS (SELECT case when lead(Member_Type_Code) over (partition by Customer order by date asc) > member_Type_Code then 1 else 0 end as Upgrade
, case when lead(Member_Type_Code) over (partition by Customer order by date asc) < member_Type_Code then 1 else 0 end as DownGrade
FROM member_detail_daily_changes_new
WHERE Date between '20190101' and '20190201')
SELECT sum(Upgrade) upgrades, sum(downgrade) downgrades
FROM CTE
Giving us, using my sample data:
+----+----------+------------+
| | upgrades | downgrades |
+----+----------+------------+
| 1 | 3 | 2 |
+----+----------+------------+
I'm not sure if SQL express on rex tester just doesn't support the sum() on the analytic itself which is why I had to add the CTE or if that's a rule in non-SQL express versions too.
Some other notes:
I let the system implicitly cast the dates in the where clause
I assume the member_Type_Code itself tells me if it's an upgrade or downgrade which long term probably isn't right. Say we add membership type 3 and it goes between 1 and 2... now what... So maybe we need a decimal number outside of the Member_Type_Code so we can handle future memberships and if it's an upgrade/downgrade or a lateral...
I assumed all upgrades/downgrades are counted and a user can be counted multiple times if membership changed that often in time period desired.
I assume an upgrade/downgrade can't occur on the same date/time. Otherwise the sorting for lead may not work right. (but if it's a timestamp field we shouldn't have an issue)
So how does this work?
We use a common table expression (CTE) to generate the desired upgrade/downgrade evaluations per customer. This could also be done in-line in a derived table, but I find CTEs easier to read. Then we sum it up.
Lead(Member_Type_Code) over (partition by customer order by date asc) does the following
It organizes the data by customer and then sorts it by date in ascending order.
So we end up getting all the same customers records in subsequent rows ordered by date. Lead(field) then starts on record 1 and Looks ahead to record 2 for the same customer and returns the Member_Type_Code of record 2 on record 1. We then can compare those type codes and determine if an upgrade or downgrade occurred. We then are able to sum the results of the comparison and provide the desired totals.
And now we have a long winded explanation for a very small query :P
You want to use lag() for this, but you need to be careful about the date filtering. So, I think you want:
SELECT prev_membership_type, membership_type,
COUNT(*) as num_changes,
COUNT(DISTINCT customer_id) as num_members
FROM (SELECT mddc.*,
LAG(mddc.membership_type) OVER (PARTITION BY mddc.customer_id ORDER BY mddc.date) as prev_membership_type
FROM member_detail_daily_changes_new mddc
) mddc
WHERE prev_membership_type <> membership_type AND
date >= '2018-01-01' AND
date < '2019-01-01'
GROUP BY membership_type, prev_membership_type;
Notes:
The filtering on date needs to occur after the calculation of lag().
This takes into account that members may have a certain type in 2017 and then change to a new type in 2018.
The date filtering is compatible with indexes.
Two values are calculated. One is the overall number of changes. The other counts each member only once for each type of change.
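To illustrate why the date filter has to come after LAG(), here is a small sketch with made-up rows. It assumes SQLite via Python's sqlite3 as a stand-in for the real database, and the column names customer_id, membership_type and change_date are hypothetical. Customer 123 has their previous type recorded in 2017 and still counts as a 2018 upgrade.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE member_detail_daily_changes_new (
        customer_id TEXT, membership_type INTEGER, change_date TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO member_detail_daily_changes_new VALUES (?, ?, ?)",
    [
        ("123", 1, "2017-11-03"),
        ("123", 2, "2018-02-10"),  # upgrade in 2018, previous row is from 2017
        ("ABC", 2, "2018-01-15"),
        ("ABC", 1, "2018-06-01"),  # downgrade in 2018
        ("XYZ", 1, "2018-03-01"),
        ("XYZ", 2, "2019-02-01"),  # upgrade, but outside the 2018 window
    ],
)

changes = conn.execute(
    """
    SELECT prev_membership_type, membership_type,
           COUNT(*) AS num_changes,
           COUNT(DISTINCT customer_id) AS num_members
    FROM (
        -- LAG() sees every row of the customer, including rows before 2018.
        SELECT customer_id, membership_type, change_date,
               LAG(membership_type) OVER (
                   PARTITION BY customer_id ORDER BY change_date
               ) AS prev_membership_type
        FROM member_detail_daily_changes_new
    ) AS with_prev
    WHERE prev_membership_type <> membership_type
      AND change_date >= '2018-01-01'
      AND change_date < '2019-01-01'
    GROUP BY prev_membership_type, membership_type
    ORDER BY prev_membership_type, membership_type
    """
).fetchall()

for prev_type, new_type, num_changes, num_members in changes:
    print(prev_type, "->", new_type, num_changes, num_members)
```

If the date filter were moved into the inner query, the 2017 row for customer 123 would be dropped before LAG() runs and that upgrade would be missed.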
With conditional aggregation after self joining the table:
select
2018 fiscal,
sum(case when m.member_type_cd > t.member_type_cd then 1 else 0 end) upgrades,
sum(case when m.member_type_cd < t.member_type_cd then 1 else 0 end) downgrades
from member_detail_daily_changes_new m inner join member_detail_daily_changes_new t
on
t.customer = m.customer
and
t.changedate = (
select max(changedate) from member_detail_daily_changes_new
where customer = m.customer and changedate < m.changedate
)
where year(m.changedate) = 2018
This will work even if there are more than 2 types of membership level.

how do i calculate the total billed amount for a particular day?

I have a database table with a field billed_amount, which records the billed amount for a particular person, and another field billing_date. Now, I want to display the total billed amount for all people for a particular day, for example today, to generate a day-to-day sales report.
For a particular day you could run the following, changing the date to whatever date you're running it for.
select sum(billed_amount)
from tbl
where billing_date = '2014-07-19'
Note that each database varies with its default date format. (you didn't specify a database)
To get the total for each date ("grouped by" date), you can use the following:
select billing_date, sum(billed_amount)
from tbl
group by billing_date
order by billing_date
You can group by day, and sum the amounts billed:
select cast(billing_date as date)
, sum(billed_amount)
from YourTable
group by
cast(billing_date as date)
Date operations vary by database. In SQL Server 2008+, you can cast a datetime to the date type to strip off the time.

How to query date in oracle?

Assuming I have the following table in oracle:
id|orderdatetime (date type)|foodtype (string type)
1|2013-12-02T00:26:00 | burger
2|2013-12-02T00:20:00 | fries
...
(assume there are many dates and times)
Suppose someone has a date in mind (e.g. "2013-12-02T00:25:00"), even though there is no database entry with that specific time.
Is there some way to query the database so that I get the row whose date time is closest to it without being later than the date in mind (i.e. less than or equal to it)?
(i.e. in this case, the sql query would return the row for "fries" and not "burger" because the time for burger is past the time the user had in mind despite the fact that the time for "burger" is closer.)
select x.* from (select id,orderdatetime,foodtype from orders
where orderdatetime <= YOURTIME order by orderdatetime desc)x
where rownum =1
Another would be:
select * from orders where orderdatetime = (select max(orderdatetime) from orders
where orderdatetime <= YOURTIME)
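As a quick check of the MAX() variant, here is a sketch using the sample rows from the question. It assumes SQLite via Python's sqlite3 instead of Oracle, and stores the date times as ISO-8601 text, which compares correctly as strings. The function name latest_order_at_or_before is made up for this example.

```python
import sqlite3


def latest_order_at_or_before(conn, moment):
    """Return the row with the greatest orderdatetime that is <= moment, or None."""
    return conn.execute(
        """
        SELECT id, orderdatetime, foodtype
        FROM orders
        WHERE orderdatetime = (
            SELECT MAX(orderdatetime) FROM orders WHERE orderdatetime <= ?
        )
        """,
        (moment,),
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, orderdatetime TEXT, foodtype TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "2013-12-02T00:26:00", "burger"),
        (2, "2013-12-02T00:20:00", "fries"),
    ],
)

match = latest_order_at_or_before(conn, "2013-12-02T00:25:00")
no_match = latest_order_at_or_before(conn, "2013-12-02T00:10:00")
print(match)
print(no_match)
```

When no row is early enough, the subquery yields NULL and the outer query returns no row, so the function returns None.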

analyze range and if true tell me

I want to see if the price of a stock has changed by 5% this week. I have data that captures the price everyday. I can get the rows from the last 7 days by doing the following:
select price from data where date(capture_timestamp)>date(current_timestamp)-7;
But then how do I analyze that and see if the price has increased or decreased 5%? Is it possible to do all this with one sql statement? I would like to be able to then insert any results of it into a new table but I just want to focus on it printing out in the shell first.
Thanks.
It seems odd to have only one stock in a table called data. What you need to do is bring the two rows together for last week's and today's values, as in the following query:
select d.price
from data d cross join
data dprev
where cast(d.capture_timestamp as date) = cast(current_timestamp as date) and
cast(dprev.capture_timestamp as date) = cast(current_timestamp as date) - 7 and
d.price > dprev.price * 1.05
If the data table contains the stock ticker, the cross join would be an equijoin.
You may be able to build on the following query for whatever calculations you want to do. It assumes one record per day for each ticker. Note that ROWS BETWEEN 7 PRECEDING counts physical rows, not calendar days.
SELECT ticker, price, capture_ts
,MIN(price) OVER (PARTITION BY ticker ORDER BY capture_ts ROWS BETWEEN 7 PRECEDING AND CURRENT ROW) AS min_prev_7_records
,MAX(price) OVER (PARTITION BY ticker ORDER BY capture_ts ROWS BETWEEN 7 PRECEDING AND CURRENT ROW) AS max_prev_7_records
FROM data
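Putting the pieces together, here is a minimal sketch of the "did it move 5% in a week" check in either direction. It assumes SQLite via Python's sqlite3, a ticker column in the data table, one row per ticker per day, and made-up prices. The function weekly_movers and its threshold parameter are hypothetical names for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (ticker TEXT, price REAL, capture_timestamp TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [
        ("AAA", 100.0, "2024-03-01 16:00:00"),
        ("AAA", 106.0, "2024-03-08 16:00:00"),  # up 6%
        ("BBB", 50.0, "2024-03-01 16:00:00"),
        ("BBB", 51.0, "2024-03-08 16:00:00"),   # up 2%
        ("CCC", 200.0, "2024-03-01 16:00:00"),
        ("CCC", 180.0, "2024-03-08 16:00:00"),  # down 10%
    ],
)


def weekly_movers(conn, today, threshold=0.05):
    """Tickers whose price on `today` differs from 7 days earlier by >= threshold."""
    return conn.execute(
        """
        SELECT d.ticker,
               dprev.price AS old_price,
               d.price AS new_price,
               (d.price - dprev.price) / dprev.price AS pct_change
        FROM data AS d
        JOIN data AS dprev
          ON dprev.ticker = d.ticker
         AND date(dprev.capture_timestamp) = date(:today, '-7 days')
        WHERE date(d.capture_timestamp) = date(:today)
          AND ABS(d.price - dprev.price) >= :threshold * dprev.price
        ORDER BY d.ticker
        """,
        {"today": today, "threshold": threshold},
    ).fetchall()


movers = weekly_movers(conn, "2024-03-08")
for ticker, old_price, new_price, pct_change in movers:
    print(ticker, old_price, new_price, round(pct_change, 4))
```

The result of such a query could then feed an INSERT ... SELECT into a results table, once the output looks right in the shell.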