I'm trying to create a parametrization of points in space from a specific point according to a specific inequality.
I'm doing it using SymPy's solveset method, where the calculation should return an interval of the parameter t that represents all points between those in my dataframe.
Sadly, performing a SymPy solveset over 13 sets of values (i.e. 13 iterations) leads to execution times of over 20 seconds overall, and over 1 second of calculation time per set.
The code:
from sympy import *
from sympy import S
from sympy.solvers.solveset import solveset, solveset_real
import pandas as pd
import time
t=symbols('t',positive=True)
p1x,p1y,p2x,p2y=symbols('p1x p1y p2x p2y')
centerp=[10,10]
radius=5
data={'P1X':[0,1,2,3,1,2,3,1,2,3,1,2,3],'P1Y':[3,2,1,0,1,2,3,1,2,3,1,2,3],'P2X':[3,8,2,4,1,2,3,1,2,3,1,2,3],'P2Y':[3,9,10,7,1,2,3,1,2,3,1,2,3],'result':[0,0,0,0,0,0,0,0,0,0,0,0,0]}
df=pd.DataFrame(data)
parameterized_x=p1x+t*(p2x-p1x)
parameterized_y=p1y+t*(p2y-p1y)
start_whole_process=time.time()
overall_time=0
for index,row in df.iterrows():
    parameterized_x.subs([[p1x,row['P1X']],[p2x,row['P2X']]])
    parameterized_y.subs([[p1y,row['P1Y']],[p2y,row['P2Y']]])
    expr=sqrt((parameterized_x-centerp[0])**2+(parameterized_y-centerp[1])**2)-radius
    start=time.time()
    df.at[index,'result']=solveset(expr>=0,t,domain=S.Reals)
    end=time.time()
    overall_time=overall_time+end-start
end_whole_process=time.time()
I need to know if there's a way to improve the calculation time, or whether another package can evaluate a specific inequality over large quantities of data without having to wait minutes upon minutes.
There is one big mistake in your current approach that needs to be fixed first. Inside your for loop you did:
parameterized_x.subs([[p1x,row['P1X']],[p2x,row['P2X']]])
parameterized_y.subs([[p1y,row['P1Y']],[p2y,row['P2Y']]])
expr=sqrt((parameterized_x-centerp[0])**2+(parameterized_y-centerp[1])**2)-radius
This is wrong: SymPy expressions are immutable, so subs cannot modify them in place; it returns a new expression, which is discarded here. As a result, your expr is exactly the same for each row, namely:
# sqrt((p1x + t*(-p1x + p2x) - 10)**2 + (p1y + t*(-p1y + p2y) - 10)**2) - 5
Then, solveset tries to solve the same expression on each row. Because this expression still contains the four unsubstituted symbols p1x, p1y, p2x, p2y in addition to t, solveset takes a long time trying to compute the solution, eventually producing the same answer for each row:
# ConditionSet(t, sqrt((p1x + t*(-p1x + p2x) - 10)**2 + (p1y + t*(-p1y + p2y) - 10)**2) - 5 >= 0, Complexes)
Remember: every operation you apply to a SymPy expression creates a new SymPy expression. So, the above code has to be modified to:
px_expr = parameterized_x.subs([[p1x,row['P1X']],[p2x,row['P2X']]])
py_expr = parameterized_y.subs([[p1y,row['P1Y']],[p2y,row['P2Y']]])
expr=sqrt((px_expr-centerp[0])**2+(py_expr-centerp[1])**2)-radius
In doing so, expr is different for each row, as expected. Then, solveset computes a different concrete solution for each row, and it is much faster.
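To make the immutability point concrete, here is a minimal standalone sketch (the symbol x and the expression are throwaway examples, not from the original code) showing that subs returns a new expression instead of modifying the original:

from sympy import symbols

x = symbols('x')
e = x + 1
e.subs(x, 2)       # computes 3, but the result is discarded
print(e)           # x + 1 -- e itself is unchanged
e2 = e.subs(x, 2)  # bind the returned expression to a name to keep it
print(e2)          # 3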
Here is your full example:
from sympy import *
from sympy.solvers.solveset import solveset, solveset_real
import pandas as pd
import time
t=symbols('t',positive=True)
p1x,p1y,p2x,p2y=symbols('p1x p1y p2x p2y')
centerp=[10,10]
radius=5
data={'P1X':[0,1,2,3,1,2,3,1,2,3,1,2,3],'P1Y':[3,2,1,0,1,2,3,1,2,3,1,2,3],'P2X':[3,8,2,4,1,2,3,1,2,3,1,2,3],'P2Y':[3,9,10,7,1,2,3,1,2,3,1,2,3],'result':[0,0,0,0,0,0,0,0,0,0,0,0,0]}
df=pd.DataFrame(data)
parameterized_x=p1x+t*(p2x-p1x)
parameterized_y=p1y+t*(p2y-p1y)
start_whole_process=time.time()
for index,row in df.iterrows():
    px_expr = parameterized_x.subs([[p1x,row['P1X']],[p2x,row['P2X']]])
    py_expr = parameterized_y.subs([[p1y,row['P1Y']],[p2y,row['P2Y']]])
    expr = sqrt((px_expr-centerp[0])**2+(py_expr-centerp[1])**2)-radius
    df.at[index,'result'] = solveset(expr>=0,t,domain=S.Reals)
end_whole_process=time.time()
print("end_whole_process - start_whole_process", end_whole_process - start_whole_process)
I'm new to Python (and coding) and bit off more than I can chew trying to use copy_from.
I am reading rows from a CSV, manipulating them a bit, then writing them into SQL. Using normal INSERT commands takes a very long time with hundreds of thousands of rows, so I want to use copy_from. The same pipeline does work with INSERT, though.
The example at https://www.psycopg.org/docs/cursor.html#cursor.copy_from uses tabs as column separators and a newline at the end of each row, so I built each IO line accordingly:
43620929 2018-04-11 11:38:14 30263506 30263503 30262500 0 0 0 0 0 1000 1000 0
That's what the below outputs with the first print statement:
def copyFromIO(thisOutput):
    print(thisOutput.getvalue())
    cursor.copy_from(thisOutput, 'hands_new')
    thisCommand = 'SELECT * FROM hands_new'
    cursor.execute(thisCommand)
    print(cursor.fetchall())
hands_new is an existing, empty SQL table. The second print statement is just [], so it isn't writing to the db. What am I getting wrong?
Obviously if it worked, I could make thisOutput much longer, with lots of rows instead of just the one.
I think I figured it out, so if anyone comes across this in the future: the format of thisOutput was wrong. I had built it from smaller pieces, including adding '\t' etc. It works if instead I do:
copyFromIO(io.StringIO('43620929\t2018-04-11 11:38:14\t30263506\t30263503\t30262500\t0\t0\t0\t0\t0\t1000\t1000\t0\n'))
and I needed to specify the right columns in the copy_from command:
def copyFromIO(thisOutput):
    print(thisOutput.getvalue())
    thisCol = ('pkey', 'created', 'gameid', 'tableid', 'playerid', 'bet', 'pot',
               'isout', 'outround', 'rake', 'endstack', 'startstack', 'stppaid')
    cursor.copy_from(thisOutput, 'hands_new', columns=thisCol)
    thisCommand = 'SELECT * FROM hands_new'
    cursor.execute(thisCommand)
    print(cursor.fetchall())
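For the many-rows case mentioned above, a single copy_from call can load the whole batch at once. Here is a minimal sketch of that idea; the connection object conn, the copy_rows helper, and the commit placement are assumptions for illustration, not part of the original code:

import csv
import io

def copy_rows(conn, rows):
    # Build one in-memory, tab-separated buffer holding every row.
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter='\t', lineterminator='\n')
    writer.writerows(rows)
    buf.seek(0)  # copy_from reads from the current position, so rewind first

    cols = ('pkey', 'created', 'gameid', 'tableid', 'playerid', 'bet', 'pot',
            'isout', 'outround', 'rake', 'endstack', 'startstack', 'stppaid')
    with conn.cursor() as cursor:
        cursor.copy_from(buf, 'hands_new', sep='\t', columns=cols)
    conn.commit()  # make the copied rows visible to other connections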
I am building a Django app and I would like to show some statistics on the main page, like total number of transactions, percentage of successful transactions, daily number of active users etc.
I don't want to calculate these values in the view every time a user requests the main page for performance reasons. I thought of 2 possible solutions.
(1) Create a number of one-record tables
Create a table for each of the statistics, e.g.:
from django.db import models

class LastSuccessfulTransactionDate(models.Model):
    date = models.DateTimeField()

class TotalTransactionAmount(models.Model):
    total_amount = models.DecimalField(max_digits=8, decimal_places=2)

# ...
and make sure that only one record exists in each table.
(2) Create a table with key-value data
class Statistics(models.Model):
    key = models.CharField(max_length=100)
    value = models.TextField()
and save the data by doing:
import base64
import pickle
from datetime import datetime
from decimal import Decimal

statistics = {
    'last_successful_transaction_date': datetime(2010, 2, 3),
    'total_transaction_amount': Decimal('1234.56'),
}
for k, v in statistics.items():
    try:
        s = Statistics.objects.get(key=k)
    except Statistics.DoesNotExist:
        s = Statistics(key=k)
    s.value = base64.b64encode(pickle.dumps(v, pickle.HIGHEST_PROTOCOL)).decode()
    s.save()
and retrieve by:
for s in Statistics.objects.all():
    k = s.key
    v = pickle.loads(base64.b64decode(s.value.encode()))
    print(k, v)
In both cases the data would be updated every now and then by a cron job (the statistics don't have to be very accurate).
To me solution (2) looks better, because to display the main page I would need to get data from the Statistics table only, not from a number of one-record tables. Is there any recommended solution to this problem? Thanks
There is an even better solution if you are using PostgreSQL: JSONField. You can store the key-value pairs directly in a JSONField. Try it like this:
# model
from django.contrib.postgres.fields import JSONField  # django.db.models.JSONField in Django 3.1+
from django.core.serializers.json import DjangoJSONEncoder
from django.db import models

class Statistics(models.Model):
    # An encoder is needed because plain JSON cannot serialize datetime or
    # Decimal values; DjangoJSONEncoder stores them as strings.
    records = JSONField(encoder=DjangoJSONEncoder)

# usage
from datetime import datetime
from decimal import Decimal

statistics = {
    'last_successful_transaction_date': datetime(2010, 2, 3),
    'total_transaction_amount': Decimal('1234.56'),
}
Statistics.objects.create(records=statistics)

# running a query
Statistics.objects.filter(records__last_successful_transaction_date=datetime(2010, 2, 3))

# updating the stored records
s = Statistics.objects.filter(records__last_successful_transaction_date=datetime(2010, 2, 3)).first()
new_statistics = {
    'last_successful_transaction_date': datetime(2012, 2, 3),
    'total_transaction_amount': Decimal('1234.50'),
}
records = s.records
records.update(new_statistics)
s.records = records
s.save()
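One caveat with this approach: JSON has no native datetime or Decimal types, so with DjangoJSONEncoder the values are stored as strings and come back as strings, and need to be converted explicitly on read. A minimal sketch, assuming the model above:

from datetime import datetime
from decimal import Decimal

s = Statistics.objects.first()
raw = s.records
# DjangoJSONEncoder serialized these values to strings on save, so parse them back:
last_date = datetime.fromisoformat(raw['last_successful_transaction_date'])
total = Decimal(raw['total_transaction_amount'])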
I'm using RJDBC in RStudio to pull a set of data from an Oracle database into R.
After loading the RJDBC package I have the following lines:
drv = JDBC("oracle.jdbc.OracleDriver", classPath="C:/R/ojdbc7.jar", identifier.quote = " ")
conn = dbConnect(drv,"jdbc:oracle:thin:#private_server_info", "804301", "password")
rs = dbSendQuery(conn, statement= paste("LONG SQL QUERY TO SELECT REQUIRED DATA INCLUDING REQUEST FOR VARIABLE x"))
masterdata = fetch(rs, n = -1) # extract all rows
Run as part of the usual script, these lines always execute without fail; they can sometimes take a few minutes depending on variable x, which may result in 100K or 1M rows being pulled. masterdata then holds everything in a dataframe.
I'm now trying to place all of the above into a function with one required argument, variable x, which is a TEXT argument (a city name); this input is also part of the LONG SQL QUERY.
The function I wrote called Data_Grab is as follows:
Data_Grab = function(x) {
  drv = JDBC("oracle.jdbc.OracleDriver", classPath="C:/R/ojdbc7.jar", identifier.quote = " ")
  conn = dbConnect(drv,"jdbc:oracle:thin:#private_server_info", "804301", "password")
  rs = dbSendQuery(conn, statement= paste("LONG SQL QUERY TO SELECT REQUIRED DATA,
                                           INCLUDING REQUEST FOR VARIABLE x"))
  masterdata = fetch(rs, n = -1) # extract all rows
  return (masterdata)
}
My function appears to execute in seconds (no error is produced); however, I get just the 21 column headings for the dataframe and the line
<0 rows> (or 0-length row.names)
I'm not sure what is wrong here; I obviously expect the function to still take minutes to execute, since the data being pulled is large, but no actual data frame is being returned.
Help is appreciated!
If you want to parameterize your query to a JDBC database, try using the gsubfn package. The code might look like this:
library(gsubfn)
library(RJDBC)

Data_Grab = function(x) {
  rd1 = x
  df <- fn$dbGetQuery(conn, "SELECT BLAH1, BLAH2
                             FROM TABLENAME
                             WHERE BLAH1 = '$rd1'")
  return(df)
}
Basically, you need to put a $ before the variable name that stores the parameter you wish to pass.
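Note that fn$ performs plain text substitution into the query string, so this pattern is best kept to trusted inputs such as a fixed list of city names; a value containing a single quote would break the SQL.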