I built a scraper that saves to a .csv file and am now attempting to save rows from that .csv file to an sqlite3 database with an IF statement, but it's not working. I've tried formatting the values in a dozen different ways and am getting nowhere.
"Match" prints every time the IF statement is True, but the row doesn't get added to the sqlite database. Calling cur.fetchall()/one()/etc results in 'None' being returned.
import sqlite3

db = sqlite3.connect(':memory:')
cur = db.cursor()
cur.execute("DROP TABLE IF EXISTS jobs_table")
cur.execute('''CREATE TABLE IF NOT EXISTS
               jobs_table(id TEXT,
                          date TEXT,
                          company TEXT,
                          position TEXT,
                          tags TEXT,
                          description TEXT,
                          url TEXT)''')

skills = ('python')

for row in csv_data:
    if skills in row.get('description').lower():
        print('')
        print('Match!')
        cur.execute("""INSERT INTO jobs_table(id, date, company, position,
                                              tags, description, url)
                       VALUES(:id, :epoch, :date, :company, :position,
                              :tags, :description, :url)""", row)
I assume the problem is in my cur.execute() function, but I can't figure out how else it should be run. Any takers?
If you call cur.fetchone() right after cur.execute() on an INSERT, it is normal to get None (or [] for cur.fetchall()). You need to execute a query that returns rows first, for example cur.execute("SELECT * FROM jobs_table").
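As a quick sanity check, here is a minimal sketch (assuming the connection, table, and loop from the question have already run):

# Rows inserted on this connection are visible to it even before a commit.
cur.execute("SELECT * FROM jobs_table")
rows = cur.fetchall()
print(len(rows), 'matching rows')

# Note: a ':memory:' database vanishes when the connection closes; with a
# file-backed database, call db.commit() so other connections see the rows.
db.commit()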
I have a Rails data migration (Postgres DB) where I have to use raw SQL to convert the data due to some model restrictions. The data is stored as JSON in a string column, but I need it to be a usable hash for other purposes.
My migration works to convert it to the hash. However, my down method ends up just deleting the data or leaving it as an empty {}. To clear up any confusion: my column is actually named data, in the games table.
Based on my up method, how would I properly reverse the migration using SQL only?
class ConvertGamesDataToJson < ActiveRecord::Migration[6.0]
  def up
    statement = <<~SQL
      update games set data = regexp_replace(trim(both '"' from data::text), '\\\\"', '"', 'g')::jsonb;
    SQL
    ActiveRecord::Base.connection.execute(statement)
    # this part works!
  end

  def down
    statement = <<~SQL
      update games set data = to_json(data::text)::jsonb;
    SQL
    ActiveRecord::Base.connection.execute(statement)
  end
end
Here is how it looks after properly converting it:
data: {
"id"=>"d092a-f2323",
"recent"=>'yes',
"note"=>"some text",
"order"=>1
}
And here is how it looks before the migration, which is what it needs to roll back to:
data:
"{
\"id\":\"d092a-f2323\",
\"recent\":\"yes\",
\"note\":\"some text\",
\"order\":1,
}"
If you're displaying a data structure in the Rails console, those \" aren't really there. They're just formatting added because the console has wrapped the string in double quotes. For example...
[2] pry(main)> %{"up": "down"}
=> "\"up\": \"down\""
But if we print it...
[3] pry(main)> puts %{"up": "down"}
"up": "down"
Given that it is a JSON string, you can simply change the type of the column to jsonb and be done with it.
-- up
alter table games alter column data type jsonb USING data::jsonb;
-- down
alter table games alter column data type text;
Postgres doesn't know how to automatically cast text to jsonb, so we need to tell it: using data::jsonb does a simple cast of the text to jsonb. It can cast jsonb to text just fine, so the down migration doesn't need a using clause.
You can do this in a migration with change_column.
def up
  change_column :games, :data, :jsonb, using: 'data::jsonb'
end

def down
  change_column :games, :data, :text
end
I have a database with a JSON field which has multiple parts, including one called tags. There are other entries, as below, but I want to return only the rows whose value is exactly "{"tags":{"+good":true}}".
"{"tags":{"+good":true}}"
"{"has_temps":false,"tags":{"+good":true}}"
"{"tags":{"+good":true}}"
"{"has_temps":false,"too_long":true,"too_long_as_of":"2016-02-12T12:28:28.238+00:00","tags":{"+good":true}}"
I can get part of the way there with trips.metadata->'tags'->>'+good' = 'true' in my WHERE clause, but that returns every row where tags is good and true, including all of the entries above. I want to return only the entries that are exactly "{"tags":{"+good":true}}", i.e. excluding the two entries that begin with has_temps.
Any thoughts on how to do this?
With jsonb column the solution is obvious:
with trips(metadata) as (
  values
    ('{"tags":{"+good":true}}'::jsonb),
    ('{"has_temps":false,"tags":{"+good":true}}'),
    ('{"tags":{"+good":true}}'),
    ('{"has_temps":false,"too_long":true,"too_long_as_of":"2016-02-12T12:28:28.238+00:00","tags":{"+good":true}}')
)
select *
from trips
where metadata = '{"tags":{"+good":true}}';

        metadata
--------------------------
 {"tags":{"+good":true}}
 {"tags":{"+good":true}}
(2 rows)
If the column's type is json then you should cast it to jsonb:
...
where metadata::jsonb = '{"tags":{"+good":true}}';
If I understand you correctly, you can check the text value of the "tags" key, like here:
select true
where '{"has_temps":false,"too_long":true,"too_long_as_of":"2016-02-12T12:28:28.238+00:00","tags":{"+good":true}}'::json->>'tags'
= '{"+good":true}'
I need to perform data smoothing using averaging, with a non-standard group_by variable that is created on-the-fly. My model consists of two tables:
class WthrStn(models.Model):
    name = models.CharField(max_length=64, error_messages=MOD_ERR_MSGS)
    owner_email = models.EmailField('Contact email')
    location_city = models.CharField(max_length=32, blank=True)
    location_state = models.CharField(max_length=32, blank=True)
    ...

class WthrData(models.Model):
    stn = models.ForeignKey(WthrStn)
    date = models.DateField()
    time = models.TimeField()
    temptr_out = models.DecimalField(max_digits=5, decimal_places=2)
    temptr_in = models.DecimalField(max_digits=5, decimal_places=2)

    class Meta:
        ordering = ['-date', '-time']
        unique_together = (("date", "time", "stn"),)
The data in the WthrData table are entered from an XML file in variable time increments, currently 15 or 30 minutes, but that could vary and change over time. There are >20000 records in that table. I want to provide an option to display the data smoothed to variable time units, e.g. 30 minutes, or 1, 2 or N hours (60, 120, 180, etc. minutes).
I am using SQLite3 as the DB engine. I tested the following SQL, which proved quite adequate to perform the smoothing in 'bins' of N-minute duration:
select id, date, time, 24*60*julianday(datetime(date || time))/N jsec,
       avg(temptr_out) as temptr_out, avg(temptr_in) as temptr_in,
       avg(barom_mmhg) as barom_mmhg, avg(wind_mph) as wind_mph,
       avg(wind_dir) as wind_dir, avg(humid_pct) as humid_pct,
       avg(rain_in) as rain_in, avg(rain_rate) as rain_rate,
       datetime(avg(julianday(datetime(date || time)))) as avg_date
from wthr_wthrdata
where stn_id=19
group by round(jsec,0)
order by stn_id, date, time;
Note that I create an output variable 'jsec' using the SQLite3 function julianday(), which returns the number of days in the integer part and the fraction of a day in the decimal part. Multiplying by 24*60 gives the number of minutes, and dividing by the N-minute resolution gives a nice 'group by' variable that compensates for the varying time increments of the raw data.
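To see why grouping on round(jsec, 0) produces N-minute buckets, here is a standalone arithmetic sketch (plain Python, not part of the Django code; the minute counts stand in for 24*60*julianday(...)):

N = 30
for hh, mm in [(10, 0), (10, 10), (10, 40)]:
    minutes = hh * 60 + mm          # stand-in for 24*60*julianday(datetime(date || time))
    print((hh, mm), '-> bin', round(minutes / N))
# (10, 0) -> bin 20, (10, 10) -> bin 20, (10, 40) -> bin 21:
# 10:00 and 10:10 share bin 20, so avg() combines them; 10:40 starts a new bin.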
How can I implement this in Django? I have tried objects.raw(), but that returns a RawQuerySet, not a QuerySet, to the view, so I get error messages from the HTML template:
<p>
Number of data entries: {{ valid_form|length }}
</p>
I have tried using a standard Query, with code like this:
wthrdta = WthrData.objects.all()
wthrdta.extra(select={'jsec': '24*60*julianday(datetime(date || time))/{}'.format(n)})
wthrdta.extra(select={'temptr_out': 'avg(temptr_out)',
                      'temptr_in': 'avg(temptr_in)',
                      'barom_mmhg': 'avg(barom_mmhg)',
                      'wind_mph': 'avg(wind_mph)',
                      'wind_dir': 'avg(wind_dir)',
                      'humid_pct': 'avg(humid_pct)',
                      'rain_in': 'avg(rain_in)',
                      'rain_sum_in': 'sum(rain_in)',
                      'rain_rate': 'avg(rain_rate)',
                      'avg_date': 'datetime(avg(julianday(datetime(date || time))))'})
Note that here I use the SQL avg() functions instead of the Django aggregate() or annotate(). This seems to generate correct SQL, but I can't seem to get the GROUP BY set properly on the jsec variable created at the top.
Any suggestions for how to approach this? All I really need is to have the raw() method return a QuerySet, or something that can be converted to a QuerySet, instead of a RawQuerySet. I cannot find an easy way to do that.
The answer to this turns out to be really simple, using a hint I found at https://gist.github.com/carymrobbins/8477219, though I modified his code slightly. To return a QuerySet from a RawQuerySet, all I did was add the following to my models.py file, right above the WthrData class definition:
from django.db import connection, models

class MyManager(models.Manager):
    def raw_as_qs(self, raw_query, params=()):
        """Execute a raw query and return a QuerySet.  The first column in the
        result set must be the id field for the model.
        :type raw_query: str | unicode
        :type params: tuple[T] | dict[str | unicode, T]
        :rtype: django.db.models.query.QuerySet
        """
        cursor = connection.cursor()
        try:
            cursor.execute(raw_query, params)
            return self.filter(id__in=(x[0] for x in cursor))
        finally:
            cursor.close()
Then in my class definition for WthrData:
class WthrData(models.Model):
    objects = MyManager()
    ......
and later in the WthrData class:
    @staticmethod  # takes no self, so declare it a staticmethod to call it on the class
    def get_smoothWthrData(stn_id, n):
        sqlcode = ('select id, date, time, 24*60*julianday(datetime(date || time))/%s jsec, '
                   'avg(temptr_out) as temptr_out, avg(temptr_in) as temptr_in, '
                   'avg(barom_mmhg) as barom_mmhg, avg(wind_mph) as wind_mph, '
                   'avg(wind_dir) as wind_dir, avg(humid_pct) as humid_pct, '
                   'avg(rain_in) as rain_in, avg(rain_rate) as rain_rate, '
                   'datetime(avg(julianday(datetime(date || time)))) as avg_date '
                   'from wthr_wthrdata where stn_id=%s '
                   'group by round(jsec,0) order by stn_id,date,time;')
        return WthrData.objects.raw_as_qs(sqlcode, [n, stn_id])
This allows me to grab results from the highly populated WthrData table smoothed over time increments, and the results come back as a QuerySet instead of a RawQuerySet.
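For instance, a hypothetical usage sketch (station id 19 and 60-minute bins, matching the raw SQL above; the date is made up). Because raw_as_qs returns a real QuerySet, QuerySet-only behaviour now works:

smoothed = WthrData.get_smoothWthrData(19, 60)
print(len(smoothed))                               # len() works, so {{ valid_form|length }} renders
recent = smoothed.filter(date__gte='2014-06-01')   # further filtering can be chained on

One caveat of this design: raw_as_qs re-selects the matching rows by id, so the chained QuerySet exposes the model's stored field values rather than the averaged columns computed in the raw SQL.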
I currently have a .find method in one of my rails controller actions - the relevant part is:
.find(:all, :select => 'last_name as id, last_name as name')
I am getting some odd behaviour when trying to alias the last_name column as id. If I alias it as anything else, it works fine (I can do last_name as xyz and it outputs the last name in a column called xyz), but as I am using this to populate a drop-down where I need to have the name in the id column, I need it to be called 'id'.
I should point out that it does output an id column, but it is always "id":0.
Could anyone shed any light on what I need to do to get this column aliased as 'id'?
Thanks!
I'm not sure how you can do this in a Rails query statement. Rails is going to try to take over the id column, casting the value returned by the database to the type that id is (presumably integer). That's why your id column keeps getting set to 0, because "string".to_i #=> 0.
However, there is a way to do it, once you have the results back.
Since you have the question tagged as Rails 3, it is preferable to use the new ActiveRelation syntax. You can do the following:
# First, get the results from the query, then loop through all of them.
Customer.select("last_name as 'ln', last_name as 'name'").all.collect do |c|
  # The first step of the loop is to get the attributes into a hash form.
  h = c.attributes
  # The Hash#delete method deletes the key/value pair at the key specified
  # and returns the value. We'll take that returned value and assign it to
  # the just-created "id" key.
  h["id"] = h.delete("ln")
  # And we have to call out the hash to ensure that it's the returned value
  # from the collect.
  h
end
That will get you an array of hashes, each with an id value holding the last_name string and a name value holding the same.
Hope that helps!
You shouldn't need to set up aliases in the finder SQL just to populate a drop-down. Instead, simply use the last_name value for the value attribute (as well as the display text).
E.g. if you're using the collection_select helper:
<%= f.collection_select :attribute_id, @collection, :last_name, :last_name %>
With a simple model like this
class Model < ActiveRecord::Base
# ...
end
we can do queries like this
Model.where(["name = :name and updated_at >= :D",
             { :D => (Date.today - 1.day).to_datetime, :name => "O'Connor" }])
The values in the hash are substituted into the final SQL statement with proper escaping, depending on the underlying database engine.
I would like to know a similar feature for SQL execution like:
ActiveRecord::Base.connection.execute(
  ["update models set name = :name, hired_at = :D where id = :id;",
   { :id => 73465, :D => DateTime.now, :name => "O'My God" }]
) # THIS CODE IS A FANTASY. NOT WORKING.
(Please do not solve the example by loading a Model object, modifying it, and then saving it! The example is only an illustration of the feature I would like to know about. Concentrate on the subject!)
The original problem is that I want to insert large amount (many thousand lines) of data into the database. I want to use some features of the SQL abstraction of the ActiveRecord framework but I don't want to use model objects based on ActiveRecord::Base because they are damn slow! (8 queries per second for my current problem.)
query = ActiveRecord::Base.connection.raw_connection.prepare("INSERT INTO users (name) VALUES(:name)")
query.execute(:name => 'test_name')
query.close
Extending @peufeu's solution with a concrete code example for bulk insert:
users_places = []
users_values = []
timestamp = Time.now.strftime('%Y-%m-%d %H:%M:%S')

params[:users].each do |user|
  users_places << "(?,?,?,?)"
  users_values << user[:name] << user[:punch_line] << timestamp << timestamp
end

bulk_insert_users_sql_arr = ["INSERT INTO users (name, punch_line, created_at, updated_at) VALUES #{users_places.join(", ")}"] + users_values

begin
  sql = ActiveRecord::Base.send(:sanitize_sql_array, bulk_insert_users_sql_arr)
  ActiveRecord::Base.connection.execute(sql)
rescue
  "something went wrong with the bulk insert sql query"
end
The sanitize_sql_array method in ActiveRecord::Base generates the proper query string by escaping the single quotes in the strings. For example, the punch_line "Don't let them get you down" will become "Don\'t let them get you down".
Yes, you could do raw SQL, but check out the ar-extensions gem, which helps with batch inserts:
https://github.com/zdennis/ar-extensions
Here's a post on it, and various other techniques:
http://www.coffeepowered.net/2009/01/23/mass-inserting-data-in-rails-without-killing-your-performance/
For INSERTs, batching them using a long VALUES clause (as shown by Simon's link) is the fastest way (unless you want to generate a text file and load it in your database with MySQL's LOAD DATA INFILE). But you have to be very careful about escaping your text values (which is not done in the example).
I was asking "what database are you using" because it does matter for mass UPDATEs.
For instance, you can do this on Postgres (and, I believe, on SQL Server with the "columnX" names changed):

UPDATE foo
SET bar = v.column2
FROM (VALUES (1,2),(3,4) /* ... long list */) AS v
WHERE foo.id = v.column1;
And you can update a load of rows using a single statement, very fast.
If you don't need Ruby to perform some Ruby-specific magic on your data, the fastest way to transfer data from one DB to a different one is to export as a text file (CSV or tab separated), load it on the other DB (LOAD DATA INFILE on MySQL), perhaps in a temporary table, and bulk process using SQL.
EDIT: Here's how I do this in Python (with hypothetical columns a, b, c, d):

# Build one "(?,?,?,?)" placeholder group per row, and a flat list of values.
placeholders = []
values = []
for tup in tuple_list:             # tuple_list holds the (a, b, c, d) row tuples
    placeholders.append("(?,?,?,?)")
    values.extend(tup)
sql = "INSERT INTO foo (a, b, c, d) VALUES " + ",".join(placeholders)
Joining the placeholders into the string gives "INSERT INTO foo (a, b, c, d) VALUES (?,?,?,?),(?,?,?,?),(?,?,?,?)", with the "(?,?,?,?)" repeated as many times as you have lines to insert.
Then "values" contains a flat list of (a1,b1,c1,d1,a2,b2,c2,d2,a3,b3,c3,d3), with an,bn,cn,dn being the tuple you want to insert for line n. Each one corresponds to a placeholder in the sql string.
Then pass this to the usual "execute query with parameters" function which will handle quoting and escaping as usual.
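As a complete minimal sketch of that last step (assuming Python's sqlite3 module and a hypothetical table foo; any DB-API driver that uses ? placeholders works the same way):

import sqlite3

# Hypothetical rows matching the placeholder-building loop above.
tuple_list = [(1, 'x', 'y', 10), (2, 'p', 'q', 20)]

db = sqlite3.connect(':memory:')
cur = db.cursor()
cur.execute('CREATE TABLE foo (a INTEGER, b TEXT, c TEXT, d INTEGER)')

placeholders = ','.join(['(?,?,?,?)'] * len(tuple_list))
values = [v for tup in tuple_list for v in tup]    # flatten the row tuples

# The driver substitutes each ? with the matching value, quoting and escaping as needed.
cur.execute('INSERT INTO foo (a, b, c, d) VALUES ' + placeholders, values)
db.commit()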
I encountered a similar issue recently when trying to insert 100K+ records into a MySQL database for a Rails 4 app using the mysql2 gem. The data included characters that had to be sanitized prior to insert.
The solution I ended going with was a slightly modified version of Option 3 described at https://www.coffeepowered.net/2009/01/23/mass-inserting-data-in-rails-without-killing-your-performance/
Here's the relevant code block from the above link:
TIMES = 10000

inserts = []
TIMES.times do
  inserts.push "(3.0, '2009-01-23 20:21:13', 2, 1)"
end

sql = "INSERT INTO user_node_scores (`score`, `updated_at`, `node_id`, `user_id`) VALUES #{inserts.join(", ")}"
The modification I made was using the public method ActiveRecord::Base.sanitize() on values that required it.
inserts = []
created = Time.now.strftime "%Y-%m-%d %H:%M:%S"

params[:audits].each do |audit|
  inserts.push "(#{audit.user_id}, '#{created}', " + ActiveRecord::Base.sanitize(audit.comment) + ", #{audit.status})"
end

sql = "INSERT INTO user_audits (`user_id`, `created_at`, `comment`, `status`) VALUES #{inserts.join(", ")}"