Django: copy data from one database to another - sql

I have two SQLite .db files. I'd like to copy the contents of one column in a table of one db file to the other.
for example:
I have the model Information in db file called new.db:
class Information(models.Model):
    info_id = models.AutoField(primary_key=True)
    info_name = models.CharField(max_length=50)
and the following information model in db file called old.db:
class Information(models.Model):
    info_id = models.AutoField(primary_key=True)
    info_type = models.CharField(max_length=50)
    info_name = models.CharField(max_length=50)
I'd like to copy all the data in column info_id and info_name from old.db to info_id and info_name in new.db.
I was thinking something like:
manage.py dbshell
then
INSERT INTO "new.Information" ("info_id", "info_name")
SELECT "info_id", "info_name"
FROM "old.Information";
This doesn't seem to be working. It says new.Information table does not exist... any ideas?

You'd need to switch the database setting in your settings file to new.db and run syncdb to create the new tables. After that, the easiest thing to do, in my opinion, would be to switch back to old.db and run ./manage.py dumpdata myapp > data.json, followed by another switch to new.db where you can run ./manage.py loaddata data.json.
Afterwards, you can drop the data you don't need from new.db.
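If your Django version supports multiple database aliases (Django 1.2 or later), one way to avoid editing the settings back and forth is to declare both files at once and pass --database to the management commands. This is only a sketch, and the alias names here are arbitrary:

# settings.py (sketch; assumes Django 1.2+ multi-database support)
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': 'new.db',
    },
    'old': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': 'old.db',
    },
}

With that in place, ./manage.py syncdb --database=default creates the tables in new.db, ./manage.py dumpdata myapp --database=old > data.json dumps from old.db, and ./manage.py loaddata data.json --database=default loads the dump into new.db.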
Edit: Another approach would be to use SQLite's ATTACH statement. First, I recommend you do the first step above (change the database settings and use syncdb to create the tables), then you can switch back and do this:
./manage.py dbshell
> ATTACH DATABASE 'new.db' AS newdb;
> INSERT INTO newdb.Information SELECT * FROM Information;

The dumped file from old.db contains the info_type field, which is not in the new Information model. This will make loaddata fail, since it checks every field loaded from the JSON file. You could comment out the info_type line in the old model before dumping.
The ATTACH approach mentioned by Alex is easier and works great, but it needs a tiny tweak:
INSERT INTO newdb.Information SELECT * FROM Information;
Note the missing parentheses around the SELECT; SQLite does not accept them. Ref: http://sqlite.org/lang_insert.html
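Also note that old.db's Information table has the extra info_type column, so the bare SELECT * above can fail with a column-count mismatch; listing the shared columns explicitly avoids that. Here is a minimal sketch using Python's sqlite3 module (the myapp_information table name is an assumption based on Django's default appname_modelname naming; substitute your real app label):

import sqlite3

# Open the old database and attach the new one under the alias "newdb".
conn = sqlite3.connect('old.db')
conn.execute("ATTACH DATABASE 'new.db' AS newdb")

# Copy only the columns that exist in both schemas.
conn.execute("""
    INSERT INTO newdb.myapp_information (info_id, info_name)
    SELECT info_id, info_name FROM myapp_information
""")

conn.commit()
conn.close()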
If you are performing a migration, have you tried South?

Related

Flask and sqlite3 - does a table need to be populated with data before the 'insert into' sql command can work?

When creating a Flask site with sqlite3, I noticed that if I tried to INSERT INTO the database (using a web form) while the database was empty, it wouldn't produce an error, but nothing would happen; the INSERT INTO statement simply would not take effect.
I had to manually populate the database using a pre-made query, and then, without changing anything else in the code, inserting via the form worked fine.
Can anyone shed any light as to why this might have happened? I assume it isn't necessary for a database to be populated in order for it to be used/functional in flask/sqlite3.
I wondered if it had anything to do with my setup.
create-db.py
import sqlite3

db_locale = 'users.db'  # flask will create this db if it doesn't exist

connie = sqlite3.connect(db_locale)
c = connie.cursor()  # used to create commands

c.execute("""
    CREATE TABLE comments
    (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        title TEXT,
        name TEXT,
        comments TEXT
    )
""")

connie.commit()
connie.close()
The addcomment() and insertcomment() methods in the main .py file
def addcomment():
    if request.method == "GET":
        return render_template('addcomment.html')
    else:
        user_details = (
            request.form['title'],
            request.form['name'],
            request.form['comments']
        )
        insertcomment(user_details)
        return render_template('addsuccess.html')

def insertcomment(user_details):
    connie = sqlite3.connect(db_locale)
    c = connie.cursor()
    sql_execute_string = 'INSERT INTO comments (title,name,comments) VALUES (?,?,?)'
    c.execute(sql_execute_string, user_details)
    connie.commit()
    connie.close()
    print(user_details)
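For debugging, a quick standalone check against the same users.db file can confirm whether rows from the form actually reach the comments table (this just reuses the connection pattern from the snippets above):

import sqlite3

connie = sqlite3.connect('users.db')
c = connie.cursor()
c.execute("SELECT id, title, name, comments FROM comments")
for row in c.fetchall():
    print(row)  # prints every stored comment, or nothing if the table is empty
connie.close()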

Schema option in spark_read_parquet()

I am pretty new to R and Spark. I want to read a parquet file with the following code. Does anyone know how to specify the schema there?
library(sparklyr)

sc <- spark_connect(master = "yarn",
                    app_name = "test")

df <- spark_read_parquet(sc,
                         "name",
                         "path/to/the/file",
                         repartition = 0,
                         schema = "?")
I looked at https://spark.rstudio.com/reference/spark_read_parquet/, but there isn't any detail or example of how to set the schema in the function to optimize it.
If you are only trying to read a parquet file, you do not need to specify a schema; it is just an available option. The following code should work.
df <- spark_read_parquet(sc,
                         "name",
                         "path/to/the/file",
                         repartition = 0,
                         schema = NULL)
If you do want to use a schema, there are many options, and choosing the right one depends on your data and what you are using it for. But try running your code without a schema option first to see if that works for your data.
Try
tbl_change_db(sc, "dbName")
and, if you are using RStudio, click the refresh button in the upper right part of the snippet.

How to create a view against a table that has record fields?

We have a weekly backup process which exports our production Google Appengine Datastore onto Google Cloud Storage, and then into Google BigQuery. Each week, we create a new dataset named like YYYY_MM_DD that contains a copy of the production tables on that day. Over time, we have collected many datasets, like 2014_05_10, 2014_05_17, etc. I want to create a data set Latest_Production_Data that contains a view for each of the tables in the most recent YYYY_MM_DD dataset. This will make it easier for downstream reports to write their query once and always retrieve the most recent data.
To do this, I have code that gets the most recent dataset and the names of all the tables that dataset contains from the BigQuery API. Then, for each of these tables, I fire a tables.insert call to create a view that is a SELECT * from the table I am looking to create a reference to.
This fails for tables that contain a RECORD field, from what looks to be a pretty benign column-naming rule.
For example, for my AccountDeletionRequest table I issue this API call:
{
    'tableReference': {
        'projectId': 'redacted',
        'tableId': u'AccountDeletionRequest',
        'datasetId': 'Latest_Production_Data'
    },
    'view': {
        'query': u'SELECT * FROM [2014_05_17.AccountDeletionRequest]'
    },
}
This results in the following error:
HttpError: https://www.googleapis.com/bigquery/v2/projects//datasets/Latest_Production_Data/tables?alt=json returned "Invalid field name "__key__.namespace". Fields must contain only letters, numbers, and underscores, start with a letter or underscore, and be at most 128 characters long.">
When I execute this query in the BigQuery web console, the columns are renamed to translate the . to an _. I kind of expected the same thing to happen when I issued the create view API call.
Is there an easy way I can programmatically create a view for each of the tables in my dataset, regardless of their underlying schema? The problem I'm encountering now is for record columns, but another problem I anticipate is for tables that have repeated fields. Is there some magic alternative to SELECT * that will take care of all these intricacies for me?
Another idea I had was doing a table copy, but I would prefer not to duplicate the data if I can at all avoid it.
Here is the workaround code I wrote to dynamically generate a SELECT statement for each of the tables:
def get_leaf_column_selectors(dataset, table):
    schema = table_service.get(
        projectId=BQ_PROJECT_ID,
        datasetId=dataset,
        tableId=table
    ).execute()['schema']

    return ",\n".join([
        _get_leaf_selectors("", top_field)
        for top_field in schema["fields"]
    ])

def _get_leaf_selectors(prefix, field):
    if prefix:
        format = prefix + ".%s"
    else:
        format = "%s"

    if 'fields' not in field:
        # Base case
        actual_name = format % field["name"]
        safe_name = actual_name.replace(".", "_")
        return "%s as %s" % (actual_name, safe_name)
    else:
        # Recursive case
        return ",\n".join([
            _get_leaf_selectors(format % field["name"], sub_field)
            for sub_field in field["fields"]
        ])
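For reference, here is a hypothetical sketch of how the generated selector list might be dropped into the tables.insert body in place of SELECT * (it assumes the same table_service handle, BQ_PROJECT_ID constant, and dataset/table names used above):

selectors = get_leaf_column_selectors('2014_05_17', 'AccountDeletionRequest')

view_body = {
    'tableReference': {
        'projectId': BQ_PROJECT_ID,
        'datasetId': 'Latest_Production_Data',
        'tableId': 'AccountDeletionRequest',
    },
    'view': {
        # Explicit, aliased leaf columns instead of SELECT *
        'query': 'SELECT\n%s\nFROM [2014_05_17.AccountDeletionRequest]' % selectors,
    },
}

table_service.insert(
    projectId=BQ_PROJECT_ID,
    datasetId='Latest_Production_Data',
    body=view_body
).execute()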
We had a bug where you needed to select out the individual fields in the view and use an 'as' to rename the fields to something legal (i.e. names without '.' in them).
The bug is now fixed, so you shouldn't see this issue any more. Please ping this thread or start a new question if you see it again.

Perl DBI modifying Oracle database by creating a VIEW

I wrote a Perl script to check the data in an Oracle database. Because the query process is very complex, I chose to create a VIEW as an intermediate step. Using this view, the code could be largely simplified.
The Perl code ran well when I used it to query the database starting from a file, like perl mycode.pl file_a. The Perl code reads lines from file_a and creates/updates the view until the end of the input. The results I achieved were completely correct.
The problem came when I simultaneously ran
perl mycode.pl file_a
and
perl mycode.pl file_b
to access the same database. According to my observation, the VIEW used by the first process will be modified by the second process. These two processes were intertwined on the same view.
Is there any suggestion to make these two processes not conflict with one another?
The Perl code for querying the database is normally like this, but the details of each real query are more complex.
my ($gcsta, $gcsto, $cms) = @t;    # the details of @t are read from a line in file a or b
my $VIEWSS = 'CREATE OR REPLACE VIEW VIEWSS AS SELECT ID,GSTA,GSTO,GWTA FROM TABLEA WHERE GSTA='.$gcsta.' AND GSTO='.$gcsto.' AND CMS='.$cms;
my $querying = q{ SELECT COUNT(*) FROM VIEWSS WHERE VIEWSS.ID=1};
my $inner_sth = $dbh->prepare($VIEWSS);
my $inner_rv  = $inner_sth->execute();
$inner_sth = $dbh->prepare($querying);
$inner_rv  = $inner_sth->execute();
You must:
Create the view only once, and use it everywhere
Use placeholders in your SQL statements, and pass the actual parameters with the call to execute
Is this the full extent of your SQL? Probably not, but if so it really is fairly simple.
Take a look at this refactoring for some ideas. Note that it uses a here document to express the SQL. The END_SQL marker for the end of the text must have no whitespace before or after it.
If your requirement is more complex than this then please describe it to us so that we can better help you
my $stmt = $dbh->prepare(<<'END_SQL');
SELECT count(*)
FROM tablea
WHERE gsta = ? AND gsto = ? AND cms= ? AND id = 1
END_SQL
my $rv = $stmt->execute($gcsta, $gcsto, $cms);
If you must use a view then you should use placeholders in the CREATE VIEW as before, and make every set of changes into a transaction so that other processes can't interfere. This involves disabling AutoCommit when you create the database handle $dbh and adding a call to $dbh->commit when all the steps are complete.
use strict;
use warnings;

use DBI;

my $dbh = DBI->connect('dbi:Oracle:mydbase', 'user', 'pass',
                       { AutoCommit => 0, RaiseError => 1 });

my $make_view = $dbh->prepare(<<'END_SQL');
CREATE OR REPLACE VIEW viewss AS
SELECT id, gsta, gsto, gwta
FROM tablea
WHERE gsta = ? AND gsto = ? AND cms = ? AND id = 1
END_SQL

my $get_count = $dbh->prepare(<<'END_SQL');
SELECT count(*)
FROM viewss
WHERE id = 1
END_SQL

while (<>) {
    my ($gcsta, $gcsto, $cms) = split;

    my $rv = $make_view->execute($gcsta, $gcsto, $cms);

    $rv = $get_count->execute;
    my ($count) = $get_count->fetchrow_array;

    $dbh->commit;
}
Is the view going to be the same or different?
If the views are all the same then create it only once, or check whether it exists with the all_views table: http://docs.oracle.com/cd/B12037_01/server.101/b10755/statviews_1202.htm#i1593583
You can easily create a view that includes your pid by using the $$ variable as the pid, but it won't be unique across computers. Oracle also has some unique IDs, see http://docs.oracle.com/cd/B14117_01/server.101/b10759/functions150.htm, for example the SESSIONID.
But do you really need to do this? Why don't you prepare a statement and then execute it? http://search.cpan.org/dist/DBI/DBI.pm#prepare
thanks,
mike

how to use django.core.management.sql.sql_create?

I want to use this function:
django.core.management.sql.sql_create in my view, to get the "CREATE" statements
the function gets 3 arguments:
app, style, connection
what is "app"? Is it a specific object or just the app name?
I know style has something to do with colors... I reckon django.core.management.color.color_style() should work.
what about connection, how do I get that one?
thanks
=========================== edited from here down
OK, after some time I figured things out; this is what I ended up with:
from django.core.management.sql import sql_create, sql_custom, sql_indexes
from django.core.management.color import color_style
from django.db import models, connections

def sqldumper(model):
    """gets a db model, and returns the SQL statements to build it on another SQL-able db"""
    # This is how django inserts a new record
    # u'INSERT INTO "polls_poll" ("question", "pub_date") VALUES (GE!!, 2011-05-03 15:45:23.254000)'
    result = "BEGIN;\n"

    # add CREATE TABLE statements
    result += '\n'.join(sql_create(models.get_app('polls'), color_style(), connections.all()[0])) + "\n"
    result += '\n'.join(sql_custom(models.get_app('polls'), color_style(), connections.all()[0])) + "\n"
    result += '\n'.join(sql_indexes(models.get_app('polls'), color_style(), connections.all()[0])) + "\n"
    result += '\n'

    # add INSERT INTO statements
    units = model.objects.all().values()
    for unit in units:
        statement = "INSERT INTO yourapp.model " + str(tuple(unit.keys())) + " VALUES " + str(tuple(unit.values())) + "\n"
        result += statement

    result += "\nCOMMIT;\n"
    return result.encode('utf-8')
It's still a bit weird, because you get the CREATE TABLE statements for the whole app but the INSERT INTO statements only for the model you ask for... but it's fixable from here.
It's the models.py module found in one of your apps:
INSTALLED_APPS = (
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.sites',
    'myproject.myapp',
)
When I printed it, I got < module 'myproject.myapp.models' from '...' >. I used the ellipses instead of typing the entire models.py file's path.
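To tie the three arguments together, here is a minimal sketch for older Django versions that still ship sql_create (the 'polls' app label is only an example):

from django.core.management.sql import sql_create
from django.core.management.color import color_style
from django.db import models, connections

app = models.get_app('polls')        # the app's models module, as described above
style = color_style()                # the console colour style object
connection = connections['default']  # a connection defined in your DATABASES setting

print('\n'.join(sql_create(app, style, connection)))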