How can I auto-generate changeset IDs with Liquibase?
I don't want to set the ID of every changeset manually; is there a way to do it automatically?
I don't think that generated IDs are a good idea. The reason is that Liquibase uses the changeSet id to calculate the checksum (in addition to the author and the fileName). So if you ever insert a changeSet between others, the checksums of all subsequent changeSets will change and you will get tons of warnings/errors.
Anyway, I can think of these solutions if you still want to generate IDs:
Create your own ChangeLogParser
If you parse the changelog on your own, you are free to generate the IDs as you want.
The downside is that you will have to provide a custom XML schema for the changelog. The schema from Liquibase has a constraint on changeSet ids (they are required). With a new schema you'll probably have to do a significant amount of tweaking on the parser.
Alternatively, you may choose another changelog format (YAML, JSON, Groovy). Their parsers may be easier to customize, as they do not need that schema definition.
Do some preprocessing
You may write a simple XSLT (XML transformation) that generates a changelog with changeSet ids from a file that has none.
Use timestamps as IDs
This would be my advice. It does not answer the question exactly as asked, but it is simple, consistent, provides additional information, and is good practice with other database migration tools as well: http://www.jeremyjarrell.com/using-flyway-db-with-distributed-version-control/
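For example, a timestamp-based id can be generated at the moment you write the changeset; a minimal sketch (the exact format is your choice):

import datetime

# e.g. '20240131143025' -- sortable and unique enough for hand-written changesets
changeset_id = datetime.datetime.now().strftime("%Y%m%d%H%M%S")
print(changeset_id)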
I wrote a Python script to generate unique IDs into Liquibase changelogs.
Be careful!
DO generate IDs
- when the changelog is in development or ready for release
- when you control the checksums of the target database
DON'T generate IDs
- when the changelog(s) are already deployed
"""
###############################################################################
Purpose: Generate unique subsequent IDs into Liquibase changelogs
###############################################################################
Args:
param1: full Windows path to the changelog directory (optional)
OR
--inplace: directly process changelogs (optional)
By default, XML files in the current directory are processed.
Returns:
In case of success, the output path is returned to stdout.
Otherwise, we crash and drag the system into mordor.
If you feel like wasting time you can:
a) port path handling to *nix
b) handle any obscure exceptions
c) add Unicode support (for better entertainment)
Dependencies:
Besides Python 3, in order to preserve XML comments, I had to use lxml
instead of the stock ElementTree parser.
Install lxml:
$ pip install lxml
Proxy clusterfuck? Don't panic! Simply download a .whl package from:
https://pypi.org/project/lxml/#files and install with pip.
Bugs:
Changesets having id="0" are ignored. Usually, these do not occur.
Author:
Tobias Bräutigam
Versions:
0.0.1 - re based, deprecated
0.0.2 - parse XML with lxml, CURRENT
"""
import datetime
import sys
import os
from pathlib import Path, PureWindowsPath
try:
    import lxml.etree as ET
except ImportError:
    print('''
    Error: module lxml is missing.
    Please install it:
    pip install lxml
    ''')
    exit()
# Process arguments
prefix = ''      # output path prefix ("out//"), if needed
outdir = 'out'

try:
    sys.argv[1]
except IndexError:
    pass
else:
    if sys.argv[1] == '--inplace':
        outdir = ''
    else:
        prefix = outdir + '//'
        # accept Windows path syntax
        inpath = PureWindowsPath(sys.argv[1])
        # convert path format
        inpath = Path(inpath)
        os.chdir(inpath)

# Create the output directory (unless --inplace) and clear any previous output
if outdir:
    try:
        os.mkdir(outdir)
    except OSError:
        pass
    filelist = [f for f in os.listdir(outdir)]
    for f in filelist:
        os.remove(os.path.join(outdir, f))
# Parse XML, generate IDs, write file
def parseX(filename, prefix):
    cnt = 0
    print(filename)
    tree = ET.parse(filename)
    # walk every element and replace any non-zero numeric id with a generated one
    for node in tree.iter():
        if int(node.attrib.get('id', 0)):
            now = datetime.datetime.now()
            node.attrib['id'] = str(int(now.strftime("%H%M%S%f")) + cnt * 37)
            cnt = cnt + 1
    root = tree.getroot()
    # NS URL element name is '' for ElementTree; lxml requires at least one character
    ET.register_namespace('x', u'http://www.liquibase.org/xml/ns/dbchangelog')
    tree = ET.ElementTree(root)
    tree.write(prefix + filename, encoding='utf-8', xml_declaration=True)
    print(str(cnt) + ' ID(s) generated.')
# Process files
print('\n')
items = 0
for infile in os.listdir('.'):
    if infile.lower().endswith('.xml'):
        parseX(infile, prefix)
        items = items + 1

# Message
print('\n' + str(items) + ' file(s) processed.\n\n')
if items > 0:
    print('Output was written to: \n\n')
    print(str(os.getcwd()) + '\\' + outdir + '\n')
Related
I would like to set up tests for my transforms in Foundry, passing test inputs and checking that the output is the expected one. Is it possible to call a transform with dummy datasets (a .csv file in the repo), or should I create functions inside the transform to be called by the tests (data created in code)?
If you check your platform documentation under Code Repositories -> Python Transforms -> Python Unit Tests, you'll find quite a few resources there that will be helpful.
The sections on writing and running tests in particular are what you're looking for.
// START DOCUMENTATION
Writing a Test
Full documentation can be found at https://docs.pytest.org
Pytest finds tests in any Python file that begins with test_.
It is recommended to put all your tests into a test package under the src directory of your project.
Tests are simply Python functions that are also named with the test_ prefix and assertions are made using Python’s assert statement.
PyTest will also run tests written using Python’s builtin unittest module.
For example, in transforms-python/src/test/test_increment.py a simple test would look like this:
def increment(num):
    return num + 1

def test_increment():
    assert increment(3) == 5
Running this test will cause checks to fail with a message that looks like this:
============================= test session starts =============================
collected 1 item
test_increment.py F [100%]
================================== FAILURES ===================================
_______________________________ test_increment ________________________________
def test_increment():
> assert increment(3) == 5
E assert 4 == 5
E + where 4 = increment(3)
test_increment.py:5: AssertionError
========================== 1 failed in 0.08 seconds ===========================
Testing with PySpark
PyTest fixtures are a powerful feature that enables injecting values into test functions simply by adding a parameter of the same name. This feature is used to provide a spark_session fixture for use in your test functions. For example:
def test_dataframe(spark_session):
    df = spark_session.createDataFrame([['a', 1], ['b', 2]], ['letter', 'number'])
    assert df.schema.names == ['letter', 'number']
// END DOCUMENTATION
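Putting this together with the original question: one hedged approach is to keep the transform's core logic in a plain function and test it against a dummy DataFrame built with the spark_session fixture. clean_df below is an illustrative stand-in for your own logic, not Foundry API:

# Hypothetical example: the transform's core logic lives in a plain function so it
# can be tested without Foundry inputs/outputs.
def clean_df(df):
    # stand-in for your real logic, e.g. drop rows with null numbers
    return df.dropna()

def test_clean_df(spark_session):
    input_df = spark_session.createDataFrame(
        [['a', 1], ['b', None]], ['letter', 'number'])
    result = clean_df(input_df)
    assert result.count() == 1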
If you don't want to specify your schemas in code, you can also read in a file from your repository by following the instructions in the documentation under How To -> Read file in Python repository
// START DOCUMENTATION
Read file in Python repository
You can read other files from your repository into the transform context. This might be useful in setting parameters for your transform code to reference.
To start, in your Python repository edit setup.py:
setup(
    name=os.environ['PKG_NAME'],
    # ...
    package_data={
        '': ['*.yaml', '*.csv']
    }
)
This tells Python to bundle the YAML and CSV files into the package. Then place a config file (for example config.yaml, but it can also be CSV or TXT) next to your Python transform (e.g. read_yml.py, see below):
- name: tbl1
  primaryKey:
    - col1
    - col2
  update:
    - column: col3
      with: 'XXX'
You can read it in your transform read_yml.py with the code below:
from transforms.api import transform_df, Input, Output
from pkg_resources import resource_stream
import yaml
import json

@transform_df(
    Output("/Demo/read_yml")
)
def my_compute_function(ctx):
    stream = resource_stream(__name__, "config.yaml")
    docs = yaml.load(stream)
    return ctx.spark_session.createDataFrame([{'result': json.dumps(docs)}])
So your project structure would be:
some_folder
    config.yaml
    read_yml.py
This will output in your dataset a single row with one column "result" with content:
[{"primaryKey": ["col1", "col2"], "update": [{"column": "col3", "with": "XXX"}], "name": "tbl1"}]
// END DOCUMENTATION
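Tying the two documentation excerpts together for the original question (a dummy .csv checked into the repo and fed to the transform logic in a test), a rough sketch might look like the following; the fixture file name and my_transform_logic are assumptions, and the CSV must be covered by the package_data pattern shown above:

import csv
import io

from pkg_resources import resource_stream

def my_transform_logic(df):
    # stand-in for the transform's real logic
    return df.dropDuplicates()

def test_transform_with_csv_fixture(spark_session):
    # dummy_input.csv sits next to this test module inside the package
    stream = resource_stream(__name__, "dummy_input.csv")
    rows = list(csv.DictReader(io.TextIOWrapper(stream, encoding="utf-8")))
    input_df = spark_session.createDataFrame(rows)
    result = my_transform_logic(input_df)
    assert result.count() <= len(rows)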
I discovered that the snakemake STAR module outputs as 'BAM Unsorted'.
Q1: Is there a way to change this to:
--outSAMtype BAM SortedByCoordinate
When I add the option in the 'extra' options I get an error message about duplicate definition:
EXITING: FATAL INPUT ERROR: duplicate parameter "outSAMtype" in input "Command-Line"
SOLUTION: keep only one definition of input parameters in each input source
Nov 15 09:46:07 ...... FATAL ERROR, exiting
logs/star/se/UY2_S7.log (END)
Should I consider adding a sorting module behind STAR instead?
Q2: How can I take a module from the wrapper repo and make it a local module, allowing me to edit it?
the code:
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester#jimmy.harvard.edu"
__license__ = "MIT"
import os
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
fq1 = snakemake.input.get("fq1")
assert fq1 is not None, "input-> fq1 is a required input parameter"
fq1 = [snakemake.input.fq1] if isinstance(snakemake.input.fq1, str) else snakemake.input.fq1
fq2 = snakemake.input.get("fq2")
if fq2:
fq2 = [snakemake.input.fq2] if isinstance(snakemake.input.fq2, str) else snakemake.input.fq2
assert len(fq1) == len(fq2), "input-> equal number of files required for fq1 and fq2"
input_str_fq1 = ",".join(fq1)
input_str_fq2 = ",".join(fq2) if fq2 is not None else ""
input_str = " ".join([input_str_fq1, input_str_fq2])
if fq1[0].endswith(".gz"):
readcmd = "--readFilesCommand zcat"
else:
readcmd = ""
outprefix = os.path.dirname(snakemake.output[0]) + "/"
shell(
"STAR "
"{extra} "
"--runThreadN {snakemake.threads} "
"--genomeDir {snakemake.params.index} "
"--readFilesIn {input_str} "
"{readcmd} "
"--outSAMtype BAM Unsorted "
"--outFileNamePrefix {outprefix} "
"--outStd Log "
"{log}")
Q1: Is there a way to change this to:
--outSAMtype BAM SortedByCoordinate
I would add another sorting rule after the wrapper, as that is the most 'standardized' way of doing it (a sketch of such a rule follows the quoted explanation below). You can also use another wrapper for sorting.
There is an explanation from the author of snakemake for the reason why the default is unsorted and why there is no option for sorted output in the wrapper:
https://bitbucket.org/snakemake/snakemake/issues/440/pre-post-wrapper
Regarding the SAM/BAM issue, I would say any wrapper should always output the optimal file format. Hence, whenever I write a wrapper for a read mapper, I ensure that the output is not SAM. Indexing and sorting should not be part of the same wrapper I think, because such a task has a completely different behavior regarding parallelization. Also, you would lose the mapping output if something goes wrong during the sorting or indexing.
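For instance, a minimal sorting rule could look like the sketch below (rule name, paths and thread count are illustrative; it simply pipes the wrapper's unsorted BAM through samtools sort):

rule samtools_sort:
    input:
        "star/{sample}/Aligned.out.bam"
    output:
        "star/{sample}/Aligned.sortedByCoord.out.bam"
    threads: 4
    shell:
        "samtools sort -@ {threads} -o {output} {input}"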
Q2: How can I take a module from the wrapper repo and make it a local module, allowing me to edit it?
If you wanted to do this, one way would be to download a local copy of the wrapper. In the shell portion of the downloaded wrapper, change Unsorted to {snakemake.params.outsamtype} (the modified shell() call is sketched after the rule below). In your Snakefile, change wrapper to script, point it at path/to/downloaded/wrapper, and add the outsamtype parameter:
rule star_se:
    input:
        fq1 = "reads/{sample}_R1.1.fastq"
    output:
        # see STAR manual for additional output files
        "star/{sample}/Aligned.out.bam"
    log:
        "logs/star/{sample}.log"
    params:
        # path to STAR reference genome index
        index="index",
        # optional parameters
        extra="",
        outsamtype = "SortedByCoordinate"
    threads: 8
    script:
        "path/to/downloaded/wrapper"
I think a separate rule without a wrapper for sorting, or even making your own STAR rule, is better. Modifying the wrapper defeats the whole purpose of it.
I have a compressed Hadoop SequenceFile from a customer which I'd like to inspect. I do not have full schema information at this time (which I'm working on separately).
But in the interim (and in the hopes of a generic solution), what are my options for inspecting the file?
I found a tool forqlift: http://www.exmachinatech.net/01/forqlift/
I have tried 'forqlift list' on the file. It complains that it can't load the classes for the custom Writable subclasses included, so I will need to track down those implementations.
But is there any other option available in the meantime? I understand that most likely I can't extract the data, but is there some tool for scanning how many key/value pairs there are, and of what types?
From shell:
$ hdfs dfs -text /user/hive/warehouse/table_seq/000000_0
or directly from hive (which is much faster for small files, because it is running in an already started JVM)
hive> dfs -text /user/hive/warehouse/table_seq/000000_0
works for sequence files.
Check the SequenceFileReadDemo class in the 'Hadoop: The Definitive Guide' sample code. Sequence files have the key/value types embedded in them. Use SequenceFile.Reader.getKeyClass() and SequenceFile.Reader.getValueClass() to get the type information.
My first thought would be to use the Java API for sequence files to try to read them. Even if you don't know which Writable is used by the file, you can guess and check the error messages (there may be a better way that I don't know).
For example:
private void readSeqFile(Path pathToFile) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, pathToFile, conf);
    Text key = new Text(); // this could be the wrong type
    Text val = new Text(); // also could be wrong
    while (reader.next(key, val)) {
        System.out.println(key + ":" + val);
    }
}
This program would crash if those are the wrong types, but the Exception should say which Writable type the key and value actually are.
Edit:
Actually if you do less file.seq usually you can read some of the header and see what the Writable types are (at least for the first key/value). On one file, for example, I see:
SEQ^F^Yorg.apache.hadoop.io.Text"org.apache.hadoop.io.BytesWritable
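If you only want that quick peek and don't have less handy, a rough Python equivalent is to dump the first bytes of the file (the file name is a placeholder); the key and value class names are stored as readable text right after the SEQ magic:

# Peek at a SequenceFile header: the key/value class names appear as readable
# text shortly after the b'SEQ' magic bytes at the start of the file.
with open("file.seq", "rb") as f:
    header = f.read(200)

print(header[:3])   # b'SEQ'
print(header)       # look for names such as org.apache.hadoop.io.Text in the output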
I'm not a Java or Hadoop programmer, so my way of solving the problem may not be the best one, but anyway.
I spent two days solving the problem of reading FileSeq locally (Linux Debian amd64) without installing Hadoop.
The provided sample
while (reader.next(key, val)) {
    System.out.println(key + ":" + val);
}
works well for Text, but didn't work for BytesWritable compressed input data.
What did I do? I downloaded this utility for creating (writing) Hadoop SequenceFile data:
github_com/shsdev/sequencefile-utility/archive/master.zip
got it working, then modified it for reading input Hadoop SeqFiles.
Instructions for running this utility from scratch on Debian:
sudo apt-get install maven2
sudo mvn install
sudo apt-get install openjdk-7-jdk
edit "sudo vi /usr/bin/mvn",
change `which java` to `which /usr/lib/jvm/java-7-openjdk-amd64/bin/java`
Also I've added (probably not required)
'
PATH="/home/mine/perl5/bin${PATH+:}${PATH};/usr/lib/jvm/java-7-openjdk-amd64/"; export PATH;
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/
export JAVA_VERSION=1.7
'
to ~/.bashrc
Then usage:
sudo mvn install
~/hadoop_tools/sequencefile-utility/sequencefile-utility-master$ /usr/lib/jvm/java-7-openjdk-amd64/bin/java -jar ./target/sequencefile-utility-1.0-jar-with-dependencies.jar
-- and this doesn't break the default java 1.6 installation that is required for FireFox/etc.
For resolving an error with FileSeq compatibility (e.g. "Unable to load native-hadoop library for your platform... using builtin-java classes where applicable"), I used the libs from the Hadoop master server as-is (a kind of hack):
scp root@10.15.150.223:/usr/lib/libhadoop.so.1.0.0 ~/
sudo cp ~/libhadoop.so.1.0.0 /usr/lib/
scp root@10.15.150.223:/usr/lib/jvm/java-6-sun-1.6.0.26/jre/lib/amd64/server/libjvm.so ~/
sudo cp ~/libjvm.so /usr/lib/
sudo ln -s /usr/lib/libhadoop.so.1.0.0 /usr/lib/libhadoop.so.1
sudo ln -s /usr/lib/libhadoop.so.1.0.0 /usr/lib/libhadoop.so
One night of drinking coffee, and I wrote this code for reading FileSeq Hadoop input files (run it with this command: "/usr/lib/jvm/java-7-openjdk-amd64/bin/java -jar ./target/sequencefile-utility-1.3-jar-with-dependencies.jar -d test/ -c NONE"):
import org.apache.hadoop.io.*;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.ValueBytes;
import java.io.DataOutputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
Path file = new Path("/home/mine/mycompany/task13/data/2015-08-30");
reader = new SequenceFile.Reader(fs, file, conf);
long pos = reader.getPosition();
logger.info("GO from pos "+pos);
DataOutputBuffer rawKey = new DataOutputBuffer();
ValueBytes rawValue = reader.createValueBytes();
int DEFAULT_BUFFER_SIZE = 1024 * 1024;
DataOutputBuffer kobuf = new DataOutputBuffer(DEFAULT_BUFFER_SIZE);
kobuf.reset();
int rl;
do {
    rl = reader.nextRaw(kobuf, rawValue);
    logger.info("read len for current record: " + rl + " and in more details ");
    if (rl >= 0) {
        logger.info("read key " + new String(kobuf.getData()) + " (keylen " + kobuf.getLength() + ") and data " + rawValue.getSize());
        FileOutputStream fos = new FileOutputStream("/home/mine/outb");
        DataOutputStream dos = new DataOutputStream(fos);
        rawValue.writeUncompressedBytes(dos);
        kobuf.reset();
    }
} while (rl > 0);
I've just added this chunk of code to file src/main/java/eu/scape_project/tb/lsdr/seqfileutility/SequenceFileWriter.java just after the line
writer = SequenceFile.createWriter(fs, conf, path, keyClass,
valueClass, CompressionType.get(pc.getCompressionType()));
Thanks to these sources of info:
Links:
If using hadoop-core instead of mahout, then you will have to download asm-3.1.jar manually:
search_maven_org/remotecontent?filepath=org/ow2/util/asm/asm/3.1/asm-3.1.jar
search_maven_org/#search|ga|1|asm-3.1
The list of available mahout repos:
repo1_maven_org/maven2/org/apache/mahout/
Intro to Mahout:
mahout_apache_org/
Good resource for learning interfaces and sources of Hadoop Java classes (I used it for writing my own code for reading FileSeq):
http://grepcode.com/file/repo1.maven.org/maven2/com.ning/metrics.action/0.2.7/org/apache/hadoop/io/BytesWritable.java
Sources of project tb-lsdr-seqfilecreator that I used for creating my own project FileSeq reader:
www_javased_com/?source_dir=scape/tb-lsdr-seqfilecreator/src/main/java/eu/scape_project/tb/lsdr/seqfileutility/ProcessParameters.java
stackoverflow_com/questions/5096128/sequence-files-in-hadoop - the same example (read key,value that doesn't work)
https://github.com/twitter/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/RawSequenceFileRecordReader.java - this one helped me (I used reader.nextRaw the same as in nextKeyValue() and other subs)
Also, I changed ./pom.xml to use native apache.hadoop instead of mahout.hadoop, but this is probably not required, because the bugs for read->next(key, value) are the same for both, so I had to use read->nextRaw(keyRaw, valueRaw) instead:
diff ../../sequencefile-utility/sequencefile-utility-master/pom.xml ./pom.xml
9c9
< <version>1.0</version>
---
> <version>1.3</version>
63c63
< <version>2.0.1</version>
---
> <version>2.4</version>
85c85
< <groupId>org.apache.mahout.hadoop</groupId>
---
> <groupId>org.apache.hadoop</groupId>
87c87
< <version>0.20.1</version>
---
> <version>1.1.2</version>
93c93
< <version>1.1</version>
---
> <version>1.1.3</version>
I was just playing with Dumbo. When you run a Dumbo job on a Hadoop cluster, the output is a sequence file. I used the following to dump out an entire Dumbo-generated sequence file as plain text:
$ bin/hadoop jar contrib/streaming/hadoop-streaming-1.0.4.jar \
-input totals/part-00000 \
-output unseq \
-inputformat SequenceFileAsTextInputFormat
$ bin/hadoop fs -cat unseq/part-00000
I got the idea from here.
Incidentally, Dumbo can also output plain text.
Following the answer of Praveen Sripati, here is a small example of SequenceFileReadDemo.java from Hadoop: The Definitive Guide by Tom White.
The data is in HDFS, at this location: user/hduser/output-hashsort/, and the file is
part-r-00001
In Eclipse, in the Arguments field I've written this string:
and this is part of the output, with the debugger
I am currently trying to configure collective.xsendfile, Apache mod_xsendfile and Plone 4.
Apparently the Apache process does not see the blobstorage files on the file system because of their permissions:
ls -lh var/blobstorage/0x00/0x00/0x00/0x00/0x00/0x18/0xd5/0x19/0x038ea09d0eddc611.blob
-r-------- 1 plone plone 1006K May 28 15:30 var/blobstorage/0x00/0x00/0x00/0x00/0x00/0x18/0xd5/0x19/0x038ea09d0eddc611.blob
How do I configure blobstorage to give additional permissions, so that Apache could access these files?
The modes with which the blobstorage writes its directories and files are hardcoded in ZODB.blob. Specifically, the standard ZODB.blob.FilesystemHelper class creates secure directories (only readable and writable by the current user) by default.
You could provide your own implementation of FilesystemHelper that would either make this configurable or just set the directory modes to 0750, and then patch ZODB.blob.BlobStorageMixin to use your class instead of the default:
import os

from ZODB import utils
from ZODB.blob import FilesystemHelper, BlobStorageMixin
from ZODB.blob import log, LAYOUT_MARKER


class GroupReadableFilesystemHelper(FilesystemHelper):

    def create(self):
        if not os.path.exists(self.base_dir):
            os.makedirs(self.base_dir, 0750)
            log("Blob directory '%s' does not exist. "
                "Created new directory." % self.base_dir)
        if not os.path.exists(self.temp_dir):
            os.makedirs(self.temp_dir, 0750)
            log("Blob temporary directory '%s' does not exist. "
                "Created new directory." % self.temp_dir)

        if not os.path.exists(os.path.join(self.base_dir, LAYOUT_MARKER)):
            layout_marker = open(
                os.path.join(self.base_dir, LAYOUT_MARKER), 'wb')
            layout_marker.write(self.layout_name)
        else:
            layout = open(os.path.join(self.base_dir, LAYOUT_MARKER), 'rb'
                          ).read().strip()
            if layout != self.layout_name:
                raise ValueError(
                    "Directory layout `%s` selected for blob directory %s, but "
                    "marker found for layout `%s`" %
                    (self.layout_name, self.base_dir, layout))

    def isSecure(self, path):
        """Ensure that (POSIX) path mode bits are 0750."""
        return (os.stat(path).st_mode & 027) == 0

    def getPathForOID(self, oid, create=False):
        """Given an OID, return the path on the filesystem where
        the blob data relating to that OID is stored.

        If the create flag is given, the path is also created if it didn't
        exist already.
        """
        # OIDs are numbers and sometimes passed around as integers. For our
        # computations we rely on the 64-bit packed string representation.
        if isinstance(oid, int):
            oid = utils.p64(oid)

        path = self.layout.oid_to_path(oid)
        path = os.path.join(self.base_dir, path)
        if create and not os.path.exists(path):
            try:
                os.makedirs(path, 0750)
            except OSError:
                # We might have lost a race. If so, the directory
                # must exist now
                assert os.path.exists(path)
        return path


def _blob_init_groupread(self, blob_dir, layout='automatic'):
    self.fshelper = GroupReadableFilesystemHelper(blob_dir, layout)
    self.fshelper.create()
    self.fshelper.checkSecure()
    self.dirty_oids = []

BlobStorageMixin._blob_init = _blob_init_groupread
Quite a handful; you may want to make this a feature request for ZODB3 :-)
While setting up a backup routine for a ZOPE/ZEO setup, I ran into the same problem with blob permissions.
After trying to apply the monkey patch that Mikko wrote (which is not that easy), I came up with a "real" patch to solve the problem.
The patch suggested by Martijn is not complete; it still does not set the right mode on blob files.
So here's my solution:
1.) Create a patch containing:
Index: ZODB/blob.py
===================================================================
--- ZODB/blob.py (Revision 121959)
+++ ZODB/blob.py (Arbeitskopie)
@@ -337,11 +337,11 @@
def create(self):
if not os.path.exists(self.base_dir):
- os.makedirs(self.base_dir, 0700)
+ os.makedirs(self.base_dir, 0750)
log("Blob directory '%s' does not exist. "
"Created new directory." % self.base_dir)
if not os.path.exists(self.temp_dir):
- os.makedirs(self.temp_dir, 0700)
+ os.makedirs(self.temp_dir, 0750)
log("Blob temporary directory '%s' does not exist. "
"Created new directory." % self.temp_dir)
@@ -359,8 +359,8 @@
(self.layout_name, self.base_dir, layout))
def isSecure(self, path):
- """Ensure that (POSIX) path mode bits are 0700."""
- return (os.stat(path).st_mode & 077) == 0
+ """Ensure that (POSIX) path mode bits are 0750."""
+ return (os.stat(path).st_mode & 027) == 0
def checkSecure(self):
if not self.isSecure(self.base_dir):
@@ -385,7 +385,7 @@
if create and not os.path.exists(path):
try:
- os.makedirs(path, 0700)
+ os.makedirs(path, 0750)
except OSError:
# We might have lost a race. If so, the directory
# must exist now
@@ -891,7 +891,7 @@
file2.close()
remove_committed(f1)
if chmod:
- os.chmod(f2, stat.S_IREAD)
+ os.chmod(f2, stat.S_IRUSR | stat.S_IRGRP)
if sys.platform == 'win32':
# On Windows, you can't remove read-only files, so make the
You can also take a look at the patch here -> http://pastebin.com/wNLYyXvw
2.) Store the patch under name 'blob.patch' in your buildout root directory
3.) Extend your buildout configuration:
parts +=
    patchblob
    postinstall

[patchblob]
recipe = collective.recipe.patch
egg = ZODB3
patches = blob.patch

[postinstall]
recipe = plone.recipe.command
command =
    chmod -R g+r ${buildout:directory}/var
    find ${buildout:directory}/var -type d | xargs chmod g+x
update-command = ${:command}
The postinstall section sets the desired group read permissions on already existing blobs. Note that execute permission must also be given to the blob folders, so that the group can enter the directories.
I've tested this patch with ZODB 3.10.2 and 3.10.3.
As Martijn suggested, this should be configurable and part of the ZODB directly.
How can I execute a whole SQL file against a database using SQLAlchemy? There can be many different SQL queries in the file, including begin and commit/rollback.
sqlalchemy.text or sqlalchemy.sql.text
The text construct provides a straightforward method to directly execute .sql files.
from sqlalchemy import create_engine
from sqlalchemy import text
# or from sqlalchemy.sql import text

engine = create_engine('mysql://{USR}:{PWD}@localhost:3306/db', echo=True)

with engine.connect() as con:
    with open("src/models/query.sql") as file:
        query = text(file.read())
        con.execute(query)
SQLAlchemy: Using Textual SQL
text()
I was able to run .sql schema files using pure SQLAlchemy and some string manipulations. It surely isn't an elegant approach, but it works.
# Open the .sql file
sql_file = open('file.sql', 'r')

# Create an empty command string
sql_command = ''

# Iterate over all lines in the sql file
for line in sql_file:
    # Ignore commented lines
    if not line.startswith('--') and line.strip('\n'):
        # Append line to the command string
        sql_command += line.strip('\n')

        # If the command string ends with ';', it is a full statement
        if sql_command.endswith(';'):
            # Try to execute statement and commit it
            try:
                session.execute(text(sql_command))
                session.commit()

            # Assert in case of error
            except:
                print('Ops')

            # Finally, clear command string
            finally:
                sql_command = ''
It iterates over all lines in a .sql file ignoring commented lines.
Then it concatenates lines that form a full statement and tries to execute the statement. You just need a file handler and a session object.
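For completeness, a minimal setup for the snippet above could look like this (the connection URL is a placeholder; any SQLAlchemy URL should work):

from sqlalchemy import create_engine, text
from sqlalchemy.orm import sessionmaker

engine = create_engine('sqlite:///example.db')  # placeholder URL
Session = sessionmaker(bind=engine)
session = Session()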
You can do it with SQLAlchemy and psycopg2.
file = open(path)
engine = sqlalchemy.create_engine(db_url)
escaped_sql = sqlalchemy.text(file.read())
engine.execute(escaped_sql)
Unfortunately I'm not aware of a good general answer for this. Some dbapi's (psycopg2 for instance) support executing many statements at a time. If the files aren't huge you can just load them into a string and execute them on a connection. For others, I would try to use a command-line client for that db and pipe the data into that using the subprocess module.
If those approaches aren't acceptable, then you'll have to go ahead and implement a small SQL parser that can split the file apart into separate statements. This is really tricky to get 100% correct, as you'll have to factor in database dialect specific literal escaping rules, the charset used, any database configuration options that affect literal parsing (e.g. PostgreSQL standard_conforming_strings).
If you only need to get this 99.9% correct, then some regexp magic should get you most of the way there.
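As a rough sketch of the command-line-client route mentioned above (assuming a PostgreSQL database, psql on the PATH, and placeholder URL/file names):

import subprocess

# Let psql do the statement splitting; it understands dollar-quoting, COPY, etc.
subprocess.run(
    ["psql", "postgresql://user:password@localhost/dbname", "-f", "script.sql"],
    check=True,
)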
If you are using sqlite3, it has a useful extension to the DBAPI called conn.executescript(str). I've hooked this up via something like this and it seemed to work (not all context is shown, but it should be enough to get the drift):
def init_from_script(script):
    Base.metadata.drop_all(db_engine)
    Base.metadata.create_all(db_engine)

    # HACK ALERT: we can do this using the sqlite3 low-level api, then reopen the session.
    f = open(script)
    script_str = f.read().strip()

    global db_session
    db_session.close()

    import sqlite3
    conn = sqlite3.connect(db_file_name)
    conn.executescript(script_str)
    conn.commit()

    db_session = Session()
Is this pure evil, I wonder? I looked in vain for a 'pure' SQLAlchemy equivalent; perhaps that could be added to the library, something like db_session.execute_script(file_name)? I'm hoping that db_session will work just fine after all that (i.e. no need to restart the engine), but I'm not sure yet... further research needed (i.e. do we need to get a new engine or just a session after going behind SQLAlchemy's back?).
FYI, sqlite3 includes a related routine: sqlite3.complete_statement(sql), if you roll your own parser...
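If you do roll your own parser, a hedged sketch using sqlite3.complete_statement (only meaningful for SQLite syntax) could look like this:

import sqlite3

def iter_statements(path):
    # Accumulate lines until sqlite3 considers the buffer a complete statement.
    buffer = ''
    with open(path) as f:
        for line in f:
            buffer += line
            if sqlite3.complete_statement(buffer):
                yield buffer.strip()
                buffer = ''

# for stmt in iter_statements('schema.sql'):
#     db_session.execute(stmt)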
You can access the raw DBAPI connection through this
raw_connection = mySqlAlchemyEngine.raw_connection()
raw_cursor = raw_connection() #get a hold of the proxied DBAPI connection instance
but then it will depend on which dialect/driver you are using which can be referred to through this list.
For psycopg2, you can just do
raw_cursor.execute(open("my_script.sql").read())
but for pysqlite you would need to do
raw_cursor.executescript(open("my_script").read())
and in line with that you would need to check the documentation of whichever DBAPI driver you are using to see if multiple statements are allowed in one execute or if you would need to use a helper like executescript which is unique to pysqlite.
Here's how to run the script, splitting the statements and running each one directly with a "connectionless" execution on the SQLAlchemy Engine. This assumes that each statement ends with a ";" and that there's no more than one statement per line.
import re

from sqlalchemy import create_engine, text

engine = create_engine(url)

with open('script.sql') as file:
    statements = re.split(r';\s*$', file.read(), flags=re.MULTILINE)

for statement in statements:
    if statement:
        engine.execute(text(statement))
In the current answers, I did not find a solution that works when a combination of these features is present in the .SQL file:
Comments with "--"
Multi-line statements with additional comments after "--"
Function definitions which have multiple SQL queries ending with ";" but must be executed as a whole statement
I found a rather simple solution:
# check for /* */
with open(file, 'r') as f:
    assert '/*' not in f.read(), 'comments with /* */ not supported in SQL file python interface'

# we check out the SQL file line-by-line into a list of strings (without \n, ...)
with open(file, 'r') as f:
    queries = [line.strip() for line in f.readlines()]

# from each line, remove all text which is behind a '--'
def cut_comment(query: str) -> str:
    idx = query.find('--')
    if idx >= 0:
        query = query[:idx]
    return query

# join all in a single line of code with blank spaces
queries = [cut_comment(q) for q in queries]
sql_command = ' '.join(queries)

# execute in connection (e.g. sqlalchemy)
conn.execute(sql_command)
The code below works for me in Alembic migrations.
from alembic import op
import sqlalchemy as sa

from ekrec.common import get_project_root


def upgrade():
    path = f'{get_project_root()}/migrations/versions/fdb8492f75b2_.sql'
    op.execute(open(path).read())
I had success with David's answer here, with two slight modifications:
Use get_bind() as I was working with a Session rather than an Engine
Call cursor() on the raw connection
raw_connection = myDbSession.get_bind().raw_connection()
raw_cursor = raw_connection.cursor()
raw_cursor.execute(open("my_script.sql").read())