How to get the final SQL string in the beego ORM - sql

I want to get the final SQL string in beego's ORM,
but I cannot find an interface that returns the SQL string.
I want to add logging for database operations.
I am looking for a way that does not require turning on orm.Debug.
orm.Debug = false

I think you want to use orm.Debug mode:
Setting orm.Debug to true will print out SQL queries
It may cause performance issues. It is not recommended for use in production environments.
....
Prints to os.Stderr by default.
You can change it to your own io.Writer
More info

Related

Fetching big SQL table in the web app session

I'm quite new to web apps, so I apologize if my question is a bit basic. I'm developing a web app with R Shiny where the inputs are very large tables from Azure SQL Server. There are 20 tables, each on the order of hundreds of thousands of rows and hundreds of columns containing numbers, characters, etc. I have no problem querying them; my main issue is that fetching everything from Azure SQL Server takes approximately 20 minutes, so the user of the web app has to wait a long time.
I'm using DBI package as follows:
db_connect <- function(database_config_name){
  dbConfig <- config::get(database_config_name)
  connection <- DBI::dbConnect(odbc::odbc(),
                               Driver = dbConfig$driver,
                               Server = dbConfig$server,
                               UID = dbConfig$uid,
                               PWD = dbConfig$pwd,
                               Database = dbConfig$database,
                               encoding = "latin1"
  )
  return(connection)
}
and then fetching tables by :
connection <- db_connect(db_config_name)
table <- dplyr::tbl(connection, dbplyr::in_schema(fetch_schema_name(db_config_name,table_name,data_source_type), fetch_table_name(db_config_name,table_name,data_source_type)))
I searched a lot but didn't come across a good solution. I would appreciate any solution that can tackle this problem.
I work with R accessing SQL Server (not Azure) daily. For larger data (as in your example), I always revert to using the command-line tool sqlcmd, it is significantly faster. The only pain point for me was learning the arguments and working around the fact that it does not return proper CSV, there is post-query munging required. You may have an additional pain-point of having to adjust my example to connect to your Azure instance (I do not have an account).
In order to use this in a shiny environment and preserve its interactivity, I use the processx package to start the process in the background and then poll its exit status periodically to determine when it has completed.
Up front: this is mostly a "loose guide", I do not pretend that this is a fully-functional solution for you. There might be some rough-edges that you need to work through yourself. For instance, while I say you can do it asynchronously, it is up to you to work the polling process and delayed-data-availability into your shiny application. My answer here provides starting the process and reading the file once complete. And finally, if encoding= is an issue for you, I don't know if sqlcmd does non-latin correctly, and I don't know if or how to fix this with its very limited, even antiquated arguments.
Steps:
Save the query into a text file. Short queries can be provided on the command-line, but past some point (128 chars? I don't know that it's clearly defined, and have not looked enough recently) it just fails. Using a query-file is simple enough and always works, so I always use it.
I always use temporary files for each query instead of hard-coding the filename; this just makes sense. For convenience (for me), I use the same tempfile base name and append .sql for the query and .csv for the returned data, that way it's much easier to match query-to-data in the temp files. It's a convention I use, nothing more.
tf <- tempfile()
# using the same tempfile base name for both the query and csv-output temp files
querytf <- paste0(tf, ".sql")
writeLines(query, querytf)
csvtf <- paste0(tf, ".csv")
# these may be useful in troubleshoot, but not always [^2]
stdouttf <- paste0(tf, ".stdout")
stderrtf <- paste0(tf, ".stderr")
Make the call. I suggest you see how fast this is in a synchronous way first to see if you need to add an async query and polling in your shiny interface.
exe <- "/path/to/sqlcmd" # or "sqlcmd.exe"
args <- c("-W", "-b", "-s", "\037", "-i", querytf, "-o", csvtf,
          "-S", dbConfig$server, "-d", dbConfig$database,
          "-U", dbConfig$uid, "-P", dbConfig$pwd)
## as to why I use "\037", see [^1]
## note that the user id and password will be visible on the shiny server
## via a `ps -fax` command-line call
proc <- processx::process$new(command = exe, args = args,
stdout = stdouttf, stderr = stderrtf) # other args exist
# this should return immediately, and should be TRUE until
# data retrieval is done (or error)
proc$is_alive()
# this will hang (pause R) until retrieval is complete; if/when you
# shift to asynchronous queries, do not do this
proc$wait()
One can use processx::run instead of process$new and proc$wait(), but I thought I'd start you down this path in case you want/need to go asynchronous.
If you go with an asynchronous operation, then periodically check (perhaps every 3 or 10 seconds) proc$is_alive(). Once that returns FALSE, you can start processing the file. During this time, shiny will continue to operate normally. (If you do not go async and therefore choose to proc$wait(), then shiny will hang until the query is complete.)
If you make a mistake and do not proc$wait() and try to continue with reading the file, that's a mistake. The file may not exist, in which case it will err with No such file or directory. The file may exist, perhaps empty. It may exist and have incomplete data. So really, make a firm decision to stay synchronous and therefore call proc$wait(), or go asynchronous and poll periodically until proc$is_alive() returns FALSE.
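The start-then-poll pattern described above is not specific to R. As a rough illustration only, here is the same idea sketched in Python with the standard library's subprocess module. The function name run_and_poll, the interval and timeout values, and the stand-in child command are assumptions made for this sketch; they are not part of the answer's R code, and a real call would launch sqlcmd instead.

```python
import subprocess
import sys
import time


def run_and_poll(cmd, interval=0.1, timeout=30.0):
    """Start cmd in the background, poll until it exits, return its exit code."""
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    deadline = time.monotonic() + timeout
    # poll() returns None while the child is still running; this plays the
    # role of proc$is_alive() in the R code.
    while proc.poll() is None:
        if time.monotonic() > deadline:
            proc.kill()
            proc.wait()
            raise TimeoutError("command did not finish in time")
        time.sleep(interval)  # an interactive app would do other work here
    return proc.returncode


# Stand-in for the sqlcmd call (assumption): a child process that sleeps briefly.
exit_status = run_and_poll([sys.executable, "-c", "import time; time.sleep(0.2)"])
```

Only once the loop has finished is it safe to open the output file, which is the same rule as waiting for proc$is_alive() to return FALSE.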
Reading in the file. There are three "joys" of using sqlcmd that require special handling of the file.
It does not do embedded quotes consistently, which is why I chose to use "\037" as a separator. (See [^1].)
It adds a line of dashes under the column names, which will corrupt the auto-classing of data when R reads in the data. For this, we do a two-step read of the file.
Nulls in the database are the literal NULL string in the data. For this, we update the na.strings= argument when reading the file.
exitstat <- proc$get_exit_status()
if (exitstat == 0) {
  ## read #1: get the column headers
  tmp1 <- read.csv(csvtf, nrows = 2, sep = "\037", header = FALSE)
  colnms <- unlist(tmp1[1,], use.names = FALSE)
  ## read #2: read the rest of the data
  out <- read.csv(csvtf, skip = 2, header = FALSE, sep = "\037",
                  na.strings = c("NA", "NULL"), quote = "")
  colnames(out) <- colnms
} else {
  # you should check both stdout and stderr files, see [^2]
  stop("'sqlcmd' exit status: ", exitstat)
}
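For readers who want to see the file layout independently of read.csv, the three quirks listed above (the "\037" separator, the row of dashes under the header, and literal NULL strings) can be handled as in the following Python sketch. The function name read_sqlcmd_output and the sample text are made up for illustration. Unlike read.csv, this sketch does no type inference, so every value stays a string.

```python
SEP = "\x1f"  # the same byte that the R code writes as "\037"


def read_sqlcmd_output(text):
    """Parse sqlcmd-style text: header row, row of dashes, then data rows."""
    lines = [line for line in text.splitlines() if line != ""]
    header = lines[0].split(SEP)
    rows = []
    # lines[1] is the row of dashes under the column names, so start at 2
    for line in lines[2:]:
        fields = line.split(SEP)
        # sqlcmd writes database NULLs as the literal string "NULL"
        rows.append({name: (None if value == "NULL" else value)
                     for name, value in zip(header, fields)})
    return rows


# Made-up sample in the layout described above
sample = "id\x1fname\n--\x1f----\n1\x1falice\n2\x1fNULL\n"
rows = read_sqlcmd_output(sample)
```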
Notes:
[^1] After a lot of pain with several issues (some in sqlcmd.exe, some in data.table::fread and other readers, all dealing with CSV-format non-compliance), at one point I chose to stop working with comma-delimited returns, instead opting for the "\037" field delimiter. It works fine with all CSV-reading tools and has fixed so many problems (some not mentioned here). If you're not concerned, feel free to change the args to "-s", "," (adjusting the read as well).
[^2] sqlcmd seems to use stdout or stderr in different ways when there are problems. I'm sure there's rationale somewhere, but the point is that if there is a problem, check both files.
I added the use of both stdout= and stderr= because of a lot of troubleshooting I did, and continue to do if I munge a query. Using them is not strictly required, but you might be throwing caution to the wind if you omit those options.
By the way, if you choose to only use sqlcmd for all of your queries, there is no need to create a connection object in R. That is, db_connect may not be necessary. In my use, I tend to use "real" R DBI connections for known-small queries and the bulk sqlcmd for anything above around 10K rows. There is a tradeoff; I have not measured it sufficiently in my environment to know where the tipping point is, and it is likely different in your case.

Flask-migrate change db before upgrade

I have a multi-tenancy structure set up where each client has a schema set up for them. The structure mirrors the "parent" schema, so any migration that happens needs to happen for each schema identically.
I am using Flask-Script with Flask-Migrate to handle migrations.
What I tried so far is iterating over my schema names, building a URI for them, scoping a new db.session with the engine generated from the URI, and finally running the upgrade function from flask_migrate.
@manager.command
def upgrade_all_clients():
    clients = clients_model.query.all()
    for c in clients:
        application.extensions["migrate"].migrate.db.session.close_all()
        application.extensions["migrate"].migrate.db.session = db.create_scoped_session(
            options={
                "bind": create_engine(generateURIForSchema(c.subdomain)),
                "binds": {},
            }
        )
        upgrade()
    return
I am not entirely sure why this doesn't work, but the result is that it only runs the migration for the db that was set up when the application starts.
My theory is that I am not changing the session that was originally set up when the manager script runs.
Is there a better way to migrate each of these schemas without setting multiple binds and using the --multidb parameter? I don't think I can use SQLALCHEMY_BINDS in the config since these schemas need to be able to be dynamically created/destroyed.
For those who are encountering the same issue, the answer to my specific situation was incredibly simple.
@manager.command
def upgrade_all_clients():
    clients = clients_model.query.all()
    for c in clients:
        print("Upgrading client '{}'...".format(c.subdomain))
        db.engine.url.database = c.subdomain
        _upgrade()
    return
The database attribute of the db.engine.url is what targets the schema. I don't know if this is the best way to solve this, but it does work and I can migrate each schema individually.

Enable Impala Impersonation on Superset

Is there a way to make the logged user (on superset) to make the queries on impala?
I tried to enable the "Impersonate the logged on user" option on Databases but with no success because all the queries run on impala with superset user.
I'm trying to achieve the same! This will not completely answer the question, since it still does not work, but I want to share my research in case it helps another soul who is trying to use this tool outside very basic use cases.
I went deep in the code and I found out that impersonation is not implemented for Impala. So you cannot achieve this from the UI. I found out this PR https://github.com/apache/superset/pull/4699 that for whatever reason was never merged into the codebase and tried to copy&paste code in my Superset version (1.1.0) but it didn't work. Adding some logs I can see that the configuration with the impersonation is updated, but then the actual Impala query is with the user I used to start the process.
As you can imagine, I am a complete noob at this. However I found out that the impersonation thing happens when you create a cursor and there is a constructor parameter in which you can pass the impersonation configuration.
I managed to correctly (at least to my understanding) implement impersonation for the SQL lab part.
In the sql_lab.py module you have to add the following lines to the execute_sql_statements method
with closing(engine.raw_connection()) as conn:
    # closing the connection closes the cursor as well
    cursor = conn.cursor(**database.cursor_kwargs)
where cursor_kwargs is defined in db_engine_specs/impala.py as the following
@classmethod
def get_configuration_for_impersonation(cls, uri, impersonate_user, username):
    logger.info(
        'Passing Impala execution_options.cursor_configuration for impersonation')
    return {'execution_options': {
        'cursor_configuration': {'impala.doas.user': username}}}
@classmethod
def get_cursor_configuration_for_impersonation(cls, uri, impersonate_user,
                                               username):
    logger.debug('Passing Impala cursor configuration for impersonation')
    return {'configuration': {'impala.doas.user': username}}
Finally, in models/core.py you have to add the following bit in the get_sqla_engine def
params = extra.get("engine_params", {}) # that was already there just for you to find out the line
self.cursor_kwargs = self.db_engine_spec.get_cursor_configuration_for_impersonation(
str(url), self.impersonate_user, effective_username) # this is the line I added
...
params.update(self.get_encrypted_extra()) # already there
# new stuff
configuration = {}
configuration.update(
    self.db_engine_spec.get_configuration_for_impersonation(
        str(url),
        self.impersonate_user,
        effective_username))
if configuration:
    params.update(configuration)
As you can see, I just shamelessly pasted the code from the PR. However, as I already said, this only works for the SQL Lab. For the dashboards there is an entirely different way of querying Impala that I have not figured out yet.
This means that queries for the dashboards are handled in a different way and there isn't something like this
with closing(engine.raw_connection()) as conn:
    # closing the connection closes the cursor as well
    cursor = conn.cursor(**database.cursor_kwargs)
My gut (and debugging) feeling is that you first need to understand the SQLAlchemy part and write a new ImpalaEngine class that uses a custom cursor with the impersonation configuration, or something like that. However, it is not as simple (if we want to call it simple) as the sql_lab part. So the trick is to find out where the query is executed and create a cursor with the impersonation configuration. Easy, isn't it?
I hope that this could shed some light to you and the others that have this issue. Let me know if you did find out another way to solve this issue, or if this comment was useful.
Update: something really useful
A colleague of mine successfully implemented impersonation with Impala without touching anything Superset-related, working directly with the impyla library instead. A PR was opened with the code to change. You can apply the patch directly to the impyla source used by Superset. You have to edit both dbapi.py and hiveserver2.py.
As a reminder: we are still testing this and we do not know if it works with different accounts using the same superset instance.

Someone injecting to my website and running update scripts on my site

I have found that the data in the tables of my 5-year-old site was suddenly mixed up:
some data that cannot be updated via any existing SP has been updated.
After a long search through the SPs I came to the conclusion that somebody is messing with my site.
I assume it is done via SQL injection.
I have a huge amount of traffic on my site 24/7, the site has more than 100 pages, and the logs currently only show which user entered which page. More logging will slow down the site even more, so I need to act efficiently.
1. What is the best way to find where someone is injecting?
2. How do I log their IP and the time of the injection?
I have never done this before and have read lots of mixed opinions on Google. Please advise on your best practice.
Instead of tracking down the "bad guys", you should focus on restoring your database and making your code resistant to injection. I'm not sure what the best way is for ASP.NET, but in Java it is well known that prepared statements, with user input passed only as bound parameters, prevent SQL injection on your data.
Check out this link for how to improve your code in asp.net:
Classic ASP SQL Injection Protection
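To make the prepared-statement point concrete, here is a minimal sketch in Python using the standard library's sqlite3 module. Neither ASP.NET nor Java is used, and the users table, the sample rows and the malicious input are invented for the demonstration. It runs the same lookup twice: once with the input concatenated into the SQL text, and once with the input passed as a bound parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany("INSERT INTO users (email) VALUES (?)",
                 [("a@example.com",), ("b@example.com",)])

malicious = "x' OR '1'='1"  # hypothetical attacker-supplied input

# Unsafe: the input becomes part of the SQL text, so the quote characters
# close the string literal and the OR clause matches every row.
unsafe_sql = "SELECT id FROM users WHERE email = '" + malicious + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the input is sent as a bound parameter and is only ever compared
# as a value; no user has that literal string as an email address.
safe_rows = conn.execute("SELECT id FROM users WHERE email = ?",
                         (malicious,)).fetchall()
```

The concatenated query returns both rows, while the parameterized one returns none.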
This is some code I use on my page as a "catch all" attempt for injection attempts via query strings (ie, data sent through a URL):
trap = 0
ref = lcase(request.querystring)
if ref <> "" then
    ' ref is lowercased above, so every pattern must be lowercase as well
    badChars = array("or 1=1", "select", "drop", "shutdown", "--", "insert", "delete", "waitfor", "union", "0x31", "convert", "truncate", "sysobjects")
    cn = 0
    for i = 0 to uBound(badChars)
        ' instr returns 0 for no match and 1 for a match at the very start
        if instr(ref, badChars(i)) > 0 then cn = cn + 1
    next
    if cn >= 2 then trap = 1
end if
if trap = 1 then
    ' ... ban user ip code here
end if
You could simply put "if trap = 1 then response.end" which would stop any further action on the page. I prefer to ban the IP for an hour.
This should also work with request.form for form input.
You may also want to sanitize your variables that take form input.
data=request.form("emailaddress")
data = replace(data,"'","")
data = replace(data,"union","")
etc.

Problem during SQL Bulk Load

we've got a real confusing problem. We're trying to test an SQL Bulk Load using a little app we've written that passes in the datafile XML, the schema, and the SQL database connection string.
It's a very straight-forward app, here's the main part of the code:
SQLXMLBULKLOADLib.SQLXMLBulkLoad4Class objBL = new SQLXMLBULKLOADLib.SQLXMLBulkLoad4Class();
objBL.ConnectionString = "provider=sqloledb;Data Source=SERVER\\SERVER; Database=Main;User Id=Username;Password=password;";
objBL.BulkLoad = true;
objBL.CheckConstraints = true;
objBL.ErrorLogFile = "error.xml";
objBL.KeepIdentity = false;
objBL.Execute("schema.xml", "data.xml");
As you can see, it's very simple but we're getting the following error from the library we're passing this stuff to: Interop.SQLXMLBULKLOADLib.dll.
The message reads:
Failure: Attempted to read or write protected memory. This is often an indication that other memory has been corrupted
We have no idea what's causing it or what it even means.
Before this we first had an error because SQLXML4.0 wasn't installed, so that was easy to fix. Then there was an error because it couldn't connect to the database (wrong connection string) - fixed. Now there's this and we are just baffled.
Thanks for any help. We're really scratching our heads!
I am not familiar with this particular utility (Interop.SQLXMLBULKLOADLib.dll), but have you checked that your XML validates to its schema .xsd file? Perhaps the dll could have issues with loading the xml data file into memory structures if it is invalid?
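As a first, much weaker check than schema validation, one can at least confirm that the data file is well-formed XML. The sketch below uses Python's standard library, which does not validate against an .xsd, so it tests well-formedness only. The function name and the sample strings are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(xml_text):
    """Return None if xml_text parses, else a message with the error position."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        line, column = exc.position  # where the parser gave up
        return "line {}, column {}: {}".format(line, column, exc)
    return None


ok = well_formedness_error("<rows><row id='1'/></rows>")
bad = well_formedness_error("<rows><row></rows>")  # <row> is never closed
```

A file that fails this check cannot validate against any schema, so it is a cheap way to rule out one class of problem before looking at the DLL itself.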
I tried to understand your problem, but I still have some doubts about it.
If you have time, try the link below; I think it will be useful for you.
link text
I know I did something that raised this error message once, but (as often happens) the problem ended up having nothing to do with the error message. Not much help, alas.
Some troubleshooting ideas: try to determine the actual SQL command being generated and submitted by the application to SQL Server (SQL Profiler should help here), and run it as "close" to the database as possible--from within SSMS, using SQLCMD, direct BCP call, whatever is appropriate. Detailing all tests you make and the results you get may help.