MySQL optimized query execution - scripting

As part of an ongoing research project, I am checking whether a URL exists or not using the cURL command. I have been running a shell script for a couple of days and it does an update for each URL in my database. However, the script only seems to update around 100,000 rows a day.
I was thinking that if I wrote the values to a file first and then did the updates, the execution might be faster.
I am connecting to the database using the command line.
mysql -h servername -u username -ppassword databasename -e "Update Query"
For example, instead of connecting to the database 2 million times like that from the command line and running 2 million individual updates, I am planning to connect to the database only once and update the 2 million rows from the file.
So is the second approach better than the first one, or would the time difference be negligible?

Three approaches:
You could use LOAD DATA INFILE (a sketch follows below).
You could build up a .sql file with all of the updates you need.
You could use something other than a CLI to connect to the URLs and the DB. In other words, not the "curl" and "mysql" commands, but a real programming language with its libraries for checking URLs and updating databases.
Any of those would probably be faster, though you'll likely get more of a speed improvement by making the HTTP calls in parallel. You can do that more easily in a real programming language.
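As an illustration of the LOAD DATA INFILE approach, here is a minimal sketch. The file name /tmp/url_status.tsv (tab-separated url and http_status columns) and the target table urls(url, status) are hypothetical names chosen for the example, not taken from the question:

-- Hypothetical staging table for the cURL results.
CREATE TEMPORARY TABLE url_status_staging (
    url         VARCHAR(2048) NOT NULL,
    http_status INT           NOT NULL
);

-- Load the whole file in one round trip (LOCAL requires local_infile to be
-- enabled on both the client and the server).
LOAD DATA LOCAL INFILE '/tmp/url_status.tsv'
INTO TABLE url_status_staging
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
(url, http_status);

-- One set-based update instead of 2 million single-row updates.
UPDATE urls u
JOIN url_status_staging s ON s.url = u.url
SET u.status = s.http_status;

With an index on the url column, the final join is cheap; either way you connect once and let the server apply all of the updates in a single statement.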

Related

Execute 50k insert queries in postgresql dbeaver

Is there any way to insert 50k records into a PostgreSQL database using DBeaver?
Locally it worked fine for me and took about 1 minute, after I also changed the memory settings of PostgreSQL and DBeaver. But against our development environment, the 50k queries did not work.
Is there a way to do this anyway, or do I need to split the queries and run, for example, 10k queries 5 times? Any tricks?
EDIT: by "did not work" I mean I got an error after 2500 seconds saying something like "too much data ranges".
If you intend to execute a giant SQL script via the interface: don't even try.
If you have a CSV file, DBeaver gives you an import tool.
Even better, as described in the comments, the COPY command is the tool to use.
If you have a giant SQL file, you need to use the command line, like:
psql -h host -U username -d myDataBase -a -f myInsertFile
As in this post: Run a PostgreSQL .sql file using command line arguments
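For reference, a minimal sketch of the COPY approach mentioned above, using a hypothetical table my_table(id, name) and file name; the \copy form is the psql client-side variant for when the file lives on your machine rather than on the server:

-- Server-side load: the file path is resolved on the PostgreSQL server.
COPY my_table (id, name)
FROM '/tmp/data.csv'
WITH (FORMAT csv, HEADER true);

-- Client-side variant from psql (the file is read by the client):
-- \copy my_table (id, name) FROM 'data.csv' WITH (FORMAT csv, HEADER true)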

Import huge SQL file into SQL Server

I use the sqlcmd utility to import a 7 GB SQL dump file into a remote SQL Server. The command I use is:
sqlcmd -S IP address -U user -P password -t 0 -d database -i file.sql
After about 20-30 min the server regularly responds with:
Sqlcmd: Error: Scripting error.
Any pointers or advice?
I assume file.sql is just a bunch of INSERT statements. For a large number of rows, I suggest using the BCP command-line utility. It will perform orders of magnitude faster than individual INSERT statements.
You could also bulk insert the data using the T-SQL BULK INSERT command. In that case, the file path needs to be accessible by the database server (i.e. a UNC path, or a file copied to a drive on the server), along with the needed permissions. See http://msdn.microsoft.com/en-us/library/ms188365.aspx.
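As a rough sketch of the BULK INSERT route, with a hypothetical table and file path (remember that the path is resolved on the database server, so it must be a UNC path or a server-local file):

BULK INSERT dbo.MyTable
FROM '\\fileserver\share\file.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    FIRSTROW        = 2,      -- skip a header row, if the file has one
    BATCHSIZE       = 10000,  -- commit in batches rather than one huge transaction
    TABLOCK                   -- table lock; can enable minimally logged loads
);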
Why not use SSIS? While I am certified as a DBA, I always try to use the right tool for the job.
Here are some reasons to use SSIS.
1 - You can still use fast load / bulk copy. Make sure you set the batch size.
2 - Error handling is much better.
However, if you are using fast-load, either the batch commits or it gets tossed.
If you are using single-record inserts, you can redirect each error row to a separate destination.
3 - You can perform transformations on the source data before loading it into the destination.
In short: Extract, Transform, Load.
4 - SSIS loves memory and buffers. If you want to get really in depth, read some articles from Matt Mason or Brian Knight.
Last but not least, the LAN/WAN always plays a factor if the job is not running on the target server with the input file on a local disk.
If you are on the same backbone with a good pipe, things go fast.
In summary, yeah you can use BCP. It is great for little quick jobs. Anything complicated with robust error handling should be done with SSIS.
Good luck.

Are there any programs that will shrink the size of a sql script file?

I have a SQL script which is extremely large (about 700 megabytes). I am wondering if there is a good way to reduce the size of the script?
I know there are code minifiers for JavaScript and am looking for something similar to use with SQL scripts.
I am not looking to improve the performance of the SQL script; I am trying to make the file size smaller by removing excess whitespace and keeping name qualification down.
If I attempt to load the file in SQL Server Management Studio I get this error.
Not enough storage is available to process this command. (Exception from HRESULT: 0x80070008) (mscorlib)
What's in this 700 MB script?! I would hope that there are some similarities/repetitions that would allow you to shorten the file.
Just some guesses:
Instead of inserting a million records using Insert statements, use a bulk loading tool
Instead of updating a number of individual records, try to batch updates to the same value into one statement (e.g. UPDATE tab SET col=1 WHERE id IN (...) instead of individual updates; see the sketch below)
Long manipulations can be defined as a stored procedure (created before running the script), and the script would then only have to call the stored proc
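As a small sketch of that second guess, with hypothetical table and column names:

-- Instead of many single-row statements like these...
UPDATE tab SET col = 1 WHERE id = 17;
UPDATE tab SET col = 1 WHERE id = 42;
UPDATE tab SET col = 1 WHERE id = 99;

-- ...batch the rows that receive the same value into one statement:
UPDATE tab SET col = 1 WHERE id IN (17, 42, 99);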
Of course, splitting the script up into smaller portions and calling each one from a simple batch file would work too. But I'd be a little worried about performance (how long does the execution take?!) and would look for some faster ways.
What about breaking your script into several small files, and calling those files from a single master script?
This link describes how to do it from a stored procedure.
Or you can do it from a batch file like this:
REM =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
REM Define widely-used variables up here so they can be changed in one place
REM Search for "sqlcmd.exe" and make sure this path is valid for you
REM =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
set sqlcmd="C:\Program Files\Microsoft SQL Server\100\Tools\Binn\sqlcmd.exe"
set uname="your_uname_here"
set pwd="your_pwd_here"
set database="your_db_name_here"
set server="your_server_name_here"
%sqlcmd% -S %server% -d %database% -U %uname% -P %pwd% -i "c:\script1.sql"
%sqlcmd% -S %server% -d %database% -U %uname% -P %pwd% -i "c:\script2.sql"
%sqlcmd% -S %server% -d %database% -U %uname% -P %pwd% -i "c:\script3.sql"
pause
I like the batch file approach myself, because it is easier to tinker with, and you can schedule it as a Windows job.
Make sure the .BAT file is in a folder with the appropriate security restrictions, since it holds your credentials in plain text.
gzip should do.
SQL is much harder to shrink; the field names, table names and commands need to be what they are. Plus, you wouldn't want to rewrite the commands as something shorter because it could have implications for performance.
Depending on the DBMS that you use, it may allow short names for commands, and then there might be a converter.
(Answering this because it is the top item returned when I searched for "SQL script size")
I got the same error when trying to load a large script into Management Studio. In my case I was trying to downgrade a database from SQL 2008 R2 to SQL 2008 by using the SQL Server script generator, which created a 700 MB structure-and-data .sql file.
To get around it I used the command line to run the script instead:
C:>sqlcmd -S [SQLSERVER\INSTANCE] -i [FILELOCATION\FILENAME].sql
Hopefully this helps someone else.
Compressing the SQL file will give the biggest size reduction.
Minifying the text of the SQL file will only save a few bytes or kilobytes per megabyte; it is not worth it.
The better approach is to create a "function" to unzip and read the file. That gives the most benefit, I guess.
Today, filesize shouldn't be a problem. Dial-up connection? Floppy disks?
pg-minify can do it, and not just for PostgreSQL, but for most SQL dialects, including MS SQL, MySQL, etc.

Unable to update the table of SQL Server with BCP utility

We have a database table that we pre-populate with data as part of our deployment procedure. Since one of the columns is binary (it's a binary serialized object), we use BCP to copy the data into the table.
So far this has worked very well. However, today we tried this technique on a Windows Server 2008 machine for the first time and noticed that not all of the rows were being populated correctly. Out of the 31 rows that are normally inserted as part of this operation, only 2 actually had their binary column populated; the other 29 simply had NULL in that column. This is the first time we've seen an issue like this, and it is the same .dat file that we use for all of our deployments.
Has anyone else ever encountered this issue before or have any insight as to what the issue could be?
Thanks in advance,
Jeremy
My guess is that you're using -c or -w to dump as text, and it's choking on a particular combination of characters it doesn't like and subbing in a NULL. This can also happen in Native mode if there's no format file. Try the following and see if it helps. (Obviously, you'll need to add the server and login switches yourself.)
bcp MyDatabase.dbo.MyTable format nul -f MyTable.fmt -n
bcp MyDatabase.dbo.MyTable out MyTable.dat -f MyTable.fmt -k -E -b 1000 -h "TABLOCK"
This'll dump the table data as straight binary with a format file, NULLs, and identity values to make absolutely sure everything lines up. In addition, it'll use batches of 1000 to optimize the data dump. Then, to insert it back:
bcp MySecondData.dbo.MyTable in MyTable.dat -f MyTable.fmt -n -b 1000
...which will use the format file, data file, and set batching to increase the speed a little. If you need more speed than that, you'll want to look at BULK INSERT, FirstRow/LastRow, and loading in parallel, but that's a bit beyond the scope of this question. :)

How do I handle large SQL SERVER batch inserts?

I'm looking to execute a series of queries as part of a migration project. The scripts to be generated are produced by a tool which analyses the legacy database and then produces a script to map each of the old entities to an appropriate new record. The scripts run well for small entities, but some have records in the hundreds of thousands, which produces script files of around 80 MB.
What is the best way to run these scripts?
Is there some SQLCMD option from the prompt that deals with larger scripts?
I could also break the scripts down into further smaller scripts but I don't want to have to execute hundreds of scripts to perform the migration.
If possible have the export tool modified to export a BULK INSERT compatible file.
Barring that, you can write a program that will parse the insert statements into something that BULK INSERT will accept.
BULK INSERT uses BCP format files, which come in traditional (non-XML) or XML flavours. Does each row have to get a new identity and use it in a child table, and you can't get away with using SET IDENTITY_INSERT ON because the database design has changed so much? If so, I think you might be better off using SSIS or similar and doing a Merge Join once the identities are assigned. You could also load the data into staging tables in SQL using SSIS or BCP, and then use regular SQL (potentially within SSIS, in a SQL task) with the OUTPUT ... INTO clause to capture the identities and use them in the children (a sketch follows below).
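A minimal sketch of that last staging-table idea, with entirely hypothetical table and column names (dbo.Parent has an IDENTITY key; the staging tables are assumed to have already been bulk loaded with the legacy keys). MERGE is used instead of a plain INSERT because its OUTPUT clause is allowed to reference source columns, which is what lets you capture the old-key-to-new-identity mapping:

DECLARE @KeyMap TABLE (OldParentId INT, NewParentId INT);

-- Insert all staged parents and capture legacy key -> new identity.
MERGE dbo.Parent AS tgt
USING dbo.ParentStaging AS src
    ON 1 = 0                  -- never matches, so every source row is inserted
WHEN NOT MATCHED THEN
    INSERT (Name) VALUES (src.Name)
OUTPUT src.OldParentId, inserted.ParentId
INTO @KeyMap (OldParentId, NewParentId);

-- Insert the children, translating the legacy parent key to the new identity.
INSERT INTO dbo.Child (ParentId, Payload)
SELECT m.NewParentId, cs.Payload
FROM dbo.ChildStaging AS cs
JOIN @KeyMap AS m ON m.OldParentId = cs.OldParentId;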
Just execute the script. We regularly run backup/restore scripts that are hundreds of MB in size. It only takes 30 seconds or so.
If it is critical not to block your server for that amount of time, you'll have to split it up a bit.
Also look into the --tab option of mysqldump, which outputs the data using SELECT ... INTO OUTFILE; this is more efficient and faster to load.
It sounds like this is generating a single INSERT for each row, which is really going to be pretty slow. If they are all wrapped in one transaction, too, that can be kind of slow (although the number of rows doesn't sound so big that it would make the transaction nearly unmanageable, the way holding a multi-million-row insert in a single transaction would).
You might be better off looking at ETL (DTS, SSIS, BCP or BULK INSERT FROM, or some other tool) to migrate the data instead of scripting each insert.
You could break up the script and execute it in parts (especially if currently it makes it all one big transaction), just automate the execution of the individual scripts using PowerShell or similar.
I've been looking into the "BULK INSERT" from file option but cannot see any examples of the file format. Can the file mix row formats, or does it always have to be consistent, in a CSV fashion? The reason I ask is that I've got identities involved across various parent/child tables, which is why per-row inserts are currently being used.