DB upload in mysql - loses UTF characters - sql

I'm frequently updating my db on the server, and I run the following line from the command line:
mysqldump -u root --password=mypass mydb|mysql -h mysite.cc -u remotusr --password=remotpsw remotdb
The problem is that it loses the UTF characters along the way.
How can I keep the UTF characters in cmd, or what is a better practice for doing this?

(Upgrading to an answer)
As documented under mysqldump — A Database Backup Program:
--default-character-set=charset_name
Use charset_name as the default character set. See Section 10.5, “Character Set Configuration”. If no character set is specified, mysqldump uses utf8, and earlier versions use latin1.
[ deletia ]
--set-charset
Add SET NAMES default_character_set to the output. This option is enabled by default. To suppress the SET NAMES statement, use --skip-set-charset.
Therefore, unless you have settings in an option file which are overriding these defaults (you can specify --no-defaults to ensure they are not), the output from mysqldump should be more than capable of being redirected to another mysql session without loss of Unicode characters.
Instead, the conversion to a smaller character set appears to be occurring when retrieving and displaying your data from the new database. Since you are using PHP's original MySQL API for this purpose (despite the warning in the introduction to its manual chapter, quoted below), you should use mysql_set_charset() to set the connection character set.
This extension is not recommended for writing new code. Instead, either the mysqli or PDO_MySQL extension should be used. See also the MySQL API Overview for further help while choosing a MySQL API.
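For completeness, the same connection-character-set idea can be sketched outside PHP, for example with Python's MySQLdb driver (the host, user, password, and database names below are simply the ones from the command line in the question; treat this as an illustrative sketch, not part of the original answer):

import MySQLdb

# charset='utf8' plays the same role as mysql_set_charset() in PHP: it sets
# the character set of the client connection, so query results come back as
# UTF-8 instead of being squeezed into a smaller character set
conn = MySQLdb.connect(host='mysite.cc', user='remotusr',
                       passwd='remotpsw', db='remotdb',
                       charset='utf8', use_unicode=True)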

Related

Replacing a word in a db2 sql file causes DSNC105I : End of file reached while reading the command error

I have a dynamic SQL file in which the name of TBCREATOR changes according to a given parameter.
I use a simple Python script to change TBCREATOR=<variable here> and write the result to an output SQL file.
Calling this file using db2 -td# -vf <generated sql file> gives
DSNC105I : End of file reached while reading the command
Here is the file in which I need the TBCREATOR variable replaced:
CONNECT to 204.90.115.200:5040/DALLASC user *** using ****#
select REMARKS from sysibm.SYSCOLUMNS WHERE TBCREATOR='table' AND NAME='LCODE'
#
Here is the python script:
#!/usr/bin/python3
# #------replace table value with schema name
# print(list_of_lines)
fin = open("decrypt.sql", "rt")
#output file to write the result to
fout = open("decryptout.sql", "wt")
for line in fin:
    fout.write(line.replace('table', 'ZXP214'))
fin.close()
fout.close()
After decryptout.sql is generated I call it using db2 -td# -vf decryptout.sql
and get the error given above.
What's irritating is that I have another SQL file that contains exactly the same data as decryptout.sql and runs smoothly with the db2 -td# -vf ... command. I tried to use the Unix command cmp to compare the generated file with the one I wrote by hand, with the variable ZXP214 already replaced, but there are no differences. What is causing this error?
Here is the file (which executes without error) that I compare the generated output with:
CONNECT to 204.90.115.200:5040/DALLASC user *** using ****#
select REMARKS from sysibm.SYSCOLUMNS WHERE TBCREATOR='ZXP214' AND NAME='LCODE'
#
I found that specifically on the https://ibmzxplore.influitive.com/ challenge, if you are using the Java db2 command and working in the Zowe USS system (Unix System Services of z/OS), there is a conflict of character sets. I believe the system will generally create files in EBCDIC format, whereas if you do
echo "CONNECT ..." > syscat.clp
the resulting file will be tagged as ISO8859-1 and will not be processed properly by db2. Instead, go to the USS interface and choose "create file", give it a folder and a name, and it will create the file untagged. You can use
ls -T
to see the tags. Then edit the file to give it the commands you need, and db2 will interoperate with it properly. Because you are creating the file with Python, you may be running into similar issues. When you open the new file, use something like
open(input_file_name, mode="w", encoding="cp1047")
This makes sure the file is opened as an EBCDIC file.
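Applied to the script in the question, a minimal sketch might look like the following (the file names and the replacement are taken from the question; whether cp1047 is the right codec for your files is an assumption you should verify):

#!/usr/bin/python3
# Sketch: write the generated SQL as an EBCDIC (cp1047) file so that the
# db2 command under z/OS USS reads it with the expected character set.
# Depending on how decrypt.sql was created, the input file may also need
# encoding="cp1047".
with open("decrypt.sql", "rt") as fin, \
     open("decryptout.sql", "wt", encoding="cp1047") as fout:
    for line in fin:
        fout.write(line.replace('table', 'ZXP214'))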
If you are using the Db2-LUW CLP (command line processor), which is written in C/C++ and runs on Windows/Linux/Unix, then your syntax for CONNECT is not valid.
Unfortunately your question is ambiguously tagged, so we cannot tell which Db2 server platform you actually use.
For Db2-LUW with the classic C/C++ db2 command, the syntax for a type-1 CONNECT statement does not allow a connection string (or partial connection string) as shown in your question. For the Db2-LUW db2 CLP, the target database must be defined externally (i.e. not inside the script), either via the legacy actions of catalog tcpip node ... combined with catalog database ..., or in the db2dsdriver.cfg configuration file as plain XML.
If you want to use connection strings, you can use the clpplus tool, which is available with some Db2-LUW client packages and is present on currently supported Db2-LUW servers. This lets you use Oracle-style scripting with Db2. Refer to the online documentation for details.
If you are not using the classic C/C++ db2 command, and are instead using the emulated CLP written in Java that is only available with z/OS USS, then you must open a ticket with IBM support for that component, as that is not a matter for Stack Overflow.

Paramiko, channel.recv(9999) causing confusion [duplicate]

I am using Python's Paramiko library to SSH into a remote machine and fetch some output from the command line. I see a lot of junk printed along with the actual output. How do I get rid of it?
chan1.send("ls\n")
output = chan1.recv(1024).decode("utf-8")
print(output)
[u'Last login: Wed Oct 21 18:08:53 2015 from 172.16.200.77\r', u'\x1b[2J\x1b[1;1H[local]cli#BENU>enable', u'[local]cli#BENU#Configure',
I want to eliminate \x1b[2J\x1b[1;1H and u from the output. They are junk.
It's not junk. These are ANSI escape codes that are normally interpreted by a terminal client to pretty-print the output.
If the server is correctly configured, you get these only when you use an interactive terminal, in other words, if you have requested a pseudo terminal for the session (which you should not do if you are automating the session).
Paramiko automatically requests a pseudo terminal if you use SSHClient.invoke_shell, as that is intended for implementing an interactive terminal. See also How do I start a shell without terminal emulation in Python Paramiko?
If you are automating the execution of remote commands, you should rather use SSHClient.exec_command, which does not allocate a pseudo terminal by default (unless you override that with the get_pty=True argument).
stdin, stdout, stderr = client.exec_command('ls')
See also What is the difference between exec_command and send with invoke_shell() on Paramiko?
Or as a workaround, see How can I remove the ANSI escape sequences from a string in python.
Though that's rather a hack and might not be sufficient. You might have other problems with the interactive terminal, not only the escape sequences.
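If you do go the stripping route, a rough sketch might look like this (the regular expression is an assumption that covers common CSI sequences such as the ones above, not code from the linked answer):

import re

# matches CSI escape sequences such as \x1b[2J and \x1b[1;1H
ansi_escape = re.compile(r'\x1b\[[0-9;]*[A-Za-z]')
clean_output = ansi_escape.sub('', output)
print(clean_output)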
Even with the escape sequences stripped, you are probably not interested in the "Last login" message and the command prompt (cli#BENU>) either. You do not get these with exec_command.
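A minimal end-to-end sketch of the exec_command approach (the host name and credentials are placeholders, not taken from the question):

import paramiko

client = paramiko.SSHClient()
# AutoAddPolicy is convenient for a sketch; verify host keys properly in production
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect('hostname', username='user', password='password')

# no pseudo terminal is requested, so there are no escape sequences,
# no login banner and no command prompt in the output
stdin, stdout, stderr = client.exec_command('ls')
print(stdout.read().decode('utf-8'))

client.close()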
If you need to use the "shell" channel due to some specific requirements or limitations of the server, note that it is technically possible to use the "shell" channel without the pseudo terminal. But Paramiko SSHClient.invoke_shell does not allow that. Instead, you can create the "shell" channel manually. See Can I call Channel.invoke_shell() without calling Channel.get_pty() beforehand, when NOT using Channel.exec_command().
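A sketch of that manual approach, assuming client is an already connected SSHClient (again, illustrative code, not taken from the linked answer):

# open a raw session channel and request a shell without calling get_pty()
channel = client.get_transport().open_session()
channel.invoke_shell()

channel.send('ls\n')
output = channel.recv(1024).decode('utf-8')
print(output)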
And finally, the u is not part of the actual string value (note that it is outside the quotes). It indicates that the value is a Unicode string. You want that!
This is actually not junk. The u before the string indicates that this is a unicode string. The \x1b[2J\x1b[1;1H is an escape sequence. I don't know exactly what it is supposed to do, but it appears to clear the screen when I print it out.
To see what I mean, try this code:
for string in output:
    print(string)

Arabic and English text in PostgreSQL database

I need to insert both English and Arabic text into a PostgreSQL database.
I'm running the following command via a .bat script:
psql.exe --echo-all --username=postgres --dbname=dbname -f populate.sql
populate.sql has statements like this:
insert into table1 (column1, column2) VALUES (2, 'المستخدم ');
If I do this via pgAdmin, it works. The thing is, I need to do this via some .sql population scripts that are run once the application is started.
In that case I get gibberish, like this:
العرض
I created the scripts in Notepad++ using the Encode in UTF-8 without BOM option, since the normal encoding in UTF-8 adds an extra character to the start of the file and some of the inserts are not made.
I'm assuming this is an encode problem, but I have yet to figure out exactly what is wrong.
The database is in UTF-8.
Thanks in advance!
The Windows console doesn't speak Unicode of any form by default; it speaks a "native" codepage. Which codepage depends on your Windows install's language settings.
If you run chcp 65001, it will switch to UTF-8.
Overall, though, text encoding handling in batch/cmd files and the Windows command line is absolutely awful. I generally recommend that you instead put anything that isn't basically 7-bit ASCII into a separate .sql file and execute it via psql -f with a suitable client_encoding; the PGCLIENTENCODING environment variable is useful for this.
So try:
SET PGCLIENTENCODING=utf-8
psql.exe --username=postgres --dbname=dbname -f populate.sql

How to split SQL in Mac OS X?

Is there any app for Mac to split SQL files, or even a script?
I have a large file which I have to upload to a host that doesn't support files over 8 MB.
*I don't have SSH access
You can use this: http://www.ozerov.de/bigdump/
Or
Use this command to split the SQL file:
split -l 5000 ./path/to/mysqldump.sql ./mysqldump/dbpart-
The split command takes a file and breaks it into multiple files. The -l 5000 part tells it to split the file every five thousand lines. The next bit is the path to your file, and the next part is the path you want to save the output to. Files will be saved as whatever filename you specify (e.g. “dbpart-”) with an alphabetical letter combination appended.
Now you should be able to import your files one at a time through phpMyAdmin without issue.
More info http://www.webmaster-source.com/2011/09/26/how-to-import-a-very-large-sql-dump-with-phpmyadmin/
This tool should do the trick: MySQLDumpSplitter
It's free and open source.
Unlike the accepted answer to this question, this app will always keep extended inserts intact so the precise form of your query doesn't matter; the resulting files will always have valid SQL syntax.
Full disclosure: I am a shareholder of the company that hosts this program.
The UploadDir feature in phpMyAdmin could help you, if you have FTP access and can modify your phpMyAdmin's configuration (or are allowed to install your own instance of phpMyAdmin).
http://docs.phpmyadmin.net/en/latest/config.html?highlight=uploaddir#cfg_UploadDir
You can split into working SQL statements with:
csplit -s -f db-part db.sql "/^# Dump of table/" "{99}"
This makes up to 99 files named 'db-part[n]' from db.sql.
You can use "CREATE TABLE" or "INSERT INTO" instead of "# Dump of ..."
Also: Avoid installing any programs or uploading your data into any online service. You don't know what will be done with your information!

How to print sqlite to file in utf-8?

I've opened sqlite3.exe in the Windows console and made a database with special characters.
.dump showed me the sql query with special characters.
Then I changed output to file: .output file.sql
And executed the .dump command.
The special characters were missing when I imported the database using .read file.sql.
I used pragma encoding="UTF-8"; but it didn't change anything (I don't know if it should).
The Windows console makes it hard to use UTF-8 correctly, and the Microsoft compiler has lots of bugs that make it impossible to use UTF-8 with portable I/O functions.
If you have entered data in the Windows console, those strings are not valid UTF-8.
If a non-ASCII string is output with correct characters in the Windows console, it is not valid UTF-8.
To ensure that your data is valid UTF-8, you have to go through files.
Alternatively, use any SQLite shell that does not use the console (such as the SQLite Manager Firefox extension).
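One way to follow the "go through files" advice without touching the console at all is to drive the database from a script, for example with Python's sqlite3 module (a sketch; the file and table names here are illustrative, not from the question):

import sqlite3

con = sqlite3.connect("test.db")
con.execute("CREATE TABLE IF NOT EXISTS names (name TEXT, timestamp TEXT)")
con.execute("INSERT INTO names VALUES (?, ?)", ("ěščřžýáíé", "1429001515"))
con.commit()

# write the dump to a UTF-8 encoded file instead of printing it to the console
with open("file.sql", "w", encoding="utf-8") as f:
    for line in con.iterdump():
        f.write(line + "\n")
con.close()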
The following works fine for CP852, but could be used for any codepage known to iconv.
chcp 852 >NUL
echo INSERT into NAMES (name,timestamp) VALUES ('ěščřžýáíé','1429001515'); | iconv.exe -f cp852 -t utf-8 | ..\utilities\sqlite3.exe test.db
Windows can handle Unicode internally, but if you print it to the console (with the echo command, for example), the characters come out wrong. Re-encoding on the fly solves this problem.