Binary file to blob column population using Jython

I want to populate a database table that has a BLOB column with the contents of a binary file placed on the server. The name of the file is not constant and will keep varying. Is this possible in Jython?

It is easy:
from java.sql import DriverManager
from java.io import FileInputStream

def insert_file_to_db(db_url, usr, passwd, file_name):
    # Open the JDBC connection and stream the file straight into the BLOB column
    db = DriverManager.getConnection(db_url, usr, passwd)
    blob = FileInputStream(file_name)
    pstm = db.prepareStatement("insert into my_blobs (content) values (?)")
    pstm.setBinaryStream(1, blob)
    pstm.execute()
    blob.close()
    db.close()
(tested with Informix JDBC driver)
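For example, with the Informix JDBC driver on the CLASSPATH, a call might look like the following; the URL, credentials and file path here are placeholders rather than values from the question:
insert_file_to_db("jdbc:informix-sqli://dbhost:9088/mydb:INFORMIXSERVER=ol_srv",
                  "user", "secret", "/data/incoming/report.pdf")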

Related

Unable to store bytes data to sqlserver

I am planning to store the hashed value of a password in a SQL Server database when a user signs up, and when the same user logs in, compare the entered password with the stored hash.
I am using the following piece of code to generate the hashed value and want to insert that value into a column of datatype varbinary(1000).
I have tried the following two snippets to insert into the database, and both have failed.
insert into users.dbo.allusers values (123456789,
b'\xc8\xc2\x06\x9f\x8e\x96\xad\xb3\x14r\x97Rm"\'\xfdbt\x03\xc81F\xc59\xd03\xcfXs\x88\xff\x95bg\x7f\xd1\xf6\xfc\x98\xe5x~c\x9eb\x91\x89\x80{\x14i0\x99f&\xa5\\e?\xf2\xbd\x06\xf7\xd0',
'a#a.com',
'a',
'b'
)
insert into users.dbo.allusers values (123456789,
convert(varbinary(1000), b'\xc8\xc2\x06\x9f\x8e\x96\xad\xb3\x14r\x97Rm"\'\xfdbt\x03\xc81F\xc59\xd03\xcfXs\x88\xff\x95bg\x7f\xd1\xf6\xfc\x98\xe5x~c\x9eb\x91\x89\x80{\x14i0\x99f&\xa5\\e?\xf2\xbd\x06\xf7\xd0', 1),
'a#a.com',
'a',
'b'
)
The error I am getting is
SQL Error [102] [S0001]: Incorrect syntax near '\xc8\xc2\x06\x9f\x8e\x96\xad\xb3\x14r\x97Rm"'.
I am using Cloud SQL (a GCP product) with SQL Server 2017 Standard and the DBeaver client to insert the data. Any help is really appreciated.
Based on the comments I am editing my question. I also used Python to insert the data into SQL Server with the following Flask code:
import hashlib
import os

def generate_password(password_value):
    salt = os.urandom(32)
    key = hashlib.pbkdf2_hmac('sha256', password_value.encode('utf-8'), salt, 100000)
    # Store salt and key together; the salt is needed again to verify the password at login
    storage = salt + key
    return storage
@app.route('/add_new_user', methods=['POST'])
def add_new_user():
    data = request.get_json(silent=True, force=True)
    cpf = data.get('cpf')
    password = data.get('password')
    email = data.get('email')
    fname = data.get('fname')
    lname = data.get('lname')
    password = generate_password(password)
    mssqlhost = '127.0.0.1'
    mssqluser = 'sqlserver'
    mssqlpass = 'sqlserver'
    mssqldb = 'users'
    try:
        # - [x] Establish Connection to db
        mssqlconn = pymssql.connect(
            mssqlhost, mssqluser, mssqlpass, mssqldb)
        print("Connection Established to MS SQL server.")
        cursor = mssqlconn.cursor()
        stmt = "insert into users.dbo.allusers (cpf, password, email, fname, lname) values (%s,%s,%s,%s,%s)"
        data = f'({cpf}, {password}, {email}, {fname}, {lname})'
        print(data)
        cursor.execute(stmt)  # executed without parameters, which causes the error below
        mssqlconn.commit()
        mssqlconn.close()
        return {"success": "true"}
    except Exception as e:
        print(e)
        return {"success": "false"}
I get a different error in the command prompt:
more placeholders in sql than params available
because data already contains quotes from the hash value (printed data):
(123456789, b'6\x17DnOP\xbb\xd0\xdbL\xb6"}\xda6M\x1dX\t\xdd\x12\xec\x059\xbb\xe1/\x1c|\xea\x038\xfd\r\xd1\xcbt\xd6Pe\xcd<W\n\x9f\x89\xd7J\xc1\xbb\xe1\xd0\xd2n\xa7j}\xf7\xf5:\xba0\xab\xbe', a#a.com, a, b)
A binary literal in T-SQL looks like 0x0A23...:
insert into dbo.allusers(cpf, password, email, fname, lname)
values
(
123456789,
0xC8C2069F8E96...,
'a#a.com',
'a',
'b'
)
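For the pymssql code in the question, the same goal can also be reached by passing the values as query parameters instead of formatting them into the statement, which avoids both the quoted byte literal and the placeholder mismatch. A minimal sketch, assuming pymssql maps a bytes value (wrapped in pymssql.Binary) to varbinary and reusing the connection settings from the question:
import hashlib
import hmac
import os

import pymssql

def generate_password(password_value):
    salt = os.urandom(32)
    return salt + hashlib.pbkdf2_hmac('sha256', password_value.encode('utf-8'), salt, 100000)

def verify_password(stored, password_value):
    # The first 32 bytes are the salt, the rest is the derived key
    salt, key = stored[:32], stored[32:]
    candidate = hashlib.pbkdf2_hmac('sha256', password_value.encode('utf-8'), salt, 100000)
    return hmac.compare_digest(candidate, key)

cpf, password = 123456789, 'secret'
email, fname, lname = 'a#a.com', 'a', 'b'

conn = pymssql.connect('127.0.0.1', 'sqlserver', 'sqlserver', 'users')
cursor = conn.cursor()
stmt = ("insert into users.dbo.allusers (cpf, password, email, fname, lname) "
        "values (%s, %s, %s, %s, %s)")
# The hash travels as a parameter, so no literal quoting is needed
cursor.execute(stmt, (cpf, pymssql.Binary(generate_password(password)), email, fname, lname))
conn.commit()
conn.close()
At login, verify_password can then compare the user-entered password against the stored varbinary value.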

how to refer values based on column names in python

I am trying to extract and read the data returned by a SQL query.
Below is the sample data from SQL Developer:
target_name   expected_instances  environment  system_name       hostname
--------------------------------------------------------------------------
ORAUAT_host1  1                   UAT          ORAUAT_host1_sys  host1.sample.net
ORAUAT_host2  1                   UAT          ORAUAT_host1_sys  host2.sample.net
Normally I pass the system_name to the query (which has a bind variable for system_name) and get the data back as a list, but not the column names.
Is there a way in Python to retrieve the data along with the column names, and then reference values by column name, e.g. target_name[0] giving the value ORAUAT_host1? Please suggest. Thanks.
If what you want is to get the column names from the table you are querying, you can do something like this:
My example prints to a CSV file:
import csv

import cx_Oracle

db = cx_Oracle.connect('user/pass@host:1521/service_name')
SQL = "select * from dual"
print(SQL)
cursor = db.cursor()
f = open(r"C:\dual.csv", "w")
writer = csv.writer(f, lineterminator="\n", quoting=csv.QUOTE_NONNUMERIC)
cursor.execute(SQL)
# Item 0 of each cursor.description entry is the column name
col_names = [row[0] for row in cursor.description]
writer.writerow(col_names)
for row in cursor:
    writer.writerow(row)
f.close()
The way to get the column names is the description attribute of the cursor object:
Cursor.description
This read-only attribute is a sequence of 7-item sequences. Each of
these sequences contains information describing one result column:
(name, type, display_size, internal_size, precision, scale, null_ok).
This attribute will be None for operations that do not return rows or
if the cursor has not had an operation invoked via the execute()
method yet.
The type will be one of the database type constants defined at the
module level.
https://cx-oracle.readthedocs.io/en/latest/api_manual/cursor.html#
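To reference the values by column name, as the question asks, one option is to zip each row with the names taken from cursor.description. A minimal sketch; the connect string, table name and bind variable here are placeholders, not taken from the question:
import cx_Oracle

db = cx_Oracle.connect('user/pass@host:1521/service_name')
cursor = db.cursor()
cursor.execute("select target_name, expected_instances, environment, system_name, hostname "
               "from targets where system_name = :sysname",
               sysname='ORAUAT_host1_sys')

# Item 0 of each cursor.description entry is the column name
col_names = [d[0] for d in cursor.description]

for row in cursor:
    record = dict(zip(col_names, row))
    print(record['TARGET_NAME'])  # e.g. ORAUAT_host1; Oracle reports unquoted names in upper case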

Create SQL table from parquet files

I am using R to handle large datasets (the largest data frame is 30,000,000 x 120). These are stored in Azure Data Lake Storage as parquet files, and we need to query them daily and load them into a local SQL database. Parquet files can be read without loading all the data into memory, which is handy. However, creating SQL tables from parquet files is more challenging, as I'd prefer not to load the data into memory.
Here is the code I used. Unfortunately, this is not a perfect reprex, as the SQL database needs to exist for this to work.
# load packages
library(tidyverse)
library(arrow)
library(sparklyr)
library(DBI)
# Create test data
test <- data.frame(matrix(rnorm(20), nrow = 10))
# Save as parquet file
f <- tempfile(fileext = ".parquet")
write_parquet(test, f)
# Load main table (memory = FALSE keeps the data out of Spark memory)
sc <- spark_connect(master = "local", spark_home = spark_home_dir())
test <- spark_read_parquet(sc, name = "test_main", path = f, memory = FALSE, overwrite = TRUE)
# Save into SQL table (assumes `connection` is an open DBI connection)
DBI::dbWriteTable(conn = connection,
                  name = DBI::Id(schema = "schema", table = "table"),
                  value = test)
Is it possible to write a SQL table without loading parquet files into memory?
I lack experience with T-SQL bulk import and export, but this is likely where you'll find your answer.
library(arrow)
library(DBI)

test <- data.frame(matrix(rnorm(20), nrow = 10))
f <- tempfile(fileext = '.parquet')
write_parquet(test, f)

# Upload the file using BULK INSERT (assumes `connection` is an open DBI connection)
dbExecute(connection,
          paste0("
            BULK INSERT [database].[schema].[table]
            FROM '", gsub('\\\\', '/', f), "'
            WITH (FORMAT = 'PARQUET');
          ")
)
Here I use T-SQL's own BULK INSERT command.
Disclaimer: I have not yet used this command in T-SQL, so it may be riddled with errors. For example, I can't see a place to specify snappy compression in the documentation, although it can be specified if one instead defines a custom file format with CREATE EXTERNAL FILE FORMAT.
Now, the above only inserts into an existing table. For your specific case, where you'd like to create a new table from the file, you would more likely be looking at OPENROWSET, using CREATE TABLE ... AS [select statement].
# One way to build column_defs is to let DBI translate the R column types
# into SQL Server types (assumes `connection` is an open DBI connection):
column_defs <- DBI::dbDataType(connection, test)
column_definition <- paste(names(column_defs), column_defs, collapse = ', ')

dbExecute(connection,
          paste0("CREATE TABLE MySqlTable
                  AS
                  SELECT *
                  FROM OPENROWSET(
                      BULK '", f, "', FORMAT = 'PARQUET'
                  ) WITH (
                      ", column_definition, "
                  ) AS src;
          ")
)
where column_defs would be a named list or vector giving the SQL data-type definition for each column (the dbDataType() call above is one way to build it). A (more or less) complete translation from R data types to SQL Server data types is available in the T-SQL documentation (note two very necessary translations that are not present: Date and POSIXlt). Once again, the disclaimer: my time with T-SQL did not get as far as BULK INSERT or similar.

Is it possible to change the delimiter of AWS athena output file

Here is my sample code, where I create a file in an S3 bucket using AWS Athena. The file is in CSV format by default. Is there a way to change it to pipe-delimited?
import json
import boto3
def lambda_handler(event, context):
    s3 = boto3.client('s3')
    client = boto3.client('athena')
    # Start query execution
    response = client.start_query_execution(
        QueryString="""
            select * from srvgrp
            where category_code = 'ACOMNCDU'
        """,
        QueryExecutionContext={
            'Database': 'tmp_db'
        },
        ResultConfiguration={
            'OutputLocation': 's3://tmp-results/athena/'
        }
    )
    queryId = response['QueryExecutionId']
    print('Query id is :' + str(queryId))
There is a way to do that with a CTAS query.
BUT:
This is a hacky way and not what CTAS queries are meant for, since it will also create a new table definition in the AWS Glue Data Catalog.
I'm not sure about the performance.
CREATE TABLE "UNIQU_PREFIX__new_table"
WITH (
format = 'TEXTFILE',
external_location = 's3://tmp-results/athena/__SOMETHING_UNIQUE__',
field_delimiter = '|',
bucketed_by = ARRAY['__SOME_COLUMN__'],
bucket_count = 1
) AS
SELECT *
FROM srvgrp
WHERE category_code = 'ACOMNCDU'
Note:
It is important to set bucket_count = 1, otherwise Athena will create multiple files.
The name of the table in CREATE TABLE ... should also be unique, e.g. use a timestamp prefix/suffix that you inject at Python runtime.
The external location should be unique as well, e.g. use a timestamp prefix/suffix injected at Python runtime. I would advise embedding the table name in the S3 path.
You need to include only one of the columns from the SELECT in bucketed_by.
At some point you will need to clean up the AWS Glue Data Catalog of all the table definitions created this way, as in the sketch below.
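Putting those notes together, a hedged sketch of how the Lambda above could issue the CTAS query with a timestamped table name and then drop the helper table definition from the Glue Data Catalog (the table prefix is hypothetical, and in practice you would poll get_query_execution until the query succeeds before deleting the table):
import time
import boto3

athena = boto3.client('athena')
glue = boto3.client('glue')

suffix = str(int(time.time()))              # unique per run
table_name = 'tmp_pipe_export_' + suffix    # hypothetical prefix
external_location = 's3://tmp-results/athena/' + table_name + '/'

ctas = """
CREATE TABLE {table}
WITH (
    format = 'TEXTFILE',
    external_location = '{location}',
    field_delimiter = '|',
    bucketed_by = ARRAY['category_code'],
    bucket_count = 1
) AS
SELECT * FROM srvgrp WHERE category_code = 'ACOMNCDU'
""".format(table=table_name, location=external_location)

response = athena.start_query_execution(
    QueryString=ctas,
    QueryExecutionContext={'Database': 'tmp_db'},
    ResultConfiguration={'OutputLocation': 's3://tmp-results/athena/'}
)

# Once the query has succeeded, the pipe-delimited output sits under external_location.
# Drop the helper table definition so the Glue Data Catalog stays clean:
glue.delete_table(DatabaseName='tmp_db', Name=table_name)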

How to insert and display Chinese character in oracle database?

I have inserted Chinese characters into an Oracle database, but the stored value does not match the characters; it comes out as something like ¿¿ :¿¿¿!. The data type is VARCHAR2. The same problem occurs when I want to display the data. How do I store and display Chinese characters in an Oracle 10g database?
To support Unicode characters, use driver Properties as shown:
Class.forName("oracle.jdbc.driver.OracleDriver");
String url = "jdbc:oracle:thin:#10.52.45.36:1521:ORCL";
Properties connectionProps = new Properties();
connectionProps.put("useUnicode","true"); //Property 1
connectionProps.put("characterEncoding","UTF-8" ); //Property 2
connectionProps.put("user", "SYSTEM");
connectionProps.put("password", "Root-123");
Connection conn = DriverManager.getConnection(url,connectionProps);
Statement stmt = conn.createStatement();
And ensure that the datatype of the column in the Oracle table is NVARCHAR2/NCHAR. Also, while inserting, use a function such as TO_NCHAR('-some-unicode-value-') or the N'-some-unicode-value-' literal.
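For example, continuing from the connection above and assuming a table chinese_demo with an NVARCHAR2 column named txt (both hypothetical):
// Parameterized insert: setNString marks the value as national character data
PreparedStatement ps = conn.prepareStatement("INSERT INTO chinese_demo (txt) VALUES (?)");
ps.setNString(1, "你好，世界");
ps.executeUpdate();

// Or with a literal, prefixed with N so Oracle treats it as NCHAR data
stmt.executeUpdate("INSERT INTO chinese_demo (txt) VALUES (N'你好，世界')");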
This should be it!