Ruby - Implement an SQL server?

I have an application which has a Ruby API. I would like to link to this application from an SQL server system.
Is there a way for me to implement an SQL server in Ruby which receives SQL statements and returns the requested data from the application? Is it then possible to hook into this from an SQL server application?
E.g.:
# request as string like "SELECT * FROM MAIN_TABLE WHERE SOME_COLUMN = <SOME DATA>"
SQLEngine.OnRequest do |request|
  Application.RunSQL(request)
end
P.S. I don't have any experience with SQL servers, so I have no idea how one would go about this...
Note: I'm not asking how I can query an SQL server database; I'm asking how I can implement the server side of an SQL connection.

After some searching I found a few other Stack Overflow questions about how to write database drivers in other languages:
creating a custom odbc driver for application
Implementing a ODBC driver
Creating a custom ODBC driver
Alternatives to writing an ODBC driver
Potentially these will be useful for others going down this road. The most hopeful suggestion is implementing the wire protocol; a proof of concept has already been made in Python, which should be relatively easy to port:
import SocketServer
import struct

def char_to_hex(char):
    retval = hex(ord(char))
    if len(retval) == 4:
        return retval[-2:]
    else:
        assert len(retval) == 3
        return "0" + retval[-1]

def str_to_hex(inputstr):
    return " ".join(char_to_hex(char) for char in inputstr)

class Handler(SocketServer.BaseRequestHandler):
    def handle(self):
        print "handle()"
        self.read_SSLRequest()
        self.send_to_socket("N")
        self.read_StartupMessage()
        self.send_AuthenticationClearText()
        self.read_PasswordMessage()
        self.send_AuthenticationOK()
        self.send_ReadyForQuery()
        self.read_Query()
        self.send_queryresult()

    def send_queryresult(self):
        fieldnames = ['abc', 'def']
        HEADERFORMAT = "!cih"
        fields = ''.join(self.fieldname_msg(name) for name in fieldnames)
        rdheader = struct.pack(HEADERFORMAT, 'T', struct.calcsize(HEADERFORMAT) - 1 + len(fields), len(fieldnames))
        self.send_to_socket(rdheader + fields)
        rows = [[1, 2], [3, 4]]
        DRHEADER = "!cih"
        for row in rows:
            dr_data = struct.pack("!ii", -1, -1)
            dr_header = struct.pack(DRHEADER, 'D', struct.calcsize(DRHEADER) - 1 + len(dr_data), 2)
            self.send_to_socket(dr_header + dr_data)
        self.send_CommandComplete()
        self.send_ReadyForQuery()

    def send_CommandComplete(self):
        HFMT = "!ci"
        msg = "SELECT 2\x00"
        self.send_to_socket(struct.pack(HFMT, "C", struct.calcsize(HFMT) - 1 + len(msg)) + msg)

    def fieldname_msg(self, name):
        tableid = 0
        columnid = 0
        datatypeid = 23
        datatypesize = 4
        typemodifier = -1
        format_code = 0  # 0=text 1=binary
        return name + "\x00" + struct.pack("!ihihih", tableid, columnid, datatypeid, datatypesize, typemodifier, format_code)

    def read_socket(self):
        print "Trying recv..."
        data = self.request.recv(1024)
        print "Received {} bytes: {}".format(len(data), repr(data))
        print "Hex: {}".format(str_to_hex(data))
        return data

    def send_to_socket(self, data):
        print "Sending {} bytes: {}".format(len(data), repr(data))
        print "Hex: {}".format(str_to_hex(data))
        return self.request.sendall(data)

    def read_Query(self):
        data = self.read_socket()
        msgident, msglen = struct.unpack("!ci", data[0:5])
        assert msgident == "Q"
        print data[5:]

    def send_ReadyForQuery(self):
        self.send_to_socket(struct.pack("!cic", 'Z', 5, 'I'))

    def read_PasswordMessage(self):
        data = self.read_socket()
        b, msglen = struct.unpack("!ci", data[0:5])
        assert b == "p"
        print "Password: {}".format(data[5:])

    def read_SSLRequest(self):
        data = self.read_socket()
        msglen, sslcode = struct.unpack("!ii", data)
        assert msglen == 8
        assert sslcode == 80877103

    def read_StartupMessage(self):
        data = self.read_socket()
        msglen, protoversion = struct.unpack("!ii", data[0:8])
        print "msglen: {}, protoversion: {}".format(msglen, protoversion)
        assert msglen == len(data)
        parameters_string = data[8:]
        print parameters_string.split('\x00')

    def send_AuthenticationOK(self):
        self.send_to_socket(struct.pack("!cii", 'R', 8, 0))

    def send_AuthenticationClearText(self):
        self.send_to_socket(struct.pack("!cii", 'R', 8, 3))

if __name__ == "__main__":
    server = SocketServer.TCPServer(("localhost", 9876), Handler)
    try:
        server.serve_forever()
    except:
        server.shutdown()
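
To exercise the stub, point anything that speaks the PostgreSQL startup sequence at localhost:9876. Below is a minimal, hypothetical raw-socket client sketch (not taken from the linked answers) that sends exactly the messages the handler reads: an SSLRequest, a StartupMessage, a PasswordMessage and a single Query. A real PostgreSQL client or driver could in principle be pointed at the same port, which is the point of implementing the wire protocol.

# Hypothetical test client for the stub above (assumes it is listening on
# localhost:9876); it sends the exact message sequence the handler expects.
import socket
import struct

def frontend_msg(ident, payload):
    # Regular frontend message: 1-byte type, then int32 length (includes itself).
    return ident + struct.pack("!i", 4 + len(payload)) + payload

s = socket.create_connection(("localhost", 9876))
s.sendall(struct.pack("!ii", 8, 80877103))                       # SSLRequest
assert s.recv(1) == b"N"                                         # stub declines SSL
params = b"user\x00test\x00database\x00test\x00\x00"
s.sendall(struct.pack("!ii", 8 + len(params), 196608) + params)  # StartupMessage, protocol 3.0
s.recv(1024)                                                     # AuthenticationCleartextPassword
s.sendall(frontend_msg(b"p", b"secret\x00"))                     # PasswordMessage
s.recv(1024)                                                     # AuthenticationOk (and possibly ReadyForQuery)
s.sendall(frontend_msg(b"Q", b"SELECT * FROM MAIN_TABLE\x00"))   # Query
print(repr(s.recv(4096)))                                        # RowDescription / DataRow / ... bytes
s.close()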

Every programming language – including Ruby – supplies packages which implement interfaces to various SQL servers.
Start here: Ruby database access.

Related

Is ES native SQL parsed as ES DSL (json query)? How to get the parsing time?

I want to know whether Elasticsearch native SQL is parsed into Elasticsearch DSL (a JSON query), and how to get the parsing time.
I found a link about that:
An Introduction to Elasticsearch SQL with Practical Examples - Part 1
Under "Implementation Internals" there is a picture showing that the Elasticsearch SQL implementation consists of four execution phases.
I suppose the answer is "yes", but I cannot find a way to get the time it takes for ES native SQL to be parsed into ES DSL (a JSON query). After thinking about "the way" for a long time, I wrote the following code.
My local environment:
- macOS Mojave Version 10.14.2
- MacBook Pro(Retina, 13-inch, Early 2015)
- Processor 2.7 GHz Intel Core i5
- Memory 8 GB 1867 MHz DDR3
- Elasticsearch 6.5.4
- python 2
Requirements:
- pip install elasticsearch
- pip install requests
"""
SQL Access (X-Pack)
https://www.elastic.co/guide/en/elasticsearch/reference/6.5/xpack-sql.html
"""
import random
import requests
import datetime
import elasticsearch
URL_PREFIX = 'http://localhost:9200'

def get_url(url):
    return URL_PREFIX + url

def translate_sql(sql):
    return requests.post(url=get_url('/_xpack/sql/translate/?pretty'),
                         json={'query': sql}).content

def search_with_query(query):
    headers = {'Content-type': 'application/json'}
    start_time = datetime.datetime.now()
    result = requests.post(url=get_url('/_search?pretty'),
                           headers=headers,
                           data=query).content
    return datetime.datetime.now() - start_time

def search_with_sql(sql):
    # https://www.elastic.co/guide/en/elasticsearch/reference/6.6/sql-rest.html
    query = '{"query":"' + sql.replace('\n', '') + '"}'
    headers = {'Content-type': 'application/json'}
    start_time = datetime.datetime.now()
    result = requests.post(url=get_url('/_xpack/sql?format=txt&pretty'),
                           headers=headers,
                           data=query).content
    return datetime.datetime.now() - start_time

def gen_data(number):
    es = elasticsearch.Elasticsearch()
    for i in range(number):
        es.index(index='question_index', doc_type='question_type', body={
            'a': random.randint(0, 10),
            'b': random.randint(0, 10),
            'c': random.randint(0, 10),
            'd': random.randint(0, 10),
            'e': random.randint(0, 10),
        })
        if i % 10000 == 0:
            print i

if __name__ == '__main__':
    # gen_data(100000)
    sql = '''
        select max(a) as max_a
        from question_index
        group by a
        having max_a > 1
        order by a limit 1
    '''
    json_query = translate_sql(sql)
    sql_cost_time = search_with_sql(sql)
    query_cost_time = search_with_query(json_query)
    print 'sql  :', sql_cost_time
    print 'query:', query_cost_time
    if query_cost_time < sql_cost_time:
        print 'parsing time is:', sql_cost_time - query_cost_time, ' ???'
    else:
        print 'parsing time is:', query_cost_time - sql_cost_time, ' ???'
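
As an aside (an assumption on my part, not something from the original post): since the translate endpoint performs exactly the SQL-to-DSL step, timing that call on its own may be a simpler proxy for the parsing/translation cost, keeping in mind that it also includes the HTTP round trip.

# Hypothetical sketch: time the _xpack/sql/translate call by itself as a rough
# proxy for SQL -> DSL translation cost (includes HTTP round-trip overhead).
import datetime
import requests

def time_translate(sql, base_url='http://localhost:9200'):
    start = datetime.datetime.now()
    requests.post(url=base_url + '/_xpack/sql/translate/?pretty',
                  json={'query': sql})
    return datetime.datetime.now() - start

print(time_translate('select max(a) as max_a from question_index group by a'))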
Actual results:
I don't know whether the code is right.
Expected:
I expect the code to be right.
Your answer may save my hair!
Thanks

Python twisted, reactor.callLater() not deferring

The following code snippet is from a Python poker server. The program works except when trying to delay the start of a tourney with reactor.callLater.
The variable "wait" gets its integer from an XML file which has a setting of "60". However, the delay is never applied and the tourney always starts immediately. I am not very familiar with Python or Twisted; I am just trying to hack this into working for me. One thing that puzzles me is that it seems the code shouldn't work at all, given that I can't see how or where the variable "old_state" gets its value in order for the code to properly determine the states of the server. But perhaps I am mistaken.
I hope that someone familiar with Python and Twisted can see what the problem might be and be willing to enlighten me on this issue.
elif old_state == TOURNAMENT_STATE_REGISTERING and new_state == TOURNAMENT_STATE_RUNNING:
    self.databaseEvent(event = PacketPokerMonitorEvent.TOURNEY_START, param1 = tourney.serial)
    reactor.callLater(0.01, self.tourneyBroadcastStart, tourney.serial)
    # Only obey extra_wait_tourney_start if we had been registering and are now running,
    # since we only want this behavior before the first deal.
    wait = int(self.delays.get('extra_wait_tourney_start', 0))
    if wait > 0:
        reactor.callLater(wait, self.tourneyDeal, tourney)
    else:
        self.tourneyDeal(tourney)
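
As a sanity check, reactor.callLater does delay a call in isolation, but it only fires once the reactor is running; here is a minimal standalone sketch (assuming only that Twisted is installed) that prints after roughly five seconds:

# Minimal reactor.callLater sketch (assumes Twisted is installed).
from twisted.internet import reactor

def start_tourney():
    print("tourney starting after the delay")
    reactor.stop()

# Nothing fires until reactor.run(); the callback runs about 5 seconds later.
reactor.callLater(5, start_tourney)
reactor.run()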
For reference I have placed the larger portion of the code that is relevant to the problem.
def spawnTourneyInCore(self, tourney_map, tourney_serial, schedule_serial, currency_serial, prize_currency):
    tourney_map['start_time'] = int(tourney_map['start_time'])
    if tourney_map['sit_n_go'] == 'y':
        tourney_map['register_time'] = int(seconds()) - 1
    else:
        tourney_map['register_time'] = int(tourney_map.get('register_time', 0))
    tourney = PokerTournament(dirs = self.dirs, **tourney_map)
    tourney.serial = tourney_serial
    tourney.verbose = self.verbose
    tourney.schedule_serial = schedule_serial
    tourney.currency_serial = currency_serial
    tourney.prize_currency = prize_currency
    tourney.bailor_serial = tourney_map['bailor_serial']
    tourney.player_timeout = int(tourney_map['player_timeout'])
    tourney.via_satellite = int(tourney_map['via_satellite'])
    tourney.satellite_of = int(tourney_map['satellite_of'])
    tourney.satellite_of, reason = self.tourneySatelliteLookup(tourney)
    tourney.satellite_player_count = int(tourney_map['satellite_player_count'])
    tourney.satellite_registrations = []
    tourney.callback_new_state = self.tourneyNewState
    tourney.callback_create_game = self.tourneyCreateTable
    tourney.callback_game_filled = self.tourneyGameFilled
    tourney.callback_destroy_game = self.tourneyDestroyGame
    tourney.callback_move_player = self.tourneyMovePlayer
    tourney.callback_remove_player = self.tourneyRemovePlayer
    tourney.callback_cancel = self.tourneyCancel
    if not self.schedule2tourneys.has_key(schedule_serial):
        self.schedule2tourneys[schedule_serial] = []
    self.schedule2tourneys[schedule_serial].append(tourney)
    self.tourneys[tourney.serial] = tourney
    return tourney

def deleteTourney(self, tourney):
    if self.verbose > 2:
        self.message("deleteTourney: %d" % tourney.serial)
    self.schedule2tourneys[tourney.schedule_serial].remove(tourney)
    if len(self.schedule2tourneys[tourney.schedule_serial]) <= 0:
        del self.schedule2tourneys[tourney.schedule_serial]
    del self.tourneys[tourney.serial]

def tourneyResumeAndDeal(self, tourney):
    self.tourneyBreakResume(tourney)
    self.tourneyDeal(tourney)

def tourneyNewState(self, tourney, old_state, new_state):
    cursor = self.db.cursor()
    updates = [ "state = '" + new_state + "'" ]
    if old_state != TOURNAMENT_STATE_BREAK and new_state == TOURNAMENT_STATE_RUNNING:
        updates.append("start_time = %d" % tourney.start_time)
    sql = "update tourneys set " + ", ".join(updates) + " where serial = " + str(tourney.serial)
    if self.verbose > 2:
        self.message("tourneyNewState: " + sql)
    cursor.execute(sql)
    if cursor.rowcount != 1:
        self.error("modified %d rows (expected 1): %s " % ( cursor.rowcount, sql ))
    cursor.close()
    if new_state == TOURNAMENT_STATE_BREAK:
        # When we are entering BREAK state for the first time, which
        # should only occur here in the state change operation, we
        # send the PacketPokerTableTourneyBreakBegin. Note that this
        # code is here and not in tourneyBreakCheck() because that
        # function is called over and over again, until the break
        # finishes. Note that tourneyBreakCheck() also sends a
        # PacketPokerGameMessage() with the time remaining, too.
        secsLeft = tourney.remainingBreakSeconds()
        if secsLeft == None:
            # eek, should I really be digging down into tourney's
            # member variables in this next assignment?
            secsLeft = tourney.breaks_duration
        resumeTime = seconds() + secsLeft
        for gameId in map(lambda game: game.id, tourney.games):
            table = self.getTable(gameId)
            table.broadcast(PacketPokerTableTourneyBreakBegin(game_id = gameId, resume_time = resumeTime))
        self.tourneyBreakCheck(tourney)
    elif old_state == TOURNAMENT_STATE_BREAK and new_state == TOURNAMENT_STATE_RUNNING:
        wait = int(self.delays.get('extra_wait_tourney_break', 0))
        if wait > 0:
            reactor.callLater(wait, self.tourneyResumeAndDeal, tourney)
        else:
            self.tourneyResumeAndDeal(tourney)
    elif old_state == TOURNAMENT_STATE_REGISTERING and new_state == TOURNAMENT_STATE_RUNNING:
        self.databaseEvent(event = PacketPokerMonitorEvent.TOURNEY_START, param1 = tourney.serial)
        reactor.callLater(0.01, self.tourneyBroadcastStart, tourney.serial)
        # Only obey extra_wait_tourney_start if we had been registering and are now running,
        # since we only want this behavior before the first deal.
        wait = int(self.delays.get('extra_wait_tourney_start', 0))
        if wait > 0:
            reactor.callLater(wait, self.tourneyDeal, tourney)
        else:
            self.tourneyDeal(tourney)
    elif new_state == TOURNAMENT_STATE_RUNNING:
        self.tourneyDeal(tourney)
    elif new_state == TOURNAMENT_STATE_BREAK_WAIT:
        self.tourneyBreakWait(tourney)
I have since discovered that this code imports several files from another directory that I did not examine. I also made a false assumption about the purpose of this code block: I expected the function to delay every tourney by n seconds, but in practice it implements the delay only when a player forgets about the game and does not show up for it. These facts became clear once I examined the proper files. Lesson learned: look at all the imports!

I just want to load 5GB from MySql into BigQuery

Long time no see. I want to get 5 GB of data from MySQL into BigQuery. My best bet seems to be some sort of CSV export / import, which doesn't work for various reasons; see:
agile-coral-830:splitpapers1501200518aa150120052659
agile-coral-830:splitpapers1501200545aa150120055302
agile-coral-830:splitpapers1501200556aa150120060231
This is likely because I don't have the right MySQL incantation to generate perfect CSV in accordance with RFC 4180. However, instead of arguing RFC 4180 minutiae, this whole load business could be solved in five minutes by supporting customizable multi-character field separators and multi-character line separators. I'm pretty sure my data doesn't contain either ### or ###, so the following would work like a charm:
mysql> select * from $TABLE_NAME
    into outfile '$DATA.csv'
    fields terminated by '###'
    enclosed by ''
    lines terminated by '###'

$ bq load --nosync -F '###' -E '###' $TABLE_NAME $DATA.csv $SCHEMA.json
Edit: Fields contain '\n', '\r', ',' and '"'. They also contain NULLs, which MySql represents as [escape]N, in the example "N. Sample row:
"10.1.1.1.1483","5","9074080","Candidate high myopia loci on chromosomes 18p and 12q do not play a major role in susceptibility to common myopia","Results
There was no strong evidence of linkage of common myopia to these candidate regions: all two-point and multipoint heterogeneity LOD scores were < 1.0 and non-parametric linkage p-values were > 0.01. However, one Amish family showed slight evidence of linkage (LOD>1.0) on 12q; another 3 Amish families each gave LOD >1.0 on 18p; and 3 Jewish families each gave LOD >1.0 on 12q.
Conclusions
Significant evidence of linkage (LOD> 3) of myopia was not found on chromosome 18p or 12q loci in these families. These results suggest that these loci do not play a major role in the causation of common myopia in our families studied.","2004","BMC MEDICAL GENETICS","JOURNAL","N,"5","20","","","","0","1","USER","2007-11-19 05:00:00","rep1","PDFLib TET","0","2009-05-24 20:33:12"
I found loading through a CSV very difficult; more restrictions and complications. I have been messing around this morning with moving data from MySQL to BigQuery.
Below is a Python script that will build the table decorator and stream the data directly into the BigQuery table.
My DB is in the cloud, so you may need to change the connection string. Fill in the missing values for your particular situation, then call it with:
SQLToBQBatch(tableName, limit)
I put the limit in to test with. For my final test I sent 999999999 for the limit and everything worked fine.
I would recommend using a backend module to run this over 5 GB.
Use "RowToJSON" to clean up any invalid characters (i.e. anything non-UTF-8).
I haven't tested on 5 GB, but it was able to do 50k rows in about 20 seconds. The same load through CSV took over 2 minutes.
I wrote this to test things, so please excuse the bad coding practices and mini hacks. It works, so feel free to clean it up for any production-level work.
import MySQLdb
import logging
from apiclient.discovery import build
from oauth2client.appengine import AppAssertionCredentials
import httplib2

OAUTH_SCOPE = 'https://www.googleapis.com/auth/bigquery'

PROJECT_ID = ""         # TODO: fill in
DATASET_ID = ""         # TODO: fill in
TABLE_ID = ""           # TODO: fill in
SQL_DATABASE_NAME = ""  # TODO: fill in
SQL_DATABASE_DB = ""    # TODO: fill in
SQL_USER = ""           # TODO: fill in
SQL_PASS = ""           # TODO: fill in

def Connect():
    return MySQLdb.connect(unix_socket='/cloudsql/' + SQL_DATABASE_NAME, db=SQL_DATABASE_DB, user=SQL_USER, passwd=SQL_PASS)

def RowToJSON(cursor, row, fields):
    newData = {}
    for i, value in enumerate(row):
        try:
            if fields[i]["type"] == bqTypeDict["int"]:
                value = int(value)
            else:
                value = float(value)
        except:
            if value is not None:
                value = value.replace("\x92", "'") \
                             .replace("\x96", "'") \
                             .replace("\x93", '"') \
                             .replace("\x94", '"') \
                             .replace("\x97", '-') \
                             .replace("\xe9", 'e') \
                             .replace("\x91", "'") \
                             .replace("\x85", "...") \
                             .replace("\xb4", "'") \
                             .replace('"', '""')
        newData[cursor.description[i][0]] = value
    return newData

def GetBuilder():
    return build('bigquery', 'v2', http=AppAssertionCredentials(scope=OAUTH_SCOPE).authorize(httplib2.Http()))

bqTypeDict = { 'int' : 'INTEGER',
               'varchar' : 'STRING',
               'double' : 'FLOAT',
               'tinyint' : 'INTEGER',
               'decimal' : 'FLOAT',
               'text' : 'STRING',
               'smallint' : 'INTEGER',
               'char' : 'STRING',
               'bigint' : 'INTEGER',
               'float' : 'FLOAT',
               'longtext' : 'STRING'
             }

def BuildFeilds(table):
    conn = Connect()
    cursor = conn.cursor()
    cursor.execute("DESCRIBE %s;" % table)
    tableDecorator = cursor.fetchall()
    fields = []
    for col in tableDecorator:
        field = {}
        field["name"] = col[0]
        colType = col[1].split("(")[0]
        if colType not in bqTypeDict:
            logging.warning("Unknown type detected, using string: %s", str(col[1]))
        field["type"] = bqTypeDict.get(colType, "STRING")
        if col[2] == "YES":
            field["mode"] = "NULLABLE"
        fields.append(field)
    return fields

def SQLToBQBatch(table, limit=3000):
    logging.info("****************************************************")
    logging.info("Starting SQLToBQBatch. Got: Table: %s, Limit: %i" % (table, limit))
    bqDest = GetBuilder()
    fields = BuildFeilds(table)

    try:
        responce = bqDest.datasets().insert(projectId=PROJECT_ID,
                                            body={'datasetReference' : {'datasetId' : DATASET_ID}}).execute()
        logging.info("Added Dataset")
        logging.info(responce)
    except Exception, e:
        logging.info(e)
        if ("Already Exists: " in str(e)):
            logging.info("Dataset already exists")
        else:
            logging.error("Error creating dataset: %s", str(e))

    try:
        responce = bqDest.tables().insert(projectId=PROJECT_ID, datasetId=DATASET_ID,
                                          body={'tableReference' : {'projectId' : PROJECT_ID,
                                                                    'datasetId' : DATASET_ID,
                                                                    'tableId' : TABLE_ID},
                                                'schema' : {'fields' : fields}}
                                          ).execute()
        logging.info("Added Table")
        logging.info(responce)
    except Exception, e:
        logging.info(e)
        if ("Already Exists: " in str(e)):
            logging.info("Table already exists")
        else:
            logging.error("Error creating table: %s", str(e))

    conn = Connect()
    cursor = conn.cursor()

    logging.info("Starting load loop")
    count = -1
    cur_pos = 0
    total = 0
    batch_size = 1000

    while count != 0 and cur_pos < limit:
        count = 0
        if batch_size + cur_pos > limit:
            batch_size = limit - cur_pos
        sqlCommand = "SELECT * FROM %s LIMIT %i, %i" % (table, cur_pos, batch_size)
        logging.info("Running: %s", sqlCommand)

        cursor.execute(sqlCommand)
        data = []
        for _, row in enumerate(cursor.fetchall()):
            data.append({"json": RowToJSON(cursor, row, fields)})
            count += 1
        logging.info("Read complete")

        if count != 0:
            logging.info("Sending request")
            insertResponse = bqDest.tabledata().insertAll(
                projectId=PROJECT_ID,
                datasetId=DATASET_ID,
                tableId=TABLE_ID,
                body={"rows": data}).execute()
            cur_pos += batch_size
            total += count
            logging.info("Done %i, Total: %i, Response: %s", count, total, insertResponse)
            if "insertErrors" in insertResponse:
                # insertErrors is a list of {"index": ..., "errors": [...]} entries
                for insert_error in insertResponse["insertErrors"]:
                    logging.error("Error inserting data index: %i", insert_error["index"])
                    for error in insert_error["errors"]:
                        logging.error(error)
        else:
            logging.info("No more rows")
- Generate a Google service account key:
  - IAM & Admin > Service accounts > Create service account
  - Once created, create a key, download it, and save it to the project folder on the local machine as google_key.json.
- Run the code in a PyCharm environment after installing the packages.
NOTE: The table data in MySQL remains intact. Also, if you use the preview in BigQuery to check the data, you won't see it; go to the console and fire the query.
Code:
import MySQLdb
from google.cloud import bigquery
import mysql.connector
import logging
import os
from MySQLdb.converters import conversions
import click
import MySQLdb.cursors
from google.cloud.exceptions import ServiceUnavailable
import sys

bqTypeDict = {'int': 'INTEGER',
              'varchar': 'STRING',
              'double': 'FLOAT',
              'tinyint': 'INTEGER',
              'decimal': 'FLOAT',
              'text': 'STRING',
              'smallint': 'INTEGER',
              'char': 'STRING',
              'bigint': 'INTEGER',
              'float': 'FLOAT',
              'longtext': 'STRING',
              'datetime': 'TIMESTAMP'
              }

def conv_date_to_timestamp(str_date):
    import time
    import datetime

    date_time = MySQLdb.times.DateTime_or_None(str_date)
    unix_timestamp = (date_time - datetime.datetime(1970, 1, 1)).total_seconds()
    return unix_timestamp

def Connect(host, database, user, password):
    # Fill in your MySQL connection details (the arguments are currently ignored).
    return mysql.connector.connect(host='',
                                   port='',
                                   database='recommendation_spark',
                                   user='',
                                   password='')

def BuildSchema(host, database, user, password, table):
    logging.debug('build schema for table %s in database %s' % (table, database))
    conn = Connect(host, database, user, password)
    cursor = conn.cursor()
    cursor.execute("DESCRIBE %s;" % table)
    tableDecorator = cursor.fetchall()
    schema = []
    for col in tableDecorator:
        colType = col[1].split("(")[0]
        if colType not in bqTypeDict:
            logging.warning("Unknown type detected, using string: %s", str(col[1]))
        field_mode = "NULLABLE" if col[2] == "YES" else "REQUIRED"
        field = bigquery.SchemaField(col[0], bqTypeDict.get(colType, "STRING"), mode=field_mode)
        schema.append(field)
    return tuple(schema)

def bq_load(table, data, max_retries=5):
    logging.info("Sending request")
    uploaded_successfully = False
    num_tries = 0
    while not uploaded_successfully and num_tries < max_retries:
        try:
            insertResponse = table.insert_data(data)
            for row in insertResponse:
                if 'errors' in row:
                    logging.error('not able to upload data: %s', row['errors'])
            uploaded_successfully = True
        except ServiceUnavailable as e:
            num_tries += 1
            logging.error('insert failed with exception trying again retry %d', num_tries)
        except Exception as e:
            num_tries += 1
            logging.error('not able to upload data: %s', str(e))

@click.command()
@click.option('-h', '--host', default='tempus-qa.hashmapinc.com', help='MySQL hostname')
@click.option('-d', '--database', required=True, help='MySQL database')
@click.option('-u', '--user', default='root', help='MySQL user')
@click.option('-p', '--password', default='docker', help='MySQL password')
@click.option('-t', '--table', required=True, help='MySQL table')
@click.option('-i', '--projectid', required=True, help='Google BigQuery Project ID')
@click.option('-n', '--dataset', required=True, help='Google BigQuery Dataset name')
@click.option('-l', '--limit', default=0, help='max num of rows to load')
@click.option('-s', '--batch_size', default=1000, help='max num of rows to load')
@click.option('-k', '--key', default='key.json', help='Location of google service account key (relative to current working dir)')
@click.option('-v', '--verbose', default=0, count=True, help='verbose')
def SQLToBQBatch(host, database, user, password, table, projectid, dataset, limit, batch_size, key, verbose):
    # set to max verbose level
    verbose = verbose if verbose < 3 else 3
    loglevel = logging.ERROR - (10 * verbose)
    logging.basicConfig(level=loglevel)
    logging.info("Starting SQLToBQBatch. Got: Table: %s, Limit: %i", table, limit)

    # set env key to authenticate application
    os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = os.path.join(os.getcwd(), key)
    print('file found')

    # Instantiate a client
    bigquery_client = bigquery.Client()
    print('Project id created')

    try:
        bq_dataset = bigquery_client.dataset(dataset)
        bq_dataset.create()
        logging.info("Added Dataset")
    except Exception as e:
        if ("Already Exists: " in str(e)):
            logging.info("Dataset already exists")
        else:
            logging.error("Error creating dataset: %s", str(e))

    bq_table = bq_dataset.table(table)
    bq_table.schema = BuildSchema(host, database, user, password, table)
    print('Creating schema using build schema')
    bq_table.create()
    logging.info("Added Table %s", table)

    conn = Connect(host, database, user, password)
    cursor = conn.cursor()

    logging.info("Starting load loop")
    cursor.execute("SELECT * FROM %s" % (table))

    cur_batch = []
    count = 0

    for row in cursor:
        count += 1

        if limit != 0 and count >= limit:
            logging.info("limit of %d rows reached", limit)
            break

        cur_batch.append(row)

        if count % batch_size == 0 and count != 0:
            bq_load(bq_table, cur_batch)
            cur_batch = []
            logging.info("processed %i rows", count)

    # send last elements
    bq_load(bq_table, cur_batch)
    logging.info("Finished (%i total)", count)
    print("table created")

if __name__ == '__main__':
    # run the command
    SQLToBQBatch()
Command to run the file: python mysql_to_bq.py -d 'recommendation_spark' -t temp_market_store -i inductive-cocoa-250507 -n practice123 -k key.json

PyQt exclusive OR in sql query

How can I make it so that if my first search returns results it doesn't run the second part of the query, but stops and displays them? I tried something like this, but it just gives me a blank window and it's pretty chaotic:
def test_update(self):
    projectModel = QSqlQueryModel()
    projectModel.setQuery("""SELECT * FROM pacijent WHERE prezime = '%s' OR (prezime, 3) = metaphone('%s', 3) OR LEVENSHTEIN(LOWER(prezime), '%s') < 3 AND NOT (prezime = '%s' AND (prezime, 3) = metaphone('%s', 3) AND LEVENSHTEIN(LOWER(prezime), '%s') < 3)""" % (str(self.lineEdit.text()), str(self.lineEdit.text()), str(self.lineEdit.text()), str(self.lineEdit.text()), str(self.lineEdit.text()), str(self.lineEdit.text())))
    global projectView
    projectView = QtGui.QTableView()
    projectView.setModel(projectModel)
    projectView.show()
So, if it finds the exact value of the attribute "prezime" it should display it, but if it doesn't, it should fall back to more advanced search tactics, such as metaphone and levenshtein.
EDIT:
I got it working like this:
def search_data(self):
    myQSqlQueryModel = QSqlQueryModel()
    query = QSqlQueryModel()
    global myQTableView
    myQTableView = QtGui.QTableView()
    querySuccess = False
    for queryCommand in [""" SELECT * FROM "%s" WHERE "%s" = '%s' """ % (str(self.search_from_table_lineEdit.text()), str(self.search_where_lineEdit.text()), str(self.search_has_value_lineEdit.text()))]:
        myQSqlQueryModel.setQuery(queryCommand)
        if myQSqlQueryModel.rowCount() > 0:
            myQTableView.setModel(myQSqlQueryModel)
            myQTableView.show()
            querySuccess = True
            break
    if not querySuccess:
        query.setQuery(""" SELECT * FROM "%s" WHERE METAPHONE("%s", 3) = METAPHONE('%s', 3) OR LEVENSHTEIN("%s", '%s') < 4 """ % (str(self.search_from_table_lineEdit.text()), str(self.search_where_lineEdit.text()), str(self.search_has_value_lineEdit.text()), str(self.search_where_lineEdit.text()), str(self.search_has_value_lineEdit.text())))
        global var
        var = QtGui.QTableView()
        var.setModel(query)
        var.show()
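
As an aside, the string interpolation above is open to SQL injection; a minimal sketch of the exact-match lookup using bound parameters instead (assuming PyQt4's QtSql and an already-open default database connection; the helper name is hypothetical):

# Hypothetical sketch: exact-match query with a bound parameter.
from PyQt4 import QtSql

def exact_match_model(table, column, value):
    query = QtSql.QSqlQuery()
    # Table and column names cannot be bound, so they are still interpolated;
    # only the searched value is passed as a bound parameter.
    query.prepare('SELECT * FROM "%s" WHERE "%s" = :value' % (table, column))
    query.bindValue(":value", value)
    query.exec_()
    model = QtSql.QSqlQueryModel()
    model.setQuery(query)  # setQuery() also accepts an executed QSqlQuery
    return model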
After the query runs, you can check whether the model returned any rows with rowCount(), and use a for loop to try several queries in turn:
def testUpdate(self):
    myQSqlQueryModel = QtSql.QSqlQueryModel()
    myQTableView = QtGui.QTableView()
    querySuccess = False
    for queryCommand in ["YOUR QUERY 1", "YOUR QUERY 2"]:
        myQSqlQueryModel.setQuery(queryCommand)
        if myQSqlQueryModel.rowCount() > 0:
            myQTableView.setModel(myQSqlQueryModel)
            myQTableView.show()
            querySuccess = True
            break
    if not querySuccess:
        QtGui.QMessageBox.critical(self, 'Query error', 'Not found')

How to add custom statistics in Grinder

In Grinder we'd like to add customized statistics:
grinder.statistics.registerSummaryExpression("connTimeout", "userLong0")
grinder.statistics.forCurrentTest.addLong("userLong0", 1)
It seems to be successful, as we can see the customized field in the Grinder out file.
The problem is that the value of that statistic is always 0.
Here is the complete script, implemented in Jython:
# -*- coding: utf-8 -*-
from net.grinder.script.Grinder import grinder
from net.grinder.script import Test
from com.netease.cloud.ndir.performance import Query
from com.netease.cloud.ndir.performance import QueryReturnCode

def writeToFile(text):
    filename = "response.log"
    file = open(filename, "a")
    file.write(str(text) + "\n")
    file.close()

ndir_client = grinder.getProperties().getProperty("ndirClient")
query_file = grinder.getProperties().getProperty("queryFile")
request = Query("grinder.properties", query_file)

grinder.statistics.registerSummaryExpression("connTimeout", "userLong0")
grinder.statistics.registerSummaryExpression("readTimeout", "userLong1")
grinder.statistics.registerSummaryExpression("code!=200", "userLong2")
grinder.statistics.registerSummaryExpression("docs=[]", "userLong3")
grinder.statistics.registerSummaryExpression("unknown", "userLong4")

class TestRunner:
    def __init__(self):
        grinder.statistics.delayReports = True

    def initialSleep(self):
        sleepTime = grinder.threadNumber * 20  # per thread
        grinder.sleep(sleepTime, 0)

    def query(self):
        if ndir_client == "true":
            query = request.randomQueryByNdirClient
        else:
            query = request.randomQueryByHttpGet
        try:
            result = query()
        except:
            writeToFile("exception")
            grinder.statistics.forCurrentTest.addLong("userLong4", 1)
            grinder.getStatistics().getForCurrentTest().setSuccess(False)
            return
        if result == 0:
            grinder.getStatistics().getForCurrentTest().setSuccess(True)
            return
        elif result == 1:
            grinder.statistics.forCurrentTest.addLong("userLong0", 1)
            grinder.getStatistics().getForCurrentTest().setSuccess(False)
            return
        elif result == 2:
            grinder.statistics.forCurrentTest.addLong("userLong1", 1)
            grinder.getStatistics().getForCurrentTest().setSuccess(False)
            return
        elif result == 3:
            grinder.statistics.forCurrentTest.addLong("userLong2", 1)
            grinder.getStatistics().getForCurrentTest().setSuccess(False)
            return
        elif result == 4:
            grinder.statistics.forCurrentTest.addLong("userLong3", 1)
            grinder.getStatistics().getForCurrentTest().setSuccess(True)
            return
        else:
            grinder.statistics.forCurrentTest.addLong("userLong4", 1)
            grinder.getStatistics().getForCurrentTest().setSuccess(False)
            return

    request = Test(120, 'query').wrap(query)

    def __call__(self):
        if grinder.runNumber == 0:
            self.initialSleep()
        self.request(self)
I suspect the problem is that you are marking tests as failed, but expecting the statistics to appear in the summary. Only successful tests are accumulated into the summary statistics.
Try registering data log expressions as well
grinder.statistics.registerDataLogExpression("connTimeout", "userLong0")
grinder.statistics.registerDataLogExpression("readTimeout", "userLong1")
grinder.statistics.registerDataLogExpression("code!=200", "userLong2")
grinder.statistics.registerDataLogExpression("docs=[]", "userLong3")
grinder.statistics.registerDataLogExpression("unknown", "userLong4")
Then you'll at least see the values in the data log file of the worker process.