How to Use AI to analyse test execution output (SQL output) to design a regression suite? - sql

We currently run SQL reports to extract test execution output so that we can review how successful a test has been and then make an educated guess of which tests to add to our regression suites.
However this is time consuming as it requires someone to go through all the data and make certain assumptions.
I've been tasked with looking into the possibility of using artificial intelligence to sift through the data instead and would like to know if anyone has tried this and how they implemented.

I'm not sure if this will do, but you can use out-of-the-box Python's scikit-learn
It is as simple as:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
import pandas as pd
####DATA PREP##
data = pd.read_csv('filepath')
#Forgot the target xD
# target = pd.read_csv('target_data_filepath')
target = data.target #If target is in data
other_data = pd.read_csv('filepath_other')
###MAKE MODEL##
tfidf_vect = TfidfVectorizer()
mpl_class = MLPClassifier()
pipe = Pipeline([('Tfidf Vectorizer', tfidf_vect),('MLP Classifier', mlp_class)]
pipe.fit(data, target) #remove target from data beforehand if applies
####PREDICT###
pipe.predict(other_data)
data is your text in separate entries, whole output per a single record
target is what you found beforehand, wether it should be included somewhere or not
other_data is what you want to test
But beware that above is just a mockup and I don't guarantee that I had all method names correct. For reading just follow scikit-learn's doku, quite expensive but extensive books like Building Machine Learning Systems with Python on Packt and lots of lots of other free blogs like this machinelearningmastery.com

Related

How to convert multiple LCI ecospold files to a custom excel format/ how to use parse_file from pyecospold/ how to read ecospold into brightway

I have multiple ecospold (version 1) files with LCI data that I want to convert to a custom excel format. I need all data given in the ecospold file. For my own convinience I want to use python to complete this task.
My research until now has lead me to the following conclusions:
There exist at least two converters (by GLAD and openLCA) to convert ecospold formats (1 and 2) to e.g. the ILCD. But those formats are not helping me to go anywhere, since I need to have all the data accessible in python and in order to then write it into my custom excel format.
To get the data in python, the package pyecospold (https://github.com/sami-m-g/pyecospold) seems to be a suitable choice.
According to the README that can be found at the pyecospold github repository,
ecoSpold = parse_file("data/v1/v1_1.xml") # Replace with your own XML file
should do the job. So I implemented the following lines:
import os
from pyecospold import parse_file, save_file, Defaults
from lxml import etree
cd = os.getcwd()
path_input = cd + r'\inputs\ecospold_test.xml'
# Parse the required XML file to EcoSpold class.
es = parse_file('inputs/ecospold_test.xml')
Now I run into the error:
TypeError: parse_file() missing 2 required positional arguments: 'schema_path' and 'ecospold_lookup'
I understood that a schema in xsd format is needed, therefore I got the schema files from the github and amended my last line of code:
es = parse_file('inputs/ecospold_test.xml', 'inputs/schemas/v1/EcoSpold01Dataset.xsd')
Now there is still one argument missing:
TypeError: parse_file() missing 1 required positional argument: 'ecospold_lookup'
Since I have no experience in parsing xml files in python, I have no idea what to do with this. Additionally, I am confused why the README does not say anything about those additionally needed arguments.
My second idea was to use brightway to get the data into python. But since brightway itself is quite an extensive package, I could not find a simple (or any) way to do this. (Sadly, the notebooks linked in the answer of this question Import Ecoinvent 2.2 Ecospold files into Brightway do not exist anymore)
Another option would of course be to write my own parser. But because I am lacking experience and pyecospold does exactly this (at least in my understanding), I would like to avoid this option.
Additionally, there in openLCA it is possible to read in ecospold files and then export them to an excel format. From this excel format I could of course make my custom excel format. The problem here is that I have no idea how to automize this, because I do not want to read in and export each file individually and manually in openLCA.
If anyone has an idea on how to solve one of my subproblems or a good alternative on how to solve my general problem, I would be very thankful. :)

How do I launch a SmartSim orchestrator without models?

I'm trying to prototype using the SmartRedis Python client to interact with the SmartSim Orchestrator. Is it possible to launch the orchestrator without any other models in the experiment? If so, what would be the best way to do so?
It is entirely possible to do that. A SmartSim Experiment can contain different types of 'entities' including Models, Ensembles (i.e. groups of Models), and Orchestrator (i.e. the Redis-backed database). None of these entities, however, are 'required' to be in the Experiment.
Here's a short script that creates an experiment which includes only a database.
from SmartSim import Experiment
NUM_DB_NODES = 3
exp = Experiment("Database Only")
db = exp.create_database(db_nodes=NUM_DB_NODES)
exp.generate(db)
exp.start(db)
After this, the Orchestrator (with the number of shards specified by NUM_DB_NODES) will have been spunup. You can then connect the Python client using the following line:
client = smartredis.Client(db.get_address()[0],NUM_DB_NODES>1)

How to use tensorflow grappler?

I'm trying to optimize my tensorflow model serving performance by applying grappler, I'm working on a C++ tensorflow-serving service.
AFAIK, I should do the grappler stuff after LoadSavedModel. But I'm not sure what exactly should I do, should I write the op optimization myself or I just call the API?
I've Google searched for quite a while and didn't see problem-solving post or code snippets.
Could you give me any advice or code example for this?
I've found an answer by searching the tensorflow code base.
tensorflow::grappler::GrapplerItem item;
item.fetch = std::vector<std::string>{output_node_};
item.graph = bundle_.meta_graph_def.graph_def();
tensorflow::RewriterConfig rw_cfg;
rw_cfg.add_optimizers("constfold");
rw_cfg.add_optimizers("layout");
auto new_graph_def = bundle_.meta_graph_def.mutable_graph_def();
tensorflow::grappler::MetaOptimizer meta_opt(nullptr, rw_cfg);
meta_opt.Optimize(nullptr, item, new_graph_def);
By adding the code lines above, I got my GraphDef-Serialized-Filesize reduce from 20MB to 6MB, so surely it did the pruning. But I found the session.Run() cost more time than before.
============
update:
The usage above is incorrect. The default setting optimizes graph with grappler, and runs when load saved models. You could learn the right usage by review the LoadSavedModel related codes.

rstan() should not run in #'#example?

In package development, each example requires <5s. However, the pair of stan_model() and rstan::sampling() take long times more than 5s as follows:
Examples with CPU or elapsed time > 5s
user system elapsed
fit 1.25 0.11 32.47
So I put \donttest{} for each rstan::sampling() in roxygen comments #'#examples
In examples#'#examples, we should not run sampling() or is there any treatment ?
I had tried to create my package based on the code rstan_package_skeleton(path = 'BayesianAAA') when I was taught from you (Thank you !!) but, I do not understand many things about it.
Previously, rstan_package_skeleton(path = 'BayesianAAA') launched the errors in my computer ( but now the error does not occur).
So, I made my package without the rstan_package_skeleton(), say BayesianAAA, and in my original making, I put the Model_A.stan,Model_B.stan,Model_C.stan,.... in the inst/extdata and I refer my stan files as follows;
scr <- system.file("extdata", "Model_A.stan", package="BayesianAAA")
scr <- rstan::stan_model(scr)
I have many questions about the code rstan_package_skeleton(path = 'BayesianAAA').
1) The first question is How to include my existing stan files and how to refer my .stan files for the rstan::stan_model() ?
According to the page following page, it said that
If we had existing .stan files to include with the package we could use the optional stan_files argument to rstan_package_skeleton to include them.
So, I think I should execute, I am not sure but the following like manner is required;
`rstan_package_skeleton(path = 'BayesianAAA', stan_files = "Model_A.stan" )`.
But I do not know how to write the code for several stan files, say Model_A.stan,Model_B.stan,Model_C.stan in my existing package made without the rstan_package_skeleton(). I do not understand , but the following code is correct ? Since I do not where the files described in the variable stan_files is reflected in the new project created by the rstan_package_skeleton().
`rstan_package_skeleton(path = 'BayesianAAA', stan_files = c("Model_A.stan",`Model_B.stan`,`Model_C.stan` )`.
Here, the another question arise, that is,
2) The second question: Where I execute the code rstan_package_skeleton(path = 'BayesianAAA', stan_files = "Model_A.stan" ) ? I execute it form the R studio console in my existing package project. Is it correct ? And then, the new project arise and it is contained the old existing project. What should I do ?
https://cran.r-project.org/web/packages/rstantools/vignettes/minimal-rstan-package.html
3) I do not quite know about the packages "rstanarm" , but I try to imitate it for my package, but I can not fined any .stan file in it, I am wrong ?
I am sorry for my poor English, and Lack of study about these things.
I would be grateful if you could tell me.
You generally should not be writing a package that calls stan_model at runtime, unless like brms or tmbstan you are generating a Stan program at runtime as opposed to writing it statically. There are dozens of packages on CRAN that provide compiled Stan programs basically by following the build process developed for rstanarm, which is facilitated by the rstantools::rstan_package.skeleton function, the step-by-step guide, and the developer guidelines which directly address your question
CRAN policy permits long installation times but imposes restrictions on the time consumed by examples and unit tests that are much shorter than the time that it takes to compile even a simple Stan program. Thus, it is only possible to adequately test your package if it has pre-compiled Stan programs.
Even then, it can be difficult to sample from a posterior distribution (adequately) in five seconds, so you often have to use small datasets, one chain, a small number of iterations, etc.
It is best to pass the names of your Stan programs (which should end in a .stan extension, not use a period otherwise, and only have ASCII letters, numbers, and the underscore in their names) to rstantools::rstan_package_skeleton(). If doing so from RStudio, I would call it while not in an existing project. Then
During installation, all Stan programs will be compiled and saved in the list stanmodels that can then be used by R function in the package. The rule is that the Stan program compiled from the model code in src/stan_files/foo.stan is stored as list element stanmodels$foo.
There are dozens of R packages that have Stan programs in their src/stan_files directory (although the locations of the Stan programs are going to move to inst/stan for the next rstantools release) that for the most part just followed the vignettes and did not have to do any additional steps except write more R functions.

Psychopy and pylink example

I'm working on integrating an experiment in psychopy with the eyelink eyetracking system. The way to do this seems to be through pylink. Unfortunately I'm really unfamiliar with pylink and I was hoping there was a sample of an experiment that combines the two. I haven't been able to find one. If anyone would be able to share an example or point me towards a more accessible manual than the pylink api that sr-research provides I'd be really grateful.
Thanks!
I am glad you found your solution. I have not used iohub, but we do use psychopy and an eyelink and therefore some of the following code may be of use to others who wish to invoke more direct communication. Note that our computers use Archlinux. If none of the following makes any sense to you, don't worry about it, but maybe it will help others who are stumbling along the same path we are.
Communication between experimental machine and eye tracker machine
First, you have to establish communication with the eyelink. If your experimental machine is turned on and plugged into a live Eyelink computer then on linux you have to first set your ethernet card up, and then set the default address that Eyelink uses (this also works for the Eyelink 1000 - they kept the same address). Note your ethernet will probably have a different name than enp4s0. Try simply with ip link and look for something similar. NB: these commands are being typed into a terminal.
#To set up connection with Eyelink II computer:
#ip link set enp4s0 up
#ip addr add 100.1.1.2/24 dev enp4s0
Eyetracker functions
We have found it convenient to write some functions for talking to the Eyelink computer. For example:
Initialize Eyetracker
sp refers to the tuple of screenx, screeny sizes.
def eyeTrkInit (sp):
el = pl.EyeLink()
el.sendCommand("screen_pixel_coords = 0 0 %d %d" %sp)
el.sendMessage("DISPLAY_COORDS 0 0 %d %d" %sp)
el.sendCommand("select_parser_configuration 0")
el.sendCommand("scene_camera_gazemap = NO")
el.sendCommand("pupil_size_diameter = %s"%("YES"))
return(el)
NB: the pl function comes from import pylink as pl. Also, note that there is another python library called pylink that you can find on line. It is probably not the one you want. Go through the Eyelink forum and get pylink from there. It is old, but it still works.
Calibrate Eyetracker
el is the name of the eyetracker object initialized above. sp screen size, and cd is color depth, e.g. 32.
def eyeTrkCalib (el,sp,cd):
pl.openGraphics(sp,cd)
pl.setCalibrationColors((255,255,255),(0,0,0))
pl.setTargetSize(int(sp[0]/70), int(sp[1]/300))
pl.setCalibrationSounds("","","")
pl.setDriftCorrectSounds("","off","off")
el.doTrackerSetup()
pl.closeGraphics()
#el.setOfflineMode()
Open datafile
You can talk to the eye tracker and do things like opening a file
def eyeTrkOpenEDF (dfn,el):
el.openDataFile(dfn + '.EDF')
Drift correction
Or drift correct
def driftCor(el,sp,cd):
blockLabel=psychopy.visual.TextStim(expWin,text="Press the space bar to begin drift correction",pos=[0,0], color="white", bold=True,alignHoriz="center",height=0.5)
notdone=True
while notdone:
blockLabel.draw()
expWin.flip()
if keyState[key.SPACE] == True:
eyeTrkCalib(el,sp,cd)
expWin.winHandle.activate()
keyState[key.SPACE] = False
notdone=False
Sending and getting messages.
There are a number of built-in variables you can set, or you can add your own. Here is an example of sending a message from your python program to the eyelink
eyelink.sendMessage("TRIALID "+str(trialnum))
eyelink.startRecording(1,1,1,1)
eyelink.sendMessage("FIX1")
tFix1On=expClock.getTime()
Gaze contingent programming
Here is a portion of some code that uses the eyelink's most recent sample in the logic of the experimental program.
while notdone:
if recalib==True:
dict['recalib']=True
eyelink.sendMessage("RECALIB END")
eyelink.startRecording(1,1,1,1)
recalib=False
eventType=eyelink.getNextData()
if eventType==pl.STARTFIX or eventType==pl.FIXUPDATE or eventType==pl.ENDFIX:
sample=eyelink.getNewestSample()
if sample != None:
if sample.isRightSample():
gazePos = sample.getRightEye().getGaze()
if sample.isLeftSample():
gazePos = sample.getLeftEye().getGaze()
gazePosCorFix = [gazePos[0]-scrx/2,-(gazePos[1]-scry/2)]
posPix = posToPix(fixation)
eucDistFix = sqrt((gazePosCorFix[0]-posPix[0])**2+(gazePosCorFix[1]-posPix[1])**2)
if eucDistFix < tolFix:
core.wait(timeFix1)
notdone=False
eyelink.resetData()
break
Happy Hacking.
rather than PyLink, you might want to look into using the ioHub system within PsychoPy. This is a more general-purpose eye tracking system that also allows for saving data in a common format (integrated with PsychoPy events), and provides tools for data analysis and visualisation.
ioHUb is built to be agnostic to the particular eye tracker you are using. You just need to create a configuration file specific to your EyeLink system, and thereafter use the generic functions ioHiv provides for calibration, accessing gaze data in real-time, and so on.
There are some teaching resources accessible here: http://www.psychopy.org/resources/ECEM_Python_materials.zip
For future readers, I wanted to share my library for combining pylink and psychopy. I've recently updated it to work with python 3. It provides simple to use, high level functions.
https://github.com/colinquirk/templateexperiments/tree/master/eyelinker
You could also work at a lower level with the PsychoPyCustomDisplay class (see the pylink docs for more info about EyeLinkCustomDisplay).
For an example of it in use, see:
https://github.com/colinquirk/ChangeDetectionEyeTracking
(At the time of writing, this experiment code is not yet python 3 ready, but it should still be a useful example.)
The repo also includes other modules for creating experiments and recording EEG data, but they are not necessary if you are just interested in the eyelinker code.