Chomsky normal form unit productions - grammar

I have the following that I need to convert to CNF:
S -> Aux NP VP
S -> VP
VP -> Verb NP
VP -> VP PP
Verb -> book
Aux -> does
What I have so far is:
S -> X1 VP
X1 -> Aux NP
S -> Verb NP
S -> VP PP
VP -> Verb NP
VP -> VP PP
Verb -> book
Aux -> does
Is that it? What happens to Verb and Aux? My book has the following:
1. Copy all conforming rules to the new grammar unchanged.
2. Convert terminals within rules to dummy non-terminals.
3. Convert unit-productions.
4. Make all rules binary and add them to new grammar
I assume step 1 means all rules whose right-hand side is already two non-terminals (or a single terminal) stay as they are.
S -> Aux NP VP has three non-terminals on the right, so for step 4 I binarize it by introducing the dummy non-terminal X1 -> Aux NP.
Not sure what step 3 is, but the book has:
"We can eliminate unit productions by rewriting the right-hand side of the original rules with the right-hand side of all the non-unit production rules that they ultimately lead to."
What I have so far seems to be binary except for Verb -> book and Aux -> does.
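For what it's worth, here is how I understand step 3 in code. This is a minimal Python sketch, assuming the grammar is stored as a dict mapping each LHS to a list of RHS tuples and that there are no unit-production cycles:
grammar = {
    "S":    [("Aux", "NP", "VP"), ("VP",)],
    "VP":   [("Verb", "NP"), ("VP", "PP")],
    "Verb": [("book",)],
    "Aux":  [("does",)],
}

def eliminate_unit_productions(g):
    changed = True
    while changed:
        changed = False
        for lhs, rhss in g.items():
            for rhs in list(rhss):
                # a unit production has a single non-terminal on the right;
                # a lone terminal like ("book",) is not a key in g, so it stays
                if len(rhs) == 1 and rhs[0] in g:
                    rhss.remove(rhs)
                    for replacement in g[rhs[0]]:
                        if replacement not in rhss:
                            rhss.append(replacement)
                    changed = True
    return g

eliminate_unit_productions(grammar)
# grammar["S"] is now [("Aux", "NP", "VP"), ("Verb", "NP"), ("VP", "PP")]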

set seed for entire model training/testing

I have code using TensorFlow v1 and I'd like to migrate it to native TensorFlow 2.
The code defines random objects (using numpy.random or random), a neural network (Keras weight initialization, etc.), and other TensorFlow random functions. At the end, it makes predictions on a random test set and outputs the loss/accuracy of the model.
For this task, I have the original code and a copy of it, and I'm changing the code of the copy part by part. I want to make sure that the behaviour stays the same, so I want to fix the randomness so that I can monitor whether the loss/accuracy change.
However, even after setting the seeds of the various random modules in my original file, launching it multiple times still gives different loss/accuracy.
Here are my imports:
import time
import random
import my_file as mf  # file in directory scope
import numpy as np
import copy
import os
from matplotlib import pyplot as plt
import tensorflow.compat.v1 as tf
and I'm setting the seeds at the beginning like this:
tf.set_random_seed(42)
random.seed(42)
np.random.seed(42)
My module my_file uses the random library and I'm also setting the seed there.
I do understand from the docs that tf.set_random_seed only sets the global seed and that each random operation in TensorFlow also uses its own operation-level seed, resulting in different values for consecutive calls. For example, if I call the training/testing cell 3 times I get the consecutive loss values L1 -> L2 -> L3.
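In TF2 terms, that per-op behaviour looks like this (a sketch, assuming eager TF 2.x):
import tensorflow as tf

tf.random.set_seed(42)
print(tf.random.uniform([1]))  # some value A
print(tf.random.uniform([1]))  # a different value B (new op-level seed)
tf.random.set_seed(42)
print(tf.random.uniform([1]))  # A again: the whole sequence restarted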
However, this should still result in the same behavior if I restart the environment, so why isn't that the case? If I restart the kernel and execute 3 times, I get L1' ≠ L1, L2' ≠ L2, L3' ≠ L3.
What else should I verify to make sure the behaviour is the same every time I restart the notebook kernel?
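For context, this is the full set of seeds I'd expect to need once migrated to native TF2 (a sketch, assuming TF ≥ 2.1; note that PYTHONHASHSEED generally only takes effect if exported before the interpreter starts, so setting it inside the notebook is best-effort):
import os
os.environ["PYTHONHASHSEED"] = "42"
os.environ["TF_DETERMINISTIC_OPS"] = "1"  # ask TF for deterministic kernels

import random
import numpy as np
import tensorflow as tf

random.seed(42)
np.random.seed(42)
tf.random.set_seed(42)  # TF2 replacement for tf.compat.v1.set_random_seed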

How can I code an anticontrolled U1 gate in Qiskit? There are references for an anticontrolled X gate but not for a U gate

Some of them use this code to implement the anticontrolled X gate: https://github.com/Qiskit/qiskit-terra/blob/3b3536bcdb83124d49723dd205573f169c82ea9c/qiskit/circuit/add_control.py#L24
U1Gate is being replaced by PhaseGate (aka p). If you still want to use u1, replace the import of PhaseGate in this example with import U1Gate as PhaseGate:
from qiskit import QuantumCircuit
from qiskit.circuit.library import PhaseGate
circuit = QuantumCircuit(2)
circuit.append(PhaseGate(3.14159).control(1, ctrl_state="0"), [0, 1])
print(circuit)
q_0: ─o─────
      │P(π)
q_1: ─■─────
The control method takes the number of qubits to control on (in this example, 1) and the ctrl_state. In this case, an anticontrol (aka open control) applies the phase rotation to q_1 if q_0 == 0.
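If you really want the deprecated gate, the same pattern should work with U1Gate, assuming a qiskit-terra version where it is still importable:
from qiskit import QuantumCircuit
from qiskit.circuit.library import U1Gate

circuit = QuantumCircuit(2)
# open control (ctrl_state="0") on q_0, U1 phase applied to q_1
circuit.append(U1Gate(3.14159).control(1, ctrl_state="0"), [0, 1])
print(circuit)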

dask - CountVectorizer returns "ValueError('Cannot infer dataframe metadata with a `dask.delayed` argument')"

I have a Dask Dataframe with the following content:
                                               X_trn     y_trn
0  java repeat task every random seconds p m alre...  LQ_CLOSE
1  are java optionals immutable p d like to under...        HQ
2  text overlay image with darkened opacity react...        HQ
3  ternary operator in swift is so picky p questi...        HQ
4  hide show fab with scale animation p m using c...        HQ
I am trying to use CountVectorizer from the dask_ml library. When I pass my X_trn to fit_transform, I get the ValueError "Cannot infer dataframe metadata with a dask.delayed argument".
from dask_ml.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer()
countMatrix = vectorizer.fit_transform(training['X_trn'])
This answer will probably come too late for the original author, but it may still help others. The answer is actually in the documentation; I overlooked it at first too:
"The Dask-ML implementation currently requires that raw_documents is a dask.bag.Bag of documents (lists of strings)."
This apparently innocuous sentence is your problem: you are passing a dask.dataframe and not a dask.bag.Bag of documents.
import dask.bag as db
corpus = db.from_sequence(training['X_trn'], npartitions=2)
And then, you can pass it to the vectorizer as you were doing:
X = vectorizer.fit_transform(corpus)
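Putting the pieces together, here is a minimal end-to-end sketch, assuming training is the Dask DataFrame from the question and dask-ml is installed:
import dask.bag as db
from dask_ml.feature_extraction.text import CountVectorizer

# the vectorizer requires a dask.bag.Bag of documents, not a dataframe column
corpus = db.from_sequence(training['X_trn'], npartitions=2)

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)  # dask array of token counts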

how to write more robust pipes in julia

It happens quite often that I add a new line to a pipe that throws an error for one reason or another. Normally this isn't a problem: I fix my code and it works. However, when using pipes, I often have to re-run all the code that created the DataFrame up to the point where the pipe starts, because the pipe itself changes the DataFrame in such a way that it's no longer valid input.
For example:
df = CSV.read("myfile.csv", DataFrame)
# ----
# all kinds of code working on df
# ----
@pipe df |>
    rename!(_, "A ha" => :Aha) |>  # This works fine the first time
    select!(_, :typo, :)           # throws an error for any reason

@pipe df |>
    rename!(_, "A ha" => :Aha) |>  # Now this throws an error
    select!(_, :fixed_the_typo, :)
ArgumentError: Tried renaming :A ha to :Aha, when :A ha does not exist in the Index.
Is there a way to either make a pipeline atomic (it either all runs or nothing runs), or write my code in a way that prevents this problem?
I guess what I'm looking for is something like this:
@pipe df |>
    rename(_, "A ha" => :Aha) |>
    select(_, :typo, :) |>
    commit!(_)
The issue is that running each of the piped commands does an in-place modification of the DataFrame. If you instead do
@pipe df |>
    rename(_, "A ha" => :Aha) |>
    select(_, :typo, :)
(notice I omitted the exclamation marks), then instead of modifying the DataFrame df directly in each operation it will create a new version to operate on.
For the exact behavior you asked for you could do
df = @pipe df |>
    rename(_, "A ha" => :Aha) |>
    select(_, :typo, :)
which assigns the result to df when it finishes.
Or, to create a new DataFrame only for the first operation, leave out the exclamation mark on the first operation and keep it for all the rest of the pipe:
df = @pipe df |>
    rename(_, "A ha" => :Aha) |>
    select!(_, :typo, :)
Now, in the first operation, a new DataFrame is created, and the same DataFrame is operated on from then on. This will give you the best possible performance while doing what you asked.

How can I make scipy.odeint faster?

I am currently solving an integrated system of 559 nonlinear differential equations. I have to fit the solutions obtained to some experimental data by varying the constants c1, c2, b and g.
I am using scipy.odeint and I would like to know if there is a way to make my program faster, as it takes ages to run.
The code is this:
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import odeint
import random as rd
from numba import jit
L = np.loadtxt('C:/Users/Pablo/Desktop/TFG/Probas/matriz_L_Pablo.txt')
I = np.loadtxt('C:/Users/Pablo/Desktop/TFG/Probas/vector_I_Pablo.txt')
k = np.diag(L)
n = len(k)  # count the number of nodes
u = np.zeros(n)
for i in range(n):
    u[i] = rd.random()
M = np.zeros((n, n))
derivs = np.zeros(n)
c1 = 100; c2 = 10000; b = 0.01; g = 1
@jit
def f(y, t, params):
    suma = 0
    c1, c2, b, g = params
    for i in range(n):
        for j in range(n):
            if i == j:
                M[i, i] = (1 - y[i]/b) + g*(1 - y[i]) + c2*I[i]*(1/n - 1)
            if i != j:
                M[i, j] = (1/n)*(c1*L[i, j] + c2*I[i])
            out = M[i, j]*y[j]
            suma = suma + out
        derivs[i] = suma
        suma = 0
    return derivs
# initial conditions
y0 = u
# list with the parameters
params = [c1, c2, b, g]
# integration times
tf = 1
deltat = 0.001
t = np.arange(0, tf, deltat)
# solution
sol = odeint(f, y0, t, args=(params,))
(Sorry if it is not very clear; it's my first time here.)
You can try vectorizing your code. The function f does two things: first it creates the matrix M, and then it computes the matrix-vector product My. The multiplication My is easy to vectorize because all we have to do is use numpy's matmul function.
def f(y, t, params):
    c1, c2, b, g = params
    for i in range(n):
        for j in range(n):
            if i == j:
                M[i, i] = (1 - y[i]/b) + g*(1 - y[i]) + c2*I[i]*(1/n - 1)
            if i != j:
                M[i, j] = (1/n)*(c1*L[i, j] + c2*I[i])
    return np.matmul(M, y)
That should help with runtime a bit. But the most time-consuming part is the fact that the entire matrix M is formed every time f is called, and that it is formed one element at a time.
The only parts of M that need to be modified when calling f are the parts that depend on y. So all of the off-diagonal entries of M can be filled in before the ODE solver is called. If M is 559x559, then instead of having to calculate all ~312,000 elements of M every time f is called, you would only have to calculate the 559 elements on the diagonal. The remaining entries of M don't depend on y and can be specified before calling the ODE solver. Making this modification should result in a huge speedup, as this seems to be the main bottleneck in your code.
Lastly, you could also vectorize how the diagonal of M is filled by using something like numpy.diag_indices.
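To illustrate, here is a rough sketch of that precomputation, reusing the names from the question (L, I, k, c1, c2, b, g as defined there); treat it as a starting point rather than a drop-in replacement:
import numpy as np

n = len(k)

# The off-diagonal part of M never depends on y, so build it once up front.
# Note: if c1 or c2 change during fitting, M must be rebuilt as well.
M = (1.0/n) * (c1*L + c2*I[:, None])  # element [i, j] = (1/n)*(c1*L[i,j] + c2*I[i])
diag = np.diag_indices(n)

def f(y, t, params):
    c1, c2, b, g = params
    # only the diagonal depends on y, so only it is rebuilt on each call
    M[diag] = (1 - y/b) + g*(1 - y) + c2*I*(1.0/n - 1)
    return np.matmul(M, y)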