SPSS: Adding two variables using DO REPEAT - syntax-error

I am new to SPSS, the world of statistics and new to this forum. I am doing research in conjunction with my Masters Degree and running into a bit of a problem and looking for some help. Yes, I could hire a consultant, but view this part of the learning process, and would like to see if I cannot master this - with your help of course.....
I am looking to add: q1 and q51 q2 and q52 q3 and q53 etc.... through to q50 and q100
The new variable names currently are TOTAL1 to TOTAL50, but could be anything. Q1 to q100 exist and are consecutive TOTAL1 to TOTAL 50 exists and are consecutive
I have tried:
do repeat x = q1 to q50
/y = q51 to q100
/z = TOTAL1 to TOTAL50.
COMPUTE z = x + y.
end repeat .
EXECUTE.
But getting the following in the output:
Error # 4502 in column 11. Text: = An equals sign appears in a
variable or value list where it is not expected. It will be ignored.
Execution of this command stops.
Error # 4508 in column 15. Text: + Unrecognized text appears on the DO
REPEAT command. It will be ignored. EXECUTE. do repeat x = q1 to q50 /
y = q51 to q100 / z = TOTAL1 to TOTAL50 COMPUTE z = x + y end repeat .
Error # 4502 in column 11. Text: = An equals sign appears in a
variable or value list where it is not expected. It will be ignored.
Execution of this command stops.
Error # 4508 in column 15. Text: + Unrecognized text appears on the DO
REPEAT command. It will be ignored. EXECUTE.
Is this the best way of doing this? Can anyone spot a syntax error?
I am using SPSS v. 20.

Try this
do repeat x = q1 to q50
/y = q51 to q100
/z = TOTAL1 to TOTAL50.
- COMPUTE z = x + y. /*i just added a minus sign before the compute./
end repeat .
EXECUTE.

Related

I am trying to recreate this R logic within a SQL query. Any ideas on how I should go about doing so? Appreciate any assistance at all

This is the R script that I am attempting to recreate using a CASE WHEN statement in SQL:
dat[ ,X_1_7_Spline := pmax(1,pmin(ifelse(is.na(X),1,X),7))]
It seems that this command is telling the parser to return the parallel maxima of a vector containing a conditional statement as long as the value of variable X lies between 1 and the parallel minima of some value and 7 (as long as the value is not null). It then seems to join the new column containing these values back to the original dataset (dat). I am having some troubles representing the "pmax(1,pmin(ifelse(is.na(X),1,X),7))" portion of the code in my SQL query and would appreciate any ideas on how I might be able to do this effectively.
I have something very remedial right now, which I know does not express this above statement properly:
CASE WHEN MAX(IF(ISNOTNULL(X) AND MIN(X)=1 AND MAX(X)=7) then 1 else X end as X_1_7_Spline
Any thoughts/feedback would be greatly appreciated as I am still trying to understand the R script. Thanks in advance for any insight on this issue.
ifelse(is.na(X),1,X) can be translated into SQL's COALESCE(X, 1); and
pmin and pmax logic can be placed in a CASE WHEN (as you've started)
Perhaps this?
CASE WHEN X < 1 THEN 1
WHEN X > 7 THEN 7
ELSE coalesce(X, 1) END as NewX
We don't need to worry about coalesceing the X < 1 or X > 7 because null < 1 does not resolve as true, so it does not accept that case.
Demo in R using sqldf:
library(data.table)
dat <- data.table(X = c(-1,5,9,NA))
dat[, X_1_7_Spline := pmax(1,pmin(ifelse(is.na(X),1,X),7)) ]
sqldf::sqldf("select *, (CASE WHEN X < 1 THEN 1 WHEN X > 7 THEN 7 ELSE coalesce(X,1) END) as NewX from dat")
# X X_1_7_Spline NewX
# 1 -1 1 1
# 2 5 5 5
# 3 9 7 7
# 4 NA 1 1

How to update record in R - problem with sqldf

I would like to change some records in my table. I think the easest way is to use sqldf and Update. But when i using it i get warning (the table b isn't empty):
c<-sqldf("UPDATE b
SET l_all = ''
where id='12293' ")
# In result_fetch(res#ptr, n = n) :
# SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().
Can you help me how to change chosen records in the easest way?
The query worked but there are several possible problems:
The message is a spurious warning, not an error, caused by backwardly incompatible changes to RSQLite. You can ignore the warning or use the sqldf2 workaround here: https://github.com/ggrothendieck/sqldf/issues/40
The SQL update command does not return anything so one would not expect the command shown in the question to return anything. To return the updated value ask for it.
1) Using the built in BOD data frame, defining sqldf2 from (1) and taking into account (2) we have:
sqldf2(c("update BOD set demand = 0 where Time = 1", "select * from BOD"))
giving:
Time demand
1 1 0.0
2 2 10.3
3 3 19.0
4 4 16.0
5 5 15.6
6 7 19.8
2) Another approach to do it is to use select giving the same result.
sqldf("select Time, iif(Time == 1, 0, demand) demand from BOD")

The King's March

You’re given a chess board with dimension n x n. There’s a king at the bottom right square of the board marked with s. The king needs to reach the top left square marked with e. The rest of the squares are labeled either with an integer p (marking a point) or with x marking an obstacle. Note that the king can move up, left and up-left (diagonal) only. Find the maximum points the king can collect and the number of such paths the king can take in order to do so.
Input Format
The first line of input consists of an integer t. This is the number of test cases. Each test case contains a number n which denotes the size of board. This is followed by n lines each containing n space separated tokens.
Output Format
For each case, print in a separate line the maximum points that can be collected and the number of paths available in order to ensure maximum, both values separated by a space. If e is unreachable from s, print 0 0.
Sample Input
3
3
e 2 3
2 x 2
1 2 s
3
e 1 2
1 x 1
2 1 s
3
e 1 1
x x x
1 1 s
Sample Output
7 1
4 2
0 0
Constraints
1 <= t <= 100
2 <= n <= 200
1 <= p <= 9
I think this problem could be solved using dynamic-programing. We could use dp[i,j] to calculate the best number of points you can obtain by going from the right bottom corner to the i,j position. We can calculate dp[i,j], for a valid i,j, based on dp[i+1,j], dp[i,j+1] and dp[i+1,j+1] if this are valid positions(not out of the matrix or marked as x) and adding them the points obtained in the i,j cell. You should start computing from the bottom right corner to the left top, row by row and beginning from the last column.
For the number of ways you can add a new matrix ways and use it to store the number of ways.
This is an example code to show the idea:
dp[i,j] = dp[i+1,j+1] + board[i,j]
ways[i,j] = ways[i+1,j+1]
if dp[i,j] < dp[i+1,j] + board[i,j]:
dp[i,j] = dp[i+1,j] + board[i,j]
ways[i,j] = ways[i+1,j]
elif dp[i,j] == dp[i+1,j] + board[i,j]:
ways[i,j] += ways[i+1,j]
# check for i,j+1
This assuming all positions are valid.
The final result is stored in dp[0,0] and ways[0,0].
Brief Overview:
This problem can be solved through recursive method call, starting from nn till it reaches 00 which is the king's destination.
For the detailed explanation and the solution for this problem,check it out here -> https://www.callstacker.com/detail/algorithm-1

PuLP - COIN-CBC error: How to add constraint with double inequality and relaxation?

I want to add this set of constraints:
-M(1-X_(i,j,k,n) )≤S_(i,j,k,n)-ToD_(i,j,k,n)≤M(1-X_(i,j,k,n) ) ∀i,j,k,n
Where M is a big number, S is a integer variable that takes values between 0 and 1440. ToD is a 4-dimensional matrix that takes values from an Excel sheet. X i dual variable, it takes as values 0-1.
I try to implement in code as following:
for n in range(L):
for k in range(M):
for i in range(N):
for j in range(N):
if (i != START_POINT_S & i != END_POINT_T & j != START_POINT_S & j != END_POINT_T):
prob += (-BIG_NUMBER*(1-X[i][j][k][n])) <= (S[i][j][k][n] - ToD[i][j][k][n]), ""
and another constraint as follows:
for i in range(N):
for j in range(N):
for k in range(M):
for n in range(L):
if (i != START_POINT_S & i != END_POINT_T & j != START_POINT_S & j != END_POINT_T):
prob += S[i][j][k][n] - ToD[i][j][k][n] <= BIG_NUMBER*(1-X[i][j][k][n]), ""
According to my experience, in code, those two constraints are totally equivalent to what we want. The problem is that PuLP and CBC won't accept them. The produce the following errors:
PuLP:
Traceback (most recent call last):
File "basic_JP.py", line 163, in <module>
prob.solve()
File "C:\Users\dimri\Desktop\Filesystem\Projects\deliverable_B4\lib\site-packa
ges\pulp\pulp.py", line 1643, in solve
status = solver.actualSolve(self, **kwargs)
File "C:\Users\dimri\Desktop\Filesystem\Projects\deliverable_B4\lib\site-packa
ges\pulp\solvers.py", line 1303, in actualSolve
return self.solve_CBC(lp, **kwargs)
File "C:\Users\dimri\Desktop\Filesystem\Projects\deliverable_B4\lib\site-packa
ges\pulp\solvers.py", line 1366, in solve_CBC
raise PulpSolverError("Pulp: Error while executing "+self.path)
pulp.solvers.PulpSolverError: Pulp: Error while executing C:\Users\dimri\Desktop
\Filesystem\Projects\deliverable_B4\lib\site-packages\pulp\solverdir\cbc\win\64\
cbc.exe
and CBC:
Welcome to the CBC MILP Solver
Version: 2.9.0
Build Date: Feb 12 2015
command line - C:\Users\dimri\Desktop\Filesystem\Projects\deliverable_B4\lib\sit
e-packages\pulp\solverdir\cbc\win\64\cbc.exe 5284-pulp.mps branch printingOption
s all solution 5284-pulp.sol (default strategy 1)
At line 2 NAME MODEL
At line 3 ROWS
At line 2055 COLUMNS
Duplicate row C0000019 at line 10707 < X0001454 C0000019 -1.000000000000e+
00 >
Duplicate row C0002049 at line 10708 < X0001454 C0002049 -1.000000000000e+
00 >
Duplicate row C0000009 at line 10709 < X0001454 C0000009 1.000000000000e+
00 >
Duplicate row C0001005 at line 10710 < X0001454 C0001005 1.000000000000e+
00 >
At line 14153 RHS
At line 16204 BOUNDS
Bad image at line 17659 < UP BND X0001454 1.440000000000e+03 >
At line 18231 ENDATA
Problem MODEL has 2050 rows, 2025 columns and 5968 elements
Coin0008I MODEL read with 5 errors
There were 5 errors on input
** Current model not valid
Option for printingOptions changed from normal to all
** Current model not valid
No match for 5284-pulp.sol - ? for list of commands
Total time (CPU seconds): 0.02 (Wallclock seconds): 0.02
I don't know what's the problem, any help? I am new to this, if information are not enough let me know what I should add.
Alright, I have searched for hours, but right after I posted this question I found the answer. These kinds of problems are mainly because of the names of the variables or the constraints. That is what caused something to duplicate. I am really not used to that kind of software that is why it took me so long to find and answer. Anyway, the problem for me was when I was defining the variables:
# define X[i,j,k,n]
lower_bound_X = 0 # lower bound for variable X
upper_bound_X = 1 # upper bound for variable X
X = LpVariable.dicts(name="X",
indexs=(range(N), range(N), range(M), range(L)),
lowBound=lower_bound_X,
upBound=upper_bound_X,
cat=LpInteger)
and
# define S[i,j,k,n]
lower_bound_S = 0 # lower bound for variable S
upper_bound_S = 1440 # upper bound for variable S
S = LpVariable.dicts(name="X",
indexs=(range(N),
range(N), range(M), range(L)),
lowBound=lower_bound_S,
upBound=upper_bound_S,
cat=LpInteger)
As you see in the definition of S I obviously forgot to change the name of the variable to S because I copy-pasted it. Anyway, the right way to define S is like this:
# define S[i,j,k,n]
lower_bound_S = 0 # lower bound for variable S
upper_bound_S = 1440 # upper bound for variable S
S = LpVariable.dicts(name="S",
indexs=(range(N), range(N), range(M), range(L)),
lowBound=lower_bound_S,
upBound=upper_bound_S,
cat=LpInteger)
This is how I got my code running.

Using CONTAINS with variables sql

Ok so I am trying to reference one variable with another in SQL.
X= a,b,c,d (x is a string variable with a list of things in it)
Y= b ( Y is a string variable that may or may not have a vaue that appears in X)
I tried this:
Case when Y in (X) then 1 else 0 end as aa
But it doesnt work since it looks for exact matches between X and Y
also tried this:
where contains(X,#Y)
but i cant create Y globally since it is a variable that changes in each row of the table.( x also changes)
A solution in SAS would also be useful.
Thanks
Maybe like will help
select
*
from
t
where
X like ('%'+Y+'%')
or
select
case when (X like ('%'+Y+'%')) then 1 else 0 end
from
t
SQLFiddle example
In SAS I would use the INDEX function, either in a data step or proc sql. This returns the position within the string in which it finds the character(s), or zero if there is no match. Therefore a test if the value returned is greater than zero will result in a binary 1:0 output. You need to use the compress function with the variable containing the search characters as SAS pads the value with blanks.
Data step solution :
aa=index(x,compress(y))>0;
Proc Sql solution :
index(x,compress(y))>0 as aa