How do I count members of a list in wxMaxima?

In wxMaxima, for my assignment I was given a list, for example:
L:[1,-2,3,4,-5,11,-12]
I need to count the members of the list that are less than 0, print that count, and print those members. I used:
L:[1,-2,3,4,-5,11,-12]$
n: length(L)$
for k:1 thru n step 1 do
if L[k]<0 then print(L[k]);
and I got:
-2
-5
-12
done
My question is: how do I get the number of such members, which in this example would be 3, printed?
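One way to get the count is to collect the matching members first and then take the length of that collection. The sketch below shows the logic in Python rather than Maxima syntax (the names negatives and count are illustrative). In Maxima itself, something like length(sublist(L, lambda([x], x<0))) should give the same number, assuming the built-in sublist function.

```python
# Same list as in the question.
L = [1, -2, 3, 4, -5, 11, -12]

# Keep only the members that are less than 0.
negatives = [x for x in L if x < 0]

# The number of such members is the length of the filtered list.
count = len(negatives)

for x in negatives:
    print(x)
print("count:", count)
```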

Related

Is there an efficient way for pandas to get tail rows with a condition?

I want to get the tail rows that satisfy a condition.
For example:
I want to get all the negative tail rows from a column 'A' like:
test = pd.DataFrame({'A':[-8, -9, -10, 1, 2, 3, 0, -1,-2,-3]})
I expect a 'method' that returns a new frame like:
A
0 -1
1 -2
2 -3
Note that it is not known in advance how many negative numbers are in the tail, so I cannot simply run test.tail(3).
It looks like the 'tail()' function provided by pandas only works with a given number of rows.
But my input data frame might be very large, so I don't want to run a simple loop that checks the rows one by one.
Is there a smart way to do that?
Is this what you wanted?
test = pd.DataFrame({'A':[-8, -9, -10, 1, 2, 3, 0, -1,-2,-3]})
test = test.iloc[::-1]
test.loc[test.index.max():test[test['A'].ge(0)].index[0]+1]
Output:
A
9 -3
8 -2
7 -1
Edit: if you want to get it back into the original order:
test.loc[test.index.max():test[test['A'].ge(0)].index[0]+1].iloc[::-1]
A
7 -1
8 -2
9 -3
Optionally, also add .reset_index(drop=True) if you need an index starting at 0.
What's the tail for? It seems like you just need the negative numbers
test.query("A < 0")
Update: Find where sign changes, split the array and choose last one
split_points = (test.A.shift(1)<0) == (test.A<0)
np.split(test, split_points.loc[lambda x: x==False].index.tolist())[-1]
Output:
A
7 -1
8 -2
9 -3
Just sharing a performance comparison of the two answers given above.
Thanks Patryk and Macro.
I improved my earlier test and ran another round, as I felt the old test sample was too small and I was afraid the %%time measurement might not be accurate.
My new test uses a very large head of roughly 10000000 numbers and a tail of 3 negative numbers,
so the new test shows how the overall data frame size affects performance.
The code is below:
%%time
arr = np.arange(1,10000000,1)
arr = np.concatenate((arr, [-2,-3,-4]))
test = pd.DataFrame({'A':arr})
test = test.iloc[::-1]
test.loc[test.index.max():test[test['A'].ge(0)].index[0]+1].iloc[::-1]
%%time
arr = np.arange(1,10000000,1)
arr = np.concatenate((arr, [-2,-3,-4]))
test = pd.DataFrame({'A':arr})
split_points = (test.A.shift(1)<0) == (test.A<0)
np.split(test, split_points.loc[lambda x: x==False].index.tolist())[-1]
Because of system noise, I ran the test 10 times; the two methods above perform very similarly. In about 50% of cases Patryk's code even runs faster.
See the image below.

Redis get member where score is between min and max

I have a table in SQL with 3 columns: BIGINT StartNumber, BIGINT EndNumber, BIGINT LocationId, and I need to be able to do something like this:
Select LocationId where StartNumber < #number and EndNumber > #number.
for example:
StartNumber EndNumber LocationId
1 5 1
6 9 1
10 16 2
and when I have #number = 7 I should get LocationId = 1
How can I do this in redis?
I was thinking of moving this table to Redis and using a sorted set with ZRANGEBYSCORE, but it didn't work for me:
1) When I use ZADD key score member [score] [member], I am unable to add 2 elements with the same member and different scores, even with the nx parameter:
zadd myset nx 1 "17" 2 "17" - it will add one element and then update its score instead of adding two elements.
2) When I add this: zadd set1 2 "a" 4 "b" 6 "c" 10 "d" and then try zrangebyscore set1 3 3 (wanting to get the member whose score range includes 3), I get an empty result.
P.S. All commands were executed on the example pages of the Redis website.
As I understand the task, you don't have overlaps, each interval maps to only one location (?), and the intervals don't have gaps. Based on this, you can use a single sorted set scored by the lower (or upper) bound values:
ZADD StartNumber 1 "1:5:1" 6 "6:9:1" 10 "10:16:2"
Then you can use:
ZREVRANGEBYSCORE StartNumber 7 -inf LIMIT 0 1
And it will be O(log(N)).
Put differently, your question is "how can I map N ranges of numbers to a location". One way of doing this is using two Sorted Sets, one for the StartNumber and the other one for EndNumber. Since members have to be unique, we'll also need to ensure that by using the Start/End values as part of the member. For example, with your example data, this could be done like so:
ZADD StartNumber 1 "1:5:1" 6 "6:9:1" 10 "10:16:2"
ZADD EndNumber 5 "1:5:1" 9 "6:9:1" 16 "10:16:2"
To find the location for #number=7, do ZRANGEBYSCORE StartNumber -inf 7 and ZRANGEBYSCORE EndNumber 7 +inf and intersect the results. All that remains is to split the intersection's result(s) on the colon (:) and use the 3rd element as the location.
Note: if your app ensures that there are no overlapping ranges and that there can be only one location per "number", you can get the same results with only one set.
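To make the two-set query concrete, here is a rough Python sketch that mimics it in memory. The lists start_set and end_set and the function find_locations are hypothetical stand-ins for the two Sorted Sets and the client-side intersection; no Redis client is involved.

```python
# Hypothetical in-memory stand-ins for the two Sorted Sets: (score, member) pairs.
start_set = [(1, "1:5:1"), (6, "6:9:1"), (10, "10:16:2")]
end_set = [(5, "1:5:1"), (9, "6:9:1"), (16, "10:16:2")]


def find_locations(number):
    # Equivalent of: ZRANGEBYSCORE StartNumber -inf number
    started = {member for score, member in start_set if score <= number}
    # Equivalent of: ZRANGEBYSCORE EndNumber number +inf
    not_ended = {member for score, member in end_set if score >= number}
    # Intersect, then take the 3rd colon-separated field as the location.
    return sorted(member.split(":")[2] for member in started & not_ended)


print(find_locations(7))
```

Both range queries can return many members before the intersection, so the work grows with the number of ranges stored.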
(this is the first time that I'm giving two answers to the same question - maybe I'll get a badge or sumthin' ;))
The double Sorted Set approach is a generalization and, as such, aims to solve a bigger set of problems than what the OP needs (as put in the comments to the first answer). That approach is also not efficient, as the query is O(logn)+O(N), so when N is large (e.g. 5M) it's probably not a good idea.
However, to satisfy the requirements and given that the ranges do not overlap, one could actually use only a single Sorted Set and a simpler query. The set's members should be formed by concatenating the EndNumber and LocationId, and their scores should be set to their respective StartNumber, so for the sake of the example:
ZADD ranges 1 "5:1" 6 "9:1" 10 "16:2"
Given #number, obtain the relevant LocationId with the following Redis Lua code (O(logn)):
-- rangelookup.lua
-- http://stackoverflow.com/questions/32185898/redis-get-member-where-score-is-between-min-and-max/32186675
-- A **non inclusive** range search on a Sorted Set with the following data:
-- score = <StartNumber>
-- member = <EndNumber>:<LocationId>
--
-- KEYS[1] - Sorted Set key name
-- ARGV[1] - the number to search
--
-- reply - the relevant id, nil if range doesn't exist
--
-- usage example: redis-cli --eval rangelookup.lua ranges , 7
local number = tonumber(ARGV[1])
local data = redis.call('ZREVRANGEBYSCORE', KEYS[1], number, '-inf', 'WITHSCORES', 'LIMIT', 0, 1)
local reply = nil
-- data is an empty table (not nil) when no score is <= number
if #data > 0 and number > tonumber(data[2]) then
  local to, id = data[1]:match( '(.*):(.*)' )
  if tonumber(to) > number then
    reply = id
  end
end
return reply
Sample output:
$ redis-cli --eval rangelookup.lua ranges , 7
"1"
$ redis-cli --eval rangelookup.lua ranges , 9
(nil)
$ redis-cli --eval rangelookup.lua ranges , 99
(nil)
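For readers who want to check the lookup logic without a Redis server, here is a small Python sketch of the same idea. The lists starts and members are a hypothetical in-memory copy of the ranges set, and bisect stands in for the ZREVRANGEBYSCORE ... LIMIT 0 1 call.

```python
import bisect

# Hypothetical in-memory copy of the "ranges" Sorted Set, ordered by score:
# score = StartNumber, member = "<EndNumber>:<LocationId>".
starts = [1, 6, 10]
members = ["5:1", "9:1", "16:2"]


def range_lookup(number):
    # Index of the greatest StartNumber <= number.
    i = bisect.bisect_right(starts, number) - 1
    if i < 0:
        return None  # number is below every range
    end, location_id = members[i].split(":")
    # Non-inclusive on both ends, as in the Lua script.
    if starts[i] < number < int(end):
        return location_id
    return None


for n in (7, 9, 99):
    print(n, range_lookup(n))
```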

Sequence conversion

Could you please help me to understand this problem:
Convert the input sequence of N (1 ≤ N ≤ 20) input numbers so that
the subsequences of the same numbers are replaced with the first
numbers of the subsequences. Each input number is in the range [1, 2
000 000 000].
For example, the input sequence 1 2 2 3 1 1 1 4 4 is converted into
1 2 3 1 4.
Input: First, the number T of test cases is given. Each test case is
specified using two lines. The first one contains the number N and the
second one contains the numbers of the sequence.
Output: The converted sequence. The result for each test case should
be printed in a separate line.
It looks like the idea is to remove duplicate numbers that occur adjacent to each other when creating the output.
You can do that by just keeping a state variable recording what the previous value was. When you get a new value, compare it to the state value. If it's the same, skip. If different, output it and update the state variable. Remember to initialize the state variable to a value not found in the input stream (e.g. -1 should work in this case).
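A minimal Python sketch of that state-variable approach follows. The function name collapse_runs is made up for illustration, and reading T and N from the input is left out.

```python
def collapse_runs(numbers):
    # Sentinel: inputs are in [1, 2 000 000 000], so -1 never occurs.
    previous = -1
    result = []
    for value in numbers:
        if value != previous:
            # New value: output it and remember it as the current state.
            result.append(value)
            previous = value
    return result


print(*collapse_runs([1, 2, 2, 3, 1, 1, 1, 4, 4]))
```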

How do I remove contiguous sequences of almost identical records from database

I have a SQL Server database containing real-time stock quotes.
There is a Quotes table containing what you would expect-- a sequence number, ticker symbol, time, price, bid, bid size, ask, ask size, etc.
The sequence number corresponds to a message that was received containing data for a set of ticker symbols being tracked. A new message (with a new, incrementing sequence number) is received whenever anything changes for any of the symbols being tracked. The message contains data for all symbols (even for those where nothing changed).
When the data was put into the database, a record was inserted for every symbol in each message, even for symbols where nothing changed since the prior message. So a lot of records contain redundant information (only the sequence number changed) and I want to remove these redundant records.
This is not the same as removing all but one record from the entire database for a combination of identical columns (already answered). Rather, I want to compress each contiguous block of identical records (identical except for sequence number) into a single record. When finished, there may be duplicate records but with differing records between them.
My approach was to find contiguous ranges of records (for a ticker symbol) where everything is the same except the sequence number.
In the following sample data I simplify things by showing only Sequence, Symbol, and Price. The compound primary key would be Sequence+Symbol (each symbol appears only once in a message). I want to remove records where Price is the same as the prior record (for a given ticker symbol). For ticker X it means I want to remove the range [1, 6], and for ticker Y I want to remove the ranges [1, 2], [4, 5] and [7, 7]:
Before:
Sequence Symbol Price
0 X $10
0 Y $ 5
1 X $10
1 Y $ 5
2 X $10
2 Y $ 5
3 X $10
3 Y $ 6
4 X $10
4 Y $ 6
5 X $10
5 Y $ 6
6 X $10
6 Y $ 5
7 X $11
7 Y $ 5
After:
Sequence Symbol Price
0 X $10
0 Y $ 5
3 Y $ 6
6 Y $ 5
7 X $11
Note that (Y, $5) appears twice but with (Y, $6) between.
The following generates the ranges I need. The left outer join ensures I select the first group of records (where there is no earlier record that is different), and the BETWEEN is intended to reduce the number of records that need to be searched to find the next-earlier different record (the results are the same without the BETWEEN, but slower). I would need only to add something like "DELETE FROM Quotes WHERE Sequence BETWEEN StartOfRange AND EndOfRange".
SELECT
    GroupsOfIdenticalRecords.Symbol,
    MIN(GroupsOfIdenticalRecords.Sequence)+1 AS StartOfRange,
    MAX(GroupsOfIdenticalRecords.Sequence) AS EndOfRange
FROM
    (
        SELECT
            Q1.Symbol,
            Q1.Sequence,
            MAX(Q2.Sequence) AS ClosestEarlierDifferentRecord
        FROM
            Quotes AS Q1
        LEFT OUTER JOIN
            Quotes AS Q2
        ON
            Q2.Sequence BETWEEN Q1.Sequence-100 AND Q1.Sequence-1
            AND Q2.Symbol=Q1.Symbol
            AND Q2.Price<>Q1.Price
        GROUP BY
            Q1.Sequence,
            Q1.Symbol
    ) AS GroupsOfIdenticalRecords
GROUP BY
    GroupsOfIdenticalRecords.Symbol,
    GroupsOfIdenticalRecords.ClosestEarlierDifferentRecord
The problem is that this is way too slow and runs out of memory (crashing SSMS- remarkably) for the 2+ million records in the database. Even if I change "-100" to "-2" it is still slow and runs out of memory. I expected the "ON" clause of the LEFT OUTER JOIN to limit the processing and memory usage (2 million iterations, processing about 100 records each, which should be tractable), but it seems like SQL Server may first be generating all combinations of the 2 instances of the table, Q1 and Q2 (about 4e12 combinations) before selecting based on the criteria specified in the ON clause.
If I run the query on a smaller subset of the data (for example, by using "(SELECT TOP 100000 FROM Quotes) AS Q1", and similar for Q2), it completes in a reasonable amount of time. I was trying to figure out how to automatically run this 20 or so times using "WHERE Sequence BETWEEN 0 AND 99999", then "...BETWEEN 100000 AND 199999", etc. (actually I would use overlapping ranges such as [0,99999], [99900, 199999], etc. to remove ranges that span boundaries).
The following generates sets of ranges to split the data into 100000 record blocks ([0,99999], [100000, 199999], etc). But how do I apply the above query repeatedly (once for each range)? I keep getting stuck because you can't group these using "BETWEEN" without applying an aggregate function. So instead of selecting blocks of records, I only know how to get MIN(), MAX(), etc. (single values), which does not work with the above query (as Q1 and Q2). Is there a way to do this? Is there a totally different (and better) approach to the problem?
SELECT
    CONVERT(INTEGER, Sequence / 100000)*100000 AS BlockStart,
    MIN(((1+CONVERT(INTEGER, Sequence / 100000))*100000)-1) AS BlockEnd
FROM
    Quotes
GROUP BY
    CONVERT(INTEGER, Sequence / 100000)*100000
You can do this with a nice little trick. The groups that you want can be defined as the difference between two sequences of numbers. One is assigned for each symbol in order by sequence. The other is assigned for each symbol and price. This is what it looks like for your data:
Sequence  Symbol  Price  seq1  seq2  diff
0         X       $10    1     1     0
0         Y       $ 5    1     1     0
1         X       $10    2     2     0
1         Y       $ 5    2     2     0
2         X       $10    3     3     0
2         Y       $ 5    3     3     0
3         X       $10    4     4     0
3         Y       $ 6    4     1     3
4         X       $10    5     5     0
4         Y       $ 6    5     2     3
5         X       $10    6     6     0
5         Y       $ 6    6     3     3
6         X       $10    7     7     0
6         Y       $ 5    7     4     3
7         X       $11    8     1     7
7         Y       $ 5    8     5     3
You can stare at this and figure out that the combination of symbol, diff, and price define each group.
The following puts this into a SQL query to return the data you want:
select min(q.sequence) as sequence, symbol, price
from (select q.*,
(row_number() over (partition by symbol order by sequence) -
row_number() over (partition by symbol, price order by sequence)
) as grp
from quotes q
) q
group by symbol, grp, price;
If you want to replace the data in the original table, I would suggest that you store the results of the query in a temporary table, truncate the original table, and then re-insert the values from the temporary table.
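For anyone who wants to see the two row numberings at work outside the database, here is a plain-Python sketch over the sample data from the question. It is not a replacement for the SQL; the names seq1, seq2 and first_in_group are illustrative only.

```python
from collections import defaultdict

# (Sequence, Symbol, Price) rows from the sample data in the question.
quotes = [
    (0, "X", 10), (0, "Y", 5), (1, "X", 10), (1, "Y", 5),
    (2, "X", 10), (2, "Y", 5), (3, "X", 10), (3, "Y", 6),
    (4, "X", 10), (4, "Y", 6), (5, "X", 10), (5, "Y", 6),
    (6, "X", 10), (6, "Y", 5), (7, "X", 11), (7, "Y", 5),
]

seq1 = defaultdict(int)  # running row number per Symbol
seq2 = defaultdict(int)  # running row number per (Symbol, Price)
first_in_group = {}      # (Symbol, grp, Price) -> smallest Sequence seen

# Sorting by Sequence keeps each symbol's rows in sequence order.
for sequence, symbol, price in sorted(quotes):
    seq1[symbol] += 1
    seq2[(symbol, price)] += 1
    grp = seq1[symbol] - seq2[(symbol, price)]
    key = (symbol, grp, price)
    first_in_group[key] = min(first_in_group.get(key, sequence), sequence)

# One row per contiguous block: its first Sequence, Symbol and Price.
kept = sorted((seq, sym, price) for (sym, _, price), seq in first_in_group.items())
for row in kept:
    print(row)
```

The rows in kept should correspond to the "After" table in the question.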
Answering my own question. I want to add some additional comments to complement the excellent answer by Gordon Linoff.
You're right. It is a nice little trick. I had to stare at it for a while to understand how it works. Here are my thoughts for the benefit of others.
The numbering by Sequence/Symbol (seq1) always increases, whereas the numbering by Symbol/Price (seq2) only increases sometimes (within each group, only when a record for Symbol contains the group's Price). Therefore seq1 either remains in lock step with seq2 (i.e., diff remains constant, until either Symbol or Price changes), or seq1 "runs away" from seq2 (while it is busy "counting" other Prices and other Symbols-- which increases the difference between seq1 and seq2 for a given Symbol and Price). Once seq2 falls behind, it can never "catch up" to seq1, so a given value of diff is never seen again once diff moves to the next larger value (for a given Price). By taking the minimum value within each Symbol/Price group, you get the first record in each contiguous block, which is exactly what I needed.
I don't use SQL a lot, so I wasn't familiar with the OVER clause. I just took it on faith that the first clause generates seq1 and the second generates seq2. I can kind of see how it works, but that's not the interesting part.
My data contained more than just Price. It was a simple thing to add the other fields (Bid, Ask, etc.) to the second OVER clause and the final GROUP BY:
row_number() over (partition by Symbol, Price, Bid, BidSize, Ask, AskSize, Change, Volume, DayLow, DayHigh, Time order by Sequence)
group by Symbol, grp, price, Bid, BidSize, Ask, AskSize, Change, Volume, DayLow, DayHigh, Time
Also, I was able to use >MIN(...) and <=MAX(...) to define ranges of records to delete.

Looping through variables in spss

I'm looking for a way to loop through variables (e.g. week01 to week52) and count the number of times the value changes across them. For example:
week01 to week18 may be coded as 1
week19 to week40 may be coded as 4
and week41 to week52 may be coded as 3
That would be 2 transitions within the data.
How could I go about writing code that finds this information? I'm rather new to this, and some help to point me in the right direction would be much appreciated.
You can use the DO REPEAT command to loop through variable lists. Below is an example that uses this command to pair each week with the week before it and increment a count variable whenever the two differ.
data list fixed / observation (A1).
begin data
1
2
3
4
5
end data.
*making random data.
vector week(52).
do repeat week = week1 to week52.
compute week = RND(RV.UNIFORM(0.5,4.4)).
end repeat.
execute.
*initialize count to zero.
compute count = 0.
do repeat week_after = week2 to week52 / week_before = week1 to week51.
if week_after <> week_before count = count + 1.
end repeat.
execute.
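The same before/after comparison can be sketched in Python for a single case, which may help when checking the SPSS result by hand. The function name count_transitions is hypothetical, and the list weeks reproduces the example from the question.

```python
def count_transitions(values):
    # Compare each value with the one before it, like the two DO REPEAT lists.
    return sum(1 for before, after in zip(values, values[1:]) if after != before)


# Weeks 1-18 coded 1, weeks 19-40 coded 4, weeks 41-52 coded 3.
weeks = [1] * 18 + [4] * 22 + [3] * 12
print(count_transitions(weeks))  # prints 2
```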