Is there a suggested format or approach in TraMineR for sequences with 300+ events in length?

I am dealing with time-stamped event sequences that are over 300 events long. The data is similar to web logs: users hit different pages of a website at different times. One sequence might be one web session, and each event is a user action (visit a page, click a button, etc.).
I first used the TSE format. When I tried to find subsequences with seqefsub(), TraMineR hung. Setting maxK = 5 made it work (this limits the searched subsequences to 5 events), but maxK = 6 or higher hangs again, and I am not sure why there is such a sudden drop-off. When I pruned the event sequences to only 15 events in length, everything completed fine, so sequence length is clearly an issue here.
Is there a different format that is more robust to sequence length, e.g. STS? Are there any other recommendations for dealing with sequences of this length in TraMineR?

The problem has nothing to do with the format used to enter the sequences.
TraMineR has only a rudimentary algorithm for searching subsequences.
I would suggest you look at more appropriate tools for your problem. Consider for instance the R package arulesSequences.


VueJS + Firestore - Autosaving votes

I have a site similar to Stack Overflow where users can upvote and downvote posts. My question is: how can I efficiently save this information to Firestore under a Vue.js / Firestore / Vuex architecture? How do other sites like Stack Overflow or Reddit handle this?
I don't want to save directly on click, because Firestore charges per write and it would be very easy for a user to toggle up/down votes many times in a row. I also have a number of Cloud Functions this would trigger, each of which takes time to complete.
I see a few alternatives. I could start a 2-second timer once a vote is cast, and commit it to Firestore only if it doesn't change in that time. This has the downside of not working when the user leaves the page too quickly, and after testing a few bigger sites, this approach seems lower-quality than what they do.
Another alternative is to save on page refresh/exit with Vuex, although the solutions I have found for this seem to rely on the window object, which I hear may not work well if/when I add SSR to my app.
Perhaps there is a built-in solution for this?
A common way to protect your back end against malicious users is to add a rate limit on certain events; in your case, the up/down vote events.
Your first approach
This comes very close to the way rate limiting works. But instead of firing the event to Firestore 2 seconds after the user clicked (and nothing has changed), you could also fire the event immediately and then block any further vote events for 2 seconds. That way the user can leave the page within those 2 seconds and the up/down vote will still be persisted in your database.
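A minimal sketch of that "fire immediately, then block" idea, written against the modular Firestore JavaScript SDK; the collection layout and the castVote helper are made-up names for illustration, not something from your code:

// Persist the first vote immediately; suppress further writes for 2 seconds.
// Collection paths here are illustrative only.
import { doc, setDoc } from "firebase/firestore";

const VOTE_COOLDOWN_MS = 2000;
const lastVoteAt = new Map(); // postId -> timestamp of last persisted vote

async function castVote(db, userId, postId, value /* +1 or -1 */) {
  const now = Date.now();
  if (now - (lastVoteAt.get(postId) || 0) < VOTE_COOLDOWN_MS) {
    return; // inside the cooldown window: update local UI state only, skip the write
  }
  lastVoteAt.set(postId, now);
  // Write right away, so the vote survives even if the user leaves the page.
  await setDoc(doc(db, "posts", postId, "votes", userId), { value, at: now });
}

A production version would probably also flush the last suppressed vote once the window expires, so a quick up-then-down toggle still ends in the right state.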
Your second approach
This is actually something I've never tried, but the potential problems you already mentioned around SSR, plus the complexity of saving to a temporary local state, make me think this approach might not be worth the work.
Example solution
I copied this from a GitHub issue comment; see the jsfiddle as an example:
There are solutions you can do for per-user rate limiting using Rules
link to fiddle http://jsfiddle.net/firebase/VBmA
We've discussed exposing this information, but at present we don't expose quota, rate limiting, or billing metrics. There are likely better ways of configuring these things than Rules (which are primarily for authN/Z, resource definition, type validation). The worry about exposing them is having a proliferation of different config files (firestore.quota, firestore.billing, firestore.rules, etc.) that confuse users unnecessarily.
Issue: https://github.com/firebase/firebase-js-sdk/issues/647#issuecomment-380303400

Call a Web Service to get search results in Titanium

I have implemented a tableView with a searchBar added to it. I want to call a web service when the user starts typing a keyword into the search bar. I know I can do this in the search bar's change event listener.
I also know that calling the service on every change in the search bar is not a good idea. So what is an efficient approach to using a search bar when the results come from a service call, and what can be done to make the search efficient?
For example: the search functionality in Apple's App Store.
I did something like this for one of my test projects. In my change event I would check that at least 3 characters had been entered before attempting a look-up. I have no idea why I went with 3, but it seemed like a decent number of characters for filtering my data. I would also set a flag when a network request was in progress. So if the user entered 3 characters and no look-up was already in progress, you could kick off the search. If a request was in progress, you could set up a wait interval to keep checking whether it has come back, and kick off an additional request when it has. I would send back short lists of items, 25 in my case, so that my table appeared fast.
Though I didn't do this, you could track the time between typed characters to make sure the user is finished typing. For the best interval you will need to experiment with what is reasonable for an average user; get some feedback from typical non-power users on this.
I can see a potential issue where you are in the middle of a look-up but the user is still typing. You might need to track those character updates and kick off an additional search for the updated string. You might even compare the search string you sent with the current contents of the input box, abandon the list of look-up items you already received, and just do another search.
You might want to show the list of items you did receive just so the user knows the app is working, but immediately send another request for look-up items automatically. A user might eventually start hammering keys and think the app is unresponsive if you don't show something in the table once in a while.
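Roughly, the minimum-length check, the in-progress flag, and the "search again with the newer text" idea could look like this in Titanium; the endpoint URL, the response shape, and the updateTable helper are hypothetical:

var MIN_CHARS = 3;
var inFlight = false;
var pendingQuery = null;

var searchBar = Ti.UI.createSearchBar();
searchBar.addEventListener('change', function (e) {
  var query = e.value;
  if (query.length < MIN_CHARS) { return; } // too few characters to bother the server
  if (inFlight) {
    pendingQuery = query; // remember the newest text; search it when the current request returns
    return;
  }
  runSearch(query);
});

function runSearch(query) {
  inFlight = true;
  var client = Ti.Network.createHTTPClient({
    onload: function () {
      updateTable(JSON.parse(this.responseText)); // hypothetical helper that fills the tableView
      inFlight = false;
      if (pendingQuery && pendingQuery !== query) {
        var next = pendingQuery;
        pendingQuery = null;
        runSearch(next); // the text changed while we were waiting, so search again
      }
    },
    onerror: function () { inFlight = false; },
    timeout: 5000
  });
  client.open('GET', 'https://example.com/search?q=' + encodeURIComponent(query) + '&limit=25');
  client.send();
}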

How to avoid key-loggers when authenticating access

As per the title really, just what can be done to defeat key/keystroke logging when authenticating access?
I have just posted a related question (how-to-store-and-verify-digits-chosen-at-random-from-a-pin-password) asking for advice for choosing random digits from a PIN/password. What other reasonably unobtrusive methods might there be?
Any and all solutions appreciated.
One solution to defeat keyloggers is to not care if they capture what you type.
One time passwords (search: "OTP") are one solution. Smartcard authentication is another.
A hardware-based keylogger will not be fooled by any solution that requires the use of a keyboard, so to bypass those you would need to take input through the mouse only. Software-based keyloggers, however, can be stopped by adding a keyboard hook in your own code which captures the keys and does not call the next hook procedure in the hook list. But keyboard hooks tend to trigger antivirus software if used incorrectly, and will cause bugs if you use them in a dynamic library with the wrong parameter.
Basically, a keylogger will use a keyhook to capture keystrokes. By adding your own keyhook on top of the malware's keyhook, you disable the keylogger. However, there are keyloggers that hide deeper in the kernel, so you would soon end up facing a keylogger that bypasses your security again.
Don't focus too much on the danger of keyloggers, though. They are just one of many methods hackers use to get all kinds of account information. Worse, there is no way to protect your users from social-engineering tricks. The easiest way for hackers to get account information is simply to ask their victims for it: through fake sites, fake applications, and all kinds of other tricks they can collect any information you are trying to protect by blocking keyloggers. Keyloggers just aren't the biggest danger.
One suggestion was to use pictures of cute kittens (or puppies) for the user to click on. What you could do is use a set of 10 pictures and let the user pick four of them as their "pincode". Then, whenever the user needs to enter their code, display the pictures in a random order, so a hacker has no use for their locations. If it's a web application, also give the pictures random names, and just let the server know which is which. To make it even more complex, you could create 10 sets of 10 pictures, where every picture in a set displays the same object but from a slightly different perspective, a different angle, or in a different color. Set 1 would be a chair, set 2 a table, set 3 a kitten, set 4 a puppy, etc. The user then just needs to remember: table, kitten, chair, puppy. (Or puppy, chair, chair, table. Or kitten, puppy, puppy, puppy...)
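As a sketch of the shuffling part (server-side JavaScript; the names are illustrative): shuffle the pictures and hand the client random per-session tokens, keeping the token-to-picture mapping on the server.

const crypto = require('crypto');

function shuffledPictures(pictureIds) { // e.g. [1, 2, ..., 10]
  const shuffled = pictureIds.slice();
  for (let i = shuffled.length - 1; i > 0; i--) { // Fisher-Yates shuffle
    const j = crypto.randomInt(i + 1);
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  // Give each picture a random one-time name; store this mapping in the session
  // so only the server knows which token is which picture.
  return shuffled.map(id => ({
    token: crypto.randomBytes(8).toString('hex'), // name sent to the client
    pictureId: id                                 // real identity, kept server-side
  }));
}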
You could have a clickable image with the letters on it. Your users will be pretty mad though...
You can restrict password entry to an on-screen keyboard only.
Or you could write a module (in Flash, for example) that recognizes handwritten passwords, drawn with the mouse or a stylus.
The only real way is proper second-factor authentication: either something the person is (fingerprint, iris scan), or something they have (a one-time password list/generator, a crypto token generator).
Assuming that only keyboard input, and not mouse input, is captured, you could type the password out of order, moving the cursor with the mouse.
I really like the one-time approach better, though.
How about a variation on a standard password? For example, you could have a list of words and have the program leave out random letters from each word. In addition, it would leave out one word from the list entirely, which the user would have to remember and type out.
If the words form a sentence, it would be easier for users to remember, but on the other hand, creating the sentence would be more difficult, because you'd need to use words that can't be guessed from the sentence's context.
Another variation could be to have the program randomly ask the user to replace all letters i with 1, or a with 4, or to place, say, the letter R after every third letter A, or something similar.
Basically, the password would be modified at random, with instructions displayed to the user on how to modify it.
Now that I think of it, I'm not sure how unobtrusive my ideas are...
The online banking portal of my bank has a nice approach that I find very unobtrusive. When creating the account, you define a 6-digit PIN (in addition to a normal password). After entering your password, you're asked for 2 digits of the 6-digit PIN at 2 random positions. For example, if your PIN is 654321 and it asks you for digits 2 and 5, you'll click on 5 and 2 (it has a numpad with digits to click on). Even if you entered the digits with your keyboard, it would still be reasonably safe, because the attacker won't know which digits you were asked for (unless he captures the screen as well, perhaps via TEMPEST).
So, short answer: ask only for some parts of the password/PIN, in random order. Having the user use the mouse increases security.
One more idea: if you have a PIN (numeric password), ask the user for modifications of certain digits, e.g. "2nd digit plus 3, 4th digit minus 1".
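A small sketch of the partial-PIN challenge (Node-style JavaScript; in a real system the PIN would be stored hashed or in secure hardware, not compared in plain text as here):

const crypto = require('crypto');

function makeChallenge(pinLength) {
  const a = crypto.randomInt(pinLength);
  let b = crypto.randomInt(pinLength);
  while (b === a) { b = crypto.randomInt(pinLength); } // two distinct positions
  return [a, b].sort((x, y) => x - y); // 0-based, e.g. [1, 4] means "digits 2 and 5"
}

function verify(pin, positions, answer) {
  // For PIN "654321" and challenge [1, 4], the expected answer is "52".
  return positions.every((pos, i) => pin[pos] === answer[i]);
}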

Prevent brute-forcing of captcha

My site uses a 6-digit captcha. If an attacker tries all combinations, chances are he will successfully submit the form a fraction of the time (1 in a million per attempt in theory; considerably better odds in practice, since the random number generator I use is not truly random).
Is there any way I can further prevent him from succeeding? One way is to block form submission for 5 minutes after a certain number of tries (e.g. 20). The problem is that if I store the number of tries in the session, and the attacker creates a new session for every try (natural, since he uses a program rather than a browser), then it won't work. And I don't want to modify the existing DB schema to accommodate this logic.
Another way is to increase the number of captcha characters, which inconveniences users.
All advice is welcome.
Regenerate a new number after each attempt, or after x attempts =D
I would recommend adding letters. That will make brute force much harder than adding more digits would.
EDIT: You can also slow down the responses after a few incorrect attempts, adding, for example, a 5-minute delay.
Check the IP addresses of incoming connections. If the same IP address tries too many times, rate limit it harshly, and if it continues for a long time, block it completely.
Of course it's not a perfect solution, but it will make things more difficult.
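An in-memory sketch of that per-IP throttling (Node-style JavaScript; names are illustrative). A real deployment would keep the counters in a shared store such as Redis so they survive restarts and work across servers:

const attempts = new Map(); // ip -> { count, lockedUntil }
const MAX_TRIES = 20;
const LOCK_MS = 5 * 60 * 1000; // 5-minute lockout

function allowCaptchaAttempt(ip) {
  const now = Date.now();
  const entry = attempts.get(ip) || { count: 0, lockedUntil: 0 };
  if (now < entry.lockedUntil) { return false; } // still locked out
  entry.count += 1;
  if (entry.count > MAX_TRIES) {
    entry.lockedUntil = now + LOCK_MS; // harsh rate limit after repeated tries
    entry.count = 0;
  }
  attempts.set(ip, entry);
  return now >= entry.lockedUntil;
}

Because the limit is keyed on the IP address rather than the session, an attacker creating a fresh session for every try gains nothing, and no DB schema change is needed.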

How to skip known entries when syncing with Google Reader?

For writing an offline client for the Google Reader service, I would like to know how best to sync with the service.
There doesn't seem to be official documentation yet and the best source I found so far is this: http://code.google.com/p/pyrfeed/wiki/GoogleReaderAPI
Now consider this: with the information from above I can download all unread items, I can specify how many items to download, and using the atom id I can detect duplicate entries that I have already downloaded.
What's missing for me is a way to specify that I just want the updates since my last sync.
I can ask for the 10 latest entries (parameters n=10 and r=d). If I specify the parameter r=o (date ascending), I can also specify the parameter ot=[last time of sync], but only then, and ascending order makes no sense when I only want to read some items rather than all of them.
Any idea how to solve this without downloading all items again and just rejecting duplicates? That is not a very economical way of polling.
Someone proposed that I could request only the unread entries. But for that to work in such a way that Google Reader does not offer these entries again, I would need to mark them as read. In turn, that would mean I need to keep my own read/unread state on the client, and that the entries would already be marked as read when the user logs on to the online version of Google Reader. That doesn't work for me.
Cheers,
Mariano
To get the latest entries, use the standard from-newest-date-descending download, which will start from the latest entries. You will receive a "continuation" token in the XML result, looking something like this:
<gr:continuation>CArhxxjRmNsC</gr:continuation>
Scan through the results, pulling out anything new to you. You should find that either all results are new, or everything up to a point is new, and all after that are already known to you.
In the latter case, you're done, but in the former you need to find the new stuff that's older than what you've already retrieved. Do this by using the continuation to get the results starting just after the last result in the set you retrieved, passing it in the GET request as the c parameter, e.g.:
http://www.google.com/reader/atom/user/-/state/com.google/reading-list?c=CArhxxjRmNsC
Continue this way until you have everything.
The n parameter, a count of the number of items to retrieve, works well with this, and you can change it as you go. If the frequency of checking is user-set, and thus could be very frequent or very rare, you can use an adaptive algorithm to reduce network traffic and your processing load. Initially request a small number of the latest entries, say five (add n=5 to the URL of your GET request). If all of them are new, ask for a larger number in the next request (where you use the continuation), say 20. If those are still all new, either the feed has a lot of updates or it's been a while, so continue in groups of 100 or whatever.
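Putting the continuation token and the adaptive batch size together, the polling loop might look like this (JavaScript; parseAtom is a hypothetical Atom parser, and authentication headers are omitted):

const BASE = 'http://www.google.com/reader/atom/user/-/state/com.google/reading-list';

async function fetchNewItems(isKnown) { // isKnown: (atomId) => boolean
  let n = 5;              // start small; grow while everything keeps being new
  let continuation = null;
  const fresh = [];
  while (true) {
    const url = BASE + '?n=' + n + (continuation ? '&c=' + continuation : '');
    const xml = await (await fetch(url)).text();
    const { items, nextContinuation } = parseAtom(xml); // hypothetical parser
    const newOnes = items.filter(item => !isKnown(item.id));
    fresh.push(...newOnes);
    // Stop once we hit items we already know, or run out of results.
    if (newOnes.length < items.length || !nextContinuation) { break; }
    continuation = nextContinuation;
    n = Math.min(n * 4, 100); // roughly the 5 -> 20 -> 100 progression above
  }
  return fresh;
}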
However, and correct me if I'm wrong here, you also want to know, after you've downloaded an item, whether its state changes from "unread" to "read" due to the person reading it using the Google Reader interface.
One approach to this would be (sketched in code below):
1. Update the status on Google of any items that have been read locally.
2. Check and save the unread count for the feed. (You want to do this before the next step, so that you guarantee that new items have not arrived between your download of the newest items and the time you check the read count.)
3. Download the latest items.
4. Calculate your read count, and compare it to Google's. If the feed has a higher read count than you calculated, you know that something's been read on Google.
5. If something has been read on Google, start downloading read items and comparing them with your database of unread items. You'll find some items that Google says are read but your database claims are unread; update these. Continue doing so until you've found a number of such items equal to the difference between your read count and Google's, or until the downloads become unreasonable.
6. If you didn't find all of the read items, c'est la vie; record the number remaining as an "unfound unread" total, which you also need to include in your next calculation of the local number you think are unread.
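A condensed sketch of those steps (JavaScript; local stands for your client-side database and every api helper is hypothetical):

async function reconcileReadState(local, api) {
  await api.pushReadMarks(local.itemsReadLocally());            // step 1
  const remoteUnread = await api.fetchUnreadCount();            // step 2
  local.store(await api.fetchLatest());                         // step 3
  // Step 4: a higher read count on Google means a lower unread count there.
  let missing = local.unreadCount() + local.unfoundUnread - remoteUnread;
  if (missing > 0) {                                            // step 5
    for await (const item of api.fetchReadItems()) {            // read items, newest first
      if (local.isUnread(item.id)) {
        local.markRead(item.id);
        if (--missing === 0) { break; }
      }
      // A real client would also cap how far back this loop goes.
    }
  }
  local.unfoundUnread = Math.max(missing, 0);                   // step 6
}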
If the user subscribes to a lot of different blogs, it's also likely he labels them extensively, so you can do this whole thing on a per-label basis rather than for the entire feed. That should help keep the amount of data down, since you won't need to do any transfers for labels where the user didn't read anything new on Google Reader.
This whole scheme can be applied to other statuses, such as starred or unstarred, as well.
Now, as you say, this
...would mean that I need to keep my own read/unread state on the client and that the entries are already marked as read when the user logs on to the online version of Google Reader. That doesn't work for me.
True enough. Neither keeping a local read/unread state (since you're keeping a database of all of the items anyway) nor marking items read in Google (which the API supports) seems very difficult, so why doesn't this work for you?
There is one further hitch, however: the user may mark something he has already read as unread on Google. This throws a bit of a wrench into the system. My suggestion, if you really want to handle this, is to assume that the user will in general touch only more recent items, and to download the latest couple hundred or so every time, checking the status of all of them. (This isn't all that bad: downloading 100 items took me anywhere from 0.3 s for 300 KB to 2.5 s for 2.5 MB, albeit on a very fast broadband connection.)
Again, if the user has a large number of subscriptions, he probably also has a reasonably large number of labels, so doing this on a per-label basis will speed things up. I'd suggest, actually, that you not only check on a per-label basis, but that you also spread the checks out, checking a single label each minute rather than everything once every twenty minutes. You can also run this "big check" for status changes on older items less often than the "new stuff" check, perhaps once every few hours, if you want to keep bandwidth down.
This is a bit of a bandwidth hog, mainly because you need to download the full article from Google merely to check its status. Unfortunately, I can't see any way around that in the API docs we have available. My only real advice is to minimize checking the status of non-new items.
The official Google API hasn't been released yet; when it is, this answer may change.
Currently, you would have to call the API and disregard items already downloaded, which, as you said, isn't terribly efficient, since you will be re-downloading items every time even if you already have them.