What's the limit of the Google Transliteration API?

I've used the Google Transliteration API experimentally. It works fine, but I've noticed that it only accepts five words at a time. Is there any way to send more words, and is there a daily limit? If I have 100 words, will I have to send them in sets of five and then join the results?

100k characters per day for version 2.
The developer console allows you to apply for higher limits (which may cost money depending on your needs): https://code.google.com/apis/console/
It also looks like there is a method for making more than just individual words transliteratable: https://developers.google.com/transliterate/v1/getting_started#makeTransliteratable

Related

What is the most performant way to structure my multi-language SQLite database?

I am building a mobile app to search and manage a collection of playing cards (approximately 10,000). Each card has characteristics which are shared among all languages and four language-specific strings: title, effect1, effect2 and character. Only a limited number of cards have non-NULL effect2 and character (less than 10%). Some cards may be available in some languages but not in others.
I am using SQLite to store the card database locally and use sqflite to access it (my app is made with Flutter, but I don't think it makes any difference for my question).
My initial idea was to store all the language-agnostic information in a table cards and all the language-specific info in a table card_texts with one row per pair (card_id,language).
However, dealing with the result of this query is annoying because it contains multiple rows per card (one per language), which I would have to reduce manually in client code.
The reason I don't simply make one table per language and call it a day is that, because some cards exist in some languages and not in others, I want to be able to fall back to another language if needed. For instance, I might list the cards in French, but if a card was not translated into French, get it in English, and if it was not translated into English either, get it in Japanese.
My second idea was to have all languages in a single row per card id in card_texts so that the result of the JOIN produces only one row per card. To do this I put 33 columns in it (card_id + 4 TEXT columns per language). This was more convenient because I could get all the languages for each card and move the fallback logic to client code, but I'm not sure it is very efficient to have a table that wide when most of the time I only need one language.
My third idea is to create one table per language, each with 5 columns (card_id + 4 TEXT columns). However, I then either have to JOIN all of these tables or perform multiple queries if I want to fall back to another language.
Here are the actions I want to be able to perform as efficiently as possible (especially because this is going to run on a mobile phone):
Get one card's info + text in my default language + fall back to other languages if the card was not translated, in a given order (e.g. try French, then English, then Japanese)
Do the same thing for multiple cards (one query to get all cards matching certain criteria)
Be able to search cards in all languages given some string
Ideally, be able to join this with other tables which might have multiple rows for one card (and so I would like to avoid having my number of rows per card multiplied as much as possible)
Given my performance and size constraints (I want this to run on low-end mobile phones, with the database possibly stored on a slow SD card), is it better to have a higher number of columns, to perform more JOINs or to have to perform multiple requests? Is there another solution I haven't thought of?
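For reference, here is a minimal sketch of the first idea (a cards table plus a card_texts table keyed by (card_id, language)) with the language fallback pushed into SQL, so each card still comes back as a single row. The column names follow the question; the sample data and the set_code column are made up, and the window-function query assumes a SQLite version with window functions (3.25+); on older builds the same fallback can be expressed with a language-priority join instead.

```python
import sqlite3

# Sketch only: table and column names follow the question (title, effect1,
# effect2, character); everything else (set_code, sample rows) is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cards (
    card_id  INTEGER PRIMARY KEY,
    set_code TEXT                      -- stand-in for the language-agnostic columns
);
CREATE TABLE card_texts (
    card_id   INTEGER NOT NULL REFERENCES cards(card_id),
    language  TEXT    NOT NULL,        -- e.g. 'fr', 'en', 'ja'
    title     TEXT    NOT NULL,
    effect1   TEXT,
    effect2   TEXT,
    character TEXT,
    PRIMARY KEY (card_id, language)
);
""")

conn.executemany("INSERT INTO cards VALUES (?, ?)", [(1, "A"), (2, "A")])
conn.executemany("INSERT INTO card_texts VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "fr", "Titre 1", "Effet", None, None),
    (1, "en", "Title 1", "Effect", None, None),
    (2, "ja", "タイトル 2", "効果", None, None),   # card 2 only exists in Japanese
])

# One row per card, picking the best available language in the order fr > en > ja.
query = """
SELECT c.card_id, t.language, t.title, t.effect1, t.effect2, t.character
FROM cards AS c
JOIN (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY card_id
               ORDER BY CASE language WHEN 'fr' THEN 0 WHEN 'en' THEN 1 WHEN 'ja' THEN 2 ELSE 3 END
           ) AS pref
    FROM card_texts
) AS t ON t.card_id = c.card_id AND t.pref = 1;
"""
for row in conn.execute(query):
    print(row)   # (1, 'fr', ...), (2, 'ja', ...)
```

The same shape works for multi-card queries and for joining against other per-card tables, since the subquery never yields more than one row per card.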

OptaPlanner example for Capacitated Vehicle Routing with Time Windows?

I am new to OptaPlanner.
I want to build a solution where I have a number of locations to deliver items to from one single location, and I also want to use OpenStreetMap distance data for calculating the distances.
Initially I used jsprit, but for more than 300 deliveries it takes more than 8 minutes with 20 threads. That's why I am trying OptaPlanner.
I want to map 1000 deliveries within 1 minute.
Does anyone know of any reference code or material which I can start from?
Thanks in advance :)
CVRPTW is a standard example: just open the examples app, choose Vehicle Routing, and import one of the Belgium datasets with time windows. The code is in the zip too.
To scale to 1,000 deliveries and especially beyond, you'll want to use "Nearby selection" (see the reference manual); it isn't on by default, but it makes a huge difference.

Listing BigQuery Tables in Huge Datasets (30K-40K+ tables)

The task is to programmatically list all the tables in a given dataset containing more than 30-40K tables.
The initial option we explored was the tables.list API (as we do all the time for normal datasets with a reasonable number of tables in them).
It looks like this API returns at most 1000 entries per call (even if we set maxResults to a bigger value).
To get the next 1000 we need to wait for the response of the previous request, extract its pageToken, repeat the call, and so on.
For datasets with 30K-40K+ tables this can take 10-15 seconds or more (under good conditions).
So the timing is the problem we want to address!
In the above-mentioned calls we get back only nextPageToken and tables/tableReference/tableId, so the size of the response is extremely small!
Question:
Is there a way to somehow increase maxResults so as to get all tables in one (or very few) call(s), assuming that would be much faster than doing 30-40 calls?
The workaround we have tried so far is to query __TABLES_SUMMARY__ via the jobs.insert or jobs.query API.
This way the whole result is returned within seconds, but in our particular case using the BigQuery jobs API is not an option for multiple reasons; we want to be able to use the list API.
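For context, a minimal sketch of the sequential tables.list paging loop described above, using the Python discovery client. The project and dataset IDs are placeholders, and credentials are assumed to come from application-default auth; this is essentially the loop whose latency the question wants to avoid, since each page is capped and the next request needs the previous response's nextPageToken.

```python
from googleapiclient.discovery import build

# Placeholders: substitute your own project/dataset.
PROJECT_ID = "my-project"
DATASET_ID = "my_dataset"

service = build("bigquery", "v2")  # uses application-default credentials

def list_all_table_ids(project_id, dataset_id):
    """Sequential tables.list paging: each page is capped (~1000 entries) and the
    next request needs the previous response's nextPageToken."""
    table_ids, page_token = [], None
    while True:
        resp = service.tables().list(
            projectId=project_id,
            datasetId=dataset_id,
            maxResults=1000,          # values above the cap are ignored
            pageToken=page_token,
        ).execute()
        table_ids.extend(t["tableReference"]["tableId"] for t in resp.get("tables", []))
        page_token = resp.get("nextPageToken")
        if not page_token:            # 30-40K tables => 30-40 round trips
            return table_ids

print(len(list_all_table_ids(PROJECT_ID, DATASET_ID)))
```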

Getting all Twitter Follows (ids) with Groovy?

I was reading an article here and it looks like he is grabbing the IDs 100 at a time. I thought it was possible to grab 5000 at a time?
The reason I'm asking is that some profiles have a much larger number of followers, and you wouldn't have enough API calls to fetch them all within one hour if you grabbed them 100 at a time.
So is it possible to grab 5000 IDs at a time, and if so, how would I do this?
GET statuses/followers, as shown in that article, has been deprecated, but it did used to return batches of 100.
If you're trying to get follower IDs, use GET followers/ids instead. It returns batches of up to 5000 and should only require you to change the URL slightly (see the example URL at the bottom of the documentation page).
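A minimal sketch of the cursor loop against GET followers/ids with count=5000. The question asks about Groovy, but the request shape is the same in any language; this Python version assumes OAuth 1.0a user credentials (the key names below are placeholders) and ignores rate limiting.

```python
import requests
from requests_oauthlib import OAuth1

# Placeholder credentials: supply your own app and user tokens.
auth = OAuth1("CONSUMER_KEY", "CONSUMER_SECRET", "ACCESS_TOKEN", "ACCESS_SECRET")

def follower_ids(screen_name):
    """Walk GET followers/ids with the cursor, up to 5000 ids per page."""
    ids, cursor = [], -1
    while cursor != 0:                      # cursor 0 means no more pages
        resp = requests.get(
            "https://api.twitter.com/1.1/followers/ids.json",
            params={"screen_name": screen_name, "count": 5000, "cursor": cursor},
            auth=auth,
        )
        resp.raise_for_status()
        data = resp.json()
        ids.extend(data["ids"])
        cursor = data["next_cursor"]
    return ids

print(len(follower_ids("some_user")))       # "some_user" is a placeholder
```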

Suggestions/Opinions for implementing a fast and efficient way to search a list of items in a very large dataset

Please comment and critique the approach.
Scenario: I have a large dataset (200 million entries) in a flat file. The data is of the form: a 10-digit phone number followed by 5-6 binary fields.
Every week I will be getting delta files which only contain changes to the data.
Problem: Given a list of items, I need to figure out whether each item (a 10-digit number) is present in the dataset.
The approach I have planned:
1. Parse the dataset and put it in a DB like MySQL or Postgres (to be done at the start of the week). The reason I want an RDBMS in the first step is that I want to keep the full time-series data.
2. Generate some kind of key-value store out of this database containing the latest valid data, one that supports checking whether each item is present in the dataset (thinking of some kind of NoSQL DB, like Redis, optimised for lookups; it should have persistence and be distributed). This data structure will be read-only.
3. Query this key-value store to find out whether each item is present (if possible, matching a list of values all at once instead of one item at a time). I want this to be blazing fast, as this functionality will be the back-end of a REST API (see the sketch after the sidenote below).
Sidenote: my language of preference is Python.
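As a minimal sketch of step 3 in Python: assuming the latest snapshot of numbers has been loaded into a single Redis set (the key name phones:active and the sample numbers are made up), a pipeline checks a whole batch of numbers in one round trip.

```python
import redis

r = redis.Redis(host="localhost", port=6379, db=0)

# Weekly load step (in practice fed from the RDBMS extract); numbers are made up.
r.sadd("phones:active", "5551234567", "5559876543")

def check_batch(numbers):
    """Return {number: True/False} for a whole batch in one round trip."""
    pipe = r.pipeline(transaction=False)
    for n in numbers:
        pipe.sismember("phones:active", n)
    return dict(zip(numbers, pipe.execute()))

print(check_batch(["5551234567", "5550000000"]))
```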
A few considerations for the fast lookup:
If you want to check a set of numbers at a time, you could use Redis SINTER, which performs set intersection (see the sketch after this list).
You might also benefit from a grid structure, distributing number ranges across nodes via some hash function such as the first digit of the phone number (there are probably better ones; you'll have to experiment). With a good hash this would reduce the size per node to around 20 million entries when using 10 nodes.
If you expect duplicate requests, which is quite likely, you could cache the last n requested phone numbers in a smaller set and query that one first.
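A sketch of the SINTER variant mentioned above: load the queried numbers into a temporary set, intersect it with the main set, and clean up. Key names are illustrative, and the main set is assumed to be the same one used for the per-item lookups.

```python
import redis

r = redis.Redis(host="localhost", port=6379, db=0)

def present_numbers(numbers):
    """Return the subset of `numbers` present in the main set, via SINTER."""
    tmp_key = "phones:query:tmp"            # illustrative temporary key
    pipe = r.pipeline(transaction=False)
    pipe.delete(tmp_key)
    pipe.sadd(tmp_key, *numbers)
    pipe.sinter("phones:active", tmp_key)   # intersection = numbers that exist
    pipe.delete(tmp_key)
    _, _, hits, _ = pipe.execute()
    return {h.decode() for h in hits}

print(present_numbers(["5551234567", "5550000000"]))
```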