Creating a Price tracker system [closed] - large-data

I was recently asked the following question in an interview.
"How would you design a system to keep track of a million items at xyz.com ?
The xyz.com could update the prices maybe 2-3 times a day or once per month, so no guarentee on frequency.
Your system should show accurate prices for >95% of items at any given point of time and aim for 99%.
Also scale for 1billion items etc..
"
I answered along the lines of building a distributed system that would categorize items by priority (based on historical price fluctuations, the 80/20 rule, etc.) and make API calls more frequently for the high-priority items.
But I was not allowed to use API calls.
I suggested scraping the HTML content instead (but the website could block my IP under such a high load).
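For concreteness, here is a minimal sketch of the prioritization idea I described (all names are hypothetical, and the volatility thresholds are made up; whatever scraping or ingestion mechanism is actually allowed would sit behind the scheduler):

```python
import heapq
import time

# Hypothetical sketch: items with historically volatile prices get
# refreshed more often than stable ones. The actual fetch mechanism
# (scraper, feed, etc.) would drain the heap this produces.

REFRESH_INTERVALS = {
    "hot": 60 * 60,        # volatile items: refresh hourly
    "warm": 6 * 60 * 60,   # moderately volatile: every 6 hours
    "cold": 24 * 60 * 60,  # stable items: daily
}

def tier_for(volatility: float) -> str:
    """Bucket an item by its historical price volatility (80/20 idea).
    Thresholds here are illustrative, not tuned."""
    if volatility > 0.10:
        return "hot"
    if volatility > 0.01:
        return "warm"
    return "cold"

def schedule(items):
    """items: iterable of (item_id, volatility). Returns a min-heap of
    (next_refresh_time, item_id) entries for a worker pool to drain."""
    now = time.time()
    heap = [(now + REFRESH_INTERVALS[tier_for(v)], item_id)
            for item_id, v in items]
    heapq.heapify(heap)
    return heap
```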
I basically want to know what resources would help me answer these types of questions. I would prefer full-length courses (distributed systems?) or books rather than quick-fix blog posts.

Related

How does Gmail query 900 million records? With an RDBMS or NoSQL? [closed]

According to this TechCrunch article, Gmail has 900 million users. When I log in to Gmail with my username and password, the lookup happens almost instantly. Do they use an RDBMS (relational) or NoSQL? Is this even possible with an RDBMS?
I'm sure this isn't exactly how it's done, but one billion records at, say, 50 bytes per user name is only 50 gigabytes. They could keep it all in RAM in a sorted tree and just search that tree.
A balanced binary tree over that many keys is only about thirty levels deep, which would take microseconds to traverse, and I suspect they'd use something with a higher branching factor than a binary tree, so it would be even flatter.
All in all, there are probably far more amazing things Google does; this part is relatively trivial.
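To make the depth arithmetic concrete (a back-of-the-envelope check, not a claim about how Google actually stores this):

```python
import math

n = 1_000_000_000  # one billion user records

# Depth of a balanced binary tree over n keys: about 30 levels.
print(math.ceil(math.log2(n)))       # 30

# A wide tree (B-tree style) with a branching factor of, say, 1024
# divides the depth by log2(1024) = 10, so roughly 3 node visits.
print(math.ceil(math.log2(n) / 10))  # 3
```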

SQL database convention [closed]

Apologies in advance if this is a stupid question. I've more or less just started learning how to use SQL.
I'm making a website that stores main accounts, each of which has many sub-accounts associated with it. Each sub-account has a few thousand records in various tables associated with it.
My question concerns conventional database usage. Is it better to use one database per main account, with everything associated with that account stored in the same place; to store everything in one database; or some combination of both?
Some insight would be much appreciated.
Will you need to access more than one of these databases at the same time? If so, put everything in one database. You will not like the effort and cost of 'joining' them back together to run a query. On top of that, every database you have needs to be managed, and if you ever need to transfer data between them, that can get painful as well.
Segregating data by database is a last resort.
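To make the single-database layout concrete, here is a minimal sketch (illustrative table and column names only; SQLite is used for brevity, but the same schema works in PostgreSQL or any other relational database):

```python
import sqlite3

# One database for everything: main accounts, their sub-accounts, and
# the sub-accounts' records all live together, so queries that span
# accounts are plain joins instead of cross-database plumbing.
conn = sqlite3.connect("app.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS main_accounts (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS sub_accounts (
    id              INTEGER PRIMARY KEY,
    main_account_id INTEGER NOT NULL REFERENCES main_accounts(id)
);
CREATE TABLE IF NOT EXISTS records (
    id             INTEGER PRIMARY KEY,
    sub_account_id INTEGER NOT NULL REFERENCES sub_accounts(id),
    payload        TEXT
);
""")

# Counting all records under one main account is a single query; with
# one database per main account this would need a connection (and a
# manual merge) per database.
conn.execute("""
SELECT COUNT(*)
FROM records r
JOIN sub_accounts s ON s.id = r.sub_account_id
WHERE s.main_account_id = ?
""", (1,))
```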

Method to create master product database to validate entries [closed]

We have a Ruby on Rails platform (with a PostgreSQL database) where people upload various products to trade. Of course, many of the products listed are the same but are described differently by each user (through spelling, case, etc.), so there are lots of duplicates.
For analytics and a better UX, we're aiming to create an evolving "master product list" (a whitelist, if you will) that will have users either select the product they are uploading from an existing list, or request that a new one be added. We also plan to enrich each product entry with additional information from the web, tied to the master product.
Here are some methods we're proposing to solve this problem:
A) Take all the items listed on the website (~90,000) and de-dupe as much as possible by running SELECT DISTINCT queries, while maintaining a key-map back to the original data by generating an array of item keys for each distinct listing in a GROUP BY (see the sketch after this list).
THEN
A1) Running this data through Mechanical Turk and asking each worker to restate the data in a uniform format.
OR
A2) Running each product entry through the Amazon products API and asking the user to identify a match.
OR
A3) A better method?
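For option A, here is a minimal sketch of the normalize-then-group step (hypothetical names; real matching would want a fuzzier comparison, e.g. trigram similarity via PostgreSQL's pg_trgm extension, rather than exact string equality):

```python
import re
from collections import defaultdict

def normalize(name: str) -> str:
    """Crude canonical form: lowercase, turn punctuation into spaces,
    collapse whitespace. A stand-in for real fuzzy matching."""
    name = name.lower()
    name = re.sub(r"[^\w\s]", " ", name)
    return re.sub(r"\s+", " ", name).strip()

def group_duplicates(items):
    """items: iterable of (item_key, raw_name). Returns a mapping of
    canonical name -> list of original item keys (the 'key-map')."""
    groups = defaultdict(list)
    for key, raw in items:
        groups[normalize(raw)].append(key)
    return groups

# Example: three spellings of the same product collapse to one group.
listings = [(1, "Apple iPhone 12"), (2, "apple iphone 12 "),
            (3, "APPLE  IPHONE-12")]
print(group_duplicates(listings))
# {'apple iphone 12': [1, 2, 3]}
```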

How can my website appear in search engines? [closed]

I have developed a website for a firm that deals in pumps, valves, and diesel engines. They want their site to appear in the results when an interested user searches for keywords like "Pump Dealers" or "Valve Dealers". I am currently not sure how to go about this, so my question is: what should I do to improve the site's ranking? I am using meaningful page titles and have enough text on every page.
Any suggestion is welcome.
Firstly, PageRank is largely irrelevant these days, so don't worry about that.
You should make sure you use Google's Webmaster Tools to check that Google knows about your site; it will also tell you which queries your site is currently appearing for.
Make sure each page contains the text you want to rank for - as you mention, titles, headers, etc. will help, but don't overdo it.
The main thing to do is to get links to your site: write interesting blog posts, contact customers, and so on, so that they link to you.
It really depends on who your competition is for those terms - if ten huge companies already rank for them, then you are stuck.
The other way to do this is to buy AdWords - though this will likely cost upwards of $5-10 a day to get any meaningful traffic.

Downloading complete historical stock data including delisted companies? [closed]

There are several posts on SO pointing to sources for downloading historical stock quotes, but these are all for currently listed symbols, so the resulting dataset suffers from survivorship bias. Is there any source of complete historical data, including delisted companies, preferably free or cheap? I've found a few sources, but they usually cost hundreds of dollars or more, require installing some Windows client software, or just live on sketchy-looking websites. (End-of-day data is fine - I'm sure asking for intraday bid/ask is too much.)
Where do these data resellers get their data from in turn? What is the original source archive? (Some of these datasets date back to the '50s, so I don't think the answer is "they just record it themselves.") Do they cut deals with the exchanges, or do the exchanges keep and sell this data themselves? Does the data exist in any public records? Thanks!
Norgate Investor Services is the cheapest I've found, but it will run you hundreds of dollars (though less than $1,000). Their source is Standard & Poor's.
QuantQuote has survivorship-bias-free historical stock data, but they only offer it at minute/second/tick resolution, and it costs real money. They also have free daily-resolution data for the S&P 500. It's too bad Yahoo doesn't keep stock data around after a stock gets delisted.