Closed. This question is opinion-based. It is not currently accepting answers.
Closed 5 days ago.
We have a Ruby on Rails platform (with a PostgreSQL database) where people upload various products to trade. Of course, many of the listed products are the same but are described differently by users (different spelling, case, etc.), so we have lots of duplicates.
For the purposes of analytics and a better UX, we're aiming to create an evolving "master product list" (a "whitelist", if you will). Users would either select the product they are uploading from this existing list or request to add a new one. We also plan to enrich each product entry with additional information from the web, tied to the "master product".
Here are some methods we're proposing to solve this problem:
A) Take all the "items" listed on the website (~90,000) and de-dupe as much as possible by running SELECT DISTINCT queries, while maintaining a key map back to the original data by generating an array of item keys for each distinct listing in a GROUP BY.
THEN
A1) Run this data through Mechanical Turk, asking each worker to enter the data in a uniform format.
OR
A2) Run each product entry through the Amazon products API and ask the user to identify a match.
OR
A3) A better method?
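As a rough illustration of step A, the sketch below groups listings under a normalized name while keeping the list of original item ids for each group. It is only a minimal sketch: the `normalize` rules, the function names, and the sample rows are assumptions made for illustration, not part of the platform described above, and exact normalization will not catch misspellings.

```python
import re
from collections import defaultdict


def normalize(name: str) -> str:
    """Collapse case, punctuation, and whitespace differences into one key."""
    key = name.casefold()
    key = re.sub(r"[^\w\s]", " ", key)  # treat punctuation as a separator
    return " ".join(key.split())         # collapse runs of whitespace


def group_duplicates(items):
    """Map each normalized name to the list of original item ids."""
    groups = defaultdict(list)
    for item_id, name in items:
        groups[normalize(name)].append(item_id)
    return dict(groups)


# Hypothetical sample rows: (item id, user-entered name)
items = [
    (1, "iPhone 5"),
    (2, "iphone  5"),
    (3, "IPHONE-5"),
    (4, "Kindle Fire"),
]
groups = group_duplicates(items)
```

Each key in `groups` is a candidate "master product", and its value is the key map back to the original listings. Groups that remain after this pass are the ones that would still need human review (A1 or A2).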
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
How do I create an optimal database design from the invoice below for XYZ Ltd., a stationery store? Please also show all possible tables, including their relationships.
Here's the approach you should take.
Look through the form to figure out which fields should exist.
Determine how those fields relate to one another, i.e. which fields describe the same thing (e.g. the invoice, an invoice line, an item, etc.).
Figure out the relationships between those things, i.e. can an item appear on more than one invoice, can an item appear more than once on the same invoice, or is there a 1:1 relationship between them?
For each "thing", create a table. That table should define the fields directly associated with that thing, along with any useful additional fields (e.g. a primary key).
For each "thing", create the required relationships between its fields and the related tables' fields (e.g. foreign keys).
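To make the steps above concrete, here is one possible outcome, sketched with Python's built-in `sqlite3` module. The invoice itself is not reproduced in this thread, so every table and column name below is an assumption about what a typical invoice contains, not the actual form. Prices are stored as integer cents to keep the arithmetic exact.

```python
import sqlite3

# Hypothetical schema: one table per "thing" found on a typical invoice.
SCHEMA = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE item (
    item_id          INTEGER PRIMARY KEY,
    description      TEXT NOT NULL,
    unit_price_cents INTEGER NOT NULL
);
CREATE TABLE invoice (
    invoice_id   INTEGER PRIMARY KEY,
    customer_id  INTEGER NOT NULL REFERENCES customer(customer_id),
    invoice_date TEXT NOT NULL
);
-- An item can appear on many invoices, so the line table links the two.
CREATE TABLE invoice_line (
    invoice_id       INTEGER NOT NULL REFERENCES invoice(invoice_id),
    line_no          INTEGER NOT NULL,
    item_id          INTEGER NOT NULL REFERENCES item(item_id),
    quantity         INTEGER NOT NULL,
    unit_price_cents INTEGER NOT NULL,  -- price at the time of sale
    PRIMARY KEY (invoice_id, line_no)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Made-up sample data for illustration only.
conn.execute("INSERT INTO customer VALUES (1, 'Sample Customer')")
conn.executemany("INSERT INTO item VALUES (?, ?, ?)",
                 [(1, "Pen", 150), (2, "Notebook", 400)])
conn.execute("INSERT INTO invoice VALUES (1, 1, '2024-01-15')")
conn.executemany("INSERT INTO invoice_line VALUES (?, ?, ?, ?, ?)",
                 [(1, 1, 1, 10, 150), (1, 2, 2, 3, 400)])

# The invoice total is derived from its lines rather than stored.
(total_cents,) = conn.execute(
    "SELECT SUM(quantity * unit_price_cents) FROM invoice_line WHERE invoice_id = 1"
).fetchone()
```

Keying `invoice_line` on `(invoice_id, line_no)` allows the same item to appear more than once on one invoice. If the real form forbids that, `(invoice_id, item_id)` would be the natural key instead.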
Good luck.
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 5 years ago.
I am about to develop a small CMS/forum. Multiple customers are going to have their own access, where their customers can communicate with them.
What is the best practice: a separate SQL database for each customer's CMS data, or one big database containing all the customers' data?
I cannot comment, so I am posting this as an answer.
It is strange to want a separate database for each customer, and managing multiple databases for a single purpose or function seems impractical. For example, how would you identify which database belongs to which customer? Also, do you expect to have enough resources to allocate to each customer? A database is simply wasted if its customer is not active.
So I suggest using one database to manage all the customers' data, which is the usual solution.
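A minimal sketch of the single-database approach, using Python's built-in `sqlite3` module: every row carries a `customer_id`, and every query filters on it. The table names, column names, and sample data are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
-- Every content row records which customer (tenant) owns it.
CREATE TABLE post (
    post_id     INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    body        TEXT NOT NULL
);
CREATE INDEX idx_post_customer ON post(customer_id);
""")

# Made-up sample data for illustration only.
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(1, "Acme"), (2, "Globex")])
conn.executemany("INSERT INTO post (customer_id, body) VALUES (?, ?)",
                 [(1, "Welcome to Acme"), (1, "Acme news"), (2, "Globex hello")])


def posts_for(conn, customer_id):
    """Return only the posts that belong to the given customer."""
    rows = conn.execute(
        "SELECT body FROM post WHERE customer_id = ? ORDER BY post_id",
        (customer_id,),
    )
    return [row[0] for row in rows]
```

The cost of this design is that every query must include the `customer_id` filter, so it is worth funnelling data access through helpers like `posts_for` rather than writing ad hoc queries.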
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I was recently asked the following question in an interview.
"How would you design a system to keep track of a million items at xyz.com ?
xyz.com could update the prices maybe 2-3 times a day or once per month, so there is no guarantee on frequency.
Your system should show accurate prices for >95% of items at any given point in time and aim for 99%.
It should also scale to 1 billion items, etc.
"
I answered along the lines of creating a distributed application that would categorize items by priority (based on historical price fluctuations, the 80/20 rule, etc.) and make API calls more frequently for the high-priority ones.
But I was not allowed to use API calls.
I suggested scraping the HTML content instead (but the website could block my IP under such a high load).
I basically want to know which resources would help me answer these types of questions. I would prefer full-length courses (distributed systems?) or books rather than quick-fix blogs.
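For reference, the prioritization idea in my answer could be sketched roughly as follows: items whose prices changed often in the past get a shorter refresh interval, and a heap always yields the item that is due next. The interval formula, the function names, and the sample history are assumptions made for illustration. How the price is actually fetched is left out entirely.

```python
import heapq


def refresh_interval_hours(changes_last_30_days: int) -> float:
    """More historical price changes means a shorter interval between checks."""
    if changes_last_30_days <= 0:
        return 24.0 * 7  # assumed default: check rarely-changing items weekly
    # Assumed heuristic: check about twice as often as the item has changed.
    return max(1.0, (30 * 24) / (2 * changes_last_30_days))


def build_schedule(history, now=0.0):
    """Build a min-heap of (next due time in hours, item id)."""
    heap = [(now + refresh_interval_hours(n), item) for item, n in history.items()]
    heapq.heapify(heap)
    return heap


def next_due(heap, history):
    """Pop the item that is due soonest and reschedule its next check."""
    due, item = heapq.heappop(heap)
    heapq.heappush(heap, (due + refresh_interval_hours(history[item]), item))
    return due, item


# Hypothetical history: item id -> number of price changes in the last 30 days
history = {"sku-a": 60, "sku-b": 2, "sku-c": 0}
schedule = build_schedule(history)
first = next_due(schedule, history)
second = next_due(schedule, history)
```

With this history, the volatile item `sku-a` is checked every 6 hours while the other two are checked roughly weekly, which is the 80/20 intuition: most of the refresh budget goes to the few items that change most.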
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
We are designing a dimensional model for an IT support business. There are cases (some call them tickets or incidents) with different statuses (this feels like a Type II SCD).
We also need to consider the count of cases and the SLA time duration as measures.
Before going into detailed design, I reviewed Kimball's The Data Warehouse Toolkit but couldn't find a business matching our project. Are there any references for a dimensional model for this type of business?
From your limited information, it sounds like you want to model this as an accumulating snapshot fact table (as well as a transaction fact table). See the insurance claim processing pipeline in Kimball's The Data Warehouse Toolkit.
It would only be a Type II SCD if the dimension entries were being updated, which in the case you describe they are not (you are updating the fact table).
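A minimal sketch of what an accumulating snapshot could look like, using Python's built-in `sqlite3` module: there is one row per case, with one timestamp column per milestone, and that row is updated in place as the case progresses. The table name, column names, and milestones are assumptions made for illustration. A real model would normally use date dimension keys and a status dimension rather than plain text columns.

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One row per case; milestone columns start NULL and are filled in later.
CREATE TABLE fact_case_snapshot (
    case_id        INTEGER PRIMARY KEY,
    opened_at      TEXT NOT NULL,
    assigned_at    TEXT,
    resolved_at    TEXT,
    current_status TEXT NOT NULL,
    sla_minutes    INTEGER
);
""")


def open_case(conn, case_id, opened_at):
    """Insert the single fact row for a new case."""
    conn.execute(
        "INSERT INTO fact_case_snapshot (case_id, opened_at, current_status) "
        "VALUES (?, ?, 'open')",
        (case_id, opened_at),
    )


def resolve_case(conn, case_id, resolved_at):
    """Update the existing fact row and fill in the duration measure."""
    (opened_at,) = conn.execute(
        "SELECT opened_at FROM fact_case_snapshot WHERE case_id = ?", (case_id,)
    ).fetchone()
    elapsed = datetime.fromisoformat(resolved_at) - datetime.fromisoformat(opened_at)
    conn.execute(
        "UPDATE fact_case_snapshot "
        "SET resolved_at = ?, current_status = 'resolved', sla_minutes = ? "
        "WHERE case_id = ?",
        (resolved_at, int(elapsed.total_seconds() // 60), case_id),
    )


# Made-up sample case for illustration only.
open_case(conn, 1, "2024-01-01T09:00:00")
resolve_case(conn, 1, "2024-01-01T11:30:00")
row = conn.execute(
    "SELECT current_status, sla_minutes FROM fact_case_snapshot WHERE case_id = 1"
).fetchone()
```

The count of cases is then a simple `COUNT(*)` over this table, and the SLA duration is an ordinary measure on the same row.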
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
I'm working on a basic admin panel for a site of mine but have run into a problem. I haven't done much SQL, so sorry if this is fairly basic knowledge.
The site has a JS slider that shows our site's current best-selling products (as seen here: http://www.theyard-store.co.uk/).
I created a form so I could easily update the slider, but I don't know the best way of linking it to the database.
I thought about just having a separate table for each product and a separate form, then just using SQL UPDATE to do this.
Is this the best way of doing it? Are there more efficient methods?
"I thought about just having a separate table for each product "
No, separate tables for each product are a nightmare. Imagine a new product being added and you having to go through every line of code to refer to the newly added product table.
My recommendation is to have a single product table, something simple like "Product_ID, Product_name, a couple of other data points, etc." A second table (product_form?) can then be created that refers to the Product_ID in your product table and stores the relevant data from your form.
My preference is also to never overwrite data in your database. If you include a 'status' or 'active_flag' column in this new product_form table, you can simply insert a new row and set the old row to inactive. This way you keep all previous entries for the product_form record and can build a history/workflow/validation process.
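A minimal sketch of this layout, using Python's built-in `sqlite3` module. The `slider_text` column, the `save_form` helper, and the sample values are assumptions made for illustration. Only `product`, `product_form`, and `active_flag` come from the suggestion above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL
);
-- One row per saved form submission; old rows are kept but deactivated.
CREATE TABLE product_form (
    form_id     INTEGER PRIMARY KEY,
    product_id  INTEGER NOT NULL REFERENCES product(product_id),
    slider_text TEXT NOT NULL,
    active_flag INTEGER NOT NULL DEFAULT 1
);
""")


def save_form(conn, product_id, slider_text):
    """Deactivate the current row and insert the new one in one transaction."""
    with conn:
        conn.execute(
            "UPDATE product_form SET active_flag = 0 "
            "WHERE product_id = ? AND active_flag = 1",
            (product_id,),
        )
        conn.execute(
            "INSERT INTO product_form (product_id, slider_text) VALUES (?, ?)",
            (product_id, slider_text),
        )


# Made-up sample data for illustration only.
conn.execute("INSERT INTO product VALUES (1, 'Sample product')")
save_form(conn, 1, "Best seller")
save_form(conn, 1, "Now 20% off")

active = [row[0] for row in conn.execute(
    "SELECT slider_text FROM product_form WHERE product_id = 1 AND active_flag = 1"
)]
history_count = conn.execute(
    "SELECT COUNT(*) FROM product_form WHERE product_id = 1"
).fetchone()[0]
```

The slider would read only rows where `active_flag = 1`, while the inactive rows remain available as the edit history.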