How to separate writes from reads to minimize effect of heavy read queries? [closed] - sql

I have write-heavy tables in my database, and someone else needs to run read-only queries against them. I have no idea about the complexity and volume of their queries, but I do know that when they start running them, writes become extremely slow. So separating writes from reads seems the way to go.
Is replication the answer? What else could I try?

As with anything related to performance: "It depends".
In general you may be overthinking this, because the isolation level will usually take care of that kind of problem for you. Hit the books to see how it works; it is not wise to meddle with it if you don't know exactly what you are doing.
If you do end up having to handle it yourself, you can:
1) Replicate (but you need to delve into the details).
The advantage is simplicity; the disadvantages are the extra server disk and CPU it consumes.
2) Create staging tables.
This is a simple solution, suitable when you get a lot of heavy writes against heavily read tables. Example: you have a web service where users occasionally upload large CSV files, and that data is persisted into staging tables. These simple, unindexed tables act as a buffer (or queue) for the raw data. Later, in a "window of opportunity", the data is inserted into the real tables. The disadvantage is that the uploaded data is not immediately available for querying. The advantages are that it is easy to handle badly formatted data and to let only sanitized data into your real tables, and it is very easy to implement: you can create a scheduled SQL job to do it before or after the daily full backup, for example.
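A minimal sketch of that staging-table idea; the table and column names are made up for illustration:

    -- Unindexed staging table acts as a buffer for the raw upload; inserts stay cheap.
    CREATE TABLE stage_orders (
        customer_id int,
        amount      decimal(10,2),
        loaded_at   datetime DEFAULT GETDATE()
    );

    -- Later, in a quiet window, move only sanitized rows into the real table
    -- and empty the buffer inside one transaction.
    BEGIN TRAN;
        INSERT INTO orders (customer_id, amount)
        SELECT customer_id, amount
        FROM   stage_orders
        WHERE  customer_id IS NOT NULL AND amount >= 0;
        TRUNCATE TABLE stage_orders;
    COMMIT TRAN;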
3) Fine-tune the isolation level query by query. The advantage is that if you really know what you are doing, the system will shine. The disadvantages: it is hard to make the right tweaks, and it is prone to dragging your system down into a hell of deadlocks, phantom and dirty reads, and lost data. It also demands a lot of time to implement and maintain properly (you must keep an eye on those tuned queries to be sure).
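A T-SQL-flavoured sketch of that per-query tuning, assuming SQL Server and a hypothetical MyDb database; snapshot isolation is just one of the levels you might pick for the reporting session:

    -- One-time database setting, then the reporting session opts in.
    ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON;

    -- Long read-only queries now read row versions instead of blocking the writers.
    SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
    SELECT customer_id, SUM(amount) AS total
    FROM   orders
    GROUP BY customer_id;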
EDIT about the WITH(NOLOCK) comment: Seriously, guys? It has been deprecated since SQL 2000! It is the lazy person's silver bullet and it does not work well. Consider the scenario where you make a dirty read, process some data, and persist more data related to that dirty row. Now a rollback undoes the dirty row, and you have an orphan row, or worse, data-integrity hell. Don't use it anymore unless you are still working with SQL Server 7. Study isolation levels to see how bad and useless NOLOCK has become (over the last 15 years!).

For me the correct answer is replication. You can use snapshot replication and keep a different set of indexes in your insert database and another in your read database: one focused on fast inserts, the other on fast searches.
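A sketch of what that split might look like; the table and index names are invented, and the point is only that the replicated read copy can carry read-oriented indexes the write side does not:

    -- Publisher (write side): only the key it needs, so inserts stay cheap.
    CREATE TABLE dbo.orders (
        order_id    int IDENTITY PRIMARY KEY,
        customer_id int NOT NULL,
        amount      decimal(10,2) NOT NULL,
        created_at  datetime NOT NULL DEFAULT GETDATE()
    );

    -- Subscriber (read side): the replicated copy carries the extra indexes
    -- the reporting queries want, so they never slow the writers down.
    CREATE INDEX ix_orders_customer ON dbo.orders (customer_id) INCLUDE (amount);
    CREATE INDEX ix_orders_created  ON dbo.orders (created_at);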

Related

NoSql, Sql or Flatfile [closed]

I've just started playing around with Node.js and Socket.io and I'm planning on building a little multi-player game. Probably something simple like each player has a character that they can run around in an arena and try and kill each other.
However I'm unsure how best to store the data. I can imagine there would be some loose relationships between objects such as a character and its weapon but that I would likely load these into the system by id as and when they are required and save them back out when I no longer need them.
Given that, would it be simpler to write 'objects' out to file instead of getting a database involved? Should I use a NoSQL document database, or just stick to good old SQL Server?
My advice would be to start with NoSQL.
Flatfile is difficult because you'll want to read and write this data very, very often. One file per player is not a terrible place to start - and might be OK for the very first prototype - but you're going to be writing a huge amount. File systems are not good at this. The one benefit at prototype stage is that you can debug quickly - just cat out the current state of a user. Using a .json file, or a similar .yaml format, will start you on your way very rapidly (and you can convert to the NoSQL approach as the prototype starts coming together).
SQL isn't a terrible approach. If you're familiar with this, you'll end up building a real schema, and creating a variety of tables and joining the user data against them quite a bit. This can be a benefit for helping you think through your game, but I think you'll end up spending a lot of time trying to figure out how to normalize your data and writing joins. Since it seems you're unfamiliar with the problem (thus are asking the question), you're likely to do this wrong (and get in the way of gaming awesomeness) and/or just spend too much time at it.
NoSQL - using a document store model - is much like just reading and writing a user object. You'll end up re-writing your user object every time, but this kind of access (key-value, accessed by the user id) is hyper efficient. You'll probably get to a prototype really, really quickly, and on to the important aspect of building out your play mechanism. Key-value access is highly scalable in the long run.
If you want to store player information, use SQL. However, if you have a connection-based system, meaning you only need the information while the player is connected and don't need to "save" it once the connection is lost, then just store it in memory.
Otherwise, I would say you should stick with SQL. Databases are optimized, quick, tried, tested and true. You can't go wrong with a SQL database.
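For illustration only, a minimal sketch of the SQL route with invented names; characters are loaded by id when needed and saved back when done, as the question describes:

    CREATE TABLE players (
        player_id   INTEGER PRIMARY KEY,
        name        VARCHAR(50) NOT NULL,
        health      INTEGER NOT NULL DEFAULT 100,
        weapon_id   INTEGER,            -- loose relationship to a weapons table
        x_position  REAL,
        y_position  REAL
    );

    -- Load a character by id when it is needed...
    SELECT name, health, weapon_id, x_position, y_position
    FROM   players WHERE player_id = 42;

    -- ...and save it back out when you are done with it.
    UPDATE players SET health = 87, x_position = 10.5, y_position = 3.0
    WHERE  player_id = 42;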

How to deal with stupidly designed databases? [closed]

So we just started building a web application for company X. The application has to calculate a lot of information: the jobs each worker completed, how long he worked, how long a device worked, device speed, device quality, parts quality, uptime, downtime, running time, waste, etc. The problem is that the database is stupidly designed: there are no IDs (I join on multiple columns, but it's so slow), there are a lot of calculations inside views (I am going to have nightmares about this), and the database has a lot, and I mean a lot, of tables with millions of records. So my question is: how should I approach this situation? Try to get a grip on the database and do my job, even if it takes half a year to make everything work? Or maybe they should hire a database designer and change the whole system (but I guess they won't, even if I ask). Is there software I could use to quickly get a grip on the database? They are using Microsoft SQL Server 2012.
P.S. Don't judge my English writing skills; I don't compile it very often.
EDIT:
1. There is no referential integrity between some tables, so I have to work around it. Also, the server is always busy and crashes from time to time; sometimes it takes 20 minutes to get 1000 rows from a view. 2. Some expensive query is executed every time I query something.
EDIT:
There is a lot of data repeated in different tables.
EDIT:
Is there a way to make the database more efficient?
Let's walk through each point here:
no IDs (I join on multiple columns, but it's so slow)
Do you actually mean you have no referential integrity between tables and there are no columns that would form a primary key? If that is what you mean, then yes, I agree that a non-normalized table is quite bad. However, if there is referential integrity (which I would presume there is), this is not an issue. You proceed to say it is slow; define slow. If it takes 10 seconds to query over 2 trillion records, I would hardly call that slow. If, however, it takes 10 seconds to query over 5 rows, then yes, that is slow.
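If the tables genuinely lack keys, one low-risk improvement is to index the columns used by those multi-column joins. A hedged T-SQL sketch with invented names, including an optional surrogate key that does not disturb existing queries:

    -- An index matching the multi-column join predicate usually removes most
    -- of the pain without redesigning anything.
    CREATE INDEX ix_jobs_worker_shift
        ON dbo.jobs (worker_name, shift_date, device_code);

    -- A surrogate key can also be added without touching existing queries.
    ALTER TABLE dbo.jobs ADD job_id int IDENTITY NOT NULL;
    ALTER TABLE dbo.jobs ADD CONSTRAINT pk_jobs PRIMARY KEY NONCLUSTERED (job_id);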
a lot of calculations inside view tables
Now, is this a materialized view, meaning that the calculation is only executed once and the table is built off that expensive query? Or do you mean some expensive query is executed every time the view is targeted? In the latter case that is bad; in the former, that is the correct approach.
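For reference, SQL Server's materialized flavour is an indexed view. A rough sketch with invented names; indexed views carry several restrictions (SCHEMABINDING, a COUNT_BIG column when grouping, no nullable values under SUM, and so on):

    -- The aggregation is computed once and maintained incrementally instead of
    -- being re-run on every read. Assumes running_minutes is declared NOT NULL.
    CREATE VIEW dbo.v_device_uptime
    WITH SCHEMABINDING
    AS
    SELECT device_code,
           SUM(running_minutes) AS total_running,
           COUNT_BIG(*)         AS row_count
    FROM   dbo.device_log
    GROUP BY device_code;
    GO
    CREATE UNIQUE CLUSTERED INDEX ix_v_device_uptime
        ON dbo.v_device_uptime (device_code);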
the database has a lot, and I mean a lot, of tables with millions of records
And your point is? Millions of records in 2013 are not that many. Further, if you are melting down over millions of records, it may be time to hang it up. There will only be more data, barring some insane magnetic storm that destroys all technology as we know it.
So my question is how to approach this situation?
Learn set theory and relational design.
You need to understand that changing the database is not trivial. What you need to do is understand this database structure well. Chances are you are not happy with it because you don't know it well. If you get to understand it, you can design views and canned queries for common everyday tasks. Once you are comfortable with the database, you can begin to make a list of what is wrong with the current design and what the business needs are. Maybe then you could draft a version 1.0 ERD and estimate the cost of building the new system based on business needs and your expertise in the current system.
Actually, contrary to popular belief, missing artificial keys do not automatically make a database "stupidly designed".
So yes, you should try to get a grip on the database and do your job. Even if it takes you half a year to make everything work, it will probably still be cheaper than adapting the application that generates the data.
Whether your system can be improved by modifying the database can only be determined with an analysis by an expert. It is out of scope for this site.
Make sure that the DB structure is really as bad as you think. Perhaps there is some logic to the design you have missed? Better to check; it will save you time in the long run.
Also, is the database normalised? If there is a lot of data repeated in various tables, then it's not. If there is some attempt to normalise the database (minimising data duplication), then there is some intelligence in the design. Otherwise, you might be right.

SQL & Postgres Interview Concepts [closed]

Introduction:
So, I have an interview tomorrow and I'm trying to review SQL and databases. The job posting says that they want someone with:
Experience with database design and development
Strong knowledge of SQL
Experience with SQL Server and/or Postgres
I've read through Questions every good database SQL developer should be able to answer, and a bunch of questions tagged with SQL and interview-questions. So I realize that I need to know about SELECT, JOIN and WHERE.
Questions:
What are essential SQL, Postgres and database concepts that I need to know in order to do well in the interview?
What do I need to know about transaction and normalization?
What are some general ways to optimize slow queries?
Should I learn about the functions, keywords or both?
It depends on how much of the role is based around database development and design. For your SQL syntax, you should also understand the difference between the types of joins, and be able to use GROUP BY, ORDER BY, HAVING as well as the aggregate functions that can be used in conjunction with them.
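For example, a query along these lines (with illustrative table and column names) exercises an aggregate, GROUP BY, HAVING and ORDER BY together:

    -- Orders per customer, keeping only customers with more than ten orders,
    -- largest counts first.
    SELECT   customer_id,
             COUNT(*)    AS order_count,
             SUM(amount) AS total_spent
    FROM     orders
    GROUP BY customer_id
    HAVING   COUNT(*) > 10
    ORDER BY order_count DESC;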
In terms of performance monitoring, I would be looking at execution plans (not sure about the Postgres equivalent) and how they can provide tips on increasing performance, as well as using SQL Profiler to see what instructions the server is executing in real time.
Transactions can be useful for rolling back, well, transactions (stored procs, ad-hoc queries etc.) that require queries to complete in a certain way to maintain data consistency. Some people (myself included) have a practice of placing any statements that make any changes to data into a transaction that automatically rolls back (BEGIN TRAN ... ROLLBACK TRAN) to check that the correct amount of data is manipulated before pushing changes to a live server. Have a look at the ACID model - Atomicity, Consistency, Isolation, Durability.
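A minimal T-SQL-flavoured sketch of that dry-run pattern, with invented table and column names:

    -- Make the change, inspect the effect, then roll it back.
    BEGIN TRAN;
        UPDATE orders
        SET    status = 'archived'
        WHERE  created_at < '2010-01-01';
        -- check @@ROWCOUNT or run a SELECT here to verify the row count looks right
    ROLLBACK TRAN;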
Normalization is something that can take a little time to go through, but knowing and at least partially understanding up to third normal form will get you started.
Optimisation can be a huge topic. Just remember to try and do things like UPDATE using set-based queries rather than row-based ones (updating in a WHILE loop is an example of row-based updating, though it CAN have its uses).
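A small illustration of the set-based form, with invented names; the WHILE-loop version would issue one UPDATE per row instead:

    -- One statement lets the engine do the work across the whole set.
    UPDATE orders
    SET    total = total * 1.1
    WHERE  customer_id IN (SELECT customer_id
                           FROM   customers
                           WHERE  region = 'EU');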
I hope this helps a little.
Besides the basics of SQL syntax, which you listed, you should know some things about query performance: what the common causes of slow queries are, what the remedies for those are, and how you can evaluate the performance of a query.
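For instance, common starting points for evaluating a query in Postgres and SQL Server look like this (the query itself is just an example):

    -- Postgres: run the query and show the actual plan with timings.
    EXPLAIN ANALYZE
    SELECT customer_id, COUNT(*) FROM orders GROUP BY customer_id;

    -- SQL Server: per-session switches that report I/O and CPU cost,
    -- typically read alongside the actual execution plan.
    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;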

Distributed Database Solution? [closed]

Hey. I am going to be setting up a database which could get really, really huge.
I've been using standard MySQL for most of my stuff, but this particular problem will grow into the terabytes and I will want to be able to do hundreds of queries a second.
So aside from designing my database schema so that it's not going to chug, and fast hard-drive speeds, what is my biggest bottleneck, and what sort of solution is recommended for this?
Does it make sense to spread the database over multiple computers on my intranet so it can scale with CPU/RAM etc., and if so, is there software or a database solution for this?
Thanks for any help!
I did a search for questions related to this and couldn't find anything, so sorry if it has already been asked.
Database scalability is a VERY complicated issue; there are a LOT of issues that come into the whole process.
First, consider the lowest-hanging fruit; do you have individual tables (or columns) that are going to be containing the bulk of your data? Columns which will contain BLOBs which are > 4MB each? Those can be extracted from the database and stored on a flat-file storage system, and merely referred to from the database; right there, that can take many unwieldy solutions down to a manageable level.
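A sketch of that pointer approach with invented names; the row keeps only a path into the flat-file store instead of the multi-megabyte payload:

    CREATE TABLE documents (
        doc_id       BIGINT PRIMARY KEY,
        title        VARCHAR(200) NOT NULL,
        -- payload   LONGBLOB,                -- the column being avoided
        payload_path VARCHAR(500) NOT NULL    -- e.g. '/blobstore/2010/03/doc_123.bin'
    );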
If not, do you have deeply different usage patterns for different subgroupings of tables? If so, there's an opportunity right there for segmenting your database into different functional databases which can be partitioned onto different servers. A good example of this is read-mostly data, such as on webservers, which gets generated rarely (think user-specific home page data) but read frequently; that type of data can get segregated into a database (or, again, a flatfile with references) that's separate from the rest of the user data.
Consider the transactional requirements of your database; can you isolate your transaction boundaries cleanly, or will there be deeply mingled transactions going on all through your database? If you can isolate your transaction boundaries, there's another potential useful boundary.
This is just touching on some of the issues involved with this sort of thing. One thing worth considering is whether or not you really need to have a database that is actually going to be huge, or if you're just trying to use the database as a persistence layer. If you're using the database just as a persistence layer, you might reconsider whether you actually need the relational nature of a database at all, or if you can get away with a smaller relational overlay on top of a simpler persistence layer. (I say this because a large quantity of solutions seem like they could get away with a thin relational layer over a large persistence layer; it's worth considering.)
Ok, first I need to point you here. I don't think MySQL is going to perform like you want. I have a bad feeling that when I say you need to look into an Oracle installation, you're going to say, "We don't have the cash for that." But when I say get the latest/greatest SQL Server, you're going to say, "We don't have the hardware it'll take to implement that." I'm afraid that terabytes are just flat out going to crush your MySQL installation.
A new breed of NewSQL databases is being built to solve exactly the problem of distributing resources over multiple servers. The Clustrix database (which was built from the ground up to be a MySQL replacement) is one example that provides near-linear scale -- as you run out of CPU/memory, you can simply add nodes.
Database scalability is a tough problem and you should consider solutions that can address it for you. I believe that MySQL can be used as the foundation for a solution to your problem.
Horizontal scalability, the ability to scale a database horizontally (aka scale-out), is a good technique to address the problem of very large tables and databases.

How to write a simple database engine [closed]

I am interested in learning how a database engine works (i.e. the internals of it). I know most of the basic data structures taught in CS (trees, hash tables, lists, etc.) as well as a pretty good understanding of compiler theory (and have implemented a very simple interpreter) but I don't understand how to go about writing a database engine. I have searched for tutorials on the subject and I couldn't find any, so I am hoping someone else can point me in the right direction. Basically, I would like information on the following:
How the data is stored internally (i.e. how tables are represented, etc.)
How the engine finds data that it needs (e.g. run a SELECT query)
How data is inserted in a way that is fast and efficient
And any other topics that may be relevant to this. It doesn't have to be an on-disk database - even an in-memory database is fine (if it is easier) because I just want to learn the principles behind it.
Many thanks for your help.
If you're good at reading code, studying SQLite will teach you a whole boatload about database design. It's small, so it's easier to wrap your head around. But it's also professionally written.
SQLite 2.5.0 for Code Reading
http://sqlite.org/
The answer to this question is a huge one; expect a PhD thesis to answer it 100% ;)
But we can think through the problems one by one:
How to store the data internally:
You should have a data file containing your database objects, and a caching mechanism to load the data in focus (and some data around it) into RAM.
Assume you have a table with some data. We would create a data format to convert this table into a binary file, by agreeing on a column delimiter and a row delimiter and making sure such delimiter patterns are never used in the data itself (i.e. if you have selected <*> to separate columns, you should validate that the data you are placing in this table does not contain this pattern). You could also use a row header and a column header, specifying the size of the row and some internal indexing number to speed up your search, and putting the length of each column at its start.
like "Adam", 1, 11.1, "123 ABC Street POBox 456"
you can have it like
<&RowHeader, 1><&Col1,CHR, 4>Adam<&Col2, num,1,0>1<&Col3, Num,2,1>111<&Col4, CHR, 24>123 ABC Street POBox 456<&RowTrailer>
How to find items quickly:
Try using hashing and indexing to point at data stored and cached based on different criteria.
Taking the same example above, you could sort the values of the first column and store them in a separate object that points at the row ids of the items, sorted alphabetically, and so on.
How to speed up inserts:
What I know from Oracle is that it inserts data into a temporary place, both in RAM and on disk, and does housekeeping on a periodic basis; the database engine is busy all the time optimizing its structure, but at the same time we do not want to lose data in case of a power failure or something like that.
So try to keep data in this temporary place with no sorting, append it to your original storage, and later, when the system is free, re-sort your indexes and clear the temp area when done.
Good luck, great project.
There are books on the topic; a good place to start would be Database Systems: The Complete Book by Garcia-Molina, Ullman, and Widom.
SQLite was mentioned before, but I want to add some thing.
I personally learned a lot by studying SQLite. The interesting thing is that I did not go through the source code (though I did have a short look). I learned a lot by reading the technical material and especially by looking at the internal commands it generates. It has its own stack-based interpreter inside, and you can read the P-Code it generates internally just by using explain. Thus you can see how various constructs are translated to the low-level engine (which is surprisingly simple -- but that is also the secret of its stability and efficiency).
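For example, against a hypothetical users table in the sqlite3 shell:

    -- Dump the bytecode ("P-Code") the engine generates for a query.
    EXPLAIN SELECT name FROM users WHERE id = 42;

    -- Higher-level view of the chosen access path (index vs. full scan).
    EXPLAIN QUERY PLAN SELECT name FROM users WHERE id = 42;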
I would suggest focusing on www.sqlite.org
It's recent, small (source code 1MB), open source (so you can figure it out for yourself)...
Books have been written about how it is implemented:
http://www.sqlite.org/books.html
It runs on a variety of operating systems for both desktop computers and mobile phones so experimenting is easy and learning about it will be useful right now and in the future.
It even has a decent community here: https://stackoverflow.com/questions/tagged/sqlite
Okay, I have found a site which has some information on SQL and implementation - it is a bit hard to link to the page which lists all the tutorials, so I will link them one by one:
http://c2.com/cgi/wiki?CategoryPattern
http://c2.com/cgi/wiki?SliceResultVertically
http://c2.com/cgi/wiki?SqlMyopia
http://c2.com/cgi/wiki?SqlPattern
http://c2.com/cgi/wiki?StructuredQueryLanguage
http://c2.com/cgi/wiki?TemplateTables
http://c2.com/cgi/wiki?ThinkSqlAsConstraintSatisfaction
Maybe you can learn from HSQLDB. I think it offers a small and simple database for learning. You can look at the code since it is open source.
If MySQL interests you, I would also suggest this wiki page, which has got some information about how MySQL works. Also, you might want to take a look at Understanding MySQL Internals.
You might also consider looking at a non-SQL interface for your database engine. Please take a look at Apache CouchDB. It's what you would call a document-oriented database system.
Good Luck!
I am not sure whether it would fit your requirements, but I implemented a simple file-oriented database with support for simple queries (SELECT, INSERT, UPDATE) using Perl.
What I did was store each table as a file on disk, with entries following a well-defined pattern, and manipulate the data using built-in Linux tools like awk and sed. To improve efficiency, frequently accessed data was cached.