Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 6 years ago.
Improve this question
I am currently doing some kind of reporting system.the figures, tables, graphs are all based on the result of queries. somehow i find that complex queries are not easy to maintain, especially when there are a lot of filtering. this makes the query very long and not easy to understand. And also, sometimes, queries with similar filters are executed, making a lot of redundant code, e.g. when i am going to select something between '2010-03-10' and '2010-03-15' and the location is 'US', customer group is "ZZ", i need to rewrite these conditions each time i make a query in this scope. does the dbms (in my case, mysql) support any "scope/context" to make the coding more maintainable as well as the speed faster?
also, is there a industrial standard or best practice for designing such applications?
i guess what I am doing is called data mining, right?
Learn how to create views to eliminate redundant code from queries. http://dev.mysql.com/doc/refman/5.0/en/create-view.html
No, this isn't data mining, it's plain old reporting. Sometimes called "decision support". The bread and butter of information technology. Ultimately, play old reporting is the reason we write software. Someone needs information to make a decision and take action.
Data mining is a little more specialized in that the relationships aren't easily defined yet. Someone is trying to discover the relationships so they can then write a proper query to make use of the relationship they found.
You won't make a very flexible reporting tool if you are hand coding the queries. Every time a requirement changes you are up to your neck in fiddly code trying to satisfy it - that way lies madness.
Instead you should start thinking about a meta-layer above your query infrastructure and generating the sql in response to criteria expressed by the user. You could present them with a set of choices from which you could generate your queries. If you give a bit of thought to making those choices extensible you'll be well on your way down the path of the many, many BI and reporting products that already exist.
You might also want to start looking for infrastructure that does this already, such as Crystal Reports (swallowed by Business Objects, swallowed by SAP) or Eclipse's BIRT. Depending on whether you are after a programming exercise or a solution to your users' reporting problems you might just want to grab an off the shelf product which has already had tens of thousands of man years of development, such as one of those above or even Cognos (swallowed by IBM) or Hyperion (swallowed by Oracle).
Best of luck.
Related
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 3 years ago.
Improve this question
Suppose I have a table in a RDBMS having 26 columns, say A - Z.
With relational databases I can writes queries which invlove conditions on multiple columns. For example,
Select A, B
from table
where C > 12
and D = 'john'
and E between 3 and 6
order by F;
However, if I have the same table in a NoSQL database, all they provide is lookups based on primary keys, or some predefined GSI(taking dynamodb as example).
Although, I can issue a scan against the table in NoSQL db, but that is a lot slower as compared to a table in RDBMS even if the columns involved are not indexed.
I wanted to understand what are the reasons why NoSQL databases scale very well, but fail to provide a query language like SQL. Can someone throw some lightt on it?
You should be more specific about which database(s) you're asking about. You mention DynamoDB, but it's not clear in your question whether this is one example, or are you asking only about DynamoDB?
There are over 220 products that call themselves NoSQL, and they have different characteristics.
Some have an SQL-like language, some don't.
Some support queries to search by secondary attributes, some don't.
It's more a question of why a specific product didn't implement a SQL-like language, not a limitation of "NoSQL" as a broad category of products.
Your question is like asking "why don't non-motorcycles have a clutch?" The answer is that non-motorcycles is a broad category of vehicles, some of which actually do have a clutch, whereas some others were designed not to need a clutch.
No-SQL databases are designed on the premise that the data contained within them is schemaless. Thus, there is no pre-defined structure for the data which a database engine can easily use to determine how to execute an ad-hoc query. However, some no-sql database engines (e.g. Couchbase) do indeed offer such a capability.
The issue with database management systems in general has rarely been about storage and retrieval efficiency, but rather query plan optimization. In general, computers are not very good about dealing with issues created by poor designs. Also in general, most developers are not good at properly structuring data such that it can be queried quickly and easily by an automatically-generated query plan. Thus, most systems which rely upon automatically generated query plans tend to suffer performance issues.
In my opinion, the reason why a no-sql technology might not want to provide automatic query plan generation is that it forces the developer to give actual thought to the process of retrieving the data out, such that an efficient and effective plan might be devised in the code. Indeed, I have found that I am usually better at writing queries than the computer is. Could I restructure the data in such a way that the computer can write a good query plan the first time? Yes, but that takes more time than doing it myself to begin with.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
Improve this question
Can someone tell me why we need to use T-SQL over SQL when creating reports from Data Warehouse?
SQL also has functions and Joins but I see all of the online tutorials use T-SQL when creating reports from DW.
Can it be done with SQL? If T-SQL is must, could you please explain why? In terms of what can T-SQL do that SQL cannot.
Some useful tutorial links for T-SQL and creating reports would be great too!
Thanks in advance~
Regardless the fact T-SQL has more functionality than plain SQL, in general data warehousing you have two main approaches:
Put business logic closer to the data. This way you develop lots of T-SQL functions and apply many optimizations available there to improve performance of your ETL. Pros is greater performance of your ETL and reports. But cons are the cost of making changes to the code and migration cost. The usual case for growing DWH is migration to some of the MPP platforms. If you have lots of T-SQL code in MSSQL, you'll have to completely rewrite it, which will cost you pretty much money (sometimes even more than the cost of MPP solution + hardware for it)
Put business logic to the external ETL solution like Informatica, DataStage, Pentaho, etc. This way in database you operate with pure SQL and all the complex logic (if needed) is the responsibility of your ETL solution. Pros are simplicity of making changes (just move the transformation boxes and change their properties using GUI) and simplicity of changing the platform. The only con is the performance, which is usually up to 2-3x slower than in case of in-database implementation.
This is why you can either find a tutorial on T-SQL, or tutorial on ETL/BI solution. SQL is very general tool (many ANSI standards for it) and it is the basic skill for any DWH specialist, also ANSI SQL is much simpler as it does not have any database-specific stuff
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 8 years ago.
Improve this question
I've just started playing around with Node.js and Socket.io and I'm planning on building a little multi-player game. Probably something simple like each player has a character that they can run around in an arena and try and kill each other.
However I'm unsure how best to store the data. I can imagine there would be some loose relationships between objects such as a character and its weapon but that I would likely load these into the system by id as and when they are required and save them back out when I no longer need them.
In these terms would it be simpler to write 'objects' out to file instead of getting a database involved. Use a NoSql document database or just stick to good old Sql server?
My advice would be to start with NoSQL.
Flatfile is difficult because you'll want to read and write this data very, very often. One file per player is not a terrible place to start - and might be OK for the very first prototype - but you're going to be writing a huge amount. File systems are not good at this. The one benefit at prototype stage is you can debug quick - just cat out the current state of a user. Using a .json file, or similar .yaml format, will start you on your way very rapidly (and you can convert to the NoSQL approach as the prototype starts coming together).
SQL isn't a terrible approach. If you're familiar with this, you'll end up building a real schema, and creating a variety of tables and joining the user data against them quite a bit. This can be a benefit for helping you think through your game, but I think you'll end up spending a lot of time trying to figure out how to normalize your data and writing joins. Since it seems you're unfamiliar with the problem (thus are asking the question), you're likely to do this wrong (and get in the way of gaming awesomeness) and/or just spend too much time at it.
NoSQL - using a document store model - is much like just reading an writing a user object. You'll end up re-writing your user object every time - but this kind of access (key-value, accessed by the user id) is hyper efficient. You'll probably get into a prototype really, really quickly, and to the important aspect of building out your play mechanism. Key-value access is highly scalable in the long run.
If you want to store Player information, use sql. However if you're having a connection based system. As in something where you only need to store information while the player is connected and after the connection is lost you don't need to "save"; then just store it in Memory.
Otherwise, I would say that you should stick with Sql. Databases are optimized, quick, tried, tested and true. You can't go wrong with a Sql database.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 4 years ago.
Improve this question
As background, I'm one of two developers in my department. I got into computers my freshman year in high school (1986) and have no formal education. I got into MS Access a little bit in 1994 and more seriously beginning in 2003. I'm self-educated, have always tried to learn as much as I can about database design, and while I believe I know a lot I also know I don't know everything.
The other developer in my department, according to his resume, has a degree in computer science and has been doing IT work, including web design and database design, for about 8 years. He was hired into my department last December. I've been very surprised by what I see as a very fundamental lack of knowledge about the basics of database design and SQL and have been trying to figure out if at least part of the problem is I'm expecting too much or maybe don't know as much as I think I do.
Hence my question. Please note we are 100% MS Access, but I believe this question applies to about any SQL database. This developer was tasked to take a spreadsheet and convert it into a database. Part of the spreadsheet involved tracking inventory for batteries. In the spreadsheet, the column titles were Date and Count. But the data in the date column was a mix of dates and batch numbers. So this developer created a table with a numeric field to contain both the batch number and the date and a second boolean field called IsDate to indicate what value was in the field.
I disagree with this approach and would have created two separate fields, a date field for the date and a numeric field for the batch number. When I suggested this approach, he seemed to not only not understand why but also to get a bit angry about having to change his design.
Which approach would you recommend? Also, assuming everyone agrees with my approach - of course you will! ;) - if you had a developer with this supposed level of experience, would you consider him worth keeping and worth investing the time and effort to educate him?
My own rule of thumb here is:
Always keep data in a native datatype.
This helps comparing, sorting, finding and grouping - especially in a database - and makes your storage less prone to query errors. Moreover, you're not required to use another predicate (AND isdate) when accessing the data. Hence, I think your approach is correct.
Your colleague's approach seems not to be a matter of high education, but one of a personal approach. I've seen workers with PhD who could well listen to a well-reasoned argument, and freshmen who made grave mistakes and would not listen to a polite advice.
I'd most definitely store the date and the batch number in different fields of the appropriate type - setting each with the relevant content or as NULL if no value was available. By doing this you'd be able to see what data you actually have available and perform meaningful operations on that data.
In terms of you second question, I guess it would really depend on what the developer in question said when you asked them why they'd chosen the approach they did.
You are right.
Only under severe memory restrictions might (note might) this kind of architecture be acceptable.
As to dealing with him, I would first talk to him and fiugre out why he chose the given approach, this is something that might have been common in Access Databases 10 years ago (but even then there was enough disk and memory space to not have to do these kind of tricks).
His reluctance to talk about his design is a worse indicator of his abilities than the design itself. Even the most misguided design should have been based on a structured approach or idea. In my mind it is not a bad thing to be wrong, it is a bad thing to create random structures. But not knowing your requirements it is hard to suggest whether it is worth keeping him or not.
Is one of you the 'senior' hierarchy wise or are you sharing responsibilities ?
Point out that he is breaking first normal form by doing so. Be able to describe 1NF 2NF and 3NF before trying to impress him with you fancy pants knowledge.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
Improve this question
Introduction:
So, I have an interview tomorrow and I'm trying to review SQL and databases. The job posting says that they want someone with:
Experience with database design and development
Strong knowledge of SQL
Experience with SQL Server and/or Postgres
I've read through Questions every good database SQL developer should be able to answer, and a bunch of questions tagged with SQL and interview-questions. So I realize that I need to know about SELECT, JOIN and WHERE.
Questions:
What are essential SQL, Postgres and database concepts that I need to know in order to do well in the interview?
What do I need to know about transaction and normalization?
What are some general ways to optimize slow queries?
Should I learn about the functions, keywords or both?
It depends on how much of the role is based around database development and design. For your SQL syntax, you should also understand the difference between the types of joins, and be able to use GROUP BY, ORDER BY, HAVING as well as the aggregate functions that can be used in conjunction with them.
In terms of performance monitoring, I would be looking at execeution plans (not sure about the Postgres equivalent) and how they can provide tips on increasing performance, as well as using SQL Profiler to see what instructions the server is executing in real time.
Transactions can be useful for rolling back, well, transactions (stored procs, ad-hoc queries etc.) that require queries to complete in a certain way to maintain data consistency. Some people (myself included) have a practice of placing any statements that make any changes to data into a transaction that automatically rolls back (BEGIN TRAN ... ROLLBACK TRAN) to check that the correct amount of data is manipulated before pushing changes to a live server. Have a look at the ACID model - Atomicity, Consistency, Isolation, Durability.
Normalization is something that can take a little time to go through, but just know and partially understand up to 3rd form normalization and that will get you started.
Optimisation can be a huge topic. Just remember to try and do things like UPDATE using set based queries, rather than row based (updating in a WHILE loop is an example of row based updating, but it CAN have its uses).
I hope this helps a little.
Besides the basics of sql syntax, which you listed, you should know some things about query performance. What are some common causes of slow queries and what are the remedies for those, and how can you evaluate the performance of a query.