I'm new to Hive and came across the term "Operations of Hive".
When I asked my peers about it, I got different answers.
Some say it means the built-in operators.
Some say ACID and CRUD operations.
The rest say DML and DDL operations.
Can anyone tell me which is the proper one?
I've been looking over various questions on loading data into Firebird, and apparently it's possible to run multiple INSERT statements in a batch... if you're running a script in ISQL and all the values are written inline.
If you're trying to do this from code, of course, you have two problems here: 1) ISQL syntax doesn't work, and 2) any developer with 20 minutes of SQL experience knows that writing data values inline into your query is an unholy abomination that will summon nasal demons, cause your hair to fall out, and oh by the way leave your system open to SQL Injection vulnerabilities.
But I haven't found any solution whatsoever about running bulk inserts from application code. I haven't even found anyone discussing it. Apparently there's a mechanism for quick-loading data from "external tables" if you write it out to a file in the right format, but there's precious little information available on how that works, and what is available claims that it has problems with concepts as simple as blobs and even nulls!
So I'm just about at my wits' end here. Does any mechanism at all exist to allow 3rd-party application code to bulk-load any and all data supported by Firebird into a FB database?
There are a few options:
A prepared parameterized statement executed in a loop.
The IBatch class in the Firebird 4 OO API.
The IReplicator class in the Firebird 4 OO API, which is tricky but the fastest possible option.
In any case, parsing the source data format and transforming the values into types supported by Firebird is up to the application programmer. There is no silver bullet that will "load anything".
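For the first option, the statement is prepared once by the application and then executed repeatedly with different bound values, so no data is ever written inline into the query text. A minimal sketch of the SQL side of that (the table and columns here are hypothetical, and the ? placeholders are bound through whatever driver or API you are using):

    /* Hypothetical target table. */
    CREATE TABLE CUSTOMERS (
        ID      INTEGER NOT NULL PRIMARY KEY,
        NAME    VARCHAR(100),
        BALANCE NUMERIC(18, 2)
    );

    /* Prepared once, then executed once per source row, with the
       parameter values bound by the driver rather than inlined. */
    INSERT INTO CUSTOMERS (ID, NAME, BALANCE) VALUES (?, ?, ?);

Committing every few thousand rows rather than once per row usually helps throughput noticeably.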
I am a bit surprised to learn that Hive now has an UPDATE statement (although it looks like it has been around since v0.14), even though I have been aware for some time that full or near-RDBMS SQL functionality is on Hive's roadmap.
Can you summarize how Hive's INSERT, UPDATE and DELETE differ from those of relational databases, and what their limitations are (Hive is at v2.1.0 as of this writing)?
If Hive keeps improving its RDBMS-like SQL capabilities over, say, the next 2-3 years, will it then be useful for relational DB workloads?
(I'm not aware of the full roadmap, though. Pardon me if this is a stupid question, or a question born of laziness in browsing the documentation.)
Hive has supported INSERT for a while. For UPDATE and DELETE operations, however, the following requirements apply:
only the ORC file format is supported
only bucketed tables
the table must be created with TBLPROPERTIES ("transactional"="true")
Latency is still an issue with these operations. The page below describes the use cases for which ACID support was introduced; note that, per the roadmap, Hive is not intended to replace transactional relational databases.
https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-Limitations
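A minimal HiveQL sketch of a table that satisfies those requirements (the table and column names are made up for illustration; depending on your Hive version you may also need to configure the transaction manager on the session or server side):

    -- ORC storage, bucketing, and the transactional property:
    -- the three requirements listed above.
    CREATE TABLE customer_txn (
        id      INT,
        name    STRING,
        balance DECIMAL(18, 2)
    )
    CLUSTERED BY (id) INTO 8 BUCKETS
    STORED AS ORC
    TBLPROPERTIES ("transactional" = "true");

    -- With that in place, row-level changes become possible:
    UPDATE customer_txn SET balance = 0 WHERE id = 42;
    DELETE FROM customer_txn WHERE id = 43;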
We currently have a running project that uses an RDBMS database (with lots of tables and stored procedures for manipulating data). The current flow is: the data access layer calls stored procedures, which insert/delete/update or fetch data from the RDBMS (please note that these stored procedures do not do any bulk processing). The current data structure contains lots of primary key / foreign key relationships and has lots of updates to existing tables. I just want to know whether we can use HBase for our purpose, and if so, how we can use Hadoop with HBase to replace the RDBMS?
You need to ask yourself, what is the RDBMS not doing for you, and what is it that you hope to achieve by moving to Hadoop/HBase?
This article may help. There are a lot more.
http://it.toolbox.com/blogs/madgreek/nosql-vs-rdbms-apples-and-oranges-37713
If the purpose is trying new technology, I suggest trying their tutorial/getting started.
If it's a clear problem you're trying to solve, then you may want to articulate the problem.
Good Luck!
I hesitate to suggest replacing your current RDBMS, simply because of the large developer effort that you've already spent. Consider that your organization probably has no employees with the needed HBase experience. Moving to HBase, with the attendant data conversion and application rewriting, would be very expensive and risky.
Introduction:
So, I have an interview tomorrow and I'm trying to review SQL and databases. The job posting says that they want someone with:
Experience with database design and development
Strong knowledge of SQL
Experience with SQL Server and/or Postgres
I've read through Questions every good database SQL developer should be able to answer, and a bunch of questions tagged with SQL and interview-questions. So I realize that I need to know about SELECT, JOIN and WHERE.
Questions:
What are essential SQL, Postgres and database concepts that I need to know in order to do well in the interview?
What do I need to know about transactions and normalization?
What are some general ways to optimize slow queries?
Should I learn about the functions, keywords or both?
It depends on how much of the role is based around database development and design. For your SQL syntax, you should also understand the difference between the types of joins, and be able to use GROUP BY, ORDER BY, HAVING as well as the aggregate functions that can be used in conjunction with them.
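For example, a single query exercising most of those (the table and column names here are just made up for illustration):

    -- Total order value per customer, only for customers with more
    -- than five orders, largest totals first.
    SELECT c.name,
           COUNT(*)      AS order_count,
           SUM(o.amount) AS total_amount
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name
    HAVING COUNT(*) > 5
    ORDER BY total_amount DESC;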
In terms of performance monitoring, I would be looking at execution plans (not sure about the Postgres equivalent) and how they can provide tips on increasing performance, as well as using SQL Profiler to see what instructions the server is executing in real time.
Transactions can be useful for rolling back, well, transactions (stored procs, ad-hoc queries etc.) that require queries to complete in a certain way to maintain data consistency. Some people (myself included) have a practice of placing any statements that make any changes to data into a transaction that automatically rolls back (BEGIN TRAN ... ROLLBACK TRAN) to check that the correct amount of data is manipulated before pushing changes to a live server. Have a look at the ACID model - Atomicity, Consistency, Isolation, Durability.
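In SQL Server that habit might look roughly like this (the table is hypothetical; you only swap the ROLLBACK for a COMMIT once the row count looks right):

    BEGIN TRAN;

    UPDATE orders
    SET status = 'cancelled'
    WHERE customer_id = 42;

    -- Check how many rows were touched before deciding to keep the change.
    SELECT @@ROWCOUNT AS rows_affected;

    ROLLBACK TRAN;  -- replace with COMMIT TRAN once you're satisfied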
Normalization is something that can take a little time to go through, but just know and at least partially understand normalization up to third normal form (3NF), and that will get you started.
Optimisation can be a huge topic. Just remember to try to do things like UPDATE using set-based queries rather than row-based ones (updating in a WHILE loop is an example of row-based updating, but it CAN have its uses).
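To contrast the two styles (purely illustrative table and column names), the set-based version touches every qualifying row in one statement, while the WHILE loop walks them one at a time:

    -- Set-based: one statement updates every qualifying row.
    UPDATE products
    SET price = price * 1.10
    WHERE category = 'books';

    -- Row-based: a WHILE loop over the same rows, one at a time.
    DECLARE @id INT;
    SELECT @id = MIN(id) FROM products WHERE category = 'books';
    WHILE @id IS NOT NULL
    BEGIN
        UPDATE products SET price = price * 1.10 WHERE id = @id;
        SELECT @id = MIN(id) FROM products WHERE category = 'books' AND id > @id;
    END;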
I hope this helps a little.
Besides the basics of SQL syntax, which you listed, you should know some things about query performance: what the common causes of slow queries are, what the remedies for those are, and how you can evaluate the performance of a query.
I need to get data out of SQL Server 2005 tables and into another system.
My vendor says:
"We donĀ“t recommend that you go directly in the SQL and collect data, because it can result in
corruption of data or you can lock tables while exporting."
Is that true?
Yes. You could lock tables while exporting. You can use the WITH(NOLOCK) hint if you want to avoid locks (but be aware you could read 'stale' or otherwise inconsistent data).
What do they mean by corruption of data? You can't corrupt data just by reading it (though without locks you could read inconsistent data).
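For example (the table name here is hypothetical):

    -- Reads without taking shared locks, so it won't block writers,
    -- but it may return uncommitted ("dirty") or inconsistent data.
    SELECT id, name, balance
    FROM dbo.customers WITH (NOLOCK);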
According to Microsoft themselves the answer to the question is the following:
"ODBC access to Microsoft Navision is fully supported for Read operations although write operations need careful attention as the business logic is bypassed (for example triggers are not executed)."
Source: page 15 in http://www.navisionguider.dk/downloads/Nav_IntegrationGuide1.2.pdf
Does anyone have experience using ODBC for read operations only? Does it disturb write operations in any critical way (are write operations made impossible, or is data destroyed)?
Or is it just a performance issue (slower writes while you're exporting/reading tons of data)?
I guess I could experience dirty reads (reading outdated data), but write operations should still be possible for others?