How to make Hive Terminal show rows (not just headers) after code is run? - hive

As of now, Hive Terminal is showing only column headers after a create table code is run. What settings should I change to make Hive Terminal show few rows also, say first 100 rows?
Code I am using to create table t2 from table t1 which resides in the database (I don't know how t1 is created):
create table t2 as
select *
from t1
limit 100;
Now while development, I am writing select * from t2 limit 100; after each create table section to get the rows with headers.

You cannot
The Hive Create Table documentation does not mention anything about showing records. This, combined with my experience in Hive makes me quite confident that you cannot achieve this by mere regular config changes.
Of course you could tap into the code of hive itself, but that is not something to be attempted lightly.
And you should not want to
Changing the create command could lead to all kinds of problems. Especially because unlike the select command, it is in fact an operation on metadata, followed by an insert. Both of these normally would not show you anything.
If you would create a huge table, it would be problematic to show everything. If you choose always to just show the first 100 rows, that would be inconsistent.
There are ways
Now, there are some things you could do:
Change hive itself (not easy, probably not desirable)
Do it in 2 steps (what you currently do)
Write a wrapper:
If you want to automate things and don't like code duplication, you can look into writing a small wrapper function to call the create and select based on just the input of source (and limit) and destination.
This kind of wrapper could be written in bash, python, or whatever you choose.
However, note that if you like executing the commands ad-hoc/manually this may not be suitable, as you will need to start a hive JVM each time you run such a program and thus response time is expected to be slow.
All in all you are probably best off just doing the create first and select second.

The below command mentioned seems to be correct to show the first 100 rows:
select * from <created_table> limit 100;
Paste the code you have written to create the table will help to diagnose the issue in hand!!
Nevertheless , check if you have correctly mentioned the delimiters for the elements, key-value pairs, collection items etc while creating the table.
If you have not defined them correctly you might end up with having only the first row(header) being shown.

Related

Oracle - allowing the LIKE clause being configured by another select

In a script I'm running the following query on an Oracle database:
select nvl(max(to_char(DATA.VALUE)), 'OK')
from DATA
where DATA.FILTER like (select DATA2.FILTER from DATA2 where DATA2.FILTER2 = 'WYZ')
In the actual script it's a bit more complicated, but you get the idea. ;-)
DATA2.FILTER contains the filter which needs to applied on DATA as a LIKE -clause. The idea is to have it as generic as possible, meaning it should be possible to filter on:
FILTER%
%FILTER
FI%LTER
%FILTER%
FILTER (as if the clause was DATA.FILTER = (select DATA2.FILTER from DATA2 where DATA2.FILTER2 = 'WYZ')
The system the script runs on does not allow stored procedures to run for this kind of task, also I can't make the scrip build the query directly before running it.
Whatever data needs to be fetched, it has to be done with one single query.
I've already tried numerous solutions I found online, but no matter what I do, I seem to be misisng the mark.
What am I doing wrong here?
This one?
select nvl(max(to_char(DATA.VALUE)), 'OK')
from DATA
JOIN DATA2 on DATA.FILTER LIKE DATA2.FILTER
where DATA2.FILTER2 = 'WYZ'
Note: The performance of this query is not optimal, because for table DATA Oracle will perform always a "Full Table Scan" - but it works.

Get last few query results in SQL

I frequently do a static analysis of SQL databases, during which I have the luxury of nobody being able to change the data except me.
However, I have not found a way to 'tell' this to SQL in order to prevent running the same query multiple times.
Here is what I would like to do, first I start with a complicated query that has a very small output.
SELECT * FROM MYTABLE WHERE MYPROPERTY = 1234
Then I run a simple query from the same window (Mostly using SQL server studio if that is relevant)
SELECT 1
Now I suddenly realize that I forgot to save the results from my first complicated (slow) query.
As I know the underlying data did not change (or even if it did) I would like to look one step back and simply get the result. However at the moment I don't know any trick to do this and I have to run the entire query again.
So the question summary is: How can I (automatically store/)get the results from recently executed queries.
I am particulary interested in simple select queries, and would be happy to allocate say 100MB memory for automated result storage. Would prefer a solution that works in SQL server studio with T-SQL, but other SQL solutions are also welcome.
EDIT: I am not looking for a way to manually prevent this from happening. In the cases where I can anticipate the problem it will not happen.
This can't be done in Microsoft SQL Server. SQL Server does not cache results, instead it caches data pages that were accessed by your query. This should make your query go a lot faster the second time around so it won't be as painful to re-run it.
In other databases, such as Oracle and MySQL, they do have a query caching mechanism that will allow you to retrieve the results directly the second time around.
I run into this frequently, I often just throw the results of longer-running queries into a temp table:
SELECT *
INTO #results1
FROM MYTABLE WHERE MYPROPERTY = 1234
SELECT *
FROM #results1
If the query is very long-running I might use a 'real' table. It's a good way to save on re-run time.
Downside is that it adds to your query.
You can also send query results to a file in SSMS, info on formatting the output is here: SSMS Results to File
The easiest way to do this is to run each query in its own SSMS window, the results will stay there until you close it, or run out of memory - besides that, I am not sure there is a way to accomplish what you want.
Once you close the SSMS window, I don't believe there is a way to get back 'cached' results.
This isn't a technical answer to your question. Having written queries and looking at results for many years, I am in the habit of saving the results in Excel, regardless of the database/query tool I'm using.
The format in Excel is rather methodical:
Each worksheet has the date. (Called something like "1 Jul".)
Each spreadsheet contains one month. (Typically with the month name like "work-201307".)
In the "B" column I copy and paste the query.
Underneath, in the "C" column, I copy and paste the results.
The next query starts a few lines after, one after the other.
I put the queries in the "B" column, so I can go to the "A" column and use to get to the first row. I put the results in the "C" column, so I can go to the "B" column and use to move between queries.
I mostly do this so I can go back and see the work I did many months ago. For instance, someone sends an email from February and says "do this again". I can go back to the February spreadsheet, go to the day it was created, and see what I was doing at that time.
In your question, though, I realize that I now instinctively solve this problem, because the "right click on the grid, copy with column headers, alt-tab to excel, alt-V" is a behavior that I comes quite naturally.
I was going to suggest you to run each query into a script with a counter (stored in a table) increased each time the query is executed (i.e. i++) and storing each query in a Temp Table called "tmpTable" + i, but it sounds very complicated to manage. Am I right?
Then I googled and I've found this Tool Pack: I didn't try it but you could take a look:
http://www.ssmstoolspack.com/Features
Hope it helps.
EDIT: added the folliwing link. There's the option to output as XML file and they mention SQL Server Integration Services as a possible solution too.
http://michaeljswart.com/2012/03/sending-query-results-to-others/#method5
SECOND EDIT: There's this DBMS-Independent tool too, it sounds interesting:
http://www.sql-workbench.net/
i am not sure this is what you want. Anyway check my answer
In sql server management studio you can open multiple tabs for executing queries. Open new tab for each query, then the result of executed queries will be available under that tab.
After executing one query in a tab dont use that tab for new query, open new tab for that job.
Have you considered using some kind of offline SQL client such as Excel? Specifically, Excel will retrieve the results into the spread sheet (using the Data ribbon/menus) where they are stored pretty much permanently as results. It will prompt you to refresh when necessary or you can do it on demand.
Your question as to whether it can be done in T/SQL or other databases depends on the database and results cache and even then they are options that the query processor can use not guarantees to the individual query.

Fill your tables with junk data?

I am lazy, sometimes excruciatingly lazy but hey (ironically) this is how we get stuff done right?
Had a simple idea that may or not be out there. If it is I would like to know and if not perhaps I will make it.
When working with my MSSQL database sometimes I want to test the performance of various transactions over tables and view and procedures etc... Does anyone know if there is a way to fill a table up with x rows of junk data mearly to experiment with.
One could simple enough..
INSERT INTO `[TABLE]`
SELECT `COLUMNS` FROM [`SOURCE_TABLE`]
Or do some kind of...
DECLARE count int
SET count = 0
WHILE count <= `x`
BEGIN
INSERT INTO `[TABLE]`
(...column list...)
VALUES
(...VALUES (could include the count here as a primary key))
SET count = count + 1
END
But it seems like there is or should already be something out there. Any ideas??
I use redgate
SQL Data generator
Use a Data Generation Plan (a feature of Visual Studio database projects).
WinSQL seems to have a data generator (which I did not test) and has a free version. But the Test data generation wizard seems to be reserved to the Pro version.
My personal favorite would be to generate a CSV file (using a 4.5 lines script) and load it into your SQL DB using BULK INSERT. This will also allow better customization of the data as sometimes is needed (e.g. when writing tests).

Move SELECT to SQL Server side

I have an SQLCLR trigger. It contains a large and messy SELECT inside, with parts like:
(CASE WHEN EXISTS(SELECT * FROM INSERTED I WHERE I.ID = R.ID)
THEN '1' ELSE '0' END) AS IsUpdated -- Is selected row just added?
as well as JOINs etc. I like to have the result as a single table with all included.
Question 1. Can I move this SELECT to SQL Server side? If yes, how to do this?
Saying "move", I mean to create a stored procedure or something else that can be executed before reading dataset in while cycle.
The 2 following questions make sense only if answer is "yes".
Why do I want to move SELECT? First off, I don't like mixing SQL with C# code. At second, I suppose that server-side queries run faster, since the server have more chances to cache them.
Question 2. Am I right? Is it some sort of optimizing?
Also, the SELECT contains constant strings, but they are localizable. For instance,
WHERE R.Status = "Enabled"
"Enabled" should be changed for French, German etc. So, I want to write 2 static methods -- OnCreate and OnDestroy -- then mark them as stored procedures. When registering/unregistering my assembly on server side, just call them respectively. In OnCreate format the SELECT string, replacing {0}, {1}... with required values from the assembly resources. Then I can localize resources only, not every script.
Question 3. Is it good idea? Is there an existing attribute to mark methods to be executed by SQL Server automatically after (un)registartion an assembly?
Regards,
Well, the SQL-CLR trigger will also execute on the server, inside the server process - so that's server-side as well, no benefit there.
But I agree - triggers ought to be written in T-SQL whenever possible - no real big benefit in having triggers in C#.... can you show the the whole trigger code?? Unless it contains really odd balls stuff, it should be pretty easy to convert to T-SQL.
I don't see how you could "move" the SELECT to the SQL side and keep the rest of the code in C# - either your trigger is in T-SQL (my preference), or then it is in C#/SQL-CLR - I don't think there's any way to "mix and match".
To start with, you probably do not need to do that type of subquery inside of whatever query you are doing. The INSERTED table only has rows that have been updated (or inserted but we can assume this is an UPDATE Trigger based on the comment in your code). So you can either INNER JOIN and you will only match rows in the Table with the alias of "R" or you can LEFT JOIN and you can tell which rows in R have been updated as the ones showing NULL for all columns were not updated.
Question 1) As marc_s said below, the Trigger executes in the context of the database. But it goes beyond that. ALL database related code, including SQLCLR executes in the database. There is no client-side here. This is the issue that most people have with SQLCLR: it runs inside of the SQL Server context. And regarding wanting to call a Stored Proc from the Trigger: it can be done BUT the INSERTED and DELETED tables only exist within the context of the Trigger itself.
Question 2) It appears that this question should have started with the words "Also, the SELECT". There are two things to consider here. First, when testing for "Status" values (or any Lookup values) since this is not displayed to the user you should be using numeric values. A "status" of "Enabled" should be something like "1" so that the language is not relevant. A side benefit is that not only will storing Status values as numbers take up a lot less space, but they also compare much faster. Second is that any text that is to be displayed to the user that needs to be sensitive to language differences should be in a table so that you can pass in a LanguageId or LocaleId to get the appropriate French, German, etc. strings to display. You can set the LocaleId of the user or system in general in another table.
Question 3) If by "registration" you mean that the Assembly is either CREATED or DROPPED, then you can trap those events via DDL Triggers. You can look here for some basics:
http://msdn.microsoft.com/en-us/library/ms175941(v=SQL.90).aspx
But CREATE ASSEMBLY and DROP ASSEMBLY are events that are trappable.
If you are speaking of when Assemblies are loaded and unloaded from memory, then I do not know of a way to trap that.
Question 1.
http://www.sqlteam.com/article/stored-procedures-returning-data
Question 3.
It looks like there are no appropriate attributes, at least in Microsoft.SqlServer.Server Namespace.

Is there a way to parser a SQL query to pull out the column names and table names?

I have 150+ SQL queries in separate text files that I need to analyze (just the actual SQL code, not the data results) in order to identify all column names and table names used. Preferably with the number of times each column and table makes an appearance. Writing a brand new SQL parsing program is trickier than is seems, with nested SELECT statements and the like.
There has to be a program, or code out there that does this (or something close to this), but I have not found it.
I actually ended up using a tool called
SQL Pretty Printer. You can purchase a desktop version, but I just used the free online application. Just copy the query into the text box, set the Output to "List DB Object" and click the Format SQL button.
It work great using around 150 different (and complex) SQL queries.
How about using the Execution Plan report in MS SQLServer? You can save this to an xml file which can then be parsed.
You may want to looking to something like this:
JSqlParser
which uses JavaCC to parse and return the query string as an object graph. I've never used it, so I can't vouch for its quality.
If you're application needs to do it, and has access to a database that has the tables etc, you could run something like:
SELECT TOP 0 * FROM MY_TABLE
Using ADO.NET. This would give you a DataTable instance for which you could query the columns and their attributes.
Please go with antlr... Write a grammar n follow the steps..which is given in antlr site..eventually you will get AST(abstract syntax tree). For the given query... we can traverse through this and bring all table ,column which is present in the query..
In DB2 you can append your query with something such as the following, but 1 is the minimum you can specify; it will throw an error if you try to specify 0:
FETCH FIRST 1 ROW ONLY