I have a script with 8 Impala queries that I run through Airflow. I want to create a log file when the script finishes, showing the fetched rows, the execution time, and some logs. Is this possible, and how can I achieve it?
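One way to do this, sketched below under the assumption that the queries are run with the impyla package from a plain Python callable: the script times each query, counts the fetched rows, and writes everything to a log file. The query list, host, and the file name run_queries.log are placeholders.

import logging
import time

from impala.dbapi import connect  # assumes the impyla package is installed

# Placeholder list standing in for your 8 Impala queries.
QUERIES = [
    "SELECT COUNT(*) FROM my_table_1",
    "SELECT COUNT(*) FROM my_table_2",
]

# Everything logged here ends up in a file you can open after the run.
logging.basicConfig(
    filename="run_queries.log",
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)

def run_queries(host="impala-host", port=21050):
    conn = connect(host=host, port=port)
    cursor = conn.cursor()
    for sql in QUERIES:
        start = time.time()
        cursor.execute(sql)
        rows = cursor.fetchall()
        logging.info("query=%r fetched_rows=%d execution_time=%.2fs",
                     sql, len(rows), time.time() - start)
    cursor.close()
    conn.close()

if __name__ == "__main__":
    run_queries()

In Airflow you could call run_queries() from a PythonOperator task; the file then gives you a single summary log in addition to Airflow's per-task logs.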
Related
I want to create a table for logging in Hive. This table should contain basic details of a Sqoop job that runs every day: the name of the job / the table name to which the data was loaded by the Sqoop job, the number of records ingested, whether the ingestion succeeded or failed, and the time of ingestion.
A .log file is created after every Sqoop job run, but this log file is not structured in a way that can be loaded directly into the Hive table using the LOAD DATA INPATH command. I would really appreciate it if someone could point me in the right direction: should a shell script be written to achieve this, or something else?
Thank you in advance
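Not an authoritative recipe, but one possible direction, assuming the Sqoop log contains the usual "Retrieved N records." line, that beeline can reach HiveServer2, and that the Hive version supports INSERT ... VALUES (0.14+): a small wrapper parses the .log file and appends one structured row to an audit table. The table name sqoop_audit, its columns, and the JDBC URL are hypothetical; a shell script with grep/awk plus a Hive INSERT would do the same job.

import re
import subprocess
from datetime import datetime

def log_sqoop_run(log_path, job_name, table_name):
    # Parse the Sqoop .log file produced by the daily run.
    with open(log_path) as f:
        log_text = f.read()

    # Sqoop imports normally log a line such as "Retrieved 12345 records."
    match = re.search(r"Retrieved (\d+) records", log_text)
    records = int(match.group(1)) if match else 0
    status = "SUCCESS" if match else "FAILED"
    ingested_at = datetime.now().strftime("%Y-%m-%d %H:%M:%S")

    # Hypothetical audit table:
    #   sqoop_audit(job_name STRING, table_name STRING, record_count BIGINT,
    #               status STRING, ingestion_time STRING)
    hql = ("INSERT INTO TABLE sqoop_audit VALUES "
           f"('{job_name}', '{table_name}', {records}, '{status}', '{ingested_at}')")
    subprocess.run(["beeline", "-u", "jdbc:hive2://localhost:10000", "-e", hql],
                   check=True)

# Example: log_sqoop_run("/path/to/sqoop_job.log", "daily_import", "customers")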
I have created a few Athena queries to generate reports.
The business wants these reports to run nightly and have the output of the query emailed to them.
My first step is to schedule the execution of the saved/named Athena queries so that I can collect the output of the query execution from the S3 buckets.
Is there a way to automate the execution of the queries on a periodic basis?
You can schedule events in AWS using Lambda (see this tutorial). From Lambda you can run just about anything, including triggering an Athena query.
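A minimal sketch of such a Lambda handler using boto3, assuming the function is triggered on a schedule (for example by a CloudWatch Events / EventBridge rule); the query, database, and S3 output location below are placeholders:

import boto3

athena = boto3.client("athena")

# Placeholders: replace with your database, saved query text, and S3 bucket.
DATABASE = "reports"
OUTPUT_LOCATION = "s3://my-report-bucket/athena-results/"
QUERY = "SELECT * FROM nightly_report"

def lambda_handler(event, context):
    # Start the Athena query; the CSV result lands under OUTPUT_LOCATION in S3.
    response = athena.start_query_execution(
        QueryString=QUERY,
        QueryExecutionContext={"Database": DATABASE},
        ResultConfiguration={"OutputLocation": OUTPUT_LOCATION},
    )
    # The id can be used later to poll the status or locate the result file.
    return response["QueryExecutionId"]

A follow-up step (another scheduled function, or the same one after polling get_query_execution) could then pick up the CSV from S3 and email it, for example through SES.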
I usually connect to the gateway node through PuTTY and run Hive queries there.
On several occasions the queries run for hours at a stretch, and at least a few times PuTTY has been disconnected and the execution of the queries aborted.
Is there a way to store hive query results somehow, so that I can inspect them at later points of time?
I don't want to create another table just to store the results.
You can store your result in an HDFS directory:
INSERT OVERWRITE DIRECTORY 'outputpath' SELECT * FROM table
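As a usage note, assuming the hdfs CLI is available on the gateway node: the statement writes plain text files under 'outputpath' in HDFS (fields separated by Hive's default \x01 delimiter), so after reconnecting you can read the results back, for example:

import subprocess

# 'outputpath' is the HDFS directory used in the INSERT OVERWRITE DIRECTORY above.
out = subprocess.run(["hdfs", "dfs", "-cat", "outputpath/*"],
                     capture_output=True, text=True, check=True)
print(out.stdout)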
I am new to BODS. At present I have configured a job to execute every 2 minutes to extract transactions from a MySQL server and load them into HANA tables.
But sometimes, when the data volume in MySQL is too large to transform and load into HANA within 2 minutes, the next iteration of the same job starts while the previous one is still executing, which results in a BODS failure.
My question is: is there any option in BODS to check the execution status of the scheduled job between runs?
Please help me out with this.
You can create a control/audit table to keep a history of each run of the BODS job. The table should contain fields like ExtractionStartTime, ExtractionEndTime, and a run status. You then need to change the job so that it reads the status of the previous run from this table before starting the load-to-HANA data flow. If the previous run has not finished, the job can raise an exception.
Let me know if this has been helpful or if you need more information.
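Purely to illustrate the control-table pattern described above (in BODS itself the check would live in a script object at the start of the job, for example using its sql() function against the real audit table), here is a runnable sketch using sqlite3; the table JOB_AUDIT, its columns, and the job name are hypothetical:

import sqlite3
from datetime import datetime

conn = sqlite3.connect("job_audit.db")
conn.execute("""CREATE TABLE IF NOT EXISTS JOB_AUDIT (
                    job_name TEXT, extraction_start TEXT,
                    extraction_end TEXT, status TEXT)""")

def start_run(job_name="MYSQL_TO_HANA"):
    # Check the status of the previous run before starting the load.
    row = conn.execute(
        "SELECT status FROM JOB_AUDIT WHERE job_name = ? "
        "ORDER BY extraction_start DESC LIMIT 1", (job_name,)).fetchone()
    if row and row[0] == "RUNNING":
        # Equivalent of raising an exception inside the BODS job.
        raise RuntimeError("Previous run still executing; skipping this run")
    conn.execute("INSERT INTO JOB_AUDIT VALUES (?, ?, NULL, 'RUNNING')",
                 (job_name, datetime.now().isoformat()))
    conn.commit()

def end_run(job_name="MYSQL_TO_HANA"):
    conn.execute(
        "UPDATE JOB_AUDIT SET extraction_end = ?, status = 'FINISHED' "
        "WHERE job_name = ? AND status = 'RUNNING'",
        (datetime.now().isoformat(), job_name))
    conn.commit()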
I have this simple query, which works fine in Hive 0.8 on IBM BigInsights 2.0:
SELECT * FROM patient WHERE hr > 50 LIMIT 5
However, when I run this query using Hive 0.12 on BigInsights 3.0, it runs forever and returns no results.
The scenario is the same for the following query and many others:
INSERT OVERWRITE DIRECTORY '/Hospitals/dir'
SELECT p.patient_id FROM patient1 p WHERE p.readingdate='2014-07-17'
If I exclude the WHERE clause, everything is fine in both versions.
Any idea what might be wrong with Hive 0.12 or BigInsights 3.0 when a WHERE clause is included in the query?
When you use a WHERE clause in a Hive query, Hive runs a map-reduce job to return the results. That's why the query usually takes longer: without the WHERE clause, Hive can simply return the content of the file that represents the table in HDFS.
You should check the status of the map-reduce job that is triggered by your query to find out if an error happened. You can do that by going to the Application Status tab in the BigInsights web console and clicking on Jobs, or by going to the job tracker web interface. If you see any failed tasks for that job, check the logs of the particular task to find out what error occurred. After fixing the problem, run the query again.