How to compare Hive and SQL tables

We have created a scheduler that pulls data from an on-premises SQL Server and puts it on HDFS. Now we need to verify that the pushed data is correct and consistent with the on-premises data.
Could you please help me understand how to compare these tables and their data for correctness? Anything will help. Thanks.

You can use Sqoop, which supports validating the copied data against the source database with the --validate option (it compares row counts between the source and the target).
Refer to the Sqoop User Guide - Validation section for details and for its limitations.
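For example, a minimal sketch of an import with validation (the host, database, table, and paths are placeholders; adjust to your environment):

    # Import a SQL Server table to HDFS and compare row counts afterwards.
    sqoop import \
      --connect "jdbc:sqlserver://sqlhost:1433;databaseName=sales" \
      --username etl_user \
      --password-file /user/etl/sqlserver.pwd \
      --table orders \
      --target-dir /data/sales/orders \
      --validate

Note that --validate compares row counts only; for cell-level correctness you would still need a deeper check, such as comparing checksums of sorted exports from both sides.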

Related

Converting HANA tables to hdb tables

Can anyone help with how to convert HANA SQL tables to .hdb tables and use them? To convert to .hdb files, I first imported the table in .csv format, but after this I am not sure how to convert it to an .hdb table. Can someone describe the process?
I'm not really sure what you are going for, but using hdb tables is as easy as creating table_name.hdb in exactly the same format (i.e. COLUMN TABLE ...) as it was created in the "classic" schema. See the SAP help on hdbtables.
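For illustration, a minimal design-time table file might look like this (the file name and columns are made up, and the exact artifact format depends on your HANA/HDI version):

    -- mytable.hdbtable - a minimal column table definition (illustrative only)
    COLUMN TABLE MYTABLE (
        ID   INTEGER       PRIMARY KEY,
        NAME NVARCHAR(100)
    )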
You can use the SAP HANA developer CLI's massConvert functionality to convert one or more tables to hdbtable.
Note that this will only take care of the table structure. If you have data that you want to keep, you will have to copy it manually, for example via a CSV export/import.
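A rough sketch of that workflow (hana-cli is the SAP HANA developer CLI from SAP's samples; the exact flags are an assumption, so check hana-cli massConvert --help for the options in your version):

    # Install the SAP HANA developer CLI (assumes Node.js/npm are available).
    npm install -g hana-cli
    # Convert catalog tables to design-time .hdbtable files; the command
    # prompts for connection details, schema, and tables if not supplied.
    hana-cli massConvert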

AWS Glue: Is it possible to pull only specific data from a database?

I need to transform a fairly big database table to CSV with AWS Glue. However, I only need the newest table rows, from the past 24 hours. There is a column which specifies the creation date of each row. Is it possible to transform just these rows, without copying the whole table into the CSV file? I am using a Python script with Spark.
Thank you very much in advance!
There are Built-in Transforms in AWS Glue which are used to process your data. These transforms can be called from ETL scripts; for example, the Filter transform keeps only the rows that satisfy a predicate (see the sketch below).
Please refer to the link below for details:
https://docs.aws.amazon.com/glue/latest/dg/built-in-transforms.html
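A minimal sketch of that approach, assuming a Data Catalog table mydb.mytable with a timestamp column created_at (all names and the S3 path are placeholders):

    # Filter a Glue DynamicFrame to rows from the last 24 hours, then write CSV.
    import datetime

    from awsglue.context import GlueContext
    from awsglue.transforms import Filter
    from pyspark.context import SparkContext

    glue_context = GlueContext(SparkContext.getOrCreate())

    dyf = glue_context.create_dynamic_frame.from_catalog(
        database="mydb", table_name="mytable")

    cutoff = datetime.datetime.utcnow() - datetime.timedelta(hours=24)

    # Keep only rows whose creation timestamp falls inside the window
    # (assumes created_at is a timestamp column in the catalog schema).
    recent = Filter.apply(frame=dyf, f=lambda row: row["created_at"] >= cutoff)

    glue_context.write_dynamic_frame.from_options(
        frame=recent,
        connection_type="s3",
        connection_options={"path": "s3://my-bucket/exports/"},
        format="csv")

Keep in mind that Filter prunes rows after Glue has read them, so the whole table is still scanned; the JDBC query option described below avoids that.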
You haven't mentioned the type of database you are trying to connect to. Anyway, for JDBC connections Spark has the query option, in which you can issue a usual SQL query to fetch only the rows you need.
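A minimal PySpark sketch, assuming a PostgreSQL source (the URL, credentials, and table/column names are placeholders, and the interval syntax varies by database):

    # Push the 24-hour filter down to the database via the JDBC "query" option
    # (available in Spark 2.4+); only matching rows are transferred.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("export-recent-rows").getOrCreate()

    recent = (spark.read.format("jdbc")
              .option("url", "jdbc:postgresql://dbhost:5432/mydb")
              .option("user", "etl_user")
              .option("password", "secret")
              .option("query",
                      "SELECT * FROM mytable "
                      "WHERE created_at >= NOW() - INTERVAL '24 hours'")
              .load())

    # Write only the filtered rows out as CSV.
    recent.write.csv("s3://my-bucket/exports/", header=True)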

Visualization Using Tableau

I am new to Tableau, am having performance issues, and need some help. I have a Hive query result in Azure Blob Storage, stored in a file named part-00000.
The issue is that I want to execute a custom query in Tableau and generate graphical reports from it in Tableau.
Can I do this? How?
I have 7.0 M records in the Hive table.
You can use a custom SQL query in the data source connection dialog.
You might want to consider creating an extract instead of a live connection. Additional considerations include hiding unused fields and applying filters at the data source level to limit the data to what is required.

Is it possible to have a Hive table as source and an Oracle table as target in Informatica Developer tool with the Hive environment?

I want to use a Hive table as source and an Oracle table as target in my Informatica Developer tool mapping with the Hive environment. That is, I want to run my Informatica Developer tool mapping in Hive mode. Is it possible? If yes, please let me know the steps.
Thanks
Yes, it is possible; you just need to change the execution engine.
Refer to the Informatica BDM help documentation for the detailed steps.
You can create an Oracle connection via Sqoop and use it with any of the Hadoop engines in Informatica BDM.
Yes, this is possible using Sqoop for Oracle, but there is a limitation: with Sqoop you can only insert into Oracle.
You can use PowerExchange for Hadoop to load data from PowerCenter to new Hive tables.
Sources or targets will appear in PowerCenter as ODBC data sources or ODBC targets.

Get SQL code to generate the schema for an existing database in PostgreSQL

Given an existing database in PostgreSQL, I want to get the required SQL code to generate an identical database with no records.
Is there any easy way to do so?
You can use the pg_dump command to do that. The -s option dumps only the schema and no data from the database.
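For example (the database names and file path are placeholders):

    # Dump only the schema (no rows) of the existing database.
    pg_dump -s mydb > schema.sql
    # Create an empty database and load the schema into it.
    createdb mydb_empty
    psql -d mydb_empty -f schema.sql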