Does anyone know where to look for an example of a Hive schema?
Say you want to define a table with 500 columns and you've put those 500 columns in an Excel spreadsheet. How can you get Hive to read the schema from the spreadsheet and create the table we want?
We don't necessarily need to tie ourselves to a spreadsheet; I am just using that as an example.
Thanks.
Well, it's really a nice requirement; I have faced a couple of issues like this myself. But as far as I know there is no direct way to do it. You need to write some code (for example in Java) that reads from the source (in your case an Excel spreadsheet), generates the CREATE TABLE statement, and executes it in Hive.
You might check GitHub for open-source projects that offer this facility, but Hive itself does not do it. A rough sketch of the idea is shown below.
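A minimal Python sketch of that approach, assuming the spreadsheet has been exported to a two-column CSV of column name and Hive type; the file name, table name, and storage format here are placeholders:

```python
# Minimal sketch: read "column_name,hive_type" pairs exported from the
# spreadsheet as columns.csv and print a Hive CREATE TABLE statement.
# File name, table name, and storage clause are placeholders.
import csv

def build_create_table(columns_csv, table_name):
    with open(columns_csv, newline="") as f:
        cols = [(row[0].strip(), row[1].strip()) for row in csv.reader(f) if row]
    col_defs = ",\n  ".join(f"`{name}` {hive_type}" for name, hive_type in cols)
    return f"CREATE TABLE IF NOT EXISTS {table_name} (\n  {col_defs}\n) STORED AS ORC;"

if __name__ == "__main__":
    # Print the DDL; run it via beeline / the Hive CLI, or submit it over JDBC.
    print(build_create_table("columns.csv", "my_table"))
```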
Hope it helps...!!!
Can anyone help with how to convert HANA SQL tables to .hdb tables and use them? To convert to .hdb files, I first imported the table in .csv format, but after this I am not sure how to convert it to an .hdb table. Can someone describe the process?
I'm not really sure what you're going for, but using hdb tables is as easy as creating table_name.hdb in exactly the same format (i.e. COLUMN TABLE ...) as the table was created in the "classic" schema. See the SAP help on hdbtables.
You can use the SAP HANA developer CLI's massConvert functionality to convert one or more tables to hdbtable.
Note that this will only take care of the table structure. If you have data that you want to keep, you will have to copy it manually, for example via a CSV export/import.
I need to transform a fairly big database table to CSV with AWS Glue. However, I only need the newest table rows, from the past 24 hours. There is a column which specifies the creation date of each row. Is it possible to transform just these rows, without copying the whole table into the CSV file? I am using a Python script with Spark.
Thank you very much in advance!
There are some built-in transforms in AWS Glue which are used to process your data. These transforms can be called from ETL scripts.
Please refer to the link below:
https://docs.aws.amazon.com/glue/latest/dg/built-in-transforms.html
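For instance, the built-in Filter transform can drop the older rows. A minimal sketch, assuming the table is registered in the Glue Data Catalog and has a created_at timestamp column (database, table, and output path are placeholders); note that Filter runs after the rows have been read, so it does not push the 24-hour condition down to the source:

```python
# Minimal sketch of the Glue built-in Filter transform. Assumes a Data
# Catalog table with a "created_at" timestamp column; database, table,
# and S3 path are placeholders.
import datetime

from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.transforms import Filter

glue_context = GlueContext(SparkContext.getOrCreate())
cutoff = datetime.datetime.now() - datetime.timedelta(hours=24)

source = glue_context.create_dynamic_frame.from_catalog(
    database="my_database", table_name="my_table")

# Keep only rows created within the last 24 hours.
recent = Filter.apply(frame=source, f=lambda row: row["created_at"] >= cutoff)

glue_context.write_dynamic_frame.from_options(
    frame=recent,
    connection_type="s3",
    connection_options={"path": "s3://my-bucket/exports/last24h/"},
    format="csv")
```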
You haven't mentioned the type of database that you are trying to connect to. In any case, for JDBC connections Spark has a query option, with which you can issue the usual SQL query to fetch only the rows you need. For example:
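A minimal sketch, assuming a JDBC-reachable source with a created_at timestamp column; the URL, credentials, table name, and output path are placeholders, and the date arithmetic depends on your database's SQL dialect:

```python
# Minimal PySpark sketch: push the 24-hour filter down to the source
# database via the JDBC "query" option, so only matching rows are read.
# URL, credentials, table/column names, and output path are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("last-24h-export").getOrCreate()

recent_rows = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://db-host:5432/mydb")   # placeholder URL
    .option("query", "SELECT * FROM my_table "
                     "WHERE created_at >= NOW() - INTERVAL '24 hours'")
    .option("user", "my_user")
    .option("password", "my_password")
    .load()
)

recent_rows.write.mode("overwrite").csv("s3://my-bucket/exports/last24h/", header=True)
```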
I was told that there is a way to do this but I have no idea. I am hoping that someone more educated than I can help me understand the answer...
There is a table that has been imported from an external source via SSIS. The destination in SSIS needs to be updated frequently from the external source. The users are complaining about performance problems in their queries.
How would you update this destination in SSIS to achieve these goals?
Anyone have a clue? I'm "dry"...
If your users are complaining about performance, then it is not an SSIS issue. You need to look at what queries are running against the table. Make sure the table has a primary key and appropriate indexes based on the columns used to sort and filter the data.
Can you give us a listing of the table definition?
I'll give you some advice that may improve your SSIS performance:
Use a SQL statement rather than the table dropdown when you import data from the database into SSIS (with a SQL statement you can import, filter, convert, and sort in one step).
Import and filter only the columns you need in SSIS.
Some references say you can tune the default buffer settings (DefaultBufferSize & DefaultBufferMaxRows); doing so can improve data-flow throughput.
I have a fairly simple CSV file that I would like to use within a SQL query. I'm using Oracle SQL Developer, but none of the solutions I have found on the web so far seem to have worked. I don't need to store the data (unless I can use temp tables?); I just need to query it and show the results.
Thank you!
You need to create an EXTERNAL TABLE. This essentially maps a CSV (or indeed any flat file) to a table, which you can then use in queries. You will not be able to perform DML on the external table. A sketch of the setup is below.
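A minimal sketch, assuming a DBA has created an Oracle directory object (csv_dir here, a placeholder) pointing at the server-side folder that holds the file; the connection details, table name, columns, and file name are placeholders too. The same CREATE TABLE statement can be run directly from a SQL Developer worksheet; here it is wrapped in a small python-oracledb script:

```python
# Minimal sketch: create an external table that maps a CSV file to a table.
# Assumes the directory object csv_dir already exists and points at the
# folder containing data.csv. All names and credentials are placeholders.
import oracledb

ddl = """
CREATE TABLE csv_ext (
  id   NUMBER,
  name VARCHAR2(100)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY csv_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    SKIP 1
    FIELDS TERMINATED BY ','
    OPTIONALLY ENCLOSED BY '"'
    MISSING FIELD VALUES ARE NULL
  )
  LOCATION ('data.csv')
)
REJECT LIMIT UNLIMITED
"""

with oracledb.connect(user="my_user", password="my_password",
                      dsn="db-host/orclpdb1") as conn:
    with conn.cursor() as cur:
        cur.execute(ddl)                       # create the external table
        cur.execute("SELECT * FROM csv_ext")   # query it like any other table
        for row in cur.fetchmany(5):
            print(row)
```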
I have a very large data set in GPDB from which I need to extract close to 3.5 million records. I use this to build a flat file which is then used to load different tables. I use Talend, and do a select * from the table using the tGreenplumInput component and feed that to a tFileOutputDelimited. However, due to the very large volume of data, I run out of memory while executing it on the Talend server.
I lack superuser permissions and am unable to do a \copy to output it to a CSV file. I think something like a do-while loop or a tLoop with a more limited number of rows might work for me, but my table doesn't have any row_id or uid to distinguish the rows.
Please help me with suggestions on how to solve this. I appreciate any ideas. Thanks!
If your requirement is to load data from one table into different tables, then you do not need to go through a flat file at all (table to file, then file to table).
There is a component named tGreenplumRow which allows you to write SQL queries (DDL and DML) directly in it.
Below is a sample job.
If you look at the component, there are three INSERT statements inside it; they are executed one by one, separated by semicolons.
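The sample-job screenshot isn't reproduced here, but the idea is simply several INSERT ... SELECT statements executed against Greenplum, so the data never leaves the database. If you want to prototype the same thing outside Talend, here is a minimal Python sketch using psycopg2; the connection details, table names, and columns are all hypothetical:

```python
# Minimal sketch of the same idea outside Talend: run INSERT ... SELECT
# statements directly against Greenplum so no flat file is needed.
# Connection details and table/column names are placeholders.
import psycopg2

statements = [
    "INSERT INTO target_a SELECT col1, col2 FROM source_table WHERE region = 'A'",
    "INSERT INTO target_b SELECT col1, col3 FROM source_table WHERE region = 'B'",
    "INSERT INTO target_c SELECT col1, col4 FROM source_table WHERE region = 'C'",
]

conn = psycopg2.connect(host="gpdb-host", dbname="mydb",
                        user="my_user", password="my_password")
try:
    with conn, conn.cursor() as cur:
        for stmt in statements:
            cur.execute(stmt)   # each INSERT ... SELECT runs inside the database
finally:
    conn.close()
```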