Slow query using Pyhs2 to fetch data in Hive

I tried to use Pyhs2 to communicate with Hive, fetch data, and put it in a list (stored temporarily in RAM).
But even a very simple HQL query like 'select field1, field2, ... from table_name' takes a very long time: the table has about 7 million rows and fewer than 20 fields, yet the whole process takes nearly 90 minutes.
My server: CentOS 6.5, 8 CPUs (32 logical processors), 32 GB RAM.
Hadoop cluster: more than 200 machines.
Can someone help me solve this problem? Thanks very much.
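
For reference, the fetch pattern in question looks roughly like the sketch below; the host, credentials, and column names are placeholders, and it follows the documented pyhs2 connect/cursor/fetch usage.

import pyhs2

# Placeholder connection details -- adjust to the actual HiveServer2 host/credentials.
with pyhs2.connect(host='hive-server.example.com',
                   port=10000,
                   authMechanism='PLAIN',
                   user='hive',
                   password='',
                   database='default') as conn:
    with conn.cursor() as cur:
        # Simple projection over the ~7 million row table.
        cur.execute("select field1, field2 from table_name")
        # fetch() pulls every row through HiveServer2 and keeps it all in RAM;
        # for millions of rows this row transfer, not the query planning,
        # is usually where the time goes.
        rows = cur.fetch()

print(len(rows))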

Related

postgres rds slow response time

We have an AWS RDS Postgres DB of type t3.micro.
I am running simple queries on a pretty small table and getting pretty high response times - around 2 seconds per query run.
Query example:
select * from package where id='late-night';
CPU usage is not high (around 5%).
We tried creating a bigger RDS instance (t3.medium) from a snapshot of the original one and performance did not improve at all.
Table size: 2,600 rows.
We tested the connection over both the external and the internal IP.
Disk size: 20 GiB.
Storage type: SSD.
Is there a way to improve performance?
Thanks for the help!
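
One way to narrow this down is to time the connection setup separately from the query itself. A rough sketch with psycopg2 (the DSN values are placeholders) splits the two:

import time
import psycopg2

# Placeholder DSN -- substitute the actual RDS endpoint and credentials.
dsn = "host=mydb.xxxxxxxx.rds.amazonaws.com dbname=mydb user=app password=secret"

t0 = time.time()
conn = psycopg2.connect(dsn)              # time spent opening the connection
t1 = time.time()

with conn.cursor() as cur:
    cur.execute("select * from package where id = %s", ('late-night',))
    rows = cur.fetchall()                 # time spent running the query and transferring rows
t2 = time.time()

conn.close()
print("connect: %.3fs  query: %.3fs  rows: %d" % (t1 - t0, t2 - t1, len(rows)))

If most of the 2 seconds is in the connect step, connection pooling or running the client closer to the RDS endpoint is likely to help more than a bigger instance class; a lookup by id on a 2,600-row table should be close to instant once the table is cached.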

Running into an issue when running a query on Impala

I've recently been running into the following issue when running basic queries on Impala (for example: select * from table limit 100). I did some research online but have not found a fix for this. Any insights on how I could fix it? I use Hue for querying.
ExecQueryFInstances rpc query_id=5d4f8d25428xxxx:813cfbd30000xxxx
failed: Failed to get minimum memory reservation of 8.00 MB on daemon
ser1830.xxxx.com:22000 for query xxxxxxxxx:813cfbd30xxxxxxxxx
due to following error: Failed to increase reservation by 8.00 MB
because it would exceed the applicable reservation limit for the
"Process" ReservationTracker: reservation_limit=68.49 GB
reservation=68.49 GB used_reservation=0 child_reservations=68.49 GB
The top 5 queries that allocated memory under this tracker are:
Query(5240724f8exxxxxx:ab377425000xxxxxx): Reservation=41.81 GB
ReservationLimit=64.46 GB OtherMemory=133.44 MB Total=41.94 GB
Peak=42.62 GB Query(394dcbbaf6bxxxxx2f4760000xxxxxx0):
Reservation=26.68 GB ReservationLimit=64.46 GB OtherMemory=92.94 KB
Total=26.68 GB Peak=26.68 GB Query(5d4f8d25428xxxxx:813cfbd30000xxxxx):
Limit=100.00 GB Reservation=0 ReservationLimit=64.46 GB OtherMemory=0
Total=0 Peak=0 Memory is likely oversubscribed. Reducing query
concurrency or configuring admission control may help avoid this
error.
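
The error message itself points at the cause: the daemon's process-wide reservation limit (68.49 GB) is already consumed by two other large queries, so this query cannot reserve even the minimum 8 MB. Besides lowering concurrency or configuring admission control, one possible mitigation is to cap per-query memory so a single query cannot occupy most of the process reservation. A rough sketch using the impyla client (the host name and the 2g cap are illustrative values, and it assumes an Impala version that accepts SET as a statement over HiveServer2):

from impala.dbapi import connect

# Placeholder coordinator host; 21050 is impalad's default HiveServer2 port.
conn = connect(host='impala-coordinator.example.com', port=21050)
cur = conn.cursor()

# Cap this session's queries so one heavy query cannot claim most of the
# process-wide reservation (the 2g value is illustrative, not a recommendation).
cur.execute("SET MEM_LIMIT=2g")

cur.execute("select * from some_table limit 100")
for row in cur.fetchall():
    print(row)

cur.close()
conn.close()

The same option can be set per session in Hue; cluster-wide, admission control pools are the usual way to keep the sum of reservations under the process limit.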

mysqldump performance on machine with big amount of memory

I'm backing up an InnoDB database with mysqldump. It takes 2 minutes to perform a backup.
Is there any way to speed it up?
The machine has 120 GB of RAM, so I expect the whole database to fit in memory.
Database size on disk is around 8 GB:
[user#host:E018 mysql]$ du -hs database
8.3G database
The biggest table has 12,054,861 records and a data size of 2,991,587,328 bytes (about 2.8 GB).
I have tried playing with innodb_buffer_pool_size but I don't see a big performance increase.
The first run of mysqldump takes 2 min 7 sec; a second run takes around 2 min, which is still too slow.
I have also tried compressing the output to avoid a lot of disk writes:
mysqldump database | pigz > database-dmp.sql.gz
but that has no influence on performance.
Running mysqldump on a machine other than the one hosting the MySQL server does not change anything either.
My guess is that MySQL is either not caching the data in memory or sending it to mysqldump too slowly.
Here is the configuration that I use:
max_heap_table_size=10G;
innodb_file_per_table=1
innodb_file_format=barracuda
innodb_strict_mode=1
innodb_write_io_threads=4
innodb_read_io_threads=4
innodb_adaptive_hash_index=0
innodb_support_xa=0
innodb_buffer_pool_size=40G
innodb_log_file_size=256M
innodb_log_files_in_group=3
innodb_log_buffer_size=64M
What else can I try to improve mysqldump performance?
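
Since a single mysqldump stream is serial, one thing that may be worth trying is dumping the largest tables in parallel. This is not something mysqldump does by itself; the sketch below drives it from Python with subprocess, the table names are placeholders, and note that separate per-table dumps do not form one consistent snapshot of the whole database.

import subprocess
from concurrent.futures import ThreadPoolExecutor

DB = "database"
# Placeholder table names -- in practice, list them with SHOW TABLES.
TABLES = ["big_table", "orders", "customers"]

def dump_table(table):
    # --single-transaction gives a consistent read per table dump, but the
    # per-table dumps together are NOT one consistent snapshot of the database.
    out = "%s.%s.sql.gz" % (DB, table)
    cmd = "mysqldump --single-transaction %s %s | pigz > %s" % (DB, table, out)
    subprocess.check_call(cmd, shell=True)
    return out

# Run a few dumps concurrently; this mainly helps when the server has spare
# CPU and I/O headroom, since each individual stream is still serial.
with ThreadPoolExecutor(max_workers=3) as pool:
    for path in pool.map(dump_table, TABLES):
        print("wrote", path)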

Moving data from one table to another in SQL Server 2005

I am moving around 10 million rows from one table to another in SQL Server 2005. The purpose of the transfer is to take old data offline.
After some time it throws the error: "The log file for database 'tempdb' is full."
My tempdb and templog are placed on a drive (other than the C drive) which has around 200 GB free. Also, my tempdb size is set to 25 GB.
As I understand it, I will have to increase the size of tempdb from 25 GB to 50 GB and set the log file autogrowth option to "unrestricted file growth (MB)".
Please let me know what other factors matter. I cannot experiment much as I am working on a production database, so please let me know whether these changes will have any other impact.
Thanks in advance.
You already know the solution. It seems you are just moving part of the data to make your queries faster.
I agree with your approach:
"As I understand it, I will have to increase the size of tempdb from 25 GB to 50 GB and set the log file autogrowth option to 'unrestricted file growth (MB)'."
Go ahead.
My guess is that you're trying to move all of the data in a single batch. Can you break it up into smaller batches and commit fewer rows per insert? Also, as noted in the comments, you may be able to set your destination database to the SIMPLE or BULK_LOGGED recovery model.
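To illustrate that batching idea, here is a rough sketch driven from Python with pyodbc; the connection string, the live_table/archive_table names, and the key column are placeholders, and it assumes an increasing integer key with a commit per batch.

import pyodbc

# Placeholder connection string, table names, and columns -- adjust to the real schema.
conn = pyodbc.connect(
    "DRIVER={SQL Server};SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes"
)
cur = conn.cursor()

BATCH = 50000   # rows per transaction; small batches keep tempdb/log usage bounded
last_id = 0     # assumes an increasing integer key and an initially empty archive table

while True:
    # Copy the next slice of old rows, keyed on id.
    cur.execute(
        """
        INSERT INTO archive_table (id, col1, col2)
        SELECT TOP (?) id, col1, col2
        FROM live_table
        WHERE id > ?
        ORDER BY id
        """,
        BATCH, last_id,
    )
    if cur.rowcount == 0:
        break
    cur.execute("SELECT MAX(id) FROM archive_table")
    last_id = cur.fetchone()[0]
    conn.commit()   # committing per batch keeps each transaction small

conn.close()

The same loop can be written directly in T-SQL with TOP and a WHILE loop if driving it from outside the database is not an option.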
Why are you using the full recovery model at all? Back up your data (data and log file), then set the recovery model to SIMPLE and run the transfer again.

Concurrent Reads from Highly Transactional Table

Currently, I have a highly transactional database with approximately 100,000 inserts daily. Do I need to be concerned if I start allowing a large number of concurrent reads from my main transaction table? I am not concerned about concurrency so much as performance.
At present there are 110+ million transactions in this table, and I am using SQL Server 2005.
In 2002, a Dell server with 2 GB of RAM and a 1.3 GHz CPU served 25 concurrent users as a file server, a database server, and an ICR server (very CPU-intensive). Users and the ICR server continuously inserted, read, and updated one data table with 80+ million records, where each operation required 25 to 50 insert or update statements. It worked like a charm 24/7 for almost a year. If you use decent indexes, and your selects use those indexes, it will work.
As @huadianz proposed, a read-only copy will do even better.