Slow pywinauto Import - automation

When I import almost any module, it loads seemingly instantly, or at least fast enough to be unnoticeable.
However, pywinauto is an exception. When I try to import it, it takes a huge amount of time (~1 min), which is highly annoying for the users.
I am wondering if there is any way to speed up loading the module.

If you use backend='uia', this looks impossible, because importing comtypes and loading UIAutomationCore.dll take most of the time and are functionally required.
But if you only need the default backend='win32' (i.e. you create the Application() object without a backend parameter), you can run pip uninstall -y comtypes. Only the Win32 backend will be available after that, but the import should work much faster.
More details about these two backends can be found in the Getting Started Guide.
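For illustration, here is a minimal sketch of the two ways of constructing an Application object (the "notepad.exe" target is just a placeholder):
from pywinauto import Application

# Default Win32 backend: still works after uninstalling comtypes
app = Application().start("notepad.exe")             # same as backend="win32"

# UIA backend: needs comtypes and UIAutomationCore.dll,
# which is what makes the import slow
# app = Application(backend="uia").start("notepad.exe")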

Related

keep getting "distributed.utils_perf - WARNING - full garbage collections took 19% CPU time..."

I keep getting "distributed.utils_perf - WARNING - full garbage collections took 19% CPU time recently" warning message after I finished DASK code. I am using DASK doing a large seismic data computing. After the computing, I will write the computed data into disk. The writing to disk part takes much longer than computing. Before I wrote the data to the disk, I call client.close(), which I assume that I am done with DASK. But "distributed.utils_perf - WARNING - full garbage collections took 19% CPU time recently" keep coming. When I doing the computing, I got this warning message 3-4 times. But when I write the data to the disk, I got the warning every 1 sec. How can I get ride of this annoying warning? Thanks.
The same was happening for me in Colab, where we start the session with
client = Client(n_workers=40, threads_per_worker=2)
I terminated all my Colab sessions and installed and imported all the Dask libs:
!pip install dask
!pip install cloudpickle
!pip install 'dask[dataframe]'
!pip install 'dask[complete]'
from dask.distributed import Client
import dask.dataframe as dd
import dask.multiprocessing
Now everything is working fine and I am not facing any issues.
I don't know how this solved my issue :D
I had been struggling with this warning too. I would get many of these warnings and then the workers would die. I was getting them because I had some custom Python functions for aggregating my data that were handling large Python objects (dicts). It makes sense that so much time was being spent on garbage collection if I was creating these large objects.
I refactored my code so more computation was done in parallel before the results were aggregated, and the warnings went away.
I looked at the progress chart on the status page of the Dask dashboard to see which tasks were taking a long time to process (Dask tries to name tasks after the function in your code that called them, which can help, but they're not always that descriptive). From there I could figure out which part of my code I needed to optimise.
You can disable garbage collection in Python:
import gc
gc.disable()
I found that it was easier to manage Dask worker memory through periodic use of the Dask client restart: Client.restart()
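A minimal sketch of that pattern, assuming the work is split into batches processed by a process function (both are hypothetical names):
from dask.distributed import Client

client = Client()                        # connect to a local or existing cluster

for batch in batches:                    # hypothetical iterable of work batches
    futures = client.map(process, batch)
    results = client.gather(futures)
    # ... write results to disk ...
    client.restart()                     # restart workers to release accumulated memory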
Just create a process to run the Dask cluster and return the IP address. Create the client using that IP address.
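One possible sketch of that idea, using multiprocessing together with dask.distributed's LocalCluster (the worker counts are placeholders):
from multiprocessing import Process, Queue
import time

from dask.distributed import Client, LocalCluster

def run_cluster(q):
    # Start the cluster in a separate process and report its scheduler address
    cluster = LocalCluster(n_workers=4, threads_per_worker=2)
    q.put(cluster.scheduler_address)     # e.g. "tcp://127.0.0.1:8786"
    while True:                          # keep the process (and the cluster) alive
        time.sleep(60)

if __name__ == "__main__":
    q = Queue()
    Process(target=run_cluster, args=(q,), daemon=True).start()
    client = Client(q.get())             # connect from the main process via the address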

Is it possible to use Datalab with multiprocessing as a way to scale Pandas transformations?

I am trying to use Google Cloud Datalab to scale up data transformations in Pandas.
On my machine, everything works fine with small files (keeping the first 100000 rows of my file), but working with the full 8 GB input CSV file leads to a MemoryError.
I thought that a Datalab VM would help me. I first tried a highmem VM, going up to 120 GB of memory.
There, I keep getting the error: The kernel appears to have died. It will restart automatically.
I found something here:
https://serverfault.com/questions/900052/datalab-crashing-despite-high-memory-and-cpu
But I am not using TensorFlow, so it didn't help much.
So I tried a different approach: chunk processing and parallelising over more cores. It works well on my machine (4 cores, 12 GB RAM), but still requires hours of computation.
So I wanted to use a Datalab VM with 32 cores to speed things up, but after 5 hours the first threads still hadn't finished, whereas on my local machine 10 had already completed.
So, very simply:
Is it possible to use Datalab as a way to scale Pandas transformations?
Why do I get worse results with a theoretically much better VM than my local machine?
Some code:
import pandas as pd
import numpy as np
from OOS_Case.create_features_v2 import process
from multiprocessing.dummy import Pool as ThreadPool
df_pb = pd.read_csv('---.csv')
list_df = []
for i in range(-):
    df = df_pb.loc[---]
    list_df.append(df)
pool = ThreadPool(4)
pool.map(process, list_df)
All the operations in my process function are pure Pandas and NumPy operations.
Thanks for any tip, alternative or best-practice advice you could give me!
One year later, I have learned some useful best practices:
Use Google AI Platform and select to build a VM with the required number of CPUs
Use Dask for multithreading
With Pandas, it is possible to parallelise .apply() with pandarallel (see the sketch below)
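A minimal sketch of the pandarallel approach (the CSV path and the row-wise transform are placeholders, as in the question above):
import pandas as pd
from pandarallel import pandarallel

pandarallel.initialize()                        # uses all available cores by default

df = pd.read_csv('---.csv')                     # placeholder path

def transform(row):                             # hypothetical row-wise transformation
    return row.sum()

result = df.parallel_apply(transform, axis=1)   # parallel drop-in for .apply()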
It seems that GCP Datalab does not support multithreading:
Each kernel is single threaded. Unless you are running multiple notebooks at the same time, multiple cores may not provide significant benefit.
You can find more information here.

Run pip in IDLE

Sorry for this stupid question. I am new to Python and I am currently using IDLE for Python programming. Is there any way to hide the output generated by this command?
pip.main(['install', 'modulename'])
I was trying to install matplotlib with pip in IDLE, but both the speed and IDLE itself got slower and slower, and finally the process seemed endless. So I was wondering whether I could improve the speed a little by hiding the output.
I tried code like this:
import sys
import pip
import io

stdout_real = sys.stdout
sys.stdout = io.StringIO()
try:
    pip.main(["install", "matplotlib"])
finally:
    stdout_real.write(sys.stdout.getvalue())
    sys.stdout = stdout_real
The code is from "How to import/open numpy module to IDLE", but unfortunately it did not work.
I also tried to use the -q / --quiet flag, but to be honest, I am struggling with how to pass the flag to pip in IDLE. I tried code like:
pip.main(['-q'])
and
pip.main(['--quiet'])
Neither of them works.
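Presumably the flag has to go together with the install command itself, something like the following (just a guess on my part; newer pip versions no longer expose pip.main at all):
import pip

# Guess: pass the quiet flag alongside the install command
pip.main(['install', '--quiet', 'matplotlib'])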
Can anyone give me some suggestions about this, or about improving the download speed?
Thanks so much!

How to resume loading in PhpMyAdmin?

I am using XAMPP and phpMyAdmin and I'm trying to load the English Wikipedia dump. Since the file is so big (1.7 GB), it takes a lot of time. I'm wondering if there is any way to resume the loading process. I have no problem with timeouts or anything like that. The problem is that if my Firefox crashes for any reason, the process must start from scratch.
The "Allow interrupt" option is already checked. But the problem is that for such a big file, it's really hard to expect the import to finish without any interruption. If the laptop is shut down or restarted, the process starts again from the beginning. Is there any way to solve this problem?
In the meantime, I am using
$cfg['UploadDir'] = 'upload';
and load the file from the upload directory on my computer.
Thanks in advance
First, I would recommend against using phpMyAdmin for such a large file. You're going to be constrained by PHP/Apache resource limits for things such as execution time and memory (or, apparently, some Firefox resource on the client side), to the degree that, even if it works properly, the import will have to be done in so many small chunks that it's just not ideal. Even using the UploadDir functionality, you're going to be limited in ways that make it non-ideal to import your file this way. I suggest using the command-line tool for importing a file of this size.
Secondly, if you're going to use phpMyAdmin anyway, it's better to uncompress the file and deal with the raw .sql. This is not intuitive, because of course you think the smaller filesize is better, but phpMyAdmin has to first uncompress the compressed file before it can begin working with it, which can cause problems such as the resource limits (or even running out of disk space). phpMyAdmin can pick up an aborted import, but if you're spending 95% of the execution time uncompressing the file each time, you're going to make very, very slow progress. Actually, I wonder if you're even getting the full file uncompressed on execution before PHP kills the process due to timeout.
phpMyAdmin can pick up execution part way through; you can select which line to begin the import from. If you restart your computer part way through the import, you can use this to resume your partial import.

Speeding up the Dojo Build

We are running a build of our application using Dojo 1.9, and the build itself is taking an inordinate amount of time to complete, somewhere along the lines of 10-15 minutes.
Our application is not huge by any means, maybe 150K LOC. Nothing fancy. Furthermore, when running this build locally using Node, it takes less than a minute.
However, we run the build on a RHEL server with plenty of space and memory, using Rhino. In addition, the tasks are invoked through Ant.
We also use Shrinksafe as the compression mechanism, which could also be the problem. It seems like Shrinksafe is compressing the entire Dojo library (which is enormous) each time the build runs, which seems silly.
Is there anything we can do to speed this up? Or anything we're doing wrong?
Yes, that is inordinate. I have never seen a build take so long, even on an Atom CPU.
In addition to the prior suggestion to use Node.js and not Rhino (by far the biggest killer of build performance), if all of your code has been correctly bundled into layers, you can set optimize to empty string (don’t optimize) and layerOptimize to "closure" (Closure Compiler) in your build profile so only the layers will be run through the optimizer.
Other than that, you should make sure that there isn’t something wrong with the system you are running the build on. (Build files are on NAS with a slow link? Busted CPU fan forcing CPUs to underclock? Ancient CPU with only a single core? Insufficient/bad RAM? Someone else decided to install a TF2 server on it and didn’t tell you?)