Databricks considering files as directory - pandas

We are facing an issue on the Databricks filesystem, which treats some files as directories, so we are unable to read them with pandas. The files exist in Azure Storage Explorer, where they are shown as regular files.
We have mounted the storage with OAuth 2.0.
On Databricks,
%sh ls -al '<path_to_files>'
returns the following :
total 1127
drwxrwxrwx 2 root root 4096 Jan 29 09:26 .
drwxrwxrwx 2 root root 4096 Jan 9 13:47 ..
drwxrwxrwx 1 root root 136705 Jan 28 16:35 AAAA_2019-10-01_2019-12-27.csv
drwxrwxrwx 1 root root 183098 Jan 28 16:35 BBBB_2019-10-01_2019-12-27.csv
-rwxrwxrwx 1 root root 313120 Jan 28 16:35 CCCC_2019-10-01_2019-12-27.csv
-rwxrwxrwx 1 root root 212935 Jan 29 09:26 df_cube.csv
-rwxrwxrwx 1 root root 298228 Jan 29 09:26 df_other_cube.csv
The thing is, the first two CSV files are not directories at all. We can download them and read them as CSV, but we cannot load them into a pandas DataFrame.
df = pd.read_csv(rootname_source_test + r'AAAA_2019-10-01_2019-12-27.csv',header=0,sep="|",engine='python')
>>> IsADirectoryError: [Errno 21] Is a directory: '/dbfs/mnt/<path>/AAA_2019-10-01_2019-12-27.csv'
They are generated the same way as the third CSV, and the third one is loadable in pandas. Sometimes they appear as files, sometimes as directories, and we are having trouble reproducing and solving this consistently.
Cluster config : Runtime 6.2 ML (includes Apache Spark 2.4.4, Scala 2.11)
Any help would be much appreciated.
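While narrowing this down, it can help to ask the operating system how it classifies each entry right before the path is handed to pandas. The following is a minimal diagnostic sketch, not a fix: classify_entries and unreadable_csvs are hypothetical helper names chosen for illustration, and they only use os.stat from the standard library, so they report what the /dbfs mount says at that moment rather than what Azure Storage holds.

```python
import os
import stat


def classify_entries(directory):
    """Return {name: 'file' | 'directory' | 'other'} for each entry in directory."""
    result = {}
    for name in sorted(os.listdir(directory)):
        mode = os.stat(os.path.join(directory, name)).st_mode
        if stat.S_ISDIR(mode):
            result[name] = "directory"
        elif stat.S_ISREG(mode):
            result[name] = "file"
        else:
            result[name] = "other"
    return result


def unreadable_csvs(directory):
    """List *.csv entries that the filesystem currently reports as directories."""
    return [
        name
        for name, kind in classify_entries(directory).items()
        if name.endswith(".csv") and kind == "directory"
    ]
```

Calling unreadable_csvs(rootname_source_test) just before pd.read_csv, and logging the result on each run, would show whether the misclassification is already visible at the mount level when the read fails.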

meson doesn't find binary dependency

I compiled wayland from source code with this command
meson --buildtype=release -D prefix=$HOME/mylib -D documentation=false
then installed it with ninja. Now in $HOME/mylib I have this structure:
total 24K
drwxr-xr-x 6 myuser myuser 4.0K Dec 3 19:52 .
drwxr-xr-x 16 myuser myuser 4.0K Dec 4 17:41 ..
drwxr-xr-x 2 root root 4.0K Dec 3 19:52 bin
drwxr-xr-x 2 root root 4.0K Dec 3 19:52 include
drwxr-xr-x 3 root root 4.0K Dec 3 19:52 lib
drwxr-xr-x 4 root root 4.0K Dec 3 19:52 share
Inside the bin folder I have wayland-scanner, and when I run this command
wayland-scanner -v
I get this output:
wayland-scanner 1.21.90
Now, when I use meson to build other source code that has wayland-scanner as a dependency, I get this error:
../tests/meson.build:2:0: ERROR: Invalid version of dependency, need 'wayland-scanner' ['>=1.20.0'] found '1.18.0'.
This version comes from another wayland-scanner that is located here:
/usr/bin/wayland-scanner
with version 1.18.0. The command
echo $PATH
prints:
/home/myuser/mylib/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/games:/usr/games
Why doesn't meson find the updated version of wayland-scanner? Setting PKG_CONFIG_PATH doesn't help; I get the same error as above.
I don't know the wayland package, but from the description I suspect that /usr/bin/wayland-scanner is a link to the old installation.
Look through your environment for wayland-scanner binaries and check whether any link was not updated to point to the new installation.
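The check suggested above can be sketched as a short script. This is only an illustration under assumptions: find_on_path is a hypothetical helper, and it inspects the PATH search order and symbolic links only, not how meson itself resolves the dependency.

```python
import os


def find_on_path(program, path=None):
    """Return (candidate, resolved_target) pairs for every executable named
    `program` on the search path, in lookup order."""
    if path is None:
        path = os.environ.get("PATH", "")
    matches = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, program)
        # isfile() and access() both follow symbolic links
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            matches.append((candidate, os.path.realpath(candidate)))
    return matches


if __name__ == "__main__":
    for candidate, target in find_on_path("wayland-scanner"):
        # Show the resolved location only when it differs from the candidate
        note = "" if candidate == target else " -> " + target
        print(candidate + note)
```

Each printed line is one wayland-scanner found on PATH, in the order the shell would try them; an arrow indicates that the entry resolves to a different location on disk.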

Why does ag-grid code appear in my StencilJS component bundle?

I am trying to optimize a StencilJS component that uses ag-grid as a third-party dependency. Currently, rollup adds the ag-grid definitions to my bundle, resulting in a ~1 MB bundle size (archived).
Is there a way to achieve code separation in my bundle? Or am I looking at things the wrong way?
This component is used in a Vue app alongside another third-party web component that also uses ag-grid, so the ag-grid code will be duplicated in this scenario.
The component library is fairly simple with only 2 ts components:
$ ls -l src/components/product-picker/
total 30
-rw-r--r-- 1 197121 1692 Aug 8 10:35 readme.md
-rw-r--r-- 1 197121 41 Jul 9 13:38 product-filter.scss
-rw-r--r-- 1 197121 6521 Jul 19 09:34 product-filter.tsx
-rw-r--r-- 1 197121 371 Jun 25 11:51 product-picker.scss
-rw-r--r-- 1 197121 10089 Aug 8 10:21 product-picker.tsx
-rw-r--r-- 1 197121 1630 Jul 15 17:00 product-picker-grid.scss
drwxr-xr-x 1 197121 0 Jun 25 11:51 test/
This is how I import the ag-grid in my tsx component
import { ModuleRegistry } from '@ag-grid-community/core';
import { ClientSideRowModelModule } from '@ag-grid-community/client-side-row-model';
import { Grid, GridOptions } from '@ag-grid-community/core';
[...]
ModuleRegistry.registerModules([
  ClientSideRowModelModule,
]);
@Component({
  tag: 'product-picker',
  styleUrl: 'product-picker.scss',
})
This is what the build output looks like:
$ ls -lh dist/esm/
total 2.2M
-rw-r--r-- 1 197121 1 Aug 8 10:35 index.js
-rw-r--r-- 1 197121 41K Aug 8 10:35 index-0f6f2d39.js
-rw-r--r-- 1 197121 931 Aug 8 10:35 loader.js
drwxr-xr-x 1 197121 0 Aug 8 10:35 polyfills/
-rw-r--r-- 1 197121 2.2M Aug 8 10:35 product-filter_2.entry.js
-rw-r--r-- 1 197121 947 Aug 8 10:35 product-picker.js
The product-filter_2.entry.js file has all the lines of code related to ag-grid.
Option 1
You could try using TypeScript 3.8's type-only imports (the import type { ... } from '...' form).
That way, you get the type safety of TS without including all of ag-grid in your Stencil bundle. This assumes that some other component on the page does include ag-grid in a way that makes it accessible to your Stencil component.
Option 2
If you're using webpack you could consider marking ag-grid as an external library, not to be included in the generated bundle.
There's a similar question around not bundling the React runtime with every component, and instead adding it just once on the page with a script tag to reduce bundle sizes.
https://github.com/webpack/webpack/issues/1275#issuecomment-123846260
{
...
externals: {
// Use external version of React
"react": "React"
},
...
}
Option 3
You could leave ag-grid out of your Stencil project's dependencies and instead rely on the existence of window.agGrid or something similar. Personally I dislike that option because you then get no type safety in your Stencil project.

IntelliJ claims to build but jar files are not touched

I have a Java project that builds correctly using mvn:
># mvn package
[ok]
># ls -il target/app.jar target/app/app.jar target/docker-app/app.jar
4239421 -rw-r--r-- 1 me domain users 25305467 Apr 27 08:55 target/docker-app/app.jar
4239422 -rw-r--r-- 1 me domain users 25305467 Apr 27 08:55 target/app/app.jar
4239416 -rw-r--r-- 1 me domain users 25305467 Apr 27 08:55 target/app.jar
If I change the sources and build again, the mtimes change
># mvn package
[ok]
># ls -il target/app.jar target/app/app.jar target/docker-app/app.jar
4239421 -rw-r--r-- 1 me domain users 25305467 Apr 27 08:56 target/docker-app/app.jar
4239422 -rw-r--r-- 1 me domain users 25305467 Apr 27 08:56 target/app/app.jar
4239416 -rw-r--r-- 1 me domain users 25305467 Apr 27 08:56 target/app.jar
as expected. Also if I diff one of these jar files with a copy of an older one, it is different.
I then import this project into IntelliJ IDEA and build it:
Build completed successfully with 3 warnings
However,
># ls -il target/app.jar target/app/app.jar target/docker-app/app.jar
4239421 -rw-r--r-- 1 me domain users 25305467 Apr 27 08:56 target/docker-app/app.jar
4239422 -rw-r--r-- 1 me domain users 25305467 Apr 27 08:56 target/app/app.jar
4239416 -rw-r--r-- 1 me domain users 25305467 Apr 27 08:56 target/app.jar
the mtime has NOT changed, and diff reports that the files are identical to copies of the earlier versions.
Why is IDEA not producing new jar files?
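The mtime and diff comparison described above can be repeated mechanically around each build. The following is a minimal sketch under assumptions: fingerprint and changed are hypothetical helper names, and the three jar paths are simply the ones listed in the question.

```python
import hashlib
import os


def fingerprint(paths):
    """Map each existing path to (mtime_ns, SHA-256 hex digest of its content)."""
    result = {}
    for path in paths:
        if not os.path.isfile(path):
            continue  # skip artifacts that have not been produced
        digest = hashlib.sha256()
        with open(path, "rb") as handle:
            # Read in 1 MiB chunks so large jars are not loaded into memory at once
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
        result[path] = (os.stat(path).st_mtime_ns, digest.hexdigest())
    return result


def changed(before, after):
    """Return the paths that are new or whose content hash differs."""
    return sorted(
        path
        for path in after
        if path not in before or before[path][1] != after[path][1]
    )


JARS = ["target/app.jar", "target/app/app.jar", "target/docker-app/app.jar"]
```

Taking before = fingerprint(JARS) ahead of a build and comparing with changed(before, fingerprint(JARS)) afterwards lists exactly which jars that build rewrote with different content.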
Your question is very similar to one I have already answered: "intellj IDEA doesnt build jar properly".
That answer should help you understand what is happening.
Look closely at the screenshots there.
It uses the mvnw wrapper local to the project, but nothing prevents you from using your own mvn version.
IntelliJ does provide its own build system, which can create jars without Maven (although I personally have not been able to set it up correctly so that the application starts).
If you are looking to create a war instead, I can give you more information on creating a war file.
I use Spring Boot, but the principle is the same for any simple Java project.

How to define the Lua module path in the NiFi processor ExecuteScript?

I am new to Lua.
Several days ago, I managed to use Python in NiFi's ExecuteScript. I set the Python module path to /usr/local/lib/python2.7/dist-packages, and everything works well.
But doing the same thing with Lua is proving very difficult for me. I always get an error saying the module cannot be found.
I use LuaRocks to install modules.
Could you please tell me how to set the Lua module path, or give me some useful information about it?
Here is some information about my LuaRocks setup:
lbh@es-2:~/install/nifi-1.1.1/lua_modules$ luarocks list
Installed rocks:
----------------
lua-cjson
2.1.0-1 (installed) - /usr/local/lib/luarocks/rocks
luasocket
3.0rc1-2 (installed) - /usr/local/lib/luarocks/rocks
redis-lua
2.0.4-1 (installed) - /usr/local/lib/luarocks/rocks
An example of lua-cjson.
lbh@es-2:~/install/nifi-1.1.1/lua_modules$ luarocks show lua-cjson
lua-cjson 2.1.0-1 - A fast JSON encoding/parsing module
The Lua CJSON module provides JSON support for Lua. It features: - Fast,
standards compliant encoding/parsing routines - Full support for JSON with
UTF-8, including decoding surrogate pairs - Optional run-time support for
common exceptions to the JSON specification (infinity, NaN,..) - No
dependencies on other libraries
License: MIT
Homepage: http://www.kyne.com.au/~mark/software/lua-cjson.php
Installed in: /usr/local
Modules:
cjson
lua2json
json2lua
cjson.util
Directory info of /usr/local/lib/luarocks/rocks
lbh@es-2:~/install/nifi-1.1.1/lua_modules$ ls -l /usr/local/lib/luarocks/rocks
total 16
drwxr-xr-x 3 root root 4096 Mar 17 11:13 lua-cjson
drwxr-xr-x 3 root root 4096 Mar 17 11:18 luasocket
-rw-r--r-- 1 root root 3653 Mar 17 11:18 manifest
drwxr-xr-x 3 root root 4096 Mar 17 11:18 redis-lua

How to freeze graphs from checkpoint directory for inception-v3 model?

I am fine-tuning the inception-v3 model on the flowers dataset using this: https://github.com/tensorflow/models/tree/master/inception
I checkpointed the result in a directory. But in the directory I see files like these:
-rw-r--r-- 1 root root 389908432 Mar 15 21:46 model.ckpt-0.data-00000-of-00001
-rw-r--r-- 1 root root 72680 Mar 15 21:46 model.ckpt-0.index
-rw-r--r-- 1 root root 15189794 Mar 15 21:47 model.ckpt-0.meta
-rw-r--r-- 1 root root 135185788 Mar 15 22:36 events.out.tfevents.1489594533.f7d5defbed64
-rw-r--r-- 1 root root 72680 Mar 15 22:37 model.ckpt-4999.index
-rw-r--r-- 1 root root 389908432 Mar 15 22:37 model.ckpt-4999.data-00000-of-00001
-rw-r--r-- 1 root root 15189794 Mar 15 22:38 model.ckpt-4999.meta
-rw-r--r-- 1 root root 130 Mar 15 22:49 checkpoint
whereas I need outputs in directory similar to this:
-rw-r----- 1 107456 5000 223 Mar 2 2016 README.txt
-rw-r----- 1 107456 5000 43 Mar 2 2016 checkpoint
-rw-r----- 1 107456 5000 434903494 Mar 15 2016 model.ckpt-157585
For that I need to do something like freezing, but freezing requires output_node_names. Can anyone tell me what the output_node_names are for inception-v3?
Also, I need a reliable way to freeze. Is the TensorFlow freeze tool okay for this?
I found the answer eventually.
The one-line answer is to use freeze_graph.py, available in the upstream TensorFlow codebase (tensorflow/python/tools). See its tests for an example of how to use that program.
You may also check the following link for a sample:
https://gist.githubusercontent.com/morgangiraud/249505f540a5e53a48b0c1a869d370bf/raw/6cb0b4d497925517316a92f935ce5dccb6aafd17/medium-tffreeze-1.py
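For orientation, the freezing step can be sketched in a few lines of Python. This is a hedged sketch rather than the freeze tool itself: it assumes a TensorFlow version that ships the tensorflow.compat.v1 module, the function names and the frozen_model.pb file name are hypothetical, and the output node names must still be supplied by the caller (list_node_names only prints candidates to choose from).

```python
import os


def frozen_graph_path(checkpoint_prefix, filename="frozen_model.pb"):
    """Place the frozen graph next to the checkpoint files (hypothetical naming)."""
    directory = os.path.dirname(os.path.abspath(checkpoint_prefix))
    return os.path.join(directory, filename)


def list_node_names(checkpoint_prefix):
    """Return all node names stored in <prefix>.meta, to help pick output nodes."""
    # Imported lazily so the path helper above works without TensorFlow installed
    import tensorflow.compat.v1 as tf

    tf.disable_eager_execution()
    with tf.Graph().as_default() as graph:
        tf.train.import_meta_graph(checkpoint_prefix + ".meta", clear_devices=True)
        return [node.name for node in graph.as_graph_def().node]


def freeze(checkpoint_prefix, output_node_names):
    """Restore a checkpoint such as '.../model.ckpt-4999' and write a frozen graph."""
    import tensorflow.compat.v1 as tf

    tf.disable_eager_execution()
    with tf.Graph().as_default() as graph:
        saver = tf.train.import_meta_graph(
            checkpoint_prefix + ".meta", clear_devices=True
        )
        with tf.Session(graph=graph) as sess:
            saver.restore(sess, checkpoint_prefix)
            # Replace variables with constants holding their restored values
            frozen = tf.graph_util.convert_variables_to_constants(
                sess, graph.as_graph_def(), list(output_node_names)
            )
    out_path = frozen_graph_path(checkpoint_prefix)
    with tf.gfile.GFile(out_path, "wb") as handle:
        handle.write(frozen.SerializeToString())
    return out_path
```

The checkpoint prefix is the common part of the three files per step shown in the question, for example model.ckpt-4999 for the .index, .data-00000-of-00001 and .meta files.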