Best way to generate data to fill USB memory? - usb

I need to fill a USB memory and I want others to be able to repeat this in an easy way. SO I dont want to write "find a file that filles the memory" so they have to look around for such a file.
Rather I want to generate X MB of data and write that to a file that can then be transferrred to the USB stick.
How would you do that (on Windows)?

If you are on Windows, you can use fsutil:
fsutil file createnew D:\fatboy.tmp SIZE
If you are on Linux or OSX or somesuch, you can use mkfile to make a big file:
mkfile SIZE PathToFileOnUSB
e.g.
mkfile 10m /some/where/on/USB/10MegFile
Or, if your system lacks mkfile, use dd to fill storage quickly and easily.
So, if you want 10MB, use:
dd if=/dev/zero of=PathToFileOnUSB bs=1m count=10
That says... dump data, reading from /dev/zero (which supplies an endless sream of zeroes) and writing to the file called PathToFileOnUSB using a blocksize (bs) of 1 megabyte, and do this 10 times (cnt).
If you want X MB, use:
dd if=/dev/zero of=PathToFileOnUSB bs=1m count=X
If you want to fill the device, write until error without specifying a count:
dd if=/dev/zero of=PathToFileOnUSB bs=1m

Related

Reading and handling many small CSV-s to concatenate one large Dataframe

I have two folders each contains about 8,000 small csv files. One with an aggregated size of around 2GB and another with aggregated size of around 200GB.
These files are stored like this to better update them in a daily basis. However, when I conduct EDA, I would like them to be assigned to a single variable. For example.
path = "some random path"
df = pd.concat([pd.read_csv(f"{path}//{files}") for files in os.listdir(path)])
It would take much less time for me to read the dataset with 2GB in total size than reading it on the super computer cluster. And it is impossible to read the 200GB dataset on the local machine unless using some sort of scaling Pandas solutions. The situation does not seem to improve on the cluster even using the popular open-source tools like Dask and Modin.
Is there an effective way that enables to read those csv files effectively with given situation?
Q :"Is there an effective way that enables to read those csv files effectively ... ?"
A :Oh, sure, there is :
CSV format ( standard attempts in RFC4180 ) is not unambiguous and is not obeyed under all circumstances ( commas inside fields, header present or not ), so some caution & care is needed here. Given you are your own data curator, you shall be able to decide plausible steps for handling your own data properly.
So, the as-is state is :
# in <_folder_1_>
:::::::: # 8000 CSV-files ~ 2GB in total
||||||||||||||||||||||||||||||||||||||||||| # 8000 CSV-files ~ 200GB in total
# in <_folder_2_>
Speaking efficiency, O/S coreutils provide the best, stable, proven and most efficient (as system tool used to be since ever ) tools for the phase of merging thousands and thousands of plain CSV-files' content :
###################### if need be,
###################### use an in-place remove of all CSV-file headers first :
for F in $( ls *.csv ); do sed -i '1d' $F; done
this helps for case we cannot avoid headers on the CSV-exporter side. Works like this :
(base):~$ cat ?.csv
HEADER
1
2
3
HEADER
4
5
6
HEADER
7
8
9
(base):~$ for i in $( ls ?.csv ); do sed -i '1d' $i; done
(base):~$ cat ?.csv
1
2
3
4
5
6
7
8
9
Now, the merging phase :
###################### join
cat *.csv > __all_CSVs_JOINED.csv
Given the nature of the said file storage policy, performance can be boosted by using more processes for independent taking small files and large files separately, as defined above, having put the logic inside a pair of conversion_script_?.sh shell-scripts :
parallel --jobs 2 conversion_script_{1}.sh ::: $( seq -f "%1g" 1 2 )
As the transformation is a "just"-[CONCURRENT] flow of processing for a sake of removing the CSV-headers, but a pure-[SERIAL] ( for larger number of files, there might become interesting to use a multi-staged tree of trees - using several stages of [SERIAL]-collections of [CONCURRENT]-ly pre-processed leaves, yet for just 8000 files, not knowing the actual file-system details, the latency-masking from a just-[CONCURRENT] processing both of the directories just independently will be fine to start with )
Last but not least, the final pair of ___all_CSVs_JOINED.csv are safe to get opened using in a way, that prevents moving all disk-stored date into RAM at once ( using chunk-size-fused file-reading-iterator, avoiding RAM-spillovers by using mmaped-mode as a context manager ) :
with pandas.read_csv( "<_folder_1_>//___all_CSVs_JOINED.csv",
sep = NoDefault.no_default,
delimiter = None,
...
chunksize = SAFE_CHUNK_SIZE,
...
memory_map = True,
...
) \
as df_reader_MMAPer_CtxMGR:
...
When tweaking for ultimate performance, details matter and depend on physical hardware bottlenecks ( disk-I/O-wise, filesystem-wise, RAM-I/O-wise ), so due care may take further improvement for minimising the repetitive performed end-to-end processing times ( sometimes even turning data into a compressed/zipped form, in cases, where CPU/RAM resources permit sufficient performance advantages over limited performance of disk-I/O throughput - moving less bytes is so faster, that CPU/RAM-decompression costs are still lower, than moving 200+ [GB]s of uncompressed plain text data.
Details matter,tweak options,benchmark,tweak options,benchmark,tweak options,benchmark
would be nice to post your progress on testing the performanceend-2-end duration of strategy ... [s] AS-IS nowend-2-end duration of strategy ... [s] with parallel --jobs 2 ...end-2-end duration of strategy ... [s] with parallel --jobs 4 ...end-2-end duration of strategy ... [s] with parallel --jobs N ... + compression ... keep us posted

Why is df command showing incorrect size?

I am using a 64 MB QSPI formatted in some UBI partitions.
df is an applet of busybox 1.27.2
Actually,
~ # df -h
Filesystem Size Used Available Use% Mounted on
/dev/ubi0_0 3.1T 1.9T 1.2T 63% /
/dev/ubi1_0 1.6T 21.8G 1.5T 1% /conf
But, obviously, the size cannot be that! Anyway, the use % seems to be correct, for the files contained in the partitions weight few MB.
How do you explain that?
I have been able to fix the issue.
Busybox 1.28.0 commit d1535216 substitutes use of statfs with statvfs (https://github.com/mirror/busybox/commit/d1535216ca27047e3962d61b975bd2a638aa45a2).
I applied the commit to my project using Busybox 1.27.2 and, now, sizes are correct!
Thanks anyway.

Why received ZFS dataset uses less space than original?

I have a dataset on the server1 that I want to back up to the second server2.
Server1 (original):
zfs list -o name,used,avail,refer,creation,usedds,usedsnap,origin,compression,compressratio,refcompressratio,mounted,atime,lused storage/iscsi/webhost-old produces:
NAME USED AVAIL REFER CREATION USEDDS USEDSNAP ORIGIN COMPRESS RATIO REFRATIO MOUNTED ATIME LUSED
storage/iscsi/webhost-old 67,8G 1,87T 67,8G Út kvě 31 6:54 2016 67,8G 16K - lz4 1.00x 1.00x - - 67,4G
Sending volume to the 2nd server:
zfs send storage/iscsi/webhost-old | pv | ssh -c arcfour,aes128-gcm#openssh.com root#10.0.0.2 zfs receive -Fduv pool/bkp-storage
received 69,6GB stream in 378 seconds (189MB/sec)
Server2 zfs list produces:
NAME USED AVAIL REFER CREATION USEDDS USEDSNAP ORIGIN COMPRESS RATIO REFRATIO MOUNTED ATIME LUSED
pool/bkp-storage/iscsi/webhost-old 36,1G 3,01T 36,1G Pá pro 29 10:25 2017 36,1G 0 - lz4 1.15x 1.15x - - 28,4G
Why is there such a difference in sizes? Thanks.
From what you posted, I noticed 3 things that seemed odd:
the compressratio is 1.15x on system 2, but 1.00x on system 1
on system 2, used is 1.27x higher than logicalused
the logicalused and the number zfs receive report are ~2.3x higher on system 1 than system 2
These terms are all defined in the man page, but are still confusing to reverse-engineer explanations for in practice.
(1) could happen if you enabled compression on the source dataset after you wrote all the data to it, since ZFS doesn't rewrite the data to compress it when you enable that setting. The data sent by zfs send is uncompressed unless you use -c, but system 2 will try to compress it as it runs zfs receive if the setting is enabled on the destination dataset. If both system 1 and system 2 had the same compression settings before the data was written, they would have the same compressratio as well.
(2) can happen due to metadata written along with your data, but in this case it's too high for "normal" metadata, which accounts for 1-2% of most pools. It's probably caused by a pool-wide setting, like configuring RAID-Z, or a weird combination of striping and mirroring (like 4 stripes, but with one of them being a mirror).
For (3), I re-read the man page to try to figure it out:
logicalused
The amount of space that is "logically" consumed by this dataset and
all its descendents. See the used property. The logical space
ignores the effect of the compression and copies properties, giving a
quantity closer to the amount of data that applications see.
If you were sending a dataset (instead of a single iSCSI volume) and the send size matched system 2's logicalused value (instead of system 1's), I would guess you forgot to send some child datasets (i.e. by using zfs send -R). However, neither of those are true in this case.
I had to do some additional digging -- this blog post from 2005 might contain the explanation. If system 1 didn't have compression enabled when the data was written (like I guessed above for (1)), the function responsible for not writing zeroed-out blocks (zio_compress_data) would not be run, so you probably have a bunch of empty blocks written to disk, and accounted for in the logicalused size. However, since lz4 is configured on system 2, it would run there, and those blocks would not be counted.

perl gunzip to buffer and gunzip to file have different byte orders

I'm using Perl v5.22.1, Storable 2.53_01, and IO::Uncompress::Gunzip 2.068.
I want to use Perl to gunzip a Storable file in memory, without using an intermediate file.
I have a variable $zip_file = '/some/storable.gz' that points to this zipped file.
If I gunzip directly to a file, this works fine, and %root is correctly set to the Storable hash.
gunzip($zip_file, '/home/myusername/Programming/unzipped');
my %root = %{retrieve('/home/myusername/Programming/unzipped')};
However if I gunzip into memory like this:
my $file;
gunzip($zip_file, \$file);
my %root = %{thaw($file)};
I get the error
Storable binary image v56.115 more recent than I am (v2.10)`
so the Storable's magic number has been butchered: it should never be that high.
However, the strings in the unzipped buffer are still correct; the buffer starts with pst which is the correct Storable header. It only seems to be multi-byte variables like integers which are being broken.
Does this have something to do with byte ordering, such that writing to a file works one way while writing to a file buffer works in another? How can I gunzip to a buffer without it ruining my integers?
That's not related to unzip but to using retrieve vs. thaw. They both expect different input, i.e. thaw expect the output from freeze while retrieve expects the output from store.
This can be verified with a simple test:
$ perl -MStorable -e 'my $x = {}; store($x,q[file.store])'
$ perl -MStorable=freeze -e 'my $x = {}; print freeze($x)' > file.freeze
On my machine this gives 24 bytes for the file created by store and 20 bytes for freeze. If I remove the leading 4 bytes from file.store the file is equivalent to file.freeze, i.e. store just added a 4 byte header. Thus you might try to uncompress the file in memory, remove the leading 4 bytes and run thaw on the rest.

copying kernel and uboot into sdcard

I have a Freescale I.MX ARM board for which I am preparing the bootloader, kernel and Root filesystem on the sdcard.
I am a little confused about the order in which I partition and copy my files into sdcard. Let us say I have an empty sdcard 4GB size. I used gparted to first parition it into:
Firts partition 400 MB size as FAT32 system. this is my boot partition
Second partition is the rest of the card as ext3. This is my root file system partition.
Let us say my sdcard is under /dev/sdb.
Now I have seen many documents differing slightly in the way of copying the boot files.
Which is the right way?
Method 1:
(without mounting sdb partitions:
sudo dd if=u-boot.bin of=/dev/sdb bs=512 seek=2
sudo dd if=uImage of=/dev/sdb bs=512 seek=2
Mount sdb2 for copying rootfs:
mount /dev/sdb2 /mnt/rootfs
copy rootfs:
tar -xf tarfile /mnt/rootfs
Method 2:
Mount sdb1 boot partition:
mount /dev/sdb1 /mnt/boot
copy uboot and kernel:
cp u-boot.bin /mnt/boot/
cp uImage /mnt/boot/
Then copy rootfs as above!
Which is the correct one. I tried two but the sddcard is not even booting.
When I tried method 1, the card boot up until it says the rootfs is not found in the partition. I removed the card and inserted and found that the first fat 32 partition is somehow 'destroyed' as it says 'unallocated' on gparted.
Please help.
You need to mark first partition as bootable.
Check your first partition details in gparted or disk utility.
From disk utility you cab mark a partition bootable. by selecting specific partition and going into 'more action' option --> 'edit partition type'.
below is script to flash binaries onto SD card for my
Arndale OCTA board. You can see the placement of bootloader binaries:
BL1
dd iflag=dsync oflag=dsync if=arndale_octa.bl1.bin of=/dev/sde bs=512 seek=1
BL2
dd iflag=dsync oflag=dsync if=../arndale_octa.bl2.bin of=/dev/sde bs=512 seek=31
uboot
dd iflag=dsync oflag=dsync if=u-boot.bin of=/dev/sde bs=512 seek=63
kernel and trust software , ....
Please notes:
1) The partition table is at SDcard offset 0 (seek 0), then you have to run:fdisk /dev/sde
and create paratiions that does not overlapped with blocks ocppuied by kernel or trust software.
2) add the "dsync" option in dd command to gaurantee every written data is immediately flushed into SD card
In the most of the cases, imx processor requires bootloader at 0x400 offset. So whatever you are doing for u-boot is correct, you need to use dd command for that.
sudo dd if=u-boot.bin of=/dev/sdb bs=512 seek=2
While partitioning the sd card, Make sure that you are keeping enough room for u-boot image. So start your 1st bootable partition at let's say 1 MB offset.
You can simply copy your uImage and environment variables (uEnv.txt or boot.scr) through cp command.
For rootfs also, You can follow the same steps as kernel.