Output progress over time in hashcat - passwords

I am analysing the amount of hashes cracked over a set period of time.
I am looking to save the current status of the crack every 10 seconds.
'''
Recovered........: 132659/296112 (44.80%) Digests, 0/1 (0.00%) Salts
Recovered/Time...: CUR:3636,N/A,N/A AVG:141703,8502198,204052756 (Min,Hour,Day)
Progress.........: 15287255040/768199139595 (1.99%)
'''
I want these 3 lines of the status saved every 10 seconds or so.
Is it possible to do this within hashcat or will I need to make a separate script in python?

Getting the status every 10 seconds
You can enable automatic status printing with --status, and set the interval to X seconds with --status-timer X. Both options are listed on the hashcat options wiki page and in the output of hashcat --help.
Example: hashcat -a 0 -m 0 example.hash example.dict --status --status-timer 10
Saving all the statuses
I'm assuming that you just want to save everything that hashcat prints while it's running. An easy way to do this is to copy everything from stdout into a file. This is a popular Stack Overflow question, so we'll just use the standard answer from there: tee.
To be safe, let's use -a, which appends to the file, so we don't accidentally overwrite previous runs. All we need to do is put | tee -a file.txt after our hashcat call.
Solution
Give this a shot, it should save all the statuses (and everything else from stdout) to output.txt:
hashcat -a A -m M hashes.txt dictionary.txt --status --status-timer 10 | tee -a output.txt
Just swap out A, M, hashes.txt, and dictionary.txt with the arguments you're using.
If you need help getting just the "Recovered" lines from this output file, or if this doesn't work on your computer (I'm on OSX), let me know in a comment.
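If only the three status lines from the question are wanted, the saved file can be filtered afterwards. Here is a minimal sketch in Python, assuming the output was saved to output.txt as above and that the status labels look like those in the question. The function names and the label list are my own choices, not part of hashcat:

```python
import re

# Labels of the status lines to keep; taken from the question's sample output.
WANTED = ("Recovered", "Recovered/Time", "Progress")

# A status line is a label, a run of dots, then a colon,
# e.g. "Progress.........: 15287255040/768199139595 (1.99%)".
STATUS_LINE = re.compile(r"^(?P<label>[^.:]+)\.+:")

def extract_status_lines(text):
    """Return only the status lines whose label is in WANTED."""
    kept = []
    for line in text.splitlines():
        line = line.strip()
        match = STATUS_LINE.match(line)
        if match and match.group("label") in WANTED:
            kept.append(line)
    return kept

def extract_from_file(path="output.txt"):
    """Read a saved hashcat log (hypothetical path) and filter it."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return extract_status_lines(handle.read())
```

Calling extract_from_file() after a run returns the matching lines in the order they were printed, so every group of three corresponds to one status update.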

In addition to Andrew Zick's answer, note that hashcat has native support for machine-readable status output: see the --machine-readable option. This produces tab-separated output like so:
STATUS 5 SPEED 111792 1000 EXEC_RUNTIME 0.007486 CURKU 1 PROGRESS 62 62 RECHASH 0 1 RECSALT 0 1 REJECTED 0 UTIL -1
STATUS 5 SPEED 14247323 1000 EXEC_RUNTIME 0.038953 CURKU 36 PROGRESS 2232 2232 RECHASH 0 1 RECSALT 0 1 REJECTED 0 UTIL -1
STATUS 5 SPEED 36929864 1000 EXEC_RUNTIME 1.661804 CURKU 1296 PROGRESS 80352 80352 RECHASH 0 1 RECSALT 0 1 REJECTED 0 UTIL -1
STATUS 5 SPEED 66538858 1000 EXEC_RUNTIME 3.237319 CURKU 46656 PROGRESS 2892672 2892672 RECHASH 0 1 RECSALT 0 1 REJECTED 0 UTIL -1
STATUS 5 SPEED 63562975 1000 EXEC_RUNTIME 3.480536 CURKU 1679616 PROGRESS 104136192 104136192 RECHASH 0 1 RECSALT 0 1 REJECTED 0 UTIL -1
... which is exactly what tools like Hashtopolis use to provide a front-end to hashcat output.
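Lines in this format are easy to parse. Below is a rough sketch in Python that splits one status line into named fields. The function names are hypothetical, and reading RECHASH and PROGRESS as "done, total" pairs is an assumption based on the field names and the sample above, not on hashcat's documentation:

```python
def parse_status_line(line):
    """Split one --machine-readable status line into {KEY: [values, ...]}."""
    fields = {}
    key = None
    for token in line.split():          # split() handles tabs and spaces alike
        if token[0].isalpha():          # keys start with a letter (STATUS, SPEED, ...)
            key = token
            fields[key] = []
        elif key is not None:           # numeric values belong to the last key seen
            fields[key].append(token)
    return fields

def fraction(pair):
    """Turn a [done, total] pair of strings into a ratio between 0 and 1."""
    done, total = (int(value) for value in pair)
    return done / total if total else 0.0

sample = ("STATUS 5 SPEED 111792 1000 EXEC_RUNTIME 0.007486 CURKU 1 "
          "PROGRESS 62 62 RECHASH 0 1 RECSALT 0 1 REJECTED 0 UTIL -1")
status = parse_status_line(sample)
print(fraction(status["RECHASH"]), fraction(status["PROGRESS"]))
```

Writing these two ratios to a file with a timestamp every time a line arrives gives the progress-over-time series the question asks for.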
For machine-readable output of the recovered hashes themselves, the options --outfile and --outfile-format are available. See the Outfile Formats section of the output of hashcat --help for the values accepted by --outfile-format:
- [ Outfile Formats ] -
# | Format
===+========
1 | hash[:salt]
2 | plain
3 | hex_plain
4 | crack_pos
5 | timestamp absolute
6 | timestamp relative


Media and Data Integrity Errors

I was wondering if anyone can tell me what these mean. Most people posting about them report no more than a double-digit count. However, I have 1051556645921812989870080 Media and Data Integrity Errors on the SK hynix PC711 in my new HP Dev One. Thanks!
Here's my entire smartctl output
`smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.0.7-arch1-1] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: SK hynix PC711 HFS001TDE9X073N
Serial Number: KDB3N511010503A37
Firmware Version: HPS0
PCI Vendor/Subsystem ID: 0x1c5c
IEEE OUI Identifier: 0xace42e
Total NVM Capacity: 1,024,209,543,168 [1.02 TB]
Unallocated NVM Capacity: 0
Controller ID: 1
NVMe Version: 1.3
Number of Namespaces: 1
Namespace 1 Size/Capacity: 1,024,209,543,168 [1.02 TB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: ace42e 00254f98f1
Local Time is: Wed Nov 9 13:58:37 2022 EST
Firmware Updates (0x16): 3 Slots, no Reset required
Optional Admin Commands (0x001f): Security Format Frmw_DL NS_Mngmt Self_Test
Optional NVM Commands (0x005f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x1e): Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg Pers_Ev_Lg
Maximum Data Transfer Size: 64 Pages
Warning Comp. Temp. Threshold: 84 Celsius
Critical Comp. Temp. Threshold: 85 Celsius
Namespace 1 Features (0x02): NA_Fields
Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 6.3000W - - 0 0 0 0 5 5
1 + 2.4000W - - 1 1 1 1 30 30
2 + 1.9000W - - 2 2 2 2 100 100
3 - 0.0500W - - 3 3 3 3 1000 1000
4 - 0.0040W - - 3 3 3 3 1000 9000
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 0
1 - 4096 0 0
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 34 Celsius
Available Spare: 100%
Available Spare Threshold: 5%
Percentage Used: 0%
Data Units Read: 13,162,025 [6.73 TB]
Data Units Written: 3,846,954 [1.96 TB]
Host Read Commands: 156,458,059
Host Write Commands: 128,658,566
Controller Busy Time: 116
Power Cycles: 273
Power On Hours: 126
Unsafe Shutdowns: 15
Media and Data Integrity Errors: 1051556645921812989870080
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 34 Celsius
Temperature Sensor 2: 36 Celsius
Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged`
I encountered a similar SMART reading on the same model.
I'm seeing a reported Media and Data Integrity Errors value that's over 2 ^ 84.
It could just be an error with its SMART implementation or the utility reading from it.
Converting your reported value of 1051556645921812989870080 to hex, we get 0xdead0000000000000000 big endian and 0x0000000000000000adde little endian.
Similarly, when I convert my value to hex, I get 0xffff0000000000000000 big endian and 0x0000000000000000ffff little endian, where f just denotes a digit other than 0.
I'm going to assume that the Media and Data Integrity Errors value has no actual meaning with regard to real errors. I doubt that both of us would have values that are padded with 16 0's when converted to hex. Something is sending/receiving/parsing bad data.
If you poke around the other reported SMART values in your post, and on my end, some of them don't seem to make much sense, either.
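The hex conversion above is easy to reproduce. Here is a small sketch in Python that uses only the value from the question. The 10-byte width is simply the smallest width that holds this number, not something taken from the NVMe specification:

```python
reported = 1051556645921812989870080

# Hexadecimal form of the reported counter.
print(hex(reported))                              # 0xdead0000000000000000

# The same 80 bits written out in both byte orders.
big = reported.to_bytes(10, "big").hex()          # dead0000000000000000
little = reported.to_bytes(10, "little").hex()    # 0000000000000000adde

# Split into the low 64 bits and whatever sits above them.
low_64 = reported & (2**64 - 1)
high_bits = reported >> 64
print(big, little, low_64, hex(high_bits))
```

The low 64 bits are all zero and only the bits above them are set, which is consistent with the suspicion that the field is being filled or parsed incorrectly rather than counting real errors.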

Target ID duplicate - Beegfs

Could you check whether you can help me?
We have an old BeeGFS installation running version 7.1.5 on EL7, and one of the targets went offline (it was not replaced). After it came back, the buddy mirror entered a failed state that we can’t recover from.
If we try to set the target back to online, it fails:
[root@headnode beegfs]# beegfs-ctl --nodetype=storage --setstate --state=good --force --targetid=13
Node did not accept state change. Error: Unknown storage target
The current state is shown as follows:
[root@headnode ~]# beegfs-ctl --listtargets --nodetype=storage --state
TargetID Reachability Consistency NodeID
======== ============ =========== ======
1 Online Good 1
2 Online Good 2
3 Online Good 3
4 Online Good 4
5 Online Good 5
6 Online Good 6
7 Online Good 7
8 Online Good 8
9 Online Good 9
10 Online Good 10
11 Online Good 11
12 Online Good 12
13 Offline Good 13
14 Online Good 14
16 Online Good 13
Please note that a new TargetID numbered 16 appeared where 13 should be. I tried to change it back to 13 but was unable to.
[root@headnode.mintrop.usp.br ~]# beegfs-ctl --removetarget 13
Given target is part of a buddy mirror group. Aborting.
[root@n13 ~]# beegfs-ctl --removemirrorgroup --mirrorgroupid=7 --nodetype=storage --dry-run
Could not remove buddy group: Communication error
I think we are doing something wrong, possibly because the buddy mirror setup can be difficult to handle.
Any help is greatly appreciated.
Thank you.
PS: For completeness, the checks seem to be fine:
[root@headnode.mintrop.usp.br ~]# beegfs-df
METADATA SERVERS:
TargetID Cap. Pool Total Free % ITotal IFree %
======== ========= ===== ==== = ====== ===== =
1 normal 218.2GiB 66.9GiB 31% 109.2M 107.8M 99%
STORAGE TARGETS:
TargetID Cap. Pool Total Free % ITotal IFree %
======== ========= ===== ==== = ====== ===== =
[ERROR from beegfs-storage n13.mintrop.usp.br [ID: 13]: Unknown storage target]
13 emergency 0.0GiB 0.0GiB 0% 0.0M 0.0M 0%
Solution found: the problem was that node n13 was using different target IDs than the ones the headnode was seeing. The headnode sees the file below, whose entries correspond to the nodes in ascending order (n01, n02...n14):
[root@headnode ~]# cat /data1/beegfs/mgmtd/targetNumIDs
0-5E3B6573-1=1
0-5E3B6592-2=2
0-5E3B65B2-3=3
0-5E3B65D1-4=4
0-5E3B65F1-5=5
0-5E3B6610-6=6
0-5E3B6630-7=7
0-5E3B664F-8=8
0-5E3B666E-9=9
0-5E3B6690-A=A
0-5E3B66B1-B=B
0-5E3B66D2-C=C
0-5E3B66F3-D=D
0-5E3B6714-E=E
0-626C29BD-D=F
0-62853797-D=10
On n13, the file /data1/beegfs/storage/targetID contained the ID from the last entry, 0-62853797-D=10. The number 10 in that entry is hexadecimal, which corresponds to 16 in decimal:
[root@headnode ~]# echo "obase=16; 16" | bc
10
So the solution was to change the target ID to the entry whose hexadecimal number corresponds to decimal 13:
[root@headnode ~]# echo "obase=16; 13" | bc
D
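The same hexadecimal-to-decimal check can be done for every entry at once. This is a minimal sketch in Python that assumes the file has the "stringID=hexNumber" layout shown in the listing above. That layout is inferred from this post rather than from the BeeGFS documentation, and the function name is hypothetical:

```python
def parse_target_num_ids(text):
    """Map each string target ID to its numeric ID, reading the number as hexadecimal."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last "=" so the string ID is kept intact.
        string_id, _, hex_num = line.rpartition("=")
        mapping[string_id] = int(hex_num, 16)
    return mapping

sample = """0-5E3B66F3-D=D
0-5E3B6714-E=E
0-626C29BD-D=F
0-62853797-D=10"""

for string_id, num in parse_target_num_ids(sample).items():
    print(f"{string_id} -> {num} (hex {num:X})")
```

Run against the listing, this makes it visible that node 13 (suffix -D) has three string IDs registered, with numeric IDs 13, 15 and 16.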
In the headnode's /data1/beegfs/mgmtd/targetNumIDs file, D corresponds to the entry 0-5E3B66F3-D=D. So two changes were made on n13: the targetNumID and targetID files, which held 16 and the ID from the 0-62853797-D=10 entry respectively, were changed to:
[root@n13 ~]# cat /data1/beegfs/storage/targetNumID
13
[root@n13 ~]# cat /data1/beegfs/storage/targetID
0-5E3B66F3-D
Once this is done, restart the beegfs-storage and beegfs-meta services. The target list then looks like this:
[root@headnode ~]# beegfs-ctl --listtargets --nodetype=storage --state
TargetID Reachability Consistency NodeID
======== ============ =========== ======
1 Online Good 1
2 Online Good 2
3 Online Good 3
4 Online Good 4
5 Online Good 5
6 Online Good 6
7 Online Good 7
8 Online Good 8
9 Online Good 9
10 Online Good 10
11 Online Good 11
12 Online Good 12
13 Online Good 13
14 Online Good 14
Best regards
Jaqueline

TEZ mapper resource request

We recently migrated from MapReduce to Tez for executing Hive queries on EMR. We are seeing cases where the exact same Hive query launches a very different number of mappers. See the Map 3 phase below: on the first run it requested 305 mappers, and on another run it requested 4534. (Please ignore the KILLED status; I killed the query manually.) Why does this happen? How can we change it to be based on the underlying data size instead?
Run 1
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 container KILLED 5 0 0 5 0 0
Map 3 container KILLED 305 0 0 305 0 0
Map 5 container KILLED 16 0 0 16 0 0
Map 6 container KILLED 1 0 0 1 0 0
Reducer 2 container KILLED 333 0 0 333 0 0
Reducer 4 container KILLED 796 0 0 796 0 0
----------------------------------------------------------------------------------------------
VERTICES: 00/06 [>>--------------------------] 0% ELAPSED TIME: 14.16 s
----------------------------------------------------------------------------------------------
Run 2
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container SUCCEEDED 5 5 0 0 0 0
Map 3 container KILLED 4534 0 0 4534 0 0
Map 5 .......... container SUCCEEDED 325 325 0 0 0 0
Map 6 .......... container SUCCEEDED 1 1 0 0 0 0
Reducer 2 container KILLED 333 0 0 333 0 0
Reducer 4 container KILLED 796 0 0 796 0 0
----------------------------------------------------------------------------------------------
VERTICES: 03/06 [=>>-------------------------] 5% ELAPSED TIME: 527.16 s
----------------------------------------------------------------------------------------------
This article explains how Tez decides on the number of tasks: https://cwiki.apache.org/confluence/display/TEZ/How+initial+task+parallelism+works. Quoting from it:
If Tez grouping is enabled for the splits, then a generic grouping logic is run on these splits to group them into larger splits. The idea is to strike a balance between how parallel the processing is and how much work is being done in each parallel process.
First, Tez tries to find out the resource availability in the cluster for these tasks. For that, YARN provides a headroom value (and in future other attributes may be used). Let's say this value is T.
Next, Tez divides T by the resource per task (say M) to find out how many tasks can run in parallel at once (i.e. in a single wave). W = T/M.
Next, W is multiplied by a wave factor (from configuration - tez.grouping.split-waves) to determine the number of tasks to be used. Let's say this value is N.
If there are a total of X splits (input shards) and N tasks, then this would group X/N splits per task. Tez then estimates the size of data per task based on the number of splits per task.
If this value is between tez.grouping.max-size & tez.grouping.min-size, then N is accepted as the number of tasks. If not, then N is adjusted to bring the data per task in line with the max/min, depending on which threshold was crossed.
For experimental purposes, tez.grouping.split-count can be set in configuration to specify the desired number of groups. If this config is specified, then the above logic is ignored and Tez tries to group splits into the specified number of groups. This is best effort.
After this the grouping algorithm is executed. It groups splits by node locality, then rack locality, while respecting the group size limits.
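The steps quoted above can be sketched as a small function. This is a simplified reading of the wiki text, not Tez's actual implementation; the function name, parameter names and sample numbers are all illustrative assumptions. It may help explain why the same query can get a different number of mappers: the headroom T depends on how busy the cluster is when the query starts.

```python
import math

def estimate_task_count(headroom, per_task, split_waves, total_bytes, min_size, max_size):
    """Estimate the number of grouped tasks N, following the quoted steps."""
    wave_width = headroom / per_task               # W = T / M
    n = max(1, int(wave_width * split_waves))      # N = W * tez.grouping.split-waves
    bytes_per_task = total_bytes / n
    if bytes_per_task > max_size:
        # Too much data per task: raise N until each task is at most max_size.
        n = math.ceil(total_bytes / max_size)
    elif bytes_per_task < min_size:
        # Too little data per task: lower N until each task is at least min_size.
        n = max(1, total_bytes // min_size)
    return n

GIB = 1024 ** 3
MIB = 1024 ** 2
data = 100 * GIB   # hypothetical input size

# Same data, three different amounts of free cluster resources (arbitrary units).
for headroom in (20, 200, 10_000):
    n = estimate_task_count(headroom, 1, 1.5, data, 50 * MIB, 1 * GIB)
    print(headroom, n)
```

In this sketch the task count follows the headroom whenever the data per task lands between the two size limits, and follows the data size only when a limit is hit. Under that reading, bringing tez.grouping.min-size and tez.grouping.max-size closer together should tie the number of mappers more closely to the data size.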

Equal loading for parallel task distribution

I have a large number of independent tasks I would like to run, and I would like to distribute them on a parallel system such that each processor does the same amount of work, maximizing efficiency.
I would like to know if there is a general approach to finding a solution to this problem, or possibly just a good solution to my exact problem.
I have T=150 tasks I would like to run, and task i takes i units of time. That is, task 1 takes 1 unit of time, task 2 takes 2 units of time... task 150 takes 150 units of time. Assuming I have n=12 processors, what is the best way to divide the workload between workers, assuming the time it takes to begin and clean up tasks is negligible?
Despite my initial enthusiasm for @HighPerformanceMark's ingenious approach, I decided to actually benchmark this using GNU Parallel with -j 12 to use 12 cores, simulating 1 unit of work with 1 second of sleep.
First I generated a list of the jobs as suggested with:
paste <(seq 1 72) <(seq 150 -1 79)
That looks like this:
1 150
2 149
3 148
...
...
71 80
72 79
Then I pass the list into GNU Parallel and pick up the remaining 6 jobs at the end in parallel:
paste <(seq 1 72) <(seq 150 -1 79) | parallel -k -j 12 --colsep '\t' 'sleep {1} ; sleep {2}'
sleep 73 &
sleep 74 &
sleep 75 &
sleep 76 &
sleep 77 &
sleep 78 &
wait
That runs in 16 mins 24 seconds.
Then I used my somewhat simpler approach, which is just to run the big jobs first. That way you are unlikely to be left with a big job at the end, which would unbalance the CPU load because one big job still needs to run while the rest of your CPUs have nothing to do:
time parallel -j 12 sleep {} ::: $(seq 150 -1 1)
And that runs in 15 minutes 48 seconds, so it is actually faster.
I think the problem with the other approach is that after the first 6 rounds of 12 pairs of jobs, there are 6 jobs left, the longest of which takes 78 seconds, so effectively 6 CPUs sit there doing nothing for 78 seconds. If the number of tasks were divisible by the number of CPUs, that would not occur, but 150 doesn't divide by 12.
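The two timings can be cross-checked without sleeping. Below is a rough simulation in Python that assumes each worker picks up the next job in the list as soon as it is free, which approximates what parallel -j 12 does, and ignores all start-up overhead. The function and variable names are my own:

```python
import heapq

def makespan(jobs, n_workers):
    """Total elapsed time when free workers take jobs from the list in order."""
    finish_times = [0] * n_workers                 # time at which each worker is free
    heapq.heapify(finish_times)
    for duration in jobs:
        free_at = heapq.heappop(finish_times)      # the worker that frees up first
        heapq.heappush(finish_times, free_at + duration)
    return max(finish_times)

N_WORKERS = 12

# Pairing approach: 72 pairs of 151 units each, then jobs 73..78 run side by side.
paired = makespan([151] * 72, N_WORKERS) + 78

# Largest-first approach: jobs 150, 149, ..., 1 handed out in that order.
largest_first = makespan(range(150, 0, -1), N_WORKERS)

# No schedule can beat the total work divided evenly, rounded up.
lower_bound = -(-sum(range(1, 151)) // N_WORKERS)

print(paired, largest_first, lower_bound)
```

For the pairing approach this gives 984 units, which matches the measured 16 minutes 24 seconds, and the lower bound is 944 units, so the measured 15 minutes 48 seconds for largest-first is already within a few seconds of the best possible.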
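The solution I came to was similar to those mentioned above. Here it is as Python, if anyone is interested:
N_PROC = 12
jobs = list(range(1, 151))             # remaining job durations
average_time = sum(jobs) / N_PROC      # each processor's fair share
procs = [[] for _ in range(N_PROC)]    # jobs assigned to each processor
while jobs:
    for proc in procs:
        if not jobs:
            break
        # Room left before this processor exceeds its fair share (<= 0 if already over).
        diff = average_time - sum(proc)
        fitting = [job for job in jobs if job <= diff]
        # Largest job that still fits, otherwise the smallest remaining job.
        job = max(fitting) if fitting else min(jobs)
        proc.append(job)
        jobs.remove(job)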
This seemed to be the best method for me. I tried it on many different distributions of job run-times, and it seems to do a decent job of evenly distributing the work, so long as N_proc << N_jobs.
This is a slight modification of largest first, in that each processor first tries to avoid doing more than its "fair share". If it must go over its fair share, then it will attempt to stay near the fair answer by grabbing the smallest remaining task from the queue.

optaplanner vrp file with road time and time window

I am trying to create a VRP file that defines a problem with time windows and distances in seconds. I currently do not need capacity (can I turn it off?).
This is my file:
NAME: almirs-test
COMMENT: Generated for OptaPlanner Examples
TYPE: CVRPTW
DIMENSION: 2
EDGE_WEIGHT_TYPE: EXPLICIT
EDGE_WEIGHT_FORMAT: FULL_MATRIX
EDGE_WEIGHT_UNIT_OF_MEASUREMENT: sec
CAPACITY: 125
NODE_COORD_SECTION
0 0 0 BRUSSEL
55 1 1 ANTHISNES
EDGE_WEIGHT_SECTION
0.0 1
1 0.0
DEMAND_SECTION
0 0 0 100 0
55 1 0 10 1
DEPOT_SECTION
0
-1
EOF
It is correctly parsed, and I see the locations on screen, but when I try to solve it I get the message "Not feasible":
org.optaplanner.examples.vehiclerouting.solver/arrivalAfterDueTime/level0/[ANTHISNES]=-990
Any idea what I am doing wrong? Are there any samples where I can see how it is done?
thanks
almir