Replicate the right justification that ls does when using ls twice - formatting

If I have two files, tmp.txt and tmp.pdf, one large and one small, when I type ls -al tmp.* everything is nicely right justified and I get this output
-rw-r--r-- 1 simon simon 615316 Oct 17 20:55 tmp.pdf
-rw-r--r-- 1 simon simon      0 Oct 17 20:55 tmp.txt
For a reason that doesn't matter, I want to be able to write the output of two separate ls -al commands to a file, then cat the file and obtain the same output. But of course, if I do this:
ls -al tmp.txt > foo
ls -al tmp.pdf >> foo
and then cat foo, I get this
-rw-r--r-- 1 simon simon 0 Oct 17 20:55 tmp.txt
-rw-r--r-- 1 simon simon 615316 Oct 17 20:55 tmp.pdf
Is there a way of mimicking the justified output that ls -al produces? Obviously, I can use wc -c tmp.pdf etc to figure out which output is largest, but how would I translate that information into code that would put the requisite number of spaces before the 0 in the first line? Thanks very much for any suggestions.

Yes. Just use column
$ (ll tmp.small; ll tmp.big) | column -t
-rw-r--r-- 1 myuser myuser 0 Oct 18 11:24 tmp.small
-rw-r--r-- 1 myuser myuser 113616 Oct 18 11:42 tmp.big
However, if your file names contain spaces, those names won't be printed correctly. If your column supports the -l option, the simplest fix is column -t -l 9, which limits the total number of columns to 9. Another workaround is to use stat to simulate the ls output:
$ (
> stat --printf="%A\t%h\t%U\t%G\t%s\t%y\t%n\n" "tmp with space.txt"
> stat --printf="%A\t%h\t%U\t%G\t%s\t%y\t%n\n" tmp.small
> stat --printf="%A\t%h\t%U\t%G\t%s\t%y\t%n\n" tmp.big
> ) | column -t -s $'\t'
-rw-r--r-- 1 myuser myuser 1307 2020-10-18 12:08:45.360000000 +0700 tmp with space.txt
-rw-r--r-- 1 myuser myuser 0 2020-10-18 11:24:21.650000000 +0700 tmp.small
-rw-r--r-- 1 myuser myuser 113616 2020-10-18 11:42:04.150000000 +0700 tmp.big
Files with tabs or newlines in their names still don't work, though. You could try changing the delimiter to NUL with stat --printf="%A\0%h\0%U\0%G\0%s\0%Y\0%n\n" tmp* | column -t -s $'\0' -l 9 but my column doesn't recognize \0 as the delimiter. One likely reason is that in bash $'\0' expands to an empty string, because command arguments cannot contain NUL bytes. Other versions of column may behave differently, so try it on your system and see.
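The padding the question asks about can also be done by hand. Below is a minimal sketch, assuming the size is always the fifth whitespace-separated field and that file names contain no runs of consecutive spaces (they would be collapsed to one). The name justify_sizes is a hypothetical helper, not something ls or column provides.

```shell
# justify_sizes: hypothetical helper that right-justifies field 5 (the size)
# of "ls -l"-style lines read from standard input.
justify_sizes() {
  awk '
    {
      lines[NR] = $0                              # remember every line
      if (length($5) > width) width = length($5)  # widest size seen so far
    }
    END {
      fmt = "%" width "s"                         # for example "%6s"
      for (i = 1; i <= NR; i++) {
        n = split(lines[i], f, " ")
        out = f[1] " " f[2] " " f[3] " " f[4] " " sprintf(fmt, f[5])
        for (j = 6; j <= n; j++) out = out " " f[j]
        print out
      }
    }'
}
```

With the file from the question it would be called as justify_sizes < foo. The width is only known after the last line has been read, which is why the lines are buffered and printed in the END block.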


AWK Assignment and execute operation with variables

I would like to find out how to assign a value to a variable and then run an operation on it.
Suppose that I get these files as a result of ls -A *.pdf | grep -v '^d':
firstOne.pdf
ordenSiq.pdf
Now I want to run an operation on the result after assigning it to a variable, for example:
% ls -lAh ordenSiq.pdf
-rw-r--r--@ 1 joseluisbz staff 47K Jun 29 15:35 ordenSiq.pdf
Here is my attempt (it does not work):
awk -v thelast="$(ls -A *.pdf | grep -v '^d' | tail -n 1 | awk '{print}')" 'BEGIN {ls -lAh thelast;}'
EDIT
Obtaining The Last File With Extension!
% thelast=$(ls -A *.pdf | grep -v '^d' | tail -n 1 | awk '{print}'); awk -v result=$thelast 'BEGIN{print result}'
Output:
ordenSiq.pdf
Extracting Only The Name (split)
% thelast=$(ls -A *.pdf | grep -v '^d' | tail -n 1 | awk '{print}'); thename=$(echo ${thelast} | awk '{split($0,a,"."); print a[1]}'); awk -v result="$thename" 'BEGIN{print result}'
Output:
ordenSiq
ALL Extensions For name (concatenation)
% thelast=$(ls -A *.pdf | grep -v '^d' | tail -n 1 | awk '{print}'); thename=$(echo ${thelast} | awk '{split($0,a,"."); print a[1]}'); allexts=$(echo ${thename}'.*'); awk -v result="$allexts" 'BEGIN{print result}'
Output:
ordenSiq.*
or
thelast=$(ls -A *.pdf | grep -v '^d' | tail -n 1 | awk '{print}'); thename=$(echo ${thelast} | awk '{split($0,a,"."); print a[1]}'); awk -v allexts=${thename}".*" 'BEGIN{print allexts}'
Execution of command with variable
I would like to obtain something like:
% ls -lAh ordenSiq.*
-rw-r--r-- 1 joseluisbz staff 0B Jul 15 12:34 ordenSiq.abc
-rw-r--r-- 1 joseluisbz staff 0B Jul 15 12:34 ordenSiq.def
-rw-r--r--@ 1 joseluisbz staff 47K Jun 29 15:35 ordenSiq.pdf
%
ERROR:
% thelast=$(ls -A *.pdf | grep -v '^d' | tail -n 1 | awk '{print}'); thename=$(echo ${thelast} | awk '{split($0,a,"."); print a[1]}'); awk -v allexts=${thename}".*" 'BEGIN{system(ls -lAh allexts)}'
Output:
sh: 0ordenSiq.*: command not found
And with
thelast=$(ls -A *.pdf | grep -v '^d' | tail -n 1 | awk '{print}'); thename=$(echo ${thelast} | awk '{split($0,a,"."); print a[1]}'); awk -v allexts=${thename}".*" 'BEGIN{system(ls -lAh $allexts)}'
Output:
awk: illegal field $(ordenSiq.*), name "allexts"
source line number 1
Some Working example:
% root1="/webroot"; echo | awk -v r=$root1 '{ print "shell variable $root1 value is " r}'
Output:
shell variable $root1 value is /webroot
Statically Works!
% ls -lAh ordenSiq.*
-rw-r--r-- 1 joseluisbz staff 0B Jul 15 12:34 ordenSiq.abc
-rw-r--r-- 1 joseluisbz staff 0B Jul 15 12:34 ordenSiq.def
-rw-r--r--@ 1 joseluisbz staff 47K Jun 29 15:35 ordenSiq.pdf
%
And the variable's value is correct!
% thelast=$(ls -A *.pdf | grep -v '^d' | tail -n 1 | awk '{print}'); thename=$(echo ${thelast} | awk '{split($0,a,"."); print a[1]}'); allexts=$(echo ${thename}'.*'); echo ${allexts}
Output:
ordenSiq.*
But doing this does not work:
% thelast=$(ls -A *.pdf | grep -v '^d' | tail -n 1 | awk '{print}'); thename=$(echo ${thelast} | awk '{split($0,a,"."); print a[1]}'); allexts=$(echo ${thename}'.*'); ls -lAh $allexts
Output:
ls: ordenSiq.*: No such file or directory
QUESTION:
What is wrong in my steps in order to perform the final operation with variables (with and without AWK)?
{ls -lAh thelast;}
This is not the correct way to run a shell command in GNU AWK. You should build a string containing the command and then pass it to the system function. For example, to sleep for 5 seconds at the beginning you might do
awk 'BEGIN{system("sleep 5")}'
Keep in mind that the system function returns the exit status of the command (see linked documentation for further discussion), not its output.
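Applied to the question, the command string can be assembled from the awk variable before it is handed to system. The following is a minimal sketch: list_exts and first_ext are hypothetical helper names, and it assumes the base name contains no single quote or backslash (\047 is the octal escape for a single quote inside an awk string).

```shell
# list_exts: hypothetical helper; lists every file named "<base>.<anything>".
list_exts() {
  awk -v base="$1" 'BEGIN {
    cmd = "ls -d -- \047" base "\047.*"   # the shell started by awk expands the glob
    status = system(cmd)                  # output of ls goes straight to stdout
    exit status                           # system() returned the exit status
  }'
}

# first_ext: same command, but its output is read into awk with getline.
first_ext() {
  awk -v base="$1" 'BEGIN {
    cmd = "ls -d -- \047" base "\047.*"
    if ((cmd | getline line) > 0) print "first match: " line
    close(cmd)
  }'
}
```

For example, list_exts ordenSiq would run ls on ordenSiq.*. Use the getline form when awk itself needs to work with the text that the command prints.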
I think this is what you're trying to do:
thelast=$(find . -type f -name '*.pdf' -printf '%T@\t%p\n' | sort -n | cut -f2- | tail -n 1)
thename="${thelast%.pdf}"
allexts=( "$thename".* )
ls -lh "${allexts[@]}"
As for what's wrong with your original code:
% thelast=$(ls -A *.pdf | grep -v '^d' | tail -n 1 | awk '{print}'); thename=$(echo ${thelast} | awk '{split($0,a,"."); print a[1]}'); allexts=$(echo ${thename}'.*'); ls -lAh $allexts
It's trying to parse the output of ls, see https://mywiki.wooledge.org/ParsingLs
It's removing the names of files that start with d for unknown reasons.
It's not listing the files in time order
It's using awk '{print}' which does nothing but copy the input to the output.
It's got unquoted variables (copy/paste your code into http://shellcheck.net and it'll tell you about the basic issues)
echo ${thename}'.*' is leaving the part that needs to be double-quoted unquoted and then single-quoting the part that needs to be unquoted.
Using split() in awk as you are would corrupt file names that contain multiple .s.
There may be other issues that aren't as obvious, idk.
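Where find -printf is not available (it is a GNU extension), the newest file can be picked with a plain loop instead of parsing ls. This is a rough sketch, assuming bash or another sh-like shell whose test command supports -nt; newest_pdf and strip_ext are hypothetical helper names.

```shell
# newest_pdf: hypothetical helper; prints the most recently modified *.pdf
# in the current directory. The body runs in a subshell, so its variables
# do not leak into the caller.
newest_pdf() (
  newest=
  for f in ./*.pdf; do
    [ -f "$f" ] || continue                 # skips directories and an unmatched glob
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
      newest=$f
    fi
  done
  [ -n "$newest" ] && printf '%s\n' "$newest"
)

# strip_ext: removes only the last extension, so "a.b.pdf" becomes "a.b".
strip_ext() {
  printf '%s\n' "${1%.*}"
}
```

The ${1%.*} expansion removes the shortest suffix matching .*, so names that contain several dots keep everything before the final one.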

awk + how to get latest numbers in file but exclude number until 4 digit

We have a file like the following:
more file
-rwxrwxrwt 3 hdfs hdfs 30 2021-06-09 07:10 /noij/gtdf/license_alerts/alert_1_1623222654527
-rwxrwxrwt 3 hdfs hdfs 30 2021-06-09 07:10 /noij/gtdf/license_alerts/alert_1_1623222654679
-rwxrwxrwt 3 hdfs hdfs 30 2021-06-09 07:10 /noij/gtdf/license_alerts/alert_1_1623222654744
-rwxrwxrwt 3 hdfs hdfs 30 2021-06-09 08:02 /noij/gtdf/license_alerts/alert_1_1623225746040_69
-rwxrwxrwt 3 hdfs hdfs 30 2021-06-09 08:02 /noij/gtdf/license_alerts/alert_1_1623225746059_66
-rwxrwxrwt 3 hdfs hdfs 30 2021-06-09 08:02 /noij/gtdf/license_alerts/alert_1_1623225746061_65
-rwxrwxrwt 3 hdfs hdfs 30 2021-06-09 08:02 /noij/gtdf/license_alerts/alert_1_1623225746162_65
I need to take the last number on each line, but exclude it if it has only 1-3 digits.
The expected results should be
1623222654527
1623222654679
1623222654744
So my approach so far is
sed s'/_/ /g' file | awk '{print $NF}'
but it prints
1623222654527
1623222654679
1623222654744
69
66
65
65
How can I improve my syntax in order to exclude trailing numbers that have 1, 2 or 3 digits,
so that we get output like this?
1623222654527
1623222654679
1623222654744
Like this?:
$ awk -F_ 'length($NF)>3{print $NF}' file
Output:
1623222654527
1623222654679
1623222654744
With _ as field separator, if the length of the last field $NF is greater than 3, output the last field.
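The length test alone would also print a long last field that is not a number. A slightly stricter variant, sketched here with the hypothetical helper name long_suffix and a hypothetical min_len parameter that defaults to 4, additionally checks that the field consists of digits only.

```shell
# long_suffix: hypothetical helper; prints the text after the last "_" only when
# it is all digits and at least min_len characters long (default 4).
long_suffix() {
  awk -F_ -v min_len="${2:-4}" \
    '$NF ~ /^[0-9]+$/ && length($NF) >= min_len { print $NF }' "$1"
}
```

It would be called as long_suffix file, or long_suffix file 6 to require at least six digits.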
With grep:
$ grep -oE '[0-9]{4,}$' file
1623222654527
1623222654679
1623222654744
If an underscore must be present, you can use GNU grep with -P for Perl-compatible regular expressions:
grep -oP ".*_\K\d{4,}$" file
The pattern matches:
.*_ Match until the last underscore
\K Clear the match buffer
\d{4,}$ Match 4 or more digits until the end of the string
Output
1623222654527
1623222654679
1623222654744
Or using sed
sed -nE 's/.*_([0-9]{4,})$/\1/p' file
-E extended regex
-n suppress automatic printing
/p print the line
Output
1623222654527
1623222654679
1623222654744

Hive How to extract data and write to local files based on column value

I am trying to extract data from a Hive table and write it to local files:
one output file per value of the "Date" column. My Hive table will have about 2+ years of history, which means I will need about 700+ different output files.
My current knowledge only allows me to write one file per run. This is my code, which can be run from the Hive command line:
INSERT OVERWRITE LOCAL DIRECTORY '/local/hive/temp'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
select date, col1, col2, col3, col4, col5
from WH_TEMP_EXTRACT.table_temp
where date='2015-09-17';
I am not a developer, but currently in the process of researching all options to perform this task. I appreciate any help you can provide here.
Extract all 2 years of data into a local file with a single query. After that, you can use awk to split it into individual files, as below.
/tmp/hive> ls -l
total 4
-rw-r--r-- 1 xxxxxxx yyyyyy 228 Sep 20 10:11 hive_extract.dat
/tmp/hive> cat hive_extract.dat
2018-09-17,abc,134
2018-09-17,abc,135
2018-09-17,abc,136
2018-09-17,abc,137
2018-09-17,abc,138
2018-09-18,abc,141
2018-09-18,abc,142
2018-09-18,abc,143
2018-09-18,abc,144
2018-09-19,abc,150
2018-09-19,abc,151
2018-09-19,abc,152
/tmp/hive> awk -F"," '{ print $0 > "file_"$1 }' hive_extract.dat
/tmp/hive> ll
total 28
-rw-r--r-- 1 xxxxxxx yyyyyy 228 Sep 20 10:11 hive_extract.dat
-rw-r--r-- 1 xxxxxxx yyyyyy 57 Sep 20 10:13 file_2018-09-19
-rw-r--r-- 1 xxxxxxx yyyyyy 76 Sep 20 10:13 file_2018-09-18
-rw-r--r-- 1 xxxxxxx yyyyyy 95 Sep 20 10:13 file_2018-09-17
/tmp/hive> cat file_2018-09-17
2018-09-17,abc,134
2018-09-17,abc,135
2018-09-17,abc,136
2018-09-17,abc,137
2018-09-17,abc,138
/tmp/hive> cat file_2018-09-18
2018-09-18,abc,141
2018-09-18,abc,142
2018-09-18,abc,143
2018-09-18,abc,144
/tmp/hive> cat file_2018-09-19
2018-09-19,abc,150
2018-09-19,abc,151
2018-09-19,abc,152
/tmp/hive>
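Let me know if this solution works for you.
EDIT 1:
Use gsub to replace the hyphens in the date with underscores. Note that modifying $1 makes awk rebuild $0 with the default output separator (a space), so the commas in the written lines are lost:
awk -F"," '{ gsub("-","_",$1); print $0 > "file_"$1 }' hive_extract.dat
EDIT 2:
Setting OFS keeps the commas, but the date inside the written lines still has underscores:
awk -F"," 'BEGIN { OFS=","} { gsub("-","_",$1); print $0 > "file_"$1 }' hive_extract.dat
EDIT 3:
Copying $1 into a variable changes only the file name and leaves the data untouched:
awk -F"," '{ fx=$1;gsub("-","_",fx);print $0 > "file_"fx }' hive_extract.dat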
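With 700+ dates, some awk implementations may hit a limit on the number of files open at once. One possible way around that, sketched below, is to sort the extract by date first and close each output file as soon as the date changes. The name split_by_date and its prefix argument are hypothetical, and sort -s (stable sort) is assumed to be available.

```shell
# split_by_date: hypothetical helper. $1 is the comma-separated extract,
# $2 is a prefix for the output files (for example "file_").
split_by_date() {
  sort -s -t, -k1,1 "$1" | awk -F, -v prefix="$2" '
    $1 != prev {                   # first line of a new date
      if (NR > 1) close(out)       # release the previous output file
      prev = $1
      out = prefix $1
    }
    { print > out }                # ">" truncates only when the file is opened
  '
}
```

It would be called as split_by_date hive_extract.dat file_. Because the input is sorted, each output file is opened exactly once, so closing it never causes a later line to overwrite earlier ones.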

awk or cut field with filename that has space

From the line below
-rw-r--r-- 1 user user 0 Aug 26 15:20 /home/user/public_html/this\ space.ext
I want to extract last column. Expected output:
/home/user/public_html/this\ space.ext
What I tried with cut:
ls -lh /home/user/public_html/this\ space.ext | cut -d ' ' -f9
output:
/home/user/public_html/this\
What I tried with awk:
ls -lh /home/user/public_html/this\ space.ext | awk '{print $9}'
output:
/home/user/public_html/this\
With awk:
$ echo "-rw-r--r-- 1 user user 0 Aug 26 15:20 /home/user/public_html/this\ space.ext" |
awk -F'[^\\\\] ' '{print $NF}'
/home/user/public_html/this\ space.ext
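This defines the delimiter as a space preceded by a non-backslash character. That preceding character is consumed as part of the delimiter, so the earlier fields lose their last character and only $NF is reliable here.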
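Another option is to remove the leading fields and keep the rest of the line as it is. This is a minimal sketch, assuming the default ls -l layout in which the name is preceded by exactly eight fields (the date taking three of them); last_column is a hypothetical helper name.

```shell
# last_column: hypothetical helper; strips the first 8 space-separated fields
# from each input line and prints whatever remains, spaces included.
last_column() {
  awk '{
    for (i = 1; i <= 8; i++) sub(/^ *[^ ]+ +/, "")   # drop one field per pass
    print
  }'
}
```

For example, ls -lh /home/user/public_html/this\ space.ext | last_column would print the path whether or not the space in it is escaped with a backslash.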

Converting output of ls -ltr to date format %m%d %H:%M

I am writing an awk script that gets a file's modification date and converts it, but I am having trouble converting the output of ls -lrt to the date format "month/day hour:minute".
My awk script:
awk 'BEGIN{
"ls -lrt "ARGV[1] "| awk '{\"print $6$7$8\" +\"%Y%m/%d %H:%M\"}'" | getline cdatefile1
}
{
print cdatefile1
}' file1
Assuming GNU ls, you want the --time-style option:
$ touch afile
$ ls -l afile
-rw-r--r-- 1 jackman jackman 0 Apr 2 08:54 afile
$ ls -l --time-style='+%m%d %H:%M' afile
-rw-r--r-- 1 jackman jackman 0 0402 08:54 afile
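If only the timestamp is needed, it may be simpler to skip ls altogether. Below is a minimal sketch, assuming a date command that accepts -r FILE to report a file's modification time (GNU coreutils date does); mtime_stamp is a hypothetical helper name.

```shell
# mtime_stamp: hypothetical helper; prints a file's modification time
# in the "%m%d %H:%M" format. Assumes date supports -r FILE.
mtime_stamp() {
  date -r "$1" '+%m%d %H:%M'
}
```

It would be called as mtime_stamp afile, and its output can be captured with cdate=$(mtime_stamp afile) for later use in a script.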