Get full list of Groups and Projects in GitLab Cloud - API

I'm trying to get a full list of projects and groups in our GitLab Cloud account.
I'm currently using their documentation as a reference (bear in mind I'm no developer) and the Linux command line to do so. Here's the documentation I'm trying to use:
https://docs.gitlab.com/ee/api/projects.html
https://docs.gitlab.com/ee/api/groups.html#list-a-groups-projects
I'm using the following command to get the data and parse in a readable format that I will export to csv or spreadsheet afterwards:
curl --header "PRIVATE-TOKEN: $TOKEN" "https://gitlab.com/api/v4/projects/?owned=yes&per_page=1000&page=1" | python -m json.tool | grep -E "http_url_to_repo|visibility" | awk '!(NR%2){print$0p}{p=$0}' | awk '{print $4,$2}' | sed -E 's/\"|\,//g' > gitlab.txt
My problem is that the command only returns about 100 of the 280 repositories we have. It doesn't seem to fetch them recursively from all the groups and subgroups.
Any ideas on how I can improve this query to get everything?
Thank you

The API returns at most 100 results per page (per_page is capped at 100), so you will have to run it several times: first with page=1, then page=2, and so on. For the second page you will need >> to append to the existing file gitlab.txt:
curl --header "..." "https://...&per_page=100&page=1" | ... > gitlab.txt
curl --header "..." "https://...&per_page=100&page=2" | ... >> gitlab.txt
Or you can write a script that first fetches all pages and then sends everything down the pipe. You may also use a for loop in bash, as sketched below.
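For example, a minimal sketch of that loop (three pages covers ~280 repositories at 100 per page; adjust the range to your project count):
# start with an empty output file, then append one parsed page per iteration
: > gitlab.txt
for page in 1 2 3; do
  curl --header "PRIVATE-TOKEN: $TOKEN" \
       "https://gitlab.com/api/v4/projects/?owned=yes&per_page=100&page=${page}" \
    | python -m json.tool \
    | grep -E "http_url_to_repo|visibility" \
    | awk '!(NR%2){print$0p}{p=$0}' \
    | awk '{print $4,$2}' \
    | sed -E 's/\"|\,//g' >> gitlab.txt
done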


How to read the number of lines in a multiline variable in Ansible

I pass a multiline variable dest_host from Jenkins to Ansible as below:
ansible-playbook -i allmwhosts.hosts action.yml -e '{ dest_host: myhost1
myhost2 }' --tags validate
In Ansible I wish to count the number of lines present in dest_host, which in this case is 2.
I can think of running command: "cat {{ dest_host }} | wc -l", registering the output, and then printing it. However, is there a better way to get this in Ansible rather than reaching for a Unix command?
That is what the | length filter is for:
- debug:
    msg: '{{ dest_host | length }}'
  vars:
    dest_host: "alpha\nbeta\n"
although be forewarned that your -e does not do what you think it does (with respect to the lines) because of YAML's scalar folding:
ansible -e '{ bob:
alpha
beta
}' -m debug -a var=bob -c local -i localhost, localhost
emits
"bob": "alpha beta"
but the | length can still help you by using | split | length
Do note that not all results may play nicely by just passing | split | length — for example, take a stdout like below:
stdout:
- "this is the first line\nthis is the second line"
If you wanted to count the number of lines, {{ stdout[0] | split | length }} would give you something like 9 or 10, not 2 — it splits by spaces!
So, in a case like this, you would instead need to use {{ stdout[0].split('\n') | length }} (thanks Python), which would give you 2 as intended/desired.
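For a quick command-line check of the split-on-newline approach, here is a sketch reusing the hypothetical bob variable from above:
# "\n" inside YAML double quotes is an escape, so bob really contains a newline;
# splitting on '\n' and piping through | length then yields the line count
ansible -e '{ bob: "alpha\nbeta" }' \
  -m debug -a "msg={{ bob.split('\n') | length }}" \
  -c local -i localhost, localhost
which emits
"msg": "2"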

LDAPSEARCH into table format

Is there any way to perform an LDAP search and save the results in a table format (e.g. CSV)?
Cheers
Jorge
You can use the excellent miller tool (mlr)
The last bit:
echo output | sed 's/://' | mlr --x2c cat then unsparsify
How it works:
the sed removes the first colon on each line (the original s/://g would also mangle colons inside attribute values), which turns the LDIF output into XTAB format: key, space, value, with blank lines separating records
--x2c converts XTAB to CSV
cat then unsparsify makes sure missing values are filled with empty fields instead of breaking into differently-shaped CSV output
Total command:
ldapsearch -H ldap://<hostname>:389 -D "<bindDN>" -W -b "<base>" '<query>' -oldif-wrap=no -LLL cn mail telephoneNumber | sed 's/://' | mlr --x2c cat then unsparsify
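To see what the conversion does in isolation, here is a tiny made-up two-record input (not real directory data) pushed through the same pipeline:
# blank line = XTAB record separator; unsparsify fills Bob's missing mail
printf 'cn: Alice\nmail: alice@example.com\n\ncn: Bob\n' \
  | sed 's/://' \
  | mlr --x2c cat then unsparsify
# expected output:
# cn,mail
# Alice,alice@example.com
# Bob,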
Just in case someone else has to do this:
Based on the answer provided in
Filter ldapsearch with awk/bash
this will output the LDAP info in CSV format:
$ ldapsearch -x -D "cn=something" | awk -v OFS=',' '{split($0,a,": ")} /^mail:/{mail=a[2]} /^uidNumber:/{uidNumber=a[2]} /^uid:/{uid=a[2]} /^cn/{cn=a[2]; print uid, uidNumber,cn , mail}' > ldap_dump.csv
NOTE
You need to be careful about the order in which you parse the LDAP attributes with awk! They must be parsed in the same order as they appear in the LDAP output, because the attribute that triggers the print (cn here) has to come last in each entry.

How to extract table data from PDF as CSV from the command line?

I want to extract all rows from here while ignoring the column headers as well as all page headers, i.e. Supported Devices.
pdftotext -layout DAC06E7D1302B790429AF6E84696FCFAB20B.pdf - \
| sed '$d' \
| sed -r 's/ +/,/g; s/ //g' \
> output.csv
The resulting file should be in CSV spreadsheet format (comma separated value fields).
In other words, I want to improve the above command so that the output doesn't break at all. Any ideas?
I'll offer you another solution as well.
While in this case the pdftotext method works with reasonable effort, there may be cases where the pages do not all share the same column widths (unlike your rather benign PDF).
Here the not-so-well-known but pretty cool free and open source software tabula-extractor is the best choice.
I myself am using the direct GitHub checkout:
$ cd $HOME ; mkdir svn-stuff ; cd svn-stuff
$ git clone https://github.com/tabulapdf/tabula-extractor.git git.tabula-extractor
I wrote myself a pretty simple wrapper script like this:
$ cat ~/bin/tabulaextr
#!/bin/bash
cd ${HOME}/svn-stuff/git.tabula-extractor/bin
./tabula "$@"
Since ~/bin/ is in my $PATH, I just run
$ tabulaextr --pages all \
$(pwd)/DAC06E7D1302B790429AF6E84696FCFAB20B.pdf \
| tee my.csv
to extract all the tables from all pages and convert them to a single CSV file.
The first ten (out of a total of 8727) lines of the CSV look like this:
$ head DAC06E7D1302B790429AF6E84696FCFAB20B.csv
Retail Branding,Marketing Name,Device,Model
"","",AD681H,Smartfren Andromax AD681H
"","",FJL21,FJL21
"","",Luno,Luno
"","",T31,Panasonic T31
"","",hws7721g,MediaPad 7 Youth 2
3Q,OC1020A,OC1020A,OC1020A
7Eleven,IN265,IN265,IN265
A.O.I. ELECTRONICS FACTORY,A.O.I.,TR10CS1_11,TR10CS1
AG Mobile,Status,Status,Status
It even got these lines on the last page, 293, right:
nabi,"nabi Big Tab HD\xe2\x84\xa2 20""",DMTAB-NV20A,DMTAB-NV20A
nabi,"nabi Big Tab HD\xe2\x84\xa2 24""",DMTAB-NV24A,DMTAB-NV24A
TabulaPDF and Tabula-Extractor are really, really cool for jobs like this!
Update
As Martin R commented, tabula-java is the new version of tabula-extractor and is actively developed; 1.0.0 was released on July 21st, 2017.
Download the jar file and run it with a recent Java:
java -jar ./tabula-1.0.0-jar-with-dependencies.jar \
--pages=all \
./DAC06E7D1302B790429AF6E84696FCFAB20B.pdf \
> support_devices.csv
What you want is rather easy, but you also have a different problem (I'm not sure you are aware of it...).
First, you should add -nopgbrk ("no page breaks, please!") to your command, so the pesky ^L characters that would otherwise appear in the output don't need to be filtered out later.
Adding a grep -vE '(Supported Devices|^$)' will then filter out all the lines you do not want, including empty lines, or lines with only spaces:
pdftotext -layout -nopgbrk \
DAC06E7D1302B790429AF6E84696FCFAB20B.pdf - \
| grep -vE '(Supported Devices|^$|Marketing Name)' \
| gsed '$d' \
| gsed -r 's# +#,#g' \
| gsed 's# ##g' \
> output2.csv
However, your other problem is this:
Some of the table fields are empty.
Empty fields appear with the -layout option as a series of space characters, sometimes even two in the same row.
However, the text columns are not spaced identically from page to page.
Therefore you will not know from line to line how many spaces you need to regard as an "empty CSV field" (where you'd need an extra , separator).
As a consequence, your current code will show only one, two or three (instead of four) fields for some lines, and these fields end up in the wrong columns!
There is a workaround for this:
Add the -x ... -y ... -W ... -H ... parameters to pdftotext to crop the PDF column-wise.
Then append the columns with a combination of utilities like paste and column.
The following command extracts the first columns:
pdftotext -layout -x 38 -y 77 -W 176 -H 500 \
DAC06E7D1302B790429AF6E84696FCFAB20B.pdf - > 1st-columns.txt
These are for second, third and fourth columns:
pdftotext -layout -x 214 -y 77 -W 176 -H 500 \
DAC06E7D1302B790429AF6E84696FCFAB20B.pdf - > 2nd-columns.txt
pdftotext -layout -x 390 -y 77 -W 176 -H 500 \
DAC06E7D1302B790429AF6E84696FCFAB20B.pdf - > 3rd-columns.txt
pdftotext -layout -x 567 -y 77 -W 176 -H 500 \
DAC06E7D1302B790429AF6E84696FCFAB20B.pdf - > 4th-columns.txt
BTW, I cheated a bit: in order to get a clue about what values to use for -x, -y, -W and -H, I first ran this command to find the exact coordinates of the column header words:
pdftotext -f 1 -l 1 -layout -bbox \
DAC06E7D1302B790429AF6E84696FCFAB20B.pdf - | head -n 10
It's always good if you know how to read and make use of pdftotext -h. :-)
Anyway, how to append the four text files as columns side by side, with the proper CSV separator in between, you should find out yourself. Or ask a new question :-)
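For what it's worth, here is one minimal sketch, assuming the four files came out of pdftotext with the same number of lines in matching order:
# paste joins the files line by line; -d, sets comma as the separator
paste -d, 1st-columns.txt 2nd-columns.txt 3rd-columns.txt 4th-columns.txt > output3.csv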
This can be done easily with an IntelliGet (http://akribiatech.com/intelliget) script like the one below:
userVariables = brand, name, device, model;
{
  start = Not(Or(Or(IsSubstring("Supported Devices", Line(0)),
                    IsSubstring("Retail Branding", Line(0))),
                 IsEqual(Length(Trim(Line(0))), 0)));
  brand  = Trim(Substring(Line(0), 10, 44));
  name   = Trim(Substring(Line(0), 45, 79));
  device = Trim(Substring(Line(0), 80, 114));
  model  = Trim(Substring(Line(0), 115, 200));
  output = Concat(brand, ",", name, ",", device, ",", model);
}
For the case where you want to extract tabular data from PDFs over which you have control at creation time (e.g. timesheets or contracts your employees have to sign), the following solution will be cleaner:
Create a PDF form with field IDs.
Let people fill and save the PDF forms.
Use Apache PDFBox, an open source tool that can extract form data from a PDF. It includes a command-line example tool, PrintFields, that you would call as follows (class path placeholder left for you to fill in) to print the desired field information:
java -cp <path-to-pdfbox-examples-jar> org.apache.pdfbox.examples.interactive.form.PrintFields file.pdf
For other options, see this question.
As an alternative to the above workflow, maybe you could also use a digital signature web service that allows PDF form filling and export of the data to tables, such as SignRequest, which lets you create templates and later export the data of signed documents. (Not affiliated, just found this myself.)

Google BigQuery - how to drop table with bq command?

The Google BigQuery bq command lets you create, load, query, and alter tables.
I did not find any documentation about dropping a table; I'd be happy to know how to do it.
I found the bq tool much easier to use than writing a Python interface for each command.
Thanks.
Found it:
bq rm -f -t data_set.table_name
-t for table, -f for force, -r removes all tables in the named dataset
great tool.
Is there a way to bulk delete multiple tables? – activelearner
In bash, you can do something like:
for i in $(bq ls -n 9999 my_dataset | grep keyword | awk '{print $1}'); do bq rm -ft my_dataset.$i; done;
Explanation:
bq ls -n 9999 my_dataset - list up to 9999 tables in my dataset
| grep keyword - pipe the results of the previous command into grep, search for a keyword that your tables have in common
| awk '{print $1}' - pipe the results of the previous command into awk and print only the first column
Wrap all that into a for loop
do bq rm -ft my_dataset.$i; done; - remove each table from your dataset
I would highly recommend running the commands to list out the tables you want to delete before you add the 'do bq rm'. This way you can ensure you are only deleting the tables you actually want to delete.
UPDATE:
The argument -ft now returns an error and should be simply -f to force the deletion, without a prompt:
for i in $(bq ls -n 9999 my_dataset | grep keyword | awk '{print $1}'); do bq rm -f my_dataset.$i; done;
You can use Python code (e.g. in a Jupyter notebook) for the same purpose:
from google.cloud import bigquery

bigquery_client = bigquery.Client()  # create a BigQuery client object
dataset_id='Name of your dataset'
table_id='Table to be deleted'
table_ref = bigquery_client.dataset(dataset_id).table(table_id)
bigquery_client.delete_table(table_ref) # API request
print('Table {}:{} deleted.'.format(dataset_id, table_id))
If you want to delete a complete dataset:
If the dataset contains tables as well and you want to delete it together with its tables in one go, the command is:
!bq rm -f -r serene-boulder-203404:Temp1  # removes the whole dataset along with the tables in it
If your dataset is empty, you can use the following command instead.
Make sure that you have deleted all the tables in that dataset first; otherwise it will generate an error (dataset is still in use).
#Now remove an empty dataset using bq command from Python
!bq rm -f dataset_id
print("dataset deleted successfully !!!")
I used a Windows cmd.exe for loop to delete a month of table data, but this relies on your table naming:
for %d in (01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31) DO bq rm -f -t dataset.tablename_201701%d
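On Linux/macOS, the equivalent bash loop would look something like this (a sketch, with the same table-naming assumption):
# seq -w zero-pads, so the suffixes match tablename_20170101 .. tablename_20170131
for d in $(seq -w 1 31); do
  bq rm -f -t dataset.tablename_201701$d
done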
Expanding on the excellent answer from @james: I simply needed to remove all tables in a dataset but not the dataset itself. Hence the grep part was unnecessary for me; however, I still needed to get rid of the
table_id
------------------
header that bq returns when listing tables. For that I used sed to remove those first two lines:
for i in $(bq ls -n 9999 my_dataset | sed "1,2 d" | awk '{print $1}'); do bq rm -f my_dataset.$i; done;
Perhaps there's a bq option to not return that header, but if there is, I don't know it.
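If there is, my guess would be the global --format flag; something like the following should skip the header entirely (assumptions on my part: that your bq version supports --format=json, that jq is installed, and that tableReference.tableId is the right JSON path):
# list tables as JSON and extract just the table ids, avoiding the pretty-printed header
for i in $(bq ls -n 9999 --format=json my_dataset | jq -r '.[].tableReference.tableId'); do
  bq rm -f my_dataset.$i
done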

Retrieving process id using sshcmd on unix

I want to retrieve the process id when my code successfully starts the job, but it's returning null.
I am starting the job using sshcmd, creating a log of the sshcmd output, and then trying to retrieve the process id into new_process_id using sshcmd. If I get new_process_id I will show it; otherwise I will show the output collected in the log file. But I am getting null in new_process_id.
remote_command="nohup J2EEServer/config/AMSS/scripts/${batch_job} & "
sshcmd -q -u ${login_user} -s ${QA_HOST} "$remote_command" > /tmp/nohup_${batch_job} 2>&1
remote_command=$(ps -ef | grep ${login_user} | grep $batch_job | grep -v grep | awk '{print $2}');
new_process_id=`sshcmd -q -u ${login_user} -s ${QA_HOST} "$remote_command"`
runstatus=`grep Synchronized. /tmp/nohup_${batch_job}`
if [[ $runstatus != "" ]]
then
new_process_id=`cat /tmp/nohup_${batch_job}`
fi
echo $new_process_id
The second assignment to remote_command does not store the command as a string: the $( ... ) substitution runs the ps pipeline immediately, on your local machine, and stores its output.
Some other hints: if you are creating a second, unrelated variable, give it another name; it will avoid unnecessary confusion.
What you are attempting to do next with runstatus, overwriting an already existing but unused variable, is totally unclear to me.
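For what it's worth, a minimal sketch of the fix (assuming sshcmd takes the remote command as its last argument, as used above):
# store the pipeline as a *string*; the escaped \$2 keeps awk's $2 intact,
# so the whole pipeline runs on the remote host instead of locally
remote_pid_cmd="ps -ef | grep ${login_user} | grep ${batch_job} | grep -v grep | awk '{print \$2}'"
new_process_id=$(sshcmd -q -u "${login_user}" -s "${QA_HOST}" "$remote_pid_cmd")
echo "$new_process_id"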