When I use a GitLab API GET request to fetch all commits from a specific date range on a specific branch, I only receive commits starting from the day AFTER the since date I pass.
For example, if I define the range from 2022-12-01T12:17:30.000+02:00 until 2022-12-15T15:01:36.000+01:00, the commits returned by the curl request only start from 2 Dec 2022.
How do I get the response to include the initial date?
curl -s --header "PRIVATE-TOKEN: <token>" https://gitlab.example.com/api/v4/projects/ID/repository/commits"?ref_name=${branch}&since=${since_date}&until=${until_date}" | jq -r '.[] | .committed_date + "\t" + .title'
The response I receive:
2022-12-15T15:01:36.000+01:00
2022-12-15T14:39:44.000+02:00
2022-12-14T08:26:43.000+02:00
2022-12-13T20:55:03.000+02:00
2022-12-13T15:51:34.000+01:00
2022-12-13T15:43:26.000+01:00
2022-12-12T16:50:49.000+01:00
2022-12-07T16:38:26.000+01:00
2022-12-05T22:41:04.000+01:00
2022-12-02T09:23:58.000+01:00
By the way, I also tried adding ?first_parent=true, but it didn't help.
It could be a pagination problem. Try adding &per_page=100 to see whether there are more commits; I don't think you can get more than 100 items in one API call.
See https://docs.gitlab.com/ee/api/#pagination for more info
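If the date range spans more than one page, looping over the page parameter (which the GitLab API supports alongside per_page) should collect everything. A minimal sketch, reusing the variables from the question:

page=1
while :; do
  commits=$(curl -s --header "PRIVATE-TOKEN: <token>" \
    "https://gitlab.example.com/api/v4/projects/ID/repository/commits?ref_name=${branch}&since=${since_date}&until=${until_date}&per_page=100&page=${page}")
  # Stop once a page comes back empty
  [ "$(echo "$commits" | jq 'length')" -eq 0 ] && break
  echo "$commits" | jq -r '.[] | .committed_date + "\t" + .title'
  page=$((page + 1))
done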
I have a date-partitioned table in BigQuery that I'd like to export. I would like to export it such that the data from each day ends up in a different file, for example in a GCS bucket with a nested folder structure like gs://my-bucket/YYYY/MM/DD/. Is this possible?
Please don't tell me I need to run a separate export job for each day of data: I know this is possible, but it is painful when exporting many years' worth of data, as you need to run thousands of export jobs.
On the import side, this is possible with the Parquet format.
If this is not possible with BigQuery directly, is there a GCP tool like Dataproc or Dataflow that would make this easy (bonus points for linking to a script that actually performs this export)?
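For context, exporting a single day by hand looks something like the following, using the partition decorator (the project, dataset, table, and bucket names are just placeholders); the pain is repeating it thousands of times:

# Export one partition (2019-06-01) of a day-partitioned table to a dated folder
bq extract --destination_format=AVRO \
  'my-project:my_dataset.my_table$20190601' \
  'gs://my-bucket/2019/06/01/part-*.avro'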
Would a bash script with bq extract work?
#!/bin/bash
# Stop on first error
set -e;
# Used for BigQuery partitioning (to distinguish from bash variable reference)
DOLLAR="\$"
# date -I : output ISO 8601 date
# date -d : parse the given date string
start=$(date -I -d 2019-06-01) || exit -1
end=$(date -I -d 2019-06-15) || exit -1
d=${start}
# string(d) <= string(end)
while [[ ! "$d" > "$end" ]]; do
  YYYYMMDD=$(date -d ${d} +"%Y%m%d")
  YYYY=$(date -d ${d} +"%Y")
  MM=$(date -d ${d} +"%m")
  DD=$(date -d ${d} +"%d")
  # print current date
  echo ${d}
  cmd="bq extract --destination_format=AVRO \
    'project:dataset.table${DOLLAR}${YYYYMMDD}' \
    'gs://my-bucket/${YYYY}/${MM}/${DD}/part*.avro'
    "
  # execute
  eval ${cmd}
  # d++
  d=$(date -I -d "$d + 1 day")
done
Maybe you should request a new feature at https://issuetracker.google.com/savedsearches/559654.
I'm not a bash ninja, so I'm sure there is a cooler way to compare dates.
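One alternative (an untested sketch, with the same example dates) is to compare epoch seconds rather than relying on string ordering:

d=$(date -I -d 2019-06-01)
end=$(date -I -d 2019-06-15)
# Compare as epoch seconds instead of lexicographically
while [ "$(date -d "$d" +%s)" -le "$(date -d "$end" +%s)" ]; do
  echo "$d"
  d=$(date -I -d "$d + 1 day")
done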
At @Ben P's request, here's the solution (a Python script) I've used previously to run lots of export jobs in parallel. This is pretty rough code and should be improved by checking the status of each export job after it runs to see whether it succeeded.
I won't accept this as an answer because the question is looking for a BigQuery-native way of performing this task.
Note that this script was for exporting a versioned dataset, so there's a bit of extra logic around that which many users may not need. It assumes that the input table and output folder names both use the version. This should be easy to strip out.
import argparse
import datetime as dt
from google.cloud import bigquery
from multiprocessing import Pool
import random
import time

GCS_EXPORT_BUCKET = "YOUR_BUCKET_HERE"
VERSION = "dataset_v1"

def export_date(export_dt, bucket=GCS_EXPORT_BUCKET, version=VERSION):
    table_id = '{}${:%Y%m%d}'.format(version, export_dt)
    gcs_filename = '{}/{:%Y/%m/%d}/{}-*.jsonlines.gz'.format(version, export_dt, table_id)
    gcs_path = 'gs://{}/{}'.format(bucket, gcs_filename)
    job_id = export_data_to_gcs(table_id, gcs_path, 'currents')
    return (export_dt, job_id)

def export_data_to_gcs(table_id, destination_gcs_path, dataset):
    bigquery_client = bigquery.Client()
    dataset_ref = bigquery_client.dataset(dataset)
    table_ref = dataset_ref.table(table_id)
    job_config = bigquery.job.ExtractJobConfig()
    job_config.destination_format = 'NEWLINE_DELIMITED_JSON'
    job_config.compression = 'GZIP'
    job_id = 'export-{}-{:%Y%m%d%H%M%S}'.format(table_id.replace('$', '--'),
                                                dt.datetime.utcnow())
    # Add a bit of jitter
    time.sleep(5 * random.random())
    job = bigquery_client.extract_table(table_ref,
                                        destination_gcs_path,
                                        job_config=job_config,
                                        job_id=job_id)
    print(f'Now running job_id {job_id}')
    time.sleep(50)
    job.reload()
    while job.running():
        time.sleep(10)
        job.reload()
    return job_id

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('-s', "--startdate",
                        help="The Start Date - format YYYY-MM-DD (Inclusive)",
                        required=True,
                        type=dt.date.fromisoformat)
    parser.add_argument('-e', "--enddate",
                        help="The End Date format YYYY-MM-DD (Exclusive)",
                        required=True,
                        type=dt.date.fromisoformat)

    args = parser.parse_args()
    start_date = args.startdate
    end_date = args.enddate

    dates = []
    while start_date < end_date:
        dates.append(start_date)
        start_date += dt.timedelta(days=1)

    with Pool(processes=30) as pool:
        jobs = pool.map(export_date, dates, chunksize=1)
To run this code, put it into a file called bq_exporter.py and then run python bq_exporter.py -s 2019-01-01 -e 2019-02-01. That'll export January 2019 and print each export job's ID. You can check on the status of a job using the BigQuery CLI via bq show -j JOB_ID.
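For example, if the printed job IDs are collected into a file (a hypothetical jobs.txt with one ID per line), their final states can be checked in a loop:

# Print the last line of bq show output (which includes the job state) for each ID
while read -r job_id; do
  bq show -j "$job_id" | tail -n 1
done < jobs.txt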
I'm planning to migrate a couple hundred bugs tracked in another (home-rolled) system into GitHub's issue system. Most of these bugs were closed in the past. I can use github's API to create an issue, e.g.
curl -u $GITHUB_TOKEN:x-oauth-basic https://api.github.com/repos/my_organization/my_repo/issues -d '{
"title": "test",
"body": "the body"
}'
... however, this will leave me with a bunch of open issues. How do I close them? I've tried just closing at the time of creation, e.g.:
curl -u $GITHUB_TOKEN:x-oauth-basic https://api.github.com/repos/my_organization/my_repo/issues -d '{
"title": "test",
"body": "the body",
"state": "closed"
}'
... but the result is to create an open issue (i.e. the "state" is ignored).
It looks to me like I should be able to "edit" an issue to close it (https://developer.github.com/v3/issues/#edit-an-issue) ... but I'm unable to figure out what the corresponding curl command is supposed to look like. Any guidance?
Extra credit: I'd really like to be able to assign a "closed" date, to agree with the actual closed date captured in our current system. It's not clear that this is possible.
Thanks!
Migrating a bunch of issues to GitHub with the command line? Are you crazy?
Anyway, using PHP and hhb_curl from https://github.com/divinity76/hhb_.inc.php/blob/master/hhb_.inc.php, this worked for me. Unfortunately I couldn't set the "closed_at" date (it was ignored by the API), but I could emulate it using labels, so the closed date still shows up on the issue. The code should give you something to work from when porting it to the command line:
<?php
declare(strict_types = 1);
require_once('hhb_.inc.php');
$hc = new hhb_curl();
define('BASE_URL', 'https://api.github.com');
$hc->_setComfortableOptions();
$data = array(
    'state' => 'closed',
    'closed_at' => '2011-04-22T13:33:48Z', // << unfortunately, ignored
    'labels' => array(
        'closed at 2011-04-22T13:33:48Z' // << we can fake it using labels...
    )
);
$data = json_encode($data);
$hc->setopt_array(array(
    CURLOPT_CUSTOMREQUEST => 'PATCH',
    // /repos/:owner/:repo/issues/:number
    // https://github.com/divinity76/GitHubCrashTest/issues/1
    CURLOPT_URL => BASE_URL . '/repos/divinity76/GitHubCrashTest/issues/1',
    CURLOPT_USERAGENT => 'test',
    CURLOPT_HTTPHEADER => array(
        'Accept: application/vnd.github.v3+json',
        'Content-Type: application/json',
        'Authorization: token <removed>'
    ),
    CURLOPT_POSTFIELDS => $data,
));
$hc->exec();
hhb_var_dump($hc->getStdErr(), $hc->getResponseBody());
(I modified the "Authorization: token" line before posting it on Stack Overflow, of course.)
As suggested by hanshenrik, the correct altered curl command is:
curl -u $GITHUB_TOKEN:x-oauth-basic https://api.github.com/repos/my_organization/my_repo/issues/5 -d '{
"state": "closed"
}'
I'd failed to understand the documentation referenced in his answer:
/repos/:owner/:repo/issues/:number
translates to
https://api.github.com/repos/my_organization/my_repo/issues/5
(I now understand that fields starting with ":" are variables)
For the record, I'm planning to script the calls to curl. :)
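A minimal sketch of what such a script might look like, assuming the issue number can be pulled out of the JSON returned by the create call (jq is used here for that):

#!/bin/bash
repo="my_organization/my_repo"
# Create the issue and capture its number from the response
number=$(curl -s -u "$GITHUB_TOKEN:x-oauth-basic" \
  "https://api.github.com/repos/$repo/issues" \
  -d '{"title": "test", "body": "the body"}' | jq -r '.number')
# Close it with a PATCH to the issue's own URL
curl -s -u "$GITHUB_TOKEN:x-oauth-basic" -X PATCH \
  "https://api.github.com/repos/$repo/issues/$number" \
  -d '{"state": "closed"}'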
I am trying to replicate, in R, the curl call from the Splunk REST API docs, so that I can run a search from R. Sorry, I won't be able to provide details on the parameters to replicate, hence I'm attaching the link for reference.
curl -u admin:changeme -k https://localhost:8089/services/search/jobs -d search="search *"
The curl command returns a sid. However, when I try to replicate it in R using httr, it returns a list of all existing search details instead. I have tried both POST and GET in httr, just in case. Below is my sample code; ideally it should return a sid, but it returns the list of existing searches. I'm not sure what I'm missing; I'm new to RCurl and httr, and I tried curlPerform as well with the same result. What exactly does -d do in curl? Is that the thing I'm failing to replicate?
library(httr)

response <- GET(splunk_server, path=search_job_export_endpoint,
                config(ssl_verifyhost=FALSE, ssl_verifypeer=0),
                authenticate(username, password),
                query=list(search=urlencode(search_terms)),
                verbose())
result <- read.table(text=content(response, as="text"), sep=",", header=TRUE,
                     stringsAsFactors=FALSE)
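For what it's worth, the difference is visible with curl itself: -d makes curl send a POST with the data as the request body, which is what creates a search job and returns a sid, whereas a plain GET on the same endpoint just lists the existing jobs (which matches what the httr GET call above returns). A rough sketch:

# POST (what -d does): creates a search job and returns a sid
curl -u admin:changeme -k https://localhost:8089/services/search/jobs \
     -d search="search *"

# GET (no -d): lists the existing search jobs instead
curl -u admin:changeme -k https://localhost:8089/services/search/jobs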
It's the first great virtue of programmers. All of us have, at one time or another, automated a task with a bit of throw-away code. Sometimes it takes a couple of seconds tapping out a one-liner; sometimes we spend an exorbitant amount of time automating away a two-second task and then never use it again.
What tiny hack have you found useful enough to reuse? To go so far as to make an alias for?
Note: before answering, please check to make sure it's not already covered in the favourite command-line tricks using BASH or Perl/Ruby one-liner questions.
I found this on dotfiles.org just today. It's very simple, but clever. I felt stupid for not having thought of it myself.
###
### Handy Extract Program
###
extract () {
    if [ -f "$1" ] ; then
        case "$1" in
            *.tar.bz2)  tar xvjf "$1"   ;;
            *.tar.gz)   tar xvzf "$1"   ;;
            *.bz2)      bunzip2 "$1"    ;;
            *.rar)      unrar x "$1"    ;;
            *.gz)       gunzip "$1"     ;;
            *.tar)      tar xvf "$1"    ;;
            *.tbz2)     tar xvjf "$1"   ;;
            *.tgz)      tar xvzf "$1"   ;;
            *.zip)      unzip "$1"      ;;
            *.Z)        uncompress "$1" ;;
            *.7z)       7z x "$1"       ;;
            *)          echo "'$1' cannot be extracted via >extract<" ;;
        esac
    else
        echo "'$1' is not a valid file"
    fi
}
Here's a filter that puts commas in the middle of any large numbers in standard input.
$ cat ~/bin/comma
#!/usr/bin/perl -p

s/(\d{4,})/commify($1)/ge;

sub commify {
    local $_ = shift;
    1 while s/^([ -+]?\d+)(\d{3})/$1,$2/;
    return $_;
}
I usually wind up using it for long output lists of big numbers, and I tire of counting decimal places. Now instead of seeing
-rw-r--r-- 1 alester alester 2244487404 Oct 6 15:38 listdetail.sql
I can run that as ls -l | comma and see
-rw-r--r-- 1 alester alester 2,244,487,404 Oct 6 15:38 listdetail.sql
This script saved my career!
Quite a few years ago, I was working remotely on a client database. I updated a shipment to change its status, but I forgot the WHERE clause.
I'll never forget the feeling in the pit of my stomach when I saw (6834 rows affected). I basically spent the entire night going through event logs and figuring out the proper status for all those shipments. Crap!
So I wrote a script (originally in awk) that would start a transaction for any updates, and check the rows affected before committing. This prevented any surprises.
So now I never do updates from command line without going through a script like this. Here it is (now in Python):
import sys
import subprocess as sp

pgm = "isql"

if len(sys.argv) == 1:
    print "Usage: \nsql sql-string [rows-affected]"
    sys.exit()

sql_str = sys.argv[1].upper()

max_rows_affected = 3
if len(sys.argv) > 2:
    max_rows_affected = int(sys.argv[2])

if sql_str.startswith("UPDATE"):
    sql_str = "BEGIN TRANSACTION\\n" + sql_str
    p1 = sp.Popen([pgm, sql_str], stdout=sp.PIPE,
                  shell=True)
    (stdout, stderr) = p1.communicate()
    print stdout
    # example -> (33 rows affected)
    affected = stdout.splitlines()[-1]
    affected = affected.split()[0].lstrip('(')
    num_affected = int(affected)

    if num_affected > max_rows_affected:
        print "WARNING! ", num_affected, "rows were affected, rolling back..."
        sql_str = "ROLLBACK TRANSACTION"
        ret_code = sp.call([pgm, sql_str], shell=True)
    else:
        sql_str = "COMMIT TRANSACTION"
        ret_code = sp.call([pgm, sql_str], shell=True)
else:
    ret_code = sp.call([pgm, sql_str], shell=True)
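Usage then looks something like this (the table and column names are made up); the second argument is the maximum number of rows you expect to touch:

# Allow at most 1 affected row; anything more gets rolled back automatically
sql "UPDATE shipments SET status = 'SHIPPED' WHERE shipment_id = 12345" 1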
I use this script under assorted Linuxes to check whether a directory copy between machines (or to CD/DVD) worked, or whether copying (e.g. ext3 UTF-8 filenames -> fuseblk) has mangled special characters in the filenames.
#!/bin/bash
## dsum Do checksums recursively over a directory.
## Typical usage: dsum <directory> > outfile
export LC_ALL=C # Optional - use sort order across different locales
if [ $# != 1 ]; then echo "Usage: ${0/*\//} <directory>" 1>&2; exit; fi
cd "$1" 1>&2 || exit
#findargs=-follow # Uncomment to follow symbolic links
find . $findargs -type f | sort | xargs -d'\n' cksum
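One way to use it for the copy check described above (the directory and host names are made up, and dsum is assumed to be on the PATH of both machines):

# No output from diff means the two trees have identical contents
diff <(dsum /data/photos) <(ssh otherhost dsum /backup/photos)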
Sorry, I don't have the exact code handy, but I coded a regular expression for searching source code in VS.NET that allowed me to search anything not in comments. It came in very useful on a particular project I was working on, where people insisted that commenting out code was good practice, in case you wanted to go back and see what the code used to do.
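The original regex was for the VS.NET Find-in-Files dialog; a rough command-line approximation of the same idea (it only skips whole-line // comments, and the search term is just an example) would be:

# Find "TODO", then drop any hits that sit on a line starting with //
grep -rn 'TODO' src/ | grep -vE ':[0-9]+:[[:space:]]*//'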
I have two Ruby scripts that I modify regularly to download all the strips of various webcomics. Extremely handy! Note: they require wget, so probably Linux. Note 2: read these before you try them; they need a little bit of modification for each site.
Date based downloader:
#!/usr/bin/ruby -w

Day = 60 * 60 * 24
Format = "hjlsdahjsd/comics/st%Y%m%d.gif"
t = Time.local(2005, 2, 5)
MWF = [1, 3, 5]
until t == Time.local(2007, 7, 9)
  if MWF.include? t.wday
    `wget #{t.strftime(Format)}`
    sleep 3
  end
  t += Day
end
Or you can use the number based one:
#!/usr/bin/ruby -w

Format = "http://fdsafdsa/comics/%08d.gif"
1.upto(986) do |i|
  `wget #{sprintf(Format, i)}`
  sleep 1
end
Instead of having to repeatedly open files in SQL Query Analyser and run them, I found the syntax needed to make a batch file, and could then run 100 at once. Oh the sweet sweet joy! I've used this ever since.
isqlw -S servername -d dbname -E -i F:\blah\whatever.sql -o F:\results.txt
This goes back to my COBOL days, but I had two generic COBOL programs, one batch and one online (mainframe folks will know what these are). They were shells of a program that could take any set of parameters and/or files and be run, batch or executed in an IMS test region. I had them set up so that, depending on the parameters, I could access files, databases (DB2 or IMS DB), or just manipulate working storage or whatever.
It was great because I could test that date function without guessing, or test why there was truncation or why there was a database ABEND. The programs grew in size as time went on to include all sorts of tests and became a staple of the development group. Everyone knew where the code resided and included them in their unit testing as well. Those programs got so large (most of the code was commented-out tests), and it was all contributed by people through the years. They saved so much time and settled so many disagreements!
I coded a Perl script to map dependencies, without going into an endless loop, for a legacy C program I inherited .... one that also had a diamond dependency problem.
I wrote a small program that e-mailed me when I received e-mails from friends on a rarely used e-mail account.
I wrote another small program that sent me text messages if my home IP changed.
To name a few.
Years ago I built a suite of applications on a custom web application platform in Perl.
One cool feature was converting SQL query strings into human-readable sentences that described what the results were.
The code was relatively short but the end effect was nice.
I've got a little app that you run and it dumps a GUID into the clipboard. You can run it with /noui or not. With the UI, it's a single button that drops a new GUID every time you click it. Without it, it drops a new one and then exits.
I mostly use it from within VS. I have it as an external app and mapped to a shortcut. I'm writing an app that relies heavily on XAML and GUIDs, so I always find I need to paste a new GUID into XAML...
Any time I write a clever list comprehension or use of map/reduce in Python. There was one like this:
if reduce(lambda x, c: locks[x] and c, locknames, True):
print "Sub-threads terminated!"
The reason I remember that is that I came up with it myself, then saw the exact same code on somebody else's website. Nowadays it'd probably be done like:
if all(map(lambda z: locks[z], locknames)):
print "ya trik"
I've got 20 or 30 of these things lying around, because once I coded up the framework for my standard console app in Windows I can pretty much drop in any logic I want, so I've got a lot of these little things that solve specific problems.
I guess the one I'm using a lot right now is a console app that takes stdin and colorizes the output based on XML profiles that match regular expressions to colors. I use it for watching my log files from builds. The other one is a command-line launcher so I don't pollute my PATH env var (it would exceed the limit on some systems anyway, namely Win2K).
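A very rough shell approximation of the colorizer idea, for anyone who wants the effect without the XML profiles (grep's --color plus an alternation that also matches every line, so nothing gets filtered out):

# Highlight ERROR and WARN in a build log while passing all other lines through
tail -f build.log | grep --color=always -E 'ERROR|WARN|$'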
I'm constantly connecting to various linux servers from my own desktop throughout my workday, so I created a few aliases that will launch an xterm on those machines and set the title, background color, and other tweaks:
alias x="xterm" # local
alias xd="ssh -Xf me@development_host xterm -bg aliceblue -ls -sb -bc -geometry 100x30 -title Development"
alias xp="ssh -Xf me@production_host xterm -bg thistle1 ..."
I have a bunch of servers I frequently connect to, as well, but they're all on my local network. This Ruby script prints out the command to create aliases for any machine with ssh open:
#!/usr/bin/env ruby
require 'rubygems'
require 'dnssd'

handle = DNSSD.browse('_ssh._tcp') do |reply|
  print "alias #{reply.name}='ssh #{reply.name}.#{reply.domain}';"
end
sleep 1
handle.stop
Use it like this in your .bash_profile:
eval `ruby ~/.alias_shares`