Compiling a week's worth of tweets automatically? - automation

I'd like to be able to run a script that parsed through the twitter page and compiled a list of tweets for a given time period - one week to be more exact. Ideally it should return the results as a html list that could then be posted in a blog. Like here:
http://www.perezfox.com/2009/07/12/the-week-in-tweet-for-2009-07-12/
I'm sure there's a script out there that could do it, unless the guy does it manually (that would be a big pain!). If there is such a script forgive my ignorance.
Thanks.

Use the Twitter search API. For instance, this query returns my tweets between 2009-07-10 and 2009-07-17:
http://search.twitter.com/search.atom?q=from:tormodfj&since=2009-07-10&until=2009-07-17

For anyone that's interested, I hacked together a quick PHP parser that will take the XML output of the above feed and turn it into a nice list. It's sensible if you post a lot of tweets to use the rpp parameter, so that your feed doesn't get clipped at 15. The maximum limit is 100. So by sticking this url into NetNewsWire (or equivalent feed reader):
http://search.twitter.com/search.atom?q=from:yourTwitterAccountHere&since=2009-07-13&until=2009-07-19&rpp=100
and exporting the xml to a hard file, you can use this script:
<?php
$date = "";
$in = 'links.xml'; //tweets
file_exists($in) ? $xml = simplexml_load_file($in) : die ('Failed to open xml data.');
foreach($xml->entry as $item)
{
$newdate = date("dS F", strtotime($item->published));
if ($date == "")
{
echo "<h2>$newdate</h2>\n<ul>\n";
}
elseif ($newdate != $date)
{
echo "</ul>\n<h2>$newdate</h2>\n<ul>\n";
}
echo "<li>\n<p>" . $item->content ." *</p>\n</li>\n";
$date = $newdate;
}
echo "</ul>\n";
?>

Related

Perl : Scrape website and how to download PDF files from the website using Perl Selenium:Chrome

So I'm studying Scraping website using Selenium:Chrome on Perl, I just wondering how can I download all pdf files from year 2017 to 2021 and store it into a folder from this website https://www.fda.gov/drugs/warning-letters-and-notice-violation-letters-pharmaceutical-companies/untitled-letters-2021 . So far this is what I've done
use strict;
use warnings;
use Time::Piece;
use POSIX qw(strftime);
use Selenium::Chrome;
use File::Slurp;
use File::Copy qw(copy);
use File::Path;
use File::Path qw(make_path remove_tree);
use LWP::Simple;
my $collection_name = "mre_zen_test3";
make_path("$collection_name");
#DECLARE SELENIUM DRIVER
my $driver = Selenium::Chrome->new;
#NAVIGATE TO SITE
print "trying to get toc_url\n";
$driver->navigate('https://www.fda.gov/drugs/warning-letters-and-notice-violation-letters-pharmaceutical-companies/untitled-letters-2021');
sleep(8);
#GET PAGE SOURCE
my $toc_content = $driver->get_page_source();
$toc_content =~ s/[^\x00-\x7f]//g;
write_file("toc.html", $toc_content);
print "writing toc.html\n";
sleep(5);
$toc_content = read_file("toc.html");
This script only download the entire content of the website. Hope someone here can help me and teach me. Thank you very much.
Here is some working code, to help you get going hopefully
use warnings;
use strict;
use feature 'say';
use Path::Tiny; # only convenience
use Selenium::Chrome;
my $base_url = q(https://www.fda.gov/drugs/)
. q(warning-letters-and-notice-violation-letters-pharmaceutical-companies/);
my $show = 1; # to see navigation. set to false for headless operation
# A little demo of how to set some browser options
my %chrome_capab = do {
my #cfg = ($show)
? ('window-position=960,10', 'window-size=950,1180')
: 'headless';
'extra_capabilities' => { 'goog:chromeOptions' => { args => [ #cfg ] } }
};
my $drv = Selenium::Chrome->new( %chrome_capab );
my #years = 2017..2021;
foreach my $year (#years) {
my $url = $base_url . "untitled-letters-$year";
$drv->get($url);
say "\nPage title: ", $drv->get_title;
sleep 1 if $show;
my $elem = $drv->find_element(
q{//li[contains(text(), 'PDF')]/a[contains(text(), 'Untitled Letter')]}
);
sleep 1 if $show;
# Downloading the file is surprisingly not simple with Selenium (see text)
# But as we found the link we can get its url and then use Selenium-provided
# user-agent (it's LWP::UserAgent)
my $href = $elem->get_attribute('href');
say "pdf's url: $href";
my $response = $drv->ua->get($href);
die $response->status_line if not $response->is_success;
say "Downloading 'Content-Type': ", $response->header('Content-Type');
my $filename = "download_$year.pdf";
say "Save as $filename";
path($filename)->spew( $response->decoded_content );
}
This takes shortcuts, switches approaches, and sidesteps some issues (which one need resolve for a fuller utility of this useful tool). It downloads one pdf from each page; to download all we need to change the XPath expression used to locate them
my #hrefs =
map { $_->get_attribute('href') }
$drv->find_elements(
# There's no ends-with(...) in XPath 1.0 (nor matches() with regex)
q{//li[contains(text(), '(PDF)')]}
. q{/a[starts-with(#href, '/media/') and contains(#href, '/download')]}
);
Now loop over the links, forming filenames more carefully, and download each like in the program above. I can fill the gaps further if there's need for that.
The code puts the pdf files on disk, in its working directory. Please review that before running this so to make sure that nothing gets overwritten!
See Selenium::Remove::Driver for starters.
Note: there is no need for Selenium for this particular task; it's all straight-up HTTP requests, no JavaScript. So LWP::UserAgent or Mojo would do it just fine. But I take it that you want to learn how to use Selenium, since it often is needed and is useful.

How to get numFound value from response in Apache Solr using Perl

I used the below code to search documents (which has a particular keyword in content) from Apache Solr
my $solrgetapi = "http://$address:$port/solr/OppsBot/select?q=content:";
my $solrgeturl = $solrgetapi.'"'.$keyword.'"';
my $browser = LWP::UserAgent->new;
my $req = HTTP::Request->new( GET => $solrgeturl );
$req->authorization_basic( "$username", "$pass" );
my $page = $browser->request( $req );
print $page->decoded_content;
The result I get is as follows:
{
"responseHeader":{
"status":0,
"QTime":2,
"params":{
"q":"content:\"ABC\""}},
"response":{"numFound":0,"start":0,"docs":[]
}}
I want to extract the numFound value to a variable.
I came across some solutions in SolrJ like these
queryResponse.getResults().getNumFound();
But I couldn't find in Perl.
I tried with these below codes also. But I couldn't get these to work. Please help.
$numFound = $page->decoded_content->{response}->{numFound};
print $page->{numFound}
You neglected to transform the JSON text into a data structure.
use JSON::MaybeXS qw(decode_json);
say decode_json($page->decoded_content)->{response}{numFound}
# 0

How to get/set Trello custom fields using the API?

I'm already in love with the Custom Fields feature in Trello. Is there a way to get and set custom fields via the API?
I tried using the get field API call to get a field (on a board with a custom field defined called "MyCustomField"):
curl "https://api.trello.com/1/cards/57c473503a5ef0b76fddd0e5/MyCustomField?key=${TRELLO_API_KEY}&token=${TRELLO_OAUTH_TOKEN}"
to no avail.
The Custom Fields API from Trello is now officially available (announcement blog post here).
It allows users to manipulate both custom field items of boards and custom field item values on cards.
Custom Fields API documentation
Getting customFieldItems For Cards
Setting & Updating CustomFieldItems
This is just to add to bdwakefield's answer. His solution involves hard coding the names of the field ids.
If you want to also retrieve the name of the fields themselves (for example get that "ZIn76ljn-4yeYvz" is actually "priority" in Trello, without needing to hard code it, call the following end point:
boards/{trello board id}/pluginData
This will return an array with the plugins information and in one of the array items, it will include a line along the lines of:
[value] => {"fields":[{"n":"~custom field name~:","t":0,"b":1,"id":"~custom field id that is the weird stuff at the card level~","friendlyType":"Text"}]}
So you just need to figure out the plugin for custom fields in your case, and you can retrieve the key -> value pair for the custom field name and the id associated with it.
It means that if you add / remove fields, or rename them, you can handle it at run time vs changing your code.
This will also give you the options for the custom field (when it is a dropdown like in bdwakefield's example above).
So I have a "sort of" answer to this. It requires some hackery on your part to make it work and there is more than a little manual upkeep as you add properties or values -- but it works.
I am doing this in powershell (I am NOT well versed in ps just yet and this my first really 'big' script that I have pulled together for it) since my intent is to use this with TFS Builds to automate moving some cards around and creating release notes. We are using custom fields to help us classify the card and note estimate/actual hours etc. I used this guys work as a basis for my own scripting. I am not including everything but you should be able to piece everything together.
I have left out everything with connecting to Trello and all that. I have a bunch of other functions for gettings lists, moving cards, adding comments etc. The ps module I linked above has a lot of that built in as well.
function Get-TrelloCardPluginData
{
[CmdletBinding()]
param
(
[Parameter(Mandatory,ValueFromPipelineByPropertyName)]
[ValidateNotNullOrEmpty()]
[Alias('Id')]
[string]$CardId
)
begin
{
$ErrorActionPreference = 'Stop'
}
process
{
try
{
$uri = "$baseUrl/cards/$CardId/pluginData?$($trelloConfig.String)"
$result = Invoke-RestMethod -Uri $uri -Method GET
return $result
}
catch
{
Write-Error $_.Exception.Message
}
}
}
You'll get data that looks like this:
#{id=582b5ec8df1572e572411513; idPlugin=56d5e249a98895a9797bebb9;
scope=card; idModel=58263201749710ed3c706bef;
value={"fields":{"ZIn76ljn-4yeYvz":2,"ZIn76ljn-c2yhZH":1}};
access=shared}
#{id=5834536fcff0525f26f9e53b; idPlugin=56d5e249a98895a9797bebb9;
scope=card; idModel=567031ea6a01f722978b795d;
value={"fields":{"ZIn76ljn-4yeYvz":4,"ZIn76ljn-c2yhZH":3}};
access=shared}
The fields collection is basically key/pair. The random characters correspond to the property and the value after that is what was set on the custom property. In this case it is an 'index' for the value in the dropdown. These two fields have a 'priority' (low, medium, high) and a 'classification' (Bug, Change Request, etc) for us. (We are using labels for something else).
So you'll have to create another fucntion where you can parse this data out. I am sure there are better ways to do it -- but this is what I have now:
function Get-TrelloCustomPropertyData($propertyData)
{
$data = $propertyData.Replace('{"fields":{', '')
$data = $data.Replace('}}', '')
$data = $data.Replace('"', '')
$sepone = ","
$septwo = ":"
$options = [System.StringSplitOptions]::RemoveEmptyEntries
$obj = $data.Split($sepone, $options)
$cardCustomFields = Get-TrelloCustomFieldObject
foreach($pair in $obj)
{
$field = $pair.Split($septwo,$options)
if (-Not [string]::IsNullOrWhiteSpace($field[0].Trim()))
{
switch($field[0].Trim())
{
'ZIn76ljn-4yeYvz' {
switch($field[1].Trim())
{
'1'{
$cardCustomFields.Priority = "Critical"
}
'2'{
$cardCustomFields.Priority = "High"
}
'3'{
$cardCustomFields.Priority = "Medium"
}
'4'{
$cardCustomFields.Priority = "Low"
}
}
}
'ZIn76ljn-c2yhZH' {
switch($field[1].Trim())
{
'1'{
$cardCustomFields.Classification = "Bug"
}
'2'{
$cardCustomFields.Classification = "Change Request"
}
'3'{
$cardCustomFields.Classification = "New Development"
}
}
}
'ZIn76ljn-uJyxzA'{$cardCustomFields.Estimated = $field[1].Trim()}
'ZIn76ljn-AwYurD'{$cardCustomFields.Actual = $field[1].Trim()}
}
}
}
return $cardCustomFields
}
Get-TrelloCustomFieldObject is another ps function that I set up to build an object based on the properties I know that I have defined.
function Get-TrelloCustomFieldObject
{
[CmdletBinding()]
param()
begin
{
$ErrorActionPreference = 'Stop'
}
process
{
$ccf = New-Object System.Object
$ccf | Add-Member -type NoteProperty -name Priority -value "None"
$ccf | Add-Member -type NoteProperty -name Classification -value "None"
$ccf | Add-Member -type NoteProperty -name Estimated -value ""
$ccf | Add-Member -type NoteProperty -name Actual -value ""
return $ccf
}
}

How to inject a variable inside an SQL associated array?

I have many websites that use some of the same content snippets and instead of manually updating all the different websites, I thought it would be a good idea to have the content stored in a database as to only have one copy instead of multiple. It works great except for one issue which is the images that are in the article are sometimes left aligned and other times right aligned.
My solution was to add the following code to the article's image CSS tag that is in the database and use a variable on each of the individual pages to add the custom classes to the image.
class="<?php echo $ImgClass01; ?>"
EDIT: here is more of the content from what is stored in the database field to make my question a little more understandable.
<p><img src="img/charleston.jpg" class="<?php echo $ImgClass01; ?>">Is it the delightful year-round climate? The almost-European feel of its downtown city streets? The overwhelming...</p>
However, the webpage is only showing the text when viewing the source code and not using the variable. Almost anything is possible, but I'm not sure how to make this work.
Here is the code on the page...
// value for the class within the article to be printed on the page
$ImgClass01 = 'img-responsive img-rounded pull-right';
//Start a while loop to process all the rows
while ($row = mysql_fetch_assoc ($result_set))
{
$article = $row['article'];
echo $article;
} //END WHILE
EDIT: Just in case the entire page will be helpful, here is it.
<?php
$PageTitle = "Charleston, South Carolina | Local Towns";
$PageDescription = "Charleston is rated the first most popular vacation destination in the United States, and it surely must rank in...";
// 1. Create a database connection
$link = mysql_connect ("localhost", "root", "");
if (!$link) die("Could not connect: " . mysql_error());
// 2. Select a database to use
if (!mysql_select_db ("articleBank"))
die("Problem with the database: " . mysql_error());
// 3. Set up query for items to display
$query = "SELECT article FROM `articles` WHERE ID = 1";
// 4. Execute the query
$result_set = mysql_query ($query);
include ("theme/header.php");
// value for the class within the article to be printed on the page
$ImgClass01 = 'img-responsive img-rounded pull-right';
//Start a while loop to process all the rows
while ($row = mysql_fetch_assoc ($result_set))
{
$article = $row['article'];
echo $article;
} //END WHILE
// 5. Close Connection
mysql_close();
include ("theme/footer.php");
?>
while ($row = mysql_fetch_assoc ($other_result_set))
{
$ImgClass01 = $row['ImgClass01'];
}
not sure what you wanted, but you could use left, right in some row like place..

How can I implement incremental (find-as-you-type) search on command line?

I'd like to write small scripts which feature incremental search (find-as-you-type) on the command line.
Use case: I have my mobile phone connected via USB, Using gammu --sendsms TEXT I can write text messages. I have the phonebook as CSV, and want to search-as-i-type on that.
What's the easiest/best way to do it? It might be in bash/zsh/Perl/Python or any other scripting language.
Edit:
Solution: Modifying Term::Complete did what I want. See below for the answer.
I get the impression GNU Readline supports this kind of thing. Though, I have not used it myself. Here is a C++ example of custom auto complete, which could easily be done in C too. There is also a Python API for readline.
This StackOverflow question gives examples in Python, one of which is ...
import readline
def completer(text, state):
options = [x in addrs where x.startswith(text)]
if state < options.length:
return options[state]
else
return None
readline.set_completer(completer)
this article on Bash autocompletion may help. This article also gives examples of programming bash's auto complete feature.
Following Aiden Bell's hint, I tried Readline in Perl.
Solution 1 using Term::Complete (also used by CPAN, I think):
use Term::Complete;
my $F;
open($F,"<","bin/phonebook.csv");
my #terms = <$F>; chomp(#terms);
close($F);
my $input;
while (!defined $input) {
$input = Complete("Enter a name or number: ",#terms);
my ($name,$number) = split(/\t/,$input);
print("Sending SMS to $name ($number).\n");
system("sudo gammu --sendsms TEXT $number");
}
Press \ to complete, press Ctrl-D to see all possibilities.
Solution 2: Ctrl-D is one keystroke to much, so using standard Term::Readline allows completion and the display off possible completions using only \.
use Term::ReadLine;
my $F;
open($F,"<","bin/phonebook.csv");
my #terms = <$F>; chomp(#terms);
close($F);
my $term = new Term::ReadLine;
$term->Attribs->{completion_function} = sub { return #terms; };
my $prompt = "Enter name or number >> ";
my $OUT = $term->OUT || \*STDOUT;
while ( defined (my $input = $term->readline($prompt)) ) {
my ($name,$number) = split(/\t/,$input);
print("Sending SMS to $name ($number).\n");
system("sudo gammu --sendsms TEXT $number");
}
This solution still needs a for completion.
Edit: Final Solution
Modifying Term::Complete (http://search.cpan.org/~jesse/perl-5.12.0/lib/Term/Complete.pm) does give me on the fly completion.
Source code: http://search.cpan.org/CPAN/authors/id/J/JE/JESSE/perl-5.12.0.tar.gz
Solution number 1 works with this modification. I will put the whole sample online somewhere else if this can be used by somebody
Modifications of Completion.pm (just reusing it's code for Control-D and \ for each character):
170c172,189
my $redo=0;
#match = grep(/^\Q$return/, #cmp_lst);
unless ($#match < 0) {
$l = length($test = shift(#match));
foreach $cmp (#match) {
until (substr($cmp, 0, $l) eq substr($test, 0, $l)) {
$l--;
}
}
print("\a");
print($test = substr($test, $r, $l - $r));
$redo = $l - $r == 0;
if ($redo) { print(join("\r\n", '', grep(/^\Q$return/, #cmp_lst)), "\r\n"); }
$r = length($return .= $test);
}
if ($redo) { redo LOOP; } else { last CASE; }