What's the best way to scrape keywords from Twitter? - automation

I have an idea for a script that I would like to use for a contest on Twitter. The script I want to build will scrape keywords from a tweet (someone I'm following) and then automate an action when they're found.
My question is, what would be the best way to build this script? Should I scrape the webpage or use an API to get faster results? I need the script to be reliable and very fast because I'll be using this for a contest. Ideally I would like the script to obtain the keywords a few seconds before the tweets are posted online (if possible) to have a better chance of winning.
Thoughts and suggestions? Btw I'm not a developer but I am planning to hire someone to build this script for me after I get the necessary information.

There is no need to scrape the webpage. Use the API. They have a REST API or you could use something like twitter4j. I suggest having a look at their website, see the examples and see if it fits in with what you would like to do.
In my experience, tweets show up in the API before they are updated on the website.
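Whichever API client ends up fetching the tweets, the keyword check itself can be small. Below is a minimal Python sketch of that step only, assuming the tweet text has already been retrieved as a string. The function names `find_keywords` and `handle_tweet` and the callback are hypothetical and not part of any Twitter library.

```python
import re
from typing import Callable, Iterable, List


def find_keywords(text: str, keywords: Iterable[str]) -> List[str]:
    """Return the keywords that appear in text as whole words (case-insensitive)."""
    found = []
    for kw in keywords:
        # Lookarounds stop "giveaway" from matching inside "giveaways".
        pattern = r"(?<!\w)" + re.escape(kw) + r"(?!\w)"
        if re.search(pattern, text, re.IGNORECASE):
            found.append(kw)
    return found


def handle_tweet(text: str, keywords: Iterable[str],
                 action: Callable[[List[str]], None]) -> bool:
    """Run action with the matched keywords; return True if anything matched."""
    hits = find_keywords(text, keywords)
    if hits:
        action(hits)
    return bool(hits)
```

For example, `handle_tweet(tweet_text, ["giveaway"], print)` would print the matched keywords. In a real script the callback would be whatever action the contest requires.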

Related

Can I track if someone clicked a link on my Twitter post

I have a private business Twitter account and I would like to know when someone clicks any link inside one of my posts. This solution cannot assume that we know the form of the link being posted.
For example a twitter post like this:
Have you guys heard of this amazing site called google?
I would like to see how many people clicked on this google.com link. I don't need to know any specific information about who they are, just if it was clicked or not.
Ideally I would want this from the API but crawlers and plugins are also possible. I would like to avoid using a paid tool but those would be acceptable.
I think you have several options:
Use Google Firebase or Google Analytics.
Create your own short-link service in Python or any other programming language.
Search Google for a short-link generator that offers the service you need.
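To illustrate the "own short-link service" option, here is a minimal in-memory Python sketch of the core idea: every link in a post is replaced by a short code, and each lookup of that code is counted as a click before redirecting. The class name `ShortLinks` and its methods are hypothetical. A real service would sit behind a web server and persist the counters.

```python
import secrets
from typing import Dict, Optional


class ShortLinks:
    """Maps short codes to target URLs and counts how often each is resolved."""

    def __init__(self) -> None:
        self._targets: Dict[str, str] = {}
        self._clicks: Dict[str, int] = {}

    def shorten(self, url: str) -> str:
        """Register url and return a new random short code for it."""
        code = secrets.token_urlsafe(4)
        while code in self._targets:  # retry on the unlikely collision
            code = secrets.token_urlsafe(4)
        self._targets[code] = url
        self._clicks[code] = 0
        return code

    def resolve(self, code: str) -> Optional[str]:
        """Return the target URL and count one click, or None if unknown."""
        url = self._targets.get(code)
        if url is not None:
            self._clicks[code] += 1
        return url

    def clicks(self, code: str) -> int:
        """Return the number of times code has been resolved."""
        return self._clicks.get(code, 0)
```

Because the counter lives in the redirect step, this works without knowing the form of the original link in advance.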
Using the Twitter API, you should be able to see how many clicks a link has received:
https://developer.twitter.com/en/docs/twitter-api/metrics
To automate collecting this information, though, you might need a third-party tool.
This should be the most straightforward solution.

Fastest way for webscraping a page implementing a shopping bot

My friend and I are trying to develop a shopping bot. It must be as fast as it can get, because the products might run out of stock in a matter of seconds. We have looked for different ways of doing this, and we came up with Selenium and Scrapy and other python libraries, and we have something working already, but it seems so slow doing the task at hand.
We have thought that, instead of scraping the web page (selecting the product, adding it to the cart, etc.), we could make a bot that just sends HTTP POST requests to the store's server with the product and the rest of the necessary information. We have read in other posts that this is done with the requests library, but how can we know what information an action requires and how many POST requests it sends? For example, clicking the add-to-cart button sends some POST requests to the server, so how can we find out what goes into those requests in order to emulate them in our program?
We would like the library to be able to scrape web pages with JavaScript, for example when clicking a button or selecting an item from a drop down menu. We have run across some libraries that weren't able to do it (such as Scrapy)
Also, we would appreciate hearing about a different programming language that may have better libraries or execute faster. We both know Python and Java, but we are open to suggestions.
The fastest way would be through requests, using bs4 or regex to scrape the web page; this is what most 'shopping bots' use. To make it even faster, you could write the bot in Go or TypeScript, which are generally faster than Python.
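One way to find out what an add-to-cart request contains is to watch it in the browser's developer tools (Network tab) and compare it with the form in the page source. The Python sketch below shows the second half of that idea: it collects a form's `action` and input fields and encodes them as a POST body. It uses only the standard library (`html.parser`, `urllib.parse`) rather than requests and bs4, and the sample HTML and its field names (`product_id`, `csrf`, `qty`) are made up for illustration.

```python
from html.parser import HTMLParser
from typing import Dict, Optional, Tuple
from urllib.parse import urlencode


class FormFieldParser(HTMLParser):
    """Collects the first form's action and every named <input> on the page."""

    def __init__(self) -> None:
        super().__init__()
        self.action: Optional[str] = None
        self.fields: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "form" and self.action is None:
            self.action = attributes.get("action")
        elif tag == "input" and attributes.get("name"):
            # Hidden inputs (tokens, product ids) are captured like any other.
            self.fields[attributes["name"]] = attributes.get("value") or ""


def build_post_body(html: str, overrides: Dict[str, str]) -> Tuple[Optional[str], str]:
    """Return (form action, URL-encoded body) with overrides applied."""
    parser = FormFieldParser()
    parser.feed(html)
    fields = dict(parser.fields)
    fields.update(overrides)
    return parser.action, urlencode(fields)


# Hypothetical page fragment; a real store's form will differ.
SAMPLE_HTML = (
    '<form action="/cart/add" method="post">'
    '<input type="hidden" name="product_id" value="123">'
    '<input type="hidden" name="csrf" value="abc">'
    '<input name="qty" value="1">'
    "</form>"
)
```

The resulting body could then be sent with whatever HTTP client the bot uses. Forms whose fields are filled in by JavaScript will not be fully visible this way, which is where inspecting the live request helps.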

fetching Ads using google custom search API

Is it possible to fetch ads in search results using the Google Custom Search API?
Need this to do a simple experiment. Search something and to see if any ad appears in the results.
Is it possible to do this ?
You probably want to screen-scrape the search results page. Use a tool like PhantomJS and inspect the results.
You could open a Google AdWords account, where you can run sample queries and experiments.
Google is not really a big fan of experiments on its front end, on Google AdSense, on Google CSE, CSA, etc. You could try, but you might get a wrong picture, since there are some protection algorithms on the front end.

Twitter API - quick summary

For a project, I want to build a webapp which shows (semi-) real-time Tweets with specific keywords.
I'm not sure how to start. Can somebody explain to me how I need to start? I have a PHP framework for my website.
What language do I need to retrieve data from the Twitter API? JSON? PHP?
Where can I find a nice tutorial?
I hope somebody can provide me a few lines of code, just to help me on the right track.
Thanks!
- Sammy
You could use the Streaming API with the Phirehose library (PHP).
The "filter" method of this API catches all the tweets with the specified keywords in real time. You can take a look at 140dev for an example of an application using this library.
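As a rough illustration of what consuming such a stream involves, here is a Python sketch (not Phirehose, which is PHP) that assumes the stream arrives as one JSON object per line with a `text` field, with blank lines as keep-alives. The function name `matching_tweets` is hypothetical, and with the real "filter" method the keyword matching is done server-side, so the local check here only stands in for that step.

```python
import json
from typing import Dict, Iterable, Iterator, List


def matching_tweets(lines: Iterable[str], keywords: List[str]) -> Iterator[Dict]:
    """Yield decoded tweets whose text contains any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank keep-alive line
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not valid JSON
        if not isinstance(tweet, dict):
            continue
        text = str(tweet.get("text", "")).lower()
        if any(k in text for k in lowered):
            yield tweet
```

The web app would then push each yielded tweet to the page, for example by storing it in a database that the PHP front end polls.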

Dumping a list of URLs from Twitter to Tumblr

This question has less to do with actual code, and more to do with the underlying methods.
My 'boss' at my pseudointernship has requested that I write him a script that will scrape a list of links from a user's tweet (the list comes 'round once per week, and it's always the same user) and then publish said list to the company's Tumblr account.
Currently, I am thinking about this structure: The base will be a bash script that first calls some script that uses the Twitter API to find the post given a hashtag and parse the list (current candidates for languages being Perl, PHP and Ruby, in no particular order). Then, the script will store the parsed list (with some markup) into a text file, from where another script that uses the Tumblr API will format the list and then post it.
Is this a sensible way to go about doing this? So far in planning I'm only up to getting the Twitter post, but I'm already stuck between using the API to grab the post or just grabbing the feed they provide and attempting to parse it. I know it's not really a big project, but it's certainly the largest one I've ever started, so I'm paralyzed with fear when it comes to making decisions!
From your description, there's no reason you shouldn't be able to do it all in one script, which would simplify things unless there's a good reason to ferry the data between two scripts. And before you go opening connections manually, there are libraries written for many languages for both Tumblr and Twitter that can make your job much easier. You should definitely not try to parse the RSS feed - they provide an API for a reason.*
I'd personally go with Python, as it is quick to get up and running and has great libraries for such things. But if you're not familiar with that, there are libraries available for Ruby or Perl too (PHP less so). Just Google "{platform} library {language}" - a quick search gave me python-tumblr, WWW::Tumblr, and ruby-tumblr, as well as python-twitter, Net::Twitter, and a Ruby gem "twitter".
Any of these libraries should make it easy to connect to Twitter to pull down the tweets for a particular user or hashtag via the API. You can then step through them, parsing it as needed, and then use the Tumblr library to post them to Tumblr in whatever format you want.
You can do it manually - opening and reading connections or, even worse, screen scraping, but there's really no sense in doing that if you have a good library available - which you do - and it's more prone to problems, quirks, and bugs that go unnoticed. And as I said, unless there's a good reason to use the intermediate bash script, it would be much easier to just keep the data within one script, in an array or some other data structure. If you need it in a file too, you can just write it out when you're done, from the same script.
*The only possible complication here is if you need to authenticate to Twitter - which I don't think you do, if you're just getting a user timeline - they will be discontinuing basic authentication very soon, so you'll have to set up an OAuth account (see "What is OAuth" over at dev.twitter.com). This isn't really a problem, but makes things a bit more complicated. The API should still be easier than parsing the RSS feed.
Your approach seems appropriate.
Use the user_timeline Twitter API to fetch all tweets posted by a user.
Parse the fetched list (perhaps using a regex) to extract links from the tweets and store them in an external file.
Post those links to the Tumblr account using the Tumblr write API.
You may also want to track the ID of the last fetched tweet so that you can continue extraction from that tweet ID.
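The parsing and tracking steps above can be sketched in Python as follows, assuming the timeline has already been fetched and decoded into dictionaries with `id` and `text` keys. The function name `extract_new_links` and the URL pattern are illustrative assumptions, and the regex is deliberately simple rather than a complete URL matcher.

```python
import re
from typing import Dict, List, Tuple

# Simple pattern: http(s) URL up to the next whitespace, quote, or angle bracket.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def extract_new_links(tweets: List[Dict], last_seen_id: int) -> Tuple[List[str], int]:
    """Return (links from tweets newer than last_seen_id, newest tweet ID seen)."""
    links: List[str] = []
    newest = last_seen_id
    for tweet in sorted(tweets, key=lambda t: t["id"]):
        if tweet["id"] <= last_seen_id:
            continue  # already processed on an earlier run
        for url in URL_RE.findall(tweet["text"]):
            # Drop sentence punctuation that trails the URL.
            links.append(url.rstrip(".,;:!?)"))
        newest = max(newest, tweet["id"])
    return links, newest
```

The returned ID can be saved to a file between runs and passed back in next time, so each weekly run only posts links it has not seen before.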