I am trying to make a bot that simulates some human behaviors. I found some instructions on using Scrapy to log in to a page like nike.com.br, but I was not able to find out how to select buttons and submit forms.
Can anyone help me with it?
For example, after the login I need to choose the size of the product and click "add to cart". Is there some way to do this using Scrapy?
It's hard to answer your question because it's too generic, and the solution will probably differ from page to page.
Generally speaking, you need to check what the page does when you click to submit the form. Most likely it sends a POST request, so you will need to mimic that POST request with Scrapy (check FormRequest).
The same logic applies to adding an item to the cart.
I think the best way to approach this is to use the browser's network tool. The Scrapy docs have a few tips on using it for a similar purpose (see the section on using your browser's Developer Tools for scraping).
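As a rough illustration of both steps (a sketch only: the URLs, form fields, and product ids below are made up, not nike.com.br's real ones; dig the real ones out of the network tool first):

```python
# Hypothetical Scrapy spider: log in via FormRequest, then mimic the POST
# that the "add to cart" button fires. Every endpoint and field name here
# is a placeholder.
import scrapy


class ShopSpider(scrapy.Spider):
    name = "shop"
    start_urls = ["https://www.example.com/login"]

    def parse(self, response):
        # from_response copies the login form's hidden fields for us.
        yield scrapy.FormRequest.from_response(
            response,
            formdata={"email": "you@example.com", "password": "secret"},
            callback=self.add_to_cart,
        )

    def add_to_cart(self, response):
        # Replay the request the button sends, including the chosen size.
        yield scrapy.FormRequest(
            "https://www.example.com/cart/add",
            formdata={"product_id": "12345", "size": "42"},
            callback=self.after_add,
        )

    def after_add(self, response):
        self.logger.info("Cart response status: %s", response.status)
```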
I have a private business Twitter account and I would like to know when someone clicks any link inside one of my posts. This solution cannot assume that we know the form of the link being posted.
For example, a Twitter post like this:
Have you guys heard of this amazing site called google?
I would like to see how many people clicked on this google.com link. I don't need to know any specific information about who they are, just if it was clicked or not.
Ideally I would want this from the API but crawlers and plugins are also possible. I would like to avoid using a paid tool but those would be acceptable.
I think you have multiple choices:
Use Google Firebase or Google Analytics.
Create your own short-link service in Python or any other programming language (a sketch follows below).
Or just search Google for short-link generators that offer an appropriate service.
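A minimal sketch of the second option, assuming Flask and an in-memory counter (both are illustrative choices; a real service would persist counts in a database):

```python
# Click-counting redirect service (sketch). You post the short link
# (e.g. https://yourdomain/r/google) instead of the raw URL.
from flask import Flask, redirect

app = Flask(__name__)

TARGET_URL = "https://google.com"  # the link you actually want to share
click_count = 0                    # in-memory only; use a DB in practice


@app.route("/r/google")
def track_and_redirect():
    global click_count
    click_count += 1               # count the click...
    return redirect(TARGET_URL)    # ...then send the visitor on


@app.route("/stats")
def stats():
    return {"clicks": click_count}


if __name__ == "__main__":
    app.run(port=8000)
```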
Hi, using the Twitter API you should be able to see how many clicks a link has:
https://developer.twitter.com/en/docs/twitter-api/metrics
But to have all this info automated you might need a third-party tool.
This should be the most straightforward solution.
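For instance, fetching a tweet's metrics from the v2 endpoint looks roughly like this (tweet ID and token are placeholders; note that public_metrics only covers counts like retweets and likes, while link clicks sit in the non-public/organic metrics, which require user-context authentication):

```python
# Sketch: read a tweet's metrics via the Twitter API v2.
import requests

TWEET_ID = "1234567890"           # hypothetical tweet ID
BEARER_TOKEN = "YOUR_TOKEN_HERE"  # requires a developer account

resp = requests.get(
    f"https://api.twitter.com/2/tweets/{TWEET_ID}",
    params={"tweet.fields": "public_metrics"},
    headers={"Authorization": f"Bearer {BEARER_TOKEN}"},
)
resp.raise_for_status()
print(resp.json()["data"]["public_metrics"])
```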
My friend and I are trying to develop a shopping bot. It must be as fast as possible, because the products might run out of stock in a matter of seconds. We have looked at different ways of doing this and came up with Selenium, Scrapy, and other Python libraries. We have something working already, but it seems slow at the task at hand.
We have thought that, instead of scraping the web page (selecting the product, adding to cart, etc.), we could make a bot that just sends HTTP POST requests to the store's server with the product and the rest of the necessary information. We have read in other posts that this is done with the requests library, but how can we know what information, and how many POST requests, an action requires? For example, clicking the add-to-cart button sends some POST request to the server, so how can we know what goes into that request in order to emulate it in our program?
We would like the library to be able to scrape web pages with JavaScript, for example when clicking a button or selecting an item from a drop-down menu. We have run across some libraries that weren't able to do it (such as Scrapy).
We would also appreciate suggestions for a different programming language that may have better libraries or execute faster; we both know Python and Java, but we are open to suggestions.
The fastest way would be through requests, using bs4 or regex to scrape the web page; this is what most 'shopping bots' use. To make it even faster you could write the bot in Go or TypeScript, which are much faster than Python.
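As a sketch of that approach (every URL and form field below is an assumption; the real ones are whatever shows up in your browser's network tab when you click the button):

```python
# Log in, scrape whatever token the form needs, then replay the
# add-to-cart POST directly -- no browser involved.
import requests
from bs4 import BeautifulSoup

session = requests.Session()

# Log in first so the session cookie carries over to later requests.
session.post(
    "https://store.example.com/login",  # hypothetical endpoint
    data={"email": "you@example.com", "password": "secret"},
)

# Scrape the product page for any token/ids the cart form needs.
page = session.get("https://store.example.com/product/123")
soup = BeautifulSoup(page.text, "html.parser")
token_tag = soup.find("input", {"name": "csrf_token"})  # if the site uses one
token = token_tag["value"] if token_tag else ""

# Replay the add-to-cart request observed in the network tab.
resp = session.post(
    "https://store.example.com/cart/add",  # hypothetical endpoint
    data={"product_id": "123", "size": "M", "csrf_token": token},
)
print(resp.status_code)
```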
I want to create an app to pay for parking faster.
This question is more about the logic of my app and what tools I need to build it.
At this point I use a parking place every day and pay for it through the web page.
I do it like this:
Log in to the page.
Click on the menu, which redirects me to www.parkingexample.page/payments.
There is a search menu where I enter my car plate number; if my car is found, it returns how much I need to pay, and a "Pay" button appears.
I click the "Pay" button and then it's all done.
So my goal is to create an app that, when started, will automatically connect to the page, search for my plate, and, if it is found and a payment is needed, show just one button: "Pay".
I think I should do it like this, but as I haven't created any web app (I'm 100% a back-end developer), I ask you whether my thought process is correct.
Also, I don't want to use a WebView, as I think it's not necessary for me.
When I start my app, it sends a POST request to the page to log in.
Then I send a GET request to www.parkingexample.page/payments with params = 'my_car_plate_number'.
Somehow I need to click the "Pay" button on the page when it appears, so I think it's probably again a POST request, but at this point I'm not sure.
So the QUESTION is: is my logic valid, or can it be done in some other way? A rough sketch of the flow I have in mind follows below.
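In requests terms (every endpoint and parameter name here is a guess until I inspect the real page), I picture it roughly like this:

```python
# Sketch of the planned flow: log in, look up the plate, trigger payment.
import requests

session = requests.Session()

# Step 1: POST the login form; the session keeps the auth cookie.
session.post(
    "https://www.parkingexample.page/login",  # hypothetical endpoint
    data={"username": "me", "password": "secret"},
)

# Step 2: GET the payments page with my plate as a query parameter.
resp = session.get(
    "https://www.parkingexample.page/payments",
    params={"plate": "my_car_plate_number"},
)

# Step 3: if a payment is due, replay whatever request the Pay button fires.
if "Pay" in resp.text:
    session.post(
        "https://www.parkingexample.page/payments/pay",  # hypothetical endpoint
        data={"plate": "my_car_plate_number"},
    )
```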
UPDATE: ADDED SCREENSHOTS
The first screenshot shows the menu after I logged in, with the search bar where I need to enter my car plate.
The second screenshot shows where I found my car (entered the plate number and clicked search),
and now the page is updated with the sum I have to pay, and there is a button "PAID" in the bottom right corner that I need to click.
And that's all I need.
To validate whether your suggested sequence is correct, I would start by capturing a typical browser session between yourself and your parking provider with something like Fiddler. Then I would use an HTTP client library of choice (for C# it would be something like HttpClient) and emulate the same flow with the correct headers, query parameters and such like.
Looking at your screenshots, it seems the application is ASP.NET Web Forms, which can get a bit painful to emulate due to the way its state management works: you will likely need to decode the ViewState object (to ensure you're passing it back correctly) and locate all the dynamic field ids it uses for postbacks. This, however, is very doable; a sketch follows below.
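For illustration in Python rather than C# (all the control ids and URLs below are made up; capture the real ones with Fiddler first), the Web Forms postback round-trip tends to look like this:

```python
# Replay an ASP.NET Web Forms postback: scrape the hidden state fields,
# then POST them back along with the control that "clicked".
import requests
from bs4 import BeautifulSoup

session = requests.Session()
page = session.get("https://www.parkingexample.page/payments")
soup = BeautifulSoup(page.text, "html.parser")


def hidden(name):
    """Pull one of the framework's hidden state fields out of the form."""
    tag = soup.find("input", {"name": name})
    return tag["value"] if tag else ""


# Web Forms round-trips its state in these hidden fields on every postback.
payload = {
    "__VIEWSTATE": hidden("__VIEWSTATE"),
    "__VIEWSTATEGENERATOR": hidden("__VIEWSTATEGENERATOR"),
    "__EVENTVALIDATION": hidden("__EVENTVALIDATION"),
    "__EVENTTARGET": "ctl00$Main$btnPay",  # hypothetical id of the Pay button
    "__EVENTARGUMENT": "",
    "ctl00$Main$txtPlate": "ABC1234",      # hypothetical search field + plate
}
resp = session.post("https://www.parkingexample.page/payments", data=payload)
print(resp.status_code)
```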
If you discover that the above is too hard to emulate (or there's JavaScript involved), it might be easier to explore a Remote Selenium WebDriver coupled with a headless browser like PhantomJS. You'd then have PhantomJS interact with the page on your server, and you'd drive it from your mobile app. Basically, you'd reduce the complexity of your parking provider's page to a well-documented API.
Hopefully that gives you a starting point.
In your application, all you will need is service calls plus the security part of logging the user in each time to check for payment.
So it will be a simple Spring Boot application, where you can use the security part for login, and you can keep it simple: for example, you don't need a database, just redirect your page, and if you are not familiar with front-end frameworks, you can use basic HTML/CSS pages for the client side.
Another important point: you should start by designing your application before coding, because it's very important to know all the ideas behind your application.
Enjoy your coding time!
I'm trying to scrape a login-only, bot-sensitive website. After logging in, when I perform a simple selenium function like driver.find_element_by_id('button').click(), the website displays a message along the lines of We think you are a bot. Please complete the CAPTCHA below to continue.
Is there any way for me to make selenium more human-like so I don't trigger CAPTCHAs?
Hopefully not.
You are scraping, i.e. you are developing a bot, and if you try to avoid being identified as a bot, it will just be a question of time until the captcha gets improved to detect your strategy.
Don't do it. The captcha is there for a reason, which is: to detect and lock out bots!
Better check whether the page you want to scrape offers an API that allows computer-to-computer communication. If there is one, use it. If there is none, suggest one, but depending on whether the web page owner wants to support your goals or not, they might say "no".
I am interested in writing a script that goes to a website and clicks a link at a certain time. How do I go about doing something like this?
You should use Selenium: http://seleniumhq.org/
You can control it using any one of the languages you specified in the tags; a minimal click in the Python bindings is sketched below.
You can start browsing from
http://seleniumhq.org/projects/remote-control/
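For example (the URL and link text are placeholders):

```python
# Minimal Selenium click using the Python bindings.
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()  # or webdriver.Chrome()
driver.get("https://example.com")
driver.find_element(By.LINK_TEXT, "More information...").click()
driver.quit()
```

To run it at a certain time, schedule the script with cron or a simple wait loop.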
"clicking a link" could have two meanings:
Actually clicking the link in a browser, or just doing the HTTP GET that would result from it. The former ranges from software that runs on your desktop and simulates a click at a certain point on the screen to something as involved as Selenium for automating website interactions.
If you just need to do the GET request that clicking the link would do, almost anything will work. Unix systems typically include wget and curl, which take a URL to request. Or, if you want to process the data, you can do this in most programming languages. For example, in Python 2 you could do urllib2.urlopen('http://stackoverflow.com') (urllib.request.urlopen in Python 3) and then do whatever you want with the data. Perl has an equivalent. A timed sketch follows below.
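A sketch of the timed-GET variant (the URL and target time are placeholders; cron is an alternative to the wait loop):

```python
# Wait until a target time, then fire the GET a click would cause.
import time
import urllib.request
from datetime import datetime

fire_at = datetime(2030, 1, 1, 9, 0, 0)  # hypothetical target time
while datetime.now() < fire_at:
    time.sleep(1)                        # coarse one-second polling

with urllib.request.urlopen("http://stackoverflow.com") as resp:
    print(len(resp.read()), "bytes fetched")
```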
Are you familiar with cURL?