General internet "scraping" question - objective-c

I just started studying programming about 6 months ago and I have really been diving deep into Objective-C. Unfortunately, I don't know any programmers IRL to bounce general questions off of.
What languages do people use to write programs that search a website for information and send it back? For example, if I wanted to write a program that would search weather.com for the daily temperature over the last 30 days in a given location and then return it as, say, an NSArray or NSDictionary, how would I do that? Can I do that in Objective-C, or is that super-advanced scripting language stuff? If I CAN do it in Objective-C, can someone link to a tutorial or a place that may get me started learning that type of stuff? (I don't really know the term for this type of programming, so my Google searches have been unfruitful.)

I most commonly use PHP and MySQL with CURL
http://en.wikipedia.org/wiki/CURL
You can do some fun things like Search Engine Results Page queries, etc.
Here is the source from a crawler I use. I've cut out some parts for anonymity's sake, but it's a good almost-working example. I can help you get it running if need be.
<?php
class Crawler {
protected $markup = '';
protected $uri = '';
protected $db_location = "localhost";
protected $db_username = "***";
protected $db_password = "***";
protected $db_name = "***";
public function __construct() {
ini_set('memory_limit', -1);
}
public function getMarkup() {
$markup = "";
$markup = @file_get_contents($this->uri);
return $markup;
}
public function get($type) {
$method = "_get_{$type}";
if (method_exists($this, $method)){
return $this->$method(); // call_user_method() was removed in PHP 7
}
}
protected function db_query($query) {
$connection = mysql_connect($this->db_location,$this->db_username,$this->db_password) or die(mysql_error());
mysql_select_db($this->db_name,$connection) or die(mysql_error()." >> ".$query);
//echo $query."<br/>"; //for debugging
$result = mysql_query($query,$connection) or die (mysql_error()." >> ".$query);
$i = 0;
if($result != 1)
{
while ($data_array = mysql_fetch_array($result))
{
foreach($data_array as $key => $value)
{
$tableArray[$i][$key] = stripslashes($data_array[$key]);
}
$i++;
}
return $tableArray;
}
}
protected function db_insert($table,$array) {
$tableArray = $this->db_query("show columns from ".$table);
$inputString = "";
foreach($tableArray as $key => $value)
{
if (array_key_exists($value[0], $array) && $value[0]) {
$inputString .= "'".addslashes($array[$value[0]])."', ";
} else {
$inputString .= "'', ";
}
}
$inputString = substr($inputString, 0, -2);
$this->db_query("insert into $table values(".$inputString.")");
return mysql_insert_id();
}
protected function _get_data() {
//$scrape['id'] = $this->get('id');
$scrape['name'] = $this->get('name');
$scrape['tags'] = $this->get('tags');
$scrape['stat_keys'] = $this->get('stat_keys');
$scrape['stat_values'] = $this->get('stat_values');
foreach($scrape['stat_values'] as $key => $value) {
$scrape['stat_values'][$key] = trim($scrape['stat_values'][$key]);
// strpos() returns 0 for a match at the start, so compare against false explicitly
if(strpos($value,"<h5>Featured Product</h5>") !== false) {
unset($scrape['stat_values'][$key]);
}
if(strpos($value,"<h5>Featured Company</h5>") !== false) {
unset($scrape['stat_values'][$key]);
}
if(strpos($value,"<h5>Featured Type</h5>") !== false) {
unset($scrape['stat_values'][$key]);
}
if(strpos($value,"sign in") !== false) {
unset($scrape['stat_values'][$key]);
}
if(strpos($value,"/100") !== false) {
unset($scrape['stat_values'][$key]);
}
}
if(sizeof($scrape['tags']) > 0 && is_array($scrape['tags'])) {
foreach($scrape['tags'] as $tag) {
$tag_array[$tag] = $tag_array[$tag] + 1;
}
$scrape['tags'] = $tag_array;
foreach($scrape['tags'] as $key => $tag_count) {
$scrape['tags'][$key] = $tag_count - 1;
}
}
$scrape['stat_values'] = array_merge(array(),$scrape['stat_values']);
return $scrape;
}
protected function _get_images() {
if (!empty($this->markup)){
preg_match_all('/<img([^>]+)\/>/i', $this->markup, $images);
return !empty($images[1]) ? $images[1] : FALSE;
}
}
protected function _get_links() {
if (!empty($this->markup)){
preg_match_all('/<a([^>]+)\>(.*?)\<\/a\>/i', $this->markup, $links);
return !empty($links[1]) ? $links[1] : FALSE;
}
}
protected function _get_id() {
if (!empty($this->markup)){
preg_match_all('/\/wine\/view\/([^`]*?)-/', $this->markup, $links);
return !empty($links[1]) ? $links[1] : FALSE;
}
}
protected function _get_grape() {
if (!empty($this->markup)){
preg_match_all('/ class="linked" style="font-size: 14px;">([^`]*?)<\/a>/', $this->markup, $links);
return !empty($links[1]) ? $links[1] : FALSE;
}
}
}
if($_GET['pass'] == "go") {
$crawl = new Crawler();
$crawl->go();
}
?>

So, you want to know how to write server-side code? Well, in theory you can write that in whatever you want. I also assure you it isn't "super-advanced".
You might find it easiest to get started with PHP. W3schools.com has a fine tutorial.

What you are describing is a crawler (e.g. Google).
Any language that has the ability to send HTTP requests and receive responses can do this (which is most languages).
If you don't care to code this thing from scratch, try downloading an open source crawler framework that will allow for custom plugins to parse the resulting HTML.
For your example, you would tell the crawler what site you want it to crawl (i.e. your weather site), add URI constraints if necessary, and create a custom plugin to parse the weather data out of the HTML it responds with. You can then save that data however you see fit.
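As a rough illustration of the fetch-then-parse idea above, here is a minimal PHP sketch. The markup pattern (<td class="temp">), the function name extractTemperatures, and the sample HTML are all assumptions made up for this example; a real weather page will use different markup, and you should check a site's terms of use before scraping it.

```php
<?php
// Hypothetical helper: pull every temperature out of a chunk of HTML.
// Assumes each reading is marked up as <td class="temp">NN</td>.
function extractTemperatures($html)
{
    preg_match_all('/<td class="temp">\s*(-?\d+)\s*<\/td>/i', $html, $matches);
    return array_map('intval', $matches[1]);
}

// In a real crawler the HTML would come from an HTTP request, for example
// $html = file_get_contents($someUrl); here it is a hard-coded sample.
$html = '<table>'
      . '<tr><td class="temp">71</td></tr>'
      . '<tr><td class="temp"> 68 </td></tr>'
      . '<tr><td class="temp">-3</td></tr>'
      . '</table>';

$temperatures = extractTemperatures($html);
print_r($temperatures);
```

The same two steps (request the page, then pick values out of the response) apply in any language with an HTTP client, including Objective-C. Regular expressions are fragile against markup changes, so a proper HTML parser is usually the sturdier choice once the page gets more complex.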

Parent-driven determination that can end in class change

I'm trying to make use of Steam API data, as I like to learn from live examples, and looking at the way various statistics are returned I began to think that an OOP approach would suit me best in this case.
What I'm trying to achieve is to loop through all the results and programmatically populate an array with objects whose type corresponds to the actual type of the statistic. I've tried to build a basic class, called Statistic, and after instantiating an object, determine whether or not its class should change (i.e. whether or not to cast it to a type that Statistic is the parent of, and if so, to which type). How can I do that in PHP? My solution doesn't work: all of the objects are of type Statistic, with the 'type' property holding the object I actually want to store in the array. Code:
$data = file_get_contents($url);
$data = json_decode($data);
$data = $data->playerstats;
$data = $data->stats;
$array = array();
for($i=0;$i<165;$i++)
{
$array[$i] = new Statistic($data[$i]);
echo "<br/>";
}
var_dump($array[10]);
And the classes' code:
<?php
class Statistic
{
public function getProperties()
{
$array["name"] = $this->name;
$array["value"] = $this->value;
$array["type"] = $this->type;
$array["className"] = __CLASS__;
return json_encode($array);
}
public function setType($x)
{
$y = explode("_",$x->name);
if($y[0]=="total")
{
if(!isset($y[2]))
{
$this->type = "General";
}
else
{
if($y[1]=="wins")
{
$this->type = new Map($x);
$this->__deconstruct();
}
else if($y[1]=="kills")
{
$this->type = new Weapon($x);
$this->__deconstruct();
}
else $this->type="Other";
}
}
else $this->type = "Other";
}
function __construct($obj)
{
$this->name = $obj->name;
$this->value = $obj->value;
$this->setType($obj);
}
function __deconstruct()
{
echo "deconstructing <br/>";
return $this->type;
}
}
class Weapon extends Statistic
{
public function setType($x)
{
$y = explode("_",$x);
if($y[1]=="kills")
{
$this->type = "kills";
}
else if($y[1]=="shots")
{
$this->type = "shots";
}
else if($y[1]=="hits")
{
$this->type = "hits";
}
}
function __construct($x)
{
$name = explode("_",$x->name);
$this->name = $name[2];
$this->value = $x->value;
$this->setType($x->name);
}
function __deconstruct()
{
}
}
class Map extends Statistic
{
public function setType($x)
{
if($x[1]=="wins")
{
$this->type = "wins";
}
if($x[1]=="rounds")
{
$this->type = "rounds";
}
}
public function setName($name)
{
if(isset($name[3]))
{
if(isset($name[4]))
{
return $name[3] . " " . $name[4];
}
else return $name[3];
}
else return $name[2];
}
function __construct($x)
{
$name = explode("_",$x->name);
$this->name = $this->setName($name);
$this->value = $x->value;
$this->setType($name);
}
function __deconstruct()
{
}
}
Gives the result:
object(Statistic)#223 (3) {
["name"]=> string(18) "total_kills_deagle"
["value"]=> int(33)
["type"]=> object(Weapon)#222 (3) {
["name"]=> string(6) "deagle"
["value"]=> int(33)
["type"]=> string(5) "kills" }
}
Should that determination be driven from the loop itself, the whole advantage of having a set of functions that does everything for me and returns a ready-to-serve data is gone, since I would really have to cast different objects that aren't connected to each other, which is not the case here. How can I achieve returning objects of different type than the object itself is?
To answer your question, "How can I achieve returning objects of different type than the object itself is?":
"Casting to change the object's type is not possible in PHP (without using a nasty extension)"
For more info, see: Cast the current object ($this) to a descendent class
So you can't change the class of an existing instance to a derived class. In other words, you can't turn an instance of Statistic into an instance of Weapon.
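Since the class of an existing object can't be changed, a common workaround is to decide the class before the object is created, for instance with a static factory method. The sketch below is only one way to do that; the method name fromApi and the simplified constructors are assumptions for illustration, not part of the original code.

```php
<?php
class Statistic
{
    public $name;
    public $value;

    public function __construct($name, $value)
    {
        $this->name = $name;
        $this->value = $value;
    }

    // Hypothetical factory: inspects the raw stat and returns an instance
    // of the most specific class, instead of trying to change $this.
    public static function fromApi($stat)
    {
        $parts = explode('_', $stat->name);
        if ($parts[0] === 'total' && isset($parts[2])) {
            if ($parts[1] === 'kills') {
                return new Weapon($parts[2], $stat->value);
            }
            if ($parts[1] === 'wins') {
                return new Map(implode(' ', array_slice($parts, 2)), $stat->value);
            }
        }
        return new Statistic($stat->name, $stat->value);
    }
}

class Weapon extends Statistic {}
class Map extends Statistic {}

// Sample input shaped like one entry of the decoded API response.
$stat = (object) array('name' => 'total_kills_deagle', 'value' => 33);
$object = Statistic::fromApi($stat);
echo get_class($object), ' ', $object->name, "\n";
```

The loop would then call Statistic::fromApi($data[$i]) instead of new Statistic($data[$i]), so the array ends up holding Weapon, Map, or plain Statistic objects directly.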

Moving from hard-coded to SOLID principles in PHP

I am currently reading up on the theory behind clean code and the SOLID principles. I now understand that we should program to an interface and not to an implementation.
So I am trying to apply those principles to a small part of my code. I would like your advice or point of view so I can tell whether I am going in the right direction. I'll show you my previous code and my current code so you can see the evolution.
To start, I had a method in my controller to check some requirements for every step of an order process (4 steps that the user has to follow in the right order => 1, then 2, then 3, and then 4).
This is my old code:
private function isAuthorizedStep($stepNumber)
{
$isStepAccessAuthorized = TRUE;
switch($stepNumber) {
case self::ORDER_STEP_TWO: // ORDER_STEP_TWO = 2
if (!($_SESSION['actualOrderStep'] >= self::ORDER_STEP_ONE)) {
$isStepAccessAuthorized = FALSE;
}
break;
case self::ORDER_STEP_THREE:
if (!($_SESSION['actualOrderStep'] >= self::ORDER_STEP_TWO)) {
$isStepAccessAuthorized = FALSE;
}
break;
...
}
return $isStepAccessAuthorized;
}
public function orderStepTwo()
{
// bail out when the previous step has not been completed
if (!$this->isAuthorizedStep(self::ORDER_STEP_TWO)) {
return;
}
... // do some stuff
// after all the verifications:
$_SESSION['actualOrderStep'] = self::ORDER_STEP_TWO;
}
Trying to follow the SOLID principles, I split my code according to this logic:
Extracting hard-coded logic from controllers and putting it in classes (reusability)
Using Dependency Injection and abstraction
interface RuleInterface {
public function matches($int);
}
class StepAccessControl
{
protected $rules;
public function __construct(array $rules)
{
foreach($rules as $key => $rule) {
$this->addRule($key, $rule);
}
}
public function isAccessGranted($actualOrderStep)
{
$isAccessGranted = TRUE;
foreach($this->rules as $rule) {
if (!$rule->matches($actualOrderStep)) {
$isAccessGranted = FALSE;
}
}
return $isAccessGranted;
}
public function addRule($key, RuleInterface $rule)
{
$this->rules[$key] = $rule;
}
}
class OrderStepTwoRule implements RuleInterface
{
public function matches($actualStep)
{
$matches = TRUE;
if (!($actualStep >= 1)) {
$matches = FALSE;
}
return $matches;
}
}
class StepAccessControlFactory
{
public function build($stepNumber)
{
if ($stepNumber == 1) {
...
} elseif ($stepNumber == 2) {
$orderStepTwoRule = new OrderStepTwoRule();
return new StepAccessControl(array($orderStepTwoRule));
}...
}
}
and then in the controller :
public function stepTwoAction()
{
$stepAccessControlFactory = new StepAccessControlFactory();
$stepTwoAccessControl = $stepAccessControlFactory->build(2);
if (!$stepTwoAccessControl->isAccessGranted($_SESSION['actualOrderStep'])) {
return FALSE;
}
}
I would like to know whether I have understood the spirit of these principles and whether I am on the right track :)
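For comparison, here is one possible variation on the rule-based design above, sketched under assumptions: a single rule class parameterized by the minimum completed step, rather than one class per step. MinimumStepRule is a hypothetical name, and the trimmed-down StepAccessControl is a simplified stand-in for the class shown earlier.

```php
<?php
interface RuleInterface
{
    public function matches($actualStep);
}

// Hypothetical rule: parameterized by the minimum completed step,
// so no separate class is needed for each order step.
class MinimumStepRule implements RuleInterface
{
    private $minimumStep;

    public function __construct($minimumStep)
    {
        $this->minimumStep = $minimumStep;
    }

    public function matches($actualStep)
    {
        return $actualStep >= $this->minimumStep;
    }
}

class StepAccessControl
{
    private $rules;

    public function __construct(array $rules)
    {
        $this->rules = $rules;
    }

    // Access is granted only if every injected rule matches.
    public function isAccessGranted($actualStep)
    {
        foreach ($this->rules as $rule) {
            if (!$rule->matches($actualStep)) {
                return false;
            }
        }
        return true;
    }
}

// Step 3 requires that step 2 has already been completed.
$stepThreeAccess = new StepAccessControl(array(new MinimumStepRule(2)));
var_dump($stepThreeAccess->isAccessGranted(1));
var_dump($stepThreeAccess->isAccessGranted(2));
```

With the threshold passed in as data, a factory would no longer need an if/elseif branch per step; whether that trade-off is worthwhile depends on how different the per-step rules are expected to become.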

what is the correct way to extract two sets of data from a method class

I am new to OOP and still a bit confused by the concepts.
I created a class method that extracts two sets of data from a Zend_Session_Namespace. My problem now is that I don't know how to get at this data when it is pulled into another method.
It might be best if I show you what I mean:
Public function rememberLastProductSearched()
{
$session = new Zend_Session_Namespace('searchedproducts');
if ($this->getRequest()->getParam('product-searched')) {
$session->ProductSearched = $this->getRequest()->getParam('product-searched');
return " $session->ProductSearched";
} else {
if ($session->ProductSearched) {
return " $session->ProductSearched ";
}
}
if ($this->getRequest()->getParam('search-term')) {
$session->SearchTerm = $this->getRequest()->getParam('search-term');
return " $session->SearchTerm";
} else {
if ($session->SearchTerm) {
return " $session->SearchTerm ";
}
}
}
This method should obtain two sets of data i.e the
$session->SearchTerm
$session->ProductSearched
My confusion is this: how do I now extract both sets of data in another method call (within the same class)? i.e.
Above is my attempt to extract the information, but it did not work.
Alternatively, should I have placed the information into an array? If so, can somebody please tell me how I could have done this?
It looks like what you're trying to do is use the product-searched and search-terms from params and store them in the session if they're set, otherwise access previously saved values. It would help a bit to see how you're calling this method, but I would probably modify your code slightly to return the session namespace object instead, since that then contains the two values, regardless of whether they came from params or were there already:
public function rememberLastProductSearched()
{
$searchedProducts = new Zend_Session_Namespace('searchedproducts');
if ($this->getRequest()->getParam('product-searched')) {
$searchedProducts->ProductSearched = $this->getRequest()->getParam('product-searched');
}
if ($this->getRequest()->getParam('search-term')) {
$searchedProducts->SearchTerm = $this->getRequest()->getParam('search-term');
}
return $searchedProducts;
}
I'm assuming you have this method in a controller class, so you'd call it like this:
public function searchAction()
{
$searchedProducts = $this->rememberLastProductSearched();
// do something with the values here
}
you'll then have the two values in $searchedProducts->ProductSearched and $searchedProducts->SearchTerm.
The line "return $something;" will stop the code execution and return the value. If you want to return more than one value, you will need to either return an array or use two separate functions to return the values. If you want to return an array, you could do it this way:
public function rememberLastProductSearched() {
$returnArray = array();
$session = new Zend_Session_Namespace('searchedproducts');
if ($this->getRequest()->getParam('product-searched')) {
$session->ProductSearched = $this->getRequest()->getParam('product-searched');
$returnArray['productSearched'] = $session->ProductSearched;
} else {
if ($session->ProductSearched) {
$returnArray['productSearched'] = $session->ProductSearched;
}
}
if ($this->getRequest()->getParam('search-term')) {
$session->SearchTerm = $this->getRequest()->getParam('search-term');
$returnArray['searchTerm'] = $session->SearchTerm;
} else {
if ($session->SearchTerm) {
$returnArray['searchTerm'] = $session->SearchTerm;
}
}
return $returnArray;
}
In your controller or wherever you wanted to check for those values:
$lastSearch = $this->rememberLastProductSearched();
echo $lastSearch['productSearched']; // Product Searched
echo $lastSearch['searchTerm']; // Search terms
But it might be cleaner to use two functions:
public function getLastProductSearched() {
$session = new Zend_Session_Namespace('searchedproducts');
$returnValue = null; // returned as-is when nothing was searched yet
if ($this->getRequest()->getParam('product-searched')) {
$session->ProductSearched = $this->getRequest()->getParam('product-searched');
$returnValue = $session->ProductSearched;
} else {
if ($session->ProductSearched) {
$returnValue = $session->ProductSearched;
}
}
return $returnValue;
}
public function getLastSearchTerms() {
$session = new Zend_Session_Namespace('searchedproducts');
$returnValue = null; // returned as-is when nothing was searched yet
if ($this->getRequest()->getParam('search-term')) {
$session->SearchTerm= $this->getRequest()->getParam('search-term');
$returnValue = $session->SearchTerm;
} else {
if ($session->SearchTerm) {
$returnValue = $session->SearchTerm;
}
}
return $returnValue;
}
And you could use them like this:
echo $this->getLastProductSearched(); // Product Searched
echo $this->getLastSearchTerms(); // Search terms
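It will make your code easier to read and debug later on. A few more notes on your code. You could avoid the nested ifs by using the short ternary operator ?: (PHP 5.3+). Note that || would not work for this, because in PHP it always evaluates to a boolean rather than to one of its operands.
$productSearched = $this->getRequest()->getParam('product-searched') ?: $session->ProductSearched;
if ($productSearched) {
$session->ProductSearched = $productSearched;
$returnArray['productSearched'] = $productSearched;
}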
will achieve the same thing as :
if ($this->getRequest()->getParam('product-searched')) {
$session->ProductSearched = $this->getRequest()->getParam('product-searched');
$returnArray['productSearched'] = $session->ProductSearched;
} else {
if ($session->ProductSearched) {
$returnArray['productSearched'] = $session->ProductSearched;
}
}
Hope this helps !

PHP Memcached extension OOP instantiation

Background:
I have installed the PHP Memcached extension on my live server.
Despite various efforts, I can't seem to install Memcached on my XAMPP development box, so I am relying on the following code to instantiate Memcached only on the live server:
My connect file which is included in every page:
// MySQL connection here
// Memcached
if($_SERVER['HTTP_HOST'] != 'test.mytestserver') {
$memcache = new Memcached();
$memcache->addServer('localhost', 11211);
}
At the moment I am instantiating Memcached in each method, and I can't help thinking that there is a better way to achieve my objective. Does anyone have any ideas?
My class file:
class instrument_info {
// Mysqli connection
function __construct($link) {
$this->link = $link;
}
function execute_query($query, $server) {
$memcache = new Memcached();
$memcache->addServer('localhost', 11211);
$result = mysqli_query($this->link, $query) or die(mysqli_error($this->link));
$row = mysqli_fetch_array($result);
if($server == 'live') {
$key = md5($query); // same key that check_something() looks up
$memcache->set($key, $row, 86400);
}
} // Close function
function check_something() {
$memcache = new Memcached();
$memcache->addServer('localhost', 11211);
$query = "SELECT something from somewhere";
if($_SERVER['HTTP_HOST'] != 'test.mytestserver') { // Live server
$key = md5($query);
$get_result = $memcache->get($key);
if($get_result) {
$row = $memcache->get($key);
} else {
$this->execute_query($query, 'live');
}
} else { // Test Server
$this->execute_query($query, 'prod');
}
} // Close function
} // Close Class
I would suggest that you read up on interface-based programming and dependency injection. Here's some example code that might give you an idea about how you should go about it.
interface CacheInterface {
function set($name, $val, $ttl);
function get($name);
}
class MemCacheImpl implements CacheInterface {
/* todo: implement interface */
}
class OtherCacheImpl implements CacheInterface {
/* todo: implement interface */
}
class InstrumentInfo {
private $cache;
private $link;
function __construct($link, $cache) {
$this->link = $link;
$this->cache = $cache;
}
function someFunc() {
$content = $this->cache->get('some-id');
if( !$content ) {
// collect content somehow
$this->cache->set('some-id', $content, 3600);
}
return $content;
}
}
define('IS_PRODUCTION_ENV', $_SERVER['HTTP_HOST'] == 'www.my-real-website.com');
if( IS_PRODUCTION_ENV ) {
$cache = new MemCacheImpl();
} else {
$cache = new OtherCacheImpl();
}
$instrumentInfo = new InstrumentInfo($link, $cache);
BTW, you actually have the same problem when it comes to mysqli_query: you're making your code dependent on a MySQL database and the mysqli extension. All calls to mysqli_query should also be moved out into their own class, representing the database layer.
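To make the two "todo" classes above a little more concrete, here is a hedged sketch of what such implementations might look like. The class names ArrayCache and MemcachedCache are hypothetical stand-ins for OtherCacheImpl and MemCacheImpl; the only Memcached calls used are set() and get().

```php
<?php
interface CacheInterface
{
    function set($name, $val, $ttl);
    function get($name);
}

// Hypothetical stand-in for development machines without Memcached:
// keeps values in a plain array for the duration of the request.
class ArrayCache implements CacheInterface
{
    private $items = array();

    function set($name, $val, $ttl)
    {
        $this->items[$name] = array('value' => $val, 'expires' => time() + $ttl);
        return true;
    }

    function get($name)
    {
        if (!isset($this->items[$name]) || $this->items[$name]['expires'] < time()) {
            return false; // same "miss" value that Memcached::get() returns
        }
        return $this->items[$name]['value'];
    }
}

// Thin wrapper around the real extension (only usable where it is installed).
class MemcachedCache implements CacheInterface
{
    private $memcached;

    function __construct(Memcached $memcached)
    {
        $this->memcached = $memcached;
    }

    function set($name, $val, $ttl)
    {
        return $this->memcached->set($name, $val, $ttl);
    }

    function get($name)
    {
        return $this->memcached->get($name);
    }
}

$cache = new ArrayCache();
$cache->set('some-id', array('symbol' => 'ABC'), 3600);
```

On the live server the Memcached object would be created once in the connect file, wrapped in MemcachedCache, and passed to the constructor, so no method has to instantiate it again.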

Yii not able to save Image files CFileUploaded

I am not able to save the image uploaded from simple Form with this code
public function actionImage()
{
print_r($_FILES);
$dir = Yii::getPathOfAlias('application.uploads');
if(isset($_POST['img']))
{
$model = new FileUpload();
$model->attributes = $_POST['img'];
$model->image=CUploadedFile::getInstance($model,'image');
if($model->validate())
{
$model->image->saveAs($dir.'/'.$model->image->getName());
// redirect to success page
}
}
}
To answer my own question: instead of using the code above, I used this:
public function actionImage()
{
$dir = Yii::getPathOfAlias('application.uploads');
if (isset($_FILES['img']))
{
$image = CUploadedFile::getInstanceByName('img');
$image->saveAs($dir.'/'.$image->getName());
}
}