  • PHP cURL: How to Crawl Multiple Pages in a Loop


    How to crawl several pages in a single page hit:

    I ran into a problem fetching data from multiple URLs in a loop. On the first iteration the Cron function returned the data, but when Cron was called for the second page the browser showed a "No DATA RECEIVED" message. I also tried adding a delay between the calls, but that did not help.

     

    This is the code for crawling multiple pages in one request:

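    // Both methods belong to the same class (for example, a controller).
    // str_get_html() comes from the PHP Simple HTML DOM Parser, so
    // simple_html_dom.php must be included before these methods run.

    public function Crawl(){

        // Each entry maps a site label to the list of URLs to fetch for it
        $url = array(
            0 => array("ABCD" => array(0 => "http://abcd.com")),
            1 => array("XYZ"  => array(0 => "http://xyz.com")),
        );

        foreach ($url as $key => $value) {
            $this->Cron($value);
            sleep(5);   // Pause between sites
        }

    }


    public function Cron($url = null){

        if (empty($url)) {
            return;   // Nothing to crawl
        }

        ob_start();
        set_time_limit(0);   // Remove the script execution time limit

        foreach ($url as $urlkey => $urlvalue) {

            foreach ($urlvalue as $pageurl) {

                $ch = curl_init();  // Initialising cURL
                curl_setopt($ch, CURLOPT_URL, $pageurl);    // Setting cURL's URL option to the current page
                curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Setting cURL's option to return the webpage data
                $data = curl_exec($ch); // Executing the cURL request; $data is false on failure
                curl_close($ch);    // Closing cURL

                if ($data === false) {
                    continue;   // Skip pages that could not be fetched
                }

                $html = str_get_html($data);   // Parse the page; returns false if parsing fails

                if ($html) {
                    echo $html;
                }

                ob_flush();   // Push the output buffer ...
                flush();      // ... and send it on to the browser
            }

        }

        ob_end_flush();   // Close the buffer opened by ob_start()
    }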
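
    When a page comes back empty, it helps to know whether cURL itself failed. The following is a minimal sketch, not part of the original code: flattenUrls() and fetchPage() are hypothetical helper names, and the redirect and timeout options are assumptions added for illustration. It flattens the nested array used in Crawl() and returns the cURL error text and HTTP status alongside the body.

```php
<?php
// Hypothetical helpers for illustration; the names are not from the original post.

/**
 * Flatten the nested array used in Crawl() into a plain list of URLs.
 */
function flattenUrls(array $sites): array
{
    $urls = array();
    foreach ($sites as $site) {
        foreach ($site as $label => $pages) {
            foreach ($pages as $page) {
                $urls[] = $page;
            }
        }
    }
    return $urls;
}

/**
 * Fetch one URL and report the status, error text, and body.
 * The 30 second default timeout is an assumed value.
 */
function fetchPage(string $url, int $timeout = 30): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // Return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // Follow HTTP redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);     // Give up after $timeout seconds

    $body = curl_exec($ch);                          // false when the transfer fails

    $result = array(
        'url'    => $url,
        'ok'     => $body !== false,
        'status' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'error'  => curl_error($ch),                 // Empty string when there was no error
        'body'   => $body === false ? '' : $body,
    );

    curl_close($ch);
    return $result;
}
```

    A caller could loop over flattenUrls($url), call fetchPage() on each entry, and log $result['error'] whenever $result['ok'] is false, which shows whether the second request failed or simply returned nothing.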

     

    This issue was resolved later. The solution was to add the following line to the HTML header:

     

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
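
    One way to make sure this tag reaches the browser before any crawled markup is to print a small page skeleton first. The sketch below assumes that approach: pageHeader() and pageFooter() are hypothetical helpers, and the default title is a placeholder.

```php
<?php
// Hypothetical helpers for illustration; not part of the original post.

/**
 * Build the opening part of the page, with the charset meta tag in <head>.
 */
function pageHeader(string $title = 'Crawl results', string $charset = 'UTF-8'): string
{
    $safeTitle = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');

    return "<!DOCTYPE html>\n"
        . "<html>\n<head>\n"
        . '<meta http-equiv="Content-Type" content="text/html; charset=' . $charset . '">' . "\n"
        . "<title>" . $safeTitle . "</title>\n"
        . "</head>\n<body>\n";
}

/**
 * Build the closing part of the page.
 */
function pageFooter(): string
{
    return "</body>\n</html>\n";
}
```

    With these in place, the calling code would echo pageHeader(), run Crawl(), and then echo pageFooter(), so the charset declaration always precedes the crawled output.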

     

    Thanks for reading the post.
