Showing posts with label PHP. Show all posts

Sunday, April 6, 2014

Abstract File Systems with Flysystem

Reading and writing files is an integral aspect of any programming language, but the underlying implementation can vary enormously. For example, the finer details of writing data to the local filesystem are very different from those of uploading over FTP, yet conceptually the two operations are very similar.

In addition to old warhorses like FTP, online storage is increasingly ubiquitous and inexpensive – with scores of services available such as Dropbox, Amazon’s S3 and Rackspace’s Cloud Files – but these all use subtly different methods for reading and writing.

That’s where Flysystem comes in. It provides a layer of abstraction over multiple file systems, which means you don’t need to worry about where the files are, how they’re stored, or how the low-level I/O operations work. All you need to think about are the high-level operations such as reading, writing and organizing files into directories.

Such abstraction also makes it simpler to switch from one system to another without having to rewrite vast swathes of application code. It also provides a logical approach to moving or copying data from one storage system to another, without worrying about the underlying implementation.
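To make the idea concrete, here is a minimal sketch of the adapter pattern that this kind of abstraction relies on. The StorageAdapter interface, both adapter classes and the copyFile() helper are hypothetical names invented for this illustration; they are not Flysystem's own classes, and the real library does considerably more.

```php
<?php
// Hypothetical interface: NOT part of Flysystem, just an illustration.
interface StorageAdapter
{
    public function has($path);
    public function read($path);
    public function write($path, $contents);
}

// Stores files under a root directory on the local disk.
class LocalDiskAdapter implements StorageAdapter
{
    private $root;

    public function __construct($root)
    {
        $this->root = rtrim($root, '/');
    }

    public function has($path)
    {
        return is_file($this->root . '/' . $path);
    }

    public function read($path)
    {
        return file_get_contents($this->root . '/' . $path);
    }

    public function write($path, $contents)
    {
        return file_put_contents($this->root . '/' . $path, $contents) !== false;
    }
}

// Keeps "files" in a PHP array; stands in for any non-disk backend.
class InMemoryAdapter implements StorageAdapter
{
    private $files = array();

    public function has($path)
    {
        return array_key_exists($path, $this->files);
    }

    public function read($path)
    {
        return $this->files[$path];
    }

    public function write($path, $contents)
    {
        $this->files[$path] = $contents;
        return true;
    }
}

// Application code depends only on the interface, so either backend works.
function copyFile(StorageAdapter $from, StorageAdapter $to, $source, $target)
{
    if (!$from->has($source)) {
        return false;
    }
    return $to->write($target, $from->read($source));
}
```

Because copyFile() only knows about the interface, swapping one storage backend for another means changing the object that is passed in, not the code that uses it.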

You can use Dropbox, S3, Cloud Files, FTP or SFTP as if they were local; saving a file becomes the same process whether it’s being saved locally or transferred over the network. You can treat zip archives as if they were a bunch of folders, without worrying about the nitty gritty of creating and compressing the archives themselves.

Flysystem also supports technologies such as Memcached and Predis, treating them as if they were file systems. That way you can read a “file” without worrying whether it’s coming from disk (e.g. a local filesystem or an S3 bucket) or straight from memory (e.g. Memcached).

 

Installation and Basic Usage

 

As ever, Composer is the best way to install:
"league/flysystem": "0.2.*"

Now you can simply create one or more instances of League\Flysystem\Filesystem, by passing in the appropriate adapter.

For example, to use a local directory:

use League\Flysystem\Filesystem;
use League\Flysystem\Adapter\Local as Adapter;

$filesystem = new Filesystem(new Adapter('/path/to/directory'));

To use an Amazon S3 bucket, there’s slightly more configuration involved:

use Aws\S3\S3Client;
use League\Flysystem\Adapter\AwsS3 as Adapter;

$client = S3Client::factory(array(
    'key'    => '[your key]',
    'secret' => '[your secret]',
));

$filesystem = new Filesystem(new Adapter($client, 'bucket-name', 'optional-prefix'));

To use Dropbox:

use Dropbox\Client;
use League\Flysystem\Adapter\Dropbox as Adapter;

$client = new Client($token, $appName);
$filesystem = new Filesystem(new Adapter($client, 'optional/path/prefix'));

(To get the token and application name, create an application using Dropbox’s App Console.)
Here’s an example for SFTP – you may not need every option listed here:

use League\Flysystem\Adapter\Sftp as Adapter;

$filesystem = new Filesystem(new Adapter(array(
    'host' => 'example.com',
    'port' => 22, // SFTP runs over SSH, which defaults to port 22 (21 is plain FTP)
    'username' => 'username',
    'password' => 'password',
    'privateKey' => 'path/to/or/contents/of/privatekey',
    'root' => '/path/to/root',
    'timeout' => 10,
)));

Or Memcached:

use League\Flysystem\Adapter\Local as Adapter;
use League\Flysystem\Cache\Memcached as Cache;

$memcached = new Memcached;
$memcached->addServer('localhost', 11211);
$filesystem = new Filesystem(
    new Adapter(__DIR__.'/path/to/root'),
    new Cache($memcached, 'storageKey', 300)
);
// Storage Key and expire time are optional

For other adapters such as normal FTP, Predis or WebDAV, refer to the documentation.

 

Reading and Writing to a File System

 

As far as your application code is concerned, you simply need to replace calls such as file_exists(), fopen() / fclose(), fread() / fwrite() and mkdir() with their Flysystem equivalents.
For example, take the following legacy code, which copies a local file to an S3 bucket:

    $filename = "/usr/local/something.txt";
    if (file_exists($filename)) {
        $handle = fopen($filename, "r");
        $contents = fread($handle, filesize($filename));
        fclose($handle);

        $aws = Aws::factory('/path/to/config.php');
        $s3 = $aws->get('s3');

        $s3->putObject(array(
            'Bucket' => 'my-bucket',
            'Key'    => 'data.txt',
            'Body'   => $contents,
            'ACL'    => 'public-read',
        )); 
    }

Using flysystem, it might look a little like this:

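    use Aws\S3\S3Client;
    use League\Flysystem\Filesystem;
    // two adapters are used in the same file, so alias them separately
    use League\Flysystem\Adapter\Local as LocalAdapter;
    use League\Flysystem\Adapter\AwsS3 as S3Adapter;

    $local = new Filesystem(new LocalAdapter('/usr/local/'));
    // the S3 client must be wrapped in an adapter before it is passed to Filesystem
    $remote = new Filesystem(new S3Adapter(
        S3Client::factory(array(
            'key'    => '[your key]',
            'secret' => '[your secret]',
        )),
        'my-bucket'
    ));

    if ($local->has('something.txt')) {
        $contents = $local->read('something.txt');
        $remote->write('data.txt', $contents, ['visibility' => 'public']);
    }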

Notice how we’re using terminology such as reading and writing, local and remote – high level abstractions, with no worrying about things like creating and destroying file handles.

Here’s a summary of the most important methods on the League\Flysystem\Filesystem class:

Method                 Example
Reading                $filesystem->read('filename.txt')
Writing                $filesystem->write('filename.txt', $contents)
Updating               $filesystem->update('filename.txt', $contents)
Writing or updating    $filesystem->put('filename.txt', $contents)
Checking existence     $filesystem->has('filename.txt')
Deleting               $filesystem->delete('filename.txt')
Renaming               $filesystem->rename('old.txt', 'new.txt')
Getting file info      $filesystem->getMimetype('filename.txt')
                       $filesystem->getSize('filename.txt')
                       $filesystem->getTimestamp('filename.txt')
Creating directories   $filesystem->createDir('path/to/directory')
Deleting directories   $filesystem->deleteDir('path/to/directory')

 

Automatically Creating Directories

 

When you call $filesystem->write(), it ensures that the directory you’re trying to write to exists – if it doesn’t, it creates it for you recursively.

So this…

$filesystem->write('/path/to/a/directory/somewhere/somefile.txt', $contents);

…is basically the equivalent of:

$path = '/path/to/a/directory/somewhere/somefile.txt';
if (!file_exists(dirname($path))) {
    mkdir(dirname($path), 0755, true);
}
file_put_contents($path, $contents);

 

Managing Visibility

 

Visibility – i.e., permissions – can vary in implementation or semantics across different storage mechanisms, but flysystem abstracts them as either “private” or “public”. So, you don’t need to worry about specifics of chmod, S3 ACLs, or whatever terminology a particular mechanism happens to use.
You can either set the visibility when calling write():

$filesystem->write('filename.txt', $data, ['visibility' => 'private']);

Prior to PHP 5.4, which lacks the short array syntax, or according to preference:

$filesystem->write('filename.txt', $data, array('visibility' => 'private'));

Alternatively, you can set the visibility of an existing object using setVisibility:

$filesystem->setVisibility('filename.txt', 'private');

Rather than set it on a file-by-file basis, you can set a default visibility for a given instance in its constructor:

$filesystem = new League\Flysystem\Filesystem($adapter, $cache, [
    'visibility' => AdapterInterface::VISIBILITY_PRIVATE
]);

You can also use getVisibility to determine a file’s permissions:

if ($filesystem->getVisibility('filename.txt') === 'private') {
    // file is secret
}
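
As a rough sketch of what such an abstraction might do on a local disk, the two visibility labels can be translated to and from Unix permission modes. The function names and the specific modes (0644 and 0600) are assumptions chosen for illustration; the modes Flysystem itself applies may differ.

```php
<?php
// Assumed modes for illustration; a real adapter may choose different ones.
const MODE_PUBLIC = 0644;
const MODE_PRIVATE = 0600;

// Hypothetical helper: translate a visibility label into a permission mode.
function visibilityToMode($visibility)
{
    if ($visibility === 'public') {
        return MODE_PUBLIC;
    }
    if ($visibility === 'private') {
        return MODE_PRIVATE;
    }
    throw new InvalidArgumentException("Unknown visibility: $visibility");
}

// Hypothetical helper: treat a file as public when "others" may read it.
function modeToVisibility($mode)
{
    return ($mode & 0004) ? 'public' : 'private';
}
```

On a local disk, the result of visibilityToMode() could be handed to chmod(); an S3 adapter would instead translate the same two labels into ACL values.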

 

Listing Files and Directories

 

You can get a listing of all files and directories in a given directory like this:

$filesystem->listContents();

The output would look a little like this:

[0] =>
array(8) {
    'dirname' =>
    string(0) ""
    'basename' =>
    string(11) "filters.php"
    'extension' =>
    string(3) "php"
    'filename' =>
    string(7) "filters"
    'path' =>
    string(11) "filters.php"
    'type' =>
    string(4) "file"
    'timestamp' =>
    int(1393246029)
    'size' =>
    int(2089)
}
[1] =>
array(5) {
    'dirname' =>
    string(0) ""
    'basename' =>
    string(4) "lang"
    'filename' =>
    string(4) "lang"
    'path' =>
    string(4) "lang"
    'type' =>
    string(3) "dir"
}
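
Most of the keys in the listing above mirror what PHP's built-in pathinfo() returns. The sketch below shows how an entry of that shape could be assembled; describeEntry() is a hypothetical helper written for this illustration, not the way Flysystem necessarily builds its listings.

```php
<?php
// Hypothetical helper; builds an entry shaped like the listing output.
function describeEntry($path, $type, $size = null, $timestamp = null)
{
    $info = pathinfo($path);
    $entry = array(
        // pathinfo() reports "." for top-level paths; use "" instead.
        'dirname'  => $info['dirname'] === '.' ? '' : $info['dirname'],
        'basename' => $info['basename'],
        'filename' => $info['filename'],
        'path'     => $path,
        'type'     => $type,
    );
    // pathinfo() omits the key entirely when there is no extension.
    if (isset($info['extension'])) {
        $entry['extension'] = $info['extension'];
    }
    // Only files carry a size and a modification time in this sketch.
    if ($type === 'file') {
        $entry['timestamp'] = $timestamp;
        $entry['size'] = $size;
    }
    return $entry;
}
```

Calling describeEntry('filters.php', 'file', 2089, 1393246029) yields an eight-key array, while describeEntry('lang', 'dir') yields five keys, matching the counts in the sample output.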

If you wish to incorporate additional properties into the returned array, you can use listWith():

$filesystem->listWith(['mimetype', 'size', 'timestamp']);

To get recursive directory listings, the second parameter should be set to TRUE:

$filesystem->listContents(null, true);

The null simply means we start at the root directory.
 

To get just the paths:

$filesystem->listPaths();

 

Mounting Filesystems

 

Mounting filesystems is a concept traditionally used in operating systems, but which can also be applied to your application. Essentially it’s like creating references – shortcuts, in a sense – to filesystems, using some sort of identifier.

Flysystem provides the Mount Manager for this. You can pass one or more adapters upon instantiation, using strings as keys:

$ftp = new League\Flysystem\Filesystem($ftpAdapter);
$s3 = new League\Flysystem\Filesystem($s3Adapter);

$manager = new League\Flysystem\MountManager(array(
    'ftp' => $ftp,
    's3' => $s3,
));

You can also mount a file system at a later time:

$local = new League\Flysystem\Filesystem($localAdapter);
$manager->mountFilesystem('local', $local);

Now you can use the identifiers as if they were protocols in URIs:

// Get the contents of a local file…
$contents = $manager->read('local://path/to/file.txt');

// …and upload to S3
$manager->write('s3://path/goes/here/filename.txt', $contents);
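
Under the hood, a mount manager has to split each prefixed path into an identifier and a path, then look up the matching filesystem. The sketch below illustrates that idea in plain PHP; splitMountPath() and resolveMount() are hypothetical helpers, not methods of the MountManager class.

```php
<?php
// Hypothetical helper: split "prefix://path" into its two parts.
function splitMountPath($uri)
{
    if (strpos($uri, '://') === false) {
        throw new InvalidArgumentException("No prefix found in: $uri");
    }
    // Limit of 2 keeps any later "://" inside the path part.
    list($prefix, $path) = explode('://', $uri, 2);
    return array($prefix, $path);
}

// Hypothetical helper: look up the backend registered under the prefix.
function resolveMount(array $mounts, $uri)
{
    list($prefix, $path) = splitMountPath($uri);
    if (!isset($mounts[$prefix])) {
        throw new LogicException("Nothing mounted at: $prefix");
    }
    return array($mounts[$prefix], $path);
}
```

For example, splitMountPath('local://path/to/file.txt') returns the pair 'local' and 'path/to/file.txt'.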

It’s perhaps more useful to use identifiers which are generic in nature, e.g.:

$manager = new League\Flysystem\MountManager(array(
    'remote' => new League\Flysystem\Filesystem($s3Adapter),
));

// Save some data to remote storage
$manager->write('remote://path/to/filename', $data);

Friday, March 28, 2014

Image Scraping with Symfony’s DomCrawler

A photographer friend of mine implored me to find and download images of picture frames from the internet. I eventually landed on a web page that had a number of them available for free but there was a problem: a link to download all the images together wasn’t present.

I didn’t want to go through the stress of downloading the images individually, so I wrote this PHP class to find, download and zip all images found on the website.

How the Class works

 

It searches a URL for images, downloads and saves the images into a folder, creates a ZIP archive of the folder and finally deletes the folder.

The class uses Symfony’s DomCrawler component to search for all image links found on the webpage and a custom zip function that creates the zip file. Credit to David Walsh for the zip function.

Coding the Class

 

The class consists of five private properties and eight public methods including the __construct magic method.



Below is the list of the class properties and their roles.

1. $folder: stores the name of the folder that contains the scraped images.
2. $url: stores the webpage URL.
3. $html: stores the HTML document code of the webpage to be scraped.
4. $fileName: stores the name of the ZIP file.
5. $status: saves the status of the operation, i.e. whether it was a success or a failure.

Let’s get started building the class.

Create the class ZipImages containing the above five properties.

<?php
class ZipImages {
    private $folder;
    private $url;
    private $html;
    private $fileName;
    private $status;

Create a __construct magic method that accepts a URL as an argument.
It stores the URL, fetches the HTML of the page and creates the default image folder.

public function __construct($url) {
    $this->url = $url; 
    $this->html = file_get_contents($this->url);
    $this->setFolder();
}

The created ZIP archive has a folder that contains the scraped images. The setFolder method below configures this.

By default, the folder name is set to image, but the method provides an option to change it by passing a different folder name as its argument.

public function setFolder($folder="image") {
    // if folder doesn't exist, attempt to create one and store the folder name in property $folder
    if(!file_exists($folder)) {
        mkdir($folder);
    }
    $this->folder = $folder;
}

setFileName provides an option to change the name of the ZIP file, with a default name set to zipImages:

public function setFileName($name = "zipImages") {
    $this->fileName = $name;
}

At this point, we instantiate the Symfony crawler component to search for images, then download and save all the images into the folder.

public function domCrawler() {
    // instantiate the Symfony DomCrawler component
    $crawler = new Crawler($this->html);
    // create an array of all scraped image links
    $result = $crawler
        ->filterXPath('//img')
        ->extract(array('src'));

    // download and save each image to the folder
    foreach ($result as $image) {
        $path = $this->folder."/".basename($image);
        $file = file_get_contents($image);
        if ($file === false) {
            throw new \Exception('Failed to download image');
        }
        // file_put_contents() returns false on failure
        if (file_put_contents($path, $file) === false) {
            throw new \Exception('Failed to write image');
        }
    }
}
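
For readers who want to see the extraction step in isolation, the same XPath idea can be tried with PHP's bundled DOM extension instead of the Symfony component. This is a sketch of an alternative, not the class's actual code; extractImageSources() is a hypothetical function name, and unlike the crawler call above it skips img elements that have no src attribute.

```php
<?php
// Returns the src attribute of every <img> element in an HTML string.
function extractImageSources($html)
{
    $document = new DOMDocument();
    // Collect parser warnings internally; real-world markup is rarely perfect.
    $previous = libxml_use_internal_errors(true);
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($document);
    $sources = array();
    // Select only images that actually carry a src attribute.
    foreach ($xpath->query('//img[@src]') as $img) {
        $sources[] = $img->getAttribute('src');
    }
    return $sources;
}
```

Note that the values come back exactly as written in the page, so relative paths would still need to be resolved against the page URL before they can be downloaded.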

After the download is complete, we compress the image folder to a ZIP Archive using our custom create_zip function.

public function createZip() {
    $folderFiles = scandir($this->folder);
    if (!$folderFiles) {
        throw new \Exception('Failed to scan folder');
    }
    $fileArray = array();
    foreach($folderFiles as $file){
        if (($file != ".") && ($file != "..")) {
            $fileArray[] = $this->folder."/".$file;
        }
    }

    if (create_zip($fileArray, $this->fileName.'.zip')) {
        $this->status = <<<HTML
File successfully archived. <a href="$this->fileName.zip">Download it now</a>
HTML;
    } else {
        $this->status = "An error occurred";
    }
}

Lastly, we delete the created folder after the ZIP file has been created.

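public function deleteCreatedFolder() {
    $dp = opendir($this->folder) or die ('ERROR: Cannot open directory');
    // compare against false so that a file named "0" does not end the loop early
    while (false !== ($file = readdir($dp))) {
        if ($file != '.' && $file != '..') {
            if (is_file("$this->folder/$file")) {
                unlink("$this->folder/$file");
            }
        }
    }
    // release the directory handle before removing the directory
    closedir($dp);
    rmdir($this->folder) or die ('could not delete folder');
}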

Get the status of the operation, i.e. whether it was successful or an error occurred.

public function getStatus() {
    echo $this->status;
}

Process all the methods above.

public function process() {
    $this->domCrawler();
    $this->createZip();
    $this->deleteCreatedFolder();
    $this->getStatus();
}

You can download the full class from Github.

Class Dependency

 

For the class to work, the DomCrawler component and the create_zip function need to be included. You can download the code for this function here.

Download and install the DomCrawler component via Composer simply by adding the following require statement to your composer.json file:

"symfony/dom-crawler": "2.3.*@dev"
Run $ php composer.phar install to download the library and generate the vendor/autoload.php autoloader file.

Using the Class

 

  • Make sure all required files are included, via autoload or explicitly.
  • Call the setFolder and setFileName methods and pass in their respective arguments. Only call the setFolder method when you need to change the folder name.
  • Call the process method to put the class to work.
<?php
    require_once 'zipfunction.php';
    require_once 'vendor/autoload.php';
    use Symfony\Component\DomCrawler\Crawler;

    // instantiate the ZipImages class
    $object = new ZipImages('http://sitepoint.com');
    // set the folder name
    $object->setFolder('pictureFrames');
    // set the zip file name
    $object->setFileName('myframes');
    // initialize the class process
    $object->process();
 
Source: sitepoint.com 

Wednesday, March 26, 2014

Getting Started with PHP Extension Development via PHP-CPP

In your dealings with PHP, you may come to consider writing a PHP extension yourself. There are several reasons I can think of that motivate me to do so:
  • to extend PHP functionality for some very particular usage (mathematics, statistics, geometric, etc).
  • to have a higher performance and efficiency compared to a pure PHP implementation
  • to leverage the swiftness obtained from programming in another previously grasped language (for me, C++).
When it comes to choosing a tool to build PHP extensions, we see two different approaches:
  • Use more pro-PHP semantics, like Zephir.
  • Use more pro-C/C++ semantics, like PHP-CPP, which will be addressed in this article.
For me, the main drive to select the second approach is simple: I started my programming hobby with C/C++, so I still feel more comfortable writing those lower level modules in C/C++. PHP-CPP’s official site gives a few other reasons to do so.

Installation and configuration

PHP-CPP is evolving rapidly. At the time of this article’s writing, it is in version 0.9.1 (with 0.9.0 released about 2 days before). According to its documentation, “this is a feature-freeze release that prepares for the upcoming v1.0 version”, so we are confident we’ll see its 1.0 major release very soon.

It is thus recommended, at least during this interim period, to use git to clone the repository and get the latest updates later via git pull.

NOTE: The PHP-CPP documentation on installation states that for the time being, it “only supports single-threaded PHP installations” because “internally the Zend engine uses a very odd system to ensure thread safety”. Future releases may support multi-threaded PHP installations but let’s just keep this in mind for now and stick to its current limitation. Luckily, “single-threaded PHP installations” should be the case for most of the PHP installations out there.

PHP-CPP is written in C++11, which the older version of g++ installed on my Ubuntu 12.04 LTS does not support. We need to upgrade our g++ compiler to version 4.8.x or above. There is an article detailing the steps to do the upgrading. Please follow the instructions listed there.

Also, PHP-CPP compilation will use the php.h header file. This file is normally missing in an Ubuntu box, unless php-dev was installed. We can install PHP5 related development files by issuing this command: 

sudo apt-get install php5-dev
 
After upgrading g++ and installing the necessary header files, we can issue the following command to compile and install the PHP-CPP library file (libphpcpp.so):

make && sudo make install
 
The compilation will be quite fast. After the installation, the libphpcpp.so file will be copied over to /usr/lib, and all PHP-CPP header files will be copied to /usr/include and /usr/include/phpcpp folders.

The installation of PHP-CPP lib is now complete. It is quite straightforward and we can now move on to the programming part. 

Before we do that, we’ll discuss a few important concepts and terminologies used in PHP-CPP. The full documentation can be found on its official site, and everyone is encouraged to read through it BEFORE doing any real programming.

Skeleton (empty) extension project files

PHP-CPP provides a skeleton extension project, containing the following 3 files:
  • main.cpp: the main cpp file containing a get_module function (will be discussed in more detail later)
  • Makefile: the sample MAKE file to compile the extension
  • yourextension.ini: contains just one line for extension loading

Makefile

If you are familiar with *nix development, you are familiar with this Makefile. Some slight changes shall be made to customize this file to fit our needs:
  • Change NAME = yourextension to a more meaningful one, like NAME = skeleton.
  • Change INI_DIR = /etc/php5/conf.d to match your system’s configuration. In my case, it is INI_DIR = /etc/php5/cli/conf.d. I modified the INI path to enable the extension for PHP’s cli environment first.
These are all the changes I have made. The rest of the Makefile can be kept as it is.

yourextension.ini

I renamed this file to skeleton.ini and changed the only line in that file like this:
extension=skeleton.so

main.cpp

In the empty project provided by PHP-CPP, this file contains only one function: get_module(), which is excerpted below:

#include <phpcpp.h>

/**
 *  tell the compiler that the get_module is a pure C function
 */
extern "C" {

    /**
     *  Function that is called by PHP right after the PHP process
     *  has started, and that returns an address of an internal PHP
     *  structure with all the details and features of your extension
     *
     *  @return void*   a pointer to an address that is understood by PHP
     */
    PHPCPP_EXPORT void *get_module() 
    {
        // static(!) Php::Extension object that should stay in memory
        // for the entire duration of the process (that's why it's static)
        static Php::Extension extension("yourextension", "1.0");

        // @todo    add your own functions, classes, namespaces to the extension

        // return the extension
        return extension;
    }
}

For now, let’s just change this line to match the extension name we intend to create:

static Php::Extension extension("skeleton", "1.0"); // To be humble, we can change the version number to 0.0.1

get_module() is called by PHP when the latter tries to load a required library. It is considered the entry point for a lib. It is declared using the extern "C" modifier to comply with PHP’s library requirements for the get_module() function. It also uses a macro PHPCPP_EXPORT which makes sure that get_module() is publicly exported, and thus callable by PHP.

So far, we have made some changes to the empty project to suit our needs. We can now compile and install the extension:

make && sudo make install
 
Next, we need to copy the required files into the appropriate folders:

cp -f skeleton.so /usr/lib/php5/20121212
 
cp -f skeleton.ini /etc/php5/cli/conf.d
 
We just need to make sure that the skeleton.so lib is copied to the right location of PHP extensions (in my Ubuntu setup, it should be /usr/lib/php5/20121212 as shown above). 

We can then verify the extension is loaded in CLI by php -i | grep skeleton, and the terminal shall display something like this:


(Recall that the skeleton.ini is the file we modified above, which contains the extension=skeleton.so line.)

We have so far compiled and installed our first PHP extension using PHP-CPP. Of course, this extension does nothing yet. We will now create our first few functions to further understand the process of building PHP extensions.

“Hello, Taylor” function

The first function we create will be a slightly modified version of “Hello, World”. Let’s see the full code of main.cpp first:

#include <phpcpp.h>
 
#include <iostream>

void helloWorld (Php::Parameters &params)
{
    std::string name=params[0];
    std::cout<<"Hello "<<name<<"!"<<std::endl;

}

extern "C" {

    PHPCPP_EXPORT void *get_module() 
    {
        static Php::Extension extension("skeleton", "1.0");
        extension.add("helloWorld", helloWorld);

        return extension;
    }
}

According to the PHP-CPP documentation on “Register native functions”, it supports four types of function signatures to be called from PHP:

void example1();
 
void example2(Php::Parameters &params);
 
Php::Value example3();
 
Php::Value example4(Php::Parameters &params);
 
In this case, I am using the second signature and the parameters are passed by value in an array form (PHP feature). 

However, in helloWorld, we have specifically used C++ type std::string to grab the first parameter. We have also used C++ std lib to output a welcoming message. 

In get_module() function, after declaring the extension variable, we add the function we would like to export (helloWorld()) and assign a name visible to the PHP script (helloWorld). 

Now let’s compile and install the extension. If everything goes smoothly, the new skeleton.so file will be copied to the extension directory. 

We can write a simple script to test the function just created:
<?php

echo "Testing helloWorld in skeleton.so\n";
 
echo helloWorld('Taylor'); 
 
 echo helloWorld(1234+5678);
 
echo helloWorld(['string', 123+456]);
 
Please take some time to look at the output:

We will come back to what we have observed here later.

Function parameters by reference

Next, we will see another function which passes parameters by reference, a swap() function. In this function, we will also try to specify the number of parameters and their type. 

In main.cpp, we add one more function swap():
void swap(Php::Parameters &params) {
    Php::Value temp = params[0];
    params[0] = params[1];
    params[1] = temp;
}
And also export the function by specifying the number of parameters and their type:
extension.add("swap", swap,{
            Php::ByRef("a", Php::Type::Numeric),
            Php::ByRef("b", Php::Type::Numeric)
        });
We explicitly say that:
  • There will be two parameters (a and b);
  • They should be passed by reference (instead of by value);
  • They should be of type Numeric.
Let’s compile and install the updated extension again and write some code snippets to see how this new function works:
<?php

$a=10;
$b=20;

// swap($a); 
// The call above will cause a segmentation fault

swap($a, $b);
echo $a."|".$b."\n";

$c=10;
$d="string";
swap($c, $d);
echo $c."|".$d."\n";

$e=10;
$f=new \DateTime();
swap($e, $f);
var_dump($e);
var_dump($f);
 
swap($a) will fail. This is expected and unexpected. The expected part is that we need two parameters and only one is given. But, shouldn’t that error be captured by PHP when calling the function swap and prompting us something like Not enough parameters?

The first call (swap($a, $b)) shows the expected result: 20|10. The function swaps the two numbers passed in. 

The second call is somewhat unexpected: we have told PHP that we are to swap two numbers! But it just ignores the fact that the second parameter passed is a string and does the swapping anyway!

Well, in a way, it is still expected. PHP does not really distinguish between a number type and a string type. This behavior complies with the PHP standard. Also due to this behavior, we didn’t and can’t use C++ internal types for the temporary variable used in the function (temp) but used Php::Value as the variable type.

The third call will work. The first var_dump will show the DateTime object and the second will show the integer. This is very much unexpected (at least to me). After all, an object is quite different from a number/string. But after considering that this “swap” behavior is also doable in PHP, it fits in with PHP’s oddities.
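
To see that the same behavior is available in plain PHP, here is a minimal userland sketch. The name swapValues() is chosen for this illustration so that it does not clash with the extension's swap() function.

```php
<?php
// Plain PHP swap by reference; no type restrictions on either argument.
function swapValues(&$a, &$b)
{
    $temp = $a;
    $a = $b;
    $b = $temp;
}
```

Calling swapValues($e, $f) with $e = 10 and $f = new \DateTime() leaves $e holding the DateTime object and $f holding the integer, just like the third call to the extension function.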

So, does it mean the “type” specification won’t have any impact? Not really. To further elaborate this, we create a third function:

void swapObject(Php::Parameters &params)
{
    Php::Value temp = params[0];
    params[0] = params[1];
    params[1] = temp;
}
And we register this function like this:
extension.add("swapObject", swapObject,{
            Php::ByRef("a", "sampleClass"),
            Php::ByRef("b", "sampleClass")
});
The testing code will be like this:
class sampleClass {
    var $i;
    public function __construct($n) {
        $this->i = $n;
    }
}

$o1 = new sampleClass(10);
$o2 = new sampleClass(20);

swapObject($o1, $o2);
echo $o1->i . "|" . $o2->i . "\n";

class anotherClass {
}

$d1 = new anotherClass();
$d2 = new anotherClass();

swapObject($d1, $d2);
 
The first call to swapObject() will work as we passed in the correct class type (sampleClass). The second will fail, displaying “PHP Catchable fatal error: Argument 1 passed to swapObject() must be an instance of sampleClass, instance of anotherClass given...”.

The above code segment illustrates one important aspect of type restriction: scalar type declarations are not really enforced. PHP, and thus PHP-CPP, only enforces object type declarations. Also, the number of parameters is not really enforced on the PHP side.
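
The object-type check can be reproduced in userland PHP as well. The sketch below assumes PHP 7 or later, where a mismatched class type hint raises a TypeError exception rather than the catchable fatal error quoted above; SampleBox, OtherBox, swapBoxes() and trySwap() are hypothetical names used only for this illustration.

```php
<?php
class SampleBox
{
    public $i;

    public function __construct($n)
    {
        $this->i = $n;
    }
}

class OtherBox
{
}

// The class type hint is checked by PHP before the function body runs.
function swapBoxes(SampleBox &$a, SampleBox &$b)
{
    $temp = $a;
    $a = $b;
    $b = $temp;
}

// Wraps the call so that a type mismatch is reported instead of fatal.
function trySwap(&$a, &$b)
{
    try {
        swapBoxes($a, $b);
        return 'swapped';
    } catch (TypeError $e) { // assumption: PHP 7+, where TypeError exists
        return 'rejected';
    }
}
```

Passing two SampleBox instances swaps them, while passing two OtherBox instances is rejected before any swapping takes place.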

Source: Site Point