I have been doing a lot of automation and data collection via APIs and scraping, and I recently found an amazing toolset that lets me use PHP for scripts I would normally turn to Ruby, Python or Node.js to handle. It is called ReactPHP. There is also Amp, which is similar, but I prefer ReactPHP because I was able to accomplish a LOT very quickly.
With ReactPHP you can create scripts that run indefinitely, cap the number of concurrent API requests, set up a queue, set timeouts, run parallel downloads, etc. It is an absolute game changer for developers who need something that can handle batch processing, multiple services, concurrent non-blocking I/O (it runs on a single-threaded event loop rather than true multi-threading), filesystem access, database reads/writes, etc.
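ReactPHP's own PHP API is not shown here. As a language-neutral sketch of the same event-loop pattern, here is a minimal Python asyncio version: a queue of jobs, a fixed number of workers that caps concurrency, and a per-request timeout, all on one single-threaded loop. The `fake_request` function is hypothetical and sleeps instead of calling a real API.

```python
import asyncio


async def fake_request(item: int) -> int:
    """Hypothetical stand-in for an API call: sleeps instead of doing network I/O."""
    await asyncio.sleep(0.01)
    return item * 2


async def worker(queue: asyncio.Queue, results: dict, timeout: float) -> None:
    # Each worker keeps pulling jobs until it receives the None stop sentinel.
    while True:
        item = await queue.get()
        try:
            if item is None:
                return
            try:
                results[item] = await asyncio.wait_for(fake_request(item), timeout)
            except asyncio.TimeoutError:
                results[item] = None  # record the timeout instead of killing the loop
        finally:
            queue.task_done()


async def run_batch(items, concurrency: int = 5, timeout: float = 1.0) -> dict:
    queue: asyncio.Queue = asyncio.Queue()
    for item in items:
        queue.put_nowait(item)
    for _ in range(concurrency):
        queue.put_nowait(None)  # one stop sentinel per worker, queued after all jobs
    results: dict = {}
    # `concurrency` workers means at most that many requests are in flight at once.
    workers = [asyncio.create_task(worker(queue, results, timeout)) for _ in range(concurrency)]
    await asyncio.gather(*workers)
    return results


if __name__ == "__main__":
    print(asyncio.run(run_batch(range(10), concurrency=3)))
```

Raising `concurrency` is the knob that lets a single process keep many requests in flight while each one is waiting on the network.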
I have been collecting product data using Amazon Marketplace Web Service (MWS). It throttles you at 720 requests per hour per account, with a max of 17,280 requests per day per account, which averages out to one request every 5 seconds per account. With ReactPHP I could manage multiple accounts, proxies and requests, dynamically load balancing as accounts are added or removed. With ReactPHP I am able to literally max out the number of requests allowed with a simple script. I guess it isn't so simple when you first get started, but if you hang in there it starts making a lot of sense.
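The throttling arithmetic is worth spelling out: 3600 seconds / 720 requests gives 5 seconds between requests per account, and 720 × 24 = 17,280 requests per day. One way to load balance across several accounts is a min-heap keyed on when each account is next allowed to fire. This is a minimal sketch in Python for illustration, not ReactPHP code; the `AccountScheduler` class and the account names are hypothetical, and a real loop would wait until each returned start time before sending.

```python
import heapq

HOURLY_QUOTA = 720                    # per-account limit quoted in the post
MIN_INTERVAL = 3600 / HOURLY_QUOTA    # 5.0 seconds between requests per account


class AccountScheduler:
    """Hypothetical scheduler: hands out (account, start_time) slots so that
    no single account exceeds its hourly quota."""

    def __init__(self, accounts, start: float = 0.0):
        # Heap entries are (time the account is next free, account name).
        self._heap = [(start, name) for name in accounts]
        heapq.heapify(self._heap)

    def add_account(self, name: str, now: float) -> None:
        # A newly added account is available immediately, raising total capacity.
        heapq.heappush(self._heap, (now, name))

    def next_slot(self):
        # The account that becomes free earliest takes the next request,
        # then is pushed back MIN_INTERVAL seconds later.
        ready_at, name = heapq.heappop(self._heap)
        heapq.heappush(self._heap, (ready_at + MIN_INTERVAL, name))
        return name, ready_at


if __name__ == "__main__":
    sched = AccountScheduler(["acct-a", "acct-b"])
    print([sched.next_slot() for _ in range(4)])
```

With two accounts the combined rate is one request every 2.5 seconds, and each account added shortens the gap further. Removing an account is not handled in this sketch; it would need something like a lazy-deletion flag on heap entries.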
There are some great examples on GitHub showing how to use many aspects of ReactPHP. My first script was a concurrent image downloader that pHashed each image and did some image comparison, and it ran faster than my Python, Ruby and Node.js versions. I was able to process 1 million images in 19 minutes. Ruby was able to download the images in that time, but ReactPHP did so much more. I will admit my development environment is optimized for PHP 7.2.7 (PHP-FPM), so maybe that is why PHP was able to beat out the other languages. That, and I didn't try to optimize those scripts like I did with ReactPHP. Nevertheless, it means that from now on I don't have to turn to other development environments to achieve what I have always felt PHP was missing: asynchronous, concurrent and parallel processing. Combine it with tools such as RabbitMQ and you can do so much more. No more scripts that just execute and exit; you can create long-running services now. PHP 7.x does offer multithreading through the pthreads extension, but it is CLI-only and still limited. With ReactPHP added to my skill set and toolset, I can create almost any process and keep scaling its throughput by adding more concurrency. You certainly have to think smaller: you can only run a single event loop per script, so you spend a lot of time building out classes and processes that chain together. Breaking your processes down into little pieces takes effort, but the end result is nothing less than spectacular.
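For readers new to perceptual hashing, the comparison step boils down to reducing each image to a short bit string and measuring the Hamming distance between two strings. The sketch below uses average hash (aHash), a simpler relative of pHash (pHash applies a DCT first), and assumes the image has already been decoded and shrunk to an 8x8 grid of grayscale values; decoding and resizing are not shown, and this is not the author's script.

```python
def average_hash(pixels) -> int:
    """Return a 64-bit aHash for an 8x8 grid of grayscale values (0-255)."""
    flat = [p for row in pixels for p in row]
    if len(flat) != 64:
        raise ValueError("expected an 8x8 grid")
    mean = sum(flat) / 64
    bits = 0
    for p in flat:
        # One bit per pixel: 1 if brighter than the image's mean, else 0.
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits; small values suggest visually similar images."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    gradient = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]
    brighter = [[p + 10 for p in row] for row in gradient]
    print(hamming(average_hash(gradient), average_hash(brighter)))
```

Because every bit is relative to the image's own mean, a uniformly brightened copy hashes identically, while an inverted copy flips every bit. The distance threshold for calling two images duplicates is a tuning choice.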
Due to my agreement with the company I work with, I am not at liberty to go into details or provide any scripts. Sorry! I would just like all PHP developers out there who deal with process automation to know that ReactPHP is a great framework that might help you out in ways you never thought PHP was capable of.
Check out the best starting point: http://sergeyzhuk.me/reactphp-series and don't forget to check out Sergey Zhuk's GitHub repo, which is full of free goodies.