In summer 2011 I’ve been asked to develop a Web application in PHP to monitor the status of some sensors in a laboratory inside a big research centre. Data is gathered by a VB.NET software which communicates with a self made DAQ system built on top of a 80188-based card and I couldn’t make any change on it.
The application was thought to store data in a .mdb file on local hard drive for backup purpose; then, since path can be configured, the .mdb file has been moved on Microsoft DFS for using it as an Integration Database.
Distributed File System (DFS) is a set of client and server services that allow an organization using Microsoft Windows servers to organize many distributed SMB file shares into a distributed file system. DFS provides location transparency and redundancy to improve data availability in the face of failure or heavy load by allowing shares in multiple different locations to be logically grouped under one folder, or DFS root. Wikipedia
This kind of solutions will involve lots of performance, reliability and consistence issues.
- Performance: mind that a mdb is not a RDBMS, there’s no centralized server but just two files, and all the RDBMS logic is client-side in the JET engine, that means if you have gigabytes of data on it you will wait a lot for queries which require a full table scan, and doing that from a Web server which handles lots of Web sites is even worse. The time needed for opening a connection is too long, for this reason you need to keep it always open.
- Reliability: keeping a file open 24/7 on DFS is not a good idea, because the infrastructure won’t be able to handle replication, and sometimes happens that someone or something forces the read-only flag. Opening a connection means just opening a .mdb file and place a record in the .ldb one, which will be removed when closing the “connection”. If there’s a temporary network failure, that record will be kept forever and there’s a maximum of 255, after that JET will stop working.
- Consistence: for obvious performance reasons network file systems have big buffers which means delays on writing data. What If you have more than one application writing on same file?
Ugly problems often require ugly solutions. Solving an ugly problem in a pure manner is bloody hard.
Rasmus Lerdorf on php.devel mailing list
To gain decent performance on the Web application I found a solution with some workarounds:
- Forget PHP: forget to use anything but ADO.NET for accessing a mdb file, with the obvious consequence that Web application has been implemented in ASP.NET
- Singleton connection: the Web server allocates only one thread to the application (pool size 1), for this reason I didn’t use any pooling framework.
- Keep data in memory: data was timestamped so I made a “circular buffer” too keep last 24 hours of data in memory.
- Query hacking: adding conditions to avoid a full table scan (e.g. id > (SELECT MAX(id) – 50 FROM table))
Microsoft Access is good for small data sets and no network in between, using it on DFS is challenging and always gives problems, I got the project done but obviously I was not satisfied about final result.
Moving to something reliable like Oracle (which is widely used in that research centre and it’s given as a service with no charge on lab budget), SQL Server (which is seamlessly integrated in Access), MySQL (whose on-demand instances are available for free) or PostgreSQL (which is a free software), would be a better choice.