Problems with importing many Express Objects (10k+)

Currently I am developing a project based on concrete5. This project handles a lot of data: there are about 20 tables / entities.
Some of these are very small and have only a few entries. However, there are three mega tables with a huge number of entries. The "booking times" table, for example, contains over 5 million entries.
In general I see no problem. Once the data is stored in the table, Doctrine can handle it pretty well. (Successfully tested with 20,000 entries; the backend view was fluid.)

But I have a lot of trouble with importing the data:
When I import the data in the traditional way, as described in the developer's documentation, with Express::buildEntry(), I run into timeout issues.

Personally, I see two solution approaches here:

1) Generate MySQL INSERT strings that add multiple entries at once.

INSERT INTO myTable (column1, column2, …) VALUES (.., ..), (.., ..), (.., ..), …


I have already written proof-of-concept code for this. The performance is definitely much better. Benchmark on my machine: 10,000 entries in 2.5 seconds.
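As a sketch of what I mean (table and column names are just illustrative, not my real schema), the multi-row INSERT can be built with placeholders so the values are bound by the driver rather than concatenated into the string, which also sidesteps the multibyte-injection concern mentioned below:

```php
<?php
// Build one parameterized multi-row INSERT: one "(?, ?, ...)" group per row.
// Binding values via prepare()/execute() avoids (multibyte) SQL injection.
function buildBatchInsert(string $table, array $columns, array $rows): array
{
    $group = '(' . implode(', ', array_fill(0, count($columns), '?')) . ')';
    $sql = sprintf(
        'INSERT INTO %s (%s) VALUES %s',
        $table,
        implode(', ', $columns),
        implode(', ', array_fill(0, count($rows), $group))
    );
    // Flatten the row values into one flat parameter list for execute().
    $params = array_merge(...array_map('array_values', $rows));
    return [$sql, $params];
}

// Usage (assuming an existing PDO connection $pdo):
// [$sql, $params] = buildBatchInsert('myTable', ['column1', 'column2'], $rows);
// $pdo->prepare($sql)->execute($params);
```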

The problem is that so many tables are involved in creating even a single Express entry, along with all its dependent attributes and associations.

Of course I could spend a long time implementing a mega method that handles all of this, but
I suspect that once I am finished, the performance would be just as bad as the object-oriented way, because of the many queries and conditions.

Other points against this method: it would be vulnerable to multibyte SQL injection and would only work with the MySQL driver.

So I'm not happy with this method; I think there are better solutions.

2) Doctrine Bulk Processing

http://docs.doctrine-project.org/projects/doctrine-orm/en/latest/re...

Does anybody have experience with that? I do not know if it is possible, but I imagine one could subclass e.g. the EntityManager class, temporarily disable the flush() method, and then trigger it manually in batches of 50. What do you think of this approach?
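For reference, the linked Doctrine documentation describes a batch-processing pattern that achieves the same effect without subclassing the EntityManager: flush() every N iterations and then clear() the identity map so memory stays flat. A minimal sketch, where $entityManager, BookingTime, and the setter are assumptions standing in for the real entity:

```php
<?php
// Doctrine batch-insert pattern: queue entities with persist(), then flush
// and clear in fixed-size batches instead of once per entry.
$batchSize = 50;
$i = 0;
foreach ($csvRows as $row) {
    $entity = new BookingTime();                 // hypothetical entity class
    $entity->setRecordType($row['record_type']); // hypothetical setter
    $entityManager->persist($entity);
    if ((++$i % $batchSize) === 0) {
        $entityManager->flush(); // execute the queued INSERTs
        $entityManager->clear(); // detach managed entities, free memory
    }
}
$entityManager->flush(); // flush the remaining partial batch
$entityManager->clear();
```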

Or maybe someone has an even better approach? I would really like to implement the project with Express, because it simply brings many benefits.

fabianbitter
 
fabianbitter replied:
Here is my normal import method (the default way).

private function testBigDataImport()
{
    $pkg = Package::getByHandle($this->pkgHandle);
    $csvFile = $pkg->getPackagePath() . "/content_files/import_data/default_location_indicators.csv";
    $csv = new \parseCSV();
    $csv->delimiter = ",";
    $csv->parse($csvFile);
    $start = round(microtime(true) * 1000);
    $i = 0;
    $testTotal = 10;
    foreach ($csv->data as $row) {
        // @todo: solve problem with handling big data
        $entry = Express::buildEntry('location_indicator')
            ->setRecordType($row["record_type"])
            ->setTextIndicator($row["text_indicator"])
            // ... further attribute setters omitted for brevity ...
            ->save();
        if (++$i >= $testTotal) {
            break;
        }
    }
    $avgMs = (round(microtime(true) * 1000) - $start) / $i;
}

And these are the results:

Total Entries:   16169
Avg. Time:       757.9ms / Entry
Est. Total Time: 12254.49s
MrKDilkington replied:
Hi fabianbitter,

What you describe sounds useful.

I recommend creating a GitHub discussion issue for this.
fabianbitter replied:
Good idea. I will create a thread there.
fabianbitter replied (Best Answer):
@all: I have developed a composer package for fast importing of data records into concrete5 Express Objects. This package reduces the import time per item from 750 ms down to 1 ms or less.

But after my Express evaluation, I decided to use native Doctrine for my project instead of Express. The general performance is just too weak for big-data projects. My conclusion after 2 weeks with Express: for small projects and prototypes Express is very cool and useful, but for big projects it is better to work with Doctrine / native SQL.

Anyway. My import package is available under:
https://bitbucket.org/fabianbitter/concrete5_express_batch_importer...

Cheers :)
mrjcgoodwin replied:
Just came across this; it also looks useful for something I'm working on, fabianbitter.

What I can't work out from the C5 docs is whether it's possible to do something like this to update an existing object rather than build a new one?
surefyre replied:
It's worth noting that Express is much, much slower than a plain DB table. I tried this last year with far fewer records (maybe 3,000) and the performance was awful. It seems mostly OK while you're building the site, but once many users are on it, performance degrades fast.

I put everything into MySQL tables in the end. Express is great for things like email/SMS templates and for holding occasional-use data like that, but high-volume, high-availability it is not.