Migrating data from an existing site is one of the most complex tasks in a website redesign, but that in no way makes it any less worthwhile.
A redesign is a good time to review a site's internal workings, for instance to reduce its response times. It is therefore common for the backend database service to change, or for the old site to rely on several data sources that you want to consolidate under a single database service. The outcome of this analysis is crucial, as it forms the foundation of the project. Several obstacles will probably stand in our way before the job is done, but we must always keep in mind that every problem has a solution.
The first challenge you may encounter is the difference in technology between the source and the destination. At this stage, it is essential to have direct access to the data source, as this reduces transfer errors. For example, if you are migrating from a WordPress site, you should avoid export plugins, because they often provide incomplete data in formats that are hard to manipulate (XML, CSV). Using phpMyAdmin or a similar interface is preferable: it is easy to read, and it lets you extract the relevant data directly with MySQL queries.
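As an illustration of the kind of direct extraction query meant here, the sketch below uses an in-memory SQLite table as a stand-in for MySQL so that it runs anywhere. The table and column names follow WordPress's default wp_posts schema (assuming the default wp_ table prefix), but the sample rows are invented for the example.

```python
import sqlite3

# Assumption: an in-memory SQLite table stands in for the MySQL wp_posts table,
# with only a few of WordPress's real columns and the default "wp_" prefix.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_title TEXT, "
    "post_content TEXT, post_status TEXT, post_type TEXT, post_date TEXT)"
)
conn.executemany(
    "INSERT INTO wp_posts VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Hello", "<p>First post</p>", "publish", "post", "2015-03-02 10:00:00"),
        (2, "Draft", "<p>Unfinished</p>", "draft", "post", "2015-03-05 09:30:00"),
        (3, "About", "<p>About us</p>", "publish", "page", "2014-11-20 08:00:00"),
    ],
)

# Keep only published blog posts: the same SELECT could be run as-is in phpMyAdmin.
EXTRACT_POSTS = """
    SELECT ID, post_title, post_content, post_date
    FROM wp_posts
    WHERE post_status = 'publish' AND post_type = 'post'
    ORDER BY post_date
"""

rows = conn.execute(EXTRACT_POSTS).fetchall()
for row in rows:
    print(row)
```

Filtering on post_type matters because WordPress stores pages, revisions and attachments in the same wp_posts table as blog posts.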
If the source is not MySQL, you can, if you wish, use MySQL Workbench to migrate from other database systems such as Microsoft SQL Server (MSSQL) or PostgreSQL. Be careful, though: this is a tedious solution that involves deleting the relationships between tables, and you may run into format errors, especially with timestamps.
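Timestamp errors often come down to mixed formats and to MySQL "zero dates" (0000-00-00 00:00:00), which have no valid Python datetime equivalent. A small normalization helper applied to each value before it is written to the destination is one way to catch them. This is a sketch: the list of candidate formats is an assumption and should be adapted to what your exports actually contain.

```python
from datetime import datetime

# Assumption: these are the formats found in the exported data; extend as needed.
CANDIDATE_FORMATS = (
    "%Y-%m-%d %H:%M:%S",     # MySQL DATETIME, PostgreSQL timestamp without fraction
    "%Y-%m-%d %H:%M:%S.%f",  # timestamps with fractional seconds
    "%m/%d/%Y %I:%M:%S %p",  # e.g. an export produced with a US locale
    "%Y-%m-%d",              # plain DATE columns
)


def normalize_timestamp(raw):
    """Parse a raw timestamp string into a datetime.

    Returns None for empty values and MySQL zero dates; raises ValueError
    for anything that matches none of the candidate formats.
    """
    if raw is None:
        return None
    text = raw.strip()
    if not text or text.startswith("0000-00-00"):
        return None
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # try the next format
    raise ValueError(f"Unrecognized timestamp format: {raw!r}")
```

Raising on unknown formats, rather than silently returning None, makes unexpected data surface during a test run instead of disappearing from the destination database.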
Once we have access to a stable database, and before moving on to the migration itself, we must make sure the data can be read back efficiently. The Django framework solves this problem: its inspectdb management command generates models automatically from a compatible data source (MySQL, PostgreSQL, etc.). The migration then comes down to moving data between temporary models and final models, which are all Django models, and this greatly simplifies the process.
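The transfer between temporary and final models can be reduced to a pure mapping function. The sketch below uses plain dataclasses as stand-ins so that it runs without a Django project; LegacyPost, Article and all their fields are hypothetical. In a real project, LegacyPost would come from inspectdb, and, assuming the old database is declared as a second entry in DATABASES (called "legacy" here, a hypothetical alias), records could be read with LegacyPost.objects.using("legacy") and saved with Article.objects.bulk_create(...).

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass
class LegacyPost:
    """Stand-in for a model generated from the old database (hypothetical fields)."""
    id: int
    post_title: str
    post_content: str
    post_date: datetime
    post_status: str


@dataclass
class Article:
    """Stand-in for a model in the new project's schema (hypothetical fields)."""
    legacy_id: int
    title: str
    body: str
    published_at: datetime
    is_published: bool


def to_article(post: LegacyPost) -> Article:
    # One legacy record in, one new record out, with no I/O: easy to test.
    return Article(
        legacy_id=post.id,
        title=post.post_title.strip(),
        body=post.post_content,
        published_at=post.post_date,
        is_published=post.post_status == "publish",
    )


def migrate(posts: Iterable[LegacyPost]) -> List[Article]:
    # With Django, this list would typically be saved in one bulk_create call.
    return [to_article(p) for p in posts]
```

Keeping the old primary key (legacy_id) in the new model is a design choice that makes it possible to trace each migrated record back to its source and to rerun the migration safely.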
With data recovery in place, we can move on to the migration itself. However, before starting to program, we should plan the migration steps carefully, assess whether the database's relational model can be optimized, and design a flexible program structure. To this end, a test-driven development approach is advisable. A test-driven developer breaks the program down into small, easily testable methods, which makes it easier to maintain the program, adjust the overall procedure and reuse code.
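Here is a minimal sketch of that structure: each transformation is a small function with its own unit tests, and a composing function assembles them. The functions (make_slug, split_tags, migrate_row) and the row layout are hypothetical examples, not part of any library.

```python
import re
import unicodedata
import unittest


def make_slug(title):
    """Build a URL slug: strip accents, lowercase, join words with hyphens."""
    ascii_title = (
        unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")


def split_tags(raw):
    """Turn a comma-separated legacy tag field into a clean, de-duplicated list."""
    tags = []
    for tag in raw.split(","):
        tag = tag.strip().lower()
        if tag and tag not in tags:
            tags.append(tag)
    return tags


def migrate_row(row):
    """Compose the small steps; each one can be tested and changed on its own."""
    return {
        "title": row["title"].strip(),
        "slug": make_slug(row["title"]),
        "tags": split_tags(row.get("tags", "")),
    }


class MigrationStepTests(unittest.TestCase):
    def test_slug_strips_accents_and_punctuation(self):
        self.assertEqual(make_slug("Été à Montréal!"), "ete-a-montreal")

    def test_tags_are_cleaned_and_deduplicated(self):
        self.assertEqual(split_tags("Django, migration,,django "), ["django", "migration"])

    def test_row_without_tags(self):
        self.assertEqual(migrate_row({"title": " Hello "})["tags"], [])


if __name__ == "__main__":
    unittest.main(argv=["migration_steps"], exit=False)
```

If the slug rules change halfway through the project, only make_slug and its test need to be touched; the rest of the procedure stays the same.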
During a migration, you will often transfer HTML content produced by a WYSIWYG text editor. This can cause many problems, because it means dealing with large blocks of text. For example, there may be links you want to change, or HTML tags in your fields' content that need to be reworked. To handle this, regular expressions are a practical way to extract, replace or remove elements from an HTML string. Python also provides the HTMLParser class, in the html.parser module, which relies on regular expressions internally to walk through every tag of an HTML string.
This makes the elements to be modified much easier to identify reliably. On the other hand, this process can be demanding in memory and is likely to slow down your program.
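To illustrate both techniques, the sketch below uses a regular expression for a targeted substitution (pointing links at a new domain) and an HTMLParser subclass to walk the tags and collect every link. The two domains and the helper names are hypothetical.

```python
import re
from html.parser import HTMLParser

# Hypothetical domains: links to the old site should point to the new one.
OLD_DOMAIN = "http://old-site.example"
NEW_DOMAIN = "https://www.new-site.example"

# Matches href="..." or href='...' values that start with the old domain.
HREF_OLD = re.compile(r"""(href=["'])""" + re.escape(OLD_DOMAIN))


def rewrite_links(html):
    """Replace the old domain in href attributes, keeping the rest of the URL."""
    return HREF_OLD.sub(lambda m: m.group(1) + NEW_DOMAIN, html)


class LinkCollector(HTMLParser):
    """Walks every start tag and records the href of each <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names; attrs is (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def collect_links(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

The regex is enough for a simple, well-defined substitution, but it is case-sensitive and knows nothing about tags; the parser handles uppercase markup and attribute order for free, which is why it identifies elements more reliably.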
The obstacles mentioned above are only the tip of the iceberg; however, most of them can be overcome with a careful analysis of the task's requirements and of our relational model. And the experience we gain with each migration lets us build tools that make us more and more efficient.
At Nixa, we have already built up plenty of ammunition for this kind of task. Our arsenal allows us to optimize databases to improve a website's speed. After all, database management lies at the root of a website and determines its effectiveness.