Populating the database
Problem definition
Let's say that I have a database containing prepared activities and educational content for a boy scout group. Its purpose is to serve as a source of inspiration and, eventually, as a collection of high-quality content suitable for kids during our weekly meetings and other occasions. It should be easy to retrieve precisely the desired results from it, based on the properties defined for each activity (e.g. time needed for preparation, minimal number of players, etc.).
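The following minimal Python sketch shows an activity record and a property filter. It illustrates what retrieving "precisely only the desired results" could mean. The class `Activity`, its fields `prep_minutes` and `min_players`, and the function `find_activities` are hypothetical, modeled on the two example properties above. In the actual database this filtering would be written as a Cypher query against Neo4j rather than in Python.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    """One activity record; the fields are hypothetical examples of its properties."""
    title: str
    prep_minutes: int  # time needed for preparation
    min_players: int   # minimal number of players


def find_activities(activities, max_prep_minutes, players):
    """Keep only activities that can be prepared in time and played by the group."""
    return [
        a for a in activities
        if a.prep_minutes <= max_prep_minutes and a.min_players <= players
    ]


catalog = [
    Activity("Capture the flag", prep_minutes=15, min_players=10),
    Activity("Knot tying relay", prep_minutes=5, min_players=4),
    Activity("Night orienteering", prep_minutes=60, min_players=6),
]

# Eight scouts and ten minutes to prepare: only the relay fits.
print([a.title for a in find_activities(catalog, max_prep_minutes=10, players=8)])
```

The printed list is `['Knot tying relay']`.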
However, it is currently more of a proof of concept than a truly operational tool, so there are not yet any real records to speak of. This very early stage of development is exactly why I think it is reasonable to consider now how the database will be populated with data, as this will be crucial for its future usability and success. The simulation should therefore focus on modeling this data population process.
To be precise, the simulation will be concerned with user behavior and the resulting volume of stored records.
It is important to note that the only user interface for reading is the default web GUI of the database itself (Neo4j), so users will have to use the Cypher query language. After going to that much effort to access the records, they had better find a decent number of useful ones in the database. Otherwise their whole effort would be pointless, and they would be discouraged from using the database again.
Creating records, on the other hand, will be as easy and accessible as possible. The user writes a Google document in a specified folder, following a specified structure, so that it can then be loaded into the database programmatically. This ease of creation is key to encouraging consistent contributions.
This raises the question of whether this "natural" way of populating the database is sufficient. More precisely, how long will it take before the database provides satisfactory results in the majority of cases and can thus be called a useful tool? If it takes too long, users could lose interest even in contributing new content. Therefore, the simulation will model the rate of content creation and its subsequent impact on the searchability and usability of the database.
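The following Python sketch gives a deterministic, back-of-the-envelope version of this question. It rests on several assumptions:

- A fixed number of users each add a record in a given week with probability `p_contribute`.
- Each record independently matches a typical query with probability `selectivity`.
- A query counts as "satisfactory" when it returns at least `min_results` records.

These names and the binomial assumption are hypothetical simplifications, not the model used in this work. The sketch also uses expected record counts only, so it ignores randomness and users losing interest.

```python
from math import comb


def satisfaction_probability(n_records, selectivity, min_results):
    """P(a query matches at least `min_results` of `n_records`), assuming each
    record matches independently with probability `selectivity` (binomial)."""
    if n_records < min_results:
        return 0.0  # not enough records exist at all
    p_too_few = sum(
        comb(n_records, i) * selectivity**i * (1 - selectivity) ** (n_records - i)
        for i in range(min_results)
    )
    return 1.0 - p_too_few


def weeks_to_critical_mass(users, p_contribute, selectivity, min_results,
                           target, max_weeks=520):
    """First week in which the expected number of records makes at least a
    `target` share of queries satisfactory; None if not reached in time."""
    for week in range(1, max_weeks + 1):
        expected_records = round(users * p_contribute * week)
        if satisfaction_probability(expected_records, selectivity, min_results) >= target:
            return week
    return None
```

For example, `weeks_to_critical_mass(users=20, p_contribute=0.1, selectivity=0.05, min_results=3, target=0.8)` gives a first rough horizon for those made-up values. A long horizon would signal the risk of users losing interest described above.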
Additionally, the "batch population approach" will be evaluated to see whether it is worth implementing. This will involve creating a model of user behavior and content creation, simulating different scenarios, and analyzing the results.
This simulation aims to predict the growth of the database, determine the feasibility of the current data population strategy, and evaluate the alternative. Its specific goals are:
- Predict the time needed to reach a critical mass of data sufficient for effective use.
- Evaluate the effectiveness of the current data population method relative to the alternative.
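The following toy agent-based sketch in Python compares both strategies. It makes several assumptions:

- The "batch population approach" is taken to mean a one-off import of `batch_size` records before week 1. The text above does not define it, so this reading is an assumption.
- Each record falls into one of `n_categories` equally likely categories, which stand in for combinations of activity properties.
- A user's weekly chance of opening the database (`interest`) rises by `reward` after a satisfying query and drops by `penalty` after a disappointing one.

All names and numbers are hypothetical. Averaging the final week over several seeded runs smooths out the randomness of a single run.

```python
import random
from statistics import mean


def simulate(weeks, users, n_categories, min_results, p_contribute,
             batch_size=0, reward=0.05, penalty=0.1, seed=None):
    """One run of a toy agent-based model (all parameters hypothetical).

    Returns one (total_records, satisfied_fraction) tuple per week.
    """
    rng = random.Random(seed)
    # Records per category; a category stands in for a property combination.
    counts = [0] * n_categories
    # Hypothetical batch approach: records imported once, before week 1.
    for _ in range(batch_size):
        counts[rng.randrange(n_categories)] += 1
    # Weekly probability that each user opens the database at all.
    interest = [1.0] * users
    history = []
    for _ in range(weeks):
        queries = satisfied = 0
        for u in range(users):
            if rng.random() >= interest[u]:
                continue  # the user skips the database this week
            queries += 1
            wanted = rng.randrange(n_categories)
            if counts[wanted] >= min_results:
                satisfied += 1
                interest[u] = min(1.0, interest[u] + reward)
            else:
                interest[u] = max(0.0, interest[u] - penalty)
            # Natural population: active users sometimes add a record,
            # less often as their interest fades.
            if rng.random() < p_contribute * interest[u]:
                counts[rng.randrange(n_categories)] += 1
        history.append((sum(counts), satisfied / queries if queries else 0.0))
    return history


def mean_final_satisfaction(runs, **params):
    """Average the last week's satisfied fraction over several seeded runs."""
    return mean(simulate(seed=s, **params)[-1][1] for s in range(runs))


natural = dict(weeks=52, users=20, n_categories=30, min_results=3, p_contribute=0.2)
print(mean_final_satisfaction(10, **natural))
print(mean_final_satisfaction(10, **natural, batch_size=100))
```

Comparing the two printed averages addresses the kind of question the goals above pose. The values themselves mean little until the parameters are estimated from the real group.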
Agent-based model
Single run
Multi run
