We have a science project that has outgrown its experimental setup and needs a new data architecture to enable further scaling. The task is to design one!
CURRENT SETUP WHICH NEEDS TO BE UPGRADED WITH A BETTER DATA ARCHITECTURE
We simply download the datasets from the structured data sources - in the native format in which they are provided (XML or CSV, for instance) - and store them locally.
The data files are then processed by R scripts. A single R script may read several locally stored data files, process them, and store the outputs locally again.
Different datasets can relate to one another through one or more common keys (identical variables).
Datasets range from several hundred to several hundred thousand observations.
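To illustrate what "common keys" would mean once the files live in a relational database, here is a minimal sketch, not a proposed design. The table names `countries` and `observations` and the key column `country_code` are hypothetical, and Python's built-in `sqlite3` module with an in-memory database is used purely for demonstration.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two hypothetical, related tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE countries (
            country_code TEXT PRIMARY KEY,   -- the common key
            name         TEXT NOT NULL
        );
        CREATE TABLE observations (
            country_code TEXT NOT NULL REFERENCES countries(country_code),
            year         INTEGER NOT NULL,
            value        REAL,
            PRIMARY KEY (country_code, year)
        );
        """
    )
    # Made-up rows, standing in for two downloaded data files.
    conn.executemany(
        "INSERT INTO countries VALUES (?, ?)",
        [("AT", "Austria"), ("BE", "Belgium")],
    )
    conn.executemany(
        "INSERT INTO observations VALUES (?, ?, ?)",
        [("AT", 2020, 1.5), ("AT", 2021, 2.0), ("BE", 2020, 3.0)],
    )
    conn.commit()
    return conn


def join_on_common_key(conn):
    """Combine both tables through the shared country_code column."""
    return conn.execute(
        """
        SELECT c.name, o.year, o.value
        FROM observations AS o
        JOIN countries AS c ON c.country_code = o.country_code
        ORDER BY c.name, o.year
        """
    ).fetchall()


if __name__ == "__main__":
    for row in join_on_common_key(build_demo_db()):
        print(row)
```

The same SQL query could be sent from R through the DBI package, so the join would run inside the database instead of inside each R script.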
As you can see, the existing setup is rather primitive, so it is to be replaced with a new data architecture. You are largely unconstrained in coming up with an optimal solution. However, you will need not only to make a proposal, but also to justify your design to us (non-experts in the best practices for database management).
The task covers everything from
- the server choice (which local hardware or cloud service minimises cost while still providing full functionality?), to
- the type of database that can efficiently handle datasets that will typically be no larger than several GB, to
- a database update strategy that lets us efficiently load new data files, given that the source providers regularly (for instance, monthly) publish a new dump file containing an updated dataset; updating from a source that provides an API should be equally easy, to
- ensuring full compatibility with, and full optimisation for, the R programming language, our tool of choice for working with the data.
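As one possible reading of the update requirement, the sketch below loads a periodic dump so that rows already present are overwritten and new rows are added, keyed on a primary key. It is an illustration under stated assumptions, not a recommended design: the table `observations`, the columns `station_id`, `month` and `value`, and the sample dumps are all hypothetical, and `INSERT OR REPLACE` is SQLite-specific syntax (other databases offer their own upsert statements).

```python
import csv
import io
import sqlite3


def create_table(conn):
    """Create the hypothetical target table; the primary key identifies a row."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS observations (
            station_id TEXT NOT NULL,
            month      TEXT NOT NULL,
            value      REAL,
            PRIMARY KEY (station_id, month)
        )
        """
    )


def load_dump(conn, csv_text):
    """Load one CSV dump: replace rows whose key exists, insert the rest."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(r["station_id"], r["month"], float(r["value"])) for r in reader]
    # 'with conn' wraps the load in one transaction: commit on success,
    # roll back if any row fails, so a broken dump leaves the table unchanged.
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO observations (station_id, month, value) "
            "VALUES (?, ?, ?)",
            rows,
        )
    return len(rows)


# Made-up dumps: the second one revises one old row and adds one new row.
january_dump = "station_id,month,value\nA,2024-01,10.0\nB,2024-01,20.0\n"
february_dump = "station_id,month,value\nA,2024-01,11.0\nA,2024-02,12.0\n"

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_table(conn)
    load_dump(conn, january_dump)
    load_dump(conn, february_dump)
    for row in conn.execute(
        "SELECT * FROM observations ORDER BY station_id, month"
    ):
        print(row)
```

Records fetched from an API could be passed through the same kind of keyed insert once they are converted to rows, which would keep file-based and API-based updates on a single code path.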
It is important that the new data architecture is future-proof: scalable and enabling multi-year projects that rely on the collected data.
BIDDING & CONTRACT
The job consists of both articulating your proposed data architecture and assisting with the migration from the current setup.
Initially, we ask that you provide:
- your estimated fixed bid for the entire project
- the number of hours you will need to complete most of the work
- your availability in hours per day.
The contractor will be selected based on the entire project bid. We understand that the specification will be refined collaboratively during project execution. Therefore, please calculate your budget so that it covers most (80%) of the task described above. Where we would like major extensions or additions, we will create a separate follow-up milestone with a separate budget.
Consequently, this first project may lead to an open-ended engagement on a milestone-based or hourly-rate retainer basis. We are therefore looking for a person who would be interested in, and available for, an extended collaboration.
In our experience with freelance contracting, we typically receive more qualified bids than we can award. Therefore, please understand that only shortlisted applicants will be contacted for round two, which consists of additional questions and answers.
Skills: Datatables, R Programming Language