Which DB manager for a 100 GB table? [closed]


As part of my studies, I am working on a 2G / 3G / 4G data-capture project.
I have to store this data and run queries on it.
My table: [freq {float}, dbm {float}, timestamp {int}]
I receive about 15 GB of data per day, between 100,000 and 200,000 entries per minute, and that is for 6 days.
I could use a classic DBMS (MySQL / PostgreSQL), but I'm afraid the performance won't be there. I tried InfluxDB, but the number of rows it ingests per minute falls short of what I need.
Do you have another solution?
Thanks a lot,
J-F

Solutions/Answers:

Solution 1:

I use all of the databases you mentioned. For this load I can recommend MySQL or PostgreSQL, because I have already worked with an even higher load on PostgreSQL. MySQL will do the job as well, maybe even better, since it was designed from the beginning for a high insert load.

The PostgreSQL solution I worked with stored system messages from a telecommunication network and was able to collect ~300 GB of data per day on a single machine without problems, but you need a proper hardware architecture.


You need a machine with at least 8 CPUs (more is better), and you need several inserting queues. Use a loader written in Java, C, or Go with multiple parallel threads, and do bulk inserts from every thread using the COPY command, with ~10,000 records per bulk. You must use a connection pool, because PostgreSQL has a relatively high overhead for opening a new connection.
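A minimal sketch of such a loader in Go, assuming the pgx driver (github.com/jackc/pgx/v5), a table raw_measurements(freq, dbm, ts) and a local connection string; the table, column names and connection string are illustrative assumptions, not part of the original answer. Each worker goroutine drains batches of ~10,000 rows from a channel and writes them with COPY through a shared connection pool:

    package main

    import (
        "context"
        "log"
        "sync"

        "github.com/jackc/pgx/v5"
        "github.com/jackc/pgx/v5/pgxpool"
    )

    const (
        workers   = 8     // roughly one inserting queue per CPU core
        batchSize = 10000 // ~10,000 records per COPY, as suggested above
    )

    func main() {
        ctx := context.Background()

        // Connection pool: connections stay open, so each batch avoids
        // PostgreSQL's per-connection startup overhead.
        pool, err := pgxpool.New(ctx, "postgres://user:pass@localhost:5432/telemetry")
        if err != nil {
            log.Fatal(err)
        }
        defer pool.Close()

        batches := make(chan [][]any, workers)

        var wg sync.WaitGroup
        for i := 0; i < workers; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                for batch := range batches {
                    // One COPY per batch is far cheaper than row-by-row INSERTs.
                    if _, err := pool.CopyFrom(ctx,
                        pgx.Identifier{"raw_measurements"},
                        []string{"freq", "dbm", "ts"},
                        pgx.CopyFromRows(batch),
                    ); err != nil {
                        log.Printf("copy failed: %v", err)
                    }
                }
            }()
        }

        // Feed the workers. In the real loader the rows would come from the
        // 2G/3G/4G capture pipeline; a dummy generator stands in for it here.
        batch := make([][]any, 0, batchSize)
        for i := 0; i < 1_000_000; i++ {
            batch = append(batch, []any{945.2e6, -67.5, int64(i)}) // freq, dbm, ts
            if len(batch) == batchSize {
                batches <- batch
                batch = make([][]any, 0, batchSize)
            }
        }
        if len(batch) > 0 {
            batches <- batch
        }
        close(batches)
        wg.Wait()
    }

The worker count and batch size are the two knobs to tune against your own hardware; start from the values above and measure.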

It will also help to distribute the data over several tablespaces, each tablespace on a separate physical disk, or better, on a separate physical disk array. If possible, do not put indexes on the raw data. Keep your raw data separate from the aggregated results.
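Purely as an illustration of that layout, and not taken from the original answer, a one-off schema setup could look roughly like this (tablespace paths, table and column names are assumptions; CREATE TABLESPACE requires superuser rights and directories owned by the postgres OS user):

    package main

    import (
        "context"
        "log"

        "github.com/jackc/pgx/v5"
    )

    func main() {
        ctx := context.Background()

        conn, err := pgx.Connect(ctx, "postgres://user:pass@localhost:5432/telemetry")
        if err != nil {
            log.Fatal(err)
        }
        defer conn.Close(ctx)

        stmts := []string{
            // One tablespace per physical disk (or disk array).
            `CREATE TABLESPACE raw_disk1 LOCATION '/mnt/disk1/pgdata'`,
            `CREATE TABLESPACE agg_disk2 LOCATION '/mnt/disk2/pgdata'`,

            // Raw data: no indexes, so the insert path stays as cheap as possible.
            `CREATE TABLE raw_measurements (
                 freq double precision,
                 dbm  double precision,
                 ts   bigint
             ) TABLESPACE raw_disk1`,

            // Aggregated results live apart from the raw data and can be indexed.
            `CREATE TABLE hourly_stats (
                 hour    timestamptz,
                 freq    double precision,
                 avg_dbm double precision,
                 PRIMARY KEY (hour, freq)
             ) TABLESPACE agg_disk2`,
        }

        for _, s := range stmts {
            if _, err := conn.Exec(ctx, s); err != nil {
                log.Fatalf("setup failed: %v", err)
            }
        }
    }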

We had another solution using the PL/Proxy extension for PostgreSQL and several physical machines, with the raw data partitioned by time. That system was able to collect at least 1 TB per day, and with a proper number of slave databases even more.

But you have to understand that to really process this amount of data you need proper hardware with a proper configuration. There is no magic database that will do miracles on some “notebook-like configuration”…


InfluxDB is a really great time-series database and we use it for monitoring. I believe that with enough CPUs and a really large amount of memory you will be able to use it too. I estimate you will need a minimum of 64 GB of RAM, because inserts are more memory-expensive. With more inserting queues the database will need a lot more memory, because it keeps everything in memory and automatically builds indexes on tags.
