2gb table with 10 million rows, late pagination select is slow

I have a MySQL table with 10 million rows and about 2 GB of data.
Selecting the data in LIFO (newest-first) order is slow.
The table engine is InnoDB.
The table has one primary key and one unique key.
SELECT * FROM link LIMIT 999999 , 50;

How can I improve the performance of this query?

Table structure:
Field    Type          Null  Key  Default  Extra
id       int(11)       NO    PRI  NULL     auto_increment
url      varchar(255)  NO    UNI  NULL
website  varchar(100)  NO         NULL
state    varchar(10)   NO         NULL
type     varchar(100)  NO         NULL
prio     varchar(100)  YES        NULL
change   varchar(100)  YES        NULL
last     varchar(100)  YES        NULL

NOTE:
SELECT * FROM link LIMIT 1 , 50; takes 0.9 ms, but the current query takes about 1000 ms, roughly a thousand times slower.

Solutions/Answers:

Solution 1:

This is most likely due to an “early row lookup”: MySQL reads and discards full rows for every skipped offset row.

MySQL can be forced to do a “late row lookup” instead. Try the query below:

SELECT  l.*
FROM    (
        SELECT  id
        FROM    link
        ORDER BY
                id
        LIMIT 999999 , 50
        ) q
JOIN    link l
ON      l.id = q.id

Check this article:

MySQL LIMIT clause and late row lookups
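The deferred-join idea can be sketched in runnable form. The snippet below uses Python's sqlite3 with a small made-up link table as a stand-in for MySQL, so the timings from the answer won't reproduce, but it shows that the index-only subquery plus self-join returns exactly the same page as the naive OFFSET query:

```python
import sqlite3

# Sketch of the "late row lookup" rewrite, with SQLite standing in for
# MySQL. Table and column names follow the question; the 1000 sample
# rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE link (id INTEGER PRIMARY KEY, url TEXT UNIQUE)")
conn.executemany("INSERT INTO link (id, url) VALUES (?, ?)",
                 [(i, "http://example.com/%d" % i) for i in range(1, 1001)])

# Naive deep page: the engine walks and materializes all skipped rows.
naive = conn.execute(
    "SELECT * FROM link ORDER BY id LIMIT 50 OFFSET 900").fetchall()

# Late row lookup: scan only the narrow primary-key index in the
# subquery, then fetch the 50 full rows by joining back on id.
late = conn.execute("""
    SELECT  l.*
    FROM    (SELECT id FROM link ORDER BY id LIMIT 50 OFFSET 900) q
    JOIN    link l ON l.id = q.id
    ORDER BY l.id
""").fetchall()

print(naive == late, late[0][0], late[-1][0])  # -> True 901 950
```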

Solution 2:

For the Next and Prev buttons you can use a WHERE clause instead of OFFSET.

Example (using LIMIT 10 – Sample data explained below): You are on some page which shows you 10 rows with the ids [2522,2520,2514,2513,2509,2508,2506,2504,2497,2496]. This in my case is created with

select *
from link l
order by l.id desc
limit 10
offset 999000

For the next page you would use

limit 10
offset 999010

getting rows with ids [2495,2494,2493,2492,2491,2487,2483,2481,2479,2475].

For the previous page you would use

limit 10
offset 998990

getting rows with ids [2542,2541,2540,2538,2535,2533,2530,2527,2525,2524].

All of the above queries execute in about 500 msec; using the “trick” suggested by Sanj it still takes 250 msec.

Now, given the current page with minId=2496 and maxId=2522, we can create the queries for the Next and Prev buttons using a WHERE clause.


Next button:

select *
from link l
where l.id < :minId -- =2496
order by l.id desc
limit 10

Resulting ids: [2495,2494,2493,2492,2491,2487,2483,2481,2479,2475].

Prev button:

select *
from link l
where l.id > :maxId -- =2522
order by l.id asc
limit 10

Resulting ids: [2524,2525,2527,2530,2533,2535,2538,2540,2541,2542].

To reverse the order you can use the query in a subselect:

select *
from (
    select *
    from link l
    where l.id > 2522
    order by l.id asc
    limit 10
) sub
order by id desc

Resulting ids: [2542,2541,2540,2538,2535,2533,2530,2527,2525,2524].

These queries execute in “no time” (less than 1 msec) and provide the same result.

You cannot use this solution to create page numbers, but I don’t think you are going to output 200K page numbers anyway.
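The Next/Prev queries above can be exercised end to end; here is a small sketch in Python's sqlite3 (standing in for MySQL), where the gapped ids are invented sample data rather than the ids from the benchmark:

```python
import sqlite3

# Keyset (WHERE-based) paging sketch: ids 1..100 with every third id
# missing, to mimic the gapped ids in the answer's sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE link (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO link (id) VALUES (?)",
                 [(i,) for i in range(1, 101) if i % 3 != 0])

def page_next(min_id, size=10):
    """Next button: rows strictly below the page's smallest id, DESC."""
    return [r[0] for r in conn.execute(
        "SELECT id FROM link WHERE id < ? ORDER BY id DESC LIMIT ?",
        (min_id, size))]

def page_prev(max_id, size=10):
    """Prev button: rows strictly above the page's largest id, fetched
    ASC and then re-sorted DESC (the subselect trick from the answer)."""
    rows = conn.execute(
        "SELECT id FROM link WHERE id > ? ORDER BY id ASC LIMIT ?",
        (max_id, size)).fetchall()
    return sorted((r[0] for r in rows), reverse=True)

page = page_next(min_id=50)
print(page)  # -> [49, 47, 46, 44, 43, 41, 40, 38, 37, 35]
```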

Test data:

Data used for the example and benchmarks has been created with

CREATE TABLE `link` (
    `id` INT(11) NOT NULL AUTO_INCREMENT,
    `url` VARCHAR(255) NOT NULL,
    `website` VARCHAR(100) NULL DEFAULT NULL,
    `state` VARCHAR(10) NULL DEFAULT NULL,
    `type` VARCHAR(100) NULL DEFAULT NULL,
    `prio` VARCHAR(100) NULL DEFAULT NULL,
    `change` VARCHAR(100) NULL DEFAULT NULL,
    `last` VARCHAR(100) NULL DEFAULT NULL,
    PRIMARY KEY (`id`),
    UNIQUE INDEX `url` (`url`)
) COLLATE='utf8_general_ci' ENGINE=InnoDB;

insert into link
    select i.id
        , concat(id, '-', rand()) url
        , rand() website
        , rand() state
        , rand() `type`
        , rand() prio
        , rand() `change`
        , rand() `last`
    from test._dummy_indexes_2p23 i
    where i.id <= 2000000
      and rand() < 0.5

where test._dummy_indexes_2p23 is a table containing 2^23 ids (about 8M). So the data contains about 1M rows, with roughly every second id randomly missing. Table size: 228 MB.


Solution 3:

Given the large amount of data, here are a few tips to improve the query response time:

  1. Change the storage engine from InnoDB to MyISAM.
  2. Create table partitions
    (https://dev.mysql.com/doc/refman/5.7/en/partitioning-management.html).
  3. Use MySQL Cluster (http://dev.mysql.com/doc/refman/5.7/en/mysql-cluster-overview.html).
  4. Increase hardware capacity.

Thanks

Solution 4:

First of all, running a query against your table without any ordering does not guarantee it will return the same data if run twice.
It is better to add an ORDER BY clause; id is a good candidate, as it is your primary key and is unique (it is an auto_increment value).

You could use this as your base:

SELECT * FROM link ORDER BY id LIMIT 50;

This will give you the first 50 rows in your table.

Now for the next 50 rows, instead of using OFFSET, we could save our last location in the query.

You would save the id of the last row from the previous query and use it in the next query:

SELECT * FROM link WHERE id > last_id ORDER BY id LIMIT 50;

This will give you the next 50 rows after the last id.


The reason your query runs slowly for high values of OFFSET is that MySQL has to read all the rows up to the given OFFSET and only then return the LIMIT rows that follow. This means the bigger the OFFSET, the slower the query runs.

The solution shown above does not depend on OFFSET, so the query runs at the same speed regardless of the current page.
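The "save the last id" approach can be sketched as a loop that replays the query with the remembered id; the snippet below uses Python's sqlite3 as a stand-in for MySQL, with 120 invented rows so the page sizes are easy to check:

```python
import sqlite3

# Keyset pagination loop: page through the whole table 50 rows at a
# time by remembering the last id, never touching OFFSET.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE link (id INTEGER PRIMARY KEY, url TEXT)")
conn.executemany("INSERT INTO link VALUES (?, ?)",
                 [(i, "u%d" % i) for i in range(1, 121)])

last_id, pages = 0, []
while True:
    rows = conn.execute(
        "SELECT id FROM link WHERE id > ? ORDER BY id LIMIT 50",
        (last_id,)).fetchall()
    if not rows:
        break
    pages.append([r[0] for r in rows])
    last_id = rows[-1][0]  # remember the last id for the next page

print([len(p) for p in pages])  # -> [50, 50, 20]
```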

See also this useful article that explains a few other options you can choose from: http://www.iheavy.com/2013/06/19/3-ways-to-optimize-for-paging-in-mysql/

Solution 5:

I updated my SQL query to the following, and it now takes less time:

SELECT * FROM link ORDER BY id LIMIT 999999 , 50;
