# Mempool Archive

{% hint style="warning" %}
Support for Mempool Archive ended on March 1, 2025. Data from Nov 1, 2019, until then will remain accessible to the community. [Click here to learn more](https://discord.com/channels/542403978693050389/542406724427055135/1339585307867091025).
{% endhint %}

Blocknative actively maintains the most comprehensive historical dataset of mempool transaction events within the Ethereum ecosystem. This collection contains transaction detection events since November 1st, 2019.&#x20;

* Blocknative logs all mempool transactions from nodes in multiple geographical regions for the Ethereum mainnet blockchain.
* It is updated daily at 13:00 UTC. A typical update contains 11M events (0.012 TB), though the heaviest days on the network can reach 41M events and 0.3 TB.
* This uninterrupted dataset covers major scenarios the network has encountered over the years, including massive surges in traffic, huge gas spikes, bidding wars, the launch of MEV-boost, the price of ETH collapsing, EIP-1559, Black Thursday, and major hacks.&#x20;
* This data covers 27 data fields, such as gas details, input data, time pending in the mempool, failure reasons, and regional timestamps for each instance seen by our global network of nodes.
* Our self-operated infrastructure provides the earliest detection times from North America, Asia, and Europe.

## **Getting Started**

Each date has been partitioned into its own folder named in `YYYYMMDD` format. Within each date partition, there are 24 files, each named by the two-digit hour (e.g. `02.csv.gz`) in which the transaction event was detected. These files are tab-delimited, gzipped CSVs.

For example, to access transactions detected on June 16th, 2023 between 12 PM and 1 PM, the URL would be `https://archive.blocknative.com/20230616/12.csv.gz`.
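The naming scheme above can be captured in a small helper. This is a minimal sketch: the function name `archive_url` is hypothetical, and it only builds the URL without checking that the slice exists.

```bash
#!/bin/bash

# Build the URL of one hourly slice from a date (YYYYMMDD) and an hour (0-23).
# archive_url is an illustrative name, not part of any Blocknative tooling.
archive_url() {
    local date="$1"
    local hour="$2"
    # Strip one leading zero so inputs such as "08" are read as decimal 8,
    # then pad back to two digits to match the file names.
    printf 'https://archive.blocknative.com/%s/%02d.csv.gz\n' "$date" "${hour#0}"
}
```

Calling `archive_url 20230616 12` prints the URL from the example above, and the result can be passed straight to `curl`.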

## **How to download**

Query, download and store the data slices locally using the steps below:

```bash
# -O saves the slice under its remote file name (HH.csv.gz) in the current directory
curl -O https://archive.blocknative.com/YYYYMMDD/HH.csv.gz
```
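Once a slice is on disk, it can be inspected without unpacking it first. The sketch below assumes a locally downloaded file; `preview_slice` is a hypothetical helper name and the default of five rows is arbitrary.

```bash
#!/bin/bash

# Print the first rows of a gzipped, tab-delimited slice without writing
# the decompressed data to disk.
# Usage: preview_slice FILE [ROWS]
preview_slice() {
    local file="$1"
    local rows="${2:-5}"
    # gunzip -c streams the decompressed content to stdout; head stops early
    gunzip -c "$file" | head -n "$rows"
}
```

For example, `preview_slice 12.csv.gz 3` shows the first three lines of the 12:00 slice.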

### **Fetching a full day of data**

Here is a script you can use to download all slices for a single day to your computer. Set the `DATE` variable to the day you want before running it.

```bash
#!/bin/bash

# Set the date
DATE="YYYYMMDD"
DOMAIN="https://archive.blocknative.com/"
BASE_URL="${DOMAIN}${DATE}/"

# Initialize a variable to track successful downloads
SUCCESSFUL_DOWNLOADS=0

# Loop through each hour (00 to 23); seq -w zero-pads every value to two digits
for HOUR in $(seq -w 0 23); do
    # Construct the URL and filename for the current hour's data
    URL="${BASE_URL}${HOUR}.csv.gz"
    FILENAME="${HOUR}.csv.gz"

    # Initialize a variable to keep track of retries
    RETRIES=0

    # Loop to handle retries on 429 and 504 responses (a 404 is not retried)
    while true; do
        # Download the data and check the response status code
        HTTP_STATUS=$(curl -o "$FILENAME" -w "%{http_code}" "$URL")

        # Check the status code and print a message
        if [ "$HTTP_STATUS" -eq 200 ]; then
            echo "Downloaded $FILENAME"
            ((SUCCESSFUL_DOWNLOADS++))
            break  # Exit the retry loop on success
        elif [ "$HTTP_STATUS" -eq 429 ] || [ "$HTTP_STATUS" -eq 504 ]; then
            echo "Received $HTTP_STATUS. Retrying in 1 second..."
            sleep 1  # Wait for 1 second before retrying
            ((RETRIES++))
            if [ $RETRIES -ge 3 ]; then
                echo "Retry limit reached. Exiting."
                exit 1
            fi
        elif [ "$HTTP_STATUS" -eq 404 ]; then
            echo "File not found (404). Skipping $FILENAME."
            rm -f "$FILENAME"  # Remove the saved error response
            break  # Exit the retry loop for 404
        else
            echo "Error downloading $FILENAME - Status code: $HTTP_STATUS"
            rm "$FILENAME"  # Remove the empty file
            break  # Exit the retry loop on other errors
        fi
    done
done

if [ "$SUCCESSFUL_DOWNLOADS" -eq 24 ]; then
    echo "All slices downloaded successfully!"
else
    echo "Some slices were not downloaded successfully."
fi
```

Save this script to a file, for example, `download_slices.sh`, and make it executable using the following command:

```bash
chmod +x download_slices.sh
```

Then, run the script by executing:

```bash
./download_slices.sh
```

### **Fetching on a custom range**

Here is a script you can use to download either (1) all hourly slices for every day in a range of dates, or (2) a range of hourly slices for a single day.

Options:

1. **`--date-range`**: downloads every hourly slice for all days within the range (both dates inclusive). Format: **`YYYYMMDD-YYYYMMDD`**
   * For date range: `./download_mempool.sh --date-range YYYYMMDD-YYYYMMDD`
2. **`--hour-range`**: downloads the slices for specific hours on a particular day. Format: **`YYYYMMDD:HH-HH`**
   * For hour range: `./download_mempool.sh --hour-range YYYYMMDD:HH-HH`

```bash
#!/bin/bash

# Fetch arguments
while [[ $# -gt 0 ]]; do
    key="$1"
    case $key in
        --date-range)
            DATE_RANGE="$2"
            shift; shift
            ;;
        --hour-range)
            HOUR_RANGE="$2"
            shift; shift
            ;;
        *)
            shift
            ;;
    esac
done

DOMAIN="https://archive.blocknative.com/"
SUCCESSFUL_DOWNLOADS=0

download_data() {
    local DATE=$1
    local HOUR_START=$2
    local HOUR_END=$3
    local BASE_URL="${DOMAIN}${DATE}/"

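    # Iterate numerically, then zero-pad so single-digit hours match the file names
    for H in $(seq "${HOUR_START#0}" "${HOUR_END#0}"); do
        HOUR=$(printf "%02d" "$H")
        URL="${BASE_URL}${HOUR}.csv.gz"
        FILENAME="${DATE}_${HOUR}.csv.gz"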
        RETRIES=0

        while true; do
            HTTP_STATUS=$(curl -o "$FILENAME" -w "%{http_code}" "$URL")

            if [ "$HTTP_STATUS" -eq 200 ]; then
                echo "Downloaded $FILENAME"
                ((SUCCESSFUL_DOWNLOADS++))
                break
            elif [ "$HTTP_STATUS" -eq 429 ] || [ "$HTTP_STATUS" -eq 504 ]; then
                echo "Received $HTTP_STATUS. Retrying in 1 second..."
                sleep 1
                ((RETRIES++))
                if [ $RETRIES -ge 3 ]; then
                    echo "Retry limit reached. Exiting."
                    exit 1
                fi
            elif [ "$HTTP_STATUS" -eq 404 ]; then
                echo "File not found (404). Skipping $FILENAME."
                rm -f "$FILENAME"  # Remove the saved error response
                break
            else
                echo "Error downloading $FILENAME - Status code: $HTTP_STATUS"
                rm "$FILENAME"
                break
            fi
        done
    done
}

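# Date Range Mode
if [ -n "$DATE_RANGE" ]; then
    IFS='-' read -ra DATES <<< "$DATE_RANGE"
    START_DATE=${DATES[0]}
    END_DATE=${DATES[1]}

    # Step through calendar days so month and year boundaries are handled.
    # Uses GNU date syntax; on macOS, install coreutils and use gdate instead.
    DATE=$START_DATE
    while [ "$DATE" -le "$END_DATE" ]; do
        download_data "$DATE" 00 23
        DATE=$(date -u -d "$DATE + 1 day" +%Y%m%d)
    done
fi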

# Hour Range Mode
if [ ! -z "$HOUR_RANGE" ]; then
    IFS=':' read -ra PARTS <<< "$HOUR_RANGE"
    DATE=${PARTS[0]}
    IFS='-' read -ra HOURS <<< "${PARTS[1]}"
    HOUR_START=${HOURS[0]}
    HOUR_END=${HOURS[1]}

    download_data $DATE $HOUR_START $HOUR_END
fi

if [ "$SUCCESSFUL_DOWNLOADS" -gt 0 ]; then
    echo "Downloaded $SUCCESSFUL_DOWNLOADS slice(s)."
else
    echo "No slices were downloaded."
fi
```

Save this script to a file, for example, `download_mempool.sh`, and make it executable using the following command:

```bash
chmod +x download_mempool.sh
```

Then run the script with one of the commands listed under Options above.

## Data Schema

Blocknative logs all mempool transactions from nodes in multiple geographical regions for the Ethereum mainnet blockchain. The Archive contains historic events for all transactions:&#x20;

* entering the mempool&#x20;
* denied entry into the mempool (rejection with reason)&#x20;
* exiting the mempool (eviction with reason)&#x20;
* replacing existing mempool transaction (speedup or cancel)&#x20;
* finalized on chain (confirmed or failed)

{% hint style="info" %}
The number of times a transaction appears in the Archive corresponds to the number of status changes it undergoes. The **`detecttime`** field indicates the time when the status change was first observed.
{% endhint %}
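To see this event-per-status-change structure in a downloaded slice, the events can be tallied per transaction hash. The sketch below is illustrative only: `count_events_per_hash` is a hypothetical helper, and it assumes the slice starts with a header row naming the columns as in the schema (this page does not state whether the files include a header).

```bash
#!/bin/bash

# Count how many events (status changes) each transaction hash has in one slice.
# Output: "<count><TAB><hash>", highest count first.
# Assumption: the first row is a header that contains a "hash" column.
count_events_per_hash() {
    gunzip -c "$1" | awk -F'\t' '
        NR == 1 {
            # Locate the hash column by name instead of relying on its position
            for (i = 1; i <= NF; i++) if ($i == "hash") col = i
            if (!col) { print "no hash column found" > "/dev/stderr"; exit 1 }
            next
        }
        { count[$col]++ }
        END { for (h in count) print count[h] "\t" h }
    ' | sort -k1,1nr -k2,2
}
```

A transaction that was seen as pending and later confirmed would appear with a count of at least 2.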

Below you can find the complete schema for the data:

<table data-full-width="true"><thead><tr><th width="230.24999999999997">Field Name</th><th width="195">Description</th><th width="152">Data Type</th><th>Example</th></tr></thead><tbody><tr><td>detecttime</td><td>Timestamp that the transaction was detected in mempool.</td><td>timestamp</td><td><code>2020-03-12 00:00:00.409000</code></td></tr><tr><td>hash</td><td>Unique identifier hash for a given transaction.</td><td>string</td><td><code>0x6b4104838fd153b2d1ab705737843f5ea99666794391dd52653960970dc7e5ef</code></td></tr><tr><td>status</td><td>Status of the transaction.</td><td>string</td><td><code>pending</code> , <code>speedup</code> , <code>cancel</code> , <code>failed</code> , <code>stuck</code> , <code>dropped</code> , <code>confirmed</code> , <code>evicted</code> , <code>rejected</code></td></tr><tr><td>region</td><td>The geographic region for the node that detected the transaction.</td><td>string</td><td><code>us-east-1</code> , <code>eu-central-1</code> , <code>ap-southeast-1</code></td></tr><tr><td>reorg</td><td>If there was a reorg, refers to the blockhash of the reorg.</td><td>string</td><td><code>0xf2ec4b2a7b951e4400e99d1171c4fb875fd388b15b6cb97bf5ad1c8dbea3a73a</code></td></tr><tr><td>replace</td><td>If the transaction was replaced (speedup/cancel), the transaction hash of the replacement.</td><td>string</td><td><code>0xcea6244a7f0a7c2630085ca3e47e1ecfc28a5c03a08a8f3ec5f43fbef3d83dd5</code></td></tr><tr><td>curblocknumber</td><td>The block number the event was detected in.</td><td>decimal(18,0)</td><td><code>12429202</code></td></tr><tr><td>failurereason</td><td>If a transaction failed, this field provides contextual information.</td><td>string</td><td><code>Reverted: ""UniswapV2Router: INSUFFICIENT_OUTPUT_AMOUNT""</code></td></tr><tr><td>blockspending</td><td>If a transaction was finalized (confirmed, failed), this refers to the number of blocks that the transaction was waiting to get 
on-chain.</td><td>int</td><td><code>2</code></td></tr><tr><td>timepending</td><td>If a transaction was finalized (confirmed, failed), this refers to the time in milliseconds that the transaction was waiting to get on-chain.</td><td>bigint</td><td><code>4678</code></td></tr><tr><td>nonce</td><td>A unique number which counts the number of transactions sent from a given address.</td><td>decimal(38,0)</td><td><code>27744</code></td></tr><tr><td>gas</td><td>The maximum number of gas units allowed for the transaction.</td><td>decimal(38,0)</td><td><code>55588</code></td></tr><tr><td>gasprice</td><td>The price offered to the miner/validator per unit of gas. Denominated in wei.</td><td>decimal(38,0)</td><td><code>1200000000</code></td></tr><tr><td>value</td><td>The amount of ETH transferred or sent to contract. Denominated in wei.</td><td>decimal(38,0)</td><td><code>147940000000000</code></td></tr><tr><td>toaddress</td><td>The destination of a given transaction.</td><td>string</td><td><code>0x501c885e8f519feeb1a8f9429ea586ebd378b549</code></td></tr><tr><td>fromaddress</td><td>The source/initiator of a given transaction.</td><td>string</td><td><code>0xf974334a62b3aab3e2b5509f65b9b2141d8efa03</code></td></tr><tr><td>input</td><td>Additional data that can be attached to a transaction. 
This field can be used to tell a smart contract to execute a function.</td><td>string</td><td><code>0xa9059cbb000000000000000000000000955a0ef4e120528f8486c04c97388d530cfbf239000000000000000000</code></td></tr><tr><td>network</td><td>The specific Ethereum network used.</td><td>string</td><td><code>bsc-main</code> , <code>goerli</code>, <code>kovan</code>, <code>main</code>, <code>rinkeby</code>, <code>ropsten</code>, <code>xdai</code></td></tr><tr><td>type</td><td><p>Post EIP-1559, this indicates how the gas parameters are submitted to the network: <br>- type 0 - legacy</p><p>- type 1 - usage of access lists according to EIP-2930<br>- type 2 - using <code>maxpriorityfeepergas</code> and <code>maxfeepergas</code></p></td><td>int</td><td><code>0</code>, <code>1</code>, <code>2</code> </td></tr><tr><td>maxpriorityfeepergas</td><td>The maximum value for a tip offered to the miner/validator per unit of gas. The actual tip paid can be lower if (<code>maxfee</code> - <code>basefee</code>) &#x3C; <code>maxpriorityfee</code>. Denominated in wei.</td><td>decimal(38,0)</td><td><code>111373960022</code></td></tr><tr><td>maxfeepergas</td><td>The maximum value for the transaction fee (including <code>basefee</code> and tip) offered to the miner/validator per unit of gas. Denominated in wei.</td><td>decimal(38,0)</td><td><code>111373960022</code></td></tr><tr><td>basefeepergas</td><td>The fee per unit of gas paid and burned for the <code>curblocknumber</code>. This fee is algorithmically determined. 
Denominated in wei.</td><td>decimal(38,0)</td><td><code>111373960022</code></td></tr><tr><td>dropreason</td><td>If the transaction was dropped from the mempool, this describes the contextual reason for the drop.</td><td>string</td><td><code>unexecutable-txs</code>, <code>unpayable-txs</code>, <code>replaced-txs</code>, <code>account-cap-txs</code>, <code>old-txs</code>, <code>underpriced-txs</code>, <code>low-nonce</code></td></tr><tr><td>rejectionreason</td><td>If the transaction was rejected from the mempool, this describes the contextual reason for the rejection.</td><td>string</td><td><code>exceeds block gas limit</code>, <code>insufficient funds for gas * price + value</code>, <code>intrinsic gas too low</code>, <code>non transaction</code>, <code>underpriced</code></td></tr><tr><td>stuck</td><td>A transaction was detected in the queued area of the mempool and is not eligible for inclusion in a block.</td><td>boolean</td><td><code>1</code></td></tr><tr><td>gasused</td><td>If the transaction was published on-chain, this value indicates the amount of gas that was actually consumed. Measured in gas units.</td><td>decimal(38,0)</td><td><code>111373960022</code></td></tr><tr><td>detect_date</td><td>A truncated version of <code>detecttime</code>. Best used as a partition for large datasets and as a search parameter to speed up queries.</td><td>string</td><td><code>2023-10-10</code></td></tr><tr><td>blobversionedhashes</td><td>String representation of versioned blob hashes associated with the transaction's EIP-4844 data blobs.</td><td>string</td><td><code>0x01f3ee17d9bd3b1e37df90813b95b21ec3504d66c5fe52974712bc4efb7db300</code></td></tr><tr><td>maxfeeperblobgas</td><td>The maximum fee per unit of blob gas the sender is willing to pay. Denominated in wei.</td><td>decimal(38,0)</td><td><code>242082408240</code></td></tr></tbody></table>
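The `maxpriorityfeepergas` description in the table notes that the tip actually paid can be lower than the offered maximum. That arithmetic can be sketched as follows; `effective_tip` is a hypothetical helper, and it assumes the three values fit in the shell's 64-bit integers, which holds for typical gas prices in wei but not for the full `decimal(38,0)` range.

```bash
#!/bin/bash

# Effective tip per unit of gas for a type 2 transaction, in wei:
# the smaller of maxpriorityfeepergas and (maxfeepergas - basefeepergas).
# Usage: effective_tip MAXPRIORITYFEEPERGAS MAXFEEPERGAS BASEFEEPERGAS
effective_tip() {
    local max_priority_fee="$1"
    local max_fee="$2"
    local base_fee="$3"
    # Headroom left for the tip after the base fee is paid
    local headroom=$((max_fee - base_fee))
    if [ "$headroom" -lt "$max_priority_fee" ]; then
        echo "$headroom"
    else
        echo "$max_priority_fee"
    fi
}
```

With a 2 gwei maximum tip, a 30 gwei maximum fee, and a 29 gwei base fee, only 1 gwei of headroom remains, so `effective_tip 2000000000 30000000000 29000000000` prints `1000000000`. A negative result would mean the maximum fee does not cover the base fee at all.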

{% hint style="warning" %}
Mempool archive is missing drops firehose data from **January 4 to January 7, 2025**, due to a silent failure in Geth's drop subscription.
{% endhint %}

## Frequently Asked Questions

### **What attribution must I provide when using the Blocknative Data Archive?**

The archive is publicly available in accordance with open data standards, and its datasets are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.

**2.1 Attribution** — End Users must give appropriate credit, provide a link to the license, and indicate if changes were made. End Users may do so in any reasonable manner, but not in any way that suggests the licensor endorses End Users or their use.\
**2.2 NonCommercial** — End Users may not use the material for commercial purposes.\
**2.3 ShareAlike** — If End Users remix, transform, or build upon the material, End Users must distribute their contributions under the same license as the original.

Please use the following as a guideline for attribution:

1. **Papers**: Data provided by [Blocknative](https://www.blocknative.com/)
2. **Graphs/Charts/Images**: Source: Blocknative\
   *Please add the text to the image*\
   ![](https://3295439492-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LmQ_1MIOGRk17Wz50Bx%2Fuploads%2Fs3DAn7uCY4t4DOee5Z25%2FScreen%20Shot%202023-10-24%20at%2011.13.01%20AM.png?alt=media\&token=b7842ec4-8eac-4ba6-a814-728a89229d14)
3. **Social posts**: Data from @[blocknative](https://twitter.com/blocknative) \
   *Please link the appropriate Blocknative social handle*\
   ![](https://3295439492-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LmQ_1MIOGRk17Wz50Bx%2Fuploads%2Fo9FChOJC4SsEW5XLwjmV%2FScreen%20Shot%202023-10-24%20at%2011.12.03%20AM.png?alt=media\&token=1700530a-27db-40e6-aa53-fa01b52466ab)

If you have any questions please reach out to us on [Discord](https://discord.gg/r9MVKwvCbt).

### **What format is the data?**&#x20;

The data is stored in hourly slices with the file format `*.csv.gz`. Each file is a tab-delimited, gzipped CSV.

### **How many nodes are gathering mempool data?**

We run highly redundant node infrastructure in each region to ensure strong uptime.

### **How can I identify on-chain transactions?**&#x20;

On-chain transactions have a `confirmed` status.

```sql
SELECT *
FROM mempool_archive
WHERE status = 'confirmed'
```

### **How can I identify private transactions?**&#x20;

A private transaction does not have a `pending` event. Because `timepending` is derived from the difference between a transaction's `pending` event and its `confirmed` event, a confirmed transaction that was never observed as pending has a `timepending` of 0.

```sql
SELECT *
FROM mempool_archive
WHERE timepending = 0
AND status = 'confirmed'
```
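The same filter can be applied directly to a downloaded slice without loading it into a database. This is a sketch under stated assumptions: `private_candidates` is a hypothetical helper, and it assumes the slice starts with a header row containing `status` and `timepending` columns named as in the schema.

```bash
#!/bin/bash

# Print the header plus every confirmed row whose timepending is 0.
# Assumption: the first row of the slice is a header naming the columns.
private_candidates() {
    gunzip -c "$1" | awk -F'\t' '
        NR == 1 {
            # Map column names to positions so the filter does not depend on order
            for (i = 1; i <= NF; i++) idx[$i] = i
            if (!("status" in idx) || !("timepending" in idx)) exit 1
            s = idx["status"]; t = idx["timepending"]
            print
            next
        }
        # Empty timepending fields compare as strings and are not matched
        $s == "confirmed" && $t == 0
    '
}
```

Running `private_candidates 12.csv.gz > private_12.tsv` would write the matching rows of the 12:00 slice to a tab-delimited file.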

### **What is the difference between** `dropreason` **and** `rejectionreason`**?**

A dropped transaction might have been valid but deemed less important or lower-priority. A rejected transaction is one that is fundamentally flawed or invalid according to the Ethereum protocol rules.&#x20;

A drop reason could be that there isn't enough ETH in the EOA to cover gas fees. A rejection reason could be an incorrect transaction signature.

Dropped transactions existed in the mempool, but are dropped to make room for incoming transactions. Rejected transactions never make it to the mempool.

