Instagram Scraping Without Getting Blocked: A Practical 2026 Guide

If you've tried to scrape Instagram at any meaningful volume, you've seen the failure modes: HTTP 429s, redirects to the login wall, blank JSON responses, and the friendly challenge_required sent right before your account gets disabled. This guide is the field manual we wished existed when we started running HikerAPI: a breakdown of how Instagram detects scrapers in 2026, what tactics actually move the needle, and how to decide between building your own infrastructure and using a managed API.

How Instagram Detects Scrapers

Instagram's anti-abuse stack is layered. A request typically passes through four filters before it returns data, and your scraper has to clear all four — one weak link is enough to get rate-limited or banned.

1. Network reputation

Every request is scored against the source IP's history.

  • Datacenter IPs (AWS, GCP, DigitalOcean, OVH ranges) get the strictest limits. A single datacenter IP making more than ~30 requests per minute to i.instagram.com will start throwing 403s within hours.
  • Residential IPs (Comcast, BT, Vodafone…) are graded on behavioral consistency. An IP that has never hit Instagram before suddenly making 200 RPM looks suspicious. An IP with months of normal app usage that occasionally fires API requests blends in.
  • Mobile carrier IPs (T-Mobile, Verizon LTE, EE) get the most generous treatment because Instagram is a mobile-first product — traffic from mobile NAT pools is statistically normal.

Internally, Instagram cross-references the IP with the ASN, the geographic region, and the historical login fingerprint of the account. A US-based account suddenly logging in from a Vietnamese datacenter is an instant challenge_required.

2. Session and device fingerprint

Every authenticated mobile-API request carries a device fingerprint built from these headers:

  • X-IG-App-ID: must match a real client build, e.g. 936619743392459 identifies the Instagram web client; the mobile apps use their own IDs
  • User-Agent: must match the app version + device profile (Instagram 312.0.0.34.111 Android (33/13; 420dpi; 1080x2210; samsung; SM-G991B; o1s; exynos2100; en_US; 562092456))
  • X-IG-Device-ID, X-IG-Android-ID: UUIDs that should persist across requests for the same session
  • X-MID, IG-INTENDED-USER-ID
  • Accept-Language

If your User-Agent claims to be the Instagram Android app but your TLS JA3 fingerprint matches Python's urllib3, that mismatch is caught at the edge before the backend sees the payload. Most DIY scrapers fail here without realizing it — the request returns "successfully" with stripped-down or empty data.

3. Behavioral signals

After authentication, Instagram tracks what your session does over time:

  • Request velocity — sustained RPS to a single endpoint
  • Request shape — 100 user/info/ calls without ever hitting feed/timeline/ looks like enumeration
  • Pagination patterns — linearly paging through followers at maximum speed = bot
  • Idle time — real users have gaps; bots fire requests every 800 ms ±50 ms

Each session accumulates a trust score. Sessions below a threshold get rate-limited; sessions far below get logged out and require re-authentication, often with a phone or email challenge.
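
The idle-time signal above contrasts fixed-interval bots with the irregular gaps of real users. A minimal sketch of irregular pacing follows. The base delay, jitter width, and pause lengths are illustrative assumptions, not values Instagram publishes, and human_like_delays is a hypothetical helper name.

```python
import random


def human_like_delays(n, base=2.0, jitter=0.6, pause_every=(8, 15),
                      pause_range=(20.0, 90.0), rng=None):
    """Yield n delays in seconds: jittered short gaps plus occasional long pauses.

    All defaults are illustrative assumptions, not measured thresholds.
    """
    rng = rng or random.Random()
    next_pause = rng.randint(*pause_every)
    for i in range(1, n + 1):
        if i == next_pause:
            # Occasional long idle gap, then schedule the next one.
            yield rng.uniform(*pause_range)
            next_pause = i + rng.randint(*pause_every)
        else:
            # Multiplicative jitter keeps short gaps from settling on a fixed period.
            yield base * rng.uniform(1 - jitter, 1 + jitter)


# Usage: inspect the schedule; a real caller would time.sleep(delay) between requests.
for delay in human_like_delays(5):
    print(f"would sleep {delay:.1f}s")
```

Whether any particular distribution passes as normal is not something this sketch can guarantee. It only shows how to avoid the fixed 800 ms cadence described above.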

4. Account history

The account behind the session matters too. Newly created accounts (less than 14 days old) get tighter rate limits and trip CAPTCHA flows on almost any unusual action. Accounts with a phone number, profile picture, posted content, real follower history, and a few months of normal use get treated as "warm" and tolerate more programmatic activity.

The Five Block Types

Recognizing the failure mode correctly is the foundation of any retry policy. A 429 calls for rotating the IP; a feedback_required calls for cooling down the session. Treating them the same is one of the most common reasons DIY scrapers spiral into total bans.

| Symptom | What it means | Recovery |
|---|---|---|
| HTTP 429 Too Many Requests | IP-level rate limit | Wait 5–60 min, rotate IP |
| feedback_required JSON error | Account flagged for unusual behavior | Cool down session 24–72 h |
| challenge_required JSON error | CAPTCHA / SMS / email check needed | Solve challenge or rotate account |
| login_required redirect | Endpoint that used to be public now requires auth | Authenticate and retry |
| Empty or partial JSON | Shadow rate-limit | Slow down, rotate session |
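
The table above can be turned into a small dispatch function so that each failure mode gets its own recovery path. This is a sketch under stated assumptions: it assumes the error name arrives in a top-level "message" field of the JSON body, it does not try to detect a login redirect that arrives without JSON, and classify_block, Recovery, and required_keys are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Recovery:
    block_type: str
    action: str


def classify_block(status_code: int, body: Optional[dict],
                   required_keys: Tuple[str, ...] = ()) -> Optional[Recovery]:
    """Map a response to a recovery action, or return None if it looks healthy."""
    # Assumption: the error name is carried in body["message"].
    message = (body or {}).get("message", "")
    if status_code == 429:
        return Recovery("ip_rate_limit", "wait 5-60 min, rotate IP")
    if message == "feedback_required":
        return Recovery("account_flagged", "cool down session 24-72 h")
    if message == "challenge_required":
        return Recovery("challenge", "solve challenge or rotate account")
    if message == "login_required":
        return Recovery("auth_required", "authenticate and retry")
    # Empty body, or expected fields missing: treat as a shadow rate-limit.
    if not body or any(key not in body for key in required_keys):
        return Recovery("shadow_limit", "slow down, rotate session")
    return None


# Usage: a 200 response that lacks the expected "user" field is still a block.
print(classify_block(200, {"status": "ok"}, required_keys=("user",)))
```

Checking the HTTP status first and the JSON message second keeps IP-level and account-level failures from being handled by the same retry branch.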

DIY Mitigation: What Works

If you decide to build your own scraper, here's the realistic shopping list and the order it matters in. None of these alone is enough — Instagram's detection is multiplicative, so missing layers compound.

Residential proxies

The single biggest reliability lever. Datacenter proxies will not work above hobby scale.

  • Bright Data: ~$8/GB, largest pool (~150M IPs), best country targeting, sticky sessions
  • Smartproxy: ~$7/GB, smaller pool but cleaner reputation
  • Soax / IPRoyal: ~$4–6/GB, mixed quality

Budget ≈ $300–500/month for ~50–100K Instagram requests, depending on payload size. Costs go up fast if you fetch media URLs or full follower lists.

Mobile proxies

Higher reliability, higher price.

  • Airproxy / Proxy-Cheap mobile pools: $80–150 per dedicated 4G modem per month
  • Self-rolled (USB modem rack at home): $40 modem + $20/mo SIM — cheapest per-IP if you can manage 5–20 modems

Mobile proxies are typically the difference between "works for a week" and "works for a year."

Session pools

Each session = one warmed-up Instagram account + persisted cookies + device fingerprint. You need a pool, not a single account:

  • 5–10 sessions for ~10K req/day
  • 50+ for 100K+ req/day
  • Each session "costs" ~$2–8 to acquire (account purchase) + warm-up time of ~3–5 days of human-like programmatic browsing

Sessions burn out. Plan on losing 5–15% per week even with careful behavior.
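
The attrition figure translates into a steady replacement budget. A rough sketch of the arithmetic, using the 5–15% weekly loss and $2–8 acquisition cost quoted above; the 50-session pool and the function names are illustrative assumptions, and warm-up time is not priced in.

```python
def weekly_replacements(pool_size: int, weekly_loss_rate: float) -> float:
    """Sessions to replace each week to keep the pool at a constant size."""
    return pool_size * weekly_loss_rate


def monthly_replacement_cost(pool_size: int, weekly_loss_rate: float,
                             cost_per_session: float,
                             weeks_per_month: float = 52 / 12) -> float:
    """Monthly spend on replacement accounts (purchase price only)."""
    return weekly_replacements(pool_size, weekly_loss_rate) * weeks_per_month * cost_per_session


# Usage: best case (5% loss, $2 accounts) and worst case (15% loss, $8 accounts).
low = monthly_replacement_cost(50, 0.05, 2)
high = monthly_replacement_cost(50, 0.15, 8)
print(f"50-session pool: ${low:.0f} to ${high:.0f} per month")
```

For a 50-session pool this works out to roughly $22 to $260 per month.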

Per-session rate limits

Limits we've measured holding steady through Q1 2026:

  • user/info/: ~150 req/h per session
  • user/followers/: ~30 paginated chunks/h before degradation
  • media/comments/: ~80 req/h
  • feed/user/: ~120 req/h

Distribute across the pool. With 10 sessions you can sustain ~1500 user lookups per hour. Going faster shortens session lifespan more than linearly — push 2× and you'll lose sessions 4× faster.
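
Distributing load across the pool amounts to tracking a budget per session and always picking one that still has headroom. A minimal sketch, assuming a sliding one-hour window and the ~150 req/h figure quoted above for user/info/. SessionBudget and its method names are hypothetical, and the clock is injectable so the logic can be tested without waiting.

```python
import time
from collections import deque


class SessionBudget:
    """Sliding-window limiter: at most `limit` calls per `window` seconds per session."""

    def __init__(self, session_ids, limit, window=3600.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.calls = {sid: deque() for sid in session_ids}

    def capacity_per_window(self):
        """Total calls the whole pool can make per window."""
        return len(self.calls) * self.limit

    def acquire(self):
        """Return the least-used session with budget left, or None if all are exhausted."""
        now = self.clock()
        best = None
        for sid, stamps in self.calls.items():
            # Drop timestamps that have aged out of the window.
            while stamps and now - stamps[0] >= self.window:
                stamps.popleft()
            if len(stamps) < self.limit and (
                best is None or len(stamps) < len(self.calls[best])
            ):
                best = sid
        if best is not None:
            self.calls[best].append(now)
        return best


# Usage: 10 sessions at 150 req/h each.
budget = SessionBudget([f"session-{i}" for i in range(10)], limit=150)
print(budget.capacity_per_window())  # 10 sessions x 150 = 1500
print(budget.acquire())
```

Picking the least-used session spreads requests evenly, so no single session runs at its ceiling while others sit idle.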

Library choice

  • instagrapi (Python) — the de-facto open-source client, handles the mobile API protocol and most detection-evasion plumbing
  • graphql-instagram (Node) — for the public GraphQL surface only
  • Roll your own: viable only if you're going to invest 200+ engineering hours

Why DIY Fails at Scale

A common pattern: a team builds a scraper that works fine for 5K req/day, then tries to take it to 100K and watches it fall apart. Here's what changes:

| Factor | 5K/day | 100K/day | 1M/day |
|---|---|---|---|
| Sessions needed | 1–3 | 30–80 | 300+ |
| Proxy bandwidth | ~10 GB/mo | ~200 GB/mo | ~2 TB/mo |
| Engineering time/mo | 4 h | 40 h | 200 h |
| Account replacement cost | ~$10/mo | ~$250/mo | ~$2,500/mo |
| Effective cost per request | $0.001 | $0.002 | $0.002 |

The break-even point against managed APIs is typically around 50–100K requests per day, and only if you already have proxy and account infrastructure and are willing to staff at least one engineer to keep it running.
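
To run the break-even comparison with your own numbers, the cost lines above can be combined into a small calculator. This is a sketch, not the method behind the table: the $100 hourly engineering rate is an assumption introduced here, the $8/GB proxy price and $0.0006 managed price are the figures quoted elsewhere in this guide, and the function names are hypothetical.

```python
def diy_monthly_cost(proxy_gb, price_per_gb, account_cost, eng_hours, eng_hourly_rate):
    """Monthly DIY spend: proxy bandwidth + account replacement + engineering time."""
    return proxy_gb * price_per_gb + account_cost + eng_hours * eng_hourly_rate


def cost_per_request(monthly_cost, requests_per_day, days=30):
    """Effective cost per request over a month of `days` days."""
    return monthly_cost / (requests_per_day * days)


def managed_monthly_cost(requests_per_day, price_per_request, days=30):
    """Monthly spend on a pay-per-request API."""
    return requests_per_day * days * price_per_request


# Usage: the 100K/day column, with an ASSUMED engineering rate of $100/h.
diy = diy_monthly_cost(proxy_gb=200, price_per_gb=8, account_cost=250,
                       eng_hours=40, eng_hourly_rate=100)
print(f"DIY: ${diy:,.0f}/mo, ${cost_per_request(diy, 100_000):.5f} per request")
print(f"Managed: ${managed_monthly_cost(100_000, 0.0006):,.0f}/mo")
```

Under these assumptions DIY at 100K requests per day comes to about $0.002 per request. The result is dominated by the engineering rate: with engineering time set to zero, the DIY and managed totals land within a few percent of each other.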

The Managed-API Alternative

The reason we built HikerAPI is that we did the math above for our own analytics products and decided that running session-pool ops in-house was not a sustainable line item. A managed Instagram data API moves all of the above into a service:

  • Pay per successful request, no monthly proxy bills
  • 100+ endpoints across mobile API, GraphQL, and JSON
  • Sessions, proxies, rate limits, and account churn managed centrally
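  • 99%+ success rates, measured continuously and published on the public status page

A profile lookup is then a single authenticated GET:

import requests

API_KEY = "your_access_key"

# Look up a public profile by username; the key travels in the x-access-key header.
resp = requests.get(
    "https://api.hikerapi.com/v1/user/by/username",
    params={"username": "natgeo"},
    headers={"x-access-key": API_KEY},
    timeout=30,  # fail fast instead of hanging on a stalled connection
)
resp.raise_for_status()  # surface HTTP errors before parsing JSON
profile = resp.json()

print(profile["follower_count"], "followers")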

You don't see proxies, sessions, or 429s; you see JSON. When something breaks on the Instagram side, our on-call team ships the fix and your code keeps running.

Decision Framework

| Volume | Use case | Recommendation |
|---|---|---|
| < 1K req/month | One-off research | Free tier of a managed API, or instagrapi with a single proxy |
| 1K–50K/month | Side project, MVP | Managed API; DIY won't pay back |
| 50K–500K/month | Growing product | Managed API + caching layer |
| 500K+/month | Production product, dedicated team | Compare per-request cost vs your in-house ops cost honestly |
| Compliance-sensitive | Enterprise | Managed API with SLA + DPA |

If you spend your engineering hours building proxy rotation logic instead of your product, you've made the wrong call.
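
The "Managed API + caching layer" recommendation assumes repeated lookups are served locally instead of being billed again. A minimal sketch of such a layer, assuming an in-memory dict and a one-hour TTL (both illustrative choices). TTLCache is a hypothetical helper that wraps any fetch(path, **params) callable; fake_fetch stands in for a real API call.

```python
import time


class TTLCache:
    """Cache fetch(path, **params) results in memory for `ttl` seconds."""

    def __init__(self, fetch, ttl=3600.0, clock=time.monotonic):
        self.fetch = fetch
        self.ttl = ttl
        self.clock = clock
        self._store = {}

    def get(self, path, **params):
        # Parameter values must be hashable (strings, numbers) to form a key.
        key = (path, tuple(sorted(params.items())))
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # still fresh: no upstream request
        value = self.fetch(path, **params)
        self._store[key] = (now, value)
        return value


# Usage with a stand-in fetcher that records how often it is called.
calls = []

def fake_fetch(path, **params):
    calls.append((path, params))
    return {"path": path, **params}

cache = TTLCache(fake_fetch, ttl=3600.0)
cache.get("/v1/user/by/username", username="natgeo")
cache.get("/v1/user/by/username", username="natgeo")
print(len(calls))  # 1: the second lookup was served from the cache
```

An in-memory dict is lost on restart and is not shared between processes, so a production setup would more likely use an external store. The TTL should match how quickly the cached fields go stale for your use case.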

Code Patterns

Python — fetch a profile and the last 12 posts

import requests
from itertools import islice

API_KEY = "your_access_key"
HEADERS = {"x-access-key": API_KEY}
BASE = "https://api.hikerapi.com"

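def fetch(path, **params):
    """GET a HikerAPI endpoint and return the decoded JSON, raising on HTTP errors."""
    r = requests.get(f"{BASE}{path}", params=params, headers=HEADERS, timeout=30)
    r.raise_for_status()
    return r.json()

# Resolve the username to a profile, then use its primary key to list media.
profile = fetch("/v1/user/by/username", username="natgeo")
posts = fetch("/v1/user/medias/chunk", user_id=profile["pk"])

# Print shortcode, like count, and the first 60 characters of each caption.
for post in islice(posts.get("items", []), 12):
    caption = (post.get("caption") or {}).get("text", "")  # caption may be null
    print(post["code"], post.get("like_count"), caption[:60])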

Node — paginate followers with retry on 429

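const BASE = "https://api.hikerapi.com";
const HEADERS = { "x-access-key": process.env.HIKERAPI_KEY };
const MAX_RETRIES = 5; // give up after this many consecutive 429s

async function getFollowers(userId) {
  let max_id = null;
  let retries = 0;
  const out = [];
  while (true) {
    const url = new URL(`${BASE}/v1/user/followers/chunk`);
    url.searchParams.set("user_id", userId);
    if (max_id) url.searchParams.set("max_id", max_id);

    const r = await fetch(url, { headers: HEADERS });
    if (r.status === 429) {
      // Back off (5 s, 10 s, 15 s, ...) and retry the same page.
      retries += 1;
      if (retries > MAX_RETRIES) throw new Error("too many 429 responses");
      await new Promise((res) => setTimeout(res, 5000 * retries));
      continue;
    }
    if (!r.ok) throw new Error(`status ${r.status}`);
    retries = 0; // a successful page resets the retry counter

    const data = await r.json();
    out.push(...(data.users || []));
    max_id = data.next_max_id;
    if (!max_id) return out; // no cursor means this was the last page
  }
}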

curl — quick sanity check

curl -H "x-access-key: $HIKERAPI_KEY" \
  "https://api.hikerapi.com/v1/user/by/username?username=natgeo"

FAQ

How many requests per minute can I send to Instagram before getting blocked?
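There's no fixed number; it depends on the endpoint, account age, and IP. As a rough guide, 1–2 requests per second per warmed-up session, distributed across multiple sessions, is tolerated in bursts. Above ~3 RPS per session, you start triggering soft limits within minutes. Over a full hour, the per-session endpoint limits listed under "Per-session rate limits" (for example, ~150 req/h for user/info/) are the binding constraint.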

Will residential proxies alone solve scraping blocks?
No. Residential IPs help with the network filter, but Instagram still inspects session fingerprint, headers, and behavior. Proxies without proper sessions are about a 30% reliability improvement; sessions without proxies are about 50%; both together are 90%+.

How much does an Instagram session "cost" on the grey market?
Burner accounts trade for $1–8 each in 2026. Warm-up takes 3–5 days of programmatic human-like browsing before the account tolerates programmatic API calls. Expect 5–15% session loss per week even when you do everything right.

Is scraping Instagram legal?
Scraping publicly accessible data has generally been held not to violate the U.S. Computer Fraud and Abuse Act (see hiQ v. LinkedIn, Ninth Circuit). Contract law is a separate question: Meta's TOS prohibit automated collection, so the contractual risk falls on the party hitting Instagram. A managed API moves that contractual relationship to the provider; you only sign the API's TOS.

Can I scrape Instagram stories without a logged-in session?
No. Stories require an authenticated session. HikerAPI handles this — GET /v1/user/stories returns story media for any public account.

Does Instagram's GraphQL endpoint require auth?
The /graphql/query/ web endpoint accepts unauthenticated requests but is heavily rate-limited per IP and returns reduced data (no email, sometimes no follower counts). The mobile API gives the full payload but requires a session.

What's the cheapest reliable way to get Instagram data without writing infrastructure?
A pay-per-request managed API. HikerAPI is $0.0006 per request with no monthly minimum and 100 free requests on signup — see the pricing page for the full schedule.


Get started

The fastest way to skip all of the above is to create a free HikerAPI account — 100 requests included, no credit card required.
