一个普通的网络安全工程师。 在 GithubEmail 可以联系我

GLB下部署的Sanic应用获取用户ip

by vivi
unimportant

前情提要

最近要把给部署的机器都升级,从Debian9升级到Debian10. Debian9的默认python版本是python3.5,而Debian10的默认版本是python3.7,为了部署方便和日后升级,我把要部署的Sanic应用也从python3.5升级到python3.7了。

升级完之后,在测试的时候却发现,request.remote_addr不能像往常一样获取客户端IP了,经过一番折腾,终于搞定。

环境

我们的应用架构较为简单,GLB --> Nginx --> Sanic.

  • GLB:Google Load Balancing. 谷歌的全球范围的负载均衡器,我们通过GLB,来进行路由转发、流量分发到相应的backend。每个backend是一个实例组,实例组下面有若干个VM实例(VM,既虚拟机)。

  • Nginx:不多说。每个实例运行Nginx进行机器内部的负载均衡,把流量分发到机器内部的若干个Sanic进程

  • Sanic:一个python高性能web异步框架,框架特性开箱即用,无需像Flask一样需要部署Gunicone 或者uWISG之类的Web Server。此时升级是从 Sanic 19.3.1升级到Sanic==21.6.0

Nginx 配置如下

    location / {
        proxy_pass_header Server;
        proxy_set_header Host $http_host;
        proxy_redirect off;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Scheme $scheme;

        real_ip_header    X-Forwarded-For;
        real_ip_recursive on;
        proxy_pass http://backend_hosts;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }

源码解析

查看Sanic源码,发现从 Sanic 19.3.1升级到Sanic==21.6.0后,request.remote_addr代码有所更改。

remote_addr

# Sanic 19.3.1
@property
def remote_addr(self):
    """Attempt to return the original client ip based on X-Forwarded-For.
    :return: original client ip.
    """
    if not hasattr(self, "_remote_addr"):
        forwarded_for = self.headers.get("X-Forwarded-For", "").split(",")
        remote_addrs = [
            addr
            for addr in [addr.strip() for addr in forwarded_for]
            if addr
        ]
        if len(remote_addrs) > 0:
            self._remote_addr = remote_addrs[0]
        else:
            self._remote_addr = ""
    return self._remote_addr
# Sanic 21.6.0
    @property
    def remote_addr(self) -> str:
        """
        Client IP address, if available.
        1. proxied remote address `self.forwarded['for']`
        2. local remote address `self.ip`

        :return: IPv4, bracketed IPv6, UNIX socket name or arbitrary string
        :rtype: str
        """
        if not hasattr(self, "_remote_addr"):
            self._remote_addr = str(
                self.forwarded.get("for", "")
            )  # or self.ip
        return self._remote_addr

从源码看来,在19.3.1版本获取ip时,是直接从http header获取X-Forwarded-For字段,并且如果X-Forwarded-For里面存在多个ip,取第一个。而升级到了21.6.0的时候,则多了一些处理。

forwarded

    @property
    def forwarded(self) -> Options:
        """
        Active proxy information obtained from request headers, as specified in
        Sanic configuration.

        Field names by, for, proto, host, port and path are normalized.
        - for and by IPv6 addresses are bracketed
        - port (int) is only set by port headers, not from host.
        - path is url-unencoded

        Additional values may be available from new style Forwarded headers.

        :return: forwarded address info
        :rtype: Dict[str, str]
        """
        if self.parsed_forwarded is None:
            self.parsed_forwarded = (
                parse_forwarded(self.headers, self.app.config)      # 注释1
                or parse_xforwarded(self.headers, self.app.config)  # 注释2
                or {}
            )
        return self.parsed_forwarded

注释1是从http header的forwarded字段解析, forwarded可以简单理解成X-Forwarded-For, X-Forwarded-Host,X-Forwarded-Proto的结合体,我们结合GLB下的使用,通过X-Forwarded-ForX-Real-Ip获取客户端ip。

parse_xforwarded

def parse_xforwarded(headers, config) -> Optional[Options]:
    """Parse traditional proxy headers."""
    real_ip_header = config.REAL_IP_HEADER
    proxies_count = config.PROXIES_COUNT
    addr = real_ip_header and headers.getone(real_ip_header, None)
    if not addr and proxies_count:
        assert proxies_count > 0
        try:
            # Combine, split and filter multiple headers' entries
            forwarded_for = headers.getall(config.FORWARDED_FOR_HEADER)
            proxies = [
                p
                for p in (
                    p.strip() for h in forwarded_for for p in h.split(",")
                )
                if p
            ]
            addr = proxies[-proxies_count]
        except (KeyError, IndexError):
            pass
    # No processing of other headers if no address is found
    if not addr:
        return None
      
    def options():
        yield "for", addr
        for key, header in (
            ("proto", "x-scheme"),
            ("proto", "x-forwarded-proto"),  # Overrides X-Scheme if present
            ("host", "x-forwarded-host"),
            ("port", "x-forwarded-port"),
            ("path", "x-forwarded-path"),
        ):
            yield key, headers.getone(header, None)

    return fwd_normalize(options())

看到这里突然发现多了config.REAL_IP_HEADERconfig.PROXIES_COUNT这两个东西。到官网一看才发现,Sanic基于安全考虑,在后面升级的版本,也就是在("19.3.1", "20.6.0")这两个版本的中间的版本加入了这个功能:反向代理后的服务必须要设置如下一项或多项 配置, 才能正常获取用户ip,以防止一些恶意客户端可能会使用代理头来隐藏自己的 IP。

  • FORWARDED_SECRET

  • REAL_IP_HEADER

  • PROXIES_COUNT

测试

测试代码如下:

@app.route("/")
    async def index(request):
        logging.info(request.headers)
        logging.info(request.remote_addr)
        return sanic.response.text("Ok")

通过X-Forwarded-For获取,在启动参数加入以下

app.config.PROXIES_COUNT = 2

结果如下:

INFO [2021-11-28 11:15:07,597][16889][root] <Header('host': 'xxx.com', 'x-real-ip': '35.190.xx.xx','x-scheme': 'http', 'user-agent': 'curl/7.64.1', 'accept': '*/*',  'x-forwarded-for': '113.118.xxx.xxx, 35.190.xx.xx', 'x-forwarded-proto': 'https')>
INFO [2021-11-28 11:15:07,597][16889][root] 113.118.xxx.xxx

通过X-Real-IP获取,在启动参数加入以下

app.config.REAL_IP_HEADER = "X-Real-IP"

结果如下:

INFO [2021-11-28 11:35:36,176][18851][root] <Header('host': 'test.roiquery.com', 'x-real-ip': '35.190.xx.xx', 'x-scheme': 'http', 'user-agent': 'curl/7.64.1', 'accept': '*/*', 'x-cloud-trace-context': '62fee86592b6df1704b808585d1098b8/4774134022823104383', 'via': '1.1 google', 'x-forwarded-for': '113.118.x x x.xxx, 35.190.xxx.xxx', 'x-forwarded-proto': 'https')>
INFO [2021-11-28 11:35:36,176][18851][root] 35.190.xx.xx

发现问题,request.remote_addr获取到的ip是GLB的ip。为什么呢?

理解这个问题首先要知道Nginx配置里面的几个常用配置的含义

proxy_set_header X-Real-IP $remote_addr;
real_ip_header    X-Forwarded-For;                        
real_ip_recursive on;
  • proxy_set_header X-Real-IP $remote_addr;

    $remote_addr表示Nginx获取到的用户ip,并把ip放到http header的X-Real-IP字段里,传到下一端。在我们的架构中,Nginx上一层是GLB

  • real_ip_header X-Forwarded-For;

    是指从接收到报文的哪个http首部去获取前代理传送的用户ip。参考GLB文档,GLB会把客户端ip和GLB自身ip设置到http header的X-Forwarded-For里。

  • real_ip_recursive on;

    If recursive search is disabled, the original client address that matches one of the trusted addresses is replaced by the last address sent in the request header field defined by the [real_ip_header](http://nginx.org/en/docs/http/ngx_http_realip_module.html#real_ip_header) directive. If recursive search is enabled, the original client address that matches one of the trusted addresses is replaced by the last non-trusted address sent in the request header field.
    当real_ip_recursive为off时,nginx会把real_ip_header指定的HTTP头中的最后一个IP当成真实IP,当real_ip_recursive为on时,nginx会把real_ip_header指定的HTTP头中的最后一个不是信任服务器的IP当成真实IP。在我们的测试场景下,`X-Forwarded-For`值为'113.118.x x x.xxx, 35.190.xxx.xxx'。
    

那什么是信任服务器呢?这就得提到另外一个配置:set_real_ip_from

  • set_real_ip_from
    Defines trusted addresses that are known to send correct replacement addresses.
      
    定义已知可发送正确替换地址的可信地址。有些拗口,大概的意思是告诉Nginx,这个配置的ip不是真正的用户ip,这个ip的服务器是信任的服务器。
    

也就是说,其实在我这次测试场景中,real_ip_recursive 不管是off还是on,最终获取的ip都是35.190.xx.xx。因为35.190.xxx.xxx是最后一个ip。而当我把以下配置加上

set_real_ip_from 35.190.00.00/16;

最终结果如下

INFO [2021-11-28 12:10:41,205][16889][root] <Header('host': 'xxx.com', 'x-real-ip': '113.118.xxx.xxx','x-scheme': 'http', 'user-agent': 'curl/7.64.1', 'accept': '*/*',  'x-forwarded-for': '113.118.xxx.xxx, 35.190.xx.xx', 'x-forwarded-proto': 'https')>
INFO [2021-11-28 12:10:41,205][16889][root] 113.118.xxx.xxx

总结

获取ip的这个两种方式的区别就不说了,大伙儿按需选择,网上众说纷云,稍微查下就有答案。

参考链接: