Parsing URL query parameters in Python

I’ve been dabbling with Python for several months now, but I’m not quite as proficient with it as I’d like.

I was hacking on some stuff recently and needed to parse the query parameters in a URL. Python has URL parsing, but the urlparse function doesn’t split the query string into individual parameters.

This was the pleasantly easy solution:

from urlparse import urlparse
url = urlparse('http://www.google.com/search?hl=en&safe=off&q=atomized&btnG=Search')
params = dict([part.split('=') for part in url[4].split('&')])

What’s going on here

Let’s break the example down a bit.

  1. from urlparse import urlparse

    This pulls in Python’s URL parser.

  2. url = urlparse('http://www.google.com/search?hl=en&safe=off&q=atomized&btnG=Search')

    This passes the URL through urlparse, leaving us with a tuple of URL components.

  3. url[4].split('&')

    This takes the query string (element 4 of the tuple) and splits it on “&” characters, leaving us with a list of “key=value” strings. The output of this would be:

    ['hl=en', 'safe=off', 'q=atomized', 'btnG=Search']

  4. [part.split('=') for part in url[4].split('&')]

    This is a list comprehension, Python’s syntax for building a new list by evaluating an expression for each item of an existing one. In this case, part.split('=') is evaluated for each “key=value” string from the previous step. This leaves us with:

    [['hl', 'en'], ['safe', 'off'], ['q', 'atomized'], ['btnG', 'Search']]

    That is, a list of lists, where the first member of each inner list is the key and the second is the value.

  5. params = dict([part.split('=') for part in url[4].split('&')])

    The last part of this is dict(), which turns the list structure above into a dictionary:

    {'q': 'atomized', 'safe': 'off', 'btnG': 'Search', 'hl': 'en'}
    

    This is roughly equivalent to a hash map or associative array. It allows us to access specific keys, such as params['q'].
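
The steps above can be reproduced one at a time. The following is a minimal sketch, assuming Python 3, where urlparse is imported from urllib.parse rather than from the urlparse module used in this post. The variable names query, pairs and split_pairs are only there for illustration.

```python
# Assumes Python 3: urlparse is imported from urllib.parse here,
# not from the Python 2 urlparse module used in the post.
from urllib.parse import urlparse

url = urlparse('http://www.google.com/search?hl=en&safe=off&q=atomized&btnG=Search')

# Step 3: element 4 of the result is the query string (also available as url.query).
query = url[4]
pairs = query.split('&')
print(pairs)        # ['hl=en', 'safe=off', 'q=atomized', 'btnG=Search']

# Step 4: split each "key=value" string into a two-item list.
split_pairs = [part.split('=') for part in pairs]
print(split_pairs)  # [['hl', 'en'], ['safe', 'off'], ['q', 'atomized'], ['btnG', 'Search']]

# Step 5: dict() turns the list of two-item lists into a dictionary.
params = dict(split_pairs)
print(params['q'])  # atomized
```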

10 Responses to “Parsing URL query parameters in Python”

  1. Jan Says:

    Aren’t URLs parsed from HTML actually supposed to use “&amp;” instead of just the & in URLs?

  2. Ian Says:

    When they’re output in HTML or XML, yes, they need to be escaped. You want to avoid escaping them until it’s time to output them.

    Ideally, this stuff is transparent. The escaping is only a property of HTML and XML, so when you get values out, you get the unescaped original value.

  3. web design Says:

    Thanks for that article.
    Can somebody refer me to the same but for PHP?

    Regards
    Dimi

  4. Ian Says:

    PHP does it for you: $_GET.

  5. lurker Says:

    It’ll be better if
    part.split('=')
    changed to
    part.split('=',1).

    Thanks a lot.

  6. Ian Says:

    That’s a good tip, and it should improve performance a bit. Any actual = in the strings would get URL encoded to %3D.

  7. Stou Says:

    You don’t need the square brackets, in the dict expression… this will work as well and it’s slightly more efficient since it uses a generator expression

    params = dict(part.split('=') for part in url[4].split('&'))

    No spam intended but if you are “just getting into python” check this article a friend wrote that is basically a dense list of python tricks and hacks http://www.siafoo.net/article/52

  8. ast Says:

    No urlparse needed:

    def parse_qs(u):
        return '?' in u and dict(p.split('=') for p in u[u.index('?') + 1:].split('&')) or {}

  9. t Says:

    Batteries included:

    import urlparse, cgi
    cgi.parse_qs(urlparse.urlsplit(foo).query)

  10. Ace Says:

    t’s comment is correct, and also the only complete solution compared to the others suggested. The cgi.parse_qs function will correctly handle replacing plus signs with space, as well as decoding stuff like %20.
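
A few of the points raised in these comments can be checked directly. This is a rough sketch, assuming Python 3, where parse_qs is available from urllib.parse (the cgi.parse_qs spelling in comment 9 is the Python 2 one). The sample URL and the helper name naive_params are made up for illustration.

```python
# Assumes Python 3; urlsplit and parse_qs come from urllib.parse.
from urllib.parse import urlsplit, parse_qs

def naive_params(url, maxsplit=-1):
    """Hand-rolled parsing as in the post; maxsplit=1 applies the split('=',1) tip."""
    query = urlsplit(url).query
    return dict(part.split('=', maxsplit) for part in query.split('&'))

# Hypothetical URL with a plus sign, a %20 escape, and a literal '=' inside a value.
url = 'http://example.com/search?q=hello+world&name=a%20b&expr=1=1'

# Without a split limit, 'expr=1=1' yields three items and dict() raises ValueError.
try:
    naive_params(url)
except ValueError as err:
    print('naive split failed:', err)

# With maxsplit=1 every pair has two items, but nothing is decoded.
print(naive_params(url, maxsplit=1))
# {'q': 'hello+world', 'name': 'a%20b', 'expr': '1=1'}

# parse_qs decodes '+' and percent-escapes, and returns a list of values per key.
print(parse_qs(urlsplit(url).query))
# {'q': ['hello world'], 'name': ['a b'], 'expr': ['1=1']}
```

Note that parse_qs maps each key to a list, since a key may appear more than once in a query string, so a single value is read as params['q'][0] rather than params['q'].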
