Parsing URL query parameters in Python

I’ve been dabbling with Python for several months now, but I’m not quite as proficient with it as I’d like.

I was hacking on some stuff recently, and needed to parse the query parameters in a URL. Python’s urlparse module handles URL parsing, but it doesn’t parse the querystring itself.

This was the pleasantly easy solution:

from urlparse import urlparse
url = urlparse('http://www.google.com/search?hl=en&safe=off&q=atomized&btnG=Search')
params = dict([part.split('=') for part in url[4].split('&')])

What’s going on here

Let’s break the example down a bit.

  1. from urlparse import urlparse

    This pulls in Python’s URL parser.

  2. url = urlparse('http://www.google.com/search?hl=en&safe=off&q=atomized&btnG=Search')

    This passes the URL through urlparse, leaving us with a tuple of URL components.

  3. url[4].split('&')

    This splits the querystring on “&” characters, leaving us with a list of “key=val” strings. The output of this would be:

    ['hl=en', 'safe=off', 'q=atomized', 'btnG=Search']

  4. [part.split('=') for part in url[4].split('&')]

    This is Python’s list mapping, also known as list comprehension. It allows us to map a function over a list – in this case, part.split('=') on the list of “key=value” pairs from the previous step. This leaves us with:

    [['hl', 'en'], ['safe', 'off'], ['q', 'atomized'], ['btnG', 'Search']]

    That is, a list of lists, where the first member of each inner list is the key and the second is the value.

  5. params = dict([part.split('=') for part in url[4].split('&')])

    The last part of this is dict(), which turns the array structure above into a dictionary:

    {'q': 'atomized', 'safe': 'off', 'btnG': 'Search', 'hl': 'en'}
    

    This is roughly equivalent to a hash map or associative array. It allows us to access specific keys, such as params['q'].
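
The five steps above can be sketched as one small function. This is only a minimal sketch, assuming Python 3, where urlparse lives in urllib.parse rather than in the urlparse module used above. The function name query_to_dict is hypothetical, and it shares the one-liner's limits: no percent-decoding, and a value containing a literal "=" would break it.

```python
from urllib.parse import urlparse  # Python 3 location of urlparse (assumption)


def query_to_dict(url):
    """Split a URL's querystring into a dict, mirroring the steps above."""
    query = urlparse(url).query  # the same component as url[4] in the tuple form
    if not query:
        return {}  # no querystring at all
    # Split on '&' to get "key=val" strings, then split each one on '='.
    pairs = [part.split('=') for part in query.split('&')]
    return dict(pairs)


params = query_to_dict(
    'http://www.google.com/search?hl=en&safe=off&q=atomized&btnG=Search')
print(params['q'])  # atomized
```

The early return is there because splitting an empty querystring yields [''], which dict() would reject.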

2008/06/02

Discussion

Aren’t URLs parsed from HTML actually supposed to use “&amp;” instead of just the & in URLs?

Jan
2008/06/03

When they’re output in HTML or XML, yes, they need to be escaped. You want to avoid escaping them until it’s time to output them.

Ideally, this stuff is transparent. The escaping is only a property of HTML and XML, so when you get values out, you get the unescaped original value.

Ian
2008/06/03

Thanks for that article.
Can somebody refer me to the same, but for PHP?

Regards
Dimi

web design
2008/06/27

PHP does it for you: $_GET.

Ian
2008/06/27

It would be better if
part.split('=')
were changed to
part.split('=', 1).

Thanks a lot.

lurker
2008/07/10

That’s a good tip, and it should improve performance a bit. Any actual = in the strings would get URL encoded to %3D.

Ian
2008/07/11
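
To see where the second argument to split makes a difference, here is a small sketch for the case where a value does arrive with a literal "=" in it (Base64 padding is one way that can happen). The sample querystring is made up for illustration.

```python
# Hypothetical querystring: the value 'YWJj=' contains a literal '='.
query = 'token=YWJj=&hl=en'

# Without a split limit, the first part breaks into three pieces,
# and dict() rejects it because it expects pairs.
try:
    dict(part.split('=') for part in query.split('&'))
    plain_split_ok = True
except ValueError:
    plain_split_ok = False

# With a limit of 1, only the first '=' separates the key from the value.
params = dict(part.split('=', 1) for part in query.split('&'))

print(plain_split_ok)  # False
print(params)          # {'token': 'YWJj=', 'hl': 'en'}
```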

You don’t need the square brackets, in the dict expression… this will work as well and it’s slightly more efficient since it uses a generator expression

params = dict(part.split('=') for part in url[4].split('&'))

No spam intended but if you are “just getting into python” check this article a friend wrote that is basically a dense list of python tricks and hacks http://www.siafoo.net/article/52

Stou
2008/07/30

No urlparse needed:

def parse_qs(u):
    return '?' in u and dict(p.split('=') for p in u[u.index('?') + 1:].split('&')) or {}

ast
2008/08/06

Batteries included:

import urlparse, cgi
cgi.parse_qs(urlparse.urlsplit(foo).query)

t
2008/09/13

t’s comment is correct, and also the only complete solution compared to the others suggested. The cgi.parse_qs function will correctly handle replacing plus signs with space, as well as decoding stuff like %20.

Ace
2008/11/03
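
As a rough illustration of the decoding described in the comments above, here is a sketch assuming Python 3, where parse_qs and parse_qsl live in urllib.parse (cgi.parse_qs is the older Python 2 spelling used above). The sample URL is made up.

```python
from urllib.parse import urlsplit, parse_qs, parse_qsl

# Made-up URL with a '+', a %20 escape, and a repeated key.
url = 'http://www.example.com/search?q=hello+world&tag=a%20b&tag=c'
query = urlsplit(url).query

# parse_qs decodes '+' and %XX escapes, and collects repeated keys into lists.
multi = parse_qs(query)
print(multi)   # {'q': ['hello world'], 'tag': ['a b', 'c']}

# parse_qsl returns (key, value) pairs; dict() keeps the last value per key.
single = dict(parse_qsl(query))
print(single)  # {'q': 'hello world', 'tag': 'c'}
```

Whether to keep the lists or flatten to one value per key depends on whether repeated parameters matter for the URLs being handled.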

import cgi
from urlparse import urlparse

url = "http://www.example.com/string?key1=value1&key2=value2&key3=value3&"
dic = cgi.parse_qs(urlparse(url)[4])
for i in dic.keys():
    dic[i] = "".join(dic[i])
print dic

output: {'key3': 'value3', 'key2': 'value2', 'key1': 'value1'}

Vinay
2010/02/18

Hey, thanks for the little tutorial! It was really helpful and I was looking over an hour for such a thing like this :)

Valentin
2010/05/26

MY EYES ARE BLEEDING

steve O
2010/05/30

Excellent one-line function! Thank you!

Strzelewicz Alexandre
2010/12/08
