The answer is that you can get unexpected results if your server runs Django (and probably any Python framework or application). That's because Python's BaseHTTPServer.BaseHTTPRequestHandler handles URLs according to the standards, not from a human point of view.
Let's see it with an example. Consider the following request:
http://vaig.be/identify_myself?name=Marc Garcia&country=Catalonia
If you request it with a browser, the browser will escape the space in the URL, so the server will get:
http://vaig.be/identify_myself?name=Marc%20Garcia&country=Catalonia
But what if the client uses, for example, Python's urllib2.urlopen without escaping the URL first (using urllib.quote)? Of course that is a mistake, but you, as a server-side developer, can't control your clients.
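For reference, this is roughly what a well-behaved client would do before sending the request. It is only a sketch, written for Python 3, where urllib.quote lives in urllib.parse; the variable names are just for illustration.

```python
from urllib.parse import quote, urlencode

base = "http://vaig.be/identify_myself"
params = {"name": "Marc Garcia", "country": "Catalonia"}

# urlencode escapes every key and value; quote_via=quote encodes
# spaces as %20 instead of the default '+'.
url = base + "?" + urlencode(params, quote_via=quote)
print(url)
```

With the space escaped, the request line contains exactly two spaces and the server can split it safely.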
When the client does not escape the URL, the whole request line that the server receives is:
GET http://vaig.be/identify_myself?name=Marc Garcia&country=Catalonia HTTP/1.1
and after being processed (split) by Python's BaseHTTPServer.BaseHTTPRequestHandler, what we get in Django is:
request.method == 'GET'
request.META['QUERY_STRING'] == 'name=Marc'
request.META['SERVER_PROTOCOL'] == 'Garcia&country=Catalonia HTTP/1.1'
so our request.GET dictionary will look like:
request.GET == {'name': 'Marc'}
which is not the expected value (from a human point of view).
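To make the effect easier to see, here is a simplified illustration of a request line being split on its first two spaces only. This is not the actual BaseHTTPRequestHandler code, and naive_split is a name made up for this sketch; it just reproduces the values shown above.

```python
def naive_split(request_line):
    """Split on the first two spaces, so any extra space leaks into the protocol."""
    method, target, protocol = request_line.split(" ", 2)
    # Everything after '?' in the target is treated as the query string.
    _, _, query_string = target.partition("?")
    return method, query_string, protocol


line = "GET http://vaig.be/identify_myself?name=Marc Garcia&country=Catalonia HTTP/1.1"
method, query_string, protocol = naive_split(line)
print(method)
print(query_string)
print(protocol)
```

The rest of the query string ends up glued to the front of the protocol field, which is why only name=Marc reaches request.GET.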
What we can do to avoid this result is quite easy (and, of course, a bit of a hack): get the GET values not from Django's request.GET dictionary but from the one returned by this function:
def _manual_GET(request):
    if ' ' in request.META['SERVER_PROTOCOL']:
        query_string = ' '.join(
            [request.META['QUERY_STRING']] +
            request.META['SERVER_PROTOCOL'].split(' ')[:-1]
        )
        args = query_string.split('&')
        result = {}
        for arg in args:
            key, value = arg.split('=', 1)
            result[key] = value
        return result
    else:
        return request.GET
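A possible variant, sketched for Python 3, lets the standard library do the parsing, so percent-escapes are decoded and pairs without an '=' don't raise an error. It takes a plain dict shaped like request.META instead of a Django request, and manual_get_decoded is a hypothetical name, not part of Django.

```python
from urllib.parse import parse_qsl


def manual_get_decoded(meta):
    """Rebuild the GET parameters from a META-like dict (sketch)."""
    query_string = meta.get("QUERY_STRING", "")
    protocol = meta.get("SERVER_PROTOCOL", "")
    if " " in protocol:
        # Everything before the last space really belongs to the query string.
        leaked, _, _ = protocol.rpartition(" ")
        query_string = query_string + " " + leaked
    # parse_qsl splits on '&' and '=', and decodes %XX escapes and '+'.
    return dict(parse_qsl(query_string, keep_blank_values=True))


broken = {
    "QUERY_STRING": "name=Marc",
    "SERVER_PROTOCOL": "Garcia&country=Catalonia HTTP/1.1",
}
print(manual_get_decoded(broken))
```

Unlike request.GET, a plain dict keeps only the last value of a repeated key, which is fine for a one-view patch like this but worth knowing.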
request.META['SERVER_PROTOCOL'] will still be wrong. To solve this, you could make the function return a tuple of two objects: the first is the corrected request.GET and the second is the corrected request.META['SERVER_PROTOCOL'].
Another good thing to do with this code is to put it inside a process_view() middleware function, so you will not have to call it yourself on every request.
You're right, and it would probably be easy to create a middleware that just replaces the META and GET attributes of the request with the "correct" ones, so you can forget about the issue just by installing it.
Actually, what I posted is what I needed: just a patch for one view that returns the GET values (I really don't care about the protocol value in this case).
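The core of such a middleware could look something like the sketch below. It works on a plain dict shaped like request.META so it can be tried without Django, and repair_meta is a made-up name for illustration.

```python
def repair_meta(meta):
    """Return a copy of a META-like dict with both broken fields repaired."""
    fixed = dict(meta)
    protocol = meta.get("SERVER_PROTOCOL", "")
    if " " in protocol:
        # The real protocol is the part after the last space; the rest
        # is the tail of the query string that leaked into this field.
        leaked, _, real_protocol = protocol.rpartition(" ")
        fixed["QUERY_STRING"] = meta.get("QUERY_STRING", "") + " " + leaked
        fixed["SERVER_PROTOCOL"] = real_protocol
    return fixed


meta = {
    "QUERY_STRING": "name=Marc",
    "SERVER_PROTOCOL": "Garcia&country=Catalonia HTTP/1.1",
}
fixed = repair_meta(meta)
print(fixed["QUERY_STRING"])
print(fixed["SERVER_PROTOCOL"])
```

A middleware hook could then copy these values back into request.META and rebuild request.GET from the repaired query string (for instance with django.http.QueryDict), though I haven't written that part here.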
Thanks for your comment.
In my opinion you'd better not mess with these things. It's against the HTTP protocol if the client doesn't escape the URL properly. This has nothing to do with human failure. The HTTP protocol is not designed to be used by humans manually.
Spammer bots came to my site and produced the same error. Now I know why: they are using various URL-fetching libraries without escaping.
I don't get it. Wouldn't you want something that is violating the specs to fail rather than just disguising the problem?
OK, it looks like I don't care about respecting the protocol, but that's not the case at all. I really hate having to be flexible with parts of reality that are wrong, but sadly I can't change them.
Let me use an example that will sound familiar to all of you and that is similar to my case...
You know that there is a standard for HTML and CSS, just as there is for HTTP. And when you develop a website, you spend more time writing bad HTML and CSS than following the standards, just because of a browser that you hate and can't do anything to change (IE6).
So my case is exactly the same. My server has clients that don't follow the standards (because they are not escaping URLs). So I have two options: give good results even when this happens, or store and return wrong data for the sake of being strict with the standards. And the fact is that I'm not going to accept wrong results just because of following the standards.
I hope you can get my point of view, and the moral of the post now. :)