I'm writing a crawler with Python using BeautifulSoup, and everything was going swimmingly till I ran into this site. I'm getting the contents with the requests library. If I print the content variable at that point, all the Spanish special characters seem to be working fine. However, once I feed the content variable to BeautifulSoup it all gets messed up. It's apparently garbling all the Spanish special characters (accents and whatnot). I've tried content.decode('utf-8') and content.decode('latin-1'), and also tried messing around with the fromEncoding parameter to BeautifulSoup, setting it to fromEncoding='utf-8' and fromEncoding='latin-1', but still no dice.

In your case the page contains invalid UTF-8 data, which confuses BeautifulSoup and makes it think that the page uses windows-1252. You can do this trick:

soup = BeautifulSoup.BeautifulSoup(content.decode('utf-8', 'ignore'))

By doing this you discard any invalid symbols from the page source, and BeautifulSoup will guess the encoding correctly.

The code seems to be wrong (double BeautifulSoup?): AttributeError: type object 'BeautifulSoup' has no attribute 'BeautifulSoup'. Maybe the interface changed?

I guess this is because the content is in ISO-8859-1(5) and the meta http-equiv content-type incorrectly says "UTF-8". Oh, in this special case you could also use x.__str__(encoding='latin1').
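The decoding trick from the answer can be tried on its own with nothing but the standard library. The following is a minimal sketch, not the poster's actual code: the helper name decode_ignoring_bad_bytes and the sample bytes are made up for illustration, and the sample assumes a page that is mostly valid UTF-8 with one stray latin-1 byte mixed in.

```python
def decode_ignoring_bad_bytes(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode raw bytes, dropping any bytes that are invalid in `encoding`.

    Hypothetical helper: it only wraps bytes.decode(encoding, "ignore").
    """
    return raw.decode(encoding, "ignore")


# Made-up page source: valid UTF-8 Spanish text followed by a fragment
# containing a lone 0xF1 byte ("ñ" in latin-1, but invalid UTF-8 here).
sample = "<p>Educación y niños</p>".encode("utf-8") + b"<p>a\xf1o</p>"

text = decode_ignoring_bad_bytes(sample)
print(text)  # the valid accents survive; the stray byte is dropped
```

The resulting str can then be handed to the parser instead of the raw bytes. With the current bs4 package that would look like BeautifulSoup(text, "html.parser") after from bs4 import BeautifulSoup, which also avoids the AttributeError raised by the older BeautifulSoup.BeautifulSoup spelling. Note that 'ignore' silently loses the invalid characters, so it suits pages that are almost entirely valid UTF-8.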