The Fall Of Mighty Django, Exploiting Unicode Case Transformations

~ 5 minute read
Crafted : 10 months ago Updated : 10 months ago
#infosec #cybersecurity #websec #bug-bounty #guide #vulnerability #unicode

Hello Luvs,

Thousands of websites use Django, from personal sites to cryptocurrency APIs. I consider Django a reasonably secure framework. Even our case-study vulnerability was discovered by the Django team.


Table Of Contents 

Case Study CVE-2019-19844

Patch Analysis

Exploiting Unicode Case Transformations


Hunting for variation enter CodeQL



As somebody who uses Django, I can tell you it's a beautiful piece of software. For as long as the web has existed, handling different languages (a.k.a. character encoding) has caused various issues, and the most impactful are the security ones. When it comes to character encoding, security researchers have found multiple ways to exploit these issues. Here are a few examples:


  • Bypassing security controls (e.g., WAFs, browser XSS filters)
  • Memory corruptions  
  • IDN homograph attacks 
  • Normalization vulnerabilities 


Case Study CVE-2019-19844

Now let's get to CVE-2019-19844, an interesting vulnerability. Let's start with the advisory.

Django's password-reset form uses a case-insensitive query to retrieve accounts matching the email address requesting the password reset. Because this typically involves explicit or implicit case transformations, an attacker who knows the email address associated with a user account can craft an email address which is distinct from the address associated with that account, but which -- due to the behavior of Unicode case transformations -- ceases to be distinct after case transformation, or which will otherwise compare equal given database case-transformation or collation behavior. In such a situation, the attacker can receive a valid password-reset token for the user account.
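To make the advisory concrete, here is a minimal sketch of the flawed flow in plain Python, with no Django involved. The user table, the addresses, and the function names are all made up for illustration, and uppercasing stands in for whatever case transformation the application or database applies.

```python
# Hypothetical in-memory "user table"; the address is an illustrative assumption.
USERS = {"mike@mysite.example": {"username": "mike"}}


def find_users_case_insensitive(email):
    """Stand-in for a case-insensitive lookup that relies on uppercasing."""
    return [stored for stored in USERS if stored.upper() == email.upper()]


def vulnerable_reset(submitted_email):
    """Return the addresses a naive reset flow would mail a token to."""
    recipients = []
    for _stored in find_users_case_insensitive(submitted_email):
        # Bug: the token goes to the *submitted* address, not the stored one.
        recipients.append(submitted_email)
    return recipients


stored_email = "mike@mysite.example"
# Same address, but with a dotless i (U+0131) in the domain.
attacker_email = "mike@mys\u0131te.example"

print(attacker_email == stored_email)    # False: the strings are distinct
print(vulnerable_reset(attacker_email))  # the lookup matches, the attacker's address gets the token
```

The two addresses are different strings, yet the lookup treats them as the same account, and the reset mail is addressed to whatever the requester typed.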

Patch Analysis

Here is the vulnerable version. Look at the final return statement.

class ReadOnlyPasswordHashWidget(forms.Widget):
    template_name = 'auth/widgets/read_only_password_hash.html'
    read_only = True

# ... (unrelated code omitted) ...

class PasswordResetForm(forms.Form):
    # ... (fields and other methods omitted) ...

    def get_users(self, email):
        """...
        that prevent inactive users and users with unusable passwords from
        resetting their password.
        """
        email_field_name = UserModel.get_email_field_name()
        # Case-insensitive lookup on the email column.
        active_users = UserModel._default_manager.filter(**{
            '%s__iexact' % UserModel.get_email_field_name(): email,
            'is_active': True,
        })
        # Every user the database considers a match is returned as-is.
        return (u for u in active_users if u.has_usable_password())

In the patched version, we have a new function and a different return statement.

def _unicode_ci_compare(s1, s2):
    """
    Perform a case-insensitive comparison of two identifiers, using the
    recommended algorithm from Unicode Technical Report 36, section
    ...
    """
    return unicodedata.normalize('NFKC', s1).casefold() == unicodedata.normalize('NFKC', s2).casefold()

    # New return statement at the end of get_users():
    return (
        u for u in active_users
        if u.has_usable_password() and
        _unicode_ci_compare(email, getattr(u, email_field_name))
    )

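As you can see, two things changed in the patched version. First, the email address stored in the database is used instead of the one supplied by the user. Second, _unicode_ci_compare normalizes both strings (NFKC, then casefold) before comparing them, so the submitted address has to really match the stored one.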
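Here is a small standalone sketch of why that comparison helps. The function unicode_ci_compare mirrors the one-liner from the patch; naive_ci_equal and the two addresses are my own illustrative assumptions, not Django code.

```python
import unicodedata


def naive_ci_equal(s1, s2):
    """Case-insensitive comparison via uppercasing (the risky pattern)."""
    return s1.upper() == s2.upper()


def unicode_ci_compare(s1, s2):
    """Same logic as the patched helper: NFKC-normalize, then casefold."""
    return (unicodedata.normalize("NFKC", s1).casefold()
            == unicodedata.normalize("NFKC", s2).casefold())


stored = "mike@mysite.example"        # hypothetical address in the database
crafted = "mike@mys\u0131te.example"  # dotless i (U+0131) in the domain

print(naive_ci_equal(stored, crafted))                    # True
print(unicode_ci_compare(stored, crafted))                # False
print(unicode_ci_compare(stored, "Mike@MySite.Example"))  # True
```

The dotless i uppercases to a plain "I", so the naive check is fooled, while casefolding leaves it alone and the crafted address no longer matches. Ordinary case differences still compare equal.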

Exploiting Unicode Case Transformations

The most common way to abuse these issues is through forgot-password mechanisms, because domains and emails are usually lowercased before use. Before we continue, please read this article.

Pay special attention to this part:

One lesser-known occurrence is Unicode Case Mapping Collisions. Loosely speaking, a collision occurs when two different characters are uppercased or lowercased into the same character. This effect is commonly found at the boundary between two different protocols, like email and domain names.

Here is a list of collisions suggested by this article. 

💥 Uppercase Transformation Collisions

Char   Code Point   Output Char
ß      0x00DF       SS
ı      0x0131       I
ſ      0x017F       S
ﬀ      0xFB00       FF
ﬁ      0xFB01       FI
ﬂ      0xFB02       FL
ﬃ      0xFB03       FFI
ﬄ      0xFB04       FFL
ﬅ      0xFB05       ST
ﬆ      0xFB06       ST

💥 Lowercase Transformation Collisions

Char   Code Point   Output Char
K      0x212A       k


I didn't fuzz for more because, for my test case, the character "ı" is enough, and you will find out why. Now that we know these collisions can happen, we can write some test cases.
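If you do want to fuzz for more, a brute-force pass over every code point is cheap. The sketch below is one way to do it in Python 3.7+ (the helper name find_ascii_collisions is mine). It keeps any non-ASCII character whose transformed form is pure ASCII, which is the kind of collision that matters for emails and domains.

```python
import sys


def find_ascii_collisions(transform):
    """Map each non-ASCII character to the pure-ASCII string `transform` turns it into."""
    collisions = {}
    for code_point in range(0x80, sys.maxunicode + 1):
        char = chr(code_point)
        result = transform(char)
        # Keep only characters that change and end up as plain ASCII.
        if result != char and result.isascii():
            collisions[char] = result
    return collisions


upper_hits = find_ascii_collisions(str.upper)
lower_hits = find_ascii_collisions(str.lower)

for char, result in sorted(upper_hits.items()):
    print(f"upper: U+{ord(char):04X} {char!r} -> {result!r}")
for char, result in sorted(lower_hits.items()):
    print(f"lower: U+{ord(char):04X} {char!r} -> {result!r}")
```

The exact hits depend on the Unicode tables shipped with your interpreter, so treat the output as specific to your Python version. The same loop works with str.casefold or with a round trip through your database if you want to compare behaviors.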

I tested Python, PHP, JavaScript, and C#.


// Chrome 79 / Node.js
'\u0131'.toUpperCase() === 'i'.toUpperCase()   // true

# Python 3.8
'\u0131'.upper() == "i".upper()   # True

// .NET Core 3.1
"\u0131".ToUpper() == "i".ToUpper()

// PHP 7.4
strtoupper('i') == strtoupper('ı');



Interestingly, PHP 7.4 and .NET Core 3.1 don't look vulnerable, but both Python and JavaScript are. What makes this so interesting is that the issue is bigger than a single web framework, and probably a lot more programming languages out there are prone to these attacks.



To demonstrate exploitation, I've built a vulnerable Docker image based on this repo. You can download it from my GitHub here.

All you have to do is run the following commands.

docker-compose run --service-ports web python manage.py migrate --no-input

docker-compose run --service-ports web python manage.py createsuperuser [email protected] --username 0xsha

Then type the password two times. 

Finally, head over to http://localhost:8000/accounts/password-reset

Enter [email protected]ıo and hit Enter (note that the "ı" is not a regular "i" but the dotless variant, U+0131).

Now check the console, especially the email part: you'll see that the password-reset token was sent to the malicious address.


Here is the video for demo lovers. 

As you can see, it accepts our malicious email and sends a password reset token to us. Well played.


Hunting for variation enter CodeQL


QL is under active development and, in my opinion, is one of the very few static analysis tools that make sense in the modern era. CodeQL helps you explore code quickly to find and eradicate all variants of a vulnerability before they become a problem. By automating variant analysis, CodeQL enables product security teams to find zero-days and variants of critical vulnerabilities.

As you may have noticed, this particular vulnerability type is not Django-specific. It lives at the language level, in how default functions (upper(), strtoupper(), etc.) handle non-ASCII input. So as a bug bounty hunter, you can recheck your entire bounty pipeline for this vulnerability.

In the following days, I'll be in touch with the GitHub Security Lab folks to develop a generic query to find variants of this vulnerability. Here is what we can do.

  1.  Check if the input is attacker-controlled
  2.  Check there are no normalization functions (e.g., unicodedata.normalize in Python)
  3.  Check the data is case-transformed (e.g., upper(), lower() in Python)
  4.  Check the data is passed to an exploitable sink (e.g., smtplib's sendmail() in Python)
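Until a real CodeQL query exists, a rough approximation of steps 2 and 3 can be sketched with Python's ast module. This is only a heuristic of my own, not CodeQL and not a data-flow analysis: it flags upper()/lower() calls inside functions that never call normalize() or casefold(), and it knows nothing about where the input comes from (step 1) or where it ends up (step 4). The function name and the sample code are illustrative assumptions.

```python
import ast

CASE_METHODS = {"upper", "lower"}        # step 3: case transformations
NORMALIZERS = {"normalize", "casefold"}  # step 2: signs of Unicode-aware handling


def flag_case_transforms(source):
    """Return sorted (line, method) pairs for case transforms in functions that never normalize."""
    findings = set()
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        # Collect every method-style call (something.attr(...)) in this function.
        calls = [
            (node.lineno, node.func.attr)
            for node in ast.walk(func)
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)
        ]
        if {attr for _, attr in calls} & NORMALIZERS:
            continue  # the function normalizes somewhere, so skip it
        findings.update((line, attr) for line, attr in calls if attr in CASE_METHODS)
    return sorted(findings)


SAMPLE = '''
def lookup(users, email):
    return [u for u in users if u.upper() == email.upper()]

def safe_lookup(users, email):
    import unicodedata
    key = unicodedata.normalize("NFKC", email).casefold()
    return [u for u in users if unicodedata.normalize("NFKC", u).casefold() == key]
'''

print(flag_case_transforms(SAMPLE))  # [(3, 'upper')]
```

Expect plenty of false positives from something this crude. It is meant for triage, to point you at code worth reading by hand.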

I'll update this post when I have news about queries, till then you can find similar vulnerabilities manually or by using simple fuzzing scripts. 


Even reasonably secure frameworks like Django have vulnerabilities. All you have to do is look for them. Character encoding and transformation can lead to serious security vulnerabilities, and I believe this topic has a lot of research potential. Just like HTTP request smuggling, which recently got more attention due to this research, transformation and encoding issues are ages old but still full of mystery and surprises. Guess how many bug bounty sites are vulnerable? ;) Due to the nature of character encoding issues, it's not that hard to build very specific fuzzers to perform more tests on both databases and programming languages.

Happy hunting. 

Till Then Luvs 

