
Errors resulting from unicode normalisation #137465

@gertjanvanzwieten


Bug report

Bug description:

The CPython interpreter appears to apply some form of Unicode normalization to identifiers, such as variable and keyword argument names, but not to string literals. This leads to surprising errors, as demonstrated in the following code. Although the code only ever uses the combination d + U+0307 (COMBINING DOT ABOVE) to form a dotted ḋ, the occurrences written as identifiers are altered to U+1E0B (LATIN SMALL LETTER D WITH DOT ABOVE), resulting in a KeyError and a TypeError. Is this behaviour intended?
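To see which code points are involved, the sketch below restates the key part of the report with explicit escapes, so the result does not depend on how an editor or browser stores the characters. It assumes the rule stated in the Python language reference (and PEP 3131) that identifiers are converted to NFKC while parsing, whereas string literals keep their code points as written. `eval` is used only to get a decomposed identifier into the parsed source without relying on the file's encoding.

```python
import unicodedata

decomposed = "d\u0307"  # d + U+0307 COMBINING DOT ABOVE, as typed in the report
composed = "\u1e0b"     # U+1E0B LATIN SMALL LETTER D WITH DOT ABOVE

# String literals keep their code points, so the two spellings differ...
print(decomposed == composed)                                 # False
# ...but they are equivalent under NFKC normalization.
print(unicodedata.normalize("NFKC", decomposed) == composed)  # True

# A keyword argument name is an identifier: the decomposed spelling is parsed
# and stored in its NFKC (composed) form, which becomes the dictionary key.
bad_dict = eval(f"dict({decomposed}=1)")
print([hex(ord(ch)) for ch in next(iter(bad_dict))])  # ['0x1e0b']
print(decomposed in bad_dict, composed in bad_dict)   # False True
```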

good_dict = {'ḋ': 1}
# {'ḋ': 1} <--- not changed

good_dict['ḋ']
# ok

bad_dict = dict(ḋ = 1)
# {'ḋ': 1} <--- changed

#bad_dict['ḋ'] # uncomment to trigger error
# KeyError: 'ḋ'

def f(ḋ):
    print(ḋ)

f(ḋ=1)
# ok

#f(**good_dict) # uncomment to trigger error
# TypeError: f() got an unexpected keyword argument 'ḋ'

f(**bad_dict)
# ok
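When string keys built at runtime have to match keyword argument names, one possible workaround is to normalize the keys to NFKC before unpacking or lookup. This is a sketch under the assumption that NFKC is the form the parser uses for identifiers and that every key is a string; the helper name `nfkc_keys` is hypothetical and not part of the standard library.

```python
import unicodedata


def nfkc_keys(mapping):
    """Return a copy of mapping with every key NFKC-normalized (hypothetical helper).

    Assumes all keys are str. If two keys normalize to the same string,
    the later one wins, as in any dict comprehension.
    """
    return {unicodedata.normalize("NFKC", key): value for key, value in mapping.items()}


def f(ḋ):
    # However this parameter name is typed, the parser stores it as U+1E0B.
    return ḋ


good_dict = {"d\u0307": 1}  # decomposed key, as written in a string literal

try:
    f(**good_dict)  # key 'd\u0307' does not match the parameter name '\u1e0b'
except TypeError as exc:
    print("unnormalized keys:", exc)

print(f(**nfkc_keys(good_dict)))  # 1
```

NFKC also folds compatibility characters (for example, the ligature U+FB01 becomes "fi"), which mirrors how identifiers behave but can merge keys that were distinct as plain strings.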

CPython versions tested on:

3.13, 3.14, 3.12, 3.11

Operating systems tested on:

Linux

Metadata


Assignees: no one assigned

Labels: pending (the issue will be closed if no feedback is provided), topic-unicode, type-bug (an unexpected behavior, bug, or error)

Projects: none

Milestone: none

Relationships: none yet

Development: no branches or pull requests
