Python: String Concatenation
Posted on 19 Jul
I recently revisited my notes on string concatenation and, alongside a web search, found a great deal of debate about which method is best. Even the Python wiki page that I used to point people to in order to prove my claim (that using + (plus) is the least efficient) now carries a warning that this may no longer be true. (Note: This does not apply to Python 3.0+; in Python 3, plus has been optimized to not use intermediate strings when appending multiple strings.)
Nobody disputes that some uses of plus are inefficient, such as the following way to concatenate a list of strings:
l = [str(i) for i in range(100)]
ss = ''
for s in l:
    ss += s
The more efficient (and more intuitive) way would be to do the following:
ss = ''.join(l)
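To get a feel for the difference on your own machine, a minimal sketch using the standard library's timeit module could look like the following. The function names concat_loop and concat_join are made up for illustration, and the timings depend on the interpreter version and hardware, so no numbers are shown here.

```python
import timeit

l = [str(i) for i in range(100)]

def concat_loop(strings):
    # Repeated += may create a new intermediate string on each iteration.
    ss = ''
    for s in strings:
        ss += s
    return ss

def concat_join(strings):
    # join builds the final string in a single call.
    return ''.join(strings)

# Both approaches must produce the same string.
assert concat_loop(l) == concat_join(l)

# Time 10000 calls of each; absolute numbers vary by machine and version.
loop_time = timeit.timeit(lambda: concat_loop(l), number=10000)
join_time = timeit.timeit(lambda: concat_join(l), number=10000)
print('loop += : %.4f s' % loop_time)
print('join    : %.4f s' % join_time)
```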
However, the debate is over which of the following cases is the most efficient:
# plus
ss = l[0] + l[1] + l[2]  # + ... through the last element of l
# join
ss = ''.join(l)
# string formatting
ss = ('%s' * len(l)) % tuple(l)
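As a quick sanity check that the three styles are interchangeable, here is a small sketch on a three-element list. The list contents and the names ss_plus, ss_join and ss_format are chosen for illustration only.

```python
l = ['a', 'b', 'c']

# plus: every element is named explicitly in the expression
ss_plus = l[0] + l[1] + l[2]

# join: a single call on the separator string
ss_join = ''.join(l)

# string formatting: one %s placeholder per element
ss_format = ('%s' * len(l)) % tuple(l)

# All three styles build the same string.
assert ss_plus == ss_join == ss_format == 'abc'
```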
I found various published experiments that jump to conclusions about the speed of the various methods without producing statistically significant numbers, so I decided to run my own little experiment.
The test code used to run the experiments is published on GitHub. Reading through the code, you can see that it tests the three styles from the last code block in the background above.
The experiments were run on a hex-core, hyper-threaded 64-bit server on Python 2.7 in an OpenVZ virtual machine running Debian 7 (wheezy).
Concatenating strings with the plus operator is slightly faster when you are joining fewer than 15 elements. Overall, though, join is the fastest method, followed by %s string formatting. I still do not recommend the plus operator on Python 2.7. However, it is no longer as much of a pet peeve of mine as it used to be.
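If you want to look for the crossover point on your own setup, something along these lines might serve as a starting point. This is not the test code published on GitHub, just a rough sketch built on the standard timeit module. The helper name time_styles, the repeat counts and the list sizes are all assumptions, and the plus statement includes the cost of indexing into l.

```python
import timeit

def time_styles(n, number=5000):
    """Return timings in seconds for plus, join and %s formatting on n strings."""
    setup = 'l = [str(i) for i in range(%d)]' % n
    statements = {
        # Spell out l[0] + l[1] + ... so that no loop overhead is measured.
        'plus': 'ss = ' + ' + '.join('l[%d]' % i for i in range(n)),
        'join': "ss = ''.join(l)",
        'format': "ss = ('%s' * len(l)) % tuple(l)",
    }
    return dict(
        # Keep the best of 5 repeats to reduce noise from other processes.
        (name, min(timeit.repeat(stmt, setup=setup, repeat=5, number=number)))
        for name, stmt in statements.items()
    )

for n in (2, 5, 10, 15, 20, 50):
    timings = time_styles(n)
    fastest = min(timings, key=timings.get)
    print('%3d elements: fastest=%s %r' % (n, fastest, timings))
```

The results will differ between Python versions and machines, so treat any single run as a hint and not as proof.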
One thing worth mentioning that many people do not know about: in Python (much like C), string literals that are placed right after each other are automatically concatenated by the interpreter:
s = 'abcde' \
    'fg' 'hijk'
s == 'abcdefghijk'  # True
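A common variant, sketched here with the same strings, wraps the literals in parentheses so that no backslash is needed. It also shows that only literals are joined this way; the names prefix and t are introduced just for illustration.

```python
# Adjacent literals inside parentheses need no backslash continuation.
s = ('abcde'
     'fg'
     'hijk')
assert s == 'abcdefghijk'

# The automatic joining applies to literals only; names need an operator.
prefix = 'abcde'
t = prefix + 'fg' 'hijk'  # 'fg' 'hijk' becomes one literal, then + is applied
assert t == 'abcdefghijk'
```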