[Python] Python and concatenating strings

Bob Miller kbob at jogger-egg.com
Tue Feb 27 16:10:27 PST 2007


Rob Hudson wrote:

> There's lots of that.  Is there a faster way to build a string?

Keep in mind that Python strings are immutable.  The += operator
creates a new string and binds the variable to it.  In other words,
the Python code

   a += 'bcd'

is roughly equivalent to this C code:

    tmp = malloc(strlen(a) + strlen("bcd") + 1);
    strcpy(tmp, a);
    strcpy(tmp + strlen(a), "bcd");
    a = tmp;

(except that Python stores each string's length, so the equivalent of
strlen() runs in constant time).
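
To see the rebinding from the Python side, here is a small sketch.  The
names a and b are only for illustration; the point is that a second
reference to the original string still sees the old value after +=.

```python
# Illustration only: `a` and `b` are hypothetical names.
a = 'abc'
b = a          # second reference to the same string object

a += 'bcd'     # builds a new string and rebinds `a` to it

print(a)       # abcbcd
print(b)       # abc  (the original object was never modified)
print(a is b)  # False: `a` now names a different object
```

CPython can sometimes skip the copy when nothing else refers to the
string, but that is an implementation detail and not something the
language promises.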

> I know this is particularly slow in Java since it allocates only enough 
> for the string each time.  If you know you're going to do a lot of this, 
> they recommend StringBuilder.

Python has the StringIO and cStringIO modules, but in the test below
both are much slower than naive string concatenation.

    ~> cat stringbuf.py
    #!/usr/bin/python

    import profile
    import StringIO
    import cStringIO

    def test_string(nrep, ncat):
        for i in range(nrep):
            s = ''
            for j in range(ncat):
                s += 'word'

    def test_StringIO(nrep, ncat):
        for i in range(nrep):
            s = StringIO.StringIO()
            for j in range(ncat):
                s.write('word')
            s.getvalue()

    def test_cStringIO(nrep, ncat):
        for i in range(nrep):
            s = cStringIO.StringIO()
            for j in range(ncat):
                s.write('word')
            s.getvalue()

    test_string(10, 10)
    test_StringIO(10, 10)
    test_cStringIO(10, 10)

    profile.run('test_string(10, 1000)')
    profile.run('test_StringIO(10, 1000)')
    profile.run('test_cStringIO(10, 1000)')
    ~> python stringbuf.py | grep seconds
           15 function calls in 0.004 CPU seconds
        50065 function calls in 0.920 CPU seconds
        10035 function calls in 0.200 CPU seconds
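
One caveat about the numbers above: the profile module adds overhead to
every function call it records, and the StringIO versions make thousands
of calls where += makes almost none, so the gap may be overstated.  Below
is a rough sketch of the same comparison using timeit instead.  It is not
part of the original test: it assumes Python 3, where io.StringIO takes
the place of StringIO/cStringIO, and the function names and the
str.join variant are my own additions.

```python
import io
import timeit

def build_concat(ncat):
    # Naive approach: rebind s to a new string on every pass.
    s = ''
    for _ in range(ncat):
        s += 'word'
    return s

def build_stringio(ncat):
    # Buffer approach (assumes Python 3's io.StringIO).
    buf = io.StringIO()
    for _ in range(ncat):
        buf.write('word')
    return buf.getvalue()

def build_join(ncat):
    # Collect the pieces and join them once at the end.
    return ''.join('word' for _ in range(ncat))

if __name__ == '__main__':
    for func in (build_concat, build_stringio, build_join):
        # Time 100 runs of each builder with 1000 concatenations.
        seconds = timeit.timeit(lambda: func(1000), number=100)
        print('%-15s %.4f s' % (func.__name__, seconds))
```

All three functions return the same string, so only the timings differ.
The results will vary with the interpreter version and the machine, so
run it yourself before drawing conclusions.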


-- 
Bob Miller                              K<bob>
                                        kbob at jogger-egg.com

