Loop vs Map vs List Comprehension

As a Python developer I have seen many different preferences on this topic. Preferences aside, I set out to find which approach is faster in which situations.

TL;DR

If you require a list of results, almost always use a list comprehension. If no results are required, a simple loop is simpler to read and faster to run. Never use the builtin map, unless it's more aesthetically appealing for that piece of code and your application does not need the speed improvement.

No Result Required

The first pattern I have seen is the use of map or a list comprehension in place of a standard loop when no result is required, so the resulting list (returned by Python’s map function or the list comprehension) is ignored. For example, the following:

for entry in entries:
    process(entry)

vs

map(process, entries)
[process(entry) for entry in entries]

The results are as follows. There were three variations of the test: the first is a single loop (O(n)), the second is a loop within a loop (O(n^2)), and the third is three nested loops (O(n^3)). The test code can be seen on GitHub.
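
A caveat if you try the snippets above on Python 3 rather than the Python 2 used here: map returns a lazy iterator there, so a bare map call does no work until it is consumed. The sketch below is not the published test code; process and entries are made-up stand-ins that simply count calls.

```python
calls = []

def process(entry):
    # Hypothetical stand-in for real work: record that the entry was seen.
    calls.append(entry)

entries = list(range(5))

# Plain loop: runs process once per entry and builds no list.
for entry in entries:
    process(entry)
loop_calls = len(calls)

# List comprehension: also runs process once per entry, but builds a
# throwaway list of return values (all None here).
calls.clear()
discarded = [process(entry) for entry in entries]
comp_calls = len(calls)

# map: lazy on Python 3, so nothing runs until the iterator is consumed.
calls.clear()
lazy = map(process, entries)
map_calls_before = len(calls)
list(lazy)
map_calls_after = len(calls)

print(loop_calls, comp_calls, map_calls_before, map_calls_after)
```

So on Python 3 the map variant of this pattern needs to be consumed (for example with list()) before it is comparable to the other two.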

Result Required

This is the traditional pattern, where the generated list is useful. The test measures how fast each approach is in Python. For example:

results = []
for entry in entries:
    results.append(process(entry))

vs

results = map(process, entries)
results = [process(entry) for entry in entries]

As in the previous test, three variations were tested: O(n), O(n^2), and O(n^3).
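
A minimal sketch of how the O(n) variation could be timed with the standard timeit module. This is not the test code published on GitHub; process, the input size, and the repeat counts are assumptions chosen for illustration, and map is wrapped in list() so the sketch also runs on Python 3.

```python
import timeit

def process(entry):
    # Hypothetical workload; the real tests may use something heavier.
    return entry * 2

entries = list(range(1000))

def with_loop():
    results = []
    for entry in entries:
        results.append(process(entry))
    return results

def with_map():
    # list() forces evaluation, since map is lazy on Python 3.
    return list(map(process, entries))

def with_comprehension():
    return [process(entry) for entry in entries]

def benchmark(func, repeat=3, number=20):
    # Best of several repeats, to reduce noise from other processes.
    return min(timeit.repeat(func, repeat=repeat, number=number))

timings = {}
for func in (with_loop, with_map, with_comprehension):
    timings[func.__name__] = benchmark(func)

for name in sorted(timings, key=timings.get):
    print("%-20s %.6fs" % (name, timings[name]))
```

The absolute numbers depend on the machine and interpreter, so only the relative ordering is worth comparing.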

Conclusions

If you require a list of results, almost always use a list comprehension. If no results are required, a simple loop is simpler to read and faster to run. Never use the builtin map, unless it's more aesthetically appealing for that piece of code and your application does not need the speed improvement.

7zip vs Gzip … compression and speed

Since we work with Twitter, this comparison uses Twitter as its data source. We randomly selected 5k users and used their public tweets for this test.

Setup

The setup is not extremely important, as we are more concerned with the ratios for our conclusions. For the sake of completeness, the tests were run in Python 2.7.3 (GCC 4.7.2) with the latest version of pylzma from PyPI (0.4.4) and gzip from the standard Python library. The hardware was an E5-2620 CPU @ 2.00GHz, and the software was allowed to use the full capacity of this hardware.
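
A rough sketch of what measuring one user's data could look like. It is an approximation under stated assumptions, not the code used for these tests: it uses the standard library lzma module (Python 3.3+) in place of pylzma, and a synthetic, highly repetitive payload instead of real tweets, so the numbers it prints will not match the ones reported in this post.

```python
import lzma
import time
import zlib

def measure(compress, payload):
    # Return (compressed size in bytes, elapsed seconds) for one payload.
    start = time.perf_counter()
    compressed = compress(payload)
    elapsed = time.perf_counter() - start
    return len(compressed), elapsed

# Hypothetical stand-in for one user's tweets.
payload = b"just setting up my account #hello @world " * 2000

results = {}
for name, compress in (("zlib", zlib.compress), ("lzma", lzma.compress)):
    results[name] = measure(compress, payload)

for name, (size, seconds) in results.items():
    ratio = 100.0 * size / len(payload)
    print("%s: %d bytes (%.1f%%), %.1fms" % (name, size, ratio, seconds * 1000))
```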

Data

As mentioned, a random sample of 5k users was selected for this experiment. The following are some stats about the data.

Users:                      5,000
Avg # of tweets:            1,433
Std dev # of tweets:        1,018
Avg uncompressed size:      1,228,518 bytes (1.2 MB)
Std dev size:               874,973 bytes (0.8 MB)

Compression

There is no doubt that lzma compresses to a much smaller size than zlib. In these tests we want to see exactly how much better the compression ratio actually is.

lzma avg size: 100,286 bytes (8.2%, 97 KB)
zlib avg size: 142,456 bytes (11.6%, 139 KB)

Speed

lzma compresses better than BZ2 and faster, but it is well known that zlib compresses faster. Here is the difference in compression speed on our dataset.

lzma  3884.27s (776.8ms / user)
zlib   184.40s (36.9ms / user)

Note: the lzma test used 2x as much memory as the zlib test.

Conclusion

With our dataset, lzma compressed down to an average of 8% of the original size, while zlib compressed to 12%. In measurable numbers, for 5,000 users’ tweets, lzma would save about 200 MB, an average saving of 41 KB per user. Regarding compression speed, lzma (7zip) took about 1 hour longer, an average of 0.7s more per user.

To put things into perspective, if we were processing 1 million users, gzip would finish compressing roughly 9 days sooner, but would carry an extra storage overhead of roughly 40 GB.
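
The projection can be reproduced from the measured averages. In the sketch below, only the variable names and target_users are introduced for illustration; every figure is taken from the results above. The exact values come out slightly under the rounded ones quoted in the text.

```python
# Measured on the 5,000-user sample (figures from the results above).
users = 5000
lzma_avg_bytes = 100286
zlib_avg_bytes = 142456
lzma_total_seconds = 3884.27
zlib_total_seconds = 184.40

# Per-user difference between the two methods.
saved_bytes_per_user = zlib_avg_bytes - lzma_avg_bytes
extra_seconds_per_user = (lzma_total_seconds - zlib_total_seconds) / users

# Linear extrapolation to 1 million users.
target_users = 1000000
extra_days = extra_seconds_per_user * target_users / 86400.0
extra_gib = saved_bytes_per_user * target_users / float(1024 ** 3)

print("extra lzma time:    %.1f days" % extra_days)  # about 8.6 days
print("extra zlib storage: %.1f GiB" % extra_gib)    # about 39.3 GiB
```

This assumes both time and size scale linearly with the number of users, which is the same assumption the estimate in the text makes.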

Neither the compression speed nor the size is negligible in this case, so depending on your specific needs you may pick one over the other. Generally though, for many people and use cases, disk space is less of a concern than speed.

Introducing A Smarter Dashboard For Identifying The ROI of Social Media

This is an update I’ve been waiting for. It’s always been a challenge to quantify the actual value of a social lead and track that value over time as in a traditional sales model. This latest update provides LeadSift clients with exactly that! Today we’re excited to share with you one of the latest things we’ve been working on at LeadSift: the launch of an ROI-driven dashboard. LeadSift now provides clients with insights and information that will help them better understand their sales funnel and track the value of their social leads. Let’s dive in so you can see exactly what I’m talking about:

Accessing the LeadSift Dashboard
For those of you who have already accessed the analytics section, some of this is just a refresher. For anyone who is responsible for delivering success metrics for their campaigns, this is going to be very helpful. To start, you simply click “Analytics” in the top corner of the LeadSift platform.

Once you’ve clicked “Analytics” you will be taken to the dashboard where all the magic begins.

Monitor Social Leads In Your Pipeline
As a tool that is committed to providing businesses and brands with relevant social leads, we now arm you with the ability to assign and track the dollar value of leads through the entire engagement cycle. You can assign the value of a lead, track its progress through the funnel and assign value to the various types of engagement you may have. To see how many leads have come through the LeadSift platform, select the appropriate date range and you’ll see the analytics graph.

The dashboard will highlight how many posts were scanned by LeadSift to identify the total number of social leads in your pipeline. What gets me most excited is the opportunity index. The opportunity index is the total number of leads in the pipeline times the value of a converted social lead. For example, if you’re selling a product worth $100 and there are 1,000 leads in the pipeline, the total value of the opportunities delivered by LeadSift is $100,000. In the example above, the total value of the leads delivered by LeadSift was $115,370. That’s the potential ROI.

Track The Value Of The Social Leads In Your Pipeline
We recognize that every business and brand will value social leads differently. Thus, when we created the opportunity index we wanted to ensure that you had the ability to customize the value based on your situation. To set the value of a lead, you simply click “edit” at the bottom of the dashboard.

Once here, it’s time for you to identify the base value of a social lead. Ask yourself what the Customer Lifetime Value (CLTV) would be if you sold one of your products or services to one of the prospects. That value is what you would then place in the “Base lead value” section. Social leads deliver these potential customers to your sales pipeline. You can assign the probability of closing the deal on social by assigning values based on your interactions with them (demo, pitch, proposal submission, etc.); that is how a sales pipeline is built. Using this logic, identify the close rate probability for each social engagement, as each interaction helps move that user further into your sales funnel.

The value of the leads you’ve interacted with is then calculated based on these figures and delivered back to the main dashboard.

Understand Who & What Is In Your Sales Pipeline
Finally, we’ve given you the ability to see the social leads that are most likely to convert and the demographics for the leads we’ve delivered. This can help you drill down and identify which leads require some serious attention while the demographics provide insight that can guide a series of marketing and communications decisions.

Understand The Success Of Your Efforts
The new LeadSift dashboard also highlights key insights around how your audience is interacting with your content. It tracks the levels of engagement, the click-through rates on your content and even your average response time. The combination of these insights gives you clear indicators of which areas you need to improve in your social media efforts and arms you with the knowledge to deliver richer content and better customer service.

For the first time, you can now attribute a value to social leads and truly track the ROI of social media. We’re always working hard to create new features that will help you drive real results through social media. We’re continuing to evolve and develop LeadSift and would love to hear what you want in the future.

Sign up for a 7-day trial today!

LeadSift Now Offering Self Serve Software For Small And Medium Sized Businesses

Our new self-serve system enables small and medium-sized businesses to tap into the power of social data, identify current and potential customers, grow their audience, and distribute targeted messages. By delivering quality leads with a classification metric similar to a Klout Score, this LeadSift platform will allow SMBs to bring in new business, fill sales quotas, build meaningful relationships through engagement, save time and gather key insights about their audience.

Our latest release is an easy to use platform that we’re excited to share. Customers have already been using our self-serve platform and have accomplished key goals and objectives as a result. Here’s what two of our customers had to say:

“LeadSift is an easy to use platform that has added to our sales channel mix and provided us with new leads we would’ve otherwise missed.” – Aaron Hanson, Account Director at Exygy

“Engaging with customers around custom products that make them stick is very important for StickerYou. LeadSift helps to clear through the clutter of conversations in social media and find only most relevant conversations.” – Brad Lister, Sales and Marketing Manager @ StickerYou.com

To help illustrate the power of our self-serve platform, imagine if you were to take one of the famous Where’s Waldo books and, on every page, Waldo was already circled. Now replace the books with your industry, and Waldo with your customers; that’s the power of using LeadSift. We reveal and deliver social engagement opportunities directly to our customers to save them time and money.

Here are a few other ways our self-serve platform can help:

  • Quality leads – LeadSift ranks leads by a LeadScore calculated through natural language processing, which currently uses over 50 signals to identify and score leads from social data and accurately classify them as hot, warm or cold.
  • Target customers in specific areas – Geo-target leads by choosing their location, from city-specific to global target areas.
  • Manage and track social leads – Engage with leads directly through the platform and keep track of the results of each engagement.
  • Personalize engagement – Advanced customer profile information is associated with each lead by examining the lead’s historical posts to find psychographic information.
  • Save valuable time – LeadSift cuts through the noise by sifting through millions of Twitter conversations to find only the most relevant business opportunities.

Interested in signing up for LeadSift? Click here and get started today!

LeadSift Launches App Integration in HootSuite

Our biggest goal here at LeadSift is to help businesses find relevant business leads on social media. We want to make it easier for business owners, community managers and communications teams to find meaningful results through social media. We’re excited to announce an app integration partnership with HootSuite, the most widely adopted social relationship platform. Released today, it is going to put our technology into the hands of new users around the world.

The LeadSift app is now available in the HootSuite App Directory, bringing in a new layer of social engagement, social selling and brand monitoring. This new application will allow HootSuite users to build the traditional search stream column but with an added layer of intelligence that will cut through the noise to deliver both timely and relevant business opportunities.

Each lead in the HootSuite stream column is ranked with a LeadScore of 1-100 to indicate its readiness and quality, and is then categorized based on the type of lead. Our software will scan through millions of conversations happening online to find the most relevant opportunities, while also differentiating between customer service, social selling, customer churn and more.

HootSuite is a social media management system for businesses and organizations to collaboratively execute campaigns across multiple social networks from one secure, web-based dashboard. Launch marketing campaigns, identify and grow audience, and distribute targeted messages using HootSuite’s unique social media dashboard. Our integration with this platform will help businesses save time and improve their chances of driving meaningful and measurable results directly from social media.

Here’s how it will work:

1. a) If you are an existing HootSuite user, click here to login and directly install the LeadSift app. Skip to step 4.

b) If you do not have a HootSuite account, sign-up here.

2. Once you’re all set up, visit the App Directory from the left navigation bar.

3. Find the LeadSift App and install it, or click this link.

4. Within the LeadSift stream, create a search with keywords specific to your industry and add it as a stream in your HootSuite dashboard. Once added, use HootSuite to engage immediately or archive the lead for future engagement.

And from there you’re on your way! It takes less than 30 seconds to get up and running. If you have any questions about our integration with HootSuite, feel free to reach out – Happy to chat!

Python: String Concatenation

I recently revisited my notes on string concatenation and, alongside a web search, found a great deal of debate about which method is best. Even the Python wiki page that I used to point people to in order to prove my claim (that using + (plus) is least efficient) now has a warning that this may no longer be true. (Note: this does not apply to Python 3.0+; in Python 3, plus has been optimized to not use intermediate strings when appending multiple strings.)

Background

There is no debate that some uses of plus are inefficient, such as the following way to concatenate a list of strings:

l = [str(i) for i in range(100)]
ss = ''
for s in l:
    ss += s

The more efficient (and intuitive) way would be to do the following:

ss = ''.join(l)

However, the debate is over which of the following cases is most efficient:

# plus
ss = l[0] + l[1] + l[2] # + ... to l[99]
# join
ss = ''.join(l)
# string formatting
ss = ('%s' * len(l)) % tuple(l)

I found various published experiments that do not produce statistically significant numbers yet jump to conclusions about the speed of the various methods, so I decided to run my own little experiment.

Setup

The test code used to run the experiments is published on GitHub. Reading through the code, you can see that we are testing the three styles shown in the last code block of the Background section above.

The experiments were run on a hex-core, hyper-threaded, 64-bit server on Python 2.7 in an OpenVZ virtual machine running Debian 7 (wheezy).
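
For readers who want to try something similar, here is a minimal sketch using the standard timeit module. It is not the test code published on GitHub: the list length n, the repeat counts, and the names parts, expressions and namespace are all assumptions made for illustration. The plus expression is generated as source text so that it is a single chained expression, like the hand-written l[0] + l[1] + ... case.

```python
import timeit

# Arbitrary list length for illustration.
n = 10
parts = [str(i) for i in range(n)]

# Source text for each style; "plus" expands to parts[0] + parts[1] + ...
expressions = {
    "plus": " + ".join("parts[%d]" % i for i in range(n)),
    "join": "''.join(parts)",
    "format": "('%s' * len(parts)) % tuple(parts)",
}

namespace = {"parts": parts}

# Sanity check: evaluate each expression once to compare the outputs.
outputs = {}
for name, expr in expressions.items():
    outputs[name] = eval(expr, namespace)

# Best of three runs of 10,000 evaluations each.
timings = {}
for name, expr in expressions.items():
    runs = timeit.repeat(expr, globals=namespace, repeat=3, number=10000)
    timings[name] = min(runs)

for name in sorted(timings, key=timings.get):
    print("%-6s %.6fs" % (name, timings[name]))
```

Varying n would show where, if anywhere, the ordering changes on a given interpreter; results on Python 3 may differ from the Python 2.7 numbers discussed here.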

Results

Conclusions

Concatenating strings using the plus operator is slightly faster when you are concatenating fewer than 15 elements. Overall, though, the join operation is the fastest, followed by the %s string formatting. I am still sticking to not recommending the plus operator on Python 2.7. However, it is no longer as much of a pet peeve to me as it used to be.

One thing worth mentioning that many people do not know: in Python (much like C), string literals placed right after each other are automatically concatenated by the interpreter:

s = 'abcde' \
    'fg' 'hijk'

s == 'abcdefghijk' # True