Using lists when dealing with large datasets is not always the best choice. As the data grows, it can quickly consume your computer’s memory.

That’s when you should consider using Python’s Generators.

Let’s say you want to calculate the sum of 10 000 items:

my_dataset = [i for i in range(10000)] # list comprehension
print(sum(my_dataset))
#49995000

This would of course work, but the impact on memory would be very high, since the entire list is built and held in memory at once. We can do the same thing with a generator and then compare the sizes.

my_dataset_gen = (i for i in range(10000)) # generator expression
print(sum(my_dataset_gen))
#49995000

The key difference is that a generator computes the elements lazily, meaning it produces only one item at a time and only when asked for it.
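
You can see this lazy behaviour for yourself with the built-in next(), which asks a generator for its next item. A quick sketch (the three-item range is just for illustration):

gen = (i for i in range(3)) # generator expression

print(next(gen)) # 0
print(next(gen)) # 1
print(next(gen)) # 2
# one more next(gen) would raise StopIteration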

Now let’s compare the sizes of both results:

import sys

my_dataset = [i for i in range(10000)]
print(sum(my_dataset))

print(sys.getsizeof(my_dataset), 'bytes')
# 87 632 bytes

my_dataset_gen = (i for i in range(10000)) # generator expression
print(sum(my_dataset_gen))

print(sys.getsizeof(my_dataset_gen), 'bytes')
# 128 bytes

The list took 87 632 bytes, while the generator object took only 128 bytes (the exact numbers vary slightly between Python versions, but the gap stays just as dramatic). That surely is a difference worth considering.
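
One caveat to keep in mind: a generator can only be consumed once. After sum() has iterated through it, the generator is exhausted, and iterating it again produces nothing:

my_dataset_gen = (i for i in range(10000)) # generator expression

print(sum(my_dataset_gen)) # 49995000
print(sum(my_dataset_gen)) # 0, the generator is already exhausted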

Which one to use depends on the situation: a list supports indexing, len() and repeated iteration, while a generator gives those up in exchange for a tiny memory footprint. But knowing about this great Pythonic feature can save a lot of time and issues down the road.
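
The same laziness also applies to generator functions written with yield, which come in handy when the logic is too involved for a one-line expression. A minimal sketch of the same sum written that way (my_dataset_func is just an illustrative name):

def my_dataset_func():
    for i in range(10000):
        yield i # hands back one value at a time, like the expression above

print(sum(my_dataset_func())) # 49995000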