Fastest way to pop N items from a large dict The Next CEO of Stack OverflowWhat is the fastest...
MessageLevel in QGIS3
Novel about a guy who is possessed by the divine essence and the world ends?
Is micro rebar a better way to reinforce concrete than rebar?
Elegant way to replace substring in a regex with optional groups in Python?
How do I go from 300 unfinished/half written blog posts, to published posts?
Preparing Indesign booklet with .psd graphics for print
Written every which way
Why do airplanes bank sharply to the right after air-to-air refueling?
Which tube will fit a -(700 x 25c) wheel?
Can you replace a racial trait cantrip when leveling up?
What can we do to stop prior company from asking us questions?
Would this house-rule that treats advantage as a +1 to the roll instead (and disadvantage as -1) and allows them to stack be balanced?
Is there a difference between "Fahrstuhl" and "Aufzug"
Inappropriate reference requests from Journal reviewers
Unreliable Magic - Is it worth it?
Does it take more energy to get to Venus or to Mars?
Are there any limitations on attacking while grappling?
Sending manuscript to multiple publishers
Why am I allowed to create multiple unique pointers from a single object?
What happened in Rome, when the western empire "fell"?
Why didn't Khan get resurrected in the Genesis Explosion?
Is it possible to search for a directory/file combination?
What connection does MS Office have to Netscape Navigator?
Do I need to enable Dev Hub in my PROD Org?
Fastest way to pop N items from a large dict
The Next CEO of Stack OverflowWhat is the fastest way to get the value of π?Fastest way to convert string to integer in PHPFastest way to determine if an integer's square root is an integerHow to randomly select an item from a list?How to remove items from a list while iterating?Fastest way to list all primes below NHow to remove the first Item from a list?Fastest way to check if a value exist in a listIs there any pythonic way to combine two dicts (adding values for keys that appear in both)?Fastest way to determine if an integer is between two integers (inclusive) with known sets of values
I have a large dict src
(up to 1M items) and I would like to take N (typical values would be N=10K-20K) items, store them in a new dict dst
and leave only the remaining items in src
. It doesn't matter which N items are taken. I'm looking for the fastest way to do it on Python 3.6 or 3.7.
Fastest approach I've found so far:
src = {i: i ** 3 for i in range(1000000)}
# Taking items 1 by 1 (~0.0059s)
dst = {}
while len(dst) < 20000:
item = src.popitem()
dst[item[0]] = item[1]
Is there anything better? Even a marginal gain would be good.
python python-3.x performance dictionary optimization
add a comment |
I have a large dict src
(up to 1M items) and I would like to take N (typical values would be N=10K-20K) items, store them in a new dict dst
and leave only the remaining items in src
. It doesn't matter which N items are taken. I'm looking for the fastest way to do it on Python 3.6 or 3.7.
Fastest approach I've found so far:
src = {i: i ** 3 for i in range(1000000)}
# Taking items 1 by 1 (~0.0059s)
dst = {}
while len(dst) < 20000:
item = src.popitem()
dst[item[0]] = item[1]
Is there anything better? Even a marginal gain would be good.
python python-3.x performance dictionary optimization
add a comment |
I have a large dict src
(up to 1M items) and I would like to take N (typical values would be N=10K-20K) items, store them in a new dict dst
and leave only the remaining items in src
. It doesn't matter which N items are taken. I'm looking for the fastest way to do it on Python 3.6 or 3.7.
Fastest approach I've found so far:
src = {i: i ** 3 for i in range(1000000)}
# Taking items 1 by 1 (~0.0059s)
dst = {}
while len(dst) < 20000:
item = src.popitem()
dst[item[0]] = item[1]
Is there anything better? Even a marginal gain would be good.
python python-3.x performance dictionary optimization
I have a large dict src
(up to 1M items) and I would like to take N (typical values would be N=10K-20K) items, store them in a new dict dst
and leave only the remaining items in src
. It doesn't matter which N items are taken. I'm looking for the fastest way to do it on Python 3.6 or 3.7.
Fastest approach I've found so far:
src = {i: i ** 3 for i in range(1000000)}
# Taking items 1 by 1 (~0.0059s)
dst = {}
while len(dst) < 20000:
item = src.popitem()
dst[item[0]] = item[1]
Is there anything better? Even a marginal gain would be good.
python python-3.x performance dictionary optimization
python python-3.x performance dictionary optimization
edited Mar 16 at 17:16
martineau
69.8k1092186
69.8k1092186
asked Mar 16 at 17:00
Ivailo KaramanolevIvailo Karamanolev
570615
570615
add a comment |
add a comment |
3 Answers
3
active
oldest
votes
This is a bit faster still:
from itertools import islice
def method_4(d):
result = dict(islice(d.items(), 20000))
for k in result: del d[k]
return result
Compared to other versions, using Netwave's testcase:
Method 1: 0.004459443036466837 # original
Method 2: 0.0034434819826856256 # Netwave
Method 3: 0.002602717955596745 # chepner
Method 4: 0.001974945073015988 # this answer
The extra speedup seems to come from avoiding transitions between C and Python functions. From disassembly we can note that the dict
instantiation happens on C side, with only 3 function calls from Python. The loop uses DELETE_SUBSCR
opcode instead of needing a function call:
>>> dis.dis(method_4)
2 0 LOAD_GLOBAL 0 (dict)
2 LOAD_GLOBAL 1 (islice)
4 LOAD_FAST 0 (d)
6 LOAD_ATTR 2 (items)
8 CALL_FUNCTION 0
10 LOAD_CONST 1 (20000)
12 CALL_FUNCTION 2
14 CALL_FUNCTION 1
16 STORE_FAST 1 (result)
3 18 SETUP_LOOP 18 (to 38)
20 LOAD_FAST 1 (result)
22 GET_ITER
>> 24 FOR_ITER 10 (to 36)
26 STORE_FAST 2 (k)
28 LOAD_FAST 0 (d)
30 LOAD_FAST 2 (k)
32 DELETE_SUBSCR
34 JUMP_ABSOLUTE 24
>> 36 POP_BLOCK
4 >> 38 LOAD_FAST 1 (result)
40 RETURN_VALUE
Compared with the iterator in method_2
:
>>> dis.dis(d.popitem() for _ in range(20000))
1 0 LOAD_FAST 0 (.0)
>> 2 FOR_ITER 14 (to 18)
4 STORE_FAST 1 (_)
6 LOAD_GLOBAL 0 (d)
8 LOAD_ATTR 1 (popitem)
10 CALL_FUNCTION 0
12 YIELD_VALUE
14 POP_TOP
16 JUMP_ABSOLUTE 2
>> 18 LOAD_CONST 0 (None)
20 RETURN_VALUE
which needs a Python to C function call for each item.
I was researching this! what a sync!
– Netwave
Mar 17 at 11:58
@Netwave Do you think this should now be the accepted answer?
– Ivailo Karamanolev
Mar 17 at 18:43
@IvailoKaramanolev, yes, we were searching for the fastest, and indeed this on is.
– Netwave
Mar 18 at 3:13
add a comment |
A simple comprehension inside dict
will do:
dict(src.popitem() for _ in range(20000))
Here you have the timing tests
setup = """
src = {i: i ** 3 for i in range(1000000)}
def method_1(d):
dst = {}
while len(dst) < 20000:
item = d.popitem()
dst[item[0]] = item[1]
return dst
def method_2(d):
return dict(d.popitem() for _ in range(20000))
"""
import timeit
print("Method 1: ", timeit.timeit('method_1(src)', setup=setup, number=1))
print("Method 2: ", timeit.timeit('method_2(src)', setup=setup, number=1))
Results:
Method 1: 0.007701821999944514
Method 2: 0.004668198998842854
3
so much for dict comprehension. Good one usingdict
!!!
– Jean-François Fabre♦
Mar 16 at 17:18
2
Thank you! Hopefully next moderator @Jean-FrançoisFabre ;)
– Netwave
Mar 16 at 17:19
1
for my solution, at least, I've upped all 10 times and it's still faster. The key is to make the shorter code possible and rely on native functions.
– Jean-François Fabre♦
Mar 16 at 17:23
4
You can shave a little more time off by saving the bound method first.f = d.popitem; return dict(f() for _ in range(20000))
.
– chepner
Mar 16 at 17:37
3
Usingitertools.islice
anditertools.repeat
is even a little faster still:dict(f() for f in islice(repeat(d.popitem), 20000))
.
– chepner
Mar 16 at 17:49
|
show 5 more comments
I found this approach slightly faster (-10% speed) using dictionary comprehension that consumes a loop using range
that yields & unpacks the keys & values
dst = {key:value for key,value in (src.popitem() for _ in range(20000))}
on my machine:
your code: 0.00899505615234375
my code: 0.007996797561645508
so about 12% faster, not bad but not as good as not unpacking like Netwave simpler answer
This approach can be useful if you want to transform the keys or values in the process.
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55199303%2ffastest-way-to-pop-n-items-from-a-large-dict%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
This is a bit faster still:
from itertools import islice
def method_4(d):
result = dict(islice(d.items(), 20000))
for k in result: del d[k]
return result
Compared to other versions, using Netwave's testcase:
Method 1: 0.004459443036466837 # original
Method 2: 0.0034434819826856256 # Netwave
Method 3: 0.002602717955596745 # chepner
Method 4: 0.001974945073015988 # this answer
The extra speedup seems to come from avoiding transitions between C and Python functions. From disassembly we can note that the dict
instantiation happens on C side, with only 3 function calls from Python. The loop uses DELETE_SUBSCR
opcode instead of needing a function call:
>>> dis.dis(method_4)
2 0 LOAD_GLOBAL 0 (dict)
2 LOAD_GLOBAL 1 (islice)
4 LOAD_FAST 0 (d)
6 LOAD_ATTR 2 (items)
8 CALL_FUNCTION 0
10 LOAD_CONST 1 (20000)
12 CALL_FUNCTION 2
14 CALL_FUNCTION 1
16 STORE_FAST 1 (result)
3 18 SETUP_LOOP 18 (to 38)
20 LOAD_FAST 1 (result)
22 GET_ITER
>> 24 FOR_ITER 10 (to 36)
26 STORE_FAST 2 (k)
28 LOAD_FAST 0 (d)
30 LOAD_FAST 2 (k)
32 DELETE_SUBSCR
34 JUMP_ABSOLUTE 24
>> 36 POP_BLOCK
4 >> 38 LOAD_FAST 1 (result)
40 RETURN_VALUE
Compared with the iterator in method_2
:
>>> dis.dis(d.popitem() for _ in range(20000))
1 0 LOAD_FAST 0 (.0)
>> 2 FOR_ITER 14 (to 18)
4 STORE_FAST 1 (_)
6 LOAD_GLOBAL 0 (d)
8 LOAD_ATTR 1 (popitem)
10 CALL_FUNCTION 0
12 YIELD_VALUE
14 POP_TOP
16 JUMP_ABSOLUTE 2
>> 18 LOAD_CONST 0 (None)
20 RETURN_VALUE
which needs a Python to C function call for each item.
I was researching this! what a sync!
– Netwave
Mar 17 at 11:58
@Netwave Do you think this should now be the accepted answer?
– Ivailo Karamanolev
Mar 17 at 18:43
@IvailoKaramanolev, yes, we were searching for the fastest, and indeed this on is.
– Netwave
Mar 18 at 3:13
add a comment |
This is a bit faster still:
from itertools import islice
def method_4(d):
result = dict(islice(d.items(), 20000))
for k in result: del d[k]
return result
Compared to other versions, using Netwave's testcase:
Method 1: 0.004459443036466837 # original
Method 2: 0.0034434819826856256 # Netwave
Method 3: 0.002602717955596745 # chepner
Method 4: 0.001974945073015988 # this answer
The extra speedup seems to come from avoiding transitions between C and Python functions. From disassembly we can note that the dict
instantiation happens on C side, with only 3 function calls from Python. The loop uses DELETE_SUBSCR
opcode instead of needing a function call:
>>> dis.dis(method_4)
2 0 LOAD_GLOBAL 0 (dict)
2 LOAD_GLOBAL 1 (islice)
4 LOAD_FAST 0 (d)
6 LOAD_ATTR 2 (items)
8 CALL_FUNCTION 0
10 LOAD_CONST 1 (20000)
12 CALL_FUNCTION 2
14 CALL_FUNCTION 1
16 STORE_FAST 1 (result)
3 18 SETUP_LOOP 18 (to 38)
20 LOAD_FAST 1 (result)
22 GET_ITER
>> 24 FOR_ITER 10 (to 36)
26 STORE_FAST 2 (k)
28 LOAD_FAST 0 (d)
30 LOAD_FAST 2 (k)
32 DELETE_SUBSCR
34 JUMP_ABSOLUTE 24
>> 36 POP_BLOCK
4 >> 38 LOAD_FAST 1 (result)
40 RETURN_VALUE
Compared with the iterator in method_2
:
>>> dis.dis(d.popitem() for _ in range(20000))
1 0 LOAD_FAST 0 (.0)
>> 2 FOR_ITER 14 (to 18)
4 STORE_FAST 1 (_)
6 LOAD_GLOBAL 0 (d)
8 LOAD_ATTR 1 (popitem)
10 CALL_FUNCTION 0
12 YIELD_VALUE
14 POP_TOP
16 JUMP_ABSOLUTE 2
>> 18 LOAD_CONST 0 (None)
20 RETURN_VALUE
which needs a Python to C function call for each item.
I was researching this! what a sync!
– Netwave
Mar 17 at 11:58
@Netwave Do you think this should now be the accepted answer?
– Ivailo Karamanolev
Mar 17 at 18:43
@IvailoKaramanolev, yes, we were searching for the fastest, and indeed this on is.
– Netwave
Mar 18 at 3:13
add a comment |
This is a bit faster still:
from itertools import islice
def method_4(d):
result = dict(islice(d.items(), 20000))
for k in result: del d[k]
return result
Compared to other versions, using Netwave's testcase:
Method 1: 0.004459443036466837 # original
Method 2: 0.0034434819826856256 # Netwave
Method 3: 0.002602717955596745 # chepner
Method 4: 0.001974945073015988 # this answer
The extra speedup seems to come from avoiding transitions between C and Python functions. From disassembly we can note that the dict
instantiation happens on C side, with only 3 function calls from Python. The loop uses DELETE_SUBSCR
opcode instead of needing a function call:
>>> dis.dis(method_4)
2 0 LOAD_GLOBAL 0 (dict)
2 LOAD_GLOBAL 1 (islice)
4 LOAD_FAST 0 (d)
6 LOAD_ATTR 2 (items)
8 CALL_FUNCTION 0
10 LOAD_CONST 1 (20000)
12 CALL_FUNCTION 2
14 CALL_FUNCTION 1
16 STORE_FAST 1 (result)
3 18 SETUP_LOOP 18 (to 38)
20 LOAD_FAST 1 (result)
22 GET_ITER
>> 24 FOR_ITER 10 (to 36)
26 STORE_FAST 2 (k)
28 LOAD_FAST 0 (d)
30 LOAD_FAST 2 (k)
32 DELETE_SUBSCR
34 JUMP_ABSOLUTE 24
>> 36 POP_BLOCK
4 >> 38 LOAD_FAST 1 (result)
40 RETURN_VALUE
Compared with the iterator in method_2
:
>>> dis.dis(d.popitem() for _ in range(20000))
1 0 LOAD_FAST 0 (.0)
>> 2 FOR_ITER 14 (to 18)
4 STORE_FAST 1 (_)
6 LOAD_GLOBAL 0 (d)
8 LOAD_ATTR 1 (popitem)
10 CALL_FUNCTION 0
12 YIELD_VALUE
14 POP_TOP
16 JUMP_ABSOLUTE 2
>> 18 LOAD_CONST 0 (None)
20 RETURN_VALUE
which needs a Python to C function call for each item.
This is a bit faster still:
from itertools import islice
def method_4(d):
result = dict(islice(d.items(), 20000))
for k in result: del d[k]
return result
Compared to other versions, using Netwave's testcase:
Method 1: 0.004459443036466837 # original
Method 2: 0.0034434819826856256 # Netwave
Method 3: 0.002602717955596745 # chepner
Method 4: 0.001974945073015988 # this answer
The extra speedup seems to come from avoiding transitions between C and Python functions. From disassembly we can note that the dict
instantiation happens on C side, with only 3 function calls from Python. The loop uses DELETE_SUBSCR
opcode instead of needing a function call:
>>> dis.dis(method_4)
2 0 LOAD_GLOBAL 0 (dict)
2 LOAD_GLOBAL 1 (islice)
4 LOAD_FAST 0 (d)
6 LOAD_ATTR 2 (items)
8 CALL_FUNCTION 0
10 LOAD_CONST 1 (20000)
12 CALL_FUNCTION 2
14 CALL_FUNCTION 1
16 STORE_FAST 1 (result)
3 18 SETUP_LOOP 18 (to 38)
20 LOAD_FAST 1 (result)
22 GET_ITER
>> 24 FOR_ITER 10 (to 36)
26 STORE_FAST 2 (k)
28 LOAD_FAST 0 (d)
30 LOAD_FAST 2 (k)
32 DELETE_SUBSCR
34 JUMP_ABSOLUTE 24
>> 36 POP_BLOCK
4 >> 38 LOAD_FAST 1 (result)
40 RETURN_VALUE
Compared with the iterator in method_2
:
>>> dis.dis(d.popitem() for _ in range(20000))
1 0 LOAD_FAST 0 (.0)
>> 2 FOR_ITER 14 (to 18)
4 STORE_FAST 1 (_)
6 LOAD_GLOBAL 0 (d)
8 LOAD_ATTR 1 (popitem)
10 CALL_FUNCTION 0
12 YIELD_VALUE
14 POP_TOP
16 JUMP_ABSOLUTE 2
>> 18 LOAD_CONST 0 (None)
20 RETURN_VALUE
which needs a Python to C function call for each item.
edited Mar 17 at 11:55
answered Mar 16 at 20:51
jpajpa
5,4681226
5,4681226
I was researching this! what a sync!
– Netwave
Mar 17 at 11:58
@Netwave Do you think this should now be the accepted answer?
– Ivailo Karamanolev
Mar 17 at 18:43
@IvailoKaramanolev, yes, we were searching for the fastest, and indeed this on is.
– Netwave
Mar 18 at 3:13
add a comment |
I was researching this! what a sync!
– Netwave
Mar 17 at 11:58
@Netwave Do you think this should now be the accepted answer?
– Ivailo Karamanolev
Mar 17 at 18:43
@IvailoKaramanolev, yes, we were searching for the fastest, and indeed this on is.
– Netwave
Mar 18 at 3:13
I was researching this! what a sync!
– Netwave
Mar 17 at 11:58
I was researching this! what a sync!
– Netwave
Mar 17 at 11:58
@Netwave Do you think this should now be the accepted answer?
– Ivailo Karamanolev
Mar 17 at 18:43
@Netwave Do you think this should now be the accepted answer?
– Ivailo Karamanolev
Mar 17 at 18:43
@IvailoKaramanolev, yes, we were searching for the fastest, and indeed this on is.
– Netwave
Mar 18 at 3:13
@IvailoKaramanolev, yes, we were searching for the fastest, and indeed this on is.
– Netwave
Mar 18 at 3:13
add a comment |
A simple comprehension inside dict
will do:
dict(src.popitem() for _ in range(20000))
Here you have the timing tests
setup = """
src = {i: i ** 3 for i in range(1000000)}
def method_1(d):
dst = {}
while len(dst) < 20000:
item = d.popitem()
dst[item[0]] = item[1]
return dst
def method_2(d):
return dict(d.popitem() for _ in range(20000))
"""
import timeit
print("Method 1: ", timeit.timeit('method_1(src)', setup=setup, number=1))
print("Method 2: ", timeit.timeit('method_2(src)', setup=setup, number=1))
Results:
Method 1: 0.007701821999944514
Method 2: 0.004668198998842854
3
so much for dict comprehension. Good one usingdict
!!!
– Jean-François Fabre♦
Mar 16 at 17:18
2
Thank you! Hopefully next moderator @Jean-FrançoisFabre ;)
– Netwave
Mar 16 at 17:19
1
for my solution, at least, I've upped all 10 times and it's still faster. The key is to make the shorter code possible and rely on native functions.
– Jean-François Fabre♦
Mar 16 at 17:23
4
You can shave a little more time off by saving the bound method first.f = d.popitem; return dict(f() for _ in range(20000))
.
– chepner
Mar 16 at 17:37
3
Usingitertools.islice
anditertools.repeat
is even a little faster still:dict(f() for f in islice(repeat(d.popitem), 20000))
.
– chepner
Mar 16 at 17:49
|
show 5 more comments
A simple comprehension inside dict
will do:
dict(src.popitem() for _ in range(20000))
Here you have the timing tests
setup = """
src = {i: i ** 3 for i in range(1000000)}
def method_1(d):
dst = {}
while len(dst) < 20000:
item = d.popitem()
dst[item[0]] = item[1]
return dst
def method_2(d):
return dict(d.popitem() for _ in range(20000))
"""
import timeit
print("Method 1: ", timeit.timeit('method_1(src)', setup=setup, number=1))
print("Method 2: ", timeit.timeit('method_2(src)', setup=setup, number=1))
Results:
Method 1: 0.007701821999944514
Method 2: 0.004668198998842854
3
so much for dict comprehension. Good one usingdict
!!!
– Jean-François Fabre♦
Mar 16 at 17:18
2
Thank you! Hopefully next moderator @Jean-FrançoisFabre ;)
– Netwave
Mar 16 at 17:19
1
for my solution, at least, I've upped all 10 times and it's still faster. The key is to make the shorter code possible and rely on native functions.
– Jean-François Fabre♦
Mar 16 at 17:23
4
You can shave a little more time off by saving the bound method first.f = d.popitem; return dict(f() for _ in range(20000))
.
– chepner
Mar 16 at 17:37
3
Usingitertools.islice
anditertools.repeat
is even a little faster still:dict(f() for f in islice(repeat(d.popitem), 20000))
.
– chepner
Mar 16 at 17:49
|
show 5 more comments
A simple comprehension inside dict
will do:
dict(src.popitem() for _ in range(20000))
Here you have the timing tests
setup = """
src = {i: i ** 3 for i in range(1000000)}
def method_1(d):
dst = {}
while len(dst) < 20000:
item = d.popitem()
dst[item[0]] = item[1]
return dst
def method_2(d):
return dict(d.popitem() for _ in range(20000))
"""
import timeit
print("Method 1: ", timeit.timeit('method_1(src)', setup=setup, number=1))
print("Method 2: ", timeit.timeit('method_2(src)', setup=setup, number=1))
Results:
Method 1: 0.007701821999944514
Method 2: 0.004668198998842854
A simple comprehension inside dict
will do:
dict(src.popitem() for _ in range(20000))
Here you have the timing tests
setup = """
src = {i: i ** 3 for i in range(1000000)}
def method_1(d):
dst = {}
while len(dst) < 20000:
item = d.popitem()
dst[item[0]] = item[1]
return dst
def method_2(d):
return dict(d.popitem() for _ in range(20000))
"""
import timeit
print("Method 1: ", timeit.timeit('method_1(src)', setup=setup, number=1))
print("Method 2: ", timeit.timeit('method_2(src)', setup=setup, number=1))
Results:
Method 1: 0.007701821999944514
Method 2: 0.004668198998842854
edited Mar 17 at 11:57
answered Mar 16 at 17:16
NetwaveNetwave
13.5k22246
13.5k22246
3
so much for dict comprehension. Good one usingdict
!!!
– Jean-François Fabre♦
Mar 16 at 17:18
2
Thank you! Hopefully next moderator @Jean-FrançoisFabre ;)
– Netwave
Mar 16 at 17:19
1
for my solution, at least, I've upped all 10 times and it's still faster. The key is to make the shorter code possible and rely on native functions.
– Jean-François Fabre♦
Mar 16 at 17:23
4
You can shave a little more time off by saving the bound method first.f = d.popitem; return dict(f() for _ in range(20000))
.
– chepner
Mar 16 at 17:37
3
Usingitertools.islice
anditertools.repeat
is even a little faster still:dict(f() for f in islice(repeat(d.popitem), 20000))
.
– chepner
Mar 16 at 17:49
|
show 5 more comments
3
so much for dict comprehension. Good one usingdict
!!!
– Jean-François Fabre♦
Mar 16 at 17:18
2
Thank you! Hopefully next moderator @Jean-FrançoisFabre ;)
– Netwave
Mar 16 at 17:19
1
for my solution, at least, I've upped all 10 times and it's still faster. The key is to make the shorter code possible and rely on native functions.
– Jean-François Fabre♦
Mar 16 at 17:23
4
You can shave a little more time off by saving the bound method first.f = d.popitem; return dict(f() for _ in range(20000))
.
– chepner
Mar 16 at 17:37
3
Usingitertools.islice
anditertools.repeat
is even a little faster still:dict(f() for f in islice(repeat(d.popitem), 20000))
.
– chepner
Mar 16 at 17:49
3
3
so much for dict comprehension. Good one using
dict
!!!– Jean-François Fabre♦
Mar 16 at 17:18
so much for dict comprehension. Good one using
dict
!!!– Jean-François Fabre♦
Mar 16 at 17:18
2
2
Thank you! Hopefully next moderator @Jean-FrançoisFabre ;)
– Netwave
Mar 16 at 17:19
Thank you! Hopefully next moderator @Jean-FrançoisFabre ;)
– Netwave
Mar 16 at 17:19
1
1
for my solution, at least, I've upped all 10 times and it's still faster. The key is to make the shorter code possible and rely on native functions.
– Jean-François Fabre♦
Mar 16 at 17:23
for my solution, at least, I've upped all 10 times and it's still faster. The key is to make the shorter code possible and rely on native functions.
– Jean-François Fabre♦
Mar 16 at 17:23
4
4
You can shave a little more time off by saving the bound method first.
f = d.popitem; return dict(f() for _ in range(20000))
.– chepner
Mar 16 at 17:37
You can shave a little more time off by saving the bound method first.
f = d.popitem; return dict(f() for _ in range(20000))
.– chepner
Mar 16 at 17:37
3
3
Using
itertools.islice
and itertools.repeat
is even a little faster still: dict(f() for f in islice(repeat(d.popitem), 20000))
.– chepner
Mar 16 at 17:49
Using
itertools.islice
and itertools.repeat
is even a little faster still: dict(f() for f in islice(repeat(d.popitem), 20000))
.– chepner
Mar 16 at 17:49
|
show 5 more comments
I found this approach slightly faster (-10% speed) using dictionary comprehension that consumes a loop using range
that yields & unpacks the keys & values
dst = {key:value for key,value in (src.popitem() for _ in range(20000))}
on my machine:
your code: 0.00899505615234375
my code: 0.007996797561645508
so about 12% faster, not bad but not as good as not unpacking like Netwave simpler answer
This approach can be useful if you want to transform the keys or values in the process.
add a comment |
I found this approach slightly faster (-10% speed) using dictionary comprehension that consumes a loop using range
that yields & unpacks the keys & values
dst = {key:value for key,value in (src.popitem() for _ in range(20000))}
on my machine:
your code: 0.00899505615234375
my code: 0.007996797561645508
so about 12% faster, not bad but not as good as not unpacking like Netwave simpler answer
This approach can be useful if you want to transform the keys or values in the process.
add a comment |
I found this approach slightly faster (-10% speed) using dictionary comprehension that consumes a loop using range
that yields & unpacks the keys & values
dst = {key:value for key,value in (src.popitem() for _ in range(20000))}
on my machine:
your code: 0.00899505615234375
my code: 0.007996797561645508
so about 12% faster, not bad but not as good as not unpacking like Netwave simpler answer
This approach can be useful if you want to transform the keys or values in the process.
I found this approach slightly faster (-10% speed) using dictionary comprehension that consumes a loop using range
that yields & unpacks the keys & values
dst = {key:value for key,value in (src.popitem() for _ in range(20000))}
on my machine:
your code: 0.00899505615234375
my code: 0.007996797561645508
so about 12% faster, not bad but not as good as not unpacking like Netwave simpler answer
This approach can be useful if you want to transform the keys or values in the process.
edited Mar 16 at 17:22
answered Mar 16 at 17:15
Jean-François Fabre♦Jean-François Fabre
106k1057115
106k1057115
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55199303%2ffastest-way-to-pop-n-items-from-a-large-dict%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown