Fastest way to pop N items from a large dict The Next CEO of Stack OverflowWhat is the fastest...

MessageLevel in QGIS3

Novel about a guy who is possessed by the divine essence and the world ends?

Is micro rebar a better way to reinforce concrete than rebar?

Elegant way to replace substring in a regex with optional groups in Python?

How do I go from 300 unfinished/half written blog posts, to published posts?

Preparing Indesign booklet with .psd graphics for print

Written every which way

Why do airplanes bank sharply to the right after air-to-air refueling?

Which tube will fit a -(700 x 25c) wheel?

Can you replace a racial trait cantrip when leveling up?

What can we do to stop prior company from asking us questions?

Would this house-rule that treats advantage as a +1 to the roll instead (and disadvantage as -1) and allows them to stack be balanced?

Is there a difference between "Fahrstuhl" and "Aufzug"

Inappropriate reference requests from Journal reviewers

Unreliable Magic - Is it worth it?

Does it take more energy to get to Venus or to Mars?

Are there any limitations on attacking while grappling?

Sending manuscript to multiple publishers

Why am I allowed to create multiple unique pointers from a single object?

What happened in Rome, when the western empire "fell"?

Why didn't Khan get resurrected in the Genesis Explosion?

Is it possible to search for a directory/file combination?

What connection does MS Office have to Netscape Navigator?

Do I need to enable Dev Hub in my PROD Org?



Fastest way to pop N items from a large dict



The Next CEO of Stack OverflowWhat is the fastest way to get the value of π?Fastest way to convert string to integer in PHPFastest way to determine if an integer's square root is an integerHow to randomly select an item from a list?How to remove items from a list while iterating?Fastest way to list all primes below NHow to remove the first Item from a list?Fastest way to check if a value exist in a listIs there any pythonic way to combine two dicts (adding values for keys that appear in both)?Fastest way to determine if an integer is between two integers (inclusive) with known sets of values












16















I have a large dict src (up to 1M items) and I would like to take N (typical values would be N=10K-20K) items, store them in a new dict dst and leave only the remaining items in src. It doesn't matter which N items are taken. I'm looking for the fastest way to do it on Python 3.6 or 3.7.



Fastest approach I've found so far:



src = {i: i ** 3 for i in range(1000000)}

# Taking items 1 by 1 (~0.0059s)
dst = {}
while len(dst) < 20000:
item = src.popitem()
dst[item[0]] = item[1]


Is there anything better? Even a marginal gain would be good.










share|improve this question





























    16















    I have a large dict src (up to 1M items) and I would like to take N (typical values would be N=10K-20K) items, store them in a new dict dst and leave only the remaining items in src. It doesn't matter which N items are taken. I'm looking for the fastest way to do it on Python 3.6 or 3.7.



    Fastest approach I've found so far:



    src = {i: i ** 3 for i in range(1000000)}

    # Taking items 1 by 1 (~0.0059s)
    dst = {}
    while len(dst) < 20000:
    item = src.popitem()
    dst[item[0]] = item[1]


    Is there anything better? Even a marginal gain would be good.










    share|improve this question



























      16












      16








      16


      1






      I have a large dict src (up to 1M items) and I would like to take N (typical values would be N=10K-20K) items, store them in a new dict dst and leave only the remaining items in src. It doesn't matter which N items are taken. I'm looking for the fastest way to do it on Python 3.6 or 3.7.



      Fastest approach I've found so far:



      src = {i: i ** 3 for i in range(1000000)}

      # Taking items 1 by 1 (~0.0059s)
      dst = {}
      while len(dst) < 20000:
      item = src.popitem()
      dst[item[0]] = item[1]


      Is there anything better? Even a marginal gain would be good.










      share|improve this question
















      I have a large dict src (up to 1M items) and I would like to take N (typical values would be N=10K-20K) items, store them in a new dict dst and leave only the remaining items in src. It doesn't matter which N items are taken. I'm looking for the fastest way to do it on Python 3.6 or 3.7.



      Fastest approach I've found so far:



      src = {i: i ** 3 for i in range(1000000)}

      # Taking items 1 by 1 (~0.0059s)
      dst = {}
      while len(dst) < 20000:
      item = src.popitem()
      dst[item[0]] = item[1]


      Is there anything better? Even a marginal gain would be good.







      python python-3.x performance dictionary optimization






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited Mar 16 at 17:16









      martineau

      69.8k1092186




      69.8k1092186










      asked Mar 16 at 17:00









      Ivailo KaramanolevIvailo Karamanolev

      570615




      570615
























          3 Answers
          3






          active

          oldest

          votes


















          7














          This is a bit faster still:



          from itertools import islice
          def method_4(d):
          result = dict(islice(d.items(), 20000))
          for k in result: del d[k]
          return result


          Compared to other versions, using Netwave's testcase:



          Method 1:  0.004459443036466837  # original
          Method 2: 0.0034434819826856256 # Netwave
          Method 3: 0.002602717955596745 # chepner
          Method 4: 0.001974945073015988 # this answer


          The extra speedup seems to come from avoiding transitions between C and Python functions. From disassembly we can note that the dict instantiation happens on C side, with only 3 function calls from Python. The loop uses DELETE_SUBSCR opcode instead of needing a function call:



          >>> dis.dis(method_4)
          2 0 LOAD_GLOBAL 0 (dict)
          2 LOAD_GLOBAL 1 (islice)
          4 LOAD_FAST 0 (d)
          6 LOAD_ATTR 2 (items)
          8 CALL_FUNCTION 0
          10 LOAD_CONST 1 (20000)
          12 CALL_FUNCTION 2
          14 CALL_FUNCTION 1
          16 STORE_FAST 1 (result)

          3 18 SETUP_LOOP 18 (to 38)
          20 LOAD_FAST 1 (result)
          22 GET_ITER
          >> 24 FOR_ITER 10 (to 36)
          26 STORE_FAST 2 (k)
          28 LOAD_FAST 0 (d)
          30 LOAD_FAST 2 (k)
          32 DELETE_SUBSCR
          34 JUMP_ABSOLUTE 24
          >> 36 POP_BLOCK

          4 >> 38 LOAD_FAST 1 (result)
          40 RETURN_VALUE


          Compared with the iterator in method_2:



          >>> dis.dis(d.popitem() for _ in range(20000))
          1 0 LOAD_FAST 0 (.0)
          >> 2 FOR_ITER 14 (to 18)
          4 STORE_FAST 1 (_)
          6 LOAD_GLOBAL 0 (d)
          8 LOAD_ATTR 1 (popitem)
          10 CALL_FUNCTION 0
          12 YIELD_VALUE
          14 POP_TOP
          16 JUMP_ABSOLUTE 2
          >> 18 LOAD_CONST 0 (None)
          20 RETURN_VALUE


          which needs a Python to C function call for each item.






          share|improve this answer


























          • I was researching this! what a sync!

            – Netwave
            Mar 17 at 11:58











          • @Netwave Do you think this should now be the accepted answer?

            – Ivailo Karamanolev
            Mar 17 at 18:43











          • @IvailoKaramanolev, yes, we were searching for the fastest, and indeed this on is.

            – Netwave
            Mar 18 at 3:13



















          16














          A simple comprehension inside dict will do:



          dict(src.popitem() for _ in range(20000))


          Here you have the timing tests



          setup = """
          src = {i: i ** 3 for i in range(1000000)}

          def method_1(d):
          dst = {}
          while len(dst) < 20000:
          item = d.popitem()
          dst[item[0]] = item[1]
          return dst

          def method_2(d):
          return dict(d.popitem() for _ in range(20000))
          """
          import timeit
          print("Method 1: ", timeit.timeit('method_1(src)', setup=setup, number=1))

          print("Method 2: ", timeit.timeit('method_2(src)', setup=setup, number=1))


          Results:



          Method 1:  0.007701821999944514
          Method 2: 0.004668198998842854





          share|improve this answer





















          • 3





            so much for dict comprehension. Good one using dict!!!

            – Jean-François Fabre
            Mar 16 at 17:18






          • 2





            Thank you! Hopefully next moderator @Jean-FrançoisFabre ;)

            – Netwave
            Mar 16 at 17:19






          • 1





            for my solution, at least, I've upped all 10 times and it's still faster. The key is to make the shorter code possible and rely on native functions.

            – Jean-François Fabre
            Mar 16 at 17:23








          • 4





            You can shave a little more time off by saving the bound method first. f = d.popitem; return dict(f() for _ in range(20000)).

            – chepner
            Mar 16 at 17:37






          • 3





            Using itertools.islice and itertools.repeat is even a little faster still: dict(f() for f in islice(repeat(d.popitem), 20000)).

            – chepner
            Mar 16 at 17:49



















          2














          I found this approach slightly faster (-10% speed) using dictionary comprehension that consumes a loop using range that yields & unpacks the keys & values



          dst = {key:value for key,value in (src.popitem() for _ in range(20000))}


          on my machine:



          your code: 0.00899505615234375
          my code: 0.007996797561645508


          so about 12% faster, not bad but not as good as not unpacking like Netwave simpler answer



          This approach can be useful if you want to transform the keys or values in the process.






          share|improve this answer


























            Your Answer






            StackExchange.ifUsing("editor", function () {
            StackExchange.using("externalEditor", function () {
            StackExchange.using("snippets", function () {
            StackExchange.snippets.init();
            });
            });
            }, "code-snippets");

            StackExchange.ready(function() {
            var channelOptions = {
            tags: "".split(" "),
            id: "1"
            };
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function() {
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled) {
            StackExchange.using("snippets", function() {
            createEditor();
            });
            }
            else {
            createEditor();
            }
            });

            function createEditor() {
            StackExchange.prepareEditor({
            heartbeatType: 'answer',
            autoActivateHeartbeat: false,
            convertImagesToLinks: true,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: 10,
            bindNavPrevention: true,
            postfix: "",
            imageUploader: {
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            },
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            });


            }
            });














            draft saved

            draft discarded


















            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55199303%2ffastest-way-to-pop-n-items-from-a-large-dict%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown

























            3 Answers
            3






            active

            oldest

            votes








            3 Answers
            3






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes









            7














            This is a bit faster still:



            from itertools import islice
            def method_4(d):
            result = dict(islice(d.items(), 20000))
            for k in result: del d[k]
            return result


            Compared to other versions, using Netwave's testcase:



            Method 1:  0.004459443036466837  # original
            Method 2: 0.0034434819826856256 # Netwave
            Method 3: 0.002602717955596745 # chepner
            Method 4: 0.001974945073015988 # this answer


            The extra speedup seems to come from avoiding transitions between C and Python functions. From disassembly we can note that the dict instantiation happens on C side, with only 3 function calls from Python. The loop uses DELETE_SUBSCR opcode instead of needing a function call:



            >>> dis.dis(method_4)
            2 0 LOAD_GLOBAL 0 (dict)
            2 LOAD_GLOBAL 1 (islice)
            4 LOAD_FAST 0 (d)
            6 LOAD_ATTR 2 (items)
            8 CALL_FUNCTION 0
            10 LOAD_CONST 1 (20000)
            12 CALL_FUNCTION 2
            14 CALL_FUNCTION 1
            16 STORE_FAST 1 (result)

            3 18 SETUP_LOOP 18 (to 38)
            20 LOAD_FAST 1 (result)
            22 GET_ITER
            >> 24 FOR_ITER 10 (to 36)
            26 STORE_FAST 2 (k)
            28 LOAD_FAST 0 (d)
            30 LOAD_FAST 2 (k)
            32 DELETE_SUBSCR
            34 JUMP_ABSOLUTE 24
            >> 36 POP_BLOCK

            4 >> 38 LOAD_FAST 1 (result)
            40 RETURN_VALUE


            Compared with the iterator in method_2:



            >>> dis.dis(d.popitem() for _ in range(20000))
            1 0 LOAD_FAST 0 (.0)
            >> 2 FOR_ITER 14 (to 18)
            4 STORE_FAST 1 (_)
            6 LOAD_GLOBAL 0 (d)
            8 LOAD_ATTR 1 (popitem)
            10 CALL_FUNCTION 0
            12 YIELD_VALUE
            14 POP_TOP
            16 JUMP_ABSOLUTE 2
            >> 18 LOAD_CONST 0 (None)
            20 RETURN_VALUE


            which needs a Python to C function call for each item.






            share|improve this answer


























            • I was researching this! what a sync!

              – Netwave
              Mar 17 at 11:58











            • @Netwave Do you think this should now be the accepted answer?

              – Ivailo Karamanolev
              Mar 17 at 18:43











            • @IvailoKaramanolev, yes, we were searching for the fastest, and indeed this on is.

              – Netwave
              Mar 18 at 3:13
















            7














            This is a bit faster still:



            from itertools import islice
            def method_4(d):
            result = dict(islice(d.items(), 20000))
            for k in result: del d[k]
            return result


            Compared to other versions, using Netwave's testcase:



            Method 1:  0.004459443036466837  # original
            Method 2: 0.0034434819826856256 # Netwave
            Method 3: 0.002602717955596745 # chepner
            Method 4: 0.001974945073015988 # this answer


            The extra speedup seems to come from avoiding transitions between C and Python functions. From disassembly we can note that the dict instantiation happens on C side, with only 3 function calls from Python. The loop uses DELETE_SUBSCR opcode instead of needing a function call:



            >>> dis.dis(method_4)
            2 0 LOAD_GLOBAL 0 (dict)
            2 LOAD_GLOBAL 1 (islice)
            4 LOAD_FAST 0 (d)
            6 LOAD_ATTR 2 (items)
            8 CALL_FUNCTION 0
            10 LOAD_CONST 1 (20000)
            12 CALL_FUNCTION 2
            14 CALL_FUNCTION 1
            16 STORE_FAST 1 (result)

            3 18 SETUP_LOOP 18 (to 38)
            20 LOAD_FAST 1 (result)
            22 GET_ITER
            >> 24 FOR_ITER 10 (to 36)
            26 STORE_FAST 2 (k)
            28 LOAD_FAST 0 (d)
            30 LOAD_FAST 2 (k)
            32 DELETE_SUBSCR
            34 JUMP_ABSOLUTE 24
            >> 36 POP_BLOCK

            4 >> 38 LOAD_FAST 1 (result)
            40 RETURN_VALUE


            Compared with the iterator in method_2:



            >>> dis.dis(d.popitem() for _ in range(20000))
            1 0 LOAD_FAST 0 (.0)
            >> 2 FOR_ITER 14 (to 18)
            4 STORE_FAST 1 (_)
            6 LOAD_GLOBAL 0 (d)
            8 LOAD_ATTR 1 (popitem)
            10 CALL_FUNCTION 0
            12 YIELD_VALUE
            14 POP_TOP
            16 JUMP_ABSOLUTE 2
            >> 18 LOAD_CONST 0 (None)
            20 RETURN_VALUE


            which needs a Python to C function call for each item.






            share|improve this answer


























            • I was researching this! what a sync!

              – Netwave
              Mar 17 at 11:58











            • @Netwave Do you think this should now be the accepted answer?

              – Ivailo Karamanolev
              Mar 17 at 18:43











            • @IvailoKaramanolev, yes, we were searching for the fastest, and indeed this on is.

              – Netwave
              Mar 18 at 3:13














            7












            7








            7







            This is a bit faster still:



            from itertools import islice
            def method_4(d):
            result = dict(islice(d.items(), 20000))
            for k in result: del d[k]
            return result


            Compared to other versions, using Netwave's testcase:



            Method 1:  0.004459443036466837  # original
            Method 2: 0.0034434819826856256 # Netwave
            Method 3: 0.002602717955596745 # chepner
            Method 4: 0.001974945073015988 # this answer


            The extra speedup seems to come from avoiding transitions between C and Python functions. From disassembly we can note that the dict instantiation happens on C side, with only 3 function calls from Python. The loop uses DELETE_SUBSCR opcode instead of needing a function call:



            >>> dis.dis(method_4)
            2 0 LOAD_GLOBAL 0 (dict)
            2 LOAD_GLOBAL 1 (islice)
            4 LOAD_FAST 0 (d)
            6 LOAD_ATTR 2 (items)
            8 CALL_FUNCTION 0
            10 LOAD_CONST 1 (20000)
            12 CALL_FUNCTION 2
            14 CALL_FUNCTION 1
            16 STORE_FAST 1 (result)

            3 18 SETUP_LOOP 18 (to 38)
            20 LOAD_FAST 1 (result)
            22 GET_ITER
            >> 24 FOR_ITER 10 (to 36)
            26 STORE_FAST 2 (k)
            28 LOAD_FAST 0 (d)
            30 LOAD_FAST 2 (k)
            32 DELETE_SUBSCR
            34 JUMP_ABSOLUTE 24
            >> 36 POP_BLOCK

            4 >> 38 LOAD_FAST 1 (result)
            40 RETURN_VALUE


            Compared with the iterator in method_2:



            >>> dis.dis(d.popitem() for _ in range(20000))
            1 0 LOAD_FAST 0 (.0)
            >> 2 FOR_ITER 14 (to 18)
            4 STORE_FAST 1 (_)
            6 LOAD_GLOBAL 0 (d)
            8 LOAD_ATTR 1 (popitem)
            10 CALL_FUNCTION 0
            12 YIELD_VALUE
            14 POP_TOP
            16 JUMP_ABSOLUTE 2
            >> 18 LOAD_CONST 0 (None)
            20 RETURN_VALUE


            which needs a Python to C function call for each item.






            share|improve this answer















            This is a bit faster still:



            from itertools import islice
            def method_4(d):
            result = dict(islice(d.items(), 20000))
            for k in result: del d[k]
            return result


            Compared to other versions, using Netwave's testcase:



            Method 1:  0.004459443036466837  # original
            Method 2: 0.0034434819826856256 # Netwave
            Method 3: 0.002602717955596745 # chepner
            Method 4: 0.001974945073015988 # this answer


            The extra speedup seems to come from avoiding transitions between C and Python functions. From disassembly we can note that the dict instantiation happens on C side, with only 3 function calls from Python. The loop uses DELETE_SUBSCR opcode instead of needing a function call:



            >>> dis.dis(method_4)
            2 0 LOAD_GLOBAL 0 (dict)
            2 LOAD_GLOBAL 1 (islice)
            4 LOAD_FAST 0 (d)
            6 LOAD_ATTR 2 (items)
            8 CALL_FUNCTION 0
            10 LOAD_CONST 1 (20000)
            12 CALL_FUNCTION 2
            14 CALL_FUNCTION 1
            16 STORE_FAST 1 (result)

            3 18 SETUP_LOOP 18 (to 38)
            20 LOAD_FAST 1 (result)
            22 GET_ITER
            >> 24 FOR_ITER 10 (to 36)
            26 STORE_FAST 2 (k)
            28 LOAD_FAST 0 (d)
            30 LOAD_FAST 2 (k)
            32 DELETE_SUBSCR
            34 JUMP_ABSOLUTE 24
            >> 36 POP_BLOCK

            4 >> 38 LOAD_FAST 1 (result)
            40 RETURN_VALUE


            Compared with the iterator in method_2:



            >>> dis.dis(d.popitem() for _ in range(20000))
            1 0 LOAD_FAST 0 (.0)
            >> 2 FOR_ITER 14 (to 18)
            4 STORE_FAST 1 (_)
            6 LOAD_GLOBAL 0 (d)
            8 LOAD_ATTR 1 (popitem)
            10 CALL_FUNCTION 0
            12 YIELD_VALUE
            14 POP_TOP
            16 JUMP_ABSOLUTE 2
            >> 18 LOAD_CONST 0 (None)
            20 RETURN_VALUE


            which needs a Python to C function call for each item.







            share|improve this answer














            share|improve this answer



            share|improve this answer








            edited Mar 17 at 11:55

























            answered Mar 16 at 20:51









            jpajpa

            5,4681226




            5,4681226













            • I was researching this! what a sync!

              – Netwave
              Mar 17 at 11:58











            • @Netwave Do you think this should now be the accepted answer?

              – Ivailo Karamanolev
              Mar 17 at 18:43











            • @IvailoKaramanolev, yes, we were searching for the fastest, and indeed this on is.

              – Netwave
              Mar 18 at 3:13



















            • I was researching this! what a sync!

              – Netwave
              Mar 17 at 11:58











            • @Netwave Do you think this should now be the accepted answer?

              – Ivailo Karamanolev
              Mar 17 at 18:43











            • @IvailoKaramanolev, yes, we were searching for the fastest, and indeed this on is.

              – Netwave
              Mar 18 at 3:13

















            I was researching this! what a sync!

            – Netwave
            Mar 17 at 11:58





            I was researching this! what a sync!

            – Netwave
            Mar 17 at 11:58













            @Netwave Do you think this should now be the accepted answer?

            – Ivailo Karamanolev
            Mar 17 at 18:43





            @Netwave Do you think this should now be the accepted answer?

            – Ivailo Karamanolev
            Mar 17 at 18:43













            @IvailoKaramanolev, yes, we were searching for the fastest, and indeed this on is.

            – Netwave
            Mar 18 at 3:13





            @IvailoKaramanolev, yes, we were searching for the fastest, and indeed this on is.

            – Netwave
            Mar 18 at 3:13













            16














            A simple comprehension inside dict will do:



            dict(src.popitem() for _ in range(20000))


            Here you have the timing tests



            setup = """
            src = {i: i ** 3 for i in range(1000000)}

            def method_1(d):
            dst = {}
            while len(dst) < 20000:
            item = d.popitem()
            dst[item[0]] = item[1]
            return dst

            def method_2(d):
            return dict(d.popitem() for _ in range(20000))
            """
            import timeit
            print("Method 1: ", timeit.timeit('method_1(src)', setup=setup, number=1))

            print("Method 2: ", timeit.timeit('method_2(src)', setup=setup, number=1))


            Results:



            Method 1:  0.007701821999944514
            Method 2: 0.004668198998842854





            share|improve this answer





















            • 3





              so much for dict comprehension. Good one using dict!!!

              – Jean-François Fabre
              Mar 16 at 17:18






            • 2





              Thank you! Hopefully next moderator @Jean-FrançoisFabre ;)

              – Netwave
              Mar 16 at 17:19






            • 1





              for my solution, at least, I've upped all 10 times and it's still faster. The key is to make the shorter code possible and rely on native functions.

              – Jean-François Fabre
              Mar 16 at 17:23








            • 4





              You can shave a little more time off by saving the bound method first. f = d.popitem; return dict(f() for _ in range(20000)).

              – chepner
              Mar 16 at 17:37






            • 3





              Using itertools.islice and itertools.repeat is even a little faster still: dict(f() for f in islice(repeat(d.popitem), 20000)).

              – chepner
              Mar 16 at 17:49
















            16














            A simple comprehension inside dict will do:



            dict(src.popitem() for _ in range(20000))


            Here you have the timing tests



            setup = """
            src = {i: i ** 3 for i in range(1000000)}

            def method_1(d):
            dst = {}
            while len(dst) < 20000:
            item = d.popitem()
            dst[item[0]] = item[1]
            return dst

            def method_2(d):
            return dict(d.popitem() for _ in range(20000))
            """
            import timeit
            print("Method 1: ", timeit.timeit('method_1(src)', setup=setup, number=1))

            print("Method 2: ", timeit.timeit('method_2(src)', setup=setup, number=1))


            Results:



            Method 1:  0.007701821999944514
            Method 2: 0.004668198998842854





            share|improve this answer





















            • 3





              so much for dict comprehension. Good one using dict!!!

              – Jean-François Fabre
              Mar 16 at 17:18






            • 2





              Thank you! Hopefully next moderator @Jean-FrançoisFabre ;)

              – Netwave
              Mar 16 at 17:19






            • 1





              for my solution, at least, I've upped all 10 times and it's still faster. The key is to make the shorter code possible and rely on native functions.

              – Jean-François Fabre
              Mar 16 at 17:23








            • 4





              You can shave a little more time off by saving the bound method first. f = d.popitem; return dict(f() for _ in range(20000)).

              – chepner
              Mar 16 at 17:37






            • 3





              Using itertools.islice and itertools.repeat is even a little faster still: dict(f() for f in islice(repeat(d.popitem), 20000)).

              – chepner
              Mar 16 at 17:49














            16












            16








            16







            A simple comprehension inside dict will do:



            dict(src.popitem() for _ in range(20000))


            Here you have the timing tests



            setup = """
            src = {i: i ** 3 for i in range(1000000)}

            def method_1(d):
            dst = {}
            while len(dst) < 20000:
            item = d.popitem()
            dst[item[0]] = item[1]
            return dst

            def method_2(d):
            return dict(d.popitem() for _ in range(20000))
            """
            import timeit
            print("Method 1: ", timeit.timeit('method_1(src)', setup=setup, number=1))

            print("Method 2: ", timeit.timeit('method_2(src)', setup=setup, number=1))


            Results:



            Method 1:  0.007701821999944514
            Method 2: 0.004668198998842854





            share|improve this answer















            A simple comprehension inside dict will do:



            dict(src.popitem() for _ in range(20000))


            Here you have the timing tests



            setup = """
            src = {i: i ** 3 for i in range(1000000)}

            def method_1(d):
            dst = {}
            while len(dst) < 20000:
            item = d.popitem()
            dst[item[0]] = item[1]
            return dst

            def method_2(d):
            return dict(d.popitem() for _ in range(20000))
            """
            import timeit
            print("Method 1: ", timeit.timeit('method_1(src)', setup=setup, number=1))

            print("Method 2: ", timeit.timeit('method_2(src)', setup=setup, number=1))


            Results:



            Method 1:  0.007701821999944514
            Method 2: 0.004668198998842854






            share|improve this answer














            share|improve this answer



            share|improve this answer








            edited Mar 17 at 11:57

























            answered Mar 16 at 17:16









            NetwaveNetwave

            13.5k22246




            13.5k22246








            • 3





              so much for dict comprehension. Good one using dict!!!

              – Jean-François Fabre
              Mar 16 at 17:18






            • 2





              Thank you! Hopefully next moderator @Jean-FrançoisFabre ;)

              – Netwave
              Mar 16 at 17:19






            • 1





              for my solution, at least, I've upped all 10 times and it's still faster. The key is to make the shorter code possible and rely on native functions.

              – Jean-François Fabre
              Mar 16 at 17:23








            • 4





              You can shave a little more time off by saving the bound method first. f = d.popitem; return dict(f() for _ in range(20000)).

              – chepner
              Mar 16 at 17:37






            • 3





              Using itertools.islice and itertools.repeat is even a little faster still: dict(f() for f in islice(repeat(d.popitem), 20000)).

              – chepner
              Mar 16 at 17:49














            • 3





              so much for dict comprehension. Good one using dict!!!

              – Jean-François Fabre
              Mar 16 at 17:18






            • 2





              Thank you! Hopefully next moderator @Jean-FrançoisFabre ;)

              – Netwave
              Mar 16 at 17:19






            • 1





              for my solution, at least, I've upped all 10 times and it's still faster. The key is to make the shorter code possible and rely on native functions.

              – Jean-François Fabre
              Mar 16 at 17:23








            • 4





              You can shave a little more time off by saving the bound method first. f = d.popitem; return dict(f() for _ in range(20000)).

              – chepner
              Mar 16 at 17:37






            • 3





              Using itertools.islice and itertools.repeat is even a little faster still: dict(f() for f in islice(repeat(d.popitem), 20000)).

              – chepner
              Mar 16 at 17:49








            3




            3





            so much for dict comprehension. Good one using dict!!!

            – Jean-François Fabre
            Mar 16 at 17:18





            so much for dict comprehension. Good one using dict!!!

            – Jean-François Fabre
            Mar 16 at 17:18




            2




            2





            Thank you! Hopefully next moderator @Jean-FrançoisFabre ;)

            – Netwave
            Mar 16 at 17:19





            Thank you! Hopefully next moderator @Jean-FrançoisFabre ;)

            – Netwave
            Mar 16 at 17:19




            1




            1





            for my solution, at least, I've upped all 10 times and it's still faster. The key is to make the shorter code possible and rely on native functions.

            – Jean-François Fabre
            Mar 16 at 17:23







            for my solution, at least, I've upped all 10 times and it's still faster. The key is to make the shorter code possible and rely on native functions.

            – Jean-François Fabre
            Mar 16 at 17:23






            4




            4





            You can shave a little more time off by saving the bound method first. f = d.popitem; return dict(f() for _ in range(20000)).

            – chepner
            Mar 16 at 17:37





            You can shave a little more time off by saving the bound method first. f = d.popitem; return dict(f() for _ in range(20000)).

            – chepner
            Mar 16 at 17:37




            3




            3





            Using itertools.islice and itertools.repeat is even a little faster still: dict(f() for f in islice(repeat(d.popitem), 20000)).

            – chepner
            Mar 16 at 17:49





            Using itertools.islice and itertools.repeat is even a little faster still: dict(f() for f in islice(repeat(d.popitem), 20000)).

            – chepner
            Mar 16 at 17:49











            2














            I found this approach slightly faster (-10% speed) using dictionary comprehension that consumes a loop using range that yields & unpacks the keys & values



            dst = {key:value for key,value in (src.popitem() for _ in range(20000))}


            on my machine:



            your code: 0.00899505615234375
            my code: 0.007996797561645508


            so about 12% faster, not bad but not as good as not unpacking like Netwave simpler answer



            This approach can be useful if you want to transform the keys or values in the process.






            share|improve this answer






























              2














              I found this approach slightly faster (-10% speed) using dictionary comprehension that consumes a loop using range that yields & unpacks the keys & values



              dst = {key:value for key,value in (src.popitem() for _ in range(20000))}


              on my machine:



              your code: 0.00899505615234375
              my code: 0.007996797561645508


              so about 12% faster, not bad but not as good as not unpacking like Netwave simpler answer



              This approach can be useful if you want to transform the keys or values in the process.






              share|improve this answer




























                2












                2








                2







                I found this approach slightly faster (-10% speed) using dictionary comprehension that consumes a loop using range that yields & unpacks the keys & values



                dst = {key:value for key,value in (src.popitem() for _ in range(20000))}


                on my machine:



                your code: 0.00899505615234375
                my code: 0.007996797561645508


                so about 12% faster, not bad but not as good as not unpacking like Netwave simpler answer



                This approach can be useful if you want to transform the keys or values in the process.






                share|improve this answer















                I found this approach slightly faster (-10% speed) using dictionary comprehension that consumes a loop using range that yields & unpacks the keys & values



                dst = {key:value for key,value in (src.popitem() for _ in range(20000))}


                on my machine:



                your code: 0.00899505615234375
                my code: 0.007996797561645508


                so about 12% faster, not bad but not as good as not unpacking like Netwave simpler answer



                This approach can be useful if you want to transform the keys or values in the process.







                share|improve this answer














                share|improve this answer



                share|improve this answer








                edited Mar 16 at 17:22

























                answered Mar 16 at 17:15









                Jean-François FabreJean-François Fabre

                106k1057115




                106k1057115






























                    draft saved

                    draft discarded




















































                    Thanks for contributing an answer to Stack Overflow!


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid



                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.


                    To learn more, see our tips on writing great answers.




                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function () {
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55199303%2ffastest-way-to-pop-n-items-from-a-large-dict%23new-answer', 'question_page');
                    }
                    );

                    Post as a guest















                    Required, but never shown





















































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown

































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown







                    Popular posts from this blog

                    Nidaros erkebispedøme

                    Birsay

                    Was Woodrow Wilson really a Liberal?Was World War I a war of liberals against authoritarians?Founding Fathers...