- 0 kr
Priority queues in Java and Python
How do you talk about a "priority queue", a queue data structure where elemens get to "cut in line" if they're important? In this article, we'll compare the (quite different) answers from Java and Python standard libraries.
Cutting in line is not nice, and people who do it are often jerks, or VIP. Or both. But sometimes the semantics behind this can be useful, and there's a data structure that allows this called the priority queue.
Values you insert into a priority queue don't necessarily go at the end; instead they get inserted in an order-preserving way. So if I insert a 5 into this priority queue:
1 3 4 7 8
It will look like this after insertion:
1 3 4 5 7 8
"So, it's like a list that keeps itself sorted?" the skeptical reader may ask at this point. Yes, and no. What you see above looks from the outside like a list that keeps itself sorted. But internally, we can store things as a heap or as a binary tree, and gain a speed advantage when we insert or access/remove elements. A good way to think of these internal data structures is that they are like different sorting algorithms, but "frozen in time" as data structures. (That's not just poetic; Wikipedia makes the equivalence explicit here.)
Priority queues are good for a number of things. Maybe you're implementing a job queue of future tasks to process, but some jobs should be given priority over others. (These more important jobs are the jerks, or VIPs, cutting in line so that us regular jobs have to wait longer.) The whole thing affords a fair amount of flexibility; you can have one level of urgency, or several.
Or maybe you're building a discrete event simulation, simulating an elevator or a pool game or a SimCity-like world by processing events in chronological order. A priority queue helps by constantly serving up the next event to be processed. The value that the events are being ordered by here is an increasing time coordinate; the beauty of the data structure is that we don't have to constantly re-sort things — the priority queue upholds the ordering for us.
Priority queues are a generalization of regular queues. If we have a priority queue, we can make it behave like a regular queue by making the priority value be an ever-incrementing sequence number, placing all the inserted values at the end. Regular queue; no jerks.
"Priority queue" is the name of the abstract data structure, similar to "list" or "dictionary". The abstract concept doesn't say anything about how things are implemented under the hood. Implementations of a priority queue (concrete data structures) have names like "binary heap" or "balanced binary tree". This distinction between abstract and concrete will be important in what follows.
Java and Python, both object-oriented languages, each have implementations of priority queues. They end up in the same place and you can do the same things with them, but the way they expose these data structures is quite distinct.
The Java Collections Framework is thorough and impressive, and priority queues are no exception. The PriorityQueueclass gives you a priority queue.
Using our running example (and Java 11's
var queue = new PriorityQueue<>(List.of(4, 8, 7, 3, 1)); queue.add(5);
We'd use the
add method, as above, to insert new elements into the queue. The
5 ends up between the
4 and the
7in the sense that if we start taking elements out, it will be the fourth element coming out.
queue.poll(); // 1 queue.poll(); // 3 queue.poll(); // 4 queue.poll(); // 5
These integers get stored in ascending order, because that's the natural ordering for
Integer. We also have the option, when creating the
PriorityQueue, to pass in a custom
Comparatorspecifying a preferred ordering.
PriorityQueue implements the interface
Queue. If we wanted a regular queue, we would probably go with
ArrayDeque. In the code above, if we changed
ArrayDeque, the code would still work, but the
5would be inserted at the end instead.
poll elements in (amortized) constant time. A
PriorityQueueneeds to uphold the ordering, and so takes logarithmic time for both
Finally, if you are sharing the queue across threads, you might want to swap in a
PriorityBlockingQueue. (Again, the code above will Just Work if you do that.) With it, you get thread safety thrown into the deal — many simultaneous
poll calls from different threads won't screw up the order or throw a
ConcurrentModificationException. (Instead, the calls will "line up" to use the queue, as it were. A wee bit meta.)
This plug-and-play aspect of the Collections Framework is one of its many underappreciated advantages. There are a few general interfaces (like Queue or Deque), and many implementations of them.
Python has a different take on priority queues. This Python code (run in the Python REPL) is the moral equivalent of the Java PriorityQueue code:
>>> queue = [4, 8, 7, 3, 1] >>> import heapq >>> heapq.heapify(queue) >>> heapq.heappush(queue, 5)
Wait, what? A priority queue is a list in Python!? Well, yes. More exactly, it's a regular list where we promise to uphold the "heap invariant", which expects elements to be sorted to a certain extent. For example, if we print our queueat this point:
>>> queue [1, 3, 5, 4, 8, 7]
Just as before, when we start popping out elements at the front, the come out in ascending order:
>>> heapq.heappop(queue) 1 >>> heapq.heappop(queue) 3 >>> heapq.heappop(queue) 4 >>> heapq.heappop(queue) 5 >>> queue [7, 8]
A colleague of mine who reviewed this article looked at the behavior of heapq above and mumbled "there must be some hidden data structure somewhere, keeping track of the order..." But no, it's all done through the list itself. Coming back to the [1, 3, 5, 4, 8, 7] list and why that upholds the invariant, that's best shown through a picture:
1 | +----+----+ | | 3 5--------------+ | | +---------+----+ 7 | | 4 8
The list encodes an implicit tree. An element at index
k have children (if they exist) at index
k*2+2. The heap invariant requires that children be greater than or equal to their parents, which is true for the above list. Running
heapq.heapify (and after subsequent operations), there's just enough sorting for the heap invariant to be satisfied.
(My colleague took all this in, and said "I prefer the Java approach". Fair enough.)
heapq has a "bring your own data structure" feel compared to Java's
PriorityQueue. The pipes are showing; the underlying list is just a regular list, and the heap invariant is yours to screw up. (But if you do, then that's egg on your face; your code won't work as it's supposed to.)
This shows a difference in philosophy and culture between Java and Python. Java exposes a class
PriorityQueue which encapsulates and hides all the implementation details; Python exposes a module of static methods, and gives you the algorithm "priority heap" which you can use to model a priority queue.
I was curious, so I went back in the
python-dev email archives to see if this approach was ever discussed. I found this quote by Guido van Rossum back in 2002:
[...] a class seems too heavy for this, just like it is overkill for bisect [...]
(Another developer had written an alternative implementation which hid everything in a class. So they had the chance to encapsulate in a class, but chose to expose everything.)
A tale of two pities
In Java, everything is neatly tucked away. In Python, the pipes are showing — you're given the tools, but it's up to you to use them right. There's no right or wrong here, it's just two approaches to library API design.
But it does nicely mirror how both languages (and their surrounding cultures) think about privacy. In Java, you declare your fields private as a matter of course — this is what helps protect your object's invariants, and allows future refactoring. Python has no
<em>private</em> keywordand no corresponding idiomatic notion of protecting your object's data — the general expectation is that you store your object data as attributes, and then no consumer of the object abuses the right to read or write them. Java's system is based on enforcement; Python's is based on social convention.
Could this cultural difference be tied to where the languages are (often) used, and the expectations of their communities? Java, found in the enterprise where "signing off on code" is a thing, operates in environments where a colleague not messing with your class's invariants is an overriding concern; Python, used in education, in startups, and in open-source projects, prioritizes openness and "showing your work", and accepts the risk of abuse with the attitude of "don't do that". Almost two decades of
heapq usage in the wild show that the informal rules are easy to stick to.
I'm actually not sure which approach I like the best. Java's enapsulation or Python's implicit trust? I can sort of argue either side on that one. It might just be that both have their place in the world.
By: Carl Mäsak