Send VBO to GPU in separate thread


I'm working on an LWJGL game in which the world changes quite often, and because it's rather large, it causes the game to freeze for a fraction of a second every time the world and consequently the corresponding VBO is updated. I've reduced the time the game freezes for by moving all logic to a separate thread (well, it's actually the rendering code that's in a separate thread), but pushing the data to the graphics card still seems to create a noticeable delay. Is it possible to send that VBO in my logic thread so as not to slow down the game?
Additionally, if this belongs on gamedev.so, let me know so I can move it. I wasn't quite sure, so I decided to post here.

I originally wrote that you can't use multithreading for rendering with OpenGL. Apparently I was wrong, though the rest of this answer still applies if you want to fix the problem using only a single OpenGL context.
I came across the same problem when I made an infinite procedural terrain generator: each time the world updated or generated a new terrain chunk, it would freeze for a fraction of a second.
How to fix this
Basically, I fixed this by doing the following.
Create a thread pool/thread queue; then, each time the world changes, let a separate thread process/update or recreate the FloatBuffer (or whichever buffer you use). That is usually the reason for the freezing, simply because it takes a lot of time to create the buffers, fill in and change all the data, etc.
Here is a layout of what I mean.
import java.nio.FloatBuffer;
class VBOAntiFreeze {
    FloatBuffer vertex_data;
    // Just add the rest of the FloatBuffers you use as well, like FloatBuffer
    // normal_data; FloatBuffer color_data; etc.
    // You would also have all the other variables as the vbo_handle, vertices count, etc.
    // The flags are volatile because the worker thread writes them and the
    // rendering thread reads them.
    volatile boolean fresh = true;
    volatile boolean dirty, updating;
    public void updateVBO() {
        if (fresh && !updating) {
            updating = true;
            // You could execute new Threads by creating a
            // Thread Pool/Thread Queue, that way, you will
            // have some more control over all the threads.
            new Thread(new Runnable() {
                public void run() {
                    // Update and process all the FloatBuffers here!
                    dirty = true;
                    fresh = false;
                    updating = false;
                }
            }).start();
        }
    }
    public void renderVBO() {
        if (updating) {
            return;
        }
        else if (fresh) {
            updateVBO();
            return;
        }
        if (dirty) {
            // Buffer all the newly updated data
            dirty = false; // upload only once per update
        }
        // Render the VBO here
    }
}
With this approach you probably won't experience the random freezing unless your VBOs are insanely huge, in which case the if (dirty) step that buffers the new data can still freeze a little. I've never experienced that myself, but I mention it so you know.

I decided to write my comment up as an answer because it differs slightly from the other two answers in the overhead on the drawing thread. Instead of stalling the pipeline when you upload your vertex data or unmap your buffer, I propose double-buffering and exchanging the VBO used for drawing with the one used for submitting data after your worker thread finishes its update.
You will need two render contexts for this approach, each of which shares resources.
In your worker thread, you can allocate/stream subdata into a VBO like you normally would in your current approach, the only difference is you will be doing this to a VBO that is not used for drawing. When you complete filling this VBO with data, let your drawing thread know and then exchange the VBO used for drawing with the VBO used for streaming vertex data when it comes time to draw. Your worker thread should block until the drawing thread swaps VBOs in this scenario.
This way, instead of stalling your drawing thread when new data has to be submitted to the GPU, you will instead stall the thread used to stream new data until the drawing thread swaps VBOs. As a result rendering will be smooth(er), but updates may occur at more variable frequency. This is generally a desirable characteristic in interactive software like games - an extra frame of latency before something new pops up is often better than a frame that takes twice as long to finish.
If you want to queue up more than one update at a time, so that your update thread does not have to block as often, I would suggest implementing a circular buffer like derhass mentioned in his answer. But it sounds like you only need a front/back buffer from your problem description.
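The front/back exchange described in this answer could be sketched as follows. This is only a minimal sketch of the handoff logic, not working OpenGL code: the class VboDoubleBuffer and its method names are made up for illustration, the VBOs are represented by plain integer handles, and the actual upload and draw calls are left out. With shared contexts the worker would probably also need to make sure its upload has completed (for example with a flush or a fence) before publishing the handle.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical helper: passes VBO handles between an upload thread and the drawing thread. */
class VboDoubleBuffer {
    // A filled buffer waiting to be drawn, and a buffer that is free for refilling.
    private final BlockingQueue<Integer> ready = new ArrayBlockingQueue<>(1);
    private final BlockingQueue<Integer> free = new ArrayBlockingQueue<>(1);
    private int front; // only touched by the drawing thread

    VboDoubleBuffer(int frontVbo, int backVbo) {
        front = frontVbo;
        free.add(backVbo);
    }

    /** Worker thread: blocks until the drawing thread has swapped and released a buffer. */
    int acquireBack() throws InterruptedException {
        return free.take();
    }

    /** Worker thread: call after the upload into backVbo has finished. */
    void publish(int backVbo) {
        ready.add(backVbo);
    }

    /** Drawing thread: call once per frame before drawing; never blocks. */
    boolean swapIfReady() {
        Integer filled = ready.poll();
        if (filled == null) {
            return false; // no new data, keep drawing the current front buffer
        }
        free.add(front); // the old front buffer becomes the new back buffer
        front = filled;
        return true;
    }

    /** Drawing thread: handle of the VBO to draw from. */
    int front() {
        return front;
    }
}
```

The worker loop would be acquireBack(), upload, publish(), so it stalls in acquireBack() until the drawing thread has swapped, while the drawing thread only ever calls the non-blocking swapIfReady().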

In general, a GL context can be current to a single thread at any time, and a thread can have one current GL context at any time. If you want to do parallel updates of GL objects, you have two options:
Use shared contexts. That way, each thread can have its own GL context, but objects like buffers, textures, ... can be used (and modified) by both threads.
For your particular scenario, it might be enough to use (maybe a ring buffer of) mapped VBOs. You need the GL context only to map/unmap the buffer, but while it is mapped, you can access it in arbitrary threads, which don't need a GL context at all. Usually, a ring buffer of VBOs is used in such a scenario to avoid excessive synchronisation between the threads.

Related

Trigger CPU cache write back manually in java: possible? necessary?

I am writing a video game in my spare time and have a question about data consistency when introducing multi-threading.
At the moment my game is single threaded and has a simple game loop as it is taught in many tutorials:
while game window is not closed
{
    poll user input
    react to user input
    update game state
    render game objects
    flip buffers
}
I now want to add a new feature to my game where the player can automate certain tasks that are long and tedious, like walking long distances (fast travel). I may choose to simply "teleport" the player character to their destination but I would prefer not to. Instead, the game will be sped up and the player character will actually walk as if the player was doing it manually. The benefit of this is that the game world will interact with the player character as usual and any special events that might happen will still happen and immediately stop the fast travel.
To implement this feature I was thinking about something like this:
Start a new thread (worker thread) and have that thread update the game state continuously until the player character reaches its destination
Have the main thread no longer update the game state and render the game objects as usual and instead display the travel progress in a more simplistic manner
Use a synchronized message queue to have the main thread and the worker thread communicate
When the fast travel is finished or canceled (by player interaction or other reasons) have the worker thread die and resume the standard game loop with the main thread
In pseudo code it may look like this:
[main thread]
while game window is not closed
{
    poll user input
    if user wants to cancel fast travel
    {
        write to message queue player input "cancel"
    }
    poll message queue about fast travel status
    if fast travel finished or canceled
    {
        resume regular game loop
    } else {
        render travel status
        flip buffers
    }
}
[worker thread]
while (travel ongoing)
{
    poll message queue
    if user wants to cancel fast travel
    {
        write to message queue fast travel status "canceled"
        return
    }
    update game state
    if fast travel is interrupted by internal game event
    {
        write to message queue fast travel status "canceled"
        return
    }
    write to message queue fast travel status "ongoing"
}
if travel was finished
{
    write to message queue fast travel status "finished"
}
The message queue will be some kind of two-channeled synchronized data structure. Maybe two ArrayDeques with a Lock for each. I am fairly certain this will not be too much trouble.
What I am more concerned about is caching problems with the game data:
1.a) Could it be that the worker thread, after being started, may see old game data because the main thread may run on a different core which has cached some of its results?
1.b) If the above is true: Would I need to declare every single field in the game data as volatile to protect myself with absolute guarantee against inconsistent data?
2) Am I right to assume that performance would take a non trivial hit if all fields are volatile?
3) Since I only need to pass the data between threads at few and well controlled points in time, would it be possible to force all caches to write back to main memory instead of using volatile fields?
4) Is there a better approach? Is my concept perhaps ill conceived?
Thanks for any help and sorry for the big chunk of text. I thought it would be easier to answer the question if you know the intended use.

Since I only need to pass the data between threads at few and well controlled points in time, would it be possible to force all caches to write back to main memory instead of using volatile fields?
No. That's not how any of this works. Let me give you very short answers to explain why you are thinking about this the wrong way:
1.a) Could it be that the worker thread, after being started, may see old game data because the main thread may run on a different core which has cached some of its results?
Sure. Or it might for some other reason. Memory visibility is not guaranteed, so you can't rely on it unless you use something guaranteed to provide memory visibility.
1.b) If the above is true: Would I need to declare every single field in the game data as volatile to protect myself with absolute guarantee against inconsistent data?
No. Any method of assuring memory visibility will work. You don't have to do it any particular way.
2) Am I right to assume that performance would take a non trivial hit if all fields are volatile?
Probably. This would probably be the worst possible way to do it.
3) Since I only need to pass the data between threads at few and well controlled points in time, would it be possible to force all caches to write back to main memory instead of using volatile fields?
No. Since there is no "write cache back to memory" operation that assures memory visibility. Your platform may not even have such caches and the issue might be something else entirely. You're writing Java code, you don't have to think about how your particular CPU works, what cores or caches it has, or anything like that. That's one of the big advantages of using a language with semantics that are guaranteed and don't talk about cores, caches, or anything like this.
4) Is there a better approach? Is my concept perhaps ill conceived?
Absolutely. You are writing Java code. Use the various Java synchronization classes and functions and rely on them to provide the semantics they're documented to provide. Don't even think about cores, caches, flushing to memory, or anything like that. Those are hardware details that, as a Java programmer, you don't ever have to think about.
Any Java documentation you see that talks about cores, caches, or flushes to memory is not actually talking about real cores, caches, or flushes to memory. It's just giving you some ways to think about hypothetical hardware so you can wrap your brain around why memory visibility and total ordering don't always work perfectly just by themselves. Your real CPU or platform may have completely different issues that bear no resemblance to this hypothetical hardware. (And real-world CPUs and systems have cache coherency guaranteed by hardware and their visibility/ordering issues in fact are completely different!)
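To make "use the Java synchronization classes" concrete for the fast-travel case, here is a minimal sketch. The names GameState, TravelStatus and FastTravel are hypothetical, the "update game state" step is reduced to incrementing a counter, and the per-step "ongoing" message is left out. It relies on two documented guarantees: everything done before Thread.start() is visible to the started thread, and placing an object into a java.util.concurrent queue happens-before another thread removes it. No field needs to be volatile as long as only one thread touches the game state between those handoff points.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical game state that is handed to the worker for the duration of the fast travel. */
class GameState {
    int playerPosition; // plain field, deliberately not volatile
}

enum TravelStatus { FINISHED, CANCELED }

class FastTravel {
    // Main thread -> worker ("cancel") and worker -> main thread (final status).
    final BlockingQueue<String> commands = new LinkedBlockingQueue<>();
    final BlockingQueue<TravelStatus> results = new LinkedBlockingQueue<>();

    /** Starts the worker; the main thread must not touch state until a result arrives. */
    Thread start(GameState state, int destination) {
        Thread worker = new Thread(() -> {
            while (state.playerPosition < destination) {
                if ("cancel".equals(commands.poll())) {
                    results.add(TravelStatus.CANCELED);
                    return;
                }
                state.playerPosition++; // stand-in for "update game state"
            }
            results.add(TravelStatus.FINISHED);
        });
        worker.start(); // writes made before start() are visible to the worker
        return worker;
    }
}
```

The main loop would call results.poll() once per frame; as soon as it returns a status, all changes the worker made to the game state are visible to the main thread and the regular loop can resume.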

Generating data for a Java game using a second Thread

I am creating a voxel game in Java. Currently, I am using perlin noise to generate data for 3d chunks (16x16x16 short arrays) which are contained in a HashMap. This all works correctly. When the player moves, I want to render the chunks near the player (right now, 5 chunks in any direction). If a chunk does not exist, it should generate it.
The problem is that it takes about half a second to generate a chunk so when the player moves out of the generated area, the game loop freezes for a couple seconds while it generates the necessary chunks and then resumes.
I am using lwjgl for OpenGL and my game loop looks something like this:
while (!Display.isCloseRequested()){
    update(); //my update method
    render(); //my render method
    Display.update(); //refresh the screen
    Display.sync(60); //sync to 60 fps
}
I have tried, unsuccessfully, to use a second thread to generate data while updating and rendering but I could not figure out how to do it without freezing the game loop. I think there should be a way to queue chunks to generate in a second thread and then run that thread in short bursts but I have little to no experience with multithreading in Java so any help with that would be appreciated.

If the player can see five chunks around him, you could generate the chunks six rows away before he moves in that direction. That way the work is done early enough and you can display the chunks directly.
You can generate the chunks in a separate thread. It doesn't have to be called in the game loop. You have to start the generator thread before entering the game loop.

I'd have a background thread that's notified when the player moves and then pregenerates chunks adjacent to where the player moved. It's backed by a priority queue so that backlogs of never-visited cells aren't at the top of the queue, and old queue entries are removed once they're a certain distance from the player.
The key to leveraging threading is that you're generating the chunks before the player moves there, using the downtime between movements to generate chunks, possibly including chunks that turn out to be unneeded.
And the big caveat: if you offload all generation to that thread, you're going to need to use Futures so you always get back a generated chunk.
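A minimal sketch of the Future-based part of this idea follows. ChunkLoader, its methods and the placeholder generate() are assumptions made for illustration, and for brevity the background thread works through a plain FIFO executor rather than the priority queue described above.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical loader that generates chunks on one background thread. */
class ChunkLoader {
    private final ExecutorService generator = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "chunk-generator");
        t.setDaemon(true); // do not keep the JVM alive after the game exits
        return t;
    });
    private final Map<Long, Future<short[]>> chunks = new ConcurrentHashMap<>();

    /** Packs two chunk coordinates into one map key. */
    static long key(int cx, int cz) {
        return ((long) cx << 32) | (cz & 0xFFFFFFFFL);
    }

    /** Queues generation if this chunk was never requested; returns its Future either way. */
    Future<short[]> request(int cx, int cz) {
        return chunks.computeIfAbsent(key(cx, cz), k -> {
            Callable<short[]> job = () -> generate(cx, cz);
            return generator.submit(job);
        });
    }

    /** Game loop: returns the chunk if it is finished, or null without blocking. */
    short[] getIfReady(int cx, int cz) {
        Future<short[]> f = request(cx, cz);
        if (!f.isDone()) {
            return null;
        }
        try {
            return f.get();
        } catch (Exception e) {
            throw new IllegalStateException("chunk generation failed", e);
        }
    }

    /** Placeholder for the real noise-based generator. */
    private short[] generate(int cx, int cz) {
        short[] blocks = new short[16 * 16 * 16];
        Arrays.fill(blocks, (short) 1);
        return blocks;
    }
}
```

The game loop would call request() for chunks just outside the visible range whenever the player moves, and getIfReady() when rendering, skipping chunks that are not finished yet.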

Okay, I solved my problem using only one thread. I created a TaskManager which was responsible for holding an arraylist of tasks to be done. So every time I needed to generate a chunk, I would pass a task object that contained information to actually perform the task. Then, every update, I would call TaskManager.next() which would perform the next task.
Now, instead of generating 49 new chunks in one update which froze the framerate, it generates one chunk per update until they are all generated.
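A TaskManager along those lines might look like the sketch below. It is a guess at the shape of the class described above, not the original code: it uses an ArrayDeque instead of an ArrayList so that removing the first task is cheap, and the pending() method is added only for illustration.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Runs at most one queued task per call, so a long backlog is spread over many frames. */
class TaskManager {
    private final Queue<Runnable> tasks = new ArrayDeque<>();

    /** Queues a task, for example "generate the chunk at (x, y, z)". */
    void add(Runnable task) {
        tasks.add(task);
    }

    /** Call once per update; returns false if nothing was pending. */
    boolean next() {
        Runnable task = tasks.poll();
        if (task == null) {
            return false;
        }
        task.run();
        return true;
    }

    /** Number of tasks still waiting. */
    int pending() {
        return tasks.size();
    }
}
```

Everything still runs on the game-loop thread, so no synchronization is needed; the cost is that a single task that is itself slow (half a second per chunk, say) still stalls that one frame.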

java Swing using JFrame.pack() repeatedly - how efficient is it?

In short I was wondering, if I call JFrame.pack() on a frame that is already sized, will it take a long time to analyse this or will it simply return immediately? I ask for efficiency reasons. There is a picture in my frame that is being updated many times a second within a loop. Now JFrame.pack() has to be called after at least the first picture is drawn to ensure that the frame is the right size.
To handle this what I have (in pseudo code) is:
boolean flag = false
while (condition) {
    getNextPicture();
    updateFrameWithPicture()
    if (!flag) {
        frame.pack()
        flag = true;
    }
}
Now I was wondering if there would be a problem if I just left out the check for the flag and always called frame.pack(). Could the program decide fast enough if the frame is already the correct size?

Tricky to estimate differences in performance as I am displaying images received on an RTP socket.
You are micro optimizing your code. Getting external data through a socket will always be slower than repainting a frame.
I am actually using a swing timer
You should NOT use the Timer to do the actual reading of the image on the socket. The GUI will block while the image is being read. You should be using a separate Thread to read the image. It will probably be better for you to use a SwingWorker and then publish() the images as they become available. Read the section from the Swing tutorial on Concurrency for more information.
just thought it would be easier to describe my question in terms of a while loop.
No, it is better to give the actual code; as you can see, your current code is incorrect. Don't make us guess what you are doing.
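A rough sketch of the SwingWorker approach mentioned above: FrameReceiver, the Supplier used as image source and the Consumer used for display are all hypothetical stand-ins, since the real socket-reading and painting code is not shown in the question. Only the SwingWorker methods themselves (doInBackground, publish, process, isCancelled) are the actual API.

```java
import java.awt.image.BufferedImage;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;
import javax.swing.SwingWorker;

/** Hypothetical worker that reads images off the EDT and hands them to the GUI. */
class FrameReceiver extends SwingWorker<Void, BufferedImage> {
    private final Supplier<BufferedImage> source;  // blocking read, e.g. from the socket
    private final Consumer<BufferedImage> display; // updates the Swing component

    FrameReceiver(Supplier<BufferedImage> source, Consumer<BufferedImage> display) {
        this.source = source;
        this.display = display;
    }

    @Override
    protected Void doInBackground() {
        // Runs on a background thread, so a slow read never blocks the GUI.
        while (!isCancelled()) {
            BufferedImage image = source.get();
            if (image == null) {
                break; // assumed end-of-stream signal
            }
            publish(image);
        }
        return null;
    }

    @Override
    protected void process(List<BufferedImage> images) {
        // Runs on the EDT; show only the newest image if several arrived at once.
        display.accept(images.get(images.size() - 1));
    }
}
```

It would be started once with execute() after the frame is shown; the display callback could call pack() for the first image only, as in the question's flag check.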

Updating scenes in advance while still rendering previous frame

I'm currently working on a thread-safe rendering system and I would like to know your thoughts on how to correctly update the next step in the game world while the previous scene is currently rendering. Currently I am using the LWJGL OpenGL bindings with Java. Here is pseudocode for my game loop as it is currently set up (which is probably just a basic loop that most people are familiar with):
//DrawingLayers are wrappers for in game entities and has an update
//and render method
game-loop:
    addInputEventsToInputStack()
    removeCompletedDrawingLayers()
    foreach layer in DrawingLayerQueue :
        layer.update(deltaTime) //update position/color/size for entity
        layer.render() //performs OpenGL calls
        if(layer.isCompleted):
            addToCompletedDrawingLayersList()
    swapBuffers() //blocks until scene is fully rendered
    goto game-loop
My problem lies in the swapBuffers() method as it blocks until the scene is rendered which means I cannot perform any updates while that is going on. My thought on how to get around this is to:
Have a copy of all DrawingLayers that I use for updating the state of the entities and have the other copy as a reference for the rendering thread. And while a frame is rendering, kick off a thread just before swapBuffers() to update the copy that is not in use.
I'm wary of this approach as I believe creating the copies before every frame would slow the system down more than I would like.
Does my approach make sense, and if not, do you guys have any recommendations for how to do this? I'm open to a complete restructuring.
Updated: Based on datenwolf's suggestion I've changed my gameloop to the following:
//DrawingLayers are wrappers for in game entities and has an update
//and render method
//A future for the pre-processing task
Future preProcess = null
game-loop:
    //Update: checks if we have a preprocessed update to wait for
    //and waits for it to complete
    if(preProcess != null):
        preProcess.get()
        preProcess = null
    addInputEventsToInputStack()
    removeCompletedDrawingLayers()
    foreach layer in DrawingLayerQueue :
        layer.render() //performs OpenGL calls
        if(layer.isCompleted):
            addToCompletedDrawingLayersList()
    //UPDATE: the following just calls all the update methods for the layers
    // in a new thread
    preProcess = executorService.submit(new UpdateRunnable())
    swapBuffers() //blocks until scene is fully rendered
    goto game-loop
So far this has given me a significant improvement in performance. There may be some race condition issues that I can't see, but overall I'm happy with this improvement.
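For reference, the updated loop could be written in Java roughly like this. It is a sketch under assumptions: PipelinedLoop is a made-up name, and the rendering and update work are replaced by two counters so that the structure stays visible. Because each frame waits on Future.get() before rendering, the update task never runs at the same time as the render calls; it only overlaps with the buffer swap.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical loop that runs the next update while the buffer swap is blocking. */
class PipelinedLoop {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private Future<?> preProcess;
    int updates, renders; // counters standing in for the real update/render work

    /** One iteration: wait for the pending update, render, then start the next update. */
    void frame() throws InterruptedException, ExecutionException {
        if (preProcess != null) {
            preProcess.get(); // once get() returns, the update's writes are visible here
            preProcess = null;
        }
        renders++; // stand-in for layer.render() on every layer
        preProcess = executor.submit(() -> { updates++; }); // stand-in for layer.update()
        // swapBuffers() would go here and overlap with the update task
    }

    /** Waits for the last update and stops the background thread. */
    void shutdown() throws InterruptedException, ExecutionException {
        if (preProcess != null) {
            preProcess.get();
        }
        executor.shutdown();
    }
}
```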

in the swapBuffers() method as it blocks until the scene is rendered
The buffer swap blocks only partly because rendering has to finish; it usually also blocks while waiting for the vertical retrace. However, OpenGL guarantees that once any drawing command returns, the buffers it accessed can be safely modified without impairing any pending rendering operations. The implementation is required to make copies or copy-on-write mappings of all the data.
Or in short terms: Just modify the data in the buffers. As soon as drawing calls (glDrawArrays, glDrawElements) return it's safe to do so.

How to manage the game state in face of the EDT?

I'm developing a real time strategy game clone on the Java platform and I have some conceptual questions about where to put and how to manage the game state. The game uses Swing/Java2D as rendering. In the current development phase, no simulation and no AI is present and only the user is able to change the state of the game (for example, build/demolish a building, add-remove production lines, assemble fleets and equipment). Therefore, the game state manipulation can be performed in the event dispatch thread without any rendering lookup. The game state is also used to display various aggregated information to the user.
However, as I need to introduce simulation (for example, building progress, population changes, fleet movements, manufacturing process, etc.), changing the game state in a Timer and EDT will surely slow down the rendering.
Let's say the simulation/AI operation is performed every 500 ms and I use SwingWorker for a computation of about 250 ms in length. How can I ensure that there is no race condition regarding the game state reads between the simulation and the possible user interaction?
I know that the result of the simulation (which is small amount of data) can be efficiently moved back to the EDT via the SwingUtilities.invokeLater() call.
The game state model seems too complex for it to be feasible to just use immutable value classes everywhere.
Is there a relatively correct approach to eliminate this read race condition? Perhaps doing a full/partial game state cloning on every timer tick or change the living space of the game state from EDT into some other thread?
Update: (from the comments I gave)
The game operates with 13 AI controlled players, 1 human player and has about 10000 game objects (planets, buildings, equipment, research, etc.). A game object for example has the following attributes:
World (Planets, Players, Fleets, ...)
Planet (location, owner, population, type,
map, buildings, taxation, allocation, ...)
Building (location, enabled, energy, worker, health, ...)
In a scenario, the user builds a new building onto this planet. This is performed in EDT as the map and buildings collection needs to be changed. Parallel to this, a simulation is run on every 500ms to compute the energy allocation to the buildings on all game planets, which needs to traverse the buildings collection for statistics gathering. If the allocation is computed, it is submitted to the EDT and each building's energy field gets assigned.
Only human player interactions have this property, because the results of the AI computation are applied to the structures in EDT anyway.
In general, 75% of the object attributes are static and used only for rendering. The rest of it is changeable either via user interaction or simulation/AI decision. It is also ensured, that no new simulation/AI step is started until the previous one has written back all changes.
My objectives are:
Avoid delaying the user interaction, e.g. user places the building onto the planet and only after 0.5s gets the visual feedback
Avoid blocking the EDT with computation, lock wait, etc.
Avoid concurrency issues with collection traversal and modification, attribute changes
Options:
Fine grained object locking
Immutable collections
Volatile fields
Partial snapshot
All of these have advantages, disadvantages and consequences for the model and the game.
Update 2: I'm talking about this game. My clone is here. The screenshots might help to imagine the rendering and data model interactions.
Update 3:
I'll try to give a small code sample to clarify my problem, as it seems from the comments that it is misunderstood:
List<GameObject> largeListOfGameObjects = ...
List<Building> preFilteredListOfBuildings = ...
// In EDT
public void onAddBuildingClicked() {
    Building b = new Building(100 /* kW */);
    largeListOfGameObjects.add(b);
    preFilteredListOfBuildings.add(b);
}
// In EDT
public void paint(Graphics g) {
    int y = 0;
    for (Building b : preFilteredListOfBuildings) {
        g.drawString(Integer.toString(b.powerAssigned), 0, y);
        y += 20;
    }
}
// In EDT
public void assignPowerTo(Building b, int amount) {
    b.powerAssigned = amount;
}
// In simulation thread
public void distributePower() {
    int sum = 0;
    for (Building b : preFilteredListOfBuildings) {
        sum += b.powerRequired;
    }
    final int alloc = sum / (preFilteredListOfBuildings.size() + 1);
    for (final Building b : preFilteredListOfBuildings) {
        SwingUtilities.invokeLater(() -> assignPowerTo(b, alloc));
    }
}
So the overlapping is between the onAddBuildingClicked() and distributePower(). Now imagine the case where you have 50 of these kind of overlappings between various parts of the game model.

This sounds like it could benefit from a client/server approach:
The player is a client - interactivity and rendering happen on that end. So the player presses a button, the request goes to the server. The reply from the server comes back, and the player's state is updated. At any point between these things happening, the screen can be re-painted, and it reflects the state of the game as the client currently knows it.
The AI is likewise a client - it's the equivalent of a bot.
The simulation is the server. It gets updates from its clients at various times and updates the state of the world, then sends out these updates to everyone as appropriate. Here's where it ties in with your situation: The simulation/AI requires a static world, and many things are happening at once. The server can simply queue up change requests and apply them before sending the updates back to the client(s). So as far as the server's concerned, the game world isn't actually changing in real time, it's changing whenever the server darn well decides it is.
Finally, on the client side, you can prevent the delay between pressing the button and seeing a result by doing some quick approximate calculations and displaying a result (so the immediate need is met) and then displaying the more correct result when the server gets around to talking to you.
Note that this does not actually have to be implemented in a TCP/IP over-the-internet sort of way, just that it helps to think of it in those terms.
Alternately, you can place the responsibility for keeping the data coherent during the simulation on a database, as they're already built with locking and coherency in mind. Something like sqlite could work as part of a non-networked solution.
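The "server queues up change requests" part of this answer can be sketched without any networking. In the sketch below, SimulationServer and its methods are hypothetical, the world is reduced to a list of required power values, and step() reuses the allocation formula from the question's distributePower(). The point is only that the EDT never mutates the world directly; it enqueues a change, and the simulation thread applies all pending changes before each step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Consumer;

/** Hypothetical in-process "server": the only thread that touches the world is the simulation thread. */
class SimulationServer {
    private final List<Integer> buildingPower = new ArrayList<>(); // owned by the simulation thread
    private final Queue<Consumer<List<Integer>>> requests = new ConcurrentLinkedQueue<>();

    /** Any thread (e.g. the EDT): enqueue a change instead of mutating the world directly. */
    void submit(Consumer<List<Integer>> change) {
        requests.add(change);
    }

    /** Simulation thread: apply pending changes, then compute on a world nobody else touches. */
    int step() {
        Consumer<List<Integer>> change;
        while ((change = requests.poll()) != null) {
            change.accept(buildingPower);
        }
        int sum = 0;
        for (int required : buildingPower) {
            sum += required;
        }
        return sum / (buildingPower.size() + 1); // same formula as distributePower()
    }
}
```

The result of step() would then be sent back to the EDT with SwingUtilities.invokeLater(), as the question already does.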

Not sure I fully understand the behavior you are looking for, but it sounds like you need something like a state change thread/queue so all state changes are handled by a single thread.
Create an api, maybe like SwingUtilities.invokeLater() and/or SwingUtilities.invokeAndWait() for your state change queue to handle your state change requests.
How that is reflected in the gui I think depends on the behavior you are looking for. i.e. Can't withdraw money because current state is $0, or pop back to the user that the account was empty when the withdraw request was processed. (probably not with that terminology ;-) )

The easiest approach is to make the simulation fast enough to run in the EDT. Prefer programs that work!
For the two-thread model, what I suggest is synchronise the domain model with a rendering model. The render model should keep data on what came from the domain model.
For an update: in the simulation thread, lock the render model. Traverse the render model and, wherever things differ from what is expected, update the render model. When finished traversing, unlock the render model and schedule a repaint. Note that in this approach you don't need a bazillion listeners.
The render model can have different depths. At one extreme it might be an image and the update operation is just to replace a single reference with the new image object (this won't handle, for instance, resizing or other superficial interaction very well). You might not bother checking whether an item has changed and just update everything.
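As an illustration of the lock, traverse, unlock cycle, here is a minimal sketch in which the render model holds nothing but one power value per building. RenderModel, syncFrom() and snapshot() are made-up names, and a real render model would carry whatever the paint code needs.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical render model: the EDT paints from it, the simulation thread updates it. */
class RenderModel {
    private final List<Integer> powerLabels = new ArrayList<>();

    /** Simulation thread: copy over only what differs; returns true if a repaint is needed. */
    synchronized boolean syncFrom(List<Integer> domainPower) {
        boolean changed = false;
        for (int i = 0; i < domainPower.size(); i++) {
            Integer value = domainPower.get(i);
            if (i >= powerLabels.size()) {
                powerLabels.add(value);
                changed = true;
            } else if (!powerLabels.get(i).equals(value)) {
                powerLabels.set(i, value);
                changed = true;
            }
        }
        while (powerLabels.size() > domainPower.size()) {
            powerLabels.remove(powerLabels.size() - 1); // drop entries for removed buildings
            changed = true;
        }
        return changed;
    }

    /** EDT: take a consistent copy to paint from. */
    synchronized List<Integer> snapshot() {
        return new ArrayList<>(powerLabels);
    }
}
```

The simulation thread would call repaint() on the component whenever syncFrom() returns true; the lock is held only for the copy, not for the simulation itself.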

If changing the game state is fast (once you know what to change it to) you can treat the game state like other Swing models and only change or view the state in the EDT. If changing the game state is not fast, then you can either synchronize state change and do it in swing worker/timer (but not the EDT) or you can do it in separate thread that you treat similarly to the EDT (at which point you look at using a BlockingQueue to handle change requests). The last is more useful if the UI never has to retrieve information from the game state but instead has the rendering changes sent via listeners or observers.

Is it possible to incrementally update the game state and still have a model that is consistent? For example recalculate for a subset of planet/player/fleet objects in between renders/user updates.
If so, you could run incremental updates in the EDT that only calculate a small part of the state before allowing the EDT to process user inputs and render.
Following each incremental update in the EDT you would need to remember how much of the model remains to be updated and schedule a new SwingWorker on the EDT to continue this processing after any pending user inputs and rendering has been performed.
This should allow you to avoid copying or locking the game model while still keeping the user interactions responsive.

I think you shouldn't have World store any data or make changes to any objects itself, it should only be used to maintain a reference to an object and when that object needs to be changed, have the Player making the change change it directly. In this event, the only thing you need to do is synchronize each object in the game world so that when a Player is making a change, no other Player can do so. Here's an example of what I'm thinking:
Player A needs to know about a Planet, so it asks World for that Planet (how is dependent upon your implementation). World returns a reference to the Planet object Player A asked for. Player A decides to make a change, so it does so. Let's say it adds a building. The method to add a building to the Planet is synchronized so only one player can do so at a time. The building will keep track of its own construction time (if any) so the Planet's add building method would be freed up almost immediately. This way multiple players can ask for information on the same planet at the same time without affecting each other and players can add buildings almost simultaneously without much appearance of lag. If two players are looking for a place to put the building (if that is part of your game), then checking the suitability of a location will be a query not a change.
I'm sorry if this doesn't answer your question, I'm not sure if I understood it correctly.

How about implementing a pipes and filters architecture. Pipes connect filters together and queue requests if the filter is not fast enough. Processing happens inside filters. The first filter is the AI engine while the rendering engine is implemented by a set of subsequent filters.
On every timer tick, the new dynamic world state is computed based on all the inputs (Time is also an input) and a copy inserted into the first pipe.
In the simplest case your rendering engine is implemented as a single filter. It just takes the state snapshots from the input pipe and renders it together with the static state. In a live game, the rendering engine may want to skip states if there are more than one in the pipe while if you're doing a benchmark or outputting a video you'll want to render every one.
The more filters you can decompose your rendering engine into, the better the parallelism will be. Maybe it is even possible to decompose the AI engine, e.g. you may want to separate dynamic state into fast changing and slow changing state.
This architecture gives you good parallelism without a lot of synchronization.
A problem with this architecture is that garbage collection is going to run frequently, freezing all the threads every time and possibly killing any advantage gained from multi-threading.

It looks like you need a priority queue to put the updates to the model on, in which updates from the user have priority over the updates from the simulation and other inputs. What I hear you saying is that the user always needs immediate feedback on his actions, whereas the other inputs (simulation or otherwise) could have workers that may take longer than one simulation step.
Then synchronize on the priority queue.
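One way this could look, as a sketch: ModelUpdate and UpdateQueue are hypothetical names, and java.util.concurrent.PriorityBlockingQueue is used because it is already thread-safe, so no explicit synchronization on the queue is needed. A sequence number keeps updates of the same priority in the order they were posted.

```java
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical model update; user updates sort before simulation updates. */
class ModelUpdate implements Comparable<ModelUpdate> {
    private static final AtomicLong COUNTER = new AtomicLong();
    final boolean fromUser;
    final Runnable action;
    private final long sequence = COUNTER.getAndIncrement(); // keeps FIFO order within a priority

    ModelUpdate(boolean fromUser, Runnable action) {
        this.fromUser = fromUser;
        this.action = action;
    }

    @Override
    public int compareTo(ModelUpdate other) {
        if (fromUser != other.fromUser) {
            return fromUser ? -1 : 1; // user updates first
        }
        return Long.compare(sequence, other.sequence);
    }
}

class UpdateQueue {
    private final PriorityBlockingQueue<ModelUpdate> queue = new PriorityBlockingQueue<>();

    /** Any thread: post an update to the model. */
    void post(boolean fromUser, Runnable action) {
        queue.add(new ModelUpdate(fromUser, action));
    }

    /** Model thread: applies everything currently queued and returns how many updates ran. */
    int drain() {
        int applied = 0;
        ModelUpdate update;
        while ((update = queue.poll()) != null) {
            update.action.run();
            applied++;
        }
        return applied;
    }
}
```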
