Comments
-
For the "what part of the image is most like a dog?" question, check out DeepDream: it emphasises the parts of the image that look like a dog, until everything becomes a dog.
-
@retoor lol. Don't know about brains. Like I said, this is an obvious next step. If researchers are already searching for subgraphs that perform as well as the whole graph (or using a network as a teacher to train a miniature version of itself, also called "distillation"), the next logical step is simply to do a lot of testing to find a partial activation (inference) of the network that performs nearly as well as any full activation.
What we are after is useful partial network activations that generalize as well as full activations of the graph. We dump the network state for future use, and when a prompt is entered we reload this partial state and insert a LoRA-like layer for our prompt, calculating the transform of this new layer against the precomputed prior inference state, thereby cutting computation time down significantly.
-
What you said about recognizing dogs and stuff. We can just ask ChatGPT what makes a dog a dog. I'll check. I'm interested
-
Ah, he gives a great answer if you ask him how to distinguish a dog from other animals. Laptop doesn't want to post the answer for some reason
-
Deep dream > stablediffusion.
If you ask for nightmare fuel it gives you nightmare fuel.
Jokes aside, I was thinking more along the lines of YOLOv3, where the image is split into a grid of cells and each cell proposes candidate boxes with scores over a fixed set of object categories. It can reasonably be expected that any sufficiently advanced object detection algorithm will approximate this same process, either directly or indirectly through subgraphs that emerge during training (incidentally, see induction heads for a similar emergent phenomenon in transformers).
Knowing this, an implementation of semantic centering might offer better priors for earlier probable detection of instances of some subset of the categories a system is trained to detect, thus cutting down the area and the number of categories that need to be considered altogether for any given run.
Related Rants
The next step for improving large language models (if not diffusion) is hot-encoding.
The idea is pretty straightforward:
Generate many prompts, or take many prompts as a training and validation set. Do partial inference, and find the intersection of best overall performance with least computation.
Then save the state of the network during partial inference, and use that for all subsequent inferences. Sort of like LoRA, but for inference instead of fine-tuning.
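A rough sketch of the search described above, under heavy assumptions: the "network" is a toy stand-in (each "layer" is one Newton step toward a square root), "partial inference" is taken to mean stopping after the first few layers, and `run_layers`, `pick_partial_depth`, and the tolerance value are made-up names and numbers for illustration. The only point is the selection loop: run the validation prompts at every depth and keep the cheapest depth whose output stays close to the full run.

```python
from statistics import mean


def run_layers(x, depth):
    """Toy stand-in for a network: each 'layer' is one Newton step toward sqrt(x)."""
    h = 1.0  # initial hidden state
    for _ in range(depth):
        h = 0.5 * (h + x / h)
    return h


def pick_partial_depth(prompts, full_depth, tolerance):
    """Return the smallest depth whose mean gap to the full run is within tolerance."""
    full = [run_layers(p, full_depth) for p in prompts]
    for depth in range(1, full_depth + 1):
        partial = [run_layers(p, depth) for p in prompts]
        gap = mean(abs(a - b) for a, b in zip(partial, full))
        if gap <= tolerance:
            return depth, gap
    return full_depth, 0.0  # fallback: nothing cheaper was good enough


# Hypothetical validation set; in the real idea these would be prompts.
validation_prompts = [0.5, 2.0, 3.0, 7.0, 10.0]
depth, gap = pick_partial_depth(validation_prompts, full_depth=12, tolerance=1e-3)
print(f"cheapest acceptable depth: {depth} of 12 (mean gap {gap:.2e})")
```

On a real model the gap would be a task metric (validation loss, accuracy) rather than an absolute difference, and cost would be measured in FLOPs or latency rather than layer count.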
Inference, after all, is what matters. And there has to be some subset of prompt-based initializations of a network that perform (generally) as well as a full inference step, regardless of the prompt.
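To make the "save once, reuse for every prompt" part concrete, here is a minimal sketch. It swaps in plain prefix-state caching (same spirit as KV caching) rather than the approximate, prompt-independent snapshot proposed here, so in this toy the reused result is exact. `ToyModel`, its fold-style `step`, and the step counter are hypothetical and exist only to make the saved work visible.

```python
class ToyModel:
    """Toy sequential model: the state is a rolling fold over the tokens seen so far."""

    def __init__(self):
        self.steps = 0  # number of per-token updates, a crude proxy for compute

    def step(self, state, token):
        self.steps += 1
        return (state * 31 + sum(map(ord, token))) % 1_000_003

    def run(self, tokens, state=0):
        """Fold tokens into the state, optionally resuming from a saved state."""
        for token in tokens:
            state = self.step(state, token)
        return state


model = ToyModel()
prefix = "you are a helpful assistant that answers briefly".split()
cached_state = model.run(prefix)  # computed once, then saved for later

prompts = ["what is a dog".split(), "explain lora".split()]

model.steps = 0
resumed = [model.run(p, state=cached_state) for p in prompts]
resumed_cost = model.steps  # only the prompt tokens are processed

model.steps = 0
full = [model.run(prefix + p) for p in prompts]
full_cost = model.steps  # the prefix is recomputed for every prompt

print(resumed == full, resumed_cost, full_cost)
```

The open question in the actual proposal is the approximate case: whether a single saved state, not tied to any shared prefix, keeps quality close enough to a full run. That would have to be measured on a validation set, not assumed.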
Likewise with diffusion, there likely exist some priors (based on the training data) that speed up reconstruction or lower the network loss, allowing us to substitute a 'snapshot' that has the correct distribution, without necessarily performing a full generation.
Another idea I had was 'semantic centering' instead of regional image labelling. The idea is to find some patch of an object within an image and ask, of all such patches that belong to the object, which one best describes it? If it were a dog, what patch of the image is "most dog-like", etc. I could see it being much closer to how the human brain quickly identifies objects by shortcuts. The size of such patches could be adjusted to minimize the cross-entropy of classification relative to the tested patch size (pixel-sized patches, for example, might lead to too high a training loss). It might also allow a scattershot 'at a glance' lookup of potential image contents: even if you get multiple categories for a single pixel, it greatly narrows the total span of categories you need to run subsequent searches for.
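A toy sketch of what that lookup might look like, with everything simplified: the "image" is a grid of characters, and `patch_score` is a hypothetical stand-in for a real per-patch classifier (it just counts matching cells). `patches`, `semantic_centers`, the patch size, and the threshold are all illustrative assumptions. For each category it finds the highest-scoring patch (the "most dog-like" one) and drops categories whose best patch is below the threshold, which is the narrowing step.

```python
def patches(grid, size):
    """Yield (row, col, cells) for every size x size window in the grid."""
    for r in range(len(grid) - size + 1):
        for c in range(len(grid[0]) - size + 1):
            cells = [grid[r + dr][c + dc] for dr in range(size) for dc in range(size)]
            yield r, c, cells


def patch_score(cells, marker):
    """Hypothetical stand-in for a per-patch classifier: fraction of matching cells."""
    return cells.count(marker) / len(cells)


def semantic_centers(grid, categories, size=2, threshold=0.75):
    """Map each plausible category to its best patch as (score, row, col)."""
    centers = {}
    for name, marker in categories.items():
        # Ties resolve to the last window in scan order (tuple comparison).
        best = max((patch_score(cells, marker), r, c) for r, c, cells in patches(grid, size))
        if best[0] >= threshold:
            centers[name] = best
    return centers


image = [
    "....cc",
    ".dd.cc",
    ".ddd..",
    ".ddd..",
]
categories = {"dog": "d", "cat": "c", "bird": "b"}
centers = semantic_centers(image, categories)
print(centers)  # "bird" never clears the threshold, so it is dropped
```

The `size` argument is the knob mentioned above: one would sweep it and keep the patch size that gives the lowest classification cross-entropy on held-out images.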
In other news I'm starting a new ML blackbook for various ideas. Old one is mostly outdated now, and I think I scanned it (and since buried it somewhere amongst my ten thousand other files like a digital hoarder) and lost it.
I have some other 'low-hanging fruit' type ideas for improving existing and emerging models but I'll save those for another time.
random
ml
chatgpt
stable diffusion
llm