FINALLY FINISHED THE THING I WAS WORKING ON SINCE LAST MARCH!!! THIS HAS PISSED ME OFF SO MUCH!!! TEARS WERE SHED!!! FEELINGS HURT!!! BUT IT'S DONE!!! I AM NEVER TRANSLATING SOMETHING FROM C++ TO C# AGAIN!!!
-
Hey, been gone a hot minute from devrant, so I thought I'd say hi to Demolishun, atheist, Lensflare, Root, kobenz, score, jestdotty, figoore, cafecortado, typosaurus, and the raft of other people I've met along the way and got to know somewhat.
All of you have been really good.
And while I'm here it's time for maaaaaaaaath.
So I decided to horribly mutilate the concept of bloom filters.
If you don't know what that is: you pick two primes, m and p, where m < p, and generate two random numbers a and b that define a function (in the usual construction, h(x) = ((a*x + b) mod p) mod m). That function is a hash.
Normally you'd have, say, five to ten different hash functions.
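A minimal sketch of that construction, as I understand it (the exact formula is my guess at the standard family, not gospel):

```python
import random

def make_hash(m: int, p: int):
    """One hash function from the family h(x) = ((a*x + b) mod p) mod m,
    assuming p prime and m < p. Each random (a, b) pair is a new function."""
    a = random.randrange(1, p)  # a != 0, otherwise h is constant
    b = random.randrange(0, p)
    return lambda x: ((a * x + b) % p) % m

# e.g. five independent hash functions mapping into 64 buckets
hashes = [make_hash(64, 2_147_483_647) for _ in range(5)]
```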
A bloom filter lets you probabilistically say whether you've seen something before, with no false negatives.
It lets you do this very space efficiently, with some caveats.
Each hash function should be uniformly distributed (any input is equally likely to be mapped to any output).
Then you interpret these output values as bit indexes.
So Hi might output [0, 1, 0, 0, 0]
while Hj outputs [0, 0, 0, 1, 0]
and Hk outputs [1, 0, 0, 0, 0]
producing [1, 1, 0, 1, 0]
And if your bloom filter has bits set in all those places, congratulations, you've seen that number before.
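Roughly, in code (a toy version, reusing make_hash from above):

```python
class BloomFilter:
    def __init__(self, n_bits: int, hash_fns):
        self.bits = [0] * n_bits
        self.hash_fns = hash_fns  # each maps a value to a bit index

    def add(self, value: int) -> None:
        for h in self.hash_fns:
            self.bits[h(value)] = 1  # set every bit the hashes point at

    def probably_contains(self, value: int) -> bool:
        # A single 0 bit is a guaranteed "never seen" (no false negatives);
        # all 1s means "probably seen" (false positives are possible).
        return all(self.bits[h(value)] for h in self.hash_fns)

bf = BloomFilter(64, [make_hash(64, 2_147_483_647) for _ in range(3)])
bf.add(42)
assert bf.probably_contains(42)
```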
It's used by big companies like Google to avoid re-indexing pages they've already seen, among other things.
Well I thought, what if instead of using it as a has-been-seen-before filter, we mangled its purpose until a square peg fit in a round hole?
Not long after, I went and wrote a script that 1) generates data, 2) generates a hash function to encode it, and 3) finds a hash function that reverses the encoding.
And it just works. Reversible hashes.
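I can't paste the actual script here, but a stripped-down brute-force version of the idea would look something like this (iteration counts and names are placeholders; make_hash is the sketch from earlier):

```python
def find_reversible_pair(data, m, p, outer=1000, inner=1000):
    """Search for h0 (encoder) and h1 (decoder) such that h1(h0(x)) == x
    for every x in data. Only possible if all values fit below m."""
    for _ in range(outer):
        h0 = make_hash(m, p)
        encoded = [h0(x) for x in data]
        for _ in range(inner):
            h1 = make_hash(m, p)
            if all(h1(e) == x for e, x in zip(encoded, data)):
                return h0, h1
    return None  # no luck within the iteration budget
```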
Of course you can't use it for compression strictly, not under normal circumstances, but these aren't normal circumstances.
The first thing I tried was finding a hash function h0 that predicts each subsequent value in a list given the previous value. This doesn't work out of the box because of hash collisions: a value like 731 might map to 64 in one place, and a later value might map to 453, so trying to invert the output to get the original sequence back would lead to branching. It occurs to me just now that we might use a checkpointing system, with lookahead to see if a branch is the correct one, but I digress; I tried some other things first.
The next problem: long sequences are slow to generate. I solved this by tuning the number of iterations of the outer and inner loops. We find h0 first, put all the inputs through h0 to generate an intermediate list, then search for an h1, put the intermediate list through it, and check whether the output of h1 matches the original input. If it does, we return h0 and h1. It turns out this can take inordinate amounts of time if h0 lands on a hash function that doesn't play well with h1, so the next step was adding an error margin.

Something fun happens there: if you allow a sequence generated by h1 (the decoder) to match *within* an error margin, then under a certain error value it finds candidate hash functions such that the outputs of h1 are *always* the same distance from their parent values in the original input to h0. This becomes our salt value k.
So our hash-function generator, called encoder_decoder() or 'ed' (lol, two-letter functions), also calculates the k value and outputs it along with the hash functions for our data.
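My reading of the error-margin trick, sketched (placeholder names again; ed() here returns the salt alongside the pair):

```python
def ed(data, m, p, outer=1000, inner=1000):
    """encoder_decoder(): like find_reversible_pair, but accept h1 if every
    decoded value sits at the SAME distance from its original. That
    constant distance is returned as the salt k."""
    for _ in range(outer):
        h0 = make_hash(m, p)
        encoded = [h0(x) for x in data]
        for _ in range(inner):
            h1 = make_hash(m, p)
            offsets = {x - h1(e) for e, x in zip(encoded, data)}
            if len(offsets) == 1:              # same distance everywhere
                return h0, h1, offsets.pop()   # decode as h1(e) + k
    return None
```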
This is all well and good, but what if we want to go further? With a few tweaks (taking the output values, converting them to binary, and left-padding each value with 0s), we can then calculate Shannon entropy in its most essential form.
Turns out that with tens of thousands of values (and tens of thousands of bits), the output of h1 with the salt has a higher entropy than the original input. Meaning that finding an h1 and h0 hash function for your data is equivalent to compression below the known Shannon limit.
By how much?
Approximately 0.15%
Of course this doesn't factor in the five numbers you need (a0 and b0 to define h0, a1 and b1 to define h1, and the salt value), so it probably works out to about the same. I'd like to see what the savings are with even larger sets, though.
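For the curious, "Shannon entropy in its most essential form" I take to be the per-bit entropy of the 0-padded binary concatenation, something like:

```python
from collections import Counter
from math import log2

def bit_entropy(values, width: int) -> float:
    """H = -sum(p * log2(p)) over the 0/1 distribution of the
    concatenated fixed-width binary representations."""
    bits = "".join(format(v, f"0{width}b") for v in values)
    n = len(bits)
    return -sum((c / n) * log2(c / n) for c in Counter(bits).values())
```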
Next I said, well what if we COULD compress our data further?
What if all we needed were the numbers to define our hash functions, a starting value, a salt, and a number to represent 'depth'?
What if we could rearrange this system so we *could* use the starting value to represent n subsequent elements of our input x?
And that's what I did.
We break the input into blocks of 15-25 items, because that's the fastest size to work with and find hashes for.
We then follow the math to get a block consisting of:
H0, H1, H2, H3, depth (how many items our 1st item will reproduce), and a starting value, i.e. the 1st item in this slice of our input.
x goes into h0, giving us y. y goes into h1 -> z, z into h2 -> y, y into h3, giving us back x.
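That chain, transcribed literally (whether "depth" means iterating this round trip or something subtler is my guess from the description above):

```python
def round_trip(h0, h1, h2, h3, x):
    """x -> h0 -> y -> h1 -> z -> h2 -> y -> h3 -> x, as described.
    If the four hashes were found correctly, the result equals x."""
    y = h0(x)
    z = h1(y)
    y = h2(z)
    return h3(y)
```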
The rest is in the image.
Anyway good to see you all again.
Alright, so now I have to extend a brand new application, released to LIVE just weeks ago by devs at our client's company. This application is advertised as a very well structured, easy-to-work-on, µservices-based masterpiece.
Well either I lack a loooot of xp to understand the "µservices", "easy to work on" and "well structured" parts in this app or I'm really underpaid to deal with all of this...
- part of business logic is implemented in controllers. Good luck reusing it w/o bringing up all the mappings...
- magic numbers every-fucking-where... I tried adding some constants to make it at least a tiny bit more configurable... I was yelled at by the lead dev of the app for this later.
- CRUD-only subservices (wrapped by facade-like services, but still... CRUD (sub)services? Then what's a repository for...?). As a result, devs didn't have a place to write business logic. So business logic now lives in: controllers (also responsible for mapping) and helpers (also application layer; used by controllers; using services).
- no transactions wrapping several actions, like removing an item from the CURRENT table and then recreating it in the HISTORY table. No rollback/recovery mechanism in the service layer if things go south. (A minimal sketch of the fix follows this list.)
- no clean code. One can easily find lines (streams) 400+ columns long.
- no encapsulation. Object fields are accessed directly.
- Controllers, once they get a result from Services (i.e. the Facade), must have a tree of: if (result instanceof SomeService.SomeSubservice1.Item1) {...} else if (result instanceof SomeService.SomeSubservice2.Item4) {...} etc. to build a proper DTO. IMO this is not the way to do abstraction: the application should NOT know the services' internals.
- µservices use different tables (hats off for this one!) but their records must share the same IDs. E.g. if I order a burger and a coke, there are 2 order items in my order #442. When I make a payment, I create an invoice which must have id #442. And I'm talking about the data layer, not the service or application (DTO) layer! Shouldn't µservices be loosely coupled and able to serve independently...? What happens if I reuse the InvoiceµService in some other app?
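For the missing-transactions point above: the CURRENT -> HISTORY move should be atomic. A minimal sketch of what that looks like (sqlite3 purely for illustration, table names made up; the real app is obviously not Python):

```python
import sqlite3

def archive_item(conn: sqlite3.Connection, item_id: int) -> None:
    """Move a row from CURRENT to HISTORY in one transaction:
    either both statements commit, or an exception rolls both back."""
    with conn:  # sqlite3 connection as context manager = one transaction
        row = conn.execute(
            "SELECT id, payload FROM current_items WHERE id = ?",
            (item_id,),
        ).fetchone()
        if row is None:
            raise KeyError(f"no such item: {item_id}")
        conn.execute(
            "INSERT INTO history_items (id, payload) VALUES (?, ?)", row
        )
        conn.execute("DELETE FROM current_items WHERE id = ?", (item_id,))
```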
What are your thoughts?