91
Root
6y

API Guy.

He has a serious regex problem.
Regexes are never easy to read, but the ones he uses just take the cake. They're either blatantly wrong, or totally over-engineered garbage that somehow still lacks basic functionality. I think "garbage" here is a little too nice, since you can tell what garbage actually is/was without studying it for five minutes.

In lieu of an actual rant (mostly because I'm overworked), I'll just leave a few samples here. I recommend readying some bleach before you continue reading.

Not-a-valid URL name regex:
VALID_URL_NAME_REGEX = /\A[\w\-]+\Z/
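To make the complaint concrete, here's a quick Ruby check. It assumes the codebase is Ruby, since that's what the \A/\Z anchors suggest. In Ruby, \Z also matches just before a final newline, so a name with a trailing "\n" passes. Nothing stops hyphen-only names either. STRICT_SLUG_REGEX is a hypothetical stricter rule that assumes a "URL name" means a lowercase slug. It's a sketch, not the one true definition.

```ruby
# As quoted in the rant. \Z matches at the end of the string OR just before a final "\n".
VALID_URL_NAME_REGEX = /\A[\w\-]+\Z/

# Hypothetical stricter rule (assumption: lowercase letters/digits separated by
# single hyphens). \z anchors at the true end of the string.
STRICT_SLUG_REGEX = /\A[a-z0-9]+(?:-[a-z0-9]+)*\z/

["my-page", "my-page\n", "--", "-draft-"].each do |name|
  puts format("%-12s original=%-5s strict=%s",
              name.inspect,
              VALID_URL_NAME_REGEX.match?(name),
              STRICT_SLUG_REGEX.match?(name))
end
```

The original accepts all four inputs. The stricter sketch accepts only "my-page".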

Semi-decent email regex (by far the best of the four):
VALID_EMAIL_REGEX = /\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\z/i
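Running a few addresses through the quoted email regex shows why "semi-decent" is about right. The sample addresses and notes are my own picks, not from the original codebase. Plus aliases get through. An apostrophe is rejected, although RFC 5322 allows it in the local part. A punycode TLD is rejected too. Meanwhile a doubled dot in the domain and leading/trailing dots in the local part are accepted.

```ruby
# As quoted in the rant (case-insensitive).
VALID_EMAIL_REGEX = /\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\z/i

# Hypothetical sample inputs, each with a short note on why it's interesting.
samples = {
  "user+tag@gmail.com"    => "plus alias",
  "o'brien@example.com"   => "apostrophe, allowed by RFC 5322",
  "user@example.xn--p1ai" => "punycode TLD",
  "user@example..com"     => "empty domain label",
  ".user.@example.com"    => "leading/trailing dot in local part",
}

samples.each do |addr, note|
  puts format("%-24s %-5s %s", addr, VALID_EMAIL_REGEX.match?(addr), note)
end
```

The results come out true, false, false, true, true. Two valid addresses are rejected and two invalid ones are accepted.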

Over-engineered mess that only works for (most) US numbers:
VALID_PHONE_REGEX = /1?\s*\W?\s*([2-9][0-8][0-9])\s*\W?\s*([2-9][0-9]{2})\s*\W?\s*([0-9]{4})(\se?x?t?(\d*))?/
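The phone regex has two easy-to-miss problems. First, it has no anchors, so it happily matches ten digits buried inside a longer number. Second, its extension group wants the digits to follow "ext" immediately, so "ext 99" captures an empty extension. The describe_phone helper below is hypothetical and exists only to print the captures.

```ruby
# As quoted in the rant: no \A or \z anchors, optional " ext" suffix.
VALID_PHONE_REGEX = /1?\s*\W?\s*([2-9][0-8][0-9])\s*\W?\s*([2-9][0-9]{2})\s*\W?\s*([0-9]{4})(\se?x?t?(\d*))?/

# Hypothetical display helper: area, exchange and line groups, plus the extension digits.
def describe_phone(input)
  m = VALID_PHONE_REGEX.match(input)
  return "#{input.inspect} -> no match" if m.nil?

  "#{input.inspect} -> #{m[1]}-#{m[2]}-#{m[3]}, ext=#{m[5].inspect}"
end

puts describe_phone("(212) 555-1234")
puts describe_phone("212-555-1234 ext 99")   # extension group captures "", the 99 is dropped
puts describe_phone("order #4125551234567")  # unanchored: matches inside a longer digit run
```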

And for the grand finale:

ZIP_CODE_REGEX = /(^\d{5}(-\d{4})?$)|(^[ABCEGHJKLMNPRSTVXY]{1}\d{1}[A-Z]{1} *\d{1}[A-Z]{1}\d{1}$)|GIR[ ]?0AA|((AB|AL|B|BA|BB|BD|BH|BL|BN|BR|BS|BT|CA|CB|CF|CH|CM|CO|CR|CT|CV|CW|DA|DD|DE|DG|DH|DL|DN|DT|DY|E|EC|EH|EN|EX|FK|FY|G|GL|GY|GU|HA|HD|HG|HP|HR|HS|HU|HX|IG|IM|IP|IV|JE|KA|KT|KW|KY|L|LA|LD|LE|LL|LN|LS|LU|M|ME|MK|ML|N|NE|NG|NN|NP|NR|NW|OL|OX|PA|PE|PH|PL|PO|PR|RG|RH|RM|S|SA|SE|SG|SK|SL|SM|SN|SO|SP|SR|SS|ST|SW|SY|TA|TD|TF|TN|TQ|TR|TS|TW|UB|W|WA|WC|WD|WF|WN|WR|WS|WV|YO|ZE)(\d[\dA-Z]?[ ]?\d[ABD-HJLN-UW-Z]{2}))|BFPO[ ]?\d{1,4}/

^ which, by the way, doesn't match e.g. Australian zip codes. That cost us quite a few sales. And yes, that is 512 characters long.
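A quick Ruby run of the quoted constant, copied verbatim, confirms the Australian problem and shows one more. Only the US and Canadian clauses carry ^ and $, so the GIR, UK and BFPO branches match anywhere inside a longer string. In Ruby, ^ and $ are line anchors anyway, not string anchors. The per-country table is a hypothetical alternative that assumes the form also collects a country code. Its patterns are simplified illustrations, not complete rules. It also accepts unknown countries instead of rejecting them, since rejections are what cost the sales.

```ruby
# Quoted verbatim from the rant.
ZIP_CODE_REGEX = /(^\d{5}(-\d{4})?$)|(^[ABCEGHJKLMNPRSTVXY]{1}\d{1}[A-Z]{1} *\d{1}[A-Z]{1}\d{1}$)|GIR[ ]?0AA|((AB|AL|B|BA|BB|BD|BH|BL|BN|BR|BS|BT|CA|CB|CF|CH|CM|CO|CR|CT|CV|CW|DA|DD|DE|DG|DH|DL|DN|DT|DY|E|EC|EH|EN|EX|FK|FY|G|GL|GY|GU|HA|HD|HG|HP|HR|HS|HU|HX|IG|IM|IP|IV|JE|KA|KT|KW|KY|L|LA|LD|LE|LL|LN|LS|LU|M|ME|MK|ML|N|NE|NG|NN|NP|NR|NW|OL|OX|PA|PE|PH|PL|PO|PR|RG|RH|RM|S|SA|SE|SG|SK|SL|SM|SN|SO|SP|SR|SS|ST|SW|SY|TA|TD|TF|TN|TQ|TR|TS|TW|UB|W|WA|WC|WD|WF|WN|WR|WS|WV|YO|ZE)(\d[\dA-Z]?[ ]?\d[ABD-HJLN-UW-Z]{2}))|BFPO[ ]?\d{1,4}/

["90210-1234", "K1A 0B1", "SW1A 1AA", "2000", "junk SW1A 1AA junk"].each do |code|
  puts "#{code.inspect}: #{ZIP_CODE_REGEX.match?(code)}"
end

# Hypothetical per-country table (assumption: the form collects an ISO country code).
# Each pattern is anchored with \A and \z; the rules are simplified, not exhaustive.
POSTCODE_PATTERNS = {
  "US" => /\A\d{5}(?:-\d{4})?\z/,
  "CA" => /\A[ABCEGHJKLMNPRSTVXY]\d[A-Z] ?\d[A-Z]\d\z/,
  "AU" => /\A\d{4}\z/,
}.freeze

def valid_postcode?(country, code)
  pattern = POSTCODE_PATTERNS[country]
  # Unknown country: accept rather than block the sale.
  return true if pattern.nil?

  pattern.match?(code.to_s.strip.upcase)
end

puts valid_postcode?("AU", "2000")  # true
puts valid_postcode?("US", "2000")  # false
```

The first loop prints true for every input except the Australian "2000".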

Comments
  • 52
    *sounds of something breaking*
  • 35
    I still laugh when I see someone try to go all out with regex to validate an email address. It will never work for everything, no matter how elaborate (and long!) it gets. The only way to truly validate it is to--wait for it--send it an email!
  • 7
    Also, wow on those other ones.
  • 27
    Damn, scrolling down into that last one was like lifting up the carpet and finding a nest of spiders underneath. 😨😱
  • 8
    @duckWit email validation for specific domains is about all you can do safely.

    Otherwise yep, send it an email and await the click back.
  • 9
    @shellbug The zip code regex is actually pretty simple if you break it into its three clauses; it just looks like a scary monstrosity. I find the phone number regex significantly more confusing.
  • 11
    So being an API guy, he's rejecting requests that don't match in these instances? Unless you know exactly what you're looking for, that's a really bad practice.
  • 13
    @projektaquarius I know, just inspect the <h1> and give it max-width: 60vw if you want to read it
  • 7
    @devTea I like server side solutions to my client side problems. I am a typical user.
  • 14
    @duckWit yep!
    This, among other reasons, is why we've lost a lot of sales and signups. Can't buy or register if your zip isn't valid! What's valid? Who knows! Same for email, phone, ...

    My nickname for him is API Guy because every API he has ever written has been untested, broken, and buggy as hell, even after he's "fixed" it several times. (And no, it's not limited to just his APIs: basically everything he wrote is shoddy at best.)
  • 3
    Iamroot iamroot iaaammroot
    l+a+m+r+oo+t+
  • 4
    I like the phone regex because it has sex in it. Hihi :)
  • 9
    @AlexDeLarge I'm so glad he quit. I'd feel bad chastising him every single day about shit like this.

    Well, maybe.
    He's absolutely deserving of my ire.

    Bah, here I am being nice again. I'm disgusted with myself.
  • 3
    Well, at least it wasn't 511 or 513 😁
  • 3
    For email validation there is a regex on StackOverflow.
    It's just fucking stupid of him to tinker together his own shitty pile of characters.
    https://stackoverflow.com/a/201378
  • 1
    @projektaquarius rip RAM-- just a few tabs...
  • 5
    @Root wow, wait a minute, don't go saying I'm untested, broken and buggy like that, let people figure on their own.
  • 4
    @hell 😅💛
  • 3
    Don’t hate me, but I actually like optimizing regexes.
    Whenever I find one in the code, I lose myself in the deepest darkness of sublime.

    Maybe it's because they really are always over-engineered; that makes them so easy to optimize.
  • 5
    @just8littleBit I love regexes, too 😊 but only for simple pattern matching. Anything complicated or with lots of variation like phone numbers, zip codes, email addresses, markup, etc. is almost always better handled with a parser.
  • 2
    @Root Agreed. And of course for filtering log files by whatever you need :)
  • 3
    @devTea Z̨̥̘̘̘̬̻̖̲̰͒̔̉̔͛̿͛̊̀͛ą̡̛̠̜͚̼͙͍̞̠̈́͛̄͑̇̍̓͠͝ļ̖͇͇͙̹̦̗̭͙̈͑͑́͑̈́̀̈́̕͝g̢̹̘͚̬̩̪̦̟͒͒́͌̉̔̑̾̑̂ͅǫ̛̛̮̥͕̠̝̪̙̩̝͗̄͌̌́̒̊͝ ̛̺̖̺͖̰͔̘̭̼͑͐̉͊̏̀͗̌͘͜r̟̹̲̥͔̱͖̘̳͕̃̀͛͑̄̎͊̎̀̇e̡̛̟̹̱̫̝̞̱̰̮͐́͆̓̎̔̌̋̚ģ̡̘̦̠̘̣̤͎̰̀͗́̈́͒͌̊͑̿͗e̢̛̬̹͈͎̬̮͕̓͋̇͑̓̌̋͒͘͜ͅx̡̧̲̰̩͍̖̻̙̤̎̈́̿͂̍͐̌̓̍̅
  • 7
    Regex email verification usually doesn't allow the "+" sign, which breaks my Gmail aliases. Once I had to top up my prepaid mobile phone and ended up changing the JS on the website, because the rest of the form only activated when the email was "valid" by their standards. Of course, when I reported it, some consultant said my email was invalid, etc. AFTER I FREAKING TOLD THEM EXACTLY WHICH LINE THE REGEX WAS ON AND ASKED THEM TO FORWARD MY MESSAGE TO THE TECH GUYS.

    [The other phone company I have a number with forwarded my message to the tech guys like I asked. And I got bonus money on my prepaid number "for the inconvenience caused".]
  • 3
    Jesus Christ that is a lot of tabs, @projektaquarius. And I thought I was bad! 😂
  • 1
    @duckWit validating an e-mail address itself can be regex-based, but people take it much too far.

    /(.?)+@(.?)+.(.?)/
    And be fucking done with it...
    Or just use the damn tools your language gives you, e.g. filter_var in PHP

    Validating the server and/or domain behind it is different, and will still never be the final check you'd need.

    Validating that the mail was sent will still let you send e-mails to uncaught and even invalid addresses; the receiving (SMTP) server will probably just say 'Thanks', and the mail could still end up nowhere at all, never be read, etc.

    Validating that the mail was received happens on a totally different level and will never be completely watertight.

    Even human interaction, like e-mail confirmation links, will never be the end game: think of bot mailboxes and temporary boxes, or messed-up aliases, SPF, DMARC, anti-spam, anti-virus, firewalls, encoding shit keys, ...
  • 3
    @xewl I agree with everything except that regex 😋
  • 2
    Yeah that’s a little too lax imo @xewl. That would match @@@... as a valid email address.

    I think at the very least throw some character classes in there.
  • 1
    @Root Yeah, it's too crude, but it was off the top of my head

    The bottom one on https://regular-expressions.info/em... seems fine to me.
  • 1
    Honestly though, why is this even something we need to worry about? Why don’t platforms/languages have built-in ways to detect these things? At least when it comes to standard formats like emails and URLs.
  • 2
    @xewl this is what I do. Simply:

    1) check for '@' symbol.

    2) send an email with a verification link

    3) Don't let the user do anything until they verify the email.

    The End.

    It's already an impossible task for all the reasons you mentioned, so there's no use in overcomplicating it. If you need to, use a service that identifies temporary email addresses and reject the address if it matches, but of course not even that is perfect.

    I find there is no point in spending so much time on something that has far too many loopholes. If the user wants to complicate their experience I can't stop them.
  • 2
    @devios1 Because even languages don't always use the correct checks. Some languages do have a way to do these things, but it either arrives later, once the language reaches a certain maturity, or it's implemented totally wrong. :')

    This is what RFCs are for, though.
  • 1
    @duckWit What if you get an error trying to send the email to an invalid address?
  • 1
    @duckWit Time. That's exactly why we c/p this shit :')

    @devios1 You delete (or don't create) the actual account and redirect back to the register form? xD
  • 1
    @xewl I’m assuming they would be competent enough to implement it according to the standard, but you’re right maybe that’s putting too much faith in them.
  • 1
    @devios1 then the user can't do anything until they correct their email. Give them blocking options, something like this:

    We've sent your verification email. Don't see it? Check your spam/junk folder, or:

    1) Click here to resend verification email

    2) Click here to change email
  • 2
    @duckWit Nah that’s fucked, sorry. You gotta at least do your job as a programmer.
  • 1
    @devios1 huh? How is that not doing my job as a programmer? I'm simply putting all the responsibility of the user verifying their email on the user.

    It feels like there's a misunderstanding in what I'm saying.
  • 1
    @duckWit Ignoring errors and pretending they didn't happen? That sounds highly fucked to me.
  • 1
    @devios1 the general idea of what I'm saying is being overlooked with a focus on implementation details.

    Obviously, go ahead and record the error if the email didn't go through, and enhance that blocking prompt with something like "Looks like there was a problem with your email address. We tried sending the verification email to [email address], but it didn't work:

    1) resend verification email

    2) update/change email address"

    My point remains: if the requirement is to validate the user's email address, the only real validation we can do is to not let the user use the application until they have verified the email by clicking the verification link.
  • 2
    @duckWit Yes, but the act of verifying the email address is in itself a safety check right? I mean you could just assume that they typed the right email in the first time and not even bother to verify it.

    Anyway, I wouldn't argue client-side validation is necessarily always a requirement, but at the very least you need to catch any real errors that occur and communicate them to the user somehow. That's really the only beef I had with what you said.

    How far you go to make things easy on your users beyond that is up to you; I'm just saying if I type in "123 Falsewood Lane" into the email address field, it shouldn't tell me "We've sent your verification email", because it hasn't.
  • 3
    @devios1 I understand and agree with what you're saying. What you're saying was implied in what I said; at least that was my intent. The user is given all the necessary tools to correct the issue. How detailed you get when displaying those correction tools is an implementation detail.

    Playing devil's advocate, even if I don't tell them there was an error with sending the email, they *can* investigate and learn that they typed in the wrong email and then correct it. Obviously that's not the best user experience, but it is still within their ability to rectify.
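duckWit's three steps fit in a short Ruby sketch: check for '@', send a verification link, and block the user until they verify. It also covers devios1's point that a failed send should be reported rather than hidden. Everything here is hypothetical: Registration, Account and the mailer callable are illustrative names, not from any framework. The mailer is assumed to raise when a send fails. A real app would persist accounts and use a constant-time token comparison.

```ruby
require "securerandom"

# Hypothetical account record; a real app would persist this somewhere.
Account = Struct.new(:email, :token, :verified, :last_error, keyword_init: true)

# Hypothetical registration flow. `mailer` is any callable taking
# (email, token) that is assumed to raise if the send fails.
class Registration
  def initialize(mailer)
    @mailer = mailer
  end

  # Steps 1 and 2: the only syntax check is "contains an @", then try to send
  # the verification link and report a failed send instead of hiding it.
  def register(email)
    email = email.to_s.strip
    return [:invalid, nil] unless email.include?("@")

    account = Account.new(email: email, token: SecureRandom.urlsafe_base64(24), verified: false)
    begin
      @mailer.call(account.email, account.token)
      [:pending, account]
    rescue StandardError => e
      account.last_error = e.message
      [:send_failed, account] # UI can offer "resend" or "change email address"
    end
  end

  # Step 3: the account stays locked until the emailed token comes back.
  # Once verified, a later wrong token does not un-verify it.
  def verify(account, token)
    account.verified = account.verified || account.token == token
  end

  def can_use_app?(account)
    account.verified
  end
end

outbox = []
flow = Registration.new(->(email, token) { outbox << [email, token] })
p flow.register("123 Falsewood Lane").first  # :invalid
status, account = flow.register("user+tag@example.com")
p status                                     # :pending
p flow.can_use_app?(account)                 # false
flow.verify(account, outbox.last[1])
p flow.can_use_app?(account)                 # true
```

Keeping the syntax check down to "contains an @" matches the thread's consensus. The verification link does the real validation, and the :send_failed branch is where the "resend / change email" prompt would hook in.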