Read this post and wrote a simple Tampermonkey script as a solution.

// ==UserScript==
// @name         Fix community link
// @version      0.1
// @description  Rewrite links to remote Lemmy communities so they open on this instance
// @match        https://sh.itjust.works/post/*
// ==/UserScript==

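(function() {
    'use strict';
    // Select every link that is NOT already marked as a community link.
    const LINK_SELECTOR = "a:not(.community-link)";

    // Rewrite https://other.instance/c/name to the local form /c/name@other.instance.
    const fixLinks = (aTags) => {
        for (const aTag of aTags) {
            // Heuristic: a path starting with "/c/" is assumed to be a Lemmy community.
            const isCommunityLink = aTag.pathname.startsWith("/c/");
            // Leave local links and links that already carry an "@instance" suffix alone.
            const isRemote = aTag.host !== location.host && !aTag.pathname.includes("@");
            if (isCommunityLink && isRemote) {
                aTag.href = aTag.pathname + "@" + aTag.host;
            }
        }
    };

    // The post body may be missing, so guard against null.
    const postContent = document.getElementById("postContent");
    if (postContent) {
        fixLinks(postContent.querySelectorAll(LINK_SELECTOR));
    }

    for (const comment of document.getElementsByClassName("comment-content")) {
        fixLinks(comment.querySelectorAll(LINK_SELECTOR));
    }
})();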

Any advice? I especially hate that the only way to check whether a link points to a Lemmy community is through the pathname, but I figured there can't be a real solution besides listing all the Lemmy instances or actually making a request somehow.
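For anyone who wants to try the pathname heuristic outside the page, here is a minimal sketch of it as a plain function. The name toLocalCommunityPath and the localHost parameter are made up for illustration; the "/c/" check is the same guess the script makes, so any non-Lemmy site with a /c/ path would still be misdetected.

```javascript
// Hypothetical helper: returns the local path for a remote community link,
// or null when the link should be left alone.
function toLocalCommunityPath(href, localHost) {
    let url;
    try {
        url = new URL(href);
    } catch {
        return null; // not a parseable absolute URL
    }
    const isCommunity = url.pathname.startsWith("/c/"); // same heuristic as the script
    const isLocal = url.host === localHost;
    const alreadyQualified = url.pathname.includes("@"); // e.g. /c/name@other.instance
    if (!isCommunity || isLocal || alreadyQualified) {
        return null;
    }
    return url.pathname + "@" + url.host;
}

console.log(toLocalCommunityPath("https://lemmy.world/c/linux", "sh.itjust.works"));
// "/c/linux@lemmy.world"
```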

Any inputs are welcome!

  • @Andy@programming.dev
    3
    10 months ago

    I just posted this on the post you linked, but yeah I am hardcoding a list of instances into my solution. Here's my comment, copied:


    I'm using the Firefox addon Redirector, and one of my rules there redirects community links using regex. I keep adding to the pattern as I encounter more instances. Currently it's:

    Pattern: https://(lemdro\.id|lemmy\.run|beehaw\.org|lemmy\.ml|sh\.itjust\.works|lemmy\.fmhy\.ml|lemmy\.dbzer0\.com|lemmy\.world|sopuli\.xyz|lemmy\.kde\.social|lemmygrad\.ml|mander\.xyz|lemmy\.ca|zerobytes\.monster)/c/(.*)
    
    Redirect to: https://programming.dev/c/$2@$1
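To make the capture groups concrete, here is a rough JavaScript sketch of what that rule does: $1 is the instance host and $2 is everything after /c/. The redirect function is hypothetical, and the ^...$ anchoring assumes the pattern has to match the whole URL; it is not taken from Redirector's own code.

```javascript
// Hosts copied from the pattern above.
const hosts = [
    "lemdro.id", "lemmy.run", "beehaw.org", "lemmy.ml", "sh.itjust.works",
    "lemmy.fmhy.ml", "lemmy.dbzer0.com", "lemmy.world", "sopuli.xyz",
    "lemmy.kde.social", "lemmygrad.ml", "mander.xyz", "lemmy.ca", "zerobytes.monster",
];

// Escape the dots and rebuild the pattern, anchored to the whole URL (assumption).
const escapedHosts = hosts.map((h) => h.replace(/\./g, "\\."));
const pattern = new RegExp("^https://(" + escapedHosts.join("|") + ")/c/(.*)$");

// Hypothetical stand-in for the redirect rule: https://programming.dev/c/$2@$1
function redirect(url) {
    const m = pattern.exec(url);
    return m ? `https://programming.dev/c/${m[2]}@${m[1]}` : null;
}

console.log(redirect("https://lemmy.world/c/linux"));
// "https://programming.dev/c/linux@lemmy.world"
```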
    
    • @ezchili@iusearchlinux.fyi
      3
      edit-2
      10 months ago

      Never use regex on URLs; a regex isn't enough to do the job properly. You already have a fast, correct URL parser in your browser or your Node binary, so use it: make a list of hostnames, extract each link's hostname with the URL API, and match it against the list.
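A minimal sketch of that suggestion, assuming a hand-maintained set of hostnames (only three entries are shown here, and parseCommunityLink is an illustrative name, not an existing API):

```javascript
// Sample of known instances; in practice this would be the full list from the thread.
const KNOWN_INSTANCES = new Set(["lemdro.id", "lemmy.world", "sh.itjust.works"]);

// Returns { instance, community } for a community link on a known instance, else null.
function parseCommunityLink(href) {
    let url;
    try {
        url = new URL(href); // the parser handles case, credentials, ports, etc.
    } catch {
        return null; // not a valid absolute URL
    }
    if (!KNOWN_INSTANCES.has(url.hostname)) return null;
    if (!url.pathname.startsWith("/c/")) return null;
    const community = url.pathname.slice(3).split("/")[0];
    return community ? { instance: url.hostname, community } : null;
}

console.log(parseCommunityLink("https://lemmy.world/c/linux"));
// { instance: 'lemmy.world', community: 'linux' }
```

The hostname lookup is an exact set membership test, so a known hostname appearing in a query string or path of some other site cannot match.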

        • @ezchili@iusearchlinux.fyi
          2
          edit-2
          9 months ago

          Oh man I was hoping you'd ask because URLs are way worse than people imagine and that's still not even a tenth of what emails can do

          HtTpS://user:pw@lemdro.id:443 is a valid url to lemdro.id and should match but will not

          Http://maliciouswebsite.to/?q=http://lemdro.id will match but should not

          To give you an idea of how bad this is, I'd like anyone reading to tell me whether their Lemmy app parsed those properly, because Thunder treats the first one as an email and Jerboa thinks both are URLs.

          You also have instances with a valid address that has www in front of it, out of old-school internet habits. There are URLs that can have quotes in them, and URLs with Chinese or Russian characters that are valid in their own encoding but have a canonical form in ASCII.

          It's a mess, and the correct way to do this is still faster than your regex in the end which is crazy
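For what it's worth, here is a small sketch of what the built-in parser does with the cases listed above. The helper normalizedHost is hypothetical, and stripping a leading "www." is my own assumption (that www.example and example are the same instance); the parser itself does not do that.

```javascript
// Hypothetical helper: canonical hostname of a URL, minus a leading "www.".
function normalizedHost(href) {
    const { hostname } = new URL(href); // lowercases the host, drops credentials and port
    return hostname.replace(/^www\./, ""); // assumption: www.X and X are the same site
}

const odd = new URL("HtTpS://user:pw@lemdro.id:443");
console.log(odd.protocol); // "https:"
console.log(odd.hostname); // "lemdro.id"
console.log(odd.port);     // "" (443 is the default port for https, so it is dropped)

console.log(normalizedHost("Http://maliciouswebsite.to/?q=http://lemdro.id"));
// "maliciouswebsite.to"

// Non-ASCII hostnames come back in their ASCII (punycode) form, starting with "xn--".
console.log(normalizedHost("https://пример.рф/"));
```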

          • @Andy@programming.dev
            0
            9 months ago

            HtTpS://user:pw@lemdro.id:443 is a valid url to lemdro.id and should match but will not

            Well that one:

            • technically shouldn't match, because it's not a community URL
            • is not the form I expect and want to be using for this case
            • may actually work (if it were a real community link, which again, it's not), because after authentication I think the browser strips the credentials

            Http://maliciouswebsite.to/?q=http://lemdro.id will match but should not

            No, it does not match.


            AFAICT, this solution is working properly, but if you can find a URL that breaks it, please let me know.

            • @ezchili@iusearchlinux.fyi
              0
              edit-2
              9 months ago

              Lmao ah yes, one of those

              If you're not convinced by this, you never will be

              You can just wrap your var with "new URL()" and have something faster, correct and easier to read, but I'm guessing you'll change your ways silently in a few years when you've forgotten about this interaction and managed to convince yourself it was your own idea!

              Until then I guess you can add /c/whatev at the end of my two examples and find something else to criticize and decide not to support

              • @Andy@programming.dev
                1
                9 months ago

                I'm not trying to be combative, I'm trying to understand. I'd like to see the failure in action so I can appreciate and pursue the proposed solution.

                But when I added the community bit to the first URL, the browser resolved it and stripped the credentials, so the resulting URL matched. The example credentials weren't real, though, so it's not a great test; if there's a real example I can test, please share it. Though I don't see why I'd auth to an instance just to view it from a different instance.

                When I added the community bit to the second URL, it was not a match, as it shouldn't be. The pattern must match the entire URL.

                • @ezchili@iusearchlinux.fyi
                  0
                  edit-2
                  9 months ago

                  You can see it in action on regex101: the regex does match inside the query string of the maliciouswebsite URL, and it fails to match even a URL that just has the port and no user/password

                  It is valid (just weird & not recommended) to give a user:pw combo to a website that doesn't ask for one in the headers. Browsers stripping it off is a different thing

                  The sheer number of things you have to take into account to properly parse a URL should convince you to not use regexes for it

                  The fact that it's less code, more correct, faster and more readable to use new URL() should also be enough to convince you to not use regexes

                  • @Andy@programming.dev
                    1
                    9 months ago

                    I meant to communicate that the Redirector addon uses the given pattern to see if the entire URL string matches, not part of it. So the malicious URL does not match.

                    I'm wondering if there's a real URL for which the Redirector approach will not work.
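The disagreement seems to hinge on whether the pattern is searched for anywhere in the URL or has to match all of it. A quick sketch of the difference, using a shortened host list and two test URLs of my own (both hypothetical, with https and /c/whatev added so the pattern has a chance to match):

```javascript
// Shortened version of the pattern from the thread.
const body = "https://(lemdro\\.id|lemmy\\.world)/c/(.*)";

const partial = new RegExp(body);               // searches anywhere in the string
const full = new RegExp("^(?:" + body + ")$");  // must match the entire string

// Hypothetical URL that smuggles a known instance into the query string.
const smuggled = "https://malicious.example/?q=https://lemdro.id/c/whatev";
console.log(partial.test(smuggled)); // true
console.log(full.test(smuggled));    // false

// A URL with an explicit default port is valid but matches neither form.
const withPort = "https://lemdro.id:443/c/whatev";
console.log(full.test(withPort));        // false
console.log(new URL(withPort).hostname); // "lemdro.id"
```

So under the whole-URL assumption the smuggled URL is rejected, while the explicit-port URL is a case the pattern misses and the URL parser handles.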

    • TerrorBite :veripawed3:
      1
      edit-2
      10 months ago

      @Andy @BeanCounter Given how many of these start with "Lemmy" you could simplify this to:

      https://(lemmy\.(?:run|(?:fmhy\.)?ml|dbzer0\.com|world|kde\.social|ca)|lemmygrad\.ml|lemdro\.id|beehaw\.org|sh\.itjust\.works|(?:sopuli|mander)\.xyz|zerobytes\.monster)/c/(.*)

      Or just assume that anything matching https://(lemmy\.[^/]+)/c/(.*) is a Lemmy server, which will probably be correct.
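A quick way to check the two suggestions against the original host list, written with single backslashes via String.raw. This is only a sketch: the ^...$ anchors and the /c/test path are my own additions for the check.

```javascript
// The fourteen hosts from the original pattern.
const hosts = [
    "lemdro.id", "lemmy.run", "beehaw.org", "lemmy.ml", "sh.itjust.works",
    "lemmy.fmhy.ml", "lemmy.dbzer0.com", "lemmy.world", "sopuli.xyz",
    "lemmy.kde.social", "lemmygrad.ml", "mander.xyz", "lemmy.ca", "zerobytes.monster",
];

// The simplified pattern and the looser "anything starting with lemmy." pattern.
const simplified = new RegExp(String.raw`^https://(lemmy\.(?:run|(?:fmhy\.)?ml|dbzer0\.com|world|kde\.social|ca)|lemmygrad\.ml|lemdro\.id|beehaw\.org|sh\.itjust\.works|(?:sopuli|mander)\.xyz|zerobytes\.monster)/c/(.*)$`);
const loose = new RegExp(String.raw`^https://(lemmy\.[^/]+)/c/(.*)$`);

// Hosts the simplified pattern fails to match (or captures incorrectly).
const missedBySimplified = hosts.filter((h) => {
    const m = simplified.exec(`https://${h}/c/test`);
    return !m || m[1] !== h;
});

// Hosts from the list that the loose pattern does not cover.
const missedByLoose = hosts.filter((h) => !loose.test(`https://${h}/c/test`));

console.log(missedBySimplified); // []
console.log(missedByLoose.length); // 7
```

The simplified pattern covers the whole list. The loose one only catches hosts whose name starts with "lemmy.", so it would need to be combined with the explicit names for the rest.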