• 0 Posts
  • 18 Comments
Joined 5 months ago
Cake day: January 25th, 2024



  • For the OCR, have you tried tesseract? For printed documents it can take image input and generate a PDF with selectable text. I don’t OCR much, but it has been useful the few times I’ve tried it.

    You might be able to have a script that feeds the scanner output into tesseract and produces a PDF. It only works on a single image per run, so I had to make a script that runs it on a whole PDF by splitting it into pages and stitching the results back together, something like the sketch below.
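
    Roughly like this (a rough sketch, not my actual script: it assumes poppler-utils for pdftoppm/pdfunite plus tesseract are installed, and the filenames are made up):

    #!/usr/bin/env bash
    set -euo pipefail

    input="$1"                 # scanned PDF without a text layer
    workdir=$(mktemp -d)

    # split the PDF into one PNG per page (pdftoppm zero-pads the page numbers)
    pdftoppm -r 300 -png "$input" "$workdir/page"

    # OCR each page into its own searchable single-page PDF
    for img in "$workdir"/page-*.png; do
        tesseract "$img" "${img%.png}" pdf
    done

    # stitch the per-page PDFs back into one document
    pdfunite "$workdir"/page-*.pdf "${input%.pdf}-ocr.pdf"
    rm -r "$workdir"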



  • Someone already talked about the XY problem, so I’ll say this.

    Why a sound notification instead of the notification content? If your notification daemon (dunst in my case) supports pattern matching or calling scripts based on patterns, and the script has access to the app name, notification title, contents, etc., then it’s just a matter of calling something from your bash script.

    And any time you want to add that functionality to something else, you add one more line with a different pattern or another condition in your script (a rough dunst example is sketched below). Comparing text is a lot more reliable than audio.

    Of course your use case could be completely different, so maybe give some examples of it so people can suggest different ways to solve it instead of just the one you’re thinking of.
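
    For example, something like this (a rough sketch: the rule name, app name, pattern, and script path are all made up, and I’m going from memory on dunst’s rule/script mechanism, so check the dunstrc man page for the exact fields and argument order):

    # dunstrc rule that runs a script when a notification matches (hypothetical values):
    #
    #   [matrix_mention]
    #       appname = "Element"
    #       body = "*@me*"
    #       script = ~/.config/dunst/on-mention.sh

    #!/usr/bin/env bash
    # ~/.config/dunst/on-mention.sh
    # dunst calls rule scripts with: appname summary body icon urgency
    appname="$1" summary="$2" body="$3"
    # do whatever you want here: play a sound, log it, kick off another program...
    paplay /usr/share/sounds/freedesktop/stereo/message.oga
    printf '%s %s: %s\n' "$(date -Is)" "$appname" "$summary" >> ~/notification-matches.log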


  • Yeah sure, I’ll compile it on my OS. For any other OS, either I’m not knowledgeable about the tools available, or I’m not going to spend money to acquire them. If the binary a developer compiles for themselves solved the problem, we wouldn’t have this issue at all.

    I specifically hate when programs or libraries are only available in compiled form, and then I get error messages referring to an absolute path containing a username I’ve never seen before, with no way to correct it since there’s no source code. Turns out that when people ship compiled versions for an OS they don’t use themselves, they never hit those errors and assume everything works fine.


  • I haven’t tried many CAD programs, but AutoCAD has a really intuitive UI. I used to be able to find most things just by thinking about which tab they should be under based on what they are. It actually inspired me to learn better programming and software design so I could make something that intuitive. I haven’t used it in the years since I moved to Linux, so this holds as long as they haven’t changed it.







  • Damn, this is actually awesome. I wanted to do something similar but couldn’t figure it out.

    I need to try again; I might be able to create a virtual screen with xrandr and pipe its output into a window.

    Basically, what I wanted to make is kind of like the picture-in-picture mode we have in browsers, but for anything: have the app open on a different virtual screen and remap just the portion you want to see into an always-on-top window.
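
    Something in this direction might work (a rough, untested sketch: the region size and offset are made up, and I think xrandr --setmonitor can carve out the fake monitor to park the app on, but I haven’t verified that part):

    #!/usr/bin/env bash
    # mirror an 800x600 region at offset +1920+0 (e.g. a virtual screen area)
    # into a small window that stays on top, like browser picture-in-picture
    ffplay -hide_banner -alwaysontop \
           -f x11grab -framerate 30 -video_size 800x600 \
           -i "$DISPLAY+1920,0"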


  • Hi there, I did say it’s easily doable, but I didn’t have a script because I run the pre-OCR steps on the image manually (like the dark-mode negation I try in this script; when doing it manually it’s just one command since I know whether it’s dark mode or not myself, and similarly for the threshold).

    But here’s one I made for you:

    #!/usr/bin/env bash
    
    # ImageMagick has a cute little command for importing (part of) the screen into a file
    import -colorspace gray /tmp/screenshot.png
    # options go before the filename, otherwise mogrify ignores them for that file
    mogrify -color-threshold "100-200" /tmp/screenshot.png

    # extra magic to invert if the average pixel is dark:
    # shrink to one pixel and read its value out of the txt: description
    details=$(convert /tmp/screenshot.png -resize 1x1 txt:-)
    # note: $details is left unquoted so the output is flattened to a single line
    total=$(echo $details | awk -F, '{print $4}')    # maximum channel value from the header
    value=$(echo $details | awk '{print $7}')        # the pixel value, e.g. "(204)"
    value=${value#\(}                                # strip the surrounding parentheses
    value=${value%\)}
    darkness=$(( value * 100 / total ))
    if (( darkness < 50 )); then
       mogrify -negate /tmp/screenshot.png
    fi

    # now run the OCR
    text=$(tesseract /tmp/screenshot.png -)
    echo "$text" | xclip -selection c
    notify-send OCR-Screen "$text"
    

    So the middle part is there to accommodate images in dark mode; it negates the image based on a threshold you can change. Without that, you only need import for screen capture and tesseract to run the OCR, and can optionally pipe the result to xclip for the clipboard or to notify-send for a notification.

    In my use case, I have a keybind that takes a screenshot like this: import png:- | xclip -selection c -t image/png, which gives me a cursor to select part of the screen and copies the selection to the clipboard. I can save that as an image (through another bash script) or paste it directly into messenger applications. And when I need to do OCR, I just run tesseract in the terminal and copy the text from there.
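
    That saving script is basically just xclip in reverse; a minimal sketch (the output directory is only an example):

    #!/usr/bin/env bash
    # paste the clipboard image (if any) into a timestamped file
    out="$HOME/Pictures/screenshots/clip-$(date +%Y%m%d-%H%M%S).png"
    mkdir -p "$(dirname "$out")"
    xclip -selection clipboard -t image/png -o > "$out"
    notify-send "Saved" "$out"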


  • Not for handwritten text, but for printed fonts, OCR is as easy as drawing a box on the screen with current technology. So I don’t think we need AI for that.

    Personally I use tesseract. I have a simple bash script that, when run, lets me select a rectangle on the screen, saves that image to a temp folder, runs OCR on it, and copies the text to the clipboard (roughly the sketch below). Done.

    Edit: for extra flavor, you can also use notify-send to show that text as a notification, so you know what the OCR produced without having to paste it.
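
    The whole thing is only a few lines (a stripped-down sketch; the temp-file handling is just one way to do it):

    #!/usr/bin/env bash
    # select a rectangle, OCR it, copy the text to the clipboard, and show it
    img=$(mktemp --suffix=.png)
    import "$img"                             # drag to select a region of the screen
    text=$(tesseract "$img" - 2>/dev/null)    # OCR to stdout
    printf '%s' "$text" | xclip -selection clipboard
    notify-send "OCR" "$text"
    rm -f "$img"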


  • You said you’re not allowed to use a horse on the freeway, so it’s not a fair comparison, but I think it is exactly that. The freeway is where the majority of traffic is, and it’s analogous to those major platforms where everyone is nowadays. You can ride a horse anywhere as long as there is land; it’s just not practical to do it.

    Yes, you can make a website anyone can access, but how will they find it? You’ll need to reach people on the web, and that’s dominated by those platforms. When people did the Reddit blackout, Reddit removed the posts about moving to Lemmy, and without those posts we can’t expect people to know about alternatives. There are probably plenty of websites that host content for users to post, but how many have we heard about? How many can we find with an internet search?


  • Common knowledge doesn’t mean people use it. It’s easy to forget even if you studied it in school.

    For example, “you” is both singular and plural. But we rarely use plain “you” for multiple people nowadays; we go with “you guys”, “you all”, “all of you”, or something else to disambiguate.

    Languages move towards easy communication and simplicity.