I want to extract and process the metadata from PNG images and the first line of .safetensors files for LLMs and LoRAs. I could spend ages farting around with sed or awk, but the file formats keep changing. I’d like a faster way to see a summary of the training and a few other details when they’re available.
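
For the PNG half of the question, here's a minimal, stdlib-only Python sketch. The function name `png_text_chunks` is just illustrative. PNG stores text metadata in `tEXt`, `zTXt` and `iTXt` chunks, each holding a keyword plus text, so walking the chunk list avoids any sed/awk scraping. Which keywords you'll actually find depends on the tool that wrote the image; check that against your own files.

```python
import struct
import sys
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_text_chunks(path):
    """Return {keyword: text} from a PNG's tEXt, zTXt and iTXt chunks."""
    result = {}
    with open(path, "rb") as f:
        if f.read(8) != PNG_SIGNATURE:
            raise ValueError(f"{path} is not a PNG file")
        while True:
            header = f.read(8)
            if len(header) < 8:
                break  # truncated file: stop quietly
            # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
            length, chunk_type = struct.unpack(">I4s", header)
            data = f.read(length)
            f.read(4)  # skip the CRC (not verified here)
            if chunk_type == b"tEXt":
                # keyword NUL text, both Latin-1
                key, _, text = data.partition(b"\x00")
                result[key.decode("latin-1")] = text.decode("latin-1")
            elif chunk_type == b"zTXt":
                # keyword NUL method-byte zlib-compressed-text
                key, _, rest = data.partition(b"\x00")
                result[key.decode("latin-1")] = zlib.decompress(rest[1:]).decode("latin-1")
            elif chunk_type == b"iTXt":
                # keyword NUL flag method language NUL translated-keyword NUL text (UTF-8)
                key, _, rest = data.partition(b"\x00")
                compressed = rest[0]
                _language, _, rest = rest[2:].partition(b"\x00")
                _translated, _, text = rest.partition(b"\x00")
                if compressed:
                    text = zlib.decompress(text)
                result[key.decode("latin-1")] = text.decode("utf-8")
            elif chunk_type == b"IEND":
                break
    return result


if __name__ == "__main__":
    for name in sys.argv[1:]:
        print(name)
        for key, value in png_text_chunks(name).items():
            print(f"  {key}: {value}")
```

Run it as `python script.py image.png` (placeholder names). It skips CRC checks, which seems acceptable for read-only inspection.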

    • @pingveno@lemmy.ml · 10 days ago

      Yeah, I’ve been learning some nushell. If you’re dealing with data, it’s just a great tool. So many sharp edges in the POSIX shell come from it being stringly typed, so having a strongly typed shell is extremely helpful.

  • @Nibodhika@lemmy.world · 11 days ago

    A week ago I would have said jq, but just the other day I discovered nushell and have been loving it. If you deal with structured data often, it’s way easier; just bear in mind it’s not POSIX-compatible.

  • @Hammerheart@programming.dev · 11 days ago

    What are some good resources for learning jq? I really struggle with nested keys/values, which obviously limits my ability to use it.

    • Beej Jorgensen · 10 days ago

      I hate to do this, but AI chatbots are typically pretty good at giving examples for things like this, and you can learn from those examples.

        • Beej Jorgensen · 5 days ago

          I definitely use them a lot, but I think “very” is too strong a word. It’s pretty easy to get confident, contradictory information from them. They’re a good place to start and brainstorm, but all the information has to be verified either by running and testing the code, or by finding a human source.

          • @xavier666@lemm.ee · 5 days ago

            True. I wouldn’t use them for very complicated stuff. I currently use them for “what is x?” and “how is x different from y?” kinds of questions.

            One advantage of using an AI is that it removes a lot of fluff that you get on blogs. However, that can change very soon when our AI overlords figure out monetization.

      • @Hammerheart@programming.dev · 10 days ago

        I have perused it, but it’s so dense and so broad that it’s not that helpful unless I know exactly what I’m looking for. I have also tried info and tldr. I actually like tldr the most, although the exhaustiveness of the man pages must be admired. I just don’t find them to be the best teacher.

    • @timbuck2themoon@sh.itjust.works · 11 days ago

      Try an online JSON parser. Throw in some data and then build up a query.

      It’ll keep updating the results as you tweak your query. A simple search will probably turn up twenty that’ll work. I can’t remember which one I normally use off the top of my head.

  • @Diplomjodler3@lemmy.world · 11 days ago

    Python is very good for working with JSON. Definitely will get you there faster than awk for anything not completely trivial.
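
On the nested keys/values struggle mentioned above, a small stdlib-only Python sketch that prints every leaf together with its full path can make it easier to see which path to ask for. This is roughly the kind of listing jq's `paths` builtin gives you; the `walk_leaves` name and the dotted/bracket path style are just illustrative choices.

```python
import json


def walk_leaves(value, prefix=""):
    """Yield (path, leaf) for every scalar inside a nested JSON value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from walk_leaves(child, f"{prefix}.{key}" if prefix else key)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from walk_leaves(child, f"{prefix}[{index}]")
    else:
        yield prefix, value


doc = json.loads('{"model": {"name": "x", "tags": ["a", "b"]}, "steps": 20}')
for path, leaf in walk_leaves(doc):
    print(path, "=", leaf)
# model.name = x
# model.tags[0] = a
# model.tags[1] = b
# steps = 20
```

Once you can see a path like `model.tags[0]`, translating it into a jq filter or a Python subscript is mostly mechanical.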

  • ᗺark dor · 11 days ago

    Pipe to jless first to pick out targets, then use jq.

    If it’s a small file and I want to make edits, I use yq to convert it to YAML and back again.

    I’m also looking at whether DuckDB is a better approach, especially for querying, bulk transforms, and Python.

  • tiredofsametab · 11 days ago

    Previously, I coded something in Rust real quick to spit out and manipulate some JSON, but it looks like the jq/yq suggestions in this thread would work fine.

  • palordrolap · 11 days ago

    There are probably pre-written awk scripts out there that already do what you want, not that I know where they’d be.

    That said, you might be better off using one of the bigger but still fairly commonly installed languages. There’s bound to be things on PyPI (for Python) or CPAN (for Perl) that could be bolted together for example.

    If you’re really lucky there might even be something that covers your whole use-case, but I haven’t checked.

    • Semperverus · 11 days ago

      Python has built-in JSON parsing, as does (and I know this isn’t gonna be popular) PowerShell.

    • @j4k3@lemmy.world (OP) · 10 days ago

      I found a Python project that does enough for my needs. jq looks super powerful, though. Thanks. I managed to get yq working for PNGs, but I had trouble with both jq and yq on safetensors files. I couldn’t figure out how to parse a string embedded after an inconsistent chunk of binary at the start, especially in such massive files. I could get in and grab the first line with head. I tried some stuff with shell expansions, but that didn’t work and sent me looking for others who have solved the issue better than I have.
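
On the safetensors trouble: a `.safetensors` file doesn't really start with a text line. In the safetensors format, the first 8 bytes are an unsigned little-endian 64-bit integer giving the header length. That integer is the "inconsistent" binary at the start. It is followed by that many bytes of UTF-8 JSON, with optional free-form string metadata under the `__metadata__` key. This is why line-oriented tools like head can't reliably isolate the header. Below is a minimal Python sketch that reads only the header, so file size doesn't matter. The name `safetensors_metadata` and the 100 MB sanity limit are assumptions, not part of any library.

```python
import json
import struct
import sys

MAX_HEADER_BYTES = 100_000_000  # assumed sanity limit, guards against non-safetensors input


def safetensors_metadata(path):
    """Read only the JSON header of a .safetensors file and return its __metadata__."""
    with open(path, "rb") as f:
        prefix = f.read(8)
        if len(prefix) != 8:
            raise ValueError(f"{path} is too short to be a safetensors file")
        # First 8 bytes: header length as an unsigned little-endian 64-bit integer.
        (header_len,) = struct.unpack("<Q", prefix)
        if header_len > MAX_HEADER_BYTES:
            raise ValueError(f"implausible header length {header_len}")
        header = json.loads(f.read(header_len).decode("utf-8"))
    # Metadata values are strings; some trainers embed JSON inside them,
    # so try to decode each value and fall back to the raw string.
    decoded = {}
    for key, value in header.get("__metadata__", {}).items():
        try:
            decoded[key] = json.loads(value)
        except (TypeError, json.JSONDecodeError):
            decoded[key] = value
    return decoded


if __name__ == "__main__":
    for name in sys.argv[1:]:
        print(name)
        for key, value in sorted(safetensors_metadata(name).items()):
            print(f"  {key}: {value}")
```

Training summaries, when present, usually live in these metadata strings. For example, LoRAs trained with kohya's sd-scripts often carry `ss_`-prefixed keys, but key names vary by trainer, so treat that as an assumption. The per-value JSON decoding also turns numeric strings like `"32"` into numbers. Drop that loop if you'd rather keep everything as raw strings.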