
  • Cool.

    Is it all in the rust-mail repo?

    And how much of "Rust" in this image is actually open?

  • Let's do this incrementally, shall we?

    First, let's make get_files_in_dir() idiomatic. We will get back to errors later.

```rust
fn get_files_in_dir(dir: &str) -> Option<Vec<PathBuf>> {
    fs::read_dir(dir)
        .ok()?
        .map(|res| res.map(|e| e.path()))
        .collect::<Result<Vec<_>, _>>()
        .ok()
}
```


    Now, in read_parquet_dir(), if the unwraps stem from confidence that we will never get errors, then we can confidently ignore them (we will get back to the errors later).

```rust
fn read_parquet_dir(entries: &Vec<String>) -> impl Iterator<Item = record::Row> {
    // ignore all errors
    entries.iter()
        .cloned()
        .filter_map(|p| SerializedFileReader::try_from(p).ok())
        .flat_map(|r| r.into_iter())
        .filter_map(|r| r.ok())
}
```


    Now, let's go back to get_files_in_dir(), and not ignore errors.

```rust
fn get_files_in_dir(dir: &str) -> Result<Vec<PathBuf>, io::Error> {
    fs::read_dir(dir)?
        .map(|res| res.map(|e| e.path()))
        .collect::<Result<Vec<_>, _>>()
}
```

```diff
 fn main() -> Result<(), io::Error> {
     let args = Args::parse();
-    let entries = match get_files_in_dir(&args.dir)
-    {
-        Some(entries) => entries,
-        None => return Ok(())
-    };
-
+    let entries = get_files_in_dir(&args.dir)?;
 
     let mut wtr = WriterBuilder::new().from_writer(io::stdout());
     for (idx, row) in read_parquet_dir(&entries.iter().map(|p| p.display().to_string()).collect()).enumerate() {
```


    Now, SerializedFileReader::try_from() is implemented for &Path, and PathBuf derefs to &Path. So your dance of converting to display then to string (which is lossy btw) is not needed.

    While we're at it, let's use a slice instead of &Vec<_> in the signature (clippy would tell you about this if you have it set up with rust-analyzer).

```rust
fn read_parquet_dir(entries: &[PathBuf]) -> impl Iterator<Item = record::Row> {
    // ignore all errors
    entries.iter()
        .filter_map(|p| SerializedFileReader::try_from(&**p).ok())
        .flat_map(|r| r.into_iter())
        .filter_map(|r| r.ok())
}
```

```diff
     let entries = get_files_in_dir(&args.dir)?;
 
     let mut wtr = WriterBuilder::new().from_writer(io::stdout());
-    for (idx, row) in read_parquet_dir(&entries.iter().map(|p| p.display().to_string()).collect()).enumerate() {
+    for (idx, row) in read_parquet_dir(&entries).enumerate() {
         let values: Vec<String> = row.get_column_iter().map(|(_column, value)| value.to_string()).collect();
         if idx == 0 {
             wtr.serialize(row.get_column_iter().map(|(column, _value)| column.to_string()).collect::<Vec<String>>())?;
```


    Now let's see what we can do about not ignoring errors in read_parquet_dir().


    Approach 1: Save intermediate reader results

    This consumes all readers before getting further. So, it's a behavioral change. The signature may also scare some people 😉

```rust
fn read_parquet_dir(entries: &[PathBuf]) -> Result<impl Iterator<Item = Result<record::Row, ParquetError>>, ParquetError> {
    Ok(entries
        .iter()
        .map(|p| SerializedFileReader::try_from(&**p))
        .collect::<Result<Vec<_>, _>>()?
        .into_iter()
        .flat_map(|r| r.into_iter()))
}
```


    Approach 2: Wrapper iterator type

    How can we combine errors from readers with flat record results?

    This is how.

```rust
enum ErrorOrRows {
    Error(Option<ParquetError>),
    Rows(record::reader::RowIter<'static>),
}

impl Iterator for ErrorOrRows {
    type Item = Result<record::Row, ParquetError>;

    fn next(&mut self) -> Option<Self::Item> {
        match self {
            Self::Error(e_opt) => e_opt.take().map(Err),
            Self::Rows(row_iter) => row_iter.next(),
        }
    }
}

fn read_parquet_dir(entries: &[PathBuf]) -> impl Iterator<Item = Result<record::Row, ParquetError>> {
    entries
        .iter()
        .flat_map(|p| match SerializedFileReader::try_from(&**p) {
            Err(e) => ErrorOrRows::Error(Some(e)),
            Ok(sr) => ErrorOrRows::Rows(sr.into_iter()),
        })
}
```

```diff
     let mut wtr = WriterBuilder::new().from_writer(io::stdout());
     for (idx, row) in read_parquet_dir(&entries).enumerate() {
+        let row = row?;
         let values: Vec<String> = row.get_column_iter().map(|(_column, value)| value.to_string()).collect();
         if idx == 0 {
             wtr.serialize(row.get_column_iter().map(|(column, _value)| column.to_string()).collect::<Vec<String>>())?;
```


    Approach 3 (bonus): Using unstable #![feature(gen_blocks)]

```rust
fn read_parquet_dir(entries: &[PathBuf]) -> impl Iterator<Item = Result<record::Row, ParquetError>> {
    gen move {
        for p in entries {
            match SerializedFileReader::try_from(&**p) {
                Err(e) => yield Err(e),
                Ok(sr) => for row_res in sr { yield row_res; }
            }
        }
    }
}
```
  • NCDC (No Code, Don't Care)

  • As with all ads, especially M$ ones... No Code, Don't Care.

    At least if the code were available, I would find out what they mean by "spoofed MIME" and how that attack vector works (is the actual file "magic" header spoofed, but the file still manages to get parsed in its non-"spoofed" actual format nonetheless? How?).

    Also, I would have figured out if this is a new use of "at scale" applied to purely client code, or if a service is actually involved.

  • dyn compatibility of the trait itself is another matter. In this case, an async method makes a trait not dyn-compatible because of the implicit -> impl Future opaque return type, as documented here.

    But OP didn't mention whether dyn is actually needed or not. For me, dyn is almost always a crutch (exceptions exist).
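    For illustration, a minimal sketch (hypothetical trait and names) of the usual escape hatch when dyn is actually needed: returning a concrete boxed future from a regular method keeps the trait dyn-compatible, at the cost of an allocation.

```rust
use std::future::Future;
use std::pin::Pin;

// hypothetical trait: the concrete Pin<Box<dyn Future>> return type
// (instead of an async fn's opaque -> impl Future) keeps it dyn-compatible
trait Fetch {
    fn fetch(&self) -> Pin<Box<dyn Future<Output = u32> + '_>>;
}

struct Answer;

impl Fetch for Answer {
    fn fetch(&self) -> Pin<Box<dyn Future<Output = u32> + '_>> {
        Box::pin(async { 42 })
    }
}

// this compiles precisely because the method's return type is concrete
fn call_through_dyn(f: &dyn Fetch) -> Pin<Box<dyn Future<Output = u32> + '_>> {
    f.fetch()
}
```

This is essentially what the async_trait crate generates for you.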

  • If I understand what you're asking...

    This leaves some details/specifics out to simplify. But basically:

```rust
async fn foo() {}

// ^ this roughly desugars to

fn foo() -> impl Future<Output = ()> {}
```

    This meant that you couldn't just have (stable) async methods in traits, not because of async itself, but because you couldn't use impl Trait in return positions in trait methods, in general.

    Box<dyn Future> was a non-ideal workaround (not zero-cost, plus other dyn drawbacks). async_trait was a proc macro solution that generated code with that workaround. So Box<dyn Future> was never a desugaring done by the language/compiler.

    Now that we have (stable) impl Trait in return position in trait methods, all this dance is not strictly needed anymore, and hasn't been for a while.
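    A sketch of what's now possible on stable (hypothetical trait; on current stable you can write `async fn run(&self) -> u32;` directly, shown here in its desugared return-position-impl-Trait form):

```rust
use std::future::{self, Future};

// hypothetical trait using (now stable) impl Trait in return position
trait Job {
    fn run(&self) -> impl Future<Output = u32>;
}

struct Fixed;

impl Job for Fixed {
    fn run(&self) -> impl Future<Output = u32> {
        // a future that is immediately ready
        future::ready(7)
    }
}
```

No Box, no proc macro; the trade-off is that such a trait is not dyn-compatible.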

  • I was just referring to the fact that they are macros.

  • printf uses macros in its implementation.

```c
int
__printf (const char *format, ...)
{
  va_list arg;
  int done;

  va_start (arg, format);
  done = __vfprintf_internal (stdout, format, arg, 0);
  va_end (arg);

  return done;
}
```

    ^ This is from glibc. Do you know what va_start and va_end are?
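    For reference, a minimal variadic function using the same va_* macros (a sketch, not glibc code; the function name is made up):

```c
#include <stdarg.h>

/* sketch: sum `count` ints passed as variadic arguments, walking them
 * with the same va_start/va_arg/va_end machinery printf relies on */
static int sum_ints(int count, ...) {
  va_list args;
  int total = 0;

  va_start(args, count);
  for (int i = 0; i < count; i++)
    total += va_arg(args, int);
  va_end(args);

  return total;
}
```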

    > to get features that I normally achieve through regular code in other languages.

    Derives expand to "regular code". You can run cargo expand to see it. And I'm not sure how that's an indication of "bare bone"-ness in any case.

    Such derives are actually using a cool trick, which is the fact that proc macros and traits have separate namespaces. So #[derive(Debug)] is using the proc macro named Debug, which happens to generate "regular code" that implements the Debug trait. The proc macro named Debug and the implemented trait Debug don't point to the same thing, and don't have to match name-wise.
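    As a sketch of the kind of "regular code" such a derive generates (hypothetical struct; simplified compared to real cargo expand output):

```rust
use std::fmt;

// hypothetical struct; #[derive(Debug)] would generate roughly this impl
struct Point {
    x: i32,
    y: i32,
}

impl fmt::Debug for Point {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("Point")
            .field("x", &self.x)
            .field("y", &self.y)
            .finish()
    }
}
```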

  • Not sure if you're talking about the language, or the core/alloc/std libraries, or both/something in-between?

    Can you provide specific examples, and which specific languages are you comparing against?

  • Programming @programming.dev

    Wild linker v0.8 released (and updated benchmarks)

    github.com/davidlattimore/wild/releases/tag/0.8.0
  • (didn't read OP, didn't keep up with chimera recently)

    Off the top of my head: The init system. Usable FreeBSD utils instead of busybox, overridable by GNU utils (which you will have to do because the former are bare-bones). Everything is built with LLVM (not GCC). Extra hardening (utilizing LLVM). And it doesn't perform like shit in some multi-threaded allocator-heavy loads, because they patch musl directly with mimalloc. It also doesn't pretend to have a stable/release channel (only rolling).

    So, the use of apk is not that relevant. "no GNU" is not really the case with Alpine. They do indeed have "musl" in common, but Chimera "fixes" one of the most relevant practical shortcomings of using it. And finally, I don't think Chimera really targets fake "lightweight"-ness just for the sake of it.

  • '0'..'9' (characters in ASCII) are (0+48)..(9+48) when read as integer values.

    For readability you can do:

```c
unsigned char zero = '0';
int h = getchar() - zero;
int l = getchar() - zero;
```

    And as I mentioned in another comment, if this was serious code, you would check that both h and l are between 0 and 9.

    Note that one of the stupid quirks about C is that char is not guaranteed to be unsigned in certain implementations/architectures. So it's better to be explicit about expecting unsigned values. This is also why man 3 getchar states:

    > fgetc() reads the next character from stream and returns it as an unsigned char cast to an int, or EOF on end of file or error.

    > getchar() is equivalent to fgetc(stdin).
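    A sketch of that idea in code (function name hypothetical): keep the value as int so EOF stays distinguishable, and range-check against '0'..'9' before using the subtraction:

```c
#include <stdio.h>

/* sketch: map a character (as returned by getchar/fgetc) to its digit
 * value, or -1 for non-digits; EOF (-1) naturally falls into the -1 case */
static int digit_value(int c) {
  if (c >= '0' && c <= '9')
    return c - '0';
  return -1;
}
```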

  • How is this literal joke still getting so much engagement?

  • Nice CLAUDE.md. Got it on the contributor list too.

  • software-rendered implemented-in-C++ terminal

    you fail the cult test 😉

  • I don't do Go (thankfully). But that description reminded me of "A False Midnight", which is a Python story from almost 12 years ago (time flies).

    The fact that these two stories concern Python and Go, the two supposedly easy and simple languages, is a good example of why such descriptors were always an intellectual smell.

  • This is unnecessarily complicated

    really!

    and I don’t see how your second version is supposed to be more optimal?

    It was a half-joke. But since you asked, it doesn't do any duplicate range checks.

    But it's not like any of this is going to be measurable.

    Things you should/could have complained about:

    • [semantics] not checking if h and l are in the [0, 9] range before taking the result of h*10 + l.
    • [logical consistency] not using a set bit for [0, 100] and a set bit for [1, 12], and having both bits set for the latter.
    • [cosmetic/visual] not having the props bits for p0 on the left in the switch.

    And as a final note, you might want to check what kind of code compilers actually generate (with -O2/-O3, of course), because your complaints don't point to someone who knows.
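    To make the first complaint concrete, a sketch of a get_pair() variant with the [0, 9] check per digit (reading from an explicit FILE* so it's self-contained; names hypothetical):

```c
#include <stdio.h>

/* sketch: read two chars from `in` and combine them into 0..99,
 * returning -1 if either char is not an ASCII digit (including on EOF) */
static int get_pair_checked(FILE *in) {
  int h = fgetc(in) - '0';
  int l = fgetc(in) - '0';

  if (h < 0 || h > 9 || l < 0 || l > 9)
    return -1;
  return h * 10 + l;
}
```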

  • The whole premise is wrong, since it's based on the presumption of C++ and Rust being effectively generational siblings, with the C++ "designers" (charitable) having had the option to take the Rust route (in the superficial narrow aspects covered) but choosing not to. The reality is that C++ was the intellectual pollution product of "next C" and OOP overhype from that era (late '80s / early '90s), resulting in the "C with classes" moniker.

    The lack of both history (and/or evolution) and paradigm talk is telling.

  • Maybe something like this

```c
#include <stdio.h>

// reads next 4 chars. doesn't check what's beyond that.
int get_pair() {
  int h = getchar() - 48;
  int l = getchar() - 48;

  return h * 10 + l;
}

int main(){
  int p0 = get_pair();
  int p1 = get_pair();
  if (p0 < 0 || p1 < 0 || p0 > 100 || p1 > 100) {
    // not a 4-digit sequence; return with failure if that's a requirement
  }

  if ((p0 == 0 || p0 > 12) && (p1 >= 1 && p1 <= 12)) {
    printf("YYMM");
  } else if ((p1 == 0 || p1 > 12) && (p0 >= 1 && p0 <= 12)) {
    printf("MMYY");
  } else if ((p0 >= 1 && p0 <= 12) && (p1 >= 1 && p1 <= 12)) {
    printf("AMBIGUOUS");
  } else {
    printf("NA");
  }
  return 0;
}
```

    or if you want to optimize

```c
#include <stdio.h>
#include <stdint.h>

// reads next 4 chars. doesn't check what's beyond that.
int get_pair() {
  int h = getchar() - 48;
  int l = getchar() - 48;

  return h * 10 + l;
}

uint8_t props (int p) {
  if (p >= 1 && p <= 12) {
    return 0b10;
  } else if (p < 0 || p >= 100) {
    return 0b11;
  } else {
    return 0b00;
  }
}

int main(){
  int p0 = get_pair();
  int p1 = get_pair();

  switch (props(p0) | (props(p1) << 2)) {
    case 0b1010: printf("AMBIGUOUS"); break;
    case 0b1000: printf("YYMM"); break;
    case 0b0010: printf("MMYY"); break;
    default: printf("NA");
  }
  return 0;
}
```
  • Programming @programming.dev

    koto v0.16.0 released (koto is a scripting programming language)

    github.com/koto-lang/koto/releases/tag/v0.16.0
  • Programming Circlejerk @programming.dev

    When I found out even Rust needed the clib, it was like seeing an iron-clad fortress only to look closer and see it was being held up by sticks, ducktape, and prayers.

    github.com/rust-lang/rfcs/issues/2610
  • Programming @programming.dev

    Rust tops a diverse list of implementation languages in projects getting NLnet grants, Python 2nd, C is alive, and C++ is half dead!

  • Rust @programming.dev

    Rust tops a diverse list of implementation languages in projects getting NLnet grants, Python 2nd, C is alive, and C++ is half dead!

  • Rust @programming.dev

    Koto: a simple and expressive programming language, usable as an extension language for Rust applications, or as a standalone scripting language

    koto.dev
  • Programming @programming.dev

    Koto: a simple and expressive programming language, usable as an extension language for Rust applications, or as a standalone scripting language

    koto.dev
  • Rust @programming.dev

    kdl 6.0.0-alpha.1 (first version with a KDL v2 implementation)

    github.com/kdl-org/kdl-rs/blob/f67e3d2998dcf0d198b4d03be7b23062cab21723/CHANGELOG.md
  • Rust @programming.dev

    COSMIC ALPHA 1 Released (Desktop Environment Written In Rust From System76)

    system76.com/cosmic
  • Rust @programming.dev

    cushy v0.3.0 Released

    github.com/khonsulabs/cushy/releases/tag/v0.3.0
  • Rust @programming.dev

    slint 1.6.0 Released

    github.com/slint-ui/slint/releases/tag/v1.6.0