Comments
-
I've moved away from CSV for this and any number of other reasons. ORC and JSONL have been much more reliable and less prone to bad data.
At the risk of being "the defender" (dibs on Jessica Jones), you can also use the poorly named RandomAccessFile object to stream byte[]. I usually couple this with stream.iterator to make the code more compact, handle grouping, etc. -
NoMad 13650 5y @SortOfTested I don't think that's really an option with this amount of data. I'm thinking of piping shit into a new database and then using that.
-
NoMad 13650 5y @SortOfTested why do Java and all Java developers hate people? This is why I have an issue with Java defenders. Barely a handful of them make usable, scalable, maintainable systems.
-
@NoMad
Probably not a bad idea. It would definitely accomplish the task.
The other way is to pull a full index of just the partition key data from the file, plus an additional line index. From there you can compute the groups and fetch one group's members at a time using try(var lines = Files.lines(path))....skip(n).findFirst().get(). With a small enough set of discriminators, you can also use tuples for added memory pressure reduction. Should be pretty lightweight on memory up to a few dozen million records. -
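A minimal sketch of that two-pass idea, assuming a comma-separated file with a header row, no quoted fields and no embedded newlines. The class name LineIndex and its method names are hypothetical, made up for illustration:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

// Hypothetical helper: maps each partition key to the line numbers that carry it.
final class LineIndex {

    // First pass: keep only the key column and the line number, never the full rows.
    static Map<String, List<Long>> build(Path csv, int keyColumn) {
        Map<String, List<Long>> index = new LinkedHashMap<>();
        try (Stream<String> lines = Files.lines(csv)) {
            long[] lineNo = {0}; // line 0 is assumed to be the header
            lines.forEachOrdered(line -> {
                long n = lineNo[0]++;
                if (n == 0) {
                    return; // skip the header
                }
                String key = line.split(",", -1)[keyColumn];
                index.computeIfAbsent(key, k -> new ArrayList<>()).add(n);
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return index;
    }

    // Fetch a single line by number, using the skip(n).findFirst() approach.
    static String lineAt(Path csv, long n) {
        try (Stream<String> lines = Files.lines(csv)) {
            return lines.skip(n).findFirst().orElseThrow();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Second pass: load the rows of one group only.
    static List<String> group(Path csv, Map<String, List<Long>> index, String key) {
        List<String> rows = new ArrayList<>();
        for (long n : index.getOrDefault(key, List.of())) {
            rows.add(lineAt(csv, n));
        }
        return rows;
    }
}
```

Each lineAt call rescans the file from the top, so this trades CPU time for memory. For big groups it is probably cheaper to collect all of a group's line numbers and pull them in a single pass.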
It's funny, I did an example in another rant about how shitty doing this in Java is. I feel your pain.
-
@NoMad Java is shit for this.
But try this for size: SQLite.
Read your CSV into a generic List<List<String>>, open an in-memory SQLite DB, generate a table using the header, and push all the data into the table.
And then SQLite-SQL all the things... -
Haha, CSV is not the problem; your problem is with Java's file handling.
Also, Java was designed primarily to make code platform-agnostic.
Scalability, usability and maintainability were not what it had in mind. Java's MVP is portability. -
Also, I think you can use Java object serialization to read the CSV into a custom data type (like a tuple or anything). I might be wrong, it's been a while.
-
@hardfault
Native tuples will throw an exception if any value is null, so you'd have to abstract that in the preprocessing.
Javatuples would work better, up to arity 10, as Decade is its largest tuple.
It's also a large document per previous questioning, so streaming is desired. -
Voxera 11377 5y One reason for the complex way to read the files is probably memory.
CSV files can in many cases be quite big, and if you read one all in one go you could end up OOM. So the default way is to read it in pieces.
Even if you start out with small files they might end up growing and by enforcing a more robust solution you avoid bugs later. -
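To make the "read it in pieces" point concrete, here is a small sketch that counts rows per value of one column while holding only one line in memory at a time. It assumes a header row and plain comma-separated fields with no quoting; the class and method names are made up for the example:

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class StreamingCount {

    // Streams the input line by line; only the running counts stay in memory.
    static Map<String, Long> countByColumn(BufferedReader reader, int column) {
        return reader.lines()
                .skip(1) // assumed header row
                .map(line -> line.split(",", -1))
                .filter(fields -> fields.length > column) // ignore short rows
                .collect(Collectors.groupingBy(
                        fields -> fields[column],
                        TreeMap::new,
                        Collectors.counting()));
    }

    public static void main(String[] args) {
        String csv = "id,team\n1,red\n2,blue\n3,red\n";
        BufferedReader reader = new BufferedReader(new StringReader(csv));
        System.out.println(countByColumn(reader, 1)); // prints {blue=1, red=2}
    }
}
```

For a real file, the same method should work with Files.newBufferedReader(path) inside a try-with-resources block.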
donuts 23621 5y Java is a verbose language. It got better with 8, but it has a lot of history... BufferedReader has been around since JDK 1.1.
Java basically is "tell me exactly what you want".
I used to be a C# dev and it took me a while to appreciate Java's verbosity (better for learning algorithms and data structures; there are different types of Lists).
Python and C# are "I assume you want this, which should be good for most use cases... unless you say differently". -
donuts 23621 5y I think one time in Python I wanted a SortedSet.... There is no built-in implementation (the built-in set is unordered). You need to pip install some package, which I first have to find and then manually download because it's not in my company's internal repo.
-
Save yourself a lot of trouble:
https://commons.apache.org/proper/...
Example:
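import java.io.FileReader;
import java.io.Reader;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVRecord;

// Treat the first record as the header so columns can be read by name.
Reader in = new FileReader("path/to/file.csv");
Iterable<CSVRecord> records = CSVFormat.EXCEL.withFirstRecordAsHeader().parse(in);
for (CSVRecord record : records) {
    String lastName = record.get("Last Name");
    String firstName = record.get("First Name");
}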
Granted, still gotta iterate the records, but certainly sounds better than what you're trying to do and almost certainly more robust (and I'm not defending Java per se, just trying to make your life easier). -
Voxera 11377 5y @NoMad actually I think Java is on the right path, and I do not use Java, I am C# :)
But Java has started to evolve again after several years of almost no evolution, and they have implemented quite a lot of new features. -
amoux 268 5y Things that should be fast, except that once you begin, you see their true nature, and you start to appreciate what others judge. When I use other languages, I think of how I could have done some tedious shit with a simple >> pip install <package> and/or two lines of code (including the import statement). My time is limited; performance is infinite. For example, since I mostly work with datasets, the pre-processing more often than not requires a custom solution, and it's a joy to instantly open Jupyter Notebook and prototype something out of nothing without worrying about details unrelated to the problem at hand.
-
donuts 23621 5y @NoMad explain JavaScript then... Its good life was over around 2005 when the tech bubble burst.
Look at it now... -
NoMad 13650 5y @billgates stop the what-about-ism. We're talking about Java, not JS. And this is not a court or professional argument club. You don't "win" by silencing others.
Related Rants
Java. AGAIN. 😡
So, I am trying to get a CSV opened and read, and then search through it based on values. Easy peasy lemon squeezy in Python, right?
Well, damned be java. You need a buffered reader to read the file. Then you have to "while(has next)" the whole damn thing, then you have to do something with the data that you read one by one, right? Well, not to be disappointed, they do have json libraries, but you **have to install** the plugins for it. Aka you have to manually add the libraries or use some backwards manager like maven.
Gotta admit, jdbc is neat if you're anal about your sql statements, but bring the same jazz to csv, and all the hell will break loose.
Now, if you just read your json data into multiple objects and throw them in an array... Kiss shorthand search's ass goodbye, because this mofo can't search through lists without licking the arse of every object. And now, you have to find another way because this way, you can't group shit you just read from csv. (or, I haven't found a way after 5 hours of dealing with the godforsaken shitshow that java libraries are.)
Like, I'm devastated. If this rant doesn't make much sense to you, blame some java library for it.
Shouldn't be too hard.
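For the searching and grouping part of the rant, a minimal sketch using only the standard streams API (Java 8+). It assumes the rows are already parsed into objects; the Person type and its fields are hypothetical stand-ins for whatever the CSV actually holds:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class GroupRows {

    // Hypothetical row type standing in for the real CSV columns.
    static final class Person {
        final String name;
        final String city;

        Person(String name, String city) {
            this.name = name;
            this.city = city;
        }
    }

    // Search: keep only the names of rows whose city matches.
    static List<String> namesIn(List<Person> people, String city) {
        return people.stream()
                .filter(p -> p.city.equals(city))
                .map(p -> p.name)
                .collect(Collectors.toList());
    }

    // Group: bucket names by city in one pass over the list.
    static Map<String, List<String>> namesByCity(List<Person> people) {
        return people.stream().collect(Collectors.groupingBy(
                p -> p.city,
                Collectors.mapping(p -> p.name, Collectors.toList())));
    }
}
```

This still visits every object once, but the filter and groupingBy calls keep it to a few lines.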
rant
java
java is shit