zlacker

Recutils – Tools and libraries to access plain text databases called Recfiles

submitted by jdemle+(OP) on 2017-09-21 10:28:15 | 106 points 46 comments
replies(13): >>_euac+W3 >>ajsalm+J4 >>jdemle+L7 >>Mizza+l8 >>fimdom+xc >>hellma+Ue >>rb808+Ai >>rwmj+vp >>ausjke+0s >>majkin+iu >>oblib+D41 >>kwhite+cU1 >>sether+4Y3
1. _euac+W3[view] [source] 2017-09-21 11:17:10
>>jdemle+(OP)
Erm... am I the only one that's a little thrown off by the "mascot" in this project?
replies(11): >>agumon+h4 >>tripa+l4 >>dvfjsd+w4 >>anc84+E4 >>johnco+Q4 >>rgrau+i5 >>westme+q5 >>rmchug+s5 >>nasred+O6 >>JustSo+V8 >>twic+gl
◧◩
2. agumon+h4[view] [source] [discussion] 2017-09-21 11:22:23
>>_euac+W3
I'm not fond of the joke right there but alas.

ps: I hope they have org-mode interop.

replies(1): >>mbrock+W4
◧◩
3. tripa+l4[view] [source] [discussion] 2017-09-21 11:23:15
>>_euac+W3
Apparently it made it to the FAQ.
◧◩
4. dvfjsd+w4[view] [source] [discussion] 2017-09-21 11:27:09
>>_euac+W3
Gay animals are a part of nature. There is nothing to be offended about.
replies(1): >>dguara+wd
◧◩
5. anc84+E4[view] [source] [discussion] 2017-09-21 11:30:44
>>_euac+W3
It's two turtles on top of each other, what's the problem?
replies(1): >>aeorgn+mm
6. ajsalm+J4[view] [source] 2017-09-21 11:32:18
>>jdemle+(OP)
The link to the newer presentation video on the page is broken but this is probably the same one: https://fscons.org/videos/2011/gnu-recutils-changed-title-an...
replies(1): >>neulan+8G
◧◩
7. johnco+Q4[view] [source] [discussion] 2017-09-21 11:34:17
>>_euac+W3
Probably.
◧◩◪
8. mbrock+W4[view] [source] [discussion] 2017-09-21 11:34:56
>>agumon+h4
It says so in the feature list so I guess they probably do.
replies(1): >>agumon+35
◧◩◪◨
9. agumon+35[view] [source] [discussion] 2017-09-21 11:37:58
>>mbrock+W4
dammit I failed at simple search, it's indeed listed
◧◩
10. rgrau+i5[view] [source] [discussion] 2017-09-21 11:42:13
>>_euac+W3
https://www.gnu.org/software/recutils/faq.html#whyturtles
◧◩
11. westme+q5[view] [source] [discussion] 2017-09-21 11:43:19
>>_euac+W3
No I wasn't expecting turtles humping as soon as I landed on the page either.
◧◩
12. rmchug+s5[view] [source] [discussion] 2017-09-21 11:44:23
>>_euac+W3
It's turtles all the way down! https://en.wikipedia.org/wiki/Turtles_all_the_way_down
◧◩
13. nasred+O6[view] [source] [discussion] 2017-09-21 12:00:31
>>_euac+W3
About the logo

Why is the logo depicting a pair of copulating turtles?

Ask ams@gnu.org.

What is the name of the turtles?

They are called Fred and George. And yes, they are both male.

14. jdemle+L7[view] [source] 2017-09-21 12:10:39
>>jdemle+(OP)
At CurrySoftware we use recfiles combined with git for all business-processes (incoming and outgoing invoices, customers, etc). It allows us to automate everything we want with simple bash scripts. But we remain flexible because we can perform non-automated tasks manually.
replies(1): >>weeber+2k
15. Mizza+l8[view] [source] 2017-09-21 12:17:33
>>jdemle+(OP)
I have a growing affinity for non-database-databases for personal and low/sparse-traffic projects. Lots less hassle.

Here's one I maintain designed for use with AWS Lambda which uses S3 as a Pythonic data-store: https://github.com/Miserlou/NoDB

replies(1): >>maxeri+vf
◧◩
16. JustSo+V8[view] [source] [discussion] 2017-09-21 12:22:46
>>_euac+W3
Yeah, this needs to be marked NSFW. ;)

But, yes, I too am a little thrown off by it.

17. fimdom+xc[view] [source] 2017-09-21 12:53:50
>>jdemle+(OP)
Apparently you can get output in CSV (natively) and JSON (Python one-liner) without too much hassle (http://swick.2flub.org/recutils_JSON_output.html), which suddenly makes it even more interesting.
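For anyone curious what such a conversion involves, here is a minimal sketch in Python. It is not the one-liner from the linked page. It assumes a simplified subset of the recfile format: records separated by blank lines, `Name: value` fields, `#` comment lines, and `+` continuation lines. Records that contain `%`-prefixed descriptor fields are skipped. The name `parse_recfile` and the sample data are made up for illustration.

```python
import json

# Made-up sample data, not taken from the recutils documentation.
SAMPLE = """\
# A tiny, made-up recfile
%rec: Book

Title: GNU Emacs Manual
Author: Richard M. Stallman

Title: The Colour of Magic
Author: Terry Pratchett
Note: first line
+ second line
"""


def parse_recfile(text):
    """Parse a simplified recfile into a list of {field: [values]} dicts."""
    records, current, last_field = [], {}, None

    def flush():
        # Keep data records only; drop descriptors such as "%rec: Book".
        if current and not any(name.startswith("%") for name in current):
            records.append(dict(current))
        current.clear()

    for line in text.splitlines():
        if line.startswith("#"):
            continue  # comment line
        if not line.strip():
            flush()  # a blank line ends the current record
            last_field = None
        elif line.startswith("+") and last_field is not None:
            # Continuation line: append to the previous field's value.
            extra = line[2:] if line.startswith("+ ") else line[1:]
            current[last_field][-1] += "\n" + extra
        else:
            name, _, value = line.partition(":")
            # Fields may repeat within a record, so values are kept in lists.
            current.setdefault(name, []).append(value.strip())
            last_field = name
    flush()
    return records


print(json.dumps(parse_recfile(SAMPLE), indent=2))
```

Every value is kept as a list because a field can appear more than once in a record. A real converter would also need to handle the parts of the format this sketch ignores.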
◧◩◪
18. dguara+wd[view] [source] [discussion] 2017-09-21 13:01:18
>>dvfjsd+w4
I'm not offended at all, just wasn't expecting it. If anything it made me chuckle, it was so out of place, heh.
19. hellma+Ue[view] [source] 2017-09-21 13:10:36
>>jdemle+(OP)
I do something similar with toml files for simple stuff. Python for piping around, but maybe this is more convenient on the commandline.
◧◩
20. maxeri+vf[view] [source] [discussion] 2017-09-21 13:15:21
>>Mizza+l8
I'm always fascinated that people get stuck differentiating between using a database for indexing and as a canonical data store.

Like the majority of media apps, it's either impossible or a huge pain to get them to index without managing.

21. rb808+Ai[view] [source] 2017-09-21 13:36:08
>>jdemle+(OP)
I really like the idea of plain text data files so am really interested in this.

YAML serves this purpose too but I'm not a huge fan of indenting so recfiles look great. Anyone compared and contrasted?

Also are there other resources on this? Would be nice to have Java/C++/Python libraries. (As well as convert to parquet, arrow etc )

replies(1): >>jstimp+rj
◧◩
22. jstimp+rj[view] [source] [discussion] 2017-09-21 13:41:25
>>rb808+Ai
I have done some prototyping on a similar idea, but I think with a more idiomatic approach. The idea is mostly adding relational structure (schema) to CSV, and enabling a cleaner lexical syntax (get rid of the line noise).

Might some day dust it off and try to bring it to a more serious level (performance, tooling etc).

http://jstimpfle.de/projects/python-wsl/main.html

◧◩
23. weeber+2k[view] [source] [discussion] 2017-09-21 13:44:22
>>jdemle+L7
Your processes made me think about an email-based interface instead of a bash script; this might make it easy to interact with the database bot without knowing bash or Python.
replies(1): >>jdemle+Nk
◧◩◪
24. jdemle+Nk[view] [source] [discussion] 2017-09-21 13:49:11
>>weeber+2k
We plan to use a Telegram interface for many things (status checks, new invoices, etc.). It's easier and faster than e-mail and available everywhere!
replies(1): >>aeorgn+hl
◧◩
25. twic+gl[view] [source] [discussion] 2017-09-21 13:52:44
>>_euac+W3
This reminds me that kame.net has a turtle as a logo, but as an incentive to upgrade, when accessed over IPv6, the turtle is animated. So just be grateful the recutils developers didn't have IPv6 when they were looking for inspiration.
◧◩◪◨
26. aeorgn+hl[view] [source] [discussion] 2017-09-21 13:52:44
>>jdemle+Nk
Telegram (Messenger)?
replies(1): >>jdemle+0m
◧◩◪◨⬒
27. jdemle+0m[view] [source] [discussion] 2017-09-21 13:56:32
>>aeorgn+hl
Yes. Communicating with telegram from bash is simple. Check out https://www.curry-software.com/en/blog/telegram_unit_fail/ for example.
replies(1): >>alvil+IC
◧◩◪
28. aeorgn+mm[view] [source] [discussion] 2017-09-21 13:58:15
>>anc84+E4
A question in the [FAQ](https://www.gnu.org/software/recutils/faq.html#whyturtles):

> Why is the logo depicting a pair of copulating turtles?

29. rwmj+vp[view] [source] 2017-09-21 14:15:58
>>jdemle+(OP)
I guess these are quite slow (because no indexing) once you have a serious number of records? That in itself isn't a problem as long as you understand the scope of the project. I wonder why they didn't use (a well-defined subset of) CSV as the format however.
replies(3): >>jdemle+bq >>Comodo+3G >>zbuf+sv3
◧◩
30. jdemle+bq[view] [source] [discussion] 2017-09-21 14:20:04
>>rwmj+vp
CSV is neither human-readable nor -writable.

And I don't think the performance issue exists. Computers are fast nowadays. Parsing recfiles is straightforward. Also you could easily archive historic/old/probably irrelevant records.

replies(1): >>rwmj+Iq
◧◩◪
31. rwmj+Iq[view] [source] [discussion] 2017-09-21 14:24:01
>>jdemle+bq
This is why I was very careful to say "well-defined subset". I wrote a full CSV library[1], and so I'm well aware of how deceptively difficult CSV is to deal with. However with a well-defined subset (and perhaps not using "," as a separator as well) it should be editable for at least simple changes.

[1] https://github.com/Chris00/ocaml-csv

32. ausjke+0s[view] [source] 2017-09-21 14:30:39
>>jdemle+(OP)
I noticed this is GPLv3, which means that if you use this library, your whole application will have to be open source. However, IANAL.
replies(3): >>neulan+cH >>chubot+QR >>creato+571
33. majkin+iu[view] [source] 2017-09-21 14:44:57
>>jdemle+(OP)
From the docs:

YAML is an example of a hierarchical data storage format which is much more readable than XML. The problem with YAML is that it was designed as a “data serialization language” and thus to map the data constructs usually found in programming languages. That makes it too complex for the simple task of storing plain lists of items.

I don't see how this is true. The provided sample with books is almost identical in YAML.

The main benefit over YAML looks like more control over individual fields, but again, a YAML-based db app could do that too.

◧◩◪◨⬒⬓
34. alvil+IC[view] [source] [discussion] 2017-09-21 15:30:30
>>jdemle+0m
I'm wondering if it is possible also with Signal (signal.org)
replies(1): >>detaro+O64
◧◩
35. Comodo+3G[view] [source] [discussion] 2017-09-21 15:49:25
>>rwmj+vp
No built-in indexing, but no one forbids you from indexing text files if you need it.
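A rough sketch of what such a do-it-yourself index could look like, in Python. It maps the values of one field to the byte offset where each matching record starts, so a lookup can seek straight to the record instead of scanning the file. It assumes blank-line-separated records with `Name: value` fields. The names `build_index` and `read_record` and the sample data are hypothetical, and the rec* tools themselves know nothing about an index like this.

```python
import io


def build_index(stream, field):
    """Map each value of `field` to the byte offsets of the records holding it."""
    index = {}
    prefix = (field + ":").encode()
    record_start = None  # offset of the record currently being read
    offset = 0
    for raw in stream:  # a binary stream yields lines including the newline
        if raw.strip() == b"":
            record_start = None  # a blank line ends the record
        else:
            if record_start is None:
                record_start = offset
            if raw.startswith(prefix):
                value = raw[len(prefix):].strip().decode()
                index.setdefault(value, []).append(record_start)
        offset += len(raw)
    return index


def read_record(stream, start):
    """Return the lines of the record that begins at byte offset `start`."""
    stream.seek(start)
    lines = []
    for raw in stream:
        if raw.strip() == b"":
            break
        lines.append(raw.decode().rstrip("\n"))
    return lines


# Made-up data; an in-memory stream stands in for a file opened with "rb".
DATA = b"Id: 1\nCustomer: Alice\n\nId: 2\nCustomer: Bob\n"
stream = io.BytesIO(DATA)
index = build_index(stream, "Id")
print(index)
print(read_record(stream, index["2"][0]))
```

The index goes stale as soon as the file is edited, so it would have to be rebuilt or invalidated, for example by comparing modification times.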
replies(1): >>neulan+sG
◧◩
36. neulan+8G[view] [source] [discussion] 2017-09-21 15:50:08
>>ajsalm+J4
Thanks! I was a bit disappointed that only the older video link was working.
◧◩◪
37. neulan+sG[view] [source] [discussion] 2017-09-21 15:51:36
>>Comodo+3G
But, any indexing system you create won't work with the rec* tools. For example, `recsel` will not be any faster on large files.

Not sure if they have indexing on the roadmap, but it does make sense to me for people that have adopted it and are starting to get bigger databases.

Of course, you could argue that when the files get too big, it's time to switch to a different solution.

It seems that's kind of a natural tension in projects. Do you grow the scope to accommodate existing users with growing use cases? Or, do you draw the line in the sand and have people move on to a different solution?

◧◩
38. neulan+cH[view] [source] [discussion] 2017-09-21 15:54:47
>>ausjke+0s
I was surprised that this wasn't LGPL, which seems more suited. Granted it's GNU. So, they're going to do it their way.
◧◩
39. chubot+QR[view] [source] [discussion] 2017-09-21 16:58:57
>>ausjke+0s
"Using" doesn't require it to be open source. Only if you distribute the resulting binaries, which is basically the "SaaS loophole".

If it were AGPL, then what you said would be more accurate.

40. oblib+D41[view] [source] 2017-09-21 18:27:53
>>jdemle+(OP)
This reminds me a bit of using CGI.pm's "Save" function. I built a pretty decent invoicing app using that and the searches for data in documents saved in that format are pretty fast.

I won't pretend to know the ins-and-outs of that but was told on a Perl mail list that the server created a "B-Tree" index when an initial search was made and used that afterwards.

◧◩
41. creato+571[view] [source] [discussion] 2017-09-21 18:49:56
>>ausjke+0s
They have a page explaining their reasoning: https://www.gnu.org/licenses/why-not-lgpl.en.html
replies(1): >>yellow+I64
42. kwhite+cU1[view] [source] 2017-09-22 04:17:21
>>jdemle+(OP)
If you want something lighter to use text files as tables you could try TextQL: https://github.com/dinedal/textql.
◧◩
43. zbuf+sv3[view] [source] [discussion] 2017-09-22 20:29:26
>>rwmj+vp
CSV locks you to the same fields per data entry, which makes it a little less flexible. Plus, one of the appeals for me is readability of the raw data; CSV gets long and thin very quickly. Granted, each will best suit a certain type of data.
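To make the "same fields per entry" point concrete, here is a small Python sketch that flattens records with differing fields into CSV. The header has to be the union of all field names, and every record gets an empty cell for each field it lacks. The function name `records_to_csv` and the sample records are invented for illustration.

```python
import csv
import io


def records_to_csv(records):
    """Write dict records with differing keys as CSV with a shared header."""
    # The header is the union of all field names, in first-seen order.
    fieldnames = []
    for record in records:
        for name in record:
            if name not in fieldnames:
                fieldnames.append(name)

    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)  # missing fields are filled with restval
    return out.getvalue()


# Made-up records: the two entries share only the "Name" field.
RECORDS = [
    {"Name": "Alice", "Email": "alice@example.org"},
    {"Name": "Bob", "Phone": "555-0100"},
]
print(records_to_csv(RECORDS))
```

With many optional fields, most cells end up empty, which is the "long and thin" effect in practice.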
44. sether+4Y3[view] [source] 2017-09-23 00:41:55
>>jdemle+(OP)
Did this come out of Amazon? I remember a similar set of tools for passing along "recs" through pipes.
◧◩◪
45. yellow+I64[view] [source] [discussion] 2017-09-23 03:00:30
>>creato+571
The only reasoning they really need is "we wrote the GPL and we're gonna goddamn use it".
◧◩◪◨⬒⬓⬔
46. detaro+O64[view] [source] [discussion] 2017-09-23 03:03:04
>>alvil+IC
Harder, since they don't have an open API and don't want people using non-standard clients. The clients are open-source though, so probably you can do it.