
HTML2ABC Awk script


Roger Hare


I've recently come across a load of Hungarian tunes in some (very?) old ABC files and I'm trying to 'rescue' some of them.

 

They all make reference to HTML2ABC, which presumably was an Awk script for converting HTML to ABC.

 

However, the results seem a little spotty. Here's an example:

X:7
T:Ablakomba, ablakomba
O:Hungary
A:szab\'o
N:Elhagyott a bab\'am
Z:HTML2ABC - AWK script 
M:2/4
L:1/8
Q:120
K:Gmaj
4-4
EGFE | B2 e2 d2 B2 | ABAF | G E3 E2 z2 |
EGFE | B2 e2 d B3 | ABAF | G E3 E2 z2 |
g2 F2 e f3 | e3 dB d3 | e2 c2 A c3 | d2 B2 G E3 |
EGFE | B2 e2 d B3 | ABAF | G E3 E2 z2 ||

As you can see, several of the bars are too long for 2/4, and the whole thing doesn't seem sure whether it's in 2/4 or 4/4 (that's how I interpret the '4-4' just after the key signature).
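One way to confirm which bars overrun is to total the note lengths bar by bar. The following is a minimal sketch, not part of the original files: the function names are hypothetical, and it assumes plain notes and rests only (no chords, tuplets, grace notes or inline fields).

```python
import re
from fractions import Fraction

# One note or rest: optional accidentals, pitch letter (or z), octave marks,
# then an optional length such as "2", "/2", "3/2" or "//".
NOTE_RE = re.compile(r"[=^_]*[A-Ga-gz][,']*(\d+)?(/+)?(\d+)?")


def note_length(num, slashes, den):
    """Length of one note, in multiples of the L: unit note length."""
    length = Fraction(int(num)) if num else Fraction(1)
    if slashes:
        # "/3" divides by 3; a bare "/" or "//" divides by 2 per slash.
        length /= int(den) if den else 2 ** len(slashes)
    return length


def bar_lengths(music_line):
    """Total length of every non-empty bar on one line of ABC music."""
    lengths = []
    for bar in music_line.split("|"):
        notes = [note_length(*m.groups()) for m in NOTE_RE.finditer(bar)]
        if notes:
            lengths.append(sum(notes))
    return lengths


def expected_bar_length(meter="2/4", unit="1/8"):
    """Number of unit notes that fit in one bar of the given meter."""
    return Fraction(meter) / Fraction(unit)


line = "EGFE | B2 e2 d2 B2 | ABAF | G E3 E2 z2 |"
limit = expected_bar_length("2/4", "1/8")
too_long = [i for i, n in enumerate(bar_lengths(line), 1) if n > limit]
print(too_long)  # [2, 4]
```

With M:2/4 and L:1/8 a bar holds four units, so the second and fourth bars of that line (eight units each) are exactly twice as long as they should be, which fits the 2/4 versus 4/4 confusion.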

 

Anyone have any experience with this mystery Awk script, or in a position to comment on the validity of the ABC code it generates?

 

I've done a rudimentary search for HTML2ABC and come up with zilch.

 

Ta.

Edited by lachenal74693

2 hours ago, eskin said:

Interesting, do you have a link to the original ABC files you can share?

Unfortunately not. The files came from a very large, unstructured database of tunes which came with a program called ABCEdit, which I downloaded several years ago. I just tried to find the program again, and got into a problem with the links I found. It's not clear to me that the links are to the same program, but when I tried to download one of the programs just now, the cheeky bastards at the other end started running a 'security check' on my computer, so I shut down immediately!

 

I never used the program, but I saved a copy of the database of tunes. I've attached a zipped version of the 'hungary' file which came with the database. It doesn't contain any clue as to the location of the Awk script, but you can see the reference to the Awk script in many of the tunes. I may have done some rudimentary editing of the file (eg: started to replace accented characters with their equivalent 'backslashed sequences'), but it's basically as downloaded ...

 

The whole database is huge - ~25000 tunes - the zipped version runs to ~3.6Mb. It's a real rat's nest with lots of duplicated tunes, peculiar 'country of origin' attributions, and more peculiar ABC code. I'm currently trying to 'rescue' tunes from the whole database - there is some good stuff in there if you are bloody-minded enough to extract it. There's also some crap...

 

Edit: I found the original link - it's dead...☹️ If anyone's interested, I'll upload the database to my DropBox archive, but it will be Monday before I can do this. As far as I can see, the 'hungary' file is the only one in the database which contains references to the HTML2ABC script...

Hungary.zip

Edited by lachenal74693

I'm a programmer by trade, and this sounds like a perfect candidate for a little custom Python script. I'll download the database when it's uploaded, but if someone can manually fix a few (~3) of the files to give me a good comparison of what the conversion should look like, I can try to write a small script to automate the process. No promises, but it's worth a shot; 25000 tunes is a lot to convert otherwise.


3 hours ago, TehRazorBack said:

I'm a programmer by trade, etc...

Me too - or I was before I retired.

 

As I see it the point here is that the files (see later in this para.) have already been converted, and as far as I'm aware, the pre-Awk-processed data is not available. In fact, now I've looked at it in some more detail, the only file in the database which references this HTML2ABC script is the 'hungary' file, and only about 75% of the tunes in that file seem to have been processed in this way. It's probably not worth pursuing the matter particularly deeply.
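A figure like that 75% can be checked by counting the tunes whose Z: field mentions the script. This is a rough sketch with hypothetical function names; it assumes every tune starts with an X: line and that the reference always appears in a Z: field, as in the example tune.

```python
def split_tunes(abc_text):
    """Split an ABC file into tunes, each a list of lines starting at X:."""
    tunes, current = [], []
    for line in abc_text.splitlines():
        if line.startswith("X:") and current:
            tunes.append(current)
            current = []
        # Ignore anything that appears before the first X: line.
        if line.startswith("X:") or current:
            current.append(line)
    if current:
        tunes.append(current)
    return tunes


def html2abc_share(abc_text):
    """Return (tunes mentioning HTML2ABC in a Z: field, total tunes)."""
    tunes = split_tunes(abc_text)
    tagged = sum(
        any(line.startswith("Z:") and "HTML2ABC" in line for line in tune)
        for tune in tunes
    )
    return tagged, len(tunes)


sample = (
    "X:7\nT:Ablakomba, ablakomba\nZ:HTML2ABC - AWK script\nK:Gmaj\nEGFE |\n"
    "\n"
    "X:8\nT:Another tune\nK:D\ndefg |\n"
)
print(html2abc_share(sample))  # (1, 2)
```

To run it on the real file, read it in with `open(path, encoding=...)` first; the encoding of the old files is not known, so that argument is left open here.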

 

I'll either post the zipped database on my Dropbox, or (if it will let me) post it as an attachment here - but it will be tomorrow at the earliest. There is some good stuff in there, but it takes some hard listening to flush it out.

 

25,000 tunes is a lot, but I don't know if the database is amenable to being 'cleaned-up' by an all-embracing script (in any language) - the ABC coding standard/style is so variable that 'editing-by-hand' is likely to be the 'best' option...

 

We seem to have come some distance from my original query about the Awk script...

 

 

Edited by lachenal74693

Sorry if I misunderstood the situation. Tbh I don't know much about ABC notation other than seeing a few files here and there, but I have created some manual file conversion programs before. I was hoping the fix would be something similar, but if they're already converted, maybe an "ABC Repairer" type program exists somewhere?

 

Still, if there's anything I can do to help, please let me know. I'll probably still give a few files a look-see.


On 6/4/2023 at 9:56 PM, TehRazorBack said:

Sorry if I misunderstood the situation. Tbh I don't know much about ABC notation other than seeing a few files here and there, but I have created some manual file conversion programs before. I was hoping the fix would be something similar, but if they're already converted, maybe an "ABC Repairer" type program exists somewhere?

 

Still, if there's anything I can do to help, please let me know. I'll probably still give a few files a look-see.

Not surprising. It's an unclear situation, and we've drifted somewhat away from my original question about the HTML2ABC script, which I was just curious to see (many, many moons ago, I was a bit of an Awk user). It looks as if that script has gone the way of all flesh, a long time ago...☹️😊

 

I now realise that the facility to attach files to posts here can (just barely) accept a file as large as the database under discussion, provided it is zipped. So, as a couple of folks have expressed interest (both publicly and privately via PMs), I have attached it here rather than fannying about with Dropbox.

 

To recap, it is the database of 25000 tunes which came with the program ABCEdit, which now appears to be defunct - or at least, the link which enabled one to download the program is 'dead'. There are about fifty separate files in the database.

 

The database contains much that might be of interest; much that probably isn't; some rather odd attributions of 'origin'; some rather flaky ABC code; and much duplication (I think I counted seven exact duplicates of the same tune in different files in one instance). If you are inclined to download the database, and do not like what you see, please do not shoot the messenger - enjoy!
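Exact duplicates across files can be found mechanically. This is a minimal sketch with hypothetical names: it treats two tunes as identical when every line except the X: reference number matches, ignoring blank lines and trailing white space, so near-duplicates with different titles or spacing inside a line are not caught.

```python
from collections import defaultdict


def iter_tunes(abc_text):
    """Yield each tune in an ABC file as a list of lines starting at X:."""
    current = []
    for line in abc_text.splitlines():
        if line.startswith("X:") and current:
            yield current
            current = []
        if line.startswith("X:") or current:
            current.append(line.rstrip())
    if current:
        yield current


def find_exact_duplicates(files):
    """Map of filename -> ABC text in; lists of files sharing a tune out."""
    seen = defaultdict(list)
    for name, text in files.items():
        for tune in iter_tunes(text):
            # Drop the X: number and blank lines before comparing.
            key = "\n".join(l for l in tune if l and not l.startswith("X:"))
            seen[key].append(name)
    return [names for names in seen.values() if len(names) > 1]


files = {
    "a.abc": "X:1\nT:Foo\nK:G\nGABc|\n",
    "b.abc": "X:9\nT:Foo\nK:G\nGABc|\n\nX:10\nT:Bar\nK:D\ndefg|\n",
}
print(find_exact_duplicates(files))  # [['a.abc', 'b.abc']]
```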

____________

Sidebar:

I don't know about an "ABC Repairer", but I have a program wot I wrote which 're-formats' ABC files in the sense that it:

 

Inserts white space in order to make the code more readable (increases file size)

Partially optimises ABC code by reducing things like 'x3/2y/2' to 'x>y' or [xnyn] to [xy]n (reduces file size)

Tries to render accented characters in their backslashed form, for example á becomes \'a

And so on...
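Two of the transformations listed above can be sketched in a few lines. This is not the program described in the post, just an illustration under stated assumptions: the function names are made up, the accent table covers only a handful of marks using the mnemonics as I understand them from the ABC 2.1 standard, and the rhythm rewrite handles only the literal 'x3/2y/2' (or 'x3/2y/') pattern with no space between the notes.

```python
import re
import unicodedata

# Combining mark -> ABC mnemonic character (assumed subset, not exhaustive).
ACCENT_CODES = {
    "\u0301": "'",   # acute:        á -> \'a
    "\u0300": "`",   # grave:        à -> \`a
    "\u0302": "^",   # circumflex:   â -> \^a
    "\u0303": "~",   # tilde:        ñ -> \~n
    "\u0308": '"',   # umlaut:       ö -> \"o
    "\u030b": "H",   # double acute: ő -> \Ho
}


def backslash_accents(text):
    """Replace accented letters with backslashed ABC mnemonics."""
    out = []
    for ch in text:
        parts = unicodedata.normalize("NFD", ch)
        if len(parts) == 2 and parts[0].isascii() and parts[1] in ACCENT_CODES:
            out.append("\\" + ACCENT_CODES[parts[1]] + parts[0])
        else:
            out.append(ch)  # leave anything unrecognised untouched
    return "".join(out)


_NOTE = r"[=^_]*[A-Ga-g][,']*"
# x3/2 immediately followed by y/2 (or y/), but not y/4, y// and so on.
_DOTTED_PAIR = re.compile(rf"({_NOTE})3/2({_NOTE})/2?(?![\d/])")


def shorten_dotted_pairs(music):
    """Rewrite 'x3/2y/2' as the equivalent broken rhythm 'x>y'."""
    return _DOTTED_PAIR.sub(r"\1>\2", music)


print(backslash_accents("Elhagyott a babám"))   # Elhagyott a bab\'am
print(shorten_dotted_pairs("A3/2B/2 ^f3/2e/ d2"))  # A>B ^f>e d2
```

The rhythm rewrite should only be applied to music lines, not header fields, since a title or note field could in principle contain text that looks like a note.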

 

The program is not a panacea! There are some 'gotchas', and there is still quite a lot of 'hand-editing' to do to knock an old ABC file into some sort of decent style/format. The overall effect of the re-formatting is to increase file size by ~8-11%. I've used this program almost daily for 2 years (version 1.0 = 21 May 2021)...

 

folder1.zip

Edited by lachenal74693

The idea of a HTML to ABC converter doesn't make much sense to me because HTML isn't a music description language. I wonder if it was a script that scraped ABC formatted tunes from a web page (basically just removing all the HTML tags that surrounded the ABC content)? If so then any problems with the output ABC code might have also been present in the original web pages.


50 minutes ago, alex_holden said:

The idea of a HTML to ABC converter doesn't make much sense to me because HTML isn't a music description language. I wonder if it was a script that scraped ABC formatted tunes from a web page (basically just removing all the HTML tags that surrounded the ABC content)? If so then any problems with the output ABC code might have also been present in the original web pages.

Frankly, it didn't make much sense to me either, when I came across the references to it in the first place. It was curiosity about such an idea that prompted my original query.

 

After thinking about it, I've asked myself much the same question(s), and arrived at much the same answer(s) as you mention - specifically the idea of 'removing all the HTML tags that surrounded the ABC content'.
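If the script really did just strip the tags, something along these lines would reproduce the idea. This is a guess at the behaviour, not a reconstruction of HTML2ABC: the class and function names are hypothetical, and it assumes the ABC sat as plain text inside the page (for example in a pre block or separated by br tags).

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text of a page, turning <br> tags into line breaks."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.chunks.append("\n")

    def handle_data(self, data):
        # Character references such as &amp; arrive here already decoded.
        self.chunks.append(data)


def html_to_abc(html_text):
    """Strip all tags and blank lines, leaving whatever text was on the page."""
    parser = TextExtractor()
    parser.feed(html_text)
    parser.close()
    lines = (line.strip() for line in "".join(parser.chunks).splitlines())
    return "\n".join(line for line in lines if line)


page = "<html><body><pre>X:1\nT:Test &amp; tune\nK:G\nEGFE | B2 e2 |\n</pre></body></html>"
print(html_to_abc(page))
```

Nothing here validates the music, so bars that were wrong on the web page would come through unchanged, which is consistent with the suggestion that the errors were already in the source pages.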

Edited by lachenal74693
