rumca-js/Internet-Places-Database

Overview

This is a database of Internet places: mostly domains, sometimes other things. Think of it as an Internet meta-database. The repository contains link metadata: title, description, publish date, etc.

Acceptable link types

Not acceptable link types

  • malware sites
  • porn, casino, gambling etc.
  • IT infrastructure domains, CDN domains
  • analytic domains that are used for user surveillance

Some zen rules:

  • Anything that does not obey the law will be removed from the lists
  • The Internet operates in ... many countries, so there are many laws
  • If things are offensive, they do not have to be removed
  • If page content is obnoxious, it can, and possibly should, be demoted
  • I do not always follow these rules strictly

If any link is suspicious and should be removed, please create an Issue in this repository. Links are captured from the Internet automatically, and I do not have the resources to verify them all. Use 'votes' to see the credibility of domains.

Sources of data

Obtained by the Django-link-archive web crawler.

Sources:

Alternative solutions

Files

The database is distributed as a set of JSON files. I do not want to store binary files in the repository. SQL files would be fine too, but I am going with JSON files for now.

Each link contains a set of attributes, like:

  • title
  • description
  • page rating
  • date of creation
  • date of last seen
  • etc.
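
As a rough illustration of reading the files, the sketch below collects entries from every JSON file in a directory. It assumes that each file holds either a list of entry objects or a single entry object. The directory name `data` and the key `title` in the usage note are hypothetical, so check the actual files for the real layout and key names.

```python
import json
from pathlib import Path


def load_entries(directory):
    """Collect link entries from every *.json file under `directory`.

    Assumption: each file holds either a list of entry objects or a
    single entry object. The real layout may differ.
    """
    entries = []
    for path in sorted(Path(directory).rglob("*.json")):
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)
        if isinstance(data, list):
            # Keep only dict-like items; skip anything unexpected.
            entries.extend(item for item in data if isinstance(item, dict))
        elif isinstance(data, dict):
            entries.append(data)
    return entries
```

With such a helper, `for entry in load_entries("data"): print(entry.get("title"))` would print one title per link, or `None` where the key is absent.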

Page rating

Content ranking is established by the Django-link-archive project.

To have a good page rating, it is desirable to follow good standards:

  • Schema Validator
  • W3C Validator
  • Provide HTML meta information. More info in the Open Graph Protocol
  • Provide a valid title, which is concise, but not too short
  • Provide a valid description, which is concise, but not too short
  • Provide a valid publication date
  • Provide a valid thumbnail, media image
  • Provide a valid HTTP status code. No fancy redirects, no JavaScript redirects
  • Provide an RSS feed, and HTML meta information for it: https://www.petefreitag.com/blog/rss-autodiscovery/
  • Provide the search engine keywords meta tag
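
To see which of these items a page already provides, a small self-check along the following lines may help. This is only a sketch using Python's standard `html.parser`. It is not the rating algorithm of the Django-link-archive project, and the set of checked items (`CHECKED`) is a selection from the list above, not an official one.

```python
from html.parser import HTMLParser

# Items checked by this sketch; a selection, not an official list.
CHECKED = ("title", "description", "keywords",
           "og:title", "og:description", "og:image", "rss")


class MetaAudit(HTMLParser):
    """Record which pieces of page metadata are present in an HTML document."""

    def __init__(self):
        super().__init__()
        self.found = set()
        self._in_title = False
        self._title_parts = []

    def handle_starttag(self, tag, attrs):
        attrs = {name: (value or "") for name, value in attrs}
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("content", "").strip():
            # Open Graph uses property="og:...", classic tags use name="...".
            key = (attrs.get("property") or attrs.get("name") or "").lower()
            if key in CHECKED:
                self.found.add(key)
        elif tag == "link":
            # RSS autodiscovery: <link rel="alternate" type="application/rss+xml">
            rel = attrs.get("rel", "").lower().split()
            if "alternate" in rel and attrs.get("type", "").lower() == "application/rss+xml":
                self.found.add("rss")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
            if "".join(self._title_parts).strip():
                self.found.add("title")

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)


def missing_metadata(html):
    """Return the checked items that the page does not provide."""
    audit = MetaAudit()
    audit.feed(html)
    audit.close()
    return [item for item in CHECKED if item not in audit.found]
```

Calling `missing_metadata(page_source)` on the HTML of your own page returns the names of the items that were not found. An empty list means every checked item is present. It says nothing about their quality, such as whether the title is concise.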

Your page or domain exists alongside thousands of other pages. Keep in mind that your metadata has an impact on your recognition and page ranking.

Remember: a good page is always ranked higher.

You may wonder why I am writing about the search engine "keywords" meta field if Google does not need it. Well, I don't like Google. If we want alternative solutions to exist, it should be possible to easily find your page with simpler search engines. Provide the keywords field if you support the open web.

Tags

Each entry can be tagged. The most notable examples of tags:

  • open source - if the entry is "open source" related
  • personal - if it seems to be a personal website
  • self-host - software that can be self-hosted
  • company - if the entry exists just to provide information about a company
  • university, museum, etc. - if the entry provides details about a university, museum, etc.
  • disinformation / misinformation / conspiracy theories - self-explanatory
  • news - if it is a "news" content farm. Might also be "game news", "tech news", etc.
  • amiga - anything Amiga related
  • wtf - for really interesting finds
  • link service - bitly or other services that provide shortened versions of links
  • movie - page describing a movie
  • video game - page describing a video game, etc.
  • interesting page design - self-explanatory. Some pages are just fun
  • online tool - some things are web programs that are not accessible if you are offline
  • monetization - if the page includes some kind of monetization, subscription, loot boxes
  • ad business - if the page owner works in this sector
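
Tags make it easy to pick out a subset of entries. The sketch below assumes each entry is a dictionary with a hypothetical `tags` key that holds a list of strings. The real files may store tags differently, so treat the key name and shape as placeholders.

```python
def entries_with_tag(entries, tag):
    """Return entries carrying `tag`, compared case-insensitively.

    Assumption: each entry is a dict whose hypothetical "tags" key holds
    a list of strings; entries without that key are skipped.
    """
    wanted = tag.strip().lower()
    return [
        entry
        for entry in entries
        if wanted in (t.strip().lower() for t in entry.get("tags", []))
    ]
```

For example, `entries_with_tag(entries, "self-host")` would keep only the entries tagged as self-hostable software.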

Notes

  • Not all domains have to be stored here. I think it would be best to keep valuable domains. We certainly do not want content farms, or sites that do not contribute anything useful to society or to the reader
  • The distinction is not that clear-cut, but more lenient rules apply to personal sites
  • I am not that interested in marking Substack or Medium pages as "personal" sites, as I do not feel that they should be tagged as such

Demo database

Might not be working. Used for development: https://renegat0x0.ddns.net/apps/places/.

License

  • GPL-3.0: LICENSE
  • CC-BY-4.0: LICENSE_DATA